Originally I wrote get-instance-by-value as a convenience function for
when I knew that there was only one object being returned by an
index. If there are multiple objects that match the input term, then
get-instance-by-value is effectively returning a random one.
There is going to be a huge performance difference on sets of this
size between the BDB backend and the original CL-SQL backend, because
the SQL backend cannot support the ordering constraints of an index
without reading all the objects into memory. That is O(n) versus O(s),
where n is the size of the index and s is the size of the subset that
matches the input term; s can be as small as 1.
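To make the cost difference concrete, here is a minimal sketch in plain Common Lisp. It is not Elephant code: the "index" is just an in-memory vector of (key . object-id) pairs sorted by key, and the names lower-bound, matches-by-scan, matches-by-index and first-match-by-index are made up for illustration. The ordered lookup also pays a seek of about log n steps before it reads the s matches.

```lisp
;; Hypothetical model, not the Elephant API: INDEX is a vector of
;; (key . object-id) conses, sorted by string key.

(defun lower-bound (index key)
  "Position of the first entry in INDEX whose key is not less than KEY."
  (let ((lo 0)
        (hi (length index)))
    (loop while (< lo hi)
          do (let ((mid (floor (+ lo hi) 2)))
               (if (string< (car (aref index mid)) key)
                   (setf lo (1+ mid))
                   (setf hi mid))))
    lo))

(defun matches-by-scan (index key)
  "O(n): look at every entry, as a store without ordered access has to."
  (loop for (k . id) across index
        when (string= k key)
          collect id))

(defun matches-by-index (index key)
  "Seek to the first match, then read only the s entries that match KEY."
  (loop for i from (lower-bound index key) below (length index)
        for (k . id) = (aref index i)
        while (string= k key)
        collect id))

(defun first-match-by-index (index key)
  "Seek and return a single match without reading the other duplicates."
  (let ((i (lower-bound index key)))
    (when (< i (length index))
      (let ((entry (aref index i)))
        (when (string= (car entry) key)
          (cdr entry))))))
```

With a million entries and a name that matches one or two of them, matches-by-scan still visits all million, while the other two touch only the seek path plus the matches.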
I'm surprised you are seeing this difference on Postmodern, unless
your input name is matching a large number of objects (i.e. s is
large). I had thought that the native PostgreSQL backend, postmodern,
fixed the linear cost problem of CL-SQL indices and provides the same
O(s) performance that BDB does, but I'm not certain (Henrik?
Robert?). It sounds like you are seeing a high linear cost.
As you point out, when the duplicate set is large, get-value on the
secondary index does exactly the same thing without having to load in
the whole duplicate set. The fix you provided behaves the same and
handles the case where you want a random member of a set and the set
is large, so there is no reason not to do it.
I'll go ahead and add this patch.
There is no SQL recording built into Elephant that I'm aware of;
however, Henrik or Robert should weigh in on this, as it may be easy to
add.
Cheers,
Ian
On Nov 28, 2007, at 3:53 AM, Alain Picard wrote:
Dear Elephant developers,
I've been considering using Elephant for a project of mine,
and have been doing some basic performance tests, using the
new postmodern back end (which seems way cool, btw).
The scenario I'm testing is something like this; you
have a base class:
(defclass person-mixin ()
  ((name :accessor person-name :initarg :name :index t))
  (:metaclass persistent-metaclass))
and a derived one:
(defclass employee (person-mixin)
  ((job :accessor job
        :initarg :job))
  (:metaclass persistent-metaclass))
And you go off and make a million instances of employees.
[Let's say we're a very big corporation. :-)]
Then when I did the following:
(time (get-instance-by-value 'employee 'name name))
I was surprised to find that not only is it slow, but it conses
like a madman. This led me to inspect what this function actually
does, and it turns out that it calls get-instances-by-value, which
ends up doing a map-index (using with-btree-cursor) to collect every
match, and then throws away all but the first.
;;; Current definition, in 0.9.1
(defmethod get-instance-by-value ((class symbol) slot-name value)
  (let ((list (get-instances-by-value (find-class class) slot-name value)))
    (when (consp list)
      (car list))))
(defmethod get-instance-by-value ((class persistent-metaclass) slot-name value)
  (let ((list (get-instances-by-value class slot-name value)))
    (when (consp list)
      (car list))))
It seems odd to create a cursor to find something when you have an
index on that slot. Also, it seems to me users of
GET-INSTANCE-BY-VALUE probably imagine there is only 1 instance to
return; and so would there be a huge problem in using something like
the following instead:
;;; Proposed definitions:
(defmethod get-instance-by-value ((class persistent-metaclass) slot-name value)
  (let ((bt (find-inverted-index class slot-name)))
    (if bt
        (get-value value bt) ; Do it the "simple" way
        (first (get-instances-by-value class slot-name value)))))
(defmethod get-instance-by-value ((class symbol) slot-name value)
  (get-instance-by-value (find-class class) slot-name value))
This is more than a factor of 10 faster under elephant/postmodern
for a class with 30,000 instances.
Am I missing something really basic here? Is there a simpler
way to do what I want without this performance penalty?
Will this simply not work for some other back ends I'm not
aware of? I feel a certain tension in the code trying to
be "all things to all back-ends", and certain decisions are
clearly inspired by the Berkeley DB back end, which sadly I
could not use for the venture I have in mind (for licensing reasons).
Lastly, is there a way to trace all the SQL commands going
back and forth to postgresql in postmodern? So far I've resorted
to Postgres statement logging, which is painful to match up
with what the application does. I'm looking for the postmodern
equivalent of CLSQL's START-SQL-RECORDING.
Thanks in advance!
Alain Picard
--
Please read about why Top Posting
is evil at: http://en.wikipedia.org/wiki/Top-posting
and http://www.dickalba.demon.co.uk/usenet/guide/faq_topp.html
Please read about why HTML in email is evil at:
http://www.birdhouse.org/etc/evilmail.html
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel