Re: [elephant-devel] Re: Derived Indicies

Ian Eslick Thu, 08 May 2008 09:18:35 -0700


On May 8, 2008, at 11:32 AM, Leslie P. Polzer wrote:

(defpclass person ()
  ((name ...)
   (inbox :accessor inbox :initform (make-indexed-btree message)
        :index-on (date sender))))

Which would create an indexed-btree that stored messages and
auto-created indices on the date and sender slots?

That is a bit more OODB/Lispy than the association mechanism I've
implemented.  However I don't think you need to overload defpclass to
get this functionality.  In fact I think it starts to get too ugly.

I was thinking of this, but the other way round; MESSAGE should contain
the usual index declarations:

(defpclass message ()
 ((date :index t) ; create an index on DATE
  (sender :index t))) ; create an index on SENDER

These indices would be created in the correct "indexing namespace"
when the object is put into a container. This is easy to maintain
with, say, an INDEXED-SET class. Whatever.
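
One possible reading of this proposal, sketched in plain Common Lisp
with no Elephant calls at all. The names INDEXED-SLOT-NAMES, MAIL,
ADD-TO-SET and LOOKUP are hypothetical, and hash tables stand in for
whatever btrees a real container would use, so nothing here is sorted
or persistent. It only shows indices being created per container at
the moment an object is added:

```lisp
(defgeneric indexed-slot-names (object)
  (:documentation "Hypothetical hook: which slots of OBJECT want an index.")
  (:method ((object t)) '()))

(defclass mail ()
  ((date :initarg :date :reader date)
   (sender :initarg :sender :reader sender)))

;; Stands in for the (date :index t) / (sender :index t) declarations.
(defmethod indexed-slot-names ((object mail))
  '(date sender))

(defclass indexed-set ()
  ((members :initform '() :accessor members)
   ;; Slot name -> hash table from slot value to the members holding it.
   (indices :initform (make-hash-table :test #'eq) :reader indices)))

(defun add-to-set (object set)
  "Add OBJECT to SET and update the per-container index for each declared slot."
  (push object (members set))
  (dolist (slot (indexed-slot-names object) object)
    (let ((index (or (gethash slot (indices set))
                     (setf (gethash slot (indices set))
                           (make-hash-table :test #'equal)))))
      (push object (gethash (slot-value object slot) index)))))

(defun lookup (set slot value)
  "Return the members of SET whose SLOT is EQUAL to VALUE."
  (let ((index (gethash slot (indices set))))
    (when index
      (values (gethash value index)))))
```

Each container owns its own index tables here, which is the "indexing
namespace" idea; two INDEXED-SET instances holding the same object
would index it independently.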

Can you be more explicit about what indexes you imagine being created
here, where they are stored, and how they are accessed?

This should probably also imply that objects of the MESSAGE class
are not automatically registered in the store controller's class
root (as it is now); this should only happen on

(defpclass message ()
 (...)
(:index t))

How do you know what to sort the 'inbox' sorted-set by? Does it sort
on message date or sender or both?

FYI, the class indexing function is now implicit, since elephant
maintains a master list of oid->class-schema to support the schema
mechanism, which is itself an indexed btree. I think I have an option
to make this connection weak so the class index isn't updated
automatically.

inbox should be an object (class instance or btree) that has its own
api.


In fact that's my current solution but I don't like it much. It's not
good for quick development. You need to think too much about the
storage part.

I think we need to work through this model as an independent extension
first. I'm leery at this point of premature optimization given the
cost in complexity, testing, and API changes that these kinds of
changes incur (plus I'm not going to have time for a major upgrade
for some time).

However associations, like psets, are not sorted (dup-btree
oid:instance-ref). The value of (user message) is a persistent object
that is added to a dup-btree maintained by the metaclass protocol. It
maps the oid of a user to the messages that store it.
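
To make the bookkeeping concrete, here is a rough in-memory analogy in
plain CLOS. It is not how Elephant implements associations: the hash
table *INBOX-TABLE* only stands in for the dup-btree, the objects are
ordinary (non-persistent) instances, and all names besides USER,
MESSAGE and INBOX are assumptions made for the sketch. It shows a slot
write keeping the reverse mapping up to date:

```lisp
(defvar *inbox-table* (make-hash-table :test #'eq)
  "Stand-in for the dup-btree: maps a user to the messages that point at it.")

(defclass user ()
  ((name :initarg :name :reader name)))

(defclass message ()
  ((subject :initarg :subject :reader subject)
   (user :initform nil :accessor user)))

(defmethod (setf user) :around (new-user (msg message))
  ;; Keep the reverse mapping in step with the slot write.
  (let ((old-user (user msg)))
    (when old-user
      (setf (gethash old-user *inbox-table*)
            (remove msg (gethash old-user *inbox-table*))))
    (prog1 (call-next-method)
      (when new-user
        (pushnew msg (gethash new-user *inbox-table*))))))

(defun inbox (user)
  "Return the messages currently associated with USER (unsorted)."
  (gethash user *inbox-table*))
```

With this in place, (setf (user message) charlie) is the only call the
application makes; (inbox charlie) then returns the message without
any explicit container insert.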
All too complicated.

(setf (user message) charlie) and (inbox charlie) is too complicated?
Associations implements and maintains your set-of-objects model
without requiring you to explicitly add objects to the appropriate
container. Seems like a lot of gain for two lines of code! Moreover,
it limits the proliferation of btrees, which makes life easier for
postmodern given the table-per-btree implementation restriction.

IMHO a great feature of Elephant is that it lets
you work with your objects without worrying much about the storage
backend. As of now, this feature still needs to get better.
Sure, it works (at least for me), but it's not as nice as it could
be. Elephant should take care of all the low-level sorting stuff
(probably creating indices wherever needed or even sorting without
indices for prototyping).

That would definitely be nice, but I'm not convinced the increased
complexity is worth the benefit. These things need to work robustly
in the presence of migration, schema changes, multiple stores,
transactions, the existing MOP, disconnected operations and the
limitations of all the different data stores, all without adversely
impacting the base performance.

One hard part in implementing something like this is telling the
system how to hook into the slot-value and (setf slot-value) functions
on individual class slots without incurring significant overhead. The
query system should do the right thing without these connections via
joins and you can add the association declarations when you need
better performance. You could figure this out on the fly, of course,
but building a large index can tie up the system for quite some time
and you don't want that to happen randomly.


Of course, everything you propose is doable.

It would be reasonably cheap to do query caching of these sorted
OIDs so that subsequent OFFSET & LIMIT style accesses over the same
query set would be fast, just instantiating those messages that are
needed.
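
A minimal sketch of that caching idea, in plain Common Lisp and
independent of Elephant. *SORTED-OID-CACHE*, CACHED-SORTED-OIDS and
FETCH-PAGE are hypothetical names; COMPUTE-FN is assumed to run the
expensive sorted query and return OIDs, and INSTANTIATE-FN is assumed
to turn one OID into an object:

```lisp
(defvar *sorted-oid-cache* (make-hash-table :test #'equal)
  "Maps a query key to a vector of OIDs, already sorted.")

(defun cached-sorted-oids (query-key compute-fn)
  "Return the cached OID vector for QUERY-KEY.
COMPUTE-FN takes no arguments and is called only on a cache miss."
  (multiple-value-bind (oids found) (gethash query-key *sorted-oid-cache*)
    (if found
        oids
        (setf (gethash query-key *sorted-oid-cache*)
              (coerce (funcall compute-fn) 'vector)))))

(defun fetch-page (query-key compute-fn instantiate-fn offset limit)
  "Instantiate only the objects inside the OFFSET/LIMIT window."
  (let* ((oids (cached-sorted-oids query-key compute-fn))
         (start (min offset (length oids)))
         (end (min (+ offset limit) (length oids))))
    (map 'list instantiate-fn (subseq oids start end))))
```

The sketch never invalidates an entry, so a real version would have to
drop or rebuild the cached vector whenever the underlying set changes.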


While I'm at it: OFFSET and LIMIT (a real limit which lets you specify
an arbitrary Lisp expression) are things we definitely want to aim
for in 1.0. They are not difficult to implement at all, but they don't
work with GET-INSTANCES-BY-* and, worse, MAP-BTREE. This means
everyone has to write their own version of these functions that
take appropriate arguments and move the cursor around themselves
instead of relying on a simple high-level API.

Can't you generalize this today as a higher order function that does
this as a scan over an index, something like:

(map-inverted-index class index (offset-limit-scanner offset limit-fn) :oids t)

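(defun offset-limit-scanner (offset limit-fn &optional (sc *store-controller*))
  ;; Returns a closure for the mapper: skip OFFSET oids, then stop
  ;; as soon as LIMIT-FN returns true for an instance.
  (let ((count 0))
    (lambda (oid)
      (incf count)
      (when (> count offset)
        (let ((instance (controller-recreate-instance sc oid)))
          ;; LIMIT-FN is a variable, so it has to be called with FUNCALL.
          (when (funcall limit-fn instance)
            (stop-mapping)))))))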
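
The same scanner shape can be tried without a store. The following is
a plain Common Lisp stand-in, not Elephant code: it walks an ordinary
list instead of an index, a CATCH/THROW pair with the made-up tag
STOP-SCAN plays the role of the early exit, and MAKE-OFFSET-LIMIT-SCANNER
and SCAN-PAGE are hypothetical names. Items past the offset are
collected until LIMIT-FN returns true:

```lisp
(defun make-offset-limit-scanner (offset limit-fn collect-fn)
  "Return a one-argument function that skips OFFSET items, then passes
items to COLLECT-FN until LIMIT-FN returns true, at which point it
exits the scan by throwing to STOP-SCAN."
  (let ((count 0))
    (lambda (item)
      (incf count)
      (when (> count offset)
        (when (funcall limit-fn item)
          (throw 'stop-scan nil))
        (funcall collect-fn item)))))

(defun scan-page (items offset limit-fn)
  "Return the elements of the list ITEMS after the first OFFSET ones,
up to (not including) the first element satisfying LIMIT-FN."
  (let ((page '()))
    (catch 'stop-scan
      (mapc (make-offset-limit-scanner
             offset limit-fn (lambda (item) (push item page)))
            items))
    (nreverse page)))
```

Because the limit is an arbitrary predicate rather than a count, a
call such as (scan-page list 10 (lambda (x) (> x 100))) expresses
"skip ten, then take everything up to the first value over 100".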

In general, I believe these are things to think about as roadmap
items for 1.1 and 1.2. They won't happen soon enough to justify
delaying 1.0 for months. As I said above, I think this should be a
contrib that implements this behind a macro that generates the
appropriate methods for a set of generic functions.
get-instances-by-class was always a convenience, not a catch-all.

I'd have implemented these extensions myself, but I thought it better
to wait for the integration of the query language to add it.

Well, don't hold your breath. :) Unless someone other than me picks
up the query system work, it could be months before I get around to it.

The derived index hack is still more efficient for large sets.
Without changes to the data stores to create an efficient way of
sorting concatenated values, I don't see a way to improve on it
easily.


I'm not sure you actually need concatenated index values at all
if you manage your objects correctly. I.e. putting them in appropriate
containers (the natural OODB way) as opposed to throwing them all
together in some indexing namespace and then tediously (for programmer
and machine) selecting the stuff you need.


Hmmm...I'll have to think about that.

Good ideas here, let's keep the ideas coming and, even better, see some
contributions/extensions that implement this without impacting the MOP
and all the ultimate automation we might desire.

 Leslie

_______________________________________________
elephant-devel site list
[email protected]
http://common-lisp.net/mailman/listinfo/elephant-devel

