On May 8, 2008, at 11:32 AM, Leslie P. Polzer wrote:


(defpclass person ()
  ((name ...)
   (inbox :accessor inbox :initform (make-indexed-btree message)
        :index-on (date sender))))

Which would create an indexed-btree that stored messages and
auto-created indices on the date and sender slots?

That is a bit more OODB/Lispy than the association mechanism I've
implemented.  However I don't think you need to overload defpclass to
get this functionality.  In fact I think it starts to get too ugly.


I was thinking of this, but the other way round; MESSAGE should contain
the usual index declarations:

(defpclass message ()
 ((date :index t) ; create an index on DATE
  (sender :index t))) ; create an index on SENDER

These indices would be created in the correct "indexing namespace"
when the object is put into a container. This is easy to maintain
with, say, an INDEXED-SET class. Whatever.

Can you be more explicit about what indexes you imagine being created here and where they are stored and how they are accessed?

This should probably also imply that objects of the MESSAGE class
are not automatically registered in the store controller's class
root (as it is now); this should only happen on

(defpclass message ()
 (...)
(:index t))


How do you know what to sort the 'inbox' sorted-set by? Does it sort on message date or sender or both?

FYI, the class indexing function is now implicit, since elephant maintains a master list of oid->class-schema to support the schema mechanism which is itself an indexed btree. I think I have an option to make this connection weak so the class index isn't updated automatically.


inbox should be an object (class instance or btree) that has its own
api.

In fact that's my current solution but I don't like it much. It's not
good for quick development. You need to think too much about the
storage part.

I think we need to work through this model as an independent extension first. I'm leery at this point of premature optimization given the cost in complexity, testing, and API changes that these kinds of changes incur. (plus I'm not going to have time for a major upgrade for some time).


However associations, like psets, are not sorted (dup-btree
oid:instance-ref). The value of (user message) is a persistent object that is added to a dup-btree maintained by the metaclass protocol. It
maps the oid of a user to the messages that store it.

All too complicated. IMHO a great feature of Elephant is that it lets

(setf (user message) charlie) and (inbox charlie) is too complicated? Associations implement and maintain your set-of-objects model without requiring you to explicitly add objects to the appropriate container. Seems like a lot of gain for two lines of code! Moreover, it limits the proliferation of btrees, which makes life easier for postmodern given the table-per-btree implementation restriction.
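To make the two-line usage above concrete, here is a minimal sketch of the idea in plain CLOS. It is not Elephant's implementation (which goes through the metaclass protocol and a dup-btree); the PERSON class, the *INBOX-INDEX* hash table, and the :around method are assumptions made purely for illustration.

```lisp
;; Illustration only: a hash table stands in for the dup-btree that
;; Elephant would maintain. None of these names are Elephant API.
(defvar *inbox-index* (make-hash-table :test #'eq)
  "Hypothetical reverse index: person object -> list of messages.")

(defclass person ()
  ((name :initarg :name :reader name)))

(defclass message ()
  ((user :initform nil :accessor user)))

(defmethod (setf user) :around (new-user (msg message))
  "Keep *INBOX-INDEX* consistent whenever a message's user changes."
  (let ((old-user (user msg)))
    ;; Drop the message from the previous owner's entry, if any.
    (when old-user
      (setf (gethash old-user *inbox-index*)
            (remove msg (gethash old-user *inbox-index*))))
    ;; Register it under the new owner, then store the slot value.
    (when new-user
      (push msg (gethash new-user *inbox-index*)))
    (call-next-method)))

(defun inbox (person)
  "Return the messages whose user is PERSON (unsorted, like a pset)."
  (copy-list (gethash person *inbox-index*)))
```

With this in place, (setf (user message) charlie) followed by (inbox charlie) returns the message without any explicit container bookkeeping, which is the behavior the association mechanism is meant to provide.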

you work with your objects without worrying much about the storage
backend. As of now, this feature still needs to get better.
Sure, it works (at least for me), but it's not as nice as it could
be. Elephant should take care of all the low-level sorting stuff
(probably creating indices wherever needed or even sorting without
indices for prototyping).

That would definitely be nice, but I'm not convinced the increased complexity is worth the benefit. These things need to work robustly in the presence of migration, schema changes, multiple stores, transactions, the existing MOP, disconnected operations and the limitations of all the different data stores, all without adversely impacting the base performance.

One hard part in implementing something like this is telling the system how to hook into the slot-value and (setf slot-value) functions on individual class slots without incurring significant overhead. The query system should do the right thing without these connections via joins and you can add the association declarations when you need better performance. You could figure this out on the fly, of course, but building a large index can tie up the system for quite some time and you don't want that to happen randomly.

Of course, everything you propose is doable.


It would be reasonably cheap to do query caching of these sorted
OIDs so that subsequent OFFSET & LIMIT style accesses over the same
query set would be fast, just instantiating those messages that are needed.
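A rough sketch of that caching idea, in plain Common Lisp rather than against Elephant itself: the cache table, the query key, and the COMPUTE-FN / LOAD-FN arguments are all hypothetical stand-ins (LOAD-FN plays the role of whatever turns an OID into an instance). Cache invalidation on writes is deliberately left out.

```lisp
;; Illustration only: none of these names exist in Elephant.
(defvar *query-cache* (make-hash-table :test #'equal)
  "Hypothetical cache: query key -> vector of OIDs in sorted order.")

(defun cached-sorted-oids (query-key compute-fn)
  "Return the cached OID vector for QUERY-KEY, calling COMPUTE-FN on a miss."
  (multiple-value-bind (oids found) (gethash query-key *query-cache*)
    (if found
        oids
        (setf (gethash query-key *query-cache*)
              (coerce (funcall compute-fn) 'vector)))))

(defun fetch-page (query-key compute-fn load-fn offset limit)
  "Instantiate only the objects in the window [OFFSET, OFFSET+LIMIT)."
  (let* ((oids (cached-sorted-oids query-key compute-fn))
         (start (min offset (length oids)))
         (end (min (+ offset limit) (length oids))))
    ;; Only the OIDs inside the window are passed to LOAD-FN.
    (map 'list load-fn (subseq oids start end))))
```

The expensive sort runs once per query key; later pages only slice the cached vector and load the objects they actually need.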

While I'm at it: OFFSET and LIMIT (a real limit which lets you specify
an arbitrary Lisp expression) are things we definitely want to aim
for in 1.0. They are not difficult to implement at all, but they don't
work with GET-INSTANCES-BY-* and, worse, MAP-BTREE. This means
everyone has to write their own version of these functions that
take appropriate arguments and move the cursor around themselves
instead of relying on a simple high-level API.

Can't you generalize this today as a higher order function that does this as a scan over an index, something like:

(map-inverted-index class index (offset-limit-scanner offset limit-fn) :oids t)

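(defun offset-limit-scanner (offset limit-fn &optional (sc *store-controller*))
  (let ((count 0))
    (lambda (oid)
      (incf count)
      ;; Skip the first OFFSET oids before instantiating anything.
      (when (> count offset)
        (let ((instance (controller-recreate-instance sc oid)))
          ;; LIMIT-FN is a function object, so it must be FUNCALLed.
          (when (funcall limit-fn instance)
            (stop-mapping)))))))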
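For anyone who wants to try the pattern without a store, here is a self-contained sketch of the same closure-based scanner over a plain list. MAP-WITH-STOP, the STOP-SCAN catch tag, and PAGE are hypothetical names invented for this example, not Elephant functions; the list stands in for an index scan, and a COLLECT-FN argument is added so the scan produces a result.

```lisp
;; Illustration only: a plain list replaces the inverted index, and
;; CATCH/THROW replaces whatever early-exit mechanism the mapper offers.
(defun map-with-stop (fn items)
  "Call FN on each element of ITEMS until FN throws to the STOP-SCAN tag."
  (catch 'stop-scan
    (dolist (item items)
      (funcall fn item))))

(defun make-offset-limit-scanner (offset limit-fn collect-fn)
  "Return a closure that skips OFFSET items, then passes items to
COLLECT-FN until LIMIT-FN returns true for one of them."
  (let ((seen 0))
    (lambda (item)
      (incf seen)
      (when (> seen offset)
        ;; LIMIT-FN can be any predicate, not just a counter.
        (when (funcall limit-fn item)
          (throw 'stop-scan nil))
        (funcall collect-fn item)))))

(defun page (items offset limit-fn)
  "Collect the elements after the first OFFSET, stopping at LIMIT-FN."
  (let ((result '()))
    (map-with-stop (make-offset-limit-scanner
                    offset limit-fn
                    (lambda (item) (push item result)))
                   items)
    (nreverse result)))
```

For example, (page '(1 2 3 4 5 6) 1 (lambda (n) (> n 4))) skips the first element and collects until it meets a value greater than 4.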

In general, I believe these are things to think about as roadmap items for 1.1 and 1.2. They won't happen soon enough to justify delaying 1.0 for months. As I said above, I think this should be a contrib that implements this behind a macro that generates the appropriate methods for a set of generic functions. get-instances-by-class was always a convenience, not a catch-all.

I'd have implemented these extensions myself, but I thought it better
to wait for the integration of the query language to add it.

Well, don't hold your breath. :) Unless someone other than me picks up the query system work, it could be months before I get around to it.


The derived index hack is still more efficient for large sets.
Without changes to the data stores to create an efficient way of
sorting concatenated values, I don't see a way to improve on it easily.

I'm not sure you actually need concatenated index values at all
if you manage your objects correctly. I.e. putting them in appropriate
containers (the natural OODB way) as opposed to throwing them all
together in some indexing namespace and then tediously (for programmer
and machine) selecting the stuff you need.

Hmmm...I'll have to think about that.

Good ideas here, let's keep the ideas coming and even better, see some contributions/extensions that implement this without impacting the MOP and all the ultimate automation we might desire.

 Leslie

_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel
