On May 8, 2008, at 11:32 AM, Leslie P. Polzer wrote:
(defpclass person ()
((name ...)
(inbox :accessor inbox :initform (make-indexed-btree message)
:index-on (date sender))))
Which would create a indexed-btree that stored messages and auto-
created indices on the date and sender slots?
That is a bit more OODB/Lispy than the association mechanism I've
implemented. However I don't think you need to overload defpclass to
get this functionality. In fact I think it starts to get too ugly.
I was thinking of this, but the other way round; MESSAGE should
contain
the usual index declarations:
(defpclass message ()
((date :index t) ; create an index on DATE
(sender :index t))) ; create an index on SENDER
These indices would be created in the correct "indexing namespace"
when the object is put into a container. This is easy to maintain
with, say, an INDEXED-SET class. Whatever.
Can you be more explicit about what indexes you imagine being created
here and where they are stored and how they are accessed?
This should probably also imply that objects of the MESSAGE class
are not automatically registered in the store controller's class
root (as it is now); this should only happen on
(defpclass message ()
(...)
(:index t))
How do you know what to sort the 'inbox' sorted-set by? Does it sort
on message date or sender or both?
FYI, the class indexing function is now implicit, since elephant
maintains a master list of oid->class-schema to support the schema
mechanism which is itself an indexed btree. I think I have an option
to make this connection weak so the class index isn't updated
automatically.
inbox should be an object (class instance or btree) that has its own
api.
In fact that's my current solution but I don't like it much. It's not
good for quick development. You need to think too much about the
storage part.
I think we need to work through this model as an independent extension
first. I'm leery at this point of premature optimization given the
cost in complexity, testing, and API changes that these kinds of
changes incur. (plus I'm not going to have time for a major upgrade
for some time).
However associations, like psets, are not sorted (dup-btree
oid:instance-ref). The value of (user message) is a persistent
object
that is added to a dup-btree maintained by the metaclass protocol.
It
maps the oid of a user to the messages that store it.
All too complicated. IMHO a great feature of Elephant is that it let
(setf (user message) charlie) and (inbox charlie) is too complicated?
Associations implements and maintains your set-of-objects model
without requiring you to explicitly add objects to the appropriate
container. Seems like alot of gain for two lines of code! Moreover,
it limits the proliferation of btrees which makes life easier for
postmodern given the table-per-btree implementation restriction.
you work with your objects without worrying much about the storage
backend. As of now, this feature still needs to get better.
Sure, it works (at least for me), but it's not as nice as it could
be. Elephant should take care of all the low-level sorting stuff
(probably creating indices wherever needed or even sorting without
indices for prototyping).
That would definitely be nice, but I'm not convinced the increased
complexity is worth the benefit. These things need to work robustly
in the presence of migration, schema changes, multiple stores,
transactions, the existing MOP, disconnected operations and the
limitations of all the different data stores, all without adversely
impacting the base performance.
One hard part in implementing something like this is telling the
system how to hook into the slot-value and (setf slot-value) functions
on individual class slots without incurring significant overhead. The
query system should do the right thing without these connections via
joins and you can add the association declarations when you need
better performance. You could figure this out on the fly, of course,
but building a large index can tie up the system for quite some time
and you don't want that to happen randomly.
Of course, all what you propose is doable.
It would be reasonably cheap to do query caching of these sorted
OIDs so that subsequent OFFSET & LIMIT style accesses over the same
query set would be fast, just instantiating those messages that are
needed.
While I'm at it: OFFSET and LIMIT (a real limit which lets you specify
an arbitrary Lisp expression) are things we definitely want to aim
for in 1.0. They are not difficult to implement at all, but they don't
work with GET-INSTANCES-BY-* and, worse, MAP-BTREE. This means
everyone has to write their own version of these functions that
take appropriate arguments and move the cursor around themselves
instead of relying on a simple high-level API.
Can't you generalize this today as a higher order function that does
this as a scan over an index, something like:
(map-inverted-index class index (offset-limit-scanner offset limit-
fn) :oids t)
(defun offset-limit-scanner (offset limit-fn &optional (sc *store-
controller*))
(let ((count 0))
(lambda (oid)
(incf count)
(when (> count offset)
(let ((instance (controller-recreate-instance sc oid)))
(when (limit-fn instance)
(stop-mapping))))))
In general, I believe these are things to think about for as roadmap
items for 1.1 and 1.2. They won't happen soon enough to justify
delaying 1.0 for months. As I said above, I think this should be a
contrib that implements this behind a macro that generates the
appropriate methods for a set of generic functions. get-instances-by-
class was always a convenience, not a catch-all.
I'd have implemented these extensions myself, but I thought it better
to wait for the integration of the query language to add it.
Well, don't hold your breath. :) Unless someone other than me picks
up the query system work, it could be months before I get around to it.
The derived index hack is still more efficient for large sets.
Without changes to the data stores to create an efficient way of
sorting concatenated values, I don't see a way to improve on it
easily.
I'm not sure you actually need concatenated index values at all
if you manage your objects correctly. I.e. putting them in appropriate
containers (the natural OODB way) as opposed to throwing them all
together in some indexing namespace and then tediously (for programmer
and machine) selecting the stuff you need.
Hmmm...I'll have to think about that.
Good ideas here, let's keep the ideas coming and even better, see some
contributions/extensions that implement this without impacting the MOP
and all the ultimate automation we might desire.
Leslie
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel