Another way to get reasonable reporting efficiency for queries with multiple constraints is to use cursors over secondary indices directly.
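For example, a range traversal with a btree cursor might look like the
sketch below. This is a minimal sketch only: it assumes a btree whose keys
are numbers (e.g. dates encoded as universal times), and it relies on the
cursor operations returning (values exists? key value):

(defun map-btree-range (fn btree start end)
  "Call FN on each (key value) pair in BTREE with START <= key <= END."
  (with-btree-cursor (cur btree)
    ;; Position the cursor at the first key >= START ...
    (multiple-value-bind (exists? key value)
        (cursor-set-range cur start)
      ;; ... then walk forward in key order until we pass END.
      (loop while (and exists? (<= key end))
            do (funcall fn key value)
               (multiple-value-setq (exists? key value)
                 (cursor-next cur))))))

Because the btree keeps keys sorted, the cursor only touches the entries in
the requested range plus one logarithmic descent to find the start.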
For your example I would create a CLOS class in Elephant instead of a table. Each instance of this class is like a record in a table. To link two records together, you have slots that contain references to the contained objects. You don't keep collections; those are stored in secondary indices, so they are ordered and readout is linear in the size of the result set.

(defpclass customer ()
  ((name ... :index t)))

(defpclass invoice ()
  ((number ... :index t)
   (date ... :index t)
   (customer ... :index t)))

You can now easily do BTree lookups using secondary indices via the 0.6.0 indexing mechanism:

(get-instances-by-value 'invoice 'number <invoice-integer>)
=> returns a list with the single matching invoice

(get-instances-by-range 'invoice 'date <start date> <end date>)
=> returns a list of invoices between two dates

(get-instances-by-value 'invoice 'customer <customer-object>)
=> returns all invoices for a given customer

These are highly efficient calls, although they do require going to disk unless you are using DCM. Performance for reports is not going to be a real problem: the secondary indices are ordered, so retrieving a set of invoices over a date range is roughly O(# invoices in range + log # total invoices); for small ranges that is effectively just a few disk accesses (a logarithmic descent of the index, plus one access per object to deserialize it if it isn't cached).

The rub is when you want to query for a customer's invoices between two dates using the customer's name. In SQL this is a join query, and the query compiler will typically optimize how it is executed. If either set is likely to be small (# of invoices per customer or # of invoices in a date range), you can write your own query to fetch one subset and filter it:

(defun get-customer-invoices-for-dates (name start-date end-date)
  (let ((customer (car (get-instances-by-value 'customer 'name name))))
    ;; Fetch the customer's invoices via the slot index, then filter
    ;; by date.  IN-DATE-RANGE-P is left as an exercise.
    (remove-if-not (lambda (invoice)
                     (in-date-range-p invoice start-date end-date))
                   (get-instances-by-value 'invoice 'customer customer))))

To handle joins over large collections, a more sophisticated function using cursors would need to be written, and in some cases you'd need to build new index data structures to get the time down to O(N log M), where N is the size of the target dataset and M is the largest number of instances of any class involved in the query. An easy way to do this is to use derived indices which, for example, can use any number of slots in an object to create an ordering - so you can pre-sort objects by date and then by customer, and skip over customer-date pairs as you traverse the btree. That might require some more explanation, but I don't have the time just now. :)
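Roughly, and purely as a sketch of the idea: the entry point for derived
indices is ADD-CLASS-DERIVED-INDEX (check the manual for your release for
the exact name and arguments), and the accessors INVOICE-CUSTOMER,
CUSTOMER-NAME and INVOICE-DATE below are hypothetical:

;; Derived-index function: takes an instance, returns the value to index by.
;; Encoding "customer-name|zero-padded-date" as a string sorts by customer
;; first, then by date, so each customer's invoices are contiguous in the
;; btree.
(defun invoice-customer-date-key (invoice)
  (format nil "~A|~12,'0D"
          (customer-name (invoice-customer invoice)) ; hypothetical accessors
          (invoice-date invoice)))                   ; date as universal time

;; Register the derived index on the INVOICE class.
(add-class-derived-index 'invoice 'customer-date 'invoice-customer-date-key)

;; The join-style query then becomes a single range scan:
(defun get-customer-invoices-for-dates* (name start end)
  (get-instances-by-range 'invoice 'customer-date
                          (format nil "~A|~12,'0D" name start)
                          (format nil "~A|~12,'0D" name end)))

The string encoding is just one easy way to get a total order over the
customer/date pair; any stable, sortable encoding works.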
In a future release we hope to integrate DCM/prevalence-style functionality more directly into the persistent object system, so common linear queries are fast and only reports over non-resident (infrequently and not recently accessed) objects are expensive.

Good luck and let us know if you need more suggestions!

Ian

Robert L. Read wrote:
> On Wed, 2006-07-26 at 17:36 -0400, Daniel Salama wrote:
>> The other approach I thought would be to model it similarly to how
>> I would do it in a relational database. Basically, I would create
>> separate collections of objects representing the tables I would have
>> in the relational database. Then, within each object, e.g. a customer
>> object, I would create a reference to a collection that holds a
>> subset of invoices, for example. This would allow me to simply query
>> the invoices collection of a customer and that's it. At the same
>> time, I would be able to query the entire invoices collection.
>
> Dear Daniel,
>
> I think this approach is much better than creating a very large
> object.
>
> Personally, I have an opinion a lot of people disagree with --- I use
> the "prevalence" model, which is basically that I keep all of the
> objects in memory, and when I change something I write back to the
> data store. This pretty much makes your reporting efficiency issues
> go away, because you can compute any report really, really fast.
>
> I have checked in, in the "contrib" directory, a package called DCM,
> for Data Collection Management, that does the in-memory management ---
> the responsibility of the user is to call "register-object" whenever
> an object needs to be written back. DCM also supports the "reference"
> problem that you mention --- that is, instead of putting a whole
> object into a slot, you put the key there and look it up in a
> separate object.
>
> In this model, you objectify each class of object (which is very
> similar to the "tables" in the relational model or the "entities" in
> the entity-relationship model). Each such class gets a "director",
> and you operate against the director when you do something. One of
> the advantages of this approach is that you can choose the "strategy"
> for each director --- so you can choose to cache the objects in
> memory, or to directly use the database store, or even to use a
> generational system.
>
> I think the tests of DCM could be considered a little bit of
> pseudocode for what you want.
>
> In considering whether or not things should be kept in memory, one
> should do the math: the maximum number of objects * their size vs.
> the free memory. Memories are big and getting bigger.
>
> Let me know if this addresses your ideas or if you have any other
> questions; Ian and I had previously agreed that the lack of a big
> example application is one of the biggest weaknesses in Elephant
> right now.

_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel