I'm interested in this problem myself, but haven't had the time to do the appropriate research. Many SQL query engines are built on top of a BTree infrastructure so building and performing optimized queries should be quite doable in Elephant. However, I do know that efficient operations are fairly complicated.
I'd be happy to support your work by including a search interface + simple compiler into the main Elephant release. I'm assuming you want to search for persistent objects and not standard objects stored in BTrees. There are two ways to extract a set of objects associated with a particular slot value: 1) Get all objects of the desired type and do a linear search 2) Create a slot-index for searchable slots. You can use the accessor functions in get-instances-by-value and get-instances-by-range. This returns the persistent object. Current cursor accesses require full deserialization of the target object but fortunately this doesn't include the slot, just the oid+classname, etc. If you extract a range of objects using a slot index, they'll already be ordered according to the sort order of the index values. You can do joins by filtering each list by whether the object exists in the next list or not. i.e. age = 18 (oid1, oid2, oid3, oid4, oid5, oid6, oid7 ... ) name = "Cindy" (oid4, oid7, oid3 ...) state = MA (oid10, oid4, oid7 ...) Filtered objects = oid4, oid7 ... Optimization issues: 1) Returned objects are in value order, not OID or object order; so each merge of lists is naively C*N^2 where C is the number of constraints (3 above) and N is the average size of a returned list. This can be avoided by sorting each list (N log N) prior to merging which is effectively M log M (where M = the sum of all returned objects for all the constraints). In fact if you append the lists (N) and then sort removing duplicates M log M you get the same result. You then need another N log N walk of the list to order the elements according to the original list of constraints. I'm sure there's a better way, but this is a decent first order take. 2) Now this works fine when the # of objects in a given query are small. But let's say you want to pick all adults with the name "Jerry" from a 1M entry database of people. This is a range query (get-instances-by-range 'person 'age 18 120) and a value query (get-instances-by-value 'person 'first-name "Jerry"). However, the # of Jerry instances and in particular the # of age > 18 instances will be enormous (at least 500k). You might not even be able to fit all of these in memory! In this case we need to know how large the sets of objects are under given constraints. (I'm not sure if BDB provides a way to get access to the size of the index between two values - the total size of the tree + the height of the nodes should be telling as the BTrees are balanced...). In this case we can try to use cursors or slot accessors to do the filtering based on the total size and accessibility of the queries. At this point it gets somewhat complicated and I think it's worthwhile to appeal to the SQL implementation literature. Also, what happens if a slot is not indexed? How do we search multiple object classes at the same time? For example, how would we specify a search over instances and it's super classes when the superclass also contain the slot constraints? (Currently class indexes are specific to a class name) Is this helpful? A first cut approach as described in #1 above will probably flesh out a large number of issues and shouldn't take that much time. If you are motivated, or have large queries, then it's appropriate to spend some time talking about how to handle the issues of large sets in #2. Query language: Does it make sense to have a full query language, or a simpler macros to make this simple? The easy macro approach probably doesn't need anything sophisticated: (retrieve-instances-by-constraints ((person :with-superclasses t) employee) ((age 18 120) (name "George") (state :MA)) (:symbols-as-strings t :order t)) Here we search for person objects, parents of person objects and employee objects for the three constraints. The search option :symbols-as-strings means to treat :MA as matching :MA or "MA" - that means two slot queries. Not sure that's a good idea, but it's an example of modifying how the query operates. :order means to order the resulting objects by their age, in this case. The order of ranges in the constraint list would determine the total order. What if we want to do join queries? How about OR constraints? (state (or :MA :NY)) Ian _______________________________________________ elephant-devel site list elephant-devel@common-lisp.net http://common-lisp.net/mailman/listinfo/elephant-devel