Re: [elephant-devel] Query System

Ian Eslick Fri, 09 May 2008 18:55:46 -0700

Welcome back Daniel, we all know the work drill!

Here are a few thoughts to throw into the mix...

One advantage of the relational model is that you have implicit data structures (tables) that can be assembled from existing tables via the SQL query. This is nice because it means we don't have to explicitly create and maintain the structure for all these derived data structures. In a pure lisp model, you actually have to do all this maintenance yourself, especially the optimizations necessary for efficiency that add to complexity. I feel that Elephant should probably fall somewhere in-between. You maintain the data structures that you want to work with in your program logic, but the system can maintain pointers and indices and other relationships that make it easy and efficient to generate and work with subsets of objects (a user's inbox, for example).

Some of the limitations/frustrations with the current system may be caused by people trying to do familiar relational tasks in the OODB framework.

I also think that Robert's lisp-as-query-language works well for the prevalence model when all objects are in memory, but I think it's less practical in, say, BDB where you are going to disk a lot. However, it's a good discipline to consider: when does it make sense to add new syntax/APIs and when does it make sense to use lisp directly?

You mentioned associations. The best way to think about associations is that they are an easy way to maintain back pointers. For example, if a message object has a slot that contains a reference to a user, we may also want the user object to have an accessor that provides quick and efficient access to the collection of messages that point to it. That's what associations are for. You could do this by declaring after methods on (setf (user message) value) that add the message to a pset sitting in a user instance slot, but that gets tedious. As Leslie says, we're trying to make common cases simple and reasonably efficient.
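To make the hand-rolled version concrete, here is a minimal sketch of the after-method approach. It assumes plain in-memory CLOS classes in place of persistent ones and an ordinary list in place of a pset; the class and accessor names (user, message, message-user, user-messages) are illustrative, not Elephant's API.

```lisp
;; In-memory sketch: DEFCLASS stands in for a persistent class and a
;; list stands in for a pset. All names are hypothetical.
(defclass user ()
  ((name :initarg :name :reader user-name)
   ;; Back-pointer collection, maintained by the methods below.
   (messages :initform '() :accessor user-messages)))

(defclass message ()
  ((subject :initarg :subject :reader message-subject)
   ;; No :initarg on purpose: the slot is only set through the
   ;; accessor, so the back-pointer methods always run.
   (user :initform nil :accessor message-user)))

;; Before the slot changes, drop the message from its previous owner.
(defmethod (setf message-user) :before (new-user (msg message))
  (declare (ignore new-user))
  (let ((old-user (message-user msg)))
    (when old-user
      (setf (user-messages old-user)
            (remove msg (user-messages old-user))))))

;; After the slot changes, record the message on its new owner.
(defmethod (setf message-user) :after (new-user (msg message))
  (when new-user
    (pushnew msg (user-messages new-user))))
```

Every class that wants a back pointer needs its own pair of methods like these, which is the tedium an association declaration would remove.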

So the approach I'd like to see taken to designing the query framework is to capture the use cases and metaphors that people are really interested in and are encountering in real-world use and pick the largest subset that fits nicely into a clean, theoretical conceptual model. There are already a good number (Leslie, Alex, etc.) on the list that we could start with.

For example, I often find myself wanting to filter a set of objects by more than one parameter (messages from user U that are high priority between 4/1/08 and 5/1/08). What is the complexity of different approaches afforded by the existing Elephant implementation?


In increasing order of computational efficiency (I surmise):

1. scan all messages and collect/operate on only those matching all criteria
2. scan an index on messages instead of all messages; pick the one likely to yield the smallest subset
3. intersection: scan two or more indexes for subsets represented as sequences of oids, instantiate, filter and operate on the objects represented by the intersection.
4. create an index that orders objects by all three parameters and just walk the matching set. Trade off space for time.
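The four approaches can be sketched in plain Common Lisp. This is only an in-memory illustration: a struct stands in for a persistent message, hash tables stand in for Elephant's btree indexes, and a by-oid table (oid -> message) stands in for instantiating an object from its oid. All function names here are hypothetical.

```lisp
;; Hypothetical in-memory stand-ins; none of this is Elephant's API.
(defstruct msg oid user priority date)   ; date is an integer like 20080415

(defun matches-p (m user priority from to)
  (and (equal (msg-user m) user)
       (eq (msg-priority m) priority)
       (<= from (msg-date m) to)))

;; 1. Full scan: test every message against all criteria.
(defun scan-all (messages user priority from to)
  (remove-if-not (lambda (m) (matches-p m user priority from to))
                 messages))

;; A single-slot index: slot value -> list of oids.
(defun build-index (messages key)
  (let ((index (make-hash-table :test #'equal)))
    (dolist (m messages index)
      (push (msg-oid m) (gethash (funcall key m) index)))))

;; 2. Scan one index, then filter its candidates on the other criteria.
(defun scan-one-index (by-oid user-index user priority from to)
  (loop for oid in (gethash user user-index)
        for m = (gethash oid by-oid)
        when (matches-p m user priority from to)
          collect m))

;; 3. Intersect the oid sets from two indexes before looking up any
;;    object, then filter on the remaining criterion.
(defun scan-intersection (by-oid user-index priority-index
                          user priority from to)
  (loop for oid in (intersection (gethash user user-index)
                                 (gethash priority priority-index))
        for m = (gethash oid by-oid)
        when (<= from (msg-date m) to)
          collect m))

;; 4. Composite index: (user priority) -> messages sorted by date.
;;    Costs extra space, but the query only walks the matching run.
(defun build-composite-index (messages)
  (let ((index (make-hash-table :test #'equal)))
    (dolist (m messages)
      (push m (gethash (list (msg-user m) (msg-priority m)) index)))
    (maphash (lambda (key bucket)
               (setf (gethash key index)
                     (sort bucket #'< :key #'msg-date)))
             index)
    index))

(defun walk-composite (index user priority from to)
  (loop for m in (gethash (list user priority) index)
        while (<= (msg-date m) to)       ; stop once past the date range
        when (>= (msg-date m) from)
          collect m))
```

All four return the same set of messages (possibly in a different order); they differ in how many objects have to be touched to get there.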


Any others?

The other consideration is the conceptual framework we want to use to approach the problem. Procedural? Constraint satisfaction? Logical form? Graph matching? There are some good examples of existing OODB systems in lisp out there (PLOB, AllegroStore/AllegroCache, Statice, etc.). If you search the list archives, I think I've forwarded references in the past.

I tend to lean towards a constraint satisfaction approach, as my sketch demonstrates. "Operate on the set of objects that satisfy these constraints." There are a bunch of practical issues. Do we map query sets? Do we cache them? Do we represent them as lists? Are they lazily evaluated? If we don't have a DSL, but allow arbitrary lisp expressions, then there isn't enough information to automatically select indexes, perform intersections, etc.

My other strong suggestion, besides starting by capturing the major use cases, is that we begin with a procedural approach by implementing the building blocks for filter, sort, intersect, etc. If we take the list of four filtering approaches above, we can start writing code that does these things and use it to implement some of the use cases. The common building blocks and problems that we discover will inform the additions we'll want to the MOP, new implicit data structures like associations, the most convenient query syntax, etc. Plus it will be useful in the meantime. This fits into the classic lisp bottom-up DSL development model (well proselytized by Paul Graham).


Ian



On May 9, 2008, at 6:02 PM, [EMAIL PROTECTED] wrote:

Hello everyone,
I apologize for being disconnected for so long. I had volunteered to help with the query system and should have made more progress by now. Unfortunately, the same as some (most or all) of you, putting food on the table for my family has a higher priority and my current job has demanded 110% of my time lately.
Enough excuses! I have been passively reading several of your email threads. I am convinced that a query system will bring a lot of value to Elephant. The question that still arises is whether people want a SQL-like syntax or a Lisp-like syntax.
As Ian has suggested, publicly and/or privately, we should start designing the query system in a very basic form. The most critical part would be query optimization, which I'd rather work on after we have the basic query system in place. But there are a lot of decisions to make before we get there, and coming to a consensus on how it should look and how it should work is of critical importance.
From a simplistic point of view, a SQL-like syntax should allow for the execution of the basic relational algebraic operations (union, difference, cartesian product, projection, and selection). For the most part, these would not be difficult to implement. However, IMHO, there is an intrinsic "contradiction" in applying a SQL-like syntax on top of Elephant.
Assume you have the following Tables (relations) in a SQL world:

Books (
 book_id,
 title,
 author
)

Publishers (
 publisher_id,
 name
)

BooksPublishers (
 book_id,
 publisher_id,
 year
)
Suppose you wanted the cartesian product of the books and publishers, restricted to books published in 2008. You could run a SQL query like:
SELECT Books.*, Publishers.*
FROM Books, Publishers, BooksPublishers
WHERE Books.book_id = BooksPublishers.book_id
  AND Publishers.publisher_id = BooksPublishers.publisher_id
  AND BooksPublishers.year = 2008
The result will be a concatenation of all the columns from the Books and Publishers tables. In a SQL-world, you would access these results in a key-value pair type mode (e.g. Books.book_id = 1, Books.title = "1984", etc). However, when you think in terms of Elephant (at least my understanding of it), you're dealing with objects and not key-value pairs from multiple tables. So, instead of getting a concatenation of all the columns, you "should" be getting just a list of Book objects (or Publisher objects) that meet your query criteria, such that when you iterate through them, you could "query" their Publishers (or the Books). So, if we had something like (please keep in mind this is not a suggestion as to syntax or correctness but just for illustrative purposes):
(defpclass book ()
 ((title :accessor book-title :index t)
  (author :accessor book-author :index t)
  (published_copies :accessor book-copies :initform (make-pset))))

(defpclass publisher ()
 ((name :accessor publisher-name :index t)))

(defmethod add-published-copy ((bk book) (pb publisher) year)
 ;; LIST evaluates pb and year; the quoted form '(pb year) would store the symbols instead
 (insert-item (list pb year) (book-copies bk)))

(defmethod map-published-copies (fn (bk book))
 (map-pset fn (book-copies bk)))
(setq objs (select book :where ((map-published-copies (lambda (item year) (= (second item) year)) $bk 2008))))
From then on, you could just iterate through the book objects in the result set for their respective published copies. The problem with this is that, ok, you get all the books that met your criteria, but if you then wanted to get a list of all the published copies, you would need to apply the filter criteria again. The reason I think it "should behave" this way is because Elephant deals with sets of objects, and you use Lisp to navigate through the object space, whereas in a SQL-world you are not dealing with objects but with a result set that contains all the columns you asked for. If we were to emulate the same behavior in the query system, that would sort of defeat the purpose of Elephant. For that matter, you might as well use some of the other libraries (e.g. CL-SQL, cl-perec, cl-rdbms, etc).
The above is a very simple example. We haven't looked at SORTING, LIMIT, OFFSET, etc., which will only make this whole dilemma more difficult.
I haven't looked into Ian's association mechanism yet. Maybe the query system could/should be an extension to that with some specialized features to apply filter criteria instead (and possibly evolve into something similar to Ruby's ActiveRecord). I know the association mechanism is still being developed and I haven't really seen anyone comment much on it other than what Ian has mentioned. In one of Ian's comments, he said:
"A more general query language is probably the right solution for this interface. The query language would know about associations, derived indices, etc and perform query planning via introspection over the class objects."
At the same time, Robert said on another thread:
"One might philosophically prefer SQL. I personally vastly prefer to work in a powerful programming language to accomplish these things. Obviously, whether two classes that refer to each other stand in a "parent-child" relationship or not depends entirely on the circumstances. I prefer to write simple functions such as "delete-order" below, which both utilize and (in a sense) expand the power of LISP applied to persistent objects."
Leslie said on yet another thread:
"While I'm at it: OFFSET and LIMIT (a real limit which lets you specify an arbitrary Lisp expression) are things we definitely want to aim for in 1.0. They are not difficult to implement at all, but they don't work with GET-INSTANCES-BY-* and, worse, MAP-BTREE. This means everyone has to write their own version of these functions that take appropriate arguments and move the cursor around themselves instead of relying on a simple high-level API.
I'd have implemented these extensions myself, but I thought it better to wait for the integration of the query language to add it."
And Alex said:
"I think main problem is not how it looks, but that query language actually makes programming a lot easier."
All those comments make sense. There seems to be a group agreement that something is needed, but everyone has their own ideas of how it should work. Both the query language and the associations are still being developed, so if we get consensus on how these should work, it may give a better direction to both feature sets. If anyone has any comments or suggestions as to whether a query system would be of real interest/necessity and, if so, which would be the preferred query syntax and expected behavior, that would really help.
I'm willing to work on this as much as possible with my limited knowledge of Lisp and Elephant. However, given a clear direction of where this should go, I will be able to focus better and learn faster what I haven't learned so far.
Again, your feedback is much appreciated. I'm hopeful to be able to work more on this over the weekend, assuming I get some feedback from you guys.
Thanks
Daniel
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

