Welcome back Daniel, we all know the work drill!
Here are a few thoughts to throw into the mix...
One advantage of the relational model is that you have implicit data
structures (tables) that can be assembled from existing tables via the
SQL query. This is nice because it means we don't have to explicitly
create and maintain the structure for all these derived data
structures. In a pure lisp model, you actually have to do all this
maintenance yourself, especially the optimizations necessary for
efficiency that add to complexity. I feel that Elephant should
probably fall somewhere in-between. You maintain the data structures
that you want to work with in your program logic, but the system can
maintain pointers and indices and other relationships that make it
easy and efficient to generate and work with subsets of objects
efficiently (a user's inbox, for example).
Some of the limitations/frustrations with the current system may be
caused by people trying to do familiar relational tasks in the OODB
framework.
I also think that Robert's lisp-as-query-language works well for the
prevalence model when all objects are in memory, but I think it's less
practical in, say, BDB where you are going to disk alot. However,
it's a good discipline to consider - when does it makes sense to add
new syntax/apis and when does it make sense to use lisp directly.
You mentioned associations. The best way to think about associations
is that it is an easy way to maintain back pointers. For example, if
a message object has a slot that contains a reference to a user, we
may also want the user object to have an accessor that provides quick
and efficient access to the collection of messages that point to it.
That's what associations are for. You could do this by declaring
after methods on (setf (user message) value) that add the message to a
pset sitting in a user instance slot, but that gets tedious. As
Leslie says, we're trying to make common cases simple and reasonably
efficient.
So the approach I'd like to see taken to designing the query framework
is to capture the use cases and metaphors that people are really
interested in and are encountering in real-world use and pick the
largest subset that fits nicely into a clean, theoretical conceptual
model. There are already a good number (Leslie, Alex, etc) on the
list that we could start with.
For example, I often find myself wanting to filter a set of objects by
more than one parameter (messages from user U that are high priority
between 4/1/08 and 5/1/08). What is the complexity of different
approaches afforded by the existing Elephant implementation?
In order of computational efficiency (I surmise):
1. scan all messages and collect/operate on only those matching all
criteria
2. scan an index on messages instead of all messages; pick the one
likely to yield the smallest subset
3. intersection: scan two or more indexes for subsets represented as
sequences of oids, instantiate, filter and operate on the objects
represented by the intersection.
4. create an index that orders objects by all three parameters and
just walk the matching set. Trade off space for time.
Any others?
The other consideration is the conceptual framework we want to use to
approach the problem. Procedural? Constraint satisfaction? Logical
form? Graph matching? There are some good examples of existing OODB
systems in lisp out there (PLOB, AllegroStore/AllegroCache, Statice,
etc). If you search the list archives, I think I've forwarded
references in the past.
I tend to lean towards a constraint satisfaction approach, as my
sketch demonstrations. "Operate on the set of objects that satisfy
these constraints." There are a bunch of practical issues. Do we map
query sets? Do we cache them? Do we represent them as lists? Are
they lazily evaluated? If we don't have a DSL, but allow arbitrary
lisp expressions, then there isn't enough information to automatically
select indexes, perform intersections, etc.
My other strong suggestion, besides starting by capturing the major
use cases, is that we begin by implementing a procedural approach by
implementing the building blocks for filter, sort, intersect, etc. If
we take the list of four filtering approaches above, we can start
writing code that do these things and use them to implement some of
the use cases. The common building blocks and problems that we
discover will inform the additions we'll want to the MOP, new implicit
data structures like associations, the most convenient query syntax,
etc. Plus it will be useful in the meantime. This fits into the
classic lisp bottom-up DSL development model (well proselytized by
Paul. Graham).
Ian
On May 9, 2008, at 6:02 PM, [EMAIL PROTECTED] wrote:
Hello everyone,
I apologize for being disconnected for so long. I had volunteered to
help in the query system and should have done more progress by now.
Unfortunately, the same as some (most or all) of you, putting food
on the table for my family has a higher priority and my current job
has demanded 110% of my time lately.
Enough excuses! I have been passively reading several of your email
threads. I am convinced that a query system will bring a lot of
value to Elephant. The question that still arises is whether or not
people want a SQL-like syntax or a Lisp-like syntax.
As Ian has suggested, publicly and/or privately, we should start
designing the query system in a very basic form. The most critical
part would be query optimization, which I'd rather work on after we
have the basic query system in place. But there are a lot of
decisions to make before we get there and coming to a consensus of
how it should look and how it should work is of critical importance.
From a simplistic point of view, a SQL-like syntax should allow for
the execution of the basic relational algebraic operations (union,
difference, cartesian product, projection, and selection). For the
most part, these would not be difficult to implement. However, IMHO,
there is an intrinsic "contradiction" in applying a SQL-like syntax
on top of Elephant.
Assume you have the following Tables (relations) in a SQL world:
Books (
book_id,
title,
author
)
Publishers (
publisher_id,
name
)
BooksPublishers (
book_id,
publisher_id,
year
)
Suppose you wanted to get the cartesian product of all the books
published in 2008, you could run a SQL query like:
SELECT Books.*, Publishers.* FROM Books, Publishers, BooksPublishers
WHERE Books.book_id = BooksPublishers.book_id AND
Publishers.publisher_id = BooksPublishers.publisher_id AND
BooksPublishers.year = 2008
The result will be a concatenation of all the columns from the Books
and Publishers tables. In a SQL-world, you would access these
results in a key-value pair type mode (e.g. Books.book_id = 1,
Books.title = "1984", etc). However, when you think in terms of
Elephant (at least my understanding of it), you're dealing with
objects and not key-value pairs from multiple tables. So, instead of
getting a concatenation of all the columns, you "should" be getting
just a list of Book objects (or Publisher objects) that met your
query criteria, such that when you iterate thru them, you could
"query" their Publishers (or the Books). So, if we had something
like (please keep in mind this is no suggestion to syntax or
correctness but just for illustrative purposes):
(defpclass book ()
((title :accessor book-title :index t)
(author :accessor book-author :index t)
(published_copies :accessor book-copies :initform (make-pset))))
(defpclass publisher ()
((name :accessor publisher-name :index t)))
(defmethod add-published-copy ((bk book) (pb publisher) year)
(insert-item '(pb year) (book-copies bk)))
(defmethod map-published-copies (fn (bk book))
(map-pset fn (book-copies bk)))
(setq objs (select book :where ((map-published-copies (lambda (item
year) (= (second item) year)) $bk 2008)))))
From then on, you could just iterate through the book objects in the
result set for their respective published copies. The problem with
this is that, ok, you get all the books that met your criteria but
if you then wanted to get a list of all the published copies, you
would need to apply the filter criteria again. The reason I think it
"should behave" this way is because Elephant deals with sets of
objects, and you use Lisp to navigate through the object space,
whereas in a SQL-world you are not dealing with objects but with a
result set that contains all the columns you asked for. If we were
to emulate the same behavior in the query system, that would sort of
defeat the purpose of Elephant. For that matter, you might as well
use some of the other libraries (e.g. CL-SQL, cl-perec, cl-rdbms,
etc).
The above example is a very simple example. We haven't looked at
SORTING, LIMIT, OFFSET, etc. Things which will simply make this
whole dilemma more difficult.
I haven't looked into Ian's association mechanism yet. Maybe the
query system could/should be an extension to that with some
specialized features to apply filter criteria instead (and possibly
evolve into something similar to Ruby's ActiveRecord). I know the
association mechanism is still being developed and I haven't really
seen anyone comment much on it other than what Ian has mentioned. In
one of Ian's comments, he said:
"A more general query language is probably the right solution
for this interface. The query language would know about
associations, derived indices, etc and perform query planning via
introspection over the class objects."
At the same time, Robert said on another thread:
"One might philosophically prefer SQL. I personally vaster
prefer to work in a powerful programming language to accomplish
these things. Obviously, whether two classes that refer to each
other stand in a "parent-child" relationship or not depends entirely
on the circumstances. I prefer to write simple functions such as
"delete-order" below, which both utilize and (in a sense) expand the
power of LISP applied to persistent objects."
Leslie said on yet another thread:
"While I'm at it: OFFSET and LIMIT (a real limit which lets you
specify an arbitrary Lisp expression) are things we definitely want
to aim for in 1.0. They are not difficult to implement at all, but
they don't work with GET-INSTANCES-BY-* and, worse, MAP-BTREE. This
means everyone has to write their own version of these functions
that take appropriate arguments and move the cursor around
themselves instead of relying on a simple high-level API.
I'd have implemented these extensions myself, but I thought it
better to wait for the integration of the query language to add it."
And Alex said:
"I think main problem is not how it looks, but that query
language actually makes programming a lot easier."
All those comments make sense. There seems to be a group agreement
that something is needed, but everyone has their own ideas of how it
should work. Both the query language and the associations are still
being developed, so if we get consensus no how these should work, it
may give a better direction to both feature sets. If anyone has any
comments or suggestion as to whether a query system be of real
interest/necessity and if so, which would be the preferred query
syntax and expected behavior, that would really help.
I'm willing to work on this in as much as possible with my limited
knowledge of Lisp and Elephant. However, given a clear direction of
where this should go, I will be able to focus better and learn
faster what I haven't learned so far.
Again, your feedback is much appreciated. I'm hopeful to be able to
work more on this over the weekend, assuming I get some feedback
from you guys.
Thanks
Daniel
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel