My fellow Elephants,

Unstable isn't unstable anymore! All BDB tests, including migration, for BDB/Mac/Allegro and BDB/Mac/SBCL are green as of today's check-in.

All major new features are implemented, including:
- Instance map, class schema evolution and MOP compliance
- New slot types
  - Cached read, write-through slots
  - Hierarchical indexed slots
  - Virtual, hierarchical derived indices
  - Set-valued slots
  - Many-to-1 and many-to-many slot associations
- Trivial query interface example (query.lisp)
- Migration and upgrade
- Partial test suite (basic association, indexing, migration, basic schema-evolution)

There are definitely holes in the test suite that need to be plugged and I'm sure that this will uncover bugs, particularly in the schema evolution, upgrade or association infrastructure. The steps needed to prepare this branch for the next release are:

- Integrate patches from the main repository
(Leslie's patch is the only one that I haven't already integrated into unstable, I think)

- Evaluate multi-threading issues for schema evolution
(only one thread should be able to manipulate class objects at a time)

- Upgrade Postmodern and CLSQL data stores
  - Support btrees with duplicate keys
  - Some minor API additions for upgrade & bootstrapping

- Testing
  - Expand testing for schema evolution (most complex/subtle bugs were there)
  - Validate upgrade procedure 0.9.1 -> 0.9.2
  - Verify referential integrity (delete object, what happens to stale refs?)
  - Standard tests for new features

- Documentation of new features

I am tied up with work for the next two weeks. I'm happy to support bug fixes, Lisp compatibility issues, etc., but progress for the remainder of March and early April will only happen if others step in to help.

Robert and I hope to integrate this work into another 0.9.x release in late April. I think this new functionality makes Elephant sufficiently feature-rich and robust that, after some burn-in time, we should consider packaging it into a 1.0 release that we can commit to supporting over the longer term. We can then have a 1.1 development branch in which to add major new features, such as an all-Lisp data store or a query compiler, as longer-term projects.

There are a few features that could use attention and that may, but need not, make it into the upcoming release:

- Online GC strategy

Now that we have an oid table that maintains information for each object and is used to deserialize references, we can implement facilities such as forwarding pointers, counts or marks that make it possible to build an online GC for the persistent heap without excessive performance cost or code impact.
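To make the mark-based option a bit more concrete, here is a minimal Python sketch of an incremental mark-and-sweep pass over an in-memory stand-in for the oid table. Everything in it (`OidEntry`, `IncrementalCollector`, a per-oid `refs` list) is a hypothetical model, not Elephant's actual API. It also ignores mutation during marking; a real online collector would need a write barrier for that.

```python
from dataclasses import dataclass, field


@dataclass
class OidEntry:
    # Hypothetical oid-table record: oids this object references, plus a GC mark bit.
    refs: list[int] = field(default_factory=list)
    marked: bool = False


class IncrementalCollector:
    """Toy incremental mark-and-sweep over an oid table (illustrative model only)."""

    def __init__(self, table: dict[int, OidEntry], roots: list[int]):
        self.table = table
        self.roots = roots
        self.worklist: list[int] = []

    def start(self) -> None:
        # Clear all marks and seed the worklist with the root oids.
        for entry in self.table.values():
            entry.marked = False
        self.worklist = [oid for oid in self.roots if oid in self.table]

    def step(self, budget: int) -> bool:
        """Mark up to `budget` objects; return True once marking is complete."""
        while self.worklist and budget > 0:
            oid = self.worklist.pop()
            entry = self.table.get(oid)
            # Skip stale refs to deleted oids and objects already marked.
            if entry is None or entry.marked:
                continue
            entry.marked = True
            budget -= 1
            self.worklist.extend(entry.refs)
        return not self.worklist

    def sweep(self) -> list[int]:
        # Drop every unmarked entry and report which oids were reclaimed.
        dead = [oid for oid, entry in self.table.items() if not entry.marked]
        for oid in dead:
            del self.table[oid]
        return dead
```

Bounding the work done per `step` is what would let marking interleave with normal transactions instead of stopping the world.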

- Query language/interpreter

Daniel Salama is thinking about the query syntax and is motivated to help implement something there. I'd be psyched to see an interpreter that extends my sketch to take good advantage of indices and associations.

- System-level schema evolution

Robert is thinking through some system-level schema versioning and evolution ideas akin to the PostgreSQL notion of schemas, but neither of us has the bandwidth to implement this right now. The basic idea is to group a set of class schemas into a version set and to use these version tags to dispatch a generic function that can override the default transformation of an instance from one schema version to the next. This would allow you to connect to an old DB with new code, call a global upgrade function, and have everything converted in one go.

This would be an independent application layer, so it would not affect the upcoming release either way.
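A rough sketch of the version-set dispatch idea, in Python rather than Lisp, assuming instances can be modeled as dicts of slot values carrying an integer version tag. The `(class, version)` registry stands in for a generic function specialized on version tags; `register_transform`, `upgrade_instance`, `upgrade_all` and the `person` example are hypothetical names, not existing Elephant functions.

```python
from typing import Callable

# Hypothetical model: an instance is a dict of slot values plus a "version" tag.
Instance = dict
Transform = Callable[[Instance], Instance]

# Overrides keyed by (class name, from-version), standing in for a generic
# function specialized on version tags.
_TRANSFORMS: dict[tuple[str, int], Transform] = {}


def register_transform(cls: str, from_version: int):
    def decorator(fn: Transform) -> Transform:
        _TRANSFORMS[(cls, from_version)] = fn
        return fn
    return decorator


def default_transform(inst: Instance) -> Instance:
    # Default: carry slot values forward unchanged.
    return dict(inst)


def upgrade_instance(cls: str, inst: Instance, target: int) -> Instance:
    """Apply one transformation per version step until `target` is reached."""
    while inst["version"] < target:
        fn = _TRANSFORMS.get((cls, inst["version"]), default_transform)
        new = fn(inst)
        new["version"] = inst["version"] + 1
        inst = new
    return inst


def upgrade_all(db: dict[str, list[Instance]], target: int) -> None:
    # The "global upgrade function": convert every stored instance in one pass.
    for cls, instances in db.items():
        db[cls] = [upgrade_instance(cls, inst, target) for inst in instances]


# Example override (hypothetical): version 1 stored a single "name" slot,
# version 2 splits it into "first" and "last".
@register_transform("person", 1)
def split_name(inst: Instance) -> Instance:
    first, last = inst["name"].split(" ", 1)
    out = {k: v for k, v in inst.items() if k != "name"}
    out.update(first=first, last=last)
    return out
```

Each override only has to handle a single version step; steps without an override fall back to the default copy, and `upgrade_all` chains the steps until every instance reaches the target version.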

Regards,
Ian


PS - I did some profiling of the unstable branch on BDB/Mac to see what effects different query strategies might have. I thought some of you would be interested in this. The numbers are preliminary and not well controlled, but the order of magnitude should be about right.

The objects described below are 5-slot objects with a mix of indexed, cached, transient, etc.

Persistent object creation: 3000 objects per second
Persistent object reference deserialization w/ object instantiation: 10k per second
Persistent object reference deserialization of oids only: 40k per second

This last number would be the key factor in handling queries over large object databases. Since we can instantiate an object from its oid alone, we only need to instantiate the objects we actually use. This should make operations like counting and paging fairly efficient for moderately sized databases. Indexing, of course, will have a significant impact on query performance by reducing the number of OIDs that have to be manipulated.
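To illustrate why oid-only deserialization matters for counts and paging, here is a minimal Python sketch under assumed names: `OidIndex` is a hypothetical sorted (key, oid) index standing in for a btree, and `load` stands in for whatever instantiates a persistent object from its oid. Counting touches only oids; paging instantiates just the objects on the requested page.

```python
import bisect
from typing import Callable, Iterable


class OidIndex:
    """Toy sorted secondary index of (key, oid) pairs, standing in for a btree."""

    def __init__(self, pairs: Iterable[tuple[int, int]]):
        self.pairs = sorted(pairs)

    def range_oids(self, lo: int, hi: int) -> list[int]:
        # Collect oids whose key lies in [lo, hi) without instantiating any object.
        # (lo,) sorts before every (lo, oid) pair, so bisect_left finds the first key >= lo.
        start = bisect.bisect_left(self.pairs, (lo,))
        end = bisect.bisect_left(self.pairs, (hi,))
        return [oid for _, oid in self.pairs[start:end]]


def count(oids: list[int]) -> int:
    # Counting needs only the oids, never the instances.
    return len(oids)


def page(oids: list[int], page_no: int, page_size: int,
         load: Callable[[int], object]) -> list[object]:
    # Instantiate only the objects that fall on the requested page.
    chunk = oids[page_no * page_size:(page_no + 1) * page_size]
    return [load(oid) for oid in chunk]
```

The index narrows the candidate oids cheaply, and the relatively expensive instantiation step is paid only for the handful of objects a caller actually looks at.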






_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel
