Just to capture these references for posterity.

By fractal trees I think Leslie means variations on this work, which optimizes the internal node structure of B+-trees:

http://www.pittsburgh.intel-research.net/people/gibbons/papers/fpbptrees.pdf

This approach optimizes the in-memory cache performance of the linear scan through the keys of a large B-Tree node. To Robert's point, having a basic B-Tree implementation gets us a good chunk of the way there, without the extra complexity of a sophisticated node structure. The performance gains of adding that complexity could be quite interesting if we are traversing these structures in memory often enough, but I suspect that with any first-pass implementation we will find different issues specific to Lisp that suggest a different optimization strategy (e.g. typical properties of the data, overhead of data conversion, etc.). Moreover, if we are disk-dominated, this performance enhancement ends up on the wrong end of Amdahl's law.
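To make the node-structure idea concrete, here is a rough sketch (in Python for brevity, not Lisp, and with an invented cache-line size; the paper's actual layout is more involved) of replacing a flat linearly-scanned key array with cache-line-sized blocks plus a small index of block-leading keys, so a lookup touches only one block:

```python
import bisect

CACHE_LINE_KEYS = 8  # hypothetical number of keys per cache line

class FlatNode:
    """Baseline B+-tree node: one sorted key array, searched linearly."""
    def __init__(self, keys):
        self.keys = sorted(keys)

    def find_child(self, key):
        # Linear scan: touches every cache line up to the match.
        for i, k in enumerate(self.keys):
            if key < k:
                return i
        return len(self.keys)

class BlockedNode:
    """Cache-conscious node in the spirit of the fractal-prefetching
    B+-tree paper: keys are grouped into cache-line-sized blocks, and a
    small array of block-leading keys is consulted first, so only one
    block is actually scanned."""
    def __init__(self, keys):
        keys = sorted(keys)
        self.blocks = [keys[i:i + CACHE_LINE_KEYS]
                       for i in range(0, len(keys), CACHE_LINE_KEYS)]
        self.leads = [b[0] for b in self.blocks]

    def find_child(self, key):
        # Step 1: pick the block whose leading key is <= key.
        b = bisect.bisect_right(self.leads, key) - 1
        if b < 0:
            return 0
        # Step 2: scan just that one cache-line-sized block.
        for j, k in enumerate(self.blocks[b]):
            if key < k:
                return b * CACHE_LINE_KEYS + j
        return b * CACHE_LINE_KEYS + len(self.blocks[b])

# Both layouts answer every lookup identically; only memory traffic differs.
keys = list(range(0, 200, 2))
flat, blocked = FlatNode(keys), BlockedNode(keys)
assert all(flat.find_child(k) == blocked.find_child(k)
           for k in range(-1, 201))
```

The point of the sketch is only that the optimization is internal to the node: the `find_child` contract is unchanged, which is why it can be layered onto a basic B-Tree later.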

S(b)-trees improve disk read performance at the expense of complexity. The key idea is to make long-range scans more efficient by changing the disk page allocation strategy: most or all of the children of a B-Tree block are allocated in contiguous pages on disk, so you can fetch many pages at once, knowing that in the average case you are going to be reading more pages from that contiguous region anyway. Update cost is increased, but if updates are far less frequent than reads the result is a net win. This also motivates putting more data inline in the leaf nodes instead of storing only references or short key/value pairs.

Reference: http://citeseer.ist.psu.edu/cache/papers/cs/1208/http:zSzzSzwww.cs.umb.eduzSz~poneilzSzsb-tree.pdf/the-sb-tree-an.pdf
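A toy model of the allocation idea (again Python, with an assumed extent size and a deliberately crude I/O-counting model, not the paper's actual scheme): if leaves land in contiguous extents, a range scan costs one multi-page read per extent rather than one seek per page.

```python
EXTENT_PAGES = 16  # pages per contiguous on-disk extent (assumed)

def scan_random(leaf_pages):
    """Baseline allocation: each leaf page is a separate random read."""
    return len(leaf_pages)  # one I/O per page

def scan_extents(leaf_pages):
    """S(b)-style allocation: leaves sit in contiguous extents, and each
    extent is fetched with a single multi-page read."""
    extents = {page // EXTENT_PAGES for page in leaf_pages}
    return len(extents)

# A scan over 64 consecutively allocated leaf pages:
pages = list(range(1000, 1064))
print(scan_random(pages))   # 64 I/Os
print(scan_extents(pages))  # 5 I/Os (pages 1000-1063 span 5 extents of 16)
```

The asymmetry is the whole trade: updates must work harder to keep children contiguous, but every scan amortizes that cost.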

My observation in my own applications is that I spend a great deal of time waiting on disk access. Anything we can do to pay in-memory ops or computation in exchange for less average disk access time is likely to be a bigger win than reducing in-memory costs.

The nice thing about both of these hacks is that they happen under the B-Tree abstraction, so there is no need to rush to implement them.

Ian


On Mar 26, 2008, at 11:17 PM, Robert L. Read wrote:
On Wed, 2008-03-26 at 20:33 +0100, Leslie P. Polzer wrote:
I suppose this is a good opportunity for me to chime in with
a few thoughts. Aren't B+trees a choice that is too conservative
for a modern storage backend?
There seem to be more modern data structures (S(b) trees or
fractal trees) that are especially well suited for storing
variable-length keys.

Perhaps.  I would certainly hope that our design would allow such
structures to be swapped out as a "strategy" pattern.

Personally, I had never heard of those structures until you mentioned
them, and a quick search does not yield a concise description of their
advantages---or of how to implement them.

If you can briefly describe the differences, that would be great. On the other hand, a LISP-native back end using B+-trees would be a nice leap forward for us; using something better might be even better, but would not lift us into a new valence band of quality.

The fact that one can find example implementations, possibly even in
LISP, of the B+-tree, is an advantage.

My personal philosophy is gradualism: crawling is the best way to learn to walk, and having a B+-tree implementation gets us half the way to the latest data structure.


 Leslie

_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel
