Just to capture these references for posterity.

By fractal trees I think Leslie means variations on this work, which optimizes the internal node structure of B+-trees:

http://www.pittsburgh.intel-research.net/people/gibbons/papers/fpbptrees.pdf

This approach optimizes the in-memory cache performance of the linear scan through the keys of a large B-Tree node. To Robert's point, having a basic B-Tree implementation gets us a good chunk of the way there, without the extra complexity of a sophisticated node structure. The performance gains of adding that complexity could be quite interesting if we are traversing these structures in memory often enough, but I suspect that with any first-pass implementation we will find different issues specific to Lisp that suggest a different optimization strategy (e.g. typical properties of the data, overhead of data conversion, etc.). Moreover, if we are disk-dominated, this performance enhancement ends up on the wrong end of Amdahl's law.
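To make the node-structure idea concrete, here is a rough sketch (in Python for brevity, not Lisp, and with an invented cache-line size; the paper's actual layout is more involved) of replacing a flat linearly-scanned key array with cache-line-sized blocks plus a small index of block-leading keys, so a lookup touches only one block:

```python
import bisect

CACHE_LINE_KEYS = 8  # hypothetical number of keys per cache line

class FlatNode:
    """Baseline B+-tree node: one sorted key array, searched linearly."""
    def __init__(self, keys):
        self.keys = sorted(keys)

    def find_child(self, key):
        # Linear scan: touches every cache line up to the match.
        for i, k in enumerate(self.keys):
            if key < k:
                return i
        return len(self.keys)

class BlockedNode:
    """Cache-conscious node in the spirit of the fractal-prefetching
    B+-tree paper: keys are grouped into cache-line-sized blocks, and a
    small array of block-leading keys is consulted first, so only one
    block is actually scanned."""
    def __init__(self, keys):
        keys = sorted(keys)
        self.blocks = [keys[i:i + CACHE_LINE_KEYS]
                       for i in range(0, len(keys), CACHE_LINE_KEYS)]
        self.leads = [b[0] for b in self.blocks]

    def find_child(self, key):
        # Step 1: pick the block whose leading key is <= key.
        b = bisect.bisect_right(self.leads, key) - 1
        if b < 0:
            return 0
        # Step 2: scan just that one cache-line-sized block.
        for j, k in enumerate(self.blocks[b]):
            if key < k:
                return b * CACHE_LINE_KEYS + j
        return b * CACHE_LINE_KEYS + len(self.blocks[b])

# Both layouts answer every lookup identically; only memory traffic differs.
keys = list(range(0, 200, 2))
flat, blocked = FlatNode(keys), BlockedNode(keys)
assert all(flat.find_child(k) == blocked.find_child(k)
           for k in range(-1, 201))
```

The point of the sketch is only that the optimization is internal to the node: the `find_child` contract is unchanged, which is why it can be layered onto a basic B-Tree later.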

S(b)-trees improve disk read performance at the expense of complexity. The key idea is to make long-range scans more efficient by changing the disk page allocation strategy: most or all of the children of a B-Tree block are allocated in contiguous pages on disk, so you can fetch many pages at once, knowing that in the average case you are going to be reading more pages from that contiguous region anyway. Update cost is increased, but if updates are far less frequent than reads the result is a net win. This also motivates putting more data inline in the leaf nodes instead of storing only references or short key/value pairs.

Reference: http://citeseer.ist.psu.edu/cache/papers/cs/1208/http:zSzzSzwww.cs.umb.eduzSz~poneilzSzsb-tree.pdf/the-sb-tree-an.pdf
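A toy model of the allocation idea (again Python, with an assumed extent size and a deliberately crude I/O-counting model, not the paper's actual scheme): if leaves land in contiguous extents, a range scan costs one multi-page read per extent rather than one seek per page.

```python
EXTENT_PAGES = 16  # pages per contiguous on-disk extent (assumed)

def scan_random(leaf_pages):
    """Baseline allocation: each leaf page is a separate random read."""
    return len(leaf_pages)  # one I/O per page

def scan_extents(leaf_pages):
    """S(b)-style allocation: leaves sit in contiguous extents, and each
    extent is fetched with a single multi-page read."""
    extents = {page // EXTENT_PAGES for page in leaf_pages}
    return len(extents)

# A scan over 64 consecutively allocated leaf pages:
pages = list(range(1000, 1064))
print(scan_random(pages))   # 64 I/Os
print(scan_extents(pages))  # 5 I/Os (pages 1000-1063 span 5 extents of 16)
```

The asymmetry is the whole trade: updates must work harder to keep children contiguous, but every scan amortizes that cost.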

My observation in my own applications is that I spend a great deal of time waiting on disk access. Anything we can do to pay in-memory ops or computation in exchange for less average disk access time is likely to be a bigger win than reducing in-memory costs.

The nice thing about both of these hacks is that they happen under the B-Tree abstraction, so there is no need to rush to implement them.

Ian


On Mar 26, 2008, at 11:17 PM, Robert L. Read wrote:
On Wed, 2008-03-26 at 20:33 +0100, Leslie P. Polzer wrote:
I suppose this is a good opportunity for me to chime in with
a few thoughts. Aren't B+trees a choice that is too conservative
for a modern storage backend?
There seem to be more modern data structures (S(b) trees or
fractal trees) that are especially well suited for storing
variable-length keys.

Perhaps.  I would certainly hope that our design would allow such
structures to be swapped out as a "strategy" pattern.

Personally, I had never heard of those structures until you mentioned
them, and a quick search does not yield a concise description of their
advantages---or of how to implement them.

If you can briefly describe the differences, that would be great. On the other hand, a LISP-native back end using B+-trees would be a nice leap forward for us; using something better might be even better, but would not lift us into a new valence band of quality.

The fact that one can find example implementations, possibly even in
LISP, of the B+-tree, is an advantage.

My personal philosophy is gradualism: crawling is the best way to learn to walk, and having a B+-tree implementation gets us half the way to the latest data structure.


 Leslie

_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel
