Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-16 Thread Ian Eslick
I should also say that the serializer has to be thread safe, so there is some non-trivial time overhead in terms of locking on each serializer call. I wish I could do primitive atomic ops or at least something like a futex in common lisp, then I could create deadlock free data structures a

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-16 Thread Ian Eslick
I don't disagree with you that it's easy to create expensive/slow programs. My only point was that when it matters, it doesn't necessarily have to be a determinative issue. The big overheads in Elephant are due to Lisp's data model and the use of the MOP. The serializer is currently desig

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-16 Thread Alex Mizrahi
IE> I doubt that using a Lisp with a modern compiler affects CPU time in IE> any interesting way; IE> I've observed that algorithm implementation with reasonable data IE> structure choices is within 10's of % points of the same algorithm in IE> C. it's possible to optimize single function, but ty

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-16 Thread Ian Eslick
I checked, Rucksack writes dirty objects to disk on each commit - so should have the same performance profile as our existing backends, of course with different constants... Ian On May 16, 2008, at 11:26 AM, Ian Eslick wrote: I just wanted to clarify a few points in this interesting dis

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-16 Thread Ian Eslick
I just wanted to clarify a few points in this interesting discussion. but if we are using high-level language like Lisp, it's much more likely that CPU overhead will be large comparing to memory latency. and as compression, even simpiest/fastest one will only add to CPU overhead, i'm very

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-16 Thread Alex Mizrahi
??>> or it will matter if people are doing lots of linear reads -- but it's ??>> hard for me to imagine such database access patterns. RLR> I think linear reads are quite common. They are, after all, why we RLR> have implemented range queries. Moreover, the act of reading all the RLR> data into

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-15 Thread Robert L. Read
On Thu, 2008-05-15 at 10:39 +0300, Alex Mizrahi wrote: > RLR> The purpose of compression is of course NOT to make the store smaller. > RLR> It is to decrease the amount of I/O that is required to access the > RLR> store. > > time to do I/O does not linearly depend on I/O amount. > > for exampl

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-15 Thread Alex Mizrahi
RLR> The purpose of compression is of course NOT to make the store smaller. RLR> It is to decrease the amount of I/O that is required to access the RLR> store. time to do I/O does not linearly depend on I/O amount. for example, HDD reads data in blocks, with each block definitely not less than

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-14 Thread Robert L. Read
On Wed, 2008-05-14 at 23:27 -0400, Ian Eslick wrote: > Actually I wouldn't do that quite yet. If we're going to be putting > things into the C world via BDB, then we should keep the serializer > that is efficient at doing that! > Well, if we're willing to have type-specific serializers, I do

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-14 Thread Ian Eslick
However, in theory someone could write a compressing serializer right now, independently of implementing a pure-lisp Btree-based backend. Definitely true, and quite welcome. For example, the serializer that Rucksack has written could be inserted in place of our existing serializer (or even se

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-14 Thread Robert L. Read
I feel compelled to respectfully disagree with Alex's views stated below. The purpose of compression is of course NOT to make the store smaller. It is to decrease the amount of I/O that is required to access the store. I think this is where most of the time is spent. I suspect Alex is correct th

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-14 Thread Alex Mizrahi
IE> I was quite wrong-headed on the performance point. The biggest time IE> cost in database operations is time spent moving data to/from disk, in IE> comparison CPU time is quite small. this depends on size of a database -- relatively small ones can be cached in RAM, and with asynchronous writ

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-14 Thread Ian Eslick
On May 14, 2008, at 10:02 AM, Robert L. Read wrote: On Wed, 2008-05-14 at 09:40 -0400, Ian Eslick wrote: If the BTree node binary data is stored in a lisp octet array, and we don't want to deserialize to compare keys, then we need to write the procedure that you can find in libberkeley-db.c in

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-14 Thread Ian Eslick
Here is the reference: http://www.paulgraham.com/carl.html I was quite wrong-headed on the performance point. The biggest time cost in database operations is time spent moving data to/from disk, in comparison CPU time is quite small. I was thinking about the internal copying overhead, wh

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-14 Thread Ian Eslick
Here is an interesting discussion of using lisp for a large, high- performance system and the reasons to use static/C data. I wasn't advocating using C at all, and in fact removing all dependency on it for the lisp-only backend. With cffi or uffi you can get direct access to memory primit

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-14 Thread Robert L. Read
On Wed, 2008-05-14 at 09:40 -0400, Ian Eslick wrote: > If the BTree node binary data is stored in a lisp octet array, and > we > don't want to deserialize to compare keys, then we need to write the > procedure that you can find in libberkeley-db.c in the src/db-bdb > file. It performs a cont

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-14 Thread Robert L. Read
I am certainly in favor of using fixed-size pages as a basic model. I think this is very common. It has the advantage of allowing page-based caching more easily. One can even contemplate things like a RAID-style striping across machines on a LAN or WAN. I think we should create a really pure-li

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-14 Thread Ian Eslick
On May 14, 2008, at 3:28 AM, Leslie P. Polzer wrote: For example, in BDB, the primitive is the page. BTree nodes are layed out in one or more pages, each page has some binary metadata explaining it's type and organization (free list, etc). A short key value is written directly into the pa

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-14 Thread Leslie P. Polzer
> For example, in BDB, the primitive is the page. BTree nodes are layed > out in one or more pages, each page has some binary metadata > explaining it's type and organization (free list, etc). A short key > value is written directly into the page, a long one is written into an > overflow page, e

Re: [elephant-devel] Lisp Btrees: design considerations

2008-05-13 Thread Ian Eslick
I think they key decision was what serialization format we're going to use for btree nodes, log entries, etc and how that relates to caching data during transactions, maintaining empty lists, etc. The current serializer produces a byte sequence. If we continue with that model, how do we wr

[elephant-devel] Lisp Btrees: design considerations

2008-05-13 Thread Leslie P. Polzer
I suppose the "binary paging" approach mentioned in the design considerations document means the problem of organizing the data efficiently on disk right from the start. Is this correct? Do you think it would make good sense to start working on the btree library without thinking much about on-dis