I should also say that the serializer has to be thread safe, so there
is some non-trivial time overhead in terms of locking on each
serializer call. I wish I could do primitive atomic ops or at least
something like a futex in Common Lisp, then I could create deadlock-
free data structures a…
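For what it's worth, newer SBCLs do expose some of this via
sb-ext:atomic-incf and sb-ext:compare-and-swap. A minimal sketch,
SBCL-specific and untested by me:

;; ATOMIC-INCF works on defstruct slots declared of type SB-EXT:WORD
;; and returns the slot's previous value.
(defstruct counter
  (value 0 :type sb-ext:word))

(defun next-id (c)
  (1+ (sb-ext:atomic-incf (counter-value c))))

;; COMPARE-AND-SWAP supports structure slot accessors (slot type T),
;; which is enough for a Treiber-style lock-free stack push:
(defstruct tstack (head nil))

(defun tstack-push (item s)
  (loop
    (let ((old (tstack-head s)))
      (when (eq old (sb-ext:compare-and-swap (tstack-head s)
                                             old (cons item old)))
        (return item)))))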
I don't disagree with you that it's easy to create expensive/slow
programs. My only point was that when it matters, it doesn't
necessarily have to be a determinative issue.
The big overheads in Elephant are due to Lisp's data model and the use
of the MOP. The serializer is currently desig…
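To see where the MOP cost comes from: with a custom metaclass, every
slot read can go through a generic function. An illustration using
closer-mop; this is not Elephant's actual persistent-class machinery,
just the shape of the mechanism:

(defvar *slot-reads* 0)

;; A metaclass whose slot reads dispatch through
;; SLOT-VALUE-USING-CLASS, the way a persistent class's do.
(defclass counted-class (standard-class) ())

(defmethod closer-mop:validate-superclass
    ((c counted-class) (s standard-class))
  t)

(defmethod closer-mop:slot-value-using-class
    ((class counted-class) object slotd)
  ;; a persistent store would fetch the slot's value from disk here
  (incf *slot-reads*)
  (call-next-method))

;; (defclass point () ((x :initform 0)) (:metaclass counted-class))
;; Every slot read on a POINT now runs the method above, a full
;; generic dispatch instead of an inlined access.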
IE> I doubt that using a Lisp with a modern compiler affects CPU time in
IE> any interesting way;
IE> I've observed that an algorithm implementation with reasonable data
IE> structure choices is within tens of percent of the same algorithm in
IE> C.
it's possible to optimize a single function, but ty…
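For reference, single-function optimization in CL is just standard
declarations, e.g. the following compiles to a close-to-C inner loop
on SBCL or CMUCL; nothing project-specific here:

(defun sum-octets (buf)
  ;; full type information plus (SPEED 3) lets the compiler emit
  ;; unboxed fixnum arithmetic for the loop body
  (declare (type (simple-array (unsigned-byte 8) (*)) buf)
           (optimize (speed 3) (safety 0)))
  (let ((sum 0))
    (declare (type fixnum sum))
    (dotimes (i (length buf) sum)
      (incf sum (aref buf i)))))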
I checked: Rucksack writes dirty objects to disk on each commit, so it
should have the same performance profile as our existing backends, of
course with different constants...
Ian
On May 16, 2008, at 11:26 AM, Ian Eslick wrote:
I just wanted to clarify a few points in this interesting discussion.
I just wanted to clarify a few points in this interesting discussion.
but if we are using a high-level language like Lisp, it's much more
likely that CPU overhead will be large compared to memory latency.
and as for compression, even the simplest/fastest one will only add to
CPU overhead, I'm very…
??>> or it will matter if people are doing lots of linear reads -- but it's
??>> hard for me to imagine such database access patterns.
RLR> I think linear reads are quite common. They are, after all, why we
RLR> have implemented range queries. Moreover, the act of reading all the
RLR> data into…
On Thu, 2008-05-15 at 10:39 +0300, Alex Mizrahi wrote:
> RLR> The purpose of compression is of course NOT to make the store smaller.
> RLR> It is to decrease the amount of I/O that is required to access the
> RLR> store.
>
> time to do I/O does not depend linearly on the amount of I/O.
>
> for exampl…
RLR> The purpose of compression is of course NOT to make the store smaller.
RLR> It is to decrease the amount of I/O that is required to access the
RLR> store.
time to do I/O does not depend linearly on the amount of I/O.
for example, an HDD reads data in blocks, with each block definitely
not less than…
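the point being that you pay per block touched, not per byte read. A
toy calculation, with a 4096-byte block assumed purely for
illustration:

;; how many blocks a read of SIZE bytes starting at OFFSET touches
(defun blocks-touched (offset size &optional (block-size 4096))
  (1+ (- (floor (+ offset size -1) block-size)
         (floor offset block-size))))

;; (blocks-touched 0 100)     => 1 ; 100 bytes still cost a full block
;; (blocks-touched 4090 100)  => 2 ; straddling a boundary costs two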
On Wed, 2008-05-14 at 23:27 -0400, Ian Eslick wrote:
> Actually I wouldn't do that quite yet. If we're going to be putting
> things into the C world via BDB, then we should keep the serializer
> that is efficient at doing that!
>
Well, if we're willing to have type-specific serializers, I do…
However, in theory someone could write a compressing serializer right
now, independently of implementing a pure-lisp Btree-based backend.
Definitely true, and quite welcome.
For example, the serializer written for Rucksack could be inserted
in place of our existing serializer (or even se…
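Something like the following, say. serialize-to-octets and
deserialize-from-octets are stand-ins for whatever the real entry
points turn out to be; salza2 and chipz are existing libraries that
would do the zlib work:

(defun serialize-compressed (obj)
  ;; compress the serializer's octet output with zlib
  (salza2:compress-data (serialize-to-octets obj)
                        'salza2:zlib-compressor))

(defun deserialize-compressed (octets)
  ;; CHIPZ:DECOMPRESS with NIL output returns a fresh octet vector
  (deserialize-from-octets
   (chipz:decompress nil 'chipz:zlib octets)))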
I feel compelled to respectfully disagree with Alex's views stated
below.
The purpose of compression is of course NOT to make the store smaller.
It is to decrease the amount of I/O that is required to access the
store. I think this is where most of the time is spent. I suspect Alex
is correct th…
IE> I was quite wrong-headed on the performance point. The biggest time
IE> cost in database operations is time spent moving data to/from disk;
IE> in comparison, CPU time is quite small.
this depends on the size of the database -- relatively small ones can
be cached in RAM, and with asynchronous writ…
On May 14, 2008, at 10:02 AM, Robert L. Read wrote:
On Wed, 2008-05-14 at 09:40 -0400, Ian Eslick wrote:
If the BTree node binary data is stored in a lisp octet array, and we
don't want to deserialize to compare keys, then we need to write the
procedure that you can find in libberkeley-db.c in the src/db-bdb
file…
Here is the reference: http://www.paulgraham.com/carl.html
I was quite wrong-headed on the performance point. The biggest time
cost in database operations is time spent moving data to/from disk; in
comparison, CPU time is quite small. I was thinking about the internal
copying overhead, wh…
Here is an interesting discussion of using lisp for a large, high-
performance system and the reasons to use static/C data.
I wasn't advocating using C at all; in fact, I'd remove all dependency
on it for the lisp-only backend. With cffi or uffi you can
get direct access to memory primit…
On Wed, 2008-05-14 at 09:40 -0400, Ian Eslick wrote:
> If the BTree node binary data is stored in a lisp octet array, and we
> don't want to deserialize to compare keys, then we need to write the
> procedure that you can find in libberkeley-db.c in the src/db-bdb
> file. It performs a cont…
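In Lisp I'd guess that comes out roughly like the sketch below: a
plain lexicographic octet comparison, which ignores the type-tag
ordering the real comparator in libberkeley-db.c has to respect:

(defun compare-octets (a a-len b b-len)
  "Negative, zero, or positive, like memcmp."
  (declare (type (simple-array (unsigned-byte 8) (*)) a b)
           (type fixnum a-len b-len))
  (dotimes (i (min a-len b-len) (- a-len b-len))
    (let ((d (- (aref a i) (aref b i))))
      (unless (zerop d) (return d)))))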
I am certainly in favor of using fixed-size pages as a basic model. I
think this is very common. It has the advantage of allowing page-based
caching more easily. One can even contemplate things like RAID-style
striping across machines on a LAN or WAN.
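For concreteness, a sketch of the page primitive; every name here and
the 4K size are hypothetical:

(defconstant +page-size+ 4096)

(defstruct page
  (number 0 :type fixnum)
  (dirty-p nil)
  (bytes (make-array +page-size+ :element-type '(unsigned-byte 8))))

(defvar *page-cache* (make-hash-table))  ; page number -> page

(defun get-page (stream n)
  "Return page N, reading from STREAM (an octet stream) on a miss."
  (or (gethash n *page-cache*)
      (let ((p (make-page :number n)))
        (file-position stream (* n +page-size+))
        (read-sequence (page-bytes p) stream)
        (setf (gethash n *page-cache*) p))))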
I think we should create a really pure-li…
On May 14, 2008, at 3:28 AM, Leslie P. Polzer wrote:
For example, in BDB, the primitive is the page. BTree nodes are laid
out in one or more pages, each page has some binary metadata
explaining its type and organization (free list, etc). A short key
value is written directly into the pa…
> For example, in BDB, the primitive is the page. BTree nodes are laid
> out in one or more pages, each page has some binary metadata
> explaining its type and organization (free list, etc). A short key
> value is written directly into the page, a long one is written into an
> overflow page, e…
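The inline-vs-overflow split is simple enough to sketch; the threshold
below is made up:

;; keys longer than this go to an overflow page (value invented here)
(defconstant +max-inline-key+ 255)

(defun key-placement (key)
  "Decide where a serialized KEY, an octet vector, should live."
  (if (<= (length key) +max-inline-key+)
      :inline      ; written directly into the btree page
      :overflow))  ; the page holds only a pointer to an overflow page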
I think the key decision is what serialization format we're going to
use for btree nodes, log entries, etc., and how that relates to caching
data during transactions, maintaining free lists, etc.
The current serializer produces a byte sequence. If we continue with
that model, how do we wr…
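One way to keep that model is to treat the serializer's output as
opaque and split it across fixed-size pages at write time. A
hypothetical sketch:

;; split OCTETS into (page-number . chunk) pairs of at most PAGE-SIZE
(defun paginate (octets &key (page-size 4096))
  (loop for start from 0 below (length octets) by page-size
        for n from 0
        collect (cons n (subseq octets start
                                (min (length octets)
                                     (+ start page-size))))))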
I suppose the "binary paging" approach mentioned in the design considerations
document refers to the problem of organizing the data efficiently on disk
right from the start. Is this correct?
Do you think it would make good sense to start working on the btree library
without thinking much about on-dis…