How often do users of Elephant rely on BTrees where the symbols are ordered according to the symbol's name?

Would it be terribly inconvenient to you if you had to convert symbol keys to strings to get an alphabetical ordering, but were still assured of contiguity of identical symbol keys in secondary btrees?

The argument is that we serialize symbols to strings all the time (slot access, etc) and this engenders a great deal of overhead. Most symbol serialization is highly redundant and can be factored out by assigning a persistent ID to each symbol as we do with persistent objects. This results in significantly less disk space (a constant vs. N*char_width bits for every slot value), reduced IO bandwidth (and increased locality), and less serialization/deserialization time.

The downside is that the C function passed to BDB which is used to compare two strings so that the BTree is ordered does not have access to the persistent table. Thus we can't order strings according to their characters, but only according to their ID which means a random order. Of course symbols will be identical to themselves and so will be grouped together in duplicate indices.

Sorting according to the characters of the symbol may be possible, but there are a number of implications that require some thinking about and I want to put this off for now.

As this is a user configurable option (my-config.sexp) and will be enabled by default I don't think there is any harm in promoting this.

Comments?

Thank you,
Ian
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

Reply via email to