On Thursday, 13 June 2024, at 6:08 AM, ori wrote:
> Sounds fairly interesting, though I'm curious how it compares;
my guess was that because of the lack of locality due to using
hashes for the score, a trie wouldn't be that different from a
hash table.
You are right. Lack of locality is a main issue. 

But, as Noam Preil has shown in his workshop presentation, it is nearly 
impossible to work on the current implementation. Example: index.c has 19202 
bytes, whereas trie.c has 3232, doing roughly the same work.

The main drawback of the current venti is the disk based index. A fair 
comparison will have to put this on a ramdisk. The index section of my data set 
now is 5GB, but 2GB would be surely sufficient. When I have my implemention 
ready to work (this is a Herculian task, but I intend just to disable index 
disk writing without changing the rest), I shall publish the performance data.

One idea to improve locality is avoiding  the storage of the full score in the 
trie nodes. Only the nodes referring to the first 6 nibbles tend to be densely 
populated. The leaves  need not really to contain the score, as the index addr 
stored there also points to a place where the score is found in the the arena 
partition. This should not really be detrimental for performance as the block 
is needed anyway.


------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/T21878aa53884911b-Mf21f7a13249d0b37cde75c53
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

Reply via email to