I think this might be interesting to some of you... "Judy is a general purpose dynamic array implemented as a C callable library. Judy's speed and memory usage are typically better than other data storage models and improves with very large data sets."
http://judy.sourceforge.net/application/10minutes.htm http://judy.sourceforge.net/application/ http://sourceforge.net/projects/judy I've appended a few extracts from the "10minutes.htm" url given above. Tim. A (CPU) cache-line fill is additional time required to do a read reference from RAM when a word is not found in cache. In today's computers the time for a cache-line fill is in the range of 50..2000 machine instructions. Therefore a cache-line fill should be avoided when fewer than 50 instructions can do the same job. (Modern machines tend to pipeline writes to RAM. They often take no additional time in the Judy design.) Some of the reasons Judy outperforms binary trees, b-trees, and skip-lists: * Judy rarely compromises speed/space performance for simplicity (Judy will never be called simple except at the API). * Judy is designed to avoid cache-line fills wherever possible. (This is the main design criteria for Judy.) * A b-tree requires a search of each node (branch), resulting in more cache-line fills. * A binary-tree has many more levels (about 8X), resulting in more cache-line fills. * A skip-list is roughly equivalent to a degree-4 (4-ary) tree, resulting in more cache-line fills >From a speed point of view Judy is chiefly a 256-ary digital tree or trie (per D. Knuth Volume 3 definitions). A degree of 256-ary is a somewhat "magic" N-ary for a variety of reasons -- but mostly because a byte (the least addressable memory unit) is 8 bits. Also a higher degree means reduced cache-line fills per access. You see the theme here -- avoid cache-line fills like the plague. The presence of a CPU cache in modern machines has changed many of the ways to write a performance algorithm. To take advantage of a cache, it is important to leverage as much as possible. In a Judy tree, the presence of a cache results in 1..3 (or more) fewer cache-line fills per lookup than would be possible without a cache. Judy adapts efficiently to a wide range of populations and data set densities. Since the Judy data structure is a tree of trees, each sub-tree is a static expanse that is optimized to match the "character" or density of the keys it contains. Notice that to insert or delete a key is almost as simple as setting or clearing a bit.