I'm still thinking about how to handle range queries on very large data
sets when using random partitioning.

Has anyone used tree search to solve this? What do you think?

More specifically, something like this:

- Store a maximum of 1000 values per supercolumn (or some other fixed
number)
- Each supercolumn has a "greaterChild" and a "lessChild" in addition to the
values
- When the number of values in the supercolumn grows beyond the maximum,
split it into 3 parts: the top third goes into "greaterChild", the bottom
third goes into "lessChild", and the middle third stays where it is
- To find a value, compare your key with "firstVal" and "lastVal" to find
out whether it is within the current range, and if not, follow "lessChild"
or "greaterChild" to look next
- A range search finds the first value, then follows "greaterChild" or
"lessChild" (depending on the direction of the search) until it reaches the
end of the range.
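
To make the idea concrete, here is a rough in-memory sketch of the steps
above in Python. It is not Cassandra code: the Node class and its method
names are made up for illustration, each Node stands in for one supercolumn,
values are assumed to be distinct, and the range search is written as an
in-order walk, which is only one way to read the last bullet.

```python
import bisect

MAX_VALUES = 1000  # per-node cap from the post; any fixed number >= 2 works


class Node:
    """In-memory stand-in for one supercolumn (hypothetical, not a Cassandra API)."""

    def __init__(self, max_values=MAX_VALUES):
        assert max_values >= 2, "a split needs at least three values"
        self.max_values = max_values
        self.values = []     # sorted, distinct values held by this node
        self.less = None     # "lessChild": everything below values[0]
        self.greater = None  # "greaterChild": everything above values[-1]

    def insert(self, value):
        if self.values and value < self.values[0] and self.less is not None:
            self.less.insert(value)
        elif self.values and value > self.values[-1] and self.greater is not None:
            self.greater.insert(value)
        else:
            i = bisect.bisect_left(self.values, value)
            if i < len(self.values) and self.values[i] == value:
                return  # already indexed
            self.values.insert(i, value)
            if len(self.values) > self.max_values:
                self._split()

    def _split(self):
        # Keep the middle third here, push the outer thirds down to the children.
        third = len(self.values) // 3
        low = self.values[:third]
        high = self.values[-third:]
        self.values = self.values[third:-third]
        if self.less is None:
            self.less = Node(self.max_values)
        if self.greater is None:
            self.greater = Node(self.max_values)
        for v in low:
            self.less.insert(v)
        for v in high:
            self.greater.insert(v)

    def contains(self, value):
        node = self
        while node is not None and node.values:
            if value < node.values[0]:
                node = node.less
            elif value > node.values[-1]:
                node = node.greater
            else:
                i = bisect.bisect_left(node.values, value)
                return node.values[i] == value
        return False

    def range(self, lo, hi):
        """Yield every stored value v with lo <= v <= hi, in ascending order."""
        if not self.values:
            return
        if self.less is not None and lo < self.values[0]:
            yield from self.less.range(lo, hi)
        start = bisect.bisect_left(self.values, lo)
        end = bisect.bisect_right(self.values, hi)
        yield from self.values[start:end]
        if self.greater is not None and hi > self.values[-1]:
            yield from self.greater.range(lo, hi)
```

In a real store every step down to a child would be a separate read, so the
depth of the tree matters. Nothing in this sketch rebalances it, and inserting
values in sorted order produces a long chain of nodes.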

Super Column Family:

index [ <columnFamilyId> [ "firstVal" : <val> ,
                           "lastVal" : <val> ,
                           <val> : <dataId>,
                           "lessChild" : <columnFamilyId> ,
                           "greaterChild" : <columnFamilyId> ] ]
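
As a rough illustration of this layout, the Python sketch below builds the
column map for one supercolumn as a plain dict and uses "firstVal" and
"lastVal" to decide which child to read next. The function names
(build_supercolumn, next_hop) are hypothetical and no client library is
involved. It also assumes that an indexed value never equals one of the
reserved column names.

```python
RESERVED = ("firstVal", "lastVal", "lessChild", "greaterChild")


def build_supercolumn(value_to_data_id, less_child_id=None, greater_child_id=None):
    """Return the column name -> column value map for one index supercolumn."""
    if not value_to_data_id:
        raise ValueError("a supercolumn needs at least one value")
    if any(name in value_to_data_id for name in RESERVED):
        raise ValueError("indexed values must not collide with reserved names")
    ordered = sorted(value_to_data_id)
    columns = {"firstVal": ordered[0], "lastVal": ordered[-1]}
    columns.update(value_to_data_id)  # <val> : <dataId>
    if less_child_id is not None:
        columns["lessChild"] = less_child_id
    if greater_child_id is not None:
        columns["greaterChild"] = greater_child_id
    return columns


def next_hop(columns, key):
    """Return the id of the child to read next, or None to stop here.

    None means the key falls inside [firstVal, lastVal], or that there is
    no child on the side where the key would have to be.
    """
    if key < columns["firstVal"]:
        return columns.get("lessChild")
    if key > columns["lastVal"]:
        return columns.get("greaterChild")
    return None
```

For example, with values 10, 20 and 30 in one supercolumn, a lookup for 5
would be sent to "lessChild", a lookup for 99 to "greaterChild", and a lookup
for 25 would stop at the current supercolumn.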
