On Fri, Mar 26, 2010 at 4:35 PM, Mike Malone <m...@simplegeo.com> wrote:
> With the random partitioner there's no need to suggest a token. The key
> space is statistically random so you should be able to just split 2^128 into
> equal sized segments and get fairly equal storage load. Your read / write
> load could get out of whack if you have hot spots and stuff, I guess. But
> for a large distributed data set I think that's unlikely.
>
> For order preserving partitioners it's harder. We've been thinking about
> this issue at SimpleGeo and were planning on implementing an algorithm that
> could determine the median row key statistically without having to inspect
> every key. Basically, it would pull a random sample of row keys (maybe from
> the Index file?) and then determine the median of that sample. Thoughts?

That's exactly what the bootstrap token calculation does for OPP, after
picking the most-loaded node to talk to. You could expose that over JMX, or
generalize it to return, say, 100 evenly spaced tokens, so the tool could
estimate a key's position to within 1%.

-Jonathan
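
To make the two approaches in this thread concrete, here is a rough Python sketch. It is not Cassandra's bootstrap code: the function names (even_tokens, sampled_tokens), the sample size, and the in-memory key list are all made up for illustration. The ring size defaults to the 2^128 figure quoted above; the range a given partitioner actually uses may differ.

```python
import random


def even_tokens(node_count, ring_size=2**128):
    """Random partitioner case: split a hashed key space into equal segments.

    ring_size defaults to the 2^128 figure from the thread (an assumption,
    not necessarily the range a particular partitioner uses).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * ring_size // node_count for i in range(node_count)]


def sampled_tokens(keys, token_count, sample_size=10_000, seed=None):
    """Order-preserving case: estimate evenly spaced split points from a sample.

    Draws a random sample of row keys, sorts it, and returns token_count keys
    taken at evenly spaced quantiles. token_count=1 gives the sample median.
    """
    keys = list(keys)
    if not keys:
        raise ValueError("need at least one key to sample")
    if token_count < 1:
        raise ValueError("token_count must be at least 1")
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # token_count tokens cut the sorted sample into token_count + 1 segments.
    return [
        sample[k * len(sample) // (token_count + 1)]
        for k in range(1, token_count + 1)
    ]
```

Calling sampled_tokens(keys, 1) approximates the median row key Mike describes, while sampled_tokens(keys, 100) gives the finer-grained list of tokens suggested above. The accuracy of either depends on how representative the sample is, so a larger sample_size tightens the estimate at the cost of reading more keys.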