Mike, if you assume that your rows are roughly equal in size (at least statistically), then you could also just take a node's total load (this is exposed via JMX) and divide it by the number of keys/rows on that node. I'm not sure how to get the latter, but it shouldn't be a big deal to expose it via JMX if it isn't already there.
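A rough sketch of that calculation with a plain JMX client. Note that the "Load" and "KeyCount" attribute names (and the port) are assumptions here, stand-ins for whatever the StorageService MBean actually exposes:

    import javax.management.*;
    import javax.management.remote.*;

    // Sketch: estimate average row size as total load / key count.
    // Attribute names below are placeholders, not confirmed Cassandra JMX.
    public class AvgRowSize {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
            JMXConnector conn = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbs = conn.getMBeanServerConnection();

            ObjectName ss = new ObjectName(
                "org.apache.cassandra.service:type=StorageService");
            // Assumed: node's total on-disk load, in bytes.
            double load = ((Number) mbs.getAttribute(ss, "Load")).doubleValue();
            // Hypothetical attribute; this is the part that may need adding.
            long keys = ((Number) mbs.getAttribute(ss, "KeyCount")).longValue();

            System.out.printf("avg row size: %.1f bytes%n", load / keys);
            conn.close();
        }
    }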
Roland

On 26.03.2010 22:36, "Mike Malone" <m...@simplegeo.com> wrote:

2010/3/26 Roland Hänel <rol...@haenel.me>

> Jonathan,
>
> I agree with your idea about a tool that could 'propose' good token choices for op...

With the random partitioner there's no need to suggest a token. The key space is statistically random, so you should be able to just split 2^128 into equal-sized segments and get fairly equal storage load. Your read/write load could still get out of whack if you have hot spots, I guess, but for a large distributed data set I think that's unlikely.

For order-preserving partitioners it's harder. We've been thinking about this issue at SimpleGeo and were planning to implement an algorithm that can determine the median row key statistically, without having to inspect every key. Basically, it would pull a random sample of row keys (maybe from the Index file?) and then take the median of that sample.

Thoughts?

Mike
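For the random-partitioner case, the equal split is just arithmetic: token i of n is i * 2^128 / n. A minimal sketch, assuming the 0..2^128 token space from the message above:

    import java.math.BigInteger;

    // Sketch: evenly spaced initial tokens for a random partitioner,
    // assuming a token space of 0..2^128 as described above.
    public class TokenSplit {
        static final BigInteger KEYSPACE = BigInteger.ONE.shiftLeft(128); // 2^128

        // Token for node i of n: i * 2^128 / n
        static BigInteger tokenFor(int i, int n) {
            return KEYSPACE.multiply(BigInteger.valueOf(i))
                           .divide(BigInteger.valueOf(n));
        }

        public static void main(String[] args) {
            int nodes = 4;
            for (int i = 0; i < nodes; i++) {
                System.out.println("node " + i + ": " + tokenFor(i, nodes));
            }
        }
    }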
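And the sampling idea for the order-preserving case might look roughly like this; sampleRowKeys() is hypothetical and stands in for however keys would really be pulled (e.g. from the Index file):

    import java.util.*;

    // Sketch: estimate the median row key from a random sample instead
    // of scanning every key on the node.
    public class MedianKeyEstimate {

        static String estimateMedian(List<String> sample) {
            Collections.sort(sample);             // order-preserving comparison
            return sample.get(sample.size() / 2); // middle element ~ median
        }

        // Hypothetical stand-in: a real implementation would sample from
        // the node's on-disk index rather than take a list argument.
        static List<String> sampleRowKeys(List<String> allKeys, int sampleSize) {
            List<String> copy = new ArrayList<String>(allKeys);
            Collections.shuffle(copy, new Random());
            return new ArrayList<String>(
                copy.subList(0, Math.min(sampleSize, copy.size())));
        }

        public static void main(String[] args) {
            List<String> keys = Arrays.asList(
                "apple", "pear", "kiwi", "fig", "plum", "grape", "lime");
            String median = estimateMedian(sampleRowKeys(keys, 5));
            System.out.println("estimated median key: " + median);
        }
    }

Since the sample median converges on the true median as the sample grows, a few thousand keys should get you close enough for a token suggestion without touching the full key set.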