Jonathan, I agree with your idea about a tool that could 'propose' good token choices for optimal load-balancing.
If I were going to write such a tool: do you think the Thrift API provides the necessary information? I think with the RandomPartitioner you cannot scan all your rows to actually find out how big certain ranges of rows are. And even with the OPP (which is the major target for this kind of tool, for sure) you would have to fetch each row's contents just to find out how large it is, right?

Greetings, Roland

On 25.03.2010 at 22:28, "Jonathan Ellis" <jbel...@gmail.com> wrote:

One problem is if the heaviest node is next to a node that is lighter than average, instead of heavier. Then if the new node takes extra from the heaviest, say 75% instead of just 1/2, and we then take 1/2 of the heaviest's neighbor and put it on the heaviest, we've made that lighter-than-average node even lighter.

Could you move 1/2, 1/4, etc. only until you get to a node lighter than average? Probably. But I'm not sure it's a big enough win to justify the complexity.

Probably a better solution would be a tool where you tell it "I want to add N nodes to my cluster; analyze the load factors and tell me what tokens to add them with, and what additional moves to make to get me within M% of equal loads, with the minimum amount of data movement."

-Jonathan

On Thu, Mar 25, 2010 at 1:52 PM, Jeremy Dunck <jdu...@gmail.com> wrote:
> On Thu, Mar 25, 2010 at 1...
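
For concreteness, here is a toy model of the "move 1/2, 1/4, ... only until you reach a lighter-than-average node" idea from Jonathan's message above. This is not Cassandra code; the class name, the abstract load units, and the fraction cut-off are all invented for illustration:

import java.util.Arrays;

/** Toy model of cascading moves with a shrinking fraction. */
public class CascadeRebalance {
    static void cascade(double[] loads, int start) {
        double avg = Arrays.stream(loads).average().orElse(0);
        double fraction = 0.5;                     // move 1/2, then 1/4, 1/8, ...
        int i = start;
        while (fraction > 1.0 / 1024) {            // arbitrary cut-off
            int donor = (i + 1) % loads.length;    // next node clockwise on the ring
            if (loads[donor] <= avg) break;        // stop: donor is already light
            double moved = loads[donor] * fraction;
            loads[donor] -= moved;                 // donor shifts part of its range
            loads[i]     += moved;                 // onto its predecessor
            fraction /= 2;
            i = donor;
        }
    }

    public static void main(String[] args) {
        double[] loads = {25, 180, 140, 60, 95};   // node 0 just gave most of its data away
        cascade(loads, 0);
        System.out.println(Arrays.toString(loads)); // [115.0, 125.0, 105.0, 60.0, 95.0]
    }
}

The break on a lighter-than-average donor is exactly the guard Jonathan asks about: without it, node 3 (load 60) would have been drained further.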
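And a minimal sketch of what the proposed tool's core computation might look like: repeatedly split the most heavily loaded token range at its midpoint, once per new node. The class name and the greedy midpoint-split heuristic are my own invention, and it assumes per-range load figures are available from somewhere and that load is uniform within each range -- the very assumption my Thrift question is about. Wrap-around ranges are ignored for simplicity:

import java.math.BigInteger;
import java.util.*;

/** Toy sketch: propose tokens for new nodes by splitting the heaviest range. */
public class TokenProposer {
    static class Range {
        BigInteger left, right; // (left, right] token range, non-wrapping
        double load;            // bytes currently owned by this range
        Range(BigInteger l, BigInteger r, double load) {
            this.left = l; this.right = r; this.load = load;
        }
    }

    /** Returns one proposed token per new node. */
    static List<BigInteger> propose(List<Range> ranges, int newNodes) {
        // Max-heap on load: always split the heaviest remaining range next.
        PriorityQueue<Range> heap = new PriorityQueue<>(
            (a, b) -> Double.compare(b.load, a.load));
        heap.addAll(ranges);

        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < newNodes; i++) {
            Range heaviest = heap.poll();
            // The midpoint halves the load only if load is uniform in the range.
            BigInteger mid = heaviest.left.add(heaviest.right).shiftRight(1);
            tokens.add(mid);
            heap.add(new Range(heaviest.left, mid, heaviest.load / 2));
            heap.add(new Range(mid, heaviest.right, heaviest.load / 2));
        }
        return tokens;
    }
}

With RandomPartitioner the uniform-load assumption roughly holds, so node load figures alone might be enough; with OPP it is exactly what breaks, which is why the tool would need real range sizes from the API.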