If nodetool loadbalance does not do what its name implies, should it be renamed, or maybe even removed altogether, since the recommendation is to _never_ use it in production?
Bill

On Thu, Aug 5, 2010 at 6:41 AM, aaron morton <aa...@thelastpickle.com> wrote:

> This comment from Ben Black may help...
>
> "I recommend you _never_ use nodetool loadbalance in production because
> it will _not_ result in balanced load. The correct process is manual
> calculation of tokens (the algorithm for RP is on the Operations wiki
> page) and nodetool move."
> http://www.mail-archive.com/user@cassandra.apache.org/msg04933.html
>
> So the recommendation is to manually set initial tokens and then manually
> move them.
>
> As for the need to decommission, I'm guessing it's for reasons such as
> making it easier to avoid overlapping tokens and to avoid accepting writes
> that will soon be moved.
>
> Others may be able to add more.
>
> Aaron
>
> On 5 Aug 2010, at 14:49, anand_s wrote:
>
> > Hi,
> >
> > I have some thoughts on load balancing on current / new nodes. I have
> > come across some posts around this, but I'm not sure what is finally
> > being proposed, so...
> >
> > From what I have read, a nodetool loadbalance on a node does a
> > decommission and bootstrap of that node. Is there a reason why it works
> > that way (decommission and bootstrap) rather than simply looking at my
> > next neighbor and splitting the load with it? For example, if the ring
> > has nodes A, B, C and D with loads (in GB) of 100, 70, 100 and 80
> > respectively, then a nodetool balance on B should result in 100, 85, 85,
> > 80 (some tokens move from C to B). It is still manual, but data movement
> > is only what is needed: 15 GB instead of the 100+ GB of a decommission
> > and bootstrap. The idea is not to get a perfect balance, but an
> > acceptable balance with less data movement.
> >
> > Also, when a new node is added, it takes 50% from the most loaded node.
> > Don't we want to rebalance such that the load is more or less evenly
> > distributed across the cluster? Would it not help if I could just
> > specify the % load as a parameter to the rebalance command, so that I
> > can optimize the movement of data for rebalancing? E.g. A, B, C, E is a
> > cluster with loads of 80, 78, 83 and 84. Now I add a new node D
> > (positioned before E), so eventually, after all the rebalance activity,
> > I want the load to be ~65 (325/5). To minimize the movement of data and
> > still get a good balance, we move only what is needed (so data sort of
> > flows from more loaded to less loaded nodes until balanced). This could
> > be a manual process (I am basically suggesting a similar approach to the
> > one in the first paragraph).
> >
> > Another thought: instead of using only the current usage on a node to
> > determine load, shouldn't there be a higher-level concept like "node
> > weight" to handle heterogeneous nodes, or is the expectation that all
> > nodes are more or less equal?
> >
> > Thanks
> > Anand
> > --
> > View this message in context:
> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Question-on-load-balancing-in-a-cluster-tp5375140p5375140.html
> > Sent from the cassandra-u...@incubator.apache.org mailing list archive
> > at Nabble.com.
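Ben Black's recommendation (manual token calculation for RP, i.e. RandomPartitioner, followed by nodetool move) can be sketched in Python roughly as below. This is a minimal sketch, assuming RandomPartitioner's token range of 0 to 2**127 and roughly identical nodes; `balanced_tokens` is a hypothetical helper name, not part of Cassandra or nodetool.

```python
# Assumption: RandomPartitioner tokens live in the range [0, 2**127).
RP_TOKEN_SPACE = 2 ** 127


def balanced_tokens(node_count: int) -> list[int]:
    """Return evenly spaced RandomPartitioner tokens for node_count nodes.

    Node i gets token i * 2**127 / node_count (integer division), so each
    node owns roughly 1/node_count of the ring.
    """
    if node_count < 1:
        raise ValueError("node_count must be positive")
    return [i * RP_TOKEN_SPACE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Tokens for a 5-node ring; each would be set as a node's initial token,
    # or used as the target token for nodetool move on an existing node.
    for i, token in enumerate(balanced_tokens(5)):
        print(f"node {i}: {token}")
```

Because the tokens are evenly spaced, each node's range differs from the others by at most one token, which is what makes this "balanced" in ownership terms; it says nothing about actual load if keys or request rates are skewed.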
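Anand's "move only what is needed" idea can be made concrete with a hypothetical model: treat each node's load as divisible data that can shift across the boundary between ring neighbours, and compute how much must cross each boundary for every node to reach the average. This sketch ignores replication and assumes load is spread evenly within token ranges; `boundary_flows` is an illustrative name, not a Cassandra feature.

```python
def boundary_flows(loads: list[float]) -> list[float]:
    """Return flows[i]: data moved from node i to node i+1 (wrapping around).

    Hypothetical model: loads are given in ring order, and data can shift
    between neighbours in any amount. A negative flow means data moves the
    other way, from node i+1 to node i.
    """
    n = len(loads)
    target = sum(loads) / n
    # Prefix sums of each node's surplus over the target load.
    prefix = []
    running = 0.0
    for load in loads:
        running += load - target
        prefix.append(running)
    # Every balancing solution has the form flows[i] = c + prefix[i]; choosing
    # c as the negated median minimises sum(|flow|), i.e. total data moved.
    c = -sorted(prefix)[n // 2]
    return [c + p for p in prefix]


if __name__ == "__main__":
    ring = ["A", "B", "C", "D", "E"]  # ring order; D is the new, empty node
    loads = [80, 78, 83, 0, 84]       # GB, from the example in the thread
    flows = boundary_flows(loads)
    for i, flow in enumerate(flows):
        src, dst = ring[i], ring[(i + 1) % len(ring)]
        if flow < 0:  # negative flow: data moves in the opposite direction
            src, dst, flow = dst, src, -flow
        print(f"{src} -> {dst}: {flow:g} GB")
```

For the A, B, C, D, E example the flows come out as 0, 13, 31, -34 and -15 GB: C sends 31 GB and E sends 34 GB to D, B sends 13 GB to C, and A sends 15 GB to E, for 93 GB moved in total with every node ending at 65 GB. The median trick is the standard way to minimise a sum of absolute values, which is why it gives the least total movement under this model.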