Yes, IMO it should be renamed.

On Fri, Aug 6, 2010 at 10:10 AM, Bill Au <bill.w...@gmail.com> wrote:
> If nodetool loadbalance does not do what its name implies, should it be
> renamed or maybe even removed altogether since the recommendation is to
> _never_ use it in production?
>
> Bill
>
> On Thu, Aug 5, 2010 at 6:41 AM, aaron morton <aa...@thelastpickle.com>
> wrote:
>>
>> This comment from Ben Black may help...
>>
>> "I recommend you _never_ use nodetool loadbalance in production because
>> it will _not_ result in balanced load. The correct process is manual
>> calculation of tokens (the algorithm for RP is on the Operations wiki
>> page) and nodetool move."
>>
>> http://www.mail-archive.com/user@cassandra.apache.org/msg04933.html
>>
>> So the recommendation is to manually set initial tokens and then manually
>> move them.
>>
>> As for the need to decommission, I'm guessing it's for reasons such as
>> making it easier to avoid overlapping tokens and to avoid accepting writes
>> that will soon be moved.
>>
>> Others may be able to add more.
>>
>> Aaron
>>
>>
>> On 5 Aug 2010, at 14:49, anand_s wrote:
>>
>> >
>> > Hi,
>> >
>> > Have some thoughts on load balancing on current / new nodes. I have come
>> > across some posts around this, but not sure of what is being finally
>> > proposed, so..
>> >
>> > From what I have read, a nodebalance on a node does a decommission and
>> > bootstrap of that node. Is there a reason why it is that way
>> > (decommission and bootstrap) and not just a simple look at my next
>> > neighbor and just split the load with it? For example, say the ring has
>> > nodes A, B, C and D with loads (in GB) of 100, 70, 100 and 80
>> > respectively. Then a nodetool balance on B should result in 100, 85,
>> > 85, 80 (some tokens move from C to B). It is still manual, but data
>> > movement is only what is needed: 15 GB instead of the 100+ GB
>> > (decommission and bootstrap). The idea is not to get a perfect balance,
>> > but an acceptable balance with less data movement.
>> >
>> > Also when a new node is added, it takes 50% from the most loaded node.
>> > Don't we want to rebalance such that the load is more or less evenly
>> > distributed across the cluster? Would it not help if I could just
>> > specify the % load as a parameter to the rebalance command, so that I
>> > can optimize the movement of data for rebalancing? E.g. A, B, C, E is a
>> > cluster with load being 80, 78, 83, 84. Now I add a new node D
>> > (position will be before E), so eventually after all the rebalance
>> > activity I want the load to be ~65 (325/5). Now to minimize the
>> > movement of data and still get a good balance, we move only what is
>> > needed (so data sort of flows from more to less loaded nodes until
>> > balanced). This could be a manual process (I am basically suggesting a
>> > similar approach as in paragraph one).
>> >
>> > Another thought is that instead of using pure current usage on a node
>> > to determine load, shouldn't there be a higher-level concept like "node
>> > weight" to handle heterogeneous nodes, or is the expectation that all
>> > nodes are more or less equal?
>> >
>> >
>> > Thanks
>> > Anand
>> > --
>> > View this message in context:
>> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Question-on-load-balancing-in-a-cluster-tp5375140p5375140.html
>> > Sent from the cassandra-u...@incubator.apache.org mailing list archive
>> > at Nabble.com.
>> >
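
For reference, the "manual calculation of tokens" that Ben Black mentions in the quoted thread can be sketched roughly as below. This is only a minimal sketch, assuming RandomPartitioner (token space 0 to 2**127) and nodes of roughly equal capacity. The function name balanced_tokens is made up for illustration and is not part of nodetool. Each resulting token would then be applied to a node with nodetool move, or set as the initial token on a new node.

```python
# Evenly spaced tokens for a ring of equal-capacity nodes.
# Assumption: RandomPartitioner, whose token space is 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return one token per node, spaced evenly around the ring.

    Hypothetical helper for illustration; not a Cassandra API.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the 127-bit values exact.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Example: the five-node ring (A, B, C, D, E) from Anand's question.
    for index, token in enumerate(balanced_tokens(5)):
        print("node %d: %d" % (index, token))
```

Note this does not cover the "node weight" idea for heterogeneous nodes; unequal capacities would need unevenly spaced tokens.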