Re: Question on load balancing in a cluster

Bill Au Fri, 06 Aug 2010 10:11:20 -0700

If nodetool loadbalance does not do what its name implies, should it be
renamed, or maybe even removed altogether, since the recommendation is to
_never_ use it in production?


Bill

On Thu, Aug 5, 2010 at 6:41 AM, aaron morton <aa...@thelastpickle.com> wrote:

> This comment from Ben Black may help...
>
> "I recommend you _never_ use nodetool loadbalance in production because
> it will _not_ result in balanced load.  The correct process is manual
> calculation of tokens (the algorithm for RP is on the Operations wiki
> page) and nodetool move.
> "
> http://www.mail-archive.com/user@cassandra.apache.org/msg04933.html
>
> So the recommendation is to manually set initial tokens and then manually
> move them.
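For the RandomPartitioner (RP), the usual token calculation spaces N tokens evenly around its 0 to 2**127 token space, so token i is i * 2**127 / N. A minimal Python sketch of that calculation, assuming all nodes are equal and should own the same share of the ring (the function and constant names are illustrative):

```python
RING_SIZE = 2**127  # RandomPartitioner tokens fall in 0 .. 2**127


def balanced_tokens(node_count):
    """Evenly spaced initial tokens: token i = i * 2**127 / N."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


# print one token per node for a 4-node ring
for i, token in enumerate(balanced_tokens(4)):
    print(f"node {i}: {token}")
```

Each value would be used as a node's initial token, or passed to nodetool move for a node that is already in the ring.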
>
> As for the need to decommission, I'm guessing it's for reasons such as
> making it easier to avoid overlapping tokens and to avoid accepting writes
> that will soon be moved.
>
> Others may be able to add more.
>
> Aaron
>
>
> On 5 Aug 2010, at 14:49, anand_s wrote:
>
> >
> > Hi,
> >
> > I have some thoughts on load balancing for current and new nodes. I have
> > come across some posts about this, but I'm not sure what is finally being
> > proposed, so...
> >
> > From what I have read, running nodetool loadbalance on a node does a
> > decommission and bootstrap of that node. Is there a reason why it works
> > that way (decommission and bootstrap) rather than simply looking at the
> > next neighbor and splitting the load with it? For example, if the ring has
> > nodes A, B, C and D with loads (in GB) of 100, 70, 100 and 80
> > respectively, then a loadbalance on B should result in 100, 85, 85, 80
> > (some tokens move from C to B). It is still manual, but data movement is
> > only what is needed: 15 GB instead of the 100+ GB of a decommission and
> > bootstrap. The idea is not to get a perfect balance, but an acceptable
> > balance with less data movement.
> >
> > Also, when a new node is added, it takes 50% of the load of the most
> > loaded node. Don't we want to rebalance so that the load is more or less
> > evenly distributed across the cluster? Wouldn't it help if I could specify
> > the % load as a parameter to the rebalance command, so that I can optimize
> > the movement of data for rebalancing? For example, A, B, C, E is a cluster
> > with loads of 80, 78, 83 and 84. Now I add a new node D (positioned before
> > E), so after all the rebalance activity I want the load on each node to be
> > ~65 (325/5). To minimize the movement of data and still get a good
> > balance, we move only what is needed (so data flows from more loaded to
> > less loaded nodes until balanced). This could be a manual process (I am
> > basically suggesting an approach similar to the one in the first
> > paragraph).
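The "data flows from more to less loaded nodes" idea can be made concrete. A Python sketch, assuming data only moves between ring neighbors (by shifting the token boundary between them), data is spread evenly within each range, and every node should end at the average load; the function is hypothetical, not part of nodetool. Fixing the flow across one boundary makes every other flow a running sum of surpluses, and shifting all flows by the median of those sums minimizes the total moved:

```python
from statistics import median


def ring_rebalance_flows(loads):
    """Flows that equalize loads when data only moves between ring neighbors.

    loads: per-node data sizes in ring order.
    Returns (target, flows): flows[i] is the amount moved across the
    boundary between node i and node i+1 (wrapping around); positive
    means node i gives data to node i+1, negative means it receives.
    """
    target = sum(loads) / len(loads)
    # With the flow across the last boundary fixed at 0, each flow is a
    # running sum of the surpluses (load - target).
    prefix = []
    running = 0.0
    for load in loads:
        running += load - target
        prefix.append(running)
    # Adding one constant to every flow keeps all nodes balanced;
    # subtracting the median minimizes sum(|flow|), the total data moved.
    shift = median(prefix)
    flows = [p - shift for p in prefix]
    return target, flows


nodes = ["A", "B", "C", "D", "E"]
loads = [80, 78, 83, 0, 84]  # new node D starts empty, positioned before E
target, flows = ring_rebalance_flows(loads)
for i, flow in enumerate(flows):
    a, b = nodes[i], nodes[(i + 1) % len(nodes)]
    if flow > 0:
        print(f"{a} -> {b}: {flow:g} GB")
    elif flow < 0:
        print(f"{b} -> {a}: {-flow:g} GB")
print("target per node:", target, "total moved:", sum(abs(f) for f in flows))
```

With the loads from the example, every node ends at 65 GB after moving 93 GB in total (B to C 13, C to D 31, E to D 34, A to E 15). For comparison, if D only took half of E's 84 GB on bootstrap, the loads would be 80, 78, 83, 42 and 42.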
> >
> > Another thought: instead of using only the current usage of a node to
> > determine load, shouldn't there be a higher-level concept like "node
> > weight" to handle heterogeneous nodes, or is the expectation that all
> > nodes are more or less equal?
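To illustrate the "node weight" idea from the question (the weights are hypothetical; nothing in the thread says Cassandra supports them), tokens could be placed at cumulative weights instead of evenly. Assuming RandomPartitioner's hashing spreads keys uniformly, a node's share of the token space approximates its share of the data. A minimal Python sketch; with equal weights it reduces to i * 2**127 / N:

```python
RING_SIZE = 2**127  # RandomPartitioner token space


def weighted_tokens(weights):
    """Hypothetical weight-proportional tokens, in ring order.

    A node owns the range (previous node's token, its own token]. Node i's
    token sits at (w_0 + ... + w_i - w_0) / total of the ring, so its range
    has size w_i / total; node 0 gets token 0 and owns the wrap-around range.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    total = sum(weights)
    tokens = []
    cumulative = 0
    for w in weights:
        cumulative += w
        tokens.append((cumulative - weights[0]) * RING_SIZE // total)
    return tokens


def ring_shares(tokens):
    """Fraction of the token space owned by each node."""
    if len(tokens) == 1:
        return [1.0]
    # tokens[i - 1] wraps to the last token when i == 0
    return [((t - tokens[i - 1]) % RING_SIZE) / RING_SIZE
            for i, t in enumerate(tokens)]


# e.g. the middle node has twice the capacity of the other two
print(ring_shares(weighted_tokens([1, 2, 1])))  # [0.25, 0.5, 0.25]
```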
> >
> >
> > Thanks
> > Anand
> > --
> > View this message in context:
> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Question-on-load-balancing-in-a-cluster-tp5375140p5375140.html
> > Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
>
>
