Re: Question on load balancing in a cluster

Benjamin Black Fri, 06 Aug 2010 14:10:45 -0700

Yes, imo, it should be renamed.


On Fri, Aug 6, 2010 at 10:10 AM, Bill Au <bill.w...@gmail.com> wrote:
> If nodetool loadbalance does not do what its name implies, should it be
> renamed or maybe even removed altogether since the recommendation is to
> _never_ use it in production?
>
> Bill
>
> On Thu, Aug 5, 2010 at 6:41 AM, aaron morton <aa...@thelastpickle.com>
> wrote:
>>
>> This comment from Ben Black may help...
>>
>> "I recommend you _never_ use nodetool loadbalance in production because
>> it will _not_ result in balanced load.  The correct process is manual
>> calculation of tokens (the algorithm for RP is on the Operations wiki
>> page) and nodetool move."
>> http://www.mail-archive.com/user@cassandra.apache.org/msg04933.html
>>
>> So the recommendation is to manually set initial tokens and then manually
>> move them.
>>
>> As for the need to decommission, I'm guessing it's for reasons such as
>> making it easier to avoid overlapping tokens and to avoid accepting writes
>> that will soon be moved.
>>
>> Others may be able to add more.
>>
>> Aaron
>>
>>
>> On 5 Aug 2010, at 14:49, anand_s wrote:
>>
>> >
>> > Hi,
>> >
>> > Have some thoughts on load balancing on current / new nodes. I have come
>> > across some posts around this, but not sure of what is being finally
>> > proposed, so..
>> >
>> > From what I have read, running nodetool loadbalance on a node does a
>> > decommission and bootstrap of that node. Is there a reason why it works
>> > that way (decommission and bootstrap) rather than simply looking at the
>> > next neighbor and splitting the load with it? For example, if the ring
>> > has nodes A, B, C and D with loads (in GB) of 100, 70, 100 and 80
>> > respectively, then a nodetool loadbalance on B should result in 100, 85,
>> > 85, 80 (some tokens move from C to B). It is still manual, but the data
>> > movement is only what is needed – 15 GB instead of the 100+ GB
>> > (decommission and bootstrap). The idea is not to get a perfect balance,
>> > but an acceptable balance with less data movement.
>> >
>> > Also, when a new node is added, it takes 50% from the most loaded node.
>> > Don't we want to rebalance such that the load is more or less evenly
>> > distributed across the cluster? Would it not help if I could just specify
>> > the % load as a parameter to the rebalance command, so that I can
>> > optimize the movement of data for rebalancing? E.g. A, B, C, E is a
>> > cluster with loads of 80, 78, 83 and 84. Now I add a new node D (its
>> > position will be before E), so eventually, after all the rebalance
>> > activity, I want the load on each node to be ~65 (325/5). To minimize the
>> > movement of data and still get a good balance, we move only what is
>> > needed (so data sort of flows from more to less loaded nodes until
>> > balanced). This could be a manual process (I am basically suggesting a
>> > similar approach to the one in paragraph one).
>> >
>> > Another thought is that instead of using pure current usage on a node to
>> > determine load, shouldn't there be a higher-level concept like "node
>> > weight" to handle heterogeneous nodes, or is the expectation that all
>> > nodes are more or less equal?
>> >
>> >
>> > Thanks
>> > Anand
>> > --
>> > View this message in context:
>> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Question-on-load-balancing-in-a-cluster-tp5375140p5375140.html
>> > Sent from the cassandra-u...@incubator.apache.org mailing list archive
>> > at Nabble.com.
>>
>
>
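
For reference, the manual token calculation recommended in the thread can be
sketched as follows. This is only a minimal sketch, assuming RandomPartitioner
tokens are integers in the range 0 to 2**127 and that a balanced ring spaces
them evenly. The function name balanced_tokens is made up for illustration and
is not part of Cassandra or nodetool.

```python
# Evenly spaced tokens for a ring of node_count nodes.
# Assumption: the RandomPartitioner token space is 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return node_count tokens spaced evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the large token values exact.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(balanced_tokens(4)):
        print(f"node {index}: {token}")
```

Each value would then be used as a node's initial token, or applied to a
running node with nodetool move.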

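The arithmetic behind the "add node D" example in the original question can be
checked with a few lines. The loads 80, 78, 83 and 84 sum to 325 GB, so the
even share across five nodes is 65 GB. The sketch below only computes how far
each node is from that share; it does not model tokens or anything nodetool
does, and the name rebalance_plan is hypothetical.

```python
def rebalance_plan(loads_gb):
    """Return (target, deltas) for a mapping of node name to load in GB.

    A positive delta is data the node would need to shed to reach the
    even share; a negative delta is data it would need to receive.
    """
    target = sum(loads_gb.values()) / len(loads_gb)
    deltas = {node: load - target for node, load in loads_gb.items()}
    return target, deltas


# Loads from the example: A, B, C and E hold data, D is the new empty node.
loads = {"A": 80, "B": 78, "C": 83, "D": 0, "E": 84}
target, deltas = rebalance_plan(loads)

# Total data that has to move is the sum of what the overloaded nodes shed.
moved = sum(delta for delta in deltas.values() if delta > 0)

print("target per node (GB):", target)
print("delta per node (GB):", deltas)
print("total data moved (GB):", moved)
```

Under these assumptions only the 65 GB destined for D has to move, which is
the "move only what is needed" idea from the question.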