And by the way, my RF is 3 and the other two nodes have much more capacity, so why are requests always routed to node3?
Could I do a rebalance now, before node repair?

On Fri, Jul 29, 2011 at 12:01 PM, Yan Chunlu <springri...@gmail.com> wrote:
> Adding new nodes seems to have added more pressure to the cluster? How
> about your data size?
>
>
> On Fri, Jul 29, 2011 at 4:16 AM, Frank Duan <fr...@aimatch.com> wrote:
>
>> "Dropped read message" might be an indicator of a capacity issue. We
>> experienced a similar issue with 0.7.6.
>>
>> We ended up adding two extra nodes and physically rebooting the
>> offending node(s).
>>
>> The entire cluster then calmed down.
>>
>> On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu <springri...@gmail.com> wrote:
>>
>>> I have three nodes and RF=3. Here is the current ring:
>>>
>>> Address   Status  State   Load      Owns     Token
>>>                                              84944475733633104818662955375549269696
>>> node1     Up      Normal  15.32 GB  81.09%   52773518586096316348543097376923124102
>>> node2     Up      Normal  22.51 GB  10.48%   70597222385644499881390884416714081360
>>> node3     Up      Normal  56.1 GB   8.43%    84944475733633104818662955375549269696
>>>
>>> It is very unbalanced and I would like to re-balance it using
>>> "nodetool move" asap. Unfortunately I haven't run node repair for
>>> a long time.
>>>
>>> Aaron suggested it's better to run node repair on every node and then
>>> re-balance.
>>>
>>> The problem is that node3 is currently under heavy load, and the entire
>>> cluster slows down if I start a node repair. I had to run
>>> disablegossip and disablethrift to stop the repair.
>>>
>>> Only Cassandra is running on that server and I have no idea what it is
>>> doing. The CPU load is about 20+ currently. compactionstats and
>>> netstats show it is not doing anything.
>>>
>>> I have changed the client not to connect to node3, but it still seems
>>> to be under heavy load and io util is 100%.
>>>
>>> The log seems normal (although I'm not sure about the "Dropped read
>>> message" thing):
>>>
>>> INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving 2563726360 used; max is 4248829952
>>> WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
>>> INFO 13:21:38,560 Pool Name               Active   Pending
>>> INFO 13:21:38,560 ReadStage                    8      7555
>>> INFO 13:21:38,561 RequestResponseStage         0         0
>>> INFO 13:21:38,561 ReadRepairStage              0         0
>>>
>>> Is there any way to tell what node3 is doing? Or at least, is there any
>>> way to make it not slow down the whole cluster?
>>
>>
>> --
>> Frank Duan
>> aiMatch
>> fr...@aimatch.com
>> c: 703.869.9951
>> www.aiMatch.com
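[Editor's note] For reference, a minimal sketch of how evenly spaced target tokens could be computed before running "nodetool move", assuming the cluster uses RandomPartitioner on Cassandra 0.7.x (the token values in the ring output above are consistent with that, but this is an assumption); the node numbering is illustrative only.

    # A minimal sketch, assuming RandomPartitioner: its token space runs
    # from 0 to 2**127 - 1, so a balanced N-node ring uses tokens spaced
    # 2**127 / N apart.
    RING_SIZE = 2 ** 127

    def balanced_tokens(node_count):
        # One token per node; each node then owns an equal slice of the ring.
        return [i * RING_SIZE // node_count for i in range(node_count)]

    if __name__ == "__main__":
        for i, token in enumerate(balanced_tokens(3)):
            # Hypothetical mapping onto the three nodes above; each value
            # would be applied with: nodetool -h <host> move <token>
            print("node%d: %d" % (i + 1, token))

Each move streams data between nodes and adds load of its own, which is why the advice in the thread (run repair on every node first, then re-balance one node at a time) is the safer order.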