any help? thanks!

On Fri, Jul 29, 2011 at 12:05 PM, Yan Chunlu <springri...@gmail.com> wrote:
> and by the way, my RF=3 and the other two nodes have much more capacity,
> why do they always route requests to node3?
>
> could I do a rebalance now, before node repair?
>
>
> On Fri, Jul 29, 2011 at 12:01 PM, Yan Chunlu <springri...@gmail.com> wrote:
>
>> adding new nodes seems to add more pressure to the cluster? how about your
>> data size?
>>
>>
>> On Fri, Jul 29, 2011 at 4:16 AM, Frank Duan <fr...@aimatch.com> wrote:
>>
>>> "Dropped read message" might be an indicator of a capacity issue. We
>>> experienced a similar issue with 0.7.6.
>>>
>>> We ended up adding two extra nodes and physically rebooting the offending
>>> node(s).
>>>
>>> The entire cluster then calmed down.
>>>
>>> On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu <springri...@gmail.com> wrote:
>>>
>>>> I have three nodes and RF=3. here is the current ring:
>>>>
>>>>
>>>> Address  Status  State   Load      Owns    Token
>>>>                                            84944475733633104818662955375549269696
>>>> node1    Up      Normal  15.32 GB  81.09%  52773518586096316348543097376923124102
>>>> node2    Up      Normal  22.51 GB  10.48%  70597222385644499881390884416714081360
>>>> node3    Up      Normal  56.1 GB   8.43%   84944475733633104818662955375549269696
>>>>
>>>>
>>>> it is very unbalanced and I would like to rebalance it using
>>>> "nodetool move" asap. unfortunately I haven't run node repair for
>>>> a long time.
>>>>
>>>> aaron suggested it's better to run node repair on every node and then
>>>> rebalance it.
>>>>
>>>>
>>>> the problem is that node3 is under heavy load currently, and the entire
>>>> cluster slows down if I start doing node repair. I have to
>>>> disablegossip and disablethrift to stop the repair.
>>>>
>>>> only cassandra is running on that server and I have no idea what it was
>>>> doing. the cpu load is about 20+ currently. compactionstats and
>>>> netstats show it was not doing anything.
>>>>
>>>> I have changed the client not to connect to node3, but still, it seems
>>>> to be under heavy load and io util is 100%.
>>>>
>>>>
>>>> the log seems normal (although I'm not sure about the "Dropped read
>>>> message" thing):
>>>>
>>>>  INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving 2563726360 used; max is 4248829952
>>>>  WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
>>>>  INFO 13:21:38,560 Pool Name                    Active   Pending
>>>>  INFO 13:21:38,560 ReadStage                         8      7555
>>>>  INFO 13:21:38,561 RequestResponseStage              0         0
>>>>  INFO 13:21:38,561 ReadRepairStage                   0         0
>>>>
>>>>
>>>>
>>>> is there any way to tell what node3 was doing? or at least is there any
>>>> way to make it not slow down the whole cluster?
>>>>
>>>
>>>
>>>
>>> --
>>> Frank Duan
>>> aiMatch
>>> fr...@aimatch.com
>>> c: 703.869.9951
>>> www.aiMatch.com
>>>
>>>
>>
>
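
For reference, the "Owns" column in the ring quoted above can be reproduced from the tokens alone, and evenly spaced tokens for a rebalance can be computed the same way. This is only a minimal sketch: it assumes the cluster uses RandomPartitioner (token space 0 to 2**127), which the thread does not state explicitly, and the function names `ownership` and `balanced_tokens` are made up for illustration.

```python
# Sketch only: assumes RandomPartitioner, whose token space is 0 .. 2**127.
RING_SIZE = 2 ** 127


def ownership(tokens):
    """Map each token to the fraction of the ring it owns.

    A node owns the range from the previous token (exclusive) up to its
    own token (inclusive); the lowest token wraps around from the highest.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # for i == 0 this wraps to the highest token
        span = (tok - prev) % RING_SIZE or RING_SIZE  # single node owns it all
        shares[tok] = span / RING_SIZE
    return shares


def balanced_tokens(node_count):
    """Evenly spaced tokens: i * 2**127 / node_count for each node i."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


# Tokens copied from the ring output in the original message.
current = {
    "node1": 52773518586096316348543097376923124102,
    "node2": 70597222385644499881390884416714081360,
    "node3": 84944475733633104818662955375549269696,
}

shares = ownership(current.values())
for name, tok in current.items():
    print(f"{name}: {shares[tok] * 100:.2f}%")

for i, tok in enumerate(balanced_tokens(len(current))):
    print(f"target token {i}: {tok}")
```

If the assumption holds, each target token could be handed to "nodetool move" on one node at a time; which node should take which token, and whether to repair first, is a separate decision that this sketch does not address.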