And by the way, my RF is 3 and the other two nodes have much more capacity, so why are requests always routed to node3?
Could I do a rebalance now, before node repair?

On Fri, Jul 29, 2011 at 12:01 PM, Yan Chunlu <springri...@gmail.com> wrote:
> Adding new nodes seems to have added more pressure to the cluster? How
> about your data size?
>
>
> On Fri, Jul 29, 2011 at 4:16 AM, Frank Duan <fr...@aimatch.com> wrote:
>
>> "Dropped read message" might be an indicator of a capacity issue. We
>> experienced a similar issue with 0.7.6.
>>
>> We ended up adding two extra nodes and physically rebooting the
>> offending node(s).
>>
>> The entire cluster then calmed down.
>>
>> On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu <springri...@gmail.com> wrote:
>>
>>> I have three nodes and RF=3. Here is the current ring:
>>>
>>> Address   Status  State   Load      Owns     Token
>>>                                              84944475733633104818662955375549269696
>>> node1     Up      Normal  15.32 GB  81.09%   52773518586096316348543097376923124102
>>> node2     Up      Normal  22.51 GB  10.48%   70597222385644499881390884416714081360
>>> node3     Up      Normal  56.1 GB   8.43%    84944475733633104818662955375549269696
>>>
>>> It is very unbalanced and I would like to re-balance it using
>>> "nodetool move" asap. Unfortunately I haven't run node repair for
>>> a long time.
>>>
>>> Aaron suggested it's better to run node repair on every node and then
>>> re-balance.
>>>
>>> The problem is that node3 is currently under heavy load, and the entire
>>> cluster slows down if I start a node repair. I had to run
>>> disablegossip and disablethrift to stop the repair.
>>>
>>> Only Cassandra is running on that server and I have no idea what it is
>>> doing. The CPU load is about 20+ currently. compactionstats and
>>> netstats show it is not doing anything.
>>>
>>> I have changed the client not to connect to node3, but it still seems
>>> to be under heavy load and io util is 100%.
>>>
>>> The log seems normal (although I'm not sure about the "Dropped read
>>> message" thing):
>>>
>>> INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving 2563726360 used; max is 4248829952
>>> WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
>>> INFO 13:21:38,560 Pool Name               Active   Pending
>>> INFO 13:21:38,560 ReadStage                    8      7555
>>> INFO 13:21:38,561 RequestResponseStage         0         0
>>> INFO 13:21:38,561 ReadRepairStage              0         0
>>>
>>> Is there any way to tell what node3 is doing? Or at least, is there any
>>> way to make it not slow down the whole cluster?
>>
>>
>> --
>> Frank Duan
>> aiMatch
>> fr...@aimatch.com
>> c: 703.869.9951
>> www.aiMatch.com
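[Editor's note] For reference, a minimal sketch of how evenly spaced target tokens could be computed before running "nodetool move", assuming the cluster uses RandomPartitioner on Cassandra 0.7.x (the token values in the ring output above are consistent with that, but this is an assumption); the node numbering is illustrative only.

    # A minimal sketch, assuming RandomPartitioner: its token space runs
    # from 0 to 2**127 - 1, so a balanced N-node ring uses tokens spaced
    # 2**127 / N apart.
    RING_SIZE = 2 ** 127

    def balanced_tokens(node_count):
        # One token per node; each node then owns an equal slice of the ring.
        return [i * RING_SIZE // node_count for i in range(node_count)]

    if __name__ == "__main__":
        for i, token in enumerate(balanced_tokens(3)):
            # Hypothetical mapping onto the three nodes above; each value
            # would be applied with: nodetool -h <host> move <token>
            print("node%d: %d" % (i + 1, token))

Each move streams data between nodes and adds load of its own, which is why the advice in the thread (run repair on every node first, then re-balance one node at a time) is the safer order.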