What's your replication factor? Can you check nodetool tpstats and nodetool
netstats to see if you are getting more mutations on these nodes?
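
In case it helps, here is a rough sketch of collecting both outputs from a node so they can be compared across the cluster. The function names are made up for illustration, and it assumes nodetool is on the PATH and can reach each node's JMX port. The MutationStage line in the tpstats output is the one to compare between the two problem nodes and a healthy one.

```python
import subprocess


def nodetool_commands(host):
    """Build the two nodetool invocations for one node (hypothetical helper)."""
    return [
        ["nodetool", "-h", host, "tpstats"],
        ["nodetool", "-h", host, "netstats"],
    ]


def collect_stats(host, run=subprocess.check_output):
    """Run both commands against `host`; return {subcommand: output text}.

    `run` is injectable so the function can be exercised without a cluster.
    """
    results = {}
    for cmd in nodetool_commands(host):
        output = run(cmd)
        # check_output returns bytes unless told otherwise; normalise to text.
        if isinstance(output, bytes):
            output = output.decode("utf-8", errors="replace")
        results[cmd[-1]] = output
    return results
```

Calling collect_stats() once per address from the ring output and comparing the results side by side should show whether the two nodes are handling noticeably more writes than the rest.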

On Jul 16, 2013, at 3:18 PM, Jure Koren <[email protected]> wrote:

> Hi C* user list,
> 
> I have a curious recurring problem with Cassandra 1.2 and what seems like a 
> GC issue.
> 
> The cluster looks reasonably well balanced. All nodes are running HotSpot JVM 
> 1.6.0_31-b04 and Cassandra 1.2.3.
> 
> Address   Rack  Status  State   Load      Owns
> 10.2.3.6  RAC6  Up      Normal  15.13 GB  12.71%
> 10.2.3.5  RAC5  Up      Normal  16.87 GB  13.57%
> 10.2.3.8  RAC8  Up      Normal  13.27 GB  13.71%
> 10.2.3.1  RAC1  Up      Normal  16.46 GB  14.08%
> 10.2.3.7  RAC7  Up      Normal  11.59 GB  14.34%
> 10.2.3.2  RAC2  Up      Normal  23.15 GB  15.12%
> 10.2.3.4  RAC4  Up      Normal  16.52 GB  16.47%
> 
> Every now and then (currently about once a month), two nodes (always the 
> same two) need to be restarted after they start eating all available CPU 
> cycles and read and write latencies increase dramatically. A restart fixes 
> this every time.
> 
> The only metric that significantly deviates from the average for all nodes 
> shows GC doing something: http://bou.si/rest/parnew.png
> 
> Is there a way to debug this? After searching online, it appears that nobody 
> has really solved this problem, and I have no idea what could cause such 
> behaviour in just two particular cluster nodes.
> 
> I'm now thinking of decommissioning the problematic nodes and bootstrapping 
> them anew, but can't decide whether this would help.
> 
> Thanks in advance for any insight anyone might offer,
> 
> --
> Jure Koren, DevOps
> http://www.zemanta.com/
