Hi C* user list,

I have a curious recurring problem with Cassandra 1.2 that looks like a GC
issue.

The cluster looks somewhat well balanced; all nodes are running HotSpot JVM
1.6.0_31-b04 and Cassandra 1.2.3.

Address         Rack        Status State   Load            Owns       
10.2.3.6        RAC6        Up     Normal  15.13 GB        12.71%     
10.2.3.5        RAC5        Up     Normal  16.87 GB        13.57%     
10.2.3.8        RAC8        Up     Normal  13.27 GB        13.71%     
10.2.3.1        RAC1        Up     Normal  16.46 GB        14.08%     
10.2.3.7        RAC7        Up     Normal  11.59 GB        14.34%     
10.2.3.2        RAC2        Up     Normal  23.15 GB        15.12%     
10.2.3.4        RAC4        Up     Normal  16.52 GB        16.47%     

Every now and then (currently about once a month), two nodes (always the
same two) start eating all available CPU cycles, and their read and write
latencies increase dramatically. They then need to be restarted; a restart
fixes this every time.

The only metric that significantly deviates from the average for all nodes 
shows GC doing something: http://bou.si/rest/parnew.png

Is there a way to debug this? After searching online, it appears that nobody
has really solved this problem, and I have no idea what could cause such
behaviour in just two particular cluster nodes.

I'm now thinking of decommissioning the problematic nodes and bootstrapping
them anew, but I can't decide whether this would help.

Thanks in advance for any insight anyone might offer,

-- 
Jure Koren, DevOps
http://www.zemanta.com/
