I have a three-node cluster running 1.0.2. Today I hit a very strange problem: two of the Cassandra nodes (call them B and C) suddenly started using a lot of CPU, and it turned out that for some reason the "java" binary just would not run. I was using OpenJDK 1.6.0_18, so I switched to the Sun JDK, which works okay.
After that, node A stopped working with the same problem. I installed the Sun JDK there too, and then it was okay. But minutes later B stopped working again: about 5-10 minutes after Cassandra starts, it stops responding to connections. I can't reach port 9160, and nodetool doesn't return either.

I have turned on DEBUG logging and don't see much useful information. The last lines in the log on node B are:

DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 65) resolving 2 responses
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 106) digests verified
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 110) resolve: 0 ms.
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line 694) Read: 5 ms.
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3

This problem is really driving me crazy, since I don't know what happened or how to debug it. I tried killing node A and restarting it, and then node B halted; after I restarted B, node C went down.

One thing that may be related: the log time on node B does not match the system time (A and C are okay). The date command on node B shows:

Sun Jul 1 23:10:57 CST 2012 (system time)

but, as you may have noticed, the timestamps in the log messages above are "2012-07-01 07:45:XX". The system time is right; I am just not sure why Cassandra's log file shows the wrong time. I don't recall Cassandra having any timezone settings.
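To put a number on the mismatch, here is a rough sketch in Python that computes the gap between the last log line and the `date` output. The two timestamps are copied from above; the function name `gap_report` is just made up for illustration. The check assumes both clocks were read at roughly the same moment, which may not be exactly true, and that timezone offsets are multiples of 15 minutes, so treat the result as a hint only.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Values copied from the post: last log line on node B and the `date` output.
last_log = datetime.strptime("2012-07-01 07:45:42", FMT)
system_now = datetime.strptime("2012-07-01 23:10:57", FMT)


def gap_report(log_time, wall_time):
    """Return the gap and whether it could be a pure timezone offset."""
    gap = wall_time - log_time
    # Assumption: UTC offsets in practice are multiples of 15 minutes,
    # and both readings were taken at about the same moment.
    looks_like_tz = gap.total_seconds() % (15 * 60) == 0
    return gap, looks_like_tz


gap, looks_like_tz = gap_report(last_log, system_now)
print("gap between log and system time:", gap)
print("could be a pure timezone offset:", looks_like_tz)
```

If the gap is not a clean multiple of 15 minutes, a timezone setting alone probably does not explain it, and it may simply be that the node stopped writing log lines at that point.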