adjust the timezone of java by -Duser.timezone and the timezone of cassandra is the same with system(Debian 6.0).
after restart cassandra I found the following error message in the log file of node B. after about 2 minutes later, node C stop responding.... the error log of node B: Thrift transport error occurred during processing of message. org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129) at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) the log info in node C: DEBUG [MutationStage:25] 2012-07-01 23:29:42,909 RowMutationVerbHandler.java (line 60) RowMutation(keyspace='spark', key='39373438366235383638373631353532643133393334633435326333323634373131656462306139', modifications=[ColumnFamily(permacache [76616c7565:false:67906@1341156582948365,])]) applied. Sending response to 79529@/192.168.1.129 DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 CassandraServer.java (line 523) insert DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 172) Mutations/ConsistencyLevel are [RowMutation(keyspace='spark', key='636f6d6d656e74735f706172656e74735f32373232343938', modifications=[ColumnFamily(permacache [76616c7565:false:6@1341156582953843 ,])])]/QUORUM DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to /192.168.1.40 DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to /192.168.1.129 DEBUG [Thread-8] 2012-07-01 23:29:42,913 IncomingTcpConnection.java (line 116) Version is now 3 DEBUG [RequestResponseStage:27] 2012-07-01 23:29:42,913 ResponseVerbHandler.java (line 44) Processing response on a callback from 50050@/192.168.1.129 DEBUG [Thread-12] 2012-07-01 23:29:42,914 IncomingTcpConnection.java (line 116) Version is now 3 DEBUG [RequestResponseStage:29] 2012-07-01 23:29:42,914 ResponseVerbHandler.java (line 44) Processing response on a callback from 50051@/192.168.1.40 DEBUG [Thread-11] 2012-07-01 23:29:42,939 IncomingTcpConnection.java (line 116) Version is now 3 On Sun, Jul 1, 2012 at 11:14 PM, Yan Chunlu <springri...@gmail.com> wrote: > I have a three node cluster running 1.0.2, today there's a very strange > problem that suddenly two of cassandra node(let's say B and C) was costing > a lot of cpu, turned out for some reason the "java" binary just dont > run.... I am using OpenJDK1.6.0_18, so I switched to "sun jdk", which works > okay. > > after that node A stop working... same problem, I install "sun jdk", then > it's okay. but minutes later, B stop working again, about 5-10 minutes > later after the cassandra started, it stop responding connections, I can't > access 9160 and nodetool dont return either. > > I have turned on DEBUG and dont see much useful information, the last rows > on node B are as belows: > DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java > (line 65) resolving 2 responses > DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java > (line 106) digests verified > DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java > (line 110) resolve: 0 ms. > DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line > 694) Read: 5 ms. > DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line > 116) Version is now 3 > DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line > 116) Version is now 3 > > > this problem is really driving me crazy since I just dont know what > happened, and how to debug it, I tried to kill node A and restart it, then > node B halt, after I restart B, then node C goes down...... > > > one thing may related is that the log time on node B is not the same with > the system time(A and C are okay). > > while date on node B shows: > Sun Jul 1 23:10:57 CST 2012 (system time) > > but you may noticed that the time is "2012-07-01 07:45:XX" in those above > log message. the system time is right, just not sure why cassandra's log > file shows the wrong time, I didn't recall cassandra have timezone > settings..... > > > > >