Huge thanks! It is the leap second problem! Finally I can go to bed...

On Mon, Jul 2, 2012 at 12:11 AM, David Daeschler <david.daesch...@gmail.com> wrote:
> This looks like the problem a bunch of us were having yesterday that
> isn't cleared without a reboot or a date command. It seems to be
> related to the leap second that was added between the 30th June and
> the 1st of July.
>
> See the mailing list thread with subject "High CPU usage as of 8pm
> eastern time"
>
> If you are seeing high CPU usage and a stall after restarting
> cassandra still, and you are on Linux, try:
>
> date; date `date +"%m%d%H%M%C%y.%S"`; date;
>
> In a terminal and see if everything starts working again.
>
> I hope this helps.
> --
> David Daeschler
>
> On Sun, Jul 1, 2012 at 11:33 AM, Yan Chunlu <springri...@gmail.com> wrote:
> > adjust the timezone of java by -Duser.timezone and the timezone of
> > cassandra is the same with system(Debian 6.0).
> >
> > after restart cassandra I found the following error message in the log
> > file of node B. after about 2 minutes later, node C stop responding....
> >
> > the error log of node B:
> >
> > Thrift transport error occurred during processing of message.
> > org.apache.thrift.transport.TTransportException
> > at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> > at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> > at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> > at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> > at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> > at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> > at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
> > at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.java:662)
> >
> > the log info in node C:
> >
> > DEBUG [MutationStage:25] 2012-07-01 23:29:42,909 RowMutationVerbHandler.java (line 60) RowMutation(keyspace='spark', key='39373438366235383638373631353532643133393334633435326333323634373131656462306139', modifications=[ColumnFamily(permacache [76616c7565:false:67906@1341156582948365,])]) applied.
> > Sending response to 79529@/192.168.1.129
> > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 CassandraServer.java (line 523) insert
> > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 172) Mutations/ConsistencyLevel are [RowMutation(keyspace='spark', key='636f6d6d656e74735f706172656e74735f32373232343938', modifications=[ColumnFamily(permacache [76616c7565:false:6@1341156582953843,])])]/QUORUM
> > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to /192.168.1.40
> > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to /192.168.1.129
> > DEBUG [Thread-8] 2012-07-01 23:29:42,913 IncomingTcpConnection.java (line 116) Version is now 3
> > DEBUG [RequestResponseStage:27] 2012-07-01 23:29:42,913 ResponseVerbHandler.java (line 44) Processing response on a callback from 50050@/192.168.1.129
> > DEBUG [Thread-12] 2012-07-01 23:29:42,914 IncomingTcpConnection.java (line 116) Version is now 3
> > DEBUG [RequestResponseStage:29] 2012-07-01 23:29:42,914 ResponseVerbHandler.java (line 44) Processing response on a callback from 50051@/192.168.1.40
> > DEBUG [Thread-11] 2012-07-01 23:29:42,939 IncomingTcpConnection.java (line 116) Version is now 3
> >
> > On Sun, Jul 1, 2012 at 11:14 PM, Yan Chunlu <springri...@gmail.com> wrote:
> >>
> >> I have a three node cluster running 1.0.2, today there's a very strange
> >> problem that suddenly two of cassandra node(let's say B and C) was
> >> costing a lot of cpu, turned out for some reason the "java" binary just
> >> dont run.... I am using OpenJDK1.6.0_18, so I switched to "sun jdk",
> >> which works okay.
> >>
> >> after that node A stop working... same problem, I install "sun jdk",
> >> then it's okay.
> >> but minutes later, B stop working again, about 5-10 minutes later
> >> after the cassandra started, it stop responding connections, I can't
> >> access 9160 and nodetool dont return either.
> >>
> >> I have turned on DEBUG and dont see much useful information, the last
> >> rows on node B are as belows:
> >> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 65) resolving 2 responses
> >> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 106) digests verified
> >> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 110) resolve: 0 ms.
> >> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line 694) Read: 5 ms.
> >> DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3
> >> DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3
> >>
> >> this problem is really driving me crazy since I just dont know what
> >> happened, and how to debug it, I tried to kill node A and restart it,
> >> then node B halt, after I restart B, then node C goes down......
> >>
> >> one thing may related is that the log time on node B is not the same
> >> with the system time(A and C are okay).
> >>
> >> while date on node B shows:
> >> Sun Jul 1 23:10:57 CST 2012 (system time)
> >>
> >> but you may noticed that the time is "2012-07-01 07:45:XX" in those
> >> above log message. the system time is right, just not sure why
> >> cassandra's log file shows the wrong time, I didn't recall cassandra
> >> have timezone settings.....
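
For anyone reading this thread later, the date one-liner quoted above can be unpacked into a small shell sketch. It assumes a POSIX shell and a date(1) that accepts the MMDDhhmmCCYY.ss form for setting the clock, as GNU date on Linux does. The function names and the APPLY variable are made up for this illustration and are not part of any tool. By default it only prints the command it would run.

```shell
#!/bin/sh
# Sketch only: clock_arg, reset_clock and APPLY are hypothetical names.

# Print the current time in the MMDDhhmmCCYY.ss form that date(1) accepts
# when setting the clock. %C is the century and %y the two-digit year,
# so "%C%y" together give the four-digit year.
clock_arg() {
    date +"%m%d%H%M%C%y.%S"
}

# Set the system clock to the time it already shows. This is a dry run
# unless APPLY=1 is set; actually setting the clock requires root.
reset_clock() {
    arg=$(clock_arg)
    if [ "${APPLY:-0}" = "1" ]; then
        date "$arg"
    else
        echo "would run: date $arg"
    fi
}

reset_clock
```

Running it with APPLY=1 as root sets the clock to the time it already shows, which is what the workaround in the quoted message does. According to that message, this was enough to clear the stall without a reboot.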