Thrift and ClientState are both unrelated to hints. What do you see in the logs after "Started hinted handoff for host:..." from HintedHandoffManager?
It should either have an error message or something along the lines of "Finished hinted handoff of:..." Were there any schema updates that preceded this happening?

As for the thrift stuff, which rpc_server_type are you using?

On Wed, Aug 7, 2013 at 6:14 AM, David McNelis <dmcne...@gmail.com> wrote:

> Morning folks,
>
> For the last couple of days all of my nodes (17, all running 1.2.8) have
> been stuck at various percentages of completion for compacting system.hints.
> I've tried restarting the nodes (including a full rolling restart of the
> cluster) to no avail.
>
> When I turn on Debugging I am seeing this error on all of the nodes
> constantly:
>
> DEBUG 09:03:21,999 Thrift transport error occurred during processing of message.
> org.apache.thrift.transport.TTransportException
>     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>     at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>     at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>     at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:724)
>
> When I turn on tracing, I see that shortly after this error there is a
> message similar to:
>
> TRACE 09:03:22,000 ClientState removed for socket addr /10.55.56.211:35431
>
> The IP in this message is sometimes a client machine, sometimes another
> cassandra node with no processes other than C* running on it (which I think
> rules out an issue with a particular client library doing something funny
> with Thrift).
>
> While I wouldn't expect a Thrift issue to cause problems with compaction,
> I'm out of other ideas at the moment. Anyone have any thoughts they could
> share?
>
> Thanks,
> David
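
If it helps with the log question at the top of this reply, here is a rough sketch for pulling out what follows each "Started hinted handoff for host" line. It only matches the two message prefixes quoted above as plain substrings; the helper names, the context window size, and passing the log path on the command line are all assumptions made for illustration, not anything Cassandra provides.

```python
import sys

# Message prefixes quoted in this thread; everything after them is left unparsed.
START = "Started hinted handoff for host"
FINISH = "Finished hinted handoff of"


def lines_after_start(lines, context=5):
    """For each START line, return that line plus up to `context` lines after it."""
    lines = [line.rstrip("\n") for line in lines]
    windows = []
    for i, line in enumerate(lines):
        if START in line:
            windows.append(lines[i:i + 1 + context])
    return windows


def count_markers(lines):
    """Count how many lines contain the START and FINISH prefixes."""
    lines = list(lines)
    started = sum(START in line for line in lines)
    finished = sum(FINISH in line for line in lines)
    return started, finished


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python hint_scan.py /path/to/system.log
    with open(sys.argv[1], errors="replace") as fh:
        log = fh.readlines()
    started, finished = count_markers(log)
    print(f"started={started} finished={finished}")
    for window in lines_after_start(log):
        print("\n".join(window))
        print("-" * 40)
```

More "started" than "finished" lines would only be a hint that some handoffs never completed; the lines printed after each start are where an error message would show up if there is one.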