Nate,

We had a node that was flaking on us last week, and a lot of handoffs to that node failed. We ended up decommissioning it entirely. I can't find the actual error we were getting at the time (the logs have been rotated out), but we're not currently seeing any errors there.
We haven't had any schema updates recently, and we are using the sync rpc server. We had hsha turned on for a while, but we were getting a bunch of transport frame size errors.

On Wed, Aug 7, 2013 at 1:55 PM, Nate McCall <zznat...@gmail.com> wrote:
> Thrift and ClientState are both unrelated to hints.
>
> What do you see in the logs after "Started hinted handoff for
> host:..." from HintedHandoffManager?
>
> It should either have an error message or something along the lines of
> "Finished hinted handoff of:..."
>
> Were there any schema updates that preceded this happening?
>
> As for the thrift stuff, which rpc_server_type are you using?
>
> On Wed, Aug 7, 2013 at 6:14 AM, David McNelis <dmcne...@gmail.com> wrote:
> > Morning folks,
> >
> > For the last couple of days all of my nodes (17, all running 1.2.8) have
> > been stuck at various percentages of completion for compacting system.hints.
> > I've tried restarting the nodes (including a full rolling restart of the
> > cluster) to no avail.
> >
> > When I turn on Debugging I am seeing this error on all of the nodes
> > constantly:
> >
> > DEBUG 09:03:21,999 Thrift transport error occurred during processing of message.
> > org.apache.thrift.transport.TTransportException
> >     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> >     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> >     at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> >     at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> >     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> >     at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> >     at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> >     at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> >     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
> >     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:724)
> >
> > When I turn on tracing, I see that shortly after this error there is a
> > message similar to:
> > TRACE 09:03:22,000 ClientState removed for socket addr /10.55.56.211:35431
> >
> > The IP in this message is sometimes a client machine, sometimes another
> > cassandra node with no processes other than C* running on it (which I think
> > rules out an issue with a particular client library doing something funny
> > with Thrift).
> >
> > While I wouldn't expect a Thrift issue to cause problems with compaction,
> > I'm out of other ideas at the moment. Anyone have any thoughts they could
> > share?
> >
> > Thanks,
> > David
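
The log check Nate describes above (find each "Started hinted handoff for host:..." line and see whether a "Finished hinted handoff of:..." line or an error follows) could be scripted roughly as below. This is only a sketch under assumptions: the two marker strings are taken from the wording quoted in this thread and may not match the exact log text in a given Cassandra version, the function name `unfinished_handoffs` and the `context` parameter are hypothetical, and the pairing is approximate if several handoffs run at the same time.

```python
from typing import Iterable, List, Tuple

# Marker substrings as quoted in the thread; the exact log wording is an assumption.
STARTED = "Started hinted handoff for host:"
FINISHED = "Finished hinted handoff of"


def unfinished_handoffs(
    lines: Iterable[str], context: int = 5
) -> List[Tuple[str, List[str]]]:
    """Return each 'Started' line that is not followed by a 'Finished' line
    before the next 'Started' line (or end of input), together with up to
    `context` log lines that came right after it."""
    results: List[Tuple[str, List[str]]] = []
    current = None            # the 'Started' line currently being tracked
    trailing: List[str] = []  # lines seen since that 'Started' line
    for raw in lines:
        line = raw.rstrip("\n")
        if STARTED in line:
            if current is not None:
                # A new handoff started before the previous one finished.
                results.append((current, trailing))
            current, trailing = line, []
        elif FINISHED in line:
            # The tracked handoff completed; stop tracking it.
            current, trailing = None, []
        elif current is not None and len(trailing) < context:
            trailing.append(line)
    if current is not None:
        results.append((current, trailing))
    return results
```

Passing an open log file (for example `unfinished_handoffs(open(path))`, where `path` is whatever your system log location is) yields the handoff attempts worth reading more closely, along with the lines that immediately followed them.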