Hi all, I'm running into a weird error on Cassandra 1.0.7. As my cluster's load gets heavier, many of the nodes seem to hit the same error at around the same time. This causes the MutationStage to back up and never clear. The only way to recover the cluster is to kill all the nodes and start them up again. The error is below, and it repeats continuously until I kill the Cassandra process.
ERROR [ReplicateOnWriteStage:57] 2012-03-21 14:02:05,099 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ReplicateOnWriteStage:57,5,main]
java.lang.RuntimeException: java.util.concurrent.TimeoutException
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1227)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.TimeoutException
    at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:301)
    at org.apache.cassandra.service.StorageProxy$7$1.runMayThrow(StorageProxy.java:544)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1223)
    ... 3 more