I'm running Cassandra 1.2.8 in a cluster with 45 nodes across three racks. All nodes are well behaved except one. Whenever I start this node, it starts churning CPU. Running nodetool tpstats, I notice that the number of pending GossipStage tasks is constantly increasing [1]. Looking at nodetool gossipinfo, I notice that this node has updated to the latest schema hash, but it thinks the other nodes in the cluster are still on the older version. I've tried draining, decommissioning, wiping the node's data, bootstrapping, and repairing the node. However, the node just started doing the same thing again.
Has anyone run into this issue before? Can anyone provide any insight into why this node is the only one in the cluster having problems? Are there any easy fixes?

Thank you,
Faraaz

[1]
$ /cassandra/bin/nodetool tpstats
Pool Name                   Active   Pending      Completed   Blocked  All time blocked
ReadStage                        0         0              8         0                 0
RequestResponseStage             0         0          49198         0                 0
MutationStage                    0         0         224286         0                 0
ReadRepairStage                  0         0              0         0                 0
ReplicateOnWriteStage            0         0              0         0                 0
GossipStage                      1      2213             18         0                 0
AntiEntropyStage                 0         0              0         0                 0
MigrationStage                   0         0             72         0                 0
MemtablePostFlusher              0         0            102         0                 0
FlushWriter                      0         0             99         0                 0
MiscStage                        0         0              0         0                 0
commitlog_archiver               0         0              0         0                 0
InternalResponseStage            0         0             19         0                 0
HintedHandoff                    0         0              2         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
BINARY                       0
READ                         0
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
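
In case it helps anyone reproduce the observation, here is a minimal sketch of how the pending GossipStage count could be pulled out of the tpstats text so it can be sampled over time. It assumes the output layout shown in [1] (a pool name followed by five integer columns); the function names parse_tpstats and pending_tasks are hypothetical helpers written for this post, not part of Cassandra or nodetool.

```python
import re

# Assumption: each thread-pool row is a name followed by five integers
# (Active, Pending, Completed, Blocked, All time blocked), as in [1].
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$")


def parse_tpstats(text):
    """Return {pool_name: {column: int}} for every thread-pool row in text."""
    pools = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            # Skips the header, blank lines, and the two-column
            # "Message type / Dropped" rows.
            continue
        name = match.group(1)
        active, pending, completed, blocked, all_time = map(int, match.groups()[1:])
        pools[name] = {
            "active": active,
            "pending": pending,
            "completed": completed,
            "blocked": blocked,
            "all_time_blocked": all_time,
        }
    return pools


def pending_tasks(text, pool="GossipStage"):
    """Return the Pending count for one pool (hypothetical helper)."""
    return parse_tpstats(text)[pool]["pending"]
```

Calling pending_tasks on the text captured from successive nodetool tpstats runs gives one number per sample; a value that keeps rising between samples matches the symptom described above.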