We've been working on tracking down the causes of nodes in the cluster incorrectly marking other, healthy nodes down. We've identified three scenarios. The first two deal with the Gossip thread blocking while processing a state change, preventing subsequent heartbeats from being processed:
1. Write activity + cluster membership changes (CASSANDRA-6297). The Gossip stage would block while flushing system.peers, which could get backed up behind flushes of user tables, since by default there is one flush thread per configured data directory. (Thus, increasing memtable_flush_writers in cassandra.yaml can be an effective workaround, especially if you are on SSDs, where the cost of the increased contention is low; see the P.S. below for a sketch.)

2. Cluster membership changes with many keyspaces configured (CASSANDRA-6244). Computing the ranges to be transferred between nodes is linear in the number of keyspaces, since that is where replication options are configured. I suspect that enabling vnodes exacerbates this as well.

We're still analyzing the third:

3. Large clusters (hundreds to thousands of nodes) with vnodes enabled show FD false positives even without cluster membership changes (CASSANDRA-6127).

Fixes for (1) and (2) are committed and will be in 1.2.12 and 2.0.3. We can reproduce (3) and hope to have a resolution soon. In the meantime, caution is advised when deploying vnode-enabled clusters, since other pressures on the system could make this a problem on smaller clusters as well.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
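P.S. A minimal cassandra.yaml sketch of the workaround for (1). The value shown is an assumption for illustration, not a recommendation; tune it to your number of data directories, disk type, and core count.

    # Number of memtable flush writer threads (by default, one per configured
    # data directory).  Raising this makes it less likely that a small
    # system.peers flush queues behind large user-table flushes and stalls
    # the Gossip stage.  The value below is illustrative only; SSDs tolerate
    # more concurrent writers since the added I/O contention is cheap.
    memtable_flush_writers: 4

The setting is read at startup, so expect to apply it with a rolling restart.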