We've been working on tracking down the causes of nodes in the cluster
incorrectly marking other, healthy nodes down.  We've identified three
scenarios.  The first two deal with the Gossip thread blocking while
processing a state change, preventing subsequent heartbeats from being
processed:

1. Write activity + cluster membership changes (CASSANDRA-6297).  The
Gossip stage would block while flushing system.peers, which could get
backed up behind flushes of user tables.  By default, there is one
flush thread per configured data directory.  (Thus, increasing
memtable_flush_writers in cassandra.yaml can be an effective
workaround, especially on SSDs, where the added contention will be
low; see the example after the list below.)

2. Cluster membership changes with many keyspaces configured
(CASSANDRA-6244).  Computing the ranges to be transferred between
nodes is linear in the number of keyspaces, since replication options
are configured per keyspace.  I suspect that enabling vnodes will
exacerbate this as well.

We're still analyzing the third:

3. Large (hundreds to thousands of nodes) clusters with vnodes enabled
show failure detector (FD) false positives even without cluster
membership changes (CASSANDRA-6127).
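
As a sketch of the workaround for (1): on a node with SSDs, raising
memtable_flush_writers in cassandra.yaml along these lines should
help (the value of 4 is illustrative; tune it to your hardware):

    # Allow several memtables to flush concurrently, rather than the
    # default of one flush writer per data directory.
    memtable_flush_writers: 4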

Fixes for (1) and (2) are committed and will be in 1.2.12 and 2.0.3.
We can reproduce (3) and hope to have a resolution soon.  In the
meantime, caution is advised when deploying vnode-enabled clusters,
since other pressures on the system could make this a problem with
smaller clusters as well.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
