Hi list,

We are seeing the following behaviour when performing a rolling restart:
On the node I need to restart:
* I run 'nodetool drain'
* Then 'service cassandra restart'

So far so good: the load increase on the other 5 nodes is negligible, and the node is generally out of service only for the duration of the restart (i.e. a cassandra.yaml update).

When the node comes back up and switches the native transport on, I start seeing lots of read timeouts in our various services:

com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)

The restarting node does show a huge spike in system load, because of hints and compactions; nevertheless I don't notice any load increase on the other 5 nodes.

Specs:
- 6-node cluster on Cassandra 3.0.6, keyspace RF=3
- Java driver 3.5.1:
  - DefaultRetryPolicy
  - default LoadBalancingPolicy (which should be DCAwareRoundRobinPolicy)

QUESTIONS:
- How come a single node impacts the whole cluster?
- Is there a way to further delay the native transport startup?
- Any hints on troubleshooting this further?

Thanks
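
For reference, the per-node sequence I run is just the two commands above; the final status check is an extra step I use to confirm the node has rejoined before moving to the next one (the service name may differ per distribution):

```shell
# Per-node rolling-restart sequence described above.
nodetool drain              # flush memtables and stop accepting new writes
service cassandra restart   # restart to pick up the cassandra.yaml change
nodetool status             # optional: wait for the node to show Up/Normal (UN)
```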