Hi list,

We are seeing the following behaviour when performing a rolling restart:
On the node I need to restart:
* I run 'nodetool drain'
* Then 'service cassandra restart'

So far so good: the load increase on the other 5 nodes is negligible, and the node is generally out of service only for the duration of the restart (i.e. a cassandra.yaml update).

When the node comes back up and switches the native transport on, I start seeing lots of read timeouts in our various services:

com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)

The restarting node does show a huge spike in system load, because of hints and compactions; nevertheless I don't notice any load increase on the other 5 nodes.

Specs:
- 6-node cluster on Cassandra 3.0.6, keyspace RF=3
- Java driver 3.5.1:
  - DefaultRetryPolicy
  - default LoadBalancingPolicy (which should be DCAwareRoundRobinPolicy)

QUESTIONS:
- How come a single node impacts the whole cluster?
- Is there a way to further delay the native transport startup?
- Any hints on troubleshooting this further?

Thanks
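
For reference, the per-node sequence I run is just the two commands above; the final status check is an extra step I use to confirm the node has rejoined before moving to the next one (the service name may differ per distribution):

```shell
# Per-node rolling-restart sequence described above.
nodetool drain              # flush memtables and stop accepting new writes
service cassandra restart   # restart to pick up the cassandra.yaml change
nodetool status             # optional: wait for the node to show Up/Normal (UN)
```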