Hello,

I have a cassandra cluster running 3.0.3 and am seeing some strange
behavior that I cannot explain when restarting nodes. The cluster is
currently set up in a single datacenter and consists of 55 nodes.
I am currently in the process of restarting nodes in the cluster, and have
noticed that after restarting the cassandra process with `service cassandra
stop; service cassandra start`, when the node comes back and I run `nodetool
status`, there is usually a non-zero number of nodes in the rest of the
cluster that are marked as DN. If I go to another node in the cluster,
from its perspective all nodes, including the restarted one, are marked as UN.
It seems to take ~15 to 20 minutes before the restarted node is updated to
show all nodes as UN. During that window, writes and reads to the
cluster appear to be degraded, and they do not recover unless I stop the
cassandra process again or wait for all nodes to be marked as UN. The
cluster also has 3 seed nodes, which are up and available the whole time
during this process.
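For reference, here is roughly how I'm checking for DN nodes after a restart. The addresses and the sample `nodetool status` output below are placeholders in the shape that 3.0.x prints, not output from my cluster:

```shell
# Count nodes reported Down/Normal (DN) in `nodetool status` output.
# Status lines start with a two-letter state code (UN, DN, UJ, ...).
count_dn() {
  grep -c '^DN '
}

# Sample output in the shape nodetool prints (addresses are placeholders):
sample_status='Datacenter: dc1
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load     Tokens  Owns  Host ID  Rack
UN  10.0.0.1     1.2 GB   256     ?     aaaa     r1
DN  10.0.0.2     1.1 GB   256     ?     bbbb     r1
UN  10.0.0.3     1.3 GB   256     ?     cccc     r1'

printf '%s\n' "$sample_status" | count_dn   # prints 1
```

On the freshly restarted node this count stays above zero for the 15-20 minutes described above, while the same check run from any other node returns 0 the whole time.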

I have also tried running `nodetool gossipinfo` on the restarted node, and
according to the output all nodes have a status of NORMAL. Has anyone seen
this before, and is there anything I can do to fix or reduce the impact of
restarting a cassandra node?
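For completeness, this is roughly how I'm scanning the gossip state. The awk pattern and the sample below are my approximation of the 3.0.x `gossipinfo` output format (endpoint address on its own line, then indented key:value fields); addresses and values are placeholders:

```shell
# Print any endpoint whose gossip STATUS is not NORMAL.
# Endpoint headers start with "/"; STATUS lines look like "STATUS:<gen>:NORMAL,<token>".
non_normal() {
  awk '/^\// {ep=$1} /STATUS/ && !/NORMAL/ {print ep}'
}

sample_gossip='/10.0.0.1
  generation:1460000000
  heartbeat:12345
  STATUS:16:NORMAL,-9223372036854775808
/10.0.0.2
  generation:1460000001
  heartbeat:12346
  STATUS:20:shutdown,true'

printf '%s\n' "$sample_gossip" | non_normal   # prints /10.0.0.2
```

In my case this prints nothing on the restarted node, which is what makes the DN entries in `nodetool status` so confusing: gossip claims everyone is NORMAL while status still shows nodes down.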

Thanks,
Andrew Jorgensen
@ajorgensen
