Could be something like https://issues.apache.org/jira/browse/CASSANDRA-14358
Hard to say after the fact.

On Fri, Jul 19, 2019 at 8:49 AM Rahul Reddy <rahulreddy1...@gmail.com> wrote:
> Hello,
>
> We have 6 nodes each in two data centers, us-east-1 and us-west-2, with RF 3
> and consistency level set to LOCAL_QUORUM, using the gossiping snitch. All our
> instances are c5.2xlarge, and data files and commit logs are stored on gp2 EBS.
> The c5 instance type had a bug for which AWS asked us to set nvme_timeout to a
> higher number in /etc/grub.conf. After setting the parameter, we ran nodetool
> drain and rebooted the node in east.
>
> The instance came up, but Cassandra didn't come up normally; we had to start
> Cassandra manually. Cassandra came up, but it showed other instances as down,
> even though we didn't reboot those other nodes. The same was observed on one
> other node. How could that happen? We don't see any errors in system.log, which
> is set to INFO. Without any intervention, gossip settled in about 10 minutes
> and the entire cluster became normal.
>
> We tried the same thing in west, and it happened again.
>
> I'm concerned about how to check what caused this, and if a reboot happens
> again, how to avoid it. If I just stop Cassandra instead of rebooting, I don't
> see this issue.
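For reference, a sketch of the drain/stop/reboot sequence the thread describes. This is not an official procedure; the service name `cassandra`, the GRUB file paths, and the `grub2-mkconfig` invocation are assumptions that vary by distribution:

```shell
# Sketch only: service name, GRUB paths, and regeneration command are
# assumptions; adjust for your AMI/distribution.

# 1. Per AWS guidance for NVMe-backed instance types, raise the NVMe I/O
#    timeout via a kernel boot parameter. Append to the kernel command line
#    in /etc/default/grub (or /etc/grub.conf on older AMIs):
#      nvme_core.io_timeout=4294967295
#    then regenerate the GRUB config, e.g.:
# sudo grub2-mkconfig -o /boot/grub2/grub.cfg

# 2. Before rebooting, drain the node so it flushes memtables and stops
#    accepting client connections, then stop Cassandra cleanly so the OS
#    shutdown doesn't kill it mid-write:
nodetool drain
sudo systemctl stop cassandra

# 3. Reboot, then start Cassandra explicitly and watch gossip converge:
sudo reboot
# ...after the instance is back up:
sudo systemctl start cassandra
nodetool status        # per-node up/down view from this node
nodetool gossipinfo    # raw gossip state, useful for spotting stale entries
```

Draining and stopping before the reboot matches your observation that a plain STOP avoids the problem: it removes the window where a node dies mid-shutdown and its peers carry stale gossip state until failure detection catches up.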