“AWS asked to set nvme_timeout to a higher number in /etc/grub.conf.”

Did you ask AWS whether setting a higher value is a real solution to the bug? Is 
there no patch available to address it? Just curious to know.

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com]
Sent: Friday, July 19, 2019 10:49 AM
To: user@cassandra.apache.org
Subject: Rebooting one Cassandra node caused all the application nodes go down

Hello,

We have 6 nodes each in 2 data centers, us-east-1 and us-west-2, with RF 3, CL 
set to LOCAL_QUORUM, and the gossiping snitch. All our instances are c5.2xlarge, 
and data files and commit logs are stored on gp2 EBS. The C5 instance type had a 
bug for which AWS asked us to set nvme_timeout to a higher number in 
/etc/grub.conf. After setting the parameter, we ran nodetool drain and rebooted 
the node in us-east-1.
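For anyone following along, a minimal sketch of how such a kernel-parameter change is typically applied — this assumes a grub2-based distribution with /etc/default/grub (on older systems the kernel line may live directly in /etc/grub.conf, as in the original message), and uses the nvme_core.io_timeout parameter that AWS documents for NVMe EBS volumes on Nitro instances:

```shell
# Sketch: raise the NVMe I/O timeout (assumes grub2 and /etc/default/grub;
# 4294967295 is the maximum value AWS documents for NVMe EBS volumes).
sudo sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 nvme_core.io_timeout=4294967295"/' /etc/default/grub

# Regenerate the grub config; the change takes effect on the next reboot:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

# After rebooting, verify the running value:
cat /sys/module/nvme_core/parameters/io_timeout
```

The exact file paths and grub command vary by distribution, so treat this as an illustration rather than a drop-in recipe.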

The instance came up, but Cassandra didn't come up normally; we had to start 
Cassandra manually. Cassandra then came up, but it showed other instances as 
down. Even though we didn't reboot them, the same down state was observed on one 
other node. How could that happen? We don't see any errors in system.log, which 
is set to INFO.
Without any intervention, gossip settled in 10 minutes and the entire cluster 
became normal.

We tried the same thing in us-west-2 and it happened again.



I'm concerned about how to check what caused this, and how to avoid it if a 
reboot happens again.
If I just STOP Cassandra instead of rebooting, I don't see this issue.
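For comparison, the sequence described above can be sketched as a standard planned-restart runbook — this is not specific to the NVMe bug, and it assumes a systemd service named "cassandra", which varies by packaging:

```shell
# Planned restart of a single Cassandra node (sketch; assumes a systemd
# unit named "cassandra" — adjust for your packaging).
nodetool drain                   # flush memtables; node stops accepting writes
sudo systemctl stop cassandra    # stop the service cleanly
sudo reboot

# After the instance is back up:
sudo systemctl start cassandra   # if the service is not enabled to auto-start
nodetool status                  # confirm every node reports UN (Up/Normal)
```

Comparing "nodetool drain + stop" (which you say works) against "drain + reboot" (which doesn't) in the gossip-related log lines at DEBUG level may help narrow down what the other nodes see during the reboot.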
