Is the schema in sync? Run nodetool describecluster. Check system.log for any corruption.
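A quick way to act on that advice is to count the distinct schema versions reported by `nodetool describecluster`: a healthy cluster shows exactly one. The sketch below parses a sample of that output (the cluster name, snitch, and IPs are illustrative, not from this thread):

```shell
# Minimal sketch: detect schema disagreement from `nodetool describecluster`
# output. In practice you would capture it with: out="$(nodetool describecluster)"
# The sample below is hypothetical.
out='Cluster Information:
	Name: prod
	Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
	Schema versions:
		86afa796-d883-3932-aa73-6b017cef0d19: [10.0.0.1, 10.0.0.2, 10.0.0.3]'

# Each schema version appears as a UUID followed by the nodes on that version.
# More than one UUID line means the schemas have diverged.
versions=$(printf '%s\n' "$out" | grep -cE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}:')
if [ "$versions" -eq 1 ]; then
  echo "schema in sync"
else
  echo "schema mismatch: $versions versions"
fi
```

Pairing that with a log scan (e.g. `grep -iE 'corrupt|error' /var/log/cassandra/system.log`) covers both checks suggested above.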
Regards,
Nitan
Cell: 510 449 9629

> On Jul 19, 2019, at 12:32 PM, ZAIDI, ASAD A <az1...@att.com> wrote:
>
> "aws asked to set nvme_timeout to higher number in etc/grub.conf."
>
> Did you ask AWS if setting a higher value is a real solution to the bug? Is there no patch available to address the bug? Just curious to know.
>
> From: Rahul Reddy [mailto:rahulreddy1...@gmail.com]
> Sent: Friday, July 19, 2019 10:49 AM
> To: user@cassandra.apache.org
> Subject: Rebooting one Cassandra node caused all the application nodes go down
>
> Hello,
>
> We have 6 nodes each in two data centers, us-east-1 and us-west-2. We have RF 3 and the consistency level set to LOCAL_QUORUM, with a gossip snitch. All our instances are c5.2xlarge, and data files and commit logs are stored on gp2 EBS. The C5 instance type had a bug for which AWS asked us to set nvme_timeout to a higher number in etc/grub.conf. After setting the parameter, we ran nodetool drain and rebooted the node in east.
>
> The instance came up, but Cassandra didn't come up normally; we had to start Cassandra manually. Cassandra came up, but it showed other instances as down. Even though we didn't reboot the other nodes, the same was observed on one other node. How could that happen? We don't see any errors in system.log, which is set to INFO. Without any intervention, gossip settled and in 10 minutes the entire cluster became normal.
>
> We tried the same thing in west, and it happened again.
>
> I'm concerned about how to check what caused it, and if a reboot happens again, how to avoid this.
> If I just STOP Cassandra instead of rebooting, I don't see this issue.
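For reference, the timeout change the thread describes is usually applied via the kernel parameter `nvme_core.io_timeout` rather than a literal `nvme_timeout` key, and on most modern distros it goes in /etc/default/grub rather than etc/grub.conf. The fragment below is a sketch based on AWS's published NVMe/EBS guidance, not on the exact instructions AWS gave in this case; the value 4294967295 is the documented maximum:

```shell
# Assumed approach: raise the NVMe I/O timeout via a kernel boot parameter.
# 1. Append the parameter to the kernel command line in /etc/default/grub:
#      GRUB_CMDLINE_LINUX="... nvme_core.io_timeout=4294967295"
# 2. Regenerate the grub config (path varies by distro/boot mode):
#      sudo grub2-mkconfig -o /boot/grub2/grub.cfg
# 3. Reboot, then verify the running value:
#      cat /sys/module/nvme_core/parameters/io_timeout
```

Verifying step 3 after the reboot would also confirm whether the setting actually took effect on the node that misbehaved.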