Do you see no corruption errors, or do you see corruption errors?
Regards,
Nitan
Cell: 510 449 9629

> On Jul 19, 2019, at 1:52 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
>
> Schema matches, and corruption errors in system.log.
>
>> On Fri, Jul 19, 2019, 1:33 PM Nitan Kainth <nitankai...@gmail.com> wrote:
>> Is the schema in sync? Check with nodetool describecluster.
>>
>> Also check the system log for any corruption.
>>
>> Regards,
>> Nitan
>> Cell: 510 449 9629
>>
>>> On Jul 19, 2019, at 12:32 PM, ZAIDI, ASAD A <az1...@att.com> wrote:
>>>
>>> “aws asked to set nvme_timeout to higher number in etc/grub.conf.”
>>>
>>> Did you ask AWS whether setting a higher value is the real solution to the bug? Is there no patch available to address it? Just curious to know.
>>>
>>> From: Rahul Reddy [mailto:rahulreddy1...@gmail.com]
>>> Sent: Friday, July 19, 2019 10:49 AM
>>> To: user@cassandra.apache.org
>>> Subject: Rebooting one Cassandra node caused all the application nodes to go down
>>>
>>> Hello,
>>>
>>> We have 6 nodes in each of two data centers, us-east-1 and us-west-2, with RF 3, consistency level set to LOCAL_QUORUM, and a gossip-based snitch. All our instances are c5.2xlarge, and data files and commit logs are stored on gp2 EBS. The C5 instance type had a bug for which AWS asked us to set nvme_timeout to a higher number in etc/grub.conf. After setting the parameter, we ran nodetool drain and rebooted the node in east.
>>>
>>> The instance came up, but Cassandra did not come up normally; we had to start Cassandra manually. Cassandra then came up, but it showed other instances as down. Even though we didn't reboot it, the same down status was observed on one other node. How could that happen? We don't see any errors in system.log, which is set to INFO.
>>>
>>> Without any intervention, gossip settled in about 10 minutes and the entire cluster became normal.
>>>
>>> We tried the same thing in West, and it happened again.
>>>
>>> I'm concerned about how to check what caused this, and, if a reboot happens again, how to avoid it.
>>>
>>> If I just STOP Cassandra instead of rebooting, I don't see this issue.
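
For reference, a minimal sketch of the checks discussed in this thread. The log path, the nvme_core.io_timeout parameter name, and its sysfs location follow common package defaults and general AWS guidance for NVMe-backed instance types; they are assumptions, not details confirmed in the thread:

    # Verify that all nodes report the same schema version
    nodetool describecluster

    # See which nodes this node currently marks up (UN) or down (DN)
    nodetool status

    # Look for corruption messages in the system log
    # (path assumes a typical package install; adjust to your layout)
    grep -i corrupt /var/log/cassandra/system.log

    # Flush memtables and stop accepting traffic before a planned reboot
    nodetool drain

    # After the reboot, confirm the NVMe I/O timeout actually took effect
    # (nvme_core.io_timeout is the kernel parameter AWS guidance generally
    # points at for NVMe-backed instance types)
    cat /sys/module/nvme_core/parameters/io_timeout

    # Inspect gossip state while the cluster re-converges
    nodetool gossipinfo

Comparing nodetool gossipinfo on the rebooted node with system.log on the peers that briefly marked it down may help show whether this was just gossip re-convergence rather than a real outage.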