Hello Rahul,

As per your description, the Cassandra process is up and running, as you
verified from the logs, but nodetool and Grafana aren't fetching data.
This points to the JMX port 7199 as the likely suspect.

Please run 'netstat -anp | egrep "7199|9042|7070"' on the impacted host
and on the other hosts in the cluster. There should be some difference.
Observe the IP address to which the JMX port 7199 is binding. Is it the
same as it was prior to the reboot?
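
A minimal sketch of that comparison, assuming a package install with
cassandra-env.sh under /etc/cassandra (the path is an assumption; adjust
for your layout):

  # Show which local address each Cassandra port is bound to
  netstat -anp | egrep "7199|9042|7070"

  # netstat may be absent on newer distros; ss gives the same view
  ss -lntp | egrep "7199|9042|7070"

  # JMX bind settings usually live in cassandra-env.sh; compare the
  # values across hosts
  grep -E "JMX_PORT|rmi.server.hostname|jmxremote" /etc/cassandra/cassandra-env.sh

  # Confirm the host's primary IP matches what it was before the reboot
  hostname -i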

Thanks


On Fri, 19 Jul, 2019, 10:28 PM Rahul Reddy, <rahulreddy1...@gmail.com>
wrote:

> Raj,
>
> No, that was not the case. In system.log I see it started listening for
> CQL clients at 16:42, but somehow it was still unreachable until 16:50,
> as the Grafana dashboard below shows. Once everything is up in the logs,
> why would it still show as down in nodetool status and Grafana?
>
> Zaidi,
>
> In the latest AWS Linux AMI they took care of this bug. Also, changing
> the AMI needs a rebuild of all the nodes, so we didn't take that route.
>
> On Fri, Jul 19, 2019, 12:32 PM ZAIDI, ASAD A <az1...@att.com> wrote:
>
>> “aws asked to set nvme_timeout to higher number in etc/grub.conf.”
>>
>>
>>
>> Did you ask AWS whether setting a higher value is a real solution to
>> the bug? Is there no patch available to address it? Just curious to know.
>>
>>
>>
>> *From:* Rahul Reddy [mailto:rahulreddy1...@gmail.com]
>> *Sent:* Friday, July 19, 2019 10:49 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Rebooting one Cassandra node caused all the application nodes
>> to go down
>>
>>
>>
>> Hi,
>>
>>
>>
>> We have 6 nodes each in 2 data centers, us-east-1 and us-west-2, with
>> RF 3, CL set to LOCAL_QUORUM, and the gossiping snitch. All our instances
>> are c5.2xlarge, and data files and commit logs are stored on gp2 EBS.
>> The C5 instance type had a bug for which AWS asked us to set nvme_timeout
>> to a higher number in /etc/grub.conf. After setting the parameter, we ran
>> nodetool drain and rebooted the node in east.
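>>
>> Roughly the sequence we ran (a sketch; our Cassandra is assumed to be
>> systemd-managed here, and the kernel parameter shown is AWS's documented
>> nvme_core.io_timeout, which may be what their support called nvme_timeout):
>>
>>   # Kernel parameter AWS recommends for the NVMe timeout bug, appended
>>   # to the kernel line in the grub config:
>>   #   nvme_core.io_timeout=4294967295
>>
>>   # Flush memtables and stop accepting connections before the reboot
>>   nodetool drain
>>   sudo systemctl stop cassandra
>>   sudo reboot
>>
>>   # After the instance comes back (a manual start is needed if the
>>   # service is not enabled at boot)
>>   sudo systemctl start cassandra
>>   nodetool status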
>>
>>
>>
>> The instance came up, but Cassandra didn't come up normally; we had to
>> start Cassandra manually. Cassandra came up, but it showed other instances
>> as down. Even though we didn't reboot them, the same was observed on one
>> other node. How could that happen? We don't see any errors in system.log,
>> which is set to INFO.
>>
>> Without any intervention, gossip settled in 10 minutes and the entire
>> cluster became normal.
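>>
>> For reference, here's roughly how we watched it settle (standard
>> nodetool subcommands; the watch interval is arbitrary):
>>
>>   # Ring state; wait until every node reports UN (Up/Normal)
>>   watch -n 10 nodetool status
>>
>>   # What this node believes about each peer's status and generation
>>   nodetool gossipinfo
>>
>>   # Gossip and native-transport state of the local node
>>   nodetool info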
>>
>>
>>
>> We tried the same thing in west, and it happened again.
>>
>>
>> I'm concerned about how to find what caused it, and how to avoid this
>> if a reboot happens again.
>>
>> If I just stop Cassandra instead of rebooting, I don't see this issue.
>>
>>
>>
>