Schema matches, and no corruption errors in system.log.
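For reference, this is roughly how I checked both (a sketch; the log path is the common default and may differ on your install):

    # All nodes should appear under a single schema version UUID
    nodetool describecluster

    # Scan the system log for corruption-related messages
    grep -iE 'corrupt|exception' /var/log/cassandra/system.log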
On Fri, Jul 19, 2019, 1:33 PM Nitan Kainth <nitankai...@gmail.com> wrote:

> Do you see schema in sync? Nodetool describecluster.
>
> Check system log for any corruption.
>
> Regards,
> Nitan
> Cell: 510 449 9629
>
> On Jul 19, 2019, at 12:32 PM, ZAIDI, ASAD A <az1...@att.com> wrote:
>
> “aws asked to set nvme_timeout to higher number in etc/grub.conf.”
>
> Did you ask AWS if setting a higher value is a real solution to the bug? Is there no patch available to address it? Just curious to know.
>
> *From:* Rahul Reddy [mailto:rahulreddy1...@gmail.com]
> *Sent:* Friday, July 19, 2019 10:49 AM
> *To:* user@cassandra.apache.org
> *Subject:* Rebooting one Cassandra node caused all the application nodes to go down
>
> Hello,
>
> We have 6 nodes in each of 2 data centers, us-east-1 and us-west-2. We have RF 3 and consistency level set to LOCAL_QUORUM, with the gossiping snitch. All our instances are c5.2xlarge, and data files and commit logs are stored on gp2 EBS. The C5 instance type had a bug for which AWS asked us to set nvme_timeout to a higher number in /etc/grub.conf. After setting the parameter, we ran nodetool drain and rebooted the node in east.
>
> The instance came up, but Cassandra didn't come up normally; we had to start it manually. Cassandra came up, but it showed the other instances as down. Even though we didn't reboot those nodes, the same was observed on one other node. How could that happen? We don't see any errors in system.log, which is set to INFO.
>
> Without any intervention, gossip settled in 10 minutes and the entire cluster became normal.
>
> We tried the same thing in west and it happened again.
>
> I'm concerned about how to check what caused this, and how to avoid it if a reboot happens again.
>
> If I just STOP Cassandra instead of rebooting, I don't see this issue.
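>
> In case it helps, the sequence we followed was roughly this (a sketch; it assumes Cassandra runs under systemd with the service name "cassandra", which may differ per install):
>
>     # Flush memtables and stop accepting traffic before shutdown
>     nodetool drain
>
>     # Stop the service cleanly, then reboot the instance
>     sudo systemctl stop cassandra
>     sudo reboot
>
>     # After the node is back, start Cassandra and watch for all nodes to return to UN
>     sudo systemctl start cassandra
>     nodetool status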