Re: Incorrect quorum count in driver error logs

Rutvij Bhatt Mon, 26 Jun 2017 14:49:59 -0700

Yes.

On Mon, Jun 26, 2017 at 5:45 PM Hannu Kröger <hkro...@gmail.com> wrote:


> Just to be sure: you have only one datacenter configured in Cassandra?
>
> Hannu
>
> On 27 Jun 2017, at 0.02, Rutvij Bhatt <rut...@sense.com> wrote:
>
> Hi guys,
>
> I observed some odd behaviour with our Cassandra cluster the other day
> while doing some maintenance operation and was wondering if anyone would be
> able to provide some insight.
>
> Initially, I started a node up to join the cluster. That node appeared to
> be having issues joining due to some SSTable corruption it encountered.
> Since it was still in its early stages and I had never seen this failure
> before, I decided to take it out of commission and just try again. However,
> since it was in a bad state, I decided to issue a "nodetool removenode
> <host id>" on a peer rather than a "nodetool decommission" on the node
> itself.
>
> The removenode command hung indefinitely - my guess is that this is
> related to https://issues.apache.org/jira/browse/CASSANDRA-6542. We are
> using 2.1.11.
>
> While this was happening, the driver in the application started logging
> error messages about not being able to reach a quorum of 4. This, to me,
> was mysterious as none of my keyspaces have an RF > 3. That quorum count in
> the error implied an RF of 6 or 7.
>
> I eventually forced that node out of the ring with "nodetool removenode
> force". This seemed to mostly fix the issue, though there seems to have
> been enough of a load spike to cause some of the machines' JVMs to
> accumulate a lot of garbage very fast and spit out a ton of "Not marking
> nodes down due to local pause of ..." messages while trying to clean it up. Some of
> these nodes seemed unresponsive to their peers, who marked them DOWN (as
> indicated by "nodetool status" and the cassandra log file on those
> machines), further exacerbating the situation on the nodes that were still
> up.
>
> I guess my question is two-fold. First, can anyone provide some insight
> into what may have happened? Second, what do you consider good practices
> when dealing with such issues? Any advice is greatly appreciated!
>
> Thanks,
> Rutvij
>
>
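
For reference, the arithmetic behind the "quorum of 4" puzzle above: with a single datacenter (as confirmed in the reply), Cassandra's QUORUM is floor(RF / 2) + 1. So RF <= 3 never needs more than 2 replicas, and a quorum of 4 only corresponds to RF 6 or 7. The sketch below checks that; the function names are illustrative, and multi-datacenter keyspaces (where QUORUM is computed over the summed RF of all datacenters) are deliberately not modeled.

```python
# Minimal sketch of single-datacenter QUORUM sizing: floor(RF / 2) + 1.
# Assumption: one datacenter only; multi-DC QUORUM sums RF across DCs first.


def quorum(rf):
    """Replicas that must respond for QUORUM at replication factor rf."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    return rf // 2 + 1


def rfs_for_quorum(q, max_rf=20):
    """Replication factors (1..max_rf) whose QUORUM equals q."""
    return [rf for rf in range(1, max_rf + 1) if quorum(rf) == q]


if __name__ == "__main__":
    # Keyspaces in the thread have RF <= 3.
    for rf in (1, 2, 3):
        print(f"RF={rf}: quorum={quorum(rf)}")
    # Which RFs would produce the quorum of 4 seen in the driver logs?
    print("RFs with quorum 4:", rfs_for_quorum(4))
```

This only reproduces the arithmetic from the message; it does not by itself explain why the driver reported 4 during the stuck removenode.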
