Yes.

On Mon, Jun 26, 2017 at 5:45 PM Hannu Kröger <hkro...@gmail.com> wrote:
> Just to be sure: you have only one datacenter configured in Cassandra?
>
> Hannu
>
> On 27 Jun 2017, at 0.02, Rutvij Bhatt <rut...@sense.com> wrote:
>
> Hi guys,
>
> I observed some odd behaviour with our Cassandra cluster the other day
> while doing some maintenance operation and was wondering if anyone would be
> able to provide some insight.
>
> Initially, I started a node up to join the cluster. That node appeared to
> be having issues joining due to some SSTable corruption it encountered.
> Since it was still in the early stages and I had never seen this failure
> before, I decided to take it out of commission and just try again. However,
> since it was in a bad state, I decided to issue a "nodetool removenode
> <host id>" on a peer rather than a "nodetool decommission" on the node
> itself.
>
> The removenode command hung indefinitely - my guess is that this is
> related to https://issues.apache.org/jira/browse/CASSANDRA-6542. We are
> using 2.1.11.
>
> While this was happening, the driver in the application started logging
> error messages about not being able to reach a quorum of 4. This, to me,
> was mysterious as none of my keyspaces have an RF > 3. That quorum count in
> the error implied an RF of 6 or 7.
>
> I eventually forced that node out of the ring with "nodetool removenode
> force". This seemed to mostly fix the issue, though there seems to have
> been enough of a load spike to cause some of the machines' JVMs to
> accumulate a lot of garbage very fast and spit out a ton of "Not marking
> nodes down due to local pause of ... ", trying to clean it up. Some of
> these nodes seemed unresponsive to their peers, who marked them DOWN (as
> indicated by "nodetool status" and the cassandra log file on those
> machines), further exacerbating the situation on the nodes that were still
> up.
>
> I guess my question is two-fold. First, can anyone provide some insight
> into what may have happened? Second, what do you consider good practices
> when dealing with such issues? Any advice is greatly appreciated!
>
> Thanks,
> Rutvij
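
For reference, the arithmetic behind the "RF of 6 or 7" remark in the quoted message can be sketched as below. This assumes the usual quorum rule of floor(RF / 2) + 1; the function name quorum is just an illustrative helper, not anything from Cassandra or the driver.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum, assuming floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


# Compare the keyspaces' actual RF (3) with the RFs that would yield 4.
for rf in (3, 6, 7):
    print(f"RF={rf} -> quorum={quorum(rf)}")
```

Under that rule an RF of 3 needs only 2 replicas, while both 6 and 7 need 4, which is why a reported quorum of 4 looks inconsistent with keyspaces capped at RF 3.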