Re: Repair failed and crash the node, how to bring it back?

2019-07-31 Thread Alexander Dejanovski
Hi Martin, apparently this is the hints bug you've been hit by: https://issues.apache.org/jira/browse/CASSANDRA-14080 It was fixed in 3.0.17. You didn't provide the logs from Cassandra at the time of the crash, only the output of nodetool, so it's hard to say what caused it. You may be hit by

Re: Repair failed and crash the node, how to bring it back?

2019-07-31 Thread Martin Xue
Hi Alex, Thanks for your reply. The disk space was around 80%. The crash happened during repair, primary range full repair on 1TB keyspace. Would that crash again? Thanks Regards Martin On Thu., 1 Aug. 2019, 12:04 am Alexander Dejanovski, wrote: > It looks like you have a corrupted hint file.

Re: Differing snitches in different datacenters

2019-07-31 Thread Voytek Jarnot
Thanks Paul. Yes - finding a definitive answer is where I'm failing as well. I think we're probably going to try it and see what happens, but that's a bit worrisome. On Mon, Jul 29, 2019 at 3:35 PM Paul Chandler wrote: > Hi Voytek, > > I looked into this a little while ago, and couldn’t really f

Re: Repair failed and crash the node, how to bring it back?

2019-07-31 Thread Alexander Dejanovski
It looks like you have a corrupted hint file. Did the node run out of disk space while repair was running? You might want to move the hint files off their current directory and try to restart the node again. Since you'll have lost mutations then, you'll need... to run repair ¯\_(ツ)_/¯ ---
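The "move the hint files off their current directory" step above can be sketched as a small helper. This is only an illustration, not something posted in the thread: the function name and both directory arguments are hypothetical, and the real location of the hint files should be taken from the `hints_directory` setting in cassandra.yaml. The sketch assumes the node is already stopped.

```python
import shutil
from pathlib import Path


def move_hint_files(hints_dir, backup_dir):
    """Move every regular file from hints_dir into backup_dir.

    Both paths are supplied by the caller (hypothetical here); check
    hints_directory in cassandra.yaml for the real hints location.
    Returns the names of the files that were moved, sorted.
    """
    src = Path(hints_dir)
    dst = Path(backup_dir)
    # Create the backup directory if it does not exist yet.
    dst.mkdir(parents=True, exist_ok=True)

    moved = []
    for path in sorted(src.iterdir()):
        # Move files only; leave any subdirectories in place.
        if path.is_file():
            shutil.move(str(path), str(dst / path.name))
            moved.append(path.name)
    return moved


# Example call with assumed paths (adjust to your installation):
# move_hint_files("/var/lib/cassandra/hints", "/var/lib/cassandra/hints_backup")
```

Moving the files rather than deleting them keeps the possibly corrupted hint file available for later inspection. As the message says, the mutations held in those hints are lost to the cluster, so a repair is still needed once the node is back up.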

Re: Repair / compaction for 6 nodes, 2 DC cluster

2019-07-31 Thread Alexander Dejanovski
Hi Martin, you can stop the anticompaction with a rolling restart of the nodes (I'm not sure whether "nodetool stop COMPACTION" will actually stop anticompaction; I never tried). Note that this will leave your cluster with some SSTables marked as repaired and others that are not. These two types of SSTables will neve

Re: Repair / compaction for 6 nodes, 2 DC cluster

2019-07-31 Thread Martin Xue
Sorry ASAD, don't have chance, still bogged down with the production issue... On Wed, Jul 31, 2019 at 10:56 PM ZAIDI, ASAD A wrote: > Did you get chance to look at tlp reaper tool i.e. > http://cassandra-reaper.io/ > > It is pretty awesome – Thanks to TLP team. > > *From:* Martin Xue

Repair failed and crash the node, how to bring it back?

2019-07-31 Thread Martin Xue
Hi, I am running repair on production, starting with one of the 6 nodes in the cluster (3 nodes in each of two DCs). Cassandra version 3.0.14. I ran repair -pr --full keyspace on node 1 (1TB of data); it took two days and then crashed. The error shows: 3202]] finished (progress: 3%) Exception occurred during clean

Re: Repair / compaction for 6 nodes, 2 DC cluster

2019-07-31 Thread Martin Xue
Thanks Alex. In this case, I have already run the repair, and anti-compaction has started (including on other nodes). I don't know how long the anti-compaction will take to finish. Is there a way to check? nodetool compactionstats shows one process finished, then there is another one coming up. Sha

RE: Repair / compaction for 6 nodes, 2 DC cluster

2019-07-31 Thread ZAIDI, ASAD A
Did you get a chance to look at the TLP Reaper tool, i.e. http://cassandra-reaper.io/ It is pretty awesome – thanks to the TLP team. From: Martin Xue [mailto:martin...@gmail.com] Sent: Wednesday, July 31, 2019 12:09 AM To: user@cassandra.apache.org Subject: Repair / compaction for 6 nodes, 2 DC cluster He

Re: Repair / compaction for 6 nodes, 2 DC cluster

2019-07-31 Thread Oleksandr Shulgin
On Wed, Jul 31, 2019 at 7:10 AM Martin Xue wrote: > Hello, > > Good day. This is Martin. > > Can someone help me with the following query regarding Cassandra repair > and compaction? > Martin, This blog post from The Last Pickle provides an in-depth explanation as well as some practical advice: