Hello,

'*nodetool cleanup*' used to be single-threaded (up to C*2.1), then used all
the cores (C*2.1 - C*2.1.14), and is now something that can be controlled
(C*2.1.14+):
'*nodetool cleanup -j 2*', for example, would use at most 2 compactors (out
of the number of concurrent_compactors you defined, probably no more than
8).

*Global*: My advice would be to run it on all nodes with 1 or 2 threads
(never more than half of what's available). The impact of the cleanup
should not be bigger than the impact of a compaction. Also, be sure to
leave some room for regular compactions. This way, the cleanup should be
rather safe and it should be acceptable to run it on several nodes in
parallel in most cases. Running in parallel saves time, so you can move on
to other operations sooner, but generally there is no rush to run cleanup
per se. So it's up to you to run it in parallel or not. I often did, fwiw.
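
As a rough sketch of the "never more than half" rule above, a small helper
can derive a value for -j from the node's concurrent_compactors setting. The
function name cleanup_jobs and the hard cap of 2 are illustrative choices
made for this example, not something nodetool provides:

```shell
# Hypothetical helper: pick a value for 'nodetool cleanup -j'.
# $1 = concurrent_compactors configured on the node (integer).
# Rule of thumb: at most half of the compactors, capped at 2, never below 1.
cleanup_jobs() {
  n=$(( $1 / 2 ))
  if [ "$n" -gt 2 ]; then n=2; fi
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}
```

With 8 concurrent compactors, 'nodetool cleanup -j "$(cleanup_jobs 8)"' would
run the cleanup with 2 threads.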

*Early Cassandra 2.1:* If you're using a Cassandra version between 2.1 and
2.1.14, I would go one node at a time, as you cannot really control the
number of threads. This operation in early C*2.1 is risky and heavy:
upgrade if you can, then clean up, would be my advice here :). Be careful
if you decide to go for the cleanup anyway. Mostly, monitor pending
compactions stacking up and the disk space used. In the worst case, you
want to have 50% of the disk free before starting cleanups.
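
The 50% free-space guideline can be turned into a simple pre-check before
starting a cleanup. This is only a sketch: the function names are made up
for the example, and disk_used_percent assumes the standard 'df -P' layout
where the fifth column of the second line is the used capacity:

```shell
# Hypothetical pre-check: succeed only when the disk is at most half full.
# $1 = used space in percent (integer, no '%' sign).
enough_room_for_cleanup() {
  [ "$1" -le 50 ]
}

# Used percentage of the filesystem holding directory $1, via POSIX 'df -P'.
disk_used_percent() {
  df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}
```

For example, 'enough_room_for_cleanup "$(disk_used_percent
/var/lib/cassandra/data)" && nodetool cleanup -j 1' would only start the
cleanup when at least half of the data disk is free (adapt the path to your
data directory).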

*Note:* Reducing disk space usage - If the available disk space is low or
if you mind the data size variation, you can run the cleanup per *table*,
sequentially, one by one, instead of running it on the whole node or
keyspace. Cleanups go through compactions, which start by increasing
the used disk space to write temporary SSTables. Most of the disk space is
freed at the end of the cleanup operation. Going one table at a time and
with a low number of threads helped me in the past to run cleanups in the
most extreme conditions.
Here is how this could be run (you may need to adapt this):

```
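screen -R cleanup
# From screen:
# Keyspaces are listed unquoted so the loop visits them one at a time.
for ks in myks yourks whateverks; do
  # Table directories are named <table>-<id>; keep only the table name.
  tables=$(ls /var/lib/cassandra/data/$ks | cut -d "-" -f 1 | sort -u)
  for table in $tables; do
    echo "Running nodetool cleanup on $ks.$table..."
    nodetool cleanup -j 2 $ks $table
  done
done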
```

Using screen is a good way to answer the question 'Did the cleanup finish?'.
You get back to the screen and see whether the command returned or not, and
you don't have to kill the command just after running it.
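
Besides going back to the screen session, progress can be checked with
'nodetool compactionstats', as mentioned in the quoted reply below. Here is
a minimal sketch that pulls the pending task count out of that output. The
function name is made up, and it assumes the output contains a line of the
form 'pending tasks: N', which may differ between Cassandra versions:

```shell
# Hypothetical helper: read 'nodetool compactionstats' output on stdin and
# print the number found on the first 'pending tasks: N' line.
pending_tasks() {
  awk -F': *' '/^pending tasks:/ { print $2; exit }'
}
```

It would be used as 'nodetool compactionstats | pending_tasks', for example
between two tables to check that compactions are not stacking up.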

C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Mon, Oct 22, 2018 at 21:18, Jeff Jirsa <jji...@gmail.com> wrote:

> Nodetool will eventually return when it’s done
>
> You can also watch nodetool compactionstats
>
> --
> Jeff Jirsa
>
>
> > On Oct 22, 2018, at 10:53 AM, Ian Spence <ian.spe...@globalrelay.net> wrote:
> >
> > Environment: Cassandra 2.2.9, GNU/Linux CentOS 6 + 7. Two DCs, 3 RACs in
> > DC1 and 6 in DC2.
> >
> > We recently added 16 new nodes to our 38-node cluster (now 54 nodes).
> > What would be the safest and most efficient way of running a cleanup
> > operation? I’ve experimented with running cleanup on a single node and
> > nodetool just hangs, but that seems to be a known issue.
> >
> > Would something like running it on a couple of nodes per day, working
> > through the cluster, work?
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: user-h...@cassandra.apache.org
>
