Hello,

Production: a 9-node cluster running Cassandra 2.1.18 on AWS m4.xlarge instances at ~35% average CPU, vnodes with the default 256 tokens, RF=3, compaction throughput throttled to 16 MB/s, 4 concurrent compactors.
We have a nightly cron job starting "nodetool repair -pr ks cf1 cf2" concurrently on all nodes; the data volume for cf1 and cf2 is ~1-5 GB, so pretty small. After extending the cluster from 6 to the current 9 nodes and finishing "nodetool cleanup", this repair results in > 30K SSTables for these two CFs on several nodes (but not all of them), with very, very tiny files < 1 KB each. Obviously this hurts read latency, disk IO, and CPU a lot, and it takes several hours until the situation relaxes.

We have other clusters with the same spec that have also been extended from 6 to 9 nodes in the past, where we don't see this issue. For now, we have disabled the nightly cron job.

Any input on how to troubleshoot the root cause of this issue?

Thanks,
Thomas
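In case it helps anyone reproduce the check: a quick way to spot affected tables on a node is to count Data.db files under 1 KB per table directory. This is just a sketch; the helper name is made up, and the data-directory path below assumes the default layout (/var/lib/cassandra/data/&lt;keyspace&gt;/&lt;table&gt;/):

```shell
# count_tiny: print how many SSTable Data.db files under 1 KB sit
# directly inside the given table directory (hypothetical helper).
count_tiny() {
  # -size -1024c matches files strictly smaller than 1024 bytes
  find "$1" -maxdepth 1 -name '*Data.db' -size -1024c 2>/dev/null \
    | wc -l | tr -d ' '
}

# Typical use against the default data directory (adjust as needed):
# for dir in /var/lib/cassandra/data/ks/*/; do
#   echo "$(count_tiny "$dir") tiny SSTables in $dir"
# done
```

On a live node, "nodetool cfstats ks.cf1" also reports the SSTable count per table directly, which makes it easy to compare the affected nodes against the healthy ones.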