Hello,

Production: a 9-node cluster running Cassandra 2.1.18 on AWS m4.xlarge instances at ~35% average CPU, vnodes with the default 256 tokens, RF=3, compaction throughput throttled to 16 MB/s, 4 concurrent compactors.
We have a nightly cron job starting "nodetool repair -pr ks cf1 cf2" concurrently on all nodes; the data volume for cf1 and cf2 is ~1-5 GB, so pretty small. After extending the cluster from 6 to the current 9 nodes and finishing "nodetool cleanup", this repair results in > 30K SSTables for these two CFs on several nodes (but not all of them), with very, very tiny files < 1 KB each. Obviously this hurts read latency, disk IO, and CPU a lot, and it takes several hours until the situation relaxes.

We have other clusters with the same spec that have also been extended from 6 to 9 nodes in the past, where we don't see this issue. For now, we have disabled the nightly cron job.

Any input on how to troubleshoot the root cause of this issue?

Thanks,
Thomas
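In case it helps anyone reproduce the check: a quick way to spot affected tables on a node is to count Data.db files under 1 KB per table directory. This is just a sketch; the helper name is made up, and the data-directory path below assumes the default layout (/var/lib/cassandra/data/&lt;keyspace&gt;/&lt;table&gt;/):

```shell
# count_tiny: print how many SSTable Data.db files under 1 KB sit
# directly inside the given table directory (hypothetical helper).
count_tiny() {
  # -size -1024c matches files strictly smaller than 1024 bytes
  find "$1" -maxdepth 1 -name '*Data.db' -size -1024c 2>/dev/null \
    | wc -l | tr -d ' '
}

# Typical use against the default data directory (adjust as needed):
# for dir in /var/lib/cassandra/data/ks/*/; do
#   echo "$(count_tiny "$dir") tiny SSTables in $dir"
# done
```

On a live node, "nodetool cfstats ks.cf1" also reports the SSTable count per table directly, which makes it easy to compare the affected nodes against the healthy ones.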