Hi,

we are running a 9-node cluster under load. The nodes run in EC2 on i2.2xlarge instances, and the Cassandra version is 2.2.4. One node was down yesterday for more than 3 hours, so this morning we manually started an incremental repair via nodetool (anti-entropy repair?).
What we see is that user CPU on that node climbs above 95% and also rises on all the other nodes. The number of SSTables is also exploding, I guess due to anticompaction.

What are my tuning options for a gentler repair? Which settings should I look at if, for instance, I want CPU to stay below 50%? My worry is always the impact on read/write performance while anti-entropy repairs are running.

Cheers,
Reik