Hey all,

So, we have Cassandra running on a 5-server ring with a replication factor (RF) of 3, and we regularly see major slowdowns in read and write performance while nodetool repair is running, as well as the occasional Cassandra crash during the repair window. At worst, a single write takes more than 10 seconds.
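A rough sketch of why a repair started on one node can end up touching the whole ring. It assumes SimpleStrategy-style placement (each range is stored on its primary node plus the next two nodes clockwise), evenly spaced tokens, and a repair that covers every range the node replicates. The node indices are hypothetical stand-ins for the five servers.

```python
from typing import List


def replicas(primary: int, n_nodes: int, rf: int) -> List[int]:
    """Nodes holding the range whose primary owner is `primary`
    (assumed: the primary plus the next rf - 1 nodes clockwise)."""
    return [(primary + k) % n_nodes for k in range(rf)]


def repair_participants(node: int, n_nodes: int, rf: int) -> List[int]:
    """Nodes that take part when `node` repairs every range it replicates:
    the union of the replica sets of all ranges that include `node`."""
    involved = set()
    for primary in range(n_nodes):
        owners = replicas(primary, n_nodes, rf)
        if node in owners:
            involved.update(owners)
    return sorted(involved)


print(repair_participants(0, n_nodes=5, rf=3))  # [0, 1, 2, 3, 4]
```

Under these assumptions, with 5 nodes and RF 3 the union is all five servers, so each nightly repair puts load on the entire cluster rather than only the node it was started on.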

The repair job runs nightly on a different server each night, so each server is repaired once a week.
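A minimal sketch of that rotation, assuming the job runs Monday through Friday with one server per weeknight (one way five servers work out to once a week each). The host names and helper functions are hypothetical; `nodetool -h <host> repair` is the usual way to point repair at a specific node.

```python
import datetime
from typing import List, Optional

# Hypothetical host names; substitute the five servers in the ring.
HOSTS = ["cass1", "cass2", "cass3", "cass4", "cass5"]


def host_for(day: datetime.date, hosts: List[str] = HOSTS) -> Optional[str]:
    """Return the server to repair on `day`, or None when no repair is scheduled."""
    weekday = day.weekday()  # Monday == 0 ... Sunday == 6
    if weekday >= len(hosts):
        return None  # no host assigned (Saturday and Sunday with five hosts)
    return hosts[weekday]


def repair_command(host: str) -> List[str]:
    """Build the nodetool invocation that runs repair against one node."""
    return ["nodetool", "-h", host, "repair"]
```

A nightly cron job could then call `subprocess.run(repair_command(h), check=True)` whenever `h = host_for(datetime.date.today())` is not None.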

We're running 0.7.0 currently, and we'll be upgrading to 0.7.6 shortly.

CPU usage on the Cassandra servers never exceeds 10%, and IO usage is minimal, so I wouldn't expect to see issues like this.

What knobs should I look at tuning to reduce the impact nodetool repair has on Cassandra? What questions should I be asking to figure out why Cassandra slows down as much as it does, and what should I be optimizing?

Additionally, what should I look for in the logs when this is happening? There's a lot in there, and I'm not sure which parts matter.

In this instance, Cassandra is backing a system that handles around a million requests a day, so the traffic isn't terribly heavy.
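For scale, a quick back-of-the-envelope check: a million requests a day averages out to under 12 requests per second. The sketch assumes traffic is spread evenly over the day (real peaks will be higher) and, for the per-node figure, the worst case where every request is a write landing on RF = 3 of the 5 nodes.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(requests_per_day: float) -> float:
    """Mean requests per second, assuming load is spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


def per_node_replica_writes(writes_per_sec: float, rf: int, n_nodes: int) -> float:
    """Replica writes each node absorbs per second if every write hits `rf`
    replicas and data ownership is balanced across `n_nodes`."""
    return writes_per_sec * rf / n_nodes


rate = average_rate(1_000_000)
print(f"{rate:.1f} req/s on average")  # 11.6 req/s on average
print(f"{per_node_replica_writes(rate, rf=3, n_nodes=5):.1f} replica writes/s per node")  # 6.9 replica writes/s per node
```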

Thanks,

Aurynn
