> Is it normal that the repair takes 4+ hours for every node, with only about
> 10G data? If this is not expected, do we have any hint what could be causing
> this?
It does not seem entirely crazy, depending on the nature of your data and how
CPU-intensive it is "per byte" to compact. Assuming there is no functional
problem delaying it, the question is what the bottleneck is. If you have a lot
of read traffic keeping the drives busy, it may be that compaction is
bottlenecked on reading from disk (even though the compaction reads themselves
are sequential) because of the live reads. Otherwise you may be CPU bound (you
can use something like htop to gauge fairly well whether you are saturating a
core doing compaction).

To be clear, the processes to watch for are (example commands follow the list):

* The "validating compaction" happening on the repairing node AND ITS
  NEIGHBORS - can be CPU or I/O bound (or throttled) - nodetool
  compactionstats, htop, iostat -x -k 1
* Streaming of data - can be network or disk bound (and possibly throttled,
  if streaming throttling exists in the version you're running) - nodetool
  netstats, ifstat, iostat -x -k 1
* The "sstable rebuild" compaction happening after streaming, which builds
  bloom filters and indexes - can be CPU or I/O bound (or throttled) -
  nodetool compactionstats, htop, iostat -x -k 1
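As a rough sketch (the host name cass-node1 below is just a placeholder; run
the nodetool commands against the repairing node and each of its neighbors), a
quick triage pass while the repair is running might look like:

    # Any validation compactions or post-streaming sstable rebuilds pending?
    nodetool -h cass-node1 compactionstats

    # Is data still streaming to/from neighbors?
    nodetool -h cass-node1 netstats

    # Disk bound? Look for %util near 100 on the data drives.
    iostat -x -k 1

    # CPU bound? Look for a saturated core doing compaction.
    htop

    # Network bound during streaming?
    ifstat 1

-- 
/ Peter Schuller (@scode on twitter)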