Sorry about the lack of response to your actual issue. I'm afraid I don't have an exhaustive analysis, but some quick notes:
> balanced ring but the other nodes are at 60GB. Each repair basically
> generates thousands of pending compactions of various types (SSTable build,
> minor, major & validation) : it spikes up to 4000 thousands, levels then

In general the pending compactions count is very misleading. It can mostly
be used to indicate that "something is backing up" (such as a large
compaction or repair). Unless this has changed in 0.8, each "potential"
compaction that gets submitted is counted, so even though the number might
be 10 000, it can suddenly dive down to 0; the number is not really
indicative of the amount of compaction work that is actually pending. Again
though, I am a bit out of date on the behavior of 0.8, so this could be
wrong.

> the number of SSTables for some keyspaces goes dramatically up (from 3 or 4
> to several dozens).

Typically that is what happens with a long-running compaction, such as the
one triggered by repair, as flushed memtables accumulate; in particular for
memtables with frequent flushes. Are you running with concurrent compaction
enabled?

> the commit log keeps increasing in size, I'm at 4.3G now, it went up to 40G
> when the compaction was throttled at 16MB/s. On the other nodes it's around
> 1GB at most

Hmmmm. The commit log should not be retained longer than what is required
for memtables to be flushed. Is it possible you have had an out-of-disk
condition and flushing has stalled? Are you seeing flushes happening in the
log?

> the data directory is bigger than on the other nodes. I've seen it go up to
> 480GB when the compaction was throttled at 16MB/s

How much data are you writing? Is it at all plausible that the huge spike is
a reflection of lots of overwriting writes that aren't being compacted?

Normally when disk space spikes during repair it's due to other nodes
streaming huge amounts (maybe all of their data) to the node, leading to a
temporary spike. But if your "real" size is expected to be 60GB, 480GB
sounds excessive. Are you sure other nodes aren't running repairs at the
same time, magnifying each other's data load spikes?

> What's even weirder is that currently I have 9 compactions running but CPU
> is throttled at 1/number of cores half the time (while > 80% the rest of the
> time). Could this be because other repairs are happening in the ring ?

You mean compaction is taking less CPU than it "should"? No, this should not
be due to other nodes repairing. However, it sounds to me like you are
bottlenecking on I/O, and the repairs and compactions are probably
proceeding extremely slowly, probably being completely drowned out by live
traffic (which is likely having an abnormally high performance impact due to
the data size spike).

What's your read concurrency configured on the node? What does
"iostat -x -k 1" show in the average queue size column? Is
"nodetool -h localhost tpstats" showing that ReadStage is usually "full"
(at your limit)?
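If it helps, here is a minimal, untested Python 3 sketch (my own
illustration, nothing that ships with Cassandra; the function names and the
tpstats/iostat column layouts below are assumptions that vary between
versions) for polling those two commands so you can watch ReadStage and the
disk queues side by side over time:

    import subprocess
    import time

    def read_stage(host="localhost"):
        # Parse "nodetool -h <host> tpstats"; assumes the row layout
        # "Pool Name  Active  Pending  Completed ..." -- adjust if your
        # version prints different columns.
        out = subprocess.check_output(["nodetool", "-h", host, "tpstats"],
                                      text=True)
        for line in out.splitlines():
            cols = line.split()
            if cols and cols[0] == "ReadStage":
                return {"active": int(cols[1]), "pending": int(cols[2])}
        return None

    def disk_queue_sizes():
        # Parse one "iostat -x -k" snapshot; the queue-size column is
        # called avgqu-sz on older sysstat releases and aqu-sz on newer
        # ones, so locate it by header name rather than position.
        out = subprocess.check_output(["iostat", "-x", "-k"], text=True)
        col, sizes = None, {}
        for line in out.splitlines():
            cols = line.split()
            if cols and cols[0].startswith("Device"):
                col = next((cols.index(n) for n in ("avgqu-sz", "aqu-sz")
                            if n in cols), None)
            elif col is not None and len(cols) > col:
                try:
                    sizes[cols[0]] = float(cols[col])
                except ValueError:
                    pass
        return sizes

    if __name__ == "__main__":
        # Print a sample every few seconds; if ReadStage pending sits at
        # your configured read concurrency while the disk queues stay deep,
        # that would support the theory that you are I/O bound.
        while True:
            print(read_stage(), disk_queue_sizes())
            time.sleep(5)

Finding the queue-size column by its header name instead of hard-coding a
position is only there to cope with different sysstat versions; the rest is
just shelling out to the same commands mentioned above.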
-- 
/ Peter Schuller (@scode on twitter)