Sorry about the lack of response to your actual issue. I'm afraid I
don't have an exhaustive analysis, but some quick notes:

> balanced ring but the other nodes are at 60GB. Each repair basically
> generates thousands of pending compactions of various types (SSTable build,
> minor, major & validation) : it spikes up to 4000 thousands, levels then

In general the pending compactions count is very misleading. It is
mostly useful as an indication that "something is backing up" (such as
a large compaction or repair). Unless this has changed in 0.8, each
"potential" compaction that gets submitted is counted, so even if the
number reads 10 000 it can suddenly drop to 0; it is not really
indicative of the amount of compaction work actually pending. Again
though, I am a bit out of date on the behavior of 0.8, so this could
be wrong.
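
If you want a more direct view of what compaction is actually doing,
something like the following is usually more informative than the
pending count (command names are from memory, so double check them
against your 0.8 nodetool):

  # Show compactions currently executing and how far along they are
  nodetool -h localhost compactionstats

  # Per-column-family stats, including live SSTable counts, to see
  # where SSTables are piling up
  nodetool -h localhost cfstats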

> the number of SSTables for some keyspaces goes dramatically up (from 3 or 4
> to several dozens).

That is typical with a long-running compaction such as the one
triggered by repair: freshly flushed memtables accumulate as SSTables
faster than they can be compacted away, particularly for column
families that flush frequently.

Are you running with concurrent compaction enabled?
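
To check, you can grep the compaction-related settings out of
cassandra.yaml; the option names below are from the 0.8-era default
config as I remember it, and the path depends on your packaging:

  # Adjust the path to wherever your cassandra.yaml lives
  grep -E 'concurrent_compactors|compaction_throughput_mb_per_sec' \
      /etc/cassandra/conf/cassandra.yaml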

> the commit log keeps increasing in size, I'm at 4.3G now, it went up to 40G
> when the compaction was throttled at 16MB/s. On the other nodes it's around
> 1GB at most

Hmmmm. The Commit Log should not be retained longer than what is
required for memtables to be flushed. Is it possible you have had an
out-of-disk condition and flushing has stalled? Are you seeing flushes
happening in the log?
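
One quick sanity check (log location and exact message wording depend
on your install, so treat this as a sketch): look for recent flush
activity in the Cassandra system log, and try forcing a flush to see
whether commit log segments get recycled:

  # Recent memtable flush activity; adjust the log path for your packaging
  grep -i flush /var/log/cassandra/system.log | tail -20

  # Force a flush of all memtables on the node
  nodetool -h localhost flush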

> the data directory is bigger than on the other nodes. I've seen it go up to
> 480GB when the compaction was throttled at 16MB/s

How much data are you writing? Is it at all plausible that the huge
spike is a reflection of lots of overwrites that haven't yet been
compacted away?

Normally when disk space spikes with repair it's due to other nodes
streaming huge amounts (maybe all of their data) to the node, leading
to a temporary spike. But if your "real" size is expected to be
around 60 GB, 480 GB sounds excessive. Are you sure other nodes aren't
running repairs
at the same time and magnifying each other's data load spikes?
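
If you want to verify, check whether other nodes are streaming to this
one and whether the neighbors are running validation compactions of
their own (i.e. repairs); exact output and command availability may
differ slightly in 0.8, so again just a sketch:

  # Active streams to/from this node
  nodetool -h localhost netstats

  # Run against each neighbor ("<neighbor>" is a placeholder for its
  # hostname); "Validation" entries indicate a repair in progress there
  nodetool -h <neighbor> compactionstats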

> What's even weirder is that currently I have 9 compactions running but CPU
> is throttled at 1/number of cores half the time (while > 80% the rest of the
> time). Could this be because other repairs are happening in the ring ?

You mean compaction is taking less CPU than it "should"?

No, this should not be due to other nodes repairing. However, it
sounds to me like you are bottlenecking on I/O: the repairs and
compactions are probably proceeding extremely slowly, drowned out by
live traffic (which is likely taking an abnormally high performance
hit due to the data size spike).

What is your read concurrency configured to on the node? What does
"iostat -x -k 1" show in the average queue size (avgqu-sz) column? Is
"nodetool -h localhost tpstats" showing that ReadStage is usually
"full" (at your limit)?
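
For example, something along these lines while the repair is running;
a persistently high avgqu-sz suggests the disks are saturated, and a
ReadStage with Active pinned at your concurrent_reads limit plus a
growing Pending count means reads are backing up:

  # Per-device I/O stats every second; watch the avgqu-sz and %util columns
  iostat -x -k 1

  # Thread pool state; check the ReadStage Active and Pending columns
  nodetool -h localhost tpstats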

-- 
/ Peter Schuller (@scode on twitter)
