> It runs correctly during several days. Last night, we started to have timeout 
> exception on insert and high cpu load on all nodes.
>
> We stopped inserts. But the CPU remains high (without any insert or read).

Has data been written to the cluster faster than background compaction
is proceeding? If so, you may see Cassandra eating CPU (and doing I/O)
in the background for extended periods of time even after you stop
sending requests to it.

If this is what is happening, the log should show that it's doing
compaction, and you should see that the data directories contain
lots of files (unless it's just now catching up) rather than the
fairly few you would expect when compaction is keeping up.
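
For a quick look at the file counts, something like the following
rough sketch may help. It assumes that SSTable data files have names
ending in "-Data.db" and that you pass in your own data directory;
the function name and the example path are made up for illustration,
so adjust them to your setup.

```python
import os


def count_sstable_data_files(data_root):
    """Return {directory: count of files ending in '-Data.db'} under data_root.

    Assumption: each SSTable has exactly one file whose name ends in
    '-Data.db', so the count approximates the number of SSTables.
    """
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(data_root):
        n = sum(1 for name in filenames if name.endswith("-Data.db"))
        if n:
            counts[dirpath] = n
    return counts


# Example use (the path is an assumption; use your configured data directory):
#   for directory, n in sorted(count_sstable_data_files("/var/lib/cassandra/data").items()):
#       print(n, directory)
```

Running it a few times, some minutes apart, should show whether the
counts are shrinking (compaction catching up) or not.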

Also consider that even if you're not writing faster than it can
handle, if you have lots of data in total, the bigger compactions will
take a considerable amount of time, so you may see CPU+disk activity for
long periods even if all is otherwise well.

Of course, you say you're seeing timeouts. Is it possible these are
timeouts that happen during compaction in general? What kind of
latency are we talking about (a few hundred extra milliseconds, or
several seconds?) and is there a correlation between the timeouts and
lots of data being flushed to disk (iostat -x -k 1)?
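
If you want to put a number on that correlation, here is a minimal
sketch. It assumes you have already pulled two lists of timestamps
(in seconds) out of your own logs: when the client saw timeouts, and
when the node logged a flush or compaction. The function name and the
window size are hypothetical, and extracting the timestamps is left
to you.

```python
from bisect import bisect_left


def fraction_near_events(timeout_times, event_times, window_s):
    """Fraction of timeouts that fall within window_s seconds of any event.

    All times are plain numbers in seconds (for example epoch seconds).
    """
    if not timeout_times:
        return 0.0
    events = sorted(event_times)
    near = 0
    for t in timeout_times:
        # First event at or after the start of the window around t.
        i = bisect_left(events, t - window_s)
        if i < len(events) and events[i] <= t + window_s:
            near += 1
    return near / len(timeout_times)
```

A value close to 1.0 would suggest the timeouts cluster around
flush/compaction activity; a low value would point elsewhere.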

-- 
/ Peter Schuller
