> It runs correctly during several days. Last night, we started to have timeout
> exception on insert and high cpu load on all nodes.
>
> We stopped inserts. But the CPU remains high (without any insert or read).

Has data been written to the cluster faster than background compaction can keep up? If so, you may see Cassandra eating CPU (and doing I/O) in the background for extended periods of time, even after you stop sending requests to it. If this is what is happening, the log should show that it is doing compaction, and the data directories should contain lots of files (unless it is just now catching up) rather than the fairly small number you would expect when compaction is up to speed.

Also consider that even if you're not writing faster than it can handle, if you have lots of data in total, the bigger compactions will take a considerable amount of time, so you may see CPU and disk activity for long periods even if all is otherwise well.

Of course, you say you're seeing timeouts. Is it possible that these are timeouts that happen during compaction in general? What kind of latency are we talking about (a few extra hundred milliseconds, or several seconds?), and is there a correlation between the timeouts and lots of data being flushed to disk (iostat -x -k 1)?

--
/ Peter Schuller
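
P.S. A rough way to check whether the data directories hold lots of files is to count the SSTable data files under each directory. The sketch below is only an illustration: the default path /var/lib/cassandra/data and the helper name count_sstables are assumptions on my part, so point it at whatever data directories your configuration actually uses. It relies on SSTable data files having names ending in "-Data.db".

```python
import os
import sys
from collections import Counter


def count_sstables(data_root):
    """Map each directory under data_root to its number of *-Data.db files.

    Hypothetical helper: a persistently large count in one directory
    suggests compaction has fallen behind there.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(data_root):
        n = sum(1 for name in filenames if name.endswith("-Data.db"))
        if n:
            counts[dirpath] = n
    return counts


if __name__ == "__main__":
    # Assumed default location; pass your real data directory as argv[1].
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/cassandra/data"
    for path, n in count_sstables(root).most_common():
        print(f"{n:6d}  {path}")
```

Running it a few times some minutes apart should show whether the counts are shrinking (compaction is catching up) or staying high.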