I only sifted recent history of this thread (for time reasons), but:

> You have started a major compaction which is now competing with those
> near constant minor compactions for far too little I/O (3 SATA drives
> in RAID0, perhaps?). Normally, this would result in a massive
> ballooning of your heap use as all sorts of activities (like memtable
> flushes) backed up, as well.

AFAIK memtable flushing is unrelated to compaction in the sense that they occur concurrently and don't block each other (except to the extent that they truly do compete for e.g. disk or CPU resources).

While small memtables do indeed mean more compaction activity in total, the cost of any given compaction should not be severely affected. As far as I can tell, the two primary effects of small memtable sizes are:

* An increase in the total amount of compaction work done for a given database size.
* An increase in the number of sstables that may accumulate while larger compactions are running.
** That in turn is particularly relevant because it can generate a lot of seek-bound activity; consider for example range queries that end up spanning 10 000 files on disk.

If memtable flushes are not able to complete fast enough to cope with write activity, even if that is the case only during concurrent compaction (for whatever reason), that suggests to me that write activity is too high. Increasing memtable sizes may help on average due to decreased compaction work, but I don't see why it would significantly affect performance once compactions *do* in fact run.

With respect to timeouts on writes: I make no claims as to whether it is expected, because I have not yet investigated, but I definitely see sporadic slowness when benchmarking high-throughput writes on a Cassandra trunk snapshot somewhere between 0.6 and 0.7. This occurs even when writing to a machine where the commit log and data directories are both on separate RAID volumes that are battery backed and should have no trouble eating write bursts (and the data is such that one is CPU bound rather than disk bound on average, so it only needs to eat bursts).

I've had to add retries to the benchmarking tool (or else up the timeout) because the default timeout was not enough. I have not investigated exactly why this happens, but it's an interesting effect that, as far as I can tell, should not be there.

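For illustration, a retry wrapper of the kind a benchmarking tool might use around writes could look roughly like the following. This is a minimal sketch and not the actual change I made to the tool: the names write_with_retry and make_flaky_write, the exponential backoff, and the use of Python's built-in TimeoutError are all assumptions for the example, and the "write" here is a simulated stand-in rather than a real Cassandra client call.

```python
import time


def write_with_retry(write_fn, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call write_fn(); on TimeoutError, back off and try again.

    Returns (result, attempts_used). Re-raises the last TimeoutError
    if every attempt times out. All names here are hypothetical.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn(), attempt
        except TimeoutError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky_write(failures):
    """Simulated write that times out `failures` times, then succeeds."""
    state = {"left": failures}

    def write():
        if state["left"] > 0:
            state["left"] -= 1
            raise TimeoutError("simulated write timeout")
        return "ok"

    return write


if __name__ == "__main__":
    result, attempts = write_with_retry(make_flaky_write(2), base_delay=0.01)
    print(result, attempts)
```

Returning the attempt count alongside the result is deliberate: retries hide the slow writes from the caller, so a benchmark that wants to report them still needs to count how often a retry was required.
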
Have other people done high-throughput writes (to the point of CPU saturation) over extended periods of time while consistently seeing low latencies (consistently meaning never exceeding hundreds of ms over several days)?

--
/ Peter Schuller