> However, the point has to do with the fact Peter mentions. With smaller
> memtables I see that minor compaction is unable to keep up with the writes.
> The number of sstables grows constantly during my peak hours. With 400MB
> memtables the cluster is always compacting and the number of sstables grows
> constantly.
> I don't see that the cluster is I/O bound even with compaction (disk
> utilization is below 60% during compactions), but I think that a large
> number of sstables affects my read latency. Now I have 5-7 sstables during
> the peak hours, and when I tried with smaller sstables I saw 30 sstables
> (and then I got scared and rolled back the change).
When you say that it grows constantly, does that mean up to 30 or even farther? It is expected that smaller sstables will give you higher sstable count spikes: only one compaction runs at a time, and larger compactions take some amount of time to complete. During that time, smaller memtables mean more memtable flushes have time to happen, so more new sstables accumulate. A higher sstable count spike is thus not necessarily an indication that you're not keeping up, unless the count just grows and grows indefinitely. But you're right that sstable count will affect the seek overhead of reads.

What is your total data size? (It determines the maximum amount of work for the biggest compaction jobs.)

With respect to your disk utilization: I assume your ~35 KB rows are made up of several smaller columns? (If not, I would expect compaction to be disk bound rather than CPU bound, at least assuming you're not running on a very fast I/O device.)

In any case: if the sstable counts are not just the result of large compactions allowing several memtable flushes in the meantime, and you are in fact not keeping up with writes because you are CPU bound, then yes - that basically means you need more capacity to handle the load (unless you can re-model your data to be less CPU heavy in Cassandra, but that seems like the wrong way to go in most cases).

Your 200 writes/second, assuming they are full rows of 35 KB, imply about 7 MB/second of writes. Given small enough column values, it seems plausible that you'd be CPU bound on compaction (a hand-wavy gut feeling on my part).

(A nice future improvement would be to allow concurrent compaction, so that Cassandra could utilize multiple CPU cores, which might mitigate this if you have CPU capacity left over. However, this is not currently supported.)

-- 
/ Peter Schuller
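To make the "spike vs. falling behind" distinction concrete, here is a back-of-envelope sketch of the reasoning above. The write rate comes from the thread's numbers (200 writes/s of ~35 KB rows); the compaction input size and compaction throughput are purely illustrative assumptions, not measurements from this cluster:

```python
# Illustrative model: how many memtable flushes (i.e. new sstables) have
# time to happen while a single large compaction runs. Only one compaction
# runs at a time, so these sstables pile up until it finishes.

# From the thread: 200 writes/s * 35 KB rows ~= 6.8 MB/s incoming.
write_rate_mb_s = 200 * 35 / 1024.0

def flushes_during_compaction(memtable_mb, compaction_input_mb, compaction_mb_s):
    """Number of memtables flushed while one compaction of the given
    input size runs at the given (assumed) throughput."""
    compaction_seconds = compaction_input_mb / compaction_mb_s
    seconds_per_flush = memtable_mb / write_rate_mb_s
    return compaction_seconds / seconds_per_flush

# Hypothetical 4 GB compaction at an assumed 25 MB/s, comparing 400 MB
# memtables against 64 MB memtables:
spike_big = flushes_during_compaction(400, 4096, 25.0)
spike_small = flushes_during_compaction(64, 4096, 25.0)
```

Note that the ratio of the two spikes is exactly the inverse ratio of the memtable sizes (400/64 ≈ 6.25x), independent of the compaction speed: shrinking memtables multiplies the expected sstable count spike even when the cluster is keeping up fine. A genuine "not keeping up" situation shows a count that trends upward across compactions, not one that spikes and recovers.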