> However, the point has to do with the fact Peter mentions. With smaller
> memtables I see that minor compaction is unable to keep up with the writes.
> The number of sstables grows constantly during my peak hours. With 400MB
> memtables the cluster is always compacting and the number of sstables grows
> constantly.
> I don't see that the cluster is I/O bound even with compaction (disk
> utilization is below 60% during compactions), but I think that a large
> number of sstables affects my read latency. Now I have 5-7 sstables during
> the peak hours, and when I tried with smaller sstables I saw 30 sstables
> (and then I got scared and rolled back the change).
When you say that it grows constantly, does that mean up to 30 or even farther? It is expected that smaller sstables will give you higher sstable count spikes: only one compaction runs at a time, and larger compactions take some amount of time to complete. During that time, smaller memtables mean more memtable flushes have time to happen, so more new sstables accumulate. A higher sstable count spike is thus not necessarily an indication that you're not keeping up, unless the count just grows and grows indefinitely. But you're right that sstable count will affect the seek overhead of reads.

What is your total data size? (It determines the maximum amount of work for the biggest compaction jobs.)

With respect to your disk utilization: I assume your ~35 KB rows are made up of several smaller columns? (If not, I would expect compaction to be disk bound rather than CPU bound, at least assuming you're not running on a very fast I/O device.)

In any case: if the sstable counts are not just the result of large compactions allowing several memtable flushes in the meantime, and you are in fact not keeping up with writes because you are CPU bound, then yes - that basically means you need more capacity to handle the load (unless you can re-model your data to be less CPU heavy in Cassandra, but that seems like the wrong way to go in most cases).

Your 200 writes/second, assuming they are full rows of 35 KB, imply about 7 MB/second of writes. Given small enough column values, it seems plausible that you'd be CPU bound on compaction (a hand-wavy gut feeling on my part).

(A nice future improvement would be to allow concurrent compaction, so that Cassandra could utilize multiple CPU cores, which might mitigate this if you have CPU capacity left over. However, this is not currently supported.)

-- 
/ Peter Schuller
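To make the "spike vs. falling behind" distinction concrete, here is a back-of-envelope sketch of the reasoning above. The write rate comes from the thread's numbers (200 writes/s of ~35 KB rows); the compaction input size and compaction throughput are purely illustrative assumptions, not measurements from this cluster:

```python
# Illustrative model: how many memtable flushes (i.e. new sstables) have
# time to happen while a single large compaction runs. Only one compaction
# runs at a time, so these sstables pile up until it finishes.

# From the thread: 200 writes/s * 35 KB rows ~= 6.8 MB/s incoming.
write_rate_mb_s = 200 * 35 / 1024.0

def flushes_during_compaction(memtable_mb, compaction_input_mb, compaction_mb_s):
    """Number of memtables flushed while one compaction of the given
    input size runs at the given (assumed) throughput."""
    compaction_seconds = compaction_input_mb / compaction_mb_s
    seconds_per_flush = memtable_mb / write_rate_mb_s
    return compaction_seconds / seconds_per_flush

# Hypothetical 4 GB compaction at an assumed 25 MB/s, comparing 400 MB
# memtables against 64 MB memtables:
spike_big = flushes_during_compaction(400, 4096, 25.0)
spike_small = flushes_during_compaction(64, 4096, 25.0)
```

Note that the ratio of the two spikes is exactly the inverse ratio of the memtable sizes (400/64 ≈ 6.25x), independent of the compaction speed: shrinking memtables multiplies the expected sstable count spike even when the cluster is keeping up fine. A genuine "not keeping up" situation shows a count that trends upward across compactions, not one that spikes and recovers.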