> I don't have a problem with disk space. I have a problem with the data
> size.

[snip]

> Bottom line is that I want to reduce the number of requests that go to
> disk. Since there is enough data that is no longer valid, I can do it by
> reclaiming the space. The only way to do that is by running a major
> compaction. I can wait and let Cassandra do it for me, but then the data
> size will get even bigger and the response time will be worse. I can do it
> manually, but I prefer it to happen in the background with less impact on
> the system.

Ok - that makes perfect sense then. Sorry for misunderstanding :)

So essentially, for workloads that are teetering on the edge of cache
warmness and are subject to significant overwrites or removals, it may
be beneficial to perform much more aggressive background compaction,
even though it may waste a lot of CPU, in order to keep the in-memory
working set down.
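
For example (just a sketch, not something from this thread): on a recent
release you can make size-tiered compaction fire more eagerly and let
tombstone-heavy sstables be compacted on their own, at the cost of extra
CPU and I/O. The contact point and keyspace/table below are made up, and
the tombstone options only help with deletes, not plain overwrites:

from cassandra.cluster import Cluster  # DataStax Python driver

cluster = Cluster(["127.0.0.1"])  # made-up contact point
session = cluster.connect()

# Lower min_threshold so size-tiered compaction kicks in with fewer
# sstables, and allow single-sstable compactions once enough of an
# sstable is droppable tombstones. More background compaction, smaller
# on-disk (and hence in-memory) working set.
session.execute("""
    ALTER TABLE demo.events
    WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'min_threshold': '2',
        'tombstone_threshold': '0.1',
        'unchecked_tombstone_compaction': 'true'
    }
""")

cluster.shutdown()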

There was talk (I think in the compaction redesign ticket) about
potentially improving the use of bloom filters such that obsolete data
in sstables could be eliminated from the read set without
necessitating actual compaction; that might help address cases like
these too.

I don't think there's a silver bullet in any current release; you
probably have to live with greater-than-theoretically-optimal memory
requirements to keep the working set in memory.
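
And for the manual route: if you do end up kicking off major compactions
yourself, you can at least throttle them so they run in the background
with less impact on live reads. Another rough sketch (the throughput cap
and the keyspace/column family names are made up; the nodetool
subcommands exist in current releases):

import subprocess

def nodetool(*args):
    # Run a nodetool subcommand and return its output.
    result = subprocess.run(["nodetool", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout

# Cap compaction I/O (MB/s) so the major compaction doesn't starve reads.
nodetool("setcompactionthroughput", "16")

# Kick off the major compaction for the affected keyspace/column family.
nodetool("compact", "demo", "events")

# Watch it progress.
print(nodetool("compactionstats"))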

-- 
/ Peter Schuller
