Pretty sure there's logic in there that says "don't bother compacting a single sstable."
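
A rough sketch of what such a check could look like, for anyone following the thread. This is not Cassandra's actual code: the function names, the 0.5x/1.5x "similar size" window, and the guard itself are assumptions made for illustration. It groups sstables into buckets of similar size and only proposes a bucket for minor compaction once it holds at least the minimum threshold of files, never fewer than two.

```python
from typing import List


def bucket_by_size(sizes: List[int], low: float = 0.5, high: float = 1.5) -> List[List[int]]:
    """Group sstable sizes into buckets of roughly similar size.

    A size joins the first bucket whose current average is within
    [low * avg, high * avg]; otherwise it starts a new bucket.
    The window bounds are illustrative assumptions.
    """
    buckets: List[List[int]] = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def pick_compaction_candidates(sizes: List[int], min_threshold: int = 4) -> List[List[int]]:
    """Return the buckets that would qualify for a minor compaction.

    Hypothetical guard: even if min_threshold is set to 1, a bucket
    holding a single sstable is skipped.
    """
    effective = max(min_threshold, 2)
    return [b for b in bucket_by_size(sizes) if len(b) >= effective]


if __name__ == "__main__":
    sstable_sizes = [100, 110, 95, 105, 5000]  # made-up sizes
    print(pick_compaction_candidates(sstable_sizes, min_threshold=4))
    print(pick_compaction_candidates(sstable_sizes, min_threshold=1))
```

Under these assumptions, lowering the threshold to 1 would not cause the lone large sstable to be rewritten on its own, so its deleted rows would still have to wait for a major compaction.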

On Wed, Jan 5, 2011 at 2:26 PM, shimi <shim...@gmail.com> wrote:
> How is minor compaction triggered? Is it triggered only when a new
> SSTable is added?
>
> I was wondering if triggering a compaction with minimumCompactionThreshold
> set to 1 would be useful. If this can happen I assume it will do compaction
> on files with similar size and remove deleted rows on the rest.
> Shimi
> On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller <peter.schul...@infidyne.com>
> wrote:
>>
>> > I don't have a problem with disk space. I have a problem with the data
>> > size.
>>
>> [snip]
>>
>> > Bottom line is that I want to reduce the number of requests that go to
>> > disk. Since there is enough data that is no longer valid I can do it by
>> > reclaiming the space. The only way to do it is by running Major
>> > compaction.
>> > I can wait and let Cassandra do it for me but then the data size will
>> > get even bigger and the response time will be worse. I can do it
>> > manually but I prefer it to happen in the background with less impact
>> > on the system
>>
>> Ok - that makes perfect sense then. Sorry for misunderstanding :)
>>
>> So essentially, for workloads that are teetering on the edge of cache
>> warmness and are subject to significant overwrites or removals, it may
>> be beneficial to perform much more aggressive background compaction
>> even though it might waste lots of CPU, to keep the in-memory working
>> set down.
>>
>> There was talk (I think in the compaction redesign ticket) about
>> potentially improving the use of bloom filters such that obsolete data
>> in sstables could be eliminated from the read set without
>> necessitating actual compaction; that might help address cases like
>> these too.
>>
>> I don't think there's a pre-existing silver bullet in a current
>> release; you probably have to live with the need for
>> greater-than-theoretically-optimal memory requirements to keep the
>> working set in memory.
>>
>> --
>> / Peter Schuller
>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com