On Wed, Jan 5, 2011 at 4:31 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> Pretty sure there's logic in there that says "don't bother compacting
> a single sstable."
>
> On Wed, Jan 5, 2011 at 2:26 PM, shimi <shim...@gmail.com> wrote:
>> How does minor compaction is triggered? Is it triggered Only when a new
>> SStable is added?
>>
>> I was wondering if triggering a compaction with minimumCompactionThreshold
>> set to 1 would be useful. If this can happen I assume it will do compaction
>> on files with similar size and remove deleted rows on the rest.
>> Shimi
>> On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller <peter.schul...@infidyne.com>
>> wrote:
>>>
>>> > I don't have a problem with disk space. I have a problem with the data
>>> > size.
>>>
>>> [snip]
>>>
>>> > Bottom line is that I want to reduce the number of requests that goes to
>>> > disk. Since there is enough data that is no longer valid I can do it by
>>> > reclaiming the space. The only way to do it is by running Major
>>> > compaction.
>>> > I can wait and let Cassandra do it for me but then the data size will
>>> > get
>>> > even bigger and the response time will be worst. I can do it manually
>>> > but I
>>> > prefer it to happen in the background with less impact on the system
>>>
>>> Ok - that makes perfect sense then. Sorry for misunderstanding :)
>>>
>>> So essentially, for workloads that are teetering on the edge of cache
>>> warmness and is subject to significant overwrites or removals, it may
>>> be beneficial to perform much more aggressive background compaction
>>> even though it might waste lots of CPU, to keep the in-memory working
>>> set down.
>>>
>>> There was talk (I think in the compaction redesign ticket) about
>>> potentially improving the use of bloom filters such that obsolete data
>>> in sstables could be eliminated from the read set without
>>> necessitating actual compaction; that might help address cases like
>>> these too.
>>>
>>> I don't think there's a pre-existing silver bullet in a current
>>> release; you probably have to live with the need for
>>> greater-than-theoretically-optimal memory requirements to keep the
>>> working set in memory.
>>>
>>> --
>>> / Peter Schuller
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

I was wondering whether it would make sense to have a JMX operation that can compact a list of sstables by file name. That would give power users more options than compacting the entire keyspace.
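
To make the idea a bit more concrete, here is a rough sketch of what such an operation might look like from the JMX side. Everything Cassandra-specific in it is an assumption on my part: the interface `SSTableCompactionMBean`, the operation `compactFiles`, the object name `org.example.sketch:type=SSTableCompaction`, and the file names are all made up for illustration and do not exist in any release. The stand-in implementation only records which files were requested and does not compact anything; only the `javax.management` calls are real.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class CompactByFileSketch {

    /** Hypothetical management interface; not an existing Cassandra MBean. */
    public interface SSTableCompactionMBean {
        /** Requests compaction of just the named data files; returns how many names were accepted. */
        int compactFiles(String commaSeparatedFileNames);
    }

    /** Stand-in implementation: it only records the request, it compacts nothing. */
    public static class SSTableCompaction implements SSTableCompactionMBean {
        private final List<String> requested = new ArrayList<String>();

        @Override
        public synchronized int compactFiles(String commaSeparatedFileNames) {
            int accepted = 0;
            for (String name : commaSeparatedFileNames.split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    // A real version would hand these files to the compaction code here.
                    requested.add(trimmed);
                    accepted++;
                }
            }
            return accepted;
        }
    }

    /** Registers the bean, invokes the operation the way a JMX client would, then unregisters it. */
    public static int demo(String fileNames) {
        try {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            // Made-up object name, used only for this sketch.
            ObjectName name = new ObjectName("org.example.sketch:type=SSTableCompaction");
            mbs.registerMBean(
                    new StandardMBean(new SSTableCompaction(), SSTableCompactionMBean.class), name);
            try {
                Object result = mbs.invoke(name, "compactFiles",
                        new Object[] { fileNames },
                        new String[] { String.class.getName() });
                return (Integer) result;
            } finally {
                mbs.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sstable data file names.
        System.out.println(demo("MyCF-1-Data.db, MyCF-2-Data.db"));
    }
}
```

Taking a single comma-separated string keeps the operation easy to call from jconsole; a String[] parameter would work just as well for programmatic clients.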