I modified the code to limit the size of the SSTables. I'd be glad if someone could take a look at it:
https://github.com/Shimi/cassandra/tree/cassandra-0.6
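For context, the change is roughly the shape sketched below: when picking SSTables for a minor compaction, skip anything that has already grown past a configurable size cap, so background compaction stops producing ever-larger files. This is only my standalone illustration of the idea, not the patch itself; the names (SizeCappedCompactionSketch, MAX_SSTABLE_SIZE_BYTES, candidates) are placeholders I made up, so please read the actual diff on the branch above for the real change.

import java.util.ArrayList;
import java.util.List;

// Standalone sketch only: skip SSTables that are already bigger than a
// configurable cap when picking minor-compaction candidates. The names
// below are hypothetical and do not match the branch.
public class SizeCappedCompactionSketch
{
    // Hypothetical knob; the real change may expose this differently.
    static final long MAX_SSTABLE_SIZE_BYTES = 2L * 1024 * 1024 * 1024; // 2 GB

    static class SSTable
    {
        final String name;
        final long sizeBytes;
        SSTable(String name, long sizeBytes) { this.name = name; this.sizeBytes = sizeBytes; }
    }

    // Return the sstables small enough to take part in a minor compaction;
    // anything over the cap is left alone.
    static List<SSTable> candidates(List<SSTable> all, int minThreshold)
    {
        List<SSTable> eligible = new ArrayList<SSTable>();
        for (SSTable t : all)
            if (t.sizeBytes < MAX_SSTABLE_SIZE_BYTES)
                eligible.add(t);
        // Same guard as the one quoted below: only compact when there are
        // at least minThreshold candidates.
        if (eligible.size() < minThreshold)
            eligible.clear();
        return eligible;
    }

    public static void main(String[] args)
    {
        List<SSTable> tables = new ArrayList<SSTable>();
        tables.add(new SSTable("Standard1-1-Data.db", 100L * 1024 * 1024));
        tables.add(new SSTable("Standard1-2-Data.db", 120L * 1024 * 1024));
        tables.add(new SSTable("Standard1-3-Data.db", 5L * 1024 * 1024 * 1024)); // over the cap, skipped
        for (SSTable t : candidates(tables, 2))
            System.out.println(t.name);
    }
}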
Shimi

On Fri, Jan 7, 2011 at 2:04 AM, Jonathan Shook <jsh...@gmail.com> wrote:
> I believe the following condition within submitMinorIfNeeded(...)
> determines whether to continue, so it's not a hard loop.
>
>   // if (sstables.size() >= minThreshold) ...
>
>
> On Thu, Jan 6, 2011 at 2:51 AM, shimi <shim...@gmail.com> wrote:
> > According to the code it makes sense.
> > submitMinorIfNeeded() calls doCompaction() which
> > calls submitMinorIfNeeded().
> > With minimumCompactionThreshold = 1 submitMinorIfNeeded() will always run
> > compaction.
> >
> > Shimi
> >
> > On Thu, Jan 6, 2011 at 10:26 AM, shimi <shim...@gmail.com> wrote:
> >>
> >> On Wed, Jan 5, 2011 at 11:31 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >>>
> >>> Pretty sure there's logic in there that says "don't bother compacting
> >>> a single sstable."
> >>
> >> No. You can do it.
> >> Based on the log I have a feeling that it triggers an infinite compaction
> >> loop.
> >>
> >>> On Wed, Jan 5, 2011 at 2:26 PM, shimi <shim...@gmail.com> wrote:
> >>> > How is minor compaction triggered? Is it triggered only when a new
> >>> > SSTable is added?
> >>> >
> >>> > I was wondering if triggering a compaction with minimumCompactionThreshold
> >>> > set to 1 would be useful. If this can happen I assume it will do compaction
> >>> > on files with similar size and remove deleted rows on the rest.
> >>> >
> >>> > Shimi
> >>> >
> >>> > On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller
> >>> > <peter.schul...@infidyne.com> wrote:
> >>> >>
> >>> >> > I don't have a problem with disk space. I have a problem with the
> >>> >> > data size.
> >>> >>
> >>> >> [snip]
> >>> >>
> >>> >> > Bottom line is that I want to reduce the number of requests that go
> >>> >> > to disk. Since there is enough data that is no longer valid I can do
> >>> >> > it by reclaiming the space. The only way to do it is by running major
> >>> >> > compaction. I can wait and let Cassandra do it for me but then the
> >>> >> > data size will get even bigger and the response time will be worse.
> >>> >> > I can do it manually but I prefer it to happen in the background with
> >>> >> > less impact on the system.
> >>> >>
> >>> >> Ok - that makes perfect sense then. Sorry for misunderstanding :)
> >>> >>
> >>> >> So essentially, for workloads that are teetering on the edge of cache
> >>> >> warmness and are subject to significant overwrites or removals, it may
> >>> >> be beneficial to perform much more aggressive background compaction
> >>> >> even though it might waste lots of CPU, to keep the in-memory working
> >>> >> set down.
> >>> >>
> >>> >> There was talk (I think in the compaction redesign ticket) about
> >>> >> potentially improving the use of bloom filters such that obsolete data
> >>> >> in sstables could be eliminated from the read set without
> >>> >> necessitating actual compaction; that might help address cases like
> >>> >> these too.
> >>> >>
> >>> >> I don't think there's a pre-existing silver bullet in a current
> >>> >> release; you probably have to live with the need for
> >>> >> greater-than-theoretically-optimal memory requirements to keep the
> >>> >> working set in memory.
> >>> >>
> >>> >> --
> >>> >> / Peter Schuller
> >>>
> >>> --
> >>> Jonathan Ellis
> >>> Project Chair, Apache Cassandra
> >>> co-founder of Riptano, the source for professional Cassandra support
> >>> http://riptano.com
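To make the guard Jonathan quotes above concrete, here is a toy, self-contained model of the call cycle (my own illustration, not Cassandra's code): doCompaction() re-invokes submitMinorIfNeeded(), and the only thing that breaks the cycle is the sstables.size() >= minThreshold check. With the threshold forced to 1, the single sstable left after a compaction always passes the check, which matches the endless-compaction behaviour described above; at the default threshold the cycle stops as soon as fewer sstables remain than the threshold.

// Toy model only -- not Cassandra code. It mimics the shape of the loop:
// doCompaction() re-invokes submitMinorIfNeeded(), and the minThreshold
// guard is what stops it.
public class CompactionLoopSketch
{
    static int compactions = 0;

    static void submitMinorIfNeeded(int sstableCount, int minThreshold)
    {
        if (sstableCount >= minThreshold)          // the guard quoted above
            doCompaction(sstableCount, minThreshold);
    }

    static void doCompaction(int sstableCount, int minThreshold)
    {
        compactions++;
        if (compactions > 20)                      // safety valve so this demo terminates
        {
            System.out.println("still looping after " + compactions
                               + " compactions (minThreshold=" + minThreshold + ")");
            return;
        }
        int remaining = 1;                         // the set is merged into a single sstable
        submitMinorIfNeeded(remaining, minThreshold);
    }

    public static void main(String[] args)
    {
        submitMinorIfNeeded(4, 4);                 // default-style threshold: compacts once, then stops
        System.out.println("compactions with minThreshold=4: " + compactions);

        compactions = 0;
        submitMinorIfNeeded(4, 1);                 // threshold 1: the guard never fails
        System.out.println("compactions with minThreshold=1: " + compactions);
    }
}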