Re: Manual Compaction in Production

Jonathan Ellis Mon, 08 Nov 2010 18:34:08 -0800

On Mon, Nov 8, 2010 at 8:23 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> I am using a build with support for removing tombstones during minor
> compacts. I am pretty happy to see SSTables shrink during non-major
> compactions. If I understand correctly, bloom filters have false
> positives, so a key may appear to be in other SSTables and not be
> removed by minor compaction.


Right, but (a) the false positive rate is normally well under 0.1%,
and (b) compaction changes the set of FPs you get, so the 0.1% (say)
you aren't able to collect in minor compaction A is likely to be
removed tomorrow during minor compaction B.
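
To put rough numbers on (b), here is a minimal sketch. It assumes, as an idealization, that every minor compaction draws an independent set of bloom filter false positives at the same rate; the function name and the model are illustrative, not anything in Cassandra itself.

```python
def surviving_fraction(fp_rate: float, compactions: int) -> float:
    """Fraction of otherwise-collectable tombstones still blocked after
    `compactions` minor compactions.

    Assumption (not from Cassandra): each compaction sees an independent
    set of false positives, each occurring with probability `fp_rate`.
    """
    if not 0.0 <= fp_rate <= 1.0:
        raise ValueError("fp_rate must be between 0 and 1")
    if compactions < 0:
        raise ValueError("compactions must be non-negative")
    # A tombstone survives only if it hits a false positive every time.
    return fp_rate ** compactions


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, surviving_fraction(0.001, n))
```

Under that assumption, a 0.1% miss rate per compaction drops to roughly one in a million after two compactions that touch the same key.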

> Also, I have no data to back this up, but consider nodes that hold a
> lot of data (~400 GB) while the daily data inserted is only ~1 GB/day.
> It may be many days from the time of the delete request until the
> SSTables with the key get even minor compacted.

Sure, but if you're only inserting 1GB/day then you can afford to
wait.  Sort of a self-fixing problem.

> Wouldn't these two scenarios (and possibly others) still require major
> compaction to bring you down to the lowest possible disk utilization?

If you're so close to maxing out your disk space that you need to do
major compactions to recover, then you should usually get more disk
space.  It's your cheapest resource, certainly cheaper than adding
enough i/o capacity that major compactions are negligible.

Another option would be to tune minor compactions to be more
aggressive -- today that means lowering the min compaction threshold;
https://issues.apache.org/jira/browse/CASSANDRA-1083 also needs some
more attention.
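
As a rough illustration of what lowering the min compaction threshold buys you, here is a toy model. It assumes equal-sized flushes and that a compaction fires as soon as `min_threshold` sstables of the same size tier exist; the real compaction logic is more involved, and `simulate` is a hypothetical helper, not a Cassandra API.

```python
from collections import Counter


def simulate(flushes: int, min_threshold: int) -> tuple[int, int]:
    """Return (compactions_run, sstables_left) after `flushes` flushes.

    Toy model (an assumption, not Cassandra's actual algorithm): all
    flushed sstables are the same size, and whenever `min_threshold`
    sstables of one tier exist they are merged into a single sstable
    of the next tier.
    """
    if min_threshold < 2:
        raise ValueError("min_threshold must be at least 2")
    tiers = Counter()  # tier number -> count of sstables in that tier
    compactions = 0
    for _ in range(flushes):
        tiers[0] += 1
        tier = 0
        # Cascade merges upward while a tier has enough sstables.
        while tiers[tier] >= min_threshold:
            tiers[tier] -= min_threshold
            tiers[tier + 1] += 1
            compactions += 1
            tier += 1
    return compactions, sum(tiers.values())


if __name__ == "__main__":
    for threshold in (4, 2):
        print(threshold, simulate(16, threshold))
```

In this model, 16 flushes at a threshold of 4 trigger 5 compactions, while a threshold of 2 triggers 15. Data gets rewritten (and tombstones get a chance to be collected) much more often, at the cost of more compaction i/o.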

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
