On Mon, Nov 8, 2010 at 8:23 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> I am using a build with support for removing tombstones during minor
> compactions. I am pretty happy to see SSTables shrink during non-major
> compactions. If I understand correctly, bloom filters have false
> positives, so a key may appear to be in other SSTables and not be
> removed by minor compaction.

Right, but (a) the false positive rate is normally well under 0.1%, and
(b) compaction changes the set of FPs you get, so the 0.1% (say) you
aren't able to collect in minor compaction A is likely to be removed
tomorrow during minor compaction B.

> Also, I have no data to back this up, but when nodes hold a lot of
> data (~400 GB) and the daily inserts are only ~1 GB/day, it may be many
> days from the time of the delete request until the SSTables containing
> the key get even minor compacted.

Sure, but if you're only inserting 1 GB/day then you can afford to wait.
Sort of a self-fixing problem.

> Wouldn't these two scenarios (and possibly others) still require major
> compaction to bring you down to the lowest possible disk utilization?

If you're so close to maxing out your disk space that you need to do
major compactions to recover, then you should usually get more disk
space. It's your cheapest resource, certainly cheaper than adding
enough i/o capacity that major compactions are negligible.

Another option would be to tune minor compactions to be more
aggressive -- today that means lowering the min compaction threshold;
https://issues.apache.org/jira/browse/CASSANDRA-1083 also needs some
more attention.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
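
To put rough numbers on point (b), here is a minimal Python sketch. It assumes each minor compaction that touches a tombstone independently fails to collect it with probability equal to the bloom filter false positive rate. That independence is a simplifying assumption (the reply only says a later compaction is "likely" to succeed), and the function name is made up for illustration; it is not part of Cassandra.

```python
def uncollected_fraction(fp_rate: float, compactions: int) -> float:
    """Expected fraction of tombstones still uncollected after the given
    number of minor compactions, under the independence assumption."""
    if not 0.0 <= fp_rate <= 1.0:
        raise ValueError("fp_rate must be between 0 and 1")
    if compactions < 0:
        raise ValueError("compactions must be non-negative")
    # Each compaction misses a tombstone with probability fp_rate, so
    # missing it in every one of n compactions has probability fp_rate ** n.
    return fp_rate ** compactions


if __name__ == "__main__":
    fp_rate = 0.001  # the 0.1% upper bound mentioned in the reply
    for n in range(1, 4):
        remaining = uncollected_fraction(fp_rate, n)
        print(f"after {n} minor compaction(s): {remaining:.1e} uncollected")
```

Under this model the leftover fraction shrinks by a factor of about a thousand with each additional compaction that touches the key, which is why the reply treats false positives as a problem that fixes itself over time.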