The rules for tombstone eviction are as follows (regardless of your compaction strategy):
1. gc_grace must have expired, and
2. no other fragments of the row may exist outside the SSTables that are participating in the compaction.

For LCS, there is no rule that tombstones can only be evicted at the highest level. They can be evicted at whichever level the row converges on. Depending on your use case, this may mean it always happens at L4, or it may mean that it most often happens at L1 or L2.

On Fri, Nov 9, 2012 at 7:31 AM, Mina Naguib <mina.nag...@adgear.com> wrote:

> On 2012-11-08, at 1:12 PM, B. Todd Burruss <bto...@gmail.com> wrote:
>
> > we are having the problem where we have huge SSTables with tombstoned
> > data in them that is not being compacted soon enough (because size-tiered
> > compaction requires, by default, 4 like-sized SSTables). this is using
> > more disk space than we anticipated.
> >
> > we are very write heavy compared to reads, and we delete the data after
> > N number of days (depends on the column family, but N is around 7 days)
> >
> > my question is: would leveled compaction help to get rid of the
> > tombstoned data faster than size-tiered, and therefore reduce the disk
> > space usage?
>
> From my experience, levelled compaction makes space reclamation after
> deletes even less predictable than size-tiered.
>
> The reason is that deletes, like all mutations, are just recorded into
> sstables. They enter level0 and slowly, over time, get promoted upwards
> to levelN.
>
> Depending on your *total* mutation volume vs. your data set size, this may
> be quite a slow process. This is made even worse if the data you're
> deleting (say, an entire row worth several hundred kilobytes) is deleted
> by a small row-level tombstone. If the row is sitting in level 4, the
> tombstone won't impact it until enough new data has pushed out all the
> existing data in level3, level2, level1 and level0.
>
> Finally, to guard against the tombstone missing any data, the tombstone
> itself is not a candidate for removal (I believe even after gc_grace has
> passed) unless it has reached the highest populated level in levelled
> compaction. This means that if you have 4 levels and issue a ton of deletes
> (even deletes that will never impact existing data), these tombstones are
> dead weight that cannot be purged until they hit level4.
>
> For a write-heavy workload, I recommend you stick with size-tiered. You
> have several options at your disposal (compaction min/max thresholds,
> gc_grace) to move things along. If that doesn't help, I've heard of some
> fairly reputable people doing some fairly blasphemous things (major
> compactions every night).

--
Ben Coverston
DataStax -- The Apache Cassandra Company
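The two eviction rules at the top of this reply can be sketched as a single predicate. This is a simplified model, not Cassandra's actual code: the `SSTable` and `Tombstone` types and the `can_evict_tombstone` function are hypothetical, and an SSTable is reduced to a name plus the set of row keys it holds fragments for. Cassandra itself decides rule 2 with per-SSTable metadata such as bloom filters and timestamps rather than exact key sets. The example also shows the LCS point: the tombstone becomes evictable only once the compaction includes every SSTable where the row lives, whichever level that is.

```python
from dataclasses import dataclass

DAY = 86_400
GC_GRACE = 10 * DAY  # 864000 s, Cassandra's default gc_grace_seconds


# Hypothetical, simplified model of an SSTable: a name plus the row keys it
# holds fragments for.
@dataclass(frozen=True)
class SSTable:
    name: str
    row_keys: frozenset


@dataclass(frozen=True)
class Tombstone:
    row_key: str
    deleted_at: int  # seconds since epoch when the delete was written


def can_evict_tombstone(tombstone, now, gc_grace_seconds, compacting, all_sstables):
    """Return True if the tombstone may be dropped during this compaction."""
    # Rule 1: gc_grace must have expired since the delete was written.
    if now - tombstone.deleted_at < gc_grace_seconds:
        return False
    # Rule 2: no SSTable outside the compaction may still hold a fragment of
    # the row; otherwise dropping the tombstone would resurrect that data.
    compacting_names = {s.name for s in compacting}
    for sstable in all_sstables:
        if sstable.name not in compacting_names and tombstone.row_key in sstable.row_keys:
            return False
    return True


l0 = SSTable("l0-1", frozenset({"r1"}))        # holds the tombstone
l1 = SSTable("l1-7", frozenset({"r1", "r2"}))
l4 = SSTable("l4-3", frozenset({"r1"}))        # older copy of the row
ts = Tombstone("r1", deleted_at=0)
everything = [l0, l1, l4]

print(can_evict_tombstone(ts, 11 * DAY, GC_GRACE, [l0, l1], everything))      # False: l4 still has r1
print(can_evict_tombstone(ts, 11 * DAY, GC_GRACE, everything, everything))    # True
print(can_evict_tombstone(ts, 5 * DAY, GC_GRACE, everything, everything))     # False: gc_grace not expired
```

In this model the level number never appears in the check. Only the set of SSTables that hold the row matters, which is why eviction can happen at L1 or L2 when the row has converged there.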
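The original question notes that size-tiered compaction waits for, by default, 4 like-sized SSTables. The sketch below is a simplified rendering of that bucketing idea. It assumes the commonly documented defaults `min_threshold=4`, `bucket_low=0.5` and `bucket_high=1.5`. It omits details of the real strategy such as `min_sstable_size`, `max_threshold` and candidate prioritisation, and the function names are hypothetical. It shows why one huge SSTable full of tombstones can sit untouched: it has no peers of similar size to form a bucket with.

```python
def bucket_sstables(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of 'similar' size (simplified STCS)."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # Join the first bucket whose average is close enough to this size.
            if avg * bucket_low < size < avg * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no similar bucket: start a new one
    return buckets


def compaction_candidates(sizes, min_threshold=4, bucket_low=0.5, bucket_high=1.5):
    """Buckets with at least min_threshold SSTables are eligible for compaction."""
    return [b for b in bucket_sstables(sizes, bucket_low, bucket_high)
            if len(b) >= min_threshold]


# Hypothetical SSTable sizes in GB: one huge table plus four small ones.
sizes = [100, 10, 11, 12, 9]
print(bucket_sstables(sizes))                        # [[9, 10, 11, 12], [100]]
print(compaction_candidates(sizes))                  # [[9, 10, 11, 12]]
print(compaction_candidates(sizes, min_threshold=2)) # [[9, 10, 11, 12]]
```

Even with a lower `min_threshold`, the 100 GB SSTable stays in a bucket of one, so its tombstones are only purged once similar-sized SSTables accumulate or a major compaction is forced.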