On Mon, 10 Sep 2018, 19:40 Jeff Jirsa, <jji...@gmail.com> wrote:

> I think it's important to describe exactly what's going on for people who just read the list but who don't have context. This blog does a really good job: http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html , but briefly:
>
> - When a TTL expires, we treat it as a tombstone, because it may have been written ON TOP of another piece of live data, so we need to get that deletion marker to all hosts, just like a manual explicit delete.
> - Tombstones in sstable A may shadow data in sstable B, so doing anything on just one sstable MAY NOT remove the tombstone. We can't get rid of the tombstone if sstable A overlaps another sstable containing the same partition (which we identify via the bloom filter) that has any data with a lower timestamp. We don't check that sstable for an actually shadowed value; we just look at its minimum timestamp.
>
> "nodetool garbagecollect" looks for sstables that overlap (by partition key) and combines them, which makes tombstones past GCGS (gc_grace_seconds) purgeable and should remove them (and the data shadowed by them).
>
> If you're on a version without nodetool garbagecollect, you can approximate it using user-defined compaction (http://thelastpickle.com/blog/2016/10/18/user-defined-compaction.html). It's a JMX endpoint that lets you tell Cassandra to compact one or more sstables together based on parameters you choose. This is somewhat like upgradesstables or scrub, but you can combine sstables as well. If you choose candidates intelligently (notably, oldest sstables first, or sstables you know overlap), you can likely clean things up manually pretty quickly. At one point, I had a jar that would compact a single sstable at a time, oldest sstable first, and it pretty much worked for this purpose most of the time.
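The purge rule from the second bullet can be modelled roughly as below. This is a simplified sketch, not Cassandra's own compaction code: the `Tombstone` and `SSTable` records, `mightContain` (an exact key set standing in for the bloom filter, which in reality can give false positives) and `isPurgeable` are hypothetical names. It assumes write timestamps in microseconds and deletion times in seconds; for an expired TTL the deletion time is the moment the TTL ran out.

```java
import java.util.List;
import java.util.Set;

/** Simplified, hypothetical model of the tombstone purge rule. */
public class PurgeCheck {

    /** A tombstone (explicit delete or expired TTL) for one partition. */
    record Tombstone(String partitionKey, long markedForDeleteAtMicros, long localDeletionTimeSeconds) {}

    /** Another sstable; the exact key set stands in for its bloom filter. */
    record SSTable(String name, Set<String> partitionKeys, long minTimestampMicros) {
        boolean mightContain(String key) {
            return partitionKeys.contains(key);
        }
    }

    /**
     * A tombstone may be dropped only if gc_grace_seconds have passed since it
     * took effect and no other sstable that might hold the same partition has
     * data at least as old as the tombstone.
     */
    static boolean isPurgeable(Tombstone t, List<SSTable> others,
                               long nowSeconds, long gcGraceSeconds) {
        long gcBefore = nowSeconds - gcGraceSeconds;
        if (t.localDeletionTimeSeconds() >= gcBefore) {
            return false; // still inside gc_grace_seconds
        }
        for (SSTable s : others) {
            // Only the minimum timestamp is compared; we never look for the
            // shadowed value itself.
            if (s.mightContain(t.partitionKey())
                    && s.minTimestampMicros() <= t.markedForDeleteAtMicros()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long now = 1_536_600_000L;  // seconds
        long gcGrace = 864_000L;    // 10 days, the default gc_grace_seconds
        // Row written 40 days ago whose TTL expired 30 days ago.
        Tombstone t = new Tombstone("k1", (now - 40L * 86_400) * 1_000_000, now - 30L * 86_400);
        SSTable newer = new SSTable("B", Set.of("k1"), (now - 35L * 86_400) * 1_000_000);
        SSTable older = new SSTable("C", Set.of("k1"), (now - 60L * 86_400) * 1_000_000);
        System.out.println(isPurgeable(t, List.of(newer), now, gcGrace)); // true
        System.out.println(isPurgeable(t, List.of(older), now, gcGrace)); // false
    }
}
```

Because only timestamps are compared, a single overlapping sstable holding older data for the partition keeps the tombstone alive, whether or not it actually contains the shadowed value.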
>
> If you have room, a "nodetool compact" on STCS will also work, but it'll give you one huge sstable, which will be unfortunate long term (probably less of a problem if you're no longer writing to this table).
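A rough sketch of the "single sstable at a time, oldest first" approach over JMX might look like this. It is not Jeff's jar. It assumes the 3.0 data file naming `<version>-<generation>-big-Data.db` and uses the generation number as a proxy for age, targets the `forceUserDefinedCompaction(String)` operation of the `org.apache.cassandra.db:type=CompactionManager` MBean on the default JMX port 7199, and assumes it runs on the node itself with JMX authentication disabled. The class and helper names are hypothetical.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper: user-defined compaction, one sstable per call, oldest first. */
public class OldestFirstCompaction {

    // Assumes the 3.0 naming pattern <version>-<generation>-big-Data.db.
    private static final Pattern DATA_FILE = Pattern.compile("^[a-z]{2}-(\\d+)-big-Data\\.db$");

    /** Returns the generation number, or -1 if the file name does not match. */
    static long generation(Path dataFile) {
        Matcher m = DATA_FILE.matcher(dataFile.getFileName().toString());
        return m.matches() ? Long.parseLong(m.group(1)) : -1;
    }

    /** Keeps only Data.db files and orders them by generation, lowest first. */
    static List<Path> oldestFirst(List<Path> files) {
        List<Path> result = new ArrayList<>();
        for (Path p : files) {
            if (generation(p) >= 0) {
                result.add(p);
            }
        }
        result.sort(Comparator.comparingLong(OldestFirstCompaction::generation));
        return result;
    }

    /** Invokes forceUserDefinedCompaction once per data file, in the given order. */
    static void compactEach(String host, int port, List<Path> dataFiles) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName compactionManager =
                    new ObjectName("org.apache.cassandra.db:type=CompactionManager");
            for (Path file : dataFiles) {
                // The operation takes a comma-separated list of data files;
                // passing a single one compacts that sstable on its own.
                mbs.invoke(compactionManager, "forceUserDefinedCompaction",
                        new Object[] {file.toString()},
                        new String[] {String.class.getName()});
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Arguments: <host> <port> <Data.db paths...>
        List<Path> files = new ArrayList<>();
        for (int i = 2; i < args.length; i++) {
            files.add(Path.of(args[i]));
        }
        compactEach(args[0], Integer.parseInt(args[1]), oldestFirst(files));
    }
}
```

For example, `java OldestFirstCompaction 127.0.0.1 7199 /var/lib/cassandra/data/ks/tbl-<id>/mc-*-big-Data.db` (paths illustrative). Rewriting one sstable at a time keeps the extra disk space needed small, but it does not merge overlapping sstables, so tombstones that shadow older data in another sstable may survive.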
That's a really nice refresher, thanks Jeff!

From the nature of the data at hand and because of the SizeTiered compaction, I would expect that more or less all sstables overlap with each other. Even if we were able to identify the overlapping ones (how?), I expect that we would have to do the equivalent of a major compaction, but (maybe) in multiple stages. Not sure that's really worth the trouble for us.

Thanks,
--
Alex

> On Mon, Sep 10, 2018 at 10:29 AM Charulata Sharma (charshar) <chars...@cisco.com.invalid> wrote:
>
>> Scrub takes a very long time and does not remove the tombstones. You should run garbage collection (nodetool garbagecollect) instead. It removes the tombstones immediately.
>>
>> Thanks,
>>
>> Charu
>>
>> From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Monday, September 10, 2018 at 6:53 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Drop TTLd rows: upgradesstables -a or scrub?
>>
>> Hello,
>>
>> We have some tables with a significant number of TTL'd rows that have expired by now (and more than gc_grace_seconds have passed since the TTL expired). We stopped writing data to these tables quite a while ago, so background compaction isn't running. The compaction strategy is the default SizeTiered one.
>>
>> Now we would like to get rid of all the droppable tombstones in these tables. Which approach puts the least stress on the cluster?
>>
>> We've considered a few, but the most promising ones seem to be these two: `nodetool scrub` or `nodetool upgradesstables -a`. We are using Cassandra version 3.0.
>>
>> Now, this docs page recommends using upgradesstables wherever possible: https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsScrub.html
>>
>> What is the reason behind it?
>>
>> From the source code I can see that Scrubber is the class that drops the tombstones (and reports the total number in the logs): https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/compaction/Scrubber.java#L308
>>
>> I couldn't find similar handling in the upgradesstables code path. Is the assumption correct that upgradesstables will not drop the tombstones as a side effect of rewriting the files?
>>
>> Are there any drawbacks to using scrub for this task?
>>
>> Thanks,
>> --
>>
>> Oleksandr "Alex" Shulgin | Senior Software Engineer | Team Flux | Data Services | Zalando SE | Tel: +49 176 127-59-707