I think it's important to describe exactly what's going on for people who
just read the list but who don't have context. This blog does a really good
job:
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html,
but briefly:

- When a TTL expires, we treat it as a tombstone, because it may have been
written ON TOP of another piece of live data, so we need to get that
deletion marker to all hosts, just like a manual explicit delete
- Tombstones in sstable A may shadow data in sstable B, so doing anything
on just one sstable MAY NOT remove the tombstone - we can't get rid of the
tombstone if sstable A overlaps another sstable containing the same
partition (which we identify via bloom filter) that has any data with a
lower (older) timestamp - we don't check that sstable for the actual
shadowed value, we just look at its minimum live timestamp (see the sketch
below)
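
To make that second point concrete, here's a minimal sketch of the purge
decision - this is illustrative Java, not Cassandra's actual code, and the
SSTableInfo / TombstonePurgeCheck names are made up: a tombstone is only
droppable once it's past gc_grace_seconds AND no sstable outside the
compaction might still hold older data it shadows, judged only by bloom
filter plus minimum timestamp.

    // Simplified sketch of the purge rule above -- NOT Cassandra's real code;
    // the class, field, and method names are invented for illustration.
    import java.util.List;

    class SSTableInfo {
        long minLiveTimestamp;   // oldest write timestamp of any data in this sstable

        boolean mightContainPartition(byte[] partitionKey) {
            return true;         // stands in for the bloom filter check
        }
    }

    class TombstonePurgeCheck {
        // A tombstone is droppable only if it's past gc_grace AND no sstable
        // outside the compaction could still hold older data that it shadows.
        // Only the bloom filter and the other sstable's minimum timestamp are
        // consulted -- the actual shadowed cell is never looked up.
        static boolean isPurgeable(long tombstoneWriteTimestamp,
                                   int tombstoneLocalDeletionTime,  // seconds since epoch
                                   int gcBefore,                    // now - gc_grace_seconds
                                   byte[] partitionKey,
                                   List<SSTableInfo> sstablesOutsideCompaction) {
            if (tombstoneLocalDeletionTime >= gcBefore)
                return false;    // still within gc_grace_seconds
            for (SSTableInfo other : sstablesOutsideCompaction) {
                if (other.mightContainPartition(partitionKey)
                        && other.minLiveTimestamp < tombstoneWriteTimestamp)
                    return false;   // might still shadow older data elsewhere
            }
            return true;
        }
    }

This is why a tombstone can stick around indefinitely if the sstable
holding it never ends up in a compaction with all the older sstables it
overlaps.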

"nodetool garbagecollect" looks for sstables that overlap (partition keys)
and combine them together, which makes tombstones past GCGS purgable and
should remove them (and data shadowed by them).

If you're on a version without nodetool garbagecollect, you can
approximate it using user defined compaction (
http://thelastpickle.com/blog/2016/10/18/user-defined-compaction.html ) -
it's a JMX endpoint that lets you tell Cassandra to compact one or more
sstables together based on parameters you choose. This is somewhat like
upgradesstables or scrub, but you can combine sstables as well. If you
choose candidates intelligently (notably, oldest sstables first, or
sstables you know overlap), you can likely clean things up manually pretty
quickly. At one point, I had a jar that would compact a single sstable at
a time, oldest sstable first, and it pretty much worked for this purpose
most of the time (a sketch of that approach is below).
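
For what it's worth, that jar boiled down to roughly the following - a
minimal sketch, assuming the default JMX port 7199 and a placeholder
sstable path; the nested CompactionManagerMBean interface here is a
hand-rolled stand-in for Cassandra's
org.apache.cassandra.db.compaction.CompactionManagerMBean, and
forceUserDefinedCompaction takes a comma-separated list of -Data.db paths:

    // Minimal sketch of driving user defined compaction over JMX.
    // Host, port, and the sstable path are placeholders - adjust for your node.
    import javax.management.JMX;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class UserDefinedCompaction {
        // Stand-in for Cassandra's CompactionManagerMBean; the JMX proxy only
        // needs the operation name and signature to match the server side.
        public interface CompactionManagerMBean {
            void forceUserDefinedCompaction(String dataFiles);
        }

        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                CompactionManagerMBean cm = JMX.newMBeanProxy(
                        mbs,
                        new ObjectName("org.apache.cassandra.db:type=CompactionManager"),
                        CompactionManagerMBean.class);

                // Compact the oldest sstable by itself; pass several
                // comma-separated -Data.db paths to combine sstables you know
                // overlap.
                cm.forceUserDefinedCompaction(
                        "/var/lib/cassandra/data/my_ks/my_table-<id>/mc-1-big-Data.db");
            } finally {
                connector.close();
            }
        }
    }

Loop that over sstables oldest-first, one (or a few) at a time, and keep an
eye on "nodetool compactionstats" so you don't queue up more compactions
than the node can chew through.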

If you have room, a "nodetool compact" on STCS will also work, but it'll
give you one huge sstable, which will be unfortunate long term (probably
less of a problem if you're no longer writing to this table).


On Mon, Sep 10, 2018 at 10:29 AM Charulata Sharma (charshar)
<chars...@cisco.com.invalid> wrote:

> Scrub takes a very long time and does not remove the tombstones. You
> should do garbage collection (nodetool garbagecollect). It immediately
> removes the tombstones.
>
>
>
> Thanks,
>
> Charu
>
>
>
> *From: *Oleksandr Shulgin <oleksandr.shul...@zalando.de>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Monday, September 10, 2018 at 6:53 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Drop TTLd rows: upgradesstables -a or scrub?
>
>
>
> Hello,
>
>
>
> We have some tables with a significant amount of TTLd rows that have
> expired by now (and more than gc_grace_seconds have passed since the
> TTL).  We have
> stopped writing more data to these tables quite a while ago, so background
> compaction isn't running.  The compaction strategy is the default
> SizeTiered one.
>
>
>
> Now we would like to get rid of all the droppable tombstones in these
> tables.  What would be the approach that puts the least stress on the
> cluster?
>
>
>
> We've considered a few, but the most promising ones seem to be these two:
> `nodetool scrub` or `nodetool upgradesstables -a`.  We are using Cassandra
> version 3.0.
>
>
>
> Now, this docs page recommends using upgradesstables wherever possible:
> https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsScrub.html
>
> What is the reason behind it?
>
>
>
> From the source code I can see that Scrubber is the class which is going
> to drop the tombstones (and report the total number in the logs):
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/compaction/Scrubber.java#L308
>
>
>
> I couldn't find similar handling in the upgradesstables code path.  Is the
> assumption correct that this one will not drop the tombstone as a side
> effect of rewriting the files?
>
>
>
> Any drawbacks of using scrub for this task?
>
>
>
> Thanks,
> --
>
> Oleksandr "Alex" Shulgin | Senior Software Engineer | Team Flux | Data
> Services | Zalando SE | Tel: +49 176 127-59-707
>
>
>
