As far as I remember, in newer Cassandra versions `nodetool compact` on STCS 
offers a -s command-line option that splits the output into files of 50%, 25%, 
... of the total size, so you no longer end up with a single large SSTable. By 
default, without -s, the result is still a single SSTable.
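
To make the 50%, 25%, ... split concrete, here is a minimal sketch of how the 
output sizes could fall out. It is not Cassandra's implementation: the function 
name is hypothetical, and the cut-off below which splitting stops (50 MB here) 
is an assumption made for illustration only.

```python
def split_output_sizes(total_bytes, smallest_bytes=50_000_000):
    """Return illustrative output file sizes for a split major compaction.

    Each file takes half of what is still left (50%, 25%, 12.5%, ...).
    Splitting stops once the next half would drop below smallest_bytes,
    which is an assumed cut-off, not a documented Cassandra setting.
    """
    sizes = []
    remaining = total_bytes
    while remaining / 2 >= smallest_bytes:
        half = remaining // 2
        sizes.append(half)
        remaining -= half
    # Whatever is left over goes into one final file.
    sizes.append(remaining)
    return sizes
```

For a table holding 800 GB, the sketch yields one 400 GB file, one 200 GB file, 
and so on, instead of a single 800 GB SSTable.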

Thomas

From: Jeff Jirsa <jji...@gmail.com>
Sent: Monday, September 10, 2018 19:40
To: cassandra <user@cassandra.apache.org>
Subject: Re: Drop TTLd rows: upgradesstables -a or scrub?

I think it's important to describe exactly what's going on for people who just 
read the list but who don't have context. This blog does a really good job: 
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html 
, but briefly:

- When a TTL expires, we treat the cell as a tombstone, because it may have 
been written ON TOP of another piece of live data, so we need to get that 
deletion marker to all hosts, just like a manual explicit delete.
- Tombstones in sstable A may shadow data in sstable B, so doing anything on 
just one sstable MAY NOT remove the tombstone. We can't get rid of the 
tombstone if sstable A overlaps another sstable with the same partition (which 
we identify via bloom filter) that has any data with a lower timestamp (we 
don't check the sstable for a shadowed value, we just look at the minimum live 
timestamp of that sstable).

"nodetool garbagecollect" looks for sstables that overlap (by partition key) 
and combines them, which makes tombstones past GCGS purgeable and should 
remove them (and the data shadowed by them).
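
The purge rule described above can be modelled in a few lines. This is a 
simplified sketch, not Cassandra's code: the `SSTable` class, the function name 
and its parameters are hypothetical, and a plain set of keys stands in for the 
bloom filter.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    name: str
    partitions: set      # stand-in for the bloom filter (assumption)
    min_timestamp: int   # minimum live timestamp in this sstable


def tombstone_is_purgeable(key, tombstone_ts, local_deletion_time,
                           gc_before, compacting, all_sstables):
    """Decide whether a tombstone can be dropped by this compaction.

    compacting   -- the sstables being rewritten together
    all_sstables -- every sstable of the table on this node
    """
    # The tombstone must be older than gc_grace_seconds (GCGS).
    if local_deletion_time >= gc_before:
        return False
    compacting_names = {s.name for s in compacting}
    for other in all_sstables:
        if other.name in compacting_names:
            continue
        # An sstable outside the compaction that may hold the partition
        # and has older data could still be shadowed by the tombstone.
        if key in other.partitions and other.min_timestamp < tombstone_ts:
            return False
    return True
```

Rewriting sstable A alone keeps the tombstone as long as an overlapping sstable 
B holds older data for the same partition; compacting A and B together lets it 
be dropped.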

If you're on a version without nodetool garbagecollect, you can approximate 
it using user defined compaction ( 
http://thelastpickle.com/blog/2016/10/18/user-defined-compaction.html 
) - it's a JMX endpoint that lets you tell Cassandra to compact one or more 
sstables together based on parameters you choose. This is somewhat like 
upgradesstables or scrub, but you can combine sstables as well. If you choose 
candidates intelligently (notably, oldest sstables first, or sstables you know 
overlap), you can likely manually clean things up pretty quickly. At one point, 
I had a jar that would do a single sstable at a time, oldest sstable first, and 
it pretty much worked for this purpose most of the time.
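
The "oldest sstable first" selection could look roughly like this. It is only a 
sketch of the candidate-picking step, not the jar mentioned above: the function 
name is hypothetical, file modification time is used as a stand-in for sstable 
age (an assumption), and the code does not talk to JMX at all.

```python
from pathlib import Path


def oldest_first_candidates(table_dir, limit=1):
    """Return up to `limit` sstable data files, oldest first.

    Age is approximated by file modification time (assumption); the
    returned paths would then be handed to user defined compaction.
    """
    data_files = sorted(
        Path(table_dir).glob("*-Data.db"),
        key=lambda p: p.stat().st_mtime,
    )
    return [str(p) for p in data_files[:limit]]
```

Calling it repeatedly with limit=1 and compacting the returned file each time 
mirrors the one-sstable-at-a-time approach described above.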

If you have room, a "nodetool compact" on STCS will also work, but it'll give 
you one huge sstable, which will be unfortunate long term (probably less of a 
problem if you're no longer writing to this table).


On Mon, Sep 10, 2018 at 10:29 AM Charulata Sharma (charshar) 
<chars...@cisco.com.invalid> wrote:
Scrub takes a very long time and does not remove the tombstones. You should do 
garbage cleaning. It immediately removes the tombstones.

Thanks,
Charu

From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, September 10, 2018 at 6:53 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Drop TTLd rows: upgradesstables -a or scrub?

Hello,

We have some tables with a significant number of TTLd rows that have expired by 
now (and more than gc_grace_seconds has passed since the TTL).  We have stopped 
writing more data to these tables quite a while ago, so background compaction 
isn't running.  The compaction strategy is the default SizeTiered one.

Now we would like to get rid of all the droppable tombstones in these tables.  
What would be the approach that puts the least stress on the cluster?

We've considered a few, but the most promising ones seem to be these two: 
`nodetool scrub` or `nodetool upgradesstables -a`.  We are using Cassandra 
version 3.0.

Now, this docs page recommends using upgradesstables wherever possible: 
https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsScrub.html
What is the reason behind it?

From the source code I can see that Scrubber is the class which is going to 
drop the tombstones (and report the total number in the logs): 
https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/compaction/Scrubber.java#L308

I couldn't find similar handling in the upgradesstables code path.  Is the 
assumption correct that this one will not drop the tombstones as a side effect 
of rewriting the files?

Any drawbacks of using scrub for this task?

Thanks,
--
Oleksandr "Alex" Shulgin | Senior Software Engineer | Team Flux | Data Services 
| Zalando SE | Tel: +49 176 127-59-707

