As far as I remember, in newer Cassandra versions with STCS, nodetool compact offers a '-s' command-line option that splits the output into several files of 50%, 25% … of the total size, so you no longer end up with a single large SSTable. By default, without -s, the result is still a single SSTable.

Thomas

From: Jeff Jirsa <jji...@gmail.com>
Sent: Monday, 10 September 2018 19:40
To: cassandra <user@cassandra.apache.org>
Subject: Re: Drop TTLd rows: upgradesstables -a or scrub?

I think it's important to describe exactly what's going on for people who just read the list but who don't have context. This blog does a really good job: http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html , but briefly:

- When a TTL expires, we treat it as a tombstone, because it may have been written ON TOP of another piece of live data, so we need to get that deletion marker to all hosts, just like a manual explicit delete.
- Tombstones in sstable A may shadow data in sstable B, so doing anything on just one sstable MAY NOT remove the tombstone.
- We can't get rid of the tombstone if sstable A overlaps another sstable with the same partition (which we identify via bloom filter) that has any data with a lower timestamp (we don't check the sstable for a shadowed value, we just look at the minimum live timestamp of that sstable).

"nodetool garbagecollect" looks for sstables that overlap (by partition keys) and combines them, which makes tombstones past gc_grace_seconds (GCGS) purgeable and should remove them (and the data shadowed by them).

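As a rough illustration of the rule described above (this is not Cassandra's actual code), here is a minimal Python sketch. The names SSTable, Tombstone and is_purgeable are hypothetical, a plain set of partition keys stands in for the bloom filter, and timestamps are simple integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    name: str
    partitions: frozenset   # assumption: stand-in for the bloom filter
    min_timestamp: int      # lowest write timestamp of any data in this sstable


@dataclass(frozen=True)
class Tombstone:
    partition: str
    timestamp: int            # write timestamp of the deletion marker
    local_deletion_time: int  # wall-clock seconds when the deletion took effect


def is_purgeable(tombstone, compacting, all_sstables, gc_grace_seconds, now):
    """Return True if the tombstone could be dropped by compacting `compacting`."""
    # Rule 1: the tombstone must be older than gc_grace_seconds.
    if tombstone.local_deletion_time + gc_grace_seconds > now:
        return False
    # Rule 2: no sstable OUTSIDE the compaction may contain the same partition
    # with data older than the tombstone, since that data might be shadowed.
    for sstable in all_sstables:
        if sstable in compacting:
            continue
        if (tombstone.partition in sstable.partitions
                and sstable.min_timestamp < tombstone.timestamp):
            return False
    return True


# sstable A holds the tombstone; sstable B holds older data for the same partition.
a = SSTable("A", frozenset({"k1"}), min_timestamp=100)
b = SSTable("B", frozenset({"k1"}), min_timestamp=10)
t = Tombstone("k1", timestamp=100, local_deletion_time=1000)

# Rewriting A alone cannot drop the tombstone, because B overlaps it.
print(is_purgeable(t, [a], [a, b], gc_grace_seconds=10, now=2000))
# Compacting A and B together removes the blocker.
print(is_purgeable(t, [a, b], [a, b], gc_grace_seconds=10, now=2000))
```

The two calls at the end mirror the situation in this thread: an operation that touches a single sstable leaves the tombstone in place, while combining the overlapping sstables makes it droppable.
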
If you're on a version without nodetool garbagecollect, you can approximate it using user defined compaction ( http://thelastpickle.com/blog/2016/10/18/user-defined-compaction.html ) - it's a JMX endpoint that lets you tell Cassandra to compact one or more sstables together based on parameters you choose. This is somewhat like upgradesstables or scrub, but you can combine sstables as well. If you choose candidates intelligently (notably, oldest sstables first, or sstables you know overlap), you can likely manually clean things up pretty quickly. At one point, I had a jar that would do a single sstable at a time, oldest sstable first, and it pretty much worked for this purpose most of the time.

If you have room, a "nodetool compact" on STCS will also work, but it'll give you one huge sstable, which will be unfortunate long term (probably less of a problem if you're no longer writing to this table).

On Mon, Sep 10, 2018 at 10:29 AM Charulata Sharma (charshar) <chars...@cisco.com.invalid> wrote:

Scrub takes a very long time and does not remove the tombstones. You should do garbage cleaning. It immediately removes the tombstones.

Thanks,
Charu

From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, September 10, 2018 at 6:53 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Drop TTLd rows: upgradesstables -a or scrub?

Hello,

We have some tables with a significant amount of TTLd rows that have expired by now (and more than gc_grace_seconds have passed since the TTL). We stopped writing data to these tables quite a while ago, so background compaction isn't running. The compaction strategy is the default SizeTiered one.

Now we would like to get rid of all the droppable tombstones in these tables. Which approach would put the least stress on the cluster?

We've considered a few, but the most promising ones seem to be these two: `nodetool scrub` or `nodetool upgradesstables -a`. We are using Cassandra version 3.0.

Now, this docs page recommends using upgradesstables wherever possible: https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsScrub.html
What is the reason behind it?

From the source code I can see that Scrubber is the class which is going to drop the tombstones (and report the total number in the logs): https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/compaction/Scrubber.java#L308

I couldn't find similar handling in the upgradesstables code path. Is the assumption correct that this one will not drop the tombstones as a side effect of rewriting the files? Are there any drawbacks of using scrub for this task?

Thanks,
--
Oleksandr "Alex" Shulgin | Senior Software Engineer | Team Flux | Data Services | Zalando SE | Tel: +49 176 127-59-707