Hi,

I have a question regarding TTLs and Tombstones, followed by a rather long scenario + solution question. My first, general question is: when does Cassandra check whether a TTL has expired and create the Tombstone if needed? I know it happens during compaction, but is that the only situation? What about checking it on reads? What about the "nodetool-based" actions? Scrub? Repair?

The reason for my question is the following scenario: I add the same number of rows to a CF every month, all with a TTL of 6 months - so when I add the July data, the January data should expire. I do NOT modify this data later. However, because of SizeTiered compaction and large SSTables, my old data do not expire in terms of disk usage - they're in the biggest/oldest SSTable, which is not going to be compacted anytime soon. I want to get rid of the data I don't need, so my solution is to perform a user defined compaction on the single file that contains the oldest data (I assume that in my use case it's the biggest/oldest SSTable). It works (at least the first compaction - see below), but I want to make sure that I'm right and that I understand why it happens ;-)
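(To be precise about what I mean by "user defined compaction": I call forceUserDefinedCompaction on the CompactionManager MBean over JMX, e.g. with jmxterm - if I read the 1.1 MBean right it takes the keyspace and the -Data.db file name. Keyspace, CF and file names below are just placeholders:

    java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:7199
    $> bean org.apache.cassandra.db:type=CompactionManager
    $> run forceUserDefinedCompaction MyKeyspace MyCF-hd-1234-Data.db

)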

Here's how I understand it works (it's December, my newest data are from November, so I want to have nothing older than June):

I have a large SSTable which was last compacted in August. It's the oldest SSTable and much larger than the rest, so I can assume that it contains:
(a) some Tombstones for the January data (when it was last compacted, January was the month due to expire, so those Tombstones were created then) which haven't been removed so far;
(b) some data from February - May which are NOT yet marked for deletion, because when that last compaction occurred they were still "fresh" enough to stay;
(c) some newer data (June+).
So I compact it. The Tombstones from (a) are removed. The expired data from (b) are marked for deletion by creating Tombstones for them. The rest of the data is untouched. This reduces the file size by ~10-20%; this is what I checked, and it worked. Then I wait 10 days (gc_grace) and compact it once again. That should remove all the Tombstones created during the previous compaction, so the file size should be reduced significantly (say, down to ~20% of the initial size or so). This is what I'm waiting for.
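Put as a rough sequence (assuming gc_grace is at the 10-day default of 864000 seconds, and reusing the made-up file names from above):

    # pass 1 (now): user defined compaction of the single big old SSTable
    #   - old Tombstones (a) that are already past gc_grace get dropped
    #   - expired rows (b) get Tombstones written for them
    # ... wait >= gc_grace = 864000 s (10 days) ...
    # pass 2: user defined compaction of the SSTable produced by pass 1
    #   - the Tombstones written in pass 1 are now past gc_grace and should be purged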
Am I right?

How about repair? As compaction is a "per-node" task, I guess I should run repair between these two compactions to make sure that the Tombstones have been transferred to the other replicas?
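I.e. something like this on each node, between pass 1 and pass 2 (keyspace/CF names made up again, and I'm assuming -pr per node is appropriate here):

    nodetool -h <node> repair -pr MyKeyspace MyCF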

Or maybe - returning to my first question - Cassandra checks TTLs much more often (like on every single read?), so the Tombstones end up "spread" among many SSTables and won't be removed efficiently by compacting the oldest SSTable only? Or maybe jobs like scrub check TTLs and create Tombstones too? Or repair?

I know that I could check some of these things with the newer nodetool features (like checking the % of Tombstones in an SSTable), but I run 1.1.1 and they're not available there. I also know that 1.2 (or 1.1.7?) handles Tombstones in a better way, but - still - that doesn't help me unless I upgrade.

Kind regards,
Michał
