Yup, I know it was a pretty long mail and it was Christmas time, so I
expected it might be left without a reply for a while, but as some time
has passed, I'll try to remind you of my question with some additional help:
TL;DR version:
WHEN does Cassandra remove data that has expired because of its TTL? Which
operations cause Cassandra to check TTLs and create Tombstones for
expired data if needed? It happens during compaction, for sure. How about
scrub or repair? Others?
Regards,
Michał
On 28.12.2012 09:08, Michal Michalski wrote:
Hi,
I have a question regarding TTLs and Tombstones, with a pretty long
scenario and a question about my solution. My first, general question is:
when does Cassandra check whether a TTL has expired and create the
Tombstone if needed? I know it happens during compaction, but is this the
only situation? How about checking it on reads? How about the
"nodetool-based" actions? Scrub? Repair?
The reason for my question is the following scenario: I add the same
amount of rows to a CF every month. All of them have a TTL of 6 months, so
when I add data from July, the data from January should expire. I do NOT
modify these data later. However, because of SizeTiered compaction and
large SSTables, my old data do not expire in terms of disk usage - they're
in the biggest/oldest SSTable, which is not going to be compacted any time soon.
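To make the write pattern concrete, every insert looks more or less like
the simplified sketch below (keyspace, CF, column names and values are
just placeholders, not my real schema) - the only point is that each
column is written exactly once with a ~6-month TTL:

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class MonthlyInsert {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TFramedTransport(new TSocket("127.0.0.1", 9160));
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();
            client.set_keyspace("my_keyspace");            // placeholder keyspace name

            Column column = new Column();
            column.setName(ByteBufferUtil.bytes("payload"));
            column.setValue(ByteBufferUtil.bytes("..."));
            column.setTimestamp(System.currentTimeMillis() * 1000);
            column.setTtl(6 * 30 * 24 * 3600);             // ~6 months, in seconds

            // Each row/column is written once and never updated afterwards.
            client.insert(ByteBufferUtil.bytes("2012-07:42"),   // placeholder row key
                          new ColumnParent("monthly_data"),     // placeholder CF name
                          column,
                          ConsistencyLevel.ONE);

            transport.close();
        }
    }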
I want to get rid of the data I don't need, so my solution is to perform
a user-defined compaction on the single file that contains the oldest
data (I assume that in my use case it's the biggest/oldest SSTable). It
works (at least the first compaction does - see below), but I want to
make sure that I'm right and that I understand why it happens ;-)
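In case it matters: I trigger the single-SSTable compaction through JMX
(as far as I know, nodetool in 1.1 doesn't expose it), roughly like the
sketch below. The host, keyspace and SSTable file name are placeholders,
and I'm not 100% sure the operation signature is identical across
versions:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ForceSingleSSTableCompaction {
        public static void main(String[] args) throws Exception {
            // Connect to the node's JMX port (7199 by default).
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            ObjectName compactionManager =
                    new ObjectName("org.apache.cassandra.db:type=CompactionManager");

            // Ask Cassandra to compact exactly one SSTable - the big/old one.
            // Keyspace and SSTable file name below are placeholders.
            mbs.invoke(compactionManager,
                       "forceUserDefinedCompaction",
                       new Object[] { "my_keyspace", "monthly_data-hd-1234-Data.db" },
                       new String[] { "java.lang.String", "java.lang.String" });

            connector.close();
        }
    }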
Here's how I understand how it works (it's December, my newest data are
from November, so I want to end up with nothing older than June):
I have a large SSTable which was last compacted in August, and it's the
oldest SSTable, much larger than the rest, so I can assume that it contains:
(a) some Tombstones for the January data (when the SSTable was last
compacted, January was the month due to expire, so the Tombstones were
created) which haven't been removed so far
(b) some data from February - May which are NOT marked for deletion so
far, because when the last compaction occurred they were still "fresh"
enough to stay
(c) some newer data (June+)
So I compact it. The Tombstones (a) are removed. The expired data (b) are
marked for deletion by creating Tombstones for them. The rest of the data
is untouched. This reduces the file size by ~10-20%. I checked this and
it worked.
Then I wait 10 days (gc_grace) and compact it once again. This should
remove all the Tombstones created during the previous compaction, so the
file size should shrink significantly (let's say to something like 20% of
the initial size). This is what I'm waiting for.
Am I right?
How about repair? As compaction is a "per-node" task, I guess I should
run repair between these two compactions to make sure that the Tombstones
have been transferred to the other replicas?
Or maybe - returning to my first question - Cassandra checks TTLs much
more often (like on every single read?), so the Tombstones are "spread"
among many SSTables and won't be removed efficiently by compacting the
oldest SSTable only? Or maybe jobs like scrub check TTLs and create
Tombstones too? Or repair?
I know that I could check some of these things with the new nodetool
features (like checking the % of Tombstones in an SSTable), but I'm
running 1.1.1 and they're unavailable there. I know that 1.2 (or 1.1.7?)
handles Tombstones in a better way, but - still - that doesn't apply to
me unless I upgrade.
Kind regards,
Michał