I'm aware of https://issues.apache.org/jira/browse/CASSANDRA-4917, which optimizes tombstone creation for TTLed columns: "We only need to ensure that ExpiringColumn and tombstone together live as long as gc_grace. If the ExpiringColumn's TTL>=gc_grace_seconds then we can create an already gcable tombstone and drop that instantly." I presume the point is that GCable tombstones can still do work (preventing spurious writes from nodes that were down), but only until the data is flushed to disk. If the effective TTL exceeds gc_grace_seconds, then the tombstone will be deleted anyway.
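To make the rule quoted above concrete, here is a minimal sketch of the timing arithmetic as I read it. It is not Cassandra's actual code; the function names and the use of plain integer seconds are assumptions made for illustration only.

```python
def tombstone_purge_time(write_time: int, ttl: int, gc_grace: int) -> int:
    """Earliest time (in seconds) at which the tombstone left behind by an
    expired column could be purged, assuming the expiring column and its
    tombstone together must live at least gc_grace seconds after the write.
    Hypothetical helper, not a Cassandra API.
    """
    expires_at = write_time + ttl
    # The column itself covers the first `ttl` seconds; the tombstone only
    # has to cover whatever is left of gc_grace, if anything.
    return max(expires_at, write_time + gc_grace)


def tombstone_gcable_at_expiry(ttl: int, gc_grace: int) -> bool:
    """True when the tombstone is already purgeable the moment the column
    expires, i.e. when TTL >= gc_grace_seconds."""
    return ttl >= gc_grace
```

With a TTL of 100 and a gc_grace of 50, the tombstone is purgeable as soon as the column expires; with a TTL of 10 and a gc_grace of 50, it has to be kept for the remaining 40 seconds.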
It occurred to me that if you never update the TTL of a column, then there should be no need for tombstones at all: every replica will have the same TTL, so there'd be no risk of missed deletes. You wouldn't even need GCable tombstones. The purpose of a tombstone is to cover the case where a different node was down, didn't notice the delete, still had the column, and tried to replicate it back; but that won't happen if that node also had the TTL.
So, if - and it's a big if - a table disallowed updates to TTL, then you could really optimize deletion of TTLed columns: you could do away with tombstones entirely. If a table allows updates to TTL, then it's possible that a different node will have the row without the TTL, and the tombstone would be needed.
Or am I missing something?
Disallowing updates would seem to enable optimizations in general. Many data are write-once.
Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com