Coordination of expired TTLs compared to tombstones

Robert Wille Fri, 29 May 2015 11:31:56 -0700

I have a question about Cassandra’s internals.

Suppose I read, at a consistency level above ONE, a partition containing a lot of
tombstones. Those tombstones have to be sent to the coordinator for consistency
reasons: if another replica returns live (non-tombstone) data that is older than
a tombstone, the coordinator needs the tombstone to know that the data has been
deleted.
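
A toy model of the tombstone case, in Python. It assumes plain last-write-wins
reconciliation by write timestamp and ignores Cassandra’s tie-breaking rules;
`Cell`, `reconcile` and `visible_value` are hypothetical names, not Cassandra
classes. It shows why a replica cannot simply leave its tombstone out of the
response:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical toy model of one replica's response for a single cell."""
    timestamp: int        # write timestamp; the highest one wins
    value: Optional[str]  # None marks a tombstone (a deletion)


def reconcile(responses: List[Optional[Cell]]) -> Optional[Cell]:
    """Coordinator-side merge: keep the response with the highest timestamp."""
    present = [c for c in responses if c is not None]
    if not present:
        return None
    return max(present, key=lambda c: c.timestamp)


def visible_value(responses: List[Optional[Cell]]) -> Optional[str]:
    """What the client sees: the winner's value, or nothing for a tombstone."""
    winner = reconcile(responses)
    return None if winner is None else winner.value


# Replica A holds a tombstone written at t=20; replica B missed the delete
# and still holds older live data written at t=10.
replica_a = Cell(timestamp=20, value=None)
replica_b = Cell(timestamp=10, value="old")

print(visible_value([replica_a, replica_b]))  # prints None: tombstone shadows B
print(visible_value([None, replica_b]))       # prints old: deleted data resurfaces
```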

How does that compare with cells whose TTL has expired? Does a replica get to
skip sending an expired cell back to the coordinator? I was under the impression
that expired data doesn’t have to be sent to the coordinator, but the more I
think about it, the less sure I am.

Suppose you write a cell with no TTL and then update it with a TTL, and that
node 1 gets both writes but node 2 only gets the first one. If you then read the
cell after the TTL has expired and node 1 sends nothing to the coordinator, the
coordinator only sees node 2’s original value and returns it, even though that
value was overwritten. It seems to me that this would violate the consistency
level. Also, read repair could never fix node 2. So, how does that work?
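
The same kind of toy model can illustrate this TTL scenario. All names are
hypothetical, and the model assumes (rather than asserts) that an expired cell,
if it is sent, wins reconciliation by timestamp and then reads as absent, much
like a tombstone. `omit_expired` is a hypothetical switch comparing the two
behaviors in question:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    timestamp: int                    # write timestamp; the highest one wins
    value: str
    expires_at: Optional[int] = None  # absolute expiry time; None means no TTL

    def is_live(self, now: int) -> bool:
        return self.expires_at is None or now < self.expires_at


def replica_response(cells: List[Cell], now: int,
                     omit_expired: bool) -> Optional[Cell]:
    """Newest cell a replica holds, optionally hidden once it has expired."""
    if not cells:
        return None
    newest = max(cells, key=lambda c: c.timestamp)
    if omit_expired and not newest.is_live(now):
        return None  # hypothetical policy: expired data is not sent at all
    return newest


def coordinator_read(responses: List[Optional[Cell]], now: int) -> Optional[str]:
    """Last-write-wins merge; an expired winner reads as no value."""
    present = [c for c in responses if c is not None]
    if not present:
        return None
    winner = max(present, key=lambda c: c.timestamp)
    return winner.value if winner.is_live(now) else None


first = Cell(timestamp=1, value="v1")                   # write with no TTL
second = Cell(timestamp=2, value="v2", expires_at=100)  # update with a TTL
node1 = [first, second]  # node 1 received both writes
node2 = [first]          # node 2 missed the update

now = 200  # read after the TTL has expired
for omit in (True, False):
    responses = [replica_response(node1, now, omit),
                 replica_response(node2, now, omit)]
    print(omit, coordinator_read(responses, now))
```

With `omit_expired=True` the read returns the overwritten value `v1`; when
node 1 does send its expired cell, the read returns nothing.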

On a related note, do cells with expired TTLs have to wait gc_grace_seconds
before they can be compacted out? It seems to me that if they could be compacted
out immediately after expiration, you could get zombie data, just as you can
with tombstones. For example: write a cell with no TTL to all replicas, shut
down one replica, update the cell with a TTL, compact after the TTL has expired,
then bring the downed replica back up. Voilà, the formerly down node has a value
that will replicate back to the other nodes.
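
The zombie scenario can be sketched the same way. `compact` is a hypothetical,
heavily simplified stand-in for compaction that purges an expired cell only
once `gc_grace` seconds have passed since it expired; 864000 is ten days,
Cassandra’s default gc_grace_seconds. All times are in seconds in this toy
model:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    timestamp: int                    # write timestamp; the highest one wins
    value: str
    expires_at: Optional[int] = None  # absolute expiry time; None means no TTL

    def is_live(self, now: int) -> bool:
        return self.expires_at is None or now < self.expires_at


def compact(cells: List[Cell], now: int, gc_grace: int) -> List[Cell]:
    """Toy compaction: keep the newest cell, and purge it if it expired
    at least gc_grace seconds ago (older, shadowed cells are dropped)."""
    if not cells:
        return []
    newest = max(cells, key=lambda c: c.timestamp)
    if newest.expires_at is not None and now >= newest.expires_at + gc_grace:
        return []
    return [newest]


def read(replicas: List[List[Cell]], now: int) -> Optional[str]:
    """Merge every replica's cells by last-write-wins."""
    cells = [c for replica in replicas for c in replica]
    if not cells:
        return None
    winner = max(cells, key=lambda c: c.timestamp)
    return winner.value if winner.is_live(now) else None


original = Cell(timestamp=1, value="v1")                # on all replicas, no TTL
update = Cell(timestamp=2, value="v2", expires_at=100)  # node 2 was down

for gc_grace in (0, 864000):
    node1 = compact([original, update], now=150, gc_grace=gc_grace)
    node2 = [original]  # node 2 comes back with only the original write
    print(gc_grace, read([node1, node2], now=150))
```

With a grace period of 0, node 1 purges the update and the old value `v1`
comes back; with the ten-day grace period, the expired update still shadows it.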

Thanks in advance

Robert
