As I understand it, the problem arises when a node is up but misses the delete 
message (remember, if you apply the delete at CL.QUORUM, almost half the 
replicas can miss it and the delete still succeeds). Imagine that you have 3 
nodes A, B, and C, each of which has a column 'foo' with a value 'bar'. Their 
state would be:
A: 'foo':'bar'     B: 'foo':'bar'     C: 'foo':'bar'

We attempt to delete column 'foo', and it succeeds on nodes A and B (meaning 
that we succeeded on CL.QUORUM). Unfortunately the packet going to node C runs 
afoul of the network gods and gets zapped in transit. The state is now:
A: 'foo':deleted     B: 'foo':deleted     C: 'foo':'bar'

If we try a read at CL.QUORUM at this point, at least one of the replicas that 
respond is guaranteed to hold the record that 'foo' was deleted, and because 
the delete carries the newer timestamp we know to tell the client as much.
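
To make the timestamp argument concrete, here is a rough sketch in Python. It 
is not Cassandra code: the (value, timestamp) tuples, the TOMBSTONE marker, and 
the reconcile() helper are all made up for illustration. It only shows that any 
two of the three replicas include at least one delete marker, and that the 
newer timestamp wins.

```python
from itertools import combinations

# Hypothetical model: a replica's answer is (value, timestamp).
# A value of None stands in for a delete marker (tombstone).
TOMBSTONE = None


def reconcile(responses):
    """Pick the value carried by the response with the newest timestamp."""
    value, _timestamp = max(responses, key=lambda r: r[1])
    return value


# State after the delete reached A and B but not C.
a = (TOMBSTONE, 2)  # delete written at timestamp 2
b = (TOMBSTONE, 2)
c = ("bar", 1)      # original write at timestamp 1

# A quorum read hears from any 2 of the 3 replicas.
results = [reconcile(pair) for pair in combinations([a, b, c], 2)]
print(results)  # [None, None, None]: every quorum sees the delete
```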

After GCGraceSeconds has passed and a compaction has run, A and B drop their 
delete markers for good, so the state of the nodes will be:
A: None     B: None     C: 'foo':'bar'

Some time later, we attempt a read and just happen to get C's response first. 
The response will be that 'foo' is storing 'bar'. Not only that, but read 
repair happens as well, so the state will become:
A: 'foo':'bar'     B: 'foo':'bar'     C: 'foo':'bar'

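We have the infamous undelete. Running nodetool repair within GCGraceSeconds 
prevents this, because it copies the delete marker to C before A and B are 
allowed to throw it away.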
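
If it helps, the whole sequence can be played out in a small Python toy model. 
Everything here (Replica, compact(), read_with_repair(), run()) is a name I 
invented for the sketch, not a Cassandra API. It also simplifies by measuring 
the grace period from the delete's timestamp, and the repair_before_gc flag is 
only a stand-in for what nodetool repair would achieve by getting the delete 
marker onto C before it is purged elsewhere.

```python
class Replica:
    """Toy replica: maps column name -> (value, timestamp); value None = tombstone."""

    def __init__(self):
        self.cells = {}

    def write(self, key, value, ts):
        # Last write wins: keep the cell with the newer timestamp.
        current = self.cells.get(key)
        if current is None or ts > current[1]:
            self.cells[key] = (value, ts)

    def compact(self, now, gc_grace):
        # Purge tombstones older than the grace period (simplified rule).
        for key, (value, ts) in list(self.cells.items()):
            if value is None and now - ts > gc_grace:
                del self.cells[key]


def read_with_repair(replicas, key):
    """Return the newest value and push it to every replica (toy read repair)."""
    cells = [r.cells[key] for r in replicas if key in r.cells]
    if not cells:
        return None
    value, ts = max(cells, key=lambda cell: cell[1])
    for r in replicas:
        r.write(key, value, ts)
    return value


def run(repair_before_gc):
    a, b, c = Replica(), Replica(), Replica()
    for r in (a, b, c):
        r.write("foo", "bar", ts=1)
    for r in (a, b):                 # the delete never reaches C
        r.write("foo", None, ts=2)
    if repair_before_gc:             # stand-in for nodetool repair
        c.write("foo", None, ts=2)
    for r in (a, b, c):
        r.compact(now=100, gc_grace=10)
    return read_with_repair([a, b, c], "foo")


print(run(repair_before_gc=False))  # bar  (the undelete)
print(run(repair_before_gc=True))   # None (the delete sticks)
```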

----- Original Message -----
From: "A J" <>
Sent: Thursday, June 30, 2011 8:25:29 PM
Subject: Meaning of 'nodetool repair has to run within GCGraceSeconds'

I am a little confused about the reason why nodetool repair has to run
within GCGraceSeconds.

The documentation at:
is not very clear to me.

How can a delete be 'unforgotten' if I don't run nodetool repair? (I
understand that if a node is down for more than GCGraceSeconds, I
should not bring it back up without resyncing it completely. Otherwise
deletes may reappear.)
But I am not sure how exactly nodetool repair ties into this mechanism of
distributed deletes.

Thanks for any clarifications.
