The DistributedDeletes link in that section explains the underlying reason
this is needed.  It's not that deletes are forgotten; it's that a write
(a delete is essentially a tombstone write) didn't get replicated to all
replicas.  For example, at RF=3 with write consistency level QUORUM, if one
of the replicas goes down for several hours while you're performing deletes
and then comes back up, it won't necessarily have all of those tombstones.
Hinted handoff will replay some of the deletes, but not all of them if the
node is down for an extended period of time.

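If that replica isn't repaired before GCGraceSeconds elapses, the other
replicas can purge their tombstones during compaction, and the deleted data
can then reappear from the replica that never saw the delete.  Once you have
"zombie" data like this, the only way to get rid of it is to re-run the
delete.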
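
To make the scenario above concrete, here is a toy Python model of it.  This
is only a sketch under simplifying assumptions, not Cassandra code: the
names (GC_GRACE, write, compact, repair, simulate) are hypothetical, time is
in arbitrary units, and "repair" simply copies the newest cell to every
replica.

```python
# Toy model only: not Cassandra code. Each replica is a dict mapping
# key -> (timestamp, value); a value of None stands for a tombstone.
GC_GRACE = 10  # hypothetical grace period, in arbitrary time units


def write(replicas, up, key, ts, value):
    """Apply a write to the replicas that are up; require a quorum of acks."""
    acks = 0
    for name in up:
        replicas[name][key] = (ts, value)
        acks += 1
    if acks < len(replicas) // 2 + 1:
        raise RuntimeError("QUORUM not reached")


def compact(replica, now):
    """Drop tombstones older than GC_GRACE, as compaction eventually would."""
    for key, (ts, value) in list(replica.items()):
        if value is None and now - ts > GC_GRACE:
            del replica[key]


def repair(replicas, key):
    """Copy the newest cell for key to every replica (last write wins)."""
    cells = [r[key] for r in replicas.values() if key in r]
    if cells:
        newest = max(cells, key=lambda cell: cell[0])
        for r in replicas.values():
            r[key] = newest


def simulate(repair_within_grace):
    replicas = {"A": {}, "B": {}, "C": {}}
    write(replicas, ["A", "B", "C"], "k", ts=1, value="v")
    # The delete succeeds at QUORUM while replica C is down.
    write(replicas, ["A", "B"], "k", ts=2, value=None)
    if repair_within_grace:
        repair(replicas, "k")      # C receives the tombstone in time
    for r in replicas.values():
        compact(r, now=20)         # past GC_GRACE: tombstones are purged
    repair(replicas, "k")          # a later repair (or read repair)
    return {name: r.get("k") for name, r in replicas.items()}


if __name__ == "__main__":
    print("repaired in time:", simulate(True))
    print("repaired too late:", simulate(False))
```

With a repair inside the grace period, every replica ends up with nothing
for the key.  Without it, the old value on C outlives the tombstones and is
copied back to A and B.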

On Wed, Sep 26, 2012 at 3:26 AM, Thomas Stets <thomas.st...@gmail.com> wrote:

> The Cassandra Operations page (http://wiki.apache.org/cassandra/Operations) 
> says:
>
> > Unless your application performs no deletes, it is vital that production
> > clusters run nodetool repair periodically on all nodes in the cluster.
> > The hard requirement for repair frequency is the value used for
> > GCGraceSeconds. Running nodetool repair often enough to guarantee that
> > all nodes have performed a repair in a given period GCGraceSeconds long,
> > ensures that deletes are not "forgotten" in the cluster.
>
> Is it really that common for deletes to be forgotten, or is it just a
> precaution against an unlikely-but-hard-to-fix problem?
>
>   regards, Thomas
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>
