> I have a few more questions:
>
> 1. If we change the write/delete consistency level to ALL, do we
> eliminate the data inconsistency among nodes (since the delete
> operations will apply to ALL replicas)?
>
> 2. My understanding is that "Read Repair" doesn't handle tombstones.
> How about "Node Tool Repair" (do we still see inconsistent data among
> nodes after running "Node Tool Repair")?

Read repair and nodetool repair both handle tombstones under normal
circumstances. The root cause here is that not running nodetool repair
within GCGraceSeconds breaks the underlying design, leading to the kind
of inconsistency you are seeing, which is not healed by read repair or
nodetool repair.

The most important thing is to make sure, from now on, that nodetool
repair is run often enough - either by running it more frequently or by
increasing GCGraceSeconds - so that deletes are never forgotten in the
first place.
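For illustration only, here is a minimal sketch of one way to keep
repairs inside the grace period: a small script invoking nodetool,
meant to be scheduled (via cron or similar) well within GCGraceSeconds
(864000 seconds, i.e. 10 days, by default). The keyspace name and the
exit handling are assumptions, not something from your cluster:

#!/usr/bin/env python
# Sketch: run "nodetool repair" for each keyspace on this node.
# Schedule it so every node completes a repair well within
# GCGraceSeconds. Keyspace names below are placeholders.
import subprocess
import sys

KEYSPACES = ["my_keyspace"]  # assumption: replace with your keyspaces

def run_repair(keyspace):
    # "nodetool repair <keyspace>" triggers anti-entropy repair
    result = subprocess.run(["nodetool", "repair", keyspace])
    if result.returncode != 0:
        sys.exit("repair of %s failed; investigate before "
                 "GCGraceSeconds expires" % keyspace)

if __name__ == "__main__":
    for ks in KEYSPACES:
        run_repair(ks)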

In terms of what to do now that you're in this position, my summary of
my understanding based on the JIRA ticket and DistributedDeletes is
here:
http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds

Running at CL.ALL won't help in this case, since the expiry of the
tombstones means the inconsistency won't be reconciled (see the JIRA
ticket). If you're not in this position (i.e., nodetool repair has been
run often enough), using CL.ALL could technically "help" in the normal
case, but that's not the best way to heal consistency. Instead, let
Cassandra use normal read repair. Using CL.ALL means, for one thing,
that you cannot survive node failures, since CL.ALL queries will start
failing.
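For reference, here is a rough sketch (using the DataStax Python
driver, with made-up keyspace, table and key names) of what issuing a
delete at CL.ALL looks like from a client - again, not the recommended
way to keep replicas consistent, just shown for completeness:

# Sketch using the DataStax Python driver (cassandra-driver).
# Keyspace, table and key are placeholders, not from this thread.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# A delete at CL.ALL only succeeds if every replica acknowledges it,
# so a single down node makes the operation fail outright.
delete = SimpleStatement(
    "DELETE FROM my_table WHERE id = %s",
    consistency_level=ConsistencyLevel.ALL,
)
session.execute(delete, ("some-row-key",))

cluster.shutdown()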

Basically, the tombstone issue is a non-problem as long as you run
nodetool repair often enough relative to GCGraceSeconds. The situation
right now is a bit special because the constraints of the cluster were
violated (i.e., tombstones expired before nodetool repair had been
run).

I hope that clarifies.

-- 
/ Peter Schuller
