> I have few more questions:
>
> 1. If we change the write/delete consistency level to ALL, do we
> eliminate the data inconsistency among nodes (since the delete
> operations will apply to ALL replicas)?
>
> 2. My understanding is that "Read Repair" doesn't handle tombstones.
> How about "Node Tool Repair" (do we still see inconsistent data among
> nodes after running "Node Tool Repair")?
Read repair and nodetool repair do handle tombstones under normal circumstances. The root cause here is that not running nodetool repair within GCGraceSeconds breaks the underlying design, leading to the type of inconsistency you are seeing, which is not healed by read repair or nodetool repair.

The most important thing from now on is to make sure nodetool repair is run often enough - either by running it more frequently or by increasing GCGraceSeconds - so that deletes are never forgotten to begin with. In terms of what to do now that you're in this position, my summary of my understanding, based on the JIRA ticket and DistributedDeletes, is here:

http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds

Running at CL.ALL won't help in this case, since the tombstones have already expired and the inconsistency won't be reconciled (see the JIRA ticket).

If you were not in this position (i.e., nodetool repair had been run often enough), using CL.ALL could technically "help" in the normal case, but that's not the best way to heal consistency. Instead, let Cassandra use normal read repair. Using CL.ALL means, for one thing, that you cannot survive node failures, since CL.ALL queries will start failing.

Basically, the tombstone issue is a non-problem as long as you run nodetool repair often enough with respect to GCGraceSeconds. The situation right now is a bit special because the constraints of the cluster were violated (i.e., the tombstones expired before nodetool repair was run).

I hope that clarifies.
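For illustration only, here is a minimal sketch of what "often enough" means in practice. The host names and the repair interval are made-up assumptions; adapt them to your cluster (a per-node cron job doing the same thing is just as good). The only real point is that the repair cadence stays safely below gc_grace_seconds.

    #!/usr/bin/env python
    # Illustrative sketch only: run nodetool repair on every node well
    # within gc_grace_seconds. Host names and the interval below are
    # made-up examples, not a recommendation for any particular cluster.
    import subprocess
    import time

    GC_GRACE_SECONDS = 10 * 24 * 3600        # must match the column family setting
    REPAIR_INTERVAL = GC_GRACE_SECONDS // 2  # leave a comfortable safety margin
    NODES = ["cass01.example.com", "cass02.example.com", "cass03.example.com"]

    assert REPAIR_INTERVAL < GC_GRACE_SECONDS, \
        "repairs must complete within gc_grace_seconds or deletes can be forgotten"

    while True:
        for node in NODES:
            # Equivalent to running "nodetool -h <node> repair" by hand.
            subprocess.check_call(["nodetool", "-h", node, "repair"])
        time.sleep(REPAIR_INTERVAL)

-- 
/ Peter Schuller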