Kind of an interesting question. I think you are saying: suppose a client read resolves only the two nodes, as described in Aaron's email, and that value goes back to the client; read repair is kicked off because of the inconsistent values, and the original write has not completed yet. I guess you would then need two nodes to go down right after the read, and before the write finished, to lose the value, so that the client read a value that was never stored in the database. The odds of two nodes going out are pretty slim, though.
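Dean's reasoning leans on the quorum-overlap property behind R + W > N. As a rough illustration (the helper `quorums_overlap` is hypothetical, written for this note and not part of Cassandra), the brute-force check below enumerates every possible read set and write set for N replicas. With N = 3 and both reads and writes at QUORUM (2 of 3), every two-node read shares at least one node with every two-node write. It models only completed writes; an in-flight write like W1 in the thread falls outside the guarantee.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True if every read set of size r shares a replica with every write set of size w."""
    replicas = range(n)
    for read_set in combinations(replicas, r):
        for write_set in combinations(replicas, w):
            if not set(read_set) & set(write_set):
                return False  # found a read that could miss the write entirely
    return True


# N = 3, QUORUM reads and writes: R + W = 4 > 3, so every pair of sets overlaps.
print(quorums_overlap(3, 2, 2))  # True
# Reading at ONE after a QUORUM write: R + W = 3, not > 3, so a read can miss it.
print(quorums_overlap(3, 1, 2))  # False
```

More generally, a read set of size R and a write set of size W drawn from N replicas share at least R + W - N replicas, which is 1 for N = 3 and R = W = 2.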
Or, if the node holding part of the write went down, then as long as the client stays up, he would complete his write on the other two nodes. It seems to me that as long as two nodes don't fail, you are reading at quorum and fit the consistency model, since you get a value that will be on two nodes in the immediate future.

Thanks,
Dean

On 10/25/12 9:45 AM, "shankarpnsn" <shankarp...@gmail.com> wrote:

>aaron morton wrote
>>> 2. You do a write operation (W1) with quorum of val=2
>>> node1 = val1 node2 = val2 node3 = val1 (write val2 is not complete yet)
>> If the write has not completed then it is not a successful write at the
>> specified CL, as it could fail now.
>>
>> Therefore the R + W > N Strong Consistency guarantee does not apply at
>> this exact point in time. A read to the cluster at this exact point in
>> time using QUORUM may return val2 or val1. Again, the operation W1 has
>> not completed; if read R' starts and completes while W1 is processing,
>> it may or may not return the result of W1.
>
>I agree completely that it is fair to have this indeterminism in case of
>partial/failed/in-flight writes, based on what nodes respond to a
>subsequent read.
>
>aaron morton wrote
>> It's important to point out the difference between Read Repair, in the
>> context of the read_repair_chance setting, and Consistent Reads, in the
>> context of the CL setting. All of this is outside of the processing of
>> your read request. It is separate from the stuff below.
>>
>> Inside the user read request, when ReadCallback.get() is called and CL
>> nodes have responded, the responses are compared. If a DigestMismatch
>> happens then a Row Repair read is started, and the result of this read
>> is returned to the user. This Row Repair read MAY detect differences; if
>> it does, it resolves the super set, sends the delta to the replicas and
>> returns the super set value to be returned to the client.
>>
>>> In this case, for read R1, the value val2 does not have a quorum.
>>> Would read R1 return val2 or val4?
>>
>> If val4 is in the memtable on node before the second read, the result
>> will be val4.
>> Writes that happen between the initial read and the second read after a
>> Digest Mismatch are included in the read result.
>
>Thanks for clarifying this, Aaron. This is very much in line with what I
>figured out from the code, and it brings me back to my initial question on
>the point of when and what the user/client gets to see as the read result.
>Let us, for now, consider only the repairs initiated as a part of
>/consistent reads/. If the Row Repair (after resolving and sending the
>deltas to replicas, but not waiting for a quorum success after the repair)
>returns the super set value immediately to the user, wouldn't it be a
>breach of the consistent reads paradigm? My intuition behind saying this
>is that we would respond to the client without the replicas having
>confirmed that they meet the consistency requirement.
>
>I agree that returning val4 is the right thing to do if quorum (two) nodes
>among (node1, node2, node3) have val4 at the second read after the digest
>mismatch. But wouldn't it be incorrect to respond to the user with any
>value when the second read (after the mismatch) doesn't find a quorum? So
>after sending the deltas to the replicas as a part of the repair (still a
>part of /consistent reads/), shouldn't the value be read again to check
>for the presence of a quorum after the repair?
>
>In the example we had, assume the mismatch is detected during a read R1
>from coordinator node C that reaches node1 and node2.
>State seen by C after first read R1: <node1 = val1, node2 = val2, node3 = val1>
>
>A second read is initiated as a part of repair for the consistent read of
>R1. This second read observes the values (val1, val2) from (node1, node2)
>and sends the corresponding row repair delta to node1.
>I'm guessing C cannot respond back to the user with val2 until C knows
>that node1 has actually written the value val2, thereby meeting the
>quorum. Is this interpretation correct?
>
>--
>View this message in context:
>http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-does-ReadRepair-exactly-do-tp7583261p7583395.html
>Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
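Shankar's question contrasts two coordinator behaviours after a digest mismatch: return the resolved value as soon as the deltas are sent, or return it only once the stale replicas have acknowledged the repair. The toy model below frames that choice under loud assumptions: one value per row, last-write-wins by timestamp (a simplification of the per-column "super set" Aaron describes), and synchronous repair acks. `Cell`, `resolve`, `read_with_repair`, the `send_repair` hook and the `wait_for_repairs` flag are all hypothetical names for illustration, not Cassandra's `ReadCallback` or resolver code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Cell:
    """A single replica's answer: one value plus the timestamp it was written with."""
    value: str
    timestamp: int


def resolve(responses: Dict[str, Cell]) -> Tuple[Cell, List[str]]:
    """Last-write-wins: pick the newest cell and list the replicas that returned something older."""
    winner = max(responses.values(), key=lambda cell: cell.timestamp)
    stale = sorted(node for node, cell in responses.items() if cell.timestamp < winner.timestamp)
    return winner, stale


def read_with_repair(
    responses: Dict[str, Cell],
    send_repair: Callable[[str, Cell], bool],
    wait_for_repairs: bool,
) -> Cell:
    """Resolve a digest mismatch, push the winner to stale replicas, then answer the client.

    send_repair is a hypothetical hook that writes the delta to one replica and
    reports whether that replica acknowledged it.
    """
    winner, stale = resolve(responses)
    acks = {node: send_repair(node, winner) for node in stale}
    missing = [node for node, ok in acks.items() if not ok]
    if wait_for_repairs and missing:
        # At least one stale replica has not confirmed the repair, so the
        # coordinator cannot yet say that all responding replicas (a quorum,
        # for a QUORUM read) hold the winning value.
        raise TimeoutError(f"repair not acknowledged by {missing}")
    # With wait_for_repairs=False the acks are ignored, which models
    # "send the deltas and return the super set value immediately".
    return winner


# The example from the thread: C reads node1 and node2, and node2 has the newer write.
responses = {"node1": Cell("val1", 1), "node2": Cell("val2", 2)}
print(resolve(responses))  # (Cell(value='val2', timestamp=2), ['node1'])
print(read_with_repair(responses, lambda node, cell: True, wait_for_repairs=True).value)  # val2
```

The sketch does not settle which behaviour a given Cassandra version implements; it only makes the trade-off concrete. Answering immediately is faster, but at that instant only node2 may be known to hold val2.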