If you are working at CL ONE you are accepting that *any* value stored on a replica for a key+col combination in a row is a valid response, and that includes no value.
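To make that concrete, here is a toy sketch (plain Python, not Cassandra code; the single-replica write and random replica selection are simplifying assumptions) of how a CL ONE read can legitimately return a stale value, or nothing at all, when only one of the RF replicas has seen a write:

    import random

    # Toy model of one row replicated to RF=3 replicas. Illustrates the
    # CL ONE contract only; this is not how Cassandra routes requests.
    RF = 3
    replicas = [dict() for _ in range(RF)]  # col -> (timestamp, value)

    def write_at_one(col, value, ts):
        # A CL ONE write is acknowledged once a single replica applies it.
        # Here the other replicas miss the update entirely, as they would
        # during a partition, before HH or RR catches them up.
        replicas[0][col] = (ts, value)

    def read_at_one(col):
        # A CL ONE read is answered by whichever single replica is asked,
        # so the response may be the new value, an old one, or no value.
        return random.choice(replicas).get(col)

    write_at_one("name", "bob", ts=1)
    for _ in range(5):
        print(read_at_one("name"))  # sometimes (1, 'bob'), sometimes None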
After the nodes have detected the others are UP they will start their HH in a staggered fashion, and will rate limit themselves to avoid overwhelming the node. It may take some time to complete.

> Otherwise, clients of A may see a
> discontinuity where data that was available during the partition see it
> go away and then come back.

If you are concerned about reads being consistent, then use CL QUORUM.

If you are reading at CL ONE (in 1.0.x) the read will go to one replica 90% of the time, and you will only get the result from that one replica, which may be any value the key+col has been set to, including no value.

The other 10% of the time Read Repair will kick in (10% is the default value for read_repair_chance in 1.0; you can change this per column family). The purpose of RR is to make it so that the next time a read happens the data is consistent. So for a CL ONE read with RR, the read will go to all replicas and you will get a response from one and only one of them. In the background the responses from the others will be checked and consistency repaired.

If you were working at a higher CL, the responses from CL nodes are checked as part of the read request, synchronous to the read, and you get a consistent result from those nodes. RR may still run in the background, since CL nodes may be fewer than RF nodes.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 31/01/2012, at 6:51 AM, Thorsten von Eicken wrote:

> I'm trying to work through various failure modes to figure out the
> proper operating procedure and proper client coding practices. I'm a
> little unclear about what happens when a network partition gets
> repaired. Take the following scenario:
> - cluster with 5 nodes: A thru E; RF = 3; read_cf = 1; write_cf = 1
> - network partition divides A-C off from D-E
> - operation continues on both sides, obviously some data is unavailable
> from D-E
> - hinted handoffs accumulate
>
> Now the network partition is repaired. The question I have is what is
> the sequencing of events, in particular between processing HH and
> forwarding read requests across the former partition. I'm hoping that
> there is a time period to process HH *before* nodes forward requests.
> E.g. it would be really good for A not to forward read requests to D
> until D is done with HH processing. Otherwise, clients of A may see a
> discontinuity where data that was available during the partition see it
> go away and then come back.
>
> Is there a manual or wiki section that discusses some of this and I just
> missed it?
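PS For reference, the knobs discussed above. The setting names below exist in the 1.0.x cassandra.yaml; the values shown are illustrative, check your own config:

    # cassandra.yaml: hints are delivered with a sleep between them,
    # which is the rate limiting mentioned above
    hinted_handoff_enabled: true
    max_hint_window_in_ms: 3600000    # stop generating hints for a node dead this long
    hinted_handoff_throttle_delay_in_ms: 50

read_repair_chance is a per column family setting, e.g. from cassandra-cli (MyCF is a placeholder name):

    update column family MyCF with read_repair_chance = 0.1;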