I'm trying to work through various failure modes to figure out the proper operating procedure and proper client coding practices. I'm a little unclear about what happens when a network partition gets repaired. Take the following scenario:

- cluster with 5 nodes: A thru E; RF = 3; read_cf = 1; write_cf = 1
- network partition divides A-C off from D-E
- operation continues on both sides, obviously some data is unavailable from D-E
- hinted handoffs accumulate
Now the network partition is repaired. My question is about the sequencing of events, in particular between processing hinted handoffs (HH) and forwarding read requests across the former partition. I'm hoping that there is a time period to process HH *before* nodes forward requests. E.g. it would be really good for A not to forward read requests to D until D is done with HH processing. Otherwise, clients of A may see a discontinuity: data that was readable during the partition goes away once reads are routed to D, and then comes back after D has received its hints.

Is there a manual or wiki section that discusses some of this and I just missed it?
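
To make the concern concrete, here is a toy Python model of the discontinuity I'm worried about. It is only a sketch of the scenario above, not a description of how Cassandra actually behaves: the `Node` class and the `write`, `read`, and `replay_hints` functions are made up for illustration, and I'm assuming the key's three replicas are B, C, and D, that A coordinates the write and holds the hint for D, and that a read at consistency level 1 is answered by a single replica.

```python
# Toy model only: hypothetical names, NOT Cassandra's implementation.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}    # key -> value stored on this replica
        self.hints = []   # (target_node, key, value) held for unreachable replicas


def write(coordinator, replicas, reachable, key, value):
    """Write at consistency level 1: store on reachable replicas, hint the rest."""
    acks = 0
    for replica in replicas:
        if replica.name in reachable:
            replica.data[key] = value
            acks += 1
        else:
            # Replica is on the other side of the partition: keep a hint.
            coordinator.hints.append((replica, key, value))
    return acks >= 1


def read(replica, key):
    """Read at consistency level 1: a single replica answers."""
    return replica.data.get(key)


def replay_hints(coordinator):
    """Deliver all stored hints to their target replicas."""
    for target, key, value in coordinator.hints:
        target.data[key] = value
    coordinator.hints.clear()


def demo():
    nodes = {name: Node(name) for name in "ABCDE"}
    a = nodes["A"]
    replicas = [nodes["B"], nodes["C"], nodes["D"]]  # assumed replica set, RF = 3

    # During the partition only A, B, and C can reach each other.
    ok = write(a, replicas, {"A", "B", "C"}, "k", "v1")
    during_partition = read(nodes["B"], "k")

    # Partition repaired: a read is routed to D before hints are replayed.
    before_replay = read(nodes["D"], "k")

    replay_hints(a)
    after_replay = read(nodes["D"], "k")
    return ok, during_partition, before_replay, after_replay


print(demo())  # (True, 'v1', None, 'v1')
```

The third element of the result is the gap in question: the value was readable during the partition, is missing when D answers before hint replay, and reappears afterwards.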