Tyler, thanks for the detailed explanation. I still have a few questions:

#
When you said the coordinator sends a "read digest request" to the rest of
the replicas, do you mean all replicas in the current DC and the other DC?
Or just the one remaining replica in my current DC plus one coordinator node
in the other DC?
(Our reads and writes all use "local_quorum" with a replication factor of 3,
and local_dc_repair_chance=0.)

#
The "read digest requests" to the other DC are sent sequentially, correct?
If network latency between the DCs is bad at that time, will that affect
overall read latency?

#
We observe that one of our CQL queries performs fine under normal load, but
degrades greatly when a batch of the same query (looking for the exact same
columns and key) is sent to the server in a short period of time (say 100 of
them within a second).
Our other tables and keyspaces don't see any latency degradation during that
time, so I am not sure we are hitting capacity yet.
So we suspect read_repair_chance may have something to do with it.
Is there anything we can look into to see what may be causing the latency
spike when a large number of the same query hits the server?

Thanks

On Wed, Nov 19, 2014 at 7:49 AM, Tyler Hobbs <ty...@datastax.com> wrote:

>
> On Sun, Nov 16, 2014 at 5:13 PM, Jimmy Lin <y2klyf+w...@gmail.com> wrote:
>
>> I have read that read repair suppose to be running as background, but
>> does the co-ordinator node need to wait for the response(along with other
>> normal read tasks) before return the entire result back to the caller?
>>
>
> For the 10% of requests where read repair is triggered, the coordinator
> will send a request to every replica. (A data request to two replicas,
> digest requests to the rest.) Once enough replicas have replied to satisfy
> the consistency level, the result will be returned to the client; if
> there's a mismatch in the responses from the replicas, a blocking repair
> will be performed before responding to the client. Later, in the
> background, the coordinator will check the remaining responses from
> replicas to see if they match up. If any of them do not, they will be
> repaired in the background.
>
>
>>
>> #
>> how a high rate of read repair impact performance? I read something that
>> it will impact through put but not latency, how so?
>>
>
> That's correct, it should impact throughput but not necessarily latency.
> Throughput is lower because more replicas have to do work, but latency is
> unaffected (unless you're hitting capacity) because blocking repair only
> happens under the same conditions that it normally does.
>
>
>>
>> #
>> is it safe to even just make read_repair_chance = 0?
>> (since we are mostly talking to one DC, the other DC most of the time
>> serve as backup/emergency )
>>
>
> Sure, it's safe enough. People use read repair for different reasons.
> Some would say that RR keeps their other datacenter's caches warm. Others
> rely on it in place of normal repairs (which is not particularly safe, but
> if your consistency requirements allow for it, it's fine). If you're
> running regular repairs anyway, it's safe to turn off read repair.
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
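
To make the fan-out in the quoted explanation concrete, here is a minimal Python sketch of how I understand the coordinator's choice of replicas for a LOCAL_QUORUM read. It is a toy model, not Cassandra code: `plan_read`, the replica names, and the `draw` parameter are all hypothetical, and the assumption that a single random draw is compared first against read_repair_chance and then against the DC-local chance is my reading of the 2.x behaviour, which may differ by version.

```python
import random


def plan_read(replicas_by_dc, local_dc, read_repair_chance,
              dclocal_read_repair_chance, draw=None):
    """Toy model of replica selection for one LOCAL_QUORUM read.

    Returns (decision, contacted, block_for):
      decision  -- "GLOBAL", "DC_LOCAL" or "NONE" (assumed names)
      contacted -- replicas that receive a data or digest request
      block_for -- responses needed before replying to the client
    """
    if draw is None:
        draw = random.random()

    local = list(replicas_by_dc[local_dc])
    remote = [r for dc, nodes in replicas_by_dc.items()
              if dc != local_dc for r in nodes]

    # LOCAL_QUORUM only ever waits for a quorum of local replicas.
    block_for = len(local) // 2 + 1

    if draw < read_repair_chance:
        # Read repair triggered: every replica in every DC is contacted.
        return "GLOBAL", local + remote, block_for
    if draw < dclocal_read_repair_chance:
        # Assumed DC-local variant: only local replicas are contacted.
        return "DC_LOCAL", local, block_for
    # No read repair: contact just enough local replicas for the quorum.
    return "NONE", local[:block_for], block_for


cluster = {"dc1": ["a1", "a2", "a3"], "dc2": ["b1", "b2", "b3"]}
print(plan_read(cluster, "dc1", 0.1, 0.0, draw=0.05))
print(plan_read(cluster, "dc1", 0.1, 0.0, draw=0.50))
```

In this model, with read_repair_chance=0.1 and the DC-local chance at 0, roughly one read in ten contacts all six replicas, including those in the remote DC, while `block_for` stays at 2 local responses. That matches the point in the quoted reply that read repair adds work on more replicas without changing how many responses the client waits for.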