Re: Could a READ REPAIR really be triggered even if there avg 80 ms between calls

Tobias Eriksson Tue, 01 Sep 2020 23:11:03 -0700

Thanks Jeff,
Hmm... when you say unhealthy system, what should I check to rule that out?
And is there an easy way to monitor GC time in a Cassandra cluster? I am trying 
to understand whether this is the case.
-Tobias
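
One low-tech way to look at GC time, sketched here under assumptions rather than as the recommended method: Cassandra ships a `nodetool gcstats` command, and its GCInspector also writes pause messages to the system log. The Python below scans log lines for those messages and summarizes them. The line shape ("<collector> GC in <N>ms"), the function names `gc_pauses_ms` and `summarize`, and the 80 ms default threshold (taken from the read delay discussed in this thread) are all assumptions for illustration. As far as I know GCInspector only logs pauses above a configured threshold, so short pauses may not show up at all.

```python
import re

# Assumed shape of a GCInspector log line:
#   "... GCInspector.java:NNN - <collector> GC in <N>ms. ..."
_GC_PAUSE = re.compile(r"GCInspector.*?\bGC in (\d+)ms")


def gc_pauses_ms(log_lines):
    """Return the GC pause durations (ms) found in an iterable of log lines."""
    pauses = []
    for line in log_lines:
        match = _GC_PAUSE.search(line)
        if match:
            pauses.append(int(match.group(1)))
    return pauses


def summarize(pauses, threshold_ms=80):
    """Count pauses, report the longest one, and count those above threshold_ms."""
    return {
        "count": len(pauses),
        "max_ms": max(pauses, default=0),
        "over_threshold": sum(1 for p in pauses if p > threshold_ms),
    }
```

Usage would be along the lines of `summarize(gc_pauses_ms(open(path)))`, where `path` points at a node's system log (the location depends on the install). Running it per node shows whether any replica regularly pauses for longer than the gap between the write and the read.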

From: Jeff Jirsa <jji...@gmail.com>
Reply to: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, 1 September 2020 at 17:27
To: cassandra <user@cassandra.apache.org>
Subject: Re: Could a READ REPAIR really be triggered even if there avg 80 ms 
between calls

Yes, it's possible. A typical JVM GC pause for most configs is on the order of 
50-200 ms. If a host does a small collection/pause, then the read at #4 
is basically racing the write at #1.

(or, if you have an unhealthy cluster that's regularly dropping writes due to 
much larger problems, then it's even more likely)
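
To make the race concrete, here is a toy Monte Carlo sketch in Python. It is not a model of Cassandra internals. It assumes three replicas that each apply a write after a fixed `normal_apply_ms`, plus an occasional extra GC pause drawn uniformly from `pause_range_ms`, and a read that contacts two of the three replicas `read_delay_ms` after the write was sent. A trial counts as a mismatch when exactly one of the two contacted replicas has not applied the write yet, which is the situation that would lead to a read repair. The function name and every parameter are hypothetical.

```python
import random


def stale_read_probability(read_delay_ms, pause_probability, pause_range_ms,
                           normal_apply_ms=5.0, trials=100_000, seed=42):
    """Estimate how often a 2-of-3 read sees two replicas that disagree."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    mismatches = 0
    for _ in range(trials):
        # Time (ms after the write is sent) at which each replica applies it.
        apply_times = []
        for _replica in range(3):
            t = normal_apply_ms
            if rng.random() < pause_probability:
                t += rng.uniform(*pause_range_ms)  # replica stalled by a GC pause
            apply_times.append(t)
        # The read contacts 2 of the 3 replicas.
        first, second = rng.sample(apply_times, 2)
        first_has_write = first <= read_delay_ms
        second_has_write = second <= read_delay_ms
        if first_has_write != second_has_write:
            mismatches += 1  # the contacted replicas disagree
    return mismatches / trials
```

For example, `stale_read_probability(80, 0.01, (50, 200))` estimates the mismatch rate when 1% of replica writes are delayed by a 50-200 ms pause and the read follows 80 ms later. Under these assumptions even a small pause rate gives a nonzero rate of mismatching reads, and the rate drops to zero once the read delay exceeds the longest pause.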

On Tue, Sep 1, 2020 at 12:10 AM Tobias Eriksson 
<tobias.eriks...@qvantel.com> wrote:
Hi,
We are seeing READ REPAIRs happening, and my understanding of the sequence is 
as follows.
Setup: 2 DCs with lots of nodes, Replication Factor = 3

  1.  Data is written (on INSERT/UPDATE)
  2.  Data is replicated by Cassandra, but replication does not finish before 
(4) below
  3.  Wait 80 ms on average
  4.  Data is read again with QUORUM, i.e. asking at least 2 out of 3 nodes for 
the result, and now ONE of them replies with stale data
  5.  (4) triggers a READ REPAIR
  6.  The READ REPAIR now syncs to ALL nodes, including those in DC2

So my question is: is it really possible that Cassandra is not able to 
replicate to all 3 nodes in DC1 within 80 ms?

-Tobias
