Your schema may have background (non-blocking) read repair set to 10%
(`dclocal_read_repair_chance` = 0.1).
GC pauses may be delaying writes (or reads).
You may be hitting a Cassandra bug.

We would need the `TRACING` output for one of the affected reads to know for sure.
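
For reference, a minimal cqlsh sketch of how that output could be captured. The keyspace, table, and key (`my_keyspace.my_table`, `id = 123`) are placeholders, not names from this thread; substitute the query that triggers the read repair. The read repair options shown by `DESCRIBE TABLE` exist on pre-4.0 versions, as far as I know.

```cql
-- Hypothetical keyspace/table/key; replace with the failing query.
-- Show the table options; on pre-4.0 versions this should include
-- read_repair_chance and dclocal_read_repair_chance.
DESCRIBE TABLE my_keyspace.my_table;

-- Use the same consistency level as the application.
CONSISTENCY LOCAL_QUORUM;

-- Print a trace of the replicas contacted for each subsequent request.
TRACING ON;
SELECT * FROM my_keyspace.my_table WHERE id = 123;
TRACING OFF;
```

The trace should list which replicas were contacted and how long each step took. A digest mismatch between replicas in that trace would suggest that one of the two replicas answering the read had not yet applied the write.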


On Mon, Aug 10, 2020 at 10:10 PM Tobias Eriksson <
tobias.eriks...@qvantel.com> wrote:

> Hi
>
> We have a Cassandra solution with 2 DCs where each DC has  >30 nodes
>
> From time to time we see problems with READ REPAIR, but I am stuck with
> the analysis
>
> We have a pattern for these faults where we do
>
>    1. INSERT with Local Quorum (2 out of 3)
>    2. Wait for 0.5 - 1 seconds time window
>    3. READ with Local Quorum (2 out of 3)
>       1. Triggers a read repair
>    4. Then we do an UPDATE …
>
>
>
> The replication factor is 3
>
> In my world, in (1) we for sure store the data in 2 out of 3 places, and I
> would be surprised if we did not also reach the 3rd node within 0.5 sec
>
> So how come in (3) the read can’t get a proper response from 2 out of 3
>
> Some are saying the problem started occurring when we added DC2, but I
> can’t understand how it could be as our query is Local Quorum and will
> involve only DC1
>
>
>
> How can I debug this fault ?
>
> How can I track if the data has reached all 3 nodes ?
>
>
>
> All ideas are welcome
>
> -Tobias
>
>
>
>
>
