[jira] [Comment Edited] (CASSANDRA-19120) local consistencies may get timeout if blocking read repair is sending the read repair mutation to other DC

Stefan Miklosovic (Jira) Fri, 01 Dec 2023 02:45:22 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791993#comment-17791993
 ]

Stefan Miklosovic edited comment on CASSANDRA-19120 at 12/1/23 10:44 AM:
-------------------------------------------------------------------------

[~chovatia.jayd...@gmail.com] you wrote:

_The fix you recommended won't work because if the remote node is a 
participant, the condition will always be false because of the `&&` operator in 
the if loop. It should be an `||` operator. A slight modification will work:_

{code}
if (!repairs.containsKey(participant) || 
!shouldBlockOn.test(participant.endpoint()))
    adjustedBlockFor--;
{code}

I think it should be really like this:

{code}
if (!repairs.containsKey(participant) && 
!shouldBlockOn.test(participant.endpoint()))
    adjustedBlockFor--;
{code}

You said: "if the remote node is a participant, the condition will always be 
false".

If a remote replica is participant, then shouldBlockOn(test(remote)) will 
evaluate to false because remote is not in our DC. Then we negated it - so if 
it is in remote DC, !shouldBlockOn will be true. If remote participant is not 
in repairs, then the whole condition evaluates as true so we are going to 
decrement adjustedBlockFor because we do not wait for such replica.

If remote participant _is_ in repairs, the first clause is false so whole "if" 
is false but that is ok because it is remote anyway for which we do not wait.

If participant is local and it is in repairs, first clause evaluates to false - 
that is ok because we are going to block on it - we do not adjust block for

If participant is local and it is not in repairs then we dont change 
adjustedBlockFor anyway.

was (Author: smiklosovic):
[~chovatia.jayd...@gmail.com] you wrote:

_The fix you recommended won't work because if the remote node is a 
participant, the condition will always be false because of the `&&` operator in 
the if loop. It should be an `||` operator. A slight modification will work:_

{code}
if (!repairs.containsKey(participant) || 
!shouldBlockOn.test(participant.endpoint()))
    adjustedBlockFor--;
{code}

I think it should be really like this:

{code}
if (!repairs.containsKey(participant) && 
!shouldBlockOn.test(participant.endpoint()))
    adjustedBlockFor--;
{code}

You said: "if the remote node is a participant, the condition will always be 
false".

If a remote replica is participant, then shouldBlockOn(test(remote)) will 
evaluate to false because remote is not in our DC. Then we negated it - so if 
it is in remote DC, !shouldBlockOn will be true. If remote participant is not 
in repairs, then the whole condition evaluates as true so we are going to 
decrement adjustedBlockFor because we do not wait for such replica.

If remote participant _is_ in repairs, the first clause is false so whole is if 
is false but that is ok because it is remote anyway for which we do not wait.

If participant is local and it is in repairs, first clause evaluates to false - 
that is ok because we are going to block on it - we do not adjust block for

If participant is local and it is not in repairs then we dont change 
adjustedBlockFor anyway.

> local consistencies may get timeout if blocking read repair is sending the 
> read repair mutation to other DC 
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19120
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19120
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Runtian Liu
>            Priority: Normal
>         Attachments: image-2023-11-29-15-26-08-056.png, signature.asc
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> For a two DCs cluster setup. When a new node is being added to DC1, for 
> blocking read repair triggered by local_quorum in DC1, it will require to 
> send read repair mutation to an extra node(1)(2). The selector for read 
> repair may select *ANY* node that has not been contacted before(3) instead of 
> selecting the DC1 nodes. If a node from DC2 is selected, this will cause 100% 
> timeout because of the bug described below:
> When we initialized the latch(4) for blocking read repair, the shouldBlockOn 
> function will only return true for local nodes(5), the blockFor value will be 
> reduced if a local node doesn't require repair(6). The blockFor is same as 
> the number of read repair mutation sent out. But when the coordinator node 
> receives the response from the target nodes, the latch only count down for 
> nodes in same DC(7). The latch will wait till timeout and the read request 
> will timeout.
> This can be reproduced if you have a constant load on a 3 + 3 cluster when 
> adding a node. If you have someway to trigger blocking read repair(maybe by 
> adding load using stress tool). If you use local_quorum consistency with a 
> constant read after write load in the same DC that you are adding node. You 
> will see read timeout issue from time to time because of the bug described 
> above
>  
> I think for read repair when selecting the extra node to do repair, we should 
> prefer local nodes than the nodes from other region. Also, we need to fix the 
> latch part so even if we send mutation to the nodes in other DC, we don't get 
> a timeout.
> (1)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L455]
> (2)[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ConsistencyLevel.java#L183]
> (3)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L458]
> (4)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L96]
> (5)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L71]
> (6)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L88]
> (7)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L113]
>  

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Comment Edited] (CASSANDRA-19120) local consistencies may get timeout if blocking read repair is sending the read repair mutation to other DC

Reply via email to