[ https://issues.apache.org/jira/browse/CASSANDRA-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18033390#comment-18033390 ]

Long Pan commented on CASSANDRA-13704:
--------------------------------------

This question is about Cassandra 4.x behavior.

If the warning logs or rejection errors introduced by this change are enabled, 
I assume they might appear even during regular range movement operations, such 
as adding a new node. This is because node status changes (joining, normal, 
etc.) are propagated through gossip, which is only eventually consistent.

For example:
 * At {*}t₀{*}, node5 finishes joining and becomes {*}UN{*}, making an existing 
node (say, node0) no longer own token range {*}range_x{*}.

 * At {*}t₁{*}, node0 learns from gossip that node5 is {*}UN{*}.

 * At {*}t₂{*}, node1 learns the same.

Between *t₁* and {*}t₂{*}, if a request involving *range_x* reaches node1 as 
the coordinator, node1, still believing node0 owns that range, would forward the 
request to node0. However, node0 has already updated its view, knows it no 
longer owns *range_x*, and would therefore emit the warning log or rejection error.
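To make the t₀..t₂ timeline concrete, here is a toy model in which every node keeps its own view of who owns *range_x* and views update independently, as they would via gossip. It is a sketch under simplifying assumptions: a single owner per range (no replication factor, no pending ranges during bootstrap), and all class and method names (StaleRingView, learn, route, replicaAgrees) are hypothetical, not Cassandra internals. A result of false marks the point where the replica would log a warning or reject.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the t0..t2 race described above. Each node holds its own,
 * possibly stale, view of which node owns a range. Hypothetical names only;
 * this is not Cassandra's internal API.
 */
public class StaleRingView {
    // node -> (range -> owner, as that node currently believes)
    private final Map<String, Map<String, String>> views = new HashMap<>();

    /** Record that `node` now believes `owner` owns `range` (e.g. learned via gossip). */
    public void learn(String node, String range, String owner) {
        views.computeIfAbsent(node, n -> new HashMap<>()).put(range, owner);
    }

    /** The coordinator routes using only its own view of the ring. */
    public String route(String coordinator, String range) {
        return views.get(coordinator).get(range);
    }

    /**
     * Returns true if the replica chosen by the coordinator also believes it owns
     * the range; false is where the out-of-range warning or rejection would fire.
     */
    public boolean replicaAgrees(String coordinator, String range) {
        String replica = route(coordinator, range);
        return replica.equals(views.get(replica).get(range));
    }

    public static void main(String[] args) {
        StaleRingView ring = new StaleRingView();
        // Before t0: every node believes node0 owns range_x.
        for (String n : new String[] {"node0", "node1", "node5"}) {
            ring.learn(n, "range_x", "node0");
        }
        // t0: node5 finishes joining (UN) and takes over range_x.
        ring.learn("node5", "range_x", "node5");
        // t1: node0 learns this from gossip.
        ring.learn("node0", "range_x", "node5");
        // Between t1 and t2, node1 still routes range_x to node0.
        // prints "t1..t2: node0 agrees=false"
        System.out.println("t1..t2: " + ring.route("node1", "range_x")
                + " agrees=" + ring.replicaAgrees("node1", "range_x"));
        // t2: node1 catches up, and routing is consistent again.
        ring.learn("node1", "range_x", "node5");
        // prints "after t2: node5 agrees=true"
        System.out.println("after t2: " + ring.route("node1", "range_x")
                + " agrees=" + ring.replicaAgrees("node1", "range_x"));
    }
}
```

The window in which the views disagree is exactly the gossip propagation delay between node0 and node1, which is why its length would be expected to grow with cluster size.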

{*}1{*}. Is my understanding correct that such warnings or rejections are 
expected (and relatively common) during operations like node additions, 
especially in larger clusters where gossip convergence takes longer?

{*}2{*}. Another question: if my understanding is correct, would CEP-21 
(transactional cluster metadata) completely solve this issue or at least 
greatly reduce its likelihood?
Thanks!

> Safer handling of out of range tokens
> -------------------------------------
>
>                 Key: CASSANDRA-13704
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13704
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Legacy/Coordination, Legacy/Observability
>            Reporter: Sam Tunnicliffe
>            Assignee: Caleb Rackliffe
>            Priority: Urgent
>             Fix For: 4.0.14, 4.1.7, 5.0.1
>
>         Attachments: CASSANDRA-13704_5-0_23_ci_summary.html, 
> CASSANDRA-13704_5-0_23_results_details.tar.xz, 
> CASSANDRA-13704_5-0_24_ci_summary.html, 
> CASSANDRA-13704_5-0_24_results_details.tar.xz, 
> CASSANDRA-13704_5-0_25_ci_summary.html, 
> CASSANDRA-13704_5-0_25_results_details.tar.xz, ci_summary-1.html, 
> ci_summary-2.html, ci_summary.html, result_details.tar-1.gz, 
> result_details.tar-2.gz, result_details.tar.gz
>
>          Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> It is possible for nodes to have a divergent view of the ring, which can 
> result in some operations being sent to the wrong nodes. This is an umbrella 
> ticket to mitigate such issues by adding logging when a node is asked to 
> perform an operation for tokens it does not own. This will be useful for 
> detecting when the nodes' views of the ring diverge, which is not highly 
> visible at the moment, and also for post-hoc analysis.
> It may also be beneficial to straight up reject certain operations, though 
> this will need to balance the risk of performing those ops against the 
> consequences rejecting them has on availability.
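The trade-off in the ticket description (log for visibility versus reject for safety) can be sketched as a replica-side ownership check with two modes. This is an illustrative sketch only: OutOfRangeGuard, Mode, TokenRange and the counter are hypothetical names, not Cassandra's implementation or configuration. Ranges use (left, right] semantics with wraparound on a long-valued token ring, which is an assumption made for the example.

```java
import java.util.List;
import java.util.logging.Logger;

/**
 * Hypothetical replica-side guard: when asked to handle a token it does not own,
 * it always logs and counts the event, and either proceeds (LOG_ONLY, favouring
 * availability) or refuses (REJECT, favouring correctness).
 */
public class OutOfRangeGuard {
    public enum Mode { LOG_ONLY, REJECT }

    /** Token range (left, right]; left >= right wraps past the end of the ring. */
    public record TokenRange(long left, long right) {
        boolean contains(long token) {
            if (left < right) return token > left && token <= right;
            // Wrapping range; left == right covers the whole ring.
            return token > left || token <= right;
        }
    }

    private static final Logger LOG = Logger.getLogger(OutOfRangeGuard.class.getName());
    private final List<TokenRange> ownedRanges;
    private final Mode mode;
    private long outOfRangeCount = 0; // kept for observability and post-hoc analysis

    public OutOfRangeGuard(List<TokenRange> ownedRanges, Mode mode) {
        this.ownedRanges = List.copyOf(ownedRanges);
        this.mode = mode;
    }

    /** Returns true if the operation for `token` may proceed on this node. */
    public boolean check(long token, String from) {
        boolean owned = ownedRanges.stream().anyMatch(r -> r.contains(token));
        if (owned) return true;
        outOfRangeCount++;
        LOG.warning("Operation for token " + token + " from " + from
                + " is outside this node's owned ranges");
        return mode == Mode.LOG_ONLY; // LOG_ONLY still performs it; REJECT refuses it
    }

    public long outOfRangeCount() { return outOfRangeCount; }
}
```

In this sketch both modes record the event, so the divergence becomes visible either way; only the REJECT mode trades availability for protection against applying an operation on a node that should not hold the data.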



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
