[
https://issues.apache.org/jira/browse/CASSANDRA-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18033390#comment-18033390
]
Long Pan commented on CASSANDRA-13704:
--------------------------------------
This question is about Cassandra 4.x behavior.
If the warning logs or rejection errors introduced by this change are enabled,
I assume they might appear even during regular range movement operations, such
as when adding a new node. This is because node status (joining, normal, etc.)
changes are propagated through gossip, which is eventually consistent.
For example:
* At {*}t₀{*}, node5 finishes joining and becomes {*}UN{*}, making an existing
node (say, node0) no longer own token range {*}range_x{*}.
* At {*}t₁{*}, node0 learns from gossip that node5 is {*}UN{*}.
* At {*}t₂{*}, node1 learns the same.
Between {*}t₁{*} and {*}t₂{*}, if a request involving {*}range_x{*} reaches node1 as
the coordinator, node1, still believing node0 owns that range, would forward the
request to node0. However, node0, having already updated its view, would know
it no longer owns {*}range_x{*} and thus emit the warning log or rejection error.
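The timeline above can be sketched as a toy simulation. This is only an illustrative model, not Cassandra code: the Node class, its ring_view map, and the route/handle methods are hypothetical names, and gossip is reduced to manually updating each node's local view one node at a time.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # This node's local, gossip-derived view: token range -> believed owner.
    ring_view: dict = field(default_factory=dict)

    def route(self, token_range):
        """As coordinator, pick the node this view believes owns the range."""
        return self.ring_view[token_range]

    def handle(self, token_range):
        """As replica, accept only if this node's own view says it owns the range."""
        if self.ring_view[token_range] == self.name:
            return "accepted"
        # Stands in for the warning log or rejection error discussed above.
        return "out-of-range"


def simulate():
    # Before t0: node0 and node1 both believe node0 owns range_x.
    nodes = {
        "node0": Node("node0", {"range_x": "node0"}),
        "node1": Node("node1", {"range_x": "node0"}),
        # node5 knows it owns range_x once it has finished joining.
        "node5": Node("node5", {"range_x": "node5"}),
    }

    def request_via(coordinator_name):
        # The coordinator forwards based on its own (possibly stale) view.
        target = nodes[coordinator_name].route("range_x")
        return target, nodes[target].handle("range_x")

    results = []
    results.append(request_via("node1"))  # views agree: node0 accepts

    # t1: node0 learns through gossip that node5 now owns range_x.
    nodes["node0"].ring_view["range_x"] = "node5"
    results.append(request_via("node1"))  # node1 is stale: node0 flags it

    # t2: node1 learns the same, so the views converge again.
    nodes["node1"].ring_view["range_x"] = "node5"
    results.append(request_via("node1"))  # routed to node5, accepted
    return results


if __name__ == "__main__":
    for target, outcome in simulate():
        print(f"forwarded to {target}: {outcome}")
```

In this model the out-of-range outcome appears only for requests coordinated by node1 inside the window between t₁ and t₂, which is the case the questions below are about.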
{*}1{*}. Is my understanding correct that such warnings or rejections are
expected (and relatively common) during operations like node additions,
especially in larger clusters where gossip convergence takes longer?
{*}2{*}. Another question: if my understanding is correct, would CEP-21
(transactional cluster metadata) completely solve this issue or at least
greatly reduce its likelihood?
Thanks!
> Safer handling of out of range tokens
> -------------------------------------
>
> Key: CASSANDRA-13704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13704
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Legacy/Coordination, Legacy/Observability
> Reporter: Sam Tunnicliffe
> Assignee: Caleb Rackliffe
> Priority: Urgent
> Fix For: 4.0.14, 4.1.7, 5.0.1
>
> Attachments: CASSANDRA-13704_5-0_23_ci_summary.html,
> CASSANDRA-13704_5-0_23_results_details.tar.xz,
> CASSANDRA-13704_5-0_24_ci_summary.html,
> CASSANDRA-13704_5-0_24_results_details.tar.xz,
> CASSANDRA-13704_5-0_25_ci_summary.html,
> CASSANDRA-13704_5-0_25_results_details.tar.xz, ci_summary-1.html,
> ci_summary-2.html, ci_summary.html, result_details.tar-1.gz,
> result_details.tar-2.gz, result_details.tar.gz
>
> Time Spent: 8h 10m
> Remaining Estimate: 0h
>
> It is possible for nodes to have a divergent view of the ring, which can
> result in some operations being sent to the wrong nodes. This is an umbrella
> ticket to mitigate such issues by adding logging when a node is asked to
> perform an operation for tokens it does not own. This will be useful for
> detecting when the nodes' views of the ring diverge, which is not highly
> visible at the moment, and also for post-hoc analysis.
> It may also be beneficial to straight up reject certain operations, though
> this will need to balance the risk of performing those ops against the
> consequences rejecting them has on availability.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)