[ 
https://issues.apache.org/jira/browse/CASSANDRA-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881259#comment-17881259
 ] 

Michael Semb Wever edited comment on CASSANDRA-13704 at 9/12/24 10:13 AM:
--------------------------------------------------------------------------

Configuration-wise, doesn't this fit better into guardrails?
(and log_out_of_token_range_requests → warn_out_of_token_range_requests)
(can guardrails also be hidden?)

bq. They determine how streaming, repair, hints, mutations, read repair, and 
point/range reads handle cases where they are being executed on nodes that do 
not own the range(es) for the data involved.

It only applies to single-partition reads and mutations.  So rejection only 
provides a mixed mode of guarantees (and degraded availability)…?

bq. Finally, nodetool info has a new option, --out-of-range-ops, that will 
display per-keyspace counts of operations for invalid tokens.

Why do we only collect metrics when log_out_of_token_range_requests is 
enabled?  I would think this is useful regardless.  Are we expecting a 
significant performance impact from it?
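
To illustrate the question, here is a rough sketch of what decoupling the two 
could look like. It is written in Python for brevity rather than the patch's 
Java, and OutOfRangeTracker, record and log_enabled are made-up names for this 
example, not anything taken from the patch. The point is only that the 
per-keyspace count is an in-memory increment, while the log line is the part 
gated by the flag.

```python
import logging
from collections import Counter

logger = logging.getLogger("out_of_range")


class OutOfRangeTracker:
    """Hypothetical helper: counts out-of-range operations per keyspace."""

    def __init__(self, log_enabled: bool):
        # Stands in for a flag such as log_out_of_token_range_requests.
        self.log_enabled = log_enabled
        self.counts = Counter()

    def record(self, keyspace: str, token: int) -> None:
        # Counting is unconditional: one increment per offending request.
        self.counts[keyspace] += 1
        # Only the log line depends on the flag.
        if self.log_enabled:
            logger.warning("Out-of-range token %d for keyspace %s", token, keyspace)


# Logging disabled, metrics still collected.
tracker = OutOfRangeTracker(log_enabled=False)
tracker.record("ks1", 42)
tracker.record("ks1", 43)
tracker.record("ks2", 7)
```

Whether the real code path is this cheap is exactly what the question above is 
asking; the sketch assumes nothing about how the patch computes ownership.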

bq. Once review settles, I'll likely add entries to NEWS.txt along w/ the 
CHANGES content, but given this is something we should probably never disable, 
I'm not too keen on adding it to the example cassandra.yaml.

This needs a dev ML discussion.
It is a substantial patch for stable branches, and _if rejection defaults to 
true_ it is a significant behavioural change.  This may not suit all users, and 
may introduce unexpected and unnecessary degraded availability.


> Safer handling of out of range tokens
> -------------------------------------
>
>                 Key: CASSANDRA-13704
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13704
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Coordination, Legacy/Observability
>            Reporter: Sam Tunnicliffe
>            Assignee: Caleb Rackliffe
>            Priority: Urgent
>             Fix For: 4.0.x, 4.1.x, 5.0.x
>
>         Attachments: ci_summary-1.html, ci_summary-2.html, ci_summary.html, 
> result_details.tar-1.gz, result_details.tar-2.gz, result_details.tar.gz
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> It is possible for nodes to have a divergent view of the ring, which can 
> result in some operations being sent to the wrong nodes. This is an umbrella 
> ticket to mitigate such issues by adding logging when a node is asked to 
> perform an operation for tokens it does not own. This will be useful for 
> detecting when the nodes' views of the ring diverge, which is not highly 
> visible at the moment, and also for post-hoc analysis.
> It may also be beneficial to straight up reject certain operations, though 
> this will need to balance the risk of performing those ops against the 
> consequences rejecting them has on availability.


