[ 
https://issues.apache.org/jira/browse/CASSANDRA-18758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890422#comment-17890422
 ] 

Michael Semb Wever commented on CASSANDRA-18758:
------------------------------------------------

bq. I am fine to move the logic as part of a new option for the nodetool 
gossipinfo command - please let me know if you want me to spend time doing this.

Yes I'm interested.

If the scope is reduced to observability via 
{{`nodetool gossipinfo --check-metadata`}}, it would be non-invasive at 
runtime but would still make it possible for operators to write their own 
monitoring checks, which would in turn make it easier to find those stacktrace 
errors in the logs. Those stacktrace errors can then be reported as bugs that 
we can fix.

Additional metrics would be nice, but I don't see how you can update them in a 
non-invasive way. So sticking with just 
{{`nodetool gossipinfo --check-metadata`}} is something I can commit to 
helping out with.

> Detect token-ownership mismatch
> -------------------------------
>
>                 Key: CASSANDRA-18758
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18758
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Gossip
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Normal
>         Attachments: Screenshot 2024-10-15 at 13.07.14.png
>
>
> *Problem Statement*
> As we know, Cassandra exchanges important topology and 
> token-ownership-related details over Gossip. Cassandra internally maintains 
> two separate caches that hold the token-ownership information: 1) the Gossip 
> cache and 2) the Storage Service cache. On a node, the Gossip cache is 
> updated first, followed by the Storage Service cache. In the hot path, 
> ownership is calculated from the Storage Service cache. Since two separate 
> caches maintain the same information, inconsistencies are bound to happen. 
> It is quite possible for the Gossip cache to have up-to-date ownership of 
> the Cassandra cluster while the Storage Service cache does not, and in that 
> scenario, inconsistent data will be served to the user.
> Currently, there is no mechanism in Cassandra that detects and fixes 
> inconsistencies between these two caches. 
> *Long-term solution*
> We are going with the long-term transactional metadata 
> ([https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21]) to handle 
> such inconsistencies, and that’s the right thing to do.
> *Short-term solution*
> But CEP-21 might take some time, and until then, there is a need to *detect* 
> such inconsistencies. Once we detect inconsistencies, we have two options: 
> 1) restart the node or 2) fix the inconsistencies on the fly.
> This JIRA provides a short-term solution. Please review the pull request 
> (on 4.1): [https://github.com/apache/cassandra/pull/3548]
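
To make the detection step concrete, here is a minimal sketch of the kind of 
comparison a mismatch check could perform. It is not the code from the pull 
request and not Cassandra's API: the function name find_token_mismatches, the 
dictionary shapes (endpoint address mapped to a set of tokens), and the sample 
addresses are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical representation: endpoint address -> set of tokens it owns.
TokenView = Dict[str, Set[int]]


def find_token_mismatches(
    gossip_view: TokenView, storage_view: TokenView
) -> Dict[str, Dict[str, List[int]]]:
    """Return, per endpoint, the tokens on which the two views disagree.

    Endpoints whose token sets are identical in both views are omitted,
    so an empty result means the two views are consistent.
    """
    mismatches: Dict[str, Dict[str, List[int]]] = {}
    # Walk the union so endpoints missing from one view are also reported.
    for endpoint in sorted(set(gossip_view) | set(storage_view)):
        gossip_tokens = gossip_view.get(endpoint, set())
        storage_tokens = storage_view.get(endpoint, set())
        if gossip_tokens != storage_tokens:
            mismatches[endpoint] = {
                "only_in_gossip": sorted(gossip_tokens - storage_tokens),
                "only_in_storage_service": sorted(storage_tokens - gossip_tokens),
            }
    return mismatches
```

A check built this way only reads the two views and reports differences, so it 
stays on the detection side; whether to restart the node or repair the caches 
is left to the operator.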



