[jira] [Commented] (CASSANDRA-20581) Improved observability in AutoRepair to report both expected vs. actual repair bytes and expected vs. actual keyspaces

Jaydeepkumar Chovatia (Jira) Thu, 15 May 2025 14:22:06 -0700


    [https://issues.apache.org/jira/browse/CASSANDRA-20581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17951916#comment-17951916]


Jaydeepkumar Chovatia commented on CASSANDRA-20581:
---------------------------------------------------

I will create a separate PR for the following suggestion from [~clohfink]:

"maybe a small addition - can you add an 'unrepairedAge' in TableMetrics or the
like, which gives the oldest mutation (from sstable metadata) in the unrepaired
set? Just sstable age isn't always sufficient, because hints/repair streaming can
sneak older stuff in."
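{code:java}
// Intended for TableMetrics, where cfs is the table's ColumnFamilyStore.
// Reports the age, in seconds, of the oldest mutation (by minimum write
// timestamp in sstable metadata) across the table's unrepaired sstables.
unrepairedAge = createTableGauge("UnrepairedAgeInSeconds", () ->
{
    long oldest = Long.MAX_VALUE;
    for (SSTableReader sstable : cfs.getSSTables(SSTableSet.CANONICAL))
    {
        if (!sstable.isRepaired())
            oldest = Math.min(oldest, sstable.getMinTimestamp());
    }
    // Write timestamps are in microseconds. With no unrepaired sstables,
    // oldest stays Long.MAX_VALUE and the result is clamped to 0.
    return Math.max(0, FBUtilities.nowInSeconds() - TimeUnit.MICROSECONDS.toSeconds(oldest));
});
{code}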
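
To see what the gauge would report without a running node, here is a minimal sketch that reproduces its loop and clamping logic in plain Java (16+, for the record type). `SSTableInfo` is a hypothetical stand-in for `SSTableReader` that carries only the repaired flag and the minimum write timestamp in microseconds, and "now" is passed in instead of being read from `FBUtilities.nowInSeconds()` so the result is deterministic. It illustrates the computation only; it is not the code proposed for TableMetrics.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

public class UnrepairedAgeSketch
{
    // Hypothetical stand-in for SSTableReader: only the two fields the gauge reads.
    record SSTableInfo(boolean repaired, long minTimestampMicros) {}

    // Same logic as the gauge body: age in seconds of the oldest mutation
    // among unrepaired sstables; repaired sstables are ignored.
    static long unrepairedAgeSeconds(List<SSTableInfo> sstables, long nowSeconds)
    {
        long oldest = Long.MAX_VALUE;
        for (SSTableInfo sstable : sstables)
        {
            if (!sstable.repaired())
                oldest = Math.min(oldest, sstable.minTimestampMicros());
        }
        // No unrepaired sstables: oldest stays Long.MAX_VALUE, the difference
        // is negative, and the result is clamped to 0.
        return Math.max(0, nowSeconds - TimeUnit.MICROSECONDS.toSeconds(oldest));
    }

    public static void main(String[] args)
    {
        long now = 1_000_000L; // fixed "now" in seconds for a deterministic result
        List<SSTableInfo> sstables = List.of(
            new SSTableInfo(false, TimeUnit.SECONDS.toMicros(now - 3_600)), // unrepaired, 1h old
            new SSTableInfo(false, TimeUnit.SECONDS.toMicros(now - 60)),    // unrepaired, 1m old
            new SSTableInfo(true, TimeUnit.SECONDS.toMicros(now - 86_400))  // repaired, ignored
        );
        System.out.println(unrepairedAgeSeconds(sstables, now));  // 3600
        System.out.println(unrepairedAgeSeconds(List.of(), now)); // 0
    }
}
```

One consequence of the clamping: a table with no unrepaired sstables and a table whose unrepaired data was written in the current second both report 0, so this gauge alone cannot tell the two apart.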

> Improved observability in AutoRepair to report both expected vs. actual 
> repair bytes and expected vs. actual keyspaces
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20581
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20581
>             Project: Apache Cassandra
>          Issue Type: Task
>          Components: Consistency/Repair
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Normal
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current AutoRepair framework already provides visibility into which nodes
> are being repaired, how long each repair ran, the token ranges repaired, the
> failed ranges, and the successful/failed keyspaces/tables.
> However, on a node where repair is running, we lack a few fine-grained signals
> {*}while repair is in progress{*}, such as the following:
>  # Number of tables/keyspaces/token ranges repaired vs. pending
>  # Number of Merkle trees/token ranges out of sync, which indirectly indicates
> inconsistencies among the nodes
>  ## Convert this metric into % of data in sync vs. not
>  
> The need for fine-grained, node-level observability also came up while
> [reviewing|https://github.com/jaydeepkumar1984/cassandra/pull/54#issuecomment-2813945023]
>  the POC for Repair on bootstrap, since it is essential when repair runs as
> part of bootstrap.
> This ticket aims to improve AutoRepair observability and make it similar
> to _nodetool compactionstats_.
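
The in-progress signals listed in the ticket (ranges repaired vs. pending, ranges found out of sync, and a percentage in sync) could be tracked per table with a handful of counters. This is a minimal sketch under stated assumptions: `AutoRepairProgressSketch`, `onRangeRepaired` and the other names are hypothetical and not part of Cassandra's AutoRepair code; "% in sync" is approximated by counting token ranges whose Merkle trees matched, giving every range equal weight instead of weighting by data size; and the status line only loosely imitates the completed/total/unit/progress columns of `nodetool compactionstats`.

```java
import java.util.Locale;

public class AutoRepairProgressSketch
{
    // Hypothetical per-table counters; these names do not exist in Cassandra.
    private final int totalRanges;
    private int repairedRanges;
    private int outOfSyncRanges;

    AutoRepairProgressSketch(int totalRanges)
    {
        this.totalRanges = totalRanges;
    }

    // Called once per token range when its repair session finishes.
    void onRangeRepaired(boolean merkleTreesMatched)
    {
        repairedRanges++;
        if (!merkleTreesMatched)
            outOfSyncRanges++;
    }

    int pendingRanges()
    {
        return totalRanges - repairedRanges;
    }

    // Share of the ranges compared so far whose Merkle trees already matched.
    // Before any range is compared, nothing is known to be out of sync, so 100.
    double percentInSync()
    {
        if (repairedRanges == 0)
            return 100.0;
        return 100.0 * (repairedRanges - outOfSyncRanges) / repairedRanges;
    }

    double percentComplete()
    {
        return totalRanges == 0 ? 100.0 : 100.0 * repairedRanges / totalRanges;
    }

    // One compactionstats-style row: keyspace, table, completed, total, unit, progress.
    String statusLine(String keyspace, String table)
    {
        return String.format(Locale.ROOT, "%-12s %-12s %6d %6d %-7s %6.2f%%",
                             keyspace, table, repairedRanges, totalRanges, "ranges", percentComplete());
    }

    public static void main(String[] args)
    {
        AutoRepairProgressSketch progress = new AutoRepairProgressSketch(8);
        progress.onRangeRepaired(true);
        progress.onRangeRepaired(false); // Merkle trees differed for this range
        progress.onRangeRepaired(true);
        progress.onRangeRepaired(true);
        System.out.println(progress.statusLine("ks1", "tbl1"));
        System.out.printf(Locale.ROOT, "pending=%d in-sync=%.1f%%%n",
                          progress.pendingRanges(), progress.percentInSync()); // pending=4 in-sync=75.0%
    }
}
```

Reporting progress in bytes, as the ticket title's "expected vs. actual repair bytes" suggests, would need a per-range size estimate in place of the plain range counters.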



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
