[ https://issues.apache.org/jira/browse/CASSANDRA-20581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17951916#comment-17951916 ]
Jaydeepkumar Chovatia commented on CASSANDRA-20581:
---------------------------------------------------

I will create a separate PR for the following suggestion from [~clohfink]:

"maybe a small addition - can you add a "unrepairedAge" in TableMetrics or the like which gives oldest mutation (from sstable metadata) from the unrepaired set? Just sstable age isn't always sufficient cause hints/repair streaming can sneak older stuff in."

{code:java}
unrepairedAge = createTableGauge("UnrepairedAgeInSeconds", () -> {
    // Oldest write timestamp (microseconds) across all unrepaired SSTables.
    long oldest = Long.MAX_VALUE;
    for (SSTableReader sstable : cfs.getSSTables(SSTableSet.CANONICAL)) {
        if (!sstable.isRepaired()) {
            oldest = Math.min(oldest, sstable.getMinTimestamp());
        }
    }
    // No unrepaired SSTables: report an age of zero.
    if (oldest == Long.MAX_VALUE)
        return 0L;
    // Clamp at zero in case the oldest timestamp lies in the future.
    return Math.max(0L, FBUtilities.nowInSeconds() - TimeUnit.MICROSECONDS.toSeconds(oldest));
});
{code}

> Improved observability in AutoRepair to report both expected vs. actual
> repair bytes and expected vs. actual keyspaces
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20581
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20581
>             Project: Apache Cassandra
>          Issue Type: Task
>          Components: Consistency/Repair
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Normal
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current AutoRepair framework has enough visibility to know which nodes
> are going through repair, the duration they ran repair, the token ranges
> repaired, the failed ranges, and the successful/failed keyspaces/tables.
> However, on a node when repair is running, we miss a few fine-grained signals
> {*}while repair is in progress{*}, such as the following:
> # Number of tables/keyspaces/token ranges repaired vs. pending
> # Number of Merkle trees/token-ranges out of sync, which indirectly tells us
> inconsistencies among the nodes
> ## Convert this metric into % of data in sync vs. not
>
> Fine-grained node-level observability also came up while
> [reviewing|https://github.com/jaydeepkumar1984/cassandra/pull/54#issuecomment-2813945023]
> the POC for Repair on bootstrap, as it is a must for repairing as part of
> the bootstrap.
> This ticket is to improve the AutoRepair observability, and make it similar
> to _nodetool compactionstats_
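
The arithmetic behind the "UnrepairedAgeInSeconds" gauge in the comment above can be tried outside Cassandra with a minimal, self-contained sketch. It is an illustration under stated assumptions and not the code that will land in the PR: the class UnrepairedAgeSketch, the SSTableInfo holder, and the unrepairedAgeSeconds helper are hypothetical stand-ins for SSTableReader and the gauge lambda, and the sketch assumes, as the snippet in the comment does, that SSTable minimum timestamps are microseconds since the epoch.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

public class UnrepairedAgeSketch {

    /** Hypothetical stand-in for the two pieces of SSTable metadata the gauge reads. */
    static final class SSTableInfo {
        final boolean repaired;
        final long minTimestampMicros; // assumed: microseconds since the epoch

        SSTableInfo(boolean repaired, long minTimestampMicros) {
            this.repaired = repaired;
            this.minTimestampMicros = minTimestampMicros;
        }
    }

    /** Age in seconds of the oldest write among unrepaired SSTables, or 0 if there are none. */
    static long unrepairedAgeSeconds(List<SSTableInfo> sstables, long nowSeconds) {
        long oldestMicros = Long.MAX_VALUE;
        for (SSTableInfo sstable : sstables) {
            if (!sstable.repaired) {
                oldestMicros = Math.min(oldestMicros, sstable.minTimestampMicros);
            }
        }
        if (oldestMicros == Long.MAX_VALUE) {
            return 0L; // nothing unrepaired
        }
        // Clamp at zero so a timestamp in the future does not yield a negative age.
        return Math.max(0L, nowSeconds - TimeUnit.MICROSECONDS.toSeconds(oldestMicros));
    }

    public static void main(String[] args) {
        long now = 1_700_000_000L; // fixed "now" in seconds, for a repeatable run
        List<SSTableInfo> sstables = List.of(
            new SSTableInfo(true, (now - 5_000) * 1_000_000L),  // repaired, ignored
            new SSTableInfo(false, (now - 600) * 1_000_000L),   // oldest unrepaired write
            new SSTableInfo(false, (now - 60) * 1_000_000L));
        System.out.println(unrepairedAgeSeconds(sstables, now)); // prints 600
    }
}
```

The repaired SSTable holds the oldest data overall but is skipped, so the reported age comes from the oldest unrepaired write. Because the value is derived from write timestamps in SSTable metadata rather than from file age, older data brought in by hints or repair streaming still counts, which is the point of the suggestion quoted in the comment.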