[ https://issues.apache.org/jira/browse/CASSANDRA-20581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jaydeepkumar Chovatia updated CASSANDRA-20581:
----------------------------------------------
    Description:
The current AutoRepair framework has enough visibility to know which nodes are going through repair, the duration they ran repair, the token ranges repaired, and the failed ranges.
However, on a node when repair is running, we miss a few fine-grained signals, such as the following:
# Number of tables/keyspaces/token ranges repaired vs. pending
# Number of Merkle trees/token-ranges out of sync, which indirectly tells us inconsistencies among the nodes
## Convert this metric into % of data in sync vs. not

Fine-grained node-level observability also came up while [reviewing|https://github.com/jaydeepkumar1984/cassandra/pull/54#issuecomment-2813945023] the POC for Repair on bootstrap, as it is a must for repairing as part of the bootstrap.
This ticket is to improve the AutoRepair observability, and make it similar to _nodetool compactionstats_

  was:
The current AutoRepair framework has enough visibility to know which nodes are going through repair, the duration they ran repair, the token ranges repaired, and the failed ranges.
However, on a node when repair is running, we miss a few fine-grained signals, such as the following:
# Number of tables/keyspaces/token ranges repaired vs. pending
# Number of Merkle trees/token-ranges out of sync, which indirectly tells us inconsistencies among the nodes
## Convert this metric into % of data repaired/not-repaired

Fine-grained node-level observability also came up while [reviewing|https://github.com/jaydeepkumar1984/cassandra/pull/54#issuecomment-2813945023] the POC for Repair on bootstrap, as it is a must for repairing as part of the bootstrap.
This ticket is to improve the AutoRepair observability, and make it similar to _nodetool compactionstats_

> Fine-grained observability on a node
> ------------------------------------
>
>                 Key: CASSANDRA-20581
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20581
>             Project: Apache Cassandra
>          Issue Type: Task
>          Components: Consistency/Repair
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Normal
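The signals listed in the description (ranges repaired vs. pending, and ranges out of sync converted into a percentage) could be derived roughly as in the sketch below. This is only an illustration under stated assumptions: the class `RepairProgressSketch` and all of its fields and methods are hypothetical and are not part of Cassandra or the AutoRepair framework. It also uses the share of compared token ranges that matched as a proxy for "% of data in sync", which is exact only if ranges hold similar amounts of data.

```java
import java.util.Locale;

/** Hypothetical per-node repair progress snapshot; not a Cassandra API. */
final class RepairProgressSketch {
    final long rangesRepaired;   // token ranges whose repair has finished
    final long rangesPending;    // token ranges still waiting for repair
    final long rangesCompared;   // token ranges compared via Merkle trees so far
    final long rangesOutOfSync;  // compared ranges whose Merkle trees differed

    RepairProgressSketch(long rangesRepaired, long rangesPending,
                         long rangesCompared, long rangesOutOfSync) {
        if (rangesRepaired < 0 || rangesPending < 0 || rangesCompared < 0 || rangesOutOfSync < 0)
            throw new IllegalArgumentException("counts must be non-negative");
        if (rangesOutOfSync > rangesCompared)
            throw new IllegalArgumentException("out-of-sync ranges cannot exceed compared ranges");
        this.rangesRepaired = rangesRepaired;
        this.rangesPending = rangesPending;
        this.rangesCompared = rangesCompared;
        this.rangesOutOfSync = rangesOutOfSync;
    }

    /** Percentage of scheduled ranges already repaired; NaN if nothing is scheduled. */
    double percentRepaired() {
        long total = rangesRepaired + rangesPending;
        return total == 0 ? Double.NaN : 100.0 * rangesRepaired / total;
    }

    /** Percentage of compared ranges found in sync; NaN if nothing was compared yet. */
    double percentInSync() {
        return rangesCompared == 0
               ? Double.NaN
               : 100.0 * (rangesCompared - rangesOutOfSync) / rangesCompared;
    }

    /** One-line summary, loosely in the spirit of a nodetool status line. */
    String summary() {
        return String.format(Locale.ROOT,
                             "ranges repaired/pending: %d/%d (%.1f%%), in sync: %.1f%%",
                             rangesRepaired, rangesPending, percentRepaired(), percentInSync());
    }

    public static void main(String[] args) {
        // Example: 8 of 10 ranges repaired, 5 of 100 compared ranges out of sync.
        System.out.println(new RepairProgressSketch(8, 2, 100, 5).summary());
    }
}
```

Returning NaN for an empty denominator is a deliberate choice in this sketch, so that "no data yet" is not reported as either 0% or 100%.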