[ https://issues.apache.org/jira/browse/HDDS-12207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937254#comment-17937254 ]
Ethan Rose edited comment on HDDS-12207 at 3/24/25 3:47 PM:
------------------------------------------------------------

Thanks for looking at this [~sarvekshayr]. I took a stab at some ideas for the output too and came up with something like this:

{code}
{
  "keys": [
    {
      "volumeName": "vol", // Split volume and bucket for easier post-processing. This also matches key list output format.
      "bucketName": "bucket",
      "name": "HISTORY.md",
      // Set to false if any replica's checks failed, or any block had no replicas found.
      "pass": false,
      "blocks": [
        {
          "containerID": 1,
          "blockID": 123,
          "replicas": [
            {
              "datanode": {
                "uuid": "123-456",
                "hostname": "dn1"
              },
              "checks": [
                {
                  "type": "checksums",
                  "pass": false,
                  "failures": [
                    {
                      "present": true, // The block was found in this container replica.
                      // Comes from the checksum exception thrown out of the block input stream. May also be block not found.
                      "message": "Inconsistent read for chunk=123 len=10 bytesRead=5"
                    }
                  ]
                },
                {
                  "type": "block existence",
                  "pass": false,
                  "failures": [
                    {
                      // It's possible that the getBlock call failed for a reason other than the block being missing. We can write that here.
                      "message": ""
                    }
                  ]
                },
                {
                  "type": "container states",
                  "pass": false,
                  "failures": [
                    {
                      // This check works on both SCM and the replicas, so the SCM state would end up duplicated among each replica's output in this layout.
                      // SCM states of DELETING or DELETED would trigger a failure. Missing containers would already have an empty replica list as described above.
                      "scmState": "CLOSED",
                      "present": true,
                      // Use the datanodes' readContainer API instead of SCM's getContainer API for the most up to date info.
                      // UNHEALTHY would currently be the only replica state to count as a failure.
                      "replicaState": "UNHEALTHY"
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ],
  "pass": true // Populated at the end to quickly see if there were any failures.
}
{code}

We can make the output print minimal information by default, and use additional flags to add information.
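As a side note on the sample above, the {{pass}} roll-up described in its comments might be computed roughly as follows. This is only a sketch in Python for brevity, not code from Ozone: the function names {{key_passed}} and {{annotate}} are hypothetical, and treating the top-level {{pass}} as true only when every key passed is an assumption about the intended semantics.

```python
import copy


def key_passed(key):
    """Return True only if every block has replicas and every replica check passed."""
    for block in key.get("blocks", []):
        replicas = block.get("replicas", [])
        if not replicas:
            # A block with no replicas found fails the whole key.
            return False
        for replica in replicas:
            for check in replica.get("checks", []):
                if not check.get("pass", False):
                    return False
    return True


def annotate(report):
    """Return a copy of the report with key-level and top-level "pass" filled in."""
    out = copy.deepcopy(report)
    for key in out.get("keys", []):
        key["pass"] = key_passed(key)
    # Assumption: the top-level flag is true only when every key passed.
    out["pass"] = all(key["pass"] for key in out.get("keys", []))
    return out
```

Computing the key-level result from the per-check results, rather than storing it independently, would keep the two from drifting apart as new check types are added.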
As an example of such flags, instead of requiring {{\-\-failures-only}} to print only the failing keys, we can print only failures by default. Passing {{\-\-all}} would print results for all passing and failing keys. The extra {{failures}} information would be omitted unless {{\-\-verbose}} is passed.

> Unify output of `ozone debug replicas verify` checks
> ----------------------------------------------------
>
>                 Key: HDDS-12207
>                 URL: https://issues.apache.org/jira/browse/HDDS-12207
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Assignee: Sarveksha Yeshavantha Raju
>            Priority: Major
>
> Make {{ozone debug replicas verify}} output JSON information about each key
> and the checks that were run on it. This could optionally be streamed to
> stdout or broken up into multiple files as specified by the user. As new
> checks are added, their results will be included in the same JSON objects. We
> can also add an option to skip output for keys that passed all the specified
> checks.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org