+1 for the merge of this feature. Thank you to all who contributed to this
feature that enhances the robustness and manageability of Apache Ozone.

- Sid

On Mon, Jun 16, 2025 at 10:35 AM Ritesh Shukla <rit...@cloudera.com.invalid>
wrote:

> +1
> Disclaimer: I am a sponsor and contributor for this work
>
> On Tue, Jun 10, 2025 at 5:40 PM Ethan Rose <er...@apache.org> wrote:
>
> > Based on discussion in the community sync this week, I want to add
> > some more information.
> >
> > Code
> >
> > For those interested in checking out the code, these are some of the
> > major classes to start with:
> >
> >    - ReconcileContainerTask: This is the command on the datanode that is
> >    received from SCM to reconcile a container with a datanode’s peers. It
> >    passes through the ReplicationSupervisor just like replication and
> >    reconstruction commands.
> >    - ContainerProtos.ContainerChecksumInfo: This is the proto format of
> >    the new file that is written into the containers with the Merkle tree
> >    and list of deleted blocks.
> >    - ContainerMerkleTreeWriter: This class is used to build Merkle trees
> >    chunk by chunk and generate a protobuf representation of the tree.
> >    - ContainerChecksumTreeManager: This class coordinates reads and
> >    writes of ContainerChecksumInfo for containers. The diff method
> >    determines which repairs should be done on a container based on a
> >    peer’s Merkle tree.
> >    - KeyValueContainerCheck#scanData: This is the existing method called
> >    by the background and on-demand container data scanners to scan a
> >    container. It has been updated to build the Merkle tree as it runs.
> >    - KeyValueHandler#reconcileContainer: This method updates the
> >    container based on the peer’s replica.
> >    - Major tests for reconciliation have been added to
> >    TestContainerCommandReconciliation (integration test) and
> >    TestContainerReconciliationWithMockDatanodes (unit test with mocked
> >    clients).
> >       - There are more tasks under the reconciliation jira to expand the
> >       types of faults being tested.
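To make the tree-building and diff steps described in the list above more concrete, here is a minimal sketch in Java. Everything in it is hypothetical: the class MerkleSketch, its methods, and the use of CRC32 at every level are assumptions for illustration and are not the ContainerMerkleTreeWriter or ContainerChecksumTreeManager API. It also ignores the list of deleted blocks that the real ContainerChecksumInfo carries.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical illustration only; not the Ozone implementation. */
final class MerkleSketch {
  // blockId -> (chunk offset -> chunk checksum). Sorted maps keep hashing deterministic.
  private final TreeMap<Long, TreeMap<Long, Long>> blocks = new TreeMap<>();

  /** Adds one chunk at a time, keyed by block ID and offset within the block. */
  void addChunk(long blockId, long offset, byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    blocks.computeIfAbsent(blockId, k -> new TreeMap<>()).put(offset, crc.getValue());
  }

  /** Block-level checksum over the ordered (offset, chunk checksum) pairs. */
  long blockChecksum(long blockId) {
    CRC32 crc = new CRC32();
    for (Map.Entry<Long, Long> e : blocks.get(blockId).entrySet()) {
      update(crc, e.getKey());
      update(crc, e.getValue());
    }
    return crc.getValue();
  }

  /** Root checksum over the ordered (blockId, block checksum) pairs. */
  long dataChecksum() {
    CRC32 crc = new CRC32();
    for (Long blockId : blocks.keySet()) {
      update(crc, blockId);
      update(crc, blockChecksum(blockId));
    }
    return crc.getValue();
  }

  /** Lists what this replica would have to fetch from the peer to match it. */
  List<String> diff(MerkleSketch peer) {
    List<String> repairs = new ArrayList<>();
    if (dataChecksum() == peer.dataChecksum()) {
      return repairs; // Roots match, so no deeper comparison is needed.
    }
    for (Map.Entry<Long, TreeMap<Long, Long>> pb : peer.blocks.entrySet()) {
      long blockId = pb.getKey();
      TreeMap<Long, Long> ours = blocks.get(blockId);
      if (ours == null) {
        repairs.add("missing block " + blockId);
        continue;
      }
      if (blockChecksum(blockId) == peer.blockChecksum(blockId)) {
        continue; // Identical block, skip its chunks.
      }
      for (Map.Entry<Long, Long> pc : pb.getValue().entrySet()) {
        Long mine = ours.get(pc.getKey());
        if (mine == null) {
          repairs.add("missing chunk " + blockId + "@" + pc.getKey());
        } else if (!mine.equals(pc.getValue())) {
          // Checksums alone cannot say which side is corrupt; this only flags a mismatch.
          repairs.add("mismatched chunk " + blockId + "@" + pc.getKey());
        }
      }
    }
    return repairs;
  }

  private static void update(CRC32 crc, long value) {
    crc.update(ByteBuffer.allocate(Long.BYTES).putLong(value).array());
  }

  public static void main(String[] args) {
    MerkleSketch ours = new MerkleSketch();
    MerkleSketch peer = new MerkleSketch();
    ours.addChunk(1, 0, "aaaa".getBytes(StandardCharsets.UTF_8));
    peer.addChunk(1, 0, "aaaa".getBytes(StandardCharsets.UTF_8));
    ours.addChunk(1, 4, "bbXb".getBytes(StandardCharsets.UTF_8));
    peer.addChunk(1, 4, "bbbb".getBytes(StandardCharsets.UTF_8));
    peer.addChunk(2, 0, "cccc".getBytes(StandardCharsets.UTF_8));
    System.out.println(ours.diff(peer));
  }
}
```

The point of the layered checksums is that matching roots or matching block checksums let the comparison skip everything beneath them, so only divergent blocks are walked chunk by chunk.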
> >
> > Logging
> >
> > Logging was added on the datanodes to track reconciliation as it is
> > happening. The datanode application log will print a summary of messages
> > like this:
> >
> > 2025-06-10 20:13:14,570 [main] INFO  keyvalue.KeyValueHandler
> > (KeyValueHandler.java:reconcileContainer(1595)) - Beginning
> > reconciliation for container 100 with peer
> > bbc09073-ac0d-4b2f-afe4-1de5f9dc6f43(dn3/237.6.76.4). Current data
> > checksum is dcce847d
> > 2025-06-10 20:13:14,589 [main] WARN  keyvalue.KeyValueHandler
> > (KeyValueHandler.java:reconcileContainer(1681)) - Container 100
> > reconciled with peer
> > bbc09073-ac0d-4b2f-afe4-1de5f9dc6f43(dn3/237.6.76.4). Data checksum
> > updated from dcce847d to 16189e0b.
> > Missing blocks repaired: 5/5
> > Missing chunks repaired: 0/0
> > Corrupt chunks repaired:  10/10
> > Time taken: 19 ms
> > 2025-06-10 20:13:14,589 [main] WARN  keyvalue.KeyValueHandler
> > (KeyValueHandler.java:reconcileContainer(1704)) - Completed
> > reconciliation for container 100 with 1/1 peers. 15 blocks were
> > updated. Data checksum updated from dcce847d to 16189e0b
> >
> > This shows:
> >
> >    - Reconciliation started between this datanode and one other peer for
> >    container 100.
> >    - After reconciliation with the peer completed, the data checksum of
> >    our container was updated.
> >    - Compared to this peer, we needed to ingest 5 missing blocks and
> >    repair 10 corrupt chunks. All operations were successful.
> >    - At the end we get a summary of how many changes were made to this
> >    container after consulting all the peers in the reconcile request. In
> >    this case there was only one peer.
> >
> > By enabling debug logging we can see the individual blocks and chunks
> > that were repaired as well.
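For anyone who wants to check these summaries mechanically, the per-peer repair counts in the log excerpt above can be pulled out with a regular expression. The sketch below is only an illustration: the class ReconcileLogSketch and its methods are hypothetical, it assumes the "x/y" pairs mean "repaired out of total needed" as the explanation above suggests, and it assumes the message wording stays exactly as in the sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Ozone. */
final class ReconcileLogSketch {
  // Matches lines such as "Corrupt chunks repaired:  10/10" from the sample log.
  private static final Pattern REPAIRED = Pattern.compile(
      "(Missing blocks|Missing chunks|Corrupt chunks) repaired:\\s+(\\d+)/(\\d+)");

  /** Returns category -> {repaired, total} for each summary line found in the text. */
  static Map<String, int[]> parse(String logText) {
    Map<String, int[]> counts = new LinkedHashMap<>();
    Matcher m = REPAIRED.matcher(logText);
    while (m.find()) {
      counts.put(m.group(1),
          new int[] {Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))});
    }
    return counts;
  }

  /** True when every category reports as many repairs as were needed. */
  static boolean fullyRepaired(Map<String, int[]> counts) {
    for (int[] c : counts.values()) {
      if (c[0] != c[1]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    String sample = "Missing blocks repaired: 5/5\n"
        + "Missing chunks repaired: 0/0\n"
        + "Corrupt chunks repaired:  10/10\n"
        + "Time taken: 19 ms\n";
    Map<String, int[]> counts = parse(sample);
    for (Map.Entry<String, int[]> e : counts.entrySet()) {
      System.out.println(e.getKey() + ": " + e.getValue()[0] + "/" + e.getValue()[1]);
    }
    System.out.println("fully repaired: " + fullyRepaired(counts));
  }
}
```

A check like fullyRepaired could flag containers whose reconciliation only partially succeeded, although the metrics listed further down are likely the better source for alerting.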
> >
> > In the dn-container.log file, dataChecksum is now included for every log
> > line. We also get one new line in this log every time the checksum for a
> > container is updated.
> >
> > In case logs roll off, a debug tool to inspect a container’s checksum
> > information locally on a datanode will be implemented in HDDS-13239
> > <https://issues.apache.org/jira/browse/HDDS-13239>.
> >
> > Metrics
> >
> > The metrics for reconciliation tasks are available as part of the
> > ReplicationSupervisor class, which includes:
> >
> >    - numRequestedContainerReconciliations - Number of requested
> >    reconciliation tasks
> >    - numQueuedContainerReconciliations - Number of queued tasks
> >    - numTimeoutContainerReconciliations - Number of timed-out tasks
> >    - numSuccessContainerReconciliations - Number of successful tasks
> >    - numFailureContainerReconciliations - Number of failed tasks
> >    - numSkippedContainerReconciliations - Number of skipped tasks
> >
> > Latency and count metrics for the tasks are exposed by
> > CommandHandlerMetrics for ReconcileContainerCommandHandler:
> >
> >    - TotalRunTimeMs - The total runtime of the command handler in
> >    milliseconds
> >    - AvgRunTimeMs - The average run time of the command handler in
> >    milliseconds
> >    - QueueWaitingTaskCount - The number of queued tasks waiting for
> >    execution
> >    - InvocationCount - The number of times the command handler has been
> >    invoked
> >    - CommandReceivedCount - The number of received SCM commands for each
> >    command type
> >
> > Metrics for other container reconciliation-related tasks are
> > encapsulated in ContainerMerkleTreeMetrics:
> >
> >    - numMerkleTreeWriteFailure - Number of Merkle tree write failures
> >    - numMerkleTreeReadFailure - Number of Merkle tree read failures
> >    - numMerkleTreeDiffFailure - Number of Merkle tree diff failures
> >    - numNoRepairContainerDiff - Number of container diffs that don’t
> >    require repair
> >    - numRepairContainerDiff - Number of container diffs that require
> >    repair
> >    - merkleTreeWriteLatencyNS - Merkle tree write latency
> >    - merkleTreeReadLatencyNS - Merkle tree read latency
> >    - merkleTreeCreateLatencyNS - Merkle tree creation latency
> >    - merkleTreeDiffLatencyNS - Merkle tree diff latency
> >
> >
> > Thanks for reviewing
> >
> > Ethan
> >
>
