+1
Thanks for all the work that has been put into this.

-Siyao

On Mon, Jun 16, 2025 at 10:39 AM Siddharth Wagle <swa...@apache.org> wrote:

> +1 for the merge of this feature. Thank you to all who contributed to this
> feature that enhances the robustness and manageability of Apache Ozone.
>
> - Sid
>
> On Mon, Jun 16, 2025 at 10:35 AM Ritesh Shukla <rit...@cloudera.com.invalid
> >
> wrote:
>
> > +1
> > Disclaimer: I am a sponsor and contributor for this work
> >
> > On Tue, Jun 10, 2025 at 5:40 PM Ethan Rose <er...@apache.org> wrote:
> >
> > > Based on discussion in the community sync this week I want to add some
> > more
> > > information.
> > > Code
> > >
> > > For those interested in checking out the code, these are some of the
> > major
> > > classes to start with:
> > >
> > >    - ReconcileContainerTask: This is the command on the datanode that
> is
> > >    received from SCM to reconcile a container with a datanode’s peers.
> It
> > >    passes through the ReplicationSupervisor just like replication and
> > >    reconstruction commands.
> > >    - ContainerProtos.ContainerChecksumInfo: This is the proto format of
> > the
> > >    new file that is written into the containers with the merkle tree
> and
> > > list
> > >    of deleted blocks.
> > >    - ContainerMerkleTreeWriter: This class is used to build merkle
> trees
> > >    chunk by chunk and generate a protobuf representation of the tree.
> > >    - ContainerChecksumTreeManager: This class coordinates reads and
> > writes
> > >    of ContainerChecksumInfo for containers. The diff method determines
> > >    which repairs should be done on a container based on a peer’s merkle
> > > tree.
> > >    - KeyValueContainerCheck#scanData: This is the existing method
> called
> > by
> > >    the background and on-demand container data scanners to scan a
> > > container.
> > >    It has been updated to build the merkle tree as it runs.
> > >    - KeyValueHandler#reconcileContainer: This method updates the
> > container
> > >    based on the peer’s replica.
> > >    - Major tests for reconciliation have been added to
> > >    TestContainerCommandReconciliation (integration test) and
> > >    TestContainerReconciliationWithMockDatanodes (unit test with mocked
> > >    clients).
> > >       - There are more tasks under the reconciliation jira to expand
> the
> > >       types of faults being tested.
> > >
> > > Logging
> > >
> > > Logging was added on the datanodes to track reconciliation as it is
> > > happening. The datanode application log will print a summary of
> messages
> > > like this:
> > >
> > > 2025-06-10 20:13:14,570 [main] INFO  keyvalue.KeyValueHandler
> > > (KeyValueHandler.java:reconcileContainer(1595)) - Beginning
> > > reconciliation for container 100 with peer
> > > bbc09073-ac0d-4b2f-afe4-1de5f9dc6f43(dn3/237.6.76.4). Current data
> > > checksum is dcce847d
> > > 2025-06-10 20:13:14,589 [main] WARN  keyvalue.KeyValueHandler
> > > (KeyValueHandler.java:reconcileContainer(1681)) - Container 100
> > > reconciled with peer
> > > bbc09073-ac0d-4b2f-afe4-1de5f9dc6f43(dn3/237.6.76.4). Data checksum
> > > updated from dcce847d to 16189e0b.
> > > Missing blocks repaired: 5/5
> > > Missing chunks repaired: 0/0
> > > Corrupt chunks repaired:  10/10
> > > Time taken: 19 ms
> > > 2025-06-10 20:13:14,589 [main] WARN  keyvalue.KeyValueHandler
> > > (KeyValueHandler.java:reconcileContainer(1704)) - Completed
> > > reconciliation for container 100 with 1/1 peers. 15 blocks were
> > > updated. Data checksum updated from dcce847d to 16189e0b
> > >
> > > This shows:
> > >
> > >    - Reconciliation started between this datanode and one other peer
> for
> > >    container 100
> > >    - After reconciliation with the peer completed, the data checksum of
> > our
> > >    container was updated
> > >    - Compared to this peer, we needed to ingest 5 missing blocks and
> > repair
> > >    10 corrupt chunks. All operations were successful
> > >    - At the end we get a summary of how many changes were done to this
> > >    container after consulting all the peers in the reconcile request.
> In
> > > this
> > >    case there was only one peer.
> > >    By enabling debug logging we can see the individual blocks and
> chunks
> > >    that were repaired as well.
> > >
> > > In the dn-container.log file, dataChecksum is now included for every
> log
> > > line. We also get one new line in this log every time the checksum for
> a
> > > container is updated.
> > >
> > > In case logs roll off, a debug tool to inspect container’s checksum
> > > information locally on a datanode will be implemented in HDDS-13239
> > > <https://issues.apache.org/jira/browse/HDDS-13239>.
> > > Metrics
> > >
> > > The metrics for reconciliation tasks are available as a part of
> > > ReplicationSupervisor class which includes:
> > >
> > >    - numRequestedContainerReconciliations - Number of reconciliation
> > tasks
> > >    - numQueuedContainerReconciliations - Number of queued tasks
> > >    - numTimeoutContainerReconciliations - Number of timed-out tasks
> > >    - numSuccessContainerReconciliations- Number of Success
> > >    - numFailureContainerReconciliations - Number of Failures
> > >    - numSkippedContainerReconciliations - Number of Skipped Tasks
> > >
> > > Latency/Count metrics for the tasks exposed by CommandHandlerMetrics
> for
> > > ReconcileContainerCommandHandler:
> > >
> > >    - TotalRunTimeMs - The total runtime of the command handler in
> > >    milliseconds
> > >    - AvgRunTimeMs - Average run time of the command handler in
> > milliseconds
> > >    - QueueWaitingTaskCount - The number of queued tasks waiting for
> > >    execution
> > >    - InvocationCount - The number of times the command handler has been
> > >    invoked
> > >    - CommandReceivedCount - The number of received SCM commands for
> each
> > >    command type
> > >
> > > Other container reconciliation-related tasks are encapsulated in
> > > ContainerMerkleTreeMetrics:
> > >
> > >    - numMerkleTreeWriteFailure - Number of Merkle tree write failure
> > >    - numMerkleTreeReadFailure - Number of Merkle tree read failure
> > >    - numMerkleTreeDiffFailure - Number of Merkle tree diff failure
> > >    - numNoRepairContainerDiff - Number of container diff that doesn’t
> > >    require repair
> > >    - numRepairContainerDiff - Number of container diff that require
> > repair
> > >    - merkleTreeWriteLatencyNS- Merkle tree write latency
> > >    - merkleTreeReadLatencyNS - Merkle tree read latency
> > >    - merkleTreeCreateLatencyNS - Merkle tree creation latency
> > >    - merkleTreeDiffLatencyNS - Merkle tree diff latency
> > >
> > >
> > > Thanks for reviewing
> > >
> > > Ethan
> > >
> >
>

Reply via email to