+1 for the merge of this feature.

- Soumitra

On Mon, Jun 16, 2025 at 11:43 PM Siyao Meng <si...@apache.org> wrote:

> +1
> Thanks for all the work that has been put into this.
>
> -Siyao
>
> On Mon, Jun 16, 2025 at 10:39 AM Siddharth Wagle <swa...@apache.org>
> wrote:
>
> > +1 for the merge of this feature. Thank you to all who contributed to
> this
> > feature that enhances the robustness and manageability of Apache Ozone.
> >
> > - Sid
> >
> > On Mon, Jun 16, 2025 at 10:35 AM Ritesh Shukla
> <rit...@cloudera.com.invalid
> > >
> > wrote:
> >
> > > +1
> > > Disclaimer: I am a sponsor and contributor for this work
> > >
> > > On Tue, Jun 10, 2025 at 5:40 PM Ethan Rose <er...@apache.org> wrote:
> > >
> > > > Based on discussion in the community sync this week I want to add
> some
> > > more
> > > > information.
> > > > Code
> > > >
> > > > For those interested in checking out the code, these are some of the
> > > major
> > > > classes to start with:
> > > >
> > > >    - ReconcileContainerTask: This is the command on the datanode that
> > is
> > > >    received from SCM to reconcile a container with a datanode’s
> peers.
> > It
> > > >    passes through the ReplicationSupervisor just like replication and
> > > >    reconstruction commands.
> > > >    - ContainerProtos.ContainerChecksumInfo: This is the proto format
> of
> > > the
> > > >    new file that is written into the containers with the merkle tree
> > and
> > > > list
> > > >    of deleted blocks.
> > > >    - ContainerMerkleTreeWriter: This class is used to build merkle
> > trees
> > > >    chunk by chunk and generate a protobuf representation of the tree.
> > > >    - ContainerChecksumTreeManager: This class coordinates reads and
> > > writes
> > > >    of ContainerChecksumInfo for containers. The diff method
> determines
> > > >    which repairs should be done on a container based on a peer’s
> merkle
> > > > tree.
> > > >    - KeyValueContainerCheck#scanData: This is the existing method
> > called
> > > by
> > > >    the background and on-demand container data scanners to scan a
> > > > container.
> > > >    It has been updated to build the merkle tree as it runs.
> > > >    - KeyValueHandler#reconcileContainer: This method updates the
> > > container
> > > >    based on the peer’s replica.
> > > >    - Major tests for reconciliation have been added to
> > > >    TestContainerCommandReconciliation (integration test) and
> > > >    TestContainerReconciliationWithMockDatanodes (unit test with
> mocked
> > > >    clients).
> > > >       - There are more tasks under the reconciliation jira to expand
> > the
> > > >       types of faults being tested.
> > > >
> > > > Logging
> > > >
> > > > Logging was added on the datanodes to track reconciliation as it is
> > > > happening. The datanode application log will print a summary of
> > messages
> > > > like this:
> > > >
> > > > 2025-06-10 20:13:14,570 [main] INFO  keyvalue.KeyValueHandler
> > > > (KeyValueHandler.java:reconcileContainer(1595)) - Beginning
> > > > reconciliation for container 100 with peer
> > > > bbc09073-ac0d-4b2f-afe4-1de5f9dc6f43(dn3/237.6.76.4). Current data
> > > > checksum is dcce847d
> > > > 2025-06-10 20:13:14,589 [main] WARN  keyvalue.KeyValueHandler
> > > > (KeyValueHandler.java:reconcileContainer(1681)) - Container 100
> > > > reconciled with peer
> > > > bbc09073-ac0d-4b2f-afe4-1de5f9dc6f43(dn3/237.6.76.4). Data checksum
> > > > updated from dcce847d to 16189e0b.
> > > > Missing blocks repaired: 5/5
> > > > Missing chunks repaired: 0/0
> > > > Corrupt chunks repaired:  10/10
> > > > Time taken: 19 ms
> > > > 2025-06-10 20:13:14,589 [main] WARN  keyvalue.KeyValueHandler
> > > > (KeyValueHandler.java:reconcileContainer(1704)) - Completed
> > > > reconciliation for container 100 with 1/1 peers. 15 blocks were
> > > > updated. Data checksum updated from dcce847d to 16189e0b
> > > >
> > > > This shows:
> > > >
> > > >    - Reconciliation started between this datanode and one other peer
> > for
> > > >    container 100
> > > >    - After reconciliation with the peer completed, the data checksum
> of
> > > our
> > > >    container was updated
> > > >    - Compared to this peer, we needed to ingest 5 missing blocks and
> > > repair
> > > >    10 corrupt chunks. All operations were successful
> > > >    - At the end we get a summary of how many changes were done to
> this
> > > >    container after consulting all the peers in the reconcile request.
> > In
> > > > this
> > > >    case there was only one peer.
> > > >    By enabling debug logging we can see the individual blocks and
> > chunks
> > > >    that were repaired as well.
> > > >
> > > > In the dn-container.log file, dataChecksum is now included for every
> > log
> > > > line. We also get one new line in this log every time the checksum
> for
> > a
> > > > container is updated.
> > > >
> > > > In case logs roll off, a debug tool to inspect container’s checksum
> > > > information locally on a datanode will be implemented in HDDS-13239
> > > > <https://issues.apache.org/jira/browse/HDDS-13239>.
> > > > Metrics
> > > >
> > > > The metrics for reconciliation tasks are available as a part of
> > > > ReplicationSupervisor class which includes:
> > > >
> > > >    - numRequestedContainerReconciliations - Number of reconciliation
> > > tasks
> > > >    - numQueuedContainerReconciliations - Number of queued tasks
> > > >    - numTimeoutContainerReconciliations - Number of timed-out tasks
> > > >    - numSuccessContainerReconciliations- Number of Success
> > > >    - numFailureContainerReconciliations - Number of Failures
> > > >    - numSkippedContainerReconciliations - Number of Skipped Tasks
> > > >
> > > > Latency/Count metrics for the tasks exposed by CommandHandlerMetrics
> > for
> > > > ReconcileContainerCommandHandler:
> > > >
> > > >    - TotalRunTimeMs - The total runtime of the command handler in
> > > >    milliseconds
> > > >    - AvgRunTimeMs - Average run time of the command handler in
> > > milliseconds
> > > >    - QueueWaitingTaskCount - The number of queued tasks waiting for
> > > >    execution
> > > >    - InvocationCount - The number of times the command handler has
> been
> > > >    invoked
> > > >    - CommandReceivedCount - The number of received SCM commands for
> > each
> > > >    command type
> > > >
> > > > Other container reconciliation-related tasks are encapsulated in
> > > > ContainerMerkleTreeMetrics:
> > > >
> > > >    - numMerkleTreeWriteFailure - Number of Merkle tree write failure
> > > >    - numMerkleTreeReadFailure - Number of Merkle tree read failure
> > > >    - numMerkleTreeDiffFailure - Number of Merkle tree diff failure
> > > >    - numNoRepairContainerDiff - Number of container diff that doesn’t
> > > >    require repair
> > > >    - numRepairContainerDiff - Number of container diff that require
> > > repair
> > > >    - merkleTreeWriteLatencyNS- Merkle tree write latency
> > > >    - merkleTreeReadLatencyNS - Merkle tree read latency
> > > >    - merkleTreeCreateLatencyNS - Merkle tree creation latency
> > > >    - merkleTreeDiffLatencyNS - Merkle tree diff latency
> > > >
> > > >
> > > > Thanks for reviewing
> > > >
> > > > Ethan
> > > >
> > >
> >
>

Reply via email to