+1

Thanks for all the work that has been put into this.

-Siyao
On Mon, Jun 16, 2025 at 10:39 AM Siddharth Wagle <swa...@apache.org> wrote:

> +1 for the merge of this feature. Thank you to all who contributed to this
> feature that enhances the robustness and manageability of Apache Ozone.
>
> - Sid
>
> On Mon, Jun 16, 2025 at 10:35 AM Ritesh Shukla <rit...@cloudera.com.invalid> wrote:
>
> > +1
> > Disclaimer: I am a sponsor and contributor for this work
> >
> > On Tue, Jun 10, 2025 at 5:40 PM Ethan Rose <er...@apache.org> wrote:
> >
> > > Based on discussion in the community sync this week I want to add some more
> > > information.
> > >
> > > Code
> > >
> > > For those interested in checking out the code, these are some of the major
> > > classes to start with:
> > >
> > > - ReconcileContainerTask: This is the command on the datanode that is
> > >   received from SCM to reconcile a container with a datanode’s peers. It
> > >   passes through the ReplicationSupervisor just like replication and
> > >   reconstruction commands.
> > > - ContainerProtos.ContainerChecksumInfo: This is the proto format of the
> > >   new file that is written into the containers with the merkle tree and
> > >   list of deleted blocks.
> > > - ContainerMerkleTreeWriter: This class is used to build merkle trees
> > >   chunk by chunk and generate a protobuf representation of the tree.
> > > - ContainerChecksumTreeManager: This class coordinates reads and writes
> > >   of ContainerChecksumInfo for containers. The diff method determines
> > >   which repairs should be done on a container based on a peer’s merkle
> > >   tree.
> > > - KeyValueContainerCheck#scanData: This is the existing method called by
> > >   the background and on-demand container data scanners to scan a
> > >   container. It has been updated to build the merkle tree as it runs.
> > > - KeyValueHandler#reconcileContainer: This method updates the container
> > >   based on the peer’s replica.
> > > - Major tests for reconciliation have been added to
> > >   TestContainerCommandReconciliation (integration test) and
> > >   TestContainerReconciliationWithMockDatanodes (unit test with mocked
> > >   clients).
> > >   - There are more tasks under the reconciliation jira to expand the
> > >     types of faults being tested.
> > >
> > > Logging
> > >
> > > Logging was added on the datanodes to track reconciliation as it is
> > > happening. The datanode application log will print a summary of messages
> > > like this:
> > >
> > > 2025-06-10 20:13:14,570 [main] INFO keyvalue.KeyValueHandler (KeyValueHandler.java:reconcileContainer(1595)) - Beginning reconciliation for container 100 with peer bbc09073-ac0d-4b2f-afe4-1de5f9dc6f43(dn3/237.6.76.4). Current data checksum is dcce847d
> > > 2025-06-10 20:13:14,589 [main] WARN keyvalue.KeyValueHandler (KeyValueHandler.java:reconcileContainer(1681)) - Container 100 reconciled with peer bbc09073-ac0d-4b2f-afe4-1de5f9dc6f43(dn3/237.6.76.4). Data checksum updated from dcce847d to 16189e0b.
> > > Missing blocks repaired: 5/5
> > > Missing chunks repaired: 0/0
> > > Corrupt chunks repaired: 10/10
> > > Time taken: 19 ms
> > > 2025-06-10 20:13:14,589 [main] WARN keyvalue.KeyValueHandler (KeyValueHandler.java:reconcileContainer(1704)) - Completed reconciliation for container 100 with 1/1 peers. 15 blocks were updated. Data checksum updated from dcce847d to 16189e0b
> > >
> > > This shows:
> > >
> > > - Reconciliation started between this datanode and one other peer for
> > >   container 100
> > > - After reconciliation with the peer completed, the data checksum of our
> > >   container was updated
> > > - Compared to this peer, we needed to ingest 5 missing blocks and repair
> > >   10 corrupt chunks.
> > >   All operations were successful.
> > > - At the end we get a summary of how many changes were done to this
> > >   container after consulting all the peers in the reconcile request. In
> > >   this case there was only one peer.
> > >   By enabling debug logging we can see the individual blocks and chunks
> > >   that were repaired as well.
> > >
> > > In the dn-container.log file, dataChecksum is now included for every log
> > > line. We also get one new line in this log every time the checksum for a
> > > container is updated.
> > >
> > > In case logs roll off, a debug tool to inspect a container’s checksum
> > > information locally on a datanode will be implemented in HDDS-13239
> > > <https://issues.apache.org/jira/browse/HDDS-13239>.
> > >
> > > Metrics
> > >
> > > The metrics for reconciliation tasks are available as part of the
> > > ReplicationSupervisor class, which includes:
> > >
> > > - numRequestedContainerReconciliations - Number of reconciliation tasks
> > > - numQueuedContainerReconciliations - Number of queued tasks
> > > - numTimeoutContainerReconciliations - Number of timed-out tasks
> > > - numSuccessContainerReconciliations - Number of successful tasks
> > > - numFailureContainerReconciliations - Number of failed tasks
> > > - numSkippedContainerReconciliations - Number of skipped tasks
> > >
> > > Latency/count metrics for the tasks are exposed by CommandHandlerMetrics
> > > for ReconcileContainerCommandHandler:
> > >
> > > - TotalRunTimeMs - The total runtime of the command handler in
> > >   milliseconds
> > > - AvgRunTimeMs - Average run time of the command handler in milliseconds
> > > - QueueWaitingTaskCount - The number of queued tasks waiting for
> > >   execution
> > > - InvocationCount - The number of times the command handler has been
> > >   invoked
> > > - CommandReceivedCount - The number of received SCM commands for each
> > >   command type
> > >
> > > Metrics for other container reconciliation-related tasks are encapsulated
> > > in ContainerMerkleTreeMetrics:
> > >
> > > - numMerkleTreeWriteFailure - Number of Merkle tree write failures
> > > - numMerkleTreeReadFailure - Number of Merkle tree read failures
> > > - numMerkleTreeDiffFailure - Number of Merkle tree diff failures
> > > - numNoRepairContainerDiff - Number of container diffs that don’t
> > >   require repair
> > > - numRepairContainerDiff - Number of container diffs that require repair
> > > - merkleTreeWriteLatencyNS - Merkle tree write latency
> > > - merkleTreeReadLatencyNS - Merkle tree read latency
> > > - merkleTreeCreateLatencyNS - Merkle tree creation latency
> > > - merkleTreeDiffLatencyNS - Merkle tree diff latency
> > >
> > > Thanks for reviewing
> > >
> > > Ethan
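
For readers new to the approach, the chunk-by-chunk tree building and the peer diff described under "Code" can be sketched roughly as below. This is only an illustration under assumptions, not the Ozone implementation: MerkleDiffSketch, Tree, Diff and their methods are hypothetical names, CRC32 is an arbitrary stand-in for the real checksum, the peer is treated as the reference copy, and the list of deleted blocks is ignored.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical sketch; not the ContainerMerkleTreeWriter or ContainerChecksumTreeManager API. */
public class MerkleDiffSketch {

    /** Two-level tree: block ID -> (chunk offset -> chunk checksum). */
    public static final class Tree {
        public final SortedMap<Long, SortedMap<Long, Long>> blocks = new TreeMap<>();

        /** Adds one chunk, computing its checksum from the chunk bytes. */
        public void addChunk(long blockId, long offset, byte[] data) {
            CRC32 crc = new CRC32();
            crc.update(data);
            blocks.computeIfAbsent(blockId, id -> new TreeMap<>()).put(offset, crc.getValue());
        }

        /** Block checksum: CRC32 over the block's chunk checksums in offset order. */
        public long blockChecksum(long blockId) {
            return combine(blocks.get(blockId).values());
        }

        /** Root ("data") checksum: CRC32 over all block checksums in block ID order. */
        public long dataChecksum() {
            List<Long> sums = new ArrayList<>();
            for (long id : blocks.keySet()) {
                sums.add(blockChecksum(id));
            }
            return combine(sums);
        }

        private static long combine(Iterable<Long> checksums) {
            CRC32 crc = new CRC32();
            for (long c : checksums) {
                crc.update(ByteBuffer.allocate(Long.BYTES).putLong(c).array());
            }
            return crc.getValue();
        }
    }

    /** What the local replica would need to fetch from the peer (assumed to be the reference). */
    public static final class Diff {
        public final List<Long> missingBlocks = new ArrayList<>();
        public final List<String> missingChunks = new ArrayList<>(); // "blockId@offset"
        public final List<String> corruptChunks = new ArrayList<>(); // "blockId@offset"

        public boolean needsRepair() {
            return !(missingBlocks.isEmpty() && missingChunks.isEmpty() && corruptChunks.isEmpty());
        }
    }

    /** Walks the peer's tree top-down, descending only where checksums disagree. */
    public static Diff diff(Tree local, Tree peer) {
        Diff d = new Diff();
        if (local.dataChecksum() == peer.dataChecksum()) {
            return d; // roots match, nothing to compare further
        }
        for (Map.Entry<Long, SortedMap<Long, Long>> peerBlock : peer.blocks.entrySet()) {
            long blockId = peerBlock.getKey();
            SortedMap<Long, Long> mine = local.blocks.get(blockId);
            if (mine == null) {
                d.missingBlocks.add(blockId);
                continue;
            }
            if (local.blockChecksum(blockId) == peer.blockChecksum(blockId)) {
                continue; // whole block matches, skip its chunks
            }
            for (Map.Entry<Long, Long> peerChunk : peerBlock.getValue().entrySet()) {
                String chunk = blockId + "@" + peerChunk.getKey();
                Long myChecksum = mine.get(peerChunk.getKey());
                if (myChecksum == null) {
                    d.missingChunks.add(chunk);
                } else if (!myChecksum.equals(peerChunk.getValue())) {
                    d.corruptChunks.add(chunk);
                }
            }
        }
        return d;
    }
}
```

In this sketch the three lists in Diff correspond loosely to the "Missing blocks", "Missing chunks" and "Corrupt chunks" counters in the log summary quoted above, and needsRepair() to the distinction between numRepairContainerDiff and numNoRepairContainerDiff. How the real diff decides which replica holds the good copy is not covered here.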