[ 
https://issues.apache.org/jira/browse/HDDS-13073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-13073:
----------------------------------
    Labels: pull-request-available  (was: )

> Checksums verifier provides wrong results as it always verifies the data of 
> only one node
> -----------------------------------------------------------------------------------------
>
>                 Key: HDDS-13073
>                 URL: https://issues.apache.org/jira/browse/HDDS-13073
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Rishabh Patel
>            Assignee: Rishabh Patel
>            Priority: Major
>              Labels: pull-request-available
>
> For a 3-way Ratis-replicated key, the checksums tool returns wrong results. 
> The checksum verification result for a key is always identical across all 
> three replicas: either every replica fails or none does, even when only one 
> replica should fail. 
>  
> This can be traced to the way the 
> [pipeline|https://github.com/apache/ozone/blob/1825cdf6057ae4ac2d0bcbdcfb0bed1302054e9e/hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/replicas/ChecksumVerifier.java#L62-L66]
>  is created for the checksum verification.  
> {code:java}
> Pipeline.Builder pipelineBuilder = 
> Pipeline.newBuilder(keyLocation.getPipeline())
>     .setReplicationConfig(StandaloneReplicationConfig.getInstance(ONE))
>     .setNodes(Collections.singletonList(datanode))
>     .setLeaderId(datanode.getID())
>     .setSuggestedLeaderId(datanode.getID())
>     .setReplicaIndexes(Collections.singletonMap(datanode, replicaIndex)); 
> {code}
>  
> When a client is created using this pipeline, it is 
> [cached|https://github.com/apache/ozone/blob/1825cdf6057ae4ac2d0bcbdcfb0bed1302054e9e/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientManager.java#L150-L161]
>  
> {code:java}
> protected XceiverClientSpi getClient(Pipeline pipeline, boolean topologyAware)
>       throws IOException {
>     try {
>       // create different client different pipeline node based on
>       // network topology
>       String key = getPipelineCacheKey(pipeline, topologyAware);
>       return clientCache.get(key, () -> newClient(pipeline));
>     } catch (Exception e) {
>       throw new IOException(
>           "Exception getting XceiverClient: " + e, e);
>     }
>   } {code}
>  
> The key for the cached entry is generated in 
> [getPipelineCacheKey|https://github.com/apache/ozone/blob/1825cdf6057ae4ac2d0bcbdcfb0bed1302054e9e/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientManager.java#L163-L165]
> {code:java}
> String key = pipeline.getId().getId().toString() + pipeline.getType(); {code}
>  
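> When the pipeline is created via 
> {{Pipeline.newBuilder(keyLocation.getPipeline())}}, it inherits the 
> original pipeline's id. The per-node pipelines for all three replicas 
> therefore share the same id and type and map to the same cache key, so the 
> client cached for the first node is reused for every subsequent checksum 
> verification.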
>  
> Example debug log lines. Note that the node in the requested pipeline 
> (91480a1b-...) differs from the node of the returned cached client 
> (1bf6b67b-...).
> {code:java}
> 2025-05-19 06:01:18,857 [main] INFO  scm.XceiverClientManager 
> (XceiverClientManager.java:getClient(156)) - ATTENTION! getting possibly 
> cached XceiverClient for pipeline Pipeline{ Id: 
> c74e00f6-66d0-4bd0-9ee7-05b9c258bd5e, Nodes: [ 
> {91480a1b-f789-4147-922d-6790aef31cf1(localhost/127.0.0.1), ReplicaIndex: 
> 0},], ReplicationConfig: STANDALONE/ONE, State:OPEN, 
> leaderId:91480a1b-f789-4147-922d-6790aef31cf1, 
> CreationTimestamp2025-05-19T06:01:15.106-07:00[America/Los_Angeles]} with key 
> c74e00f6-66d0-4bd0-9ee7-05b9c258bd5eSTAND_ALONE
> 2025-05-19 06:01:18,857 [main] INFO  scm.XceiverClientManager 
> (XceiverClientManager.java:getClient(158)) - ATTENTION! returning cached 
> XceiverClient for node 1bf6b67b-5816-4e0e-90b4-9c42dd2b5df7 {code}
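
The cache-key collision described above can be reproduced in isolation. The following is a minimal sketch, not Ozone code: FakePipeline, FakeClient, and the node names dn1..dn3 are hypothetical stand-ins, and a ConcurrentHashMap with computeIfAbsent is assumed here as a substitute for the real client cache. Only the key shape (pipeline id followed by pipeline type) is taken from the report.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class PipelineCacheKeyCollision {

  /** Hypothetical stand-in for a single-node pipeline; not the Ozone Pipeline class. */
  static final class FakePipeline {
    final UUID id;
    final String type;
    final String node;

    FakePipeline(UUID id, String type, String node) {
      this.id = id;
      this.type = type;
      this.node = node;
    }
  }

  /** Hypothetical stand-in for a client; it only remembers the node it was built for. */
  static final class FakeClient {
    final String node;

    FakeClient(String node) {
      this.node = node;
    }
  }

  private final Map<String, FakeClient> clientCache = new ConcurrentHashMap<>();

  /** Same key shape as in the report: pipeline id + type, with no node information. */
  static String cacheKey(FakePipeline pipeline) {
    return pipeline.id.toString() + pipeline.type;
  }

  /** Returns the cached client for the key and builds a new one only on a cache miss. */
  FakeClient getClient(FakePipeline pipeline) {
    return clientCache.computeIfAbsent(
        cacheKey(pipeline), key -> new FakeClient(pipeline.node));
  }

  public static void main(String[] args) {
    // All per-node pipelines inherit the id of the original replicated pipeline.
    UUID sharedId = UUID.randomUUID();
    PipelineCacheKeyCollision manager = new PipelineCacheKeyCollision();
    for (String node : new String[] {"dn1", "dn2", "dn3"}) {
      FakePipeline pipeline = new FakePipeline(sharedId, "STAND_ALONE", node);
      FakeClient client = manager.getClient(pipeline);
      System.out.println("requested " + node + ", got client for " + client.node);
    }
  }
}
```

In this sketch every request after the first returns the client built for dn1, because the key carries no information about the target node. This matches the behaviour visible in the log lines above.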



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
