[ https://issues.apache.org/jira/browse/HDDS-13073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HDDS-13073:
----------------------------------
    Labels: pull-request-available  (was: )

> Checksums verifier provides wrong results as it always verifies the data of
> only one node
> -----------------------------------------------------------------------------------------
>
>                 Key: HDDS-13073
>                 URL: https://issues.apache.org/jira/browse/HDDS-13073
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Rishabh Patel
>            Assignee: Rishabh Patel
>            Priority: Major
>              Labels: pull-request-available
>
> For a 3-way Ratis replicated key, the checksums tool returns the wrong
> results.
> The checksum verification result for a key is always the same for all three
> replicas, i.e., either all replicas fail the checksum check or none do, even
> when only one replica should fail.
>
> This can be traced to the way the
> [pipeline|https://github.com/apache/ozone/blob/1825cdf6057ae4ac2d0bcbdcfb0bed1302054e9e/hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/replicas/ChecksumVerifier.java#L62-L66]
> is created for the checksum verification.
> {code:java}
> Pipeline.Builder pipelineBuilder =
>     Pipeline.newBuilder(keyLocation.getPipeline())
>         .setReplicationConfig(StandaloneReplicationConfig.getInstance(ONE))
>         .setNodes(Collections.singletonList(datanode))
>         .setLeaderId(datanode.getID())
>         .setSuggestedLeaderId(datanode.getID())
>         .setReplicaIndexes(Collections.singletonMap(datanode, replicaIndex));
> {code}
>
> When a client is created using this pipeline, it is
> [cached|https://github.com/apache/ozone/blob/1825cdf6057ae4ac2d0bcbdcfb0bed1302054e9e/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientManager.java#L150-L161]
>
> {code:java}
> protected XceiverClientSpi getClient(Pipeline pipeline, boolean topologyAware)
>     throws IOException {
>   try {
>     // create different client different pipeline node based on
>     // network topology
>     String key = getPipelineCacheKey(pipeline, topologyAware);
>     return clientCache.get(key, () -> newClient(pipeline));
>   } catch (Exception e) {
>     throw new IOException(
>         "Exception getting XceiverClient: " + e, e);
>   }
> }
> {code}
>
> The key for the cached entry is generated in
> [getPipelineCacheKey|https://github.com/apache/ozone/blob/1825cdf6057ae4ac2d0bcbdcfb0bed1302054e9e/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientManager.java#L163-L165]
> {code:java}
> String key = pipeline.getId().getId().toString() + pipeline.getType();
> {code}
>
> When the pipeline is created via
> {{Pipeline.newBuilder(keyLocation.getPipeline())}}, it inherits the
> original pipeline's id. Every single-node pipeline built for the same key
> therefore maps to the same cache key, so the first cached client is reused
> for all subsequent checksum verifications.
>
> Example debug log lines. Note that the node in the requested pipeline
> (91480a1b-f789-4147-922d-6790aef31cf1) differs from the node of the returned
> client (1bf6b67b-5816-4e0e-90b4-9c42dd2b5df7).
> {code:java}
> 2025-05-19 06:01:18,857 [main] INFO scm.XceiverClientManager
> (XceiverClientManager.java:getClient(156)) - ATTENTION! getting possibly
> cached XceiverClient for pipeline Pipeline{ Id:
> c74e00f6-66d0-4bd0-9ee7-05b9c258bd5e, Nodes: [
> {91480a1b-f789-4147-922d-6790aef31cf1(localhost/127.0.0.1), ReplicaIndex:
> 0},], ReplicationConfig: STANDALONE/ONE, State:OPEN,
> leaderId:91480a1b-f789-4147-922d-6790aef31cf1,
> CreationTimestamp2025-05-19T06:01:15.106-07:00[America/Los_Angeles]} with key
> c74e00f6-66d0-4bd0-9ee7-05b9c258bd5eSTAND_ALONE
> 2025-05-19 06:01:18,857 [main] INFO scm.XceiverClientManager
> (XceiverClientManager.java:getClient(158)) - ATTENTION! returning cached
> XceiverClient for node 1bf6b67b-5816-4e0e-90b4-9c42dd2b5df7
> {code}
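
The cache-key collision described above can be reproduced in isolation. The following is a minimal sketch, not Ozone code: ToyPipeline, ToyClient, ToyClientManager and the node names dn1..dn3 are hypothetical stand-ins, and the cache key simply mirrors the "id + type" scheme quoted from getPipelineCacheKey. It also shows that giving each single-node pipeline its own id avoids the collision in this toy model; whether that is the right fix for the real tool is an assumption, not something the issue text states.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Toy model only: every type here is a hypothetical stand-in, not an Ozone class.
class CacheKeyCollisionSketch {

  /** Stand-in for a single-node pipeline: an id, a type, and its target node. */
  static final class ToyPipeline {
    final UUID id;
    final String type;
    final String node;

    ToyPipeline(UUID id, String type, String node) {
      this.id = id;
      this.type = type;
      this.node = node;
    }
  }

  /** Stand-in for a client that remembers which node it was created for. */
  static final class ToyClient {
    final String node;

    ToyClient(String node) {
      this.node = node;
    }
  }

  /** Caches clients under "id + type", so the target node is not part of the key. */
  static final class ToyClientManager {
    private final Map<String, ToyClient> cache = new HashMap<>();

    ToyClient getClient(ToyPipeline pipeline) {
      String key = pipeline.id.toString() + pipeline.type;
      return cache.computeIfAbsent(key, k -> new ToyClient(pipeline.node));
    }
  }

  /**
   * Requests one client per replica node and returns the node each returned
   * client actually points at. When freshIdPerReplica is false, all three
   * single-node pipelines share the original pipeline id.
   */
  static List<String> contactedNodes(boolean freshIdPerReplica) {
    UUID originalId = UUID.randomUUID();
    ToyClientManager manager = new ToyClientManager();
    List<String> contacted = new ArrayList<>();
    for (String node : List.of("dn1", "dn2", "dn3")) {
      UUID id = freshIdPerReplica ? UUID.randomUUID() : originalId;
      ToyPipeline single = new ToyPipeline(id, "STAND_ALONE", node);
      contacted.add(manager.getClient(single).node);
    }
    return contacted;
  }

  public static void main(String[] args) {
    System.out.println(contactedNodes(false)); // [dn1, dn1, dn1]
    System.out.println(contactedNodes(true));  // [dn1, dn2, dn3]
  }
}
```

With a shared id, every lookup after the first is a cache hit on the client created for dn1, which matches the symptom of all replicas reporting the same verification result. Including the node in the cache key would be another way to separate the entries in this sketch.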