Re: [PR] HDDS-13067. Container Balancer delete commands should not be sent with an expiration time in the past [ozone]

via GitHub Thu, 22 May 2025 05:26:21 -0700


siddhantsangwan commented on code in PR #8491:
URL: https://github.com/apache/ozone/pull/8491#discussion_r2102430321



##########
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/balancer/TestMoveManager.java:
##########
@@ -496,6 +499,44 @@ public void testMoveCompleteFutureReplicasUnhealthy() throws Exception {
         .sendDeleteCommand(eq(containerInfo), eq(0), eq(src), eq(true));
   }
 
+  @Test
+  public void testDeleteNotSentWithExpirationTimeInPast() throws Exception {
+    containerInfo = ReplicationTestUtil.createContainer(
+        HddsProtos.LifeCycleState.CLOSED, new ECReplicationConfig(3, 2));
+    setupMocks();
+
+    replicas.addAll(ReplicationTestUtil
+        .createReplicas(containerInfo.containerID(), 1, 2, 3, 4, 5));
+    Iterator<ContainerReplica> iterator = replicas.iterator();
+    ContainerReplica srcReplica = iterator.next();
+    src = srcReplica.getDatanodeDetails();
+    tgt = MockDatanodeDetails.randomDatanodeDetails();
+    nodes.put(src, NodeStatus.inServiceHealthy());
+    nodes.put(tgt, NodeStatus.inServiceHealthy());
+
+    CompletableFuture<MoveManager.MoveResult> res =
+        moveManager.move(containerInfo.containerID(), src, tgt);
+    ContainerReplicaOp op = new ContainerReplicaOp(
+        ADD, tgt, srcReplica.getReplicaIndex(), null, clock.millis() + 1000);
+    moveManager.opCompleted(op, containerInfo.containerID(), false);
+
+    ArgumentCaptor<Long> longCaptor = ArgumentCaptor.forClass(Long.class);
+    verify(replicationManager).sendDeleteCommand(
+        eq(containerInfo), eq(srcReplica.getReplicaIndex()), eq(src),
+        eq(true), longCaptor.capture());
+
+    // 6 minutes is the datanodeTimeoutOffset set for datanodeCommands sent by replicationManager by default
+    assertTrue((Duration.ofMillis(longCaptor.getValue()).toMillis()
+        - Duration.ofMinutes(6).toMillis()) > clock.millis());

Review Comment:
   Thanks for adding the test. We want to ensure it asserts the delete command 
is sent with an SCM deadline of `moveStartTime + moveTimeout`, so the assertion 
needs to be changed. The 6-minute `datanodeTimeoutOffset` is used later when 
the Replication Manager sends the command to the datanode, so it's not relevant 
here.
   
   It'd also be good to have a test that reproduces the situation where a 
delete command was sent with a deadline in the past, and makes sure that this 
no longer happens with the new changes. The example I added in the JIRA should 
be a good guide for reproducing the error.
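   A minimal, hypothetical sketch of the distinction above, in plain Java with no Ozone classes. `MoveDeadlineSketch` and its methods are made up for illustration, the 30-minute move timeout is an arbitrary example value, and the idea that the datanode-side deadline is the SCM deadline minus `datanodeTimeoutOffset` is an assumption taken from the test's own comment, not a statement of how the Replication Manager is implemented. It shows why asserting `captured == moveStartTime + moveTimeout` is deterministic, while a check involving the 6-minute offset depends on when the clock is read:

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

/**
 * Hypothetical model of the two deadlines discussed in the review.
 * Names and values are illustrative; this is not Ozone's MoveManager or
 * ReplicationManager code.
 */
public class MoveDeadlineSketch {

  /** SCM-side deadline the review expects on the balancer's delete command. */
  public static long scmDeadline(long moveStartTimeMs, Duration moveTimeout) {
    return moveStartTimeMs + moveTimeout.toMillis();
  }

  /**
   * Assumption: the datanode-side deadline is derived later by subtracting
   * datanodeTimeoutOffset from the SCM deadline.
   */
  public static long datanodeDeadline(long scmDeadlineMs,
      Duration datanodeTimeoutOffset) {
    return scmDeadlineMs - datanodeTimeoutOffset.toMillis();
  }

  /** True if the deadline has already passed according to the given clock. */
  public static boolean isInPast(long deadlineMs, Clock clock) {
    return deadlineMs <= clock.millis();
  }

  public static void main(String[] args) {
    Duration moveTimeout = Duration.ofMinutes(30); // example value only
    Duration dnOffset = Duration.ofMinutes(6);     // value from the test comment
    Clock moveStartClock =
        Clock.fixed(Instant.ofEpochMilli(1_000_000L), ZoneOffset.UTC);
    long moveStart = moveStartClock.millis();

    // Deterministic check: the captured deadline should be exactly
    // moveStartTime + moveTimeout, whatever the clock reads later.
    long captured = scmDeadline(moveStart, moveTimeout);
    if (captured != moveStart + moveTimeout.toMillis()) {
      throw new AssertionError("unexpected SCM deadline");
    }

    // 27 minutes into the move, the SCM deadline is still in the future,
    // but a deadline derived by subtracting the offset is already past.
    Clock late = Clock.offset(moveStartClock, Duration.ofMinutes(27));
    System.out.println("SCM deadline in past: " + isInPast(captured, late));
    System.out.println("datanode deadline in past: "
        + isInPast(datanodeDeadline(captured, dnOffset), late));
  }
}
```

   Running it prints `false` for the SCM deadline and `true` for the derived datanode deadline, which is why an assertion that mixes in the offset says more about when the test happens to run than about the value `MoveManager` passed to `sendDeleteCommand`.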



-- 
This is an automated message from the Apache Git Service.