[PR] HDDS-12127. RM should not expire pending deletes, but retry until the delete is confirmed or node is dead [ozone]

via GitHub Fri, 24 Jan 2025 04:30:51 -0800


sodonnel opened a new pull request, #7746:
URL: https://github.com/apache/ozone/pull/7746


   ## What changes were proposed in this pull request?
   
   When RM schedules a delete of a container on a datanode, it should keep 
track of that delete until either:
   
   1. An ICR / FCR (incremental / full container report) is received which 
confirms the container is removed.
   2. The datanode goes dead.
   
   Currently, RM expires the delete attempt after 10 minutes. On retry it 
should resend the command to the same datanode, but it may not, for example 
because of [HDDS-12115](https://issues.apache.org/jira/browse/HDDS-12115) or 
other scenarios that cause the datanode ordering to change.
   
   With this change, the expiry still occurs and the command can still get 
dropped on the datanode, but the ContainerReplicaPendingOps expiry thread no 
longer removes the pending delete from the pending list. Instead, it triggers a 
notification to RM, which then resends the same command with a new deadline 
until the delete has been confirmed as successful. RM subscribes to the 
notifications from ContainerReplicaPendingOps and re-runs any expired delete 
commands.
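
A minimal, self-contained sketch of the flow described above, for illustration only. It is not the code in this patch: `PendingDeleteSketch`, `PendingOps`, `ExpiryListener`, and `RetryingSender` are hypothetical stand-ins for ContainerReplicaPendingOps and RM, and time is passed in explicitly instead of coming from a clock or a background thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and structure are assumptions, not Ozone code. */
public class PendingDeleteSketch {

  /** A delete command sent to a datanode that has not been confirmed yet. */
  public static final class PendingDelete {
    public final long containerId;
    public final String datanode;
    public long deadlineMillis;

    PendingDelete(long containerId, String datanode, long deadlineMillis) {
      this.containerId = containerId;
      this.datanode = datanode;
      this.deadlineMillis = deadlineMillis;
    }
  }

  /** Callback invoked when a pending delete passes its deadline. */
  public interface ExpiryListener {
    void onExpired(PendingDelete op, long nowMillis);
  }

  /** Hypothetical stand-in for ContainerReplicaPendingOps. */
  public static final class PendingOps {
    private final List<PendingDelete> pending = new ArrayList<>();
    private final List<ExpiryListener> listeners = new ArrayList<>();

    public void subscribe(ExpiryListener listener) {
      listeners.add(listener);
    }

    public void scheduleDelete(long containerId, String datanode, long deadlineMillis) {
      pending.add(new PendingDelete(containerId, datanode, deadlineMillis));
    }

    /** Expiry pass: expired deletes stay in the list and listeners are notified. */
    public void processExpired(long nowMillis) {
      // Iterate over a copy so a listener may safely touch the pending list.
      for (PendingDelete op : new ArrayList<>(pending)) {
        if (op.deadlineMillis <= nowMillis) {
          for (ExpiryListener listener : listeners) {
            listener.onExpired(op, nowMillis);
          }
        }
      }
    }

    /** A container report confirmed that the replica is gone. */
    public void confirmDeleted(long containerId, String datanode) {
      pending.removeIf(op -> op.containerId == containerId && op.datanode.equals(datanode));
    }

    /** The datanode went dead, so deletes sent to it are no longer tracked. */
    public void datanodeDead(String datanode) {
      pending.removeIf(op -> op.datanode.equals(datanode));
    }

    public int pendingCount() {
      return pending.size();
    }
  }

  /** Hypothetical stand-in for RM: resends to the same datanode with a new deadline. */
  public static final class RetryingSender implements ExpiryListener {
    public final List<String> sent = new ArrayList<>();
    private final long timeoutMillis;

    public RetryingSender(long timeoutMillis) {
      this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void onExpired(PendingDelete op, long nowMillis) {
      op.deadlineMillis = nowMillis + timeoutMillis;
      sent.add("delete container " + op.containerId + " on " + op.datanode);
    }
  }

  public static void main(String[] args) {
    long tenMinutes = 10 * 60 * 1000L;
    PendingOps ops = new PendingOps();
    RetryingSender rm = new RetryingSender(tenMinutes);
    ops.subscribe(rm);

    ops.scheduleDelete(1L, "dn1", tenMinutes);
    ops.processExpired(tenMinutes);        // deadline reached: resend, keep tracking
    System.out.println("resent: " + rm.sent + ", pending: " + ops.pendingCount());

    ops.confirmDeleted(1L, "dn1");         // report confirms the delete
    System.out.println("pending after confirmation: " + ops.pendingCount());
  }
}
```

The point of the sketch is that the pending entry is only dropped by a confirmation or by the datanode going dead, so a retry always targets the datanode that was originally chosen.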
   
   This is to combat a recent problem we experienced where a delete command hung 
for a very long time and RM issued new deletes to other DNs, resulting in all 
replicas of a container being removed unexpectedly.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-12127
   
   ## How was this patch tested?
   
   Various unit tests were modified and added. Manually tested in docker that 
the deletes are resent.

