[jira] [Comment Edited] (HDDS-13599) Take write Lock of all block files before a container replica directory is deleted

Sammi Chen (Jira) Thu, 28 Aug 2025 21:13:38 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-13599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016921#comment-18016921
 ]


Sammi Chen edited comment on HDDS-13599 at 8/29/25 4:12 AM:
------------------------------------------------------------

[~szetszwo], Thread C will get the old path, because the "Container container"
it holds becomes stale once the new replica is added to the DN's in-memory
ContainerSet. The "Container container" it holds always points to the old path.

After the container replica is copied to the destination volume,
KeyValueHandler.importContainer() is called to import the container, and it
returns a new "Container container" which points to the new path. This new
"Container container" is then added to the DN's in-memory ContainerSet,
replacing the old "Container container", which is already held by Thread C.
Given the read concurrency of a DN, there could be hundreds of such Thread C
instances. That's why locking the file resolver doesn't work.
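
To make the stale-reference problem concrete, here is a minimal sketch. The
classes ReplicaHandle and InMemoryContainerSet are hypothetical stand-ins I
made up for illustration; they are not the real Ozone Container or
ContainerSet classes, and the paths are invented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a "Container container": its path is fixed at construction.
final class ReplicaHandle {
    private final String path;
    ReplicaHandle(String path) { this.path = path; }
    String getPath() { return path; }
}

// Hypothetical stand-in for the DN in-memory ContainerSet.
final class InMemoryContainerSet {
    private final Map<Long, ReplicaHandle> containers = new ConcurrentHashMap<>();
    ReplicaHandle get(long id) { return containers.get(id); }
    // Replaces the map entry only; references already handed out are not updated.
    void put(long id, ReplicaHandle handle) { containers.put(id, handle); }
}

final class StaleHandleDemo {
    public static void main(String[] args) {
        InMemoryContainerSet set = new InMemoryContainerSet();
        set.put(1L, new ReplicaHandle("/disk1/container1"));

        // "Thread C" looks up the container before the move completes.
        ReplicaHandle heldByReader = set.get(1L);

        // The import on the destination volume yields a new handle that replaces the old one.
        set.put(1L, new ReplicaHandle("/disk2/container1"));

        System.out.println(heldByReader.getPath()); // prints /disk1/container1 (stale)
        System.out.println(set.get(1L).getPath());  // prints /disk2/container1
    }
}
```

Any lock taken at lookup time comes too late for readers that already hold
the old handle.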

Having every chunk reader thread hold the read lock of KeyValueContainer, and
the replica deletion thread hold the write lock of KeyValueContainer, would
help. But this case, a replica ONE container being moved by the disk balancer,
is not a common one: the disk balancer is disabled by default, and real users
rarely use replica ONE data in production clusters. It's not really worth
making the chunk read threads acquire the KeyValueContainer lock for such a
rare case, not to mention the performance impact.
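
For reference, the read/write lock alternative would look roughly like the
sketch below. LockedReplica is a hypothetical class built on the JDK's
ReentrantReadWriteLock; it is not how KeyValueContainer is actually
implemented, only an illustration of the locking pattern being discussed.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical stand-in for a container replica guarded by a read/write lock.
final class LockedReplica {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean deleted = false;

    // Chunk readers share the read lock, so many reads can run in parallel.
    <T> T read(Supplier<T> chunkRead) {
        lock.readLock().lock();
        try {
            if (deleted) {
                throw new IllegalStateException("replica already deleted");
            }
            return chunkRead.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Deletion needs the exclusive write lock, so it waits for in-flight reads.
    void delete(Runnable removeFiles) {
        lock.writeLock().lock();
        try {
            if (!deleted) {
                removeFiles.run();
                deleted = true;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean isDeleted() {
        lock.readLock().lock();
        try {
            return deleted;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Every chunk read pays for a lock acquire and release here, which is the
overhead mentioned above.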

Since this case is specific to the disk balancer, we can solve it in the disk
balancer, which is HDDS-13602: we delay the deletion of the old replica, to
make sure all chunk reader threads that hold the old "Container container" can
finish reading from the old replica's chunk files. Thoughts?
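
A rough sketch of the delayed-deletion idea follows. DelayedReplicaDeleter,
its method names, and the fixed grace period are all assumptions made for
illustration; this is not the HDDS-13602 implementation. The sketch also
assumes that the timestamps passed in never go backwards.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical helper: old replica paths become deletable only after a grace period.
final class DelayedReplicaDeleter {
    private static final class Pending {
        final String path;
        final long dueMillis;
        Pending(String path, long dueMillis) {
            this.path = path;
            this.dueMillis = dueMillis;
        }
    }

    private final long delayMillis;
    // FIFO order equals due order because the delay is constant and time is non-decreasing.
    private final Queue<Pending> pending = new ArrayDeque<>();

    DelayedReplicaDeleter(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Called once the new replica has replaced the old one in the in-memory set.
    synchronized void schedule(String oldReplicaPath, long nowMillis) {
        pending.add(new Pending(oldReplicaPath, nowMillis + delayMillis));
    }

    // Returns the paths whose grace period has elapsed; the caller deletes them.
    synchronized List<String> pollDue(long nowMillis) {
        List<String> due = new ArrayList<>();
        while (!pending.isEmpty() && pending.peek().dueMillis <= nowMillis) {
            due.add(pending.poll().path);
        }
        return due;
    }
}
```

The grace period would have to be longer than the longest chunk read we
expect, since a fixed delay by itself does not track whether readers are
still active.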


> Take write Lock of all block files before a container replica directory is 
> deleted
> ----------------------------------------------------------------------------------
>
>                 Key: HDDS-13599
>                 URL: https://issues.apache.org/jira/browse/HDDS-13599
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Sammi Chen
>            Priority: Major
>         Attachments: screenshot-1.png
>
>
> To avoid interim read failures caused by block files being deleted while a
> container replica directory is being deleted.
