[ 
https://issues.apache.org/jira/browse/HDDS-13599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18015754#comment-18015754
 ] 

Sammi Chen commented on HDDS-13599:
-----------------------------------

[~szetszwo], this is a bigger problem, one that any deleted replica will face.

!screenshot-1.png!

Assume the following timeline for ContainerA, located at 
/data1/hdds/5cd3bf7b-c743-478a-b398-df17271429b8/current/containerDir0/1/
t1: The move of ContainerA starts.
t2 & t3: New read requests R1 and R2 arrive. Both get ContainerDataA from 
containerSet, then locate the chunk path and chunk file, and read. 
ContainerDataA points to 
/data1/hdds/5cd3bf7b-c743-478a-b398-df17271429b8/current/containerDir0/1/
t4: ContainerA is copied to the destination volume at 
/data2/hdds/42f95c25-27ee-4616-9242-8068845637bc/current/containerDir0/1/ , and 
the in-memory containerSet is updated to the new ContainerDataB, which points 
to /data2/hdds/42f95c25-27ee-4616-9242-8068845637bc/current/containerDir0/1/
t5 & t6: New requests R3 and R4 arrive. They get the new ContainerDataB 
from containerSet.
t7: The old container is marked as DELETED, and the container directory 
/data1/hdds/5cd3bf7b-c743-478a-b398-df17271429b8/current/containerDir0/1/ is 
deleted from disk.

Since R3 and R4 use the new ContainerDataB, they will always succeed.
R1 and R2 hold ContainerDataA, which points to the old 
/data1/hdds/5cd3bf7b-c743-478a-b398-df17271429b8/current/containerDir0/1/. 
Suppose R1 locks the related chunk files before reading and succeeds. The 
replica deletion task then locks all the chunk files and deletes them 
successfully. When R2 then tries to lock the related chunk file, the file has 
already been deleted, so R2 fails. 
This is the case you mentioned: if the data has only one replica, the read 
must not fail, and locking the files does not help here. 
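
The flow above can be sketched as a small single-threaded Java simulation. 
It is only an illustration under stated assumptions, not Ozone code: the class 
StaleReplicaReadDemo, the map standing in for containerSet, the single chunk 
file "chunk0", and the helpers readChunk and deleteReplica are all hypothetical 
names, temporary directories stand in for the /data1 and /data2 volumes, and 
the requests are replayed sequentially in the order t1 to t7 instead of 
running concurrently.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StaleReplicaReadDemo {
  // Hypothetical stand-in for containerSet: container ID -> replica directory.
  static final Map<Long, Path> containerSet = new ConcurrentHashMap<>();

  // A reader that takes a shared lock on the chunk file before reading it.
  static String readChunk(Path containerDir) throws IOException {
    Path chunk = containerDir.resolve("chunk0");
    try (FileChannel ch = FileChannel.open(chunk, StandardOpenOption.READ);
         FileLock lock = ch.lock(0, Long.MAX_VALUE, true)) {
      ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
      while (buf.hasRemaining() && ch.read(buf) >= 0) {
        // keep reading until the buffer is full or end of file is reached
      }
      return new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
    }
  }

  // A deletion task that takes an exclusive lock on the chunk file, deletes it,
  // and then removes the now-empty replica directory.
  static void deleteReplica(Path containerDir) throws IOException {
    Path chunk = containerDir.resolve("chunk0");
    try (FileChannel ch = FileChannel.open(chunk, StandardOpenOption.WRITE);
         FileLock lock = ch.lock()) {
      Files.delete(chunk);
    }
    Files.delete(containerDir);
  }

  static List<String> run() throws IOException {
    List<String> log = new ArrayList<>();
    Path oldDir = Files.createTempDirectory("data1-container1");
    Path newDir = Files.createTempDirectory("data2-container1");
    Files.write(oldDir.resolve("chunk0"), "payload".getBytes(StandardCharsets.UTF_8));
    containerSet.put(1L, oldDir);           // t1: move starts, old location registered

    Path r1 = containerSet.get(1L);         // t2: R1 resolves the old location
    Path r2 = containerSet.get(1L);         // t3: R2 resolves the old location

    Files.copy(oldDir.resolve("chunk0"), newDir.resolve("chunk0"));
    containerSet.put(1L, newDir);           // t4: copy done, containerSet updated

    Path r3 = containerSet.get(1L);         // t5: R3 resolves the new location

    log.add("R1 " + readChunk(r1));         // R1 locks and reads before deletion
    deleteReplica(oldDir);                  // t7: old replica removed from disk
    try {
      log.add("R2 " + readChunk(r2));       // R2 still holds the stale location
    } catch (NoSuchFileException e) {
      log.add("R2 failed");                 // nothing is left to lock or read
    }
    log.add("R3 " + readChunk(r3));         // R3 reads from the new location

    Files.delete(newDir.resolve("chunk0")); // clean up the temporary replica
    Files.delete(newDir);
    return log;
  }

  public static void main(String[] args) throws IOException {
    run().forEach(System.out::println);
  }
}
```

In this sketch the file locks only order R1 against the deletion. R2 resolved 
its path before t4, so once the old directory is gone its open call has no 
file to lock, which is the failure described above.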

> Take write Lock of all block files before a container replica directory is 
> deleted
> ----------------------------------------------------------------------------------
>
>                 Key: HDDS-13599
>                 URL: https://issues.apache.org/jira/browse/HDDS-13599
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Sammi Chen
>            Priority: Major
>         Attachments: screenshot-1.png
>
>
> To avoid interim read failure caused by block file deleted during container 
> replica directory deletion. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
