[
https://issues.apache.org/jira/browse/HDDS-13599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18015754#comment-18015754
]
Sammi Chen commented on HDDS-13599:
-----------------------------------
[~szetszwo], a deleted replica faces a bigger problem than that.
!screenshot-1.png!
Assume the following timeline for ContainerA, located at
/data1/hdds/5cd3bf7b-c743-478a-b398-df17271429b8/current/containerDir0/1/
t1: ContainerA starts to move.
t2 & t3: new read requests R1 and R2 arrive. Both get ContainerDataA from
containerSet, then start to locate the chunk path and chunk file and read.
ContainerDataA points to
/data1/hdds/5cd3bf7b-c743-478a-b398-df17271429b8/current/containerDir0/1/
t4: ContainerA is copied to the destination volume at
/data2/hdds/42f95c25-27ee-4616-9242-8068845637bc/current/containerDir0/1/ , and
the in-memory containerSet is updated to the new ContainerDataB, which points
to /data2/hdds/42f95c25-27ee-4616-9242-8068845637bc/current/containerDir0/1/
t5 & t6: new requests R3 and R4 arrive. They get the new ContainerDataB
from containerSet.
t7: the old container is marked as DELETED, and its container directory
/data1/hdds/5cd3bf7b-c743-478a-b398-df17271429b8/current/containerDir0/1/ is
deleted from disk.
Since R3 and R4 use the new ContainerDataB, they will always succeed.
R1 and R2 hold ContainerDataA, which points to the old
/data1/hdds/5cd3bf7b-c743-478a-b398-df17271429b8/current/containerDir0/1/.
Assume R1 locks the related chunk files before reading and succeeds. The
replica deletion task then locks all the chunk files and deletes them
successfully. When R2 then tries to lock the related chunk file, the file has
already been deleted, so R2 fails.
This is the case you mentioned: if the data has only one replica, the read
must not fail, and locking the files doesn't help in this case.
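To make the interleaving concrete, here is a minimal single-threaded sketch of
the t1 to t7 timeline. It is not Ozone code: the ContainerData class, the
"chunk0" file name, the container ID 1 and the use of temp directories for
/data1 and /data2 are all assumptions made for illustration, and file locking
is deliberately left out because the point is only that a request holding the
stale ContainerDataA fails once the old directory is gone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StaleReplicaRead {

  /** Hypothetical stand-in for ContainerData: it only records the replica directory. */
  static final class ContainerData {
    final Path dir;
    ContainerData(Path dir) { this.dir = dir; }
  }

  /** Reads the chunk file through whatever ContainerData the request is holding. */
  static String read(String request, ContainerData data) {
    try {
      Files.readAllBytes(data.dir.resolve("chunk0"));
      return request + ":OK";
    } catch (NoSuchFileException e) {
      return request + ":FAILED"; // the chunk file was deleted underneath the request
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Replays the timeline in a fixed order and returns the outcome of R1..R4. */
  static List<String> simulate() {
    try {
      Path oldDir = Files.createTempDirectory("data1-container1"); // stands in for /data1/...
      Path newDir = Files.createTempDirectory("data2-container1"); // stands in for /data2/...
      Files.write(oldDir.resolve("chunk0"), "payload".getBytes(StandardCharsets.UTF_8));

      Map<Long, ContainerData> containerSet = new ConcurrentHashMap<>();
      containerSet.put(1L, new ContainerData(oldDir)); // ContainerDataA

      // t2 & t3: R1 and R2 look up the container and get ContainerDataA.
      ContainerData r1 = containerSet.get(1L);
      ContainerData r2 = containerSet.get(1L);

      // t4: copy to the destination volume, then swap in ContainerDataB.
      Files.copy(oldDir.resolve("chunk0"), newDir.resolve("chunk0"));
      containerSet.put(1L, new ContainerData(newDir));

      // t5 & t6: R3 and R4 look up the container and get ContainerDataB.
      ContainerData r3 = containerSet.get(1L);
      ContainerData r4 = containerSet.get(1L);

      List<String> results = new ArrayList<>();
      results.add(read("R1", r1)); // R1 happens to read before the deletion

      // t7: the old replica directory is deleted from disk.
      Files.delete(oldDir.resolve("chunk0"));
      Files.delete(oldDir);

      results.add(read("R2", r2)); // still holds ContainerDataA, so the file is gone
      results.add(read("R3", r3));
      results.add(read("R4", r4));

      // Clean up the destination directory created by this sketch.
      Files.delete(newDir.resolve("chunk0"));
      Files.delete(newDir);
      return results;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(simulate()); // prints [R1:OK, R2:FAILED, R3:OK, R4:OK]
  }
}
```

In this sketch R2 fails purely because it resolved its path before t4 and read
after t7; taking a lock on the chunk file at read time would not change that
outcome.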
> Take write Lock of all block files before a container replica directory is
> deleted
> ----------------------------------------------------------------------------------
>
> Key: HDDS-13599
> URL: https://issues.apache.org/jira/browse/HDDS-13599
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Sammi Chen
> Priority: Major
> Attachments: screenshot-1.png
>
>
> To avoid interim read failure caused by block file deleted during container
> replica directory deletion.