Shashikant Banerjee created HDDS-935:
----------------------------------------
Summary: Avoid creating an already created container on a datanode
in case of disk removal followed by datanode restart
Key: HDDS-935
URL: https://issues.apache.org/jira/browse/HDDS-935
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Rakesh R
Assignee: Shashikant Banerjee
Currently, a container is created when a writeChunk request reaches
HddsDispatcher and the container does not already exist. If a disk holding a
container is removed and the datanode restarts, a subsequent writeChunk
request might recreate the same container with an updated BCSID, because the
datanode will not detect that the disk was removed. SCM will not detect this
either, as the recreated container will have the latest BCSID. This Jira aims
to address this issue.
The proposed fix is to persist all the containerIds existing in the
containerSet in the snapshot file whenever a Ratis snapshot is taken. If a disk
is removed and the datanode is restarted, the container set will be rebuilt by
scanning all the available disks, while the container list stored in the
snapshot file will give all the containers created on the datanode. The diff
between these two gives the exact list of containers that were created but
were not detected after the restart. Any writeChunk request should then
validate its container ID against the list of missing containers. Also, we need
to ensure that container creation does not happen as part of applyTransaction
of a writeChunk request in Ratis.
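
The diff and the validation step described above could look roughly like the
following. This is only a minimal sketch to illustrate the idea, not the actual
patch: the class MissingContainerSketch and the methods findMissing and
validateWriteChunk are hypothetical names, and container IDs are assumed to be
plain long values.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the Ozone code base.
final class MissingContainerSketch {

    private MissingContainerSketch() {
    }

    // Returns the IDs that were recorded in the snapshot file but were not
    // found when the container set was rebuilt from the available disks.
    static Set<Long> findMissing(Set<Long> snapshotIds, Set<Long> rebuiltIds) {
        Set<Long> missing = new TreeSet<>(snapshotIds);
        missing.removeAll(rebuiltIds);
        return Collections.unmodifiableSet(missing);
    }

    // Rejects a writeChunk for a container that is known to have existed
    // before the restart, so that it is not silently created again.
    static void validateWriteChunk(long containerId, Set<Long> missing) {
        if (missing.contains(containerId)) {
            throw new IllegalStateException(
                "Container " + containerId + " is missing, refusing to recreate it");
        }
    }
}
```

With snapshot IDs {1, 2, 3} and a rebuilt set of {1, 3}, findMissing returns
{2}; a writeChunk for container 2 is then rejected, while a writeChunk for
container 4 (never created before) passes validation.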