ZanderXu created HDFS-16804:
-------------------------------

             Summary: AddVolume contains a race condition with shutdown block 
pool
                 Key: HDFS-16804
                 URL: https://issues.apache.org/jira/browse/HDFS-16804
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: ZanderXu
            Assignee: ZanderXu


Add Volume contains a race condition with shutdown block pool, causing the 
ReplicaMap still contains some blocks belong to the removed block pool.

And the new volume still contains one unused BlockPoolSlice belongs to the 
removed block pool, caused some problems, such as: incorrect dfsUsed, incorrect 
numBlocks of the volume.

Let's review the logic of addVolume and shutdownBlockPool respectively.

 

AddVolume Logic:
 * Step1: Get all namespaceInfo from blockPoolManager
 * Step2: Create one temporary FsVolumeImpl object
 * Step3: Create some blockPoolSlice according to the namespaceInfo and add 
them to the temporary FsVolumeImpl object
 * Step4: Scan all blocks of the namespaceInfo from the volume and store them 
by one temporary ReplicaMap
 * Step5: Active the temporary FsVolumeImpl which created before (with 
FsDatasetImpl synchronized lock)
 ** Step5.1: Merge all blocks of the temporary ReplicaMap to the global 
ReplicaMap
 ** Step5.2: Add the FsVolumeImpl to the volumes

ShutdownBlockPool Logic:(with blockPool write lock)
 * Step1: Cleanup the blockPool from the global ReplicaMap
 * Step2: Shutdown the block pool from all the volumes
 ** Step2.1: do some clean operations for the block pool, such as saveReplica, 
saveDfsUsed, etc
 ** Step2.2: remove the blockPool from bpSlices

The race condition can be reproduced by the following steps:
 * AddVolume Step1: Get all namespaceInfo from blockPoolManager
 * ShutdownBlockPool Step1: Cleanup the blockPool from the global ReplicaMap
 * ShutdownBlockPool Step2: Shutdown the block pool from all the volumes
 * AddVolume Step 2~5

And result:
 * The global replicaMap contains some blocks belong to the removed blockPool
 * The bpSlices of the FsVolumeImpl contains one blockPoolSlice belong to the 
removed blockPool

Expected result:
 * The global replicaMap shouldn't contain any blocks belong to the removed 
blockPool
 * The bpSlices of any FsVolumeImpl shouldn't contain any blockPoolSlice belong 
to the removed blockPool



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

Reply via email to