Jiandan Yang created HDFS-13915:
------------------------------------

             Summary: Replace datanode failed because of NameNodeRpcServer#getAdditionalDatanode returning excessive datanodeInfo
                 Key: HDFS-13915
                 URL: https://issues.apache.org/jira/browse/HDFS-13915
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
         Environment:
            Reporter: Jiandan Yang
            Assignee: Jiandan Yang

Consider the following situation:
1. A client creates a file with the ALLSSD storage policy.
2. The namenode returns [SSD, SSD, DISK] because of a lack of SSD space.
3. While recovering the write pipeline to replace a bad datanode, the client calls NameNodeRpcServer#getAdditionalDatanode.
4. BlockPlacementPolicyDefault#chooseTarget calls StoragePolicy#chooseStorageTypes(3, [SSD, DISK], none, false), but chooseStorageTypes returns [SSD, SSD].
5. numOfReplicas is set to requiredStorageTypes.size(), i.e. 2, so two additional datanodes are chosen instead of the single one requested (see the sketch below the log).
6. BlockPlacementPolicyDefault#chooseTarget returns four datanodes to the client.
7. DataStreamer#findNewDatanode finds nodes.length != original.length + 1 and throws an IOException, which ultimately makes the write fail.

The client warn log is:
{code:java}
WARN [DataStreamer for file /home/yarn/opensearch/in/data/120141286/0_65535/table/ucs_process/MANIFEST-093545 block BP-1742758844-11.138.8.184-1483707043031:blk_7086344902_6012765313] org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[11.138.5.4:50010,DS-04826cfc-1885-4213-a58b-8606845c5c42,SSD], DatanodeInfoWithStorage[11.138.5.9:50010,DS-f6d8eb8b-2550-474b-a692-c991d7a6f6b3,SSD], DatanodeInfoWithStorage[11.138.5.153:50010,DS-f5d77ca0-6fe3-4523-8ca8-5af975f845b6,SSD], DatanodeInfoWithStorage[11.138.9.156:50010,DS-0d15ea12-1bad-4444-84f7-1a4917a1e194,DISK]], original=[DatanodeInfoWithStorage[11.138.5.4:50010,DS-04826cfc-1885-4213-a58b-8606845c5c42,SSD], DatanodeInfoWithStorage[11.138.9.156:50010,DS-0d15ea12-1bad-4444-84f7-1a4917a1e194,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
{code}
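To make the arithmetic in steps 4-7 concrete, here is a minimal standalone sketch (not the actual HDFS source; the class, method, and variable names are illustrative) of how diffing the ALLSSD expected types against the already-chosen [SSD, DISK] yields two required storage types, and how the resulting four-node reply trips the nodes.length == original.length + 1 check in DataStreamer#findNewDatanode:
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the reported behaviour, not the HDFS implementation.
public class ReplacePipelineSketch {

  enum StorageType { SSD, DISK }

  // Mimics the diff described in step 4: start from the storage types the
  // ALLSSD policy expects for the full replication factor and remove one
  // entry per already-chosen storage of a matching type.
  static List<StorageType> chooseStorageTypes(int replication,
                                              List<StorageType> chosen) {
    List<StorageType> required = new ArrayList<>();
    for (int i = 0; i < replication; i++) {
      required.add(StorageType.SSD);   // ALLSSD expects SSD for every replica
    }
    for (StorageType t : chosen) {
      required.remove(t);              // DISK matches nothing, so it removes nothing
    }
    return required;                   // [SSD, SSD] for chosen = [SSD, DISK]
  }

  public static void main(String[] args) {
    List<StorageType> chosen = Arrays.asList(StorageType.SSD, StorageType.DISK);
    List<StorageType> required = chooseStorageTypes(3, chosen);

    // Step 5: numOfReplicas is reset to the size of the required list,
    // although the client asked for only one additional datanode.
    int numOfReplicas = required.size();               // 2
    System.out.println("additional datanodes chosen: " + numOfReplicas);

    // Steps 6-7: the namenode returns chosen + additional = 4 locations, and
    // the client-side check (nodes.length == original.length + 1) fails.
    int originalLength = chosen.size();                // the 2 surviving pipeline nodes
    int nodesLength = originalLength + numOfReplicas;  // 4
    if (nodesLength != originalLength + 1) {
      System.out.println("Failed to replace a bad datanode: got " + nodesLength
          + " nodes, expected " + (originalLength + 1));
    }
  }
}
{code}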