wangzhixiang created HDFS-15560:
-----------------------------------

             Summary: The getMaxNodesPerRack May Cause "Failed to place enough replicas"
                 Key: HDFS-15560
                 URL: https://issues.apache.org/jira/browse/HDFS-15560
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: wangzhixiang
            Assignee: wangzhixiang

In our HDFS cluster, the number of nodes per rack is extremely uneven, e.g. rack1=[1 node], rack2=[1 node], rack3=[3 nodes], rack4=[5 nodes], rack5=[4 nodes], rack6=[4 nodes].

The replication factor of some files in our cluster is set to 50, so such a file is allocated 18 replicas and every node in the cluster is needed. When the getMaxNodesPerRack method is invoked with totalNumOfReplicas = 18 and numOfRacks = 6, we get MaxNodesPerRack = 4 from MaxNodesPerRack = (totalNumOfReplicas-1)/numOfRacks + 2.

However, rack4 has 5 nodes, and only 4 of them can be chosen because MaxNodesPerRack = 4. As a result, only 17 (1+1+3+4+4+4) replicas are chosen, and the warning "Failed to place enough replicas, still in need of 1 to reach 18" is logged. In addition, ReplicationMonitor adds the file as ReplicationWork to retry, and the retry keeps failing in a loop.
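
The arithmetic above can be reproduced with a small standalone sketch. It does not call any Hadoop code: the class name and the placeableReplicas helper are hypothetical, the formula is the one quoted in this report, and the model assumes the per-rack cap is the only constraint on placement.

```java
public class MaxNodesPerRackSketch {

    // Formula as quoted in the report; the integer division is intentional.
    static int maxNodesPerRack(int totalNumOfReplicas, int numOfRacks) {
        return (totalNumOfReplicas - 1) / numOfRacks + 2;
    }

    // Hypothetical helper: upper bound on the replicas that can be placed
    // when each rack may contribute at most `cap` nodes.
    static int placeableReplicas(int[] nodesPerRack, int cap) {
        int total = 0;
        for (int nodes : nodesPerRack) {
            total += Math.min(nodes, cap);
        }
        return total;
    }

    public static void main(String[] args) {
        // Rack sizes from the report: rack1..rack6.
        int[] nodesPerRack = {1, 1, 3, 5, 4, 4};
        int totalNumOfReplicas = 18;

        int cap = maxNodesPerRack(totalNumOfReplicas, nodesPerRack.length);
        int placeable = placeableReplicas(nodesPerRack, cap);

        System.out.println("maxNodesPerRack = " + cap);
        System.out.println("placeable replicas = " + placeable);
        System.out.println("still in need of " + (totalNumOfReplicas - placeable));
    }
}
```

With the rack sizes listed above, this gives a cap of 4, at most 17 placeable replicas, and a shortfall of 1, matching the warning message.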