[ https://issues.apache.org/jira/browse/HDFS-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nikola Vujic resolved HDFS-5184.
--------------------------------

    Resolution: Done

This is fixed in HDP 2 with the new implementation of the block placement 
policy with node group.

> BlockPlacementPolicyWithNodeGroup does not work correctly when avoidStaleNodes 
> is true
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-5184
>                 URL: https://issues.apache.org/jira/browse/HDFS-5184
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Nikola Vujic
>            Priority: Minor
>
> If avoidStaleNodes is true, then choosing targets is potentially done in two 
> attempts. If we don't find enough targets to place replicas in the first 
> attempt, a second attempt is invoked that also considers stale nodes in 
> order to find the remaining targets. This second attempt can break the 
> node-group rule of never placing two replicas in the same node group.
> The invocation of the second attempt looks like this:
> {code}
> DatanodeDescriptor chooseTarget(..., excludedNodes, ...) {
>   // Snapshot of the excluded nodes taken before the first attempt.
>   HashMap<Node, Node> oldExcludedNodes = new HashMap<Node, Node>(excludedNodes);
>   // ... first attempt ...
>   // If we still don't have enough targets, retry once, this time
>   // without avoiding stale nodes.
>   if (avoidStaleNodes) {
>     // Additionally exclude the nodes already chosen in the first attempt.
>     for (Node node : results) {
>       oldExcludedNodes.put(node, node);
>     }
>     numOfReplicas = totalReplicasExpected - results.size();
>     return chooseTarget(numOfReplicas, writer, oldExcludedNodes, blocksize,
>         maxNodesPerRack, results, false);
>   }
> }
> {code}
> So, all nodes that were excluded during the first attempt but are neither in 
> oldExcludedNodes nor in results are ignored, and the second invocation of 
> chooseTarget runs with an incomplete set of excluded nodes. For example, with 
> the following topology:
>  dn1 -> /d1/r1/n1
>  dn2 -> /d1/r1/n1
>  dn3 -> /d1/r1/n2
>  dn4 -> /d1/r1/n2
>  if we want to choose 3 targets with avoidStaleNodes=true, then the first 
> attempt can only choose 2 targets since there are only two node groups. 
> Let's say we choose dn1 and dn3. We then add dn1 and dn3 to 
> oldExcludedNodes and use that set of excluded nodes in the second attempt. 
> That set is incomplete: it allows dn2 and dn4 to be selected in the second 
> attempt, even though node-group awareness should rule them out, and that is 
> exactly what happens in the current code.
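> To make the bookkeeping concrete, here is a minimal, self-contained sketch 
> (plain Java with string node names instead of the real DatanodeDescriptor/Node 
> types, reusing the example topology above) of what gets dropped between the 
> two attempts:
> {code}
> // Standalone sketch, not Hadoop code: it models only the excluded-node
> // bookkeeping around the stale-node retry quoted above.
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
> import java.util.TreeSet;
> 
> public class ExclusionRetrySketch {
>   public static void main(String[] args) {
>     // Snapshot taken before the first attempt, as in the quoted code.
>     Map<String, String> oldExcludedNodes = new HashMap<>();
>     // Working excluded set that the first attempt keeps extending.
>     Map<String, String> excludedNodes = new HashMap<>(oldExcludedNodes);
> 
>     // First attempt: dn1 and dn3 get chosen; the node-group policy also
>     // excludes their node-group siblings dn2 and dn4.
>     List<String> results = List.of("dn1", "dn3");
>     for (String node : List.of("dn1", "dn2", "dn3", "dn4")) {
>       excludedNodes.put(node, node);
>     }
> 
>     // Retry without stale-node avoidance: only the chosen results are
>     // copied into oldExcludedNodes, exactly as in the quoted code.
>     for (String node : results) {
>       oldExcludedNodes.put(node, node);
>     }
> 
>     // Everything excluded during the first attempt that is not a result
>     // is lost; this prints [dn2, dn4].
>     TreeSet<String> lost = new TreeSet<>(excludedNodes.keySet());
>     lost.removeAll(oldExcludedNodes.keySet());
>     System.out.println("exclusions lost on retry: " + lost);
>   }
> }
> {code}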
> Repro:
>  - add 
> CONF.setBoolean(DFSConfigKeys.DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY,
>  true); to TestReplicationPolicyWithNodeGroup.
>  - testChooseMoreTargetsThanNodeGroups() should fail.
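> Finally, one possible direction for a fix (a sketch only, and not necessarily 
> the actual change that resolved this issue): before the retry, re-exclude 
> every node that shares a node group with an already chosen replica, so the 
> second attempt cannot reuse a node group. The node-to-node-group map below is 
> just the example topology from above; in the real code this information would 
> come from the cluster map:
> {code}
> // Sketch of one possible fix direction, not the actual Hadoop patch.
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
> 
> public class NodeGroupAwareRetrySketch {
>   // Example topology from above; a stand-in for what the cluster map knows.
>   static final Map<String, String> NODE_GROUP = Map.of(
>       "dn1", "/d1/r1/n1", "dn2", "/d1/r1/n1",
>       "dn3", "/d1/r1/n2", "dn4", "/d1/r1/n2");
> 
>   public static void main(String[] args) {
>     List<String> results = List.of("dn1", "dn3"); // chosen in attempt one
>     Map<String, String> oldExcludedNodes = new HashMap<>();
> 
>     // Re-exclude every node-group sibling of a chosen replica, not just the
>     // replica itself, before invoking the retry without stale-node avoidance.
>     for (String chosen : results) {
>       String group = NODE_GROUP.get(chosen);
>       for (Map.Entry<String, String> e : NODE_GROUP.entrySet()) {
>         if (e.getValue().equals(group)) {
>           oldExcludedNodes.put(e.getKey(), e.getKey());
>         }
>       }
>     }
> 
>     // Prints all four nodes: the retry has nowhere left to violate the
>     // node-group rule and simply comes back short instead.
>     System.out.println(oldExcludedNodes.keySet());
>   }
> }
> {code}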



--
This message was sent by Atlassian JIRA
(v6.2#6252)
