Ming Ma created HDFS-10206:
------------------------------
Summary: getBlockLocations might not sort datanodes properly by
distance
Key: HDFS-10206
URL: https://issues.apache.org/jira/browse/HDFS-10206
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Ming Ma
If the DFSClient machine is not a datanode but shares its rack with some of the
datanodes holding the requested HDFS block, {{DatanodeManager#sortLocatedBlocks}}
might not put the local-rack datanodes at the beginning of the sorted list.
That is because the function does not call {{networktopology.add(client);}} to
set the client node's parent node, which {{networktopology.sortByDistance}}
requires in order to compute the distance between two nodes in the same
topology tree.
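To illustrate why the missing parent matters, here is a minimal sketch. It is a simplified model, not the Hadoop classes: {{ParentSketch}}, {{SimpleNode}} and {{SimpleTopology}} are hypothetical names, and the same-rack check is assumed to compare parent references, as the description above implies.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of a topology tree; not the Hadoop implementation.
class ParentSketch {
    static class SimpleNode {
        final String name;
        final String rackPath;  // e.g. "/d1/r1"
        Object parent;          // stays null until the node is added to the topology

        SimpleNode(String name, String rackPath) {
            this.name = name;
            this.rackPath = rackPath;
        }
    }

    static class SimpleTopology {
        // One shared object per rack stands in for the rack's inner node.
        private final Map<String, Object> racks = new HashMap<>();

        // Adding a node is what links it to its rack (sets its parent).
        void add(SimpleNode n) {
            n.parent = racks.computeIfAbsent(n.rackPath, k -> new Object());
        }

        // Assumed same-rack check: both nodes must share the same parent reference.
        boolean isOnSameRack(SimpleNode a, SimpleNode b) {
            return a.parent != null && a.parent == b.parent;
        }
    }
}
```

In this model a client that was never passed to {{add}} has no parent, so it is treated as off rack even when its rack path matches that of a datanode.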
Another issue with {{networktopology.sortByDistance}} is that it only
distinguishes the local rack from remote racks; it does not support a general
distance calculation that tells how remote a rack is.
{noformat}
NetworkTopology.java
  protected int getWeight(Node reader, Node node) {
    // 0 is local, 1 is same rack, 2 is off rack
    // Start off by initializing to off rack
    int weight = 2;
    if (reader != null) {
      if (reader.equals(node)) {
        weight = 0;
      } else if (isOnSameRack(reader, node)) {
        weight = 1;
      }
    }
    return weight;
  }
{noformat}
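One possible shape for a more general weight, as a rough sketch only and not a proposed patch: count how many levels the reader has to climb to reach the closest ancestor it shares with the node. The class {{DistanceWeightSketch}}, the use of full path strings such as {{/d1/r1/h1}}, and the choice of {{Integer.MAX_VALUE}} for an unknown reader are all assumptions made for illustration.

```java
// Hypothetical sketch; nodes are identified by full topology paths like "/d1/r1/h1".
class DistanceWeightSketch {

    // Splits "/d1/r1/h1" into ["d1", "r1", "h1"].
    static String[] components(String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        return trimmed.split("/");
    }

    // Weight = number of levels from the reader up to the closest common ancestor.
    // 0 is the same node, 1 is the same rack, larger values are progressively more remote.
    static int getWeight(String readerPath, String nodePath) {
        if (readerPath == null) {
            return Integer.MAX_VALUE; // unknown reader: treat every node as maximally remote
        }
        String[] r = components(readerPath);
        String[] n = components(nodePath);
        int common = 0;
        while (common < r.length && common < n.length && r[common].equals(n[common])) {
            common++;
        }
        return r.length - common;
    }
}
```

With three-level paths this yields 0 for the local node, 1 for the same rack, 2 for a different rack under the same parent, and 3 when nothing is shared, so sorting by weight would order remote racks by how far away they are.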
HDFS-10203 has suggested moving the sorting from namenode to DFSClient to
address another issue. Regardless of where we do the sorting, we still need to
fix the issues outlined here.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)