Andrew Wang created HDFS-6268:
---------------------------------

Summary: Better sorting in NetworkTopology#pseudoSortByDistance when no local node is found
Key: HDFS-6268
URL: https://issues.apache.org/jira/browse/HDFS-6268
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
In NetworkTopology#pseudoSortByDistance, if no local node is found, the first rack-local node in the list is always moved to the front. This became an issue when a dataset was loaded from a single datanode: that datanode ended up holding the first replica of every block in the dataset. When an Impala query later read past a block boundary, the resulting non-local reads all went to this one node, causing massive load skew.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
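The following is a minimal Python simulation that makes the skew concrete. It is a simplified model, not Hadoop's NetworkTopology code, and it rests on several assumptions. Datanodes are modeled as (name, rack) tuples. Every datanode sits on /default-rack, as it would with no rack awareness configured. The writer dn1 is listed first in every block's replica list. The reader holds none of the replicas. `sort_first_rack_local` models the behavior reported above. `sort_random_rack_local` is a hypothetical alternative that picks uniformly among the rack-local replicas; it is one possible approach and not necessarily what the eventual patch does. All function names here are illustrative.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

# Simplified model of a datanode: (name, rack). Not Hadoop's Node API.
Node = Tuple[str, str]


def sort_first_rack_local(reader_rack: str, replicas: List[Node]) -> List[Node]:
    """Model of the reported behavior when the reader is not one of the
    replica holders: the FIRST rack-local replica in list order is swapped
    to the front."""
    nodes = list(replicas)
    for i, (_, rack) in enumerate(nodes):
        if rack == reader_rack:
            nodes[0], nodes[i] = nodes[i], nodes[0]
            break
    return nodes


def sort_random_rack_local(reader_rack: str, replicas: List[Node],
                           rng: random.Random) -> List[Node]:
    """Hypothetical alternative: swap a uniformly chosen rack-local replica
    to the front, so no single replica position is always preferred."""
    nodes = list(replicas)
    local = [i for i, (_, rack) in enumerate(nodes) if rack == reader_rack]
    if local:
        i = rng.choice(local)
        nodes[0], nodes[i] = nodes[i], nodes[0]
    return nodes


def simulate(sort_fn: Callable[[str, List[Node], random.Random], List[Node]],
             num_blocks: int = 1000, seed: int = 0) -> Counter:
    """Count which datanode serves each block's non-local read."""
    rng = random.Random(seed)
    datanodes = [f"dn{i}" for i in range(1, 11)]
    counts: Counter = Counter()
    for _ in range(num_blocks):
        # Assumption: dn1 wrote the data, so it is listed first for every
        # block; the other two replicas are random distinct datanodes.
        others = rng.sample(datanodes[1:], 2)
        replicas = [(name, "/default-rack") for name in ["dn1"] + others]
        chosen = sort_fn("/default-rack", replicas, rng)[0][0]
        counts[chosen] += 1
    return counts


if __name__ == "__main__":
    # The original behavior ignores rng, so wrap it to match the signature.
    print(simulate(lambda rack, reps, rng: sort_first_rack_local(rack, reps)))
    # → Counter({'dn1': 1000})
    print(simulate(sort_random_rack_local))
```

Under these assumptions, every replica is rack-local, so "first rack-local" collapses to "first in the list" and dn1 serves all 1000 reads. Randomizing among the equally distant candidates spreads the reads roughly evenly across the replica holders.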