Andrew Wang created HDFS-12222:
----------------------------------

             Summary: Add EC information to BlockLocations
                 Key: HDFS-12222
                 URL: https://issues.apache.org/jira/browse/HDFS-12222
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 3.0.0-alpha1
            Reporter: Andrew Wang
HDFS applications query block location information to compute splits. One example is FileInputFormat:

https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346

Code like this computes offsets as follows:

{noformat}
long bytesInThisBlock = blkLocations[startIndex].getOffset() + blkLocations[startIndex].getLength() - offset;
{noformat}

EC confuses this calculation because the block locations also include parity block locations, which are not part of the logical file length. The resulting offsets are wrong, and so is the topology/caching information derived from them.

Applications can identify parity blocks by reading the file's EC policy and parsing its schema, but it would be much better to expose this information generically in BlockLocation instead.
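To make the offset problem concrete, here is a small, self-contained Java toy model. It is a sketch under stated assumptions, not HDFS code: `Loc` is a hypothetical stand-in for `BlockLocation`, and its `parity` flag is the kind of generic EC information this issue proposes, not an existing Hadoop field. The layout is also assumed: parity blocks are modeled as extra locations placed after the data range, which is not necessarily how HDFS reports striped block groups. Walking the locations with the `bytesInThisBlock` arithmetic quoted from FileInputFormat then runs past the logical file length unless the parity locations are filtered out.

```java
import java.util.ArrayList;
import java.util.List;

public class EcSplitSketch {

    // Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation.
    // The "parity" field is NOT a real Hadoop field; it models the generic
    // EC information this issue asks BlockLocation to expose.
    static final class Loc {
        final long offset;    // start of the range, in bytes
        final long length;    // length of the range, in bytes
        final boolean parity; // hypothetical: true if the location holds parity data

        Loc(long offset, long length, boolean parity) {
            this.offset = offset;
            this.length = length;
            this.parity = parity;
        }
    }

    // Same arithmetic as the FileInputFormat line quoted in the issue.
    static long bytesInThisBlock(Loc[] blkLocations, int startIndex, long offset) {
        return blkLocations[startIndex].offset + blkLocations[startIndex].length - offset;
    }

    // Walks every location in order, advancing the offset by bytesInThisBlock,
    // and returns the offset reached. For data-only locations this equals the
    // logical file length.
    static long endOffset(Loc[] blkLocations) {
        long offset = 0;
        for (int i = 0; i < blkLocations.length; i++) {
            offset += bytesInThisBlock(blkLocations, i, offset);
        }
        return offset;
    }

    // Filters out parity locations using the hypothetical flag.
    static Loc[] dataOnly(Loc[] blkLocations) {
        List<Loc> out = new ArrayList<>();
        for (Loc loc : blkLocations) {
            if (!loc.parity) {
                out.add(loc);
            }
        }
        return out.toArray(new Loc[0]);
    }

    // Toy layout (an assumption, not real HDFS striping): three 100-byte data
    // blocks for a 300-byte file, followed by two 100-byte parity blocks whose
    // ranges lie past the end of the file.
    static Loc[] sampleLayout() {
        return new Loc[] {
            new Loc(0, 100, false),
            new Loc(100, 100, false),
            new Loc(200, 100, false),
            new Loc(300, 100, true),
            new Loc(400, 100, true),
        };
    }

    public static void main(String[] args) {
        Loc[] all = sampleLayout();
        System.out.println("logical file length: 300");
        System.out.println("end offset, all locations: " + endOffset(all));
        System.out.println("end offset, data only: " + endOffset(dataOnly(all)));
    }
}
```

Running `main` reports an end offset of 500 when every location is walked and 300 once the parity locations are dropped. With a flag like this on BlockLocation, the filtering is a one-line check instead of reading the EC policy and parsing its schema.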