Abdullah Alamoudi created HDFS-6607:
---------------------------------------

             Summary: DFSInputStream Seek performance improvement
                 Key: HDFS-6607
                 URL: https://issues.apache.org/jira/browse/HDFS-6607
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs-client, performance
    Affects Versions: 2.4.1
            Reporter: Abdullah Alamoudi
            Priority: Minor


When a DFSInputStream is open and a seek targets a position in the same block, the seek is performed efficiently if the target position is already in the TCP buffer: the stream simply consumes the intervening data. See line 1368 in the file: hadoop-common/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java.

However, if the target position is in the same block but beyond the data in the TCP buffer, the input stream performs a series of actions: it closes the current block reader, locates the block again, selects a datanode, and creates a new block reader. Many objects are created along the way, which makes this path very inefficient for users with random access needs (e.g., index access).
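
To make the two seek paths described above concrete, here is a minimal toy model in Java. It is not the Hadoop code: the class name SeekPathSketch, both method names, and the parameters (pos, target, blockEnd, buffered) are hypothetical, and treating backward seeks as the slow path is an assumption the report does not address.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model of the seek behavior described in this report; not Hadoop code. */
class SeekPathSketch {

    /** The slow-path actions, as listed in the report. */
    static final List<String> REOPEN_STEPS = Collections.unmodifiableList(Arrays.asList(
            "close current block reader",
            "locate block again",
            "select a datanode",
            "create new block reader"));

    /**
     * Returns true when the seek can be served by consuming buffered data.
     *
     * @param pos      current stream position
     * @param target   requested position
     * @param blockEnd offset of the last byte of the current block (inclusive)
     * @param buffered bytes already received and readable without a new reader
     */
    static boolean canSkipInBuffer(long pos, long target, long blockEnd, long buffered) {
        // Assumption: only forward seeks within the current block qualify.
        boolean sameBlockForward = pos <= target && target <= blockEnd;
        return sameBlockForward && (target - pos) <= buffered;
    }

    /** Lists the actions this model performs for a given seek. */
    static List<String> seekSteps(long pos, long target, long blockEnd, long buffered) {
        if (canSkipInBuffer(pos, target, blockEnd, buffered)) {
            // Fast path: consume the intervening bytes.
            return Collections.singletonList("skip " + (target - pos) + " buffered bytes");
        }
        // Slow path: taken even when the target is still inside the same block.
        return REOPEN_STEPS;
    }

    public static void main(String[] args) {
        // Target lies inside the buffered data: fast path.
        System.out.println(seekSteps(100, 600, 4095, 1024));
        // Same block, but past the buffered data: full reopen.
        System.out.println(seekSteps(100, 3000, 4095, 1024));
    }
}
```

In this model the second call stays within the same block (3000 <= 4095) yet still goes through all four reopen actions, which is the case this issue asks to improve.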