Abdullah Alamoudi created HDFS-6607:
---------------------------------------

             Summary: DFSInputStream Seek performance improvement
                 Key: HDFS-6607
                 URL: https://issues.apache.org/jira/browse/HDFS-6607
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs-client, performance
    Affects Versions: 2.4.1
            Reporter: Abdullah Alamoudi
            Priority: Minor


When a DFSInputStream is open and seek() targets a position in the current 
block, the seek is cheap if the target position is already in the TCP buffer: 
the stream simply reads and discards the intervening data. See line 1368 
in the file: 
hadoop-common/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java.
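
To illustrate what "eating up the intervening data" amounts to, here is a 
minimal standalone sketch of a forward seek implemented by discarding bytes 
from a plain java.io.InputStream. It is not the DFSInputStream code; the class 
SkipSeekSketch and the method skipForward are hypothetical names used only 
for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipSeekSketch {

    /**
     * Hypothetical helper: advances the stream by up to diff bytes by
     * discarding them, and returns how many bytes were actually discarded.
     */
    static long skipForward(InputStream in, long diff) throws IOException {
        long skipped = 0;
        while (skipped < diff) {
            long n = in.skip(diff - skipped);
            if (n <= 0) {
                // skip() may return 0 before end of stream, so probe with
                // a single-byte read to tell "no progress" from EOF.
                if (in.read() < 0) {
                    break; // end of stream reached
                }
                n = 1;
            }
            skipped += n;
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        InputStream in = new ByteArrayInputStream(data);
        long moved = skipForward(in, 40);
        // The next byte read is the one at offset 40.
        System.out.println(moved + " " + in.read()); // prints "40 40"
    }
}
```

No connection or reader is torn down here: the stream keeps its existing 
source and only its position moves forward.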

However, if the target position is in the same block but beyond the data in 
the TCP buffer, the input stream closes the current block reader, locates the 
block again, selects a datanode, and creates a new block reader. This 
allocates many objects and is very inefficient for users with random access 
needs (e.g., index access).
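
The two cases described above can be sketched as a small decision function. 
This is a simplified model under stated assumptions, not the actual 
DFSInputStream logic: SeekPathSketch, SeekPath, choosePath and all parameter 
names are hypothetical, and "buffered" stands in for the amount of data 
already available locally.

```java
public class SeekPathSketch {

    /** Hypothetical labels for the two behaviours described in this issue. */
    enum SeekPath { SKIP_IN_BUFFER, REOPEN_BLOCK_READER }

    /**
     * @param pos       current stream position
     * @param blockEnd  offset of the last byte of the current block (inclusive)
     * @param buffered  bytes already available locally (assumed known)
     * @param targetPos position requested by seek()
     */
    static SeekPath choosePath(long pos, long blockEnd, long buffered, long targetPos) {
        boolean sameBlockForward = pos <= targetPos && targetPos <= blockEnd;
        if (sameBlockForward && targetPos - pos <= buffered) {
            // Cheap case: discard the intervening bytes.
            return SeekPath.SKIP_IN_BUFFER;
        }
        // Costly case: close the reader, locate the block, pick a datanode,
        // and create a new block reader, even when the target lies in the
        // same block.
        return SeekPath.REOPEN_BLOCK_READER;
    }

    public static void main(String[] args) {
        long pos = 1_000;
        long blockEnd = 128L * 1024 * 1024 - 1; // assumed 128 MB block
        long buffered = 64 * 1024;              // assumed 64 KB buffered
        System.out.println(choosePath(pos, blockEnd, buffered, 5_000));
        System.out.println(choosePath(pos, blockEnd, buffered, 10_000_000));
    }
}
```

With these assumed sizes, the first seek falls inside the buffered data and 
the second, although still inside the same block, takes the costly path that 
this issue proposes to improve.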



--
This message was sent by Atlassian JIRA
(v6.2#6252)
