Zesheng Wu created HDFS-6596:
--------------------------------

             Summary: Improve InputStream when read spans two blocks
                 Key: HDFS-6596
                 URL: https://issues.apache.org/jira/browse/HDFS-6596
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs-client
    Affects Versions: 2.4.0
            Reporter: Zesheng Wu
            Assignee: Zesheng Wu

In the current implementation of DFSInputStream, read(buffer, offset, length) is implemented as follows:
{code}
int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
if (locatedBlocks.isLastBlockComplete()) {
  realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
}
int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
{code}
From the above code, we can conclude that a single read() call returns at most (blockEnd - pos + 1) bytes. As a result, when a read spans two blocks, the caller must call read() a second time to complete the request, and must wait a second time to acquire the DFSInputStream lock (read() is synchronized in DFSInputStream). For latency-sensitive applications such as HBase, this becomes a latency pain point under heavy contention for that lock. So we propose looping internally in read() to do a best-effort read.

The current implementation of pread (read(position, buffer, offset, length)) already loops internally to do a best-effort read, so we can refactor to support the same behavior in the normal read.
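
To illustrate the idea, here is a minimal, self-contained sketch of the difference between a block-bounded read and a best-effort read that loops under a single lock acquisition. This is not the DFSInputStream code: the class BlockedStreamSketch, its fixed blockSize, the in-memory data array, and the method names are all hypothetical stand-ins chosen for illustration.

```java
// Hypothetical stand-in for a block-based input stream; NOT the real DFSInputStream.
final class BlockedStreamSketch {
    private final byte[] data;    // whole "file" contents (assumption: held in memory)
    private final int blockSize;  // assumption: fixed block size
    private int pos = 0;          // current read position

    BlockedStreamSketch(byte[] data, int blockSize) {
        this.data = data;
        this.blockSize = blockSize;
    }

    // Mimics the behaviour described above: never reads past the end of the current block.
    private int readWithinBlock(byte[] buf, int off, int len) {
        if (pos >= data.length) {
            return -1; // end of file
        }
        int blockEnd = Math.min((pos / blockSize + 1) * blockSize, data.length) - 1;
        int realLen = Math.min(len, blockEnd - pos + 1);
        System.arraycopy(data, pos, buf, off, realLen);
        pos += realLen;
        return realLen;
    }

    // Current behaviour: one block-bounded read per lock acquisition.
    synchronized int readSingleBlock(byte[] buf, int off, int len) {
        if (len == 0) {
            return 0;
        }
        return readWithinBlock(buf, off, len);
    }

    // Proposed behaviour: keep reading across block boundaries while holding the lock once.
    synchronized int readBestEffort(byte[] buf, int off, int len) {
        if (len == 0) {
            return 0;
        }
        int total = 0;
        while (total < len) {
            int n = readWithinBlock(buf, off + total, len - total);
            if (n < 0) {
                break; // hit end of file; return what we have so far
            }
            total += n;
        }
        return total == 0 ? -1 : total;
    }
}
```

With a block size of 4 and the position at 2, a request for 6 bytes returns 2 bytes from readSingleBlock but all 6 from readBestEffort. A real change would also have to decide what to return if an IOException occurs after some bytes have already been copied; the sketch does not model that.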