Jing Zhao created HDFS-8797:
-------------------------------

             Summary: WebHdfsFileSystem creates too many connections for pread
                 Key: HDFS-8797
                 URL: https://issues.apache.org/jira/browse/HDFS-8797
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: webhdfs
            Reporter: Jing Zhao

While running a test we found that WebHdfsFileSystem can create several thousand connections when doing a positional read (pread) of a 200MB file. For each connection the client connects to the DataNode again, and the DataNode creates a new DFSClient instance to handle the read request. This also leads to several thousand {{getBlockLocations}} calls to the NameNode.

The cause of the issue is that in {{FSInputStream#read(long, byte[], int, int)}}, each time the input stream reads some data, it seeks back to the old position and resets its state to SEEK. Thus the next read has to re-establish the connection.

{code}
  public int read(long position, byte[] buffer, int offset, int length)
    throws IOException {
    synchronized (this) {
      long oldPos = getPos();
      int nread = -1;
      try {
        seek(position);
        nread = read(buffer, offset, length);
      } finally {
        seek(oldPos);
      }
      return nread;
    }
  }
{code}
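
To make the effect easier to see, here is a minimal, self-contained sketch of the pattern described above. It is a toy model, not the actual WebHDFS code: {{LazyConnectStream}}, {{connectionsOpened}}, {{preadWholeFile}} and {{readSequentially}} are hypothetical names, and the assumption that any {{seek}} to a different position drops the connection is a simplification of the SEEK state mentioned in the description.

```java
// Toy model only: none of these classes exist in Hadoop.
class PreadConnectionDemo {

    // A stream that opens a "connection" lazily on read and drops it
    // whenever seek() actually moves the position (assumed SEEK behaviour).
    static class LazyConnectStream {
        private final byte[] data;
        private long pos = 0;
        private boolean connected = false;
        int connectionsOpened = 0;

        LazyConnectStream(byte[] data) { this.data = data; }

        long getPos() { return pos; }

        void seek(long newPos) {
            if (newPos != pos) {
                pos = newPos;
                connected = false; // next read must reconnect
            }
        }

        int read(byte[] buf, int off, int len) {
            if (pos >= data.length) return -1;
            if (!connected) { connectionsOpened++; connected = true; }
            int n = (int) Math.min(len, data.length - pos);
            System.arraycopy(data, (int) pos, buf, off, n);
            pos += n;
            return n;
        }
    }

    // Same shape as the positional read quoted above: seek, read, seek back.
    static int pread(LazyConnectStream in, long position, byte[] buf, int off, int len) {
        long oldPos = in.getPos();
        try {
            in.seek(position);
            return in.read(buf, off, len);
        } finally {
            in.seek(oldPos);
        }
    }

    // Read the whole stream with consecutive preads; returns connections opened.
    static int preadWholeFile(LazyConnectStream in, int bufSize) {
        byte[] buf = new byte[bufSize];
        long position = 0;
        int n;
        while ((n = pread(in, position, buf, 0, buf.length)) > 0) {
            position += n;
        }
        return in.connectionsOpened;
    }

    // Read the whole stream sequentially; returns connections opened.
    static int readSequentially(LazyConnectStream in, int bufSize) {
        byte[] buf = new byte[bufSize];
        while (in.read(buf, 0, buf.length) > 0) { /* keep reading */ }
        return in.connectionsOpened;
    }

    public static void main(String[] args) {
        int fileLen = 1 << 20; // 1 MB stand-in for the 200MB file
        int bufSize = 4096;
        System.out.println("pread loop connections: "
            + preadWholeFile(new LazyConnectStream(new byte[fileLen]), bufSize));
        System.out.println("sequential connections: "
            + readSequentially(new LazyConnectStream(new byte[fileLen]), bufSize));
    }
}
```

In this model every pread pays for one connection, because the trailing {{seek(oldPos)}} always moves the position and so invalidates the connection that was just opened. With a 1 MB buffer and 4 KB reads the pread loop opens 256 connections, while the sequential read opens 1; scaling the same ratio up to a 200MB file is consistent with the several thousand connections reported above.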