Xiaobing Zhou created HDFS-8696:
-----------------------------------

             Summary: Small reads are blocked by large long running reads
                 Key: HDFS-8696
                 URL: https://issues.apache.org/jira/browse/HDFS-8696
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: webhdfs
    Affects Versions: 2.6.0
            Reporter: Xiaobing Zhou
            Assignee: Xiaobing Zhou
            Priority: Blocker

There is an issue that appears related to the webhdfs server. When making two concurrent requests, the DN will sometimes pause for extended periods (I've seen 1-300 seconds), killing performance and dropping connections.

To reproduce:
1. Set up an HDFS cluster.
2. Upload a large file (I was using 10GB). Perform 1-byte reads, writing the time out to /tmp/times.txt:
{noformat}
i=1
while true; do
  echo $i
  let i++
  /usr/bin/time -f %e -o /tmp/times.txt -a curl -s -L -o /dev/null "http://<namenode>:50070/webhdfs/v1/tmp/bigfile?op=OPEN&user.name=root&length=1"
done
{noformat}
3. Watch for 1-byte requests that take more than one second:
tail -F /tmp/times.txt | grep -E "^[^0]"
4. After it has had a chance to warm up, start doing large transfers from another shell:
{noformat}
i=1
while true; do
  echo $i
  let i++
  (/usr/bin/time -f %e curl -s -L -o /dev/null "http://<namenode>:50070/webhdfs/v1/tmp/bigfile?op=OPEN&user.name=root")
done
{noformat}

Within a minute or two, it is easy to see that small reads will sometimes pause for 1-300 seconds. In some extreme cases, it appears that the transfers time out and the DN drops the connection.
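
As a complement to watching the tail output in step 3, the collected timings can be summarized after a run. The sketch below is not part of the original report: summarize_times is a hypothetical helper name, and it assumes each sample in the file is a bare number of seconds, as written by -f %e. Any other line is skipped, since GNU time can add a "Command exited with non-zero status" line when curl fails.

```shell
# Hypothetical helper (not from the report): summarize elapsed times that
# "/usr/bin/time -f %e -o FILE -a" appends, one value in seconds per line.
summarize_times() {
  # $1: path to the times file
  # $2: threshold in seconds for a "slow" read (assumed default: 1)
  awk -v limit="${2:-1}" '
    # Only count lines that are a plain decimal number; skip anything else.
    /^[0-9]+(\.[0-9]+)?$/ {
      n++
      if ($1 + 0 >= limit) slow++
      if ($1 + 0 > max) max = $1 + 0
    }
    END { printf "samples=%d slow=%d max=%.2f\n", n, slow, max }
  ' "$1"
}
```

Running summarize_times /tmp/times.txt prints the number of samples, how many took at least one second, and the longest observed read. A different threshold can be passed as the second argument, for example summarize_times /tmp/times.txt 5.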