Carl Steinbach created HDFS-7175:
------------------------------------

             Summary: Client-side SocketTimeoutException during Fsck
                 Key: HDFS-7175
                 URL: https://issues.apache.org/jira/browse/HDFS-7175
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
            Reporter: Carl Steinbach


HDFS-2538 disabled status reporting for the fsck command by default (it can be 
re-enabled with the -showprogress option). We have observed that without status 
reporting, the client aborts with a read timeout:

{noformat}
[hdfs@lva1-hcl0030 ~]$ hdfs fsck / 
Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
14/09/30 06:03:41 WARN security.UserGroupInformation: PriviledgedActionException as:h...@grid.linkedin.com (auth:KERBEROS) cause:java.net.SocketTimeoutException: Read timed out
Exception in thread "main" java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
        at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
        at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
        at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
        at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
{noformat}

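Since the server sends nothing for the client to read while fsck runs, the client 
aborts whenever the fsck operation takes longer than the client's read timeout. 
As the trace shows, the client is still waiting for the HTTP response headers 
(HttpClient.parseHTTPHeader) when the timeout fires.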
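
A minimal, self-contained sketch of the failure mode, assuming a toy JDK HttpServer stands in for the NameNode's fsck servlet (the /fsck path, the HEALTHY body and the SilentServerDemo class are hypothetical, not Hadoop code). The server stays silent longer than the client's read timeout, so HttpURLConnection.getInputStream() throws SocketTimeoutException while parsing headers, just as DFSck.doWork does in the trace above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction: a server that writes nothing until its (simulated)
// fsck finishes, and a client with a finite read timeout.
class SilentServerDemo {
    /** Returns true if the client gave up with a SocketTimeoutException. */
    static boolean clientTimesOut(int serverDelayMs, int clientReadTimeoutMs) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/fsck", exchange -> {
            try {
                Thread.sleep(serverDelayMs); // long scan, nothing written to the wire
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "HEALTHY\n".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/fsck");
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            conn.setReadTimeout(clientReadTimeoutMs); // finite timeout, as in DFSck
            try (InputStream in = conn.getInputStream()) {
                while (in.read() != -1) {
                    // drain the report
                }
                return false;
            } catch (SocketTimeoutException e) {
                return true; // same exception as in the stack trace
            }
        } finally {
            server.stop(0); // blocks until the sleeping handler returns
        }
    }
}
```

With a 1.5 s silent scan and a 300 ms read timeout the client gives up; with no delay it reads the report normally.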

I can think of a few ways to fix this:
# Set an infinite read timeout on the client side (not a good idea!).
# Have the server periodically write (and flush) zeros to the wire and instruct 
the client to ignore these characters instead of echoing them.
# It's possible that flushing an empty buffer on the server side will trigger 
an HTTP response with a zero-length payload. This may be enough to keep the 
client from hanging up.
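
A rough sketch of the second option under the same toy setup, with hypothetical names throughout (HeartbeatDemo, the /fsck path, the NUL heartbeat byte and the interval are assumptions, not taken from the Hadoop code base). Because the trace shows the timeout firing inside parseHTTPHeader, the sketch commits the response headers before the long-running work starts; a background task then writes and flushes one NUL byte per interval, and the client drops those bytes instead of echoing them.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical heartbeat sketch: keep the connection busy while the work runs.
class HeartbeatDemo {
    static final int HEARTBEAT = 0; // assumed keep-alive byte ('\0'), ignored by the client

    static String fetchWithHeartbeat(int workMs, int heartbeatMs, int readTimeoutMs) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/fsck", exchange -> {
            // Length 0 selects chunked encoding; headers go out immediately.
            exchange.sendResponseHeaders(200, 0);
            OutputStream out = exchange.getResponseBody();
            Object lock = new Object();
            ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
            ticker.scheduleAtFixedRate(() -> {
                synchronized (lock) {
                    try {
                        out.write(HEARTBEAT);
                        out.flush(); // push the byte onto the wire now
                    } catch (IOException ignored) {
                        // client went away; the final write will fail as well
                    }
                }
            }, heartbeatMs, heartbeatMs, TimeUnit.MILLISECONDS);
            try {
                Thread.sleep(workMs); // stand-in for the long fsck scan
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            ticker.shutdown(); // cancels the periodic task without interrupting a write
            try {
                ticker.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            synchronized (lock) {
                out.write("HEALTHY\n".getBytes(StandardCharsets.UTF_8));
                out.close(); // ends the chunked response
            }
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/fsck");
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            conn.setReadTimeout(readTimeoutMs);
            ByteArrayOutputStream kept = new ByteArrayOutputStream();
            try (InputStream in = conn.getInputStream()) {
                int b;
                while ((b = in.read()) != -1) {
                    if (b != HEARTBEAT) {
                        kept.write(b); // keep real output, drop heartbeats
                    }
                }
            }
            return new String(kept.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            server.stop(0);
        }
    }
}
```

Here a 1.5 s scan with a 100 ms heartbeat stays well inside a 500 ms read timeout, so the client receives only the final report. The trade-off is that the client and server must agree on the filler byte, which is why the client has to be changed along with the server.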

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)