CR Hota created HDFS-14588:
------------------------------

             Summary: Client retries Standby NN continuously even if Active NN 
is available (WebHDFS)
                 Key: HDFS-14588
                 URL: https://issues.apache.org/jira/browse/HDFS-14588
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: CR Hota


This is a behavior we have observed in our HA setup of HDFS.
 # Active NN is up and serving traffic.
 # Standby NN is restarted for maintenance.
 # After step 2, all new clients (WebHDFS only) that connect to the Standby NN 
keep receiving RetriableException, because the Standby NN has not finished 
starting (the RPC server is not up yet while the FS image is loading) even 
though the HTTP server is already up and accepting traffic. This continues 
until the RPC server is up and the SNN knows that it is truly standby. How long 
this lasts depends on the start-up time, which is long (many minutes) for big 
clusters.
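
To make the steps above concrete, here is a toy model of the client side in 
Python. It is not the WebHDFS client code; the exception classes, 
make_namenode, and webhdfs_call are hypothetical stand-ins, and the only 
assumption is the behavior described in this report: a retriable error makes 
the client retry the same NN, while a standby error makes it fail over.

```python
class RetriableException(Exception):
    """Toy stand-in: the server asks the client to retry the same node."""


class StandbyException(Exception):
    """Toy stand-in: the server says it is standby, so the client fails over."""


def make_namenode(state):
    """Return a fake NN handler; state is 'active', 'starting' or 'standby'."""
    def handle(request):
        if state == "active":
            return f"ok:{request}"
        if state == "starting":
            # HTTP server is up, RPC server is not (FS image still loading).
            raise RetriableException("RPC server not up yet")
        raise StandbyException("node is standby")
    return handle


def webhdfs_call(namenodes, request, max_attempts=5):
    """Try a request, returning (result, list of NN indexes contacted)."""
    idx = 0
    contacted = []
    for _ in range(max_attempts):
        contacted.append(idx)
        try:
            return namenodes[idx](request), contacted
        except RetriableException:
            continue  # retry the same node
        except StandbyException:
            idx = (idx + 1) % len(namenodes)  # fail over to the next node
    return None, contacted


# A starting standby absorbs every attempt; the active NN is never contacted.
print(webhdfs_call([make_namenode("starting"), make_namenode("active")], "LISTSTATUS"))
# -> (None, [0, 0, 0, 0, 0])

# A node that reports standby is left after one attempt.
print(webhdfs_call([make_namenode("standby"), make_namenode("active")], "LISTSTATUS"))
# -> ('ok:LISTSTATUS', [0, 1])
```

In this model the first call uses up its whole retry budget on node 0 even 
though node 1 would have answered, which matches what we see in step 3.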

This behavior lowers the availability of HDFS for WebHDFS clients even though 
HDFS itself is still available.

Ideally, WebHDFS should throw a StandbyException (if HA is enabled) so that 
clients then connect to the Active NN. If the Active NN is also unavailable, 
clients will bounce between the NNs and automatically connect to the correct 
Active NN once it is available.
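
A minimal sketch of the server-side decision proposed above, in Python. The 
function name and its two flags are hypothetical and do not correspond to 
actual NameNode code; the non-HA branch assumes the current retriable behavior 
would be kept there.

```python
def exception_while_starting(ha_enabled, rpc_server_up):
    """Pick which error a WebHDFS request should get during NN start-up.

    Returns the name of the exception to throw, or None when the request
    can be served normally. Hypothetical helper for illustration only.
    """
    if rpc_server_up:
        return None  # normal request handling applies
    if ha_enabled:
        # Proposed: tell the client to fail over instead of retrying here.
        return "StandbyException"
    # Assumption: without HA there is no other NN, so keep asking for a retry.
    return "RetriableException"


for ha, rpc in [(True, False), (False, False), (True, True)]:
    print(ha, rpc, exception_while_starting(ha, rpc))
```

With HA enabled, a client that hits the starting NN would then move on to the 
other NN right away instead of waiting out the FS image load.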



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
