Kihwal Lee created HDFS-8995:
--------------------------------

             Summary: Flaw in registration bookkeeping can make DN die on 
reconnect
                 Key: HDFS-8995
                 URL: https://issues.apache.org/jira/browse/HDFS-8995
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Kihwal Lee
            Priority: Critical


Normally, datanodes re-register with the namenode when it has been unreachable 
for longer than the heartbeat expiration interval and then becomes reachable 
again. A datanode keeps retrying RPC calls such as incremental block reports 
and heartbeats, and when one finally gets through, the namenode tells the 
datanode to re-register.
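
For illustration only, the expected re-registration flow described above can 
be sketched as a toy model. This is not HDFS code: ToyNameNode, ToyDataNode, 
the Command enum, and the 1000 ms expiry are hypothetical names and values 
chosen for the sketch, and time is passed in explicitly instead of using real 
RPCs or clocks.

```java
import java.util.HashMap;
import java.util.Map;

public class ReRegistrationSketch {
    // Hypothetical reply a namenode might attach to a heartbeat response.
    enum Command { NONE, REGISTER }

    /** Toy namenode: remembers the last heartbeat time of each registered datanode. */
    static class ToyNameNode {
        private final long expiryMillis;
        private final Map<String, Long> lastHeartbeat = new HashMap<>();

        ToyNameNode(long expiryMillis) {
            this.expiryMillis = expiryMillis;
        }

        void register(String dnId, long now) {
            lastHeartbeat.put(dnId, now);
        }

        /** Returns REGISTER if the datanode is unknown or its heartbeat has expired. */
        Command heartbeat(String dnId, long now) {
            Long last = lastHeartbeat.get(dnId);
            if (last == null || now - last > expiryMillis) {
                lastHeartbeat.remove(dnId);   // treat the node as dead
                return Command.REGISTER;      // ask it to register again
            }
            lastHeartbeat.put(dnId, now);
            return Command.NONE;
        }
    }

    /** Toy datanode: re-registers whenever the namenode asks it to. */
    static class ToyDataNode {
        final String id;
        int registrations = 0;

        ToyDataNode(String id) {
            this.id = id;
        }

        void register(ToyNameNode nn, long now) {
            nn.register(id, now);
            registrations++;
        }

        void sendHeartbeat(ToyNameNode nn, long now) {
            if (nn.heartbeat(id, now) == Command.REGISTER) {
                register(nn, now);
            }
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode(1000L);
        ToyDataNode dn = new ToyDataNode("dn-1");

        dn.register(nn, 0L);          // initial registration
        dn.sendHeartbeat(nn, 500L);   // within expiry: nothing happens
        dn.sendHeartbeat(nn, 5000L);  // after a long outage: told to re-register
        dn.sendHeartbeat(nn, 5500L);  // healthy again

        System.out.println("registrations = " + dn.registrations);
    }
}
```

In this model the datanode always recovers after an outage. The bug reported 
here concerns cases where the real datanode is told to shut down instead.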

We have observed some datanodes staying dead in such scenarios. Further 
investigation revealed that they were told to shut down by the namenode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
