[ 
https://issues.apache.org/jira/browse/HDFS-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula resolved HDFS-8221.
----------------------------------------
    Resolution: Duplicate

> HDFS have two Standby NNs because ActiveStandbyElectorLock ephemeralOwner in 
> ZK is different with the sessionId stored in ZKFC
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8221
>                 URL: https://issues.apache.org/jira/browse/HDFS-8221
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: auto-failover
>    Affects Versions: 2.4.1
>            Reporter: shenxingfeng
>
> Initially, NN1 is active and NN2 is standby. When NN1 becomes standby for
> some reason, NN2 takes over the active state immediately. But after
> becoming active, NN2 changes back to standby, and HDFS is left with two
> standby NNs indefinitely.
> After checking the log, I found that NN2 became standby because its
> session ID did not match the ActiveStandbyElectorLock ephemeralOwner
> stored in the znode.
> The root cause is that when NN1 goes to standby, NN2 creates a session A
> with ZK and becomes active. Ideally, NN2's session ID should equal the
> ActiveStandbyElectorLock ephemeralOwner stored in the znode, but a network
> problem can cause the session ID of NN2's ZKFC to change.
> So I think that when NN2 becomes standby because of a session ID mismatch,
> NN2 should release the lock in the znode so that failover can happen again.
> ActiveStandbyElector.processResult
> ==================
>     Code code = Code.get(rc);
>     if (isSuccess(code)) {
>       // the following owner check completes verification in case the lock
>       // znode creation was retried
>       if (stat.getEphemeralOwner() == zkClient.getSessionId()) {
>         // we own the lock znode. so we are the leader
>         if (!becomeActive()) {
>           reJoinElectionAfterFailureToBecomeActive();
>         }
>       } else {
>         // we dont own the lock znode. so we are a standby.
>         becomeStandby();
>       }
>       // the watch set by us will notify about changes
>       return;
>     }
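
The owner check quoted above, together with the suggestion to release the lock on a session ID mismatch, can be sketched in isolation as follows. This is only an illustration, not Hadoop's code and not the fix tracked by the duplicate issue: the class LockOwnerCheck, the Action enum, and the lockNamesThisNode flag are hypothetical names. Treating "the lock data names this node, but the owner session differs" as a stale lock is an assumption about what the reporter proposes, since a plain mismatch is also the normal situation for any standby.

```java
public class LockOwnerCheck {

    /** Possible outcomes of the ownership check (illustrative names only). */
    public enum Action { BECOME_ACTIVE, BECOME_STANDBY, RELEASE_STALE_LOCK_AND_REJOIN }

    /**
     * Decide what to do after reading the lock znode's stat.
     *
     * @param ephemeralOwner    session ID recorded as owner of the lock znode
     * @param sessionId         session ID of this client's current ZooKeeper session
     * @param lockNamesThisNode whether the lock data identifies this node
     *                          (hypothetical input, not part of the quoted code)
     */
    public static Action decide(long ephemeralOwner, long sessionId,
                                boolean lockNamesThisNode) {
        if (ephemeralOwner == sessionId) {
            // Same check as in the quoted snippet: we own the lock znode.
            return Action.BECOME_ACTIVE;
        }
        if (lockNamesThisNode) {
            // Assumed stale-lock case: the znode was created by an earlier
            // session of this same node, so nobody is actually active.
            return Action.RELEASE_STALE_LOCK_AND_REJOIN;
        }
        // Another node holds the lock: ordinary standby.
        return Action.BECOME_STANDBY;
    }

    public static void main(String[] args) {
        long owner = 0x164cb2b3e4b36ae4L;   // ephemeralOwner from the report
        long changedSession = owner + 1;    // made-up value for a changed session
        System.out.println(decide(owner, owner, true));
        System.out.println(decide(owner, changedSession, true));
        System.out.println(decide(owner, changedSession, false));
    }
}
```

ZooKeeper deletes an ephemeral znode once the session that created it closes or expires, so the stale-lock branch would only matter while the server still considers the old session alive.
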
> ActiveStandbyElectorLock content
> ==================
> [zk: 160.149.0.114:24002(CONNECTED) 1] get /hadoop-ha/hacluster/ActiveStandbyElectorLock
> 160-149-0-117 [non-printable bytes]
> cZxid = 0x2000a38d9
> ctime = Thu Apr 16 11:32:54 CST 2015
> mZxid = 0x2000a38d9
> mtime = Thu Apr 16 11:32:54 CST 2015
> pZxid = 0x2000a38d9
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x164cb2b3e4b36ae4
> dataLength = 38
> numChildren = 0
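
To check for the mismatch described in this report, the ephemeralOwner value printed above can be compared with the session ID of the ZKFC's current ZooKeeper session. The helper below is a minimal sketch of that comparison; the class and method names are hypothetical, and the "current session" value in main is made up for illustration.

```java
public class SessionIdCompare {

    /** Parse a session ID written as hex, with or without a leading "0x". */
    public static long parseSessionId(String text) {
        String hex = text.trim();
        if (hex.startsWith("0x") || hex.startsWith("0X")) {
            hex = hex.substring(2);
        }
        // Session IDs are 64-bit values; parse as unsigned so that IDs with
        // the top bit set do not overflow.
        return Long.parseUnsignedLong(hex, 16);
    }

    /** True if the lock znode's owner is the given session. */
    public static boolean ownsLock(String ephemeralOwnerText, long sessionId) {
        return parseSessionId(ephemeralOwnerText) == sessionId;
    }

    public static void main(String[] args) {
        String ephemeralOwner = "0x164cb2b3e4b36ae4"; // from the znode stat above
        long currentSession = 0x164cb2b3e4b36ae5L;    // made-up value for illustration
        System.out.println("owns lock: " + ownsLock(ephemeralOwner, currentSession));
        System.out.println("lock owner: 0x"
                + Long.toHexString(parseSessionId(ephemeralOwner)));
    }
}
```
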



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
