[jira] [Resolved] (HDFS-3908) In HA mode, when there is a ledger in BK missing, which is generated after the last checkpoint, NN can not restore it.

Brahma Reddy Battula (JIRA) Thu, 06 Oct 2016 22:47:01 -0700

     [ https://issues.apache.org/jira/browse/HDFS-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Brahma Reddy Battula resolved HDFS-3908.
----------------------------------------
    Resolution: Won't Fix

Closing as Won't Fix as per the discussion 
[here|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201607.mbox/%[email protected]%3E]
 and because it is no longer required after HDFS-10957.

> In HA mode, when there is a ledger in BK missing, which is generated after 
> the last checkpoint, NN can not restore it.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3908
>                 URL: https://issues.apache.org/jira/browse/HDFS-3908
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.1-alpha
>            Reporter: Han Xiao
>
> Without HA, when more than one edits.dir is configured, a missing edit log
> file in one dir does not cause a problem, because the replica in another dir
> is used instead. However, in HA mode (using BK as the shared storage), if a
> ledger is missing, it is not restored when the NN starts, even if the
> corresponding edit log file exists in the local dir.
> The ledger stays missing while the NN is in standby state. When the NN
> enters active state, it reads the local edit log file corresponding to the
> missing ledger. Unfortunately, the BK ledger that follows the missing one
> cannot be read at this point, because of the gap.
> Therefore, in the following situation, the edit log cannot be restored even
> though each edit log segment exists either in BK or in the local dir:
> 1. fsimage file: fsimage_0000000000000005946.md5
> 2. Ledgers in ZK:
>       [zk: localhost:2181(CONNECTED) 0] ls /hdfsEdit/ledgers/edits_00000000000000594
>       edits_000000000000005941_000000000000005942
>       edits_000000000000005943_000000000000005944
>       edits_000000000000005945_000000000000005946
>       edits_000000000000005949_000000000000005949   
> (missing edits_000000000000005947_000000000000005948)
> 3. Edit log files in the local edits dir:
>       -rw-r--r-- 1 root root      30 Sep  8 03:24 edits_0000000000000005947-0000000000000005948
>       -rw-r--r-- 1 root root 1048576 Sep  8 03:35 edits_0000000000000005950-0000000000000005950
>       -rw-r--r-- 1 root root 1048576 Sep  8 04:42 edits_0000000000000005951-0000000000000005951
>       (missing edits_0000000000000005949-0000000000000005949)
> 4. And the seen_txid:
>       vm2:/tmp/hadoop-root/dfs/name/current # cat seen_txid
>       5949
> Here, we want to restore the edit log from txid 5946 (image) to txid
> 5949 (seen_txid). Transactions 5947-5948 are missing in BK, and 5949-5949 is
> missing in the local dir.
> When the NN is started, the following exception is thrown:
> 2012-09-08 06:26:10,031 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Error encountered requiring NN shutdown. Shutting down immediately.
> java.io.IOException: There appears to be a gap in the edit log.  We expected txid 5949, but got txid 5950.
>         at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:163)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:692)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:223)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:182)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:599)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1325)
>         at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
>         at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
>         at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1233)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:990)
>         at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>         at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:924)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> 2012-09-08 06:26:10,036 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at vm2/160.161.0.155
> ************************************************************/
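
The scenario in steps 1-4 and the resulting exception can be modeled with a small, self-contained sketch. It is illustrative only: `Segment`, `firstGap` and `replay` are hypothetical names, not Hadoop's JournalManager, EditLogInputStream or FSEditLogLoader API. It assumes each segment is a finalized, inclusive txid range, and, for the replay part, that the loader ends up reading only the local segments (as the report describes), so BK's 5949-5949 ledger is never reached. Records and `toList()` need Java 16 or later.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.OptionalLong;

// Illustrative model only: Segment, firstGap and replay are hypothetical names,
// not Hadoop's JournalManager, EditLogInputStream or FSEditLogLoader API.
public class EditLogGapSketch {

    /** A finalized edit log segment holding txids [first, last], inclusive. */
    public record Segment(long first, long last) {}

    /**
     * Returns the first txid in [fromTxId, toTxId] that none of the segments
     * covers, or empty if the segments cover the whole range.
     */
    public static OptionalLong firstGap(List<Segment> segments, long fromTxId, long toTxId) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingLong(Segment::first));
        long next = fromTxId; // lowest txid not yet covered
        for (Segment s : sorted) {
            if (s.last() < next) continue; // ends before the txid we need
            if (s.first() > next) break;   // starts after it: hole at 'next'
            next = s.last() + 1;           // this segment extends coverage
        }
        return next > toTxId ? OptionalLong.empty() : OptionalLong.of(next);
    }

    /**
     * Replays the segments in the given order, starting at expectedTxId, and
     * fails on the first skipped txid, reusing the message text from the log.
     * Returns the last txid applied (expectedTxId - 1 if nothing was applied).
     */
    public static long replay(List<Segment> segments, long expectedTxId) throws IOException {
        long expected = expectedTxId;
        for (Segment s : segments) {
            for (long txid = s.first(); txid <= s.last(); txid++) {
                if (txid < expected) continue; // already applied (overlapping segment)
                if (txid > expected) {
                    throw new IOException("There appears to be a gap in the edit log.  "
                            + "We expected txid " + expected + ", but got txid " + txid + ".");
                }
                expected++; // applied; the next txid must follow directly
            }
        }
        return expected - 1;
    }

    public static void main(String[] args) {
        // Step 2: ledgers in BK (5947-5948 missing).
        List<Segment> bk = List.of(new Segment(5941, 5942), new Segment(5943, 5944),
                new Segment(5945, 5946), new Segment(5949, 5949));
        // Step 3: local edit log files (5949-5949 missing).
        List<Segment> local = List.of(new Segment(5947, 5948),
                new Segment(5950, 5950), new Segment(5951, 5951));
        List<Segment> union = new ArrayList<>(bk);
        union.addAll(local);

        long from = 5946 + 1; // first txid after fsimage_0000000000000005946
        long to = 5949;       // seen_txid
        System.out.println("BK only:    first gap " + firstGap(bk, from, to));
        System.out.println("local only: first gap " + firstGap(local, from, to));
        System.out.println("BK + local: first gap " + firstGap(union, from, to));

        try {
            replay(local, from); // reading only the local files hits the hole at 5949
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, `firstGap` finds a hole at 5947 for BK alone and at 5949 for the local dir alone, but no hole for the union of both, which is the report's point: every transaction between the image and seen_txid exists somewhere. Replaying only the local segments fails with the same "We expected txid 5949, but got txid 5950." text as the NameNode log.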



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
