[
https://issues.apache.org/jira/browse/HDFS-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brahma Reddy Battula resolved HDFS-3908.
----------------------------------------
Resolution: Won't Fix
Closing as Won't Fix as per the discussion
[here|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201607.mbox/%[email protected]%3E]
and it is no longer required after HDFS-10957.
> In HA mode, when there is a ledger in BK missing, which is generated after
> the last checkpoint, NN can not restore it.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-3908
> URL: https://issues.apache.org/jira/browse/HDFS-3908
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 2.0.1-alpha
> Reporter: Han Xiao
>
> In non-HA mode, when more than one edits.dir is configured, a missing editlog
> file in one dir does not cause a problem, because a replica exists in the
> other dir.
> However, in HA mode (using BK as shared storage), if a ledger is missing, it
> is not restored during NN startup, even if the corresponding editlog file
> exists in the local dir.
> The ledger stays missing while the NN is in standby state. When the NN
> enters active state, it reads the editlog file (corresponding to the missing
> ledger) from the local dir. Unfortunately, the ledger after the missing one
> in BK cannot be read at that phase (because of the gap).
> Therefore, in the following situation the editlogs cannot be restored, even
> though each editlog segment exists either in BK or in the local dir:
> 1、fsimage file: fsimage_0000000000000005946.md5
> 2、ledgers in zk:
> [zk: localhost:2181(CONNECTED) 0] ls /hdfsEdit/ledgers/edits_00000000000000594
> edits_000000000000005941_000000000000005942
> edits_000000000000005943_000000000000005944
> edits_000000000000005945_000000000000005946
> edits_000000000000005949_000000000000005949
> (missing edits_000000000000005947_000000000000005948)
> 3、editlog in local editlog dir:
> -rw-r--r-- 1 root root 30 Sep 8 03:24 edits_0000000000000005947-0000000000000005948
> -rw-r--r-- 1 root root 1048576 Sep 8 03:35 edits_0000000000000005950-0000000000000005950
> -rw-r--r-- 1 root root 1048576 Sep 8 04:42 edits_0000000000000005951-0000000000000005951
> (missing edits_0000000000000005949-0000000000000005949)
> 4、the seen_txid:
> vm2:/tmp/hadoop-root/dfs/name/current # cat seen_txid
> 5949
> Here, we want to restore the editlog from txid 5946 (image) to txid
> 5949 (seen_txid). Segment 5947-5948 is missing in BK, and 5949-5949 is
> missing in the local dir.
> When the NN is started, the following exception is thrown:
> 2012-09-08 06:26:10,031 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Error encountered requiring NN shutdown. Shutting down immediately.
> java.io.IOException: There appears to be a gap in the edit log. We expected txid 5949, but got txid 5950.
> at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:163)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:692)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:223)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:182)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:599)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1325)
> at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
> at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1233)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:990)
> at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
> at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:924)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> 2012-09-08 06:26:10,036 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at vm2/160.161.0.155
> ************************************************************/
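
For readers following the txid arithmetic in the quoted report, here is a minimal Python sketch of the gap check. It is an illustration only, not the NameNode's actual code: the helper names parse_segment and first_gap and the two lists are hypothetical, and the segment names are copied from the listings quoted above. It shows that each storage alone has a gap between the image (5946) and seen_txid (5949), while the union of both would cover the range.

```python
import re
from typing import Iterable, Optional, Tuple

Segment = Tuple[int, int]  # (first_txid, last_txid), both inclusive

# Segment names as they appear in the quoted report:
# edits_<first>_<last> in the zk listing, edits_<first>-<last> in the local dir.
_SEGMENT_RE = re.compile(r"^edits_(\d+)[-_](\d+)$")

# Hypothetical constants, copied from the listings in the report.
BK_LEDGERS = [
    "edits_000000000000005941_000000000000005942",
    "edits_000000000000005943_000000000000005944",
    "edits_000000000000005945_000000000000005946",
    "edits_000000000000005949_000000000000005949",
]
LOCAL_FILES = [
    "edits_0000000000000005947-0000000000000005948",
    "edits_0000000000000005950-0000000000000005950",
    "edits_0000000000000005951-0000000000000005951",
]


def parse_segment(name: str) -> Segment:
    """Extract (first_txid, last_txid) from a finalized segment name."""
    m = _SEGMENT_RE.match(name)
    if m is None:
        raise ValueError(f"not a finalized edit segment: {name!r}")
    return int(m.group(1)), int(m.group(2))


def first_gap(segments: Iterable[Segment], start: int, end: int) -> Optional[int]:
    """Return the first txid in [start, end] that no segment covers, or None."""
    expected = start
    for first, last in sorted(segments):
        if expected > end:
            break                 # the requested range is fully covered
        if last < expected:
            continue              # segment lies before the txid we still need
        if first > expected:
            return expected       # nothing covers `expected`: this is the gap
        expected = last + 1
    return expected if expected <= end else None


bk = [parse_segment(n) for n in BK_LEDGERS]
local = [parse_segment(n) for n in LOCAL_FILES]

# The image is at txid 5946 and seen_txid is 5949, so 5947..5949 must be replayed.
gap_bk = first_gap(bk, 5947, 5949)             # 5947: ledger 5947-5948 is missing
gap_local = first_gap(local, 5947, 5949)       # 5949: matches "expected txid 5949"
gap_both = first_gap(bk + local, 5947, 5949)   # None: the union covers the range
```

Under these assumptions, the gap found in the local dir alone (5949, with 5950 as the next available segment) corresponds to the "We expected txid 5949, but got txid 5950" message in the log above.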