[ https://issues.apache.org/jira/browse/HDFS-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stephen Chu resolved HDFS-4281.
-------------------------------
    Resolution: Duplicate

Yes, I believe it is. Marking as duplicate.

> NameNode recovery does not detect NN RPC address on HA cluster
> --------------------------------------------------------------
>
>                 Key: HDFS-4281
>                 URL: https://issues.apache.org/jira/browse/HDFS-4281
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Stephen Chu
>         Attachments: core-site.xml, hdfs-site.xml, nn_recover
>
>
> On a shut-down HA cluster, I ran "hdfs namenode -recover" and encountered:
> {code}
> You have selected Metadata Recovery mode. This mode is intended to recover
> lost metadata on a corrupt filesystem. Metadata recovery mode often
> permanently deletes data from your HDFS filesystem. Please back up
> your edit log and fsimage before trying this!
> Are you ready to proceed? (Y/N)
>  (Y or N) Y
> 12/12/05 16:43:48 INFO namenode.MetaRecoveryContext: starting recovery...
> 12/12/05 16:43:48 WARN common.Util: Path /dfs/nn should be specified as a URI in configuration files. Please update hdfs configuration.
> 12/12/05 16:43:48 WARN common.Util: Path /dfs/nn should be specified as a URI in configuration files. Please update hdfs configuration.
> 12/12/05 16:43:48 WARN namenode.FSNamesystem: Only one image storage directory (dfs.namenode.name.dir) configured. Beware of dataloss due to lack of redundant storage directories!
> 12/12/05 16:43:48 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> 12/12/05 16:43:48 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
> 12/12/05 16:43:48 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=true
> 12/12/05 16:43:48 INFO blockmanagement.BlockManager: dfs.block.access.key.update.interval=600 min(s), dfs.block.access.token.lifetime=600 min(s), dfs.encrypt.data.transfer.algorithm=null
> 12/12/05 16:43:48 INFO namenode.MetaRecoveryContext: RECOVERY FAILED: caught exception
> java.lang.IllegalStateException: Could not determine own NN ID in namespace 'ha-nn-uri'. Please ensure that this node is one of the machines listed as an NN RPC address, or configure dfs.ha.namenode.id
> 	at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
> 	at org.apache.hadoop.hdfs.HAUtil.getNameNodeIdOfOtherNode(HAUtil.java:155)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createBlockTokenSecretManager(BlockManager.java:323)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.<init>(BlockManager.java:239)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:451)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:416)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:386)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1063)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1135)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
> 12/12/05 16:43:48 FATAL namenode.NameNode: Exception in namenode join
> java.lang.IllegalStateException: Could not determine own NN ID in namespace 'ha-nn-uri'. Please ensure that this node is one of the machines listed as an NN RPC address, or configure dfs.ha.namenode.id
> 	at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
> 	at org.apache.hadoop.hdfs.HAUtil.getNameNodeIdOfOtherNode(HAUtil.java:155)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createBlockTokenSecretManager(BlockManager.java:323)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.<init>(BlockManager.java:239)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:451)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:416)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:386)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1063)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1135)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
> 12/12/05 16:43:48 INFO util.ExitUtil: Exiting with status 1
> 12/12/05 16:43:48 INFO namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at cs-10-20-193-228.cloud.cloudera.com/10.20.193.228
> ************************************************************/
> {code}
> The exception message says:
> {code}
> Please ensure that this node is one of the machines listed as an NN RPC address, or configure dfs.ha.namenode.id
> {code}
> I ran the recover command from a machine listed as an NN RPC address:
> {code}
> <property>
>   <name>dfs.namenode.rpc-address.ha-nn-uri.nn1</name>
>   <value>cs-10-20-193-228.cloud.cloudera.com:17020</value>
> </property>
> {code}
> Setting dfs.ha.namenode.id allows me to proceed. If dfs.ha.namenode.id always needs to be specified for recovery, then we should update the exception message accordingly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
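
Editor's note: the workaround referenced in the issue (explicitly setting dfs.ha.namenode.id) can be sketched as the following hdfs-site.xml fragment. This is a minimal illustration, not text from the issue; the value "nn1" is an assumption and must match the NameNode ID suffix of the dfs.namenode.rpc-address.* entry that points at this host.

{code}
<!-- hdfs-site.xml: pin this node's NameNode ID for the 'ha-nn-uri' nameservice.
     "nn1" is assumed here; use whichever ID's rpc-address entry
     (e.g. dfs.namenode.rpc-address.ha-nn-uri.nn1) names this machine. -->
<property>
  <name>dfs.ha.namenode.id</name>
  <value>nn1</value>
</property>
{code}

With this property set, "hdfs namenode -recover" no longer needs to infer the local NN ID from the RPC address list, which is the inference that fails in the report above.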