That's not supposed to happen. What version of Hadoop are you using? Please file a jira with details including how the namenodes are configured.
For the recovery: First and foremost, do not shut down the active namenode. Put it into safe mode and issue a saveNamespace command to create a checkpoint. Then use the bootstrapStandby command to re-initialize the standby.

Hope it helps.
Kihwal

On Tue, Aug 27, 2019 at 7:12 AM Lionel CL <whuca...@outlook.com> wrote:

> Hi committee,
> We encountered a NN error as below,
> The primary NN was shut down last Thursday and we recover it by remove
> some OP in the edit log.. But today the standby NN was shut down again by
> the same error...
> could you pls help address the possible root cause?
>
> 2019-08-27 09:51:14,075 ERROR
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered
> exception on operation CloseOp [length=0, inodeId=0,
> path=/******/v2-data-20190826.data, replication=2, mtime=1566870616821,
> atime=1566870359230, blockSize=134217728, blocks=[blk_1270599798_758966421,
> blk_1270599852_758967928, blk_1270601282_759026903,
> blk_1270602443_759027052, blk_1270602446_759061086,
> blk_1270603081_759050235], permissions=smc_ss:smc_ss:rw-r--r--,
> aclEntries=null, clientName=, clientMachine=, overwrite=false,
> storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE,
> txid=4359520942]
> java.io.IOException: Mismatched block IDs or generation stamps, attempting
> to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as
> block # 4/6 of /******/v2-data-20190826.mayfly.data
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1096)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:452)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:869)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
> 2019-08-27 09:51:14,077 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write
> lock held for 11714 ms via
>
>
> Thanks & Best Regards,
> Lionel Cao
>
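
For reference, the recovery sequence described in the reply (safe mode, saveNamespace, bootstrapStandby) could be scripted roughly as below. This is only a sketch under assumptions: the helper names `recovery_plan` and `run_plan` are made up for illustration, the commands are assumed to be run as the HDFS superuser, the first two on a host that talks to the active namenode and the last one on the standby host with the standby namenode process stopped. It only prints the commands unless `execute=True` is passed.

```python
import shlex
import subprocess

# Commands corresponding to the advice in the reply.
# Steps for the active namenode: enter safe mode, then checkpoint.
ACTIVE_STEPS = [
    ["hdfs", "dfsadmin", "-safemode", "enter"],
    ["hdfs", "dfsadmin", "-saveNamespace"],
]
# Step for the standby host: re-initialize its metadata from the active.
STANDBY_STEPS = [
    ["hdfs", "namenode", "-bootstrapStandby"],
]


def recovery_plan():
    """Return (role, command) pairs in the order they should be run."""
    plan = [("active", cmd) for cmd in ACTIVE_STEPS]
    plan += [("standby", cmd) for cmd in STANDBY_STEPS]
    return plan


def run_plan(plan, role, execute=False):
    """Print the steps for the given role; run them only if execute=True.

    Returns the list of commands that were selected for this role.
    """
    selected = []
    for host_role, cmd in plan:
        if host_role != role:
            continue
        print(f"[{host_role}] {shlex.join(cmd)}")
        if execute:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)
        selected.append(cmd)
    return selected


if __name__ == "__main__":
    # Dry run: show what would be executed on the active side.
    run_plan(recovery_plan(), role="active")
```

Run the "active" steps first and confirm the checkpoint succeeded before touching the standby. Safe mode that was entered manually stays on until `hdfs dfsadmin -safemode leave` is issued. Depending on the Hadoop version and HA configuration, the dfsadmin commands may need the generic `-fs` option to address the active namenode specifically.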