Cao, Lionel created HDFS-14787:
----------------------------------

             Summary: [Help] NameNode error
                 Key: HDFS-14787
                 URL: https://issues.apache.org/jira/browse/HDFS-14787
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 3.0.0
            Reporter: Cao, Lionel
         Attachments: core-site.xml, hadoop-cmf-hdfs-NAMENODE-smc-nn02.jq.log.out.20190827, hdfs-site.xml, move&concat.java, rt-Append.txt
Hi committers,

We encountered a NameNode error as shown below. The primary NN was shut down last Thursday, and we recovered it by removing some operations from the edit log. But the standby NN was shut down again yesterday with the same error. Could you please help us identify the possible root cause?

An excerpt of the error log is included below; for the full log and the NameNode configuration, please refer to the attachments. I have also attached the Java code that may have triggered the error:
# We do some append operations in a Spark Streaming program (rt-Append.txt), which caused the primary NN shutdown last Thursday.
# We do some move & concat operations in a data conversion program (move&concat.java), which caused the standby NN shutdown yesterday. A rough sketch of what these calls look like is included after the log excerpt below.

Error log excerpt:

2019-08-27 09:51:12,409 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 766146/953617 transactions completed. (80%)
2019-08-27 09:51:12,858 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/__spark_libs__2381992047634476351.zip
2019-08-27 09:51:12,870 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/oozietest2-0.0.1-SNAPSHOT.jar
2019-08-27 09:51:12,898 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/__spark_conf__.zip
2019-08-27 09:51:12,910 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20424/__spark_libs__8875310030853528804.zip
2019-08-27 09:51:12,927 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20424/__spark_conf__.zip
2019-08-27 09:51:13,777 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 857745/953617 transactions completed. (90%)
2019-08-27 09:51:14,035 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smc_ss/.sparkStaging/application_1561429828507_20425/__spark_libs__7422229681005558653.zip
2019-08-27 09:51:14,067 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smc_ss/.sparkStaging/application_1561429828507_20426/__spark_libs__7479542421029947753.zip
2019-08-27 09:51:14,070 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20428/__spark_libs__7647933078788028649.zip
2019-08-27 09:51:14,075 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/smc/tboxng/prod/zxq_hbd_tbox_signal_channel/ods/vin_hash=33/v2-data-20190826.mayfly.data, replication=2, mtime=1566870616821, atime=1566870359230, blockSize=134217728, blocks=[blk_1270599798_758966421, blk_1270599852_758967928, blk_1270601282_759026903, blk_1270602443_759027052, blk_1270602446_759061086, blk_1270603081_759050235], permissions=smc_ss:smc_ss:rw-r--r--, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=4359520942]
java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /smc/tboxng/prod/zxq_hbd_tbox_signal_channel/ods/vin_hash=33/v2-data-20190826.mayfly.data
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1096)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:452)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:869)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
2019-08-27 09:51:14,077 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 11714 ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:261)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:218)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1630)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:309)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
	Number of suppressed write-lock reports: 0
	Longest write-lock held interval: 11714
2019-08-27 09:51:14,077 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Block report queue is full
2019-08-27 09:51:14,077 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN.
java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /smc/tboxng/prod/zxq_hbd_tbox_signal_channel/ods/vin_hash=33/v2-data-20190826.mayfly.data
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1096)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:452)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:869)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
2019-08-27 09:51:14,105 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /smc/tboxng/prod/zxq_hbd_tbox_signal_channel/ods/vin_hash=33/v2-data-20190826.mayfly.data
2019-08-27 09:51:14,118 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at smc-nn02.jq/10.129.148.13
************************************************************/
2019-08-27 10:43:15,713 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = smc-nn02.jq/10.129.148.13
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 3.0.0-cdh6.0.1
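For context, here is a minimal, hypothetical sketch of the kind of FileSystem calls the two programs mentioned above make: an append from the Spark Streaming job, and a rename followed by a concat from the data conversion job. The class name and paths below are invented for illustration only; the actual logic is in the attached rt-Append.txt and move&concat.java.

{code:java}
// Hypothetical sketch only -- not the attached code. Paths and names are invented.
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendMoveConcatSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Pattern 1 (rt-Append.txt): each streaming micro-batch reopens the same
    // file and appends the new records to it.
    Path target = new Path("/smc/example/ods/v2-data-example.data");
    try (FSDataOutputStream out = fs.append(target)) {
      out.write("records from this micro-batch\n".getBytes(StandardCharsets.UTF_8));
      out.hflush(); // flush so readers can see the data before the stream is closed
    }

    // Pattern 2 (move&concat.java): move freshly written part files next to the
    // target file, then concat them into it so we end up with one large file.
    Path part0 = new Path("/smc/example/ods/part-00000");
    Path part1 = new Path("/smc/example/ods/part-00001");
    fs.rename(new Path("/smc/example/tmp/part-00000"), part0);
    fs.rename(new Path("/smc/example/tmp/part-00001"), part1);
    fs.concat(target, new Path[] { part0, part1 });

    fs.close();
  }
}
{code}

The streaming job runs the append pattern continuously, and the conversion job runs the move & concat pattern against the same kind of target files, which is why we suspect a combination of these operations is involved.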