Cao, Lionel created HDFS-14787:
----------------------------------

             Summary: [Help] NameNode error 
                 Key: HDFS-14787
                 URL: https://issues.apache.org/jira/browse/HDFS-14787
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 3.0.0
            Reporter: Cao, Lionel
         Attachments: core-site.xml, 
hadoop-cmf-hdfs-NAMENODE-smc-nn02.jq.log.out.20190827, hdfs-site.xml, 
move&concat.java, rt-Append.txt

Hi committers,

We encountered a NameNode (NN) error, shown below.

The primary NN was shut down last Thursday, and we recovered it by removing some
ops from the edit log (a sketch of that workflow is below). But the standby NN
was shut down again yesterday by the same error.

Could you please help us identify the possible root cause?
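
For reference, removing an op from an edits segment can be done offline with the
hdfs oev (offline edits viewer) tool. A minimal sketch of that decode/edit/re-encode
workflow; the segment file name below is a placeholder, not our actual file:

  # Decode the binary edits segment to XML
  hdfs oev -i edits_0000000000000000001-0000000000000000100 -o edits.xml -p xml
  # Manually delete the offending <RECORD> element from edits.xml,
  # then re-encode the XML back into the binary edits format
  hdfs oev -i edits.xml -o edits_0000000000000000001-0000000000000000100 -p binary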

 

An excerpt of the error log is included below; for the full log and the NameNode
configuration, please refer to the attachments.

Besides, I have attached the Java code that can trigger the error (a sketch of
both patterns follows the list):
 # We run append operations from a Spark streaming program (rt-Append.txt), which caused the primary NN shutdown last Thursday.
 # We run move & concat operations from a data-conversion program (move&concat.java), which caused the standby NN shutdown yesterday.
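
For illustration only, here is a minimal, self-contained sketch of the two write
patterns using the standard HDFS client API (FileSystem.append, FileSystem.rename,
DistributedFileSystem.concat). The paths and payload are placeholders, not the
attached code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class AppendMoveConcatSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Pattern 1 (rt-Append.txt): each streaming micro-batch reopens the same
    // file for append; every append assigns the last block a new generation stamp.
    Path target = new Path("/data/demo/v2-data.mayfly.data"); // placeholder path
    try (FSDataOutputStream out = fs.append(target)) {
      out.write("new records\n".getBytes("UTF-8"));
    }

    // Pattern 2 (move&concat.java): move small files next to the target, then
    // merge them into it; concat() rewrites the target's block list in one op.
    Path part = new Path("/data/demo/part-00000"); // placeholder path
    fs.rename(new Path("/staging/part-00000"), part); // placeholder source
    ((DistributedFileSystem) fs).concat(target, new Path[] { part });

    fs.close();
  }
}

Both patterns can touch the same target file, and append bumps the block's
generation stamp, which is the field that differs in the mismatch reported below.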

2019-08-27 09:51:12,409 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 766146/953617 transactions completed. (80%)
2019-08-27 09:51:12,858 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/__spark_libs__2381992047634476351.zip
2019-08-27 09:51:12,870 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/oozietest2-0.0.1-SNAPSHOT.jar
2019-08-27 09:51:12,898 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/__spark_conf__.zip
2019-08-27 09:51:12,910 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20424/__spark_libs__8875310030853528804.zip
2019-08-27 09:51:12,927 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20424/__spark_conf__.zip
2019-08-27 09:51:13,777 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 857745/953617 transactions completed. (90%)
2019-08-27 09:51:14,035 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smc_ss/.sparkStaging/application_1561429828507_20425/__spark_libs__7422229681005558653.zip
2019-08-27 09:51:14,067 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smc_ss/.sparkStaging/application_1561429828507_20426/__spark_libs__7479542421029947753.zip
2019-08-27 09:51:14,070 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20428/__spark_libs__7647933078788028649.zip
2019-08-27 09:51:14,075 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/smc/tboxng/prod/zxq_hbd_tbox_signal_channel/ods/vin_hash=33/v2-data-20190826.mayfly.data, replication=2, mtime=1566870616821, atime=1566870359230, blockSize=134217728, blocks=[blk_1270599798_758966421, blk_1270599852_758967928, blk_1270601282_759026903, blk_1270602443_759027052, blk_1270602446_759061086, blk_1270603081_759050235], permissions=smc_ss:smc_ss:rw-r--r--, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=4359520942]
java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /smc/tboxng/prod/zxq_hbd_tbox_signal_channel/ods/vin_hash=33/v2-data-20190826.mayfly.data
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1096)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:452)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:869)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
2019-08-27 09:51:14,077 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 11714 ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:261)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:218)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1630)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:309)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 11714
2019-08-27 09:51:14,077 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Block report queue is full
2019-08-27 09:51:14,077 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN.
java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /smc/tboxng/prod/zxq_hbd_tbox_signal_channel/ods/vin_hash=33/v2-data-20190826.mayfly.data
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1096)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:452)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:869)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
2019-08-27 09:51:14,105 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /smc/tboxng/prod/zxq_hbd_tbox_signal_channel/ods/vin_hash=33/v2-data-20190826.mayfly.data
2019-08-27 09:51:14,118 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at smc-nn02.jq/10.129.148.13
************************************************************/
2019-08-27 10:43:15,713 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = smc-nn02.jq/10.129.148.13
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 3.0.0-cdh6.0.1


