chuanjie.duan created HDFS-16349:
------------------------------------

             Summary: FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
                 Key: HDFS-16349
                 URL: https://issues.apache.org/jira/browse/HDFS-16349
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 3.2.2
            Reporter: chuanjie.duan

2021-11-22 20:36:44,440 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Using longest log: 10.65.57.133:8485=segmentState { startTxId: 3906965 endTxId: 3906965 isInProgress: false } lastWriterEpoch: 5 lastCommittedTxId: 3906964
2021-11-22 20:36:44,457 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /data12/data/flashHadoopU/namenode/current
2021-11-22 20:36:44,495 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /data12/data/flashHadoopU/namenode/current/edits_inprogress_0000000000003898378 -> /data12/data/flashHadoopU/namenode/current/edits_0000000000003898378-0000000000003898412
2021-11-22 20:36:44,657 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 2510934 but unable to find any edit logs containing txid 2510933
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1578)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1536)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:652)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:976)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
2021-11-22 20:36:44,660 INFO org.mortbay.log: Stopped HttpServer2$selectchannelconnectorwithsafestar...@pro-hadoop-dc01-057133.vm.dc01.hellocloud.tech:50070
2021-11-22 20:36:44,760 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2021-11-22 20:36:44,761 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2021-11-22 20:36:44,761 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2021-11-22 20:36:44,761 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.

Old version: 2.7.3
New version: 3.2.2

Steps to Reproduce

Step 1: Start NN1 as active and NN2 as standby.
Step 2: Run "hdfs dfsadmin -rollingUpgrade prepare".
Step 3: Start NN2 as active and NN1 as standby with the rolling upgrade started option.
Step 4: Restart the DN in upgrade mode as well.
Step 5: Restart the JournalNode with the new Hadoop version.
Step 6: Wait a few days.
Step 7: Bring down both NNs, the JournalNode, and the DN.
Step 8: Start the JournalNode with the old version.
Step 9: Start NN1 with the rolling upgrade rollback option.

NN1 fails to start with the error shown above. The edit log containing txid 2510933, named in the error, had already been deleted by the checkpoint mechanism.
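
To make the failure easier to reason about, here is a minimal Python sketch of the kind of contiguity check the error message describes. It is a simplified model for illustration only, not the Hadoop implementation of FSEditLog.checkForGaps. The function name first_missing_txid and the (first_txid, last_txid) tuple representation are hypothetical. The segment list is taken from the two segments visible in the log above, and the start of the required range (2510933) is assumed from the txid named in the error.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of a finalized edit log segment:
# (first_txid, last_txid), both inclusive.
Segment = Tuple[int, int]


def first_missing_txid(segments: List[Segment], from_txid: int,
                       to_at_least_txid: int) -> Optional[int]:
    """Return the first txid in [from_txid, to_at_least_txid] that no
    segment covers, or None if the whole range is covered."""
    txid = from_txid
    for first, last in sorted(segments):
        if txid > to_at_least_txid:
            break  # the required range is already covered
        if first > txid:
            break  # hole: nothing covers txid before this segment starts
        txid = max(txid, last + 1)
    return txid if txid <= to_at_least_txid else None


# Segments seen in the log above; older segments are assumed to be gone.
remaining = [(3898378, 3898412), (3906965, 3906965)]
missing = first_missing_txid(remaining, 2510933, 2510934)
print(missing)  # 2510933
```

In this model the rollback needs edits starting at txid 2510933, but the oldest remaining segment starts at 3898378, so the check reports the gap at the first required txid, which matches the txid in the exception.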