chuanjie.duan created HDFS-16349:
------------------------------------

             Summary: FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
                 Key: HDFS-16349
                 URL: https://issues.apache.org/jira/browse/HDFS-16349
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 3.2.2
            Reporter: chuanjie.duan


2021-11-22 20:36:44,440 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Using longest log: 10.65.57.133:8485=segmentState {
10.65.57.133:8485=segmentState {
  startTxId: 3906965
  endTxId: 3906965
  isInProgress: false
}
lastWriterEpoch: 5
lastCommittedTxId: 3906964

2021-11-22 20:36:44,457 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /data12/data/flashHadoopU/namenode/current
2021-11-22 20:36:44,495 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /data12/data/flashHadoopU/namenode/current/edits_inprogress_0000000000003898378 -> /data12/data/flashHadoopU/namenode/current/edits_0000000000003898378-0000000000003898412
2021-11-22 20:36:44,657 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 2510934 but unable to find any edit logs containing txid 2510933
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1578)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1536)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:652)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:976)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
2021-11-22 20:36:44,660 INFO org.mortbay.log: Stopped HttpServer2$selectchannelconnectorwithsafestar...@pro-hadoop-dc01-057133.vm.dc01.hellocloud.tech:50070
2021-11-22 20:36:44,760 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2021-11-22 20:36:44,761 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2021-11-22 20:36:44,761 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2021-11-22 20:36:44,761 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
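The failure is raised by FSEditLog.checkForGaps, which is called from selectInputStreams while the NameNode loads the fsimage. The Java code below is a minimal sketch of that kind of check. It assumes finalized segments sorted by first txid and is not the Hadoop source: the EditLogGapSketch class, the Segment type, and the single on-disk segment in main are hypothetical. It also assumes that reading starts at txid 2510933 and must reach at least txid 2510934, both taken from the error message above. The sketch walks the segments from the first needed txid and reports the first txid that no segment contains.

```java
import java.io.IOException;
import java.util.List;

/**
 * Hypothetical, simplified model of an edit-log gap check in the spirit of
 * FSEditLog.checkForGaps. Not Hadoop's implementation.
 */
final class EditLogGapSketch {

    /** A finalized edit log segment covering [firstTxId, lastTxId]. */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;

        Segment(long firstTxId, long lastTxId) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }
    }

    /**
     * Walks segments sorted by firstTxId, starting at fromTxId. Returns the
     * first txid that no segment contains, or -1 if the segments reach at
     * least toAtLeastTxId without a gap.
     */
    static long firstMissingTxId(List<Segment> sorted, long fromTxId, long toAtLeastTxId) {
        long txId = fromTxId;
        for (Segment s : sorted) {
            if (txId > toAtLeastTxId) {
                return -1; // every required txid has been covered
            }
            if (s.lastTxId < txId) {
                continue; // segment lies entirely before the txid still needed
            }
            if (s.firstTxId > txId) {
                return txId; // next segment starts too late: gap at txId
            }
            txId = s.lastTxId + 1; // segment covers txId; resume right after it
        }
        return txId > toAtLeastTxId ? -1 : txId;
    }

    /** Throws an IOException worded like the one in the NameNode log. */
    static void check(List<Segment> sorted, long fromTxId, long toAtLeastTxId) throws IOException {
        long missing = firstMissingTxId(sorted, fromTxId, toAtLeastTxId);
        if (missing != -1) {
            throw new IOException(String.format(
                    "Gap in transactions. Expected to be able to read up until at least txid %d"
                            + " but unable to find any edit logs containing txid %d",
                    toAtLeastTxId, missing));
        }
    }

    public static void main(String[] args) {
        // Hypothetical: only the segment finalized in the log above is left on disk.
        List<Segment> onDisk = List.of(new Segment(3898378L, 3898412L));
        try {
            check(onDisk, 2510933L, 2510934L);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints the same "Gap in transactions" message as the NameNode log. The only segment in the hypothetical list starts well after txid 2510933, so the walk stops at the very first txid it needs.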

Old version: 2.7.3

New version: 3.2.2

Steps to Reproduce

Step 1: Start NN1 as active and NN2 as standby.
Step 2: Run "hdfs dfsadmin -rollingUpgrade prepare".
Step 3: Start NN2 as active and NN1 as standby with the rolling upgrade "started" option.
Step 4: Restart the DNs in upgrade mode as well.

Step 5: Restart the JournalNodes with the new Hadoop version.
Step 6: Keep the cluster running for a few days.

Step 7: Bring down both NNs, the JournalNodes and the DNs.

Step 8: Start the JournalNodes with the old version.
Step 9: Start NN1 with the rolling upgrade "rollback" option. The NN fails to start with the ERROR above, because the edit log containing the above-mentioned txid 2510933 has already been deleted by the checkpoint mechanism.
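Step 9 attributes the gap to the checkpoint mechanism. The Java code below is a minimal sketch of that interaction under stated assumptions. Its retention rule is a deliberate simplification and not HDFS's actual policy: after a newer checkpoint, it drops every finalized segment that ends before a hypothetical minTxIdToKeep. The RollbackRetentionSketch class, the rollback image txid (assumed to be 2510932, just before the missing txid), and all segment boundaries except 3898378-3898412 from the log are hypothetical. The sketch shows how such purging can remove the segment that holds the first txid after an older rollback image.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model: purging old edit segments after newer checkpoints can
 * leave the txids right after a rollback image unreadable. The retention
 * rule is a simplification, not HDFS's policy.
 */
final class RollbackRetentionSketch {

    /** A finalized edit log segment covering [firstTxId, lastTxId]. */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;

        Segment(long firstTxId, long lastTxId) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }
    }

    /** Assumed rule: keep only segments whose last txid is >= minTxIdToKeep. */
    static List<Segment> purge(List<Segment> segments, long minTxIdToKeep) {
        List<Segment> kept = new ArrayList<>();
        for (Segment s : segments) {
            if (s.lastTxId >= minTxIdToKeep) {
                kept.add(s);
            }
        }
        return kept;
    }

    /** True if some segment in the list contains txId. */
    static boolean contains(List<Segment> segments, long txId) {
        for (Segment s : segments) {
            if (s.firstTxId <= txId && txId <= s.lastTxId) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long rollbackImageTxId = 2510932L; // assumed: image ends just before txid 2510933
        List<Segment> segments = List.of(
                new Segment(2510900L, 2511000L),  // hypothetical segment written around "prepare"
                new Segment(2511001L, 3898377L),  // hypothetical edits from the following days
                new Segment(3898378L, 3898412L)); // segment finalized in the log above
        // Hypothetical: a later checkpoint allows everything ending before 3000000 to be purged.
        List<Segment> remaining = purge(segments, 3000000L);
        System.out.println("before purge: " + contains(segments, rollbackImageTxId + 1));
        System.out.println("after purge:  " + contains(remaining, rollbackImageTxId + 1));
    }
}
```

Running main prints "before purge: true" and then "after purge:  false". The sketch shows only that a retention rule keyed to newer checkpoints can drop the segment right after an older rollback image. It does not model how HDFS actually decides which edits to keep during a rolling upgrade.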

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
