Yiqun Lin created HDFS-11044:
--------------------------------

             Summary: TestRollingUpgrade fails intermittently
                 Key: HDFS-11044
                 URL: https://issues.apache.org/jira/browse/HDFS-11044
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Yiqun Lin
            Assignee: Yiqun Lin

The test {{TestRollingUpgrade#testRollback}} fails intermittently in trunk (https://builds.apache.org/job/PreCommit-HDFS-Build/17250/testReport/). The stack trace:
{code}
java.lang.AssertionError: Test resulted in an unexpected exit
	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1949)
	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1936)
	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1929)
	at org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(TestRollingUpgrade.java:351)
{code}
I looked into it. It seems an IOException occurs while writing files to the NameNode storage directories (see the Jenkins report). This exception is then remembered in {{ExitUtil.firstExitException}}, and it is finally thrown when we shut down the cluster. The exception info:
{code}
2016-10-21 12:54:02,300 [main] FATAL hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1946)) - Test resulted in an unexpected exit
org.apache.hadoop.util.ExitUtil$ExitException: java.io.IOException: All the storage failed while writing properties to VERSION file
	at org.apache.hadoop.hdfs.server.namenode.NNStorage.writeAll(NNStorage.java:1151)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.updateStorageVersion(FSImage.java:999)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:850)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:240)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:149)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:838)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:819)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
{code}
The IOException occurs because
all the storage directories have been removed. IMO, one of the reasons is that when writing some properties or the transactionId to a storage directory fails, that storage directory is removed. Since {{TestRollingUpgrade#testRollback}} restarts the NameNode many times, I'm not sure whether this is normal here. But there is one fix I am sure of: we can use {{checkExitOnShutdown(false)}} to skip the ExitException check. This has already been done in {{TestRollingUpgrade#testRollingUpgradeWithQJM}}.
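
To make the mechanism described above easier to follow, here is a minimal, self-contained sketch in plain Java. It is a toy model and not Hadoop code: {{ToyExitRecorder}}, {{ToyCluster}} and {{ShutdownCheckSketch}} are hypothetical names that only imitate the behaviour the report describes, namely that the first exit exception is remembered and later surfaces as an AssertionError at shutdown unless the check is disabled.

```java
import java.io.IOException;

// Toy model only: none of these classes are Hadoop's. They imitate the
// behaviour described in the report so the failure mode is easy to see.

/** Entry point: shows a strict shutdown failing and a lenient one passing. */
class ShutdownCheckSketch {
    public static void main(String[] args) {
        ToyExitRecorder.reset();
        ToyCluster strict = new ToyCluster(true);
        strict.simulateStorageFailure();
        try {
            strict.shutdown();
        } catch (AssertionError e) {
            System.out.println("strict shutdown failed: " + e.getMessage());
        }

        ToyCluster lenient = new ToyCluster(false);
        lenient.simulateStorageFailure();
        lenient.shutdown(); // check disabled, so no AssertionError is raised
        System.out.println("lenient shutdown passed");
    }
}

/** Hypothetical stand-in for the "remember the first exit exception" idea. */
final class ToyExitRecorder {
    private static RuntimeException firstExit;

    private ToyExitRecorder() { }

    /** Records an exit; only the first one is kept, later ones are ignored. */
    static synchronized void recordExit(String message, Throwable cause) {
        if (firstExit == null) {
            firstExit = new RuntimeException(message, cause);
        }
    }

    static synchronized boolean exitRecorded() {
        return firstExit != null;
    }

    static synchronized RuntimeException getFirstExit() {
        return firstExit;
    }

    static synchronized void reset() {
        firstExit = null;
    }
}

/** Hypothetical stand-in for a test cluster with an optional shutdown check. */
final class ToyCluster {
    private final boolean checkExitOnShutdown;

    ToyCluster(boolean checkExitOnShutdown) {
        this.checkExitOnShutdown = checkExitOnShutdown;
    }

    /** Simulates a background thread hitting a fatal storage error. */
    void simulateStorageFailure() {
        ToyExitRecorder.recordExit("simulated exit", new IOException(
            "All the storage failed while writing properties to VERSION file"));
    }

    /** Fails the test at shutdown if an exit was recorded and the check is on. */
    void shutdown() {
        if (checkExitOnShutdown && ToyExitRecorder.exitRecorded()) {
            ToyExitRecorder.reset();
            throw new AssertionError("Test resulted in an unexpected exit");
        }
    }
}
```

In this sketch, constructing {{ToyCluster}} with {{false}} plays the role of the proposed {{checkExitOnShutdown(false)}}. Note that it only hides the recorded exception from the shutdown check; the simulated storage failure still happens.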