Eugene Koontz created HDFS-4302: ----------------------------------- Summary: Precondition in EditLogFileInputStream's length() method is checked too early in NameNode startup, causing fatal exception Key: HDFS-4302 URL: https://issues.apache.org/jira/browse/HDFS-4302 Project: Hadoop HDFS Issue Type: Bug Components: ha, namenode Reporter: Eugene Koontz
When bringing up a namenode in standby mode, where DEBUG is enabled for namenode, the namenode will hit the following code in {{hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java}}: {code} if (LOG.isDebugEnabled()) { LOG.debug("edit log length: " + in.length() + ", start txid: " + expectedStartingTxId + ", last txid: " + lastTxId); } {code}. However, if {{in}} has an {{EditLogFileInputStream}} as its {{streams[0]}}, this code is hit before the {{EditLogFileInputStream}}'s {{advertizedSize}} is initialized (before the HTTP client connects to the remote edit log server (i.e. the journal node)). This causes the following precondition to fail in {{EditLogFileInputStream:length()}}: {code} Preconditions.checkState(advertisedSize != -1, "must get input stream before length is available"); {code} which shuts down the namenode with the following log messages and stack trace: {code} 2012-12-11 10:45:33,319 DEBUG ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(217)) - Call: getEditLogManifest took 88ms 2012-12-11 10:45:33,336 DEBUG client.QuorumJournalManager (QuorumJournalManager.java:selectInputStreams(459)) - selectInputStream manifests: 172.16.175.1:8485: [[1,3]] 2012-12-11 10:45:33,351 DEBUG namenode.FSImage (FSImage.java:loadFSImage(605)) - Planning to load image : FSImageFile(file=/tmp/hadoop-data/dfs/name/current/fsimage_0000000000000000000, cpktTxId=0000000000000000000) 2012-12-11 10:45:33,351 DEBUG namenode.FSImage (FSImage.java:loadFSImage(607)) - Planning to load edit log stream: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@465098f9 2012-12-11 10:45:33,355 INFO namenode.FSImage (FSImageFormat.java:load(168)) - Loading image file /tmp/hadoop-data/dfs/name/current/fsimage_0000000000000000000 using no compression 2012-12-11 10:45:33,355 INFO namenode.FSImage (FSImageFormat.java:load(171)) - Number of files = 1 2012-12-11 10:45:33,356 INFO namenode.FSImage (FSImageFormat.java:loadFilesUnderConstruction(383)) - Number of files under construction = 0 2012-12-11 10:45:33,357 INFO namenode.FSImage (FSImageFormat.java:load(193)) - Image file of size 119 loaded in 0 seconds. 2012-12-11 10:45:33,357 INFO namenode.FSImage (FSImage.java:loadFSImage(753)) - Loaded image for txid 0 from /tmp/hadoop-data/dfs/name/current/fsimage_0000000000000000000 2012-12-11 10:45:33,357 DEBUG namenode.FSImage (FSImage.java:loadEdits(686)) - About to load edits: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@465098f9 2012-12-11 10:45:33,359 INFO namenode.FSImage (FSImage.java:loadEdits(694)) - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@465098f9 expecting start txid #1 2012-12-11 10:45:33,361 DEBUG ipc.Client (Client.java:stop(1060)) - Stopping client 2012-12-11 10:45:33,363 DEBUG ipc.Client (Client.java:close(1016)) - IPC Client (1462017562) connection to Eugenes-MacBook-Pro.local/172.16.175.1:8485 from hdfs/eugenes-macbook-pro.lo...@example.com: closed 2012-12-11 10:45:33,363 DEBUG ipc.Client (Client.java:run(848)) - IPC Client (1462017562) connection to Eugenes-MacBook-Pro.local/172.16.175.1:8485 from hdfs/eugenes-macbook-pro.lo...@example.com: stopped, remaining connections 0 2012-12-11 10:45:33,464 FATAL namenode.NameNode (NameNode.java:main(1224)) - Exception in namenode join java.lang.IllegalStateException: must get input stream before length is available at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog.length(EditLogFileInputStream.java:405) at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.length(EditLogFileInputStream.java:258) at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.length(RedundantEditLogInputStream.java:256) at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.length(RedundantEditLogInputStream.java:256) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:125) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:88) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:697) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:259) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:604) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:447) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:409) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:400) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:434) at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:606) at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:591) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1219) 2012-12-11 10:45:33,470 INFO util.ExitUtil (ExitUtil.java:terminate(84)) - Exiting with status 1 2012-12-11 10:45:33,471 INFO namenode.NameNode (StringUtils.java:run(620)) - SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down NameNode at Eugenes-MacBook-Pro.local/172.16.175.1 ************************************************************/ {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira