Giri created HADOOP-8371:
----------------------------

             Summary: Hadoop 1.0.1 release - DFS rollback issues
                 Key: HADOOP-8371
                 URL: https://issues.apache.org/jira/browse/HADOOP-8371
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs
    Affects Versions: 1.0.1
         Environment: All tests were done on a single node cluster, that runs namenode, secondarynamenode, datanode, all on one machine, running Ubuntu 12.04
            Reporter: Giri
            Priority: Minor

h1. Test Setup

All tests were done on a single-node cluster that runs the namenode, secondarynamenode, and datanode on one machine running Ubuntu 12.04.

/usr/local/hadoop/ is a soft link to /usr/local/hadoop-0.20.203.0/
/usr/local/hadoop-1.0.1 contains the upgrade version.

h1. Version - 0.20.203.0

* Formatted the name node.
* Contents of {dfs.name.dir}/current/VERSION
{quote}
Tue May 08 08:08:57 EDT 2012
namespaceID=350250898
cTime=0
storageType=NAME_NODE
layoutVersion=-31
{quote}
* Contents of {dfs.name.dir}/previous.checkpoint/VERSION
{quote}
Tue May 08 08:03:35 EDT 2012
namespaceID=350250898
cTime=0
storageType=NAME_NODE
layoutVersion=-31
{quote}
* Copied a few test files into HDFS.
* Output from the "hadoop dfs -lsr /" command
{quote}
hduser@ruff790:/usr/local/hadoop/bin$ ./hadoop dfs -lsr /
drwxr-xr-x - hduser supergroup 0 2012-05-08 08:04 /test
-rw-r--r-- 1 hduser supergroup 27574849 2012-05-08 08:04 /test/rr_archive_1655003175_1660003165.gz
-rw-r--r-- 1 hduser supergroup 18065179 2012-05-08 08:04 /test/twonkyportal.log.2011-12-03.rr.gz
drwxr-xr-x - hduser supergroup 0 2012-05-08 08:04 /user
drwxr-xr-x - hduser supergroup 0 2012-05-08 08:04 /user/hduser
{quote}
* Executed "hadoop dfsadmin -finalizeUpgrade" (I do not think this is required, but I do not think it should matter either).
* Stopped DFS by executing "stop-dfs.sh"

h1. Version - 1.0.1

h2. Upgrade

* Tried starting DFS by running /usr/local/hadoop-1.0.1/bin/start-dfs.sh
* As expected, the name node failed to start due to a version mismatch.
{quote}
2012-05-08 08:22:38,166 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: File system image contains an old layout version -31. An upgrade to version -32 is required. Please restart NameNode with -upgrade option.
{quote}
* Ran /usr/local/hadoop-1.0.1/bin/stop-dfs.sh to stop the datanode and secondarynamenode.
* Started DFS by running /usr/local/hadoop-1.0.1/bin/start-dfs.sh -upgrade
* Checked upgrade status by calling /usr/local/hadoop-1.0.1/bin/hadoop dfsadmin -upgradeProgress status
{quote}
Upgrade for version -32 has been completed.
Upgrade is not finalized.
{quote}
* Contents of {dfs.name.dir}/current/VERSION
{quote}
#Tue May 08 08:25:51 EDT 2012
namespaceID=350250898
cTime=1336479951669
storageType=NAME_NODE
layoutVersion=-32
{quote}
* Contents of {dfs.name.dir}/previous.checkpoint/VERSION
{quote}
Tue May 08 08:03:35 EDT 2012
namespaceID=350250898
cTime=0
storageType=NAME_NODE
layoutVersion=-31
{quote}
* Contents of {dfs.name.dir}/previous/VERSION
{quote}
#Tue May 08 08:08:57 EDT 2012
namespaceID=350250898
cTime=0
storageType=NAME_NODE
layoutVersion=-31
{quote}
* Checked that I could list the contents of DFS.
* Stopped DFS.

h2. Rollback

* Started DFS by running /usr/local/hadoop-1.0.1/bin/start-dfs.sh -rollback
* According to "hadoop-hduser-namenode-ruff790.log", the rollback seems to have succeeded.
{quote}
012-05-08 08:37:41,799 INFO org.apache.hadoop.hdfs.server.common.Storage: Rolling back storage directory /usr/local/app/hadoop/tmp/dfs/name. new LV = -31; new CTime = 0
2012-05-08 08:37:41,801 INFO org.apache.hadoop.hdfs.server.common.Storage: Rollback of /usr/local/app/hadoop/tmp/dfs/name is complete.
{quote}
* Contents of {dfs.name.dir}/current/VERSION
{quote}
Tue May 08 08:37:42 EDT 2012
namespaceID=350250898
cTime=0
storageType=NAME_NODE
layoutVersion=-31
{quote}
* Contents of {dfs.name.dir}/previous.checkpoint/VERSION
{quote}
#Tue May 08 08:08:57 EDT 2012
namespaceID=350250898
cTime=0
storageType=NAME_NODE
layoutVersion=-31
{quote}
* Checked that I could list the contents of DFS
{quote}
hduser@ruff790:/usr/local/hadoop-1.0.1/bin$ ./hadoop dfs -lsr /
drwxr-xr-x - hduser supergroup 0 2012-05-08 08:04 /test
-rw-r--r-- 1 hduser supergroup 27574849 2012-05-08 08:04 /test/rr_archive_1655003175_1660003165.gz
-rw-r--r-- 1 hduser supergroup 18065179 2012-05-08 08:04 /test/twonkyportal.log.2011-12-03.rr.gz
drwxr-xr-x - hduser supergroup 0 2012-05-08 08:04 /user
drwxr-xr-x - hduser supergroup 0 2012-05-08 08:04 /user/hduser
{quote}
* However, at this point I could not browse the file system from the WebUI. I then realized that the data node was not actually running. According to the data node log file, it had shut down during the rollback process.
{quote}
012-05-08 08:37:57,953 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Unregistered data node: 127.0.0.1:50010
at org.apache.hadoop.hdfs.server.namenode.NameNode.verifyRequest(NameNode.java:1077)
{quote}
* So I ran "stop-dfs.sh" to shut down the namenode and secondarynamenode.
* The next "start-dfs.sh" failed to start the name node, as expected, with a version mismatch error.
{quote}
2012-05-08 08:50:51,084 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: File system image contains an old layout version -31. An upgrade to version -32 is required. Please restart NameNode with -upgrade option.
{quote}
* Shut everything down and went back to the old version.

h1. Version - 0.20.203.0 (Again)

* Having rolled back the 1.0.1 upgrade, I expected to be able to go back to version 0.20.203.0.
* Ran /usr/local/hadoop/bin/start-dfs.sh, but the namenode did not start. It failed with the error message:
{quote}
2012-05-08 08:57:09,261 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Unexpected version of the file system log file: -32. Current version = -31.
{quote}
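
For anyone repeating these steps, the VERSION checks listed in this report can be collected with a small shell helper. This is only a sketch: `show_layout_versions` is a made-up name and not part of Hadoop, and it assumes the name directory (the value of dfs.name.dir) is passed as the first argument and that each VERSION file uses the key=value layout quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print layoutVersion and cTime
# from each VERSION file found under a name directory.
# Usage: show_layout_versions /path/to/dfs.name.dir
show_layout_versions() {
    name_dir="$1"
    # The three state directories inspected in this report.
    for state in current previous previous.checkpoint; do
        version_file="$name_dir/$state/VERSION"
        if [ -f "$version_file" ]; then
            # VERSION holds key=value pairs; pick out the two fields of interest.
            layout=$(grep '^layoutVersion=' "$version_file" | cut -d= -f2)
            ctime=$(grep '^cTime=' "$version_file" | cut -d= -f2)
            echo "$state layoutVersion=$layout cTime=$ctime"
        else
            echo "$state (no VERSION file)"
        fi
    done
}
```

Running it against the name directory after each upgrade and rollback step prints one line per state directory, which may make the -31/-32 transitions easier to compare side by side.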