Save namespace can cause NN to be unable to come up on restart
--------------------------------------------------------------

Key: HDFS-1921
URL: https://issues.apache.org/jira/browse/HDFS-1921
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.22.0, 0.23.0
Reporter: Aaron T. Myers
Priority: Critical
Fix For: 0.22.0, 0.23.0

I discovered this in the course of trying to implement a fix for HDFS-1505.

Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order:

# rename current to lastcheckpoint.tmp for all of them,
# save image and recreate edits for all of them,
# rename lastcheckpoint.tmp to previous.checkpoint.

The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted.

This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker.
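
To make the failure mode concrete, here is a minimal, self-contained sketch of the three-step sequence with one possible guard added. It is not the HDFS code and not the patch for this issue: the class {{SaveNamespaceSketch}}, the {{ImageWriter}} interface, and the per-directory rollback policy are all hypothetical, and plain {{java.nio.file}} calls stand in for the real storage-directory handling. The only names taken from the report are the directory names {{current}}, {{lastcheckpoint.tmp}} and {{previous.checkpoint}}.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative sketch only; names and rollback policy are assumptions, not HDFS code. */
public class SaveNamespaceSketch {

    /** Hypothetical stand-in for "save image and recreate edits" in one directory. */
    public interface ImageWriter {
        void write(Path currentDir) throws IOException;
    }

    /** Runs the three steps and returns the directories whose save succeeded. */
    public static List<Path> saveNamespace(List<Path> storageDirs, ImageWriter writer)
            throws IOException {
        // Step 1: rename current to lastcheckpoint.tmp for all directories.
        for (Path sd : storageDirs) {
            Files.move(sd.resolve("current"), sd.resolve("lastcheckpoint.tmp"));
        }

        // Step 2: save into a fresh current directory and record the outcome.
        List<Path> succeeded = new ArrayList<>();
        for (Path sd : storageDirs) {
            Path current = sd.resolve("current");
            try {
                Files.createDirectory(current);
                writer.write(current);
                succeeded.add(sd);
            } catch (IOException e) {
                // Assumed policy: discard the partial save and restore the old
                // checkpoint, so a usable current directory always remains.
                deleteRecursively(current);
                Files.move(sd.resolve("lastcheckpoint.tmp"), current);
            }
        }

        // Step 3: only directories that saved successfully are finalized.
        // Running this step unconditionally is the behavior the report describes.
        for (Path sd : succeeded) {
            Path previous = sd.resolve("previous.checkpoint");
            deleteRecursively(previous);
            Files.move(sd.resolve("lastcheckpoint.tmp"), previous);
        }
        return succeeded;
    }

    /** Deletes a directory tree, children before parents; no-op if it is absent. */
    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("save-namespace-sketch");
        Path good = root.resolve("name1");
        Path bad = root.resolve("name2");
        for (Path sd : List.of(good, bad)) {
            Files.createDirectories(sd.resolve("current"));
            Files.write(sd.resolve("current").resolve("fsimage"), "old".getBytes());
        }
        // Simulate a save that fails in the second storage directory.
        List<Path> ok = saveNamespace(List.of(good, bad), dir -> {
            if (dir.startsWith(bad)) {
                throw new IOException("simulated write failure");
            }
            Files.write(dir.resolve("fsimage"), "new".getBytes());
        });
        System.out.println("Directories saved successfully: " + ok);
    }
}
```

In this sketch a directory whose save fails keeps its old image in {{current}}, so a restart would not find every directory lacking both a valid {{current}} and a {{lastcheckpoint.tmp}}. Whether the real fix should roll back per directory, or abort only when all directories fail, is a design decision this sketch does not settle.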