existing in_use.lock file is removed after failing to lock this file
--------------------------------------------------------------------

                 Key: HDFS-2632
                 URL: https://issues.apache.org/jira/browse/HDFS-2632
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
    Affects Versions: 0.21.0
         Environment: Scientific Linux 5.3
            Reporter: Dan Bradley


If an attempt is made to start the namenode when it is already running, the 
new instance fails to lock in_use.lock and throws an exception, as expected.  
However, there is a bug: the failed attempt also deletes in_use.lock!  If yet 
another attempt is then made to start the namenode, there is no in_use.lock 
file, so the new instance goes ahead and starts modifying the namenode state 
files.  It eventually fails to bind to the TCP port, but by that time it has 
already done damage.  Specifically, the 'edits' file being written to by the 
running instance is moved to 'previous.checkpoint', so all further 
transactions are lost when the HDFS service is next restarted.  We observed a 
case of data loss because of this.
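
The expected behaviour can be illustrated with a minimal sketch.  This is not 
the namenode's actual storage code; the class name InUseLockSketch and the 
method tryLockDir are hypothetical, and only standard java.nio calls are 
used.  The point is that the failure path closes its own handle and leaves 
in_use.lock in place, since the file may belong to a running instance.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical illustration only; not taken from the Hadoop source.
final class InUseLockSketch {

    /**
     * Tries to lock in_use.lock inside the given directory.
     * Returns the held lock, or null if it is already locked elsewhere.
     */
    static FileLock tryLockDir(File dir) throws IOException {
        File lockFile = new File(dir, "in_use.lock");
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rws");
        FileLock lock = null;
        try {
            // tryLock() returns null when another process holds the lock.
            lock = raf.getChannel().tryLock();
        } catch (OverlappingFileLockException e) {
            // The lock is already held inside this same JVM.
        } finally {
            if (lock == null) {
                // Close only our own handle. Do NOT delete lockFile here:
                // it may belong to the instance that is still running.
                raf.close();
            }
        }
        return lock;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : ".");
        FileLock lock = tryLockDir(dir);
        if (lock == null) {
            System.err.println("Storage directory is already locked: " + dir);
            System.exit(1);
        }
        System.out.println("Acquired in_use.lock in " + dir);
        lock.release();
    }
}
```

With this shape, a second start attempt fails in the same way as the first 
and every later one does too, because the lock file is never removed by a 
process that does not hold the lock.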

This issue relates to HDFS-1690, but the problem in HDFS-1690 was stated in a 
way that is specific to -format.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
