Zesheng Wu created HDFS-4689:
--------------------------------

             Summary: freeze/seal a hdfs file
                 Key: HDFS-4689
                 URL: https://issues.apache.org/jira/browse/HDFS-4689
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: datanode, hdfs-client, namenode
    Affects Versions: 2.0.0-alpha
            Reporter: Zesheng Wu


First, let me describe the problem scenario, which we hit in our HBase 
cluster:
1. rs1 loses its zookeeper lock, and hmaster realizes that
2. hmaster assigns the regions of rs1 to rs2
3. rs2 renames the hlog of rs1, and begins to replay the log
4. but in the meantime, rs1 is still running, and the client still writes data 
to rs1
5. in this scenario, the data written after rs2 renamed rs1's hlog will be lost

The root cause of the problem is as follows.
When an HDFS file is open for write, the file metadata is updated only when a 
block is finished or when the file is closed, but the client considers data 
successfully written as soon as it receives an ack from the datanode. So after 
the file is renamed, nothing requires the client to refresh the metadata 
immediately: it does not notice the rename, keeps writing to the current 
block, and those writes keep succeeding until the block is finished or the 
file is closed. The data written during this window will be lost.
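
The window described above can be illustrated with a small toy model. This is 
only a sketch, not HDFS code: `ToyNameNode`, `ToyWriter`, their methods and the 
paths are hypothetical names made up for illustration, and the model simply 
assumes that a write is acknowledged without any metadata lookup.

```python
class ToyNameNode:
    """Hypothetical stand-in for the namenode: maps paths to file records."""

    def __init__(self):
        self.files = {}

    def create(self, path):
        record = {"data": []}
        self.files[path] = record
        return record

    def rename(self, src, dst):
        # Only the path changes; the record an open writer holds is untouched.
        self.files[dst] = self.files.pop(src)

    def read(self, path):
        # Snapshot of what is in the file at this moment.
        return list(self.files[path]["data"])


class ToyWriter:
    """Hypothetical client: resolves the path once, when the file is opened."""

    def __init__(self, namenode, path):
        self.record = namenode.create(path)

    def write(self, edit):
        # Models the datanode ack: no metadata lookup, so a rename goes unnoticed.
        self.record["data"].append(edit)
        return True


def run_scenario():
    nn = ToyNameNode()
    rs1 = ToyWriter(nn, "/hlog/rs1")
    rs1.write("edit-1")
    nn.rename("/hlog/rs1", "/hlog/rs1-splitting")  # rs2 takes over the log
    replayed = nn.read("/hlog/rs1-splitting")      # rs2 replays what it sees
    acked = rs1.write("edit-2")                    # rs1 is still alive
    return replayed, acked


replayed, acked = run_scenario()
lost = [e for e in ["edit-1", "edit-2"] if e not in replayed]
print(lost)
```

In this model the second write is acknowledged even though it arrives after 
the rename and the replay, so it never reaches rs2.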

The basic idea for solving this is to add freeze/seal semantics for a file: 
once a file is frozen/sealed, clients can no longer write any data to it, but 
it can still be renamed or deleted.

If we could freeze/seal a file, the scenario at the beginning would look like 
this:
1. rs1 loses its zookeeper lock, and hmaster realizes that
2. hmaster freezes/seals the hlog of rs1
3. hmaster assigns the regions of rs1 to rs2
4. rs2 renames the hlog of rs1, and begins to replay the log
5. after rs2 successfully replayed the log, the log file is deleted
6. in this scenario, once hmaster has frozen/sealed the hlog of rs1, rs1 can't 
write any data to it even if it is still running, which guarantees that no 
data will be lost
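
The steps above can be sketched with a toy model of the proposed semantics. 
Everything here is an assumption made for illustration: `ToyFile`, 
`ToyNameNode`, `SealedFileError` and the `seal` method are hypothetical names, 
not existing HDFS APIs, and the sketch does not address where a real 
implementation would enforce the check.

```python
class SealedFileError(Exception):
    """Hypothetical error raised when writing to a frozen/sealed file."""


class ToyFile:
    def __init__(self):
        self.data = []
        self.sealed = False

    def append(self, edit):
        # The proposed rule: a sealed file rejects every further write.
        if self.sealed:
            raise SealedFileError("file is sealed")
        self.data.append(edit)


class ToyNameNode:
    """Hypothetical stand-in for the namenode: maps paths to files."""

    def __init__(self):
        self.files = {}

    def create(self, path):
        self.files[path] = ToyFile()
        return self.files[path]

    def seal(self, path):
        self.files[path].sealed = True

    def rename(self, src, dst):
        # Renaming a sealed file is still allowed.
        self.files[dst] = self.files.pop(src)

    def delete(self, path):
        # Deleting a sealed file is still allowed.
        del self.files[path]


def run_scenario():
    nn = ToyNameNode()
    hlog = nn.create("/hlog/rs1")
    acked = []
    hlog.append("edit-1")
    acked.append("edit-1")

    nn.seal("/hlog/rs1")                           # step 2: hmaster seals
    nn.rename("/hlog/rs1", "/hlog/rs1-splitting")  # step 4: rs2 renames
    replayed = list(nn.files["/hlog/rs1-splitting"].data)

    try:
        hlog.append("edit-2")                      # rs1 is still running
        acked.append("edit-2")
    except SealedFileError:
        pass                                       # the write fails visibly

    nn.delete("/hlog/rs1-splitting")               # step 5: log is deleted
    return acked, replayed


acked, replayed = run_scenario()
print(acked == replayed)
```

Here the late write from rs1 is rejected instead of acknowledged, so every 
acknowledged edit is also one that rs2 replayed.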

I hope I've described the problem clearly. Has anyone already worked on this 
feature? Any ideas about this would be much appreciated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
