Ashutosh Chauhan created HDFS-9763:
--------------------------------------

             Summary: Add merge api
                 Key: HDFS-9763
                 URL: https://issues.apache.org/jira/browse/HDFS-9763
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: fs
            Reporter: Ashutosh Chauhan


It would be good to add a merge(Path dir1, Path dir2, ... ) API to HDFS. The 
semantics would be to move all files under dir1 into dir2, renaming files in 
case of collisions.
In the absence of this API, Hive [1] has to check each file for a collision, 
then come up with a unique name, try again, and so on. This is inefficient in 
multiple ways:

1) It generates a huge number of calls to the NN (at least 2 * the number of 
source files in dir1).
2) It suffers from a TOCTOU [2] bug: the name the client picks after a 
collision can be taken by another client before the rename happens.
3) The whole operation is not atomic.
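To make the costs above concrete, here is a minimal Java sketch against a hypothetical in-memory namespace. FakeNameNode, clientSideMerge, and the single-call merge are illustrative names only, not part of the HDFS or Hive APIs, and the "_copy_N" collision suffix is an assumption chosen for illustration. The sketch counts namenode calls for the client-side check-then-rename loop and contrasts it with a server-side merge that resolves collisions while holding one lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative model only: FakeNameNode, clientSideMerge and merge are
// hypothetical names, not part of the HDFS or Hive APIs.
class MergeSketch {

    // Hypothetical collision-naming scheme: name, name_copy_1, name_copy_2, ...
    static String candidate(String dstDir, String name, int copy) {
        return copy == 0 ? dstDir + "/" + name : dstDir + "/" + name + "_copy_" + copy;
    }

    static String baseName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // In-memory stand-in for the namenode; list/exists/rename/merge each count as one call.
    static final class FakeNameNode {
        private final TreeSet<String> paths = new TreeSet<>();
        int rpcCount = 0;

        // Setup and inspection helpers for the example; not counted as calls.
        synchronized void create(String path) { paths.add(path); }
        synchronized boolean contains(String path) { return paths.contains(path); }

        synchronized List<String> list(String dir) { rpcCount++; return children(dir); }

        synchronized boolean exists(String path) { rpcCount++; return paths.contains(path); }

        // Non-overwriting rename: returns false if dst already exists.
        synchronized boolean rename(String src, String dst) {
            rpcCount++;
            if (!paths.contains(src)) throw new IllegalStateException("missing source: " + src);
            if (paths.contains(dst)) return false;
            paths.remove(src);
            paths.add(dst);
            return true;
        }

        // Hypothetical server-side merge: one call, with every collision check and
        // move done under this object's lock, so no other caller can claim a name
        // between the check and the move.
        synchronized List<String> merge(String srcDir, String dstDir) {
            rpcCount++;
            List<String> moved = new ArrayList<>();
            for (String src : children(srcDir)) {
                String name = baseName(src);
                int copy = 0;
                String dst = candidate(dstDir, name, copy);
                while (paths.contains(dst)) {
                    dst = candidate(dstDir, name, ++copy);
                }
                paths.remove(src);
                paths.add(dst);
                moved.add(dst);
            }
            return moved;
        }

        // Every path under dir; the sketch assumes a flat layout with no subdirectories.
        private List<String> children(String dir) {
            List<String> out = new ArrayList<>();
            for (String p : paths) {
                if (p.startsWith(dir + "/")) out.add(p);
            }
            return out;
        }
    }

    // Client-side workaround: one listing plus at least exists + rename per file.
    static List<String> clientSideMerge(FakeNameNode nn, String srcDir, String dstDir) {
        List<String> moved = new ArrayList<>();
        for (String src : nn.list(srcDir)) {
            String name = baseName(src);
            int copy = 0;
            while (true) {
                String dst = candidate(dstDir, name, copy);
                // Time of check: in a real cluster another client could create
                // dst right after this call returns false.
                if (nn.exists(dst)) { copy++; continue; }
                // Time of use: rename refuses to overwrite, so a lost race shows
                // up here as false and the loop tries the next suffix.
                if (nn.rename(src, dst)) { moved.add(dst); break; }
                copy++;
            }
        }
        return moved;
    }
}
```

With one colliding file and one non-colliding file, the client-side loop in this model issues 6 calls (1 listing, 3 for the colliding file, 2 for the other), while the server-side variant issues 1. Because the model's merge holds the namespace lock for the whole loop, no other client can take a name between the check and the move.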

A merge API as outlined above would be immensely useful for Hive and potentially 
for other HDFS users.

[1] https://github.com/apache/hive/blob/release-2.0.0-rc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2576
[2] https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
