Amir Shenavandeh created HADOOP-16775:
-----------------------------------------

             Summary: Hadoop DistCp reuses the same temp file within the task 
for different files.
                 Key: HADOOP-16775
                 URL: https://issues.apache.org/jira/browse/HADOOP-16775
             Project: Hadoop Common
          Issue Type: Improvement
          Components: tools/distcp
    Affects Versions: 2.0
            Reporter: Amir Shenavandeh


Hadoop DistCp reuses the same temp file name for all the files copied within 
each task attempt and then moves each one to its target name, which on S3 is 
also a server-side copy. For copies to S3 this causes inconsistency, because 
S3 is read-after-write consistent only for brand-new objects. The contents of 
overwritten objects on S3 are also not guaranteed to be consistent.

To avoid this, we should randomize the temp file name.  
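
A minimal sketch of what randomizing the name could look like. This is not 
the actual DistCp code: the class TempNames, the method uniqueTempName and the 
".distcp.tmp." prefix are assumptions made for illustration. The idea is just 
to append a random suffix to the per-attempt name so that no two files copied 
in the same task attempt share a temp object key.

```java
import java.util.UUID;

// Hypothetical helper, not part of Hadoop: builds a temp file name that is
// unique per copied file rather than per task attempt.
final class TempNames {

    // Assumed prefix for illustration only.
    private static final String PREFIX = ".distcp.tmp.";

    private TempNames() {
    }

    // Returns PREFIX + taskAttemptId + "." + a random UUID, so repeated calls
    // within one task attempt yield different names.
    static String uniqueTempName(String taskAttemptId) {
        return PREFIX + taskAttemptId + "." + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Example attempt id; the value is made up.
        String attempt = "attempt_0000000000000_0001_m_000000_0";
        System.out.println(uniqueTempName(attempt));
        System.out.println(uniqueTempName(attempt));
    }
}
```

With a scheme like this every temp object is brand new, so the copy to the 
target name never reads a key that was previously overwritten.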

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
