Amir Shenavandeh created HADOOP-16775:
-----------------------------------------

Summary: Hadoop DistCp reuses the same temp file within the task for different files.
Key: HADOOP-16775
URL: https://issues.apache.org/jira/browse/HADOOP-16775
Project: Hadoop Common
Issue Type: Improvement
Components: tools/distcp
Affects Versions: 2.0
Reporter: Amir Shenavandeh

Hadoop DistCp reuses the same temp file name for every file copied within a task attempt, then moves each one to its target name, which is also a server-side copy. For copies to S3, this causes inconsistency, because S3 only guarantees read-after-write consistency for brand-new objects. Reads of overwritten objects on S3 can also return inconsistent contents. To avoid this, we should randomize the temp file name.
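A minimal sketch of the proposed change, assuming a hypothetical helper class `TempNames` and a made-up temp-file prefix (neither is DistCp's actual code): the current behaviour derives one temp name per task attempt, so every file copied by that attempt is written to the same key, while the randomized variant appends a `java.util.UUID` so each file gets a brand-new object key.

```java
import java.util.UUID;

/**
 * Hypothetical sketch, not DistCp's implementation: contrasts one temp name
 * per task attempt with a randomized temp name per copied file.
 */
public final class TempNames {
    // Hypothetical prefix chosen for illustration only.
    public static final String TMP_PREFIX = ".distcp.tmp.";

    private TempNames() {}

    /** Behaviour described in the issue: the same name for every file in the attempt. */
    public static String sharedTempName(String targetDir, String attemptId) {
        return targetDir + "/" + TMP_PREFIX + attemptId;
    }

    /** Proposed behaviour: a random UUID suffix makes each temp object key new. */
    public static String randomizedTempName(String targetDir, String attemptId) {
        return targetDir + "/" + TMP_PREFIX + attemptId + "." + UUID.randomUUID();
    }
}
```

Calling `sharedTempName` twice with the same directory and attempt ID returns the same string, so the second file overwrites the first temp object; calling `randomizedTempName` twice returns two different strings, so each write creates a new object and falls under S3's read-after-write guarantee for new objects.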