Authur Wang created HADOOP-19307:
------------------------------------

             Summary: Add option to add parent directory of source directories to target directories
                 Key: HADOOP-19307
                 URL: https://issues.apache.org/jira/browse/HADOOP-19307
             Project: Hadoop Common
          Issue Type: New Feature
          Components: tools/distcp
    Affects Versions: 3.0.0
         Environment: hadoop 3.3.1
            Reporter: Authur Wang

Currently, we run Hadoop distcp with -update -delete src1/* src2/* dest to keep the source and target directories exactly the same.

When either -update or -overwrite is specified, the *contents* of the source directories are copied to the target, and not the source directories themselves.

Consider a copy from /source/first/ and /source/second/ to /target/, where the source paths have the following contents:

    hdfs://nn1:8020/source/first/1
    hdfs://nn1:8020/source/first/2
    hdfs://nn1:8020/source/second/10
    hdfs://nn1:8020/source/second/20

The command

    distcp2 -update hdfs://nn1:8020/source/first hdfs://nn1:8020/source/second hdfs://nn2:8020/target

would yield the following contents in /target:

    hdfs://nn2:8020/target/1
    hdfs://nn2:8020/target/2
    hdfs://nn2:8020/target/10
    hdfs://nn2:8020/target/20

But sometimes we need to preserve the parent directories, like this:

    hdfs://nn2:8020/target/first/1
    hdfs://nn2:8020/target/first/2
    hdfs://nn2:8020/target/second/10
    hdfs://nn2:8020/target/second/20

So, should we introduce an option, -preserveParentDir, so that the parent directories themselves are also copied when -update or -overwrite is used?
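
To make the two layouts concrete, here is a minimal sketch of the path mapping only. It is not DistCp code and does not call any Hadoop API: the function name plan_copy and the preserve_parent_dir parameter are hypothetical, and -preserveParentDir is the option proposed in this issue, not an existing flag.

```python
# Illustrative model of source -> target path mapping; not DistCp itself.
# plan_copy and preserve_parent_dir are hypothetical names for this sketch.

def plan_copy(listing, target, preserve_parent_dir=False):
    """Map each source file to its target path.

    listing: dict of source directory -> list of file paths under it.
    preserve_parent_dir=False models the -update/-overwrite behaviour
    described above (contents are copied, not the directory itself).
    preserve_parent_dir=True models the proposed option.
    """
    target = target.rstrip("/")
    plan = {}
    for src_dir, files in listing.items():
        src_dir = src_dir.rstrip("/")
        # Last path component of the source directory, e.g. "first".
        parent_name = src_dir.rsplit("/", 1)[-1]
        base = f"{target}/{parent_name}" if preserve_parent_dir else target
        for path in files:
            if not path.startswith(src_dir + "/"):
                raise ValueError(f"{path} is not under {src_dir}")
            relative = path[len(src_dir) + 1:]
            plan[path] = f"{base}/{relative}"
    return plan


# The example from the description above.
LISTING = {
    "hdfs://nn1:8020/source/first": [
        "hdfs://nn1:8020/source/first/1",
        "hdfs://nn1:8020/source/first/2",
    ],
    "hdfs://nn1:8020/source/second": [
        "hdfs://nn1:8020/source/second/10",
        "hdfs://nn1:8020/source/second/20",
    ],
}
TARGET = "hdfs://nn2:8020/target"

if __name__ == "__main__":
    for preserve in (False, True):
        print(f"preserve_parent_dir={preserve}")
        for src, dst in plan_copy(LISTING, TARGET, preserve).items():
            print(f"  {src} -> {dst}")
```

With preserve_parent_dir=False the sketch reproduces the flat /target listing above. With True it produces the /target/first and /target/second layout this issue asks for.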