Authur Wang created HADOOP-19307:
------------------------------------

             Summary: Add option to add parent directory of source directories 
to target directories
                 Key: HADOOP-19307
                 URL: https://issues.apache.org/jira/browse/HADOOP-19307
             Project: Hadoop Common
          Issue Type: New Feature
          Components: tools/distcp
    Affects Versions: 3.0.0
         Environment: hadoop 3.3.1
            Reporter: Authur Wang


Currently, we run Hadoop distcp with -update -delete src1/* src2/* dest to
keep the source and target directories exactly the same. When either -update
or -overwrite is specified, the *contents* of the source directories are
copied to the target, not the source directories themselves.

Consider a copy from /source/first/ and /source/second/ to /target/, where the 
source paths have the following contents:

hdfs://nn1:8020/source/first/1
hdfs://nn1:8020/source/first/2
hdfs://nn1:8020/source/second/10
hdfs://nn1:8020/source/second/20

distcp2 -update hdfs://nn1:8020/source/first hdfs://nn1:8020/source/second 
hdfs://nn2:8020/target


would yield the following contents in /target:

hdfs://nn2:8020/target/1
hdfs://nn2:8020/target/2
hdfs://nn2:8020/target/10
hdfs://nn2:8020/target/20
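
As a rough illustration of the mapping above, here is a minimal Python sketch.
It is not DistCp code: the function name map_update_targets is made up for
this example, and it assumes paths are plain strings. It only models where
each file lands when -update or -overwrite is given.

```python
def map_update_targets(source_roots, files, target_root):
    """Illustrative model of -update/-overwrite path mapping.

    Each file's path relative to its source root is placed directly under
    target_root, so the name of the source directory itself is dropped.
    """
    target = target_root.rstrip("/")
    mapped = []
    for path in files:
        for root in source_roots:
            prefix = root.rstrip("/") + "/"
            if path.startswith(prefix):
                # Keep only the part below the source root.
                mapped.append(target + "/" + path[len(prefix):])
                break
    return mapped


sources = [
    "hdfs://nn1:8020/source/first",
    "hdfs://nn1:8020/source/second",
]
files = [
    "hdfs://nn1:8020/source/first/1",
    "hdfs://nn1:8020/source/first/2",
    "hdfs://nn1:8020/source/second/10",
    "hdfs://nn1:8020/source/second/20",
]
for target_path in map_update_targets(sources, files, "hdfs://nn2:8020/target"):
    print(target_path)
```

Because the directory names "first" and "second" never appear in the result,
files from different sources end up side by side under /target.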

Sometimes, however, we need to preserve the parent directories, like this:

hdfs://nn2:8020/target/first/1
hdfs://nn2:8020/target/first/2
hdfs://nn2:8020/target/second/10
hdfs://nn2:8020/target/second/20

So, should we introduce an option, -preserveParentDir, so that the parent
directories themselves are copied when -update or -overwrite is used?
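
To make the proposal concrete, here is a minimal Python sketch of the intended
path mapping. It is only a model: -preserveParentDir does not exist in DistCp
today, and the function name map_targets and its preserve_parent parameter are
hypothetical names chosen for this example.

```python
import posixpath


def map_targets(source_roots, files, target_root, preserve_parent=False):
    """Illustrative model of the proposed option (not DistCp code).

    preserve_parent=False mimics today's -update/-overwrite mapping.
    preserve_parent=True keeps the last component of each source root,
    which is what the proposed -preserveParentDir would do.
    """
    target = target_root.rstrip("/")
    mapped = []
    for path in files:
        for root in source_roots:
            root = root.rstrip("/")
            prefix = root + "/"
            if path.startswith(prefix):
                relative = path[len(prefix):]
                if preserve_parent:
                    # Re-attach the source directory name, e.g. "first".
                    relative = posixpath.basename(root) + "/" + relative
                mapped.append(target + "/" + relative)
                break
    return mapped


sources = [
    "hdfs://nn1:8020/source/first",
    "hdfs://nn1:8020/source/second",
]
files = [
    "hdfs://nn1:8020/source/first/1",
    "hdfs://nn1:8020/source/second/10",
]
for target_path in map_targets(
    sources, files, "hdfs://nn2:8020/target", preserve_parent=True
):
    print(target_path)
```

With preserve_parent=True, same-named files under different source
directories would map to distinct target paths instead of to the same one.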

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
