[ https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275975#comment-15275975 ]

Harsh J commented on HIVE-13704:
--------------------------------

[~ashutoshc] - The opposite. HADOOP-10459 added the new method call in {{run()}}, so any Hadoop release that includes that fix will no longer execute DistCp correctly in Hive, because Hive skips calling {{run()}}.

> Don't call DistCp.execute() instead of DistCp.run()
> ---------------------------------------------------
>
>                 Key: HIVE-13704
>                 URL: https://issues.apache.org/jira/browse/HIVE-13704
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Harsh J
>            Priority: Critical
>
> HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}}
> method runs added logic that drives the state of {{SimpleCopyListing}}, which
> runs in the driver, and of {{CopyCommitter}}, which runs in the job runtime.
> When Hive ends up running DistCp for copy work (between non-matching
> filesystems or between encrypted/non-encrypted zones, for sizes above a
> configured value), this state is not set, which causes wrong paths to appear
> on the target (subdirs named after the file, instead of just the file).
> Hive should call DistCp's Tool {{run}} method and not the {{execute}} method
> directly, so that it does not skip the target-exists flag that the
> {{setTargetPathExists}} call would set:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)