[ https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589769#comment-16589769 ]
mahesh kumar behera commented on HIVE-13704:
--------------------------------------------

[~ashutoshc] [~spena]
There seems to be a leak of the job object if we call run instead of execute. The issue is in the run method of DistCp, which does not close the job it creates. As per this issue, the problem with calling execute is that setTargetPathExists is not done in execute. Can we do that, and the other settings done in the run method, in Hive and then call distcp.execute instead of distcp.run?

//cc [~thejas] [~anishek] [~sankarh]

> Don't call DistCp.execute() instead of DistCp.run()
> ---------------------------------------------------
>
>                 Key: HIVE-13704
>                 URL: https://issues.apache.org/jira/browse/HIVE-13704
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Harsh J
>            Assignee: Sergio Peña
>            Priority: Critical
>             Fix For: 2.1.1, 2.2.0
>
>         Attachments: HIVE-13704.1.patch
>
>
> HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}}
> method runs added logic that drives the state of {{SimpleCopyListing}} which
> runs in the driver, and of {{CopyCommitter}} which runs in the job runtime.
> When Hive ends up running DistCp for copy work (Between non matching FS or
> between encrypted/non-encrypted zones, for sizes above a configured value)
> this state not being set causes wrong paths to appear on the target (subdirs
> named after the file, instead of just the file).
> Hive should call DistCp's Tool {{run}} method and not the {{execute}} method
> directly, to not skip the target exists flag that the {{setTargetPathExists}}
> call would set:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)