[ 
https://issues.apache.org/jira/browse/HIVE-22548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987441#comment-16987441
 ] 

mahesh kumar behera commented on HIVE-22548:
--------------------------------------------

[~ste...@apache.org]

The directory listing is required by the caller. Previously there were two 
calls to listStatus; these are now merged into a single call. The listing done 
in removeEmptyDpDirectory is reused by removeTempOrDuplicateFiles. The listing 
stays in removeEmptyDpDirectory, which is called in parallel for multiple 
partitions to reduce execution time.
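
To illustrate the idea (this is only a rough sketch, not the code in the 
patch): each partition directory is listed exactly once, the listing is 
returned to the caller for reuse, and the per-partition work is submitted to a 
thread pool. The sketch uses java.nio.file rather than the Hadoop FileSystem 
API so that it is self-contained. The class name ParallelListingSketch and the 
methods listAndRemoveIfEmpty and listPartitions are hypothetical names chosen 
for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: not the Hive implementation.
public class ParallelListingSketch {

    // Lists one partition directory exactly once and deletes it if it is empty.
    // The listing is returned so the caller does not have to list again.
    public static List<Path> listAndRemoveIfEmpty(Path partitionDir) throws IOException {
        List<Path> listing;
        try (Stream<Path> entries = Files.list(partitionDir)) {
            listing = entries.sorted().collect(Collectors.toList());
        }
        if (listing.isEmpty()) {
            Files.delete(partitionDir);
        }
        return listing;
    }

    // Runs the list-and-clean step for all partitions on a fixed-size pool and
    // collects one listing per partition, in the order the partitions were given.
    public static Map<Path, List<Path>> listPartitions(List<Path> partitionDirs, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<Path, Future<List<Path>>> pending = new LinkedHashMap<>();
            for (Path dir : partitionDirs) {
                pending.put(dir, pool.submit(() -> listAndRemoveIfEmpty(dir)));
            }
            Map<Path, List<Path>> result = new LinkedHashMap<>();
            for (Map.Entry<Path, Future<List<Path>>> entry : pending.entrySet()) {
                // get() blocks until that partition's listing is available.
                result.put(entry.getKey(), entry.getValue().get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would pass the returned map on to the duplicate/temp file cleanup 
instead of listing each directory a second time. On cloud storage the second 
listing is the expensive part, so reusing the first one is where the saving 
would come from.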

> Optimise Utilities.removeTempOrDuplicateFiles when moving files to final 
> location
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-22548
>                 URL: https://issues.apache.org/jira/browse/HIVE-22548
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: 3.1.2
>            Reporter: Rajesh Balamohan
>            Assignee: mahesh kumar behera
>            Priority: Major
>         Attachments: HIVE-22548.01.patch
>
>
> {{Utilities.removeTempOrDuplicateFiles}}
> is very slow with cloud storage, as it executes {{listStatus}} twice and also 
> runs in single-threaded mode.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L1629



