[ https://issues.apache.org/jira/browse/HIVE-22548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987441#comment-16987441 ]
mahesh kumar behera commented on HIVE-22548:
--------------------------------------------

[~ste...@apache.org]
The directory listing is required by the caller. Earlier there were two calls to listStatus; these are now merged into one. The listing produced in removeEmptyDpDirectory is reused by removeTempOrDuplicateFiles. The listing stays in removeEmptyDpDirectory, which is called in parallel for multiple partitions to reduce execution time.

> Optimise Utilities.removeTempOrDuplicateFiles when moving files to final location
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-22548
>                 URL: https://issues.apache.org/jira/browse/HIVE-22548
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: 3.1.2
>            Reporter: Rajesh Balamohan
>            Assignee: mahesh kumar behera
>            Priority: Major
>         Attachments: HIVE-22548.01.patch
>
>
> {{Utilities.removeTempOrDuplicateFiles}} is very slow with cloud storage, as it
> executes {{listStatus}} twice and also runs in single-threaded mode.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L1629
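
For illustration only, here is a rough sketch of the pattern described in the comment: list each partition directory once, reuse that listing for the temp-file cleanup, and run the listing for several partitions in parallel. This is not the code from HIVE-22548.01.patch. It uses java.nio.file on a local file system instead of Hadoop's FileSystem, and the class name PartitionCleanupSketch, its method names, and the "_tmp." prefix rule are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names and rules here are not taken from Hive.
final class PartitionCleanupSketch {

    /** Lists a directory exactly once; callers reuse the returned listing. */
    static List<Path> listOnce(Path dir) {
        List<Path> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                entries.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    /** Deletes the directory if it is empty (returns null); otherwise returns its listing. */
    static List<Path> removeEmptyDirectory(Path dir) {
        List<Path> listing = listOnce(dir);
        if (listing.isEmpty()) {
            try {
                Files.delete(dir);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return null;
        }
        return listing;
    }

    /** Works from the listing passed in, so no second listing call is needed. */
    static List<Path> removeTempFiles(List<Path> listing) {
        List<Path> kept = new ArrayList<>();
        for (Path p : listing) {
            // Assumed rule for the example: a "_tmp." prefix marks a temp file.
            if (p.getFileName().toString().startsWith("_tmp.")) {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            } else {
                kept.add(p);
            }
        }
        return kept;
    }

    /** Lists all partitions in parallel, then cleans each one from its single listing. */
    static Map<Path, List<Path>> cleanPartitions(List<Path> partitions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<Path, Future<List<Path>>> pending = new LinkedHashMap<>();
            for (Path partition : partitions) {
                Callable<List<Path>> task = () -> removeEmptyDirectory(partition);
                pending.put(partition, pool.submit(task));
            }
            Map<Path, List<Path>> result = new LinkedHashMap<>();
            for (Map.Entry<Path, Future<List<Path>>> e : pending.entrySet()) {
                List<Path> listing = e.getValue().get();
                if (listing != null) {
                    result.put(e.getKey(), removeTempFiles(listing));
                }
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the sketch is that each partition directory is listed once per run, and only that listing step is fanned out across threads. On cloud storage, where every listing is a remote call, this is the saving the comment describes.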