[ https://issues.apache.org/jira/browse/HIVE-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ning Zhang updated HIVE-1852:
-----------------------------
Attachment: HIVE-1852.3.patch
Incorporated Hairong's and Joydeep's comments.
The original implementation assumed srcf to be a path potentially containing
wildcards, but in the current code path wildcards in 'LOAD DATA' commands
are handled differently: a copy task handles the wildcards first, followed
by a move task that calls the replaceFiles() function. So srcf should be a
single leaf directory (although we don't prevent subdirectories inside
srcf).
Because of this, I simplified the function by renaming srcf to destf and
eliminating tmppath. The .3 patch passed all unit tests.
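For readers who want to see the difference in call counts, here is a minimal
sketch, not the code in the patch. It uses java.nio.file on the local
filesystem as a stand-in for DFS, and the class and method names
(RenameSketch, renamePerFile, renameWholeDirectory) are hypothetical. It
assumes the destination directory does not exist yet and sits on the same
filesystem as the source.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only: local filesystem stands in for DFS.
public class RenameSketch {

    // Old behaviour: one rename per entry in srcDir, so N entries cost N calls.
    static int renamePerFile(Path srcDir, Path destDir) throws IOException {
        Files.createDirectories(destDir);
        // Snapshot the listing first so we do not move entries while iterating.
        List<Path> entries;
        try (Stream<Path> listing = Files.list(srcDir)) {
            entries = listing.collect(Collectors.toList());
        }
        int renameCalls = 0;
        for (Path entry : entries) {
            Files.move(entry, destDir.resolve(entry.getFileName()));
            renameCalls++;
        }
        Files.delete(srcDir); // source directory is empty now
        return renameCalls;
    }

    // Proposed behaviour: a single rename of the whole directory.
    // Assumes destDir does not exist and is on the same filesystem.
    static int renameWholeDirectory(Path srcDir, Path destDir) throws IOException {
        Files.move(srcDir, destDir);
        return 1;
    }

    // Helper for the demo: create a directory holding `count` empty files.
    static Path makeDir(Path parent, String name, int count) throws IOException {
        Path dir = Files.createDirectory(parent.resolve(name));
        for (int i = 0; i < count; i++) {
            Files.createFile(dir.resolve("part-" + i));
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("rename-sketch");
        int perFile = renamePerFile(makeDir(root, "src1", 3), root.resolve("dest1"));
        int whole = renameWholeDirectory(makeDir(root, "src2", 3), root.resolve("dest2"));
        System.out.println("per-file rename calls: " + perFile);
        System.out.println("whole-directory rename calls: " + whole);
    }
}
```

On a real cluster every rename is a namenode operation, so the per-file
variant grows with the number of files while the directory-level variant
stays at one call.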
> Reduce unnecessary DFSClient.rename() calls
> -------------------------------------------
>
> Key: HIVE-1852
> URL: https://issues.apache.org/jira/browse/HIVE-1852
> Project: Hive
> Issue Type: Improvement
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: HIVE-1852.2.patch, HIVE-1852.3.patch, HIVE-1852.patch
>
>
> On the Hive client side (MoveTask etc.), DFSClient.rename() is called for every
> file inside a directory. This is very expensive for a large directory on a busy
> DFS namenode. We should replace it with a single rename() call on the whole
> directory.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.