[ 
https://issues.apache.org/jira/browse/HIVE-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973384#action_12973384
 ] 

Ning Zhang commented on HIVE-1852:
----------------------------------

@joydeep, I've tested a case like LOAD DATA LOCAL INPATH '/dir/*.txt' INTO 
TABLE blah, and it worked. Is this what you meant by wildcards? If so, I'll add 
a test case if one doesn't already exist. The plan for that statement first 
generates a CopyTask from /dir/*.txt to a temp dir, say /dir/tmp-1000, and then 
a MoveTask that moves /dir/tmp-1000 to the table's destination location. 
replaceFiles is only called in the MoveTask, so that worked. 
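
To illustrate the final move step and why this issue cares about it, here is a 
minimal sketch comparing one rename per file with a single rename of the whole 
staging directory. It uses java.nio.file on a local filesystem as a stand-in; 
it is not Hive's MoveTask or the DFSClient API, and the names RenameSketch, 
moveFileByFile and moveWholeDirectory are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only: local-filesystem analogy, not Hive code.
class RenameSketch {

    // Per-file approach: issues one rename for every entry in srcDir.
    // Returns the number of rename calls made.
    static int moveFileByFile(Path srcDir, Path destDir) throws IOException {
        Files.createDirectories(destDir);
        List<Path> files;
        // Snapshot the listing first so we do not modify the directory
        // while iterating over it.
        try (Stream<Path> listing = Files.list(srcDir)) {
            files = listing.collect(Collectors.toList());
        }
        int renames = 0;
        for (Path file : files) {
            Files.move(file, destDir.resolve(file.getFileName()));
            renames++;
        }
        return renames;
    }

    // Whole-directory approach: a single rename of the staging directory.
    // Assumes destDir does not exist yet and its parent does.
    static int moveWholeDirectory(Path srcDir, Path destDir) throws IOException {
        Files.move(srcDir, destDir);
        return 1;
    }
}
```

With N files in the staging directory, the first method makes N rename calls 
and the second makes one, which is the saving the issue is after on a busy 
namenode.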

Also, the only reason checkPath is there for replaceFiles is to check for nested 
subdirectories. For INSERT OVERWRITE (whether with dynamic or static 
partitions), the temporary directory should not contain any subdirectories. For 
LOAD DATA commands, Namit said we don't have any use cases for loading a 
directory containing subdirectories, but it should be enabled. That's why I 
removed the check. But if we want to ensure there are no subdirectories, I can 
bring it back. 
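
If the check is brought back, the idea could look roughly like the sketch 
below. This is not Hive's checkPath; it is a local-filesystem analogy using 
java.nio.file, and the names SubdirCheckSketch and hasSubdirectories are 
hypothetical. It only inspects the immediate children of the directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration only: not Hive's checkPath implementation.
class SubdirCheckSketch {

    // Returns true if at least one immediate child of dir is a directory.
    static boolean hasSubdirectories(Path dir) throws IOException {
        try (Stream<Path> children = Files.list(dir)) {
            return children.anyMatch(Files::isDirectory);
        }
    }
}
```

A caller that wants to forbid nested directories would reject the load when 
hasSubdirectories returns true, and otherwise proceed to the move.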

> Reduce unnecessary DFSClient.rename() calls
> -------------------------------------------
>
>                 Key: HIVE-1852
>                 URL: https://issues.apache.org/jira/browse/HIVE-1852
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1852.2.patch, HIVE-1852.3.patch, HIVE-1852.patch
>
>
> On the Hive client side (MoveTask etc.), DFSClient.rename() is called for 
> every file inside a directory. This is very expensive for a large directory 
> on a busy DFS namenode. We should replace it with a single rename() call on 
> the whole directory. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
