[ https://issues.apache.org/jira/browse/HIVE-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973384#action_12973384 ]
Ning Zhang commented on HIVE-1852:
----------------------------------

@joydeep, I've tested a case like

    LOAD DATA LOCAL INPATH '/dir/*.txt' INTO TABLE blah

and it worked. Is this what you meant by wildcards? If so, I'll add a test case if one does not exist already.

The plan for that statement is to first generate a CopyTask that copies /dir/*.txt to a temp dir, say /dir/tmp-1000, followed by a MoveTask that moves /dir/tmp-1000 to the table's destination location. replaceFiles is only called in the MoveTask, so that worked.

Also, the only reason checkPath is there for replaceFiles is to check for nested subdirectories. For INSERT OVERWRITE (whether with dynamic or static partitions), the temporary directory should not contain any subdirectories. For LOAD DATA commands, Namit said we don't have any use cases for loading a directory containing subdirectories, but that it should be allowed. That's why I removed the check. But if we want to ensure there are no subdirectories, I can bring it back.

> Reduce unnecessary DFSClient.rename() calls
> -------------------------------------------
>
>                 Key: HIVE-1852
>                 URL: https://issues.apache.org/jira/browse/HIVE-1852
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1852.2.patch, HIVE-1852.3.patch, HIVE-1852.patch
>
>
> In Hive client side (MoveTask etc), DFSClient.rename() is called for every
> file inside a directory. It is very expensive for a large directory in a busy
> DFS namenode. We should replace it with a single rename() call on the whole
> directory.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
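
As a rough illustration of the change described in the issue above (one rename per file versus a single rename of the whole directory), here is a minimal sketch in Python. It uses the local filesystem as a stand-in for DFS and is not Hive's code: the function names (`move_per_file`, `move_whole_dir`, `make_temp_dir`, `demo`) and the directory names are hypothetical, and the single-rename variant assumes the destination does not exist yet and is on the same filesystem.

```python
import os
import tempfile


def move_per_file(src_dir, dest_dir):
    """Move each file individually: one rename call per file."""
    os.makedirs(dest_dir, exist_ok=True)
    calls = 0
    for name in sorted(os.listdir(src_dir)):
        os.rename(os.path.join(src_dir, name), os.path.join(dest_dir, name))
        calls += 1
    return calls


def move_whole_dir(src_dir, dest_dir):
    """Move the directory itself: a single rename call.

    Assumption: dest_dir does not exist yet and is on the same filesystem.
    """
    os.rename(src_dir, dest_dir)
    return 1


def make_temp_dir(root, name, n_files):
    """Create a flat directory (no subdirectories) holding n_files small files."""
    path = os.path.join(root, name)
    os.makedirs(path)
    for i in range(n_files):
        with open(os.path.join(path, f"part-{i:05d}"), "w") as f:
            f.write("row\n")
    return path


def demo(n_files=100):
    """Return (rename calls per-file, rename calls whole-dir, files moved by whole-dir)."""
    with tempfile.TemporaryDirectory() as root:
        tmp_a = make_temp_dir(root, "tmp-a", n_files)
        tmp_b = make_temp_dir(root, "tmp-b", n_files)
        per_file_calls = move_per_file(tmp_a, os.path.join(root, "table-a"))
        whole_dir_calls = move_whole_dir(tmp_b, os.path.join(root, "table-b"))
        moved = len(os.listdir(os.path.join(root, "table-b")))
        return per_file_calls, whole_dir_calls, moved
```

Calling `demo(n)` returns the number of rename calls made by each approach and the number of files that ended up in the destination after the single directory rename. On a real DFS every rename is a request to the namenode, so the call count is the cost the issue aims to reduce.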