[ https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806158#comment-15806158 ]
Vihang Karajgaonkar commented on HIVE-14165:
--------------------------------------------

Thanks for the patch [~stakiar]. It seems the previous implementation ignored zero-length files when computing the splits, whereas FileInputFormat.getSplits() creates an empty split for each zero-length file. I am not sure how this affects execution; it may be worthwhile to test. Also, if needed, you could filter out the empty splits before adding them to {{FetchInputFormatSplit[] inputSplit}}.

> Remove Hive file listing during split computation
> -------------------------------------------------
>
>                 Key: HIVE-14165
>                 URL: https://issues.apache.org/jira/browse/HIVE-14165
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0
>            Reporter: Abdullah Yousufi
>            Assignee: Sahil Takiar
>         Attachments: HIVE-14165.02.patch, HIVE-14165.03.patch,
> HIVE-14165.04.patch, HIVE-14165.05.patch, HIVE-14165.06.patch,
> HIVE-14165.patch
>
>
> The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's
> FileInputFormat.java will list the files during split computation anyway to
> determine their size. One way to remove this is to catch the
> InvalidInputFormat exception thrown by FileInputFormat#getSplits() on the
> Hive side instead of doing the file listing beforehand.
> For S3 select queries on partitioned tables, this results in a 2x speedup.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
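
To illustrate the suggestion in the comment above about skipping empty splits, here is a minimal, self-contained sketch in Java. It is not the code from the patch: the `Split` interface, `SimpleSplit` class, and `dropEmptySplits` method are hypothetical stand-ins (Hadoop's real `org.apache.hadoop.mapred.InputSplit` exposes a `getLength()` method, which is the only property assumed here), and the file names are made up. Whether empty splits actually affect execution in FetchOperator is still the open question raised in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptySplitFilter {

    /** Hypothetical stand-in for a split; only its length matters here. */
    public interface Split {
        long getLength();
    }

    /** Simple illustrative split: a (made-up) path plus a byte length. */
    public static final class SimpleSplit implements Split {
        private final String path;
        private final long length;

        public SimpleSplit(String path, long length) {
            this.path = path;
            this.length = length;
        }

        @Override
        public long getLength() {
            return length;
        }

        @Override
        public String toString() {
            return path + ":" + length;
        }
    }

    /** Returns only the splits with a positive length, preserving order. */
    public static <T extends Split> List<T> dropEmptySplits(List<T> splits) {
        List<T> nonEmpty = new ArrayList<>(splits.size());
        for (T split : splits) {
            if (split.getLength() > 0) {
                nonEmpty.add(split);
            }
        }
        return nonEmpty;
    }

    public static void main(String[] args) {
        // Two zero-length files and one file with data (assumed example input).
        List<SimpleSplit> splits = Arrays.asList(
                new SimpleSplit("part-00000", 0L),
                new SimpleSplit("part-00001", 128L),
                new SimpleSplit("part-00002", 0L));
        System.out.println(dropEmptySplits(splits));
    }
}
```

In the real code path the filtered list would be what gets wrapped into the {{FetchInputFormatSplit[] inputSplit}} array, so zero-length files would be skipped as they were before the listing was removed.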