[ https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767035#comment-15767035 ]
Thomas Poepping commented on HIVE-14165:
----------------------------------------

Hi Sahil,

When you update the patch, can you create a new ReviewBoard submission?

WRT the InputFormat issue, my feeling is that we should steer away from backwards-incompatible changes. Is there no way to keep backwards compatibility and still avoid the unnecessary listing?

I will be able to provide more targeted feedback once the RB submission has been updated.

> Remove Hive file listing during split computation
> -------------------------------------------------
>
>                 Key: HIVE-14165
>                 URL: https://issues.apache.org/jira/browse/HIVE-14165
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0
>            Reporter: Abdullah Yousufi
>            Assignee: Sahil Takiar
>         Attachments: HIVE-14165.02.patch, HIVE-14165.03.patch,
> HIVE-14165.04.patch, HIVE-14165.05.patch, HIVE-14165.06.patch,
> HIVE-14165.patch
>
>
> The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's
> FileInputFormat.java will list the files during split computation anyway to
> determine their size. One way to remove this is to catch the
> InvalidInputFormat exception thrown by FileInputFormat#getSplits() on the
> Hive side instead of doing the file listing beforehand.
> For S3 select queries on partitioned tables, this results in a 2x speedup.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
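
For readers following the thread, the idea in the issue description (skip the up-front listing and instead catch the exception raised during split computation) can be sketched roughly as below. This is only an illustration, not the code in the attached patches: CountingFs, getSplits, splitsWithPreListing, and splitsWithCatch are hypothetical stand-ins, and the nested InvalidInputException merely imitates the exception Hadoop raises for a missing input path. The sketch counts list calls to show why the pre-listing is redundant.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitListingSketch {
    // Hypothetical stand-in for the exception thrown when an input path is missing.
    static class InvalidInputException extends IOException {
        InvalidInputException(String message) { super(message); }
    }

    // Hypothetical in-memory file system that counts how often a directory is listed.
    static class CountingFs {
        private final Map<String, List<String>> dirs = new HashMap<>();
        int listCalls = 0;

        void addDir(String dir, String... files) {
            dirs.put(dir, Arrays.asList(files));
        }

        List<String> list(String dir) throws InvalidInputException {
            listCalls++;
            List<String> files = dirs.get(dir);
            if (files == null) {
                throw new InvalidInputException("Input path does not exist: " + dir);
            }
            return files;
        }
    }

    // Stand-in for split computation: it lists the directory on its own,
    // one "split" per file in this simplified model.
    static List<String> getSplits(CountingFs fs, String dir) throws InvalidInputException {
        List<String> splits = new ArrayList<>();
        for (String file : fs.list(dir)) {
            splits.add(dir + "/" + file);
        }
        return splits;
    }

    // Pre-listing approach: list once to check the input, then list again inside getSplits.
    static List<String> splitsWithPreListing(CountingFs fs, String dir) {
        try {
            if (fs.list(dir).isEmpty()) {
                return Collections.emptyList();
            }
            return getSplits(fs, dir);
        } catch (InvalidInputException e) {
            return Collections.emptyList();
        }
    }

    // Catch-based approach: call getSplits directly and treat a missing path as "no splits".
    static List<String> splitsWithCatch(CountingFs fs, String dir) {
        try {
            return getSplits(fs, dir);
        } catch (InvalidInputException e) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        CountingFs fs = new CountingFs();
        fs.addDir("/warehouse/t/p=1", "000000_0", "000001_0");

        splitsWithPreListing(fs, "/warehouse/t/p=1");
        System.out.println("pre-listing list calls: " + fs.listCalls);

        fs.listCalls = 0;
        splitsWithCatch(fs, "/warehouse/t/p=1");
        System.out.println("catch-based list calls: " + fs.listCalls);
    }
}
```

On an object store such as S3, where each list call is a remote request, dropping the extra listing per partition is the kind of saving the description attributes the speedup to. Whether this can be done without changing the InputFormat contract is the open question raised in the comment.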