[ https://issues.apache.org/jira/browse/HIVE-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-5834:
--------------------------

    Attachment: HIVE-5834.00-tez.patch

Tested with a count(1) with a filter

For a table of 1500 x 70 MB ORC files

Before = 26 seconds
After = 18 seconds

For a table of 23699 x ~2 MB ORC files

Before = 32.9 seconds
After = 23.0 seconds

> Avoid reading ORC footers for files which will not be split in OrcInputFormat::getSplits()
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5834
>                 URL: https://issues.apache.org/jira/browse/HIVE-5834
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tez
>    Affects Versions: 0.13.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>         Attachments: HIVE-5834.00-tez.patch
>
>
> OrcInputFormat::getSplits() fires off a SplitGenerator task for every file in the task.
> The footer & data are on the same node for all files with only 1 hdfs block.
> On top of that, it will never need a further split as long as its total size is < context.maxSize.
> Reading that footer locally is faster than reading it in the split gen and sending it from the AM.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
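
The condition described in the issue (a file that fits in one HDFS block and is smaller than context.maxSize never needs its footer read during split generation) can be sketched roughly as below. This is only an illustration and is not taken from HIVE-5834.00-tez.patch: the class FooterSkipSketch, the method canSkipFooterRead, and the 128 MB block size and 256 MB maximum split size used in main are all hypothetical. The single-block check is approximated as "file length does not exceed the block size".

```java
// Hypothetical sketch; not the code from the attached patch.
final class FooterSkipSketch {
    private FooterSkipSketch() {}

    /**
     * Returns true when split generation could skip reading the ORC footer:
     * the file fits in a single block (so footer and data sit on the same node)
     * and it is smaller than the maximum split size (so it is never split further).
     */
    static boolean canSkipFooterRead(long fileLength, long blockSize, long maxSplitSize) {
        if (fileLength < 0 || blockSize <= 0 || maxSplitSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        boolean singleBlock = fileLength <= blockSize;   // approximates "only 1 hdfs block"
        boolean neverSplit = fileLength < maxSplitSize;  // "total size is < context.maxSize"
        return singleBlock && neverSplit;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;     // assumed value, not from the issue
        long maxSplitSize = 256 * mb;  // assumed value, not from the issue

        // File sizes of 70 MB and ~2 MB mirror the two test tables mentioned above.
        System.out.println(canSkipFooterRead(70 * mb, blockSize, maxSplitSize));   // true
        System.out.println(canSkipFooterRead(2 * mb, blockSize, maxSplitSize));    // true
        System.out.println(canSkipFooterRead(300 * mb, blockSize, maxSplitSize));  // false
    }
}
```

Under these assumed sizes, every file in both test tables would qualify, so the whole file becomes one split and the footer is read locally by the task rather than in the split generator.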