[ https://issues.apache.org/jira/browse/HIVE-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gunther Hagleitner updated HIVE-5834:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to branch. Thanks Gopal!

> Avoid reading ORC footers for files which will not be split in OrcInputFormat::getSplits()
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5834
>                 URL: https://issues.apache.org/jira/browse/HIVE-5834
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tez
>    Affects Versions: tez-branch
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: perfomance, split
>             Fix For: tez-branch
>
>         Attachments: HIVE-5834.00-tez.patch
>
>
> OrcInputFormat::getSplits() fires off a SplitGenerator task for every file in the task.
> The footer & data are on the same node for all files with only 1 hdfs block.
> On top of that, it will never need a further split as long as its total size is < context.maxSize.
> Reading that footer locally is faster than reading it in the split gen and sending it from the AM.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
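
For readers skimming the archive, the condition in the quoted description can be sketched roughly as below. This is only an illustration and is not taken from HIVE-5834.00-tez.patch: the class name SplitDecisionSketch, the method needsFooterRead, and its three parameters are hypothetical, and maxSplitSize merely stands in for the context.maxSize mentioned in the description. It assumes that "only 1 hdfs block" can be approximated by comparing the file length with the HDFS block size.

```java
// Hypothetical sketch of the check described in the issue; not the actual patch.
final class SplitDecisionSketch {

    /**
     * Returns true when split generation would still need to read the ORC
     * footer, and false when the file can be emitted as one split unread.
     *
     * fileLength    total size of the file in bytes
     * hdfsBlockSize HDFS block size of the file in bytes
     * maxSplitSize  assumed stand-in for context.maxSize, in bytes
     */
    static boolean needsFooterRead(long fileLength, long hdfsBlockSize, long maxSplitSize) {
        // One block only: footer and data live on the same node.
        boolean singleBlock = fileLength <= hdfsBlockSize;
        // Smaller than the maximum split size: no further split is ever needed.
        boolean neverSplit = fileLength < maxSplitSize;
        return !(singleBlock && neverSplit);
    }
}
```

With a 128 MB block size and a 256 MB maximum split size, a 64 MB file would skip the footer read under this sketch, while a 200 MB file would not.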