[ https://issues.apache.org/jira/browse/HIVE-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-5834:
--------------------------
    Fix Version/s: tez-branch
           Labels: perfomance split  (was: )
Affects Version/s:     (was: 0.13.0)
                   tez-branch
     Release Note: Avoid reading ORC footers for files where data and footer are in the same HDFS block
           Status: Patch Available  (was: Open)

> Avoid reading ORC footers for files which will not be split in OrcInputFormat::getSplits()
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5834
>                 URL: https://issues.apache.org/jira/browse/HIVE-5834
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tez
>    Affects Versions: tez-branch
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: perfomance, split
>             Fix For: tez-branch
>
>         Attachments: HIVE-5834.00-tez.patch
>
>
> OrcInputFormat::getSplits() fires off a SplitGenerator task for every file in the task.
> The footer & data are on the same node for all files with only 1 hdfs block.
> On top of that, it will never need a further split as long as its total size is < context.maxSize.
> Reading that footer locally is faster than reading it in the split gen and sending it from the AM.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
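
The condition described in the issue can be sketched roughly as follows. This is only an illustration of the two checks named in the description (a single HDFS block, and a total size below context.maxSize); it is not taken from HIVE-5834.00-tez.patch. The class name FooterReadCheck, the method canSkipFooterRead and its parameters are hypothetical, and the sketch assumes that a file no longer than its own HDFS block size occupies exactly one block.

```java
// Hypothetical sketch; none of these names come from the HIVE-5834 patch.
public final class FooterReadCheck {
    private FooterReadCheck() {}

    /**
     * Returns true when split generation could skip reading the ORC footer:
     * the file fits in one HDFS block (so footer and data share a node) and
     * it is smaller than the maximum split size (so it is never split further).
     *
     * @param fileLength    file length in bytes
     * @param hdfsBlockSize block size of this file in bytes
     * @param maxSplitSize  stand-in for context.maxSize, in bytes
     */
    public static boolean canSkipFooterRead(long fileLength,
                                            long hdfsBlockSize,
                                            long maxSplitSize) {
        if (fileLength < 0 || hdfsBlockSize <= 0) {
            // Invalid metadata: fall back to the normal split-generation path.
            return false;
        }
        // Assumption: length <= block size means the file has a single block.
        boolean singleBlock = fileLength <= hdfsBlockSize;
        // The description requires total size < context.maxSize (strict).
        boolean neverSplit = fileLength < maxSplitSize;
        return singleBlock && neverSplit;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 64 MB file, 128 MB block, 256 MB max split: footer read can be skipped.
        System.out.println(canSkipFooterRead(64 * mb, 128 * mb, 256 * mb));
        // 200 MB file spans two 128 MB blocks: footer is still read up front.
        System.out.println(canSkipFooterRead(200 * mb, 128 * mb, 256 * mb));
    }
}
```

When the check returns true, the file would be emitted as a single split and the task reading it would load the footer locally instead of receiving it from the AM.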