[ https://issues.apache.org/jira/browse/HIVE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai resolved HIVE-4868.
----------------------------
    Resolution: Duplicate

> When reading an ORC file by an MR job, some Mappers may not be able to process data in some cases
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4868
>                 URL: https://issues.apache.org/jira/browse/HIVE-4868
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> Let's say a stripe of an ORC file is 256 MB and we set the split size for an
> MR job to 64 MB. Right now, splits are created based on byte ranges.
> Here is an example:
> {code}
> |<-The start of a stripe                |<-The end of a stripe
> v                                       v
> |---------------------------------------|
>      ^                           ^
>      |<- The start of a split    |<- The end of a split
> {code}
> So, for some Mappers, it is possible that there is no start of a stripe
> within the byte range of a split. Those Mappers will process 0 records. We can
> improve how splits are created for ORC.
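
To make the arithmetic in the description concrete, here is a minimal sketch in Python. It is not Hive's split-generation code. It assumes a simplified file of two 256 MB stripes with no header or footer bytes, and it assumes a Mapper reads a stripe only when that stripe's start offset falls inside the split's byte range, as the description states. All function names are hypothetical, and the stripe-aligned variant is only one possible way to create splits, not necessarily the fix adopted in the duplicate issue.

```python
MB = 1024 * 1024


def byte_range_splits(file_len, split_size):
    """Cut [0, file_len) into fixed-size byte ranges (start, end)."""
    return [(start, min(start + split_size, file_len))
            for start in range(0, file_len, split_size)]


def stripes_per_split(stripe_starts, splits):
    """For each split, list the stripes whose start offset lies in [start, end)."""
    return [[s for s in stripe_starts if lo <= s < hi] for lo, hi in splits]


def stripe_aligned_splits(stripe_starts, file_len):
    """Hypothetical alternative: one split per stripe, cut at stripe boundaries."""
    ends = stripe_starts[1:] + [file_len]
    return list(zip(stripe_starts, ends))


# Assumed layout: two 256 MB stripes, header/footer ignored for simplicity.
file_len = 512 * MB
stripe_starts = [0, 256 * MB]

splits = byte_range_splits(file_len, 64 * MB)
assigned = stripes_per_split(stripe_starts, splits)
empty = sum(1 for stripes in assigned if not stripes)
print(len(splits), empty)  # prints "8 6": 6 of 8 Mappers get no stripe

aligned = stripe_aligned_splits(stripe_starts, file_len)
aligned_assigned = stripes_per_split(stripe_starts, aligned)
```

With these assumed sizes, only the splits beginning at 0 MB and 256 MB contain a stripe start, so the other six Mappers process 0 records. Cutting at stripe boundaries gives every split exactly one stripe, at the cost of fewer, larger splits.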