[ 
https://issues.apache.org/jira/browse/HIVE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764500#comment-13764500
 ] 

Yin Huai commented on HIVE-4868:
--------------------------------

HIVE-5102 will address this issue.
                
> When reading an ORC file by an MR job, some Mappers may not be able to 
> process data in some cases
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4868
>                 URL: https://issues.apache.org/jira/browse/HIVE-4868
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> Let's say a stripe of an ORC file is 256 MB and we set the split size for an 
> MR job to 64 MB. Right now, splits are created based on byte ranges. 
> Here is an example:
> {code}
> |<-The start of a stripe                |<-The end of a stripe
> v                                       v
> |---------------------------------------|
>    ^                        ^ 
>    |<- The start of a split |<- The end of a split
> {code}
> So, for some Mappers, it is possible that there is no start of a stripe 
> within the byte range of a split. Those Mappers will process 0 records. We can 
> improve how splits are created for ORC.
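
The effect described above can be reproduced with a small simulation. This is only a sketch under stated assumptions, not Hive's or ORC's actual split logic: the helper names `byte_range_splits` and `stripes_per_split` are hypothetical, the file is assumed to hold exactly two 256 MB stripes, file header and footer bytes are ignored, and a Mapper is assumed to read a stripe only when the stripe's start offset falls inside its split.

```python
MB = 1024 * 1024


def byte_range_splits(file_length, split_size):
    """Return (start, end) byte ranges covering the file; end is exclusive."""
    return [(start, min(start + split_size, file_length))
            for start in range(0, file_length, split_size)]


def stripes_per_split(stripe_starts, splits):
    """For each split, list the stripe start offsets that fall inside it."""
    return [[s for s in stripe_starts if start <= s < end]
            for start, end in splits]


# Assumed layout: two 256 MB stripes, 64 MB splits (sizes from the issue text).
stripe_size = 256 * MB
file_length = 2 * stripe_size
stripe_starts = list(range(0, file_length, stripe_size))

splits = byte_range_splits(file_length, 64 * MB)
assignment = stripes_per_split(stripe_starts, splits)

# Splits that contain no stripe start correspond to Mappers with no input.
empty = sum(1 for stripes in assignment if not stripes)
print(f"{len(splits)} splits, {empty} contain no stripe start")
```

Under these assumptions only the splits that begin at a stripe boundary get any work, so three out of every four Mappers are launched with nothing to read.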
