Abhishek Somani created HIVE-15390:
--------------------------------------

             Summary: Orc reader unnecessarily reading stripe footers with 
hive.optimize.index.filter set to true
                 Key: HIVE-15390
                 URL: https://issues.apache.org/jira/browse/HIVE-15390
             Project: Hive
          Issue Type: Bug
          Components: ORC
    Affects Versions: 1.2.1
            Reporter: Abhishek Somani
            Assignee: Abhishek Somani


When a task is given a split, its ORC reader unnecessarily reads stripe 
footers for stripes that the task is not responsible for reading. This 
happens when hive.optimize.index.filter is set to true.

Assuming one split per task (no Tez grouping considered), a task should not need 
to read beyond the split's end offset. Even with split computation strategies 
where a split's end offset can fall in the middle of a stripe, the task should 
not need to read more than one stripe beyond the split's end offset (to finish 
reading a stripe that started inside the split). However, I see that some tasks 
make unnecessary filesystem calls to read every stripe footer in the file, from 
the split's start offset to the end of the file.
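
To make the expected behaviour concrete, here is a minimal sketch in Python (not the actual Hive/ORC reader code). It assumes a stripe belongs to a split when the stripe's start offset falls inside the split's byte range. The names Stripe, stripes_for_split and max_read_offset are hypothetical and exist only for this illustration.

```python
from typing import List, NamedTuple


class Stripe(NamedTuple):
    """Hypothetical stand-in for a stripe's position in the file."""
    offset: int  # byte offset where the stripe starts
    length: int  # total stripe length in bytes, footer included


def stripes_for_split(stripes: List[Stripe], split_start: int,
                      split_end: int) -> List[Stripe]:
    # Assumption: a task owns exactly the stripes that START inside
    # its split, i.e. split_start <= offset < split_end.
    return [s for s in stripes if split_start <= s.offset < split_end]


def max_read_offset(stripes: List[Stripe], split_start: int,
                    split_end: int) -> int:
    # Furthest byte the task should need to touch: the end of the last
    # stripe it owns. If it owns no stripe, it needs nothing past its start.
    owned = stripes_for_split(stripes, split_start, split_end)
    return max((s.offset + s.length for s in owned), default=split_start)


# Four stripes of 100 bytes each; the split [50, 250) ends mid-stripe.
stripes = [Stripe(0, 100), Stripe(100, 100), Stripe(200, 100), Stripe(300, 100)]
owned = stripes_for_split(stripes, 50, 250)
limit = max_read_offset(stripes, 50, 250)
```

In this example the task owns the stripes starting at 100 and 200, so it may read up to offset 300, which is 50 bytes past the split's end. Under these assumptions, the footer of the stripe starting at 300 should never be requested by this task.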



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
