[ 
https://issues.apache.org/jira/browse/HIVE-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805459#comment-13805459
 ] 

Eric Hanson commented on HIVE-5632:
-----------------------------------

Have you considered adding min/max metadata at the split (as opposed to stripe) 
level? If there are, say, 1 million rows per split, you could decide whether to 
skip a split in about 1/100th of the time it takes to check the 100 stripes of 
10,000 rows each within that split. 

Having hierarchical min/max metadata may be a good idea, both at the split and 
stripe level.
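
To make the idea concrete, here is a minimal, self-contained sketch of a 
two-level check. It is not taken from the patch and does not use the ORC reader 
API: the Range and Split classes, the stripesToRead method, and the equality 
predicate on a single long column are all hypothetical names chosen for 
illustration. The split-level range is derived from the stripe ranges, and the 
counter records how many min/max comparisons each lookup needs.

```java
import java.util.ArrayList;
import java.util.List;

public class HierarchicalMinMax {
    /** Closed [min, max] range of one column, as min/max metadata would record it. */
    static final class Range {
        final long min, max;
        Range(long min, long max) { this.min = min; this.max = max; }
        boolean mightContain(long v) { return v >= min && v <= max; }
    }

    /** Hypothetical split: per-stripe ranges plus one range covering all of them. */
    static final class Split {
        final Range range;
        final List<Range> stripes;
        Split(List<Range> stripes) {
            long lo = Long.MAX_VALUE, hi = Long.MIN_VALUE;
            for (Range r : stripes) {
                lo = Math.min(lo, r.min);
                hi = Math.max(hi, r.max);
            }
            this.range = new Range(lo, hi);
            this.stripes = stripes;
        }
    }

    /**
     * Returns the indices of stripes that may hold rows matching col = value.
     * checks[0] is incremented once per min/max comparison performed.
     */
    static List<Integer> stripesToRead(Split split, long value, int[] checks) {
        List<Integer> keep = new ArrayList<>();
        checks[0]++;                      // one split-level check
        if (!split.range.mightContain(value)) {
            return keep;                  // whole split skipped, stripes never inspected
        }
        for (int i = 0; i < split.stripes.size(); i++) {
            checks[0]++;                  // one check per stripe
            if (split.stripes.get(i).mightContain(value)) {
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // 100 stripes of 10,000 consecutive values each: 1 million rows per split.
        List<Range> stripes = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            stripes.add(new Range(i * 10_000L, i * 10_000L + 9_999L));
        }
        Split split = new Split(stripes);

        int[] checks = {0};
        System.out.println(stripesToRead(split, 5_000_000L, checks) + " checks=" + checks[0]);

        checks[0] = 0;
        System.out.println(stripesToRead(split, 123_456L, checks) + " checks=" + checks[0]);
    }
}
```

In this sketch a value outside the split's range costs one comparison instead 
of 100, while a value inside it costs one extra comparison on top of the 100 
stripe checks, which is the trade-off the hierarchical layout makes.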

> Eliminate splits based on SARGs using stripe statistics in ORC
> --------------------------------------------------------------
>
>                 Key: HIVE-5632
>                 URL: https://issues.apache.org/jira/browse/HIVE-5632
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>         Attachments: HIVE-5632.1.patch.txt, HIVE-5632.2.patch.txt, 
> orc_split_elim.orc
>
>
> HIVE-5562 provides stripe level statistics in ORC. Stripe level statistics 
> combined with predicate pushdown in ORC (HIVE-4246) can be used to eliminate 
> the stripes (thereby splits) that don't satisfy the predicate condition. 
> This can greatly reduce unnecessary reads.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
