[ 
https://issues.apache.org/jira/browse/HIVE-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-5562:
-----------------------------

    Attachment: HIVE-5562.2.patch.txt

Addressed [~owen.omalley]'s review comments.

> Provide stripe level column statistics in ORC
> ---------------------------------------------
>
>                 Key: HIVE-5562
>                 URL: https://issues.apache.org/jira/browse/HIVE-5562
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5562.1.patch.txt, HIVE-5562.2.patch.txt
>
>
> ORC maintains two levels of column statistics: index statistics (one set per
> row group) and file-level column statistics for the entire file. It is useful
> to also have stripe-level column statistics, intermediate between the index
> and file statistics. The reason to maintain stripe-level statistics is that
> the current input split computation logic is based on stripe boundaries. If
> stripe-level statistics are available and a stripe cannot satisfy a predicate
> condition, then that entire stripe (and hence its split) can be eliminated
> from split computation.
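
For illustration only, the stripe elimination described above can be sketched
as follows. This is not the code in the attached patch and not the ORC reader
API: StripeStats, stripe_may_match and select_stripes are hypothetical names,
and the sketch assumes a single integer column with per-stripe min/max values
and a range predicate lo <= col <= hi.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StripeStats:
    """Hypothetical min/max statistics for one column within one stripe."""
    offset: int   # byte offset of the stripe in the file
    length: int   # stripe length in bytes
    col_min: int  # smallest column value stored in the stripe
    col_max: int  # largest column value stored in the stripe


def stripe_may_match(stats: StripeStats, lo: int, hi: int) -> bool:
    """True if the stripe's [col_min, col_max] range overlaps [lo, hi]."""
    return stats.col_max >= lo and stats.col_min <= hi


def select_stripes(stripes: List[StripeStats], lo: int, hi: int) -> List[StripeStats]:
    """Keep only stripes that could contain a row with lo <= col <= hi."""
    return [s for s in stripes if stripe_may_match(s, lo, hi)]


# Three made-up stripes with non-overlapping value ranges.
stripes = [
    StripeStats(offset=0, length=100, col_min=1, col_max=50),
    StripeStats(offset=100, length=100, col_min=51, col_max=120),
    StripeStats(offset=200, length=100, col_min=121, col_max=300),
]

# Predicate: 60 <= col <= 100. Only the second stripe can contain matches.
kept = select_stripes(stripes, lo=60, hi=100)
print([s.offset for s in kept])  # prints [100]
```

Because splits follow stripe boundaries, a stripe dropped this way would never
be handed to a task. Min/max checks are conservative: a kept stripe may still
contain no matching rows, but a dropped stripe cannot contain any.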



--
This message was sent by Atlassian JIRA
(v6.1#6144)
