[ https://issues.apache.org/jira/browse/HIVE-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-5562:
-----------------------------
    Status: Patch Available  (was: Open)

Marking it as patch available.

> Provide stripe level column statistics in ORC
> ---------------------------------------------
>
>                 Key: HIVE-5562
>                 URL: https://issues.apache.org/jira/browse/HIVE-5562
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5562.1.patch.txt
>
>
> ORC maintains two levels of column statistics: index statistics (one set
> for every row group) and file-level column statistics for the entire file.
> It would be useful to also have stripe-level column statistics, which sit
> between the index and file statistics. The reason to maintain stripe-level
> statistics is that the current input split computation logic is based on
> stripe boundaries. If stripe-level statistics are available and a stripe
> cannot satisfy a predicate condition, then that entire stripe (and hence
> its split) can be eliminated from split computation.
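
The elimination idea described above can be sketched roughly as follows. This is only an illustration, not the code in the attached patch and not the ORC reader API: StripeStats, may_contain and select_stripes are hypothetical names, and the sketch assumes each stripe records just a min and max for a single integer column.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StripeStats:
    """Hypothetical per-stripe statistics for one integer column."""
    offset: int      # byte offset of the stripe within the file
    length: int      # stripe length in bytes
    min_value: int   # smallest column value stored in the stripe
    max_value: int   # largest column value stored in the stripe


def may_contain(stats: StripeStats, lo: int, hi: int) -> bool:
    """Return True if some row with lo <= col <= hi could be in the stripe.

    Min/max ranges can only rule a stripe out; a True result does not
    guarantee that a matching row actually exists.
    """
    return stats.max_value >= lo and stats.min_value <= hi


def select_stripes(stripes: List[StripeStats], lo: int, hi: int) -> List[StripeStats]:
    """Keep only the stripes whose [min, max] range overlaps [lo, hi]."""
    return [s for s in stripes if may_contain(s, lo, hi)]


stripes = [
    StripeStats(offset=0, length=100, min_value=1, max_value=50),
    StripeStats(offset=100, length=100, min_value=51, max_value=120),
    StripeStats(offset=200, length=100, min_value=121, max_value=300),
]

# Predicate: col BETWEEN 60 AND 100. Only the middle stripe overlaps.
kept = select_stripes(stripes, 60, 100)
print([s.offset for s in kept])  # prints [100]
```

Since splits follow stripe boundaries, the stripes dropped here would never be turned into splits at all, so no task would be scheduled to read them.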