Prasanth J created HIVE-5562:
--------------------------------
Summary: Provide stripe level column statistics in ORC
Key: HIVE-5562
URL: https://issues.apache.org/jira/browse/HIVE-5562
Project: Hive
Issue Type: New Feature
Components: File Formats
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
Fix For: 0.13.0
ORC maintains two levels of column statistics: index statistics (one set for
every row group) and file-level column statistics for the entire file. It would
be useful to also have stripe-level column statistics, intermediate between the
index and file statistics. The reason to maintain stripe-level statistics is
that the current input split computation logic is based on stripe boundaries.
If stripe-level statistics are available and a stripe cannot satisfy a
predicate condition, then that entire stripe (and hence its split) can be
eliminated from split computation.
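
To illustrate the idea, here is a minimal sketch of stripe elimination driven
by per-stripe min/max statistics. It is not the ORC or Hive implementation:
the class StripePruningSketch, the type StripeStats, and the methods mightMatch
and selectStripes are hypothetical names invented for this example, and it
assumes a single long column with a range predicate lo <= col <= hi.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StripePruningSketch {
    /** Hypothetical min/max statistics for one long column within one stripe. */
    static final class StripeStats {
        final long min;
        final long max;

        StripeStats(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /** True if the stripe's [min, max] range overlaps the predicate range [lo, hi]. */
    static boolean mightMatch(StripeStats stats, long lo, long hi) {
        return stats.max >= lo && stats.min <= hi;
    }

    /** Returns the indices of stripes that still need a split for lo <= col <= hi. */
    static List<Integer> selectStripes(List<StripeStats> stripes, long lo, long hi) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < stripes.size(); i++) {
            // A stripe whose value range cannot overlap the predicate is skipped,
            // so no split is generated for it.
            if (mightMatch(stripes.get(i), lo, hi)) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<StripeStats> stripes = Arrays.asList(
            new StripeStats(0, 99),
            new StripeStats(100, 199),
            new StripeStats(200, 299));
        // Predicate: 150 <= col <= 250. Stripe 0 is eliminated.
        System.out.println(selectStripes(stripes, 150, 250)); // prints [1, 2]
    }
}
```

The check is conservative: a stripe is dropped only when its statistics prove
that no row can match, so a kept stripe may still contain no matching rows.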