Eric Hanson created HIVE-4478:
---------------------------------
Summary: In ORC, add boolean noNulls flag to column stripe metadata
Key: HIVE-4478
URL: https://issues.apache.org/jira/browse/HIVE-4478
Project: Hive
Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Owen O'Malley
Currently, the stripe metadata for ORC contains the min and max value for each
column in the stripe. This will be used for stripe elimination. However, an
additional bit of metadata, noNulls (true/false), is needed to help speed up
vectorized query execution by as much as 30%.
The vectorized query execution (QE) code has a boolean flag for each column
vector called noNulls. If this is true, all the null-checking logic is skipped.
For simple filters and arithmetic expressions, this can save on the order of
30% of the execution time.
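
To illustrate the kind of saving described above, here is a minimal sketch of
a vectorized "add a constant" expression with a noNulls fast path. The class
and method names (LongColumn, AddLongScalar, evaluate) are hypothetical
stand-ins chosen for this example, not Hive's actual classes; only the noNulls
flag and the isNull array come from the description.

```java
// Hypothetical, simplified stand-in for a column vector of longs.
final class LongColumn {
    final long[] vector;     // column values
    final boolean[] isNull;  // isNull[i] is true when row i is null
    boolean noNulls;         // true when no row in the batch is null

    LongColumn(int size) {
        vector = new long[size];
        isNull = new boolean[size];
        noNulls = true;
    }
}

// Hypothetical vectorized expression: out = in + scalar.
final class AddLongScalar {
    static void evaluate(LongColumn in, long scalar, LongColumn out, int n) {
        if (in.noNulls) {
            // Fast path: a tight loop with no per-row null check.
            for (int i = 0; i < n; i++) {
                out.vector[i] = in.vector[i] + scalar;
            }
            out.noNulls = true;
        } else {
            // Slow path: propagate nulls and compute only non-null rows.
            for (int i = 0; i < n; i++) {
                out.isNull[i] = in.isNull[i];
                if (!in.isNull[i]) {
                    out.vector[i] = in.vector[i] + scalar;
                }
            }
            out.noNulls = false;
        }
    }
}
```

The fast path never reads or writes isNull, which is where the saving for
simple expressions would be expected to come from.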
Once this noNulls stripe metadata is available, the vectorized iterator for ORC
can be updated to avoid the cost of loading the isNull bitmap entirely, and to
set the noNulls flag for each column vector efficiently.
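
A rough sketch of how the iterator might use the proposed flag follows. All
names here (ColumnStripeStats, NullableColumn, StripeColumnReader,
setNullInfo) are hypothetical, and the Supplier merely stands in for whatever
work is needed to decode the isNull bitmap; this is not the ORC reader's
actual API.

```java
import java.util.function.Supplier;

// Hypothetical per-column stripe metadata: the existing min/max
// plus the proposed noNulls flag.
final class ColumnStripeStats {
    final long min;
    final long max;
    final boolean noNulls;

    ColumnStripeStats(long min, long max, boolean noNulls) {
        this.min = min;
        this.max = max;
        this.noNulls = noNulls;
    }
}

// Hypothetical column vector, reduced to its null-tracking fields.
final class NullableColumn {
    final boolean[] isNull;
    boolean noNulls;

    NullableColumn(int size) {
        isNull = new boolean[size];
    }
}

final class StripeColumnReader {
    // Sets null information on col, decoding the bitmap only when needed.
    static void setNullInfo(ColumnStripeStats stats,
                            Supplier<boolean[]> isNullBitmapLoader,
                            NullableColumn col) {
        if (stats.noNulls) {
            // Stripe metadata guarantees no nulls: skip the bitmap load.
            // isNull is left untouched; consumers must check noNulls first.
            col.noNulls = true;
            return;
        }
        // Otherwise pay the cost of loading the bitmap and copy it in.
        boolean[] bitmap = isNullBitmapLoader.get();
        int n = Math.min(bitmap.length, col.isNull.length);
        System.arraycopy(bitmap, 0, col.isNull, 0, n);
        col.noNulls = false;
    }
}
```

With this shape, the loader is never invoked for a column whose stripe
metadata reports noNulls = true.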