[ 
https://issues.apache.org/jira/browse/HIVE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4478:
------------------------------

    Description: 
Currently, the stripe metadata for ORC contains the min and max value for each 
column in the stripe. This will be used for stripe elimination. However, an 
additional piece of metadata for each column in each stripe, noNulls 
(true/false), is needed to help speed up vectorized query execution by as much 
as 30%. 

The vectorized query execution (QE) code has a Boolean flag called noNulls on 
each column vector. If it is true, all null-checking logic is skipped for that 
column in a VectorizedRowBatch whenever an operation is performed on the 
column. For simple filters and arithmetic expressions, this can save on the 
order of 30% of the execution time.
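
To illustrate the fast path described above, here is a minimal sketch in 
Java. LongColumn and addScalar are hypothetical, simplified stand-ins written 
for this example; they are not Hive's actual column vector or expression 
classes. The sketch only shows how a per-vector noNulls flag lets the inner 
loop drop its per-row null check.

```java
public class NoNullsSketch {
    // Hypothetical, simplified column vector (assumption: not Hive's real class).
    public static final class LongColumn {
        public final long[] vector;    // column values
        public final boolean[] isNull; // per-row null markers
        public boolean noNulls = true; // true means isNull can be ignored

        public LongColumn(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    // Adds a scalar to the first n rows of 'in', writing the result to 'out'.
    public static void addScalar(LongColumn in, long scalar, LongColumn out, int n) {
        out.noNulls = in.noNulls;
        if (in.noNulls) {
            // Fast path: no null checks inside the loop.
            for (int i = 0; i < n; i++) {
                out.vector[i] = in.vector[i] + scalar;
            }
        } else {
            // Slow path: propagate nulls and skip the arithmetic for null rows.
            for (int i = 0; i < n; i++) {
                out.isNull[i] = in.isNull[i];
                if (!in.isNull[i]) {
                    out.vector[i] = in.vector[i] + scalar;
                }
            }
        }
    }
}
```

The branch on noNulls is taken once per batch rather than once per row, which 
is where the saving would come from in a design like this.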

Once this noNulls stripe metadata is available, the vectorized iterator 
(reader) for ORC can be updated to avoid the cost of loading the isNull bitmap 
and to set the noNulls flag for each column vector efficiently.
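
The reader-side change could look roughly like the sketch below. All names 
here (ColumnStripeStats, LongColumn, fillNullInfo, and the bitmapLoader 
callback) are assumptions made up for illustration; they are not the ORC 
reader's API, and the real metadata layout may differ. The point is only that 
a per-column, per-stripe noNulls value lets the reader skip decoding the null 
bitmap.

```java
import java.util.function.Supplier;

public class StripeNoNullsSketch {
    // Hypothetical per-column, per-stripe metadata: min, max, and the proposed flag.
    public static final class ColumnStripeStats {
        public final long min;
        public final long max;
        public final boolean noNulls;

        public ColumnStripeStats(long min, long max, boolean noNulls) {
            this.min = min;
            this.max = max;
            this.noNulls = noNulls;
        }
    }

    // Hypothetical, simplified column vector (assumption: not Hive's real class).
    public static final class LongColumn {
        public final boolean[] isNull;
        public boolean noNulls = true;

        public LongColumn(int size) {
            isNull = new boolean[size];
        }
    }

    // Fills null information for the first n rows of a batch.
    // bitmapLoader stands in for the (possibly expensive) bitmap decode step.
    public static void fillNullInfo(ColumnStripeStats stats,
                                    Supplier<boolean[]> bitmapLoader,
                                    LongColumn col, int n) {
        if (stats.noNulls) {
            // Stripe metadata says there are no nulls: never touch the bitmap.
            col.noNulls = true;
            return;
        }
        boolean[] bitmap = bitmapLoader.get();
        boolean sawNull = false;
        for (int i = 0; i < n; i++) {
            col.isNull[i] = bitmap[i];
            sawNull |= bitmap[i];
        }
        // A batch may still be null-free even if the stripe as a whole is not.
        col.noNulls = !sawNull;
    }
}
```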

  was:
Currently, the stripe metadata for ORC contains the min and max value for each 
column in the stripe. This will be used for stripe elimination. However, an 
additional bit of metadata, noNulls (true/false), is needed to help speed up 
vectorized query execution as much as 30%. 

The vectorized QE code has a Boolean flag for each column vector called 
noNulls. If this is true, all the null-checking logic is skipped. For simple 
filters and arithmetic expressions, this can save on the order of 30% of the 
time.

Once this noNulls stripe metadata is available, the vectorized iterator for ORC 
can be updated to avoid all expense to load the isNull bitmap, and efficiently 
set the noNulls flag for each column vector.

    
> In ORC, add boolean noNulls flag to column stripe metadata
> ----------------------------------------------------------
>
>                 Key: HIVE-4478
>                 URL: https://issues.apache.org/jira/browse/HIVE-4478
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>            Assignee: Owen O'Malley
>
> Currently, the stripe metadata for ORC contains the min and max value for 
> each column in the stripe. This will be used for stripe elimination. However, 
> an additional piece of metadata for each column in each stripe, noNulls 
> (true/false), is needed to help speed up vectorized query execution by as 
> much as 30%. 
>
> The vectorized query execution (QE) code has a Boolean flag called noNulls 
> on each column vector. If it is true, all null-checking logic is skipped for 
> that column in a VectorizedRowBatch whenever an operation is performed on the 
> column. For simple filters and arithmetic expressions, this can save on the 
> order of 30% of the execution time.
>
> Once this noNulls stripe metadata is available, the vectorized iterator 
> (reader) for ORC can be updated to avoid the cost of loading the isNull 
> bitmap and to set the noNulls flag for each column vector efficiently.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
