[ https://issues.apache.org/jira/browse/HIVE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-4478:
-----------------------------
    Attachment: HIVE-4478.1.patch.txt

> In ORC, add boolean noNulls flag to column stripe metadata
> ----------------------------------------------------------
>
>                 Key: HIVE-4478
>                 URL: https://issues.apache.org/jira/browse/HIVE-4478
>             Project: Hive
>          Issue Type: Sub-task
>          Components: File Formats
>            Reporter: Eric Hanson
>            Assignee: Prasanth J
>         Attachments: HIVE-4478.1.patch.txt
>
>
> Currently, the stripe metadata for ORC contains the min and max value for
> each column in the stripe. This will be used for stripe elimination. However,
> an additional bit of metadata for each column in each stripe, noNulls
> (true/false), is needed to help speed up vectorized query execution by as
> much as 30%.
> The vectorized query execution (QE) code has a boolean flag for each column
> vector called noNulls. If this is true, all the null-checking logic is
> skipped for that column for a VectorizedRowBatch when an operation is
> performed on that column. For simple filters and arithmetic expressions,
> this can save on the order of 30% of the time.
> Once this noNulls stripe metadata is available, the vectorized iterator
> (reader) for ORC can be updated to skip loading the isNull bitmap entirely
> and to set the noNulls flag for each column vector cheaply.
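
The idea described in the issue can be illustrated with a minimal, self-contained sketch. This is not the code in the attached patch and not Hive's actual classes: LongColumn, StripeColumnStats, readColumn and addScalar are hypothetical names invented for illustration, loosely modeled on the noNulls flag and isNull bitmap mentioned above. It assumes the reader can consult a per-stripe noNulls bit before deciding whether to touch the null bitmap at all.

```java
import java.util.Arrays;

public class NoNullsSketch {

    /** Hypothetical, simplified stand-in for a vectorized column of longs. */
    static final class LongColumn {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true;

        LongColumn(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    /** Hypothetical per-stripe, per-column metadata: min, max, and the proposed noNulls bit. */
    static final class StripeColumnStats {
        final long min;
        final long max;
        final boolean noNulls;

        StripeColumnStats(long min, long max, boolean noNulls) {
            this.min = min;
            this.max = max;
            this.noNulls = noNulls;
        }
    }

    /**
     * Reader side: when the stripe metadata says noNulls, the null bitmap is
     * never read (it may even be absent, i.e. null here).
     */
    static LongColumn readColumn(StripeColumnStats stats, long[] values, boolean[] nullBitmap) {
        LongColumn col = new LongColumn(values.length);
        System.arraycopy(values, 0, col.vector, 0, values.length);
        if (stats.noNulls) {
            col.noNulls = true;            // skip the bitmap entirely
        } else {
            col.noNulls = false;
            System.arraycopy(nullBitmap, 0, col.isNull, 0, values.length);
        }
        return col;
    }

    /** Operator side: the noNulls fast path has no per-row null check. */
    static void addScalar(LongColumn in, long scalar, LongColumn out, int n) {
        out.noNulls = in.noNulls;
        if (in.noNulls) {
            for (int i = 0; i < n; i++) {
                out.vector[i] = in.vector[i] + scalar;
            }
        } else {
            for (int i = 0; i < n; i++) {
                out.isNull[i] = in.isNull[i];
                if (!in.isNull[i]) {
                    out.vector[i] = in.vector[i] + scalar;
                }
            }
        }
    }

    public static void main(String[] args) {
        StripeColumnStats stats = new StripeColumnStats(1, 3, true);
        LongColumn in = readColumn(stats, new long[] {1, 2, 3}, null);
        LongColumn out = new LongColumn(3);
        addScalar(in, 10, out, 3);
        System.out.println(Arrays.toString(out.vector));
    }
}
```

The point of the sketch is only the branch structure: one check of noNulls per batch replaces a check of isNull per row, and the reader does no bitmap work for stripes whose metadata reports no nulls. Any speedup, including the figure quoted in the issue, would depend on the real implementation.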