[ https://issues.apache.org/jira/browse/HIVE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696152#comment-13696152 ]
Hudson commented on HIVE-4478:
------------------------------

Integrated in Hive-trunk-h0.21 #2168 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2168/])
HIVE-4478. In ORC remove ispresent stream from columns that contain no null values in a stripe. (Prasanth Jayachandran via omalley) (Revision 1497912)

Result = FAILURE
omalley : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1497912
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
* /hive/trunk/ql/src/test/resources/orc-file-dump.out

> In ORC, add boolean noNulls flag to column stripe metadata
> ----------------------------------------------------------
>
> Key: HIVE-4478
> URL: https://issues.apache.org/jira/browse/HIVE-4478
> Project: Hive
> Issue Type: Sub-task
> Components: File Formats
> Affects Versions: 0.12.0
> Reporter: Eric Hanson
> Assignee: Prasanth J
> Fix For: 0.12.0
>
> Attachments: HIVE-4478.1.patch.txt, HIVE-4478.2.git.patch.txt
>
>
> Currently, the stripe metadata for ORC contains the min and max value for
> each column in the stripe. This will be used for stripe elimination. However,
> an additional bit of metadata for each column for each stripe, noNulls
> (true/false), is needed to help speed up vectorized query execution as much
> as 30%.
> The vectorized QE code has a Boolean flag for each column vector called
> noNulls. If this is true, all the null-checking logic is skipped for that
> column for a VectorizedRowBatch when an operation is performed on that
> column. For simple filters and arithmetic expressions, this can save on the
> order of 30% of the time.
> Once this noNulls stripe metadata is available, the vectorized iterator
> (reader) for ORC can be updated to avoid all expense to load the isNull
> bitmap, and efficiently set the noNulls flag for each column vector.
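
For readers unfamiliar with the noNulls fast path described in the issue, here is a minimal Java sketch of the idea. It is not Hive's actual code: LongColumnSketch, NoNullsSketch, loadColumn and addScalar are hypothetical names made up for illustration, and the "present" array stands in for a decoded presence stream. It only shows how a per-stripe "no nulls" hint could let a reader skip building the isNull array, and how an operator could then skip per-row null checks.

```java
// Hypothetical, simplified stand-in for a column vector of longs.
// This is an illustration only, not Hive's real class.
final class LongColumnSketch {
    final long[] vector;    // column values, one per row
    final boolean[] isNull; // isNull[i] is true when row i is null
    boolean noNulls;        // true when no row in this batch is null

    LongColumnSketch(int size) {
        this.vector = new long[size];
        this.isNull = new boolean[size];
        this.noNulls = true;
    }
}

final class NoNullsSketch {
    // Reader side (assumed design): when the stripe metadata says the column
    // has no nulls, the presence data is never touched and may be absent.
    static LongColumnSketch loadColumn(long[] values, boolean[] present,
                                       boolean stripeHasNoNulls) {
        LongColumnSketch col = new LongColumnSketch(values.length);
        System.arraycopy(values, 0, col.vector, 0, values.length);
        if (stripeHasNoNulls) {
            col.noNulls = true; // isNull stays all false, nothing to decode
        } else {
            col.noNulls = false; // conservative: the stripe may contain nulls
            for (int i = 0; i < values.length; i++) {
                col.isNull[i] = !present[i];
            }
        }
        return col;
    }

    // Operator side: add a scalar to the first n rows in place,
    // leaving null rows untouched.
    static void addScalar(LongColumnSketch col, int n, long scalar) {
        if (col.noNulls) {
            // Fast path: no per-row null check at all.
            for (int i = 0; i < n; i++) {
                col.vector[i] += scalar;
            }
        } else {
            // Slow path: check the isNull flag for every row.
            for (int i = 0; i < n; i++) {
                if (!col.isNull[i]) {
                    col.vector[i] += scalar;
                }
            }
        }
    }
}
```

In this sketch the branch on noNulls is taken once per batch rather than once per row, which is the source of the savings the issue describes for simple filters and arithmetic expressions.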