[ https://issues.apache.org/jira/browse/HIVE-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624546#comment-16624546 ]
Hive QA commented on HIVE-16812: -------------------------------- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12940706/HIVE-16812.02.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 14993 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.TestTxnCommands2.testOriginalFileReaderWhenNonAcidConvertedToAcid (batchId=299) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testOriginalFileReaderWhenNonAcidConvertedToAcid (batchId=311) org.apache.hadoop.hive.ql.exec.spark.TestSparkSessionTimeout.testMultiSessionSparkSessionTimeout (batchId=245) org.apache.hadoop.hive.ql.exec.spark.TestSparkSessionTimeout.testMultiSparkSessionTimeout (batchId=245) org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testFSCallsVectorizedOrcAcidRowBatchReader (batchId=289) org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testKillQuery (batchId=251) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13966/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13966/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13966/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12940706 - PreCommit-HIVE-Build > VectorizedOrcAcidRowBatchReader doesn't filter delete events > ------------------------------------------------------------ > > Key: HIVE-16812 > URL: https://issues.apache.org/jira/browse/HIVE-16812 > Project: Hive > Issue Type: Improvement > Components: Transactions > Affects Versions: 2.3.0 > Reporter: Eugene Koifman > Assignee: Eugene Koifman > Priority: Critical > Attachments: HIVE-16812.02.patch > > > the c'tor of VectorizedOrcAcidRowBatchReader has > {noformat} > // Clone readerOptions for deleteEvents. > Reader.Options deleteEventReaderOptions = readerOptions.clone(); > // Set the range on the deleteEventReaderOptions to 0 to INTEGER_MAX > because > // we always want to read all the delete delta files. > deleteEventReaderOptions.range(0, Long.MAX_VALUE); > {noformat} > This is suboptimal since base and deltas are sorted by ROW__ID. So for each > split if base we can find min/max ROW_ID and only load events from delta that > are in [min,max] range. This will reduce the number of delete events we load > in memory (to no more than there in the split). > When we support sorting on PK, the same should apply but we'd need to make > sure to store PKs in ORC index > See {{OrcRawRecordMerger.discoverKeyBounds()}} > {{hive.acid.key.index}} in Orc footer has an index of ROW__IDs so we should > know min/max easily for any file written by {{OrcRecordUpdater}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)