[ https://issues.apache.org/jira/browse/HIVE-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630908#comment-16630908 ]
Hive QA commented on HIVE-16812:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12941456/HIVE-16812.06.patch

{color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15005 tests executed
*Failed tests:*
{noformat}
org.apache.hive.spark.client.rpc.TestRpc.testClientTimeout (batchId=319)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/14085/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14085/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14085/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12941456 - PreCommit-HIVE-Build

> VectorizedOrcAcidRowBatchReader doesn't filter delete events
> ------------------------------------------------------------
>
>                 Key: HIVE-16812
>                 URL: https://issues.apache.org/jira/browse/HIVE-16812
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>    Affects Versions: 2.3.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Critical
>         Attachments: HIVE-16812.02.patch, HIVE-16812.04.patch,
> HIVE-16812.05.patch, HIVE-16812.06.patch
>
>
> The c'tor of VectorizedOrcAcidRowBatchReader has
> {noformat}
> // Clone readerOptions for deleteEvents.
> Reader.Options deleteEventReaderOptions = readerOptions.clone();
> // Set the range on the deleteEventReaderOptions to 0 to INTEGER_MAX because
> // we always want to read all the delete delta files.
> deleteEventReaderOptions.range(0, Long.MAX_VALUE);
> {noformat}
> This is suboptimal since base and deltas are sorted by ROW__ID. So for each
> split of base we can find the min/max ROW__ID and only load events from delta
> that are in the [min,max] range. This will reduce the number of delete events
> we load in memory (to no more than there are rows in the split).
> When we support sorting on PK, the same should apply, but we'd need to make
> sure to store PKs in the ORC index.
> See {{OrcRawRecordMerger.discoverKeyBounds()}}
> {{hive.acid.key.index}} in the ORC footer has an index of ROW__IDs, so we
> should know min/max easily for any file written by {{OrcRecordUpdater}}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
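
The range idea in the issue description quoted above can be illustrated with a small, self-contained sketch. It is not the code from the attached patches. `DeleteEventRangeFilter`, `RowKey`, and `filterToSplit` are hypothetical names, and the sketch assumes that a ROW__ID orders by (writeId, bucket, rowId) and that the delete events arrive already sorted in that order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: these types are hypothetical stand-ins, not Hive classes.
final class DeleteEventRangeFilter {

    /** Simplified ROW__ID; assumed ordering is (writeId, bucket, rowId). */
    static final class RowKey implements Comparable<RowKey> {
        final long writeId;
        final int bucket;
        final long rowId;

        RowKey(long writeId, int bucket, long rowId) {
            this.writeId = writeId;
            this.bucket = bucket;
            this.rowId = rowId;
        }

        @Override
        public int compareTo(RowKey other) {
            int c = Long.compare(writeId, other.writeId);
            if (c != 0) {
                return c;
            }
            c = Integer.compare(bucket, other.bucket);
            if (c != 0) {
                return c;
            }
            return Long.compare(rowId, other.rowId);
        }
    }

    /**
     * Keeps only the delete events whose key lies in [min, max], the key
     * bounds of one split. The input must be sorted ascending, which lets
     * the scan stop at the first key above max.
     */
    static List<RowKey> filterToSplit(List<RowKey> sortedDeletes, RowKey min, RowKey max) {
        List<RowKey> kept = new ArrayList<>();
        for (RowKey key : sortedDeletes) {
            if (key.compareTo(min) < 0) {
                continue; // below the split's range, cannot match any row in it
            }
            if (key.compareTo(max) > 0) {
                break; // sorted input: every later key is also out of range
            }
            kept.add(key);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<RowKey> deletes = Arrays.asList(
            new RowKey(1, 0, 5), new RowKey(2, 0, 1),
            new RowKey(2, 0, 9), new RowKey(3, 0, 0));
        List<RowKey> kept =
            filterToSplit(deletes, new RowKey(2, 0, 0), new RowKey(2, 0, 9));
        System.out.println("delete events kept for this split: " + kept.size());
    }
}
```

In this sketch the bounds are passed in directly. In the setting the issue describes, they would presumably come from the split itself, for example via the {{hive.acid.key.index}} entry in the ORC footer, and that lookup is not shown here.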