[ https://issues.apache.org/jira/browse/HIVE-20730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saurabh Seth updated HIVE-20730:
--------------------------------
    Attachment: HIVE-20730.patch
        Status: Patch Available  (was: Open)

I have tweaked {{VectorizedOrcAcidRowBatchReader.findMinMaxKeys}} to set a SARG into delete_delta based on the stripe stats when {{hive.acid.key.index}} is not present.

[~ekoifman], I couldn't add a unit test for this because I don't completely understand how the query-based compactor would generate such a file (OrcRecordUpdater seems to always write the index). I tested this change by ignoring the index present in files written using OrcRecordUpdater. If you have any suggestions, please let me know.

> Do delete event filtering even if hive.acid.index is not there
> --------------------------------------------------------------
>
>                 Key: HIVE-20730
>                 URL: https://issues.apache.org/jira/browse/HIVE-20730
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Saurabh Seth
>            Priority: Major
>         Attachments: HIVE-20730.patch
>
>
> Since HIVE-16812, {{VectorizedOrcAcidRowBatchReader}} filters delete events
> based on the min/max ROW__ID in the split, which relies on {{hive.acid.index}}
> being in the ORC footer.
> There is no way to generate {{hive.acid.index}} from a plain query as in
> HIVE-20699, so we need to make sure that we generate a SARG into
> delete_delta/bucket_x based on stripe stats even if the index is missing.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
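
For readers following the issue, the idea described above (derive a min/max key range from stripe statistics, then drop delete events outside that range) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in HIVE-20730.patch: {{RowKey}}, {{StripeStats}}, {{keyRange}} and {{filterDeletes}} are hypothetical names, and the sketch assumes each stripe exposes the smallest and largest key it contains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DeleteEventFilterSketch {

    /** Hypothetical stand-in for a ROW__ID key (write id, bucket, row id); not a Hive class. */
    static final class RowKey implements Comparable<RowKey> {
        final long writeId;
        final int bucket;
        final long rowId;

        RowKey(long writeId, int bucket, long rowId) {
            this.writeId = writeId;
            this.bucket = bucket;
            this.rowId = rowId;
        }

        @Override
        public int compareTo(RowKey o) {
            int c = Long.compare(writeId, o.writeId);
            if (c != 0) return c;
            c = Integer.compare(bucket, o.bucket);
            if (c != 0) return c;
            return Long.compare(rowId, o.rowId);
        }
    }

    /** Hypothetical per-stripe statistics: smallest and largest key in one stripe. */
    static final class StripeStats {
        final RowKey min;
        final RowKey max;

        StripeStats(RowKey min, RowKey max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Returns {min, max} over all stripes of the split, or null when there are no stripes. */
    static RowKey[] keyRange(List<StripeStats> stripes) {
        if (stripes.isEmpty()) return null;
        RowKey min = stripes.get(0).min;
        RowKey max = stripes.get(0).max;
        for (StripeStats s : stripes) {
            if (s.min.compareTo(min) < 0) min = s.min;
            if (s.max.compareTo(max) > 0) max = s.max;
        }
        return new RowKey[] {min, max};
    }

    /** Keeps only delete events whose key lies in the inclusive range; keeps all if range is unknown. */
    static List<RowKey> filterDeletes(List<RowKey> deletes, RowKey[] range) {
        if (range == null) return deletes; // no bounds known: fall back to no filtering
        List<RowKey> kept = new ArrayList<>();
        for (RowKey d : deletes) {
            if (d.compareTo(range[0]) >= 0 && d.compareTo(range[1]) <= 0) kept.add(d);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<StripeStats> stripes = Arrays.asList(
            new StripeStats(new RowKey(1, 0, 0), new RowKey(1, 0, 99)),
            new StripeStats(new RowKey(1, 0, 100), new RowKey(2, 0, 49)));
        List<RowKey> deletes = Arrays.asList(
            new RowKey(1, 0, 5), new RowKey(2, 0, 50), new RowKey(3, 0, 0));
        System.out.println(filterDeletes(deletes, keyRange(stripes)).size());
    }
}
```

The fallback of keeping every delete event when no bounds are available is a design choice of this sketch: filtering is an optimization, so missing statistics should cost performance rather than correctness.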