[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Saket Saurabh updated HIVE-14035:
---------------------------------
    Attachment: HIVE-14035.14.patch

Patch #14 significantly refactors the way split strategies are chosen for the ACID split-update case and now correctly sets the isOriginal flag on a per-split basis. When split-update is enabled, a split on a base file can be one of three types: a split on an original_base, a split on a compacted_base, or a split on an insert_delta. A single job can end up with a set of OrcSplits that covers both original and insert_delta files. In such cases it is very important that the isOriginal flag is set correctly for each split, because the flag drives how a number of things are instantiated for that split. This patch takes care of that.

Additionally, the patch optimizes the case in which we had to process uncovered buckets because a split had no base (previously possible when there were only deltas). Now, when split-update is enabled, every split has a base, because there is no point in having a split that only reads the delete_deltas. (Minor compaction is not a concern here: it always creates a single split through its own separate logic, which has not been modified.)

Tests for all these changes have been added to TestInputOutputFormat for various scenarios. The patch also addresses the comments at RB.
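
To illustrate the per-split flag described above, here is a minimal sketch, not the code from the patch. It assumes the usual ACID directory naming (base_N for a compacted base, delta_X_Y for an insert delta, delete_delta_X_Y for delete events, anything else treated as original files); the class SplitKindSketch and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derive the kind of "base" a split reads, and the
// isOriginal flag, from the name of the directory that holds the file.
public class SplitKindSketch {

    public enum BaseKind { ORIGINAL_BASE, COMPACTED_BASE, INSERT_DELTA }

    // Classify a directory name. delete_delta directories are rejected
    // because, with split-update, no split uses them as its base.
    public static BaseKind classify(String dirName) {
        if (dirName.startsWith("delete_delta_")) {
            throw new IllegalArgumentException("delete_delta is never a split base: " + dirName);
        }
        if (dirName.startsWith("base_")) {
            return BaseKind.COMPACTED_BASE;
        }
        if (dirName.startsWith("delta_")) {
            return BaseKind.INSERT_DELTA;
        }
        // Assumption: everything else holds pre-ACID ("original") files.
        return BaseKind.ORIGINAL_BASE;
    }

    // Only original files get isOriginal = true.
    public static boolean isOriginal(String dirName) {
        return classify(dirName) == BaseKind.ORIGINAL_BASE;
    }

    // Compute the flag independently for every split of one job, so a mix
    // of original and insert_delta splits gets a mix of flag values.
    public static List<Boolean> flagsFor(List<String> dirNames) {
        List<Boolean> flags = new ArrayList<>();
        for (String dirName : dirNames) {
            flags.add(isOriginal(dirName));
        }
        return flags;
    }

    public static void main(String[] args) {
        List<String> dirs = new ArrayList<>();
        dirs.add("my_table");              // original files in the table directory
        dirs.add("delta_0000006_0000006"); // insert delta
        System.out.println(flagsFor(dirs));
    }
}
```

The point of the sketch is only that the flag is a property of each split rather than of the whole job.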
> Enable predicate pushdown to delta files created by ACID Transactions
> ---------------------------------------------------------------------
>
>                 Key: HIVE-14035
>                 URL: https://issues.apache.org/jira/browse/HIVE-14035
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch,
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch,
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch,
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch,
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not
> allow predicate pushdown if they contain any update/delete events. This is
> done to preserve correctness when following a multi-version approach during
> event collapsing, where an update event overwrites an existing insert event.
> This JIRA proposes to split an update event into a combination of a delete
> event followed by a new insert event, that can enable predicate push down to
> all delta files without breaking correctness. To support backward
> compatibility for this feature, this JIRA also proposes to add some sort of
> versioning to ACID that can allow different versions of ACID transactions to
> co-exist together.
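
The proposal in the description can be sketched with a toy model; this is an illustration under simplifying assumptions, not Hive's reader. Row identity is reduced to a single long, events are held in memory, and the names SplitUpdateSketch, Insert and read are hypothetical. An update is stored as a delete of the old row plus an insert of a new row, so a predicate can be applied to insert events while delete events are applied unfiltered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical toy model of split-update reads.
public class SplitUpdateSketch {

    // One insert event: a simplified row id and a single column value.
    public static final class Insert {
        final long rowId;
        final int value;

        public Insert(long rowId, int value) {
            this.rowId = rowId;
            this.value = value;
        }
    }

    // Apply the pushed-down predicate to insert events first, then drop
    // every surviving row whose id appears among the delete events.
    public static List<Integer> read(List<Insert> inserts,
                                     Set<Long> deletedRowIds,
                                     IntPredicate pushedDown) {
        List<Integer> result = new ArrayList<>();
        for (Insert event : inserts) {
            if (!pushedDown.test(event.value)) {
                continue; // filtered out by the pushed-down predicate
            }
            if (deletedRowIds.contains(event.rowId)) {
                continue; // removed by a delete event
            }
            result.add(event.value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Row 1 is inserted with value 10, then "updated" to 50, which is
        // stored as delete(row 1) followed by insert(row 2, value 50).
        List<Insert> inserts = Arrays.asList(new Insert(1L, 10), new Insert(2L, 50));
        Set<Long> deletes = new HashSet<>(Arrays.asList(1L));
        System.out.println(read(inserts, deletes, v -> v < 20));
    }
}
```

With the predicate `value < 20`, the stale value 10 passes the filter but is removed by the delete event, and the new value 50 fails the filter, so no row is returned. In the multi-version scheme the description contrasts this with, filtering out the update event carrying 50 would leave the stale insert of 10 visible, which is why pushdown had to be disabled there.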