[ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
---------------------------------
    Attachment: HIVE-14035.14.patch

Patch #14 significantly refactors how split strategies are chosen in the ACID 
split-update case and now sets the isOriginal flag correctly on a per-split 
basis. When split-update is enabled, a split on a base file can be one of three 
types: a split on an original_base, a split on a compacted_base, or a split on 
an insert_delta. A single job can therefore end up with a set of OrcSplits that 
covers both original and insert_delta files. In such cases it is very 
important that the isOriginal flag is set correctly for each split; otherwise 
whatever is instantiated on the basis of the split strategy will be set up 
incorrectly. This patch takes care of that.
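
As a rough illustration of the per-split flag described above, here is a 
minimal sketch. The types below are hypothetical and are not Hive's OrcSplit or 
AcidUtils classes; the file paths are placeholders, and the rule that only an 
original_base split is marked as original is an assumption made for the sake 
of the example.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model only: these types are hypothetical and are NOT Hive's
// OrcSplit or AcidUtils classes.
final class SplitFlagSketch {

    // The three kinds of base-side split named in the comment above.
    enum BaseKind { ORIGINAL_BASE, COMPACTED_BASE, INSERT_DELTA }

    static final class Split {
        final String path;
        final BaseKind kind;
        final boolean isOriginal;

        Split(String path, BaseKind kind) {
            this.path = path;
            this.kind = kind;
            // Assumption: the flag is derived from this split's own file kind,
            // not from a job-wide setting, and only original_base is "original".
            this.isOriginal = (kind == BaseKind.ORIGINAL_BASE);
        }
    }

    // One job whose splits mix an original file and an insert delta.
    // The paths are placeholders chosen for illustration.
    static List<Split> demoSplits() {
        return Arrays.asList(
            new Split("000000_0", BaseKind.ORIGINAL_BASE),
            new Split("delta_0000006_0000006/bucket_00000", BaseKind.INSERT_DELTA));
    }

    public static void main(String[] args) {
        for (Split s : demoSplits()) {
            System.out.println(s.path + " kind=" + s.kind + " isOriginal=" + s.isOriginal);
        }
    }
}
```

The point of the sketch is only that the flag lives on each split, so a job 
that mixes file kinds can still treat every split correctly.
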
Additionally, the patch now optimizes the case in which uncovered buckets had 
to be processed because a split had no base (previously possible when there 
were only deltas). With split-update enabled, every split now has a base, 
because there is no point in having a split that is supposed to read only the 
delete_deltas. (Minor compaction is not a concern here: it always creates a 
single split through separate logic, and that logic has not been modified.) 
Tests for all of these changes have been added to TestInputOutputFormat, 
covering various scenarios. The patch also addresses the comments on RB. 

> Enable predicate pushdown to delta files created by ACID Transactions
> ---------------------------------------------------------------------
>
>                 Key: HIVE-14035
>                 URL: https://issues.apache.org/jira/browse/HIVE-14035
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.patch
>
>
> In the current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This 
> restriction preserves correctness under the multi-version approach used during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes splitting an update event into a delete event followed by 
> a new insert event, which can enable predicate pushdown to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes adding some form of 
> versioning to ACID so that different versions of ACID transactions can 
> coexist.
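
The idea in the description above can be sketched as follows. This is a 
simplified model, not Hive's reader: the Insert type, the read method and the 
row ids are all hypothetical, and it assumes that the insert half of a split 
update gets a fresh row id while the delete half names the old one. Delete 
events are applied without any filter, and the predicate is pushed down to the 
insert events only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

// Simplified, hypothetical model of split-update reading; not Hive code.
final class SplitUpdateSketch {

    static final class Insert {
        final long rowId;
        final int value;

        Insert(long rowId, int value) {
            this.rowId = rowId;
            this.value = value;
        }
    }

    // Filters insert events with the pushed-down predicate, then drops any
    // surviving row whose id appears in the (unfiltered) set of deletes.
    static List<Integer> read(List<Insert> inserts, Set<Long> deletedRowIds,
                              IntPredicate pushedDown) {
        List<Integer> out = new ArrayList<>();
        for (Insert e : inserts) {
            if (pushedDown.test(e.value) && !deletedRowIds.contains(e.rowId)) {
                out.add(e.value);
            }
        }
        return out;
    }

    static List<Integer> demo() {
        // Row 1 was inserted with value 10 and later updated to 99.
        // Under split-update this is delete(rowId=1) + insert(rowId=2, 99).
        // Row 3 is an untouched insert with value 7.
        List<Insert> inserts = Arrays.asList(
            new Insert(1L, 10), new Insert(2L, 99), new Insert(3L, 7));
        Set<Long> deletes = new HashSet<>(Collections.singletonList(1L));
        // Query predicate: value < 50.
        return read(inserts, deletes, v -> v < 50);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [7]
    }
}
```

In this model the stale value 10 passes the predicate but is removed by the 
delete event, and the new value 99 is filtered out by the predicate, so the 
filter never hides the event that invalidates an older row version.
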



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
