[ 
https://issues.apache.org/jira/browse/HIVE-23114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071894#comment-17071894
 ] 

Marta Kuczora commented on HIVE-23114:
--------------------------------------

The implementation in HIVE-21164 is built upon the implementation for 
insert-only (also called MM) tables. That original implementation does not 
handle the use-cases where multiple FileSinkOperators are present in one task 
and write to the same table, and the query in the reproduction steps is 
exactly that type of query.
This issue happens with multi-insert queries only if dynamic partitions are 
involved; in that case we end up with two FileSinkOperators within one task, 
each writing to the same table.
The basic steps a FileSinkOperator performs are the following:
- Writes the data
- When it's finished, in closeOp it creates a manifest file which lists the 
successfully written data files.
- Then at the end, in jobCloseOp, it reads the manifest file and cleans up 
all files which were written to the table but are not listed in the manifest.
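The steps above can be sketched as a simplified, in-memory model. This is a hypothetical illustration, not Hive's actual FileSinkOperator code; the names (closeOp, jobCloseOp, the manifest path) only mirror the description, and the point is that two operators sharing one manifest path make the cleanup delete the other operator's files:

```java
import java.util.*;

// Hypothetical, simplified model of the manifest-based commit described above.
// Not Hive's actual API; names and paths are illustrative only.
public class ManifestCollisionDemo {
    // Files currently present in the table directory.
    static Set<String> tableFiles = new TreeSet<>();
    // Manifest contents, keyed by manifest path.
    static Map<String, Set<String>> manifests = new HashMap<>();

    // closeOp: record the files this operator successfully wrote.
    static void closeOp(String manifestPath, Set<String> written) {
        tableFiles.addAll(written);
        // A second writer using the same path silently replaces the first manifest.
        manifests.put(manifestPath, written);
    }

    // jobCloseOp: delete every file in the table that the manifest does not list.
    static void jobCloseOp(String manifestPath) {
        Set<String> keep = manifests.get(manifestPath);
        tableFiles.removeIf(f -> !keep.contains(f));
    }

    public static void main(String[] args) {
        // Both FileSinkOperators use the SAME manifest path.
        closeOp("/tbl/_manifest", new TreeSet<>(List.of("c=1111/bucket_0", "c=2222/bucket_0")));
        closeOp("/tbl/_manifest", new TreeSet<>(List.of("c=null/bucket_0")));
        // Cleanup now only trusts the last manifest, so the first
        // operator's files are treated as leftovers and deleted.
        jobCloseOp("/tbl/_manifest");
        System.out.println(tableFiles); // only the second operator's data survives
    }
}
```

Depending on which operator's jobCloseOp runs last, either operator's data can be the one that is lost, which matches the order-dependent results described above.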
Problems can occur at multiple points, depending on the order in which the 
closeOp and jobCloseOp methods of the FileSinkOperators are executed. There 
can be a collision in the manifest file creation, since both 
FileSinkOperators try to create it with the same path. It can also happen 
that one FileSinkOperator deletes the data written by the other.

The FileSinkOperators of the query in the reproduction steps write data to 
different partitions: one writes the partitions where c is not null and the 
other the partitions where c is null. I propose a fix for this use-case in 
this Jira. It can also happen that multiple FileSinkOperators write data into 
overlapping partitions, but that use-case is not straightforward to fix, 
because the way an insert overwrite query handles overlapping partitions is 
not consistent across table types. For external tables, the last 
FileSinkOperator overwrites the previously written data; for ACID tables, the 
data written by the first FileSinkOperator is the final result. So that use 
case needs more clarification, but until it is fixed a workaround is to 
switch off direct insert or to split the query into separate insert overwrite 
statements.
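As a sketch of the second workaround, the multi-insert from the reproduction steps below could be split into two independent statements (shown here for the ACID table; the MM table would be split the same way). This assumes the usual dynamic-partition overwrite behavior, where each statement only overwrites the partitions it actually writes, so the two statements no longer share a task or a manifest path:

{noformat}
insert overwrite table multiinsert_test_acid partition (c)
select a.a, a.b, a.c from multiinsert_test_text a where a.c is not null;

insert overwrite table multiinsert_test_acid partition (c)
select a.a, a.b, a.c from multiinsert_test_text a where a.c is null;
{noformat}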

> Insert overwrite with dynamic partitioning is not working correctly with 
> direct insert
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-23114
>                 URL: https://issues.apache.org/jira/browse/HIVE-23114
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>
> This is a follow-up Jira for the 
> [conversation|https://issues.apache.org/jira/browse/HIVE-21164?focusedCommentId=17059280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17059280]
>  in HIVE-21164
>  Doing an insert overwrite from a multi-insert statement with dynamic 
> partitioning will give wrong results for ACID tables when 
> 'hive.acid.direct.insert.enabled' is true or for insert-only tables.
> Reproduction:
> {noformat}
> set hive.acid.direct.insert.enabled=true;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.vectorized.execution.enabled=false;
> set hive.stats.autogather=false;
> create external table multiinsert_test_text (a int, b int, c int) stored as 
> textfile;
> insert into multiinsert_test_text values (1111, 11, 1111), (2222, 22, 1111), 
> (3333, 33, 2222), (4444, 44, NULL), (5555, 55, NULL);
> create table multiinsert_test_acid (a int, b int) partitioned by (c int) 
> stored as orc tblproperties('transactional'='true');
> create table multiinsert_test_mm (a int, b int) partitioned by (c int) stored 
> as orc tblproperties('transactional'='true', 
> 'transactional_properties'='insert_only');
> from multiinsert_test_text a
> insert overwrite table multiinsert_test_acid partition (c)
> select
>  a.a,
>  a.b,
>  a.c
>  where a.c is not null
> insert overwrite table multiinsert_test_acid partition (c)
> select
>  a.a,
>  a.b,
>  a.c
> where a.c is null;
> select * from multiinsert_test_acid;
> from multiinsert_test_text a
> insert overwrite table multiinsert_test_mm partition (c)
> select
>  a.a,
>  a.b,
>  a.c
>  where a.c is not null
> insert overwrite table multiinsert_test_mm partition (c)
> select
>  a.a,
>  a.b,
>  a.c
> where a.c is null;
> select * from multiinsert_test_mm;
> {noformat}
> The result of these steps varies, depending on the execution order of the 
> FileSinkOperators of the insert overwrite statements. It can happen that an 
> error occurs due to a manifest file collision, or that no error occurs but 
> the result is incorrect.
>  Running the same insert query with an external table or with an ACID table 
> with 'hive.acid.direct.insert.enabled=false' will give the following result:
> {noformat}
> 1111    11      1111
> 2222    22      1111
> 3333    33      2222
> 4444    44      NULL
> 5555    55      NULL
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
