[ 
https://issues.apache.org/jira/browse/HIVE-28366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870423#comment-17870423
 ] 

yongzhi.shao commented on HIVE-28366:
-------------------------------------

[~dkuzmenko] :

Sir, this is most likely because the client calls the refresh method to get 
the latest snapshot just before the commit method runs, and then commits 
against that snapshot.
If the insert overwrite is executed first and the insert into command then 
starts its commit method, this causes the problem.
Because reads and writes are often distributed across different physical 
nodes in large distributed systems, a completely reliable commit would need 
the snapshotId seen by the read node to be passed along the whole time, but 
almost no one does this in practice.
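
To make the point concrete, here is a minimal toy sketch in Python. It is not 
the Iceberg or Hive API: ToyTable, CommitConflict, partition and run_jobs are 
hypothetical names, and the overwrite is assumed to replace only the 
partitions it writes to. It replays the two jobs from the issue description 
in the order given there (Job 1 commits first), once with a plain "refresh, 
then commit" and once with the read snapshot id passed to the commit.

```python
# Toy model of optimistic commits. This is NOT the Iceberg or Hive API;
# ToyTable, CommitConflict, partition and run_jobs are hypothetical names.

class CommitConflict(Exception):
    """Raised when the table changed after the writer read it."""


def partition(i, width=10):
    # Mirrors truncate(10, i) for the non-negative values used here.
    return i - i % width


class ToyTable:
    def __init__(self, rows):
        self.snapshot_id = 0
        self.rows = list(rows)

    def read(self):
        # Return the snapshot id together with the data that was read.
        return self.snapshot_id, list(self.rows)

    def commit(self, new_rows, overwrite=False, read_snapshot_id=None):
        # With read_snapshot_id=None the commit applies to whatever is
        # current, i.e. "refresh, then commit". Otherwise a stale read fails.
        if read_snapshot_id is not None and read_snapshot_id != self.snapshot_id:
            raise CommitConflict(
                f"read snapshot {read_snapshot_id}, table is at {self.snapshot_id}"
            )
        if overwrite:
            # Assumed semantics: replace only the partitions being written.
            touched = {partition(i) for i, _ in new_rows}
            self.rows = [r for r in self.rows if partition(r[0]) not in touched]
        self.rows = self.rows + list(new_rows)
        self.snapshot_id += 1


def run_jobs(validate):
    table = ToyTable([(1, 1), (2, 2), (10, 10), (20, 20), (40, 40), (30, 30)])
    # Both jobs read the same snapshot before either one commits.
    snap1, data1 = table.read()
    snap2, data2 = table.read()
    job1 = [(i * 100, p * 100) for i, p in data1]   # insert into
    job2 = [(i + 1, p + 1) for i, p in data2]       # insert overwrite
    table.commit(job1, read_snapshot_id=snap1 if validate else None)
    table.commit(job2, overwrite=True,
                 read_snapshot_id=snap2 if validate else None)
    return sorted(table.rows)


if __name__ == "__main__":
    print(run_jobs(validate=False))   # both commits succeed
    try:
        run_jobs(validate=True)
    except CommitConflict as exc:
        print("rejected:", exc)
```

Under these assumptions, the run without validation ends with the same twelve 
rows as the table in the issue description, while the run that passes the 
read snapshot id rejects the second commit instead of applying it to data the 
job never read.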

> Iceberg: Concurrent Insert and IOW produce incorrect result 
> ------------------------------------------------------------
>
>                 Key: HIVE-28366
>                 URL: https://issues.apache.org/jira/browse/HIVE-28366
>             Project: Hive
>          Issue Type: Bug
>          Components: Iceberg integration
>    Affects Versions: 4.0.0
>            Reporter: Denys Kuzmenko
>            Assignee: Denys Kuzmenko
>            Priority: Major
>
> 1. create a table and insert some data:
> {code}
> create table ice_t (i int, p int) partitioned by spec (truncate(10, i)) 
> stored by iceberg;
> insert into ice_t values (1, 1), (2, 2);
> insert into ice_t values (10, 10), (20, 20);
> insert into ice_t values (40, 40), (30, 30);
> {code}
> Then concurrently execute the following jobs:
> Job 1:
> {code}
> insert into ice_t select i*100, p*100 from ice_t;
> {code}
> Job 2:
> {code}
> insert overwrite ice_t select i+1, p+1 from ice_t;
> {code}
> If Job 1 finishes first, Job 2 still succeeds for me, and after that the 
> table content will be the following:
> {code}
> 2      2
> 3      3
> 11     11
> 21     21
> 31     31
> 41     41
> 100    100
> 200    200
> 1000   1000
> 2000   2000
> 3000   3000
> 4000   4000
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
