[ 
https://issues.apache.org/jira/browse/HIVE-23349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-23349:
----------------------------------
    Description: 
Two concurrent MERGE INSERT operations generate duplicates due to a lack of 
locking. 
MERGE INSERT is treated as a regular INSERT: it acquires a SHARED_READ lock, 
which doesn't block other SHARED_READ locks. We should use an EXCLUSIVE lock 
here, or EXCL_WRITE if hive.txn.write.xlock=false.
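
A minimal sketch of the race, not Hive code. It assumes the MERGE matches on column a and inserts only unmatched source rows, and that the source rows (5,6), (7,8) are already visible to both operations. The compatibility table and the helper names (merge_insert_rows, run_two_merges) are hypothetical and model only what is stated above: SHARED_READ does not block SHARED_READ, while EXCLUSIVE blocks everything.

```python
# Simplified lock model. Lock names come from the issue text; everything
# else here is an illustrative assumption, not Hive's lock manager.
COMPATIBLE = {("SHARED_READ", "SHARED_READ")}  # pairs that may be held together


def merge_insert_rows(target_snapshot, source):
    """Rows a MERGE INSERT would add: source rows whose key `a`
    has no match in the snapshot of the target it was given."""
    keys = {a for a, _ in target_snapshot}
    return [row for row in source if row[0] not in keys]


def run_two_merges(lock_type, target, source):
    """Model two MERGE INSERT operations that start at the same time."""
    target = list(target)
    if (lock_type, lock_type) in COMPATIBLE:
        # Both operations hold the lock at once, so both read the same
        # snapshot and both decide the source rows are missing.
        snapshot = list(target)
        target += merge_insert_rows(snapshot, source)
        target += merge_insert_rows(snapshot, source)
    else:
        # The second operation waits for the first to commit and then
        # sees the rows the first one inserted.
        target += merge_insert_rows(list(target), source)
        target += merge_insert_rows(list(target), source)
    return target


target = [(1, 2), (3, 4)]
source = [(5, 6), (7, 8)]
print(len(run_two_merges("SHARED_READ", target, source)))  # 6, duplicates
print(len(run_two_merges("EXCLUSIVE", target, source)))    # 4, no duplicates
```

With the shared lock both operations insert (5,6) and (7,8); with the blocking lock the second operation finds them already matched and inserts nothing.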

{code}
create table target (a int, b int) stored as orc TBLPROPERTIES 
('transactional'='true');
insert into target values (1,2), (3,4);
create table source (a int, b int);
{code}

Execute in parallel:
{code}
insert into source values (5,6), (7,8);
{code}

  was:
Two concurrent MERGE INSERT operations generate duplicates due to a lack of 
locking. 
MERGE INSERT is treated as a regular INSERT: it acquires a SHARED_READ lock, 
which doesn't block other SHARED_READ locks. We should use an EXCLUSIVE lock 
here, or EXCL_WRITE if hive.txn.write.xlock=false (INSERT would acquire 
SHARED_WRITE in this case).

{code}
create table target (a int, b int) stored as orc TBLPROPERTIES 
('transactional'='true');
insert into target values (1,2), (3,4);
create table source (a int, b int);
{code}

execute in parallel:
{code}
insert into source values (5,6), (7,8)
{code}


> ACID: Concurrent MERGE INSERT operations produce duplicates
> -----------------------------------------------------------
>
>                 Key: HIVE-23349
>                 URL: https://issues.apache.org/jira/browse/HIVE-23349
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Denys Kuzmenko
>            Assignee: Denys Kuzmenko
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
