[ https://issues.apache.org/jira/browse/HIVE-23349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denys Kuzmenko updated HIVE-23349:
----------------------------------
    Description:
Two concurrent MERGE INSERT operations generate duplicates due to a lack of locking.
MERGE INSERT is treated as a regular INSERT: it acquires a SHARED_READ lock, which doesn't prevent other SHARED_READs. We should use an EXCLUSIVE lock here, or EXCL_WRITE if hive.txn.write.xlock=false.
{code}
create table target (a int, b int) stored as orc TBLPROPERTIES ('transactional'='true');
insert into target values (1,2), (3,4);
create table source (a int, b int);
{code}
execute in parallel:
{code}
insert into source values (5,6), (7,8);
{code}

  was:
Two concurrent MERGE INSERT operations generate duplicates due to a lack of locking.
MERGE INSERT is treated as a regular INSERT: it acquires a SHARED_READ lock, which doesn't prevent other SHARED_READs. We should use an EXCLUSIVE lock here, or EXCL_WRITE if hive.txn.write.xlock=false (INSERT would acquire SHARED_WRITE in this case).
{code}
create table target (a int, b int) stored as orc TBLPROPERTIES ('transactional'='true');
insert into target values (1,2), (3,4);
create table source (a int, b int);
{code}
execute in parallel:
{code}
insert into source values (5,6), (7,8);
{code}

> ACID: Concurrent MERGE INSERT operations produce duplicates
> -----------------------------------------------------------
>
>                 Key: HIVE-23349
>                 URL: https://issues.apache.org/jira/browse/HIVE-23349
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Denys Kuzmenko
>            Assignee: Denys Kuzmenko
>            Priority: Major
>
> Two concurrent MERGE INSERT operations generate duplicates due to a lack of
> locking.
> MERGE INSERT is treated as a regular INSERT: it acquires a SHARED_READ lock,
> which doesn't prevent other SHARED_READs. We should use an EXCLUSIVE lock
> here, or EXCL_WRITE if hive.txn.write.xlock=false.
> {code}
> create table target (a int, b int) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into target values (1,2), (3,4);
> create table source (a int, b int);
> {code}
> execute in parallel:
> {code}
> insert into source values (5,6), (7,8);
> {code}
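
The race described in this issue can be illustrated with a small toy model. This is only a sketch, not Hive's lock manager or its MERGE implementation: the functions `compatible`, `plan_merge_insert` and `run_two_merges` are hypothetical, the compatibility rule is deliberately reduced to the two lock types that matter here, and the merge is assumed to insert every source row whose key `a` is absent from the target.

```python
# Toy model only: this is not Hive's lock manager or its MERGE implementation.
SHARED_READ = "SHARED_READ"
EXCLUSIVE = "EXCLUSIVE"


def compatible(held, requested):
    """Simplified rule (assumption): only two SHARED_READ locks can coexist."""
    return held == SHARED_READ and requested == SHARED_READ


def plan_merge_insert(target_snapshot, source):
    """Rows an insert-when-not-matched merge would add, given a snapshot."""
    existing_keys = {a for a, _ in target_snapshot}
    return [row for row in source if row[0] not in existing_keys]


def run_two_merges(lock_type):
    """Run two identical merges that both request lock_type on the target."""
    target = [(1, 2), (3, 4)]
    source = [(5, 6), (7, 8)]

    if compatible(lock_type, lock_type):
        # Both merges hold their locks at once, so both plan against
        # the same snapshot before either one commits.
        snapshot = list(target)
        plans = [plan_merge_insert(snapshot, source) for _ in range(2)]
        for plan in plans:
            target.extend(plan)
    else:
        # The second merge waits for the first, then sees its rows.
        for _ in range(2):
            target.extend(plan_merge_insert(list(target), source))
    return target
```

In this model, `run_two_merges(SHARED_READ)` ends with the source rows in the target twice, while `run_two_merges(EXCLUSIVE)` inserts them once, because the second merge plans against a target that already contains them.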