[jira] [Work logged] (HIVE-26472) Concurrent UPDATEs can cause duplicate rows

ASF GitHub Bot (Jira) Thu, 18 Aug 2022 23:48:09 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-26472?focusedWorklogId=801874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-801874
 ]


ASF GitHub Bot logged work on HIVE-26472:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Aug/22 06:47
            Start Date: 19/Aug/22 06:47
    Worklog Time Spent: 10m 
      Work Description: deniskuzZ commented on code in PR #3524:
URL: https://github.com/apache/hive/pull/3524#discussion_r949854353


##########
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java:
##########
@@ -4053,14 +4053,25 @@ public void replTableWriteIdState(String validWriteIdList, String dbName, String
 
   @Override
   public long allocateTableWriteId(long txnId, String dbName, String tableName) throws TException {
-    return allocateTableWriteIdsBatch(Collections.singletonList(txnId), dbName, tableName).get(0).getWriteId();
+    return allocateTableWriteId(txnId, dbName, tableName, false);
   }
 
   @Override
-  public List<TxnToWriteId> allocateTableWriteIdsBatch(List<Long> txnIds, String dbName, String tableName)
-          throws TException {
+  public long allocateTableWriteId(long txnId, String dbName, String tableName, boolean shouldRealloc) throws TException {
+    return allocateTableWriteIdsBatch(Collections.singletonList(txnId), dbName, tableName, shouldRealloc).get(0).getWriteId();
+  }
+
+
+  @Override
+  public List<TxnToWriteId> allocateTableWriteIdsBatch(List<Long> txnIds, String dbName, String tableName) throws TException {
+    return allocateTableWriteIdsBatch(txnIds, dbName, tableName, false);
+  }
+
+  public List<TxnToWriteId> allocateTableWriteIdsBatch(List<Long> txnIds, String dbName, String tableName,

Review Comment:
   If it's not used/exposed via API yet, there is no need to modify 
allocateTableWriteIdsBatch. 





Issue Time Tracking
-------------------

    Worklog Id:     (was: 801874)
    Time Spent: 1h  (was: 50m)

> Concurrent UPDATEs can cause duplicate rows
> -------------------------------------------
>
>                 Key: HIVE-26472
>                 URL: https://issues.apache.org/jira/browse/HIVE-26472
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 4.0.0-alpha-1
>            Reporter: John Sherman
>            Assignee: John Sherman
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: debug.diff
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Concurrent UPDATEs to the same table can cause duplicate rows when the 
> two UPDATEs are assigned txnIds and writeIds in opposite orders, like this:
> UPDATE #1 = txnId: 100 writeId: 50 <--- commits first
> UPDATE #2 = txnId: 101 writeId: 49
> To reproduce the issue:
> I applied the attached debug.diff patch, which adds hive.lock.sleep.writeid 
> (the time to sleep before acquiring a writeId) and 
> hive.lock.sleep.post.writeid (the time to sleep after acquiring a writeId).
> {code:java}
> CREATE TABLE test_update(i int) STORED AS ORC TBLPROPERTIES('transactional'="true");
> INSERT INTO test_update VALUES (1);
> -- Start two beeline connections.
> -- In connection #1, run:
> set hive.driver.parallel.compilation = true;
> set hive.lock.sleep.writeid=5s;
> update test_update set i = 1 where i = 1;
> -- Wait one second, then in connection #2, run:
> set hive.driver.parallel.compilation = true;
> set hive.lock.sleep.post.writeid=10s;
> update test_update set i = 1 where i = 1;
> -- After both updates complete, test_update likely contains two rows.
> {code}
> HIVE-24211 seems to address the case where:
> UPDATE #1 = txnId: 100 writeId: 50
> UPDATE #2 = txnId: 101 writeId: 49 <--- commits first (I think this causes 
> UPDATE #1 to detect that its snapshot is out of date because commitedTxn > 
> UPDATE #1's txnId)
> A possible workaround is to set hive.driver.parallel.compilation = false, 
> but this only helps when there is a single HS2 instance.
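
The two orderings described in the issue can be told apart with a small
standalone model. The sketch below is illustrative only and is not Hive code:
the class WriteIdInversionSketch, the TxnWrite pair, and the Scenario enum are
hypothetical names introduced here. It assumes two transactions on the same
table and only classifies which one commits first relative to the txnId and
writeId ordering.

```java
public class WriteIdInversionSketch {

    /** Hypothetical pairing of a transaction id with its table write id. */
    public static final class TxnWrite {
        public final long txnId;
        public final long writeId;

        public TxnWrite(long txnId, long writeId) {
            this.txnId = txnId;
            this.writeId = writeId;
        }
    }

    /** Labels for the orderings discussed in the issue description. */
    public enum Scenario {
        NO_INVERSION,             // txnId order and writeId order agree
        LOWER_TXN_COMMITS_FIRST,  // the ordering reported in this issue
        HIGHER_TXN_COMMITS_FIRST  // the ordering the reporter believes HIVE-24211 covers
    }

    /** True when the transaction with the lower txnId holds the higher writeId. */
    public static boolean isInverted(TxnWrite a, TxnWrite b) {
        return (a.txnId < b.txnId && a.writeId > b.writeId)
            || (a.txnId > b.txnId && a.writeId < b.writeId);
    }

    /** Classifies a pair of transactions given which one commits first. */
    public static Scenario classify(TxnWrite firstToCommit, TxnWrite other) {
        if (!isInverted(firstToCommit, other)) {
            return Scenario.NO_INVERSION;
        }
        return firstToCommit.txnId < other.txnId
            ? Scenario.LOWER_TXN_COMMITS_FIRST
            : Scenario.HIGHER_TXN_COMMITS_FIRST;
    }

    public static void main(String[] args) {
        // Values taken from the example in the issue description.
        TxnWrite update1 = new TxnWrite(100, 50);
        TxnWrite update2 = new TxnWrite(101, 49);

        System.out.println("UPDATE #1 commits first: " + classify(update1, update2));
        System.out.println("UPDATE #2 commits first: " + classify(update2, update1));
    }
}
```

The model says nothing about how Hive detects or resolves either ordering. It
only makes explicit that the reported duplicate-row case is the one where the
transaction with the lower txnId and higher writeId commits first.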



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
