Sankar Hariappan created HIVE-19927:
---------------------------------------
Summary: Last Repl ID set by bootstrap dump is not proper and may cause loss of data if have ACID tables.
Key: HIVE-19927
URL: https://issues.apache.org/jira/browse/HIVE-19927
Project: Hive
Issue Type: Sub-task
Components: HiveServer2, Transactions
Affects Versions: 3.1.0
Reporter: Sankar Hariappan
Assignee: Sankar Hariappan

During a bootstrap dump of ACID tables, consider the following sequence:
- The current session (REPL DUMP) opens a txn (Txn1). (Event-10)
- Another session (Session-2) opens a txn (Txn2). (Event-11)
- Session-2 inserts data (T1.D1) into an ACID table. (Event-12)
- REPL DUMP gets lastReplId = last event ID logged, which is Event-12.
- Session-2 commits Txn2. (Event-13)
- REPL DUMP dumps the ACID tables using the validTxnList based on Txn1. This step skips all data written by txns > Txn1, so T1.D1 will be missing from the dump.
- REPL DUMP commits Txn1.
- REPL LOAD from the bootstrap dump will not contain T1.D1.
- The incremental REPL DUMP will start from Event-13, after the open and insert events of Txn2 (Event-11 and Event-12), and hence lose Txn2, which was opened after Txn1. So the data T1.D1 will be lost forever.

Proposed fix: capture the lastReplId of the bootstrap dump before opening the current txn (Txn1), store it in the Driver context, and use it for the dump.
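The sequence above can be replayed with a small toy model. This is only an illustrative sketch in Python, not Hive code: the `Event` class, the `replicate` function and the `capture_before_open` flag are hypothetical names. It also assumes that the replica ignores a commit event for a txn whose open event it never replayed, and that the event preceding Event-10 has ID 9.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_id: int
    kind: str       # "open", "insert" or "commit"
    txn: str
    data: str = ""


def replicate(capture_before_open: bool) -> set:
    """Return the rows on the replica after bootstrap load plus one incremental load."""
    log = []        # toy notification event log on the source
    next_id = 10    # first event is Event-10, as in the report

    def emit(kind, txn, data=""):
        nonlocal next_id
        log.append(Event(next_id, kind, txn, data))
        next_id += 1

    def last_event_id():
        # Assumption: the event before Event-10 has ID 9.
        return log[-1].event_id if log else next_id - 1

    last_repl_id = None
    if capture_before_open:
        last_repl_id = last_event_id()    # proposed ordering: captured before Txn1 opens

    emit("open", "Txn1")                  # Event-10, REPL DUMP session
    emit("open", "Txn2")                  # Event-11, Session-2
    emit("insert", "Txn2", "T1.D1")       # Event-12
    if not capture_before_open:
        last_repl_id = last_event_id()    # reported ordering: Event-12
    emit("commit", "Txn2")                # Event-13

    # Bootstrap dump reads with Txn1's snapshot, so writes by txns > Txn1
    # (here T1.D1) are not visible and are not dumped.
    replica = set()
    # Txn1 commits here; no event is modelled because the report assigns none.

    # Incremental load: replay only the events after last_repl_id.
    opened, pending = set(), {}
    for ev in log:
        if ev.event_id <= last_repl_id:
            continue
        if ev.kind == "open":
            opened.add(ev.txn)
            pending[ev.txn] = []
        elif ev.kind == "insert" and ev.txn in opened:
            pending[ev.txn].append(ev.data)
        elif ev.kind == "commit" and ev.txn in opened:
            replica.update(pending.pop(ev.txn))
        # Events of a txn whose open event was never replayed are dropped (assumption).
    return replica
```

In this model, `replicate(capture_before_open=False)` returns an empty set (T1.D1 is lost), while `replicate(capture_before_open=True)` returns `{"T1.D1"}`, because the incremental replay then starts at Event-10 and sees the whole of Txn2.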