[ https://issues.apache.org/jira/browse/HIVE-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sankar Hariappan updated HIVE-18988: ------------------------------------ Attachment: (was: HIVE-18988.04.patch) > Support bootstrap replication of ACID tables > -------------------------------------------- > > Key: HIVE-18988 > URL: https://issues.apache.org/jira/browse/HIVE-18988 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl > Affects Versions: 3.0.0 > Reporter: Sankar Hariappan > Assignee: Sankar Hariappan > Priority: Major > Labels: ACID, DR, pull-request-available, replication > Fix For: 3.1.0 > > Attachments: HIVE-18988.01.patch, HIVE-18988.02.patch, > HIVE-18988.03.patch > > > Bootstrapping of ACID tables, need special handling to replicate a stable > state of data. > - If ACID feature enables, then perform bootstrap dump for ACID tables with > in read txn. > -> Dump table/partition metadata. > -> Get the list of valid data files for a table using same logic as read txn > do. > -> Dump latest ValidWriteIdList as per current read txn. > - Set the valid last replication state such that it doesn't miss any open > txn started after triggering bootstrap dump. > - If any txns on-going which was opened before triggering bootstrap dump, > then it is not guaranteed that if open_txn event captured for these txns. > Also, if these txns are opened for streaming ingest case, then dumped ACID > table data may include data of open txns which impact snapshot isolation at > target. To avoid that, bootstrap dump should wait for timeout (new > configuration: hive.repl.bootstrap.dump.open.txn.timeout). After timeout, > just force abort those txns and continue. > - If any txns force aborted belongs to a streaming ingest case, then dumped > ACID table data may have aborted data too. So, it is necessary to replicate > the aborted write ids to target to mark those data invalid for any readers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)