[ https://issues.apache.org/jira/browse/HIVE-21530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sankar Hariappan reassigned HIVE-21530:
---------------------------------------

    Assignee: Sankar Hariappan  (was: mahesh kumar behera)

> Replicate Streaming ingest on ACID tables.
> ------------------------------------------
>
>                 Key: HIVE-21530
>                 URL: https://issues.apache.org/jira/browse/HIVE-21530
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl, Transactions
>    Affects Versions: 4.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: DR, Replication
>         Attachments: Hive ACID Replication_ Streaming Ingest Tables.pdf
>
>
> Implement replication of Hive streaming ingest on tables as per
> [^Hive ACID Replication_ Streaming Ingest Tables.pdf].
> Change txn_commit to include information about the transaction batch.
> Change the copy task to copy a file only if its size or checksum differs.
> This seems specific to transaction batches and shouldn't be used for normal
> transactions.
> Copy the files in the correct sequence with respect to the data file and its
> side file.
> Remove the side files (which appear to be suffixed with _flush in their file
> names) when the batch is committed.
> How do we determine the idempotent nature of the events here, so that a
> repeated event updates the corresponding table and partition but does not
> copy a new version of the file?
> Validate that partially copied data files are handled on the target warehouse,
> given the correct side file.
> Can we leave the side file in place forever? If the primary warehouse fails
> during a transaction batch copy, after only some of the transactions have
> been copied over, we won't be able to remove the _flush file. Do we have to
> handle this on failover?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
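
The copy-task item in the description (copy a file only when its size or checksum differs) could be sketched roughly as below. This is an illustration, not the Hive implementation: it uses the local filesystem and SHA-256 as stand-ins for HDFS and its file checksum, and the names `file_checksum`, `needs_copy` and `copy_if_changed` are hypothetical.

```python
import hashlib
import os
import shutil


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Assumption: SHA-256 stands in for whatever checksum the real
    filesystem exposes.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_copy(src, dst):
    """Decide whether src must be copied over dst.

    Copy when the target is missing, when the sizes differ (the cheap
    check), or when the sizes match but the checksums differ.
    """
    if not os.path.exists(dst):
        return True
    if os.path.getsize(src) != os.path.getsize(dst):
        return True
    return file_checksum(src) != file_checksum(dst)


def copy_if_changed(src, dst):
    """Copy src to dst only if needed; return True if a copy happened."""
    if needs_copy(src, dst):
        shutil.copyfile(src, dst)
        return True
    return False
```

Under these assumptions, calling `copy_if_changed` again on an unchanged file is a no-op, while a data file that has grown since the last copy is copied again because its size differs.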