[ https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757359#comment-16757359 ]
Hive QA commented on HIVE-21029:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957007/HIVE-21029.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15721 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15863/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15863/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15863/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957007 - PreCommit-HIVE-Build

> External table replication for existing deployments running incremental replication.
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-21029
>                 URL: https://issues.apache.org/jira/browse/HIVE-21029
>             Project: Hive
>          Issue Type: Bug
>          Components: repl
>    Affects Versions: 3.0.0, 3.1.0, 3.1.1
>            Reporter: anishek
>            Assignee: Sankar Hariappan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21029.01.patch, HIVE-21029.02.patch, HIVE-21029.03.patch, HIVE-21029.04.patch
>
>
> Existing deployments using hive replication do not get external tables replicated. For such deployments to enable external table replication they will have to provide a specific switch to first bootstrap external tables as part of hive incremental replication, following which the incremental replication will take care of further changes in external tables.
> The switch will be provided by an additional Hive configuration (for example, hive.repl.bootstrap.external.tables) and is to be used in the {code} WITH {code} clause of the {code} REPL DUMP {code} command.
> Additionally, the existing Hive config _hive.repl.include.external.tables_ must always be set to "true" in the above clause.
> Proposed usage for enabling external table replication on an existing replication policy:
> 1. Consider an ongoing repl policy <db1> in the incremental phase. Enable hive.repl.include.external.tables=true and hive.repl.bootstrap.external.tables=true in the next incremental REPL DUMP.
> - Dumps all events but skips events related to external tables.
> - Instead, combines the bootstrap dump for all external tables under the "_bootstrap" directory.
> - Also includes the data locations file "_external_tables_info".
> - A LIMIT or TO clause should not be present, to ensure the latest events are dumped before bootstrap dumping external tables.
> 2. REPL LOAD on this dump applies all the events first, copies external table data, and then bootstraps external tables (metadata).
> - It is possible that the external tables (metadata) are not point-in-time consistent with the rest of the tables.
> - But they would be eventually consistent once the next incremental load is applied.
> - This REPL LOAD is fault tolerant and can be retried if it fails.
> 3. All future REPL DUMPs on this repl policy should set hive.repl.bootstrap.external.tables=false.
> - If it is not set to false, the target might end up with an inconsistent set of external tables, as bootstrap wouldn't clean up any dropped external tables.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
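
The three-step usage proposed in the issue description can be sketched as a small statement builder. This is only an illustration, not part of the patch: `build_repl_dump` and the event IDs are hypothetical, the `REPL DUMP <db> FROM <event-id> [TO ...] [LIMIT ...] WITH (...)` shape is assumed from the Hive 3.x syntax, and hive.repl.bootstrap.external.tables is the config name the description proposes, which may differ in the committed change.

```python
def build_repl_dump(db, from_event_id, bootstrap_external,
                    to_event_id=None, limit=None):
    """Build an incremental REPL DUMP statement (hypothetical helper).

    Encodes the rules from the issue description:
      - hive.repl.include.external.tables is always 'true' in the WITH clause;
      - TO / LIMIT are rejected when bootstrapping external tables, so the
        latest events are dumped before the external tables are bootstrapped;
      - later dumps pass bootstrap_external=False.
    """
    if bootstrap_external and (to_event_id is not None or limit is not None):
        raise ValueError(
            "TO/LIMIT should not be used when bootstrapping external tables")

    # Assumed Hive 3.x statement shape: REPL DUMP <db> FROM <id> [TO <id>] [LIMIT <n>]
    stmt = f"REPL DUMP {db} FROM {from_event_id}"
    if to_event_id is not None:
        stmt += f" TO {to_event_id}"
    if limit is not None:
        stmt += f" LIMIT {limit}"

    configs = {
        "hive.repl.include.external.tables": "true",
        "hive.repl.bootstrap.external.tables":
            "true" if bootstrap_external else "false",
    }
    with_clause = ", ".join(f"'{k}'='{v}'" for k, v in configs.items())
    return f"{stmt} WITH ({with_clause})"


# Step 1: one incremental dump that also bootstraps external tables.
print(build_repl_dump("db1", 100, bootstrap_external=True))
# Step 3: every later dump turns the bootstrap switch off again.
print(build_repl_dump("db1", 200, bootstrap_external=False))
```

Rejecting TO/LIMIT in the builder mirrors the constraint in step 1. Step 2 (REPL LOAD) needs no extra switch in this sketch, since the description only says the load applies events first and then bootstraps the external tables.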