[ https://issues.apache.org/jira/browse/HIVE-21761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sankar Hariappan resolved HIVE-21761.
-------------------------------------
    Resolution: Fixed

All patches committed to master.

> Support table level replication in Hive
> ---------------------------------------
>
>                 Key: HIVE-21761
>                 URL: https://issues.apache.org/jira/browse/HIVE-21761
>             Project: Hive
>          Issue Type: New Feature
>          Components: repl
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: DR, Replication
>
> *Requirements:*
> {code:java}
> - The user needs to define a replication policy that replicates only
>   specific tables. This lets the user replicate just the business-critical
>   tables instead of all tables, which may saturate network bandwidth and
>   storage and slow down Hive replication.
> - The user needs to define the replication policy using regular expressions
>   (such as db.sales_*), include additional tables that do not match the
>   given pattern, and exclude some tables that do match it.
> - The user needs to add or remove tables dynamically by manually changing
>   the replication policy at run time.
> {code}
> *Design:*
> {code:java}
> 1. Hive continues to support the DB-level replication policy of format
>    <db_name>, but logically the policy is supported as
>    <db_name>.'t1|t3| …'.'t*'.
> 2. A regular expression can also be used as the replication policy. For
>    example:
>    a. <db_name>.'<prefix*>'
>    b. <db_name>.'<*suffix>'
>    c. <db_name>.'<prefix*suffix>'
>    d. <db_name>.'<regex>'
> 3. The user can provide an include list and an exclude list to specify the
>    tables covered by the replication policy.
>    a. The include list specifies the tables to be included.
>    b. The exclude list specifies the tables to be excluded even if they
>       match the expression in the include list.
>    c. So the tables included in the policy are a - b.
>    d. For backward compatibility, if no include or exclude list is given,
>       all tables are included in the policy.
> 4. The new format for the replication policy has 3 parts, separated by
>    dots (.).
>    a. The first part is the DB name.
>    b. The second part is the include list: a valid Java regex within single
>       quotes.
>    c. The third part is the exclude list: a valid Java regex within single
>       quotes.
>    - <db_name> -- Full DB replication, which is currently supported.
>    - <db_name>.'.*?' -- Full DB replication.
>    - <db_name>.'t1|t3' -- DB replication with a static list of tables, t1
>      and t3, included.
>    - <db_name>.'(t1*)|t2'.'t100' -- DB replication with all tables having
>      prefix t1, plus table t2, which doesn’t have prefix t1, and excluding
>      t100, which has the prefix t1.
> 5. If the DB property “repl.source.for” is set, then by default all the
>    tables in the DB are enabled for replication and continue to archive
>    deleted data to the CM path.
> 6. REPL DUMP takes 2 inputs along with the existing FROM and WITH clauses.
>    a. REPL DUMP <current_repl_policy> [REPLACE <previous_repl_policy>] FROM
>       <last_repl_id> WITH <key_values_list>;
>       current_repl_policy and previous_repl_policy can be in any format
>       mentioned in Point-4.
>    b. The REPLACE clause is supported to take the previous repl policy as
>       input.
>    c. The rest of the format remains the same.
> 7. Now, REPL DUMP on this DB will replicate the tables based on
>    current_repl_policy.
> 8. Single-table replication of format <db_name>.t1 is not supported. The
>    user can achieve the same with the <db_name>.'t1' format.
> 9. Any table that is added dynamically, either due to a change in the
>    regular expression or by being added to the include list, should be
>    bootstrapped.
>    a. Hive will automatically figure out the list of newly included tables
>       by comparing the current_repl_policy and previous_repl_policy inputs,
>       and will combine the bootstrap dump for the added tables with the
>       incremental dump. Because the first incremental can be combined with
>       the bootstrap dump, this removes the current limitation of the target
>       DB being inconsistent after bootstrap until the first incremental
>       replication is run.
>    b. If a table is renamed, it may be dynamically added to or removed from
>       replication based on the defined replication policy + include/exclude
>       list. So, Hive will perform a bootstrap for a table that becomes
>       included after a rename.
>    c. Also, if the renamed table is excluded from the replication policy,
>       the old table needs to be dropped at the target as well.
> 10. Only the initial bootstrap load expects the target DB to be empty; the
>     intermediate bootstrap on tables due to regex or inclusion/exclusion
>     list changes or renames doesn’t expect the target DB or table to be
>     empty. If a table with the same name exists during such a bootstrap, it
>     will be overwritten, including its data.
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
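
The include/exclude semantics in points 3, 4, 8 and 9a of the design can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not Hive's actual implementation: the class `ReplPolicySketch` and its methods `covers` and `tablesToBootstrap` are hypothetical names, patterns are matched against the whole table name with `java.util.regex` (so a prefix is written `t1.*` here), and case-insensitive table names are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a policy of the form <db_name>[.'<include>'[.'<exclude>']].
final class ReplPolicySketch {
    // Group 1: DB name, group 2: quoted include regex, group 3: quoted exclude regex.
    private static final Pattern POLICY =
        Pattern.compile("([^.']+)(?:\\.'([^']*)'(?:\\.'([^']*)')?)?");

    final String db;
    final Pattern include; // null means "all tables" (plain <db_name> policy)
    final Pattern exclude; // null means "exclude nothing"

    ReplPolicySketch(String policy) {
        Matcher m = POLICY.matcher(policy);
        if (!m.matches()) {
            // An unquoted table part such as db.t1 is rejected, as in point 8.
            throw new IllegalArgumentException("Invalid replication policy: " + policy);
        }
        db = m.group(1);
        include = m.group(2) == null ? null : Pattern.compile(m.group(2));
        exclude = m.group(3) == null ? null : Pattern.compile(m.group(3));
    }

    // A table is covered when it matches the include list and not the exclude list.
    boolean covers(String table) {
        boolean included = include == null || include.matcher(table).matches();
        boolean excluded = exclude != null && exclude.matcher(table).matches();
        return included && !excluded;
    }

    // Tables covered by the current policy but not by the previous one are
    // the candidates for bootstrap during the next incremental dump (point 9a).
    static List<String> tablesToBootstrap(ReplPolicySketch previous,
                                          ReplPolicySketch current,
                                          List<String> tables) {
        List<String> added = new ArrayList<>();
        for (String table : tables) {
            if (current.covers(table) && !previous.covers(table)) {
                added.add(table);
            }
        }
        return added;
    }

    public static void main(String[] args) {
        ReplPolicySketch previous = new ReplPolicySketch("sales.'t1|t3'");
        ReplPolicySketch current = new ReplPolicySketch("sales.'t1|t2|t3'");
        System.out.println(
            tablesToBootstrap(previous, current, List.of("t1", "t2", "t3", "t4")));
    }
}
```

Comparing the two policies table by table keeps the sketch independent of how the regexes are written; only the set of tables each policy covers matters when deciding what to bootstrap.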