[ 
https://issues.apache.org/jira/browse/HIVE-16676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16676:
------------------------------------
    Description: 
 During a bootstrap dump, if a table is renamed after the table names have been 
fetched, the renamed table is missing from the dump, so the target database 
has neither the old nor the new table. During incremental replication, later 
RENAME events are no-ops because the old table doesn't exist in the target.

To generalise the solution for this issue, the following logic is proposed.
1. Each table should store its CREATE event ID in the table parameters. If a 
table goes through a Create -> Drop -> Create sequence, the stored ID makes it 
easy to tell whether the table is the old one or the new one.
2. Bootstrap should combine the delta changes into the dumpDir as an 
Incremental Dump.
3. After the bootstrap dump completes, traverse the events starting from 
bootDumpBeginReplId. If a RENAME event is found, then check:
    - If the source table is dumped and the create event ID matches, then just 
dump the RENAME event as such.
    - If the source table is dumped but its create event ID is later than the 
event, then skip the event.
    - If the source table doesn’t exist but the target table exists, then skip 
the event.
    - If both the source and target tables are missing, then dump the target 
table to the bootstrap dumpDir.

4. For other events, dump the event according to the following logic:
    - CREATE: If the object exists, then skip it, else dump it.
    - DROP: If the object doesn’t exist, then skip it, else dump it.
    - ALTER: If the object exists and the create event ID matches, then dump 
it, else skip it.

5. Rename event load should check:
    - If the source table exists and its create event ID is the same, then 
apply the event, else skip it.
    - If the source table doesn’t exist, then check whether the target table 
exists; if yes, then skip the event.
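
The dump-side rules in steps 3 and 4 can be sketched as a small decision 
function. This is only an illustration, not Hive code. The names Event, Action 
and decide are hypothetical, "exists" is assumed to mean "present in the 
bootstrap dump", and each event is assumed to carry the create event ID of the 
table it was issued against. The case where a dumped source table has a 
different, earlier create event ID is not covered by the description; the 
sketch skips such events.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    DUMP_EVENT = "dump the event as such"
    SKIP = "skip the event"
    DUMP_TARGET_TABLE = "dump the target table to the bootstrap dumpDir"


@dataclass
class Event:
    # Hypothetical event record; field names are assumptions for this sketch.
    event_id: int
    kind: str                        # "RENAME", "CREATE", "DROP" or "ALTER"
    table: str                       # table the event applies to (rename source)
    table_create_id: int             # create event ID of that table (assumed)
    new_table: Optional[str] = None  # rename target, only for RENAME


def decide(event: Event, dumped: Dict[str, int]) -> Action:
    """Decide what to do with one event seen after bootDumpBeginReplId.

    `dumped` maps each table name in the bootstrap dump to the CREATE
    event ID stored in its table parameters (step 1).
    """
    src_id = dumped.get(event.table)

    if event.kind == "RENAME":
        if src_id is not None:
            if src_id == event.table_create_id:
                return Action.DUMP_EVENT
            # Source was dumped but is a different incarnation. The
            # description names the "created later than the event" case;
            # skipping every other mismatch is an assumption of this sketch.
            return Action.SKIP
        if event.new_table in dumped:
            return Action.SKIP
        return Action.DUMP_TARGET_TABLE

    if event.kind == "CREATE":
        return Action.SKIP if src_id is not None else Action.DUMP_EVENT
    if event.kind == "DROP":
        return Action.SKIP if src_id is None else Action.DUMP_EVENT
    if event.kind == "ALTER":
        # A table missing from the dump has src_id None, so it never matches.
        return Action.DUMP_EVENT if src_id == event.table_create_id else Action.SKIP

    raise ValueError(f"unknown event kind: {event.kind}")
```

For example, a rename of t1 to t2 when neither table is in the dump yields 
Action.DUMP_TARGET_TABLE, which corresponds to the last rule of step 3.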


  was:
 For bootstrap dump, if the table is renamed after fetching the table names, 
then new table will be missing in the dump and so the target database doesn't 
have both old and new table. During incremental replication, later RENAME 
events will be noop as the old table doesn't exist in target.

To generalise the solution for this issue, the following logic is proposed.
1. Each table should store the CREATE event ID into the table parameters. If a 
table follows Create -> Drop -> Create sequence, then it is easy to 
differentiate if the table is old or new one.
2. Bootstrap should combine the delta changes as Incremental Dump into the 
dumpDir.
3. After bootstrap dump completes, then traverse the events from 
bootDumpBeginReplId.
    - If a RENAME event is found, then check,
    - If the source table is dumped and create event ID matches, then just dump 
the RENAME event as such.
    - If the source table is dumped but the create event ID is later than the 
event, then skip the event.
    - If the source table doesn’t exist, but the target table exists, then skip 
the event.
    - If both source and target tables are missing, then dump the target table 
to the bootstrap dumpDir.
4. For other events, just dump the event with following logic.
    - CREATE: If object exists, then skip else dump it.
    - DROP: If object doesn’t exist, then skip else dump it.
    - ALTER: If the object exist and the create event ID matches, then dump 
else skip it.
5. Rename event load should check,
    - If source table exists and if create event ID is same, then apply the 
event else skip it.
    - If source table doesn’t exist, then check if the target table exists, if 
yes, then skip the event.



> Bootstrap REPL DUMP should ensure no data loss due to concurrent operations.
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-16676
>                 URL: https://issues.apache.org/jira/browse/HIVE-16676
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 2.1.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>
>  During a bootstrap dump, if a table is renamed after the table names have 
> been fetched, the renamed table is missing from the dump, so the target 
> database has neither the old nor the new table. During incremental 
> replication, later RENAME events are no-ops because the old table doesn't 
> exist in the target.
> To generalise the solution for this issue, the following logic is proposed.
> 1. Each table should store its CREATE event ID in the table parameters. If 
> a table goes through a Create -> Drop -> Create sequence, the stored ID makes 
> it easy to tell whether the table is the old one or the new one.
> 2. Bootstrap should combine the delta changes into the dumpDir as an 
> Incremental Dump.
> 3. After the bootstrap dump completes, traverse the events starting from 
> bootDumpBeginReplId. If a RENAME event is found, then check:
>     - If the source table is dumped and the create event ID matches, then 
> just dump the RENAME event as such.
>     - If the source table is dumped but its create event ID is later than 
> the event, then skip the event.
>     - If the source table doesn’t exist but the target table exists, then 
> skip the event.
>     - If both the source and target tables are missing, then dump the target 
> table to the bootstrap dumpDir.
> 4. For other events, dump the event according to the following logic:
>     - CREATE: If the object exists, then skip it, else dump it.
>     - DROP: If the object doesn’t exist, then skip it, else dump it.
>     - ALTER: If the object exists and the create event ID matches, then dump 
> it, else skip it.
> 5. Rename event load should check:
>     - If the source table exists and its create event ID is the same, then 
> apply the event, else skip it.
>     - If the source table doesn’t exist, then check whether the target table 
> exists; if yes, then skip the event.
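
The load-side check proposed for RENAME events (step 5) can be sketched as 
follows. This is a minimal illustration, not the Hive implementation. The names 
LoadAction and rename_load_action are hypothetical, and the target database is 
modelled as a plain dict from table name to the CREATE event ID stored in the 
table parameters. The description does not say what happens when neither table 
exists on the target, so the sketch reports that case separately instead of 
guessing.

```python
from enum import Enum
from typing import Dict


class LoadAction(Enum):
    APPLY = "apply the RENAME event"
    SKIP = "skip the RENAME event"
    UNSPECIFIED = "not covered by the proposal"


def rename_load_action(
    target_tables: Dict[str, int],
    source: str,
    target: str,
    source_create_id: int,
) -> LoadAction:
    """Decide how to load a RENAME of `source` to `target`.

    `target_tables` maps table names on the target database to their stored
    CREATE event IDs; `source_create_id` is the create event ID the event
    expects the source table to have (an assumption of this sketch).
    """
    if source in target_tables:
        # Apply only if this is the same incarnation of the source table.
        if target_tables[source] == source_create_id:
            return LoadAction.APPLY
        return LoadAction.SKIP
    if target in target_tables:
        # The rename is already reflected on the target.
        return LoadAction.SKIP
    return LoadAction.UNSPECIFIED
```

Keeping the unspecified case as its own value makes the gap visible to the 
caller rather than silently applying or skipping the event.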



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
