[ https://issues.apache.org/jira/browse/HIVE-16676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sankar Hariappan updated HIVE-16676:
------------------------------------
    Description:
For a bootstrap dump, if a table is renamed after the table names have been fetched, the new table will be missing from the dump, so the target database will have neither the old nor the new table. During incremental replication, later RENAME events will be no-ops because the old table doesn't exist in the target.

To generalise the solution for this issue, the following logic is proposed.
1. Each table should store its CREATE event ID in the table parameters. If a table goes through a Create -> Drop -> Create sequence, this makes it easy to tell whether the table is the old or the new one.
2. Bootstrap should combine the delta changes as an incremental dump into the dumpDir.
3. After the bootstrap dump completes, traverse the events from bootDumpBeginReplId.
- If a RENAME event is found, then check:
  - If the source table is dumped and the create event ID matches, dump the RENAME event as is.
  - If the source table is dumped but the create event ID is later than the event, skip the event.
  - If the source table doesn't exist but the target table exists, skip the event.
  - If both source and target tables are missing, dump the target table to the bootstrap dumpDir.
4. For other events, dump the event using the following logic.
- CREATE: If the object exists, skip the event; otherwise dump it.
- DROP: If the object doesn't exist, skip the event; otherwise dump it.
- ALTER: If the object exists and the create event ID matches, dump the event; otherwise skip it.
5. Rename event load should check the source table and apply the event if the create event ID is the same.
6. If the source table doesn't exist, check whether the target table exists; if it does, skip the event.

  was:
Currently, RENAME TABLE and RENAME PARTITION events are treated as ALTER events.
For a bootstrap dump, if a table is renamed after the table names have been fetched, the new table will be missing from the dump, so the target database will have neither the old nor the new table.
During incremental replication, later RENAME events will be no-ops because the old table doesn't exist in the target.
To keep RENAME replication simple, it is suggested to treat RENAME as a DROP+CREATE event.
EVENT_RENAME_TABLE = EVENT_DROP_TABLE + EVENT_CREATE_TABLE.
EVENT_RENAME_PARTITION = EVENT_DROP_PARTITION + EVENT_ADD_PARTITION.

> Bootstrap REPL DUMP should ensure no data loss due to concurrent operations.
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-16676
>                 URL: https://issues.apache.org/jira/browse/HIVE-16676
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 2.1.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
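
The event-filtering rules in steps 3 and 4 of the proposed description can be sketched as a small decision function. This is only an illustration, not Hive code: `Event`, `Action`, and `decide` are hypothetical names, the dumped state is modelled as a plain mapping from table name to the CREATE event ID stored in its parameters, and each event is assumed to carry the CREATE event ID of the table it acted on. "Object exists" is read here as "the table is present in the bootstrap dump". The description does not say what to do when the source table is dumped with an earlier, non-matching create event ID; the sketch skips the event in that case, which is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    DUMP_EVENT = "dump_event"                # write the event into dumpDir as is
    SKIP = "skip"                            # ignore the event
    DUMP_TARGET_TABLE = "dump_target_table"  # bootstrap-dump the renamed table


@dataclass(frozen=True)
class Event:
    event_id: int
    kind: str                        # "RENAME", "CREATE", "DROP" or "ALTER"
    table: str                       # table the event acts on (rename source)
    table_create_id: int             # assumed: CREATE event ID of that table
    new_table: Optional[str] = None  # rename target, only set for RENAME


def decide(event: Event, dumped: Dict[str, int]) -> Action:
    """Pick an action for one event seen after bootDumpBeginReplId.

    `dumped` maps each table in the bootstrap dump to its CREATE event ID.
    """
    dumped_create_id = dumped.get(event.table)

    if event.kind == "RENAME":
        if dumped_create_id is not None:
            if dumped_create_id == event.table_create_id:
                return Action.DUMP_EVENT
            # Covers "create event ID is later than the event" (the table was
            # recreated afterwards). Other mismatches are also skipped here,
            # which is an assumption the description does not spell out.
            return Action.SKIP
        if event.new_table in dumped:
            return Action.SKIP
        return Action.DUMP_TARGET_TABLE

    if event.kind == "CREATE":
        return Action.SKIP if dumped_create_id is not None else Action.DUMP_EVENT

    if event.kind == "DROP":
        return Action.SKIP if dumped_create_id is None else Action.DUMP_EVENT

    if event.kind == "ALTER":
        if dumped_create_id is not None and dumped_create_id == event.table_create_id:
            return Action.DUMP_EVENT
        return Action.SKIP

    raise ValueError(f"unsupported event kind: {event.kind}")


if __name__ == "__main__":
    dumped_tables = {"t1": 10, "t3": 50}
    rename = Event(event_id=20, kind="RENAME", table="t1",
                   table_create_id=10, new_table="t2")
    print(decide(rename, dumped_tables))
```

Keeping the decision separate from the dump I/O makes each rule in the list easy to check in isolation, for example the case where a table was dropped and recreated between the rename and the end of the bootstrap dump.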