On Thu, Jan 21, 2021 at 9:17 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2...@gmail.com> wrote: > > > > Hi Amit. > > > > PSA the v17 patch for the Tablesync Solution1. > > > > Thanks for the updated patch. Below are few comments: > 1. Why are we changing the scope of PG_TRY in DropSubscription()? > Also, it might be better to keep the replication slot drop part as it > is. >
The latest patch [v18] was re-designed to make tablesync slots as TEMPORARY [ak0122], so this code in DropSubscription is modified a lot. This review comment is not applicable anymore. > 2. > - * - Tablesync worker finishes the copy and sets table state to SYNCWAIT; > - * waits for state change. > + * - Tablesync worker does initial table copy; there is a > FINISHEDCOPY state to > + * indicate when the copy phase has completed, so if the worker crashes > + * before reaching SYNCDONE the copy will not be re-attempted. > > In the last line, shouldn't the state be FINISHEDCOPY instead of SYNCDONE? > OK. The code comment was correct, but maybe confusing. I have reworded it in the latest patch [v18]. > 3. > +void > +tablesync_cleanup_at_interrupt(void) > +{ > + bool drop_slot_needed; > + char originname[NAMEDATALEN] = {0}; > + RepOriginId originid; > + TimeLineID tli; > + Oid subid = MySubscription->oid; > + Oid relid = MyLogicalRepWorker->relid; > + > + elog(DEBUG1, > + "tablesync_cleanup_at_interrupt for relid = %d", > + MyLogicalRepWorker->relid); > > The function name and message makes it sound like that we drop slot > and origin at any interrupt. Isn't it better to name it as > tablesync_cleanup_at_shutdown()? > The latest patch [v18] was re-designed to make tablesync slots as TEMPORARY [ak0122], so this cleanup function is removed. This review comment is not applicable anymore. > 4. > + drop_slot_needed = > + wrconn != NULL && > + MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE && > + MyLogicalRepWorker->relstate != SUBREL_STATE_READY; > + > + if (drop_slot_needed) > + { > + char syncslotname[NAMEDATALEN] = {0}; > + bool missing_ok = true; /* no ERROR if slot is missing. */ > > I think we can avoid using missing_ok and drop_slot_needed variables. > The latest patch [v18] was re-designed to make tablesync slots as TEMPORARY [ak0122], so this code no longer exists. This review comment is not applicable anymore. > 5. Can we drop the origin along with the slot in > process_syncing_tables_for_sync() instead of > process_syncing_tables_for_apply()? I think this is possible because > of the other changes you made in origin.c. Also, if possible, we can > try to use the same code to drop the slot and origin in > tablesync_cleanup_at_interrupt and process_syncing_tables_for_sync. > No, the origin tracking cannot be dropped by the tablesync worker for the normal use-case even with my modified origin.c; it would fail during the commit TX because while trying to do replorigin_session_advance it would find the asserted origin id was not there anymore. Also, the latest patch [v18] was re-designed to make tablesync slots as TEMPORARY [ak0122], so the tablesync_cleanup_at_interrupt function no longer exists (so the origin.c change of v17 has also been removed). > 6. > + if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY) > + { > + /* > + * The COPY phase was previously done, but tablesync then crashed/etc > + * before it was able to finish normally. > + */ > > There seems to be a typo (crashed/etc) in the above comment. > OK. Fixed in latest patch [v18]. ---- [ak0122] = https://www.postgresql.org/message-id/CAA4eK1LS0_mdVx2zG3cS%2BH88FJiwyS3kZi7zxijJ_gEuw2uQ2g%40mail.gmail.com [v18] = https://www.postgresql.org/message-id/CAHut%2BPvm0R%3DMn_uVN_JhK0scE54V6%2BEDGHJg1WYJx0Q8HX_mkQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia