On Thu, Jan 21, 2021 at 3:47 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2...@gmail.com> wrote: > > > > Hi Amit. > > > > PSA the v17 patch for the Tablesync Solution1. > > > > Thanks for the updated patch. Below are few comments: >
One more comment: In LogicalRepSyncTableStart(), you are trying to remove the slot on the failure of copy which won't work if the publisher is down. If that happens on restart of tablesync worker, we will retry to create the slot with the same name and it will fail because the previous slot is still not removed from the publisher. I think the same problem can happen if, after an error in tablesync worker and we drop the subscription before tablesync worker gets a chance to restart. So, to avoid these problems can we use the TEMPORARY slot for tablesync workers as previously? If I remember correctly, the main problem was we don't know where to start decoding if we fail in catchup phase. But for that origins should be sufficient because if we fail before copy then anyway we have to create a new slot and origin but if we fail after copy then we can use the start_decoding_position from the origin. So before copy, we still need to use CRS_USE_SNAPSHOT while creating a temporary slot but if we are already in FINISHED COPY state at the start of tablesync worker then create a slot with CRS_NOEXPORT_SNAPSHOT option and then use origin's start_pos and proceed decoding changes from that point onwards similar to how currently the apply worker works. -- With Regards, Amit Kapila.