On Mon, Feb 1, 2021 at 3:44 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2...@gmail.com> wrote: > > > > On Mon, Feb 1, 2021 at 1:54 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > > > On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2...@gmail.com> wrote: > > > > > > > > On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapil...@gmail.com> > > > > wrote: > > > > > > > > > > I have made the below changes in the patch. Let me know what you think > > > > > about these? > > > > > 1. It was a bit difficult to understand the code in DropSubscription > > > > > so I have rearranged the code to match the way we are doing in HEAD > > > > > where we drop the slots at the end after finishing all the other > > > > > cleanup. > > > > > > > > There was a reason why the v22 logic was different from HEAD. > > > > > > > > The broken connection leaves dangling slots which is unavoidable. > > > > > > > > > > I think this is true only when the user specifically requested it by > > > the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right? > > > Otherwise, we give an error on a broken connection. Also, if that is > > > true then is there a reason to pass missing_ok as true while dropping > > > tablesync slots? > > > > > > > AFAIK there is always a potential race with DropSubscription dropping > > slots. The DropSubscription might be running at exactly the same time > > the apply worker has just dropped the very same tablesync slot. > > > > We stopped the workers before getting a list of NotReady relations and > then we try to drop the corresponding slots. So, how such a race > condition can happen? Note, because we have a lock on pg_subscrition, > there is no chance that the workers can restart till the transaction > end.
OK. I think I was forgetting the logicalrep_worker_stop would also go into a loop waiting for the worker process to die. So even if the tablesync worker does simultaneously drop it's own slot, I think it will certainly at least be in SYNCDONE state before DropSubscription does anything else with that worker. > > > By > > saying missing_ok = true it means DropSubscription would not give > > ERROR in such a case, so at least the DROP SUBSCRIPTION would not fail > > with an unexpected error. > > > > > > > > > But, > > > > whereas the user knows the name of the Subscription slot (they named > > > > it), there is no easy way for them to know the names of the remaining > > > > tablesync slots unless we log them. > > > > > > > > That is why the v22 code was written to process the tablesync slots > > > > even for wrconn == NULL so their name could be logged: > > > > elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".", > > > > syncslotname); > > > > > > > > The v23 patch removed this dangling slot name information, so it makes > > > > it difficult for the user to know what tablesync slots to cleanup. > > > > > > > > > > Okay, then can we think of combining with the existing error of the > > > replication slot? I think that might produce a very long message, so > > > another idea could be to LOG a separate WARNING for each such slot > > > just before giving the error. > > > > There may be many subscribed tables so I agree combining to one > > message might be too long. Yes, we can add another loop to output the > > necessary information. But, isn’t logging each tablesync slot WARNING > > before the subscription slot ERROR exactly the behaviour which already > > existed in v22. IIUC the DropSubscription refactoring in V23 was not > > done for any functional change, but was intended only to make the code > > simpler, but how is that goal achieved if v23 ends up needing 3 loops > > where v22 only needed 1 loop to do the same thing? > > > > No, there is a functionality change as well. The way we have code in > v22 can easily lead to a problem when we have dropped the slots but > get an error while removing origins or an entry from subscription rel. > In such cases, we won't be able to rollback the drop of slots but the > other database operations will be rolled back. This is the reason we > have to drop the slots at the end. We need to ensure the same thing > for AlterSubscription_refresh. Does this make sense now? > OK. ---- Kind Regards, Peter Smith. Fujitsu Australia.