At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao <masao.fu...@gmail.com> wrote in <CAHGQGwHqQVHmQ7wM=elnnp1_oy-gvssacajxwje4nc2twsq...@mail.gmail.com> > On Thu, Feb 2, 2017 at 2:36 PM, Michael Paquier > <michael.paqu...@gmail.com> wrote: > > On Thu, Feb 2, 2017 at 2:13 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > >> Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> writes: > >>> Then, the reason for the TRY-CATCH cluase is that I found that > >>> some functions called from there can throw exceptions. > >> > >> Yes, but all LWLocks should be released by normal error recovery. > >> It should not be necessary for this code to clean that up by hand. > >> If it were necessary, there would be TRY-CATCH around every single > >> LWLockAcquire in the backend, and we'd have an unreadable and > >> unmaintainable system. Please don't add a TRY-CATCH unless it's > >> *necessary* -- and you haven't explained why this one is. > > Yes.
Thank you for the suggestion. I minunderstood that. > > Putting hands into the code and at the problem, I can see that > > dropping a subscription on a node makes it unresponsive in case of a > > stop. And that's just because calls to LWLockRelease are missing as in > > the patch attached. A try/catch problem should not be necessary. > > Thanks for the patch! > > With the patch, LogicalRepLauncherLock is released at the end of > DropSubscription(). But ISTM that the lock should be released just after > logicalrep_worker_stop() and there is no need to protect the removal of > replication slot with the lock. That's true. logicalrep_worker_stop returns after confirmig that worker->proc is cleard, so no false relaunch cannot be caused. After all, logicalrep_worker_stop is surrounded by LWLockAcquire/Relase pair. So it can be moved into the funciton and make the lock secrion to be more narrower. > /* > * If we found worker but it does not have proc set it is starting up, > * wait for it to finish and then kill it. > */ > while (worker && !worker->proc) > { > > ISTM that the above loop in logicalrep_worker_stop() is not necessary > because LogicalRepLauncherLock ensures that the above condition is > always false. Thought? Am I missing something? The lock exists only to keep the launcher from starting a worker. Creating a subscription and starting a worker for the slot run independently. > If the above condition is true, which means that there is the worker slot > having the "subid" of the worker to kill, but its "proc" has not been set yet. Yes. The situation happens after launcher sets subid and before ApplyWorkerMain attaches the slot. The lock doesn't protect the section. If someone can drop a subscription just after its creation, it happens. > Without LogicalRepLauncherLock, this situation can happen after "subid" > is set by the launcher and before "proc" is set by the worker. But > LogicalRepLauncherLock protects those operations, so logicalrep_worker_stop() > called while holding the lock should always think the above condition is > false. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
>From 490add168b7cf63ec584e75f6a7f79efc08ae200 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp> Date: Fri, 3 Feb 2017 10:31:01 +0900 Subject: [PATCH] Refactor the lock section for subscription worker termination --- src/backend/commands/subscriptioncmds.c | 3 --- src/backend/replication/logical/launcher.c | 5 +++++ 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 3b70807..67c587c 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -508,9 +508,6 @@ DropSubscription(DropSubscriptionStmt *stmt) /* Clean up dependencies */ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0); - /* Protect against launcher restarting the worker. */ - LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE); - /* Kill the apply worker so that the slot becomes accessible. */ logicalrep_worker_stop(subid); diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c index d222cff..233be06 100644 --- a/src/backend/replication/logical/launcher.c +++ b/src/backend/replication/logical/launcher.c @@ -368,6 +368,9 @@ logicalrep_worker_stop(Oid subid) break; } + /* Block the lauchner not to restart this worker */ + LWLockAcquire(LogicalRepLauncherLock); + /* Now terminate the worker ... */ kill(worker->proc->pid, SIGTERM); LWLockRelease(LogicalRepWorkerLock); @@ -398,6 +401,8 @@ logicalrep_worker_stop(Oid subid) ResetLatch(&MyProc->procLatch); } + + LWLockRelease(LogicalRepLauncherLock); } /* -- 2.9.2
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers