On Wed, Mar 18, 2015 at 2:22 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: >> Can you try this: >> >> diff --git a/src/backend/postmaster/bgworker.c >> b/src/backend/postmaster/bgworker.c >> index f80141a..39b919f 100644 >> --- a/src/backend/postmaster/bgworker.c >> +++ b/src/backend/postmaster/bgworker.c >> @@ -244,6 +244,8 @@ BackgroundWorkerStateChange(void) >> rw->rw_terminate = true; >> if (rw->rw_pid != 0) >> kill(rw->rw_pid, SIGTERM); >> + else >> + ReportBackgroundWorkerPID(rw); >> } >> continue; >> } >> > > It didn't fix the problem. IIUC, you have done this to ensure that > if worker is not already started, then update it's pid, so that we > can get the required status in WaitForBackgroundWorkerShutdown(). > As this is a timing issue, it can so happen that before Postmaster > gets a chance to report the pid, backend has already started waiting > on WaitLatch().
I think I figured out the problem. That fix only helps in the case where the postmaster noticed the new registration previously but didn't start the worker, and then later notices the termination. What's much more likely to happen is that the worker is started and terminated so quickly that both happen before we create a RegisteredBgWorker for it. The attached patch fixes that case, too. Assuming this actually fixes the problem, I think we should back-patch it into 9.4. To recap, the problem is that, at present, if you register a worker and then terminate it before it's launched, GetBackgroundWorkerPid() will still return BGWH_NOT_YET_STARTED, which it makes it seem like we're still waiting for it to start. But when or if the slot is reused for an unrelated registration, then GetBackgroundWorkerPid() will switch to returning BGWH_STOPPED. It's hard to believe that's the behavior anyone wants. With this patch, the return value will always be BGWH_STOPPED in this situation. That has the virtue of being consistent, and practically speaking I think it's the behavior that everyone will want, because the case where this matters is when you are waiting for workers to start or waiting for worker to stop, and in either case you will want to treat a worker that was marked for termination before the postmaster actually started it as already-stopped rather than not-yet-started. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
stop-notify-fix-v2.patch
Description: binary/octet-stream
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers