On Thu, Oct 15, 2020 at 8:40 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.mu...@gmail.com> writes:
> > The process exit event is like an 'edge', not a 'level'... hmm.  It
> > might be enough to set report_postmaster_not_running = true the first
> > time it tells us so if we try to wait again we'll treat it like a
> > level.  I will look into it later today.
>
> Seems like having that be per-WaitEventSet state is also not a great
> idea --- if we detect PM death while waiting on one WES, and then
> wait on another one, it won't work.  A plain process-wide static
> variable would be a better way I bet.

I don't think that's a problem -- the kernel will report the event to
each interested kqueue object.  The attached fixes the problem for me.
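
To illustrate the edge-versus-level distinction, here is a minimal sketch.
It is not PostgreSQL code: DemoWaitSet, demo_poll_kernel_once() and
demo_wait_reports_death() are hypothetical stand-ins, and the kernel's
one-shot NOTE_EXIT delivery is simulated with a plain flag.  It assumes,
as described above, that each kqueue object gets its own notification, so
the sticky flag can live in the per-set state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a WaitEventSet backed by one kqueue object. */
typedef struct DemoWaitSet
{
	/* Simulates the kernel's one-shot exit notification for this kqueue. */
	bool		kernel_exit_pending;
	/* Sticky flag: once set, every later wait reports postmaster death. */
	bool		report_postmaster_not_running;
} DemoWaitSet;

/*
 * Edge-triggered source: returns true only the first time it is polled
 * after the exit, then never again for this set.
 */
static bool
demo_poll_kernel_once(DemoWaitSet *set)
{
	bool		fired = set->kernel_exit_pending;

	set->kernel_exit_pending = false;
	return fired;
}

/*
 * Level-triggered wrapper: remember the first report so that repeated
 * waits on the same set keep reporting it instead of blocking.
 */
static bool
demo_wait_reports_death(DemoWaitSet *set)
{
	assert(set != NULL);

	/* Already seen on an earlier wait: report it again immediately. */
	if (set->report_postmaster_not_running)
		return true;

	if (demo_poll_kernel_once(set))
	{
		set->report_postmaster_not_running = true;
		return true;
	}

	return false;
}
```

Without the sticky flag, a second call would fall through to the simulated
kernel poll, see nothing, and (in the real code) go to sleep.  A second
DemoWaitSet with its own kernel_exit_pending set behaves the same way
independently, which is why a process-wide variable is not needed in this
model.
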
From 50e4c632385951cd37af746bc8b89d893181819f Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Wed, 14 Oct 2020 20:19:04 +1300
Subject: [PATCH] Make WL_POSTMASTER_DEATH level-triggered on kqueue builds.

If WaitEventSetWait() reports that the postmaster has gone away, later
calls to WaitEventSetWait() should continue to report that.  Otherwise
further waits that occur in the proc_exit() path after we already
noticed the postmaster's demise could block forever.

Back-patch to 13, where the kqueue support landed.

Reported-by: Tom Lane <t...@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3624029.1602701929%40sss.pgh.pa.us
---
 src/backend/storage/ipc/latch.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 63c6c97536..663f7ba7eb 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1492,7 +1492,10 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 		timeout_p = &timeout;
 	}
 
-	/* Report events discovered by WaitEventAdjustKqueue(). */
+	/*
+	 * Report postmaster events discovered by WaitEventAdjustKqueue() or
+	 * earlier calls to WaitEventSetWait().
+	 */
 	if (unlikely(set->report_postmaster_not_running))
 	{
 		if (set->exit_on_postmaster_death)
@@ -1563,6 +1566,13 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 				 cur_kqueue_event->filter == EVFILT_PROC &&
 				 (cur_kqueue_event->fflags & NOTE_EXIT) != 0)
 		{
+			/*
+			 * The kernel will tell this kqueue object only once about the exit
+			 * of the postmaster, so let's remember that for next time so that
+			 * we provide level-triggered semantics.
+			 */
+			set->report_postmaster_not_running = true;
+
 			if (set->exit_on_postmaster_death)
 				proc_exit(1);
 			occurred_events->fd = PGINVALID_SOCKET;
-- 
2.27.0
