On Tue, Mar 14, 2023 at 11:20 AM Andres Freund <and...@anarazel.de> wrote:
> On windows it looks like pids can't be reused as long as there are handles for
> the process. Unfortunately, we close the handle for the process in
> pgwin32_deadchild_callback(), which runs in a separate thread, so the pid can
> be reused before we get to waitpid(). And thus it can happen while we start
> new children.

Ahhh.  Right, of course.  The handle thing makes total sense now that
you point it out, and although I couldn't find it in the fine manual,
a higher authority has it in black and white[1].  Even without knowing
which of those calls is releasing the process table entry, we're doing
all of them on the wrong side of that IOCP.  Alright, here is a patch
to schlep most of that code over into waitpid(), where it belongs.

[1] https://devblogs.microsoft.com/oldnewthing/20110107-00/?p=11803
From cd83520c7d9ca214c98b9c0d4c74c9371f5555b0 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Tue, 14 Mar 2023 11:53:26 +1300
Subject: [PATCH] Fix waitpid() emulation on Windows.

Unix doesn't allow a PID to be recycled for a new process until after
the earlier process is "reaped" by waitpid() (or similar functions). Our
emulation didn't guarantee that, and the postmaster could finish up
tracking multiple children with the same PID, and get very confused.

Fix, by moving the guts of pgwin32_deadchild_callback() into waitpid(),
so that resources are released synchronously.  The process and PID
continue to exist until we close the process handle, when we're ready to
adjust our own bookkeeping of running children.

This seems to explain a couple of failures on CI.  It had never been
reported before, despite the code being as old as the Windows port.
Perhaps Windows started recycling PIDs more rapidly, or perhaps timing
changes due to commit 7389aad6 made it more likely to break.

Diagnosed-by: Andres Freund <and...@anarazel.de>
Discussion: https://postgr.es/m/20230208012852.bvkn2am4h4iqjogq%40awork3.anarazel.de
---
 src/backend/postmaster/postmaster.c | 65 ++++++++++++++++-------------
 1 file changed, 36 insertions(+), 29 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 2552327d90..3da18ab6ac 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -6421,36 +6421,24 @@ ShmemBackendArrayRemove(Backend *bn)
 static pid_t
 waitpid(pid_t pid, int *exitstatus, int options)
 {
+	win32_deadchild_waitinfo *childinfo;
+	DWORD		exitcode;
 	DWORD		dwd;
 	ULONG_PTR	key;
 	OVERLAPPED *ovl;
 
-	/*
-	 * Check if there are any dead children. If there are, return the pid of
-	 * the first one that died.
-	 */
-	if (GetQueuedCompletionStatus(win32ChildQueue, &dwd, &key, &ovl, 0))
+	Assert(pid == -1);
+	Assert(options == WNOHANG);
+
+	/* Try to consume one win32_deadchild_waitinfo from the queue. */
+	if (!GetQueuedCompletionStatus(win32ChildQueue, &dwd, &key, &ovl, 0))
 	{
-		*exitstatus = (int) key;
-		return dwd;
+		errno = EAGAIN;
+		return -1;
 	}
 
-	return -1;
-}
-
-/*
- * Note! Code below executes on a thread pool! All operations must
- * be thread safe! Note that elog() and friends must *not* be used.
- */
-static void WINAPI
-pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
-{
-	win32_deadchild_waitinfo *childinfo = (win32_deadchild_waitinfo *) lpParameter;
-	DWORD		exitcode;
-
-	if (TimerOrWaitFired)
-		return;					/* timeout. Should never happen, since we use
-								 * INFINITE as timeout value. */
+	childinfo = (win32_deadchild_waitinfo *) key;
+	pid = childinfo->procId;
 
 	/*
 	 * Remove handle from wait - required even though it's set to wait only
@@ -6466,13 +6454,11 @@ pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
 		write_stderr("could not read exit code for process\n");
 		exitcode = 255;
 	}
-
-	if (!PostQueuedCompletionStatus(win32ChildQueue, childinfo->procId, (ULONG_PTR) exitcode, NULL))
-		write_stderr("could not post child completion status\n");
+	*exitstatus = exitcode;
 
 	/*
-	 * Handle is per-process, so we close it here instead of in the
-	 * originating thread
+	 * Close the process handle.  Only after this point can the PID can be
+	 * recycled by the kernel.
 	 */
 	CloseHandle(childinfo->procHandle);
 
@@ -6482,7 +6468,28 @@ pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
 	 */
 	free(childinfo);
 
-	/* Queue SIGCHLD signal */
+	return pid;
+}
+
+/*
+ * Note! Code below executes on a thread pool! All operations must
+ * be thread safe! Note that elog() and friends must *not* be used.
+ */
+static void WINAPI
+pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
+{
+	/* Should never happen, since we use INFINITE as timeout value. */
+	if (!TimerOrWaitFired)
+		return;
+
+	/* Post the win32_deadchild_waitinfo object for waitpid() to deal with. */
+	if (!PostQueuedCompletionStatus(win32ChildQueue,
+									0,
+									(ULONG_PTR) lpParameter,
+									NULL))
+		write_stderr("could not post child completion status\n");
+
+	/* Queue SIGCHLD signal. */
 	pg_queue_signal(SIGCHLD);
 }
 #endif							/* WIN32 */
-- 
2.39.1

Reply via email to