On Fri, Apr 9, 2021 at 6:11 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Wed, Apr 7, 2021 at 7:31 PM Robins Tharakan <thara...@gmail.com> wrote:
> > Correct. This is easily reproducible on this test-instance, so let me know 
> > if you want me to test a patch.
>
> From your description it sounds like signals are not arriving at all,
> rather than some more complicated race.  Let's go back to basics...

I was looking into the portability of SIGURG and OOB socket data for
something totally different (hallway track discussion from PGCon,
could we use that for query cancel, like FTP does, instead of opening
another socket?), and lo and behold, someone has figured out a
workaround for this latch problem:

https://github.com/microsoft/WSL/issues/8619

I don't really want to add code to scrape uname() output to detect
different kernels at runtime as shown there, but it doesn't seem to
make a difference on Linux if we just always do what was suggested.  I
didn't look too hard into whether that is the right place to put the
call, or really understand *why* it works.  Since I am not a Windows
user and we don't have WSL1 CI, I can't confirm that it works, or
explore whether some other ordering of operations would be better but
still work.  But if that does the trick, then maybe we should just do
something like the attached.

Thoughts?
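For readers unfamiliar with the mechanism, here is a minimal standalone sketch (Linux-only, assuming glibc's <sys/signalfd.h>) of the pattern the patch touches: block SIGURG, consume it through a signalfd, and explicitly reset its disposition to SIG_DFL as the WSL issue suggests. The function name sigurg_via_signalfd() is hypothetical and not from latch.c, and it calls sigaction() directly rather than PostgreSQL's pqsignal(). It is not known whether this small program alone reproduces the WSL1 failure.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/signalfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: raise SIGURG and read it back through a signalfd.
 * Returns the signal number read from the fd, or -1 on error.
 */
int
sigurg_via_signalfd(void)
{
	sigset_t	mask;
	struct sigaction act;
	struct signalfd_siginfo info;
	ssize_t		n;
	int			fd;

	/* Block SIGURG so it stays pending and can be consumed via the fd. */
	sigemptyset(&mask);
	sigaddset(&mask, SIGURG);
	if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
		return -1;

	fd = signalfd(-1, &mask, SFD_CLOEXEC);
	if (fd < 0)
		return -1;

	/*
	 * The workaround: explicitly install the default disposition.  On Linux
	 * this should change nothing (SIGURG's default action is to ignore), but
	 * per the WSL bug report it reportedly makes signalfd() work on WSL1.
	 */
	memset(&act, 0, sizeof(act));
	act.sa_handler = SIG_DFL;
	sigemptyset(&act.sa_mask);
	if (sigaction(SIGURG, &act, NULL) < 0)
	{
		close(fd);
		return -1;
	}

	/* Blocked signals are kept pending, even if their default is "ignore". */
	if (raise(SIGURG) != 0)
	{
		close(fd);
		return -1;
	}

	/* Each read() returns one struct signalfd_siginfo per pending signal. */
	n = read(fd, &info, sizeof(info));
	close(fd);
	if (n != (ssize_t) sizeof(info))
		return -1;
	return (int) info.ssi_signo;
}
```

On a regular Linux kernel, calling sigurg_via_signalfd() should return SIGURG with or without the sigaction() call; the reported WSL1 behavior is that the signal never shows up on the fd unless the disposition is reset.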
From 7c077ff2f0d922a2d13948db8d6c01eb791c83c6 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Sun, 30 Jul 2023 10:43:27 +1200
Subject: [PATCH] Work around signalfd() oddity on WSL1.

It's not clear why setting SIGURG's disposition to SIG_DFL (which should
be no change) affects the behavior of WSL1 (one of the ways that Windows
can run Linux programs).  It's harmless to do that on Linux, so let's add
that workaround to help WSL1 users.  Based on research done by Elvis
Pranskevichus in a WSL bug report:

https://github.com/microsoft/WSL/issues/8619

Discussion: https://postgr.es/m/CAEP4nAymAZP1VEBNoWAQca85ZtU5YxuwS95%2BVu%2BXW%2B-eMfq_vQ%40mail.gmail.com

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index db889385b7..06dc7c503a 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -307,6 +307,12 @@ InitializeLatchSupport(void)
 	if (signal_fd < 0)
 		elog(FATAL, "signalfd() failed");
 	ReserveExternalFD();
+
+	/*
+	 * Setting this to its default disposition (ignored) appears to be
+	 * redundant, but seems to fix a problem with signalfd() on WSL1 kernels.
+	 */
+	pqsignal(SIGURG, SIG_DFL);
 #endif
 
 #ifdef WAIT_USE_KQUEUE
-- 
2.39.2
