On Wed, Jan 12, 2022 at 8:00 PM Alexander Lakhin <exclus...@gmail.com> wrote:
> By the look of things, you are right and this is the localhost-only issue.

But can't that be explained by timing races?  You change some stuff
around and it becomes less likely that a FIN arrives in a super
narrow window, which I'm guessing looks something like: recv ->
EWOULDBLOCK, [receive FIN], wait -> FD_CLOSE, wait [hangs].  Note that
it's not happening on several Windows BF animals, and the ones it is
happening on only do it every few weeks.
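
To spell out the pattern I have in mind, here's a rough sketch (invented
function and variable names, not taken from the failing code path) of two
back-to-back waits through the kind of temporary WaitEventSet that
WaitLatchOrSocket() builds internally; losing the one-shot FD_CLOSE
report between them is what would make the second wait hang:

#include "postgres.h"

#include "miscadmin.h"
#include "storage/latch.h"
#include "utils/wait_event.h"

static void
wait_twice_sketch(pgsocket sock)
{
    char        buf[1024];

    /*
     * Non-blocking socket, nothing buffered yet: recv() fails with
     * WSAEWOULDBLOCK.
     */
    (void) recv(sock, buf, sizeof(buf), 0);

    /* ... the peer's FIN arrives around here ... */

    /* First wait: Windows reports its one-and-only FD_CLOSE. */
    (void) WaitLatchOrSocket(MyLatch, WL_LATCH_SET | WL_SOCKET_READABLE,
                             sock, -1, WAIT_EVENT_CLIENT_READ);

    /*
     * Second wait, with no failed recv() in between: this can block
     * forever, because FD_CLOSE is never reported twice and the fresh
     * WaitEventSet has no memory of the first report.
     */
    (void) WaitLatchOrSocket(MyLatch, WL_LATCH_SET | WL_SOCKET_READABLE,
                             sock, -1, WAIT_EVENT_CLIENT_READ);
}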

Here's a draft attempt at a fix.  First I tried to use recv(fd, &c, 1,
MSG_PEEK) == 0 to detect EOF, which seemed to me to be a reasonable
enough candidate, but somehow it corrupts the stream (!?), so I used
Alexander's POLLHUP idea, except I pushed it down to a more principled
place IMHO.  Then I suppressed it after the initial check, because from
that point the logic from my earlier patch takes over, so stuff like
FeBeWaitSet doesn't suffer from extra calls; only these two paths that
haven't been converted to long-lived WESes yet pay the cost.  Does this
pass the test?

I wonder if this POLLHUP technique is reliable enough (I know that
wouldn't work on other systems[1], which is why I was trying to make
MSG_PEEK work...).
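
For the record, the MSG_PEEK probe I tried looked roughly like this
(standalone sketch, helper name invented for this email):

#include <stdbool.h>
#include <winsock2.h>

/*
 * Return true if the peer has performed an orderly shutdown: recv() with
 * MSG_PEEK reports 0 bytes without consuming anything from the buffer.
 * This is the variant that appeared to corrupt the stream here, so the
 * attached patch uses WSAPoll()/POLLHUP instead.
 */
static bool
socket_is_eof(SOCKET sock)
{
    char        c;

    return recv(sock, &c, 1, MSG_PEEK) == 0;
}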

What about the environment variable PG_TEST_USE_UNIX_SOCKETS=1?  Does
the problem reproduce with that set, and does the patch fix it?  I'm
hoping that explains some Windows CI failures from a nearby thread[2].

[1] 
https://illumos.topicbox.com/groups/developer/T5576767e764aa26a-Maf8f3460c2866513b0ac51bf
[2] 
https://www.postgresql.org/message-id/flat/CALT9ZEG%3DC%3DJSypzt2gz6NoNtx-ew2tYHbwiOfY_xNo%2ByBY_%3Djw%40mail.gmail.com
From a2cf2b016afc03d384a1682cd02aab760273d6fb Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 13 Jan 2022 15:48:14 +1300
Subject: [PATCH] Fix handling of socket FD_CLOSE events on Windows.

In some situations we could hang waiting for an FD_CLOSE event that has
already been reported.  Fix this by adding some state to WaitEvent to
remember that we've already been told the socket is closed.  To handle
client code that creates and destroys multiple temporary WaitEventSet
objects and thus would lose that state, check for EOF explicitly the
first time through.
---
 src/backend/storage/ipc/latch.c | 41 +++++++++++++++++++++++++++++++++
 src/include/storage/latch.h     |  1 +
 2 files changed, 42 insertions(+)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 61c876beff..86b86e71c4 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -899,6 +899,7 @@ AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd, Latch *latch,
 	event->user_data = user_data;
 #ifdef WIN32
 	event->reset = false;
+	event->eof = -1;			/* unknown */
 #endif
 
 	if (events == WL_LATCH_SET)
@@ -1851,6 +1852,45 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			cur_event->reset = false;
 		}
 
+		/*
+		 * Windows does not report FD_CLOSE repeatedly, and does not report
+		 * FD_READ on EOF.  Therefore we need to remember if we've seen
+		 * FD_CLOSE, to defend against callers that wait twice in a row
+		 * without a recv() that fails with WSAEWOULDBLOCK in between.
+		 */
+		if (cur_event->events & WL_SOCKET_READABLE)
+		{
+			/*
+			 * Check for EOF explicitly the first time through, to bridge the
+			 * gap between temporary WaitEventSet objects (such as those
+			 * created by WaitLatchOrSocket()).
+			 */
+			if (cur_event->eof == -1)
+			{
+				WSAPOLLFD	pollfd;
+				int			rc;
+
+				pollfd.fd = cur_event->fd;
+				pollfd.events = POLLRDNORM;
+				pollfd.revents = 0;
+				rc = WSAPoll(&pollfd, 1, 0);
+				if (rc == 1 && pollfd.revents & POLLHUP)
+					cur_event->eof = 1; /* EOF */
+				else
+					cur_event->eof = 0; /* data or error available */
+			}
+
+			/* If we've seen EOF or FD_CLOSE, keep reporting readiness. */
+			if (cur_event->eof == 1)
+			{
+				occurred_events->pos = cur_event->pos;
+				occurred_events->user_data = cur_event->user_data;
+				occurred_events->events = WL_SOCKET_READABLE;
+				occurred_events->fd = cur_event->fd;
+				return 1;
+			}
+		}
+
 		/*
 		 * Windows does not guarantee to log an FD_WRITE network event
 		 * indicating that more data can be sent unless the previous send()
@@ -2002,6 +2042,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 		{
 			/* EOF/error, so signal all caller-requested socket flags */
 			occurred_events->events |= (cur_event->events & WL_SOCKET_MASK);
+			cur_event->eof = 1;
 		}
 
 		if (occurred_events->events != 0)
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 3aa7b33834..41a857c0c3 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -147,6 +147,7 @@ typedef struct WaitEvent
 	void	   *user_data;		/* pointer provided in AddWaitEventToSet */
 #ifdef WIN32
 	bool		reset;			/* Is reset of the event required? */
+	int8		eof;			/* Has EOF been reached on a socket? */
 #endif
 } WaitEvent;
 
-- 
2.33.1
