While self-reviewing my "Refactoring backend fork+exec code" patches, I
noticed this in pq_init():
/*
* In backends (as soon as forked) we operate the underlying socket in
* nonblocking mode and use latches to implement blocking semantics if
* needed. That allows us to provide safely interruptible reads and
* writes.
*
* Use COMMERROR on failure, because ERROR would try to send the error
to
* the client, which might require changing the mode again, leading to
* infinite recursion.
*/
#ifndef WIN32
if (!pg_set_noblock(MyProcPort->sock))
ereport(COMMERROR,
(errmsg("could not set socket to nonblocking mode:
%m")));
#endif
#ifndef WIN32
/* Don't give the socket to any subprograms we execute. */
if (fcntl(MyProcPort->sock, F_SETFD, FD_CLOEXEC) < 0)
elog(FATAL, "fcntl(F_SETFD) failed on socket: %m");
#endif
Using COMMERROR here seems bogus. Firstly, if there was a problem with
recursion, surely the elog(FATAL) that follows would also be wrong. But
more seriously, it's not cool to continue using the connection as if
everything is OK, if we fail to put it into non-blocking mode. We should
disconnect. (COMMERROR merely logs the error, it does not bail out like
ERROR does)
Barring objections, I'll commit and backpatch the attached to fix that.
--
Heikki Linnakangas
Neon (https://neon.tech)
From fc737586783a05672ff8eb96e245dfdeeadcd506 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Mon, 11 Mar 2024 16:36:37 +0200
Subject: [PATCH 1/1] Disconnect if socket cannot be put into non-blocking mode
Commit 387da18874 moved the code to put socket into non-blocking mode
from socket_set_nonblocking() into the one-time initialization
function, pg_init(). In socket_set_nonblocking(), there indeed was a
risk of recursion on failure like the comment said, but in pg_init(),
ERROR or FATAL is fine. There's even another elog(FATAL) just after
this, if setting FD_CLOEXEC fails.
Note that COMMERROR merely logged the error, it did not close the
connection, so if putting the socket to non-blocking mode failed we
would use the connection anyway. You might not immediately notice,
because most socket operations in a regular backend wait for the
socket to become readable/writable anyway. But e.g. replication will
be quite broken.
Backpatch to all supported versions.
Discussion: xx
---
src/backend/libpq/pqcomm.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index c606bf34473..d9e49ca7028 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -189,14 +189,10 @@ pq_init(void)
* nonblocking mode and use latches to implement blocking semantics if
* needed. That allows us to provide safely interruptible reads and
* writes.
- *
- * Use COMMERROR on failure, because ERROR would try to send the error to
- * the client, which might require changing the mode again, leading to
- * infinite recursion.
*/
#ifndef WIN32
if (!pg_set_noblock(MyProcPort->sock))
- ereport(COMMERROR,
+ ereport(FATAL,
(errmsg("could not set socket to nonblocking mode: %m")));
#endif
--
2.39.2