Hi Thomas! Thanks for working on this patch.
I have attached a new version with some typo corrections of doc entry,
removing of redundant `include` entries and trailing whitespaces. Also I
added in doc the case when single query transaction with disconnected
client might be eventually commited upon completion in autocommit mode
if it doesn't return any rows (doesn't communicate with user) before
sending final commit message. This behavior might be unexpected for
clients and hence IMO it's worth noticing.
> diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
> index 4c7b1e7bfd..8cf95d09a4 100644
> --- a/src/backend/libpq/pqcomm.c
> +++ b/src/backend/libpq/pqcomm.c
>
> @@ -919,13 +923,13 @@ socket_set_nonblocking(bool nonblocking)
> }
>
> /* --------------------------------
> - * pq_recvbuf - load some bytes into the input buffer
> + * pq_recvbuf_ext - load some bytes into the input buffer
> *
> * returns 0 if OK, EOF if trouble
> * --------------------------------
> */
> static int
> -pq_recvbuf(void)
> +pq_recvbuf_ext(bool nonblocking)
> {
> if (PqRecvPointer > 0)
> {
> @@ -941,8 +945,7 @@ pq_recvbuf(void)
> PqRecvLength = PqRecvPointer = 0;
> }
>
> - /* Ensure that we're in blocking mode */
> - socket_set_nonblocking(false);
> + socket_set_nonblocking(nonblocking);
>
> /* Can fill buffer from PqRecvLength and upwards */
> for (;;)
> @@ -954,6 +957,9 @@ pq_recvbuf(void)
>
> if (r < 0)
> {
> + if (nonblocking && (errno == EAGAIN || errno ==
EWOULDBLOCK))
> + return 0;
> +
> if (errno == EINTR)
> continue; /* Ok if interrupted */
>
> @@ -981,6 +987,13 @@ pq_recvbuf(void)
> }
> }
>
> +static int
> +pq_recvbuf(void)
> +{
> + return pq_recvbuf_ext(false);
> +}
> +
> +
AFAICS, the above fragment is not related with primary fix directly.
AFAICS, there are the following open items that don't allow to treat the
current patch completed:
* Absence of tap tests emulating some scenarios of client disconnection.
IIRC, you wanted to rewrite the test case posted by Sergey.
* Concerns about portability of `pq_check_connection()`A implementation.
BTW, on windows postgres with this patch have not been built [1].
* Absence of benchmark results to show lack of noticeable performance
regression after applying non-zero timeout for checking client liveness.
1.
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.131820
--
Regards,
Maksim Milyutin
>From 3ec788ac5e7c47fe135a3618849db179942f4b27 Mon Sep 17 00:00:00 2001
From: Maksim Milyutin <milyuti...@gmail.com>
Date: Fri, 26 Mar 2021 10:18:30 +0300
Subject: [PATCH v9] Detect POLLHUP while running queries
Provide a new optional GUC that can be used to check whether the client
connection has gone away periodically while running very long queries.
Author: Sergey Cherkashin <s.cherkas...@postgrespro.ru>
Author: Thomas Munro <thomas.mu...@gmail.com>
Reviewed-by: Thomas Munro <thomas.mu...@gmail.com>
Reviewed-by: Tatsuo Ishii <is...@sraoss.co.jp>
Reviewed-by: Konstantin Knizhnik <k.knizh...@postgrespro.ru>
Reviewed-by: Zhihong Yu <z...@yugabyte.com>
Reviewed-by: Andres Freund <and...@anarazel.de>
Reviewed-by: Maksim Milyutin <milyuti...@gmail.com>
Reviewed-by: Tom Lane <t...@sss.pgh.pa.us> (much earlier version)
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
---
doc/src/sgml/config.sgml | 36 +++++++++++
src/backend/libpq/pqcomm.c | 63 +++++++++++++++++--
src/backend/tcop/postgres.c | 27 ++++++++
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 10 +++
src/backend/utils/misc/guc.c | 10 +++
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/include/libpq/libpq.h | 2 +
src/include/miscadmin.h | 1 +
src/include/utils/timeout.h | 1 +
10 files changed, 150 insertions(+), 4 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ddc6d789d8..abec47ada7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -998,6 +998,42 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-client-connection-check-interval" xreflabel="client_connection_check_interval">
+ <term><varname>client_connection_check_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>client_connection_check_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the time interval between checks that the client is still
+ connected, while running queries. The check is performed by polling the
+ socket to allows long running queries to be aborted immediately.
+ </para>
+ <para>
+ If this value is specified without units, it is taken as milliseconds.
+ The default value is <literal>0</literal>, which disables connection
+ checks. Without connection checks, the server will detect the loss of
+ the connection only after next interaction with client through
+ connection socket: when it starts to wait for a new request, receives a
+ request or sends a response. As a consequence, some modificatory single
+ query transaction in autocommit mode that doesn't return any rows
+ becomes commited somewhen later silently and unexpectedly for previously
+ disconnected client.
+ </para>
+ <para>
+ For the kernel itself to detect lost TCP connections reliably and within
+ a known timeframe in all scenarios including network failure, it may
+ also be necessary to adjust the default TCP keepalive settings of the
+ operating system, or the
+ <xref linkend="guc-tcp-keepalives-idle"/>,
+ <xref linkend="guc-tcp-keepalives-interval"/> and
+ <xref linkend="guc-tcp-keepalives-count"/> settings of
+ <productname>PostgreSQL</productname>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 4c7b1e7bfd..8cf95d09a4 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -59,6 +59,9 @@
#include <grp.h>
#include <unistd.h>
#include <sys/file.h>
+#ifdef HAVE_POLL_H
+#include <sys/poll.h>
+#endif
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/time.h>
@@ -104,6 +107,7 @@
*/
int Unix_socket_permissions;
char *Unix_socket_group;
+int client_connection_check_interval;
/* Where the Unix socket files are (list of palloc'd strings) */
static List *sock_paths = NIL;
@@ -919,13 +923,13 @@ socket_set_nonblocking(bool nonblocking)
}
/* --------------------------------
- * pq_recvbuf - load some bytes into the input buffer
+ * pq_recvbuf_ext - load some bytes into the input buffer
*
* returns 0 if OK, EOF if trouble
* --------------------------------
*/
static int
-pq_recvbuf(void)
+pq_recvbuf_ext(bool nonblocking)
{
if (PqRecvPointer > 0)
{
@@ -941,8 +945,7 @@ pq_recvbuf(void)
PqRecvLength = PqRecvPointer = 0;
}
- /* Ensure that we're in blocking mode */
- socket_set_nonblocking(false);
+ socket_set_nonblocking(nonblocking);
/* Can fill buffer from PqRecvLength and upwards */
for (;;)
@@ -954,6 +957,9 @@ pq_recvbuf(void)
if (r < 0)
{
+ if (nonblocking && (errno == EAGAIN || errno == EWOULDBLOCK))
+ return 0;
+
if (errno == EINTR)
continue; /* Ok if interrupted */
@@ -981,6 +987,13 @@ pq_recvbuf(void)
}
}
+static int
+pq_recvbuf(void)
+{
+ return pq_recvbuf_ext(false);
+}
+
+
/* --------------------------------
* pq_getbyte - get a single byte from connection, or return EOF
* --------------------------------
@@ -1921,3 +1934,45 @@ pq_settcpusertimeout(int timeout, Port *port)
return STATUS_OK;
}
+
+/*
+ * POLLRDHUP is a Linux extension to poll(2). Unlike POLLHUP, it has to be
+ * requested explicitly. It detects half-shutdown sockets, which are not
+ * always reported as POLLHUP on Linux, depending on the type of socket. We'll
+ * look out for both.
+ */
+#ifdef POLLRDHUP
+#define PG_POLLRDHUP POLLRDHUP
+#else
+#define PG_POLLRDHUP 0
+#endif
+
+/*
+ * Check if the client is still connected.
+ */
+bool
+pq_check_connection(void)
+{
+#if defined(HAVE_POLL) || defined(WIN32)
+ struct pollfd pollfd;
+ int rc;
+
+ pollfd.fd = MyProcPort->sock;
+ pollfd.events = POLLOUT | POLLIN | PG_POLLRDHUP;
+ pollfd.revents = 0;
+#ifdef WIN32
+ rc = WSAPoll(&pollfd, 1, 0);
+#else
+ rc = poll(&pollfd, 1, 0);
+#endif
+ if (rc < 0)
+ {
+ elog(COMMERROR, "could not poll socket: %m");
+ return false;
+ }
+ else if (rc == 1 && (pollfd.revents & (POLLHUP | PG_POLLRDHUP)))
+ return false;
+#endif
+
+ return true;
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2b1b68109f..b04573cd43 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2671,6 +2671,14 @@ start_xact_command(void)
* not desired, the timeout has to be disabled explicitly.
*/
enable_statement_timeout();
+
+ /* Start timeout for checking if the client has gone away if necessary. */
+ if (client_connection_check_interval > 0 &&
+ IsUnderPostmaster &&
+ MyProcPort &&
+ !get_timeout_active(CLIENT_CONNECTION_CHECK_TIMEOUT))
+ enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
+ client_connection_check_interval);
}
static void
@@ -3149,6 +3157,25 @@ ProcessInterrupts(void)
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating connection due to administrator command")));
}
+ if (CheckClientConnectionPending)
+ {
+ CheckClientConnectionPending = false;
+
+ /*
+ * Check for lost connection and re-arm, if still configured, but not
+ * if we've arrived back at DoingCommandRead state. We don't want to
+ * wake up idle sessions, and they already know how to detect lost
+ * connections.
+ */
+ if (!DoingCommandRead && client_connection_check_interval > 0)
+ {
+ if (!pq_check_connection())
+ ClientConnectionLost = true;
+ else
+ enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
+ client_connection_check_interval);
+ }
+ }
if (ClientConnectionLost)
{
QueryCancelPending = false; /* lost connection trumps QueryCancel */
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 73e0a672ae..a9f0fc3017 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -30,6 +30,7 @@ ProtocolVersion FrontendProtocol;
volatile sig_atomic_t InterruptPending = false;
volatile sig_atomic_t QueryCancelPending = false;
volatile sig_atomic_t ProcDiePending = false;
+volatile sig_atomic_t CheckClientConnectionPending = false;
volatile sig_atomic_t ClientConnectionLost = false;
volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 7abeccb536..a3ec358538 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -73,6 +73,7 @@ static void StatementTimeoutHandler(void);
static void LockTimeoutHandler(void);
static void IdleInTransactionSessionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
+static void ClientCheckTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -620,6 +621,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
RegisterTimeout(IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
IdleInTransactionSessionTimeoutHandler);
RegisterTimeout(IDLE_SESSION_TIMEOUT, IdleSessionTimeoutHandler);
+ RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
}
/*
@@ -1242,6 +1244,14 @@ IdleSessionTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ClientCheckTimeoutHandler(void)
+{
+ CheckClientConnectionPending = true;
+ InterruptPending = true;
+ SetLatch(MyLatch);
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0c5dc4d3e8..9f67e8140e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3481,6 +3481,16 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"client_connection_check_interval", PGC_USERSET, CLIENT_CONN_OTHER,
+ gettext_noop("Sets the time interval for checking connection with the client."),
+ gettext_noop("A value of 0 disables this feature."),
+ GUC_UNIT_MS
+ },
+ &client_connection_check_interval,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b234a6bfe6..9691cb86f1 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -717,6 +717,9 @@
#dynamic_library_path = '$libdir'
+#client_connection_check_interval = 0 # set time interval between
+ # checks for client disconnection while
+ # running long queries; 0 for never
#------------------------------------------------------------------------------
# LOCK MANAGEMENT
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index b20deeb555..fac233c9f4 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -71,6 +71,7 @@ extern int pq_getbyte(void);
extern int pq_peekbyte(void);
extern int pq_getbyte_if_available(unsigned char *c);
extern int pq_putmessage_v2(char msgtype, const char *s, size_t len);
+extern bool pq_check_connection(void);
/*
* prototypes for functions in be-secure.c
@@ -109,6 +110,7 @@ extern ssize_t secure_open_gssapi(Port *port);
extern char *SSLCipherSuites;
extern char *SSLECDHCurve;
extern bool SSLPreferServerCiphers;
+extern int client_connection_check_interval;
extern int ssl_min_protocol_version;
extern int ssl_max_protocol_version;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 013850ac28..6f8251e0b0 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -85,6 +85,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleInTransactionSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
+extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
/* these are marked volatile because they are examined by signal handlers: */
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index ecb2a366a5..93e6a691b3 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -32,6 +32,7 @@ typedef enum TimeoutId
STANDBY_LOCK_TIMEOUT,
IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
IDLE_SESSION_TIMEOUT,
+ CLIENT_CONNECTION_CHECK_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
--
2.25.1