On Tue, Mar 30, 2021 at 10:00 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> If we want to ship this in v14 we have to make a decision ASAP:
>
> 1.  Ship the POLLHUP patch (like v9) that only works reliably on
> Linux.  Maybe disable the feature completely on other OSes?
> 2.  Ship the patch that tries to read (like v7).  It should work on
> all systems, but it can be fooled by pipelined commands (though it can
> detect a pipelined 'X').
>
> Personally, I lean towards #2.

I changed my mind.  Let's commit the pleasingly simple Linux-only
feature for now, and extend to it to send some kind of no-op message
in a later release.  So this is the version I'd like to go with.
Objections?

I moved the GUC into tcop/postgres.c and tcop/tcopprot.h, because it
directly controls postgres.c's behaviour, not pqcomm.c's.  The latter
only contains the code to perform the check.
From 1ad04e414d66ee23ce0612831648a25902661731 Mon Sep 17 00:00:00 2001
From: Maksim Milyutin <milyuti...@gmail.com>
Date: Fri, 26 Mar 2021 10:18:30 +0300
Subject: [PATCH v10] Detect POLLHUP/POLLRDHUP while running queries.

Provide a new optional GUC check_client_connection_interval that can be
used to check whether the client connection has gone away periodically
while running very long queries.

For now this is Linux-only, because POLLRDHUP is not in POSIX and other
OSes don't have a reliable way to know if a connection was shut down
without actually trying to read or write.  In future we might extend
this to other OSes by trying to send a no-op/heartbeat message, but
that may require protocol changes.

Author: Sergey Cherkashin <s.cherkas...@postgrespro.ru>
Author: Thomas Munro <thomas.mu...@gmail.com>
Reviewed-by: Thomas Munro <thomas.mu...@gmail.com>
Reviewed-by: Tatsuo Ishii <is...@sraoss.co.jp>
Reviewed-by: Konstantin Knizhnik <k.knizh...@postgrespro.ru>
Reviewed-by: Zhihong Yu <z...@yugabyte.com>
Reviewed-by: Andres Freund <and...@anarazel.de>
Reviewed-by: Maksim Milyutin <milyuti...@gmail.com>
Reviewed-by: Tom Lane <t...@sss.pgh.pa.us> (much earlier version)
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
---
 doc/src/sgml/config.sgml                      | 35 ++++++++++++++++++
 src/backend/libpq/pqcomm.c                    | 36 +++++++++++++++++++
 src/backend/tcop/postgres.c                   | 30 ++++++++++++++++
 src/backend/utils/init/globals.c              |  1 +
 src/backend/utils/init/postinit.c             | 10 ++++++
 src/backend/utils/misc/guc.c                  | 26 ++++++++++++++
 src/backend/utils/misc/postgresql.conf.sample |  3 ++
 src/include/libpq/libpq.h                     |  1 +
 src/include/miscadmin.h                       |  1 +
 src/include/tcop/tcopprot.h                   |  1 +
 src/include/utils/timeout.h                   |  1 +
 11 files changed, 145 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d1e2e8c4c3..bc1a42fa91 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -998,6 +998,41 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-client-connection-check-interval" xreflabel="client_connection_check_interval">
+      <term><varname>client_connection_check_interval</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>client_connection_check_interval</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the time interval between checks that the client is still
+        connected, while running queries.  The check is performed by polling
+        the socket to allow long running queries to be aborted immediately.
+        This option is currently available only on Linux, because it uses the
+        <symbol>POLLRDHUP</symbol> extension to the <symbol>poll</symbol>
+        system call.
+       </para>
+       <para>
+        If the value is specified without units, it is taken as milliseconds.
+        The default value is <literal>0</literal>, which disables connection
+        checks.  Without connection checks, the server will detect the loss of
+        the connection only at the next interaction with the socket, while
+        waiting for or receiving a new request, or sending a response.
+       </para>
+       <para>
+        For the kernel itself to detect lost TCP connections reliably and within
+        a known timeframe in all scenarios including network failure, it may
+        also be necessary to adjust the default TCP keepalive settings of the
+        operating system, or the
+        <xref linkend="guc-tcp-keepalives-idle"/>,
+        <xref linkend="guc-tcp-keepalives-interval"/> and
+        <xref linkend="guc-tcp-keepalives-count"/> settings of
+        <productname>PostgreSQL</productname>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
      </sect2>
 
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 4c7b1e7bfd..c4a67a27d2 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -54,6 +54,9 @@
  */
 #include "postgres.h"
 
+#ifdef HAVE_POLL_H
+#include <poll.h>
+#endif
 #include <signal.h>
 #include <fcntl.h>
 #include <grp.h>
@@ -1921,3 +1924,36 @@ pq_settcpusertimeout(int timeout, Port *port)
 
 	return STATUS_OK;
 }
+
+/*
+ * Check if the client is still connected.
+ */
+bool
+pq_check_connection(void)
+{
+#if defined(POLLRDHUP)
+	/*
+	 * POLLRDHUP is a Linux extension to poll(2) to detect sockets closed by
+	 * the other end.  We don't have a portable way to do that without actually
+	 * trying to read or write data on other systems.  We don't want to read
+	 * because that would be confused by buffered pipelined queries and COPY
+	 * data.  Perhaps in future we'll try to write a heartbeat message instead.
+	 */
+	struct pollfd pollfd;
+	int			rc;
+
+	pollfd.fd = MyProcPort->sock;
+	pollfd.events = POLLOUT | POLLIN | POLLRDHUP;
+	pollfd.revents = 0;
+	rc = poll(&pollfd, 1, 0);
+	if (rc < 0)
+	{
+		elog(COMMERROR, "could not poll socket: %m");
+		return false;
+	}
+	else if (rc == 1 && (pollfd.revents & (POLLHUP | POLLRDHUP)))
+		return false;
+#endif
+
+	return true;
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2b1b68109f..481c697072 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -102,6 +102,9 @@ int			max_stack_depth = 100;
 /* wait N seconds to allow attach from a debugger */
 int			PostAuthDelay = 0;
 
+/* Time between checks that the client is still connected. */
+int			client_connection_check_interval = 0;
+
 /* ----------------
  *		private typedefs etc
  * ----------------
@@ -2671,6 +2674,14 @@ start_xact_command(void)
 	 * not desired, the timeout has to be disabled explicitly.
 	 */
 	enable_statement_timeout();
+
+	/* Start timeout for checking if the client has gone away if necessary. */
+	if (client_connection_check_interval > 0 &&
+		IsUnderPostmaster &&
+		MyProcPort &&
+		!get_timeout_active(CLIENT_CONNECTION_CHECK_TIMEOUT))
+		enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
+							 client_connection_check_interval);
 }
 
 static void
@@ -3149,6 +3160,25 @@ ProcessInterrupts(void)
 					(errcode(ERRCODE_ADMIN_SHUTDOWN),
 					 errmsg("terminating connection due to administrator command")));
 	}
+	if (CheckClientConnectionPending)
+	{
+		CheckClientConnectionPending = false;
+
+		/*
+		 * Check for lost connection and re-arm, if still configured, but not
+		 * if we've arrived back at DoingCommandRead state.  We don't want to
+		 * wake up idle sessions, and they already know how to detect lost
+		 * connections.
+		 */
+		if (!DoingCommandRead && client_connection_check_interval > 0)
+		{
+			if (!pq_check_connection())
+				ClientConnectionLost = true;
+			else
+				enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
+									 client_connection_check_interval);
+		}
+	}
 	if (ClientConnectionLost)
 	{
 		QueryCancelPending = false; /* lost connection trumps QueryCancel */
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 73e0a672ae..a9f0fc3017 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -30,6 +30,7 @@ ProtocolVersion FrontendProtocol;
 volatile sig_atomic_t InterruptPending = false;
 volatile sig_atomic_t QueryCancelPending = false;
 volatile sig_atomic_t ProcDiePending = false;
+volatile sig_atomic_t CheckClientConnectionPending = false;
 volatile sig_atomic_t ClientConnectionLost = false;
 volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false;
 volatile sig_atomic_t IdleSessionTimeoutPending = false;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 7abeccb536..a3ec358538 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -73,6 +73,7 @@ static void StatementTimeoutHandler(void);
 static void LockTimeoutHandler(void);
 static void IdleInTransactionSessionTimeoutHandler(void);
 static void IdleSessionTimeoutHandler(void);
+static void ClientCheckTimeoutHandler(void);
 static bool ThereIsAtLeastOneRole(void);
 static void process_startup_options(Port *port, bool am_superuser);
 static void process_settings(Oid databaseid, Oid roleid);
@@ -620,6 +621,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 		RegisterTimeout(IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
 						IdleInTransactionSessionTimeoutHandler);
 		RegisterTimeout(IDLE_SESSION_TIMEOUT, IdleSessionTimeoutHandler);
+		RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
 	}
 
 	/*
@@ -1242,6 +1244,14 @@ IdleSessionTimeoutHandler(void)
 	SetLatch(MyLatch);
 }
 
+static void
+ClientCheckTimeoutHandler(void)
+{
+	CheckClientConnectionPending = true;
+	InterruptPending = true;
+	SetLatch(MyLatch);
+}
+
 /*
  * Returns true if at least one role is defined in this database cluster.
  */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 03daec9a08..ebebee3915 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -20,6 +20,7 @@
 #include <float.h>
 #include <math.h>
 #include <limits.h>
+#include <poll.h>
 #ifndef WIN32
 #include <sys/mman.h>
 #endif
@@ -204,6 +205,7 @@ static bool check_autovacuum_work_mem(int *newval, void **extra, GucSource sourc
 static bool check_effective_io_concurrency(int *newval, void **extra, GucSource source);
 static bool check_maintenance_io_concurrency(int *newval, void **extra, GucSource source);
 static bool check_huge_page_size(int *newval, void **extra, GucSource source);
+static bool check_client_connection_check_interval(int *newval, void **extra, GucSource source);
 static void assign_pgstat_temp_directory(const char *newval, void *extra);
 static bool check_application_name(char **newval, void **extra, GucSource source);
 static void assign_application_name(const char *newval, void *extra);
@@ -3491,6 +3493,16 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"client_connection_check_interval", PGC_USERSET, CLIENT_CONN_OTHER,
+			gettext_noop("Sets the time interval between checks for disconnection while running queries."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&client_connection_check_interval,
+		0, 0, INT_MAX,
+		check_client_connection_check_interval, NULL, NULL
+	},
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -11970,6 +11982,20 @@ check_huge_page_size(int *newval, void **extra, GucSource source)
 	return true;
 }
 
+static bool
+check_client_connection_check_interval(int *newval, void **extra, GucSource source)
+{
+#ifndef POLLRDHUP
+	/* Linux only, for now.  See pq_check_connection(). */
+	if (*newval != 0)
+	{
+		GUC_check_errdetail("client_connection_check_interval must be set to 0 on platforms that lack POLLRDHUP.");
+		return false;
+	}
+#endif
+	return true;
+}
+
 static void
 assign_pgstat_temp_directory(const char *newval, void *extra)
 {
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 791d39cf07..a799c0faa1 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -718,6 +718,9 @@
 
 #dynamic_library_path = '$libdir'
 
+#client_connection_check_interval = 0	# time between checks for client
+					# disconnection while running queries;
+					# 0 for never
 
 #------------------------------------------------------------------------------
 # LOCK MANAGEMENT
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index b20deeb555..3ebbc8d665 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -71,6 +71,7 @@ extern int	pq_getbyte(void);
 extern int	pq_peekbyte(void);
 extern int	pq_getbyte_if_available(unsigned char *c);
 extern int	pq_putmessage_v2(char msgtype, const char *s, size_t len);
+extern bool pq_check_connection(void);
 
 /*
  * prototypes for functions in be-secure.c
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 013850ac28..6f8251e0b0 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -85,6 +85,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleInTransactionSessionTimeoutPending;
 extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
 extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
 
+extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
 extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
 
 /* these are marked volatile because they are examined by signal handlers: */
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index e5472100a4..241e7c9961 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -29,6 +29,7 @@ extern CommandDest whereToSendOutput;
 extern PGDLLIMPORT const char *debug_query_string;
 extern int	max_stack_depth;
 extern int	PostAuthDelay;
+extern int	client_connection_check_interval;
 
 /* GUC-configurable parameters */
 
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index ecb2a366a5..93e6a691b3 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -32,6 +32,7 @@ typedef enum TimeoutId
 	STANDBY_LOCK_TIMEOUT,
 	IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
 	IDLE_SESSION_TIMEOUT,
+	CLIENT_CONNECTION_CHECK_TIMEOUT,
 	/* First user-definable timeout reason */
 	USER_TIMEOUT,
 	/* Maximum number of timeout reasons */
-- 
2.30.2

Reply via email to