On Tue, Mar 30, 2021 at 10:00 AM Thomas Munro <thomas.mu...@gmail.com> wrote: > If we want to ship this in v14 we have to make a decision ASAP: > > 1. Ship the POLLHUP patch (like v9) that only works reliably on > Linux. Maybe disable the feature completely on other OSes? > 2. Ship the patch that tries to read (like v7). It should work on > all systems, but it can be fooled by pipelined commands (though it can > detect a pipelined 'X'). > > Personally, I lean towards #2.
I changed my mind. Let's commit the pleasingly simple Linux-only feature for now, and extend to it to send some kind of no-op message in a later release. So this is the version I'd like to go with. Objections? I moved the GUC into tcop/postgres.c and tcop/tcopprot.h, because it directly controls postgres.c's behaviour, not pqcomm.c's. The latter only contains the code to perform the check.
From 1ad04e414d66ee23ce0612831648a25902661731 Mon Sep 17 00:00:00 2001 From: Maksim Milyutin <milyuti...@gmail.com> Date: Fri, 26 Mar 2021 10:18:30 +0300 Subject: [PATCH v10] Detect POLLHUP/POLLRDHUP while running queries. Provide a new optional GUC check_client_connection_interval that can be used to check whether the client connection has gone away periodically while running very long queries. For now this is Linux-only, because POLLRDHUP is not in POSIX and other OSes don't have a reliable way to know if a connection was shut down without actually trying to read or write. In future we might extend this to other OSes by trying to send a no-op/heartbeat message, but that may require protocol changes. Author: Sergey Cherkashin <s.cherkas...@postgrespro.ru> Author: Thomas Munro <thomas.mu...@gmail.com> Reviewed-by: Thomas Munro <thomas.mu...@gmail.com> Reviewed-by: Tatsuo Ishii <is...@sraoss.co.jp> Reviewed-by: Konstantin Knizhnik <k.knizh...@postgrespro.ru> Reviewed-by: Zhihong Yu <z...@yugabyte.com> Reviewed-by: Andres Freund <and...@anarazel.de> Reviewed-by: Maksim Milyutin <milyuti...@gmail.com> Reviewed-by: Tom Lane <t...@sss.pgh.pa.us> (much earlier version) Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru --- doc/src/sgml/config.sgml | 35 ++++++++++++++++++ src/backend/libpq/pqcomm.c | 36 +++++++++++++++++++ src/backend/tcop/postgres.c | 30 ++++++++++++++++ src/backend/utils/init/globals.c | 1 + src/backend/utils/init/postinit.c | 10 ++++++ src/backend/utils/misc/guc.c | 26 ++++++++++++++ src/backend/utils/misc/postgresql.conf.sample | 3 ++ src/include/libpq/libpq.h | 1 + src/include/miscadmin.h | 1 + src/include/tcop/tcopprot.h | 1 + src/include/utils/timeout.h | 1 + 11 files changed, 145 insertions(+) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index d1e2e8c4c3..bc1a42fa91 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -998,6 +998,41 @@ include_dir 'conf.d' </listitem> </varlistentry> + <varlistentry id="guc-client-connection-check-interval" xreflabel="client_connection_check_interval"> + <term><varname>client_connection_check_interval</varname> (<type>integer</type>) + <indexterm> + <primary><varname>client_connection_check_interval</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Sets the time interval between checks that the client is still + connected, while running queries. The check is performed by polling + the socket to allow long running queries to be aborted immediately. + This option is currently available only on Linux, because it uses the + <symbol>POLLRDHUP</symbol> extension to the <symbol>poll</symbol> + system call. + </para> + <para> + If the value is specified without units, it is taken as milliseconds. + The default value is <literal>0</literal>, which disables connection + checks. Without connection checks, the server will detect the loss of + the connection only at the next interaction with the socket, while + waiting for or receiving a new request, or sending a response. + </para> + <para> + For the kernel itself to detect lost TCP connections reliably and within + a known timeframe in all scenarios including network failure, it may + also be necessary to adjust the default TCP keepalive settings of the + operating system, or the + <xref linkend="guc-tcp-keepalives-idle"/>, + <xref linkend="guc-tcp-keepalives-interval"/> and + <xref linkend="guc-tcp-keepalives-count"/> settings of + <productname>PostgreSQL</productname>. + </para> + </listitem> + </varlistentry> + </variablelist> </sect2> diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c index 4c7b1e7bfd..c4a67a27d2 100644 --- a/src/backend/libpq/pqcomm.c +++ b/src/backend/libpq/pqcomm.c @@ -54,6 +54,9 @@ */ #include "postgres.h" +#ifdef HAVE_POLL_H +#include <poll.h> +#endif #include <signal.h> #include <fcntl.h> #include <grp.h> @@ -1921,3 +1924,36 @@ pq_settcpusertimeout(int timeout, Port *port) return STATUS_OK; } + +/* + * Check if the client is still connected. + */ +bool +pq_check_connection(void) +{ +#if defined(POLLRDHUP) + /* + * POLLRDHUP is a Linux extension to poll(2) to detect sockets closed by + * the other end. We don't have a portable way to do that without actually + * trying to read or write data on other systems. We don't want to read + * because that would be confused by buffered pipelined queries and COPY + * data. Perhaps in future we'll try to write a heartbeat message instead. + */ + struct pollfd pollfd; + int rc; + + pollfd.fd = MyProcPort->sock; + pollfd.events = POLLOUT | POLLIN | POLLRDHUP; + pollfd.revents = 0; + rc = poll(&pollfd, 1, 0); + if (rc < 0) + { + elog(COMMERROR, "could not poll socket: %m"); + return false; + } + else if (rc == 1 && (pollfd.revents & (POLLHUP | POLLRDHUP))) + return false; +#endif + + return true; +} diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 2b1b68109f..481c697072 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -102,6 +102,9 @@ int max_stack_depth = 100; /* wait N seconds to allow attach from a debugger */ int PostAuthDelay = 0; +/* Time between checks that the client is still connected. */ +int client_connection_check_interval = 0; + /* ---------------- * private typedefs etc * ---------------- @@ -2671,6 +2674,14 @@ start_xact_command(void) * not desired, the timeout has to be disabled explicitly. */ enable_statement_timeout(); + + /* Start timeout for checking if the client has gone away if necessary. */ + if (client_connection_check_interval > 0 && + IsUnderPostmaster && + MyProcPort && + !get_timeout_active(CLIENT_CONNECTION_CHECK_TIMEOUT)) + enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT, + client_connection_check_interval); } static void @@ -3149,6 +3160,25 @@ ProcessInterrupts(void) (errcode(ERRCODE_ADMIN_SHUTDOWN), errmsg("terminating connection due to administrator command"))); } + if (CheckClientConnectionPending) + { + CheckClientConnectionPending = false; + + /* + * Check for lost connection and re-arm, if still configured, but not + * if we've arrived back at DoingCommandRead state. We don't want to + * wake up idle sessions, and they already know how to detect lost + * connections. + */ + if (!DoingCommandRead && client_connection_check_interval > 0) + { + if (!pq_check_connection()) + ClientConnectionLost = true; + else + enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT, + client_connection_check_interval); + } + } if (ClientConnectionLost) { QueryCancelPending = false; /* lost connection trumps QueryCancel */ diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c index 73e0a672ae..a9f0fc3017 100644 --- a/src/backend/utils/init/globals.c +++ b/src/backend/utils/init/globals.c @@ -30,6 +30,7 @@ ProtocolVersion FrontendProtocol; volatile sig_atomic_t InterruptPending = false; volatile sig_atomic_t QueryCancelPending = false; volatile sig_atomic_t ProcDiePending = false; +volatile sig_atomic_t CheckClientConnectionPending = false; volatile sig_atomic_t ClientConnectionLost = false; volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false; volatile sig_atomic_t IdleSessionTimeoutPending = false; diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index 7abeccb536..a3ec358538 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -73,6 +73,7 @@ static void StatementTimeoutHandler(void); static void LockTimeoutHandler(void); static void IdleInTransactionSessionTimeoutHandler(void); static void IdleSessionTimeoutHandler(void); +static void ClientCheckTimeoutHandler(void); static bool ThereIsAtLeastOneRole(void); static void process_startup_options(Port *port, bool am_superuser); static void process_settings(Oid databaseid, Oid roleid); @@ -620,6 +621,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, RegisterTimeout(IDLE_IN_TRANSACTION_SESSION_TIMEOUT, IdleInTransactionSessionTimeoutHandler); RegisterTimeout(IDLE_SESSION_TIMEOUT, IdleSessionTimeoutHandler); + RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler); } /* @@ -1242,6 +1244,14 @@ IdleSessionTimeoutHandler(void) SetLatch(MyLatch); } +static void +ClientCheckTimeoutHandler(void) +{ + CheckClientConnectionPending = true; + InterruptPending = true; + SetLatch(MyLatch); +} + /* * Returns true if at least one role is defined in this database cluster. */ diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 03daec9a08..ebebee3915 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -20,6 +20,7 @@ #include <float.h> #include <math.h> #include <limits.h> +#include <poll.h> #ifndef WIN32 #include <sys/mman.h> #endif @@ -204,6 +205,7 @@ static bool check_autovacuum_work_mem(int *newval, void **extra, GucSource sourc static bool check_effective_io_concurrency(int *newval, void **extra, GucSource source); static bool check_maintenance_io_concurrency(int *newval, void **extra, GucSource source); static bool check_huge_page_size(int *newval, void **extra, GucSource source); +static bool check_client_connection_check_interval(int *newval, void **extra, GucSource source); static void assign_pgstat_temp_directory(const char *newval, void *extra); static bool check_application_name(char **newval, void **extra, GucSource source); static void assign_application_name(const char *newval, void *extra); @@ -3491,6 +3493,16 @@ static struct config_int ConfigureNamesInt[] = NULL, NULL, NULL }, + { + {"client_connection_check_interval", PGC_USERSET, CLIENT_CONN_OTHER, + gettext_noop("Sets the time interval between checks for disconnection while running queries."), + NULL, + GUC_UNIT_MS + }, + &client_connection_check_interval, + 0, 0, INT_MAX, + check_client_connection_check_interval, NULL, NULL + }, /* End-of-list marker */ { {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL @@ -11970,6 +11982,20 @@ check_huge_page_size(int *newval, void **extra, GucSource source) return true; } +static bool +check_client_connection_check_interval(int *newval, void **extra, GucSource source) +{ +#ifndef POLLRDHUP + /* Linux only, for now. See pq_check_connection(). */ + if (*newval != 0) + { + GUC_check_errdetail("client_connection_check_interval must be set to 0 on platforms that lack POLLRDHUP."); + return false; + } +#endif + return true; +} + static void assign_pgstat_temp_directory(const char *newval, void *extra) { diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 791d39cf07..a799c0faa1 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -718,6 +718,9 @@ #dynamic_library_path = '$libdir' +#client_connection_check_interval = 0 # time between checks for client + # disconnection while running queries; + # 0 for never #------------------------------------------------------------------------------ # LOCK MANAGEMENT diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h index b20deeb555..3ebbc8d665 100644 --- a/src/include/libpq/libpq.h +++ b/src/include/libpq/libpq.h @@ -71,6 +71,7 @@ extern int pq_getbyte(void); extern int pq_peekbyte(void); extern int pq_getbyte_if_available(unsigned char *c); extern int pq_putmessage_v2(char msgtype, const char *s, size_t len); +extern bool pq_check_connection(void); /* * prototypes for functions in be-secure.c diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index 013850ac28..6f8251e0b0 100644 --- a/src/include/miscadmin.h +++ b/src/include/miscadmin.h @@ -85,6 +85,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleInTransactionSessionTimeoutPending; extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending; extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending; +extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending; extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost; /* these are marked volatile because they are examined by signal handlers: */ diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h index e5472100a4..241e7c9961 100644 --- a/src/include/tcop/tcopprot.h +++ b/src/include/tcop/tcopprot.h @@ -29,6 +29,7 @@ extern CommandDest whereToSendOutput; extern PGDLLIMPORT const char *debug_query_string; extern int max_stack_depth; extern int PostAuthDelay; +extern int client_connection_check_interval; /* GUC-configurable parameters */ diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h index ecb2a366a5..93e6a691b3 100644 --- a/src/include/utils/timeout.h +++ b/src/include/utils/timeout.h @@ -32,6 +32,7 @@ typedef enum TimeoutId STANDBY_LOCK_TIMEOUT, IDLE_IN_TRANSACTION_SESSION_TIMEOUT, IDLE_SESSION_TIMEOUT, + CLIENT_CONNECTION_CHECK_TIMEOUT, /* First user-definable timeout reason */ USER_TIMEOUT, /* Maximum number of timeout reasons */ -- 2.30.2