Hi all

The attached comments-only patch expands the signal handling section in
miscadmin.h a bit so that it mentions ProcSignal, deferred signal handling
during blocking calls, etc. It adds cross-refs between major signal
handling routines and the miscadmin comment to help readers track the
various scattered but inter-related code.

I hope this helps some new developers in future.
From a16d0a0f8f502ba3631d20d51c7bb50cedea6d57 Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig.rin...@2ndquadrant.com>
Date: Mon, 18 Jan 2021 12:21:18 +0800
Subject: [PATCH v1 1/2] Comments and cross-references for signal handling

To help new developers come to terms with the complexity of signal
handling in PostgreSQL's various types backend, expand the
main comment on signal handling in miscadmin.h. Cover the ProcSignal
mechanism for SIGUSR1 multiplexing. Mention that blocking calls into 3rd
party code and uninterruptible syscalls should be avoided in backend
code because of Pg's deferred signal processing logic. Provide a set of
cross-references to key routines related to signal handling.

In various signal handling routines, cross-reference the high level
overview comment in miscadmin.c.

This should possibly become part of the developer documentation rather
than a comment in a header file, but since we already have the comment,
it seemed sensible to start by updating it and making it more
discoverable.
---
 src/backend/postmaster/interrupt.c | 21 ++++++++
 src/backend/tcop/postgres.c        | 26 ++++++++-
 src/include/miscadmin.h            | 84 ++++++++++++++++++++++++------
 src/port/pgsleep.c                 |  2 +-
 4 files changed, 113 insertions(+), 20 deletions(-)

diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index dd9136a942..e5256179ec 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -10,6 +10,15 @@
  *	  src/backend/postmaster/interrupt.c
  *
  *-------------------------------------------------------------------------
+ *
+ * This file defines bare-bones interrupt handlers for secondary helper
+ * processes run by the postmaster - the walwriter and bgwriter, and
+ * potentially some background workers.
+ *
+ * These handlers are NOT used by normal user backends as they do not support
+ * interruption of normal execution to respond to a signal, query cancellation,
+ * etc. See miscadmin.h for details on interrupt handling used by normal
+ * postgres backends - CHECK_FOR_INTERRUPTS(), ProcessInterrupts(), die(), etc.
  */
 
 #include "postgres.h"
@@ -28,6 +37,9 @@ volatile sig_atomic_t ShutdownRequestPending = false;
 
 /*
  * Simple interrupt handler for main loops of background processes.
+ *
+ * See also CHECK_FOR_INTERRUPTS() and ProcessInterrupts() for the user-backend
+ * variant of this function.
  */
 void
 HandleMainLoopInterrupts(void)
@@ -51,6 +63,8 @@ HandleMainLoopInterrupts(void)
  * Normally, this handler would be used for SIGHUP. The idea is that code
  * which uses it would arrange to check the ConfigReloadPending flag at
  * convenient places inside main loops, or else call HandleMainLoopInterrupts.
+ *
+ * Most backends use this handler.
  */
 void
 SignalHandlerForConfigReload(SIGNAL_ARGS)
@@ -67,6 +81,9 @@ SignalHandlerForConfigReload(SIGNAL_ARGS)
  * Simple signal handler for exiting quickly as if due to a crash.
  *
  * Normally, this would be used for handling SIGQUIT.
+ *
+ * See also quickdie() and die() for the separate signal handling logic
+ * used by normal user backends.
  */
 void
 SignalHandlerForCrashExit(SIGNAL_ARGS)
@@ -99,6 +116,10 @@ SignalHandlerForCrashExit(SIGNAL_ARGS)
  *
  * ShutdownRequestPending should be checked at a convenient place within the
  * main loop, or else the main loop should call HandleMainLoopInterrupts.
+ *
+ * See also die() for the extended version of this handler that's used in
+ * backends that may need to be interrupted while performing long-running
+ * actions.
  */
 void
 SignalHandlerForShutdownRequest(SIGNAL_ARGS)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8cf1359e33..907b524e0f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2852,8 +2852,15 @@ quickdie(SIGNAL_ARGS)
 }
 
 /*
- * Shutdown signal from postmaster: abort transaction and exit
- * at soonest convenient time
+ * Shutdown signal from postmaster: set flags to ask ProcessInterrupts() to
+ * abort the current transaction and exit at the soonest convenient time.
+ *
+ * This handler is used as the SIGTERM handler by most backend types that may
+ * run arbitrary backend code, user queries, etc. Some backends use different
+ * handlers and sometimes different flags.
+ *
+ * See "System interrupt and critical section handling" in miscadmin.h for a
+ * higher level explanation of signal handling.
  */
 void
 die(SIGNAL_ARGS)
@@ -2924,6 +2931,9 @@ FloatExceptionHandler(SIGNAL_ARGS)
  * handling following receipt of SIGUSR1. Designed to be similar to die()
  * and StatementCancelHandler(). Called only by a normal user backend
  * that begins a transaction during recovery.
+ *
+ * This runs in signal handler context (via procsignal_sigusr1_handler) so it
+ * must be quick, non-blocking, re-entrant and limit its stack use.
  */
 void
 RecoveryConflictInterrupt(ProcSignalReason reason)
@@ -3059,6 +3069,18 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
  * If an interrupt condition is pending, and it's safe to service it,
  * then clear the flag and accept the interrupt.  Called only when
  * InterruptPending is true.
+ *
+ * This routine runs outside interrupt handler context whenever control has
+ * returned to normal postgres code that calls CHECK_FOR_INTERRUPTS().  It
+ * handles the flags set by die() and other normal interrupt handlers as well
+ * as ProcSignal flags set by procsignal_sigusr1_handler() SIGUSR1
+ * multiplexing.
+ *
+ * Not all backend types' signal handlers set the flags tested by
+ * ProcessInterrupts().
+ *
+ * See "System interrupt and critical section handling" in in miscadmin.h for
+ * higher level details on signal handling.
  */
 void
 ProcessInterrupts(void)
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index f082d04448..53e052472b 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -35,23 +35,46 @@
 /*****************************************************************************
  *	  System interrupt and critical section handling
  *
- * There are two types of interrupts that a running backend needs to accept
- * without messing up its state: QueryCancel (SIGINT) and ProcDie (SIGTERM).
- * In both cases, we need to be able to clean up the current transaction
- * gracefully, so we can't respond to the interrupt instantaneously ---
- * there's no guarantee that internal data structures would be self-consistent
- * if the code is interrupted at an arbitrary instant.  Instead, the signal
- * handlers set flags that are checked periodically during execution.
+ * There are a few types of interrupts that a running backend needs to accept
+ * without messing up its state. It must handle QueryCancel (SIGINT) and
+ * ProcDie (SIGTERM) as well as a number of ProcSignal requests
+ * (RecoveryConflictInterrupt()) delivered over SIGUSR1 multiplexing.
+ *
+ * In most of these cases we need to be able to clean up the current
+ * transaction gracefully, so we can't respond to the interrupt
+ * instantaneously --- there's no guarantee that internal data structures
+ * would be self-consistent if the code is interrupted at an arbitrary
+ * instant.  Instead, the signal handlers set flags that are checked
+ * periodically during execution.
  *
  * The CHECK_FOR_INTERRUPTS() macro is called at strategically located spots
- * where it is normally safe to accept a cancel or die interrupt.  In some
- * cases, we invoke CHECK_FOR_INTERRUPTS() inside low-level subroutines that
- * might sometimes be called in contexts that do *not* want to allow a cancel
- * or die interrupt.  The HOLD_INTERRUPTS() and RESUME_INTERRUPTS() macros
- * allow code to ensure that no cancel or die interrupt will be accepted,
- * even if CHECK_FOR_INTERRUPTS() gets called in a subroutine.  The interrupt
- * will be held off until CHECK_FOR_INTERRUPTS() is done outside any
- * HOLD_INTERRUPTS() ... RESUME_INTERRUPTS() section.
+ * where it is normally safe to accept a cancel or die interrupt or
+ * ProcSignal request. It's just a wrapper for ProcessInterrupts().
+ * ProcessInterrupts() acts on flags set by both normal signal handlers like
+ * die() and by ProcSignal handlers like RecoveryConflictInterrupt() and is
+ * responsible for actually cancelling queries, terminating the current the
+ * backend, etc.
+ *
+ * Interrupts will only be acted on when PostgreSQL or extension code is in
+ * control so that CHECK_FOR_INTERRUPTS() gets called regularly.  Consequently,
+ * if a backend calls into long-running or blocking external library code or
+ * makes uninterruptible system calls then it may fail to respond to SIGTERM or
+ * other requests in a timely manner. Amongst other issues, this can cause the
+ * backend to fail to exit in response to a postmaster shutdown request and
+ * prevent a clean postmaster shutdown. The administrator may have to SIGKILL
+ * the process to make it exit, which will force crash recovery on next
+ * startup. Avoid making long-running calls into non-postgres code. Instead,
+ * structure your code as a loop over non-blocking calls that invokes
+ * CHECK_FOR_INTERRUPTS() each iteration and uses WaitEventSetWait() if has to
+ * sleep.
+ *
+ * In some cases, we invoke CHECK_FOR_INTERRUPTS() inside low-level
+ * subroutines that might sometimes be called in contexts that do *not* want
+ * to allow a cancel or die interrupt. The HOLD_INTERRUPTS() and
+ * RESUME_INTERRUPTS() macros allow code to ensure that no cancel or die
+ * interrupt will be accepted, even if CHECK_FOR_INTERRUPTS() gets called in
+ * a subroutine.  The interrupt will be held off until CHECK_FOR_INTERRUPTS()
+ * is done outside any HOLD_INTERRUPTS() ...  RESUME_INTERRUPTS() section.
  *
  * There is also a mechanism to prevent query cancel interrupts, while still
  * allowing die interrupts: HOLD_CANCEL_INTERRUPTS() and
@@ -71,8 +94,35 @@
  * mechanism.  A critical section not only holds off cancel/die interrupts,
  * but causes any ereport(ERROR) or ereport(FATAL) to become ereport(PANIC)
  * --- that is, a system-wide reset is forced.  Needless to say, only really
- * *critical* code should be marked as a critical section!	Currently, this
- * mechanism is only used for XLOG-related code.
+ * *critical* code should be marked as a critical section! This
+ * mechanism is mainly used for XLOG-related code.
+ *
+ * See also:
+ *
+ * - signal handler setup:
+ *       pqsignal()
+ * - Signal handling for full featured backends:
+ *       CHECK_FOR_INTERRUPTS()
+ *       ProcessInterrupts()
+ *       die()
+ *       StatementCancelHandler()
+ *       RecoveryConflictInterrupt()
+ * - Generic interrupt handling for simple postmaster children:
+ *       HandleMainLoopInterrupts()
+ *       SignalHandlerFor... routines in interrupt.c
+ * - ProcSignal SIGUSR1 multiplexing:
+ *       SendProcSignal()
+ *       procsignal_sigusr1_handler()
+ *       RecoveryConflictInterrupt()
+ * - Nonstandard signal handlers:
+ *       WalSndSignals() for the walsender
+ *       bgworker_die() and StartBackendWorker() for bgworkers
+ *       ... and any other nonstandard pqsignal() handlers
+ * - Interrupt-aware sleeping and waiting:
+ *       WaitLatch()
+ *       WaitEventSetWait()
+ *       pg_usleep()
+ *
  *
  *****************************************************************************/
 
diff --git a/src/port/pgsleep.c b/src/port/pgsleep.c
index d6d51363e7..08b011d57d 100644
--- a/src/port/pgsleep.c
+++ b/src/port/pgsleep.c
@@ -41,7 +41,7 @@
  * so that the sleep effectively starts over!  It is therefore rather hazardous
  * to use this for long sleeps; a continuing stream of signal events could
  * prevent the sleep from ever terminating.  Better practice for long sleeps
- * is to use WaitLatch() with a timeout.
+ * is to use WaitLatch() or WaitEventSetWait() with a timeout.
  */
 void
 pg_usleep(long microsec)
-- 
2.29.2

Reply via email to