Re: SIGQUIT handling, redux

Tom Lane Thu, 10 Sep 2020 19:41:18 -0700

I wrote:
> I'll take a closer look at the idea of using _exit(1) tomorrow,
> but I'd be pretty hesitant to back-patch that.


Here's a patch for that; it passes light testing, including forcing
the backend through the SIGTERM and timeout code paths.  There's
not much to it except for changing the signal handlers, but I did
add a cross-check that no on-exit handlers have been registered
before we get done with using these signal handlers.

I looked briefly at the idea of postponing ProcessStartupPacket
until InitPostgres has set up a fairly normal environment.  It
seems like it'd take a fair amount of refactoring.  I really
doubt it's worth the effort, even though the result would be
arguably cleaner logic than what we have now.

                        regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 15ad675bc1..081022a206 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4298,6 +4298,8 @@ report_fork_failure_to_client(Port *port, int errnum)
  * returns: nothing.  Will not return at all if there's any failure.
  *
  * Note: this code does not depend on having any access to shared memory.
+ * Indeed, our approach to SIGTERM/timeout handling *requires* that
+ * shared memory not have been touched yet; see comments within.
  * In the EXEC_BACKEND case, we are physically attached to shared memory
  * but have not yet set up most of our local pointers to shmem structures.
  */
@@ -4341,27 +4343,14 @@ BackendInitialize(Port *port)
 	whereToSendOutput = DestRemote; /* now safe to ereport to client */
 
 	/*
-	 * We arrange to do proc_exit(1) if we receive SIGTERM or timeout while
-	 * trying to collect the startup packet; while SIGQUIT results in
-	 * _exit(2).  Otherwise the postmaster cannot shutdown the database FAST
-	 * or IMMED cleanly if a buggy client fails to send the packet promptly.
+	 * We arrange to do _exit(1) if we receive SIGTERM or timeout while trying
+	 * to collect the startup packet; while SIGQUIT results in _exit(2).
+	 * Otherwise the postmaster cannot shutdown the database FAST or IMMED
+	 * cleanly if a buggy client fails to send the packet promptly.
 	 *
-	 * XXX this is pretty dangerous; signal handlers should not call anything
-	 * as complex as proc_exit() directly.  We minimize the hazard by not
-	 * keeping these handlers active for longer than we must.  However, it
-	 * seems necessary to be able to escape out of DNS lookups as well as the
-	 * startup packet reception proper, so we can't narrow the scope further
-	 * than is done here.
-	 *
-	 * XXX it follows that the remainder of this function must tolerate losing
-	 * control at any instant.  Likewise, any pg_on_exit_callback registered
-	 * before or during this function must be prepared to execute at any
-	 * instant between here and the end of this function.  Furthermore,
-	 * affected callbacks execute partially or not at all when a second
-	 * exit-inducing signal arrives after proc_exit_prepare() decrements
-	 * on_proc_exit_index.  (Thanks to that mechanic, callbacks need not
-	 * anticipate more than one call.)  This is fragile; it ought to instead
-	 * follow the norm of handling interrupts at selected, safe opportunities.
+	 * Exiting with _exit(1) is only possible because we have not yet touched
+	 * shared memory; therefore no outside-the-process state needs to get
+	 * cleaned up.
 	 */
 	pqsignal(SIGTERM, process_startup_packet_die);
 	pqsignal(SIGQUIT, SignalHandlerForCrashExit);
@@ -4420,8 +4409,8 @@ BackendInitialize(Port *port)
 		port->remote_hostname = strdup(remote_host);
 
 	/*
-	 * Ready to begin client interaction.  We will give up and proc_exit(1)
-	 * after a time delay, so that a broken client can't hog a connection
+	 * Ready to begin client interaction.  We will give up and _exit(1) after
+	 * a time delay, so that a broken client can't hog a connection
 	 * indefinitely.  PreAuthDelay and any DNS interactions above don't count
 	 * against the time limit.
 	 *
@@ -4449,6 +4438,17 @@ BackendInitialize(Port *port)
 	disable_timeout(STARTUP_PACKET_TIMEOUT, false);
 	PG_SETMASK(&BlockSig);
 
+	/*
+	 * As a safety check that nothing in startup has yet performed
+	 * shared-memory modifications that would need to be undone if we had
+	 * exited through SIGTERM or timeout above, check that no on_shmem_exit
+	 * handlers have been registered yet.  (This isn't terribly bulletproof,
+	 * since someone might misuse an on_proc_exit handler for shmem cleanup,
+	 * but it's a cheap and helpful check.  We cannot disallow on_proc_exit
+	 * handlers unfortunately, since pq_init() already registered one.)
+	 */
+	check_on_shmem_exit_lists_are_empty();
+
 	/*
 	 * Stop here if it was bad or a cancel packet.  ProcessStartupPacket
 	 * already did any appropriate error reporting.
@@ -5369,23 +5369,21 @@ sigusr1_handler(SIGNAL_ARGS)
 
 /*
  * SIGTERM while processing startup packet.
- * Clean up and exit(1).
  *
- * Running proc_exit() from a signal handler is pretty unsafe, since we
- * can't know what code we've interrupted.  But the alternative of using
- * _exit(2) is also unpalatable, since it'd mean that a "fast shutdown"
- * would cause a database crash cycle (forcing WAL replay at restart)
- * if any sessions are in authentication.  So we live with it for now.
+ * Running proc_exit() from a signal handler would be quite unsafe.
+ * However, since we have not yet touched shared memory, we can just
+ * pull the plug and exit without running any atexit handlers.
  *
- * One might be tempted to try to send a message indicating why we are
- * disconnecting.  However, that would make this even more unsafe.  Also,
- * it seems undesirable to provide clues about the database's state to
- * a client that has not yet completed authentication.
+ * One might be tempted to try to send a message, or log one, indicating
+ * why we are disconnecting.  However, that would be quite unsafe in itself.
+ * Also, it seems undesirable to provide clues about the database's state
+ * to a client that has not yet completed authentication, or even sent us
+ * a startup packet.
  */
 static void
 process_startup_packet_die(SIGNAL_ARGS)
 {
-	proc_exit(1);
+	_exit(1);
 }
 
 /*
@@ -5404,16 +5402,12 @@ dummy_handler(SIGNAL_ARGS)
 
 /*
  * Timeout while processing startup packet.
- * As for process_startup_packet_die(), we clean up and exit(1).
- *
- * This is theoretically just as hazardous as in process_startup_packet_die(),
- * although in practice we're almost certainly waiting for client input,
- * which greatly reduces the risk.
+ * As for process_startup_packet_die(), we exit via _exit(1).
  */
 static void
 StartupPacketTimeoutHandler(void)
 {
-	proc_exit(1);
+	_exit(1);
 }
 
 
diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index 11c3f132a1..36a067c924 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -416,3 +416,20 @@ on_exit_reset(void)
 	on_proc_exit_index = 0;
 	reset_on_dsm_detach();
 }
+
+/* ----------------------------------------------------------------
+ *		check_on_shmem_exit_lists_are_empty
+ *
+ *		Debugging check that no shmem cleanup handlers have been registered
+ *		prematurely in the current process.
+ * ----------------------------------------------------------------
+ */
+void
+check_on_shmem_exit_lists_are_empty(void)
+{
+	if (before_shmem_exit_index)
+		elog(FATAL, "before_shmem_exit has been called prematurely");
+	if (on_shmem_exit_index)
+		elog(FATAL, "on_shmem_exit has been called prematurely");
+	/* Checking DSM detach state seems unnecessary given the above */
+}
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 462fe46341..88994fdc26 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -72,6 +72,7 @@ extern void on_shmem_exit(pg_on_exit_callback function, Datum arg);
 extern void before_shmem_exit(pg_on_exit_callback function, Datum arg);
 extern void cancel_before_shmem_exit(pg_on_exit_callback function, Datum arg);
 extern void on_exit_reset(void);
+extern void check_on_shmem_exit_lists_are_empty(void);
 
 /* ipci.c */
 extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;

Re: SIGQUIT handling, redux

Reply via email to