On 19/07/18 23:13, Andres Freund wrote:
On 2018-07-19 14:54:52 -0500, Nico Williams wrote:
On Thu, Jul 19, 2018 at 12:20:53PM -0700, Andres Freund wrote:
On 2018-07-19 11:57:25 +0300, Heikki Linnakangas wrote:
Ugh. Yeah, in wal_quickdie, and other aux process *_quickdie() handlers, I
agree we should just _exit(2). All we want to do is to exit the process
immediately.

Indeed.

bgworker_quickdie() makes some effort to block SIGQUIT during the exit()
processing, but that doesn't solve the whole problem. The process could've
been in the middle of a malloc/free when the signal arrived, for example.
exit() is simply not safe to call from a signal handler.

Yea. I really don't understand why we have't been able to agree on this
for *years* now.

I mean, only calling async-signal-safe functions from signal handlers is
a critical POSIX requirement, and exit(3) is NOT async-signal-safe.

There's a few standard requirements that aren't particularly relevant in
practice (including some functions not being listed as signal
safe). Just arguing from that isn't really helpful. But there's plenty
hard evidence, including a few messages upthread!, of it being
practically problematic. Just looking at the reasoning at why exit (and
malloc) aren't signal safe however, makes it clear that this particular
restriction should be adhered to, however.

ISTM that no-one has any great ideas on what to do about the ereport() in quickdie(). But I think we have consensus on replacing the exit(2) calls with _exit(2). If we do just that, it would be better than the status quo, even if it doesn't completely fix the problem. This would prevent the case that Asim reported, for starters.

Any objections to the attached?

With _exit(), I think we wouldn't really need the "PG_SETMASK(&BlockSig);" calls in the aux process *_quickdie handlers that don't do anything else than call _exit(). But I didn't dare to remove them yet.

- Heikki
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..ac8693fd61 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -650,22 +650,21 @@ bgworker_quickdie(SIGNAL_ARGS)
 	/*
 	 * We DO NOT want to run proc_exit() callbacks -- we're here because
 	 * shared memory may be corrupted, so we don't want to try to clean up our
-	 * transaction.  Just nail the windows shut and get out of town.  Now that
-	 * there's an atexit callback to prevent third-party code from breaking
-	 * things by calling exit() directly, we have to reset the callbacks
-	 * explicitly to make this work as intended.
-	 */
-	on_exit_reset();
-
-	/*
-	 * Note we do exit(2) not exit(0).  This is to force the postmaster into a
-	 * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
-	 * backend.  This is necessary precisely because we don't clean up our
-	 * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
-	 * should ensure the postmaster sees this as a crash, too, but no harm in
-	 * being doubly sure.)
+	 * transaction.  Just nail the windows shut and get out of town.
+	 *
+	 * There's an atexit callback to prevent third-party code from breaking
+	 * things by calling exit() directly.  We don't want to trigger that, so
+	 * we use _exit(), which doesn't run atexit callbacks, rather than exit().
+	 * And exit() wouldn't be safe to run from a signal handler, anyway.
+	 *
+	 * Note we use _exit(2) not _exit(0).  This is to force the postmaster
+	 * into a system reset cycle if some idiot DBA sends a manual SIGQUIT to a
+	 * random backend.  This is necessary precisely because we don't clean up
+	 * our shared memory state.  (The "dead man switch" mechanism in
+	 * pmsignal.c should ensure the postmaster sees this as a crash, too, but
+	 * no harm in being doubly sure.)
 	 */
-	exit(2);
+	_exit(2);
 }
 
 /*
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 960d3de204..010d53a6ce 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -404,22 +404,21 @@ bg_quickdie(SIGNAL_ARGS)
 	/*
 	 * We DO NOT want to run proc_exit() callbacks -- we're here because
 	 * shared memory may be corrupted, so we don't want to try to clean up our
-	 * transaction.  Just nail the windows shut and get out of town.  Now that
-	 * there's an atexit callback to prevent third-party code from breaking
-	 * things by calling exit() directly, we have to reset the callbacks
-	 * explicitly to make this work as intended.
-	 */
-	on_exit_reset();
-
-	/*
-	 * Note we do exit(2) not exit(0).  This is to force the postmaster into a
-	 * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
-	 * backend.  This is necessary precisely because we don't clean up our
-	 * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
-	 * should ensure the postmaster sees this as a crash, too, but no harm in
-	 * being doubly sure.)
+	 * transaction.  Just nail the windows shut and get out of town.
+	 *
+	 * There's an atexit callback to prevent third-party code from breaking
+	 * things by calling exit() directly.  We don't want to trigger that, so
+	 * we use _exit(), which doesn't run atexit callbacks, rather than exit().
+	 * And exit() wouldn't be safe to run from a signal handler, anyway.
+	 *
+	 * Note we use _exit(2) not _exit(0).  This is to force the postmaster
+	 * into a system reset cycle if some idiot DBA sends a manual SIGQUIT to a
+	 * random backend.  This is necessary precisely because we don't clean up
+	 * our shared memory state.  (The "dead man switch" mechanism in
+	 * pmsignal.c should ensure the postmaster sees this as a crash, too, but
+	 * no harm in being doubly sure.)
 	 */
-	exit(2);
+	_exit(2);
 }
 
 /* SIGHUP: set flag to re-read config file at next convenient time */
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index de1b22d045..f20bdb73b0 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -818,22 +818,21 @@ chkpt_quickdie(SIGNAL_ARGS)
 	/*
 	 * We DO NOT want to run proc_exit() callbacks -- we're here because
 	 * shared memory may be corrupted, so we don't want to try to clean up our
-	 * transaction.  Just nail the windows shut and get out of town.  Now that
-	 * there's an atexit callback to prevent third-party code from breaking
-	 * things by calling exit() directly, we have to reset the callbacks
-	 * explicitly to make this work as intended.
-	 */
-	on_exit_reset();
-
-	/*
-	 * Note we do exit(2) not exit(0).  This is to force the postmaster into a
-	 * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
-	 * backend.  This is necessary precisely because we don't clean up our
-	 * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
-	 * should ensure the postmaster sees this as a crash, too, but no harm in
-	 * being doubly sure.)
+	 * transaction.  Just nail the windows shut and get out of town.
+	 *
+	 * There's an atexit callback to prevent third-party code from breaking
+	 * things by calling exit() directly.  We don't want to trigger that, so
+	 * we use _exit(), which doesn't run atexit callbacks, rather than exit().
+	 * And exit() wouldn't be safe to run from a signal handler, anyway.
+	 *
+	 * Note we use _exit(2) not _exit(0).  This is to force the postmaster
+	 * into a system reset cycle if some idiot DBA sends a manual SIGQUIT to a
+	 * random backend.  This is necessary precisely because we don't clean up
+	 * our shared memory state.  (The "dead man switch" mechanism in
+	 * pmsignal.c should ensure the postmaster sees this as a crash, too, but
+	 * no harm in being doubly sure.)
 	 */
-	exit(2);
+	_exit(2);
 }
 
 /* SIGHUP: set flag to re-read config file at next convenient time */
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 38300527a5..1517b50c59 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -74,22 +74,21 @@ startupproc_quickdie(SIGNAL_ARGS)
 	/*
 	 * We DO NOT want to run proc_exit() callbacks -- we're here because
 	 * shared memory may be corrupted, so we don't want to try to clean up our
-	 * transaction.  Just nail the windows shut and get out of town.  Now that
-	 * there's an atexit callback to prevent third-party code from breaking
-	 * things by calling exit() directly, we have to reset the callbacks
-	 * explicitly to make this work as intended.
+	 * transaction.  Just nail the windows shut and get out of town.
+	 *
+	 * There's an atexit callback to prevent third-party code from breaking
+	 * things by calling exit() directly.  We don't want to trigger that, so
+	 * we use _exit(), which doesn't run atexit callbacks, rather than exit().
+	 * And exit() wouldn't be safe to run from a signal handler, anyway.
+	 *
+	 * Note we use _exit(2) not _exit(0).  This is to force the postmaster
+	 * into a system reset cycle if some idiot DBA sends a manual SIGQUIT to a
+	 * random backend.  This is necessary precisely because we don't clean up
+	 * our shared memory state.  (The "dead man switch" mechanism in
+	 * pmsignal.c should ensure the postmaster sees this as a crash, too, but
+	 * no harm in being doubly sure.)
 	 */
-	on_exit_reset();
-
-	/*
-	 * Note we do exit(2) not exit(0).  This is to force the postmaster into a
-	 * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
-	 * backend.  This is necessary precisely because we don't clean up our
-	 * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
-	 * should ensure the postmaster sees this as a crash, too, but no harm in
-	 * being doubly sure.)
-	 */
-	exit(2);
+	_exit(2);
 }
 
 
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index eceed1bf88..c9b5b1a26a 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -314,22 +314,21 @@ wal_quickdie(SIGNAL_ARGS)
 	/*
 	 * We DO NOT want to run proc_exit() callbacks -- we're here because
 	 * shared memory may be corrupted, so we don't want to try to clean up our
-	 * transaction.  Just nail the windows shut and get out of town.  Now that
-	 * there's an atexit callback to prevent third-party code from breaking
-	 * things by calling exit() directly, we have to reset the callbacks
-	 * explicitly to make this work as intended.
-	 */
-	on_exit_reset();
-
-	/*
-	 * Note we do exit(2) not exit(0).  This is to force the postmaster into a
-	 * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
-	 * backend.  This is necessary precisely because we don't clean up our
-	 * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
-	 * should ensure the postmaster sees this as a crash, too, but no harm in
-	 * being doubly sure.)
+	 * transaction.  Just nail the windows shut and get out of town.
+	 *
+	 * There's an atexit callback to prevent third-party code from breaking
+	 * things by calling exit() directly.  We don't want to trigger that, so
+	 * we use _exit(), which doesn't run atexit callbacks, rather than exit().
+	 * And exit() wouldn't be safe to run from a signal handler, anyway.
+	 *
+	 * Note we use _exit(2) not _exit(0).  This is to force the postmaster
+	 * into a system reset cycle if some idiot DBA sends a manual SIGQUIT to a
+	 * random backend.  This is necessary precisely because we don't clean up
+	 * our shared memory state.  (The "dead man switch" mechanism in
+	 * pmsignal.c should ensure the postmaster sees this as a crash, too, but
+	 * no harm in being doubly sure.)
 	 */
-	exit(2);
+	_exit(2);
 }
 
 /* SIGHUP: set flag to re-read config file at next convenient time */
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 7c292d8071..ce2b116f89 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -859,22 +859,21 @@ WalRcvQuickDieHandler(SIGNAL_ARGS)
 	/*
 	 * We DO NOT want to run proc_exit() callbacks -- we're here because
 	 * shared memory may be corrupted, so we don't want to try to clean up our
-	 * transaction.  Just nail the windows shut and get out of town.  Now that
-	 * there's an atexit callback to prevent third-party code from breaking
-	 * things by calling exit() directly, we have to reset the callbacks
-	 * explicitly to make this work as intended.
-	 */
-	on_exit_reset();
-
-	/*
-	 * Note we do exit(2) not exit(0).  This is to force the postmaster into a
-	 * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
-	 * backend.  This is necessary precisely because we don't clean up our
-	 * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
-	 * should ensure the postmaster sees this as a crash, too, but no harm in
-	 * being doubly sure.)
+	 * transaction.  Just nail the windows shut and get out of town.
+	 *
+	 * There's an atexit callback to prevent third-party code from breaking
+	 * things by calling exit() directly.  We don't want to trigger that, so
+	 * we use _exit(), which doesn't run atexit callbacks, rather than exit().
+	 * And exit() wouldn't be safe to run from a signal handler, anyway.
+	 *
+	 * Note we use _exit(2) not _exit(0).  This is to force the postmaster
+	 * into a system reset cycle if some idiot DBA sends a manual SIGQUIT to a
+	 * random backend.  This is necessary precisely because we don't clean up
+	 * our shared memory state.  (The "dead man switch" mechanism in
+	 * pmsignal.c should ensure the postmaster sees this as a crash, too, but
+	 * no harm in being doubly sure.)
 	 */
-	exit(2);
+	_exit(2);
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f4133953be..fab627d7f8 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2632,22 +2632,21 @@ quickdie(SIGNAL_ARGS)
 	/*
 	 * We DO NOT want to run proc_exit() callbacks -- we're here because
 	 * shared memory may be corrupted, so we don't want to try to clean up our
-	 * transaction.  Just nail the windows shut and get out of town.  Now that
-	 * there's an atexit callback to prevent third-party code from breaking
-	 * things by calling exit() directly, we have to reset the callbacks
-	 * explicitly to make this work as intended.
-	 */
-	on_exit_reset();
-
-	/*
-	 * Note we do exit(2) not exit(0).  This is to force the postmaster into a
-	 * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
-	 * backend.  This is necessary precisely because we don't clean up our
-	 * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
-	 * should ensure the postmaster sees this as a crash, too, but no harm in
-	 * being doubly sure.)
+	 * transaction.  Just nail the windows shut and get out of town.
+	 *
+	 * There's an atexit callback to prevent third-party code from breaking
+	 * things by calling exit() directly.  We don't want to trigger that, so
+	 * we use _exit(), which doesn't run atexit callbacks, rather than exit().
+	 * And exit() wouldn't be safe to run from a signal handler, anyway.
+	 *
+	 * Note we use _exit(2) not _exit(0).  This is to force the postmaster
+	 * into a system reset cycle if some idiot DBA sends a manual SIGQUIT to a
+	 * random backend.  This is necessary precisely because we don't clean up
+	 * our shared memory state.  (The "dead man switch" mechanism in
+	 * pmsignal.c should ensure the postmaster sees this as a crash, too, but
+	 * no harm in being doubly sure.)
 	 */
-	exit(2);
+	_exit(2);
 }
 
 /*

Reply via email to