On Wed, Feb 01, 2023 at 09:58:06AM -0800, Nathan Bossart wrote:
> On Wed, Feb 01, 2023 at 08:58:01AM -0800, Andres Freund wrote:
>> On 2023-02-01 10:12:26 -0500, Tom Lane wrote:
>>> The fundamental issue is that we have no good way to break out
>>> of system(), and I think the original idea was that
>>> in_restore_command would be set *only* for the duration of the
>>> system() call.  That's clearly been lost sight of completely,
>>> but maybe as a stopgap we could try to get back to that.
>> 
>> We could push the functions setting in_restore_command down into
>> ExecuteRecoveryCommand(). But I don't think that'd end up necessarily
>> being right either - we'd now use the mechanism in places we previously
>> didn't (cleanup/end commands).
> 
> Right, we'd only want to set it for restore_command.  I think that's
> doable.

Here is a first draft for the proposed stopgap fix.  If we want to proceed
with this, I can provide patches for the back branches.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From c703a9f7ac6c43e65fc32117980495ac7980e2e3 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Wed, 1 Feb 2023 14:32:02 -0800
Subject: [PATCH v1 1/1] stopgap fix for restore_command

---
 src/backend/access/transam/shell_restore.c | 22 ++++++++++++++++++++--
 src/backend/access/transam/xlogarchive.c   |  7 -------
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/transam/shell_restore.c b/src/backend/access/transam/shell_restore.c
index 8458209f49..abec023c1a 100644
--- a/src/backend/access/transam/shell_restore.c
+++ b/src/backend/access/transam/shell_restore.c
@@ -21,6 +21,7 @@
 #include "access/xlogarchive.h"
 #include "access/xlogrecovery.h"
 #include "common/percentrepl.h"
+#include "postmaster/startup.h"
 #include "storage/ipc.h"
 #include "utils/wait_event.h"
 
@@ -124,8 +125,7 @@ shell_recovery_end(const char *lastRestartPointFileName)
  * human-readable name describing the command emitted in the logs. If
  * 'failOnSignal' is true and the command is killed by a signal, a FATAL
  * error is thrown. Otherwise, 'fail_elevel' is used for the log message.
- * If 'exitOnSigterm' is true and the command is killed by SIGTERM, we exit
- * immediately.
+ * If 'exitOnSigterm' is true and SIGTERM is received, we exit immediately.
  *
  * Returns whether the command succeeded.
  */
@@ -146,7 +146,25 @@ ExecuteRecoveryCommand(const char *command, const char *commandName,
 	 */
 	fflush(NULL);
 	pgstat_report_wait_start(wait_event_info);
+
+	/*
+	 * PreRestoreCommand() is used to tell the SIGTERM handler for the startup
+	 * process that it is okay to proc_exit() right away on SIGTERM.  This is
+	 * done for the duration of the system() call because there isn't a good
+	 * way to break out while it is executing.  Since we might call proc_exit()
+	 * in a signal handler here, it is extremely important that nothing but the
+	 * system() call happens between the calls to PreRestoreCommand() and
+	 * PostRestoreCommand().  Any additional code must go before or after this
+	 * section.
+	 */
+	if (exitOnSigterm)
+		PreRestoreCommand();
+
 	rc = system(command);
+
+	if (exitOnSigterm)
+		PostRestoreCommand();
+
 	pgstat_report_wait_end();
 
 	if (rc != 0)
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 4b89addf97..66312c816b 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -147,18 +147,11 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	else
 		XLogFileName(lastRestartPointFname, 0, 0L, wal_segment_size);
 
-	/*
-	 * Check signals before restore command and reset afterwards.
-	 */
-	PreRestoreCommand();
-
 	/*
 	 * Copy xlog from archival storage to XLOGDIR
 	 */
 	ret = shell_restore(xlogfname, xlogpath, lastRestartPointFname);
 
-	PostRestoreCommand();
-
 	if (ret)
 	{
 		/*
-- 
2.25.1

Reply via email to