On Wed, Feb 01, 2023 at 09:58:06AM -0800, Nathan Bossart wrote: > On Wed, Feb 01, 2023 at 08:58:01AM -0800, Andres Freund wrote: >> On 2023-02-01 10:12:26 -0500, Tom Lane wrote: >>> The fundamental issue is that we have no good way to break out >>> of system(), and I think the original idea was that >>> in_restore_command would be set *only* for the duration of the >>> system() call. That's clearly been lost sight of completely, >>> but maybe as a stopgap we could try to get back to that. >> >> We could push the functions setting in_restore_command down into >> ExecuteRecoveryCommand(). But I don't think that'd end up necessarily >> being right either - we'd now use the mechanism in places we previously >> didn't (cleanup/end commands). > > Right, we'd only want to set it for restore_command. I think that's > doable.
Here is a first draft for the proposed stopgap fix. If we want to proceed with this, I can provide patches for the back branches. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
>From c703a9f7ac6c43e65fc32117980495ac7980e2e3 Mon Sep 17 00:00:00 2001 From: Nathan Bossart <nathandboss...@gmail.com> Date: Wed, 1 Feb 2023 14:32:02 -0800 Subject: [PATCH v1 1/1] stopgap fix for restore_command --- src/backend/access/transam/shell_restore.c | 22 ++++++++++++++++++++-- src/backend/access/transam/xlogarchive.c | 7 ------- 2 files changed, 20 insertions(+), 9 deletions(-) diff --git a/src/backend/access/transam/shell_restore.c b/src/backend/access/transam/shell_restore.c index 8458209f49..abec023c1a 100644 --- a/src/backend/access/transam/shell_restore.c +++ b/src/backend/access/transam/shell_restore.c @@ -21,6 +21,7 @@ #include "access/xlogarchive.h" #include "access/xlogrecovery.h" #include "common/percentrepl.h" +#include "postmaster/startup.h" #include "storage/ipc.h" #include "utils/wait_event.h" @@ -124,8 +125,7 @@ shell_recovery_end(const char *lastRestartPointFileName) * human-readable name describing the command emitted in the logs. If * 'failOnSignal' is true and the command is killed by a signal, a FATAL * error is thrown. Otherwise, 'fail_elevel' is used for the log message. - * If 'exitOnSigterm' is true and the command is killed by SIGTERM, we exit - * immediately. + * If 'exitOnSigterm' is true and SIGTERM is received, we exit immediately. * * Returns whether the command succeeded. */ @@ -146,7 +146,25 @@ ExecuteRecoveryCommand(const char *command, const char *commandName, */ fflush(NULL); pgstat_report_wait_start(wait_event_info); + + /* + * PreRestoreCommand() is used to tell the SIGTERM handler for the startup + * process that it is okay to proc_exit() right away on SIGTERM. This is + * done for the duration of the system() call because there isn't a good + * way to break out while it is executing. Since we might call proc_exit() + * in a signal handler here, it is extremely important that nothing but the + * system() call happens between the calls to PreRestoreCommand() and + * PostRestoreCommand(). Any additional code must go before or after this + * section. + */ + if (exitOnSigterm) + PreRestoreCommand(); + rc = system(command); + + if (exitOnSigterm) + PostRestoreCommand(); + pgstat_report_wait_end(); if (rc != 0) diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c index 4b89addf97..66312c816b 100644 --- a/src/backend/access/transam/xlogarchive.c +++ b/src/backend/access/transam/xlogarchive.c @@ -147,18 +147,11 @@ RestoreArchivedFile(char *path, const char *xlogfname, else XLogFileName(lastRestartPointFname, 0, 0L, wal_segment_size); - /* - * Check signals before restore command and reset afterwards. - */ - PreRestoreCommand(); - /* * Copy xlog from archival storage to XLOGDIR */ ret = shell_restore(xlogfname, xlogpath, lastRestartPointFname); - PostRestoreCommand(); - if (ret) { /* -- 2.25.1