On Wed, Feb 3, 2021 at 11:11 AM Robert Haas <robertmh...@gmail.com> wrote: > I think the way it works right now is stupid and the proposed change > is going in the right direction. We have ample evidence already that > handing off fsyncs to a background process is a good idea, and there's > no reason why that shouldn't be beneficial during crash recovery just > as it is at other times. But even if it somehow failed to improve > performance during recovery, there's another good reason to do this, > which is that it would make the code simpler. Having the pendingOps > stuff in the startup process in some recovery situations and in the > checkpointer in other recovery situations makes this harder to reason > about. As Tom said, the system state where bgwriter and checkpointer > are not running is an uncommon one, and is probably more likely to > have (or grow) bugs than the state where they are running.
Yeah, it's a good argument. > The rat's-nest of logic introduced by the comment "Perform a > checkpoint to update all our recovery activity to disk." inside > StartupXLOG() could really do with some simplification. Right now we > have three cases: CreateEndOfRecoveryRecord(), RequestCheckpoint(), > and CreateCheckpoint(). Maybe with this change we could get it down to > just two, since RequestCheckpoint() already knows what to do about > !IsUnderPostmaster. True. Done in this version. Here's a rebase of this patch + Simon's patch to report on stats. I also have a sketch patch to provide a GUC that turns off the end-of-recovery checkpoint as mentioned earlier, attached (sharing mainly because this is one of the stack of patches that Jakub was testing for his baseback/PITR workloads and he might want to test some more), but I'm not proposing that for PG14. That idea is tangled up with the "relfile tombstone" stuff I wrote about elsewhere[1], but I haven't finished studying the full arsenal of footguns in that area (it's something like: if we don't wait for end-of-recovery checkpoint before allowing connections, then we'd better start creating tombstones in recovery unless the WAL level is high enough to avoid data eating hazards with unlogged changes and a double crash). [1] https://commitfest.postgresql.org/33/3030/
From 7a82c9e622ea45c981d05dff08c1d4279ec3c73b Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Wed, 26 Aug 2020 16:34:33 +1200 Subject: [PATCH v2 1/3] Run checkpointer and bgworker in crash recovery. Relieve the startup process of some writeback and checkpointing work during crash recovery, making it more like replication recovery. This wasn't done back in commit cdd46c76 out of caution, but now it seems more conservative to have the crash recovery environment work the same as the more commonly exercised streaming replication environment. In order to have a bgwriter running, you also need a checkpointer. While you could give startup and bgwriter their own backend private pending ops table, it's nicer to merger their requests in one place. Reviewed-by: Tom Lane <t...@sss.pgh.pa.us> Reviewed-by: Simon Riggs <si...@2ndquadrant.com> Reviewed-by: Robert Haas <robertmh...@gmail.com> Tested-by: Jakub Wartak <jakub.war...@tomtom.com> Discussion: https://postgr.es/m/CA%2BhUKGJ8NRsqgkZEnsnRc2MFROBV-jCnacbYvtpptK2A9YYp9Q%40mail.gmail.com --- src/backend/access/transam/xlog.c | 32 +++++++++------------------ src/backend/postmaster/bgwriter.c | 3 --- src/backend/postmaster/checkpointer.c | 3 --- src/backend/postmaster/postmaster.c | 17 ++++++-------- src/backend/storage/sync/sync.c | 30 +++---------------------- src/include/storage/sync.h | 1 - 6 files changed, 21 insertions(+), 65 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index f4d1ce5dea..6e8e6cf1e4 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -880,9 +880,6 @@ bool reachedConsistency = false; static bool InRedo = false; -/* Have we launched bgwriter during recovery? */ -static bool bgwriterLaunched = false; - /* For WALInsertLockAcquire/Release functions */ static int MyLockNo = 0; static bool holdingAllLocks = false; @@ -7268,25 +7265,14 @@ StartupXLOG(void) /* Also ensure XLogReceiptTime has a sane value */ XLogReceiptTime = GetCurrentTimestamp(); + PublishStartupProcessInformation(); + /* * Let postmaster know we've started redo now, so that it can launch - * checkpointer to perform restartpoints. We don't bother during - * crash recovery as restartpoints can only be performed during - * archive recovery. And we'd like to keep crash recovery simple, to - * avoid introducing bugs that could affect you when recovering after - * crash. - * - * After this point, we can no longer assume that we're the only - * process in addition to postmaster! Also, fsync requests are - * subsequently to be handled by the checkpointer, not locally. + * the archiver if necessary. */ - if (ArchiveRecoveryRequested && IsUnderPostmaster) - { - PublishStartupProcessInformation(); - EnableSyncRequestForwarding(); + if (IsUnderPostmaster) SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED); - bgwriterLaunched = true; - } /* * Allow read-only connections immediately if we're consistent @@ -7879,7 +7865,7 @@ StartupXLOG(void) * after we're fully out of recovery mode and already accepting * queries. */ - if (bgwriterLaunched) + if (ArchiveRecoveryRequested && IsUnderPostmaster) { if (LocalPromoteIsTriggered) { @@ -7915,7 +7901,11 @@ StartupXLOG(void) CHECKPOINT_WAIT); } else - CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE); + { + RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY | + CHECKPOINT_IMMEDIATE | + CHECKPOINT_WAIT); + } } if (ArchiveRecoveryRequested) @@ -12123,7 +12113,7 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen, * Request a restartpoint if we've replayed too much xlog since the * last one. */ - if (bgwriterLaunched) + if (ArchiveRecoveryRequested && IsUnderPostmaster) { if (XLogCheckpointNeeded(readSegNo)) { diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c index 715d5195bb..5584f4bc24 100644 --- a/src/backend/postmaster/bgwriter.c +++ b/src/backend/postmaster/bgwriter.c @@ -12,9 +12,6 @@ * * As of Postgres 9.2 the bgwriter no longer handles checkpoints. * - * The bgwriter is started by the postmaster as soon as the startup subprocess - * finishes, or as soon as recovery begins if we are doing archive recovery. - * It remains alive until the postmaster commands it to terminate. * Normal termination is by SIGTERM, which instructs the bgwriter to exit(0). * Emergency termination is by SIGQUIT; like any backend, the bgwriter will * simply abort and exit on SIGQUIT. diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index 5907a7befc..7b7f01e1fc 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -10,9 +10,6 @@ * fill WAL segments; the checkpointer itself doesn't watch for the * condition.) * - * The checkpointer is started by the postmaster as soon as the startup - * subprocess finishes, or as soon as recovery begins if we are doing archive - * recovery. It remains alive until the postmaster commands it to terminate. * Normal termination is by SIGUSR2, which instructs the checkpointer to * execute a shutdown checkpoint and then exit(0). (All backends must be * stopped before SIGUSR2 is issued!) Emergency termination is by SIGQUIT; diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index edab95a19e..4d5b4b9775 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -1392,6 +1392,12 @@ PostmasterMain(int argc, char *argv[]) */ AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STARTING); + /* Start bgwriter and checkpointer so they can help with recovery */ + if (CheckpointerPID == 0) + CheckpointerPID = StartCheckpointer(); + if (BgWriterPID == 0) + BgWriterPID = StartBackgroundWriter(); + /* * We're ready to rock and roll... */ @@ -1754,7 +1760,7 @@ ServerLoop(void) * fails, we'll just try again later. Likewise for the checkpointer. */ if (pmState == PM_RUN || pmState == PM_RECOVERY || - pmState == PM_HOT_STANDBY) + pmState == PM_HOT_STANDBY || pmState == PM_STARTUP) { if (CheckpointerPID == 0) CheckpointerPID = StartCheckpointer(); @@ -5125,15 +5131,6 @@ sigusr1_handler(SIGNAL_ARGS) FatalError = false; AbortStartTime = 0; - /* - * Crank up the background tasks. It doesn't matter if this fails, - * we'll just try again later. - */ - Assert(CheckpointerPID == 0); - CheckpointerPID = StartCheckpointer(); - Assert(BgWriterPID == 0); - BgWriterPID = StartBackgroundWriter(); - /* * Start the archiver if we're responsible for (re-)archiving received * files. diff --git a/src/backend/storage/sync/sync.c b/src/backend/storage/sync/sync.c index 708215614d..ee18898ea2 100644 --- a/src/backend/storage/sync/sync.c +++ b/src/backend/storage/sync/sync.c @@ -129,10 +129,10 @@ InitSync(void) { /* * Create pending-operations hashtable if we need it. Currently, we need - * it if we are standalone (not under a postmaster) or if we are a startup - * or checkpointer auxiliary process. + * it if we are standalone (not under a postmaster) or if we are a + * checkpointer auxiliary process. */ - if (!IsUnderPostmaster || AmStartupProcess() || AmCheckpointerProcess()) + if (!IsUnderPostmaster || AmCheckpointerProcess()) { HASHCTL hash_ctl; @@ -589,27 +589,3 @@ RegisterSyncRequest(const FileTag *ftag, SyncRequestType type, return ret; } - -/* - * In archive recovery, we rely on checkpointer to do fsyncs, but we will have - * already created the pendingOps during initialization of the startup - * process. Calling this function drops the local pendingOps so that - * subsequent requests will be forwarded to checkpointer. - */ -void -EnableSyncRequestForwarding(void) -{ - /* Perform any pending fsyncs we may have queued up, then drop table */ - if (pendingOps) - { - ProcessSyncRequests(); - hash_destroy(pendingOps); - } - pendingOps = NULL; - - /* - * We should not have any pending unlink requests, since mdunlink doesn't - * queue unlink requests when isRedo. - */ - Assert(pendingUnlinks == NIL); -} diff --git a/src/include/storage/sync.h b/src/include/storage/sync.h index fbdf34f762..6fd50cfa7b 100644 --- a/src/include/storage/sync.h +++ b/src/include/storage/sync.h @@ -60,7 +60,6 @@ extern void SyncPreCheckpoint(void); extern void SyncPostCheckpoint(void); extern void ProcessSyncRequests(void); extern void RememberSyncRequest(const FileTag *ftag, SyncRequestType type); -extern void EnableSyncRequestForwarding(void); extern bool RegisterSyncRequest(const FileTag *ftag, SyncRequestType type, bool retryOnError); -- 2.30.1
From 3568596c9e2a6331f60d8e25bff35546279a42e4 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Sat, 13 Mar 2021 00:57:38 +1300 Subject: [PATCH v2 2/3] Log buffer stats after crash recovery. Author: Simon Riggs <si...@2ndquadrant.com> Discussion: https://postgr.es/m/CANP8%2BjLqwRoKQEvAi%3DiRhSwgL1JWNxiUhx%2BRyG-D82CE_qrDew%40mail.gmail.com --- src/backend/access/transam/xlog.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 6e8e6cf1e4..b10b1b614a 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7574,6 +7574,29 @@ StartupXLOG(void) (errmsg("redo done at %X/%X system usage: %s", LSN_FORMAT_ARGS(ReadRecPtr), pg_rusage_show(&ru0)))); + if (!InArchiveRecovery) + { + if (!INSTR_TIME_IS_ZERO(pgBufferUsage.blk_read_time)) + ereport(LOG, + (errmsg("crash recovery complete: wrote %ld buffers (%.1f%%) (%0.3f ms); " + "dirtied %ld buffers; read %ld buffers (%0.3f ms)", + pgBufferUsage.shared_blks_written, + (double) pgBufferUsage.shared_blks_written * 100 / NBuffers, + INSTR_TIME_GET_MILLISEC(pgBufferUsage.blk_write_time), + pgBufferUsage.shared_blks_dirtied, + pgBufferUsage.shared_blks_read, + INSTR_TIME_GET_MILLISEC(pgBufferUsage.blk_read_time) + ))); + else + ereport(LOG, + (errmsg("crash recovery complete: wrote %ld buffers (%.1f%%); " + "dirtied %ld buffers; read %ld buffers", + pgBufferUsage.shared_blks_written, + (double) pgBufferUsage.shared_blks_written * 100 / NBuffers, + pgBufferUsage.shared_blks_dirtied, + pgBufferUsage.shared_blks_read + ))); + } xtime = GetLatestXTime(); if (xtime) ereport(LOG, -- 2.30.1
From 6420ae1112818ed7e4f4d5ecd306ef10f8873556 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Fri, 28 Aug 2020 16:35:25 +1200 Subject: [PATCH v2 3/3] Optionally, don't wait for end-of-recovery checkpoint. Add a GUC to specify whether you want to wait for the end-of-recovery checkpoint to complete before continuing. Default to the traditional behavior, "on". XXX This probably needs a better GUC name. There may also be problems with timelines; see the nearby fast promotion code. XXX This is experimental code only! There may be fundamental problems with the concept. Or not. --- src/backend/access/transam/xlog.c | 23 +++++++++++++++-------- src/backend/utils/misc/guc.c | 9 +++++++++ src/include/access/xlog.h | 1 + 3 files changed, 25 insertions(+), 8 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b10b1b614a..4c001c466b 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -111,6 +111,7 @@ int CommitSiblings = 5; /* # concurrent xacts needed to sleep */ int wal_retrieve_retry_interval = 5000; int max_slot_wal_keep_size_mb = -1; bool track_wal_io_timing = false; +bool end_of_recovery_checkpoint_wait = true; #ifdef WAL_DEBUG bool XLOG_DEBUG = false; @@ -7874,6 +7875,18 @@ StartupXLOG(void) if (InRecovery) { + int checkpoint_flags = CHECKPOINT_IMMEDIATE; + + /* + * Should we wait for the end-of-recovery checkpoint to complete? + * Traditionally we do this, but it may be useful to allow connections + * sooner. + */ + if (end_of_recovery_checkpoint_wait) + checkpoint_flags |= CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_WAIT; + else + checkpoint_flags |= CHECKPOINT_FORCE; + /* * Perform a checkpoint to update all our recovery activity to disk. * @@ -7919,16 +7932,10 @@ StartupXLOG(void) } if (!promoted) - RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY | - CHECKPOINT_IMMEDIATE | - CHECKPOINT_WAIT); + RequestCheckpoint(checkpoint_flags); } else - { - RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY | - CHECKPOINT_IMMEDIATE | - CHECKPOINT_WAIT); - } + RequestCheckpoint(checkpoint_flags); } if (ArchiveRecoveryRequested) diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 855076b1fd..e59194a3f8 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -2058,6 +2058,15 @@ static struct config_bool ConfigureNamesBool[] = NULL, NULL, NULL }, + { + {"end_of_recovery_checkpoint_wait", PGC_SIGHUP, WAL_CHECKPOINTS, + gettext_noop("Whether to wait for the end-of-recovery checkpoint to complete."), + }, + &end_of_recovery_checkpoint_wait, + true, + NULL, NULL, NULL + }, + /* End-of-list marker */ { {NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 6d384d3ce6..77b4560f4b 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -132,6 +132,7 @@ extern char *PrimaryConnInfo; extern char *PrimarySlotName; extern bool wal_receiver_create_temp_slot; extern bool track_wal_io_timing; +extern bool end_of_recovery_checkpoint_wait; /* indirectly set via GUC system */ extern TransactionId recoveryTargetXid; -- 2.30.1