On Fri, Feb 14, 2025 at 2:35 AM Bertrand Drouvot
<bertranddrouvot...@gmail.com> wrote:
>
> Hi,
>
> On Fri, Feb 14, 2025 at 12:17:48AM -0800, Masahiko Sawada wrote:
> > On Tue, Feb 11, 2025 at 11:44 PM Bertrand Drouvot
> > <bertranddrouvot...@gmail.com> wrote:
>
> > Looking at the latest custodian worker patch, the basic architecture
> > is to have a single custodian worker and processes can ask it for some
> > work such as removing logical decoding related files. The online
> > wal_level change will be the one of the tasks that processes (eps.
> > checkpointer) can ask for it. On the other hand, one point that I
> > think might not fit this wal_level work well is that while the
> > custodian worker is a long-lived worker process,
>
> That was the case initialy but it looks like it would not have been the case
> at the end. See, Tom's comment in [1]:
>
> "
> I wonder if a single long-lived custodian task is the right model at all.
> At least for RemovePgTempFiles, it'd make more sense to write it as a
> background worker that spawns, does its work, and then exits,
> independently of anything else
> "
>
> > it's sufficient for
> > the online wal_level change work to have a bgworker that does its work
> > and then exits.
>
> Fully agree and I did not think about changing this behavior.
>
> > IOW, from the perspective of this work, I prefer the
> > idea of having one short-lived worker for one task over having one
> > long-lived worker for multiple tasks.
>
> Yeah, or one short-lived worker for multiple tasks could work too. It just
> starts when it has something to do and then exit.
>
> > Reading that thread, while we
> > need to resolve the XID wraparound issue for the work of removing
> > logical decoding related files, the work of removing temporary files
> > seems to fit a short-lived worker style. So I thought as one of the
> > directions, it might be worth considering to have an infrastructure
> > where we can launch a bgworker just for one task, and we implement the
> > online wal_level change and temporary files removal on top of it.
>
> Yeap, that was exactly my point when I mentioned the custodian thread (taking
> into account Tom's comment quoted above).
>

I've written PoC patches to have the online wal_level change work use
a more generic infrastructure. These patches are still in PoC state
but seem like a good direction to me. Here is a brief explanation for
each patch.

* The 0001 patch introduces "reserved background worker slots". We
allocate max_process_workers + BGWORKER_CLASS_RESERVED at startup, and
if the number of running bgworker exceeds max_worker_processes, only
workers using the reserved slots can be launched. We can request to
use the reserved slots by adding BGWORKER_CLASS_RESERVED flag at
bgworker registration.

* The 0002 patch introduces "bgtask worker". The bgtask infrastructure
is designed to execute internal tasks in background in
one-worker-per-one-task style. Internally, bgtask workers use the
reserved bgworker so it's guaranteed that they can launch. The
internal tasks that we can request are predefined and this patch has a
dummy task as a placeholder. This patch implements only the minimal
functionality for the online wal_level change work. I've not tested if
this bgtask infrastructure can be used for tasks that we wanted to
offload to the custodian worker.

* The 0003 patch makes wal_level a SIGHUP parameter. We do the online
wal_level change work using the bgtask infrastructure. There are no
major changes from the previous version other than that.

Regards,


--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From 9ab16e2442449ce9cf2ac3f2681db4e461b7dcd2 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 23 Jan 2025 16:36:37 -0800
Subject: [PATCH v3 3/3] PoC: Convert wal_level a PGC_SIGHUP parameter.

---
 src/backend/access/rmgrdesc/xlogdesc.c        |  10 +
 src/backend/access/transam/Makefile           |   1 +
 src/backend/access/transam/meson.build        |   1 +
 src/backend/access/transam/xlog.c             |  89 ++-
 src/backend/access/transam/xlogfuncs.c        |   2 +-
 src/backend/access/transam/xloglevel.c        | 575 ++++++++++++++++++
 src/backend/access/transam/xlogrecovery.c     |   5 +
 src/backend/commands/publicationcmds.c        |   2 +-
 src/backend/postmaster/checkpointer.c         |   6 +
 src/backend/postmaster/pgarch.c               |  26 +
 src/backend/postmaster/postmaster.c           |  20 +-
 src/backend/replication/logical/decode.c      |  12 +
 src/backend/replication/logical/logical.c     |   2 +-
 src/backend/replication/logical/slotsync.c    |  10 +-
 src/backend/replication/slot.c                |   6 +-
 src/backend/replication/walsender.c           |  29 +
 src/backend/storage/ipc/ipci.c                |   2 +
 src/backend/storage/ipc/procsignal.c          |   4 +
 src/backend/storage/ipc/standby.c             |   6 +-
 .../utils/activity/wait_event_names.txt       |   3 +
 src/backend/utils/init/postinit.c             |   3 +
 src/backend/utils/misc/bgtask.c               |  15 +-
 src/backend/utils/misc/guc_tables.c           |   4 +-
 src/bin/pg_controldata/pg_controldata.c       |   4 +
 src/include/access/xlog.h                     |  55 +-
 src/include/catalog/pg_control.h              |   1 +
 src/include/postmaster/pgarch.h               |   1 +
 src/include/replication/walsender.h           |   1 +
 src/include/storage/lwlocklist.h              |   1 +
 src/include/storage/procsignal.h              |   2 +
 src/include/utils/bgtask.h                    |   4 +-
 src/include/utils/guc_hooks.h                 |   2 +
 32 files changed, 844 insertions(+), 60 deletions(-)
 create mode 100644 src/backend/access/transam/xloglevel.c

diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 58040f28656..742a4c8b34e 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -164,6 +164,13 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 	{
 		int			wal_level;
 
+		memcpy(&wal_level, rec, sizeof(int));
+		appendStringInfo(buf, "wal_level %s", get_wal_level_string(wal_level));
+	}
+	else if (info == XLOG_WAL_LEVEL_CHANGE)
+	{
+		int			wal_level;
+
 		memcpy(&wal_level, rec, sizeof(int));
 		appendStringInfo(buf, "wal_level %s", get_wal_level_string(wal_level));
 	}
@@ -218,6 +225,9 @@ xlog_identify(uint8 info)
 		case XLOG_CHECKPOINT_REDO:
 			id = "CHECKPOINT_REDO";
 			break;
+		case XLOG_WAL_LEVEL_CHANGE:
+			id = "WAL_LEVEL_CHANGE";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 661c55a9db7..d2730952a6e 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -32,6 +32,7 @@ OBJS = \
 	xlogbackup.o \
 	xlogfuncs.o \
 	xloginsert.o \
+	xloglevel.o \
 	xlogprefetcher.o \
 	xlogreader.o \
 	xlogrecovery.o \
diff --git a/src/backend/access/transam/meson.build b/src/backend/access/transam/meson.build
index e8ae9b13c8e..af516cfd4a4 100644
--- a/src/backend/access/transam/meson.build
+++ b/src/backend/access/transam/meson.build
@@ -20,6 +20,7 @@ backend_sources += files(
   'xlogbackup.c',
   'xlogfuncs.c',
   'xloginsert.c',
+  'xloglevel.c',
   'xlogprefetcher.c',
   'xlogrecovery.c',
   'xlogstats.c',
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a50fd99d9e5..00402dc02a0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6147,6 +6147,18 @@ StartupXLOG(void)
 	Insert->fullPageWrites = lastFullPageWrites;
 	UpdateFullPageWrites();
 
+	/*
+	 * Update wal_level in shared memory.
+	 *
+	 * We use the wal_level value configured in this server rather than the
+	 * value in the ControlFile. We don't write a XLOG_WAL_LEVEL_CHANGE record
+	 * as the recovery still in-progress, but that change would be included in
+	 * the XLOG_PARAMETER_CHANGE record written by the subsequent
+	 * XLogReportParameters().
+	 */
+	if (performedWalRecovery)
+		UpdateWalLevelAfterRecovery(ControlFile->wal_level);
+
 	/*
 	 * Emit checkpoint or end-of-recovery record in XLOG, if required.
 	 */
@@ -7022,7 +7034,7 @@ CreateCheckPoint(int flags)
 	WALInsertLockAcquireExclusive();
 
 	checkPoint.fullPageWrites = Insert->fullPageWrites;
-	checkPoint.wal_level = wal_level;
+	checkPoint.wal_level = GetActiveWalLevel();
 
 	if (shutdown)
 	{
@@ -7076,9 +7088,11 @@ CreateCheckPoint(int flags)
 	 */
 	if (!shutdown)
 	{
+		int			level = GetActiveWalLevel();
+
 		/* Include WAL level in record for WAL summarizer's benefit. */
 		XLogBeginInsert();
-		XLogRegisterData(&wal_level, sizeof(wal_level));
+		XLogRegisterData(&level, sizeof(level));
 		(void) XLogInsert(RM_XLOG_ID, XLOG_CHECKPOINT_REDO);
 
 		/*
@@ -7403,7 +7417,7 @@ CreateEndOfRecoveryRecord(void)
 		elog(ERROR, "can only be used to end recovery");
 
 	xlrec.end_time = GetCurrentTimestamp();
-	xlrec.wal_level = wal_level;
+	xlrec.wal_level = GetActiveWalLevel();
 
 	WALInsertLockAcquireExclusive();
 	xlrec.ThisTimeLineID = XLogCtl->InsertTimeLineID;
@@ -8532,7 +8546,7 @@ xlog_redo(XLogReaderState *record)
 		 */
 		if (InRecovery && InHotStandby &&
 			xlrec.wal_level < WAL_LEVEL_LOGICAL &&
-			wal_level >= WAL_LEVEL_LOGICAL)
+			GetActiveWalLevel() >= WAL_LEVEL_LOGICAL)
 			InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
 											   0, InvalidOid,
 											   InvalidTransactionId);
@@ -8604,6 +8618,25 @@ xlog_redo(XLogReaderState *record)
 	{
 		/* nothing to do here, just for informational purposes */
 	}
+	else if (info == XLOG_WAL_LEVEL_CHANGE)
+	{
+		WalLevel	new_wal_level;
+
+		memcpy(&new_wal_level, XLogRecGetData(record), sizeof(new_wal_level));
+
+		if (InRecovery && InHotStandby &&
+			new_wal_level < WAL_LEVEL_LOGICAL &&
+			GetActiveWalLevel() >= WAL_LEVEL_LOGICAL)
+			InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
+											   0, InvalidOid,
+											   InvalidTransactionId);
+
+		/* Update our copy of the wal_level parameter in pg_control */
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->wal_level = new_wal_level;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 /*
@@ -8825,11 +8858,11 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 	 * During recovery, we don't need to check WAL level. Because, if WAL
 	 * level is not sufficient, it's impossible to get here during recovery.
 	 */
-	if (!backup_started_in_recovery && !XLogIsNeeded())
+	if (!backup_started_in_recovery && GetActiveWalLevel() < WAL_LEVEL_REPLICA)
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("WAL level not sufficient for making an online backup"),
-				 errhint("\"wal_level\" must be set to \"replica\" or \"logical\" at server start.")));
+				 errhint("\"wal_level\" must be set to \"replica\" or \"logical\".")));
 
 	if (strlen(backupidstr) > MAXPGPATH)
 		ereport(ERROR,
@@ -9155,17 +9188,13 @@ do_pg_backup_stop(BackupState *state, bool waitforarchive)
 
 	Assert(state != NULL);
 
-	backup_stopped_in_recovery = RecoveryInProgress();
-
 	/*
-	 * During recovery, we don't need to check WAL level. Because, if WAL
-	 * level is not sufficient, it's impossible to get here during recovery.
+	 * This backend must have already called do_pg_backup_start(). The WAL
+	 * level check should have been done there.
 	 */
-	if (!backup_stopped_in_recovery && !XLogIsNeeded())
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("WAL level not sufficient for making an online backup"),
-				 errhint("\"wal_level\" must be set to \"replica\" or \"logical\" at server start.")));
+	Assert(get_backup_status() == SESSION_BACKUP_RUNNING);
+
+	backup_stopped_in_recovery = RecoveryInProgress();
 
 	/*
 	 * OK to update backup counter and session-level lock.
@@ -9452,6 +9481,36 @@ register_persistent_abort_backup_handler(void)
 	already_done = true;
 }
 
+/*
+ * Wait for all running backups to finish.
+ *
+ * Note that this function doesn't have any interlock to prevent new
+ * backups from being executed.
+ */
+void
+wait_for_backup_finish(void)
+{
+	for (;;)
+	{
+		int			running_backups;
+
+		CHECK_FOR_INTERRUPTS();
+
+		WALInsertLockAcquireExclusive();
+		running_backups = XLogCtl->Insert.runningBackups;
+		WALInsertLockRelease();
+
+		if (running_backups == 0)
+			break;
+
+		WaitLatch(MyLatch,
+				  WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+				  1000L,
+				  WAIT_EVENT_BACKUP_TERMINATION);
+		ResetLatch(MyLatch);
+	}
+}
+
 /*
  * Get latest WAL insert pointer
  */
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 8c3090165f0..82e4c43f185 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -242,7 +242,7 @@ pg_create_restore_point(PG_FUNCTION_ARGS)
 				 errmsg("recovery is in progress"),
 				 errhint("WAL control functions cannot be executed during recovery.")));
 
-	if (!XLogIsNeeded())
+	if (GetActiveWalLevel() < WAL_LEVEL_REPLICA)
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("WAL level not sufficient for creating a restore point"),
diff --git a/src/backend/access/transam/xloglevel.c b/src/backend/access/transam/xloglevel.c
new file mode 100644
index 00000000000..1a9dfb10e0d
--- /dev/null
+++ b/src/backend/access/transam/xloglevel.c
@@ -0,0 +1,575 @@
+/*-------------------------------------------------------------------------
+ * xloglevel.c
+ *		Functionality for controlling 'wal_level' value online.
+ *
+ * This module implements dynamic WAL level changes via SIGHUP signal,
+ * eliminating the need for server restarts. The main idea is to decouple
+ * two aspects that were previously controlled by a single wal_level
+ * setting: the information included in WAL records and the functionalities
+ * available at each WAL level.
+ *
+ * To increase the WAL level, we first allow processes to write WAL records containing
+ * the additional information required by the target functionality, while keeping
+ * these functionalities unavailable. Once all processes have synchronized to
+ * generate the WAL records, the WAL level is increased further to enable
+ * the new functionalities.
+ *
+ * Decreasing the WAL level follows a similar pattern but requires additional steps.
+ * First, any functionality that won't be supported at the lower level must be
+ * terminated. For instance, decreasing from 'replica' to 'logical' requires
+ * invalidating all logical replication slots. After termination, processes reduce
+ * their WAL information content, and once all processes are synchronized, the WAL
+ * level is decreased.
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/transam/xloglevel.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "access/xact.h"
+#include "access/xloginsert.h"
+#include "catalog/pg_control.h"
+#include "miscadmin.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
+#include "replication/slot.h"
+#include "storage/ipc.h"
+#include "storage/lmgr.h"
+#include "storage/procsignal.h"
+#include "tcop/tcopprot.h"
+#include "utils/bgtask.h"
+#include "utils/guc_hooks.h"
+
+typedef struct WalLevelCtlData
+{
+	/*
+	 * current_wal_level is the value used by all backends to know the active
+	 * wal_level, which affects what information is logged in WAL records and
+	 * which WAL-related features such as replication, WAL archiving and
+	 * logical decoding can be used in the system.
+	 *
+	 * The process-local 'wal_level' value, which is updated when changing
+	 * wal_level GUC parameter, might not represent the up-to-date WAL level.
+	 * We need to use GetActiveWalLevel() to get the active WAL level.
+	 */
+	WalLevel	current_wal_level;
+
+	/*
+	 * Different WAL level if we're in the process of changing WAL level
+	 * online. Otherwise, same as current_wal_level.
+	 */
+	WalLevel	target_wal_level;
+}			WalLevelCtlData;
+static WalLevelCtlData * WalLevelCtl = NULL;
+
+typedef void (*wal_level_decrease_action_cb) (void);
+typedef void (*wal_level_increase_action_cb) (void);
+
+bool		WalLevelInitialized = false;
+
+/*
+ * Both doStandbyInfoLogging and doLogicalInfoLogging are backend-local
+ * caches.
+ *
+ * They can be used to determine if information required by hot standby
+ * and/or logical decoding need to be written in WAL records. This local
+ * cache is updated when (1) process startup and (2) processing the
+ * global barrier.
+ */
+static bool doStandbyInfoLogging = false;
+static bool doLogicalInfoLogging = false;
+
+static void update_wal_logging_info_local(void);
+static void update_wal_level(WalLevel new_wal_level, bool notify_all);
+static void write_wal_level_change(WalLevel new_wal_level);
+static void increase_to_replica_action(void);
+static void increase_wal_level(WalLevel next_level, WalLevel target_level,
+							   wal_level_increase_action_cb action_cb);
+static void decrease_to_replica_action(void);
+static void decrease_to_minimal_action(void);
+static void decrease_wal_level(WalLevel next_level, WalLevel target_level,
+							   wal_level_decrease_action_cb action_cb);
+static const char *get_wal_level_string(WalLevel level);
+
+Size
+WalLevelCtlShmemSize(void)
+{
+	return sizeof(WalLevelCtlData);
+}
+
+void
+WalLevelCtlShmemInit(void)
+{
+	bool		found;
+
+	WalLevelCtl = ShmemInitStruct("wal_level control",
+								  WalLevelCtlShmemSize(),
+								  &found);
+
+	if (!found)
+	{
+		WalLevelCtl->current_wal_level = WAL_LEVEL_REPLICA;
+		WalLevelCtl->target_wal_level = WAL_LEVEL_REPLICA;
+	}
+}
+
+/*
+ * Initialize the global wal_level. This function is called after processing
+ * the configuration at startup.
+ */
+void
+InitializeWalLevelCtl(void)
+{
+	Assert(!WalLevelInitialized);
+
+	WalLevelCtl->current_wal_level = wal_level;
+	WalLevelCtl->target_wal_level = wal_level;
+	WalLevelInitialized = true;
+}
+
+/*
+ * Update both doStandbyInfoLogging and doLogicalInfoLogging based on the
+ * current global WAL level value.
+ */
+static void
+update_wal_logging_info_local(void)
+{
+	LWLockAcquire(WalLevelControlLock, LW_SHARED);
+	doStandbyInfoLogging =
+		(WalLevelCtl->current_wal_level >= WAL_LEVEL_STANDBY_INFO_LOGGING);
+	doLogicalInfoLogging =
+		(WalLevelCtl->current_wal_level >= WAL_LEVEL_LOGICAL_INFO_LOGGING);
+	LWLockRelease(WalLevelControlLock);
+}
+
+/*
+ * Initialize process-local xlog info status. This must be called during the
+ * process startup time.
+ */
+void
+InitWalLoggingState(void)
+{
+	update_wal_logging_info_local();
+}
+
+/*
+ * This function is called when we are ordered to update the local state
+ * by a ProcSignalBarrier.
+ */
+bool
+ProcessBarrierUpdateWalLoggingState(void)
+{
+	update_wal_logging_info_local();
+	return true;
+}
+
+/*
+ * Return true if the logical info logging is enabled.
+ */
+bool
+LogicalInfoLoggingEnabled(void)
+{
+	return doLogicalInfoLogging;
+}
+
+/*
+ * Return true if the standby info logging is enabled.
+ */
+bool
+StandbyInfoLoggingEnabled(void)
+{
+	return doStandbyInfoLogging;
+}
+
+/*
+ * Return the active wal_level stored on the shared memory, WalLevelCtl.
+ */
+int
+GetActiveWalLevel(void)
+{
+	WalLevel	level;
+
+	LWLockAcquire(WalLevelControlLock, LW_SHARED);
+	level = WalLevelCtl->current_wal_level;
+	LWLockRelease(WalLevelControlLock);
+
+	return level;
+}
+
+/*
+ * Return true if the logical decoding is ready to use.
+ */
+bool
+LogicalDecodingEnabled(void)
+{
+	return GetActiveWalLevel() >= WAL_LEVEL_LOGICAL;
+}
+
+/*
+ * Update WAL level after the recovery if necessary.
+ *
+ * This function must be called ONCE at the end of the recovery.
+ */
+void
+UpdateWalLevelAfterRecovery(WalLevel level)
+{
+	if (wal_level == level)
+		return;
+
+	WalLevelCtl->current_wal_level = level;
+	UpdateWalLevel();
+
+	/* Order all processes to reflect the new WAL level */
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_UPDATE_WAL_LOGGING_STATE));
+}
+
+void
+UpdateWalLevel(void)
+{
+	/*
+	 * During recovery, we don't need to any coordination for WAL level
+	 * changing with running processes as any writes are not permitted yet.
+	 */
+	if (RecoveryInProgress())
+	{
+		WalLevelCtl->current_wal_level = wal_level;
+		return;
+	}
+
+	LWLockAcquire(WalLevelControlLock, LW_EXCLUSIVE);
+
+	if (WalLevelCtl->current_wal_level == wal_level)
+	{
+		LWLockRelease(WalLevelControlLock);
+		return;
+	}
+
+	WalLevelCtl->target_wal_level = wal_level;
+	LWLockRelease(WalLevelControlLock);
+
+	/* Offload the wal_level change work to a bgtask worker */
+	BgTaskRequest(BGTASK_TYPE_WAL_LEVEL_CHANGE, InvalidOid);
+}
+
+/*
+ * Update the WAL level value on shared memory.
+ *
+ * If notify_all is true, we order all processes to update their local
+ * WAL-logging information cache using global signal barriers, and wait
+ * for the complete.
+ */
+static void
+update_wal_level(WalLevel new_wal_level, bool notify_all)
+{
+	LWLockAcquire(WalLevelControlLock, LW_EXCLUSIVE);
+	WalLevelCtl->current_wal_level = new_wal_level;
+	LWLockRelease(WalLevelControlLock);
+
+	if (notify_all)
+	{
+		WaitForProcSignalBarrier(
+								 EmitProcSignalBarrier(PROCSIGNAL_BARRIER_UPDATE_WAL_LOGGING_STATE));
+	}
+}
+
+/*
+ * Write XLOG_WAL_LEVEL_CHANGE record with the given WAL level.
+ */
+static void
+write_wal_level_change(WalLevel new_wal_level)
+{
+	XLogBeginInsert();
+	XLogRegisterData((char *) (&new_wal_level), sizeof(int));
+	XLogInsert(RM_XLOG_ID, XLOG_WAL_LEVEL_CHANGE);
+}
+
+/*
+ * Callback function for increasing WAL level to 'replica' from 'minimal'.
+ */
+static void
+increase_to_replica_action(void)
+{
+	/*
+	 * We create a checkpoint to increase wal_level from 'minimal' so that we
+	 * can restart from there in case of a server crash.
+	 */
+	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
+}
+
+/*
+ * Function to increase WAL level from 'minimal' to 'replica' or from 'replica'
+ * to 'logical'.
+ *
+ * 'target_level' is the destination WAL level, which must be 'replica' or
+ * 'logical'. 'next_level' is the intermediate level for the transition, which
+ * must be STANDBY_INFO_LOGGING or LOGICAL_INFO_LOGGING.
+ */
+static void
+increase_wal_level(WalLevel next_level, WalLevel target_level,
+				   wal_level_increase_action_cb action_cb)
+
+{
+	Assert(next_level == WAL_LEVEL_STANDBY_INFO_LOGGING ||
+		   next_level == WAL_LEVEL_LOGICAL_INFO_LOGGING);
+	Assert(target_level == WAL_LEVEL_REPLICA ||
+		   target_level == WAL_LEVEL_LOGICAL);
+
+	/*
+	 * Increase to the 'next_level' so that all process can start to write WAL
+	 * records with information required by the 'target_level'. We order all
+	 * process to reflect this change.
+	 */
+	update_wal_level(next_level, true);
+
+	/* Invoke the callback function, if specified */
+	if (action_cb)
+		action_cb();
+
+	START_CRIT_SECTION();
+
+	/*
+	 * Now increase to the 'target_level'.
+	 *
+	 * It's safe to take increase the WAL level, even when not strictly
+	 * required. So first set it true and then write the WAL record.
+	 */
+	update_wal_level(target_level, false);
+	write_wal_level_change(target_level);
+
+	END_CRIT_SECTION();
+
+	ereport(DEBUG1,
+			(errmsg("wal_level has been increased to \"%s\"",
+					get_wal_level_string(target_level))));
+}
+
+/*
+ * Callback function for decreasing WAL level to 'logical' from 'replica'.
+ */
+static void
+decrease_to_replica_action(void)
+{
+	/*
+	 * Invalidate all logical replication slots, terminating processes doing
+	 * logical decoding (and logical replication) too.
+	 */
+	InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
+									   0, InvalidOid,
+									   InvalidTransactionId);
+}
+
+/*
+ * Callback function for decreasing WAL level to 'replica' from 'minimal'.
+ *
+ * We terminate some functionalities that are not available at 'minimal' such
+ * as WAL archival and log shipping. The WAL level has already been decreased
+ * to WAL_LEVEL_STANDBY_INFO_LOGGING, we don't need to deal with the case
+ * where these aux processes are concurrently launched.
+ */
+static void
+decrease_to_minimal_action(void)
+{
+	/* shutdown archiver */
+	PgArchShutdown();
+
+	/* shutdown wal senders */
+	WalSndTerminate();
+	WalSndWaitStopping();
+
+	/* wait for currently running backups to finish */
+	wait_for_backup_finish();
+}
+
+/*
+ * Function to increase WAL level from 'logical' to 'replica' or from 'replica'
+ * to 'minimal'.
+ *
+ * 'target_level' is the destination WAL level, which must be 'replica' or
+ * 'minimal'. 'next_level' is the intermediate level for the transition, which
+ * must be STANDBY_INFO_LOGGING or LOGICAL_INFO_LOGGING.
+ */
+static void
+decrease_wal_level(WalLevel next_level, WalLevel target_level,
+				   wal_level_decrease_action_cb action_cb)
+{
+	Assert(next_level == WAL_LEVEL_STANDBY_INFO_LOGGING ||
+		   next_level == WAL_LEVEL_LOGICAL_INFO_LOGGING);
+	Assert(target_level == WAL_LEVEL_REPLICA ||
+		   target_level == WAL_LEVEL_MINIMAL);
+
+	/*
+	 * Decrease to the 'next_level' to prevent functionality that require the
+	 * current WAL level from being newly started while not affecting the
+	 * information contained in WAL records.
+	 */
+	update_wal_level(next_level, true);
+
+	/* Invoke the callback function, if specified */
+	if (action_cb)
+		action_cb();
+
+	START_CRIT_SECTION();
+
+	/*
+	 * Now increase to the 'target_level'.
+	 *
+	 * We must not decreasing WAL level before writing the WAL records,
+	 * because otherwise we would end up having insufficient information in
+	 * WAL records. Therefore, unlike increasing the WAL level, we write the
+	 * WAL recordand then set the shared WAL level.
+	 */
+	write_wal_level_change(target_level);
+	update_wal_level(target_level, false);
+
+	END_CRIT_SECTION();
+
+	/* Order all processed to disable logical information WAL-logging */
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_UPDATE_WAL_LOGGING_STATE));
+
+	ereport(DEBUG1,
+			(errmsg("wal_level has been decreased to \"%s\"",
+					get_wal_level_string(target_level))));
+
+}
+
+void
+BgTaskWalLevelChange(void)
+{
+	WalLevel	current;
+	WalLevel	target;
+
+	LWLockAcquire(WalLevelControlLock, LW_SHARED);
+	current = WalLevelCtl->current_wal_level;
+	target = WalLevelCtl->target_wal_level;
+	LWLockRelease(WalLevelControlLock);
+
+	/* The target WAL level must not be intermediate levels */
+	Assert(target == WAL_LEVEL_MINIMAL ||
+		   target == WAL_LEVEL_REPLICA ||
+		   target == WAL_LEVEL_LOGICAL);
+
+	/* No need to change wal_level, exit */
+	if (current == target)
+		return;
+
+	ereport(LOG,
+			(errmsg("changing wal_level from \"%s\" to \"%s\"",
+					get_wal_level_string(current), get_wal_level_string(target))));
+
+	if (target > current)
+	{
+		/* We're increasing the WAL level one by one */
+
+		if (current == WAL_LEVEL_MINIMAL ||
+			current == WAL_LEVEL_STANDBY_INFO_LOGGING)
+		{
+			/* increasing to 'replica' */
+			increase_wal_level(WAL_LEVEL_STANDBY_INFO_LOGGING,
+							   WAL_LEVEL_REPLICA,
+							   increase_to_replica_action);
+		}
+
+		if (target == WAL_LEVEL_LOGICAL)
+		{
+			/* increasing to 'logical' */
+			increase_wal_level(WAL_LEVEL_LOGICAL_INFO_LOGGING,
+							   WAL_LEVEL_LOGICAL,
+							   NULL);
+		}
+	}
+	else
+	{
+		/* We're decreasing the WAL level one by one */
+
+		if (current == WAL_LEVEL_LOGICAL)
+		{
+			/* decreasing to 'replica' */
+			decrease_wal_level(WAL_LEVEL_LOGICAL_INFO_LOGGING,
+							   WAL_LEVEL_REPLICA,
+							   decrease_to_replica_action);
+		}
+
+		if (target == WAL_LEVEL_MINIMAL)
+		{
+			/* decreasing to 'replica' */
+			decrease_wal_level(WAL_LEVEL_STANDBY_INFO_LOGGING,
+							   WAL_LEVEL_MINIMAL,
+							   decrease_to_minimal_action);
+		}
+	}
+
+	ereport(LOG,
+			(errmsg("successfully made wal_level \"%s\" effective",
+					get_wal_level_string(target))));
+}
+
+/*
+ * Find a string representation of wal_level.
+ *
+ * This function doesn't support deprecated wal_level values such
+ * as 'archive'.
+ */
+static const char *
+get_wal_level_string(WalLevel level)
+{
+	switch (level)
+	{
+		case WAL_LEVEL_MINIMAL:
+			return "minimal";
+		case WAL_LEVEL_STANDBY_INFO_LOGGING:
+			return "minimal-standby_info_logging";
+		case WAL_LEVEL_REPLICA:
+			return "replica";
+		case WAL_LEVEL_LOGICAL_INFO_LOGGING:
+			return "replica-logical_info_logging";
+		case WAL_LEVEL_LOGICAL:
+			return "logical";
+	}
+
+	return "???";
+}
+
+const char *
+show_wal_level(void)
+{
+	return get_wal_level_string(GetActiveWalLevel());
+}
+
+bool
+check_wal_level(int *newval, void **extra, GucSource source)
+{
+	/* Just accept the value when restoring state in a parallel worker */
+	if (InitializingParallelWorker)
+		return true;
+
+	if (!WalLevelInitialized)
+		return true;
+
+	if (wal_level != *newval)
+	{
+		if (RecoveryInProgress())
+		{
+			GUC_check_errmsg("cannot change \"wal_level\" during recovery");
+			return false;
+		}
+
+		if (WalLevelCtl->current_wal_level != WalLevelCtl->target_wal_level)
+		{
+			GUC_check_errmsg("cannot change \"wal_level\" while changing level is in-progress");
+			GUC_check_errdetail("\"wal_level\" is being changed to \"%s\"",
+								get_wal_level_string(WalLevelCtl->target_wal_level));
+			return false;
+		}
+	}
+
+	return true;
+}
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 473de6710d7..bc189570141 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1122,6 +1122,11 @@ validateRecoveryParameters(void)
 			ereport(WARNING,
 					(errmsg("specified neither \"primary_conninfo\" nor \"restore_command\""),
 					 errhint("The database server will regularly poll the pg_wal subdirectory to check for files placed there.")));
+
+		if (wal_level == WAL_LEVEL_MINIMAL)
+			ereport(FATAL,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("cannot enter standby mode with \"wal_level=minimal\"")));
 	}
 	else
 	{
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 150a768d16f..df927420a4b 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -921,7 +921,7 @@ CreatePublication(ParseState *pstate, CreatePublicationStmt *stmt)
 
 	InvokeObjectPostCreateHook(PublicationRelationId, puboid, 0);
 
-	if (wal_level != WAL_LEVEL_LOGICAL)
+	if (!XLogLogicalInfoActive())
 		ereport(WARNING,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("\"wal_level\" is insufficient to publish logical changes"),
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index b94f9cdff21..6da71f49b07 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -1377,6 +1377,12 @@ UpdateSharedMemoryConfig(void)
 	 */
 	UpdateFullPageWrites();
 
+	/*
+	 * If wal_level has been changed by SIGHUP, delegate changing wal_level
+	 * value to the wal level control worker.
+	 */
+	UpdateWalLevel();
+
 	elog(DEBUG2, "checkpointer updated shared memory configuration values");
 }
 
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 12ee815a626..d3001c6ebd3 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -291,6 +291,32 @@ PgArchWakeup(void)
 		SetLatch(&ProcGlobal->allProcs[arch_pgprocno].procLatch);
 }
 
+void
+PgArchShutdown(void)
+{
+	int			arch_pgprocno = PgArch->pgprocno;
+
+	if (arch_pgprocno == INVALID_PROC_NUMBER)
+		return;
+
+	/* terminate archiver */
+	kill(GetPGProcByNumber(arch_pgprocno)->pid, SIGUSR2);
+
+	/* and wait for it to exit */
+	for (;;)
+	{
+		CHECK_FOR_INTERRUPTS();
+
+		/* is it gone? */
+		if (arch_pgprocno == INVALID_PROC_NUMBER)
+			break;
+
+		WaitLatch(MyLatch,
+				  WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+				  100L, WAIT_EVENT_ARCHIVER_SHUTDOWN);
+		ResetLatch(MyLatch);
+	}
+}
 
 /* SIGUSR2 signal handler for archiver process */
 static void
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 108a1e5cfbe..3f9b907754f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -835,15 +835,6 @@ PostmasterMain(int argc, char *argv[])
 					 MaxConnections);
 		ExitPostmaster(1);
 	}
-	if (XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_MINIMAL)
-		ereport(ERROR,
-				(errmsg("WAL archival cannot be enabled when \"wal_level\" is \"minimal\"")));
-	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
-		ereport(ERROR,
-				(errmsg("WAL streaming (\"max_wal_senders\" > 0) requires \"wal_level\" to be \"replica\" or \"logical\"")));
-	if (summarize_wal && wal_level == WAL_LEVEL_MINIMAL)
-		ereport(ERROR,
-				(errmsg("WAL cannot be summarized when \"wal_level\" is \"minimal\"")));
 
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
@@ -1036,6 +1027,12 @@ PostmasterMain(int argc, char *argv[])
 	RemovePgTempFilesInDir(PG_TEMP_FILES_DIR, true, false);
 #endif
 
+	/*
+	 * Initialize logical info logging and logical decoding state. We must do
+	 * this after initializing GUCs as it reflects wal_level setting.
+	 */
+	InitializeWalLevelCtl();
+
 	/*
 	 * Forcibly remove the files signaling a standby promotion request.
 	 * Otherwise, the existence of those files triggers a promotion too early,
@@ -3285,7 +3282,7 @@ LaunchMissingBackgroundProcesses(void)
 	 * If WAL archiving is enabled always, we are allowed to start archiver
 	 * even during recovery.
 	 */
-	if (PgArchPMChild == NULL &&
+	if (GetActiveWalLevel() >= WAL_LEVEL_REPLICA && PgArchPMChild == NULL &&
 		((XLogArchivingActive() && pmState == PM_RUN) ||
 		 (XLogArchivingAlways() && (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY))) &&
 		PgArchCanRestart())
@@ -3331,7 +3328,8 @@ LaunchMissingBackgroundProcesses(void)
 	}
 
 	/* If we need to start a WAL summarizer, try to do that now */
-	if (summarize_wal && WalSummarizerPMChild == NULL &&
+	if (summarize_wal && GetActiveWalLevel() >= WAL_LEVEL_REPLICA &&
+		WalSummarizerPMChild == NULL &&
 		(pmState == PM_RUN || pmState == PM_HOT_STANDBY) &&
 		Shutdown <= SmartShutdown)
 		WalSummarizerPMChild = StartChildProcess(B_WAL_SUMMARIZER);
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 24d88f368d8..24782bd6fbd 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -178,6 +178,18 @@ xlog_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 				}
 				break;
 			}
+		case XLOG_WAL_LEVEL_CHANGE:
+			{
+				int			new_wal_level;
+
+				memcpy(&new_wal_level, XLogRecGetData(buf->record),
+					   sizeof(new_wal_level));
+
+				if (new_wal_level < WAL_LEVEL_LOGICAL)
+					ereport(ERROR,
+							(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+							 errmsg("logical decoding on standby requires \"wal_level\" >= \"logical\" on the primary")));
+			}
 		case XLOG_NOOP:
 		case XLOG_NEXTOID:
 		case XLOG_SWITCH:
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 8ea846bfc3b..8679b2d0ef7 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -115,7 +115,7 @@ CheckLogicalDecodingRequirements(void)
 	 * needs the same check.
 	 */
 
-	if (wal_level < WAL_LEVEL_LOGICAL)
+	if (!LogicalDecodingEnabled())
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("logical decoding requires \"wal_level\" >= \"logical\"")));
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 987857b9491..3cae6ab7d41 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -1038,14 +1038,14 @@ ValidateSlotSyncParams(int elevel)
 {
 	/*
 	 * Logical slot sync/creation requires wal_level >= logical.
-	 *
-	 * Since altering the wal_level requires a server restart, so error out in
-	 * this case regardless of elevel provided by caller.
 	 */
-	if (wal_level < WAL_LEVEL_LOGICAL)
-		ereport(ERROR,
+	if (!LogicalDecodingEnabled())
+	{
+		ereport(elevel,
 				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				errmsg("replication slot synchronization requires \"wal_level\" >= \"logical\""));
+		return false;
+	}
 
 	/*
 	 * A physical replication slot(primary_slot_name) is required on the
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fe5acd8b1fc..b092008ce56 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1403,7 +1403,7 @@ CheckSlotRequirements(void)
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("replication slots can only be used if \"max_replication_slots\" > 0")));
 
-	if (wal_level < WAL_LEVEL_REPLICA)
+	if (GetActiveWalLevel() < WAL_LEVEL_REPLICA)
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("replication slots can only be used if \"wal_level\" >= \"replica\"")));
@@ -2360,13 +2360,13 @@ RestoreSlotFromDisk(const char *name)
 	 * NB: Changing the requirements here also requires adapting
 	 * CheckSlotRequirements() and CheckLogicalDecodingRequirements().
 	 */
-	if (cp.slotdata.database != InvalidOid && wal_level < WAL_LEVEL_LOGICAL)
+	if (cp.slotdata.database != InvalidOid && GetActiveWalLevel() < WAL_LEVEL_LOGICAL)
 		ereport(FATAL,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("logical replication slot \"%s\" exists, but \"wal_level\" < \"logical\"",
 						NameStr(cp.slotdata.name)),
 				 errhint("Change \"wal_level\" to be \"logical\" or higher.")));
-	else if (wal_level < WAL_LEVEL_REPLICA)
+	else if (GetActiveWalLevel() < WAL_LEVEL_REPLICA)
 		ereport(FATAL,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("physical replication slot \"%s\" exists, but \"wal_level\" < \"replica\"",
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 446d10c1a7d..9d0b33cf833 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -277,6 +277,11 @@ static void WalSndSegmentOpen(XLogReaderState *state, XLogSegNo nextSegNo,
 void
 InitWalSender(void)
 {
+	if (GetActiveWalLevel() < WAL_LEVEL_REPLICA)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("WAL senders require \"wal_level\" to be \"replica\" or \"logical\"")));
+
 	am_cascading_walsender = RecoveryInProgress();
 
 	/* Create a per-walsender data structure in shared memory */
@@ -3733,6 +3738,30 @@ WalSndInitStopping(void)
 	}
 }
 
+/*
+ * Terminate all running walsenders. Unlike shutting down them using
+ * WalSndInitStopping, this function can be used to terminate walsenders
+ * even while normal backends are generating WAL records.
+ */
+void
+WalSndTerminate(void)
+{
+	for (int i = 0; i < max_wal_senders; i++)
+	{
+		WalSnd	   *walsnd = &WalSndCtl->walsnds[i];
+		pid_t		pid;
+
+		SpinLockAcquire(&walsnd->mutex);
+		pid = walsnd->pid;
+		SpinLockRelease(&walsnd->mutex);
+
+		if (pid == 0)
+			continue;
+
+		kill(pid, SIGUSR2);
+	}
+}
+
 /*
  * Wait that all the WAL senders have quit or reached the stopping state. This
  * is used by the checkpointer to control when the shutdown checkpoint can
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 7c3c74dcdaf..2e5f990afe3 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -149,6 +149,7 @@ CalculateShmemSize(int *num_semaphores)
 	size = add_size(size, InjectionPointShmemSize());
 	size = add_size(size, SlotSyncShmemSize());
 	size = add_size(size, BgTaskShmemSize());
+	size = add_size(size, WalLevelCtlShmemSize());
 
 	/* include additional requested shmem from preload libraries */
 	size = add_size(size, total_addin_request);
@@ -331,6 +332,7 @@ CreateOrAttachShmemStructs(void)
 	PgArchShmemInit();
 	ApplyLauncherShmemInit();
 	SlotSyncShmemInit();
+	WalLevelCtlShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7401b6e625e..b930036d3a3 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -573,6 +574,9 @@ ProcessProcSignalBarrier(void)
 					case PROCSIGNAL_BARRIER_SMGRRELEASE:
 						processed = ProcessBarrierSmgrRelease();
 						break;
+					case PROCSIGNAL_BARRIER_UPDATE_WAL_LOGGING_STATE:
+						processed = ProcessBarrierUpdateWalLoggingState();
+						break;
 				}
 
 				/*
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 5acb4508f85..761f301cf50 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -498,7 +498,7 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * seems OK, given that this kind of conflict should not normally be
 	 * reached, e.g. due to using a physical replication slot.
 	 */
-	if (wal_level >= WAL_LEVEL_LOGICAL && isCatalogRel)
+	if (GetActiveWalLevel() >= WAL_LEVEL_LOGICAL && isCatalogRel)
 		InvalidateObsoleteReplicationSlots(RS_INVAL_HORIZON, 0, locator.dbOid,
 										   snapshotConflictHorizon);
 }
@@ -1313,13 +1313,13 @@ LogStandbySnapshot(void)
 	 * record. Fortunately this routine isn't executed frequently, and it's
 	 * only a shared lock.
 	 */
-	if (wal_level < WAL_LEVEL_LOGICAL)
+	if (!XLogLogicalInfoActive())
 		LWLockRelease(ProcArrayLock);
 
 	recptr = LogCurrentRunningXacts(running);
 
 	/* Release lock if we kept it longer ... */
-	if (wal_level >= WAL_LEVEL_LOGICAL)
+	if (XLogLogicalInfoActive())
 		LWLockRelease(ProcArrayLock);
 
 	/* GetRunningTransactionData() acquired XidGenLock, we must release it */
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index fef6887eb43..6bdf585f1e5 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -52,6 +52,7 @@
 Section: ClassName - WaitEventActivity
 
 ARCHIVER_MAIN	"Waiting in main loop of archiver process."
+ARCHIVER_SHUTDOWN	"Waiting for archiver process to be terminated."
 AUTOVACUUM_MAIN	"Waiting in main loop of autovacuum launcher process."
 BGWRITER_HIBERNATE	"Waiting in background writer process, hibernating."
 BGWRITER_MAIN	"Waiting in main loop of background writer process."
@@ -106,6 +107,7 @@ APPEND_READY	"Waiting for subplan nodes of an <literal>Append</literal> plan nod
 ARCHIVE_CLEANUP_COMMAND	"Waiting for <xref linkend="guc-archive-cleanup-command"/> to complete."
 ARCHIVE_COMMAND	"Waiting for <xref linkend="guc-archive-command"/> to complete."
 BACKEND_TERMINATION	"Waiting for the termination of another backend."
+BACKUP_TERMINATION	"Waiting for all running backups to complete."
 BACKUP_WAIT_WAL_ARCHIVE	"Waiting for WAL files required for a backup to be successfully archived."
 BGWORKER_SHUTDOWN	"Waiting for background worker to shut down."
 BGWORKER_STARTUP	"Waiting for background worker to start up."
@@ -347,6 +349,7 @@ DSMRegistry	"Waiting to read or update the dynamic shared memory registry."
 InjectionPoint	"Waiting to read or update information related to injection points."
 SerialControl	"Waiting to read or update shared <filename>pg_serial</filename> state."
 BgTaskControl	"Waiting to read or update shared background task information."
+WalLevelControl	"Waiting to read or update wal_level control state."
 
 #
 # END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index d70a5be446a..97675d81c54 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -647,6 +647,9 @@ BaseInit(void)
 	/* Initialize lock manager's local structs */
 	InitLockManagerAccess();
 
+	/* Initialize WAL-logging state */
+	InitWalLoggingState();
+
 	/*
 	 * Initialize replication slots after pgstat. The exit hook might need to
 	 * drop ephemeral slots, which in turn triggers stats reporting.
diff --git a/src/backend/utils/misc/bgtask.c b/src/backend/utils/misc/bgtask.c
index ac67968b9a1..15645e65d1f 100644
--- a/src/backend/utils/misc/bgtask.c
+++ b/src/backend/utils/misc/bgtask.c
@@ -24,6 +24,7 @@
 
 #include "postgres.h"
 
+#include "access/xlog.h"
 #include "postmaster/bgworker.h"
 #include "postmaster/interrupt.h"
 #include "miscadmin.h"
@@ -60,8 +61,6 @@ typedef struct BgTaskCtlData
 }			BgTaskCtlData;
 static BgTaskCtlData * BgTaskCtl = NULL;
 
-static void dummy_bgtask_fn(void);
-
 static const struct
 {
 	const char *name;
@@ -69,9 +68,9 @@ static const struct
 }			InternalBgTasks[] =
 
 {
-	[BGTASK_TYPE_DUMMY] = {
-		.name = "dummy task",
-		.func = &dummy_bgtask_fn,
+	[BGTASK_TYPE_WAL_LEVEL_CHANGE] = {
+		.name = "online wal_level change",
+		.func = &BgTaskWalLevelChange,
 	},
 };
 
@@ -233,9 +232,3 @@ BgTaskWorkerDetach(int code, Datum arg)
 	BgTaskCtl->task_slots[type].in_progress = false;
 	LWLockRelease(BgTaskControlLock);
 }
-
-static void
-dummy_bgtask_fn(void)
-{
-	elog(LOG, "executed dummy background task");
-}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 42728189322..e277c881593 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -5091,13 +5091,13 @@ struct config_enum ConfigureNamesEnum[] =
 	},
 
 	{
-		{"wal_level", PGC_POSTMASTER, WAL_SETTINGS,
+		{"wal_level", PGC_SIGHUP, WAL_SETTINGS,
 			gettext_noop("Sets the level of information written to the WAL."),
 			NULL
 		},
 		&wal_level,
 		WAL_LEVEL_REPLICA, wal_level_options,
-		NULL, NULL, NULL
+		check_wal_level, NULL, show_wal_level
 	},
 
 	{
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index cf11ab3f2ee..61a5593869d 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -76,8 +76,12 @@ wal_level_str(WalLevel wal_level)
 	{
 		case WAL_LEVEL_MINIMAL:
 			return "minimal";
+		case WAL_LEVEL_STANDBY_INFO_LOGGING:
+			return "minimal-standby_info_logging";
 		case WAL_LEVEL_REPLICA:
 			return "replica";
+		case WAL_LEVEL_LOGICAL_INFO_LOGGING:
+			return "replica-logical_info_logging";
 		case WAL_LEVEL_LOGICAL:
 			return "logical";
 	}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 4411c1468ac..2788d20da5e 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -71,10 +71,38 @@ extern PGDLLIMPORT int XLogArchiveMode;
 /* WAL levels */
 typedef enum WalLevel
 {
+	/*
+	 * This WAL level corresponds to 'minimal', where we don't WAL-logging for
+	 * neither archival nor log-shipping.
+	 */
 	WAL_LEVEL_MINIMAL = 0,
+
+	/*
+	 * In this level, we enable WAL-logging information required only for
+	 * archival, log-shipping, and hot standby. However, these functionalities
+	 * are not available with this level.
+	 */
+	WAL_LEVEL_STANDBY_INFO_LOGGING,
+
+	/*
+	 * This WAL level corresponds to 'replica', where we allow archival, and,
+	 * log-shipping, and hot standby.
+	 */
 	WAL_LEVEL_REPLICA,
+
+	/*
+	 * In this level, we enable WAL-logging information required for logical
+	 * decoding. However, logical decoding is not available with this level.
+	 */
+	WAL_LEVEL_LOGICAL_INFO_LOGGING,
+
+	/*
+	 * This WAL level corresponds to 'logical', where we allow logical
+	 * decoding.
+	 */
 	WAL_LEVEL_LOGICAL,
 } WalLevel;
+extern bool WalLevelInitialized;
 
 /* Compression algorithms for WAL */
 typedef enum WalCompression
@@ -97,16 +125,16 @@ extern PGDLLIMPORT int wal_level;
 
 /* Is WAL archiving enabled (always or only while server is running normally)? */
 #define XLogArchivingActive() \
-	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode > ARCHIVE_MODE_OFF)
+	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || GetActiveWalLevel() >= WAL_LEVEL_REPLICA), XLogArchiveMode > ARCHIVE_MODE_OFF)
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
-	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
+	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || GetActiveWalLevel() >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
  * WAL-logging if we fsync() the data before committing instead?
  */
-#define XLogIsNeeded() (wal_level >= WAL_LEVEL_REPLICA)
+#define XLogIsNeeded() (StandbyInfoLoggingEnabled())
 
 /*
  * Is a full-page image needed for hint bit updates?
@@ -120,10 +148,10 @@ extern PGDLLIMPORT int wal_level;
 #define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
-#define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
+#define XLogStandbyInfoActive() (StandbyInfoLoggingEnabled())
 
 /* Do we need to WAL-log information required only for logical replication? */
-#define XLogLogicalInfoActive() (wal_level >= WAL_LEVEL_LOGICAL)
+#define XLogLogicalInfoActive() (LogicalInfoLoggingEnabled())
 
 #ifdef WAL_DEBUG
 extern PGDLLIMPORT bool XLOG_DEBUG;
@@ -270,6 +298,22 @@ extern void SetInstallXLogFileSegmentActive(void);
 extern bool IsInstallXLogFileSegmentActive(void);
 extern void XLogShutdownWalRcv(void);
 
+/*
+ * Routines used by xloglevel.c to control WAL-logging information.
+ */
+extern Size WalLevelCtlShmemSize(void);
+extern void WalLevelCtlShmemInit(void);
+extern void InitializeWalLevelCtl(void);
+extern void InitWalLoggingState(void);
+extern bool ProcessBarrierUpdateWalLoggingState(void);
+extern bool LogicalInfoLoggingEnabled(void);
+extern bool StandbyInfoLoggingEnabled(void);
+extern bool LogicalDecodingEnabled(void);
+extern int	GetActiveWalLevel(void);
+extern void UpdateWalLevelAfterRecovery(WalLevel level);
+extern void UpdateWalLevel(void);
+extern void BgTaskWalLevelChange(void);
+
 /*
  * Routines to start, stop, and get status of a base backup.
  */
@@ -297,6 +341,7 @@ extern void do_pg_backup_stop(BackupState *state, bool waitforarchive);
 extern void do_pg_abort_backup(int code, Datum arg);
 extern void register_persistent_abort_backup_handler(void);
 extern SessionBackupState get_backup_status(void);
+extern void wait_for_backup_finish(void);
 
 /* File path names (all relative to $PGDATA) */
 #define RECOVERY_SIGNAL_FILE	"recovery.signal"
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 3797f25b306..39a14c67f6a 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -80,6 +80,7 @@ typedef struct CheckPoint
 /* 0xC0 is used in Postgres 9.5-11 */
 #define XLOG_OVERWRITE_CONTRECORD		0xD0
 #define XLOG_CHECKPOINT_REDO			0xE0
+#define XLOG_WAL_LEVEL_CHANGE			0xF0
 
 
 /*
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 8fc6bfeec1b..73f2d13223d 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -31,6 +31,7 @@ extern void PgArchShmemInit(void);
 extern bool PgArchCanRestart(void);
 extern void PgArchiverMain(char *startup_data, size_t startup_data_len) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
+extern void PgArchShutdown(void);
 extern void PgArchForceDirScan(void);
 
 #endif							/* _PGARCH_H */
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index c3e8e191339..d6d426915e9 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -44,6 +44,7 @@ extern void WalSndSignals(void);
 extern Size WalSndShmemSize(void);
 extern void WalSndShmemInit(void);
 extern void WalSndWakeup(bool physical, bool logical);
+extern void WalSndTerminate(void);
 extern void WalSndInitStopping(void);
 extern void WalSndWaitStopping(void);
 extern void HandleWalSndInitStopping(void);
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index fef9f555310..686f8a45494 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -84,3 +84,4 @@ PG_LWLOCK(50, DSMRegistry)
 PG_LWLOCK(51, InjectionPoint)
 PG_LWLOCK(52, SerialControl)
 PG_LWLOCK(53, BgTaskControl)
+PG_LWLOCK(54, WalLevelControl)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed933..fc5b238ec20 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -54,6 +54,8 @@ typedef enum
 typedef enum
 {
 	PROCSIGNAL_BARRIER_SMGRRELEASE, /* ask smgr to close files */
+	PROCSIGNAL_BARRIER_UPDATE_WAL_LOGGING_STATE,	/* ask to update xlog info
+													 * state */
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/include/utils/bgtask.h b/src/include/utils/bgtask.h
index d9d89811c98..860da74f424 100644
--- a/src/include/utils/bgtask.h
+++ b/src/include/utils/bgtask.h
@@ -22,11 +22,11 @@
  */
 typedef enum
 {
-	BGTASK_TYPE_DUMMY = 0,
+	BGTASK_TYPE_WAL_LEVEL_CHANGE = 0,
 }			BgTaskType;
 
 /* The number of internal background tasks */
-#define NUM_INTERNAL_BGTASKS	BGTASK_TYPE_DUMMY + 1
+#define NUM_INTERNAL_BGTASKS 	BGTASK_TYPE_WAL_LEVEL_CHANGE + 1
 
 typedef void (*bgtask_fn_type) (void);
 
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 87999218d68..b8d36e0a9b8 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -169,6 +169,8 @@ extern bool check_wal_buffers(int *newval, void **extra, GucSource source);
 extern bool check_wal_consistency_checking(char **newval, void **extra,
 										   GucSource source);
 extern void assign_wal_consistency_checking(const char *newval, void *extra);
+extern const char *show_wal_level(void);
+extern bool check_wal_level(int *newval, void **extra, GucSource source);
 extern bool check_wal_segment_size(int *newval, void **extra, GucSource source);
 extern void assign_wal_sync_method(int new_wal_sync_method, void *extra);
 extern bool check_synchronized_standby_slots(char **newval, void **extra,
-- 
2.43.5

From 76aee1354bae13a29c7ffb53be6736a66fb3024c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 17 Feb 2025 11:52:09 -0800
Subject: [PATCH v3 2/3] Introduce bgtask, infrastructure to perform tasks in
 background.

---
 src/backend/postmaster/bgworker.c             |   4 +
 src/backend/storage/ipc/ipci.c                |   2 +
 .../utils/activity/wait_event_names.txt       |   1 +
 src/backend/utils/misc/Makefile               |   1 +
 src/backend/utils/misc/bgtask.c               | 241 ++++++++++++++++++
 src/backend/utils/misc/meson.build            |   1 +
 src/include/postmaster/bgworker_internals.h   |   8 +-
 src/include/storage/lwlocklist.h              |   1 +
 src/include/utils/bgtask.h                    |  38 +++
 9 files changed, 293 insertions(+), 4 deletions(-)
 create mode 100644 src/backend/utils/misc/bgtask.c
 create mode 100644 src/include/utils/bgtask.h

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 0f4389ad595..dc4af5ac904 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -29,6 +29,7 @@
 #include "storage/procsignal.h"
 #include "storage/shmem.h"
 #include "tcop/tcopprot.h"
+#include "utils/bgtask.h"
 #include "utils/ascii.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -147,6 +148,9 @@ static const struct
 	},
 	{
 		"TablesyncWorkerMain", TablesyncWorkerMain
+	},
+	{
+		"BgTaskWorkerMain", BgTaskWorkerMain
 	}
 };
 
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed70367..7c3c74dcdaf 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -148,6 +148,7 @@ CalculateShmemSize(int *num_semaphores)
 	size = add_size(size, WaitEventCustomShmemSize());
 	size = add_size(size, InjectionPointShmemSize());
 	size = add_size(size, SlotSyncShmemSize());
+	size = add_size(size, BgTaskShmemSize());
 
 	/* include additional requested shmem from preload libraries */
 	size = add_size(size, total_addin_request);
@@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void)
 	StatsShmemInit();
 	WaitEventCustomShmemInit();
 	InjectionPointShmemInit();
+	BgTaskShmemInit();
 }
 
 /*
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index e199f071628..fef6887eb43 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -346,6 +346,7 @@ WALSummarizer	"Waiting to read or update WAL summarization state."
 DSMRegistry	"Waiting to read or update the dynamic shared memory registry."
 InjectionPoint	"Waiting to read or update information related to injection points."
 SerialControl	"Waiting to read or update shared <filename>pg_serial</filename> state."
+BgTaskControl	"Waiting to read or update shared background task information."
 
 #
 # END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index b362ae43771..b68f263a396 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -15,6 +15,7 @@ include $(top_builddir)/src/Makefile.global
 override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
 
 OBJS = \
+	bgtask.o \
 	conffiles.o \
 	guc.o \
 	guc-file.o \
diff --git a/src/backend/utils/misc/bgtask.c b/src/backend/utils/misc/bgtask.c
new file mode 100644
index 00000000000..ac67968b9a1
--- /dev/null
+++ b/src/backend/utils/misc/bgtask.c
@@ -0,0 +1,241 @@
+/*--------------------------------------------------------------------
+ * bgtask.c
+ *		Infrastructure for executing internal task in background.
+ *
+ * The bgtask infrastructure can be used to offload internal tasks that are
+ * potentially costly to a background worker called "bgtask worker". The
+ * internal tasks that can be requested to bgtask workers are pre-defined
+ * (see BgTaskType and InternalBgTasks[]). Since the bgtask workers use
+ * reserved background worker slots it's guaranteed that they can be
+ * launched successfully.
+ *
+ * XXX: This files contains the minimal functionality to just request
+ * tasks to bgtask workers. Bgtask workers are never restarted on failure.
+ * These behavior may need to be changed.
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/misc/bgtask.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "postmaster/bgworker.h"
+#include "postmaster/interrupt.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "storage/lwlock.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/bgtask.h"
+
+/* Struct for single background task status */
+typedef struct BgTaskSlot
+{
+	Oid			dboid;
+	pid_t		worker_pid;
+
+	/*
+	 * True if this task is requested. This flag is set by the requester, and
+	 * cleared by the worker assigned to the task.
+	 */
+	bool		in_progress;
+}			BgTaskSlot;
+
+/* Control struct for internal background tasks */
+typedef struct BgTaskCtlData
+{
+	/* NUM_INTERNAL_BGTASKS */
+	int			total_tasks;
+
+	/*
+	 * Array of internal task status. The index of this array corresponds to
+	 * BgTaskType number.
+	 */
+	BgTaskSlot	task_slots[FLEXIBLE_ARRAY_MEMBER];
+}			BgTaskCtlData;
+static BgTaskCtlData * BgTaskCtl = NULL;
+
+static void dummy_bgtask_fn(void);
+
+static const struct
+{
+	const char *name;
+	bgtask_fn_type func;
+}			InternalBgTasks[] =
+
+{
+	[BGTASK_TYPE_DUMMY] = {
+		.name = "dummy task",
+		.func = &dummy_bgtask_fn,
+	},
+};
+
+static void BgTaskWorkerDetach(int code, Datum arg);
+
+Size
+BgTaskShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(BgTaskCtlData, task_slots);
+	size = add_size(size, mul_size(NUM_INTERNAL_BGTASKS,
+								   sizeof(BgTaskSlot)));
+
+	return size;
+}
+
+void
+BgTaskShmemInit(void)
+{
+	bool		found;
+
+	BgTaskCtl = ShmemInitStruct("Background Task Control Data",
+								BgTaskShmemSize(),
+								&found);
+
+	if (!found)
+	{
+		BgTaskCtl->total_tasks = NUM_INTERNAL_BGTASKS;
+		MemSet(BgTaskCtl->task_slots, 0,
+			   sizeof(BgTaskSlot) * BgTaskCtl->total_tasks);
+	}
+}
+
+/*
+ * Request the work on the given database to bgtask worker.
+ *
+ * Launching a bgtask worker must be done successfully because we allocates enough
+ * reserved background workers for bgtask workers.
+ */
+void
+BgTaskRequest(BgTaskType type, Oid dboid)
+{
+	BgTaskSlot *task;
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	pid_t		pid;
+
+	if (type > NUM_INTERNAL_BGTASKS)
+		elog(ERROR, "invalid task type %d", type);
+
+	LWLockAcquire(BgTaskControlLock, LW_EXCLUSIVE);
+
+	task = &(BgTaskCtl->task_slots[type]);
+
+	/* Return if the specified task is already in-progress */
+	if (task->in_progress)
+	{
+		LWLockRelease(BgTaskControlLock);
+		return;
+	}
+
+	/* Mark this task in-progress */
+	task->in_progress = true;
+	task->dboid = dboid;
+	LWLockRelease(BgTaskControlLock);
+
+	ereport(DEBUG1,
+			(errmsg("starting bgtask worker for task %d on database oid %u",
+					type, dboid)));
+
+	/* Register the new dynamic worker */
+	MemSet(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION |
+		BGWORKER_CLASS_RESERVED;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, MAXPGPATH, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "BgTaskWorkerMain");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "background task worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;	/* XXX need to be configurable
+												 * by the requester? */
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(type);
+
+	if (RegisterDynamicBackgroundWorker(&bgw, &bgw_handle) &&
+		WaitForBackgroundWorkerStartup(bgw_handle, &pid))
+	{
+		/*
+		 * The dynamic worker registration must be done successfully since
+		 * bgtask workers use the reserved bgworker slots.
+		 */
+		elog(ERROR, "could not register a background task worker for %d", type);
+	}
+}
+
+void
+BgTaskWorkerMain(Datum main_arg)
+{
+	BgTaskType	type = DatumGetInt32(main_arg);
+	BgTaskSlot *task;
+	Oid			dboid;
+
+	Assert(type < NUM_INTERNAL_BGTASKS);
+
+	on_shmem_exit(BgTaskWorkerDetach, (Datum) main_arg);
+
+	LWLockAcquire(BgTaskControlLock, LW_EXCLUSIVE);
+	task = &(BgTaskCtl->task_slots[type]);
+
+	/* Attach to the task slot */
+	task->worker_pid = MyProcPid;
+	dboid = task->dboid;
+
+	LWLockRelease(BgTaskControlLock);
+
+	if (OidIsValid(dboid))
+		ereport(LOG,
+				(errmsg("bgtask worker for task \"%s\" on database %u has started ",
+						InternalBgTasks[type].name, dboid)));
+	else
+		ereport(LOG,
+				(errmsg("bgtask worker for task \"%s\" has started ",
+						InternalBgTasks[type].name)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, 0);
+
+	pqsignal(SIGHUP, SignalHandlerForConfigReload);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	PG_TRY();
+	{
+		/* Execute the task */
+		(void) InternalBgTasks[type].func();
+	}
+	PG_FINALLY();
+	{
+		/* Ensure to mark this task as completed */
+		LWLockAcquire(BgTaskControlLock, LW_EXCLUSIVE);
+		task->in_progress = false;
+		LWLockRelease(BgTaskControlLock);
+	}
+	PG_END_TRY();
+
+	proc_exit(0);
+}
+
+/*
+ * Callback function for on_shmem_exit().
+ */
+static void
+BgTaskWorkerDetach(int code, Datum arg)
+{
+	BgTaskType	type = DatumGetInt32(arg);
+
+	LWLockAcquire(BgTaskControlLock, LW_EXCLUSIVE);
+	BgTaskCtl->task_slots[type].worker_pid = InvalidPid;
+	BgTaskCtl->task_slots[type].in_progress = false;
+	LWLockRelease(BgTaskControlLock);
+}
+
+static void
+dummy_bgtask_fn(void)
+{
+	elog(LOG, "executed dummy background task");
+}
diff --git a/src/backend/utils/misc/meson.build b/src/backend/utils/misc/meson.build
index 9e389a00d05..950f7cd9383 100644
--- a/src/backend/utils/misc/meson.build
+++ b/src/backend/utils/misc/meson.build
@@ -1,6 +1,7 @@
 # Copyright (c) 2022-2025, PostgreSQL Global Development Group
 
 backend_sources += files(
+  'bgtask.c',
   'conffiles.c',
   'guc.c',
   'guc_funcs.c',
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index c26ec6b2387..cb6e0e3e75e 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -15,6 +15,7 @@
 #include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "postmaster/bgworker.h"
+#include "utils/bgtask.h"
 
 /* GUC options */
 
@@ -26,11 +27,10 @@
 /*
  * The number of background workers reserved for ones registered with
  * BGWORKER_CLASS_RESERVED flag. We allocate total background worker
- * slots for max_worker_process plus this number.
- *
- * XXX: we don't have any reserved slots for now.
+ * slots for max_worker_process plus this number. Currently, we allocate
+ * reserved slots only for internal background task purposes.
  */
-#define BGWORKER_NUM_RESERVED_WORKERS			0
+#define BGWORKER_NUM_RESERVED_WORKERS			NUM_INTERNAL_BGTASKS
 
 /*
  * List of background workers, private to postmaster.
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index cf565452382..fef9f555310 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -83,3 +83,4 @@ PG_LWLOCK(49, WALSummarizer)
 PG_LWLOCK(50, DSMRegistry)
 PG_LWLOCK(51, InjectionPoint)
 PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, BgTaskControl)
diff --git a/src/include/utils/bgtask.h b/src/include/utils/bgtask.h
new file mode 100644
index 00000000000..d9d89811c98
--- /dev/null
+++ b/src/include/utils/bgtask.h
@@ -0,0 +1,38 @@
+/*-------------------------------------------------------------------------
+ *
+ * bgtask.h
+ *	  Definitions of infrastructure for executing internal task in background.
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/bgtask.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BGTASK_H
+#define BGTASK_H
+
+/*
+ * The type of internal background tasks.
+ *
+ * When adding a new task, please change InternalBgTask array in bgtask.c
+ * too.
+ */
+typedef enum
+{
+	BGTASK_TYPE_DUMMY = 0,
+}			BgTaskType;
+
+/* The number of internal background tasks */
+#define NUM_INTERNAL_BGTASKS	BGTASK_TYPE_DUMMY + 1
+
+typedef void (*bgtask_fn_type) (void);
+
+extern Size BgTaskShmemSize(void);
+extern void BgTaskShmemInit(void);
+extern void BgTaskRequest(BgTaskType type, Oid dboid);
+extern void BgTaskWorkerMain(Datum main_arg);
+
+#endif							/* BTASK_H */
-- 
2.43.5

From 0256847b0645743ce9670c05834ea8fb62a3fc5d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 17 Feb 2025 11:50:36 -0800
Subject: [PATCH v3 1/3] Introduce reserved background worker slots.

---
 src/backend/bootstrap/bootstrap.c           |   2 +
 src/backend/postmaster/bgworker.c           | 125 +++++++++++++++-----
 src/backend/postmaster/pmchild.c            |   3 +-
 src/backend/postmaster/postmaster.c         |   6 +
 src/backend/storage/lmgr/proc.c             |   2 +-
 src/backend/tcop/postgres.c                 |   4 +
 src/backend/utils/init/globals.c            |   1 +
 src/backend/utils/init/postinit.c           |   3 +-
 src/include/miscadmin.h                     |   1 +
 src/include/postmaster/bgworker.h           |   7 +-
 src/include/postmaster/bgworker_internals.h |  10 ++
 11 files changed, 129 insertions(+), 35 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 6db864892d0..4d886b22a97 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -31,6 +31,7 @@
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "pg_getopt.h"
+#include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
 #include "storage/bufpage.h"
 #include "storage/ipc.h"
@@ -322,6 +323,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
 	SetProcessingMode(BootstrapProcessing);
 	IgnoreSystemIndexes = true;
 
+	InitializeMaxBgWorkers();
 	InitializeMaxBackends();
 
 	/*
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index b288915cec8..0f4389ad595 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -81,21 +81,36 @@ typedef struct BackgroundWorkerSlot
 } BackgroundWorkerSlot;
 
 /*
- * In order to limit the total number of parallel workers (according to
- * max_parallel_workers GUC), we maintain the number of active parallel
- * workers.  Since the postmaster cannot take locks, two variables are used for
- * this purpose: the number of registered parallel workers (modified by the
- * backends, protected by BackgroundWorkerLock) and the number of terminated
- * parallel workers (modified only by the postmaster, lockless).  The active
- * number of parallel workers is the number of registered workers minus the
- * terminated ones.  These counters can of course overflow, but it's not
- * important here since the subtraction will still give the right number.
+ * Struct holding information about background workers.
  */
 typedef struct BackgroundWorkerArray
 {
 	int			total_slots;
+
+	/*
+	 * In order to limit the total number of parallel workers (according to
+	 * max_parallel_workers GUC), we maintain the number of active parallel
+	 * workers.  Since the postmaster cannot take locks, two variables are
+	 * used for this purpose: the number of registered parallel workers
+	 * (modified by the backends, protected by BackgroundWorkerLock) and the
+	 * number of terminated parallel workers (modified only by the postmaster,
+	 * lockless).  The active number of parallel workers is the number of
+	 * registered workers minus the terminated ones.  These counters can of
+	 * course overflow, but it's not important here since the subtraction will
+	 * still give the right number.
+	 */
 	uint32		parallel_register_count;
 	uint32		parallel_terminate_count;
+
+	/*
+	 * Similar to parallel_register_count and parallel_terminate_count, but
+	 * these numbers tracks the total number of registration and termination.
+	 * The total active number of background workers is (register_count -
+	 * terminate_count).
+	 */
+	uint32		register_count;
+	uint32		terminate_count;
+
 	BackgroundWorkerSlot slot[FLEXIBLE_ARRAY_MEMBER];
 } BackgroundWorkerArray;
 
@@ -149,7 +164,7 @@ BackgroundWorkerShmemSize(void)
 
 	/* Array of workers is variably sized. */
 	size = offsetof(BackgroundWorkerArray, slot);
-	size = add_size(size, mul_size(max_worker_processes,
+	size = add_size(size, mul_size(MaxBgWorkers,
 								   sizeof(BackgroundWorkerSlot)));
 
 	return size;
@@ -171,7 +186,9 @@ BackgroundWorkerShmemInit(void)
 		dlist_iter	iter;
 		int			slotno = 0;
 
-		BackgroundWorkerData->total_slots = max_worker_processes;
+		BackgroundWorkerData->total_slots = MaxBgWorkers;
+		BackgroundWorkerData->register_count = 0;
+		BackgroundWorkerData->terminate_count = 0;
 		BackgroundWorkerData->parallel_register_count = 0;
 		BackgroundWorkerData->parallel_terminate_count = 0;
 
@@ -187,7 +204,7 @@ BackgroundWorkerShmemInit(void)
 			RegisteredBgWorker *rw;
 
 			rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
-			Assert(slotno < max_worker_processes);
+			Assert(slotno < MaxBgWorkers);
 			slot->in_use = true;
 			slot->terminate = false;
 			slot->pid = InvalidPid;
@@ -196,12 +213,13 @@ BackgroundWorkerShmemInit(void)
 			rw->rw_worker.bgw_notify_pid = 0;	/* might be reinit after crash */
 			memcpy(&slot->worker, &rw->rw_worker, sizeof(BackgroundWorker));
 			++slotno;
+			++BackgroundWorkerData->register_count;
 		}
 
 		/*
 		 * Mark any remaining slots as not in use.
 		 */
-		while (slotno < max_worker_processes)
+		while (slotno < MaxBgWorkers)
 		{
 			BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];
 
@@ -213,6 +231,15 @@ BackgroundWorkerShmemInit(void)
 		Assert(found);
 }
 
+/*
+ * Initialize MaxBgWorkers value form config options.
+ */
+void
+InitializeMaxBgWorkers(void)
+{
+	MaxBgWorkers = max_worker_processes + BGWORKER_NUM_RESERVED_WORKERS;
+}
+
 /*
  * Search the postmaster's backend-private list of RegisteredBgWorker objects
  * for the one that maps to the given slot number.
@@ -249,16 +276,16 @@ BackgroundWorkerStateChange(bool allow_new_workers)
 
 	/*
 	 * The total number of slots stored in shared memory should match our
-	 * notion of max_worker_processes.  If it does not, something is very
-	 * wrong.  Further down, we always refer to this value as
-	 * max_worker_processes, in case shared memory gets corrupted while we're
-	 * looping.
+	 * notion of MaxBgWorkers.  If it does not, something is very wrong.
+	 * Further down, we always refer to this value as max_worker_processes, in
+	 * case shared memory gets corrupted while we're looping.
 	 */
-	if (max_worker_processes != BackgroundWorkerData->total_slots)
+	if (MaxBgWorkers != BackgroundWorkerData->total_slots)
 	{
 		ereport(LOG,
-				(errmsg("inconsistent background worker state (\"max_worker_processes\"=%d, total slots=%d)",
+				(errmsg("inconsistent background worker state (\"max_worker_processes\"=%d, reserved slot=%d, total slots=%d)",
 						max_worker_processes,
+						BGWORKER_NUM_RESERVED_WORKERS,
 						BackgroundWorkerData->total_slots)));
 		return;
 	}
@@ -267,7 +294,7 @@ BackgroundWorkerStateChange(bool allow_new_workers)
 	 * Iterate through slots, looking for newly-registered workers or workers
 	 * who must die.
 	 */
-	for (slotno = 0; slotno < max_worker_processes; ++slotno)
+	for (slotno = 0; slotno < MaxBgWorkers; ++slotno)
 	{
 		BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];
 		RegisteredBgWorker *rw;
@@ -325,10 +352,11 @@ BackgroundWorkerStateChange(bool allow_new_workers)
 
 			/*
 			 * We need a memory barrier here to make sure that the load of
-			 * bgw_notify_pid and the update of parallel_terminate_count
-			 * complete before the store to in_use.
+			 * bgw_notify_pid and the update of terminate_count and
+			 * parallel_terminate_count complete before the store to in_use.
 			 */
 			notify_pid = slot->worker.bgw_notify_pid;
+			BackgroundWorkerData->terminate_count++;
 			if ((slot->worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
 				BackgroundWorkerData->parallel_terminate_count++;
 			slot->pid = 0;
@@ -430,14 +458,16 @@ ForgetBackgroundWorker(RegisteredBgWorker *rw)
 {
 	BackgroundWorkerSlot *slot;
 
-	Assert(rw->rw_shmem_slot < max_worker_processes);
+	Assert(rw->rw_shmem_slot < MaxBgWorkers);
 	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 	Assert(slot->in_use);
 
 	/*
 	 * We need a memory barrier here to make sure that the update of
-	 * parallel_terminate_count completes before the store to in_use.
+	 * terminate_count and parallel_terminate_count completes before the store
+	 * to in_use.
 	 */
+	BackgroundWorkerData->terminate_count++;
 	if ((rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
 		BackgroundWorkerData->parallel_terminate_count++;
 
@@ -462,7 +492,7 @@ ReportBackgroundWorkerPID(RegisteredBgWorker *rw)
 {
 	BackgroundWorkerSlot *slot;
 
-	Assert(rw->rw_shmem_slot < max_worker_processes);
+	Assert(rw->rw_shmem_slot < MaxBgWorkers);
 	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 	slot->pid = rw->rw_pid;
 
@@ -485,7 +515,7 @@ ReportBackgroundWorkerExit(RegisteredBgWorker *rw)
 	BackgroundWorkerSlot *slot;
 	int			notify_pid;
 
-	Assert(rw->rw_shmem_slot < max_worker_processes);
+	Assert(rw->rw_shmem_slot < MaxBgWorkers);
 	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 	slot->pid = rw->rw_pid;
 	notify_pid = rw->rw_worker.bgw_notify_pid;
@@ -548,7 +578,7 @@ ForgetUnstartedBackgroundWorkers(void)
 		BackgroundWorkerSlot *slot;
 
 		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
-		Assert(rw->rw_shmem_slot < max_worker_processes);
+		Assert(rw->rw_shmem_slot < MaxBgWorkers);
 		slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 
 		/* If it's not yet started, and there's someone waiting ... */
@@ -939,6 +969,7 @@ void
 RegisterBackgroundWorker(BackgroundWorker *worker)
 {
 	RegisteredBgWorker *rw;
+	bool		use_reserved_slot;
 	static int	numworkers = 0;
 
 	/*
@@ -990,17 +1021,34 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
 		return;
 	}
 
+	use_reserved_slot = (worker->bgw_flags & BGWORKER_CLASS_RESERVED) != 0;
+
 	/*
 	 * Enforce maximum number of workers.  Note this is overly restrictive: we
 	 * could allow more non-shmem-connected workers, because these don't count
 	 * towards the MAX_BACKENDS limit elsewhere.  For now, it doesn't seem
 	 * important to relax this restriction.
 	 */
-	if (++numworkers > max_worker_processes)
+	numworkers++;
+	if (!use_reserved_slot &&
+		(MaxBgWorkers - numworkers) <= BGWORKER_NUM_RESERVED_WORKERS)
+	{
+		ereport(LOG,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("too many background workers max %d nworkers %d n_resreved %d",
+						MaxBgWorkers, numworkers, BGWORKER_NUM_RESERVED_WORKERS),
+				 errdetail_plural("Up to %d background worker can be registered with the current settings.",
+								  "Up to %d background workers can be registered with the current settings.",
+								  max_worker_processes,
+								  max_worker_processes),
+				 errhint("Consider increasing the configuration parameter \"%s\".", "max_worker_processes")));
+		return;
+	}
+	else if (numworkers > MaxBgWorkers)
 	{
 		ereport(LOG,
 				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
-				 errmsg("too many background workers"),
+				 errmsg("remaining background workers are reserved"),
 				 errdetail_plural("Up to %d background worker can be registered with the current settings.",
 								  "Up to %d background workers can be registered with the current settings.",
 								  max_worker_processes,
@@ -1048,6 +1096,7 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
 	int			slotno;
 	bool		success = false;
 	bool		parallel;
+	bool		use_reserved_slot;
 	uint64		generation = 0;
 
 	/*
@@ -1065,6 +1114,7 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
 		return false;
 
 	parallel = (worker->bgw_flags & BGWORKER_CLASS_PARALLEL) != 0;
+	use_reserved_slot = (worker->bgw_flags & BGWORKER_CLASS_RESERVED) != 0;
 
 	LWLockAcquire(BackgroundWorkerLock, LW_EXCLUSIVE);
 
@@ -1088,6 +1138,18 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
 		return false;
 	}
 
+	if (!use_reserved_slot &&
+		(MaxBgWorkers -
+		 (BackgroundWorkerData->register_count - BackgroundWorkerData->terminate_count)) <=
+		BGWORKER_NUM_RESERVED_WORKERS)
+	{
+		Assert(BackgroundWorkerData->register_count -
+			   BackgroundWorkerData->terminate_count <=
+			   MAX_PARALLEL_WORKER_LIMIT);
+		LWLockRelease(BackgroundWorkerLock);
+		return false;
+	}
+
 	/*
 	 * Look for an unused slot.  If we find one, grab it.
 	 */
@@ -1102,6 +1164,7 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
 			slot->generation++;
 			slot->terminate = false;
 			generation = slot->generation;
+			BackgroundWorkerData->register_count++;
 			if (parallel)
 				BackgroundWorkerData->parallel_register_count++;
 
@@ -1159,7 +1222,7 @@ GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp)
 	BackgroundWorkerSlot *slot;
 	pid_t		pid;
 
-	Assert(handle->slot < max_worker_processes);
+	Assert(handle->slot < MaxBgWorkers);
 	slot = &BackgroundWorkerData->slot[handle->slot];
 
 	/*
@@ -1298,7 +1361,7 @@ TerminateBackgroundWorker(BackgroundWorkerHandle *handle)
 	BackgroundWorkerSlot *slot;
 	bool		signal_postmaster = false;
 
-	Assert(handle->slot < max_worker_processes);
+	Assert(handle->slot < MaxBgWorkers);
 	slot = &BackgroundWorkerData->slot[handle->slot];
 
 	/* Set terminate flag in shared memory, unless slot has been reused. */
diff --git a/src/backend/postmaster/pmchild.c b/src/backend/postmaster/pmchild.c
index 0d473226c3a..515a9fda3af 100644
--- a/src/backend/postmaster/pmchild.c
+++ b/src/backend/postmaster/pmchild.c
@@ -33,6 +33,7 @@
 
 #include "miscadmin.h"
 #include "postmaster/autovacuum.h"
+#include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
 #include "replication/walsender.h"
 #include "storage/pmsignal.h"
@@ -100,7 +101,7 @@ InitPostmasterChildSlots(void)
 	pmchild_pools[B_BACKEND].size = 2 * (MaxConnections + max_wal_senders);
 
 	pmchild_pools[B_AUTOVAC_WORKER].size = autovacuum_worker_slots;
-	pmchild_pools[B_BG_WORKER].size = max_worker_processes;
+	pmchild_pools[B_BG_WORKER].size = max_worker_processes + BGWORKER_NUM_RESERVED_WORKERS;
 
 	/*
 	 * There can be only one of each of these running at a time.  They each
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index bb22b13adef..108a1e5cfbe 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -910,6 +910,12 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	LocalProcessControlFile(false);
 
+	/*
+	 * Initialize MaxBgWorker. This should be called after initializing GUC
+	 * before any chances to register background workers.
+	 */
+	InitializeMaxBgWorkers();
+
 	/*
 	 * Register the apply launcher.  It's probably a good idea to call this
 	 * before any modules had a chance to take the background worker slots.
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 49204f91a20..33a5b062198 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -290,7 +290,7 @@ InitProcGlobal(void)
 			dlist_push_tail(&ProcGlobal->autovacFreeProcs, &proc->links);
 			proc->procgloballist = &ProcGlobal->autovacFreeProcs;
 		}
-		else if (i < MaxConnections + autovacuum_worker_slots + NUM_SPECIAL_WORKER_PROCS + max_worker_processes)
+		else if (i < MaxConnections + autovacuum_worker_slots + NUM_SPECIAL_WORKER_PROCS + MaxBgWorkers)
 		{
 			/* PGPROC for bgworker, add to bgworkerFreeProcs list */
 			dlist_push_tail(&ProcGlobal->bgworkerFreeProcs, &proc->links);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1149d89d7a1..7ae12517e8b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -53,6 +53,7 @@
 #include "pg_getopt.h"
 #include "pg_trace.h"
 #include "pgstat.h"
+#include "postmaster/bgworker_internals.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -4064,6 +4065,9 @@ PostgresSingleUserMain(int argc, char *argv[],
 	/* read control file (error checking and contains config ) */
 	LocalProcessControlFile(false);
 
+	/* Initialize MaxBgWorkers */
+	InitializeMaxBgWorkers();
+
 	/*
 	 * process any libraries that should be preloaded at postmaster start
 	 */
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdaef..8bccc13684d 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -143,6 +143,7 @@ int			MaxConnections = 100;
 int			max_worker_processes = 8;
 int			max_parallel_workers = 8;
 int			MaxBackends = 0;
+int			MaxBgWorkers = 0;
 
 /* GUC parameters for vacuum */
 int			VacuumBufferUsageLimit = 2048;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 01bb6a410cb..d70a5be446a 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -545,10 +545,11 @@ void
 InitializeMaxBackends(void)
 {
 	Assert(MaxBackends == 0);
+	Assert(MaxBgWorkers > 0);
 
 	/* Note that this does not include "auxiliary" processes */
 	MaxBackends = MaxConnections + autovacuum_worker_slots +
-		max_worker_processes + max_wal_senders + NUM_SPECIAL_WORKER_PROCS;
+		MaxBgWorkers + max_wal_senders + NUM_SPECIAL_WORKER_PROCS;
 
 	if (MaxBackends > MAX_BACKENDS)
 		ereport(ERROR,
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index a2b63495eec..2885357cf4d 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -174,6 +174,7 @@ extern PGDLLIMPORT int data_directory_mode;
 
 extern PGDLLIMPORT int NBuffers;
 extern PGDLLIMPORT int MaxBackends;
+extern PGDLLIMPORT int MaxBgWorkers;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
diff --git a/src/include/postmaster/bgworker.h b/src/include/postmaster/bgworker.h
index 058667a47a0..9ff12ba1e57 100644
--- a/src/include/postmaster/bgworker.h
+++ b/src/include/postmaster/bgworker.h
@@ -66,8 +66,13 @@
  * background workers should not use this class.
  */
 #define BGWORKER_CLASS_PARALLEL					0x0010
-/* add additional bgworker classes here */
 
+/*
+ * This class is used for the worker to use a reserved background worker
+ * slots from the pool of BGWORKER_NUM_RESERVED_WORKERS workers.
+ */
+#define BGWORKER_CLASS_RESERVED					0x0020
+/* add additional bgworker classes here */
 
 typedef void (*bgworker_main_type) (Datum main_arg);
 
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index 092b1610663..c26ec6b2387 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -23,6 +23,15 @@
  */
 #define MAX_PARALLEL_WORKER_LIMIT 1024
 
+/*
+ * The number of background workers reserved for ones registered with
+ * BGWORKER_CLASS_RESERVED flag. We allocate total background worker
+ * slots for max_worker_process plus this number.
+ *
+ * XXX: we don't have any reserved slots for now.
+ */
+#define BGWORKER_NUM_RESERVED_WORKERS			0
+
 /*
  * List of background workers, private to postmaster.
  *
@@ -43,6 +52,7 @@ extern PGDLLIMPORT dlist_head BackgroundWorkerList;
 
 extern Size BackgroundWorkerShmemSize(void);
 extern void BackgroundWorkerShmemInit(void);
+extern void InitializeMaxBgWorkers(void);
 extern void BackgroundWorkerStateChange(bool allow_new_workers);
 extern void ForgetBackgroundWorker(RegisteredBgWorker *rw);
 extern void ReportBackgroundWorkerPID(RegisteredBgWorker *rw);
-- 
2.43.5

Reply via email to