Dear All,

Thank you for the attention to the patch. I updated a patch with a better
solution for the master branch which can be easily backported to the other
branches as we agree on the final solution.

Two tests are introduced which are based on Tomas Vondra's test for logical 
slots
with injection points from the discussion (patches: [1], [2], [3]). Tests
are implemented as module tests in src/test/modules/test_replslot_required_lsn
directory. I slightly modified the original test for logical slots and created a
new simpler test for physical slots without any additional injection points.

I prepared a new solution (patch [4]) which is also based on Tomas Vondra's
proposal. With a fresh eye, I realized that it can fix the issue as well. It is
easy and less invasive to implement. The new solution differs from my original
solution: it is backward compatible (doesn't require any changes in 
ReplicationSlot
structure). My original solution can be backward compatible as well if to
allocate flushed_restart_lsn in a separate array in shmem, not in the
ReplicationSlot structure, but I believe the new solution is the better one. If
you still think that my previous solution is the better (I don't think so), I
will prepare a backward compatible patch with my previous solution.

I also proposed one more commit (patch [5]) which removes unnecessary calls of
ReplicationSlotsComputeRequiredLSN function which seems to be redundant. This
function updates the oldest required LSN for slots and it is called every time
when slots' restart_lsn is changed. Once, we use the oldest required LSN in
CreateCheckPoint/CreateRestartPoint to remove old WAL segments, I believe, there
is no need to calculate the oldest value immediately when the slot is advancing
and in other cases when restart_lsn is changed. It may affect on
GetWALAvailability function because the oldest required LSN will be not up to
date, but this function seems to be used in the system view
pg_get_replication_slots and doesn't affect the logic of old WAL segments
removal. I also have some doubts concerning advancing of logical replication
slots: the call of ReplicationSlotsComputeRequiredLSN was removed. Not sure, how
it can affect on ReplicationSlotsComputeRequiredXmin. This commit is not
necessary for the fix but I think it is worth to consider. It may be dropped or
applied only to the master branch.

This patch can be easily backported to the major release branches. I will
quickly prepare the patches for the major releases as we agree on the final
solution.

I apologize for such too late change in patch when it is already on commitfest.
I'm not well experienced yet in the internals of PostgreSQL at the moment,
sometimes the better solution needs some time to grow. In doing we learn :)

[1] 0001-Add-injection-points-to-test-replication-slot-advanc.v2.patch
[2] 0002-Add-TAP-test-to-check-logical-repl-slot-advance-duri.v2.patch
[3] 0003-Add-TAP-test-to-check-physical-repl-slot-advance-dur.v2.patch
[4] 0004-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.v2.patch
[5] 0005-Remove-redundant-ReplicationSlotsComputeRequiredLSN-.v2.patch

With best regards,
Vitaly
From a8693c3003df7f9850af0be5284bb6f0e7a82fa6 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davy...@postgrespro.ru>
Date: Wed, 30 Apr 2025 12:48:27 +0300
Subject: [PATCH 3/5] Add TAP test to check physical repl slot advance during
 checkpoint

The test verifies that the physical replication slot is still valid
after immediate restart on checkpoint completion in case when the slot
was advanced during checkpoint.

Discussion: https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
---
 .../test_replslot_required_lsn/meson.build    |   3 +-
 .../t/002_physical_slot.pl                    | 126 ++++++++++++++++++
 2 files changed, 128 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl

diff --git a/src/test/modules/test_replslot_required_lsn/meson.build b/src/test/modules/test_replslot_required_lsn/meson.build
index 999c16201fb..44d2546632b 100644
--- a/src/test/modules/test_replslot_required_lsn/meson.build
+++ b/src/test/modules/test_replslot_required_lsn/meson.build
@@ -9,7 +9,8 @@ tests += {
        'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
     },
     'tests': [
-      't/001_logical_slot.pl'
+      't/001_logical_slot.pl',
+      't/002_physical_slot.pl'
     ],
   },
 }
diff --git a/src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl b/src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl
new file mode 100644
index 00000000000..f89aec1da32
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl
@@ -0,0 +1,126 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the physical slot is advanced during
+# checkpoint. The test checks that the physical slot's restart_lsn still refers
+# to an existed WAL segment after immediate restart.
+#
+# Discussion:
+# https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+	plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init();
+$node->append_conf('postgresql.conf',
+	"shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf',
+	"wal_level = 'replica'");
+$node->start();
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# create a simple table to generate data into
+$node->safe_psql('postgres',
+	q{create table t (id serial primary key, b text)});
+
+# create a physical replication slot
+$node->safe_psql('postgres',
+	q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# advance slot to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+	q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+# run checkpoint, to flush current state to disk and set a baseline
+$node->safe_psql('postgres', q{checkpoint});
+
+# insert 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+	q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)});
+
+# advance slot to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+	q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+# run another checkpoint, to set a new restore LSN
+$node->safe_psql('postgres', q{checkpoint});
+
+# another 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+	q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)});
+
+my $restart_lsn_init = $node->safe_psql('postgres',
+	q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'});
+chomp($restart_lsn_init);
+note("restart lsn before checkpoint: $restart_lsn_init");
+
+# run another checkpoint, this time in the background, and make it wait
+# on the injection point), so that the checkpoint stops right before
+# removing old WAL segments
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+	q{select injection_points_attach('checkpoint-before-old-wal-removal','wait')});
+$checkpoint->query_until(qr/starting_checkpoint/,
+q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# wait until the checkpoint stops right before removing WAL segments
+note('waiting for injection_point');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# OK, we're in the right situation,  time to advance the physical slot,
+# which recalculates the required LSN, and then unblock the checkpoint,
+# which removes the WAL still needed by the logical slot
+$node->safe_psql('postgres',
+	q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+# Continue checkpoint
+$node->safe_psql('postgres',
+	q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+my $restart_lsn_old = $node->safe_psql('postgres',
+	q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'});
+chomp($restart_lsn_old);
+note("restart lsn before stop: $restart_lsn_old");
+
+# abruptly stop the server (1 second should be enough for the checkpoint
+# to finish, would be better )
+$node->stop('immediate');
+
+$node->start;
+
+# Get the restart_lsn of the slot right after restarting
+my $restart_lsn = $node->safe_psql('postgres',
+	q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'});
+chomp($restart_lsn);
+note("restart lsn: $restart_lsn");
+
+# Get wal segment name for slot's restart_lsn
+my $restart_lsn_segment = $node->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn'::pg_lsn)");
+chomp($restart_lsn_segment);
+
+# Check if the required wal segment exists
+note("required by slot segment name: $restart_lsn_segment");
+my $datadir = $node->data_dir;
+ok(-f "$datadir/pg_wal/$restart_lsn_segment",
+	"WAL segment $restart_lsn_segment for physical slot's restart_lsn $restart_lsn exists");
+
+done_testing();
-- 
2.34.1

From f17ba2642b7c7ef13163c3e411f3d9218a1faa11 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davy...@postgrespro.ru>
Date: Wed, 30 Apr 2025 14:09:21 +0300
Subject: [PATCH 4/5] Keep WAL segments by slot's flushed restart LSN

The patch fixes the issue with unexpected removal of old WAL segments
after checkpoint followed by immediate restart. The issue occurs when a
slot is advanced after the start of checkpoint and before old WAL
segments removal at end of checkpoint.

The idea of the patch is to get the minimal restart_lsn at the beginning
of checkpoint (or restart point) creation and use this value when
calculating oldest LSN for WAL segments removal at the end of
checkpoint. This idea was proposed by Tomas Vondra in the discussion.

Discussion:
https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
---
 src/backend/access/transam/xlog.c | 37 ++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1f2256a3b86..79a21e2d088 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -677,7 +677,8 @@ static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn,
 												  XLogRecPtr pagePtr,
 												  TimeLineID newTLI);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
-static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
+static void KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinLSN,
+					   XLogSegNo *logSegNo);
 static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
 
 static void AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli,
@@ -7087,6 +7088,7 @@ CreateCheckPoint(int flags)
 	VirtualTransactionId *vxids;
 	int			nvxids;
 	int			oldXLogAllowed = 0;
+	XLogRecPtr	slotsMinReqLSN;
 
 	/*
 	 * An end-of-recovery checkpoint is really a shutdown checkpoint, just
@@ -7315,6 +7317,11 @@ CreateCheckPoint(int flags)
 	 */
 	END_CRIT_SECTION();
 
+	/*
+	 * Get the current minimum LSN to be used later in WAL segments cleanup.
+	 */
+	slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
 	/*
 	 * In some cases there are groups of actions that must all occur on one
 	 * side or the other of a checkpoint record. Before flushing the
@@ -7507,17 +7514,20 @@ CreateCheckPoint(int flags)
 	 * prevent the disk holding the xlog from growing full.
 	 */
 	XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
-	KeepLogSeg(recptr, &_logSegNo);
+	KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
 	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId))
 	{
+		slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+		CheckPointReplicationSlots(shutdown);
+
 		/*
 		 * Some slots have been invalidated; recalculate the old-segment
 		 * horizon, starting again from RedoRecPtr.
 		 */
 		XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
-		KeepLogSeg(recptr, &_logSegNo);
+		KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
 	}
 	_logSegNo--;
 	RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
@@ -7792,6 +7802,7 @@ CreateRestartPoint(int flags)
 	XLogRecPtr	endptr;
 	XLogSegNo	_logSegNo;
 	TimestampTz xtime;
+	XLogRecPtr	slotsMinReqLSN;
 
 	/* Concurrent checkpoint/restartpoint cannot happen */
 	Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
@@ -7874,6 +7885,11 @@ CreateRestartPoint(int flags)
 	MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
 	CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
 
+	/*
+	 * Get the current minimum LSN to be used later in WAL segments cleanup.
+	 */
+	slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
 	if (log_checkpoints)
 		LogCheckpointStart(flags, true);
 
@@ -7962,17 +7978,20 @@ CreateRestartPoint(int flags)
 	receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
 	replayPtr = GetXLogReplayRecPtr(&replayTLI);
 	endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
-	KeepLogSeg(endptr, &_logSegNo);
+	KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
 	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId))
 	{
+		slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+		CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
+
 		/*
 		 * Some slots have been invalidated; recalculate the old-segment
 		 * horizon, starting again from RedoRecPtr.
 		 */
 		XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
-		KeepLogSeg(endptr, &_logSegNo);
+		KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
 	}
 	_logSegNo--;
 
@@ -8067,6 +8086,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
 	XLogSegNo	oldestSegMaxWalSize;	/* oldest segid kept by max_wal_size */
 	XLogSegNo	oldestSlotSeg;	/* oldest segid kept by slot */
 	uint64		keepSegs;
+	XLogRecPtr	slotsMinReqLSN;
 
 	/*
 	 * slot does not reserve WAL. Either deactivated, or has never been active
@@ -8080,8 +8100,9 @@ GetWALAvailability(XLogRecPtr targetLSN)
 	 * oldestSlotSeg to the current segment.
 	 */
 	currpos = GetXLogWriteRecPtr();
+	slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
 	XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
-	KeepLogSeg(currpos, &oldestSlotSeg);
+	KeepLogSeg(currpos, slotsMinReqLSN, &oldestSlotSeg);
 
 	/*
 	 * Find the oldest extant segment file. We get 1 until checkpoint removes
@@ -8142,7 +8163,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
  * invalidation is optionally done here, instead.
  */
 static void
-KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
+KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinReqLSN, XLogSegNo *logSegNo)
 {
 	XLogSegNo	currSegNo;
 	XLogSegNo	segno;
@@ -8155,7 +8176,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 	 * Calculate how many segments are kept by slots first, adjusting for
 	 * max_slot_wal_keep_size.
 	 */
-	keep = XLogGetReplicationSlotMinimumLSN();
+	keep = slotsMinReqLSN;
 	if (keep != InvalidXLogRecPtr && keep < recptr)
 	{
 		XLByteToSeg(keep, segno, wal_segment_size);
-- 
2.34.1

From 68b16da5448ec64661319bca07939e07066fe2a6 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@vondra.me>
Date: Thu, 21 Nov 2024 20:37:00 +0100
Subject: [PATCH 1/5] Add injection points to test replication slot advance

New injection points:

* checkpoint-before-old-wal-removal - triggered in the checkpointer
  process just before old WAL segments cleanup.

* logical-replication-slot-advance-segment - triggered in
  LogicalConfirmReceivedLocation when restart_lsn was changed enough to
  point to a next WAL segment.

Original patch by: Tomas Vondra <to...@vondra.me>
Modified by: Vitaly Davydov <v.davy...@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
---
 src/backend/access/transam/xlog.c         |  4 ++++
 src/backend/replication/logical/logical.c | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2d4c346473b..1f2256a3b86 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7498,6 +7498,10 @@ CreateCheckPoint(int flags)
 	if (PriorRedoPtr != InvalidXLogRecPtr)
 		UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
 
+#ifdef USE_INJECTION_POINTS
+	INJECTION_POINT("checkpoint-before-old-wal-removal");
+#endif
+
 	/*
 	 * Delete old log files, those no longer needed for last checkpoint to
 	 * prevent the disk holding the xlog from growing full.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index a8d2e024d34..2163dc5e275 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -30,6 +30,7 @@
 
 #include "access/xact.h"
 #include "access/xlogutils.h"
+#include "access/xlog_internal.h"
 #include "fmgr.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -41,6 +42,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/memutils.h"
 
@@ -1825,9 +1827,13 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
 	{
 		bool		updated_xmin = false;
 		bool		updated_restart = false;
+		XLogRecPtr	restart_lsn pg_attribute_unused();
 
 		SpinLockAcquire(&MyReplicationSlot->mutex);
 
+		/* remember the old restart lsn */
+		restart_lsn = MyReplicationSlot->data.restart_lsn;
+
 		MyReplicationSlot->data.confirmed_flush = lsn;
 
 		/* if we're past the location required for bumping xmin, do so */
@@ -1869,6 +1875,18 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
 		/* first write new xmin to disk, so we know what's up after a crash */
 		if (updated_xmin || updated_restart)
 		{
+#ifdef USE_INJECTION_POINTS
+			XLogSegNo	seg1,
+						seg2;
+
+			XLByteToSeg(restart_lsn, seg1, wal_segment_size);
+			XLByteToSeg(MyReplicationSlot->data.restart_lsn, seg2, wal_segment_size);
+
+			/* trigger injection point, but only if segment changes */
+			if (seg1 != seg2)
+				INJECTION_POINT("logical-replication-slot-advance-segment");
+#endif
+
 			ReplicationSlotMarkDirty();
 			ReplicationSlotSave();
 			elog(DEBUG1, "updated xmin: %u restart: %u", updated_xmin, updated_restart);
-- 
2.34.1

From 5836ce690d6167d4f79181cc3af0531245969df6 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davy...@postgrespro.ru>
Date: Thu, 1 May 2025 12:18:52 +0300
Subject: [PATCH 5/5] Remove redundant ReplicationSlotsComputeRequiredLSN calls

The function ReplicationSlotsComputeRequiredLSN is used to calculate the
oldest slots' required LSN. It is called every time when restart_lsn
value of any slot is changed (for example, when a slot is advancing).
The slot's oldest required LSN is used to remote old WAL segments in two
places - when checkpoint or restart point is created (CreateCheckPoint,
CreateRestartPoint functions). Old WAL segments seems to be truncated in
these two functions only.

The idea of the patch is to call ReplicationSlotsComputeRequiredLSN in
CreateCheckPoint or CreateRestartPoint functions only, before call of
RemoveOldXlogFiles function where old WAL segments are removed. There
is no obvious need to recalculate oldest required LSN every time when a
slot's restart_lsn is changed.

The value of the oldest required lsn can affect on slot invalidation.
The function InvalidateObsoleteReplicationSlots with non zero second
parameter (oldestSegno) is called in CreateCheckPoint,
CreateRestartPoint functions only where slot invalidation occurs with
reason RS_INVAL_WAL_REMOVED. Once we update the oldest slots' required
lsn in the beginning of these functions, the proposed patch should not
break the behaviour of slot invalidation function in this case.
---
 src/backend/access/transam/xlog.c          | 4 ++++
 src/backend/replication/logical/logical.c  | 1 -
 src/backend/replication/logical/slotsync.c | 4 ----
 src/backend/replication/slot.c             | 5 -----
 src/backend/replication/slotfuncs.c        | 2 --
 src/backend/replication/walsender.c        | 1 -
 6 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 79a21e2d088..5875b5f7b9c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7320,6 +7320,7 @@ CreateCheckPoint(int flags)
 	/*
 	 * Get the current minimum LSN to be used later in WAL segments cleanup.
 	 */
+	ReplicationSlotsComputeRequiredLSN();
 	slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
 
 	/*
@@ -7519,6 +7520,7 @@ CreateCheckPoint(int flags)
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId))
 	{
+		ReplicationSlotsComputeRequiredLSN();
 		slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
 		CheckPointReplicationSlots(shutdown);
 
@@ -7888,6 +7890,7 @@ CreateRestartPoint(int flags)
 	/*
 	 * Get the current minimum LSN to be used later in WAL segments cleanup.
 	 */
+	ReplicationSlotsComputeRequiredLSN();
 	slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
 
 	if (log_checkpoints)
@@ -7983,6 +7986,7 @@ CreateRestartPoint(int flags)
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId))
 	{
+		ReplicationSlotsComputeRequiredLSN();
 		slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
 		CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 2163dc5e275..e796023033c 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1905,7 +1905,6 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
 			SpinLockRelease(&MyReplicationSlot->mutex);
 
 			ReplicationSlotsComputeRequiredXmin(false);
-			ReplicationSlotsComputeRequiredLSN();
 		}
 	}
 	else
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e0ae0..30662c09275 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -335,7 +335,6 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		SpinLockRelease(&slot->mutex);
 
 		ReplicationSlotsComputeRequiredXmin(false);
-		ReplicationSlotsComputeRequiredLSN();
 	}
 
 	return updated_config || updated_xmin_or_lsn;
@@ -502,9 +501,6 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
 		slot->data.restart_lsn = restart_lsn;
 		SpinLockRelease(&slot->mutex);
 
-		/* Prevent WAL removal as fast as possible */
-		ReplicationSlotsComputeRequiredLSN();
-
 		XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size);
 
 		/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 600b87fa9cb..dd18fe10f7d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1008,7 +1008,6 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * limits.
 	 */
 	ReplicationSlotsComputeRequiredXmin(false);
-	ReplicationSlotsComputeRequiredLSN();
 
 	/*
 	 * If removing the directory fails, the worst thing that will happen is
@@ -1494,9 +1493,6 @@ ReplicationSlotReserveWal(void)
 		slot->data.restart_lsn = restart_lsn;
 		SpinLockRelease(&slot->mutex);
 
-		/* prevent WAL removal as fast as possible */
-		ReplicationSlotsComputeRequiredLSN();
-
 		/*
 		 * If all required WAL is still there, great, otherwise retry. The
 		 * slot should prevent further removal of WAL, unless there's a
@@ -2014,7 +2010,6 @@ restart:
 	if (invalidated)
 	{
 		ReplicationSlotsComputeRequiredXmin(false);
-		ReplicationSlotsComputeRequiredLSN();
 	}
 
 	return invalidated;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 36cc2ed4e44..3300fb9b1c9 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -583,7 +583,6 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 	 * advancing potentially done.
 	 */
 	ReplicationSlotsComputeRequiredXmin(false);
-	ReplicationSlotsComputeRequiredLSN();
 
 	ReplicationSlotRelease();
 
@@ -819,7 +818,6 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 
 		ReplicationSlotMarkDirty();
 		ReplicationSlotsComputeRequiredXmin(false);
-		ReplicationSlotsComputeRequiredLSN();
 		ReplicationSlotSave();
 
 #ifdef USE_ASSERT_CHECKING
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9fa8beb6103..0767c2803d9 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2384,7 +2384,6 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
 	if (changed)
 	{
 		ReplicationSlotMarkDirty();
-		ReplicationSlotsComputeRequiredLSN();
 		PhysicalWakeupLogicalWalSnd();
 	}
 
-- 
2.34.1

From 2a5ead4a45c9624eace2dbad63f18ca76c307db6 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@vondra.me>
Date: Thu, 21 Nov 2024 23:07:22 +0100
Subject: [PATCH 2/5] Add TAP test to check logical repl slot advance during
 checkpoint

The test verifies that logical replication slot is still valid after
immediate restart on checkpoint completion in case when the slot was
advanced during checkpoint.

Original patch by: Tomas Vondra <to...@vondra.me>
Modified by: Vitaly Davydov <v.davy...@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
---
 src/test/modules/Makefile                     |   4 +-
 src/test/modules/meson.build                  |   1 +
 .../test_replslot_required_lsn/Makefile       |  18 +++
 .../test_replslot_required_lsn/meson.build    |  15 +++
 .../t/001_logical_slot.pl                     | 124 ++++++++++++++++++
 5 files changed, 160 insertions(+), 2 deletions(-)
 create mode 100644 src/test/modules/test_replslot_required_lsn/Makefile
 create mode 100644 src/test/modules/test_replslot_required_lsn/meson.build
 create mode 100644 src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index aa1d27bbed3..53d3dd8e0ed 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -46,9 +46,9 @@ SUBDIRS = \
 
 
 ifeq ($(enable_injection_points),yes)
-SUBDIRS += injection_points gin typcache
+SUBDIRS += injection_points gin typcache test_replslot_required_lsn
 else
-ALWAYS_SUBDIRS += injection_points gin typcache
+ALWAYS_SUBDIRS += injection_points gin typcache test_replslot_required_lsn
 endif
 
 ifeq ($(with_ssl),openssl)
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 9de0057bd1d..ac0dbd1f10f 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -43,3 +43,4 @@ subdir('typcache')
 subdir('unsafe_tests')
 subdir('worker_spi')
 subdir('xid_wraparound')
+subdir('test_replslot_required_lsn')
diff --git a/src/test/modules/test_replslot_required_lsn/Makefile b/src/test/modules/test_replslot_required_lsn/Makefile
new file mode 100644
index 00000000000..e5ff8af255b
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/Makefile
@@ -0,0 +1,18 @@
+# src/test/modules/test_replslot_required_lsn/Makefile
+
+EXTRA_INSTALL=src/test/modules/injection_points \
+	contrib/test_decoding
+
+export enable_injection_points
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_replslot_required_lsn
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_replslot_required_lsn/meson.build b/src/test/modules/test_replslot_required_lsn/meson.build
new file mode 100644
index 00000000000..999c16201fb
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/meson.build
@@ -0,0 +1,15 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+tests += {
+  'name': 'test_replslot_required_lsn',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'tap': {
+    'env': {
+       'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
+    },
+    'tests': [
+      't/001_logical_slot.pl'
+    ],
+  },
+}
diff --git a/src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl b/src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl
new file mode 100644
index 00000000000..ff13c741ad0
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl
@@ -0,0 +1,124 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the logical slot is advanced during
+# checkpoint. The test checks that the logical slot's restart_lsn still refers
+# to an existed WAL segment after immediate restart.
+#
+# Discussion:
+# https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+	plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+	"shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf',
+	"wal_level = 'logical'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# create a simple table to generate data into
+$node->safe_psql('postgres',
+	q{create table t (id serial primary key, b text)});
+
+# create the two slots we'll need
+$node->safe_psql('postgres',
+	q{select pg_create_logical_replication_slot('slot_logical', 'test_decoding')});
+$node->safe_psql('postgres',
+	q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# advance both to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+	q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null)});
+$node->safe_psql('postgres',
+	q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+# run checkpoint, to flush current state to disk and set a baseline
+$node->safe_psql('postgres', q{checkpoint});
+
+# generate transactions to get RUNNING_XACTS
+my $xacts = $node->background_psql('postgres');
+$xacts->query_until(qr/run_xacts/,
+q(\echo run_xacts
+SELECT 1 \watch 0.1
+\q
+));
+
+# insert 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+	q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)});
+
+# run another checkpoint, to set a new restore LSN
+$node->safe_psql('postgres', q{checkpoint});
+
+# another 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+	q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)});
+
+# run another checkpoint, this time in the background, and make it wait
+# on the injection point), so that the checkpoint stops right before
+# removing old WAL segments
+print('starting checkpoint\n');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(q(select injection_points_attach('checkpoint-before-old-wal-removal','wait')));
+$checkpoint->query_until(qr/starting_checkpoint/,
+q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+print('waiting for injection_point\n');
+# wait until the checkpoint stops right before removing WAL segments
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+
+
+# try to advance the logical slot, but make it stop when it moves to the
+# next WAL segment (has to happen in the background too)
+my $logical = $node->background_psql('postgres');
+$logical->query_safe(q{select injection_points_attach('logical-replication-slot-advance-segment','wait');});
+$logical->query_until(qr/get_changes/,
+q(
+\echo get_changes
+select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \watch 1
+\q
+));
+
+# wait until the checkpoint stops right before removing WAL segments
+$node->wait_for_event('client backend', 'logical-replication-slot-advance-segment');
+
+# OK, we're in the right situation,  time to advance the physical slot,
+# which recalculates the required LSN, and then unblock the checkpoint,
+# which removes the WAL still needed by the logical slot
+$node->safe_psql('postgres',
+	q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())});
+
+$node->safe_psql('postgres',
+	q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+# abruptly stop the server (1 second should be enough for the checkpoint
+# to finish, would be better )
+$node->stop('immediate');
+
+$node->start;
+
+eval {
+	$node->safe_psql('postgres', q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null);});
+};
+is($@, '', "Logical slot still valid");
+
+done_testing();
-- 
2.34.1

Reply via email to