Dear Amit, Hayato

On Wednesday, September 24, 2025 14:31 MSK, "Hayato Kuroda (Fujitsu)" <kuroda.hay...@fujitsu.com> wrote:
>> I was thinking some more about this solution. Won't it lead to the
>> same problem if ReplicationSlotReserveWal() calls
>> ReplicationSlotsComputeRequiredLSN() after the above calculation of
>> checkpointer?
> Exactly. I verified that in your patch, the invalidation can still happen if
> we cannot finish the LSN computation before the KeepLogSegments().

Yes. WAL reservation effectively takes place at the call of
ReplicationSlotsComputeRequiredLSN, which updates the oldest slots' LSN
(XLogCtl->replicationSlotMinLSN). If that call happens between KeepLogSeg and
RemoveOldXlogFiles, the reservation will not be taken into account.

This behaviour seems to have existed before commit 2090edc6f32f652a2c, but the
probability of the race was very low because of the short time between
KeepLogSeg and RemoveOldXlogFiles. Commit 2090edc6f32f652a2c increased the
probability of the race because CheckPointGuts can take much longer to
execute. The attached patch doesn't solve the original problem completely, but
it reduces the probability of the race to what it was before that commit.

I propose to apply this patch and then to think about how to resolve the
remaining race condition, which seems to exist in 18 and master as well.

I updated the patch by improving some comments as suggested by Amit.

With best regards,
Vitaly

From 6163a4f8c22de55cc00620103c26ccc387c119dd Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davy...@postgrespro.ru>
Date: Wed, 17 Sep 2025 13:29:54 +0300
Subject: [PATCH] Fix invalidation when slot is created during checkpoint

The commit 2090edc6f32f652a2c introduced an issue of slot invalidation
if the slot is created during checkpoint. The issue happens in 17 and
earlier versions. The reason is the calculation of the oldest slots' LSN
at the beginning of the checkpoint, which is used in the WAL removal
function. If the slot reserves WAL during the checkpoint, the new oldest
value may be less than the value calculated at the beginning of the
checkpoint. As a result, the new slot is invalidated unexpectedly.

To fix the issue, the oldest slots' LSN is calculated as the minimum of
two values: the current oldest slots' LSN and the oldest slots' LSN at
the beginning of the checkpoint.

Discussion: https://www.postgresql.org/message-id/flat/5e045179-236f-4f8f-84f1-0f2566ba784c.mengjuan.cmj%40alibaba-inc.com
---
 src/backend/access/transam/xlog.c | 40 +++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 09fe5272022..6f2162a9fdc 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7309,9 +7309,15 @@ CreateCheckPoint(int flags)
 										   InvalidTransactionId))
 	{
 		/*
-		 * Recalculate the current minimum LSN to be used in the WAL segment
-		 * cleanup.  Then, we must synchronize the replication slots again in
-		 * order to make this LSN safe to use.
+		 * Recalculate the current oldest LSN to be used in the WAL segment
+		 * cleanup followed by synchronization of replication slots to disk
+		 * again. It guarantees that the saved on disk restart LSNs will address
+		 * existing WAL segments for existing replication slots.
+		 *
+		 * If a slot being created reserves WAL immediately after slotsMinReqLSN
+		 * calculation and before slot synchronization to disk, the guarantee is
+		 * not met, but it will be handled in the subsequent KeepLogSeg
+		 * function, where current oldest reserved LSN will be used.
 		 */
 		slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
 		CheckPointReplicationSlots(shutdown);
@@ -7973,8 +7979,32 @@ KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinReqLSN, XLogSegNo *logSegNo)
 	XLByteToSeg(recptr, currSegNo, wal_segment_size);
 	segno = currSegNo;
 
-	/* Calculate how many segments are kept by slots. */
-	keep = slotsMinReqLSN;
+	/*
+	 * Calculate how many segments are kept by slots. Keep the WAL using
+	 * the minimal value from the current reserved LSN and the reserved LSN at
+	 * the moment of checkpoint start (before CheckPointReplicationSlots).
+	 *
+	 * The reserved LSN at checkpoint start guarantees that the saved on disk
+	 * restart LSNs of existing slots will address existing WAL segments after
+	 * immediate restart at checkpoint end, even if slot advance takes place.
+	 * This guarantee doesn't hold for new slots created during checkpoint.
+	 *
+	 * If a slot is being created during checkpoint, it may reserve WAL older
+	 * than the reserved LSN at checkpoint start, following its computation.
+	 * It can lead to the slot invalidation during checkpoint.
+	 *
+	 * The use of the minimal value of both these LSNs helps to avoid
+	 * invalidation of a slot created during checkpoint.
+	 *
+	 * TODO: A slot being created during checkpoint may reserve WAL at the
+	 * moment between minimal reserved LSN calculation and the removal of the
+	 * old WAL segments. In this case the slot may be invalidated during
+	 * checkpoint because its WAL reservation will not be taken into account.
+	 */
+	keep = XLogGetReplicationSlotMinimumLSN();
+	if (!XLogRecPtrIsInvalid(slotsMinReqLSN))
+		keep = Min(keep, slotsMinReqLSN);
+
 	if (keep != InvalidXLogRecPtr && keep < recptr)
 	{
 		XLByteToSeg(keep, segno, wal_segment_size);
-- 
2.34.1
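
For readers who want to see the effect of the changed `keep` computation in isolation, here is a minimal standalone sketch that can be compiled outside PostgreSQL. It is only an illustration under stated assumptions: the `XLogRecPtr` typedef, `InvalidXLogRecPtr` and `Min` are simplified stand-ins for the server definitions, `keep_before_patch`, `keep_after_patch` and `segment_of` are hypothetical helper names, and the LSN values in the usage note are made up.

```c
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions (assumptions). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define Min(a, b) ((a) < (b) ? (a) : (b))

/*
 * Hypothetical helper: behaviour before the patch. Only the oldest slots'
 * LSN computed at checkpoint start (slotsMinReqLSN) is considered.
 */
XLogRecPtr
keep_before_patch(XLogRecPtr slots_min_at_start)
{
	return slots_min_at_start;
}

/*
 * Hypothetical helper: behaviour after the patch. Start from the current
 * oldest slots' LSN and lower it to the checkpoint-start value when that
 * value is valid, mirroring the patched lines in KeepLogSeg.
 */
XLogRecPtr
keep_after_patch(XLogRecPtr current_min, XLogRecPtr slots_min_at_start)
{
	XLogRecPtr	keep = current_min;

	if (slots_min_at_start != InvalidXLogRecPtr)
		keep = Min(keep, slots_min_at_start);
	return keep;
}

/* Hypothetical helper: segment number containing an LSN (cf. XLByteToSeg). */
uint64_t
segment_of(XLogRecPtr lsn, uint64_t wal_segment_size)
{
	return lsn / wal_segment_size;
}
```

With 16 MB segments, suppose the oldest slots' LSN at checkpoint start is 0x5000000 and a slot created during the checkpoint reserves WAL at 0x3000000. Then `keep_before_patch(0x5000000)` points at segment 5, so segment 3 is eligible for removal, while `keep_after_patch(0x3000000, 0x5000000)` points at segment 3 and protects the new slot. The sketch does not model the remaining window described in the TODO: a reservation that becomes visible after `keep` has been computed but before the old segments are removed is still missed.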