On Tuesday, March 3, 2026 8:25 PM Zhijie Hou (Fujitsu) <[email protected]> 
wrote:
> > >
> > > I reproduced and debugged this issue where a replication slot's
> > > restart_lsn fails to advance. In my environment, I found it only
> > > occurs when a synced slot first builds a consistent snapshot. The
> > > problematic code path is in
> > > SnapBuildProcessRunningXacts():
> > >
> > >     if (builder->state < SNAPBUILD_CONSISTENT)
> > >     {
> > >         /* returns false if there's no point in performing cleanup just yet */
> > >         if (!SnapBuildFindSnapshot(builder, lsn, running))
> > >             return;
> > >     }
> > >
> > > When a synced slot reaches consistency for the first time with no
> > > running transactions, SnapBuildFindSnapshot() returns false, causing
> > > the function to return without updating the candidate restart_lsn.
> 
> 
> > However, I have a question that even if we haven't incremented it in
> > the first cycle, why is it not incrementing restart_lsn in consecutive sync
> > cycles.
> 
> Because no walsender was active during the reproduction step, the slot
> remained inactive on the publisher and its restart_lsn didn't advance. As a
> result, the slotsync process stalled while continuously retrying the first
> cycle of snapshot building.

I've analyzed the issues further and identified two distinct cases that can
prevent the slotsync worker from advancing restart_lsn:

1) The first is the one I mentioned earlier: when the slotsync worker builds a
consistent snapshot for the first time, it neither advances restart_lsn to that
LSN nor serializes the snapshot. If the remote slot on the primary remains
unchanged, the slotsync worker will repeatedly report a WARNING, since it always
perceives itself as building a consistent point for the first time at that same
LSN.

The solution I have in mind is to try to advance restart_lsn on reaching
consistency. The detailed analysis is as follows.

First, a consistent snapshot can be built for the first time in one of three
ways:

a) a consistent snapshot is built incrementally by waiting for running
   transactions to finish.
b) a consistent snapshot is built by restoring a serialized snapshot.
c) a consistent snapshot is built because an xl_running_xacts record showed no
   running transactions.

Currently, we do not advance restart_lsn in any of the above cases, which
causes restart_lsn to fall behind and the slotsync worker to repeatedly report a
LOG message. My reproduction hit case c), but cases a) and b) can cause the same
issue.

To improve this, I am thinking of advancing restart_lsn in all of the above
cases when appropriate:

For case a), we cannot unconditionally advance restart_lsn to the LSN of the
last xl_running_xacts record, because without collecting all previous
transaction info from the old restart_lsn we could not immediately build a
snapshot at this LSN again after restarting. The solution I have in mind is to
serialize the snapshot in this case and then advance restart_lsn, which avoids
having to collect all the previous transaction info.

For case b), we can directly advance restart_lsn, since with a serialized
snapshot we can reach consistency after restarting as well.

For case c), similar to case b), it's OK to advance restart_lsn.

In all of cases a), b) and c), there is one general situation where we cannot
advance: when there are transactions that were decoded but not yet committed
before reaching the consistent point. The data of these transactions may still
need to be replicated, so we cannot simply advance beyond them.

To implement the above, we still need the return value of
SnapBuildFindSnapshot() to distinguish case a) from cases b) and c). The
function returns true for case a) and false for cases b) and c). The comment
atop SnapBuildFindSnapshot() can be updated, because I think the return value is
not directly related to whether to maintain restart_lsn and clean up the txn
data.
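
The decision described above can be sketched as a small C function. This is
only an illustration of the reasoning, not the attached patch: the enums
ConsistencyPath and RestartLsnAction, the helpers toy_find_snapshot_result()
and toy_restart_lsn_action(), and the has_pending_txns flag are all
hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* How consistency was reached; mirrors cases a), b) and c) above. */
typedef enum ConsistencyPath
{
    PATH_INCREMENTAL,           /* a) waited for running transactions */
    PATH_RESTORED,              /* b) restored a serialized snapshot */
    PATH_NO_RUNNING_XACTS       /* c) xl_running_xacts showed none running */
} ConsistencyPath;

/* What to do with restart_lsn once consistency is reached. */
typedef enum RestartLsnAction
{
    ACTION_KEEP,                    /* cannot advance yet */
    ACTION_ADVANCE,                 /* advance directly */
    ACTION_SERIALIZE_THEN_ADVANCE   /* serialize the snapshot, then advance */
} RestartLsnAction;

/*
 * Toy version of the return value described in the text:
 * true for case a), false for cases b) and c).
 */
static bool
toy_find_snapshot_result(ConsistencyPath path)
{
    switch (path)
    {
        case PATH_INCREMENTAL:
            return true;
        case PATH_RESTORED:
        case PATH_NO_RUNNING_XACTS:
            return false;
    }
    assert(!"unknown ConsistencyPath");
    return false;
}

/*
 * Pick an action from the return value plus the general restriction:
 * transactions decoded but not yet committed before the consistent point
 * block any advancement.
 */
static RestartLsnAction
toy_restart_lsn_action(bool find_snapshot_result, bool has_pending_txns)
{
    if (has_pending_txns)
        return ACTION_KEEP;

    return find_snapshot_result ? ACTION_SERIALIZE_THEN_ADVANCE
                                : ACTION_ADVANCE;
}
```

For example, toy_restart_lsn_action(toy_find_snapshot_result(PATH_INCREMENTAL),
false) yields ACTION_SERIALIZE_THEN_ADVANCE, and any path with
has_pending_txns set to true yields ACTION_KEEP.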

See the attached patch, which implements the above.

2) The other case arises when pg_logical_slot_peek_changes() (including the
binary version) is used on the primary. This function allows reading WAL beyond
the current confirmed_flush_lsn without actually advancing the confirmed
position. If the starting position of an xl_running_xacts record happens to be
exactly at confirmed_flush_lsn, the function can advance restart_lsn to that
point.

However, during slot synchronization, the standby cannot read WAL beyond
confirmed_flush_lsn. This means it cannot access the final xl_running_xacts
record needed to advance restart_lsn to the latest position, causing the
advancement to fail.

In Fuji-San's example:

[PRIMARY]
=# SELECT slot_name, restart_lsn, confirmed_flush_lsn from
pg_replication_slots where slot_name = 'logical_slot';
  slot_name   | restart_lsn | confirmed_flush_lsn
--------------+-------------+---------------------
 logical_slot | 0/03000140  | 0/03000140

[STANDBY]
=# SELECT slot_name, restart_lsn, confirmed_flush_lsn from
pg_replication_slots where slot_name = 'logical_slot';
  slot_name   | restart_lsn | confirmed_flush_lsn
--------------+-------------+---------------------
 logical_slot | 0/03000098  | 0/03000140

0/03000140 is the start position of the last xl_running_xacts.

One way to improve this is to change the advancement function to read the last
record as well:

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 603a2b94d05..309feaf2219 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -2134,7 +2134,7 @@ LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr moveto,
                InvalidateSystemCaches();
 
                /* Decode records until we reach the requested target */
-               while (ctx->reader->EndRecPtr < moveto)
+               while (ctx->reader->EndRecPtr <= moveto)
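
To see why the loop bound matters, here is a toy C model of the decode loop.
It is a sketch under assumptions, not the real reader: ToyRecord,
toy_records_read() and toy_example_records_read() are hypothetical, and the
intermediate record boundaries (0x030000D0, 0x03000178) are made up. Only
0x03000098 and 0x03000140 come from the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;

/* A WAL record as a [start, end) range; end is the next record's start. */
typedef struct ToyRecord
{
    Lsn         start;
    Lsn         end;
} ToyRecord;

/*
 * Read records in order while the end of the last-read record satisfies
 * the loop condition against moveto. inclusive = false models "<",
 * inclusive = true models "<=". Returns the number of records read.
 */
static size_t
toy_records_read(const ToyRecord *recs, size_t nrecs, Lsn moveto,
                 bool inclusive)
{
    Lsn         end_rec_ptr;
    size_t      nread = 0;

    assert(recs != NULL || nrecs == 0);
    if (nrecs == 0)
        return 0;

    /* Start positioned at the beginning of the first record. */
    end_rec_ptr = recs[0].start;

    while (nread < nrecs &&
           (inclusive ? end_rec_ptr <= moveto : end_rec_ptr < moveto))
    {
        end_rec_ptr = recs[nread].end;
        nread++;
    }
    return nread;
}

/* Example layout: the last record (xl_running_xacts) starts at moveto. */
static size_t
toy_example_records_read(bool inclusive)
{
    static const ToyRecord recs[] = {
        {0x03000098, 0x030000D0},
        {0x030000D0, 0x03000140},
        {0x03000140, 0x03000178},   /* starts exactly at moveto */
    };

    return toy_records_read(recs, sizeof(recs) / sizeof(recs[0]),
                            0x03000140, inclusive);
}
```

In this layout toy_example_records_read(false) reads 2 records and stops just
before the record starting at 0x03000140, while toy_example_records_read(true)
reads all 3, including that last record.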


Note that I am not insisting on the approach of changing the advancement; I am
trying to analyze the root cause and explore some alternatives for reference.

Best Regards,
Hou zj


Attachment: v3-0001-Advance-restart_lsn-when-reaching-consistency-wit.patch