On Mon, Nov 10, 2025 at 2:41 PM Amit Kapila <[email protected]> wrote: > > On Wed, Nov 5, 2025 at 3:48 PM Alexander Korotkov <[email protected]> > wrote: > > > > On Mon, Oct 6, 2025 at 6:46 PM Vitaly Davydov <[email protected]> > > wrote: > > > There is one subtle thing. Once, the operation of restart_lsn assignment > > > is not > > > an atomic, the following scenario may happen theoretically: > > > 1. Read GetRedoRecPtr() in the backend (ReplicationSlotReserveWal) > > > 2. Assign a new redo LSN in the checkpointer > > > 3. Call ReplicationSlotsComputeRequiredLSN() in the checkpointer > > > 3. Assign the old redo LSN to restart_lsn > > > > > > In this scenario, the restart_lsn will point to a previous redo LSN and > > > it will > > > be not protected by the new redo LSN. This scenario is unlikely, but it > > > can > > > happen theoretically. I have no ideas how to deal with it, except of > > > assigning > > > restart_lsn under XLogCtl->info_lck lock to avoid concurrent modification > > > of > > > XLogCtl->RecoRecPtr until it is assigned to restart_lsn of a creating > > > slot. > > > > > > In case of recovery, when GetXLogReplayRecPtr() is used, the protection by > > > redo LSN seems to work as well, because a new redo LSN is taken from the > > > latest > > > replayed checkpoint. Thus, it is guaranteed that GetXLogReplayRecPtr() > > > will not > > > be less than the new redo LSN, if it is called right after assignment of > > > redo > > > LSN in CreateRestartPoint(). > > > > Thank you for highlighting this scenario. I've reviewed it. I think > > we could avoid it by covering appropriate parts of > > ReplicationSlotReserveWal() and Create{Check|Restart}Point() by a new > > LWLock. The draft patch is attached. What do you think? > > > > The fix seems to be only provided for bank branches, but IIUC the > problem can happen in HEAD as well. In Head, how about acquiring > ReplicationSlotAllocationLock in Exclusive mode during > ReplicationSlotReserveWal? This lock is acquired in SHARE mode in > CheckPointReplicationSlots. So, this should make our calculations > correct and avoid invalidating the newly created slot. >
We need to check whether a similar change is required in reserve_wal_for_local_slot() as well for sync slots. -- With Regards, Amit Kapila.
