On Mon, Oct 6, 2025 at 6:46 PM Vitaly Davydov <[email protected]> wrote: > There is one subtle thing. Once, the operation of restart_lsn assignment is > not > an atomic, the following scenario may happen theoretically: > 1. Read GetRedoRecPtr() in the backend (ReplicationSlotReserveWal) > 2. Assign a new redo LSN in the checkpointer > 3. Call ReplicationSlotsComputeRequiredLSN() in the checkpointer > 3. Assign the old redo LSN to restart_lsn > > In this scenario, the restart_lsn will point to a previous redo LSN and it > will > be not protected by the new redo LSN. This scenario is unlikely, but it can > happen theoretically. I have no ideas how to deal with it, except of assigning > restart_lsn under XLogCtl->info_lck lock to avoid concurrent modification of > XLogCtl->RecoRecPtr until it is assigned to restart_lsn of a creating slot. > > In case of recovery, when GetXLogReplayRecPtr() is used, the protection by > redo LSN seems to work as well, because a new redo LSN is taken from the > latest > replayed checkpoint. Thus, it is guaranteed that GetXLogReplayRecPtr() will > not > be less than the new redo LSN, if it is called right after assignment of redo > LSN in CreateRestartPoint().
Thank you for highlighting this scenario. I've reviewed it. I think
we could avoid it by covering appropriate parts of
ReplicationSlotReserveWal() and Create{Check|Restart}Point() by a new
LWLock. The draft patch is attached. What do you think?
------
Regards,
Alexander Korotkov
Supabase
ReplicationSlotReserveWALLock.patch
Description: Binary data
