On Fri, Nov 21, 2025 at 12:14 AM Zhijie Hou (Fujitsu) <[email protected]> wrote: > > On Thursday, November 20, 2025 6:44 PM Amit Kapila <[email protected]> > wrote: > > > > On Thu, Nov 20, 2025 at 4:07 PM Zhijie Hou (Fujitsu) > > <[email protected]> > > wrote: > > > > > > On Thursday, November 20, 2025 4:26 PM Vitaly Davydov > > <[email protected]> wrote: > > > > > > > > > > > Concerning reserve_wal_for_local_slot. It seems it is used in > > > > synchronization of failover logical slots. For me, it is tricky to > > > > change restart_lsn of a synced logical slot to RedoRecPtr, because > > > > it may lead to problems with logical replication using such slot > > > > after the replica promotion. But it seems it is the architectural > > > > problem and it is not related to the problems, solved by the patch. > > > > > > I think this is not an issue because if we use the redo pointer > > > instead of the remote restart_lsn as the initial value, the synced > > > slot won't be marked as sync-ready, so user cannot use it after > > > promotion (also well documented). This is also the existing behavior > > > before the patch, e.g., if the required WALs were removed, the oldest > > > available WAL was used as the initial value, similarly resulting in the > > > slot not > > being sync-ready. > > > > > > > Would it be better to discuss this in a separate thread? Though this is > > related > > to original problem but still in a separate part of code > > (slotsync) which I think can have a separate fix especially when the fix is > > also > > somewhat different. > > > > > > > > > > The change of lock mode to EXCLUSIVE in > > > > ReplicationSlotsComputeRequiredLSN may affect the performance when > > a > > > > lot of slots are advanced during some small period of time. It may > > > > affect the walsender performance. It advances the logical or > > > > physical slots when receive a confirmation from the replica. I > > > > guess, the slot advancement may be pretty frequent operation. > > > > > > Yes, I had the same thought and considered a simple alternative > > > (similar to your suggestion below): use an exclusive lock only when > > > updating the slot.restart_lsn during WAL reservation, while continuing > > > to use a shared lock in the computation function. Additionally, place > > XLogSetReplicationSlotMinimumLSN() under the lock. > > > This approach will also help serialize the process. > > > > > > > Can we discuss this as well in a separate thread? > > OK, I think it makes sense to start separate threads. > > I have split the patches based on the different bugs they > address and am sharing them here for reference. >
I'm reviewing the 0001 patch and the problem that can be addressed by that patch. While the proposed patch addresses the race condition between a checkpointing and newly created slot, could the same issue happen between the checkpointing and copying a slot? I'm trying to understand when we have to acquire ReplicationSlotAllocationLock in an exclusive mode in the new lock scheme. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
