On Mon, Mar 10, 2025 at 11:46 AM Shlok Kyal <shlok.kyal....@gmail.com> wrote: > > > I tried to reproduce the scenario described using the following steps: > > Here are the steps: > > 1. set max_replication_slots = 2 > > 2. create two logical replication slot, lets say slot1 and slot2. drop > the first slot (slot1). > > 3. Now run pg_copy_logical_replication slot function on slot2. Lets > say copied slot is 'slot_cp'. > In function copy_replication_slot add breakpoint just after function > 'create_logical_replication_slot'. > slot_cp will be created. In array > ReplicationSlotCtl->replication_slots, slot_cp will come before slot2. > > 4. Now invalidate the 'slot2'. > Function 'InvalidateObsoleteReplicationSlots' will be called. Now it > will loop through ReplicationSlotCtl->replication_slots and will get > 'slot_cp' first. Since the slot is acquired by copy_replication_slot, > It will wait for the slot to be released. Once slot is released, > 'slot_cp' and 'slot2' will be invalidated. >
It is also possible that InvalidateObsoleteReplicationSlots() has already past slot_cp location in which case it doesn't need to wait for the backend. I have changed the comments for the master branch patch. Do let me know what you think of the attached change. If you agree with this, then please prepare the patches for back-branches that reflect this change. -- With Regards, Amit Kapila.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c index 999c14c920d..215b797664f 100644 --- a/src/backend/replication/slotfuncs.c +++ b/src/backend/replication/slotfuncs.c @@ -694,22 +694,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot) plugin = NameStr(*(PG_GETARG_NAME(3))); } - /* - * Create new slot and acquire it - * - * After slot is created there can be a race condition with function - * InvalidateObsoleteReplicationSlots, when the copied slot appear before - * source slot in array ReplicationSlotCtl->replication_slots. - * - * If just after slot creation, the source slot is invalidated due to - * some operation. The execution of InvalidateObsoleteReplicationSlots will - * wait for this function to release the copied slot. Then the source slot - * will be invalidated. So, the copying of source slot is successful in this - * case. - * - * The wal_status of copied slot is set to lost immediately and hence will - * not have any harm. - */ + /* Create new slot and acquire it */ if (logical_slot) { /* @@ -801,7 +786,14 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot) NameStr(*src_name)), errhint("Retry when the source replication slot's confirmed_flush_lsn is valid."))); - /* Check if source slot became invalidated during the copy operation */ + /* + * Copying an invalid slot doesn't make sense. Note that the source + * slot can become invalid after we creat the new slot and copy the + * data of source slot. This is possible because the operations in + * InvalidateObsoleteReplicationSlots() are not serialized with this + * function. Even though we can't detect such a case here, the copied + * slot will become invalid in the next checkpoint cycle. + */ if (second_slot_contents.data.invalidated != RS_INVAL_NONE) ereport(ERROR, errmsg("cannot copy replication slot \"%s\"",