On Mon, Mar 10, 2025 at 11:46 AM Shlok Kyal <shlok.kyal....@gmail.com> wrote:
>
>
> I tried to reproduce the scenario described using the following steps:
>
> Here are the steps:
>
> 1. set max_replication_slots = 2
>
> 2. create two logical replication slot, lets say slot1 and slot2. drop
> the first slot (slot1).
>
> 3. Now run pg_copy_logical_replication slot function on slot2. Lets
> say copied slot is 'slot_cp'.
> In function copy_replication_slot add breakpoint just after function
> 'create_logical_replication_slot'.
> slot_cp will be created. In array
> ReplicationSlotCtl->replication_slots, slot_cp will come before slot2.
>
> 4. Now invalidate the 'slot2'.
> Function 'InvalidateObsoleteReplicationSlots' will be called. Now it
> will loop through ReplicationSlotCtl->replication_slots and will get
> 'slot_cp' first. Since the slot is acquired by copy_replication_slot,
> It will wait for the slot to be released. Once slot is released,
> 'slot_cp' and 'slot2' will be invalidated.
>

It is also possible that InvalidateObsoleteReplicationSlots() has
already past slot_cp location in which case it doesn't need to wait
for the backend.

I have changed the comments for the master branch patch. Do let me
know what you think of the attached change. If you agree with this,
then please prepare the patches for back-branches that reflect this
change.

-- 
With Regards,
Amit Kapila.
diff --git a/src/backend/replication/slotfuncs.c 
b/src/backend/replication/slotfuncs.c
index 999c14c920d..215b797664f 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -694,22 +694,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool 
logical_slot)
                plugin = NameStr(*(PG_GETARG_NAME(3)));
        }
 
-       /*
-        * Create new slot and acquire it
-        *
-        * After slot is created there can be a race condition with function
-        * InvalidateObsoleteReplicationSlots, when the copied slot appear 
before
-        * source slot in array ReplicationSlotCtl->replication_slots.
-        *
-        * If just after slot creation, the source slot is invalidated due to
-        * some operation. The execution of InvalidateObsoleteReplicationSlots 
will
-        * wait for this function to release the copied slot. Then the source 
slot
-        * will be invalidated. So, the copying of source slot is successful in 
this
-        * case.
-        *
-        * The wal_status of copied slot is set to lost immediately and hence 
will
-        * not have any harm.
-        */
+       /* Create new slot and acquire it */
        if (logical_slot)
        {
                /*
@@ -801,7 +786,14 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool 
logical_slot)
                                                        NameStr(*src_name)),
                                         errhint("Retry when the source 
replication slot's confirmed_flush_lsn is valid.")));
 
-               /* Check if source slot became invalidated during the copy 
operation */
+               /*
+                * Copying an invalid slot doesn't make sense. Note that the 
source
+                * slot can become invalid after we creat the new slot and copy 
the
+                * data of source slot. This is possible because the operations 
in
+                * InvalidateObsoleteReplicationSlots() are not serialized with 
this
+                * function. Even though we can't detect such a case here, the 
copied
+                * slot will become invalid in the next checkpoint cycle.
+                */
                if (second_slot_contents.data.invalidated != RS_INVAL_NONE)
                        ereport(ERROR,
                                        errmsg("cannot copy replication slot 
\"%s\"",

Reply via email to