On Wed, Feb 19, 2025 at 3:46 AM Shlok Kyal <shlok.kyal....@gmail.com> wrote:
>
> On Tue, 18 Feb 2025 at 15:26, Zhijie Hou (Fujitsu)
> <houzj.f...@fujitsu.com> wrote:
> >
> > On Monday, February 17, 2025 7:31 PM Shlok Kyal <shlok.kyal....@gmail.com> 
> > wrote:
> > >
> > > On Thu, 13 Feb 2025 at 15:54, vignesh C <vignes...@gmail.com> wrote:
> > > >
> > > > On Tue, 4 Feb 2025 at 15:27, Shlok Kyal <shlok.kyal....@gmail.com> 
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Currently, we can copy an invalidated slot using the function
> > > > > 'pg_copy_logical_replication_slot'. As per the suggestion in the
> > > > > thread [1], we should prohibit copying of such slots.
> > > > >
> > > > > I have created a patch to address the issue.
> > > >
> > > > This patch does not fix all the copy_replication_slot scenarios
> > > > completely, there is a very corner concurrency case where an
> > > > invalidated slot still gets copied:
> > > > +       /* We should not copy invalidated replication slots */
> > > > +       if (src_isinvalidated)
> > > > +               ereport(ERROR,
> > > > +
> > > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > +                                errmsg("cannot copy an invalidated
> > > > replication slot")));
> > > >
> > > > Consider the following scenario:
> > > > step 1) Set up streaming replication between the primary and standby 
> > > > nodes.
> > > > step 2) Create a logical replication slot (test1) on the standby node.
> > > > step 3) Have a breakpoint in InvalidatePossiblyObsoleteSlot if cause
> > > > is RS_INVAL_WAL_LEVEL, no need to hold other invalidation causes or
> > > > add a sleep in InvalidatePossiblyObsoleteSlot function like below:
> > > > if (cause == RS_INVAL_WAL_LEVEL)
> > > > {
> > > > while (bsleep)
> > > > sleep(1);
> > > > }
> > > > step 4) Reduce wal_level on the primary to replica and restart the 
> > > > primary
> > > node.
> > > > step 5) SELECT 'copy' FROM pg_copy_logical_replication_slot('test1',
> > > > 'test2');  -- It will wait till the lock held by
> > > > InvalidatePossiblyObsoleteSlot is released while trying to create a
> > > > slot.
> > > > step 6) Increase wal_level back to logical on the primary node and
> > > > restart the primary.
> > > > step 7) Now allow the invalidation to happen (continue the breakpoint
> > > > held at step 3), the replication control lock will be released and the
> > > > invalidated slot will be copied
> > > >
> > > > After this:
> > > > postgres=# SELECT 'copy' FROM
> > > > pg_copy_logical_replication_slot('test1', 'test2');  ?column?
> > > > ----------
> > > >  copy
> > > > (1 row)
> > > >
> > > > -- The invalidated slot (test1) is copied successfully:
> > > > postgres=# select * from pg_replication_slots ;
> > > >  slot_name |    plugin     | slot_type | datoid | database | temporary
> > > > | active | active_pid | xmin | catalog_xmin | restart_lsn |
> > > > confirmed_flush_lsn | wal_status | safe_wal_size | two_phas
> > > > e |          inactive_since          | conflicting |
> > > > invalidation_reason   | failover | synced
> > > >
> > > -----------+---------------+-----------+--------+----------+-----------+
> > > --------+------------+------+--------------+-------------+---------------
> > > ------+------------+---------------+---------
> > > >
> > > --+----------------------------------+-------------+----------------------
> > > --+----------+--------
> > > >  test1     | test_decoding | logical   |      5 | postgres | f
> > > > | f      |            |      |          745 | 0/4029060   | 0/4029098
> > > >          | lost       |               | f
> > > >   | 2025-02-13 15:26:54.666725+05:30 | t           |
> > > > wal_level_insufficient | f        | f
> > > >  test2     | test_decoding | logical   |      5 | postgres | f
> > > > | f      |            |      |          745 | 0/4029060   | 0/4029098
> > > >          | reserved   |               | f
> > > >   | 2025-02-13 15:30:30.477836+05:30 | f           |
> > > >      | f        | f
> > > > (2 rows)
> > > >
> > > > -- A subsequent attempt to decode changes from the invalidated slot
> > > > (test2) fails:
> > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > > NULL);
> > > > WARNING:  detected write past chunk end in TXN 0x5e77e6c6f300
> > > > ERROR:  logical decoding on standby requires "wal_level" >= "logical"
> > > > on the primary
> > > >
> > > > -- Alternatively, the following error may occur:
> > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > > NULL);
> > > > WARNING:  detected write past chunk end in TXN 0x582d1b2d6ef0
> > > >     data
> > > > ------------
> > > >  BEGIN 744
> > > >  COMMIT 744
> > > > (2 rows)
> > > >
> > > > This is an edge case that can occur under specific conditions
> > > > involving replication slot invalidation when there is a huge lag
> > > > between primary and standby.
> > > > There might be a similar concurrency case for wal_removed too.
> > > >
> > >
> > > Hi Vignesh,
> > >
> > > Thanks for reviewing the patch.
> >
> > Thanks for updating the patch. I have a question related to it.
> >
> > >
> > > I have tested the above scenario and was able to reproduce it. I have 
> > > fixed it in
> > > the v2 patch.
> > > Currently we are taking a shared lock on ReplicationSlotControlLock.
> > > This issue can be resolved if we take an exclusive lock instead.
> > > Thoughts?
> >
> > It's not clear to me why increasing the lock level can solve it, could you
> > elaborate a bit more on this ?
> >
> In HEAD, InvalidateObsoleteReplicationSlots acquires a SHARED lock on
> 'ReplicationSlotControlLock'
> Also in function 'copy_replication_slot' we take a  SHARED lock on
> 'ReplicationSlotControlLock' during fetching of source slot.
>
> So, for the case described by Vignesh in [1], first
> InvalidateObsoleteReplicationSlot is called and we hold a SHARED lock
> on 'ReplicationSlotControlLock'. We are now holding the function using
> the sleep
> if (cause == RS_INVAL_WAL_LEVEL)
>  {
> while (bsleep)
> sleep(1);
> }
>
> Now we create a copy of the slot since 'copy_replication_slot'  takes
> a SHARED lock on 'ReplicationSlotControlLock'. It will take the lock
> and fetch the info of the source slot (the slot is not invalidated
> till now). and the function 'copy_replication_slot' calls function
> 'create_logical_replication_slot' which takes a EXCLUSIVE lock on
> ReplicationSlotControlLock and hence it will wait for function
> InvalidateObsoleteReplicationSlot to release lock. Once the function
> 'InvalidateObsoleteReplicationSlot' releases the lock, the execution
> of 'create_logical_replication_slot' continues and creates a copy of
> the source slot.
>
> Now with the patch, 'copy_replication_slot' will take an EXCLUSIVE
> lock on  'ReplicationSlotControlLock'. to fetch the slot info. Hence,
> it will wait for the 'InvalidateObsoleteReplicationSlot' to release
> the lock and then fetch the source slot info and try to create the
> copied slot (which will fail as source slot is invalidated before we
> fetch its info)
>
> > Besides, do we need one more invalidated check in the following codes after
> > creating the slot ?
> >
> >                 /*
> >                  * Check if the source slot still exists and is valid. We 
> > regard it as
> >                  * invalid if the type of replication slot or name has been 
> > changed,
> >                  * or the restart_lsn either is invalid or has gone 
> > backward. (The
> >                  ...
> >
>
> This approach seems more feasible to me. It also resolves the issue
> suggested by Vignesh in [1].  I have made changes for the same in v3
> patch.
>

I agree to check if the source slot got invalidated during the copy.
But why do we need to search the slot by the slot name again as
follows?

+       /* Check if source slot was invalidated while copying of slot */
+       LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+
+       for (int i = 0; i < max_replication_slots; i++)
+       {
+           ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
+
+           if (s->in_use && strcmp(NameStr(s->data.name),
NameStr(*src_name)) == 0)
+           {
+               /* Copy the slot contents while holding spinlock */
+               SpinLockAcquire(&s->mutex);
+               first_slot_contents = *s;
+               SpinLockRelease(&s->mutex);
+               src = s;
+               break;
+           }
+       }
+
+       LWLockRelease(ReplicationSlotControlLock);

I think 'src' already points to the source slot.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Reply via email to