On Mon, Jul 7, 2025 at 7:00 PM Ashutosh Bapat
<ashutosh.bapat....@gmail.com> wrote:
>
> On Mon, Jul 7, 2025 at 9:53 AM shveta malik <shveta.ma...@gmail.com> wrote:
> >
> > Thanks for the patch.
> >
> > > I couldn't figure out whether the query on primary to fetch all the
> > > slots to be synchronized should filter based on invalidation_reason
> > > and conflicting or not. According to synchronize_slots(), it seems
> > > that we retain invalidated slots on standby when failover = true and
> > > they would remain with synced = true on standby. Is that right?
> > >
> >
> > Yes, that’s correct. We sync the invalidation status of replication
> > slots from the primary to the standby and then stop synchronizing any
> > slots that have been marked as invalidated, retaining synced flag as
> > true. IMO, there's no need to filter out conflicting slots on the
> > primary, because instead of excluding them there and showing
> > everything as failover-ready on the standby, the correct approach is
> > to reflect the actual state on standby.This means conflicting slots
> > will appear as non-failover-ready on the standby. That’s why Step 3
> > also considers conflicting flag in its evaluation.
>
> Thanks for the explanation. WFM.
>
> If a slot is invalidated because RS_INVAL_WAL_REMOVED, conflicting =
> false, but the slot is not useful standby. But then it's not useful on
> primary as well. Is that why we are not including (invalidation_reason
> IS NOT NULL) condition along in (synced AND NOT temporary AND NOT
> conflicting)?

Thanks for bringing it up. Even if the slot is not useful on the
primary node as well, we shall still show failover-ready as false on
standby. We should modify the query of step3 to check
'invalidation_reason IS NULL' instead of  'NOT conflicting'. That will
cover all the cases where the slot  is invalidated and thus not
failover ready.

> >
> > 1)
> > +/* primary # */ SELECT array_agg(quote_literal(r.slot_name)) AS slots
> > +               FROM pg_replication_slots r
> > +               WHERE r.failover AND NOT r.temporary;
> >
> > On primary, there is no  need to check temporary-status. We do not
> > allow setting failover as true for temporary slots.
>
> Why does synchronize_slots() has it in its query?
>

It is not needed but no harm in maintaining it.
If we want documents to be in sync with code, we can have 'not
temporary' check in doc as well.

> >
> > 2)
> > Although not directly related to the concerns addressed in the given
> > patch, I think it would be helpful to add a note in the original doc
> > stating that Steps 1 and 2 should be executed on each subscriber node
> > that will be served by the standby after failover.
>
> There's a bit of semantic repeatition here since an earlier paragraph
> mentions a given subscriber. But I think overall it's better to be
> more clear than being less clear.
> >
> > I have attached a top-up patch with the above changes and a few more
> > trivial changes. Please include it if you find it okay.
>
> Thanks. Included. I have made a few edits and included them in the
> attached patch.
>

Thanks. The existing changes (originally targeted in this patch) look
good to me.

I have attached a top-up patch for step-3 correction. Please include
if you find it okay to be fixed in the same patch, otherwise we can
handle it separately.

thanks
Shveta

Attachment: 0001-top-up-patch.patch
Description: Binary data

Reply via email to