On Wed, Sep 18, 2024 at 3:31 PM shveta malik <shveta.ma...@gmail.com> wrote:
>
> On Wed, Sep 18, 2024 at 2:49 PM shveta malik <shveta.ma...@gmail.com> wrote:
> >
> > > > Please find the attached v46 patch having changes for the above review
> > > > comments and your test review comments and Shveta's review comments.
> > > >
>
> When the synced slot is marked as 'inactive_timeout' invalidated on
> hot standby due to invalidation of publisher 's failover slot, the
> former starts showing NULL' inactive_since'. Is this intentional
> behaviour? I feel inactive_since should be non-NULL here too?
> Thoughts?
>
> physical standby:
> postgres=# select slot_name, inactive_since, invalidation_reason,
> failover, synced from pg_replication_slots;
> slot_name  |          inactive_since                              |
> invalidation_reason | failover | synced
> -------------+----------------------------------+---------------------+----------+--------
> sub2 | 2024-09-18 15:20:04.364998+05:30 |           | t        | t
> sub3 | 2024-09-18 15:20:04.364953+05:30 |           | t        | t
>
> After sync of invalidation_reason:
>
> slot_name  |          inactive_since          | invalidation_reason |
> failover | synced
> -------------+----------------------------------+---------------------+----------+--------
>  sub2 |                               | inactive_timeout    | t        | t
>  sub3 |                               | inactive_timeout    | t        | t
>
>

For synced slots on the standby, inactive_since indicates the last
synchronization time rather than the time the slot became inactive
(see doc - 
https://www.postgresql.org/docs/devel/view-pg-replication-slots.html).

In the reported case above, once a synced slot is invalidated we don't
even keep the last synchronization time for it. This is because when a
synced slot on the standby is marked invalid, inactive_since is reset
to NULL each time the slot-sync worker acquires a lock on it. This
lock acquisition before checking invalidation is done to avoid certain
race conditions and will activate the slot temporarily, resetting
inactive_since. Later, the slot-sync worker updates inactive_since for
all synced slots to the current synchronization time. However, for
invalid slots, this update is skipped, as per the patch’s design.

If we want to preserve the inactive_since value for the invalid synced
slots on standby, we need to clarify the time it should display. Here
are three possible approaches:

1) Copy the primary's inactive_since upon invalidation: When a slot
becomes invalid on the primary, the slot-sync worker could copy the
primary slot’s inactive_since to the standby slot and retain it, by
preventing future updates on the standby.

2) Use the current time of standby when the synced slot is marked
invalid for the first time and do not update it in subsequent sync
cycles if the slot is invalid.

Approach (2) seems more reasonable to me, however, Both 1) & 2)
approaches contradicts the purpose of inactive_since, as it no longer
represents either the true "last sync time" or the "time slot became
inactive" because the slot-sync worker acquires locks periodically for
syncing, and keeps activating the slot.

3) Continuously update inactive_since for invalid synced slots as
well: Treat invalid synced slots like valid ones by updating
inactive_since with each sync cycle. This way, we can keep the "last
sync time" in the inactive_since. However, this could confuse users
when "invalidation_reason=inactive_timeout" is set for a synced slot
on standby but inactive_since would reflect sync time rather than the
time slot became inactive. IIUC, on the primary, when
invalidation_reason=inactive_timeout for a slot, the inactive_since
represents the actual time the slot became inactive before getting
invalidated, unless the primary is restarted.

Thoughts?

--
Thanks,
Nisha


Reply via email to