On Wed, Sep 18, 2024 at 3:31 PM shveta malik <shveta.ma...@gmail.com> wrote: > > On Wed, Sep 18, 2024 at 2:49 PM shveta malik <shveta.ma...@gmail.com> wrote: > > > > > > Please find the attached v46 patch having changes for the above review > > > > comments and your test review comments and Shveta's review comments. > > > > > > When the synced slot is marked as 'inactive_timeout' invalidated on > hot standby due to invalidation of publisher 's failover slot, the > former starts showing NULL' inactive_since'. Is this intentional > behaviour? I feel inactive_since should be non-NULL here too? > Thoughts? > > physical standby: > postgres=# select slot_name, inactive_since, invalidation_reason, > failover, synced from pg_replication_slots; > slot_name | inactive_since | > invalidation_reason | failover | synced > -------------+----------------------------------+---------------------+----------+-------- > sub2 | 2024-09-18 15:20:04.364998+05:30 | | t | t > sub3 | 2024-09-18 15:20:04.364953+05:30 | | t | t > > After sync of invalidation_reason: > > slot_name | inactive_since | invalidation_reason | > failover | synced > -------------+----------------------------------+---------------------+----------+-------- > sub2 | | inactive_timeout | t | t > sub3 | | inactive_timeout | t | t > >
For synced slots on the standby, inactive_since indicates the last synchronization time rather than the time the slot became inactive (see doc - https://www.postgresql.org/docs/devel/view-pg-replication-slots.html). In the reported case above, once a synced slot is invalidated we don't even keep the last synchronization time for it. This is because when a synced slot on the standby is marked invalid, inactive_since is reset to NULL each time the slot-sync worker acquires a lock on it. This lock acquisition before checking invalidation is done to avoid certain race conditions and will activate the slot temporarily, resetting inactive_since. Later, the slot-sync worker updates inactive_since for all synced slots to the current synchronization time. However, for invalid slots, this update is skipped, as per the patch’s design. If we want to preserve the inactive_since value for the invalid synced slots on standby, we need to clarify the time it should display. Here are three possible approaches: 1) Copy the primary's inactive_since upon invalidation: When a slot becomes invalid on the primary, the slot-sync worker could copy the primary slot’s inactive_since to the standby slot and retain it, by preventing future updates on the standby. 2) Use the current time of standby when the synced slot is marked invalid for the first time and do not update it in subsequent sync cycles if the slot is invalid. Approach (2) seems more reasonable to me, however, Both 1) & 2) approaches contradicts the purpose of inactive_since, as it no longer represents either the true "last sync time" or the "time slot became inactive" because the slot-sync worker acquires locks periodically for syncing, and keeps activating the slot. 3) Continuously update inactive_since for invalid synced slots as well: Treat invalid synced slots like valid ones by updating inactive_since with each sync cycle. This way, we can keep the "last sync time" in the inactive_since. However, this could confuse users when "invalidation_reason=inactive_timeout" is set for a synced slot on standby but inactive_since would reflect sync time rather than the time slot became inactive. IIUC, on the primary, when invalidation_reason=inactive_timeout for a slot, the inactive_since represents the actual time the slot became inactive before getting invalidated, unless the primary is restarted. Thoughts? -- Thanks, Nisha