On Sat, Aug 31, 2024 at 1:45 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > > Hi, > > > Please find the attached v44 patch with the above changes. I will > include the 0002 xid_age based invalidation patch later. >
Thanks for the patch Bharath. My review and testing is WIP, but please find few comments and queries: 1) I see that ReplicationSlotAlter() will error out if the slot is invalidated due to timeout. I have not tested it myself, but do you know if slot-alter errors out for other invalidation causes as well? Just wanted to confirm that the behaviour is consistent for all invalidation causes. 2) When a slot is invalidated, and we try to use that slot, it gives this msg: ERROR: can no longer get changes from replication slot "mysubnew1_2" DETAIL: The slot became invalid because it was inactive since 2024-09-03 14:23:34.094067+05:30, which is more than 600 seconds ago. HINT: You might need to increase "replication_slot_inactive_timeout.". Isn't HINT misleading? Even if we increase it now, the slot can not be reused again. 3) When the slot is invalidated, the' inactive_since' still keeps on changing when there is a subscriber trying to start replication continuously. I think ReplicationSlotAcquire() keeps on failing and thus Release keeps on setting it again and again. Shouldn't we stop setting/chnaging 'inactive_since' once the slot is invalidated already, otherwise it will be misleading. postgres=# select failover,synced,inactive_since,invalidation_reason from pg_replication_slots; failover | synced | inactive_since | invalidation_reason ----------+--------+----------------------------------+--------------------- t | f | 2024-09-03 14:23:.. | inactive_timeout after sometime: failover | synced | inactive_since | invalidation_reason ----------+--------+----------------------------------+--------------------- t | f | 2024-09-03 14:26:..| inactive_timeout 4) src/sgml/config.sgml: 4a) + A value of zero (which is default) disables the timeout mechanism. Better will be: A value of zero (which is default) disables the inactive timeout invalidation mechanism . or A value of zero (which is default) disables the slot invalidation due to the inactive timeout mechanism. i.e. rephrase to indicate that invalidation is disabled. 4b) 'synced' and inactive_since should point to pg_replication_slots: example: <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>synced</structfield> 5) src/sgml/system-views.sgml: + ..the slot has been inactive for longer than the duration specified by replication_slot_inactive_timeout parameter. Better to have: ..the slot has been inactive for a time longer than the duration specified by the replication_slot_inactive_timeout parameter. thanks Shveta