On Mon, Sep 16, 2024 at 3:31 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > > Please find the attached v46 patch having changes for the above review > comments and your test review comments and Shveta's review comments. >
-ReplicationSlotAcquire(const char *name, bool nowait) +ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid) { ReplicationSlot *s; int active_pid; @@ -615,6 +620,22 @@ retry: /* We made this slot active, so it's ours now. */ MyReplicationSlot = s; + /* + * An error is raised if error_if_invalid is true and the slot has been + * previously invalidated due to inactive timeout. + */ + if (error_if_invalid && + s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT) + { + Assert(s->inactive_since > 0); + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("can no longer get changes from replication slot \"%s\"", + NameStr(s->data.name)), + errdetail("This slot has been invalidated because it was inactive for longer than the amount of time specified by \"%s\".", + "replication_slot_inactive_timeout"))); + } Why raise the ERROR just for timeout invalidation here and why not if the slot is invalidated for other reasons? This raises the question of what happens before this patch if the invalid slot is used from places where we call ReplicationSlotAcquire(). I did a brief code analysis and found that for StartLogicalReplication(), even if the error won't occur in ReplicationSlotAcquire(), it would have been caught in CreateDecodingContext(). I think that is where we should also add this new error. Similarly, pg_logical_slot_get_changes_guts() and other logical replication functions should be calling CreateDecodingContext() which can raise the new ERROR. I am not sure about how the invalid slots are handled during physical replication, please check the behavior of that before this patch. -- With Regards, Amit Kapila.