Re: Optionally automatically disable logical replication subscriptions on error

Amit Kapila Mon, 06 Dec 2021 02:59:34 -0800

On Mon, Dec 6, 2021 at 10:07 AM Mark Dilger
<mark.dil...@enterprisedb.com> wrote:
>
> > On Dec 1, 2021, at 8:48 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > The patch disables the subscription for non-transient errors. I am not
> > sure if we can easily make the call to decide whether any particular
> > error is transient or not. For example, DISK_FULL or OUT_OF_MEMORY
> > might not rectify itself. Why not just allow disabling the
> > subscription on any error? Then let the user check the error
> > either in the view or in the logs and decide whether to enable the
> > subscription or do something first (like freeing up disk space or
> > fixing the network).
>
> The original idea of the patch, back when I first wrote and proposed it, was 
> to remove the *absurdity* of retrying a transaction which, in the absence of 
> human intervention, was guaranteed to simply fail again ad infinitum.  
> Retrying in the face of resource errors is not *absurd* even though it might 
> fail again ad infinitum.  The reason is that there is at least a chance that 
> the situation will clear up without human intervention.
>
> > The other problem I see with this transient error stuff is maintaining
> > the list of error codes that we think are transient. I think we need a
> > discussion for each of the error_codes we are listing now and for
> > whatever new error_code we add in the future, which doesn't seem
> > like a good idea.
>
> A reasonable rule might be:  "the subscription will be disabled if the server 
> can determine that retries cannot possibly succeed without human 
> intervention."  We shouldn't need to categorize all error codes perfectly, as 
> long as we're conservative.  What I propose is similar to how we determine 
> whether to mark a function leakproof; we don't have to mark all leakproof 
> functions as such, we just can't mark one as such if it is not.
>
> If we're going to debate the error codes, I think we would start with an 
> empty list, and add to the list on sufficient analysis.
>


Yeah, an empty list is sort of what I thought was a good starting
point. I feel we should learn from real-world use cases to see if
people really want to continue retrying even after using this option.
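
To make the "empty list" starting point concrete, here is a minimal
sketch in C. It is not code from the patch. It assumes one reading of
the discussion: the list holds the SQLSTATE codes we treat as transient,
so while the list is empty every apply error disables the subscription
once the option is on. All names here (transient_sqlstates,
sqlstate_is_transient, action_on_apply_error, option_enabled) are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical list of SQLSTATE codes treated as transient (worth
 * retrying). It starts empty; a code such as "53100" (disk_full) would
 * be added only after case-by-case analysis. NULL terminates the list.
 */
static const char *const transient_sqlstates[] = { NULL };

typedef enum
{
    APPLY_RETRY,
    APPLY_DISABLE_SUBSCRIPTION
} ApplyErrorAction;

/* Return true if the given SQLSTATE appears in the transient list. */
static bool
sqlstate_is_transient(const char *sqlstate)
{
    for (size_t i = 0; transient_sqlstates[i] != NULL; i++)
    {
        if (strcmp(transient_sqlstates[i], sqlstate) == 0)
            return true;
    }
    return false;
}

/*
 * Decide what to do after an apply error. option_enabled stands in for
 * the proposed per-subscription setting (assumed, not the patch's name).
 */
static ApplyErrorAction
action_on_apply_error(bool option_enabled, const char *sqlstate)
{
    if (!option_enabled)
        return APPLY_RETRY;     /* option off: keep retrying as today */
    if (sqlstate_is_transient(sqlstate))
        return APPLY_RETRY;     /* listed as transient: retry */
    return APPLY_DISABLE_SUBSCRIPTION;
}
```

With the list empty, a unique violation ("23505") and a full disk
("53100") are handled the same way when the option is on: the
subscription is disabled and the user decides when to re-enable it.
Adding a code to the list later changes only that one case.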


-- 
With Regards,
Amit Kapila.
