On Mon, Dec 6, 2021 at 10:07 AM Mark Dilger <mark.dil...@enterprisedb.com> wrote:
>
> > On Dec 1, 2021, at 8:48 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > The patch disables the subscription for non-transient errors. I am not
> > sure if we can easily make the call to decide whether any particular
> > error is transient or not. For example, DISK_FULL or OUT_OF_MEMORY
> > might not rectify itself. Why not just allow to disable the
> > subscription on any error? And then let the user check the error
> > either in view or logs and decide whether it would like to enable the
> > subscription or do something before it (like making space in disk, or
> > fixing the network).
>
> The original idea of the patch, back when I first wrote and proposed it, was
> to remove the *absurdity* of retrying a transaction which, in the absence of
> human intervention, was guaranteed to simply fail again ad infinitum.
> Retrying in the face of resource errors is not *absurd* even though it might
> fail again ad infinitum. The reason is that there is at least a chance that
> the situation will clear up without human intervention.
>
> > The other problem I see with this transient error stuff is maintaining
> > the list of error codes that we think are transient. I think we need a
> > discussion for each of the error_codes we are listing now and whatever
> > new error_code we add in the future which doesn't seem like a good
> > idea.
>
> A reasonable rule might be: "the subscription will be disabled if the server
> can determine that retries cannot possibly succeed without human
> intervention." We shouldn't need to categorize all error codes perfectly, as
> long as we're conservative. What I propose is similar to how we determine
> whether to mark a function leakproof; we don't have to mark all leakproof
> functions as such, we just can't mark one as such if it is not.
>
> If we're going to debate the error codes, I think we would start with an
> empty list, and add to the list on sufficient analysis.
>

Yeah, an empty list is sort of what I thought was a good starting point. I
feel we should learn from real-world use cases to see if people really want
to continue retrying even after using this option.

--
With Regards,
Amit Kapila.
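
To make the "conservative, start with an empty list" rule quoted above concrete, here is a minimal sketch. It is not taken from the patch: the names `NON_RETRYABLE_SQLSTATES` and `should_disable_subscription` are hypothetical, and listing `23505` (unique_violation) in the second half is purely an illustrative assumption, not something this thread has agreed on.

```python
# Hypothetical sketch of the rule discussed in this thread; not patch code.

# SQLSTATE codes for which retries are known to be futile without human
# intervention. Deliberately empty to start; entries are added only after
# sufficient analysis.
NON_RETRYABLE_SQLSTATES = frozenset()


def should_disable_subscription(sqlstate, non_retryable=NON_RETRYABLE_SQLSTATES):
    """Return True only if the error code is on the analysed list.

    Any code that is not listed (including ones nobody has looked at yet)
    returns False, so the apply worker would simply keep retrying.
    """
    return sqlstate in non_retryable


# With the empty starting list, no error disables the subscription.
print(should_disable_subscription("53100"))  # disk_full
print(should_disable_subscription("53200"))  # out_of_memory

# Illustrative assumption: suppose analysis later concluded that a
# unique_violation cannot clear up by itself.
analysed = frozenset({"23505"})
print(should_disable_subscription("23505", analysed))
print(should_disable_subscription("53100", analysed))
```

The point of the default-to-retry design is that leaving a code off the list is always safe (it only preserves today's behaviour), whereas adding a code that can in fact clear up without intervention would be the mistake, much like the leakproof analogy above.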