On Wed, Feb 2, 2022 at 5:08 AM Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Wed, Feb 2, 2022 at 1:06 PM David G. Johnston
> <david.g.johns...@gmail.com> wrote:
>
> ...
> >
> > I already explained that the concept of err_cnt is not useful.  The fact
> > that you include it here makes me think you are still thinking that this
> > all somehow is meant to keep track of history.  It is not.  The workers are
> > state machines and "error" is one of the states - with relevant attributes
> > to display to the user, and system, while in that state.  The state machine
> > reporting does not care about historical states nor does it report on
> > them.  There is some uncertainty if we continue with the automatic
> > re-launch;
> >
>
> I think automatic retry will help to allow some transient errors say
> like network glitches that can be resolved on retry and will keep the
> behavior transparent. This is also consistent with what we do in
> standby mode where if there is an error on primary due to which
> standby is not able to fetch some data it will just retry. We can't
> fix any error that occurred on the server-side, so the way is to retry
> which is true for both standby and subscribers.
>

Good points.  In short, there are two subsets of problems to deal with
here.  We should address them separately, though the pg_subscription_worker
table should provide relevant information for both cases.  If we are in a
retry situation, relevant information such as an (estimated)
next_scheduled_retry should be provided, if there is some kind of delay
involved.  In a situation like "unique constraint violation" the
next_scheduled_retry would be null; or make the field a text field and
print "Manual Intervention Required".  Likewise, the XID/LSN would be null
in a retry situation, since we haven't received a wholly intact transaction
from the publisher.  We may know of such an ID, but if the final COMMIT
message is never seen before the feed dies, we should not expose that
incomplete information to the user.
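
To make the two cases concrete, here is a rough sketch of how the
per-worker error state could be filled in.  It is illustrative Python, not
proposed server code: next_scheduled_retry is the field discussed above,
while ErrorKind, WorkerErrorState, report_error, failed_xid and
failed_commit_lsn are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class ErrorKind(Enum):
    # Hypothetical classification of the two subsets of problems.
    TRANSIENT = "transient"            # e.g. network glitch; a retry may succeed
    NEEDS_INTERVENTION = "manual"      # e.g. unique constraint violation


@dataclass
class WorkerErrorState:
    """Hypothetical shape of the current (not historical) error state."""
    message: str
    next_scheduled_retry: Optional[datetime]  # None => manual intervention required
    failed_xid: Optional[int]                 # None while only retrying
    failed_commit_lsn: Optional[str]          # None while only retrying


def report_error(
    kind: ErrorKind,
    message: str,
    now: datetime,
    retry_delay: timedelta,
    xid: Optional[int] = None,
    commit_lsn: Optional[str] = None,
) -> WorkerErrorState:
    """Build the state row for a worker that has just entered the error state."""
    if kind is ErrorKind.TRANSIENT:
        # No wholly intact transaction was received, so any XID/LSN we may
        # have seen is deliberately not exposed; only the retry time is.
        return WorkerErrorState(message, now + retry_delay, None, None)
    # The transaction arrived in full but cannot be applied: report which
    # one failed, and leave the retry time empty.
    return WorkerErrorState(message, None, xid, commit_lsn)
```

With this shape, a reader of the row can tell the two situations apart
from the nulls alone: a non-null next_scheduled_retry means "wait", while
a non-null XID/LSN means "someone has to fix this".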

A standby is not expected to encounter any user data constraint problems, so
even a system that requires manual intervention for such problems will work
for standbys, because they will never hit that code path.  And you cannot
simply skip applying the failed transaction and move on to the next one -
that data also never came over.

David J.
