Re: Design of pg_stat_subscription_workers vs pgstats

Masahiko Sawada Mon, 31 Jan 2022 22:18:05 -0800

On Fri, Jan 28, 2022 at 2:59 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Fri, Jan 28, 2022 at 1:49 AM David G. Johnston
> <david.g.johns...@gmail.com> wrote:
> >
> > On Thu, Jan 27, 2022 at 5:08 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >>
> >> On Thu, Jan 27, 2022 at 11:16 AM Andres Freund <and...@anarazel.de> wrote:
> >> >
> >> > On 2022-01-25 20:27:07 +0900, Masahiko Sawada wrote:
> >> >
> >> > > There will be some challenges in a case where updating 
> >> > > pg_subscription_rel
> >> > > also failed too (what to report to the user, etc.). And moreover, we 
> >> > > don't
> >> > > want to consume space for temporary information in the system catalog.
> >> >
> >> > You're consuming resources in a *WAY* worse way right now. The stats 
> >> > file gets
> >> > constantly written out, and quite often read back by backends. In 
> >> > contrast to
> >> > parts of pg_subscription_rel or such that data can't be removed from
> >> > shared_buffers under pressure.
> >> >
> >>
> >> I don't think pg_subscription_rel is the right place to store error
> >> info as the error can happen say while processing some message type
> >> like BEGIN where we can't map it to pg_subscription_rel entry. There
> >> could be other cases as well where we won't be able to map it to
> >> pg_subscription_rel like some error related to some other table while
> >> processing trigger function.
> >>
> >> In general, there doesn't appear to be much advantage in storing all
> >> the error info in system catalogs as we don't want it to be persistent
> >> (crash-safe). Also, this information is not about any system object
> >> that future operations can use, so won't map from that angle as well.
> >
> >
> > Repeating myself here to try and keep complaints regarding 
> > pg_stat_subscription_worker in one place.
> >
> > This is my specific email with respect to the pg_stat_scription_workers 
> > design.
> >
> > https://www.postgresql.org/message-id/CAKFQuwZbFuPSV1WLiNFuODst1sUZon2Qwbj8d9tT%3D38hMhJfvw%40mail.gmail.com
> >
> > Specifically,
> >
> > pg_stat_subscription_workers is defined as showing:
> > "will contain one row per subscription
> > worker on which errors have occurred, for workers applying logical
> > replication changes and workers handling the initial data copy of the
> > subscribed tables."
> >
> > The fact that these errors remain (last_error_*) even after they are no 
> > longer relevant is my main gripe regarding this feature.  The information 
> > itself is generally useful though last_error_count is not.  These fields 
> > should auto-clear and be named current_error_* as they exist for purposes 
> > of describing the current state of any error-encountering logical 
> > replication worker so that ALTER SUBSCRIPTION SKIP, or someone other manual 
> > intervention, can be done with that knowledge without having to scan the 
> > subscriber's server logs.
> >
>
> We can discuss names of columns but the main reason was that tomorrow
> say we want to account for total errors not only the current error
> then we have to introduce the field error_count or something like that
> which will then conflict with names like current_*. Similar for
> transaction abort_count. In the initial versions of the patch, we were
> not using last_* for column names but similar arguments led us to
> change names to last_ terminology and the same was being used in
> pg_stat_archiver. But, feel free to suggest better names. Yes, I agree
> with an auto-clear point as well and there seems to be an agreement
> for doing it after the next successful apply and or after we skipped
> the failed xact.
>
> > This is my email trying to understand reality better in order to figure out 
> > what exactly is causing the limitations that are negatively impacting the 
> > design of this feature.
> >
> > https://www.postgresql.org/message-id/CAKFQuwYJ7dsW%2BStsw5%2BZVoY3nwQ9j6pPt-7oYjGddH-h7uVb%2Bg%40mail.gmail.com
> >
> > In short, it was convenient to use the statistics collector here even if 
> > doing so resulted in a non-user friendly (IMO) design.  Given all of the 
> > limitations to the statistics collection infrastructure, and the fact that 
> > this data is not statistical in the usual usage of the term, I find that to 
> > be less than satisfying.
> >
>
> I think the failures/conflicts are also important information for
> users to know, so having a view of those doesn't appear to be a bad
> idea. All this data is less suitable for system catalogs like
> pg_subscription_rel or others for the reasons quoted in my previous
> email [1].


I see that it's better to use a better IPC for ALTER SUBSCRIPTION SKIP
feature to pass error-XID or error-LSN information to the worker
whereas I'm also not sure of the advantages in storing all error
information in a system catalog. Since what we need to do for this
purpose is only error-XID/LSN, we can store only error-XID/LSN in the
catalog? That is, the worker stores error-XID/LSN in the catalog on an
error, and ALTER SUBSCRIPTION SKIP command enables the worker to skip
the transaction in question. The worker clears the error-XID/LSN after
successfully applying or skipping the first non-empty transaction.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Re: Design of pg_stat_subscription_workers vs pgstats

Reply via email to