On Fri, Jan 28, 2022 at 2:59 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Fri, Jan 28, 2022 at 1:49 AM David G. Johnston > <david.g.johns...@gmail.com> wrote: > > > > On Thu, Jan 27, 2022 at 5:08 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > >> > >> On Thu, Jan 27, 2022 at 11:16 AM Andres Freund <and...@anarazel.de> wrote: > >> > > >> > On 2022-01-25 20:27:07 +0900, Masahiko Sawada wrote: > >> > > >> > > There will be some challenges in a case where updating > >> > > pg_subscription_rel > >> > > also failed too (what to report to the user, etc.). And moreover, we > >> > > don't > >> > > want to consume space for temporary information in the system catalog. > >> > > >> > You're consuming resources in a *WAY* worse way right now. The stats > >> > file gets > >> > constantly written out, and quite often read back by backends. In > >> > contrast to > >> > parts of pg_subscription_rel or such that data can't be removed from > >> > shared_buffers under pressure. > >> > > >> > >> I don't think pg_subscription_rel is the right place to store error > >> info as the error can happen say while processing some message type > >> like BEGIN where we can't map it to pg_subscription_rel entry. There > >> could be other cases as well where we won't be able to map it to > >> pg_subscription_rel like some error related to some other table while > >> processing trigger function. > >> > >> In general, there doesn't appear to be much advantage in storing all > >> the error info in system catalogs as we don't want it to be persistent > >> (crash-safe). Also, this information is not about any system object > >> that future operations can use, so won't map from that angle as well. > > > > > > Repeating myself here to try and keep complaints regarding > > pg_stat_subscription_worker in one place. > > > > This is my specific email with respect to the pg_stat_scription_workers > > design. > > > > https://www.postgresql.org/message-id/CAKFQuwZbFuPSV1WLiNFuODst1sUZon2Qwbj8d9tT%3D38hMhJfvw%40mail.gmail.com > > > > Specifically, > > > > pg_stat_subscription_workers is defined as showing: > > "will contain one row per subscription > > worker on which errors have occurred, for workers applying logical > > replication changes and workers handling the initial data copy of the > > subscribed tables." > > > > The fact that these errors remain (last_error_*) even after they are no > > longer relevant is my main gripe regarding this feature. The information > > itself is generally useful though last_error_count is not. These fields > > should auto-clear and be named current_error_* as they exist for purposes > > of describing the current state of any error-encountering logical > > replication worker so that ALTER SUBSCRIPTION SKIP, or someone other manual > > intervention, can be done with that knowledge without having to scan the > > subscriber's server logs. > > > > We can discuss names of columns but the main reason was that tomorrow > say we want to account for total errors not only the current error > then we have to introduce the field error_count or something like that > which will then conflict with names like current_*. Similar for > transaction abort_count. In the initial versions of the patch, we were > not using last_* for column names but similar arguments led us to > change names to last_ terminology and the same was being used in > pg_stat_archiver. But, feel free to suggest better names. Yes, I agree > with an auto-clear point as well and there seems to be an agreement > for doing it after the next successful apply and or after we skipped > the failed xact. > > > This is my email trying to understand reality better in order to figure out > > what exactly is causing the limitations that are negatively impacting the > > design of this feature. > > > > https://www.postgresql.org/message-id/CAKFQuwYJ7dsW%2BStsw5%2BZVoY3nwQ9j6pPt-7oYjGddH-h7uVb%2Bg%40mail.gmail.com > > > > In short, it was convenient to use the statistics collector here even if > > doing so resulted in a non-user friendly (IMO) design. Given all of the > > limitations to the statistics collection infrastructure, and the fact that > > this data is not statistical in the usual usage of the term, I find that to > > be less than satisfying. > > > > I think the failures/conflicts are also important information for > users to know, so having a view of those doesn't appear to be a bad > idea. All this data is less suitable for system catalogs like > pg_subscription_rel or others for the reasons quoted in my previous > email [1].
I see that it's better to use a better IPC for ALTER SUBSCRIPTION SKIP feature to pass error-XID or error-LSN information to the worker whereas I'm also not sure of the advantages in storing all error information in a system catalog. Since what we need to do for this purpose is only error-XID/LSN, we can store only error-XID/LSN in the catalog? That is, the worker stores error-XID/LSN in the catalog on an error, and ALTER SUBSCRIPTION SKIP command enables the worker to skip the transaction in question. The worker clears the error-XID/LSN after successfully applying or skipping the first non-empty transaction. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/