Re: Add an option to skip loading missing publication to avoid logical replication failure

Xuneng Zhou Tue, 06 May 2025 03:04:11 -0700

Hi,

A clear benefit of addressing this in code is that the user is guaranteed
to see the log message, which can be valuable for troubleshooting, even
under race conditions.


    ereport(WARNING,
            errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
            errmsg("skipped loading publication: %s", pubname),
            errdetail("The publication does not exist at this point in the WAL."),
            errhint("Create the publication if it does not exist."));


The performance impact appears low, assuming the
AcceptInvalidationMessages and maybe_reread_subscription calls are
added only to the code path that handles keepalive messages requiring
a reply.
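
To make the shape of that change concrete, here is a minimal standalone
sketch. It is not PostgreSQL code: ApplyWorkerSim and the sim_* functions
are hypothetical stand-ins for the apply worker's state,
AcceptInvalidationMessages and maybe_reread_subscription. It only
illustrates that the extra checks would run on keepalives that request a
reply and be skipped otherwise.

```c
#include <stdbool.h>

/* Hypothetical stand-in for apply-worker state; not a real PostgreSQL struct. */
typedef struct ApplyWorkerSim
{
    bool pending_invalidation;  /* a catalog change is queued for this worker */
    bool subscription_stale;    /* cached subscription no longer matches the catalog */
    int  reread_count;          /* number of times the subscription was re-read */
} ApplyWorkerSim;

/* Stand-in for AcceptInvalidationMessages(): drain queued invalidations. */
static void
sim_accept_invalidation_messages(ApplyWorkerSim *w)
{
    if (w->pending_invalidation)
    {
        w->pending_invalidation = false;
        w->subscription_stale = true;
    }
}

/* Stand-in for maybe_reread_subscription(): reload only when marked stale. */
static void
sim_maybe_reread_subscription(ApplyWorkerSim *w)
{
    if (w->subscription_stale)
    {
        w->subscription_stale = false;
        w->reread_count++;
    }
}

/*
 * Keepalive handling: the catalog checks run only when the sender asked
 * for a reply. Returns true if the checks ran.
 */
static bool
sim_handle_keepalive(ApplyWorkerSim *w, bool reply_requested)
{
    if (!reply_requested)
        return false;           /* cheap path: nothing extra to do */

    sim_accept_invalidation_messages(w);
    sim_maybe_reread_subscription(w);
    return true;
}
```

In this sketch, a keepalive that does not request a reply leaves any
pending invalidation untouched, so the common path stays cheap.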

>
> > vignesh C <vignes...@gmail.com> writes:
> > > Due to the asynchronous nature of these processes, the ALTER
> > > SUBSCRIPTION command may not be immediately observed by the apply
> > > worker. Meanwhile, the walsender may process and decode an INSERT
> > > statement.
> > > If the insert targets a table (e.g., tab_3) that does not belong to
> > > the current publication (pub1), the walsender silently skips
> > > replicating the record and advances its decoding position. This
> > > position is sent in a keepalive message to the subscriber, and since
> > > there are no pending transactions to flush, the apply worker reports
> > > it as the latest received LSN.
> >
> > So this theory presumes that the apply worker receives and reacts to
> > the keepalive message, yet it has not observed a relevant
> > subscriber-side catalog update that surely committed before the
> > keepalive was generated.  It's fairly hard to see how that is okay,
> > because it's at least adjacent to something that must be considered a
> > bug: applying transmitted data without having observed DDL updates to
> > the target table.  Why is the processing of keepalives laxer than the
> > processing of data messages?
> >
>
> Valid question. As of now, we don't have a specific rule about
> ordering the processing of keepalives or invalidation messages. The
> effect of invalidation messages is realized by calling
> maybe_reread_subscription at three different times after accepting
> invalidation messages: (a) after starting a transaction in
> begin_replication_step, (b) in the commit message handling if no
> data modification happened in that transaction, and (c) when we
> don't get any transactions for a while.
>
> The (a) ensures we consume any target table change before applying a
> new transaction. The other two places ensure that we keep consuming
> invalidation messages from time to time.
>
> Now, we can consume invalidation messages during keepalive message
> handling and/or at some other places, to ensure that we never process
> any remote message before consuming an invalidation message. However,
> it is not clear to me whether this is a must. We can provide
> strict guarantees for ordering of messages from any one of the
> servers, but providing it across nodes doesn't sound like a
> must-have criterion.
>
> --
> With Regards,
> Amit Kapila.
>
