Re: Add an option to skip loading missing publication to avoid logical replication failure

vignesh C Tue, 04 Mar 2025 06:08:30 -0800

On Tue, 4 Mar 2025 at 12:22, vignesh C <vignes...@gmail.com> wrote:
>
> On Mon, 3 Mar 2025 at 16:41, Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Mon, Mar 3, 2025 at 2:30 PM vignesh C <vignes...@gmail.com> wrote:
> > >
> > > On Tue, 25 Feb 2025 at 15:32, vignesh C <vignes...@gmail.com> wrote:
> > > >
> > > > The attached script has the script that was used for testing. Here the
> > > > NUM_RECORDS count should be changed accordingly for each of the tests
> > > > and while running the test with the patch change uncomment the drop
> > > > publication command.
> > >
> > > I have done further analysis on the test and changed the test to
> > > compare it better with HEAD. The execution time is in milliseconds.
> > > Brach/records  |  100     |  1000   |  10000    |  100000  |  1000000
> > > Head               |   10.43  |  15.86  |   64.44    |  550.56  |   
> > > 8991.04
> > > Patch              |   11.35  |  17.26   |   73.50    |  640.21  |  
> > > 10104.72
> > > % diff              |   -8.82  |  -8.85    |   -14.08   |   -16.28  |  
> > > -12.38
> > >
> > > There is a  performance degradation in the range of 8.8 to 16.2 percent.
> > >
> >
> > - /* Validate the entry */
> > - if (!entry->replicate_valid)
> > + /*
> > + * If the publication is invalid, check for updates.
> > + * This optimization ensures that the next block, which queries the system
> > + * tables and builds the relation entry, runs only if a new publication was
> > + * created.
> > + */
> > + if (!publications_valid && data->publications)
> > + {
> > + bool skipped_pub = false;
> > + List    *publications;
> > +
> > + publications = LoadPublications(data->publication_names, &skipped_pub);
> >
> > The publications_valid flag indicates whether the publications cache
> > is valid or not; the flag is set to false for any invalidation in the
> > pg_publication catalog. I wonder that instead of using the same flag
> > what if we use a separate publications_skipped flag? If that works,
> > you don't even need to change the current location where we
> > LoadPublications.
>
> There is almost negligible dip with the above suggested way, the test
> results for the same is given below(execution time is in milli
> seconds):
> Brach/records  |  100     |  1000   |  10000    |  100000 |  1000000
> Head               |   10.25  |  15.85  |   65.53    |  569.15  |  9194.19
> Patch              |   10.25  |  15.84  |   65.91    |  571.75  |  9208.66
> % diff              |      0.00 |    0.06  |    -0.58    |     -0.46  |    
> -0.16
>
> There is a performance dip in the range of 0 to 0.58 percent.
> The attached patch has the changes for the same. The test script used
> is also attached.


On further thinking, I felt the use of publications_updated variable
is not required we can use publications_valid itself which will be set
if the publication system table is invalidated. Here is a patch for
the same.

Regards,
Vignesh

From 856403e4943954267b0e9b4fa96ccc6955838210 Mon Sep 17 00:00:00 2001
From: Vignesh <vignes...@gmail.com>
Date: Mon, 3 Mar 2025 11:54:27 +0530
Subject: [PATCH v6] Fix logical replication breakage after ALTER SUBSCRIPTION
 ... SET PUBLICATION

Altering a subscription with `ALTER SUBSCRIPTION ... SET PUBLICATION`
could cause logical replication to break under certain conditions. When
the apply worker restarts after executing SET PUBLICATION, it continues
using the existing replication slot and origin. If the replication origin
was not updated before the restart, the WAL start location could point to
a position prior to the existence of the specified publication, leading to
persistent start of apply worker and reporting errors.

This patch skips loading the publication if the publication does not exist
and loads the publication and updates the relation entry when the publication
gets created.

Discussion: https://www.postgresql.org/message-id/flat/CALDaNm0-n8FGAorM%2BbTxkzn%2BAOUyx5%3DL_XmnvOP6T24%2B-NcBKg%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/caa4ek1+t-etxerm4dhwzgxbpkaflcp__5bpa_qzffqp7-0w...@mail.gmail.com
---
 src/backend/replication/pgoutput/pgoutput.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 7d464f656aa..4cfd2b02023 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1762,6 +1762,8 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
 
 /*
  * Load publications from the list of publication names.
+ *
+ * Here, we just skip the publications that don't exist yet.
  */
 static List *
 LoadPublications(List *pubnames)
@@ -1772,9 +1774,12 @@ LoadPublications(List *pubnames)
 	foreach(lc, pubnames)
 	{
 		char	   *pubname = (char *) lfirst(lc);
-		Publication *pub = GetPublicationByName(pubname, false);
+		Publication *pub = GetPublicationByName(pubname, true);
 
-		result = lappend(result, pub);
+		if (pub)
+			result = lappend(result, pub);
+		else
+			elog(WARNING, "skipped loading publication: %s", pubname);
 	}
 
 	return result;
@@ -2053,8 +2058,11 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		entry->attrmap = NULL;
 	}
 
-	/* Validate the entry */
-	if (!entry->replicate_valid)
+	/*
+	 * Validate the entry if (a) the entry is not valid, or (b) there is a
+	 * change in the publications.
+	 */
+	if (!entry->replicate_valid || !publications_valid)
 	{
 		Oid			schemaId = get_rel_namespace(relid);
 		List	   *pubids = GetRelationPublications(relid);
-- 
2.43.0

Re: Add an option to skip loading missing publication to avoid logical replication failure

Reply via email to