On Mon, Jul 29, 2024, at 6:11 PM, Euler Taveira wrote:
> The options are:
> 
> (a) temporary replication slot: requires an additional replication slot.
> small payload. it is extremely slow in comparison with the other
> options.
> (b) logical message: can be consumed by logical replication when/if it
> is supported some day. big payload. fast.
> (c) snapshot of running txn:  small payload. fast.
> (d) named restore point: biggest payload. fast.
> 
> I don't have a strong preference but if I need to pick one I would
> choose option (c) or option (d). The option (a) is out of the question.

I'm attaching a patch that implements option (c). While reading the
code, I noticed a comment that should have been removed by commit
b9639138262. 0002 removes it.
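
To illustrate why an idle primary stalls the recovery, here is a toy
model in C. It is not PostgreSQL code: ToyWalRecord and
toy_recovery_stop_index are hypothetical names, and the model assumes
that recovery stops once it has replayed a record that starts at or
after recovery_target_lsn.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a WAL record: only its boundaries. */
typedef struct ToyWalRecord
{
	uint64_t	start_lsn;		/* where the record begins */
	uint64_t	end_lsn;		/* where the next record would begin */
} ToyWalRecord;

/*
 * Return the index of the record after which the toy recovery stops, or -1
 * if no available record starts at or after the target. In the latter case
 * the recovery has to wait for more WAL (assumed stop rule, see above).
 */
static int
toy_recovery_stop_index(const ToyWalRecord *wal, size_t nrecords,
						uint64_t target_lsn)
{
	for (size_t i = 0; i < nrecords; i++)
	{
		if (wal[i].start_lsn >= target_lsn)
			return (int) i;
	}

	return -1;
}
```

With records [100, 140) and [140, 200) and a target of 200 (the end of
the last record, which is what an idle server hands back), the function
returns -1: nothing starts at the target yet. Appending one more record
that starts at 200 makes it return 2, which is the effect the extra WAL
record in 0001 is meant to have.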


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From f4afe05fc7e73c5c23bcdeba4fc65a538c83b8ba Mon Sep 17 00:00:00 2001
From: Euler Taveira <eu...@eulerto.com>
Date: Mon, 29 Jul 2024 19:44:16 -0300
Subject: [PATCH 1/2] pg_createsubscriber: fix slow recovery

If the primary server is idle when you are running pg_createsubscriber,
recovery used to take some time to finish. The reason is that
pg_createsubscriber was using the LSN returned by
pg_create_logical_replication_slot as recovery_target_lsn. This LSN
points to the next WAL record, which might not have been written yet;
hence, the recovery routine waits until some activity writes a WAL
record before it can end. Inject a new WAL record after creating the
last replication slot to avoid the slowness.

Discussion: https://www.postgresql.org/message-id/2377319.1719766794%40sss.pgh.pa.us
---
 src/bin/pg_basebackup/pg_createsubscriber.c | 23 +++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index b02318782a6..00976c643a1 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -778,6 +778,29 @@ setup_publisher(struct LogicalRepInfo *dbinfo)
 		else
 			exit(1);
 
+		/*
+		 * An idle server might not write a new WAL record until the recovery
+		 * is about to end. Since pg_createsubscriber uses the LSN returned
+		 * by the last replication slot as recovery_target_lsn, this LSN is
+		 * ahead of the current WAL position and the recovery waits until
+		 * something writes a WAL record to reach the target and end the
+		 * recovery. To avoid the recovery slowness in this case, inject a
+		 * new WAL record here.
+		 */
+		if (i == num_dbs - 1 && !dry_run)
+		{
+			PGresult   *res;
+
+			res = PQexec(conn, "SELECT pg_log_standby_snapshot()");
+			if (PQresultStatus(res) != PGRES_TUPLES_OK)
+			{
+				pg_log_error("could not write an additional WAL record: %s",
+							 PQresultErrorMessage(res));
+				disconnect_database(conn, true);
+			}
+			PQclear(res);
+		}
+
 		disconnect_database(conn, false);
 	}
 
-- 
2.30.2

From 75558e8379abae3a642583f31b21e0ca5db80d2b Mon Sep 17 00:00:00 2001
From: Euler Taveira <eu...@eulerto.com>
Date: Mon, 29 Jul 2024 20:59:32 -0300
Subject: [PATCH 2/2] pg_createsubscriber: remove obsolete comment

This comment should have been removed by commit b9639138262. There is no
replication slot check on primary anymore.
---
 src/bin/pg_basebackup/pg_createsubscriber.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index 00976c643a1..87668640f78 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -2209,10 +2209,7 @@ main(int argc, char **argv)
 	stop_standby_server(subscriber_dir);
 
 	/*
-	 * Create the required objects for each database on publisher. This step
-	 * is here mainly because if we stop the standby we cannot verify if the
-	 * primary slot is in use. We could use an extra connection for it but it
-	 * doesn't seem worth.
+	 * Create the required objects for each database on publisher.
 	 */
 	consistent_lsn = setup_publisher(dbinfo);
 
-- 
2.30.2
