On Mon, Jul 29, 2024, at 6:11 PM, Euler Taveira wrote:
> The options are:
>
> (a) temporary replication slot: requires an additional replication slot.
>     small payload. it is extremely slow in comparison with the other
>     options.
> (b) logical message: can be consumed by logical replication when/if it
>     is supported some day. big payload. fast.
> (c) snapshot of running txn: small payload. fast.
> (d) named restore point: biggest payload. fast.
>
> I don't have a strong preference but if I need to pick one I would
> choose option (c) or option (d). The option (a) is out of the question.

I'm attaching a patch that implements option (c). While reading the code, I
noticed a leftover comment that should have been removed by commit
b9639138262. 0002 removes it.

--
Euler Taveira
EDB   https://www.enterprisedb.com/

From f4afe05fc7e73c5c23bcdeba4fc65a538c83b8ba Mon Sep 17 00:00:00 2001
From: Euler Taveira <eu...@eulerto.com>
Date: Mon, 29 Jul 2024 19:44:16 -0300
Subject: [PATCH 1/2] pg_createsubscriber: fix slow recovery

If the primary server is idle when you are running pg_createsubscriber,
recovery used to take some time. The reason is that it was using the
LSN returned by pg_create_logical_replication_slot as
recovery_target_lsn. This LSN points to the next WAL record, which might
not have been written to WAL yet; hence, the recovery routine waits
until some activity writes a WAL record to end the recovery. Inject a
new WAL record after the last replication slot to avoid slowness.

Discussion: https://www.postgresql.org/message-id/2377319.1719766794%40sss.pgh.pa.us
---
 src/bin/pg_basebackup/pg_createsubscriber.c | 23 +++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index b02318782a6..00976c643a1 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -778,6 +778,29 @@ setup_publisher(struct LogicalRepInfo *dbinfo)
 		else
 			exit(1);
 
+		/*
+		 * An idle server might not write a new WAL record until the recovery
+		 * is about to end. Since pg_createsubscriber is using the LSN
+		 * returned by the last replication slot as recovery_target_lsn, this
+		 * LSN is ahead of the current WAL position and the recovery waits
+		 * until something writes a WAL record to reach the target and end
+		 * the recovery. To avoid the recovery slowness in this case, inject
+		 * a new WAL record here.
+		 */
+		if (i == num_dbs - 1 && !dry_run)
+		{
+			PGresult   *res;
+
+			res = PQexec(conn, "SELECT pg_log_standby_snapshot()");
+			if (PQresultStatus(res) != PGRES_TUPLES_OK)
+			{
+				pg_log_error("could not write an additional WAL record: %s",
+							 PQresultErrorMessage(res));
+				disconnect_database(conn, true);
+			}
+			PQclear(res);
+		}
+
 		disconnect_database(conn, false);
 	}
 
--
2.30.2

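
To illustrate the wait described in the commit message above, here is a toy
model in C. It is only a sketch, not PostgreSQL code: ToyLsn and
toy_first_record_reaching_target() are hypothetical names, and the model
assumes that replay can stop only once it sees a record that starts at or
after the target LSN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: an LSN is treated as a plain byte position in WAL. */
typedef uint64_t ToyLsn;

/*
 * Return the index of the first record that starts at or after the
 * target LSN, or -1 if no such record has been written yet. In the
 * -1 case a replaying server would have to keep waiting for more WAL.
 */
int
toy_first_record_reaching_target(const ToyLsn *record_starts,
								 size_t nrecords, ToyLsn target)
{
	assert(record_starts != NULL || nrecords == 0);

	for (size_t i = 0; i < nrecords; i++)
	{
		if (record_starts[i] >= target)
			return (int) i;
	}
	return -1;
}
```

With record starts {100, 140, 180} and a target of 220 (the position where
the next record would begin on an idle server), the function returns -1.
Appending one more record that starts at 220, which is the role the injected
record plays in the patch, makes it return 3.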
From 75558e8379abae3a642583f31b21e0ca5db80d2b Mon Sep 17 00:00:00 2001
From: Euler Taveira <eu...@eulerto.com>
Date: Mon, 29 Jul 2024 20:59:32 -0300
Subject: [PATCH 2/2] pg_createsubscriber: remove obsolete comment

This comment should have been removed by commit b9639138262. There is no
replication slot check on primary anymore.
---
 src/bin/pg_basebackup/pg_createsubscriber.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index 00976c643a1..87668640f78 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -2209,10 +2209,7 @@ main(int argc, char **argv)
 	stop_standby_server(subscriber_dir);
 
 	/*
-	 * Create the required objects for each database on publisher. This step
-	 * is here mainly because if we stop the standby we cannot verify if the
-	 * primary slot is in use. We could use an extra connection for it but it
-	 * doesn't seem worth.
+	 * Create the required objects for each database on publisher.
 	 */
 	consistent_lsn = setup_publisher(dbinfo);
 
--
2.30.2