On Tue, Dec 30, 2025 at 02:34:21PM +0700, Alena Vinter wrote: > I’m now back on track and have refined the implementation based on earlier > discussions. The current version fully adopts the `include_if_exists` > approach, writes temporary recovery settings to a separate config file, and > disables it on exit by renaming to `.disabled`.
It seems roughly OK, so I have put my hands on it. A couple of notes regarding the things done in the updated version attached: - Addition of two tests to check pg_createsubscriber.conf.disabled in the data folders of node S and K. - More description in the tests. - The "dry run mode" node has disappeared from the recovery parameter StringInfo, so added it back at the top of the parameters generated. - Missing a newline after the include_if_exists. - dirable_rename() logs already something on failure, I see no need for an extra warning to say the same. Adding the warning telling that a manual intervention may be required is good, though. - Let's group the document change with the main patch. - More stylistic changes, comments and code. - The new test fails if we undo the changes in pg_createsubscriber.c, as we'd want, in the shape of node K FATAL-ing due to an incorrect recovery configuration fed to the node. Alena, what do you think? -- Michael
From e4601823d84b8226f3a230ddb131c00dfdd8b46b Mon Sep 17 00:00:00 2001 From: Alena Vinter <[email protected]> Date: Tue, 30 Dec 2025 13:56:18 +0700 Subject: [PATCH v10] pg_createsubscriber: use include_if_exists for recovery config and disable on exit Write temporary recovery parameters to pg_createsubscriber.conf and include it from postgresql.auto.conf via include_if_exists. Upon completion or failure, rename the file to pg_createsubscriber.conf.disabled so it is ignored on subsequent restarts. --- src/bin/pg_basebackup/pg_createsubscriber.c | 100 ++++++++++++++---- .../t/040_pg_createsubscriber.pl | 46 ++++++++ doc/src/sgml/ref/pg_createsubscriber.sgml | 12 ++- 3 files changed, 132 insertions(+), 26 deletions(-) diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c index 9a825bfd166e..3ce24860c04b 100644 --- a/src/bin/pg_basebackup/pg_createsubscriber.c +++ b/src/bin/pg_basebackup/pg_createsubscriber.c @@ -20,6 +20,7 @@ #include "common/connect.h" #include "common/controldata_utils.h" +#include "common/file_utils.h" #include "common/logging.h" #include "common/pg_prng.h" #include "common/restricted_token.h" @@ -33,6 +34,21 @@ #define DEFAULT_SUB_PORT "50432" #define OBJECTTYPE_PUBLICATIONS 0x0001 +/* + * Configuration files for recovery parameters. + * + * The recovery parameters are set in INCLUDED_CONF_FILE, itself loaded by + * the server through an include_if_exists in postgresql.auto.conf. + * + * INCLUDED_CONF_FILE is renamed to INCLUDED_CONF_FILE_DISABLED when exiting, + * so as the recovery parameters set by this tool never take effect on node + * restart. The contents of INCLUDED_CONF_FILE_DISABLED can be useful for + * debugging. + */ +#define PG_AUTOCONF_FILENAME "postgresql.auto.conf" +#define INCLUDED_CONF_FILE "pg_createsubscriber.conf" +#define INCLUDED_CONF_FILE_DISABLED INCLUDED_CONF_FILE ".disabled" + /* Command-line options */ struct CreateSubscriberOptions { @@ -156,22 +172,43 @@ static char *subscriber_dir = NULL; static bool recovery_ended = false; static bool standby_running = false; +static bool recovery_params_set = false; /* - * Cleanup objects that were created by pg_createsubscriber if there is an - * error. + * Clean up objects created by pg_createsubscriber. * - * Publications and replication slots are created on primary. Depending on the - * step it failed, it should remove the already created objects if it is - * possible (sometimes it won't work due to a connection issue). - * There is no cleanup on the target server. The steps on the target server are - * executed *after* promotion, hence, at this point, a failure means recreate - * the physical replica and start again. + * Publications and replication slots are created on the primary. Depending + * on the step where it failed, already-created objects should be removed if + * possible (sometimes this won't work due to a connection issue). + * There is no cleanup on the target server *after* its promotion, because any + * failure at this point means recreating the physical replica and starting + * again. + * + * The recovery configuration is always removed, by renaming the included + * configuration file out of the way. */ static void cleanup_objects_atexit(void) { + /* Rename the included configuration file, if necessary. */ + if (recovery_params_set) + { + char conf_filename[MAXPGPATH]; + char conf_filename_disabled[MAXPGPATH]; + + snprintf(conf_filename, MAXPGPATH, "%s/%s", subscriber_dir, + INCLUDED_CONF_FILE); + snprintf(conf_filename_disabled, MAXPGPATH, "%s/%s", subscriber_dir, + INCLUDED_CONF_FILE_DISABLED); + + if (durable_rename(conf_filename, conf_filename_disabled) != 0) + { + /* durable_rename() has already logged something. */ + pg_log_warning_hint("A manual removal of the recovery parameters may be required."); + } + } + if (success) return; @@ -1303,6 +1340,10 @@ setup_recovery(const struct LogicalRepInfo *dbinfo, const char *datadir, const c * targets (name, time, xid, LSN). */ recoveryconfcontents = GenerateRecoveryConfig(conn, NULL, NULL); + + if (dry_run) + appendPQExpBufferStr(recoveryconfcontents, "# dry run mode\n"); + appendPQExpBufferStr(recoveryconfcontents, "recovery_target = ''\n"); appendPQExpBufferStr(recoveryconfcontents, "recovery_target_timeline = 'latest'\n"); @@ -1322,23 +1363,36 @@ setup_recovery(const struct LogicalRepInfo *dbinfo, const char *datadir, const c appendPQExpBufferStr(recoveryconfcontents, "recovery_target_name = ''\n"); appendPQExpBufferStr(recoveryconfcontents, "recovery_target_time = ''\n"); appendPQExpBufferStr(recoveryconfcontents, "recovery_target_xid = ''\n"); - - if (dry_run) - { - appendPQExpBufferStr(recoveryconfcontents, "# dry run mode\n"); - appendPQExpBuffer(recoveryconfcontents, - "recovery_target_lsn = '%X/%08X'\n", - LSN_FORMAT_ARGS((XLogRecPtr) InvalidXLogRecPtr)); - } - else - { - appendPQExpBuffer(recoveryconfcontents, "recovery_target_lsn = '%s'\n", - lsn); - WriteRecoveryConfig(conn, datadir, recoveryconfcontents); - } - disconnect_database(conn, false); + appendPQExpBuffer(recoveryconfcontents, "recovery_target_lsn = '%s'\n", lsn); pg_log_debug("recovery parameters:\n%s", recoveryconfcontents->data); + + if (!dry_run) + { + char conf_filename[MAXPGPATH]; + FILE *fd; + + /* Write the recovery parameters to INCLUDED_CONF_FILE */ + snprintf(conf_filename, MAXPGPATH, "%s/%s", datadir, + INCLUDED_CONF_FILE); + fd = fopen(conf_filename, "w"); + if (fd == NULL) + pg_fatal("could not open file \"%s\": %m", conf_filename); + + if (fwrite(recoveryconfcontents->data, recoveryconfcontents->len, 1, fd) != 1) + pg_fatal("could not write to file \"%s\": %m", conf_filename); + + fclose(fd); + recovery_params_set = true; + + /* Include conditionally the recovery parameters. */ + resetPQExpBuffer(recoveryconfcontents); + appendPQExpBufferStr(recoveryconfcontents, + "include_if_exists '" INCLUDED_CONF_FILE "'\n"); + WriteRecoveryConfig(conn, datadir, recoveryconfcontents); + } + + disconnect_database(conn, false); } /* diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl index a905183b738e..e1de946488ed 100644 --- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl +++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl @@ -160,6 +160,7 @@ primary_slot_name = '$slotname' primary_conninfo = '$pconnstr dbname=postgres' hot_standby_feedback = on ]); +my $sconnstr = $node_s->connstr; $node_s->set_standby_mode(); $node_s->start; @@ -472,6 +473,11 @@ command_ok( ], 'run pg_createsubscriber on node S'); +# Check that included file is renamed after success. +my $node_s_datadir = $node_s->data_dir; +ok( -f "$node_s_datadir/pg_createsubscriber.conf.disabled", + "pg_createsubscriber.conf.disabled exists in node S"); + # Confirm the physical replication slot has been removed $result = $node_p->safe_psql($db1, "SELECT count(*) FROM pg_replication_slots WHERE slot_name = '$slotname'" @@ -579,10 +585,50 @@ is($result, qq($db1|{test_pub3} $db2|{pub2}), "subscriptions use the correct publications"); +# Verify that node K, set as a standby, is able to start correctly without +# the recovery configuration written by pg_createsubscriber interfering. +# This node is created from node S, where pg_createsubscriber has been run. + +# Create a physical standby from the promoted subscriber +$node_s->safe_psql('postgres', + "SELECT pg_create_physical_replication_slot('$slotname');"); + +# Create backup from promoted subscriber +$node_s->backup('backup_3'); + +# Initialize new physical standby +my $node_k = PostgreSQL::Test::Cluster->new('node_k'); +$node_k->init_from_backup($node_s, 'backup_3', has_streaming => 1); + +my $node_k_datadir = $node_k->data_dir; +ok( -f "$node_k_datadir/pg_createsubscriber.conf.disabled", + "pg_createsubscriber.conf.disabled exists in node K"); + +# Configure the new standby +$node_k->append_conf( + 'postgresql.conf', qq[ +primary_slot_name = '$slotname' +primary_conninfo = '$sconnstr dbname=postgres' +hot_standby_feedback = on +]); + +$node_k->set_standby_mode(); +my $node_k_name = $node_s->name; +command_ok( + [ + 'pg_ctl', '--wait', + '--pgdata' => $node_k->data_dir, + '--log' => $node_k->logfile, + '--options' => "--cluster-name=$node_k_name", + 'start' + ], + "node K has started"); + # clean up $node_p->teardown_node; $node_s->teardown_node; $node_t->teardown_node; $node_f->teardown_node; +$node_k->teardown_node; done_testing(); diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml index e450c6a5b376..cf45ff3573d1 100644 --- a/doc/src/sgml/ref/pg_createsubscriber.sgml +++ b/doc/src/sgml/ref/pg_createsubscriber.sgml @@ -518,8 +518,11 @@ PostgreSQL documentation <step> <para> - Write recovery parameters into the target data directory and restart the - target server. It specifies an LSN (<xref + Write recovery parameters into the separate configuration file + <filename>pg_createsubscriber.conf</filename> that is included from + <filename>postgresql.auto.conf</filename> using + <literal>include_if_exists</literal> in the target data directory, then + restart the target server. It specifies an LSN (<xref linkend="guc-recovery-target-lsn"/>) of the write-ahead log location up to which recovery will proceed. It also specifies <literal>promote</literal> as the action that the server should take @@ -532,7 +535,10 @@ PostgreSQL documentation server ends standby mode and is accepting read-write transactions. If <option>--recovery-timeout</option> option is set, <application>pg_createsubscriber</application> terminates if recovery - does not end until the given number of seconds. + does not end until the given number of seconds. Upon completion, the + included configuration file is renamed to + <filename>pg_createsubscriber.conf.disabled</filename> so as it is no + longer loaded on subsequent restarts. </para> </step> -- 2.51.0
signature.asc
Description: PGP signature
