Hi. With all the recent work on pg_createsubscriber I decided to take a look at it, namely the documentation.
The 0001 patch is basically just fixing some typos and other minor edits. The 0002 includes some of the more editoralized (but still limited) changes. The main thing I noted is that our synopsis is simply wrong regarding the optionality of the --database option. I went with a fix that included only showing the two mandatory options and adding a paragraph to usage explaining that --database exists and how to use it. Our existing example also covers that common usage. I also noted that we've started including both the long and short forms of the options in our synopsis. I went with the "choose the short option as the only one" as exemplified by pg_upgrade. It seems quite noisy to include both in a synopsis that is fine with including 10+ other options under [options...]. I tweaked the wording of config-file and felt it worth pointing out the caveat regarding the port setting. It took me a bit, and reading implementation code, to understand what was meant by "Additional recovery parameters are added to avoid ...". I decided that detail seemed unnecessary for the user-facing documentation - supposedly they will never see the recovery file if everything works OK (and if they do the various empty string settings should be reasonably self-evident). If we do want to keep it I'll try to find a better wording. I removed the use of the word "switch" and replaced it with "option" for consistency. There might be some items in 0001 that belong in 0002 and vice-versa; the final patch would just be a single one in any case. I also dropped the bit about "intended for large databases, use normal logical replication for small ones". If you already have a standby why not use this even for small ones? Maybe the simplicity of normal replication is more important than how long an overnight process to copy the data tables might take. We've explained how the two options compare to each other but we tend not to, for good reason, include such conclusions/recommendations in the technical/reference documentation. David J.
From 36e1f1e641c871781be264a665081dd52a6d77d9 Mon Sep 17 00:00:00 2001 From: "David G. Johnston" <David.G.Johnston@Gmail.com> Date: Mon, 10 Mar 2025 13:55:55 -0700 Subject: [PATCH 2/2] doc: editorial pass over pg_createsubscriber --- doc/src/sgml/ref/pg_createsubscriber.sgml | 105 ++++++++++------------ 1 file changed, 49 insertions(+), 56 deletions(-) diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml index d39ec44297..cffc9cfc42 100644 --- a/doc/src/sgml/ref/pg_createsubscriber.sgml +++ b/doc/src/sgml/ref/pg_createsubscriber.sgml @@ -23,23 +23,10 @@ PostgreSQL documentation <cmdsynopsis> <command>pg_createsubscriber</command> <arg rep="repeat"><replaceable>option</replaceable></arg> - <group choice="plain"> - <group choice="req"> - <arg choice="plain"><option>-d</option></arg> - <arg choice="plain"><option>--database</option></arg> - </group> - <replaceable>dbname</replaceable> - <group choice="req"> - <arg choice="plain"><option>-D</option> </arg> - <arg choice="plain"><option>--pgdata</option></arg> - </group> - <replaceable>datadir</replaceable> - <group choice="req"> - <arg choice="plain"><option>-P</option></arg> - <arg choice="plain"><option>--publisher-server</option></arg> - </group> - <replaceable>connstr</replaceable> - </group> + <arg choice="plain"><option>-D</option></arg> + <arg choice="plain"><replaceable>datadir</replaceable></arg> + <arg choice="plain"><option>-P</option></arg> + <arg choice="plain"><replaceable>connstr</replaceable></arg> </cmdsynopsis> </refsynopsisdiv> @@ -50,32 +37,26 @@ PostgreSQL documentation <application>pg_createsubscriber</application> creates a new logical replica from a physical standby server. All tables in the specified database are included in the <link linkend="logical-replication">logical - replication</link> setup. A pair of - publication and subscription objects are created for each database. It - must be run at the target server. + replication</link> setup. It must be run at the target server. </para> <para> - After a successful run, the state of the target server is analogous to a - fresh logical replication setup. The main difference between the logical - replication setup and <application>pg_createsubscriber</application> is how - the data synchronization is done. <application>pg_createsubscriber</application> - does not copy the initial table data. It does only the synchronization phase, - which ensures each table is brought up to a synchronized state. + The minimal command shown above replicates only a single database: the + one named in the connection string pointing to the source server. More + commonly, you would use the <option>--database</option> (<option>-d</option>) + option one or more times to specify which databases in the source to replicate. + The corresponding publications, subscriptions, and replication slots are + then created using auto-generated names, which can be overriden + using the corresponding long-form command-line options, once each per database. </para> <para> - <application>pg_createsubscriber</application> targets large database - systems because in logical replication setup, most of the time is spent - doing the initial data copy. Furthermore, a side effect of this long time - spent synchronizing data is usually a large amount of changes to be applied - (that were produced during the initial data copy), which increases even - more the time when the logical replica will be unavailable. For smaller - databases, it is recommended to set up logical replication with initial data - synchronization. For details, see the <command>CREATE SUBSCRIPTION</command> - <link linkend="sql-createsubscription-params-with-copy-data"> - <literal>copy_data</literal></link> option. - + After a successful run, the state of the target server is analogous to a + fresh logical replication setup. The main difference compared the logical + replication setup is that <application>pg_createsubscriber</application> + does not perform an initial data copy. Instead it relies on a LSN-based + point-in-time recovery of physical backup to coincide with the LSN-based + starting point of the subscription(s) it creates. </para> </refsect1> @@ -92,12 +73,12 @@ PostgreSQL documentation <term><option>--database=<replaceable class="parameter">dbname</replaceable></option></term> <listitem> <para> - The name of the database in which to create a subscription. Multiple + The name of the database for which to create a subscription. Multiple databases can be selected by writing multiple <option>-d</option> - options. If <option>-d</option> option is not provided, the database - name will be obtained from <option>-P</option> option. If the database - name is not specified in either the <option>-d</option> option or - <option>-P</option> option, an error will be reported. + options. If no <option>-d</option> option is provided, the database + name will be obtained from the <option>-P</option> option. If the database + name is not specified in either the <option>-d</option> or + <option>-P</option> options, an error will be reported. </para> </listitem> </varlistentry> @@ -107,7 +88,7 @@ PostgreSQL documentation <term><option>--pgdata=<replaceable class="parameter">directory</replaceable></option></term> <listitem> <para> - The target directory that contains a cluster directory from a physical + The target directory that contains a cluster directory of a physical replica. </para> </listitem> @@ -143,6 +124,11 @@ PostgreSQL documentation The connection string to the publisher. For details see <xref linkend="libpq-connstring"/>. </para> + <para> + If the <option>-d</option> option is not provided, the database name + of the single database to replicate is obtained from here. If a database + name is also not specified in this connection string, an error will be reported. + </para> </listitem> </varlistentry> @@ -162,8 +148,8 @@ PostgreSQL documentation <term><option>--recovery-timeout=<replaceable class="parameter">seconds</replaceable></option></term> <listitem> <para> - The maximum number of seconds to wait for recovery to end. Setting to - 0 disables. The default is 0. + The maximum number of seconds to wait for recovery to end. + A value of 0 (the default) disables. </para> </listitem> </varlistentry> @@ -186,7 +172,7 @@ PostgreSQL documentation <term><option>--subscriber-username=<replaceable class="parameter">username</replaceable></option></term> <listitem> <para> - The user name to connect as on target server. Defaults to the current + The user name to connect as on the target server. Defaults to the current operating system user name. </para> </listitem> @@ -210,12 +196,18 @@ PostgreSQL documentation <term><option>--config-file=<replaceable class="parameter">filename</replaceable></option></term> <listitem> <para> - Use the specified main server configuration file for the target data - directory. <application>pg_createsubscriber</application> internally uses + Use the specified main server configuration file for the target service. + <application>pg_createsubscriber</application> internally uses the <application>pg_ctl</application> command to start and - stop the target server. It allows you to specify the actual - <filename>postgresql.conf</filename> configuration file if it is stored - outside the data directory. + stop the target server; which by default expects a <filename>postgresql.conf</filename> + configuration file to exist in the data directory. Pass the actual file location via this option + if that is not the case. + </para> + <para> + Note, the configuration setting <literal>port</literal> in this file is effectively ignored + as <application>pg_createsubscriber</application> always specifies the target listening port + on the <application>pg_ctl</application> command-line. + Use the <option>-p</option> option to specify which port the target server should listen on. </para> </listitem> </varlistentry> @@ -310,7 +302,7 @@ PostgreSQL documentation </para> <para> - The target server must be used as a physical standby. The target server + The target server must be setup as a physical standby. The target server must have <xref linkend="guc-max-replication-slots"/> and <xref linkend="guc-max-logical-replication-workers"/> configured to a value greater than or equal to the number of specified databases. The target @@ -344,16 +336,17 @@ PostgreSQL documentation </para> <para> - <application>pg_createsubscriber</application> usually starts the target - server with different connection settings during transformation. Hence, - connections to the target server should fail. + <application>pg_createsubscriber</application> is designed have full and + exclusive control over the target server. Plan for downtime or migration + of any applications or users of the target server in its soon-to-be converted + physical standby mode. </para> <para> Since DDL commands are not replicated by logical replication, avoid executing DDL commands that change the database schema while running <application>pg_createsubscriber</application>. If the target server has - already been converted to logical replica, the DDL commands might not be + already been converted to a logical replica, the DDL commands might not be replicated, which might cause an error. </para> -- 2.34.1
From 7d09cf0d7500ee7ffb403ace54b1271232346ac2 Mon Sep 17 00:00:00 2001 From: "David G. Johnston" <David.G.Johnston@Gmail.com> Date: Mon, 10 Mar 2025 12:23:05 -0700 Subject: [PATCH 1/2] doc: minor fixups in pg_createsubscriber --- doc/src/sgml/ref/pg_createsubscriber.sgml | 68 +++++++++++------------ 1 file changed, 31 insertions(+), 37 deletions(-) diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml index b4b996236e..d39ec44297 100644 --- a/doc/src/sgml/ref/pg_createsubscriber.sgml +++ b/doc/src/sgml/ref/pg_createsubscriber.sgml @@ -70,7 +70,7 @@ PostgreSQL documentation doing the initial data copy. Furthermore, a side effect of this long time spent synchronizing data is usually a large amount of changes to be applied (that were produced during the initial data copy), which increases even - more the time when the logical replica will be available. For smaller + more the time when the logical replica will be unavailable. For smaller databases, it is recommended to set up logical replication with initial data synchronization. For details, see the <command>CREATE SUBSCRIPTION</command> <link linkend="sql-createsubscription-params-with-copy-data"> @@ -94,7 +94,7 @@ PostgreSQL documentation <para> The name of the database in which to create a subscription. Multiple databases can be selected by writing multiple <option>-d</option> - switches. If <option>-d</option> option is not provided, the database + options. If <option>-d</option> option is not provided, the database name will be obtained from <option>-P</option> option. If the database name is not specified in either the <option>-d</option> option or <option>-P</option> option, an error will be reported. @@ -128,7 +128,7 @@ PostgreSQL documentation <term><option>--subscriber-port=<replaceable class="parameter">port</replaceable></option></term> <listitem> <para> - The port number on which the target server is listening for + The port number on which the target server will listen for connections. Defaults to running the target server on port 50432 to avoid unintended client connections. </para> @@ -226,10 +226,10 @@ PostgreSQL documentation <para> The publication name to set up the logical replication. Multiple publications can be specified by writing multiple - <option>--publication</option> switches. The number of publication + <option>--publication</option> options. The number of publication names must match the number of specified databases, otherwise an error - is reported. The order of the multiple publication name switches must - match the order of database switches. If this option is not specified, + is reported. The order of the multiple publication name options must + match the order of database options. If this option is not specified, a generated name is assigned to the publication name. </para> </listitem> @@ -241,10 +241,10 @@ PostgreSQL documentation <para> The replication slot name to set up the logical replication. Multiple replication slots can be specified by writing multiple - <option>--replication-slot</option> switches. The number of + <option>--replication-slot</option> options. The number of replication slot names must match the number of specified databases, otherwise an error is reported. The order of the multiple replication - slot name switches must match the order of database switches. If this + slot name options must match the order of database options. If this option is not specified, the subscription name is assigned to the replication slot name. </para> @@ -257,10 +257,10 @@ PostgreSQL documentation <para> The subscription name to set up the logical replication. Multiple subscriptions can be specified by writing multiple - <option>--subscription</option> switches. The number of subscription + <option>--subscription</option> options. The number of subscription names must match the number of specified databases, otherwise an error - is reported. The order of the multiple subscription name switches must - match the order of database switches. If this option is not specified, + is reported. The order of the multiple subscription name options must + match the order of database options. If this option is not specified, a generated name is assigned to the subscription name. </para> </listitem> @@ -281,7 +281,7 @@ PostgreSQL documentation <term><option>--help</option></term> <listitem> <para> - Show help about <application>pg_createsubscriber</application> command + Show help about <application>pg_createsubscriber</application> command- line arguments, and exit. </para> </listitem> @@ -300,7 +300,7 @@ PostgreSQL documentation There are some prerequisites for <application>pg_createsubscriber</application> to convert the target server into a logical replica. If these are not met, an error will be reported. - The source and target servers must have the same major version as the + The source and target servers must have the same major version as <application>pg_createsubscriber</application>. The given target data directory must have the same system identifier as the source data directory. The given database user for the target data directory must have @@ -317,7 +317,7 @@ PostgreSQL documentation server must have <xref linkend="guc-max-worker-processes"/> configured to a value greater than the number of specified databases. The target server must accept local connections. If you are planning to use the - <option>--enable-two-phase</option> switch then you will also need to set + <option>--enable-two-phase</option> option then you will also need to set the <xref linkend="guc-max-prepared-transactions"/> appropriately. </para> @@ -361,14 +361,14 @@ PostgreSQL documentation If <application>pg_createsubscriber</application> fails while processing, objects (publications, replication slots) created on the source server are removed. The removal might fail if the target server cannot connect to - the source server. In such a case, a warning message will inform the - objects left. If the target server is running, it will be stopped. + the source server. In such a case, a warning message will list the + remaining objects. If the target server is running, it will be stopped. </para> <para> If the replication is using <xref linkend="guc-primary-slot-name"/>, it - will be removed from the source server after the logical replication - setup. + will be removed from the source server once the logical replication + synchronization begins. </para> <para> @@ -378,7 +378,7 @@ PostgreSQL documentation </para> <para> - Unless the <option>--enable-two-phase</option> switch is specified, + Unless the <option>--enable-two-phase</option> option is specified, <application>pg_createsubscriber</application> sets up logical replication with two-phase commit disabled. This means that any prepared transactions will be replicated at the time @@ -391,8 +391,8 @@ PostgreSQL documentation <para> <application>pg_createsubscriber</application> changes the system - identifier using <application>pg_resetwal</application>. It would avoid - situations in which the target server might use WAL files from the source + identifier using <application>pg_resetwal</application>. This prevents + situations where the target server would use WAL files from the source server. If the target server has a standby, replication will break and a fresh standby should be created. </para> @@ -410,7 +410,7 @@ PostgreSQL documentation <para> The basic idea is to have a replication start point from the source server - and set up a logical replication to start from this point: + and set up logical replication to start from this point: </para> <procedure> @@ -439,7 +439,7 @@ PostgreSQL documentation linkend="sql-createpublication-params-for-all-tables"><literal>FOR ALL TABLES</literal></link>. If the <option>--publication</option> option is not specified, the publication has the following name pattern: - <quote><literal>pg_createsubscriber_%u_%x</literal></quote> (parameter: + <quote><literal>pg_createsubscriber_%u_%x</literal></quote> (parameters: database <parameter>oid</parameter>, random <parameter>int</parameter>). If the <option>--replication-slot</option> option is not specified, the replication slot has the following name pattern: @@ -449,7 +449,7 @@ PostgreSQL documentation step. The last replication slot LSN is used as a stopping point in the <xref linkend="guc-recovery-target-lsn"/> parameter and by the subscriptions as a replication start point. It guarantees that no - transaction will be lost. + transactions will be lost. </para> </step> @@ -460,16 +460,11 @@ PostgreSQL documentation linkend="guc-recovery-target-lsn"/>) of the write-ahead log location up to which recovery will proceed. It also specifies <literal>promote</literal> as the action that the server should take - once the recovery target is reached. Additional <link - linkend="runtime-config-wal-recovery-target">recovery parameters</link> - are added to avoid unexpected behavior during the recovery process such - as end of the recovery as soon as a consistent state is reached (WAL - should be applied until the replication start location) and multiple - recovery targets that can cause a failure. This step finishes once the + once the recovery target is reached. This step finishes once the server ends standby mode and is accepting read-write transactions. If - <option>--recovery-timeout</option> option is set, - <application>pg_createsubscriber</application> terminates if recovery - does not end until the given number of seconds. + a non-zero <option>--recovery-timeout</option> is set, + <application>pg_createsubscriber</application> terminates if promotion + does not happen within the given number of seconds. </para> </step> @@ -491,8 +486,7 @@ PostgreSQL documentation <step> <para> Drop publications on the target server that were replicated because they - were created before the replication start location. It has no use on - the subscriber. + were created before the replication start location. </para> </step> @@ -502,7 +496,7 @@ PostgreSQL documentation subscription. When the target server starts the recovery process, it catches up to the replication start point. This is the exact LSN to be used as a initial replication location for each subscription. The - replication origin name is obtained since the subscription was created. + replication origin name was obtained when the subscription was created. The replication origin name and the replication start point are used in <link linkend="pg-replication-origin-advance"><function>pg_replication_origin_advance()</function></link> @@ -536,7 +530,7 @@ PostgreSQL documentation <step> <para> - Update the system identifier on the target server. The + Update the system identifier on the target server. The application <xref linkend="app-pgresetwal"/> is run to modify the system identifier. The target server is stopped as a <command>pg_resetwal</command> requirement. </para> -- 2.34.1