Hi.

With all the recent work on pg_createsubscriber I decided to take a look at
it, namely the documentation.

The 0001 patch is basically just fixing some typos and other minor edits.

The 0002 includes some of the more editoralized (but still limited) changes.

The main thing I noted is that our synopsis is simply wrong regarding the
optionality of the --database option.  I went with a fix that included only
showing the two mandatory options and adding a paragraph to usage
explaining that --database exists and how to use it.  Our existing
example also covers that common usage.

I also noted that we've started including both the long and short forms of
the options in our synopsis.  I went with the "choose the short option as
the only one" as exemplified by pg_upgrade.  It seems quite noisy to
include both in a synopsis that is fine with including 10+ other options
under [options...].

I tweaked the wording of config-file and felt it worth pointing out the
caveat regarding the port setting.

It took me a bit, and reading implementation code, to understand what was
meant by "Additional recovery parameters are added to avoid ...".  I
decided that detail seemed unnecessary for the user-facing documentation -
supposedly they will never see the recovery file if everything works OK
(and if they do the various empty string settings should be
reasonably self-evident).  If we do want to keep it I'll try to find a
better wording.

I removed the use of the word "switch" and replaced it with "option" for
consistency.

There might be some items in 0001 that belong in 0002 and vice-versa; the
final patch would just be a single one in any case.

I also dropped the bit about "intended for large databases, use normal
logical replication for small ones".  If you already have a standby why not
use this even for small ones?  Maybe the simplicity of normal replication
is more important than how long an overnight process to copy the data
tables might take.  We've explained how the two options compare to each
other but we tend not to, for good reason, include such
conclusions/recommendations in the technical/reference documentation.

David J.
From 36e1f1e641c871781be264a665081dd52a6d77d9 Mon Sep 17 00:00:00 2001
From: "David G. Johnston" <David.G.Johnston@Gmail.com>
Date: Mon, 10 Mar 2025 13:55:55 -0700
Subject: [PATCH 2/2] doc: editorial pass over pg_createsubscriber

---
 doc/src/sgml/ref/pg_createsubscriber.sgml | 105 ++++++++++------------
 1 file changed, 49 insertions(+), 56 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index d39ec44297..cffc9cfc42 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -23,23 +23,10 @@ PostgreSQL documentation
   <cmdsynopsis>
    <command>pg_createsubscriber</command>
    <arg rep="repeat"><replaceable>option</replaceable></arg>
-   <group choice="plain">
-    <group choice="req">
-     <arg choice="plain"><option>-d</option></arg>
-     <arg choice="plain"><option>--database</option></arg>
-    </group>
-    <replaceable>dbname</replaceable>
-    <group choice="req">
-     <arg choice="plain"><option>-D</option> </arg>
-     <arg choice="plain"><option>--pgdata</option></arg>
-    </group>
-    <replaceable>datadir</replaceable>
-    <group choice="req">
-     <arg choice="plain"><option>-P</option></arg>
-     <arg choice="plain"><option>--publisher-server</option></arg>
-    </group>
-    <replaceable>connstr</replaceable>
-   </group>
+   <arg choice="plain"><option>-D</option></arg>
+   <arg choice="plain"><replaceable>datadir</replaceable></arg>
+   <arg choice="plain"><option>-P</option></arg>
+   <arg choice="plain"><replaceable>connstr</replaceable></arg>
   </cmdsynopsis>
  </refsynopsisdiv>
 
@@ -50,32 +37,26 @@ PostgreSQL documentation
    <application>pg_createsubscriber</application> creates a new logical
    replica from a physical standby server.  All tables in the specified
    database are included in the <link linkend="logical-replication">logical
-   replication</link> setup.  A pair of
-   publication and subscription objects are created for each database.  It
-   must be run at the target server.
+   replication</link> setup.  It must be run at the target server.
   </para>
 
   <para>
-   After a successful run, the state of the target server is analogous to a
-   fresh logical replication setup.  The main difference between the logical
-   replication setup and <application>pg_createsubscriber</application> is how
-   the data synchronization is done. <application>pg_createsubscriber</application>
-   does not copy the initial table data. It does only the synchronization phase,
-   which ensures each table is brought up to a synchronized state.
+   The minimal command shown above replicates only a single database: the
+   one named in the connection string pointing to the source server.  More
+   commonly, you would use the <option>--database</option> (<option>-d</option>)
+   option one or more times to specify which databases in the source to replicate.
+   The corresponding publications, subscriptions, and replication slots are
+   then created using auto-generated names, which can be overriden
+   using the corresponding long-form command-line options, once each per database.
   </para>
 
   <para>
-   <application>pg_createsubscriber</application> targets large database
-   systems because in logical replication setup, most of the time is spent
-   doing the initial data copy.  Furthermore, a side effect of this long time
-   spent synchronizing data is usually a large amount of changes to be applied
-   (that were produced during the initial data copy), which increases even
-   more the time when the logical replica will be unavailable. For smaller
-   databases, it is recommended to set up logical replication with initial data
-   synchronization.  For details, see the <command>CREATE SUBSCRIPTION</command>
-   <link linkend="sql-createsubscription-params-with-copy-data">
-   <literal>copy_data</literal></link> option.
-
+   After a successful run, the state of the target server is analogous to a
+   fresh logical replication setup.  The main difference compared the logical
+   replication setup is that <application>pg_createsubscriber</application>
+   does not perform an initial data copy.  Instead it relies on a LSN-based
+   point-in-time recovery of physical backup to coincide with the LSN-based
+   starting point of the subscription(s) it creates.
   </para>
  </refsect1>
 
@@ -92,12 +73,12 @@ PostgreSQL documentation
      <term><option>--database=<replaceable class="parameter">dbname</replaceable></option></term>
      <listitem>
       <para>
-       The name of the database in which to create a subscription.  Multiple
+       The name of the database for which to create a subscription.  Multiple
        databases can be selected by writing multiple <option>-d</option>
-       options. If <option>-d</option> option is not provided, the database
-       name will be obtained from <option>-P</option> option. If the database
-       name is not specified in either the <option>-d</option> option or
-       <option>-P</option> option, an error will be reported.
+       options. If no <option>-d</option> option is provided, the database
+       name will be obtained from the <option>-P</option> option. If the database
+       name is not specified in either the <option>-d</option> or
+       <option>-P</option> options, an error will be reported.
       </para>
      </listitem>
     </varlistentry>
@@ -107,7 +88,7 @@ PostgreSQL documentation
      <term><option>--pgdata=<replaceable class="parameter">directory</replaceable></option></term>
      <listitem>
       <para>
-       The target directory that contains a cluster directory from a physical
+       The target directory that contains a cluster directory of a physical
        replica.
       </para>
      </listitem>
@@ -143,6 +124,11 @@ PostgreSQL documentation
        The connection string to the publisher.  For details see <xref
        linkend="libpq-connstring"/>.
       </para>
+      <para>
+       If the <option>-d</option> option is not provided, the database name
+       of the single database to replicate is obtained from here.  If a database
+       name is also not specified in this connection string, an error will be reported.
+      </para>
      </listitem>
     </varlistentry>
 
@@ -162,8 +148,8 @@ PostgreSQL documentation
      <term><option>--recovery-timeout=<replaceable class="parameter">seconds</replaceable></option></term>
      <listitem>
       <para>
-       The maximum number of seconds to wait for recovery to end.  Setting to
-       0 disables.  The default is 0.
+       The maximum number of seconds to wait for recovery to end.
+       A value of 0 (the default) disables.
       </para>
      </listitem>
     </varlistentry>
@@ -186,7 +172,7 @@ PostgreSQL documentation
      <term><option>--subscriber-username=<replaceable class="parameter">username</replaceable></option></term>
      <listitem>
       <para>
-       The user name to connect as on target server.  Defaults to the current
+       The user name to connect as on the target server.  Defaults to the current
        operating system user name.
       </para>
      </listitem>
@@ -210,12 +196,18 @@ PostgreSQL documentation
      <term><option>--config-file=<replaceable class="parameter">filename</replaceable></option></term>
      <listitem>
       <para>
-       Use the specified main server configuration file for the target data
-       directory.  <application>pg_createsubscriber</application> internally uses
+       Use the specified main server configuration file for the target service.
+       <application>pg_createsubscriber</application> internally uses
        the <application>pg_ctl</application> command to start and
-       stop the target server.  It allows you to specify the actual
-       <filename>postgresql.conf</filename> configuration file if it is stored
-       outside the data directory.
+       stop the target server; which by default expects a <filename>postgresql.conf</filename>
+       configuration file to exist in the data directory.  Pass the actual file location via this option
+       if that is not the case.
+      </para>
+      <para>
+       Note, the configuration setting <literal>port</literal> in this file is effectively ignored
+       as <application>pg_createsubscriber</application> always specifies the target listening port
+       on the <application>pg_ctl</application> command-line.
+       Use the <option>-p</option> option to specify which port the target server should listen on.
       </para>
      </listitem>
     </varlistentry>
@@ -310,7 +302,7 @@ PostgreSQL documentation
    </para>
 
    <para>
-    The target server must be used as a physical standby.  The target server
+    The target server must be setup as a physical standby.  The target server
     must have <xref linkend="guc-max-replication-slots"/> and <xref
     linkend="guc-max-logical-replication-workers"/> configured to a value
     greater than or equal to the number of specified databases.  The target
@@ -344,16 +336,17 @@ PostgreSQL documentation
    </para>
 
    <para>
-    <application>pg_createsubscriber</application> usually starts the target
-    server with different connection settings during transformation.  Hence,
-    connections to the target server should fail.
+    <application>pg_createsubscriber</application> is designed have full and
+    exclusive control over the target server.  Plan for downtime or migration
+    of any applications or users of the target server in its soon-to-be converted
+    physical standby mode.
    </para>
 
    <para>
     Since DDL commands are not replicated by logical replication, avoid
     executing DDL commands that change the database schema while running
     <application>pg_createsubscriber</application>.  If the target server has
-    already been converted to logical replica, the DDL commands might not be
+    already been converted to a logical replica, the DDL commands might not be
     replicated, which might cause an error.
    </para>
 
-- 
2.34.1

From 7d09cf0d7500ee7ffb403ace54b1271232346ac2 Mon Sep 17 00:00:00 2001
From: "David G. Johnston" <David.G.Johnston@Gmail.com>
Date: Mon, 10 Mar 2025 12:23:05 -0700
Subject: [PATCH 1/2] doc: minor fixups in pg_createsubscriber

---
 doc/src/sgml/ref/pg_createsubscriber.sgml | 68 +++++++++++------------
 1 file changed, 31 insertions(+), 37 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index b4b996236e..d39ec44297 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -70,7 +70,7 @@ PostgreSQL documentation
    doing the initial data copy.  Furthermore, a side effect of this long time
    spent synchronizing data is usually a large amount of changes to be applied
    (that were produced during the initial data copy), which increases even
-   more the time when the logical replica will be available. For smaller
+   more the time when the logical replica will be unavailable. For smaller
    databases, it is recommended to set up logical replication with initial data
    synchronization.  For details, see the <command>CREATE SUBSCRIPTION</command>
    <link linkend="sql-createsubscription-params-with-copy-data">
@@ -94,7 +94,7 @@ PostgreSQL documentation
       <para>
        The name of the database in which to create a subscription.  Multiple
        databases can be selected by writing multiple <option>-d</option>
-       switches. If <option>-d</option> option is not provided, the database
+       options. If <option>-d</option> option is not provided, the database
        name will be obtained from <option>-P</option> option. If the database
        name is not specified in either the <option>-d</option> option or
        <option>-P</option> option, an error will be reported.
@@ -128,7 +128,7 @@ PostgreSQL documentation
      <term><option>--subscriber-port=<replaceable class="parameter">port</replaceable></option></term>
      <listitem>
       <para>
-       The port number on which the target server is listening for
+       The port number on which the target server will listen for
        connections.  Defaults to running the target server on port 50432 to
        avoid unintended client connections.
       </para>
@@ -226,10 +226,10 @@ PostgreSQL documentation
       <para>
        The publication name to set up the logical replication.  Multiple
        publications can be specified by writing multiple
-       <option>--publication</option> switches.  The number of publication
+       <option>--publication</option> options.  The number of publication
        names must match the number of specified databases, otherwise an error
-       is reported.  The order of the multiple publication name switches must
-       match the order of database switches.  If this option is not specified,
+       is reported.  The order of the multiple publication name options must
+       match the order of database options.  If this option is not specified,
        a generated name is assigned to the publication name.
       </para>
      </listitem>
@@ -241,10 +241,10 @@ PostgreSQL documentation
       <para>
        The replication slot name to set up the logical replication.  Multiple
        replication slots can be specified by writing multiple
-       <option>--replication-slot</option> switches.  The number of
+       <option>--replication-slot</option> options.  The number of
        replication slot names must match the number of specified databases,
        otherwise an error is reported.  The order of the multiple replication
-       slot name switches must match the order of database switches.  If this
+       slot name options must match the order of database options.  If this
        option is not specified, the subscription name is assigned to the
        replication slot name.
       </para>
@@ -257,10 +257,10 @@ PostgreSQL documentation
       <para>
        The subscription name to set up the logical replication.  Multiple
        subscriptions can be specified by writing multiple
-       <option>--subscription</option> switches.  The number of subscription
+       <option>--subscription</option> options.  The number of subscription
        names must match the number of specified databases, otherwise an error
-       is reported.  The order of the multiple subscription name switches must
-       match the order of database switches.  If this option is not specified,
+       is reported.  The order of the multiple subscription name options must
+       match the order of database options.  If this option is not specified,
        a generated name is assigned to the subscription name.
       </para>
      </listitem>
@@ -281,7 +281,7 @@ PostgreSQL documentation
      <term><option>--help</option></term>
      <listitem>
       <para>
-       Show help about <application>pg_createsubscriber</application> command
+       Show help about <application>pg_createsubscriber</application> command-
        line arguments, and exit.
       </para>
      </listitem>
@@ -300,7 +300,7 @@ PostgreSQL documentation
     There are some prerequisites for
     <application>pg_createsubscriber</application> to convert the target server
     into a logical replica.  If these are not met, an error will be reported.
-    The source and target servers must have the same major version as the
+    The source and target servers must have the same major version as
     <application>pg_createsubscriber</application>.  The given target data
     directory must have the same system identifier as the source data
     directory.  The given database user for the target data directory must have
@@ -317,7 +317,7 @@ PostgreSQL documentation
     server must have <xref linkend="guc-max-worker-processes"/> configured to a
     value greater than the number of specified databases.  The target server
     must accept local connections. If you are planning to use the
-    <option>--enable-two-phase</option> switch then you will also need to set
+    <option>--enable-two-phase</option> option then you will also need to set
     the <xref linkend="guc-max-prepared-transactions"/> appropriately.
    </para>
 
@@ -361,14 +361,14 @@ PostgreSQL documentation
     If <application>pg_createsubscriber</application> fails while processing,
     objects (publications, replication slots) created on the source server are
     removed.  The removal might fail if the target server cannot connect to
-    the source server.  In such a case, a warning message will inform the
-    objects left.  If the target server is running, it will be stopped.
+    the source server.  In such a case, a warning message will list the
+    remaining objects.  If the target server is running, it will be stopped.
    </para>
 
    <para>
     If the replication is using <xref linkend="guc-primary-slot-name"/>, it
-    will be removed from the source server after the logical replication
-    setup.
+    will be removed from the source server once the logical replication
+    synchronization begins.
    </para>
 
    <para>
@@ -378,7 +378,7 @@ PostgreSQL documentation
    </para>
 
    <para>
-    Unless the <option>--enable-two-phase</option> switch is specified,
+    Unless the <option>--enable-two-phase</option> option is specified,
     <application>pg_createsubscriber</application> sets up logical
     replication with two-phase commit disabled.  This means that any
     prepared transactions will be replicated at the time
@@ -391,8 +391,8 @@ PostgreSQL documentation
 
    <para>
     <application>pg_createsubscriber</application> changes the system
-    identifier using <application>pg_resetwal</application>.  It would avoid
-    situations in which the target server might use WAL files from the source
+    identifier using <application>pg_resetwal</application>.  This prevents
+    situations where the target server would use WAL files from the source
     server.  If the target server has a standby, replication will break and a
     fresh standby should be created.
    </para>
@@ -410,7 +410,7 @@ PostgreSQL documentation
 
    <para>
     The basic idea is to have a replication start point from the source server
-    and set up a logical replication to start from this point:
+    and set up logical replication to start from this point:
    </para>
 
    <procedure>
@@ -439,7 +439,7 @@ PostgreSQL documentation
       linkend="sql-createpublication-params-for-all-tables"><literal>FOR ALL
       TABLES</literal></link>.  If the <option>--publication</option> option
       is not specified, the publication has the following name pattern:
-      <quote><literal>pg_createsubscriber_%u_%x</literal></quote> (parameter:
+      <quote><literal>pg_createsubscriber_%u_%x</literal></quote> (parameters:
       database <parameter>oid</parameter>, random <parameter>int</parameter>).
       If the <option>--replication-slot</option> option is not specified, the
       replication slot has the following name pattern:
@@ -449,7 +449,7 @@ PostgreSQL documentation
       step.  The last replication slot LSN is used as a stopping point in the
       <xref linkend="guc-recovery-target-lsn"/> parameter and by the
       subscriptions as a replication start point.  It guarantees that no
-      transaction will be lost.
+      transactions will be lost.
      </para>
     </step>
 
@@ -460,16 +460,11 @@ PostgreSQL documentation
       linkend="guc-recovery-target-lsn"/>) of the write-ahead log location up
       to which recovery will proceed.  It also specifies
       <literal>promote</literal> as the action that the server should take
-      once the recovery target is reached.  Additional <link
-      linkend="runtime-config-wal-recovery-target">recovery parameters</link>
-      are added to avoid unexpected behavior during the recovery process such
-      as end of the recovery as soon as a consistent state is reached (WAL
-      should be applied until the replication start location) and multiple
-      recovery targets that can cause a failure.  This step finishes once the
+      once the recovery target is reached.  This step finishes once the
       server ends standby mode and is accepting read-write transactions.  If
-      <option>--recovery-timeout</option> option is set,
-      <application>pg_createsubscriber</application> terminates if recovery
-      does not end until the given number of seconds.
+      a non-zero <option>--recovery-timeout</option> is set,
+      <application>pg_createsubscriber</application> terminates if promotion
+      does not happen within the given number of seconds.
      </para>
     </step>
 
@@ -491,8 +486,7 @@ PostgreSQL documentation
     <step>
      <para>
       Drop publications on the target server that were replicated because they
-      were created before the replication start location.  It has no use on
-      the subscriber.
+      were created before the replication start location.
      </para>
     </step>
 
@@ -502,7 +496,7 @@ PostgreSQL documentation
       subscription.  When the target server starts the recovery process, it
       catches up to the replication start point.  This is the exact LSN to be
       used as a initial replication location for each subscription.  The
-      replication origin name is obtained since the subscription was created.
+      replication origin name was obtained when the subscription was created.
       The replication origin name and the replication start point are used in
       <link
       linkend="pg-replication-origin-advance"><function>pg_replication_origin_advance()</function></link>
@@ -536,7 +530,7 @@ PostgreSQL documentation
 
     <step>
      <para>
-      Update the system identifier on the target server. The
+      Update the system identifier on the target server. The application
       <xref linkend="app-pgresetwal"/> is run to modify the system identifier.
       The target server is stopped as a <command>pg_resetwal</command> requirement.
      </para>
-- 
2.34.1

Reply via email to