Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

Michael Paquier Fri, 04 Oct 2019 01:37:57 -0700

On Thu, Oct 03, 2019 at 12:43:37PM +0300, Alexey Kondratov wrote:
> On 03.10.2019 6:07, Michael Paquier wrote:
>> I have reworked your first patch as per the attached.  What do you
>> think about it?  The part with the control file needs to go down to
>> v12, and I would likely split that into two commits on HEAD: one for
>> the control file and a second for the recovery.conf portion with the
>> fix for --no-ensure-shutdown to keep a cleaner history.
> 
> It looks fine for me excepting the progress reporting part. It now adds
> PG_CONTROL_FILE_SIZE to fetch_done. However, I cannot find that control file
> is either included into filemap and fetch_size or counted during
> calculate_totals(). Maybe I've missed something, but now it looks like we
> report something that wasn't planned for progress reporting, doesn't
> it?


Right.  The pre-12 code actually handles that incorrecly as it assumed
that any files written through file_ops.c should be part of the
progress.  So I went with the simplest solution, and backpatched this
part with 6f3823b.  I have also committed the set of fixes for the new
options so as we have a better base of work than what's on HEAD
currently.

>> +               # Check that incompatible options error out.
>> +               command_fails(
>> +                       [
>> +                               'pg_rewind', "--debug",
>> +                               "--source-pgdata=$standby_pgdata",
>> +                               "--target-pgdata=$master_pgdata", "-R",
>> +                               "--no-ensure-shutdown"
>> +                       ],
>> +                       'pg_rewind local with -R');
>> Incompatible options had better be checked within a separate perl
>> script?  We generally do that for the other binaries.
> 
> Yes, it makes sense. I've reworked the patch with tests and added a couple
> of extra cases.

Regarding the tests, adding a --dry-run command is a good idea.
However I think that there is more value to automate the use of the
single user mode automatically in the tests as that's more critical
from the point of view of rewind run, and stopping the cluster with
immediate mode causes, as expected, the next --dry-run command to
fail.

Another thing is that I think that we should use -F with --single.
This makes recovery faster, and the target data folder is synced
at the end of pg_rewind anyway.

Using the long option names makes the tests easier to follow in this
case, so I have switched -R to --write-recovery-conf.

Some comments and the docs have been using some confusing wording, so
I have reworked what I found (like many "it" in a single sentence
referring different things).

+command_fails(
+    [
+        'pg_rewind', "--debug",
+        "--source-pgdata=$standby_pgdata",
+        "--target-pgdata=$master_pgdata",
+        "--no-ensure-shutdown"
+    ],
+    'pg_rewind local without source shutdown');
Regarding all the set of incompatible options, we have much more of
that after the initial option parsing so I think that we should group
all the cheap ones together.  Let's tackle that as a separate patch.
We can also just check after --no-ensure-shutdown directly in
RewindTest.pm as I have switched the cluster to not be cleanly shut
down anymore to stress the automatic recovery path, and trigger that
before running pg_rewind for the local and remote mode.  

Attached is an updated patch with all I found.  What do you think?
--
Michael

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index fbf454803b..42d29edd4e 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -169,12 +169,14 @@ PostgreSQL documentation
       <term><option>--no-ensure-shutdown</option></term>
       <listitem>
        <para>
-        <application>pg_rewind</application> verifies that the target server
-        is cleanly shutdown before rewinding; by default, if it isn't, it
-        starts the server in single-user mode to complete crash recovery.
+        <application>pg_rewind</application> requires that the target server
+        is cleanly shut down before rewinding. By default, if the target server
+        is not shut down cleanly, <application>pg_rewind</application> starts
+        the target server in single-user mode to complete crash recovery first,
+        and stops it.
         By passing this option, <application>pg_rewind</application> skips
         this and errors out immediately if the server is not cleanly shut
-        down.  Users are expected to handle the situation themselves in that
+        down. Users are expected to handle the situation themselves in that
         case.
        </para>
       </listitem>
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index fe1468b771..875a43b219 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -270,11 +270,12 @@ main(int argc, char **argv)
 	pg_free(buffer);
 
 	/*
-	 * If the target instance was not cleanly shut down, run a single-user
-	 * postgres session really quickly and reload the control file to get the
-	 * new state. Note if no_ensure_shutdown is specified, pg_rewind won't do
-	 * that automatically. That means users need to do themselves in advance,
-	 * else pg_rewind will soon quit, see sanityChecks().
+	 * If the target instance was not cleanly shut down, start and stop the
+	 * target cluster once in single-user mode to enforce recovery to finish,
+	 * ensuring that the cluster can be used by pg_rewind.  Note that if
+	 * no_ensure_shutdown is specified, pg_rewind ignores this step, and users
+	 * need to make sure by themselves that the target cluster is in a clean
+	 * state.
 	 */
 	if (!no_ensure_shutdown &&
 		ControlFile_target.state != DB_SHUTDOWNED &&
@@ -847,8 +848,12 @@ ensureCleanShutdown(const char *argv0)
 	if (dry_run)
 		return;
 
-	/* finally run postgres in single-user mode */
-	snprintf(cmd, MAXCMDLEN, "\"%s\" --single -D \"%s\" template1 < \"%s\"",
+	/*
+	 * Finally run postgres in single-user mode.  There is no need to use
+	 * fsync here.  This makes the recovery faster, and the target data folder
+	 * is synced at the end anyway.
+	 */
+	snprintf(cmd, MAXCMDLEN, "\"%s\" --single -F -D \"%s\" template1 < \"%s\"",
 			 exec_path, datadir_target, DEVNULL);
 
 	if (system(cmd) != 0)
diff --git a/src/bin/pg_rewind/t/005_same_timeline.pl b/src/bin/pg_rewind/t/005_same_timeline.pl
index 40dbc44caa..df469d3939 100644
--- a/src/bin/pg_rewind/t/005_same_timeline.pl
+++ b/src/bin/pg_rewind/t/005_same_timeline.pl
@@ -1,3 +1,7 @@
+#
+# Test that running pg_rewind with the source and target clusters
+# on the same timeline runs successfully.
+#
 use strict;
 use warnings;
 use TestLib;
diff --git a/src/bin/pg_rewind/t/RewindTest.pm b/src/bin/pg_rewind/t/RewindTest.pm
index c540722420..82fa220ac8 100644
--- a/src/bin/pg_rewind/t/RewindTest.pm
+++ b/src/bin/pg_rewind/t/RewindTest.pm
@@ -227,8 +227,10 @@ sub run_pg_rewind
 	# Append the rewind-specific role to the connection string.
 	$standby_connstr = "$standby_connstr user=rewind_user";
 
-	# Stop the master and be ready to perform the rewind
-	$node_master->stop;
+	# Stop the master and be ready to perform the rewind.  The cluster
+	# needs recovery to finish once, and pg_rewind makes sure that it
+	# happens automatically.
+	$node_master->stop('immediate');
 
 	# At this point, the rewind processing is ready to run.
 	# We now have a very simple scenario with a few diverged WAL record.
@@ -260,19 +262,21 @@ sub run_pg_rewind
 	}
 	elsif ($test_mode eq "remote")
 	{
-
-		# Do rewind using a remote connection as source
+		# Do rewind using a remote connection as source, generating
+		# recovery configuration automatically.
 		command_ok(
 			[
 				'pg_rewind',                      "--debug",
 				"--source-server",                $standby_connstr,
-				"--target-pgdata=$master_pgdata", "-R",
-				"--no-sync"
+				"--target-pgdata=$master_pgdata", "--no-sync",
+				"--write-recovery-conf"
 			],
 			'pg_rewind remote');
 
-		# Check that standby.signal has been created.
-		ok(-e "$master_pgdata/standby.signal");
+		# Check that standby.signal is here as recovery configuration
+		# was requested.
+		ok( -e "$master_pgdata/standby.signal",
+			'standby.signal created after pg_rewind');
 
 		# Now, when pg_rewind apparently succeeded with minimal permissions,
 		# add REPLICATION privilege.  So we could test that new standby

signature.asc
Description: PGP signature

Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

Reply via email to