Re: Race condition in recovery?

Dilip Kumar Wed, 09 Jun 2021 02:42:36 -0700

On Wed, Jun 9, 2021 at 1:37 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> On Wed, Jun 9, 2021 at 12:14 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
> >
> > On Wed, Jun 9, 2021 at 2:07 AM Robert Haas <robertmh...@gmail.com> wrote:
> > 2021-06-09 12:11:08.618 IST [122456] LOG:  entering standby mode
> > 2021-06-09 12:11:08.622 IST [122456] LOG:  restored log file 
> > "00000002.history" from archive
> > cp: cannot stat 
> > ‘/home/dilipkumar/work/PG/postgresql/src/test/recovery/tmp_check/t_025_stuck_on_old_timeline_primary_data/archives/000000010000000000000002’:
> >  No such file or directory
> > 2021-06-09 12:11:08.627 IST [122456] LOG:  redo starts at 0/2000028
> > 2021-06-09 12:11:08.627 IST [122456] LOG:  consistent recovery state 
> > reached at 0/3000000
> >
> > Next, I will investigate, without a fix on v11 (maybe v12, v10..) why it is 
> > not hitting the defect location at all.  And after that, I will check the 
> > status on other older versions.
>
> Reason for the problem was that the "-Xnone" parameter was not
> accepted by "sub backup" in PostgresNode.pm so I created that for
> backpatch.  With attached patches I am to make it pass in v12,v11,v10
> (with fix) and fail (without fix).  However, we will have to make some
> change for 9.6 because pg_basebackup doesn't support -Xnone on 9.6,
> maybe we can delete the content from pg_wal after the backup, if we
> think that approach looks fine then I will make the changes for 9.6 as
> well.
>
> Note: for param backport for v12 and v11 same patch getting applied
> but for v10 due to some conflict we need a separate patch (both
> attached).


I have fixed it for 9.6 as well by removing the wal from the xlog
directory. Attaching all the patches in single mail to avoid
confusion.

Note:
v7-0001 applies to master, v13,v12 (but for v12 before this we need to
apply backport)
v12-v8-0001-Backport is same as v11-v8-0001-Backport (duplicated for
version wise separation)
v11-v8-0002 is same as v10-v8-0002

Basically, for v12 and v11 same backport patch works and for V11 and
V10 same main patch works, still I duplicated them to avoid confusion.
-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From 8b96729a51015e850672b2d685ec1b61490f967c Mon Sep 17 00:00:00 2001
From: Robert Haas <rh...@postgresql.org>
Date: Tue, 8 Jun 2021 14:31:22 -0400
Subject: [PATCH v8 1/2] Back-port a few PostgresNode.pm methods.

The 'lsn' and 'wait_for_catchup' methods only exist in v10 and
higher, but are needed in order to support a test planned test
case for a bug that exists all the way back to v9.6. To minimize
cross-branch differences in the test case, back-port these
methods.

Discussion: http://postgr.es/m/ca+tgmoag5dma_8xc1wvbvftpjtwx5uzkgehxe7mij+im9jy...@mail.gmail.com
---
 src/test/perl/PostgresNode.pm | 83 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 5c59265..5f5a7d2 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1472,6 +1472,89 @@ sub issues_sql_like
 
 =pod
 
+=item $node->lsn(mode)
+
+Look up WAL locations on the server:
+
+ * insert location (master only, error on replica)
+ * write location (master only, error on replica)
+ * flush location (master only, error on replica)
+ * receive location (always undef on master)
+ * replay location (always undef on master)
+
+mode must be specified.
+
+=cut
+
+sub lsn
+{
+	my ($self, $mode) = @_;
+	my %modes = (
+		'insert'  => 'pg_current_xlog_insert_location()',
+		'flush'   => 'pg_current_xlog_flush_location()',
+		'write'   => 'pg_current_xlog_location()',
+		'receive' => 'pg_last_xlog_receive_location()',
+		'replay'  => 'pg_last_xlog_replay_location()');
+
+	$mode = '<undef>' if !defined($mode);
+	die "unknown mode for 'lsn': '$mode', valid modes are "
+	  . join(', ', keys %modes)
+	  if !defined($modes{$mode});
+
+	my $result = $self->safe_psql('postgres', "SELECT $modes{$mode}");
+	chomp($result);
+	if ($result eq '')
+	{
+		return;
+	}
+	else
+	{
+		return $result;
+	}
+}
+
+=pod
+
+=item $node->wait_for_catchup(standby_name, mode, target_lsn)
+
+Wait for the node with application_name standby_name (usually from node->name)
+until its replication position in pg_stat_replication equals or passes the
+upstream's xlog insert point at the time this function is called. By default
+the replay_location is waited for, but 'mode' may be specified to wait for any
+of sent|write|flush|replay.
+
+If there is no active replication connection from this peer, waits until
+poll_query_until timeout.
+
+Requires that the 'postgres' db exists and is accessible.
+
+target_lsn may be any arbitrary lsn, but is typically $master_node->lsn('insert').
+
+This is not a test. It die()s on failure.
+
+=cut
+
+sub wait_for_catchup
+{
+	my ($self, $standby_name, $mode, $target_lsn) = @_;
+	$mode = defined($mode) ? $mode : 'replay';
+	my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1, 'replay' => 1 );
+	die "unknown mode $mode for 'wait_for_catchup', valid modes are " . join(', ', keys(%valid_modes)) unless exists($valid_modes{$mode});
+	# Allow passing of a PostgresNode instance as shorthand
+	if ( blessed( $standby_name ) && $standby_name->isa("PostgresNode") )
+	{
+		$standby_name = $standby_name->name;
+	}
+	die 'target_lsn must be specified' unless defined($target_lsn);
+	print "Waiting for replication conn " . $standby_name . "'s " . $mode . "_location to pass " . $target_lsn . " on " . $self->name . "\n";
+	my $query = qq[SELECT '$target_lsn' <= ${mode}_location FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
+	$self->poll_query_until('postgres', $query)
+		or die "timed out waiting for catchup, current position is " . ($self->safe_psql('postgres', $query) || '(unknown)');
+	print "done\n";
+}
+
+=pod
+
 =back
 
 =cut
-- 
1.8.3.1

From c37afe2ca70710c2e6492fdfd073fecbffbfe79b Mon Sep 17 00:00:00 2001
From: Robert Haas <rh...@postgresql.org>
Date: Tue, 8 Jun 2021 15:09:09 -0400
Subject: [PATCH v8 2/2] Fix corner case failure of new standby to follow new
 primary.

This only happens if (1) the new standby has no WAL available locally,
(2) the new standby is starting from the old timeline, (3) the promotion
happened in the WAL segment from which the new standby is starting,
(4) the timeline history file for the new timeline is available from
the archive but the WAL files for are not (i.e. this is a race),
(5) the WAL files for the new timeline are available via streaming,
and (6) recovery_target_timeline='latest'.

Commit ee994272ca50f70b53074f0febaec97e28f83c4e introduced this
logic and was an improvement over the previous code, but it mishandled
this case. If recovery_target_timeline='latest' and restore_command is
set, validateRecoveryParameters() can change recoveryTargetTLI to be
different from receiveTLI. If streaming is then tried afterward,
expectedTLEs gets initialized with the history of the wrong timeline.
It's supposed to be a list of entries explaining how to get to the
target timeline, but in this case it ends up with a list of entries
explaining how to get to the new standby's original timeline, which
isn't right.

Dilip Kumar and Robert Haas, reviewed by Kyotaro Horiguchi.

Discussion: http://postgr.es/m/CAFiTN-sE-jr=lb8jquxeqikd-ux+jhixyh4ydizmpedgqku...@mail.gmail.com
---
 src/backend/access/transam/xlog.c                |  10 ++-
 src/test/recovery/t/025_stuck_on_old_timeline.pl | 107 +++++++++++++++++++++++
 src/test/recovery/t/cp_history_files             |  10 +++
 3 files changed, 126 insertions(+), 1 deletion(-)
 create mode 100644 src/test/recovery/t/025_stuck_on_old_timeline.pl
 create mode 100644 src/test/recovery/t/cp_history_files

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9ad6da9..7bcb308 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11835,11 +11835,19 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						 * pg_xlog by now.  Use XLOG_FROM_STREAM so that
 						 * source info is set correctly and XLogReceiptTime
 						 * isn't changed.
+						 *
+						 * NB: We must set readTimeLineHistory based on
+						 * recoveryTargetTLI, not receiveTLI. Normally they'll
+						 * be the same, but if recovery_target_timeline is
+						 * 'latest' and archiving is configured, then it's
+						 * possible that we managed to retrieve one or more
+						 * new timeline history files from the archive,
+						 * updating recoveryTargetTLI.
 						 */
 						if (readFile < 0)
 						{
 							if (!expectedTLEs)
-								expectedTLEs = readTimeLineHistory(receiveTLI);
+								expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
 							readFile = XLogFileRead(readSegNo, PANIC,
 													receiveTLI,
 													XLOG_FROM_STREAM, false);
diff --git a/src/test/recovery/t/025_stuck_on_old_timeline.pl b/src/test/recovery/t/025_stuck_on_old_timeline.pl
new file mode 100644
index 0000000..b885f1c
--- /dev/null
+++ b/src/test/recovery/t/025_stuck_on_old_timeline.pl
@@ -0,0 +1,107 @@
+
+# Copyright (c) 2021, PostgreSQL Global Development Group
+
+# Testing streaming replication where standby is promoted and a new cascading
+# standby (without WAL) is connected to the promoted standby.  Both archiving
+# and streaming are enabled, but only the history file is available from the
+# archive, so the WAL files all have to be streamed.  Test that the cascading
+# standby can follow the new primary (promoted standby).
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use FindBin;
+use Test::More tests => 1;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+
+# Set up an archive command that will copy the history file but not the WAL
+# files. No real archive command should behave this way; the point is to
+# simulate a race condition where the new cascading standby starts up after
+# the timeline history file reaches the archive but before any of the WAL files
+# get there.
+$node_primary->init(allows_streaming => 1, has_archiving => 1);
+my $perlbin = $^X;
+$perlbin =~ s{\\}{\\\\}g if ($TestLib::windows_os);
+my $archivedir_primary = $node_primary->archive_dir;
+$node_primary->append_conf(
+	'postgresql.conf', qq(
+archive_command = '$perlbin "$FindBin::RealBin/cp_history_files" "%p" "$archivedir_primary/%f"'
+));
+$node_primary->start;
+
+# Take backup from primary
+my $backup_name = 'my_backup';
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby = get_new_node('standby');
+$node_standby->init_from_backup($node_primary, $backup_name,
+	allows_streaming => 1, has_streaming => 1, has_archiving => 1);
+$node_standby->start;
+
+# Take backup of standby
+$node_standby->backup($backup_name);
+
+# clean up pg_xlog from the backup
+my $pgxlogdir = $node_standby->backup_dir. "/" . $backup_name . "/pg_xlog";
+opendir my $dh, $pgxlogdir or die "failed to open $pgxlogdir";
+while (my $f = readdir($dh))
+{
+	unlink("$pgxlogdir/$f") if (-f "$pgxlogdir/$f");
+}
+closedir($dh);
+
+# Create cascading standby but don't start it yet.
+# Must set up both streaming and archiving.
+my $node_cascade = get_new_node('cascade');
+$node_cascade->init_from_backup($node_standby, $backup_name,
+	has_streaming => 1);
+$node_cascade->enable_restoring($node_primary);
+
+$node_cascade->append_conf('recovery.conf', qq(
+recovery_target_timeline='latest'
+));
+
+# Promote the standby.
+$node_standby->promote;
+
+# Wait for promotion to complete
+$node_standby->poll_query_until('postgres',
+								"SELECT NOT pg_is_in_recovery();")
+	or die "Timed out while waiting for promotion";
+
+# Find next WAL segment to be archived
+my $walfile_to_be_archived = $node_standby->safe_psql('postgres',
+	"SELECT pg_xlogfile_name(pg_current_xlog_location());");
+
+# Make WAL segment eligible for archival
+$node_standby->safe_psql('postgres', 'SELECT pg_switch_xlog()');
+
+# Wait until the WAL segment has been archived.
+# Since the history file gets created on promotion and is archived before any
+# WAL segment, this is enough to guarantee that the history file was
+# archived.
+my $archive_wait_query =
+  "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
+$node_standby->poll_query_until('postgres', $archive_wait_query)
+  or die "Timed out while waiting for WAL segment to be archived";
+my $last_archived_wal_file = $walfile_to_be_archived;
+
+# Start cascade node
+$node_cascade->start;
+
+# Create some content on promoted standby and check its presence on the
+# cascading standby.
+$node_standby->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 1 AS a");
+
+# Wait for the replication to catch up
+$node_standby->wait_for_catchup($node_cascade, 'replay',
+	$node_standby->lsn('insert'));
+
+# Check that cascading standby has the new content
+my $result =
+  $node_cascade->safe_psql('postgres', "SELECT count(*) FROM tab_int");
+print "cascade: $result\n";
+is($result, 1, 'check streamed content on cascade standby');
diff --git a/src/test/recovery/t/cp_history_files b/src/test/recovery/t/cp_history_files
new file mode 100644
index 0000000..cfeea41
--- /dev/null
+++ b/src/test/recovery/t/cp_history_files
@@ -0,0 +1,10 @@
+#!/usr/bin/perl
+
+use File::Copy;
+use strict;
+use warnings;
+
+die "wrong number of arguments" if @ARGV != 2;
+my ($source, $target) = @ARGV;
+exit if $source !~ /history/;
+copy($source, $target) or die "couldn't copy $source to $target: $!";
-- 
1.8.3.1

From a52e20bd0bde14d5e194e3d853b9f6ea72019ad5 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Wed, 9 Jun 2021 12:52:42 +0530
Subject: [PATCH v8] Back port backup param in PostgresNode.pm

---
 src/test/perl/PostgresNode.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 61aa048..beb7bc1 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -548,7 +548,7 @@ target server since it isn't done by default.
 
 sub backup
 {
-	my ($self, $backup_name) = @_;
+	my ($self, $backup_name, %params) = @_;
 	my $backup_path = $self->backup_dir . '/' . $backup_name;
 	my $name        = $self->name;
 
@@ -556,7 +556,8 @@ sub backup
 	TestLib::system_or_bail(
 		'pg_basebackup', '-D', $backup_path, '-h',
 		$self->host,     '-p', $self->port,  '--checkpoint',
-		'fast',          '--no-sync');
+		'fast',          '--no-sync',
+		@{ $params{backup_options} });
 	print "# Backup finished\n";
 	return;
 }
-- 
1.8.3.1

From 0c8e0ebeb480b434b58e29c46594f873db3f7087 Mon Sep 17 00:00:00 2001
From: Robert Haas <rh...@postgresql.org>
Date: Tue, 8 Jun 2021 12:52:55 -0400
Subject: [PATCH v8 2/2] Fix corner case failure of new standby to follow new
 primary.

This only happens if (1) the new standby has no WAL available locally,
(2) the new standby is starting from the old timeline, (3) the promotion
happened in the WAL segment from which the new standby is starting,
(4) the timeline history file for the new timeline is available from
the archive but the WAL files for are not (i.e. this is a race),
(5) the WAL files for the new timeline are available via streaming,
and (6) recovery_target_timeline='latest'.

Commit ee994272ca50f70b53074f0febaec97e28f83c4e introduced this
logic and was an improvement over the previous code, but it mishandled
this case. If recovery_target_timeline='latest' and restore_command is
set, validateRecoveryParameters() can change recoveryTargetTLI to be
different from receiveTLI. If streaming is then tried afterward,
expectedTLEs gets initialized with the history of the wrong timeline.
It's supposed to be a list of entries explaining how to get to the
target timeline, but in this case it ends up with a list of entries
explaining how to get to the new standby's original timeline, which
isn't right.

Dilip Kumar and Robert Haas, reviewed by Kyotaro Horiguchi.

Discussion: http://postgr.es/m/CAFiTN-sE-jr=lb8jquxeqikd-ux+jhixyh4ydizmpedgqku...@mail.gmail.com
---
 src/backend/access/transam/xlog.c                | 10 ++-
 src/test/recovery/t/025_stuck_on_old_timeline.pl | 97 ++++++++++++++++++++++++
 src/test/recovery/t/cp_history_files             | 10 +++
 3 files changed, 116 insertions(+), 1 deletion(-)
 create mode 100644 src/test/recovery/t/025_stuck_on_old_timeline.pl
 create mode 100644 src/test/recovery/t/cp_history_files

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 42b902b..828564f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12288,11 +12288,19 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						 * pg_wal by now.  Use XLOG_FROM_STREAM so that source
 						 * info is set correctly and XLogReceiptTime isn't
 						 * changed.
+						 *
+						 * NB: We must set readTimeLineHistory based on
+						 * recoveryTargetTLI, not receiveTLI. Normally they'll
+						 * be the same, but if recovery_target_timeline is
+						 * 'latest' and archiving is configured, then it's
+						 * possible that we managed to retrieve one or more
+						 * new timeline history files from the archive,
+						 * updating recoveryTargetTLI.
 						 */
 						if (readFile < 0)
 						{
 							if (!expectedTLEs)
-								expectedTLEs = readTimeLineHistory(receiveTLI);
+								expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
 							readFile = XLogFileRead(readSegNo, PANIC,
 													receiveTLI,
 													XLOG_FROM_STREAM, false);
diff --git a/src/test/recovery/t/025_stuck_on_old_timeline.pl b/src/test/recovery/t/025_stuck_on_old_timeline.pl
new file mode 100644
index 0000000..b3ce5da
--- /dev/null
+++ b/src/test/recovery/t/025_stuck_on_old_timeline.pl
@@ -0,0 +1,97 @@
+
+# Copyright (c) 2021, PostgreSQL Global Development Group
+
+# Testing streaming replication where standby is promoted and a new cascading
+# standby (without WAL) is connected to the promoted standby.  Both archiving
+# and streaming are enabled, but only the history file is available from the
+# archive, so the WAL files all have to be streamed.  Test that the cascading
+# standby can follow the new primary (promoted standby).
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use FindBin;
+use Test::More tests => 1;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+
+# Set up an archive command that will copy the history file but not the WAL
+# files. No real archive command should behave this way; the point is to
+# simulate a race condition where the new cascading standby starts up after
+# the timeline history file reaches the archive but before any of the WAL files
+# get there.
+$node_primary->init(allows_streaming => 1, has_archiving => 1);
+my $perlbin = $^X;
+$perlbin =~ s{\\}{\\\\}g if ($TestLib::windows_os);
+my $archivedir_primary = $node_primary->archive_dir;
+$node_primary->append_conf(
+	'postgresql.conf', qq(
+archive_command = '$perlbin "$FindBin::RealBin/cp_history_files" "%p" "$archivedir_primary/%f"'
+));
+$node_primary->start;
+
+# Take backup from primary
+my $backup_name = 'my_backup';
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby = get_new_node('standby');
+$node_standby->init_from_backup($node_primary, $backup_name,
+	allows_streaming => 1, has_streaming => 1, has_archiving => 1);
+$node_standby->start;
+
+# Take backup of standby, use -Xnone so that pg_wal is empty.
+$node_standby->backup($backup_name, backup_options => ['-Xnone']);
+
+# Create cascading standby but don't start it yet.
+# Must set up both streaming and archiving.
+my $node_cascade = get_new_node('cascade');
+$node_cascade->init_from_backup($node_standby, $backup_name,
+	has_streaming => 1);
+$node_cascade->enable_restoring($node_primary);
+$node_cascade->append_conf('recovery.conf', qq(
+recovery_target_timeline='latest'
+));
+
+# Promote the standby.
+$node_standby->promote;
+
+# Wait for promotion to complete
+$node_standby->poll_query_until('postgres',
+								"SELECT NOT pg_is_in_recovery();")
+	or die "Timed out while waiting for promotion";
+
+# Find next WAL segment to be archived
+my $walfile_to_be_archived = $node_standby->safe_psql('postgres',
+	"SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make WAL segment eligible for archival
+$node_standby->safe_psql('postgres', 'SELECT pg_switch_wal()');
+
+# Wait until the WAL segment has been archived.
+# Since the history file gets created on promotion and is archived before any
+# WAL segment, this is enough to guarantee that the history file was
+# archived.
+my $archive_wait_query =
+  "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
+$node_standby->poll_query_until('postgres', $archive_wait_query)
+  or die "Timed out while waiting for WAL segment to be archived";
+my $last_archived_wal_file = $walfile_to_be_archived;
+
+# Start cascade node
+$node_cascade->start;
+
+# Create some content on promoted standby and check its presence on the
+# cascading standby.
+$node_standby->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 1 AS a");
+
+# Wait for the replication to catch up
+$node_standby->wait_for_catchup($node_cascade, 'replay',
+	$node_standby->lsn('insert'));
+
+# Check that cascading standby has the new content
+my $result =
+  $node_cascade->safe_psql('postgres', "SELECT count(*) FROM tab_int");
+print "cascade: $result\n";
+is($result, 1, 'check streamed content on cascade standby');
diff --git a/src/test/recovery/t/cp_history_files b/src/test/recovery/t/cp_history_files
new file mode 100644
index 0000000..cfeea41
--- /dev/null
+++ b/src/test/recovery/t/cp_history_files
@@ -0,0 +1,10 @@
+#!/usr/bin/perl
+
+use File::Copy;
+use strict;
+use warnings;
+
+die "wrong number of arguments" if @ARGV != 2;
+my ($source, $target) = @ARGV;
+exit if $source !~ /history/;
+copy($source, $target) or die "couldn't copy $source to $target: $!";
-- 
1.8.3.1

From 3782d36bd8821b1e7785fbd247aafda6a6cf8975 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Wed, 9 Jun 2021 13:15:22 +0530
Subject: [PATCH] Back-port backup param in PostgresNode.pm

---
 src/test/perl/PostgresNode.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index fdcc159..9d895c1 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -512,13 +512,13 @@ target server since it isn't done by default.
 
 sub backup
 {
-	my ($self, $backup_name) = @_;
+	my ($self, $backup_name, %params) = @_;
 	my $backup_path = $self->backup_dir . '/' . $backup_name;
 	my $name        = $self->name;
 
 	print "# Taking pg_basebackup $backup_name from node \"$name\"\n";
 	TestLib::system_or_bail('pg_basebackup', '-D', $backup_path, '-h',
-		$self->host, '-p', $self->port, '--no-sync');
+		$self->host, '-p', $self->port, '--no-sync', @{ $params{backup_options} });
 	print "# Backup finished\n";
 }
 
-- 
1.8.3.1

From 0c8e0ebeb480b434b58e29c46594f873db3f7087 Mon Sep 17 00:00:00 2001
From: Robert Haas <rh...@postgresql.org>
Date: Tue, 8 Jun 2021 12:52:55 -0400
Subject: [PATCH v8 2/2] Fix corner case failure of new standby to follow new
 primary.

This only happens if (1) the new standby has no WAL available locally,
(2) the new standby is starting from the old timeline, (3) the promotion
happened in the WAL segment from which the new standby is starting,
(4) the timeline history file for the new timeline is available from
the archive but the WAL files for are not (i.e. this is a race),
(5) the WAL files for the new timeline are available via streaming,
and (6) recovery_target_timeline='latest'.

Commit ee994272ca50f70b53074f0febaec97e28f83c4e introduced this
logic and was an improvement over the previous code, but it mishandled
this case. If recovery_target_timeline='latest' and restore_command is
set, validateRecoveryParameters() can change recoveryTargetTLI to be
different from receiveTLI. If streaming is then tried afterward,
expectedTLEs gets initialized with the history of the wrong timeline.
It's supposed to be a list of entries explaining how to get to the
target timeline, but in this case it ends up with a list of entries
explaining how to get to the new standby's original timeline, which
isn't right.

Dilip Kumar and Robert Haas, reviewed by Kyotaro Horiguchi.

Discussion: http://postgr.es/m/CAFiTN-sE-jr=lb8jquxeqikd-ux+jhixyh4ydizmpedgqku...@mail.gmail.com
---
 src/backend/access/transam/xlog.c                | 10 ++-
 src/test/recovery/t/025_stuck_on_old_timeline.pl | 97 ++++++++++++++++++++++++
 src/test/recovery/t/cp_history_files             | 10 +++
 3 files changed, 116 insertions(+), 1 deletion(-)
 create mode 100644 src/test/recovery/t/025_stuck_on_old_timeline.pl
 create mode 100644 src/test/recovery/t/cp_history_files

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 42b902b..828564f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12288,11 +12288,19 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						 * pg_wal by now.  Use XLOG_FROM_STREAM so that source
 						 * info is set correctly and XLogReceiptTime isn't
 						 * changed.
+						 *
+						 * NB: We must set readTimeLineHistory based on
+						 * recoveryTargetTLI, not receiveTLI. Normally they'll
+						 * be the same, but if recovery_target_timeline is
+						 * 'latest' and archiving is configured, then it's
+						 * possible that we managed to retrieve one or more
+						 * new timeline history files from the archive,
+						 * updating recoveryTargetTLI.
 						 */
 						if (readFile < 0)
 						{
 							if (!expectedTLEs)
-								expectedTLEs = readTimeLineHistory(receiveTLI);
+								expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
 							readFile = XLogFileRead(readSegNo, PANIC,
 													receiveTLI,
 													XLOG_FROM_STREAM, false);
diff --git a/src/test/recovery/t/025_stuck_on_old_timeline.pl b/src/test/recovery/t/025_stuck_on_old_timeline.pl
new file mode 100644
index 0000000..b3ce5da
--- /dev/null
+++ b/src/test/recovery/t/025_stuck_on_old_timeline.pl
@@ -0,0 +1,97 @@
+
+# Copyright (c) 2021, PostgreSQL Global Development Group
+
+# Testing streaming replication where standby is promoted and a new cascading
+# standby (without WAL) is connected to the promoted standby.  Both archiving
+# and streaming are enabled, but only the history file is available from the
+# archive, so the WAL files all have to be streamed.  Test that the cascading
+# standby can follow the new primary (promoted standby).
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use FindBin;
+use Test::More tests => 1;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+
+# Set up an archive command that will copy the history file but not the WAL
+# files. No real archive command should behave this way; the point is to
+# simulate a race condition where the new cascading standby starts up after
+# the timeline history file reaches the archive but before any of the WAL files
+# get there.
+$node_primary->init(allows_streaming => 1, has_archiving => 1);
+my $perlbin = $^X;
+$perlbin =~ s{\\}{\\\\}g if ($TestLib::windows_os);
+my $archivedir_primary = $node_primary->archive_dir;
+$node_primary->append_conf(
+	'postgresql.conf', qq(
+archive_command = '$perlbin "$FindBin::RealBin/cp_history_files" "%p" "$archivedir_primary/%f"'
+));
+$node_primary->start;
+
+# Take backup from primary
+my $backup_name = 'my_backup';
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby = get_new_node('standby');
+$node_standby->init_from_backup($node_primary, $backup_name,
+	allows_streaming => 1, has_streaming => 1, has_archiving => 1);
+$node_standby->start;
+
+# Take backup of standby, use -Xnone so that pg_wal is empty.
+$node_standby->backup($backup_name, backup_options => ['-Xnone']);
+
+# Create cascading standby but don't start it yet.
+# Must set up both streaming and archiving.
+my $node_cascade = get_new_node('cascade');
+$node_cascade->init_from_backup($node_standby, $backup_name,
+	has_streaming => 1);
+$node_cascade->enable_restoring($node_primary);
+$node_cascade->append_conf('recovery.conf', qq(
+recovery_target_timeline='latest'
+));
+
+# Promote the standby.
+$node_standby->promote;
+
+# Wait for promotion to complete
+$node_standby->poll_query_until('postgres',
+								"SELECT NOT pg_is_in_recovery();")
+	or die "Timed out while waiting for promotion";
+
+# Find next WAL segment to be archived
+my $walfile_to_be_archived = $node_standby->safe_psql('postgres',
+	"SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make WAL segment eligible for archival
+$node_standby->safe_psql('postgres', 'SELECT pg_switch_wal()');
+
+# Wait until the WAL segment has been archived.
+# Since the history file gets created on promotion and is archived before any
+# WAL segment, this is enough to guarantee that the history file was
+# archived.
+my $archive_wait_query =
+  "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
+$node_standby->poll_query_until('postgres', $archive_wait_query)
+  or die "Timed out while waiting for WAL segment to be archived";
+my $last_archived_wal_file = $walfile_to_be_archived;
+
+# Start cascade node
+$node_cascade->start;
+
+# Create some content on promoted standby and check its presence on the
+# cascading standby.
+$node_standby->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 1 AS a");
+
+# Wait for the replication to catch up
+$node_standby->wait_for_catchup($node_cascade, 'replay',
+	$node_standby->lsn('insert'));
+
+# Check that cascading standby has the new content
+my $result =
+  $node_cascade->safe_psql('postgres', "SELECT count(*) FROM tab_int");
+print "cascade: $result\n";
+is($result, 1, 'check streamed content on cascade standby');
diff --git a/src/test/recovery/t/cp_history_files b/src/test/recovery/t/cp_history_files
new file mode 100644
index 0000000..cfeea41
--- /dev/null
+++ b/src/test/recovery/t/cp_history_files
@@ -0,0 +1,10 @@
+#!/usr/bin/perl
+
+use File::Copy;
+use strict;
+use warnings;
+
+die "wrong number of arguments" if @ARGV != 2;
+my ($source, $target) = @ARGV;
+exit if $source !~ /history/;
+copy($source, $target) or die "couldn't copy $source to $target: $!";
-- 
1.8.3.1

From a52e20bd0bde14d5e194e3d853b9f6ea72019ad5 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Wed, 9 Jun 2021 12:52:42 +0530
Subject: [PATCH v8] Back port backup param in PostgresNode.pm

---
 src/test/perl/PostgresNode.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 61aa048..beb7bc1 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -548,7 +548,7 @@ target server since it isn't done by default.
 
 sub backup
 {
-	my ($self, $backup_name) = @_;
+	my ($self, $backup_name, %params) = @_;
 	my $backup_path = $self->backup_dir . '/' . $backup_name;
 	my $name        = $self->name;
 
@@ -556,7 +556,8 @@ sub backup
 	TestLib::system_or_bail(
 		'pg_basebackup', '-D', $backup_path, '-h',
 		$self->host,     '-p', $self->port,  '--checkpoint',
-		'fast',          '--no-sync');
+		'fast',          '--no-sync',
+		@{ $params{backup_options} });
 	print "# Backup finished\n";
 	return;
 }
-- 
1.8.3.1

From ad1a5f0d9867c0123ae37e74a24fb2f47bc99c60 Mon Sep 17 00:00:00 2001
From: Robert Haas <rh...@postgresql.org>
Date: Tue, 8 Jun 2021 12:52:55 -0400
Subject: [PATCH v7] Fix corner case failure of new standby to follow new
 primary.

This only happens if (1) the new standby has no WAL available locally,
(2) the new standby is starting from the old timeline, (3) the promotion
happened in the WAL segment from which the new standby is starting,
(4) the timeline history file for the new timeline is available from
the archive but the WAL files for are not (i.e. this is a race),
(5) the WAL files for the new timeline are available via streaming,
and (6) recovery_target_timeline='latest'.

Commit ee994272ca50f70b53074f0febaec97e28f83c4e introduced this
logic and was an improvement over the previous code, but it mishandled
this case. If recovery_target_timeline='latest' and restore_command is
set, validateRecoveryParameters() can change recoveryTargetTLI to be
different from receiveTLI. If streaming is then tried afterward,
expectedTLEs gets initialized with the history of the wrong timeline.
It's supposed to be a list of entries explaining how to get to the
target timeline, but in this case it ends up with a list of entries
explaining how to get to the new standby's original timeline, which
isn't right.

Dilip Kumar and Robert Haas, reviewed by Kyotaro Horiguchi.

Discussion: http://postgr.es/m/CAFiTN-sE-jr=lb8jquxeqikd-ux+jhixyh4ydizmpedgqku...@mail.gmail.com
---
 src/backend/access/transam/xlog.c             | 10 +-
 .../recovery/t/025_stuck_on_old_timeline.pl   | 94 +++++++++++++++++++
 src/test/recovery/t/cp_history_files          | 10 ++
 3 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100644 src/test/recovery/t/025_stuck_on_old_timeline.pl
 create mode 100644 src/test/recovery/t/cp_history_files

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 441a9124cd..17eeff0720 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12658,11 +12658,19 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						 * pg_wal by now.  Use XLOG_FROM_STREAM so that source
 						 * info is set correctly and XLogReceiptTime isn't
 						 * changed.
+						 *
+						 * NB: We must set readTimeLineHistory based on
+						 * recoveryTargetTLI, not receiveTLI. Normally they'll
+						 * be the same, but if recovery_target_timeline is
+						 * 'latest' and archiving is configured, then it's
+						 * possible that we managed to retrieve one or more
+						 * new timeline history files from the archive,
+						 * updating recoveryTargetTLI.
 						 */
 						if (readFile < 0)
 						{
 							if (!expectedTLEs)
-								expectedTLEs = readTimeLineHistory(receiveTLI);
+								expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
 							readFile = XLogFileRead(readSegNo, PANIC,
 													receiveTLI,
 													XLOG_FROM_STREAM, false);
diff --git a/src/test/recovery/t/025_stuck_on_old_timeline.pl b/src/test/recovery/t/025_stuck_on_old_timeline.pl
new file mode 100644
index 0000000000..842878a0c8
--- /dev/null
+++ b/src/test/recovery/t/025_stuck_on_old_timeline.pl
@@ -0,0 +1,94 @@
+
+# Copyright (c) 2021, PostgreSQL Global Development Group
+
+# Testing streaming replication where standby is promoted and a new cascading
+# standby (without WAL) is connected to the promoted standby.  Both archiving
+# and streaming are enabled, but only the history file is available from the
+# archive, so the WAL files all have to be streamed.  Test that the cascading
+# standby can follow the new primary (promoted standby).
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use FindBin;
+use Test::More tests => 1;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+
+# Set up an archive command that will copy the history file but not the WAL
+# files. No real archive command should behave this way; the point is to
+# simulate a race condition where the new cascading standby starts up after
+# the timeline history file reaches the archive but before any of the WAL files
+# get there.
+$node_primary->init(allows_streaming => 1, has_archiving => 1);
+my $perlbin = $^X;
+$perlbin =~ s{\\}{\\\\}g if ($TestLib::windows_os);
+my $archivedir_primary = $node_primary->archive_dir;
+$node_primary->append_conf(
+	'postgresql.conf', qq(
+archive_command = '$perlbin "$FindBin::RealBin/cp_history_files" "%p" "$archivedir_primary/%f"'
+));
+$node_primary->start;
+
+# Take backup from primary
+my $backup_name = 'my_backup';
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby = get_new_node('standby');
+$node_standby->init_from_backup($node_primary, $backup_name,
+	allows_streaming => 1, has_streaming => 1, has_archiving => 1);
+$node_standby->start;
+
+# Take backup of standby, use -Xnone so that pg_wal is empty.
+$node_standby->backup($backup_name, backup_options => ['-Xnone']);
+
+# Create cascading standby but don't start it yet.
+# Must set up both streaming and archiving.
+my $node_cascade = get_new_node('cascade');
+$node_cascade->init_from_backup($node_standby, $backup_name,
+	has_streaming => 1);
+$node_cascade->enable_restoring($node_primary);
+
+# Promote the standby.
+$node_standby->promote;
+
+# Wait for promotion to complete
+$node_standby->poll_query_until('postgres',
+								"SELECT NOT pg_is_in_recovery();")
+	or die "Timed out while waiting for promotion";
+
+# Find next WAL segment to be archived
+my $walfile_to_be_archived = $node_standby->safe_psql('postgres',
+	"SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make WAL segment eligible for archival
+$node_standby->safe_psql('postgres', 'SELECT pg_switch_wal()');
+
+# Wait until the WAL segment has been archived.
+# Since the history file gets created on promotion and is archived before any
+# WAL segment, this is enough to guarantee that the history file was
+# archived.
+my $archive_wait_query =
+  "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
+$node_standby->poll_query_until('postgres', $archive_wait_query)
+  or die "Timed out while waiting for WAL segment to be archived";
+my $last_archived_wal_file = $walfile_to_be_archived;
+
+# Start cascade node
+$node_cascade->start;
+
+# Create some content on promoted standby and check its presence on the
+# cascading standby.
+$node_standby->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 1 AS a");
+
+# Wait for the replication to catch up
+$node_standby->wait_for_catchup($node_cascade, 'replay',
+	$node_standby->lsn('insert'));
+
+# Check that cascading standby has the new content
+my $result =
+  $node_cascade->safe_psql('postgres', "SELECT count(*) FROM tab_int");
+print "cascade: $result\n";
+is($result, 1, 'check streamed content on cascade standby');
diff --git a/src/test/recovery/t/cp_history_files b/src/test/recovery/t/cp_history_files
new file mode 100644
index 0000000000..cfeea41e5b
--- /dev/null
+++ b/src/test/recovery/t/cp_history_files
@@ -0,0 +1,10 @@
+#!/usr/bin/perl
+
+use File::Copy;
+use strict;
+use warnings;
+
+die "wrong number of arguments" if @ARGV != 2;
+my ($source, $target) = @ARGV;
+exit if $source !~ /history/;
+copy($source, $target) or die "couldn't copy $source to $target: $!";
-- 
2.24.3 (Apple Git-128)

Re: Race condition in recovery?

Reply via email to