At Mon, 20 Jun 2022 16:13:43 +0900, Michael Paquier <mich...@paquier.xyz> wrote in
> On Fri, May 27, 2022 at 07:01:37PM +0000, Imseih (AWS), Sami wrote:
> > What we found:
> > 
> > 1. missingContrecPtr is set when 
> >    StandbyMode is false, therefore
> >    only a writer should set this value
> >    and a record is then sent downstream.
> > 
> >    But a standby going through crash 
> >    recovery will always have StandbyMode = false,
> >    causing the missingContrecPtr to be incorrectly
> >    set.
> 
> That stands as true as far as I know, StandbyMode would be switched
> only once we get out of crash recovery, and only if archive recovery
> completes when there is a restore_command.

Anyway, the change:
-               abortedRecPtr = InvalidXLogRecPtr;
-               missingContrecPtr = InvalidXLogRecPtr;
+               //abortedRecPtr = InvalidXLogRecPtr;
+               //missingContrecPtr = InvalidXLogRecPtr;

injects a bug: an invalid "aborted contrecord" record can be written
under certain conditions.  If a bug is intentionally injected, it's
quite natural that some behavior gets broken.
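
For reference, my understanding of the mechanism (a simplified sketch
of the end-of-recovery path in StartupXLOG, not the exact code; the
signature of CreateOverwriteContrecordRecord differs across branches):

    /*
     * Sketch: at the end of recovery, a leftover abortedRecPtr /
     * missingContrecPtr pair makes us write an
     * XLOG_OVERWRITE_CONTRECORD record.  With the reset above
     * commented out, a stale pair from crash recovery can leak
     * into this path.
     */
    if (!XLogRecPtrIsInvalid(abortedRecPtr))
    {
        Assert(!XLogRecPtrIsInvalid(missingContrecPtr));
        CreateOverwriteContrecordRecord(abortedRecPtr, missingContrecPtr,
                                        newTLI);
    }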

> > 2. If StandbyModeRequested is checked instead,
> >      we ensure that a standby will not set a 
> >      missingContrecPtr.
> > 
> > 3. After applying the patch below, the tap test succeeded
> 
> Hmm.  I have not looked at that in depth, but if the intention is to
> check that the database is able to write WAL, looking at
> XLogCtl->SharedRecoveryState would be the way to go because that's the
> flip switching between crash recovery, archive recovery and the end of
> recovery (when WAL can be safely written).

What we are checking there is "we are going to write WAL", not "we
are already writing WAL".

(!StandbyMode && StandbyModeRequested) means the server has been
running crash recovery before starting archive recovery.  In that
case, the server is supposed to continue with archived WAL without
inserting a record.  However, if no archived WAL is available and the
server is promoted, the server needs to insert an "aborted contrecord"
record again.  (I'm not sure how that case happens in the field,
though.)
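
To illustrate the distinction (a hypothetical helper, written only
for explanation; not something I propose to add):

    /* Hypothetical, for illustration only -- not in the tree. */
    static bool
    InCrashRecoveryBeforeArchiveRecovery(void)
    {
        /*
         * True while replaying WAL found locally in pg_wal during the
         * crash-recovery phase of startup, before we have switched to
         * restoring archived WAL (and before StandbyMode is turned on).
         */
        return !StandbyMode && StandbyModeRequested;
    }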

So I don't think !StandbyModeRequested is the right condition here.
Actually, the attached test fails with that fix.

> The check in xlogrecovery_redo() still looks like a good thing to have
> anyway, because we know that we can safely skip the contrecord.  Now,
> for any patch produced, could the existing TAP test be extended so as
> we are able to get a PANIC even if we keep around the sanity check in
> xlogrecovery_redo().  That would likely involve an immediate shutdown
> of a standby followed by a start sequence?

Thus, I still don't see what happened on Imseih's side, but I can
cause a PANIC with some rather tricky steps, which I don't think are
valid.  This is why I wanted to know the exact steps that caused the
PANIC.

The first attachment is a PoC of the TAP test (it uses system()...),
and the second is a tentative fix for it.  (I don't like the fix
either, though...)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/test/recovery/t/026_overwrite_contrecord.pl b/src/test/recovery/t/026_overwrite_contrecord.pl
index 78feccd9aa..bce3d6d701 100644
--- a/src/test/recovery/t/026_overwrite_contrecord.pl
+++ b/src/test/recovery/t/026_overwrite_contrecord.pl
@@ -68,8 +68,7 @@ ok($initfile ne $endfile, "$initfile differs from $endfile");
 # contents
 $node->stop('immediate');
 
-unlink $node->basedir . "/pgdata/pg_wal/$endfile"
-  or die "could not unlink " . $node->basedir . "/pgdata/pg_wal/$endfile: $!";
+system(sprintf("mv %s/pgdata/pg_wal/$endfile %s/archives/", $node->basedir, $node->basedir));
 
 # OK, create a standby at this spot.
 $node->backup_fs_cold('backup');
@@ -106,4 +105,17 @@ $node_standby->promote;
 $node->stop;
 $node_standby->stop;
 
+my $node_primary_2  = PostgreSQL::Test::Cluster->new('primary_2');
+$node_primary_2->init_from_backup($node, 'backup', has_restoring => 1, standby => 0);
+
+$node_primary_2->append_conf(
+	'postgresql.conf', qq(
+log_min_messages = debug1
+));
+$node_primary_2->start;
+$node_primary_2->poll_query_until('postgres',
+								  'SELECT NOT pg_is_in_recovery()');
+is($node_primary_2->safe_psql('postgres', 'SELECT pg_is_in_recovery()'), 'f',
+  'check if the copied server has promoted');
+  
 done_testing();
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 6eba626420..6b026af74e 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3025,6 +3025,19 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
 
 		if (record)
 		{
+			/*
+			 * If we see a complete record after having recorded an aborted
+			 * contrecord, the aborted contrecord was a false alarm.  This
+			 * happens when archive recovery can continue after crash
+			 * recovery ended with an aborted contrecord.
+			 */
+			if (abortedRecPtr != InvalidXLogRecPtr &&
+				abortedRecPtr < xlogreader->EndRecPtr)
+			{
+				abortedRecPtr = InvalidXLogRecPtr;
+				missingContrecPtr = InvalidXLogRecPtr;
+			}
+
 			/* Great, got a record */
 			return record;
 		}
