At Mon, 20 Jun 2022 16:13:43 +0900, Michael Paquier <mich...@paquier.xyz> wrote in
> On Fri, May 27, 2022 at 07:01:37PM +0000, Imseih (AWS), Sami wrote:
> > What we found:
> >
> > 1. missingContrecPtr is set when
> >    StandbyMode is false, therefore
> >    only a writer should set this value
> >    and a record is then sent downstream.
> >
> >    But a standby going through crash
> >    recovery will always have StandbyMode = false,
> >    causing the missingContrecPtr to be incorrectly
> >    set.
>
> That stands as true as far as I know, StandbyMode would be switched
> only once we get out of crash recovery, and only if archive recovery
> completes when there is a restore_command.
Anyway, the change

- abortedRecPtr = InvalidXLogRecPtr;
- missingContrecPtr = InvalidXLogRecPtr;
+ //abortedRecPtr = InvalidXLogRecPtr;
+ //missingContrecPtr = InvalidXLogRecPtr;

injects a bug: an invalid "aborted contrecord" record can be written
under certain conditions.  If a bug is intentionally injected, it is
quite natural that some behavior gets broken.

> > 2. If StandbyModeRequested is checked instead,
> >    we ensure that a standby will not set a
> >    missingContrecPtr.
> >
> > 3. After applying the patch below, the tap test succeeded
>
> Hmm.  I have not looked at that in depth, but if the intention is to
> check that the database is able to write WAL, looking at
> XLogCtl->SharedRecoveryState would be the way to go because that's the
> flip switching between crash recovery, archive recovery and the end of
> recovery (when WAL can be safely written).

What we are checking there is "we are going to write WAL", not "we are
writing".  (!StandbyMode && StandbyModeRequested) means the server has
been running crash recovery before starting archive recovery.  In that
case, the server is supposed to continue with archived WAL without
inserting a record.  However, if no archived WAL is available and the
server is promoted, the server needs to insert an "aborted contrecord"
record again.  (I'm not sure how that case happens in the field,
though.)  So I don't think !StandbyModeRequested is the right thing
here.  Actually, the attached test fails with that fix.

> The check in xlogrecovery_redo() still looks like a good thing to have
> anyway, because we know that we can safely skip the contrecord.  Now,
> for any patch produced, could the existing TAP test be extended so as
> we are able to get a PANIC even if we keep around the sanity check in
> xlogrecovery_redo().  That would likely involve an immediate shutdown
> of a standby followed by a start sequence?
Thus, I still don't see what has happened on Imseih's end, but I can
cause a PANIC with somewhat tricky steps, which I don't think are
valid.  That is why I wanted to know the exact steps to cause the
PANIC.  The first attachment is a PoC of the TAP test (it uses
system()..), and the second is a tentative fix for that.  (I don't
like the fix either, though...)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/test/recovery/t/026_overwrite_contrecord.pl b/src/test/recovery/t/026_overwrite_contrecord.pl
index 78feccd9aa..bce3d6d701 100644
--- a/src/test/recovery/t/026_overwrite_contrecord.pl
+++ b/src/test/recovery/t/026_overwrite_contrecord.pl
@@ -68,8 +68,7 @@ ok($initfile ne $endfile, "$initfile differs from $endfile");
 # contents
 $node->stop('immediate');
 
-unlink $node->basedir . "/pgdata/pg_wal/$endfile"
-  or die "could not unlink " . $node->basedir . "/pgdata/pg_wal/$endfile: $!";
+system(sprintf("mv %s/pgdata/pg_wal/$endfile %s/archives/", $node->basedir, $node->basedir));
 
 # OK, create a standby at this spot.
 $node->backup_fs_cold('backup');
@@ -106,4 +105,17 @@ $node_standby->promote;
 $node->stop;
 $node_standby->stop;
 
+my $node_primary_2 = PostgreSQL::Test::Cluster->new('primary_2');
+$node_primary_2->init_from_backup($node, 'backup', has_restoring => 1, standby => 0);
+
+$node_primary_2->append_conf(
+	'postgresql.conf', qq(
+log_min_messages = debug1
+));
+$node_primary_2->start;
+$node_primary_2->poll_query_until('postgres',
+	'SELECT NOT pg_is_in_recovery()');
+is($node_primary_2->safe_psql('postgres', 'SELECT pg_is_in_recovery()'), 'f',
+	'check if the copied server has promoted');
+
 done_testing();
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 6eba626420..6b026af74e 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3025,6 +3025,19 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
 
 	if (record)
 	{
+		/*
+		 * If we see a complete record after a recorded aborted
+		 * contrecord, that means the aborted contrecord is a false one.
+		 * This happens when we can continue archive recovery after
+		 * crash recovery ended with an aborted contrecord.
+		 */
+		if (abortedRecPtr != InvalidXLogRecPtr &&
+			abortedRecPtr < xlogreader->EndRecPtr)
+		{
+			abortedRecPtr = InvalidXLogRecPtr;
+			missingContrecPtr = InvalidXLogRecPtr;
+		}
+
 		/* Great, got a record */
 		return record;
 	}