On Sat, Apr 9, 2022 at 12:59 PM Andres Freund <and...@anarazel.de> wrote:
> On 2022-04-08 17:55:51 -0400, Tom Lane wrote:
> > I tried adjusting the patch so it does guarantee that (as attached),
> > and in two out of two tries it got past the archive_cleanup_command
> > failure but then hung up waiting for standby2 to promote.
>
> Adding
>
> $node_standby->safe_psql('postgres', "SELECT pg_switch_wal()");
> just after
> $node_standby2->start;
>
> makes the tests pass here.

Sorry for the delay... I got a bit confused about the different things
going on in this thread, but I hope I've got it now:

1.  This test had some pre-existing bugs/races that hadn't caused
failures before, even under Valgrind, only because of lucky scheduling.
The above changes appear to fix those problems.  To Michael for comment.

> What is that second test really testing?
>
> # Check the presence of temporary files specifically generated during
> # archive recovery.  To ensure the presence of the temporary history
> # file, switch to a timeline large enough to allow a standby to recover
> # a history file from an archive.  As this requires at least two timeline
> # switches, promote the existing standby first.  Then create a second
> # standby based on the promoted one.  Finally, the second standby is
> # promoted.
>
> Note "Then create a second standby based on the promoted one." - but that's
> not actually what's happening:

2.  There may also be other problems with the test, but those aren't
relevant to skink's failure, which starts on the 5th test.  To Michael
for comment.

> But I think there's also something funky in the prefetching logic. I think it
> may attempt restoring during prefetching somehow, even though there's code
> that appears to try to prevent that?

3.  Urghl.  Yeah.  There is indeed code to report XLREAD_WOULDBLOCK
for the lastSourceFailed case, but we don't really want to do that
more often than every 5 seconds.  So I think I need to make that
"sticky", so that we don't attempt to prefetch again (ie to read from
the next segment, which invokes restore_command) until we've replayed
all the records we already have, then hit the end and go to sleep.
The attached patch does that, and makes the offending test pass under
Valgrind for me, even without the other changes already mentioned.  If
I understand correctly, the failure comes down to a timing race in the
tests (though I didn't check exactly where), exposed because all those
extra fork/exec calls are extremely slow under Valgrind.
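
To make that concrete, here's a condensed view of the relevant control
flow in XLogPrefetcherNextBlock() with the attached patch applied (a
simplified excerpt for illustration, not the verbatim source; the full
diff is below):

    nonblocking = XLogReaderHasQueuedRecordOrError(reader);

    /* Readahead is disabled until we replay past a certain point. */
    if (nonblocking && replaying_lsn <= prefetcher->no_readahead_until)
        return LRQ_NEXT_AGAIN;

    record = XLogReadAhead(prefetcher->reader, nonblocking);
    if (record == NULL)
    {
        /*
         * Out of data (or an error) in nonblocking mode.  Remember the
         * LSN of the last record already decoded, so that we don't try
         * to read ahead again (which could invoke restore_command)
         * until everything in the decode queue has been replayed.
         */
        if (nonblocking && prefetcher->reader->decode_queue_tail)
            prefetcher->no_readahead_until =
                prefetcher->reader->decode_queue_tail->lsn;

        return LRQ_NEXT_AGAIN;
    }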

> And interestingly I'm not seeing the
> "switched WAL source from stream to archive after failure"
> lines I'd expect.

I see them now.  That's because it gives up when it's reading ahead
(nonblocking), which may not be strictly necessary, but I found it
simpler to think about.  Then, when it tries again in 5 seconds, it's
in blocking mode, so it doesn't give up so easily.

2022-04-11 18:15:08.220 NZST [524796] DEBUG:  switched WAL source from stream to archive after failure
cp: cannot stat '/tmp/archive/000000010000000000000017': No such file or directory
2022-04-11 18:15:08.226 NZST [524796] DEBUG:  could not restore file "000000010000000000000017" from archive: child process exited with exit code 1
2022-04-11 18:15:08.226 NZST [524796] DEBUG:  could not open file "pg_wal/000000010000000000000017": No such file or directory
2022-04-11 18:15:08.226 NZST [524796] DEBUG:  switched WAL source from archive to stream after failure
From e488e20f7c364ba1adaae8d1f6ea92a7f5d0bda6 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 11 Apr 2022 17:16:24 +1200
Subject: [PATCH] Don't retry restore_command while reading ahead.

Suppress further attempts to read ahead in the WAL if we run out of
data, until records already decoded have been replayed.  When replaying
from archives, this avoids repeatedly retrying the restore_command until
after we've slept for 5 seconds.

When replaying from a network stream, this makes us less aggressive in
theory, but we probably can't benefit from being more aggressive yet
anyway as the flushedUpto variable doesn't advance until we run out of
data (though we could improve that in future).

While here, fix a potential off-by-one confusion with the
no_readahead_until mechanism: we suppress readahead until after the
record that begins at that LSN has been replayed (ie we are replaying a
higher LSN), so change < to <=.

Defect in commit 5dc0418f.

Reported-by: Andres Freund <and...@anarazel.de>
Discussion: https://postgr.es/m/20220409005910.alw46xqmmgny2sgr%40alap3.anarazel.de
---
 src/backend/access/transam/xlogprefetcher.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlogprefetcher.c b/src/backend/access/transam/xlogprefetcher.c
index f3428888d2..43e82b65c2 100644
--- a/src/backend/access/transam/xlogprefetcher.c
+++ b/src/backend/access/transam/xlogprefetcher.c
@@ -487,8 +487,8 @@ XLogPrefetcherNextBlock(uintptr_t pgsr_private, XLogRecPtr *lsn)
 			 */
 			nonblocking = XLogReaderHasQueuedRecordOrError(reader);
 
-			/* Certain records act as barriers for all readahead. */
-			if (nonblocking && replaying_lsn < prefetcher->no_readahead_until)
+			/* Readahead is disabled until we replay past a certain point. */
+			if (nonblocking && replaying_lsn <= prefetcher->no_readahead_until)
 				return LRQ_NEXT_AGAIN;
 
 			record = XLogReadAhead(prefetcher->reader, nonblocking);
@@ -496,8 +496,13 @@ XLogPrefetcherNextBlock(uintptr_t pgsr_private, XLogRecPtr *lsn)
 			{
 				/*
 				 * We can't read any more, due to an error or lack of data in
-				 * nonblocking mode.
+				 * nonblocking mode.  Don't try to read ahead again until we've
+				 * replayed everything already decoded.
 				 */
+				if (nonblocking && prefetcher->reader->decode_queue_tail)
+					prefetcher->no_readahead_until =
+						prefetcher->reader->decode_queue_tail->lsn;
+
 				return LRQ_NEXT_AGAIN;
 			}
 
-- 
2.30.2
