On Sat, Apr 9, 2022 at 12:59 PM Andres Freund <and...@anarazel.de> wrote:
> On 2022-04-08 17:55:51 -0400, Tom Lane wrote:
> > I tried adjusting the patch so it does guarantee that (as attached),
> > and in two out of two tries it got past the archive_cleanup_command
> > failure but then hung up waiting for standby2 to promote.
>
> Adding
>   $node_standby->safe_psql('postgres', "SELECT pg_switch_wal()");
> just after
>   $node_standby2->start;
>
> makes the tests pass here.
Sorry for the delay...  I got a bit confused about the different things going on in this thread, but I hope I've got it now:

1.  This test had some pre-existing bugs/races, which hadn't failed before due to scheduling, even under Valgrind.  The above changes appear to fix those problems.  To Michael for comment.

> What is that second test really testing?
>
> # Check the presence of temporary files specifically generated during
> # archive recovery.  To ensure the presence of the temporary history
> # file, switch to a timeline large enough to allow a standby to recover
> # a history file from an archive.  As this requires at least two timeline
> # switches, promote the existing standby first.  Then create a second
> # standby based on the promoted one.  Finally, the second standby is
> # promoted.
>
> Note "Then create a second standby based on the promoted one." - but that's
> not actually what's happening:

2.  There may also be other problems with the test, but those aren't relevant to skink's failure, which starts on the 5th test.  To Michael for comment.

> But I think there's also something funky in the prefetching logic. I think it
> may attempt restoring during prefetching somehow, even though there's code
> that appears to try to prevent that?

3.  Urghl.  Yeah.  There is indeed code to report XLREAD_WOULDBLOCK for the lastSourceFailed case, but we don't really want to do that more often than every 5 seconds.  So I think I need to make that "sticky", so that we don't attempt to prefetch again (i.e. to read from the next segment, which invokes restore_command) until we've replayed all the records we already have, then hit the end and go to sleep.

The attached patch does that, and makes the offending test pass under Valgrind for me, even without the other changes already mentioned.  If I understand correctly, the original failure is due to a timing race in the tests (though I didn't check exactly where), because all those extra fork/exec calls are extremely slow under Valgrind.

> And interestingly I'm not seeing the
> "switched WAL source from stream to archive after failure"
> lines I'd expect.

I see them now.  It's because it gives up when it's reading ahead (nonblocking), which may not be strictly necessary but I found it simpler to think about.  Then when it tries again in 5 seconds it's in blocking mode, so it doesn't give up so easily.

2022-04-11 18:15:08.220 NZST [524796] DEBUG:  switched WAL source from stream to archive after failure
cp: cannot stat '/tmp/archive/000000010000000000000017': No such file or directory
2022-04-11 18:15:08.226 NZST [524796] DEBUG:  could not restore file "000000010000000000000017" from archive: child process exited with exit code 1
2022-04-11 18:15:08.226 NZST [524796] DEBUG:  could not open file "pg_wal/000000010000000000000017": No such file or directory
2022-04-11 18:15:08.226 NZST [524796] DEBUG:  switched WAL source from archive to stream after failure
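To make the "sticky" idea in point 3 concrete outside the patch context, here is a minimal, self-contained sketch: the type and field names (XLogRecPtr, no_readahead_until) are borrowed from xlogprefetcher.c for readability, but this is a simplified standalone illustration under my own assumptions, not the real server code; the actual change is in the attached patch below.

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  typedef uint64_t XLogRecPtr;

  typedef struct Prefetcher
  {
      /* Readahead stays suppressed while replaying_lsn <= this value. */
      XLogRecPtr  no_readahead_until;
  } Prefetcher;

  /* Would a nonblocking readahead attempt be allowed right now? */
  static bool
  readahead_allowed(const Prefetcher *p, XLogRecPtr replaying_lsn)
  {
      /* <= rather than <: the record beginning at that LSN must be replayed too. */
      return replaying_lsn > p->no_readahead_until;
  }

  /* Called when a nonblocking readahead attempt ran out of data (or failed). */
  static void
  readahead_hit_end(Prefetcher *p, XLogRecPtr last_decoded_lsn)
  {
      /* Make the suppression sticky until everything already decoded is replayed. */
      p->no_readahead_until = last_decoded_lsn;
  }

  int
  main(void)
  {
      Prefetcher  p = {0};

      /* Suppose we hit the end of available WAL with records decoded up to LSN 1000. */
      readahead_hit_end(&p, 1000);

      printf("replaying  900: readahead allowed? %d\n", readahead_allowed(&p, 900));   /* 0 */
      printf("replaying 1000: readahead allowed? %d\n", readahead_allowed(&p, 1000));  /* 0 */
      printf("replaying 1001: readahead allowed? %d\n", readahead_allowed(&p, 1001));  /* 1 */

      return 0;
  }

The point of the suppression is that once replay has caught up past no_readahead_until, the next readahead attempt happens in blocking mode after the usual 5-second sleep, so restore_command isn't retried over and over while reading ahead.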
From e488e20f7c364ba1adaae8d1f6ea92a7f5d0bda6 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 11 Apr 2022 17:16:24 +1200
Subject: [PATCH] Don't retry restore_command while reading ahead.

Suppress further attempts to read ahead in the WAL if we run out of
data, until records already decoded have been replayed.  When replaying
from archives, this avoids repeatedly retrying the restore_command
until after we've slept for 5 seconds.  When replaying from a network
stream, this makes us less aggressive in theory, but we probably can't
benefit from being more aggressive yet anyway as the flushedUpto
variable doesn't advance until we run out of data (though we could
improve that in future).

While here, fix a potential off-by one confusion with the
no_readahead_until mechanism: we suppress readahead until after the
record that begins at that LSN has been replayed (ie we are replaying a
higher LSN), so change < to <=.

Defect in commit 5dc0418f.

Reported-by: Andres Freund <and...@anarazel.de>
Discussion: https://postgr.es/m/20220409005910.alw46xqmmgny2sgr%40alap3.anarazel.de
---
 src/backend/access/transam/xlogprefetcher.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlogprefetcher.c b/src/backend/access/transam/xlogprefetcher.c
index f3428888d2..43e82b65c2 100644
--- a/src/backend/access/transam/xlogprefetcher.c
+++ b/src/backend/access/transam/xlogprefetcher.c
@@ -487,8 +487,8 @@ XLogPrefetcherNextBlock(uintptr_t pgsr_private, XLogRecPtr *lsn)
 			 */
 			nonblocking = XLogReaderHasQueuedRecordOrError(reader);
 
-			/* Certain records act as barriers for all readahead. */
-			if (nonblocking && replaying_lsn < prefetcher->no_readahead_until)
+			/* Readahead is disabled until we replay past a certain point. */
+			if (nonblocking && replaying_lsn <= prefetcher->no_readahead_until)
 				return LRQ_NEXT_AGAIN;
 
 			record = XLogReadAhead(prefetcher->reader, nonblocking);
@@ -496,8 +496,13 @@ XLogPrefetcherNextBlock(uintptr_t pgsr_private, XLogRecPtr *lsn)
 			{
 				/*
 				 * We can't read any more, due to an error or lack of data in
-				 * nonblocking mode.
+				 * nonblocking mode.  Don't try to read ahead again until we've
+				 * replayed everything already decoded.
 				 */
+				if (nonblocking && prefetcher->reader->decode_queue_tail)
+					prefetcher->no_readahead_until =
+						prefetcher->reader->decode_queue_tail->lsn;
+
 				return LRQ_NEXT_AGAIN;
 			}
 
-- 
2.30.2