On Tue, Apr 12, 2022 at 3:49 PM Michael Paquier <mich...@paquier.xyz> wrote:
> All that stuff leads me to the attached.  Thoughts?

Under valgrind I got "Undefined subroutine &main::usleep called at
t/002_archiving.pl line 103" so I added "use Time::HiRes qw(usleep);",
and now I get past the first 4 tests with your patch, but then
promotion times out, not sure why:

+++ tap check in src/test/recovery +++
t/002_archiving.pl ..
ok 1 - check content from archives
ok 2 - archive_cleanup_command executed on checkpoint
ok 3 - recovery_end_command not executed yet
# found 00000002.history after 14 attempts
ok 4 - recovery_end_command executed after promotion
Bailout called.  Further testing stopped:  command "pg_ctl -D /home/tmunro/projects/postgresql/src/test/recovery/tmp_check/t_002_archiving_standby2_data/pgdata -l /home/tmunro/projects/postgresql/src/test/recovery/tmp_check/log/002_archiving_standby2.log promote" exited with value 1

Since it's quite painful to run TAP tests under valgrind, I found a
place to stick a plain old sleep to repro these problems:

--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1035,7 +1035,7 @@ sub enable_restoring
        my $copy_command =
          $PostgreSQL::Test::Utils::windows_os
          ? qq{copy "$path\\\\%f" "%p"}
-         : qq{cp "$path/%f" "%p"};
+         : qq{sleep 1 && cp "$path/%f" "%p"};

Soon I'll push the fix to the slowness that xlogprefetcher.c
accidentally introduced to continuous archive recovery, ie the problem
of calling a failing restore_command repeatedly as we approach the end
of a WAL segment instead of just once every 5 seconds after we run out
of data, and after that you'll probably need to revert that fix
locally to repro this.
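
For anyone who wants to poke at the sleep-then-copy restore_command from the diff above outside the TAP framework, here is a minimal shell sketch. The function name slow_restore and the RESTORE_DELAY variable are made up for illustration and are not part of PostgreSQL or Cluster.pm; the arguments stand in for the $path, %f and %p that appear in the diff.

```shell
#!/bin/sh
# Hypothetical stand-in for the modified restore_command in the diff above.
# In a real restore_command the server substitutes %f and %p; here they
# are passed as ordinary arguments.
slow_restore() {
    archive_dir=$1   # plays the role of $path in Cluster.pm
    fname=$2         # plays the role of %f (file being requested)
    dest=$3          # plays the role of %p (where to copy it)

    # RESTORE_DELAY is an illustrative knob; the diff hard-codes 1 second.
    # The function's exit status is that of cp, so a missing file in the
    # archive yields a nonzero status.
    sleep "${RESTORE_DELAY:-1}" && cp "$archive_dir/$fname" "$dest"
}
```

Called as `slow_restore /path/to/archive 00000002.history /tmp/out`, it waits and then copies the file. A request for a file that is not in the archive exits nonzero after the delay, so every failed lookup costs the full sleep as well.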