On Thu, May 12, 2022 at 4:57 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Thu, May 12, 2022 at 3:13 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> > error running SQL: 'psql:<stdin>:1: ERROR:  source database
> > "conflict_db_template" is being accessed by other users
> > DETAIL:  There is 1 other session using the database.'
>
> Oh, for this one I think it may just be that the autovacuum worker
> with PID 23757 took longer to exit than the 5 seconds
> CountOtherDBBackends() is prepared to wait, after sending it SIGTERM.
In this test, autovacuum_naptime is set to 1s (per Andres, autovacuum was implicated when he first saw the problem with pg_upgrade, hence the desire to crank it up).  That's not actually necessary: commenting out the active line in ProcessBarrierSmgrRelease() (i.e. disabling the fix the test is there to exercise) shows that the test still reliably reproduces data corruption without it.  Let's just take that setting out.

As for skink failing, the timeout was hard-coded at 300s for the whole test, but apparently that wasn't enough under valgrind.  Let's use the standard PostgreSQL::Test::Utils::timeout_default (usually 180s), but reset it for each query we send.  See attached.
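To show the idea outside the test harness, here's a rough standalone sketch of the same per-query timer pattern.  It is not part of the patch: the 'cat' child, the literal 180, and the echo-matching loop are stand-ins (the real script drives psql, uses $PostgreSQL::Test::Utils::timeout_default, and does the pumping inside send_query_and_wait); the point is just that one IPC::Run timer is reset and restarted before each interaction, so the budget applies per query rather than to the whole run.

    use strict;
    use warnings;
    use IPC::Run qw(start timer);

    # One long-lived timer attached to the whole session, as in the test script.
    my $timeout = timer(180);    # stand-in for timeout_default

    my ($stdin, $stdout) = ('', '');
    my $h = start([ 'cat' ], '<', \$stdin, '>', \$stdout, $timeout);

    for my $query ("select 1;\n", "select 2;\n")
    {
        # Give each query its own full allowance instead of one global budget.
        $timeout->reset();
        $timeout->start();

        $stdin .= $query;
        while (1)
        {
            # 'cat' simply echoes the input back, standing in for psql output.
            last if $stdout =~ /\Q$query\E/;
            die "timed out waiting for query" if $timeout->is_expired;
            last unless $h->pump_nb();
        }
    }

    $h->finish;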
From ffcb61004fd06c9b2db56c7fe045c7c726d67a72 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Fri, 13 May 2022 13:40:03 +1200
Subject: [PATCH] Fix slow animal timeouts in 032_relfilenode_reuse.pl.

Per BF animal chipmunk: CREATE DATABASE could apparently fail due to an
AV process being in the template database and not quitting fast enough
for the 5 second timeout in CountOtherDBBackends().  The test script had
autovacuum_naptime=1s to encourage more activity that opens fds, but
that wasn't strictly necessary for this test.  Take it out.

Per BF animal skink: the test had a global 300s timeout, but apparently
that was not enough under valgrind.  Use the standard timeout
PostgreSQL::Test::Utils::timeout_default, but reset it for each query we
run.

Discussion: 
---
 src/test/recovery/t/032_relfilenode_reuse.pl | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/src/test/recovery/t/032_relfilenode_reuse.pl b/src/test/recovery/t/032_relfilenode_reuse.pl
index ac9340b7dd..5a6a759aa5 100644
--- a/src/test/recovery/t/032_relfilenode_reuse.pl
+++ b/src/test/recovery/t/032_relfilenode_reuse.pl
@@ -14,7 +14,6 @@ log_connections=on
 # to avoid "repairing" corruption
 full_page_writes=off
 log_min_messages=debug2
-autovacuum_naptime=1s
 shared_buffers=1MB
 ]);
 $node_primary->start;
@@ -28,11 +27,8 @@ $node_standby->init_from_backup($node_primary, $backup_name,
 	has_streaming => 1);
 $node_standby->start;
 
-# To avoid hanging while expecting some specific input from a psql
-# instance being driven by us, add a timeout high enough that it
-# should never trigger even on very slow machines, unless something
-# is really wrong.
-my $psql_timeout = IPC::Run::timer(300);
+# We'll reset this timeout for each individual query we run.
+my $psql_timeout = IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default);
 
 my %psql_primary = (stdin => '', stdout => '', stderr => '');
 $psql_primary{run} = IPC::Run::start(
@@ -202,6 +198,9 @@ sub send_query_and_wait
 	my ($psql, $query, $untl) = @_;
 	my $ret;
 
+	$psql_timeout->reset();
+	$psql_timeout->start();
+
 	# send query
 	$$psql{stdin} .= $query;
 	$$psql{stdin} .= "\n";
-- 
2.36.0