(I pushed the main patch as f2698ea, on 2022-03-04.)

On Fri, Feb 18, 2022 at 06:41:36PM -0800, Noah Misch wrote:
> On Fri, Feb 18, 2022 at 10:26:52AM -0500, Tom Lane wrote:
> > Noah Misch <n...@leadboat.com> writes:
> > > On Thu, Feb 17, 2022 at 09:48:25PM -0800, Andres Freund wrote:
> > >> Meson's test runner has the concept of a "timeout multiplier" for ways of
> > >> running tests. Meson's stuff is about entire tests (i.e. one tap test), so
> > >> doesn't apply here, but I wonder if we shouldn't do something similar?
> > >
> > > Hmmm. It is good if the user can express an intent that continues to make
> > > sense if we change the default timeout. For the buildfarm use case, a
> > > multiplier is moderately better on that axis (PG_TEST_TIMEOUT_MULTIPLIER=100
> > > beats PG_TEST_TIMEOUT_DEFAULT=18000). For the hacker use case, an absolute
> > > value is substantially better on that axis (PG_TEST_TIMEOUT_DEFAULT=3 beats
> > > PG_TEST_TIMEOUT_MULTIPLIER=.016666).
> >
> > FWIW, I'm fairly sure that PGISOLATIONTIMEOUT=300 was selected after
> > finding that smaller values didn't work reliably in the buildfarm.
> > Now maybe 741d7f1 fixed that, but I wouldn't count on it. So while I
> > approve of the idea to remove PGISOLATIONTIMEOUT in favor of using this
> > centralized setting, I think that we might need to have a multiplier
> > there, or else we'll end up with PG_TEST_TIMEOUT_DEFAULT set to 300
> > across the board. Perhaps the latter is fine, but a multiplier seems a
> > bit more flexible.
>
> The PGISOLATIONTIMEOUT replacement was 2*timeout_default, so isolation suites
> would get 2*180s=360s. (I don't want to lower any default timeouts, but I
> don't mind raising them.) In a sense, PG_TEST_TIMEOUT_DEFAULT is a multiplier
> with as many sites as possible multiplying it by 1. The patch has multiples
> at two code sites.

Here's the PGISOLATIONTIMEOUT replacement patch. I waffled on whether to back-patch. Since it affects only isolation suite testing, only on systems too slow for the default timeout, it's not a major decision. I currently plan not to back-patch, since slow systems that would have wanted a back-patch can just set both variables.

Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    Replace PGISOLATIONTIMEOUT with 2 * PG_TEST_TIMEOUT_DEFAULT.

    Now that the more-generic variable exists, use it.

    Reviewed by FIXME.

    Discussion: https://postgr.es/m/20220219024136.ga3670...@rfd.leadboat.com

diff --git a/src/test/isolation/README b/src/test/isolation/README
index 8457a56..5818ca5 100644
--- a/src/test/isolation/README
+++ b/src/test/isolation/README
@@ -47,10 +47,10 @@ pg_isolation_regress is a tool similar to pg_regress, but instead of using
 psql to execute a test, it uses isolationtester. It accepts all the same
 command-line arguments as pg_regress.
 
-By default, isolationtester will wait at most 300 seconds (5 minutes)
+By default, isolationtester will wait at most 360 seconds (6 minutes)
 for any one test step to complete. If you need to adjust this, set
-the environment variable PGISOLATIONTIMEOUT to the desired timeout
-in seconds.
+the environment variable PG_TEST_TIMEOUT_DEFAULT to half the desired
+timeout in seconds.
 
 
 Test specification
@@ -138,10 +138,11 @@ Each step may contain commands that block until further action has been taken
 deadlock). A test that uses this ability must manually specify valid
 permutations, i.e. those that would not expect a blocked session to execute a
 command. If a test fails to follow that rule, isolationtester will cancel it
-after PGISOLATIONTIMEOUT seconds. If the cancel doesn't work, isolationtester
-will exit uncleanly after a total of twice PGISOLATIONTIMEOUT. Testing
-invalid permutations should be avoided because they can make the isolation
-tests take a very long time to run, and they serve no useful testing purpose.
+after 2 * PG_TEST_TIMEOUT_DEFAULT seconds. If the cancel doesn't work,
+isolationtester will exit uncleanly after a total of 4 *
+PG_TEST_TIMEOUT_DEFAULT. Testing invalid permutations should be avoided
+because they can make the isolation tests take a very long time to run, and
+they serve no useful testing purpose.
 
 Note that isolationtester recognizes that a command has blocked by looking
 to see if it is shown as waiting in the pg_locks view; therefore, only
diff --git a/src/test/isolation/isolationtester.c b/src/test/isolation/isolationtester.c
index 12179f2..095db8f 100644
--- a/src/test/isolation/isolationtester.c
+++ b/src/test/isolation/isolationtester.c
@@ -46,7 +46,7 @@ static int nconns = 0;
 static bool any_new_notice = false;
 
 /* Maximum time to wait before giving up on a step (in usec) */
-static int64 max_step_wait = 300 * USECS_PER_SEC;
+static int64 max_step_wait = 360 * USECS_PER_SEC;
 
 
 static void check_testspec(TestSpec *testspec);
@@ -128,12 +128,12 @@ main(int argc, char **argv)
 		conninfo = "dbname = postgres";
 
 	/*
-	 * If PGISOLATIONTIMEOUT is set in the environment, adopt its value (given
-	 * in seconds) as the max time to wait for any one step to complete.
+	 * If PG_TEST_TIMEOUT_DEFAULT is set, adopt its value (given in seconds)
+	 * as half the max time to wait for any one step to complete.
 	 */
-	env_wait = getenv("PGISOLATIONTIMEOUT");
+	env_wait = getenv("PG_TEST_TIMEOUT_DEFAULT");
 	if (env_wait != NULL)
-		max_step_wait = ((int64) atoi(env_wait)) * USECS_PER_SEC;
+		max_step_wait = 2 * ((int64) atoi(env_wait)) * USECS_PER_SEC;
 
 	/* Read the test spec from stdin */
 	spec_yyparse();
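
As a standalone illustration of the arithmetic above (not part of the patch), the two multiples can be sketched as one small function. The names scaled_timeout_usec and TIMEOUT_DEFAULT_SECS are hypothetical and exist only for this example; the 180s default is the figure quoted earlier in the thread, and the use of atoi() simply mirrors what the patched isolationtester.c does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define USECS_PER_SEC INT64_C(1000000)

/* Assumed default when PG_TEST_TIMEOUT_DEFAULT is unset (180s, per the thread). */
#define TIMEOUT_DEFAULT_SECS 180

/*
 * Hypothetical helper, not in the patch: return "multiple" times the timeout
 * default, in microseconds.  env_value is whatever
 * getenv("PG_TEST_TIMEOUT_DEFAULT") returned, or NULL if the variable is unset.
 */
int64_t
scaled_timeout_usec(const char *env_value, int multiple)
{
	int64_t		secs = TIMEOUT_DEFAULT_SECS;

	assert(multiple > 0);
	if (env_value != NULL)
		secs = atoi(env_value);	/* same lenient parsing as the patch */
	return multiple * secs * USECS_PER_SEC;
}
```

With the variable unset, a multiple of 2 gives the 360s step timeout and a multiple of 4 gives the 720s point at which isolationtester would exit uncleanly. Passing "300" with a multiple of 2 gives 600s, which is why a system that previously needed PGISOLATIONTIMEOUT=300 does not lose headroom.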