Re: Timeout control within tests

Noah Misch Sun, 17 Apr 2022 21:24:15 -0700

(I pushed the main patch as f2698ea, on 2022-03-04.)

On Fri, Feb 18, 2022 at 06:41:36PM -0800, Noah Misch wrote:
> On Fri, Feb 18, 2022 at 10:26:52AM -0500, Tom Lane wrote:
> > Noah Misch <n...@leadboat.com> writes:
> > > On Thu, Feb 17, 2022 at 09:48:25PM -0800, Andres Freund wrote:
> > >> Meson's test runner has the concept of a "timeout multiplier" for ways of
> > >> running tests. Meson's stuff is about entire tests (i.e. one tap test), so
> > >> doesn't apply here, but I wonder if we shouldn't do something similar?
> > 
> > > Hmmm.  It is good if the user can express an intent that continues to make
> > > sense if we change the default timeout.  For the buildfarm use case, a
> > > multiplier is moderately better on that axis (PG_TEST_TIMEOUT_MULTIPLIER=100
> > > beats PG_TEST_TIMEOUT_DEFAULT=18000).  For the hacker use case, an absolute
> > > value is substantially better on that axis (PG_TEST_TIMEOUT_DEFAULT=3 beats
> > > PG_TEST_TIMEOUT_MULTIPLIER=.016666).
> > 
> > FWIW, I'm fairly sure that PGISOLATIONTIMEOUT=300 was selected after
> > finding that smaller values didn't work reliably in the buildfarm.
> > Now maybe 741d7f1 fixed that, but I wouldn't count on it.  So while I
> > approve of the idea to remove PGISOLATIONTIMEOUT in favor of using this
> > centralized setting, I think that we might need to have a multiplier
> > there, or else we'll end up with PG_TEST_TIMEOUT_DEFAULT set to 300
> > across the board.  Perhaps the latter is fine, but a multiplier seems a
> > bit more flexible.
> 
> The PGISOLATIONTIMEOUT replacement was 2*timeout_default, so isolation suites
> would get 2*180s=360s.  (I don't want to lower any default timeouts, but I
> don't mind raising them.)  In a sense, PG_TEST_TIMEOUT_DEFAULT is a multiplier
> with as many sites as possible multiplying it by 1.  The patch has multiples
> at two code sites.


Here's the PGISOLATIONTIMEOUT replacement patch.  I waffled on whether to
back-patch.  Since it affects only isolation suite testing, only on systems
too slow for the default timeout, it's not a major decision.  I currently plan
not to back-patch, since slow systems that would have wanted a back-patch can
just set both variables.

Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    Replace PGISOLATIONTIMEOUT with 2 * PG_TEST_TIMEOUT_DEFAULT.
    
    Now that the more-generic variable exists, use it.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20220219024136.ga3670...@rfd.leadboat.com

diff --git a/src/test/isolation/README b/src/test/isolation/README
index 8457a56..5818ca5 100644
--- a/src/test/isolation/README
+++ b/src/test/isolation/README
@@ -47,10 +47,10 @@ pg_isolation_regress is a tool similar to pg_regress, but instead of using
 psql to execute a test, it uses isolationtester.  It accepts all the same
 command-line arguments as pg_regress.
 
-By default, isolationtester will wait at most 300 seconds (5 minutes)
+By default, isolationtester will wait at most 360 seconds (6 minutes)
 for any one test step to complete.  If you need to adjust this, set
-the environment variable PGISOLATIONTIMEOUT to the desired timeout
-in seconds.
+the environment variable PG_TEST_TIMEOUT_DEFAULT to half the desired
+timeout in seconds.
 
 
 Test specification
@@ -138,10 +138,11 @@ Each step may contain commands that block until further action has been taken
 deadlock).  A test that uses this ability must manually specify valid
 permutations, i.e. those that would not expect a blocked session to execute a
 command.  If a test fails to follow that rule, isolationtester will cancel it
-after PGISOLATIONTIMEOUT seconds.  If the cancel doesn't work, isolationtester
-will exit uncleanly after a total of twice PGISOLATIONTIMEOUT.  Testing
-invalid permutations should be avoided because they can make the isolation
-tests take a very long time to run, and they serve no useful testing purpose.
+after 2 * PG_TEST_TIMEOUT_DEFAULT seconds.  If the cancel doesn't work,
+isolationtester will exit uncleanly after a total of 4 *
+PG_TEST_TIMEOUT_DEFAULT.  Testing invalid permutations should be avoided
+because they can make the isolation tests take a very long time to run, and
+they serve no useful testing purpose.
 
 Note that isolationtester recognizes that a command has blocked by looking
 to see if it is shown as waiting in the pg_locks view; therefore, only
diff --git a/src/test/isolation/isolationtester.c b/src/test/isolation/isolationtester.c
index 12179f2..095db8f 100644
--- a/src/test/isolation/isolationtester.c
+++ b/src/test/isolation/isolationtester.c
@@ -46,7 +46,7 @@ static int    nconns = 0;
 static bool any_new_notice = false;
 
 /* Maximum time to wait before giving up on a step (in usec) */
-static int64 max_step_wait = 300 * USECS_PER_SEC;
+static int64 max_step_wait = 360 * USECS_PER_SEC;
 
 
 static void check_testspec(TestSpec *testspec);
@@ -128,12 +128,12 @@ main(int argc, char **argv)
                conninfo = "dbname = postgres";
 
        /*
-        * If PGISOLATIONTIMEOUT is set in the environment, adopt its value (given
-        * in seconds) as the max time to wait for any one step to complete.
+        * If PG_TEST_TIMEOUT_DEFAULT is set, adopt its value (given in seconds)
+        * as half the max time to wait for any one step to complete.
         */
-       env_wait = getenv("PGISOLATIONTIMEOUT");
+       env_wait = getenv("PG_TEST_TIMEOUT_DEFAULT");
        if (env_wait != NULL)
-               max_step_wait = ((int64) atoi(env_wait)) * USECS_PER_SEC;
+               max_step_wait = 2 * ((int64) atoi(env_wait)) * USECS_PER_SEC;
 
        /* Read the test spec from stdin */
        spec_yyparse();
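
For readers who want to check the arithmetic, here is a minimal standalone
sketch of the timeout computation in the last hunk.  It is not part of the
patch: step_wait_usec() and its default_secs parameter are hypothetical names
used only for illustration, and USECS_PER_SEC is redefined locally so the
fragment compiles on its own.  The 180-second default is the figure quoted
earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define USECS_PER_SEC INT64_C(1000000)

/*
 * Hypothetical helper, not part of the patch: return the maximum per-step
 * wait in microseconds.  env_value is the string value of
 * PG_TEST_TIMEOUT_DEFAULT, or NULL if the variable is unset; default_secs is
 * the fallback timeout in seconds.
 */
static int64_t
step_wait_usec(const char *env_value, int default_secs)
{
	int64_t		secs = default_secs;

	assert(default_secs > 0);

	/* Like the patch, this trusts atoi(); non-numeric input yields 0. */
	if (env_value != NULL)
		secs = atoi(env_value);

	/* The isolation tester waits twice the general-purpose timeout. */
	return 2 * secs * USECS_PER_SEC;
}
```

With the variable unset, step_wait_usec(getenv("PG_TEST_TIMEOUT_DEFAULT"), 180)
returns 360000000, matching the 360-second default in the patch.  Setting
PG_TEST_TIMEOUT_DEFAULT=300 would give a 600-second step limit.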
