We've had more buildfarm failures due to hard-coded, short timeouts:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2018-10-13%2021%3A06%3A58
  10s timeout while running pg_recvlogical
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2018-12-03%2005%3A52%3A12
  30s timeout while running pg_recvlogical
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2018-11-30%2014%3A31%3A18
  60s timeout in isolationtester try_complete_step()

The 180s timeout in poll_query_until has been trouble-free since 2a0f89c
introduced it two years ago.  I plan to raise the timeouts in question to
180s, as attached.
diff --git a/src/test/isolation/README b/src/test/isolation/README
index bea278a..780b6dc 100644
--- a/src/test/isolation/README
+++ b/src/test/isolation/README
@@ -108,8 +108,8 @@ Each step may contain commands that block until further 
action has been taken
 deadlock).  A test that uses this ability must manually specify valid
 permutations, i.e. those that would not expect a blocked session to execute a
 command.  If a test fails to follow that rule, isolationtester will cancel it
-after 60 seconds.  If the cancel doesn't work, isolationtester will exit
-uncleanly after a total of 75 seconds of wait time.  Testing invalid
+after 180 seconds.  If the cancel doesn't work, isolationtester will exit
+uncleanly after a total of 200 seconds of wait time.  Testing invalid
 permutations should be avoided because they can make the isolation tests take
 a very long time to run, and they serve no useful testing purpose.
 
diff --git a/src/test/isolation/isolationtester.c 
b/src/test/isolation/isolationtester.c
index 7df67da..9134b05 100644
--- a/src/test/isolation/isolationtester.c
+++ b/src/test/isolation/isolationtester.c
@@ -783,15 +783,15 @@ try_complete_step(Step *step, int flags)
                        td += (int64) current_time.tv_usec - (int64) 
start_time.tv_usec;
 
                        /*
-                        * After 60 seconds, try to cancel the query.
+                        * After 180 seconds, try to cancel the query.
                         *
                         * If the user tries to test an invalid permutation, we 
don't want
                         * to hang forever, especially when this is running in 
the
-                        * buildfarm.  So try to cancel it after a minute.  
This will
-                        * presumably lead to this permutation failing, but 
remaining
-                        * permutations and tests should still be OK.
+                        * buildfarm.  This will presumably lead to this 
permutation
+                        * failing, but remaining permutations and tests should 
still be
+                        * OK.
                         */
-                       if (td > 60 * USECS_PER_SEC && !canceled)
+                       if (td > 180 * USECS_PER_SEC && !canceled)
                        {
                                PGcancel   *cancel = PQgetCancel(conn);
 
@@ -808,15 +808,15 @@ try_complete_step(Step *step, int flags)
                        }
 
                        /*
-                        * After 75 seconds, just give up and die.
+                        * After 200 seconds, just give up and die.
                         *
                         * Since cleanup steps won't be run in this case, this 
may cause
                         * later tests to fail.  That stinks, but it's better 
than waiting
                         * forever for the server to respond to the cancel.
                         */
-                       if (td > 75 * USECS_PER_SEC)
+                       if (td > 200 * USECS_PER_SEC)
                        {
-                               fprintf(stderr, "step %s timed out after 75 
seconds\n",
+                               fprintf(stderr, "step %s timed out after 200 
seconds\n",
                                                step->name);
                                exit_nicely();
                        }
diff --git a/src/test/recovery/t/006_logical_decoding.pl 
b/src/test/recovery/t/006_logical_decoding.pl
index 884b0ae..c23cc4d 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -72,7 +72,7 @@ my $endpos = $node_master->safe_psql('postgres',
 print "waiting to replay $endpos\n";
 
 my $stdout_recv = $node_master->pg_recvlogical_upto(
-       'postgres', 'test_slot', $endpos, 10,
+       'postgres', 'test_slot', $endpos, 180,
        'include-xids'     => '0',
        'skip-empty-xacts' => '1');
 chomp($stdout_recv);
@@ -84,7 +84,7 @@ $node_master->poll_query_until('postgres',
 ) or die "slot never became inactive";
 
 $stdout_recv = $node_master->pg_recvlogical_upto(
-       'postgres', 'test_slot', $endpos, 10,
+       'postgres', 'test_slot', $endpos, 180,
        'include-xids'     => '0',
        'skip-empty-xacts' => '1');
 chomp($stdout_recv);
diff --git a/src/test/recovery/t/010_logical_decoding_timelines.pl 
b/src/test/recovery/t/010_logical_decoding_timelines.pl
index 4fbd386..e582b20 100644
--- a/src/test/recovery/t/010_logical_decoding_timelines.pl
+++ b/src/test/recovery/t/010_logical_decoding_timelines.pl
@@ -183,7 +183,7 @@ my $endpos = $node_replica->safe_psql('postgres',
 
 $stdout = $node_replica->pg_recvlogical_upto(
        'postgres', 'before_basebackup',
-       $endpos,    30,
+       $endpos,    180,
        'include-xids'     => '0',
        'skip-empty-xacts' => '1');
 

Reply via email to