Hello Tom and Jelte,

31.08.2024 07:04, Alexander Lakhin wrote:
I've tested your fix with the modification I proposed upthread:
             idle_session_timeout_enabled = false;
         }
+if (rand() % 10 == 0) pg_usleep(10000);
         /*
          * (5) disable async signal conditions again.

and can confirm that the issue is gone. On 8749d850f~1, the test failed
on iterations 3, 3. 12 for me, but on current REL_17_STABLE, 100 test
iterations succeeded.


One month later, treehopper has found a way to break that test: [1].
The failure log contains:
2024-09-30 19:34:31.347 UTC [3201034:13] fdw_retry_check LOG: execute 
<unnamed>: DECLARE c2 CURSOR FOR
    SELECT count(*) FROM ((("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (TRUE)) INNER JOIN "S 1"."T 1" r4 ON (TRUE)) INNER JOIN "S 1"."T 1" r6 ON (TRUE))
2024-09-30 19:34:31.464 UTC [3201033:10] pg_regress/query_cancel ERROR:  
canceling statement due to statement timeout
2024-09-30 19:34:31.464 UTC [3201033:11] pg_regress/query_cancel STATEMENT:  SELECT count(*) FROM ft1 a CROSS JOIN ft1 b CROSS JOIN ft1 c CROSS JOIN ft1 d;
2024-09-30 19:34:31.466 UTC [3201035:1] [unknown] LOG:  connection received: 
host=[local]
2024-09-30 19:34:31.474 UTC [3201034:14] fdw_retry_check LOG: statement: FETCH 
100 FROM c2
2024-09-30 19:35:01.485 UTC [3201033:12] pg_regress/query_cancel WARNING:  could not get result of cancel request due to timeout

It looks like this time the cancel request arrived to the remote backend
when it processed FETCH, presumably at the DoingCommandRead stage.
I've managed to reproduce the issue with the additional modification:
@@ -1605,7 +1605,10 @@ postgresIterateForeignScan(ForeignScanState *node)
      * first call after Begin or ReScan.
      */
     if (!fsstate->cursor_exists)
+{
         create_cursor(node);
+pg_usleep(100000);
+}

With postgres_fdw/Makefile modified to repeat the query_cancel test, I get:
ok 13        - query_cancel                              245 ms
not ok 14    - query_cancel                            30258 ms
ok 15        - query_cancel                              249 ms
...
ok 19        - query_cancel                              236 ms
not ok 20    - query_cancel                            30258 ms
ok 21        - query_cancel                              225 ms
..
not ok 33    - query_cancel                            30272 ms
1..33
# 3 of 33 tests failed.

(Please find attached the complete patch.)

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=treehopper&dt=2024-09-30%2019%3A21%3A14

Best regards,
Alexander
diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index b9fa699305..3e1260e125 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql
 
-REGRESS = postgres_fdw query_cancel
+REGRESS = postgres_fdw query_cancel $(shell printf 'query_cancel %.0s' `seq 
31`)
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/postgres_fdw/postgres_fdw.c 
b/contrib/postgres_fdw/postgres_fdw.c
index fc65d81e21..9190fbb3cb 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1605,7 +1605,10 @@ postgresIterateForeignScan(ForeignScanState *node)
         * first call after Begin or ReScan.
         */
        if (!fsstate->cursor_exists)
+{
                create_cursor(node);
+pg_usleep(100000);
+}
 
        /*
         * Get some more tuples, if we've run out.
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a750dc800b..c281547e22 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4717,6 +4717,10 @@ PostgresMain(const char *dbname, const char *username)
                        idle_session_timeout_enabled = false;
                }
 
+if (rand() % 10 == 0)
+{
+pg_usleep(10000);
+}
                /*
                 * (5) disable async signal conditions again.
                 *

Reply via email to