Hello hackers,

30.03.2024 01:17, Noah Misch wrote:
On Fri, Mar 29, 2024 at 09:17:55AM +0100, Jelte Fennema-Nio wrote:
On Thu, 28 Mar 2024 at 19:03, Tom Lane <t...@sss.pgh.pa.us> wrote:
Could we make this test bulletproof by using an injection point?
If not, I remain of the opinion that we're better off without it.
Possibly, and if so, I agree that would be better than the currently
added test. But I honestly don't feel like spending the time on
creating such a test.
The SQL test is more representative of real applications, and it's way simpler
to understand.  In general, I prefer 6-line SQL tests that catch a problem 10%
of the time over injection point tests that catch it 100% of the time.  For
low detection rate to be exciting, it needs to be low enough to have a serious
chance of all buildfarm members reporting green for the bad commit.  With ~115
buildfarm members running in the last day, 0.1% detection rate would have been
low enough to bother improving, but 4% would be high enough to call it good.

As a recent buildfarm failure on orlingo (which tests asan-enabled builds)
[1] shows, that test can still fail:
70/70 postgresql:postgres_fdw-running / postgres_fdw-running/regress            
   ERROR           278.67s exit status 1

@@ -2775,6 +2775,7 @@
 SET LOCAL statement_timeout = '10ms';
 select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this 
takes very long
 ERROR:  canceling statement due to statement timeout
+WARNING:  could not get result of cancel request due to timeout
 COMMIT;

(from the next run we can see normal duration:
"postgres_fdw-running/regress               OK                6.30s ")

I reproduced the failure with an asan-enabled build on a slowed-down VM
and as far as I can see, it's caused by the following condition in
ProcessInterrupts():
                /*
                 * If we are reading a command from the client, just ignore the 
cancel
                 * request --- sending an extra error message won't accomplish
                 * anything.  Otherwise, go ahead and throw the error.
                 */
                if (!DoingCommandRead)
                {
                        LockErrorCleanup();
                        ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),
                                         errmsg("canceling statement due to user 
request")));
                }

I think this failure can be reproduced easily (without asan/slowing down)
with this modification:
@@ -4630,6 +4630,7 @@ PostgresMain(const char *dbname, const char *username)
             idle_session_timeout_enabled = false;
         }

+if (rand() % 10 == 0) pg_usleep(10000);
         /*
          * (5) disable async signal conditions again.
          *

Running this test in a loop (for ((i=1;i<=100;i++)); do \
echo "iteration $i"; make -s check -C contrib/postgres_fdw/ || break; \
done), I get:
...
iteration 56
# +++ regress check in contrib/postgres_fdw +++
# initializing database system by copying initdb template
# using temp instance on port 55312 with PID 991332
ok 1         - postgres_fdw                            20093 ms
1..1
# All 1 tests passed.
iteration 57
# +++ regress check in contrib/postgres_fdw +++
# initializing database system by copying initdb template
# using temp instance on port 55312 with PID 992152
not ok 1     - postgres_fdw                            62064 ms
1..1
...
--- .../contrib/postgres_fdw/expected/postgres_fdw.out 2024-06-22 
02:52:42.991574907 +0000
+++ .../contrib/postgres_fdw/results/postgres_fdw.out 2024-06-22 
14:43:43.949552927 +0000
@@ -2775,6 +2775,7 @@
 SET LOCAL statement_timeout = '10ms';
 select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this 
takes very long
 ERROR:  canceling statement due to statement timeout
+WARNING:  could not get result of cancel request due to timeout
 COMMIT;

I also came across another failure of the test:
@@ -2774,7 +2774,7 @@
 BEGIN;
 SET LOCAL statement_timeout = '10ms';
 select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this 
takes very long
-ERROR:  canceling statement due to statement timeout
+ERROR:  canceling statement due to user request
 COMMIT;

which is reproduced with a sleep added here:
@@ -1065,6 +1065,7 @@ exec_simple_query(const char *query_string)
      */
     parsetree_list = pg_parse_query(query_string);
+pg_usleep(11000);

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-06-20%2009%3A52%3A04

Best regards,
Alexander


Reply via email to