Hi,

On 2022-12-08 16:15:11 -0800, Andres Freund wrote:
> commit 3f0e786ccbf
> Author: Andres Freund <and...@anarazel.de>
> Date:   2022-12-07 12:13:35 -0800
>
>     meson: Add 'running' test setup, as a replacement for installcheck
>
> CI tests the pg_regress/isolationtester tests that support doing so against a
> running server.
>
>
> Unfortunately cfbot shows that that doesn't work entirely reliably.
>
> The most frequent case is postgres_fdw, which somewhat regularly fails with a
> regression.diff like this:
>
> diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
> --- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out	2022-12-08 20:35:24.772888000 +0000
> +++ /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out	2022-12-08 20:43:38.199450000 +0000
> @@ -9911,8 +9911,7 @@
>     WHERE application_name = 'fdw_retry_check';
>   pg_terminate_backend
>  ----------------------
> - t
> -(1 row)
> +(0 rows)
>
>  -- This query should detect the broken connection when starting new remote
>  -- transaction, reestablish new connection, and then succeed.
>
>
> See e.g.
> https://cirrus-ci.com/task/5925540020879360
> https://api.cirrus-ci.com/v1/artifact/task/5925540020879360/testrun/build/testrun/postgres_fdw-running/regress/regression.diffs
> https://api.cirrus-ci.com/v1/artifact/task/5925540020879360/testrun/build/testrun/runningcheck.log
>
>
> The following comment in the test provides a hint what might be happening:
>
> -- If debug_discard_caches is active, it results in
> -- dropping remote connections after every transaction, making it
> -- impossible to test termination meaningfully.  So turn that off
> -- for this test.
> SET debug_discard_caches = 0;
>
>
> I guess that a cache reset message arrives and leads to the connection being
> terminated. Unfortunately that's hard to see right now, as the relevant log
> messages are output with DEBUG3 - it's quite verbose, so enabling it for all
> tests will be painful.
>
> I *think* I have seen this failure locally at least once, when running the
> test normally.
>
>
> I'll try to reproduce this locally for a bit. If I can't, the only other idea
> I have for debugging this is to change log_min_messages in that section of the
> postgres_fdw test to DEBUG3.

Oh. I tried to reproduce the issue, without success so far, but eventually my
test loop got stuck in something I reported previously and have since forgotten
about:
https://www.postgresql.org/message-id/20220925232237.p6uskba2dw6fnwj2%40awork3.anarazel.de

I wonder if the timing on the FreeBSD CI task works out to hitting a "smaller
version" of that problem, one that eventually resolves itself but then leads to
a sinval reset being sent, causing the observable failure.

Greetings,

Andres Freund