On Tue, May 4, 2021 at 4:12 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > The buildfarm is showing that one of these test queries is not stable > under CLOBBER_CACHE_ALWAYS: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-05-01%2007%3A44%3A47 > > of which the relevant part is: > > diff -U3 > /home/buildfarm/buildroot/HEAD/pgsql.build/contrib/postgres_fdw/expected/postgres_fdw.out > > /home/buildfarm/buildroot/HEAD/pgsql.build/contrib/postgres_fdw/results/postgres_fdw.out > --- > /home/buildfarm/buildroot/HEAD/pgsql.build/contrib/postgres_fdw/expected/postgres_fdw.out > 2021-05-01 03:44:45.022300613 -0400 > +++ > /home/buildfarm/buildroot/HEAD/pgsql.build/contrib/postgres_fdw/results/postgres_fdw.out > 2021-05-03 09:11:24.051379288 -0400 > @@ -9215,8 +9215,7 @@ > WHERE application_name = 'fdw_retry_check'; > pg_terminate_backend > ---------------------- > - t > -(1 row) > +(0 rows) > > -- This query should detect the broken connection when starting new remote > -- transaction, reestablish new connection, and then succeed.
Thanks for the report. > I can reproduce that locally by setting > > alter system set debug_invalidate_system_caches_always = 1; > > and running "make installcheck" in contrib/postgres_fdw. > (It takes a good long time to run the whole test script > though, so you might want to see if running just these few > queries is enough.) I can reproduce the issue with the failing case. Issue is that the backend pid will be null in the pg_stat_activity because of the cache invalidation that happens at the beginning of the query and hence pg_terminate_backend returns null on null input. > There's no evidence of distress in the postmaster log, > so I suspect this might just be a timing instability, > e.g. remote process already gone before local process > looks. If so, it's probably hopeless to make this > test stable as-is. Perhaps we should just take it out. Actually, that test case covers retry code, so removing it worries me. Instead, I can do as attached i.e. ignore the pg_terminate_backend output using PERFORM, as the function signals the backend if the given pid is a valid backend pid and returns on success. If at all, the function is to return false, it emits a warning, so it will be caught in the tests. And having a retry test case with clobber cache enabled doesn't make sense because all the cache entries are removed/invalidated for each query, but the test case covers the code on non-clobber cache platforms, so I would like to keep it. Please see the attached, it passes with "alter system set debug_invalidate_system_caches_always = 1;" on my system. Thoughts? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
v1-0001-Stabilize-a-test-case-in-postgres_fdw.patch
Description: Binary data