On Sun, Mar 27, 2022 at 01:18:46PM -0400, Tom Lane wrote:
> skink has passed several runs since the commit went in, so it's
> "unstable" not "fails consistently".  I see the test tries to
> disable autovacuum on that table, so that doesn't seem to be
> the problem ... what is?

This is a race condition, not directly related to valgrind but easier
to trigger under it because things get slower.  It takes me a dozen
tries to reproduce the failure locally, but I can with valgrind
enabled.

So, the output of the test is simply telling us that the FSM of the
main table is not getting truncated.  From what I can see, the
difference is in should_attempt_truncation(), where we finish with
nonempty_pages set to 1 rather than 0 on failure.  And it just takes
one autovacuum running in parallel with the manual VACUUM after the
DELETE to prevent the removal of those tuples, which is what I can see
from the logs on failure:
LOG:  statement: DELETE FROM freespace_tab;
DEBUG:  autovacuum: processing database "contrib_regression"
LOG:  statement: VACUUM freespace_tab;

It seems to me here that the snapshot held by autovacuum during the
scan of pg_database to find the relations to process is enough to
prevent the FSM truncation, as the tuples cleaned up by the DELETE
query still need to be visible.

One simple way to keep this test would be a custom configuration file
with autovacuum disabled and NO_INSTALLCHECK.  Any better ideas?
--
Michael