On Wed, Aug 3, 2022 at 6:59 AM Robert Haas <robertmh...@gmail.com> wrote:
> I don't really like this approach. Imagine that the code got broken in
> such a way that relfrozenxid and relminmxid were set to a value chosen
> at random - say, the contents of 4 bytes of unallocated memory that
> contained random garbage. Well, right now, the chances that this would
> cause a test failure are nearly 100%. With this change, they'd be
> nearly 0%.

If that kind of speculative bug existed, and somehow triggered before
the concurrent autovacuum ran (which seems very likely to be the
source of the test flappiness), then it would still be caught, most
likely. VACUUM itself has the following defenses:

* The defensive "can't happen" errors added to
heap_prepare_freeze_tuple() and related freezing routines by commit
699bf7d0 in 2017, as hardening following the "freeze the dead" bug.
That'll catch XIDs that are before the relfrozenxid at the start of
the VACUUM (ditto for MXIDs/relminmxid).

* The assertion added in my recent commit 0b018fab, which verifies
that we're about to set relfrozenxid to something sane.

* VACUUM now warns when it sees a *previous* relfrozenxid that's
apparently "in the future", following recent commit e83ebfe6. This
problem scenario is associated with several historic bugs in
pg_upgrade, where for one reason or another it failed to carry forward
correct relfrozenxid and/or relminmxid values for a table (see the
commit message for references to those old pg_upgrade bugs).

It might make sense to run a manual VACUUM right at the end of the
test, so that you reliably get this kind of coverage, even without
autovacuum.

--
Peter Geoghegan
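
To make the three defenses above a bit more concrete, here is a rough, standalone sketch of the kind of wraparound-aware comparisons they depend on. This is not the actual PostgreSQL code: the type xid_t and every function name below are hypothetical, and the real checks live in VACUUM and the freezing routines with more context than is shown here. The only thing borrowed from PostgreSQL is the general idea of comparing normal XIDs modulo 2^32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not PostgreSQL's own declarations. */
typedef uint32_t xid_t;
#define FIRST_NORMAL_XID ((xid_t) 3)   /* XIDs below this are special */

/* Wraparound-aware "a is older than b" for 32-bit XIDs. */
static bool
xid_precedes(xid_t a, xid_t b)
{
    /* Special XIDs are compared as plain unsigned integers. */
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a < b;
    /* Normal XIDs are compared modulo 2^32. */
    return (int32_t) (a - b) < 0;
}

/* Defense 1 (sketch): a tuple XID older than relfrozenxid "can't happen". */
static bool
xid_before_relfrozenxid(xid_t tuple_xid, xid_t relfrozenxid)
{
    return xid_precedes(tuple_xid, relfrozenxid);
}

/* Defense 3 (sketch): a previous relfrozenxid newer than the next XID. */
static bool
relfrozenxid_in_future(xid_t relfrozenxid, xid_t next_xid)
{
    return xid_precedes(next_xid, relfrozenxid);
}

/* Defense 2 (sketch): the new relfrozenxid must never move backwards. */
static bool
new_relfrozenxid_is_sane(xid_t old_relfrozenxid, xid_t new_relfrozenxid)
{
    return !xid_precedes(new_relfrozenxid, old_relfrozenxid);
}

/* Assertion-style wrapper, in the spirit of an assert-enabled build. */
static void
assert_new_relfrozenxid_sane(xid_t old_relfrozenxid, xid_t new_relfrozenxid)
{
    assert(new_relfrozenxid_is_sane(old_relfrozenxid, new_relfrozenxid));
}
```

Under these assumptions, a garbage relfrozenxid would trip one of the checks whenever it lands on the wrong side of the value it is compared against, which is why a manual VACUUM at the end of the test would give some coverage even when autovacuum never gets to run.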