Hi,

On 2022-01-18 21:50:07 -0500, Tom Lane wrote:
> I just found one thing making check-world slower than it ought to be:
> src/test/recovery/t/008_fsm_truncation.pl does
>
> $node_primary->append_conf(
> 	'postgresql.conf', qq{
> fsync = on
> wal_log_hints = on
> max_prepared_transactions = 5
> autovacuum = off
> });
>
> There is no reason for this script to be overriding Cluster.pm's
> fsync = off setting.
>
> This appears to go back to 917dc7d23 of 2016, so I think it just
> predates our recognition that we should disable fsync in routine
> tests.

Yea, I noticed this too. I was wondering if there's a conceivable reason to
actually want fsyncs, but I couldn't come up with one.

On systems where IO isn't overloaded, the main problems with this test are
elsewhere: It waits multiple times for VACUUMs that are blocked truncating the
table, which these days takes 5 seconds. Thus the test takes quite a while.

To me VACUUM_TRUNCATE_LOCK_TIMEOUT = 5s seems awfully long. On a system with a
lot of tables that's much more than vacuum will take. So this can easily lead
to using up all autovacuum workers...

> This actually causes parallel check-world to fail altogether on florican's
> host, because the initial fsync of the recovered primary takes more than 3
> minutes when there's conflicting I/O traffic, causing pg_ctl to time out.

Ugh.

I noticed a few other sources of "unnecessary" fsyncs. The most frequent is
the durable_rename() of backup_manifest in pg_basebackup.c. Manifests are
surprisingly large, 135k for a freshly initdb'd cluster.

There's an fsync in walmethods.c:tar_close() that sounds intentional, but I
don't really understand the comment:

	/* Always fsync on close, so the padding gets fsynced */
	if (tar_sync(f) < 0)

Greetings,

Andres Freund