Andres Freund <and...@anarazel.de> writes:
> On 2022-01-18 21:50:07 -0500, Tom Lane wrote:
>> This actually causes parallel check-world to fail altogether on florican's
>> host, because the initial fsync of the recovered primary takes more than 3
>> minutes when there's conflicting I/O traffic, causing pg_ctl to time out.
> Ugh.

I misspoke there: it's the standby that is performing an fsync'd
checkpoint and timing out, during the test's promote-the-standby
step.

This test attempt revealed another problem too: the standby never
shut down, and thus the calling "make" never quit, until I intervened
manually.  I'm not sure why.  I see that Cluster::promote uses
system_or_bail() to run "pg_ctl promote" ... could it be that
BAIL_OUT causes the normal script END hooks to not get run?
But it seems like we'd have noticed that long ago.

			regards, tom lane