Re: [PATCH] Fix fragile walreceiver test.

Michael Paquier Tue, 04 Nov 2025 22:51:00 -0800

On Wed, Nov 05, 2025 at 12:03:29AM -0600, Bryan Green wrote:
> Problem: restart() kills the walreceiver (as it should), which writes
> that exact FATAL message to the log. The test then searches the log and
> finds it.


A timing issue, then.  The buildfarm has not been complaining about
this one AFAIK; no recoveryCheck failures have been reported:
https://buildfarm.postgresql.org/cgi-bin/show_failures.pl

> The test has a comment claiming "a new log file is used on node
> restart". TAP tests use pg_ctl with a fixed filename that gets reused
> across restarts. No log rotation.

I've fat-fingered this assumption, indeed, missing that one would need
to do an extra rotate_logfile() before the restart.
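
To illustrate why the reused log file matters, here is a minimal
Python simulation.  It is not the TAP framework: the file name, the
helper log_contains_from() and the log lines are assumptions made up
for the example.  It compares a search over the whole file with a
search that only looks at what was written after the restart, which is
roughly what a fresh log file would give:

```python
import os
import tempfile

# Illustrative message text; stands in for the FATAL line the test greps for.
FATAL = "FATAL:  terminating walreceiver process due to administrator command"


def log_contains_from(path, pattern, offset=0):
    """Return True if pattern appears in the file at or after byte offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return pattern.encode() in f.read()


def simulate_restart():
    """Return (whole_file_match, post_restart_match) for one simulated restart."""
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "standby_2.log")  # hypothetical file name

        # Before and during the restart: the shutdown kills the WAL
        # receiver, and its FATAL line lands in the same, reused file.
        with open(log, "a") as f:
            f.write("LOG:  started streaming WAL\n")
            f.write(FATAL + "\n")

        # Position where post-restart output begins.
        offset = os.path.getsize(log)

        # After the restart: the WAL receiver runs without being stopped.
        with open(log, "a") as f:
            f.write("LOG:  started streaming WAL\n")

        whole_file = log_contains_from(log, FATAL)
        post_restart = log_contains_from(log, FATAL, offset)
        return whole_file, post_restart
```

The whole-file search finds the stale entry from the shutdown, while
the bounded search does not, so only the latter says anything about
what happened after the restart.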

> The fix is obvious: check that the walreceiver PID stays constant.
> That's what we actually care about anyway.

Hmm.  The reason why I didn't use a PID matching check (mentioned at
[1]) is that this is not entirely bullet-proof.  On a very slow
machine, standby_1 could generate some records and standby_2 could
replay them *before* the PID of the WAL receiver is retrieved.  This
could lead to false positives in some cases, and a bunch of buildfarm
members are very slow.  You have a point that this would be unlikely
to happen in normal runs, so a PID matching check would be relevant
most of the time anyway, even if the original PID has been fetched
after the TLI jump has been processed in standby_2.  I'd rather keep
the log check, TBH, working around the problem with an extra
rotate_logfile() before the restart of standby_2.
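
As a rough sketch of the race with the PID approach, the small Python
model below lists the WAL receiver PID of standby_2 at successive
moments and compares the PID fetched at some step with the final one.
The PID values, the timeline and the function names are hypothetical
and only there to show the ordering problem:

```python
def pid_check_passes(timeline, fetch_at):
    """Compare the PID seen at step fetch_at with the PID at the end."""
    return timeline[fetch_at] == timeline[-1]


def walreceiver_restarted(timeline):
    """True if the WAL receiver PID changed at any point in the timeline."""
    return len(set(timeline)) > 1


# Hypothetical PIDs: the WAL receiver restarts while the TLI jump is
# processed, between steps 1 and 2.
timeline = [4101, 4101, 4207, 4207]

# Original PID fetched before the TLI jump: the check notices the restart.
early_fetch = pid_check_passes(timeline, 0)

# Slow machine: original PID fetched after the TLI jump was processed,
# so the check passes even though the WAL receiver did restart.
late_fetch = pid_check_passes(timeline, 2)
```

With the late fetch the comparison succeeds while
walreceiver_restarted() still reports a change, which is the false
positive described above.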

> This matters because changes to I/O behavior elsewhere in the code can
> make this test fail spuriously. I hit it while working on O_CLOEXEC
> handling for Windows.

Fun.  And the WAL receiver never stops after the restart of standby_2,
with the log entry the test finds being the one written to the server
logs before the restart, right?

[1]: https://www.postgresql.org/message-id/[email protected]
--
Michael
