On 2025-02-13 Th 12:39 PM, Andres Freund wrote:
Hi,

One of the remaining tasks for AIO was to convert the tests to be proper tap
tests.  I did that and was thanked by the tests fairly randomly failing on
windows. Which test fails changes from run to run.

The symptom is that BackgroundPsql->query() sometimes would simply swallow
errors that were clearly generated by the backend. Which then would cause
tests to fail, because testing for errors was the whole point of $test.


At first I thought the issue was that psql didn't actually flush stderr after
displaying errors. And while that may be an issue, it doesn't seem to be the
one causing problems for me.

Lots of hair-pulling later, I have a somewhat confirmed theory for what's
happening:

BackgroundPsql::query() tries to detect if the passed in query has executed by
adding a "banner" after the query and using pump_until() to wait until that
banner has been "reached".  That seems to work reasonably well on !windows.

On windows however, it looks like there's no guarantee that if stdout has been
received by IPC::Run, stderr also has been received, even if the stderr
content has been generated first. I tried to add an extra ->pump_nb() call to
query(), thinking that maybe IPC::Run just didn't get input that had actually
arrived, due to waiting on just one pipe. But no success.

My understanding is that IPC::Run uses a proxy process on windows to execute
subprocesses and then communicates with that over TCP (or something along
those lines).  I suspect what's happening is that the communication with the
external process allows for reordering between stdout/stderr.

And indeed, changing BackgroundPsql::query() to emit the banner on both stdout
and stderr and waiting on both seems to fix the issue.


One complication is that I found that just waiting for the banner, without
also its newline, sometimes lead to unexpected newlines causing later queries
to fail. I think that happens if the trailing newline is read separately from
the rest of the string.

However, matching the newlines caused tests to fail on some machines. After a
lot of cursing I figured out that for interactive psql we output \r\n, causing
my regex match to fail. I.e. tests failed whenever IO::PTY was availble...

It's also not correct, as we did before, to just look for the banner, it has
to be anchored to either the start of the output or a newline, otherwise the
\echo (or \warn) command itself will be matched by pump_until() (but then the
replacing the command would fail). Not sure that could cause active problems
without the addition of \warn (which is also echoed on stdout), but it
certainly could after.


The banner being the same between queries made it hard to understand if a
banner that appeared in the output was from the current query or a past
query. Therefore I added a counter to it.


For debugging I added a "note" that shows stdout/stderr after executing the
query, I think it may be worth keeping that, but I'm not sure.


This was a rather painful exercise.




It's been discussed before, but I'd really really like to get rid of BackgroundPsql. It's ugly, non-intuitive and fragile.

Last year I did some work on this. I was going to present it at Athens but illness prevented me, and then other life events managed to get in my way. But the basic work is around. See <https://github.com/adunstan/test-pq/commit/98518e4929e80fb96f210bbc5aab9fdcea058512> This introduces a libpq session object (PostgreSQL::Test::Session) which can be backed either by FFI or a small XS wrapper - the commit has recipes for both. Missing is a meson.build file for the XS wrapper. There are significant performance gains to be had too (poll_query_until is much nicer, for example, as are most uses of safe_psql). If there isĀ  interest I will bring the work up to date, and maybe we can introduce this early in the v19 cycle. It would significantly reduce our dependency on IPC::Run, especially the pump() stuff.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com



Reply via email to