Hi, One of the remaining tasks for AIO was to convert the tests to be proper tap tests. I did that and was thanked by the tests fairly randomly failing on windows. Which test fails changes from run to run.
The symptom is that BackgroundPsql->query() sometimes would simply swallow errors that were clearly generated by the backend. Which then would cause tests to fail, because testing for errors was the whole point of $test. At first I thought the issue was that psql didn't actually flush stderr after displaying errors. And while that may be an issue, it doesn't seem to be the one causing problems for me. Lots of hair-pulling later, I have a somewhat confirmed theory for what's happening: BackgroundPsql::query() tries to detect if the passed in query has executed by adding a "banner" after the query and using pump_until() to wait until that banner has been "reached". That seems to work reasonably well on !windows. On windows however, it looks like there's no guarantee that if stdout has been received by IPC::Run, stderr also has been received, even if the stderr content has been generated first. I tried to add an extra ->pump_nb() call to query(), thinking that maybe IPC::Run just didn't get input that had actually arrived, due to waiting on just one pipe. But no success. My understanding is that IPC::Run uses a proxy process on windows to execute subprocesses and then communicates with that over TCP (or something along those lines). I suspect what's happening is that the communication with the external process allows for reordering between stdout/stderr. And indeed, changing BackgroundPsql::query() to emit the banner on both stdout and stderr and waiting on both seems to fix the issue. One complication is that I found that just waiting for the banner, without also its newline, sometimes lead to unexpected newlines causing later queries to fail. I think that happens if the trailing newline is read separately from the rest of the string. However, matching the newlines caused tests to fail on some machines. After a lot of cursing I figured out that for interactive psql we output \r\n, causing my regex match to fail. I.e. tests failed whenever IO::PTY was availble... It's also not correct, as we did before, to just look for the banner, it has to be anchored to either the start of the output or a newline, otherwise the \echo (or \warn) command itself will be matched by pump_until() (but then the replacing the command would fail). Not sure that could cause active problems without the addition of \warn (which is also echoed on stdout), but it certainly could after. The banner being the same between queries made it hard to understand if a banner that appeared in the output was from the current query or a past query. Therefore I added a counter to it. For debugging I added a "note" that shows stdout/stderr after executing the query, I think it may be worth keeping that, but I'm not sure. This was a rather painful exercise. Greetings, Andres Freund
>From 1aeb60e0687339b9f7524c3b347915bb53ed1284 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Thu, 13 Feb 2025 10:09:49 -0500 Subject: [PATCH v1] BackgroundPsql: Fix potential for lost errors on windows Author: Reviewed-by: Discussion: https://postgr.es/m/ Backpatch: --- .../perl/PostgreSQL/Test/BackgroundPsql.pm | 55 +++++++++++++++---- 1 file changed, 43 insertions(+), 12 deletions(-) diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm index dbfd82e4fac..7e76845307d 100644 --- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm +++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm @@ -89,7 +89,8 @@ sub new 'stdin' => '', 'stdout' => '', 'stderr' => '', - 'query_timer_restart' => undef + 'query_timer_restart' => undef, + 'query_cnt' => 1, }; my $run; @@ -219,27 +220,57 @@ sub query my ($self, $query) = @_; my $ret; my $output; + my $query_cnt = $self->{query_cnt}++; + local $Test::Builder::Level = $Test::Builder::Level + 1; - note "issuing query via background psql: $query"; + note "issuing query $query_cnt via background psql: $query"; $self->{timeout}->start() if (defined($self->{query_timer_restart})); # Feed the query to psql's stdin, followed by \n (so psql processes the # line), by a ; (so that psql issues the query, if it doesn't include a ; - # itself), and a separator echoed with \echo, that we can wait on. - my $banner = "background_psql: QUERY_SEPARATOR"; - $self->{stdin} .= "$query\n;\n\\echo $banner\n"; - - pump_until($self->{run}, $self->{timeout}, \$self->{stdout}, qr/$banner/); + # itself), and a separator echoed both with \echo and \warn, that we can + # wait on. + # + # To avoid somehow confusing the separator from separately issued queries, + # and to make it easier to debug, we include a per-psql query counter in + # the separator. + # + # We need both \echo (printing to stdout) and \warn (printing to stderr), + # because on windows we can get data on stdout before seeing data on + # stderr (or vice versa), even if psql printed them in the opposite + # order. We therefore wait on both. + # + # We need to match for the newline, because we try to remove it below, and + # it's possible to consume just the input *without* the newline. In + # interactive psql we emit \r\n, so we need to allow for that. Also need + # to be careful that we don't e.g. match the echoed \echo command, rather + # than its output. + my $banner = "background_psql: QUERY_SEPARATOR $query_cnt"; + my $banner_match = qr/(^|\n)$banner\r?\n/; + $self->{stdin} .= "$query\n;\n\\echo $banner\n\\warn $banner\n"; + pump_until( + $self->{run}, $self->{timeout}, + \$self->{stdout}, qr/$banner_match/); + pump_until( + $self->{run}, $self->{timeout}, + \$self->{stderr}, qr/$banner_match/); die "psql query timed out" if $self->{timeout}->is_expired; + + note "query returned:\n", + explain { + stdout => $self->{stdout}, + stderr => $self->{stderr}, + }; + + # Remove banner from stdout and stderr, our caller doesn't care. The + # first newline is optional, as there would not be one if consuming an + # empty query result. $output = $self->{stdout}; - - # Remove banner again, our caller doesn't care. The first newline is - # optional, as there would not be one if consuming an empty query - # result. - $output =~ s/\n?$banner\n$//s; + $output =~ s/$banner_match//; + $self->{stderr} =~ s/$banner_match//; # clear out output for the next query $self->{stdout} = ''; -- 2.48.1.76.g4e746b1a31.dirty