Hi,

One of the remaining tasks for AIO was to convert the tests to be proper tap
tests.  I did that and was thanked by the tests fairly randomly failing on
windows. Which test fails changes from run to run.

The symptom is that BackgroundPsql->query() sometimes would simply swallow
errors that were clearly generated by the backend. Which then would cause
tests to fail, because testing for errors was the whole point of $test.


At first I thought the issue was that psql didn't actually flush stderr after
displaying errors. And while that may be an issue, it doesn't seem to be the
one causing problems for me.

Lots of hair-pulling later, I have a somewhat confirmed theory for what's
happening:

BackgroundPsql::query() tries to detect if the passed in query has executed by
adding a "banner" after the query and using pump_until() to wait until that
banner has been "reached".  That seems to work reasonably well on !windows.

On windows however, it looks like there's no guarantee that if stdout has been
received by IPC::Run, stderr also has been received, even if the stderr
content has been generated first. I tried to add an extra ->pump_nb() call to
query(), thinking that maybe IPC::Run just didn't get input that had actually
arrived, due to waiting on just one pipe. But no success.

My understanding is that IPC::Run uses a proxy process on windows to execute
subprocesses and then communicates with that over TCP (or something along
those lines).  I suspect what's happening is that the communication with the
external process allows for reordering between stdout/stderr.

And indeed, changing BackgroundPsql::query() to emit the banner on both stdout
and stderr and waiting on both seems to fix the issue.


One complication is that I found that just waiting for the banner, without
also its newline, sometimes lead to unexpected newlines causing later queries
to fail. I think that happens if the trailing newline is read separately from
the rest of the string.

However, matching the newlines caused tests to fail on some machines. After a
lot of cursing I figured out that for interactive psql we output \r\n, causing
my regex match to fail. I.e. tests failed whenever IO::PTY was availble...

It's also not correct, as we did before, to just look for the banner, it has
to be anchored to either the start of the output or a newline, otherwise the
\echo (or \warn) command itself will be matched by pump_until() (but then the
replacing the command would fail). Not sure that could cause active problems
without the addition of \warn (which is also echoed on stdout), but it
certainly could after.


The banner being the same between queries made it hard to understand if a
banner that appeared in the output was from the current query or a past
query. Therefore I added a counter to it.


For debugging I added a "note" that shows stdout/stderr after executing the
query, I think it may be worth keeping that, but I'm not sure.


This was a rather painful exercise.


Greetings,

Andres Freund
>From 1aeb60e0687339b9f7524c3b347915bb53ed1284 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Thu, 13 Feb 2025 10:09:49 -0500
Subject: [PATCH v1] BackgroundPsql: Fix potential for lost errors on windows

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 .../perl/PostgreSQL/Test/BackgroundPsql.pm    | 55 +++++++++++++++----
 1 file changed, 43 insertions(+), 12 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index dbfd82e4fac..7e76845307d 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -89,7 +89,8 @@ sub new
 		'stdin' => '',
 		'stdout' => '',
 		'stderr' => '',
-		'query_timer_restart' => undef
+		'query_timer_restart' => undef,
+		'query_cnt' => 1,
 	};
 	my $run;
 
@@ -219,27 +220,57 @@ sub query
 	my ($self, $query) = @_;
 	my $ret;
 	my $output;
+	my $query_cnt = $self->{query_cnt}++;
+
 	local $Test::Builder::Level = $Test::Builder::Level + 1;
 
-	note "issuing query via background psql: $query";
+	note "issuing query $query_cnt via background psql: $query";
 
 	$self->{timeout}->start() if (defined($self->{query_timer_restart}));
 
 	# Feed the query to psql's stdin, followed by \n (so psql processes the
 	# line), by a ; (so that psql issues the query, if it doesn't include a ;
-	# itself), and a separator echoed with \echo, that we can wait on.
-	my $banner = "background_psql: QUERY_SEPARATOR";
-	$self->{stdin} .= "$query\n;\n\\echo $banner\n";
-
-	pump_until($self->{run}, $self->{timeout}, \$self->{stdout}, qr/$banner/);
+	# itself), and a separator echoed both with \echo and \warn, that we can
+	# wait on.
+	#
+	# To avoid somehow confusing the separator from separately issued queries,
+	# and to make it easier to debug, we include a per-psql query counter in
+	# the separator.
+	#
+	# We need both \echo (printing to stdout) and \warn (printing to stderr),
+	# because on windows we can get data on stdout before seeing data on
+	# stderr (or vice versa), even if psql printed them in the opposite
+	# order. We therefore wait on both.
+	#
+	# We need to match for the newline, because we try to remove it below, and
+	# it's possible to consume just the input *without* the newline. In
+	# interactive psql we emit \r\n, so we need to allow for that. Also need
+	# to be careful that we don't e.g. match the echoed \echo command, rather
+	# than its output.
+	my $banner = "background_psql: QUERY_SEPARATOR $query_cnt";
+	my $banner_match = qr/(^|\n)$banner\r?\n/;
+	$self->{stdin} .= "$query\n;\n\\echo $banner\n\\warn $banner\n";
+	pump_until(
+		$self->{run}, $self->{timeout},
+		\$self->{stdout}, qr/$banner_match/);
+	pump_until(
+		$self->{run}, $self->{timeout},
+		\$self->{stderr}, qr/$banner_match/);
 
 	die "psql query timed out" if $self->{timeout}->is_expired;
+
+	note "query returned:\n",
+	  explain {
+		stdout => $self->{stdout},
+		stderr => $self->{stderr},
+	  };
+
+	# Remove banner from stdout and stderr, our caller doesn't care.  The
+	# first newline is optional, as there would not be one if consuming an
+	# empty query result.
 	$output = $self->{stdout};
-
-	# Remove banner again, our caller doesn't care.  The first newline is
-	# optional, as there would not be one if consuming an empty query
-	# result.
-	$output =~ s/\n?$banner\n$//s;
+	$output =~ s/$banner_match//;
+	$self->{stderr} =~ s/$banner_match//;
 
 	# clear out output for the next query
 	$self->{stdout} = '';
-- 
2.48.1.76.g4e746b1a31.dirty

Reply via email to