Hello hackers,

13.09.2020 21:37, Tom Lane wrote:
> I happened to try googling for other similar reports, and I found
> a very interesting recent thread here:
>
> https://github.com/nodejs/node/issues/33166
>
> It might not have the same underlying cause, of course, but it sure
> sounds familiar.  If Node.js are really seeing the same effect,
> that would point to an underlying Windows bug rather than anything
> Postgres is doing wrong.
>
> It doesn't look like the Node.js crew got any closer to
> understanding the issue than we have, unfortunately.  They made
> their problem mostly go away by reverting a seemingly-unrelated
> patch.  But I can't help thinking that it's a timing-related bug,
> and that patch was just unlucky enough to change the timing of
> their tests so that they saw the failure frequently.

I've managed to make a simple reproducer. Please look at the patch attached.
There are two things crucial for reproducing the bug:
    ioctlsocket(sock, FIONBIO, &ioctlsocket_ret); // from pgwin32_socket()
and
    WSACleanup();

I still can't work out what influences the failure rate. With this
reproducer I get:
vcregress taptest src\test\modules\connect
...
t/000_connect.pl .. # test
#
t/000_connect.pl .. 13346/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 16714/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 26216/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 30077/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 36505/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 43647/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 53070/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 54402/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 55685/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 83193/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 99992/100000 # Looks like you failed 10 tests of 100000.
t/000_connect.pl .. Dubious, test returned 10 (wstat 2560, 0xa00)
Failed 10/100000 subtests

But in our test farm the pgbench test (from the installcheck-world
suite, which we run using msys) can fail on roughly every third run.
Perhaps it depends on I/O load: searching files or scanning the disk
in parallel seems to increase the probability of the glitch.
I see no solution for this on the Postgres side for now, but this
information about Windows quirks could be useful in case someone
else stumbles upon it too.
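
As a rough cross-check of the two rates above, here is a minimal sketch
that assumes each connection attempt fails independently at the
reproducer's observed rate of 10/100000. The function names are
hypothetical, and the independence assumption is only an assumption,
not something that has been measured.

```c
#include <assert.h>

/*
 * Probability that at least one of n independent attempts fails,
 * given a per-attempt failure probability p: 1 - (1 - p)^n.
 * (Hypothetical helper; independence is an assumption.)
 */
double
prob_any_failure(double p, long n)
{
	double		all_ok = 1.0;
	long		i;

	assert(p >= 0.0 && p <= 1.0 && n >= 0);
	for (i = 0; i < n; i++)
		all_ok *= 1.0 - p;
	return 1.0 - all_ok;
}

/*
 * Smallest number of attempts for which the chance of at least one
 * failure reaches "target" (requires p > 0 and 0 <= target < 1).
 */
long
attempts_for_target(double p, double target)
{
	double		all_ok = 1.0;
	long		n = 0;

	assert(p > 0.0 && p <= 1.0 && target >= 0.0 && target < 1.0);
	while (1.0 - all_ok < target)
	{
		all_ok *= 1.0 - p;
		n++;
	}
	return n;
}
```

With these assumptions, attempts_for_target(10.0 / 100000, 1.0 / 3)
comes out at a little over 4000 attempts per run. If a run makes far
fewer connections than that, the one-in-three rate in the farm would
fit a load-dependent effect better than the baseline rate alone.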

Best regards,
Alexander
diff --git a/src/test/modules/connect/connect.c b/src/test/modules/connect/connect.c
new file mode 100644
index 0000000000..c875b7647b
--- /dev/null
+++ b/src/test/modules/connect/connect.c
@@ -0,0 +1,26 @@
+#include <stdio.h>
+#include <winsock.h>
+
+int
+main(int argc, char *argv[])
+{
+	WSADATA	      wsaData;
+	SOCKET sock;
+	int port = 5432;
+	struct sockaddr_in ai_addr;
+	unsigned long ioctlsocket_ret = 1;
+
+	if (argc > 1) port = atoi(argv[1]);
+
+	ai_addr.sin_family = AF_INET;
+	ai_addr.sin_addr.s_addr = inet_addr("127.0.0.1");
+	ai_addr.sin_port = htons(port);
+
+	WSAStartup(MAKEWORD(1, 1), &wsaData);
+	sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
+	ioctlsocket(sock, FIONBIO, &ioctlsocket_ret);
+	connect(sock, (SOCKADDR *) & ai_addr, sizeof (ai_addr));
+	fprintf(stdout, "test\n");
+	WSACleanup();
+	return 0;
+}
diff --git a/src/test/modules/connect/t/000_connect.pl b/src/test/modules/connect/t/000_connect.pl
new file mode 100644
index 0000000000..bd23d1e419
--- /dev/null
+++ b/src/test/modules/connect/t/000_connect.pl
@@ -0,0 +1,25 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More;
+use Test::More tests => 100000;
+
+# start a server
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+
+my $stdout;
+my $stderr;
+my @cmd = ('connect', $node->port);
+
+IPC::Run::run(\@cmd, '>', \$stdout, '2>', \$stderr);
+diag($stdout);
+diag($stderr);
+
+for (my $i =0; $i < 100000; $i++) {
+	IPC::Run::run(\@cmd, '>', \$stdout, '2>', \$stderr);
+	ok(defined $stdout && $stdout ne '');
+}
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 20da7985c1..70959fbb24 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -43,6 +43,7 @@ my $contrib_extrasource = {
 	'seg'  => [ 'contrib/seg/segscan.l',   'contrib/seg/segparse.y' ],
 };
 my @contrib_excludes = (
+	'connect',
 	'bool_plperl',      'commit_ts',
 	'hstore_plperl',    'hstore_plpython',
 	'intagg',           'jsonb_plperl',
@@ -423,6 +424,10 @@ sub mkvcbuild
 	$zic->AddDirResourceFile('src/timezone');
 	$zic->AddReference($libpgcommon, $libpgport);
 
+	my $test_connect = $solution->AddProject('connect', 'exe', 'utils');
+	$test_connect->AddFile('src/test/modules/connect/connect.c');
+	$test_connect->AddLibrary('ws2_32.lib');
+
 	if (!$solution->{options}->{xml})
 	{
 		push @contrib_excludes, 'xml2';
