06.12.2021 23:51, Andrew Dunstan wrote:
> I have been getting 100% failures on the SSL tests with closesocket()
> alone, and 100% success over 10 tests with this:
>
>
> diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
> index 96ab37c7d0..5998c089b0 100644
> --- a/src/backend/libpq/pqcomm.c
> +++ b/src/backend/libpq/pqcomm.c
> @@ -295,6 +295,7 @@ socket_close(int code, Datum arg)
>          * Windows too.  But it's a lot more fragile than the other way.
>          */
>  #ifdef WIN32
> +       shutdown(MyProcPort->sock, SD_SEND);
>         closesocket(MyProcPort->sock);
>  #endif
>
>
> That said, your results are quite worrying.

My latest results are as follows.
The test failure rate seems to depend on the specs/environment.
With the close-only version and CPU usage for my Windows VM limited to 20%,
I got failures on iterations 10, 2, 1.
With 100% CPU I saw 20 successful runs, then failures on iterations 5, 2.
After a clean & build, it failed on iterations 11, 6, 3.
(So maybe caching is another factor.)
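For reference, the call order in the quoted patch (half-close the sending side, then release the socket) can be sketched with a POSIX analogue. This is only an illustration under stated assumptions: SHUT_WR stands in for SD_SEND, close() for closesocket(), a Unix-domain socketpair for the TCP connection, and send_then_graceful_close() is a hypothetical helper, not PostgreSQL code. It does not reproduce the Windows failure; it only shows that the peer can read the queued data and then sees an orderly EOF.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send msg over one end of a socketpair, half-close
 * and close that end, then drain the other end into buf.
 *
 * Returns the number of bytes the peer received before EOF (or before buf
 * filled up), or -1 on error.  buf is always NUL-terminated on success.
 */
static ssize_t
send_then_graceful_close(const char *msg, char *buf, size_t buflen)
{
    int     sv[2];
    size_t  msglen = strlen(msg);
    size_t  total = 0;
    ssize_t n = 0;

    if (buflen == 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* "Server" side: queue the final message (think: a TLS alert). */
    if (send(sv[0], msg, msglen, 0) != (ssize_t) msglen ||
        /* Tell the peer "no more data from me" before closing. */
        shutdown(sv[0], SHUT_WR) != 0)
    {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[0]);

    /* "Client" side: read until recv() returns 0, i.e. an orderly EOF. */
    while (total < buflen - 1 &&
           (n = recv(sv[1], buf + total, buflen - 1 - total, 0)) > 0)
        total += (size_t) n;

    buf[total] = '\0';
    close(sv[1]);
    return (n < 0) ? -1 : (ssize_t) total;
}
```

Calling send_then_graceful_close("alert", buf, sizeof(buf)) should leave the message in buf and return its length. On Windows the corresponding calls would be shutdown(sock, SD_SEND) and closesocket(sock), as in the patch above.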
shutdown(MyProcPort->sock, SD_SEND) apparently fixes the issue: I got 83
successful runs, but then iteration 84 unfortunately failed:

t/001_ssltests.pl .. 106/110
#   Failed test 'intermediate client certificate is missing: matches'
#   at t/001_ssltests.pl line 608.
#                   'psql: error: connection to server at "127.0.0.1", port 63187 failed: could not receive data from server: Software caused connection abort (0x00002745/10053)
# SSL SYSCALL error: Software caused connection abort (0x00002745/10053)
# could not send startup packet: No error (0x00000000/0)'
#     doesn't match '(?^:SSL error: tlsv1 alert unknown ca)'
# Looks like you failed 1 test of 110.
t/001_ssltests.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/110 subtests
        (less 2 skipped subtests: 107 okay)

It's not the failure that we observed with the close-only fix, but it's
still worrying. And then exactly this failure occurred again, on iteration 8.

But "fortunately" I've also got the same failure as before:

t/001_ssltests.pl .. 106/110
#   Failed test 'certificate authorization fails with revoked client cert with server-side CRL directory: matches'
#   at t/001_ssltests.pl line 618.
#                   'psql: error: connection to server at "127.0.0.1", port 59220 failed: server closed the connection unexpectedly
#       This probably means the server terminated abnormally
#       before or while processing the request.
# server closed the connection unexpectedly
#       This probably means the server terminated abnormally
#       before or while processing the request.
# server closed the connection unexpectedly
#       This probably means the server terminated abnormally
#       before or while processing the request.'
#     doesn't match '(?^:SSL error: sslv3 alert certificate revoked)'
# Looks like you failed 1 test of 110.
t/001_ssltests.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/110 subtests
        (less 2 skipped subtests: 107 okay)

on the 145th iteration of the test even without the close() fix (I tried to
check whether the aforementioned failure existed before the fix).
So we have probably found practical evidence of the importance of
shutdown() that we missed before, but it's not the end of the story. There
was some test instability even without the close() fix, and it remains with
shutdown(..., SD_SEND).

By the way, while exploring OpenSSL's behavior, I found that SSL_shutdown()
has its own quirks (see [1], return value 0). Maybe now we've encountered
one of these.

Best regards,
Alexander

[1] https://www.openssl.org/docs/man3.0/man3/SSL_shutdown.html