On Nov 11, 2020, at 10:57 AM, Jacob Champion <pchamp...@vmware.com> wrote: > > False alarm -- the stderr debugging I'd added in to track down the > assertion tripped up the "no stderr" tests. Zero failing tests now.
I took a look at the OpenSSL interop problems you mentioned upthread. I don't see a hang like you did, but I do see a PR_IO_TIMEOUT_ERROR during connection. I think pgtls_read() needs to treat PR_IO_TIMEOUT_ERROR as if no bytes were read, in order to satisfy its API. There was some discussion on this upthread: On Oct 27, 2020, at 1:07 PM, Daniel Gustafsson <dan...@yesql.se> wrote: > > On 20 Oct 2020, at 21:15, Andres Freund <and...@anarazel.de> wrote: >> >>> + case PR_IO_TIMEOUT_ERROR: >>> + break; >> >> What does this mean? We'll return with a 0 errno here, right? When is >> this case reachable? > > It should, AFAICT, only be reachable when PR_Recv is used with a timeout which > we don't do. It mentioned somewhere that it had happened in no-wait calls due > to a bug, but I fail to find that reference now. Either way, I've removed it > to fall into the default error handling which now sets errno correctly as that > was a paddle short here. PR_IO_TIMEOUT_ERROR is definitely returned in no-wait calls on my machine. It doesn't look like the PR_Recv() API has a choice -- if there's no data, it can't return a positive integer, and returning zero means that the socket has been disconnected. So -1 with a timeout error is the only option. I'm not completely sure why this is exposed so easily with an OpenSSL server -- I'm guessing the implementation slices up its packets differently on the wire, causing a read event before NSS is able to decrypt a full record -- but it's worth noting that this case also shows up during NSS-to-NSS psql connections, when handling notifications at the end of every query. PQconsumeInput() reports a hard failure with the current implementation, but its return value is ignored by PrintNotifications(). Otherwise this probably would have showed up earlier. (What's the best way to test this case? Are there lower-level tests for the protocol/network layer somewhere that I'm missing?) While patching this case, I also noticed that pgtls_read() doesn't call SOCK_ERRNO_SET() for the disconnection case. That is also in the attached patch. --Jacob
nss-handle-timeouts-and-disconnections-in-pgtls_read.patch
Description: nss-handle-timeouts-and-disconnections-in-pgtls_read.patch