On Wed, Jan 23, 2019 at 11:23 AM Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> On Wed, Jan 23, 2019 at 4:07 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > The whole thing reminds me of the recent bug #15598:
> >
> > https://www.postgresql.org/message-id/87k1iy44fd.fsf%40news-spur.riddles.org.uk
>
> Yeah, if errors get moved to later exchanges but the server might exit
> and close its end of the socket before we can manage to initiate a
> later exchange, it starts to look just like that.

Based on some clues from Andrew Gierth (in the email referenced above
and also in an off-list chat), I did some experimentation that seemed
to confirm a theory of his that Linux might be taking a shortcut when
both sides are local, bypassing the RST step because it can see both
ends (whereas normally the TCP stack should cause the *next* sendto()
to fail IIUC?). I think this case is essentially the same as bug
#15598; it's just happening at a different time.

With a simple socket test program I can see that if you send a single
packet after the remote end has closed and after it has already read
everything you sent it up to now, you get EPIPE. If there was some
outstanding data from a previous send that it hadn't read yet when it
closed its end, you get ECONNRESET. This doesn't happen if client and
server are on different machines, or on FreeBSD even on the same
machine, but does happen if client and server are on the same Linux
system (whether using the loopback interface or a real network
interface).

However, after you get ECONNRESET, you can still read the final data
that was sent by the server before it closed, which presumably
contains the error we want to report to the user. That suggests that
we could perhaps handle ECONNRESET both at startup packet send time
(for certificate rejection, eelpout's case) and at initial query send
(for idle timeout, bug #15598's case) by attempting to read. Does
that make sense? I haven't poked into the libpq state machine stuff
to see if that would be easy or hard.

PS: looking again at the strace output from earlier, it's kinda funny
that it says revents=POLLOUT|POLLERR|POLLHUP, since that seems to be a
contradiction: if this were poll() and not ppoll() I think it might
violate POSIX, which says "[POLLHUP] and POLLOUT are
mutually-exclusive; a stream can never be writable if a hangup has
occurred", but I don't see what we could do differently with that
anyway.

-- 
Thomas Munro
https://enterprisedb.com
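
For reference, here is a minimal sketch of the kind of socket test described above. It is not the actual test program: it is written in Python for brevity, and the message text, the timings, and the name run_experiment are all made up for illustration. It sends to a peer that has already closed, records which error (if any) the send reports, and then tries to read whatever the peer sent before closing. What it reports depends on the platform and on whether client and server share a machine, so no expected output is given.

```python
import errno
import socket
import threading
import time

# Made-up stand-in for the last error message a server sends before exiting.
FINAL_MESSAGE = b"FATAL: terminating connection\n"


def run_experiment(leave_unread_data):
    """Send to a peer that has already closed, then try to read anyway.

    Returns (errno name of the failed send or None, bytes read afterwards).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    server_closed = threading.Event()

    def server():
        conn, _ = listener.accept()
        conn.recv(5)                 # consume only the client's b"first"
        if leave_unread_data:
            time.sleep(0.2)          # let b"second" arrive and stay unread
        conn.sendall(FINAL_MESSAGE)  # the message we hope the client can see
        conn.close()
        server_closed.set()

    thread = threading.Thread(target=server, daemon=True)
    thread.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(b"first")
    if leave_unread_data:
        client.sendall(b"second")
    server_closed.wait(5)
    time.sleep(0.2)                  # give the FIN or RST time to arrive

    send_error = None
    try:
        for _ in range(3):           # the first send may still succeed
            client.sendall(b"later")
            time.sleep(0.1)
    except OSError as exc:
        send_error = errno.errorcode.get(exc.errno, str(exc.errno))

    # Regardless of how the send went, try to drain what the server sent.
    received = b""
    client.settimeout(1.0)
    try:
        while True:
            chunk = client.recv(1024)
            if not chunk:
                break
            received += chunk
    except OSError:
        pass                         # some platforms may discard the data

    client.close()
    listener.close()
    thread.join(1)
    return send_error, received


if __name__ == "__main__":
    for unread in (False, True):
        print(unread, run_experiment(unread))
```

The read-after-failed-send step at the end is the part that corresponds to the idea of handling ECONNRESET by attempting to read; whether libpq could do the equivalent easily is a separate question.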