On Fri, Jul 5, 2019 at 6:42 PM Tatsuo Ishii <is...@sraoss.co.jp> wrote:
> > This seems like a reasonable idea to me. There is no point in running
> > a monster 24 hour OLAP query if your client has gone away. It's using
> > MSG_PEEK which is POSIX, and I can't immediately think of any reason
> > why it's not safe to try to peek at a byte in that socket at any time.
>
> I am not familiar with Windows but I accidentally found this article
> written by Microsoft:
>
> https://support.microsoft.com/en-us/help/192599/info-avoid-data-peeking-in-winsock
>
> It seems using MSG_PEEK is not recommended by Microsoft.

Hmm, interesting. Using it very infrequently just as a way to detect
that the other end has gone away doesn't seem too crazy based on
anything in that article though, does it? What they're saying
actually applies to every operating system, not just Windows, AFAICS.
Namely, don't use MSG_PEEK frequently because it's a syscall and takes
locks in the kernel, and don't use it to wait for full messages to
arrive, or you might effectively deadlock if internal buffers are
full. But Sergey's patch only uses it to check if we could read a
single byte, and does so very infrequently (the GUC should certainly
be set to at least many seconds).

What else could we do? Assuming the kernel actually knows the
connection has gone away, the WaitEventSetWait() interface is no help
on its own, I think, because it'll just tell you the socket is ready
for reading when it's closed; you still have to actually try to read
to distinguish a closed connection from a waiting data byte.

I tried this patch using a real network with two machines. I was able
to get the new "connection to client lost" error by shutting down a
network interface (effectively yanking a cable), but only with TCP
keepalive configured. That's not too surprising; without that and
without trying to write, there is no way for the kernel to know that
the other end has gone.

-- 
Thomas Munro
https://enterprisedb.com
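
For reference, the kind of one-byte peek discussed above can be sketched
roughly as follows. This is only an illustration, not the code from
Sergey's patch: the helper name client_connection_lost() is made up, and
it assumes MSG_DONTWAIT is available (as on Linux and the BSDs) so that
the peek never blocks; elsewhere the socket would have to be put in
non-blocking mode instead.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not taken from the patch): report whether the
 * peer appears to have gone away, without consuming any pending data.
 */
static bool
client_connection_lost(int sock)
{
    char    byte;
    ssize_t r;

    for (;;)
    {
        /*
         * MSG_PEEK leaves the byte in the receive buffer.
         * MSG_DONTWAIT (an assumption, see above) avoids blocking.
         */
        r = recv(sock, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
        if (r < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        break;
    }

    if (r > 0)
        return false;           /* a data byte is waiting */
    if (r == 0)
        return true;            /* EOF: the peer closed the connection */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return false;           /* nothing to read, no sign of loss */
    return true;                /* e.g. ECONNRESET or ETIMEDOUT */
}
```

On an idle connection this can only report a loss that the kernel
already knows about, which is why the cable-pull test needed TCP
keepalive to be configured.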