Hannu Krosing <[EMAIL PROTECTED]> writes:
> We were bitten by the following bug a few times, when our server tried
> to reestablish connections under bad network conditions:
>
> if connection is closed while trying to get response to SSL setup packet
> (i.e. conn->status is CONNECTION_SSL_STARTUP), we get a busy loop, as
> line 1035 in 8.0.0.beta2:
>
> if (pqWaitTimed(1, 0, conn, finish_time)) {
>
> tells that there is data to read (returns 0) while actually it is error
> (POLLERR & POLLHUP) and not POLLIN returned from poll() and
This is intentional: the idea is that we should go ahead and do the read
(or write), which will detect the error condition on the socket. poll()
by itself doesn't give enough information to determine what the error
condition is, so it's not appropriate to fail here.

> after that the check on line 1462:
>
> if (nread == 0)
>     /* caller failed to wait for data */
>     return PGRES_POLLING_READING;
>
> resumes the busy loop

This seems to me to be the bug. pqReadData jumps through hoops to
determine whether a zero-length read means EOF or not, and I think we
need to expend some effort to determine that here too. One possibility
is to forget the direct call to recv() and use pqReadData: since
conn->ssl isn't set yet, and we aren't expecting the server to send more
than one byte, this should in theory be safe.

Comments?

			regards, tom lane