I wrote:
> Michael Paquier <michael.paqu...@gmail.com> writes:
>> And this gives the patch attached, just took the time to hack it.

> I think this is a good idea, but (1) I'm inclined not to restrict it to
> Windows, and (2) I think we should hold off applying it until we've seen
> a failure or two more, and can confirm whether d1b7d4877 does anything
> useful for the error messages.

OK, we now have failures from both bowerbird and jacana with the error
reporting patch applied:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-21%2012%3A03%3A02
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2016-04-19%2021%3A00%3A39
and they both boil down to this:

pg_ctl: could not start server
Examine the log output.
# pg_ctl failed; logfile:
LOG:  could not bind IPv4 socket: Permission denied
HINT:  Is another postmaster already running on port 60200? If not, wait a few seconds and retry.
WARNING:  could not create listen socket for "127.0.0.1"
FATAL:  could not create any TCP/IP sockets
LOG:  database system is shut down

So "permission denied" is certainly more useful than "no error", which
makes me feel that d1b7d4877+22989a8e3 are doing what they intended to
and should get back-patched --- any objections?

However, it's still not entirely clear what is the root cause of the
failure and whether a patch along the discussed lines would prevent its
recurrence. Looking at TranslateSocketError, it seems we must be seeing
an underlying error code of WSAEACCES. A little googling says that
Windows might indeed return that, rather than the more expected
WSAEADDRINUSE, if someone else has the port open with
SO_EXCLUSIVEADDRUSE:

    Another possible reason for the WSAEACCES error is that when the
    bind function is called (on Windows NT 4.0 with SP4 and later),
    another application, service, or kernel mode driver is bound to the
    same address with exclusive access. Such exclusive access is a new
    feature of Windows NT 4.0 with SP4 and later, and is implemented by
    using the SO_EXCLUSIVEADDRUSE option.
So theory A is that some other program is binding random high port
numbers with SO_EXCLUSIVEADDRUSE. Theory B is that this is the handiwork
of Windows antivirus software doing what Windows antivirus software
typically does, ie inject random permissions failures depending on the
phase of the moon.

It's not very clear that a test along the lines described (that is,
attempt to connect to, not bind to, the target port) would pre-detect
either type of error. Under theory A, a connect() test would recognize
the problem only if the other program were using the port to listen
rather than make an outbound connection; and the latter seems much more
likely. (Possibly we could detect the latter case by checking the error
code returned by connect(), but Michael's proposed patch does no such
thing.) Under theory B, we're pretty much screwed, we don't know what
will happen.

I wonder what Andrew can tell us about what else is running on that
machine and whether either theory has any credibility.

BTW, if Windows *had* returned WSAEADDRINUSE, TranslateSocketError would
have failed to translate it --- surely that's an oversight?

			regards, tom lane
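
For illustration only, here is a rough sketch of the theory-A concern:
a port that some other socket has bound but is not listening on. It is
written in Python rather than Perl or C, it is not Michael's patch, and
the helper names (port_accepts_connections, try_bind, demo) are made up
for the example. It does not set SO_EXCLUSIVEADDRUSE, so on most
platforms the bind() failure would be expected to show up as "address
in use" rather than the WSAEACCES seen on the buildfarm; the point is
only that a connect() probe and a bind() attempt can disagree.

```python
import socket


def port_accepts_connections(host, port, timeout=1.0):
    """connect()-style probe: True only if something is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success, an error code otherwise.
        return s.connect_ex((host, port)) == 0


def try_bind(host, port):
    """bind()-style probe: None on success, else the errno from bind()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as e:
            return e.errno
    return None


def demo():
    # Stand-in for "some other program": bind a port but never listen(),
    # much as a socket used for an outbound connection would.
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = holder.getsockname()[1]
    try:
        accepted = port_accepts_connections("127.0.0.1", port)
        bind_errno = try_bind("127.0.0.1", port)
    finally:
        holder.close()
    return accepted, bind_errno


if __name__ == "__main__":
    accepted, bind_errno = demo()
    print("connect() probe saw a listener:", accepted)
    print("bind() errno (None means success):", bind_errno)
```

If the connect() probe reports no listener while bind() still fails,
a pre-check based on connect() alone would have declared the port free.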