On 2020/10/09 4:15, Tom Lane wrote:
Over in the thread at [1], we've tentatively determined that the
reason buildfarm member lorikeet is currently failing is that its
network stack returns ECONNABORTED for (some?) connection failures,
whereas our code is only expecting ECONNRESET.  Fujii Masao therefore
proposes that we treat ECONNABORTED the same as ECONNRESET.  I think
this is a good idea, but after a bit of research I feel it does not
go far enough.  I find these POSIX-standard errnos that also seem
likely candidates to be returned for a hard loss of connection:

        ECONNABORTED
        EHOSTUNREACH
        ENETDOWN
        ENETUNREACH

All of these have been in POSIX since SUSv2, so it seems unlikely
that we need to #ifdef any of them.  (It is in any case pretty silly
that we have #ifdefs around a very small minority of our references
to ECONNRESET :-(.)

There are some other related errnos, such as ECONNREFUSED, that
don't seem like they'd be returned for a failure of a pre-existing
connection, so we don't need to include them in such tests.

Accordingly, I propose the attached patch (an expansion of
Fujii-san's) that causes us to test for all five errnos anyplace
we had been checking for ECONNRESET.

+1

Thanks for expanding the patch!

-#ifdef ECONNRESET
-                       case ECONNRESET:
+                       case ALL_CONNECTION_LOSS_ERRNOS:
                                printfPQExpBuffer(&conn->errorMessage,
                                                                  
libpq_gettext("server closed the connection unexpectedly\n"
                                                                                          
      "\tThis probably means the server terminated abnormally\n"
                                                                                          
      "\tbefore or while processing the request.\n"));

This change causes the same error message to be reported for those five errno.
That is, we cannot identify which errno is actually reported, from the error
message. But I just wonder if it's more helpful for the troubleshooting if we,
for example, append strerror() into the message so that we can easily
identify errno. Thought?


I felt that this was getting to
the point where we'd better centralize the knowledge of what to check,
so the patch does that, via an inline function and an admittedly hacky
macro.  I also upgraded some places such as strerror.c to have full
support for these symbols.

All of the machines I have (even as far back as HPUX 10.20) also
define ENETRESET and EHOSTDOWN.  However, those symbols do not appear
in SUSv2.  ENETRESET was added at some later point, but EHOSTDOWN is
still not in POSIX.  For the moment I've left these second-tier
symbols out of the patch, but there's a case for adding them.  I'm
not sure whether there'd be any point in trying to #ifdef them.

BTW, I took out the conditional defines of some of these errnos in
libpq's win32.h; AFAICS that's been dead code ever since we added
#define's for them to win32_port.h.  Am I missing something?

This seems like a bug fix to me, so I'm inclined to back-patch.

+1

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply via email to