When the socket address is non-local (full transparent proxy), or when Traffic Server is configured to make upstream origin server (OS) connections from a specific interface/address (port configs that use the ip-out identifier), the ::bind call must precede the connect in order to correctly set the socket's "local" address.
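
As a rough illustration of why the ordering matters, here is a minimal sketch of bind-before-connect with an explicit source address. It is not the Traffic Server code: connect_from() and listen_on_loopback() are hypothetical helpers made up for this example, and it assumes IPv4 and a blocking socket. It also does not cover the transparent-proxy case, which on Linux needs extra socket options and privileges before a non-local address can be bound.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not Traffic Server code: connect to remote_ip:remote_port
// using local_ip as the source address. Returns the connected fd or -errno.
int connect_from(const char *local_ip, const char *remote_ip, uint16_t remote_port)
{
  assert(local_ip != nullptr && remote_ip != nullptr);

  struct sockaddr_in local;
  struct sockaddr_in remote;
  memset(&local, 0, sizeof(local));
  memset(&remote, 0, sizeof(remote));
  local.sin_family  = AF_INET;
  local.sin_port    = htons(0); // port 0: let the kernel choose the source port
  remote.sin_family = AF_INET;
  remote.sin_port   = htons(remote_port);
  if (inet_pton(AF_INET, local_ip, &local.sin_addr) != 1 ||
      inet_pton(AF_INET, remote_ip, &remote.sin_addr) != 1)
    return -EINVAL;

  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return -errno;

  // bind() first pins the source address; connect() on an unbound socket
  // would instead let the kernel pick the source address from its routing.
  if (::bind(fd, reinterpret_cast<struct sockaddr *>(&local), sizeof(local)) < 0 ||
      ::connect(fd, reinterpret_cast<struct sockaddr *>(&remote), sizeof(remote)) < 0) {
    int err = errno;
    ::close(fd);
    return -err;
  }
  return fd;
}

// Hypothetical helper for trying the sketch out: listen on 127.0.0.1 with a
// kernel-chosen port, stored in *port. Returns the listening fd or -errno.
int listen_on_loopback(uint16_t *port)
{
  assert(port != nullptr);

  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return -errno;

  struct sockaddr_in sa;
  memset(&sa, 0, sizeof(sa));
  sa.sin_family      = AF_INET;
  sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  sa.sin_port        = htons(0);
  socklen_t len      = sizeof(sa);
  if (::bind(fd, reinterpret_cast<struct sockaddr *>(&sa), sizeof(sa)) < 0 ||
      ::listen(fd, 8) < 0 ||
      ::getsockname(fd, reinterpret_cast<struct sockaddr *>(&sa), &len) < 0) {
    int err = errno;
    ::close(fd);
    return -err;
  }
  *port = ntohs(sa.sin_port);
  return fd;
}
```

Calling connect_from("127.0.0.1", "127.0.0.1", port) against a listener from listen_on_loopback() and then inspecting the socket with getsockname() should show the address that was bound, with a kernel-chosen port.
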
Barring those two cases, the ::bind call does seem spurious. But whatever solution we implement should respect and maintain those capabilities. I ran into a similar issue with non-local address spaces and running out of ports in TS-1075. In that instance the kernel's auto-assignment of ports was unable to properly account for multiple port spaces for non-local or aliased IP addresses.

-Bart

-----Original Message-----
From: Brian Geffon [mailto:bri...@apache.org]
Sent: Tuesday, October 09, 2012 4:50 PM
To: dev@trafficserver.apache.org
Subject: Connect returning EADDRNOTAVAIL

Hello All,

tl;dr: I think we should remove the call to bind() before our call to connect().

I've run into a situation where, after a while, the connect system call in Connection::connect in UnixConnection.cc fails with errno = 99 (EADDRNOTAVAIL) on RHEL 6. This causes hostdb to mark the host as down, and we then see repeated connection failures because hostdb has decided the host is down. Receiving EADDRNOTAVAIL from connect() was very surprising, since according to many sources connect() should never return this value. After some digging, it appears that connect() can return EADDRNOTAVAIL when the (local ip, local port, remote ip, remote port) tuple is already in use. But shouldn't the OS have chosen a port that wasn't in use?

I found two possible solutions to this problem and verified them on a host that was exhibiting this sporadic behavior. Both patches are for 3.0.x. The first patch is as follows:

--- iocore/net/UnixConnection.cc 2012-05-07 14:56:06.000000000 -0700
+++ iocore/net/UnixConnection.cc 2012-10-09 12:35:35.960953957 -0700
@@ -324,9 +324,18 @@
   cleaner<Connection> cleanup(this, &Connection::_cleanup); // mark for close until we succeed.
+  /*
+   * Connect technically should never return this, but occasionally some OSes will.
+   * Since we specified INADDR_ANY and ANYPORT this shouldn't happen, so try
+   * again to prevent hostdb from marking the host as down when it was a spurious
+   * OS error
+   */
+  do {
   res = ::connect(fd, reinterpret_cast<struct sockaddr *>(&sa), sizeof(struct sockaddr_in));
+  } while (-1 == res && EADDRNOTAVAIL == errno);
+
   // It's only really an error if either the connect was blocking
   // or it wasn't blocking and the error was other than EINPROGRESS.
   // (Is EWOULDBLOCK ok? Does that start the connect?)

Basically, it just involves retrying the connect when the OS returns this weird EADDRNOTAVAIL. Again, I have verified that this stops the problem.

The second fix is to simply not call bind() before connect(). This also fixes the problem, and the reason it does is somewhat complicated:

--- iocore/net/UnixConnection.cc 2012-05-07 14:56:06.000000000 -0700
+++ iocore/net/UnixConnection.cc 2012-10-09 13:35:34.660974785 -0700
@@ -296,6 +296,7 @@
 #endif
   }
+#ifdef BIND_BEFORE_CONNECT
   // Local address/port.
   struct sockaddr_in bind_sa;
   memset(&bind_sa, 0, sizeof(bind_sa));
@@ -307,6 +308,8 @@
                sizeof(bind_sa)))
     return -errno;
+#endif
+
   cleanup.reset();
   is_bound = true;
   return 0;

After digging for a while to figure out why not calling bind() would fix this problem, it turns out that the Linux kernel uses two different mechanisms to find a free port when the local port specified is 0 (ANYPORT). The method used in bind() can be seen in net/ipv4/inet_connection_sock.c's function inet_csk_get_port(), and the method used when connect() is called on an unbound socket can be seen in net/ipv4/inet_hashtables.c's function __inet_hash_connect().
The primary difference is that the bind() version does not consider the local ip when looking for a port to use, so this can prevent local ports from being reused even though the (source ip, source port, remote ip, remote port) 4-tuple is different. I found somewhat of an explanation here: http://aleccolocco.blogspot.com/2008/11/ephemeral-ports-problem-and-solution.html.

So I was hoping to get some community feedback on what people think the best solution to this problem is. I believe the second solution, which doesn't use bind(), is the better approach.

Thanks,
Brian
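
To make the two port-selection paths above a little more concrete, here is a small sketch that only shows when the local port gets assigned in each case. It does not reproduce the EADDRNOTAVAIL failure or the kernel's search logic. PortTrace, trace_local_port() and make_loopback_listener() are hypothetical names invented for this example, and the behavior described in the comments is what I would expect on Linux with IPv4.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstring>

// Local port observed at two points in a socket's life; -1 means "step failed".
struct PortTrace {
  int before_connect;
  int after_connect;
};

// Returns the local port currently assigned to fd (0 if none yet), or -1 on error.
static int local_port(int fd)
{
  struct sockaddr_in sa;
  memset(&sa, 0, sizeof(sa));
  socklen_t len = sizeof(sa);
  if (::getsockname(fd, reinterpret_cast<struct sockaddr *>(&sa), &len) < 0)
    return -1;
  return ntohs(sa.sin_port);
}

// bind_first == true mimics the bind(INADDR_ANY, port 0) + connect() sequence;
// bind_first == false leaves the socket unbound so connect() picks the port.
PortTrace trace_local_port(bool bind_first, const struct sockaddr_in &remote)
{
  PortTrace t = {-1, -1};
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return t;

  if (bind_first) {
    struct sockaddr_in any;
    memset(&any, 0, sizeof(any));
    any.sin_family      = AF_INET;
    any.sin_addr.s_addr = htonl(INADDR_ANY);
    any.sin_port        = htons(0);
    // The port is chosen here, before the kernel knows the destination.
    if (::bind(fd, reinterpret_cast<struct sockaddr *>(&any), sizeof(any)) < 0) {
      ::close(fd);
      return t;
    }
  }
  t.before_connect = local_port(fd);

  // On an unbound socket the port is chosen here, with the destination known.
  if (::connect(fd, reinterpret_cast<const struct sockaddr *>(&remote), sizeof(remote)) == 0)
    t.after_connect = local_port(fd);

  ::close(fd);
  return t;
}

// Hypothetical helper for trying the sketch out: listen on 127.0.0.1 with a
// kernel-chosen port and report the full address in *addr.
int make_loopback_listener(struct sockaddr_in *addr)
{
  assert(addr != nullptr);

  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return -errno;

  memset(addr, 0, sizeof(*addr));
  addr->sin_family      = AF_INET;
  addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr->sin_port        = htons(0);
  socklen_t len         = sizeof(*addr);
  if (::bind(fd, reinterpret_cast<struct sockaddr *>(addr), sizeof(*addr)) < 0 ||
      ::listen(fd, 8) < 0 ||
      ::getsockname(fd, reinterpret_cast<struct sockaddr *>(addr), &len) < 0) {
    int err = errno;
    ::close(fd);
    return -err;
  }
  return fd;
}
```

With bind_first set, the port is already fixed before connect() runs, so the kernel had to choose it without knowing the remote address. Without it, the port stays 0 until connect(), which is the point where the full 4-tuple can be taken into account.
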