In cases where the socket address is non-local (full transparent proxy), and
when trafficserver is configured to make upstream OS connections from a
specific interface/address (port configs that use the ip-out identifier),
the ::bind call must precede the connect in order to correctly set the
socket's "local" address.
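
As a rough illustration of the ordering described above, here is a minimal
sketch that is not taken from the trafficserver source. The helper name
connect_from() is made up for this example. It binds to a chosen local
address with port 0 and only then connects. A truly non-local (transparent)
address would additionally need a socket option such as IP_TRANSPARENT on
Linux plus the matching privileges, which this sketch leaves out.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstring>

// Hypothetical helper (not Traffic Server code): open a TCP socket whose
// source address is local_ip, then connect it to remote.
// Returns the connected fd, or -errno on failure.
int connect_from(const char *local_ip, const sockaddr_in &remote)
{
  assert(local_ip != nullptr);

  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return -errno;

  sockaddr_in local;
  std::memset(&local, 0, sizeof(local));
  local.sin_family = AF_INET;
  local.sin_port   = htons(0); // port 0: let the kernel pick the port
  if (::inet_pton(AF_INET, local_ip, &local.sin_addr) != 1) {
    ::close(fd);
    return -EINVAL;
  }

  // bind() fixes the source address; connect() then keeps it.
  if (::bind(fd, reinterpret_cast<sockaddr *>(&local), sizeof(local)) < 0 ||
      ::connect(fd, reinterpret_cast<const sockaddr *>(&remote), sizeof(remote)) < 0) {
    int err = errno;
    ::close(fd);
    return -err;
  }
  return fd;
}
```

Calling getsockname() on the returned fd should report the address passed as
local_ip together with a kernel-chosen port.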

Barring those two cases, the ::bind call does seem spurious.  But whatever
solution we implement should respect and maintain those capabilities.

I ran into a similar issue with non-local address spaces and running out of
ports in TS-1075.  In that instance the kernel's auto-assignment of ports was
unable to properly account for multiple port-spaces for non-local or aliased
IP addresses.

-Bart

-----Original Message-----
From: Brian Geffon [mailto:bri...@apache.org] 
Sent: Tuesday, October 09, 2012 4:50 PM
To: dev@trafficserver.apache.org
Subject: Connect returning EADDRNOTAVAIL

Hello All,

tl;dr: I think we should remove the call to bind() before our call to
connect().

I've run into a situation where, after a while, the connect system call in
Connection::connect in UnixConnection.cc fails with errno = 99
(EADDRNOTAVAIL) on RHEL 6. This causes hostdb to mark the host as down,
and we then see repeated connection failures because hostdb has decided the
host is down. Receiving an EADDRNOTAVAIL from connect() was very surprising,
since according to many sources connect() should never return this value.
After some digging, it appears that connect() can return EADDRNOTAVAIL when
the (local IP, local port, remote IP, remote port) tuple is already in use.
But shouldn't the OS have chosen a port that wasn't in use?

So I found two possible solutions to this problem and verified them on a
host that was exhibiting this sporadic behavior. Both patches are for 3.0.x.

The first patch is as follows:

   --- iocore/net/UnixConnection.cc     2012-05-07 14:56:06.000000000 -0700
   +++ iocore/net/UnixConnection.cc     2012-10-09 12:35:35.960953957 -0700
   @@ -324,9 +324,18 @@

      cleaner<Connection> cleanup(this, &Connection::_cleanup); // mark for close until we succeed.

   +  /*
   +   * Connect technically should never return this, but occasionally some OSes will.
   +   * Since we specified INADDR_ANY and ANYPORT this shouldn't happen, so try
   +   * again to prevent hostdb from marking the host as down when it was a spurious
   +   * OS error
   +   */
   +  do {
      res = ::connect(fd,
                  reinterpret_cast<struct sockaddr *>(&sa),
                  sizeof(struct sockaddr_in));
   +  } while (-1 == res && EADDRNOTAVAIL == errno);
   +
      // It's only really an error if either the connect was blocking
      // or it wasn't blocking and the error was other than EINPROGRESS.
      // (Is EWOULDBLOCK ok? Does that start the connect?)

Basically, it just involves retrying the connect when the OS returns this
weird EADDRNOTAVAIL. Again, I have verified that this stops the problem.
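
For comparison, a standalone sketch of the same retry idea with an upper
bound on attempts. The cap, the helper name connect_retry_addrnotavail(), and
its signature are assumptions made for this illustration. The patch above
retries without a limit.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper: retry connect() only while it fails with
// EADDRNOTAVAIL, and give up after max_attempts tries. The cap is an
// assumption added here; the patch above loops without one.
// Returns the last connect() result; errno is left as connect() set it.
int connect_retry_addrnotavail(int fd, const sockaddr_in &sa, int max_attempts)
{
  assert(max_attempts > 0);
  int res = -1;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    res = ::connect(fd, reinterpret_cast<const sockaddr *>(&sa), sizeof(sa));
    if (res == 0 || errno != EADDRNOTAVAIL)
      break; // success, or an error that a retry is not meant to hide
  }
  return res;
}
```

Any other error, including EINPROGRESS on a non-blocking socket, is returned
on the first attempt so the existing error handling after the connect still
applies.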

The second fix is simply to not call bind() before connect(). This also
fixes the problem, and the reason it does is somewhat complicated:

   --- iocore/net/UnixConnection.cc        2012-05-07 14:56:06.000000000 -0700
   +++ iocore/net/UnixConnection.cc        2012-10-09 13:35:34.660974785 -0700
   @@ -296,6 +296,7 @@
    #endif
      }

   +#ifdef BIND_BEFORE_CONNECT
      // Local address/port.
      struct sockaddr_in bind_sa;
      memset(&bind_sa, 0, sizeof(bind_sa));
   @@ -307,6 +308,8 @@
                                      sizeof(bind_sa)))
        return -errno;

   +#endif
   +
      cleanup.reset();
      is_bound = true;
      return 0;

After digging for a while to figure out why not calling bind() fixes this
problem, it turns out that the Linux kernel uses two different mechanisms to
find a free port when the local port specified is 0 (ANYPORT). The method
used by bind() can be seen in inet_csk_get_port() in
net/ipv4/inet_connection_sock.c, and the method used when connect() is called
on an unbound socket can be seen in __inet_hash_connect() in
net/ipv4/inet_hashtables.c.
The primary difference is that the bind() version does not consider the
local ip when looking for a port to use, so it can prevent local ports
from being reused even though the (source ip, source port, remote ip, remote
port) 4-tuple is different. I found somewhat of an explanation here:
http://aleccolocco.blogspot.com/2008/11/ephemeral-ports-problem-and-solution.html
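
The point at which the kernel picks the port can be observed from user space.
The sketch below is only an illustration. It does not reproduce the kernel
logic, and the helper name local_port() is made up for this example. It
reads back the local port currently assigned to a socket.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <cassert>

// Hypothetical helper: return the local port currently assigned to fd in
// host byte order, or 0 if the kernel has not assigned one yet.
unsigned local_port(int fd)
{
  sockaddr_in sa{};
  socklen_t len = sizeof(sa);
  int rc = ::getsockname(fd, reinterpret_cast<sockaddr *>(&sa), &len);
  assert(rc == 0); // fd is expected to be a valid AF_INET socket
  (void)rc;
  return ntohs(sa.sin_port);
}
```

A socket that was bound with port 0 reports a non-zero port right after
bind(), before any destination is known. A socket that was never bound
reports 0 until connect() is called, so in that case the port is chosen with
the destination already known.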

So I was hoping to get some community feedback on what people think the best
solution to this problem is. I believe the second solution, which doesn't use
bind(), is the better approach.

Thanks,
Brian
