Hello All,

tl;dr: I think we should remove the call to bind() before our call to connect().

I've run into a situation where, after a while, the connect() system
call in Connection::connect in UnixConnection.cc fails with errno = 99
(EADDRNOTAVAIL) on RHEL 6. This causes hostdb to mark the host as
down, and we then see repeated connection failures because hostdb has
decided the host is down. Receiving EADDRNOTAVAIL from connect() was
very surprising, since according to many sources connect() should
never return this value. After some digging, it appears that connect()
can return EADDRNOTAVAIL when the (local IP, local port, remote IP,
remote port) 4-tuple is already in use. But shouldn't the OS have
chosen a port that wasn't in use?

So I found two possible solutions to this problem and verified them on
a host that was exhibiting this sporadic behavior. Both patches are
for 3.0.x.

The first patch is as follows:

   --- iocore/net/UnixConnection.cc     2012-05-07 14:56:06.000000000 -0700
   +++ iocore/net/UnixConnection.cc     2012-10-09 12:35:35.960953957 -0700
   @@ -324,9 +324,18 @@

      cleaner<Connection> cleanup(this, &Connection::_cleanup); // mark for close until we succeed.

   +  /*
   +   * Connect technically should never return this, but occasionally some OSes will.
   +   * Since we specified INADDR_ANY and ANYPORT this shouldn't happen, so try
   +   * again to prevent hostdb from marking the host as down when it was a spurious
   +   * OS error
   +   */
   +  do {
      res = ::connect(fd,
                  reinterpret_cast<struct sockaddr *>(&sa),
                  sizeof(struct sockaddr_in));
   +  } while (-1 == res && EADDRNOTAVAIL == errno);
   +
      // It's only really an error if either the connect was blocking
      // or it wasn't blocking and the error was other than EINPROGRESS.
      // (Is EWOULDBLOCK ok? Does that start the connect?)

Basically, it just retries the connect() whenever the OS returns this
unexpected EADDRNOTAVAIL. Again, I have verified that this stops the
problem.
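
One thing to weigh with the loop above is that it has no upper bound, so it
would spin if the error ever persisted. As a minimal sketch only, here is the
same retry idea with an attempt limit. The helper name retry_on_eaddrnotavail
and the max_attempts parameter are hypothetical and are not part of the patch
or of Traffic Server; the connect attempt is passed in as a callable so the
logic can be exercised without a real socket.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Hypothetical helper (not from the patch): run `attempt` until it either
// succeeds, fails with something other than EADDRNOTAVAIL, or the attempt
// budget is used up. Returns the result of the last attempt; errno is left
// as set by that attempt.
int retry_on_eaddrnotavail(const std::function<int()> &attempt, int max_attempts)
{
  assert(max_attempts > 0);
  int res = -1;
  for (int i = 0; i < max_attempts; ++i) {
    res = attempt();
    // Stop on success, or on any error that is not the spurious one.
    if (res != -1 || errno != EADDRNOTAVAIL) {
      break;
    }
  }
  return res;
}
```

In Connection::connect the callable would wrap the existing ::connect(fd, ...)
call; the right attempt limit is a judgment call that I have not measured.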

The second fix is to simply not call bind() before connect(). This
also fixes the problem, and the reason it does is somewhat complicated:

   --- iocore/net/UnixConnection.cc        2012-05-07 14:56:06.000000000 -0700
   +++ iocore/net/UnixConnection.cc        2012-10-09 13:35:34.660974785 -0700
   @@ -296,6 +296,7 @@
    #endif
      }

   +#ifdef BIND_BEFORE_CONNECT
      // Local address/port.
      struct sockaddr_in bind_sa;
      memset(&bind_sa, 0, sizeof(bind_sa));
   @@ -307,6 +308,8 @@
                                      sizeof(bind_sa)))
        return -errno;

   +#endif
   +
      cleanup.reset();
      is_bound = true;
      return 0;

After digging for a while to figure out why not calling bind() would
fix this problem, it turns out that the Linux kernel uses two different
mechanisms to find a free port when the specified local port is 0
(ANYPORT). The method used by bind() can be seen in
net/ipv4/inet_connection_sock.c's function inet_csk_get_port(), and
the method used when connect() is called on an unbound socket can be
seen in net/ipv4/inet_hashtables.c's function __inet_hash_connect().
The primary difference is that the bind() version does not consider
the local IP when looking for a port to use. This can prevent local
ports from being reused even though the (source IP, source port,
remote IP, remote port) 4-tuple is different. I found somewhat of an
explanation here:
http://aleccolocco.blogspot.com/2008/11/ephemeral-ports-problem-and-solution.html.
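
To illustrate the connect()-only path described above, here is a small
self-contained sketch (not Traffic Server code). The function name
ephemeral_port_from_connect is hypothetical. It opens a throwaway listener on
the loopback interface, connects a client socket that was never passed to
bind(), and then uses getsockname() to read back the local port the kernel
picked during connect(). It assumes a Linux or other POSIX host with loopback
networking available.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

// Hypothetical demo: returns the local port the kernel assigned to a client
// socket at connect() time (the socket is never bound explicitly), or 0 if
// any step fails.
uint16_t ephemeral_port_from_connect()
{
  int listener = ::socket(AF_INET, SOCK_STREAM, 0);
  if (listener < 0) {
    return 0;
  }

  // Throwaway server side: loopback address, port 0 so any free port is used.
  struct sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family      = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port        = 0;
  socklen_t len        = sizeof(addr);

  uint16_t port = 0;
  int client    = -1;
  if (::bind(listener, reinterpret_cast<struct sockaddr *>(&addr), sizeof(addr)) == 0 &&
      ::listen(listener, 1) == 0 &&
      ::getsockname(listener, reinterpret_cast<struct sockaddr *>(&addr), &len) == 0) {
    client = ::socket(AF_INET, SOCK_STREAM, 0);
    // Note: no bind() on the client. The kernel chooses the source port
    // inside connect(), when the remote address is already known.
    if (client >= 0 &&
        ::connect(client, reinterpret_cast<struct sockaddr *>(&addr), sizeof(addr)) == 0) {
      struct sockaddr_in local;
      socklen_t local_len = sizeof(local);
      if (::getsockname(client, reinterpret_cast<struct sockaddr *>(&local), &local_len) == 0) {
        port = ntohs(local.sin_port);
      }
    }
  }

  if (client >= 0) {
    ::close(client);
  }
  ::close(listener);
  return port;
}
```

This only shows that an unbound socket gets its local port at connect() time;
it does not by itself reproduce the port exhaustion seen on the affected host.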

So I was hoping to get some community feedback on what people think
the best solution to this problem is. I believe the second solution,
which doesn't use bind(), is the better approach.

Thanks,
Brian
