I have the following patch, which I think will retain that functionality.
Basically, if you're specifying a local port of 0 and a local address of
INADDR_ANY, then bind() isn't called at all.

--- iocore/net/UnixConnection.cc        2012-05-07 14:56:06.000000000 -0700
+++ iocore/net/UnixConnection.cc        2012-10-09 15:44:47.021952385 -0700
@@ -297,15 +297,17 @@
   }

   // Local address/port.
-  struct sockaddr_in bind_sa;
-  memset(&bind_sa, 0, sizeof(bind_sa));
-  bind_sa.sin_family = AF_INET;
-  bind_sa.sin_port = htons(local_port);
-  bind_sa.sin_addr.s_addr = local_addr;
-  if (-1 == socketManager.ink_bind(fd,
-                                  reinterpret_cast<struct sockaddr *>(&bind_sa),
-                                  sizeof(bind_sa)))
-    return -errno;
+  if (local_port != 0 || local_addr != INADDR_ANY) {
+    struct sockaddr_in bind_sa;
+    memset(&bind_sa, 0, sizeof(bind_sa));
+    bind_sa.sin_family = AF_INET;
+    bind_sa.sin_port = htons(local_port);
+    bind_sa.sin_addr.s_addr = local_addr;
+    if (-1 == socketManager.ink_bind(fd,
+                                  reinterpret_cast<struct sockaddr *>(&bind_sa),
+                                  sizeof(bind_sa)))
+      return -errno;
+  }

   cleanup.reset();
   is_bound = true;
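
For anyone who wants to try the condition outside of the Traffic Server tree, here is a minimal standalone sketch of the same idea using plain POSIX calls. The names needs_explicit_bind and bind_if_needed are made up for illustration and are not part of the patch; the sketch also assumes local_addr is in network byte order and local_port in host byte order, as the htons() call in the patch suggests.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the caller asked for a specific local
// port or a specific local address, i.e. when bind() carries information.
// local_addr is in network byte order; local_port is in host byte order.
inline bool needs_explicit_bind(in_addr_t local_addr, uint16_t local_port)
{
  return local_port != 0 || local_addr != htonl(INADDR_ANY);
}

// Hypothetical helper: binds fd only when needed.
// Returns 0 on success (including the "nothing to do" case) or -errno.
inline int bind_if_needed(int fd, in_addr_t local_addr, uint16_t local_port)
{
  assert(fd >= 0);
  if (!needs_explicit_bind(local_addr, local_port))
    return 0; // leave local port selection to connect()

  struct sockaddr_in bind_sa;
  memset(&bind_sa, 0, sizeof(bind_sa));
  bind_sa.sin_family      = AF_INET;
  bind_sa.sin_port        = htons(local_port);
  bind_sa.sin_addr.s_addr = local_addr;
  if (-1 == ::bind(fd, reinterpret_cast<struct sockaddr *>(&bind_sa), sizeof(bind_sa)))
    return -errno;
  return 0;
}
```

The transparent-proxy and ip-out cases still reach bind() because they supply a non-zero address, a non-zero port, or both.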


On Tue, Oct 9, 2012 at 3:54 PM, Bart Wyatt <wanderingb...@yooser.com> wrote:
> In cases where the socket address is non-local (full transparent proxy) and
> when trafficserver is configured to make upstream OS connections from a
> specific interface/address (port configs that use the ip-out identifier),
> the ::bind call must precede the connect in order to correctly set the
> socket's "local" address.
>
> Barring those two cases, the ::bind call does seem spurious.  But whatever
> solution we implement should respect and maintain those capabilities.
>
> I ran into a similar issue with non-local address spaces and running out of
> ports in TS-1075.  In that instance the kernel's auto-assignment of ports was
> unable to properly account for multiple port-spaces for non-local or aliased
> IP addresses.
>
> -Bart
>
> -----Original Message-----
> From: Brian Geffon [mailto:bri...@apache.org]
> Sent: Tuesday, October 09, 2012 4:50 PM
> To: dev@trafficserver.apache.org
> Subject: Connect returning EADDRNOTAVAIL
>
> Hello All,
>
> tl;dr: I think we should remove the call to bind() before our call to
> connect().
>
> I've run into a situation where, after a while, the connect system call in
> Connection::connect in UnixConnection.cc fails with errno = 99
> (EADDRNOTAVAIL) on RHEL 6. This causes hostdb to mark the host as down, and
> we then see repeated connection failures because hostdb has decided the host
> is down. Receiving an EADDRNOTAVAIL from connect() was very surprising,
> since according to many sources connect() should never actually return this
> value. After some digging, it appears that connect() can return
> EADDRNOTAVAIL when the (local IP, local port, remote IP, remote port) tuple
> is already in use. But shouldn't the OS have chosen a port that wasn't in
> use?
>
> So I found two possible solutions to this problem and verified them on a
> host that was exhibiting this sporadic behavior. Both patches are for 3.0.x.
>
> The first patch is as follows:
>
>    --- iocore/net/UnixConnection.cc     2012-05-07 14:56:06.000000000 -0700
>    +++ iocore/net/UnixConnection.cc     2012-10-09 12:35:35.960953957 -0700
>    @@ -324,9 +324,18 @@
>
>       cleaner<Connection> cleanup(this, &Connection::_cleanup); // mark for close until we succeed.
>
>    +  /*
>    +   * Connect technically should never return this, but occasionally some OSes will.
>    +   * Since we specified INADDR_ANY and ANYPORT this shouldn't happen, so try
>    +   * again to prevent hostdb from marking the host as down when it was a spurious
>    +   * OS error
>    +   */
>    +  do {
>       res = ::connect(fd,
>                   reinterpret_cast<struct sockaddr *>(&sa),
>                   sizeof(struct sockaddr_in));
>    +  } while (-1 == res && EADDRNOTAVAIL == errno);
>    +
>       // It's only really an error if either the connect was blocking
>       // or it wasn't blocking and the error was other than EINPROGRESS.
>       // (Is EWOULDBLOCK ok? Does that start the connect?)
>
> Basically, it just involves retrying the connect when the OS returns this
> weird EADDRNOTAVAIL. Again, I have verified that this stops the problem.
>
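
One caveat with the loop as written in the first patch: it has no upper bound, so a persistent EADDRNOTAVAIL would spin forever. A bounded variant could look roughly like the sketch below. This is only an illustration, not what the patch does; connect_with_retry and max_attempts are hypothetical names, and the connect call is passed in as a callable so the retry logic can be exercised without a real socket.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Hypothetical helper: calls attempt() until it stops failing with
// EADDRNOTAVAIL or max_attempts is reached. attempt() is expected to
// behave like ::connect(): return -1 and set errno on failure.
// Returns the result of the last attempt; errno is left as the last
// attempt set it. If max_attempts <= 0, nothing is called and -1 is returned.
inline int connect_with_retry(const std::function<int()> &attempt, int max_attempts)
{
  int res = -1;
  for (int i = 0; i < max_attempts; ++i) {
    res = attempt();
    if (!(-1 == res && EADDRNOTAVAIL == errno))
      break; // success, or a failure that retrying will not fix
  }
  return res;
}
```

In Connection::connect the callable would simply wrap the existing ::connect(fd, ...) call, and the EINPROGRESS handling that follows would stay as it is.
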
> The second fix was to simply not call bind() before connect(). This also
> fixes the problem, and the reason it does is sort of complicated:
>
>    --- iocore/net/UnixConnection.cc        2012-05-07 14:56:06.000000000 -0700
>    +++ iocore/net/UnixConnection.cc        2012-10-09 13:35:34.660974785 -0700
>    @@ -296,6 +296,7 @@
>     #endif
>       }
>
>    +#ifdef BIND_BEFORE_CONNECT
>       // Local address/port.
>       struct sockaddr_in bind_sa;
>       memset(&bind_sa, 0, sizeof(bind_sa));
>    @@ -307,6 +308,8 @@
>                                       sizeof(bind_sa)))
>         return -errno;
>
>    +#endif
>    +
>       cleanup.reset();
>       is_bound = true;
>       return 0;
>
> After digging for a while to figure out why not calling bind() fixes this
> problem, it turns out that the Linux kernel uses two different mechanisms
> to find a free port when the local port specified is 0 (ANYPORT). The
> method used by bind() is in net/ipv4/inet_connection_sock.c, in the function
> inet_csk_get_port(); the method used when connect() is called on an unbound
> socket is in net/ipv4/inet_hashtables.c, in the function
> __inet_hash_connect().
> The primary difference is that the bind() version does not consider the
> local IP when looking for a port to use, so this can prevent local ports
> from being reused even though the (source IP, source port, remote IP,
> remote port) 4-tuple is different. I found somewhat of an explanation here:
> http://aleccolocco.blogspot.com/2008/11/ephemeral-ports-problem-and-solution.html
>
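
A quick way to see the bind() half of this from user space is the sketch below. It only shows that bind() with port 0 commits to a local port immediately, before any destination is known, which is why that path cannot take the remote side of the 4-tuple into account. It does not exercise the kernel functions named above directly; local_port_of and bind_any are hypothetical helper names, and the behaviour described is what I would expect on Linux.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstring>

// Hypothetical helper: returns the socket's current local port in host
// byte order, or -1 if getsockname() fails.
inline int local_port_of(int fd)
{
  assert(fd >= 0);
  struct sockaddr_in sa;
  socklen_t len = sizeof(sa);
  memset(&sa, 0, sizeof(sa));
  if (-1 == ::getsockname(fd, reinterpret_cast<struct sockaddr *>(&sa), &len))
    return -1;
  return ntohs(sa.sin_port);
}

// Hypothetical helper: bind fd to INADDR_ANY with port 0 (ANYPORT),
// which is what the unpatched code does for ordinary outbound connections.
// Returns the result of ::bind().
inline int bind_any(int fd)
{
  assert(fd >= 0);
  struct sockaddr_in sa;
  memset(&sa, 0, sizeof(sa));
  sa.sin_family      = AF_INET;
  sa.sin_port        = htons(0);
  sa.sin_addr.s_addr = htonl(INADDR_ANY);
  return ::bind(fd, reinterpret_cast<struct sockaddr *>(&sa), sizeof(sa));
}
```

Calling local_port_of() on a fresh TCP socket should report 0, and after bind_any() it should report an ephemeral port. Two sockets bound this way end up on different ports even if they are later connected to different destinations, whereas an unbound socket leaves the choice to connect(), which knows the full 4-tuple.
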
> So I was hoping to get some community feedback on what people think the best
> solution to this problem is. I believe the second solution, which doesn't use
> bind, is the better approach.
>
> Thanks,
> Brian
>
