It appears that bind() and connect() use two different methods for finding a local port. Since bind() doesn't know how the port will be used, it makes sense that it needs to be more restrictive.
If there are no objections, I'll commit the fix that checks whether local_port != 0 || local_addr != INADDR_ANY and calls bind() only in that situation, so that we don't break any of the transparent proxy stuff or Alan's ip-out stuff.

Brian

On Tue, Oct 9, 2012 at 3:54 PM, Bart Wyatt <wanderingb...@yooser.com> wrote:
> In cases where the socket address is non-local (full transparent proxy) and
> when trafficserver is configured to make upstream OS connections from a
> specific interface/address (port configs that use the ip-out identifier),
> the ::bind call must precede the connect in order to correctly set the
> socket's "local" address.
>
> Barring those two cases, the ::bind call does seem spurious. But whatever
> solution we implement should respect and maintain those capabilities.
>
> I ran into a similar issue with non-local address spaces and running out of
> ports in TS-1075. In that instance the kernel's auto-assignment of ports was
> unable to properly account for multiple port-spaces for non-local or aliased
> IP addresses.
>
> -Bart
>
> -----Original Message-----
> From: Brian Geffon [mailto:bri...@apache.org]
> Sent: Tuesday, October 09, 2012 4:50 PM
> To: dev@trafficserver.apache.org
> Subject: Connect returning EADDRNOTAVAIL
>
> Hello All,
>
> tl;dr: I think we should remove the call to bind() before our call to
> connect().
>
> I've run into a situation where, after a while, the connect system call in
> Connection::connect in UnixConnection.cc will actually fail with errno = 99
> (EADDRNOTAVAIL) on RHEL 6. This causes hostdb to mark the host as down,
> and then we see repeated connection failures because hostdb has decided
> the host is down. Receiving an EADDRNOTAVAIL from connect() was very
> surprising, since according to many sources connect() should never actually
> return this value. After some digging, it appears that connect() can return
> EADDRNOTAVAIL when the (local IP, local port, remote IP, remote port) tuple
> is already in use.
> But shouldn't the OS have chosen a port that wasn't in use?
>
> So I found two possible solutions to this problem and verified them on a
> host that was exhibiting this sporadic behavior. Both patches are for 3.0.x.
>
> The first patch is as follows:
>
> --- iocore/net/UnixConnection.cc 2012-05-07 14:56:06.000000000 -0700
> +++ iocore/net/UnixConnection.cc 2012-10-09 12:35:35.960953957 -0700
> @@ -324,9 +324,18 @@
>
>    cleaner<Connection> cleanup(this, &Connection::_cleanup); // mark for close until we succeed.
>
> +  /*
> +   * Connect technically should never return this, but occasionally some OSes will.
> +   * Since we specified INADDR_ANY and ANYPORT this shouldn't happen, so try
> +   * again to prevent hostdb from marking the host as down when it was a spurious
> +   * OS error
> +   */
> +  do {
>    res = ::connect(fd,
>                    reinterpret_cast<struct sockaddr *>(&sa),
>                    sizeof(struct sockaddr_in));
> +  } while (-1 == res && EADDRNOTAVAIL == errno);
> +
>    // It's only really an error if either the connect was blocking
>    // or it wasn't blocking and the error was other than EINPROGRESS.
>    // (Is EWOULDBLOCK ok? Does that start the connect?)
>
> Basically, it just involves retrying the connect when the OS returns this
> weird EADDRNOTAVAIL. Again, I have verified that this stops the problem.
>
> The second fix is to simply not call bind() before connect(). This also
> fixes the problem, and the reason it does is sort of complicated:
>
> --- iocore/net/UnixConnection.cc 2012-05-07 14:56:06.000000000 -0700
> +++ iocore/net/UnixConnection.cc 2012-10-09 13:35:34.660974785 -0700
> @@ -296,6 +296,7 @@
>  #endif
>    }
>
> +#ifdef BIND_BEFORE_CONNECT
>    // Local address/port.
>    struct sockaddr_in bind_sa;
>    memset(&bind_sa, 0, sizeof(bind_sa));
> @@ -307,6 +308,8 @@
>                   sizeof(bind_sa)))
>      return -errno;
>
> +#endif
> +
>    cleanup.reset();
>    is_bound = true;
>    return 0;
>
> So after digging for a while to figure out why not calling bind() would fix
> this problem, it turns out that the Linux kernel uses two different
> mechanisms to find a free port when the local port specified is 0 (ANYPORT).
> The method used in bind() can be seen in net/ipv4/inet_connection_sock.c's
> function inet_csk_get_port(), and the method used when connect() is called
> on an unbound socket can be seen in net/ipv4/inet_hashtables.c's function
> __inet_hash_connect().
> The primary difference is that the bind() version does not consider the
> local ip when looking for a port to use, so this can prevent local ports
> from being reused even though the (source IP, source port, remote IP,
> remote port) 4-tuple is different. I found somewhat of an explanation here:
> http://aleccolocco.blogspot.com/2008/11/ephemeral-ports-problem-and-solution.html
>
> So I was hoping to get some community feedback on what people think the best
> solution to this problem is. I believe the second solution, which doesn't use
> bind(), is the better approach.
>
> Thanks,
> Brian
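
For reference, here is a rough, self-contained sketch of the conditional-bind check proposed at the top of this mail. It is not the actual patch: needs_explicit_bind() and connect_from() are hypothetical names that do not exist in the Traffic Server tree, and the sketch assumes IPv4 only and leaves out the extra socket options that full transparent proxy needs.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cassert>
#include <cerrno>
#include <cstring>

// Hypothetical helper: an explicit bind() is only needed when the caller
// asked for a specific local port or a specific local address. Otherwise
// the kernel is left to pick the source port during connect().
bool needs_explicit_bind(const sockaddr_in &local)
{
  return local.sin_port != 0 || local.sin_addr.s_addr != htonl(INADDR_ANY);
}

// Hypothetical wrapper: bind only when required, then connect.
// Returns 0 on success or -errno on failure. For a non-blocking fd the
// caller still has to treat -EINPROGRESS as "connect pending".
int connect_from(int fd, const sockaddr_in &local, const sockaddr_in &remote)
{
  assert(fd >= 0);
  if (needs_explicit_bind(local) &&
      ::bind(fd, reinterpret_cast<const sockaddr *>(&local), sizeof(local)) < 0) {
    return -errno;
  }
  if (::connect(fd, reinterpret_cast<const sockaddr *>(&remote), sizeof(remote)) < 0) {
    return -errno;
  }
  return 0;
}
```

With a zeroed local address (INADDR_ANY, port 0) the bind() is skipped entirely, which is the case that was hitting EADDRNOTAVAIL. The transparent proxy and ip-out cases set a local address or port and so still go through bind() as before.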