On Tue, Oct 9, 2012 at 4:22 PM, Brian Geffon wrote:
> I have the following patch that I think will retain that
> functionality. Basically, if you're specifying a local port of 0 and
> INADDR_ANY, then you shouldn't call bind().
>
> --- iocore/net/UnixConnection.cc        2012-05-07 14:56:06.000000000 -0700
> +++ iocore/net/UnixConnection.cc        2012-10-09 15:44:47.021952385 -0700
> @@ -297,15 +297,17 @@
>    }
>
>    // Local address/port.
> -  struct sockaddr_in bind_sa;
> -  memset(&bind_sa, 0, sizeof(bind_sa));
> -  bind_sa.sin_family = AF_INET;
> -  bind_sa.sin_port = htons(local_port);
> -  bind_sa.sin_addr.s_addr = local_addr;
> -  if (-1 == socketManager.ink_bind(fd,
> -                                  reinterpret_cast<struct sockaddr *>(&bind_sa),
> -                                  sizeof(bind_sa)))
> -    return -errno;
> +  if(local_port != 0 || local_addr != INADDR_ANY) {
> +    struct sockaddr_in bind_sa;
> +    memset(&bind_sa, 0, sizeof(bind_sa));
> +    bind_sa.sin_family = AF_INET;
> +    bind_sa.sin_port = htons(local_port);
> +    bind_sa.sin_addr.s_addr = local_addr;
> +    if (-1 == socketManager.ink_bind(fd,
> +                                  reinterpret_cast<struct sockaddr *>(&bind_sa),
> +                                  sizeof(bind_sa)))
> +      return -errno;
> +  }
>
>    cleanup.reset();
>    is_bound = true;
>
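For anyone skimming the diff, the condition it adds can be read as a small
predicate. This is only a sketch of my reading of the patch, not code from
the tree: the helper name needs_explicit_bind is made up, and it assumes
local_addr is an IPv4 address in network byte order (as the assignment to
sin_addr.s_addr suggests) and local_port is in host byte order.

```cpp
#include <arpa/inet.h>   // htonl
#include <netinet/in.h>  // INADDR_ANY, in_addr_t
#include <cstdint>

// Hypothetical helper (not part of Traffic Server). Returns true when the
// caller asked for a specific local address or port, i.e. when bind() is
// still needed before connect(). Mirrors the condition in the patch above.
// Assumption: local_addr is in network byte order, local_port in host order.
inline bool needs_explicit_bind(in_addr_t local_addr, uint16_t local_port)
{
  // INADDR_ANY is 0, so htonl() is a no-op here; it is spelled out only to
  // make the byte-order assumption visible.
  return local_port != 0 || local_addr != htonl(INADDR_ANY);
}
```

With this, needs_explicit_bind(htonl(INADDR_ANY), 0) is false, so the kernel
would pick the source address and port at connect() time, while an ip-out
address or a fixed local port would still go through bind().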
>
> On Tue, Oct 9, 2012 at 3:54 PM, Bart Wyatt <wanderingb...@yooser.com> wrote:
>> In cases where the socket address is non local (full transparent proxy) and
>> when trafficserver is configured to make upstream OS connections from a
>> specific interface/address ( port configs that use the ip-out identifier),
>> the ::bind call must precede the connect in order to correctly set the
>> socket's "local" address.
>>
>> Barring those two cases, the ::bind call does seem spurious.  But whatever
>> solution we implement should respect and maintain those capabilities.
>>
>> I ran into a similar issue with non-local address spaces and running out of
>> ports in TS-1075.  In that instance the kernel's auto-assignment of ports was
>> unable to properly account for multiple port-spaces for non-local or aliased
>> IP addresses.
>>
>> -Bart
>>
>> -----Original Message-----
>> From: Brian Geffon [mailto:bri...@apache.org]
>> Sent: Tuesday, October 09, 2012 4:50 PM
>> To: dev@trafficserver.apache.org
>> Subject: Connect returning EADDRNOTAVAIL
>>
>> Hello All,
>>
>> tl;dr: I think we should remove the call to bind() before our call to
>> connect().
>>
>> I've run into a situation where, after a while, the connect system call in
>> Connection::connect in UnixConnection.cc will actually fail with errno = 99
>> (EADDRNOTAVAIL) on RHEL 6. This would cause hostdb to mark the host as down,
>> and then we would see repeated connection failures because hostdb had
>> decided the host was down. Receiving an EADDRNOTAVAIL from connect() was very
>> surprising since according to many sources connect() should never actually
>> return this value. After some digging, it appears that connect() can return
>> EADDRNOTAVAIL when the (local IP, local port, remote IP, remote port) tuple
>> is already in use. But shouldn't the OS have chosen a port that wasn't in use?
>>
>> So I found two possible solutions to this problem and verified them on a
>> host that was exhibiting this sporadic behavior. Both patches are for 3.0.x.
>>
>> The first patch is as follows:
>>
>>    --- iocore/net/UnixConnection.cc     2012-05-07 14:56:06.000000000 -0700
>>    +++ iocore/net/UnixConnection.cc     2012-10-09 12:35:35.960953957 -0700
>>    @@ -324,9 +324,18 @@
>>
>>       cleaner<Connection> cleanup(this, &Connection::_cleanup); // mark for close until we succeed.
>>
>>    +  /*
>>    +   * Connect technically should never return this, but occasionally some OSes will.
>>    +   * Since we specified INADDR_ANY and ANYPORT this shouldn't happen, so try
>>    +   * again to prevent hostdb from marking the host as down when it was a spurious
>>    +   * OS error
>>    +   */
>>    +  do {
>>       res = ::connect(fd,
>>                   reinterpret_cast<struct sockaddr *>(&sa),
>>                   sizeof(struct sockaddr_in));
>>    +  } while (-1 == res && EADDRNOTAVAIL == errno);
>>    +
>>       // It's only really an error if either the connect was blocking
>>       // or it wasn't blocking and the error was other than EINPROGRESS.
>>       // (Is EWOULDBLOCK ok? Does that start the connect?)
>>
>> Basically, it just involves retrying the connect() when the OS returns this
>> weird EADDRNOTAVAIL. Again, I have verified that this stops the problem.
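One thing to note about the loop in the first patch is that it keeps spinning
for as long as the OS returns EADDRNOTAVAIL. A bounded variant might look
like the sketch below. The name retry_on_eaddrnotavail and the max_attempts
parameter are invented for illustration, and the connect call is passed in
as a callable only so that the retry logic can be exercised without a real
socket.

```cpp
#include <cerrno>

// Hypothetical helper, not from the Traffic Server tree. Calls try_connect()
// until it returns something other than -1 with errno == EADDRNOTAVAIL, or
// until max_attempts calls have been made. Returns the result of the last
// call (or -1 if max_attempts <= 0, in which case nothing is called and
// errno is not touched).
template <typename ConnectFn>
int retry_on_eaddrnotavail(ConnectFn try_connect, int max_attempts)
{
  int res = -1;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    res = try_connect();
    if (res != -1 || errno != EADDRNOTAVAIL) {
      break; // success, or an error that retrying is not meant to hide
    }
  }
  return res;
}
```

In the context of the patch, the callable would simply wrap the existing
::connect(fd, reinterpret_cast<struct sockaddr *>(&sa), sizeof(struct
sockaddr_in)) call, and what a sensible attempt limit is would have to be
decided by whoever applies it.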
>>
>> The second fix was to simply not call bind() before a connect(). This also
>> fixes the problem, and the reason it does is somewhat complicated:
>>
>>    --- iocore/net/UnixConnection.cc        2012-05-07 14:56:06.000000000 -0700
>>    +++ iocore/net/UnixConnection.cc        2012-10-09 13:35:34.660974785 -0700
>>    @@ -296,6 +296,7 @@
>>     #endif
>>       }
>>
>>    +#ifdef BIND_BEFORE_CONNECT
>>       // Local address/port.
>>       struct sockaddr_in bind_sa;
>>       memset(&bind_sa, 0, sizeof(bind_sa));
>>    @@ -307,6 +308,8 @@
>>                                       sizeof(bind_sa)))
>>         return -errno;
>>
>>    +#endif
>>    +
>>       cleanup.reset();
>>       is_bound = true;
>>       return 0;
>>
>> After digging for a while to figure out why not calling bind() would fix
>> this problem, it turns out that the Linux kernel uses two different
>> mechanisms to find a free port when the local port specified is 0 (ANYPORT).
>> The method used in bind() can be seen in net/ipv4/inet_connection_sock.c's
>> function inet_csk_get_port(), and the method used when connect() is called
>> on an unbound socket can be seen in net/ipv4/inet_hashtables.c's function
>> __inet_hash_connect().
>> The primary difference is that the bind() version does not consider the
>> local ip when looking for a port to use, so this can prevent local ports
>> from being reused even though the (source IP, source port, remote IP, remote
>> port) 4-tuple is different. I found somewhat of an explanation here:
>> http://aleccolocco.blogspot.com/2008/11/ephemeral-ports-problem-and-solution.html
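To make the difference described above concrete, here is a toy model of the
two ways of choosing a port. It is emphatically not the kernel's algorithm:
the class ToyPortTable and its methods are invented for illustration, and the
bind()-style path is modelled as looking only at the port number, following
the description in the quoted message.

```cpp
#include <cstdint>
#include <set>
#include <tuple>

// Toy model only (hypothetical names, not kernel code): contrasts picking a
// port when the remote endpoint is unknown with picking one when it is known.
class ToyPortTable
{
public:
  ToyPortTable(uint16_t first, uint16_t last) : first_(first), last_(last) {}

  // bind()-style choice: the remote endpoint is not known yet, so a port is
  // handed out only if no earlier bind-style call already took it.
  int pick_at_bind()
  {
    for (uint32_t p = first_; p <= last_; ++p) {
      if (bound_ports_.insert(static_cast<uint16_t>(p)).second) {
        return static_cast<int>(p);
      }
    }
    return -1; // range exhausted
  }

  // connect()-style choice: the full 4-tuple is known, so a port is usable
  // as long as this exact (local ip, port, remote ip, remote port) is new.
  int pick_at_connect(uint32_t local_ip, uint32_t remote_ip, uint16_t remote_port)
  {
    for (uint32_t p = first_; p <= last_; ++p) {
      auto key = std::make_tuple(local_ip, static_cast<uint16_t>(p), remote_ip, remote_port);
      if (tuples_.insert(key).second) {
        return static_cast<int>(p);
      }
    }
    return -1; // every port is already paired with this remote endpoint
  }

private:
  uint16_t first_;
  uint16_t last_;
  std::set<uint16_t> bound_ports_;
  std::set<std::tuple<uint32_t, uint16_t, uint32_t, uint16_t>> tuples_;
};
```

In this model a table with a two-port range runs dry after two bind-style
picks, whereas connect-style picks only run dry per remote endpoint: a second
origin server can reuse the same two local ports.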
>>
>> So I was hoping to get some community feedback on what people think the best
>> solution to this problem is. I believe the second solution, which doesn't use
>> bind(), is the better approach.
>>
>> Thanks,
>> Brian
>>
