On 9/26/19 9:57 AM, Eric Dumazet wrote:
> 
> 
> On 9/26/19 9:46 AM, Eric Dumazet wrote:
>>
>>
>> On 9/26/19 8:05 AM, Eric Dumazet wrote:
>>>
>>>
>>> On 9/25/19 1:46 AM, Marek Majkowski wrote:
>>>> Hello my favorite mailing list!
>>>>
>>>> Recently I've been looking into TCP_USER_TIMEOUT and noticed some
>>>> strange behaviour on fresh sockets in SYN-SENT state. Full writeup:
>>>> https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
>>>>
>>>> Here's a reproducer. It does a simple thing: sets TCP_USER_TIMEOUT and
>>>> does connect() to a blackholed IP:
>>>>
>>>> $ wget https://gist.githubusercontent.com/majek/b4ad53c5795b226d62fad1fa4a87151a/raw/cbb928cb99cd6c5aa9f73ba2d3bc0aef22fbc2bf/user-timeout-and-syn.py
>>>>
>>>> $ sudo python3 user-timeout-and-syn.py
>>>> 00:00.000000 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:01.007053 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:03.023051 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:05.007096 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:05.015037 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:05.023020 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>> 00:05.034983 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
>>>>
>>>> The connect() times out with ETIMEDOUT after 5 seconds - as intended.
>>>> But Linux (5.3.0-rc3) does something weird on the network - it sends
>>>> remaining tcp_syn_retries packets aligned to the 5s mark.
>>>>
>>>> In other words: with TCP_USER_TIMEOUT we are sending spurious SYN
>>>> packets on a timeout.
>>>>
>>>> For the record, the man page doesn't define what TCP_USER_TIMEOUT does
>>>> on SYN-SENT state.
>>>>
>>>
>>> Exactly, so far this option has only been used on established flows.
>>>
>>> Feel free to send patches if you need to override the stack behavior
>>> for connection establishment (Same remark for passive side...)
>>
>> Also please take a look at TCP_SYNCNT,  which predates TCP_USER_TIMEOUT
>>
>>
> 
> I will test the following :
> 
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index dbd9d2d0ee63aa46ad2dda417da6ec9409442b77..1182e51a6b794d75beb8c130354d7804fc83a307 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -220,7 +220,6 @@ static int tcp_write_timeout(struct sock *sk)
>                         sk_rethink_txhash(sk);
>                 }
>                 retry_until = icsk->icsk_syn_retries ? : net->ipv4.sysctl_tcp_syn_retries;
> -               expired = icsk->icsk_retransmits >= retry_until;
>         } else {
>                 if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1, 0)) {
>                         /* Black hole detection */
> @@ -242,9 +241,9 @@ static int tcp_write_timeout(struct sock *sk)
>                         if (tcp_out_of_resources(sk, do_reset))
>                                 return 1;
>                 }
> -               expired = retransmits_timed_out(sk, retry_until,
> -                                               icsk->icsk_user_timeout);
>         }
> +       expired = retransmits_timed_out(sk, retry_until,
> +                                       icsk->icsk_user_timeout);
>         tcp_fastopen_active_detect_blackhole(sk, expired);
>  
>         if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RTO_CB_FLAG))
> 

The patch works well, but reading the man page again, I see the existing
behavior has been clearly documented.

If we change the behavior, we might break applications that were setting
TCP_USER_TIMEOUT on the listener, expecting the value to be inherited by
children at accept() time but not expecting it to change SYNACK rtx behavior.

On the other hand, Jon Maxwell's patch (tcp: Add
tcp_clamp_rto_to_user_timeout() helper to improve accuracy) has added this
weird effect of sending the remaining SYNs every jiffy:


     remaining = icsk->icsk_user_timeout - elapsed;
     if (remaining <= 0)
         return 1; /* user timeout has passed; fire ASAP */ 
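
To illustrate, here is a rough Python model of the SYN timer, not kernel
code. It assumes a 1 s initial RTO that doubles on each retransmit, a 4 ms
jiffy (HZ=250), tcp_syn_retries=6, and time measured from the first SYN. The
function name syn_send_times() and the "patched" flag are made up for this
sketch; "patched" approximates the diff above by expiring the socket once the
user timeout has elapsed.

```python
def syn_send_times(user_timeout_ms, syn_retries=6, initial_rto_ms=1000,
                   jiffy_ms=4, patched=False):
    """Return the times (ms since the first SYN) at which SYNs are sent."""

    def clamp(rto, elapsed):
        # Simplified model of tcp_clamp_rto_to_user_timeout().
        if user_timeout_ms == 0:
            return rto
        remaining = user_timeout_ms - elapsed
        if remaining <= 0:
            return jiffy_ms  # "fire ASAP": one jiffy from now
        return min(rto, remaining)

    def expired(retransmits, elapsed):
        # Patched (assumption): honor the user timeout in SYN-SENT too.
        if patched and user_timeout_ms and retransmits:
            return elapsed >= user_timeout_ms
        # Unpatched: only the retransmit count matters in SYN-SENT.
        return retransmits >= syn_retries

    now = 0
    rto = initial_rto_ms
    times = [now]          # the initial SYN
    retransmits = 0
    while True:
        now += clamp(rto, now)   # timer fires
        if expired(retransmits, now):
            break
        times.append(now)        # retransmit the SYN
        retransmits += 1
        rto *= 2                 # exponential backoff
    return times


if __name__ == "__main__":
    print(syn_send_times(5000))                # unpatched: burst at 5 s
    print(syn_send_times(5000, patched=True))  # patched: stops at 5 s
```

With a 5000 ms user timeout the unpatched model sends SYNs at 0, 1000, 3000
and 5000 ms, then one more per jiffy until the retry count is used up, which
is the shape of the trace in the original report.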

So we probably just should extend TCP_USER_TIMEOUT to SYN_SENT/SYN_RECV states
and change the man page accordingly. 



       TCP_USER_TIMEOUT (since Linux 2.6.37)
              This option takes an unsigned int as an argument.  When the
              value is greater than 0, it specifies the maximum amount of
              time in milliseconds that transmitted data may remain
              unacknowledged before TCP will forcibly close the
              corresponding connection and return ETIMEDOUT to the
              application.  If the option value is specified as 0, TCP will
              to use the system default.

              Increasing user timeouts allows a TCP connection to survive
              extended periods without end-to-end connectivity.  Decreasing
              user timeouts allows applications to "fail fast", if so
              desired.  Otherwise, failure may take up to 20 minutes with
              the current system defaults in a normal WAN environment.

              This option can be set during any state of a TCP connection,
              but is effective only during the synchronized states of a
              connection (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT,
              CLOSING, and LAST-ACK).  Moreover, when used with the TCP
              keepalive (SO_KEEPALIVE) option, TCP_USER_TIMEOUT will
              override keepalive to determine when to close a connection
              due to keepalive failure.

              The option has no effect on when TCP retransmits a packet,
              nor when a keepalive probe is sent.

              This option, like many others, will be inherited by the
              socket returned by accept(2), if it was set on the listening
              socket.

              Further details on the user timeout feature can be found in
              RFC 793 and RFC 5482 ("TCP User Timeout Option").
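
The inheritance described above is what the listener case relies on. A
minimal sketch of checking it over loopback, assuming Linux and Python 3; the
helper name inherited_user_timeout() is made up for this example, and the
fallback value 18 is the Linux option number, used only if the socket module
does not export the constant.

```python
import socket

# socket.TCP_USER_TIMEOUT is exported on Linux; 18 is the Linux value.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def inherited_user_timeout(timeout_ms):
    """Set TCP_USER_TIMEOUT on a listener, return the accepted child's value."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
        listener.bind(("127.0.0.1", 0))   # any free loopback port
        listener.listen(1)
        with socket.create_connection(listener.getsockname()):
            child, _ = listener.accept()
            with child:
                # Read the option back from the socket returned by accept().
                return child.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)


if __name__ == "__main__":
    print(inherited_user_timeout(5000))
```

This only shows that the value is copied to the child; it says nothing about
how SYNACK retransmits are timed.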
