From: Alexander Duyck <alexanderdu...@fb.com> There are cases where a fastopen SYN may trigger either a ICMP_TOOBIG message in the case of IPv6 or a fragmentation request in the case of IPv4. This results in the socket stalling for a second or more as it does not respond to the message by retransmitting the SYN frame.
Normally a SYN frame should not be able to trigger a ICMP_TOOBIG or ICMP_FRAG_NEEDED however in the case of fastopen we can have a frame that makes use of the entire MSS. In the case of fastopen it does, and an additional complication is that the retransmit queue doesn't contain the original frames. As a result when tcp_simple_retransmit is called and walks the list of frames in the queue it may not mark the frames as lost because both the SYN and the data packet each individually are smaller than the MSS size after the adjustment. This results in the socket being stalled until the retransmit timer kicks in and forces the SYN frame out again without the data attached. In order to resolve this we can generate our best estimate for the original packet size by detecting the fastopen SYN frame and then adding the overhead for MAX_TCP_OPTION_SPACE and verifying if the SYN w/ data would have exceeded the MSS. If so we can mark the frame as lost and retransmit it. Signed-off-by: Alexander Duyck <alexanderdu...@fb.com> --- net/ipv4/tcp_input.c | 30 +++++++++++++++++++++++++++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 9e8a6c1aa019..79375b58de84 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2686,11 +2686,35 @@ static void tcp_mtup_probe_success(struct sock *sk) void tcp_simple_retransmit(struct sock *sk) { const struct inet_connection_sock *icsk = inet_csk(sk); + struct sk_buff *skb = tcp_rtx_queue_head(sk); struct tcp_sock *tp = tcp_sk(sk); - struct sk_buff *skb; - unsigned int mss = tcp_current_mss(sk); + unsigned int mss; + + /* A fastopen SYN request is stored as two separate packets within + * the retransmit queue, this is done by tcp_send_syn_data(). + * As a result simply checking the MSS of the frames in the queue + * will not work for the SYN packet. So instead we must make a best + * effort attempt by validating the data frame with the mss size + * that would be computed now by tcp_send_syn_data and comparing + * that against the data frame that would have been included with + * the SYN. + */ + if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN && tp->syn_data) { + struct sk_buff *syn_data = skb_rb_next(skb); + + mss = tcp_mtu_to_mss(sk, icsk->icsk_pmtu_cookie) + + tp->tcp_header_len - sizeof(struct tcphdr) - + MAX_TCP_OPTION_SPACE; - skb_rbtree_walk(skb, &sk->tcp_rtx_queue) { + if (syn_data && syn_data->len > mss) + tcp_mark_skb_lost(sk, skb); + + skb = syn_data; + } else { + mss = tcp_current_mss(sk); + } + + skb_rbtree_walk_from(skb) { if (tcp_skb_seglen(skb) > mss) tcp_mark_skb_lost(sk, skb); }