On Fri, Oct 28, 2016 at 5:08 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Thu, 2016-10-27 at 12:54 -0500, Thomas Falcon wrote:
>> On 10/27/2016 10:26 AM, Eric Dumazet wrote:
>> > On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
>> >> We recently encountered a bug where a few customers using ibmveth on the
>> >> same LPAR hit an issue where a TCP session hung when large receive was
>> >> enabled. Closer analysis revealed that the session was stuck because
>> >> one side was advertising a zero window repeatedly.
>> >>
>> >> We narrowed this down to the fact that the ibmveth driver did not set
>> >> gso_size, which is translated by TCP into the MSS later up the stack.
>> >> The MSS is used to calculate the TCP window size, and as it was
>> >> abnormally large, a zero window was being calculated even though the
>> >> socket's receive buffer was completely empty.
>> >>
>> >> We were able to reproduce this and worked with IBM to fix it. Thanks Tom
>> >> and Marcelo for all your help and review on this.
>> >>
>> >> The patch fixes both our internal reproduction tests and our customers'
>> >> tests.
>> >>
>> >> Signed-off-by: Jon Maxwell <jmaxwel...@gmail.com>
>> >> ---
>> >>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
>> >>  1 file changed, 20 insertions(+)
>> >>
>> >> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>> >> index 29c05d0..c51717e 100644
>> >> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> >> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> >> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>> >>  	int frames_processed = 0;
>> >>  	unsigned long lpar_rc;
>> >>  	struct iphdr *iph;
>> >> +	bool large_packet = 0;
>> >> +	u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>> >>
>> >>  restart_poll:
>> >>  	while (frames_processed < budget) {
>> >> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>> >>  				iph->check = 0;
>> >>  				iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>> >>  				adapter->rx_large_packets++;
>> >> +				large_packet = 1;
>> >>  			}
>> >>  		}
>> >>  	}
>> >>
>> >> +	if (skb->len > netdev->mtu) {
>> >> +		iph = (struct iphdr *)skb->data;
>> >> +		if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
>> >> +		    iph->protocol == IPPROTO_TCP) {
>> >> +			hdr_len += sizeof(struct iphdr);
>> >> +			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
>> >> +			skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>> >> +		} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
>> >> +			   iph->protocol == IPPROTO_TCP) {
>> >> +			hdr_len += sizeof(struct ipv6hdr);
>> >> +			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>> >> +			skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>> >> +		}
>> >> +		if (!large_packet)
>> >> +			adapter->rx_large_packets++;
>> >> +	}
>> >> +
>> >>
>> > This might break forwarding and PMTU discovery.
>> >
>> > You force gso_size to the device mtu, regardless of the real MSS used by
>> > the TCP sender.
>> >
>> > Don't you have the MSS provided in the RX descriptor, instead of guessing
>> > the value?
>> >
>> >
>> The MSS is not always available, unfortunately, so this is the best
>> solution there is at the moment.
>
> Hmm... then what about skb_shinfo(skb)->gso_segs ?
>
> ip_rcv() for example has :
>
> 	__IP_ADD_STATS(net,
> 		       IPSTATS_MIB_NOECTPKTS + (iph->tos & INET_ECN_MASK),
> 		       max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs));
>
Okay, thanks Eric. As the MSS is the main concern, let me work with the team
here and see if we can find a better way to do this. I will take care of the
other points you raised at the same time.

> Also prefer : (skb->protocol == htons(ETH_P_IP)) tests
>
> And the ipv6 test is wrong :
>
> 	} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
> 		   iph->protocol == IPPROTO_TCP) {
>
> since iph is a pointer to an ipv4 iphdr.
>
>