Given:
 - tap0, vxlan0 enslaved under a bridge
 - eth0 is the tunnel underlay having small mtu (e.g. 1400)

Assume GSO skbs arriving from tap0 having a gso_size as determined by
user-provided virtio_net_hdr (e.g. 1460 corresponding to VM mtu of 1500).

After encapsulation these skbs have an skb_gso_network_seglen that
exceeds the underlay ip_skb_dst_mtu.

These skbs are erroneously passed to ip_finish_output2 AS IS; however
each final segment (either segmented by validate_xmit_skb of eth0, or
by eth0 hardware UFO) would be larger than eth0 mtu.
As a result, those above-mtu segments get dropped on certain underlay
networks.

The expected behavior in such a setup would be segmenting the skb first,
then fragmenting each segment according to dst mtu, and finally passing
the resulting fragments to ip_finish_output2.

'ip_finish_output_gso' already supports this "Slowpath" behavior, but it
is only considered if IPSKB_FORWARDED is set.

However in the bridged case, IPSKB_FORWARDED is off, and the "Slowpath"
behavior is not considered.

Fix by performing the ip_finish_output_gso "Slowpath" even for skbs that
do not have IPSKB_FORWARDED set.

This is also OK for locally created skbs, as they are likely to have an
skb_gso_network_seglen that equals dst mtu, and thus will go directly to
'ip_finish_output2' as was done prior to this fix.

Signed-off-by: Shmulik Ladkani <shmulik.ladk...@ravellosystems.com>
---
 net/ipv4/ip_output.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index cbac493..8ae65b3 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -223,9 +223,8 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 	struct sk_buff *segs;
 	int ret = 0;
 
-	/* common case: locally created skb or seglen is <= mtu */
-	if (((IPCB(skb)->flags & IPSKB_FORWARDED) == 0) ||
-	      skb_gso_validate_mtu(skb, mtu))
+	/* common case: seglen is <= mtu */
+	if (skb_gso_validate_mtu(skb, mtu))
 		return ip_finish_output2(net, sk, skb);
 
 	/* Slowpath - GSO segment length is exceeding the dst MTU.
-- 
1.9.1
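
For illustration only, the numbers in the example above can be worked
through in a small userspace model. This is a sketch and not kernel code:
the header sizes assume IPv4 without options, TCP without options and
VXLAN over UDP, and the helper names (encap_network_seglen,
fastpath_before, fastpath_after, frags_per_segment) are hypothetical. It
only mirrors the shape of the check changed by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed header sizes for the example setup (no IP or TCP options). */
enum {
	OUTER_IPV4_HLEN = 20,
	UDP_HLEN        = 8,
	VXLAN_HLEN      = 8,
	INNER_ETH_HLEN  = 14,
	INNER_IPV4_HLEN = 20,
	INNER_TCP_HLEN  = 20,
};

/*
 * Length of one encapsulated segment, counted from the outer IP header.
 * This is the quantity that is compared against the dst mtu.
 */
static unsigned int encap_network_seglen(unsigned int gso_size)
{
	return OUTER_IPV4_HLEN + UDP_HLEN + VXLAN_HLEN + INNER_ETH_HLEN +
	       INNER_IPV4_HLEN + INNER_TCP_HLEN + gso_size;
}

/* Shape of the check before the patch: non-forwarded skbs skip the test. */
static bool fastpath_before(bool forwarded, unsigned int seglen,
			    unsigned int mtu)
{
	return !forwarded || seglen <= mtu;
}

/* Shape of the check after the patch: only the segment length matters. */
static bool fastpath_after(bool forwarded, unsigned int seglen,
			   unsigned int mtu)
{
	(void)forwarded;	/* no longer part of the decision */
	return seglen <= mtu;
}

/*
 * Number of IPv4 fragments needed for one segment on the slowpath.
 * Fragment payloads other than the last must be a multiple of 8 bytes.
 */
static unsigned int frags_per_segment(unsigned int seglen, unsigned int mtu)
{
	unsigned int per_frag, payload;

	assert(mtu >= OUTER_IPV4_HLEN + 8 && seglen > OUTER_IPV4_HLEN);
	per_frag = (mtu - OUTER_IPV4_HLEN) & ~7u;
	payload = seglen - OUTER_IPV4_HLEN;
	return (payload + per_frag - 1) / per_frag;
}
```

Under these assumptions, encap_network_seglen(1460) gives 1550, which
exceeds the 1400 byte underlay mtu. For a bridged (non-forwarded) skb,
fastpath_before(false, 1550, 1400) is true, so the oversized skb goes
straight to ip_finish_output2, while fastpath_after(false, 1550, 1400) is
false and the slowpath is taken. frags_per_segment(1550, 1400) gives 2
fragments per segment.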