On 19.08.2016 09:26, Shmulik Ladkani wrote:
> On Mon, 15 Aug 2016 14:16:39 +0300 Shmulik Ladkani
> <shmulik.ladk...@gmail.com> wrote:
>> On Fri, 12 Aug 2016 13:11:50 +0200, han...@stressinduktion.org wrote:
>>> I really would not like to see this expanded to gre and other protocols.
>>> All switches drop packets that exceed the MTU; bridges and also
>>> openvswitch should behave the same.
>>>
>>> Unfortunately we already had this loophole in the kernel that vxlan udp
>>> output path could fragment the packet again, even in case of switches.
>>> But this stopped working for GSO packets, which violates another rule in
>>> the kernel: GSO should always be transparent and user space should never
>>> have to care if a packet is GSO or not.
>>>
>>> Because we a) couldn't roll back the change that we fragment packets in
>>> UDP output paths and b) should not violate the GSO transparency rule, I
>>> strongly believed it would be better to only change the kernel in a way
>>> that it transparently works with GSO, too. If we argue that a VTEP is
>>> its own UDP endpoint which is set up after the bridge, I still can sleep
>>> well. :)
>>>
>>> My understanding was that GRE failed consistently: GSO as well as
>>> non-GSO packets are dropped, which would be the correct behavior for me.
>>> I don't want to change this. A good argument against this would be if we
>>> violate the GSO transparency rule again. But when I looked into the code
>>> I couldn't see that.
>>
>> I completely agree with your arguments.
>>
>> I think we may run into the same GSO vs Non-GSO anomaly if one uses
>> a "nopmtudisc" tunnel, or a gre tunnel in "collect_md" mode, where the
>> encapsulating iphdr 'df' is derived from 'tun_flags&TUNNEL_DONT_FRAGMENT'
>> (e.g. in case DF is not set).
>>
>> I suspect OvS's vport-gre does exactly that, so I assume this is the
>> reason why the change was suggested.
>>
>> Maybe we can change our criteria in the following manner:
>>
>> -       if (skb_iif && proto == IPPROTO_UDP) {
>> +       if (skb_iif && !(df & htons(IP_DF))) {
>>                 IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
>>
>> This way, any tunnel explicitly providing DF is NOT allowed to
>> further fragment the resulting segments (leading to tx segments being
>> dropped).
>> Other tunnels that do not care (e.g. vxlan and geneve, and probably
>> ovs vport-gre, or other ovs encap vports, in df_default=false mode)
>> will behave the same for gso and non-gso.
>>
>> WDYT? Am I missing something here?
>>
>
> ping..

I am really not sure... Probably we have no other choice. Bridges caring
about the df bit being set is rather unusual. I wonder if it might not be
more sensible to actually add a sysctl for that, or make it depend on
some per-tunnel configuration which can be updated with netlink?

Bye,
Hannes
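
For reference, the two criteria from the quoted diff can be compared side by side in a small userspace model. This is only a sketch, not kernel code: the function names, MODEL_IP_DF and MODEL_PROTO_GRE are made up for illustration, and 'df' is assumed to be the network-byte-order DF field the tunnel would put in the outer IP header, as in the quoted condition.

```c
#include <arpa/inet.h>   /* htons */
#include <assert.h>
#include <netinet/in.h>  /* IPPROTO_UDP */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch (not the kernel's definitions). */
#define MODEL_IP_DF     0x4000  /* IPv4 "don't fragment" flag, host order */
#define MODEL_PROTO_GRE 47      /* IP protocol number of GRE */

/* Criterion before the proposed change: only forwarded UDP-encapsulated
 * packets may have their GSO segments fragmented again. */
bool frag_segs_current(int skb_iif, uint8_t proto, uint16_t df)
{
    (void)df;  /* DF is not consulted by this criterion */
    return skb_iif && proto == IPPROTO_UDP;
}

/* Proposed criterion: any forwarded packet whose tunnel did not set DF
 * may have its segments fragmented, regardless of the encap protocol. */
bool frag_segs_proposed(int skb_iif, uint8_t proto, uint16_t df)
{
    (void)proto;  /* the encap protocol no longer matters */
    return skb_iif && !(df & htons(MODEL_IP_DF));
}

/* A few cases showing where the two criteria differ. */
void check_examples(void)
{
    /* GRE tunnel without DF: newly allowed to fragment segments. */
    assert(!frag_segs_current(1, MODEL_PROTO_GRE, 0));
    assert(frag_segs_proposed(1, MODEL_PROTO_GRE, 0));

    /* UDP tunnel that sets DF: no longer allowed to fragment segments. */
    assert(frag_segs_current(1, IPPROTO_UDP, htons(MODEL_IP_DF)));
    assert(!frag_segs_proposed(1, IPPROTO_UDP, htons(MODEL_IP_DF)));
}
```

In this model, the only inputs that change outcome are a non-UDP tunnel without DF (now fragmented for GSO as for non-GSO) and a UDP tunnel with DF set (now dropped in both cases), which matches the behavior described in the quoted proposal.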