Hi, > Hi, > > On Fri, 12 Aug 2016 13:11:50 +0200, han...@stressinduktion.org wrote: >> I really would not like to see this expanded to gre and other protocols. >> All switches drop packets where the packets are exceeding the MTU, >> bridges and also openvswitch should behave the same. >> >> Unfortunately we already had this loophole in the kernel that vxlan udp >> output path could fragment the packet again, even in case of switches. >> But this stopped working for GSO packets, which violates another rule in >> the kernel, GSO should always be transparent and user space should never >> have to care if a packet is GSO or not. >> >> Because we couldn't a) roll back the change that we fragment packets in >> UDP output paths and b) should not violate GSO transparency rule, I >> strongly believed it would be better too only change the kernel in a way >> that it transparently works with GSO, too. If we argue that a VTEP is >> its own UDP endpoint which is set up after the bridge, I still can sleep >> well. :) >> >> My understanding was that GRE failed consistently, GSO as well as >> non-GSO packets are dropped, which would be the correct behavior for me. >> I don't want to change this. A good argument against this would be if we >> violate the GSO transparency rule again. But when I looked into the code >> I couldn't see that. > I completely agree with your arguments. > > I think we may run into the same GSO vs Non-GSO anomaly if one uses > a "nopmtudisc" tunnel, or a gre tunnel in "collect_md" mode, where the > encapsulating iphdr 'df' is derived from 'tun_flags&TUNNEL_DONT_FRAGMENT' > (e.g. in case DF is not set). > > I suspect OvS's vport-gre does exactly that, so I assume this is the > reason why the change was suggested. > > Maybe we can change our criteria in the following manner: > > - if (skb_iif && proto == IPPROTO_UDP) { > + if (skb_iif && !(df & htons(IP_DF))) { > IPCB(skb)->flags |= IPSKB_FRAG_SEGS; > > This way, any tunnel explicitly providing DF is NOT allowed to > further fragment the resulting segments (leading to tx segments being > dropped). > Others tunnels, that do not care (e.g. vxlan and geneve, and probably > ovs vport-gre, or other ovs encap vports, in df_default=false mode), > will behave same for gso and non-gso. > > WDYT? Am I missing something here? > > Thanks, > Shmulik
I think the criteria 'skb_iif && !(df & htons(IP_DF)' is suitable. 'nopmtudisc' tunnel and ovs-gre tunnel can clear the DF through df_default=false. In this situation both gso(final segment) or no-gso packet can be fragment(if the packet size is more than mtu). Thanks Wenxu