On 10/27/2020 11:55 PM, Jakub Kicinski wrote:
> On Tue, 27 Oct 2020 08:51:07 -0600 David Ahern wrote:
>>> Is this another incarnation of 4cb47a8644cc ("tunnels: PMTU discovery
>>> support for directly bridged IP packets")? Sounds like non-UDP tunnels
>>> need the same treatment to make PMTUD work.
>>>
>>> RFC2003 seems to clearly forbid ignoring the inner DF:
>> I was looking at this patch Sunday night. To me it seems odd that
>> packets flowing through the overlay affect decisions in the underlay
>> which meant I agree with the proposed change.
> The RFC was probably written before we invented terms like underlay
> and overlay, and still considered tunneling to be an inefficient hack ;)
>
>> ip_md_tunnel_xmit is inconsistent right now. tnl_update_pmtu is called
>> based on the TUNNEL_DONT_FRAGMENT flag, so why let it be changed later
>> based on the inner header? Or, if you agree with RFC 2003 and the DF
>> should be propagated outer to inner, then it seems like the df reset
>> needs to be moved up before the call to tnl_update_pmtu
> Looks like TUNNEL_DONT_FRAGMENT is intended to switch between using
> PMTU inside the tunnel or just the tunnel dev MTU. ICMP PTB is still
> generated based on the inner headers.
>
> We should be okay to add something like IFLA_GRE_IGNORE_DF to lwt,
> but IMHO the default should not be violating the RFC.

If we add TUNNEL_IGNORE_DF to lwt, should the IGNORE_DF and
DONT_FRAGMENT flags be mutually exclusive? Or should DONT_FRAGMENT
take priority over IGNORE_DF?

There is also an inconsistency in the kernel between tunnel devices.
The geneve and vxlan tunnels (which do not transmit through
ip_md_tunnel_xmit) set the outer DF in lwt mode based only on
TUNNEL_DONT_FRAGMENT. The gre device behaved the same way before it
was switched to ip_md_tunnel_xmit by the following patch:

962924f ip_gre: Refactor collect metatdata mode tunnel xmit to ip_md_tunnel_xmit
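
To make the difference concrete, here is a rough user-space sketch of the two outer-DF policies discussed in this thread. It is not kernel code: SKETCH_DONT_FRAGMENT and both function names are made up for illustration, and the second function only models what was described above for ip_md_tunnel_xmit (the flag sets DF, otherwise the inner IPv4 DF is picked up later).

```c
#include <stdbool.h>

/* Stand-in flag for this sketch only; NOT the kernel's TUNNEL_DONT_FRAGMENT value. */
#define SKETCH_DONT_FRAGMENT 0x1u

/*
 * Policy described for geneve/vxlan in lwt mode (hypothetical helper):
 * the outer DF follows the tunnel flag only, the inner header is ignored.
 */
bool outer_df_flag_only(unsigned int tun_flags, bool inner_df)
{
	(void)inner_df; /* deliberately not consulted */
	return (tun_flags & SKETCH_DONT_FRAGMENT) != 0;
}

/*
 * Policy described for ip_md_tunnel_xmit (hypothetical helper):
 * the flag sets the outer DF; if it is clear and the inner packet
 * is IPv4, the outer DF is inherited from the inner header.
 */
bool outer_df_flag_or_inner(unsigned int tun_flags, bool inner_is_ipv4,
			    bool inner_df)
{
	if (tun_flags & SKETCH_DONT_FRAGMENT)
		return true;
	return inner_is_ipv4 && inner_df;
}
```

With the flag clear and an inner IPv4 packet that has DF set, the first helper yields no outer DF while the second yields outer DF set, which is the behavior difference in question.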