I think that if we come up with a good overall design, it should be able to handle different MTUs without special-casing them. After all, we're already talking about two different MTUs (encapsulated and not), so I don't think that having more would make a significant difference. I would encourage you to read through this thread if you haven't already: https://www.spinics.net/lists/netdev/msg257830.html
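
To make the "no special-casing" point concrete, here is a minimal sketch in Python. It is not OVS code. The names `ROUTES`, `lookup_route`, `inner_mtu` and `needs_icmp` are hypothetical, the route table is a stand-in for a real route lookup of the tunnel endpoint, and the 58-byte overhead is the Geneve-over-IPv4 figure from the message quoted below. The idea is that the inner MTU is derived per tunnel endpoint, so a network with several underlay MTUs goes through the same code path as one with a single MTU.

```python
import ipaddress

# Assumption: Geneve-over-IPv4 overhead in bytes, as quoted in this thread.
GENEVE_IPV4_OVERHEAD = 58

# Hypothetical stand-in for a route lookup: prefix -> (outgoing device, MTU).
ROUTES = {
    "10.0.0.0/24": ("eth0", 1500),
    "10.0.1.0/24": ("eth1", 9000),
}


def lookup_route(endpoint):
    """Return (device, mtu) for the longest prefix matching the endpoint."""
    addr = ipaddress.ip_address(endpoint)
    best = None
    for prefix, target in ROUTES.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    if best is None:
        raise LookupError(f"no route to {endpoint}")
    return best[1]


def inner_mtu(endpoint, overhead=GENEVE_IPV4_OVERHEAD):
    """MTU available to the encapsulated packet sent to this endpoint."""
    _device, mtu = lookup_route(endpoint)
    return mtu - overhead


def needs_icmp(packet_len, endpoint):
    """True if the inner packet is too long and would have to be punted
    so that an ICMP error can be generated."""
    return packet_len > inner_mtu(endpoint)


print(inner_mtu("10.0.0.5"))         # 1442
print(inner_mtu("10.0.1.5"))         # 8942
print(needs_icmp(1500, "10.0.0.5"))  # True
```

Adding a third underlay MTU here means adding a route entry, not another branch in the check.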

On Fri, Jun 10, 2016 at 6:50 AM, Matt Kassawara <mkassaw...@gmail.com> wrote:
> If it helps any, the entire underlying physical network should use the same
> MTU, at least in the simplest case. In other words, all provider networks
> use the native MTU and all self-service/project networks use the native MTU
> minus overlay protocol overhead, or 58 bytes for Geneve with IPv4 endpoints.
> Corner cases exist, but are rare and we can worry about them later.
>
> On Thu, Jun 9, 2016 at 7:06 PM, Jesse Gross <je...@kernel.org> wrote:
>>
>> In my previous message, this is what I mentioned (reproducing it here
>> just because it doesn't appear in the quoted conversation below):
>>
>> "One possible solution is to introduce an action in the kernel that
>> would check packets flowing through the switch against a length
>> specified by the user (where the 'user' is OVS userspace/OVN in this
>> case). To use this, we would do a route lookup of the tunnel endpoint
>> to find the outgoing device, subtract the encapsulation overhead, and
>> install a flow that checks this length and punts the packet to OVN to
>> generate an ICMP message."
>>
>> A possible way of getting the MTU to OVS userspace would be through a
>> configuration option. I don't think that this is really the hard part
>> though, and so the rest of the discussion around this should still apply.
>> In particular, it's not really that hard for OVS userspace to do a
>> route lookup, so if we are totally sure that the MTU is static we
>> could just have OVS fetch it. I'm not sure that either approach is all
>> that generic though.
>>
>> On Thu, Jun 9, 2016 at 10:27 AM, Matt Kassawara <mkassaw...@gmail.com>
>> wrote:
>> > Jesse,
>> >
>> > I know this sounds too easy, but can we just tell OVS about the
>> > underlying physical network MTU via config option?
>> >
>> > On Fri, May 6, 2016 at 1:08 PM, Jesse Gross <je...@kernel.org> wrote:
>> >>
>> >> On Fri, May 6, 2016 at 11:53 AM, Ryan Moats <rmo...@us.ibm.com> wrote:
>> >> > Jesse Gross <je...@kernel.org> wrote on 05/06/2016 11:11:10 AM:
>> >> >
>> >> >> From: Jesse Gross <je...@kernel.org>
>> >> >> To: Ryan Moats/Omaha/IBM@IBMUS
>> >> >> Cc: Matt Kassawara <mkassaw...@gmail.com>, discuss
>> >> >> <discuss@openvswitch.org>, Thomas Graf <tg...@suug.ch>
>> >> >> Date: 05/06/2016 11:11 AM
>> >> >> Subject: Re: [ovs-discuss] MTU considerations for OVN
>> >> >>
>> >> >> On Fri, May 6, 2016 at 8:40 AM, Ryan Moats <rmo...@us.ibm.com>
>> >> >> wrote:
>> >> >> > "discuss" <discuss-boun...@openvswitch.org> wrote on 05/04/2016
>> >> >> > 06:09:04 PM:
>> >> >> >
>> >> >> >> From: Jesse Gross <je...@kernel.org>
>> >> >> >> To: Matt Kassawara <mkassaw...@gmail.com>
>> >> >> >> Cc: discuss <discuss@openvswitch.org>
>> >> >> >> Date: 05/04/2016 06:09 PM
>> >> >> >> Subject: Re: [ovs-discuss] MTU considerations for OVN
>> >> >> >> Sent by: "discuss" <discuss-boun...@openvswitch.org>
>> >> >> >>
>> >> >> >> On Tue, May 3, 2016 at 3:50 PM, Matt Kassawara
>> >> >> >> <mkassaw...@gmail.com> wrote:
>> >> >> >> > Jesse,
>> >> >> >> >
>> >> >> >> > I'm resurrecting this thread after a fairly lengthy discussion
>> >> >> >> > of MTU with Ben at the recent OpenStack summit. Have you given
>> >> >> >> > the topic any further thought toward implementation in a
>> >> >> >> > reasonable way? Can you elaborate on the architectural
>> >> >> >> > limitations? At the moment, the OpenStack implementation of
>> >> >> >> > OVN doesn't use DPDK.
>> >> >> >>
>> >> >> >> The issue that I alluded to before is that when OVS (and by
>> >> >> >> extension OVN) does L3 processing the packets aren't traversing
>> >> >> >> the Linux IP stack and so the usual MTU checks don't apply.
>> >> >> >> Instead OVS just does a single combined lookup for all flow
>> >> >> >> processing and then applies some actions like set SMAC/DMAC and
>> >> >> >> decrement TTL. Not only is there no code to check the outgoing
>> >> >> >> MTU but there's no obvious outgoing device to fetch the desired
>> >> >> >> MTU from.
>> >> >> >
>> >> >> > I'm not 100% sure why this would be an issue - IIRC (based on my
>> >> >> > scanning the code) when a packet is going to be output, it looks
>> >> >> > like the MTU of the physical device is checked and a
>> >> >> > fragmentation decision made. Isn't that good enough for our
>> >> >> > purposes?
>> >> >>
>> >> >> Which check in particular do you have in mind?
>> >> >>
>> >> >> There are two possibilities that I can think of:
>> >> >> * ovs_vport_send() has one but the device it looks at for the MTU
>> >> >> is a tunnel device, which has an essentially infinite MTU. The
>> >> >> real MTU that we would need to check also depends on the
>> >> >> destination IP address of the tunnel but we haven't done a route
>> >> >> lookup at this point.
>> >> >> * ip_finish_output() in the IP stack. This one does have the
>> >> >> information that we need but it is outside of the tunnel. Any ICMP
>> >> >> packets that are generated will be processed through the
>> >> >> hypervisor's IP stack and won't make it back to the VM. In
>> >> >> addition, this check doesn't handle GSO packets.
>> >> >
>> >> > I see, I was misreading code... my mistake.
>> >> >
>> >> > I certainly dislike the idea of separating the MTU calculation from
>> >> > the datapath. What I was hoping to find was that it would be
>> >> > possible to do the fragmentation check on the tunnel after the route
>> >> > has been looked up and the outgoing device is known, but looking
>> >> > through this, I'm not seeing a good way to do this cleanly (yet) ...
>> >>
>> >> I agree.
>> >>
>> >> There was a thread a while back on the netdev mailing list related
>> >> to this but no real conclusion:
>> >> https://www.spinics.net/lists/netdev/msg257830.html
>> >
>
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss