On Tue, Jan 6, 2015 at 4:34 AM, Fan Du <fengyuleidian0...@gmail.com> wrote:
>
> On 2015/1/6 1:58, Jesse Gross wrote:
>>
>> On Mon, Jan 5, 2015 at 1:02 AM, Fan Du <fengyuleidian0...@gmail.com> wrote:
>>>
>>> On 2014/12/03 10:31, Du, Fan wrote:
>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Thomas Graf [mailto:t...@infradead.org] On Behalf Of Thomas Graf
>>>>> Sent: Wednesday, December 3, 2014 1:42 AM
>>>>> To: Michael S. Tsirkin
>>>>> Cc: Du, Fan; 'Jason Wang'; net...@vger.kernel.org; da...@davemloft.net;
>>>>> f...@strlen.de; dev@openvswitch.org; je...@nicira.com; pshe...@nicira.com
>>>>> Subject: Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
>>>>>
>>>>> On 12/02/14 at 07:34pm, Michael S. Tsirkin wrote:
>>>>>>
>>>>>> On Tue, Dec 02, 2014 at 05:09:27PM +0000, Thomas Graf wrote:
>>>>>>>
>>>>>>> On 12/02/14 at 01:48pm, Flavio Leitner wrote:
>>>>>>>>
>>>>>>>> What about containers or any other virtualization environment that
>>>>>>>> doesn't use Virtio?
>>>>>>>
>>>>>>> The host can dictate the MTU in that case for both veth and OVS
>>>>>>> internal devices, which would be the primary container plumbing
>>>>>>> techniques.
>>>>>>
>>>>>> It typically can't do this easily for VMs with emulated devices:
>>>>>> real Ethernet uses a fixed MTU.
>>>>>>
>>>>>> IMHO it's confusing to suggest MTU as a fix for this bug; it's an
>>>>>> unrelated optimization. ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED is the
>>>>>> right fix here.
>>>>>
>>>>> PMTU discovery only resolves the issue if an actual IP stack is
>>>>> running inside the VM. This may not be the case at all.
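For context on the "host can dictate the MTU" option: the arithmetic behind the reduced guest MTU is just the physical MTU minus the per-packet encapsulation overhead. A minimal sketch in C, assuming VXLAN over an IPv4 underlay with no options (the helper name `guest_mtu` is illustrative, not from the patch under discussion):

```c
#include <stdio.h>

/* Per-packet overhead VXLAN over IPv4 adds, counted against the underlay
 * MTU (which already excludes the outer Ethernet header):
 * outer IPv4 + outer UDP + VXLAN header + inner Ethernet. */
enum {
    OUTER_IPV4_HDR = 20,
    OUTER_UDP_HDR  = 8,
    VXLAN_HDR      = 8,
    INNER_ETH_HDR  = 14,
    VXLAN_OVERHEAD = OUTER_IPV4_HDR + OUTER_UDP_HDR + VXLAN_HDR + INNER_ETH_HDR,
};

/* Largest MTU the guest can use without the encapsulated frame
 * exceeding the physical MTU. */
static int guest_mtu(int phys_mtu)
{
    return phys_mtu - VXLAN_OVERHEAD;
}
```

With a standard 1500-byte underlay this yields the familiar 1450-byte guest MTU; the point of the thread is that the host often cannot impose this value on a VM with an emulated NIC.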
>>>>
>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>
>>>> Some thoughts here:
>>>>
>>>> Thinking about it the other way around: the host stack could forge an
>>>> ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED message from the _inner_ skb's
>>>> network and transport headers, whatever type of encapsulation is in
>>>> use, and then push that packet up to the guest/container. From their
>>>> point of view, the intermediate node or the peer sent the message, so
>>>> PMTU discovery would be expected to work correctly. All other
>>>> encapsulation technologies that suffer from this problem could share
>>>> the same behavior.
>>>
>>> Hi David, Jesse and Thomas,
>>>
>>> As discussed here:
>>> https://www.marc.info/?l=linux-netdev&m=141764712631150&w=4
>>> quoting Jesse:
>>>
>>>   My proposal would be something like this:
>>>   * For L2, reduce the VM MTU to the lowest common denominator on the
>>>     segment.
>>>   * For L3, use path MTU discovery or fragment the inner packet (i.e.
>>>     normal routing behavior).
>>>   * As a last resort (such as if using an old version of virtio in the
>>>     guest), fragment the tunnel packet.
>>>
>>> For L2, it's an administrative action.
>>> For L3, the PMTU approach looks better, because once the sender has
>>> been alerted to the reduced MTU, the packet size after encapsulation
>>> will not exceed the physical MTU, so no additional fragmentation
>>> effort is needed.
>>> For "as a last resort ... fragment the tunnel packet", the original
>>> patch https://www.marc.info/?l=linux-netdev&m=141715655024090&w=4 did
>>> the job, but it seems it was not welcomed.
>>
>> This needs to be properly integrated into IP processing if it is to
>> work correctly.
>
> Do you mean the original patch in this thread? Yes, it works correctly
> in my cloud env. If you have any other concerns, please let me know. :)
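The forged ICMP_FRAG_NEEDED idea above can be sketched in userspace C. This is only an illustration of the wire format (RFC 792, with the next-hop MTU field from RFC 1191), not the kernel patch itself; `build_frag_needed` is a hypothetical helper, and the checksum and quoted inner packet are deliberately left out:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* ICMPv4 "destination unreachable / fragmentation needed" header:
 * RFC 1191 repurposes the low 16 bits of the unused word to carry
 * the next-hop MTU. */
struct icmp_frag_needed {
    uint8_t  type;          /* 3: destination unreachable */
    uint8_t  code;          /* 4: fragmentation needed and DF set */
    uint16_t checksum;      /* left zero here; real code must fill it */
    uint16_t unused;
    uint16_t next_hop_mtu;  /* network byte order on the wire */
};

/* Hypothetical helper: fill in the header a host stack would forge and
 * push back toward the guest. A real implementation would then append
 * the inner IP header plus at least the first 8 bytes of its payload,
 * and compute the ICMP checksum over the whole message. */
static void build_frag_needed(struct icmp_frag_needed *h, uint16_t mtu)
{
    memset(h, 0, sizeof(*h));
    h->type = 3;                    /* ICMP_DEST_UNREACH */
    h->code = 4;                    /* ICMP_FRAG_NEEDED */
    h->next_hop_mtu = htons(mtu);
}
```

Because the message is built from the inner headers before encapsulation, it can be injected toward the guest as-is, with no tunnel header of its own, which is exactly the "right location" point Jesse makes below.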
Ok... but that doesn't actually address the points that I made.

>> One of the reasons for only doing path MTU discovery for L3 is that it
>> operates seamlessly as part of normal operation - there is no need to
>> forge addresses or potentially generate ICMP when on an L2 network.
>> However, this ignores the IP handling that is going on (note that in
>> OVS it is possible for L3 to be implemented as a set of flows coming
>> from a controller).
>>
>> It also should not be VXLAN specific or duplicate VXLAN encapsulation
>> code. As this is happening before encapsulation, the generated ICMP
>> does not need to be encapsulated either if it is created in the right
>> location.
>
> Yes, I agree. GRE shares the same issue in the code flow.
> Pushing the ICMP message back without encapsulation, and without
> circulating it down to the physical device, is possible. The "right
> location", as far as I know, could only be ovs_vport_send. In addition,
> this probably requires a wrapper around the route lookup operation for
> GRE/VXLAN; after getting the underlay device MTU from the routing
> information, calculating the reduced MTU becomes feasible.

As I said, it needs to be integrated into L3 processing. In OVS this
would mean adding some primitives to the kernel and then exposing the
functionality upwards into userspace/controller.

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
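The check Fan Du describes for ovs_vport_send - look up the route, take the underlay device MTU, subtract the tunnel overhead, and decide whether to forge ICMP_FRAG_NEEDED - can be sketched as follows. The types here (`route_info`, `needs_frag_needed`) are simplified stand-ins, not kernel API; the real code would use `rtable`/`flowi4` and the vport's tunnel configuration, and GRE overhead grows if a key or checksum is negotiated:

```c
#include <stdbool.h>

/* Simplified stand-in for the result of the underlay route lookup. */
struct route_info { int underlay_mtu; };

enum tunnel_type { TUN_GRE, TUN_VXLAN };

/* Encapsulation overhead per tunnel type over an IPv4 underlay:
 * GRE:   outer IPv4 (20) + base GRE header (4)                 = 24
 * VXLAN: outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Eth (14) = 50 */
static int encap_overhead(enum tunnel_type t)
{
    return t == TUN_GRE ? 24 : 50;
}

/* The decision the send path would make before encapsulating: compute
 * the reduced MTU from the route's underlay device, and report whether
 * a forged ICMP_FRAG_NEEDED should go back to the sender instead of
 * transmitting the packet. */
static bool needs_frag_needed(const struct route_info *rt,
                              enum tunnel_type t,
                              int inner_len, bool df_set,
                              int *reduced_mtu)
{
    *reduced_mtu = rt->underlay_mtu - encap_overhead(t);
    return df_set && inner_len > *reduced_mtu;
}
```

For a 1500-byte underlay this reproduces the familiar reduced MTUs (1476 for plain GRE, 1450 for VXLAN); Jesse's reply is that such a check only belongs in the kernel if it is exposed as an L3 primitive to userspace/controller rather than hard-wired per tunnel type.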