On 8/8/16, 8:25 AM, Simon Horman wrote:
> On Sun, Jul 31, 2016 at 12:07:10AM -0700, Roopa Prabhu wrote:
>> On 7/27/16, 12:02 AM, zhuyj wrote:
>>> On ubuntu16.04 server 64 bit
>>> The attached script is run, the following will appear.
>>>
>>> Error: either "to" is duplicate, or "encap" is a garbage.
>> This maybe just because the iproute2 version on ubuntu does not
>> support the route encap attributes yet.
>>
>> [snip]
>>
>>> On Tue, Jul 26, 2016 at 12:39 AM, Lennert Buytenhek <buyt...@wantstofly.org>
>>> wrote:
>>>
>>>> Hi!
>>>>
>>>> I am seeing pretty horrible TCP transmit performance (anywhere between
>>>> 1 and 10 Mb/s, on a 10 Gb/s interface) when traffic is sent out over a
>>>> route that involves MPLS labeling, and this seems to be due to an
>>>> interaction between MPLS and TSO/GSO that causes all segmentable TCP
>>>> frames that are MPLS-labeled to be dropped on egress.
>>>>
>>>> I initially ran into this issue with the ixgbe driver, but it is easily
>>>> reproduced with veth interfaces, and the script attached below this
>>>> email reproduces the issue. The script configures three network
>>>> namespaces: one that transmits TCP data (netperf) with MPLS labels,
>>>> one that takes the MPLS traffic and pops the labels and forwards the
>>>> traffic on, and one that receives the traffic (netserver). When not
>>>> using MPLS labeling, I get ~30000 Mb/s single-stream TCP performance
>>>> in this setup on my test box, and with MPLS labeling, I get ~2 Mb/s.
>>>>
>>>> Some investigating shows that egress TCP frames that need to be
>>>> segmented are being dropped in validate_xmit_skb(), which calls
>>>> skb_gso_segment() which calls skb_mac_gso_segment() which returns
>>>> -EPROTONOSUPPORT because we apparently didn't have the right kernel
>>>> module (mpls_gso) loaded.
>>>>
>>>> (It's somewhat poor design, IMHO, to degrade network performance by
>>>> 15000x if someone didn't load a kernel module they didn't know they
>>>> should have loaded, and in a way that doesn't log any warnings or
>>>> errors and can only be diagnosed by adding printk calls to net/core/
>>>> and recompiling your kernel.)
>> Its possible that the right way to do this is to always auto select MPLS_GSO
>> if MPLS_IPTUNNEL is selected. I am guessing this by looking at the
>> openvswitch mpls Kconfig entries and comparing with MPLS_IPTUNNEL.
>> will look some more.
>>
>>>> (Also, I'm not sure why mpls_gso is needed when ixgbe seems to be
>>>> able to natively do TSO on MPLS-labeled traffic, maybe because ixgbe
>>>> doesn't advertise the necessary features in ->mpls_features? But
>>>> adding those bits doesn't seem to change much.)
>>>>
>>>> But, loading mpls_gso doesn't change much -- skb_gso_segment() then
>>>> starts return -EINVAL instead, which is due to the
>>>> skb_network_protocol() call in skb_mac_gso_segment() returning zero.
>>>> And looking at skb_network_protocol(), I don't see how this is
>>>> supposed to work -- skb->protocol is 0 at this point, and there is no
>>>> way to figure out that what we are encapsulating is IP traffic, because
>>>> unlike what is the case with VLAN tags, MPLS labels aren't followed by
>>>> an inner ethertype that says what kind of traffic is in here, you have
>>>> to have explicit knowledge of the payload type for MPLS.
>>>>
>>>> Any ideas?
>> I was looking at the history of net/mpls/mpls_gso.c and the initial git log
>> comment says that the driver expects the mpls tunnel driver to do a few
>> things which I think might be the problem. I do see mpls_iptunnel.c setting
>> the skb->protocol but not the skb->inner_protocol. wonder if fixing anything
>> there will help ?.
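
For illustration only, here is a small userspace sketch (not kernel code)
of the point made in the report above: an MPLS label stack entry carries a
label, a traffic class, a bottom-of-stack bit and a TTL (RFC 3032), but no
ethertype, so a parser can find where the payload starts but can only guess
what it is. The helper names mpls_payload_offset() and
mpls_guess_payload_ethertype() are hypothetical, and peeking at the IP
version nibble is a heuristic assumed here for the example; it is not what
the GSO path does, which relies on skb->inner_protocol being set by the
tunnel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One MPLS label stack entry is 32 bits (RFC 3032):
 * label (20 bits) | traffic class (3) | bottom-of-stack (1) | TTL (8).
 * Note that there is no ethertype field anywhere in the entry.
 */
#define MPLS_LSE_LEN	4
#define MPLS_BOS_MASK	0x01	/* bottom-of-stack bit, lowest bit of byte 2 */

/* Hypothetical helper: return the offset of the first payload byte after
 * the label stack, or -1 if the buffer ends before an entry with the
 * bottom-of-stack bit set is seen.
 */
static long mpls_payload_offset(const uint8_t *buf, size_t len)
{
	size_t off = 0;

	assert(buf != NULL);
	while (off + MPLS_LSE_LEN <= len) {
		int bottom = buf[off + 2] & MPLS_BOS_MASK;

		off += MPLS_LSE_LEN;
		if (bottom)
			return (long)off;
	}
	return -1;
}

/* Hypothetical helper: guess the payload ethertype from the IP version
 * nibble of the first payload byte. This is only a heuristic; it returns
 * 0 when the stack is malformed or the payload does not look like IP.
 */
static uint16_t mpls_guess_payload_ethertype(const uint8_t *buf, size_t len)
{
	long off = mpls_payload_offset(buf, len);

	if (off < 0 || (size_t)off >= len)
		return 0;

	switch (buf[off] >> 4) {
	case 4:
		return 0x0800;	/* ETH_P_IP */
	case 6:
		return 0x86DD;	/* ETH_P_IPV6 */
	default:
		return 0;	/* unknown: explicit knowledge is required */
	}
}
```

With a VLAN tag the equivalent lookup needs no guessing, because the tag is
followed by the encapsulated ethertype.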
> If the inner protocol is not set then I don't think that segmentation can
> function as there is (or at least was for the use case the code was added)
> no way for the stack to know the protocol of the inner packet otherwise.
>
> On another note I was recently poking around the code and I wonder if the
> following may be needed (this was in the context of my under-construction
> l3 tunnel work for OvS and it may only be needed in that context):

Thanks simon, we are still working with this.. stay tuned.

>
> diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
> index 2055e57ed1c3..113cba89653d 100644
> --- a/net/mpls/mpls_gso.c
> +++ b/net/mpls/mpls_gso.c
> @@ -39,16 +39,18 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
>  	mpls_features = skb->dev->mpls_features & features;
>  	segs = skb_mac_gso_segment(skb, mpls_features);
>
> -
> -	/* Restore outer protocol. */
> -	skb->protocol = mpls_protocol;
> -
>  	/* Re-pull the mac header that the call to skb_mac_gso_segment()
>  	 * above pulled. It will be re-pushed after returning
>  	 * skb_mac_gso_segment(), an indirect caller of this function.
>  	 */
>  	__skb_pull(skb, skb->data - skb_mac_header(skb));
>
> +	/* Restore outer protocol. */
> +	skb->protocol = mpls_protocol;
> +	if (!IS_ERR(segs))
> +		for (skb = segs; skb; skb = skb->next)
> +			skb->protocol = mpls_protocol;
> +
>  	return segs;
>  }
>
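
To make the intent of the quoted hunk easier to follow, here is a userspace
model of it (a sketch, not kernel code). It assumes that segmentation hands
back a singly linked list of segments, an error-encoded pointer, or NULL,
and that every segment must carry the outer MPLS protocol again, not only
the original skb. struct model_skb, model_is_err(), model_err_ptr() and
model_restore_protocol() are hypothetical stand-ins for struct sk_buff,
IS_ERR(), ERR_PTR() and the tail of mpls_gso_segment().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO	4095

/* Hypothetical stand-in for struct sk_buff: only the fields the hunk uses. */
struct model_skb {
	struct model_skb *next;
	uint16_t protocol;
};

/* Userspace imitation of ERR_PTR(): encode a negative errno in a pointer. */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Userspace imitation of IS_ERR(): the top MODEL_MAX_ERRNO addresses are
 * treated as error codes. NULL is not an error pointer.
 */
static int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Model of the end of mpls_gso_segment() after the quoted change: restore
 * the outer protocol on the original skb, then on each returned segment.
 * The list is only walked when segs is not an error pointer; a NULL segs
 * simply makes the loop body run zero times.
 */
static struct model_skb *model_restore_protocol(struct model_skb *skb,
						struct model_skb *segs,
						uint16_t mpls_protocol)
{
	struct model_skb *seg;

	assert(skb != NULL);
	skb->protocol = mpls_protocol;

	if (!model_is_err(segs))
		for (seg = segs; seg; seg = seg->next)
			seg->protocol = mpls_protocol;

	return segs;
}
```

The model uses a separate iterator where the quoted hunk reuses skb as the
loop variable; the hunk can do that because skb is not touched again before
the function returns segs.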