On Wed, Jun 13, 2012 at 8:38 PM, Jesse Gross <je...@nicira.com> wrote: > On Thu, Jun 14, 2012 at 9:37 AM, Ben Pfaff <b...@nicira.com> wrote: >> On Wed, Jun 13, 2012 at 09:55:36PM +0900, Jesse Gross wrote: >>> If we go with what you have now, I'm fairly confident that we will >>> regret it in the future. The kernel code used to more directly >>> implement various features and prior to upstreaming we broke many >>> of them down. I'm happy with the results of that but this time we >>> won't have the benefit of revising things later. This is >>> particularly bad because it deviates from our usual model of >>> userspace controlling everything and here userspace won't even know >>> what the flow looks like. The effects of this tend to metastasize >>> because when userspace doesn't know what the packet looks like it >>> can't implement things that it might overwise be able to do and >>> more and more ends up in the kernel. The other thing, which is >>> specific to MPLS, is that there is no inherent way to know the type >>> of the payload. Userspace is vastly more likely to have this >>> information in the event that we want to do something with the >>> inner packet. In your patch the kernel is basically assuming that >>> the type is IP (OpenFlow doesn't give us any additional information >>> but it doesn't seem like a good idea in general). >>> >>> Using the approach that I had suggested largely avoids these >>> problems and if we could go that path by adding say, the ability to >>> match on another tag, then I think that would be >>> reasonable. However, it still has a variation of the last problem, >>> which is that although it may know what the inner type is, unless it >>> can tell that to the kernel there is no way to get to the inner >>> flow. This makes copy inwards to IP header difficult. Perhaps we >>> could add a mechanism for userspace to tell the kernel the type of >>> the inner packet, although it might not know either. >>> >>> Potentially a third approach is to add a more native way to >>> recirculate through the pipeline (somewhat like a patch port but >>> lighter weight and with a different intention). In addition to the >>> TTL issues it could be useful in situations where you want to parse >>> deeper into the packet than the number of MPLS or vlan tags >>> supported. Often times people might want to do L3 operations after >>> popping off an MPLS tag and in that case you almost certainly end up >>> doing something like this. >>> >>> Ben, do you have any thoughts? >> >> I think I understand options 1 and 3 but I wasn't able to quickly >> figure out what option 2 is. > > Option 2 would be to do something similar to what we did with IP TTL > where you have a set operation rather than copy in/out/decrement/etc. > It gives userspace much more control because it always knows what the > packet looks like.
<rk> this will increase data structure size for both struct flow and sw_flow_key, but do-able. Also note that skb's don't have fields for mpls offsets and it needs to be calculated when modifying, although kernel code might get simplified a little since it just needs to calculate offset and update. Haven't thought through it thoroughly but I think that's the essence. Correct me if I am wrong. > >> Anyway, I mostly like option 3. One wrinkle that occurs to me (maybe >> it's obvious?) is to separate defining the inner protocol from the >> recirculation. In other words, you'd have an action, >> e.g. OVS_ACTION_ATTR_SET_MPLS_INNER_PROTO, that tells the datapath >> what to find beyond the MPLS label(s). Following that, you could >> usefully tell the kernel module to copy ttl/in out, or you could >> usefully do a separate recirculate action. > > I was thinking that the first pass would pop off the tags and then the > second pass would process the inner packet. Since when popping off > the last tag with MPLS you already set the EtherType, nothing more > needs to be done specific to MPLS. Probably you would model these > passes as connected to tables in userspace, so a common use case might > be the first table does MPLS lookup/pop and the second tables does IP > processing. For that you just need a way to recirculate and know what > pass you're on. > > It's somewhat more complicated to have userspace setup recirculation > passes implicitly where it doesn't map directly to OpenFlow but in > theory you can model it the same way for TTL copy or additional levels > of tags. <rk> note that this is case for a pure mpls and egress only. One disadvantage in this scenario is that customers with overlapping ip cannot be supported, this will not be the case for mpls/vpn and doesn't need to parse both mpls and ip. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev