On Thu, Jun 14, 2012 at 4:12 AM, Jesse Gross <je...@nicira.com> wrote: > On Thu, Jun 14, 2012 at 2:42 PM, ravi kerur <rke...@gmail.com> wrote: >> On Wed, Jun 13, 2012 at 8:38 PM, Jesse Gross <je...@nicira.com> wrote: >>> On Thu, Jun 14, 2012 at 9:37 AM, Ben Pfaff <b...@nicira.com> wrote: >>>> On Wed, Jun 13, 2012 at 09:55:36PM +0900, Jesse Gross wrote: >>>>> If we go with what you have now, I'm fairly confident that we will >>>>> regret it in the future. The kernel code used to more directly >>>>> implement various features and prior to upstreaming we broke many >>>>> of them down. I'm happy with the results of that but this time we >>>>> won't have the benefit of revising things later. This is >>>>> particularly bad because it deviates from our usual model of >>>>> userspace controlling everything and here userspace won't even know >>>>> what the flow looks like. The effects of this tend to metastasize >>>>> because when userspace doesn't know what the packet looks like it >>>>> can't implement things that it might overwise be able to do and >>>>> more and more ends up in the kernel. The other thing, which is >>>>> specific to MPLS, is that there is no inherent way to know the type >>>>> of the payload. Userspace is vastly more likely to have this >>>>> information in the event that we want to do something with the >>>>> inner packet. In your patch the kernel is basically assuming that >>>>> the type is IP (OpenFlow doesn't give us any additional information >>>>> but it doesn't seem like a good idea in general). >>>>> >>>>> Using the approach that I had suggested largely avoids these >>>>> problems and if we could go that path by adding say, the ability to >>>>> match on another tag, then I think that would be >>>>> reasonable. However, it still has a variation of the last problem, >>>>> which is that although it may know what the inner type is, unless it >>>>> can tell that to the kernel there is no way to get to the inner >>>>> flow. This makes copy inwards to IP header difficult. Perhaps we >>>>> could add a mechanism for userspace to tell the kernel the type of >>>>> the inner packet, although it might not know either. >>>>> >>>>> Potentially a third approach is to add a more native way to >>>>> recirculate through the pipeline (somewhat like a patch port but >>>>> lighter weight and with a different intention). In addition to the >>>>> TTL issues it could be useful in situations where you want to parse >>>>> deeper into the packet than the number of MPLS or vlan tags >>>>> supported. Often times people might want to do L3 operations after >>>>> popping off an MPLS tag and in that case you almost certainly end up >>>>> doing something like this. >>>>> >>>>> Ben, do you have any thoughts? >>>> >>>> I think I understand options 1 and 3 but I wasn't able to quickly >>>> figure out what option 2 is. >>> >>> Option 2 would be to do something similar to what we did with IP TTL >>> where you have a set operation rather than copy in/out/decrement/etc. >>> It gives userspace much more control because it always knows what the >>> packet looks like. >> >> <rk> this will increase data structure size for both struct flow and >> sw_flow_key, but do-able. Also note that skb's don't have fields for >> mpls offsets and it needs to be calculated when modifying, although >> kernel code might get simplified a little since it just needs to >> calculate offset and update. Haven't thought through it thoroughly but >> I think that's the essence. Correct me if I am wrong. > > These are implementation details and not particularly significant > ones. I want to get the big picture right first. > >>>
<rk> atleast I thought we were leaning towards option 2 until option 3 is implemented. >>>> Anyway, I mostly like option 3. One wrinkle that occurs to me (maybe >>>> it's obvious?) is to separate defining the inner protocol from the >>>> recirculation. In other words, you'd have an action, >>>> e.g. OVS_ACTION_ATTR_SET_MPLS_INNER_PROTO, that tells the datapath >>>> what to find beyond the MPLS label(s). Following that, you could >>>> usefully tell the kernel module to copy ttl/in out, or you could >>>> usefully do a separate recirculate action. >>> >>> I was thinking that the first pass would pop off the tags and then the >>> second pass would process the inner packet. Since when popping off >>> the last tag with MPLS you already set the EtherType, nothing more >>> needs to be done specific to MPLS. Probably you would model these >>> passes as connected to tables in userspace, so a common use case might >>> be the first table does MPLS lookup/pop and the second tables does IP >>> processing. For that you just need a way to recirculate and know what >>> pass you're on. >>> >>> It's somewhat more complicated to have userspace setup recirculation >>> passes implicitly where it doesn't map directly to OpenFlow but in >>> theory you can model it the same way for TTL copy or additional levels >>> of tags. >> >> >> <rk> note that this is case for a pure mpls and egress only. One >> disadvantage in this scenario is that customers with overlapping ip >> cannot be supported, this will not be the case for mpls/vpn and >> doesn't need to parse both mpls and ip. > > I don't know why you think that overlapping IPs can't be supported. A > design requirement would be that multiple passes over the same packet > could be linked up in some manner. Also, if you don't need to parse > the inner packet then you wouldn't run a second pass. <rk> all packets at last hop come with explicit null label or implicit null depending on traffic-engineering parameters and not sure how you would differentiate it. What you have described above on recirculation is what traditional routers/switches do in both software and hardware. I figured out ovs couldn't do it when I was working on qinq. After matching on outer tag and pop the tag, there was no way to match on modified packet for a second round on inner tag. So I had to implement single match on both inner and outer tag. Anyways, the point is, if you guys decide to finally support some sort of recirculation I believe it won't be specific to mpls/ip but generic to handle any recirculation. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev