Inline <rk>

On Wed, Jun 13, 2012 at 5:55 AM, Jesse Gross <je...@nicira.com> wrote:
> On Jun 13, 2012, at 12:12 PM, ravi kerur <rke...@gmail.com> wrote:
>> Thanks Pravin for the review comments. Inline <rk>
>>
>> On Tue, Jun 12, 2012 at 5:31 PM, Pravin Shelar <pshe...@nicira.com> wrote:
>>> Hi Ravi,
>>>
>>> I see the following issues with datapath MPLS support:
>>>
>>> Actions:
>>> I do not see skb->csum updated when an mpls action is performed on a
>>> packet.
>>
>> <rk> Checksum calculation is usually done in the kernel stack, and I
>> left it there. Moreover, there could be multiple mpls actions for a
>> single header, so I thought it would not be efficient to update the
>> checksum every time in OVS, but rather to rely on the kernel at
>> transmit. Please let me know otherwise.
>>
>> Note: (I have verified via tcpdump that packets are transmitted with
>> the correct checksum.)
>
> skb->csum is used on the receive side by NICs that just generically
> compute a checksum over the entire packet rather than giving a yes/no
> on its validity. You must update it as you change the packet.
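For readers following along, the incremental update Jesse is asking for can be sketched in userspace terms (the kernel has its own csum helpers for this; the function names below are illustrative, not OVS code). skb->csum in the CHECKSUM_COMPLETE case is a running ones'-complement sum over the whole packet, so a rewrite of a few bytes can be folded into the existing sum without recomputing it from scratch:

```python
# Sketch, assuming a CHECKSUM_COMPLETE-style sum: a folded ones'-complement
# sum over the entire packet, maintained incrementally as bytes change.

def csum_partial(data: bytes, total: int = 0) -> int:
    """Ones'-complement sum over data, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def csum_update(total: int, old: bytes, new: bytes) -> int:
    """Replace old bytes' contribution with new bytes' in an existing sum."""
    # Subtracting in ones'-complement == adding the complement of the old sum.
    total += (~csum_partial(old)) & 0xFFFF
    total = (total & 0xFFFF) + (total >> 16)
    return csum_partial(new, total)

packet = bytearray(b"\x00\x01\x08\x40\x3f\x00" * 4)   # toy payload
full = csum_partial(bytes(packet))                    # sum over whole packet

# Rewrite 4 bytes (e.g. an MPLS LSE with TTL 0x3f -> 0x3e) and patch the sum.
old_lse, new_lse = bytes(packet[2:6]), b"\x08\x40\x3e\x00"
incremental = csum_update(full, old_lse, new_lse)
packet[2:6] = new_lse
assert incremental == csum_partial(bytes(packet))     # matches full recompute
```

The same idea applies to pushing or popping a header: add (or subtract) the sum over just the inserted (or removed) bytes.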
<rk> ok, will fix it.

>>> TTL actions:
>>> As Jesse has mentioned before, there could be one Set action able to
>>> handle the three different TTL-related actions added. e.g.
>>> OVS_ACTION_ATTR_COPY_TTL_IN can generate a set-ipv4 action from
>>> userspace. OVS_ACTION_ATTR_COPY_TTL_OUT and
>>> OVS_ACTION_ATTR_DEC_MPLS_TTL can be set-mpls actions. This works for
>>> the single-label case, but ttl_in and ttl_out also need mpls
>>> inner-packet parsing in flow_extract to get the exact flow in the
>>> kernel datapath.
>>
>> <rk> I had disagreed with Jesse on this approach, and unfortunately
>> our discussion didn't continue since he was busy. My understanding is
>> that you use set actions when you are modifying multiple fields in a
>> single header, and that makes perfect sense to me. In the case of
>> mpls we are dealing with 2 headers: ip and mpls.
>
> It does save some time if you modify multiple header fields, but
> that's not the main goal. The primary objective is to push as much
> complexity as possible to userspace and flow setup time. The kernel
> should just expose the simplest building blocks possible.
>
>> For example:
>>
>> copy_ttl_in copies from the outer mpls header to the inner mpls
>> header (in the case of stacked labels) and from the mpls header to ip
>> for a single label.
>>
>> copy_ttl_out copies from the inner mpls header to the outer mpls
>> header (stacked labels) and from ip to the mpls header for a single
>> label.
>>
>> dec_mpls_ttl is a single-header operation.
>>
>> Hence I think it should be left as is. Let me know if I have
>> misunderstood set operations.
>
> I spent a fair amount of time thinking about this today and concluded
> that neither option is very attractive as is.
>
> If we go with what you have now, I'm fairly confident that we will
> regret it in the future. The kernel code used to implement various
> features more directly, and prior to upstreaming we broke many of them
> down. I'm happy with the results of that, but this time we won't have
> the benefit of revising things later.
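To make the "one Set action" argument concrete: all three TTL actions under discussion reduce to rewriting one field of a 4-byte MPLS label stack entry (label:20 | TC:3 | S:1 | TTL:8, per RFC 3032), which is why userspace could lower them to a generic set-mpls (or set-ipv4) primitive at flow setup time. An illustrative sketch, not OVS code:

```python
# Illustrative only: the TTL actions expressed as rewrites of a 32-bit
# MPLS label stack entry (LSE). Field layout: label:20 | TC:3 | S:1 | TTL:8.

MPLS_TTL_MASK = 0xFF

def get_ttl(lse: int) -> int:
    return lse & MPLS_TTL_MASK

def set_ttl(lse: int, ttl: int) -> int:
    return (lse & ~MPLS_TTL_MASK & 0xFFFFFFFF) | (ttl & 0xFF)

def dec_mpls_ttl(lse: int) -> int:
    # Single-header operation: userspace already knows the old TTL from
    # the flow, so it can emit a plain "set TTL to old-1" action.
    return set_ttl(lse, get_ttl(lse) - 1)

def copy_ttl_out(outer_lse: int, inner_ttl: int) -> int:
    # Stacked labels: inner TTL -> outer LSE. For a single label the
    # source TTL would come from the IP header instead.
    return set_ttl(outer_lse, inner_ttl)

inner = (100 << 12) | 0x100 | 64   # label 100, S=1, TTL 64 -> 0x64140
outer = (200 << 12) | 10           # label 200, S=0, TTL 10 -> 0xC800A
```

The catch Pravin notes is visible here: to compute `inner_ttl` (or the inner LSE for copy_ttl_in), somebody has to have parsed the inner header, either the kernel in flow_extract or userspace via the flow key.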
> This is particularly bad because it deviates from our usual model of
> userspace controlling everything, and here userspace won't even know
> what the flow looks like. The effects of this tend to metastasize:
> when userspace doesn't know what the packet looks like, it can't
> implement things that it might otherwise be able to do, and more and
> more ends up in the kernel. The other thing, specific to MPLS, is that
> there is no inherent way to know the type of the payload. Userspace is
> vastly more likely to have this information in the event that we want
> to do something with the inner packet. In your patch the kernel is
> basically assuming that the type is IP (OpenFlow doesn't give us any
> additional information, but it doesn't seem like a good idea in
> general).

<rk> No, both kernel and userspace take care of ipv4/ipv6 and non-IP as
well. The motivation for MPLS is to be unaware of the underlying l2/l3
protocol and to allow fast switching based on just the top mpls label.

> Using the approach that I had suggested largely avoids these problems,
> and if we could go that path by adding, say, the ability to match on
> another tag, then I think that would be reasonable. However, it still
> has a variation of the last problem: although userspace may know what
> the inner type is, unless it can tell that to the kernel there is no
> way to get to the inner flow. This makes copying TTL inwards to the IP
> header difficult. Perhaps we could add a mechanism for userspace to
> tell the kernel the type of the inner packet, although it might not
> know either.

<rk> Note that we are also increasing the size of the flow key and the
flow here, which is not required in the first case. Is it good to
increase the memory footprint if the majority of flows handled by OVS
are non-MPLS? It can be justified if OVS is in the core and handles
only MPLS traffic.
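The payload-type problem Jesse describes can be seen in a short sketch of walking a label stack: the bottom-of-stack (S) bit says where the stack ends, but nothing in the MPLS headers says what follows, so the usual fallback is to peek at the IP version nibble, which is exactly the kind of guess being debated here. Hypothetical helpers, not OVS code:

```python
# Sketch of the "no inherent payload type" issue: the S bit locates the
# end of the label stack, but the payload type must be guessed (or told
# to us by userspace).
import struct

def walk_mpls(data: bytes):
    """Return (list of 32-bit LSEs, byte offset of the payload)."""
    off, stack = 0, []
    while True:
        (lse,) = struct.unpack_from("!I", data, off)
        stack.append(lse)
        off += 4
        if lse & 0x100:          # S bit set: bottom of stack
            return stack, off

def guess_payload(data: bytes, off: int) -> str:
    version = data[off] >> 4     # heuristic: IP version nibble
    return {4: "IPv4", 6: "IPv6"}.get(version, "unknown")

pkt = struct.pack("!I", (100 << 12) | 64)            # outer LSE, S=0
pkt += struct.pack("!I", (200 << 12) | 0x100 | 64)   # inner LSE, S=1
pkt += b"\x45\x00"                                   # looks like IPv4
stack, off = walk_mpls(pkt)
```

Note that the heuristic is unreliable: a non-IP payload (e.g. an Ethernet frame in a pseudowire) whose first nibble happens to be 4 or 6 would be misclassified, which is why having userspace state the inner type explicitly is attractive.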
> Potentially a third approach is to add a more native way to
> recirculate through the pipeline (somewhat like a patch port, but
> lighter weight and with a different intention). In addition to the TTL
> issues, it could be useful in situations where you want to parse
> deeper into the packet than the number of MPLS or vlan tags supported.
> Often people want to do L3 operations after popping off an MPLS tag,
> and in that case you almost certainly end up doing something like
> this.

<rk> This is the case when the packet has a single mpls label and the
configured action is pop_mpls: in addition to the mpls label lookup, an
l3 lookup happens as well.

> Ben, do you have any thoughts?
>
>> I am not sure whether we should take the inner-mpls-header-parsing
>> approach in the kernel, because it deviates from the spec. The only
>> deviation so far between the code and the spec is the handling of the
>> ttl and tos bits for the innermost mpls header, and it is done that
>> way because of a practical deployed use case.
>
> The kernel doesn't implement OpenFlow, so it doesn't matter what the
> spec says. The only thing that needs to be consistent is the interface
> that userspace exposes to the outside world. Everything else is an
> internal implementation detail.

<rk> I was referring to struct sw_flow_key, and at least I thought it
was motivated by OpenFlow.

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev