Inline <rk>

On Wed, Jun 13, 2012 at 5:55 AM, Jesse Gross <je...@nicira.com> wrote:
> On Jun 13, 2012, at 12:12 PM, ravi kerur <rke...@gmail.com> wrote:
>
>> Thanks Pravin for the review comments. Inline <rk>
>>
>> On Tue, Jun 12, 2012 at 5:31 PM, Pravin Shelar <pshe...@nicira.com> wrote:
>>> Hi Ravi,
>>>
>>> I see following issues with datapath mpls support:
>>>
>>> Actions:
>>> I do not see skb->csum updated when an mpls action is performed on a
>>> packet.
>>
>> <rk> csum calculation is usually done in the kernel stack and I left
>> it that way. Moreover, there could be multiple MPLS actions for a
>> single header, so I thought it would not be efficient to recompute it
>> every time in OVS; instead we rely on the kernel during transmit.
>> Please let me know otherwise.
>>
>> Note: I have verified via tcpdump that packets are transmitted with
>> the correct checksum.
>
> skb->csum is used on the receive side by NICs that just generically compute a 
> checksum over the entire packet rather than giving a yes/no on the validity. 
> You must update it as you change the packet.

<rk> OK, will fix it.
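Probably something along these lines (a rough sketch only;
set_mpls_lse() is an invented helper name, not necessarily what the
patch will end up using):

#include <linux/skbuff.h>
#include <net/checksum.h>

/* Rewrite a 32-bit MPLS label stack entry in place while keeping a
 * CHECKSUM_COMPLETE skb->csum valid: fold out the old word and fold
 * in the new one. */
static void set_mpls_lse(struct sk_buff *skb, __be32 *stack, __be32 lse)
{
        if (skb->ip_summed == CHECKSUM_COMPLETE) {
                __be32 diff[] = { ~(*stack), lse };

                skb->csum = ~csum_partial((char *)diff, sizeof(diff),
                                          ~skb->csum);
        }
        *stack = lse;
}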
>
>
>>>
>>> TTL actions:
>>> As Jesse has mentioned before, there could be one Set action which
>>> should be able to handle the three different ttl-related actions
>>> added. e.g. OVS_ACTION_ATTR_COPY_TTL_IN can generate a set-ipv4
>>> action from userspace. OVS_ACTION_ATTR_COPY_TTL_OUT and
>>> OVS_ACTION_ATTR_DEC_MPLS_TTL can be set-mpls actions. This works for
>>> the single-label case. But ttl_in and ttl_out also need inner MPLS
>>> packet parsing in flow_extract to get the exact flow in the kernel
>>> datapath.
>>>
>>
>> <rk> I had disagreed with Jesse on this approach, and unfortunately
>> our discussion didn't continue since he was busy. My understanding is
>> that you use set actions when you are modifying multiple fields in a
>> single header, which makes perfect sense to me. In the case of MPLS
>> we are dealing with two headers, IP and MPLS.
>
> It does save some time if you modify multiple header fields but that's not 
> the main goal. The primary objective is to push as much complexity as 
> possible to userspace and flow setup time. The kernel should just expose the 
> simplest building blocks possible.
>
>> For example:
>>
>> copy_ttl_in could be mpls-outer-header to mpls-inner-header (in the
>> case of stacked labels) and mpls-header to ip for a single label.
>>
>> copy_ttl_out could be mpls-inner-header to mpls-outer-header (stacked
>> labels) and ip to mpls-header for a single label.
>>
>> dec_mpls_ttl is a single-header operation.
>>
>> Hence I think it should be left as is. Let me know if I have
>> misunderstood set operations.
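
<rk> To make my reading concrete, the single-label copy_ttl_out case
would be roughly (a sketch; copy_ttl_out_ipv4() is an invented name,
and set_mpls_lse() is the csum-safe store sketched above):

#include <linux/ip.h>

/* Copy the IPv4 TTL outward into the TTL field (low 8 bits) of the
 * top MPLS label stack entry. */
static void copy_ttl_out_ipv4(struct sk_buff *skb, const struct iphdr *nh,
                              __be32 *top_lse)
{
        u32 lse = ntohl(*top_lse);

        set_mpls_lse(skb, top_lse, htonl((lse & ~0xffu) | nh->ttl));
}

So it is inherently a two-header operation, which is why a single
set-mpls action did not seem like a natural fit to me.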
>
> I spent a fair amount of time thinking about this today and concluded that 
> neither option is very attractive as is.
>
> If we go with what you have now, I'm fairly confident that we will regret it 
> in the future.  The kernel code used to more directly implement various 
> features and prior to upstreaming we broke many of them down.  I'm happy with 
> the results of that but this time we won't have the benefit of revising 
> things later. This is particularly bad because it deviates from our usual 
> model of userspace controlling everything and here userspace won't even know 
> what the flow looks like. The effects of this tend to metastasize because 
> when userspace doesn't know what the packet looks like it can't implement 
> things that it might otherwise be able to do and more and more ends up in the 
> kernel. The other thing, which is specific to MPLS, is that there is no 
> inherent way to know the type of the payload. Userspace is vastly more likely 
> to have this information in the event that we want to do something with the 
> inner packet.  In your patch the kernel is basically assuming that the type 
> is IP (OpenFlow doesn't give us any additional information but it doesn't 
> seem like a good idea in general).

<rk> No, both the kernel and userspace take care of IPv4/IPv6 and
non-IP as well. The motivation for MPLS is to be unaware of the
underlying L2/L3 protocol and to allow fast switching based on just
the top MPLS label.
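For reference, the fast path only ever keys on the 32-bit top label
stack entry (macro names below are just for illustration):

/* MPLS label stack entry: label(20) | TC(3) | S(1) | TTL(8). */
#define MPLS_LSE_LABEL(lse)     (ntohl(lse) >> 12)
#define MPLS_LSE_TC(lse)        ((ntohl(lse) >> 9) & 0x7)
#define MPLS_LSE_BOS(lse)       ((ntohl(lse) >> 8) & 0x1)
#define MPLS_LSE_TTL(lse)       (ntohl(lse) & 0xff)

Nothing below the top entry has to be parsed to forward the packet.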

>
> Using the approach that I had suggested largely avoids these problems, and if
> we could go that route by adding, say, the ability to match on another tag,
> then I think that would be reasonable. However, it still has a variation of
> the last problem: although userspace may know what the inner type is, unless
> it can tell that to the kernel there is no way to get to the inner flow. This
> makes copying inward to the IP header difficult. Perhaps we could add a
> mechanism for userspace to tell the kernel the type of the inner packet,
> although it might not know either.

<rk> Note that we are also increasing the size of the flow key and the
flow here, which is not required in the first case. Is it good to
increase the memory footprint if the majority of the flows handled by
OVS are non-MPLS? It can be justified if OVS is in the core and
handles only MPLS traffic.
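For concreteness, matching on one more tag plus an inner-type hint
would mean carrying something like this in every flow key (field names
are hypothetical):

#include <linux/types.h>

/* Extra per-key state if we match on the top label and an inner
 * ethertype; these bytes get hashed and compared on every lookup,
 * MPLS traffic or not. */
struct mpls_key {
        __be32 top_lse;         /* Top label stack entry. */
        __be16 inner_type;      /* Payload ethertype, per userspace. */
};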

>
> Potentially a third approach is to add a more native way to recirculate 
> through the pipeline (somewhat like a patch port but lighter weight and with 
> a different intention). In addition to the TTL issues it could be useful in 
> situations where you want to parse deeper into the packet than the number of 
> MPLS or vlan tags supported.  Often, people might want to do L3
> operations after popping off an MPLS tag, and in that case you almost
> certainly end up doing something like this.

<rk> This is the case when the packet has a single MPLS label and the
configured action is pop_mpls: in addition to the MPLS label lookup,
an L3 lookup happens as well.
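In other words, something like the sketch below (purely illustrative;
every function name here is a stand-in, not the real datapath API):

static int recirculate(struct datapath *dp, struct sk_buff *skb)
{
        struct sw_flow_key key;
        struct sw_flow *flow;
        int err;

        /* Re-parse from L2 up: after pop_mpls the L3 header is at the
         * top of the packet and can populate the flow key. */
        err = extract_flow_key(skb, &key);
        if (err)
                return err;

        flow = lookup_flow(dp, &key);
        if (!flow)
                return upcall_to_userspace(dp, skb, &key);

        return execute_actions(dp, skb, flow);
}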

>
> Ben, do you have any thoughts?
>
>> I am not sure whether we should take the inner-MPLS-header parsing
>> approach in the kernel, because it deviates from the spec. The only
>> deviation so far in the code vs. the spec is the handling of the ttl
>> and tos bits for the innermost MPLS header, and it is done that way
>> because of a practical, deployed use case.
>
> The kernel doesn't implement OpenFlow, so it doesn't matter what the spec 
> says. The only thing that needs to be consistent is the interface that 
> userspace exposes to the outside world. Everything else is an internal 
> implementation detail.

<rk> I was referring to struct sw_flow_key, and at least I thought it
was motivated by OpenFlow.