Re: [ovs-dev] MPLS important comments (was: Re: Patch for MPLS)

Ravi.Kerur Tue, 13 Mar 2012 12:14:21 -0700

Thanks again Ben. Comments inline <rk>

On Mon, Mar 12, 2012 at 09:58:15PM +0100, ravi.ke...@telekom.com wrote:
> flow_extract() now has code to skip over inner VLAN tags.  What's the
> rationale for that?  It seems unwise to me, because it makes it
> impossible for a controller to distinguish a packet with one VLAN tag
> from one with a dozen.  The implementation is also not correct as far
> as I can tell, because it can overrun the end of the packet.
> 
> For the same reason I wonder whether we should really skip over all
> the MPLS labels?
> 
> <rk> I don't know of any use-case where a packet can carry more than
> 2 vlan tags. Idea is to parse all vlan_tags/mpls_labels such that
> packet->l3 is correctly set after the parse. While parsing multiple
> tags/labels update flow->vlan_tci and flow->mpls_lse only after
> parsing first tag/label. If this is not the right way to do it let
> me know how you plan to handle this. I will incorporate it.


I'm not aware of a legitimate use case for more than 2 VLAN tags,
either, but I bet someone is doing it.

Currently OVS can distinguish:

        1. No VLAN tag.
        2. One VLAN tag.
        3. Two or more VLAN tags.

In cases 1 and 2, OVS can extract additional flow data; in case 3, OVS
can only report that there were at least two VLAN tags.

The "natural" way to support QinQ is to adjust the situation to:

        1. No VLAN tag.
        2. One VLAN tag.
        3. Two VLAN tags.
        4. Three or more VLAN tags.

such that in cases 1, 2, and 3, OVS can extract additional flow data
and in case 4, OVS can only report that there were at least three VLAN
tags.

Otherwise, you can end up with a security hole.  If, for example, our
current implementation simply skipped over second and subsequent VLAN
tags, then an OpenFlow flow could not detect QinQ packets at all and
would not know to drop them to avoid a VLAN-hopping vulnerability.
Here's a description of an attack mode that sounds likely to me:
http://blog.ine.com/2011/01/26/a-vlan-hopping-attack/

I don't know MPLS at all well, but I have no reason to believe that
the same situation couldn't come up with MPLS.

<rk> I kind of understand vlan qinq issue now and will handle it in vlan qinq 
patch. For mpls there is no such limit as 2 labels stack. I have tested 
multiple labels and single label mpls stack and made sure appropriate actions 
and parsing happens on that packet.

> I think that the definitions in mf_is_value_valid() and mf_get_value()
> are contradictory.  mf_is_value_valid() insists that the label be in
> the least significant 20 bits of value->be32, but mf_get_value() puts
> it in the most significant 20 bits.  I suggest that both should use
> the least significant 20 bits.
> 
> I think that mf_random_value() has the same problem.
> 
> <rk> Code is written with flow->mpls_lse following MPLS format
> because it is easier for anyone debugging it. Issue is when a user
> configures for a match based on a label it's in lower 20 bits. In
> mf_set_value, flow->mpls_lse is updated and it updates according to
> MPLS format. For nx_put_match, it follows the spec by putting the
> label in lower 20bits. mf_is_value_valid is called from
> nx_pull_match in which case label is in lower 20bits and hence the
> deviation.

I think that you are confusing two different issues.  I agree that
flow->mpls_lse should be in the MPLS header format, and I am not
suggesting a change.  But the be32 that appears in the "union
mf_value" used by the meta-flow.c functions does not correspond to
flow->mpls_lse.  It corresponds to the format of NXM_NX_MPLS_LABEL.
You should define that the same way as OpenFlow 1.2 does; otherwise,
we will have to do unnecessary work when we implement OpenFlow 1.2
support, and then carry that work forward forever.

Here is the OpenFlow 1.2 definition.  As you can see, it puts the
label in the low 20 bits of OXM_OF_MPLS_LABEL, not in the high 20
bits:

    /* The LABEL in the first MPLS shim header.
     *
     * Prereqs:
     *   OXM_OF_ETH_TYPE must match 0x8847 or 0x8848 exactly.
     *
     * Format: 32-bit integer in network byte order with 12 most-significant
     * bits forced to 0. Only the lower 20 bits have meaning.
     *
     * Masking: Not maskable. */
    #define OXM_OF_MPLS_LABEL  OXM_HEADER  (0x8000, OFPXMT_OFB_MPLS_LABEL, 4)

> mf_set_value, mf_get_value and mf_random_value follow the same
> format. I am bit reluctant to make changes now since it requires
> complete code changes and I am not seeing any added value in doing
> it.

The changes should be pretty localized to meta-flow.c.

<rk> Thanks for clarification. I have followed the spec and for 
NXM_NX_MPLS_LABEL it's always in the lower 20 bits and for NXM_NX_MPLS_TC it's 
always in lower 3 bits. In the previous comments (MPLS less-important 
comments), comment was a misnomer. I have fixed the comment. I will fix 
mf_get_value function. 

> It appears that push_mpls() refuses to add an MPLS header if there is
> a VLAN tag and the inner Ethertype is not IPv4.  Why?
> 
> <rk> I didn't think any other L3 protocol would fit into the
> use-case. For IPV6, it has its own label and I am not sure there
> would be a case for <VLAN/IPV6> packet to push a new MPLS
> label. Make sense? Or are you referring to other L3 protocol aka
> IPX...?

Do you mean that there is a different Ethertype for IPv6-in-MPLS?  Or
do you mean something different? 

<rk> I meant to say I don't know of any use case where MPLS shim header would 
be pushed on an IPv6 packet since IPv6 has its own flow label. Ethtype is the 
same for all layer-3 protocols. I will add IPv6 support with the assumption 
that there might be a use case which requires MPLS shim header for IPv6 
packets. Please let me know if I have misunderstood your question.

> ofproto/ofproto-dpif.c
> ----------------------
> 
> Did you carefully work out how the actions are implemented?  Does the
> right thing happen (a no-op) if an OpenFlow flow has a "push mpls"
> action followed by a "pop mpls" action?  etc.
> 
> <rk> I don't think push_mpls followed by pop_mpls as a nop is a
> valid use case. However, pop_mpls followed by push_mpls is a valid
> use case for label swap. I have tested code for both scenarios with
> latest code and it works fine.

OK, so what happens if you do push_mpls followed by pop_mpls within a
series of actions?  The result should make sense.

<rk> It's a no-op and I have tested it. What I meant to say was that usually 
you don't push_mpls/pop_mpls on a flow since it wouldn't do anything where as 
pop_mpls/push_mpls corresponds to a specific mpls action.

> I'm not sure, in fact, that it makes sense to try to delay committing
> MPLS actions.  It seems tricky, to me, at least.  I'd be tempted to
> write them to the ODP actions immediately rather than trying to defer
> it.  The code would be simpler and more obviously correct, at any rate.
> 
> <rk> When I started off implementation I wanted to take similar
> approach of the existing code since there is nothing special in
> MPLS. Is there a specific reason why existing code does it? I have
> made necessary changes not to delay mpls actions.

The existing code does it because Nicira has controllers that do
complicated dances with VLANs and other fields, changing them back and
forth and often not doing any output actions in between, so otherwise
we were ending up with kernel actions that did a lot of wasted changes
to fields.

<rk> thanks for clarification.

> tests/test-mpls.c
> -----------------
> 
> I don't understand why this program is here.  You should be able to
> write unit tests for this feature without actually opening raw
> sockets, etc.  We have sufficient tools to test packets through the
> "dummy" interfaces, which some of the existing unit tests.  At any
> rate I believe that this test program requires "root" privileges to
> work.
> 
> <rk> test-mpls.c is the main unit-test function I have used for all
> mpls and regression testing. For "root" privileges, I believe even
> "make install" or "make check" requires it, so I am not sure why
> this is a special case?

"make check" does not require root privileges.  It cannot, because the
unit tests run in builds by Debian and other distros where root
privileges are not available.

> You didn't add any unit tests, just this test program.  That means
> that "make check" doesn't actually test anything MPLS related.  We
> need at least basic unit tests.
> 
> <rk> I will add unit-test cases for mpls. Since I need to do this
> for vlan qinq as well, I would like to combine them and send it as a
> separate patch.

I'm happy to take a separate patch with unit tests, but I'd rather
have one per feature, that is, one patch to add unit tests for MPLS
and one patch to add unit tests for QinQ.

<rk> thanks, will send separate patch.

> <rk> Now, a request. It's really hard to carry around this huge
> change for a long time, because everytime I merge I see conflict in
> different areas of code and have to retest it each time. If you
> could let me push the modified code(I will send out updated patch
> shortly) into targeted release it would really help me. I will work
> on unit-tests shortly after the push and get back to you at the
> earliest. I have made sure "make check" matches latest code in
> master. Since new code is not breaking existing functionality it
> would really help me if it finds a home.

We don't put unfinished code into our repository.  You can make your
life easier, though, a couple of ways.  First, there's no need to
merge or rebase often.  You don't even have to do it before sending
out a revised patch, as long as you let us know what commit it's based
on.  Second, if retesting is a problem then it means that the tests
aren't automatic enough.  "make check TESTSUITEFLAGS=-j8" runs in 26
seconds here, which is not a large burden.

<rk> I think we are not in agreement here. I am not saying testing existing OVS 
functionality is a burden. What I meant to say was mpls tests are not automated 
yet and I have to go through manual testing on 2 systems to make sure nothing 
is broken after every merge/rebase. It would be really helpful if I can just 
work on unit-tests as a separate patch such that it makes it easier and faster 
to test the mpls functionality. Furthermore, it doesn't break any existing 
functionality and hence I am making the case.

Thanks
Ravi
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] MPLS important comments (was: Re: Patch for MPLS)

Reply via email to