On Sep 3, 2012, at 3:43 PM, Jesse Gross wrote:

> On Wed, Aug 29, 2012 at 7:00 AM, Kyle Mestery <kmest...@cisco.com> wrote:
>> This is a first pass at providing a tun_key which can be
>> used as the basis for flow-based tunnelling. The tun_key
>> includes and replaces the tun_id in both struct ovs_skb_cb
>> and struct sw_tun_key.
>> 
>> In ovs_skb_cb tun_key is a pointer as it is envisaged that it will grow
>> when support for IPv6 to an extent that inlining the structure will result
>> in ovs_skb_cb being larger than the 48 bytes available in skb->cb.
>> 
>> As OVS does not support IPv6 as the outer transport protocol for tunnels
>> the IPv6 portions of this change, which appeared in the previous revision,
>> have been dropped in order to limit the scope and size of this patch.
>> 
>> This patch allows all existing tun_id behaviour to still work. However,
>> when the userspace code is updated to make use of the new tun_key, the
>> old behaviour will be deprecated and removed.
>> 
>> Signed-off-by: Kyle Mestery <kmest...@cisco.com>
>> Cc: Simon Horman <ho...@verge.net.au>
> 
> Thanks and sorry about the long delay.
> 
No worries. Please see my comments below for each of yours. I'll send out a new
patch today, with the exception of the changes for moving the tun_key value
for matching optimizations, today.

> The commit message seems to be a little bit more of a historical
> narrative than a description of what's going on in this particular
> patch.  Since I'd like to apply it on its own, can you make it
> describe the rationale a little more clearly?
> 
> sparse flagged a couple of errors that look one of the calls to
> __find_route() in tunnel.c wasn't updated:
> 
> /home/jesse/openvswitch/datapath/linux/tunnel.c:1449:69: warning:
> incorrect type in argument 3 (different base types)
> /home/jesse/openvswitch/datapath/linux/tunnel.c:1449:69:    expected
> restricted __be32 [usertype] daddr
> /home/jesse/openvswitch/datapath/linux/tunnel.c:1449:69:    got
> unsigned char [unsigned] [usertype] tos
> /home/jesse/openvswitch/datapath/linux/tunnel.c:1450:67: warning:
> incorrect type in argument 5 (different base types)
> /home/jesse/openvswitch/datapath/linux/tunnel.c:1450:67:    expected
> unsigned char [unsigned] [usertype] tos
> /home/jesse/openvswitch/datapath/linux/tunnel.c:1450:67:    got
> restricted __be32 [usertype] saddr
> 

Fixed.

>> diff --git a/datapath/actions.c b/datapath/actions.c
>> index 208f260..83382b9 100644
>> --- a/datapath/actions.c
>> +++ b/datapath/actions.c
>> @@ -343,7 +343,11 @@ static int execute_set_action(struct sk_buff *skb,
>>                break;
>> 
>>        case OVS_KEY_ATTR_TUN_ID:
>> -               OVS_CB(skb)->tun_id = nla_get_be64(nested_attr);
>> +               OVS_CB(skb)->tun_key->tun_id = nla_get_be64(nested_attr);
> 
> I think this doesn't quite work out so simply as a compatibility layer
> because tun_key is either NULL or pointing directly into the actions
> of the flow, which are supposed to be immutable.  Long term this won't
> be a problem but in the meantime we'll have a figure out a way to fake
> up an appropriate tun_key.
> 
So, I think what we should do here is allocate OVS_CB(skb)->tun_key in
this case. We could modify struct ovs_key_ipv4_tunnel to have a flag value
indicating if it was allocated or not, to allow for correct handling at the time
of termination. Thoughts?

>> diff --git a/datapath/datapath.h b/datapath/datapath.h
>> index affbf0e..5e8bff1 100644
>> --- a/datapath/datapath.h
>> +++ b/datapath/datapath.h
>> @@ -96,7 +96,7 @@ struct datapath {
>> /**
>>  * struct ovs_skb_cb - OVS data in skb CB
>>  * @flow: The flow associated with this packet.  May be %NULL if no flow.
>> - * @tun_id: ID of the tunnel that encapsulated this packet.  It is 0 if the
>> + * @tun_key: Key for the tunnel that encapsulated this packet.
> 
> Can you mention that this is NULL if the packet is not being tunneled?
> 
Fixed.

>> diff --git a/datapath/flow.c b/datapath/flow.c
>> index d07337c..a68195d 100644
>> --- a/datapath/flow.c
>> +++ b/datapath/flow.c
>> @@ -1023,10 +1025,17 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, 
>> int *key_lenp,
>>        }
>> 
>>        if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) {
>> -               swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]);
>> +               swkey->phy.tun_key.tun_id = 
>> nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]);
>>                attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID);
>>        }
>> 
>> +       if (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) {
>> +               struct ovs_key_ipv4_tunnel *tun_key;
>> +               tun_key = nla_data(a[OVS_KEY_ATTR_IPV4_TUNNEL]);
>> +               swkey->phy.tun_key = *tun_key;
>> +               attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL);
>> +       }
> 
> Although we should accept both forms, I think if both are present we
> should ensure that they are consistent by checking that tun_id
> matches.
> 
Fixed.

>> @@ -1162,14 +1171,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, 
>> int *key_lenp,
>>  * get the metadata, that is, the parts of the flow key that cannot be
>>  * extracted from the packet itself.
>>  */
>> -int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 
>> *tun_id,
>> +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
>> +                                  struct ovs_key_ipv4_tunnel *tun_key,
>>                                   const struct nlattr *attr)
>> {
>>        const struct nlattr *nla;
>>        int rem;
>> 
>>        *in_port = DP_MAX_PORTS;
>> -       *tun_id = 0;
>> +       tun_key->tun_id = 0;
> 
> I think we probably should memset the whole tun_key to ensure that
> there isn't any uninitialized data.
> 
Fixed.

>> int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
>> {
>>        struct ovs_key_ethernet *eth_key;
>> +       struct ovs_key_ipv4_tunnel *tun_key;
> 
> Can you declare this in block that checks for tun_key.ipv4_dst just to
> keep everything close together?
> 
Fixed.

>> +       if (swkey->phy.tun_key.tun_id != cpu_to_be64(0) &&
>> +               nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, 
>> swkey->phy.tun_key.tun_id))
> 
> Can you line up the second line with spaces to swkey->... make it a
> little clearer which part it belongs to prevent line wrapping?
> 
Fixed.

>> diff --git a/datapath/flow.h b/datapath/flow.h
>> index 5be481e..bab5363 100644
>> --- a/datapath/flow.h
>> +++ b/datapath/flow.h
>> struct sw_flow_key {
>>        struct {
>> -               __be64  tun_id;         /* Encapsulating tunnel ID. */
>> +               struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel 
>> key. */
> 
> One thing that I think we should look into is how we can both shrink
> this fields down for the purposes of matching (for example, I believe
> the flags fields is always zero here) and move it to the end of the
> struct (so that non-tunneled packets don't have to process it at all).
> We've seen that hashing and comparison are the most expensive
> operations in the OVS datapath so we should try to avoid introducing
> extra bytes in places where they aren't needed.  Obviously, this
> becomes even more important with IPv6 tunneling.  Normally I wouldn't
> worry about optimization of a new feature until the end but since I'd
> like to start putting in patches as they are ready it's important that
> we don't introduce regressions for existing behavior.
> 
So, as a start, move it out of the "struct phy" here into something at the end
of "struct sw_flow_key"?

>> @@ -150,6 +150,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
>>  *                         ------  ---  ------  -----
>>  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
>>  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
>> + *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24
> 
> I think this value might be out of date: I get 24 bytes as the base
> size of the struct.  Also, since the two bytes of padding are
> explicitly part of the struct and not added by Netlink, I would
> include them in the struct column rather than the pad column.
> 
Fixed.

>> +static inline void tun_key_swap_addr(struct ovs_key_ipv4_tunnel *tun_key)
>> +{
>> +       __be32 ndst = tun_key->ipv4_src;
>> +       tun_key->ipv4_src = tun_key->ipv4_dst;
>> +       tun_key->ipv4_dst = ndst;
>> +}
> 
> I don't think this function is actually used anywhere.
> 
Removed. I checked, and later patches did not use it either.

>> +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
>> +                               const struct iphdr *iph, __be64 tun_id)
>> +{
>> +       tun_key->tun_id = tun_id;
>> +       tun_key->ipv4_src = iph->saddr;
>> +       tun_key->ipv4_dst = iph->daddr;
>> +       tun_key->ipv4_tos = iph->tos;
>> +       tun_key->ipv4_ttl = iph->ttl;
>> +}
> 
> I'd be inclined to move this to tunnel.h since it's only used by
> tunnel implementations.
> 
Moved.

>> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
>> index d651c11..259668c 100644
>> --- a/datapath/tunnel.c
>> +++ b/datapath/tunnel.c
>> bool ovs_tnl_frag_needed(struct vport *vport,
> [...]
>> -       /*
>> -        * Assume that flow based keys are symmetric with respect to input
>> -        * and output and use the key that we were going to put on the
>> -        * outgoing packet for the fake received packet.  If the keys are
>> -        * not symmetric then PMTUD needs to be disabled since we won't have
>> -        * any way of synthesizing packets.
>> -        */
>> -       if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
>> -           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
>> -               OVS_CB(nskb)->tun_id = flow_key;
> 
> I think this is the one externally visibly feature that we're dropping
> at this point.  Can you make a note of it in the commit message and
> NEWS?
> 
Updated, please review.

>> @@ -799,10 +786,8 @@ static void create_tunnel_header(const struct vport 
>> *vport,
>>        iph->ihl        = sizeof(struct iphdr) >> 2;
>>        iph->frag_off   = htons(IP_DF);
>>        iph->protocol   = tnl_vport->tnl_ops->ipproto;
>> -       iph->tos        = mutable->tos;
>>        iph->daddr      = rt->rt_dst;
>>        iph->saddr      = rt->rt_src;
>> -       iph->ttl        = mutable->ttl;
>>        if (!iph->ttl)
>>                iph->ttl = ip4_dst_hoplimit(&rt_dst(rt));
> 
> I think these fields don't get used at this point but I'm not sure
> that the change is really related to the patch (and it also uses
> uninitialized memory).
> 
Added back in.

>> static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
>> -                                  u8 ipproto, u8 tos)
>> +                                  u8 ipproto, __be32 daddr, __be32 saddr,
>> +                                  u8 tos)
> 
> I find this order of arguments somewhat confusing (and judging by the
> fact that one place calls it wrong, I think you must too).  How about
> saddr, daddr, tos, ipproto?
> 
Fixed __find_route() and it's callers to include the order. I agree, it makes
more sense with the ordering you suggest here.

>> @@ -1037,14 +1024,16 @@ static struct rtable *find_route(struct vport *vport,
>>        *cache = NULL;
>>        tos = RT_TOS(tos);
>> 
>> -       if (likely(tos == RT_TOS(mutable->tos) &&
>> -           check_cache_valid(cur_cache, mutable))) {
>> +       if (daddr == mutable->key.daddr && saddr == mutable->key.saddr &&
>> +           tos == RT_TOS(mutable->tos) &&
> 
> I think we need to replicate this check down below where we build the
> cache - otherwise we could cache addresses from a specific flow that
> don't apply to this port in general.
> 
Replicated.

>> @@ -1219,11 +1210,21 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff 
>> *skb)
>> 
>>        if (mutable->flags & TNL_F_TOS_INHERIT)
>>                tos = inner_tos;
>> +       else if (OVS_CB(skb)->tun_key)
>> +               tos = OVS_CB(skb)->tun_key->ipv4_tos;
>>        else
>>                tos = mutable->tos;
>> 
>> +       if (OVS_CB(skb)->tun_key) {
>> +               daddr = OVS_CB(skb)->tun_key->ipv4_dst;
>> +               saddr = OVS_CB(skb)->tun_key->ipv4_src;
>> +       } else {
>> +               daddr = mutable->key.daddr;
>> +               saddr = mutable->key.saddr;
>> +       }
> [...]
>>        /* TTL */
>> -       ttl = mutable->ttl;
>> +       if (OVS_CB(skb)->tun_key)
>> +               ttl = OVS_CB(skb)->tun_key->ipv4_ttl;
>> +       else
>> +               ttl = mutable->ttl;
>>        if (!ttl)
>>                ttl = ip4_dst_hoplimit(&rt_dst(rt));
> 
> I think in all of these cases we want to use a check for ipv4_dst != 0
> instead of tun_key being non-NULL since we're planning on using
> tun_key even if only tun_id was initialized.
> 
Updated.

> That being said, if the full tun_key is set, that should probably
> override all of the other settings.  When it is used userspace can
> directly initialize all of the values without any kernel involvement
> and that the direction that we want to go in.  If userspace is using
> this action, we can assume that it accepts this new model.  This also
> includes ECN encapsulation, since userspace can do that as well.
> 
Agree, I think it's covered now.

>> diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
>> index ab89c5b..90041af 100644
>> --- a/datapath/vport-gre.c
>> +++ b/datapath/vport-gre.c
>> @@ -101,10 +101,6 @@ static struct sk_buff *gre_update_header(const struct 
>> vport *vport,
>>        __be32 *options = (__be32 *)(skb_network_header(skb) + 
>> mutable->tunnel_hlen
>>                                               - GRE_HEADER_SECTION);
>> 
>> -       /* Work backwards over the options so the checksum is last. */
>> -       if (mutable->flags & TNL_F_OUT_KEY_ACTION)
>> -               *options = be64_get_low32(OVS_CB(skb)->tun_id);
> 
> Hmm, it doesn't seem like we should be removing this.
> 
Added back in.

>> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
>> index f5c9cca..c32bb58 100644
>> --- a/include/linux/openvswitch.h
>> +++ b/include/linux/openvswitch.h
>> @@ -278,7 +278,8 @@ enum ovs_key_attr {
>>        OVS_KEY_ATTR_ICMPV6,    /* struct ovs_key_icmpv6 */
>>        OVS_KEY_ATTR_ARP,       /* struct ovs_key_arp */
>>        OVS_KEY_ATTR_ND,        /* struct ovs_key_nd */
>> -       OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */
>> +       OVS_KEY_ATTR_TUN_ID,    /* be64 tunnel ID */
>> +       OVS_KEY_ATTR_IPV4_TUNNEL,  /* struct ovs_key_ipv4_tunnel */
> 
> Where possible we should try to avoid breaking userspace/kernel
> compatibility.  We won't be able to completely prevent it since we
> want to completely switch over to the new mechanism but we should
> limit the number of times that we have to do it.  In this case, let's
> keep OVS_KEY_ATTR_TUN_ID as 63.  For OVS_KEY_ATTR_IPV4_TUNNEL, let's
> take the next high value (i.e. 62).  Pretty soon we'll want to use a
> "real" value but until we have some userspace code that actually
> depends on it this will give us a chance to make any modifications we
> need.
> 
Updated.

>> diff --git a/lib/odp-util.c b/lib/odp-util.c
>> index 901dac3..e2c1f2a 100644
>> --- a/lib/odp-util.c
>> +++ b/lib/odp-util.c
>> @@ -106,6 +106,7 @@ ovs_key_attr_to_string(enum ovs_key_attr attr)
>>     case OVS_KEY_ATTR_ARP: return "arp";
>>     case OVS_KEY_ATTR_ND: return "nd";
>>     case OVS_KEY_ATTR_TUN_ID: return "tun_id";
>> +    case OVS_KEY_ATTR_IPV4_TUNNEL: return "ipv4_tunnel";
>> 
>>     case __OVS_KEY_ATTR_MAX:
>>     default:
>> @@ -614,6 +615,7 @@ odp_flow_key_attr_len(uint16_t type)
>>     case OVS_KEY_ATTR_ICMPV6: return sizeof(struct ovs_key_icmpv6);
>>     case OVS_KEY_ATTR_ARP: return sizeof(struct ovs_key_arp);
>>     case OVS_KEY_ATTR_ND: return sizeof(struct ovs_key_nd);
>> +    case OVS_KEY_ATTR_IPV4_TUNNEL: return sizeof(struct 
>> ovs_key_ipv4_tunnel);
> 
> Since these statements aren't sensitive to ordering, I'm inclined to
> keep them in a more natural order based on layering.
> 
Fixed the ordering here.

>> diff --git a/lib/odp-util.h b/lib/odp-util.h
>> index 16f2b15..df89c72 100644
>> --- a/lib/odp-util.h
>> +++ b/lib/odp-util.h
>> @@ -80,6 +80,7 @@ int odp_actions_from_string(const char *, const struct 
>> simap *port_names,
>>  *                         ------  ---  ------  -----
>>  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
>>  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
>> + *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24
> 
> Same issue with the calculation here as well.

Fixed.

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to