On Wed, Aug 29, 2012 at 7:00 AM, Kyle Mestery <kmest...@cisco.com> wrote: > This is a first pass at providing a tun_key which can be > used as the basis for flow-based tunnelling. The tun_key > includes and replaces the tun_id in both struct ovs_skb_cb > and struct sw_tun_key. > > In ovs_skb_cb tun_key is a pointer as it is envisaged that it will grow > when support for IPv6 to an extent that inlining the structure will result > in ovs_skb_cb being larger than the 48 bytes available in skb->cb. > > As OVS does not support IPv6 as the outer transport protocol for tunnels > the IPv6 portions of this change, which appeared in the previous revision, > have been dropped in order to limit the scope and size of this patch. > > This patch allows all existing tun_id behaviour to still work. However, > when the userspace code is updated to make use of the new tun_key, the > old behaviour will be deprecated and removed. > > Signed-off-by: Kyle Mestery <kmest...@cisco.com> > Cc: Simon Horman <ho...@verge.net.au>
Thanks and sorry about the long delay. The commit message seems to be a little bit more of a historical narrative than a description of what's going on in this particular patch. Since I'd like to apply it on its own, can you make it describe the rationale a little more clearly? sparse flagged a couple of errors that look one of the calls to __find_route() in tunnel.c wasn't updated: /home/jesse/openvswitch/datapath/linux/tunnel.c:1449:69: warning: incorrect type in argument 3 (different base types) /home/jesse/openvswitch/datapath/linux/tunnel.c:1449:69: expected restricted __be32 [usertype] daddr /home/jesse/openvswitch/datapath/linux/tunnel.c:1449:69: got unsigned char [unsigned] [usertype] tos /home/jesse/openvswitch/datapath/linux/tunnel.c:1450:67: warning: incorrect type in argument 5 (different base types) /home/jesse/openvswitch/datapath/linux/tunnel.c:1450:67: expected unsigned char [unsigned] [usertype] tos /home/jesse/openvswitch/datapath/linux/tunnel.c:1450:67: got restricted __be32 [usertype] saddr > diff --git a/datapath/actions.c b/datapath/actions.c > index 208f260..83382b9 100644 > --- a/datapath/actions.c > +++ b/datapath/actions.c > @@ -343,7 +343,11 @@ static int execute_set_action(struct sk_buff *skb, > break; > > case OVS_KEY_ATTR_TUN_ID: > - OVS_CB(skb)->tun_id = nla_get_be64(nested_attr); > + OVS_CB(skb)->tun_key->tun_id = nla_get_be64(nested_attr); I think this doesn't quite work out so simply as a compatibility layer because tun_key is either NULL or pointing directly into the actions of the flow, which are supposed to be immutable. Long term this won't be a problem but in the meantime we'll have a figure out a way to fake up an appropriate tun_key. > diff --git a/datapath/datapath.h b/datapath/datapath.h > index affbf0e..5e8bff1 100644 > --- a/datapath/datapath.h > +++ b/datapath/datapath.h > @@ -96,7 +96,7 @@ struct datapath { > /** > * struct ovs_skb_cb - OVS data in skb CB > * @flow: The flow associated with this packet. May be %NULL if no flow. > - * @tun_id: ID of the tunnel that encapsulated this packet. It is 0 if the > + * @tun_key: Key for the tunnel that encapsulated this packet. Can you mention that this is NULL if the packet is not being tunneled? > diff --git a/datapath/flow.c b/datapath/flow.c > index d07337c..a68195d 100644 > --- a/datapath/flow.c > +++ b/datapath/flow.c > @@ -1023,10 +1025,17 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, > int *key_lenp, > } > > if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) { > - swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]); > + swkey->phy.tun_key.tun_id = > nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]); > attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID); > } > > + if (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) { > + struct ovs_key_ipv4_tunnel *tun_key; > + tun_key = nla_data(a[OVS_KEY_ATTR_IPV4_TUNNEL]); > + swkey->phy.tun_key = *tun_key; > + attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL); > + } Although we should accept both forms, I think if both are present we should ensure that they are consistent by checking that tun_id matches. > @@ -1162,14 +1171,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, > int *key_lenp, > * get the metadata, that is, the parts of the flow key that cannot be > * extracted from the packet itself. > */ > -int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 > *tun_id, > +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, > + struct ovs_key_ipv4_tunnel *tun_key, > const struct nlattr *attr) > { > const struct nlattr *nla; > int rem; > > *in_port = DP_MAX_PORTS; > - *tun_id = 0; > + tun_key->tun_id = 0; I think we probably should memset the whole tun_key to ensure that there isn't any uninitialized data. > int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb) > { > struct ovs_key_ethernet *eth_key; > + struct ovs_key_ipv4_tunnel *tun_key; Can you declare this in block that checks for tun_key.ipv4_dst just to keep everything close together? > + if (swkey->phy.tun_key.tun_id != cpu_to_be64(0) && > + nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, > swkey->phy.tun_key.tun_id)) Can you line up the second line with spaces to swkey->... make it a little clearer which part it belongs to prevent line wrapping? > diff --git a/datapath/flow.h b/datapath/flow.h > index 5be481e..bab5363 100644 > --- a/datapath/flow.h > +++ b/datapath/flow.h > struct sw_flow_key { > struct { > - __be64 tun_id; /* Encapsulating tunnel ID. */ > + struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel > key. */ One thing that I think we should look into is how we can both shrink this fields down for the purposes of matching (for example, I believe the flags fields is always zero here) and move it to the end of the struct (so that non-tunneled packets don't have to process it at all). We've seen that hashing and comparison are the most expensive operations in the OVS datapath so we should try to avoid introducing extra bytes in places where they aren't needed. Obviously, this becomes even more important with IPv6 tunneling. Normally I wouldn't worry about optimization of a new feature until the end but since I'd like to start putting in patches as they are ready it's important that we don't introduce regressions for existing behavior. > @@ -150,6 +150,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies); > * ------ --- ------ ----- > * OVS_KEY_ATTR_PRIORITY 4 -- 4 8 > * OVS_KEY_ATTR_TUN_ID 8 -- 4 12 > + * OVS_KEY_ATTR_IPV4_TUNNEL 18 2 4 24 I think this value might be out of date: I get 24 bytes as the base size of the struct. Also, since the two bytes of padding are explicitly part of the struct and not added by Netlink, I would include them in the struct column rather than the pad column. > +static inline void tun_key_swap_addr(struct ovs_key_ipv4_tunnel *tun_key) > +{ > + __be32 ndst = tun_key->ipv4_src; > + tun_key->ipv4_src = tun_key->ipv4_dst; > + tun_key->ipv4_dst = ndst; > +} I don't think this function is actually used anywhere. > +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key, > + const struct iphdr *iph, __be64 tun_id) > +{ > + tun_key->tun_id = tun_id; > + tun_key->ipv4_src = iph->saddr; > + tun_key->ipv4_dst = iph->daddr; > + tun_key->ipv4_tos = iph->tos; > + tun_key->ipv4_ttl = iph->ttl; > +} I'd be inclined to move this to tunnel.h since it's only used by tunnel implementations. > diff --git a/datapath/tunnel.c b/datapath/tunnel.c > index d651c11..259668c 100644 > --- a/datapath/tunnel.c > +++ b/datapath/tunnel.c > bool ovs_tnl_frag_needed(struct vport *vport, [...] > - /* > - * Assume that flow based keys are symmetric with respect to input > - * and output and use the key that we were going to put on the > - * outgoing packet for the fake received packet. If the keys are > - * not symmetric then PMTUD needs to be disabled since we won't have > - * any way of synthesizing packets. > - */ > - if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) == > - (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) > - OVS_CB(nskb)->tun_id = flow_key; I think this is the one externally visibly feature that we're dropping at this point. Can you make a note of it in the commit message and NEWS? > @@ -799,10 +786,8 @@ static void create_tunnel_header(const struct vport > *vport, > iph->ihl = sizeof(struct iphdr) >> 2; > iph->frag_off = htons(IP_DF); > iph->protocol = tnl_vport->tnl_ops->ipproto; > - iph->tos = mutable->tos; > iph->daddr = rt->rt_dst; > iph->saddr = rt->rt_src; > - iph->ttl = mutable->ttl; > if (!iph->ttl) > iph->ttl = ip4_dst_hoplimit(&rt_dst(rt)); I think these fields don't get used at this point but I'm not sure that the change is really related to the patch (and it also uses uninitialized memory). > static struct rtable *__find_route(const struct tnl_mutable_config *mutable, > - u8 ipproto, u8 tos) > + u8 ipproto, __be32 daddr, __be32 saddr, > + u8 tos) I find this order of arguments somewhat confusing (and judging by the fact that one place calls it wrong, I think you must too). How about saddr, daddr, tos, ipproto? > @@ -1037,14 +1024,16 @@ static struct rtable *find_route(struct vport *vport, > *cache = NULL; > tos = RT_TOS(tos); > > - if (likely(tos == RT_TOS(mutable->tos) && > - check_cache_valid(cur_cache, mutable))) { > + if (daddr == mutable->key.daddr && saddr == mutable->key.saddr && > + tos == RT_TOS(mutable->tos) && I think we need to replicate this check down below where we build the cache - otherwise we could cache addresses from a specific flow that don't apply to this port in general. > @@ -1219,11 +1210,21 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff > *skb) > > if (mutable->flags & TNL_F_TOS_INHERIT) > tos = inner_tos; > + else if (OVS_CB(skb)->tun_key) > + tos = OVS_CB(skb)->tun_key->ipv4_tos; > else > tos = mutable->tos; > > + if (OVS_CB(skb)->tun_key) { > + daddr = OVS_CB(skb)->tun_key->ipv4_dst; > + saddr = OVS_CB(skb)->tun_key->ipv4_src; > + } else { > + daddr = mutable->key.daddr; > + saddr = mutable->key.saddr; > + } [...] > /* TTL */ > - ttl = mutable->ttl; > + if (OVS_CB(skb)->tun_key) > + ttl = OVS_CB(skb)->tun_key->ipv4_ttl; > + else > + ttl = mutable->ttl; > if (!ttl) > ttl = ip4_dst_hoplimit(&rt_dst(rt)); I think in all of these cases we want to use a check for ipv4_dst != 0 instead of tun_key being non-NULL since we're planning on using tun_key even if only tun_id was initialized. That being said, if the full tun_key is set, that should probably override all of the other settings. When it is used userspace can directly initialize all of the values without any kernel involvement and that the direction that we want to go in. If userspace is using this action, we can assume that it accepts this new model. This also includes ECN encapsulation, since userspace can do that as well. > diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c > index ab89c5b..90041af 100644 > --- a/datapath/vport-gre.c > +++ b/datapath/vport-gre.c > @@ -101,10 +101,6 @@ static struct sk_buff *gre_update_header(const struct > vport *vport, > __be32 *options = (__be32 *)(skb_network_header(skb) + > mutable->tunnel_hlen > - GRE_HEADER_SECTION); > > - /* Work backwards over the options so the checksum is last. */ > - if (mutable->flags & TNL_F_OUT_KEY_ACTION) > - *options = be64_get_low32(OVS_CB(skb)->tun_id); Hmm, it doesn't seem like we should be removing this. > diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h > index f5c9cca..c32bb58 100644 > --- a/include/linux/openvswitch.h > +++ b/include/linux/openvswitch.h > @@ -278,7 +278,8 @@ enum ovs_key_attr { > OVS_KEY_ATTR_ICMPV6, /* struct ovs_key_icmpv6 */ > OVS_KEY_ATTR_ARP, /* struct ovs_key_arp */ > OVS_KEY_ATTR_ND, /* struct ovs_key_nd */ > - OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */ > + OVS_KEY_ATTR_TUN_ID, /* be64 tunnel ID */ > + OVS_KEY_ATTR_IPV4_TUNNEL, /* struct ovs_key_ipv4_tunnel */ Where possible we should try to avoid breaking userspace/kernel compatibility. We won't be able to completely prevent it since we want to completely switch over to the new mechanism but we should limit the number of times that we have to do it. In this case, let's keep OVS_KEY_ATTR_TUN_ID as 63. For OVS_KEY_ATTR_IPV4_TUNNEL, let's take the next high value (i.e. 62). Pretty soon we'll want to use a "real" value but until we have some userspace code that actually depends on it this will give us a chance to make any modifications we need. > diff --git a/lib/odp-util.c b/lib/odp-util.c > index 901dac3..e2c1f2a 100644 > --- a/lib/odp-util.c > +++ b/lib/odp-util.c > @@ -106,6 +106,7 @@ ovs_key_attr_to_string(enum ovs_key_attr attr) > case OVS_KEY_ATTR_ARP: return "arp"; > case OVS_KEY_ATTR_ND: return "nd"; > case OVS_KEY_ATTR_TUN_ID: return "tun_id"; > + case OVS_KEY_ATTR_IPV4_TUNNEL: return "ipv4_tunnel"; > > case __OVS_KEY_ATTR_MAX: > default: > @@ -614,6 +615,7 @@ odp_flow_key_attr_len(uint16_t type) > case OVS_KEY_ATTR_ICMPV6: return sizeof(struct ovs_key_icmpv6); > case OVS_KEY_ATTR_ARP: return sizeof(struct ovs_key_arp); > case OVS_KEY_ATTR_ND: return sizeof(struct ovs_key_nd); > + case OVS_KEY_ATTR_IPV4_TUNNEL: return sizeof(struct ovs_key_ipv4_tunnel); Since these statements aren't sensitive to ordering, I'm inclined to keep them in a more natural order based on layering. > diff --git a/lib/odp-util.h b/lib/odp-util.h > index 16f2b15..df89c72 100644 > --- a/lib/odp-util.h > +++ b/lib/odp-util.h > @@ -80,6 +80,7 @@ int odp_actions_from_string(const char *, const struct > simap *port_names, > * ------ --- ------ ----- > * OVS_KEY_ATTR_PRIORITY 4 -- 4 8 > * OVS_KEY_ATTR_TUN_ID 8 -- 4 12 > + * OVS_KEY_ATTR_IPV4_TUNNEL 18 2 4 24 Same issue with the calculation here as well. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev