On Sep 3, 2012, at 3:43 PM, Jesse Gross wrote: > On Wed, Aug 29, 2012 at 7:00 AM, Kyle Mestery <kmest...@cisco.com> wrote: >> This is a first pass at providing a tun_key which can be >> used as the basis for flow-based tunnelling. The tun_key >> includes and replaces the tun_id in both struct ovs_skb_cb >> and struct sw_tun_key. >> >> In ovs_skb_cb tun_key is a pointer as it is envisaged that it will grow >> when support for IPv6 to an extent that inlining the structure will result >> in ovs_skb_cb being larger than the 48 bytes available in skb->cb. >> >> As OVS does not support IPv6 as the outer transport protocol for tunnels >> the IPv6 portions of this change, which appeared in the previous revision, >> have been dropped in order to limit the scope and size of this patch. >> >> This patch allows all existing tun_id behaviour to still work. However, >> when the userspace code is updated to make use of the new tun_key, the >> old behaviour will be deprecated and removed. >> >> Signed-off-by: Kyle Mestery <kmest...@cisco.com> >> Cc: Simon Horman <ho...@verge.net.au> > > Thanks and sorry about the long delay. > No worries. Please see my comments below for each of yours. I'll send out a new patch today, with the exception of the changes for moving the tun_key value for matching optimizations, today.
> The commit message seems to be a little bit more of a historical > narrative than a description of what's going on in this particular > patch. Since I'd like to apply it on its own, can you make it > describe the rationale a little more clearly? > > sparse flagged a couple of errors that look one of the calls to > __find_route() in tunnel.c wasn't updated: > > /home/jesse/openvswitch/datapath/linux/tunnel.c:1449:69: warning: > incorrect type in argument 3 (different base types) > /home/jesse/openvswitch/datapath/linux/tunnel.c:1449:69: expected > restricted __be32 [usertype] daddr > /home/jesse/openvswitch/datapath/linux/tunnel.c:1449:69: got > unsigned char [unsigned] [usertype] tos > /home/jesse/openvswitch/datapath/linux/tunnel.c:1450:67: warning: > incorrect type in argument 5 (different base types) > /home/jesse/openvswitch/datapath/linux/tunnel.c:1450:67: expected > unsigned char [unsigned] [usertype] tos > /home/jesse/openvswitch/datapath/linux/tunnel.c:1450:67: got > restricted __be32 [usertype] saddr > Fixed. >> diff --git a/datapath/actions.c b/datapath/actions.c >> index 208f260..83382b9 100644 >> --- a/datapath/actions.c >> +++ b/datapath/actions.c >> @@ -343,7 +343,11 @@ static int execute_set_action(struct sk_buff *skb, >> break; >> >> case OVS_KEY_ATTR_TUN_ID: >> - OVS_CB(skb)->tun_id = nla_get_be64(nested_attr); >> + OVS_CB(skb)->tun_key->tun_id = nla_get_be64(nested_attr); > > I think this doesn't quite work out so simply as a compatibility layer > because tun_key is either NULL or pointing directly into the actions > of the flow, which are supposed to be immutable. Long term this won't > be a problem but in the meantime we'll have a figure out a way to fake > up an appropriate tun_key. > So, I think what we should do here is allocate OVS_CB(skb)->tun_key in this case. We could modify struct ovs_key_ipv4_tunnel to have a flag value indicating if it was allocated or not, to allow for correct handling at the time of termination. Thoughts? >> diff --git a/datapath/datapath.h b/datapath/datapath.h >> index affbf0e..5e8bff1 100644 >> --- a/datapath/datapath.h >> +++ b/datapath/datapath.h >> @@ -96,7 +96,7 @@ struct datapath { >> /** >> * struct ovs_skb_cb - OVS data in skb CB >> * @flow: The flow associated with this packet. May be %NULL if no flow. >> - * @tun_id: ID of the tunnel that encapsulated this packet. It is 0 if the >> + * @tun_key: Key for the tunnel that encapsulated this packet. > > Can you mention that this is NULL if the packet is not being tunneled? > Fixed. >> diff --git a/datapath/flow.c b/datapath/flow.c >> index d07337c..a68195d 100644 >> --- a/datapath/flow.c >> +++ b/datapath/flow.c >> @@ -1023,10 +1025,17 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, >> int *key_lenp, >> } >> >> if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) { >> - swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]); >> + swkey->phy.tun_key.tun_id = >> nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]); >> attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID); >> } >> >> + if (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) { >> + struct ovs_key_ipv4_tunnel *tun_key; >> + tun_key = nla_data(a[OVS_KEY_ATTR_IPV4_TUNNEL]); >> + swkey->phy.tun_key = *tun_key; >> + attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL); >> + } > > Although we should accept both forms, I think if both are present we > should ensure that they are consistent by checking that tun_id > matches. > Fixed. >> @@ -1162,14 +1171,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, >> int *key_lenp, >> * get the metadata, that is, the parts of the flow key that cannot be >> * extracted from the packet itself. >> */ >> -int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 >> *tun_id, >> +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, >> + struct ovs_key_ipv4_tunnel *tun_key, >> const struct nlattr *attr) >> { >> const struct nlattr *nla; >> int rem; >> >> *in_port = DP_MAX_PORTS; >> - *tun_id = 0; >> + tun_key->tun_id = 0; > > I think we probably should memset the whole tun_key to ensure that > there isn't any uninitialized data. > Fixed. >> int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb) >> { >> struct ovs_key_ethernet *eth_key; >> + struct ovs_key_ipv4_tunnel *tun_key; > > Can you declare this in block that checks for tun_key.ipv4_dst just to > keep everything close together? > Fixed. >> + if (swkey->phy.tun_key.tun_id != cpu_to_be64(0) && >> + nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, >> swkey->phy.tun_key.tun_id)) > > Can you line up the second line with spaces to swkey->... make it a > little clearer which part it belongs to prevent line wrapping? > Fixed. >> diff --git a/datapath/flow.h b/datapath/flow.h >> index 5be481e..bab5363 100644 >> --- a/datapath/flow.h >> +++ b/datapath/flow.h >> struct sw_flow_key { >> struct { >> - __be64 tun_id; /* Encapsulating tunnel ID. */ >> + struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel >> key. */ > > One thing that I think we should look into is how we can both shrink > this fields down for the purposes of matching (for example, I believe > the flags fields is always zero here) and move it to the end of the > struct (so that non-tunneled packets don't have to process it at all). > We've seen that hashing and comparison are the most expensive > operations in the OVS datapath so we should try to avoid introducing > extra bytes in places where they aren't needed. Obviously, this > becomes even more important with IPv6 tunneling. Normally I wouldn't > worry about optimization of a new feature until the end but since I'd > like to start putting in patches as they are ready it's important that > we don't introduce regressions for existing behavior. > So, as a start, move it out of the "struct phy" here into something at the end of "struct sw_flow_key"? >> @@ -150,6 +150,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies); >> * ------ --- ------ ----- >> * OVS_KEY_ATTR_PRIORITY 4 -- 4 8 >> * OVS_KEY_ATTR_TUN_ID 8 -- 4 12 >> + * OVS_KEY_ATTR_IPV4_TUNNEL 18 2 4 24 > > I think this value might be out of date: I get 24 bytes as the base > size of the struct. Also, since the two bytes of padding are > explicitly part of the struct and not added by Netlink, I would > include them in the struct column rather than the pad column. > Fixed. >> +static inline void tun_key_swap_addr(struct ovs_key_ipv4_tunnel *tun_key) >> +{ >> + __be32 ndst = tun_key->ipv4_src; >> + tun_key->ipv4_src = tun_key->ipv4_dst; >> + tun_key->ipv4_dst = ndst; >> +} > > I don't think this function is actually used anywhere. > Removed. I checked, and later patches did not use it either. >> +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key, >> + const struct iphdr *iph, __be64 tun_id) >> +{ >> + tun_key->tun_id = tun_id; >> + tun_key->ipv4_src = iph->saddr; >> + tun_key->ipv4_dst = iph->daddr; >> + tun_key->ipv4_tos = iph->tos; >> + tun_key->ipv4_ttl = iph->ttl; >> +} > > I'd be inclined to move this to tunnel.h since it's only used by > tunnel implementations. > Moved. >> diff --git a/datapath/tunnel.c b/datapath/tunnel.c >> index d651c11..259668c 100644 >> --- a/datapath/tunnel.c >> +++ b/datapath/tunnel.c >> bool ovs_tnl_frag_needed(struct vport *vport, > [...] >> - /* >> - * Assume that flow based keys are symmetric with respect to input >> - * and output and use the key that we were going to put on the >> - * outgoing packet for the fake received packet. If the keys are >> - * not symmetric then PMTUD needs to be disabled since we won't have >> - * any way of synthesizing packets. >> - */ >> - if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) == >> - (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) >> - OVS_CB(nskb)->tun_id = flow_key; > > I think this is the one externally visibly feature that we're dropping > at this point. Can you make a note of it in the commit message and > NEWS? > Updated, please review. >> @@ -799,10 +786,8 @@ static void create_tunnel_header(const struct vport >> *vport, >> iph->ihl = sizeof(struct iphdr) >> 2; >> iph->frag_off = htons(IP_DF); >> iph->protocol = tnl_vport->tnl_ops->ipproto; >> - iph->tos = mutable->tos; >> iph->daddr = rt->rt_dst; >> iph->saddr = rt->rt_src; >> - iph->ttl = mutable->ttl; >> if (!iph->ttl) >> iph->ttl = ip4_dst_hoplimit(&rt_dst(rt)); > > I think these fields don't get used at this point but I'm not sure > that the change is really related to the patch (and it also uses > uninitialized memory). > Added back in. >> static struct rtable *__find_route(const struct tnl_mutable_config *mutable, >> - u8 ipproto, u8 tos) >> + u8 ipproto, __be32 daddr, __be32 saddr, >> + u8 tos) > > I find this order of arguments somewhat confusing (and judging by the > fact that one place calls it wrong, I think you must too). How about > saddr, daddr, tos, ipproto? > Fixed __find_route() and it's callers to include the order. I agree, it makes more sense with the ordering you suggest here. >> @@ -1037,14 +1024,16 @@ static struct rtable *find_route(struct vport *vport, >> *cache = NULL; >> tos = RT_TOS(tos); >> >> - if (likely(tos == RT_TOS(mutable->tos) && >> - check_cache_valid(cur_cache, mutable))) { >> + if (daddr == mutable->key.daddr && saddr == mutable->key.saddr && >> + tos == RT_TOS(mutable->tos) && > > I think we need to replicate this check down below where we build the > cache - otherwise we could cache addresses from a specific flow that > don't apply to this port in general. > Replicated. >> @@ -1219,11 +1210,21 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff >> *skb) >> >> if (mutable->flags & TNL_F_TOS_INHERIT) >> tos = inner_tos; >> + else if (OVS_CB(skb)->tun_key) >> + tos = OVS_CB(skb)->tun_key->ipv4_tos; >> else >> tos = mutable->tos; >> >> + if (OVS_CB(skb)->tun_key) { >> + daddr = OVS_CB(skb)->tun_key->ipv4_dst; >> + saddr = OVS_CB(skb)->tun_key->ipv4_src; >> + } else { >> + daddr = mutable->key.daddr; >> + saddr = mutable->key.saddr; >> + } > [...] >> /* TTL */ >> - ttl = mutable->ttl; >> + if (OVS_CB(skb)->tun_key) >> + ttl = OVS_CB(skb)->tun_key->ipv4_ttl; >> + else >> + ttl = mutable->ttl; >> if (!ttl) >> ttl = ip4_dst_hoplimit(&rt_dst(rt)); > > I think in all of these cases we want to use a check for ipv4_dst != 0 > instead of tun_key being non-NULL since we're planning on using > tun_key even if only tun_id was initialized. > Updated. > That being said, if the full tun_key is set, that should probably > override all of the other settings. When it is used userspace can > directly initialize all of the values without any kernel involvement > and that the direction that we want to go in. If userspace is using > this action, we can assume that it accepts this new model. This also > includes ECN encapsulation, since userspace can do that as well. > Agree, I think it's covered now. >> diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c >> index ab89c5b..90041af 100644 >> --- a/datapath/vport-gre.c >> +++ b/datapath/vport-gre.c >> @@ -101,10 +101,6 @@ static struct sk_buff *gre_update_header(const struct >> vport *vport, >> __be32 *options = (__be32 *)(skb_network_header(skb) + >> mutable->tunnel_hlen >> - GRE_HEADER_SECTION); >> >> - /* Work backwards over the options so the checksum is last. */ >> - if (mutable->flags & TNL_F_OUT_KEY_ACTION) >> - *options = be64_get_low32(OVS_CB(skb)->tun_id); > > Hmm, it doesn't seem like we should be removing this. > Added back in. >> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h >> index f5c9cca..c32bb58 100644 >> --- a/include/linux/openvswitch.h >> +++ b/include/linux/openvswitch.h >> @@ -278,7 +278,8 @@ enum ovs_key_attr { >> OVS_KEY_ATTR_ICMPV6, /* struct ovs_key_icmpv6 */ >> OVS_KEY_ATTR_ARP, /* struct ovs_key_arp */ >> OVS_KEY_ATTR_ND, /* struct ovs_key_nd */ >> - OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */ >> + OVS_KEY_ATTR_TUN_ID, /* be64 tunnel ID */ >> + OVS_KEY_ATTR_IPV4_TUNNEL, /* struct ovs_key_ipv4_tunnel */ > > Where possible we should try to avoid breaking userspace/kernel > compatibility. We won't be able to completely prevent it since we > want to completely switch over to the new mechanism but we should > limit the number of times that we have to do it. In this case, let's > keep OVS_KEY_ATTR_TUN_ID as 63. For OVS_KEY_ATTR_IPV4_TUNNEL, let's > take the next high value (i.e. 62). Pretty soon we'll want to use a > "real" value but until we have some userspace code that actually > depends on it this will give us a chance to make any modifications we > need. > Updated. >> diff --git a/lib/odp-util.c b/lib/odp-util.c >> index 901dac3..e2c1f2a 100644 >> --- a/lib/odp-util.c >> +++ b/lib/odp-util.c >> @@ -106,6 +106,7 @@ ovs_key_attr_to_string(enum ovs_key_attr attr) >> case OVS_KEY_ATTR_ARP: return "arp"; >> case OVS_KEY_ATTR_ND: return "nd"; >> case OVS_KEY_ATTR_TUN_ID: return "tun_id"; >> + case OVS_KEY_ATTR_IPV4_TUNNEL: return "ipv4_tunnel"; >> >> case __OVS_KEY_ATTR_MAX: >> default: >> @@ -614,6 +615,7 @@ odp_flow_key_attr_len(uint16_t type) >> case OVS_KEY_ATTR_ICMPV6: return sizeof(struct ovs_key_icmpv6); >> case OVS_KEY_ATTR_ARP: return sizeof(struct ovs_key_arp); >> case OVS_KEY_ATTR_ND: return sizeof(struct ovs_key_nd); >> + case OVS_KEY_ATTR_IPV4_TUNNEL: return sizeof(struct >> ovs_key_ipv4_tunnel); > > Since these statements aren't sensitive to ordering, I'm inclined to > keep them in a more natural order based on layering. > Fixed the ordering here. >> diff --git a/lib/odp-util.h b/lib/odp-util.h >> index 16f2b15..df89c72 100644 >> --- a/lib/odp-util.h >> +++ b/lib/odp-util.h >> @@ -80,6 +80,7 @@ int odp_actions_from_string(const char *, const struct >> simap *port_names, >> * ------ --- ------ ----- >> * OVS_KEY_ATTR_PRIORITY 4 -- 4 8 >> * OVS_KEY_ATTR_TUN_ID 8 -- 4 12 >> + * OVS_KEY_ATTR_IPV4_TUNNEL 18 2 4 24 > > Same issue with the calculation here as well. Fixed. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev