On Sep 6, 2012, at 10:24 PM, Jesse Gross wrote: > On Wed, Sep 5, 2012 at 2:58 PM, Kyle Mestery <kmest...@cisco.com> wrote: >> This is a first pass at providing a tun_key which can be >> used as the basis for flow-based tunnelling. The tun_key >> includes and replaces the tun_id in both struct ovs_skb_cb >> and struct sw_tun_key. >> >> This patch allows all existing tun_id behaviour to still work. Existing >> users of tun_id are redirected to tun_key->tun_id to retain compatibility. >> However, when the userspace code is updated to make use of the new tun_key, >> the old behaviour will be deprecated and removed. >> >> NOTE: With these changes, the tunneling code no longer assumes input and >> output keys are symmetric. If they are not, PMTUD needs to be disabled >> for tunneling to work. >> >> Signed-off-by: Kyle Mestery <kmest...@cisco.com> >> Cc: Simon Horman <ho...@verge.net.au> >> Cc: Jesse Gross <je...@nicira.com> > > A couple high level Linux kernel style comments: > * I noticed a couple places that use spaces instead of tabs. > * The multiline comment style used in networking (although not > currently respected everywhere) is: > /* Comment > * comment > */ OK, cleaned those up.
> >> diff --git a/NEWS b/NEWS >> index cbc5c58..0abf2fa 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -1,5 +1,7 @@ >> post-v1.8.0 >> ------------------------ >> + - The tunneling code no longer assumes input and output keys are >> symmetric. >> + If they are not, PMTUD needs to be disabled for tunneling to work. > > I would probably note that this only applies in the case of flow-based keys. > Fixed. >> diff --git a/datapath/actions.c b/datapath/actions.c >> index ec9b595..31f6bd6 100644 >> --- a/datapath/actions.c >> +++ b/datapath/actions.c >> @@ -377,6 +381,16 @@ static int do_execute_actions(struct datapath *dp, >> struct sk_buff *skb, >> int prev_port = -1; >> const struct nlattr *a; >> int rem; >> + struct ovs_key_ipv4_tunnel tun_key; >> + >> + /* >> + * If tun_key is NULL for this skb, point it at one on the stack for >> + * action processing and output. This can disappear once we drop >> support >> + * for setting tun_id outside of tun_key. >> + */ >> + memset(&tun_key, 0, sizeof(struct ovs_key_ipv4_tunnel)); >> + if (OVS_CB(skb)->tun_key == NULL) >> + OVS_CB(skb)->tun_key = &tun_key; > > I think ideally we want to push this down as far as possible. struct > ovs_key_ipv4_tunnel needs to be on the stack in do_execute_actions() > so that it is in scope when we output but it's best to do the > memsetting and assignment only when (and if) we actually set the > tun_id. We can just pass a pointer into execute_set_action(). > OK, fixed. >> diff --git a/datapath/flow.c b/datapath/flow.c >> index d07337c..3b85c62 100644 >> --- a/datapath/flow.c >> +++ b/datapath/flow.c >> @@ -1023,10 +1026,28 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, >> int *key_lenp, >> } >> >> if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) { >> - swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]); >> + tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]); >> attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID); >> } >> >> + if (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) { >> + struct ovs_key_ipv4_tunnel *tun_key; >> + tun_key = nla_data(a[OVS_KEY_ATTR_IPV4_TUNNEL]); >> + >> + /* >> + * If both OVS_KEY_ATTR_TUN_ID and OVS_KEY_ATTR_IPV4_TUNNEL >> are >> + * present, ensure they are consistent or error out. >> + */ >> + if (tun_id != 0) { >> + /* Verify they match */ >> + if (tun_id != tun_key->tun_id) >> + return -EINVAL; >> + } >> + >> + swkey->tun.tun_key = *tun_key; >> + attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL); >> + } > > I think if only OVS_KEY_ATTR_TUN_ID is sent then this will result in > tun_key never getting set at all. This shouldn't happen in practice > because on flow setup we always will send both together and they > should be echoed back together. It doesn't seem right to silently > ignore one though. > > Here's a different option - we could enforce that both must come > together and where there is overlap the information is consistent. > This should be easier to check and we can avoid ambiguities around the > value zero. It also treats them as one, which they effectively are. > On both the sent and receive sides I would combine the if blocks to > really make it explicit that they go together. > Done. I checked for both being set, and compared the values if so. I then checked for only one being set, and returned an error if so. >> +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, >> + struct ovs_key_ipv4_tunnel *tun_key, >> const struct nlattr *attr) >> { >> const struct nlattr *nla; >> int rem; >> >> *in_port = DP_MAX_PORTS; >> - *tun_id = 0; >> + memset(tun_key, 0, sizeof(struct ovs_key_ipv4_tunnel)); > > Can you use sizeof(*tun_key) here? > Fixed. >> @@ -1185,7 +1207,12 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 >> *in_port, __be64 *tun_id, >> break; >> >> case OVS_KEY_ATTR_TUN_ID: >> - *tun_id = nla_get_be64(nla); >> + tun_key->tun_id = nla_get_be64(nla); >> + break; >> + >> + case OVS_KEY_ATTR_IPV4_TUNNEL: >> + memcpy(tun_key, nla_data(nla), >> + sizeof(*tun_key)); > > We should enforce consistency here as well (this one is probably more > likely to be broken anyways because flow setups just echo back what > the kernel supplied but this is generated "by hand"). In this case we > can't enforce that both come together. > I don't think there is an implicit order to the netlink attributes here, so what I did was something like this: diff --git a/datapath/flow.c b/datapath/flow.c index 1c4eb99..5be492e 100644 --- a/datapath/flow.c +++ b/datapath/flow.c @@ -1187,9 +1187,10 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, { const struct nlattr *nla; int rem; + __be64 tun_id = 0; *in_port = DP_MAX_PORTS; - memset(tun_key, 0, sizeof(struct ovs_key_ipv4_tunnel)); + memset(tun_key, 0, sizeof(*tun_key)); *priority = 0; nla_for_each_nested(nla, attr, rem) { @@ -1205,12 +1206,19 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, break; case OVS_KEY_ATTR_TUN_ID: - tun_key->tun_id = nla_get_be64(nla); + tun_id = nla_get_be64(nla); + if (tun_key->tun_id != 0 && + tun_key->tun_id != tun_id) + return -EINVAL; break; case OVS_KEY_ATTR_IPV4_TUNNEL: + if (tun_key->tun_id != 0) + tun_id = tun_key->tun_id; memcpy(tun_key, nla_data(nla), sizeof(*tun_key)); + if (tun_id != 0 && tun_id != tun_key->tun_id) + return -EINVAL; break; >> diff --git a/datapath/flow.h b/datapath/flow.h >> index 02c563a..5d4fcde 100644 >> --- a/datapath/flow.h >> +++ b/datapath/flow.h >> @@ -42,7 +42,6 @@ struct sw_flow_actions { >> >> struct sw_flow_key { >> struct { >> - __be64 tun_id; /* Encapsulating tunnel ID. */ >> u32 priority; /* Packet QoS priority. */ >> u16 in_port; /* Input switch port (or >> DP_MAX_PORTS). */ >> } phy; >> @@ -92,6 +91,9 @@ struct sw_flow_key { >> } nd; >> } ipv6; >> }; >> + struct { >> + struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel >> key. */ >> + } tun; >> }; > > The solution for this is a little more complicated because for > efficiency we only match on as much of the flow struct as is > necessary. Since we currently calculate how far we need to search > (i.e. TCP/IPv4) without taking into account a tunnel header at the end > this will never match on the tunnel. The other less serious problem > is that this reduces the effectiveness of the partial length match > when tunneling because IPv4 packets will have to match over a series > of zeros where the IPv6 header exists. Basically we want to stick > tunnel headers at the end of the current match, where ever that might > be. > > One way of doing this is to add a copy of the tun_key to each union > that could possibly be the end of the struct. We essentially do that > with the TCP/UDP ports which could follow either a IPv4 or IPv6 > header. However in the case of tunnels there are many more possible > end locations (I count 7) and whereas L4 ports logically follows the > L3 header that's not really the case with tunnels, we're just moving > them around because we can and the benefit is large. > > I think what I would do is to make struct tun completely independent > of struct sw_flow_key and then have it "float" over where ever we > decide is the end of the flow struct. It's a little messy if you want > random access to it because you have to compute the size of the > earlier members but in practice don't think we ever do so it shouldn't > be too bad. > I think I see what you're looking for here. I'm going to make this a separate patch built on top of the other stuff, since it seems this may be a little more complicated as you indicate. I'll send it out with the other patch, though. >> diff --git a/datapath/tunnel.c b/datapath/tunnel.c >> index d651c11..0687f80 100644 >> --- a/datapath/tunnel.c >> +++ b/datapath/tunnel.c >> @@ -799,10 +786,10 @@ static void create_tunnel_header(const struct vport >> *vport, >> iph->ihl = sizeof(struct iphdr) >> 2; >> iph->frag_off = htons(IP_DF); >> iph->protocol = tnl_vport->tnl_ops->ipproto; >> - iph->tos = mutable->tos; >> + iph->tos = mutable->tos; >> iph->daddr = rt->rt_dst; >> iph->saddr = rt->rt_src; >> - iph->ttl = mutable->ttl; >> + iph->ttl = mutable->ttl; > > It looks like these might have been converted from tabs to spaces. > Fixed. >> @@ -1029,7 +1017,8 @@ static struct rtable *__find_route(const struct >> tnl_mutable_config *mutable, >> >> static struct rtable *find_route(struct vport *vport, >> const struct tnl_mutable_config *mutable, >> - u8 tos, struct tnl_cache **cache) >> + u8 tos, __be32 daddr, __be32 saddr, >> + struct tnl_cache **cache) > > Can we make the argument list here match __find_route (where > possible)? In particular, it's confusing that daddr and saddr are in > opposite order and no static checker will be able to catch those > mistakes. > Corrected. >> @@ -1219,11 +1213,23 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff >> *skb) >> >> if (mutable->flags & TNL_F_TOS_INHERIT) >> tos = inner_tos; >> + else if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_tos != 0) >> + tos = OVS_CB(skb)->tun_key->ipv4_tos; >> else >> tos = mutable->tos; >> >> + if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_dst != 0) >> + daddr = OVS_CB(skb)->tun_key->ipv4_dst; >> + else >> + daddr = mutable->key.daddr; >> + >> + if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_src != 0) >> + saddr = OVS_CB(skb)->tun_key->ipv4_src; >> + else >> + saddr = mutable->key.saddr; > > For these checks I think we actually want to always check for > tun_key->ipv4_dst != 0 as indicator for which type of tunnel key we > are using rather than to check for the actual value. For example, in > the case of ipv4_tos it's quite possible that 0 is the actual value > that is intended to be set. If it is set then I would use it, > overriding any other settings. > Corrected. >> diff --git a/datapath/tunnel.h b/datapath/tunnel.h >> index 1924017..78a4d14 100644 >> --- a/datapath/tunnel.h >> +++ b/datapath/tunnel.h >> @@ -286,4 +286,14 @@ static inline struct tnl_vport *tnl_vport_priv(const >> struct vport *vport) >> return vport_priv(vport); >> } >> >> +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key, >> + const struct iphdr *iph, __be64 tun_id) > > Can you use tnl_ as the prefix here like the other functions in this file? > Fixed. >> diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c >> index 05a099d..1e08d5a 100644 >> --- a/datapath/vport-capwap.c >> +++ b/datapath/vport-capwap.c >> @@ -220,7 +220,7 @@ static struct sk_buff *capwap_update_header(const struct >> vport *vport, >> struct capwaphdr_wsi *wsi = (struct capwaphdr_wsi *)(cwh + 1); >> struct capwaphdr_wsi_key *opt = (struct capwaphdr_wsi_key >> *)(wsi + 1); >> >> - opt->key = OVS_CB(skb)->tun_id; >> + opt->key = OVS_CB(skb)->tun_key->tun_id; > > In theory it's possible that tun_key is NULL (at the moment we always > initialize to something but that's not the long term goal), which > means that this will crash. While we could add a check here I think > it's probably better to do it in the generic tunneling code since once > we switch over to flow based tunneling it will be required that you > set the tun_key (for the IP addresses if for no other reason). What I > would do is add a check for tun_key == NULL in tnl_send(). If it is > NULL we can add a zeroed out dummy since there isn't a requirement > that the user call set tun_id yet. > Checked added in ovs_tnl_send(). >> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h >> index f5c9cca..f9f0c66 100644 >> --- a/include/linux/openvswitch.h >> +++ b/include/linux/openvswitch.h >> @@ -278,7 +278,8 @@ enum ovs_key_attr { >> OVS_KEY_ATTR_ICMPV6, /* struct ovs_key_icmpv6 */ >> OVS_KEY_ATTR_ARP, /* struct ovs_key_arp */ >> OVS_KEY_ATTR_ND, /* struct ovs_key_nd */ >> - OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */ >> + OVS_KEY_ATTR_IPV4_TUNNEL, /* struct ovs_key_ipv4_tunnel */ >> + OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */ > > I would assign OVS_KEY_ATTR_IPV4_TUNNEL the value of 62 for the time > being until we are ready to lock things down. > Got it, done. >> diff --git a/lib/odp-util.c b/lib/odp-util.c >> index 901dac3..5a17e11 100644 >> --- a/lib/odp-util.c >> +++ b/lib/odp-util.c >> @@ -99,13 +99,14 @@ ovs_key_attr_to_string(enum ovs_key_attr attr) >> case OVS_KEY_ATTR_ETHERTYPE: return "eth_type"; >> case OVS_KEY_ATTR_IPV4: return "ipv4"; >> case OVS_KEY_ATTR_IPV6: return "ipv6"; >> + case OVS_KEY_ATTR_TUN_ID: return "tun_id"; >> + case OVS_KEY_ATTR_IPV4_TUNNEL: return "ipv4_tunnel"; >> case OVS_KEY_ATTR_TCP: return "tcp"; >> case OVS_KEY_ATTR_UDP: return "udp"; >> case OVS_KEY_ATTR_ICMP: return "icmp"; >> case OVS_KEY_ATTR_ICMPV6: return "icmpv6"; >> case OVS_KEY_ATTR_ARP: return "arp"; >> case OVS_KEY_ATTR_ND: return "nd"; >> - case OVS_KEY_ATTR_TUN_ID: return "tun_id"; >> >> case __OVS_KEY_ATTR_MAX: >> default: >> @@ -601,13 +602,14 @@ odp_flow_key_attr_len(uint16_t type) >> switch ((enum ovs_key_attr) type) { >> case OVS_KEY_ATTR_ENCAP: return -2; >> case OVS_KEY_ATTR_PRIORITY: return 4; >> - case OVS_KEY_ATTR_TUN_ID: return 8; >> case OVS_KEY_ATTR_IN_PORT: return 4; >> case OVS_KEY_ATTR_ETHERNET: return sizeof(struct ovs_key_ethernet); >> case OVS_KEY_ATTR_VLAN: return sizeof(ovs_be16); >> case OVS_KEY_ATTR_ETHERTYPE: return 2; >> case OVS_KEY_ATTR_IPV4: return sizeof(struct ovs_key_ipv4); >> case OVS_KEY_ATTR_IPV6: return sizeof(struct ovs_key_ipv6); >> + case OVS_KEY_ATTR_TUN_ID: return 8; >> + case OVS_KEY_ATTR_IPV4_TUNNEL: return sizeof(struct >> ovs_key_ipv4_tunnel); > > Since the IP keys actually refer to the inner header I would logically > group the tunnel attributes together where they are in the second > block - after priority. > OK, makes sense. Corrected. >> diff --git a/lib/odp-util.h b/lib/odp-util.h >> index 16f2b15..250302e 100644 >> --- a/lib/odp-util.h >> +++ b/lib/odp-util.h >> @@ -90,12 +91,12 @@ int odp_actions_from_string(const char *, const struct >> simap *port_names, >> * OVS_KEY_ATTR_ICMPV6 2 2 4 8 >> * OVS_KEY_ATTR_ND 28 -- 4 32 >> * ------------------------------------------------- >> - * total 156 >> + * total 184 >> * >> * We include some slack space in case the calculation isn't quite right or >> we >> * add another field and forget to adjust this value. >> */ >> -#define ODPUTIL_FLOW_KEY_BYTES 200 >> +#define ODPUTIL_FLOW_KEY_BYTES 204 > > Since this is just an arbitrary amount of padding we can probably keep > it at 200 for the time being as a nice round number. Fixed. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev