On Oct 3, 2012, at 11:38 AM, Jesse Gross wrote: > On Mon, Oct 1, 2012 at 8:52 AM, Kyle Mestery <kmest...@cisco.com> wrote: >> This is a first pass at providing a tun_key which can be >> used as the basis for flow-based tunnelling. The >> tun_key includes and replaces the tun_id in both struct >> ovs_skb_cb and struct sw_tun_key. >> >> This patch allows all existing tun_id behaviour to still work. Existing >> users of tun_id are redirected to tun_key->tun_id to retain compatibility. >> However, when the userspace code is updated to make use of the new tun_key, >> the old behaviour will be deprecated and removed. >> >> NOTE: With these changes, the tunneling code no longer assumes input and >> output keys are symmetric. If they are not, PMTUD needs to be disabled >> for tunneling to work. >> >> Signed-off-by: Kyle Mestery <kmest...@cisco.com> >> Cc: Simon Horman <ho...@verge.net.au> >> Cc: Jesse Gross <je...@nicira.com> > > I see some new compiler warnings with these changes applied: > > CC [M] /home/jgross/openvswitch/datapath/linux/flow.o > /home/jgross/openvswitch/datapath/linux/flow.c: In function > ‘ovs_flow_from_nlattrs’: > /home/jgross/openvswitch/datapath/linux/flow.c:1044:6: warning: > suggest parentheses around operand of ‘!’ or change ‘&’ to ‘&&’ or ‘!’ > to ‘~’ [-Wparentheses] > /home/jgross/openvswitch/datapath/linux/flow.c:1046:6: warning: > suggest parentheses around operand of ‘!’ or change ‘&’ to ‘&&’ or ‘!’ > to ‘~’ [-Wparentheses] > Thanks, I'll clean these up.
> Also, when I was working on the userspace portions I defined three > flags to be used instead of the current tunnel port configuration: > > Flow lib/flow.h: > #define FLOW_TNL_F_DONT_FRAGMENT (1 << 0) > #define FLOW_TNL_F_CSUM (1 << 1) > #define FLOW_TNL_F_KEY (1 << 2) > > Can you add these to the kernel header and allow them to control > encapsulation/decapsulation? This was one of the main blockers that I > ran into when doing the userspace work. > Sure, I can add this. > Finally, the remaining piece in order for this to be useful in the new > model is to allow packets to be send and received on tunnel ports > without a dest IP. Otherwise, it's not possible to create the tunnel > ports that can handle traffic on a flow basis. That could possibly be > a follow up patch though. > I'll look into this, maybe make it a subsequent patch. >> diff --git a/datapath/actions.c b/datapath/actions.c >> index ec9b595..a70cde8 100644 >> --- a/datapath/actions.c >> +++ b/datapath/actions.c >> @@ -377,6 +392,7 @@ static int do_execute_actions(struct datapath *dp, >> struct sk_buff *skb, >> int prev_port = -1; >> const struct nlattr *a; >> int rem; >> + struct ovs_key_ipv4_tunnel tun_key; > > Unfortunately, I think this needs to be on the stack in the top level > ovs_execute_actions(). Imagine this set of actions: > sample(0.5, set(tun_id=10)), output: tunnel > > When we assign tun_key the stack frame will look like this: > #1 ovs_execute_actions() > #2 do_execute_actions() > #3 sample() > #4 do_execute_actions() > #5 execute_set_action() > > tun_key will be from stack frame #4. > > However, when we actually use it the stack will look like: > #1 ovs_execute_actions() > #2 do_execute_actions() > #3 do_output() > OK, thanks for the explanation here. >> diff --git a/datapath/flow.c b/datapath/flow.c >> index d07337c..5be492e 100644 >> --- a/datapath/flow.c >> +++ b/datapath/flow.c >> static int ipv4_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_len, >> @@ -996,6 +998,7 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int >> *key_lenp, >> { >> const struct nlattr *a[OVS_KEY_ATTR_MAX + 1]; >> const struct ovs_key_ethernet *eth_key; >> + __be64 tun_id = 0; > > I think this is only used in the block where we parse > TUN_ID/IPV4_TUNNEL, so we can just declare it there. > Sure, will move it. >> @@ -1022,10 +1025,26 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, >> int *key_lenp, >> swkey->phy.in_port = DP_MAX_PORTS; >> } >> >> - if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) { >> - swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]); >> + /* OVS_KEY_ATTR_TUN_ID and OVS_KEY_ATTR_IPV4_TUNEL must both arrive >> + * together. >> + */ >> + if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID) && >> + attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) { > > I know I said that we should combine these checks together but I > realized that there is a problem. In order to maintain compatibility > in the future, we want to enforce sanity all that we want to make now. > It's fine to make this that were previously rejected OK but we can't > do the reverse. Specifically, we should enforce that whenever we have > an OVS_KEY_ATTR_IPV4_TUNNEL, it should be valid (i.e. have a > ipv4_dst). In the case of flow misses, this isn't a problem because > all of our tunnels will produce a valid IPV4_TUNNEL if they have a > TUN_ID, so we could mandate both. > > However, in the case of flow metadata, like a packet out from a > controller, we won't have IPV4_TUNNEL information until we actually > have the full userspace infrastructure. It's possible these packets > end up going back to userspace again and when we serialize the flow, > we'll break the rule that both IPV4_TUNNEL and TUN_ID come together. > > This is a long way of saying that we do have to allow these to be > separate but if both come together then we should check that the > tun_id matches and should enforce that ipv4_dst != 0 everywhere. > OK, I think I may have had it this way at one point, I'll go ahead and reformulate this check as above. >> + swkey->tun.tun_key = *tun_key; >> attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID); >> - } >> + attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL); >> + } else if ((attrs & (1ULL << OVS_KEY_ATTR_TUN_ID) && >> + !attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) || >> + (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL) && >> + !attrs & (1ULL << OVS_KEY_ATTR_TUN_ID))) >> + return -EINVAL; > > I don't believe that this else if block is necessary. If we don't get > the correct tunnel information then we won't zero out the appropriate > attributes and we'll return an error at the end for having unparsed > information. > OK, will remove this. >> @@ -1185,7 +1206,19 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 >> *in_port, __be64 *tun_id, >> break; >> >> case OVS_KEY_ATTR_TUN_ID: >> - *tun_id = nla_get_be64(nla); >> + tun_id = nla_get_be64(nla); >> + if (tun_key->tun_id != 0 && >> + tun_key->tun_id != tun_id) >> + return -EINVAL; >> + break; >> + >> + case OVS_KEY_ATTR_IPV4_TUNNEL: >> + if (tun_key->tun_id != 0) >> + tun_id = tun_key->tun_id; >> + memcpy(tun_key, nla_data(nla), >> + sizeof(*tun_key)); >> + if (tun_id != 0 && tun_id != tun_key->tun_id) >> + return -EINVAL; >> break; >> >> case OVS_KEY_ATTR_IN_PORT: > > Similar to my comments above, we should check that ipv4_dst != 0. I > think we should also use a flag to distinguish the case of receiving a > tun_id from the value 0. > Yes, I noticed in a few spots a check for tun_id == 0, and I believe it was causing issues while I was unit testing this. I'll make the change. >> diff --git a/datapath/flow.h b/datapath/flow.h >> index 02c563a..5d4fcde 100644 >> --- a/datapath/flow.h >> +++ b/datapath/flow.h >> @@ -42,7 +42,6 @@ struct sw_flow_actions { >> >> struct sw_flow_key { >> struct { >> - __be64 tun_id; /* Encapsulating tunnel ID. */ >> u32 priority; /* Packet QoS priority. */ >> u16 in_port; /* Input switch port (or >> DP_MAX_PORTS). */ >> } phy; >> @@ -92,6 +91,9 @@ struct sw_flow_key { >> } nd; >> } ipv6; >> }; >> + struct { >> + struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel >> key. */ >> + } tun; >> }; > > For this patch we should keep struct tun at the beginning of the flow. > Otherwise, the variable length flow matching won't take it into > account and it won't get matched at all. In the next patch we can > optimize it. > OK, got it, will migrate this. >> diff --git a/datapath/tunnel.c b/datapath/tunnel.c >> index d651c11..b62c469 100644 >> --- a/datapath/tunnel.c >> +++ b/datapath/tunnel.c >> @@ -1217,13 +1212,32 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff >> *skb) >> else >> inner_tos = 0; >> >> + /* If OVS_CB(skb)->tun_key is NULL, point it at the local tun_key >> here, >> + * and zero it out. >> + */ >> + if (OVS_CB(skb)->tun_key == NULL) { >> + memset(&tun_key, 0, sizeof(tun_key)); >> + OVS_CB(skb)->tun_key = &tun_key; >> + } > > Can you stick this above the inner_tos block? Currently, it's right > in the middle of the ToS code. I'd be inclined to put the address > computation before ToS as because it feels more natural but that's > less significant. > Will change this. >> if (mutable->flags & TNL_F_TOS_INHERIT) >> tos = inner_tos; >> + else if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_dst != 0) >> + tos = OVS_CB(skb)->tun_key->ipv4_tos; >> else >> tos = mutable->tos; > > We can drop all the checks for !OVS_CB(skb)->tun_key at this point > since we just initialized it. > OK. > However, if we do have OVS_CB(skb)->tun_key->ipv4_dst != 0, then it > should completely override all other ToS processing, including both > inheritance and ECN encapsulation. > Got it. >> @@ -1260,7 +1274,10 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff >> *skb) >> } >> >> /* TTL */ >> - ttl = mutable->ttl; >> + if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_dst != 0) >> + ttl = OVS_CB(skb)->tun_key->ipv4_ttl; >> + else >> + ttl = mutable->ttl; >> if (!ttl) >> ttl = ip4_dst_hoplimit(&rt_dst(rt)); > > I think the check for !ttl belongs in the else block. I can't think > of a reason why userspace would use a TTL of zero but I'd prefer to > keep these types of checks out of OVS once we move completely over to > the new model. > Will make this change. >> diff --git a/datapath/tunnel.h b/datapath/tunnel.h >> index 1924017..9ff6e61 100644 >> --- a/datapath/tunnel.h >> +++ b/datapath/tunnel.h >> +static inline void tnl_tun_key_init(struct ovs_key_ipv4_tunnel *tun_key, >> + const struct iphdr *iph, __be64 tun_id) >> +{ >> + tun_key->tun_id = tun_id; >> + tun_key->ipv4_src = iph->saddr; >> + tun_key->ipv4_dst = iph->daddr; >> + tun_key->ipv4_tos = iph->tos; >> + tun_key->ipv4_ttl = iph->ttl; > > I noticed that this leaves flags uninitialized and is currently > reporting garbage to userspace on receive. However, when we add the > new flags to the kernel interface we'll actually need to populate it > anyways since CSUM and KEY are relevant on receive. OK, will update this as well. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev