Sorry about the screwed up formatting, I wrote it on a plane with a different mail client. I'm going to try again.
On Sun, Jun 3, 2012 at 6:01 PM, Jesse Gross <je...@nicira.com> wrote: > On May 24, 2012, at 2:08 AM, Simon Horman wrote: > > this is a first pass at providing a tun_key which can be used > as the basis for flow-based tunnelling. The tun_key includes and > replaces the tun_id in both struct ovs_skb_cb and struct sw_tun_key. > > In ovs_skb_cb tun_key is a pointer as it is envisaged that it will grow > when support for IPv6 to an extent that inlining the structure will result > in ovs_skb_cb being larger than the 48 bytes available in skb->cb. > > As OVS does not support IPv6 as the outer transport protocol for tunnels > the IPv6 portions of this change, which appeared in the previous revision, > have been dropped in order to limit the scope and size of this patch. > > This patch does not make any effort to retain the existing tun_id behaviour > nor does it fully implement flow-based tunnels. As such it it is incomplete > and can't be used in its current form (other than to break OVS tunnelling). > > ** Please do not apply ** > > Cc: Kyle Mestery <kmest...@cisco.com> > Signed-off-by: Simon Horman <ho...@verge.net.au> > > > Thanks and sorry again about being so slow to look at this. > > Overall, this looks pretty good to me. The main difficulty that I had was > in figuring out what should go with the old behavior and what should go with > the new since it's at an intermediate point between the two but I understand > that it's difficult to break it up in a way that both encapsulates a > particular set of functionality and isn't too large. Otherwise, I noticed a > few specific things that I noted below. > > diff --git a/datapath/flow.c b/datapath/flow.c > index d07337c..49c0dd8 100644 > --- a/datapath/flow.c > +++ b/datapath/flow.c > @@ -1162,14 +1166,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, > int *key_lenp, > * get the metadata, that is, the parts of the flow key that cannot be > * extracted from the packet itself. > */ > -int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 > *tun_id, > +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, > + struct ovs_key_ipv4_tunnel *tun_key, > const struct nlattr *attr) > { > const struct nlattr *nla; > int rem; > > *in_port = DP_MAX_PORTS; > - *tun_id = 0; > + tun_key->tun_id = 0; > > > I think we probably want to memset the entire tun_key to zero to avoid > having potentially uninitialized data in the flow. > > > @@ -1204,15 +1210,21 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, > u16 *in_port, __be64 *tun_id, > int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff > *skb) > { > struct ovs_key_ethernet *eth_key; > + struct ovs_key_ipv4_tunnel *tun_key; > struct nlattr *nla, *encap; > > if (swkey->phy.priority && > nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority)) > goto nla_put_failure; > > - if (swkey->phy.tun_id != cpu_to_be64(0) && > - nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id)) > - goto nla_put_failure; > + if (swkey->phy.tun_key.ipv4_dst) { > > > It's probably OK to use DIP equal to zero as a not present marker but we > need to enforce that it's always true - for example we shouldn't allow > somebody to setup a flow that way or receive packets with a zero address. > Alternately, we may be able to find a spare bit to indicate this, like is > done with vlans. > > In any case, I think we need to do some additional validation when setting > up flows to check reserved space, for example, as otherwise that will never > match. > > diff --git a/datapath/flow.h b/datapath/flow.h > index 5be481e..bab5363 100644 > --- a/datapath/flow.h > +++ b/datapath/flow.h > @@ -42,7 +42,7 @@ struct sw_flow_actions { > > struct sw_flow_key { > struct { > - __be64 tun_id; /* Encapsulating tunnel ID. */ > + struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel key. */ > > > This is an optimization but as we get closer I'd like to put the tun_key at > the end of struct sw_flow_key so that packets that didn't come from a tunnel > don't have to pay the cost during the lookup (this is especially true as we > add support for IPv6 tunnels). > > In a similar vein, struct ovs_key_ipv4_tunnel contains some fields that I > think can never apply for lookup such as the flags so it would be nice if we > could remove that for lookup. > > > @@ -150,6 +150,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies); > * ------ --- ------ ----- > * OVS_KEY_ATTR_PRIORITY 4 -- 4 8 > * OVS_KEY_ATTR_TUN_ID 8 -- 4 12 > + * OVS_KEY_ATTR_IPV4_TUNNEL 18 2 4 24 > > > If my math is correct, I think the size of the base struct > ova_key_ipv4_tunnel is 24 bytes. > > +static inline void tun_key_swap_addr(struct ovs_key_ipv4_tunnel *tun_key) > +{ > + __be32 ndst = tun_key->ipv4_src; > + tun_key->ipv4_src = tun_key->ipv4_dst; > + tun_key->ipv4_dst = ndst; > +} > > > I'm not quite sure when we would need to swap the addresses in a tunnel and > I didn't see any uses of this function. > > +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key, > + const struct iphdr *iph, __be64 tun_id) > +{ > + tun_key->tun_id = tun_id; > + tun_key->ipv4_src = iph->saddr; > + tun_key->ipv4_dst = iph->daddr; > + tun_key->ipv4_tos = iph->tos; > + tun_key->ipv4_ttl = iph->ttl; > +} > > > Aren't there some fields that we need to zero out to avoid problems in the > lookup? > > diff --git a/datapath/tunnel.c b/datapath/tunnel.c > index d651c11..010e513 100644 > --- a/datapath/tunnel.c > +++ b/datapath/tunnel.c > @@ -367,9 +367,9 @@ struct vport *ovs_tnl_find_port(struct net *net, __be32 > saddr, __be32 daddr, > return NULL; > } > > -static void ecn_decapsulate(struct sk_buff *skb, u8 tos) > +static void ecn_decapsulate(struct sk_buff *skb) > { > - if (unlikely(INET_ECN_is_ce(tos))) { > + if (unlikely(INET_ECN_is_ce(OVS_CB(skb)->tun_key->ipv4_tos))) { > __be16 protocol = skb->protocol; > > > This might come in a later patch, although I didn't see it in a quick scan, > but it should be possible to implement all the ECN encapsulation and > decapsulation in userspace, just like we can do with the rest of the ToS and > TTL. > > > bool ovs_tnl_frag_needed(struct vport *vport, > const struct tnl_mutable_config *mutable, > - struct sk_buff *skb, unsigned int mtu, __be64 flow_key) > + struct sk_buff *skb, unsigned int mtu, > + struct ovs_key_ipv4_tunnel *tun_key) > { > unsigned int eth_hdr_len = ETH_HLEN; > unsigned int total_length = 0, header_length = 0, payload_length; > struct ethhdr *eh, *old_eh = eth_hdr(skb); > struct sk_buff *nskb; > + struct ovs_key_ipv4_tunnel ntun_key; > > /* Sanity check */ > if (skb->protocol == htons(ETH_P_IP)) { > @@ -705,8 +707,10 @@ bool ovs_tnl_frag_needed(struct vport *vport, > * any way of synthesizing packets. > */ > if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) == > - (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) > - OVS_CB(nskb)->tun_id = flow_key; > + (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) { > + ntun_key = *tun_key; > + OVS_CB(nskb)->tun_key = &ntun_key; > + } > > > I guess this is probably where you were going to use the function to reverse > IP addresses. The logic doesn't really work but it's moot since this is > going away anyways. > > > @@ -799,10 +803,8 @@ static void create_tunnel_header(const struct vport > *vport, > iph->ihl = sizeof(struct iphdr) >> 2; > iph->frag_off = htons(IP_DF); > iph->protocol = tnl_vport->tnl_ops->ipproto; > - iph->tos = mutable->tos; > iph->daddr = rt->rt_dst; > iph->saddr = rt->rt_src; > - iph->ttl = mutable->ttl; > if (!iph->ttl) > iph->ttl = ip4_dst_hoplimit(&rt_dst(rt)); > > > I'm not sure that these changes quite belong in this patch (not that it > shouldn't be done but it seems like the supporting code isn't there yet). > > > diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c > index ab89c5b..fd2b038 100644 > --- a/datapath/vport-gre.c > +++ b/datapath/vport-gre.c > @@ -101,10 +101,6 @@ static struct sk_buff *gre_update_header(const struct > vport *vport, > __be32 *options = (__be32 *)(skb_network_header(skb) + mutable->tunnel_hlen > - GRE_HEADER_SECTION); > > - /* Work backwards over the options so the checksum is last. */ > - if (mutable->flags & TNL_F_OUT_KEY_ACTION) > - *options = be64_get_low32(OVS_CB(skb)->tun_id); > > > Why does this go away? > > diff --git a/datapath/vport.c b/datapath/vport.c > index 172261a..0c77a1b 100644 > --- a/datapath/vport.c > +++ b/datapath/vport.c > @@ -462,7 +462,7 @@ void ovs_vport_receive(struct vport *vport, struct > sk_buff *skb) > OVS_CB(skb)->flow = NULL; > > if (!(vport->ops->flags & VPORT_F_TUN_ID)) > - OVS_CB(skb)->tun_id = 0; > + OVS_CB(skb)->tun_key = NULL; > > > We probably should rename this flag now. > > diff --git a/lib/odp-util.h b/lib/odp-util.h > index d53f083..4e5a8a1 100644 > --- a/lib/odp-util.h > +++ b/lib/odp-util.h > @@ -72,6 +72,7 @@ int odp_actions_from_string(const char *, const struct > simap *port_names, > * ------ --- ------ ----- > * OVS_KEY_ATTR_PRIORITY 4 -- 4 8 > * OVS_KEY_ATTR_TUN_ID 8 -- 4 12 > + * OVS_KEY_ATTR_IPV4_TUNNEL 18 2 4 24 > > > Same thing about the size here as well. > > _______________________________________________ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev > _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev