On Fri, Feb 15, 2013 at 1:56 AM, Simon Horman <ho...@verge.net.au> wrote: > Allow datapath to recognize and extract MPLS labels into flow keys > and execute actions which push, pop, and set labels on packets. > > Based heavily on work by Leo Alterman and Ravi K. > > Cc: Ravi K <rke...@gmail.com> > Cc: Leo Alterman <lalter...@nicira.com> > Reviewed-by: Isaku Yamahata <yamah...@valinux.co.jp> > Signed-off-by: Simon Horman <ho...@verge.net.au>
I received some sparse errors: CHECK /home/jgross/openvswitch/datapath/linux/datapath.c /home/jgross/openvswitch/datapath/linux/datapath.c:74:5: warning: symbol 'ovs_dp_ioctl_hook' was not declared. Should it be static? /home/jgross/openvswitch/datapath/linux/datapath.c:87:13: warning: restricted __be16 degrades to integer /home/jgross/openvswitch/datapath/linux/datapath.c:87:13: warning: restricted __be16 degrades to integer The first one looks like it is due to a rebase error and the second one seems like a real endian problem. A couple high level comments: * I'd like to nail down the format of the userspace/kernel interface for multiple levels of tags. We don't need to implement multiple levels but we should have a good plan on how we would cleanly extend the interface to do that in the future if we want. Ben and I had talked about using an array of tags in a single Netlink attribute, which seems fairly clean and avoids needing lots of encap attributes. * I'm not really excited about enabling userspace to work blind, such as popping off multiple levels of tags or manipulating protocols beyond what it can see. This is currently a problem for vlans and the combination of MPLS and vlans makes things worse. Not to sound like a broken record but recirculation would solve this problem once and for all. Also, in the case of vlans, even though OpenFlow and OVS do support this behavior currently, I don't know of anyone using it since they don't have any visibility into the packet either. I'll make a few comments on the code below but the second one impacts a number of places so it's important to get resolved first. > diff --git a/datapath/actions.c b/datapath/actions.c > index f638ffc..60522be 100644 > --- a/datapath/actions.c > +++ b/datapath/actions.c > +static int push_mpls(struct sk_buff *skb, const struct ovs_action_push_mpls > *mpls) > +{ > + u32 l2_size; > + __be32 *new_mpls_lse; > + > + if (skb_cow_head(skb, MPLS_HLEN) < 0) { > + kfree_skb(skb); > + return -ENOMEM; > + } I don't think there is a reason to free the skb on error here since it will get handled later on like other actions. vlans require a special case but that doesn't mean that we have to propagate it. > + l2_size = skb_cb_l2_size(skb); > + skb_push(skb, MPLS_HLEN); > + memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb), > l2_size); > + skb_reset_mac_header(skb); Should we set the mac len as well? > + new_mpls_lse = (__be32 *)(skb_mac_header(skb) + l2_size); > + *new_mpls_lse = mpls->mpls_lse; > + > + set_ethertype(skb, mpls->mpls_ethertype); > + return 0; > +} There are potentially a number of issues with offloading here: if we have a packet that requires some form of offloading (say TSO) and we push an MPLS header on the front, then it's quite likely that the NIC can no longer handle the packet and we must emulate in software first. That would ideally mean GSO support for MPLS (as was done for vlan). In addition, as MPLS labels are modified we need to update skb->csum in the CHECKSUM_COMPLETE case (it looks like we have a similar issue for some of the other modifications as well). > +static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse) > +{ > + __be16 current_ethertype = get_ethertype(skb); > + if (eth_p_mpls(current_ethertype)) > + memcpy(skb_cb_mpls_bos(skb), mpls_lse, sizeof(__be32)); You could use MPLS_HLEN here. > diff --git a/datapath/datapath.c b/datapath/datapath.c > index 87c96ae..1386917 100644 > --- a/datapath/datapath.c > +++ b/datapath/datapath.c > +void skb_cb_set_l2_size(struct sk_buff *skb) > +{ > + struct ethhdr *eth; > + int nh_ofs; > + __be16 dl_type = 0; > + > + skb_reset_mac_header(skb); > + > + eth = eth_hdr(skb); > + nh_ofs = sizeof(struct ethhdr); > + if (likely(eth->h_proto >= htons(ETH_TYPE_MIN))) { > + dl_type = eth->h_proto; > + > + while (dl_type == htons(ETH_P_8021Q) && > + skb->len >= nh_ofs + sizeof(struct vlan_hdr)) > { > + struct vlan_hdr *vh = (struct vlan_hdr*)(skb->data + > nh_ofs); > + dl_type = vh->h_vlan_encapsulated_proto; > + nh_ofs += sizeof(struct vlan_hdr); > + } > + > + OVS_CB(skb)->l2_size = nh_ofs; > + } else { > + OVS_CB(skb)->l2_size = 0; > + } > +} It really seems like this should happen as part of flow extraction, similar to the initialization of the other layer pointers, rather than separately in every place that we get a packet. However, I don't think that we should be skipping over more vlan headers than we can parse anyways so if we follow the model that I mentioned above then the initialization should just happen naturally. > +unsigned char *skb_cb_mpls_bos(const struct sk_buff *skb) > +{ > + return skb_mac_header(skb) + OVS_CB(skb)->l2_size; > +} Isn't this actually supposed to point to the top of the stack? (I think it does but the name seems wrong.) > +ptrdiff_t skb_cb_l2_size(const struct sk_buff *skb) > +{ > + return OVS_CB(skb)->l2_size; > +} It seems like in some places we use this accessor function and in other places just retrieve the size directly. Does it provide much value? > @@ -667,6 +706,11 @@ static int validate_set(const struct nlattr *a, > > return validate_tp_port(flow_key); > > + case OVS_KEY_ATTR_MPLS: > + if (!eth_p_mpls(flow_key->eth.type)) > + return -EINVAL; I think this will fail if you try to first push and then set an MPLS header. > diff --git a/datapath/flow.c b/datapath/flow.c > index fad9e19..27e1920 100644 > --- a/datapath/flow.c > +++ b/datapath/flow.c > @@ -728,6 +728,17 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, > struct sw_flow_key *key, > memcpy(key->ipv4.arp.tha, arp->ar_tha, ETH_ALEN); > key_len = SW_FLOW_KEY_OFFSET(ipv4.arp); > } > + } else if (eth_p_mpls(key->eth.type)) { > + error = check_header(skb, MPLS_HLEN); > + if (unlikely(error)) > + goto out; > + > + key_len = SW_FLOW_KEY_OFFSET(mpls.top_lse); > + memcpy(&key->mpls.top_lse, skb_network_header(skb), > MPLS_HLEN); > + > + /* Update network header */ > + skb_set_network_header(skb, skb_network_header(skb) - > + skb->data + MPLS_HLEN); Is it possible to actually use this network header pointer? > @@ -838,6 +849,7 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = { > [OVS_KEY_ATTR_ETHERNET] = sizeof(struct ovs_key_ethernet), > [OVS_KEY_ATTR_VLAN] = sizeof(__be16), > [OVS_KEY_ATTR_ETHERTYPE] = sizeof(__be16), > + [OVS_KEY_ATTR_MPLS] = sizeof(struct ovs_key_mpls), Can you put this under the non-upstream group? > @@ -1473,6 +1495,15 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key > *swkey, struct sk_buff *skb) > arp_key->arp_op = htons(swkey->ip.proto); > memcpy(arp_key->arp_sha, swkey->ipv4.arp.sha, ETH_ALEN); > memcpy(arp_key->arp_tha, swkey->ipv4.arp.tha, ETH_ALEN); > + } else if (eth_p_mpls(swkey->eth.type)) { > + struct ovs_key_mpls *mpls_key; > + > + nla = nla_reserve(skb, OVS_KEY_ATTR_MPLS, sizeof(*mpls_key)); > + if (!nla) > + goto nla_put_failure; > + mpls_key = nla_data(nla); > + memset(mpls_key, 0, sizeof(struct ovs_key_mpls)); Is there any reason for this memset here? > diff --git a/datapath/flow.h b/datapath/flow.h > index 6949640..d8e350c 100644 > --- a/datapath/flow.h > +++ b/datapath/flow.h > @@ -73,6 +73,9 @@ struct sw_flow_key { > __be16 type; /* Ethernet frame type. */ > } eth; > struct { > + __be32 top_lse; /* top label stack entry */ > + } mpls; We should be able to make this a union with the IP headers: since we stop parsing after the MPLS header we can never have both at the same time so we can avoid expanding the size of the struct in the non-MPLS case. > +#define ETH_TYPE_MIN 0x600 This really seems like it belongs in the upstream kernel someplace since there's a million uses of this constant. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev