On Wed, Feb 27, 2013 at 07:39:02AM -0800, Jesse Gross wrote: > On Tue, Feb 26, 2013 at 11:10 PM, Simon Horman <ho...@verge.net.au> wrote: > > On Tue, Feb 26, 2013 at 02:49:47PM -0800, Jesse Gross wrote: > >> A couple high level comments: > >> * I'd like to nail down the format of the userspace/kernel interface > >> for multiple levels of tags. We don't need to implement multiple > >> levels but we should have a good plan on how we would cleanly extend > >> the interface to do that in the future if we want. Ben and I had > >> talked about using an array of tags in a single Netlink attribute, > >> which seems fairly clean and avoids needing lots of encap attributes. > > > > Is the implication here that the size of the array and thus depth of the > > tags is fixed? > > I think we could use a variable sized array and then use the size from > the Netlink attribute to get the number of elements. That way we > could easily expand it over time.
Thanks, I think that has worked out pretty cleanly. What I have done is to allow the attribute to be an array of one or more struct ovs_key_mpls. This doesn't actually require any code changes because its a superset of the previous arrangement of an element with exactly one struct ovs_key_mpls. However, I added a little bit of code to the datapath to allow verification parsing of the array. This is shown in the incremental patch at the bottom of this email which I will roll into the next revision of the patch to add MPLS support to the datapath (the patch this thread is about). I also added a bit of code to ovs-vswitchd to exercise this lightly which seems to work out well enough. > > >> > diff --git a/datapath/actions.c b/datapath/actions.c > >> > index f638ffc..60522be 100644 > >> > --- a/datapath/actions.c > >> > +++ b/datapath/actions.c > >> > + new_mpls_lse = (__be32 *)(skb_mac_header(skb) + l2_size); > >> > + *new_mpls_lse = mpls->mpls_lse; > >> > + > >> > + set_ethertype(skb, mpls->mpls_ethertype); > >> > + return 0; > >> > +} > >> > >> There are potentially a number of issues with offloading here: if we > >> have a packet that requires some form of offloading (say TSO) and we > >> push an MPLS header on the front, then it's quite likely that the NIC > >> can no longer handle the packet and we must emulate in software first. > >> That would ideally mean GSO support for MPLS (as was done for vlan). > > > > I take it that would be work in the core kernel code. > > Could you give me a few pointers as to what I should be looking at? > > It would mostly be the code around skb_gso_segment() that would need > to skip over and replicate the MPLS tags. dev_hard_start_xmit() also > likely needs some work to make sure that we take the software path > instead of using hardware that can't offload. Thanks. I have added it to the TODO list and will look into it. No doubt I will have more questions at that time. > >> > diff --git a/datapath/datapath.c b/datapath/datapath.c > >> > index 87c96ae..1386917 100644 > >> > --- a/datapath/datapath.c > >> > +++ b/datapath/datapath.c > >> > +void skb_cb_set_l2_size(struct sk_buff *skb) > >> > +{ > >> > + struct ethhdr *eth; > >> > + int nh_ofs; > >> > + __be16 dl_type = 0; > >> > + > >> > + skb_reset_mac_header(skb); > >> > + > >> > + eth = eth_hdr(skb); > >> > + nh_ofs = sizeof(struct ethhdr); > >> > + if (likely(eth->h_proto >= htons(ETH_TYPE_MIN))) { > >> > + dl_type = eth->h_proto; > >> > + > >> > + while (dl_type == htons(ETH_P_8021Q) && > >> > + skb->len >= nh_ofs + sizeof(struct > >> > vlan_hdr)) { > >> > + struct vlan_hdr *vh = (struct > >> > vlan_hdr*)(skb->data + nh_ofs); > >> > + dl_type = vh->h_vlan_encapsulated_proto; > >> > + nh_ofs += sizeof(struct vlan_hdr); > >> > + } > >> > + > >> > + OVS_CB(skb)->l2_size = nh_ofs; > >> > + } else { > >> > + OVS_CB(skb)->l2_size = 0; > >> > + } > >> > +} > >> > >> It really seems like this should happen as part of flow extraction, > >> similar to the initialization of the other layer pointers, rather than > >> separately in every place that we get a packet. > > > > Yes, I think that should work. > > > > Thinking out aloud: > > > > I think that it should be possible to implement your idea by > > having ovs_flow_extract() set OVS_CB(skb)->l2_size. > > > > Currently skb_cb_set_l2_size() is called in two locations, > > ovs_packet_cmd_execute() and ovs_vport_receive(). > > > > In the case of ovs_packet_cmd_execute() there is a previously a > > call to ovs_flow_extract(). > > > > And in the case of ovs_vport_receive() there is subsequently a > > call to ovs_dp_process_received_packet() which in turn calls > > ovs_flow_extract() if OVS_CB(skb)->flow is not set. > > All of that sounds right to me. > > >> > diff --git a/datapath/flow.c b/datapath/flow.c > >> > index fad9e19..27e1920 100644 > >> > --- a/datapath/flow.c > >> > +++ b/datapath/flow.c > >> > @@ -728,6 +728,17 @@ int ovs_flow_extract(struct sk_buff *skb, u16 > >> > in_port, struct sw_flow_key *key, > >> > memcpy(key->ipv4.arp.tha, arp->ar_tha, ETH_ALEN); > >> > key_len = SW_FLOW_KEY_OFFSET(ipv4.arp); > >> > } > >> > + } else if (eth_p_mpls(key->eth.type)) { > >> > + error = check_header(skb, MPLS_HLEN); > >> > + if (unlikely(error)) > >> > + goto out; > >> > + > >> > + key_len = SW_FLOW_KEY_OFFSET(mpls.top_lse); > >> > + memcpy(&key->mpls.top_lse, skb_network_header(skb), > >> > MPLS_HLEN); > >> > + > >> > + /* Update network header */ > >> > + skb_set_network_header(skb, skb_network_header(skb) - > >> > + skb->data + MPLS_HLEN); > >> > >> Is it possible to actually use this network header pointer? > > > > It is possible to use it in ovs_flow_extract_l3_onwards() in conjunction > > with the inner/outer flow changes I have in other patches (and you have > > asked me to drop). > > > > It does seem to me that it may be incorrect if there is more than one MPLS > > LSE present in the frame. > > > > But I'm not sure if either of those answers relate to the point you are > > making. Are there use-cases which concern you? > > I was just unsure of when it would be useful. It seems like if we're > not going to try to act beyond the MPLS labels that we can parse then > it makes the most sense to just leave the network layer unset (maybe > we can use it to point to the start of the MPLS stack instead of > needing a special l2 length then?). I think that removing the call to skb_set_network_header() you have isolated above is harmless enough as I don't think it is used because if frame is MPLS then no l3+ processing is done. I looked at making use of the network header to mark the top of the MPLS stack and not adding the special l2 length, however, I believe it breaks down in the situation I will now describe. Suppose an IP frame has two actions applied, dec_ttl and push_mpls. This is valid because an IP match may be made and ovs-vswitchd can install set(ip) and push_mpls actions in the datapath. Now, for some reason that I must confess that I don't fully understand at this moment the actions are installed as push_mpls and then set(ip). I believe that for the scheme you suggest above to work push_mpls (and pop_mpls) need to update the network_header such that it always points to the to of the MPLS stack. This is to allow multiple mpls actions to behave correctly, e.g. mpls_push then mpls_set. The problem being that set(ip) uses the network_header and if it has been set to the top of the MPLS stack by push_mpls then the set(ip) action will corrupt the packet. ------ incremental patch to allow OVS_KEY_ATTR_MPLS to be an array ------- diff --git a/datapath/datapath.c b/datapath/datapath.c index eb2e91b..af0dba8 100644 --- a/datapath/datapath.c +++ b/datapath/datapath.c @@ -599,8 +599,7 @@ static int validate_set(const struct nlattr *a, return -EINVAL; if (key_type > OVS_KEY_ATTR_MAX || - (ovs_key_lens[key_type] != nla_len(ovs_key) && - ovs_key_lens[key_type] != -1)) + ovs_flow_verify_key_len(key_type, ovs_key)) return -EINVAL; switch (key_type) { diff --git a/datapath/flow.c b/datapath/flow.c index c99a6f9..7528eb4 100644 --- a/datapath/flow.c +++ b/datapath/flow.c @@ -858,7 +858,7 @@ void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow) } /* The size of the argument for each %OVS_KEY_ATTR_* Netlink attribute. */ -const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = { +static const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = { [OVS_KEY_ATTR_ENCAP] = -1, [OVS_KEY_ATTR_PRIORITY] = sizeof(u32), [OVS_KEY_ATTR_IN_PORT] = sizeof(u32), @@ -881,6 +881,11 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = { [OVS_KEY_ATTR_TUN_ID] = sizeof(__be64), }; +/* Set the bit of a type in ovs_key_arrays if it is + * an array of one or more elements of size ovs_key_lens[type]. + */ +static const u64 ovs_array_keys = (1ULL << OVS_KEY_ATTR_MPLS); + static int ipv4_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_len, const struct nlattr *a[], u64 *attrs) { @@ -987,6 +992,27 @@ static int ipv6_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_len, return 0; } +int ovs_flow_verify_key_len(u16 type, const struct nlattr *nla) +{ + int expected_len = ovs_key_lens[type]; + int len = nla_len(nla); + + if (len == -1) + return 0; + + if (ovs_array_keys & (1ULL << type)) { + /* The type is an array. + * Check if len is a non-zero, non-negative multiple + * of expected_len. + */ + if (len < expected_len || len % expected_len) + return -EINVAL; + } else if (len != expected_len) + return -EINVAL; + + return 0; +} + static int parse_flow_nlattrs(const struct nlattr *attr, const struct nlattr *a[], u64 *attrsp) { @@ -997,14 +1023,12 @@ static int parse_flow_nlattrs(const struct nlattr *attr, attrs = 0; nla_for_each_nested(nla, attr, rem) { u16 type = nla_type(nla); - int expected_len; if (type > OVS_KEY_ATTR_MAX || attrs & (1ULL << type)) return -EINVAL; - expected_len = ovs_key_lens[type]; - if (nla_len(nla) != expected_len && expected_len != -1) - return -EINVAL; + if (ovs_flow_verify_key_len(type, nla)) + return -EINVAL; attrs |= 1ULL << type; a[type] = nla; @@ -1305,7 +1329,6 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp, memcpy(swkey->ipv4.arp.tha, arp_key->arp_tha, ETH_ALEN); } else if (eth_p_mpls(swkey->eth.type)) { const struct ovs_key_mpls *mpls_key; - if (!(attrs & (1ULL << OVS_KEY_ATTR_MPLS))) return -EINVAL; attrs &= ~(1ULL << OVS_KEY_ATTR_MPLS); @@ -1353,7 +1376,7 @@ int ovs_flow_metadata_from_nlattrs(struct sw_flow *flow, int key_len, const stru if (type <= OVS_KEY_ATTR_MAX && ovs_key_lens[type] > 0) { int err; - if (nla_len(nla) != ovs_key_lens[type]) + if (ovs_flow_verify_key_len(type, nla)) return -EINVAL; switch (type) { diff --git a/datapath/flow.h b/datapath/flow.h index 99429d2..27e394b 100644 --- a/datapath/flow.h +++ b/datapath/flow.h @@ -240,7 +240,7 @@ void ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow, void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow); struct sw_flow *ovs_flow_tbl_next(struct flow_table *table, u32 *bucket, u32 *idx); -extern const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1]; +int ovs_flow_verify_key_len(u16 type, const struct nlattr *nla); int ipv4_tun_from_nlattr(const struct nlattr *attr, struct ovs_key_ipv4_tunnel *tun_key); int ipv4_tun_to_nlattr(struct sk_buff *skb, diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h index 63d1cac..f7b788c 100644 --- a/include/linux/openvswitch.h +++ b/include/linux/openvswitch.h @@ -285,7 +285,7 @@ enum ovs_key_attr { OVS_KEY_ATTR_IPV4_TUNNEL, /* struct ovs_key_ipv4_tunnel */ #endif - OVS_KEY_ATTR_MPLS = 62, /* struct ovs_key_mpls */ + OVS_KEY_ATTR_MPLS = 62, /* Array of one or more struct ovs_key_mpls */ OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */ __OVS_KEY_ATTR_MAX }; _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev