On Tue, Jun 18, 2013 at 04:06:49PM +0900, Simon Horman wrote: > Allow datapath to recognize and extract MPLS labels into flow keys > and execute actions which push, pop, and set labels on packets. > > Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe > Stringer. > > Cc: Ravi K <rke...@gmail.com> > Cc: Leo Alterman <lalter...@nicira.com> > Cc: Isaku Yamahata <yamah...@valinux.co.jp> > Cc: Joe Stringer <j...@wand.net.nz> > Signed-off-by: Simon Horman <ho...@verge.net.au> > > --- > > This patch depends on "gre: Restructure tunneling" which it aims > to be compatible with.
To clarify: the dependency is only a conflict that arises when applying this patch, which modifies datapath/linux/compat/gso.[ch], files that are created by "gre: Restructure tunneling". I believe it would be trivial to reverse the dependency, so that this patch creates those files and "gre: Restructure tunneling" applies on top of it, as the two patches add different functions to those files. As such, I think it would be better to describe this patch as compatible with "gre: Restructure tunneling" rather than dependent on it. > > This is the remaining patch of the series "MPLS actions and matches". > To aid review it and its dependency are available in git at: > > git://github.com/horms/openvswitch.git devel/mpls-v2.33 > > v2.33 > * Ensure that inner_protocol is always set to the current > skb->protocol value in ovs_execute_actions(). This ensures > it is set to the correct value in the absence of a push_mpls action. > Also remove setting of inner_protocol in push_mpls() as > it duplicates the code now in ovs_execute_actions(). > * Call __skb_gso_segment() instead of skb_gso_segment() from > rpl___skb_gso_segment() in the case that HAVE___SKB_GSO_SEGMENT is set. > This was a typo. > > v2.32 > * As suggested by Jesse Gross > - Use int instead of size_t in validate_and_copy_actions__(). > - Fix crazy edit mess in pop_mpls() action comment > - Move eth_p_mpls() into mpls.h > - Refactor skb_gso_segment MPLS handling into rpl_skb_gso_segment > Address Jesse's comments regarding this code: > "Can we push this completely into the skb_gso_segment() compatibility > code? It's both nicer and may make the interactions with the vlan code > less confusing." > - Move GSO compatibility code into linux/compat/gso.* > - Set skb->protocol on mpls_push and mpls_pop in the presence > of an offloaded VLAN. 
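For readers following the eth_p_mpls() move noted under v2.32: the contents of datapath/mpls.h are not shown in this part of the thread, so the following is only a user-space sketch of what such a helper checks, not the code from the patch. The ethertype values 0x8847 and 0x8848 are the standard MPLS unicast and multicast ethertypes, MPLS_HLEN and MPLS_BOS_MASK follow the values used in the flow.c hunk, and mpls_stack_len() is a hypothetical helper added purely for illustration.

```c
#include <arpa/inet.h>   /* htons(), htonl() */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_P_MPLS_UC 0x8847      /* MPLS unicast ethertype */
#define ETH_P_MPLS_MC 0x8848      /* MPLS multicast ethertype */
#define MPLS_HLEN     4           /* size of one label stack entry (LSE) */
#define MPLS_BOS_MASK 0x00000100  /* bottom-of-stack bit, host byte order */

/* True if a network-byte-order ethertype is one of the MPLS ethertypes. */
static bool eth_p_mpls(uint16_t eth_type_be)
{
	return eth_type_be == htons(ETH_P_MPLS_UC) ||
	       eth_type_be == htons(ETH_P_MPLS_MC);
}

/* Hypothetical helper (not in the patch): length in bytes of the label
 * stack starting at buf, or 0 if no bottom-of-stack entry is found
 * within len bytes.
 */
static size_t mpls_stack_len(const unsigned char *buf, size_t len)
{
	size_t off;

	for (off = 0; off + MPLS_HLEN <= len; off += MPLS_HLEN) {
		uint32_t lse;

		/* Copy out the LSE in network byte order; buf may be unaligned. */
		memcpy(&lse, buf + off, MPLS_HLEN);
		if (lse & htonl(MPLS_BOS_MASK))
			return off + MPLS_HLEN;
	}
	return 0;
}
```

The loop is bounded by len, so a stack that never sets the BoS bit terminates with 0 rather than running off the end of the buffer, which is the same concern the check_header() call in ovs_flow_extract() addresses.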
> > v2.31 > * As suggested by Jesse Gross > - There is no need to make mac_header_end inline as it is not in a header > file > - Remove dubious if (*skb_ethertype == ethertype) optimisation from > set_ethertype > - Only set skb->protocol in push_mpls() or pop_mpls() for non-VLAN packets > - Use MAX_ETH_TYPES instead of SAMPLE_ACTION_DEPTH for array size > of types in struct eth_types. This corrects a typo/thinko. > - Correct eth type tracking logic such that start isn't advanced > when entering a sample action, ensuring that all possible types > are checked when verifying nested actions. > * Define HAVE_INNER_PROTOCOL based on kernel version. > inner_protocol has been merged into net-next and should appear in > v3.11 so there is no longer a need for an acinclude.m4 test to check for it. > * Add MPLS GSO compatibility code. > This is for use on kernels that do not have MPLS GSO support. > Thanks to Joe Stringer for his work on this. > > v2.30 > * As suggested by Jesse Gross > - Use skb_cow_head in push_mpls to ensure there is sufficient headroom for > skb_push > - Call make_writable with skb->mac_len instead of skb->mac_len + MPLS_HLEN > in push_mpls as only the first skb->mac_len bytes of existing packet data > are modified. > - Rename skb_mac_header_end as mac_header_end, this seems > to be a more appropriate name for a local function. > - Remove OVS_CSUM_COMPLETE code from set_ethertype(). > Inside OVS the ethernet header is not covered by OVS_CSUM_COMPLETE. > - Use __skb_pull() instead of skb_pull() in pop_mpls() > - Decrement and increment skb->mac_len when popping and pushing VLAN tags. > Previously mac_len was reset, but this would result in forgetting > the MPLS label stack. > - Remove spurious comment from before do_execute_actions(). > - Move OVS_KEY_ATTR_MPLS attribute to its final, upstreamable, location. > - Correct ethertype check for OVS_ACTION_ATTR_POP_MPLS case in > validate_and_copy_actions() to check for MPLS ethertypes rather than > ETH_P_IP. 
> - Rewrite tracking of eth types used to verify actions in the presence > of sample actions. There is a large comment above struct eth_types > describing the new implementation. > > v2.29 > * Break include/ and lib/ portions of the patch out into a > separate patch "datapath: Add basic MPLS support to kernel" > * Update for new MPLS GSO scheme > - skb->protocol is set to the new ethertype of the packet > on MPLS push and pop > - When pushing the first MPLS LSE onto a previously non-MPLS > packet set skb->inner_protocol to the original ethertype. > - skb->inner_protocol may be used by the network stack > for GSO of the inner-packet. > * Drop const from ethertype parameter of set_ethertype. > This appears to be a legacy of this parameter being a pointer. > * Pass the ethertype patrameter of pop_mpls as a value rather > than a pointer. > > v2.28 > * Kernel Datapath changes as suggested by Jarno Rajahalme > + Correct the logic introduced in v2.27 to set the network_header > to after the MPLS label stack in the case of an MPLS packet. > - Increment stack_len offset so that label stacks of depth greater > than two do not cause an infinite loop. > - Correct offset passed to check_header to include skb->mac len > > v2.27 > * Kernel Datapath changes as suggested by Jarno Rajahalme and Jesse Gross: > + Previously the mac_len and network_header of an skb corresponded > to the end of the L2 header. To support GSO, just before transmission, > do_output, with the results as follows: > > Input: non-MPLS skb: Output: network header and mac_len correspond > to the beginning of the L3 headers > Input: MPLS: Output: network header and mac_len correspond to the > end of the L2 headers. > > This is somewhat confusing. > > + The new scheme is as follows: > - The mac_len always corresponds to the end of the L2 header. > - The network header always corresponds to the beginning of the > L3 header. 
> > + Note that in the case of MPLS output the end of the L2 headers and the > beginning of the L3 headers will differ. > > * Remove unused declaration of skb_cb_mpls_stack() > > v2.26 > * Rebase on master > * Kernel Datapath changes as suggested by Jarno Rajahalme > - Use skb_network_header() instead of skb_mac_header() to locate > the ethertype to set in set_ethertype() as the latter will > be wrong in the presence of VLAN tags. This resolves > a regression introduced in v2.24. > - Enhance comment in do_output() > - do_execute_actions(): Do not alter mpls_stack_depth if > a MPLS push or pop action fail. This is achieved by altering > mpls_stack_depth at the end of push_mpls() and pop_mpls(). > > v2.25 > * Rebase on master > * Pass big-endian value as the last argument of eth_types_set() in > validate_and_copy_actions__() > * Use revised GSO support as provided by the patch series > "[PATCH 0/2] Small Modifications to GSO to allow segmentation of MPLS" > - Set skb->mac_len to the length of the l2 header + MPLS stack length > - Update skb->network_header accordingly > - Set skb->encapsulated_features > > v2.24 > * Use skb_mac_header() in set_ethertype() > * Set skb->encapsulation in set_ethertype() to support MPLS GSO. > Also add a note about the other requirements for MPLS GSO. > MPLS GSO support will be posted as a patch net-next (Linux mainline) > "MPLS: Add limited GSO support" > * Do not add ETH_TYPE_MIN, it is no longer used > > v2.23 > * As suggested by Jesse Gross: > - Verify the current ethernet type when validating sample actions > both for the taken and not-taken path if the sample action. > - Document that the OVS_KEY_ATTR_MPLS attribute accepts a list of > struct ovs_key_mpls but that an implementation may restrict > the length it accepts. > - Restrict the array length of the OVS_KEY_ATTR_MPLS to one. 
> + Don't add ovs_flow_verify_key_len as it was added to > handle attributes whose values are arrays but there are > no attributes with values that are arrays (of length greater than one). > > v2.22 > * As suggested by Jesse Gross: > - Fix sparse warning in validate_and_copy_actions() > I have no idea why sparse doesn't show this up on my system. > - Remove call to skb_cow_head() from push_mpls() as it > is already covered by a call to make_writable() > - Check (key_type > OVS_KEY_ATTR_MAX) in ovs_flow_verify_key_len() > - Disallow set actions on l2.5+ data and MPLS push and pop actions > after an MPLS pop action as there is no verification that the packet > is actually of the new ethernet type. This may later be supported > using recirculation or by other means. > - Do not add spurious debugging message to ovs_flow_cmd_new_or_set() > > v2.21 > * As suggested by Jesse Gross: > - Verify that l3 and l4 actions always occur prior to > a push_mpls action and use the network header pointer of an skb > to track the top of the MPLS stack. This avoids adding an l2_size > element to the skb callback. > > v2.20 > * As suggested by Jesse Gross: > - Do not add ovs_dp_ioctl_hook > + This appears to be garbage from a rebase > - Do not add skb_cb_set_l2_size. Instead set OVS_CB(skb)->l2_size > in ovs_flow_extract(). > - Do not free skb on error in push_mpls(), it is freed in the caller > - Call skb_reset_mac_len() in pop_mpls() and push_mpls() > - Update checksums in pop_mpls(), push_mpls() and set_mpls(). > - Rename skb_cb_mpls_bos() as skb_cb_mpls_stack(). > It returns the top not the bottom of the stack. > - Track the current eth_type in validate_and_copy_actions > which is initially the eth_type of the flow and may be modified > by push_mpls and pop_mpls actions. Use this to correctly validate > mpls_set actions. This is to allow mpls_set actions to be applied > to a non-MPLS frame after an mpls_push action (although ovs-vswitchd > doesn't currently do that). 
> Also: > + Remove the check of the eth_type in set_mpls() as the new validation > scheme should ensure it cannot be incorrect. > + Use the current eth_type to validate mpls_pop actions and remove > the eth_type check from pop_mpls(). > - Move OVS_KEY_ATTR_MPLS to non-upstream group in ovs_key_lens > - Remove unnecessary memset of mpls_key in ovs_flow_to_nlattrs() > - Make a union of the mpls and ip elements of struct sw_flow_key. > Currently the code stops parsing after an MPLS header so it is > not possible for the ip and mpls elements to be used simultaneously > and some space can be saved by using a union. > - Allow an array of MPLS key attributes > + Currently all but the first element is ignored > + User-space needs to be updated to accept more than one element, > currently it will treat their presence as an error > - Do not update network header in ovs_flow_extract() for after parsing > the MPLS stack as it is never used because no l3+ processing > occurs on MPLS frames. > - Allow multiple MPLS entries in a match by allowing the OVS_KEY_ATTR_MPLS > to be an array of struct ovs_key_mpls with at least one entry. > Currently only one entry is used which is byte-for-byte compatible with > the previous scheme of having OVS_KEY_ATTR_MPLS as a struct > ovs_key_mpls. > * Make skb writable in pop_mpls(), push_mpls() and set_mpls(). > > v2.18 - v2.19 > * No change > > v2.17 > * As suggested by Ben Pfaff > - Use consistent terminology for MPLS. > + Consistently refer to the MPLS component of a packet as the > MPLS label stack and entries in the stack as MPLS label stack entries > (LSE). An MPLS label is a component of an MPLS label stack entry. > The other components are the traffic class (TC), time to live (TTL) > and bottom of stack (BoS) bit. 
> - Rename compose_.*mpls_ functions as execute_.*mpls_ > > v2.16 > * No change > > v2.15 > * As suggested by Ben Pfaff > - Use OVS_ACTION_SET to set OVS_KEY_ATTR_MPLS instead of > OVS_ACTION_ATTR_SET_MPLS > > v2.14 > * Remove include/linux/openvswitch.h portion which added > new key and action attributes. This is > now present in "User-Space MPLS actions and matches" > which is now a dependency of this patch > > v2.13 > * As suggested by Jarno Rajahalme > - Rename mpls_bos element of ovs_skb_cb as l2_size as it is set and used > regardless of whether an MPLS stack is present. Update the name of > helper functions and documentation accordingly. > - Ensure that skb_cb_mpls_bos() never returns NULL > * Correct endianness in eth_p_mpls() > > v2.12 > * Update skb and network header on MPLS extraction in ovs_flow_extract() > * Use NULL in skb_cb_mpls_bos() > * Add eth_p_mpls helper > > v2.10 - v2.11 > * No change > > v2.9 > * datapath: Always update the mpls bos if vlan_pop is successful > > Regardless of the details of how a successful > vlan_pop is achieved, the mpls bos needs to be updated. > > Without this fix it has been observed that the following > results in malformed packets > > v2.8 > * No change > > v2.7 > * Rebase > > v2.6 > * As suggested by Yamahata-san > - Do not guard against label == 0 for > OVS_ACTION_ATTR_SET_MPLS in validate_actions(). > A label of 0 is valid > - Remove comment stipulating that if > the top_label element of struct sw_flow_key is 0 then > there is no MPLS label. An MPLS label of 0 is valid > and the correct check is whether the ethertype is > ntohs(ETH_TYPE_MPLS) or ntohs(ETH_TYPE_MPLS_MCAST) > > v2.4 - v2.5 > * No change > > v2.3 > * s/mpls_stack/mpls_bos/ > This is in keeping with the naming used in the OpenFlow 1.3 specification > > v2.2 > * Call skb_reset_mac_header() in skb_cb_set_mpls_stack() > eth_hdr(skb) is non-NULL when called in skb_cb_set_mpls_stack(). > * Add a call to skb_cb_set_mpls_stack() in ovs_packet_cmd_execute(). 
> I apologise that I have mislaid my notes on this but > it avoids a kernel panic. I can investigate again if necessary. > * Use struct ovs_action_push_mpls instead of > __be16 to decode OVS_ACTION_ATTR_PUSH_MPLS in validate_actions(). This is > consistent with the data format for the attribute. > * Indentation fix in skb_cb_mpls_stack(). [cosmetic] > > v2.1 > * Manual rebase > --- > datapath/Modules.mk | 1 + > datapath/actions.c | 126 +++++++++++- > datapath/datapath.c | 254 > +++++++++++++++++++++--- > datapath/datapath.h | 9 + > datapath/flow.c | 51 +++++ > datapath/flow.h | 19 +- > datapath/linux/compat/gso.c | 46 +++++ > datapath/linux/compat/gso.h | 36 ++++ > datapath/linux/compat/include/linux/netdevice.h | 12 -- > datapath/linux/compat/include/linux/skbuff.h | 1 + > datapath/linux/compat/netdevice.c | 28 --- > datapath/mpls.h | 12 ++ > datapath/tunnel.c | 1 + > datapath/vport-netdev.c | 28 ++- > include/linux/openvswitch.h | 7 +- > 15 files changed, 552 insertions(+), 79 deletions(-) > create mode 100644 datapath/mpls.h > > diff --git a/datapath/Modules.mk b/datapath/Modules.mk > index 2ce8888..ad19807 100644 > --- a/datapath/Modules.mk > +++ b/datapath/Modules.mk > @@ -26,6 +26,7 @@ openvswitch_headers = \ > compat.h \ > datapath.h \ > flow.h \ > + mpls.h \ > tunnel.h \ > vlan.h \ > vport.h \ > diff --git a/datapath/actions.c b/datapath/actions.c > index 09d0c3f..39fef3d 100644 > --- a/datapath/actions.c > +++ b/datapath/actions.c > @@ -34,6 +34,8 @@ > > #include "checksum.h" > #include "datapath.h" > +#include "gso.h" > +#include "mpls.h" > #include "vlan.h" > #include "vport.h" > > @@ -48,6 +50,110 @@ static int make_writable(struct sk_buff *skb, int > write_len) > return pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > } > > +/* The end of the mac header. > + * > + * For non-MPLS skbs this will correspond to the network header. 
> + * For MPLS skbs it will be before the network_header as the MPLS > + * label stack lies between the end of the mac header and the network > + * header. That is, for MPLS skbs the end of the mac header > + * is the top of the MPLS label stack. > + */ > +static unsigned char *mac_header_end(const struct sk_buff *skb) > +{ > + return skb_mac_header(skb) + skb->mac_len; > +} > + > +static __be16 *get_ethertype(struct sk_buff *skb) > +{ > + /* skb_mac_header() is not used to locate the ethertype to > + * set as it will be incorrect in the presence of VLAN tags > + */ > + struct ethhdr *hdr = (struct ethhdr *)(mac_header_end(skb) - ETH_HLEN); > + return &hdr->h_proto; > +} > + > +static void set_ethertype(struct sk_buff *skb, __be16 ethertype) > +{ > + __be16 *skb_ethertype = get_ethertype(skb); > + *skb_ethertype = ethertype; > +} > + > +static int push_mpls(struct sk_buff *skb, > + const struct ovs_action_push_mpls *mpls) > +{ > + __be32 *new_mpls_lse; > + int err; > + > + if (skb_cow_head(skb, MPLS_HLEN) < 0) > + return -ENOMEM; > + > + err = make_writable(skb, skb->mac_len); > + if (unlikely(err)) > + return err; > + > + skb_push(skb, MPLS_HLEN); > + memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb), > + skb->mac_len); > + skb_reset_mac_header(skb); > + > + new_mpls_lse = (__be32 *)mac_header_end(skb); > + *new_mpls_lse = mpls->mpls_lse; > + > + if (get_ip_summed(skb) == OVS_CSUM_COMPLETE) > + skb->csum = csum_add(skb->csum, csum_partial(new_mpls_lse, > + MPLS_HLEN, 0)); > + > + set_ethertype(skb, mpls->mpls_ethertype); > + if (skb->protocol != htons(ETH_P_8021Q)) > + skb->protocol = mpls->mpls_ethertype; > + return 0; > +} > + > +static int pop_mpls(struct sk_buff *skb, const __be16 ethertype) > +{ > + int err; > + > + err = make_writable(skb, skb->mac_len + MPLS_HLEN); > + if (unlikely(err)) > + return err; > + > + if (get_ip_summed(skb) == OVS_CSUM_COMPLETE) > + skb->csum = csum_sub(skb->csum, > + csum_partial(mac_header_end(skb), > + MPLS_HLEN, 
0)); > + > + memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb), > + skb->mac_len); > + > + __skb_pull(skb, MPLS_HLEN); > + skb_reset_mac_header(skb); > + > + set_ethertype(skb, ethertype); > + if (skb->protocol != htons(ETH_P_8021Q)) > + skb->protocol = ethertype; > + return 0; > +} > + > +static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse) > +{ > + __be32 *stack = (__be32 *)mac_header_end(skb); > + int err; > + > + err = make_writable(skb, skb->mac_len + MPLS_HLEN); > + if (unlikely(err)) > + return err; > + > + if (get_ip_summed(skb) == OVS_CSUM_COMPLETE) { > + __be32 diff[] = { ~(*stack), *mpls_lse }; > + skb->csum = ~csum_partial((char *)diff, sizeof(diff), > + ~skb->csum); > + } > + > + *stack = *mpls_lse; > + > + return 0; > +} > + > /* remove VLAN header from packet and update csum accordingly. */ > static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci) > { > @@ -70,7 +176,7 @@ static int __pop_vlan_tci(struct sk_buff *skb, __be16 > *current_tci) > > vlan_set_encap_proto(skb, vhdr); > skb->mac_header += VLAN_HLEN; > - skb_reset_mac_len(skb); > + skb->mac_len -= VLAN_HLEN; > > return 0; > } > @@ -115,6 +221,9 @@ static int push_vlan(struct sk_buff *skb, const struct > ovs_action_push_vlan *vla > if (!__vlan_put_tag(skb, current_tag)) > return -ENOMEM; > > + /* update mac_len for mac_header_end() */ > + skb->mac_len += VLAN_HLEN; > + > if (get_ip_summed(skb) == OVS_CSUM_COMPLETE) > skb->csum = csum_add(skb->csum, csum_partial(skb->data > + (2 * ETH_ALEN), VLAN_HLEN, 0)); > @@ -467,6 +576,10 @@ static int execute_set_action(struct sk_buff *skb, > case OVS_KEY_ATTR_UDP: > err = set_udp(skb, nla_data(nested_attr)); > break; > + > + case OVS_KEY_ATTR_MPLS: > + err = set_mpls(skb, nla_data(nested_attr)); > + break; > } > > return err; > @@ -502,6 +615,14 @@ static int do_execute_actions(struct datapath *dp, > struct sk_buff *skb, > output_userspace(dp, skb, a); > break; > > + case OVS_ACTION_ATTR_PUSH_MPLS: > + err = 
push_mpls(skb, nla_data(a)); > + break; > + > + case OVS_ACTION_ATTR_POP_MPLS: > + err = pop_mpls(skb, nla_get_be16(a)); > + break; > + > case OVS_ACTION_ATTR_PUSH_VLAN: > err = push_vlan(skb, nla_data(a)); > if (unlikely(err)) /* skb already freed. */ > @@ -575,6 +696,9 @@ int ovs_execute_actions(struct datapath *dp, struct > sk_buff *skb) > goto out_loop; > } > > + /* Needed for inner protocol compatibility on older kernels. */ > + ovs_skb_set_inner_protocol(skb, skb->protocol); > + > OVS_CB(skb)->tun_key = NULL; > error = do_execute_actions(dp, skb, acts->actions, > acts->actions_len, false); > diff --git a/datapath/datapath.c b/datapath/datapath.c > index 42af315..264b8b6 100644 > --- a/datapath/datapath.c > +++ b/datapath/datapath.c > @@ -57,6 +57,8 @@ > #include "checksum.h" > #include "datapath.h" > #include "flow.h" > +#include "gso.h" > +#include "mpls.h" > #include "vlan.h" > #include "tunnel.h" > #include "vport-internal_dev.h" > @@ -551,18 +553,132 @@ static inline void add_nested_action_end(struct > sw_flow_actions *sfa, int st_off > a->nla_len = sfa->actions_len - st_offset; > } > > -static int validate_and_copy_actions(const struct nlattr *attr, > +#define MAX_ETH_TYPES 16 /* Arbitrary Limit */ > + > +/* struct eth_types - possible eth types > + * @types: provides storage for the possible eth types. > + * @start: is the index of the first entry of types which is possible. > + * @end: is the index of the last entry of types which is possible. > + * @cursor: is the index of the entry which should be updated if an action > + * changes the eth type. > + * > + * Due to the sample action there may be multiple possible eth types. > + * In order to correctly validate actions all possible types are tracked > + * and verified. This is done using struct eth_types. > + * > + * Initially start, end and cursor should be 0, and the first element of > + * types should be set to the eth type of the flow. 
> + * > + * When an action changes the eth type then the values of start and end are > + * updated to the value of cursor. The new type is stored at types[cursor]. > + * > + * When entering a sample action the start and cursor values are saved. The > + * value of cursor is set to the value of end plus one. > + * > + * When leaving a sample action the start and cursor values are restored to > + * their saved values. > + * > + * An example follows. > + * > + * actions: pop_mpls(A),sample(pop_mpls(B)),sample(pop_mpls(C)),pop_mpls(D) > + * > + * 0. Initial state: > + * types = { original_eth_type } > + * start = end = cursor = 0; > + * > + * 1. pop_mpls(A) > + * a. Check types from start (0) to end (0) inclusive > + * i.e. Check against original_eth_type > + * b. Set start = end = cursor > + * c. Set types[cursor] = A > + * New state: > + * types = { A } > + * start = end = cursor = 0; > + * > + * 2. Enter first sample() > + * a. Save start and cursor > + * b. Set cursor = end + 1 > + * New state: > + * types = { A } > + * start = end = 0; > + * cursor = 1; > + * > + * 3. pop_mpls(B) > + * a. Check types from start (0) to end (0) > + * i.e: Check against A > + * b. Set start = end = cursor > + * c. Set types[cursor] = B > + * New state: > + * types = { A, B } > + * start = end = cursor = 1; > + * > + * 4. Leave first sample() > + * a. Restore start and cursor to the values when entering 2. > + * New state: > + * types = { A, B } > + * start = cursor = 0; > + * end = 1; > + * > + * 5. Enter second sample() > + * a. Save start and cursor > + * b. Set cursor = end + 1 > + * New state: > + * types = { A, B } > + * start = 0; > + * end = 1; > + * cursor = 2; > + * > + * 6. pop_mpls(C) > + * a. Check types from start (0) to end (1) inclusive > + * i.e: Check against A and B > + * b. Set start = end = cursor > + * c. Set types[cursor] = C > + * New state: > + * types = { A, B, C } > + * start = end = cursor = 2; > + * > + * 7. Leave second sample() > + * a. 
Restore start and cursor to the values when entering 5. > + * New state: > + * types = { A, B, C } > + * start = cursor = 0; > + * end = 2; > + * > + * 8. pop_mpls(D) > + * a. Check types from start (0) to end (2) inclusive > + * i.e: Check against A, B and C > + * b. Set start = end = cursor > + * c. Set types[cursor] = D > + * New state: > + * types = { D } // Trailing entries of type are no longer used end = 0 > + * start = end = cursor = 0; > + */ > +struct eth_types { > + int start, end, cursor; > + __be16 types[MAX_ETH_TYPES]; > +}; > + > +static void eth_types_set(struct eth_types *types, __be16 type) > +{ > + types->start = types->end = types->cursor; > + types->types[types->cursor] = type; > +} > + > +static int validate_and_copy_actions__(const struct nlattr *attr, > const struct sw_flow_key *key, int depth, > - struct sw_flow_actions **sfa); > + struct sw_flow_actions **sfa, > + struct eth_types *eth_types); > > static int validate_and_copy_sample(const struct nlattr *attr, > const struct sw_flow_key *key, int depth, > - struct sw_flow_actions **sfa) > + struct sw_flow_actions **sfa, > + struct eth_types *eth_types) > { > const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1]; > const struct nlattr *probability, *actions; > const struct nlattr *a; > int rem, start, err, st_acts; > + int saved_eth_types_start, saved_eth_types_cursor; > > memset(attrs, 0, sizeof(attrs)); > nla_for_each_nested(a, attr, rem) { > @@ -593,22 +709,38 @@ static int validate_and_copy_sample(const struct nlattr > *attr, > if (st_acts < 0) > return st_acts; > > - err = validate_and_copy_actions(actions, key, depth + 1, sfa); > + /* Save and update eth_types cursor and start. Please see the > + * comment for struct eth_types for a discussion of this. 
> + */ > + saved_eth_types_start = eth_types->start; > + saved_eth_types_cursor = eth_types->cursor; > + eth_types->cursor = eth_types->end + 1; > + if (eth_types->cursor == MAX_ETH_TYPES) > + return -EINVAL; > + > + err = validate_and_copy_actions__(actions, key, depth + 1, sfa, > + eth_types); > if (err) > return err; > > + /* Restore eth_types cursor and start. Please see the > + * comment for struct eth_types for a discussion of this. > + */ > + eth_types->cursor = saved_eth_types_cursor; > + eth_types->start = saved_eth_types_start; > + > add_nested_action_end(*sfa, st_acts); > add_nested_action_end(*sfa, start); > > return 0; > } > > -static int validate_tp_port(const struct sw_flow_key *flow_key) > +static int validate_tp_port(const struct sw_flow_key *flow_key, __be16 > eth_type) > { > - if (flow_key->eth.type == htons(ETH_P_IP)) { > + if (eth_type == htons(ETH_P_IP)) { > if (flow_key->ipv4.tp.src || flow_key->ipv4.tp.dst) > return 0; > - } else if (flow_key->eth.type == htons(ETH_P_IPV6)) { > + } else if (eth_type == htons(ETH_P_IPV6)) { > if (flow_key->ipv6.tp.src || flow_key->ipv6.tp.dst) > return 0; > } > @@ -639,7 +771,7 @@ static int validate_and_copy_set_tun(const struct nlattr > *attr, > static int validate_set(const struct nlattr *a, > const struct sw_flow_key *flow_key, > struct sw_flow_actions **sfa, > - bool *set_tun) > + bool *set_tun, struct eth_types *eth_types) > { > const struct nlattr *ovs_key = nla_data(a); > int key_type = nla_type(ovs_key); > @@ -676,9 +808,12 @@ static int validate_set(const struct nlattr *a, > return err; > break; > > - case OVS_KEY_ATTR_IPV4: > - if (flow_key->eth.type != htons(ETH_P_IP)) > - return -EINVAL; > + case OVS_KEY_ATTR_IPV4: { > + int i; > + > + for (i = eth_types->start; i <= eth_types->end; i++) > + if (eth_types->types[i] != htons(ETH_P_IP)) > + return -EINVAL; > > if (!flow_key->ip.proto) > return -EINVAL; > @@ -691,10 +826,14 @@ static int validate_set(const struct nlattr *a, > return -EINVAL; > > 
break; > + } > > - case OVS_KEY_ATTR_IPV6: > - if (flow_key->eth.type != htons(ETH_P_IPV6)) > - return -EINVAL; > + case OVS_KEY_ATTR_IPV6: { > + int i; > + > + for (i = eth_types->start; i <= eth_types->end; i++) > + if (eth_types->types[i] != htons(ETH_P_IPV6)) > + return -EINVAL; > > if (!flow_key->ip.proto) > return -EINVAL; > @@ -710,18 +849,37 @@ static int validate_set(const struct nlattr *a, > return -EINVAL; > > break; > + } > + > + case OVS_KEY_ATTR_TCP: { > + int i; > > - case OVS_KEY_ATTR_TCP: > if (flow_key->ip.proto != IPPROTO_TCP) > return -EINVAL; > > - return validate_tp_port(flow_key); > + for (i = eth_types->start; i <= eth_types->end; i++) > + if (validate_tp_port(flow_key, eth_types->types[i])) > + return -EINVAL; > + } > > - case OVS_KEY_ATTR_UDP: > + case OVS_KEY_ATTR_UDP: { > + int i; > if (flow_key->ip.proto != IPPROTO_UDP) > return -EINVAL; > > - return validate_tp_port(flow_key); > + for (i = eth_types->start; i <= eth_types->end; i++) > + if (validate_tp_port(flow_key, eth_types->types[i])) > + return -EINVAL; > + } > + > + case OVS_KEY_ATTR_MPLS: { > + int i; > + > + for (i = eth_types->start; i < eth_types->end; i++) > + if (!eth_p_mpls(eth_types->types[i])) > + return -EINVAL; > + break; > + } > > default: > return -EINVAL; > @@ -765,10 +923,10 @@ static int copy_action(const struct nlattr *from, > return 0; > } > > -static int validate_and_copy_actions(const struct nlattr *attr, > - const struct sw_flow_key *key, > - int depth, > - struct sw_flow_actions **sfa) > +static int validate_and_copy_actions__(const struct nlattr *attr, > + const struct sw_flow_key *key, int depth, > + struct sw_flow_actions **sfa, > + struct eth_types *eth_types) > { > const struct nlattr *a; > int rem, err; > @@ -781,6 +939,8 @@ static int validate_and_copy_actions(const struct nlattr > *attr, > static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = { > [OVS_ACTION_ATTR_OUTPUT] = sizeof(u32), > [OVS_ACTION_ATTR_USERSPACE] = (u32)-1, > + 
[OVS_ACTION_ATTR_PUSH_MPLS] = sizeof(struct > ovs_action_push_mpls), > + [OVS_ACTION_ATTR_POP_MPLS] = sizeof(__be16), > [OVS_ACTION_ATTR_PUSH_VLAN] = sizeof(struct > ovs_action_push_vlan), > [OVS_ACTION_ATTR_POP_VLAN] = 0, > [OVS_ACTION_ATTR_SET] = (u32)-1, > @@ -811,6 +971,33 @@ static int validate_and_copy_actions(const struct nlattr > *attr, > return -EINVAL; > break; > > + case OVS_ACTION_ATTR_PUSH_MPLS: { > + const struct ovs_action_push_mpls *mpls = nla_data(a); > + if (!eth_p_mpls(mpls->mpls_ethertype)) > + return -EINVAL; > + eth_types_set(eth_types, mpls->mpls_ethertype); > + break; > + } > + > + case OVS_ACTION_ATTR_POP_MPLS: { > + int i; > + > + for (i = eth_types->start; i <= eth_types->end; i++) > + if (!eth_p_mpls(eth_types->types[i])) > + return -EINVAL; > + > + /* Disallow subsequent L2.5+ set and mpls_pop actions > + * as there is no check here to ensure that the new > + * eth_type is valid and thus set actions could > + * write off the end of the packet or otherwise > + * corrupt it. > + * > + * Support for these actions is planned using packet > + * recirculation. 
> + */ > + eth_types_set(eth_types, htons(0)); > + break; > + } > > case OVS_ACTION_ATTR_POP_VLAN: > break; > @@ -824,13 +1011,14 @@ static int validate_and_copy_actions(const struct > nlattr *attr, > break; > > case OVS_ACTION_ATTR_SET: > - err = validate_set(a, key, sfa, &skip_copy); > + err = validate_set(a, key, sfa, &skip_copy, eth_types); > if (err) > return err; > break; > > case OVS_ACTION_ATTR_SAMPLE: > - err = validate_and_copy_sample(a, key, depth, sfa); > + err = validate_and_copy_sample(a, key, depth, sfa, > + eth_types); > if (err) > return err; > skip_copy = true; > @@ -852,6 +1040,20 @@ static int validate_and_copy_actions(const struct > nlattr *attr, > return 0; > } > > +static int validate_and_copy_actions(const struct nlattr *attr, > + const struct sw_flow_key *key, > + struct sw_flow_actions **sfa) > +{ > + struct eth_types eth_type = { > + .start = 0, > + .end = 0, > + .cursor = 0, > + .types = { key->eth.type, }, > + }; > + > + return validate_and_copy_actions__(attr, key, 0, sfa, &eth_type); > +} > + > static void clear_stats(struct sw_flow *flow) > { > flow->used = 0; > @@ -916,7 +1118,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, > struct genl_info *info) > if (IS_ERR(acts)) > goto err_flow_free; > > - err = validate_and_copy_actions(a[OVS_PACKET_ATTR_ACTIONS], &flow->key, > 0, &acts); > + err = validate_and_copy_actions(a[OVS_PACKET_ATTR_ACTIONS], &flow->key, > &acts); > rcu_assign_pointer(flow->sf_acts, acts); > if (err) > goto err_flow_free; > @@ -1252,7 +1454,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, > struct genl_info *info) > if (IS_ERR(acts)) > goto error; > > - error = validate_and_copy_actions(a[OVS_FLOW_ATTR_ACTIONS], > &key, 0, &acts); > + error = validate_and_copy_actions(a[OVS_FLOW_ATTR_ACTIONS], > &key, &acts); > if (error) > goto err_kfree; > } else if (info->genlhdr->cmd == OVS_FLOW_CMD_NEW) { > diff --git a/datapath/datapath.h b/datapath/datapath.h > index ad59a3a..f779d1b 100644 > --- 
a/datapath/datapath.h > +++ b/datapath/datapath.h > @@ -38,6 +38,10 @@ > > #define SAMPLE_ACTION_DEPTH 3 > > +#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,11,0) > +#define HAVE_INNER_PROTOCOL > +#endif > + > /** > * struct dp_stats_percpu - per-cpu packet processing statistics for a given > * datapath. > @@ -101,6 +105,8 @@ struct datapath { > * packet was not received on a tunnel. > * @vlan_tci: Provides a substitute for the skb->vlan_tci field on kernels > * before 2.6.27. > + * @inner_protocol: Provides a substitute for the skb->inner_protocol field > on > + * kernels before 3.11. > */ > struct ovs_skb_cb { > struct sw_flow *flow; > @@ -112,6 +118,9 @@ struct ovs_skb_cb { > #ifdef NEED_VLAN_FIELD > u16 vlan_tci; > #endif > +#ifndef HAVE_INNER_PROTOCOL > + __be16 inner_protocol; > +#endif > }; > #define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb) > > diff --git a/datapath/flow.c b/datapath/flow.c > index 1f5a8e5..ed38ac4 100644 > --- a/datapath/flow.c > +++ b/datapath/flow.c > @@ -43,10 +43,13 @@ > #include <net/ipv6.h> > #include <net/ndisc.h> > > +#include "mpls.h" > #include "vlan.h" > > static struct kmem_cache *flow_cache; > > +#define MPLS_BOS_MASK 0x00000100 > + > static int check_header(struct sk_buff *skb, int len) > { > if (unlikely(skb->len < len)) > @@ -650,6 +653,7 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, > struct sw_flow_key *key, > return -ENOMEM; > > skb_reset_network_header(skb); > + skb_reset_mac_len(skb); > __skb_push(skb, skb->data - skb_mac_header(skb)); > > /* Network layer. */ > @@ -732,6 +736,35 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, > struct sw_flow_key *key, > memcpy(key->ipv4.arp.tha, arp->ar_tha, ETH_ALEN); > key_len = SW_FLOW_KEY_OFFSET(ipv4.arp); > } > + } else if (eth_p_mpls(key->eth.type)) { > + size_t stack_len = MPLS_HLEN; > + > + /* In the presence of an MPLS label stack the end of the L2 > + * header and the beginning of the L3 header differ. 
> + * > + * Advance network_header to the beginning of the L3 > + * header. mac_len corresponds to the end of the L2 header. > + */ > + while (1) { > + __be32 lse; > + > + error = check_header(skb, skb->mac_len + stack_len); > + if (unlikely(error)) > + goto out; > + > + memcpy(&lse, skb_network_header(skb), MPLS_HLEN); > + > + if (stack_len == MPLS_HLEN) { > + key_len = SW_FLOW_KEY_OFFSET(mpls.top_lse); > + memcpy(&key->mpls.top_lse, &lse, MPLS_HLEN); > + } > + > + skb_set_network_header(skb, skb->mac_len + stack_len); > + if (lse & htonl(MPLS_BOS_MASK)) > + break; > + > + stack_len += MPLS_HLEN; > + } > } else if (key->eth.type == htons(ETH_P_IPV6)) { > int nh_len; /* IPv6 Header + Extensions */ > > @@ -850,6 +883,7 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = { > [OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp), > [OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd), > [OVS_KEY_ATTR_TUNNEL] = -1, > + [OVS_KEY_ATTR_MPLS] = sizeof(struct ovs_key_mpls), > }; > > static int ipv4_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_len, > @@ -1256,6 +1290,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, > int *key_lenp, > swkey->ip.proto = ntohs(arp_key->arp_op); > memcpy(swkey->ipv4.arp.sha, arp_key->arp_sha, ETH_ALEN); > memcpy(swkey->ipv4.arp.tha, arp_key->arp_tha, ETH_ALEN); > + } else if (eth_p_mpls(swkey->eth.type)) { > + const struct ovs_key_mpls *mpls_key; > + if (!(attrs & (1ULL << OVS_KEY_ATTR_MPLS))) > + return -EINVAL; > + attrs &= ~(1ULL << OVS_KEY_ATTR_MPLS); > + > + key_len = SW_FLOW_KEY_OFFSET(mpls.top_lse); > + mpls_key = nla_data(a[OVS_KEY_ATTR_MPLS]); > + swkey->mpls.top_lse = mpls_key->mpls_lse; > } > > if (attrs) > @@ -1422,6 +1465,14 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key > *swkey, struct sk_buff *skb) > arp_key->arp_op = htons(swkey->ip.proto); > memcpy(arp_key->arp_sha, swkey->ipv4.arp.sha, ETH_ALEN); > memcpy(arp_key->arp_tha, swkey->ipv4.arp.tha, ETH_ALEN); > + } else if (eth_p_mpls(swkey->eth.type)) { > + struct 
ovs_key_mpls *mpls_key; > + > + nla = nla_reserve(skb, OVS_KEY_ATTR_MPLS, sizeof(*mpls_key)); > + if (!nla) > + goto nla_put_failure; > + mpls_key = nla_data(nla); > + mpls_key->mpls_lse = swkey->mpls.top_lse; > } > > if ((swkey->eth.type == htons(ETH_P_IP) || > diff --git a/datapath/flow.h b/datapath/flow.h > index dfffed7..bc17fab 100644 > --- a/datapath/flow.h > +++ b/datapath/flow.h > @@ -69,12 +69,17 @@ struct sw_flow_key { > __be16 tci; /* 0 if no VLAN, VLAN_TAG_PRESENT set > otherwise. */ > __be16 type; /* Ethernet frame type. */ > } eth; > - struct { > - u8 proto; /* IP protocol or lower 8 bits of ARP > opcode. */ > - u8 tos; /* IP ToS. */ > - u8 ttl; /* IP TTL/hop limit. */ > - u8 frag; /* One of OVS_FRAG_TYPE_*. */ > - } ip; > + union { > + struct { > + __be32 top_lse; /* top label stack entry */ > + } mpls; > + struct { > + u8 proto; /* IP protocol or lower 8 bits > of ARP opcode. */ > + u8 tos; /* IP ToS. */ > + u8 ttl; /* IP TTL/hop limit. */ > + u8 frag; /* One of OVS_FRAG_TYPE_*. 
*/ > + } ip; > + }; > union { > struct { > struct { > @@ -140,6 +145,8 @@ struct arp_eth_header { > unsigned char ar_tip[4]; /* target IP address */ > } __packed; > > +#define MPLS_HLEN 4 > + > int ovs_flow_init(void); > void ovs_flow_exit(void); > > diff --git a/datapath/linux/compat/gso.c b/datapath/linux/compat/gso.c > index 8cb2e06..eccbbee 100644 > --- a/datapath/linux/compat/gso.c > +++ b/datapath/linux/compat/gso.c > @@ -19,6 +19,7 @@ > #include <linux/module.h> > #include <linux/if.h> > #include <linux/if_tunnel.h> > +#include <linux/if_vlan.h> > #include <linux/icmp.h> > #include <linux/in.h> > #include <linux/ip.h> > @@ -35,12 +36,20 @@ > #include <net/xfrm.h> > > #include "gso.h" > +#include "mpls.h" > +#include "vlan.h" > > +#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0) > static __be16 skb_network_protocol(struct sk_buff *skb) > { > __be16 type = skb->protocol; > + __be16 inner_proto; > int vlan_depth = ETH_HLEN; > > + inner_proto = ovs_skb_get_inner_protocol(skb); > + if (eth_p_mpls(skb->protocol) && !eth_p_mpls(inner_proto)) > + type = inner_proto; > + > while (type == htons(ETH_P_8021Q) || type == htons(ETH_P_8021AD)) { > struct vlan_hdr *vh; > > @@ -55,6 +64,43 @@ static __be16 skb_network_protocol(struct sk_buff *skb) > return type; > } > > +struct sk_buff *rpl___skb_gso_segment(struct sk_buff *skb, > + netdev_features_t features, > + bool tx_path) > +{ > + struct sk_buff *skb_gso; > + __be16 type = skb->protocol; > + > + skb->protocol = skb_network_protocol(skb); > + > + /* this hack needed to get regular skb_gso_segment() */ > +#ifdef HAVE___SKB_GSO_SEGMENT > +#undef __skb_gso_segment > + skb_gso = __skb_gso_segment(skb, features, tx_path); > +#else > +#undef skb_gso_segment > + skb_gso = skb_gso_segment(skb, features); > +#endif > + > + if (!skb_gso || IS_ERR(skb_gso)) > + return skb_gso; > + > + skb = skb_gso; > + while (skb) { > + skb->protocol = type; > + skb = skb->next; > + } > + > + return skb_gso; > +} > + > +struct sk_buff 
*rpl_skb_gso_segment(struct sk_buff *skb, > + netdev_features_t features) > +{ > + return rpl___skb_gso_segment(skb, features, true); > +} > +#endif /* kernel version < 3.11.0 */ > + > static struct sk_buff *tnl_skb_gso_segment(struct sk_buff *skb, > netdev_features_t features, > bool tx_path) > diff --git a/datapath/linux/compat/gso.h b/datapath/linux/compat/gso.h > index 44fd213..a8bc3b0 100644 > --- a/datapath/linux/compat/gso.h > +++ b/datapath/linux/compat/gso.h > @@ -1,6 +1,7 @@ > #ifndef __LINUX_GSO_WRAPPER_H > #define __LINUX_GSO_WRAPPER_H > > +#include <linux/netdevice.h> > #include <linux/skbuff.h> > #include <net/protocol.h> > > @@ -69,4 +70,39 @@ static inline void skb_reset_inner_headers(struct sk_buff > *skb) > > #define ip_local_out rpl_ip_local_out > int ip_local_out(struct sk_buff *skb); > + > +#ifdef HAVE_INNER_PROTOCOL > +static inline void ovs_skb_set_inner_protocol(struct sk_buff *skb, > + __be16 ethertype) > +{ > + skb->inner_protocol = ethertype; > +} > + > +static inline __be16 ovs_skb_get_inner_protocol(struct sk_buff *skb) > +{ > + return skb->inner_protocol; > +} > +#else > +static inline void ovs_skb_set_inner_protocol(struct sk_buff *skb, > + __be16 ethertype) { > + OVS_CB(skb)->inner_protocol = ethertype; > +} > + > +static inline __be16 ovs_skb_get_inner_protocol(struct sk_buff *skb) > +{ > + return OVS_CB(skb)->inner_protocol; > +} > +#endif > + > +#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0) > +#define skb_gso_segment rpl_skb_gso_segment > +struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb, > + netdev_features_t features); > + > +#define __skb_gso_segment rpl___skb_gso_segment > +struct sk_buff *rpl___skb_gso_segment(struct sk_buff *skb, > + netdev_features_t features, > + bool tx_path); > +#endif /* before 3.11 */ > + > #endif > diff --git a/datapath/linux/compat/include/linux/netdevice.h > b/datapath/linux/compat/include/linux/netdevice.h > index 644e7d7..c98317d 100644 > --- 
a/datapath/linux/compat/include/linux/netdevice.h > +++ b/datapath/linux/compat/include/linux/netdevice.h > @@ -140,9 +140,6 @@ static inline struct net_device > *dev_get_by_index_rcu(struct net *net, int ifind > #endif > > #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,38) > -#define skb_gso_segment rpl_skb_gso_segment > -struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb, u32 features); > - > #define netif_skb_features rpl_netif_skb_features > u32 rpl_netif_skb_features(struct sk_buff *skb); > > @@ -158,15 +155,6 @@ static inline int rpl_netif_needs_gso(struct sk_buff > *skb, int features) > typedef u32 netdev_features_t; > #endif > > -#ifndef HAVE___SKB_GSO_SEGMENT > -static inline struct sk_buff *__skb_gso_segment(struct sk_buff *skb, > - netdev_features_t features, > - bool tx_path) > -{ > - return skb_gso_segment(skb, features); > -} > -#endif > - > #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37) > #define skb_has_frag_list skb_has_frags > #endif > diff --git a/datapath/linux/compat/include/linux/skbuff.h > b/datapath/linux/compat/include/linux/skbuff.h > index d485b39..02a0193 100644 > --- a/datapath/linux/compat/include/linux/skbuff.h > +++ b/datapath/linux/compat/include/linux/skbuff.h > @@ -251,4 +251,5 @@ static inline void skb_reset_mac_len(struct sk_buff *skb) > skb->mac_len = skb->network_header - skb->mac_header; > } > #endif > + > #endif > diff --git a/datapath/linux/compat/netdevice.c > b/datapath/linux/compat/netdevice.c > index d26fb5e..ef0f155 100644 > --- a/datapath/linux/compat/netdevice.c > +++ b/datapath/linux/compat/netdevice.c > @@ -71,32 +71,4 @@ u32 rpl_netif_skb_features(struct sk_buff *skb) > return harmonize_features(skb, protocol, features); > } > } > - > -struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb, u32 features) > -{ > - int vlan_depth = ETH_HLEN; > - __be16 type = skb->protocol; > - __be16 skb_proto; > - struct sk_buff *skb_gso; > - > - while (type == htons(ETH_P_8021Q)) { > - struct vlan_hdr *vh; > - > - if 
(unlikely(!pskb_may_pull(skb, vlan_depth + VLAN_HLEN))) > - return ERR_PTR(-EINVAL); > - > - vh = (struct vlan_hdr *)(skb->data + vlan_depth); > - type = vh->h_vlan_encapsulated_proto; > - vlan_depth += VLAN_HLEN; > - } > - > - /* this hack needed to get regular skb_gso_segment() */ > -#undef skb_gso_segment > - skb_proto = skb->protocol; > - skb->protocol = type; > - > - skb_gso = skb_gso_segment(skb, features); > - skb->protocol = skb_proto; > - return skb_gso; > -} > #endif /* kernel version < 2.6.38 */ > diff --git a/datapath/mpls.h b/datapath/mpls.h > new file mode 100644 > index 0000000..e72f2e7 > --- /dev/null > +++ b/datapath/mpls.h > @@ -0,0 +1,12 @@ > +#ifndef MPLS_H > +#define MPLS_H 1 > + > +#include <linux/if_ether.h> > + > +static inline bool eth_p_mpls(__be16 eth_type) > +{ > + return eth_type == htons(ETH_P_MPLS_UC) || > + eth_type == htons(ETH_P_MPLS_MC); > +} > + > +#endif > diff --git a/datapath/tunnel.c b/datapath/tunnel.c > index 9102786..162a099 100644 > --- a/datapath/tunnel.c > +++ b/datapath/tunnel.c > @@ -33,6 +33,7 @@ > #include "checksum.h" > #include "compat.h" > #include "datapath.h" > +#include "gso.h" > #include "tunnel.h" > #include "vlan.h" > #include "vport.h" > diff --git a/datapath/vport-netdev.c b/datapath/vport-netdev.c > index fe7e359..b2b0e99 100644 > --- a/datapath/vport-netdev.c > +++ b/datapath/vport-netdev.c > @@ -30,6 +30,8 @@ > > #include "checksum.h" > #include "datapath.h" > +#include "gso.h" > +#include "mpls.h" > #include "vlan.h" > #include "vport-internal_dev.h" > #include "vport-netdev.h" > @@ -299,6 +301,8 @@ static int netdev_send(struct vport *vport, struct > sk_buff *skb) > struct netdev_vport *netdev_vport = netdev_vport_priv(vport); > int mtu = netdev_vport->dev->mtu; > int len; > + __be16 inner_protocol; > + bool vlan, mpls; > > if (unlikely(packet_length(skb) > mtu && !skb_is_gso(skb))) { > net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n", > @@ -310,8 +314,17 @@ static int netdev_send(struct 
vport *vport, struct > sk_buff *skb) > skb->dev = netdev_vport->dev; > forward_ip_summed(skb, true); > > - if (vlan_tx_tag_present(skb) && !dev_supports_vlan_tx(skb->dev)) { > - int features; > + vlan = mpls = false; > + > + inner_protocol = ovs_skb_get_inner_protocol(skb); > + if (eth_p_mpls(skb->protocol) && !eth_p_mpls(inner_protocol)) > + mpls = true; > + > + if (vlan_tx_tag_present(skb) && !dev_supports_vlan_tx(skb->dev)) > + vlan = true; > + > + if (vlan || mpls) { > + netdev_features_t features; > > features = netif_skb_features(skb); > > @@ -319,6 +332,17 @@ static int netdev_send(struct vport *vport, struct > sk_buff *skb) > features &= ~(NETIF_F_TSO | NETIF_F_TSO6 | > NETIF_F_UFO | NETIF_F_FSO); > > + /* As of v3.11 the kernel provides an mpls_features field in > + * struct net_device which allows devices to advertise which > + * features they support for MPLS. This value defaults to > + * NETIF_F_SG and as of writing is not overridden anywhere. > + * This compatibility code is intended for older kernels which > + * do not support MPLS GSO and thus do not provide > + * mpls_features. Thus this code uses NETIF_F_SG directly in > + * place of mpls_features. */ > + if (mpls) > + features &= NETIF_F_SG; > + > if (netif_needs_gso(skb, features)) { > struct sk_buff *nskb; > > diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h > index e890fd8..ba21b53 100644 > --- a/include/linux/openvswitch.h > +++ b/include/linux/openvswitch.h > @@ -282,14 +282,13 @@ enum ovs_key_attr { > OVS_KEY_ATTR_ND, /* struct ovs_key_nd */ > OVS_KEY_ATTR_SKB_MARK, /* u32 skb mark */ > OVS_KEY_ATTR_TUNNEL, /* Nested set of ovs_tunnel attributes */ > + OVS_KEY_ATTR_MPLS, /* array of struct ovs_key_mpls. > + * The implementation may restrict > + * the accepted length of the array. */ > > #ifdef __KERNEL__ > OVS_KEY_ATTR_IPV4_TUNNEL, /* struct ovs_key_ipv4_tunnel */ > #endif > - > - OVS_KEY_ATTR_MPLS = 62, /* array of struct ovs_key_mpls. 
> - * The implementation may restrict > - * the accepted length of the array. */ > __OVS_KEY_ATTR_MAX > }; > > -- > 1.8.2.1
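
As a reading aid for the label stack walk that the quoted patch adds to ovs_flow_extract(), here is a minimal userspace sketch in C. It is not the datapath code: mpls_stack_len() is a hypothetical helper name, it operates on a plain byte buffer rather than an skb, and a simple length comparison stands in for check_header(). Only MPLS_HLEN and MPLS_BOS_MASK are taken from the patch. The idea is the same, though: step through 4-byte label stack entries, remember the top one, and stop at the entry with the bottom-of-stack bit set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htonl() */

#define MPLS_HLEN	4
#define MPLS_BOS_MASK	0x00000100	/* bottom-of-stack bit, host byte order */

/*
 * Hypothetical helper (not part of the patch): walk an MPLS label stack
 * that starts at buf[0].
 *
 * Returns the length of the stack in bytes, or -1 if the buffer ends
 * before an entry with the bottom-of-stack bit is found.  On success
 * *top_lse holds the first entry, still in network byte order, which is
 * how the patch stores key->mpls.top_lse.
 */
static int mpls_stack_len(const uint8_t *buf, size_t len, uint32_t *top_lse)
{
	size_t off = 0;

	assert(buf != NULL && top_lse != NULL);

	for (;;) {
		uint32_t lse;

		/* Stand-in for check_header(): is a whole entry present? */
		if (len - off < MPLS_HLEN)
			return -1;

		memcpy(&lse, buf + off, MPLS_HLEN);
		if (off == 0)
			*top_lse = lse;

		off += MPLS_HLEN;
		if (lse & htonl(MPLS_BOS_MASK))
			return (int)off;
	}
}
```

With a two-entry stack where only the second entry has the bottom-of-stack bit set, the helper reports 8 bytes, which corresponds to the offset by which the patch advances network_header past mac_len. A stack that is cut off before the bottom-of-stack entry yields -1, matching the error path taken when check_header() fails.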