On Oct 11, 2012, at 4:52 PM, Pravin Shelar <pshe...@nicira.com> wrote: > On Thu, Oct 11, 2012 at 2:44 PM, Kyle Mestery (kmestery) > <kmest...@cisco.com> wrote: >> On Oct 11, 2012, at 4:32 PM, Pravin Shelar <pshe...@nicira.com> wrote: >>> On Thu, Oct 11, 2012 at 1:52 PM, Kyle Mestery (kmestery) >>> <kmest...@cisco.com> wrote: >>>> >>>> On Oct 10, 2012, at 6:23 PM, Pravin Shelar <pshe...@nicira.com> wrote: >>>> >>>>> Patch looks good, I have few comments inlined. >>>>> >>>>> >>>>> On Tue, Oct 9, 2012 at 11:49 AM, Kyle Mestery <kmest...@cisco.com> wrote: >>>>>> This is a first pass at providing a tun_key which can be >>>>>> used as the basis for flow-based tunnelling. The >>>>>> tun_key includes and replaces the tun_id in both struct >>>>>> ovs_skb_cb and struct sw_tun_key. >>>>>> >>>>>> This patch allows all existing tun_id behaviour to still work. Existing >>>>>> users of tun_id are redirected to tun_key->tun_id to retain >>>>>> compatibility. >>>>>> However, when the userspace code is updated to make use of the new >>>>>> tun_key, >>>>>> the old behaviour will be deprecated and removed. >>>>>> >>>>>> NOTE: With these changes, the tunneling code no longer assumes input and >>>>>> output keys are symmetric. If they are not, PMTUD needs to be disabled >>>>>> for tunneling to work. >>>>>> >>>>>> Signed-off-by: Kyle Mestery <kmest...@cisco.com> >>>>>> Cc: Simon Horman <ho...@verge.net.au> >>>>>> Cc: Jesse Gross <je...@nicira.com> >>>>>> --- >>>>>> V6: >>>>>> - Fix more comments addressed from Jesse. >>>>>> V5: >>>>>> - Address another round of comments from Jesse. >>>>>> V4: >>>>>> - Address 2 comments from Jesse: >>>>>> - When processing actions, if OVS_CB(skb)->tun_key is NULL, point it at >>>>>> one >>>>>> on the stack temporarily. This goes away when we remove the ability to >>>>>> set >>>>>> tun_id outside the scope of tun_key. >>>>>> - Move tun_key to the end of struct sw_flow_key. >>>>>> V3: >>>>>> - Fix issues found during review by Jesse. >>>>>> - Add a NEWS entry around tunnel code no longer assuming symmetric input >>>>>> and >>>>>> output tunnel keys. >>>>>> >>>>>> V2: >>>>>> - Fix blank line addition/removal found by Simon. >>>>>> - Fix hex printing output found by Simon. >>>>>> --- >>>>>> NEWS | 3 ++ >>>>>> datapath/actions.c | 38 +++++++++++++---- >>>>>> datapath/datapath.c | 10 ++++- >>>>>> datapath/datapath.h | 5 ++- >>>>>> datapath/flow.c | 64 ++++++++++++++++++++++++---- >>>>>> datapath/flow.h | 12 ++++-- >>>>>> datapath/tunnel.c | 101 >>>>>> ++++++++++++++++++++++++++++---------------- >>>>>> datapath/tunnel.h | 15 ++++++- >>>>>> datapath/vport-capwap.c | 12 +++--- >>>>>> datapath/vport-gre.c | 15 ++++--- >>>>>> datapath/vport.c | 2 +- >>>>>> include/linux/openvswitch.h | 18 +++++++- >>>>>> lib/dpif-netdev.c | 1 + >>>>>> lib/odp-util.c | 15 ++++++- >>>>>> lib/odp-util.h | 3 +- >>>>>> 15 files changed, 235 insertions(+), 79 deletions(-) >>>>>> >>>>>> diff --git a/NEWS b/NEWS >>>>>> index 29fd9f3..ae831e3 100644 >>>>>> --- a/NEWS >>>>>> +++ b/NEWS >>>>>> @@ -1,5 +1,8 @@ >>>>>> post-v1.8.0 >>>>>> ------------------------ >>>>>> + - The tunneling code no longer assumes input and output keys are >>>>>> symmetric. >>>>>> + If they are not, PMTUD needs to be disabled for tunneling to >>>>>> work. Note >>>>>> + this only applies to flow-based keys. >>>>>> - FreeBSD is now a supported platform, thanks to code contributions >>>>>> from >>>>>> Gaetano Catalli, Ed Maste, and Giuseppe Lettieri. >>>>>> - ovs-bugtool: New --ovs option to report only OVS related information. >>>>>> diff --git a/datapath/actions.c b/datapath/actions.c >>>>>> index ec9b595..fa8c10d 100644 >>>>>> --- a/datapath/actions.c >>>>>> +++ b/datapath/actions.c >>>>>> @@ -37,7 +37,8 @@ >>>>>> #include "vport.h" >>>>>> >>>>>> static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, >>>>>> - const struct nlattr *attr, int len, bool >>>>>> keep_skb); >>>>>> + const struct nlattr *attr, int len, >>>>>> + struct ovs_key_ipv4_tunnel *tun_key, bool >>>>>> keep_skb); >>>>>> >>>>>> static int make_writable(struct sk_buff *skb, int write_len) >>>>>> { >>>>>> @@ -329,11 +330,14 @@ static int sample(struct datapath *dp, struct >>>>>> sk_buff *skb, >>>>>> } >>>>>> >>>>>> return do_execute_actions(dp, skb, nla_data(acts_list), >>>>>> - nla_len(acts_list), >>>>>> true); >>>>>> + nla_len(acts_list), >>>>>> + OVS_CB(skb)->tun_key, >>>>>> + true); >>>>>> } >>>>>> >>>>>> static int execute_set_action(struct sk_buff *skb, >>>>>> - const struct nlattr *nested_attr) >>>>>> + const struct nlattr *nested_attr, >>>>>> + struct ovs_key_ipv4_tunnel *tun_key) >>>>>> { >>>>>> int err = 0; >>>>>> >>>>>> @@ -343,7 +347,21 @@ static int execute_set_action(struct sk_buff *skb, >>>>>> break; >>>>>> >>>>>> case OVS_KEY_ATTR_TUN_ID: >>>>>> - OVS_CB(skb)->tun_id = nla_get_be64(nested_attr); >>>>>> + if (OVS_CB(skb)->tun_key == NULL) { >>>>>> + /* If tun_key is NULL for this skb, assign it to >>>>>> + * a value the caller passed in for action >>>>>> processing >>>>>> + * and output. This can disappear once we drop >>>>>> support >>>>>> + * for setting tun_id outside of tun_key. >>>>>> + */ >>>>>> + memset(tun_key, 0, sizeof(struct >>>>>> ovs_key_ipv4_tunnel)); >>>>>> + OVS_CB(skb)->tun_key = tun_key; >>>>>> + } >>>>>> + >>>>>> + OVS_CB(skb)->tun_key->tun_id = nla_get_be64(nested_attr); >>>>>> + break; >>>>>> + >>>>>> + case OVS_KEY_ATTR_IPV4_TUNNEL: >>>>>> + OVS_CB(skb)->tun_key = nla_data(nested_attr); >>>>>> break; >>>>>> >>>>>> case OVS_KEY_ATTR_ETHERNET: >>>>>> @@ -368,7 +386,8 @@ static int execute_set_action(struct sk_buff *skb, >>>>>> >>>>>> /* Execute a list of actions against 'skb'. */ >>>>>> static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, >>>>>> - const struct nlattr *attr, int len, bool >>>>>> keep_skb) >>>>>> + const struct nlattr *attr, int len, >>>>>> + struct ovs_key_ipv4_tunnel *tun_key, bool >>>>>> keep_skb) >>>>>> { >>>>>> /* Every output action needs a separate clone of 'skb', but the >>>>>> common >>>>>> * case is just a single output action, so that doing a clone and >>>>>> @@ -407,7 +426,7 @@ static int do_execute_actions(struct datapath *dp, >>>>>> struct sk_buff *skb, >>>>>> break; >>>>>> >>>>>> case OVS_ACTION_ATTR_SET: >>>>>> - err = execute_set_action(skb, nla_data(a)); >>>>>> + err = execute_set_action(skb, nla_data(a), >>>>>> tun_key); >>>>>> break; >>>>>> >>>>>> case OVS_ACTION_ATTR_SAMPLE: >>>>>> @@ -458,6 +477,9 @@ int ovs_execute_actions(struct datapath *dp, struct >>>>>> sk_buff *skb) >>>>>> struct sw_flow_actions *acts = >>>>>> rcu_dereference(OVS_CB(skb)->flow->sf_acts); >>>>>> struct loop_counter *loop; >>>>>> int error; >>>>>> + struct ovs_key_ipv4_tunnel tun_key; >>>>>> + >>>>>> + memset(&tun_key, 0, sizeof(tun_key)); >>>>>> >>>>> tun key will zeroed while executing tunnel set action, Do we still >>>>> need to zero it here? >>>>> >>>>>> /* Check whether we've looped too much. */ >>>>>> loop = &__get_cpu_var(loop_counters); >>>>>> @@ -469,9 +491,9 @@ int ovs_execute_actions(struct datapath *dp, struct >>>>>> sk_buff *skb) >>>>>> goto out_loop; >>>>>> } >>>>>> >>>>>> - OVS_CB(skb)->tun_id = 0; >>>>>> + OVS_CB(skb)->tun_key = NULL; >>>>>> error = do_execute_actions(dp, skb, acts->actions, >>>>>> - acts->actions_len, false); >>>>>> + acts->actions_len, &tun_key, >>>>>> false); >>>>>> >>>>>> /* Check whether sub-actions looped too much. */ >>>>>> if (unlikely(loop->looping)) >>>>>> diff --git a/datapath/datapath.c b/datapath/datapath.c >>>>>> index a6915fb..d8a198e 100644 >>>>>> --- a/datapath/datapath.c >>>>>> +++ b/datapath/datapath.c >>>>>> @@ -587,12 +587,20 @@ static int validate_set(const struct nlattr *a, >>>>>> >>>>>> switch (key_type) { >>>>>> const struct ovs_key_ipv4 *ipv4_key; >>>>>> + const struct ovs_key_ipv4_tunnel *tun_key; >>>>>> >>>>>> case OVS_KEY_ATTR_PRIORITY: >>>>>> case OVS_KEY_ATTR_TUN_ID: >>>>>> case OVS_KEY_ATTR_ETHERNET: >>>>>> break; >>>>>> >>>>>> + case OVS_KEY_ATTR_IPV4_TUNNEL: >>>>>> + tun_key = nla_data(ovs_key); >>>>>> + if (!tun_key->ipv4_dst) { >>>>>> + return -EINVAL; >>>>>> + } >>>>>> + break; >>>>>> + >>>>>> case OVS_KEY_ATTR_IPV4: >>>>>> if (flow_key->eth.type != htons(ETH_P_IP)) >>>>>> return -EINVAL; >>>>>> @@ -785,7 +793,7 @@ static int ovs_packet_cmd_execute(struct sk_buff >>>>>> *skb, struct genl_info *info) >>>>>> >>>>>> err = ovs_flow_metadata_from_nlattrs(&flow->key.phy.priority, >>>>>> &flow->key.phy.in_port, >>>>>> - &flow->key.phy.tun_id, >>>>>> + &flow->key.tun.tun_key, >>>>>> a[OVS_PACKET_ATTR_KEY]); >>>>>> if (err) >>>>>> goto err_flow_put; >>>>>> diff --git a/datapath/datapath.h b/datapath/datapath.h >>>>>> index affbf0e..c5df12d 100644 >>>>>> --- a/datapath/datapath.h >>>>>> +++ b/datapath/datapath.h >>>>>> @@ -96,7 +96,8 @@ struct datapath { >>>>>> /** >>>>>> * struct ovs_skb_cb - OVS data in skb CB >>>>>> * @flow: The flow associated with this packet. May be %NULL if no flow. >>>>>> - * @tun_id: ID of the tunnel that encapsulated this packet. It is 0 if >>>>>> the >>>>>> + * @tun_key: Key for the tunnel that encapsulated this packet. NULL if >>>>>> the >>>>>> + * packet is not being tunneled. >>>>>> * @ip_summed: Consistently stores L4 checksumming status across different >>>>>> * kernel versions. >>>>>> * @csum_start: Stores the offset from which to start checksumming >>>>>> independent >>>>>> @@ -107,7 +108,7 @@ struct datapath { >>>>>> */ >>>>>> struct ovs_skb_cb { >>>>>> struct sw_flow *flow; >>>>>> - __be64 tun_id; >>>>>> + struct ovs_key_ipv4_tunnel *tun_key; >>>>>> #ifdef NEED_CSUM_NORMALIZE >>>>>> enum csum_type ip_summed; >>>>>> u16 csum_start; >>>>>> diff --git a/datapath/flow.c b/datapath/flow.c >>>>>> index d07337c..376f4be 100644 >>>>>> --- a/datapath/flow.c >>>>>> +++ b/datapath/flow.c >>>>>> @@ -629,7 +629,8 @@ int ovs_flow_extract(struct sk_buff *skb, u16 >>>>>> in_port, struct sw_flow_key *key, >>>>>> memset(key, 0, sizeof(*key)); >>>>>> >>>>>> key->phy.priority = skb->priority; >>>>>> - key->phy.tun_id = OVS_CB(skb)->tun_id; >>>>>> + if (OVS_CB(skb)->tun_key) >>>>>> + key->tun.tun_key = *OVS_CB(skb)->tun_key; >>>>>> key->phy.in_port = in_port; >>>>>> >>>>>> skb_reset_mac_header(skb); >>>>>> @@ -847,6 +848,7 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = { >>>>>> >>>>>> /* Not upstream. */ >>>>>> [OVS_KEY_ATTR_TUN_ID] = sizeof(__be64), >>>>>> + [OVS_KEY_ATTR_IPV4_TUNNEL] = sizeof(struct ovs_key_ipv4_tunnel), >>>>>> }; >>>>>> >>>>>> static int ipv4_flow_from_nlattrs(struct sw_flow_key *swkey, int >>>>>> *key_len, >>>>>> @@ -1022,9 +1024,30 @@ int ovs_flow_from_nlattrs(struct sw_flow_key >>>>>> *swkey, int *key_lenp, >>>>>> swkey->phy.in_port = DP_MAX_PORTS; >>>>>> } >>>>>> >>>>>> - if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) { >>>>>> - swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]); >>>>>> + /* OVS_KEY_ATTR_TUN_ID and OVS_KEY_ATTR_IPV4_TUNEL must both >>>>>> arrive >>>>>> + * together. >>>>>> + */ >>>>>> + if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID) && >>>>>> + attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) { >>>>>> + struct ovs_key_ipv4_tunnel *tun_key; >>>>>> + __be64 tun_id = 0; >>>>>> + tun_key = nla_data(a[OVS_KEY_ATTR_IPV4_TUNNEL]); >>>>>> + tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]); >>>>>> + >>>>>> + if (tun_id != tun_key->tun_id) >>>>>> + return -EINVAL; >>>>>> + >>>>>> + swkey->tun.tun_key = *tun_key; >>>>>> attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID); >>>>>> + attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL); >>>>>> + } else if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) { >>>>>> + swkey->tun.tun_key.tun_id = >>>>>> nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]); >>>>>> + attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID); >>>>>> + } else if (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) { >>>>>> + struct ovs_key_ipv4_tunnel *tun_key; >>>>>> + tun_key = nla_data(a[OVS_KEY_ATTR_IPV4_TUNNEL]); >>>>>> + swkey->tun.tun_key = *tun_key; >>>>>> + attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL); >>>>>> } >>>>> >>>>> While validating set action OVS_KEY_ATTR_IPV4_TUNNEL, we are checking >>>>> for ipv4_dst, I think we need similar check here. >>>>> >>>>>> >>>>>> /* Data attributes. */ >>>>>> @@ -1162,14 +1185,16 @@ int ovs_flow_from_nlattrs(struct sw_flow_key >>>>>> *swkey, int *key_lenp, >>>>>> * get the metadata, that is, the parts of the flow key that cannot be >>>>>> * extracted from the packet itself. >>>>>> */ >>>>>> -int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 >>>>>> *tun_id, >>>>>> +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, >>>>>> + struct ovs_key_ipv4_tunnel *tun_key, >>>>>> const struct nlattr *attr) >>>>>> { >>>>>> const struct nlattr *nla; >>>>>> int rem; >>>>>> + __be64 tun_id = 0; >>>>>> >>>>>> *in_port = DP_MAX_PORTS; >>>>>> - *tun_id = 0; >>>>>> + memset(tun_key, 0, sizeof(*tun_key)); >>>>>> *priority = 0; >>>>>> >>>>>> nla_for_each_nested(nla, attr, rem) { >>>>>> @@ -1185,7 +1210,20 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, >>>>>> u16 *in_port, __be64 *tun_id, >>>>>> break; >>>>>> >>>>>> case OVS_KEY_ATTR_TUN_ID: >>>>>> - *tun_id = nla_get_be64(nla); >>>>>> + tun_id = nla_get_be64(nla); >>>>>> + if ((tun_key->tun_flags & >>>>>> FLOW_TNL_F_KEY) && >>>>>> + tun_key->tun_id != tun_id) >>>>>> + return -EINVAL; >>>>>> + break; >>>>>> + >>>>>> + case OVS_KEY_ATTR_IPV4_TUNNEL: >>>>>> + if (tun_key->tun_flags & FLOW_TNL_F_KEY) >>>>>> + tun_id = tun_key->tun_id; >>>>>> + memcpy(tun_key, nla_data(nla), >>>>>> + sizeof(*tun_key)); >>>>>> + if ((tun_key->tun_flags & >>>>>> FLOW_TNL_F_KEY) && >>>>>> + tun_id != tun_key->tun_id) >>>>>> + return -EINVAL; >>>>>> break; >>>>>> >>>>>> case OVS_KEY_ATTR_IN_PORT: >>>>>> @@ -1210,10 +1248,20 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key >>>>>> *swkey, struct sk_buff *skb) >>>>>> nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority)) >>>>>> goto nla_put_failure; >>>>>> >>>>>> - if (swkey->phy.tun_id != cpu_to_be64(0) && >>>>>> - nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id)) >>>>>> + if ((swkey->tun.tun_key.tun_flags & FLOW_TNL_F_KEY) && >>>>>> + nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, >>>>>> swkey->tun.tun_key.tun_id)) >>>>>> goto nla_put_failure; >>>>>> >>>>>> + if (swkey->tun.tun_key.ipv4_dst) { >>>>>> + struct ovs_key_ipv4_tunnel *tun_key; >>>>>> + nla = nla_reserve(skb, OVS_KEY_ATTR_IPV4_TUNNEL, >>>>>> + sizeof(*tun_key)); >>>>>> + if (!nla) >>>>>> + goto nla_put_failure; >>>>>> + tun_key = nla_data(nla); >>>>>> + *tun_key = swkey->tun.tun_key; >>>>>> + } >>>>>> + >>>>> >>>>> Why are we sending both OVS_KEY_ATTR_TUN_ID and >>>>> OVS_KEY_ATTR_IPV4_TUNNEL attributed here? >>>>> we could first check for ipv4_dst else for FLOW_TNL_F_KEY in flow key. >>>>> >>>>>> if (swkey->phy.in_port != DP_MAX_PORTS && >>>>>> nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, swkey->phy.in_port)) >>>>>> goto nla_put_failure; >>>>>> diff --git a/datapath/flow.h b/datapath/flow.h >>>>>> index 02c563a..4430b32 100644 >>>>>> --- a/datapath/flow.h >>>>>> +++ b/datapath/flow.h >>>>>> @@ -42,10 +42,12 @@ struct sw_flow_actions { >>>>>> >>>>>> struct sw_flow_key { >>>>>> struct { >>>>>> - __be64 tun_id; /* Encapsulating tunnel ID. */ >>>>>> u32 priority; /* Packet QoS priority. */ >>>>>> u16 in_port; /* Input switch port (or >>>>>> DP_MAX_PORTS). */ >>>>>> } phy; >>>>>> + struct { >>>>>> + struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating >>>>>> tunnel key. */ >>>>>> + } tun; >>>>>> struct { >>>>>> u8 src[ETH_ALEN]; /* Ethernet source address. */ >>>>>> u8 dst[ETH_ALEN]; /* Ethernet destination address. */ >>>>>> @@ -150,6 +152,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies); >>>>>> * ------ --- ------ ----- >>>>>> * OVS_KEY_ATTR_PRIORITY 4 -- 4 8 >>>>>> * OVS_KEY_ATTR_TUN_ID 8 -- 4 12 >>>>>> + * OVS_KEY_ATTR_IPV4_TUNNEL 24 -- 4 28 >>>>>> * OVS_KEY_ATTR_IN_PORT 4 -- 4 8 >>>>>> * OVS_KEY_ATTR_ETHERNET 12 -- 4 16 >>>>>> * OVS_KEY_ATTR_ETHERTYPE 2 2 4 8 (outer VLAN >>>>>> ethertype) >>>>>> @@ -160,14 +163,15 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies); >>>>>> * OVS_KEY_ATTR_ICMPV6 2 2 4 8 >>>>>> * OVS_KEY_ATTR_ND 28 -- 4 32 >>>>>> * ------------------------------------------------- >>>>>> - * total 156 >>>>>> + * total 184 >>>>>> */ >>>>>> -#define FLOW_BUFSIZE 156 >>>>>> +#define FLOW_BUFSIZE 184 >>>>>> >>>>>> int ovs_flow_to_nlattrs(const struct sw_flow_key *, struct sk_buff *); >>>>>> int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp, >>>>>> const struct nlattr *); >>>>>> -int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 >>>>>> *tun_id, >>>>>> +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, >>>>>> + struct ovs_key_ipv4_tunnel *tun_key, >>>>>> const struct nlattr *); >>>>>> >>>>>> #define MAX_ACTIONS_BUFSIZE (16 * 1024) >>>>>> diff --git a/datapath/tunnel.c b/datapath/tunnel.c >>>>>> index d651c11..739f098 100644 >>>>>> --- a/datapath/tunnel.c >>>>>> +++ b/datapath/tunnel.c >>>>>> @@ -367,9 +367,9 @@ struct vport *ovs_tnl_find_port(struct net *net, >>>>>> __be32 saddr, __be32 daddr, >>>>>> return NULL; >>>>> ñ> } >>>>>> >>>>>> -static void ecn_decapsulate(struct sk_buff *skb, u8 tos) >>>>>> +static void ecn_decapsulate(struct sk_buff *skb) >>>>>> { >>>>>> - if (unlikely(INET_ECN_is_ce(tos))) { >>>>>> + if (unlikely(INET_ECN_is_ce(OVS_CB(skb)->tun_key->ipv4_tos))) { >>>>>> __be16 protocol = skb->protocol; >>>>>> >>>>>> skb_set_network_header(skb, ETH_HLEN); >>>>>> @@ -416,7 +416,7 @@ static void ecn_decapsulate(struct sk_buff *skb, u8 >>>>>> tos) >>>>>> * - skb->csum does not include the inner Ethernet header. >>>>>> * - The layer pointers are undefined. >>>>>> */ >>>>>> -void ovs_tnl_rcv(struct vport *vport, struct sk_buff *skb, u8 tos) >>>>>> +void ovs_tnl_rcv(struct vport *vport, struct sk_buff *skb) >>>>>> { >>>>>> struct ethhdr *eh; >>>>>> >>>>>> @@ -433,7 +433,7 @@ void ovs_tnl_rcv(struct vport *vport, struct sk_buff >>>>>> *skb, u8 tos) >>>>>> skb_clear_rxhash(skb); >>>>>> secpath_reset(skb); >>>>>> >>>>>> - ecn_decapsulate(skb, tos); >>>>>> + ecn_decapsulate(skb); >>>>>> vlan_set_tci(skb, 0); >>>>>> >>>>>> if (unlikely(compute_ip_summed(skb, false))) { >>>>>> @@ -613,7 +613,7 @@ static void ipv6_build_icmp(struct sk_buff *skb, >>>>>> struct sk_buff *nskb, >>>>>> >>>>>> bool ovs_tnl_frag_needed(struct vport *vport, >>>>>> const struct tnl_mutable_config *mutable, >>>>>> - struct sk_buff *skb, unsigned int mtu, __be64 >>>>>> flow_key) >>>>>> + struct sk_buff *skb, unsigned int mtu) >>>>>> { >>>>>> unsigned int eth_hdr_len = ETH_HLEN; >>>>>> unsigned int total_length = 0, header_length = 0, payload_length; >>>>>> @@ -697,17 +697,6 @@ bool ovs_tnl_frag_needed(struct vport *vport, >>>>>> ipv6_build_icmp(skb, nskb, mtu, payload_length); >>>>>> #endif >>>>>> >>>>>> - /* >>>>>> - * Assume that flow based keys are symmetric with respect to >>>>>> input >>>>>> - * and output and use the key that we were going to put on the >>>>>> - * outgoing packet for the fake received packet. If the keys are >>>>>> - * not symmetric then PMTUD needs to be disabled since we won't >>>>>> have >>>>>> - * any way of synthesizing packets. >>>>>> - */ >>>>>> - if ((mutable->flags & (TNL_F_IN_KEY_MATCH | >>>>>> TNL_F_OUT_KEY_ACTION)) == >>>>>> - (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) >>>>>> - OVS_CB(nskb)->tun_id = flow_key; >>>>>> - >>>>>> if (unlikely(compute_ip_summed(nskb, false))) { >>>>>> kfree_skb(nskb); >>>>>> return false; >>>>>> @@ -723,12 +712,20 @@ static bool check_mtu(struct sk_buff *skb, >>>>>> const struct tnl_mutable_config *mutable, >>>>>> const struct rtable *rt, __be16 *frag_offp) >>>>>> { >>>>>> - bool df_inherit = mutable->flags & TNL_F_DF_INHERIT; >>>>>> + bool df_inherit; >>>>>> bool pmtud = mutable->flags & TNL_F_PMTUD; >>>>>> - __be16 frag_off = mutable->flags & TNL_F_DF_DEFAULT ? >>>>>> htons(IP_DF) : 0; >>>>>> + __be16 frag_off; >>>>>> int mtu = 0; >>>>>> unsigned int packet_length = skb->len - ETH_HLEN; >>>>>> >>>>>> + if (OVS_CB(skb)->tun_key->ipv4_dst != 0) { >>>>>> + df_inherit = false; >>>>>> + frag_off = OVS_CB(skb)->tun_key->tun_flags & >>>>>> FLOW_TNL_F_DONT_FRAGMENT; >>>>>> + } else { >>>>>> + df_inherit = mutable->flags & TNL_F_DF_INHERIT; >>>>>> + frag_off = mutable->flags & TNL_F_DF_DEFAULT ? >>>>>> htons(IP_DF) : 0; >>>>>> + } >>>>>> + >>>>> Why not set pmtud to false incase of flow-based tunneling like you did >>>>> it for df_inherit? >>>>> >>>>>> /* Allow for one level of tagging in the packet length. */ >>>>>> if (!vlan_tx_tag_present(skb) && >>>>>> eth_hdr(skb)->h_proto == htons(ETH_P_8021Q)) >>>>>> @@ -760,8 +757,7 @@ static bool check_mtu(struct sk_buff *skb, >>>>>> mtu = max(mtu, IP_MIN_MTU); >>>>>> >>>>>> if (packet_length > mtu && >>>>>> - ovs_tnl_frag_needed(vport, mutable, skb, mtu, >>>>>> - OVS_CB(skb)->tun_id)) >>>>>> + ovs_tnl_frag_needed(vport, mutable, skb, >>>>>> mtu)) >>>>>> return false; >>>>>> } >>>>>> } >>>>>> @@ -777,8 +773,7 @@ static bool check_mtu(struct sk_buff *skb, >>>>>> mtu = max(mtu, IPV6_MIN_MTU); >>>>>> >>>>>> if (packet_length > mtu && >>>>>> - ovs_tnl_frag_needed(vport, mutable, skb, mtu, >>>>>> - OVS_CB(skb)->tun_id)) >>>>>> + ovs_tnl_frag_needed(vport, mutable, skb, >>>>>> mtu)) >>>>>> return false; >>>>>> } >>>>>> } >>>>>> @@ -1000,15 +995,16 @@ unlock: >>>>>> } >>>>>> >>>>>> static struct rtable *__find_route(const struct tnl_mutable_config >>>>>> *mutable, >>>>>> - u8 ipproto, u8 tos) >>>>>> + __be32 saddr, __be32 daddr, u8 >>>>>> ipproto, >>>>>> + u8 tos) >>>>>> { >>>>>> /* Tunnel configuration keeps DSCP part of TOS bits, But Linux >>>>>> * router expect RT_TOS bits only. */ >>>>>> >>>>>> #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,39) >>>>>> struct flowi fl = { .nl_u = { .ip4_u = { >>>>>> - .daddr = mutable->key.daddr, >>>>>> - .saddr = mutable->key.saddr, >>>>>> + .daddr = daddr, >>>>>> + .saddr = saddr, >>>>>> .tos = RT_TOS(tos) } }, >>>>>> .proto = ipproto }; >>>>>> struct rtable *rt; >>>>>> @@ -1018,8 +1014,8 @@ static struct rtable *__find_route(const struct >>>>>> tnl_mutable_config *mutable, >>>>>> >>>>>> return rt; >>>>>> #else >>>>>> - struct flowi4 fl = { .daddr = mutable->key.daddr, >>>>>> - .saddr = mutable->key.saddr, >>>>>> + struct flowi4 fl = { .daddr = daddr, >>>>>> + .saddr = saddr, >>>>>> .flowi4_tos = RT_TOS(tos), >>>>>> .flowi4_proto = ipproto }; >>>>>> >>>>>> @@ -1029,7 +1025,8 @@ static struct rtable *__find_route(const struct >>>>>> tnl_mutable_config *mutable, >>>>>> >>>>>> static struct rtable *find_route(struct vport *vport, >>>>>> const struct tnl_mutable_config *mutable, >>>>>> - u8 tos, struct tnl_cache **cache) >>>>>> + __be32 saddr, __be32 daddr, u8 tos, >>>>>> + struct tnl_cache **cache) >>>>>> { >>>>>> struct tnl_vport *tnl_vport = tnl_vport_priv(vport); >>>>>> struct tnl_cache *cur_cache = rcu_dereference(tnl_vport->cache); >>>>>> @@ -1037,18 +1034,21 @@ static struct rtable *find_route(struct vport >>>>>> *vport, >>>>>> *cache = NULL; >>>>>> tos = RT_TOS(tos); >>>>>> >>>>>> - if (likely(tos == RT_TOS(mutable->tos) && >>>>>> - check_cache_valid(cur_cache, mutable))) { >>>>>> + if (daddr == mutable->key.daddr && saddr == mutable->key.saddr && >>>>>> + tos == RT_TOS(mutable->tos) && >>>>>> + check_cache_valid(cur_cache, mutable)) { >>>>>> *cache = cur_cache; >>>>>> return cur_cache->rt; >>>>>> } else { >>>>>> struct rtable *rt; >>>>>> >>>>>> - rt = __find_route(mutable, tnl_vport->tnl_ops->ipproto, >>>>>> tos); >>>>>> + rt = __find_route(mutable, saddr, daddr, >>>>>> + tnl_vport->tnl_ops->ipproto, tos); >>>>>> if (IS_ERR(rt)) >>>>>> return NULL; >>>>>> >>>>>> - if (likely(tos == RT_TOS(mutable->tos))) >>>>>> + if (daddr == mutable->key.daddr && saddr == >>>>>> mutable->key.saddr && >>>>>> + tos == RT_TOS(mutable->tos)) >>>>>> *cache = build_cache(vport, mutable, rt); >>>>>> >>>>> I am not sure why checks are added on saddr and daddr here? >>>>> header caching is not set in case of flow based tunneling to >>>>> build_cache call is no-op anyways. >>>>> >>>>>> return rt; >>>>>> @@ -1178,8 +1178,11 @@ int ovs_tnl_send(struct vport *vport, struct >>>>>> sk_buff *skb) >>>>>> struct rtable *rt; >>>>>> struct dst_entry *unattached_dst = NULL; >>>>>> struct tnl_cache *cache; >>>>>> + struct ovs_key_ipv4_tunnel tun_key; >>>>>> int sent_len = 0; >>>>>> __be16 frag_off = 0; >>>>>> + __be32 daddr; >>>>>> + __be32 saddr; >>>>>> u8 ttl; >>>>>> u8 inner_tos; >>>>>> u8 tos; >>>>>> @@ -1207,6 +1210,13 @@ int ovs_tnl_send(struct vport *vport, struct >>>>>> sk_buff *skb) >>>>>> } >>>>>> #endif >>>>>> >>>>>> + /* If OVS_CB(skb)->tun_key is NULL, point it at the local >>>>>> tun_key here, >>>>>> + * and zero it out. >>>>>> + */ >>>>>> + if (OVS_CB(skb)->tun_key == NULL) { >>>>>> + memset(&tun_key, 0, sizeof(tun_key)); >>>>>> + OVS_CB(skb)->tun_key = &tun_key; >>>>>> + } >>>>>> /* ToS */ >>>>>> if (skb->protocol == htons(ETH_P_IP)) >>>>>> inner_tos = ip_hdr(skb)->tos; >>>>>> @@ -1219,11 +1229,23 @@ int ovs_tnl_send(struct vport *vport, struct >>>>>> sk_buff *skb) >>>>>> >>>>>> if (mutable->flags & TNL_F_TOS_INHERIT) >>>>>> tos = inner_tos; >>>>>> + else if (OVS_CB(skb)->tun_key->ipv4_dst != 0) >>>>>> + tos = OVS_CB(skb)->tun_key->ipv4_tos; >>>>>> else >>>>>> tos = mutable->tos; >>>>>> >>>>>> + if (OVS_CB(skb)->tun_key->ipv4_dst != 0) >>>>>> + daddr = OVS_CB(skb)->tun_key->ipv4_dst; >>>>>> + else >>>>>> + daddr = mutable->key.daddr; >>>>>> + >>>>>> + if (OVS_CB(skb)->tun_key->ipv4_dst != 0) >>>>>> + saddr = OVS_CB(skb)->tun_key->ipv4_src; >>>>>> + else >>>>>> + saddr = mutable->key.saddr; >>>>>> + >>>>> >>>>> It will be easier to read if we have one if check on >>>>> OVS_CB(skb)->tun_key->ipv4_dst which is retrieve all tunnel parameter >>>>> for flow-based tunning and else block for campat code. >>>>> >>>>>> /* Route lookup */ >>>>>> - rt = find_route(vport, mutable, tos, &cache); >>>>>> + rt = find_route(vport, mutable, saddr, daddr, tos, &cache); >>>>>> if (unlikely(!rt)) >>>>>> goto error_free; >>>>>> if (unlikely(!cache)) >>>>>> @@ -1260,9 +1282,13 @@ int ovs_tnl_send(struct vport *vport, struct >>>>>> sk_buff *skb) >>>>>> } >>>>>> >>>>>> /* TTL */ >>>>>> - ttl = mutable->ttl; >>>>>> - if (!ttl) >>>>>> - ttl = ip4_dst_hoplimit(&rt_dst(rt)); >>>>>> + if (OVS_CB(skb)->tun_key->ipv4_dst != 0) >>>>>> + ttl = OVS_CB(skb)->tun_key->ipv4_ttl; >>>>>> + else { >>>>>> + ttl = mutable->ttl; >>>>>> + if (!ttl) >>>>>> + ttl = ip4_dst_hoplimit(&rt_dst(rt)); >>>>>> + } >>>>>> >>>>>> if (mutable->flags & TNL_F_TTL_INHERIT) { >>>>> >>>>> This check could be in the compat code block, so it is less confusing. >>>>> >>>>>> if (skb->protocol == htons(ETH_P_IP)) >>>>>> @@ -1442,7 +1468,8 @@ static int tnl_set_config(struct net *net, struct >>>>>> nlattr *options, >>>>>> struct net_device *dev; >>>>>> struct rtable *rt; >>>>>> >>>>>> - rt = __find_route(mutable, tnl_ops->ipproto, >>>>>> mutable->tos); >>>>>> + rt = __find_route(mutable, mutable->key.saddr, >>>>>> mutable->key.daddr, >>>>>> + tnl_ops->ipproto, mutable->tos); >>>>>> if (IS_ERR(rt)) >>>>>> return -EADDRNOTAVAIL; >>>>>> dev = rt_dst(rt).dev; >>>>>> diff --git a/datapath/tunnel.h b/datapath/tunnel.h >>>>>> index 1924017..b7858d3 100644 >>>>>> --- a/datapath/tunnel.h >>>>>> +++ b/datapath/tunnel.h >>>>>> @@ -269,14 +269,14 @@ int ovs_tnl_set_addr(struct vport *vport, const >>>>>> unsigned char *addr); >>>>>> const char *ovs_tnl_get_name(const struct vport *vport); >>>>>> const unsigned char *ovs_tnl_get_addr(const struct vport *vport); >>>>>> int ovs_tnl_send(struct vport *vport, struct sk_buff *skb); >>>>>> -void ovs_tnl_rcv(struct vport *vport, struct sk_buff *skb, u8 tos); >>>>>> +void ovs_tnl_rcv(struct vport *vport, struct sk_buff *skb); >>>>>> >>>>>> struct vport *ovs_tnl_find_port(struct net *net, __be32 saddr, __be32 >>>>>> daddr, >>>>>> __be64 key, int tunnel_type, >>>>>> const struct tnl_mutable_config **mutable); >>>>>> bool ovs_tnl_frag_needed(struct vport *vport, >>>>>> const struct tnl_mutable_config *mutable, >>>>>> - struct sk_buff *skb, unsigned int mtu, __be64 >>>>>> flow_key); >>>>>> + struct sk_buff *skb, unsigned int mtu); >>>>>> void ovs_tnl_free_linked_skbs(struct sk_buff *skb); >>>>>> >>>>>> int ovs_tnl_init(void); >>>>>> @@ -286,4 +286,15 @@ static inline struct tnl_vport >>>>>> *tnl_vport_priv(const struct vport *vport) >>>>>> return vport_priv(vport); >>>>>> } >>>>>> >>>>>> +static inline void tnl_tun_key_init(struct ovs_key_ipv4_tunnel *tun_key, >>>>>> + const struct iphdr *iph, __be64 >>>>>> tun_id) >>>>>> +{ >>>>>> + tun_key->tun_id = tun_id; >>>>>> + tun_key->ipv4_src = iph->saddr; >>>>>> + tun_key->ipv4_dst = iph->daddr; >>>>>> + tun_key->ipv4_tos = iph->tos; >>>>>> + tun_key->ipv4_ttl = iph->ttl; >>>>>> + tun_key->tun_flags = 0; >>>>>> +} >>>>>> + >>>>>> #endif /* tunnel.h */ >>>>>> diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c >>>>>> index 05a099d..fbf50d9 100644 >>>>>> --- a/datapath/vport-capwap.c >>>>>> +++ b/datapath/vport-capwap.c >>>>>> @@ -220,7 +220,7 @@ static struct sk_buff *capwap_update_header(const >>>>>> struct vport *vport, >>>>>> struct capwaphdr_wsi *wsi = (struct capwaphdr_wsi *)(cwh + >>>>>> 1); >>>>>> struct capwaphdr_wsi_key *opt = (struct capwaphdr_wsi_key >>>>>> *)(wsi + 1); >>>>>> >>>>>> - opt->key = OVS_CB(skb)->tun_id; >>>>>> + opt->key = OVS_CB(skb)->tun_key->tun_id; >>>>>> } >>>>>> >>>>>> udph->len = htons(skb->len - skb_transport_offset(skb)); >>>>>> @@ -316,6 +316,7 @@ static int capwap_rcv(struct sock *sk, struct >>>>>> sk_buff *skb) >>>>>> struct vport *vport; >>>>>> const struct tnl_mutable_config *mutable; >>>>>> struct iphdr *iph; >>>>>> + struct ovs_key_ipv4_tunnel tun_key; >>>>>> __be64 key = 0; >>>>>> >>>>>> if (unlikely(!pskb_may_pull(skb, CAPWAP_MIN_HLEN + ETH_HLEN))) >>>>>> @@ -333,12 +334,11 @@ static int capwap_rcv(struct sock *sk, struct >>>>>> sk_buff *skb) >>>>>> goto error; >>>>>> } >>>>>> >>>>>> - if (mutable->flags & TNL_F_IN_KEY_MATCH) >>>>>> - OVS_CB(skb)->tun_id = key; >>>>>> - else >>>>>> - OVS_CB(skb)->tun_id = 0; >>>>>> + tnl_tun_key_init(&tun_key, iph, >>>>>> + mutable->flags & TNL_F_IN_KEY_MATCH ? key : 0); >>>>>> + OVS_CB(skb)->tun_key = &tun_key; >>>>>> >>>>>> - ovs_tnl_rcv(vport, skb, iph->tos); >>>>>> + ovs_tnl_rcv(vport, skb); >>>>>> goto out; >>>>>> >>>>>> error: >>>>>> diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c >>>>>> index ab89c5b..8076b1d 100644 >>>>>> --- a/datapath/vport-gre.c >>>>>> +++ b/datapath/vport-gre.c >>>>>> @@ -103,7 +103,7 @@ static struct sk_buff *gre_update_header(const >>>>>> struct vport *vport, >>>>>> >>>>>> /* Work backwards over the options so the checksum is last. */ >>>>>> if (mutable->flags & TNL_F_OUT_KEY_ACTION) >>>>>> - *options = be64_get_low32(OVS_CB(skb)->tun_id); >>>>>> + *options = be64_get_low32(OVS_CB(skb)->tun_key->tun_id); >>>>>> >>>>> Here we do need check if we doing flow-based tunneling. in case of >>>>> flow-based tunneling mutable is shared between all tunnels, so there >>>>> is no-way to we can do port specific flags check from mutable. about >>>>> GRE checksum, we need to check for FLOW_TNL_F_CSUM flag for doing >>>>> checksum on a packet. >>>>> >>>>>> if (mutable->out_key || mutable->flags & TNL_F_OUT_KEY_ACTION) >>>>>> options--; >>>>>> @@ -285,7 +285,7 @@ static void gre_err(struct sk_buff *skb, u32 info) >>>>>> #endif >>>>>> >>>>>> __skb_pull(skb, tunnel_hdr_len); >>>>>> - ovs_tnl_frag_needed(vport, mutable, skb, mtu, key); >>>>>> + ovs_tnl_frag_needed(vport, mutable, skb, mtu); >>>>>> __skb_push(skb, tunnel_hdr_len); >>>>>> >>>>> same here, tunnel_hdr_len is stored in mutable which is going to be >>>>> shared for all tunnels for a type in flow based tunneling, so we need >>>>> to calculate it on packet send. >>>>> >>>> I looked, and in the context of this function, tunnel_hdr_len is actually >>>> calculated >>>> by the static function parse_header(), which actually is looking at the >>>> packet to >>>> find it. Now, in the broader context, mutable does have a tunnel header as >>>> a part >>>> of it, and if that's the tunnel header length you are concerned about, I >>>> agree we >>>> need to address that. Can you confirm? >>>> >>> >>> Right, I misplaced this comment. While rcv is tunnel_hlen is right, on >>> send side tunnel_hlen needs fix. >>> >> Calculating header length for each packet transmit doesn't seem efficient, >> but we may have >> to go that route. Any ideas here? >> > > I was thinking of keeping it in sw_flow and calculate it on first > transmit. We can remove hlen mutable config and always use > flow->tunnel_hlen even in compat mode. > OK, this is what I was thinking as well. Let me redo the patch and resubmit.
Just FYI: I may not get to this for another week or so, as I'm on PTO tomorrow, and traveling next week. Let me see what I can do though. Thanks, Kyle > >> Thanks, >> Kyle >> >>> Thanks, >>> Pravin. >> >> _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev