On Thu, Oct 25, 2018 at 07:37:56AM -0700, Slava Ovsiienko wrote: > > -----Original Message----- > > From: Yongseok Koh > > Sent: Tuesday, October 23, 2018 13:06 > > To: Slava Ovsiienko <viachesl...@mellanox.com> > > Cc: Shahaf Shuler <shah...@mellanox.com>; dev@dpdk.org > > Subject: Re: [PATCH v2 3/7] net/mlx5: e-switch VXLAN flow translation > > routine > > > > On Mon, Oct 15, 2018 at 02:13:31PM +0000, Viacheslav Ovsiienko wrote: [...] > > > @@ -2184,13 +2447,16 @@ struct pedit_parser { > > > * Pointer to the list of actions. > > > * @param[out] action_flags > > > * Pointer to the detected actions. > > > + * @param[out] tunnel > > > + * Pointer to tunnel encapsulation parameters structure to fill. > > > * > > > * @return > > > * Maximum size of memory for actions. > > > */ > > > static int > > > flow_tcf_get_actions_and_size(const struct rte_flow_action actions[], > > > - uint64_t *action_flags) > > > + uint64_t *action_flags, > > > + void *tunnel) > > > > This func is to get actions and size but you are parsing and filling tunnel > > info > > here. It would be better to move parsing to translate() because it anyway > > has > > multiple if conditions (same as switch/case) to set TCA_TUNNEL_KEY_ENC_* > > there. > Do you mean call of flow_tcf_vxlan_encap_parse(actions, tunnel)?
Yes. > OK, let's move it to translate stage. Anyway, we need to keep encap structure > for local/neigh rules. > > > > > > { > > > int size = 0; > > > uint64_t flags = 0; > > > @@ -2246,6 +2512,29 @@ struct pedit_parser { > > > SZ_NLATTR_TYPE_OF(uint16_t) + /* VLAN ID. > > */ > > > SZ_NLATTR_TYPE_OF(uint8_t); /* VLAN prio. > > */ > > > break; > > > + case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP: > > > + size += SZ_NLATTR_NEST + /* na_act_index. */ > > > + SZ_NLATTR_STRZ_OF("tunnel_key") + > > > + SZ_NLATTR_NEST + /* TCA_ACT_OPTIONS. > > */ > > > + SZ_NLATTR_TYPE_OF(uint8_t); > > > + size += SZ_NLATTR_TYPE_OF(struct tc_tunnel_key); > > > + size += flow_tcf_vxlan_encap_parse(actions, tunnel) > > + > > > + RTE_ALIGN_CEIL /* preceding encap params. > > */ > > > + (sizeof(struct mlx5_flow_tcf_vxlan_encap), > > > + MNL_ALIGNTO); > > > > Is it different from SZ_NLATTR_TYPE_OF(struct > > mlx5_flow_tcf_vxlan_encap)? Or, use __rte_aligned(MNL_ALIGNTO) instead. > > It is written intentionally in this form. It means that there is struct > mlx5_flow_tcf_vxlan_encap > at the beginning of buffer. This is not the NL attribute, usage of > SZ_NLATTR_TYPE_OF is > not relevant here. Alignment is needed for the following Netlink message. Good point. Understood. > > > > > + flags |= MLX5_ACTION_VXLAN_ENCAP; > > > + break; > > > + case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP: > > > + size += SZ_NLATTR_NEST + /* na_act_index. */ > > > + SZ_NLATTR_STRZ_OF("tunnel_key") + > > > + SZ_NLATTR_NEST + /* TCA_ACT_OPTIONS. > > */ > > > + SZ_NLATTR_TYPE_OF(uint8_t); > > > + size += SZ_NLATTR_TYPE_OF(struct tc_tunnel_key); > > > + size += RTE_ALIGN_CEIL /* preceding decap params. > > */ > > > + (sizeof(struct mlx5_flow_tcf_vxlan_decap), > > > + MNL_ALIGNTO); > > > > Same here. > > > > > + flags |= MLX5_ACTION_VXLAN_DECAP; > > > + break; > > > case RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC: > > > case RTE_FLOW_ACTION_TYPE_SET_IPV4_DST: > > > case RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC: > > > @@ -2289,6 +2578,26 @@ struct pedit_parser { } > > > > > > /** > > > + * Convert VXLAN VNI to 32-bit integer. > > > + * > > > + * @param[in] vni > > > + * VXLAN VNI in 24-bit wire format. > > > + * > > > + * @return > > > + * VXLAN VNI as a 32-bit integer value in network endian. > > > + */ > > > +static rte_be32_t > > > > make it inline. > OK. Missed point. > > > > > > +vxlan_vni_as_be32(const uint8_t vni[3]) { > > > + rte_be32_t ret; > > > > Defining ret as rte_be32_t? The return value of this func which is > > bswap(ret) > > is also rte_be32_t?? > Yes. And it is directly stored in the net-endian NL attribute. > I've compiled and checked the listing of the function you proposed. It seems > to be best, I'll take it. > > > > > > + > > > + ret = vni[0]; > > > + ret = (ret << 8) | vni[1]; > > > + ret = (ret << 8) | vni[2]; > > > + return RTE_BE32(ret); > > > > Use rte_cpu_to_be_*() instead. But I still don't understand why you shuffle > > bytes twice. One with shift and or and other by bswap(). > And it works. There are three bytes in very bizarre order (in NL attribute) - > 0, vni[0], vni[1], vni[2]. > > > > > { > > union { > > uint8_t vni[4]; > > rte_be32_t dword; > > } ret = { > > .vni = { 0, vni[0], vni[1], vni[2] }, > > }; > > return ret.dword; > > } > > > > This will have the same result without extra cost. > > OK. Investigated, it is the best for x86-64. Also I'm going to test it on the > ARM 32, > with various compilers, just curious. > > > > > > +} > > > + > > > +/** > > > * Prepare a flow object for Linux TC flower. It calculates the maximum > > size of > > > * memory required, allocates the memory, initializes Netlink message > > headers > > > * and set unique TC message handle. > > > @@ -2323,22 +2632,54 @@ struct pedit_parser { > > > struct mlx5_flow *dev_flow; > > > struct nlmsghdr *nlh; > > > struct tcmsg *tcm; > > > + struct mlx5_flow_tcf_vxlan_encap encap = {.mask = 0}; > > > + uint8_t *sp, *tun = NULL; > > > > > > size += flow_tcf_get_items_and_size(attr, items, item_flags); > > > - size += flow_tcf_get_actions_and_size(actions, action_flags); > > > - dev_flow = rte_zmalloc(__func__, size, MNL_ALIGNTO); > > > + size += flow_tcf_get_actions_and_size(actions, action_flags, > > &encap); > > > + dev_flow = rte_zmalloc(__func__, size, > > > + RTE_MAX(alignof(struct mlx5_flow_tcf_tunnel_hdr), > > > + (size_t)MNL_ALIGNTO)); > > > > Why RTE_MAX between the two? Note that it is alignment for start address > > of the memory and the minimum alignment is cacheline size. On x86, non- > > zero value less than 64 will have same result as 64. > > OK. Thanks for note. > It is not expected the structure alignments exceed the cache line size. > So? Just specify zero? > > > > > if (!dev_flow) { > > > rte_flow_error_set(error, ENOMEM, > > > RTE_FLOW_ERROR_TYPE_UNSPECIFIED, > > NULL, > > > "not enough memory to create E-Switch > > flow"); > > > return NULL; > > > } > > > - nlh = mnl_nlmsg_put_header((void *)(dev_flow + 1)); > > > + sp = (uint8_t *)(dev_flow + 1); > > > + if (*action_flags & MLX5_ACTION_VXLAN_ENCAP) { > > > + tun = sp; > > > + sp += RTE_ALIGN_CEIL > > > + (sizeof(struct mlx5_flow_tcf_vxlan_encap), > > > + MNL_ALIGNTO); > > > > And why should it be aligned? > > Netlink message should be aligned. It follows the mlx5_flow_tcf_vxlan_encap, > that's why the pointer is aligned. Not true. There's no requirement for nl msg buffer alignment. MNL_ALIGNTO is for mainly size alignment. For example, checkout the source code of mnl_nlmsg_put_header(void *buf). There's no requirement of aligning the start address of buf. But, size of any entries (hdr, attr ...) should be aligned to MNL_ALIGNTO(4). > > As the size of dev_flow might not be aligned, it > > is meaningless, isn't it? If you think it must be aligned for better > > performance > > (not much anyway), you can use __rte_aligned(MNL_ALIGNTO) on the struct > Hm. Where we can use __rte_aligned? Could you clarify, please. For example, struct mlx5_flow_tcf_tunnel_hdr { uint32_t type; /**< Tunnel action type. */ unsigned int ifindex_tun; /**< Tunnel endpoint interface. */ unsigned int ifindex_org; /**< Original dst/src interface */ unsigned int *ifindex_ptr; /**< Interface ptr in message. */ } __rte_aligned(MNL_ALIGNTO); A good example is the struct rte_mbuf. If this attribute is used, the size of the struct will be aligned to the value. If you still want to make the nl msg aligned, dev_flow = rte_zmalloc(..., MNL_ALIGNTO); /* anyway cacheline aligned. */ tun = RTE_PTR_ALIGN(dev_flow + 1, MNL_ALIGNTO); nlh = mnl_nlmsg_put_header(tun); with adding '__rte_aligned(MNL_ALIGNTO)' to struct mlx5_flow_tcf_vxlan_encap/decap. Then, nlh will be aligned. You should make sure size is correctly calculated. > > > definition but not for mlx5_flow (it's not only for tcf, have to do it > > manually). > > > > > + size -= RTE_ALIGN_CEIL > > > + (sizeof(struct mlx5_flow_tcf_vxlan_encap), > > > + MNL_ALIGNTO); > > > > Don't you have to subtract sizeof(struct mlx5_flow) as well? But like I > > mentioned, if '.nlsize' below isn't needed, you don't need to have this > > calculation either. > Yes, it is a bug. Should be fixed. Thank you. > Let's discuss whether we can keep the nlsize under NDEBUG switch. I agreed on using NDEBUG for it. > > > > > > + encap.hdr.type = > > MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP; > > > + memcpy(tun, &encap, > > > + sizeof(struct mlx5_flow_tcf_vxlan_encap)); > > > + } else if (*action_flags & MLX5_ACTION_VXLAN_DECAP) { > > > + tun = sp; > > > + sp += RTE_ALIGN_CEIL > > > + (sizeof(struct mlx5_flow_tcf_vxlan_decap), > > > + MNL_ALIGNTO); > > > + size -= RTE_ALIGN_CEIL > > > + (sizeof(struct mlx5_flow_tcf_vxlan_decap), > > > + MNL_ALIGNTO); > > > + encap.hdr.type = > > MLX5_FLOW_TCF_TUNACT_VXLAN_DECAP; > > > + memcpy(tun, &encap, > > > + sizeof(struct mlx5_flow_tcf_vxlan_decap)); > > > + } > > > + nlh = mnl_nlmsg_put_header(sp); > > > tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm)); > > > *dev_flow = (struct mlx5_flow){ > > > .tcf = (struct mlx5_flow_tcf){ > > > + .nlsize = size, > > > .nlh = nlh, > > > .tcm = tcm, > > > + .tunnel = (struct mlx5_flow_tcf_tunnel_hdr *)tun, > > > + .item_flags = *item_flags, > > > + .action_flags = *action_flags, > > > }, > > > }; > > > /* [...] > > > @@ -2827,6 +3268,76 @@ struct pedit_parser { > > > (na_vlan_priority) = > > > conf.of_set_vlan_pcp->vlan_pcp; > > > } > > > + assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len); > > > + break; > > > + case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP: > > > + assert(decap.vxlan); > > > + assert(dev_flow->tcf.tunnel); > > > + dev_flow->tcf.tunnel->ifindex_ptr > > > + = (unsigned int *)&tcm->tcm_ifindex; > > > + na_act_index = > > > + mnl_attr_nest_start(nlh, > > na_act_index_cur++); > > > + assert(na_act_index); > > > + mnl_attr_put_strz(nlh, TCA_ACT_KIND, > > "tunnel_key"); > > > + na_act = mnl_attr_nest_start(nlh, > > TCA_ACT_OPTIONS); > > > + assert(na_act); > > > + mnl_attr_put(nlh, TCA_TUNNEL_KEY_PARMS, > > > + sizeof(struct tc_tunnel_key), > > > + &(struct tc_tunnel_key){ > > > + .action = TC_ACT_PIPE, > > > + .t_action = > > TCA_TUNNEL_KEY_ACT_RELEASE, > > > + }); > > > + mnl_attr_nest_end(nlh, na_act); > > > + mnl_attr_nest_end(nlh, na_act_index); > > > + assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len); > > > + break; > > > + case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP: > > > + assert(encap.vxlan); > > > + na_act_index = > > > + mnl_attr_nest_start(nlh, > > na_act_index_cur++); > > > + assert(na_act_index); > > > + mnl_attr_put_strz(nlh, TCA_ACT_KIND, > > "tunnel_key"); > > > + na_act = mnl_attr_nest_start(nlh, > > TCA_ACT_OPTIONS); > > > + assert(na_act); > > > + mnl_attr_put(nlh, TCA_TUNNEL_KEY_PARMS, > > > + sizeof(struct tc_tunnel_key), > > > + &(struct tc_tunnel_key){ > > > + .action = TC_ACT_PIPE, > > > + .t_action = > > TCA_TUNNEL_KEY_ACT_SET, > > > + }); > > > + if (encap.vxlan->mask & > > MLX5_FLOW_TCF_ENCAP_UDP_DST) > > > + mnl_attr_put_u16(nlh, > > > + TCA_TUNNEL_KEY_ENC_DST_PORT, > > > + encap.vxlan->udp.dst); > > > + if (encap.vxlan->mask & > > MLX5_FLOW_TCF_ENCAP_IPV4_SRC) > > > + mnl_attr_put_u32(nlh, > > > + TCA_TUNNEL_KEY_ENC_IPV4_SRC, > > > + encap.vxlan->ipv4.src); > > > + if (encap.vxlan->mask & > > MLX5_FLOW_TCF_ENCAP_IPV4_DST) > > > + mnl_attr_put_u32(nlh, > > > + TCA_TUNNEL_KEY_ENC_IPV4_DST, > > > + encap.vxlan->ipv4.dst); > > > + if (encap.vxlan->mask & > > MLX5_FLOW_TCF_ENCAP_IPV6_SRC) > > > + mnl_attr_put(nlh, > > > + TCA_TUNNEL_KEY_ENC_IPV6_SRC, > > > + sizeof(encap.vxlan->ipv6.src), > > > + &encap.vxlan->ipv6.src); > > > + if (encap.vxlan->mask & > > MLX5_FLOW_TCF_ENCAP_IPV6_DST) > > > + mnl_attr_put(nlh, > > > + TCA_TUNNEL_KEY_ENC_IPV6_DST, > > > + sizeof(encap.vxlan->ipv6.dst), > > > + &encap.vxlan->ipv6.dst); > > > + if (encap.vxlan->mask & > > MLX5_FLOW_TCF_ENCAP_VXLAN_VNI) > > > + mnl_attr_put_u32(nlh, > > > + TCA_TUNNEL_KEY_ENC_KEY_ID, > > > + vxlan_vni_as_be32 > > > + (encap.vxlan->vxlan.vni)); > > > +#ifdef TCA_TUNNEL_KEY_NO_CSUM > > > + mnl_attr_put_u8(nlh, TCA_TUNNEL_KEY_NO_CSUM, > > 0); #endif > > > > TCA_TUNNEL_KEY_NO_CSUM is anyway defined like others, then why do > > you treat it differently with #ifdef/#endif? > > As it was found it is not defined on old kernels, on some our CI machines > compilation errors occurred. In your first patch, TCA_TUNNEL_KEY_NO_CSUM is defined if there isn't HAVE_TC_ACT_TUNNEL_KEY. Actually I'm wondering why it is different from HAVE_TCA_TUNNEL_KEY_ENC_DST_PORT. It looks like the following is needed - HAVE_TCA_TUNNEL_KEY_NO_CSUM ?? #ifdef HAVE_TC_ACT_TUNNEL_KEY #include <linux/tc_act/tc_tunnel_key.h> #ifndef HAVE_TCA_TUNNEL_KEY_ENC_DST_PORT #define TCA_TUNNEL_KEY_ENC_DST_PORT 9 #endif #ifndef HAVE_TCA_TUNNEL_KEY_NO_CSUM #define TCA_TUNNEL_KEY_NO_CSUM 10 #endif #else /* HAVE_TC_ACT_TUNNEL_KEY */ Thanks, Yongseok