[dpdk-dev] [PATCH] Made i40e header CPP compatible using extern "C"

2020-11-07 Thread Prateek Agarwal
Library headers should work directly in C++ code, but linking errors are
thrown because the i40e PMD header lacks the extern "C" guards. Add them.

Signed-off-by: Prateek Agarwal 
---
 drivers/net/i40e/rte_pmd_i40e.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
index fc3560c28..4cb21c371 100644
--- a/drivers/net/i40e/rte_pmd_i40e.h
+++ b/drivers/net/i40e/rte_pmd_i40e.h
@@ -14,6 +14,10 @@
  *
  */
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #include 
 #include 
 #include 
@@ -1130,4 +1134,8 @@ __rte_experimental
 int
 rte_pmd_i40e_set_switch_dev(uint16_t port_id, struct rte_eth_dev *switch_dev);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _PMD_I40E_H_ */
-- 
2.25.1



Re: [dpdk-dev] [PATCH v4 00/58] net: txgbe PMD

2020-11-07 Thread Thomas Monjalon
06/11/2020 20:56, Honnappa Nagarahalli:
> > 06/11/2020 19:21, Honnappa Nagarahalli:
> > > > 05/11/2020 09:55, Jiawen Wu:
> > > > > On Thursday, November 5, 2020 9:55 AM, Jiawen Wu wrote:
> > > > > > On Thursday, November 5, 2020 1:24 AM, Ferruh Yigit wrote:
> > > > > > > On 11/3/2020 11:08 PM, Thomas Monjalon wrote:
> > > > > > > > When pulling in the main branch, I see some checkpatches
> > > > > > > > warnings (in order of criticality):
> > > > > > > > Using rte_smp_[r/w]mb
> > > > > > > > Using rte_panic/rte_exit
> > > > > > > > Using compiler attribute directly
> > > > > > > >
> > > > > > > > Please could you fix them (at least first two) before the second
> > series?
> > >
> > > Thomas, IMO, these should result in errors in checkpatch, not just 
> > > warnings.
> > > Do you see any issues?
> > 
> > The problem is that there are too many false positives in checkpatch.
> 
> These two, mentioned above, should be pretty straight forward.

Yes

> Have you seen any false positives with these?

No

What do you propose to distinguish warnings and errors in patchwork?
We can have a different return value in the script
to categorize the checkpatch test in the right patchwork column.




Re: [dpdk-dev] [PATCH v5 1/1] vfio: modify spapr iommu support to use static window sizing

2020-11-07 Thread Thomas Monjalon
06/11/2020 23:16, David Christensen:
> On 11/4/20 11:12 PM, Thomas Monjalon wrote:
> > 04/11/2020 23:25, David Christensen:
> >> On 11/4/20 1:02 PM, Thomas Monjalon wrote:
> >>> 04/11/2020 22:00, David Christensen:
>  On 11/4/20 11:43 AM, Thomas Monjalon wrote:
> >> Signed-off-by: David Christensen 
> >> Acked-by: Anatoly Burakov 
> >> ---
> >> -#ifdef VFIO_IOMMU_SPAPR_INFO_DDW
> >> -  /* try possible page_shift and levels for workaround */
> >> +  /* if at first we don't succeed, try more levels */
> >>uint32_t levels;
> >> 
> >> -  for (levels = create->levels + 1;
> >> +  for (levels = create.levels + 1;
> >>ret && levels <= info.ddw.levels; levels++) {
> >
> > There is a compilation failure with ppc64le-power8-linux-gcc:
> > error: ‘struct vfio_iommu_spapr_tce_info’ has no member named ‘ddw’
> 
>  How did you find that error?  It builds locally for me on a POWER system
>  with Meson/gcc and there were no build failures on Travis
>  (https://travis-ci.com/github/drchristensen/dpdk/builds/198047029) when
>  I checked it against AMD64/ARM systems.  The code is PPC specific but it
>  will build on all architectures (there are no IFDEFs around it).
> >>>
> >>> Remember, I cross-build with test-meson-builds.sh
> >>> Is it an issue of my toolchain?
> >>
> >> What distro/gcc version are you using?  I'll try it locally on an x86.
> > 
> > I am using powerpc64le-power8--glibc--stable-2018.11-1 from
> > https://toolchains.bootlin.com/releases_powerpc64le-power8.html
> 
> Here's what I found:
> 
> - Builds correctly on a RHEL 8.2 POWER9 host with gcc (GCC) 8.3.1 
> 20191121 (Red Hat 8.3.1-5) and kernel 4.18.0
> - Builds correctly on an Ubuntu 18.04.5 POWER9 host with gcc (Ubuntu 
> 7.5.0-3ubuntu1~18.04) 7.5.0 and kernel 4.15.0.
> - Build fails on an Ubuntu 18.04.5 AMD64 host with your POWER8 toolchain 
> and the devtools/test-meson-builds.sh script.
> 
> It appears that the VFIO header file in your toolchain:
> 
> powerpc64le-buildroot-linux-gnu/sysroot/usr/include/linux/vfio.h
> 
> is from the 4.1.49 kernel, but the sPAPR v2 IOMMU support wasn't added 
> until the 4.2.0 kernel (https://lkml.org/lkml/2015/4/25/56).  The update 
> added the ddw member to the vfio_iommu_spapr_tce_info structure.  I'll 
> submit a new patch which skips testing additional levels unless kernel 
> 4.2.0 or later is used.

Instead of testing kernel version, which is fragile with backports,
can you test the presence of the feature itself?
If no macro (usable with #ifdef) is defined with the feature,
checking the kernel version is acceptable.
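For illustration, a minimal sketch of what such a feature check could look
like, assuming the sysroot's linux/vfio.h defines VFIO_IOMMU_SPAPR_INFO_DDW
whenever the ddw member exists (HAVE_SPAPR_DDW is a made-up name for this
sketch, not an existing DPDK macro):

#include <linux/version.h>
#include <linux/vfio.h>

/*
 * Sketch: prefer detecting the feature itself; fall back to the kernel
 * header version only when no feature macro is available, since version
 * checks are fragile with backports.
 */
#ifdef VFIO_IOMMU_SPAPR_INFO_DDW
#define HAVE_SPAPR_DDW 1
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(4, 2, 0)
#define HAVE_SPAPR_DDW 1
#else
#define HAVE_SPAPR_DDW 0
#endif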




[dpdk-dev] [PATCH 1/1] mbuf: move pool pointer in first half

2020-11-07 Thread Thomas Monjalon
The mempool pointer in the mbuf struct is moved
from the second to the first half.
It should increase performance on most systems having 64-byte cache line,
i.e. mbuf is split in two cache lines.

Due to this change, tx_offload is moved, so some vector data paths
may need to be adjusted. Note: OCTEON TX2 check is removed temporarily!

Moving this field gives more space to dynfield1.

This is how the mbuf layout looks like (pahole-style):

word  type                               name              byte  size
 0    void *                             buf_addr;       /*   0 +  8 */
 1    rte_iova_t                         buf_iova        /*   8 +  8 */
      /* --- RTE_MARKER64                rearm_data;                  */
 2    uint16_t                           data_off;       /*  16 +  2 */
      uint16_t                           refcnt;         /*  18 +  2 */
      uint16_t                           nb_segs;        /*  20 +  2 */
      uint16_t                           port;           /*  22 +  2 */
 3    uint64_t                           ol_flags;       /*  24 +  8 */
      /* --- RTE_MARKER                  rx_descriptor_fields1;       */
 4    uint32_t union                     packet_type;    /*  32 +  4 */
      uint32_t                           pkt_len;        /*  36 +  4 */
 5    uint16_t                           data_len;       /*  40 +  2 */
      uint16_t                           vlan_tci;       /*  42 +  2 */
 5.5  uint64_t union                     hash;           /*  44 +  8 */
 6.5  uint16_t                           vlan_tci_outer; /*  52 +  2 */
      uint16_t                           buf_len;        /*  54 +  2 */
 7    struct rte_mempool *               pool;           /*  56 +  8 */
      /* --- RTE_MARKER                  cacheline1;                  */
 8    struct rte_mbuf *                  next;           /*  64 +  8 */
 9    uint64_t union                     tx_offload;     /*  72 +  8 */
10    struct rte_mbuf_ext_shared_info *  shinfo;         /*  80 +  8 */
11    uint16_t                           priv_size;      /*  88 +  2 */
      uint16_t                           timesync;       /*  90 +  2 */
11.5  uint32_t                           dynfield1[9];   /*  92 + 36 */
16    /* --- END                                           128        */

Signed-off-by: Thomas Monjalon 
---
 doc/guides/rel_notes/deprecation.rst | 7 ---
 drivers/net/octeontx2/otx2_ethdev.c  | 2 --
 lib/librte_kni/rte_kni_common.h  | 3 ++-
 lib/librte_mbuf/rte_mbuf.h   | 1 -
 lib/librte_mbuf/rte_mbuf_core.h  | 5 ++---
 lib/librte_mbuf/rte_mbuf_dyn.c   | 1 -
 6 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index f3258eb3f7..efb09f0c5e 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -81,13 +81,6 @@ Deprecation Notices
   us extending existing enum/define.
   One solution can be using a fixed size array instead of ``.*MAX.*`` value.
 
-* mbuf: Some fields will be converted to dynamic API in DPDK 20.11
-  in order to reserve more space for the dynamic fields, as explained in
-  `this presentation `_.
-  As a consequence, the layout of the ``struct rte_mbuf`` will be re-arranged,
-  avoiding impact on vectorized implementation of the driver datapaths,
-  while evaluating performance gains of a better use of the first cache line.
-
 * ethdev: The flow director API, including ``rte_eth_conf.fdir_conf`` field,
   and the related structures (``rte_fdir_*`` and ``rte_eth_fdir_*``),
   will be removed in DPDK 20.11.
diff --git a/drivers/net/octeontx2/otx2_ethdev.c 
b/drivers/net/octeontx2/otx2_ethdev.c
index 6cebbe677d..d6e0f1dd03 100644
--- a/drivers/net/octeontx2/otx2_ethdev.c
+++ b/drivers/net/octeontx2/otx2_ethdev.c
@@ -748,8 +748,6 @@ nix_tx_offload_flags(struct rte_eth_dev *eth_dev)
 offsetof(struct rte_mbuf, buf_iova) + 16);
RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pkt_len) !=
 offsetof(struct rte_mbuf, ol_flags) + 12);
-   RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, tx_offload) !=
-offsetof(struct rte_mbuf, pool) + 2 * sizeof(void *));
 
if (conf & DEV_TX_OFFLOAD_VLAN_INSERT ||
conf & DEV_TX_OFFLOAD_QINQ_INSERT)
diff --git a/lib/librte_kni/rte_kni_common.h b/lib/librte_kni/rte_kni_common.h
index 36d66e2ffa..ffb3182731 100644
--- a/lib/librte_kni/rte_kni_common.h
+++ b/lib/librte_kni/rte_kni_common.h
@@ -84,10 +84,11 @@ struct rte_kni_mbuf {
char pad2[4];
uint32_t pkt_len;   /**< Total pkt len: sum of all segment 
data_len. */
uint16_t data_len;  /**< Amount of data in segment buffer. */
+   char pad3[14];
+   void *pool;
 
/* fields on second cache line */
__attribute__((__aligned__(RTE_CACHE_LINE_MIN_SIZE)))
-   voi
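For reference, a sketch of the kind of compile-time check discussed in the
replies below: a driver can pin the offsets implied by the pahole-style dump
above with RTE_BUILD_BUG_ON, so that a later move of these fields fails the
build instead of silently breaking a vector path. The offsets are read off
the dump above and the helper name is invented for this illustration; it is
not part of the patch.

#include <stddef.h>
#include <rte_common.h>
#include <rte_mbuf_core.h>

/*
 * Sketch only: layout assertions matching the dump above (64-bit build,
 * 64-byte cache line). mbuf_layout_sanity_check() is a hypothetical helper.
 */
static inline void
mbuf_layout_sanity_check(void)
{
	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pool) != 56);
	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, next) != 64);
	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, tx_offload) !=
			 offsetof(struct rte_mbuf, next) + sizeof(void *));
}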

Re: [dpdk-dev] [PATCH 1/1] mbuf: move pool pointer in first half

2020-11-07 Thread Jerin Jacob
On Sat, Nov 7, 2020 at 10:04 PM Thomas Monjalon  wrote:
>
> The mempool pointer in the mbuf struct is moved
> from the second to the first half.
> It should increase performance on most systems having 64-byte cache line,

> i.e. mbuf is split in two cache lines.

But in any event, Tx needs to touch the pool to free mbufs back to the
pool upon Tx completion, right?
I am not able to understand the motivation for moving it to the first 64B cache line.
The gain varies from driver to driver. For example, a typical
ARM-based NPU does not need to
touch the pool in Rx since it has been filled by HW, whereas it needs to
touch it in Tx if the reference count is implemented.


>
> Due to this change, tx_offload is moved, so some vector data paths
> may need to be adjusted. Note: OCTEON TX2 check is removed temporarily!

It will be breaking the Tx path. Please just don't remove the static
assert without adjusting the code.

>
> Moving this field gives more space to dynfield1.
>
> This is how the mbuf layout looks like (pahole-style):
>
> word  type                               name              byte  size
>  0    void *                             buf_addr;       /*   0 +  8 */
>  1    rte_iova_t                         buf_iova        /*   8 +  8 */
>       /* --- RTE_MARKER64                rearm_data;                  */
>  2    uint16_t                           data_off;       /*  16 +  2 */
>       uint16_t                           refcnt;         /*  18 +  2 */
>       uint16_t                           nb_segs;        /*  20 +  2 */
>       uint16_t                           port;           /*  22 +  2 */
>  3    uint64_t                           ol_flags;       /*  24 +  8 */
>       /* --- RTE_MARKER                  rx_descriptor_fields1;       */
>  4    uint32_t union                     packet_type;    /*  32 +  4 */
>       uint32_t                           pkt_len;        /*  36 +  4 */
>  5    uint16_t                           data_len;       /*  40 +  2 */
>       uint16_t                           vlan_tci;       /*  42 +  2 */
>  5.5  uint64_t union                     hash;           /*  44 +  8 */
>  6.5  uint16_t                           vlan_tci_outer; /*  52 +  2 */
>       uint16_t                           buf_len;        /*  54 +  2 */
>  7    struct rte_mempool *               pool;           /*  56 +  8 */
>       /* --- RTE_MARKER                  cacheline1;                  */
>  8    struct rte_mbuf *                  next;           /*  64 +  8 */
>  9    uint64_t union                     tx_offload;     /*  72 +  8 */
> 10    struct rte_mbuf_ext_shared_info *  shinfo;         /*  80 +  8 */
> 11    uint16_t                           priv_size;      /*  88 +  2 */
>       uint16_t                           timesync;       /*  90 +  2 */
> 11.5  uint32_t                           dynfield1[9];   /*  92 + 36 */
> 16    /* --- END                                           128        */
>
> Signed-off-by: Thomas Monjalon 
> ---
>  doc/guides/rel_notes/deprecation.rst | 7 ---
>  drivers/net/octeontx2/otx2_ethdev.c  | 2 --
>  lib/librte_kni/rte_kni_common.h  | 3 ++-
>  lib/librte_mbuf/rte_mbuf.h   | 1 -
>  lib/librte_mbuf/rte_mbuf_core.h  | 5 ++---
>  lib/librte_mbuf/rte_mbuf_dyn.c   | 1 -
>  6 files changed, 4 insertions(+), 15 deletions(-)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst 
> b/doc/guides/rel_notes/deprecation.rst
> index f3258eb3f7..efb09f0c5e 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -81,13 +81,6 @@ Deprecation Notices
>us extending existing enum/define.
>One solution can be using a fixed size array instead of ``.*MAX.*`` value.
>
> -* mbuf: Some fields will be converted to dynamic API in DPDK 20.11
> -  in order to reserve more space for the dynamic fields, as explained in
> -  `this presentation `_.
> -  As a consequence, the layout of the ``struct rte_mbuf`` will be 
> re-arranged,
> -  avoiding impact on vectorized implementation of the driver datapaths,
> -  while evaluating performance gains of a better use of the first cache line.
> -
>  * ethdev: The flow director API, including ``rte_eth_conf.fdir_conf`` field,
>and the related structures (``rte_fdir_*`` and ``rte_eth_fdir_*``),
>will be removed in DPDK 20.11.
> diff --git a/drivers/net/octeontx2/otx2_ethdev.c 
> b/drivers/net/octeontx2/otx2_ethdev.c
> index 6cebbe677d..d6e0f1dd03 100644
> --- a/drivers/net/octeontx2/otx2_ethdev.c
> +++ b/drivers/net/octeontx2/otx2_ethdev.c
> @@ -748,8 +748,6 @@ nix_tx_offload_flags(struct rte_eth_dev *eth_dev)
>  offsetof(struct rte_mbuf, buf_iova) + 16);
> RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pkt_len) !=
>  offsetof(struct rte_mbuf, ol_flags) + 12);
> -   RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, tx_offload) !=
> -offsetof(str

Re: [dpdk-dev] [PATCH v2] app/testpmd: support age shared action context

2020-11-07 Thread Matan Azrad
Hi Ferruh

From: Ferruh Yigit
> On 11/5/2020 9:32 PM, Matan Azrad wrote:
> > When an age action becomes aged-out the rte_flow_get_aged_flows should
> > return the action context supplied by the configuration structure.
> >
> > In case the age action created by the shared action API, the shared
> > action context of the Testpmd application was not set.
> >
> > In addition, the application handler of the contexts returned by the
> > rte_flow_get_aged_flows API didn't consider the fact that the action
> > could be set by the shared action API and considered it as regular
> > flow context.
> >
> > This caused a crash in Testpmd when the context is parsed.
> >
> > This patch set context type in the flow and shared action context and
> > uses it to parse the aged-out contexts correctly.
> >
> > Signed-off-by: Matan Azrad 
> > Acked-by: Dekel Peled 
> > ---
> >   app/test-pmd/config.c  | 119 ++---
> 
> >   app/test-pmd/testpmd.h |   5 +++
> >   2 files changed, 87 insertions(+), 37 deletions(-)
> >
> > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index
> > 755d1df..00a7dd1 100644
> > --- a/app/test-pmd/config.c
> > +++ b/app/test-pmd/config.c
> > @@ -1763,6 +1763,33 @@ void port_flow_tunnel_create(portid_t port_id,
> const struct tunnel_ops *ops)
> >   }
> >   }
> >
> > +#define AGE_ACTION_TYPE_MASK 0x3u
> > +
> > +static void
> > +set_age_action_context(void **ctx, enum action_age_context_type type,
> > +void *obj) {
> > + uintptr_t value = (uintptr_t)obj;
> > +
> > + /*
> > +  * obj is allocated by malloc\calloc which must return an address
> > +  * aligned to 8.
> > +  * Use the last 2 bits for the age context type.
> > +  */
> > + value |= (uintptr_t)type & AGE_ACTION_TYPE_MASK;
> > + *ctx = (void *)value;
> 
> Thanks Matan, I think this is much clearer. But I thought the 'id' would be
> used, not the pointer itself, like "uintptr_t value = id | (type * MASK)".
> OR'ing the address pointer and the type seems error prone, although you
> comment that you rely on the alignment.

I understand your concern; that's why the context value management is wrapped
well by dedicated functions for set and parse.
It is also a very efficient approach in terms of memory and time, especially
when we are talking about a big scale (see below).

> The testpmd usage is also a kind of sample usage for applications, and I am for
> not suggesting this to user applications.


> Reserving two bits of the 'id' reduces the usable 'id' to 30 bits, but it
> still looks big enough, what do you think?


Yes, it is big enough.
The problem with the id is the latency of getting the pointer from it.
Since both the flows and the shared actions are saved in a list, we need to
traverse the whole list in order to get the pointer and the needed information.
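For completeness, a sketch of the parse counterpart of this scheme, relying on
the same malloc/calloc alignment assumption. AGE_ACTION_TYPE_MASK,
set_age_action_context() and enum action_age_context_type come from the patch
above; parse_age_action_context() is an illustrative name only:

#include <stdint.h>

/*
 * Illustrative counterpart of set_age_action_context(): the type is in the
 * low two bits, the original pointer in the remaining bits. This only works
 * because malloc/calloc return addresses aligned to at least 8 bytes.
 */
static void
parse_age_action_context(void *ctx, enum action_age_context_type *type,
			 void **obj)
{
	uintptr_t value = (uintptr_t)ctx;

	*type = (enum action_age_context_type)(value & AGE_ACTION_TYPE_MASK);
	*obj = (void *)(value & ~(uintptr_t)AGE_ACTION_TYPE_MASK);
}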



Re: [dpdk-dev] [PATCH 1/1] mbuf: move pool pointer in first half

2020-11-07 Thread Thomas Monjalon
07/11/2020 18:12, Jerin Jacob:
> On Sat, Nov 7, 2020 at 10:04 PM Thomas Monjalon  wrote:
> >
> > The mempool pointer in the mbuf struct is moved
> > from the second to the first half.
> > It should increase performance on most systems having 64-byte cache line,
> 
> > i.e. mbuf is split in two cache lines.
> 
> But In any event, Tx needs to touch the pool to freeing back to the
> pool upon  Tx completion. Right?
> Not able to understand the motivation for moving it to the first 64B cache 
> line?
> The gain varies from driver to driver. For example, a Typical
> ARM-based NPU does not need to
> touch the pool in Rx and its been filled by HW. Whereas it needs to
> touch in Tx if the reference count is implemented.
> 
> > Due to this change, tx_offload is moved, so some vector data paths
> > may need to be adjusted. Note: OCTEON TX2 check is removed temporarily!
> 
> It will be breaking the Tx path, Please just don't remove the static
> assert without adjusting the code.

Of course not.
I looked at the vector Tx path of OCTEON TX2,
it's close to impossible to understand :)
Please help!




Re: [dpdk-dev] [PATCH 1/1] mbuf: move pool pointer in first half

2020-11-07 Thread Morten Brørup
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Saturday, November 7, 2020 4:53 PM
> 
> The mempool pointer in the mbuf struct is moved
> from the second to the first half.
> It should increase performance on most systems having 64-byte cache line,
> i.e. mbuf is split in two cache lines.

Perhaps with the #define DEV_TX_OFFLOAD_MBUF_FAST_FREE mentioned by Konstantin, 
it might be better moving m->next instead, at least for applications with the 
opportunity to set this flag (e.g. applications with only one mbuf pool).

Unfortunately, the information about the DEV_TX_OFFLOAD_MBUF_FAST_FREE flag 
came to light after the techboard meeting, and I don't have any benchmarks to 
support this suggestion, so I guess it's hard to change the techboard's 
decision to move the pool pointer.
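To make the trade-off concrete, here is a hedged sketch of how a Tx completion
path typically uses DEV_TX_OFFLOAD_MBUF_FAST_FREE: with the flag the
application guarantees a single mempool and a refcnt of 1, so m->pool is never
read per packet; without it, each mbuf's pool pointer must be dereferenced.
Function and parameter names are illustrative, not from any particular driver:

#include <rte_mbuf.h>
#include <rte_mempool.h>

/*
 * Sketch: free mbufs whose transmission completed. With fast_free the
 * driver bulk-returns to a single cached mempool (txq_mp) and never reads
 * m->pool; otherwise each mbuf's own pool pointer is used.
 */
static inline void
txq_free_completed(struct rte_mempool *txq_mp, struct rte_mbuf **pkts,
		   uint16_t nb, int fast_free)
{
	uint16_t i;

	if (fast_free) {
		rte_mempool_put_bulk(txq_mp, (void **)pkts, nb);
		return;
	}
	for (i = 0; i < nb; i++) {
		struct rte_mbuf *m = rte_pktmbuf_prefree_seg(pkts[i]);

		if (m != NULL)
			rte_mempool_put(m->pool, m);
	}
}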

> 
> Due to this change, tx_offload is moved, so some vector data paths
> may need to be adjusted. Note: OCTEON TX2 check is removed temporarily!

Regardless of whether m->pool or m->next is moved, it will not affect any issues
related to m->tx_offload moving to a new location in the second cache line.

> 
> Moving this field gives more space to dynfield1.
> 
> This is how the mbuf layout looks like (pahole-style):
> 
> word  type                               name              byte  size
>  0    void *                             buf_addr;       /*   0 +  8 */
>  1    rte_iova_t                         buf_iova        /*   8 +  8 */
>       /* --- RTE_MARKER64                rearm_data;                  */
>  2    uint16_t                           data_off;       /*  16 +  2 */
>       uint16_t                           refcnt;         /*  18 +  2 */
>       uint16_t                           nb_segs;        /*  20 +  2 */
>       uint16_t                           port;           /*  22 +  2 */
>  3    uint64_t                           ol_flags;       /*  24 +  8 */
>       /* --- RTE_MARKER                  rx_descriptor_fields1;       */
>  4    uint32_t union                     packet_type;    /*  32 +  4 */
>       uint32_t                           pkt_len;        /*  36 +  4 */
>  5    uint16_t                           data_len;       /*  40 +  2 */
>       uint16_t                           vlan_tci;       /*  42 +  2 */
>  5.5  uint64_t union                     hash;           /*  44 +  8 */
>  6.5  uint16_t                           vlan_tci_outer; /*  52 +  2 */
>       uint16_t                           buf_len;        /*  54 +  2 */
>  7    struct rte_mempool *               pool;           /*  56 +  8 */
>       /* --- RTE_MARKER                  cacheline1;                  */
>  8    struct rte_mbuf *                  next;           /*  64 +  8 */
>  9    uint64_t union                     tx_offload;     /*  72 +  8 */
> 10    struct rte_mbuf_ext_shared_info *  shinfo;         /*  80 +  8 */
> 11    uint16_t                           priv_size;      /*  88 +  2 */
>       uint16_t                           timesync;       /*  90 +  2 */
> 11.5  uint32_t                           dynfield1[9];   /*  92 + 36 */
> 16    /* --- END                                           128        */
> 
> Signed-off-by: Thomas Monjalon 
> ---
>  doc/guides/rel_notes/deprecation.rst | 7 ---
>  drivers/net/octeontx2/otx2_ethdev.c  | 2 --
>  lib/librte_kni/rte_kni_common.h  | 3 ++-
>  lib/librte_mbuf/rte_mbuf.h   | 1 -
>  lib/librte_mbuf/rte_mbuf_core.h  | 5 ++---
>  lib/librte_mbuf/rte_mbuf_dyn.c   | 1 -
>  6 files changed, 4 insertions(+), 15 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index f3258eb3f7..efb09f0c5e 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -81,13 +81,6 @@ Deprecation Notices
>us extending existing enum/define.
>One solution can be using a fixed size array instead of ``.*MAX.*``
> value.
> 
> -* mbuf: Some fields will be converted to dynamic API in DPDK 20.11
> -  in order to reserve more space for the dynamic fields, as explained in
> -  `this presentation `_.
> -  As a consequence, the layout of the ``struct rte_mbuf`` will be re-
> arranged,
> -  avoiding impact on vectorized implementation of the driver datapaths,
> -  while evaluating performance gains of a better use of the first cache
> line.
> -
>  * ethdev: The flow director API, including ``rte_eth_conf.fdir_conf``
> field,
>and the related structures (``rte_fdir_*`` and ``rte_eth_fdir_*``),
>will be removed in DPDK 20.11.
> diff --git a/drivers/net/octeontx2/otx2_ethdev.c
> b/drivers/net/octeontx2/otx2_ethdev.c
> index 6cebbe677d..d6e0f1dd03 100644
> --- a/drivers/net/octeontx2/otx2_ethdev.c
> +++ b/drivers/net/octeontx2/otx2_ethdev.c
> @@ -748,8 +748,6 @@ nix_tx_offload_flags(struct rte_eth_dev *eth_dev)
>offsetof(struct rte_mbuf, buf_iova) + 16);
>   RTE_BUILD_BUG_ON(offsetof(struct rt

Re: [dpdk-dev] [PATCH 1/1] mbuf: move pool pointer in first half

2020-11-07 Thread Jerin Jacob
On Sun, Nov 8, 2020 at 12:09 AM Thomas Monjalon  wrote:
>
> 07/11/2020 18:12, Jerin Jacob:
> > On Sat, Nov 7, 2020 at 10:04 PM Thomas Monjalon  wrote:
> > >
> > > The mempool pointer in the mbuf struct is moved
> > > from the second to the first half.
> > > It should increase performance on most systems having 64-byte cache line,
> >
> > > i.e. mbuf is split in two cache lines.
> >
> > But In any event, Tx needs to touch the pool to freeing back to the
> > pool upon  Tx completion. Right?
> > Not able to understand the motivation for moving it to the first 64B cache 
> > line?
> > The gain varies from driver to driver. For example, a Typical
> > ARM-based NPU does not need to
> > touch the pool in Rx and its been filled by HW. Whereas it needs to
> > touch in Tx if the reference count is implemented.

See below.

> >
> > > Due to this change, tx_offload is moved, so some vector data paths
> > > may need to be adjusted. Note: OCTEON TX2 check is removed temporarily!
> >
> > It will be breaking the Tx path, Please just don't remove the static
> > assert without adjusting the code.
>
> Of course not.
> I looked at the vector Tx path of OCTEON TX2,
> it's close to be impossible to understand :)
> Please help!

Of course. Could you check the above section and share the rationale
for this change,
where it helps and how much it helps?


>
>


Re: [dpdk-dev] [PATCH 1/1] mbuf: move pool pointer in first half

2020-11-07 Thread Thomas Monjalon
07/11/2020 20:05, Jerin Jacob:
> On Sun, Nov 8, 2020 at 12:09 AM Thomas Monjalon  wrote:
> > 07/11/2020 18:12, Jerin Jacob:
> > > On Sat, Nov 7, 2020 at 10:04 PM Thomas Monjalon  
> > > wrote:
> > > >
> > > > The mempool pointer in the mbuf struct is moved
> > > > from the second to the first half.
> > > > It should increase performance on most systems having 64-byte cache 
> > > > line,
> > >
> > > > i.e. mbuf is split in two cache lines.
> > >
> > > But In any event, Tx needs to touch the pool to freeing back to the
> > > pool upon  Tx completion. Right?
> > > Not able to understand the motivation for moving it to the first 64B 
> > > cache line?
> > > The gain varies from driver to driver. For example, a Typical
> > > ARM-based NPU does not need to
> > > touch the pool in Rx and its been filled by HW. Whereas it needs to
> > > touch in Tx if the reference count is implemented.
> 
> See below.
> 
> > >
> > > > Due to this change, tx_offload is moved, so some vector data paths
> > > > may need to be adjusted. Note: OCTEON TX2 check is removed temporarily!
> > >
> > > It will be breaking the Tx path, Please just don't remove the static
> > > assert without adjusting the code.
> >
> > Of course not.
> > I looked at the vector Tx path of OCTEON TX2,
> > it's close to be impossible to understand :)
> > Please help!
> 
> Of course. Could you check the above section and share the rationale
> for this change,
> where it helps and how much it helps?

It has been concluded in the techboard meeting you were part of.
I don't understand why we restart this discussion again.
I won't have the energy to restart this process myself.
If you don't want to apply the techboard decision, then please
do the necessary to request another quick decision.





Re: [dpdk-dev] [PATCH v8 04/14] build: reformat and move Arm config and comments

2020-11-07 Thread Honnappa Nagarahalli


> 
> Change formatting so that it's more consistent and readable, add/modify
> comments/stdout messages, move configuration options to more appropriate
> places and make the order consistent according to these rules:
> 1. First list generic configuration options, then list options that may
>be overwritten. List SoC-specific options last.
> 2. For SoC-specific options, list number of cores before the number of
>NUMA nodes, to make it consistent with config/meson.build.
> 
> Signed-off-by: Juraj Linkeš 
Few nits, otherwise, looks good.
Reviewed-by: Honnappa Nagarahalli 

> ---
>  config/arm/arm64_armv8_linux_gcc | 21 ++-
>  config/arm/meson.build   | 94 +++-
>  2 files changed, 77 insertions(+), 38 deletions(-)
> 
> diff --git a/config/arm/arm64_armv8_linux_gcc
> b/config/arm/arm64_armv8_linux_gcc
> index 13ee8b223..04cd82ba9 100644
> --- a/config/arm/arm64_armv8_linux_gcc
> +++ b/config/arm/arm64_armv8_linux_gcc
> @@ -13,9 +13,16 @@ cpu = 'armv8-a'
>  endian = 'little'
> 
>  [properties]
> +# Supported implementers:
> +# 'generic': Generic armv8
> +# '0x41':Arm
> +# '0x43':Cavium
> +# '0x50':Ampere Computing
> +# '0x56':Marvell ARMADA
> +# 'dpaa':NXP DPAA
I don't think the comments add much value here, and they are not relevant either
(as this is a cross build file for a generic build). They are already captured in
config/arm/meson.build.

Instead, it would be good to add something like "Generate binaries that are
portable across all Armv8 machines".

>  implementer_id = 'generic'
> 
> -# Valid options for Arm's part_number:
> +# Supported part_numbers for generic, 0x41, 0x56, dpaa:
>  # 'generic': valid for all armv8-a architectures (default value)
>  # '0xd03':   cortex-a53
>  # '0xd04':   cortex-a35
> @@ -25,4 +32,16 @@ implementer_id = 'generic'
>  # '0xd09':   cortex-a73
>  # '0xd0a':   cortex-a75
>  # '0xd0b':   cortex-a76
> +# '0xd0c':   neoverse-n1
>  part_number = 'generic'
> +
> +# Supported part_numbers for 0x43:
> +# 'generic': valid for all Cavium builds
> +# '0xa1':thunderxt88
> +# '0xa2':thunderxt81
> +# '0xa3':thunderxt83
> +# '0xaf':thunderx2t99
> +# '0xb2':octeontx2
Same here. It would be good to remove the existing comments as well.

> +
> +# Supported part_numbers for 0x50:
> +# 'generic': valid for all Ampere builds
> diff --git a/config/arm/meson.build b/config/arm/meson.build index
> 7c7059cc2..5b922ef9c 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -5,6 +5,7 @@
> 
>  arm_force_native_march = false
> 
> +# common flags to all aarch64 builds, with lowest priority
>  flags_common_default = [
>   # Accelarate rte_memcpy. Be sure to run unit test
> (memcpy_perf_autotest)
>   # to determine the best threshold in code. Refer to notes in source file
> @@ -12,8 +13,8 @@ flags_common_default = [
>   ['RTE_ARCH_ARM64_MEMCPY', false],
>   #   ['RTE_ARM64_MEMCPY_ALIGNED_THRESHOLD', 2048],
>   #   ['RTE_ARM64_MEMCPY_UNALIGNED_THRESHOLD', 512],
> - # Leave below RTE_ARM64_MEMCPY_xxx options commented out,
> unless there're
> - # strong reasons.
> + # Leave below RTE_ARM64_MEMCPY_xxx options commented out,
> + # unless there are strong reasons.
>   #   ['RTE_ARM64_MEMCPY_SKIP_GCC_VER_CHECK', false],
>   #   ['RTE_ARM64_MEMCPY_ALIGN_MASK', 0xF],
>   #   ['RTE_ARM64_MEMCPY_STRICT_ALIGN', false],
> @@ -23,69 +24,86 @@ flags_common_default = [
> 
>   ['RTE_SCHED_VECTOR', false],
>   ['RTE_ARM_USE_WFE', false],
> + ['RTE_ARCH_ARM64', true],
> + ['RTE_CACHE_LINE_SIZE', 128]
>  ]
> 
> +# implementer specific aarch64 flags, with middle priority # (will
Nit  ^^^ can be removed

> +overwrite common flags)
>  flags_implementer_generic = [
>   ['RTE_MACHINE', '"armv8a"'],
> - ['RTE_MAX_LCORE', 256],
>   ['RTE_USE_C11_MEM_MODEL', true],
> - ['RTE_CACHE_LINE_SIZE', 128]]
> + ['RTE_CACHE_LINE_SIZE', 128],
> + ['RTE_MAX_LCORE', 256]
> +]



>  flags_implementer_armada = [
>   ['RTE_MACHINE', '"armv8a"'],
>   ['RTE_CACHE_LINE_SIZE', 64],
> - ['RTE_MAX_NUMA_NODES', 1],
> - ['RTE_MAX_LCORE', 16]]
> + ['RTE_MAX_LCORE', 16],
> + ['RTE_MAX_NUMA_NODES', 1]
> +]
> 
> +# part number specific aarch64 flags, with highest priority # (will
Nit ^^^ can be removed

> +overwrite both common and implementer specific flags)
>  flags_part_number_thunderx = [
>   ['RTE_MACHINE', '"thunderx"'],
> - ['RTE_USE_C11_MEM_MODEL', false]]
> + ['RTE_USE_C11_MEM_MODEL', false]
> +]



>  flags_part_number_n1generic = [
>   ['RTE_MACHINE', '"neoverse-n1"'],
> - ['RTE_MAX_LCORE', 64],
> - ['RTE_CACHE_LINE_SIZE', 64],
>   ['RTE_ARM_FEATURE_ATOMICS', true],
>   ['RTE_USE_C11_MEM_MODEL', true],
> - ['RTE_MAX_MEM_MB', 1048576],
> - ['RTE_MAX_NUMA_NODES', 1],
>   ['RTE_EAL_NUMA_AWARE_

Re: [dpdk-dev] [PATCH v8 05/14] build: simplify how Arm flags are processed

2020-11-07 Thread Honnappa Nagarahalli


> 
> Set flags in one loop. Append flags to a list and use the list in the loop.
> 
> Signed-off-by: Juraj Linkeš 
Reviewed-by: Honnappa Nagarahalli 

> ---
>  config/arm/meson.build | 37 +
>  1 file changed, 17 insertions(+), 20 deletions(-)
> 
> diff --git a/config/arm/meson.build b/config/arm/meson.build index
> 5b922ef9c..eda485e7f 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -150,7 +150,6 @@ if dpdk_conf.get('RTE_ARCH_32')  else
>   # aarch64 build
>   implementer_id = 'generic'
> - machine_args = [] # Clear previous machine args
>   if machine == 'generic' and not meson.is_cross_build()
>   # generic build
>   implementer_config = implementer_generic @@ -183,34
> +182,32 @@ else
>   implementer_config = get_variable('implementer_' +
> implementer_id)
>   endif
> 
> - # Apply Common Defaults. These settings may be overwritten by
> machine
> - # settings later.
> - foreach flag: flags_common_default
> - if flag.length() > 0
> - dpdk_conf.set(flag[0], flag[1])
> - endif
> - endforeach
> + message('Arm implementer: ' + implementer_config[0])
> + message('Arm part number: ' + part_number)
> 
> - message('Implementer : ' + implementer_config[0])
> - foreach flag: implementer_config[1]
> - if flag.length() > 0
> - dpdk_conf.set(flag[0], flag[1])
> - endif
> - endforeach
> + # use default flags with implementer flags
> + dpdk_flags = flags_common_default + implementer_config[1]
> 
> + machine_args = [] # Clear previous machine args
>   foreach marg: implementer_config[2]
>   if marg[0] == part_number
> + # apply supported machine args
>   foreach flag: marg[1]
>   if cc.has_argument(flag)
>   machine_args += flag
>   endif
>   endforeach
> - # Apply any extra machine specific flags.
> - foreach flag: marg.get(2, [])
> - if flag.length() > 0
> - dpdk_conf.set(flag[0], flag[1])
> - endif
> - endforeach
> + if marg.length() > 2
> + # add extra flags for the part
> + dpdk_flags += marg[2]
> + endif
> + endif
> + endforeach
> +
> + # apply flags
> + foreach flag: dpdk_flags
> + if flag.length() > 0
> + dpdk_conf.set(flag[0], flag[1])
>   endif
>   endforeach
>  endif
> --
> 2.20.1



[dpdk-dev] [PATCH v2] net/mlx5: improve vMPRQ descriptors allocation locality

2020-11-07 Thread Alexander Kozyrev
There is a performance penalty for the replenish scheme
used in vectorized Rx burst for both MPRQ and SPRQ.
Mbuf elements are being filled at the end of the mbufs
array and being replenished at the beginning. That leads
to an increase in cache misses and the performance drop.
The more Rx descriptors are used, the worse the situation becomes.

Change the allocation scheme for vectorized MPRQ Rx burst:
allocate new mbufs only when consumed mbufs are almost
depleted (always have one burst gap between allocated and
consumed indices). Keeping a small number of mbufs allocated
improves cache locality and improves performance a lot.

Unfortunately, this approach cannot be applied to SPRQ Rx
burst routine. In MPRQ Rx burst we simply copy packets from
external MPRQ buffers or attach these buffers to mbufs.
In SPRQ Rx burst we allow the NIC to fill mbufs for us.
Hence keeping a small number of allocated mbufs will limit
NIC ability to fill as many buffers as possible. This fact
offsets the advantage of better cache locality.

Fixes: 0f20acbf5e ("net/mlx5: implement vectorized MPRQ burst")

Signed-off-by: Alexander Kozyrev 
---
v1: https://patchwork.dpdk.org/patch/83779/
v2: fixed assertion for the number of mbufs to replenish

 drivers/net/mlx5/mlx5_rxtx_vec.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
index 469ea8401d..68c51dce31 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.c
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
@@ -145,16 +145,17 @@ mlx5_rx_mprq_replenish_bulk_mbuf(struct mlx5_rxq_data 
*rxq)
const uint32_t strd_n = 1 << rxq->strd_num_n;
const uint32_t elts_n = wqe_n * strd_n;
const uint32_t wqe_mask = elts_n - 1;
-   uint32_t n = elts_n - (rxq->elts_ci - rxq->rq_pi);
+   uint32_t n = rxq->elts_ci - rxq->rq_pi;
uint32_t elts_idx = rxq->elts_ci & wqe_mask;
struct rte_mbuf **elts = &(*rxq->elts)[elts_idx];
 
-   /* Not to cross queue end. */
-   if (n >= rxq->rq_repl_thresh) {
-   MLX5_ASSERT(n >= MLX5_VPMD_RXQ_RPLNSH_THRESH(elts_n));
+   if (n <= rxq->rq_repl_thresh) {
+   MLX5_ASSERT(n + MLX5_VPMD_RX_MAX_BURST >=
+   MLX5_VPMD_RXQ_RPLNSH_THRESH(elts_n));
MLX5_ASSERT(MLX5_VPMD_RXQ_RPLNSH_THRESH(elts_n) >
 MLX5_VPMD_DESCS_PER_LOOP);
-   n = RTE_MIN(n, elts_n - elts_idx);
+   /* Not to cross queue end. */
+   n = RTE_MIN(n + MLX5_VPMD_RX_MAX_BURST, elts_n - elts_idx);
if (rte_mempool_get_bulk(rxq->mp, (void *)elts, n) < 0) {
rxq->stats.rx_nombuf += n;
return;
-- 
2.24.1



[dpdk-dev] [PATCH v2] net/mlx5: fix info about Rx descriptors for MPRQ

2020-11-07 Thread Alexander Kozyrev
The number of descriptors configured is returned to a user
via the rxq_info_get API. This number is incorrect for MPRQ.
For SPRQ this number matches the number of mbufs allocated.
For MPRQ we have fewer external MPRQ buffers, each of which can hold
multiple packets in the strides of this big buffer. Take that
into account and return the number of MPRQ buffers multiplied
by the number of strides in this case.

Fixes: 26f1bae837 ("net/mlx5: add Rx/Tx burst mode info")
Cc: sta...@dpdk.org

Signed-off-by: Alexander Kozyrev 
---
v1: https://patchwork.dpdk.org/patch/83778/
v2: fixed a commit message typo and a coverity issue

 drivers/net/mlx5/mlx5_rxtx.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 3aee8d4def..c540849da2 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -556,7 +556,7 @@ mlx5_rxq_info_get(struct rte_eth_dev *dev, uint16_t 
rx_queue_id,
 
if (!rxq)
return;
-   qinfo->mp = mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ?
+   qinfo->mp = mlx5_rxq_mprq_enabled(rxq) ?
rxq->mprq_mp : rxq->mp;
qinfo->conf.rx_thresh.pthresh = 0;
qinfo->conf.rx_thresh.hthresh = 0;
@@ -566,7 +566,9 @@ mlx5_rxq_info_get(struct rte_eth_dev *dev, uint16_t 
rx_queue_id,
qinfo->conf.rx_deferred_start = rxq_ctrl ? 0 : 1;
qinfo->conf.offloads = dev->data->dev_conf.rxmode.offloads;
qinfo->scattered_rx = dev->data->scattered_rx;
-   qinfo->nb_desc = 1 << rxq->elts_n;
+   qinfo->nb_desc = mlx5_rxq_mprq_enabled(rxq) ?
+   (1 << rxq->elts_n) * (1 << rxq->strd_num_n) :
+   (1 << rxq->elts_n);
 }
 
 /**
-- 
2.24.1



Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free

2020-11-07 Thread Ali Alnubani
Hi Olivier,

> -Original Message-
> From: dev  On Behalf Of Olivier Matz
> Sent: Wednesday, November 4, 2020 7:00 PM
> To: dev@dpdk.org
> Cc: konstantin.anan...@intel.com; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
> 

I can reproduce the failure in "ci/iol-mellanox-Performance" locally also with 
this patch.
I see a 2 mpps degradation with single core on ConnectX-5. Packet size is 64B. 
Testpmd command:
"""
dpdk-testpmd -n 4  -w :82:00.1  -w :82:00.0 -c 0x9000  -- --burst=64 
--mbcache=512 -i  --nb-cores=1  --txd=256 --rxd=256
"""

Regards,
Ali


[dpdk-dev] [PATCH] net/mlx5: make flow operation thread safe

2020-11-07 Thread Weifeng Li
Is a lock needed for the flow list in the scenario below?
Thread1: flow_list_destroy                    flow_list_create
Thread2:                  flow_list_destroy
The same flow may be operated on by both threads at the same time.

When i start mlx5 bond and trigger LSC at the same time.
It is possible to assert in mlx5_rx_queue_release func and
print "port 4 Rx queu 0 is still used by a flow and cannot
be removed". I use dpdk-testpmd to simulate the test.

Signed-off-by: Weifeng Li 
---
 drivers/net/mlx5/linux/mlx5_os.c |  1 +
 drivers/net/mlx5/mlx5.h  |  1 +
 drivers/net/mlx5/mlx5_flow.c | 13 +++--
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index c78d56f..59c074e 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1426,6 +1426,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
  MLX5_MAX_MAC_ADDRESSES);
priv->flows = 0;
priv->ctrl_flows = 0;
+   rte_spinlock_init(&priv->flow_lock);
rte_spinlock_init(&priv->flow_list_lock);
TAILQ_INIT(&priv->flow_meters);
TAILQ_INIT(&priv->flow_meter_profiles);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b43a8c9..860bf2f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -963,6 +963,7 @@ struct mlx5_priv {
unsigned int reta_idx_n; /* RETA index size. */
struct mlx5_drop drop_queue; /* Flow drop queues. */
uint32_t flows; /* RTE Flow rules. */
+   rte_spinlock_t flow_lock;
uint32_t ctrl_flows; /* Control flow rules. */
rte_spinlock_t flow_list_lock;
struct mlx5_obj_ops obj_ops; /* HW objects operations. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index e9c0ddd..69d8159 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -5577,6 +5577,7 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
external, hairpin_flow, error);
if (ret < 0)
goto error_before_hairpin_split;
+   rte_spinlock_lock(&priv->flow_lock);
flow = mlx5_ipool_zmalloc(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], &idx);
if (!flow) {
rte_errno = ENOMEM;
@@ -5598,8 +5599,10 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
memset(rss_desc, 0, offsetof(struct mlx5_flow_rss_desc, queue));
rss = flow_get_rss_action(p_actions_rx);
if (rss) {
-   if (flow_rss_workspace_adjust(wks, rss_desc, rss->queue_num))
+   if (flow_rss_workspace_adjust(wks, rss_desc, rss->queue_num)) {
+   rte_spinlock_unlock(&priv->flow_lock);
return 0;
+   }
/*
 * The following information is required by
 * mlx5_flow_hashfields_adjust() in advance.
@@ -5723,6 +5726,7 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
__atomic_add_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED);
mlx5_free(default_miss_ctx.queue);
}
+   rte_spinlock_unlock(&priv->flow_lock);
return idx;
 error:
MLX5_ASSERT(flow);
@@ -5738,6 +5742,7 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
wks->flow_nested_idx = 0;
 error_before_hairpin_split:
rte_free(translated_actions);
+   rte_spinlock_unlock(&priv->flow_lock);
return 0;
 }
 
@@ -5877,11 +5882,14 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t 
*list,
  uint32_t flow_idx)
 {
struct mlx5_priv *priv = dev->data->dev_private;
+   rte_spinlock_lock(&priv->flow_lock);
struct rte_flow *flow = mlx5_ipool_get(priv->sh->ipool
   [MLX5_IPOOL_RTE_FLOW], flow_idx);
 
-   if (!flow)
+   if (!flow) {
+   rte_spinlock_unlock(&priv->flow_lock);
return;
+   }
/*
 * Update RX queue flags only if port is started, otherwise it is
 * already clean.
@@ -5908,6 +5916,7 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t *list,
}
flow_mreg_del_copy_action(dev, flow);
mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], flow_idx);
+   rte_spinlock_unlock(&priv->flow_lock);
 }
 
 /**
-- 
2.9.5



[dpdk-dev] [PATCH v2] net/mlx5: make flow operation thread safe

2020-11-07 Thread Weifeng Li
Is a lock needed for the flow list in the scenario below?
Thread1: flow_list_destroy                    flow_list_create
Thread2:                  flow_list_destroy
The same flow may be operated on by both threads at the same time.

When I start an mlx5 bond and trigger LSC at the same time,
it is possible to hit an assert in the mlx5_rx_queue_release function and
print "port 4 Rx queue 0 is still used by a flow and cannot
be removed". I use dpdk-testpmd to simulate the test.

Signed-off-by: Weifeng Li 
---
v2: adjust coding style issue.
---
 drivers/net/mlx5/linux/mlx5_os.c |  1 +
 drivers/net/mlx5/mlx5.h  |  1 +
 drivers/net/mlx5/mlx5_flow.c | 13 +++--
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index c78d56f..59c074e 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1426,6 +1426,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
  MLX5_MAX_MAC_ADDRESSES);
priv->flows = 0;
priv->ctrl_flows = 0;
+   rte_spinlock_init(&priv->flow_lock);
rte_spinlock_init(&priv->flow_list_lock);
TAILQ_INIT(&priv->flow_meters);
TAILQ_INIT(&priv->flow_meter_profiles);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b43a8c9..860bf2f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -963,6 +963,7 @@ struct mlx5_priv {
unsigned int reta_idx_n; /* RETA index size. */
struct mlx5_drop drop_queue; /* Flow drop queues. */
uint32_t flows; /* RTE Flow rules. */
+   rte_spinlock_t flow_lock;
uint32_t ctrl_flows; /* Control flow rules. */
rte_spinlock_t flow_list_lock;
struct mlx5_obj_ops obj_ops; /* HW objects operations. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index e9c0ddd..69d8159 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -5577,6 +5577,7 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
external, hairpin_flow, error);
if (ret < 0)
goto error_before_hairpin_split;
+   rte_spinlock_lock(&priv->flow_lock);
flow = mlx5_ipool_zmalloc(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], &idx);
if (!flow) {
rte_errno = ENOMEM;
@@ -5598,8 +5599,10 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
memset(rss_desc, 0, offsetof(struct mlx5_flow_rss_desc, queue));
rss = flow_get_rss_action(p_actions_rx);
if (rss) {
-   if (flow_rss_workspace_adjust(wks, rss_desc, rss->queue_num))
+   if (flow_rss_workspace_adjust(wks, rss_desc, rss->queue_num)) {
+   rte_spinlock_unlock(&priv->flow_lock);
return 0;
+   }
/*
 * The following information is required by
 * mlx5_flow_hashfields_adjust() in advance.
@@ -5723,6 +5726,7 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
__atomic_add_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED);
mlx5_free(default_miss_ctx.queue);
}
+   rte_spinlock_unlock(&priv->flow_lock);
return idx;
 error:
MLX5_ASSERT(flow);
@@ -5738,6 +5742,7 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
wks->flow_nested_idx = 0;
 error_before_hairpin_split:
rte_free(translated_actions);
+   rte_spinlock_unlock(&priv->flow_lock);
return 0;
 }
 
@@ -5877,11 +5882,14 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t 
*list,
  uint32_t flow_idx)
 {
struct mlx5_priv *priv = dev->data->dev_private;
+   rte_spinlock_lock(&priv->flow_lock);
struct rte_flow *flow = mlx5_ipool_get(priv->sh->ipool
   [MLX5_IPOOL_RTE_FLOW], flow_idx);
 
-   if (!flow)
+   if (!flow) {
+   rte_spinlock_unlock(&priv->flow_lock);
return;
+   }
/*
 * Update RX queue flags only if port is started, otherwise it is
 * already clean.
@@ -5908,6 +5916,7 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t *list,
}
flow_mreg_del_copy_action(dev, flow);
mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], flow_idx);
+   rte_spinlock_unlock(&priv->flow_lock);
 }
 
 /**
-- 
2.9.5