Re: DPDK patch for Amston Lake SGMII <> GPY215
On 5/24/2024 6:40 AM, Jack.Chen wrote:
> Dear DPDK Dev,
>
> This is PM from Advantech ENPD. We are working on Intel Amston Lake
> CPU’s SGMII <> GPY215 PHY for DPDK test but fail.
>
> We consulted with Intel support team and they suggested we should
> consult DPDK community and it should have the patch or code change for
> Amston Lake <> GPY215 available for DPDK.
>
> Could you kindly suggest us the direction of it? I also keep my
> Engineering team in this mail loop for further discussion.
>
> Thank you so much
>
> The error message while we testing DPDK:
>
> SoC 2.5G LAN (BIOS set to 1G) with dpdk 24.03.0. It can run testpmd
> test, and error message as follows:
>
> root@fwa-1214-efi:~/dpdk/dpdk-24.03/build/app# ./dpdk-testpmd -c 0xf -n 1 -a 00:1e.4 --socket-mem=2048,0 -- -i --mbcache=512 --numa --port-numa-config=0,0 --socket-num=0 --coremask=0x2 --nb-cores=1 --rxq=1 --txq=1 --portmask=0x1 --rxd=2048 --rxfreet=64 --rxpt=64 --rxht=8 --rxwt=0 --txd=2048 --txfreet=64 --txpt=64 --txht=0 --txwt=0 --burst=64 --txrst=64 --rss-ip -a
>
> EAL: Detected CPU lcores: 4
> EAL: Detected NUMA nodes: 1
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> TELEMETRY: No legacy callbacks, legacy socket not created
> testpmd: No probed ethernet devices
> Interactive-mode selected
> Fail: input rxq (1) can't be greater than max_rx_queues (0) of port 0
> EAL: Error - exiting with code: 1
> Cause: rxq 1 invalid - must be >= 0 && <= 0

Hi Jack,

According to the above log, the device is not detected. Which Ethernet controller is connected to the GPY215 PHY, and do you know whether DPDK has the required driver for it? If the device sits on the PCIe bus, you can check it via `lspci`.
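For reference, a minimal sketch (not from the thread) of how an application can confirm at runtime whether EAL actually probed any Ethernet device before configuring queues, which is exactly the condition behind "testpmd: No probed ethernet devices". It only uses standard ethdev/EAL calls; the messages and structure are illustrative.

#include <stdio.h>
#include <stdlib.h>
#include <rte_debug.h>
#include <rte_eal.h>
#include <rte_ethdev.h>

int
main(int argc, char **argv)
{
	uint16_t port_id, nb_ports;

	if (rte_eal_init(argc, argv) < 0)
		rte_exit(EXIT_FAILURE, "EAL init failed\n");

	nb_ports = rte_eth_dev_count_avail();
	if (nb_ports == 0) {
		/* Same situation as the log above: either no supported NIC is
		 * bound to a DPDK-compatible driver, or the PMD for it is not
		 * built in.
		 */
		printf("no ethdev ports probed - check lspci / driver binding\n");
		return 0;
	}

	RTE_ETH_FOREACH_DEV(port_id)
		printf("port %u is available\n", port_id);

	return 0;
}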
RE: [PATCH v3] test/crypto: fix enqueue dequeue callback case
Tested cryptodev_qat_autotest and cryptodev_null_autotest this patch along with https://patches.dpdk.org/project/dpdk/patch/20240416081222.3002268-1-ganapati.kundap...@intel.com/, callbacks are getting called for both NULL pmd and qat pmd. Acked-by: Ganapati Kundapura Thanks, Ganapati > -Original Message- > From: Akhil Goyal > Sent: Friday, May 24, 2024 10:43 PM > To: dev@dpdk.org > Cc: Kundapura, Ganapati ; Gujjar, > Abhinandan S ; fanzhang@gmail.com; > ano...@marvell.com; Akhil Goyal ; sta...@dpdk.org > Subject: [PATCH v3] test/crypto: fix enqueue dequeue callback case > > The enqueue/dequeue callback test cases were using the > test_null_burst_operation() for doing enqueue/dequeue. > But this function is only designed to be run for NULL PMD. > Hence for other PMDs, the callback was not getting called. > Now, separate processing thread is removed, instead NULL crypto operation is > created and processed so that callbacks are called. > Also added a check on a global static variable to verify that the callback is > actually called and fail the case if it is not getting called. > > Fixes: 5523a75af539 ("test/crypto: add case for enqueue/dequeue callbacks") > Cc: sta...@dpdk.org > > Signed-off-by: Akhil Goyal > --- > -v3: replaced AES-SHA1 with NULL crypto and removed separate thread. > > app/test/test_cryptodev.c | 106 -- > > 1 file changed, 89 insertions(+), 17 deletions(-) > > diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c index > 1703ebccf1..b644e87106 100644 > --- a/app/test/test_cryptodev.c > +++ b/app/test/test_cryptodev.c > @@ -199,6 +199,8 @@ post_process_raw_dp_op(void *user_data, > uint32_t index __rte_unused, > static struct crypto_testsuite_params testsuite_params = { NULL }; struct > crypto_testsuite_params *p_testsuite_params = &testsuite_params; static > struct crypto_unittest_params unittest_params; > +static bool enq_cb_called; > +static bool deq_cb_called; > > int > process_sym_raw_dp_op(uint8_t dev_id, uint16_t qp_id, @@ -14556,6 > +14558,7 @@ test_enq_callback(uint16_t dev_id, uint16_t qp_id, struct > rte_crypto_op **ops, > RTE_SET_USED(ops); > RTE_SET_USED(user_param); > > + enq_cb_called = true; > printf("crypto enqueue callback called\n"); > return nb_ops; > } > @@ -14569,21 +14572,58 @@ test_deq_callback(uint16_t dev_id, uint16_t > qp_id, struct rte_crypto_op **ops, > RTE_SET_USED(ops); > RTE_SET_USED(user_param); > > + deq_cb_called = true; > printf("crypto dequeue callback called\n"); > return nb_ops; > } > > /* > - * Thread using enqueue/dequeue callback with RCU. > + * Process enqueue/dequeue NULL crypto request to verify callback with > RCU. > */ > static int > -test_enqdeq_callback_thread(void *arg) > +test_enqdeq_callback_null_cipher(void) > { > - RTE_SET_USED(arg); > - /* DP thread calls rte_cryptodev_enqueue_burst()/ > - * rte_cryptodev_dequeue_burst() and invokes callback. 
> - */ > - test_null_burst_operation(); > + struct crypto_testsuite_params *ts_params = &testsuite_params; > + struct crypto_unittest_params *ut_params = &unittest_params; > + > + /* Setup Cipher Parameters */ > + ut_params->cipher_xform.type = RTE_CRYPTO_SYM_XFORM_CIPHER; > + ut_params->cipher_xform.next = &ut_params->auth_xform; > + > + ut_params->cipher_xform.cipher.algo = RTE_CRYPTO_CIPHER_NULL; > + ut_params->cipher_xform.cipher.op = > RTE_CRYPTO_CIPHER_OP_ENCRYPT; > + > + /* Setup HMAC Parameters */ > + ut_params->auth_xform.type = RTE_CRYPTO_SYM_XFORM_AUTH; > + ut_params->auth_xform.next = NULL; > + > + ut_params->auth_xform.auth.algo = RTE_CRYPTO_AUTH_NULL; > + ut_params->auth_xform.auth.op = > RTE_CRYPTO_AUTH_OP_GENERATE; > + > + /* Create Crypto session*/ > + ut_params->sess = rte_cryptodev_sym_session_create(ts_params- > >valid_devs[0], > + &ut_params->auth_xform, ts_params- > >session_mpool); > + TEST_ASSERT_NOT_NULL(ut_params->sess, "Session creation failed"); > + > + ut_params->op = rte_crypto_op_alloc(ts_params->op_mpool, > RTE_CRYPTO_OP_TYPE_SYMMETRIC); > + TEST_ASSERT_NOT_NULL(ut_params->op, "Failed to allocate > symmetric > +crypto op"); > + > + /* Generate an operation for each mbuf in burst */ > + ut_params->ibuf = rte_pktmbuf_alloc(ts_params->mbuf_pool); > + TEST_ASSERT_NOT_NULL(ut_params->ibuf, "Failed to allocate mbuf"); > + > + /* Append some random data */ > + TEST_ASSERT_NOT_NULL(rte_pktmbuf_append(ut_params->ibuf, > sizeof(unsigned int)), > + "no room to append data"); > + > + rte_crypto_op_attach_sym_session(ut_params->op, ut_params- > >sess); > + > + ut_params->op->sym->m_src = ut_params->ibuf; > + > + /* Process crypto operation */ > + TEST_ASSERT_NOT_NULL(process_crypto_request(ts_params- > >valid_devs[0], ut_params->op), > +
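The case above exercises the cryptodev enqueue/dequeue callback API. For reference, a minimal sketch of registering such a callback on a queue pair; the function names and signatures are as I recall them from rte_cryptodev.h, so treat them as assumptions and verify against the header.

#include <stdint.h>
#include <rte_common.h>
#include <rte_cryptodev.h>

static uint64_t enq_count;

/* Hypothetical callback: counts operations passing through the enqueue path. */
static uint16_t
count_enq_cb(uint16_t dev_id, uint16_t qp_id, struct rte_crypto_op **ops,
	     uint16_t nb_ops, void *user_param)
{
	uint64_t *counter = user_param;

	RTE_SET_USED(dev_id);
	RTE_SET_USED(qp_id);
	RTE_SET_USED(ops);
	*counter += nb_ops;
	return nb_ops;
}

static int
attach_enq_callback(uint8_t dev_id, uint16_t qp_id)
{
	struct rte_cryptodev_cb *cb;

	/* The callback is invoked from rte_cryptodev_enqueue_burst() on this qp. */
	cb = rte_cryptodev_add_enq_callback(dev_id, qp_id, count_enq_cb, &enq_count);
	if (cb == NULL)
		return -1;

	/* Later: rte_cryptodev_remove_enq_callback(dev_id, qp_id, cb); */
	return 0;
}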
[RFC] eal: provide option to use compiler memcpy instead of RTE
Provide build option to have functions in delegate to the standard compiler/libc memcpy(), instead of using the various traditional, handcrafted, per-architecture rte_memcpy() implementations. A new meson build option 'use_cc_memcpy' is added. The default is true. It's not obvious what should be the default, but compiler memcpy() is enabled by default in this RFC so any tests run with this patch use the new approach. One purpose of this RFC is to make it easy to evaluate the costs and benefits of a switch. Only ARM and x86 is implemented. Signed-off-by: Mattias Rönnblom --- config/meson.build | 1 + lib/eal/arm/include/rte_memcpy.h | 10 + lib/eal/include/generic/rte_memcpy.h | 62 lib/eal/x86/include/meson.build | 6 ++- lib/eal/x86/include/rte_memcpy.h | 11 - meson_options.txt| 2 + 6 files changed, 83 insertions(+), 9 deletions(-) diff --git a/config/meson.build b/config/meson.build index 8c8b019c25..456056628e 100644 --- a/config/meson.build +++ b/config/meson.build @@ -353,6 +353,7 @@ endforeach # set other values pulled from the build options dpdk_conf.set('RTE_MAX_ETHPORTS', get_option('max_ethports')) dpdk_conf.set('RTE_LIBEAL_USE_HPET', get_option('use_hpet')) +dpdk_conf.set('RTE_USE_CC_MEMCPY', get_option('use_cc_memcpy')) dpdk_conf.set('RTE_ENABLE_STDATOMIC', get_option('enable_stdatomic')) dpdk_conf.set('RTE_ENABLE_TRACE_FP', get_option('enable_trace_fp')) dpdk_conf.set('RTE_PKTMBUF_HEADROOM', get_option('pkt_mbuf_headroom')) diff --git a/lib/eal/arm/include/rte_memcpy.h b/lib/eal/arm/include/rte_memcpy.h index 47dea9a8cc..e8aff722df 100644 --- a/lib/eal/arm/include/rte_memcpy.h +++ b/lib/eal/arm/include/rte_memcpy.h @@ -5,10 +5,20 @@ #ifndef _RTE_MEMCPY_ARM_H_ #define _RTE_MEMCPY_ARM_H_ +#include + +#ifdef RTE_USE_CC_MEMCPY + +#include + +#else + #ifdef RTE_ARCH_64 #include #else #include #endif +#endif /* RTE_USE_CC_MEMCPY */ + #endif /* _RTE_MEMCPY_ARM_H_ */ diff --git a/lib/eal/include/generic/rte_memcpy.h b/lib/eal/include/generic/rte_memcpy.h index e7f0f8eaa9..f2f66f372d 100644 --- a/lib/eal/include/generic/rte_memcpy.h +++ b/lib/eal/include/generic/rte_memcpy.h @@ -5,12 +5,20 @@ #ifndef _RTE_MEMCPY_H_ #define _RTE_MEMCPY_H_ +#ifdef __cplusplus +extern "C" { +#endif + /** * @file * * Functions for vectorised implementation of memcpy(). */ +#include +#include +#include + /** * Copy 16 bytes from one location to another using optimised * instructions. The locations should not overlap. @@ -35,8 +43,6 @@ rte_mov16(uint8_t *dst, const uint8_t *src); static inline void rte_mov32(uint8_t *dst, const uint8_t *src); -#ifdef __DOXYGEN__ - /** * Copy 48 bytes from one location to another using optimised * instructions. The locations should not overlap. @@ -49,8 +55,6 @@ rte_mov32(uint8_t *dst, const uint8_t *src); static inline void rte_mov48(uint8_t *dst, const uint8_t *src); -#endif /* __DOXYGEN__ */ - /** * Copy 64 bytes from one location to another using optimised * instructions. The locations should not overlap. @@ -87,8 +91,6 @@ rte_mov128(uint8_t *dst, const uint8_t *src); static inline void rte_mov256(uint8_t *dst, const uint8_t *src); -#ifdef __DOXYGEN__ - /** * Copy bytes from one location to another. The locations must not overlap. 
* @@ -111,6 +113,52 @@ rte_mov256(uint8_t *dst, const uint8_t *src); static void * rte_memcpy(void *dst, const void *src, size_t n); -#endif /* __DOXYGEN__ */ +#ifdef RTE_USE_CC_MEMCPY +static inline void +rte_mov16(uint8_t *dst, const uint8_t *src) +{ + memcpy(dst, src, 16); +} + +static inline void +rte_mov32(uint8_t *dst, const uint8_t *src) +{ + memcpy(dst, src, 32); +} + +static inline void +rte_mov48(uint8_t *dst, const uint8_t *src) +{ + memcpy(dst, src, 48); +} + +static inline void +rte_mov64(uint8_t *dst, const uint8_t *src) +{ + memcpy(dst, src, 64); +} + +static inline void +rte_mov128(uint8_t *dst, const uint8_t *src) +{ + memcpy(dst, src, 128); +} + +static inline void +rte_mov256(uint8_t *dst, const uint8_t *src) +{ + memcpy(dst, src, 256); +} + +static inline void * +rte_memcpy(void *dst, const void *src, size_t n) +{ + return memcpy(dst, src, n); +} +#endif /* RTE_USE_CC_MEMCPY */ + +#ifdef __cplusplus +} +#endif #endif /* _RTE_MEMCPY_H_ */ diff --git a/lib/eal/x86/include/meson.build b/lib/eal/x86/include/meson.build index 52d2f8e969..cf851df60d 100644 --- a/lib/eal/x86/include/meson.build +++ b/lib/eal/x86/include/meson.build @@ -7,7 +7,6 @@ arch_headers = files( 'rte_cpuflags.h', 'rte_cycles.h', 'rte_io.h', -'rte_memcpy.h', 'rte_pause.h', 'rte_power_intrinsics.h', 'rte_prefetch.h', @@ -16,6 +15,11 @@ arch_headers = files( 'rte_spinlock.h', 'rte_vect.h', ) + +if not get_option('use_cc_memcpy') +arch_headers += 'rte_memcpy.h' +endif + ar
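With use_cc_memcpy enabled, rte_memcpy() simply forwards to memcpy(), so callers see no functional difference. A tiny sanity sketch (illustrative only, not part of the RFC) showing the property any rte_memcpy() implementation must preserve:

#include <assert.h>
#include <string.h>
#include <rte_memcpy.h>

/* Whichever rte_memcpy() implementation is built in (handcrafted or
 * compiler/libc delegation), the observable result must match plain
 * memcpy() for non-overlapping buffers.
 */
static void
check_rte_memcpy_equivalence(void)
{
	char src[256], a[256], b[256];
	size_t i;

	for (i = 0; i < sizeof(src); i++)
		src[i] = (char)i;

	rte_memcpy(a, src, sizeof(src));
	memcpy(b, src, sizeof(src));
	assert(memcmp(a, b, sizeof(src)) == 0);
}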
Re: [PATCH] net/mlx5/hws: add support for NVGRE matching
Hi,

From: Bill Zhou
Sent: Monday, May 20, 2024 9:25 AM
To: Alex Vesker; Dariusz Sosnowski; Slava Ovsiienko; Ori Kam; Suanming Mou; Matan Azrad
Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL); Raslan Darawsheh
Subject: [PATCH] net/mlx5/hws: add support for NVGRE matching

Add HWS support for all fields of the RTE_FLOW_ITEM_TYPE_NVGRE item.

Signed-off-by: Dong Zhou
Acked-by: Alex Vesker

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
Re: [PATCH] net/mlx5: fix Rx Hash queue resource release in sample flow
Hi,

From: Jiawei(Jonny) Wang
Sent: Monday, May 20, 2024 6:07 PM
To: Bing Zhao; Suanming Mou; Dariusz Sosnowski; Slava Ovsiienko; Ori Kam; Matan Azrad
Cc: dev@dpdk.org; Raslan Darawsheh; sta...@dpdk.org
Subject: [PATCH] net/mlx5: fix Rx Hash queue resource release in sample flow

When the queue/rss action was added to the sample action list, the Rx hash queue resource was allocated during sample action translation, to be used later when creating the sample DR action. When flow creation failed, the Rx hash queue resource of the sample action list was destroyed in the wrong place.

This patch adds a check to release the Rx hash queue resource only after the sample action is released, to avoid one extra release on failure.

Fixes: ca5eb60ecd5b ("net/mlx5: fix resource release for mirror flow")
Cc: sta...@dpdk.org

Signed-off-by: Jiawei Wang
Reviewed-by: Bing Zhao

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
Re: [PATCH] net/mlx5: remove redundant macro
Hi, From: Thomas Monjalon Sent: Friday, May 24, 2024 4:42 PM To: dev@dpdk.org Cc: Dariusz Sosnowski; Slava Ovsiienko; Ori Kam; Suanming Mou; Matan Azrad Subject: [PATCH] net/mlx5: remove redundant macro The macro MLX5_BITSHIFT() is not used anymore, and is redundant with RTE_BIT64(), so it can be removed. Signed-off-by: Thomas Monjalon Patch applied to next-net-mlx, Kindest regards, Raslan Darawsheh
Re: [PATCH] net/mlx5: fix indirect action template error handling
Hi,

From: Maayan Kashani
Sent: Sunday, May 26, 2024 4:22 PM
To: dev@dpdk.org
Cc: Maayan Kashani; Suanming Mou; Raslan Darawsheh; sta...@dpdk.org; Dariusz Sosnowski; Slava Ovsiienko; Ori Kam; Matan Azrad
Subject: [PATCH] net/mlx5: fix indirect action template error handling

For the indirect action type, on error the function jumped to err but returned zero because rte_errno was not initialized before the jump, so table creation did not report an error. Now, if an error is reached and rte_errno is not initialized, it is set to EINVAL, and table creation fails if translating the action fails.

Added driver log warnings to make it easier to track failures when translating shared actions.

Fixes: 7ab3962d2d2b ("net/mlx5: add indirect HW steering action")
Cc: sta...@dpdk.org

Signed-off-by: Maayan Kashani
Acked-by: Suanming Mou

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
Re: [PATCH] doc: add firmware update links in mlx5 guide
Hi, From: Thomas Monjalon Sent: Thursday, May 23, 2024 2:28 PM To: dev@dpdk.org Cc: Dariusz Sosnowski; Slava Ovsiienko; Ori Kam; Suanming Mou; Matan Azrad Subject: [PATCH] doc: add firmware update links in mlx5 guide If using upstream kernel and libraries, there was no direct link to download the firmware and update tool. Firmware update explanations are reorganized and include all links. Signed-off-by: Thomas Monjalon Patch applied to next-net-mlx, Kindest regards, Raslan Darawsheh
RE: [PATCH 1/2] eal: provide macro for GCC builtin constant intrinsic
PING for Review/ACK. Come on fellow reviewers, it's only 5 lines of code! The mempool library cannot build with MSVC without this patch series. Other patches are also being held back, waiting for this MSVC compatible DPDK macro for __builtin_constant_p(). The macro for MSVC can be improved as suggested by Stephen later. > From: Morten Brørup [mailto:m...@smartsharesystems.com] > Sent: Monday, 1 April 2024 10.35 > > > From: Stephen Hemminger [mailto:step...@networkplumber.org] > > Sent: Monday, 1 April 2024 00.03 > > > > On Wed, 20 Mar 2024 14:33:35 -0700 > > Tyler Retzlaff wrote: > > > > > +#ifdef RTE_TOOLCHAIN_MSVC > > > +#define __rte_constant(e) 0 > > > +#else > > > +#define __rte_constant(e) __extension__(__builtin_constant_p(e)) > > > +#endif > > > + > > > > > > I did some looking around and some other project have macros > > for expressing constant expression vs constant. > > > > Implementing this with some form of sizeof math is possible. > > For example in linux/compiler.h > > > > /* > > * This returns a constant expression while determining if an argument > > is > > * a constant expression, most importantly without evaluating the > > argument. > > * Glory to Martin Uecker > > * > > * Details: > > * - sizeof() return an integer constant expression, and does not > > evaluate > > * the value of its operand; it only examines the type of its operand. > > * - The results of comparing two integer constant expressions is also > > * an integer constant expression. > > * - The first literal "8" isn't important. It could be any literal > > value. > > * - The second literal "8" is to avoid warnings about unaligned > > pointers; > > * this could otherwise just be "1". > > * - (long)(x) is used to avoid warnings about 64-bit types on 32-bit > > * architectures. > > * - The C Standard defines "null pointer constant", "(void *)0", as > > * distinct from other void pointers. > > * - If (x) is an integer constant expression, then the "* 0l" resolves > > * it into an integer constant expression of value 0. Since it is cast > > to > > * "void *", this makes the second operand a null pointer constant. > > * - If (x) is not an integer constant expression, then the second > > operand > > * resolves to a void pointer (but not a null pointer constant: the > > value > > * is not an integer constant 0). > > * - The conditional operator's third operand, "(int *)8", is an object > > * pointer (to type "int"). > > * - The behavior (including the return type) of the conditional > > operator > > * ("operand1 ? operand2 : operand3") depends on the kind of > > expressions > > * given for the second and third operands. This is the central > > mechanism > > * of the macro: > > * - When one operand is a null pointer constant (i.e. when x is an > > integer > > * constant expression) and the other is an object pointer (i.e. our > > * third operand), the conditional operator returns the type of the > > * object pointer operand (i.e. "int *). Here, within the sizeof(), > > we > > * would then get: > > * sizeof(*((int *)(...)) == sizeof(int) == 4 > > * - When one operand is a void pointer (i.e. when x is not an integer > > * constant expression) and the other is an object pointer (i.e. our > > * third operand), the conditional operator returns a "void *" type. 
> > * Here, within the sizeof(), we would then get: > > * sizeof(*((void *)(...)) == sizeof(void) == 1 > > * - The equality comparison to "sizeof(int)" therefore depends on (x): > > * sizeof(int) == sizeof(int) (x) was a constant expression > > * sizeof(int) != sizeof(void)(x) was not a constant expression > > */ > > #define __is_constexpr(x) \ > > (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int > > *)8))) > > Nice! > If the author is willing to license it under the BSD license, we can copy it > as is. > > We might want to add a couple of build time checks to verify that it does what > is expected; to catch any changes in compiler behavior.
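A small compile-time demo of the quoted macro may help; this is a sketch assuming C11 _Static_assert and the GNU sizeof(void) extension the macro itself relies on, and it only uses the macro exactly as quoted above.

/* Copied from the linux/compiler.h excerpt quoted above. */
#define __is_constexpr(x) \
	(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))

static int
is_constexpr_demo(int runtime_value)
{
	/* Integer literals and arithmetic on them are constant expressions,
	 * so the macro yields 1 and can feed a static assertion.
	 */
	_Static_assert(__is_constexpr(8), "a literal is a constant expression");
	_Static_assert(__is_constexpr(4 * 2 + 1), "so is constant arithmetic");

	/* For a runtime value the macro yields 0, and because it is built
	 * purely from sizeof(), runtime_value itself is never evaluated.
	 */
	return __is_constexpr(runtime_value); /* always 0 here */
}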
Re: [PATCH 1/2] eal: provide macro for GCC builtin constant intrinsic
On Wed, Mar 20, 2024 at 02:33:35PM -0700, Tyler Retzlaff wrote: > MSVC does not have a __builtin_constant_p intrinsic so provide > __rte_constant(e) that expands false for MSVC and to the intrinsic for > GCC. > > Signed-off-by: Tyler Retzlaff > --- > lib/eal/include/rte_common.h | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h > index 298a5c6..d520be6 100644 > --- a/lib/eal/include/rte_common.h > +++ b/lib/eal/include/rte_common.h > @@ -44,6 +44,12 @@ > #endif > #endif > > +#ifdef RTE_TOOLCHAIN_MSVC > +#define __rte_constant(e) 0 > +#else > +#define __rte_constant(e) __extension__(__builtin_constant_p(e)) > +#endif > + Acked-by: Bruce Richardson
RE: [PATCH v12 1/2] mempool cache: add zero-copy get and put functions
> From: Dharmik Thakkar [mailto:dharmikjayesh.thak...@arm.com] > Sent: Friday, 21 July 2023 18.29 > > From: Morten Brørup > > Zero-copy access to mempool caches is beneficial for PMD performance. > Furthermore, having a zero-copy mempool API is considered a precondition > for fixing a certain category of bugs, present in some PMDs: For > performance reasons, some PMDs had bypassed the mempool API in order to > achieve zero-copy access to the mempool cache. This can only be fixed > in those PMDs without a performance regression if the mempool library > offers zero-copy access APIs, so the PMDs can use the proper mempool > API instead of copy-pasting code from the mempool library. > Furthermore, the copy-pasted code in those PMDs has not been kept up to > date with the improvements of the mempool library, so when they bypass > the mempool API, mempool trace is missing and mempool statistics is not > updated. > > Bugzilla ID: 1052 > > Signed-off-by: Morten Brørup > Signed-off-by: Kamalakshitha Aligeri > Signed-off-by: Dharmik Thakkar > Reviewed-by: Ruifeng Wang > Acked-by: Konstantin Ananyev > Acked-by: Chengwen Feng > > --- Patchwork shows this series as failing: https://patchwork.dpdk.org/project/dpdk/list/?series=29003 Please fix and resubmit the series as v13, so we can get it into DPDK.
[PATCH v2] dma/cnxk: add higher chunk size support
From: Pavan Nikhilesh Add support to configure higher chunk size by using the new OPEN_V2 mailbox, this improves performance as the number of mempool allocs are reduced. Add timeout when polling for queue idle timeout. Signed-off-by: Pavan Nikhilesh Signed-off-by: Amit Prakash Shukla --- v2 Changes: - Update release notes. - Use timeout when polling for queue idle state. doc/guides/rel_notes/release_24_07.rst | 6 +++ drivers/common/cnxk/roc_dpi.c | 72 ++ drivers/common/cnxk/roc_dpi.h | 3 ++ drivers/common/cnxk/roc_dpi_priv.h | 3 ++ drivers/common/cnxk/version.map| 2 + drivers/dma/cnxk/cnxk_dmadev.c | 37 - drivers/dma/cnxk/cnxk_dmadev.h | 1 + 7 files changed, 101 insertions(+), 23 deletions(-) diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst index a69f24cf99..60b92e4842 100644 --- a/doc/guides/rel_notes/release_24_07.rst +++ b/doc/guides/rel_notes/release_24_07.rst @@ -55,6 +55,12 @@ New Features Also, make sure to start the actual text at the margin. === +* **Updated Marvell CNXK DMA driver.** + + * Updated DMA driver internal pool to use higher chunk size, effectively +reducing the number of mempool allocs needed, thereby increasing DMA +performance. + Removed Items - diff --git a/drivers/common/cnxk/roc_dpi.c b/drivers/common/cnxk/roc_dpi.c index 1ee777d779..892685d185 100644 --- a/drivers/common/cnxk/roc_dpi.c +++ b/drivers/common/cnxk/roc_dpi.c @@ -38,6 +38,24 @@ send_msg_to_pf(struct plt_pci_addr *pci_addr, const char *value, int size) return 0; } +int +roc_dpi_wait_queue_idle(struct roc_dpi *roc_dpi) +{ + const uint64_t cyc = (DPI_QUEUE_IDLE_TMO_MS * plt_tsc_hz()) / 1E3; + const uint64_t start = plt_tsc_cycles(); + uint64_t reg; + + /* Wait for SADDR to become idle */ + reg = plt_read64(roc_dpi->rbase + DPI_VDMA_SADDR); + while (!(reg & BIT_ULL(63))) { + reg = plt_read64(roc_dpi->rbase + DPI_VDMA_SADDR); + if (plt_tsc_cycles() - start == cyc) + return -ETIMEDOUT; + } + + return 0; +} + int roc_dpi_enable(struct roc_dpi *dpi) { @@ -57,7 +75,6 @@ roc_dpi_configure(struct roc_dpi *roc_dpi, uint32_t chunk_sz, uint64_t aura, uin { struct plt_pci_device *pci_dev; dpi_mbox_msg_t mbox_msg; - uint64_t reg; int rc; if (!roc_dpi) { @@ -68,9 +85,9 @@ roc_dpi_configure(struct roc_dpi *roc_dpi, uint32_t chunk_sz, uint64_t aura, uin pci_dev = roc_dpi->pci_dev; roc_dpi_disable(roc_dpi); - reg = plt_read64(roc_dpi->rbase + DPI_VDMA_SADDR); - while (!(reg & BIT_ULL(63))) - reg = plt_read64(roc_dpi->rbase + DPI_VDMA_SADDR); + rc = roc_dpi_wait_queue_idle(roc_dpi); + if (rc) + return rc; plt_write64(0x0, roc_dpi->rbase + DPI_VDMA_REQQ_CTL); plt_write64(chunk_base, roc_dpi->rbase + DPI_VDMA_SADDR); @@ -87,6 +104,45 @@ roc_dpi_configure(struct roc_dpi *roc_dpi, uint32_t chunk_sz, uint64_t aura, uin if (mbox_msg.s.wqecsoff) mbox_msg.s.wqecs = 1; + rc = send_msg_to_pf(&pci_dev->addr, (const char *)&mbox_msg, sizeof(dpi_mbox_msg_t)); + if (rc < 0) + plt_err("Failed to send mbox message %d to DPI PF, err %d", mbox_msg.s.cmd, rc); + + return rc; +} + +int +roc_dpi_configure_v2(struct roc_dpi *roc_dpi, uint32_t chunk_sz, uint64_t aura, uint64_t chunk_base) +{ + struct plt_pci_device *pci_dev; + dpi_mbox_msg_t mbox_msg; + int rc; + + if (!roc_dpi) { + plt_err("roc_dpi is NULL"); + return -EINVAL; + } + + pci_dev = roc_dpi->pci_dev; + + roc_dpi_disable(roc_dpi); + + rc = roc_dpi_wait_queue_idle(roc_dpi); + if (rc) + return rc; + + plt_write64(0x0, roc_dpi->rbase + DPI_VDMA_REQQ_CTL); + plt_write64(chunk_base, roc_dpi->rbase + DPI_VDMA_SADDR); + mbox_msg.u[0] = 0; + 
mbox_msg.u[1] = 0; + /* DPI PF driver expects vfid starts from index 0 */ + mbox_msg.s.vfid = roc_dpi->vfid; + mbox_msg.s.cmd = DPI_QUEUE_OPEN_V2; + mbox_msg.s.csize = chunk_sz / 8; + mbox_msg.s.aura = aura; + mbox_msg.s.sso_pf_func = idev_sso_pffunc_get(); + mbox_msg.s.npa_pf_func = idev_npa_pffunc_get(); + rc = send_msg_to_pf(&pci_dev->addr, (const char *)&mbox_msg, sizeof(dpi_mbox_msg_t)); if (rc < 0) @@ -116,13 +172,11 @@ roc_dpi_dev_fini(struct roc_dpi *roc_dpi) { struct plt_pci_device *pci_dev = roc_dpi->pci_dev; dpi_mbox_msg_t mbox_msg; - uint64_t reg; int rc; - /* Wait for SADDR to become idle */ - reg = plt_read64(roc_dpi->rbase + DPI_VDMA_SADDR); - while (!(reg & BIT_ULL(63))) -
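The idle-wait change above replaces an unbounded register poll with a TSC-bounded one. A generic sketch of the same pattern using the public cycle helpers (rte_get_tsc_cycles()/rte_get_tsc_hz()); the helper name and the check() callback are illustrative, not part of the patch.

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <rte_cycles.h>

/* Poll a condition until it holds or timeout_ms elapses. check() and its
 * argument stand in for e.g. reading a device status register.
 */
static int
poll_with_timeout(bool (*check)(void *), void *arg, unsigned int timeout_ms)
{
	const uint64_t cycles = (timeout_ms * rte_get_tsc_hz()) / 1000;
	const uint64_t start = rte_get_tsc_cycles();

	while (!check(arg)) {
		if (rte_get_tsc_cycles() - start > cycles)
			return -ETIMEDOUT;
	}

	return 0;
}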
FW: [PATCH] mempool: dump includes list of memory chunks
PING for review. @Paul, @Du and @Ferruh, if you think the information provided by this patch would have been useful for your recent work with the mempool, please Review or ACK it. -Morten From: Morten Brørup [mailto:m...@smartsharesystems.com] Sent: Thursday, 16 May 2024 11.00 Added information about the memory chunks holding the objects in the mempool when dumping the status of the mempool to a file. Signed-off-by: Morten Brørup --- lib/mempool/rte_mempool.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c index 12390a2c81..e9a8a5b411 100644 --- a/lib/mempool/rte_mempool.c +++ b/lib/mempool/rte_mempool.c @@ -1230,6 +1230,7 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp) #endif struct rte_mempool_memhdr *memhdr; struct rte_mempool_ops *ops; + unsigned int n; unsigned common_count; unsigned cache_count; size_t mem_len = 0; @@ -1264,6 +1265,15 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp) (long double)mem_len / mp->size); } + fprintf(f, " mem_list:\n"); + n = 0; + STAILQ_FOREACH(memhdr, &mp->mem_list, next) { + fprintf(f, "addr[%u]=%p\n", n, memhdr->addr); + fprintf(f, "iova[%u]=0x%" PRIx64 "\n", n, memhdr->iova); + fprintf(f, "len[%u]=%zu\n", n, memhdr->len); + n++; + } + cache_count = rte_mempool_dump_cache(f, mp); common_count = rte_mempool_ops_get_count(mp); if ((cache_count + common_count) > mp->size) -- 2.17.1
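For reference, a minimal sketch of where the new mem_list output would show up; the pool name and sizing are arbitrary example values and this snippet is not part of the patch.

#include <stdio.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

static void
dump_pool_chunks(void)
{
	struct rte_mempool *mp;

	/* "test_pool" and the sizing below are example values only. */
	mp = rte_pktmbuf_pool_create("test_pool", 4096, 256, 0,
				     RTE_MBUF_DEFAULT_BUF_SIZE, SOCKET_ID_ANY);
	if (mp == NULL)
		return;

	/* With the patch applied, the dump also lists each memory chunk's
	 * addr, iova and len under "mem_list:".
	 */
	rte_mempool_dump(stdout, mp);

	rte_mempool_free(mp);
}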
More reviewing, please
Dear DPDK community, Many non-PMD patches don't get sufficient reviews/acks, and get stuck in limbo. At a recent DPDK tech board meeting, it was mentioned that: "On the Linux Kernel mailing list, patches are met with discussion, on the DPDK mailing list, patches are met with silence." We need to improve this situation. Please, let's all actively encourage our colleagues to review more patches, and encourage managers to approve spending more time on reviewing. -Morten
[PATCH v4] eal/x86: improve rte_memcpy const size 16 performance
When the rte_memcpy() size is 16, the same 16 bytes are copied twice. In the case where the size is known to be 16 at build tine, omit the duplicate copy. Reduced the amount of effectively copy-pasted code by using #ifdef inside functions instead of outside functions. Suggested-by: Stephen Hemminger Signed-off-by: Morten Brørup Acked-by: Bruce Richardson --- v4: * There are no problems compiling AVX2, only AVX. (Bruce Richardson) v3: * AVX2 is a superset of AVX; for a block of AVX code, testing for AVX suffices. (Bruce Richardson) * Define RTE_MEMCPY_AVX if AVX is available, to avoid copy-pasting the check for older GCC version. (Bruce Richardson) v2: * For GCC, version 11 is required for proper AVX handling; if older GCC version, treat AVX as SSE. Clang does not have this issue. Note: Original code always treated AVX as SSE, regardless of compiler. * Do not add copyright. (Stephen Hemminger) --- lib/eal/x86/include/rte_memcpy.h | 239 +-- 1 file changed, 64 insertions(+), 175 deletions(-) diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h index 72a92290e0..d687aa7756 100644 --- a/lib/eal/x86/include/rte_memcpy.h +++ b/lib/eal/x86/include/rte_memcpy.h @@ -27,6 +27,16 @@ extern "C" { #pragma GCC diagnostic ignored "-Wstringop-overflow" #endif +/* + * GCC older than version 11 doesn't compile AVX properly, so use SSE instead. + * There are no problems with AVX2. + */ +#if defined __AVX2__ +#define RTE_MEMCPY_AVX +#elif defined __AVX__ && !(defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 11)) +#define RTE_MEMCPY_AVX +#endif + /** * Copy bytes from one location to another. The locations must not overlap. * @@ -91,14 +101,6 @@ rte_mov15_or_less(void *dst, const void *src, size_t n) return ret; } -#if defined __AVX512F__ && defined RTE_MEMCPY_AVX512 - -#define ALIGNMENT_MASK 0x3F - -/** - * AVX512 implementation below - */ - /** * Copy 16 bytes from one location to another, * locations should not overlap. @@ -119,10 +121,15 @@ rte_mov16(uint8_t *dst, const uint8_t *src) static __rte_always_inline void rte_mov32(uint8_t *dst, const uint8_t *src) { +#if defined RTE_MEMCPY_AVX __m256i ymm0; ymm0 = _mm256_loadu_si256((const __m256i *)src); _mm256_storeu_si256((__m256i *)dst, ymm0); +#else /* SSE implementation */ + rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16); + rte_mov16((uint8_t *)dst + 1 * 16, (const uint8_t *)src + 1 * 16); +#endif } /** @@ -132,10 +139,15 @@ rte_mov32(uint8_t *dst, const uint8_t *src) static __rte_always_inline void rte_mov64(uint8_t *dst, const uint8_t *src) { +#if defined __AVX512F__ && defined RTE_MEMCPY_AVX512 __m512i zmm0; zmm0 = _mm512_loadu_si512((const void *)src); _mm512_storeu_si512((void *)dst, zmm0); +#else /* AVX2, AVX & SSE implementation */ + rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32); + rte_mov32((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32); +#endif } /** @@ -156,12 +168,18 @@ rte_mov128(uint8_t *dst, const uint8_t *src) static __rte_always_inline void rte_mov256(uint8_t *dst, const uint8_t *src) { - rte_mov64(dst + 0 * 64, src + 0 * 64); - rte_mov64(dst + 1 * 64, src + 1 * 64); - rte_mov64(dst + 2 * 64, src + 2 * 64); - rte_mov64(dst + 3 * 64, src + 3 * 64); + rte_mov128(dst + 0 * 128, src + 0 * 128); + rte_mov128(dst + 1 * 128, src + 1 * 128); } +#if defined __AVX512F__ && defined RTE_MEMCPY_AVX512 + +/** + * AVX512 implementation below + */ + +#define ALIGNMENT_MASK 0x3F + /** * Copy 128-byte blocks from one location to another, * locations should not overlap. 
@@ -231,12 +249,22 @@ rte_memcpy_generic(void *dst, const void *src, size_t n) /** * Fast way when copy size doesn't exceed 512 bytes */ + if (__builtin_constant_p(n) && n == 32) { + rte_mov32((uint8_t *)dst, (const uint8_t *)src); + return ret; + } if (n <= 32) { rte_mov16((uint8_t *)dst, (const uint8_t *)src); + if (__builtin_constant_p(n) && n == 16) + return ret; /* avoid (harmless) duplicate copy */ rte_mov16((uint8_t *)dst - 16 + n, (const uint8_t *)src - 16 + n); return ret; } + if (__builtin_constant_p(n) && n == 64) { + rte_mov64((uint8_t *)dst, (const uint8_t *)src); + return ret; + } if (n <= 64) { rte_mov32((uint8_t *)dst, (const uint8_t *)src); rte_mov32((uint8_t *)dst - 32 + n, @@ -313,80 +341,13 @@ rte_memcpy_generic(void *dst, const void *src, size_t n) goto COPY_BLOCK_128_BACK63; } -#elif defined __AVX2__ - -#define ALIGNMENT_MASK 0x1F - -/** - * AVX2 implementation below - */ - -/** - * Copy 16 bytes from one location to another
[PATCH v5] eal/x86: improve rte_memcpy const size 16 performance
When the rte_memcpy() size is 16, the same 16 bytes are copied twice. In the case where the size is known to be 16 at build tine, omit the duplicate copy. Reduced the amount of effectively copy-pasted code by using #ifdef inside functions instead of outside functions. Depends-on: series-31578 ("provide toolchain abstracted __builtin_constant_p") Suggested-by: Stephen Hemminger Signed-off-by: Morten Brørup Acked-by: Bruce Richardson --- v5: * Fix for building with MSVC: Use __rte_constant() instead of __builtin_constant_p(). Add dependency on patch providing __rte_constant(). v4: * There are no problems compiling AVX2, only AVX. (Bruce Richardson) v3: * AVX2 is a superset of AVX; for a block of AVX code, testing for AVX suffices. (Bruce Richardson) * Define RTE_MEMCPY_AVX if AVX is available, to avoid copy-pasting the check for older GCC version. (Bruce Richardson) v2: * For GCC, version 11 is required for proper AVX handling; if older GCC version, treat AVX as SSE. Clang does not have this issue. Note: Original code always treated AVX as SSE, regardless of compiler. * Do not add copyright. (Stephen Hemminger) --- lib/eal/x86/include/rte_memcpy.h | 239 +-- 1 file changed, 64 insertions(+), 175 deletions(-) diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h index 72a92290e0..1619a8f296 100644 --- a/lib/eal/x86/include/rte_memcpy.h +++ b/lib/eal/x86/include/rte_memcpy.h @@ -27,6 +27,16 @@ extern "C" { #pragma GCC diagnostic ignored "-Wstringop-overflow" #endif +/* + * GCC older than version 11 doesn't compile AVX properly, so use SSE instead. + * There are no problems with AVX2. + */ +#if defined __AVX2__ +#define RTE_MEMCPY_AVX +#elif defined __AVX__ && !(defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 11)) +#define RTE_MEMCPY_AVX +#endif + /** * Copy bytes from one location to another. The locations must not overlap. * @@ -91,14 +101,6 @@ rte_mov15_or_less(void *dst, const void *src, size_t n) return ret; } -#if defined __AVX512F__ && defined RTE_MEMCPY_AVX512 - -#define ALIGNMENT_MASK 0x3F - -/** - * AVX512 implementation below - */ - /** * Copy 16 bytes from one location to another, * locations should not overlap. 
@@ -119,10 +121,15 @@ rte_mov16(uint8_t *dst, const uint8_t *src) static __rte_always_inline void rte_mov32(uint8_t *dst, const uint8_t *src) { +#if defined RTE_MEMCPY_AVX __m256i ymm0; ymm0 = _mm256_loadu_si256((const __m256i *)src); _mm256_storeu_si256((__m256i *)dst, ymm0); +#else /* SSE implementation */ + rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16); + rte_mov16((uint8_t *)dst + 1 * 16, (const uint8_t *)src + 1 * 16); +#endif } /** @@ -132,10 +139,15 @@ rte_mov32(uint8_t *dst, const uint8_t *src) static __rte_always_inline void rte_mov64(uint8_t *dst, const uint8_t *src) { +#if defined __AVX512F__ && defined RTE_MEMCPY_AVX512 __m512i zmm0; zmm0 = _mm512_loadu_si512((const void *)src); _mm512_storeu_si512((void *)dst, zmm0); +#else /* AVX2, AVX & SSE implementation */ + rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32); + rte_mov32((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32); +#endif } /** @@ -156,12 +168,18 @@ rte_mov128(uint8_t *dst, const uint8_t *src) static __rte_always_inline void rte_mov256(uint8_t *dst, const uint8_t *src) { - rte_mov64(dst + 0 * 64, src + 0 * 64); - rte_mov64(dst + 1 * 64, src + 1 * 64); - rte_mov64(dst + 2 * 64, src + 2 * 64); - rte_mov64(dst + 3 * 64, src + 3 * 64); + rte_mov128(dst + 0 * 128, src + 0 * 128); + rte_mov128(dst + 1 * 128, src + 1 * 128); } +#if defined __AVX512F__ && defined RTE_MEMCPY_AVX512 + +/** + * AVX512 implementation below + */ + +#define ALIGNMENT_MASK 0x3F + /** * Copy 128-byte blocks from one location to another, * locations should not overlap. @@ -231,12 +249,22 @@ rte_memcpy_generic(void *dst, const void *src, size_t n) /** * Fast way when copy size doesn't exceed 512 bytes */ + if (__rte_constant(n) && n == 32) { + rte_mov32((uint8_t *)dst, (const uint8_t *)src); + return ret; + } if (n <= 32) { rte_mov16((uint8_t *)dst, (const uint8_t *)src); + if (__rte_constant(n) && n == 16) + return ret; /* avoid (harmless) duplicate copy */ rte_mov16((uint8_t *)dst - 16 + n, (const uint8_t *)src - 16 + n); return ret; } + if (__rte_constant(n) && n == 64) { + rte_mov64((uint8_t *)dst, (const uint8_t *)src); + return ret; + } if (n <= 64) { rte_mov32((uint8_t *)dst, (const uint8_t *)src); rte_mov32((uint8_t *)dst - 32 + n, @@ -313,80 +341,13 @@ rte_memcpy_generic(void *dst, const void
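To show what the const-size-16 change buys a caller: when the length is a compile-time constant 16, rte_memcpy() now collapses to a single 16-byte move instead of two overlapping ones. A minimal illustrative sketch (the struct and function are hypothetical):

#include <stdint.h>
#include <rte_memcpy.h>

struct flow_key {
	uint8_t bytes[16];
};

/* sizeof(src->bytes) is a compile-time constant, so with this patch the
 * __rte_constant()/__builtin_constant_p() check lets rte_memcpy() take the
 * single rte_mov16() path instead of issuing the same 16-byte copy twice
 * (once from the start of the buffer, once from the end).
 */
static inline void
copy_flow_key(struct flow_key *dst, const struct flow_key *src)
{
	rte_memcpy(dst->bytes, src->bytes, sizeof(src->bytes));
}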
RE: [PATCH v5] eal/x86: improve rte_memcpy const size 16 performance
Recheck-request: iol-testing
RE: [PATCH v2] doc: add dma perf feature details
Hi, Gentle Ping. Thanks, Amit Shukla > -Original Message- > From: Amit Prakash Shukla > Sent: Tuesday, March 19, 2024 12:16 AM > To: Cheng Jiang ; Chengwen Feng > > Cc: dev@dpdk.org; Jerin Jacob ; Vamsi Krishna Attunuru > ; Anoob Joseph ; > Gowrishankar Muthukrishnan ; Amit Prakash > Shukla > Subject: [PATCH v2] doc: add dma perf feature details > > Update dma perf test document with below support features: > 1. Memory-to-device and device-to-memory copy. > 2. Skip support. > 3. Scatter-gather support. > > Signed-off-by: Amit Prakash Shukla > --- > v2: > - Rebased the patch. > > doc/guides/tools/dmaperf.rst | 89 ++- > - > 1 file changed, 64 insertions(+), 25 deletions(-) >
Re: [PATCH] common/cnxk: restore segregation of logs based on module
On Tue, Apr 23, 2024 at 4:15 PM Anoob Joseph wrote: > > Originally the logs were segregated under various labels which could be > selectively enabled. It was changed to use 'pmd.common.cnxk' while > changing the macro used for registering logging. Address the same by > restoring the segregation. > > Current logs: > ... > logtype3 > pmd.common.cnxk > pmd.common.iavf > ... > > Changed to: > ... > logtype3 > pmd.common.cnxk.base > pmd.common.cnxk.crypto > pmd.common.cnxk.dpi > pmd.common.cnxk.esw > pmd.common.cnxk.event > pmd.common.cnxk.flow > pmd.common.cnxk.mbox > pmd.common.cnxk.mempool > pmd.common.cnxk.ml > pmd.common.cnxk.nix > pmd.common.cnxk.ree > pmd.common.cnxk.rep > pmd.common.cnxk.timer > pmd.common.cnxk.tm > pmd.common.iavf > ... > > Updated documentation also to reflect the same. > > Fixes: 233692f550a1 ("dma/cnxk: rework DMA driver") > > Signed-off-by: Anoob Joseph Applied to dpdk-next-net-mrvl/for-main. Thanks
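The per-module log types listed above come from registering each component with its own suffix. A sketch of how such registration typically looks with the EAL log API; the macro usage is assumed from rte_log.h and the variable/suffix names here are illustrative, not the actual cnxk ones.

#include <rte_log.h>

/* With RTE_LOG_REGISTER_SUFFIX the log type name becomes
 * "<component default logtype>.<suffix>", e.g. "pmd.common.cnxk.nix"
 * when the component's default logtype is "pmd.common.cnxk".
 */
RTE_LOG_REGISTER_SUFFIX(example_logtype_nix, nix, NOTICE);

#define example_nix_dbg(fmt, ...) \
	rte_log(RTE_LOG_DEBUG, example_logtype_nix, "nix: " fmt "\n", ##__VA_ARGS__)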
[PATCH v4 0/7] Add ODM DMA device
Add Odyssey ODM DMA device. This PMD abstracts the ODM hardware unit on the Odyssey SoC, which can perform mem-to-mem copies. The hardware unit can support up to 32 queues (vchans) and 16 VFs. It supports a 'fill' operation with specific values, and an SG mode of operation with up to 4 source pointers and 4 destination pointers. The PMD is tested with both unit tests and performance applications.

Changes in v4
- Added release notes
- Addressed review comments from Jerin

Changes in v3
- Addressed build failure with stdatomic stage in CI

Changes in v2
- Addressed build failure in CI
- Moved update to usertools as separate patch

Anoob Joseph (2):
  dma/odm: add framework for ODM DMA device
  dma/odm: add hardware defines

Gowrishankar Muthukrishnan (3):
  dma/odm: add dev init and fini
  dma/odm: add device ops
  dma/odm: add stats

Vidya Sagar Velumuri (2):
  dma/odm: add copy and copy sg ops
  dma/odm: add remaining ops

 MAINTAINERS                            |   7 +
 doc/guides/dmadevs/index.rst           |   1 +
 doc/guides/dmadevs/odm.rst             |  92
 doc/guides/rel_notes/release_24_07.rst |   4 +
 drivers/dma/meson.build                |   1 +
 drivers/dma/odm/meson.build            |  14 +
 drivers/dma/odm/odm.c                  | 237
 drivers/dma/odm/odm.h                  | 203 +++
 drivers/dma/odm/odm_dmadev.c           | 717 +
 drivers/dma/odm/odm_priv.h             |  49 ++
 10 files changed, 1325 insertions(+)
 create mode 100644 doc/guides/dmadevs/odm.rst
 create mode 100644 drivers/dma/odm/meson.build
 create mode 100644 drivers/dma/odm/odm.c
 create mode 100644 drivers/dma/odm/odm.h
 create mode 100644 drivers/dma/odm/odm_dmadev.c
 create mode 100644 drivers/dma/odm/odm_priv.h

--
2.45.1
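For context on how such a dmadev PMD is driven once probed, a minimal application-level sketch using the generic dmadev API (not part of this series): enqueue one copy, submit it via the flag, then poll for completion. The device id, vchan and IOVA addresses are placeholders.

#include <stdbool.h>
#include <stdint.h>
#include <rte_dmadev.h>

/* Submit one mem-to-mem copy on dev_id/vchan and busy-wait for it.
 * src/dst must be IOVA addresses the device can reach (e.g. rte_malloc
 * memory in IOVA-as-VA mode).
 */
static int
do_one_copy(int16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
	    uint32_t len)
{
	uint16_t last_idx;
	bool has_error;
	int idx;

	idx = rte_dma_copy(dev_id, vchan, src, dst, len, RTE_DMA_OP_FLAG_SUBMIT);
	if (idx < 0)
		return idx;

	/* Poll until the single outstanding copy completes. */
	while (rte_dma_completed(dev_id, vchan, 1, &last_idx, &has_error) == 0)
		;

	return has_error ? -1 : 0;
}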
[PATCH v4 1/7] dma/odm: add framework for ODM DMA device
Add framework for Odyssey ODM DMA device. Signed-off-by: Anoob Joseph Signed-off-by: Gowrishankar Muthukrishnan Signed-off-by: Vidya Sagar Velumuri --- MAINTAINERS | 6 +++ drivers/dma/meson.build | 1 + drivers/dma/odm/meson.build | 14 +++ drivers/dma/odm/odm.h| 29 ++ drivers/dma/odm/odm_dmadev.c | 74 5 files changed, 124 insertions(+) create mode 100644 drivers/dma/odm/meson.build create mode 100644 drivers/dma/odm/odm.h create mode 100644 drivers/dma/odm/odm_dmadev.c diff --git a/MAINTAINERS b/MAINTAINERS index c9adff9846..b581207a9a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1269,6 +1269,12 @@ T: git://dpdk.org/next/dpdk-next-net-mrvl F: drivers/dma/cnxk/ F: doc/guides/dmadevs/cnxk.rst +Marvell Odyssey ODM DMA +M: Gowrishankar Muthukrishnan +M: Vidya Sagar Velumuri +T: git://dpdk.org/next/dpdk-next-net-mrvl +F: drivers/dma/odm/ + NXP DPAA DMA M: Gagandeep Singh M: Sachin Saxena diff --git a/drivers/dma/meson.build b/drivers/dma/meson.build index 582654ea1b..358132759a 100644 --- a/drivers/dma/meson.build +++ b/drivers/dma/meson.build @@ -8,6 +8,7 @@ drivers = [ 'hisilicon', 'idxd', 'ioat', +'odm', 'skeleton', ] std_deps = ['dmadev'] diff --git a/drivers/dma/odm/meson.build b/drivers/dma/odm/meson.build new file mode 100644 index 00..227b10c890 --- /dev/null +++ b/drivers/dma/odm/meson.build @@ -0,0 +1,14 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(C) 2024 Marvell. + +if not is_linux or not dpdk_conf.get('RTE_ARCH_64') +build = false +reason = 'only supported on 64-bit Linux' +subdir_done() +endif + +deps += ['bus_pci', 'dmadev', 'eal', 'mempool', 'pci'] + +sources = files('odm_dmadev.c') + +pmd_supports_disable_iova_as_pa = true diff --git a/drivers/dma/odm/odm.h b/drivers/dma/odm/odm.h new file mode 100644 index 00..aeeb6f9e9a --- /dev/null +++ b/drivers/dma/odm/odm.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2024 Marvell. + */ + +#ifndef _ODM_H_ +#define _ODM_H_ + +#include + +extern int odm_logtype; + +#define odm_err(...) \ + rte_log(RTE_LOG_ERR, odm_logtype, \ + RTE_FMT("%s(): %u" RTE_FMT_HEAD(__VA_ARGS__, ), __func__, __LINE__,\ + RTE_FMT_TAIL(__VA_ARGS__, ))) +#define odm_info(...) \ + rte_log(RTE_LOG_INFO, odm_logtype, \ + RTE_FMT("%s(): %u" RTE_FMT_HEAD(__VA_ARGS__, ), __func__, __LINE__,\ + RTE_FMT_TAIL(__VA_ARGS__, ))) + +struct __rte_cache_aligned odm_dev { + struct rte_pci_device *pci_dev; + uint8_t *rbase; + uint16_t vfid; + uint8_t max_qs; + uint8_t num_qs; +}; + +#endif /* _ODM_H_ */ diff --git a/drivers/dma/odm/odm_dmadev.c b/drivers/dma/odm/odm_dmadev.c new file mode 100644 index 00..cc3342cf7b --- /dev/null +++ b/drivers/dma/odm/odm_dmadev.c @@ -0,0 +1,74 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2024 Marvell. 
+ */ + +#include + +#include +#include +#include +#include +#include +#include + +#include "odm.h" + +#define PCI_VENDOR_ID_CAVIUM0x177D +#define PCI_DEVID_ODYSSEY_ODM_VF 0xA08C +#define PCI_DRIVER_NAME dma_odm + +static int +odm_dmadev_probe(struct rte_pci_driver *pci_drv __rte_unused, struct rte_pci_device *pci_dev) +{ + char name[RTE_DEV_NAME_MAX_LEN]; + struct odm_dev *odm = NULL; + struct rte_dma_dev *dmadev; + + if (!pci_dev->mem_resource[0].addr) + return -ENODEV; + + memset(name, 0, sizeof(name)); + rte_pci_device_name(&pci_dev->addr, name, sizeof(name)); + + dmadev = rte_dma_pmd_allocate(name, pci_dev->device.numa_node, sizeof(*odm)); + if (dmadev == NULL) { + odm_err("DMA device allocation failed for %s", name); + return -ENOMEM; + } + + odm_info("DMA device %s probed", name); + + return 0; +} + +static int +odm_dmadev_remove(struct rte_pci_device *pci_dev) +{ + char name[RTE_DEV_NAME_MAX_LEN]; + + memset(name, 0, sizeof(name)); + rte_pci_device_name(&pci_dev->addr, name, sizeof(name)); + + return rte_dma_pmd_release(name); +} + +static const struct rte_pci_id odm_dma_pci_map[] = { + { + RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_ODYSSEY_ODM_VF) + }, + { + .vendor_id = 0, + }, +}; + +static struct rte_pci_driver odm_dmadev = { + .id_table = odm_dma_pci_map, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING, + .probe = odm_dmadev_probe, + .remove = odm_dmadev_remove, +}; +
[PATCH v4 2/7] dma/odm: add hardware defines
Add ODM registers and structures. Add mailbox structs as well. Signed-off-by: Anoob Joseph Signed-off-by: Gowrishankar Muthukrishnan Signed-off-by: Vidya Sagar Velumuri --- drivers/dma/odm/odm.h | 106 + drivers/dma/odm/odm_priv.h | 49 + 2 files changed, 155 insertions(+) create mode 100644 drivers/dma/odm/odm_priv.h diff --git a/drivers/dma/odm/odm.h b/drivers/dma/odm/odm.h index aeeb6f9e9a..8cc3e0de44 100644 --- a/drivers/dma/odm/odm.h +++ b/drivers/dma/odm/odm.h @@ -9,6 +9,46 @@ extern int odm_logtype; +/* ODM VF register offsets from VF_BAR0 */ +#define ODM_VDMA_EN(x) (0x00 | (x << 3)) +#define ODM_VDMA_REQQ_CTL(x) (0x80 | (x << 3)) +#define ODM_VDMA_DBELL(x) (0x100 | (x << 3)) +#define ODM_VDMA_RING_CFG(x) (0x180 | (x << 3)) +#define ODM_VDMA_IRING_BADDR(x) (0x200 | (x << 3)) +#define ODM_VDMA_CRING_BADDR(x) (0x280 | (x << 3)) +#define ODM_VDMA_COUNTS(x) (0x300 | (x << 3)) +#define ODM_VDMA_IRING_NADDR(x) (0x380 | (x << 3)) +#define ODM_VDMA_CRING_NADDR(x) (0x400 | (x << 3)) +#define ODM_VDMA_IRING_DBG(x) (0x480 | (x << 3)) +#define ODM_VDMA_CNT(x)(0x580 | (x << 3)) +#define ODM_VF_INT (0x1000) +#define ODM_VF_INT_W1S (0x1008) +#define ODM_VF_INT_ENA_W1C (0x1010) +#define ODM_VF_INT_ENA_W1S (0x1018) +#define ODM_MBOX_VF_PF_DATA(i) (0x2000 | (i << 3)) +#define ODM_MBOX_RETRY_CNT (0xfff) +#define ODM_MBOX_ERR_CODE_MAX (0xf) +#define ODM_IRING_IDLE_WAIT_CNT (0xfff) + +/* + * Enumeration odm_hdr_xtype_e + * + * ODM Transfer Type Enumeration + * Enumerates the pointer type in ODM_DMA_INSTR_HDR_S[XTYPE] + */ +#define ODM_XTYPE_INTERNAL 2 +#define ODM_XTYPE_FILL0 4 +#define ODM_XTYPE_FILL1 5 + +/* + * ODM Header completion type enumeration + * Enumerates the completion type in ODM_DMA_INSTR_HDR_S[CT] + */ +#define ODM_HDR_CT_CW_CA 0x0 +#define ODM_HDR_CT_CW_NC 0x1 + +#define ODM_MAX_QUEUES_PER_DEV 16 + #define odm_err(...) 
\ rte_log(RTE_LOG_ERR, odm_logtype, \ RTE_FMT("%s(): %u" RTE_FMT_HEAD(__VA_ARGS__, ), __func__, __LINE__,\ @@ -18,6 +58,72 @@ extern int odm_logtype; RTE_FMT("%s(): %u" RTE_FMT_HEAD(__VA_ARGS__, ), __func__, __LINE__,\ RTE_FMT_TAIL(__VA_ARGS__, ))) +/* + * Structure odm_instr_hdr_s for ODM + * + * ODM DMA Instruction Header Format + */ +union odm_instr_hdr_s { + uint64_t u; + struct odm_instr_hdr { + uint64_t nfst : 3; + uint64_t reserved_3 : 1; + uint64_t nlst : 3; + uint64_t reserved_7_9 : 3; + uint64_t ct : 2; + uint64_t stse : 1; + uint64_t reserved_13_28 : 16; + uint64_t sts : 1; + uint64_t reserved_30_49 : 20; + uint64_t xtype : 3; + uint64_t reserved_53_63 : 11; + } s; +}; + +/* ODM Completion Entry Structure */ +union odm_cmpl_ent_s { + uint32_t u; + struct odm_cmpl_ent { + uint32_t cmp_code : 8; + uint32_t rsvd : 23; + uint32_t valid : 1; + } s; +}; + +/* ODM DMA Ring Configuration Register */ +union odm_vdma_ring_cfg_s { + uint64_t u; + struct { + uint64_t isize : 8; + uint64_t rsvd_8_15 : 8; + uint64_t csize : 8; + uint64_t rsvd_24_63 : 40; + } s; +}; + +/* ODM DMA Instruction Ring DBG */ +union odm_vdma_iring_dbg_s { + uint64_t u; + struct { + uint64_t dbell_cnt : 32; + uint64_t offset : 16; + uint64_t rsvd_48_62 : 15; + uint64_t iwbusy : 1; + } s; +}; + +/* ODM DMA Counts */ +union odm_vdma_counts_s { + uint64_t u; + struct { + uint64_t dbell : 32; + uint64_t buf_used_cnt : 9; + uint64_t rsvd_41_43 : 3; + uint64_t rsvd_buf_used_cnt : 3; + uint64_t rsvd_47_63 : 17; + } s; +}; + struct __rte_cache_aligned odm_dev { struct rte_pci_device *pci_dev; uint8_t *rbase; diff --git a/drivers/dma/odm/odm_priv.h b/drivers/dma/odm/odm_priv.h new file mode 100644 index 00..1878f4d9a6 --- /dev/null +++ b/drivers/dma/odm/odm_priv.h @@ -0,0 +1,49 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2024 Marvell. + */ + +#ifndef _ODM_PRIV_H_ +#define _ODM_PRIV_H_ + +#define ODM_MAX_VFS16 +#define ODM_MAX_QUEUES 32 + +#define ODM_CMD_QUEUE_SIZE 4096 + +#define ODM_DEV_INIT 0x1 +#define ODM_DEV_CLOSE 0x2 +#define ODM_QUEUE_OPEN 0x3 +#define ODM_QUEUE_CLOSE 0x4 +#define ODM_REG_DUMP 0x5 + +struct odm_mbox_dev_msg { + /* Response code */ + uint64_t rsp : 8; + /* Number of VFs */
[PATCH v4 3/7] dma/odm: add dev init and fini
From: Gowrishankar Muthukrishnan Add ODM device init and fini. Signed-off-by: Anoob Joseph Signed-off-by: Gowrishankar Muthukrishnan Signed-off-by: Vidya Sagar Velumuri --- drivers/dma/odm/meson.build | 2 +- drivers/dma/odm/odm.c| 97 drivers/dma/odm/odm.h| 10 drivers/dma/odm/odm_dmadev.c | 13 + 4 files changed, 121 insertions(+), 1 deletion(-) create mode 100644 drivers/dma/odm/odm.c diff --git a/drivers/dma/odm/meson.build b/drivers/dma/odm/meson.build index 227b10c890..d597762d37 100644 --- a/drivers/dma/odm/meson.build +++ b/drivers/dma/odm/meson.build @@ -9,6 +9,6 @@ endif deps += ['bus_pci', 'dmadev', 'eal', 'mempool', 'pci'] -sources = files('odm_dmadev.c') +sources = files('odm_dmadev.c', 'odm.c') pmd_supports_disable_iova_as_pa = true diff --git a/drivers/dma/odm/odm.c b/drivers/dma/odm/odm.c new file mode 100644 index 00..c0963da451 --- /dev/null +++ b/drivers/dma/odm/odm.c @@ -0,0 +1,97 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2024 Marvell. + */ + +#include + +#include + +#include + +#include "odm.h" +#include "odm_priv.h" + +static void +odm_vchan_resc_free(struct odm_dev *odm, int qno) +{ + RTE_SET_USED(odm); + RTE_SET_USED(qno); +} + +static int +send_mbox_to_pf(struct odm_dev *odm, union odm_mbox_msg *msg, union odm_mbox_msg *rsp) +{ + int retry_cnt = ODM_MBOX_RETRY_CNT; + union odm_mbox_msg pf_msg; + + msg->d.err = ODM_MBOX_ERR_CODE_MAX; + odm_write64(msg->u[0], odm->rbase + ODM_MBOX_VF_PF_DATA(0)); + odm_write64(msg->u[1], odm->rbase + ODM_MBOX_VF_PF_DATA(1)); + + pf_msg.u[0] = 0; + pf_msg.u[1] = 0; + pf_msg.u[0] = odm_read64(odm->rbase + ODM_MBOX_VF_PF_DATA(0)); + + while (pf_msg.d.rsp == 0 && retry_cnt > 0) { + pf_msg.u[0] = odm_read64(odm->rbase + ODM_MBOX_VF_PF_DATA(0)); + --retry_cnt; + } + + if (retry_cnt <= 0) + return -EBADE; + + pf_msg.u[1] = odm_read64(odm->rbase + ODM_MBOX_VF_PF_DATA(1)); + + if (rsp) { + rsp->u[0] = pf_msg.u[0]; + rsp->u[1] = pf_msg.u[1]; + } + + if (pf_msg.d.rsp == msg->d.err && pf_msg.d.err != 0) + return -EBADE; + + return 0; +} + +int +odm_dev_init(struct odm_dev *odm) +{ + struct rte_pci_device *pci_dev = odm->pci_dev; + union odm_mbox_msg mbox_msg; + uint16_t vfid; + int rc; + + odm->rbase = pci_dev->mem_resource[0].addr; + vfid = ((pci_dev->addr.devid & 0x1F) << 3) | (pci_dev->addr.function & 0x7); + vfid -= 1; + odm->vfid = vfid; + odm->num_qs = 0; + + mbox_msg.u[0] = 0; + mbox_msg.u[1] = 0; + mbox_msg.q.vfid = odm->vfid; + mbox_msg.q.cmd = ODM_DEV_INIT; + rc = send_mbox_to_pf(odm, &mbox_msg, &mbox_msg); + if (!rc) + odm->max_qs = 1 << (4 - mbox_msg.d.nvfs); + + return rc; +} + +int +odm_dev_fini(struct odm_dev *odm) +{ + union odm_mbox_msg mbox_msg; + int qno, rc = 0; + + mbox_msg.u[0] = 0; + mbox_msg.u[1] = 0; + mbox_msg.q.vfid = odm->vfid; + mbox_msg.q.cmd = ODM_DEV_CLOSE; + rc = send_mbox_to_pf(odm, &mbox_msg, &mbox_msg); + + for (qno = 0; qno < odm->num_qs; qno++) + odm_vchan_resc_free(odm, qno); + + return rc; +} diff --git a/drivers/dma/odm/odm.h b/drivers/dma/odm/odm.h index 8cc3e0de44..0bf0c6345b 100644 --- a/drivers/dma/odm/odm.h +++ b/drivers/dma/odm/odm.h @@ -5,6 +5,10 @@ #ifndef _ODM_H_ #define _ODM_H_ +#include + +#include +#include #include extern int odm_logtype; @@ -49,6 +53,9 @@ extern int odm_logtype; #define ODM_MAX_QUEUES_PER_DEV 16 +#define odm_read64(addr) rte_read64_relaxed((volatile void *)(addr)) +#define odm_write64(val, addr) rte_write64_relaxed((val), (volatile void *)(addr)) + #define odm_err(...) 
\ rte_log(RTE_LOG_ERR, odm_logtype, \ RTE_FMT("%s(): %u" RTE_FMT_HEAD(__VA_ARGS__, ), __func__, __LINE__,\ @@ -132,4 +139,7 @@ struct __rte_cache_aligned odm_dev { uint8_t num_qs; }; +int odm_dev_init(struct odm_dev *odm); +int odm_dev_fini(struct odm_dev *odm); + #endif /* _ODM_H_ */ diff --git a/drivers/dma/odm/odm_dmadev.c b/drivers/dma/odm/odm_dmadev.c index cc3342cf7b..bef335c10c 100644 --- a/drivers/dma/odm/odm_dmadev.c +++ b/drivers/dma/odm/odm_dmadev.c @@ -23,6 +23,7 @@ odm_dmadev_probe(struct rte_pci_driver *pci_drv __rte_unused, struct rte_pci_dev char name[RTE_DEV_NAME_MAX_LEN]; struct odm_dev *odm = NULL; struct rte_dma_dev *dmadev; + int rc; if (!pci_dev->mem_resource[0].addr) return -ENODEV; @@ -37,8 +38,20 @@ odm_dmadev_probe(struct rte_pci_driver *pci_drv __rte_unused,
[PATCH v4 4/7] dma/odm: add device ops
From: Gowrishankar Muthukrishnan Add DMA device control ops. Signed-off-by: Anoob Joseph Signed-off-by: Gowrishankar Muthukrishnan Signed-off-by: Vidya Sagar Velumuri --- drivers/dma/odm/odm.c| 144 ++- drivers/dma/odm/odm.h| 54 + drivers/dma/odm/odm_dmadev.c | 85 + 3 files changed, 281 insertions(+), 2 deletions(-) diff --git a/drivers/dma/odm/odm.c b/drivers/dma/odm/odm.c index c0963da451..270808f4df 100644 --- a/drivers/dma/odm/odm.c +++ b/drivers/dma/odm/odm.c @@ -7,6 +7,7 @@ #include #include +#include #include "odm.h" #include "odm_priv.h" @@ -14,8 +15,15 @@ static void odm_vchan_resc_free(struct odm_dev *odm, int qno) { - RTE_SET_USED(odm); - RTE_SET_USED(qno); + struct odm_queue *vq = &odm->vq[qno]; + + rte_memzone_free(vq->iring_mz); + rte_memzone_free(vq->cring_mz); + rte_free(vq->extra_ins_sz); + + vq->iring_mz = NULL; + vq->cring_mz = NULL; + vq->extra_ins_sz = NULL; } static int @@ -53,6 +61,138 @@ send_mbox_to_pf(struct odm_dev *odm, union odm_mbox_msg *msg, union odm_mbox_msg return 0; } +static int +odm_queue_ring_config(struct odm_dev *odm, int vchan, int isize, int csize) +{ + union odm_vdma_ring_cfg_s ring_cfg = {0}; + struct odm_queue *vq = &odm->vq[vchan]; + + if (vq->iring_mz == NULL || vq->cring_mz == NULL) + return -EINVAL; + + ring_cfg.s.isize = (isize / 1024) - 1; + ring_cfg.s.csize = (csize / 1024) - 1; + + odm_write64(ring_cfg.u, odm->rbase + ODM_VDMA_RING_CFG(vchan)); + odm_write64(vq->iring_mz->iova, odm->rbase + ODM_VDMA_IRING_BADDR(vchan)); + odm_write64(vq->cring_mz->iova, odm->rbase + ODM_VDMA_CRING_BADDR(vchan)); + + return 0; +} + +int +odm_enable(struct odm_dev *odm) +{ + struct odm_queue *vq; + int qno, rc = 0; + + for (qno = 0; qno < odm->num_qs; qno++) { + vq = &odm->vq[qno]; + + vq->desc_idx = vq->stats.completed_offset; + vq->pending_submit_len = 0; + vq->pending_submit_cnt = 0; + vq->iring_head = 0; + vq->cring_head = 0; + vq->ins_ring_head = 0; + vq->iring_sz_available = vq->iring_max_words; + + rc = odm_queue_ring_config(odm, qno, vq->iring_max_words * 8, + vq->cring_max_entry * 4); + if (rc < 0) + break; + + odm_write64(0x1, odm->rbase + ODM_VDMA_EN(qno)); + } + + return rc; +} + +int +odm_disable(struct odm_dev *odm) +{ + int qno, wait_cnt = ODM_IRING_IDLE_WAIT_CNT; + uint64_t val; + + /* Disable the queue and wait for the queue to became idle */ + for (qno = 0; qno < odm->num_qs; qno++) { + odm_write64(0x0, odm->rbase + ODM_VDMA_EN(qno)); + do { + val = odm_read64(odm->rbase + ODM_VDMA_IRING_BADDR(qno)); + } while ((!(val & 1ULL << 63)) && (--wait_cnt > 0)); + } + + return 0; +} + +int +odm_vchan_setup(struct odm_dev *odm, int vchan, int nb_desc) +{ + struct odm_queue *vq = &odm->vq[vchan]; + int isize, csize, max_nb_desc, rc = 0; + union odm_mbox_msg mbox_msg; + const struct rte_memzone *mz; + char name[32]; + + if (vq->iring_mz != NULL) + odm_vchan_resc_free(odm, vchan); + + mbox_msg.u[0] = 0; + mbox_msg.u[1] = 0; + + /* ODM PF driver expects vfid starts from index 0 */ + mbox_msg.q.vfid = odm->vfid; + mbox_msg.q.cmd = ODM_QUEUE_OPEN; + mbox_msg.q.qidx = vchan; + rc = send_mbox_to_pf(odm, &mbox_msg, &mbox_msg); + if (rc < 0) + return rc; + + /* Determine instruction & completion ring sizes. */ + + /* Create iring that can support nb_desc. Round up to a multiple of 1024. 
*/ + isize = RTE_ALIGN_CEIL(nb_desc * ODM_IRING_ENTRY_SIZE_MAX * 8, 1024); + isize = RTE_MIN(isize, ODM_IRING_MAX_SIZE); + snprintf(name, sizeof(name), "vq%d_iring%d", odm->vfid, vchan); + mz = rte_memzone_reserve_aligned(name, isize, SOCKET_ID_ANY, 0, 1024); + if (mz == NULL) + return -ENOMEM; + vq->iring_mz = mz; + vq->iring_max_words = isize / 8; + + /* Create cring that can support max instructions that can be inflight in hw. */ + max_nb_desc = (isize / (ODM_IRING_ENTRY_SIZE_MIN * 8)); + csize = RTE_ALIGN_CEIL(max_nb_desc * sizeof(union odm_cmpl_ent_s), 1024); + snprintf(name, sizeof(name), "vq%d_cring%d", odm->vfid, vchan); + mz = rte_memzone_reserve_aligned(name, csize, SOCKET_ID_ANY, 0, 1024); + if (mz == NULL) { + rc = -ENOMEM; + goto iring_free; + } + vq->cring_mz = mz; + vq->cring_max_entry = csize / 4; + + /* Allocate memory to trac
[PATCH v4 5/7] dma/odm: add stats
From: Gowrishankar Muthukrishnan Add DMA dev stats. Signed-off-by: Anoob Joseph Signed-off-by: Gowrishankar Muthukrishnan Signed-off-by: Vidya Sagar Velumuri --- drivers/dma/odm/odm_dmadev.c | 63 ++-- 1 file changed, 61 insertions(+), 2 deletions(-) diff --git a/drivers/dma/odm/odm_dmadev.c b/drivers/dma/odm/odm_dmadev.c index 8c705978fe..13b2588246 100644 --- a/drivers/dma/odm/odm_dmadev.c +++ b/drivers/dma/odm/odm_dmadev.c @@ -87,14 +87,73 @@ odm_dmadev_close(struct rte_dma_dev *dev) return 0; } +static int +odm_stats_get(const struct rte_dma_dev *dev, uint16_t vchan, struct rte_dma_stats *rte_stats, + uint32_t size) +{ + struct odm_dev *odm = dev->fp_obj->dev_private; + + if (size < sizeof(rte_stats)) + return -EINVAL; + if (rte_stats == NULL) + return -EINVAL; + + if (vchan != RTE_DMA_ALL_VCHAN) { + struct rte_dma_stats *stats = (struct rte_dma_stats *)&odm->vq[vchan].stats; + + *rte_stats = *stats; + } else { + int i; + + for (i = 0; i < odm->num_qs; i++) { + struct rte_dma_stats *stats = (struct rte_dma_stats *)&odm->vq[i].stats; + + rte_stats->submitted += stats->submitted; + rte_stats->completed += stats->completed; + rte_stats->errors += stats->errors; + } + } + + return 0; +} + +static void +odm_vq_stats_reset(struct vq_stats *vq_stats) +{ + vq_stats->completed_offset += vq_stats->completed; + vq_stats->completed = 0; + vq_stats->errors = 0; + vq_stats->submitted = 0; +} + +static int +odm_stats_reset(struct rte_dma_dev *dev, uint16_t vchan) +{ + struct odm_dev *odm = dev->fp_obj->dev_private; + struct vq_stats *vq_stats; + int i; + + if (vchan != RTE_DMA_ALL_VCHAN) { + vq_stats = &odm->vq[vchan].stats; + odm_vq_stats_reset(vq_stats); + } else { + for (i = 0; i < odm->num_qs; i++) { + vq_stats = &odm->vq[i].stats; + odm_vq_stats_reset(vq_stats); + } + } + + return 0; +} + static const struct rte_dma_dev_ops odm_dmadev_ops = { .dev_close = odm_dmadev_close, .dev_configure = odm_dmadev_configure, .dev_info_get = odm_dmadev_info_get, .dev_start = odm_dmadev_start, .dev_stop = odm_dmadev_stop, - .stats_get = NULL, - .stats_reset = NULL, + .stats_get = odm_stats_get, + .stats_reset = odm_stats_reset, .vchan_setup = odm_dmadev_vchan_setup, }; -- 2.45.1
[PATCH v4 6/7] dma/odm: add copy and copy sg ops
From: Vidya Sagar Velumuri Add ODM copy and copy SG ops. Signed-off-by: Anoob Joseph Signed-off-by: Gowrishankar Muthukrishnan Signed-off-by: Vidya Sagar Velumuri --- drivers/dma/odm/odm_dmadev.c | 236 +++ 1 file changed, 236 insertions(+) diff --git a/drivers/dma/odm/odm_dmadev.c b/drivers/dma/odm/odm_dmadev.c index 13b2588246..b21be83a89 100644 --- a/drivers/dma/odm/odm_dmadev.c +++ b/drivers/dma/odm/odm_dmadev.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include "odm.h" @@ -87,6 +88,238 @@ odm_dmadev_close(struct rte_dma_dev *dev) return 0; } +static int +odm_dmadev_copy(void *dev_private, uint16_t vchan, rte_iova_t src, rte_iova_t dst, uint32_t length, + uint64_t flags) +{ + uint16_t pending_submit_len, pending_submit_cnt, iring_sz_available, iring_head; + const int num_words = ODM_IRING_ENTRY_SIZE_MIN; + struct odm_dev *odm = dev_private; + uint64_t *iring_head_ptr; + struct odm_queue *vq; + uint64_t h; + + const union odm_instr_hdr_s hdr = { + .s.ct = ODM_HDR_CT_CW_NC, + .s.xtype = ODM_XTYPE_INTERNAL, + .s.nfst = 1, + .s.nlst = 1, + }; + + vq = &odm->vq[vchan]; + + h = length; + h |= ((uint64_t)length << 32); + + const uint16_t max_iring_words = vq->iring_max_words; + + iring_sz_available = vq->iring_sz_available; + pending_submit_len = vq->pending_submit_len; + pending_submit_cnt = vq->pending_submit_cnt; + iring_head_ptr = vq->iring_mz->addr; + iring_head = vq->iring_head; + + if (iring_sz_available < num_words) + return -ENOSPC; + + if ((iring_head + num_words) >= max_iring_words) { + + iring_head_ptr[iring_head] = hdr.u; + iring_head = (iring_head + 1) % max_iring_words; + + iring_head_ptr[iring_head] = h; + iring_head = (iring_head + 1) % max_iring_words; + + iring_head_ptr[iring_head] = src; + iring_head = (iring_head + 1) % max_iring_words; + + iring_head_ptr[iring_head] = dst; + iring_head = (iring_head + 1) % max_iring_words; + } else { + iring_head_ptr[iring_head++] = hdr.u; + iring_head_ptr[iring_head++] = h; + iring_head_ptr[iring_head++] = src; + iring_head_ptr[iring_head++] = dst; + } + + pending_submit_len += num_words; + + if (flags & RTE_DMA_OP_FLAG_SUBMIT) { + rte_wmb(); + odm_write64(pending_submit_len, odm->rbase + ODM_VDMA_DBELL(vchan)); + vq->stats.submitted += pending_submit_cnt + 1; + vq->pending_submit_len = 0; + vq->pending_submit_cnt = 0; + } else { + vq->pending_submit_len = pending_submit_len; + vq->pending_submit_cnt++; + } + + vq->iring_head = iring_head; + + vq->iring_sz_available = iring_sz_available - num_words; + + /* No extra space to save. Skip entry in extra space ring. 
*/ + vq->ins_ring_head = (vq->ins_ring_head + 1) % vq->cring_max_entry; + + return vq->desc_idx++; +} + +static inline void +odm_dmadev_fill_sg(uint64_t *cmd, const struct rte_dma_sge *src, const struct rte_dma_sge *dst, + uint16_t nb_src, uint16_t nb_dst, union odm_instr_hdr_s *hdr) +{ + int i = 0, j = 0; + uint64_t h = 0; + + cmd[j++] = hdr->u; + /* When nb_src is even */ + if (!(nb_src & 0x1)) { + /* Fill the iring with src pointers */ + for (i = 1; i < nb_src; i += 2) { + h = ((uint64_t)src[i].length << 32) | src[i - 1].length; + cmd[j++] = h; + cmd[j++] = src[i - 1].addr; + cmd[j++] = src[i].addr; + } + + /* Fill the iring with dst pointers */ + for (i = 1; i < nb_dst; i += 2) { + h = ((uint64_t)dst[i].length << 32) | dst[i - 1].length; + cmd[j++] = h; + cmd[j++] = dst[i - 1].addr; + cmd[j++] = dst[i].addr; + } + + /* Handle the last dst pointer when nb_dst is odd */ + if (nb_dst & 0x1) { + h = dst[nb_dst - 1].length; + cmd[j++] = h; + cmd[j++] = dst[nb_dst - 1].addr; + cmd[j++] = 0; + } + } else { + /* When nb_src is odd */ + + /* Fill the iring with src pointers */ + for (i = 1; i < nb_src; i += 2) { + h = ((uint64_t)src[i].length << 32) | src[i - 1].length; + cmd[j++] = h; + cmd[j++] = src[i - 1].addr; + cmd[j++] = src[i]
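For context, a minimal sketch of driving the copy op above through the public dmadev API (device/vchan IDs, IOVAs and length are placeholders, not values from the patch):

    #include <errno.h>
    #include <stdbool.h>
    #include <rte_dmadev.h>

    /* Minimal sketch: enqueue one copy, submit it via the SUBMIT flag,
     * then poll for completion. */
    static int
    copy_and_wait(int16_t dev_id, uint16_t vchan,
                  rte_iova_t src, rte_iova_t dst, uint32_t len)
    {
        uint16_t last_idx;
        bool error = false;
        int idx;

        idx = rte_dma_copy(dev_id, vchan, src, dst, len, RTE_DMA_OP_FLAG_SUBMIT);
        if (idx < 0)
            return idx; /* e.g. -ENOSPC when the instruction ring is full */

        while (rte_dma_completed(dev_id, vchan, 1, &last_idx, &error) == 0)
            ;

        return error ? -EIO : 0;
    }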
[PATCH v4 7/7] dma/odm: add remaining ops
From: Vidya Sagar Velumuri Add all remaining ops such as fill, burst_capacity etc. Also update the documentation. Signed-off-by: Anoob Joseph Signed-off-by: Gowrishankar Muthukrishnan Signed-off-by: Vidya Sagar Velumuri --- MAINTAINERS| 1 + doc/guides/dmadevs/index.rst | 1 + doc/guides/dmadevs/odm.rst | 92 + doc/guides/rel_notes/release_24_07.rst | 4 + drivers/dma/odm/odm.h | 4 + drivers/dma/odm/odm_dmadev.c | 250 + 6 files changed, 352 insertions(+) create mode 100644 doc/guides/dmadevs/odm.rst diff --git a/MAINTAINERS b/MAINTAINERS index b581207a9a..195125ee1e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1274,6 +1274,7 @@ M: Gowrishankar Muthukrishnan M: Vidya Sagar Velumuri T: git://dpdk.org/next/dpdk-next-net-mrvl F: drivers/dma/odm/ +F: doc/guides/dmadevs/odm.rst NXP DPAA DMA M: Gagandeep Singh diff --git a/doc/guides/dmadevs/index.rst b/doc/guides/dmadevs/index.rst index 5bd25b32b9..ce9f6eb260 100644 --- a/doc/guides/dmadevs/index.rst +++ b/doc/guides/dmadevs/index.rst @@ -17,3 +17,4 @@ an application through DMA API. hisilicon idxd ioat + odm diff --git a/doc/guides/dmadevs/odm.rst b/doc/guides/dmadevs/odm.rst new file mode 100644 index 00..a2eaab59a0 --- /dev/null +++ b/doc/guides/dmadevs/odm.rst @@ -0,0 +1,92 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(c) 2024 Marvell. + +Odyssey ODM DMA Device Driver += + +The ``odm`` DMA device driver provides a poll-mode driver (PMD) for Marvell Odyssey +DMA Hardware Accelerator block found in Odyssey SoC. The block supports only mem +to mem DMA transfers. + +ODM DMA device can support up to 32 queues and 16 VFs. + +Prerequisites and Compilation procedure +--- + +Device Setup +- + +ODM DMA device is initialized by kernel PF driver. The PF kernel driver is part +of Marvell software packages for Odyssey. + +Kernel module can be inserted as in below example:: + +$ sudo insmod odyssey_odm.ko + +ODM DMA device can support up to 16 VFs:: + +$ sudo echo 16 > /sys/bus/pci/devices/\:08\:00.0/sriov_numvfs + +Above command creates 16 VFs with 2 queues each. + +The ``dpdk-devbind.py`` script, included with DPDK, can be used to show the +presence of supported hardware. Running ``dpdk-devbind.py --status-dev dma`` +will show all the Odyssey ODM DMA devices. + +Devices using VFIO drivers +~~ + +The HW devices to be used will need to be bound to a user-space IO driver. +The ``dpdk-devbind.py`` script can be used to view the state of the devices +and to bind them to a suitable DPDK-supported driver, such as ``vfio-pci``. +For example:: + + $ dpdk-devbind.py -b vfio-pci :08:00.1 + +Device Probing and Initialization +~ + +To use the devices from an application, the dmadev API can be used. + +Once configured, the device can then be made ready for use +by calling the ``rte_dma_start()`` API. + +Performing Data Copies +~~ + +Refer to the :ref:`Enqueue / Dequeue APIs ` section +of the dmadev library documentation for details on operation enqueue and +submission API usage. + +Performance Tuning Parameters +~ + +To achieve higher performance, DMA device needs to be tuned using PF kernel +driver module parameters. + +Following options are exposed by kernel PF driver via devlink interface for +tuning performance. + +``eng_sel`` + + ODM DMA device has 2 engines internally. Engine to queue mapping is decided + by a hardware register which can be configured as below:: + +$ /sbin/devlink dev param set pci/:08:00.0 name eng_sel value 3435973836 cmode runtime + + Each bit in the register corresponds to one queue. Each queue would be + associated with one engine. 
If the value of the bit corresponding to the queue + is 0, then engine 0 would be picked. If it is 1, then engine 1 would be + picked. + + In the above command, the register value is set as + ``1100 1100 1100 1100 1100 1100 1100 1100`` which allows for alternate engines + to be used with alternate VFs (assuming the system has 16 VFs with 2 queues + each). + +``max_load_request`` + + Specifies maximum outstanding load requests on internal bus. Values can range + from 1 to 512. Set to 512 for maximum requests in flight.:: + +$ /sbin/devlink dev param set pci/:08:00.0 name max_load_request value 512 cmode runtime diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst index a69f24cf99..3bc8451330 100644 --- a/doc/guides/rel_notes/release_24_07.rst +++ b/doc/guides/rel_notes/release_24_07.rst @@ -55,6 +55,10 @@ New Features Also, make sure to start the actual text at the margin. === +* **Added Marvell Odyssey ODM DMA device suppo
Re: [PATCH v4 1/3] event/dlb2: add support for HW delayed token
On Thu, May 2, 2024 at 1:16 AM Abdullah Sevincer wrote: > > In DLB 2.5, hardware assist is available, complementing the Delayed > token POP software implementation. When it is enabled, the feature > works as follows: > > It stops CQ scheduling when the inflight limit associated with the CQ > is reached. So the feature is activated only if the core is > congested. If the core can handle multiple atomic flows, DLB will not > try to switch them. This is an improvement over SW implementation > which always switches the flows. > > The feature will resume CQ scheduling when the number of pending > completions fall below a configured threshold. To emulate older 2.0 > behavior, this threshold is set to 1 by old APIs. SW sets CQ to > auto-pop mode for token return, as tokens withholding is not > necessary now. As HW counts completions and not tokens, events equal > to HL (History List) entries will be scheduled to DLB before the > feature activates and stops CQ scheduling. 1) Also tell about adding new PMD API and update the release notes for PMD section for new feature. 2) Fix CI http://mails.dpdk.org/archives/test-report/2024-May/657681.html > > Signed-off-by: Abdullah Sevincer +/** Set inflight threshold for flow migration */ > +#define DLB2_FLOW_MIGRATION_THRESHOLD RTE_BIT64(0) Fix the namespace for public API, RTE_PMD_DLB2_PORT_SET_F_FLOW_MIGRATION_... > + > +/** Set port history list */ > +#define DLB2_SET_PORT_HL RTE_BIT64(1) RTE_PMD_DLB2_PORT_SET_F_PORT_HL > + > +struct dlb2_port_param { fix name space, rte_pmd_dlb2_port_params > + uint16_t inflight_threshold : 12; > +}; > + > +/*! > + * @warning > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice > + * > + * Configure various port parameters. > + * AUTO_POP. This function must be called before calling > rte_event_port_setup() > + * for the port, but after calling rte_event_dev_configure(). > + * > + * @param dev_id > + *The identifier of the event device. > + * @param port_id > + *The identifier of the event port. > + * @param flags > + *Bitmask of the parameters being set. > + * @param val > + *Structure coantaining the values of parameters being set. Why not use struct rte_pmd_dlb2_port_params itself instead of void *. > + * > + * @return > + * - 0: Success > + * - EINVAL: Invalid dev_id, port_id, or mode > + * - EINVAL: The DLB2 is not configured, is already running, or the port is > + * already setup > + */ > +__rte_experimental > +int > +rte_pmd_dlb2_set_port_param(uint8_t dev_id, > + uint8_t port_id, > + uint64_t flags, > + void *val);
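To make the naming comments above concrete, a sketch of how the declarations might look with the reviewer's suggested namespace; this is illustrative only, all names are hypothetical and not a merged API:

    #include <stdint.h>
    #include <rte_bitops.h>
    #include <rte_compat.h>

    /* Illustrative only: RTE_PMD_DLB2_* prefixed flags and a named
     * params struct instead of an opaque void *val. */
    #define RTE_PMD_DLB2_PORT_SET_F_FLOW_MIGRATION_THRESHOLD RTE_BIT64(0)
    #define RTE_PMD_DLB2_PORT_SET_F_PORT_HL                   RTE_BIT64(1)

    struct rte_pmd_dlb2_port_params {
            uint16_t inflight_threshold : 12;
    };

    __rte_experimental
    int
    rte_pmd_dlb2_set_port_param(uint8_t dev_id, uint8_t port_id, uint64_t flags,
                                struct rte_pmd_dlb2_port_params *params);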
Re: [PATCH v4 2/3] event/dlb2: add support for dynamic HL entries
On Thu, May 2, 2024 at 1:16 AM Abdullah Sevincer wrote: > > In DLB 2.5, hardware assist is available, complementing the Delayed > token POP software implementation. When it is enabled, the feature > works as follows: > > It stops CQ scheduling when the inflight limit associated with the CQ > is reached. So the feature is activated only if the core is > congested. If the core can handle multiple atomic flows, DLB will not > try to switch them. This is an improvement over SW implementation > which always switches the flows. > > The feature will resume CQ scheduling when the number of pending > completions fall below a configured threshold. > > DLB has 64 LDB ports and 2048 HL entries. If all LDB ports are used, > possible HL entries per LDB port equals 2048 / 64 = 32. So, the > maximum CQ depth possible is 16, if all 64 LB ports are needed in a > high-performance setting. > > In case all CQs are configured to have HL = 2* CQ Depth as a > performance option, then the calculation of HL at the time of domain > creation will be based on maximum possible dequeue depth. This could > result in allocating too many HL entries to the domain as DLB only > has limited number of HL entries to be allocated. Hence, it is best > to allow application to specify HL entries as a command line argument > and override default allocation. A summary of usage is listed below: > > When 'use_default_hl = 1', Per port HL is set to > DLB2_FIXED_CQ_HL_SIZE (32) and command line parameter > alloc_hl_entries is ignored. > > When 'use_default_hl = 0', Per LDB port HL = 2 * CQ depth and per > port HL is set to 2 * DLB2_FIXED_CQ_HL_SIZE. > > User should calculate needed HL entries based on CQ depths the > application will use and specify it as command line parameter > 'alloc_hl_entries'. This will be used to allocate HL entries. > Hence, alloc_hl_entries = (Sum of all LDB ports CQ depths * 2). > > If alloc_hl_entries is not specified, then Total HL entries for the > vdev = num_ldb_ports * 64. > > Signed-off-by: Abdullah Sevincer > } > diff --git a/drivers/event/dlb2/dlb2_priv.h b/drivers/event/dlb2/dlb2_priv.h > index d6828aa482..dc9f98e142 100644 > --- a/drivers/event/dlb2/dlb2_priv.h > +++ b/drivers/event/dlb2/dlb2_priv.h > @@ -52,6 +52,8 @@ > #define DLB2_PRODUCER_COREMASK "producer_coremask" > #define DLB2_DEFAULT_LDB_PORT_ALLOCATION_ARG "default_port_allocation" > #define DLB2_ENABLE_CQ_WEIGHT_ARG "enable_cq_weight" > +#define DLB2_USE_DEFAULT_HL "use_default_hl" > +#define DLB2_ALLOC_HL_ENTRIES "alloc_hl_entries" 1)Update doc/guides/eventdevs/dlb2.rst for new devargs 2)Please release note PMD section for this feature.
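As a worked example of the sizing described in the commit message (numbers are illustrative): with 2048 HL entries shared across 64 LDB ports, the default allocation is 2048 / 64 = 32 entries per port, which caps CQ depth at 16 when HL = 2 * CQ depth. If an application instead uses only 8 LDB ports, each with CQ depth 64, it would pass alloc_hl_entries = 8 * 64 * 2 = 1024 on the command line.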
Re: [PATCH v4 3/3] event/dlb2: enhance DLB credit handling
On Thu, May 2, 2024 at 1:27 AM Abdullah Sevincer wrote: > > This commit improves DLB credit handling scenarios when > ports hold on to credits but can't release them due to insufficient > accumulation (less than 2 * credit quanta). > > Worker ports now release all accumulated credits when back-to-back > zero poll count reaches preset threshold. > > Producer ports release all accumulated credits if enqueue fails for a > consecutive number of retries. > > In a multi-producer system, some producer(s) may exit early while > holding on to credits. Now these are released during port unlink > which needs to be performed by the application. > > test-eventdev is modified to call rte_event_port_unlink() to release > any accumulated credits by producer ports. > > Signed-off-by: Abdullah Sevincer > --- > app/test-eventdev/test_perf_common.c | 20 +-- 1) Spotted non-driver changes in driver patches, Please send test-eventdev changes as separate commit with complete rational. 2) Fix CI issues http://mails.dpdk.org/archives/test-report/2024-May/657683.html > drivers/event/dlb2/dlb2.c| 203 +-- > drivers/event/dlb2/dlb2_priv.h | 1 + > drivers/event/dlb2/meson.build | 12 ++ > drivers/event/dlb2/meson_options.txt | 6 + > 5 files changed, 194 insertions(+), 48 deletions(-) > create mode 100644 drivers/event/dlb2/meson_options.txt > > > static inline uint64_t > diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c > index 11bbe30d7b..2c341a5845 100644 > --- a/drivers/event/dlb2/dlb2.c > +++ b/drivers/event/dlb2/dlb2.c > @@ -43,7 +43,47 @@ > * to DLB can go ahead of relevant application writes like updates to buffers > * being sent with event > */ > +#ifndef DLB2_BYPASS_FENCE_ON_PP > #define DLB2_BYPASS_FENCE_ON_PP 0 /* 1 == Bypass fence, 0 == do not bypass > */ > +#endif > + > +/* HW credit checks can only be turned off for DLB2 device if following > + * is true for each created eventdev > + * LDB credits <= DIR credits + minimum CQ Depth > + * (CQ Depth is minimum of all ports configured within eventdev) > + * This needs to be true for all eventdevs created on any DLB2 device > + * managed by this driver. > + * DLB2.5 does not any such restriction as it has single credit pool > + */ > +#ifndef DLB_HW_CREDITS_CHECKS > +#define DLB_HW_CREDITS_CHECKS 1 > +#endif > + > +/* > + * SW credit checks can only be turned off if application has a way to > + * limit input events to the eventdev below assigned credit limit > + */ > +#ifndef DLB_SW_CREDITS_CHECKS > +#define DLB_SW_CREDITS_CHECKS 1 > +#endif > + > + > +static void dlb2_check_and_return_credits(struct dlb2_eventdev_port *ev_port, > + bool cond, uint32_t threshold) > +{ > +#if DLB_SW_CREDITS_CHECKS || DLB_HW_CREDITS_CHECKS This new patch is full of compilation flags clutter, can you make it runtime? 
> > diff --git a/drivers/event/dlb2/dlb2_priv.h b/drivers/event/dlb2/dlb2_priv.h > index dc9f98e142..fd76b5b9fb 100644 > --- a/drivers/event/dlb2/dlb2_priv.h > +++ b/drivers/event/dlb2/dlb2_priv.h > @@ -527,6 +527,7 @@ struct __rte_cache_aligned dlb2_eventdev_port { > struct rte_event_port_conf conf; /* user-supplied configuration */ > uint16_t inflight_credits; /* num credits this port has right now */ > uint16_t credit_update_quanta; > + uint32_t credit_return_count; /* count till the credit return > condition is true */ > struct dlb2_eventdev *dlb2; /* backlink optimization */ > alignas(RTE_CACHE_LINE_SIZE) struct dlb2_port_stats stats; > struct dlb2_event_queue_link link[DLB2_MAX_NUM_QIDS_PER_LDB_CQ]; > diff --git a/drivers/event/dlb2/meson.build b/drivers/event/dlb2/meson.build > index 515d1795fe..77a197e32c 100644 > --- a/drivers/event/dlb2/meson.build > +++ b/drivers/event/dlb2/meson.build > @@ -68,3 +68,15 @@ endif > headers = files('rte_pmd_dlb2.h') > > deps += ['mbuf', 'mempool', 'ring', 'pci', 'bus_pci'] > + > +if meson.version().version_compare('> 0.58.0') > +fs = import('fs') > +dlb_options = fs.read('meson_options.txt').strip().split('\n') > + > +foreach opt: dlb_options > + if (opt.strip().startswith('#') or opt.strip() == '') > + continue > + endif > + cflags += '-D' + opt.strip().to_upper().replace(' ','') > +endforeach > +endif > diff --git a/drivers/event/dlb2/meson_options.txt > b/drivers/event/dlb2/meson_options.txt Adding @Richardson, Bruce @Thomas Monjalon to comment on this, I am not sure driver specific meson_options.txt is a good path? > new file mode 100644 > index 00..b57c999e54 > --- /dev/null > +++ b/drivers/event/dlb2/meson_options.txt > @@ -0,0 +1,6 @@ > +# SPDX-License-Identifier: BSD-3-Clause > +# Copyright(c) 2023-2024 Intel Corporation > + > +DLB2_BYPASS_FENCE_ON_PP = 0 > +DLB_HW_CREDITS_CHECKS = 1 > +DLB_SW_CREDITS_CHECKS = 1 > -- > 2.25.1 >
Re: [PATCH] event/dsw: support explicit release only mode
On Sat, May 25, 2024 at 1:13 AM Mattias Rönnblom wrote: > > Add the RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE capability to the > DSW event device. > > This feature may be used by an EAL thread to pull more work from the > work scheduler, without giving up the option to forward events > originating from a previous dequeue batch. This in turn allows an EAL > thread to be productive while waiting for a hardware accelerator to > complete some operation. > > Prior to this change, DSW didn't make any distinction between > RTE_EVENT_OP_FORWARD and RTE_EVENT_OP_NEW type events, other than that > new events would be backpressured earlier. > > After this change, DSW tracks the number of released events (i.e., > events of type RTE_EVENT_OP_FORWARD and RTE_EVENT_OP_RELASE) that has > been enqueued. > > For efficency reasons, DSW does not track the *identity* of individual > events. This in turn implies that a certain stage in the flow > migration process, DSW must wait for all pending releases (on the > migration source port, only) to be received from the application, to > assure that no event pertaining to any of the to-be-migrated flows are > being processed. > > With this change, DSW starts making a distinction between forward and > new type events for credit allocation purposes. Only RTE_EVENT_OP_NEW > events needs credits. All events marked as RTE_EVENT_OP_FORWARD must > have a corresponding dequeued event from a previous dequeue batch. > > Flow migration for flows on RTE_SCHED_TYPE_PARALLEL queues remains > unaffected by this change. > > A side-effect of the tweaked DSW migration logic is that the migration > latency is reduced, regardless if implicit relase is enabled or not. > > Signed-off-by: Mattias Rönnblom 1) Update releases for PMD specific section for this new feature 2) Fix CI issue as applicable https://patches.dpdk.org/project/dpdk/patch/20240524192437.183960-1-mattias.ronnb...@ericsson.com/ http://mails.dpdk.org/archives/test-report/2024-May/672848.html https://github.com/ovsrobot/dpdk/actions/runs/9229147658
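As a usage sketch of the capability described in the commit message (device and port IDs are placeholders):

    #include <errno.h>
    #include <rte_eventdev.h>

    /* Minimal sketch: check for the capability and disable implicit
     * release on one port before setting it up. */
    static int
    setup_explicit_release_port(uint8_t dev_id, uint8_t port_id)
    {
            struct rte_event_dev_info info;
            struct rte_event_port_conf conf;

            rte_event_dev_info_get(dev_id, &info);
            if (!(info.event_dev_cap & RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE))
                    return -ENOTSUP;

            rte_event_port_default_conf_get(dev_id, port_id, &conf);
            conf.event_port_cfg |= RTE_EVENT_PORT_CFG_DISABLE_IMPL_REL;

            /* Events dequeued on this port must now be returned explicitly,
             * either as RTE_EVENT_OP_FORWARD enqueues or RTE_EVENT_OP_RELEASE. */
            return rte_event_port_setup(dev_id, port_id, &conf);
    }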
Re: [PATCH v5] cnxk: disable building template files
On Mon, May 27, 2024 at 09:04:29PM +0530, pbhagavat...@marvell.com wrote: > From: Pavan Nikhilesh > > Disable building template files when CNXK_DIS_TMPLT_FUNC > is defined as a part of c_args. > This option can be used when reworking datapath or debugging > issues to reduce Rx/Tx path compilation time. > > Example command: > meson build -Dc_args='-DCNXK_DIS_TMPLT_FUNC' > -Dexamples=all --cross-file config/arm/arm64_cn10k_linux_gcc > Should this option be set in CI by default, or in test-meson-builds by default? When do we need to avoid setting this flag, vs setting it? Thanks, /Bruce
Re: [PATCH] event/dsw: support explicit release only mode
On 2024-05-27 17:35, Jerin Jacob wrote: On Sat, May 25, 2024 at 1:13 AM Mattias Rönnblom wrote: Add the RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE capability to the DSW event device. This feature may be used by an EAL thread to pull more work from the work scheduler, without giving up the option to forward events originating from a previous dequeue batch. This in turn allows an EAL thread to be productive while waiting for a hardware accelerator to complete some operation. Prior to this change, DSW didn't make any distinction between RTE_EVENT_OP_FORWARD and RTE_EVENT_OP_NEW type events, other than that new events would be backpressured earlier. After this change, DSW tracks the number of released events (i.e., events of type RTE_EVENT_OP_FORWARD and RTE_EVENT_OP_RELASE) that has been enqueued. For efficency reasons, DSW does not track the *identity* of individual events. This in turn implies that a certain stage in the flow migration process, DSW must wait for all pending releases (on the migration source port, only) to be received from the application, to assure that no event pertaining to any of the to-be-migrated flows are being processed. With this change, DSW starts making a distinction between forward and new type events for credit allocation purposes. Only RTE_EVENT_OP_NEW events needs credits. All events marked as RTE_EVENT_OP_FORWARD must have a corresponding dequeued event from a previous dequeue batch. Flow migration for flows on RTE_SCHED_TYPE_PARALLEL queues remains unaffected by this change. A side-effect of the tweaked DSW migration logic is that the migration latency is reduced, regardless if implicit relase is enabled or not. Signed-off-by: Mattias Rönnblom 1) Update releases for PMD specific section for this new feature Should the release note update be in the same patch, or a separate? 2) Fix CI issue as applicable https://patches.dpdk.org/project/dpdk/patch/20240524192437.183960-1-mattias.ronnb...@ericsson.com/ http://mails.dpdk.org/archives/test-report/2024-May/672848.html https://github.com/ovsrobot/dpdk/actions/runs/9229147658
[PATCH] net/i40e: increase descriptor queue length to 8160
According to the Intel X710/XXV710/XL710 datasheet, the maximum receive queue descriptor ring length is 0x1FE0 (8160 decimal). This is specified as QLEN in table 8-12, page 1083. I've tested this change with an XXV710 NIC and it has a positive effect on performance under high-load scenarios: where previously I'd see a miss rate of ~2000 packets/sec, I now see only ~40 packets/sec. Signed-off-by: Igor Gutorov --- drivers/net/i40e/i40e_rxtx.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h index 2f2f890855..33fc9770d9 100644 --- a/drivers/net/i40e/i40e_rxtx.h +++ b/drivers/net/i40e/i40e_rxtx.h @@ -25,7 +25,7 @@ #define I40E_RX_MAX_DATA_BUF_SIZE (16 * 1024 - 128) #define I40E_MIN_RING_DESC 64 -#define I40E_MAX_RING_DESC 4096 +#define I40E_MAX_RING_DESC 8160 #define I40E_FDIR_NUM_TX_DESC (I40E_FDIR_PRG_PKT_CNT << 1) #define I40E_FDIR_NUM_RX_DESC (I40E_FDIR_PRG_PKT_CNT << 1) -- 2.45.1
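A brief usage sketch of requesting the larger ring through the public ethdev API (port ID and mempool are placeholders); the setup call fails if the requested size exceeds the PMD's descriptor limit:

    #include <rte_ethdev.h>

    /* Minimal sketch: request an 8160-entry Rx ring on queue 0. The size
     * must also satisfy the PMD's descriptor alignment (a multiple of 32
     * for i40e), which 8160 does. */
    static int
    setup_big_rx_ring(uint16_t port_id, struct rte_mempool *mb_pool)
    {
            return rte_eth_rx_queue_setup(port_id, 0, 8160,
                            rte_eth_dev_socket_id(port_id), NULL, mb_pool);
    }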
Re: DPDK patch for Amston Lake SGMII <> GPY215
On 5/27/2024 2:48 PM, Amy.Shih wrote: Copied down, please bottom post. > > Best Regards, > Amy Shih > Advantech ICVG x86 Software > 02-7732-3399 Ext. 1249 > > -Original Message- > From: Ferruh Yigit > Sent: Monday, May 27, 2024 4:58 PM > To: Jack.Chen ; dev@dpdk.org > Cc: Amy.Shih ; bill.lu ; > Jenny3.Lin ; Bruce Richardson > ; Mcnamara, John > Subject: Re: DPDK patch for Amston Lake SGMII <> GPY215 > > On 5/24/2024 6:40 AM, Jack.Chen wrote: >> Dear DPDK Dev . >> >> This is PM from Advantech ENPD. We are working on Intel Amston Lake >> CPU’s SGMII <> GPY215 PHY for DPDK test but fail. >> >> We consulted with Intel support team and they suggested we should >> consult DPDK community and it should have the patch or code change for >> Amston Lake <> GYP215 available for DPDK. >> >> Could you kindly suggest us the direction of it? I also keep my >> Engineering team in this mail loop for further discussion. >> >> >> >> Thank you so much >> >> >> >> The error message while we testing DPDK >> >> SoC 2.5G LAN (BIOS set to 1G) with dpdk 24.03.0. It can run testpmd >> test, and error message as follows : >> >> root@fwa-1214-efi:~/dpdk/dpdk-24.03/build/app# ./dpdk-testpmd -c 0xf -n >> 1 -a 00:1e.4 --socket-mem=2048,0 -- -i --mbcache=512 --numa >> --port-numa-config=0,0 --socket-num=0 --coremask=0x2 --nb-cores=1 >> --rxq=1 --txq=1 --portmask=0x1 --rxd=2048 --rxfreet=64 --rxpt=64 >> --rxht=8 --rxwt=0 --txd=2048 --txfreet=64 --txpt=64 --txht=0 --txwt=0 >> --burst=64 --txrst=64 --rss-ip -a >> >> EAL: Detected CPU lcores: 4 >> >> EAL: Detected NUMA nodes: 1 >> >> EAL: Detected static linkage of DPDK >> >> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket >> >> EAL: Selected IOVA mode 'PA' >> >> TELEMETRY: No legacy callbacks, legacy socket not created >> >> testpmd: No probed ethernet devices >> >> Interactive-mode selected >> >> Fail: input rxq (1) can't be greater than max_rx_queues (0) of port 0 >> >> EAL: Error - exiting with code: 1 >> >> Cause: rxq 1 invalid - must be >= 0 && <= 0 >> >> > > > Hi Jack, > > According above log device is not detected. > What is the Ehternet controller connected to the "GPY215 PHY" and do you > know if it has required driver in DPDK for it? > If device sits on PCIe bus, you can check it via `lspci`. > > > Hi Ferruh: > > The Ethernet controller connected to the "GPY215 PHY" is the integrated Gigabit Ethernet (GbE) controller from the Intel Amston Lake CPU. > The output of `lspci` is as follows: > > 00:1e.4 Ethernet controller [0200]: Intel Corporation Device [8086:54ac] In Linux kernel, it is "dwmac-intel" driver, and as far as I can see it is not supported in DPDK.
Re: [PATCH] event/dsw: support explicit release only mode
On Mon, May 27, 2024 at 9:38 PM Mattias Rönnblom wrote: > > On 2024-05-27 17:35, Jerin Jacob wrote: > > On Sat, May 25, 2024 at 1:13 AM Mattias Rönnblom > > wrote: > >> > >> Add the RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE capability to the > >> DSW event device. > >> > >> This feature may be used by an EAL thread to pull more work from the > >> work scheduler, without giving up the option to forward events > >> originating from a previous dequeue batch. This in turn allows an EAL > >> thread to be productive while waiting for a hardware accelerator to > >> complete some operation. > >> > >> Prior to this change, DSW didn't make any distinction between > >> RTE_EVENT_OP_FORWARD and RTE_EVENT_OP_NEW type events, other than that > >> new events would be backpressured earlier. > >> > >> After this change, DSW tracks the number of released events (i.e., > >> events of type RTE_EVENT_OP_FORWARD and RTE_EVENT_OP_RELASE) that has > >> been enqueued. > >> > >> For efficency reasons, DSW does not track the *identity* of individual > >> events. This in turn implies that a certain stage in the flow > >> migration process, DSW must wait for all pending releases (on the > >> migration source port, only) to be received from the application, to > >> assure that no event pertaining to any of the to-be-migrated flows are > >> being processed. > >> > >> With this change, DSW starts making a distinction between forward and > >> new type events for credit allocation purposes. Only RTE_EVENT_OP_NEW > >> events needs credits. All events marked as RTE_EVENT_OP_FORWARD must > >> have a corresponding dequeued event from a previous dequeue batch. > >> > >> Flow migration for flows on RTE_SCHED_TYPE_PARALLEL queues remains > >> unaffected by this change. > >> > >> A side-effect of the tweaked DSW migration logic is that the migration > >> latency is reduced, regardless if implicit relase is enabled or not. > >> > >> Signed-off-by: Mattias Rönnblom > > > > > > 1) Update releases for PMD specific section for this new feature > > Should the release note update be in the same patch, or a separate? Same patch. > > > 2) Fix CI issue as applicable > > > > https://patches.dpdk.org/project/dpdk/patch/20240524192437.183960-1-mattias.ronnb...@ericsson.com/ > > http://mails.dpdk.org/archives/test-report/2024-May/672848.html > > https://github.com/ovsrobot/dpdk/actions/runs/9229147658
Re: [PATCH] event: fix warning from useless snprintf
On Thu, Apr 25, 2024 at 12:41 AM Stephen Hemminger wrote: > > On Wed, 24 Apr 2024 17:12:39 + > "Van Haaren, Harry" wrote: > > > > > > > From: Stephen Hemminger > > > Sent: Wednesday, April 24, 2024 5:13 PM > > > To: Van Haaren, Harry > > > Cc: dev@dpdk.org; Richardson, Bruce; Jerin Jacob > > > Subject: Re: [PATCH] event: fix warning from useless snprintf > > > > > > On Wed, 24 Apr 2024 08:45:52 + > > > "Van Haaren, Harry" wrote: > > > > > > > > From: Stephen Hemminger > > > > > Sent: Wednesday, April 24, 2024 4:45 AM > > > > > To: dev@dpdk.org > > > > > Cc: Richardson, Bruce; Stephen Hemminger; Van Haaren, Harry; Jerin > > > > > Jacob > > > > > Subject: [PATCH] event: fix warning from useless snprintf > > > > > > > > > > With Gcc-14, this warning is generated: > > > > > ../drivers/event/sw/sw_evdev.c:263:3: warning: 'snprintf' will always > > > > > be truncated; > > > > > specified size is 12, but format string expands to at least 13 > > > > > [-Wformat-truncation] > > > > > 263 | snprintf(buf, sizeof(buf), "sw%d_iq_%d_rob", > > > > > dev_id, i); > > > > > | ^ > > > > > > > > > > Yet the whole printf to the buf is unnecessary. The type string > > > > > argument > > > > > has never been implemented, and should just be NULL. Removing the > > > > > unnecessary snprintf, then means IQ_ROB_NAMESIZE can be removed. > > > > > > > > I understand that today the "type" value isn't implemented, but across > > > > the DPDK codebase it > > > > seems like others are filling in "type" to be some debug-useful > > > > name/string. If it was added > > > > in future it'd be nice to have the ROB/IQ memory identified by name, > > > > like the rest of DPDK components. > > > > > > No, don't bother. This is a case of > > > https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it > > > > I agree that YAGNI perhaps applied when designing the APIs, but the "type" > > parameter is there now... > > Should we add a guidance of "when reworking code, always pass NULL as the > > type parameter to rte_malloc functions" somewhere in the programmers guide, > > to align community with this "pass NULL for type" initiative? > > > > > > > > Acked-by: Harry van Haaren Changed to event/sw: Applied to dpdk-next-eventdev/for-main. Thanks > >
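For illustration, the pattern being discussed, with a hypothetical helper name and size:

    #include <rte_common.h>
    #include <rte_malloc.h>

    /* Sketch of the guidance above: the "type" string of the rte_malloc
     * family is unused, so pass NULL instead of snprintf-ing a per-object
     * name. Function name and size are hypothetical. */
    static void *
    alloc_rob(size_t n_bytes, int socket_id)
    {
            /* Before: snprintf(buf, sizeof(buf), "sw%d_iq_%d_rob", dev_id, i);
             *         rte_zmalloc_socket(buf, n_bytes, ...); */
            return rte_zmalloc_socket(NULL, n_bytes, RTE_CACHE_LINE_SIZE, socket_id);
    }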
Re: [PATCH 10/10] net/cnxk: define CPT HW result format for PMD API
On Fri, May 17, 2024 at 1:16 PM Nithin Dabilpuram wrote: > > From: Srujana Challa > > Defines CPT HW result format for PMD API, > rte_pmd_cnxk_inl_ipsec_res(). > > Signed-off-by: Srujana Challa > --- > drivers/net/cnxk/cn10k_ethdev_sec.c | 4 ++-- > drivers/net/cnxk/rte_pmd_cnxk.h | 28 ++-- > 2 files changed, 28 insertions(+), 4 deletions(-) > > > +/** CPT HW result format */ > +union rte_pmd_cnxk_cpt_res_s { > + struct rte_pmd_cpt_cn10k_res_s { > + uint64_t compcode : 7; Public API, Please add Doxygen for every symbol and check the generated HTML files. > + uint64_t doneint : 1; > + uint64_t uc_compcode : 8; > + uint64_t rlen : 16; > + uint64_t spi : 32; > + > + uint64_t esn; > + } cn10k; > + > + struct rte_pmd_cpt_cn9k_res_s { > + uint64_t compcode : 8; > + uint64_t uc_compcode : 8; > + uint64_t doneint : 1; > + uint64_t reserved_17_63 : 47; > + > + uint64_t reserved_64_127; > + } cn9k; > + > + uint64_t u64[2]; > +}; > + > 2.25.1 >
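To illustrate the requested Doxygen style (the field descriptions below are placeholders showing comment placement, not authoritative definitions from the hardware spec; the cn9k view is omitted for brevity):

    #include <stdint.h>

    /** CPT HW result format (illustrative Doxygen placement only) */
    union rte_pmd_cnxk_cpt_res_s {
            /** CN10K result word layout */
            struct rte_pmd_cpt_cn10k_res_s {
                    uint64_t compcode : 7;     /**< Hardware completion code */
                    uint64_t doneint : 1;      /**< Done interrupt */
                    uint64_t uc_compcode : 8;  /**< Microcode completion code */
                    uint64_t rlen : 16;        /**< Result length */
                    uint64_t spi : 32;         /**< SPI of the packet */
                    uint64_t esn;              /**< Extended sequence number */
            } cn10k;
            uint64_t u64[2];                   /**< Raw result words */
    };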
Re: [PATCH 09/10] net/cnxk: clear CGX stats during xstats reset
On Fri, May 17, 2024 at 1:16 PM Nithin Dabilpuram wrote: > > From: Sunil Kumar Kori > > Currently only NIX stats are cleared during xstats > reset and CGX stats are left as it is. > > Clearing CGX stats too during xstats reset. > > Signed-off-by: Sunil Kumar Kori Change to fix and add Fixes: tag > --- > drivers/net/cnxk/cnxk_stats.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/net/cnxk/cnxk_stats.c b/drivers/net/cnxk/cnxk_stats.c > index f2fc89..469faff405 100644 > --- a/drivers/net/cnxk/cnxk_stats.c > +++ b/drivers/net/cnxk/cnxk_stats.c > @@ -316,6 +316,8 @@ cnxk_nix_xstats_reset(struct rte_eth_dev *eth_dev) > goto exit; > } > > + /* Reset MAC stats */ > + rc = roc_nix_mac_stats_reset(nix); > exit: > return rc; > } > -- > 2.25.1 >
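For reference, the conventional form of the requested tags is a "Fixes:" line in the commit message, e.g. Fixes: 0123456789ab ("<title of the commit that introduced the issue>") followed by Cc: stable@dpdk.org, with the patch title reworded as a fix (for instance "net/cnxk: fix xstats reset"). The commit hash and titles here are placeholders, not references to an actual commit.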
Re: [PATCH 06/10] net/cnxk: add option to disable custom meta aura
On Fri, May 17, 2024 at 1:23 PM Nithin Dabilpuram wrote: > > Add option to explicitly disable custom meta aura. Currently > custom meta aura is enabled automatically when inl_cpt_channel > is set i.e inline dev is masking CHAN field in IPsec rules. > > Also decouple the custom meta aura feature from custom sa action > so that the custom sa action can independently be used. > > Signed-off-by: Nithin Dabilpuram > --- > doc/guides/nics/cnxk.rst | 13 + > drivers/common/cnxk/roc_nix_inl.c | 19 +-- > drivers/common/cnxk/roc_nix_inl.h | 1 + > drivers/common/cnxk/version.map| 1 + > drivers/net/cnxk/cnxk_ethdev.c | 5 + > drivers/net/cnxk/cnxk_ethdev.h | 3 +++ > drivers/net/cnxk/cnxk_ethdev_devargs.c | 8 +++- > 7 files changed, 43 insertions(+), 7 deletions(-) > > diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst > index f5f296ee36..99ad224efd 100644 > --- a/doc/guides/nics/cnxk.rst > +++ b/doc/guides/nics/cnxk.rst > @@ -444,6 +444,19 @@ Runtime Config Options > With the above configuration, driver would enable packet inject from ARM > cores > to crypto to process and send back in Rx path. > > +- ``Disable custom meta aura feature`` (default ``0``) > + > + Custom meta aura i.e 1:N meta aura is enabled for second pass traffic by > default when > + ``inl_cpt_channel`` devarg is provided. Provide an option to disable the > custom > + meta aura feature by setting devarg ``custom_meta_aura_dis`` to ``1``. Update release notes for PMD section for this new feature. > + > + For example:: > + > + -a 0002:02:00.0,custom_meta_aura_dis=1 > + > + With the above configuration, driver would disable custom meta aura > feature for > + ``0002:02:00.0`` ethdev. > + > .. note::
RE: [PATCH v3 0/7] Fix outer UDP checksum for Intel nics
> -Original Message- > From: David Marchand > Sent: Thursday, April 18, 2024 11:20 AM > To: dev@dpdk.org > Cc: NBU-Contact-Thomas Monjalon (EXTERNAL) ; > ferruh.yi...@amd.com > Subject: [PATCH v3 0/7] Fix outer UDP checksum for Intel nics > > This series aims at fixing outer UDP checksum for Intel nics (i40e and > ice). > The net/hns3 is really similar in its internals and has been aligned. > > As I touched testpmd csumonly engine, this series may break other > vendors outer offloads, so please vendors, review and test this ASAP. > > Thanks. > > > -- Hello, I have tested the patchset on ConnectX-6 Dx as suggested by Thomas and didn't see failures in our functional tests caused by the changes. Tested-by: Ali Alnubani Regards, Ali
Including contigmem in core dumps
I've been wondering why we exclude memory allocated by eal_get_virtual_area() from core dumps? (More specifically, it calls eal_mem_set_dump() to call madvise() to disable core dumps from the allocated region.) On many occasions, when debugging after a crash, it would have been very convenient to be able to see the contents of an mbuf or other object allocated in contigmem space. And we often avoid using the rte memory allocator just because of this. Is there any reason for this, or could it perhaps be a compile-time configuration option not to call madvise()?
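For reference, a minimal sketch of the mechanism in question and how a region could be re-included, assuming Linux madvise() semantics (the helper name is hypothetical):

    #include <stddef.h>
    #include <sys/mman.h>

    /* EAL marks reserved ranges with MADV_DONTDUMP (via eal_mem_set_dump()),
     * which excludes them from core files. MADV_DODUMP undoes that, so a
     * build-time option skipping the DONTDUMP call -- or a call like this on
     * selected ranges -- would make mbufs visible in core dumps again. */
    static void
    include_region_in_core_dump(void *addr, size_t len)
    {
    #ifdef MADV_DODUMP
            madvise(addr, len, MADV_DODUMP);
    #endif
    }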
[PATCH 0/2] support new firmware name scheme
This patch series refactors the firmware load logic and adds support for a fourth firmware name scheme; the new name scheme takes the third priority in the lookup order. Chaoyong He (2): net/nfp: refactor the firmware load logic net/nfp: support new firmware name scheme drivers/net/nfp/nfp_ethdev.c | 73 ++-- 1 file changed, 36 insertions(+), 37 deletions(-) -- 2.39.1
[PATCH 1/2] net/nfp: refactor the firmware load logic
Refactor the firmware load logic, make it more specific and clear. Signed-off-by: Chaoyong He Reviewed-by: Long Wu Reviewed-by: Peng Zhang --- drivers/net/nfp/nfp_ethdev.c | 66 1 file changed, 29 insertions(+), 37 deletions(-) diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c index cdc946faff..771137db92 100644 --- a/drivers/net/nfp/nfp_ethdev.c +++ b/drivers/net/nfp/nfp_ethdev.c @@ -1081,16 +1081,18 @@ nfp_net_init(struct rte_eth_dev *eth_dev, static int nfp_fw_get_name(struct rte_pci_device *dev, - struct nfp_nsp *nsp, - char *card, + struct nfp_cpp *cpp, + struct nfp_eth_table *nfp_eth_table, + struct nfp_hwinfo *hwinfo, char *fw_name, size_t fw_size) { char serial[40]; uint16_t interface; + char card_desc[100]; uint32_t cpp_serial_len; + const char *nfp_fw_model; const uint8_t *cpp_serial; - struct nfp_cpp *cpp = nfp_nsp_cpp(nsp); cpp_serial_len = nfp_cpp_serial(cpp, &cpp_serial); if (cpp_serial_len != NFP_SERIAL_LEN) @@ -1119,8 +1121,20 @@ nfp_fw_get_name(struct rte_pci_device *dev, if (access(fw_name, F_OK) == 0) return 0; + nfp_fw_model = nfp_hwinfo_lookup(hwinfo, "nffw.partno"); + if (nfp_fw_model == NULL) { + nfp_fw_model = nfp_hwinfo_lookup(hwinfo, "assembly.partno"); + if (nfp_fw_model == NULL) { + PMD_DRV_LOG(ERR, "firmware model NOT found"); + return -EIO; + } + } + /* Finally try the card type and media */ - snprintf(fw_name, fw_size, "%s/%s", DEFAULT_FW_PATH, card); + snprintf(card_desc, sizeof(card_desc), "nic_%s_%dx%d.nffw", + nfp_fw_model, nfp_eth_table->count, + nfp_eth_table->ports[0].speed / 1000); + snprintf(fw_name, fw_size, "%s/%s", DEFAULT_FW_PATH, card_desc); PMD_DRV_LOG(DEBUG, "Trying with fw file: %s", fw_name); if (access(fw_name, F_OK) == 0) return 0; @@ -1364,49 +1378,20 @@ nfp_fw_setup(struct rte_pci_device *dev, { int err; char fw_name[125]; - char card_desc[100]; struct nfp_nsp *nsp; - const char *nfp_fw_model; - - nfp_fw_model = nfp_hwinfo_lookup(hwinfo, "nffw.partno"); - if (nfp_fw_model == NULL) - nfp_fw_model = nfp_hwinfo_lookup(hwinfo, "assembly.partno"); - - if (nfp_fw_model != NULL) { - PMD_DRV_LOG(INFO, "firmware model found: %s", nfp_fw_model); - } else { - PMD_DRV_LOG(ERR, "firmware model NOT found"); - return -EIO; - } - if (nfp_eth_table->count == 0 || nfp_eth_table->count > 8) { - PMD_DRV_LOG(ERR, "NFP ethernet table reports wrong ports: %u", - nfp_eth_table->count); - return -EIO; + err = nfp_fw_get_name(dev, cpp, nfp_eth_table, hwinfo, fw_name, sizeof(fw_name)); + if (err != 0) { + PMD_DRV_LOG(ERR, "Can't find suitable firmware."); + return err; } - PMD_DRV_LOG(INFO, "NFP ethernet port table reports %u ports", - nfp_eth_table->count); - - PMD_DRV_LOG(INFO, "Port speed: %u", nfp_eth_table->ports[0].speed); - - snprintf(card_desc, sizeof(card_desc), "nic_%s_%dx%d.nffw", - nfp_fw_model, nfp_eth_table->count, - nfp_eth_table->ports[0].speed / 1000); - nsp = nfp_nsp_open(cpp); if (nsp == NULL) { PMD_DRV_LOG(ERR, "NFP error when obtaining NSP handle"); return -EIO; } - err = nfp_fw_get_name(dev, nsp, card_desc, fw_name, sizeof(fw_name)); - if (err != 0) { - PMD_DRV_LOG(ERR, "Can't find suitable firmware."); - nfp_nsp_close(nsp); - return err; - } - if (multi_pf->enabled) err = nfp_fw_reload_for_multi_pf(nsp, fw_name, cpp, dev_info, multi_pf, force_reload_fw); @@ -1853,6 +1838,13 @@ nfp_pf_init(struct rte_pci_device *pci_dev) goto hwinfo_cleanup; } + if (nfp_eth_table->count == 0 || nfp_eth_table->count > 8) { + PMD_INIT_LOG(ERR, "NFP ethernet table reports wrong ports: %u", + nfp_eth_table->count); + ret = -EIO; + goto 
eth_table_cleanup; + } + pf_dev->multi_pf.enabled = nfp_check_multi_pf_from_nsp(pci_dev, cpp); pf_dev->multi_pf.function_id = function_id; -- 2.39.1
[PATCH 2/2] net/nfp: support new firmware name scheme
Now all application firmware is independent of port speed, so there is no need to compose the firmware name from the media info. This reduces the number of symlinks needed for firmware files. The logic of building the firmware name from media info is still kept for compatibility. Signed-off-by: Chaoyong He Reviewed-by: Long Wu Reviewed-by: Peng Zhang --- drivers/net/nfp/nfp_ethdev.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c index 771137db92..74d4a726df 100644 --- a/drivers/net/nfp/nfp_ethdev.c +++ b/drivers/net/nfp/nfp_ethdev.c @@ -1130,6 +1130,13 @@ nfp_fw_get_name(struct rte_pci_device *dev, } } + /* And then try the model name */ + snprintf(card_desc, sizeof(card_desc), "%s.nffw", nfp_fw_model); + snprintf(fw_name, fw_size, "%s/%s", DEFAULT_FW_PATH, card_desc); + PMD_DRV_LOG(DEBUG, "Trying with fw file: %s", fw_name); + if (access(fw_name, F_OK) == 0) + return 0; + /* Finally try the card type and media */ snprintf(card_desc, sizeof(card_desc), "nic_%s_%dx%d.nffw", nfp_fw_model, nfp_eth_table->count, -- 2.39.1
[PATCH 0/4] support generic flow item
This patch series adds support for some generic flow items, namely flow items with a NULL 'item->spec' value, including: - ETH flow item - TCP flow item - UDP flow item - SCTP flow item Chaoyong He (4): net/nfp: support generic ETH flow item net/nfp: support generic TCP flow item net/nfp: support generic UDP flow item net/nfp: support generic SCTP flow item drivers/net/nfp/flower/nfp_flower_flow.c | 126 +++ 1 file changed, 83 insertions(+), 43 deletions(-) -- 2.39.1
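For context, a minimal sketch of the kind of rte_flow pattern these patches enable, where items carry no spec/mask and therefore match any value:

    #include <rte_flow.h>

    /* "Any UDP-over-IPv4 packet": every item below has a NULL spec/mask,
     * which is the generic form handled by this series. */
    static const struct rte_flow_item generic_udp_pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_UDP },
            { .type = RTE_FLOW_ITEM_TYPE_END },
    };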
[PATCH 1/4] net/nfp: support generic ETH flow item
Add support of ETH flow item with a NULL 'item->spec' value. Signed-off-by: Chaoyong He Reviewed-by: Long Wu Reviewed-by: Peng Zhang --- drivers/net/nfp/flower/nfp_flower_flow.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/nfp/flower/nfp_flower_flow.c b/drivers/net/nfp/flower/nfp_flower_flow.c index ae3f25e410..098a714ea5 100644 --- a/drivers/net/nfp/flower/nfp_flower_flow.c +++ b/drivers/net/nfp/flower/nfp_flower_flow.c @@ -1230,10 +1230,9 @@ nfp_flow_merge_eth(__rte_unused struct nfp_app_fw_flower *app_fw_flower, } eth->mpls_lse = 0; - -eth_end: *mbuf_off += sizeof(struct nfp_flower_mac_mpls); +eth_end: return 0; } -- 2.39.1
[PATCH 2/4] net/nfp: support generic TCP flow item
Add support of TCP flow item with a NULL 'item->spec' value. Signed-off-by: Chaoyong He Reviewed-by: Long Wu Reviewed-by: Peng Zhang --- drivers/net/nfp/flower/nfp_flower_flow.c | 45 +++- 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/drivers/net/nfp/flower/nfp_flower_flow.c b/drivers/net/nfp/flower/nfp_flower_flow.c index 098a714ea5..810f55f805 100644 --- a/drivers/net/nfp/flower/nfp_flower_flow.c +++ b/drivers/net/nfp/flower/nfp_flower_flow.c @@ -1316,11 +1316,6 @@ nfp_flow_merge_ipv4(__rte_unused struct nfp_app_fw_flower *app_fw_flower, ipv4_udp_tun->ipv4.dst = hdr->dst_addr; } } else { - if (spec == NULL) { - PMD_DRV_LOG(DEBUG, "nfp flow merge ipv4: no item->spec!"); - goto ipv4_end; - } - /* * Reserve space for L4 info. * rte_flow has ipv4 before L4 but NFP flower fw requires L4 before ipv4. @@ -1328,6 +1323,11 @@ nfp_flow_merge_ipv4(__rte_unused struct nfp_app_fw_flower *app_fw_flower, if ((meta_tci->nfp_flow_key_layer & NFP_FLOWER_LAYER_TP) != 0) *mbuf_off += sizeof(struct nfp_flower_tp_ports); + if (spec == NULL) { + PMD_DRV_LOG(DEBUG, "nfp flow merge ipv4: no item->spec!"); + goto ipv4_end; + } + hdr = is_mask ? &mask->hdr : &spec->hdr; ipv4 = (struct nfp_flower_ipv4 *)*mbuf_off; @@ -1399,11 +1399,6 @@ nfp_flow_merge_ipv6(__rte_unused struct nfp_app_fw_flower *app_fw_flower, sizeof(ipv6_udp_tun->ipv6.ipv6_dst)); } } else { - if (spec == NULL) { - PMD_DRV_LOG(DEBUG, "nfp flow merge ipv6: no item->spec!"); - goto ipv6_end; - } - /* * Reserve space for L4 info. * rte_flow has ipv6 before L4 but NFP flower fw requires L4 before ipv6. @@ -1411,6 +1406,11 @@ nfp_flow_merge_ipv6(__rte_unused struct nfp_app_fw_flower *app_fw_flower, if ((meta_tci->nfp_flow_key_layer & NFP_FLOWER_LAYER_TP) != 0) *mbuf_off += sizeof(struct nfp_flower_tp_ports); + if (spec == NULL) { + PMD_DRV_LOG(DEBUG, "nfp flow merge ipv6: no item->spec!"); + goto ipv6_end; + } + hdr = is_mask ? &mask->hdr : &spec->hdr; vtc_flow = rte_be_to_cpu_32(hdr->vtc_flow); ipv6 = (struct nfp_flower_ipv6 *)*mbuf_off; @@ -1445,23 +1445,34 @@ nfp_flow_merge_tcp(__rte_unused struct nfp_app_fw_flower *app_fw_flower, const struct rte_flow_item_tcp *mask; struct nfp_flower_meta_tci *meta_tci; - spec = item->spec; - if (spec == NULL) { - PMD_DRV_LOG(DEBUG, "nfp flow merge tcp: no item->spec!"); - return 0; - } - meta_tci = (struct nfp_flower_meta_tci *)nfp_flow->payload.unmasked_data; if ((meta_tci->nfp_flow_key_layer & NFP_FLOWER_LAYER_IPV4) != 0) { ipv4 = (struct nfp_flower_ipv4 *) (*mbuf_off - sizeof(struct nfp_flower_ipv4)); + if (is_mask) + ipv4->ip_ext.proto = 0xFF; + else + ipv4->ip_ext.proto = IPPROTO_TCP; ports = (struct nfp_flower_tp_ports *) ((char *)ipv4 - sizeof(struct nfp_flower_tp_ports)); - } else { /* IPv6 */ + } else if ((meta_tci->nfp_flow_key_layer & NFP_FLOWER_LAYER_IPV6) != 0) { ipv6 = (struct nfp_flower_ipv6 *) (*mbuf_off - sizeof(struct nfp_flower_ipv6)); + if (is_mask) + ipv6->ip_ext.proto = 0xFF; + else + ipv6->ip_ext.proto = IPPROTO_TCP; ports = (struct nfp_flower_tp_ports *) ((char *)ipv6 - sizeof(struct nfp_flower_tp_ports)); + } else { + PMD_DRV_LOG(ERR, "nfp flow merge tcp: no L3 layer!"); + return -EINVAL; + } + + spec = item->spec; + if (spec == NULL) { + PMD_DRV_LOG(DEBUG, "nfp flow merge tcp: no item->spec!"); + return 0; } mask = item->mask ? item->mask : proc->mask_default; -- 2.39.1
[PATCH 3/4] net/nfp: support generic UDP flow item
Add support of UDP flow item with a NULL 'item->spec' value. Signed-off-by: Chaoyong He Reviewed-by: Long Wu Reviewed-by: Peng Zhang --- drivers/net/nfp/flower/nfp_flower_flow.c | 41 1 file changed, 28 insertions(+), 13 deletions(-) diff --git a/drivers/net/nfp/flower/nfp_flower_flow.c b/drivers/net/nfp/flower/nfp_flower_flow.c index 810f55f805..4cbdfd02b8 100644 --- a/drivers/net/nfp/flower/nfp_flower_flow.c +++ b/drivers/net/nfp/flower/nfp_flower_flow.c @@ -1522,18 +1522,13 @@ nfp_flow_merge_udp(__rte_unused struct nfp_app_fw_flower *app_fw_flower, bool is_mask, bool is_outer_layer) { - char *ports_off; struct nfp_flower_tp_ports *ports; + struct nfp_flower_ipv4 *ipv4 = NULL; + struct nfp_flower_ipv6 *ipv6 = NULL; const struct rte_flow_item_udp *spec; const struct rte_flow_item_udp *mask; struct nfp_flower_meta_tci *meta_tci; - spec = item->spec; - if (spec == NULL) { - PMD_DRV_LOG(DEBUG, "nfp flow merge udp: no item->spec!"); - return 0; - } - /* Don't add L4 info if working on a inner layer pattern */ if (!is_outer_layer) { PMD_DRV_LOG(INFO, "Detected inner layer UDP, skipping."); @@ -1542,13 +1537,33 @@ nfp_flow_merge_udp(__rte_unused struct nfp_app_fw_flower *app_fw_flower, meta_tci = (struct nfp_flower_meta_tci *)nfp_flow->payload.unmasked_data; if ((meta_tci->nfp_flow_key_layer & NFP_FLOWER_LAYER_IPV4) != 0) { - ports_off = *mbuf_off - sizeof(struct nfp_flower_ipv4) - - sizeof(struct nfp_flower_tp_ports); - } else {/* IPv6 */ - ports_off = *mbuf_off - sizeof(struct nfp_flower_ipv6) - - sizeof(struct nfp_flower_tp_ports); + ipv4 = (struct nfp_flower_ipv4 *) + (*mbuf_off - sizeof(struct nfp_flower_ipv4)); + if (is_mask) + ipv4->ip_ext.proto = 0xFF; + else + ipv4->ip_ext.proto = IPPROTO_UDP; + ports = (struct nfp_flower_tp_ports *) + ((char *)ipv4 - sizeof(struct nfp_flower_tp_ports)); + } else if ((meta_tci->nfp_flow_key_layer & NFP_FLOWER_LAYER_IPV6) != 0) { + ipv6 = (struct nfp_flower_ipv6 *) + (*mbuf_off - sizeof(struct nfp_flower_ipv6)); + if (is_mask) + ipv6->ip_ext.proto = 0xFF; + else + ipv6->ip_ext.proto = IPPROTO_UDP; + ports = (struct nfp_flower_tp_ports *) + ((char *)ipv6 - sizeof(struct nfp_flower_tp_ports)); + } else { + PMD_DRV_LOG(ERR, "nfp flow merge udp: no L3 layer!"); + return -EINVAL; + } + + spec = item->spec; + if (spec == NULL) { + PMD_DRV_LOG(DEBUG, "nfp flow merge udp: no item->spec!"); + return 0; } - ports = (struct nfp_flower_tp_ports *)ports_off; mask = item->mask ? item->mask : proc->mask_default; if (is_mask) { -- 2.39.1
[PATCH 4/4] net/nfp: support generic SCTP flow item
Add support of SCTP flow item with a NULL 'item->spec' value. Signed-off-by: Chaoyong He Reviewed-by: Long Wu Reviewed-by: Peng Zhang --- drivers/net/nfp/flower/nfp_flower_flow.c | 37 +--- 1 file changed, 26 insertions(+), 11 deletions(-) diff --git a/drivers/net/nfp/flower/nfp_flower_flow.c b/drivers/net/nfp/flower/nfp_flower_flow.c index 4cbdfd02b8..bd77807db0 100644 --- a/drivers/net/nfp/flower/nfp_flower_flow.c +++ b/drivers/net/nfp/flower/nfp_flower_flow.c @@ -1586,28 +1586,43 @@ nfp_flow_merge_sctp(__rte_unused struct nfp_app_fw_flower *app_fw_flower, bool is_mask, __rte_unused bool is_outer_layer) { - char *ports_off; struct nfp_flower_tp_ports *ports; + struct nfp_flower_ipv4 *ipv4 = NULL; + struct nfp_flower_ipv6 *ipv6 = NULL; struct nfp_flower_meta_tci *meta_tci; const struct rte_flow_item_sctp *spec; const struct rte_flow_item_sctp *mask; + meta_tci = (struct nfp_flower_meta_tci *)nfp_flow->payload.unmasked_data; + if ((meta_tci->nfp_flow_key_layer & NFP_FLOWER_LAYER_IPV4) != 0) { + ipv4 = (struct nfp_flower_ipv4 *) + (*mbuf_off - sizeof(struct nfp_flower_ipv4)); + if (is_mask) + ipv4->ip_ext.proto = 0xFF; + else + ipv4->ip_ext.proto = IPPROTO_SCTP; + ports = (struct nfp_flower_tp_ports *) + ((char *)ipv4 - sizeof(struct nfp_flower_tp_ports)); + } else if ((meta_tci->nfp_flow_key_layer & NFP_FLOWER_LAYER_IPV6) != 0) { + ipv6 = (struct nfp_flower_ipv6 *) + (*mbuf_off - sizeof(struct nfp_flower_ipv6)); + if (is_mask) + ipv6->ip_ext.proto = 0xFF; + else + ipv6->ip_ext.proto = IPPROTO_SCTP; + ports = (struct nfp_flower_tp_ports *) + ((char *)ipv6 - sizeof(struct nfp_flower_tp_ports)); + } else { + PMD_DRV_LOG(ERR, "nfp flow merge sctp: no L3 layer!"); + return -EINVAL; + } + spec = item->spec; if (spec == NULL) { PMD_DRV_LOG(DEBUG, "nfp flow merge sctp: no item->spec!"); return 0; } - meta_tci = (struct nfp_flower_meta_tci *)nfp_flow->payload.unmasked_data; - if ((meta_tci->nfp_flow_key_layer & NFP_FLOWER_LAYER_IPV4) != 0) { - ports_off = *mbuf_off - sizeof(struct nfp_flower_ipv4) - - sizeof(struct nfp_flower_tp_ports); - } else { /* IPv6 */ - ports_off = *mbuf_off - sizeof(struct nfp_flower_ipv6) - - sizeof(struct nfp_flower_tp_ports); - } - ports = (struct nfp_flower_tp_ports *)ports_off; - mask = item->mask ? item->mask : proc->mask_default; if (is_mask) { ports->port_src = mask->hdr.src_port; -- 2.39.1
[PATCH] eal: speed up dpdk init time
If we have a lot of huge pages in system, the memory init will cost long time in legacy-mem mode. For example, we have 120G memory in unit of 2MB hugepage, the env init will cost 43s. Almost half of time spent on find_numasocket, since the address in /proc/self/numa_maps is orderd, we can sort hugepg_tbl by orig_va first and then just read numa_maps line by line is enough to find socket. In my test, spent time reduced to 19s. Signed-off-by: Fengnan Chang --- lib/eal/linux/eal_memory.c | 115 +++-- 1 file changed, 72 insertions(+), 43 deletions(-) diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c index 45879ca743..28cc136ac0 100644 --- a/lib/eal/linux/eal_memory.c +++ b/lib/eal/linux/eal_memory.c @@ -414,7 +414,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi, static int find_numasocket(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi) { - int socket_id; + int socket_id = -1; char *end, *nodestr; unsigned i, hp_count = 0; uint64_t virt_addr; @@ -432,54 +432,61 @@ find_numasocket(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi) snprintf(hugedir_str, sizeof(hugedir_str), "%s/%s", hpi->hugedir, eal_get_hugefile_prefix()); - /* parse numa map */ - while (fgets(buf, sizeof(buf), f) != NULL) { - - /* ignore non huge page */ - if (strstr(buf, " huge ") == NULL && + /* if we find this page in our mappings, set socket_id */ + for (i = 0; i < hpi->num_pages[0]; i++) { + void *va = NULL; + /* parse numa map */ + while (fgets(buf, sizeof(buf), f) != NULL) { + if (strstr(buf, " huge ") == NULL && strstr(buf, hugedir_str) == NULL) - continue; - - /* get zone addr */ - virt_addr = strtoull(buf, &end, 16); - if (virt_addr == 0 || end == buf) { - EAL_LOG(ERR, "%s(): error in numa_maps parsing", __func__); - goto error; - } + continue; + /* get zone addr */ + virt_addr = strtoull(buf, &end, 16); + if (virt_addr == 0 || end == buf) { + EAL_LOG(ERR, "error in numa_maps parsing"); + goto error; + } - /* get node id (socket id) */ - nodestr = strstr(buf, " N"); - if (nodestr == NULL) { - EAL_LOG(ERR, "%s(): error in numa_maps parsing", __func__); - goto error; - } - nodestr += 2; - end = strstr(nodestr, "="); - if (end == NULL) { - EAL_LOG(ERR, "%s(): error in numa_maps parsing", __func__); - goto error; - } - end[0] = '\0'; - end = NULL; + /* get node id (socket id) */ + nodestr = strstr(buf, " N"); + if (nodestr == NULL) { + EAL_LOG(ERR, "error in numa_maps parsing"); + goto error; + } + nodestr += 2; + end = strstr(nodestr, "="); + if (end == NULL) { + EAL_LOG(ERR, "error in numa_maps parsing"); + goto error; + } + end[0] = '\0'; + end = NULL; - socket_id = strtoul(nodestr, &end, 0); - if ((nodestr[0] == '\0') || (end == NULL) || (*end != '\0')) { - EAL_LOG(ERR, "%s(): error in numa_maps parsing", __func__); - goto error; + socket_id = strtoul(nodestr, &end, 0); + if ((nodestr[0] == '\0') || (end == NULL) || (*end != '\0')) { + EAL_LOG(ERR, "error in numa_maps parsing"); + goto error; + } + va = (void *)(unsigned long)virt_addr; + if (hugepg_tbl[i].orig_va != va) { + EAL_LOG(DEBUG, "search %p not seq, let's start from begin", + hugepg_tbl[i].orig_va); + fseek(f, 0, SEEK_SET); + } else { + break; + } } - - /* if we find this page in our mappings, set socket_id */ - for (i = 0; i < hpi->num_pages[0]; i++) { - void *va = (void *)(unsigned long)virt_addr; -
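For illustration, a minimal sketch of the ordering step the commit message relies on (a stand-in struct is used here for self-containment; the real table entries are the EAL-internal struct hugepage_file with its orig_va member):

    #include <stdint.h>
    #include <stdlib.h>

    /* Stand-in for the one field used below. */
    struct hp_entry { void *orig_va; };

    /* Sort the hugepage table by original VA so it can be matched against
     * the address-ordered /proc/self/numa_maps in a single pass. */
    static int
    cmp_orig_va(const void *a, const void *b)
    {
            uintptr_t va_a = (uintptr_t)((const struct hp_entry *)a)->orig_va;
            uintptr_t va_b = (uintptr_t)((const struct hp_entry *)b)->orig_va;

            return (va_a > va_b) - (va_a < va_b);
    }

    /* usage: qsort(tbl, nr_pages, sizeof(tbl[0]), cmp_orig_va); */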
[PATCH v6] eal/x86: improve rte_memcpy const size 16 performance
When the rte_memcpy() size is 16, the same 16 bytes are copied twice. In the case where the size is known to be 16 at build time, omit the duplicate copy. Reduced the amount of effectively copy-pasted code by using #ifdef inside functions instead of outside functions. Depends-on: series-31578 ("provide toolchain abstracted __builtin_constant_p") Suggested-by: Stephen Hemminger Signed-off-by: Morten Brørup Acked-by: Bruce Richardson --- v6: * Don't wrap depends on line. It seems not to have been understood. v5: * Fix for building with MSVC: Use __rte_constant() instead of __builtin_constant_p(). Add dependency on patch providing __rte_constant(). v4: * There are no problems compiling AVX2, only AVX. (Bruce Richardson) v3: * AVX2 is a superset of AVX; for a block of AVX code, testing for AVX suffices. (Bruce Richardson) * Define RTE_MEMCPY_AVX if AVX is available, to avoid copy-pasting the check for older GCC version. (Bruce Richardson) v2: * For GCC, version 11 is required for proper AVX handling; if older GCC version, treat AVX as SSE. Clang does not have this issue. Note: Original code always treated AVX as SSE, regardless of compiler. * Do not add copyright. (Stephen Hemminger) --- lib/eal/x86/include/rte_memcpy.h | 239 +-- 1 file changed, 64 insertions(+), 175 deletions(-) diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h index 72a92290e0..1619a8f296 100644 --- a/lib/eal/x86/include/rte_memcpy.h +++ b/lib/eal/x86/include/rte_memcpy.h @@ -27,6 +27,16 @@ extern "C" { #pragma GCC diagnostic ignored "-Wstringop-overflow" #endif +/* + * GCC older than version 11 doesn't compile AVX properly, so use SSE instead. + * There are no problems with AVX2. + */ +#if defined __AVX2__ +#define RTE_MEMCPY_AVX +#elif defined __AVX__ && !(defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 11)) +#define RTE_MEMCPY_AVX +#endif + /** * Copy bytes from one location to another. The locations must not overlap. * @@ -91,14 +101,6 @@ rte_mov15_or_less(void *dst, const void *src, size_t n) return ret; } -#if defined __AVX512F__ && defined RTE_MEMCPY_AVX512 - -#define ALIGNMENT_MASK 0x3F - -/** - * AVX512 implementation below - */ - /** * Copy 16 bytes from one location to another, * locations should not overlap. 
@@ -119,10 +121,15 @@ rte_mov16(uint8_t *dst, const uint8_t *src) static __rte_always_inline void rte_mov32(uint8_t *dst, const uint8_t *src) { +#if defined RTE_MEMCPY_AVX __m256i ymm0; ymm0 = _mm256_loadu_si256((const __m256i *)src); _mm256_storeu_si256((__m256i *)dst, ymm0); +#else /* SSE implementation */ + rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16); + rte_mov16((uint8_t *)dst + 1 * 16, (const uint8_t *)src + 1 * 16); +#endif } /** @@ -132,10 +139,15 @@ rte_mov32(uint8_t *dst, const uint8_t *src) static __rte_always_inline void rte_mov64(uint8_t *dst, const uint8_t *src) { +#if defined __AVX512F__ && defined RTE_MEMCPY_AVX512 __m512i zmm0; zmm0 = _mm512_loadu_si512((const void *)src); _mm512_storeu_si512((void *)dst, zmm0); +#else /* AVX2, AVX & SSE implementation */ + rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32); + rte_mov32((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32); +#endif } /** @@ -156,12 +168,18 @@ rte_mov128(uint8_t *dst, const uint8_t *src) static __rte_always_inline void rte_mov256(uint8_t *dst, const uint8_t *src) { - rte_mov64(dst + 0 * 64, src + 0 * 64); - rte_mov64(dst + 1 * 64, src + 1 * 64); - rte_mov64(dst + 2 * 64, src + 2 * 64); - rte_mov64(dst + 3 * 64, src + 3 * 64); + rte_mov128(dst + 0 * 128, src + 0 * 128); + rte_mov128(dst + 1 * 128, src + 1 * 128); } +#if defined __AVX512F__ && defined RTE_MEMCPY_AVX512 + +/** + * AVX512 implementation below + */ + +#define ALIGNMENT_MASK 0x3F + /** * Copy 128-byte blocks from one location to another, * locations should not overlap. @@ -231,12 +249,22 @@ rte_memcpy_generic(void *dst, const void *src, size_t n) /** * Fast way when copy size doesn't exceed 512 bytes */ + if (__rte_constant(n) && n == 32) { + rte_mov32((uint8_t *)dst, (const uint8_t *)src); + return ret; + } if (n <= 32) { rte_mov16((uint8_t *)dst, (const uint8_t *)src); + if (__rte_constant(n) && n == 16) + return ret; /* avoid (harmless) duplicate copy */ rte_mov16((uint8_t *)dst - 16 + n, (const uint8_t *)src - 16 + n); return ret; } + if (__rte_constant(n) && n == 64) { + rte_mov64((uint8_t *)dst, (const uint8_t *)src); + return ret; + } if (n <= 64) { rte_mov32((uint8_t *)dst, (const uint8_t *)src); rte_mov32((uint8_t *)dst - 32 + n, @@ -313
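The behavioural change for the constant-size case is easier to see outside the full header. Below is a minimal sketch, assuming 16 <= n <= 32 and using __builtin_constant_p() as a stand-in for the toolchain-abstracted __rte_constant() that the patch depends on; the real rte_memcpy() adds more size classes and the AVX/AVX512 paths shown in the diff.

#include <stdint.h>
#include <stddef.h>
#include <emmintrin.h>	/* SSE2 intrinsics */

static inline void
mov16(uint8_t *dst, const uint8_t *src)
{
	const __m128i xmm0 = _mm_loadu_si128((const __m128i *)src);

	_mm_storeu_si128((__m128i *)dst, xmm0);
}

/* Copy n bytes, 16 <= n <= 32, with two possibly overlapping 16-byte moves. */
static inline void *
copy_16_to_32(void *dst, const void *src, size_t n)
{
	void *ret = dst;

	mov16(dst, src);
	/* If n is a build-time constant equal to 16, the second move below
	 * would copy exactly the same 16 bytes again, so let the compiler
	 * drop it; for runtime sizes the overlap covers bytes 17..n. */
	if (__builtin_constant_p(n) && n == 16)
		return ret;
	mov16((uint8_t *)dst - 16 + n, (const uint8_t *)src - 16 + n);
	return ret;
}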
Re: Including contigmem in core dumps
Hi Lewis, 2024-05-27 19:34 (UTC-0500), Lewis Donzis: > I've been wondering why we exclude memory allocated by eal_get_virtual_area() > from core dumps? (More specifically, it calls eal_mem_set_dump() to call > madvise() to disable core dumps from the allocated region.) > > On many occasions, when debugging after a crash, it would have been very > convenient to be able to see the contents of an mbuf or other object > allocated in contigmem space. And we often avoid using the rte memory > allocator just because of this. > > Is there any reason for this, or could it perhaps be a compile-time > configuration option not to call madvise()? Memory reserved by eal_get_virtual_area() is not yet useful, but it is very large, so by excluding it from dumps, DPDK prevents dumps from including large zero-filled parts. It also makes sense to call eal_mem_set_dump(..., false) from eal_memalloc.c:free_seg(), because of --huge-unlink=never: in this mode (Linux-only), freed segments are not cleared, so if they were included in the dump, they would add a lot of garbage data. Newly allocated hugepages are not included in dumps either, because that would make dumps very large by default. However, this could be made an opt-in runtime option if need be.
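For context, the exclusion being discussed boils down to a single madvise() call per mapping. A hedged sketch of how a runtime opt-in could look is below; madvise() with MADV_DONTDUMP/MADV_DODUMP is the real Linux interface (FreeBSD has MADV_NOCORE/MADV_CORE for the same purpose), but the dump_reserved_memory flag and the helper names here are hypothetical, not existing EAL options or functions.

#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical runtime knob; not an existing EAL parameter. */
static bool dump_reserved_memory;

static int
mem_set_dump(void *addr, size_t len, bool dump)
{
#ifdef __linux__
	return madvise(addr, len, dump ? MADV_DODUMP : MADV_DONTDUMP);
#else	/* e.g. FreeBSD */
	return madvise(addr, len, dump ? MADV_CORE : MADV_NOCORE);
#endif
}

/* Called on freshly reserved virtual areas; keep them out of core dumps
 * unless the user asked for them to be included. */
static void
mark_reserved_area(void *addr, size_t len)
{
	if (!dump_reserved_memory)
		mem_set_dump(addr, len, false);
}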