Re: Eventdev dequeue-enqueue event correlation

2023-10-25 Thread Mattias Rönnblom

On 2023-10-24 11:10, Bruce Richardson wrote:

On Tue, Oct 24, 2023 at 09:10:30AM +0100, Bruce Richardson wrote:

On Mon, Oct 23, 2023 at 06:10:54PM +0200, Mattias Rönnblom wrote:

Hi.

Consider an Eventdev app using atomic-type scheduling doing something like:

 struct rte_event events[3];

 rte_event_dequeue_burst(dev_id, port_id, events, 3, 0);

 /* Assume three events were dequeued, and the application decides
  * it's best off processing events 0 and 2 consecutively */

 process(&events[0]);
 process(&events[2]);

 events[0].queue_id++;
 events[0].op = RTE_EVENT_OP_FORWARD;
 events[2].queue_id++;
 events[2].op = RTE_EVENT_OP_FORWARD;

 rte_event_enqueue_burst(dev_id, port_id, &events[0], 1);
 rte_event_enqueue_burst(dev_id, port_id, &events[2], 1);

 process(&events[1]);
 events[1].queue_id++;
 events[1].op = RTE_EVENT_OP_FORWARD;

 rte_event_enqueue_burst(dev_id, port_id, &events[1], 1);

If one were to just read the Eventdev API spec, they might expect this to
work (especially since impl_opaque is hinted at as potentially being useful
for the purpose of identifying events).

However, on certain event devices, it doesn't (and maybe rightly so). If
events 0 and 2 belong to the same flow (queue id + flow id pair), and event
1 belongs to some other flow, then this other flow would be "unlocked" at
the point of the second enqueue operation (and would thus be processed on
some other core, in parallel). The first flow would still be needlessly
"locked".

Such event devices require the order of the enqueued events to be the same
as the order of the dequeued events, using RTE_EVENT_OP_RELEASE type events
as "fillers" for dropped events.
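
For illustration, here is a minimal sketch (mine, not from the original
mail) of what such devices expect: processing may be reordered, but the
enqueue must preserve the dequeue order, and dropped events would be
enqueued as RTE_EVENT_OP_RELEASE in their original slots. dev_id, port_id
and process() are reused from the snippet above, and all three events are
assumed dequeued and forwarded:

 struct rte_event events[3];
 uint16_t i, n;

 n = rte_event_dequeue_burst(dev_id, port_id, events, 3, 0);

 /* Process in whatever order the application prefers... */
 process(&events[0]);
 process(&events[2]);
 process(&events[1]);

 /* ...but enqueue a single burst in the original dequeue order. */
 for (i = 0; i < n; i++) {
         events[i].queue_id++;
         events[i].op = RTE_EVENT_OP_FORWARD;
 }

 rte_event_enqueue_burst(dev_id, port_id, events, n);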

Am I missing something in the Eventdev API documentation?



Much more likely is that the documentation is missing something. We should
explicitly clarify this behaviour, as it's required by a number of drivers.


Could an event device use the impl_opaque field to track the identity of an
event (and thus relax the ordering requirements) and still be compliant
with the API?



Possibly, but the documentation also doesn't state that the impl_opaque
field must be preserved between dequeue and enqueue. When forwarding a
packet, it's entirely possible for an app to extract an mbuf from a dequeued
event and create a new event for sending it back in to the eventdev. For


Such a behavior would be in violation of a part of the Eventdev API 
contract that actually is specified. The rte_event struct documentation 
says about impl_opaque that "An implementation may use this field to hold 
implementation specific value to share between dequeue and enqueue 
operation. The application should not modify this field."


I see no other way to read this than that "an implementation" here is 
referring to an event device PMD. The requirement that the application 
can't modify this field only makes sense in the context of "from dequeue 
to enqueue".
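
Purely as a sketch of that reading (no specific PMD is claimed to work
this way; MAX_HIST, hist, and the two helpers are hypothetical names): a
PMD could stamp the field at dequeue and use it at enqueue to locate the
event's slot in the port's history list, instead of relying on strict
enqueue ordering:

 #include <stdbool.h>
 #include <stdint.h>
 #include <rte_eventdev.h>

 #define MAX_HIST 16

 /* Hypothetical PMD-internal history list for one port. */
 static struct { uint32_t flow; bool outstanding; } hist[MAX_HIST];

 /* Dequeue path: stamp the event with its history slot. */
 static void stamp(struct rte_event *ev, uint8_t slot)
 {
         hist[slot].outstanding = true;
         ev->impl_opaque = slot;
 }

 /* Enqueue path: find the slot again, regardless of enqueue order. */
 static void complete(const struct rte_event *ev)
 {
         hist[ev->impl_opaque].outstanding = false;
 }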



example, if the first stage post-RX is doing classify, it's entirely
possible for every single field in the event header to be different for the
event returned compared to the one dequeued (flow_id recomputed, event
type/source adjusted, target queue_id and priority updated, op type changed
from new to forward, etc.).


What happens if an RTE_EVENT_OP_NEW event is inserted into the mix of
OP_FORWARD and OP_RELEASE type events being enqueued? Again, I'm not clear
on what the API says, if anything.


OP_NEW should have no effect on the "history list" of events previously
dequeued. Again, our docs should clarify that explicitly. Thanks for
calling all this out.


Looking at the docs we have, I would propose adding a new subsection "Event
Operations", as section 49.1.6 to [1]. There we could explain "New",
"Forward" and "Release" events - what they mean for the different queue
types and how to use them. That section could also cover the enqueue
ordering rules, as the use of event "history" is necessary to explain
releases and forwards.

Does this seem reasonable? If nobody else has already started on updating
the docs for this, I'm happy enough to give it a stab.



Batch dequeues not only provide an opportunity to amortize 
per-interaction overhead with the event device, they also allow the 
application to reshuffle the order in which it decides to process the 
events.


Such reshuffling may have a very significant impact on performance. At a 
minimum, cache locality improves, and in case the app is able to do "vector 
processing" (e.g., something akin to what fd.io VPP does), the gains may 
be further increased.
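
As an illustration (my sketch, not from the original mail), an app could
process a dequeued burst grouped by flow while keeping the events array
itself in dequeue order, so that a later in-order enqueue still works on
devices with the ordering requirement discussed above. flow_of() and
process() are hypothetical application helpers:

 #define BURST 32

 struct rte_event events[BURST];
 uint16_t order[BURST];
 uint16_t i, j, n;

 n = rte_event_dequeue_burst(dev_id, port_id, events, BURST, 0);

 /* Sort an index array by flow; events[] itself stays in dequeue
  * order. Insertion sort is adequate for burst-sized arrays.
  */
 for (i = 0; i < n; i++)
         order[i] = i;
 for (i = 1; i < n; i++)
         for (j = i; j > 0 && flow_of(&events[order[j - 1]]) >
                         flow_of(&events[order[j]]); j--) {
                 uint16_t tmp = order[j];

                 order[j] = order[j - 1];
                 order[j - 1] = tmp;
         }

 /* Same-flow events are now processed back-to-back. */
 for (i = 0; i < n; i++)
         process(&events[order[i]]);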


One may argue the app/core should just "do what it's told" by the event 
device. After all, an event device is a work scheduler, and reshuffling 
items of work certainly counts as (micro-)scheduling work.


However, it is a lot to ask of a fairly generic function, 
especially if it comes in the form of hardware, with a design frozen 
years ago, to be able to arrange the work in whatever order is currently 
optimal for one particular 

[PATCH] net/mlx5: fix device checking for send to kernel action

2023-10-25 Thread Jiawei Wang
The previous commit extended the send to kernel support to the FDB table.
The action creation failed in the MLX5 core kernel module, because the
VF/SF ports do NOT belong to E-Switch mode.
The failure caused the kernel to get stuck with older MLX5 core
kernel versions.

This patch adds a check to avoid creating the action for the VF/SF ports
on the FDB table.

Fixes: b2cd39187cd4 ("net/mlx5: extend send to kernel action support")

Signed-off-by: Jiawei Wang 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/mlx5_flow_hw.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 6fcf654e4a..89b6f546ae 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -5957,8 +5957,11 @@ flow_hw_create_send_to_kernel_actions(struct mlx5_priv 
*priv __rte_unused)
 #ifdef HAVE_MLX5DV_DR_ACTION_CREATE_DEST_ROOT_TABLE
int action_flag;
int i;
+   bool is_vf_sf_dev = priv->sh->dev_cap.vf || priv->sh->dev_cap.sf;
 
for (i = MLX5DR_TABLE_TYPE_NIC_RX; i < MLX5DR_TABLE_TYPE_MAX; i++) {
+   if (is_vf_sf_dev && MLX5DR_TABLE_TYPE_FDB == i)
+   continue;
action_flag = mlx5_hw_act_flag[1][i];
priv->hw_send_to_kernel[i] =
mlx5dr_action_create_dest_root(priv->dr_ctx,
-- 
2.18.1



RE: [PATCH] net/mlx5: fix device checking for send to kernel action

2023-10-25 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Jiawei(Jonny) Wang 
> Sent: Wednesday, October 25, 2023 10:49 AM
> To: Suanming Mou ; Slava Ovsiienko
> 
> Cc: dev@dpdk.org; Raslan Darawsheh 
> Subject: [PATCH] net/mlx5: fix device checking for send to kernel action
> 
> The previous commit extended the send to kernel support to the FDB table.
> The action creation failed in the MLX5 core kernel module, because the VF/SF
> ports do NOT belong to E-Switch mode.
> The failure caused the kernel to get stuck with older MLX5 core kernel versions.
> 
> This patch adds a check to avoid creating the action for the VF/SF ports on
> the FDB table.
> 
> Fixes: b2cd39187cd4 ("net/mlx5: extend send to kernel action support")
> 
> Signed-off-by: Jiawei Wang 
> Acked-by: Suanming Mou 

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


Re: [PATCH] net/iavf: fix IAVF_TX_OFFLOAD_MASK definition

2023-10-25 Thread Radu Nicolau



On 25-Oct-23 12:30 AM, Zhang, Qi Z wrote:



-Original Message-
From: Nicolau, Radu 
Sent: Tuesday, October 24, 2023 10:49 PM
To: Zhang, Qi Z ; Marchand, David

Cc: Wu, Jingjing ; Xing, Beilei ;
dev@dpdk.org; sta...@dpdk.org
Subject: Re: [PATCH] net/iavf: fix IAVF_TX_OFFLOAD_MASK definition


On 24-Oct-23 12:24 PM, Zhang, Qi Z wrote:

-Original Message-
From: Radu Nicolau 
Sent: Tuesday, October 24, 2023 6:23 PM
To: Marchand, David 
Cc: Wu, Jingjing ; Xing, Beilei
; dev@dpdk.org; sta...@dpdk.org
Subject: Re: [PATCH] net/iavf: fix IAVF_TX_OFFLOAD_MASK definition


On 24-Oct-23 10:49 AM, David Marchand wrote:

On Tue, Oct 24, 2023 at 11:13 AM Radu Nicolau


wrote:

IAVF_TX_OFFLOAD_MASK definition contained

RTE_ETH_TX_OFFLOAD_SECURITY

instead of RTE_MBUF_F_TX_SEC_OFFLOAD.

Fixes: 6bc987ecb860 ("net/iavf: support IPsec inline crypto")
Cc: sta...@dpdk.org

Signed-off-by: Radu Nicolau 

Something is not clear to me.
How was the IPsec inline crypto feature supposed to work with this
driver so far?

Any packet with the RTE_MBUF_F_TX_SEC_OFFLOAD flag should have

been

refused in iavf_prep_pkts.


It worked because the IPsec sample app doesn't call
rte_eth_tx_prepare, and from what I can see no other sample app does.

To keep it consistent, it's better to refine the

IAVF_TX_OFFLOAD_NOTSUP_MASK definition.

You mean like this?


#define IAVF_TX_OFFLOAD_NOTSUP_MASK ( \
        RTE_MBUF_F_TX_OFFLOAD_MASK ^ ( \
                RTE_MBUF_F_TX_OUTER_IPV6 | \
                RTE_MBUF_F_TX_OUTER_IPV4 | \
                RTE_MBUF_F_TX_IPV6 | \
                RTE_MBUF_F_TX_IPV4 | \
                RTE_MBUF_F_TX_VLAN | \
                RTE_MBUF_F_TX_IP_CKSUM | \
                RTE_MBUF_F_TX_L4_MASK | \
                RTE_MBUF_F_TX_TCP_SEG | \
                RTE_MBUF_F_TX_UDP_SEG | \
                RTE_MBUF_F_TX_TUNNEL_MASK | \
                RTE_MBUF_F_TX_OUTER_IP_CKSUM | \
                RTE_MBUF_F_TX_OUTER_UDP_CKSUM | \
                RTE_MBUF_F_TX_SEC_OFFLOAD))

Sorry, I misunderstood this code change; actually you didn't remove a
flag, but just replaced it, so NOTSUP_MASK does not need to be changed.

Then I don't understand why "Any packet with the RTE_MBUF_F_TX_SEC_OFFLOAD flag
should have been refused in iavf_prep_pkts".
But I assume tx_pkt_prepare should reject only invalid packets while still
functioning correctly with inline IPsec.


rte_eth_tx_prepare would have rejected the packets before this fix, but 
no sample app calls rte_eth_tx_prepare. The only app that calls it is testpmd.





RE: [PATCH v3] net/mlx5: add test for live migration

2023-10-25 Thread Rongwei Liu



BR
Rongwei

> -Original Message-
> From: Rongwei Liu 
> Sent: Monday, October 16, 2023 17:30
> To: NBU-Contact-Thomas Monjalon (EXTERNAL) 
> Cc: dev@dpdk.org; Matan Azrad ; Slava Ovsiienko
> ; Ori Kam ; Suanming Mou
> ; Raslan Darawsheh 
> Subject: RE: [PATCH v3] net/mlx5: add test for live migration
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi
> 
> BR
> Rongwei
> 
> > -Original Message-
> > From: Thomas Monjalon 
> > Sent: Monday, October 16, 2023 17:27
> > To: Rongwei Liu 
> > Cc: dev@dpdk.org; Matan Azrad ; Slava Ovsiienko
> > ; Ori Kam ; Suanming Mou
> > ; Raslan Darawsheh 
> > Subject: Re: [PATCH v3] net/mlx5: add test for live migration
> >
> > External email: Use caution opening links or attachments
> >
> >
> > 16/10/2023 10:25, Rongwei Liu:
> > > From: Thomas Monjalon 
> > > > 19/09/2023 10:12, Rongwei Liu:
> > > > > +   testpmd> mlx5 set flow_engine  []
> > > >
> > > > What are the flags?
> > > >
> > > The flag is optional and defined as a bitmap.
> > > For now, only one value is accepted: BIT(0).
> > > I don't have any idea how to propagate the value definition list here.
> > > Any suggestions?
> >
> > Just add it and give the usage of the flag or refer to another part of
> > the doc for explanation.
> Change it to: "Set the flow engine to active or standby mode with specific
> flags (bitmap style)::"
> Sound good?
> >
@NBU-Contact-Thomas Monjalon (EXTERNAL) are we good to move forward? Thanks


Re: [PATCH] net/iavf: fix IAVF_TX_OFFLOAD_MASK definition

2023-10-25 Thread David Marchand
On Wed, Oct 25, 2023 at 11:02 AM Radu Nicolau  wrote:
>
>
> On 25-Oct-23 12:30 AM, Zhang, Qi Z wrote:
> >
> >> -Original Message-
> >> From: Nicolau, Radu 
> >> Sent: Tuesday, October 24, 2023 10:49 PM
> >> To: Zhang, Qi Z ; Marchand, David
> >> 
> >> Cc: Wu, Jingjing ; Xing, Beilei 
> >> ;
> >> dev@dpdk.org; sta...@dpdk.org
> >> Subject: Re: [PATCH] net/iavf: fix IAVF_TX_OFFLOAD_MASK definition
> >>
> >>
> >> On 24-Oct-23 12:24 PM, Zhang, Qi Z wrote:
>  -Original Message-
>  From: Radu Nicolau 
>  Sent: Tuesday, October 24, 2023 6:23 PM
>  To: Marchand, David 
>  Cc: Wu, Jingjing ; Xing, Beilei
>  ; dev@dpdk.org; sta...@dpdk.org
>  Subject: Re: [PATCH] net/iavf: fix IAVF_TX_OFFLOAD_MASK definition
> 
> 
>  On 24-Oct-23 10:49 AM, David Marchand wrote:
> > On Tue, Oct 24, 2023 at 11:13 AM Radu Nicolau
> > 
>  wrote:
> >> IAVF_TX_OFFLOAD_MASK definition contained
>  RTE_ETH_TX_OFFLOAD_SECURITY
> >> instead of RTE_MBUF_F_TX_SEC_OFFLOAD.
> >>
> >> Fixes: 6bc987ecb860 ("net/iavf: support IPsec inline crypto")
> >> Cc: sta...@dpdk.org
> >>
> >> Signed-off-by: Radu Nicolau 
> > Something is not clear to me.
> > How was the IPsec inline crypto feature supposed to work with this
> > driver so far?
> >
> > Any packet with the RTE_MBUF_F_TX_SEC_OFFLOAD flag should have
> >> been
> > refused in iavf_prep_pkts.
> >
>  It worked because the IPsec sample app doesn't call
>  rte_eth_tx_prepare, and from what I can see no other sample app does.
> >>> To keep consistent, its better to refine the
> >> IAVF_TX_OFFLOAD_NOTSUP_MASK definition.
> >>
> >> You mean like this?
> >>
> >>
> >> #define IAVF_TX_OFFLOAD_NOTSUP_MASK ( \
> >>   RTE_MBUF_F_TX_OFFLOAD_MASK ^ (  \
> >>   RTE_MBUF_F_TX_OUTER_IPV6 | \
> >>   RTE_MBUF_F_TX_OUTER_IPV4 | \
> >>   RTE_MBUF_F_TX_IPV6 | \
> >>   RTE_MBUF_F_TX_IPV4 | \
> >>   RTE_MBUF_F_TX_VLAN | \
> >>   RTE_MBUF_F_TX_IP_CKSUM | \
> >>   RTE_MBUF_F_TX_L4_MASK | \
> >>   RTE_MBUF_F_TX_TCP_SEG | \
> >>   RTE_MBUF_F_TX_UDP_SEG |  \
> >>   RTE_MBUF_F_TX_TUNNEL_MASK |\
> >>   RTE_MBUF_F_TX_OUTER_IP_CKSUM |  \
> >>   RTE_MBUF_F_TX_OUTER_UDP_CKSUM | \
> >>   RTE_MBUF_F_TX_SEC_OFFLOAD))
> > Sorry, I misunderstood this code change; actually you didn't remove a
> > flag, but just replaced it, so NOTSUP_MASK does not need to be changed
> >
> > Then I don't understand why "Any packet with the RTE_MBUF_F_TX_SEC_OFFLOAD
> > flag should have been refused in iavf_prep_pkts".
> > But I assume tx_pkt_prepare should reject only invalid packets while still
> > functioning correctly with inline IPsec.
>
> rte_eth_tx_prepare would have rejected the packets before this fix, but
> no sample app calls rte_eth_tx_prepare. The only app that calls it is testpmd.

From my understanding, applications that want checksum offload are
required to call rte_eth_tx_prepare.
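
As an aside, here is a minimal sketch (mine, not from this thread) of
that pattern: run the burst through rte_eth_tx_prepare() before
rte_eth_tx_burst() and deal with rejected packets. port_id, queue_id,
pkts and nb_pkts are assumed to come from the surrounding code:

 uint16_t nb_prep, nb_tx;

 nb_prep = rte_eth_tx_prepare(port_id, queue_id, pkts, nb_pkts);
 if (nb_prep != nb_pkts)
         /* pkts[nb_prep] was rejected; rte_errno tells why. The
          * application must free or re-handle the remaining packets.
          */
         printf("tx_prepare: %s\n", rte_strerror(rte_errno));

 nb_tx = rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);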


-- 
David Marchand



[PATCH v2] net/iavf: fix Tx offloading flags check

2023-10-25 Thread Radu Nicolau
Relax the check in the previous fix to allow packets
with the security offload flag set.

Fixes: 3c715591ece0 ("net/iavf: fix checksum offloading")
Cc: sta...@dpdk.org
Cc: david.march...@redhat.com

Signed-off-by: Radu Nicolau 
---
v2: extend the check for only TX_SEC_OFFLOAD

 drivers/net/iavf/iavf_rxtx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index c6ef6af1d8..99007676a8 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -2664,7 +2664,8 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t 
*qw1,
l2tag1 |= m->vlan_tci;
}
 
-   if ((m->ol_flags & IAVF_TX_CKSUM_OFFLOAD_MASK) == 0)
+   if ((m->ol_flags &
+   (IAVF_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
goto skip_cksum;
 
/* Set MACLEN */
-- 
2.25.1



[PATCH v3] mem: allow using ASan in multi-process mode

2023-10-25 Thread Artur Paszkiewicz
Multi-process applications operate on shared hugepage memory but each
process has its own ASan shadow region which is not synchronized with
the other processes. This causes issues when different processes try to
use the same memory because they have their own view of which addresses
are valid.

Fix it by mapping the shadow regions for memseg lists as shared memory.
The primary process is responsible for creating and removing the shared
memory objects.

Disable ASan instrumentation for the code that triggers the page fault in
alloc_seg(), because if the segment is already allocated by another
process but marked as free in the shadow, accessing its address would
cause an ASan error.

Signed-off-by: Artur Paszkiewicz 
---
v3:
- Removed conditional compilation from eal_common_memory.c.
- Improved comments.
v2:
- Added checks for config options disabling multi-process support.
- Fixed missing unmap in legacy mode.

 lib/eal/common/eal_common_memory.c |   7 ++
 lib/eal/common/eal_private.h   |  35 ++
 lib/eal/linux/eal_memalloc.c   |  23 +--
 lib/eal/linux/eal_memory.c | 101 +
 lib/eal/linux/meson.build  |   4 ++
 5 files changed, 164 insertions(+), 6 deletions(-)

diff --git a/lib/eal/common/eal_common_memory.c 
b/lib/eal/common/eal_common_memory.c
index d9433db623..5daf53d4d2 100644
--- a/lib/eal/common/eal_common_memory.c
+++ b/lib/eal/common/eal_common_memory.c
@@ -263,6 +263,11 @@ eal_memseg_list_alloc(struct rte_memseg_list *msl, int 
reserve_flags)
RTE_LOG(DEBUG, EAL, "VA reserved for memseg list at %p, size %zx\n",
addr, mem_sz);
 
+   if (eal_memseg_list_map_asan_shadow(msl) != 0) {
+   RTE_LOG(ERR, EAL, "Failed to map ASan shadow region for memseg 
list");
+   return -1;
+   }
+
return 0;
 }
 
@@ -1050,6 +1055,8 @@ rte_eal_memory_detach(void)
RTE_LOG(ERR, EAL, "Could not unmap memory: 
%s\n",
rte_strerror(rte_errno));
 
+   eal_memseg_list_unmap_asan_shadow(msl);
+
/*
 * we are detaching the fbarray rather than destroying because
 * other processes might still reference this fbarray, and we
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index 5eadba4902..6535b38637 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -300,6 +300,41 @@ eal_memseg_list_alloc(struct rte_memseg_list *msl, int 
reserve_flags);
 void
 eal_memseg_list_populate(struct rte_memseg_list *msl, void *addr, int n_segs);
 
+/**
+ * Map shared memory for MSL ASan shadow region.
+ *
+ * @param msl
+ *  Memory segment list.
+ * @return
+ *  0 on success, (-1) on failure.
+ */
+#ifdef RTE_MALLOC_ASAN
+int
+eal_memseg_list_map_asan_shadow(struct rte_memseg_list *msl);
+#else
+static inline int
+eal_memseg_list_map_asan_shadow(__rte_unused struct rte_memseg_list *msl)
+{
+   return 0;
+}
+#endif
+
+/**
+ * Unmap the MSL ASan shadow region.
+ *
+ * @param msl
+ *  Memory segment list.
+ */
+#ifdef RTE_MALLOC_ASAN
+void
+eal_memseg_list_unmap_asan_shadow(struct rte_memseg_list *msl);
+#else
+static inline void
+eal_memseg_list_unmap_asan_shadow(__rte_unused struct rte_memseg_list *msl)
+{
+}
+#endif
+
 /**
  * Distribute available memory between MSLs.
  *
diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
index f8b1588cae..a4151534a8 100644
--- a/lib/eal/linux/eal_memalloc.c
+++ b/lib/eal/linux/eal_memalloc.c
@@ -511,6 +511,21 @@ resize_hugefile(int fd, uint64_t fa_offset, uint64_t 
page_sz, bool grow,
grow, dirty);
 }
 
+__rte_no_asan
+static inline void
+page_fault(void *addr)
+{
+   /* We need to trigger a write to the page to enforce page fault but we
+* can't overwrite value that is already there, so read the old value
+* and write it back. Kernel populates the page with zeroes initially.
+*
+* Disable ASan instrumentation here because if the segment is already
+* allocated by another process and is marked as free in the shadow,
+* accessing this address will cause an ASan error.
+*/
+   *(volatile int *)addr = *(volatile int *)addr;
+}
+
 static int
 alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
struct hugepage_info *hi, unsigned int list_idx,
@@ -636,12 +651,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
goto mapped;
}
 
-   /* we need to trigger a write to the page to enforce page fault and
-* ensure that page is accessible to us, but we can't overwrite value
-* that is already there, so read the old value, and write itback.
-* kernel populates the page with zeroes initially.
-*/
-   *(volatile int *)addr = *(volatile int *)addr;
+   /* enforce page fault and ensure that page is accessible to us

[Bug 1304] l3fwd-power example fails to run with uncore options, -U -u and -i

2023-10-25 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1304

Bug ID: 1304
   Summary: l3fwd-power example fails to run with uncore options,
-U -u and -i
   Product: DPDK
   Version: 23.11
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: other
  Assignee: dev@dpdk.org
  Reporter: karen.ke...@intel.com
  Target Milestone: ---

We suspect this particular commit introduced the bug: the sample app will not
work with the -U, -u, or -i options.

The options worked when I tried with 23.07. I also tried removing this commit,
and the options worked again.

Result of git show:

commit ac1edcb6621af6ff3c2b01d40e4dd6ed0527a748
Author: Sivaprasad Tummala 
Date:   Wed Aug 16 03:09:57 2023 -0700

power: refactor uncore power management API

Currently the uncore power management implementation is vendor specific.

Added new vendor agnostic uncore power interface similar to rte_power
and rename specific implementations ("rte_power_intel_uncore") to
"power_intel_uncore" along with functions.

Signed-off-by: Sivaprasad Tummala 


DPDK Version: 23.11-rc1
Commands: 
.//examples/dpdk-l3fwd-power -c 0x6 -n 1 -- -p 0x1 -P
--config="(0,0,2)" -U
.//examples/dpdk-l3fwd-power -c 0x6 -n 1 -- -p 0x1 -P
--config="(0,0,2)" -u
.//examples/dpdk-l3fwd-power -c 0x6 -n 1 -- -p 0x1 -P
--config="(0,0,2)" -i 2
Error:
EAL: Error - exiting with code: 1
Cause: Invalid L3FWD parameters

-- 
You are receiving this mail because:
You are the assignee for the bug.

[PATCH v5] net/mlx5: add test for live migration

2023-10-25 Thread Rongwei Liu
This patch adds to the testpmd app a runtime function to test the live
migration API.

testpmd> mlx5 set flow_engine  []
Flag is optional.

Signed-off-by: Rongwei Liu 
Acked-by: Viacheslav Ovsiienko 
Acked-by: Ori Kam 
---
 doc/guides/nics/mlx5.rst|  15 
 drivers/net/mlx5/mlx5_testpmd.c | 124 
 2 files changed, 139 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 9039b55c0b..412a967c68 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2187,3 +2187,18 @@ where:
 * ``sw_queue_id``: queue index in range [64536, 65535].
   This range is the highest 1000 numbers.
 * ``hw_queue_id``: queue index given by HW in queue creation.
+
+Set Flow Engine Mode
+
+
+Set the flow engine to active or standby mode with specific flags (bitmap 
style)::
+See MLX5_FLOW_ENGINE_FLAG_* for the detailed flags definitions.
+
+.. code-block:: console
+
+   testpmd> mlx5 set flow_engine  []
+
+This command is used for testing live migration and works for
+software steering only.
+Default FDB jump should be disabled if switchdev is enabled.
+The mode will propagate to all the probed ports.
diff --git a/drivers/net/mlx5/mlx5_testpmd.c b/drivers/net/mlx5/mlx5_testpmd.c
index 879ea2826e..c70a10b3af 100644
--- a/drivers/net/mlx5/mlx5_testpmd.c
+++ b/drivers/net/mlx5/mlx5_testpmd.c
@@ -25,6 +25,29 @@
 
 static uint8_t host_shaper_avail_thresh_triggered[RTE_MAX_ETHPORTS];
 #define SHAPER_DISABLE_DELAY_US 100000 /* 100ms */
+#define PARSE_DELIMITER " \f\n\r\t\v"
+
+static int
+parse_uint(uint64_t *value, const char *str)
+{
+   char *next = NULL;
+   uint64_t n;
+
+   errno = 0;
+   /* Parse number string */
+   if (!strncasecmp(str, "0x", 2)) {
+   str += 2;
+   n = strtol(str, &next, 16);
+   } else {
+   n = strtol(str, &next, 10);
+   }
+   if (errno != 0 || str == next || *next != '\0')
+   return -1;
+
+   *value = n;
+
+   return 0;
+}
 
 /**
  * Disable the host shaper and re-arm available descriptor threshold event.
@@ -561,6 +584,102 @@ cmdline_parse_inst_t mlx5_cmd_unmap_ext_rxq = {
}
 };
 
+/* Set flow engine mode with flags command. */
+struct mlx5_cmd_set_flow_engine_mode {
+   cmdline_fixed_string_t mlx5;
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t flow_engine;
+   cmdline_multi_string_t mode;
+};
+
+static int
+parse_multi_token_flow_engine_mode(char *t_str, enum mlx5_flow_engine_mode 
*mode,
+  uint32_t *flag)
+{
+   uint64_t val;
+   char *token;
+   int ret;
+
+   *flag = 0;
+   /* First token: mode string */
+   token = strtok_r(t_str, PARSE_DELIMITER, &t_str);
+   if (token ==  NULL)
+   return -1;
+
+   if (!strcmp(token, "active"))
+   *mode = MLX5_FLOW_ENGINE_MODE_ACTIVE;
+   else if (!strcmp(token, "standby"))
+   *mode = MLX5_FLOW_ENGINE_MODE_STANDBY;
+   else
+   return -1;
+
+   /* Second token: flag */
+   token = strtok_r(t_str, PARSE_DELIMITER, &t_str);
+   if (token == NULL)
+   return 0;
+
+   ret = parse_uint(&val, token);
+   if (ret != 0 || val > UINT32_MAX)
+   return -1;
+
+   *flag = val;
+   return 0;
+}
+
+static void
+mlx5_cmd_set_flow_engine_mode_parsed(void *parsed_result,
+__rte_unused struct cmdline *cl,
+__rte_unused void *data)
+{
+   struct mlx5_cmd_set_flow_engine_mode *res = parsed_result;
+   enum mlx5_flow_engine_mode mode;
+   uint32_t flag;
+   int ret;
+
+   ret = parse_multi_token_flow_engine_mode(res->mode, &mode, &flag);
+
+   if (ret < 0) {
+   fprintf(stderr, "Bad input\n");
+   return;
+   }
+
+   ret = rte_pmd_mlx5_flow_engine_set_mode(mode, flag);
+
+   if (ret < 0)
+   fprintf(stderr, "Fail to set flow_engine to %s mode with flag 
0x%x, error %s\n",
+   mode == MLX5_FLOW_ENGINE_MODE_ACTIVE ? "active" : 
"standby", flag,
+   strerror(-ret));
+   else
+   TESTPMD_LOG(DEBUG, "Set %d ports flow_engine to %s mode with 
flag 0x%x\n", ret,
+   mode == MLX5_FLOW_ENGINE_MODE_ACTIVE ? "active" : 
"standby", flag);
+}
+
+cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_mlx5 =
+   TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, mlx5,
+"mlx5");
+cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_set =
+   TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, set,
+"set");
+cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_flow_engine =
+   TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, 
flow_engine,
+ 

RE: [PATCH v2 09/19] rcu: use rte optional stdatomic API

2023-10-25 Thread Ruifeng Wang
> -Original Message-
> From: Tyler Retzlaff 
> Sent: Wednesday, October 18, 2023 4:31 AM
> To: dev@dpdk.org
> Cc: Akhil Goyal ; Anatoly Burakov 
> ; Andrew
> Rybchenko ; Bruce Richardson 
> ;
> Chenbo Xia ; Ciara Power ; David 
> Christensen
> ; David Hunt ; Dmitry Kozlyuk
> ; Dmitry Malloy ; Elena 
> Agostini
> ; Erik Gabriel Carrillo ; 
> Fan Zhang
> ; Ferruh Yigit ; Harman Kalra
> ; Harry van Haaren ; Honnappa 
> Nagarahalli
> ; jer...@marvell.com; Konstantin Ananyev
> ; Matan Azrad ; Maxime 
> Coquelin
> ; Narcisa Ana Maria Vasile 
> ;
> Nicolas Chautru ; Olivier Matz 
> ; Ori
> Kam ; Pallavi Kadam ; Pavan 
> Nikhilesh
> ; Reshma Pattan ; Sameh 
> Gobriel
> ; Shijith Thotton ; Sivaprasad 
> Tummala
> ; Stephen Hemminger ; 
> Suanming Mou
> ; Sunil Kumar Kori ; 
> tho...@monjalon.net;
> Viacheslav Ovsiienko ; Vladimir Medvedkin
> ; Yipeng Wang ; Tyler 
> Retzlaff
> 
> Subject: [PATCH v2 09/19] rcu: use rte optional stdatomic API
> 
> Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding 
> rte_atomic_xxx
> optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff 
> ---
>  lib/rcu/rte_rcu_qsbr.c | 48 +--
>  lib/rcu/rte_rcu_qsbr.h | 68 
> +-
>  2 files changed, 58 insertions(+), 58 deletions(-)
> 
> diff --git a/lib/rcu/rte_rcu_qsbr.c b/lib/rcu/rte_rcu_qsbr.c index 
> 17be93e..4dc7714 100644
> --- a/lib/rcu/rte_rcu_qsbr.c
> +++ b/lib/rcu/rte_rcu_qsbr.c
> @@ -102,21 +102,21 @@
>* go out of sync. Hence, additional checks are required.
>*/
>   /* Check if the thread is already registered */
> - old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> - __ATOMIC_RELAXED);
> + old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> + rte_memory_order_relaxed);
>   if (old_bmap & 1UL << id)
>   return 0;
> 
>   do {
>   new_bmap = old_bmap | (1UL << id);
> - success = __atomic_compare_exchange(
> + success = rte_atomic_compare_exchange_strong_explicit(
>   __RTE_QSBR_THRID_ARRAY_ELM(v, i),
> - &old_bmap, &new_bmap, 0,
> - __ATOMIC_RELEASE, __ATOMIC_RELAXED);
> + &old_bmap, new_bmap,
> + rte_memory_order_release, 
> rte_memory_order_relaxed);
> 
>   if (success)
> - __atomic_fetch_add(&v->num_threads,
> - 1, __ATOMIC_RELAXED);
> + rte_atomic_fetch_add_explicit(&v->num_threads,
> + 1, rte_memory_order_relaxed);
>   else if (old_bmap & (1UL << id))
>   /* Someone else registered this thread.
>* Counter should not be incremented.
> @@ -154,8 +154,8 @@
>* go out of sync. Hence, additional checks are required.
>*/
>   /* Check if the thread is already unregistered */
> - old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> - __ATOMIC_RELAXED);
> + old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> + rte_memory_order_relaxed);
>   if (!(old_bmap & (1UL << id)))
>   return 0;
> 
> @@ -165,14 +165,14 @@
>* completed before removal of the thread from the list of
>* reporting threads.
>*/
> - success = __atomic_compare_exchange(
> + success = rte_atomic_compare_exchange_strong_explicit(
>   __RTE_QSBR_THRID_ARRAY_ELM(v, i),
> - &old_bmap, &new_bmap, 0,
> - __ATOMIC_RELEASE, __ATOMIC_RELAXED);
> + &old_bmap, new_bmap,
> + rte_memory_order_release, 
> rte_memory_order_relaxed);
> 
>   if (success)
> - __atomic_fetch_sub(&v->num_threads,
> - 1, __ATOMIC_RELAXED);
> + rte_atomic_fetch_sub_explicit(&v->num_threads,
> + 1, rte_memory_order_relaxed);
>   else if (!(old_bmap & (1UL << id)))
>   /* Someone else unregistered this thread.
>* Counter should not be incremented.
> @@ -227,8 +227,8 @@
> 
>   fprintf(f, "  Registered thread IDs = ");
>   for (i = 0; i < v->num_elems; i++) {
> - bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> - __ATOMIC_ACQUIRE);
> + bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v,

Re: [PATCH v5] net/mlx5: add test for live migration

2023-10-25 Thread Thomas Monjalon
25/10/2023 11:36, Rongwei Liu:
> +Set Flow Engine Mode
> +
> +
> +Set the flow engine to active or standby mode with specific flags (bitmap 
> style)::

This sentence should end with a dot.

> +See MLX5_FLOW_ENGINE_FLAG_* for the detailed flags definitions.

MLX5_FLOW_ENGINE_FLAG_* should be between backquotes:
``MLX5_FLOW_ENGINE_FLAG_*``

No need "s" to "flags".
You can also remove "detailed": "flag definitions".

> +
> +.. code-block:: console
> +
> +   testpmd> mlx5 set flow_engine  []
> +
> +This command is used for testing live migration and works for
> +software steering only.
> +Default FDB jump should be disabled if switchdev is enabled.
> +The mode will propagate to all the probed ports.





[PATCH v6] net/mlx5: add test for live migration

2023-10-25 Thread Rongwei Liu
This patch adds to the testpmd app a runtime function to test the live
migration API.

testpmd> mlx5 set flow_engine  []
Flag is optional.

Signed-off-by: Rongwei Liu 
Acked-by: Viacheslav Ovsiienko 
Acked-by: Ori Kam 
---
 doc/guides/nics/mlx5.rst|  15 
 drivers/net/mlx5/mlx5_testpmd.c | 124 
 2 files changed, 139 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 9039b55c0b..aca51f0928 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2187,3 +2187,18 @@ where:
 * ``sw_queue_id``: queue index in range [64536, 65535].
   This range is the highest 1000 numbers.
 * ``hw_queue_id``: queue index given by HW in queue creation.
+
+Set Flow Engine Mode
+
+
+Set the flow engine to active or standby mode with specific flags (bitmap 
style)::
+See MLX5_FLOW_ENGINE_FLAG_* for the flag definitions.
+
+.. code-block:: console
+
+   testpmd> mlx5 set flow_engine  []
+
+This command is used for testing live migration and works for
+software steering only.
+Default FDB jump should be disabled if switchdev is enabled.
+The mode will propagate to all the probed ports.
diff --git a/drivers/net/mlx5/mlx5_testpmd.c b/drivers/net/mlx5/mlx5_testpmd.c
index 879ea2826e..c70a10b3af 100644
--- a/drivers/net/mlx5/mlx5_testpmd.c
+++ b/drivers/net/mlx5/mlx5_testpmd.c
@@ -25,6 +25,29 @@
 
 static uint8_t host_shaper_avail_thresh_triggered[RTE_MAX_ETHPORTS];
 #define SHAPER_DISABLE_DELAY_US 100000 /* 100ms */
+#define PARSE_DELIMITER " \f\n\r\t\v"
+
+static int
+parse_uint(uint64_t *value, const char *str)
+{
+   char *next = NULL;
+   uint64_t n;
+
+   errno = 0;
+   /* Parse number string */
+   if (!strncasecmp(str, "0x", 2)) {
+   str += 2;
+   n = strtol(str, &next, 16);
+   } else {
+   n = strtol(str, &next, 10);
+   }
+   if (errno != 0 || str == next || *next != '\0')
+   return -1;
+
+   *value = n;
+
+   return 0;
+}
 
 /**
  * Disable the host shaper and re-arm available descriptor threshold event.
@@ -561,6 +584,102 @@ cmdline_parse_inst_t mlx5_cmd_unmap_ext_rxq = {
}
 };
 
+/* Set flow engine mode with flags command. */
+struct mlx5_cmd_set_flow_engine_mode {
+   cmdline_fixed_string_t mlx5;
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t flow_engine;
+   cmdline_multi_string_t mode;
+};
+
+static int
+parse_multi_token_flow_engine_mode(char *t_str, enum mlx5_flow_engine_mode 
*mode,
+  uint32_t *flag)
+{
+   uint64_t val;
+   char *token;
+   int ret;
+
+   *flag = 0;
+   /* First token: mode string */
+   token = strtok_r(t_str, PARSE_DELIMITER, &t_str);
+   if (token ==  NULL)
+   return -1;
+
+   if (!strcmp(token, "active"))
+   *mode = MLX5_FLOW_ENGINE_MODE_ACTIVE;
+   else if (!strcmp(token, "standby"))
+   *mode = MLX5_FLOW_ENGINE_MODE_STANDBY;
+   else
+   return -1;
+
+   /* Second token: flag */
+   token = strtok_r(t_str, PARSE_DELIMITER, &t_str);
+   if (token == NULL)
+   return 0;
+
+   ret = parse_uint(&val, token);
+   if (ret != 0 || val > UINT32_MAX)
+   return -1;
+
+   *flag = val;
+   return 0;
+}
+
+static void
+mlx5_cmd_set_flow_engine_mode_parsed(void *parsed_result,
+__rte_unused struct cmdline *cl,
+__rte_unused void *data)
+{
+   struct mlx5_cmd_set_flow_engine_mode *res = parsed_result;
+   enum mlx5_flow_engine_mode mode;
+   uint32_t flag;
+   int ret;
+
+   ret = parse_multi_token_flow_engine_mode(res->mode, &mode, &flag);
+
+   if (ret < 0) {
+   fprintf(stderr, "Bad input\n");
+   return;
+   }
+
+   ret = rte_pmd_mlx5_flow_engine_set_mode(mode, flag);
+
+   if (ret < 0)
+   fprintf(stderr, "Fail to set flow_engine to %s mode with flag 
0x%x, error %s\n",
+   mode == MLX5_FLOW_ENGINE_MODE_ACTIVE ? "active" : 
"standby", flag,
+   strerror(-ret));
+   else
+   TESTPMD_LOG(DEBUG, "Set %d ports flow_engine to %s mode with 
flag 0x%x\n", ret,
+   mode == MLX5_FLOW_ENGINE_MODE_ACTIVE ? "active" : 
"standby", flag);
+}
+
+cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_mlx5 =
+   TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, mlx5,
+"mlx5");
+cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_set =
+   TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, set,
+"set");
+cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_flow_engine =
+   TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, 
flow_engine,
+   

[PATCH] net/mlx5: add global API prefix to public constants

2023-10-25 Thread Thomas Monjalon
The file rte_pmd_mlx5.h is a public API,
so its components must be prefixed with RTE_PMD_.

Signed-off-by: Thomas Monjalon 
---
 drivers/net/mlx5/mlx5.h |  6 +++---
 drivers/net/mlx5/mlx5_defs.h|  2 +-
 drivers/net/mlx5/mlx5_ethdev.c  |  4 ++--
 drivers/net/mlx5/mlx5_flow.c| 28 ++--
 drivers/net/mlx5/mlx5_flow.h|  2 +-
 drivers/net/mlx5/mlx5_flow_dv.c |  8 
 drivers/net/mlx5/mlx5_flow_hw.c |  4 ++--
 drivers/net/mlx5/mlx5_rx.c  |  2 +-
 drivers/net/mlx5/mlx5_rx.h  |  4 ++--
 drivers/net/mlx5/mlx5_rxq.c | 10 +-
 drivers/net/mlx5/mlx5_testpmd.c |  4 ++--
 drivers/net/mlx5/rte_pmd_mlx5.h | 30 +++---
 12 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0b709a1bda..9966c2c082 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1684,8 +1684,8 @@ struct mlx5_dv_flow_info {
struct rte_flow_attr attr;
 };
 
-struct mlx5_flow_engine_mode_info {
-   enum mlx5_flow_engine_mode mode;
+struct rte_pmd_mlx5_flow_engine_mode_info {
+   enum rte_pmd_mlx5_flow_engine_mode mode;
uint32_t mode_flag;
/* The list is maintained in insertion order. */
LIST_HEAD(hot_up_info, mlx5_dv_flow_info) hot_upgrade;
@@ -1834,7 +1834,7 @@ struct mlx5_priv {
uint32_t nb_queue; /* HW steering queue number. */
struct mlx5_hws_cnt_pool *hws_cpool; /* HW steering's counter pool. */
uint32_t hws_mark_refcnt; /* HWS mark action reference counter. */
-   struct mlx5_flow_engine_mode_info mode_info; /* Process set flow engine 
info. */
+   struct rte_pmd_mlx5_flow_engine_mode_info mode_info; /* Process set 
flow engine info. */
struct mlx5_flow_hw_attr *hw_attr; /* HW Steering port configuration. */
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
/* Item template list. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 2af8c731ef..dc5216cb24 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -181,7 +181,7 @@
 #define MLX5_MAX_INDIRECT_ACTIONS 3
 
 /* Maximum number of external Rx queues supported by rte_flow */
-#define MLX5_MAX_EXT_RX_QUEUES (UINT16_MAX - MLX5_EXTERNAL_RX_QUEUE_ID_MIN + 1)
+#define MLX5_MAX_EXT_RX_QUEUES (UINT16_MAX - 
RTE_PMD_MLX5_EXTERNAL_RX_QUEUE_ID_MIN + 1)
 
 /*
  * Linux definition of static_assert is found in /usr/include/assert.h.
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 4a85415ff3..3339da054e 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -129,11 +129,11 @@ mlx5_dev_configure(struct rte_eth_dev *dev)
rte_errno = EINVAL;
return -rte_errno;
}
-   if (priv->ext_rxqs && rxqs_n >= MLX5_EXTERNAL_RX_QUEUE_ID_MIN) {
+   if (priv->ext_rxqs && rxqs_n >= RTE_PMD_MLX5_EXTERNAL_RX_QUEUE_ID_MIN) {
DRV_LOG(ERR, "port %u cannot handle this many Rx queues (%u), "
"the maximal number of internal Rx queues is %u",
dev->data->port_id, rxqs_n,
-   MLX5_EXTERNAL_RX_QUEUE_ID_MIN - 1);
+   RTE_PMD_MLX5_EXTERNAL_RX_QUEUE_ID_MIN - 1);
rte_errno = EINVAL;
return -rte_errno;
}
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 8ad85e6027..ca4702efd9 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -170,7 +170,7 @@ mlx5_need_cache_flow(const struct mlx5_priv *priv,
 {
return priv->isolated && priv->sh->config.dv_flow_en == 1 &&
(attr ? !attr->group : true) &&
-   priv->mode_info.mode == MLX5_FLOW_ENGINE_MODE_STANDBY &&
+   priv->mode_info.mode == RTE_PMD_MLX5_FLOW_ENGINE_MODE_STANDBY &&
(!priv->sh->config.dv_esw_en || !priv->sh->config.fdb_def_rule);
 }
 
@@ -7632,7 +7632,7 @@ mlx5_flow_cache_flow_info(struct rte_eth_dev *dev,
  uint32_t flow_idx)
 {
struct mlx5_priv *priv = dev->data->dev_private;
-   struct mlx5_flow_engine_mode_info *mode_info = &priv->mode_info;
+   struct rte_pmd_mlx5_flow_engine_mode_info *mode_info = &priv->mode_info;
struct mlx5_dv_flow_info *flow_info, *tmp_info;
struct rte_flow_error error;
int len, ret;
@@ -7706,7 +7706,7 @@ static int
 mlx5_flow_cache_flow_toggle(struct rte_eth_dev *dev, bool orig_prio)
 {
struct mlx5_priv *priv = dev->data->dev_private;
-   struct mlx5_flow_engine_mode_info *mode_info = &priv->mode_info;
+   struct rte_pmd_mlx5_flow_engine_mode_info *mode_info = &priv->mode_info;
struct mlx5_dv_flow_info *flow_info;
struct rte_flow_attr attr;
struct rte_flow_error error;
@@ -7769,7 +7769,7 @@ mlx5_flow_cache_flow_toggle(struct rte_eth_dev *dev, bool 
orig_prio)
  *

RE: [PATCH v5] net/mlx5: add test for live migration

2023-10-25 Thread Rongwei Liu



BR
Rongwei

> -Original Message-
> From: Thomas Monjalon 
> Sent: Wednesday, October 25, 2023 17:42
> To: Rongwei Liu 
> Cc: dev@dpdk.org; Matan Azrad ; Slava Ovsiienko
> ; Ori Kam ; Suanming Mou
> 
> Subject: Re: [PATCH v5] net/mlx5: add test for live migration
> 
> External email: Use caution opening links or attachments
> 
> 
> 25/10/2023 11:36, Rongwei Liu:
> > +Set Flow Engine Mode
> > +
> > +
> > +Set the flow engine to active or standby mode with specific flags (bitmap
> style)::
> 
> This sentence should end with a dot.
> 
Sure.
> > +See MLX5_FLOW_ENGINE_FLAG_* for the detailed flags definitions.
> 
> MLX5_FLOW_ENGINE_FLAG_* should be between backquotes:
> ``MLX5_FLOW_ENGINE_FLAG_*``
> 
Sure.
> No need "s" to "flags".
> You can also remove "detailed": "flag definitions".
> 
Sure. 
> > +
> > +.. code-block:: console
> > +
> > +   testpmd> mlx5 set flow_engine  []
> > +
> > +This command is used for testing live migration and works for
> > +software steering only.
> > +Default FDB jump should be disabled if switchdev is enabled.
> > +The mode will propagate to all the probed ports.
> 
> 



[PATCH v7] net/mlx5: add test for live migration

2023-10-25 Thread Rongwei Liu
This patch adds to the testpmd app a runtime function to test the live
migration API.

testpmd> mlx5 set flow_engine  []
Flag is optional.

Signed-off-by: Rongwei Liu 
Acked-by: Viacheslav Ovsiienko 
Acked-by: Ori Kam 
---
 doc/guides/nics/mlx5.rst|  15 
 drivers/net/mlx5/mlx5_testpmd.c | 124 
 2 files changed, 139 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 9039b55c0b..8bfe1e6efd 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2187,3 +2187,18 @@ where:
 * ``sw_queue_id``: queue index in range [64536, 65535].
   This range is the highest 1000 numbers.
 * ``hw_queue_id``: queue index given by HW in queue creation.
+
+Set Flow Engine Mode
+
+
+Set the flow engine to active or standby mode with specific flags (bitmap 
style).
+See ``MLX5_FLOW_ENGINE_FLAG_*`` for the flag definitions.
+
+.. code-block:: console
+
+   testpmd> mlx5 set flow_engine  []
+
+This command is used for testing live migration and works for
+software steering only.
+Default FDB jump should be disabled if switchdev is enabled.
+The mode will propagate to all the probed ports.
diff --git a/drivers/net/mlx5/mlx5_testpmd.c b/drivers/net/mlx5/mlx5_testpmd.c
index 879ea2826e..c70a10b3af 100644
--- a/drivers/net/mlx5/mlx5_testpmd.c
+++ b/drivers/net/mlx5/mlx5_testpmd.c
@@ -25,6 +25,29 @@
 
 static uint8_t host_shaper_avail_thresh_triggered[RTE_MAX_ETHPORTS];
 #define SHAPER_DISABLE_DELAY_US 100000 /* 100ms */
+#define PARSE_DELIMITER " \f\n\r\t\v"
+
+static int
+parse_uint(uint64_t *value, const char *str)
+{
+   char *next = NULL;
+   uint64_t n;
+
+   errno = 0;
+   /* Parse number string */
+   if (!strncasecmp(str, "0x", 2)) {
+   str += 2;
+   n = strtol(str, &next, 16);
+   } else {
+   n = strtol(str, &next, 10);
+   }
+   if (errno != 0 || str == next || *next != '\0')
+   return -1;
+
+   *value = n;
+
+   return 0;
+}
 
 /**
  * Disable the host shaper and re-arm available descriptor threshold event.
@@ -561,6 +584,102 @@ cmdline_parse_inst_t mlx5_cmd_unmap_ext_rxq = {
}
 };
 
+/* Set flow engine mode with flags command. */
+struct mlx5_cmd_set_flow_engine_mode {
+   cmdline_fixed_string_t mlx5;
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t flow_engine;
+   cmdline_multi_string_t mode;
+};
+
+static int
+parse_multi_token_flow_engine_mode(char *t_str, enum mlx5_flow_engine_mode 
*mode,
+  uint32_t *flag)
+{
+   uint64_t val;
+   char *token;
+   int ret;
+
+   *flag = 0;
+   /* First token: mode string */
+   token = strtok_r(t_str, PARSE_DELIMITER, &t_str);
+   if (token ==  NULL)
+   return -1;
+
+   if (!strcmp(token, "active"))
+   *mode = MLX5_FLOW_ENGINE_MODE_ACTIVE;
+   else if (!strcmp(token, "standby"))
+   *mode = MLX5_FLOW_ENGINE_MODE_STANDBY;
+   else
+   return -1;
+
+   /* Second token: flag */
+   token = strtok_r(t_str, PARSE_DELIMITER, &t_str);
+   if (token == NULL)
+   return 0;
+
+   ret = parse_uint(&val, token);
+   if (ret != 0 || val > UINT32_MAX)
+   return -1;
+
+   *flag = val;
+   return 0;
+}
+
+static void
+mlx5_cmd_set_flow_engine_mode_parsed(void *parsed_result,
+__rte_unused struct cmdline *cl,
+__rte_unused void *data)
+{
+   struct mlx5_cmd_set_flow_engine_mode *res = parsed_result;
+   enum mlx5_flow_engine_mode mode;
+   uint32_t flag;
+   int ret;
+
+   ret = parse_multi_token_flow_engine_mode(res->mode, &mode, &flag);
+
+   if (ret < 0) {
+   fprintf(stderr, "Bad input\n");
+   return;
+   }
+
+   ret = rte_pmd_mlx5_flow_engine_set_mode(mode, flag);
+
+   if (ret < 0)
+   fprintf(stderr, "Fail to set flow_engine to %s mode with flag 
0x%x, error %s\n",
+   mode == MLX5_FLOW_ENGINE_MODE_ACTIVE ? "active" : 
"standby", flag,
+   strerror(-ret));
+   else
+   TESTPMD_LOG(DEBUG, "Set %d ports flow_engine to %s mode with 
flag 0x%x\n", ret,
+   mode == MLX5_FLOW_ENGINE_MODE_ACTIVE ? "active" : 
"standby", flag);
+}
+
+cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_mlx5 =
+   TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, mlx5,
+"mlx5");
+cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_set =
+   TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, set,
+"set");
+cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_flow_engine =
+   TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, 
flow_engine,
+

Re: [Bug 1304] l3fwd-power example fails to run with uncore options, -U -u and -i

2023-10-25 Thread David Marchand
Hello Siva,

On Wed, Oct 25, 2023 at 11:34 AM  wrote:
>
> Bug ID 1304
> Summary l3fwd-power example fails to run with uncore options, -U -u and -i
> Product DPDK
> Version 23.11
> Hardware All
> OS All
> Status UNCONFIRMED
> Severity normal
> Priority Normal
> Component other
> Assignee dev@dpdk.org
> Reporter karen.ke...@intel.com
> Target Milestone ---
>
> We suspect this particular commit introduced the bug: the sample app will not
> work with the -U, -u, or -i options.
>
> The options worked when I tried with 23.07. I also tried removing this commit,
> and the options worked again.
>
> Result of git show:
>
> commit ac1edcb6621af6ff3c2b01d40e4dd6ed0527a748
> Author: Sivaprasad Tummala 
> Date:   Wed Aug 16 03:09:57 2023 -0700
>
> power: refactor uncore power management API
>
> Currently the uncore power management implementation is vendor specific.
>
> Added new vendor agnostic uncore power interface similar to rte_power
> and rename specific implementations ("rte_power_intel_uncore") to
> "power_intel_uncore" along with functions.
>
> Signed-off-by: Sivaprasad Tummala 
>
>
> DPDK Version: 23.11-rc1
> Commands:
> .//examples/dpdk-l3fwd-power -c 0x6 -n 1 -- -p 0x1 -P
> --config="(0,0,2)" -U
> .//examples/dpdk-l3fwd-power -c 0x6 -n 1 -- -p 0x1 -P
> --config="(0,0,2)" -u
> .//examples/dpdk-l3fwd-power -c 0x6 -n 1 -- -p 0x1 -P
> --config="(0,0,2)" -i 2
> Error:
> EAL: Error - exiting with code: 1
> Cause: Invalid L3FWD parameters

Please register on the dpdk.org Bugzilla and have a look at this report.
Thanks.


-- 
David Marchand



RE: [PATCH v2 19/19] ring: use rte optional stdatomic API

2023-10-25 Thread Konstantin Ananyev


> 
> On Tue, Oct 24, 2023 at 09:43:13AM +0100, Konstantin Ananyev wrote:
> > 17.10.2023 21:31, Tyler Retzlaff пишет:
> > >Replace the use of gcc builtin __atomic_xxx intrinsics with
> > >corresponding rte_atomic_xxx optional stdatomic API
> > >
> > >Signed-off-by: Tyler Retzlaff 
> > >---
> > >  drivers/net/mlx5/mlx5_hws_cnt.h   |  2 +-
> > >  lib/ring/rte_ring_c11_pvt.h   | 33 +
> > >  lib/ring/rte_ring_core.h  | 10 +-
> > >  lib/ring/rte_ring_generic_pvt.h   |  3 ++-
> > >  lib/ring/rte_ring_hts_elem_pvt.h  | 22 --
> > >  lib/ring/rte_ring_peek_elem_pvt.h |  6 +++---
> > >  lib/ring/rte_ring_rts_elem_pvt.h  | 27 ++-
> > >  7 files changed, 54 insertions(+), 49 deletions(-)
> > >
> > >diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h 
> > >b/drivers/net/mlx5/mlx5_hws_cnt.h
> > >index f462665..cc9ac10 100644
> > >--- a/drivers/net/mlx5/mlx5_hws_cnt.h
> > >+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
> > >@@ -394,7 +394,7 @@ struct mlx5_hws_age_param {
> > >   __rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
> > >   &zcd->ptr1, &zcd->n1, &zcd->ptr2);
> > >   /* Update tail */
> > >-  __atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
> > >+  rte_atomic_store_explicit(&r->prod.tail, revert2head, 
> > >rte_memory_order_release);
> > >   return n;
> > >  }
> > >diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> > >index f895950..f8be538 100644
> > >--- a/lib/ring/rte_ring_c11_pvt.h
> > >+++ b/lib/ring/rte_ring_c11_pvt.h
> > >@@ -22,9 +22,10 @@
> > >* we need to wait for them to complete
> > >*/
> > >   if (!single)
> > >-  rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> > >+  rte_wait_until_equal_32((volatile uint32_t 
> > >*)(uintptr_t)&ht->tail, old_val,
> > >+  rte_memory_order_relaxed);
> > >-  __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> > >+  rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
> > >  }
> > >  /**
> > >@@ -61,19 +62,19 @@
> > >   unsigned int max = n;
> > >   int success;
> > >-  *old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
> > >+  *old_head = rte_atomic_load_explicit(&r->prod.head, 
> > >rte_memory_order_relaxed);
> > >   do {
> > >   /* Reset n to the initial burst count */
> > >   n = max;
> > >   /* Ensure the head is read before tail */
> > >-  __atomic_thread_fence(__ATOMIC_ACQUIRE);
> > >+  __atomic_thread_fence(rte_memory_order_acquire);
> > >   /* load-acquire synchronize with store-release of ht->tail
> > >* in update_tail.
> > >*/
> > >-  cons_tail = __atomic_load_n(&r->cons.tail,
> > >-  __ATOMIC_ACQUIRE);
> > >+  cons_tail = rte_atomic_load_explicit(&r->cons.tail,
> > >+  rte_memory_order_acquire);
> > >   /* The subtraction is done between two unsigned 32bits value
> > >* (the result is always modulo 32 bits even if we have
> > >@@ -95,10 +96,10 @@
> > >   r->prod.head = *new_head, success = 1;
> > >   else
> > >   /* on failure, *old_head is updated */
> > >-  success = __atomic_compare_exchange_n(&r->prod.head,
> > >+  success = 
> > >rte_atomic_compare_exchange_strong_explicit(&r->prod.head,
> > >   old_head, *new_head,
> > >-  0, __ATOMIC_RELAXED,
> > >-  __ATOMIC_RELAXED);
> > >+  rte_memory_order_relaxed,
> > >+  rte_memory_order_relaxed);
> > >   } while (unlikely(success == 0));
> > >   return n;
> > >  }
> > >@@ -137,19 +138,19 @@
> > >   int success;
> > >   /* move cons.head atomically */
> > >-  *old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
> > >+  *old_head = rte_atomic_load_explicit(&r->cons.head, 
> > >rte_memory_order_relaxed);
> > >   do {
> > >   /* Restore n as it may change every loop */
> > >   n = max;
> > >   /* Ensure the head is read before tail */
> > >-  __atomic_thread_fence(__ATOMIC_ACQUIRE);
> > >+  __atomic_thread_fence(rte_memory_order_acquire);
> > >   /* this load-acquire synchronize with store-release of ht->tail
> > >* in update_tail.
> > >*/
> > >-  prod_tail = __atomic_load_n(&r->prod.tail,
> > >-  __ATOMIC_ACQUIRE);
> > >+  prod_tail = rte_atomic_load_explicit(&r->prod.tail,
> > >+  rte_memory_order_acquire);
> > >   /* The subtraction is done between two unsigned 32bits value
> > >* (the result is always modulo 32 bits even if we have
> > >@@ -170,10 +171,10 @@
> > >   r->cons.head = *n

Re: [PATCH] net/iavf: fix IAVF_TX_OFFLOAD_MASK definition

2023-10-25 Thread Radu Nicolau



On 25-Oct-23 10:07 AM, David Marchand wrote:

On Wed, Oct 25, 2023 at 11:02 AM Radu Nicolau  wrote:


On 25-Oct-23 12:30 AM, Zhang, Qi Z wrote:

-Original Message-
From: Nicolau, Radu 
Sent: Tuesday, October 24, 2023 10:49 PM
To: Zhang, Qi Z ; Marchand, David

Cc: Wu, Jingjing ; Xing, Beilei ;
dev@dpdk.org; sta...@dpdk.org
Subject: Re: [PATCH] net/iavf: fix IAVF_TX_OFFLOAD_MASK definition


On 24-Oct-23 12:24 PM, Zhang, Qi Z wrote:

-Original Message-
From: Radu Nicolau 
Sent: Tuesday, October 24, 2023 6:23 PM
To: Marchand, David 
Cc: Wu, Jingjing ; Xing, Beilei
; dev@dpdk.org; sta...@dpdk.org
Subject: Re: [PATCH] net/iavf: fix IAVF_TX_OFFLOAD_MASK definition


On 24-Oct-23 10:49 AM, David Marchand wrote:

On Tue, Oct 24, 2023 at 11:13 AM Radu Nicolau


wrote:

IAVF_TX_OFFLOAD_MASK definition contained

RTE_ETH_TX_OFFLOAD_SECURITY

instead of RTE_MBUF_F_TX_SEC_OFFLOAD.

Fixes: 6bc987ecb860 ("net/iavf: support IPsec inline crypto")
Cc: sta...@dpdk.org

Signed-off-by: Radu Nicolau 

Something is not clear to me.
How was the IPsec inline crypto feature supposed to work with this
driver so far?

Any packet with the RTE_MBUF_F_TX_SEC_OFFLOAD flag should have

been

refused in iavf_prep_pkts.


It worked because the IPsec sample app doesn't call
rte_eth_tx_prepare, and from what I can see no other sample app does.

To keep it consistent, it's better to refine the

IAVF_TX_OFFLOAD_NOTSUP_MASK definition.

You mean like this?


#define IAVF_TX_OFFLOAD_NOTSUP_MASK ( \
        RTE_MBUF_F_TX_OFFLOAD_MASK ^ ( \
                RTE_MBUF_F_TX_OUTER_IPV6 | \
                RTE_MBUF_F_TX_OUTER_IPV4 | \
                RTE_MBUF_F_TX_IPV6 | \
                RTE_MBUF_F_TX_IPV4 | \
                RTE_MBUF_F_TX_VLAN | \
                RTE_MBUF_F_TX_IP_CKSUM | \
                RTE_MBUF_F_TX_L4_MASK | \
                RTE_MBUF_F_TX_TCP_SEG | \
                RTE_MBUF_F_TX_UDP_SEG | \
                RTE_MBUF_F_TX_TUNNEL_MASK | \
                RTE_MBUF_F_TX_OUTER_IP_CKSUM | \
                RTE_MBUF_F_TX_OUTER_UDP_CKSUM | \
                RTE_MBUF_F_TX_SEC_OFFLOAD))

Sorry, I misunderstood this code change; actually you didn't remove a
flag, but just replaced it, so NOTSUP_MASK does not need to be changed

Then I don't understand why "Any packet with the RTE_MBUF_F_TX_SEC_OFFLOAD
flag should have been refused in iavf_prep_pkts".
But I assume tx_pkt_prepare should reject only invalid packets while still
functioning correctly with inline IPsec.

rte_eth_tx_prepare would have rejected the packets before this fix, but
no sample app calls rte_eth_tx_prepare. The only app that calls it is testpmd.

 From my understanding, applications that want checksum offload are
required to call rte_eth_tx_prepare.


TBH I don't understand much about it, and looking at the implementation 
actually made things worse: for example, from what I can see, calling it 
when RTE_MBUF_F_TX_TCP_CKSUM is set will result in the TCP checksum being 
computed (in software) in the prepare function.
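
For reference, a hedged sketch (mine) of the kind of fixup drivers do in
tx_prepare, modeled loosely on rte_net_intel_cksum_prepare() and
simplified to the IPv4/TCP non-TSO case: what gets computed in software
is the pseudo-header checksum the NIC needs as a seed, while the payload
checksum itself is still offloaded. m is an assumed struct rte_mbuf
pointer with l2_len/l3_len already set:

 struct rte_ipv4_hdr *ip;
 struct rte_tcp_hdr *tcp;

 if ((m->ol_flags & RTE_MBUF_F_TX_L4_MASK) == RTE_MBUF_F_TX_TCP_CKSUM) {
         ip = rte_pktmbuf_mtod_offset(m, struct rte_ipv4_hdr *, m->l2_len);
         tcp = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *,
                         m->l2_len + m->l3_len);
         tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);
 }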




[PATCH v5 00/10] net/mlx5: support indirect actions list

2023-10-25 Thread Gregory Etelson
Add MLX5 PMD support for indirect actions list.

Erez Shitrit (1):
  net/mlx5/hws: allow destination into default miss FT

Gregory Etelson (4):
  net/mlx5: reformat HWS code for HWS mirror action
  net/mlx5: support HWS mirror action
  net/mlx5: reformat HWS code for indirect list actions
  net/mlx5: support indirect list METER_MARK action

Haifei Luo (1):
  net/mlx5/hws: support reformat for hws mirror

Hamdan Igbaria (3):
  net/mlx5/hws: add support for reformat DevX object
  net/mlx5/hws: support creating of dynamic forward table and FTE
  net/mlx5/hws: add mlx5dr DevX object struct to mlx5dr action

Shun Hao (1):
  net/mlx5/hws: add support for mirroring

 doc/guides/nics/features/mlx5.ini  |1 +
 doc/guides/rel_notes/release_23_11.rst |1 +
 drivers/common/mlx5/mlx5_prm.h |   81 +-
 drivers/net/mlx5/hws/mlx5dr.h  |   34 +
 drivers/net/mlx5/hws/mlx5dr_action.c   |  210 +++-
 drivers/net/mlx5/hws/mlx5dr_action.h   |8 +
 drivers/net/mlx5/hws/mlx5dr_cmd.c  |  143 ++-
 drivers/net/mlx5/hws/mlx5dr_cmd.h  |   49 +-
 drivers/net/mlx5/hws/mlx5dr_debug.c|1 +
 drivers/net/mlx5/hws/mlx5dr_internal.h |5 +
 drivers/net/mlx5/hws/mlx5dr_send.c |5 -
 drivers/net/mlx5/hws/mlx5dr_table.c|8 +-
 drivers/net/mlx5/mlx5.c|1 +
 drivers/net/mlx5/mlx5.h|2 +
 drivers/net/mlx5/mlx5_flow.c   |  199 
 drivers/net/mlx5/mlx5_flow.h   |  111 ++-
 drivers/net/mlx5/mlx5_flow_hw.c| 1217 +---
 17 files changed, 1908 insertions(+), 168 deletions(-)

-- 
v3: Add ACK to patches in the series.
v4: Squash reformat patches.
v5: Update release notes.
Fix code style.
--
2.39.2



[PATCH v5 01/10] net/mlx5/hws: add support for reformat DevX object

2023-10-25 Thread Gregory Etelson
From: Hamdan Igbaria 

Add support for creation of a packet reformat object
via the ALLOC_PACKET_REFORMAT_CONTEXT command.

Signed-off-by: Hamdan Igbaria 
Acked-by: Suanming Mou 
---
 drivers/common/mlx5/mlx5_prm.h | 39 +
 drivers/net/mlx5/hws/mlx5dr_cmd.c  | 60 ++
 drivers/net/mlx5/hws/mlx5dr_cmd.h  | 11 +
 drivers/net/mlx5/hws/mlx5dr_internal.h |  5 +++
 drivers/net/mlx5/hws/mlx5dr_send.c |  5 ---
 5 files changed, 115 insertions(+), 5 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 6e181a0eca..4192fff55b 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1218,6 +1218,8 @@ enum {
MLX5_CMD_OP_CREATE_FLOW_GROUP = 0x933,
MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY = 0x936,
MLX5_CMD_OP_MODIFY_FLOW_TABLE = 0x93c,
+   MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT = 0x93d,
+   MLX5_CMD_OP_DEALLOC_PACKET_REFORMAT_CONTEXT = 0x93e,
MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
MLX5_CMD_OP_CREATE_GENERAL_OBJECT = 0xa00,
@@ -5191,6 +5193,43 @@ struct mlx5_ifc_modify_flow_table_out_bits {
u8 reserved_at_40[0x60];
 };
 
+struct mlx5_ifc_packet_reformat_context_in_bits {
+   u8 reformat_type[0x8];
+   u8 reserved_at_8[0x4];
+   u8 reformat_param_0[0x4];
+   u8 reserved_at_16[0x6];
+   u8 reformat_data_size[0xa];
+
+   u8 reformat_param_1[0x8];
+   u8 reserved_at_40[0x8];
+   u8 reformat_data[6][0x8];
+
+   u8 more_reformat_data[][0x8];
+};
+
+struct mlx5_ifc_alloc_packet_reformat_context_in_bits {
+   u8 opcode[0x10];
+   u8 uid[0x10];
+
+   u8 reserved_at_20[0x10];
+   u8 op_mod[0x10];
+
+   u8 reserved_at_40[0xa0];
+
+   u8 packet_reformat_context[];
+};
+
+struct mlx5_ifc_alloc_packet_reformat_out_bits {
+   u8 status[0x8];
+   u8 reserved_at_8[0x18];
+
+   u8 syndrome[0x20];
+
+   u8 packet_reformat_id[0x20];
+
+   u8 reserved_at_60[0x20];
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.c 
b/drivers/net/mlx5/hws/mlx5dr_cmd.c
index 594c59aee3..0ccbaee961 100644
--- a/drivers/net/mlx5/hws/mlx5dr_cmd.c
+++ b/drivers/net/mlx5/hws/mlx5dr_cmd.c
@@ -780,6 +780,66 @@ mlx5dr_cmd_sq_create(struct ibv_context *ctx,
return devx_obj;
 }
 
+struct mlx5dr_devx_obj *
+mlx5dr_cmd_packet_reformat_create(struct ibv_context *ctx,
+ struct mlx5dr_cmd_packet_reformat_create_attr 
*attr)
+{
+   uint32_t out[MLX5_ST_SZ_DW(alloc_packet_reformat_out)] = {0};
+   size_t insz, cmd_data_sz, cmd_total_sz;
+   struct mlx5dr_devx_obj *devx_obj;
+   void *prctx;
+   void *pdata;
+   void *in;
+
+   cmd_total_sz = MLX5_ST_SZ_BYTES(alloc_packet_reformat_context_in);
+   cmd_total_sz += MLX5_ST_SZ_BYTES(packet_reformat_context_in);
+   cmd_data_sz = MLX5_FLD_SZ_BYTES(packet_reformat_context_in, 
reformat_data);
+   insz = align(cmd_total_sz + attr->data_sz - cmd_data_sz, DW_SIZE);
+   in = simple_calloc(1, insz);
+   if (!in) {
+   rte_errno = ENOMEM;
+   return NULL;
+   }
+
+   MLX5_SET(alloc_packet_reformat_context_in, in, opcode,
+MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT);
+
+   prctx = MLX5_ADDR_OF(alloc_packet_reformat_context_in, in,
+packet_reformat_context);
+   pdata = MLX5_ADDR_OF(packet_reformat_context_in, prctx, reformat_data);
+
+   MLX5_SET(packet_reformat_context_in, prctx, reformat_type, attr->type);
+   MLX5_SET(packet_reformat_context_in, prctx, reformat_param_0, 
attr->reformat_param_0);
+   MLX5_SET(packet_reformat_context_in, prctx, reformat_data_size, 
attr->data_sz);
+   memcpy(pdata, attr->data, attr->data_sz);
+
+   devx_obj = simple_malloc(sizeof(*devx_obj));
+   if (!devx_obj) {
+   DR_LOG(ERR, "Failed to allocate memory for packet reformat 
object");
+   rte_errno = ENOMEM;
+   goto out_free_in;
+   }
+
+   devx_obj->obj = mlx5_glue->devx_obj_create(ctx, in, insz, out, 
sizeof(out));
+   if (!devx_obj->obj) {
+   DR_LOG(ERR, "Failed to create packet reformat");
+   rte_errno = errno;
+   goto out_free_devx;
+   }
+
+   devx_obj->id = MLX5_GET(alloc_packet_reformat_out, out, 
packet_reformat_id);
+
+   simple_free(in);
+
+   return devx_obj;
+
+out_free_devx:
+   simple_free(devx_obj);
+out_free_in:
+   simple_free(in);
+   return NULL;
+}
+
 int mlx5dr_cmd_sq_modify_rdy(struct mlx5dr_devx_obj *devx_obj)
 {
uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.h 
b/drivers/net/mlx5/hws/mlx5dr_cmd.h
index 8a495db9b3..f45b6c6b07 100644
--- a/drivers/net/mlx5/hws/mlx5dr_cmd.h
+++ b/drivers/net/mlx5/hws/mlx5dr_cmd.h

[PATCH v5 03/10] net/mlx5/hws: add mlx5dr DevX object struct to mlx5dr action

2023-10-25 Thread Gregory Etelson
From: Hamdan Igbaria 

Add mlx5dr_devx_obj struct to mlx5dr_action, so we could hold
the FT obj in dest table action.

Signed-off-by: Hamdan Igbaria 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/hws/mlx5dr_action.c | 4 
 drivers/net/mlx5/hws/mlx5dr_action.h | 3 +++
 drivers/net/mlx5/hws/mlx5dr_table.c  | 1 -
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/hws/mlx5dr_action.c 
b/drivers/net/mlx5/hws/mlx5dr_action.c
index ea9fc23732..55ec4f71c9 100644
--- a/drivers/net/mlx5/hws/mlx5dr_action.c
+++ b/drivers/net/mlx5/hws/mlx5dr_action.c
@@ -787,6 +787,8 @@ mlx5dr_action_create_dest_table(struct mlx5dr_context *ctx,
ret = mlx5dr_action_create_stcs(action, tbl->ft);
if (ret)
goto free_action;
+
+   action->devx_dest.devx_obj = tbl->ft;
}
 
return action;
@@ -864,6 +866,8 @@ mlx5dr_action_create_dest_tir(struct mlx5dr_context *ctx,
ret = mlx5dr_action_create_stcs(action, cur_obj);
if (ret)
goto clean_obj;
+
+   action->devx_dest.devx_obj = cur_obj;
}
 
return action;
diff --git a/drivers/net/mlx5/hws/mlx5dr_action.h 
b/drivers/net/mlx5/hws/mlx5dr_action.h
index 314e289780..104c6880c1 100644
--- a/drivers/net/mlx5/hws/mlx5dr_action.h
+++ b/drivers/net/mlx5/hws/mlx5dr_action.h
@@ -148,6 +148,9 @@ struct mlx5dr_action {
struct {
struct mlx5dv_steering_anchor *sa;
} root_tbl;
+   struct {
+   struct mlx5dr_devx_obj *devx_obj;
+   } devx_dest;
};
};
 
diff --git a/drivers/net/mlx5/hws/mlx5dr_table.c 
b/drivers/net/mlx5/hws/mlx5dr_table.c
index e1150cd75d..91eb92db78 100644
--- a/drivers/net/mlx5/hws/mlx5dr_table.c
+++ b/drivers/net/mlx5/hws/mlx5dr_table.c
@@ -68,7 +68,6 @@ static void mlx5dr_table_down_default_fdb_miss_tbl(struct 
mlx5dr_table *tbl)
return;
 
mlx5dr_cmd_forward_tbl_destroy(default_miss);
-
ctx->common_res[tbl_type].default_miss = NULL;
 }
 
-- 
2.39.2



[PATCH v5 02/10] net/mlx5/hws: support creating of dynamic forward table and FTE

2023-10-25 Thread Gregory Etelson
From: Hamdan Igbaria 

Add the ability to create forward table and FTE.
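
As an illustration, creating and destroying such a table could look like
the minimal sketch below (assuming ibv_ctx is an open ibv_context; the
table type, level and destination values are placeholders):

    struct mlx5dr_cmd_ft_create_attr ft_attr = {
        .type = 0,                      /* placeholder table type */
        .level = 1,                     /* placeholder level */
        .rtc_valid = false,
    };
    struct mlx5dr_cmd_set_fte_attr fte_attr = {
        .action_flags = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
        .destination_type = MLX5_FLOW_DESTINATION_TYPE_VPORT,
        .destination_id = 0,            /* placeholder vport */
    };
    struct mlx5dr_cmd_forward_tbl *tbl =
        mlx5dr_cmd_forward_tbl_create(ibv_ctx, &ft_attr, &fte_attr);

    if (tbl)
        mlx5dr_cmd_forward_tbl_destroy(tbl);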

Signed-off-by: Hamdan Igbaria 
Acked-by: Suanming Mou 
---
 drivers/common/mlx5/mlx5_prm.h|  4 
 drivers/net/mlx5/hws/mlx5dr_cmd.c | 13 +
 drivers/net/mlx5/hws/mlx5dr_cmd.h | 19 +++
 3 files changed, 36 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 4192fff55b..df621b19af 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -5048,7 +5048,11 @@ enum mlx5_flow_destination_type {
 };
 
 enum mlx5_flow_context_action {
+   MLX5_FLOW_CONTEXT_ACTION_DROP = 1 << 1,
MLX5_FLOW_CONTEXT_ACTION_FWD_DEST = 1 << 2,
+   MLX5_FLOW_CONTEXT_ACTION_REFORMAT = 1 << 4,
+   MLX5_FLOW_CONTEXT_ACTION_DECRYPT = 1 << 12,
+   MLX5_FLOW_CONTEXT_ACTION_ENCRYPT = 1 << 13,
 };
 
 enum mlx5_flow_context_flow_source {
diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.c 
b/drivers/net/mlx5/hws/mlx5dr_cmd.c
index 0ccbaee961..8f407f9bce 100644
--- a/drivers/net/mlx5/hws/mlx5dr_cmd.c
+++ b/drivers/net/mlx5/hws/mlx5dr_cmd.c
@@ -42,6 +42,7 @@ mlx5dr_cmd_flow_table_create(struct ibv_context *ctx,
ft_ctx = MLX5_ADDR_OF(create_flow_table_in, in, flow_table_context);
MLX5_SET(flow_table_context, ft_ctx, level, ft_attr->level);
MLX5_SET(flow_table_context, ft_ctx, rtc_valid, ft_attr->rtc_valid);
+   MLX5_SET(flow_table_context, ft_ctx, reformat_en, ft_attr->reformat_en);
 
devx_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out, 
sizeof(out));
if (!devx_obj->obj) {
@@ -182,12 +183,24 @@ mlx5dr_cmd_set_fte(struct ibv_context *ctx,
action_flags = fte_attr->action_flags;
MLX5_SET(flow_context, in_flow_context, action, action_flags);
 
+   if (action_flags & MLX5_FLOW_CONTEXT_ACTION_REFORMAT)
+   MLX5_SET(flow_context, in_flow_context,
+packet_reformat_id, fte_attr->packet_reformat_id);
+
+   if (action_flags & (MLX5_FLOW_CONTEXT_ACTION_DECRYPT | 
MLX5_FLOW_CONTEXT_ACTION_ENCRYPT)) {
+   MLX5_SET(flow_context, in_flow_context,
+encrypt_decrypt_type, fte_attr->encrypt_decrypt_type);
+   MLX5_SET(flow_context, in_flow_context,
+encrypt_decrypt_obj_id, 
fte_attr->encrypt_decrypt_obj_id);
+   }
+
if (action_flags & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
/* Only destination_list_size of size 1 is supported */
MLX5_SET(flow_context, in_flow_context, destination_list_size, 
1);
in_dests = MLX5_ADDR_OF(flow_context, in_flow_context, 
destination);
MLX5_SET(dest_format, in_dests, destination_type, 
fte_attr->destination_type);
MLX5_SET(dest_format, in_dests, destination_id, 
fte_attr->destination_id);
+   MLX5_SET(set_fte_in, in, ignore_flow_level, 
fte_attr->ignore_flow_level);
}
 
devx_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out, 
sizeof(out));
diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.h 
b/drivers/net/mlx5/hws/mlx5dr_cmd.h
index f45b6c6b07..bf3a362300 100644
--- a/drivers/net/mlx5/hws/mlx5dr_cmd.h
+++ b/drivers/net/mlx5/hws/mlx5dr_cmd.h
@@ -7,8 +7,12 @@
 
 struct mlx5dr_cmd_set_fte_attr {
uint32_t action_flags;
+   uint8_t encrypt_decrypt_type;
+   uint32_t encrypt_decrypt_obj_id;
+   uint32_t packet_reformat_id;
uint8_t destination_type;
uint32_t destination_id;
+   uint8_t ignore_flow_level;
uint8_t flow_source;
 };
 
@@ -16,6 +20,7 @@ struct mlx5dr_cmd_ft_create_attr {
uint8_t type;
uint8_t level;
bool rtc_valid;
+   uint8_t reformat_en;
 };
 
 #define ACCESS_KEY_LEN 32
@@ -296,6 +301,20 @@ struct mlx5dr_devx_obj *
 mlx5dr_cmd_packet_reformat_create(struct ibv_context *ctx,
  struct mlx5dr_cmd_packet_reformat_create_attr 
*attr);
 
+struct mlx5dr_devx_obj *
+mlx5dr_cmd_set_fte(struct ibv_context *ctx,
+  uint32_t table_type,
+  uint32_t table_id,
+  uint32_t group_id,
+  struct mlx5dr_cmd_set_fte_attr *fte_attr);
+
+struct mlx5dr_cmd_forward_tbl *
+mlx5dr_cmd_forward_tbl_create(struct ibv_context *ctx,
+ struct mlx5dr_cmd_ft_create_attr *ft_attr,
+ struct mlx5dr_cmd_set_fte_attr *fte_attr);
+
+void mlx5dr_cmd_forward_tbl_destroy(struct mlx5dr_cmd_forward_tbl *tbl);
+
 struct mlx5dr_devx_obj *
 mlx5dr_cmd_alias_obj_create(struct ibv_context *ctx,
struct mlx5dr_cmd_alias_obj_create_attr 
*alias_attr);
-- 
2.39.2



[PATCH v5 04/10] net/mlx5/hws: add support for mirroring

2023-10-25 Thread Gregory Etelson
From: Shun Hao 

This patch supports mirroring by adding a dest_array action. The action
accepts a list containing multiple destination actions, and can duplicate
a packet and forward it to each destination in the list.
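
For illustration, a two-destination mirror could be assembled as in the
minimal sketch below (assuming ctx, dest_a and dest_b were created
beforehand; the action-type chains and the flag value are illustrative):

    enum mlx5dr_action_type t0[] = { MLX5DR_ACTION_TYP_VPORT,
                                     MLX5DR_ACTION_TYP_LAST };
    enum mlx5dr_action_type t1[] = { MLX5DR_ACTION_TYP_TBL,
                                     MLX5DR_ACTION_TYP_LAST };
    struct mlx5dr_action_dest_attr dests[] = {
        { .action_type = t0, .dest = dest_a },
        { .action_type = t1, .dest = dest_b },
    };
    struct mlx5dr_action *mirror =
        mlx5dr_action_create_dest_array(ctx, RTE_DIM(dests), dests,
                                        MLX5DR_ACTION_FLAG_HWS_FDB);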

Signed-off-by: Shun Hao 
Acked-by: Alex Vesker 
Acked-by: Suanming Mou 
---
 drivers/common/mlx5/mlx5_prm.h   |  23 -
 drivers/net/mlx5/hws/mlx5dr.h|  34 +++
 drivers/net/mlx5/hws/mlx5dr_action.c | 130 ++-
 drivers/net/mlx5/hws/mlx5dr_action.h |   3 +
 drivers/net/mlx5/hws/mlx5dr_cmd.c|  64 ++---
 drivers/net/mlx5/hws/mlx5dr_cmd.h|  21 -
 drivers/net/mlx5/hws/mlx5dr_debug.c  |   1 +
 drivers/net/mlx5/hws/mlx5dr_table.c  |   7 +-
 8 files changed, 262 insertions(+), 21 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index df621b19af..aa0b622ca2 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -2320,7 +2320,11 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 };
 
 struct mlx5_ifc_esw_cap_bits {
-   u8 reserved_at_0[0x60];
+   u8 reserved_at_0[0x1d];
+   u8 merged_eswitch[0x1];
+   u8 reserved_at_1e[0x2];
+
+   u8 reserved_at_20[0x40];
 
u8 esw_manager_vport_number_valid[0x1];
u8 reserved_at_61[0xf];
@@ -5045,6 +5049,7 @@ struct mlx5_ifc_query_flow_table_out_bits {
 enum mlx5_flow_destination_type {
MLX5_FLOW_DESTINATION_TYPE_VPORT = 0x0,
MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE = 0x1,
+   MLX5_FLOW_DESTINATION_TYPE_TIR = 0x2,
 };
 
 enum mlx5_flow_context_action {
@@ -5088,6 +5093,19 @@ union mlx5_ifc_dest_format_flow_counter_list_auto_bits {
u8 reserved_at_0[0x40];
 };
 
+struct mlx5_ifc_extended_dest_format_bits {
+   struct mlx5_ifc_dest_format_bits destination_entry;
+
+   u8 packet_reformat_id[0x20];
+
+   u8 reserved_at_60[0x20];
+};
+
+#define MLX5_IFC_MULTI_PATH_FT_MAX_LEVEL 64
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
 struct mlx5_ifc_flow_context_bits {
u8 reserved_at_00[0x20];
u8 group_id[0x20];
@@ -5106,8 +5124,7 @@ struct mlx5_ifc_flow_context_bits {
u8 reserved_at_e0[0x40];
u8 encrypt_decrypt_obj_id[0x20];
u8 reserved_at_140[0x16c0];
-   /* Currently only one destnation */
-   union mlx5_ifc_dest_format_flow_counter_list_auto_bits destination[1];
+   union mlx5_ifc_dest_format_flow_counter_list_auto_bits destination[0];
 };
 
 struct mlx5_ifc_set_fte_in_bits {
diff --git a/drivers/net/mlx5/hws/mlx5dr.h b/drivers/net/mlx5/hws/mlx5dr.h
index ea8bf683f3..1995c55132 100644
--- a/drivers/net/mlx5/hws/mlx5dr.h
+++ b/drivers/net/mlx5/hws/mlx5dr.h
@@ -46,6 +46,7 @@ enum mlx5dr_action_type {
MLX5DR_ACTION_TYP_ASO_METER,
MLX5DR_ACTION_TYP_ASO_CT,
MLX5DR_ACTION_TYP_DEST_ROOT,
+   MLX5DR_ACTION_TYP_DEST_ARRAY,
MLX5DR_ACTION_TYP_MAX,
 };
 
@@ -213,6 +214,20 @@ struct mlx5dr_rule_action {
};
 };
 
+struct mlx5dr_action_dest_attr {
+   /* Required action combination */
+   enum mlx5dr_action_type *action_type;
+
+   /* Required destination action to forward the packet */
+   struct mlx5dr_action *dest;
+
+   /* Optional reformat data */
+   struct {
+   size_t reformat_data_sz;
+   void *reformat_data;
+   } reformat;
+};
+
 /* Open a context used for direct rule insertion using hardware steering.
  * Each context can contain multiple tables of different types.
  *
@@ -616,6 +631,25 @@ mlx5dr_action_create_pop_vlan(struct mlx5dr_context *ctx, 
uint32_t flags);
 struct mlx5dr_action *
 mlx5dr_action_create_push_vlan(struct mlx5dr_context *ctx, uint32_t flags);
 
+/* Create a dest array action, this action can duplicate packets and forward to
+ * multiple destinations in the destination list.
+ * @param[in] ctx
+ * The context in which the new action will be created.
+ * @param[in] num_dest
+ * The number of dests attributes.
+ * @param[in] dests
+ * The destination array. Each contains a destination action and can have
+ * additional actions.
+ * @param[in] flags
+ * Action creation flags. (enum mlx5dr_action_flags)
+ * @return pointer to mlx5dr_action on success NULL otherwise.
+ */
+struct mlx5dr_action *
+mlx5dr_action_create_dest_array(struct mlx5dr_context *ctx,
+   size_t num_dest,
+   struct mlx5dr_action_dest_attr *dests,
+   uint32_t flags);
+
 /* Create dest root table, this action will jump to root table according
  * the given priority.
  * @param[in] ctx
diff --git a/drivers/net/mlx5/hws/mlx5dr_action.c 
b/drivers/net/mlx5/hws/mlx5dr_action.c
index 55ec4f71c9..f068bc7e9c 100644
--- a/drivers/net/mlx5/hws/mlx5dr_action.c
+++ b/drivers/net/mlx5/hws/mlx5dr_action.c
@@ -34,7 +34,8 @@ static const uint32_t 
action_order_arr[MLX5DR_TABLE_TYPE_MAX][MLX5DR_ACTION_TYP_
BIT(MLX5DR_ACTION_TYP_MISS) 

[PATCH v5 05/10] net/mlx5/hws: allow destination into default miss FT

2023-10-25 Thread Gregory Etelson
From: Erez Shitrit 

In FDB, the miss action directs the packet into the hypervisor vport.
That allows the user to mirror packets into the default-miss vport.

Signed-off-by: Erez Shitrit 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/hws/mlx5dr_action.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/mlx5/hws/mlx5dr_action.c 
b/drivers/net/mlx5/hws/mlx5dr_action.c
index f068bc7e9c..6b62111593 100644
--- a/drivers/net/mlx5/hws/mlx5dr_action.c
+++ b/drivers/net/mlx5/hws/mlx5dr_action.c
@@ -1769,6 +1769,17 @@ mlx5dr_action_create_dest_array(struct mlx5dr_context 
*ctx,
fte_attr.action_flags |= 
MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
fte_attr.ignore_flow_level = 1;
break;
+   case MLX5DR_ACTION_TYP_MISS:
+   if (table_type != MLX5DR_TABLE_TYPE_FDB) {
+   DR_LOG(ERR, "Miss action supported for 
FDB only");
+   rte_errno = ENOTSUP;
+   goto free_dest_list;
+   }
+   dest_list[i].destination_type = 
MLX5_FLOW_DESTINATION_TYPE_VPORT;
+   dest_list[i].destination_id =
+   ctx->caps->eswitch_manager_vport_number;
+   fte_attr.action_flags |= 
MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+   break;
case MLX5DR_ACTION_TYP_VPORT:
dest_list[i].destination_type = 
MLX5_FLOW_DESTINATION_TYPE_VPORT;
dest_list[i].destination_id = 
dests[i].dest->vport.vport_num;
-- 
2.39.2



[PATCH v5 06/10] net/mlx5/hws: support reformat for hws mirror

2023-10-25 Thread Gregory Etelson
From: Haifei Luo 

In the dest_array action, an optional reformat action can be applied to
each destination. This patch adds that support by using the extended
destination entry.
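
For illustration, the per-destination reformat is carried in the dest
attribute, roughly as in the sketch below (encap_hdr/encap_len and
vport_action are invented placeholders):

    enum mlx5dr_action_type t[] = { MLX5DR_ACTION_TYP_REFORMAT_L2_TO_TNL_L2,
                                    MLX5DR_ACTION_TYP_VPORT,
                                    MLX5DR_ACTION_TYP_LAST };
    struct mlx5dr_action_dest_attr d = {
        .action_type = t,
        .dest = vport_action,           /* assumed pre-created action */
        .reformat = {
            .reformat_data = encap_hdr, /* placeholder encap header */
            .reformat_data_sz = encap_len,
        },
    };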

Signed-off-by: Haifei Luo 
Signed-off-by: Shun Hao 
Acked-by: Suanming Mou 
---
 drivers/common/mlx5/mlx5_prm.h   | 15 +++
 drivers/net/mlx5/hws/mlx5dr_action.c | 67 +++-
 drivers/net/mlx5/hws/mlx5dr_action.h |  2 +
 drivers/net/mlx5/hws/mlx5dr_cmd.c| 10 -
 drivers/net/mlx5/hws/mlx5dr_cmd.h|  2 +
 5 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index aa0b622ca2..bced5a59dd 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -5352,6 +5352,21 @@ enum mlx5_parse_graph_arc_node_index {
MLX5_GRAPH_ARC_NODE_PROGRAMMABLE = 0x1f,
 };
 
+enum mlx5_packet_reformat_context_reformat_type {
+   MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_L2_TO_L2_TUNNEL = 0x2,
+   MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_L2_TO_L3_TUNNEL = 0x4,
+   MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_IPV4 
= 0x5,
+   MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_L2_TO_L3_ESP_TUNNEL = 0x6,
+   MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_UDPV4 
= 0x7,
+   MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_DEL_ESP_TRANSPORT = 0x8,
+   MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_L3_ESP_TUNNEL_TO_L2 = 0x9,
+   MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_DEL_ESP_TRANSPORT_OVER_UDP = 
0xA,
+   MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_IPV6 
= 0xB,
+   MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_UDPV6 
= 0xC,
+   MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_ADD_NISP_TNL = 0xD,
+   MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_REMOVE_NISP_TNL = 0xE,
+};
+
 #define MLX5_PARSE_GRAPH_FLOW_SAMPLE_MAX 8
 #define MLX5_PARSE_GRAPH_IN_ARC_MAX 8
 #define MLX5_PARSE_GRAPH_OUT_ARC_MAX 8
diff --git a/drivers/net/mlx5/hws/mlx5dr_action.c 
b/drivers/net/mlx5/hws/mlx5dr_action.c
index 6b62111593..11a7c58925 100644
--- a/drivers/net/mlx5/hws/mlx5dr_action.c
+++ b/drivers/net/mlx5/hws/mlx5dr_action.c
@@ -1703,6 +1703,44 @@ mlx5dr_action_create_modify_header(struct mlx5dr_context 
*ctx,
return NULL;
 }
 
+static struct mlx5dr_devx_obj *
+mlx5dr_action_dest_array_process_reformat(struct mlx5dr_context *ctx,
+ enum mlx5dr_action_type type,
+ void *reformat_data,
+ size_t reformat_data_sz)
+{
+   struct mlx5dr_cmd_packet_reformat_create_attr pr_attr = {0};
+   struct mlx5dr_devx_obj *reformat_devx_obj;
+
+   if (!reformat_data || !reformat_data_sz) {
+   DR_LOG(ERR, "Empty reformat action or data");
+   rte_errno = EINVAL;
+   return NULL;
+   }
+
+   switch (type) {
+   case MLX5DR_ACTION_TYP_REFORMAT_L2_TO_TNL_L2:
+   pr_attr.type = 
MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_L2_TO_L2_TUNNEL;
+   break;
+   case MLX5DR_ACTION_TYP_REFORMAT_L2_TO_TNL_L3:
+   pr_attr.type = 
MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_L2_TO_L3_TUNNEL;
+   break;
+   default:
+   DR_LOG(ERR, "Invalid value for reformat type");
+   rte_errno = EINVAL;
+   return NULL;
+   }
+   pr_attr.reformat_param_0 = 0;
+   pr_attr.data_sz = reformat_data_sz;
+   pr_attr.data = reformat_data;
+
+   reformat_devx_obj = mlx5dr_cmd_packet_reformat_create(ctx->ibv_ctx, 
&pr_attr);
+   if (!reformat_devx_obj)
+   return NULL;
+
+   return reformat_devx_obj;
+}
+
 struct mlx5dr_action *
 mlx5dr_action_create_dest_array(struct mlx5dr_context *ctx,
size_t num_dest,
@@ -1710,6 +1748,7 @@ mlx5dr_action_create_dest_array(struct mlx5dr_context 
*ctx,
uint32_t flags)
 {
struct mlx5dr_cmd_set_fte_dest *dest_list = NULL;
+   struct mlx5dr_devx_obj *packet_reformat = NULL;
struct mlx5dr_cmd_ft_create_attr ft_attr = {0};
struct mlx5dr_cmd_set_fte_attr fte_attr = {0};
struct mlx5dr_cmd_forward_tbl *fw_island;
@@ -1796,6 +1835,21 @@ mlx5dr_action_create_dest_array(struct mlx5dr_context 
*ctx,
dest_list[i].destination_id = 
dests[i].dest->devx_dest.devx_obj->id;
fte_attr.action_flags |= 
MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
break;
+   case MLX5DR_ACTION_TYP_REFORMAT_L2_TO_TNL_L2:
+   case MLX5DR_ACTION_TYP_REFORMAT_L2_TO_TNL_L3:
+   packet_reformat = 
mlx5dr_action_dest_array_process_reformat
+   (ctx,
+   

[PATCH v5 07/10] net/mlx5: reformat HWS code for HWS mirror action

2023-10-25 Thread Gregory Etelson
Reformat HWS code for HWS mirror action.

Signed-off-by: Gregory Etelson 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/mlx5_flow_hw.c | 70 ++---
 1 file changed, 39 insertions(+), 31 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 6fcf654e4a..b2215fb5cf 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -4548,6 +4548,17 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] 
= {
[RTE_FLOW_ACTION_TYPE_SEND_TO_KERNEL] = MLX5DR_ACTION_TYP_DEST_ROOT,
 };
 
+static inline void
+action_template_set_type(struct rte_flow_actions_template *at,
+enum mlx5dr_action_type *action_types,
+unsigned int action_src, uint16_t *curr_off,
+enum mlx5dr_action_type type)
+{
+   at->actions_off[action_src] = *curr_off;
+   action_types[*curr_off] = type;
+   *curr_off = *curr_off + 1;
+}
+
 static int
 flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
  unsigned int action_src,
@@ -4565,9 +4576,8 @@ flow_hw_dr_actions_template_handle_shared(const struct 
rte_flow_action *mask,
type = mask->type;
switch (type) {
case RTE_FLOW_ACTION_TYPE_RSS:
-   at->actions_off[action_src] = *curr_off;
-   action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
-   *curr_off = *curr_off + 1;
+   action_template_set_type(at, action_types, action_src, curr_off,
+MLX5DR_ACTION_TYP_TIR);
break;
case RTE_FLOW_ACTION_TYPE_AGE:
case RTE_FLOW_ACTION_TYPE_COUNT:
@@ -4575,23 +4585,20 @@ flow_hw_dr_actions_template_handle_shared(const struct 
rte_flow_action *mask,
 * Both AGE and COUNT action need counter, the first one fills
 * the action_types array, and the second only saves the offset.
 */
-   if (*cnt_off == UINT16_MAX) {
-   *cnt_off = *curr_off;
-   action_types[*cnt_off] = MLX5DR_ACTION_TYP_CTR;
-   *curr_off = *curr_off + 1;
-   }
+   if (*cnt_off == UINT16_MAX)
+   action_template_set_type(at, action_types,
+action_src, curr_off,
+MLX5DR_ACTION_TYP_CTR);
at->actions_off[action_src] = *cnt_off;
break;
case RTE_FLOW_ACTION_TYPE_CONNTRACK:
-   at->actions_off[action_src] = *curr_off;
-   action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
-   *curr_off = *curr_off + 1;
+   action_template_set_type(at, action_types, action_src, curr_off,
+MLX5DR_ACTION_TYP_ASO_CT);
break;
case RTE_FLOW_ACTION_TYPE_QUOTA:
case RTE_FLOW_ACTION_TYPE_METER_MARK:
-   at->actions_off[action_src] = *curr_off;
-   action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_METER;
-   *curr_off = *curr_off + 1;
+   action_template_set_type(at, action_types, action_src, curr_off,
+MLX5DR_ACTION_TYP_ASO_METER);
break;
default:
DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
@@ -5101,31 +5108,32 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
at->reformat_off = UINT16_MAX;
at->mhdr_off = UINT16_MAX;
at->rx_cpy_pos = pos;
-   /*
-* mlx5 PMD hacks indirect action index directly to the action conf.
-* The rte_flow_conv() function copies the content from conf pointer.
-* Need to restore the indirect action index from action conf here.
-*/
for (i = 0; actions->type != RTE_FLOW_ACTION_TYPE_END;
 actions++, masks++, i++) {
-   if (actions->type == RTE_FLOW_ACTION_TYPE_INDIRECT) {
+   const struct rte_flow_action_modify_field *info;
+
+   switch (actions->type) {
+   /*
+* mlx5 PMD hacks indirect action index directly to the action 
conf.
+* The rte_flow_conv() function copies the content from conf 
pointer.
+* Need to restore the indirect action index from action conf 
here.
+*/
+   case RTE_FLOW_ACTION_TYPE_INDIRECT:
at->actions[i].conf = actions->conf;
at->masks[i].conf = masks->conf;
-   }
-   if (actions->type == RTE_FLOW_ACTION_TYPE_MODIFY_FIELD) {
-   const struct rte_flow_action_modify_field *info = 
actions->conf;
-
+   break;
+   case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+   i

[PATCH v5 08/10] net/mlx5: support HWS mirror action

2023-10-25 Thread Gregory Etelson
HWS mirror clones the original packet to one or two destinations and
proceeds with the original packet path.

The mirror has no dedicated RTE flow action type.
A mirror object is referenced by the INDIRECT_LIST action.
INDIRECT_LIST for a mirror is built from the actions list:

SAMPLE [/ SAMPLE] / <original packet destination> / END

The mirror SAMPLE action defines a packet clone. It specifies the clone
destination and an optional clone reformat action.
The destination action for both the clone and the original packet depends
on the HCA domain:
- for NIC RX, the destination is either RSS or QUEUE
- for FDB, the destination is PORT

HWS mirror was implemented with the INDIRECT_LIST flow action.

MLX5 PMD defines a general `struct mlx5_indirect_list` type for all
INDIRECT_LIST handler objects:

struct mlx5_indirect_list {
enum mlx5_indirect_list_type type;
LIST_ENTRY(mlx5_indirect_list) chain;
char data[];
};

A specific INDIRECT_LIST type must overload `mlx5_indirect_list::data`
and provide a unique `type` value.
The PMD returns a pointer to the `mlx5_indirect_list` object.
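
As a hypothetical illustration (the payload struct and allocation below
are invented for the example; error handling omitted), a specific handler
type places its type-specific state in `data`:

    /* Invented payload for some specific INDIRECT_LIST type. */
    struct mirror_payload_example {
        uint32_t clones_num;
    };

    /* Allocate the generic header plus the type-specific payload. */
    struct mlx5_indirect_list *e =
        calloc(1, sizeof(*e) + sizeof(struct mirror_payload_example));
    e->type = MLX5_INDIRECT_ACTION_LIST_TYPE_MIRROR;
    struct mirror_payload_example *m = (void *)e->data;
    m->clones_num = 2;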

The existing non-masked actions template API cannot identify the flow
actions in an INDIRECT_LIST handler, because a single INDIRECT_LIST
handler can represent several flow actions.

For example:
A: SAMPLE / JUMP
B: SAMPLE / SAMPLE / RSS

Actions template command

template indirect_list / end mask indirect_list 0 / end

does not provide any information to differentiate between flow
actions in A and B.

MLX5 PMD therefore requires an INDIRECT_LIST configuration parameter in
the template section:

Non-masked INDIRECT_LIST API:
=

template indirect_list X / end mask indirect_list 0 / end

PMD identifies the type of handler X and will use the same type in
template creation. Actual parameters for the actions in the list will
be extracted from the flow configuration.

Masked INDIRECT_LIST API:
=

template indirect_list X / end mask indirect_list -1UL / end

PMD creates the action template from the action types and configurations
referenced by X.

An INDIRECT_LIST action without configuration is invalid and will be
rejected by the PMD.

Signed-off-by: Gregory Etelson 
Acked-by: Suanming Mou 
---
 doc/guides/nics/features/mlx5.ini  |   1 +
 doc/guides/rel_notes/release_23_11.rst |   1 +
 drivers/net/mlx5/mlx5.c|   1 +
 drivers/net/mlx5/mlx5.h|   2 +
 drivers/net/mlx5/mlx5_flow.c   | 134 ++
 drivers/net/mlx5/mlx5_flow.h   |  69 ++-
 drivers/net/mlx5/mlx5_flow_hw.c| 615 -
 7 files changed, 818 insertions(+), 5 deletions(-)

diff --git a/doc/guides/nics/features/mlx5.ini 
b/doc/guides/nics/features/mlx5.ini
index fc67415c6c..a85d755734 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -106,6 +106,7 @@ drop = Y
 flag = Y
 inc_tcp_ack  = Y
 inc_tcp_seq  = Y
+indirect_list= Y
 jump = Y
 mark = Y
 meter= Y
diff --git a/doc/guides/rel_notes/release_23_11.rst 
b/doc/guides/rel_notes/release_23_11.rst
index 0a6fc76a9d..81d606e773 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -143,6 +143,7 @@ New Features
 * **Updated NVIDIA mlx5 net driver.**
 
   * Added support for Network Service Header (NSH) flow matching.
+  * Added support for ``RTE_FLOW_ACTION_TYPE_INDIRECT_LIST`` flow action.
 
 * **Updated Solarflare net driver.**
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 997df595d0..08b7b03365 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2168,6 +2168,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
/* Free the eCPRI flex parser resource. */
mlx5_flex_parser_ecpri_release(dev);
mlx5_flex_item_port_cleanup(dev);
+   mlx5_indirect_list_handles_release(dev);
 #ifdef HAVE_MLX5_HWS_SUPPORT
flow_hw_destroy_vport_action(dev);
flow_hw_resource_release(dev);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0b709a1bda..f3b872f59c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1791,6 +1791,8 @@ struct mlx5_priv {
LIST_HEAD(ind_tables, mlx5_ind_table_obj) ind_tbls;
/* Standalone indirect tables. */
LIST_HEAD(stdl_ind_tables, mlx5_ind_table_obj) standalone_ind_tbls;
+   /* Objects created with indirect list action */
+   LIST_HEAD(indirect_list, mlx5_indirect_list) indirect_list_head;
/* Pointer to next element. */
rte_rwlock_t ind_tbls_lock;
uint32_t refcnt; /**< Reference counter. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 8ad85e6027..99b814d815 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -62,6 +62,30 @@ struct tunnel_default_miss_ctx {
};
 };
 
+void
+mlx5_indirect_list_handles_release(str

[PATCH v5 09/10] net/mlx5: reformat HWS code for indirect list actions

2023-10-25 Thread Gregory Etelson
Reformat HWS code for indirect list actions.

Signed-off-by: Gregory Etelson 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/mlx5_flow.h|   4 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 250 +---
 2 files changed, 139 insertions(+), 115 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 580db80fd4..653f83cf55 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1331,11 +1331,11 @@ struct rte_flow_actions_template {
uint64_t action_flags; /* Bit-map of all valid action in template. */
uint16_t dr_actions_num; /* Amount of DR rules actions. */
uint16_t actions_num; /* Amount of flow actions */
-   uint16_t *actions_off; /* DR action offset for given rte action offset. 
*/
+   uint16_t *dr_off; /* DR action offset for given rte action offset. */
+   uint16_t *src_off; /* RTE action displacement from app. template */
uint16_t reformat_off; /* Offset of DR reformat action. */
uint16_t mhdr_off; /* Offset of DR modify header action. */
uint32_t refcnt; /* Reference counter. */
-   uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
uint8_t flex_item; /* flex item index. */
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 1c3d915be1..f9f735ba75 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -1015,11 +1015,11 @@ flow_hw_modify_field_init(struct 
mlx5_hw_modify_header_action *mhdr,
 static __rte_always_inline int
 flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 const struct rte_flow_attr *attr,
-const struct rte_flow_action *action_start, /* 
Start of AT actions. */
 const struct rte_flow_action *action, /* Current 
action from AT. */
 const struct rte_flow_action *action_mask, /* 
Current mask from AT. */
 struct mlx5_hw_actions *acts,
 struct mlx5_hw_modify_header_action *mhdr,
+uint16_t src_pos,
 struct rte_flow_error *error)
 {
struct mlx5_priv *priv = dev->data->dev_private;
@@ -1122,7 +1122,7 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
if (shared)
return 0;
ret = __flow_hw_act_data_hdr_modify_append(priv, acts, 
RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
-  action - action_start, 
mhdr->pos,
+  src_pos, mhdr->pos,
   cmds_start, cmds_end, shared,
   field, dcopy, mask);
if (ret)
@@ -1181,11 +1181,10 @@ flow_hw_validate_compiled_modify_field(struct 
rte_eth_dev *dev,
 static int
 flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 const struct rte_flow_attr *attr,
-const struct rte_flow_action *action_start,
 const struct rte_flow_action *action,
 const struct rte_flow_action *action_mask,
 struct mlx5_hw_actions *acts,
-uint16_t action_dst,
+uint16_t action_src, uint16_t action_dst,
 struct rte_flow_error *error)
 {
struct mlx5_priv *priv = dev->data->dev_private;
@@ -1241,7 +1240,7 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
} else {
ret = __flow_hw_act_data_general_append
(priv, acts, action->type,
-action - action_start, action_dst);
+action_src, action_dst);
if (ret)
return rte_flow_error_set
(error, ENOMEM,
@@ -1493,7 +1492,6 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
const struct rte_flow_attr *attr = &table_attr->flow_attr;
struct rte_flow_action *actions = at->actions;
-   struct rte_flow_action *action_start = actions;
struct rte_flow_action *masks = at->masks;
enum mlx5dr_action_type refmt_type = MLX5DR_ACTION_TYP_LAST;
const struct rte_flow_action_raw_encap *raw_encap_data;
@@ -1506,7 +1504,6 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
uint32_t type;
bool reformat_used = false;
unsigned int of_vlan_offset;
-   uint16_t action_pos;
uint16_t jump_pos;
uint32_t ct_idx;
int ret, err;
@@ -1521,71 +1518,69 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
else
type = MLX5DR_

[PATCH v5 10/10] net/mlx5: support indirect list METER_MARK action

2023-10-25 Thread Gregory Etelson
Signed-off-by: Gregory Etelson 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/mlx5_flow.c|  69 -
 drivers/net/mlx5/mlx5_flow.h|  70 --
 drivers/net/mlx5/mlx5_flow_hw.c | 432 +++-
 3 files changed, 485 insertions(+), 86 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 99b814d815..34252d66c0 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -75,8 +75,11 @@ mlx5_indirect_list_handles_release(struct rte_eth_dev *dev)
switch (e->type) {
 #ifdef HAVE_MLX5_HWS_SUPPORT
case MLX5_INDIRECT_ACTION_LIST_TYPE_MIRROR:
-   mlx5_hw_mirror_destroy(dev, (struct mlx5_mirror *)e, 
true);
+   mlx5_hw_mirror_destroy(dev, (struct mlx5_mirror *)e);
break;
+   case MLX5_INDIRECT_ACTION_LIST_TYPE_LEGACY:
+   mlx5_destroy_legacy_indirect(dev, e);
+   break;
 #endif
default:
DRV_LOG(ERR, "invalid indirect list type");
@@ -1169,7 +1172,24 @@ mlx5_flow_async_action_list_handle_destroy
 const struct rte_flow_op_attr *op_attr,
 struct rte_flow_action_list_handle *action_handle,
 void *user_data, struct rte_flow_error *error);
-
+static int
+mlx5_flow_action_list_handle_query_update(struct rte_eth_dev *dev,
+ const
+ struct rte_flow_action_list_handle 
*handle,
+ const void **update, void **query,
+ enum rte_flow_query_update_mode mode,
+ struct rte_flow_error *error);
+static int
+mlx5_flow_async_action_list_handle_query_update(struct rte_eth_dev *dev,
+   uint32_t queue_id,
+   const struct rte_flow_op_attr 
*attr,
+   const struct
+   rte_flow_action_list_handle 
*handle,
+   const void **update,
+   void **query,
+   enum rte_flow_query_update_mode 
mode,
+   void *user_data,
+   struct rte_flow_error *error);
 static const struct rte_flow_ops mlx5_flow_ops = {
.validate = mlx5_flow_validate,
.create = mlx5_flow_create,
@@ -1219,6 +1239,10 @@ static const struct rte_flow_ops mlx5_flow_ops = {
mlx5_flow_async_action_list_handle_create,
.async_action_list_handle_destroy =
mlx5_flow_async_action_list_handle_destroy,
+   .action_list_handle_query_update =
+   mlx5_flow_action_list_handle_query_update,
+   .async_action_list_handle_query_update =
+   mlx5_flow_async_action_list_handle_query_update,
 };
 
 /* Tunnel information. */
@@ -11003,6 +11027,47 @@ mlx5_flow_async_action_list_handle_destroy
  error);
 }
 
+static int
+mlx5_flow_action_list_handle_query_update(struct rte_eth_dev *dev,
+ const
+ struct rte_flow_action_list_handle 
*handle,
+ const void **update, void **query,
+ enum rte_flow_query_update_mode mode,
+ struct rte_flow_error *error)
+{
+   const struct mlx5_flow_driver_ops *fops;
+
+   MLX5_DRV_FOPS_OR_ERR(dev, fops,
+action_list_handle_query_update, ENOTSUP);
+   return fops->action_list_handle_query_update(dev, handle, update, query,
+mode, error);
+}
+
+static int
+mlx5_flow_async_action_list_handle_query_update(struct rte_eth_dev *dev,
+   uint32_t queue_id,
+   const
+   struct rte_flow_op_attr 
*op_attr,
+   const struct
+   rte_flow_action_list_handle 
*handle,
+   const void **update,
+   void **query,
+   enum
+   rte_flow_query_update_mode mode,
+   void *user_data,
+   struct rte_flow_error *error)
+{
+   const struct mlx5_flow_driver_ops *fops;
+
+   MLX5_

Re: [PATCH v1 0/1] doc: bbdev device discovery clarification

2023-10-25 Thread Maxime Coquelin




On 10/10/23 22:34, Nicolas Chautru wrote:

Adding more information to the bbdev documentation related to the
bbdev device discovery from info_get, which was not very verbose so far.
Notably for FEC and FFT operations, which have extra parameters to
manage different implementation variants.

Also use code snippets to refer to the info structure and keep the doc
in sync moving forward.

This is on top of this series
https://patches.dpdk.org/project/dpdk/list/?series=29744

Nicolas Chautru (1):
   doc: bbdev device discovery clarification

  doc/guides/prog_guide/bbdev.rst | 60 -
  lib/bbdev/rte_bbdev.h   |  6 
  2 files changed, 57 insertions(+), 9 deletions(-)



Applied to next-baseband/for-main.

Thanks,
Maxime



RE: [PATCH] net/mlx5: fix wrong decap action checking in sample flow

2023-10-25 Thread Raslan Darawsheh
Hi,
> -Original Message-
> From: Jiawei(Jonny) Wang 
> Sent: Wednesday, October 11, 2023 9:37 AM
> To: Suanming Mou ; Slava Ovsiienko
> 
> Cc: dev@dpdk.org; Raslan Darawsheh ;
> sta...@dpdk.org
> Subject: [PATCH] net/mlx5: fix wrong decap action checking in sample flow
> 
> This patch uses a temp variable to check the current action type, to avoid
> overlapping the sample action following the decap.
> 
> Fixes: 7356aec64c48 ("net/mlx5: fix mirror flow split with L3 encapsulation")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Jiawei Wang 
> Acked-by: Suanming Mou 

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh



[PATCH v6 00/10] net/mlx5: support indirect actions list

2023-10-25 Thread Gregory Etelson
Add MLX5 PMD support for indirect actions list.

Erez Shitrit (1):
  net/mlx5/hws: allow destination into default miss FT

Gregory Etelson (4):
  net/mlx5: reformat HWS code for HWS mirror action
  net/mlx5: support HWS mirror action
  net/mlx5: reformat HWS code for indirect list actions
  net/mlx5: support indirect list METER_MARK action

Haifei Luo (1):
  net/mlx5/hws: support reformat for hws mirror

Hamdan Igbaria (3):
  net/mlx5/hws: add support for reformat DevX object
  net/mlx5/hws: support creating of dynamic forward table and FTE
  net/mlx5/hws: add mlx5dr DevX object struct to mlx5dr action

Shun Hao (1):
  net/mlx5/hws: add support for mirroring

 doc/guides/nics/features/mlx5.ini  |1 +
 doc/guides/rel_notes/release_23_11.rst |1 +
 drivers/common/mlx5/mlx5_prm.h |   81 +-
 drivers/net/mlx5/hws/mlx5dr.h  |   34 +
 drivers/net/mlx5/hws/mlx5dr_action.c   |  210 +++-
 drivers/net/mlx5/hws/mlx5dr_action.h   |8 +
 drivers/net/mlx5/hws/mlx5dr_cmd.c  |  143 ++-
 drivers/net/mlx5/hws/mlx5dr_cmd.h  |   49 +-
 drivers/net/mlx5/hws/mlx5dr_debug.c|1 +
 drivers/net/mlx5/hws/mlx5dr_internal.h |5 +
 drivers/net/mlx5/hws/mlx5dr_send.c |5 -
 drivers/net/mlx5/hws/mlx5dr_table.c|8 +-
 drivers/net/mlx5/mlx5.c|1 +
 drivers/net/mlx5/mlx5.h|2 +
 drivers/net/mlx5/mlx5_flow.c   |  199 
 drivers/net/mlx5/mlx5_flow.h   |  111 ++-
 drivers/net/mlx5/mlx5_flow_hw.c| 1217 +---
 17 files changed, 1908 insertions(+), 168 deletions(-)

--
v3: Add ACK to patches in the series.
v4: Squash reformat patches.
v5: Update release notes.
Fix code style.
v6: Fix code style.
--
2.39.2



[PATCH v6 01/10] net/mlx5/hws: add support for reformat DevX object

2023-10-25 Thread Gregory Etelson
From: Hamdan Igbaria 

Add support for creation of packet reformat object,
via the ALLOC_PACKET_REFORMAT_CONTEXT command.

Signed-off-by: Hamdan Igbaria 
Acked-by: Suanming Mou 
---
 drivers/common/mlx5/mlx5_prm.h | 39 +
 drivers/net/mlx5/hws/mlx5dr_cmd.c  | 60 ++
 drivers/net/mlx5/hws/mlx5dr_cmd.h  | 11 +
 drivers/net/mlx5/hws/mlx5dr_internal.h |  5 +++
 drivers/net/mlx5/hws/mlx5dr_send.c |  5 ---
 5 files changed, 115 insertions(+), 5 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 6e181a0eca..4192fff55b 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1218,6 +1218,8 @@ enum {
MLX5_CMD_OP_CREATE_FLOW_GROUP = 0x933,
MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY = 0x936,
MLX5_CMD_OP_MODIFY_FLOW_TABLE = 0x93c,
+   MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT = 0x93d,
+   MLX5_CMD_OP_DEALLOC_PACKET_REFORMAT_CONTEXT = 0x93e,
MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
MLX5_CMD_OP_CREATE_GENERAL_OBJECT = 0xa00,
@@ -5191,6 +5193,43 @@ struct mlx5_ifc_modify_flow_table_out_bits {
u8 reserved_at_40[0x60];
 };
 
+struct mlx5_ifc_packet_reformat_context_in_bits {
+   u8 reformat_type[0x8];
+   u8 reserved_at_8[0x4];
+   u8 reformat_param_0[0x4];
+   u8 reserved_at_16[0x6];
+   u8 reformat_data_size[0xa];
+
+   u8 reformat_param_1[0x8];
+   u8 reserved_at_40[0x8];
+   u8 reformat_data[6][0x8];
+
+   u8 more_reformat_data[][0x8];
+};
+
+struct mlx5_ifc_alloc_packet_reformat_context_in_bits {
+   u8 opcode[0x10];
+   u8 uid[0x10];
+
+   u8 reserved_at_20[0x10];
+   u8 op_mod[0x10];
+
+   u8 reserved_at_40[0xa0];
+
+   u8 packet_reformat_context[];
+};
+
+struct mlx5_ifc_alloc_packet_reformat_out_bits {
+   u8 status[0x8];
+   u8 reserved_at_8[0x18];
+
+   u8 syndrome[0x20];
+
+   u8 packet_reformat_id[0x20];
+
+   u8 reserved_at_60[0x20];
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.c 
b/drivers/net/mlx5/hws/mlx5dr_cmd.c
index 594c59aee3..0ccbaee961 100644
--- a/drivers/net/mlx5/hws/mlx5dr_cmd.c
+++ b/drivers/net/mlx5/hws/mlx5dr_cmd.c
@@ -780,6 +780,66 @@ mlx5dr_cmd_sq_create(struct ibv_context *ctx,
return devx_obj;
 }
 
+struct mlx5dr_devx_obj *
+mlx5dr_cmd_packet_reformat_create(struct ibv_context *ctx,
+ struct mlx5dr_cmd_packet_reformat_create_attr 
*attr)
+{
+   uint32_t out[MLX5_ST_SZ_DW(alloc_packet_reformat_out)] = {0};
+   size_t insz, cmd_data_sz, cmd_total_sz;
+   struct mlx5dr_devx_obj *devx_obj;
+   void *prctx;
+   void *pdata;
+   void *in;
+
+   cmd_total_sz = MLX5_ST_SZ_BYTES(alloc_packet_reformat_context_in);
+   cmd_total_sz += MLX5_ST_SZ_BYTES(packet_reformat_context_in);
+   cmd_data_sz = MLX5_FLD_SZ_BYTES(packet_reformat_context_in, 
reformat_data);
+   insz = align(cmd_total_sz + attr->data_sz - cmd_data_sz, DW_SIZE);
+   in = simple_calloc(1, insz);
+   if (!in) {
+   rte_errno = ENOMEM;
+   return NULL;
+   }
+
+   MLX5_SET(alloc_packet_reformat_context_in, in, opcode,
+MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT);
+
+   prctx = MLX5_ADDR_OF(alloc_packet_reformat_context_in, in,
+packet_reformat_context);
+   pdata = MLX5_ADDR_OF(packet_reformat_context_in, prctx, reformat_data);
+
+   MLX5_SET(packet_reformat_context_in, prctx, reformat_type, attr->type);
+   MLX5_SET(packet_reformat_context_in, prctx, reformat_param_0, 
attr->reformat_param_0);
+   MLX5_SET(packet_reformat_context_in, prctx, reformat_data_size, 
attr->data_sz);
+   memcpy(pdata, attr->data, attr->data_sz);
+
+   devx_obj = simple_malloc(sizeof(*devx_obj));
+   if (!devx_obj) {
+   DR_LOG(ERR, "Failed to allocate memory for packet reformat 
object");
+   rte_errno = ENOMEM;
+   goto out_free_in;
+   }
+
+   devx_obj->obj = mlx5_glue->devx_obj_create(ctx, in, insz, out, 
sizeof(out));
+   if (!devx_obj->obj) {
+   DR_LOG(ERR, "Failed to create packet reformat");
+   rte_errno = errno;
+   goto out_free_devx;
+   }
+
+   devx_obj->id = MLX5_GET(alloc_packet_reformat_out, out, 
packet_reformat_id);
+
+   simple_free(in);
+
+   return devx_obj;
+
+out_free_devx:
+   simple_free(devx_obj);
+out_free_in:
+   simple_free(in);
+   return NULL;
+}
+
 int mlx5dr_cmd_sq_modify_rdy(struct mlx5dr_devx_obj *devx_obj)
 {
uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.h 
b/drivers/net/mlx5/hws/mlx5dr_cmd.h
index 8a495db9b3..f45b6c6b07 100644
--- a/drivers/net/mlx5/hws/mlx5dr_cmd.h
+++ b/drivers/net/mlx5/hws/mlx5dr_cmd.h

[PATCH v6 02/10] net/mlx5/hws: support creating of dynamic forward table and FTE

2023-10-25 Thread Gregory Etelson
From: Hamdan Igbaria 

Add the ability to create forward table and FTE.

Signed-off-by: Hamdan Igbaria 
Acked-by: Suanming Mou 
---
 drivers/common/mlx5/mlx5_prm.h|  4 
 drivers/net/mlx5/hws/mlx5dr_cmd.c | 13 +
 drivers/net/mlx5/hws/mlx5dr_cmd.h | 19 +++
 3 files changed, 36 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 4192fff55b..df621b19af 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -5048,7 +5048,11 @@ enum mlx5_flow_destination_type {
 };
 
 enum mlx5_flow_context_action {
+   MLX5_FLOW_CONTEXT_ACTION_DROP = 1 << 1,
MLX5_FLOW_CONTEXT_ACTION_FWD_DEST = 1 << 2,
+   MLX5_FLOW_CONTEXT_ACTION_REFORMAT = 1 << 4,
+   MLX5_FLOW_CONTEXT_ACTION_DECRYPT = 1 << 12,
+   MLX5_FLOW_CONTEXT_ACTION_ENCRYPT = 1 << 13,
 };
 
 enum mlx5_flow_context_flow_source {
diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.c 
b/drivers/net/mlx5/hws/mlx5dr_cmd.c
index 0ccbaee961..8f407f9bce 100644
--- a/drivers/net/mlx5/hws/mlx5dr_cmd.c
+++ b/drivers/net/mlx5/hws/mlx5dr_cmd.c
@@ -42,6 +42,7 @@ mlx5dr_cmd_flow_table_create(struct ibv_context *ctx,
ft_ctx = MLX5_ADDR_OF(create_flow_table_in, in, flow_table_context);
MLX5_SET(flow_table_context, ft_ctx, level, ft_attr->level);
MLX5_SET(flow_table_context, ft_ctx, rtc_valid, ft_attr->rtc_valid);
+   MLX5_SET(flow_table_context, ft_ctx, reformat_en, ft_attr->reformat_en);
 
devx_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out, 
sizeof(out));
if (!devx_obj->obj) {
@@ -182,12 +183,24 @@ mlx5dr_cmd_set_fte(struct ibv_context *ctx,
action_flags = fte_attr->action_flags;
MLX5_SET(flow_context, in_flow_context, action, action_flags);
 
+   if (action_flags & MLX5_FLOW_CONTEXT_ACTION_REFORMAT)
+   MLX5_SET(flow_context, in_flow_context,
+packet_reformat_id, fte_attr->packet_reformat_id);
+
+   if (action_flags & (MLX5_FLOW_CONTEXT_ACTION_DECRYPT | 
MLX5_FLOW_CONTEXT_ACTION_ENCRYPT)) {
+   MLX5_SET(flow_context, in_flow_context,
+encrypt_decrypt_type, fte_attr->encrypt_decrypt_type);
+   MLX5_SET(flow_context, in_flow_context,
+encrypt_decrypt_obj_id, 
fte_attr->encrypt_decrypt_obj_id);
+   }
+
if (action_flags & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
/* Only destination_list_size of size 1 is supported */
MLX5_SET(flow_context, in_flow_context, destination_list_size, 
1);
in_dests = MLX5_ADDR_OF(flow_context, in_flow_context, 
destination);
MLX5_SET(dest_format, in_dests, destination_type, 
fte_attr->destination_type);
MLX5_SET(dest_format, in_dests, destination_id, 
fte_attr->destination_id);
+   MLX5_SET(set_fte_in, in, ignore_flow_level, 
fte_attr->ignore_flow_level);
}
 
devx_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out, 
sizeof(out));
diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.h 
b/drivers/net/mlx5/hws/mlx5dr_cmd.h
index f45b6c6b07..bf3a362300 100644
--- a/drivers/net/mlx5/hws/mlx5dr_cmd.h
+++ b/drivers/net/mlx5/hws/mlx5dr_cmd.h
@@ -7,8 +7,12 @@
 
 struct mlx5dr_cmd_set_fte_attr {
uint32_t action_flags;
+   uint8_t encrypt_decrypt_type;
+   uint32_t encrypt_decrypt_obj_id;
+   uint32_t packet_reformat_id;
uint8_t destination_type;
uint32_t destination_id;
+   uint8_t ignore_flow_level;
uint8_t flow_source;
 };
 
@@ -16,6 +20,7 @@ struct mlx5dr_cmd_ft_create_attr {
uint8_t type;
uint8_t level;
bool rtc_valid;
+   uint8_t reformat_en;
 };
 
 #define ACCESS_KEY_LEN 32
@@ -296,6 +301,20 @@ struct mlx5dr_devx_obj *
 mlx5dr_cmd_packet_reformat_create(struct ibv_context *ctx,
  struct mlx5dr_cmd_packet_reformat_create_attr 
*attr);
 
+struct mlx5dr_devx_obj *
+mlx5dr_cmd_set_fte(struct ibv_context *ctx,
+  uint32_t table_type,
+  uint32_t table_id,
+  uint32_t group_id,
+  struct mlx5dr_cmd_set_fte_attr *fte_attr);
+
+struct mlx5dr_cmd_forward_tbl *
+mlx5dr_cmd_forward_tbl_create(struct ibv_context *ctx,
+ struct mlx5dr_cmd_ft_create_attr *ft_attr,
+ struct mlx5dr_cmd_set_fte_attr *fte_attr);
+
+void mlx5dr_cmd_forward_tbl_destroy(struct mlx5dr_cmd_forward_tbl *tbl);
+
 struct mlx5dr_devx_obj *
 mlx5dr_cmd_alias_obj_create(struct ibv_context *ctx,
struct mlx5dr_cmd_alias_obj_create_attr 
*alias_attr);
-- 
2.39.2



[PATCH v6 03/10] net/mlx5/hws: add mlx5dr DevX object struct to mlx5dr action

2023-10-25 Thread Gregory Etelson
From: Hamdan Igbaria 

Add mlx5dr_devx_obj struct to mlx5dr_action, so we could hold
the FT obj in dest table action.

Signed-off-by: Hamdan Igbaria 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/hws/mlx5dr_action.c | 4 
 drivers/net/mlx5/hws/mlx5dr_action.h | 3 +++
 drivers/net/mlx5/hws/mlx5dr_table.c  | 1 -
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/hws/mlx5dr_action.c 
b/drivers/net/mlx5/hws/mlx5dr_action.c
index ea9fc23732..55ec4f71c9 100644
--- a/drivers/net/mlx5/hws/mlx5dr_action.c
+++ b/drivers/net/mlx5/hws/mlx5dr_action.c
@@ -787,6 +787,8 @@ mlx5dr_action_create_dest_table(struct mlx5dr_context *ctx,
ret = mlx5dr_action_create_stcs(action, tbl->ft);
if (ret)
goto free_action;
+
+   action->devx_dest.devx_obj = tbl->ft;
}
 
return action;
@@ -864,6 +866,8 @@ mlx5dr_action_create_dest_tir(struct mlx5dr_context *ctx,
ret = mlx5dr_action_create_stcs(action, cur_obj);
if (ret)
goto clean_obj;
+
+   action->devx_dest.devx_obj = cur_obj;
}
 
return action;
diff --git a/drivers/net/mlx5/hws/mlx5dr_action.h 
b/drivers/net/mlx5/hws/mlx5dr_action.h
index 314e289780..104c6880c1 100644
--- a/drivers/net/mlx5/hws/mlx5dr_action.h
+++ b/drivers/net/mlx5/hws/mlx5dr_action.h
@@ -148,6 +148,9 @@ struct mlx5dr_action {
struct {
struct mlx5dv_steering_anchor *sa;
} root_tbl;
+   struct {
+   struct mlx5dr_devx_obj *devx_obj;
+   } devx_dest;
};
};
 
diff --git a/drivers/net/mlx5/hws/mlx5dr_table.c 
b/drivers/net/mlx5/hws/mlx5dr_table.c
index e1150cd75d..91eb92db78 100644
--- a/drivers/net/mlx5/hws/mlx5dr_table.c
+++ b/drivers/net/mlx5/hws/mlx5dr_table.c
@@ -68,7 +68,6 @@ static void mlx5dr_table_down_default_fdb_miss_tbl(struct 
mlx5dr_table *tbl)
return;
 
mlx5dr_cmd_forward_tbl_destroy(default_miss);
-
ctx->common_res[tbl_type].default_miss = NULL;
 }
 
-- 
2.39.2



[PATCH v6 04/10] net/mlx5/hws: add support for mirroring

2023-10-25 Thread Gregory Etelson
From: Shun Hao 

This patch supports mirroring by adding a dest_array action. The action
accepts a list containing multiple destination actions, and can duplicate
a packet and forward it to each destination in the list.

Signed-off-by: Shun Hao 
Acked-by: Alex Vesker 
Acked-by: Suanming Mou 
---
 drivers/common/mlx5/mlx5_prm.h   |  23 -
 drivers/net/mlx5/hws/mlx5dr.h|  34 +++
 drivers/net/mlx5/hws/mlx5dr_action.c | 130 ++-
 drivers/net/mlx5/hws/mlx5dr_action.h |   3 +
 drivers/net/mlx5/hws/mlx5dr_cmd.c|  64 ++---
 drivers/net/mlx5/hws/mlx5dr_cmd.h|  21 -
 drivers/net/mlx5/hws/mlx5dr_debug.c  |   1 +
 drivers/net/mlx5/hws/mlx5dr_table.c  |   7 +-
 8 files changed, 262 insertions(+), 21 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index df621b19af..aa0b622ca2 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -2320,7 +2320,11 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 };
 
 struct mlx5_ifc_esw_cap_bits {
-   u8 reserved_at_0[0x60];
+   u8 reserved_at_0[0x1d];
+   u8 merged_eswitch[0x1];
+   u8 reserved_at_1e[0x2];
+
+   u8 reserved_at_20[0x40];
 
u8 esw_manager_vport_number_valid[0x1];
u8 reserved_at_61[0xf];
@@ -5045,6 +5049,7 @@ struct mlx5_ifc_query_flow_table_out_bits {
 enum mlx5_flow_destination_type {
MLX5_FLOW_DESTINATION_TYPE_VPORT = 0x0,
MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE = 0x1,
+   MLX5_FLOW_DESTINATION_TYPE_TIR = 0x2,
 };
 
 enum mlx5_flow_context_action {
@@ -5088,6 +5093,19 @@ union mlx5_ifc_dest_format_flow_counter_list_auto_bits {
u8 reserved_at_0[0x40];
 };
 
+struct mlx5_ifc_extended_dest_format_bits {
+   struct mlx5_ifc_dest_format_bits destination_entry;
+
+   u8 packet_reformat_id[0x20];
+
+   u8 reserved_at_60[0x20];
+};
+
+#define MLX5_IFC_MULTI_PATH_FT_MAX_LEVEL 64
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
 struct mlx5_ifc_flow_context_bits {
u8 reserved_at_00[0x20];
u8 group_id[0x20];
@@ -5106,8 +5124,7 @@ struct mlx5_ifc_flow_context_bits {
u8 reserved_at_e0[0x40];
u8 encrypt_decrypt_obj_id[0x20];
u8 reserved_at_140[0x16c0];
-   /* Currently only one destnation */
-   union mlx5_ifc_dest_format_flow_counter_list_auto_bits destination[1];
+   union mlx5_ifc_dest_format_flow_counter_list_auto_bits destination[0];
 };
 
 struct mlx5_ifc_set_fte_in_bits {
diff --git a/drivers/net/mlx5/hws/mlx5dr.h b/drivers/net/mlx5/hws/mlx5dr.h
index ea8bf683f3..1995c55132 100644
--- a/drivers/net/mlx5/hws/mlx5dr.h
+++ b/drivers/net/mlx5/hws/mlx5dr.h
@@ -46,6 +46,7 @@ enum mlx5dr_action_type {
MLX5DR_ACTION_TYP_ASO_METER,
MLX5DR_ACTION_TYP_ASO_CT,
MLX5DR_ACTION_TYP_DEST_ROOT,
+   MLX5DR_ACTION_TYP_DEST_ARRAY,
MLX5DR_ACTION_TYP_MAX,
 };
 
@@ -213,6 +214,20 @@ struct mlx5dr_rule_action {
};
 };
 
+struct mlx5dr_action_dest_attr {
+   /* Required action combination */
+   enum mlx5dr_action_type *action_type;
+
+   /* Required destination action to forward the packet */
+   struct mlx5dr_action *dest;
+
+   /* Optional reformat data */
+   struct {
+   size_t reformat_data_sz;
+   void *reformat_data;
+   } reformat;
+};
+
 /* Open a context used for direct rule insertion using hardware steering.
  * Each context can contain multiple tables of different types.
  *
@@ -616,6 +631,25 @@ mlx5dr_action_create_pop_vlan(struct mlx5dr_context *ctx, 
uint32_t flags);
 struct mlx5dr_action *
 mlx5dr_action_create_push_vlan(struct mlx5dr_context *ctx, uint32_t flags);
 
+/* Create a dest array action, this action can duplicate packets and forward to
+ * multiple destinations in the destination list.
+ * @param[in] ctx
+ * The context in which the new action will be created.
+ * @param[in] num_dest
+ * The number of dests attributes.
+ * @param[in] dests
+ * The destination array. Each contains a destination action and can have
+ * additional actions.
+ * @param[in] flags
+ * Action creation flags. (enum mlx5dr_action_flags)
+ * @return pointer to mlx5dr_action on success NULL otherwise.
+ */
+struct mlx5dr_action *
+mlx5dr_action_create_dest_array(struct mlx5dr_context *ctx,
+   size_t num_dest,
+   struct mlx5dr_action_dest_attr *dests,
+   uint32_t flags);
+
 /* Create dest root table, this action will jump to root table according
  * the given priority.
  * @param[in] ctx
diff --git a/drivers/net/mlx5/hws/mlx5dr_action.c 
b/drivers/net/mlx5/hws/mlx5dr_action.c
index 55ec4f71c9..f068bc7e9c 100644
--- a/drivers/net/mlx5/hws/mlx5dr_action.c
+++ b/drivers/net/mlx5/hws/mlx5dr_action.c
@@ -34,7 +34,8 @@ static const uint32_t 
action_order_arr[MLX5DR_TABLE_TYPE_MAX][MLX5DR_ACTION_TYP_
BIT(MLX5DR_ACTION_TYP_MISS) 

[PATCH v6 05/10] net/mlx5/hws: allow destination into default miss FT

2023-10-25 Thread Gregory Etelson
From: Erez Shitrit 

In FDB, the miss action directs the packet into the hypervisor vport.
That allows the user to mirror packets into the default-miss vport.

Signed-off-by: Erez Shitrit 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/hws/mlx5dr_action.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/mlx5/hws/mlx5dr_action.c 
b/drivers/net/mlx5/hws/mlx5dr_action.c
index f068bc7e9c..6b62111593 100644
--- a/drivers/net/mlx5/hws/mlx5dr_action.c
+++ b/drivers/net/mlx5/hws/mlx5dr_action.c
@@ -1769,6 +1769,17 @@ mlx5dr_action_create_dest_array(struct mlx5dr_context 
*ctx,
fte_attr.action_flags |= 
MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
fte_attr.ignore_flow_level = 1;
break;
+   case MLX5DR_ACTION_TYP_MISS:
+   if (table_type != MLX5DR_TABLE_TYPE_FDB) {
+   DR_LOG(ERR, "Miss action supported for 
FDB only");
+   rte_errno = ENOTSUP;
+   goto free_dest_list;
+   }
+   dest_list[i].destination_type = 
MLX5_FLOW_DESTINATION_TYPE_VPORT;
+   dest_list[i].destination_id =
+   ctx->caps->eswitch_manager_vport_number;
+   fte_attr.action_flags |= 
MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+   break;
case MLX5DR_ACTION_TYP_VPORT:
dest_list[i].destination_type = 
MLX5_FLOW_DESTINATION_TYPE_VPORT;
dest_list[i].destination_id = 
dests[i].dest->vport.vport_num;
-- 
2.39.2



[PATCH v6 06/10] net/mlx5/hws: support reformat for hws mirror

2023-10-25 Thread Gregory Etelson
From: Haifei Luo 

In the dest_array action, an optional reformat action can be applied to
each destination. This patch adds that support by using the extended
destination entry.

Signed-off-by: Haifei Luo 
Signed-off-by: Shun Hao 
Acked-by: Suanming Mou 
---
 drivers/common/mlx5/mlx5_prm.h   | 15 +++
 drivers/net/mlx5/hws/mlx5dr_action.c | 67 +++-
 drivers/net/mlx5/hws/mlx5dr_action.h |  2 +
 drivers/net/mlx5/hws/mlx5dr_cmd.c| 10 -
 drivers/net/mlx5/hws/mlx5dr_cmd.h|  2 +
 5 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index aa0b622ca2..bced5a59dd 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -5352,6 +5352,21 @@ enum mlx5_parse_graph_arc_node_index {
MLX5_GRAPH_ARC_NODE_PROGRAMMABLE = 0x1f,
 };
 
+enum mlx5_packet_reformat_context_reformat_type {
+	MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_L2_TO_L2_TUNNEL = 0x2,
+	MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_L2_TO_L3_TUNNEL = 0x4,
+	MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_IPV4 = 0x5,
+	MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_L2_TO_L3_ESP_TUNNEL = 0x6,
+	MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_UDPV4 = 0x7,
+	MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_DEL_ESP_TRANSPORT = 0x8,
+	MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_L3_ESP_TUNNEL_TO_L2 = 0x9,
+	MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_DEL_ESP_TRANSPORT_OVER_UDP = 0xA,
+	MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_IPV6 = 0xB,
+	MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_UDPV6 = 0xC,
+	MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_ADD_NISP_TNL = 0xD,
+	MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_REMOVE_NISP_TNL = 0xE,
+};
+
 #define MLX5_PARSE_GRAPH_FLOW_SAMPLE_MAX 8
 #define MLX5_PARSE_GRAPH_IN_ARC_MAX 8
 #define MLX5_PARSE_GRAPH_OUT_ARC_MAX 8
diff --git a/drivers/net/mlx5/hws/mlx5dr_action.c b/drivers/net/mlx5/hws/mlx5dr_action.c
index 6b62111593..11a7c58925 100644
--- a/drivers/net/mlx5/hws/mlx5dr_action.c
+++ b/drivers/net/mlx5/hws/mlx5dr_action.c
@@ -1703,6 +1703,44 @@ mlx5dr_action_create_modify_header(struct mlx5dr_context *ctx,
 	return NULL;
 }
 
+static struct mlx5dr_devx_obj *
+mlx5dr_action_dest_array_process_reformat(struct mlx5dr_context *ctx,
+					  enum mlx5dr_action_type type,
+					  void *reformat_data,
+					  size_t reformat_data_sz)
+{
+	struct mlx5dr_cmd_packet_reformat_create_attr pr_attr = {0};
+	struct mlx5dr_devx_obj *reformat_devx_obj;
+
+	if (!reformat_data || !reformat_data_sz) {
+		DR_LOG(ERR, "Empty reformat action or data");
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	switch (type) {
+	case MLX5DR_ACTION_TYP_REFORMAT_L2_TO_TNL_L2:
+		pr_attr.type = MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_L2_TO_L2_TUNNEL;
+		break;
+	case MLX5DR_ACTION_TYP_REFORMAT_L2_TO_TNL_L3:
+		pr_attr.type = MLX5_PACKET_REFORMAT_CONTEXT_REFORMAT_TYPE_L2_TO_L3_TUNNEL;
+		break;
+	default:
+		DR_LOG(ERR, "Invalid value for reformat type");
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	pr_attr.reformat_param_0 = 0;
+	pr_attr.data_sz = reformat_data_sz;
+	pr_attr.data = reformat_data;
+
+	reformat_devx_obj = mlx5dr_cmd_packet_reformat_create(ctx->ibv_ctx, &pr_attr);
+	if (!reformat_devx_obj)
+		return NULL;
+
+	return reformat_devx_obj;
+}
+
 struct mlx5dr_action *
 mlx5dr_action_create_dest_array(struct mlx5dr_context *ctx,
 				size_t num_dest,
@@ -1710,6 +1748,7 @@ mlx5dr_action_create_dest_array(struct mlx5dr_context *ctx,
 				uint32_t flags)
 {
 	struct mlx5dr_cmd_set_fte_dest *dest_list = NULL;
+	struct mlx5dr_devx_obj *packet_reformat = NULL;
 	struct mlx5dr_cmd_ft_create_attr ft_attr = {0};
 	struct mlx5dr_cmd_set_fte_attr fte_attr = {0};
 	struct mlx5dr_cmd_forward_tbl *fw_island;
@@ -1796,6 +1835,21 @@ mlx5dr_action_create_dest_array(struct mlx5dr_context *ctx,
 			dest_list[i].destination_id = dests[i].dest->devx_dest.devx_obj->id;
 			fte_attr.action_flags |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
 			break;
+		case MLX5DR_ACTION_TYP_REFORMAT_L2_TO_TNL_L2:
+		case MLX5DR_ACTION_TYP_REFORMAT_L2_TO_TNL_L3:
+			packet_reformat = mlx5dr_action_dest_array_process_reformat
+					(ctx,

[PATCH v6 07/10] net/mlx5: reformat HWS code for HWS mirror action

2023-10-25 Thread Gregory Etelson
Reformat HWS code for HWS mirror action.

Signed-off-by: Gregory Etelson 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/mlx5_flow_hw.c | 70 ++---
 1 file changed, 39 insertions(+), 31 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 6fcf654e4a..b2215fb5cf 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -4548,6 +4548,17 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
[RTE_FLOW_ACTION_TYPE_SEND_TO_KERNEL] = MLX5DR_ACTION_TYP_DEST_ROOT,
 };
 
+static inline void
+action_template_set_type(struct rte_flow_actions_template *at,
+enum mlx5dr_action_type *action_types,
+unsigned int action_src, uint16_t *curr_off,
+enum mlx5dr_action_type type)
+{
+   at->actions_off[action_src] = *curr_off;
+   action_types[*curr_off] = type;
+   *curr_off = *curr_off + 1;
+}
+
 static int
 flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
  unsigned int action_src,
@@ -4565,9 +4576,8 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
type = mask->type;
switch (type) {
case RTE_FLOW_ACTION_TYPE_RSS:
-   at->actions_off[action_src] = *curr_off;
-   action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
-   *curr_off = *curr_off + 1;
+   action_template_set_type(at, action_types, action_src, curr_off,
+MLX5DR_ACTION_TYP_TIR);
break;
case RTE_FLOW_ACTION_TYPE_AGE:
case RTE_FLOW_ACTION_TYPE_COUNT:
@@ -4575,23 +4585,20 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 * Both AGE and COUNT action need counter, the first one fills
 * the action_types array, and the second only saves the offset.
 */
-   if (*cnt_off == UINT16_MAX) {
-   *cnt_off = *curr_off;
-   action_types[*cnt_off] = MLX5DR_ACTION_TYP_CTR;
-   *curr_off = *curr_off + 1;
-   }
+   if (*cnt_off == UINT16_MAX)
+   action_template_set_type(at, action_types,
+action_src, curr_off,
+MLX5DR_ACTION_TYP_CTR);
at->actions_off[action_src] = *cnt_off;
break;
case RTE_FLOW_ACTION_TYPE_CONNTRACK:
-   at->actions_off[action_src] = *curr_off;
-   action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
-   *curr_off = *curr_off + 1;
+   action_template_set_type(at, action_types, action_src, curr_off,
+MLX5DR_ACTION_TYP_ASO_CT);
break;
case RTE_FLOW_ACTION_TYPE_QUOTA:
case RTE_FLOW_ACTION_TYPE_METER_MARK:
-   at->actions_off[action_src] = *curr_off;
-   action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_METER;
-   *curr_off = *curr_off + 1;
+   action_template_set_type(at, action_types, action_src, curr_off,
+MLX5DR_ACTION_TYP_ASO_METER);
break;
default:
DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
@@ -5101,31 +5108,32 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
at->reformat_off = UINT16_MAX;
at->mhdr_off = UINT16_MAX;
at->rx_cpy_pos = pos;
-   /*
-* mlx5 PMD hacks indirect action index directly to the action conf.
-* The rte_flow_conv() function copies the content from conf pointer.
-* Need to restore the indirect action index from action conf here.
-*/
for (i = 0; actions->type != RTE_FLOW_ACTION_TYPE_END;
 actions++, masks++, i++) {
-   if (actions->type == RTE_FLOW_ACTION_TYPE_INDIRECT) {
+   const struct rte_flow_action_modify_field *info;
+
+		switch (actions->type) {
+		/*
+		 * mlx5 PMD hacks indirect action index directly to the action conf.
+		 * The rte_flow_conv() function copies the content from conf pointer.
+		 * Need to restore the indirect action index from action conf here.
+		 */
+		case RTE_FLOW_ACTION_TYPE_INDIRECT:
at->actions[i].conf = actions->conf;
at->masks[i].conf = masks->conf;
-   }
-   if (actions->type == RTE_FLOW_ACTION_TYPE_MODIFY_FIELD) {
-			const struct rte_flow_action_modify_field *info = actions->conf;
-
+   break;
+   case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+   i

[PATCH v6 08/10] net/mlx5: support HWS mirror action

2023-10-25 Thread Gregory Etelson
HWS mirror clones the original packet to one or two destinations and
then proceeds with the original packet path.

The mirror has no dedicated RTE flow action type.
A mirror object is referenced by the INDIRECT_LIST action.
An INDIRECT_LIST action for a mirror is built from the actions list:

SAMPLE [/ SAMPLE] /  / END

The mirror SAMPLE action defines a packet clone. It specifies the clone
destination and an optional clone reformat action.
Destination action for both clone and original packet depends on HCA
domain:
- for NIC RX, destination is either RSS or QUEUE
- for FDB, destination is PORT

HWS mirror was implemented with the INDIRECT_LIST flow action.

MLX5 PMD defines a general `struct mlx5_indirect_list` type for all
INDIRECT_LIST handler objects:

struct mlx5_indirect_list {
enum mlx5_indirect_list_type type;
LIST_ENTRY(mlx5_indirect_list) chain;
char data[];
};

A specific INDIRECT_LIST type must overload `mlx5_indirect_list::data`
and provide a unique `type` value.
The PMD returns a pointer to the `mlx5_indirect_list` object.

The existing non-masked actions template API cannot identify the flow
actions behind an INDIRECT_LIST handler, because a single handler can
represent several flow actions.

For example:
A: SAMPLE / JUMP
B: SAMPLE / SAMPLE / RSS

Actions template command

template indirect_list / end mask indirect_list 0 / end

does not provide any information to differentiate between flow
actions in A and B.

MLX5 PMD requires an INDIRECT_LIST configuration parameter in the
template section:

Non-masked INDIRECT_LIST API:
=

template indirect_list X / end mask indirect_list 0 / end

PMD identifies the type of the X handler and will use the same type in
template creation. Actual parameters for actions in the list will
be extracted from the flow configuration.

Masked INDIRECT_LIST API:
=

template indirect_list X / end mask indirect_list -1UL / end

PMD creates the actions template from the action types and
configurations referenced by X.

An INDIRECT_LIST action without a configuration is invalid and will be
rejected by the PMD.
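
For orientation, creating such a handle and referencing it from an actions
array would look roughly like the sketch below (rte_flow API introduced in
DPDK 23.11; the *_conf variables, port_id and error handling are
placeholders, not part of this patch):

const struct rte_flow_action mirror_list[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_SAMPLE, .conf = &sample_conf },
	{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue_conf },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};
struct rte_flow_action_list_handle *handle;
struct rte_flow_error error;

handle = rte_flow_action_list_handle_create(port_id, &indir_conf,
					    mirror_list, &error);

/* The handle is then referenced from flow actions via INDIRECT_LIST: */
struct rte_flow_action_indirect_list indlist = { .handle = handle };
const struct rte_flow_action actions[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_INDIRECT_LIST, .conf = &indlist },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};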

Signed-off-by: Gregory Etelson 
Acked-by: Suanming Mou 
---
 doc/guides/nics/features/mlx5.ini  |   1 +
 doc/guides/rel_notes/release_23_11.rst |   1 +
 drivers/net/mlx5/mlx5.c|   1 +
 drivers/net/mlx5/mlx5.h|   2 +
 drivers/net/mlx5/mlx5_flow.c   | 134 ++
 drivers/net/mlx5/mlx5_flow.h   |  69 ++-
 drivers/net/mlx5/mlx5_flow_hw.c| 615 -
 7 files changed, 818 insertions(+), 5 deletions(-)

diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
index fc67415c6c..a85d755734 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -106,6 +106,7 @@ drop = Y
 flag = Y
 inc_tcp_ack  = Y
 inc_tcp_seq  = Y
+indirect_list= Y
 jump = Y
 mark = Y
 meter= Y
diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst
index 0a6fc76a9d..81d606e773 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -143,6 +143,7 @@ New Features
 * **Updated NVIDIA mlx5 net driver.**
 
   * Added support for Network Service Header (NSH) flow matching.
+  * Added support for ``RTE_FLOW_ACTION_TYPE_INDIRECT_LIST`` flow action.
 
 * **Updated Solarflare net driver.**
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 997df595d0..08b7b03365 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2168,6 +2168,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
/* Free the eCPRI flex parser resource. */
mlx5_flex_parser_ecpri_release(dev);
mlx5_flex_item_port_cleanup(dev);
+   mlx5_indirect_list_handles_release(dev);
 #ifdef HAVE_MLX5_HWS_SUPPORT
flow_hw_destroy_vport_action(dev);
flow_hw_resource_release(dev);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0b709a1bda..f3b872f59c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1791,6 +1791,8 @@ struct mlx5_priv {
LIST_HEAD(ind_tables, mlx5_ind_table_obj) ind_tbls;
/* Standalone indirect tables. */
LIST_HEAD(stdl_ind_tables, mlx5_ind_table_obj) standalone_ind_tbls;
+   /* Objects created with indirect list action */
+   LIST_HEAD(indirect_list, mlx5_indirect_list) indirect_list_head;
/* Pointer to next element. */
rte_rwlock_t ind_tbls_lock;
uint32_t refcnt; /**< Reference counter. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 8ad85e6027..99b814d815 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -62,6 +62,30 @@ struct tunnel_default_miss_ctx {
};
 };
 
+void
+mlx5_indirect_list_handles_release(str

[PATCH v6 10/10] net/mlx5: support indirect list METER_MARK action

2023-10-25 Thread Gregory Etelson
Signed-off-by: Gregory Etelson 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/mlx5_flow.c|  69 -
 drivers/net/mlx5/mlx5_flow.h|  70 --
 drivers/net/mlx5/mlx5_flow_hw.c | 430 +++-
 3 files changed, 484 insertions(+), 85 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 99b814d815..34252d66c0 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -75,8 +75,11 @@ mlx5_indirect_list_handles_release(struct rte_eth_dev *dev)
switch (e->type) {
 #ifdef HAVE_MLX5_HWS_SUPPORT
case MLX5_INDIRECT_ACTION_LIST_TYPE_MIRROR:
-			mlx5_hw_mirror_destroy(dev, (struct mlx5_mirror *)e, true);
+			mlx5_hw_mirror_destroy(dev, (struct mlx5_mirror *)e);
 			break;
+   case MLX5_INDIRECT_ACTION_LIST_TYPE_LEGACY:
+   mlx5_destroy_legacy_indirect(dev, e);
+   break;
 #endif
default:
DRV_LOG(ERR, "invalid indirect list type");
@@ -1169,7 +1172,24 @@ mlx5_flow_async_action_list_handle_destroy
 const struct rte_flow_op_attr *op_attr,
 struct rte_flow_action_list_handle *action_handle,
 void *user_data, struct rte_flow_error *error);
-
+static int
+mlx5_flow_action_list_handle_query_update(struct rte_eth_dev *dev,
+					  const struct rte_flow_action_list_handle *handle,
+					  const void **update, void **query,
+					  enum rte_flow_query_update_mode mode,
+					  struct rte_flow_error *error);
+static int
+mlx5_flow_async_action_list_handle_query_update(struct rte_eth_dev *dev,
+						uint32_t queue_id,
+						const struct rte_flow_op_attr *attr,
+						const struct rte_flow_action_list_handle *handle,
+						const void **update,
+						void **query,
+						enum rte_flow_query_update_mode mode,
+						void *user_data,
+						struct rte_flow_error *error);
 static const struct rte_flow_ops mlx5_flow_ops = {
.validate = mlx5_flow_validate,
.create = mlx5_flow_create,
@@ -1219,6 +1239,10 @@ static const struct rte_flow_ops mlx5_flow_ops = {
mlx5_flow_async_action_list_handle_create,
.async_action_list_handle_destroy =
mlx5_flow_async_action_list_handle_destroy,
+   .action_list_handle_query_update =
+   mlx5_flow_action_list_handle_query_update,
+   .async_action_list_handle_query_update =
+   mlx5_flow_async_action_list_handle_query_update,
 };
 
 /* Tunnel information. */
@@ -11003,6 +11027,47 @@ mlx5_flow_async_action_list_handle_destroy
  error);
 }
 
+static int
+mlx5_flow_action_list_handle_query_update(struct rte_eth_dev *dev,
+					  const struct rte_flow_action_list_handle *handle,
+					  const void **update, void **query,
+					  enum rte_flow_query_update_mode mode,
+					  struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+
+	MLX5_DRV_FOPS_OR_ERR(dev, fops,
+			     action_list_handle_query_update, ENOTSUP);
+	return fops->action_list_handle_query_update(dev, handle, update, query,
+						     mode, error);
+}
+
+static int
+mlx5_flow_async_action_list_handle_query_update(struct rte_eth_dev *dev,
+						uint32_t queue_id,
+						const struct rte_flow_op_attr *op_attr,
+						const struct rte_flow_action_list_handle *handle,
+						const void **update,
+						void **query,
+						enum rte_flow_query_update_mode mode,
+						void *user_data,
+						struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+
+	MLX5_

[PATCH v6 09/10] net/mlx5: reformat HWS code for indirect list actions

2023-10-25 Thread Gregory Etelson
Reformat HWS code for indirect list actions.

Signed-off-by: Gregory Etelson 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/mlx5_flow.h|   4 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 250 +---
 2 files changed, 139 insertions(+), 115 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 580db80fd4..653f83cf55 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1331,11 +1331,11 @@ struct rte_flow_actions_template {
uint64_t action_flags; /* Bit-map of all valid action in template. */
uint16_t dr_actions_num; /* Amount of DR rules actions. */
uint16_t actions_num; /* Amount of flow actions */
-	uint16_t *actions_off; /* DR action offset for given rte action offset. */
+   uint16_t *dr_off; /* DR action offset for given rte action offset. */
+   uint16_t *src_off; /* RTE action displacement from app. template */
uint16_t reformat_off; /* Offset of DR reformat action. */
uint16_t mhdr_off; /* Offset of DR modify header action. */
uint32_t refcnt; /* Reference counter. */
-   uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
uint8_t flex_item; /* flex item index. */
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 1c3d915be1..f9f735ba75 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -1015,11 +1015,11 @@ flow_hw_modify_field_init(struct mlx5_hw_modify_header_action *mhdr,
 static __rte_always_inline int
 flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 const struct rte_flow_attr *attr,
-			     const struct rte_flow_action *action_start, /* Start of AT actions. */
 			     const struct rte_flow_action *action, /* Current action from AT. */
 			     const struct rte_flow_action *action_mask, /* Current mask from AT. */
 struct mlx5_hw_actions *acts,
 struct mlx5_hw_modify_header_action *mhdr,
+uint16_t src_pos,
 struct rte_flow_error *error)
 {
struct mlx5_priv *priv = dev->data->dev_private;
@@ -1122,7 +1122,7 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
if (shared)
return 0;
 	ret = __flow_hw_act_data_hdr_modify_append(priv, acts, RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
-						   action - action_start, mhdr->pos,
+						   src_pos, mhdr->pos,
 						   cmds_start, cmds_end, shared,
 						   field, dcopy, mask);
if (ret)
@@ -1181,11 +1181,10 @@ flow_hw_validate_compiled_modify_field(struct rte_eth_dev *dev,
 static int
 flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 const struct rte_flow_attr *attr,
-const struct rte_flow_action *action_start,
 const struct rte_flow_action *action,
 const struct rte_flow_action *action_mask,
 struct mlx5_hw_actions *acts,
-uint16_t action_dst,
+uint16_t action_src, uint16_t action_dst,
 struct rte_flow_error *error)
 {
struct mlx5_priv *priv = dev->data->dev_private;
@@ -1241,7 +1240,7 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
} else {
ret = __flow_hw_act_data_general_append
(priv, acts, action->type,
-action - action_start, action_dst);
+action_src, action_dst);
if (ret)
return rte_flow_error_set
(error, ENOMEM,
@@ -1493,7 +1492,6 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
const struct rte_flow_attr *attr = &table_attr->flow_attr;
struct rte_flow_action *actions = at->actions;
-   struct rte_flow_action *action_start = actions;
struct rte_flow_action *masks = at->masks;
enum mlx5dr_action_type refmt_type = MLX5DR_ACTION_TYP_LAST;
const struct rte_flow_action_raw_encap *raw_encap_data;
@@ -1506,7 +1504,6 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
uint32_t type;
bool reformat_used = false;
unsigned int of_vlan_offset;
-   uint16_t action_pos;
uint16_t jump_pos;
uint32_t ct_idx;
int ret, err;
@@ -1521,71 +1518,69 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
else
type = MLX5DR_

RE: [Bug 1304] l3fwd-power example fails to run with uncore options, -U -u and -i

2023-10-25 Thread Tummala, Sivaprasad

Hi David,

> -Original Message-
> From: David Marchand 
> Sent: Wednesday, October 25, 2023 3:24 PM
> To: Tummala, Sivaprasad 
> Cc: dev@dpdk.org; Yigit, Ferruh 
> Subject: Re: [Bug 1304] l3fwd-power example fails to run with uncore options, -U -u and -i
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> Hello Siva,
>
> On Wed, Oct 25, 2023 at 11:34 AM  wrote:
> >
> > Bug ID: 1304
> > Summary: l3fwd-power example fails to run with uncore options, -U -u and -i
> > Product: DPDK
> > Version: 23.11
> > Hardware: All
> > OS: All
> > Status: UNCONFIRMED
> > Severity: normal
> > Priority: Normal
> > Component: other
> > Assignee: dev@dpdk.org
> > Reporter: karen.ke...@intel.com
> > Target Milestone: ---
> >
> > We suspect this particular commit introduces this bug, the sample app
> > will not work with the -U -u and -i options.
> >
> > The options worked when I tried with 23.07. I also tried removing this
> > commit and the options worked again.
> >
> > Result of git show:
> >
> > commit ac1edcb6621af6ff3c2b01d40e4dd6ed0527a748
> > Author: Sivaprasad Tummala 
> > Date:   Wed Aug 16 03:09:57 2023 -0700
> >
> > power: refactor uncore power management API
> >
> > Currently the uncore power management implementation is vendor specific.
> >
> > Added new vendor agnostic uncore power interface similar to rte_power
> > and rename specific implementations ("rte_power_intel_uncore") to
> > "power_intel_uncore" along with functions.
> >
> > Signed-off-by: Sivaprasad Tummala 
> >
> >
> > DPDK Version: 23.11-rc1
> > Commands:
> > .//examples/dpdk-l3fwd-power -c 0x6 -n 1 -- -p
> > 0x1 -P --config="(0,0,2)" -U
> > .//examples/dpdk-l3fwd-power -c 0x6 -n 1 -- -p
> > 0x1 -P --config="(0,0,2)" -u
> > .//examples/dpdk-l3fwd-power -c 0x6 -n 1 -- -p
> > 0x1 -P --config="(0,0,2)" -i 2
> > Error:
> > EAL: Error - exiting with code: 1
> > Cause: Invalid L3FWD parameters
>
> Please register to dpdk.org bugzilla, and have a look at this report.
> Thanks.

Done! I will look into this.
>
>
> --
> David Marchand



Re: [PATCH v2 0/7] vhost: ensure virtqueue access status is checked

2023-10-25 Thread David Marchand
On Fri, Oct 20, 2023 at 10:48 AM Maxime Coquelin
 wrote:
>
> Li Feng initially reported a segmentation fault in rte_vhost_vring_call()
> because of not checking that the virtqueue metadata can be accessed.
>
> This should be achieved by checking the access_ok status field of
> the virtqueue.
>
> This series also takes the opportunity to fix the other APIs.
> This is split in multiple patches to ease LTS maintainers backports,
> but could be squashed if preferred.
>
> Changes in v2:
> --
> - Rebased to apply on -rc1 (David)
> - Add Fixes tag in patch 1 (David)
> - Fix various typos in commit logs (David)
>
> Maxime Coquelin (7):
>   vhost: fix missing vring call check on virtqueue access
>   vhost: fix missing check on virtqueue access
>   vhost: fix checking virtqueue access when notifying guest
>   vhost: fix check on virtqueue access in async registration
>   vhost: fix check on virtqueue access in in-flight getter
>   vhost: fix missing lock protection in power monitor API
>   vhost: fix checking virtqueue access in stats API
>
>  lib/vhost/vhost.c | 92 +++
>  1 file changed, 85 insertions(+), 7 deletions(-)

For the series,
Acked-by: David Marchand 


-- 
David Marchand



Re: [PATCH v2 0/7] vhost: ensure virtqueue access status is checked

2023-10-25 Thread Maxime Coquelin




On 10/20/23 10:47, Maxime Coquelin wrote:

Li Feng initially reported a segmentation fault in rte_vhost_vring_call()
because of not checking that the virtqueue metadata can be accessed.

This should be achieved by checking the access_ok status field of
the virtqueue.
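
A minimal sketch of the pattern this series applies, using the library's
internal names (treat the exact lock type and helpers as assumptions, not
the literal patch content):

int
rte_vhost_vring_call(int vid, uint16_t vring_idx)
{
	struct virtio_net *dev = get_device(vid);
	struct vhost_virtqueue *vq;

	if (dev == NULL || vring_idx >= VHOST_MAX_VRING)
		return -1;
	vq = dev->virtqueue[vring_idx];
	if (vq == NULL)
		return -1;

	rte_rwlock_read_lock(&vq->access_lock);
	if (unlikely(!vq->access_ok)) {	/* the check this series adds */
		rte_rwlock_read_unlock(&vq->access_lock);
		return -1;
	}
	/* ... notify the guest ... */
	rte_rwlock_read_unlock(&vq->access_lock);
	return 0;
}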

This series also takes the opportunity to fix the other APIs.
This is split in multiple patches to ease LTS maintainers backports,
but could be squashed if preferred.

Changes in v2:
--
- Rebased to apply on -rc1 (David)
- Add Fixes tag in patch 1 (David)
- Fix various typos in commit logs (David)

Maxime Coquelin (7):
   vhost: fix missing vring call check on virtqueue access
   vhost: fix missing check on virtqueue access
   vhost: fix checking virtqueue access when notifying guest
   vhost: fix check on virtqueue access in async registration
   vhost: fix check on virtqueue access in in-flight getter
   vhost: fix missing lock protection in power monitor API
   vhost: fix checking virtqueue access in stats API

  lib/vhost/vhost.c | 92 +++
  1 file changed, 85 insertions(+), 7 deletions(-)



Applied to next-virtio/for-next-net

Thanks,
Maxime



Re: [PATCH v2] net/virtio: fix link state interrupt vector setting

2023-10-25 Thread Maxime Coquelin




On 10/23/23 03:46, Wenwu Ma wrote:

The vector for link state interrupts should be set
before the initialization of the device is completed.

Fixes: ee85024cf5f7 ("net/virtio: complete init stage at the right place")
Cc: sta...@dpdk.org

Signed-off-by: Wenwu Ma 
Tested-by: Wei Ling 
Reviewed-by: Maxime Coquelin 
---
v2:
  - rewording of the title

---
  drivers/net/virtio/virtio_ethdev.c | 16 
  1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 3ab56ef769..c2c0a1a111 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1912,6 +1912,14 @@ virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t 
req_features)
}
}
  
+	if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
+		/* Enable vector (0) for Link State Interrupt */
+   if (VIRTIO_OPS(hw)->set_config_irq(hw, 0) ==
+   VIRTIO_MSI_NO_VECTOR) {
+   PMD_DRV_LOG(ERR, "failed to set config vector");
+   return -EBUSY;
+   }
+
virtio_reinit_complete(hw);
  
  	return 0;

@@ -2237,14 +2245,6 @@ virtio_dev_configure(struct rte_eth_dev *dev)
hw->has_tx_offload = tx_offload_enabled(hw);
hw->has_rx_offload = rx_offload_enabled(hw);
  
-	if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
-		/* Enable vector (0) for Link State Interrupt */
-   if (VIRTIO_OPS(hw)->set_config_irq(hw, 0) ==
-   VIRTIO_MSI_NO_VECTOR) {
-   PMD_DRV_LOG(ERR, "failed to set config vector");
-   return -EBUSY;
-   }
-
if (virtio_with_packed_queue(hw)) {
  #if defined(RTE_ARCH_X86_64) && defined(CC_AVX512_SUPPORT)
if ((hw->use_vec_rx || hw->use_vec_tx) &&


Applied to next-virtio/for-next-net

Thanks,
Maxime



Re: [PATCH] net/virtio: fixed missing next flag when sending packets in packed mode

2023-10-25 Thread Maxime Coquelin




On 10/17/23 09:26, Fengjiang Liu wrote:

When a packet is sent in packed mode and the packet data and the
virtio header are divided into two descriptors, set the next flag of
the virtio-header descriptor.

Bugzilla ID: 1295
Fixes: 892dc798fa9c ("net/virtio: implement Tx path for packed queues")

Signed-off-by: Fengjiang Liu 
---
  drivers/net/virtio/virtqueue.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 9d4aba11a3..4e9f2d0358 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -672,6 +672,7 @@ virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 */
 	start_dp[idx].addr = txvq->hdr_mem + RTE_PTR_DIFF(&txr[idx].tx_hdr, txr);
start_dp[idx].len = vq->hw->vtnet_hdr_size;
+   head_flags |= VRING_DESC_F_NEXT;
hdr = (struct virtio_net_hdr *)&txr[idx].tx_hdr;
idx++;
if (idx >= vq->vq_nentries) {


Applied to next-virtio/for-next-net

Thanks,
Maxime



Re: Eventdev dequeue-enqueue event correlation

2023-10-25 Thread Bruce Richardson
On Wed, Oct 25, 2023 at 09:40:54AM +0200, Mattias Rönnblom wrote:
> On 2023-10-24 11:10, Bruce Richardson wrote:
> > On Tue, Oct 24, 2023 at 09:10:30AM +0100, Bruce Richardson wrote:
> > > On Mon, Oct 23, 2023 at 06:10:54PM +0200, Mattias Rönnblom wrote:
> > > > Hi.
> > > > 
> > > > Consider an Eventdev app using atomic-type scheduling doing something 
> > > > like:
> > > > 
> > > >  struct rte_event events[3];
> > > > 
> > > >  rte_event_dequeue_burst(dev_id, port_id, events, 3, 0);
> > > > 
> > > >  /* Assume three events were dequeued, and the application decides
> > > >   * it's best off to processing event 0 and 2 consecutively */
> > > > 
> > > >  process(&events[0]);
> > > >  process(&events[2]);
> > > > 
> > > >  events[0].queue_id++;
> > > >  events[0].op = RTE_EVENT_OP_FORWARD;
> > > >  events[2].queue_id++;
> > > >  events[2].op = RTE_EVENT_OP_FORWARD;
> > > > 
> > > >  rte_event_enqueue_burst(dev_id, port_id, &events[0], 1);
> > > >  rte_event_enqueue_burst(dev_id, port_id, &events[2], 1);
> > > > 
> > > >  process(&events[1]);
> > > >  events[1].queue_id++;
> > > >  events[1].op = RTE_EVENT_OP_FORWARD;
> > > > 
> > > >  rte_event_enqueue_burst(dev_id, port_id, &events[1], 1);
> > > > 
> > > > If one would just read the Eventdev API spec, they might expect this to 
> > > > work
> > > > (especially since impl_opaque hints as potentially be useful for the 
> > > > purpose
> > > > of identifying events).
> > > > 
> > > > However, on certain event devices, it doesn't (and maybe rightly so). If
> > > > event 0 and 2 belongs to the same flow (queue id + flow id pair), and 
> > > > event
> > > > 1 belongs to some other, then this other flow would be "unlocked" at the
> > > > point of the second enqueue operation (and thus be processed on some 
> > > > other
> > > > core, in parallel). The first flow would still be needlessly "locked".
> > > > 
> > > > Such event devices require the order of the enqueued events to be the 
> > > > same
> > > > as the dequeued events, using RTE_EVENT_OP_RELEASE type events as 
> > > > "fillers"
> > > > for dropped events.
> > > > 
> > > > Am I missing something in the Eventdev API documentation?
> > > > 
> > > 
> > > Much more likely is that the documentation is missing something. We should
> > > explicitly clarify this behaviour, as it's required by a number of 
> > > drivers.
> > > 
> > > > Could an event device use the impl_opaque field to track the identity 
> > > > of an
> > > > event (and thus relax ordering requirements) and still be complaint 
> > > > toward
> > > > the API?
> > > > 
> > > 
> > > Possibly, but the documentation also doesn't report that the impl_opaque
> > > field must be preserved between dequeue and enqueue. When forwarding a
> > > packet it's well possible for an app to extract an mbuf from a dequeued
> > > event and create a new event for sending it back in to the eventdev. For
> 
> Such a behavior would be in violation of a part of the Eventdev API contract
> actually specified. The rte_event struct documentation says about
> impl_opaque that "An implementation may use this field to hold
> implementation specific value to share between dequeue and enqueue
> operation. The application should not modify this field. "
> 
> I see no other way to read this than that "an implementation" here is
> referring to an event device PMD. The requirement that the application can't
> modify this field only make sense in the context of "from dequeue to
> enqueue".
> 

Yep, you are completely correct. For some reason, I had this in my head the
other way round, that it was for internal use between the enqueue and
dequeue. My mistake! :-(

> > > example, if the first stage post-RX is doing classify, it's entirely
> > > possible for every single field in the event header to be different for 
> > > the
> > > event returned compared to dequeue (flow_id recomputed, event type/source
> > > adjusted, target queue_id and priority updated, op type changed to forward
> > > from new, etc. etc.).
> > > 
> > > > What happens if a RTE_EVENT_OP_NEW event is inserted into the mix of
> > > > OP_FORWARD and OP_RELEASE type events being enqueued? Again I'm not 
> > > > clear on
> > > > what the API says, if anything.
> > > > 
> > > OP_NEW should have no effect on the "history-list" of events previousl
> > > dequeued. Again, our docs should clarify that explicitly. Thanks for
> > > calling all this out.
> > > 
> > Looking at the docs we have, I would propose adding a new subsection "Event
> > Operations", as section 49.1.6 to [1]. There we could explain "New",
> > "Forward" and "Release" events - what they mean for the different queue
> > types and how to use them. That section could also cover the enqueue
> > ordering rules, as the use of event "history" is necessary to explain
> > releases and forwards.
> > 
> > This seems reasonable? If nobody else has already started on updating docs
> > for this, I'm happy enough 
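
For such a section, a worked example might look like the sketch below
(process() and DROP are hypothetical application helpers; dev_id and
port_id as in the example at the top of the thread):

struct rte_event ev[8];
uint16_t i, n;

n = rte_event_dequeue_burst(dev_id, port_id, ev, RTE_DIM(ev), 0);
for (i = 0; i < n; i++) {
	if (process(&ev[i]) == DROP) {
		/* Filler keeps enqueue order aligned with dequeue order. */
		ev[i].op = RTE_EVENT_OP_RELEASE;
	} else {
		ev[i].queue_id++;
		ev[i].op = RTE_EVENT_OP_FORWARD;
	}
}
/* One in-order burst satisfies PMDs that match events by position. */
rte_event_enqueue_burst(dev_id, port_id, ev, n);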

Re: [PATCH v1 1/2] baseband/acc: support ACC100 deRM corner case SDK

2023-10-25 Thread Maxime Coquelin

Hi Nicolas,

On 10/18/23 12:56, David Marchand wrote:

On Tue, Oct 10, 2023 at 7:55 PM Hernan Vargas  wrote:


Implement de-ratematch pre-processing for ACC100 SW corner cases.
Some specific 5GUL FEC corner cases may cause unintended back pressure
and in some cases a potential stability issue on the ACC100.
The PMD can detect such code block configuration and issue an info
message to the user.

Signed-off-by: Hernan Vargas 
---
  drivers/baseband/acc/meson.build  | 23 ++-
  drivers/baseband/acc/rte_acc100_pmd.c | 59 +--
  2 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/drivers/baseband/acc/meson.build b/drivers/baseband/acc/meson.build
index 27a654b50153..84f4fea635ef 100644
--- a/drivers/baseband/acc/meson.build
+++ b/drivers/baseband/acc/meson.build
@@ -1,7 +1,28 @@
  # SPDX-License-Identifier: BSD-3-Clause
  # Copyright(c) 2020 Intel Corporation

-deps += ['bus_pci']

...

+deps += ['bbdev', 'bus_pci']


This part is likely a rebase damage.
See: b7b8de26f34d ("drivers: add dependencies for some classes")




Do you plan to send a new version fixing the rebase issue?

Thanks,
Maxime



RE: [PATCH 4/5] net/ena: add support for ena-express metrics

2023-10-25 Thread Brandes, Shai
On Tue, 24 Oct 2023 13:21:27 +0300
 wrote:

>>  struct ena_offloads {
>>   uint32_t tx_offloads;
>>   uint32_t rx_offloads;
>> @@ -329,6 +346,7 @@ struct ena_adapter {
>>*/
>>   uint64_t metrics_stats[ENA_MAX_CUSTOMER_METRICS] __rte_cache_aligned;
>>   uint16_t metrics_num;
>> + struct ena_stats_srd srd_stats __rte_cache_aligned;
>>  };

> If metrics_num was before the metrics_stats[] you would save some space.
Hi, I checked it with pahole and both orderings give the same structure
size and the same overall padding; only the 14-byte hole moves:

uint64_t  metrics_stats[6] __attribute__((__aligned__(64)));      /* 181952  48 */
uint16_t  metrics_num;                                            /* 182000   2 */

/* XXX 14 bytes hole, try to pack */

/* --- cacheline 2844 boundary (182016 bytes) --- */
struct ena_stats_srd  srd_stats __attribute__((__aligned__(64))); /* 182016  40 */

/* size: 182080, cachelines: 2845, members: 40 */
/* sum members: 181910, holes: 9, sum holes: 146 */

Vs:

uint16_t  metrics_num __attribute__((__aligned__(64)));           /* 181952   2 */

/* XXX 6 bytes hole, try to pack */

uint64_t  metrics_stats[6];                                       /* 181960  48 */

/* XXX 8 bytes hole, try to pack */

/* --- cacheline 2844 boundary (182016 bytes) --- */
struct ena_stats_srd  srd_stats __attribute__((__aligned__(64))); /* 182016  40 */

/* size: 182080, cachelines: 2845, members: 40 */
/* sum members: 181910, holes: 10, sum holes: 146 */
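
The effect is easy to reproduce in isolation; a minimal sketch (generic
names, LP64 ABI assumed):

#include <stdint.h>

/* Both orderings fill exactly one 64-byte-aligned block ahead of the
 * next cache-aligned member, so total size and padding are identical. */
struct layout_a {
	uint64_t metrics[6] __attribute__((aligned(64)));	/* 48 bytes */
	uint16_t num;						/* 2 bytes + 14 pad */
};

struct layout_b {
	uint16_t num __attribute__((aligned(64)));		/* 2 bytes + 6 pad */
	uint64_t metrics[6];					/* 48 bytes + 8 pad */
};

/* sizeof(struct layout_a) == sizeof(struct layout_b) == 64 */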


Re: [EXT] Re: [PATCH v2 1/1] usertools/rss: add CNXK RSS key

2023-10-25 Thread Robin Jarry

Sunil Kumar Kori, Oct 18, 2023 at 12:01:
> I could have a shot at it since it may involve some refactoring. 
> Also, existing supported drivers will benefit from it. This does not 
> seem like it is directly related to CNXK.


Sure, Thanks.


Hi Sunil,

I have sent a patch that should allow you to define the default key and 
RETA size for the CNXK driver.


http://patches.dpdk.org/project/dpdk/patch/20231023080710.240402-3-rja...@redhat.com/

It would probably make sense to apply my patch first and rebase yours on 
top of it.


Thomas, what do you think?



[PATCH v3] config/arm: update aarch32 build with gcc13

2023-10-25 Thread Juraj Linkeš
The aarch32 build with gcc13 fails with:

Compiler for C supports arguments -march=armv8-a: NO

../config/arm/meson.build:714:12: ERROR: Problem encountered: No
suitable armv8 march version found.

This is because we test -march=armv8-a alone (without the -mfpu option),
which is no longer supported in gcc13 aarch32 builds.

The most recent recommendation from the compiler team is to build with
-march=armv8-a+simd -mfpu=auto, which should work for compilers old and
new. The suggestion is to first check -march=armv8-a+simd and only then
check -mfpu=auto.

To address this, add a way to force the architecture (the value of
the -march option).

Signed-off-by: Juraj Linkeš 
---
 config/arm/meson.build | 40 +++-
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/config/arm/meson.build b/config/arm/meson.build
index 3f22d8a2fc..c3f763764a 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -43,7 +43,9 @@ implementer_generic = {
 },
 'generic_aarch32': {
 'march': 'armv8-a',
-'compiler_options': ['-mfpu=neon'],
+'force_march': true,
+'march_features': ['simd'],
+'compiler_options': ['-mfpu=auto'],
 'flags': [
 ['RTE_ARCH_ARM_NEON_MEMCPY', false],
 ['RTE_ARCH_STRICT_ALIGN', true],
@@ -695,21 +697,25 @@ if update_flags
 # probe supported archs and their features
 candidate_march = ''
 if part_number_config.has_key('march')
-supported_marchs = ['armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
-'armv8.2-a', 'armv8.1-a', 'armv8-a']
-check_compiler_support = false
-foreach supported_march: supported_marchs
-if supported_march == part_number_config['march']
-# start checking from this version downwards
-check_compiler_support = true
-endif
-if (check_compiler_support and
-cc.has_argument('-march=' + supported_march))
-candidate_march = supported_march
-# highest supported march version found
-break
-endif
-endforeach
+if part_number_config.get('force_march', false)
+candidate_march = part_number_config['march']
+else
+supported_marchs = ['armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
+                    'armv8.2-a', 'armv8.1-a', 'armv8-a']
+check_compiler_support = false
+foreach supported_march: supported_marchs
+if supported_march == part_number_config['march']
+# start checking from this version downwards
+check_compiler_support = true
+endif
+if (check_compiler_support and
+cc.has_argument('-march=' + supported_march))
+candidate_march = supported_march
+# highest supported march version found
+break
+endif
+endforeach
+endif
 if candidate_march == ''
 error('No suitable armv8 march version found.')
 endif
@@ -741,7 +747,7 @@ if update_flags
 # apply supported compiler options
 if part_number_config.has_key('compiler_options')
 foreach flag: part_number_config['compiler_options']
-if cc.has_argument(flag)
+if cc.has_multi_arguments(machine_args + [flag])
 machine_args += flag
 else
 warning('Configuration compiler option ' +
-- 
2.34.1



Re: [PATCH v5 2/9] buildtools: script to generate cmdline boilerplate

2023-10-25 Thread Robin Jarry

Bruce Richardson, Oct 17, 2023 at 14:13:

Provide a "dpdk-cmdline-gen.py" script for application developers to
quickly generate the boilerplate code necessary for using the cmdline
library.

Example of use:
The script takes an input file with a list of commands the user wants in
the app, where the parameter variables are tagged with the type.
For example:

$ cat commands.list
list
add x y
echo message
add socket path
quit

When run through the script as "./dpdk-cmdline-gen.py commands.list",
the output will be the contents of a header file with all the
boilerplate necessary for a commandline instance with those commands.

If the flag --stubs is passed, an output header filename must also be
passed, in which case both a header file with the definitions and a C
file with function stubs in it is written to disk. The separation is so
that the header file can be rewritten at any future point to add more
commands, while the C file can be kept as-is and extended by the user
with any additional functions needed.

Signed-off-by: Bruce Richardson 
---
 buildtools/dpdk-cmdline-gen.py| 190 ++
 buildtools/meson.build|   7 ++
 doc/guides/prog_guide/cmdline.rst | 131 +++-
 3 files changed, 327 insertions(+), 1 deletion(-)
 create mode 100755 buildtools/dpdk-cmdline-gen.py


Hi Bruce,

thanks for the respin! I have some small remarks inline.


diff --git a/buildtools/dpdk-cmdline-gen.py b/buildtools/dpdk-cmdline-gen.py
new file mode 100755
index 00..6cb7610de4
--- /dev/null
+++ b/buildtools/dpdk-cmdline-gen.py
@@ -0,0 +1,190 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 Intel Corporation
+#
+"""
+Script to automatically generate boilerplate for using DPDK cmdline library.
+"""
+
+import argparse
+import sys
+
+PARSE_FN_PARAMS = "void *parsed_result, struct cmdline *cl, void *data"
+PARSE_FN_BODY = """
+/* TODO: command action */
+RTE_SET_USED(parsed_result);
+RTE_SET_USED(cl);
+RTE_SET_USED(data);
+"""
+NUMERIC_TYPES = [
+"UINT8",
+"UINT16",
+"UINT32",
+"UINT64",
+"INT8",
+"INT16",
+"INT32",
+"INT64",
+]
+
+
+def process_command(lineno, tokens, comment):
+"""Generate the structures and definitions for a single command."""
+out = []
+cfile_out = []
+
+if tokens[0].startswith("<"):
+        raise ValueError(f"Error line {lineno + 1}: command must start with a literal string")
+
+name_tokens = []
+for t in tokens:
+if t.startswith("<"):
+break
+name_tokens.append(t)
+name = "_".join(name_tokens)
+
+result_struct = []
+initializers = []
+token_list = []
+for t in tokens:
+if t.startswith("<"):
+t_type, t_name = t[1:].split(">")
+t_val = "NULL"
+else:
+t_type = "STRING"
+t_name = t
+t_val = f'"{t}"'
+
+        if t_type == "STRING":
+            result_struct.append(f"\tcmdline_fixed_string_t {t_name};")
+            initializers.append(
+                f"static cmdline_parse_token_string_t cmd_{name}_{t_name}_tok =\n"
+                + f"\tTOKEN_STRING_INITIALIZER(struct cmd_{name}_result, {t_name}, {t_val});"


Since you are now using multiline strings in process_commands(), why not 
use them everywhere?


It would make the code more readable in my opinion and would avoid 
inline f-string concatenation.



+            )
+        elif t_type in NUMERIC_TYPES:
+            result_struct.append(f"\t{t_type.lower()}_t {t_name};")
+            initializers.append(
+                f"static cmdline_parse_token_num_t cmd_{name}_{t_name}_tok =\n"
+                + f"\tTOKEN_NUM_INITIALIZER(struct cmd_{name}_result, {t_name}, RTE_{t_type});"
+            )
+        elif t_type in ["IP", "IP_ADDR", "IPADDR"]:
+            result_struct.append(f"\tcmdline_ipaddr_t {t_name};")
+            initializers.append(
+                f"cmdline_parse_token_ipaddr_t cmd_{name}_{t_name}_tok =\n"
+                + f"\tTOKEN_IPV4_INITIALIZER(struct cmd_{name}_result, {t_name});"
+            )
+        else:
+            raise TypeError(f"Error line {lineno + 1}: unknown token type '{t_type}'")
+        token_list.append(f"cmd_{name}_{t_name}_tok")
+
+    out.append(f'/* Auto-generated handling for command "{" ".join(tokens)}" */')
+    # output function prototype
+    func_sig = f"void\ncmd_{name}_parsed({PARSE_FN_PARAMS})"
+    out.append(f"extern {func_sig};\n")
+    # output result data structure
+    out.append(f"struct cmd_{name}_result {{\n" + "\n".join(result_struct) + "\n};\n")
+    # output the initializer tokens
+    out.append("\n".join(initializers) + "\n")
+    # output the instance structure
+    out.append(
+        f"static cmdline_parse_inst_t cmd_{name} = {{\n"
+        + f"\t.f = cmd_{name}_parsed,\n"
+        + "\t.data = NULL,\n"
+        + f'\t.help_str = "{commen

Re: [PATCH v7] net/mlx5: add test for live migration

2023-10-25 Thread Thomas Monjalon
25/10/2023 11:50, Rongwei Liu:
> This patch adds testpmd app a runtime function to test the live
> migration API.
> 
> testpmd> mlx5 set flow_engine  []
> Flag is optional.
> 
> Signed-off-by: Rongwei Liu 
> Acked-by: Viacheslav Ovsiienko 
> Acked-by: Ori Kam 

Acked-by: Thomas Monjalon 




Re: [PATCH] eal/unix: allow creating thread with real-time priority

2023-10-25 Thread Thomas Monjalon
24/10/2023 18:04, Stephen Hemminger:
> On Tue, 24 Oct 2023 15:55:13 +0200
> Morten Brørup  wrote:
> 
> > > 
> > >4. It MAY be used by preemptible multi-producer and/or preemptible 
> > > multi-
> > > consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), 
> > > SCHED_IDLE
> > > or SCHED_BATCH. User SHOULD be aware of the performance penalty before 
> > > using
> > > it.
> > > 
> > > -  5. It MUST not be used by multi-producer/consumer pthreads, whose
> > > scheduling policies are SCHED_FIFO or SCHED_RR.
> > > +  5. It MUST not be used by multi-producer/consumer pthreads
> > > + whose scheduling policies are ``SCHED_FIFO``
> > > + or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).  
> > 
> > Do the RTS or HTS ring modes make any difference here?
> > 
> > Anyway, I agree that real-time priority should not be forbidden on Unix.
> > 
> > Acked-by: Morten Brørup 
> 
> Please add a big warning message in the rte_thread.c and the documentation
> to describe the problem. Need to have the "you have been warned" action.

Yes I can add more warnings.

> Use of RT priority is incompatible with 100% poll mode as is typically done
> in DPDK applications. A real time thread has higher priority than other 
> necessary
> kernel threads on the same CPU. Therefore if the RT thread never sleeps, 
> critical
> system actions such as delayed writes, network packet processing and timer 
> updates
> will not happen which makes the system unstable.

Yes, and it is shown by the test on loongarch:
DPDK:fast-tests / threads_autotest   TIMEOUT   80.01s
http://mails.dpdk.org/archives/test-report/2023-October/488760.html

I'll try to pass the test by adding a sleep in the test thread.

> Multiple DPDK users have learned this the hard way.






Re: [PATCH v5 2/9] buildtools: script to generate cmdline boilerplate

2023-10-25 Thread Bruce Richardson
On Wed, Oct 25, 2023 at 03:04:05PM +0200, Robin Jarry wrote:
> Bruce Richardson, Oct 17, 2023 at 14:13:
> > Provide a "dpdk-cmdline-gen.py" script for application developers to
> > quickly generate the boilerplate code necessary for using the cmdline
> > library.
> > 
> > Example of use:
> > The script takes an input file with a list of commands the user wants in
> > the app, where the parameter variables are tagged with the type.
> > For example:
> > 
> > $ cat commands.list
> > list
> > add x y
> > echo message
> > add socket path
> > quit
> > 
> > When run through the script as "./dpdk-cmdline-gen.py commands.list",
> > the output will be the contents of a header file with all the
> > boilerplate necessary for a commandline instance with those commands.
> > 
> > If the flag --stubs is passed, an output header filename must also be
> > passed, in which case both a header file with the definitions and a C
> > file with function stubs in it is written to disk. The separation is so
> > that the header file can be rewritten at any future point to add more
> > commands, while the C file can be kept as-is and extended by the user
> > with any additional functions needed.
> > 
> > Signed-off-by: Bruce Richardson 
> > ---
> >  buildtools/dpdk-cmdline-gen.py| 190 ++
> >  buildtools/meson.build|   7 ++
> >  doc/guides/prog_guide/cmdline.rst | 131 +++-
> >  3 files changed, 327 insertions(+), 1 deletion(-)
> >  create mode 100755 buildtools/dpdk-cmdline-gen.py
> 
> Hi Bruce,
> 
> thanks for the respin! I have some small remarks inline.
> 
> > diff --git a/buildtools/dpdk-cmdline-gen.py b/buildtools/dpdk-cmdline-gen.py
> > new file mode 100755
> > index 00..6cb7610de4
> > --- /dev/null
> > +++ b/buildtools/dpdk-cmdline-gen.py
> > @@ -0,0 +1,190 @@
> > +#!/usr/bin/env python3
> > +# SPDX-License-Identifier: BSD-3-Clause
> > +# Copyright(c) 2023 Intel Corporation
> > +#
> > +"""
> > +Script to automatically generate boilerplate for using DPDK cmdline 
> > library.
> > +"""
> > +
> > +import argparse
> > +import sys
> > +
> > +PARSE_FN_PARAMS = "void *parsed_result, struct cmdline *cl, void *data"
> > +PARSE_FN_BODY = """
> > +/* TODO: command action */
> > +RTE_SET_USED(parsed_result);
> > +RTE_SET_USED(cl);
> > +RTE_SET_USED(data);
> > +"""
> > +NUMERIC_TYPES = [
> > +"UINT8",
> > +"UINT16",
> > +"UINT32",
> > +"UINT64",
> > +"INT8",
> > +"INT16",
> > +"INT32",
> > +"INT64",
> > +]
> > +
> > +
> > +def process_command(lineno, tokens, comment):
> > +"""Generate the structures and definitions for a single command."""
> > +out = []
> > +cfile_out = []
> > +
> > +if tokens[0].startswith("<"):
> > +raise ValueError(f"Error line {lineno + 1}: command must start 
> > with a literal string")
> > +
> > +name_tokens = []
> > +for t in tokens:
> > +if t.startswith("<"):
> > +break
> > +name_tokens.append(t)
> > +name = "_".join(name_tokens)
> > +
> > +result_struct = []
> > +initializers = []
> > +token_list = []
> > +for t in tokens:
> > +if t.startswith("<"):
> > +t_type, t_name = t[1:].split(">")
> > +t_val = "NULL"
> > +else:
> > +t_type = "STRING"
> > +t_name = t
> > +t_val = f'"{t}"'
> > +
> > +if t_type == "STRING":
> > +result_struct.append(f"\tcmdline_fixed_string_t {t_name};")
> > +initializers.append(
> > +f"static cmdline_parse_token_string_t 
> > cmd_{name}_{t_name}_tok =\n"
> > ++ f"\tTOKEN_STRING_INITIALIZER(struct cmd_{name}_result, 
> > {t_name}, {t_val});"
> 
> Since you are now using multiline strings in process_commands(), why not use
> them everywhere?
> 
> It would make the code more readable in my opinion and would avoid inline
> f-string concatenation.
> 

I'm a bit unsure about this case. I notice I can at least remove the "+"
symbol and have implicit string concat, but I really don't like the way the
indentation gets adjusted when we use multi-line strings, since the indent
has to match the C-code indent rather than the python indentation levels.

Therefore, I'm going to leave these pairs of lines as they are.

> > +)
> > +elif t_type in NUMERIC_TYPES:
> > +result_struct.append(f"\t{t_type.lower()}_t {t_name};")
> > +initializers.append(
> > +f"static cmdline_parse_token_num_t cmd_{name}_{t_name}_tok 
> > =\n"
> > ++ f"\tTOKEN_NUM_INITIALIZER(struct cmd_{name}_result, 
> > {t_name}, RTE_{t_type});"
> > +)
> > +elif t_type in ["IP", "IP_ADDR", "IPADDR"]:
> > +result_struct.append(f"\tcmdline_ipaddr_t {t_name};")
> > +initializers.append(
> > +f"cmdline_parse_token_ipaddr_t cmd_{name}_{t_name}_tok =\n"
> > +  

Re: [PATCH] eal/unix: allow creating thread with real-time priority

2023-10-25 Thread Bruce Richardson
On Wed, Oct 25, 2023 at 03:15:49PM +0200, Thomas Monjalon wrote:
> 24/10/2023 18:04, Stephen Hemminger:
> > On Tue, 24 Oct 2023 15:55:13 +0200
> > Morten Brørup  wrote:
> > 
> > > > 
> > > >4. It MAY be used by preemptible multi-producer and/or preemptible 
> > > > multi-
> > > > consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), 
> > > > SCHED_IDLE
> > > > or SCHED_BATCH. User SHOULD be aware of the performance penalty before 
> > > > using
> > > > it.
> > > > 
> > > > -  5. It MUST not be used by multi-producer/consumer pthreads, whose
> > > > scheduling policies are SCHED_FIFO or SCHED_RR.
> > > > +  5. It MUST not be used by multi-producer/consumer pthreads
> > > > + whose scheduling policies are ``SCHED_FIFO``
> > > > + or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).  
> > > 
> > > Do the RTS or HTS ring modes make any difference here?
> > > 
> > > Anyway, I agree that real-time priority should not be forbidden on Unix.
> > > 
> > > Acked-by: Morten Brørup 
> > 
> > Please add a big warning message in the rte_thread.c and the documentation
> > to describe the problem. Need to have the "you have been warned" action.
> 
> Yes I can add more warnings.
> 
> > Use of RT priority is incompatible with 100% poll mode as is typically done
> > in DPDK applications. A real time thread has higher priority than other 
> > necessary
> > kernel threads on the same CPU. Therefore if the RT thread never sleeps, 
> > critical
> > system actions such as delayed writes, network packet processing and timer 
> > updates
> > will not happen which makes the system unstable.
> 
> Yes, and it is shown by the test on loongarch:
> DPDK:fast-tests / threads_autotestTIMEOUT80.01s
> http://mails.dpdk.org/archives/test-report/2023-October/488760.html
> 
> I'll try to pass the test by adding a sleep in the test thread.
> 

"sched_yield()" rather than sleep perhaps? Might better convey the
intention of the call.
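
Either way, the mitigation would look something like the sketch below,
shaped as an rte_thread entry point (stop_requested() and do_poll_work()
are assumed application hooks, and the yield period is arbitrary):

#include <sched.h>
#include <stdbool.h>
#include <stdint.h>

extern bool stop_requested(void *arg);	/* assumed application hook */
extern void do_poll_work(void);		/* assumed busy-poll body */

static uint32_t
rt_poll_loop(void *arg)
{
	uint64_t iter = 0;

	while (!stop_requested(arg)) {
		do_poll_work();
		/* Periodically give the CPU back so kernel housekeeping
		 * threads are not starved by the RT priority. */
		if (++iter % 1024 == 0)
			sched_yield();
	}
	return 0;
}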



[PATCH v2 0/5] net/ena: v2.8.0 driver release

2023-10-25 Thread shaibran
From: Shai Brandes 

Hi,

This patchset aligns the driver with the latest HAL version, which adds
support for retrieving new metrics from the device and opens a path to
use additional device features that are not yet supported by the driver.

The new driver features are mostly additional metrics: we added support
for a new customer metric and for the ENA-express metrics, and we now
also report Rx overrun errors. All new metrics have multi-process (MP)
support.

* Testing:
The team performed directed tests for all the features and executed our
proprietary performance benchmarking (BW, PPS, latency) to verify there
is no regression. It was executed on the entire Amazon EC2 instance type
matrix, which covers all device generations and CPU architectures.

---
v2:
* Fixed spelling issues from checkpatch in patch 0001

Shai Brandes (5):
  net/ena: hal upgrade
  net/ena: add support for connection tracking metric
  net/ena: report Rx overrun errors in xstats
  net/ena: add support for ena-express metrics
  net/ena: update ena version to 2.8.0

 doc/guides/rel_notes/release_23_11.rst|   7 +
 drivers/net/ena/base/ena_com.c| 499 +++---
 drivers/net/ena/base/ena_com.h| 197 ++-
 .../net/ena/base/ena_defs/ena_admin_defs.h| 198 ++-
 .../net/ena/base/ena_defs/ena_eth_io_defs.h   |  18 +-
 drivers/net/ena/base/ena_defs/ena_gen_info.h  |   4 +-
 drivers/net/ena/base/ena_defs/ena_regs_defs.h |  12 +
 drivers/net/ena/base/ena_eth_com.c|  45 +-
 drivers/net/ena/base/ena_eth_com.h|  30 +-
 drivers/net/ena/base/ena_plat.h   |   8 +-
 drivers/net/ena/base/ena_plat_dpdk.h  |  49 +-
 drivers/net/ena/ena_ethdev.c  | 321 ---
 drivers/net/ena/ena_ethdev.h  |  43 +-
 13 files changed, 1204 insertions(+), 227 deletions(-)

-- 
2.17.1



[PATCH v2 1/5] net/ena: hal upgrade

2023-10-25 Thread shaibran
From: Shai Brandes 

ENA maintains a HAL that is shared by all supported host drivers.
Main features introduced to the HAL:
[1] Reworked the mechanism that queries the performance metrics
from the device.
[2] Added support for a new metric that allows monitoring the
available tracked connections.
[3] Added support for a new statistic that counts RX drops due
to insufficient buffers provided by host.
[4] Added support for Scalable Reliable Datagram (SRD) metrics
from ENA Express.
[5] Added support for querying the LLQ entry size recommendation
from the device.
[6] Added support for PTP hardware clock (PHC) feature that
provides enhanced accuracy (Not supported by the driver).
[7] Added support for new reset reasons for a suspected CPU
starvation and for completion descriptor inconsistency.
[8] Aligned all return error code to a common notation.
[9] Removed an obsolete queue tail pointer update API.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/rel_notes/release_23_11.rst|   4 +
 drivers/net/ena/base/ena_com.c| 499 +++---
 drivers/net/ena/base/ena_com.h| 197 ++-
 .../net/ena/base/ena_defs/ena_admin_defs.h| 198 ++-
 .../net/ena/base/ena_defs/ena_eth_io_defs.h   |  18 +-
 drivers/net/ena/base/ena_defs/ena_gen_info.h  |   4 +-
 drivers/net/ena/base/ena_defs/ena_regs_defs.h |  12 +
 drivers/net/ena/base/ena_eth_com.c|  45 +-
 drivers/net/ena/base/ena_eth_com.h|  30 +-
 drivers/net/ena/base/ena_plat.h   |   8 +-
 drivers/net/ena/base/ena_plat_dpdk.h  |  49 +-
 drivers/net/ena/ena_ethdev.c  |  16 +-
 12 files changed, 915 insertions(+), 165 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_11.rst 
b/doc/guides/rel_notes/release_23_11.rst
index 0a6fc76a9d..e3b0ba58c9 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -144,6 +144,10 @@ New Features
 
   * Added support for Network Service Header (NSH) flow matching.
 
+* **Updated Amazon Elastic Network Adapter ena net driver.**
+
+  * Upgraded ENA HAL to latest version.
+
 * **Updated Solarflare net driver.**
 
   * Added support for transfer flow action ``INDIRECT`` with subtype 
``VXLAN_ENCAP``.
diff --git a/drivers/net/ena/base/ena_com.c b/drivers/net/ena/base/ena_com.c
index 5ca36ab6d9..880d047956 100644
--- a/drivers/net/ena/base/ena_com.c
+++ b/drivers/net/ena/base/ena_com.c
@@ -38,6 +38,12 @@
 
 #define ENA_MAX_ADMIN_POLL_US 5000
 
+/* PHC definitions */
+#define ENA_PHC_DEFAULT_EXPIRE_TIMEOUT_USEC 20
+#define ENA_PHC_DEFAULT_BLOCK_TIMEOUT_USEC 1000
+#define ENA_PHC_TIMESTAMP_ERROR 0x
+#define ENA_PHC_REQ_ID_OFFSET 0xDEAD
+
 /*/
 /*/
 /*/
@@ -70,7 +76,7 @@ static int ena_com_mem_addr_set(struct ena_com_dev *ena_dev,
   dma_addr_t addr)
 {
if ((addr & GENMASK_ULL(ena_dev->dma_addr_bits - 1, 0)) != addr) {
-		ena_trc_err(ena_dev, "DMA address has more bits than the device supports\n");
+		ena_trc_err(ena_dev, "DMA address has more bits that the device supports\n");
return ENA_COM_INVAL;
}
 
@@ -360,7 +366,7 @@ static int ena_com_init_io_sq(struct ena_com_dev *ena_dev,
ENA_COM_BOUNCE_BUFFER_CNTRL_CNT;
io_sq->bounce_buf_ctrl.next_to_use = 0;
 
-   size = io_sq->bounce_buf_ctrl.buffer_size *
+   size = (size_t)io_sq->bounce_buf_ctrl.buffer_size *
io_sq->bounce_buf_ctrl.buffers_num;
 
ENA_MEM_ALLOC_NODE(ena_dev->dmadev,
@@ -658,7 +664,7 @@ static int ena_com_config_llq_info(struct ena_com_dev 
*ena_dev,
} else {
 		ena_trc_err(ena_dev, "Invalid header location control, supported: 0x%x\n",
 			    supported_feat);
supported_feat);
-   return -EINVAL;
+   return ENA_COM_INVAL;
}
 
if (likely(llq_info->header_location_ctrl == ENA_ADMIN_INLINE_HEADER)) {
@@ -673,7 +679,7 @@ static int ena_com_config_llq_info(struct ena_com_dev 
*ena_dev,
} else {
 			ena_trc_err(ena_dev, "Invalid desc_stride_ctrl, supported: 0x%x\n",
supported_feat);
-   return -EINVAL;
+   return ENA_COM_INVAL;
}
 
 		ena_trc_err(ena_dev, "Default llq stride ctrl is not supported, performing fallback, default: 0x%x, supported: 0x%x, used: 0x%x\n",
@@ -702,7 +708,7 @@ static int ena_com_config_llq_info(struct ena_com_dev 
*ena_dev,
} else {

[PATCH v2 4/5] net/ena: add support for ena-express metrics

2023-10-25 Thread shaibran
From: Shai Brandes 

ENA-express is powered by AWS scalable reliable datagram (SRD)
technology. SRD is a high performance network transport protocol
that uses dynamic routing to increase throughput and minimize
tail latency.

The driver exposes the following ENA-express metrics via xstats:
* ena_srd_mode – Describes which ENA-express features are enabled.
* ena_srd_eligible_tx_pkts – The number of network packets sent
  within a given time period that meet SRD requirements for
  eligibility.
* ena_srd_tx_pkts – The number of SRD packets transmitted within
  a given time period.
* ena_srd_rx_pkts – The number of SRD packets received within
  a given time period.
* ena_srd_resource_utilization – The percentage of the maximum
  allowed memory utilization for concurrent SRD connections
  that the instance has consumed.

Probing the ENA Express metrics is performed via an admin command.
Thus, a multi-process proxy handler was added.
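For illustration only (not part of the patch), a minimal sketch of how an
application might read these metrics through the ethdev xstats API, matching
them by name prefix; the prefix string and error handling are assumptions:

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <rte_ethdev.h>

/* Print every xstat whose name starts with "ena_srd" (sketch only). */
static void
print_ena_srd_xstats(uint16_t port_id)
{
	int i, n = rte_eth_xstats_get_names(port_id, NULL, 0);

	if (n <= 0)
		return;
	struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));
	struct rte_eth_xstat *xstats = calloc(n, sizeof(*xstats));
	if (names == NULL || xstats == NULL)
		goto out;
	if (rte_eth_xstats_get_names(port_id, names, n) != n ||
			rte_eth_xstats_get(port_id, xstats, n) != n)
		goto out;
	for (i = 0; i < n; i++) {
		const char *name = names[xstats[i].id].name;
		if (strncmp(name, "ena_srd", strlen("ena_srd")) == 0)
			printf("%s: %" PRIu64 "\n", name, xstats[i].value);
	}
out:
	free(names);
	free(xstats);
}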

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/rel_notes/release_23_11.rst |   1 +
 drivers/net/ena/ena_ethdev.c   | 105 -
 drivers/net/ena/ena_ethdev.h   |  18 +
 3 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_23_11.rst 
b/doc/guides/rel_notes/release_23_11.rst
index f622d93384..7aaa78e6c9 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -149,6 +149,7 @@ New Features
   * Upgraded ENA HAL to latest version.
   * Added support for connection tracking allowance utilization metric.
   * Added support for reporting rx overrun errors in xstats.
+  * Added support for ENA-express metrics.
 
 * **Updated Solarflare net driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index b3ebda6049..59cb2792d4 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -66,6 +66,9 @@ struct ena_stats {
 #define ENA_STAT_GLOBAL_ENTRY(stat) \
ENA_STAT_ENTRY(stat, dev)
 
+#define ENA_STAT_ENA_SRD_ENTRY(stat) \
+   ENA_STAT_ENTRY(stat, srd)
+
 /* Device arguments */
 #define ENA_DEVARG_LARGE_LLQ_HDR "large_llq_hdr"
 /* Timeout in seconds after which a single uncompleted Tx packet should be
@@ -106,6 +109,14 @@ static struct ena_stats ena_stats_metrics_strings[] = {
ENA_STAT_METRICS_ENTRY(conntrack_allowance_available),
 };
 
+static const struct ena_stats ena_stats_srd_strings[] = {
+   ENA_STAT_ENA_SRD_ENTRY(ena_srd_mode),
+   ENA_STAT_ENA_SRD_ENTRY(ena_srd_tx_pkts),
+   ENA_STAT_ENA_SRD_ENTRY(ena_srd_eligible_tx_pkts),
+   ENA_STAT_ENA_SRD_ENTRY(ena_srd_rx_pkts),
+   ENA_STAT_ENA_SRD_ENTRY(ena_srd_resource_utilization),
+};
+
 static const struct ena_stats ena_stats_tx_strings[] = {
ENA_STAT_TX_ENTRY(cnt),
ENA_STAT_TX_ENTRY(bytes),
@@ -132,9 +143,11 @@ static const struct ena_stats ena_stats_rx_strings[] = {
 #define ENA_STATS_ARRAY_GLOBAL ARRAY_SIZE(ena_stats_global_strings)
#define ENA_STATS_ARRAY_METRICS ARRAY_SIZE(ena_stats_metrics_strings)
 #define ENA_STATS_ARRAY_METRICS_LEGACY (ENA_STATS_ARRAY_METRICS - 1)
+#define ENA_STATS_ARRAY_ENA_SRD ARRAY_SIZE(ena_stats_srd_strings)
 #define ENA_STATS_ARRAY_TX ARRAY_SIZE(ena_stats_tx_strings)
 #define ENA_STATS_ARRAY_RX ARRAY_SIZE(ena_stats_rx_strings)
 
+
 #define QUEUE_OFFLOADS (RTE_ETH_TX_OFFLOAD_TCP_CKSUM |\
RTE_ETH_TX_OFFLOAD_UDP_CKSUM |\
RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |\
@@ -272,6 +285,8 @@ static int ena_parse_devargs(struct ena_adapter *adapter,
 static void ena_copy_customer_metrics(struct ena_adapter *adapter,
uint64_t *buf,
size_t buf_size);
+static void ena_copy_ena_srd_info(struct ena_adapter *adapter,
+ struct ena_stats_srd *srd_info);
 static int ena_setup_rx_intr(struct rte_eth_dev *dev);
 static int ena_rx_queue_intr_enable(struct rte_eth_dev *dev,
uint16_t queue_id);
@@ -324,6 +339,7 @@ enum ena_mp_req {
ENA_MP_IND_TBL_GET,
ENA_MP_IND_TBL_SET,
ENA_MP_CUSTOMER_METRICS_GET,
+   ENA_MP_SRD_STATS_GET,
 };
 
 /** Proxy message body. Shared between requests and responses. */
@@ -581,6 +597,22 @@ ENA_PROXY_DESC(ena_com_get_customer_metrics, 
ENA_MP_CUSTOMER_METRICS_GET,
 }),
struct ena_com_dev *ena_dev, char *buf, size_t buf_size);
 
+ENA_PROXY_DESC(ena_com_get_ena_srd_info, ENA_MP_SRD_STATS_GET,
+({
+   ENA_TOUCH(adapter);
+   ENA_TOUCH(req);
+   ENA_TOUCH(ena_dev);
+   ENA_TOUCH(info);
+}),
+({
+   ENA_TOUCH(rsp);
+   ENA_TOUCH(ena_dev);
+   if ((struct ena_stats_srd *)info != &adapter->srd_stats)
+   rte_memcpy((struct ena_stats_srd *)info,
+   &adapter->srd_stats,
+   sizeof(struct ena_stats_srd));
+}),
+   struct ena_com_de

[PATCH v2 2/5] net/ena: add support for connection tracking metric

2023-10-25 Thread shaibran
From: Shai Brandes 

The driver publishes network performance metrics that the
application can use to troubleshoot performance issues,
monitor the workload, and benchmark applications to determine
whether they maximize performance.

This patch adds support for the connection tracking allowance
utilization metric (conntrack_allowance_available), which allows
monitoring the available tracked connections that can be
established before the interface exceeds its allowance.

The driver uses the redesigned HAL mechanism that is backward
compatible with the old method to query the metrics.

Probing the customer metrics is performed via an admin command.
Thus, a multi-process proxy handler was added.
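For illustration only (not part of the patch), a sketch of fetching this one
counter by name through the xstats-by-id API; the exact xstat name string as
exposed by the driver is an assumption:

#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>

/* Sketch: read a single named xstat, e.g. the new conntrack metric. */
static void
print_conntrack_available(uint16_t port_id)
{
	uint64_t id, value;

	/* Name assumed; verify against rte_eth_xstats_get_names() output. */
	if (rte_eth_xstats_get_id_by_name(port_id,
			"conntrack_allowance_available", &id) != 0)
		return;
	if (rte_eth_xstats_get_by_id(port_id, &id, &value, 1) == 1)
		printf("conntrack_allowance_available: %" PRIu64 "\n", value);
}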

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/rel_notes/release_23_11.rst |   1 +
 drivers/net/ena/ena_ethdev.c   | 198 +
 drivers/net/ena/ena_ethdev.h   |  24 ++-
 3 files changed, 161 insertions(+), 62 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_11.rst 
b/doc/guides/rel_notes/release_23_11.rst
index e3b0ba58c9..eefbcc08fe 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -147,6 +147,7 @@ New Features
 * **Updated Amazon Elastic Network Adapter ena net driver.**
 
   * Upgraded ENA HAL to latest version.
+  * Added support for connection tracking allowance utilization metric.
 
 * **Updated Solarflare net driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index b764442dbb..daec7f7d16 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -60,8 +60,8 @@ struct ena_stats {
 #define ENA_STAT_TX_ENTRY(stat) \
ENA_STAT_ENTRY(stat, tx)
 
-#define ENA_STAT_ENI_ENTRY(stat) \
-   ENA_STAT_ENTRY(stat, eni)
+#define ENA_STAT_METRICS_ENTRY(stat) \
+   ENA_STAT_ENTRY(stat, metrics)
 
 #define ENA_STAT_GLOBAL_ENTRY(stat) \
ENA_STAT_ENTRY(stat, dev)
@@ -92,12 +92,17 @@ static const struct ena_stats ena_stats_global_strings[] = {
ENA_STAT_GLOBAL_ENTRY(tx_drops),
 };
 
-static const struct ena_stats ena_stats_eni_strings[] = {
-   ENA_STAT_ENI_ENTRY(bw_in_allowance_exceeded),
-   ENA_STAT_ENI_ENTRY(bw_out_allowance_exceeded),
-   ENA_STAT_ENI_ENTRY(pps_allowance_exceeded),
-   ENA_STAT_ENI_ENTRY(conntrack_allowance_exceeded),
-   ENA_STAT_ENI_ENTRY(linklocal_allowance_exceeded),
+/*
+ * The legacy metrics (also known as eni stats) consisted of 5 stats, while 
the reworked
+ * metrics (also known as customer metrics) support an additional stat.
+ */
+static struct ena_stats ena_stats_metrics_strings[] = {
+   ENA_STAT_METRICS_ENTRY(bw_in_allowance_exceeded),
+   ENA_STAT_METRICS_ENTRY(bw_out_allowance_exceeded),
+   ENA_STAT_METRICS_ENTRY(pps_allowance_exceeded),
+   ENA_STAT_METRICS_ENTRY(conntrack_allowance_exceeded),
+   ENA_STAT_METRICS_ENTRY(linklocal_allowance_exceeded),
+   ENA_STAT_METRICS_ENTRY(conntrack_allowance_available),
 };
 
 static const struct ena_stats ena_stats_tx_strings[] = {
@@ -124,7 +129,8 @@ static const struct ena_stats ena_stats_rx_strings[] = {
 };
 
 #define ENA_STATS_ARRAY_GLOBAL ARRAY_SIZE(ena_stats_global_strings)
-#define ENA_STATS_ARRAY_ENI ARRAY_SIZE(ena_stats_eni_strings)
+#define ENA_STATS_ARRAY_METRICS ARRAY_SIZE(ena_stats_metrics_strings)
+#define ENA_STATS_ARRAY_METRICS_LEGACY (ENA_STATS_ARRAY_METRICS - 1)
 #define ENA_STATS_ARRAY_TX ARRAY_SIZE(ena_stats_tx_strings)
 #define ENA_STATS_ARRAY_RX ARRAY_SIZE(ena_stats_rx_strings)
 
@@ -262,8 +268,9 @@ static int ena_process_bool_devarg(const char *key,
   void *opaque);
 static int ena_parse_devargs(struct ena_adapter *adapter,
 struct rte_devargs *devargs);
-static int ena_copy_eni_stats(struct ena_adapter *adapter,
- struct ena_stats_eni *stats);
+static void ena_copy_customer_metrics(struct ena_adapter *adapter,
+   uint64_t *buf,
+   size_t buf_size);
 static int ena_setup_rx_intr(struct rte_eth_dev *dev);
 static int ena_rx_queue_intr_enable(struct rte_eth_dev *dev,
uint16_t queue_id);
@@ -314,7 +321,8 @@ enum ena_mp_req {
ENA_MP_ENI_STATS_GET,
ENA_MP_MTU_SET,
ENA_MP_IND_TBL_GET,
-   ENA_MP_IND_TBL_SET
+   ENA_MP_IND_TBL_SET,
+   ENA_MP_CUSTOMER_METRICS_GET,
 };
 
 /** Proxy message body. Shared between requests and responses. */
@@ -507,8 +515,8 @@ ENA_PROXY_DESC(ena_com_get_eni_stats, ENA_MP_ENI_STATS_GET,
 ({
ENA_TOUCH(rsp);
ENA_TOUCH(ena_dev);
-   if (stats != (struct ena_admin_eni_stats *)&adapter->eni_stats)
-   rte_memcpy(stats, &adapter->eni_stats, sizeof(*stats));
+   if (stats != (struct ena_admin_eni_stats *)&adapter->metrics_stats)
+   rte_memcpy(stats, &adapter->metrics_sta

[PATCH v2 3/5] net/ena: report Rx overrun errors in xstats

2023-10-25 Thread shaibran
From: Shai Brandes 

RX overrun errors occur when a packet arrives but there are
not enough free buffers in the RX ring to receive it.
The driver publishes extended statistics with the RX buffer
overrun errors as reported by the device.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/rel_notes/release_23_11.rst | 1 +
 drivers/net/ena/ena_ethdev.c   | 4 
 drivers/net/ena/ena_ethdev.h   | 1 +
 3 files changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/release_23_11.rst 
b/doc/guides/rel_notes/release_23_11.rst
index eefbcc08fe..f622d93384 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -148,6 +148,7 @@ New Features
 
   * Upgraded ENA HAL to latest version.
   * Added support for connection tracking allowance utilization metric.
+  * Added support for reporting rx overrun errors in xstats.
 
 * **Updated Solarflare net driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index daec7f7d16..b3ebda6049 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -90,6 +90,7 @@ static const struct ena_stats ena_stats_global_strings[] = {
ENA_STAT_GLOBAL_ENTRY(dev_start),
ENA_STAT_GLOBAL_ENTRY(dev_stop),
ENA_STAT_GLOBAL_ENTRY(tx_drops),
+   ENA_STAT_GLOBAL_ENTRY(rx_overruns),
 };
 
 /*
@@ -3894,15 +3895,18 @@ static void ena_keep_alive(void *adapter_data,
struct ena_admin_aenq_keep_alive_desc *desc;
uint64_t rx_drops;
uint64_t tx_drops;
+   uint64_t rx_overruns;
 
adapter->timestamp_wd = rte_get_timer_cycles();
 
desc = (struct ena_admin_aenq_keep_alive_desc *)aenq_e;
rx_drops = ((uint64_t)desc->rx_drops_high << 32) | desc->rx_drops_low;
tx_drops = ((uint64_t)desc->tx_drops_high << 32) | desc->tx_drops_low;
+   rx_overruns = ((uint64_t)desc->rx_overruns_high << 32) | 
desc->rx_overruns_low;
 
adapter->drv_stats->rx_drops = rx_drops;
adapter->dev_stats.tx_drops = tx_drops;
+   adapter->dev_stats.rx_overruns = rx_overruns;
 }
 
 /**
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 9268d44dde..3f29764ca6 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -219,6 +219,7 @@ struct ena_stats_dev {
 * As a workaround it is being published as an extended statistic.
 */
u64 tx_drops;
+   u64 rx_overruns;
 };
 
 struct ena_stats_metrics {
-- 
2.17.1



[PATCH v2 5/5] net/ena: update ena version to 2.8.0

2023-10-25 Thread shaibran
From: Shai Brandes 

This release introduces:
* Upgraded ENA HAL.
* Support for connection tracking allowance utilization metric.
* Support for reporting rx overrun errors in xstats.
* Support for ENA-express metrics.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/ena_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 59cb2792d4..591bdf864f 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -21,7 +21,7 @@
 #include 
 
 #define DRV_MODULE_VER_MAJOR   2
-#define DRV_MODULE_VER_MINOR   7
+#define DRV_MODULE_VER_MINOR   8
 #define DRV_MODULE_VER_SUBMINOR0
 
 #define __MERGE_64B_H_L(h, l) (((uint64_t)h << 32) | l)
-- 
2.17.1



Re: [PATCH] eal/unix: allow creating thread with real-time priority

2023-10-25 Thread Thomas Monjalon
25/10/2023 15:34, Bruce Richardson:
> On Wed, Oct 25, 2023 at 03:15:49PM +0200, Thomas Monjalon wrote:
> > 24/10/2023 18:04, Stephen Hemminger:
> > > On Tue, 24 Oct 2023 15:55:13 +0200
> > > Morten Brørup  wrote:
> > > 
> > > > > 
> > > > >4. It MAY be used by preemptible multi-producer and/or preemptible 
> > > > > multi-
> > > > > consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), 
> > > > > SCHED_IDLE
> > > > > or SCHED_BATCH. User SHOULD be aware of the performance penalty 
> > > > > before using
> > > > > it.
> > > > > 
> > > > > -  5. It MUST not be used by multi-producer/consumer pthreads, whose
> > > > > scheduling policies are SCHED_FIFO or SCHED_RR.
> > > > > +  5. It MUST not be used by multi-producer/consumer pthreads
> > > > > + whose scheduling policies are ``SCHED_FIFO``
> > > > > + or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).  
> > > > 
> > > > Do the RTS or HTS ring modes make any difference here?
> > > > 
> > > > Anyway, I agree that real-time priority should not be forbidden on Unix.
> > > > 
> > > > Acked-by: Morten Brørup 
> > > 
> > > Please add a big warning message in the rte_thread.c and the documentation
> > > to describe the problem. Need to have the "you have been warned" action.
> > 
> > Yes I can add more warnings.
> > 
> > > Use of RT priority is incompatible with 100% poll mode as is typically 
> > > done
> > > in DPDK applications. A real time thread has higher priority than other 
> > > necessary
> > > kernel threads on the same CPU. Therefore if the RT thread never sleeps, 
> > > critical
> > > system actions such as delayed writes, network packet processing and 
> > > timer updates
> > > will not happen which makes the system unstable.
> > 
> > Yes, and it is shown by the test on loongarch:
> > DPDK:fast-tests / threads_autotestTIMEOUT80.01s
> > http://mails.dpdk.org/archives/test-report/2023-October/488760.html
> > 
> > I'll try to pass the test by adding a sleep in the test thread.
> > 
> 
> "sched_yield()" rather than sleep perhaps? Might better convey the
> intention of the call.

Do we have sched_yield on Windows?





[PATCH] config: compiler support check for machine arch flags

2023-10-25 Thread Sivaprasad Tummala
Added additional checks for compiler support of specific cpu arch
flags to fix incorrect error reporting.

Without this patch, the meson build reports a '__SSE4_2__' not defined
error for x86 builds when the compiler does not support the specified
cpu_instruction_set (or machine) argument.

Signed-off-by: Sivaprasad Tummala 
---
 config/meson.build | 5 +
 1 file changed, 5 insertions(+)

diff --git a/config/meson.build b/config/meson.build
index d56b0f9bce..e776870def 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -159,8 +159,13 @@ if not is_ms_compiler
 if host_machine.cpu_family().startswith('ppc')
 machine_args += '-mcpu=' + cpu_instruction_set
 machine_args += '-mtune=' + cpu_instruction_set
+compiler_arch_support = cc.has_argument('-mcpu=' + cpu_instruction_set)
 else
 machine_args += '-march=' + cpu_instruction_set
+compiler_arch_support = cc.has_argument('-march=' + 
cpu_instruction_set)
+endif
+if not compiler_arch_support
+error('Compiler does not support "@0@" arch 
flag.'.format(cpu_instruction_set))
 endif
 endif
 
-- 
2.34.1



Re: [PATCH] config: compiler support check for machine arch flags

2023-10-25 Thread Bruce Richardson
On Wed, Oct 25, 2023 at 07:17:09AM -0700, Sivaprasad Tummala wrote:
> Added additional checks for compiler support of specific cpu arch
> flags to fix incorrect error reporting.
> 
> Without this patch, the meson build reports a '__SSE4_2__' not defined
> error for x86 builds when the compiler does not support the specified
> cpu_instruction_set (or machine) argument.
> 
> Signed-off-by: Sivaprasad Tummala 
> ---
>  config/meson.build | 5 +
>  1 file changed, 5 insertions(+)
> 
Acked-by: Bruce Richardson 


Re: [PATCH] eal/unix: allow creating thread with real-time priority

2023-10-25 Thread Stephen Hemminger
On Wed, 25 Oct 2023 15:44:25 +0200
Thomas Monjalon  wrote:

> > > 
> > > I'll try to pass the test by adding a sleep in the test thread.
> > >   
> > 
> > "sched_yield()" rather than sleep perhaps? Might better convey the
> > intention of the call.  
> 
> Do we have sched_yield on Windows?

Windows has an equivalent but sched_yield() won't work here.
Since the DPDK thread is still higher priority than the kernel thread,
the scheduler will reschedule the DPDK thread. You need to sleep
to let kthread run.


[PATCH v2] eal/unix: allow creating thread with real-time priority

2023-10-25 Thread Thomas Monjalon
When adding an API for creating threads,
the real-time priority has been forbidden on Unix.

There is a known issue with ring behaviour,
but it should not be completely forbidden.

A real-time thread can block some kernel threads on the same core,
making the system unstable.
That's why a pause is added in the test thread.
This pause is a new API function rte_thread_yield(),
compatible with both Unix and Windows.

Fixes: ca04c78b6262 ("eal: get/set thread priority per thread identifier")
Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
Fixes: a7ba40b2b1bf ("drivers: convert to internal control threads")
Cc: sta...@dpdk.org

Signed-off-by: Thomas Monjalon 
Acked-by: Morten Brørup 
---
 app/test/test_threads.c   | 11 +-
 .../prog_guide/env_abstraction_layer.rst  |  4 +++-
 lib/eal/include/rte_thread.h  | 13 ++--
 lib/eal/unix/rte_thread.c | 21 +++
 lib/eal/version.map   |  3 +++
 lib/eal/windows/rte_thread.c  |  6 ++
 6 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/app/test/test_threads.c b/app/test/test_threads.c
index 4ac3f2671a..9a449ba9c5 100644
--- a/app/test/test_threads.c
+++ b/app/test/test_threads.c
@@ -22,7 +22,7 @@ thread_main(void *arg)
__atomic_store_n(&thread_id_ready, 1, __ATOMIC_RELEASE);
 
while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 1)
-   ;
+   rte_thread_yield(); /* required in case of real-time priority */
 
return 0;
 }
@@ -97,21 +97,12 @@ test_thread_priority(void)
"Priority set mismatches priority get");
 
priority = RTE_THREAD_PRIORITY_REALTIME_CRITICAL;
-#ifndef RTE_EXEC_ENV_WINDOWS
-   RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == ENOTSUP,
-   "Priority set to critical should fail");
-   RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
-   "Failed to get thread priority");
-   RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_NORMAL,
-   "Failed set to critical should have retained normal");
-#else
RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
"Priority set to critical should succeed");
RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
"Failed to get thread priority");
RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL,
"Priority set mismatches priority get");
-#endif
 
priority = RTE_THREAD_PRIORITY_NORMAL;
RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst 
b/doc/guides/prog_guide/env_abstraction_layer.rst
index 6debf54efb..d1f7cae7cd 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -815,7 +815,9 @@ Known Issues
 
   4. It MAY be used by preemptible multi-producer and/or preemptible 
multi-consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), 
SCHED_IDLE or SCHED_BATCH. User SHOULD be aware of the performance penalty 
before using it.
 
-  5. It MUST not be used by multi-producer/consumer pthreads, whose scheduling 
policies are SCHED_FIFO or SCHED_RR.
+  5. It MUST not be used by multi-producer/consumer pthreads
+ whose scheduling policies are ``SCHED_FIFO``
+ or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
 
   Alternatively, applications can use the lock-free stack mempool handler. When
   considering this handler, note that:
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 8da9d4d3fb..eeccc40532 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -56,10 +56,11 @@ typedef uint32_t (*rte_thread_func) (void *arg);
  * Thread priority values.
  */
 enum rte_thread_priority {
+   /** Normal thread priority, the default. */
RTE_THREAD_PRIORITY_NORMAL= 0,
-   /**< normal thread priority, the default */
+   /** Highest thread priority, use with caution.
+*  WARNING: System may be unstable because of a real-time busy loop. */
RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 1,
-   /**< highest thread priority allowed */
 };
 
 /**
@@ -183,6 +184,14 @@ int rte_thread_join(rte_thread_t thread_id, uint32_t 
*value_ptr);
  */
 int rte_thread_detach(rte_thread_t thread_id);
 
+/**
+ * Allow another thread to run on the same CPU core.
+ *
+ * Especially useful in real-time thread priority.
+ * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
+ */
+void rte_thread_yield(void);
+
 /**
  * Get the id of the calling thread.
  *
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 36a21ab2f9..399acf2fa0 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -5,6 +5,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 

Re: [PATCH] eal/unix: allow creating thread with real-time priority

2023-10-25 Thread Bruce Richardson
On Wed, Oct 25, 2023 at 08:08:52AM -0700, Stephen Hemminger wrote:
> On Wed, 25 Oct 2023 15:44:25 +0200
> Thomas Monjalon  wrote:
> 
> > > > 
> > > > I'll try to pass the test by adding a sleep in the test thread.
> > > >   
> > > 
> > > "sched_yield()" rather than sleep perhaps? Might better convey the
> > > intention of the call.  
> > 
> > Do we have sched_yield on Windows?
> 
> Windows has an equivalent but sched_yield() won't work here.
> Since the DPDK thread is still higher priority than the kernel thread,
> the scheduler will reschedule the DPDK thread. You need to sleep
> to let kthread run.

Interesting. Thanks for clarifying the situation.


Re: [PATCH] eal/unix: allow creating thread with real-time priority

2023-10-25 Thread Thomas Monjalon
25/10/2023 17:14, Bruce Richardson:
> On Wed, Oct 25, 2023 at 08:08:52AM -0700, Stephen Hemminger wrote:
> > On Wed, 25 Oct 2023 15:44:25 +0200
> > Thomas Monjalon  wrote:
> > 
> > > > > 
> > > > > I'll try to pass the test by adding a sleep in the test thread.
> > > > >   
> > > > 
> > > > "sched_yield()" rather than sleep perhaps? Might better convey the
> > > > intention of the call.  
> > > 
> > > Do we have sched_yield on Windows?
> > 
> > Windows has an equivalent but sched_yield() won't work here.
> > Since the DPDK thread is still higher priority than the kernel thread,
> > the scheduler will reschedule the DPDK thread. You need to sleep
> > to let kthread run.
> 
> Interesting. Thanks for clarifying the situation.

Indeed interesting.
I've just sent a v2 before reading this.

So I should try a v3 with a sleep.
But then I need to find a better name than rte_thread_yield.
Ideas?




Re: Series for 23.11

2023-10-25 Thread Maxime Coquelin

Hi Nicolas;

On 10/16/23 16:49, Chautru, Nicolas wrote:

Hi Maxime,

Just a heads up that Hernan is going on paternity leave, I will be 
covering for his series pending for 23.11. Ping me if there is any 
update required from review that I may have missed.


Pending series in patchwork:

  * SDK + doc update
https://patches.dpdk.org/project/dpdk/list/?series=29797

  * FPGA pmd update https://patches.dpdk.org/project/dpdk/list/?series=29537


We'd need a new revision for this one by tomorrow if we want it in 23.11.


  * test-bbdev updates
https://patches.dpdk.org/project/dpdk/list/?series=29705
  * Doc update https://patches.dpdk.org/project/dpdk/list/?series=29840


For others, I think we can delay until -rc3, even if not ideal.

Thanks,
Maxime


Thanks!

Nic

*From:*Vargas, Hernan 
*Sent:* Monday, October 2, 2023 7:44 AM
*To:* dev@dpdk.org; maxime.coque...@redhat.com; gak...@marvell.com; Rix, 
Tom 
*Cc:* Chautru, Nicolas ; Zhang, Qi Z 


*Subject:* Series for 23.11

Hi Maxime,

Kind reminder to review these series for 23.11 in order of priority:

 1. https://patches.dpdk.org/project/dpdk/list/?series=29558

 2. https://patches.dpdk.org/project/dpdk/list/?series=29537

 3. https://patches.dpdk.org/project/dpdk/list/?series=29705


Thanks,

Hernan





Re: [PATCH] eal/unix: allow creating thread with real-time priority

2023-10-25 Thread Thomas Monjalon
25/10/2023 17:18, Thomas Monjalon:
> 25/10/2023 17:14, Bruce Richardson:
> > On Wed, Oct 25, 2023 at 08:08:52AM -0700, Stephen Hemminger wrote:
> > > On Wed, 25 Oct 2023 15:44:25 +0200
> > > Thomas Monjalon  wrote:
> > > 
> > > > > > 
> > > > > > I'll try to pass the test by adding a sleep in the test thread.
> > > > > >   
> > > > > 
> > > > > "sched_yield()" rather than sleep perhaps? Might better convey the
> > > > > intention of the call.  
> > > > 
> > > > Do we have sched_yield on Windows?
> > > 
> > > Windows has an equivalent but sched_yield() won't work here.
> > > Since the DPDK thread is still higher priority than the kernel thread,
> > > the scheduler will reschedule the DPDK thread. You need to sleep
> > > to let kthread run.
> > 
> > Interesting. Thanks for clarifying the situation.
> 
> Indeed interesting.
> I've just sent a v2 before reading this.
> 
> So I should try a v3 with a sleep.
> But then I need to find a better name than rte_thread_yield.
> Ideas?

I will go with rte_thread_yield_realtime().
Any sleep will suffice on Linux? What about a nanosleep?
I suppose Sleep(0) is OK on Windows?





Re: [PATCH v2] eal/unix: allow creating thread with real-time priority

2023-10-25 Thread Stephen Hemminger
On Wed, 25 Oct 2023 17:13:14 +0200
Thomas Monjalon  wrote:

>   case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
> + /*
> +  * WARNING: Real-time busy loop takes priority on kernel 
> threads,
> +  *  making the system unstable.
> +  *  There is also a known issue when using rte_ring.
> +  */

I was thinking something like:

static bool warned;
if (!warned) {
RTE_LOG(NOTICE, EAL, "Real time priority is unstable when 
thread is polling without sleep\n");
warned = true;
}


Re: [External] Re: [PATCH] eal: fix modify data area after memset

2023-10-25 Thread Stephen Hemminger
On Mon, 23 Oct 2023 17:07:21 +0800
Fengnan Chang  wrote:

> Dmitry Kozlyuk  于2023年10月23日周一 04:22写道:
> >
> > 2023-09-22 16:12 (UTC+0800), Fengnan Chang:  
> > > ping
> > >
> > > Fengnan Chang  于2023年9月12日周二 17:05写道:  
> > > >
> > > > Let's look at this path:
> > > > malloc_elem_free  
> > > >->malloc_elem_join_adjacent_free
> > > >   ->join_elem(elem, elem->next)  
> > > >
> > > > 0. cur elem's pad > 0
> > > > 1. data area memset in malloc_elem_free first.
> > > > 2. next elem is free, try to join cur elem and next.
> > > > 3. in join_elem, try to modify inner->size; this address had been
> > > > memset in step 1, so it causes the content of the address to become non-zero.
> > > >
> > > > If user call rte_zmalloc, and pick this elem, it can't get all
> > > > zero'd memory.  
> >
> > malloc_elem_join_adjacent_free() always calls memset() after join_elem(),
> > for the next and the previous element respectively.  
> when trying to call join_elem() for the next element in
> malloc_elem_join_adjacent_free(),
> the memset is applied to the *next* element, but join_elem() updates the
> *current* element's
> content, which shouldn't happen; they are two different elements.
> 
> > How to reproduce this bug?  
> when I test this patch,
> https://patches.dpdk.org/project/dpdk/patch/2023083937.60975-1-changfeng...@bytedance.com/
> I have a case that allocs 64/128/192-byte objects and frees them with 16 threads;
> after every
> alloc I check whether all content is 0 or not.
> It's not easy to reproduce; you can have a try, but it's easier to find
> this problem at the code level.

I tried to make a test that would reproduce the problem but it did not.

diff --git a/app/test/test_malloc.c b/app/test/test_malloc.c
index cd579c503cf5..cfd45d6a28eb 100644
--- a/app/test/test_malloc.c
+++ b/app/test/test_malloc.c
@@ -28,6 +28,7 @@
 #include 

 #define N 1
+#define BINS 100

 static int
 is_mem_on_socket(int32_t socket);
@@ -69,13 +70,24 @@ is_aligned(void *p, int align)
return 1;
 }

+static bool is_all_zero(uint8_t *mem, size_t sz)
+{
+   size_t i;
+
+   for (i = 0; i < sz; i++)
+   if (mem[i] != 0)
+   return false;
+
+   return true;
+}
+
 static int
 test_align_overlap_per_lcore(__rte_unused void *arg)
 {
const unsigned align1 = 8,
align2 = 64,
align3 = 2048;
-   unsigned i,j;
+   unsigned int i;
void *p1 = NULL, *p2 = NULL, *p3 = NULL;
int ret = 0;

@@ -86,11 +98,12 @@ test_align_overlap_per_lcore(__rte_unused void *arg)
ret = -1;
break;
}
-   for(j = 0; j < 1000 ; j++) {
-   if( *(char *)p1 != 0) {
-   printf("rte_zmalloc didn't zero the allocated 
memory\n");
-   ret = -1;
-   }
+
+   if (!is_all_zero(p1, 1000)) {
+   printf("rte_zmalloc didn't zero the allocated 
memory\n");
+   ret = -1;
+   rte_free(p1);
+   break;
}
p2 = rte_malloc("dummy", 1000, align2);
if (!p2){
@@ -140,6 +153,66 @@ test_align_overlap_per_lcore(__rte_unused void *arg)
return ret;
 }

+/*
+ * Allocate random size chunks and make sure that they are
+ * always zero.
+ */
+static int
+test_zmalloc(__rte_unused void *arg)
+{
+   unsigned int i, n;
+   void *slots[BINS] = { };
+   void *p1;
+   size_t sz;
+
+   /* Allocate many variable size chunks */
+   for (i = 0; i < BINS; i++) {
+   sz = rte_rand_max(1024) + 1;
+   p1 = rte_zmalloc("slots", sz, 0);
+   if (p1 == NULL) {
+   printf("rte_zmalloc(%zu) returned NULL (i=%u)\n", sz, 
i);
+   goto fail;
+   }
+   slots[i] = p1;
+   if (!is_all_zero(p1, sz))
+   goto fail;
+   }
+
+   /* Drop one chunk per iteration */
+   for (n = BINS; n > 0; n--) {
+   /* Swap in a new block into a slot */
+   for (i = 0; i < N; i++) {
+   unsigned int bin = rte_rand_max(n);
+
+   sz = rte_rand_max(1024) + 1;
+   p1 = rte_zmalloc("swap", sz, 0);
+   if (!p1){
+   printf("rte_zmalloc(%zu) returned NULL 
(i=%u)\n", sz, i);
+   goto fail;
+   }
+
+   if (!is_all_zero(p1, sz)) {
+   printf("rte_zmalloc didn't zero the allocated 
memory\n");
+   goto fail;
+   }
+
+   rte_free(slots[bin]);
+   slots[bin] = p1;
+   }
+
+   /* Drop last bin */
+   rte_free(slots[n - 1]);
+   

Re: [PATCH v2 00/25] add the NFP vDPA PMD

2023-10-25 Thread Ferruh Yigit
On 10/24/2023 3:28 AM, Chaoyong He wrote:
> This patch series aims to add the NFP vDPA PMD, we also grab the common
> logic into the `drivers/common/nfp` directory.
> 
> ---
> v2:
> * Grab more logic into the `drivers/common/nfp` directory.
> * Delete some logic which should be when moving logic.
> ---
> 
> Chaoyong He (25):
>   drivers: introduce the NFP common library
>   net/nfp: make VF PMD using of NFP common module
>   net/nfp: rename common module name
>   net/nfp: rename ctrl module name
>   net/nfp: extract the cap data field
>   net/nfp: extract the qcp data field
>   net/nfp: extract the ctrl BAR data field
>   net/nfp: extract the ctrl data field
>   net/nfp: change the parameter of APIs
>   net/nfp: change the parameter of reconfig
>   net/nfp: extract the MAC address data field
>   net/nfp: rename parameter in related logic
>   drivers: add the common ctrl module
>   drivers: add the nfp common module
>   drivers: move queue logic to common module
>   drivers: move platform module to common library
>   drivers: move device module to common library
>   drivers/vdpa: introduce the NFP vDPA library
>   drivers: add the basic framework of vDPA PMD
>   vdpa/nfp: add the logic of remap PCI memory
>   vdpa/nfp: add the hardware init logic
>   drivers: add the datapath update logic
>   vdpa/nfp: add the notify related logic
>   vdpa/nfp: add nfp vDPA device operations
>   doc: add the common and vDPA document
> 

Overall pretty clean set, but there are a few minor issues, commented on
patches.


Also can you please address checkpatch warnings:

  ### [PATCH] drivers: add the datapath update logic

Warning in drivers/vdpa/nfp/nfp_vdpa.c:
Using __atomic_xxx built-ins, prefer rte_atomic_xxx

  ### [PATCH] vdpa/nfp: add the notify related logic

Warning in drivers/vdpa/nfp/nfp_vdpa.c:
Using pthread functions, prefer rte_thread

  ### [PATCH] vdpa/nfp: add nfp vDPA device operations

Warning in drivers/vdpa/nfp/nfp_vdpa.c:
Using __atomic_xxx built-ins, prefer rte_atomic_xxx


And some typos:
  vdpa/nfp: add nfp vDPA device operations
  opetation
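For reference, a hedged sketch of the kind of replacement checkpatch asks
for, assuming the rte_stdatomic wrappers available in the 23.11 cycle; the
flag name is illustrative:

#include <stdint.h>
#include <rte_stdatomic.h>

static RTE_ATOMIC(uint32_t) dev_ready; /* illustrative flag */

/* Before (triggers the checkpatch warning):
 *	__atomic_store_n(&dev_ready, 1, __ATOMIC_RELEASE);
 * After, using the preferred wrappers:
 */
static void
mark_ready(void)
{
	rte_atomic_store_explicit(&dev_ready, 1, rte_memory_order_release);
}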



Re: [PATCH v2 02/25] net/nfp: make VF PMD using of NFP common module

2023-10-25 Thread Ferruh Yigit
On 10/24/2023 3:28 AM, Chaoyong He wrote:
> Modify the logic of NFP VF PMD, make it using of the NFP common module
> and link into the 'nfp_drivers_list'.
> 
> Signed-off-by: Chaoyong He 
> Signed-off-by: Shujing Dong 
> Reviewed-by: Long Wu 
> Reviewed-by: Peng Zhang 
> ---
>  drivers/net/nfp/meson.build |  6 +-
>  drivers/net/nfp/nfp_ethdev_vf.c | 14 ++
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/nfp/meson.build b/drivers/net/nfp/meson.build
> index 40e9ef8524..f8581403fa 100644
> --- a/drivers/net/nfp/meson.build
> +++ b/drivers/net/nfp/meson.build
> @@ -40,4 +40,8 @@ sources = files(
>  'nfp_rxtx.c',
>  )
>  
> -deps += ['hash', 'security']
> +deps += ['hash', 'security', 'common_nfp']
> +
> +if not dpdk_conf.has('RTE_COMMON_NFP')
> +error('Missing internal dependency "common/nfp"')
> +endif
>

This will break the build in cases where 'common/nfp' is disabled. The above
'deps' update should be sufficient: it prevents the driver from being built
when 'common/nfp' is missing, instead of failing the build.

So can you please drop the above check.



Re: [PATCH v2 18/25] drivers/vdpa: introduce the NFP vDPA library

2023-10-25 Thread Ferruh Yigit
On 10/24/2023 3:28 AM, Chaoyong He wrote:
> Introduce the very basic NFP vDPA library.
> 
> Signed-off-by: Shujing Dong 
> Signed-off-by: Chaoyong He 
> Reviewed-by: Long Wu 
> Reviewed-by: Peng Zhang 

<...>

> --- /dev/null
> +++ b/drivers/vdpa/nfp/meson.build
> @@ -0,0 +1,16 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright (c) 2023 Corigine, Inc.
> +
> +if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
> +build = false
> +reason = 'only supported on 64-bit Linux'
> +endif
> +
> +if not dpdk_conf.has('RTE_LIB_VHOST')
> +build = false
> +reason = 'missing dependency, DPDK vhost library'
> +endif
> +

Similar to the previous comment, this may break the build.
Instead of this check, it is possible to add vhost and common/nfp as
dependencies of this driver, using 'deps', for example:
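(Sketch only; dependency names taken from elsewhere in this set.)

sources = files(
        'nfp_vdpa_log.c',
)

deps += ['vhost', 'common_nfp']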


> +sources = files(
> +'nfp_vdpa_log.c',
> +)
> diff --git a/drivers/vdpa/nfp/nfp_vdpa_log.c b/drivers/vdpa/nfp/nfp_vdpa_log.c
> new file mode 100644
> index 00..8c957d59ea
> --- /dev/null
> +++ b/drivers/vdpa/nfp/nfp_vdpa_log.c
> @@ -0,0 +1,9 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2023 Corigine, Inc.
> + * All rights reserved.
> + */
> +
> +#include "nfp_vdpa_log.h"
> +
> +RTE_LOG_REGISTER_SUFFIX(nfp_core_logtype, driver, NOTICE);
> +RTE_LOG_REGISTER_SUFFIX(nfp_vdpa_logtype, driver, NOTICE);
>

Both have 'driver' as suffix, I assume a copy/paste error.



Re: [PATCH v2 19/25] drivers: add the basic framework of vDPA PMD

2023-10-25 Thread Ferruh Yigit
On 10/24/2023 3:28 AM, Chaoyong He wrote:
> Add the basic framework of vDPA PMD.
> 
> Signed-off-by: Chaoyong He 
> Signed-off-by: Shujing Dong 
> Reviewed-by: Long Wu 
> Reviewed-by: Peng Zhang 

<...>

> @@ -12,5 +12,12 @@ if not dpdk_conf.has('RTE_LIB_VHOST')
>  endif
>  
>  sources = files(
> +'nfp_vdpa.c',
>  'nfp_vdpa_log.c',
>  )
> +
> +deps += ['common_nfp']
> +
> +if not dpdk_conf.has('RTE_COMMON_NFP')
> +error('Missing internal dependency "common/nfp"')
> +endif
>

Same comment as on the previous patches, please drop the above check.




Re: [PATCH v2 25/25] doc: add the common and vDPA document

2023-10-25 Thread Ferruh Yigit
On 10/24/2023 3:28 AM, Chaoyong He wrote:
> Add the document for nfp common library and vDPA PMD.
> 

Can you please distribute this patch to the other patches in the set, details below.

> Signed-off-by: Chaoyong He 
> Reviewed-by: Long Wu 
> Reviewed-by: Peng Zhang 
> ---
>  MAINTAINERS|  8 
>  doc/guides/platform/index.rst  |  1 +
>  doc/guides/platform/nfp.rst| 30 ++
>  doc/guides/rel_notes/release_23_11.rst |  5 +++
>  doc/guides/vdpadevs/features/nfp.ini   |  8 
>  doc/guides/vdpadevs/index.rst  |  1 +
>  doc/guides/vdpadevs/nfp.rst| 54 ++
>  7 files changed, 107 insertions(+)
>  create mode 100644 doc/guides/platform/nfp.rst
>  create mode 100644 doc/guides/vdpadevs/features/nfp.ini
>  create mode 100644 doc/guides/vdpadevs/nfp.rst
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4083658697..b28cdab54c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -900,9 +900,11 @@ F: doc/guides/nics/features/nfb.ini
>  
>  Netronome nfp
>  M: Chaoyong He 
> +F: drivers/common/nfp/
>

This part can go into the first patch that introduces common/nfp.

>  F: drivers/net/nfp/
>  F: doc/guides/nics/nfp.rst
>  F: doc/guides/nics/features/nfp*.ini
> +F: doc/guides/platform/nfp.rst
>  

Is a platform guide needed for nfp? It has only net and vdpa drivers at this
stage, and both have their own documentation. For a NIC (not a SoC or
platform), a platform guide seems excessive to me. What do you think about
distributing this information to the net and vdpa documentation of the driver?


>  NXP dpaa
>  M: Hemant Agrawal 
> @@ -1306,6 +1308,12 @@ F: drivers/vdpa/ifc/
>  F: doc/guides/vdpadevs/ifc.rst
>  F: doc/guides/vdpadevs/features/ifcvf.ini
>  
> +Corigine nfp vDPA
> +M: Chaoyong He 
> +F: drivers/vdpa/nfp/
> +F: doc/guides/vpdadevs/nfp.rst
> +F: doc/guides/vdpadevs/features/nfp.ini
> +

The above part can go into the patch that introduces vdpa/nfp, i.e.
[18/25] drivers/vdpa: introduce the NFP vDPA library

including other vdpa related documentation:
doc/guides/vdpadevs/features/nfp.ini
doc/guides/vdpadevs/index.rst
doc/guides/vdpadevs/nfp.rst



[PATCH v3 0/2] allow creating thread with real-time priority

2023-10-25 Thread Thomas Monjalon
Real-time thread priority has been forbidden on Unix
because of the problems it can cause.
Warnings and helpers are added to avoid deadlocks,
so real-time priority can be allowed on all systems.

Thomas Monjalon (2):
  eal: add thread yield functions
  eal/unix: allow creating thread with real-time priority

 app/test/test_threads.c   | 11 +--
 .../prog_guide/env_abstraction_layer.rst  |  4 ++-
 lib/eal/include/rte_thread.h  | 29 --
 lib/eal/unix/rte_thread.c | 30 +--
 lib/eal/version.map   |  4 +++
 lib/eal/windows/rte_thread.c  | 15 ++
 6 files changed, 71 insertions(+), 22 deletions(-)

-- 
2.42.0



[PATCH v3 1/2] eal: add thread yield functions

2023-10-25 Thread Thomas Monjalon
When running real-time threads, we may need to force scheduling
kernel threads or other real-time threads.
New functions are added to address these cases.

The yield functions should not have any interest for normal threads.
Note: other purposes may be addressed with rte_pause() or rte_delay_*().
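As an illustration of the intended use (not part of the patch), a sketch of
a real-time polling loop that periodically lets other threads run; the
context struct, work function, and yield period are assumptions:

#include <stdbool.h>
#include <stdint.h>
#include <rte_thread.h>

struct app_ctx { volatile bool stop; };        /* hypothetical context */
extern void do_poll_work(struct app_ctx *ctx); /* hypothetical work */

static uint32_t
rt_poll_loop(void *arg)
{
	struct app_ctx *ctx = arg;
	unsigned int iter = 0;

	while (!ctx->stop) {
		do_poll_work(ctx);
		/* Let kernel threads (and other RT threads) run sometimes. */
		if (++iter % 1024 == 0)
			rte_thread_yield_realtime();
	}
	return 0;
}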

Signed-off-by: Thomas Monjalon 
---
 lib/eal/include/rte_thread.h | 22 ++
 lib/eal/unix/rte_thread.c| 16 
 lib/eal/version.map  |  4 
 lib/eal/windows/rte_thread.c | 15 +++
 4 files changed, 57 insertions(+)

diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 8da9d4d3fb..139cafac96 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -183,6 +183,28 @@ int rte_thread_join(rte_thread_t thread_id, uint32_t 
*value_ptr);
  */
 int rte_thread_detach(rte_thread_t thread_id);
 
+/**
+ * Allow another thread to run on the same CPU core.
+ *
+ * Lower priority threads may not be scheduled.
+ *
+ * Especially useful in real-time thread priority
+ * to schedule other real-time threads.
+ * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
+ */
+__rte_experimental
+void rte_thread_yield(void);
+
+/**
+ * Unblock a CPU core running busy in a real-time thread.
+ *
+ * Especially useful in real-time thread priority
+ * to avoid a busy loop blocking vital threads on a core.
+ * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
+ */
+__rte_experimental
+void rte_thread_yield_realtime(void);
+
 /**
  * Get the id of the calling thread.
  *
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 36a21ab2f9..92b4e53adb 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -5,9 +5,11 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -227,6 +229,20 @@ rte_thread_detach(rte_thread_t thread_id)
return pthread_detach((pthread_t)thread_id.opaque_id);
 }
 
+void
+rte_thread_yield(void)
+{
+   sched_yield();
+}
+
+void
+rte_thread_yield_realtime(void)
+{
+   /* A simple yield may not be enough to schedule kernel threads. */
+   struct timespec wait = {.tv_nsec = 1};
+   nanosleep(&wait, NULL);
+}
+
 int
 rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
 {
diff --git a/lib/eal/version.map b/lib/eal/version.map
index e00a844805..b81ac3e3af 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -413,6 +413,10 @@ EXPERIMENTAL {
# added in 23.07
rte_memzone_max_get;
rte_memzone_max_set;
+
+   # added in 23.11
+   rte_thread_yield;
+   rte_thread_yield_realtime;
 };
 
 INTERNAL {
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index acf648456c..1e031eca40 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -304,6 +304,21 @@ rte_thread_detach(rte_thread_t thread_id)
return 0;
 }
 
+void
+rte_thread_yield(void)
+{
+   Sleep(0);
+}
+
+void
+rte_thread_yield_realtime(void)
+{
+   /* Real-time threads are not causing problems on Windows.
+* A normal yield should be fine.
+*/
+   Sleep(0);
+}
+
 int
 rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
 {
-- 
2.42.0



[PATCH v3 2/2] eal/unix: allow creating thread with real-time priority

2023-10-25 Thread Thomas Monjalon
When adding an API for creating threads,
the real-time priority has been forbidden on Unix.

There is a known issue with ring behaviour,
but it should not be completely forbidden.

A real-time thread can block some kernel threads on the same core,
making the system unstable.
That's why a pause is added in the test thread.

Fixes: ca04c78b6262 ("eal: get/set thread priority per thread identifier")
Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
Fixes: a7ba40b2b1bf ("drivers: convert to internal control threads")
Cc: sta...@dpdk.org

Signed-off-by: Thomas Monjalon 
Acked-by: Morten Brørup 
---
 app/test/test_threads.c | 11 +--
 doc/guides/prog_guide/env_abstraction_layer.rst |  4 +++-
 lib/eal/include/rte_thread.h|  7 +--
 lib/eal/unix/rte_thread.c   | 14 +-
 4 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/app/test/test_threads.c b/app/test/test_threads.c
index 4ac3f2671a..c14d39fc83 100644
--- a/app/test/test_threads.c
+++ b/app/test/test_threads.c
@@ -22,7 +22,7 @@ thread_main(void *arg)
__atomic_store_n(&thread_id_ready, 1, __ATOMIC_RELEASE);
 
while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 1)
-   ;
+   rte_thread_yield_realtime(); /* required for RT priority */
 
return 0;
 }
@@ -97,21 +97,12 @@ test_thread_priority(void)
"Priority set mismatches priority get");
 
priority = RTE_THREAD_PRIORITY_REALTIME_CRITICAL;
-#ifndef RTE_EXEC_ENV_WINDOWS
-   RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == ENOTSUP,
-   "Priority set to critical should fail");
-   RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
-   "Failed to get thread priority");
-   RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_NORMAL,
-   "Failed set to critical should have retained normal");
-#else
RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
"Priority set to critical should succeed");
RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
"Failed to get thread priority");
RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL,
"Priority set mismatches priority get");
-#endif
 
priority = RTE_THREAD_PRIORITY_NORMAL;
RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst 
b/doc/guides/prog_guide/env_abstraction_layer.rst
index 6debf54efb..d1f7cae7cd 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -815,7 +815,9 @@ Known Issues
 
   4. It MAY be used by preemptible multi-producer and/or preemptible 
multi-consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), 
SCHED_IDLE or SCHED_BATCH. User SHOULD be aware of the performance penalty 
before using it.
 
-  5. It MUST not be used by multi-producer/consumer pthreads, whose scheduling 
policies are SCHED_FIFO or SCHED_RR.
+  5. It MUST not be used by multi-producer/consumer pthreads
+ whose scheduling policies are ``SCHED_FIFO``
+ or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
 
   Alternatively, applications can use the lock-free stack mempool handler. When
   considering this handler, note that:
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 139cafac96..1952a10155 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -56,10 +56,13 @@ typedef uint32_t (*rte_thread_func) (void *arg);
  * Thread priority values.
  */
 enum rte_thread_priority {
+   /** Normal thread priority, the default. */
RTE_THREAD_PRIORITY_NORMAL= 0,
-   /**< normal thread priority, the default */
+   /**
+* Highest thread priority, use with caution.
+* WARNING: System may be unstable because of a real-time busy loop.
+*/
RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 1,
-   /**< highest thread priority allowed */
 };
 
 /**
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 92b4e53adb..87ddf25f1c 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -51,6 +51,11 @@ thread_map_priority_to_os_value(enum rte_thread_priority 
eal_pri, int *os_pri,
sched_get_priority_max(SCHED_OTHER)) / 2;
break;
case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
+   /*
+* WARNING: Real-time busy loop takes priority on kernel 
threads,
+*  making the system unstable.
+*  There is also a known issue when using rte_ring.
+*/
*pol = SCHED_RR;
*os_pri = sched_get_priority_max(SCHED_RR);
break;
@@ -155,11 +160,6 @@ rte_thread_create(rte_thr

Re: [PATCH v3 1/2] eal: add thread yield functions

2023-10-25 Thread Bruce Richardson
On Wed, Oct 25, 2023 at 06:31:10PM +0200, Thomas Monjalon wrote:
> When running real-time threads, we may need to force scheduling
> kernel threads or other real-time threads.
> New functions are added to address these cases.
> 
> The yield functions should not have any interest for normal threads.
> Note: other purposes may be addressed with rte_pause() or rte_delay_*().
> 
> Signed-off-by: Thomas Monjalon 
> ---
>  lib/eal/include/rte_thread.h | 22 ++
>  lib/eal/unix/rte_thread.c| 16 
>  lib/eal/version.map  |  4 
>  lib/eal/windows/rte_thread.c | 15 +++
>  4 files changed, 57 insertions(+)
> 
> diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
> index 8da9d4d3fb..139cafac96 100644
> --- a/lib/eal/include/rte_thread.h
> +++ b/lib/eal/include/rte_thread.h
> @@ -183,6 +183,28 @@ int rte_thread_join(rte_thread_t thread_id, uint32_t 
> *value_ptr);
>   */
>  int rte_thread_detach(rte_thread_t thread_id);
>  
> +/**
> + * Allow another thread to run on the same CPU core.
> + *
> + * Lower priority threads may not be scheduled.
> + *
> + * Especially useful in real-time thread priority
> + * to schedule other real-time threads.
> + * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
> + */
> +__rte_experimental
> +void rte_thread_yield(void);
> +
> +/**
> + * Unblock a CPU core running busy in a real-time thread.
> + *
> + * Especially useful in real-time thread priority
> + * to avoid a busy loop blocking vital threads on a core.
> + * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
> + */
> +__rte_experimental
> +void rte_thread_yield_realtime(void);
> +
>  /**
>   * Get the id of the calling thread.
>   *
> diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
> index 36a21ab2f9..92b4e53adb 100644
> --- a/lib/eal/unix/rte_thread.c
> +++ b/lib/eal/unix/rte_thread.c
> @@ -5,9 +5,11 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -227,6 +229,20 @@ rte_thread_detach(rte_thread_t thread_id)
>   return pthread_detach((pthread_t)thread_id.opaque_id);
>  }
>  
> +void
> +rte_thread_yield(void)
> +{
> + sched_yield();
> +}
> +
> +void
> +rte_thread_yield_realtime(void)
> +{
> + /* A simple yield may not be enough to schedule kernel threads. */
> + struct timespec wait = {.tv_nsec = 1};
> + nanosleep(&wait, NULL);
> +}
> +
While I realise we discussed this earlier, and I also was the original
suggester of using sched_yield, I think having just one function using
sleep is probably best after all.

/Bruce


Re: [PATCH v2] eal/unix: allow creating thread with real-time priority

2023-10-25 Thread Thomas Monjalon
25/10/2023 17:37, Stephen Hemminger:
> On Wed, 25 Oct 2023 17:13:14 +0200
> Thomas Monjalon  wrote:
> 
> > case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
> > +   /*
> > +* WARNING: Real-time busy loop takes priority on kernel 
> > threads,
> > +*  making the system unstable.
> > +*  There is also a known issue when using rte_ring.
> > +*/
> 
> I was thinking something like:
> 
>   static bool warned;
>   if (!warned) {
>   RTE_LOG(NOTICE, EAL, "Real time priority is unstable when 
> thread is polling without sleep\n");
>   warned = true;
>   }

I'm not sure about bothering users.
They can fear something is wrong even if the developer took care of it.
I think doc warnings for developers are more appropriate.
I've added notes in the API.




Re: [PATCH v3 1/2] eal: add thread yield functions

2023-10-25 Thread Thomas Monjalon
25/10/2023 18:40, Bruce Richardson:
> On Wed, Oct 25, 2023 at 06:31:10PM +0200, Thomas Monjalon wrote:
> > When running real-time threads, we may need to force scheduling
> > kernel threads or other real-time threads.
> > New functions are added to address these cases.
> > 
> > The yield functions should not have any interest for normal threads.
> > Note: other purposes may be addressed with rte_pause() or rte_delay_*().
> > 
> > Signed-off-by: Thomas Monjalon 
> > ---
> >  lib/eal/include/rte_thread.h | 22 ++
> >  lib/eal/unix/rte_thread.c| 16 
> >  lib/eal/version.map  |  4 
> >  lib/eal/windows/rte_thread.c | 15 +++
> >  4 files changed, 57 insertions(+)
> > 
> > diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
> > index 8da9d4d3fb..139cafac96 100644
> > --- a/lib/eal/include/rte_thread.h
> > +++ b/lib/eal/include/rte_thread.h
> > @@ -183,6 +183,28 @@ int rte_thread_join(rte_thread_t thread_id, uint32_t 
> > *value_ptr);
> >   */
> >  int rte_thread_detach(rte_thread_t thread_id);
> >  
> > +/**
> > + * Allow another thread to run on the same CPU core.
> > + *
> > + * Lower priority threads may not be scheduled.
> > + *
> > + * Especially useful in real-time thread priority
> > + * to schedule other real-time threads.
> > + * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
> > + */
> > +__rte_experimental
> > +void rte_thread_yield(void);
> > +
> > +/**
> > + * Unblock a CPU core running busy in a real-time thread.
> > + *
> > + * Especially useful in real-time thread priority
> > + * to avoid a busy loop blocking vital threads on a core.
> > + * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
> > + */
> > +__rte_experimental
> > +void rte_thread_yield_realtime(void);
> > +
> >  /**
> >   * Get the id of the calling thread.
> >   *
> > diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
> > index 36a21ab2f9..92b4e53adb 100644
> > --- a/lib/eal/unix/rte_thread.c
> > +++ b/lib/eal/unix/rte_thread.c
> > @@ -5,9 +5,11 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -227,6 +229,20 @@ rte_thread_detach(rte_thread_t thread_id)
> > return pthread_detach((pthread_t)thread_id.opaque_id);
> >  }
> >  
> > +void
> > +rte_thread_yield(void)
> > +{
> > +   sched_yield();
> > +}
> > +
> > +void
> > +rte_thread_yield_realtime(void)
> > +{
> > +   /* A simple yield may not be enough to schedule kernel threads. */
> > +   struct timespec wait = {.tv_nsec = 1};
> > +   nanosleep(&wait, NULL);
> > +}
> > +
> While I realise we discussed this earlier, and I also was the original
> suggester of using sched_yield, I think having just one function using
> sleep is probably best after all.

I think there is a value to have a simple yield function
for scheduling between multiple real-time threads
without sleep overhead (not sure about the overhead).
If there is not much overhead, then a single function is OK.

Note I'm preparing a new version with a simple yield implemented
with the lighter SwitchToThread() on Windows.





RE: [PATCH v2] eal/unix: allow creating thread with real-time priority

2023-10-25 Thread Morten Brørup
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Wednesday, 25 October 2023 18.46
> 
> 25/10/2023 17:37, Stephen Hemminger:
> > On Wed, 25 Oct 2023 17:13:14 +0200
> > Thomas Monjalon  wrote:
> >
> > >   case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
> > > + /*
> > > +  * WARNING: Real-time busy loop takes priority on kernel
> threads,
> > > +  *  making the system unstable.
> > > +  *  There is also a known issue when using
> rte_ring.
> > > +  */
> >
> > I was thinking something like:
> >
> > static bool warned;
> > if (!warned) {
> > RTE_LOG(NOTICE, EAL, "Real time priority is unstable when
> thread is polling without sleep\n");
> > warned = true;
> > }
> 
> I'm not sure about bothering users.
> They can fear something is wrong even if the developer took care of it.
> I think doc warnings for developers are more appropriate.
> I've added notes in the API.

I agree with Thomas on this.

If you want the log message, please degrade it to INFO or DEBUG level. It is 
only relevant when chasing problems, not for normal production - and thus 
NOTICE is too high.


Someone might build a kernel with options to keep non-dataplane threads off 
some dedicated CPU cores, so they can be used for guaranteed low-latency 
dataplane threads. We do. We don't use real-time priority, though.

For reference, we did some experiments (using this custom built kernel) with a 
dedicated thread doing nothing but a loop calling rte_rdtsc_precise() and 
registering the delta. Although the overwhelming majority is ca. CPU 80 cycles, 
there are some big outliers at ca. 9,000 CPU cycles. (Order of magnitude: ca. 
45 of these big outliers per minute.) Apparently some kernel threads steal some 
cycles from this thread, regardless of our customizations. We haven't bothered 
analyzing and optimizing it further.
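A minimal sketch of the kind of probe described above (illustrative, not our
actual code):

#include <stdint.h>
#include <rte_cycles.h>

/* Spin on the TSC and record the largest gap between consecutive reads. */
static uint64_t
max_tsc_gap(uint64_t iterations)
{
	uint64_t prev = rte_rdtsc_precise();
	uint64_t max_delta = 0;
	uint64_t i;

	for (i = 0; i < iterations; i++) {
		uint64_t now = rte_rdtsc_precise();
		if (now - prev > max_delta)
			max_delta = now - prev;
		prev = now;
	}
	return max_delta;
}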

I think our experiment supports the need to allow kernel threads to run, e.g. 
by calling sleep() or similar, when an EAL thread has real-time priority.



RE: [PATCH v3 1/2] eal: add thread yield functions

2023-10-25 Thread Morten Brørup
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Wednesday, 25 October 2023 18.31
> 
> When running real-time threads, we may need to force scheduling
> kernel threads or other real-time threads.
> New functions are added to address these cases.
> 
> The yield functions should not have any interest for normal threads.
> Note: other purposes may be addressed with rte_pause() or rte_delay_*().
> 
> Signed-off-by: Thomas Monjalon 
> ---
>  lib/eal/include/rte_thread.h | 22 ++
>  lib/eal/unix/rte_thread.c| 16 
>  lib/eal/version.map  |  4 
>  lib/eal/windows/rte_thread.c | 15 +++
>  4 files changed, 57 insertions(+)
> 
> diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
> index 8da9d4d3fb..139cafac96 100644
> --- a/lib/eal/include/rte_thread.h
> +++ b/lib/eal/include/rte_thread.h
> @@ -183,6 +183,28 @@ int rte_thread_join(rte_thread_t thread_id,
> uint32_t *value_ptr);
>   */
>  int rte_thread_detach(rte_thread_t thread_id);
> 
> +/**
> + * Allow another thread to run on the same CPU core.
> + *
> + * Lower priority threads may not be scheduled.
> + *
> + * Especially useful for threads running at real-time priority,
> + * to let other real-time threads be scheduled.
> + * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
> + */
> +__rte_experimental
> +void rte_thread_yield(void);
> +
> +/**
> + * Unblock a CPU core busy-looping in a real-time thread.
> + *
> + * Especially useful for threads running at real-time priority,
> + * to avoid a busy loop starving vital threads on the core.
> + * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
> + */
> +__rte_experimental
> +void rte_thread_yield_realtime(void);
> +

If an application really needs to use real-time priority, the behavior of any 
DPDK yield functions must be documented in much more detail than this - 
especially in regard to expected latency.
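
For context, the usage pattern under discussion is roughly the following
(a sketch only; rte_thread_yield_realtime() is the experimental function from
this patch, and YIELD_INTERVAL is a made-up tuning knob):

    #include <stdint.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_thread.h>

    #define BURST_SIZE 32
    #define YIELD_INTERVAL 1024 /* hypothetical; depends on latency budget */

    static volatile int keep_running = 1;

    static void
    rt_poll_loop(uint16_t port_id, uint16_t queue_id)
    {
        struct rte_mbuf *pkts[BURST_SIZE];
        uint64_t iter = 0;

        while (keep_running) {
            uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id,
                    pkts, BURST_SIZE);
            /* ... process the packets here ... */
            rte_pktmbuf_free_bulk(pkts, nb_rx);

            /* Periodically let kernel threads (and other RT
             * threads) run, so the core is not monopolized. */
            if ((++iter & (YIELD_INTERVAL - 1)) == 0)
                rte_thread_yield_realtime();
        }
    }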

[...]

> diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
> index acf648456c..1e031eca40 100644
> --- a/lib/eal/windows/rte_thread.c
> +++ b/lib/eal/windows/rte_thread.c
> @@ -304,6 +304,21 @@ rte_thread_detach(rte_thread_t thread_id)
>   return 0;
>  }
> 
> +void
> +rte_thread_yield(void)
> +{
> + Sleep(0);
> +}
> +
> +void
> +rte_thread_yield_realtime(void)
> +{
> + /* Real-time threads do not cause problems on Windows.
> +  * A normal yield should be fine.

Back in the day, the Windows API had a Yield() function; make sure your 
comment can't be misunderstood as referring to that.

> +  */
> + Sleep(0);
> +}



[PATCH] ethdev: fix ESP packet type description

2023-10-25 Thread Alexander Kozyrev
The correct protocol number for the ESP (IP Encapsulating Security Payload)
packet type is 50; 51 is IPsec AH (Authentication Header).
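
For reference, applications typically consume this constant along these lines
(a minimal sketch):

    #include <rte_mbuf.h>
    #include <rte_mbuf_ptype.h>

    /* Does the Rx packet type reported by the PMD mark this mbuf as ESP? */
    static inline int
    pkt_is_esp(const struct rte_mbuf *m)
    {
        return (m->packet_type & RTE_PTYPE_TUNNEL_MASK) ==
                RTE_PTYPE_TUNNEL_ESP;
    }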

Fixes: 1e84afd3906b ("mbuf: add security crypto flags and fields")
Signed-off-by: Alexander Kozyrev 
---
 lib/mbuf/rte_mbuf_ptype.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/mbuf/rte_mbuf_ptype.h b/lib/mbuf/rte_mbuf_ptype.h
index 17a2dd3576..f2276e2909 100644
--- a/lib/mbuf/rte_mbuf_ptype.h
+++ b/lib/mbuf/rte_mbuf_ptype.h
@@ -419,10 +419,10 @@ extern "C" {
  *
  * Packet format:
  * <'ether type'=0x0800
- * | 'version'=4, 'protocol'=51>
+ * | 'version'=4, 'protocol'=50>
  * or,
  * <'ether type'=0x86DD
- * | 'version'=6, 'next header'=51>
+ * | 'version'=6, 'next header'=50>
  */
 #define RTE_PTYPE_TUNNEL_ESP0x9000
 /**
-- 
2.18.2



[PATCH] net/mlx5/hws: remove csum check from L3 ok check

2023-10-25 Thread Alexander Kozyrev
From: Michael Baum 

This patch changes the integrity item behavior for HW steering.

Old behavior: "ipv4_csum_ok" checks only the IPv4 checksum, while "l3_ok"
checks that everything is ok, including the IPv4 checksum.

New behavior: "l3_ok" checks that everything is ok, excluding the IPv4
checksum.

This change enables matching "l3_ok" on IPv6 packets, since for IPv6
packets "ipv4_csum_ok" always misses.
For SW steering the old behavior is kept, the same as for the L4 ok checks.
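
With the new behavior, matching "l3_ok" on IPv6 traffic becomes possible; a
sketch of the item setup under HW steering (flow and template creation
boilerplate omitted):

    #include <rte_flow.h>

    /* Match packets whose L3 header passed all checks; under HW steering
     * this no longer implies the IPv4 checksum check. */
    static const struct rte_flow_item_integrity integ_spec = { .l3_ok = 1 };
    static const struct rte_flow_item_integrity integ_mask = { .l3_ok = 1 };

    static const struct rte_flow_item pattern[] = {
        {
            .type = RTE_FLOW_ITEM_TYPE_INTEGRITY,
            .spec = &integ_spec,
            .mask = &integ_mask,
        },
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV6 },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };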

Signed-off-by: Michael Baum 
---
 doc/guides/nics/mlx5.rst  | 11 ---
 drivers/net/mlx5/hws/mlx5dr_definer.c |  6 ++
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 7086f3d1d4..4d9c8f53cf 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -648,12 +648,13 @@ Limitations
 
 - Integrity:
 
-  - Integrity offload is enabled starting from **ConnectX-6 Dx**.
   - Verification bits provided by the hardware are ``l3_ok``, 
``ipv4_csum_ok``, ``l4_ok``, ``l4_csum_ok``.
   - ``level`` value 0 references outer headers.
   - Negative integrity item verification is not supported.
-  - Multiple integrity items not supported in a single flow rule.
-  - Flow rule items supplied by application must explicitly specify network 
headers referred by integrity item.
+  - With SW steering (``dv_flow_en=1``)
+- Integrity offload is enabled starting from **ConnectX-6 Dx**.
+- Multiple integrity items not supported in a single flow rule.
+- Flow rule items supplied by application must explicitly specify network 
headers referred by integrity item.
 For example, if integrity item mask sets ``l4_ok`` or ``l4_csum_ok`` bits, 
reference to L4 network header,
 TCP or UDP, must be in the rule pattern as well::
 
@@ -661,6 +662,10 @@ Limitations
 
   flow create 0 ingress pattern integrity level is 0 value mask l4_ok 
value spec l4_ok / eth / ipv4 proto is udp / end …
 
+  - With HW steering (``dv_flow_en=2``)
+- The ``l3_ok`` field represents all L3 checks, but says nothing about whether 
the IPv4 checksum is ok.
+- The ``l4_ok`` field represents all L4 checks including L4 checksum ok.
+
 - Connection tracking:
 
   - Cannot co-exist with ASO meter, ASO age action in a single flow rule.
diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.c 
b/drivers/net/mlx5/hws/mlx5dr_definer.c
index 95b5d4b70e..6b63ccedac 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.c
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.c
@@ -287,10 +287,8 @@ mlx5dr_definer_integrity_set(struct mlx5dr_definer_fc *fc,
uint32_t ok1_bits = 0;
 
if (v->l3_ok)
-   ok1_bits |= inner ? BIT(MLX5DR_DEFINER_OKS1_SECOND_L3_OK) |
-   
BIT(MLX5DR_DEFINER_OKS1_SECOND_IPV4_CSUM_OK) :
-   BIT(MLX5DR_DEFINER_OKS1_FIRST_L3_OK) |
-   BIT(MLX5DR_DEFINER_OKS1_FIRST_IPV4_CSUM_OK);
+   ok1_bits |= inner ? BIT(MLX5DR_DEFINER_OKS1_SECOND_L3_OK) :
+   BIT(MLX5DR_DEFINER_OKS1_FIRST_L3_OK);
 
if (v->ipv4_csum_ok)
ok1_bits |= inner ? 
BIT(MLX5DR_DEFINER_OKS1_SECOND_IPV4_CSUM_OK) :
-- 
2.18.2



[PATCH] net/mlx5/hws: fix integrity bits level

2023-10-25 Thread Alexander Kozyrev
The level field in the integrity item is not taken into account
in the current implementation of hardware steering.
Use this value instead of trying to find out the encapsulation
level according to the protocol items involved.
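
After this fix, the application-supplied level selects which headers are
verified; a sketch (level semantics per the mlx5 documentation: 0 references
the outer headers, and a non-zero value is taken here to mean the inner ones,
an assumption based on this patch):

    #include <rte_flow.h>

    /* Verify the L4 header of the inner (encapsulated) part. */
    static const struct rte_flow_item_integrity integ_mask = {
        .level = 1,  /* non-zero: inner headers (assumed per this patch) */
        .l4_ok = 1,
    };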

Fixes: c55c2bf35333 ("net/mlx5/hws: add definer layer")

Signed-off-by: Alexander Kozyrev 
---
 drivers/net/mlx5/hws/mlx5dr_definer.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.c 
b/drivers/net/mlx5/hws/mlx5dr_definer.c
index 95b5d4b70e..600544c044 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.c
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.c
@@ -1716,7 +1716,6 @@ mlx5dr_definer_conv_item_integrity(struct 
mlx5dr_definer_conv_data *cd,
 {
const struct rte_flow_item_integrity *m = item->mask;
struct mlx5dr_definer_fc *fc;
-   bool inner = cd->tunnel;
 
if (!m)
return 0;
@@ -1727,7 +1726,7 @@ mlx5dr_definer_conv_item_integrity(struct 
mlx5dr_definer_conv_data *cd,
}
 
if (m->l3_ok || m->ipv4_csum_ok || m->l4_ok || m->l4_csum_ok) {
-   fc = &cd->fc[DR_CALC_FNAME(INTEGRITY, inner)];
+   fc = &cd->fc[DR_CALC_FNAME(INTEGRITY, m->level)];
fc->item_idx = item_idx;
fc->tag_set = &mlx5dr_definer_integrity_set;
DR_CALC_SET_HDR(fc, oks1, oks1_bits);
@@ -2282,8 +2281,7 @@ mlx5dr_definer_conv_items_to_hl(struct mlx5dr_context 
*ctx,
break;
case RTE_FLOW_ITEM_TYPE_INTEGRITY:
ret = mlx5dr_definer_conv_item_integrity(&cd, items, i);
-   item_flags |= cd.tunnel ? 
MLX5_FLOW_ITEM_INNER_INTEGRITY :
- 
MLX5_FLOW_ITEM_OUTER_INTEGRITY;
+   item_flags |= MLX5_FLOW_ITEM_INTEGRITY;
break;
case RTE_FLOW_ITEM_TYPE_CONNTRACK:
ret = mlx5dr_definer_conv_item_conntrack(&cd, items, i);
-- 
2.18.2



[PATCH v4 0/4] ptype matching support in mlx5

2023-10-25 Thread Alexander Kozyrev
Add support for RTE_FLOW_ITEM_TYPE_PTYPE in mlx5 PMD.

Alexander Kozyrev (3):
  net/mlx5: add support for ptype match in hardware steering
  net/mlx5/hws: add support for fragmented ptype match
  doc: add packet type matching item to release notes

Michael Baum (1):
  doc: add PMD ptype item limitations

 doc/guides/nics/features/mlx5.ini  |   1 +
 doc/guides/nics/mlx5.rst   |  15 ++
 doc/guides/rel_notes/release_23_11.rst |   5 +
 drivers/net/mlx5/hws/mlx5dr_definer.c  | 195 +
 drivers/net/mlx5/hws/mlx5dr_definer.h  |   9 ++
 drivers/net/mlx5/mlx5_flow.h   |   3 +
 drivers/net/mlx5/mlx5_flow_hw.c|   1 +
 7 files changed, 229 insertions(+)

-- 
2.18.2



[PATCH v4 1/4] net/mlx5: add support for ptype match in hardware steering

2023-10-25 Thread Alexander Kozyrev
Packet type matching provides a quick way of finding out the
L2/L3/L4 protocols of a given packet. That helps with
optimized flow rule matching, eliminating the need to
stack all the packet headers in the matching criteria.
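
A sketch of how an application could use the new item, e.g. to match IPv4/UDP
packets without stacking ETH/IPV4/UDP header items (flow creation boilerplate
omitted):

    #include <rte_flow.h>
    #include <rte_mbuf_ptype.h>

    static const struct rte_flow_item_ptype ptype_spec = {
        .packet_type = RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP,
    };
    static const struct rte_flow_item_ptype ptype_mask = {
        .packet_type = RTE_PTYPE_L3_MASK | RTE_PTYPE_L4_MASK,
    };

    static const struct rte_flow_item pattern[] = {
        {
            .type = RTE_FLOW_ITEM_TYPE_PTYPE,
            .spec = &ptype_spec,
            .mask = &ptype_mask,
        },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };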

Signed-off-by: Alexander Kozyrev 
---
 drivers/net/mlx5/hws/mlx5dr_definer.c | 161 ++
 drivers/net/mlx5/hws/mlx5dr_definer.h |   7 ++
 drivers/net/mlx5/mlx5_flow.h  |   3 +
 drivers/net/mlx5/mlx5_flow_hw.c   |   1 +
 4 files changed, 172 insertions(+)

diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.c 
b/drivers/net/mlx5/hws/mlx5dr_definer.c
index 95b5d4b70e..8d846984e7 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.c
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.c
@@ -16,11 +16,15 @@
 #define STE_NO_VLAN0x0
 #define STE_SVLAN  0x1
 #define STE_CVLAN  0x2
+#define STE_NO_L3  0x0
 #define STE_IPV4   0x1
 #define STE_IPV6   0x2
+#define STE_NO_L4  0x0
 #define STE_TCP0x1
 #define STE_UDP0x2
 #define STE_ICMP   0x3
+#define STE_NO_TUN 0x0
+#define STE_ESP0x3
 
 #define MLX5DR_DEFINER_QUOTA_BLOCK 0
 #define MLX5DR_DEFINER_QUOTA_PASS  2
@@ -277,6 +281,82 @@ mlx5dr_definer_conntrack_tag(struct mlx5dr_definer_fc *fc,
DR_SET(tag, reg_value, fc->byte_off, fc->bit_off, fc->bit_mask);
 }
 
+static void
+mlx5dr_definer_ptype_l2_set(struct mlx5dr_definer_fc *fc,
+   const void *item_spec,
+   uint8_t *tag)
+{
+   bool inner = (fc->fname == MLX5DR_DEFINER_FNAME_PTYPE_L2_I);
+   const struct rte_flow_item_ptype *v = item_spec;
+   uint32_t packet_type = v->packet_type &
+   (inner ? RTE_PTYPE_INNER_L2_MASK : RTE_PTYPE_L2_MASK);
+   uint8_t l2_type = STE_NO_VLAN;
+
+   if (packet_type == (inner ? RTE_PTYPE_INNER_L2_ETHER : 
RTE_PTYPE_L2_ETHER))
+   l2_type = STE_NO_VLAN;
+   else if (packet_type == (inner ? RTE_PTYPE_INNER_L2_ETHER_VLAN : 
RTE_PTYPE_L2_ETHER_VLAN))
+   l2_type = STE_CVLAN;
+   else if (packet_type == (inner ? RTE_PTYPE_INNER_L2_ETHER_QINQ : 
RTE_PTYPE_L2_ETHER_QINQ))
+   l2_type = STE_SVLAN;
+
+   DR_SET(tag, l2_type, fc->byte_off, fc->bit_off, fc->bit_mask);
+}
+
+static void
+mlx5dr_definer_ptype_l3_set(struct mlx5dr_definer_fc *fc,
+   const void *item_spec,
+   uint8_t *tag)
+{
+   bool inner = (fc->fname == MLX5DR_DEFINER_FNAME_PTYPE_L3_I);
+   const struct rte_flow_item_ptype *v = item_spec;
+   uint32_t packet_type = v->packet_type &
+   (inner ? RTE_PTYPE_INNER_L3_MASK : RTE_PTYPE_L3_MASK);
+   uint8_t l3_type = STE_NO_L3;
+
+   if (packet_type == (inner ? RTE_PTYPE_INNER_L3_IPV4 : 
RTE_PTYPE_L3_IPV4))
+   l3_type = STE_IPV4;
+   else if (packet_type == (inner ? RTE_PTYPE_INNER_L3_IPV6 : 
RTE_PTYPE_L3_IPV6))
+   l3_type = STE_IPV6;
+
+   DR_SET(tag, l3_type, fc->byte_off, fc->bit_off, fc->bit_mask);
+}
+
+static void
+mlx5dr_definer_ptype_l4_set(struct mlx5dr_definer_fc *fc,
+   const void *item_spec,
+   uint8_t *tag)
+{
+   bool inner = (fc->fname == MLX5DR_DEFINER_FNAME_PTYPE_L4_I);
+   const struct rte_flow_item_ptype *v = item_spec;
+   uint32_t packet_type = v->packet_type &
+   (inner ? RTE_PTYPE_INNER_L4_MASK : RTE_PTYPE_L4_MASK);
+   uint8_t l4_type = STE_NO_L4;
+
+   if (packet_type == (inner ? RTE_PTYPE_INNER_L4_TCP : RTE_PTYPE_L4_TCP))
+   l4_type = STE_TCP;
+   else if (packet_type == (inner ? RTE_PTYPE_INNER_L4_UDP : 
RTE_PTYPE_L4_UDP))
+   l4_type = STE_UDP;
+   else if (packet_type == (inner ? RTE_PTYPE_INNER_L4_ICMP : 
RTE_PTYPE_L4_ICMP))
+   l4_type = STE_ICMP;
+
+   DR_SET(tag, l4_type, fc->byte_off, fc->bit_off, fc->bit_mask);
+}
+
+static void
+mlx5dr_definer_ptype_tunnel_set(struct mlx5dr_definer_fc *fc,
+   const void *item_spec,
+   uint8_t *tag)
+{
+   const struct rte_flow_item_ptype *v = item_spec;
+   uint32_t packet_type = v->packet_type & RTE_PTYPE_TUNNEL_MASK;
+   uint8_t tun_type = STE_NO_TUN;
+
+   if (packet_type == RTE_PTYPE_TUNNEL_ESP)
+   tun_type = STE_ESP;
+
+   DR_SET(tag, tun_type, fc->byte_off, fc->bit_off, fc->bit_mask);
+}
+
 static void
 mlx5dr_definer_integrity_set(struct mlx5dr_definer_fc *fc,
 const void *item_spec,
@@ -1709,6 +1789,83 @@ mlx5dr_definer_conv_item_gre_key(struct 
mlx5dr_definer_conv_data *cd,
return 0;
 }
 
+static int
+mlx5dr_definer_conv_item_ptype(struct mlx5dr_definer_conv_data *cd,
+  struct rte_flow_item *item,
+  int item_idx)
+{
+   const struct rte_flow_item_ptype *m = item->mask;
+   struct mlx5dr_definer_fc *fc;
+
+   if (!m)
+

[PATCH v4 2/4] net/mlx5/hws: add support for fragmented ptype match

2023-10-25 Thread Alexander Kozyrev
Expand packet type matching with support for the
fragmented IP (Internet Protocol) packet type.

Signed-off-by: Alexander Kozyrev 
---
 drivers/net/mlx5/hws/mlx5dr_definer.c | 54 ++-
 drivers/net/mlx5/hws/mlx5dr_definer.h |  2 +
 2 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.c 
b/drivers/net/mlx5/hws/mlx5dr_definer.c
index 8d846984e7..0e1035c6bd 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.c
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.c
@@ -357,6 +357,19 @@ mlx5dr_definer_ptype_tunnel_set(struct mlx5dr_definer_fc 
*fc,
DR_SET(tag, tun_type, fc->byte_off, fc->bit_off, fc->bit_mask);
 }
 
+static void
+mlx5dr_definer_ptype_frag_set(struct mlx5dr_definer_fc *fc,
+ const void *item_spec,
+ uint8_t *tag)
+{
+   bool inner = (fc->fname == MLX5DR_DEFINER_FNAME_PTYPE_FRAG_I);
+   const struct rte_flow_item_ptype *v = item_spec;
+   uint32_t packet_type = v->packet_type &
+   (inner ? RTE_PTYPE_INNER_L4_FRAG : RTE_PTYPE_L4_FRAG);
+
+   DR_SET(tag, !!packet_type, fc->byte_off, fc->bit_off, fc->bit_mask);
+}
+
 static void
 mlx5dr_definer_integrity_set(struct mlx5dr_definer_fc *fc,
 const void *item_spec,
@@ -1840,19 +1853,40 @@ mlx5dr_definer_conv_item_ptype(struct 
mlx5dr_definer_conv_data *cd,
}
 
if (m->packet_type & RTE_PTYPE_L4_MASK) {
-   fc = &cd->fc[DR_CALC_FNAME(PTYPE_L4, false)];
-   fc->item_idx = item_idx;
-   fc->tag_set = &mlx5dr_definer_ptype_l4_set;
-   fc->tag_mask_set = &mlx5dr_definer_ones_set;
-   DR_CALC_SET(fc, eth_l2, l4_type, false);
+   /*
+* Fragmented IP (Internet Protocol) packet type.
+* Cannot be combined with Layer 4 Types (TCP/UDP).
+* The exact value must be specified in the mask.
+*/
+   if (m->packet_type == RTE_PTYPE_L4_FRAG) {
+   fc = &cd->fc[DR_CALC_FNAME(PTYPE_FRAG, false)];
+   fc->item_idx = item_idx;
+   fc->tag_set = &mlx5dr_definer_ptype_frag_set;
+   fc->tag_mask_set = &mlx5dr_definer_ones_set;
+   DR_CALC_SET(fc, eth_l2, ip_fragmented, false);
+   } else {
+   fc = &cd->fc[DR_CALC_FNAME(PTYPE_L4, false)];
+   fc->item_idx = item_idx;
+   fc->tag_set = &mlx5dr_definer_ptype_l4_set;
+   fc->tag_mask_set = &mlx5dr_definer_ones_set;
+   DR_CALC_SET(fc, eth_l2, l4_type, false);
+   }
}
 
if (m->packet_type & RTE_PTYPE_INNER_L4_MASK) {
-   fc = &cd->fc[DR_CALC_FNAME(PTYPE_L4, true)];
-   fc->item_idx = item_idx;
-   fc->tag_set = &mlx5dr_definer_ptype_l4_set;
-   fc->tag_mask_set = &mlx5dr_definer_ones_set;
-   DR_CALC_SET(fc, eth_l2, l4_type, true);
+   if (m->packet_type == RTE_PTYPE_INNER_L4_FRAG) {
+   fc = &cd->fc[DR_CALC_FNAME(PTYPE_FRAG, true)];
+   fc->item_idx = item_idx;
+   fc->tag_set = &mlx5dr_definer_ptype_frag_set;
+   fc->tag_mask_set = &mlx5dr_definer_ones_set;
+   DR_CALC_SET(fc, eth_l2, ip_fragmented, true);
+   } else {
+   fc = &cd->fc[DR_CALC_FNAME(PTYPE_L4, true)];
+   fc->item_idx = item_idx;
+   fc->tag_set = &mlx5dr_definer_ptype_l4_set;
+   fc->tag_mask_set = &mlx5dr_definer_ones_set;
+   DR_CALC_SET(fc, eth_l2, l4_type, true);
+   }
}
 
if (m->packet_type & RTE_PTYPE_TUNNEL_MASK) {
diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.h 
b/drivers/net/mlx5/hws/mlx5dr_definer.h
index ea07f55d52..791154a7dc 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.h
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.h
@@ -148,6 +148,8 @@ enum mlx5dr_definer_fname {
MLX5DR_DEFINER_FNAME_PTYPE_L4_O,
MLX5DR_DEFINER_FNAME_PTYPE_L4_I,
MLX5DR_DEFINER_FNAME_PTYPE_TUNNEL,
+   MLX5DR_DEFINER_FNAME_PTYPE_FRAG_O,
+   MLX5DR_DEFINER_FNAME_PTYPE_FRAG_I,
MLX5DR_DEFINER_FNAME_MAX,
 };
 
-- 
2.18.2



[PATCH v4 3/4] doc: add PMD ptype item limitations

2023-10-25 Thread Alexander Kozyrev
From: Michael Baum 

Add limitations for ptype item support in "mlx5.rst" file.

Signed-off-by: Michael Baum 
---
 doc/guides/nics/features/mlx5.ini |  1 +
 doc/guides/nics/mlx5.rst  | 15 +++
 2 files changed, 16 insertions(+)

diff --git a/doc/guides/nics/features/mlx5.ini 
b/doc/guides/nics/features/mlx5.ini
index fc67415c6c..e3927ab4df 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -86,6 +86,7 @@ nsh  = Y
 nvgre= Y
 port_id  = Y
 port_representor = Y
+ptype= Y
 quota= Y
 tag  = Y
 tcp  = Y
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 7086f3d1d4..c9e74948cc 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -646,6 +646,21 @@ Limitations
   - When using HWS flow engine (``dv_flow_en`` = 2),
 only meter mark action is supported.
 
+- Ptype:
+
+  - Only supports HW steering (``dv_flow_en=2``).
+  - The supported values are:
+L2: ``RTE_PTYPE_L2_ETHER``, ``RTE_PTYPE_L2_ETHER_VLAN``, 
``RTE_PTYPE_L2_ETHER_QINQ``
+L3: ``RTE_PTYPE_L3_IPV4``, ``RTE_PTYPE_L3_IPV6``
+L4: ``RTE_PTYPE_L4_TCP``, ``RTE_PTYPE_L4_UDP``, ``RTE_PTYPE_L4_ICMP``
+and their ``RTE_PTYPE_INNER_XXX`` counterparts as well as 
``RTE_PTYPE_TUNNEL_ESP``.
+Any other values are not supported. Using them as a value will cause 
unexpected behavior.
+  - Matching on both outer and inner IP fragmented is supported using 
``RTE_PTYPE_L4_FRAG`` and
+``RTE_PTYPE_INNER_L4_FRAG`` values. They are not part of L4 types, so they 
should be provided
+explicitly as a mask value during pattern template creation. Providing 
``RTE_PTYPE_L4_MASK``
+during pattern template creation and ``RTE_PTYPE_L4_FRAG`` during flow 
rule creation
+will cause unexpected behavior.
+
 - Integrity:
 
   - Integrity offload is enabled starting from **ConnectX-6 Dx**.
-- 
2.18.2
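
The fragmentation caveat above translates to code roughly as follows (a
sketch; the mask must name RTE_PTYPE_L4_FRAG explicitly at pattern template
creation, not RTE_PTYPE_L4_MASK):

    #include <rte_flow.h>
    #include <rte_mbuf_ptype.h>

    /* Match fragmented (outer) IP packets via ptype. */
    static const struct rte_flow_item_ptype frag_spec = {
        .packet_type = RTE_PTYPE_L4_FRAG,
    };
    static const struct rte_flow_item_ptype frag_mask = {
        .packet_type = RTE_PTYPE_L4_FRAG, /* exact value, per the limitation */
    };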



[PATCH v4 4/4] doc: add packet type matching item to release notes

2023-10-25 Thread Alexander Kozyrev
Document new RTE_FLOW_ITEM_TYPE_PTYPE in the release notes.

Signed-off-by: Alexander Kozyrev 
---
 doc/guides/rel_notes/release_23_11.rst | 5 +
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/release_23_11.rst 
b/doc/guides/rel_notes/release_23_11.rst
index 0a6fc76a9d..b94328b8a7 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -122,6 +122,10 @@ New Features
   a group's miss actions, which are the actions to be performed on packets
   that didn't match any of the flow rules in the group.
 
+* **Added ptype matching criteria.**
+  Added ``RTE_FLOW_ITEM_TYPE_PTYPE`` to allow matching on L2/L3/L4
+  and tunnel information as defined in the mbuf packet type.
+
 * **Updated Intel cpfl driver.**
 
   * Added support for port representor.
@@ -143,6 +147,7 @@ New Features
 * **Updated NVIDIA mlx5 net driver.**
 
   * Added support for Network Service Header (NSH) flow matching.
+  * Added support for ``RTE_FLOW_ITEM_TYPE_PTYPE`` flow item.
 
 * **Updated Solarflare net driver.**
 
-- 
2.18.2



Re: [PATCH v2 1/5] net/ena: hal upgrade

2023-10-25 Thread Ferruh Yigit
On 10/25/2023 2:36 PM, shaib...@amazon.com wrote:
> From: Shai Brandes 
> 
> ENA maintains a HAL that is shared by all supported host drivers.
> Main features introduced to the HAL:
> [1] Reworked the mechanism that queries the performance metrics
> from the device.
> [2] Added support for a new metric that allows monitoring the
> available tracked connections.
> [3] Added support for a new statistic that counts RX drops due
> to insufficient buffers provided by host.
> [4] Added support for Scalable Reliable Datagram (SRD) metrics
> from ENA Express.
> [5] Added support for querying the LLQ entry size recommendation
> from the device.
> [6] Added support for PTP hardware clock (PHC) feature that
> provides enhanced accuracy (Not supported by the driver).
> [7] Added support for new reset reasons for a suspected CPU
> starvation and for completion descriptor inconsistency.
> [8] Aligned all return error code to a common notation.
> [9] Removed an obsolete queue tail pointer update API.
> 
> Signed-off-by: Shai Brandes 
> Reviewed-by: Amit Bernstein 
> ---
>  doc/guides/rel_notes/release_23_11.rst|   4 +
>  drivers/net/ena/base/ena_com.c| 499 +++---
>  drivers/net/ena/base/ena_com.h| 197 ++-
>  .../net/ena/base/ena_defs/ena_admin_defs.h| 198 ++-
>  .../net/ena/base/ena_defs/ena_eth_io_defs.h   |  18 +-
>  drivers/net/ena/base/ena_defs/ena_gen_info.h  |   4 +-
>  drivers/net/ena/base/ena_defs/ena_regs_defs.h |  12 +
>  drivers/net/ena/base/ena_eth_com.c|  45 +-
>  drivers/net/ena/base/ena_eth_com.h|  30 +-
>  drivers/net/ena/base/ena_plat.h   |   8 +-
>  drivers/net/ena/base/ena_plat_dpdk.h  |  49 +-
>  drivers/net/ena/ena_ethdev.c  |  16 +-
>  12 files changed, 915 insertions(+), 165 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_23_11.rst 
> b/doc/guides/rel_notes/release_23_11.rst
> index 0a6fc76a9d..e3b0ba58c9 100644
> --- a/doc/guides/rel_notes/release_23_11.rst
> +++ b/doc/guides/rel_notes/release_23_11.rst
> @@ -144,6 +144,10 @@ New Features
>  
>* Added support for Network Service Header (NSH) flow matching.
>  
> +* **Updated Amazon Elastic Network Adapter ena net driver.**
> +
>

What do you think about clarifying it as:

Updated Amazon ena (Elastic Network Adapter) net driver.


> +  * Upgraded ENA HAL to latest version.
> +
>

Updates should be ordered by vendor within the driver group, so can you
please sort this entry under 'Amazon'? With the current release notes, it
should come before the Intel driver updates.



>  * **Updated Solarflare net driver.**
>  
>* Added support for transfer flow action ``INDIRECT`` with subtype 
> ``VXLAN_ENCAP``.
> diff --git a/drivers/net/ena/base/ena_com.c b/drivers/net/ena/base/ena_com.c
> index 5ca36ab6d9..880d047956 100644
> --- a/drivers/net/ena/base/ena_com.c
> +++ b/drivers/net/ena/base/ena_com.c
> @@ -38,6 +38,12 @@
>  
>  #define ENA_MAX_ADMIN_POLL_US 5000
>  
> +/* PHC definitions */
> +#define ENA_PHC_DEFAULT_EXPIRE_TIMEOUT_USEC 20
> +#define ENA_PHC_DEFAULT_BLOCK_TIMEOUT_USEC 1000
> +#define ENA_PHC_TIMESTAMP_ERROR 0x
> +#define ENA_PHC_REQ_ID_OFFSET 0xDEAD
> +
>  
> /*/
>  
> /*/
>  
> /*/
> @@ -70,7 +76,7 @@ static int ena_com_mem_addr_set(struct ena_com_dev *ena_dev,
>  dma_addr_t addr)
>  {
>   if ((addr & GENMASK_ULL(ena_dev->dma_addr_bits - 1, 0)) != addr) {
> - ena_trc_err(ena_dev, "DMA address has more bits than the device 
> supports\n");
> + ena_trc_err(ena_dev, "DMA address has more bits that the device 
> supports\n");
>

Original wording looks better to me, 'than' instead of 'that', but we
can get another opinion from a native English speaker.

<...>

> diff --git a/drivers/net/ena/base/ena_plat.h b/drivers/net/ena/base/ena_plat.h
> index 2583823080..a3649e0cb6 100644
> --- a/drivers/net/ena/base/ena_plat.h
> +++ b/drivers/net/ena/base/ena_plat.h
> @@ -14,14 +14,16 @@
>  #else
>  #include 
>  #endif
> +#elif defined(_WIN32)
> +#include 
>  #elif defined(__FreeBSD__)
> -#if defined(_KERNEL)
> +#if defined(__KERNEL__)
>  #include 
>  #else
>  #include 
>  #endif
> -#elif defined(_WIN32)
> -#include 
> +#elif defined(__APPLE__)
> +#include 
>  #else
>  #error "Invalid platform"
>  #endif
>

As far as I can see only ena_plat_dpdk.h exists; the other ena_plat_*.h
files do not exist at all in the dpdk driver. Would it be possible to strip
those lines from the dpdk version of ena_plat.h, to avoid confusion about
what is supported?

If this creates additional maintenance burden, OK to continue as it is.
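
In other words, the dpdk copy could hypothetically shrink to something like
this (a sketch of the suggestion, not a proposed patch):

    /* ena_plat.h, dpdk-only variant (hypothetical) */
    #ifndef ENA_PLAT_H_
    #define ENA_PLAT_H_

    #include "ena_plat_dpdk.h"

    #endif /* ENA_PLAT_H_ */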


<...>

> @@ -107,6 +110,7 @@ extern int ena_logtype_com;
>  #define BITS_PER_LONG_LONG (__SIZEOF_LONG_LONG__ * 8)
>  #defi

Re: [PATCH v2] eal/unix: allow creating thread with real-time priority

2023-10-25 Thread Stephen Hemminger
On Wed, 25 Oct 2023 19:54:06 +0200
Morten Brørup  wrote:

> I agree with Thomas on this.
> 
> If you want the log message, please degrade it to INFO or DEBUG level. It is 
> only relevant when chasing problems, not for normal production - and thus 
> NOTICE is too high.

I don't want the message to be hidden.
If we get any bug reports, I want to be able to say "read the log, don't do that".

> Someone might build a kernel with options to keep non-dataplane threads off 
> some dedicated CPU cores, so they can be used for guaranteed low-latency 
> dataplane threads. We do. We don't use real-time priority, though.

This is really hard to do. Isolated CPUs are not isolated from interrupts and 
other sources which end up scheduling work as kernel threads. Plus there is the 
behavior where the kernel decides to turn a soft irq into a kernel thread, which 
then gets starved. Under starvation, disk corruption is likely if interrupts 
never get processed :-(

> For reference, we did some experiments (using this custom built kernel) with 
> a dedicated thread doing nothing but a loop calling rte_rdtsc_precise() and 
> registering the delta. Although the overwhelming majority is ca. CPU 80 
> cycles, there are some big outliers at ca. 9,000 CPU cycles. (Order of 
> magnitude: ca. 45 of these big outliers per minute.) Apparently some kernel 
> threads steal some cycles from this thread, regardless of our customizations. 
> We haven't bothered analyzing and optimizing it further.

Was this on an isolated CPU?
Did you check that the CPU was excluded from the smp_affinity mask on all 
devices?
Did you enable the kernel feature to avoid clock ticks when a CPU is dedicated?
Same thing for RCU: do its parameters need adjusting?

Also, on many systems there can be hidden SMI (BIOS) execution that will cause 
big outliers.

Lastly, never try to use CPU 0. The kernel uses CPU 0 as a catch-all in lots of 
places.
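
For reference, the knobs in this checklist map to kernel boot parameters along
these lines (illustrative core lists, assuming cores 2-7 are the dedicated
dataplane cores):

    isolcpus=2-7 nohz_full=2-7 rcu_nocbs=2-7 irqaffinity=0-1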

> I think our experiment supports the need to allow kernel threads to run, e.g. 
> by calling sleep() or similar, when an EAL thread has real-time priority.



Re: [PATCH v2 09/19] rcu: use rte optional stdatomic API

2023-10-25 Thread Tyler Retzlaff
On Wed, Oct 25, 2023 at 09:41:22AM +, Ruifeng Wang wrote:
> > -Original Message-
> > From: Tyler Retzlaff 
> > Sent: Wednesday, October 18, 2023 4:31 AM
> > To: dev@dpdk.org
> > Cc: Akhil Goyal ; Anatoly Burakov 
> > ; Andrew
> > Rybchenko ; Bruce Richardson 
> > ;
> > Chenbo Xia ; Ciara Power ; 
> > David Christensen
> > ; David Hunt ; Dmitry Kozlyuk
> > ; Dmitry Malloy ; Elena 
> > Agostini
> > ; Erik Gabriel Carrillo ; 
> > Fan Zhang
> > ; Ferruh Yigit ; Harman Kalra
> > ; Harry van Haaren ; 
> > Honnappa Nagarahalli
> > ; jer...@marvell.com; Konstantin Ananyev
> > ; Matan Azrad ; Maxime 
> > Coquelin
> > ; Narcisa Ana Maria Vasile 
> > ;
> > Nicolas Chautru ; Olivier Matz 
> > ; Ori
> > Kam ; Pallavi Kadam ; Pavan 
> > Nikhilesh
> > ; Reshma Pattan ; Sameh 
> > Gobriel
> > ; Shijith Thotton ; 
> > Sivaprasad Tummala
> > ; Stephen Hemminger 
> > ; Suanming Mou
> > ; Sunil Kumar Kori ; 
> > tho...@monjalon.net;
> > Viacheslav Ovsiienko ; Vladimir Medvedkin
> > ; Yipeng Wang ; Tyler 
> > Retzlaff
> > 
> > Subject: [PATCH v2 09/19] rcu: use rte optional stdatomic API
> > 
> > Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding 
> > rte_atomic_xxx
> > optional stdatomic API
> > 
> > Signed-off-by: Tyler Retzlaff 
> > ---
> >  lib/rcu/rte_rcu_qsbr.c | 48 +--
> >  lib/rcu/rte_rcu_qsbr.h | 68 
> > +-
> >  2 files changed, 58 insertions(+), 58 deletions(-)
> > 
> > diff --git a/lib/rcu/rte_rcu_qsbr.c b/lib/rcu/rte_rcu_qsbr.c index 
> > 17be93e..4dc7714 100644
> > --- a/lib/rcu/rte_rcu_qsbr.c
> > +++ b/lib/rcu/rte_rcu_qsbr.c
> > @@ -102,21 +102,21 @@
> >  * go out of sync. Hence, additional checks are required.
> >  */
> > /* Check if the thread is already registered */
> > -   old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > -   __ATOMIC_RELAXED);
> > +   old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > +   rte_memory_order_relaxed);
> > if (old_bmap & 1UL << id)
> > return 0;
> > 
> > do {
> > new_bmap = old_bmap | (1UL << id);
> > -   success = __atomic_compare_exchange(
> > +   success = rte_atomic_compare_exchange_strong_explicit(
> > __RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > -   &old_bmap, &new_bmap, 0,
> > -   __ATOMIC_RELEASE, __ATOMIC_RELAXED);
> > +   &old_bmap, new_bmap,
> > +   rte_memory_order_release, 
> > rte_memory_order_relaxed);
> > 
> > if (success)
> > -   __atomic_fetch_add(&v->num_threads,
> > -   1, __ATOMIC_RELAXED);
> > +   rte_atomic_fetch_add_explicit(&v->num_threads,
> > +   1, rte_memory_order_relaxed);
> > else if (old_bmap & (1UL << id))
> > /* Someone else registered this thread.
> >  * Counter should not be incremented.
> > @@ -154,8 +154,8 @@
> >  * go out of sync. Hence, additional checks are required.
> >  */
> > /* Check if the thread is already unregistered */
> > -   old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > -   __ATOMIC_RELAXED);
> > +   old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > +   rte_memory_order_relaxed);
> > if (!(old_bmap & (1UL << id)))
> > return 0;
> > 
> > @@ -165,14 +165,14 @@
> >  * completed before removal of the thread from the list of
> >  * reporting threads.
> >  */
> > -   success = __atomic_compare_exchange(
> > +   success = rte_atomic_compare_exchange_strong_explicit(
> > __RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > -   &old_bmap, &new_bmap, 0,
> > -   __ATOMIC_RELEASE, __ATOMIC_RELAXED);
> > +   &old_bmap, new_bmap,
> > +   rte_memory_order_release, 
> > rte_memory_order_relaxed);
> > 
> > if (success)
> > -   __atomic_fetch_sub(&v->num_threads,
> > -   1, __ATOMIC_RELAXED);
> > +   rte_atomic_fetch_sub_explicit(&v->num_threads,
> > +   1, rte_memory_order_relaxed);
> > else if (!(old_bmap & (1UL << id)))
> > /* Someone else unregistered this thread.
> >  * Counter should not be incremented.
> > @@ -227,8 +227,8 @@
> > 
> > fprintf(f, "  Registered thread IDs = ");
> > for (i = 0; i < v->num_elems; i++) {
> > -
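
For readers skimming the series, the conversion pattern is mechanical; a
minimal self-contained sketch of the before/after (variable names invented
for illustration):

    #include <stdint.h>
    #include <rte_stdatomic.h>

    static RTE_ATOMIC(uint64_t) token;

    static uint64_t
    load_token(void)
    {
        /* was: __atomic_load_n(&token, __ATOMIC_ACQUIRE); */
        return rte_atomic_load_explicit(&token, rte_memory_order_acquire);
    }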

Re: [PATCH v2 19/19] ring: use rte optional stdatomic API

2023-10-25 Thread Tyler Retzlaff
On Wed, Oct 25, 2023 at 10:06:23AM +, Konstantin Ananyev wrote:
> 
> 
> > 
> > On Tue, Oct 24, 2023 at 09:43:13AM +0100, Konstantin Ananyev wrote:
> > > 17.10.2023 21:31, Tyler Retzlaff пишет:
> > > >Replace the use of gcc builtin __atomic_xxx intrinsics with
> > > >corresponding rte_atomic_xxx optional stdatomic API
> > > >
> > > >Signed-off-by: Tyler Retzlaff 
> > > >---
> > > >  drivers/net/mlx5/mlx5_hws_cnt.h   |  2 +-
> > > >  lib/ring/rte_ring_c11_pvt.h   | 33 
> > > > +
> > > >  lib/ring/rte_ring_core.h  | 10 +-
> > > >  lib/ring/rte_ring_generic_pvt.h   |  3 ++-
> > > >  lib/ring/rte_ring_hts_elem_pvt.h  | 22 --
> > > >  lib/ring/rte_ring_peek_elem_pvt.h |  6 +++---
> > > >  lib/ring/rte_ring_rts_elem_pvt.h  | 27 ++-
> > > >  7 files changed, 54 insertions(+), 49 deletions(-)
> > > >
> > > >diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h 
> > > >b/drivers/net/mlx5/mlx5_hws_cnt.h
> > > >index f462665..cc9ac10 100644
> > > >--- a/drivers/net/mlx5/mlx5_hws_cnt.h
> > > >+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
> > > >@@ -394,7 +394,7 @@ struct mlx5_hws_age_param {
> > > > __rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
> > > > &zcd->ptr1, &zcd->n1, &zcd->ptr2);
> > > > /* Update tail */
> > > >-__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
> > > >+rte_atomic_store_explicit(&r->prod.tail, revert2head, 
> > > >rte_memory_order_release);
> > > > return n;
> > > >  }
> > > >diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> > > >index f895950..f8be538 100644
> > > >--- a/lib/ring/rte_ring_c11_pvt.h
> > > >+++ b/lib/ring/rte_ring_c11_pvt.h
> > > >@@ -22,9 +22,10 @@
> > > >  * we need to wait for them to complete
> > > >  */
> > > > if (!single)
> > > >-rte_wait_until_equal_32(&ht->tail, old_val, 
> > > >__ATOMIC_RELAXED);
> > > >+rte_wait_until_equal_32((volatile uint32_t 
> > > >*)(uintptr_t)&ht->tail, old_val,
> > > >+rte_memory_order_relaxed);
> > > >-__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> > > >+rte_atomic_store_explicit(&ht->tail, new_val, 
> > > >rte_memory_order_release);
> > > >  }
> > > >  /**
> > > >@@ -61,19 +62,19 @@
> > > > unsigned int max = n;
> > > > int success;
> > > >-*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
> > > >+*old_head = rte_atomic_load_explicit(&r->prod.head, 
> > > >rte_memory_order_relaxed);
> > > > do {
> > > > /* Reset n to the initial burst count */
> > > > n = max;
> > > > /* Ensure the head is read before tail */
> > > >-__atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > >+__atomic_thread_fence(rte_memory_order_acquire);
> > > > /* load-acquire synchronize with store-release of 
> > > > ht->tail
> > > >  * in update_tail.
> > > >  */
> > > >-cons_tail = __atomic_load_n(&r->cons.tail,
> > > >-__ATOMIC_ACQUIRE);
> > > >+cons_tail = rte_atomic_load_explicit(&r->cons.tail,
> > > >+rte_memory_order_acquire);
> > > > /* The subtraction is done between two unsigned 32bits 
> > > > value
> > > >  * (the result is always modulo 32 bits even if we have
> > > >@@ -95,10 +96,10 @@
> > > > r->prod.head = *new_head, success = 1;
> > > > else
> > > > /* on failure, *old_head is updated */
> > > >-success = 
> > > >__atomic_compare_exchange_n(&r->prod.head,
> > > >+success = 
> > > >rte_atomic_compare_exchange_strong_explicit(&r->prod.head,
> > > > old_head, *new_head,
> > > >-0, __ATOMIC_RELAXED,
> > > >-__ATOMIC_RELAXED);
> > > >+rte_memory_order_relaxed,
> > > >+rte_memory_order_relaxed);
> > > > } while (unlikely(success == 0));
> > > > return n;
> > > >  }
> > > >@@ -137,19 +138,19 @@
> > > > int success;
> > > > /* move cons.head atomically */
> > > >-*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
> > > >+*old_head = rte_atomic_load_explicit(&r->cons.head, 
> > > >rte_memory_order_relaxed);
> > > > do {
> > > > /* Restore n as it may change every loop */
> > > > n = max;
> > > > /* Ensure the head is read before tail */
> > > >-__atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > >+__atomic_th
