Re: [PATCH v3 00/16] stop using variadic argument pack extension

2024-02-29 Thread David Marchand
On Wed, Feb 28, 2024 at 6:59 PM Tyler Retzlaff wrote:
> > I find this new helper less tricky to use and easier to read than the
> > RTE_FMT_* stuff that gets copy/pasted everywhere.
> > The changes are quite mechanical, so even though we are past -rc1, +1
> > for me on the series.
> >
> > Can we finish the job and convert remaining macros that prefix messages in 
> > lib/?
>
> I didn't realize I missed any. Do you have a list or a regex that points
> me at them? I was just searching for uses of args...
>
> Happy to make the conversion of the others in the next rev.

Basically, this new macro/approach makes direct use of RTE_FMT_HEAD unneeded.

So I grepped like this:
$ git grep RTE_FMT_HEAD -- lib/ :^lib/log/ :^lib/eal/include/rte_common.h
b55361f252:lib/cryptodev/rte_cryptodev.h:   RTE_FMT("%s() line %u: " RTE_FMT_HEAD(__VA_ARGS__ ,), \
b55361f252:lib/cryptodev/rte_cryptodev.h:   RTE_FMT("%s() line %u: " RTE_FMT_HEAD(__VA_ARGS__ ,), \
b55361f252:lib/cryptodev/rte_cryptodev.h:   RTE_FMT("[%s] %s: " RTE_FMT_HEAD(__VA_ARGS__ ,), \
b55361f252:lib/eal/windows/include/rte_windows.h:   RTE_FMT_HEAD(__VA_ARGS__ ,), GetLastError(), \
b55361f252:lib/eventdev/eventdev_pmd.h: RTE_FMT("%s() line %u: " RTE_FMT_HEAD(__VA_ARGS__ ,), \
b55361f252:lib/eventdev/eventdev_pmd.h: RTE_FMT("%s() line %u: " RTE_FMT_HEAD(__VA_ARGS__ ,), \
b55361f252:lib/eventdev/rte_event_timer_adapter.c:  RTE_FMT("EVTIMER: %s() line %u: " RTE_FMT_HEAD(__VA_ARGS__ ,), \
b55361f252:lib/graph/graph_private.h:   RTE_FMT("%s():%u " RTE_FMT_HEAD(__VA_ARGS__ ,),\
b55361f252:lib/member/member.h: RTE_FMT("%s(): " RTE_FMT_HEAD(__VA_ARGS__ ,), \
b55361f252:lib/node/node_private.h: RTE_FMT("%s: %s():%u " RTE_FMT_HEAD(__VA_ARGS__ ,),\
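
For reference, the conversion is mechanical. A sketch based on the
cryptodev macro above (RTE_LOG_LINE_PREFIX and RTE_LOG_COMMA are the
helper names assumed from this series; the exact per-library form may
differ):

/* Before: open-coded RTE_FMT_HEAD/RTE_FMT_TAIL plumbing. */
#define CDEV_LOG_ERR(...) \
	RTE_LOG(ERR, CRYPTODEV, \
		RTE_FMT("%s() line %u: " RTE_FMT_HEAD(__VA_ARGS__ ,) "\n", \
			__func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__ ,)))

/* After: the helper splits the format and appends the newline itself. */
#define CDEV_LOG_ERR(...) \
	RTE_LOG_LINE_PREFIX(ERR, CRYPTODEV, "%s() line %u: ", \
		__func__ RTE_LOG_COMMA __LINE__, __VA_ARGS__)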


-- 
David Marchand



RE: [PATCH 00/11] net/mlx5: flow insertion performance improvements

2024-02-29 Thread Ori Kam
Hi Dariusz,

> -Original Message-
> From: Dariusz Sosnowski 
> Sent: Wednesday, February 28, 2024 7:01 PM
> 
> Goal of this patchset is to improve the throughput of flow insertion
> and deletion in mlx5 PMD when HW Steering flow engine is used.
> 
> - Patch 1 - Use preallocated per-queue, per-actions template buffer
>   for storing translated flow actions, instead of allocating and
>   filling it on demand, on each flow operation.
> - Patches 2-4 - Make resource index allocation optional. This allocation
>   will be skipped when it is not required by the created template table.
> - Patches 5-7 - Reduce memory footprint of the internal flow queue.
> - Patch 8 - Remove indirection between flow job and flow itself,
>   by using flow as an operation container.
> - Patches 9-10 - Reduce memory footprint of flow struct by moving
>   rarely used flow fields outside of the main flow struct.
>   These fields will be accessed only when needed.
>   Also remove unneeded `zmalloc` usage.
> - Patch 11 - Remove unneeded device status check in flow create.
> 
> In general all of these changes result in the following improvements
> (all numbers are averaged Kflows/sec):
> 
> |  | Insertion  |   +%   | Deletion |   +%  |
> |--|:--:|:--:|::|:-:|
> | baseline |   6338.7   ||  9739.6  |   |
> | improvements |   6978.8   | +10.1% |  10432.4 | +7.1% |
> 
> The basic benchmark was run on ConnectX-6 Dx (22.40.1000),
> on the system with Intel Xeon Platinum 8380 CPU.
> 
> Bing Zhao (2):
>   net/mlx5: skip the unneeded resource index allocation
>   net/mlx5: remove unneeded device status checking
> 
> Dariusz Sosnowski (7):
>   net/mlx5: allocate local DR rule action buffers
>   net/mlx5: remove action params from job
>   net/mlx5: remove flow pattern from job
>   net/mlx5: remove updated flow from job
>   net/mlx5: use flow as operation container
>   net/mlx5: move rarely used flow fields outside
>   net/mlx5: reuse flow fields
> 
> Erez Shitrit (2):
>   net/mlx5/hws: add check for matcher rule update support
>   net/mlx5/hws: add check if matcher contains complex rules
> 
>  drivers/net/mlx5/hws/mlx5dr.h |  16 +
>  drivers/net/mlx5/hws/mlx5dr_action.c  |   6 +
>  drivers/net/mlx5/hws/mlx5dr_action.h  |   2 +
>  drivers/net/mlx5/hws/mlx5dr_matcher.c |  29 +
>  drivers/net/mlx5/mlx5.h   |  29 +-
>  drivers/net/mlx5/mlx5_flow.h  | 128 -
>  drivers/net/mlx5/mlx5_flow_hw.c   | 794 --
>  7 files changed, 666 insertions(+), 338 deletions(-)
> 
> --
> 2.39.2

Series-acked-by: Ori Kam
Best,
Ori



RE: release candidate 24.03-rc1

2024-02-29 Thread Xu, HailinX
> -Original Message-
> From: Thomas Monjalon 
> Sent: Thursday, February 22, 2024 3:36 PM
> To: annou...@dpdk.org
> Subject: release candidate 24.03-rc1
> 
> A new DPDK release candidate is ready for testing:
>   https://git.dpdk.org/dpdk/tag/?id=v24.03-rc1
> 
> There are 521 new patches in this snapshot.
> 
> Release notes:
>   https://doc.dpdk.org/guides/rel_notes/release_24_03.html
> 
> Highlights of 24.03-rc1:
>   - argument parsing library
>   - dynamic logging standardized
>   - HiSilicon UACCE bus
>   - Tx queue query
>   - flow matching with random and field comparison
>   - flow action NAT64
>   - more cleanups to prepare MSVC build
> 
> Please test and report issues on bugs.dpdk.org.
> 
> DPDK 24.03-rc2 will be out as soon as possible.
> Priority is on features announced in the roadmap:
>   https://core.dpdk.org/roadmap/
> 
> Thank you everyone
> 
Update on the test status for the Intel part: all DPDK 24.03-rc1 testing is
done. Three new issues were found.

New issues:
1. Bug 1386 - [dpdk-24.03] [ABI][meson test] driver-tests/link_bonding_autotest
test failed: Segmentation fault when do ABI testing -> not fixed yet
2. Bug 1387 - [dpdk24.03] cbdma: Failed to launch dpdk-dma app -> fix patch
available
3. dcf_lifecycle/test_one_testpmd_dcf_reset_port: failed to manually reset the
VF when using DCF -> under investigation by Intel developers

# Basic Intel(R) NIC testing
* Build or compile:
 * Build: covers the build test combinations with the latest GCC/Clang
versions and popular OS revisions such as Ubuntu 23.10, Ubuntu 22.04.3,
Fedora 39, RHEL 8.9/9.2, CentOS 7.9, FreeBSD 14.0, SUSE 15, OpenAnolis 8.8,
CBL-Mariner 2.0, etc.
  - All tests passed.
 * Compile: covers the CFLAGS (O0/O1/O2/O3) with popular OSes such as
Ubuntu 22.04.3 and RHEL 9.2.
  - All tests passed with the latest DPDK.
* PF/VF (i40e, ixgbe): test scenarios including
PF/VF RTE_FLOW/TSO/jumbo frame/checksum offload/VLAN/VXLAN, etc.
- All test cases are done. No new issue was found.
* PF/VF (ice): test scenarios including Switch features/Package Management/Flow
Director/Advanced Tx/Advanced RSS/ACL/DCF/Flexible Descriptor, etc.
- Execution is done. The third issue above was found.
* Intel NIC single core/NIC performance: test scenarios including PF/VF single
core performance test, RFC2544 zero packet loss performance test, etc.
- Execution is done. No new issue was found.
* Power and IPsec:
 * Power: test scenarios including bi-direction/Telemetry/Empty Poll
Lib/Priority Base Frequency, etc.
- Execution is done. No new issue was found.
 * IPsec: test scenarios including ipsec/ipsec-gw/ipsec library basic test -
QAT&SW/FIB library, etc.
- Execution is done. No new issue was found.
# Basic cryptodev and virtio testing
* Virtio: both functional and performance tests are covered, such as
PVP/Virtio_loopback/virtio-user loopback/virtio-net VM2VM perf testing/VMware
ESXi 8.0U1, etc.
- Execution is done. The second issue above was found.
* Cryptodev:
 * Function test: test scenarios including Cryptodev API testing/CompressDev
ISA-L/QAT/ZLIB PMD Testing/FIPS, etc.
- Execution is done. No new issue was found.
 * Performance test: test scenarios including Throughput Performance/Cryptodev
Latency, etc.
- Execution is done. No performance drop.


Regards,
Xu, Hailin


Re: [PATCH] net/hns3: fix Rx packet truncation when KEEP CRC enabled

2024-02-29 Thread Ferruh Yigit
On 2/29/2024 3:58 AM, huangdengdui wrote:
> 
> 
> On 2024/2/28 21:07, Ferruh Yigit wrote:
>> On 2/28/2024 2:27 AM, huangdengdui wrote:
>>>
>>>
>>> On 2024/2/27 0:43, Ferruh Yigit wrote:
 On 2/26/2024 3:16 AM, Jie Hai wrote:
> On 2024/2/23 21:53, Ferruh Yigit wrote:
>> On 2/20/2024 3:58 AM, Jie Hai wrote:
>>> Hi, Ferruh,
>>>
>>> Thanks for your review.
>>>
>>> On 2024/2/7 22:15, Ferruh Yigit wrote:
 On 2/6/2024 1:10 AM, Jie Hai wrote:
> From: Dengdui Huang 
>
> When KEEP_CRC offload is enabled, some packets will be truncated and
> the CRC is still stripped in the following cases:
> 1. For HIP08 hardware, the packet type is TCP and the length
>  is less than or equal to 60B.
> 2. For other hardware, the packet type is IP and the length
>  is less than or equal to 60B.
>

 If a device doesn't support the offload for some packets, an option
 is to disable the offload for that device, instead of calculating it
 in software and appending it.
>>>
>>> The KEEP CRC feature of hns3 is faulty only in the specific packet
>>> type and small packet (<60B) cases.
>>> What's more, small Ethernet packets are not common.
>>>
 Unless you have a specific use case or requirement to support the
 offload.
>>>
>>> Yes, some users of hns3 are already using this feature,
>>> so we cannot drop this offload.
>>>
 <...>

> @@ -2492,10 +2544,16 @@ hns3_recv_pkts_simple(void *rx_queue,
>  			goto pkt_err;
>  		rxm->packet_type = hns3_rx_calc_ptype(rxq, l234_info, ol_info);
> -
>  		if (rxm->packet_type == RTE_PTYPE_L2_ETHER_TIMESYNC)
>  			rxm->ol_flags |= RTE_MBUF_F_RX_IEEE1588_PTP;
> +		if (unlikely(rxq->crc_len > 0)) {
> +			if (hns3_need_recalculate_crc(rxq, rxm))
> +				hns3_recalculate_crc(rxq, rxm);
> +			rxm->pkt_len -= rxq->crc_len;
> +			rxm->data_len -= rxq->crc_len;
>

 Removing 'crc_len' from 'mbuf->pkt_len' & 'mbuf->data_len' is
 practically the same as stripping the CRC.

 We don't count CRC length in the statistics, but it should be
 accessible
 in the payload by the user.
>>> Our drivers are behaving exactly as you say.
>>>
>>
>> If so, I missed why mbuf 'pkt_len' and 'data_len' are reduced by
>> 'rxq->crc_len'; can you please explain what the above lines do?
>>
>>
> @@ -2470,8 +2523,7 @@ hns3_recv_pkts_simple(void *rx_queue,
>  		rxdp->rx.bd_base_info = 0;
>
>  		rxm->data_off = RTE_PKTMBUF_HEADROOM;
> -		rxm->pkt_len = (uint16_t)(rte_le_to_cpu_16(rxd.rx.pkt_len)) -
> -			       rxq->crc_len;
> +		rxm->pkt_len = rte_le_to_cpu_16(rxd.rx.pkt_len);
>
> In the previous code above, the 'pkt_len' is set to the length obtained
> from the BD. The length obtained from the BD already contains the CRC
> length. But as you said above, DPDK requires that the length of the mbuf
> does not contain the CRC length. So we subtract 'rxq->crc_len' from the
> mbuf's 'pkt_len' and 'data_len'. This patch doesn't change the logic, it
> just moves the code around.
>

 Nope, I am not saying the mbuf length shouldn't contain the CRC length;
 indeed it is the other way around, and this is our confusion.

 The CRC length shouldn't be in the statistics, I mean in the received
 bytes stats. Assume that the received packet is 128 bytes and we know
 it has the CRC: the Rx received bytes stat should be 124
 (rx_bytes = 128 - CRC = 124).

 But mbuf->data_len & mbuf->pkt_len should have the full frame length,
 including the CRC.

 As the application explicitly requested to KEEP CRC, it will know the
 last 4 bytes are the CRC.
 Anything after 'mbuf->data_len' in the mbuf buffer is not valid, so if
 you reduce 'mbuf->data_len' by the CRC size, the application can't know
 whether the 4 bytes after 'mbuf->data_len' are a valid CRC or not.
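
To illustrate with a minimal sketch (not the hns3 code; 'rx_bytes' stands
in for the driver's Rx byte counter), for a 128-byte frame received with
KEEP_CRC enabled:

#include <rte_ether.h>
#include <rte_mbuf.h>

/* frame_len comes from the BD and includes the 4-byte CRC. */
static void
keep_crc_rx_sketch(struct rte_mbuf *rxm, uint16_t frame_len,
		   uint64_t *rx_bytes)
{
	rxm->pkt_len = frame_len;                   /* mbuf keeps the CRC: 128 */
	rxm->data_len = frame_len;
	*rx_bytes += frame_len - RTE_ETHER_CRC_LEN; /* stats exclude it: 124 */
}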

>>> I agree with you.
>>>
>>> But the implementation in other PMDs that support KEEP_CRC is like this.
>>> In addition, there are probably many users already relying on it.
>>> If we modify it, it may break application compatibility.
>>>
>>> What do you think?
>>>
>> This is documented in the ethdev [1]; it is better to follow the
>> documentation for all PMDs. Can you please highlight the relevant driver
>> code? We can discuss it with their maintainers.
>>
>> Alternatively, we can document this additionally in the KEEP_CRC feature
>> documentation if it helps the applications.
>>
>>
>> [1]
>> https://git.dpdk.org/dpdk/tree/lib/ethdev/rte_ethdev.h?h=v23.11#n257
> 
> Currently, this documentation does not describe whether pkt_len and
> data_len should contain crc_len.
> 

I think it is

[PATCH v3 1/3] common/qat: isolate parser arguments configuration

2024-02-29 Thread Arkadiusz Kusztal
This commit isolates QAT device arguments from the common
code. Arguments are now defined per service, and only appear
in the application if the service is compiled in.

Signed-off-by: Arkadiusz Kusztal 
---
 drivers/common/qat/qat_common.c |  11 +++
 drivers/common/qat/qat_common.h |   3 +
 drivers/common/qat/qat_device.c | 179 ++--
 drivers/common/qat/qat_device.h |  27 ++
 drivers/compress/qat/qat_comp_pmd.c |  31 +--
 drivers/compress/qat/qat_comp_pmd.h |   3 +-
 drivers/crypto/qat/qat_asym.c   |  38 +---
 drivers/crypto/qat/qat_sym.c|  49 ++
 8 files changed, 188 insertions(+), 153 deletions(-)

diff --git a/drivers/common/qat/qat_common.c b/drivers/common/qat/qat_common.c
index 59e7e02622..61bc97b0f3 100644
--- a/drivers/common/qat/qat_common.c
+++ b/drivers/common/qat/qat_common.c
@@ -6,6 +6,17 @@
 #include "qat_device.h"
 #include "qat_logs.h"
 
+#define QAT_LEGACY_CAPA "qat_legacy_capa"
+
+static const char *const arguments[] = {
+   QAT_LEGACY_CAPA,
+   NULL
+};
+
+const char *const *qat_cmdline_defines[QAT_MAX_SERVICES + 1] = {
+   [QAT_MAX_SERVICES] = arguments,
+};
+
 const char *
 qat_service_get_str(enum qat_service_type type)
 {
diff --git a/drivers/common/qat/qat_common.h b/drivers/common/qat/qat_common.h
index 53799ce174..7425600506 100644
--- a/drivers/common/qat/qat_common.h
+++ b/drivers/common/qat/qat_common.h
@@ -16,6 +16,9 @@
  * from one according to the generation of the device.
  * QAT_GEN* is used as the index to find all devices
  */
+
+extern const char *const *qat_cmdline_defines[];
+
 enum qat_device_gen {
QAT_GEN1,
QAT_GEN2,
diff --git a/drivers/common/qat/qat_device.c b/drivers/common/qat/qat_device.c
index a9ea4af5df..e367671267 100644
--- a/drivers/common/qat/qat_device.c
+++ b/drivers/common/qat/qat_device.c
@@ -121,70 +121,6 @@ qat_get_qat_dev_from_pci_dev(struct rte_pci_device 
*pci_dev)
return qat_pci_get_named_dev(name);
 }
 
-static void
-qat_dev_parse_cmd(const char *str, struct qat_dev_cmd_param
-   *qat_dev_cmd_param)
-{
-   int i = 0;
-   const char *param;
-
-   while (1) {
-   char value_str[4] = { };
-
-   param = qat_dev_cmd_param[i].name;
-   if (param == NULL)
-   return;
-   long value = 0;
-   const char *arg = strstr(str, param);
-   const char *arg2 = NULL;
-
-   if (arg) {
-   arg2 = arg + strlen(param);
-   if (*arg2 != '=') {
-   QAT_LOG(DEBUG, "parsing error '=' sign"
-   " should immediately follow %s",
-   param);
-   arg2 = NULL;
-   } else
-   arg2++;
-   } else {
-   QAT_LOG(DEBUG, "%s not provided", param);
-   }
-   if (arg2) {
-   int iter = 0;
-   while (iter < 2) {
-   if (!isdigit(*(arg2 + iter)))
-   break;
-   iter++;
-   }
-   if (!iter) {
-   QAT_LOG(DEBUG, "parsing error %s"
-  " no number provided",
-  param);
-   } else {
-   memcpy(value_str, arg2, iter);
-   value = strtol(value_str, NULL, 10);
-   if (strcmp(param,
-SYM_CIPHER_CRC_ENABLE_NAME) == 0) {
-   if (value < 0 || value > 1) {
-   QAT_LOG(DEBUG, "The value for 
qat_sym_cipher_crc_enable should be set to 0 or 1, setting to 0");
-   value = 0;
-   }
-   } else if (value > MAX_QP_THRESHOLD_SIZE) {
-   QAT_LOG(DEBUG, "Exceeded max size of"
-   " threshold, setting to %d",
-   MAX_QP_THRESHOLD_SIZE);
-   value = MAX_QP_THRESHOLD_SIZE;
-   }
-   QAT_LOG(DEBUG, "parsing %s = %ld",
-   param, value);
-   }
-   }
-   qat_dev_cmd_param[i].val = value;
-   i++;
-   }
-}
-
 static enum qat_device_gen
 pick_gen(const struct rte_pci_device *pci_dev)
 {
@@ -210,9 +146,79 @@ pick_gen(const struct rte_pci_device *pci_dev)
}
 }
 
-stru

[PATCH v3 2/3] common/qat: decouple pmds from the common code

2024-02-29 Thread Arkadiusz Kusztal
Service-specific functions were moved to the service
files. Weak symbols for device create/destroy were removed,
and named private devices were replaced by an opaque array.

Signed-off-by: Arkadiusz Kusztal 
---
 drivers/common/qat/qat_device.c | 112 ++--
 drivers/common/qat/qat_device.h |  47 ---
 drivers/compress/qat/qat_comp_pmd.c |  34 +--
 drivers/compress/qat/qat_comp_pmd.h |   7 ---
 drivers/crypto/qat/qat_asym.c   |  20 ---
 drivers/crypto/qat/qat_sym.c|  19 +++---
 6 files changed, 83 insertions(+), 156 deletions(-)

diff --git a/drivers/common/qat/qat_device.c b/drivers/common/qat/qat_device.c
index e367671267..e1cea4d9b2 100644
--- a/drivers/common/qat/qat_device.c
+++ b/drivers/common/qat/qat_device.c
@@ -26,6 +26,8 @@
 struct qat_gen_hw_data qat_gen_config[QAT_N_GENS];
 struct qat_dev_hw_spec_funcs *qat_dev_hw_spec[QAT_N_GENS];
 
+struct qat_service qat_service[QAT_MAX_SERVICES];
+
 /* per-process array of device data */
 struct qat_device_info qat_pci_devs[RTE_PMD_QAT_MAX_PCI_DEVICES];
 static int qat_nb_pci_devices;
@@ -111,7 +113,7 @@ qat_pci_find_free_device_index(void)
return dev_id;
 }
 
-struct qat_pci_device *
+static struct qat_pci_device *
 qat_get_qat_dev_from_pci_dev(struct rte_pci_device *pci_dev)
 {
char name[QAT_DEV_NAME_MAX_LEN];
@@ -277,7 +279,8 @@ qat_pci_device_allocate(struct rte_pci_device *pci_dev)
return NULL;
}
 
-   qat_dev_size = sizeof(struct qat_pci_device) + extra_size;
+   qat_dev_size = sizeof(struct qat_pci_device) + sizeof(void *) *
+   QAT_MAX_SERVICES + extra_size;
qat_dev_mz = rte_memzone_reserve(name, qat_dev_size,
rte_socket_id(), 0);
 
@@ -366,7 +369,7 @@ qat_pci_device_release(struct rte_pci_device *pci_dev)
 {
struct qat_pci_device *qat_dev;
char name[QAT_DEV_NAME_MAX_LEN];
-   int busy = 0;
+   int busy = 0, i;
 
if (pci_dev == NULL)
return -EINVAL;
@@ -381,24 +384,16 @@ qat_pci_device_release(struct rte_pci_device *pci_dev)
/* Check that there are no service devs still on pci device */
 
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-   if (qat_dev->sym_dev != NULL) {
-   QAT_LOG(DEBUG, "QAT sym device %s is busy",
-   name);
-   busy = 1;
-   }
-   if (qat_dev->asym_dev != NULL) {
-   QAT_LOG(DEBUG, "QAT asym device %s is busy",
-   name);
-   busy = 1;
-   }
-   if (qat_dev->comp_dev != NULL) {
-   QAT_LOG(DEBUG, "QAT comp device %s is busy",
-   name);
+   for (i = 0; i < QAT_MAX_SERVICES; i++) {
+   if (qat_dev->pmd[i] == NULL)
+   continue;
+   QAT_LOG(DEBUG, "QAT %s device %s is busy",
+   qat_service[i].name, name);
busy = 1;
-   }
if (busy)
return -EBUSY;
rte_memzone_free(inst->mz);
+   }
}
memset(inst, 0, sizeof(struct qat_device_info));
qat_nb_pci_devices--;
@@ -412,17 +407,20 @@ static int
 qat_pci_dev_destroy(struct qat_pci_device *qat_pci_dev,
struct rte_pci_device *pci_dev)
 {
-   qat_sym_dev_destroy(qat_pci_dev);
-   qat_comp_dev_destroy(qat_pci_dev);
-   qat_asym_dev_destroy(qat_pci_dev);
+   int i;
+
+   for (i = 0; i < QAT_MAX_SERVICES; i++) {
+   if (!qat_service[i].dev_create)
+   continue;
+   qat_service[i].dev_destroy(qat_pci_dev);
+   }
return qat_pci_device_release(pci_dev);
 }
 
 static int qat_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
struct rte_pci_device *pci_dev)
 {
-   int sym_ret = 0, asym_ret = 0, comp_ret = 0;
-   int num_pmds_created = 0;
+   int i, ret = 0, num_pmds_created = 0;
struct qat_pci_device *qat_pci_dev;
 
QAT_LOG(DEBUG, "Found QAT device at %02x:%02x.%x",
@@ -434,30 +432,18 @@ static int qat_pci_probe(struct rte_pci_driver *pci_drv 
__rte_unused,
if (qat_pci_dev == NULL)
return -ENODEV;
 
-   sym_ret = qat_sym_dev_create(qat_pci_dev);
-   if (sym_ret == 0) {
-   num_pmds_created++;
-   }
-   else
-   QAT_LOG(WARNING,
-   "Failed to create QAT SYM PMD on device %s",
-   qat_pci_dev->name);
-
-   comp_re

[PATCH v3 3/3] common/qat: fix incorrectly placed legacy flag

2024-02-29 Thread Arkadiusz Kusztal
This commit fixes a legacy flag, which was placed in a file
that may not be included in the build process.
Fixes: cffb726b7797 ("crypto/qat: enable insecure algorithms")

Signed-off-by: Arkadiusz Kusztal 
---
 drivers/common/qat/qat_device.c | 1 +
 drivers/crypto/qat/qat_sym.c| 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/common/qat/qat_device.c b/drivers/common/qat/qat_device.c
index e1cea4d9b2..fcae35c2b5 100644
--- a/drivers/common/qat/qat_device.c
+++ b/drivers/common/qat/qat_device.c
@@ -31,6 +31,7 @@ struct qat_service qat_service[QAT_MAX_SERVICES];
 /* per-process array of device data */
 struct qat_device_info qat_pci_devs[RTE_PMD_QAT_MAX_PCI_DEVICES];
 static int qat_nb_pci_devices;
+int qat_legacy_capa;
 
 /*
  * The set of PCI devices this driver supports
diff --git a/drivers/crypto/qat/qat_sym.c b/drivers/crypto/qat/qat_sym.c
index 83254a03c1..6d333447c9 100644
--- a/drivers/crypto/qat/qat_sym.c
+++ b/drivers/crypto/qat/qat_sym.c
@@ -19,7 +19,6 @@
 #include "qat_qp.h"
 
 uint8_t qat_sym_driver_id;
-int qat_legacy_capa;
 
 #define SYM_ENQ_THRESHOLD_NAME "qat_sym_enq_threshold"
 #define SYM_CIPHER_CRC_ENABLE_NAME "qat_sym_cipher_crc_enable"
-- 
2.13.6



[PATCH] net/mlx5: fix the counters map in bonding mode

2024-02-29 Thread Bing Zhao
In the HW-LAG mode, there is only one mlx5 IB device with 2 ETH
interfaces. In theory, the settings on both ports should be the same.
But in real life, the user may apply inconsistent settings, and the
PMD is not aware of this.
In the previous implementation, the xstats map was generated from the
information fetched on the 1st port of a bonding interface. If the
2nd port had different settings, the number and the order of the
counters could differ from those of the 1st one. The ioctl() call
could then corrupt the user buffers (copy_to_user) and cause a crash.

The commit changes the mapping from the driver counters to the
PMD user counters:
  1. Switch the inner and outer loops to speed up the initialization
 time as much as possible, since there will be >300 counters
 returned from the driver.
  2. Generate a unique map for both ports in LAG mode:
a. Scan the 1st port, find the supported counters' strings and
   add them to the map.
b. In bonding, scan the 2nd port and find the strings. If one is
   already in the map, reuse its index; otherwise append it to
   the next free slot (see the sketch below).
c. Append the device counters that need to be fetched via sysfs
   or a Devx command. This kind of counter is unique per IB
   device.
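
A minimal sketch of the name deduplication in steps a/b (the struct and
helper below are hypothetical stand-ins, not the actual mlx5 code):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_XSTATS 512
#define NAME_LEN 64

/* Hypothetical stand-in for the PMD's xstats control data. */
struct xstats_map {
	uint16_t n;                       /* number of mapped counters */
	char name[MAX_XSTATS][NAME_LEN];  /* unified counter names */
};

/* Return the unified output index for a driver counter string: reuse an
 * existing slot when the name is already known from the 1st port,
 * otherwise append it to the next free slot. */
static uint16_t
map_counter(struct xstats_map *map, const char *name)
{
	uint16_t i;

	for (i = 0; i < map->n; i++)
		if (strcmp(map->name[i], name) == 0)
			return i;
	snprintf(map->name[map->n], NAME_LEN, "%s", name);
	return map->n++;
}

Both ports then share one output layout, and per-port readbacks only need
the (driver index -> unified index) tables to accumulate values.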

After querying the statistics from the driver, each value will be read
from its proper offset in "struct ethtool_stats" and then added into
the output array based on the map information. In bonding mode, the
statistics from both ports will be accumulated if the counters are
valid on both ports.

Compared to the system call or Devx command, the overhead introduced
by the extra index comparison is light and should not cause a
significant degradation.

The application should ensure that the port settings are not changed
dynamically outside of the DPDK application. Otherwise, the PMD cannot
be notified of the change, and the counters map might become invalid
when the number of counters stays the same but the counters set has
changed. A device restart will re-initialize the map from scratch.

Fixes: 7ed15acdcd69 ("net/mlx5: improve xstats of bonding port")
Cc: xuemi...@nvidia.com
Cc: sta...@dpdk.org

Signed-off-by: Bing Zhao 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/linux/mlx5_ethdev_os.c   | 249 +++---
 drivers/net/mlx5/mlx5.h   |  15 +-
 drivers/net/mlx5/mlx5_stats.c |  58 +++--
 drivers/net/mlx5/windows/mlx5_ethdev_os.c |  22 +-
 4 files changed, 242 insertions(+), 102 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c 
b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index 92c47a3b3d..eb47c284ec 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -1286,13 +1286,16 @@ _mlx5_os_read_dev_counters(struct rte_eth_dev *dev, int 
pf, uint64_t *stats)
struct mlx5_xstats_ctrl *xstats_ctrl = &priv->xstats_ctrl;
unsigned int i;
struct ifreq ifr;
-   unsigned int stats_sz = xstats_ctrl->stats_n * sizeof(uint64_t);
+   unsigned int max_stats_n = RTE_MAX(xstats_ctrl->stats_n, 
xstats_ctrl->stats_n_2nd);
+   unsigned int stats_sz = max_stats_n * sizeof(uint64_t);
unsigned char et_stat_buf[sizeof(struct ethtool_stats) + stats_sz];
struct ethtool_stats *et_stats = (struct ethtool_stats *)et_stat_buf;
int ret;
+   uint16_t i_idx, o_idx;
 
et_stats->cmd = ETHTOOL_GSTATS;
-   et_stats->n_stats = xstats_ctrl->stats_n;
+   /* Pass the maximum value, the driver may ignore this. */
+   et_stats->n_stats = max_stats_n;
ifr.ifr_data = (caddr_t)et_stats;
if (pf >= 0)
ret = mlx5_ifreq_by_ifname(priv->sh->bond.ports[pf].ifname,
@@ -1305,21 +1308,34 @@ _mlx5_os_read_dev_counters(struct rte_eth_dev *dev, int 
pf, uint64_t *stats)
dev->data->port_id);
return ret;
}
-   for (i = 0; i != xstats_ctrl->mlx5_stats_n; ++i) {
-   if (xstats_ctrl->info[i].dev)
-   continue;
-   stats[i] += (uint64_t)
-   et_stats->data[xstats_ctrl->dev_table_idx[i]];
+   if (pf <= 0) {
+   for (i = 0; i != xstats_ctrl->mlx5_stats_n; i++) {
+   i_idx = xstats_ctrl->dev_table_idx[i];
+   if (i_idx == UINT16_MAX || xstats_ctrl->info[i].dev)
+   continue;
+   o_idx = xstats_ctrl->xstats_o_idx[i];
+   stats[o_idx] += (uint64_t)et_stats->data[i_idx];
+   }
+   } else {
+   for (i = 0; i != xstats_ctrl->mlx5_stats_n; i++) {
+   i_idx = xstats_ctrl->dev_table_idx_2nd[i];
+   if (i_idx == UINT16_MAX)
+   continue;
+   o_idx = xstats_ctrl->xstats_o_idx_2nd[i];
+   stats[o_idx] += (uint64_t)et_stats->data[i_idx];
+   }
}

RE: [PATCH v4 0/4] add QAT GEN LCE device

2024-02-29 Thread Kusztal, ArkadiuszX



> -Original Message-
> From: Power, Ciara 
> Sent: Tuesday, February 27, 2024 10:55 AM
> To: Nayak, Nishikanta ; dev@dpdk.org
> Cc: Ji, Kai ; Kusztal, ArkadiuszX
> ; S Joshi, Rakesh 
> Subject: RE: [PATCH v4 0/4] add QAT GEN LCE device
> 
> 
> 
> > -Original Message-
> > From: Nayak, Nishikanta 
> > Sent: Tuesday, February 27, 2024 9:40 AM
> > To: dev@dpdk.org
> > Cc: Power, Ciara ; Ji, Kai ;
> > Kusztal, ArkadiuszX ; S Joshi, Rakesh
> > ; Nayak, Nishikanta
> > 
> > Subject: [PATCH v4 0/4] add QAT GEN LCE device
> >
> > This patchset adds a new QAT LCE device.
> > The device currently only supports symmetric crypto, and only the
> > AES-GCM algorithm.
> >
> > v4:
> >   - Fixed cover letter, v3 included the wrong details relating
> > to another patchset.
> > v3:
> >   - Fixed typos in commit and code comments.
> >   - Replaced use of linux/kernel.h macro with local macro
> > to fix ARM compilation in CI.
> > v2:
> >- Renamed device from GEN 5 to GEN LCE.
> >- Removed unused code.
> >- Updated macro names.
> >
> > Nishikant Nayak (4):
> >   common/qat: add files specific to GEN LCE
> >   common/qat: update common driver to support GEN LCE
> >   crypto/qat: update headers for GEN LCE support
> >   test/cryptodev: add tests for GCM with AAD
> 
> Series-acked-by: Ciara Power 
Series-acked-by: Arkadiusz Kusztal 


RE: [PATCH v2 0/4] add new QAT gen3 and gen5

2024-02-29 Thread Kusztal, ArkadiuszX
Series-acked-by: Arkadiusz Kusztal <arkadiuszx.kusz...@intel.com>

Series-acked-by: Kai Ji <kai...@intel.com>


From: Power, Ciara <ciara.po...@intel.com>
Sent: 23 February 2024 15:12
To: dev@dpdk.org
Cc: gak...@marvell.com; Ji, Kai <kai...@intel.com>; Kusztal, ArkadiuszX
<arkadiuszx.kusz...@intel.com>; Power, Ciara <ciara.po...@intel.com>
Subject: [PATCH v2 0/4] add new QAT gen3 and gen5

This patchset adds support for two new QAT devices:
a new GEN3 device and a GEN5 device, both of which have
wireless slice support for algorithms such as ZUC-256.

Symmetric, asymmetric and compression are all supported
for these devices.

v2:
  - New patch added for gen5 device that reuses gen4 code,
and new gen3 wireless slice changes.
  - Removed patch to disable asymmetric and compression.
  - Documentation updates added.
  - Fixed ZUC-256 IV modification for raw API path.
  - Fixed setting extended protocol flag bit position.
  - Added check for ZUC-256 wireless slice in slice map.

Ciara Power (4):
  common/qat: add new gen3 device
  common/qat: add zuc256 wireless slice for gen3
  common/qat: add new gen3 CMAC macros
  common/qat: add gen5 device

 doc/guides/compressdevs/qat_comp.rst |   1 +
 doc/guides/cryptodevs/qat.rst|   6 +
 doc/guides/rel_notes/release_24_03.rst   |   7 +
 drivers/common/qat/dev/qat_dev_gen4.c|  31 ++-
 drivers/common/qat/dev/qat_dev_gen5.c|  51 
 drivers/common/qat/dev/qat_dev_gens.h|  54 
 drivers/common/qat/meson.build   |   3 +
 drivers/common/qat/qat_adf/icp_qat_fw.h  |   6 +-
 drivers/common/qat/qat_adf/icp_qat_fw_la.h   |  24 ++
 drivers/common/qat/qat_adf/icp_qat_hw.h  |  26 +-
 drivers/common/qat/qat_common.h  |   1 +
 drivers/common/qat/qat_device.c  |  19 ++
 drivers/common/qat/qat_device.h  |   2 +
 drivers/compress/qat/dev/qat_comp_pmd_gen4.c |   8 +-
 drivers/compress/qat/dev/qat_comp_pmd_gen5.c |  73 +
 drivers/compress/qat/dev/qat_comp_pmd_gens.h |  14 +
 drivers/crypto/qat/dev/qat_crypto_pmd_gen2.c |   7 +-
 drivers/crypto/qat/dev/qat_crypto_pmd_gen3.c |  63 -
 drivers/crypto/qat/dev/qat_crypto_pmd_gen4.c |   4 +-
 drivers/crypto/qat/dev/qat_crypto_pmd_gen5.c | 278 +++
 drivers/crypto/qat/dev/qat_crypto_pmd_gens.h |  40 ++-
 drivers/crypto/qat/dev/qat_sym_pmd_gen1.c|  43 +++
 drivers/crypto/qat/qat_sym_session.c | 177 ++--
 drivers/crypto/qat/qat_sym_session.h |   2 +
 24 files changed, 889 insertions(+), 51 deletions(-)
 create mode 100644 drivers/common/qat/dev/qat_dev_gen5.c
 create mode 100644 drivers/compress/qat/dev/qat_comp_pmd_gen5.c
 create mode 100644 drivers/crypto/qat/dev/qat_crypto_pmd_gen5.c

--
2.25.1


Re: [PATCH v4 1/7] ethdev: support report register names and filter

2024-02-29 Thread Thomas Monjalon
26/02/2024 04:07, Jie Hai:
> This patch adds "filter" and "names" fields to "rte_dev_reg_info"
> structure. Names of registers in data fields can be reported and
> the registers can be filtered by their names.
> 
> The new API rte_eth_dev_get_reg_info_ext() is added to support
> reporting names and filtering by names, while the original API
> rte_eth_dev_get_reg_info() does not use the name and filter fields.
> A local variable is used in rte_eth_dev_get_reg_info() for
> compatibility. If a driver does not report the names, they are set
> to "offset_XXX".

Isn't it possible to implement filtering in the original function?
What would it break?

> @@ -20,6 +25,12 @@ struct rte_dev_reg_info {
>   uint32_t length; /**< Number of registers to fetch */
>   uint32_t width; /**< Size of device register */
>   uint32_t version; /**< Device version */
> + /**
> +  * Filter for target subset of registers.
> +  * This field can affect register selection for data/length/names.
> +  */
> + const char *filter;
> + struct rte_eth_reg_name *names; /**< Buffer for register names */
>  };

I suppose this is an ABI break?
Confirmed: http://mails.dpdk.org/archives/test-report/2024-February/587314.html




[PATCH] net/mlx5: fix action template expansion: support indirect actions list

2024-02-29 Thread Gregory Etelson
MLX5 PMD actions template compilation may implicitly add a
MODIFY_HEADER action to the actions list provided by the application.
MLX5 actions in a template list must be arranged according to the
HW-supported order.
The PMD must place the new MODIFY_HEADER in the correct location
relative to the existing actions.

The patch adds indirect action lists to the calculation of the new
MODIFY_HEADER location.

Fixes: e26f50adbf38 ("net/mlx5: support indirect list meter mark action")

Cc: sta...@dpdk.org

Signed-off-by: Gregory Etelson 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/mlx5_flow_hw.c | 80 +
 1 file changed, 80 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index f778fd0698..585f1ba925 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -88,6 +88,9 @@ mlx5_tbl_multi_pattern_process(struct rte_eth_dev *dev,
 static void
 mlx5_destroy_multi_pattern_segment(struct mlx5_multi_pattern_segment *segment);
 
+static __rte_always_inline enum mlx5_indirect_list_type
+flow_hw_inlist_type_get(const struct rte_flow_action *actions);
+
 static __rte_always_inline int
 mlx5_multi_pattern_reformat_to_index(enum mlx5dr_action_type type)
 {
@@ -5803,6 +5806,69 @@ mlx5_decap_encap_reformat_type(const struct 
rte_flow_action *actions,
   MLX5_FLOW_ACTION_ENCAP : MLX5_FLOW_ACTION_DECAP;
 }
 
+enum mlx5_hw_indirect_list_relative_position {
+   MLX5_INDIRECT_LIST_POSITION_UNKNOWN = -1,
+   MLX5_INDIRECT_LIST_POSITION_BEFORE_MH = 0,
+   MLX5_INDIRECT_LIST_POSITION_AFTER_MH,
+};
+
+static enum mlx5_hw_indirect_list_relative_position
+mlx5_hw_indirect_list_mh_position(const struct rte_flow_action *action)
+{
+   const struct rte_flow_action_indirect_list *conf = action->conf;
+   enum mlx5_indirect_list_type list_type = 
mlx5_get_indirect_list_type(conf->handle);
+   enum mlx5_hw_indirect_list_relative_position pos = 
MLX5_INDIRECT_LIST_POSITION_UNKNOWN;
+   const union {
+   struct mlx5_indlst_legacy *legacy;
+   struct mlx5_hw_encap_decap_action *reformat;
+   struct rte_flow_action_list_handle *handle;
+   } h = { .handle = conf->handle};
+
+   switch (list_type) {
+   case  MLX5_INDIRECT_ACTION_LIST_TYPE_LEGACY:
+   switch (h.legacy->legacy_type) {
+   case RTE_FLOW_ACTION_TYPE_AGE:
+   case RTE_FLOW_ACTION_TYPE_COUNT:
+   case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+   case RTE_FLOW_ACTION_TYPE_METER_MARK:
+   case RTE_FLOW_ACTION_TYPE_QUOTA:
+   pos = MLX5_INDIRECT_LIST_POSITION_BEFORE_MH;
+   break;
+   case RTE_FLOW_ACTION_TYPE_RSS:
+   pos = MLX5_INDIRECT_LIST_POSITION_AFTER_MH;
+   break;
+   default:
+   pos = MLX5_INDIRECT_LIST_POSITION_UNKNOWN;
+   break;
+   }
+   break;
+   case MLX5_INDIRECT_ACTION_LIST_TYPE_MIRROR:
+   pos = MLX5_INDIRECT_LIST_POSITION_AFTER_MH;
+   break;
+   case MLX5_INDIRECT_ACTION_LIST_TYPE_REFORMAT:
+   switch (h.reformat->action_type) {
+   case MLX5DR_ACTION_TYP_REFORMAT_TNL_L2_TO_L2:
+   case MLX5DR_ACTION_TYP_REFORMAT_TNL_L3_TO_L2:
+   pos = MLX5_INDIRECT_LIST_POSITION_BEFORE_MH;
+   break;
+   case MLX5DR_ACTION_TYP_REFORMAT_L2_TO_TNL_L2:
+   case MLX5DR_ACTION_TYP_REFORMAT_L2_TO_TNL_L3:
+   pos = MLX5_INDIRECT_LIST_POSITION_AFTER_MH;
+   break;
+   default:
+   pos = MLX5_INDIRECT_LIST_POSITION_UNKNOWN;
+   break;
+   }
+   break;
+   default:
+   pos = MLX5_INDIRECT_LIST_POSITION_UNKNOWN;
+   break;
+   }
+   return pos;
+}
+
+#define MLX5_HW_EXPAND_MH_FAILED 0x
+
 static inline uint16_t
 flow_hw_template_expand_modify_field(struct rte_flow_action actions[],
 struct rte_flow_action masks[],
@@ -5839,6 +5905,7 @@ flow_hw_template_expand_modify_field(struct 
rte_flow_action actions[],
 * @see action_order_arr[]
 */
for (i = act_num - 2; (int)i >= 0; i--) {
+   enum mlx5_hw_indirect_list_relative_position pos;
enum rte_flow_action_type type = actions[i].type;
uint64_t reformat_type;
 
@@ -5869,6 +5936,13 @@ flow_hw_template_expand_modify_field(struct 
rte_flow_action actions[],
if (actions[i - 1].type == 
RTE_FLOW_ACTION_TYPE_RAW_DECAP)
i--;
break;
+   case RTE_FLOW_ACTION_TYPE_INDIRECT_LIST:
+   pos = mlx5_hw_indirect_list_mh_position(&actions[i]);
+   if (pos == MLX5_INDI

[PATCH v4] common/qat: add virtual qat device (vQAT)

2024-02-29 Thread Arkadiusz Kusztal
This commit adds virtual QAT devices to the Intel
QuickAssist Technology PMD. Three kinds of virtual
QAT device are defined, each offering a different QAT
service: symmetric crypto, asymmetric crypto and
compression.

Signed-off-by: Arkadiusz Kusztal 
---
v2:
- added symmetric crypto qp config
v3:
- added compression
- added asymmetric crypto
v4:
- rebased to fix a release notes issue

 doc/guides/rel_notes/release_24_03.rst   |  4 ++
 drivers/common/qat/dev/qat_dev_gen4.c| 55 +++-
 drivers/common/qat/qat_adf/icp_qat_hw.h  |  5 +++
 drivers/common/qat/qat_common.h  |  1 +
 drivers/common/qat/qat_device.c  |  7 +++-
 drivers/compress/qat/dev/qat_comp_pmd_gen4.c | 18 ++---
 drivers/compress/qat/qat_comp_pmd.c  |  7 
 drivers/crypto/qat/dev/qat_crypto_pmd_gen4.c | 24 
 drivers/crypto/qat/qat_asym.c|  7 
 drivers/crypto/qat/qat_sym.c |  7 
 drivers/crypto/qat/qat_sym_session.c | 13 ---
 11 files changed, 126 insertions(+), 22 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_03.rst 
b/doc/guides/rel_notes/release_24_03.rst
index 879bb4944c..76a9e7230a 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -138,6 +138,10 @@ New Features
 to support TLS v1.2, TLS v1.3 and DTLS v1.2.
   * Added PMD API to allow raw submission of instructions to CPT.
 
+* **Updated Intel QuickAssist Technology driver.**
+
+  * Enabled support for virtual QAT - vQAT (0da5) devices in QAT PMD.
+
 
 Removed Items
 -
diff --git a/drivers/common/qat/dev/qat_dev_gen4.c 
b/drivers/common/qat/dev/qat_dev_gen4.c
index 1ce262f715..1c5a2f2b6f 100644
--- a/drivers/common/qat/dev/qat_dev_gen4.c
+++ b/drivers/common/qat/dev/qat_dev_gen4.c
@@ -143,6 +143,42 @@ qat_dev_read_config_gen4(struct qat_pci_device *qat_dev)
return 0;
 }
 
+static int
+qat_dev_read_config_vqat(struct qat_pci_device *qat_dev)
+{
+   int i = 0;
+   struct qat_dev_gen4_extra *dev_extra = qat_dev->dev_private;
+   struct qat_qp_hw_data *hw_data;
+   struct qat_device_info *qat_dev_instance =
+   &qat_pci_devs[qat_dev->qat_dev_id];
+   uint16_t sub_id = qat_dev_instance->pci_dev->id.subsystem_device_id;
+
+   for (; i < QAT_GEN4_BUNDLE_NUM; i++) {
+   hw_data = &dev_extra->qp_gen4_data[i][0];
+   memset(hw_data, 0, sizeof(*hw_data));
+   if (sub_id == ADF_VQAT_SYM_PCI_SUBSYSTEM_ID) {
+   hw_data->service_type = QAT_SERVICE_SYMMETRIC;
+   hw_data->tx_msg_size = 128;
+   hw_data->rx_msg_size = 32;
+   } else if (sub_id == ADF_VQAT_ASYM_PCI_SUBSYSTEM_ID) {
+   hw_data->service_type = QAT_SERVICE_ASYMMETRIC;
+   hw_data->tx_msg_size = 64;
+   hw_data->rx_msg_size = 32;
+   } else if (sub_id == ADF_VQAT_DC_PCI_SUBSYSTEM_ID) {
+   hw_data->service_type = QAT_SERVICE_COMPRESSION;
+   hw_data->tx_msg_size = 128;
+   hw_data->rx_msg_size = 32;
+   } else {
+   QAT_LOG(ERR, "Unrecognized subsystem id %hu", sub_id);
+   return -EINVAL;
+   }
+   hw_data->tx_ring_num = 0;
+   hw_data->rx_ring_num = 1;
+   hw_data->hw_bundle_num = i;
+   }
+   return 0;
+}
+
 static void
 qat_qp_build_ring_base_gen4(void *io_addr,
struct qat_queue *queue)
@@ -268,6 +304,12 @@ qat_reset_ring_pairs_gen4(struct qat_pci_device 
*qat_pci_dev)
return 0;
 }
 
+static int
+qat_reset_ring_pairs_vqat(struct qat_pci_device *qat_pci_dev __rte_unused)
+{
+   return 0;
+}
+
 static const struct rte_mem_resource *
 qat_dev_get_transport_bar_gen4(struct rte_pci_device *pci_dev)
 {
@@ -304,10 +346,21 @@ static struct qat_dev_hw_spec_funcs qat_dev_hw_spec_gen4 
= {
.qat_dev_get_slice_map = qat_dev_get_slice_map_gen4,
 };
 
+static struct qat_dev_hw_spec_funcs qat_dev_hw_spec_vqat = {
+   .qat_dev_reset_ring_pairs = qat_reset_ring_pairs_vqat,
+   .qat_dev_get_transport_bar = qat_dev_get_transport_bar_gen4,
+   .qat_dev_get_misc_bar = qat_dev_get_misc_bar_gen4,
+   .qat_dev_read_config = qat_dev_read_config_vqat,
+   .qat_dev_get_extra_size = qat_dev_get_extra_size_gen4,
+   .qat_dev_get_slice_map = qat_dev_get_slice_map_gen4,
+};
+
 RTE_INIT(qat_dev_gen_4_init)
 {
-   qat_qp_hw_spec[QAT_GEN4] = &qat_qp_hw_spec_gen4;
+   qat_qp_hw_spec[QAT_VQAT] = qat_qp_hw_spec[QAT_GEN4] = 
&qat_qp_hw_spec_gen4;
qat_dev_hw_spec[QAT_GEN4] = &qat_dev_hw_spec_gen4;
+   qat_dev_hw_spec[QAT_VQAT] = &qat_dev_hw_spec_vqat;
qat_gen_config[QAT_GEN4].dev_gen = QAT_GEN4;
+   qat_gen_config[QAT_VQAT].dev_gen = QAT_VQAT;
qat_gen

RE: [PATCH v4] common/qat: add virtual qat device (vQAT)

2024-02-29 Thread Power, Ciara



> -Original Message-
> From: Kusztal, ArkadiuszX 
> Sent: Thursday, February 29, 2024 10:22 AM
> To: dev@dpdk.org
> Cc: gak...@marvell.com; Power, Ciara ; Kusztal,
> ArkadiuszX 
> Subject: [PATCH v4] common/qat: add virtual qat device (vQAT)
> 
> This commit adds virtual QAT devices to the Intel QuickAssist Technology PMD.
> Three kinds of virtual QAT device are defined, each offering a different QAT
> service: symmetric crypto, asymmetric crypto and compression.
> 
> Signed-off-by: Arkadiusz Kusztal 
> ---
> v2:
> - added symmetric crypto qp config
> v3:
> - added compression
> - added asymmetric crypto
> v4:
> - rebased to fix a release notes issue
> 
>  doc/guides/rel_notes/release_24_03.rst   |  4 ++
>  drivers/common/qat/dev/qat_dev_gen4.c| 55
> +++-
>  drivers/common/qat/qat_adf/icp_qat_hw.h  |  5 +++
>  drivers/common/qat/qat_common.h  |  1 +
>  drivers/common/qat/qat_device.c  |  7 +++-
>  drivers/compress/qat/dev/qat_comp_pmd_gen4.c | 18 ++---
>  drivers/compress/qat/qat_comp_pmd.c  |  7 
>  drivers/crypto/qat/dev/qat_crypto_pmd_gen4.c | 24 
>  drivers/crypto/qat/qat_asym.c|  7 
>  drivers/crypto/qat/qat_sym.c |  7 
>  drivers/crypto/qat/qat_sym_session.c | 13 ---
>  11 files changed, 126 insertions(+), 22 deletions(-)

Acked-by: Ciara Power 




RE: [PATCH v4 0/5] NAT64 support in mlx5 PMD

2024-02-29 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Bing Zhao 
> Sent: Wednesday, February 28, 2024 5:09 PM
> To: Ori Kam ; aman.deep.si...@intel.com; Dariusz
> Sosnowski ; Slava Ovsiienko
> ; Suanming Mou ;
> Matan Azrad ; NBU-Contact-Thomas Monjalon
> (EXTERNAL) ; ferruh.yi...@amd.com;
> dev@dpdk.org; Raslan Darawsheh 
> Cc: yuying.zh...@intel.com; andrew.rybche...@oktetlabs.ru
> Subject: [PATCH v4 0/5] NAT64 support in mlx5 PMD
> 
> This patch set contains the mlx5 PMD implementation for NAT64.
> 
> Series-acked-by: Ori Kam 
> 
> Update in v4:
>   1. rebase to solve the conflicts.
>   2. fix the old NIC startup issue in a separate patch:
>  https://patches.dpdk.org/project/dpdk/patch/20240227152627.25749-
> 1-bi...@nvidia.com/
> 
> Update in v3:
>   1. code style and typo.
> 
> Update in v2:
>   1. separate from the RTE and testpmd common part.
>   2. reorder the commits.
>   3. bug fix, code polishing and document update.
> 
> Bing Zhao (4):
>   net/mlx5: fetch the available registers for NAT64
>   net/mlx5: create NAT64 actions during configuration
>   net/mlx5: add NAT64 action support in rule creation
>   net/mlx5: validate the actions combination with NAT64
> 
> Erez Shitrit (1):
>   net/mlx5/hws: support NAT64 action
> 
>  doc/guides/nics/features/mlx5.ini  |   1 +
>  doc/guides/nics/mlx5.rst   |  10 +
>  doc/guides/rel_notes/release_24_03.rst |   7 +
>  drivers/net/mlx5/hws/mlx5dr.h  |  29 ++
>  drivers/net/mlx5/hws/mlx5dr_action.c   | 436
> -
>  drivers/net/mlx5/hws/mlx5dr_action.h   |  35 ++
>  drivers/net/mlx5/hws/mlx5dr_debug.c|   1 +
>  drivers/net/mlx5/mlx5.c|   9 +
>  drivers/net/mlx5/mlx5.h|  11 +
>  drivers/net/mlx5/mlx5_flow.h   |  12 +
>  drivers/net/mlx5/mlx5_flow_dv.c|   4 +-
>  drivers/net/mlx5/mlx5_flow_hw.c| 136 
>  12 files changed, 689 insertions(+), 2 deletions(-)
> 
> --
> 2.39.3
Series applied to next-net-mlx,
Kindest regards
Raslan Darawsheh


[PATCH] net/mlx5: fix sync meter processing in HWS setup

2024-02-29 Thread Gregory Etelson
Synchronous calls for meter ASO try to pull pending completions
from the CQ, submit a WR and return to the caller. That avoids delays
between the WR post and the HW response.
If the template API is activated, the PMD uses the control queue for
sync operations.

The PMD has different formats for the `user_data` context in sync and
async meter ASO calls.
The PMD port destruction procedure submits async operations to the
port control queue and polls the queue CQs to clean HW responses.

Port destruction can pull a meter ASO completion from the control CQ.
Such a completion has the sync format, but is processed by the async
handler.

The patch implements the sync meter ASO interface with async calls
in the template API environment.
Fixes: 48fbb0e93d06 ("net/mlx5: support flow meter mark indirect action with 
HWS")

Cc: sta...@dpdk.org

Signed-off-by: Gregory Etelson 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5.h| 108 ++
 drivers/net/mlx5/mlx5_flow_aso.c   | 170 ++---
 drivers/net/mlx5/mlx5_flow_hw.c|  98 +
 drivers/net/mlx5/mlx5_flow_meter.c |  27 +++--
 4 files changed, 242 insertions(+), 161 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index bb1853e797..1575876a46 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -392,44 +392,6 @@ enum mlx5_hw_indirect_type {
 
 #define MLX5_HW_MAX_ITEMS (16)
 
-/* HW steering flow management job descriptor. */
-struct mlx5_hw_q_job {
-   uint32_t type; /* Job type. */
-   uint32_t indirect_type;
-   union {
-   struct rte_flow_hw *flow; /* Flow attached to the job. */
-   const void *action; /* Indirect action attached to the job. */
-   };
-   void *user_data; /* Job user data. */
-   uint8_t *encap_data; /* Encap data. */
-   uint8_t *push_data; /* IPv6 routing push data. */
-   struct mlx5_modification_cmd *mhdr_cmd;
-   struct rte_flow_item *items;
-   union {
-   struct {
-   /* User memory for query output */
-   void *user;
-   /* Data extracted from hardware */
-   void *hw;
-   } __rte_packed query;
-   struct rte_flow_item_ethdev port_spec;
-   struct rte_flow_item_tag tag_spec;
-   } __rte_packed;
-   struct rte_flow_hw *upd_flow; /* Flow with updated values. */
-};
-
-/* HW steering job descriptor LIFO pool. */
-struct mlx5_hw_q {
-   uint32_t job_idx; /* Free job index. */
-   uint32_t size; /* LIFO size. */
-   struct mlx5_hw_q_job **job; /* LIFO header. */
-   struct rte_ring *indir_cq; /* Indirect action SW completion queue. */
-   struct rte_ring *indir_iq; /* Indirect action SW in progress queue. */
-   struct rte_ring *flow_transfer_pending;
-   struct rte_ring *flow_transfer_completed;
-} __rte_cache_aligned;
-
-
 #define MLX5_COUNTER_POOLS_MAX_NUM (1 << 15)
 #define MLX5_COUNTERS_PER_POOL 512
 #define MLX5_MAX_PENDING_QUERIES 4
@@ -2025,6 +1987,65 @@ enum dr_dump_rec_type {
DR_DUMP_REC_TYPE_PMD_COUNTER = 4430,
 };
 
+/* HW steering flow management job descriptor. */
+struct mlx5_hw_q_job {
+   uint32_t type; /* Job type. */
+   uint32_t indirect_type;
+   union {
+   struct rte_flow_hw *flow; /* Flow attached to the job. */
+   const void *action; /* Indirect action attached to the job. */
+   };
+   void *user_data; /* Job user data. */
+   uint8_t *encap_data; /* Encap data. */
+   uint8_t *push_data; /* IPv6 routing push data. */
+   struct mlx5_modification_cmd *mhdr_cmd;
+   struct rte_flow_item *items;
+   union {
+   struct {
+   /* User memory for query output */
+   void *user;
+   /* Data extracted from hardware */
+   void *hw;
+   } __rte_packed query;
+   struct rte_flow_item_ethdev port_spec;
+   struct rte_flow_item_tag tag_spec;
+   } __rte_packed;
+   struct rte_flow_hw *upd_flow; /* Flow with updated values. */
+};
+
+/* HW steering job descriptor LIFO pool. */
+struct mlx5_hw_q {
+   uint32_t job_idx; /* Free job index. */
+   uint32_t size; /* LIFO size. */
+   struct mlx5_hw_q_job **job; /* LIFO header. */
+   struct rte_ring *indir_cq; /* Indirect action SW completion queue. */
+   struct rte_ring *indir_iq; /* Indirect action SW in progress queue. */
+   struct rte_ring *flow_transfer_pending;
+   struct rte_ring *flow_transfer_completed;
+} __rte_cache_aligned;
+
+static __rte_always_inline struct mlx5_hw_q_job *
+flow_hw_job_get(struct mlx5_priv *priv, uint32_t queue)
+{
+   MLX5_ASSERT(priv->hw_q[queue].job_idx <= priv->hw_q[queue].size);
+   return priv->hw_q[queue].job_idx ?
+  priv->hw_q[queue].job[--priv->hw_q[queue].job_idx] : NULL;
+}
+
+static __rte_always_inline vo

[PATCH] net/mlx5: fix non-masked indirect list meter translation in flow rule

2024-02-29 Thread Gregory Etelson
Template table reuses the DR5 action handle for non-masked indirect
actions. A flow rule must explicitly translate the non-masked indirect
action and update the DR5 handle with the rule's indirect object.

The current PMD assumed the DR5 handle of a non-masked indirect action
was always NULL before the action translation.

The patch always translates the non-masked indirect list meter object.

Fixes: e26f50adbf38 ("net/mlx5: support indirect list meter mark action")

Cc: sta...@dpdk.org

Signed-off-by: Gregory Etelson 
Acked-by: Dariusz Sosnowski 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_flow_hw.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 9620b7f576..9833654aac 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -1702,15 +1702,9 @@ flow_hw_translate_indirect_meter(struct rte_eth_dev *dev,
const struct rte_flow_indirect_update_flow_meter_mark **flow_conf =
(typeof(flow_conf))action_conf->conf;
 
-   /*
-* Masked indirect handle set dr5 action during template table
-* translation.
-*/
-   if (!dr_rule->action) {
-   ret = flow_dr_set_meter(priv, dr_rule, action_conf);
-   if (ret)
-   return ret;
-   }
+   ret = flow_dr_set_meter(priv, dr_rule, action_conf);
+   if (ret)
+   return ret;
if (!act_data->shared_meter.conf_masked) {
if (flow_conf && flow_conf[0] && flow_conf[0]->init_color < 
RTE_COLORS)
flow_dr_mtr_flow_color(dr_rule, 
flow_conf[0]->init_color);
-- 
2.39.2



Re: [PATCH] net/ixgbevf: fix RSS init for x550 nics

2024-02-29 Thread Bruce Richardson
On Tue, Feb 27, 2024 at 05:26:06PM +, Medvedkin, Vladimir wrote:
> 
> On 15/02/2024 13:31, edwin.brosse...@6wind.com wrote:
> > From: Edwin Brossette 
> > 
> > Different Intel NICs with the ixgbe PMD do not handle RSS in the same
> > way when working with virtualization. While some NICs like Intel 82599ES
> > only have a single RSS table in the device and leave all RSS features to
> > be handled by the PF, some other NICs like X550 let the VF handle RSS
> > features. This can lead to different behavior when RSS is enabled
> > depending on the model of NIC used.
> > 
> > In particular, it turned out that ixgbevf_dev_rx_init() does not
> > initialize RSS parameters at device init, even if the multi-queue mode
> > option is set in the device configuration (i.e. RTE_ETH_MQ_RX_RSS is
> > set). Note that this issue went unnoticed until now, probably because
> > some NICs do not really have support for RSS in virtualization mode.
> > 
> > Thus, depending on the NIC used, we can find ourselves in a situation
> > where RSS is not configured despite being enabled. This will cause
> > serious performance issues because the RSS RETA will be fully zeroed,
> > causing all packets to go only to the first queue and leaving all the
> > others empty.
> > 
> > By looking at ixgbe_reta_size_get(), we can see that only X550 NIC
> > models have a non-zero RETA size set in VF mode. Thus add a call to
> > ixgbe_rss_configure() for these cards in ixgbevf_dev_rx_init() if the
> > option to enable RSS is set.
> > 
> > Fixes: f4d1598ee14f ("ixgbevf: support RSS config on x550")

+ Cc: sta...@dpdk.org

> > Signed-off-by: Edwin Brossette 
> > ---
> Acked-by: Vladimir Medvedkin 
> 

Applied to next-net-intel tree.
Thanks,
/Bruce


[PATCH v2 00/11] net/mlx5: flow insertion performance improvements

2024-02-29 Thread Dariusz Sosnowski
Goal of this patchset is to improve the throughput of flow insertion
and deletion in mlx5 PMD when HW Steering flow engine is used.

- Patch 1 - Use preallocated per-queue, per-actions template buffer
  for storing translated flow actions, instead of allocating and
  filling it on demand, on each flow operation.
- Patches 2-4 - Make resource index allocation optional. This allocation
  will be skipped when it is not required by the created template table.
- Patches 5-7 - Reduce memory footprint of the internal flow queue.
- Patch 8 - Remove indirection between flow job and flow itself,
  by using flow as an operation container.
- Patches 9-10 - Reduce memory footprint of flow struct by moving
  rarely used flow fields outside of the main flow struct.
  These fields will be accessed only when needed.
  Also remove unneeded `zmalloc` usage.
- Patch 11 - Remove unneeded device status check in flow create.

In general all of these changes result in the following improvements
(all numbers are averaged Kflows/sec):

|  | Insertion  |   +%   | Deletion |   +%  |
|--|:--:|:--:|::|:-:|
| baseline |   6338.7   ||  9739.6  |   |
| improvements |   6978.8   | +10.1% |  10432.4 | +7.1% |

The basic benchmark was run on ConnectX-6 Dx (22.40.1000),
on the system with Intel Xeon Platinum 8380 CPU.

v2:

- Rebased.
- Applied Acked-by tags from previous version.

Bing Zhao (2):
  net/mlx5: skip the unneeded resource index allocation
  net/mlx5: remove unneeded device status checking

Dariusz Sosnowski (7):
  net/mlx5: allocate local DR rule action buffers
  net/mlx5: remove action params from job
  net/mlx5: remove flow pattern from job
  net/mlx5: remove updated flow from job
  net/mlx5: use flow as operation container
  net/mlx5: move rarely used flow fields outside
  net/mlx5: reuse flow fields

Erez Shitrit (2):
  net/mlx5/hws: add check for matcher rule update support
  net/mlx5/hws: add check if matcher contains complex rules

 drivers/net/mlx5/hws/mlx5dr.h |  16 +
 drivers/net/mlx5/hws/mlx5dr_action.c  |   6 +
 drivers/net/mlx5/hws/mlx5dr_action.h  |   2 +
 drivers/net/mlx5/hws/mlx5dr_matcher.c |  29 +
 drivers/net/mlx5/mlx5.h   |  29 +-
 drivers/net/mlx5/mlx5_flow.h  | 128 -
 drivers/net/mlx5/mlx5_flow_hw.c   | 794 --
 7 files changed, 666 insertions(+), 338 deletions(-)

--
2.39.2



[PATCH v2 01/11] net/mlx5: allocate local DR rule action buffers

2024-02-29 Thread Dariusz Sosnowski
The goal of this patch is to remove the unnecessary copying of
precalculated mlx5dr_rule_action structures used to create HWS flow
rules.

Before this patch, during template table creation an array of these
structures was calculated for each actions template used.
Each of these structures contained either a full or a partial action
definition (depending on mask configuration).
During flow creation, this array was copied to the stack and later
passed to mlx5dr_rule_create().

This patch removes this copy by implementing the following:

- Allocate an array of mlx5dr_rule_action structures for each actions
  template and queue.
- Populate them with precalculated data from relevant actions templates.
- During flow creation, construction of unmasked actions works on an
  array dedicated for the specific queue and actions template.
- Pass this buffer to mlx5dr_rule_create directly.

Signed-off-by: Dariusz Sosnowski 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5_flow.h| 13 +
 drivers/net/mlx5/mlx5_flow_hw.c | 51 +
 2 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 7aa24f7c52..02af0a08fa 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1566,6 +1566,10 @@ struct mlx5_matcher_info {
uint32_t refcnt;
 };
 
+struct mlx5_dr_rule_action_container {
+   struct mlx5dr_rule_action acts[MLX5_HW_MAX_ACTS];
+} __rte_cache_aligned;
+
 struct rte_flow_template_table {
LIST_ENTRY(rte_flow_template_table) next;
struct mlx5_flow_group *grp; /* The group rte_flow_template_table uses. 
*/
@@ -1585,6 +1589,15 @@ struct rte_flow_template_table {
uint32_t refcnt; /* Table reference counter. */
struct mlx5_tbl_multi_pattern_ctx mpctx;
struct mlx5dr_matcher_attr matcher_attr;
+   /**
+* Variable length array of containers containing precalculated 
templates of DR actions
+* arrays. This array is allocated at template table creation time and 
contains
+* one container per each queue, per each actions template.
+* Essentially rule_acts is a 2-dimensional array indexed with (AT 
index, queue) pair.
+* Each container will provide a local "queue buffer" to work on for 
flow creation
+* operations when using a given actions template.
+*/
+   struct mlx5_dr_rule_action_container rule_acts[];
 };
 
 static __rte_always_inline struct mlx5dr_matcher *
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 9620b7f576..ef91a23a9b 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -2512,6 +2512,34 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
  "fail to create rte table");
 }
 
+static __rte_always_inline struct mlx5dr_rule_action *
+flow_hw_get_dr_action_buffer(struct mlx5_priv *priv,
+struct rte_flow_template_table *table,
+uint8_t action_template_index,
+uint32_t queue)
+{
+   uint32_t offset = action_template_index * priv->nb_queue + queue;
+
+   return &table->rule_acts[offset].acts[0];
+}
+
+static void
+flow_hw_populate_rule_acts_caches(struct rte_eth_dev *dev,
+ struct rte_flow_template_table *table,
+ uint8_t at_idx)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+   uint32_t q;
+
+   for (q = 0; q < priv->nb_queue; ++q) {
+   struct mlx5dr_rule_action *rule_acts =
+   flow_hw_get_dr_action_buffer(priv, table, 
at_idx, q);
+
+   rte_memcpy(rule_acts, table->ats[at_idx].acts.rule_acts,
+  sizeof(table->ats[at_idx].acts.rule_acts));
+   }
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -2539,6 +2567,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
tbl->ats[i].action_template,
&tbl->mpctx, error))
goto err;
+   flow_hw_populate_rule_acts_caches(dev, tbl, i);
}
ret = mlx5_tbl_multi_pattern_process(dev, tbl, &tbl->mpctx.segments[0],
 
rte_log2_u32(tbl->cfg.attr.nb_flows),
@@ -2928,7 +2957,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
struct mlx5_aso_mtr *aso_mtr;
struct mlx5_multi_pattern_segment *mp_segment = NULL;
 
-   rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * 
at->dr_actions_num);
attr.group = table->grp->group_id;
ft_flag = mlx5_hw_act_flag[!!table->grp->group_id][table->type];
if (table->type == MLX5DR_TABLE_TYPE_FDB) {
@@ -3335,7 +3363,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
.user_data = user_data,
.burst 

[PATCH v2 02/11] net/mlx5/hws: add check for matcher rule update support

2024-02-29 Thread Dariusz Sosnowski
From: Erez Shitrit 

The user wants to know, before trying to update a rule, whether the
matcher that keeps the original rule supports updating at all.
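
A hypothetical caller-side use of the new predicate (only the prototype
added in this patch is assumed; the fallback strategy is illustrative):

    #include <errno.h>
    #include <stdbool.h>

    struct mlx5dr_matcher;
    bool mlx5dr_matcher_is_updatable(struct mlx5dr_matcher *matcher);

    static int
    try_rule_update(struct mlx5dr_matcher *matcher)
    {
            if (!mlx5dr_matcher_is_updatable(matcher))
                    return -ENOTSUP; /* caller falls back to destroy + create */
            /* ... enqueue the actual rule update here ... */
            return 0;
    }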

Signed-off-by: Erez Shitrit 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/hws/mlx5dr.h |  8 
 drivers/net/mlx5/hws/mlx5dr_matcher.c | 12 
 2 files changed, 20 insertions(+)

diff --git a/drivers/net/mlx5/hws/mlx5dr.h b/drivers/net/mlx5/hws/mlx5dr.h
index 8441ae97e9..c5824a6480 100644
--- a/drivers/net/mlx5/hws/mlx5dr.h
+++ b/drivers/net/mlx5/hws/mlx5dr.h
@@ -492,6 +492,14 @@ int mlx5dr_matcher_resize_rule_move(struct mlx5dr_matcher 
*src_matcher,
struct mlx5dr_rule *rule,
struct mlx5dr_rule_attr *attr);
 
+/* Check matcher ability to update existing rules
+ *
+ * @param[in] matcher
+ * The matcher that the rule belongs to.
+ * @return true when the matcher is updatable, false otherwise.
+ */
+bool mlx5dr_matcher_is_updatable(struct mlx5dr_matcher *matcher);
+
 /* Get the size of the rule handle (mlx5dr_rule) to be used on rule creation.
  *
  * @return size in bytes of rule handle struct.
diff --git a/drivers/net/mlx5/hws/mlx5dr_matcher.c 
b/drivers/net/mlx5/hws/mlx5dr_matcher.c
index 8a74a1ed7d..4e4da8e8f6 100644
--- a/drivers/net/mlx5/hws/mlx5dr_matcher.c
+++ b/drivers/net/mlx5/hws/mlx5dr_matcher.c
@@ -1530,6 +1530,18 @@ int mlx5dr_match_template_destroy(struct 
mlx5dr_match_template *mt)
return 0;
 }
 
+bool mlx5dr_matcher_is_updatable(struct mlx5dr_matcher *matcher)
+{
+   if (mlx5dr_table_is_root(matcher->tbl) ||
+   mlx5dr_matcher_req_fw_wqe(matcher) ||
+   mlx5dr_matcher_is_resizable(matcher) ||
+   (!matcher->attr.optimize_using_rule_idx &&
+   !mlx5dr_matcher_is_insert_by_idx(matcher)))
+   return false;
+
+   return true;
+}
+
 static int mlx5dr_matcher_resize_precheck(struct mlx5dr_matcher *src_matcher,
  struct mlx5dr_matcher *dst_matcher)
 {
-- 
2.39.2



[PATCH v2 04/11] net/mlx5: skip the unneeded resource index allocation

2024-02-29 Thread Dariusz Sosnowski
From: Bing Zhao 

The resource index was introduced to decouple the flow rule and its
resources used by hardware steering. This is needed only when a rule
update is supported.

In some cases, the update is not supported on a table (matcher), e.g.:
  * Table is resizable
  * FW gets involved
  * Root table
  * Not index based or optimized (not applicable)

Another case is when only one STE entry is required per rule: an update
is then always atomic, so there is no need for the extra resource
index either.

If the matcher doesn't support rule update, or a rule occupies at most
one STE on this matcher, there is no need to manage resource
index allocation and freeing from the pool.
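
A sketch of the resulting table-creation-time decision, under the
predicates added in the companion patches of this series (this is one
plausible reading of the condition, not the exact driver code):

    #include <stdbool.h>

    struct mlx5dr_matcher;
    bool mlx5dr_matcher_is_updatable(struct mlx5dr_matcher *matcher);
    bool mlx5dr_matcher_is_dependent(struct mlx5dr_matcher *matcher);

    static bool
    table_needs_resource_pool(struct mlx5dr_matcher *matcher)
    {
            /* Only matchers that can update rules needing more than one
             * HW write require a separate resource index pool; otherwise
             * the flow index alone is enough. */
            return mlx5dr_matcher_is_updatable(matcher) &&
                   mlx5dr_matcher_is_dependent(matcher);
    }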

Signed-off-by: Bing Zhao 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5_flow_hw.c | 129 +++-
 1 file changed, 76 insertions(+), 53 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index ef91a23a9b..1fe8f42618 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3383,9 +3383,6 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
flow = mlx5_ipool_zmalloc(table->flow, &flow_idx);
if (!flow)
goto error;
-   mlx5_ipool_malloc(table->resource, &res_idx);
-   if (!res_idx)
-   goto error;
rule_acts = flow_hw_get_dr_action_buffer(priv, table, 
action_template_index, queue);
/*
 * Set the table here in order to know the destination table
@@ -3394,7 +3391,14 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
flow->table = table;
flow->mt_idx = pattern_template_index;
flow->idx = flow_idx;
-   flow->res_idx = res_idx;
+   if (table->resource) {
+   mlx5_ipool_malloc(table->resource, &res_idx);
+   if (!res_idx)
+   goto error;
+   flow->res_idx = res_idx;
+   } else {
+   flow->res_idx = flow_idx;
+   }
/*
 * Set the job type here in order to know if the flow memory
 * should be freed or not when get the result from dequeue.
@@ -3404,11 +3408,10 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
job->user_data = user_data;
rule_attr.user_data = job;
/*
-* Indexed pool returns 1-based indices, but mlx5dr expects 0-based 
indices for rule
-* insertion hints.
+* Indexed pool returns 1-based indices, but mlx5dr expects 0-based 
indices
+* for rule insertion hints.
 */
-   MLX5_ASSERT(res_idx > 0);
-   flow->rule_idx = res_idx - 1;
+   flow->rule_idx = flow->res_idx - 1;
rule_attr.rule_idx = flow->rule_idx;
/*
 * Construct the flow actions based on the input actions.
@@ -3451,12 +3454,12 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
if (likely(!ret))
return (struct rte_flow *)flow;
 error:
-   if (job)
-   flow_hw_job_put(priv, job, queue);
+   if (table->resource && res_idx)
+   mlx5_ipool_free(table->resource, res_idx);
if (flow_idx)
mlx5_ipool_free(table->flow, flow_idx);
-   if (res_idx)
-   mlx5_ipool_free(table->resource, res_idx);
+   if (job)
+   flow_hw_job_put(priv, job, queue);
rte_flow_error_set(error, rte_errno,
   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
   "fail to create rte flow");
@@ -3527,9 +3530,6 @@ flow_hw_async_flow_create_by_index(struct rte_eth_dev 
*dev,
flow = mlx5_ipool_zmalloc(table->flow, &flow_idx);
if (!flow)
goto error;
-   mlx5_ipool_malloc(table->resource, &res_idx);
-   if (!res_idx)
-   goto error;
rule_acts = flow_hw_get_dr_action_buffer(priv, table, 
action_template_index, queue);
/*
 * Set the table here in order to know the destination table
@@ -3538,7 +3538,14 @@ flow_hw_async_flow_create_by_index(struct rte_eth_dev 
*dev,
flow->table = table;
flow->mt_idx = 0;
flow->idx = flow_idx;
-   flow->res_idx = res_idx;
+   if (table->resource) {
+   mlx5_ipool_malloc(table->resource, &res_idx);
+   if (!res_idx)
+   goto error;
+   flow->res_idx = res_idx;
+   } else {
+   flow->res_idx = flow_idx;
+   }
/*
 * Set the job type here in order to know if the flow memory
 * should be freed or not when get the result from dequeue.
@@ -3547,9 +3554,7 @@ flow_hw_async_flow_create_by_index(struct rte_eth_dev 
*dev,
job->flow = flow;
job->user_data = user_data;
rule_attr.user_data = job;
-   /*
-* Set the rule index.
-*/
+   /* Set the rule index. */
flow->rule_idx = rule_index;
rule_attr.rule_idx = flow->rule_idx;
/*
@@ -3585,12 +3590,12 @@ flow_hw_async_flow_create_

[PATCH v2 03/11] net/mlx5/hws: add check if matcher contains complex rules

2024-02-29 Thread Dariusz Sosnowski
From: Erez Shitrit 

The function returns true if the matcher can contain a complicated rule,
that is, a rule that needs more than one write to the HW in order to
be inserted.
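
A reduced model of the predicate, following the structure visible in the
diff below (field names and the loop bound are illustrative stand-ins
for the real matcher layout):

    #include <stdbool.h>
    #include <stdint.h>

    struct at_sketch { bool need_dep_write; };

    struct matcher_sketch {
            uint8_t max_stes;   /* action_ste.max_stes in the driver */
            bool req_fw_wqe;    /* mlx5dr_matcher_req_fw_wqe() result */
            uint8_t num_of_at;  /* bound actions templates */
            struct at_sketch *at;
    };

    static bool
    matcher_is_dependent(const struct matcher_sketch *m)
    {
            uint8_t i;

            if (m->max_stes || m->req_fw_wqe)
                    return true;
            for (i = 0; i < m->num_of_at; i++)
                    if (m->at[i].need_dep_write)
                            return true;
            return false;
    }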

Signed-off-by: Erez Shitrit 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/hws/mlx5dr.h |  8 
 drivers/net/mlx5/hws/mlx5dr_action.c  |  6 ++
 drivers/net/mlx5/hws/mlx5dr_action.h  |  2 ++
 drivers/net/mlx5/hws/mlx5dr_matcher.c | 17 +
 4 files changed, 33 insertions(+)

diff --git a/drivers/net/mlx5/hws/mlx5dr.h b/drivers/net/mlx5/hws/mlx5dr.h
index c5824a6480..36ecccf9ac 100644
--- a/drivers/net/mlx5/hws/mlx5dr.h
+++ b/drivers/net/mlx5/hws/mlx5dr.h
@@ -500,6 +500,14 @@ int mlx5dr_matcher_resize_rule_move(struct mlx5dr_matcher 
*src_matcher,
  */
 bool mlx5dr_matcher_is_updatable(struct mlx5dr_matcher *matcher);
 
+/* Check if the matcher might contain rules that need a complex structure
+ *
+ * @param[in] matcher
+ * The matcher to check.
+ * @return true when the matcher contains such rules, false otherwise.
+ */
+bool mlx5dr_matcher_is_dependent(struct mlx5dr_matcher *matcher);
+
 /* Get the size of the rule handle (mlx5dr_rule) to be used on rule creation.
  *
  * @return size in bytes of rule handle struct.
diff --git a/drivers/net/mlx5/hws/mlx5dr_action.c 
b/drivers/net/mlx5/hws/mlx5dr_action.c
index 96cad553aa..084d4d606e 100644
--- a/drivers/net/mlx5/hws/mlx5dr_action.c
+++ b/drivers/net/mlx5/hws/mlx5dr_action.c
@@ -3686,6 +3686,7 @@ int mlx5dr_action_template_process(struct 
mlx5dr_action_template *at)
setter->flags |= ASF_SINGLE1 | ASF_REMOVE;
setter->set_single = 
&mlx5dr_action_setter_ipv6_route_ext_pop;
setter->idx_single = i;
+   at->need_dep_write = true;
break;
 
case MLX5DR_ACTION_TYP_PUSH_IPV6_ROUTE_EXT:
@@ -3712,6 +3713,7 @@ int mlx5dr_action_template_process(struct 
mlx5dr_action_template *at)
setter->set_double = 
&mlx5dr_action_setter_ipv6_route_ext_mhdr;
setter->idx_double = i;
setter->extra_data = 2;
+   at->need_dep_write = true;
break;
 
case MLX5DR_ACTION_TYP_MODIFY_HDR:
@@ -3720,6 +3722,7 @@ int mlx5dr_action_template_process(struct 
mlx5dr_action_template *at)
setter->flags |= ASF_DOUBLE | ASF_MODIFY;
setter->set_double = 
&mlx5dr_action_setter_modify_header;
setter->idx_double = i;
+   at->need_dep_write = true;
break;
 
case MLX5DR_ACTION_TYP_ASO_METER:
@@ -3747,6 +3750,7 @@ int mlx5dr_action_template_process(struct 
mlx5dr_action_template *at)
setter->flags |= ASF_DOUBLE | ASF_INSERT;
setter->set_double = &mlx5dr_action_setter_insert_ptr;
setter->idx_double = i;
+   at->need_dep_write = true;
break;
 
case MLX5DR_ACTION_TYP_REFORMAT_L2_TO_TNL_L3:
@@ -3757,6 +3761,7 @@ int mlx5dr_action_template_process(struct 
mlx5dr_action_template *at)
setter->idx_double = i;
setter->set_single = &mlx5dr_action_setter_common_decap;
setter->idx_single = i;
+   at->need_dep_write = true;
break;
 
case MLX5DR_ACTION_TYP_REFORMAT_TNL_L3_TO_L2:
@@ -3765,6 +3770,7 @@ int mlx5dr_action_template_process(struct 
mlx5dr_action_template *at)
setter->flags |= ASF_DOUBLE | ASF_MODIFY | ASF_INSERT;
setter->set_double = &mlx5dr_action_setter_tnl_l3_to_l2;
setter->idx_double = i;
+   at->need_dep_write = true;
break;
 
case MLX5DR_ACTION_TYP_TAG:
diff --git a/drivers/net/mlx5/hws/mlx5dr_action.h 
b/drivers/net/mlx5/hws/mlx5dr_action.h
index 064c18a90c..57e059a572 100644
--- a/drivers/net/mlx5/hws/mlx5dr_action.h
+++ b/drivers/net/mlx5/hws/mlx5dr_action.h
@@ -151,6 +151,8 @@ struct mlx5dr_action_template {
uint8_t num_of_action_stes;
uint8_t num_actions;
uint8_t only_term;
+   /* indicates rule might require dependent wqe */
+   bool need_dep_write;
uint32_t flags;
 };
 
diff --git a/drivers/net/mlx5/hws/mlx5dr_matcher.c 
b/drivers/net/mlx5/hws/mlx5dr_matcher.c
index 4e4da8e8f6..1c64abfa57 100644
--- a/drivers/net/mlx5/hws/mlx5dr_matcher.c
+++ b/drivers/net/mlx5/hws/mlx5dr_matcher.c
@@ -1542,6 +1542,23 @@ bool mlx5dr_matcher_is_updatable(struct mlx5dr_matcher 
*matcher)
return true;
 }
 
+bool mlx5dr_matcher_is_dependent(struct mlx5dr_matcher *matcher)
+{
+   int i;
+
+   if (matcher->action_ste.max_stes || mlx5dr_matcher_req_fw_wqe(matcher))
+   return tru

[PATCH v2 05/11] net/mlx5: remove action params from job

2024-02-29 Thread Dariusz Sosnowski
mlx5_hw_q_job struct held references to buffers which contained:

- modify header commands array,
- encap/decap data buffer,
- IPv6 routing data buffer.

These buffers were passed as parameters to the HWS layer during rule
creation. They were needed only during the call to the HWS layer,
when a flow operation is enqueued (i.e. mlx5dr_rule_create()).
After the operation is enqueued, the data stored there can be safely
discarded, so it is not required to live for the whole lifecycle of a job.

This patch removes references to these buffers from mlx5_hw_q_job
and removes relevant allocations to reduce job memory footprint.
Buffers stored per job are replaced with stack-allocated ones,
contained in the mlx5_flow_hw_action_params struct.
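
A sketch of the new lifetime, with sizes and the enqueue step reduced to
placeholders (the array bounds below are illustrative, not the real
MLX5_* constants):

    #include <stdint.h>

    struct modification_cmd { uint32_t data0, data1; };

    /* Mirrors mlx5_flow_hw_action_params from the diff below. */
    struct action_params {
            struct modification_cmd mhdr_cmd[17];
            uint8_t encap_data[132];
            uint8_t ipv6_push_data[256];
    };

    static void
    enqueue_one_flow(void)
    {
            struct action_params ap; /* stack scratch: no zeroing, no pool */

            /* ... construct actions into ap and hand the buffers to the
             * rule-create call; once the operation is enqueued, ap simply
             * goes out of scope. */
            (void)ap;
    }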

Signed-off-by: Dariusz Sosnowski 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5.h |   3 -
 drivers/net/mlx5/mlx5_flow.h|  10 +++
 drivers/net/mlx5/mlx5_flow_hw.c | 120 ++--
 3 files changed, 63 insertions(+), 70 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f11a0181b8..42dc312a87 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -401,9 +401,6 @@ struct mlx5_hw_q_job {
const void *action; /* Indirect action attached to the job. */
};
void *user_data; /* Job user data. */
-   uint8_t *encap_data; /* Encap data. */
-   uint8_t *push_data; /* IPv6 routing push data. */
-   struct mlx5_modification_cmd *mhdr_cmd;
struct rte_flow_item *items;
union {
struct {
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 02af0a08fa..9ed356e1c2 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1306,6 +1306,16 @@ typedef int
 
 #define MLX5_MHDR_MAX_CMD ((MLX5_MAX_MODIFY_NUM) * 2 + 1)
 
+/** Container for flow action data constructed during flow rule creation. */
+struct mlx5_flow_hw_action_params {
+   /** Array of constructed modify header commands. */
+   struct mlx5_modification_cmd mhdr_cmd[MLX5_MHDR_MAX_CMD];
+   /** Constructed encap/decap data buffer. */
+   uint8_t encap_data[MLX5_ENCAP_MAX_LEN];
+   /** Constructed IPv6 routing data buffer. */
+   uint8_t ipv6_push_data[MLX5_PUSH_MAX_LEN];
+};
+
 /* rte flow action translate to DR action struct. */
 struct mlx5_action_construct_data {
LIST_ENTRY(mlx5_action_construct_data) next;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 1fe8f42618..a87fe4d07a 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -158,7 +158,7 @@ static int flow_hw_translate_group(struct rte_eth_dev *dev,
   struct rte_flow_error *error);
 static __rte_always_inline int
 flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
-  struct mlx5_hw_q_job *job,
+  struct mlx5_modification_cmd *mhdr_cmd,
   struct mlx5_action_construct_data *act_data,
   const struct mlx5_hw_actions *hw_acts,
   const struct rte_flow_action *action);
@@ -2812,7 +2812,7 @@ flow_hw_mhdr_cmd_is_nop(const struct 
mlx5_modification_cmd *cmd)
  *0 on success, negative value otherwise and rte_errno is set.
  */
 static __rte_always_inline int
-flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
+flow_hw_modify_field_construct(struct mlx5_modification_cmd *mhdr_cmd,
   struct mlx5_action_construct_data *act_data,
   const struct mlx5_hw_actions *hw_acts,
   const struct rte_flow_action *action)
@@ -2871,7 +2871,7 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 
if (i >= act_data->modify_header.mhdr_cmds_end)
return -1;
-   if (flow_hw_mhdr_cmd_is_nop(&job->mhdr_cmd[i])) {
+   if (flow_hw_mhdr_cmd_is_nop(&mhdr_cmd[i])) {
++i;
continue;
}
@@ -2891,7 +2891,7 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
mhdr_action->dst.field == RTE_FLOW_FIELD_IPV6_DSCP)
data <<= MLX5_IPV6_HDR_DSCP_SHIFT;
data = (data & mask) >> off_b;
-   job->mhdr_cmd[i++].data1 = rte_cpu_to_be_32(data);
+   mhdr_cmd[i++].data1 = rte_cpu_to_be_32(data);
++field;
} while (field->size);
return 0;
@@ -2905,8 +2905,10 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
- * @param[in] job
- *   Pointer to job descriptor.
+ * @param[in] flow
+ *   Pointer to flow structure.
+ * @param[in] ap
+ *   Pointer to container for temporarily constructed actions' parameters.
  * @param[in] hw_acts
  *   Pointer to translated action

[PATCH v2 07/11] net/mlx5: remove updated flow from job

2024-02-29 Thread Dariusz Sosnowski
mlx5_hw_q_job struct held a reference to a temporary flow rule struct,
used during flow rule update operations. It served as a container for
flow actions data calculated during actions construction.
After a flow rule update succeeds, data from the temporary flow rule
is copied over to the original flow rule.

Although access to this temporary flow rule struct is required
during both the operation enqueue step and the completion polling step,
there can be only one ongoing flow update operation for a given
flow rule. As a result, there is no need to store it per job.

This patch removes all references to temporary flow rule struct
stored in mlx5_hw_q_job and removes relevant allocations to reduce
job memory footprint.
The temporary flow rule struct stored per job is replaced with
(see the sketch below):

- If the table is not resizable - an array of rte_flow_hw_aux structs,
  stored in the template table. This array holds one entry per
  flow rule, each containing the mentioned temporary struct.
- If the table is resizable - an additional rte_flow_hw_aux struct,
  allocated alongside rte_flow_hw in the resizable ipool.
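
A reduced model of the lookup (the pointer arithmetic is self-consistent
within this sketch only; the real helper is mlx5_flow_hw_aux() in the
diff below):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct flow_aux { int placeholder; };

    struct table_model {
            bool resizable;
            struct flow_aux *flow_aux; /* per-flow array, non-resizable case */
    };

    struct flow_model {
            struct table_model *table;
            uint32_t idx;       /* 1-based ipool index */
            size_t rule_size;   /* mlx5dr rule handle size */
            /* uint8_t rule[]; trails the struct in the real layout */
    };

    static struct flow_aux *
    flow_aux_of(struct flow_model *f)
    {
            if (f->table->resizable) /* aux trails the rule handle */
                    return (struct flow_aux *)((uint8_t *)f +
                                    sizeof(*f) + f->rule_size);
            return &f->table->flow_aux[f->idx - 1];
    }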

Signed-off-by: Dariusz Sosnowski 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5.h |   1 -
 drivers/net/mlx5/mlx5_flow.h|   7 +++
 drivers/net/mlx5/mlx5_flow_hw.c | 100 ++--
 3 files changed, 89 insertions(+), 19 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 1ca6223f95..2e2504f20f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -407,7 +407,6 @@ struct mlx5_hw_q_job {
/* Data extracted from hardware */
void *hw;
} query;
-   struct rte_flow_hw *upd_flow; /* Flow with updated values. */
 };
 
 /* HW steering job descriptor LIFO pool. */
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 436d1391bc..a204f94624 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1293,6 +1293,12 @@ struct rte_flow_hw {
uint8_t rule[]; /* HWS layer data struct. */
 } __rte_packed;
 
+/** Auxiliary data stored per flow which is not required to be stored in main 
flow structure. */
+struct rte_flow_hw_aux {
+   /** Placeholder flow struct used during flow rule update operation. */
+   struct rte_flow_hw upd_flow;
+};
+
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
@@ -1601,6 +1607,7 @@ struct rte_flow_template_table {
/* Action templates bind to the table. */
struct mlx5_hw_action_template ats[MLX5_HW_TBL_MAX_ACTION_TEMPLATE];
struct mlx5_indexed_pool *flow; /* The table's flow ipool. */
+   struct rte_flow_hw_aux *flow_aux; /**< Auxiliary data stored per flow. 
*/
struct mlx5_indexed_pool *resource; /* The table's resource ipool. */
struct mlx5_flow_template_table_cfg cfg;
uint32_t type; /* Flow table type RX/TX/FDB. */
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index ab67dc139e..cbbf87b999 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -79,6 +79,66 @@ struct mlx5_indlst_legacy {
 #define MLX5_CONST_ENCAP_ITEM(encap_type, ptr) \
 (((const struct encap_type *)(ptr))->definition)
 
+/**
+ * Returns the size of a struct with a following layout:
+ *
+ * @code{.c}
+ * struct rte_flow_hw {
+ * // rte_flow_hw fields
+ * uint8_t rule[mlx5dr_rule_get_handle_size()];
+ * };
+ * @endcode
+ *
+ * Such struct is used as a basic container for HW Steering flow rule.
+ */
+static size_t
+mlx5_flow_hw_entry_size(void)
+{
+   return sizeof(struct rte_flow_hw) + mlx5dr_rule_get_handle_size();
+}
+
+/**
+ * Returns the size of "auxed" rte_flow_hw structure which is assumed to be 
laid out as follows:
+ *
+ * @code{.c}
+ * struct {
+ * struct rte_flow_hw {
+ * // rte_flow_hw fields
+ * uint8_t rule[mlx5dr_rule_get_handle_size()];
+ * } flow;
+ * struct rte_flow_hw_aux aux;
+ * };
+ * @endcode
+ *
+ * Such struct is used whenever rte_flow_hw_aux cannot be allocated separately 
from the rte_flow_hw
+ * e.g., when table is resizable.
+ */
+static size_t
+mlx5_flow_hw_auxed_entry_size(void)
+{
+   size_t rule_size = mlx5dr_rule_get_handle_size();
+
+   return sizeof(struct rte_flow_hw) + rule_size + sizeof(struct 
rte_flow_hw_aux);
+}
+
+/**
+ * Returns a valid pointer to rte_flow_hw_aux associated with given rte_flow_hw
+ * depending on template table configuration.
+ */
+static __rte_always_inline struct rte_flow_hw_aux *
+mlx5_flow_hw_aux(uint16_t port_id, struct rte_flow_hw *flow)
+{
+   struct rte_flow_template_table *table = flow->table;
+
+   if (rte_flow_template_table_resizable(port_id, &table->cfg.attr)) {
+   size_t offset = sizeof(struct rte_flow_hw) + 
mlx5dr_rule_get_handle_size();
+
+   return RTE_PTR_ADD(flow, offset);
+   } else {
+   return &table->flow_aux[flow->idx - 1];
+   }
+}
+
 static int
 mlx5_tbl_multi_p

[PATCH v2 06/11] net/mlx5: remove flow pattern from job

2024-02-29 Thread Dariusz Sosnowski
mlx5_hw_q_job struct held a reference to a temporary flow rule pattern
and contained temporary REPRESENTED_PORT and TAG item structs.
They are used whenever the flow rule pattern provided by the
application must be prepended with one of these items.
If prepending is required, the flow rule pattern is copied over to a
temporary buffer and a new item is added internally by the PMD.
The constructed buffer is passed to the HWS layer when the flow create
operation is enqueued.
After the operation is enqueued, the temporary flow pattern can be
safely discarded, so there is no need to store it during
the whole lifecycle of mlx5_hw_q_job.

This patch removes all references to flow rule pattern and items stored
inside mlx5_hw_q_job and removes relevant allocations to reduce job
memory footprint.
Temporary pattern and items stored per job are replaced with
stack-allocated ones, contained in the mlx5_flow_hw_pattern_params struct.
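
A sketch of the prepend step against the public rte_flow types (the
buffer bound and helper name are illustrative; the real logic lives in
flow_hw_get_rule_items() in the diff below):

    #include <string.h>
    #include <rte_flow.h>

    #define HW_MAX_ITEMS 8 /* illustrative MLX5_HW_MAX_ITEMS */

    struct pattern_params {
            struct rte_flow_item items[HW_MAX_ITEMS];
            struct rte_flow_item_ethdev port_spec;
    };

    static const struct rte_flow_item *
    prepend_port_item(struct pattern_params *pp, uint16_t port_id,
                      const struct rte_flow_item *items, unsigned int nb_items)
    {
            if (nb_items + 1 > HW_MAX_ITEMS)
                    return NULL;
            pp->port_spec = (struct rte_flow_item_ethdev){ .port_id = port_id };
            pp->items[0] = (struct rte_flow_item){
                    .type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
                    .spec = &pp->port_spec,
            };
            /* nb_items includes the END item, as in the driver */
            memcpy(&pp->items[1], items, nb_items * sizeof(*items));
            return pp->items;
    }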

Signed-off-by: Dariusz Sosnowski 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5.h | 17 ---
 drivers/net/mlx5/mlx5_flow.h| 10 +++
 drivers/net/mlx5/mlx5_flow_hw.c | 51 ++---
 3 files changed, 37 insertions(+), 41 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 42dc312a87..1ca6223f95 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -401,17 +401,12 @@ struct mlx5_hw_q_job {
const void *action; /* Indirect action attached to the job. */
};
void *user_data; /* Job user data. */
-   struct rte_flow_item *items;
-   union {
-   struct {
-   /* User memory for query output */
-   void *user;
-   /* Data extracted from hardware */
-   void *hw;
-   } __rte_packed query;
-   struct rte_flow_item_ethdev port_spec;
-   struct rte_flow_item_tag tag_spec;
-   } __rte_packed;
+   struct {
+   /* User memory for query output */
+   void *user;
+   /* Data extracted from hardware */
+   void *hw;
+   } query;
struct rte_flow_hw *upd_flow; /* Flow with updated values. */
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 9ed356e1c2..436d1391bc 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1316,6 +1316,16 @@ struct mlx5_flow_hw_action_params {
uint8_t ipv6_push_data[MLX5_PUSH_MAX_LEN];
 };
 
+/** Container for dynamically generated flow items used during flow rule 
creation. */
+struct mlx5_flow_hw_pattern_params {
+   /** Array of dynamically generated flow items. */
+   struct rte_flow_item items[MLX5_HW_MAX_ITEMS];
+   /** Temporary REPRESENTED_PORT item generated by PMD. */
+   struct rte_flow_item_ethdev port_spec;
+   /** Temporary TAG item generated by PMD. */
+   struct rte_flow_item_tag tag_spec;
+};
+
 /* rte flow action translate to DR action struct. */
 struct mlx5_action_construct_data {
LIST_ENTRY(mlx5_action_construct_data) next;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index a87fe4d07a..ab67dc139e 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3272,44 +3272,44 @@ flow_hw_get_rule_items(struct rte_eth_dev *dev,
   const struct rte_flow_template_table *table,
   const struct rte_flow_item items[],
   uint8_t pattern_template_index,
-  struct mlx5_hw_q_job *job)
+  struct mlx5_flow_hw_pattern_params *pp)
 {
struct rte_flow_pattern_template *pt = 
table->its[pattern_template_index];
 
/* Only one implicit item can be added to flow rule pattern. */
MLX5_ASSERT(!pt->implicit_port || !pt->implicit_tag);
-   /* At least one item was allocated in job descriptor for items. */
+   /* At least one item was allocated in pattern params for items. */
MLX5_ASSERT(MLX5_HW_MAX_ITEMS >= 1);
if (pt->implicit_port) {
if (pt->orig_item_nb + 1 > MLX5_HW_MAX_ITEMS) {
rte_errno = ENOMEM;
return NULL;
}
-   /* Set up represented port item in job descriptor. */
-   job->port_spec = (struct rte_flow_item_ethdev){
+   /* Set up represented port item in pattern params. */
+   pp->port_spec = (struct rte_flow_item_ethdev){
.port_id = dev->data->port_id,
};
-   job->items[0] = (struct rte_flow_item){
+   pp->items[0] = (struct rte_flow_item){
.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
-   .spec = &job->port_spec,
+   .spec = &pp->port_spec,
};
-   rte_memcpy(&job->items[1], items, sizeof(*

[PATCH v2 09/11] net/mlx5: move rarely used flow fields outside

2024-02-29 Thread Dariusz Sosnowski
Some of the flow fields are either not always required
or are used very rarely, e.g.:

- AGE action reference,
- direct METER/METER_MARK action reference,
- matcher selector for resizable tables.

This patch moves these fields to the rte_flow_hw_aux struct in order to
reduce the overall size of the flow struct, shrinking the working set
for the most common use cases.
This reduces the frequency of cache invalidation during async flow
operation processing.
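
The layout goal can be stated as a compile-time check. The model below
only mirrors the sizes given in the diff's comments (56 B of flow header
plus a 72 B mlx5dr rule handle within two 64 B cache lines); field names
and exact ordering are illustrative:

    #include <assert.h>
    #include <stdint.h>

    struct flow_layout_model {
            void *table, *user_data;                    /* 16 B */
            uint32_t idx, res_idx, rule_idx, fate_type; /* 16 B */
            uint8_t operation_type, mt_idx;             /*  2 B */
            uint32_t cnt_id;                            /*  4 B */
            void *fate;                                 /*  8 B: jump/hrxq */
            uint8_t padding[10];
    } __attribute__((packed));

    static_assert(sizeof(struct flow_layout_model) == 56,
                  "56 B header + 72 B rule handle == 128 B (2 cache lines)");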

Signed-off-by: Dariusz Sosnowski 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5_flow.h|  61 +++-
 drivers/net/mlx5/mlx5_flow_hw.c | 121 
 2 files changed, 138 insertions(+), 44 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 46d8ce1775..e8f4d2cb16 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1283,31 +1283,60 @@ enum {
 #pragma GCC diagnostic ignored "-Wpedantic"
 #endif
 
-/* HWS flow struct. */
+/** HWS flow struct. */
 struct rte_flow_hw {
-   uint32_t idx; /* Flow index from indexed pool. */
-   uint32_t res_idx; /* Resource index from indexed pool. */
-   uint32_t fate_type; /* Fate action type. */
+   /** The table the flow is allocated from. */
+   struct rte_flow_template_table *table;
+   /** Application's private data passed to enqueued flow operation. */
+   void *user_data;
+   /** Flow index from indexed pool. */
+   uint32_t idx;
+   /** Resource index from indexed pool. */
+   uint32_t res_idx;
+   /** HWS flow rule index passed to mlx5dr. */
+   uint32_t rule_idx;
+   /** Fate action type. */
+   uint32_t fate_type;
+   /** Ongoing flow operation type. */
+   uint8_t operation_type;
+   /** Index of pattern template this flow is based on. */
+   uint8_t mt_idx;
+
+   /** COUNT action index. */
+   cnt_id_t cnt_id;
union {
-   /* Jump action. */
+   /** Jump action. */
struct mlx5_hw_jump_action *jump;
-   struct mlx5_hrxq *hrxq; /* TIR action. */
+   /** TIR action. */
+   struct mlx5_hrxq *hrxq;
};
-   struct rte_flow_template_table *table; /* The table flow allcated from. 
*/
-   uint8_t mt_idx;
-   uint8_t matcher_selector:1;
+
+   /**
+* Padding for alignment to 56 bytes.
+* Since mlx5dr rule is 72 bytes, whole flow is contained within 128 B 
(2 cache lines).
+* This space is reserved for future additions to flow struct.
+*/
+   uint8_t padding[10];
+   /** HWS layer data struct. */
+   uint8_t rule[];
+} __rte_packed;
+
+/** Auxiliary data fields that are updatable. */
+struct rte_flow_hw_aux_fields {
+   /** AGE action index. */
uint32_t age_idx;
-   cnt_id_t cnt_id;
+   /** Direct meter (METER or METER_MARK) action index. */
uint32_t mtr_id;
-   uint32_t rule_idx;
-   uint8_t operation_type; /**< Ongoing flow operation type. */
-   void *user_data; /**< Application's private data passed to enqueued 
flow operation. */
-   uint8_t padding[1]; /**< Padding for proper alignment of mlx5dr rule 
struct. */
-   uint8_t rule[]; /* HWS layer data struct. */
-} __rte_packed;
+};
 
 /** Auxiliary data stored per flow which is not required to be stored in main 
flow structure. */
 struct rte_flow_hw_aux {
+   /** Auxiliary fields associated with the original flow. */
+   struct rte_flow_hw_aux_fields orig;
+   /** Auxiliary fields associated with the updated flow. */
+   struct rte_flow_hw_aux_fields upd;
+   /** Index of resizable matcher associated with this flow. */
+   uint8_t matcher_selector;
/** Placeholder flow struct used during flow rule update operation. */
struct rte_flow_hw upd_flow;
 };
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index dc0b4bff3d..025f04ddde 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -139,6 +139,50 @@ mlx5_flow_hw_aux(uint16_t port_id, struct rte_flow_hw 
*flow)
}
 }
 
+static __rte_always_inline void
+mlx5_flow_hw_aux_set_age_idx(struct rte_flow_hw *flow,
+struct rte_flow_hw_aux *aux,
+uint32_t age_idx)
+{
+   /*
+* Only when creating a flow rule, the type will be set explicitly.
+* Or else, it should be none in the rule update case.
+*/
+   if (unlikely(flow->operation_type == MLX5_FLOW_HW_FLOW_OP_TYPE_UPDATE))
+   aux->upd.age_idx = age_idx;
+   else
+   aux->orig.age_idx = age_idx;
+}
+
+static __rte_always_inline uint32_t
+mlx5_flow_hw_aux_get_age_idx(struct rte_flow_hw *flow, struct rte_flow_hw_aux 
*aux)
+{
+   if (unlikely(flow->operation_type == MLX5_FLOW_HW_FLOW_OP_TYPE_UPDATE))
+   return aux->upd.age_idx;
+   else
+   

[PATCH v2 08/11] net/mlx5: use flow as operation container

2024-02-29 Thread Dariusz Sosnowski
While processing async flow operations in mlx5 PMD,
mlx5_hw_q_job struct is used to hold the following data
related to the ongoing operation:

- operation type,
- user data,
- flow reference.

The job itself is then passed to the mlx5dr layer as its "user data".
Other types of data required during flow operation processing
are accessed through the flow itself.

Since most of the accessed fields are in the flow struct itself,
the operation type and user data can be moved to the flow struct.
This removes unnecessary memory indirection and reduces the memory
footprint of flow operation processing. It decreases cache pressure
and, as a result, can increase processing throughput.

This patch removes mlx5_hw_q_job from async flow operation
processing; from now on, the flow itself represents the ongoing
operation. Async operations on indirect actions still use jobs.
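
With jobs out of the flow path, queue depth is accounted as the sum of
consumed indirect-action jobs and in-flight flow operations; a sketch of
the arithmetic, matching flow_hw_q_pending() in the diff below:

    #include <assert.h>
    #include <stdint.h>

    struct hw_q_model {
            uint32_t size;             /* job LIFO capacity */
            uint32_t job_idx;          /* free jobs remaining */
            uint32_t ongoing_flow_ops; /* flow ops in flight */
    };

    static inline uint32_t
    q_pending(const struct hw_q_model *q)
    {
            assert(q->size >= q->job_idx);
            return (q->size - q->job_idx) + q->ongoing_flow_ops;
    }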

Signed-off-by: Dariusz Sosnowski 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5.h |   8 +-
 drivers/net/mlx5/mlx5_flow.h|  13 ++
 drivers/net/mlx5/mlx5_flow_hw.c | 210 +++-
 3 files changed, 116 insertions(+), 115 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 2e2504f20f..8acb79e7bb 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -396,10 +396,7 @@ enum mlx5_hw_indirect_type {
 struct mlx5_hw_q_job {
uint32_t type; /* Job type. */
uint32_t indirect_type;
-   union {
-   struct rte_flow_hw *flow; /* Flow attached to the job. */
-   const void *action; /* Indirect action attached to the job. */
-   };
+   const void *action; /* Indirect action attached to the job. */
void *user_data; /* Job user data. */
struct {
/* User memory for query output */
@@ -412,7 +409,8 @@ struct mlx5_hw_q_job {
 /* HW steering job descriptor LIFO pool. */
 struct mlx5_hw_q {
uint32_t job_idx; /* Free job index. */
-   uint32_t size; /* LIFO size. */
+   uint32_t size; /* Job LIFO queue size. */
+   uint32_t ongoing_flow_ops; /* Number of ongoing flow operations. */
struct mlx5_hw_q_job **job; /* LIFO header. */
struct rte_ring *indir_cq; /* Indirect action SW completion queue. */
struct rte_ring *indir_iq; /* Indirect action SW in progress queue. */
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index a204f94624..46d8ce1775 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1269,6 +1269,16 @@ typedef uint32_t cnt_id_t;
 
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 
+enum {
+   MLX5_FLOW_HW_FLOW_OP_TYPE_NONE,
+   MLX5_FLOW_HW_FLOW_OP_TYPE_CREATE,
+   MLX5_FLOW_HW_FLOW_OP_TYPE_DESTROY,
+   MLX5_FLOW_HW_FLOW_OP_TYPE_UPDATE,
+   MLX5_FLOW_HW_FLOW_OP_TYPE_RSZ_TBL_CREATE,
+   MLX5_FLOW_HW_FLOW_OP_TYPE_RSZ_TBL_DESTROY,
+   MLX5_FLOW_HW_FLOW_OP_TYPE_RSZ_TBL_MOVE,
+};
+
 #ifdef PEDANTIC
 #pragma GCC diagnostic ignored "-Wpedantic"
 #endif
@@ -1290,6 +1300,9 @@ struct rte_flow_hw {
cnt_id_t cnt_id;
uint32_t mtr_id;
uint32_t rule_idx;
+   uint8_t operation_type; /**< Ongoing flow operation type. */
+   void *user_data; /**< Application's private data passed to enqueued 
flow operation. */
+   uint8_t padding[1]; /**< Padding for proper alignment of mlx5dr rule 
struct. */
uint8_t rule[]; /* HWS layer data struct. */
 } __rte_packed;
 
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index cbbf87b999..dc0b4bff3d 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -312,6 +312,31 @@ static const struct rte_flow_item_eth 
ctrl_rx_eth_bcast_spec = {
.hdr.ether_type = 0,
 };
 
+static inline uint32_t
+flow_hw_q_pending(struct mlx5_priv *priv, uint32_t queue)
+{
+   struct mlx5_hw_q *q = &priv->hw_q[queue];
+
+   MLX5_ASSERT(q->size >= q->job_idx);
+   return (q->size - q->job_idx) + q->ongoing_flow_ops;
+}
+
+static inline void
+flow_hw_q_inc_flow_ops(struct mlx5_priv *priv, uint32_t queue)
+{
+   struct mlx5_hw_q *q = &priv->hw_q[queue];
+
+   q->ongoing_flow_ops++;
+}
+
+static inline void
+flow_hw_q_dec_flow_ops(struct mlx5_priv *priv, uint32_t queue)
+{
+   struct mlx5_hw_q *q = &priv->hw_q[queue];
+
+   q->ongoing_flow_ops--;
+}
+
 static __rte_always_inline struct mlx5_hw_q_job *
 flow_hw_job_get(struct mlx5_priv *priv, uint32_t queue)
 {
@@ -3426,20 +3451,15 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
struct mlx5_flow_hw_action_params ap;
struct mlx5_flow_hw_pattern_params pp;
struct rte_flow_hw *flow = NULL;
-   struct mlx5_hw_q_job *job = NULL;
const struct rte_flow_item *rule_items;
uint32_t flow_idx = 0;
uint32_t res_idx = 0;
int ret;
 
if (unlikely((!dev->data->dev_started))) {
-   rte_errno = EINVAL;
-   g

[PATCH v2 10/11] net/mlx5: reuse flow fields

2024-02-29 Thread Dariusz Sosnowski
Each time a flow is allocated in mlx5 PMD the whole buffer,
both rte_flow_hw and mlx5dr_rule parts, are zeroed.
This introduces some wasted work because:

- mlx5dr layer does not assume that mlx5dr_rule must be initialized,
- flow action translation in mlx5 PMD does not need most of the fields
  of rte_flow_hw to be zeroed.

To reduce this wasted work, this patch introduces a flags field in the
flow definition. Each flow field which is not always initialized
during flow creation will have a corresponding flag set if its value is
valid (in other words, if it was set during flow creation).
Utilizing this mechanism allows the PMD to (see the sketch below):

- remove zeroing from flow allocation,
- access some fields (especially from rte_flow_hw_aux) if and only if
  the corresponding flag is set.
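
A sketch of the consumer side, with local stand-ins for the flag values
defined in the diff below; the release helpers are elided:

    #include <stdint.h>

    #define FLAG_CNT_ID    (1u << 0)
    #define FLAG_FATE_JUMP (1u << 1)
    #define FLAG_FATE_HRXQ (1u << 2)

    struct flow_model {
            uint32_t flags;
            void *jump;
            void *hrxq;
            uint32_t cnt_id;
    };

    static void
    flow_release_sketch(struct flow_model *flow)
    {
            if (flow->flags & FLAG_FATE_JUMP) {
                    /* release flow->jump */
            } else if (flow->flags & FLAG_FATE_HRXQ) {
                    /* release flow->hrxq */
            }
            if (flow->flags & FLAG_CNT_ID) {
                    /* return flow->cnt_id to the counter pool */
            }
            /* fields without a set bit were never written - and never read */
    }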

Signed-off-by: Dariusz Sosnowski 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5_flow.h| 24 -
 drivers/net/mlx5/mlx5_flow_hw.c | 93 +
 2 files changed, 83 insertions(+), 34 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index e8f4d2cb16..db65825eab 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1279,6 +1279,26 @@ enum {
MLX5_FLOW_HW_FLOW_OP_TYPE_RSZ_TBL_MOVE,
 };
 
+enum {
+   MLX5_FLOW_HW_FLOW_FLAG_CNT_ID = RTE_BIT32(0),
+   MLX5_FLOW_HW_FLOW_FLAG_FATE_JUMP = RTE_BIT32(1),
+   MLX5_FLOW_HW_FLOW_FLAG_FATE_HRXQ = RTE_BIT32(2),
+   MLX5_FLOW_HW_FLOW_FLAG_AGE_IDX = RTE_BIT32(3),
+   MLX5_FLOW_HW_FLOW_FLAG_MTR_ID = RTE_BIT32(4),
+   MLX5_FLOW_HW_FLOW_FLAG_MATCHER_SELECTOR = RTE_BIT32(5),
+   MLX5_FLOW_HW_FLOW_FLAG_UPD_FLOW = RTE_BIT32(6),
+};
+
+#define MLX5_FLOW_HW_FLOW_FLAGS_ALL ( \
+   MLX5_FLOW_HW_FLOW_FLAG_CNT_ID | \
+   MLX5_FLOW_HW_FLOW_FLAG_FATE_JUMP | \
+   MLX5_FLOW_HW_FLOW_FLAG_FATE_HRXQ | \
+   MLX5_FLOW_HW_FLOW_FLAG_AGE_IDX | \
+   MLX5_FLOW_HW_FLOW_FLAG_MTR_ID | \
+   MLX5_FLOW_HW_FLOW_FLAG_MATCHER_SELECTOR | \
+   MLX5_FLOW_HW_FLOW_FLAG_UPD_FLOW \
+   )
+
 #ifdef PEDANTIC
 #pragma GCC diagnostic ignored "-Wpedantic"
 #endif
@@ -1295,8 +1315,8 @@ struct rte_flow_hw {
uint32_t res_idx;
/** HWS flow rule index passed to mlx5dr. */
uint32_t rule_idx;
-   /** Fate action type. */
-   uint32_t fate_type;
+   /** Which flow fields (inline or in auxiliary struct) are used. */
+   uint32_t flags;
/** Ongoing flow operation type. */
uint8_t operation_type;
/** Index of pattern template this flow is based on. */
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 025f04ddde..979be4764a 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -2845,6 +2845,7 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, 
uint32_t queue,
&rule_act->action,
&rule_act->counter.offset))
return -1;
+   flow->flags |= MLX5_FLOW_HW_FLOW_FLAG_CNT_ID;
flow->cnt_id = act_idx;
break;
case MLX5_INDIRECT_ACTION_TYPE_AGE:
@@ -2854,6 +2855,7 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, 
uint32_t queue,
 * it in flow destroy.
 */
mlx5_flow_hw_aux_set_age_idx(flow, aux, act_idx);
+   flow->flags |= MLX5_FLOW_HW_FLOW_FLAG_AGE_IDX;
if (action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT)
/*
 * The mutual update for idirect AGE & COUNT will be
@@ -2869,6 +2871,7 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, 
uint32_t queue,
  ¶m->queue_id, &age_cnt,
  idx) < 0)
return -1;
+   flow->flags |= MLX5_FLOW_HW_FLOW_FLAG_CNT_ID;
flow->cnt_id = age_cnt;
param->nb_cnts++;
} else {
@@ -3174,7 +3177,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
rule_acts[act_data->action_dst].action =
(!!attr.group) ? jump->hws_action : jump->root_action;
flow->jump = jump;
-   flow->fate_type = MLX5_FLOW_FATE_JUMP;
+   flow->flags |= MLX5_FLOW_HW_FLOW_FLAG_FATE_JUMP;
break;
case RTE_FLOW_ACTION_TYPE_RSS:
case RTE_FLOW_ACTION_TYPE_QUEUE:
@@ -3185,7 +3188,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
return -1;
rule_acts[act_data->action_dst].action = hrxq->action;
flow->hrxq = hrxq;
-   flow->fate_type = MLX5_FLOW_FATE_QUEUE;
+   flow->flags |= MLX5_FLOW_HW

[PATCH v2 11/11] net/mlx5: remove unneeded device status checking

2024-02-29 Thread Dariusz Sosnowski
From: Bing Zhao 

A flow rule can be inserted even before the device is started. The
only exception is a queue or RSS action.

For the other interfaces of the template API, the start status is not
checked. Checking it would cause cache misses or evictions, since
the flag is located on another cache line.

Fixes: f1fecffa88df ("net/mlx5: support Direct Rules action template API")
Cc: sta...@dpdk.org

Signed-off-by: Bing Zhao 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5_flow_hw.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 979be4764a..285ec603d3 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3520,11 +3520,6 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
uint32_t res_idx = 0;
int ret;
 
-   if (unlikely((!dev->data->dev_started))) {
-   rte_flow_error_set(error, EINVAL, 
RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
-  "Port must be started before enqueueing flow 
operations");
-   return NULL;
-   }
flow = mlx5_ipool_malloc(table->flow, &flow_idx);
if (!flow)
goto error;
-- 
2.39.2



Re: [PATCH v2] net/iavf: remove error logs for vlan offloading

2024-02-29 Thread Bruce Richardson
On Tue, Feb 06, 2024 at 11:34:20AM +0100, David Marchand wrote:
> This was reported by RH QE.
> 
> When a vlan is enforced on a VF via an administrative configuration on
> the PF side, the net/iavf driver logs two error messages.
> Those error messages have no consequence on the rest of the port
> initialisation and packet processing works fine.
> 
> [root@toto ~] # ip l set enp94s0 vf 0 vlan 2
> [root@toto ~] # dpdk-testpmd -a :5e:02.0 -- -i
> ...
> Configuring Port 0 (socket 0)
> iavf_dev_init_vlan(): Failed to update vlan offload
> iavf_dev_configure(): configure VLAN failed: -95
> iavf_set_rx_function(): request RXDID[1] in Queue[0] is legacy, set
>   rx_pkt_burst as legacy for all queues
> 
> The first change is to remove the error log in iavf_dev_init_vlan().
> This log is unneeded since all error paths are covered by a dedicated log
> message already.
> 
> Then, in iavf_dev_init_vlan(), requesting all possible VLAN offloading
> must not trigger an ERROR level log message. This is simply confusing,
> as the application may not have requested such vlan offloading.
> The reason why the driver requests all offloading is unclear so keep it
> as is. Instead, rephrase the log message and lower its level to INFO.
> 
> Fixes: 1c301e8c3cff ("net/iavf: support new VLAN capabilities")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: David Marchand 
> Acked-by: Bruce Richardson 
> ---
Applied to dpdk-next-net-intel

Thanks,
/Bruce


[PATCH 0/7] vhost: FD manager improvements

2024-02-29 Thread Maxime Coquelin
This series aims at improving the Vhost FD manager.

The first patch is a fix necessary for VDUSE device
destruction to work. I expect it to be taken into the v24.03
release.

The rest of the series is various improvements to the
FD manager that can wait for the v24.07 release.

Maxime Coquelin (7):
  vhost: fix VDUSE device destruction failure
  vhost: rename polling mutex
  vhost: make use of FD manager init function
  vhost: hide synchronization within FD manager
  vhost: improve fdset initialization
  vhost: convert fdset sync to eventfd
  vhost: improve FD manager logging

 lib/vhost/fd_man.c  | 313 +--
 lib/vhost/fd_man.c.orig | 538 
 lib/vhost/fd_man.h  |  41 +--
 lib/vhost/socket.c  |  37 +--
 lib/vhost/vduse.c   |  51 +---
 5 files changed, 800 insertions(+), 180 deletions(-)
 create mode 100644 lib/vhost/fd_man.c.orig

-- 
2.43.2



[PATCH 1/7] vhost: fix VDUSE device destruction failure

2024-02-29 Thread Maxime Coquelin
The VDUSE_DESTROY_DEVICE ioctl can fail because the device's
chardev is not released despite the close syscall having been
called. It happens because the events handler thread is
still polling the file descriptor.

fdset_pipe_notify() is not enough because it does not
ensure the notification has been handled by the event
thread; it just returns once the notification is sent.

To fix this, this patch introduces a synchronization
mechanism based on a pthread condition variable, so that
fdset_pipe_notify() only returns once the pipe's read
callback has been executed.
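
A self-contained model of the handshake (names are local to the sketch;
the real code lives in fd_man.c as shown in the diff below):

    #include <pthread.h>
    #include <stdbool.h>
    #include <unistd.h>

    static pthread_mutex_t sync_mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t sync_cond = PTHREAD_COND_INITIALIZER;
    static bool synced;

    /* runs on the events thread when the pipe's read fd fires */
    static void
    on_pipe_readable(int readfd)
    {
            char buf[16];
            ssize_t r = read(readfd, buf, sizeof(buf));

            (void)r; /* best effort, as in the original */
            pthread_mutex_lock(&sync_mutex);
            synced = true;
            pthread_cond_broadcast(&sync_cond);
            pthread_mutex_unlock(&sync_mutex);
    }

    /* runs on the caller's thread (e.g. VDUSE device destroy) */
    static void
    notify_and_wait(int writefd)
    {
            ssize_t r;

            pthread_mutex_lock(&sync_mutex);
            synced = false;
            r = write(writefd, "1", 1);
            (void)r;
            while (!synced)
                    pthread_cond_wait(&sync_cond, &sync_mutex);
            pthread_mutex_unlock(&sync_mutex);
    }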

Fixes: 51d018fdac4e ("vhost: add VDUSE events handler")
Cc: sta...@dpdk.org

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/fd_man.c | 21 ++---
 lib/vhost/fd_man.h |  5 +
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/lib/vhost/fd_man.c b/lib/vhost/fd_man.c
index 79a8d2c006..42ce059039 100644
--- a/lib/vhost/fd_man.c
+++ b/lib/vhost/fd_man.c
@@ -309,10 +309,11 @@ fdset_event_dispatch(void *arg)
 }
 
 static void
-fdset_pipe_read_cb(int readfd, void *dat __rte_unused,
+fdset_pipe_read_cb(int readfd, void *dat,
   int *remove __rte_unused)
 {
char charbuf[16];
+   struct fdset *fdset = dat;
int r = read(readfd, charbuf, sizeof(charbuf));
/*
 * Just an optimization, we don't care if read() failed
@@ -320,6 +321,11 @@ fdset_pipe_read_cb(int readfd, void *dat __rte_unused,
 * compiler happy
 */
RTE_SET_USED(r);
+
+   pthread_mutex_lock(&fdset->sync_mutex);
+   fdset->sync = true;
+   pthread_cond_broadcast(&fdset->sync_cond);
+   pthread_mutex_unlock(&fdset->sync_mutex);
 }
 
 void
@@ -342,7 +348,7 @@ fdset_pipe_init(struct fdset *fdset)
}
 
ret = fdset_add(fdset, fdset->u.readfd,
-   fdset_pipe_read_cb, NULL, NULL);
+   fdset_pipe_read_cb, NULL, fdset);
 
if (ret < 0) {
VHOST_FDMAN_LOG(ERR,
@@ -359,7 +365,12 @@ fdset_pipe_init(struct fdset *fdset)
 void
 fdset_pipe_notify(struct fdset *fdset)
 {
-   int r = write(fdset->u.writefd, "1", 1);
+   int r;
+
+   pthread_mutex_lock(&fdset->sync_mutex);
+
+   fdset->sync = false;
+   r = write(fdset->u.writefd, "1", 1);
/*
 * Just an optimization, we don't care if write() failed
 * so ignore explicitly its return value to make the
@@ -367,4 +378,8 @@ fdset_pipe_notify(struct fdset *fdset)
 */
RTE_SET_USED(r);
 
+   while (!fdset->sync)
+   pthread_cond_wait(&fdset->sync_cond, &fdset->sync_mutex);
+
+   pthread_mutex_unlock(&fdset->sync_mutex);
 }
diff --git a/lib/vhost/fd_man.h b/lib/vhost/fd_man.h
index 6315904c8e..cc19937612 100644
--- a/lib/vhost/fd_man.h
+++ b/lib/vhost/fd_man.h
@@ -6,6 +6,7 @@
 #define _FD_MAN_H_
 #include 
 #include 
+#include 
 
 #define MAX_FDS 1024
 
@@ -35,6 +36,10 @@ struct fdset {
int writefd;
};
} u;
+
+   pthread_mutex_t sync_mutex;
+   pthread_cond_t sync_cond;
+   bool sync;
 };
 
 
-- 
2.43.2



[PATCH 2/7] vhost: rename polling mutex

2024-02-29 Thread Maxime Coquelin
This trivial patch fixes a typo in the FD manager's polling
mutex name.

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/fd_man.c | 8 
 lib/vhost/fd_man.h | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/vhost/fd_man.c b/lib/vhost/fd_man.c
index 42ce059039..5dde40e51a 100644
--- a/lib/vhost/fd_man.c
+++ b/lib/vhost/fd_man.c
@@ -125,9 +125,9 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb 
wcb, void *dat)
pthread_mutex_lock(&pfdset->fd_mutex);
i = pfdset->num < MAX_FDS ? pfdset->num++ : -1;
if (i == -1) {
-   pthread_mutex_lock(&pfdset->fd_pooling_mutex);
+   pthread_mutex_lock(&pfdset->fd_polling_mutex);
fdset_shrink_nolock(pfdset);
-   pthread_mutex_unlock(&pfdset->fd_pooling_mutex);
+   pthread_mutex_unlock(&pfdset->fd_polling_mutex);
i = pfdset->num < MAX_FDS ? pfdset->num++ : -1;
if (i == -1) {
pthread_mutex_unlock(&pfdset->fd_mutex);
@@ -244,9 +244,9 @@ fdset_event_dispatch(void *arg)
numfds = pfdset->num;
pthread_mutex_unlock(&pfdset->fd_mutex);
 
-   pthread_mutex_lock(&pfdset->fd_pooling_mutex);
+   pthread_mutex_lock(&pfdset->fd_polling_mutex);
val = poll(pfdset->rwfds, numfds, 1000 /* millisecs */);
-   pthread_mutex_unlock(&pfdset->fd_pooling_mutex);
+   pthread_mutex_unlock(&pfdset->fd_polling_mutex);
if (val < 0)
continue;
 
diff --git a/lib/vhost/fd_man.h b/lib/vhost/fd_man.h
index cc19937612..2517ae5a9b 100644
--- a/lib/vhost/fd_man.h
+++ b/lib/vhost/fd_man.h
@@ -24,7 +24,7 @@ struct fdset {
struct pollfd rwfds[MAX_FDS];
struct fdentry fd[MAX_FDS];
pthread_mutex_t fd_mutex;
-   pthread_mutex_t fd_pooling_mutex;
+   pthread_mutex_t fd_polling_mutex;
int num;/* current fd number of this fdset */
 
union pipefds {
-- 
2.43.2



[PATCH 4/7] vhost: hide synchronization within FD manager

2024-02-29 Thread Maxime Coquelin
This patch forces synchronization for all FD additions
and deletions in the FD set. With that, it is no longer
necessary for the user to know about the FD set pipe, so
its initialization is hidden inside the FD manager.
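
The caller-visible contract after this patch, as a sketch (prototypes
as in fd_man.h; the example helper is illustrative):

    #include <stddef.h>

    struct fdset;
    typedef void (*fd_cb)(int fd, void *dat, int *remove);

    int fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat);
    void *fdset_del(struct fdset *pfdset, int fd);

    static void
    watch_then_unwatch(struct fdset *set, int fd, fd_cb cb, void *ctx)
    {
            /* both calls sync internally: when they return, the polling
             * thread has already rebuilt its wait list */
            fdset_add(set, fd, cb, NULL, ctx);
            /* ... */
            fdset_del(set, fd); /* closing fd afterwards is race-free */
    }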

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/fd_man.c | 174 +
 lib/vhost/fd_man.h |   7 +-
 lib/vhost/socket.c |  12 +---
 lib/vhost/vduse.c  |  18 +
 4 files changed, 101 insertions(+), 110 deletions(-)

diff --git a/lib/vhost/fd_man.c b/lib/vhost/fd_man.c
index d33036a171..0ae481b785 100644
--- a/lib/vhost/fd_man.c
+++ b/lib/vhost/fd_man.c
@@ -2,7 +2,9 @@
  * Copyright(c) 2010-2014 Intel Corporation
  */
 
+#include 
 #include 
+#include 
 #include 
 
 #include 
@@ -17,6 +19,87 @@ RTE_LOG_REGISTER_SUFFIX(vhost_fdset_logtype, fdset, INFO);
 
 #define FDPOLLERR (POLLERR | POLLHUP | POLLNVAL)
 
+static void
+fdset_pipe_read_cb(int readfd, void *dat,
+  int *remove __rte_unused)
+{
+   char charbuf[16];
+   struct fdset *fdset = dat;
+   int r = read(readfd, charbuf, sizeof(charbuf));
+   /*
+* Just an optimization, we don't care if read() failed
+* so ignore explicitly its return value to make the
+* compiler happy
+*/
+   RTE_SET_USED(r);
+
+   pthread_mutex_lock(&fdset->sync_mutex);
+   fdset->sync = true;
+   pthread_cond_broadcast(&fdset->sync_cond);
+   pthread_mutex_unlock(&fdset->sync_mutex);
+}
+
+static void
+fdset_pipe_uninit(struct fdset *fdset)
+{
+   fdset_del(fdset, fdset->u.readfd);
+   close(fdset->u.readfd);
+   fdset->u.readfd = -1;
+   close(fdset->u.writefd);
+   fdset->u.writefd = -1;
+}
+
+static int
+fdset_pipe_init(struct fdset *fdset)
+{
+   int ret;
+
+   pthread_mutex_init(&fdset->sync_mutex, NULL);
+   pthread_cond_init(&fdset->sync_cond, NULL);
+
+   if (pipe(fdset->u.pipefd) < 0) {
+   VHOST_FDMAN_LOG(ERR,
+   "failed to create pipe for vhost fdset");
+   return -1;
+   }
+
+   ret = fdset_add(fdset, fdset->u.readfd,
+   fdset_pipe_read_cb, NULL, fdset);
+   if (ret < 0) {
+   VHOST_FDMAN_LOG(ERR,
+   "failed to add pipe readfd %d into vhost server fdset",
+   fdset->u.readfd);
+
+   fdset_pipe_uninit(fdset);
+   return -1;
+   }
+
+   return 0;
+}
+
+static void
+fdset_sync(struct fdset *fdset)
+{
+   int ret;
+
+   pthread_mutex_lock(&fdset->sync_mutex);
+
+   fdset->sync = false;
+   ret = write(fdset->u.writefd, "1", 1);
+   if (ret < 0) {
+   VHOST_FDMAN_LOG(ERR,
+   "Failed to write to notification pipe: %s",
+   strerror(errno));
+   goto out_unlock;
+   }
+
+   while (!fdset->sync)
+   pthread_cond_wait(&fdset->sync_cond, &fdset->sync_mutex);
+
+out_unlock:
+   pthread_mutex_unlock(&fdset->sync_mutex);
+}
+
 static int
 get_last_valid_idx(struct fdset *pfdset, int last_valid_idx)
 {
@@ -96,6 +179,12 @@ fdset_add_fd(struct fdset *pfdset, int idx, int fd,
pfd->revents = 0;
 }
 
+void
+fdset_uninit(struct fdset *pfdset)
+{
+   fdset_pipe_uninit(pfdset);
+}
+
 int
 fdset_init(struct fdset *pfdset)
 {
@@ -113,7 +202,7 @@ fdset_init(struct fdset *pfdset)
}
pfdset->num = 0;
 
-   return 0;
+   return fdset_pipe_init(pfdset);
 }
 
 /**
@@ -143,6 +232,8 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb 
wcb, void *dat)
fdset_add_fd(pfdset, i, fd, rcb, wcb, dat);
pthread_mutex_unlock(&pfdset->fd_mutex);
 
+   fdset_sync(pfdset);
+
return 0;
 }
 
@@ -174,6 +265,8 @@ fdset_del(struct fdset *pfdset, int fd)
pthread_mutex_unlock(&pfdset->fd_mutex);
} while (i != -1);
 
+   fdset_sync(pfdset);
+
return dat;
 }
 
@@ -207,6 +300,9 @@ fdset_try_del(struct fdset *pfdset, int fd)
}
 
pthread_mutex_unlock(&pfdset->fd_mutex);
+
+   fdset_sync(pfdset);
+
return 0;
 }
 
@@ -312,79 +408,3 @@ fdset_event_dispatch(void *arg)
 
return 0;
 }
-
-static void
-fdset_pipe_read_cb(int readfd, void *dat,
-  int *remove __rte_unused)
-{
-   char charbuf[16];
-   struct fdset *fdset = dat;
-   int r = read(readfd, charbuf, sizeof(charbuf));
-   /*
-* Just an optimization, we don't care if read() failed
-* so ignore explicitly its return value to make the
-* compiler happy
-*/
-   RTE_SET_USED(r);
-
-   pthread_mutex_lock(&fdset->sync_mutex);
-   fdset->sync = true;
-   pthread_cond_broadcast(&fdset->sync_cond);
-   pthread_mutex_unlock(&fdset->sync_mutex);
-}
-
-void
-fdset_pipe_uninit(struct fdset *fdset)
-{
-   fdset_del(fdset, fdset->u.readfd);
-   close(fdset->u.readfd);
-   close(fdset->u.writefd);
-}
-
-int
-fdset_pipe_init(str

[PATCH 3/7] vhost: make use of FD manager init function

2024-02-29 Thread Maxime Coquelin
Instead of statically initializing the fdset, this patch
converts VDUSE and Vhost-user to use the fdset_init() function,
which now also initializes the mutexes.

This is preliminary rework to hide the FD manager pipe from
its users.

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/fd_man.c |  9 +++--
 lib/vhost/fd_man.h |  2 +-
 lib/vhost/socket.c | 11 +--
 lib/vhost/vduse.c  | 14 ++
 4 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/lib/vhost/fd_man.c b/lib/vhost/fd_man.c
index 5dde40e51a..d33036a171 100644
--- a/lib/vhost/fd_man.c
+++ b/lib/vhost/fd_man.c
@@ -96,19 +96,24 @@ fdset_add_fd(struct fdset *pfdset, int idx, int fd,
pfd->revents = 0;
 }
 
-void
+int
 fdset_init(struct fdset *pfdset)
 {
int i;
 
if (pfdset == NULL)
-   return;
+   return -1;
+
+   pthread_mutex_init(&pfdset->fd_mutex, NULL);
+   pthread_mutex_init(&pfdset->fd_polling_mutex, NULL);
 
for (i = 0; i < MAX_FDS; i++) {
pfdset->fd[i].fd = -1;
pfdset->fd[i].dat = NULL;
}
pfdset->num = 0;
+
+   return 0;
 }
 
 /**
diff --git a/lib/vhost/fd_man.h b/lib/vhost/fd_man.h
index 2517ae5a9b..92d24d8591 100644
--- a/lib/vhost/fd_man.h
+++ b/lib/vhost/fd_man.h
@@ -43,7 +43,7 @@ struct fdset {
 };
 
 
-void fdset_init(struct fdset *pfdset);
+int fdset_init(struct fdset *pfdset);
 
 int fdset_add(struct fdset *pfdset, int fd,
fd_cb rcb, fd_cb wcb, void *dat);
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index a2fdac30a4..b544e39be7 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -89,12 +89,6 @@ static int create_unix_socket(struct vhost_user_socket 
*vsocket);
 static int vhost_user_start_client(struct vhost_user_socket *vsocket);
 
 static struct vhost_user vhost_user = {
-   .fdset = {
-   .fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
-   .fd_mutex = PTHREAD_MUTEX_INITIALIZER,
-   .fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
-   .num = 0
-   },
.vsocket_cnt = 0,
.mutex = PTHREAD_MUTEX_INITIALIZER,
 };
@@ -1187,6 +1181,11 @@ rte_vhost_driver_start(const char *path)
return vduse_device_create(path, 
vsocket->net_compliant_ol_flags);
 
if (fdset_tid.opaque_id == 0) {
+   if (fdset_init(&vhost_user.fdset) < 0) {
+   VHOST_CONFIG_LOG(path, ERR, "Failed to init Vhost-user 
fdset");
+   return -1;
+   }
+
/**
 * create a pipe which will be waited by poll and notified to
 * rebuild the wait list of poll.
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index d462428d2c..d83d7b0d7c 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -31,14 +31,7 @@ struct vduse {
struct fdset fdset;
 };
 
-static struct vduse vduse = {
-   .fdset = {
-   .fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
-   .fd_mutex = PTHREAD_MUTEX_INITIALIZER,
-   .fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
-   .num = 0
-   },
-};
+static struct vduse vduse;
 
 static bool vduse_events_thread;
 
@@ -434,6 +427,11 @@ vduse_device_create(const char *path, bool 
compliant_ol_flags)
 
/* If first device, create events dispatcher thread */
if (vduse_events_thread == false) {
+   if (fdset_init(&vduse.fdset) < 0) {
+   VHOST_CONFIG_LOG(path, ERR, "Failed to init VDUSE 
fdset");
+   return -1;
+   }
+
/**
 * create a pipe which will be waited by poll and notified to
 * rebuild the wait list of poll.
-- 
2.43.2



[PATCH 5/7] vhost: improve fdset initialization

2024-02-29 Thread Maxime Coquelin
This patch heavily reworks fdset initialization (resulting usage is
sketched after this list):
 - fdsets are now dynamically allocated by the FD manager
 - the event dispatcher is now created by the FD manager
 - struct fdset is now opaque to VDUSE and Vhost

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/fd_man.c  | 177 +++--
 lib/vhost/fd_man.c.orig | 538 
 lib/vhost/fd_man.h  |  39 +--
 lib/vhost/socket.c  |  24 +-
 lib/vhost/vduse.c   |  29 +--
 5 files changed, 715 insertions(+), 92 deletions(-)
 create mode 100644 lib/vhost/fd_man.c.orig

diff --git a/lib/vhost/fd_man.c b/lib/vhost/fd_man.c
index 0ae481b785..8b47c97d45 100644
--- a/lib/vhost/fd_man.c
+++ b/lib/vhost/fd_man.c
@@ -3,12 +3,16 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "fd_man.h"
 
@@ -19,6 +23,79 @@ RTE_LOG_REGISTER_SUFFIX(vhost_fdset_logtype, fdset, INFO);
 
 #define FDPOLLERR (POLLERR | POLLHUP | POLLNVAL)
 
+struct fdentry {
+   int fd; /* -1 indicates this entry is empty */
+   fd_cb rcb;  /* callback when this fd is readable. */
+   fd_cb wcb;  /* callback when this fd is writeable.*/
+   void *dat;  /* fd context */
+   int busy;   /* whether this entry is being used in cb. */
+};
+
+struct fdset {
+   char name[RTE_THREAD_NAME_SIZE];
+   struct pollfd rwfds[MAX_FDS];
+   struct fdentry fd[MAX_FDS];
+   rte_thread_t tid;
+   pthread_mutex_t fd_mutex;
+   pthread_mutex_t fd_polling_mutex;
+   int num;/* current fd number of this fdset */
+
+   union pipefds {
+   struct {
+   int pipefd[2];
+   };
+   struct {
+   int readfd;
+   int writefd;
+   };
+   } u;
+
+   pthread_mutex_t sync_mutex;
+   pthread_cond_t sync_cond;
+   bool sync;
+   bool destroy;
+};
+
+static int fdset_add_no_sync(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb 
wcb, void *dat);
+static uint32_t fdset_event_dispatch(void *arg);
+
+#define MAX_FDSETS 8
+
+static struct fdset *fdsets[MAX_FDSETS];
+pthread_mutex_t fdsets_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+static struct fdset *
+fdset_lookup(const char *name)
+{
+   int i;
+
+   for (i = 0; i < MAX_FDSETS; i++) {
+   struct fdset *fdset = fdsets[i];
+   if (fdset == NULL)
+   continue;
+
+   if (!strncmp(fdset->name, name, RTE_THREAD_NAME_SIZE))
+   return fdset;
+   }
+
+   return NULL;
+}
+
+static int
+fdset_insert(struct fdset *fdset)
+{
+   int i;
+
+   for (i = 0; i < MAX_FDSETS; i++) {
+   if (fdsets[i] == NULL) {
+   fdsets[i] = fdset;
+   return 0;
+   }
+   }
+
+   return -1;
+}
+
 static void
 fdset_pipe_read_cb(int readfd, void *dat,
   int *remove __rte_unused)
@@ -63,7 +140,7 @@ fdset_pipe_init(struct fdset *fdset)
return -1;
}
 
-   ret = fdset_add(fdset, fdset->u.readfd,
+   ret = fdset_add_no_sync(fdset, fdset->u.readfd,
fdset_pipe_read_cb, NULL, fdset);
if (ret < 0) {
VHOST_FDMAN_LOG(ERR,
@@ -179,37 +256,82 @@ fdset_add_fd(struct fdset *pfdset, int idx, int fd,
pfd->revents = 0;
 }
 
-void
-fdset_uninit(struct fdset *pfdset)
-{
-   fdset_pipe_uninit(pfdset);
-}
-
-int
-fdset_init(struct fdset *pfdset)
+struct fdset *
+fdset_init(const char *name)
 {
+   struct fdset *fdset;
+   uint32_t val;
int i;
 
-   if (pfdset == NULL)
-   return -1;
+   if (name == NULL) {
+   VHOST_FDMAN_LOG(ERR, "Invalid name");
+   goto err;
+   }
 
-   pthread_mutex_init(&pfdset->fd_mutex, NULL);
-   pthread_mutex_init(&pfdset->fd_polling_mutex, NULL);
+   pthread_mutex_lock(&fdsets_mutex);
+   fdset = fdset_lookup(name);
+   if (fdset) {
+   pthread_mutex_unlock(&fdsets_mutex);
+   return fdset;
+   }
+
+   fdset = rte_zmalloc(NULL, sizeof(*fdset), 0);
+   if (!fdset) {
+   VHOST_FDMAN_LOG(ERR, "Failed to alloc fdset %s", name);
+   goto err_unlock;
+   }
+
+   rte_strscpy(fdset->name, name, RTE_THREAD_NAME_SIZE);
+
+   pthread_mutex_init(&fdset->fd_mutex, NULL);
+   pthread_mutex_init(&fdset->fd_polling_mutex, NULL);
 
for (i = 0; i < MAX_FDS; i++) {
-   pfdset->fd[i].fd = -1;
-   pfdset->fd[i].dat = NULL;
+   fdset->fd[i].fd = -1;
+   fdset->fd[i].dat = NULL;
}
-   pfdset->num = 0;
+   fdset->num = 0;
 
-   return fdset_pipe_init(pfdset);
+   if (fdset_pipe_init(fdset)) {
+   VHOST_FDMAN_LOG(ERR, "Failed to init pipe for %s", name);
+   goto err_free;
+   }
+

[PATCH 6/7] vhost: convert fdset sync to eventfd

2024-02-29 Thread Maxime Coquelin
This patch converts the fdset sync mechanism from
a pipe to an eventfd, as it is only used to send
notification events.

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/fd_man.c | 65 +++---
 1 file changed, 26 insertions(+), 39 deletions(-)
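
The pipe-to-eventfd swap is self-contained enough to demonstrate standalone.
A minimal sketch of the notify/drain pattern the fdset sync now relies on
(error handling trimmed):

#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

int
main(void)
{
	eventfd_t val;
	int efd = eventfd(0, 0);

	if (efd < 0)
		return 1;

	/* Writer side: kick the poller (was write(pipefd[1], "1", 1)). */
	eventfd_write(efd, 1);
	eventfd_write(efd, 1);

	/* Reader side: one read drains the counter (was read(pipefd[0], ...)),
	 * so coalesced kicks cost a single wakeup. Prints "drained 2 kick(s)". */
	eventfd_read(efd, &val);
	printf("drained %llu kick(s)\n", (unsigned long long)val);

	close(efd);
	return 0;
}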

diff --git a/lib/vhost/fd_man.c b/lib/vhost/fd_man.c
index 8b47c97d45..6a5bd74656 100644
--- a/lib/vhost/fd_man.c
+++ b/lib/vhost/fd_man.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -40,19 +41,11 @@ struct fdset {
pthread_mutex_t fd_polling_mutex;
int num;/* current fd number of this fdset */
 
-   union pipefds {
-   struct {
-   int pipefd[2];
-   };
-   struct {
-   int readfd;
-   int writefd;
-   };
-   } u;
-
+   int sync_fd;
pthread_mutex_t sync_mutex;
pthread_cond_t sync_cond;
bool sync;
+
bool destroy;
 };
 
@@ -97,12 +90,11 @@ fdset_insert(struct fdset *fdset)
 }
 
 static void
-fdset_pipe_read_cb(int readfd, void *dat,
-  int *remove __rte_unused)
+fdset_sync_read_cb(int sync_fd, void *dat, int *remove __rte_unused)
 {
-   char charbuf[16];
+   eventfd_t val;
struct fdset *fdset = dat;
-   int r = read(readfd, charbuf, sizeof(charbuf));
+   int r = eventfd_read(sync_fd, &val);
/*
 * Just an optimization, we don't care if read() failed
 * so ignore explicitly its return value to make the
@@ -117,37 +109,33 @@ fdset_pipe_read_cb(int readfd, void *dat,
 }
 
 static void
-fdset_pipe_uninit(struct fdset *fdset)
+fdset_sync_uninit(struct fdset *fdset)
 {
-   fdset_del(fdset, fdset->u.readfd);
-   close(fdset->u.readfd);
-   fdset->u.readfd = -1;
-   close(fdset->u.writefd);
-   fdset->u.writefd = -1;
+   fdset_del(fdset, fdset->sync_fd);
+   close(fdset->sync_fd);
+   fdset->sync_fd = -1;
 }
 
 static int
-fdset_pipe_init(struct fdset *fdset)
+fdset_sync_init(struct fdset *fdset)
 {
int ret;
 
pthread_mutex_init(&fdset->sync_mutex, NULL);
pthread_cond_init(&fdset->sync_cond, NULL);
 
-   if (pipe(fdset->u.pipefd) < 0) {
-   VHOST_FDMAN_LOG(ERR,
-   "failed to create pipe for vhost fdset");
+   fdset->sync_fd = eventfd(0, 0);
+   if (fdset->sync_fd < 0) {
+   VHOST_FDMAN_LOG(ERR, "failed to create eventfd for %s fdset", 
fdset->name);
return -1;
}
 
-   ret = fdset_add_no_sync(fdset, fdset->u.readfd,
-   fdset_pipe_read_cb, NULL, fdset);
+   ret = fdset_add_no_sync(fdset, fdset->sync_fd, fdset_sync_read_cb, 
NULL, fdset);
if (ret < 0) {
-   VHOST_FDMAN_LOG(ERR,
-   "failed to add pipe readfd %d into vhost server fdset",
-   fdset->u.readfd);
+   VHOST_FDMAN_LOG(ERR, "failed to add eventfd %d to %s fdset",
+   fdset->sync_fd, fdset->name);
 
-   fdset_pipe_uninit(fdset);
+   fdset_sync_uninit(fdset);
return -1;
}
 
@@ -162,11 +150,10 @@ fdset_sync(struct fdset *fdset)
pthread_mutex_lock(&fdset->sync_mutex);
 
fdset->sync = false;
-   ret = write(fdset->u.writefd, "1", 1);
+   ret = eventfd_write(fdset->sync_fd, (eventfd_t)1);
if (ret < 0) {
-   VHOST_FDMAN_LOG(ERR,
-   "Failed to write to notification pipe: %s",
-   strerror(errno));
+   VHOST_FDMAN_LOG(ERR, "Failed to write sync eventfd for %s 
fdset: %s",
+   fdset->name, strerror(errno));
goto out_unlock;
}
 
@@ -292,8 +279,8 @@ fdset_init(const char *name)
}
fdset->num = 0;
 
-   if (fdset_pipe_init(fdset)) {
-   VHOST_FDMAN_LOG(ERR, "Failed to init pipe for %s", name);
+   if (fdset_sync_init(fdset)) {
+   VHOST_FDMAN_LOG(ERR, "Failed to init sync for %s", name);
goto err_free;
}
 
@@ -301,7 +288,7 @@ fdset_init(const char *name)
fdset_event_dispatch, fdset)) {
VHOST_FDMAN_LOG(ERR, "Failed to create %s event dispatch 
thread",
fdset->name);
-   goto err_pipe;
+   goto err_sync;
}
 
if (fdset_insert(fdset)) {
@@ -317,8 +304,8 @@ fdset_init(const char *name)
fdset->destroy = true;
fdset_sync(fdset);
rte_thread_join(fdset->tid, &val);
-err_pipe:
-   fdset_pipe_uninit(fdset);
+err_sync:
+   fdset_sync_uninit(fdset);
 err_free:
rte_free(fdset);
 err_unlock:
-- 
2.43.2



[PATCH 7/7] vhost: improve FD manager logging

2024-02-29 Thread Maxime Coquelin
Convert the logging macro to pass the fdset name
as an argument.

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/fd_man.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)
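
For readers unfamiliar with the named variadic form, here is a standalone
sketch of the reworked macro, with RTE_LOG_LINE stubbed out via fprintf for
illustration only (``args...`` and ``##`` are GNU extensions, as in the
patch itself):

#include <stdio.h>

/* Stub standing in for DPDK's RTE_LOG_LINE; not the real definition. */
#define RTE_LOG_LINE(level, type, fmt, ...) \
	fprintf(stderr, #level ": " fmt "\n", ##__VA_ARGS__)

/* The reworked macro: the fdset name is now a mandatory prefix argument. */
#define VHOST_FDMAN_LOG(prefix, level, fmt, args...) \
	RTE_LOG_LINE(level, VHOST_FDMAN, "(%s) " fmt, prefix, ##args)

int
main(void)
{
	/* Prints: ERR: (vduse) failed to add eventfd 42 */
	VHOST_FDMAN_LOG("vduse", ERR, "failed to add eventfd %d", 42);
	return 0;
}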

diff --git a/lib/vhost/fd_man.c b/lib/vhost/fd_man.c
index 6a5bd74656..18426095b4 100644
--- a/lib/vhost/fd_man.c
+++ b/lib/vhost/fd_man.c
@@ -19,8 +19,8 @@
 
 RTE_LOG_REGISTER_SUFFIX(vhost_fdset_logtype, fdset, INFO);
 #define RTE_LOGTYPE_VHOST_FDMAN vhost_fdset_logtype
-#define VHOST_FDMAN_LOG(level, ...) \
-   RTE_LOG_LINE(level, VHOST_FDMAN, "" __VA_ARGS__)
+#define VHOST_FDMAN_LOG(prefix, level, fmt, args...) \
+   RTE_LOG_LINE(level, VHOST_FDMAN, "(%s) " fmt, prefix, ##args)
 
 #define FDPOLLERR (POLLERR | POLLHUP | POLLNVAL)
 
@@ -126,15 +126,13 @@ fdset_sync_init(struct fdset *fdset)
 
fdset->sync_fd = eventfd(0, 0);
if (fdset->sync_fd < 0) {
-   VHOST_FDMAN_LOG(ERR, "failed to create eventfd for %s fdset", 
fdset->name);
+   VHOST_FDMAN_LOG(fdset->name, ERR, "failed to create eventfd");
return -1;
}
 
ret = fdset_add_no_sync(fdset, fdset->sync_fd, fdset_sync_read_cb, 
NULL, fdset);
if (ret < 0) {
-   VHOST_FDMAN_LOG(ERR, "failed to add eventfd %d to %s fdset",
-   fdset->sync_fd, fdset->name);
-
+   VHOST_FDMAN_LOG(fdset->name, ERR, "failed to add eventfd %d", 
fdset->sync_fd);
fdset_sync_uninit(fdset);
return -1;
}
@@ -152,8 +150,8 @@ fdset_sync(struct fdset *fdset)
fdset->sync = false;
ret = eventfd_write(fdset->sync_fd, (eventfd_t)1);
if (ret < 0) {
-   VHOST_FDMAN_LOG(ERR, "Failed to write sync eventfd for %s 
fdset: %s",
-   fdset->name, strerror(errno));
+   VHOST_FDMAN_LOG(fdset->name, ERR, "Failed to write sync 
eventfd: %s",
+   strerror(errno));
goto out_unlock;
}
 
@@ -251,7 +249,7 @@ fdset_init(const char *name)
int i;
 
if (name == NULL) {
-   VHOST_FDMAN_LOG(ERR, "Invalid name");
+   VHOST_FDMAN_LOG("fdset", ERR, "Invalid name");
goto err;
}
 
@@ -264,7 +262,7 @@ fdset_init(const char *name)
 
fdset = rte_zmalloc(NULL, sizeof(*fdset), 0);
if (!fdset) {
-   VHOST_FDMAN_LOG(ERR, "Failed to alloc fdset %s", name);
+   VHOST_FDMAN_LOG(name, ERR, "Failed to alloc fdset");
goto err_unlock;
}
 
@@ -280,19 +278,18 @@ fdset_init(const char *name)
fdset->num = 0;
 
if (fdset_sync_init(fdset)) {
-   VHOST_FDMAN_LOG(ERR, "Failed to init sync for %s", name);
+   VHOST_FDMAN_LOG(fdset->name, ERR, "Failed to init sync");
goto err_free;
}
 
if (rte_thread_create_internal_control(&fdset->tid, fdset->name,
fdset_event_dispatch, fdset)) {
-   VHOST_FDMAN_LOG(ERR, "Failed to create %s event dispatch 
thread",
-   fdset->name);
+   VHOST_FDMAN_LOG(fdset->name, ERR, "Failed to create event 
dispatch thread");
goto err_sync;
}
 
if (fdset_insert(fdset)) {
-   VHOST_FDMAN_LOG(ERR, "Failed to insert fdset %s", name);
+   VHOST_FDMAN_LOG(fdset->name, ERR, "Failed to insert fdset");
goto err_thread;
}
 
-- 
2.43.2



[PATCH v1] ethdev: add Linux ethtool link mode conversion

2024-02-29 Thread Thomas Monjalon
Speed capabilities of a NIC may be discovered through its Linux
kernel driver. It is especially useful for bifurcated drivers,
so they don't have to duplicate the same logic in the DPDK driver.

Parsing ethtool speed capabilities is made easy thanks to
the functions added in ethdev for internal usage only.
Of course these functions work only on Linux,
so they are not compiled in other environments.

In order to ease parsing, the ethtool macro names are parsed
externally by a shell command which generates a C array
included in this patch.
This also avoids depending on a specific kernel version.
The C array should be updated in the future to pick up the
latest ethtool bits.
Note that it is easier to update this array than to add
new cases to parsing code.

The types in the functions follow the ethtool types:
uint32_t for bitmaps, and int8_t for the number of 32-bit bitmap words.

Signed-off-by: Thomas Monjalon 
---

A follow-up patch will be sent to use these functions in mlx5.
I suspect mana could use this parsing as well.

---
 lib/ethdev/ethdev_linux_ethtool.c | 162 ++
 lib/ethdev/ethdev_linux_ethtool.h |  41 
 lib/ethdev/meson.build|   9 ++
 lib/ethdev/version.map|   3 +
 4 files changed, 215 insertions(+)
 create mode 100644 lib/ethdev/ethdev_linux_ethtool.c
 create mode 100644 lib/ethdev/ethdev_linux_ethtool.h
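
A standalone sketch of how a consumer can decode this encoding; the table
below copies only a few entries of the patch's array, and the bitmap value
is a placeholder for what an ethtool ioctl would return:

#include <stdint.h>
#include <stdio.h>

/* Excerpt of the patch's table; value = speed in Mbps, LSB = half duplex. */
static const uint32_t modes[] = {
	[0] = 11,   /* ETHTOOL_LINK_MODE_10baseT_Half_BIT */
	[1] = 10,   /* ETHTOOL_LINK_MODE_10baseT_Full_BIT */
	[5] = 1000, /* ETHTOOL_LINK_MODE_1000baseT_Full_BIT */
};

int
main(void)
{
	uint32_t bitmap = (1u << 1) | (1u << 5); /* placeholder bitmap */
	unsigned int b;

	for (b = 0; b < 32; b++) {
		uint32_t v;

		if (!(bitmap & (1u << b)) || b >= sizeof(modes) / sizeof(modes[0]))
			continue;
		v = modes[b];
		if (v == 0)
			continue; /* bit without a speed mapping (e.g. Autoneg) */
		printf("bit %u: %u Mbps, %s duplex\n",
		       b, v & ~1u, (v & 1) ? "half" : "full");
	}
	return 0;
}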

diff --git a/lib/ethdev/ethdev_linux_ethtool.c 
b/lib/ethdev/ethdev_linux_ethtool.c
new file mode 100644
index 00..fe98882b8a
--- /dev/null
+++ b/lib/ethdev/ethdev_linux_ethtool.c
@@ -0,0 +1,162 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2024 NVIDIA Corporation & Affiliates
+ */
+
+#include 
+
+#include "rte_ethdev.h"
+#include "ethdev_linux_ethtool.h"
+
+/*
+Link modes sorted with index as defined in ethtool.
+Values are speed in Mbps with LSB indicating duplex.
+
+The ethtool bits definition should not change as it is a kernel API.
+Using raw numbers directly avoids checking API availability
+and allows to compile with new bits included even on an old kernel.
+
+The array below is built from bit definitions with this shell command:
+   sed -rn 's;.*(ETHTOOL_LINK_MODE_)([0-9]+)([0-9a-zA-Z_]*).*= 
*([0-9]*).*;'\
+   '[\4] = \2, /\* \1\2\3 *\/;p' /usr/include/linux/ethtool.h |
+   awk '/_Half_/{$3=$3+1","}1'
+*/
+static uint32_t link_modes[] = {
+   [  0] =  11, /* ETHTOOL_LINK_MODE_10baseT_Half_BIT */
+   [  1] =  10, /* ETHTOOL_LINK_MODE_10baseT_Full_BIT */
+   [  2] = 101, /* ETHTOOL_LINK_MODE_100baseT_Half_BIT */
+   [  3] = 100, /* ETHTOOL_LINK_MODE_100baseT_Full_BIT */
+   [  4] =1001, /* ETHTOOL_LINK_MODE_1000baseT_Half_BIT */
+   [  5] =1000, /* ETHTOOL_LINK_MODE_1000baseT_Full_BIT */
+   [ 12] =   1, /* ETHTOOL_LINK_MODE_1baseT_Full_BIT */
+   [ 15] =2500, /* ETHTOOL_LINK_MODE_2500baseX_Full_BIT */
+   [ 17] =1000, /* ETHTOOL_LINK_MODE_1000baseKX_Full_BIT */
+   [ 18] =   1, /* ETHTOOL_LINK_MODE_1baseKX4_Full_BIT */
+   [ 19] =   1, /* ETHTOOL_LINK_MODE_1baseKR_Full_BIT */
+   [ 20] =   1, /* ETHTOOL_LINK_MODE_1baseR_FEC_BIT */
+   [ 21] =   2, /* ETHTOOL_LINK_MODE_2baseMLD2_Full_BIT */
+   [ 22] =   2, /* ETHTOOL_LINK_MODE_2baseKR2_Full_BIT */
+   [ 23] =   4, /* ETHTOOL_LINK_MODE_4baseKR4_Full_BIT */
+   [ 24] =   4, /* ETHTOOL_LINK_MODE_4baseCR4_Full_BIT */
+   [ 25] =   4, /* ETHTOOL_LINK_MODE_4baseSR4_Full_BIT */
+   [ 26] =   4, /* ETHTOOL_LINK_MODE_4baseLR4_Full_BIT */
+   [ 27] =   56000, /* ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT */
+   [ 28] =   56000, /* ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT */
+   [ 29] =   56000, /* ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT */
+   [ 30] =   56000, /* ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT */
+   [ 31] =   25000, /* ETHTOOL_LINK_MODE_25000baseCR_Full_BIT */
+   [ 32] =   25000, /* ETHTOOL_LINK_MODE_25000baseKR_Full_BIT */
+   [ 33] =   25000, /* ETHTOOL_LINK_MODE_25000baseSR_Full_BIT */
+   [ 34] =   5, /* ETHTOOL_LINK_MODE_5baseCR2_Full_BIT */
+   [ 35] =   5, /* ETHTOOL_LINK_MODE_5baseKR2_Full_BIT */
+   [ 36] =  10, /* ETHTOOL_LINK_MODE_10baseKR4_Full_BIT */
+   [ 37] =  10, /* ETHTOOL_LINK_MODE_10baseSR4_Full_BIT */
+   [ 38] =  10, /* ETHTOOL_LINK_MODE_10baseCR4_Full_BIT */
+   [ 39] =  10, /* ETHTOOL_LINK_MODE_10baseLR4_ER4_Full_BIT */
+   [ 40] =   5, /* ETHTOOL_LINK_MODE_5baseSR2_Full_BIT */
+   [ 41] =1000, /* ETHTOOL_LINK_MODE_1000baseX_Full_BIT */
+   [ 42] =   1, /* ETHTOOL_LINK_MODE_1baseCR_Full_BIT */
+   [ 43] =   1, /* ETHTOOL_LINK_MODE_1baseSR_Full_BIT */
+   [ 44] =   1, /* ETHTOOL_LINK_MODE_1baseLR_Full_BIT */
+   [ 45] =   1, /* ETHTOOL_LINK_MODE_1baseLRM_Full_BIT */
+   [ 46] =   1, /* ETH

[v10 0/3] net/af_xdp: fix multi interface support for K8s

2024-02-29 Thread Maryam Tahhan
The original `use_cni` implementation was limited to
supporting only a single netdev in a DPDK pod. This patchset
aims to fix this limitation transparently to the end user.
It will also enable compatibility with the latest AF_XDP
Device Plugin. 

Signed-off-by: Maryam Tahhan 
---
v10:
* Add UDS acronym
* Update `use_cni` in docs with ``use_cni``
* Remove reference to limitations and simply document behaviour
  before and after DPDK 23.11.

v9:
* Fixup checkpatch issues.

v8:
* Go back to using `use_cni` vdev argument
* Introduce `use_map_pinning` vdev param.
* Rename `uds_path` to `dp_path` so that it can be used
  with map pinning as well as `use_cni`.
* Set `dp_path` internally in the AF_XDP PMD if it's
  not configured by the user.
* Clean up the original `use_cni` documentation separately
  to coding changes.

v7:
* Give a more descriptive commit msg headline.
* Fixup typos in documentation.

v6:
* Add link to PR 81 in commit message
* Add release notes changes to this patchset

v5:
* Fix alignment for ETH_AF_XDP_USE_DP_UDS_PATH_ARG
* Remove use_cni references in af_xdp.rst

v4:
* Rename af_xdp_cni.rst to af_xdp_dp.rst
* Removed all incorrect references to CNI throughout af_xdp
  PMD file.
* Fixed Typos in af_xdp_dp.rst

v3:
* Remove `use_cni` vdev argument as it's no longer needed.
* Update incorrect CNI references for the AF_XDP DP in the
  documentation.
* Update the documentation to run a simple example with the
  AF_XDP DP plugin in K8s.

v2:
* Rename sock_path to uds_path.
* Update documentation to reflect when CAP_BPF is needed.
* Fix testpmd arguments in the provided example for Pods.
* Use AF_XDP API to update the xskmap entry.
---

Maryam Tahhan (3):
  docs: AF_XDP Device Plugin
  net/af_xdp: fix multi interface support for K8s
  net/af_xdp: support AF_XDP DP pinned maps

 doc/guides/howto/af_xdp_cni.rst| 253 --
 doc/guides/howto/af_xdp_dp.rst | 340 +
 doc/guides/howto/index.rst |   2 +-
 doc/guides/nics/af_xdp.rst |  44 +++-
 doc/guides/rel_notes/release_24_03.rst |  17 ++
 drivers/net/af_xdp/rte_eth_af_xdp.c| 167 
 6 files changed, 522 insertions(+), 301 deletions(-)
 delete mode 100644 doc/guides/howto/af_xdp_cni.rst
 create mode 100644 doc/guides/howto/af_xdp_dp.rst

-- 
2.41.0



[v10 1/3] docs: AF_XDP Device Plugin

2024-02-29 Thread Maryam Tahhan
Fix up the references to the AF_XDP Device Plugin in
the documentation (it was previously referred to as CNI)
and document the single netdev limitation for deploying
an AF_XDP based DPDK pod. Also rename af_xdp_cni.rst to
af_xdp_dp.rst.

Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
Cc: sta...@dpdk.org

Signed-off-by: Maryam Tahhan 
---
 doc/guides/howto/af_xdp_cni.rst | 253 ---
 doc/guides/howto/af_xdp_dp.rst  | 299 
 doc/guides/howto/index.rst  |   2 +-
 doc/guides/nics/af_xdp.rst  |   4 +-
 4 files changed, 302 insertions(+), 256 deletions(-)
 delete mode 100644 doc/guides/howto/af_xdp_cni.rst
 create mode 100644 doc/guides/howto/af_xdp_dp.rst
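
As background for the flow this document describes, a sketch of the PMD-side
steps using the libxdp API. receive_xskmap_fd() is a hypothetical stand-in
for the UDS handshake with the device plugin, and the socket path is
illustrative:

#include <xdp/xsk.h>

/* Hypothetical helper: performs the UDS handshake, returns the XSKMAP fd. */
extern int receive_xskmap_fd(const char *uds_path);

static int
create_unprivileged_xsk(const char *ifname, uint32_t queue,
			struct xsk_umem *umem, struct xsk_socket **xsk,
			struct xsk_ring_cons *rx, struct xsk_ring_prod *tx)
{
	struct xsk_socket_config cfg = {
		.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		/* The plugin loaded the XDP program; don't load a default one. */
		.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD,
	};
	int map_fd, ret;

	ret = xsk_socket__create(xsk, ifname, queue, umem, rx, tx, &cfg);
	if (ret < 0)
		return ret;

	map_fd = receive_xskmap_fd("/tmp/afxdp_dp/ens785f0/afxdp.sock");
	if (map_fd < 0)
		return map_fd;

	/* Register this socket in the plugin-managed XSKMAP. */
	return xsk_socket__update_xskmap(*xsk, map_fd);
}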

diff --git a/doc/guides/howto/af_xdp_cni.rst b/doc/guides/howto/af_xdp_cni.rst
deleted file mode 100644
index a1a6d5b99c..00
--- a/doc/guides/howto/af_xdp_cni.rst
+++ /dev/null
@@ -1,253 +0,0 @@
-.. SPDX-License-Identifier: BSD-3-Clause
-   Copyright(c) 2023 Intel Corporation.
-
-Using a CNI with the AF_XDP driver
-==
-
-Introduction
-
-
-CNI, the Container Network Interface, is a technology for configuring
-container network interfaces
-and which can be used to setup Kubernetes networking.
-AF_XDP is a Linux socket Address Family that enables an XDP program
-to redirect packets to a memory buffer in userspace.
-
-This document explains how to enable the `AF_XDP Plugin for Kubernetes`_ within
-a DPDK application using the :doc:`../nics/af_xdp` to connect and use these 
technologies.
-
-.. _AF_XDP Plugin for Kubernetes: 
https://github.com/intel/afxdp-plugins-for-kubernetes
-
-
-Background
---
-
-The standard :doc:`../nics/af_xdp` initialization process involves loading an 
eBPF program
-onto the kernel netdev to be used by the PMD.
-This operation requires root or escalated Linux privileges
-and thus prevents the PMD from working in an unprivileged container.
-The AF_XDP CNI plugin handles this situation
-by providing a device plugin that performs the program loading.
-
-At a technical level the CNI opens a Unix Domain Socket and listens for a 
client
-to make requests over that socket.
-A DPDK application acting as a client connects and initiates a configuration 
"handshake".
-The client then receives a file descriptor which points to the XSKMAP
-associated with the loaded eBPF program.
-The XSKMAP is a BPF map of AF_XDP sockets (XSK).
-The client can then proceed with creating an AF_XDP socket
-and inserting that socket into the XSKMAP pointed to by the descriptor.
-
-The EAL vdev argument ``use_cni`` is used to indicate that the user wishes
-to run the PMD in unprivileged mode and to receive the XSKMAP file descriptor
-from the CNI.
-When this flag is set,
-the ``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag
-should be used when creating the socket
-to instruct libbpf not to load the default libbpf program on the netdev.
-Instead the loading is handled by the CNI.
-
-.. note::
-
-   The Unix Domain Socket file path appear in the end user is 
"/tmp/afxdp.sock".
-
-
-Prerequisites
--
-
-Docker and container prerequisites:
-
-* Set up the device plugin
-  as described in the instructions for `AF_XDP Plugin for Kubernetes`_.
-
-* The Docker image should contain the libbpf and libxdp libraries,
-  which are dependencies for AF_XDP,
-  and should include support for the ``ethtool`` command.
-
-* The Pod should have enabled the capabilities ``CAP_NET_RAW`` and ``CAP_BPF``
-  for AF_XDP along with support for hugepages.
-
-* Increase locked memory limit so containers have enough memory for packet 
buffers.
-  For example:
-
-  .. code-block:: console
-
- cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
- [Service]
- LimitMEMLOCK=infinity
- EOF
-
-* dpdk-testpmd application should have AF_XDP feature enabled.
-
-  For further information see the docs for the: :doc:`../../nics/af_xdp`.
-
-
-Example

-
-Howto run dpdk-testpmd with CNI plugin:
-
-* Clone the CNI plugin
-
-  .. code-block:: console
-
- # git clone https://github.com/intel/afxdp-plugins-for-kubernetes.git
-
-* Build the CNI plugin
-
-  .. code-block:: console
-
- # cd afxdp-plugins-for-kubernetes/
- # make build
-
-  .. note::
-
- CNI plugin has a dependence on the config.json.
-
-  Sample Config.json
-
-  .. code-block:: json
-
- {
-"logLevel":"debug",
-"logFile":"afxdp-dp-e2e.log",
-"pools":[
-   {
-  "name":"e2e",
-  "mode":"primary",
-  "timeout":30,
-  "ethtoolCmds" : ["-L -device- combined 1"],
-  "devices":[
- {
-"name":"ens785f0"
- }
-  ]
-   }
-]
- }
-
-  For further reference please use the `config.json`_
-
-  .. _config.json: 
https://github.com/intel/afxdp-plugins-for-kubernetes/blob/v0.0.2/test/e2e/config.

[v10 3/3] net/af_xdp: support AF_XDP DP pinned maps

2024-02-29 Thread Maryam Tahhan
Enable the AF_XDP PMD to retrieve the xskmap
from a pinned eBPF map. This map is expected
to be pinned by an external entity like the
AF_XDP Device Plugin. This enables unprivileged
pods to create and use AF_XDP sockets.

Signed-off-by: Maryam Tahhan 
---
 doc/guides/howto/af_xdp_dp.rst | 35 --
 doc/guides/nics/af_xdp.rst | 34 --
 doc/guides/rel_notes/release_24_03.rst | 10 +++
 drivers/net/af_xdp/rte_eth_af_xdp.c| 93 --
 4 files changed, 141 insertions(+), 31 deletions(-)
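
A minimal sketch of the pinned-map variant using libbpf/libxdp calls; the
dp_path value is illustrative:

#include <bpf/bpf.h>
#include <xdp/xsk.h>

static int
update_xskmap_from_pin(struct xsk_socket *xsk, const char *dp_path)
{
	/* e.g. dp_path = "/tmp/afxdp_dp/ens785f0/xsks_map" */
	int map_fd = bpf_obj_get(dp_path);

	if (map_fd < 0)
		return -1;
	return xsk_socket__update_xskmap(xsk, map_fd);
}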

diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
index ec348c3b82..9aa9f7d8d4 100644
--- a/doc/guides/howto/af_xdp_dp.rst
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -52,10 +52,21 @@ should be used when creating the socket
 to instruct libbpf not to load the default libbpf program on the netdev.
 Instead the loading is handled by the AF_XDP Device Plugin.
 
-The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
-to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
-AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
-argument then the AF_XDP PMD configures it internally.
+The EAL vdev argument ``use_pinned_map`` is used to indicate to the AF_XDP PMD
+to retrieve the XSKMAP fd from a pinned eBPF map. This map is expected to be
+pinned by an external entity like the AF_XDP Device Plugin. This enables
+unprivileged pods to create and use AF_XDP sockets. When this flag is set, the
+``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag is used by the AF_XDP PMD
+when creating the AF_XDP socket.
+
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` or 
``use_pinned_map``
+arguments to explicitly tell the AF_XDP PMD where to find either:
+
+1. The UDS to interact with the AF_XDP Device Plugin. OR
+2. The pinned xskmap to use when creating AF_XDP sockets.
+
+If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` 
arguments then
+the AF_XDP PMD configures it internally to the `AF_XDP Device Plugin for 
Kubernetes`_.
 
 .. note::
 
@@ -312,8 +323,18 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
--no-mlockall --in-memory \
-- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
 
+  Or
+
+  .. code-block:: console
+
+ kubectl exec -i  --container  -- \
+   //dpdk-testpmd -l 0,1 --no-pci \
+   --vdev=net_af_xdp0,use_pinned_map=1,iface=,dp_path="/tmp/afxdp_dp//xsks_map" \
+   --no-mlockall --in-memory \
+   -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
 .. note::
 
-If the ``dp_path`` parameter isn't explicitly set (like the example above)
-the AF_XDP PMD will set the parameter value to
-``/tmp/afxdp_dp/<>/afxdp.sock``.
+If the ``dp_path`` parameter isn't explicitly set with ``use_cni`` or 
``use_pinned_map``
+the AF_XDP PMD will set the parameter values to the `AF_XDP Device Plugin 
for Kubernetes`_
+defaults.
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 7f8651beda..940bbf60f2 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -171,13 +171,35 @@ enable the `AF_XDP Device Plugin for Kubernetes`_ with a 
DPDK application/pod.
so enabling and disabling of the promiscuous mode through the DPDK 
application
is also not supported.
 
+use_pinned_map
+~~
+
+The EAL vdev argument ``use_pinned_map`` is used to indicate that the user 
wishes to
+load a pinned xskmap mounted by `AF_XDP Device Plugin for Kubernetes`_ in the 
DPDK
+application/pod.
+
+.. _AF_XDP Device Plugin for Kubernetes: 
https://github.com/intel/afxdp-plugins-for-kubernetes
+
+.. code-block:: console
+
+   --vdev=net_af_xdp0,use_pinned_map=1
+
+.. note::
+
+This feature can also be used with any external entity that can pin an 
eBPF map, not just
+the `AF_XDP Device Plugin for Kubernetes`_.
+
 dp_path
 ~~~
 
-The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
-to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
-`AF_XDP Device Plugin for Kubernetes`_. If this argument is not passed
-alongside the ``use_cni`` argument then the AF_XDP PMD configures it 
internally.
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` or 
``use_pinned_map``
+arguments to explicitly tell the AF_XDP PMD where to find either:
+
+1. The UDS to interact with the AF_XDP Device Plugin. OR
+2. The pinned xskmap to use when creating AF_XDP sockets.
+
+If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` 
arguments then
+the AF_XDP PMD configures it internally to the `AF_XDP Device Plugin for 
Kubernetes`_.
 
 .. _AF_XDP Device Plugin for Kubernetes: 
https://github.com/intel/afxdp-plugins-for-kubernetes
 
@@ -185,6 +207,10 @@ alongside the ``use_cni`` argument then the AF_XDP PMD 
configures it internally.
 
--vdev=net_af_x

[v10 2/3] net/af_xdp: fix multi interface support for K8s

2024-02-29 Thread Maryam Tahhan
The original 'use_cni' implementation was added
to enable support for the AF_XDP PMD in a K8s env
without any escalated privileges.
However 'use_cni' used a hardcoded socket rather
than a configurable one. If a DPDK pod is requesting
multiple net devices and these devices are from
different pools, then the AF_XDP PMD attempts to
mount all the netdev UDSes in the pod as /tmp/afxdp.sock,
which means that at best only one netdev will handshake
correctly with the AF_XDP DP. This patch addresses
this by making the socket parameter configurable using
a new vdev param called 'dp_path' alongside the
original 'use_cni' param. If the 'dp_path' parameter
is not set alongside the 'use_cni' parameter, then
it's configured inside the AF_XDP PMD (transparently
to the user). This change has been tested
with the AF_XDP DP PR 81[1], with both single and
multiple interfaces.

[1] https://github.com/intel/afxdp-plugins-for-kubernetes/pull/81

Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
Cc: sta...@dpdk.org

Signed-off-by: Maryam Tahhan 
---
 doc/guides/howto/af_xdp_dp.rst | 62 +++--
 doc/guides/nics/af_xdp.rst | 14 
 doc/guides/rel_notes/release_24_03.rst |  7 ++
 drivers/net/af_xdp/rte_eth_af_xdp.c| 94 --
 4 files changed, 121 insertions(+), 56 deletions(-)

diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
index 7166d904bd..ec348c3b82 100644
--- a/doc/guides/howto/af_xdp_dp.rst
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -52,29 +52,33 @@ should be used when creating the socket
 to instruct libbpf not to load the default libbpf program on the netdev.
 Instead the loading is handled by the AF_XDP Device Plugin.
 
-Limitations

+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
+to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
+AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
+argument then the AF_XDP PMD configures it internally.
 
-For DPDK versions <= v23.11 the Unix Domain Socket file path appears in
-the pod at "/tmp/afxdp.sock". The handshake implementation in the AF_XDP PMD
-is only compatible with the AF_XDP Device Plugin up to commit id `38317c2`_
-and the pod is limited to a single netdev.
+.. note::
+
+DPDK AF_XDP PMD <= v23.11 will only work with the AF_XDP Device Plugin
+<= commit id `38317c2`_.
 
 .. note::
 
-DPDK AF_XDP PMD <= v23.11 will not work with the latest version of the
-AF_XDP Device Plugin.
+DPDK AF_XDP PMD > v23.11 will work with latest version of the
+AF_XDP Device Plugin through a combination of the ``dp_path`` and/or
+the ``use_cni`` parameter. In these versions of the PMD if a user doesn't
+explicitly set the ``dp_path``parameter when using ``use_cni`` then that
+path is transparently configured in the AF_XDP PMD to the default
+`AF_XDP Device Plugin for Kubernetes`_ mount point path. The path can
+be overriden by explicitly setting the ``dp_path`` param.
 
-The issue is if a single pod requests different devices from different pools it
-results in multiple UDS servers serving the pod with the container using only a
-single mount point for their UDS as ``/tmp/afxdp.sock``. This means that at 
best one
-device might be able to complete the handshake. This has been fixed in the 
AF_XDP
-Device Plugin so that the mount point in the pods for the UDS appear at
-``/tmp/afxdp_dp//afxdp.sock``. Later versions of DPDK fix this 
hardcoded path
-in the PMD alongside the ``use_cni`` parameter.
+.. note::
 
-.. _38317c2: 
https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
+DPDK AF_XDP PMD > v23.11 is backwards compatible with (older) versions
+of the AF_XDP DP <= commit id `38317c2`_ by explicitly setting ``dp_path`` 
to
+``/tmp/afxdp.sock``.
 
+.. _38317c2: 
https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
 
 Prerequisites
 -
@@ -105,10 +109,10 @@ Device Plugin and DPDK container prerequisites:
 
   .. code-block:: console
 
- cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
- [Service]
- LimitMEMLOCK=infinity
- EOF
+cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
+[Service]
+LimitMEMLOCK=infinity
+EOF
 
 * dpdk-testpmd application should have AF_XDP feature enabled.
 
@@ -284,7 +288,7 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
 emptyDir:
   medium: HugePages
 
-  For further reference please use the `pod.yaml`_
+  For further reference please see the `pod.yaml`_
 
   .. _pod.yaml: 
https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/pod-spec.yaml
 
@@ -297,3 +301,19 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
--vdev=net_af_xdp0,use_cni=1,iface= \
--no-mlockall --in-memory \
  

Re: [v10 2/3] net/af_xdp: fix multi interface support for K8s

2024-02-29 Thread Maryam Tahhan

On 29/02/2024 13:01, Maryam Tahhan wrote:

The original 'use_cni' implementation was added
to enable support for the AF_XDP PMD in a K8s env
without any escalated privileges.
However 'use_cni' used a hardcoded socket rather
than a configurable one. If a DPDK pod is requesting
multiple net devices and these devices are from
different pools, then the AF_XDP PMD attempts to
mount all the netdev UDSes in the pod as /tmp/afxdp.sock,
which means that at best only one netdev will handshake
correctly with the AF_XDP DP. This patch addresses
this by making the socket parameter configurable using
a new vdev param called 'dp_path' alongside the
original 'use_cni' param. If the 'dp_path' parameter
is not set alongside the 'use_cni' parameter, then
it's configured inside the AF_XDP PMD (transparently
to the user). This change has been tested
with the AF_XDP DP PR 81[1], with both single and
multiple interfaces.

[1] https://github.com/intel/afxdp-plugins-for-kubernetes/pull/81

Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
Cc: sta...@dpdk.org

Signed-off-by: Maryam Tahhan 
---
  doc/guides/howto/af_xdp_dp.rst | 62 +++--
  doc/guides/nics/af_xdp.rst | 14 
  doc/guides/rel_notes/release_24_03.rst |  7 ++
  drivers/net/af_xdp/rte_eth_af_xdp.c| 94 --
  4 files changed, 121 insertions(+), 56 deletions(-)

diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
index 7166d904bd..ec348c3b82 100644
--- a/doc/guides/howto/af_xdp_dp.rst
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -52,29 +52,33 @@ should be used when creating the socket
  to instruct libbpf not to load the default libbpf program on the netdev.
  Instead the loading is handled by the AF_XDP Device Plugin.
  
-Limitations


+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
+to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
+AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
+argument then the AF_XDP PMD configures it internally.
  
-For DPDK versions <= v23.11 the Unix Domain Socket file path appears in

-the pod at "/tmp/afxdp.sock". The handshake implementation in the AF_XDP PMD
-is only compatible with the AF_XDP Device Plugin up to commit id `38317c2`_
-and the pod is limited to a single netdev.
+.. note::
+
+DPDK AF_XDP PMD <= v23.11 will only work with the AF_XDP Device Plugin
+<= commit id `38317c2`_.
  
  .. note::
  
-DPDK AF_XDP PMD <= v23.11 will not work with the latest version of the

-AF_XDP Device Plugin.
+DPDK AF_XDP PMD > v23.11 will work with latest version of the
+AF_XDP Device Plugin through a combination of the ``dp_path`` and/or
+the ``use_cni`` parameter. In these versions of the PMD if a user doesn't
+explicitly set the ``dp_path``parameter when using ``use_cni`` then that

I see the typo - will respin - sorry, it's been a long day already

+path is transparently configured in the AF_XDP PMD to the default
+`AF_XDP Device Plugin for Kubernetes`_ mount point path. The path can
+be overriden by explicitly setting the ``dp_path`` param.
  
-The issue is if a single pod requests different devices from different pools it

-results in multiple UDS servers serving the pod with the container using only a
-single mount point for their UDS as ``/tmp/afxdp.sock``. This means that at 
best one
-device might be able to complete the handshake. This has been fixed in the 
AF_XDP
-Device Plugin so that the mount point in the pods for the UDS appear at
-``/tmp/afxdp_dp//afxdp.sock``. Later versions of DPDK fix this 
hardcoded path
-in the PMD alongside the ``use_cni`` parameter.
+.. note::
  
-.. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669

+DPDK AF_XDP PMD > v23.11 is backwards compatible with (older) versions
+of the AF_XDP DP <= commit id `38317c2`_ by explicitly setting ``dp_path`` 
to
+``/tmp/afxdp.sock``.
  
+.. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
  
  Prerequisites

  -
@@ -105,10 +109,10 @@ Device Plugin and DPDK container prerequisites:
  
.. code-block:: console
  
- cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf

- [Service]
- LimitMEMLOCK=infinity
- EOF
+cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
+[Service]
+LimitMEMLOCK=infinity
+EOF
  
  * dpdk-testpmd application should have AF_XDP feature enabled.
  
@@ -284,7 +288,7 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI

  emptyDir:
medium: HugePages
  
-  For further reference please use the `pod.yaml`_

+  For further reference please see the `pod.yaml`_
  
.. _pod.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/pod-spec.yaml
  
@@ -297,3 +301,19

RE: [EXT] Re: [PATCH v10 4/4] app/dma-perf: add SG copy support

2024-02-29 Thread Gowrishankar Muthukrishnan
Hi Fengcheng,

> -Original Message-
> From: fengchengwen 
> Sent: Wednesday, February 28, 2024 3:02 PM
> To: Amit Prakash Shukla ; Cheng Jiang
> 
> Cc: dev@dpdk.org; Jerin Jacob ; Anoob Joseph
> ; Kevin Laatz ; Bruce
> Richardson ; Pavan Nikhilesh Bhagavatula
> ; Gowrishankar Muthukrishnan
> 
> Subject: [EXT] Re: [PATCH v10 4/4] app/dma-perf: add SG copy support
> 
> 
> Hi Gowrishankar,
> 
> On 2024/2/28 2:56, Amit Prakash Shukla wrote:
> > From: Gowrishankar Muthukrishnan 
> >
> > Add SG copy support.
> >
> > Signed-off-by: Gowrishankar Muthukrishnan 
> > Acked-by: Anoob Joseph 
> > Acked-by: Chengwen Feng 
> > ---
> > v10:
> > - SG config variables renamed.
> >
> >  app/test-dma-perf/benchmark.c | 278
> > +-
> >  app/test-dma-perf/config.ini  |  25 ++-
> >  app/test-dma-perf/main.c  |  34 -
> >  app/test-dma-perf/main.h  |   5 +-
> >  4 files changed, 300 insertions(+), 42 deletions(-)
> >
> > diff --git a/app/test-dma-perf/benchmark.c
> > b/app/test-dma-perf/benchmark.c index 0047e2f4b8..25ed6fa6d0 100644
> > --- a/app/test-dma-perf/benchmark.c
> > +++ b/app/test-dma-perf/benchmark.c
> > @@ -46,6 +46,10 @@ struct lcore_params {
> > uint16_t test_secs;
> > struct rte_mbuf **srcs;
> > struct rte_mbuf **dsts;
> > +   struct rte_dma_sge *src_sges;
> > +   struct rte_dma_sge *dst_sges;
> > +   uint8_t src_ptrs;
> > +   uint8_t dst_ptrs;
> 
> 1. src/dst_ptrs -> src/dst_nb_sge
Ack.

> 2. How about wrap these four fields as a struct?
Ack.
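
For illustration, the wrapped form could look like the sketch below; the
struct and field names are placeholders rather than the final code, and the
counts map directly onto the dmadev scatter-gather copy call:

#include <rte_dmadev.h>

struct sge_info {
	struct rte_dma_sge *srcs;
	struct rte_dma_sge *dsts;
	uint8_t nb_srcs;
	uint8_t nb_dsts;
};

static inline int
do_sg_copy(int16_t dev_id, uint16_t vchan, const struct sge_info *sge)
{
	/* nb_srcs/nb_dsts are the nb_src/nb_dst arguments here. */
	return rte_dma_copy_sg(dev_id, vchan, sge->srcs, sge->dsts,
			       sge->nb_srcs, sge->nb_dsts,
			       RTE_DMA_OP_FLAG_SUBMIT);
}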

> 
> > volatile struct worker_info worker_info;  };
> >
> > @@ -86,21 +90,31 @@ calc_result(uint32_t buf_size, uint32_t nr_buf,
> > uint16_t nb_workers, uint16_t te  }
> >
> >  static void
> > -output_result(uint8_t scenario_id, uint32_t lcore_id, char *dma_name,
> uint16_t ring_size,
> > -   uint16_t kick_batch, uint64_t ave_cycle, uint32_t
> buf_size, uint32_t nr_buf,
> > -   float memory, float bandwidth, float mops, bool
> is_dma)
> > +output_result(struct test_configure *cfg, struct lcore_params *para,
> > +   uint16_t kick_batch, uint64_t ave_cycle, uint32_t
> buf_size,
> > +   uint32_t nr_buf, float memory, float bandwidth, float
> mops)
> >  {
> > -   if (is_dma)
> > -   printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch Size:
> %u.\n",
> > -   lcore_id, dma_name, ring_size, kick_batch);
> > -   else
> > +   uint16_t ring_size = cfg->ring_size.cur;
> > +   uint8_t scenario_id = cfg->scenario_id;
> > +   uint32_t lcore_id = para->lcore_id;
> > +   char *dma_name = para->dma_name;
> > +
> > +   if (cfg->is_dma) {
> > +   printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch Size:
> %u", lcore_id,
> > +  dma_name, ring_size, kick_batch);
> > +   if (cfg->is_sg)
> > +   printf(" DMA src ptrs: %u, dst ptrs: %u",
> > +  para->src_ptrs, para->dst_ptrs);
> 
> DMA src sges: %u DMA dst sges: %u
> 
> I think we should add a column whose title could be "misc", with values
> like sg-src[4]-dst[1]; later we may add a fill test, and then this field
> could be pattern-0x12345678
> 
> And in the "[PATCH v10 2/4] app/dma-perf: add PCI device support" commit, if
> the DMA works in a non-mem2mem direction, we could add a simple descriptor
> of the direction and pcie.info in the above misc column.
> 

I am sorry, I could not understand the complete picture here. Do you mean we
reserve a column and use it per test type?

For plain mem copy, nothing is added.
For SG mem copy, instead of showing "DMA src sges: 1, dst sges: 4", print
"sg-src[1]-dst[4]".
In future, when we add a fill test to the benchmark, this field would instead
be "pattern-0x12345678".

Is my understanding correct here?

> > +   printf(".\n");
> > +   } else {
> > printf("lcore %u\n", lcore_id);
> > +   }
> >
> > printf("Average Cycles/op: %" PRIu64 ", Buffer Size: %u B, Buffer
> Number: %u, Memory: %.2lf MB, Frequency: %.3lf Ghz.\n",
> > ave_cycle, buf_size, nr_buf, memory,
> rte_get_timer_hz()/10.0);
> > printf("Average Bandwidth: %.3lf Gbps, MOps: %.3lf\n", bandwidth,
> > mops);
> >
> > -   if (is_dma)
> > +   if (cfg->is_dma)
> > snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN,
> CSV_LINE_DMA_FMT,
> > scenario_id, lcore_id, dma_name, ring_size,
> kick_batch, buf_size,
> > nr_buf, memory, ave_cycle, bandwidth, mops); @@ -
> 167,7 +181,7 @@
> > vchan_data_populate(uint32_t dev_id, struct rte_dma_vchan_conf *qconf,
> >
> >  /* Configuration of device. */
> >  static void
> > -configure_dmadev_queue(uint32_t dev_id, struct test_configure *cfg)
> > +configure_dmadev_queue(uint32_t dev_id, struct test_configure *cfg,
> > +uint8_t ptrs_max)
> >  {
> > uint16_t vchan = 0;
> > struct rte_dma_info info;
> > @@ -19

[v11 0/3] net/af_xdp: fix multi interface support for K8s

2024-02-29 Thread Maryam Tahhan
The original `use_cni` implementation was limited to
supporting only a single netdev in a DPDK pod. This patchset
aims to fix this limitation transparently to the end user.
It will also enable compatibility with the latest AF_XDP
Device Plugin. 

Signed-off-by: Maryam Tahhan 
---
v11:
* Fixed up typos picked up by checkpatch.

v10:
* Add UDS acronym
* Update `use_cni` in docs with ``use_cni``
* Remove reference to limitations and simply document behaviour
  before and after DPDK 23.11.

v9:
* Fixup checkpatch issues.

v8:
* Go back to using `use_cni` vdev argument
* Introduce `use_map_pinning` vdev param.
* Rename `uds_path` to `dp_path` so that it can be used
  with map pinning as well as `use_cni`.
* Set `dp_path` internally in the AF_XDP PMD if it's
  not configured by the user.
* Clean up the original `use_cni` documentation separately
  to coding changes.

v7:
* Give a more descriptive commit msg headline.
* Fixup typos in documentation.

v6:
* Add link to PR 81 in commit message
* Add release notes changes to this patchset

v5:
* Fix alignment for ETH_AF_XDP_USE_DP_UDS_PATH_ARG
* Remove use_cni references in af_xdp.rst

v4:
* Rename af_xdp_cni.rst to af_xdp_dp.rst
* Removed all incorrect references to CNI throughout af_xdp
  PMD file.
* Fixed Typos in af_xdp_dp.rst

v3:
* Remove `use_cni` vdev argument as it's no longer needed.
* Update incorrect CNI references for the AF_XDP DP in the
  documentation.
* Update the documentation to run a simple example with the
  AF_XDP DP plugin in K8s.

v2:
* Rename sock_path to uds_path.
* Update documentation to reflect when CAP_BPF is needed.
* Fix testpmd arguments in the provided example for Pods.
* Use AF_XDP API to update the xskmap entry.
---

Maryam Tahhan (3):
  docs: AF_XDP Device Plugin
  net/af_xdp: fix multi interface support for K8s
  net/af_xdp: support AF_XDP DP pinned maps

 doc/guides/howto/af_xdp_cni.rst| 253 --
 doc/guides/howto/af_xdp_dp.rst | 340 +
 doc/guides/howto/index.rst |   2 +-
 doc/guides/nics/af_xdp.rst |  44 +++-
 doc/guides/rel_notes/release_24_03.rst |  17 ++
 drivers/net/af_xdp/rte_eth_af_xdp.c| 167 
 6 files changed, 522 insertions(+), 301 deletions(-)
 delete mode 100644 doc/guides/howto/af_xdp_cni.rst
 create mode 100644 doc/guides/howto/af_xdp_dp.rst

-- 
2.41.0



[v11 1/3] docs: AF_XDP Device Plugin

2024-02-29 Thread Maryam Tahhan
Fix up the references to the AF_XDP Device Plugin in
the documentation (it was previously referred to as CNI)
and document the single netdev limitation for deploying
an AF_XDP based DPDK pod. Also rename af_xdp_cni.rst to
af_xdp_dp.rst.

Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
Cc: sta...@dpdk.org

Signed-off-by: Maryam Tahhan 
---
 doc/guides/howto/af_xdp_cni.rst | 253 ---
 doc/guides/howto/af_xdp_dp.rst  | 299 
 doc/guides/howto/index.rst  |   2 +-
 doc/guides/nics/af_xdp.rst  |   4 +-
 4 files changed, 302 insertions(+), 256 deletions(-)
 delete mode 100644 doc/guides/howto/af_xdp_cni.rst
 create mode 100644 doc/guides/howto/af_xdp_dp.rst

diff --git a/doc/guides/howto/af_xdp_cni.rst b/doc/guides/howto/af_xdp_cni.rst
deleted file mode 100644
index a1a6d5b99c..00
--- a/doc/guides/howto/af_xdp_cni.rst
+++ /dev/null
@@ -1,253 +0,0 @@
-.. SPDX-License-Identifier: BSD-3-Clause
-   Copyright(c) 2023 Intel Corporation.
-
-Using a CNI with the AF_XDP driver
-==
-
-Introduction
-
-
-CNI, the Container Network Interface, is a technology for configuring
-container network interfaces
-and which can be used to setup Kubernetes networking.
-AF_XDP is a Linux socket Address Family that enables an XDP program
-to redirect packets to a memory buffer in userspace.
-
-This document explains how to enable the `AF_XDP Plugin for Kubernetes`_ within
-a DPDK application using the :doc:`../nics/af_xdp` to connect and use these 
technologies.
-
-.. _AF_XDP Plugin for Kubernetes: 
https://github.com/intel/afxdp-plugins-for-kubernetes
-
-
-Background
---
-
-The standard :doc:`../nics/af_xdp` initialization process involves loading an 
eBPF program
-onto the kernel netdev to be used by the PMD.
-This operation requires root or escalated Linux privileges
-and thus prevents the PMD from working in an unprivileged container.
-The AF_XDP CNI plugin handles this situation
-by providing a device plugin that performs the program loading.
-
-At a technical level the CNI opens a Unix Domain Socket and listens for a 
client
-to make requests over that socket.
-A DPDK application acting as a client connects and initiates a configuration 
"handshake".
-The client then receives a file descriptor which points to the XSKMAP
-associated with the loaded eBPF program.
-The XSKMAP is a BPF map of AF_XDP sockets (XSK).
-The client can then proceed with creating an AF_XDP socket
-and inserting that socket into the XSKMAP pointed to by the descriptor.
-
-The EAL vdev argument ``use_cni`` is used to indicate that the user wishes
-to run the PMD in unprivileged mode and to receive the XSKMAP file descriptor
-from the CNI.
-When this flag is set,
-the ``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag
-should be used when creating the socket
-to instruct libbpf not to load the default libbpf program on the netdev.
-Instead the loading is handled by the CNI.
-
-.. note::
-
-   The Unix Domain Socket file path appear in the end user is 
"/tmp/afxdp.sock".
-
-
-Prerequisites
--
-
-Docker and container prerequisites:
-
-* Set up the device plugin
-  as described in the instructions for `AF_XDP Plugin for Kubernetes`_.
-
-* The Docker image should contain the libbpf and libxdp libraries,
-  which are dependencies for AF_XDP,
-  and should include support for the ``ethtool`` command.
-
-* The Pod should have enabled the capabilities ``CAP_NET_RAW`` and ``CAP_BPF``
-  for AF_XDP along with support for hugepages.
-
-* Increase locked memory limit so containers have enough memory for packet 
buffers.
-  For example:
-
-  .. code-block:: console
-
- cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
- [Service]
- LimitMEMLOCK=infinity
- EOF
-
-* dpdk-testpmd application should have AF_XDP feature enabled.
-
-  For further information see the docs for the: :doc:`../../nics/af_xdp`.
-
-
-Example

-
-Howto run dpdk-testpmd with CNI plugin:
-
-* Clone the CNI plugin
-
-  .. code-block:: console
-
- # git clone https://github.com/intel/afxdp-plugins-for-kubernetes.git
-
-* Build the CNI plugin
-
-  .. code-block:: console
-
- # cd afxdp-plugins-for-kubernetes/
- # make build
-
-  .. note::
-
- CNI plugin has a dependence on the config.json.
-
-  Sample Config.json
-
-  .. code-block:: json
-
- {
-"logLevel":"debug",
-"logFile":"afxdp-dp-e2e.log",
-"pools":[
-   {
-  "name":"e2e",
-  "mode":"primary",
-  "timeout":30,
-  "ethtoolCmds" : ["-L -device- combined 1"],
-  "devices":[
- {
-"name":"ens785f0"
- }
-  ]
-   }
-]
- }
-
-  For further reference please use the `config.json`_
-
-  .. _config.json: 
https://github.com/intel/afxdp-plugins-for-kubernetes/blob/v0.0.2/test/e2e/config.

[v11 2/3] net/af_xdp: fix multi interface support for K8s

2024-02-29 Thread Maryam Tahhan
The original 'use_cni' implementation was added
to enable support for the AF_XDP PMD in a K8s env
without any escalated privileges.
However 'use_cni' used a hardcoded socket rather
than a configurable one. If a DPDK pod is requesting
multiple net devices and these devices are from
different pools, then the AF_XDP PMD attempts to
mount all the netdev UDSes in the pod as /tmp/afxdp.sock,
which means that at best only one netdev will handshake
correctly with the AF_XDP DP. This patch addresses
this by making the socket parameter configurable using
a new vdev param called 'dp_path' alongside the
original 'use_cni' param. If the 'dp_path' parameter
is not set alongside the 'use_cni' parameter, then
it's configured inside the AF_XDP PMD (transparently
to the user). This change has been tested
with the AF_XDP DP PR 81[1], with both single and
multiple interfaces.

[1] https://github.com/intel/afxdp-plugins-for-kubernetes/pull/81

Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
Cc: sta...@dpdk.org

Signed-off-by: Maryam Tahhan 
---
 doc/guides/howto/af_xdp_dp.rst | 62 +++--
 doc/guides/nics/af_xdp.rst | 14 
 doc/guides/rel_notes/release_24_03.rst |  7 ++
 drivers/net/af_xdp/rte_eth_af_xdp.c| 94 --
 4 files changed, 121 insertions(+), 56 deletions(-)

diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
index 7166d904bd..4aa6b5499f 100644
--- a/doc/guides/howto/af_xdp_dp.rst
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -52,29 +52,33 @@ should be used when creating the socket
 to instruct libbpf not to load the default libbpf program on the netdev.
 Instead the loading is handled by the AF_XDP Device Plugin.
 
-Limitations

+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
+to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
+AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
+argument then the AF_XDP PMD configures it internally.
 
-For DPDK versions <= v23.11 the Unix Domain Socket file path appears in
-the pod at "/tmp/afxdp.sock". The handshake implementation in the AF_XDP PMD
-is only compatible with the AF_XDP Device Plugin up to commit id `38317c2`_
-and the pod is limited to a single netdev.
+.. note::
+
+DPDK AF_XDP PMD <= v23.11 will only work with the AF_XDP Device Plugin
+<= commit id `38317c2`_.
 
 .. note::
 
-DPDK AF_XDP PMD <= v23.11 will not work with the latest version of the
-AF_XDP Device Plugin.
+DPDK AF_XDP PMD > v23.11 will work with latest version of the
+AF_XDP Device Plugin through a combination of the ``dp_path`` and/or
+the ``use_cni`` parameter. In these versions of the PMD if a user doesn't
+explicitly set the ``dp_path`` parameter when using ``use_cni`` then that
+path is transparently configured in the AF_XDP PMD to the default
+`AF_XDP Device Plugin for Kubernetes`_ mount point path. The path can
+be overridden by explicitly setting the ``dp_path`` param.
 
-The issue is if a single pod requests different devices from different pools it
-results in multiple UDS servers serving the pod with the container using only a
-single mount point for their UDS as ``/tmp/afxdp.sock``. This means that at 
best one
-device might be able to complete the handshake. This has been fixed in the 
AF_XDP
-Device Plugin so that the mount point in the pods for the UDS appear at
-``/tmp/afxdp_dp//afxdp.sock``. Later versions of DPDK fix this 
hardcoded path
-in the PMD alongside the ``use_cni`` parameter.
+.. note::
 
-.. _38317c2: 
https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
+DPDK AF_XDP PMD > v23.11 is backwards compatible with (older) versions
+of the AF_XDP DP <= commit id `38317c2`_ by explicitly setting ``dp_path`` 
to
+``/tmp/afxdp.sock``.
 
+.. _38317c2: 
https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
 
 Prerequisites
 -
@@ -105,10 +109,10 @@ Device Plugin and DPDK container prerequisites:
 
   .. code-block:: console
 
- cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
- [Service]
- LimitMEMLOCK=infinity
- EOF
+cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
+[Service]
+LimitMEMLOCK=infinity
+EOF
 
 * dpdk-testpmd application should have AF_XDP feature enabled.
 
@@ -284,7 +288,7 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
 emptyDir:
   medium: HugePages
 
-  For further reference please use the `pod.yaml`_
+  For further reference please see the `pod.yaml`_
 
   .. _pod.yaml: 
https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/pod-spec.yaml
 
@@ -297,3 +301,19 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
--vdev=net_af_xdp0,use_cni=1,iface= \
--no-mlockall --in-memory \

[v11 3/3] net/af_xdp: support AF_XDP DP pinned maps

2024-02-29 Thread Maryam Tahhan
Enable the AF_XDP PMD to retrieve the xskmap
from a pinned eBPF map. This map is expected
to be pinned by an external entity like the
AF_XDP Device Plugin. This enables unprivileged
pods to create and use AF_XDP sockets.

Signed-off-by: Maryam Tahhan 
---
 doc/guides/howto/af_xdp_dp.rst | 35 --
 doc/guides/nics/af_xdp.rst | 34 --
 doc/guides/rel_notes/release_24_03.rst | 10 +++
 drivers/net/af_xdp/rte_eth_af_xdp.c| 93 --
 4 files changed, 141 insertions(+), 31 deletions(-)

diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
index 4aa6b5499f..8b9b5ebbad 100644
--- a/doc/guides/howto/af_xdp_dp.rst
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -52,10 +52,21 @@ should be used when creating the socket
 to instruct libbpf not to load the default libbpf program on the netdev.
 Instead the loading is handled by the AF_XDP Device Plugin.
 
-The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
-to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
-AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
-argument then the AF_XDP PMD configures it internally.
+The EAL vdev argument ``use_pinned_map`` is used to indicate to the AF_XDP PMD
+to retrieve the XSKMAP fd from a pinned eBPF map. This map is expected to be
+pinned by an external entity like the AF_XDP Device Plugin. This enables
+unprivileged pods to create and use AF_XDP sockets. When this flag is set, the
+``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag is used by the AF_XDP PMD
+when creating the AF_XDP socket.
+
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` or 
``use_pinned_map``
+arguments to explicitly tell the AF_XDP PMD where to find either:
+
+1. The UDS to interact with the AF_XDP Device Plugin. OR
+2. The pinned xskmap to use when creating AF_XDP sockets.
+
+If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` 
arguments then
+the AF_XDP PMD configures it internally to the `AF_XDP Device Plugin for 
Kubernetes`_.
 
 .. note::
 
@@ -312,8 +323,18 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
--no-mlockall --in-memory \
-- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
 
+  Or
+
+  .. code-block:: console
+
+ kubectl exec -i  --container  -- \
+   //dpdk-testpmd -l 0,1 --no-pci \
+   --vdev=net_af_xdp0,use_pinned_map=1,iface=,dp_path="/tmp/afxdp_dp//xsks_map" \
+   --no-mlockall --in-memory \
+   -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
 .. note::
 
-If the ``dp_path`` parameter isn't explicitly set (like the example above)
-the AF_XDP PMD will set the parameter value to
-``/tmp/afxdp_dp/<>/afxdp.sock``.
+If the ``dp_path`` parameter isn't explicitly set with ``use_cni`` or 
``use_pinned_map``
+the AF_XDP PMD will set the parameter values to the `AF_XDP Device Plugin 
for Kubernetes`_
+defaults.
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 7f8651beda..940bbf60f2 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -171,13 +171,35 @@ enable the `AF_XDP Device Plugin for Kubernetes`_ with a 
DPDK application/pod.
so enabling and disabling of the promiscuous mode through the DPDK 
application
is also not supported.
 
+use_pinned_map
+~~
+
+The EAL vdev argument ``use_pinned_map`` is used to indicate that the user 
wishes to
+load a pinned xskmap mounted by `AF_XDP Device Plugin for Kubernetes`_ in the 
DPDK
+application/pod.
+
+.. _AF_XDP Device Plugin for Kubernetes: 
https://github.com/intel/afxdp-plugins-for-kubernetes
+
+.. code-block:: console
+
+   --vdev=net_af_xdp0,use_pinned_map=1
+
+.. note::
+
+This feature can also be used with any external entity that can pin an 
eBPF map, not just
+the `AF_XDP Device Plugin for Kubernetes`_.
+
 dp_path
 ~~~
 
-The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
-to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
-`AF_XDP Device Plugin for Kubernetes`_. If this argument is not passed
-alongside the ``use_cni`` argument then the AF_XDP PMD configures it 
internally.
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` or 
``use_pinned_map``
+arguments to explicitly tell the AF_XDP PMD where to find either:
+
+1. The UDS to interact with the AF_XDP Device Plugin. OR
+2. The pinned xskmap to use when creating AF_XDP sockets.
+
+If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` 
arguments then
+the AF_XDP PMD configures it internally to the `AF_XDP Device Plugin for 
Kubernetes`_.
 
 .. _AF_XDP Device Plugin for Kubernetes: 
https://github.com/intel/afxdp-plugins-for-kubernetes
 
@@ -185,6 +207,10 @@ alongside the ``use_cni`` argument then the AF_XDP PMD 
configures it internally.
 
--vdev=net_af_x

[PATCH v2 0/3] net/virtio: support IOVA as PA mode for vDPA backend

2024-02-29 Thread Srujana Challa
This patch series makes Virtio-user work in IOVA-as-PA mode
for the vDPA backend.

The first patch fixes an issue with having buffer IOVA addresses in
control queue descriptors.
The second and third patches help share the descriptor IOVA address
with the vhost backend, and also disable the use_va flag for the
vDPA backend type.

v1->v2:
- Split single patch into three patches.

Srujana Challa (3):
  net/virtio_user: avoid cq descriptor buffer address accessing
  net/virtio: store desc IOVA address in vring data structure
  net/virtio_user: support sharing vq descriptor IOVA to the backend

 drivers/net/virtio/virtio_ring.h  | 12 ++-
 .../net/virtio/virtio_user/virtio_user_dev.c  | 94 ++-
 drivers/net/virtio/virtio_user_ethdev.c   | 10 +-
 drivers/net/virtio/virtqueue.c|  4 +-
 4 files changed, 69 insertions(+), 51 deletions(-)

-- 
2.25.1



[PATCH v2 1/3] net/virtio_user: avoid cq descriptor buffer address accessing

2024-02-29 Thread Srujana Challa
This patch avoids accessing the descriptor buffer address
while processing the shadow control queue, so that Virtio-user
can work with an IOVA as the descriptor buffer address.
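
For illustration, a condensed sketch of the idea (mirroring the diff
below, not a drop-in replacement): the control command is read, and its
status written back, through the control queue's header memzone virtual
address instead of dereferencing vring->desc[].addr:

    virtio_net_ctrl_ack status = ~0;
    /* Access the control command via VA from the CVQ header memzone. */
    struct virtio_pmd_ctrl *ctrl = dev->hw.cvq->hdr_mz->addr;

    if (ctrl->hdr.class == VIRTIO_NET_CTRL_MQ &&
        ctrl->hdr.cmd == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET)
        status = virtio_user_handle_mq(dev, *(uint16_t *)ctrl->data);
    ctrl->status = status;  /* written back through the same VA */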

Signed-off-by: Srujana Challa 
---
 .../net/virtio/virtio_user/virtio_user_dev.c  | 68 +--
 1 file changed, 33 insertions(+), 35 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index d395fc1676..bf3da4340f 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -885,11 +885,11 @@ static uint32_t
 virtio_user_handle_ctrl_msg_split(struct virtio_user_dev *dev, struct vring 
*vring,
uint16_t idx_hdr)
 {
-   struct virtio_net_ctrl_hdr *hdr;
virtio_net_ctrl_ack status = ~0;
-   uint16_t i, idx_data, idx_status;
+   uint16_t i, idx_data;
uint32_t n_descs = 0;
int dlen[CVQ_MAX_DATA_DESCS], nb_dlen = 0;
+   struct virtio_pmd_ctrl *ctrl;
 
/* locate desc for header, data, and status */
idx_data = vring->desc[idx_hdr].next;
@@ -902,34 +902,33 @@ virtio_user_handle_ctrl_msg_split(struct virtio_user_dev 
*dev, struct vring *vri
n_descs++;
}
 
-   /* locate desc for status */
-   idx_status = i;
n_descs++;
 
-   hdr = (void *)(uintptr_t)vring->desc[idx_hdr].addr;
-   if (hdr->class == VIRTIO_NET_CTRL_MQ &&
-   hdr->cmd == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
-   uint16_t queues;
+   /* Access control command via VA from CVQ */
+   ctrl = (struct virtio_pmd_ctrl *)dev->hw.cvq->hdr_mz->addr;
+   if (ctrl->hdr.class == VIRTIO_NET_CTRL_MQ &&
+   ctrl->hdr.cmd == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
+   uint16_t *queues;
 
-   queues = *(uint16_t *)(uintptr_t)vring->desc[idx_data].addr;
-   status = virtio_user_handle_mq(dev, queues);
-   } else if (hdr->class == VIRTIO_NET_CTRL_MQ && hdr->cmd == 
VIRTIO_NET_CTRL_MQ_RSS_CONFIG) {
+   queues = (uint16_t *)ctrl->data;
+   status = virtio_user_handle_mq(dev, *queues);
+   } else if (ctrl->hdr.class == VIRTIO_NET_CTRL_MQ &&
+  ctrl->hdr.cmd == VIRTIO_NET_CTRL_MQ_RSS_CONFIG) {
struct virtio_net_ctrl_rss *rss;
 
-   rss = (struct virtio_net_ctrl_rss 
*)(uintptr_t)vring->desc[idx_data].addr;
+   rss = (struct virtio_net_ctrl_rss *)ctrl->data;
status = virtio_user_handle_mq(dev, rss->max_tx_vq);
-   } else if (hdr->class == VIRTIO_NET_CTRL_RX  ||
-  hdr->class == VIRTIO_NET_CTRL_MAC ||
-  hdr->class == VIRTIO_NET_CTRL_VLAN) {
+   } else if (ctrl->hdr.class == VIRTIO_NET_CTRL_RX  ||
+  ctrl->hdr.class == VIRTIO_NET_CTRL_MAC ||
+  ctrl->hdr.class == VIRTIO_NET_CTRL_VLAN) {
status = 0;
}
 
if (!status && dev->scvq)
-   status = virtio_send_command(&dev->scvq->cq,
-   (struct virtio_pmd_ctrl *)hdr, dlen, nb_dlen);
+   status = virtio_send_command(&dev->scvq->cq, ctrl, dlen, 
nb_dlen);
 
/* Update status */
-   *(virtio_net_ctrl_ack *)(uintptr_t)vring->desc[idx_status].addr = 
status;
+   ctrl->status = status;
 
return n_descs;
 }
@@ -948,7 +947,7 @@ virtio_user_handle_ctrl_msg_packed(struct virtio_user_dev 
*dev,
   struct vring_packed *vring,
   uint16_t idx_hdr)
 {
-   struct virtio_net_ctrl_hdr *hdr;
+   struct virtio_pmd_ctrl *ctrl;
virtio_net_ctrl_ack status = ~0;
uint16_t idx_data, idx_status;
/* initialize to one, header is first */
@@ -971,32 +970,31 @@ virtio_user_handle_ctrl_msg_packed(struct virtio_user_dev 
*dev,
n_descs++;
}
 
-   hdr = (void *)(uintptr_t)vring->desc[idx_hdr].addr;
-   if (hdr->class == VIRTIO_NET_CTRL_MQ &&
-   hdr->cmd == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
-   uint16_t queues;
+   /* Access control command via VA from CVQ */
+   ctrl = (struct virtio_pmd_ctrl *)dev->hw.cvq->hdr_mz->addr;
+   if (ctrl->hdr.class == VIRTIO_NET_CTRL_MQ &&
+   ctrl->hdr.cmd == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
+   uint16_t *queues;
 
-   queues = *(uint16_t *)(uintptr_t)
-   vring->desc[idx_data].addr;
-   status = virtio_user_handle_mq(dev, queues);
-   } else if (hdr->class == VIRTIO_NET_CTRL_MQ && hdr->cmd == 
VIRTIO_NET_CTRL_MQ_RSS_CONFIG) {
+   queues = (uint16_t *)ctrl->data;
+   status = virtio_user_handle_mq(dev, *queues);
+   } else if (ctrl->hdr.class == VIRTIO_NET_CTRL_MQ &&
+  ctrl->hdr.cmd == VIRTIO_NET_CTRL_MQ_RSS_CONFIG) {
struct virtio_net_ctrl_rss 

[PATCH v2 2/3] net/virtio: store desc IOVA address in vring data structure

2024-02-29 Thread Srujana Challa
Stores the desc IOVA in the queue's vring data structure,
as preliminary work to provide a way for Virtio-user
to share the desc IOVA with the vhost backend.

Signed-off-by: Srujana Challa 
---
 drivers/net/virtio/virtio_ring.h | 12 
 drivers/net/virtio/virtqueue.c   |  4 ++--
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio/virtio_ring.h b/drivers/net/virtio/virtio_ring.h
index e848c0b73b..998605dbb5 100644
--- a/drivers/net/virtio/virtio_ring.h
+++ b/drivers/net/virtio/virtio_ring.h
@@ -83,6 +83,7 @@ struct vring_packed_desc_event {
 
 struct vring_packed {
unsigned int num;
+   rte_iova_t desc_iova;
struct vring_packed_desc *desc;
struct vring_packed_desc_event *driver;
struct vring_packed_desc_event *device;
@@ -90,6 +91,7 @@ struct vring_packed {
 
 struct vring {
unsigned int num;
+   rte_iova_t desc_iova;
struct vring_desc  *desc;
struct vring_avail *avail;
struct vring_used  *used;
@@ -149,11 +151,12 @@ vring_size(struct virtio_hw *hw, unsigned int num, 
unsigned long align)
return size;
 }
 static inline void
-vring_init_split(struct vring *vr, uint8_t *p, unsigned long align,
-unsigned int num)
+vring_init_split(struct vring *vr, uint8_t *p, rte_iova_t iova,
+unsigned long align, unsigned int num)
 {
vr->num = num;
vr->desc = (struct vring_desc *) p;
+   vr->desc_iova = iova;
vr->avail = (struct vring_avail *) (p +
num * sizeof(struct vring_desc));
vr->used = (void *)
@@ -161,11 +164,12 @@ vring_init_split(struct vring *vr, uint8_t *p, unsigned 
long align,
 }
 
 static inline void
-vring_init_packed(struct vring_packed *vr, uint8_t *p, unsigned long align,
-unsigned int num)
+vring_init_packed(struct vring_packed *vr, uint8_t *p, rte_iova_t iova,
+ unsigned long align, unsigned int num)
 {
vr->num = num;
vr->desc = (struct vring_packed_desc *)p;
+   vr->desc_iova = iova;
vr->driver = (struct vring_packed_desc_event *)(p +
vr->num * sizeof(struct vring_packed_desc));
vr->device = (struct vring_packed_desc_event *)
diff --git a/drivers/net/virtio/virtqueue.c b/drivers/net/virtio/virtqueue.c
index 6f419665f1..cf46abfd06 100644
--- a/drivers/net/virtio/virtqueue.c
+++ b/drivers/net/virtio/virtqueue.c
@@ -282,13 +282,13 @@ virtio_init_vring(struct virtqueue *vq)
vq->vq_free_cnt = vq->vq_nentries;
memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
if (virtio_with_packed_queue(vq->hw)) {
-   vring_init_packed(&vq->vq_packed.ring, ring_mem,
+   vring_init_packed(&vq->vq_packed.ring, ring_mem, 
vq->vq_ring_mem,
  VIRTIO_VRING_ALIGN, size);
vring_desc_init_packed(vq, size);
} else {
struct vring *vr = &vq->vq_split.ring;
 
-   vring_init_split(vr, ring_mem, VIRTIO_VRING_ALIGN, size);
+   vring_init_split(vr, ring_mem, vq->vq_ring_mem, 
VIRTIO_VRING_ALIGN, size);
vring_desc_init_split(vr->desc, size);
}
/*
-- 
2.25.1



[PATCH v2 3/3] net/virtio_user: support sharing vq descriptor IOVA to the backend

2024-02-29 Thread Srujana Challa
Adds support for sharing the descriptor IOVA with the vhost backend.
This makes the virtio_user driver work in IOVA as PA mode when
the use_va flag is disabled.
This patch also disables the use_va flag for the vDPA backend.
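
For the packed ring, the addresses shared with the backend are derived
from the descriptor area IOVA, relying on the standard contiguous vring
layout (condensed from the diff below):

    desc_addr  = pq_vring->desc_iova;
    avail_addr = desc_addr + pq_vring->num * sizeof(struct vring_packed_desc);
    used_addr  = RTE_ALIGN_CEIL(avail_addr + sizeof(struct vring_packed_desc_event),
                                VIRTIO_VRING_ALIGN);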

Signed-off-by: Srujana Challa 
---
 .../net/virtio/virtio_user/virtio_user_dev.c  | 26 ---
 drivers/net/virtio/virtio_user_ethdev.c   | 10 ++-
 2 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index bf3da4340f..8ad10e6354 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -62,6 +62,7 @@ virtio_user_kick_queue(struct virtio_user_dev *dev, uint32_t 
queue_sel)
struct vhost_vring_state state;
struct vring *vring = &dev->vrings.split[queue_sel];
struct vring_packed *pq_vring = &dev->vrings.packed[queue_sel];
+   uint64_t desc_addr, avail_addr, used_addr;
struct vhost_vring_addr addr = {
.index = queue_sel,
.log_guest_addr = 0,
@@ -81,16 +82,23 @@ virtio_user_kick_queue(struct virtio_user_dev *dev, 
uint32_t queue_sel)
}
 
if (dev->features & (1ULL << VIRTIO_F_RING_PACKED)) {
-   addr.desc_user_addr =
-   (uint64_t)(uintptr_t)pq_vring->desc;
-   addr.avail_user_addr =
-   (uint64_t)(uintptr_t)pq_vring->driver;
-   addr.used_user_addr =
-   (uint64_t)(uintptr_t)pq_vring->device;
+   desc_addr = pq_vring->desc_iova;
+   avail_addr = desc_addr + pq_vring->num * sizeof(struct 
vring_packed_desc);
+   used_addr =  RTE_ALIGN_CEIL(avail_addr + sizeof(struct 
vring_packed_desc_event),
+   VIRTIO_VRING_ALIGN);
+
+   addr.desc_user_addr = desc_addr;
+   addr.avail_user_addr = avail_addr;
+   addr.used_user_addr = used_addr;
} else {
-   addr.desc_user_addr = (uint64_t)(uintptr_t)vring->desc;
-   addr.avail_user_addr = (uint64_t)(uintptr_t)vring->avail;
-   addr.used_user_addr = (uint64_t)(uintptr_t)vring->used;
+   desc_addr = vring->desc_iova;
+   avail_addr = desc_addr + vring->num * sizeof(struct vring_desc);
+   used_addr = 
RTE_ALIGN_CEIL((uintptr_t)(&vring->avail->ring[vring->num]),
+  VIRTIO_VRING_ALIGN);
+
+   addr.desc_user_addr = desc_addr;
+   addr.avail_user_addr = avail_addr;
+   addr.used_user_addr = used_addr;
}
 
state.index = queue_sel;
diff --git a/drivers/net/virtio/virtio_user_ethdev.c 
b/drivers/net/virtio/virtio_user_ethdev.c
index bf9de36d8f..ae6593ba0b 100644
--- a/drivers/net/virtio/virtio_user_ethdev.c
+++ b/drivers/net/virtio/virtio_user_ethdev.c
@@ -198,6 +198,7 @@ virtio_user_setup_queue_packed(struct virtqueue *vq,
   sizeof(struct vring_packed_desc_event),
   VIRTIO_VRING_ALIGN);
vring->num = vq->vq_nentries;
+   vring->desc_iova = vq->vq_ring_mem;
vring->desc = (void *)(uintptr_t)desc_addr;
vring->driver = (void *)(uintptr_t)avail_addr;
vring->device = (void *)(uintptr_t)used_addr;
@@ -221,6 +222,7 @@ virtio_user_setup_queue_split(struct virtqueue *vq, struct 
virtio_user_dev *dev)
   VIRTIO_VRING_ALIGN);
 
dev->vrings.split[queue_idx].num = vq->vq_nentries;
+   dev->vrings.split[queue_idx].desc_iova = vq->vq_ring_mem;
dev->vrings.split[queue_idx].desc = (void *)(uintptr_t)desc_addr;
dev->vrings.split[queue_idx].avail = (void *)(uintptr_t)avail_addr;
dev->vrings.split[queue_idx].used = (void *)(uintptr_t)used_addr;
@@ -689,7 +691,13 @@ virtio_user_pmd_probe(struct rte_vdev_device *vdev)
 * Virtio-user requires using virtual addresses for the descriptors
 * buffers, whatever other devices require
 */
-   hw->use_va = true;
+   if (backend_type == VIRTIO_USER_BACKEND_VHOST_VDPA)
+   /* VDPA backend requires using iova for the buffers to make it
+* work in IOVA as PA mode also.
+*/
+   hw->use_va = false;
+   else
+   hw->use_va = true;
 
/* previously called by pci probing for physical dev */
if (eth_virtio_dev_init(eth_dev) < 0) {
-- 
2.25.1



Re: [PATCH 1/7] vhost: fix VDUSE device destruction failure

2024-02-29 Thread David Marchand
Hey Maxime,

On Thu, Feb 29, 2024 at 1:25 PM Maxime Coquelin
 wrote:
>
> VDUSE_DESTROY_DEVICE ioctl can fail because the device's
> chardev is not released despite close syscall having been
> called. It happens because the events handler thread is
> still polling the file descriptor.
>
> fdset_pipe_notify() is not enough because it does not
> ensure the notification has been handled by the event
> thread, it just returns once the notification is sent.
>
> To fix this, this patch introduces a synchronization
> mechanism based on pthread's condition, so that
> fdset_pipe_notify() only returns once the pipe's read
> callback has been executed.
>
> Fixes: 51d018fdac4e ("vhost: add VDUSE events handler")

This looks to be a generic issue in the fd_man code.
In practice, VDUSE only seems to be affected, so I am ok with this Fixes: tag.


> Cc: sta...@dpdk.org
>
> Signed-off-by: Maxime Coquelin 
> ---
>  lib/vhost/fd_man.c | 21 ++---
>  lib/vhost/fd_man.h |  5 +
>  2 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/lib/vhost/fd_man.c b/lib/vhost/fd_man.c
> index 79a8d2c006..42ce059039 100644
> --- a/lib/vhost/fd_man.c
> +++ b/lib/vhost/fd_man.c
> @@ -309,10 +309,11 @@ fdset_event_dispatch(void *arg)
>  }
>
>  static void
> -fdset_pipe_read_cb(int readfd, void *dat __rte_unused,
> +fdset_pipe_read_cb(int readfd, void *dat,
>int *remove __rte_unused)
>  {
> char charbuf[16];
> +   struct fdset *fdset = dat;
> int r = read(readfd, charbuf, sizeof(charbuf));
> /*
>  * Just an optimization, we don't care if read() failed
> @@ -320,6 +321,11 @@ fdset_pipe_read_cb(int readfd, void *dat __rte_unused,
>  * compiler happy
>  */
> RTE_SET_USED(r);
> +
> +   pthread_mutex_lock(&fdset->sync_mutex);
> +   fdset->sync = true;
> +   pthread_cond_broadcast(&fdset->sync_cond);
> +   pthread_mutex_unlock(&fdset->sync_mutex);
>  }
>
>  void
> @@ -342,7 +348,7 @@ fdset_pipe_init(struct fdset *fdset)
> }
>
> ret = fdset_add(fdset, fdset->u.readfd,
> -   fdset_pipe_read_cb, NULL, NULL);
> +   fdset_pipe_read_cb, NULL, fdset);
>
> if (ret < 0) {
> VHOST_FDMAN_LOG(ERR,
> @@ -359,7 +365,12 @@ fdset_pipe_init(struct fdset *fdset)
>  void
>  fdset_pipe_notify(struct fdset *fdset)
>  {
> -   int r = write(fdset->u.writefd, "1", 1);
> +   int r;
> +
> +   pthread_mutex_lock(&fdset->sync_mutex);
> +
> +   fdset->sync = false;
> +   r = write(fdset->u.writefd, "1", 1);
> /*
>  * Just an optimization, we don't care if write() failed
>  * so ignore explicitly its return value to make the
> @@ -367,4 +378,8 @@ fdset_pipe_notify(struct fdset *fdset)
>  */
> RTE_SET_USED(r);
>
> +   while (!fdset->sync)
> +   pthread_cond_wait(&fdset->sync_cond, &fdset->sync_mutex);
> +
> +   pthread_mutex_unlock(&fdset->sync_mutex);
>  }
> diff --git a/lib/vhost/fd_man.h b/lib/vhost/fd_man.h
> index 6315904c8e..cc19937612 100644
> --- a/lib/vhost/fd_man.h
> +++ b/lib/vhost/fd_man.h
> @@ -6,6 +6,7 @@
>  #define _FD_MAN_H_
>  #include 
>  #include 
> +#include 
>
>  #define MAX_FDS 1024
>
> @@ -35,6 +36,10 @@ struct fdset {
> int writefd;
> };
> } u;
> +
> +   pthread_mutex_t sync_mutex;
> +   pthread_cond_t sync_cond;
> +   bool sync;

We should explicitly initialise those in
https://git.dpdk.org/dpdk/tree/lib/vhost/socket.c#n91 and
https://git.dpdk.org/dpdk/tree/lib/vhost/vduse.c#n34.
The rest looks acceptable to me.


-- 
David Marchand



RE: [PATCH] net/mlx5: fix counter cache starvation

2024-02-29 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Dariusz Sosnowski 
> Sent: Wednesday, February 28, 2024 9:06 PM
> To: Slava Ovsiienko ; Ori Kam ;
> Suanming Mou ; Matan Azrad
> ; Jack Min 
> Cc: dev@dpdk.org; sta...@dpdk.org; Bing Zhao 
> Subject: [PATCH] net/mlx5: fix counter cache starvation
> 
> mlx5 PMD maintains a global counter pool and per-queue counter cache,
> which are used to allocate COUNT flow action objects.
> Whenever an empty cache is accessed, it is replenished with a pre-defined
> number of counters.
> 
> If number of configured counters was sufficiently small, then it might have
> happened that caches associated with some queues could get starved because
> all counters were fetched on other queues.
> 
> This patch fixes that by disabling cache at runtime if number of configured
> counters is not sufficient to avoid such starvation.
> 
> Fixes: 4d368e1da3a4 ("net/mlx5: support flow counter action for HWS")
> Cc: jack...@nvidia.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Dariusz Sosnowski 
> Acked-by: Ori Kam 
> Acked-by: Bing Zhao 

Patch applied to next-net-mlx,
Kindest regards
Raslan Darawsheh


RE: [PATCH] net/mlx5: fix the counters map in bonding mode

2024-02-29 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Bing Zhao 
> Sent: Thursday, February 29, 2024 11:35 AM
> To: Slava Ovsiienko ; dev@dpdk.org; Raslan
> Darawsheh 
> Cc: Ori Kam ; Dariusz Sosnowski
> ; Suanming Mou ;
> Matan Azrad ; Xueming Li ;
> sta...@dpdk.org
> Subject: [PATCH] net/mlx5: fix the counters map in bonding mode
> 
> In the HW-LAG mode, there is only one mlx5 IB device with 2 ETH interfaces. In
> theory, the settings on both ports should be the same.
> But in real life, some inconsistent settings may be done by the user
> and the PMD is not aware of this.
> 
> In the previous implementation, the xstats map was generated from the
> information fetched on the 1st port of a bonding interface. If the 2nd
> port had different settings, the number and the order of the counters
> may differ from that of the 1st one. The ioctl() call may corrupt the
> user buffers (copy_to_user) and cause a crash.
> 
> The commit will change the map between the driver counters to the PMD user
> counters.
>   1. Switch the inner and outer loop to speed up the initialization
>  time AMAP - since there will be >300 counters returned from the
>  driver.
>   2. Generate a unique map for both ports in LAG mode.
> a. Scan the 1st port and find the supported counters' strings,
>then add to the map.
> b. In bonding, scan the 2nd port and find the strings. If one is
>already in the map, use the index. Or append to the next free
>slot.
> c. Append the device counters that needs to be fetched via sysfs
>or Devx command. This kind of counter(s) is unique per IB
>device.
> 
> After querying the statistics from the driver, the value will be read from the
> proper offset in the "struct ethtool_stats" and then added into the output
> array based on the map information. In bonding mode, the statistics from
> both ports will be accumulated if the counters are valid on both ports.
> 
> Compared to the system call or Devx command, the overhead introduced by
> the extra index comparison is light and should not cause a significant
> degradation.
> 
> The application should ensure that the port settings should not be changed
> out of the DPDK application dynamically in most cases. Or else the change
> cannot be notified and the counters map might not be valid when the number
> doesn't change but the counters set had changed. A device restart will help to
> re-initialize the map from scratch.
> 
> Fixes: 7ed15acdcd69 ("net/mlx5: improve xstats of bonding port")
> Cc: xuemi...@nvidia.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Bing Zhao 
> Acked-by: Viacheslav Ovsiienko 
Patch applied to next-net-mlx,

Kindest regards
Raslan Darawsheh


RE: [PATCH] net/mlx5: fix action template expansion: support indirect actions list

2024-02-29 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Gregory Etelson 
> Sent: Thursday, February 29, 2024 12:20 PM
> To: dev@dpdk.org
> Cc: Gregory Etelson ; Maayan Kashani
> ; Raslan Darawsheh ;
> sta...@dpdk.org; Suanming Mou ; Dariusz
> Sosnowski ; Slava Ovsiienko
> ; Ori Kam ; Matan Azrad
> 
> Subject: [PATCH] net/mlx5: fix action template expansion: support indirect
> actions list
> 
> MLX5 PMD actions template compilation may implicitly add MODIFY_HEADER
> to actions list provided by application.
> MLX5 actions in a template list must be arranged according to the HW
> supported order.
> The PMD must place new MODIFY_HEADER in the correct location relative to
> existing actions.
> 
> The patch adds indirect actions list to calculation of the new MODIFY_HEADER
> location.
> 
> Fixes: e26f50adbf38 ("net/mlx5: support indirect list meter mark action")
> 
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Gregory Etelson 
> Acked-by: Suanming Mou 
> ---
Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh



Re: [PATCH 2/7] vhost: rename polling mutex

2024-02-29 Thread David Marchand
On Thu, Feb 29, 2024 at 1:25 PM Maxime Coquelin
 wrote:
>
> This trivial patch fixes a typo in fd's manager polling
> mutex name.
>
> Signed-off-by: Maxime Coquelin 

Reviewed-by: David Marchand 


-- 
David Marchand



Re: [PATCH 3/7] vhost: make use of FD manager init function

2024-02-29 Thread David Marchand
On Thu, Feb 29, 2024 at 1:25 PM Maxime Coquelin
 wrote:
>
> Instead of statically initialize the fdset, this patch
> converts VDUSE and Vhost-user to use fdset_init() function,
> which now also initialize the mutexes.
>
> This is preliminary rework to hide FDs manager pipe from
> its users.
>
> Signed-off-by: Maxime Coquelin 
> ---
>  lib/vhost/fd_man.c |  9 +++--
>  lib/vhost/fd_man.h |  2 +-
>  lib/vhost/socket.c | 11 +--
>  lib/vhost/vduse.c  | 14 ++
>  4 files changed, 19 insertions(+), 17 deletions(-)
>
> diff --git a/lib/vhost/fd_man.c b/lib/vhost/fd_man.c
> index 5dde40e51a..d33036a171 100644
> --- a/lib/vhost/fd_man.c
> +++ b/lib/vhost/fd_man.c
> @@ -96,19 +96,24 @@ fdset_add_fd(struct fdset *pfdset, int idx, int fd,
> pfd->revents = 0;
>  }
>
> -void
> +int
>  fdset_init(struct fdset *pfdset)
>  {
> int i;
>
> if (pfdset == NULL)
> -   return;
> +   return -1;
> +
> +   pthread_mutex_init(&pfdset->fd_mutex, NULL);
> +   pthread_mutex_init(&pfdset->fd_polling_mutex, NULL);

Following patch 1, we are missing init of sync_* variables.


>
> for (i = 0; i < MAX_FDS; i++) {
> pfdset->fd[i].fd = -1;
> pfdset->fd[i].dat = NULL;
> }
> pfdset->num = 0;
> +
> +   return 0;
>  }
>
>  /**
> diff --git a/lib/vhost/fd_man.h b/lib/vhost/fd_man.h
> index 2517ae5a9b..92d24d8591 100644
> --- a/lib/vhost/fd_man.h
> +++ b/lib/vhost/fd_man.h
> @@ -43,7 +43,7 @@ struct fdset {
>  };
>
>
> -void fdset_init(struct fdset *pfdset);
> +int fdset_init(struct fdset *pfdset);
>
>  int fdset_add(struct fdset *pfdset, int fd,
> fd_cb rcb, fd_cb wcb, void *dat);
> diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
> index a2fdac30a4..b544e39be7 100644
> --- a/lib/vhost/socket.c
> +++ b/lib/vhost/socket.c
> @@ -89,12 +89,6 @@ static int create_unix_socket(struct vhost_user_socket 
> *vsocket);
>  static int vhost_user_start_client(struct vhost_user_socket *vsocket);
>
>  static struct vhost_user vhost_user = {
> -   .fdset = {
> -   .fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
> -   .fd_mutex = PTHREAD_MUTEX_INITIALIZER,
> -   .fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
> -   .num = 0
> -   },
> .vsocket_cnt = 0,
> .mutex = PTHREAD_MUTEX_INITIALIZER,
>  };
> @@ -1187,6 +1181,11 @@ rte_vhost_driver_start(const char *path)
> return vduse_device_create(path, 
> vsocket->net_compliant_ol_flags);
>
> if (fdset_tid.opaque_id == 0) {
> +   if (fdset_init(&vhost_user.fdset) < 0) {
> +   VHOST_CONFIG_LOG(path, ERR, "Failed to init 
> Vhost-user fdset");

Nit: other log messages in this function are not consistent wrt
starting with a capital letter.


> +   return -1;
> +   }
> +
> /**
>  * create a pipe which will be waited by poll and notified to
>  * rebuild the wait list of poll.
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index d462428d2c..d83d7b0d7c 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -31,14 +31,7 @@ struct vduse {
> struct fdset fdset;
>  };
>
> -static struct vduse vduse = {
> -   .fdset = {
> -   .fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
> -   .fd_mutex = PTHREAD_MUTEX_INITIALIZER,
> -   .fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
> -   .num = 0
> -   },
> -};
> +static struct vduse vduse;
>
>  static bool vduse_events_thread;
>
> @@ -434,6 +427,11 @@ vduse_device_create(const char *path, bool 
> compliant_ol_flags)
>
> /* If first device, create events dispatcher thread */
> if (vduse_events_thread == false) {
> +   if (fdset_init(&vduse.fdset) < 0) {
> +   VHOST_CONFIG_LOG(path, ERR, "Failed to init VDUSE 
> fdset");
> +   return -1;
> +   }
> +

Nit: idem
+ other log messages use "vduse fdset".


-- 
David Marchand



[v11 0/4] PCI Dev and SG copy support

2024-02-29 Thread Gowrishankar Muthukrishnan
Improve dma-perf application to support PCI dev and SG copy,
along with additional support as below:
 - validate copied memory
 - skip tests if not opted.

v11:
- Review suggestions.

Gowrishankar Muthukrishnan (4):
  app/dma-perf: add skip support
  app/dma-perf: add PCI device support
  app/dma-perf: validate copied memory
  app/dma-perf: add SG copy support

 app/test-dma-perf/benchmark.c | 413 ++
 app/test-dma-perf/config.ini  |  56 +
 app/test-dma-perf/main.c  | 178 ---
 app/test-dma-perf/main.h  |  13 +-
 4 files changed, 595 insertions(+), 65 deletions(-)

-- 
2.25.1



[v11 2/4] app/dma-perf: add PCI device support

2024-02-29 Thread Gowrishankar Muthukrishnan
Add support to test performance for "device to memory" and
"memory to device" data transfer.

Signed-off-by: Amit Prakash Shukla 
Acked-by: Anoob Joseph 
Acked-by: Chengwen Feng 
---
v11:
 - config file formatting.

 app/test-dma-perf/benchmark.c | 119 ++
 app/test-dma-perf/config.ini  |  31 +
 app/test-dma-perf/main.c  |  77 ++
 app/test-dma-perf/main.h  |   7 ++
 4 files changed, 222 insertions(+), 12 deletions(-)

diff --git a/app/test-dma-perf/benchmark.c b/app/test-dma-perf/benchmark.c
index 9b1f58c78c..3c4fddb138 100644
--- a/app/test-dma-perf/benchmark.c
+++ b/app/test-dma-perf/benchmark.c
@@ -127,17 +127,54 @@ cache_flush_buf(__rte_unused struct rte_mbuf **array,
 #endif
 }
 
+static int
+vchan_data_populate(uint32_t dev_id, struct rte_dma_vchan_conf *qconf,
+   struct test_configure *cfg)
+{
+   struct rte_dma_info info;
+
+   qconf->direction = cfg->transfer_dir;
+
+   rte_dma_info_get(dev_id, &info);
+   if (!(RTE_BIT64(qconf->direction) & info.dev_capa))
+   return -1;
+
+   qconf->nb_desc = cfg->ring_size.cur;
+
+   switch (qconf->direction) {
+   case RTE_DMA_DIR_MEM_TO_DEV:
+   qconf->dst_port.pcie.vfen = 1;
+   qconf->dst_port.port_type = RTE_DMA_PORT_PCIE;
+   qconf->dst_port.pcie.coreid = cfg->vchan_dev.port.pcie.coreid;
+   qconf->dst_port.pcie.vfid = cfg->vchan_dev.port.pcie.vfid;
+   qconf->dst_port.pcie.pfid = cfg->vchan_dev.port.pcie.pfid;
+   break;
+   case RTE_DMA_DIR_DEV_TO_MEM:
+   qconf->src_port.pcie.vfen = 1;
+   qconf->src_port.port_type = RTE_DMA_PORT_PCIE;
+   qconf->src_port.pcie.coreid = cfg->vchan_dev.port.pcie.coreid;
+   qconf->src_port.pcie.vfid = cfg->vchan_dev.port.pcie.vfid;
+   qconf->src_port.pcie.pfid = cfg->vchan_dev.port.pcie.pfid;
+   break;
+   case RTE_DMA_DIR_MEM_TO_MEM:
+   case RTE_DMA_DIR_DEV_TO_DEV:
+   break;
+   }
+
+   return 0;
+}
+
 /* Configuration of device. */
 static void
-configure_dmadev_queue(uint32_t dev_id, uint32_t ring_size)
+configure_dmadev_queue(uint32_t dev_id, struct test_configure *cfg)
 {
uint16_t vchan = 0;
struct rte_dma_info info;
struct rte_dma_conf dev_config = { .nb_vchans = 1 };
-   struct rte_dma_vchan_conf qconf = {
-   .direction = RTE_DMA_DIR_MEM_TO_MEM,
-   .nb_desc = ring_size
-   };
+   struct rte_dma_vchan_conf qconf = { 0 };
+
+   if (vchan_data_populate(dev_id, &qconf, cfg) != 0)
+   rte_exit(EXIT_FAILURE, "Error with vchan data populate.\n");
 
if (rte_dma_configure(dev_id, &dev_config) != 0)
rte_exit(EXIT_FAILURE, "Error with dma configure.\n");
@@ -159,7 +196,6 @@ configure_dmadev_queue(uint32_t dev_id, uint32_t ring_size)
 static int
 config_dmadevs(struct test_configure *cfg)
 {
-   uint32_t ring_size = cfg->ring_size.cur;
struct lcore_dma_map_t *ldm = &cfg->lcore_dma_map;
uint32_t nb_workers = ldm->cnt;
uint32_t i;
@@ -176,7 +212,7 @@ config_dmadevs(struct test_configure *cfg)
}
 
ldm->dma_ids[i] = dev_id;
-   configure_dmadev_queue(dev_id, ring_size);
+   configure_dmadev_queue(dev_id, cfg);
++nb_dmadevs;
}
 
@@ -302,13 +338,23 @@ do_cpu_mem_copy(void *p)
return 0;
 }
 
+static void
+dummy_free_ext_buf(void *addr, void *opaque)
+{
+   RTE_SET_USED(addr);
+   RTE_SET_USED(opaque);
+}
+
 static int
 setup_memory_env(struct test_configure *cfg, struct rte_mbuf ***srcs,
struct rte_mbuf ***dsts)
 {
-   unsigned int buf_size = cfg->buf_size.cur;
+   static struct rte_mbuf_ext_shared_info *ext_buf_info;
+   unsigned int cur_buf_size = cfg->buf_size.cur;
+   unsigned int buf_size = cur_buf_size + RTE_PKTMBUF_HEADROOM;
unsigned int nr_sockets;
uint32_t nr_buf = cfg->nr_buf;
+   uint32_t i;
 
nr_sockets = rte_socket_count();
if (cfg->src_numa_node >= nr_sockets ||
@@ -321,7 +367,7 @@ setup_memory_env(struct test_configure *cfg, struct 
rte_mbuf ***srcs,
nr_buf,
0,
0,
-   buf_size + RTE_PKTMBUF_HEADROOM,
+   buf_size,
cfg->src_numa_node);
if (src_pool == NULL) {
PRINT_ERR("Error with source mempool creation.\n");
@@ -332,7 +378,7 @@ setup_memory_env(struct test_configure *cfg, struct 
rte_mbuf ***srcs,
nr_buf,
0,
0,
-   buf_size + RTE_PKTMBUF_HEADROOM,
+   buf_size,
cfg->dst_numa_node);
if (dst_pool == NULL) {
PRI

[v11 1/4] app/dma-perf: add skip support

2024-02-29 Thread Gowrishankar Muthukrishnan
Add support to skip running a dma-perf test-case.
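
For example, a case section can opt out of a run like this (the "skip"
key is the one added by this patch; any value other than 1, or omitting
the key, keeps the case enabled):

    [case2]
    type=DMA_MEM_COPY
    skip=1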

Signed-off-by: Amit Prakash Shukla 
Acked-by: Anoob Joseph 
Acked-by: Chengwen Feng 
---
v11:
 - config file formatting

 app/test-dma-perf/config.ini |  2 ++
 app/test-dma-perf/main.c | 48 ++--
 app/test-dma-perf/main.h |  1 +
 3 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/app/test-dma-perf/config.ini b/app/test-dma-perf/config.ini
index b550f4b23f..bb0b1aa11a 100644
--- a/app/test-dma-perf/config.ini
+++ b/app/test-dma-perf/config.ini
@@ -30,6 +30,8 @@
 ; If you have already set the "-l" and "-a" parameters using EAL,
 ; make sure that the value of "lcore" falls within their range of values.
 
+; "skip" To skip a test-case set skip to 1.
+
 ; To specify a configuration file, use the "--config" flag followed by the 
path to the file.
 
 ; To specify a result file, use the "--result" flag followed by the path to 
the file.
diff --git a/app/test-dma-perf/main.c b/app/test-dma-perf/main.c
index 544784df50..e9e40e72e7 100644
--- a/app/test-dma-perf/main.c
+++ b/app/test-dma-perf/main.c
@@ -86,6 +86,19 @@ output_header(uint32_t case_id, struct test_configure 
*case_cfg)
output_csv(true);
 }
 
+static int
+open_output_csv(const char *rst_path_ptr)
+{
+   fd = fopen(rst_path_ptr, "a");
+   if (!fd) {
+   printf("Open output CSV file error.\n");
+   return 1;
+   }
+   output_csv(true);
+   fclose(fd);
+   return 0;
+}
+
 static void
 run_test_case(struct test_configure *case_cfg)
 {
@@ -322,6 +335,7 @@ load_configs(const char *path)
const char *case_type;
const char *lcore_dma;
const char *mem_size_str, *buf_size_str, *ring_size_str, 
*kick_batch_str;
+   const char *skip;
int args_nr, nb_vp;
bool is_dma;
 
@@ -341,6 +355,13 @@ load_configs(const char *path)
for (i = 0; i < nb_sections; i++) {
snprintf(section_name, CFG_NAME_LEN, "case%d", i + 1);
test_case = &test_cases[i];
+
+   skip = rte_cfgfile_get_entry(cfgfile, section_name, "skip");
+   if (skip && (atoi(skip) == 1)) {
+   test_case->is_skip = true;
+   continue;
+   }
+
case_type = rte_cfgfile_get_entry(cfgfile, section_name, 
"type");
if (case_type == NULL) {
printf("Error: No case type in case %d, the test will 
be finished here.\n",
@@ -525,31 +546,20 @@ main(int argc, char *argv[])
 
printf("Running cases...\n");
for (i = 0; i < case_nb; i++) {
-   if (!test_cases[i].is_valid) {
-   printf("Invalid test case %d.\n\n", i + 1);
-   snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Invalid 
case %d\n", i + 1);
-
-   fd = fopen(rst_path_ptr, "a");
-   if (!fd) {
-   printf("Open output CSV file error.\n");
+   if (test_cases[i].is_skip) {
+   printf("Test case %d configured to be skipped.\n\n", i 
+ 1);
+   snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Skip the 
test-case %d\n",
+i + 1);
+   if (open_output_csv(rst_path_ptr))
return 0;
-   }
-   output_csv(true);
-   fclose(fd);
continue;
}
 
-   if (test_cases[i].test_type == TEST_TYPE_NONE) {
-   printf("No valid test type in test case %d.\n\n", i + 
1);
+   if (!test_cases[i].is_valid) {
+   printf("Invalid test case %d.\n\n", i + 1);
snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Invalid 
case %d\n", i + 1);
-
-   fd = fopen(rst_path_ptr, "a");
-   if (!fd) {
-   printf("Open output CSV file error.\n");
+   if (open_output_csv(rst_path_ptr))
return 0;
-   }
-   output_csv(true);
-   fclose(fd);
continue;
}
 
diff --git a/app/test-dma-perf/main.h b/app/test-dma-perf/main.h
index 62085e6e8f..32670151af 100644
--- a/app/test-dma-perf/main.h
+++ b/app/test-dma-perf/main.h
@@ -40,6 +40,7 @@ struct lcore_dma_map_t {
 
 struct test_configure {
bool is_valid;
+   bool is_skip;
uint8_t test_type;
const char *test_type_str;
uint16_t src_numa_node;
-- 
2.25.1



[v11 3/4] app/dma-perf: validate copied memory

2024-02-29 Thread Gowrishankar Muthukrishnan
Validate copied memory to ensure DMA copy did not fail.
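
The check itself is straightforward (condensed from the diff below):
after all workers finish, each source/destination pair is compared for
mem2mem runs:

    if (memcmp(rte_pktmbuf_mtod(srcs[i], void *),
               rte_pktmbuf_mtod(dsts[i], void *),
               cfg->buf_size.cur) != 0)
        ret = -1;  /* copied data does not match the source */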

Signed-off-by: Gowrishankar Muthukrishnan 
Acked-by: Anoob Joseph 
Acked-by: Chengwen Feng 
---
 app/test-dma-perf/benchmark.c | 23 ++-
 app/test-dma-perf/main.c  | 16 +++-
 app/test-dma-perf/main.h  |  2 +-
 3 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/app/test-dma-perf/benchmark.c b/app/test-dma-perf/benchmark.c
index 3c4fddb138..9c155a58cc 100644
--- a/app/test-dma-perf/benchmark.c
+++ b/app/test-dma-perf/benchmark.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "main.h"
 
@@ -407,6 +408,11 @@ setup_memory_env(struct test_configure *cfg, struct 
rte_mbuf ***srcs,
return -1;
}
 
+   for (i = 0; i < nr_buf; i++) {
+   memset(rte_pktmbuf_mtod((*srcs)[i], void *), rte_rand(), 
buf_size);
+   memset(rte_pktmbuf_mtod((*dsts)[i], void *), 0, buf_size);
+   }
+
if (cfg->transfer_dir == RTE_DMA_DIR_DEV_TO_MEM ||
cfg->transfer_dir == RTE_DMA_DIR_MEM_TO_DEV) {
ext_buf_info = rte_malloc(NULL, sizeof(struct 
rte_mbuf_ext_shared_info), 0);
@@ -445,7 +451,7 @@ setup_memory_env(struct test_configure *cfg, struct 
rte_mbuf ***srcs,
return 0;
 }
 
-void
+int
 mem_copy_benchmark(struct test_configure *cfg, bool is_dma)
 {
uint32_t i;
@@ -463,6 +469,7 @@ mem_copy_benchmark(struct test_configure *cfg, bool is_dma)
uint32_t avg_cycles_total;
float mops, mops_total;
float bandwidth, bandwidth_total;
+   int ret = 0;
 
if (setup_memory_env(cfg, &srcs, &dsts) < 0)
goto out;
@@ -536,6 +543,18 @@ mem_copy_benchmark(struct test_configure *cfg, bool is_dma)
 
rte_eal_mp_wait_lcore();
 
+   if (cfg->transfer_dir == RTE_DMA_DIR_MEM_TO_MEM) {
+   for (i = 0; i < (nr_buf / nb_workers) * nb_workers; i++) {
+   if (memcmp(rte_pktmbuf_mtod(srcs[i], void *),
+  rte_pktmbuf_mtod(dsts[i], void *),
+  cfg->buf_size.cur) != 0) {
+   printf("Copy validation fails for buffer number 
%d\n", i);
+   ret = -1;
+   goto out;
+   }
+   }
+   }
+
mops_total = 0;
bandwidth_total = 0;
avg_cycles_total = 0;
@@ -601,4 +620,6 @@ mem_copy_benchmark(struct test_configure *cfg, bool is_dma)
rte_dma_stop(ldm->dma_ids[i]);
}
}
+
+   return ret;
 }
diff --git a/app/test-dma-perf/main.c b/app/test-dma-perf/main.c
index 051f76a6f9..df05bcd7df 100644
--- a/app/test-dma-perf/main.c
+++ b/app/test-dma-perf/main.c
@@ -101,20 +101,24 @@ open_output_csv(const char *rst_path_ptr)
return 0;
 }
 
-static void
+static int
 run_test_case(struct test_configure *case_cfg)
 {
+   int ret = 0;
+
switch (case_cfg->test_type) {
case TEST_TYPE_DMA_MEM_COPY:
-   mem_copy_benchmark(case_cfg, true);
+   ret = mem_copy_benchmark(case_cfg, true);
break;
case TEST_TYPE_CPU_MEM_COPY:
-   mem_copy_benchmark(case_cfg, false);
+   ret = mem_copy_benchmark(case_cfg, false);
break;
default:
printf("Unknown test type. %s\n", case_cfg->test_type_str);
break;
}
+
+   return ret;
 }
 
 static void
@@ -159,8 +163,10 @@ run_test(uint32_t case_id, struct test_configure *case_cfg)
case_cfg->scenario_id++;
printf("\nRunning scenario %d\n", case_cfg->scenario_id);
 
-   run_test_case(case_cfg);
-   output_csv(false);
+   if (run_test_case(case_cfg) < 0)
+   printf("\nTest fails! skipping this scenario.\n");
+   else
+   output_csv(false);
 
if (var_entry->op == OP_ADD)
var_entry->cur += var_entry->incr;
diff --git a/app/test-dma-perf/main.h b/app/test-dma-perf/main.h
index 745c24b7fe..1123e7524a 100644
--- a/app/test-dma-perf/main.h
+++ b/app/test-dma-perf/main.h
@@ -66,6 +66,6 @@ struct test_configure {
struct test_vchan_dev_config vchan_dev;
 };
 
-void mem_copy_benchmark(struct test_configure *cfg, bool is_dma);
+int mem_copy_benchmark(struct test_configure *cfg, bool is_dma);
 
 #endif /* MAIN_H */
-- 
2.25.1



[v11 4/4] app/dma-perf: add SG copy support

2024-02-29 Thread Gowrishankar Muthukrishnan
Add SG copy support.
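
As background, a scatter-gather copy submits arrays of rte_dma_sge
entries instead of a single flat buffer pair. A minimal illustrative
sketch of the dmadev call involved (src_iova0/src_iova1/dst_iova and the
lengths are placeholders, not taken from this patch):

    /* Two source segments gathered into one destination segment. */
    struct rte_dma_sge src[2] = {
        { .addr = src_iova0, .length = len0 },
        { .addr = src_iova1, .length = len1 },
    };
    struct rte_dma_sge dst[1] = {
        { .addr = dst_iova, .length = len0 + len1 },
    };

    if (rte_dma_copy_sg(dev_id, vchan, src, dst, 2, 1,
                        RTE_DMA_OP_FLAG_SUBMIT) < 0)
        rte_exit(EXIT_FAILURE, "SG copy enqueue failed.\n");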

Signed-off-by: Gowrishankar Muthukrishnan 
Acked-by: Anoob Joseph 
Acked-by: Chengwen Feng 
---
v11:
 - using struct for SGE config.

 app/test-dma-perf/benchmark.c | 283 ++
 app/test-dma-perf/config.ini  |  25 ++-
 app/test-dma-perf/main.c  |  41 -
 app/test-dma-perf/main.h  |   5 +-
 4 files changed, 317 insertions(+), 37 deletions(-)

diff --git a/app/test-dma-perf/benchmark.c b/app/test-dma-perf/benchmark.c
index 9c155a58cc..d821af8532 100644
--- a/app/test-dma-perf/benchmark.c
+++ b/app/test-dma-perf/benchmark.c
@@ -34,6 +34,13 @@ struct worker_info {
uint32_t test_cpl;
 };
 
+struct sge_info {
+   struct rte_dma_sge *srcs;
+   struct rte_dma_sge *dsts;
+   uint8_t nb_srcs;
+   uint8_t nb_dsts;
+};
+
 struct lcore_params {
uint8_t scenario_id;
unsigned int lcore_id;
@@ -46,6 +53,7 @@ struct lcore_params {
uint16_t test_secs;
struct rte_mbuf **srcs;
struct rte_mbuf **dsts;
+   struct sge_info sge;
volatile struct worker_info worker_info;
 };
 
@@ -86,21 +94,31 @@ calc_result(uint32_t buf_size, uint32_t nr_buf, uint16_t 
nb_workers, uint16_t te
 }
 
 static void
-output_result(uint8_t scenario_id, uint32_t lcore_id, char *dma_name, uint16_t 
ring_size,
-   uint16_t kick_batch, uint64_t ave_cycle, uint32_t 
buf_size, uint32_t nr_buf,
-   float memory, float bandwidth, float mops, bool is_dma)
+output_result(struct test_configure *cfg, struct lcore_params *para,
+   uint16_t kick_batch, uint64_t ave_cycle, uint32_t 
buf_size,
+   uint32_t nr_buf, float memory, float bandwidth, float 
mops)
 {
-   if (is_dma)
-   printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch Size: 
%u.\n",
-   lcore_id, dma_name, ring_size, kick_batch);
-   else
+   uint16_t ring_size = cfg->ring_size.cur;
+   uint8_t scenario_id = cfg->scenario_id;
+   uint32_t lcore_id = para->lcore_id;
+   char *dma_name = para->dma_name;
+
+   if (cfg->is_dma) {
+   printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch Size: 
%u", lcore_id,
+  dma_name, ring_size, kick_batch);
+   if (cfg->is_sg)
+   printf(" DMA src sges: %u, dst sges: %u",
+  para->sge.nb_srcs, para->sge.nb_dsts);
+   printf(".\n");
+   } else {
printf("lcore %u\n", lcore_id);
+   }
 
printf("Average Cycles/op: %" PRIu64 ", Buffer Size: %u B, Buffer 
Number: %u, Memory: %.2lf MB, Frequency: %.3lf Ghz.\n",
ave_cycle, buf_size, nr_buf, memory, 
rte_get_timer_hz()/10.0);
printf("Average Bandwidth: %.3lf Gbps, MOps: %.3lf\n", bandwidth, mops);
 
-   if (is_dma)
+   if (cfg->is_dma)
snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN, 
CSV_LINE_DMA_FMT,
scenario_id, lcore_id, dma_name, ring_size, kick_batch, 
buf_size,
nr_buf, memory, ave_cycle, bandwidth, mops);
@@ -167,7 +185,7 @@ vchan_data_populate(uint32_t dev_id, struct 
rte_dma_vchan_conf *qconf,
 
 /* Configuration of device. */
 static void
-configure_dmadev_queue(uint32_t dev_id, struct test_configure *cfg)
+configure_dmadev_queue(uint32_t dev_id, struct test_configure *cfg, uint8_t 
sges_max)
 {
uint16_t vchan = 0;
struct rte_dma_info info;
@@ -190,6 +208,10 @@ configure_dmadev_queue(uint32_t dev_id, struct 
test_configure *cfg)
rte_exit(EXIT_FAILURE, "Error, no configured queues reported on 
device id. %u\n",
dev_id);
 
+   if (info.max_sges < sges_max)
+   rte_exit(EXIT_FAILURE, "Error with unsupported max_sges on 
device id %u.\n",
+   dev_id);
+
if (rte_dma_start(dev_id) != 0)
rte_exit(EXIT_FAILURE, "Error with dma start.\n");
 }
@@ -202,8 +224,12 @@ config_dmadevs(struct test_configure *cfg)
uint32_t i;
int dev_id;
uint16_t nb_dmadevs = 0;
+   uint8_t nb_sges = 0;
char *dma_name;
 
+   if (cfg->is_sg)
+   nb_sges = RTE_MAX(cfg->nb_src_sges, cfg->nb_dst_sges);
+
for (i = 0; i < ldm->cnt; i++) {
dma_name = ldm->dma_names[i];
dev_id = rte_dma_get_dev_id_by_name(dma_name);
@@ -213,7 +239,7 @@ config_dmadevs(struct test_configure *cfg)
}
 
ldm->dma_ids[i] = dev_id;
-   configure_dmadev_queue(dev_id, cfg);
+   configure_dmadev_queue(dev_id, cfg, nb_sges);
++nb_dmadevs;
}
 
@@ -253,7 +279,7 @@ do_dma_submit_and_poll(uint16_t dev_id, uint64_t *async_cnt,
 }
 
 static inline int
-do_dma_mem_copy(void *p)
+do_dma_plain_mem_copy(void *p)
 {
struct lcore_params *para = (struct lcore_params *

[PATCH v2] net/mlx5: add HWS support for matching ingress metadata

2024-02-29 Thread Michael Baum
Add support for matching metadata in HWS ingress rules.
It uses REG_B matching, which is supported on every device that supports HWS.

Signed-off-by: Michael Baum 
---

v2:
 - Rebase.
 - Fix compilation issue.
 - Update documentation.

 doc/guides/nics/mlx5.rst  |   3 +-
 drivers/net/mlx5/hws/mlx5dr.h |   1 +
 drivers/net/mlx5/hws/mlx5dr_definer.c |  32 ---
 drivers/net/mlx5/mlx5_flow.h  | 122 ++
 drivers/net/mlx5/mlx5_flow_hw.c   |  10 +--
 5 files changed, 92 insertions(+), 76 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index c0294f268d..aea9d95cff 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1301,7 +1301,8 @@ for an additional list of options shared with other mlx5 
drivers.
 ``META`` related actions and items operate only within NIC Tx and
 NIC Rx steering domains, no ``MARK`` and ``META`` information crosses
 the domain boundaries. The ``MARK`` item is 24 bits wide, the ``META``
-item is 32 bits wide and match supported on egress only.
+item is 32 bits wide and match supported on egress only when
+``dv_flow_en`` = 1.
 
   - 1, this engages extensive metadata mode, the ``MARK`` and ``META``
 related actions and items operate within all supported steering domains,
diff --git a/drivers/net/mlx5/hws/mlx5dr.h b/drivers/net/mlx5/hws/mlx5dr.h
index 8441ae97e9..59fd61e24b 100644
--- a/drivers/net/mlx5/hws/mlx5dr.h
+++ b/drivers/net/mlx5/hws/mlx5dr.h
@@ -18,6 +18,7 @@ enum mlx5dr_table_type {
MLX5DR_TABLE_TYPE_NIC_TX,
MLX5DR_TABLE_TYPE_FDB,
MLX5DR_TABLE_TYPE_MAX,
+   MLX5DR_TABLE_TYPE_DONTCARE = MLX5DR_TABLE_TYPE_MAX,
 };
 
 enum mlx5dr_matcher_resource_mode {
diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.c 
b/drivers/net/mlx5/hws/mlx5dr_definer.c
index 0e15aafb8a..35a2ed2048 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.c
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.c
@@ -1745,6 +1745,7 @@ mlx5dr_definer_conv_item_tag(struct 
mlx5dr_definer_conv_data *cd,
if (item->type == RTE_FLOW_ITEM_TYPE_TAG)
reg = flow_hw_get_reg_id_from_ctx(cd->ctx,
  RTE_FLOW_ITEM_TYPE_TAG,
+ MLX5DR_TABLE_TYPE_DONTCARE,
  v->index);
else
reg = (int)v->index;
@@ -1805,9 +1806,10 @@ mlx5dr_definer_conv_item_quota(struct 
mlx5dr_definer_conv_data *cd,
   __rte_unused struct rte_flow_item *item,
   int item_idx)
 {
-   int mtr_reg =
-   flow_hw_get_reg_id_from_ctx(cd->ctx, RTE_FLOW_ITEM_TYPE_METER_COLOR,
-   0);
+   int mtr_reg = flow_hw_get_reg_id_from_ctx(cd->ctx,
+ 
RTE_FLOW_ITEM_TYPE_METER_COLOR,
+ MLX5DR_TABLE_TYPE_DONTCARE,
+ 0);
struct mlx5dr_definer_fc *fc;
 
if (mtr_reg < 0) {
@@ -1826,6 +1828,7 @@ mlx5dr_definer_conv_item_quota(struct 
mlx5dr_definer_conv_data *cd,
 
 static int
 mlx5dr_definer_conv_item_metadata(struct mlx5dr_definer_conv_data *cd,
+ enum mlx5dr_table_type table_domain_type,
  struct rte_flow_item *item,
  int item_idx)
 {
@@ -1837,7 +1840,8 @@ mlx5dr_definer_conv_item_metadata(struct 
mlx5dr_definer_conv_data *cd,
if (!m)
return 0;
 
-   reg = flow_hw_get_reg_id_from_ctx(cd->ctx, RTE_FLOW_ITEM_TYPE_META, -1);
+   reg = flow_hw_get_reg_id_from_ctx(cd->ctx, RTE_FLOW_ITEM_TYPE_META,
+ table_domain_type, -1);
if (reg <= 0) {
DR_LOG(ERR, "Invalid register for item metadata");
rte_errno = EINVAL;
@@ -2144,8 +2148,9 @@ mlx5dr_definer_conv_item_conntrack(struct 
mlx5dr_definer_conv_data *cd,
if (!m)
return 0;
 
-   reg = flow_hw_get_reg_id_from_ctx(cd->ctx, RTE_FLOW_ITEM_TYPE_CONNTRACK,
- -1);
+   reg = flow_hw_get_reg_id_from_ctx(cd->ctx,
+ RTE_FLOW_ITEM_TYPE_CONNTRACK,
+ MLX5DR_TABLE_TYPE_DONTCARE, -1);
if (reg <= 0) {
DR_LOG(ERR, "Invalid register for item conntrack");
rte_errno = EINVAL;
@@ -2287,7 +2292,8 @@ mlx5dr_definer_conv_item_meter_color(struct 
mlx5dr_definer_conv_data *cd,
return 0;
 
reg = flow_hw_get_reg_id_from_ctx(cd->ctx,
- RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+ RTE_FLOW_ITEM_TYPE_METER_COLOR,
+ MLX5DR_TABLE_TYPE_DONTCARE, 0);
MLX5_ASSERT(reg 

RE: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional transfer

2024-02-29 Thread Amit Prakash Shukla
Hi Chengwen,

I liked your suggestion and tried making changes, but encountered a parsing
issue for CFG files with lines longer than CFG_VALUE_LEN=256 (the current value).

There is a discussion along similar lines in another patch set: 
https://patchwork.dpdk.org/project/dpdk/patch/20231206112952.1588-1-vipin.vargh...@amd.com/.

I believe this patch can be taken as-is, and we can come up with a solution
once we can increase CFG_VALUE_LEN, as changing CFG_VALUE_LEN in this
release would cause an ABI breakage.

Thanks,
Amit Shukla

> -Original Message-
> From: Amit Prakash Shukla
> Sent: Wednesday, February 28, 2024 3:08 PM
> To: fengchengwen ; Cheng Jiang
> ; Gowrishankar Muthukrishnan
> 
> Cc: dev@dpdk.org; Jerin Jacob ; Anoob Joseph
> ; Kevin Laatz ; Bruce
> Richardson ; Pavan Nikhilesh Bhagavatula
> 
> Subject: RE: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional
> transfer
> 
> Hi Chengwen,
> 
> Please see my reply in-line.
> 
> Thanks
> Amit Shukla
> 
> > -Original Message-
> > From: fengchengwen 
> > Sent: Wednesday, February 28, 2024 12:34 PM
> > To: Amit Prakash Shukla ; Cheng Jiang
> > ; Gowrishankar Muthukrishnan
> > 
> > Cc: dev@dpdk.org; Jerin Jacob ; Anoob Joseph
> > ; Kevin Laatz ; Bruce
> > Richardson ; Pavan Nikhilesh Bhagavatula
> > 
> > Subject: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional
> > transfer
> >
> > External Email
> >
> > --
> > Hi Amit and Gowrishankar,
> >
> > It's natural to support multiple dmadev tests in one testcase, and the
> > original framework supports it.
> > But it seems we both complicated it when adding support for
> > non-mem2mem dma test.
> >
> > The new added "direction" and "vchan_dev" could treat as the dmadev's
> > private configure, some thing like:
> >
> >
> > lcore_dma=lcore10@:00:04.2,vchan=0,dir=mem2dev,devtype=pcie,raddr=xxx,coreid=1,pfid=2,vfid=3
> >
> > then this bi-directional could be implemented with only this config:
> >
> >
> > lcore_dma=lcore10@:00:04.2,dir=mem2dev,devtype=pcie,raddr=xxx,coreid=1,pfid=2,vfid=3,
> > lcore11@:00:04.3,dir=dev2mem,devtype=pcie,raddr=xxx,coreid=1,pfid=2,vfid=3
> > so that the lcore10 will do mem2dev with device :00:04.2, while
> > lcore11 will do dev2mem with device :00:04.3.
> 
> Thanks for the suggestion. I will make the suggested changes and send the
> next version.


RE: [PATCH v2] net/mlx5: add HWS support for matching ingress metadata

2024-02-29 Thread Dariusz Sosnowski
> -Original Message-
> From: Michael Baum 
> Sent: Thursday, February 29, 2024 14:49
> To: dev@dpdk.org
> Cc: Matan Azrad ; Dariusz Sosnowski
> ; Raslan Darawsheh ; Slava
> Ovsiienko ; Ori Kam ;
> Suanming Mou 
> Subject: [PATCH v2] net/mlx5: add HWS support for matching ingress
> metadata
> 
> Add support for matching metadata in HWS ingress rules.
> It using REG_B matching which is supported for each device supports HWS.
> 
> Signed-off-by: Michael Baum 
Acked-by: Dariusz Sosnowski 

Best regards,
Dariusz Sosnowski


Re: [PATCH] net/ixgbe: increase vf reset timeout

2024-02-29 Thread Bruce Richardson
On Tue, Feb 27, 2024 at 12:35:28PM +0000, Medvedkin, Vladimir wrote:
> On 30/01/2024 10:00, Kevin Traynor wrote:
> > When vf issues a reset to pf there is a 50 msec wait plus an additional
> > max of 1 msec for the pf to indicate the reset is complete before
> > timeout.
> > 
> > In some cases, it is seen that the reset is timing out, in which case
> > the reset does not complete and an error is returned.
> > 
> > In order to account for this, continue to wait an initial 50 msecs,
> > but then allow a max of an additional 50 msecs for the command to
> > complete.
> > 
> > Fixes: af75078fece3 ("first public release")

Cc: sta...@dpdk.org

> > 
> > Signed-off-by: Kevin Traynor 
> > ---
> >   drivers/net/ixgbe/base/ixgbe_type.h | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ixgbe/base/ixgbe_type.h 
> > b/drivers/net/ixgbe/base/ixgbe_type.h
> > index 1094df5891..35212a561b 100644
> > --- a/drivers/net/ixgbe/base/ixgbe_type.h
> > +++ b/drivers/net/ixgbe/base/ixgbe_type.h
> > @@ -1801,5 +1801,5 @@ enum {
> >   #define IXGBE_VFRE_ENABLE_ALL 0xFFFFFFFF
> > -#define IXGBE_VF_INIT_TIMEOUT  200 /* Number of retries to clear RSTI */
> > +#define IXGBE_VF_INIT_TIMEOUT  10000 /* Number of retries to clear RSTI */
> >   /* RDHMPN and TDHMPN bitmasks */
> 
> Acked-by: Vladimir Medvedkin 
> 
This changes the code in the "base" directory, which is not ideal, but I
see no other way to fix the issue reported.

Applied to dpdk-next-net-intel.

Thanks,
/Bruce


[PATCH v3] app/dma-perf: support bi-directional transfer

2024-02-29 Thread Amit Prakash Shukla
Adds bi-directional DMA transfer support to test performance.
One DMA device on one core will do mem2dev transfer and another
DMA device on another core will do dev2mem transfer.
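
The direction assignment rule, condensed from the diff below: when
is_bidir is set, devices alternate direction by their configure order:

    qconf->direction = (dev_num % 2) ? RTE_DMA_DIR_MEM_TO_DEV :
                                       RTE_DMA_DIR_DEV_TO_MEM;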

Depends-on: series-31298 ("PCI Dev and SG copy support")

Signed-off-by: Amit Prakash Shukla 
---
v3:
- Rebased with dependent series.

 app/test-dma-perf/benchmark.c | 65 +++
 app/test-dma-perf/config.ini  |  5 +++
 app/test-dma-perf/main.c  | 18 +-
 app/test-dma-perf/main.h  |  1 +
 4 files changed, 74 insertions(+), 15 deletions(-)

diff --git a/app/test-dma-perf/benchmark.c b/app/test-dma-perf/benchmark.c
index d821af8532..79de80499f 100644
--- a/app/test-dma-perf/benchmark.c
+++ b/app/test-dma-perf/benchmark.c
@@ -148,12 +148,19 @@ cache_flush_buf(__rte_unused struct rte_mbuf **array,
 
 static int
 vchan_data_populate(uint32_t dev_id, struct rte_dma_vchan_conf *qconf,
-   struct test_configure *cfg)
+   struct test_configure *cfg, uint16_t dev_num)
 {
struct rte_dma_info info;
 
qconf->direction = cfg->transfer_dir;
 
+   /* If its a bi-directional test, configure odd device for inbound dma
+* transfer and even device for outbound dma transfer.
+*/
+   if (cfg->is_bidir)
+   qconf->direction = (dev_num % 2) ? RTE_DMA_DIR_MEM_TO_DEV :
+  RTE_DMA_DIR_DEV_TO_MEM;
+
rte_dma_info_get(dev_id, &info);
if (!(RTE_BIT64(qconf->direction) & info.dev_capa))
return -1;
@@ -185,14 +192,15 @@ vchan_data_populate(uint32_t dev_id, struct 
rte_dma_vchan_conf *qconf,
 
 /* Configuration of device. */
 static void
-configure_dmadev_queue(uint32_t dev_id, struct test_configure *cfg, uint8_t 
sges_max)
+configure_dmadev_queue(uint32_t dev_id, struct test_configure *cfg, uint8_t 
sges_max,
+  uint16_t dev_num)
 {
uint16_t vchan = 0;
struct rte_dma_info info;
struct rte_dma_conf dev_config = { .nb_vchans = 1 };
struct rte_dma_vchan_conf qconf = { 0 };
 
-   if (vchan_data_populate(dev_id, &qconf, cfg) != 0)
+   if (vchan_data_populate(dev_id, &qconf, cfg, dev_num) != 0)
rte_exit(EXIT_FAILURE, "Error with vchan data populate.\n");
 
if (rte_dma_configure(dev_id, &dev_config) != 0)
@@ -239,7 +247,7 @@ config_dmadevs(struct test_configure *cfg)
}
 
ldm->dma_ids[i] = dev_id;
-   configure_dmadev_queue(dev_id, cfg, nb_sges);
+   configure_dmadev_queue(dev_id, cfg, nb_sges, nb_dmadevs);
++nb_dmadevs;
}
 
@@ -508,7 +516,7 @@ setup_memory_env(struct test_configure *cfg,
}
}
 
-   if (cfg->transfer_dir == RTE_DMA_DIR_DEV_TO_MEM) {
+   if (cfg->transfer_dir == RTE_DMA_DIR_DEV_TO_MEM && !cfg->is_bidir) {
ext_buf_info->free_cb = dummy_free_ext_buf;
ext_buf_info->fcb_opaque = NULL;
for (i = 0; i < nr_buf; i++) {
@@ -521,7 +529,7 @@ setup_memory_env(struct test_configure *cfg,
}
}
 
-   if (cfg->transfer_dir == RTE_DMA_DIR_MEM_TO_DEV) {
+   if (cfg->transfer_dir == RTE_DMA_DIR_MEM_TO_DEV && !cfg->is_bidir) {
ext_buf_info->free_cb = dummy_free_ext_buf;
ext_buf_info->fcb_opaque = NULL;
for (i = 0; i < nr_buf; i++) {
@@ -534,6 +542,19 @@ setup_memory_env(struct test_configure *cfg,
}
}
 
+   if (cfg->is_bidir) {
+   ext_buf_info->free_cb = dummy_free_ext_buf;
+   ext_buf_info->fcb_opaque = NULL;
+   for (i = 0; i < nr_buf; i++) {
+   /* Using mbuf structure to hold remote iova address. */
+   rte_pktmbuf_attach_extbuf((*srcs)[i],
+   (void *)(cfg->vchan_dev.raddr + (i * buf_size)),
+   (rte_iova_t)(cfg->vchan_dev.raddr + (i * 
buf_size)),
+   0, ext_buf_info);
+   rte_mbuf_ext_refcnt_update(ext_buf_info, 1);
+   }
+   }
+
if (cfg->is_sg) {
uint8_t nb_src_sges = cfg->nb_src_sges;
uint8_t nb_dst_sges = cfg->nb_dst_sges;
@@ -676,16 +697,30 @@ mem_copy_benchmark(struct test_configure *cfg)
lcores[i]->nr_buf = (uint32_t)(nr_buf / nb_workers);
lcores[i]->buf_size = buf_size;
lcores[i]->test_secs = test_secs;
-   lcores[i]->srcs = srcs + offset;
-   lcores[i]->dsts = dsts + offset;
lcores[i]->scenario_id = cfg->scenario_id;
lcores[i]->lcore_id = lcore_id;
 
-   if (cfg->is_sg) {
-   lcores[i]->sge.nb_srcs = cfg->nb_src_sges;
-   lcores[i]->sge.nb_dsts = cfg->nb_dst_sges;
-   lcores[i]->sge.srcs = src_sges + (nr_sgsrc / nb_workers

Re: [PATCH v5] app/testpmd: support updating flow rule actions

2024-02-29 Thread Ferruh Yigit
On 2/29/2024 2:15 AM, Oleksandr Kolomeiets wrote:
> "flow update" updates a flow rule specified by a rule ID with a
> new action list by making a call to "rte_flow_actions_update()":
> 
> flow update {port_id} {rule_id}
> actions {action} [/ {action} [...]] / end [user_id]
> 
> Creating, updating and destroying a flow rule:
> 
> testpmd> flow create 0 group 1 pattern eth / end actions drop / end
> Flow rule #0 created
> testpmd> flow update 0 0 actions queue index 1 / end
> Flow rule #0 updated with new actions
> testpmd> flow destroy 0 rule 0
> Flow rule #0 destroyed
> 
> Signed-off-by: Oleksandr Kolomeiets 
> Reviewed-by: Mykola Kostenok 
> Reviewed-by: Christian Koue Muf 
>

Acked-by: Ferruh Yigit 

Applied to dpdk-next-net/main, thanks.


Re: [PATCH v6 20/23] mbuf: remove and stop using rte marker fields

2024-02-29 Thread Dodji Seketeli
David Marchand  writes:

> On Wed, Feb 28, 2024 at 3:04 PM Dodji Seketeli  wrote:
>> > Btw, I see no way to suppress this (except a global [suppress_type]
>> > name = rte_mbuf)...
>>
>> Right.
>>
>> To avoid having subsequent changes to that type from being "overly"
>> suppressed, maybe do something like:
>>
>> [suppress_type]
>>  name = rte_mbuf
>>  has_size_change = no
>>  has_data_member = {cacheline0, rearm_data, rx_descriptor_fields1, 
>> cacheline1}
>>
>> That way, only size-impacting changes to struct rte_mbuf in its form
>> that predates this patch would be suppressed, hopefully.
>
> Do you mean, only changes *not* size-impacting would be suppressed?

Yes, of course.  Sorry for the typo.  You are right.

> This is slightly better than the suppression on the whole rte_mbuf
> object, but it won't catch field reordering iiuc.

Indeed.

>
> On the other hand, now that I try reordering fields (to test this
> suggestion of yours), I get build failures all over the DPDK tree
> because we have many build checks to ensure those fields are at known
> locations...
> So maybe we can relax and just go with the full suppression.

Yes, that would make sense.

Thanks!

-- 
Dodji



Re: [PATCH] net/iavf: fix access to null value

2024-02-29 Thread Bruce Richardson
On Fri, Feb 09, 2024 at 03:27:56PM +0100, Burakov, Anatoly wrote:
> On 1/24/2024 3:05 AM, Mingjin Ye wrote:
> > The "vsi" may be null, so it needs to be used after checking.
> > 
> > Fixes: ab28aad9c24f ("net/iavf: fix Rx Tx burst in multi-process")

Fixes: 9f6186cf0d80 ("net/iavf: fix Rx/Tx burst in multi-process")

> > Cc: sta...@dpdk.org
> > 
> > Signed-off-by: Mingjin Ye 
> > ---
> Acked-by: Anatoly Burakov 
>
Applied to dpdk-next-net-intel.

Thanks,
/Bruce


[PATCH v2] ethdev: add Linux ethtool link mode conversion

2024-02-29 Thread Thomas Monjalon
Speed capabilities of a NIC may be discovered through its Linux
kernel driver. It is especially useful for bifurcated drivers,
so they don't have to duplicate the same logic in the DPDK driver.

Parsing ethtool speed capabilities is made easy thanks to
the functions added in ethdev for internal usage only.
Of course these functions work only on Linux,
so they are not compiled in other environments.

In order to ease parsing, the ethtool macro names are parsed
externally in a shell command which generates a C array
included in this patch.
This also avoids depending on a specific kernel version.
The C array should be updated in future to pick up the latest ethtool bits.
Note it is easier to update this array than to add new cases
to parsing code.

The types in the functions follow the ethtool types:
uint32_t for bitmaps, and int8_t for the number of 32-bit words in a bitmap.

Signed-off-by: Thomas Monjalon 
---

A follow-up patch will be sent to use these functions in mlx5.
I suspect mana could use this parsing as well.

---
 lib/ethdev/ethdev_linux_ethtool.c | 161 ++
 lib/ethdev/ethdev_linux_ethtool.h |  41 
 lib/ethdev/meson.build|   9 ++
 lib/ethdev/version.map|   3 +
 4 files changed, 214 insertions(+)
 create mode 100644 lib/ethdev/ethdev_linux_ethtool.c
 create mode 100644 lib/ethdev/ethdev_linux_ethtool.h

diff --git a/lib/ethdev/ethdev_linux_ethtool.c 
b/lib/ethdev/ethdev_linux_ethtool.c
new file mode 100644
index 00..0ece172a75
--- /dev/null
+++ b/lib/ethdev/ethdev_linux_ethtool.c
@@ -0,0 +1,161 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2024 NVIDIA Corporation & Affiliates
+ */
+
+#include 
+
+#include "rte_ethdev.h"
+#include "ethdev_linux_ethtool.h"
+
+/* Link modes sorted with index as defined in ethtool.
+ * Values are speed in Mbps with LSB indicating duplex.
+ *
+ * The ethtool bits definition should not change as it is a kernel API.
+ * Using raw numbers directly avoids checking API availability
+ * and allows to compile with new bits included even on an old kernel.
+ *
+ * The array below is built from bit definitions with this shell command:
+ *   sed -rn 's;.*(ETHTOOL_LINK_MODE_)([0-9]+)([0-9a-zA-Z_]*).*= *([0-9]*).*;'\
+ *   '[\4] = \2, /\* \1\2\3 *\/;p' /usr/include/linux/ethtool.h |
+ *   awk '/_Half_/{$3=$3+1","}1'
+ */
+static uint32_t link_modes[] = {
+	[0] =      11, /* ETHTOOL_LINK_MODE_10baseT_Half_BIT */
+	[1] =      10, /* ETHTOOL_LINK_MODE_10baseT_Full_BIT */
+	[2] =     101, /* ETHTOOL_LINK_MODE_100baseT_Half_BIT */
+	[3] =     100, /* ETHTOOL_LINK_MODE_100baseT_Full_BIT */
+	[4] =    1001, /* ETHTOOL_LINK_MODE_1000baseT_Half_BIT */
+	[5] =    1000, /* ETHTOOL_LINK_MODE_1000baseT_Full_BIT */
+	[12] =  10000, /* ETHTOOL_LINK_MODE_10000baseT_Full_BIT */
+	[15] =   2500, /* ETHTOOL_LINK_MODE_2500baseX_Full_BIT */
+	[17] =   1000, /* ETHTOOL_LINK_MODE_1000baseKX_Full_BIT */
+	[18] =  10000, /* ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT */
+	[19] =  10000, /* ETHTOOL_LINK_MODE_10000baseKR_Full_BIT */
+	[20] =  10000, /* ETHTOOL_LINK_MODE_10000baseR_FEC_BIT */
+	[21] =  20000, /* ETHTOOL_LINK_MODE_20000baseMLD2_Full_BIT */
+	[22] =  20000, /* ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT */
+	[23] =  40000, /* ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT */
+	[24] =  40000, /* ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT */
+	[25] =  40000, /* ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT */
+	[26] =  40000, /* ETHTOOL_LINK_MODE_40000baseLR4_Full_BIT */
+	[27] =  56000, /* ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT */
+	[28] =  56000, /* ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT */
+	[29] =  56000, /* ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT */
+	[30] =  56000, /* ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT */
+	[31] =  25000, /* ETHTOOL_LINK_MODE_25000baseCR_Full_BIT */
+	[32] =  25000, /* ETHTOOL_LINK_MODE_25000baseKR_Full_BIT */
+	[33] =  25000, /* ETHTOOL_LINK_MODE_25000baseSR_Full_BIT */
+	[34] =  50000, /* ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT */
+	[35] =  50000, /* ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT */
+	[36] = 100000, /* ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT */
+	[37] = 100000, /* ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT */
+	[38] = 100000, /* ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT */
+	[39] = 100000, /* ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT */
+	[40] =  50000, /* ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT */
+	[41] =   1000, /* ETHTOOL_LINK_MODE_1000baseX_Full_BIT */
+	[42] =  10000, /* ETHTOOL_LINK_MODE_10000baseCR_Full_BIT */
+	[43] =  10000, /* ETHTOOL_LINK_MODE_10000baseSR_Full_BIT */
+	[44] =  10000, /* ETHTOOL_LINK_MODE_10000baseLR_Full_BIT */
+	[45] =  10000, /* ETHTOOL_LINK_MODE_10000baseLRM_Full_BIT */
+	[46] =  10000, /* ETHTOOL_LINK_MODE_10000baseER_Full_BIT */
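
A decoding sketch (editorial illustration, not part of the patch): per the
comment above, each value is the speed in Mbps with the LSB marking half
duplex, so a table entry can be interpreted as follows.

#include <stdbool.h>
#include <stdint.h>

static void
decode_link_mode(uint32_t mode, uint32_t *speed_mbps, bool *half_duplex)
{
	*half_duplex = (mode & 1) != 0;      /* LSB set => half duplex */
	*speed_mbps = mode & ~UINT32_C(1);   /* e.g. 11 -> 10 Mbps, half */
}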

RE: [EXT] [PATCH v6 4/4] test/cryptodev: add tests for GCM with AAD

2024-02-29 Thread Akhil Goyal
> Adding one new unit test code for validating the features
> added as part of GCM with 64 byte AAD.
> The new test case adds one new test for GCM algo for both
> encrypt and decrypt operations.
> 
> Signed-off-by: Nishikant Nayak 
> Acked-by: Ciara Power 
> ---
What is the need for this new test vector? How is this case not covered in 
existing cases?
Can you explain in the patch description?
How is it different than gcm_test_case_aad_2 case and other gcm 128 cases?


> @@ -12719,16 +12737,22 @@ test_authenticated_decryption_oop(const struct
> aead_test_data *tdata)
> 
>   /* Verify the capabilities */
>   struct rte_cryptodev_sym_capability_idx cap_idx;
> + const struct rte_cryptodev_symmetric_capability *capability;
>   cap_idx.type = RTE_CRYPTO_SYM_XFORM_AEAD;
>   cap_idx.algo.aead = tdata->algo;
> - if (rte_cryptodev_sym_capability_get(ts_params->valid_devs[0],
> - &cap_idx) == NULL)
> - return TEST_SKIPPED;
> + capability = rte_cryptodev_sym_capability_get(ts_params-
> >valid_devs[0],
> + &cap_idx);
> 
>   /* not supported with CPU crypto and raw data-path APIs*/
>   if (gbl_action_type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO ||
>   global_api_test_type == CRYPTODEV_RAW_API_TEST)
>   return TEST_SKIPPED;
> + if (capability == NULL)
> + return TEST_SKIPPED;

You should check the capability just after it is retrieved.
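i.e. reordered as suggested (a sketch assembled from the diff above):

	capability = rte_cryptodev_sym_capability_get(
			ts_params->valid_devs[0], &cap_idx);
	if (capability == NULL)
		return TEST_SKIPPED;

	/* not supported with CPU crypto and raw data-path APIs */
	if (gbl_action_type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO ||
			global_api_test_type == CRYPTODEV_RAW_API_TEST)
		return TEST_SKIPPED;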
 


[PATCH v6 1/4] eal: add pointer compression functions

2024-02-29 Thread Paul Szczepanek
Add a new utility header for compressing pointers. The provided
functions can store pointers in 32-bit offsets.

The compression takes advantage of the fact that pointers are
usually located in a limited memory region (like a mempool).
We can compress them by converting them to offsets from a base
memory address. Offsets can be stored in fewer bytes (dictated
by the memory region size and alignment of the pointer).
For example: an 8 byte aligned pointer which is part of a 32GB
memory pool can be stored in 4 bytes.

This can be used for example when passing caches full of pointers
between threads. Memory containing the pointers is copied multiple
times which is especially costly between cores. This compression
method will allow us to shrink the memory size copied. Further
commits add a test to evaluate the effectiveness of this approach.
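
A minimal round-trip sketch using the API added here (batch size and
alignment are illustrative; bit_shift = 3 assumes 8-byte-aligned objects):

#include <stdint.h>
#include <rte_ptr_compress.h>

static void
example_roundtrip(void *base, void **objs, unsigned int n) /* n <= 64 */
{
	uint32_t offsets[64];
	void *restored[64];

	rte_ptr_compress_32(base, objs, offsets, n, 3);
	rte_ptr_decompress_32(base, offsets, restored, n, 3);
	/* restored[i] now equals objs[i] for all i < n */
}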

Suggested-by: Honnappa Nagarahalli 
Signed-off-by: Paul Szczepanek 
Signed-off-by: Kamalakshitha Aligeri 
Reviewed-by: Honnappa Nagarahalli 
---
 .mailmap   |   1 +
 lib/eal/include/meson.build|   1 +
 lib/eal/include/rte_ptr_compress.h | 266 +
 3 files changed, 268 insertions(+)
 create mode 100644 lib/eal/include/rte_ptr_compress.h

diff --git a/.mailmap b/.mailmap
index 3f5bab26a8..004751d27a 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1069,6 +1069,7 @@ Paul Greenwalt 
 Paulis Gributs 
 Paul Luse 
 Paul M Stillwell Jr 
+Paul Szczepanek 
 Pavan Kumar Linga 
 Pavan Nikhilesh  
 Pavel Belous 
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index e94b056d46..ce2c733633 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -36,6 +36,7 @@ headers += files(
 'rte_pci_dev_features.h',
 'rte_per_lcore.h',
 'rte_pflock.h',
+   'rte_ptr_compress.h',
 'rte_random.h',
 'rte_reciprocal.h',
 'rte_seqcount.h',
diff --git a/lib/eal/include/rte_ptr_compress.h 
b/lib/eal/include/rte_ptr_compress.h
new file mode 100644
index 00..47a72e4213
--- /dev/null
+++ b/lib/eal/include/rte_ptr_compress.h
@@ -0,0 +1,266 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2023 Arm Limited
+ */
+
+#ifndef RTE_PTR_COMPRESS_H
+#define RTE_PTR_COMPRESS_H
+
+/**
+ * @file
+ * Pointer compression and decompression functions.
+ *
+ * When passing arrays full of pointers between threads, memory containing
+ * the pointers is copied multiple times which is especially costly between
+ * cores. These functions allow us to compress the pointers.
+ *
+ * Compression takes advantage of the fact that pointers are usually located in
+ * a limited memory region (like a mempool). We compress them by converting 
them
+ * to offsets from a base memory address. Offsets can be stored in fewer bytes.
+ *
+ * The compression functions come in two varieties: 32-bit and 16-bit.
+ *
+ * To determine how many bits are needed to compress the pointer calculate
+ * the biggest offset possible (highest value pointer - base pointer)
+ * and shift the value right according to alignment (shift by exponent of the
+ * power of 2 of alignment: aligned by 4 - shift by 2, aligned by 8 - shift by
+ * 3, etc.). The resulting value must fit in either 32 or 16 bits.
+ *
+ * For usage example and further explanation please see "Pointer Compression" 
in
+ * doc/guides/prog_guide/env_abstraction_layer.rst
+ */
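+
+/* Illustrative arithmetic (editorial note, not part of the original patch):
+ * a 32 GB region with 8-byte-aligned objects has a maximum offset of at
+ * most 2^35 - 8; shifting right by 3 (log2 of the alignment) yields
+ * 2^32 - 1, which fits exactly in 32 bits.
+ */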
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Compress pointers into 32-bit offsets from base pointer.
+ *
+ * @note It is the programmer's responsibility to ensure the resulting offsets fit
+ * into 32 bits. Alignment of the structures pointed to by the pointers allows
+ * us to drop bits from the offsets. This is controlled by the bit_shift
+ * parameter. This means that if structures are aligned by 8 bytes they must be
+ * within 32GB of the base pointer. If there is no such alignment guarantee 
they
+ * must be within 4GB.
+ *
+ * @param ptr_base
+ *   A pointer used to calculate offsets of pointers in src_table.
+ * @param src_table
+ *   A pointer to an array of pointers.
+ * @param dest_table
+ *   A pointer to an array of compressed pointers returned by this function.
+ * @param n
+ *   The number of objects to compress, must be strictly positive.
+ * @param bit_shift
+ *   Byte alignment of memory pointed to by the pointers allows for
+ *   bits to be dropped from the offset and hence widen the memory region that
+ *   can be covered. This controls how many bits are right shifted.
+ **/
+static __rte_always_inline void
+rte_ptr_compress_32(void *ptr_base, void **src_table,
+   uint32_t *dest_table, unsigned int n, unsigned int bit_shift)
+{
+   unsigned int i = 0;
+#if defined RTE_HAS_SVE_ACLE && !defined RTE_ARCH_ARMv8_AARCH32
+   svuint64_t v_ptr_table;
+   svbool_t pg = svwhilelt_b64(i, n);
+   do {
+   v_ptr_table = svld1_u64(pg, (uint64_t *)

[PATCH v6 0/4] add pointer compression API

2024-02-29 Thread Paul Szczepanek
This patchset is proposing adding a new EAL header with utility functions
that allow compression of arrays of pointers.

When passing caches full of pointers between threads, memory containing
the pointers is copied multiple times which is especially costly between
cores. A compression method will allow us to shrink the memory size
copied.

The compression takes advantage of the fact that pointers are usually
located in a limited memory region (like a mempool). We can compress them
by converting them to offsets from a base memory address.

Offsets can be stored in fewer bytes (dictated by the memory region size
and alignment of the pointer). For example: an 8 byte aligned pointer
which is part of a 32GB memory pool can be stored in 4 bytes. The API is
very generic and does not assume mempool pointers, any pointer can be
passed in.

Compression is based on a few fast operations and, especially with vector
instructions leveraged, creates minimal overhead.

The API accepts and returns arrays because the per-call overhead means it
is only worth it when done in bulk.

Test is added that shows potential performance gain from compression. In
this test an array of pointers is passed through a ring between two cores.
It shows the gain, which depends on the bulk operation size. In this
synthetic test run on an Ampere Altra, a substantial (up to 25%) performance
gain is seen for bulk sizes larger than 32. At 32 it breaks even, and
lower sizes create a small (less than 5%) slowdown due to overhead.

In a more realistic mock application running the l3 forwarding DPDK
example in pipeline mode on two cores, this translated into a
~5% throughput increase on an Ampere Altra.

v2:
* addressed review comments (style, explanations and typos)
* lowered bulk iterations closer to original numbers to keep runtime short
* fixed pointer size warning on 32-bit arch
v3:
* added 16-bit versions of compression functions and tests
* added documentation of these new utility functions in the EAL guide
v4:
* added unit test
* fix bug in NEON implementation of 32-bit decompress
v5:
* disable NEON and SVE implementation on AARCH32 due to wrong pointer size
v6:
* added example usage to commit message of the initial commit

Paul Szczepanek (4):
  eal: add pointer compression functions
  test: add pointer compress tests to ring perf test
  docs: add pointer compression to the EAL guide
  test: add unit test for ptr compression

 .mailmap  |   1 +
 app/test/meson.build  |   1 +
 app/test/test_eal_ptr_compress.c  | 108 ++
 app/test/test_ring.h  |  94 -
 app/test/test_ring_perf.c | 354 --
 .../prog_guide/env_abstraction_layer.rst  | 142 +++
 lib/eal/include/meson.build   |   1 +
 lib/eal/include/rte_ptr_compress.h| 266 +
 8 files changed, 843 insertions(+), 124 deletions(-)
 create mode 100644 app/test/test_eal_ptr_compress.c
 create mode 100644 lib/eal/include/rte_ptr_compress.h

--
2.25.1



[PATCH v6 2/4] test: add pointer compress tests to ring perf test

2024-02-29 Thread Paul Szczepanek
Add a test that runs a zero copy burst enqueue and dequeue on a ring
of raw pointers and compressed pointers at different burst sizes to
showcase performance benefits of newly added pointer compression APIs.

Refactored threading code to pass more parameters to threads to
reuse existing code. Added more bulk sizes to showcase their effects
on compression. Adjusted loop iteration numbers to take into account
bulk sizes to keep runtime constant (instead of number of operations).

Adjusted old printfs to match new ones which have aligned numbers.

Signed-off-by: Paul Szczepanek 
Reviewed-by: Honnappa Nagarahalli 
---
 app/test/test_ring.h  |  94 +-
 app/test/test_ring_perf.c | 354 +-
 2 files changed, 324 insertions(+), 124 deletions(-)

diff --git a/app/test/test_ring.h b/app/test/test_ring.h
index 45c263f3ff..3b00f2465d 100644
--- a/app/test/test_ring.h
+++ b/app/test/test_ring.h
@@ -1,10 +1,12 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2019-2023 Arm Limited
  */

 #include 
 #include 
 #include 
+#include 
+#include 

 /* API type to call
  * rte_ring__enqueue_
@@ -25,6 +27,10 @@
 #define TEST_RING_ELEM_BULK 16
 #define TEST_RING_ELEM_BURST 32

+#define TEST_RING_ELEM_BURST_ZC 64
+#define TEST_RING_ELEM_BURST_ZC_COMPRESS_PTR_16 128
+#define TEST_RING_ELEM_BURST_ZC_COMPRESS_PTR_32 256
+
 #define TEST_RING_IGNORE_API_TYPE ~0U

 /* This function is placed here as it is required for both
@@ -101,6 +107,9 @@ static inline unsigned int
 test_ring_enqueue(struct rte_ring *r, void **obj, int esize, unsigned int n,
unsigned int api_type)
 {
+   unsigned int ret;
+   struct rte_ring_zc_data zcd = {0};
+
/* Legacy queue APIs? */
if (esize == -1)
switch (api_type) {
@@ -152,6 +161,46 @@ test_ring_enqueue(struct rte_ring *r, void **obj, int 
esize, unsigned int n,
case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST):
return rte_ring_mp_enqueue_burst_elem(r, obj, esize, n,
NULL);
+   case (TEST_RING_ELEM_BURST_ZC):
+   ret = rte_ring_enqueue_zc_burst_elem_start(
+   r, esize, n, &zcd, NULL);
+   if (unlikely(ret == 0))
+   return 0;
+   rte_memcpy(zcd.ptr1, (char *)obj, zcd.n1 * esize);
+   if (unlikely(zcd.ptr2 != NULL))
+   rte_memcpy(zcd.ptr2,
+   (char *)obj + zcd.n1 * esize,
+   (ret - zcd.n1) * esize);
+   rte_ring_enqueue_zc_finish(r, ret);
+   return ret;
+   case (TEST_RING_ELEM_BURST_ZC_COMPRESS_PTR_16):
+   /* rings cannot store uint16_t so we use a uint32_t
+* and half the requested number of elements
+* and compensate by doubling the returned numbers
+*/
+   ret = rte_ring_enqueue_zc_burst_elem_start(
+   r, sizeof(uint32_t), n / 2, &zcd, NULL);
+   if (unlikely(ret == 0))
+   return 0;
+   rte_ptr_compress_16(0, obj, zcd.ptr1, zcd.n1 * 2, 3);
+   if (unlikely(zcd.ptr2 != NULL))
+   rte_ptr_compress_16(0,
+   obj + (zcd.n1 * 2),
+   zcd.ptr2,
+   (ret - zcd.n1) * 2, 3);
+   rte_ring_enqueue_zc_finish(r, ret);
+   return ret * 2;
+   case (TEST_RING_ELEM_BURST_ZC_COMPRESS_PTR_32):
+   ret = rte_ring_enqueue_zc_burst_elem_start(
+   r, sizeof(uint32_t), n, &zcd, NULL);
+   if (unlikely(ret == 0))
+   return 0;
+   rte_ptr_compress_32(0, obj, zcd.ptr1, zcd.n1, 3);
+   if (unlikely(zcd.ptr2 != NULL))
+   rte_ptr_compress_32(0, obj + zcd.n1,
+   zcd.ptr2, ret - zcd.n1, 3);
+   rte_ring_enqueue_zc_finish(r, ret);
+   return ret;
default:
printf("Invalid API type\n");
return 0;
@@ -162,6 +211,9 @@ static inline unsigned int
 test_ring_dequeue(struct rte_ring *r, void **obj, int esize, unsigned int n,
unsigned int api_type)
 {
+   unsigned int ret;
+   struct rte_ring_zc_data zcd = {0};
+
/* Legacy queue APIs? */
if (esize == -1)

[PATCH v6 3/4] docs: add pointer compression to the EAL guide

2024-02-29 Thread Paul Szczepanek
Documentation added in the EAL guide for the new
utility functions for pointer compression
showing example code and potential usecases.

Signed-off-by: Paul Szczepanek 
Reviewed-by: Honnappa Nagarahalli 
---
 .../prog_guide/env_abstraction_layer.rst  | 142 ++
 1 file changed, 142 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst 
b/doc/guides/prog_guide/env_abstraction_layer.rst
index 6debf54efb..f04d032442 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -1192,3 +1192,145 @@ will not be deallocated.

 Any successful deallocation event will trigger a callback, for which user
 applications and other DPDK subsystems can register.
+
+.. _pointer_compression:
+
+Pointer Compression
+---
+
+Use ``rte_ptr_compress_16()`` and ``rte_ptr_decompress_16()`` to compress and
+decompress pointers into 16-bit offsets. Use ``rte_ptr_compress_32()`` and
+``rte_ptr_decompress_32()`` to compress and decompress pointers into 32-bit
+offsets.
+
+Compression takes advantage of the fact that pointers are usually located in a
+limited memory region (like a mempool). By converting them to offsets from a
+base memory address they can be stored in fewer bytes. How many bytes are 
needed
+to store the offset is dictated by the memory region size and alignment of
+objects the pointers point to.
+
+For example, a pointer which is part of a 4GB memory pool can be stored as 32
+bit offset. If the pointer points to memory that is 8 bytes aligned then 3 bits
+can be dropped from the offset and a 32GB memory pool can now fit in 32 bits.
+
+For performance reasons these requirements are not enforced programmatically.
+The programmer is responsible for ensuring that the combination of distance
+from the base pointer and memory alignment allow for storing of the offset in
+the number of bits indicated by the function name (16 or 32). Start of mempool
+memory would be a good candidate for the base pointer. Otherwise, any pointer
+that precedes all pointers, is close enough, and has the same alignment as the
+pointers being compressed will work.
+
+.. note::
+
+Performance gains depend on the batch size of pointers and CPU capabilities
+such as vector extensions. It's important to measure the performance
+increase on target hardware. A test called ``ring_perf_autotest`` in
+``dpdk-test`` can provide the measurements.
+
+Example usage
+~
+
+In this example we send pointers between two cores through a ring. While this
+is a realistic use case the code is simplified for demonstration purposes and
+does not have error handling.
+
+.. code-block:: c
+
+#include 
+#include 
+#include 
+#include 
+
+#define ITEMS_ARRAY_SIZE (1024)
+#define BATCH_SIZE (128)
+#define ALIGN_EXPONENT (3)
+#define ITEM_ALIGN (1<

[PATCH v6 4/4] test: add unit test for ptr compression

2024-02-29 Thread Paul Szczepanek
Test compresses and decompresses pointers with various combinations
of memory regions and alignments and verifies the pointers are
recovered correctly.

Signed-off-by: Paul Szczepanek 
---
 app/test/meson.build |   1 +
 app/test/test_eal_ptr_compress.c | 108 +++
 2 files changed, 109 insertions(+)
 create mode 100644 app/test/test_eal_ptr_compress.c

diff --git a/app/test/meson.build b/app/test/meson.build
index 4183d66b0e..3e172b154d 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -66,6 +66,7 @@ source_file_deps = {
 'test_dmadev_api.c': ['dmadev'],
 'test_eal_flags.c': [],
 'test_eal_fs.c': [],
+'test_eal_ptr_compress.c': [],
 'test_efd.c': ['efd', 'net'],
 'test_efd_perf.c': ['efd', 'hash'],
 'test_errno.c': [],
diff --git a/app/test/test_eal_ptr_compress.c b/app/test/test_eal_ptr_compress.c
new file mode 100644
index 00..c1c9a98be7
--- /dev/null
+++ b/app/test/test_eal_ptr_compress.c
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include "test.h"
+#include 
+#include 
+
+#include 
+
+#define MAX_ALIGN_EXPONENT 3
+#define PTRS_SIZE 16
+#define NUM_BASES 2
+#define NUM_REGIONS 4
+#define MAX_32BIT_REGION ((uint64_t)UINT32_MAX + 1)
+#define MAX_16BIT_REGION (UINT16_MAX + 1)
+
+static int
+test_eal_ptr_compress_params(
+   void *base,
+   uint64_t mem_sz,
+   unsigned int align_exp,
+   unsigned int num_ptrs,
+   bool use_32_bit)
+{
+   unsigned int i;
+   unsigned int align = 1 << align_exp;
+   void *ptrs[PTRS_SIZE] = {0};
+   void *ptrs_out[PTRS_SIZE] = {0};
+   uint32_t offsets32[PTRS_SIZE] = {0};
+   uint16_t offsets16[PTRS_SIZE] = {0};
+
+   for (i = 0; i < num_ptrs; i++) {
+   /* make pointers point at memory in steps of align */
+   /* alternate steps from the start and end of memory region */
+   if ((i & 1) == 1)
+   ptrs[i] = (char *)base + mem_sz - i * align;
+   else
+   ptrs[i] = (char *)base + i * align;
+   }
+
+   if (use_32_bit) {
+   rte_ptr_compress_32(base, ptrs, offsets32, num_ptrs, align_exp);
+   rte_ptr_decompress_32(base, offsets32, ptrs_out, num_ptrs,
+   align_exp);
+   } else {
+   rte_ptr_compress_16(base, ptrs, offsets16, num_ptrs, align_exp);
+   rte_ptr_decompress_16(base, offsets16, ptrs_out, num_ptrs,
+   align_exp);
+   }
+
+   TEST_ASSERT_BUFFERS_ARE_EQUAL(ptrs, ptrs_out, sizeof(void *) * num_ptrs,
+   "Decompressed pointers corrupted\nbase pointer: %p, "
+   "memory region size: %" PRIu64 ", alignment exponent: %u, "
+   "num of pointers: %u, using %s offsets",
+   base, mem_sz, align_exp, num_ptrs,
+   use_32_bit ? "32-bit" : "16-bit");
+
+   return 0;
+}
+
+static int
+test_eal_ptr_compress(void)
+{
+   unsigned int j, k, n;
+   int ret = 0;
+   void * const bases[NUM_BASES] = { (void *)0, (void *)UINT16_MAX };
+   /* maximum size for pointers aligned by consecutive powers of 2 */
+   const uint64_t region_sizes_16[NUM_REGIONS] = {
+   MAX_16BIT_REGION,
+   MAX_16BIT_REGION * 2,
+   MAX_16BIT_REGION * 4,
+   MAX_16BIT_REGION * 8,
+   };
+   const uint64_t region_sizes_32[NUM_REGIONS] = {
+   MAX_32BIT_REGION,
+   MAX_32BIT_REGION * 2,
+   MAX_32BIT_REGION * 4,
+   MAX_32BIT_REGION * 8,
+   };
+
+   for (j = 0; j < NUM_REGIONS; j++) {
+   for (k = 0; k < NUM_BASES; k++) {
+   for (n = 1; n < PTRS_SIZE; n++) {
+   ret |= test_eal_ptr_compress_params(
+   bases[k],
+   region_sizes_16[j],
+   j /* exponent of alignment */,
+   n,
+   false
+   );
+   ret |= test_eal_ptr_compress_params(
+   bases[k],
+   region_sizes_32[j],
+   j /* exponent of alignment */,
+   n,
+   true
+   );
+   if (ret != 0)
+   return ret;
+   }
+   }
+   }
+
+   return ret;
+}
+
+REGISTER_FAST_TEST(eal_ptr_compress_autotest, true, true, 
test_eal_ptr_compress);
--
2.25.1



RE: [EXT] [PATCH v6 3/4] crypto/qat: update headers for GEN LCE support

2024-02-29 Thread Akhil Goyal
> This patch handles the changes required for updating the common
> header fields specific to GEN LCE, Also added/updated of the response
> processing APIs based on GEN LCE requirement.
> 
> Signed-off-by: Nishikant Nayak 
> Acked-by: Ciara Power 
> ---
> v2:
> - Renamed device from GEN 5 to GEN LCE.
> - Removed unused code.
> - Updated macro names.
> - Added GEN LCE specific API for deque burst.
> - Fixed code formatting.
> ---
> ---
>  drivers/crypto/qat/qat_sym.c | 16 ++-
>  drivers/crypto/qat/qat_sym.h | 60 ++-
>  drivers/crypto/qat/qat_sym_session.c | 62 +++-
>  drivers/crypto/qat/qat_sym_session.h | 10 -
>  4 files changed, 140 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/crypto/qat/qat_sym.c b/drivers/crypto/qat/qat_sym.c
> index 6e03bde841..439a3fc00b 100644
> --- a/drivers/crypto/qat/qat_sym.c
> +++ b/drivers/crypto/qat/qat_sym.c
> @@ -180,7 +180,15 @@ qat_sym_dequeue_burst(void *qp, struct rte_crypto_op
> **ops,
>   uint16_t nb_ops)
>  {
>   return qat_dequeue_op_burst(qp, (void **)ops,
> - qat_sym_process_response, nb_ops);
> + qat_sym_process_response, nb_ops);

Unnecessary change. Please remove unnecessary changes which should not be part 
of this patch.

The maximum line length is 100 characters now. You can format the code
accordingly.
Since QAT has long macros etc., it would be better to leverage the
100-character line limit.
The code would look more readable.
This is a general comment on the complete patchset.

> +}
> +
> +uint16_t
> +qat_sym_dequeue_burst_gen_lce(void *qp, struct rte_crypto_op **ops,
> + uint16_t nb_ops)
> +{
> + return qat_dequeue_op_burst(qp, (void **)ops,
> + qat_sym_process_response_gen_lce, nb_ops);
>  }
> 
>  int
> @@ -200,6 +208,7 @@ qat_sym_dev_create(struct qat_pci_device
> *qat_pci_dev,
>   char capa_memz_name[RTE_CRYPTODEV_NAME_MAX_LEN];
>   struct rte_cryptodev *cryptodev;
>   struct qat_cryptodev_private *internals;
> + enum qat_device_gen qat_dev_gen = qat_pci_dev->qat_dev_gen;
>   const struct qat_crypto_gen_dev_ops *gen_dev_ops =
>   &qat_sym_gen_dev_ops[qat_pci_dev->qat_dev_gen];
> 
> @@ -249,7 +258,10 @@ qat_sym_dev_create(struct qat_pci_device
> *qat_pci_dev,
>   cryptodev->dev_ops = gen_dev_ops->cryptodev_ops;
> 
>   cryptodev->enqueue_burst = qat_sym_enqueue_burst;
> - cryptodev->dequeue_burst = qat_sym_dequeue_burst;
> + if (qat_dev_gen == QAT_GEN_LCE)
> + cryptodev->dequeue_burst = qat_sym_dequeue_burst_gen_lce;
> + else
> + cryptodev->dequeue_burst = qat_sym_dequeue_burst;
> 
>   cryptodev->feature_flags = gen_dev_ops-
> >get_feature_flags(qat_pci_dev);
> 
> diff --git a/drivers/crypto/qat/qat_sym.h b/drivers/crypto/qat/qat_sym.h
> index f2f197d050..3461113c13 100644
> --- a/drivers/crypto/qat/qat_sym.h
> +++ b/drivers/crypto/qat/qat_sym.h
> @@ -90,7 +90,7 @@
>  /*
>   * Maximum number of SGL entries
>   */
> -#define QAT_SYM_SGL_MAX_NUMBER   16
> +#define QAT_SYM_SGL_MAX_NUMBER 16

Again unnecessary change.

> 
>  /* Maximum data length for single pass GMAC: 2^14-1 */
>  #define QAT_AES_GMAC_SPC_MAX_SIZE 16383
> @@ -142,6 +142,10 @@ uint16_t
>  qat_sym_dequeue_burst(void *qp, struct rte_crypto_op **ops,
>   uint16_t nb_ops);
> 
> +uint16_t
> +qat_sym_dequeue_burst_gen_lce(void *qp, struct rte_crypto_op **ops,
> + uint16_t nb_ops);
> +
>  #ifdef RTE_QAT_OPENSSL
>  /** Encrypt a single partial block
>   *  Depends on openssl libcrypto
> @@ -390,6 +394,52 @@ qat_sym_process_response(void **op, uint8_t *resp,
> void *op_cookie,
>   return 1;
>  }
> 
> +static __rte_always_inline int
> +qat_sym_process_response_gen_lce(void **op, uint8_t *resp,
> + void *op_cookie __rte_unused,
> + uint64_t *dequeue_err_count __rte_unused)
> +{
> + struct icp_qat_fw_comn_resp *resp_msg =
> + (struct icp_qat_fw_comn_resp *)resp;
> + struct rte_crypto_op *rx_op = (struct rte_crypto_op *)(uintptr_t)
> + (resp_msg->opaque_data);
> + struct qat_sym_session *sess;
> +
> +#if RTE_LOG_DP_LEVEL >= RTE_LOG_DEBUG
> + QAT_DP_HEXDUMP_LOG(DEBUG, "qat_response:", (uint8_t *)resp_msg,
> + sizeof(struct icp_qat_fw_comn_resp));
> +#endif
> +
> + sess = CRYPTODEV_GET_SYM_SESS_PRIV(rx_op->sym->session);
> +
> + rx_op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
> +
> + if (ICP_QAT_FW_COMN_STATUS_FLAG_OK !=
> +
>   ICP_QAT_FW_COMN_RESP_UNSUPPORTED_REQUEST_STAT_GET(
> + resp_msg->comn_hdr.comn_status))
> + rx_op->status = RTE_CRYPTO_OP_STATUS_NOT_PROCESSED;
> +
> + else if (ICP_QAT_FW_COMN_STATUS_FLAG_OK !=
> + ICP_QAT_FW_COMN_RESP_INVALID_PARAM_STAT_GET(
> + resp_msg->comn_hdr.comn_status))
> +

[PATCH 1/2] net/mlx5: remove code duplications

2024-02-29 Thread Gregory Etelson
Remove code duplications in DV L3 items validation translation.

Fixes: 3193c2494eea ("net/mlx5: fix L4 protocol validation")

Cc: sta...@dpdk.org

Signed-off-by: Gregory Etelson 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5_flow_dv.c | 151 +---
 1 file changed, 43 insertions(+), 108 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 18f09b22be..fe0a06f364 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -7488,6 +7488,40 @@ flow_dv_validate_item_flex(struct rte_eth_dev *dev,
return 0;
 }
 
+static __rte_always_inline uint8_t
+mlx5_flow_l3_next_protocol(const struct rte_flow_item *l3_item,
+  enum MLX5_SET_MATCHER key_type)
+{
+#define MLX5_L3_NEXT_PROTOCOL(i, ms) \
+	((i)->type == RTE_FLOW_ITEM_TYPE_IPV4 ? \
+	((const struct rte_flow_item_ipv4 *)(i)->ms)->hdr.next_proto_id : \
+	(i)->type == RTE_FLOW_ITEM_TYPE_IPV6 ? \
+	((const struct rte_flow_item_ipv6 *)(i)->ms)->hdr.proto : \
+	(i)->type == RTE_FLOW_ITEM_TYPE_IPV6_FRAG_EXT ? \
+	((const struct rte_flow_item_ipv6_frag_ext *)(i)->ms)->hdr.next_header : \
+	0xff)
+
+   uint8_t next_protocol;
+
+   if (l3_item->mask != NULL && l3_item->spec != NULL) {
+   next_protocol = MLX5_L3_NEXT_PROTOCOL(l3_item, spec);
+   if (next_protocol)
+   next_protocol &= MLX5_L3_NEXT_PROTOCOL(l3_item, mask);
+   else
+   next_protocol = 0xff;
+   } else if (key_type == MLX5_SET_MATCHER_HS_M && l3_item->mask != NULL) {
+   next_protocol =  MLX5_L3_NEXT_PROTOCOL(l3_item, mask);
+   } else if (key_type == MLX5_SET_MATCHER_HS_V && l3_item->spec != NULL) {
+   next_protocol =  MLX5_L3_NEXT_PROTOCOL(l3_item, spec);
+   } else {
+   /* Reset for inner layer. */
+   next_protocol = 0xff;
+   }
+   return next_protocol;
+
+#undef MLX5_L3_NEXT_PROTOCOL
+}
+
 /**
  * Validate IB BTH item.
  *
@@ -7770,19 +7804,8 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct 
rte_flow_attr *attr,
return ret;
last_item = tunnel ? MLX5_FLOW_LAYER_INNER_L3_IPV4 :
 MLX5_FLOW_LAYER_OUTER_L3_IPV4;
-   if (items->mask != NULL &&
-   ((const struct rte_flow_item_ipv4 *)
-items->mask)->hdr.next_proto_id) {
-   next_protocol =
-   ((const struct rte_flow_item_ipv4 *)
-(items->spec))->hdr.next_proto_id;
-   next_protocol &=
-   ((const struct rte_flow_item_ipv4 *)
-(items->mask))->hdr.next_proto_id;
-   } else {
-   /* Reset for inner layer. */
-   next_protocol = 0xff;
-   }
+   next_protocol = mlx5_flow_l3_next_protocol
+   (items, (enum MLX5_SET_MATCHER)-1);
break;
case RTE_FLOW_ITEM_TYPE_IPV6:
mlx5_flow_tunnel_ip_check(items, next_protocol,
@@ -7796,22 +7819,8 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct 
rte_flow_attr *attr,
return ret;
last_item = tunnel ? MLX5_FLOW_LAYER_INNER_L3_IPV6 :
 MLX5_FLOW_LAYER_OUTER_L3_IPV6;
-   if (items->mask != NULL &&
-   ((const struct rte_flow_item_ipv6 *)
-items->mask)->hdr.proto) {
-   item_ipv6_proto =
-   ((const struct rte_flow_item_ipv6 *)
-items->spec)->hdr.proto;
-   next_protocol =
-   ((const struct rte_flow_item_ipv6 *)
-items->spec)->hdr.proto;
-   next_protocol &=
-   ((const struct rte_flow_item_ipv6 *)
-items->mask)->hdr.proto;
-   } else {
-   /* Reset for inner layer. */
-   next_protocol = 0xff;
-   }
+   next_protocol = mlx5_flow_l3_next_protocol
+   (items, (enum MLX5_SET_MATCHER)-1);
break;
case RTE_FL

[PATCH 2/2] net/mlx5: fix IP-in-IP tunnels recognition

2024-02-29 Thread Gregory Etelson
The patch fixes IP-in-IP tunnel recognition for the following patterns

 / [ipv4|ipv6] proto is [ipv4|ipv6] / end

 / [ipv4|ipv6] / [ipv4|ipv6] /
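
For example, matching IPv6-in-IPv4 with an explicit next protocol could look
like this in testpmd (an illustrative rule; 41 is IPPROTO_IPV6):

 testpmd> flow create 0 ingress pattern eth / ipv4 proto is 41 / ipv6 / end actions queue index 1 / end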

Fixes: 3d69434113d1 ("net/mlx5: add Direct Verbs validation function")
Signed-off-by: Gregory Etelson 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5_flow_dv.c | 104 
 1 file changed, 80 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index fe0a06f364..92a5b7b503 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -275,21 +275,41 @@ struct field_modify_info modify_tcp[] = {
{0, 0, 0},
 };
 
-static void
+enum mlx5_l3_tunnel_detection {
+   l3_tunnel_none,
+   l3_tunnel_outer,
+   l3_tunnel_inner
+};
+
+static enum mlx5_l3_tunnel_detection
 mlx5_flow_tunnel_ip_check(const struct rte_flow_item *item __rte_unused,
- uint8_t next_protocol, uint64_t *item_flags,
- int *tunnel)
+ uint8_t next_protocol, uint64_t item_flags,
+ uint64_t *l3_tunnel_flag)
 {
+   enum mlx5_l3_tunnel_detection td = l3_tunnel_none;
+
MLX5_ASSERT(item->type == RTE_FLOW_ITEM_TYPE_IPV4 ||
item->type == RTE_FLOW_ITEM_TYPE_IPV6);
-   if (next_protocol == IPPROTO_IPIP) {
-   *item_flags |= MLX5_FLOW_LAYER_IPIP;
-   *tunnel = 1;
-   }
-   if (next_protocol == IPPROTO_IPV6) {
-   *item_flags |= MLX5_FLOW_LAYER_IPV6_ENCAP;
-   *tunnel = 1;
+   if ((item_flags & MLX5_FLOW_LAYER_OUTER_L3) == 0) {
+   switch (next_protocol) {
+   case IPPROTO_IPIP:
+   td = l3_tunnel_outer;
+   *l3_tunnel_flag = MLX5_FLOW_LAYER_IPIP;
+   break;
+   case IPPROTO_IPV6:
+   td = l3_tunnel_outer;
+   *l3_tunnel_flag = MLX5_FLOW_LAYER_IPV6_ENCAP;
+   break;
+   default:
+   break;
+   }
+   } else {
+   td = l3_tunnel_inner;
+   *l3_tunnel_flag = item->type == RTE_FLOW_ITEM_TYPE_IPV4 ?
+ MLX5_FLOW_LAYER_IPIP :
+ MLX5_FLOW_LAYER_IPV6_ENCAP;
}
+   return td;
 }
 
 static inline struct mlx5_hlist *
@@ -7718,6 +7738,8 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct 
rte_flow_attr *attr,
return ret;
is_root = (uint64_t)ret;
for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
+   enum mlx5_l3_tunnel_detection l3_tunnel_detection;
+   uint64_t l3_tunnel_flag;
int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
int type = items->type;
 
@@ -7795,8 +7817,16 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct 
rte_flow_attr *attr,
vlan_m = items->mask;
break;
case RTE_FLOW_ITEM_TYPE_IPV4:
-   mlx5_flow_tunnel_ip_check(items, next_protocol,
- &item_flags, &tunnel);
+   next_protocol = mlx5_flow_l3_next_protocol
+   (items, (enum MLX5_SET_MATCHER)-1);
+   l3_tunnel_detection =
+   mlx5_flow_tunnel_ip_check(items, next_protocol,
+ item_flags,
+ &l3_tunnel_flag);
+   if (l3_tunnel_detection == l3_tunnel_inner) {
+   item_flags |= l3_tunnel_flag;
+   tunnel = 1;
+   }
ret = flow_dv_validate_item_ipv4(dev, items, item_flags,
 last_item, ether_type,
 error);
@@ -7804,12 +7834,20 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct 
rte_flow_attr *attr,
return ret;
last_item = tunnel ? MLX5_FLOW_LAYER_INNER_L3_IPV4 :
 MLX5_FLOW_LAYER_OUTER_L3_IPV4;
-   next_protocol = mlx5_flow_l3_next_protocol
-   (items, (enum MLX5_SET_MATCHER)-1);
+   if (l3_tunnel_detection == l3_tunnel_outer)
+   item_flags |= l3_tunnel_flag;
break;
case RTE_FLOW_ITEM_TYPE_IPV6:
-   mlx5_flow_tunnel_ip_check(items, next_protocol,
- &item_flags, &tunnel);
+   next_protocol = mlx5_flow_l3_next_protocol
+   (items, 

[PATCH 0/2] fix IP-in-IP tunnels recognition

2024-02-29 Thread Gregory Etelson
fix IP-in-IP tunnels validation and recognition 

Gregory Etelson (2):
  net/mlx5: remove code duplications
  net/mlx5: fix IP-in-IP tunnels recognition

 drivers/net/mlx5/mlx5_flow_dv.c | 243 +++-
 1 file changed, 117 insertions(+), 126 deletions(-)

Acked-by: Ori Kam 
-- 
2.39.2



Re: [PATCH 0/6] remove incorrect code for loading 16B descriptors

2024-02-29 Thread Bruce Richardson
On Thu, Feb 22, 2024 at 03:57:09PM +0100, Burakov, Anatoly wrote:
> On 1/23/2024 12:40 PM, Bruce Richardson wrote:
> > Inside the AVX2 code paths, there was special case code for loading two
> > 16-byte descriptors simultaneously, if that build-time feature was
> > enabled. As well as not being enabled by default, these code blocks also
> > were incorrect as there is no guarantee of the two descriptors being
> > loaded either atomically or in a defined order. If they were loaded in
> > an unexpected order the driver logic would break. Therefore we remove
> > these blocks, and do come cleanup of the following code to remove
> > indentation.
> > 
> > NOTE: I've split out the removal and subsequent cleanup into separate
> > patches for ease of review. These can be merged into a single patch on
> > merge, if so desired.
> > 
> > Bruce Richardson (6):
> >net/i40e: remove incorrect 16B descriptor read block
> >net/i40e: reduce code indentation
> >net/iavf: remove incorrect 16B descriptor read block
> >net/ice: remove incorrect 16B descriptor read block
> >net/ice: reduce code indent
> >net/iavf: reduce code indent
> > 
> >   drivers/net/i40e/i40e_rxtx_vec_avx2.c | 64 -
> >   drivers/net/iavf/iavf_rxtx_vec_avx2.c | 80 ---
> >   drivers/net/ice/ice_rxtx_vec_avx2.c   | 80 ---
> >   3 files changed, 72 insertions(+), 152 deletions(-)
> > 
> > --
> > 2.40.1
> > 
> Series-Acked-by: Anatoly Burakov 

Squashed the 6 patches down to 3, and applied to dpdk-next-net-intel

/Bruce


RE: [EXT] [PATCH v6 1/4] common/qat: add files specific to GEN LCE

2024-02-29 Thread Akhil Goyal
> a/drivers/common/qat/qat_adf/adf_transport_access_macros_gen_lce.h
> b/drivers/common/qat/qat_adf/adf_transport_access_macros_gen_lce.h
> new file mode 100644
> index 00..c9df8f5dd2
> --- /dev/null
> +++ b/drivers/common/qat/qat_adf/adf_transport_access_macros_gen_lce.h
> @@ -0,0 +1,51 @@
> +/* SPDX-License-Identifier: (BSD-3-Clause OR GPL-2.0)
> + * Copyright(c) 2021 Intel Corporation
> + */

I believe copyright year is a typo here.




Re: [EXT] [PATCH v4 01/12] eventdev: improve doxygen introduction text

2024-02-29 Thread Jerin Jacob
On Mon, Feb 26, 2024 at 3:29 PM Bruce Richardson
 wrote:
>
> On Mon, Feb 26, 2024 at 04:51:25AM +, Pavan Nikhilesh Bhagavatula wrote:
> > > Make some textual improvements to the introduction to eventdev and event
> > > devices in the eventdev header file. This text appears in the doxygen
> > > output for the header file, and introduces the key concepts, for
> > > example: events, event devices, queues, ports and scheduling.
> > >
> > > This patch makes the following improvements:
> > > * small textual fixups, e.g. correcting use of singular/plural
> > > * rewrites of some sentences to improve clarity
> > > * using doxygen markdown to split the whole large block up into
> > >   sections, thereby making it easier to read.
> > >
> > > No large-scale changes are made, and blocks are not reordered
> > >
> > > Signed-off-by: Bruce Richardson 
> > >
> >
> > Acked-by: Pavan Nikhilesh 
> >
> > > ---
> > > V4: reworked following review by Jerin
> > > V3: reworked following feedback from Mattias
> > > ---
> > >  lib/eventdev/rte_eventdev.h | 140 ++--
> > >  1 file changed, 86 insertions(+), 54 deletions(-)
> > >
> 
>
> > > + * In contrast, in an event-driver model, as supported by this "eventdev"
> >
> > Should be event-driven model.
> >
>
> Yes, good spot. Jerin, can you just fix this typo on apply, please?

Sure.

Series-Acked-by: Jerin Jacob 

Series applied to dpdk-next-eventdev/for-main. Thanks.


RE: [EXT] [PATCH v6 1/4] common/qat: add files specific to GEN LCE

2024-02-29 Thread Akhil Goyal
> Adding GEN5 files for handling GEN LCE specific operations.
> These files are inherited from the existing files/APIs,
> which have some changes specific to GEN5 requirements.

It is not a good practice to use "adding files specific to .."
Instead please explain what operation/feature is added for new device.


> Also updated the mailmap file.
> 
> Signed-off-by: Nishikant Nayak 
> Acked-by: Ciara Power 


Re: [PATCH v4] dts: add Dockerfile

2024-02-29 Thread Nicholas Pratte
Tested-by: Nicholas Pratte 

On Thu, Feb 29, 2024 at 10:48 AM Patrick Robb  wrote:

>
>
> -- Forwarded message -
> From: 
> Date: Tue, Jan 16, 2024 at 2:18 PM
> Subject: [PATCH v4] dts: add Dockerfile
> To: , , <
> tho...@monjalon.net>, , , <
> paul.szczepa...@arm.com>, 
> Cc: , Jeremy Spewock 
>
>
> From: Juraj Linkeš 
>
> The Dockerfile defines development and CI runner images.
>
> Signed-off-by: Juraj Linkeš 
> Signed-off-by: Jeremy Spewock 
> ---
> v4:
>
> Remove an example from and updated a comment in the devcontainer.json
> and added the --no-root flag to the README to comply with the warning
> message and the discussion on slack.
>
> v3:
>
> Remove pexpect.
>
> v2:
>
> This verson updates the dockerfile to instead install poetry using pipx
> due to the version of poetry installed using the package repositories of
> the distro being out of date, and to conform to documentation on
> installing poetry.
>
> This version also adds extra information to the README about the
> preference of using SSH keys, and added a way to inject them into the
> devcontainer for vscode.
>
>  dts/.devcontainer/devcontainer.json | 30 +
>  dts/Dockerfile  | 38 
>  dts/README.md   | 70 +
>  3 files changed, 138 insertions(+)
>  create mode 100644 dts/.devcontainer/devcontainer.json
>  create mode 100644 dts/Dockerfile
>  create mode 100644 dts/README.md
>
> diff --git a/dts/.devcontainer/devcontainer.json
> b/dts/.devcontainer/devcontainer.json
> new file mode 100644
> index 00..4d737f1b40
> --- /dev/null
> +++ b/dts/.devcontainer/devcontainer.json
> @@ -0,0 +1,30 @@
> +// For format details, see https://aka.ms/devcontainer.json. For config
> options, see the README at:
> +//
> https://github.com/microsoft/vscode-dev-containers/tree/v0.241.1/containers/docker-existing-dockerfile
> +{
> +   "name": "Existing Dockerfile",
> +
> +   // Sets the run context to one level up instead of the
> .devcontainer folder.
> +   "context": "..",
> +
> +   // Update the 'dockerFile' property if you aren't using the
> standard 'Dockerfile' filename.
> +   "dockerFile": "../Dockerfile",
> +
> +   // Use 'forwardPorts' to make a list of ports inside the container
> available locally.
> +   // "forwardPorts": [],
> +
> +   // The next line runs commands after the container is created - in
> our case, installing dependencies.
> +   "postCreateCommand": "poetry install --no-root",
> +
> +   "extensions": [
> +   "ms-python.vscode-pylance",
> +   ]
> +
> +   // Uncomment when using a ptrace-based debugger like C++, Go, and
> Rust
> +   // "runArgs": [ "--cap-add=SYS_PTRACE", "--security-opt",
> "seccomp=unconfined" ],
> +
> +   // Uncomment to use the Docker CLI from inside the container. See
> https://aka.ms/vscode-remote/samples/docker-from-docker.
> +   // "mounts": [
> "source=/var/run/docker.sock,target=/var/run/docker.sock,type=bind" ],
> +
> +   // Uncomment to mount your SSH keys into the devcontainer used by
> vscode.
> +   // "mounts":
> ["source=${localEnv:HOME}/.ssh,destination=/root/.ssh,type=bind,readonly"]
> +}
> diff --git a/dts/Dockerfile b/dts/Dockerfile
> new file mode 100644
> index 00..fa4c1af10e
> --- /dev/null
> +++ b/dts/Dockerfile
> @@ -0,0 +1,38 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2022 University of New Hampshire
> +
> +# There are two Docker images defined in this Dockerfile.
> +# One is to be used in CI for automated testing.
> +# The other provides a DTS development environment, simplifying Python
> dependency management.
> +
> +FROM ubuntu:22.04 AS base
> +
> +RUN apt-get -y update && apt-get -y upgrade && \
> +apt-get -y install --no-install-recommends \
> +python3 \
> +python3-pip \
> +pipx \
> +python3-cachecontrol \
> +openssh-client && \
> +pipx install poetry>=1.5.1 && pipx ensurepath
> +WORKDIR /dpdk/dts
> +
> +
> +FROM base AS runner
> +
> +# This image is intended to be used as the base for automated systems.
> +# It bakes DTS into the image during the build.
> +
> +COPY . /dpdk/dts
> +RUN poetry install --no-dev
> +
> +CMD ["poetry", "run", "python", "main.py"]
> +
> +FROM base AS dev
> +
> +# This image is intended to be used as DTS development environment. It
> doesn't need C compilation
> +# capabilities, only Python dependencies. Once a container mounting DTS
> using this image is running,
> +# the dependencies should be installed using Poetry.
> +
> +RUN apt-get -y install --no-install-recommends \
> +vim emacs git
> diff --git a/dts/README.md b/dts/README.md
> new file mode 100644
> index 00..36c8cc9a0c
> --- /dev/null
> +++ b/dts/README.md
> @@ -0,0 +1,70 @@
> +# DTS Environment
> +The execution and development environments for DTS are the same, a
> +[Docker](https://docs.docker.com/) container defined by ou

RE: [PATCH v4] crypto/ipsec_mb: unified IPsec MB interface

2024-02-29 Thread Dooley, Brian
Hi folks,

The introduction of a more unified IPsec MB library for DPDK is causing the
snow3g tests to fail on ARM. Artifact here:
https://lab.dpdk.org/results/dashboard/patchsets/29315/
PMDs previously using the direct API (KASUMI, CHACHA, ZUC, SNOW3G) will now
use the job API from the AESNI MB PMD code.
We have come across a similar issue in the past related to offsets, as
SNOW3G uses bits instead of bytes.
 
commit a501609ea6466ed8526c0dfadedee332a4d4a451
Author: Pablo de Lara pablo.de.lara.gua...@intel.com
Date:   Wed Feb 23 16:01:16 2022 +
 
crypto/ipsec_mb: fix length and offset settings
 
KASUMI, SNOW3G and ZUC require lengths and offsets to
be set in bits or bytes depending on the algorithm.
There were some algorithms that were mixing these two,
so this commit is fixing this issue. 
 
This bug only appeared recently when the ARM ipsec version was bumped to 1.4. 
It appears there could be a similar scenario happening now and this is a 
potential fix that needs to be made in the ARM IPsec-mb repo:
 
diff --git a/lib/aarch64/mb_mgr_snow3g_submit_flush_common_aarch64.h 
b/lib/aarch64/mb_mgr_snow3g_submit_flush_common_aarch64.h
index 13bca11b..de284ade 100644
--- a/lib/aarch64/mb_mgr_snow3g_submit_flush_common_aarch64.h
+++ b/lib/aarch64/mb_mgr_snow3g_submit_flush_common_aarch64.h
@@ -94,8 +94,8 @@ static void snow3g_mb_mgr_insert_uea2_job(MB_MGR_SNOW3G_OOO 
*state, IMB_JOB *job
 state->num_lanes_inuse++;
 state->args.iv[used_lane_idx] = job->iv;
 state->args.keys[used_lane_idx] = job->enc_keys;
-state->args.in[used_lane_idx] = job->src + 
job->cipher_start_src_offset_in_bytes;
-state->args.out[used_lane_idx] = job->dst;
+state->args.in[used_lane_idx] = job->src + 
(job->cipher_start_src_offset_in_bits / 8);
+state->args.out[used_lane_idx] = job->dst + 
(job->cipher_start_src_offset_in_bits / 8);
 state->args.byte_length[used_lane_idx] = job->msg_len_to_cipher_in_bits / 
8;
 state->args.INITIALIZED[used_lane_idx] = 0;
 state->lens[used_lane_idx] = job->msg_len_to_cipher_in_bits / 8;
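
To illustrate the unit conversion the fix performs (values and names are
illustrative; src/dst stand for the job's buffer pointers):

	uint32_t offset_bits = 128;                 /* offset carried in bits */
	const uint8_t *in = src + offset_bits / 8;  /* advance 16 bytes */
	uint8_t *out = dst + offset_bits / 8;       /* dst must advance too */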

Thanks,
Brian

> -Original Message-
> From: Dooley, Brian 
> Sent: Wednesday, February 28, 2024 11:33 AM
> To: Ji, Kai ; De Lara Guarch, Pablo
> 
> Cc: dev@dpdk.org; gak...@marvell.com; Dooley, Brian
> 
> Subject: [PATCH v4] crypto/ipsec_mb: unified IPsec MB interface
> 
> Currently IPsec MB provides both the JOB API and direct API.
> AESNI_MB PMD is using the JOB API codepath while ZUC, KASUMI, SNOW3G
> and CHACHA20_POLY1305 are using the direct API.
> Instead of using the direct API for these PMDs, they should now make
> use of the JOB API codepath. This would remove all use of the IPsec MB
> direct API for these PMDs.
> 
> Signed-off-by: Brian Dooley 
> ---
> v2:
> - Fix compilation failure
> v3:
> - Remove session configure pointer for each PMD
> v4:
> - Keep AES GCM PMD and fix extern issue
> ---
>  doc/guides/rel_notes/release_24_03.rst|   6 +
>  drivers/crypto/ipsec_mb/pmd_aesni_mb.c|  10 +-
>  drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h   |  15 +-
>  drivers/crypto/ipsec_mb/pmd_chacha_poly.c | 338 +--
>  .../crypto/ipsec_mb/pmd_chacha_poly_priv.h|  28 -
>  drivers/crypto/ipsec_mb/pmd_kasumi.c  | 410 +
>  drivers/crypto/ipsec_mb/pmd_kasumi_priv.h |  20 -
>  drivers/crypto/ipsec_mb/pmd_snow3g.c  | 543 +-
>  drivers/crypto/ipsec_mb/pmd_snow3g_priv.h |  21 -
>  drivers/crypto/ipsec_mb/pmd_zuc.c | 347 +--
>  drivers/crypto/ipsec_mb/pmd_zuc_priv.h|  20 -
>  11 files changed, 48 insertions(+), 1710 deletions(-)
> 



RE: [EXT] [PATCH v6 1/4] common/qat: add files specific to GEN LCE

2024-02-29 Thread Power, Ciara



> -Original Message-
> From: Akhil Goyal 
> Sent: Thursday, February 29, 2024 4:14 PM
> To: Nayak, Nishikanta ; dev@dpdk.org
> Cc: Power, Ciara ; Ji, Kai ; Kusztal,
> ArkadiuszX ; S Joshi, Rakesh
> ; Thomas Monjalon ;
> Burakov, Anatoly 
> Subject: RE: [EXT] [PATCH v6 1/4] common/qat: add files specific to GEN LCE
> 
> > Adding GEN5 files for handling GEN LCE specific operations.
> > These files are inherited from the existing files/APIs, which have some
> > changes specific to GEN5 requirements.
> 
> It is not a good practice to use "adding files specific to .."
> Instead please explain what operation/feature is added for new device.

Ok, we can squash this with patch #2 when the device ID is supported and 
functions are being used.
Will update in next version.

Thanks,
Ciara

> 
> 
> > Also updated the mailmap file.
> >
> > Signed-off-by: Nishikant Nayak 
> > Acked-by: Ciara Power 


RE: [PATCH v4] crypto/ipsec_mb: unified IPsec MB interface

2024-02-29 Thread Akhil Goyal
> Hi folks,
> 
> The introduction of a more unified IPsec MB library for DPDK is causing the
> snow3g tests to fail on ARM. Artifact here:
> https://lab.dpdk.org/results/dashboard/patchsets/29315/
> PMDs using the direct API (KASUMI, CHACHA, ZUC, SNOW3G) will use the job API,
> from the AESNI MB PMD code.
> We have come across a similar issue in the past that related to an offset 
> issue as
> SNOW3G uses bits instead of bytes.

The above link does not seem to be working. 
I believe from now on, since we continue to maintain two separate repos,
it would be better to get ack from ARM folks as well
before merging anything onto crypto/ipsec_mb PMD.

Arm folks, could you please get the below change tested/incorporated in the
repo.


> 
> commit a501609ea6466ed8526c0dfadedee332a4d4a451
> Author: Pablo de Lara pablo.de.lara.gua...@intel.com
> Date:   Wed Feb 23 16:01:16 2022 +
> 
> crypto/ipsec_mb: fix length and offset settings
> 
> KASUMI, SNOW3G and ZUC require lengths and offsets to
> be set in bits or bytes depending on the algorithm.
> There were some algorithms that were mixing these two,
> so this commit is fixing this issue.
> 
> This bug only appeared recently when the ARM ipsec version was bumped to 1.4.
> It appears there could be a similar scenario happening now and this is a 
> potential
> fix that needs to be made in the ARM IPsec-mb repo:
> 
> diff --git a/lib/aarch64/mb_mgr_snow3g_submit_flush_common_aarch64.h
> b/lib/aarch64/mb_mgr_snow3g_submit_flush_common_aarch64.h
> index 13bca11b..de284ade 100644
> --- a/lib/aarch64/mb_mgr_snow3g_submit_flush_common_aarch64.h
> +++ b/lib/aarch64/mb_mgr_snow3g_submit_flush_common_aarch64.h
> @@ -94,8 +94,8 @@ static void
> snow3g_mb_mgr_insert_uea2_job(MB_MGR_SNOW3G_OOO *state, IMB_JOB
> *job
>  state->num_lanes_inuse++;
>  state->args.iv[used_lane_idx] = job->iv;
>  state->args.keys[used_lane_idx] = job->enc_keys;
> -state->args.in[used_lane_idx] = job->src + job-
> >cipher_start_src_offset_in_bytes;
> -state->args.out[used_lane_idx] = job->dst;
> +state->args.in[used_lane_idx] = job->src + (job-
> >cipher_start_src_offset_in_bits / 8);
> +state->args.out[used_lane_idx] = job->dst + (job-
> >cipher_start_src_offset_in_bits / 8);
>  state->args.byte_length[used_lane_idx] = job->msg_len_to_cipher_in_bits 
> / 8;
>  state->args.INITIALIZED[used_lane_idx] = 0;
>  state->lens[used_lane_idx] = job->msg_len_to_cipher_in_bits / 8;
> 
> Thanks,
> Brian
> 
> > -Original Message-
> > From: Dooley, Brian 
> > Sent: Wednesday, February 28, 2024 11:33 AM
> > To: Ji, Kai ; De Lara Guarch, Pablo
> > 
> > Cc: dev@dpdk.org; gak...@marvell.com; Dooley, Brian
> > 
> > Subject: [PATCH v4] crypto/ipsec_mb: unified IPsec MB interface
> >
> > Currently IPsec MB provides both the JOB API and direct API.
> > AESNI_MB PMD is using the JOB API codepath while ZUC, KASUMI, SNOW3G
> > and CHACHA20_POLY1305 are using the direct API.
> > Instead of using the direct API for these PMDs, they should now make
> > use of the JOB API codepath. This would remove all use of the IPsec MB
> > direct API for these PMDs.
> >
> > Signed-off-by: Brian Dooley 
> > ---
> > v2:
> > - Fix compilation failure
> > v3:
> > - Remove session configure pointer for each PMD
> > v4:
> > - Keep AES GCM PMD and fix extern issue
> > ---
> >  doc/guides/rel_notes/release_24_03.rst|   6 +
> >  drivers/crypto/ipsec_mb/pmd_aesni_mb.c|  10 +-
> >  drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h   |  15 +-
> >  drivers/crypto/ipsec_mb/pmd_chacha_poly.c | 338 +--
> >  .../crypto/ipsec_mb/pmd_chacha_poly_priv.h|  28 -
> >  drivers/crypto/ipsec_mb/pmd_kasumi.c  | 410 +
> >  drivers/crypto/ipsec_mb/pmd_kasumi_priv.h |  20 -
> >  drivers/crypto/ipsec_mb/pmd_snow3g.c  | 543 +-
> >  drivers/crypto/ipsec_mb/pmd_snow3g_priv.h |  21 -
> >  drivers/crypto/ipsec_mb/pmd_zuc.c | 347 +--
> >  drivers/crypto/ipsec_mb/pmd_zuc_priv.h|  20 -
> >  11 files changed, 48 insertions(+), 1710 deletions(-)
> >
> 


Re: [PATCH v2 1/1] net/octeon_ep: use devarg to enable ISM accesses

2024-02-29 Thread Jerin Jacob
On Mon, Feb 26, 2024 at 2:55 PM Vamsi Attunuru  wrote:
>
> Adds a devarg option to enable/disable ISM memory accesses
> for reading packet count details. This option is disabled
> by default, as ISM memory accesses affect throughput of
> bigger size packets.
>
> Signed-off-by: Vamsi Attunuru 

Updated the git commit as follows and applied to
dpdk-next-net-mrvl/for-main. Thanks

net/octeon_ep: enable ISM accesses via devarg

Adds a devarg option to enable/disable ISM memory accesses
for reading packet count details. This option is disabled
by default, as ISM memory accesses affect throughput of
bigger size packets.

Signed-off-by: Vamsi Attunuru 
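
A usage illustration (assuming the devarg is named "ism_enable" as in this
patch; the PCI address is a placeholder):

  dpdk-testpmd -a 0002:02:00.0,ism_enable=1 -- -i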


RE: [EXT] [PATCH v6 4/4] test/cryptodev: add tests for GCM with AAD

2024-02-29 Thread Power, Ciara



> -Original Message-
> From: Akhil Goyal 
> Sent: Thursday, February 29, 2024 3:52 PM
> To: Nayak, Nishikanta ; dev@dpdk.org
> Cc: Power, Ciara ; Ji, Kai ; Kusztal,
> ArkadiuszX ; S Joshi, Rakesh
> ; Fan Zhang 
> Subject: RE: [EXT] [PATCH v6 4/4] test/cryptodev: add tests for GCM with AAD
> 
> > Adding one new unit test code for validating the features added as
> > part of GCM with 64 byte AAD.
> > The new test case adds one new test for GCM algo for both encrypt and
> > decrypt operations.
> >
> > Signed-off-by: Nishikant Nayak 
> > Acked-by: Ciara Power 
> > ---
> What is the need for this new test vector? How is this case not covered in
> existing cases?
> Can you explain in the patch description?
> How is it different than gcm_test_case_aad_2 case and other gcm 128 cases?
> 

The difference is that this test vector uses an AAD of size 64 bytes.
So far, other test vectors have AAD sizes of 0, 8, 12, and 65296.

Thanks,
Ciara






Re: [PATCH v2] ethdev: add Linux ethtool link mode conversion

2024-02-29 Thread Stephen Hemminger
On Thu, 29 Feb 2024 16:42:56 +0100
Thomas Monjalon  wrote:

> +/* Link modes sorted with index as defined in ethtool.
> + * Values are speed in Mbps with LSB indicating duplex.
> + *
> + * The ethtool bits definition should not change as it is a kernel API.
> + * Using raw numbers directly avoids checking API availability
> + * and allows to compile with new bits included even on an old kernel.
> + *
> + * The array below is built from bit definitions with this shell command:
> + *   sed -rn 's;.*(ETHTOOL_LINK_MODE_)([0-9]+)([0-9a-zA-Z_]*).*= 
> *([0-9]*).*;'\
> + *   '[\4] = \2, /\* \1\2\3 *\/;p' /usr/include/linux/ethtool.h |
> + *   awk '/_Half_/{$3=$3+1","}1'
> + */
> +static uint32_t link_modes[] = {

Make it const please.

You could add meson rule to generate it and then use non-numeric tags.


[PATCH] doc: update link to DevX integration on Windows

2024-02-29 Thread Ali Alnubani
The older link no longer works.

Cc: sta...@dpdk.org

Signed-off-by: Ali Alnubani 
---
 doc/guides/platform/mlx5.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/guides/platform/mlx5.rst b/doc/guides/platform/mlx5.rst
index a66cf778d1..e9a1f52aca 100644
--- a/doc/guides/platform/mlx5.rst
+++ b/doc/guides/platform/mlx5.rst
@@ -230,7 +230,7 @@ DevX SDK Installation
 The DevX SDK must be installed on the machine building the Windows PMD.
 Additional information can be found at
 `How to Integrate Windows DevX in Your Development Environment
-`_.
+`_.
 The minimal supported WinOF2 version is 2.60.
 
 
-- 
2.25.1



Re: [PATCH v3 3/3] event/cnxk: support DMA event functions

2024-02-29 Thread Jerin Jacob
On Mon, Feb 26, 2024 at 3:30 PM Amit Prakash Shukla
 wrote:
>
> Added support of dma driver callback assignment to eventdev
> enqueue and dequeue. The change also defines dma adapter
> capabilities function.
>
> Depends-on: series-30612 ("lib/dmadev: get DMA device using device ID")
>
> Signed-off-by: Amit Prakash Shukla 



# Please update the release notes to add an entry in the PMD section for this new feature.
# Fix the following build issue, seen when building one patch at a time.

[2850/3208] Generating drivers/rte_common_idpf.sym_chk with a custom
command (wrapped by meson to capture output)
[2851/3208] Linking static target drivers/librte_net_cnxk.a
[2852/3208] Compiling C object
drivers/libtmp_rte_dma_cnxk.a.p/dma_cnxk_cnxk_dmadev_fp.c.o
[2853/3208] Linking static target drivers/libtmp_rte_dma_cnxk.a
[2854/3208] Generating drivers/rte_net_cnxk.sym_chk with a custom
command (wrapped by meson to capture output)
[2855/3208] Compiling C object
drivers/libtmp_rte_event_cnxk.a.p/event_cnxk_cn9k_eventdev.c.o
FAILED: drivers/libtmp_rte_event_cnxk.a.p/event_cnxk_cn9k_eventdev.c.o
ccache aarch64-linux-gnu-gcc -Idrivers/libtmp_rte_event_cnxk.a.p
-Idrivers -I../drivers -Idrivers/event/cnxk -I../drivers/event/cnxk
-Ilib/eventdev -I../lib/eventdev -I. -I.. -Iconfig -I../config
-Ilib/eal/include -I../lib/eal/include -Ilib/eal/linux/include
-I../lib/eal/linux/include -Ilib/eal/arm/include
-I../lib/eal/arm/include -Ilib/eal/common -I../lib/eal/common
-Ilib/eal -I../lib/eal -Ilib/kvargs -I../lib/kvargs -Ilib/log
-I../lib/log -Ilib/metrics -I../lib/metrics -Ilib/telemetry
-I../lib/telemetry -Ilib/ring -I../lib/ring -Ilib/ethdev
-I../lib/ethdev -Ilib/net -I../lib/net -Ilib/mbuf -I../lib/mbuf
-Ilib/mempool -I../lib/mempool -Ilib/meter -I../lib/meter -Ilib/hash
-I../lib/hash -Ilib/rcu -I../lib/rcu -Ilib/timer -I../lib/timer
-Ilib/cryptodev -I../lib/cryptodev -Ilib/dmadev -I../lib/dmadev
-Idrivers/bus/pci -I../drivers/bus/pci -I../drivers/bus/pci/linux
-Ilib/pci -I../lib/pci -Idrivers/common/cnxk -I../drivers/common/cnxk
-Ilib/security -I../lib/security -Idrivers/net/cnxk
-I../drivers/net/cnxk -Idrivers/bus/vdev -I../drivers/bus/vdev
-Idrivers/mempool/cnxk -I../drivers/mempool/cnxk -Idrivers/crypto/cnxk
-I../drivers/crypto/cnxk -I/export/cross_prefix/prefix/include
-fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch
-Wextra -Werror -std=c11 -O2 -g -include rte_config.h -Wcast-qual
-Wdeprecated -Wformat -Wformat-nonliteral -Wformat-security
-Wmissing-declarations -Wmissing-prototypes -Wnested-externs
-Wold-style-definition -Wpointer-arith -Wsign-compare
-Wstrict-prototypes -Wundef -Wwrite-strings
-Wno-address-of-packed-member -Wno-packed-not-aligned
-Wno-missing-field-initializers -Wno-zero-length-bounds -D_GNU_SOURCE
-fPIC -march=armv8-a+crc -moutline-atomics -DALLOW_EXPERIMENTAL_API
-DALLOW_INTERNAL_API -Wno-format-truncation -flax-vector-conversions
-Wno-strict-aliasing -DRTE_LOG_DEFAULT_LOGTYPE=pmd.event.cnxk -MD -MQ
drivers/libtmp_rte_event_cnxk.a.p/event_cnxk_cn9k_eventdev.c.o -MF
drivers/libtmp_rte_event_cnxk.a.p/event_cnxk_cn9k_eventdev.c.o.d -o
drivers/libtmp_rte_event_cnxk.a.p/event_cnxk_cn9k_eventdev.c.o -c
../drivers/event/cnxk/cn9k_eventdev.c
../drivers/event/cnxk/cn9k_eventdev.c: In function 'cn9k_sso_fp_fns_set':
../drivers/event/cnxk/cn9k_eventdev.c:463:34: error:
'cn9k_dma_adapter_enqueue' undeclared (first use in this function);
did you mean 'event_dma_adapter_enqueue_t'?
  463 | event_dev->dma_enqueue = cn9k_dma_adapter_enqueue;
  |  ^~~~
  |  event_dma_adapter_enqueue_t
../drivers/event/cnxk/cn9k_eventdev.c:463:34: note: each undeclared
identifier is reported only once for each function it appears in
../drivers/event/cnxk/cn9k_eventdev.c:479:42: error:
'cn9k_dma_adapter_dual_enqueue' undeclared (first use in this
function)
  479 | event_dev->dma_enqueue = cn9k_dma_adapter_dual_enqueue;
  |  ^
[2856/3208] Generating drivers/rte_dma_cnxk.pmd.c with a custom command
[2857/3208] Generating drivers/rte_bus_dpaa.sym_chk with a custom
command (wrapped by meson to capture output)
[2858/3208] Generating drivers/rte_bus_fslmc.sym_chk with a custom
command (wrapped by meson to capture output)
[2859/3208] Generating lib/pipeline.sym_chk with a custom command
(wrapped by meson to capture output)
[2860/3208] Generating lib/ethdev.sym_chk with a custom command
(wrapped by meson to capture output)
[2861/3208] Generating lib/eal.sym_chk with a custom command (wrapped
by meson to capture output)
[2862/3208] Generating drivers/rte_common_sfc_efx.sym_chk with a
custom command (wrapped by meson to capture output)
[2863/3208] Generating drivers/rte_common_cnxk.sym_chk with a custom
command (wrapped by meson to capture output)
ninja: build stopped: subcommand failed.
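
The compiler note hints at the problem: the handlers are referenced before
any declaration of them is visible in that intermediate patch. A
hypothetical sketch of the missing prototypes, assuming they match the
event_dma_adapter_enqueue_t signature suggested above (the real
declarations belong in the cnxk headers introduced by the series):

    #include <stdint.h>

    struct rte_event;

    /* Prototypes matching the event_dma_adapter_enqueue_t signature:
     * uint16_t (*)(void *port, struct rte_event ev[], uint16_t nb_events).
     */
    uint16_t cn9k_dma_adapter_enqueue(void *port, struct rte_event ev[],
    		uint16_t nb_events);
    uint16_t cn9k_dma_adapter_dual_enqueue(void *port, struct rte_event ev[],
    		uint16_t nb_events);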


RE: [PATCH v2] ethdev: add Linux ethtool link mode conversion

2024-02-29 Thread Morten Brørup
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Thursday, 29 February 2024 17.45
> 
> On Thu, 29 Feb 2024 16:42:56 +0100
> Thomas Monjalon  wrote:
> 
> > +/* Link modes sorted with index as defined in ethtool.
> > + * Values are speed in Mbps with LSB indicating duplex.
> > + *
> > + * The ethtool bits definition should not change as it is a kernel API.
> > + * Using raw numbers directly avoids checking API availability
> > + * and allows compiling with new bits included even on an old kernel.
> > + *
> > + * The array below is built from bit definitions with this shell command:
> > + *   sed -rn 's;.*(ETHTOOL_LINK_MODE_)([0-9]+)([0-9a-zA-Z_]*).*= *([0-9]*).*;'\
> > + *   '[\4] = \2, /\* \1\2\3 *\/;p' /usr/include/linux/ethtool.h |
> > + *   awk '/_Half_/{$3=$3+1","}1'
> > + */
> > +static uint32_t link_modes[] = {
> 
> Make it const please.
> 
> You could add a meson rule to generate it and then use non-numeric tags.

However you do it, make sure it cross-builds. The kernel/ethtool on the target
system may differ from the one on the build system.



RE: [PATCH] test/crypto: add ext mbuf test for aes-gcm aead algo

2024-02-29 Thread Akhil Goyal
> Subject: [PATCH] test/crypto: add ext mbuf test for aes-gcm aead algo
> 
> Add external mbuf test for AES GCM aead algo.
> 
> Signed-off-by: Aakash Sasidharan 
Applied to dpdk-next-crypto
Thanks.


RE: [EXT] [PATCH 0/2] app/test-crypto-perf: fix multi-segment issue

2024-02-29 Thread Akhil Goyal
> This commit fixes the bugs in multi-segment size calculation.
> 
> Suanming Mou (2):
>   app/test-crypto-perf: fix copy segment size calculation
>   app/test-crypto-perf: fix dst_mbuf size calculation
> 
>  app/test-crypto-perf/cperf_test_common.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
Series Acked-by: Akhil Goyal 
Applied to dpdk-next-crypto

