[DPDK/ethdev Bug 1591] MLX5 Windows : Issue with Packet Loss When Setting Descriptors Above 1<<14 on ConnectX6-DX

2024-12-03 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1591

Bug ID: 1591
   Summary: MLX5 Windows : Issue with Packet Loss When Setting
Descriptors Above 1<<14 on ConnectX6-DX
   Product: DPDK
   Version: 24.11
  Hardware: x86
OS: Windows
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: ethdev
  Assignee: dev@dpdk.org
  Reporter: a.polle...@deltacast.tv
  Target Milestone: ---

I am encountering an issue with the ConnectX6-DX on Windows. When I set the
number of RX descriptors to a value greater than 1<<14 (16384), all packets
except the first are dropped (counted as imissed). The root cause is unclear,
but I observed that rte_eth_dev_info_get() reports a maximum of 32768
descriptors (rx_desc_lim.nb_max), which I understand to mean that any value up
to this limit should be valid.
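For reference, an application can sanity-check the requested ring size against the limits reported by rte_eth_dev_info_get() before calling rte_eth_rx_queue_setup(); DPDK also provides rte_eth_dev_adjust_nb_rx_tx_desc() for this. Below is a minimal, self-contained sketch of that clamping logic; the desc_lim struct is a stand-in for rte_eth_desc_lim, and the align-then-clamp behaviour is simplified, not the exact ethdev implementation:

```c
#include <stdint.h>

/* Stand-in for struct rte_eth_desc_lim as reported by rte_eth_dev_info_get(). */
struct desc_lim {
	uint16_t nb_max;   /* maximum number of descriptors */
	uint16_t nb_min;   /* minimum number of descriptors */
	uint16_t nb_align; /* descriptor count must be a multiple of this */
};

/* Simplified clamp: round up to the alignment, then enforce min/max.
 * Note: in this bug report, values well under nb_max (32768) already
 * misbehave once they exceed 1<<14, so passing this check alone is
 * evidently not sufficient on the affected setup. */
static uint16_t
clamp_nb_desc(uint32_t requested, const struct desc_lim *lim)
{
	uint32_t n = requested;

	if (lim->nb_align)
		n = (n + lim->nb_align - 1) / lim->nb_align * lim->nb_align;
	if (n > lim->nb_max)
		n = lim->nb_max;
	if (n < lim->nb_min)
		n = lim->nb_min;
	return (uint16_t)n;
}
```

With the limits reported here (nb_max = 32768), a request of 20480 passes this check unchanged, which is why the observed packet loss looks like a driver-side problem rather than an application error.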

Here are some test results using testpmd that demonstrate the issue:

TEST 1 : 4096 descriptors

./dpdk-testpmd -l 2-3 -n 4 -a :03:00.0 --log-level=8
--log-level=pmd.common.mlx5:8 --log-level=pmd.net.mlx5:8 -- --socket-num=0
--burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=4 --txq=4 --nb-cores=1
--txpkts=1500 -i --forward-mode=rxonly --flow-isolate-all

testpmd> show port stats 0

   NIC statistics for port 0  
  RX-packets: 1626632  RX-missed: 0  RX-bytes: 2152490218
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 0  TX-errors: 0  TX-bytes:  0

  Throughput (since last show)
  Rx-pps:   246876  Rx-bps:   2613496560
  Tx-pps: 0  Tx-bps: 0
  


TEST 2 : 16384 descriptors
./dpdk-testpmd -l 2-3 -n 4 -a :03:00.0 --log-level=8
--log-level=pmd.common.mlx5:8 --log-level=pmd.net.mlx5:8 -- --socket-num=0
--burst=64 --txd=4096 --rxd=16384 --mbcache=512 --rxq=4 --txq=4 --nb-cores=1
--txpkts=1500 -i --forward-mode=rxonly --flow-isolate-all
testpmd> show port stats 0

   NIC statistics for port 0  
  RX-packets: 2923021  RX-missed: 0  RX-bytes: 3867975188
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 0  TX-errors: 0  TX-bytes:  0

  Throughput (since last show)
  Rx-pps:   246881  Rx-bps:   2613540240
  Tx-pps: 0  Tx-bps: 0
  

TEST 3 : 20480 descriptors
./dpdk-testpmd -l 2-3 -n 4 -a :03:00.0 --log-level=8
--log-level=pmd.common.mlx5:8 --log-level=pmd.net.mlx5:8 -- --socket-num=0
--burst=64 --txd=4096 --rxd=20480 --mbcache=512 --rxq=4 --txq=4 --nb-cores=1
--txpkts=1500 -i --forward-mode=rxonly --flow-isolate-all

testpmd> show port stats 0

   NIC statistics for port 0  
  RX-packets: 1  RX-missed: 2732098  RX-bytes: 1328
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 0  TX-errors: 0  TX-bytes:  0

  Throughput (since last show)
  Rx-pps: 0  Rx-bps: 0
  Tx-pps: 0  Tx-bps: 0
  

TEST 4 : 32768 descriptors

./dpdk-testpmd -l 2-3 -n 4 -a :03:00.0 --log-level=8
--log-level=pmd.common.mlx5:8 --log-level=pmd.net.mlx5:8 -- --socket-num=0
--burst=64 --txd=4096 --rxd=32768 --mbcache=512 --rxq=4 --txq=4 --nb-cores=1
--txpkts=1500 -i --forward-mode=rxonly --flow-isolate-all
testpmd> show port stats 0

   NIC statistics for port 0  
  RX-packets: 1  RX-missed: 1129806  RX-bytes: 1328
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 0  TX-errors: 0  TX-bytes:  0

  Throughput (since last show)
  Rx-pps: 0  Rx-bps: 0
  Tx-pps: 0  Tx-bps: 0
  


I was able to reproduce this issue on versions 24.11 and 23.11, using DevX
version 24.10.26603.

Thank you in advance for your help. Please let me know if you need further
information.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[PATCH] doc: add tested platforms with NVIDIA NICs

2024-12-03 Thread Raslan Darawsheh
Add tested platforms with NVIDIA NICs to the 24.11 release notes.

Signed-off-by: Raslan Darawsheh 
---
 doc/guides/rel_notes/release_24_11.rst | 108 +
 1 file changed, 108 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_11.rst 
b/doc/guides/rel_notes/release_24_11.rst
index 8486cd986f..61349b1ca2 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -616,3 +616,111 @@ Tested Platforms
   * Firmware version: 2.14, 0x828c
   * Device id (pf): 8086:125b
   * Driver version(in-tree): 6.8.0-45-generic (Ubuntu24.04.1)(igc)
+
+* Intel\ |reg| platforms with NVIDIA\ |reg| NICs combinations
+
+  * CPU:
+
+* Intel\ |reg| Xeon\ |reg| Gold 6154 CPU @ 3.00GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2697A v4 @ 2.60GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2697 v3 @ 2.60GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2680 v2 @ 2.80GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2670 0 @ 2.60GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2650 v4 @ 2.20GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2650 v3 @ 2.30GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2640 @ 2.50GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2650 0 @ 2.00GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2620 v4 @ 2.10GHz
+
+  * OS:
+
+* Red Hat Enterprise Linux release 9.1 (Plow)
+* Red Hat Enterprise Linux release 8.6 (Ootpa)
+* Red Hat Enterprise Linux release 8.4 (Ootpa)
+* Ubuntu 22.04
+* Ubuntu 20.04
+* SUSE Enterprise Linux 15 SP2
+
+  * OFED:
+
+* MLNX_OFED 24.10-0.7.0.0 and above
+
+  * DOCA:
+
+* doca 2.9.0-0.4.7 and above
+
+  * upstream kernel:
+
+* Linux 6.12.0 and above
+
+  * rdma-core:
+
+* rdma-core-54.0 and above
+
+  * NICs
+
+* NVIDIA\ |reg| ConnectX\ |reg|-6 Dx EN 100G MCX623106AN-CDAT (2x100G)
+
+  * Host interface: PCI Express 4.0 x16
+  * Device ID: 15b3:101d
+  * Firmware version: 22.43.1014 and above
+
+* NVIDIA\ |reg| ConnectX\ |reg|-6 Lx EN 25G MCX631102AN-ADAT (2x25G)
+
+  * Host interface: PCI Express 4.0 x8
+  * Device ID: 15b3:101f
+  * Firmware version: 26.43.1014 and above
+
+* NVIDIA\ |reg| ConnectX\ |reg|-7 200G CX713106AE-HEA_QP1_Ax (2x200G)
+
+  * Host interface: PCI Express 5.0 x16
+  * Device ID: 15b3:1021
+  * Firmware version: 28.43.1014 and above
+
+* NVIDIA\ |reg| BlueField\ |reg| SmartNIC
+
+  * NVIDIA\ |reg| BlueField\ |reg|-2 SmartNIC MT41686 - MBF2H332A-AEEOT_A1 (2x25G)
+
+* Host interface: PCI Express 3.0 x16
+* Device ID: 15b3:a2d6
+* Firmware version: 24.43.1014 and above
+
+  * NVIDIA\ |reg| BlueField\ |reg|-3 P-Series DPU MT41692 - 900-9D3B6-00CV-AAB (2x200G)
+
+* Host interface: PCI Express 5.0 x16
+* Device ID: 15b3:a2dc
+* Firmware version: 32.43.1014 and above
+
+  * Embedded software:
+
+* Ubuntu 22.04
+* MLNX_OFED 24.10-0.6.7.0 and above
+* bf-bundle-2.9.0-90_24.10_ubuntu-22.04
+* DPDK application running on ARM cores
+
+* IBM Power 9 platforms with NVIDIA\ |reg| NICs combinations
+
+  * CPU:
+
+* POWER9 2.2 (pvr 004e 1202)
+
+  * OS:
+
+* Ubuntu 20.04
+
+  * NICs:
+
+* NVIDIA\ |reg| ConnectX\ |reg|-6 Dx 100G MCX623106AN-CDAT (2x100G)
+
+  * Host interface: PCI Express 4.0 x16
+  * Device ID: 15b3:101d
+  * Firmware version: 22.43.1014 and above
+
+* NVIDIA\ |reg| ConnectX\ |reg|-7 200G CX713106AE-HEA_QP1_Ax (2x200G)
+
+  * Host interface: PCI Express 5.0 x16
+  * Device ID: 15b3:1021
+  * Firmware version: 28.43.1014 and above
+
+  * OFED:
+
+* MLNX_OFED 24.10-0.7.0.0
-- 
2.39.5 (Apple Git-154)



Re: [PATCH v1 1/1] usertools/devbind: update coding style

2024-12-03 Thread Stephen Hemminger
On Mon,  2 Dec 2024 15:09:34 +
Anatoly Burakov  wrote:

> +# For kernels < 3.15 when binding devices to a generic driver (i.e. one 
> that doesn't have a PCI
> +# ID table) using new_id, some devices that are not bound to any other 
> driver could be bound
> +# even if no one has asked them to. hence, we check the list of drivers 
> again, and see if some
> +# of the previously-unbound devices were erroneously bound.
> +if not devbind.use_driver_override:

Why is the tool still supporting an out-of-date and no-longer-supported kernel?



> +choices=[
> +"baseband",
> +"compress",
> +"crypto",
> +"dma",
> +"event",
> +"mempool",
> +"misc",
> +"net",
> +"regex",
> +"ml",
> +"all",
> +],

Would prefer that all the types are in a table/list and the help just
references that list. The next time a type is added, only one place
needs to change.
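Something like the following sketch would do it (the names are illustrative, not the actual devbind symbols): keep one canonical list and derive both the argparse choices and the help text from it.

```python
import argparse

# Hypothetical single source of truth for recognized device types;
# "all" is a pseudo-type handled separately from the real categories.
DEVICE_TYPES = [
    "baseband", "compress", "crypto", "dma", "event",
    "mempool", "misc", "net", "regex", "ml",
]


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--status",
        choices=DEVICE_TYPES + ["all"],
        # help text is derived from the same list, so adding a new
        # type only requires touching DEVICE_TYPES
        help="Show device status for: " + ", ".join(DEVICE_TYPES + ["all"]),
    )
    return parser
```

Adding a new type then means appending one string to DEVICE_TYPES; both the validation and the help text pick it up automatically.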

Also, I would not trust the output format of ip route not to change.
If the utility has to parse output of ip command, use json (-j) instead.

This whole section of code is quite fragile:

> if devices_type == network_devices:
> # check what is the interface if any for an ssh connection if
> # any to this host, so we can mark it later.
> ssh_if = []
> route = subprocess.check_output(["ip", "-o", "route"])
> # filter out all lines for 169.254 routes
> route = "\n".join(filter(lambda ln: not ln.startswith("169.254"),
>  route.decode().splitlines()))
> rt_info = route.split()
> for i in range(len(rt_info) - 1):
> if rt_info[i] == "dev":
> ssh_if.append(rt_info[i + 1])
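With `ip -j route` the same information comes out as JSON, so the parsing can operate on structured data instead of positional tokens. A sketch of what that could look like (the function name is illustrative; a live caller would feed it `subprocess.check_output(["ip", "-j", "route"])`):

```python
import json


def ssh_interfaces(route_json: str) -> list:
    """Return interface names used by non-link-local routes.

    route_json is the output of `ip -j route`: a JSON array of route
    objects, each of which may carry "dst" and "dev" keys.
    """
    ifaces = []
    for route in json.loads(route_json):
        # skip 169.254.0.0/16 link-local routes, as the old code did
        if str(route.get("dst", "")).startswith("169.254"):
            continue
        dev = route.get("dev")
        if dev and dev not in ifaces:
            ifaces.append(dev)
    return ifaces
```

This avoids both the line-format assumption and the blind token scan for "dev".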


RE: [PATCH][v2, 3/3] usertools/dpdk-devbind: add bind/unbind for platform device

2024-12-03 Thread Wencheng Li
Hi,
May I ask if there are any further changes needed for this patch? If not, can
it be applied upstream?



Re: [PATCH v1 1/1] usertools/devbind: update coding style

2024-12-03 Thread Burakov, Anatoly

On 12/2/2024 6:01 PM, Stephen Hemminger wrote:

On Mon,  2 Dec 2024 15:09:34 +
Anatoly Burakov  wrote:


+# For kernels < 3.15 when binding devices to a generic driver (i.e. one 
that doesn't have a PCI
+# ID table) using new_id, some devices that are not bound to any other 
driver could be bound
+# even if no one has asked them to. hence, we check the list of drivers 
again, and see if some
+# of the previously-unbound devices were erroneously bound.
+if not devbind.use_driver_override:


Why is the tool still supporting an out-of-date and no-longer-supported kernel?


The aim was 100% compatibility with the old script, but I agree these 
parts can be taken out as this kernel is no longer supported. This will 
definitely make the binding code simpler.







+choices=[
+"baseband",
+"compress",
+"crypto",
+"dma",
+"event",
+"mempool",
+"misc",
+"net",
+"regex",
+"ml",
+"all",
+],


Would prefer that all the types are in a table/list and the help just
references that list. The next time a type is added, only one place
needs to change.


It's a bit difficult to have *everything* as one list, as there are 
multiple places where we use this:


1) initial declarations at the top of the file (which I treat as "ground 
truth" for what sort of devices devbind aims to recognize)

2) categorization rules (which are inside Devbind class)
3) command line arguments
4) printouts

I suppose I can merge 3 and 4, but I don't see a neat way to specify 1) 
and 2) in a way that we can reuse elsewhere. I'll think on this though, 
thanks for the suggestion.




Also, I would not trust the output format of ip route not to change.
If the utility has to parse output of ip command, use json (-j) instead.

This whole section of code is quite fragile:


 if devices_type == network_devices:
 # check what is the interface if any for an ssh connection if
 # any to this host, so we can mark it later.
 ssh_if = []
 route = subprocess.check_output(["ip", "-o", "route"])
 # filter out all lines for 169.254 routes
 route = "\n".join(filter(lambda ln: not ln.startswith("169.254"),
  route.decode().splitlines()))
 rt_info = route.split()
 for i in range(len(rt_info) - 1):
 if rt_info[i] == "dev":
 ssh_if.append(rt_info[i + 1])


The quoted code is from old devbind code, but I agree that relying on -o 
output is not ideal, and using -j will be better. I'll fix it in v2.


Thanks for your feedback!


--
Thanks,
Anatoly


RE: [External] Re: [PATCH] eal: fix bus cleanup in secondary process

2024-12-03 Thread Ming 1. Yang (NSB)
Hi Stephen,

Yes, you're right. I'm preparing a new patch that improves the crypto device to
solve this issue, and the modification has already worked in our cases.
I will upload the patch soon and mark this patch as Superseded. Thanks.

Brs,
Yang Ming

-Original Message-
From: Stephen Hemminger  
Sent: 2024年11月29日 1:16
To: Ming 1. Yang (NSB) 
Cc: Anatoly Burakov ; Bruce Richardson 
; Kevin Laatz ; Morten 
Brørup ; dev@dpdk.org; sta...@dpdk.org
Subject: [External] Re: [PATCH] eal: fix bus cleanup in secondary process


On Thu, 28 Nov 2024 13:48:29 +0800
myang  wrote:

> eal_bus_cleanup has been added in rte_eal_cleanup. But for secondary 
> process, eal_bus_cleanup will trigger vdev_cleanup which trigger 
> rte_vdev_driver to remove. Then our crypto devices will execute 
> ipsec_mb_remove to rte_cryptodev_pmd_destroy.
> 
> Finally error logs occur as below:
> CRYPTODEV: rte_cryptodev_close() line 1453: Device 0 must be stopped 
> before closing
> EAL: failed to send to (/tmp/dpdk/l2hicu/mp_socket) due to Bad file 
> descriptor
> EAL: Fail to send request /tmp/dpdk/l2hicu/mp_socket:ipsec_mb_mp_msg
> USER1: Create MR request to primary process failed.
> 
> Function call trace: rte_eal_cleanup->eal_bus_cleanup->
> vdev_cleanup->rte_vdev_driver->ipsec_mb_remove->
> 1. ipsec_mb_remove->rte_cryptodev_pmd_destroy->
> rte_cryptodev_pmd_release_device->rte_cryptodev_close
> 2. ipsec_mb_remove->ipsec_mb_qp_release->ipsec_mb_secondary_qp_op
> ->rte_mp_request_async->mp_request_async
> 
> Fixes: 1cab1a40ea9b ("bus: cleanup devices on shutdown")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: myang 

There was a reason for calling cleanup on shutdown.
It looks like more of a bug in the crypto device than in the EAL.


FreeBSD and Windows also call eal_bus_cleanup.





[PATCH v2 2/2] usertools/devbind: replace devbind

2024-12-03 Thread Anatoly Burakov
Signed-off-by: Anatoly Burakov 
---

Notes:
v2:
- Added this patch to aid in review
- I believe it's better to squash it on apply

 usertools/dpdk-devbind-new.py |  996 ---
 usertools/dpdk-devbind.py | 1678 ++---
 2 files changed, 911 insertions(+), 1763 deletions(-)
 delete mode 100755 usertools/dpdk-devbind-new.py

diff --git a/usertools/dpdk-devbind-new.py b/usertools/dpdk-devbind-new.py
deleted file mode 100755
index 9f2ee27cf3..00
--- a/usertools/dpdk-devbind-new.py
+++ /dev/null
@@ -1,996 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright(c) 2010-2024 Intel Corporation
-#
-"""Script to bind PCI devices to DPDK-compatible userspace IO drivers."""
-
-import argparse
-import glob
-import grp
-import json
-import os
-import pwd
-import subprocess
-import sys
-import typing as T
-
-# the following list of modules is supported by DPDK
-DPDK_KERNEL_MODULES = {"igb_uio", "vfio-pci", "uio_pci_generic"}
-
-# pattern matching criteria for various devices and devices classes. keys are entries in lspci,
-# while values, if present are further matches for lspci criteria. values can be either strings or
-# list of strings, in which case any match is sufficient.
-StrOrList = T.Union[str, T.List[str]]
-DeviceMatchPattern = T.Dict[str, StrOrList]
-CLASS_NETWORK: DeviceMatchPattern = {
-"Class": "02",
-}
-CLASS_ACCELERATION: DeviceMatchPattern = {
-"Class": "12",
-}
-CLASS_IFPGA: DeviceMatchPattern = {
-"Class": "12",
-"Vendor": "8086",
-"Device": "0b30",
-}
-CLASS_ENCRYPTION: DeviceMatchPattern = {
-"Class": "10",
-}
-CLASS_INTEL_PROCESSOR: DeviceMatchPattern = {
-"Class": "0b",
-"Vendor": "8086",
-}
-DEVICE_CAVIUM_SSO: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "177d",
-"Device": ["a04b", "a04d"],
-}
-DEVICE_CAVIUM_FPA: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "177d",
-"Device": "a053",
-}
-DEVICE_CAVIUM_PKX: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "177d",
-"Device": ["a0dd", "a049"],
-}
-DEVICE_CAVIUM_TIM: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "177d",
-"Device": "a051",
-}
-DEVICE_CAVIUM_ZIP: DeviceMatchPattern = {
-"Class": "12",
-"Vendor": "177d",
-"Device": "a037",
-}
-DEVICE_AVP_VNIC: DeviceMatchPattern = {
-"Class": "05",
-"Vendor": "1af4",
-"Device": "1110",
-}
-DEVICE_CNXK_BPHY: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "177d",
-"Device": "a089",
-}
-DEVICE_CNXK_BPHY_CGX: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "177d",
-"Device": ["a059", "a060"],
-}
-DEVICE_CNXK_DMA: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "177d",
-"Device": "a081",
-}
-DEVICE_CNXK_INL_DEV: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "177d",
-"Device": ["a0f0", "a0f1"],
-}
-DEVICE_HISILICON_DMA: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "19e5",
-"Device": "a122",
-}
-DEVICE_ODM_DMA: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "177d",
-"Device": "a08c",
-}
-DEVICE_INTEL_DLB: DeviceMatchPattern = {
-"Class": "0b",
-"Vendor": "8086",
-"Device": ["270b", "2710", "2714"],
-}
-DEVICE_INTEL_IOAT_BDW: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "8086",
-"Device": [
-"6f20",
-"6f21",
-"6f22",
-"6f23",
-"6f24",
-"6f25",
-"6f26",
-"6f27",
-"6f2e",
-"6f2f",
-],
-}
-DEVICE_INTEL_IOAT_SKX: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "8086",
-"Device": "2021",
-}
-DEVICE_INTEL_IOAT_ICX: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "8086",
-"Device": "0b00",
-}
-DEVICE_INTEL_IDXD_SPR: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "8086",
-"Device": "0b25",
-}
-DEVICE_INTEL_NTB_SKX: DeviceMatchPattern = {
-"Class": "06",
-"Vendor": "8086",
-"Device": "201c",
-}
-DEVICE_INTEL_NTB_ICX: DeviceMatchPattern = {
-"Class": "06",
-"Vendor": "8086",
-"Device": "347e",
-}
-DEVICE_CNXK_SSO: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "177d",
-"Device": ["a0f9", "a0fa"],
-}
-DEVICE_CNXK_NPA: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "177d",
-"Device": ["a0fb", "a0fc"],
-}
-DEVICE_CN9K_REE: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "177d",
-"Device": "a0f4",
-}
-DEVICE_VIRTIO_BLK: DeviceMatchPattern = {
-"Class": "01",
-"Vendor": "1af4",
-"Device": ["1001", "1042"],
-}
-DEVICE_CNXK_ML: DeviceMatchPattern = {
-"Class": "08",
-"Vendor": "177d",
-"Device": "a092",
-}
-
-# device types as recognized by devbind
-NETWORK_DEVICES = [CLASS_NETWORK, CLASS_IFPGA, DEVICE_CAVIUM_PKX, DEVICE_AVP_VNIC]
-BASEDBAND_DEVICES = [CLASS_ACCELERATION]
-CRYPTO_DEVICES = [CLASS_ENCRYPTION, CLASS_INTEL_PROCESSOR]
-DMA_DEVICES = [
-DEVICE_CNXK_DMA,
-DEVICE_HISIL

[PATCH v2 0/2] Rewrite devbind

2024-12-03 Thread Anatoly Burakov
It has been suggested [1] that a major cleanup/rewrite of devbind would be
beneficial in terms of long-term maintainability of the code. I was in a
coding mood over the weekend, so I went ahead and rewrote devbind.

Note that this is one giant patch, rather than a series of patches adjusting
existing code. Making it a patch series is possible, however the internal
code architecture diverges quite significantly from the original devbind
script due to its copious usage of string operations/pattern matching and
global variables, so it is unclear whether subdividing this patch would be
worth the effort. Instead, as has been suggested [2], the patchset now
consists of creating a new file, followed by removal of the old file and a
rename of the new file. It is expected that this will be squashed on apply.

The script has become slightly bigger: roughly 1000 lines instead of 800.
However, most of that increase comes from infrastructure, comments, and
trading code golf for readability (such as expanding one-liners into multiple
lines), so I would argue the added line count is worth being able to read and
reason about what happens in the script.

[1] 
https://patches.dpdk.org/project/dpdk/patch/c2bf00195c2d43833a831a9cc9346b4606d6ea2e.1723810613.git.anatoly.bura...@intel.com/
[2] 
https://patches.dpdk.org/project/dpdk/cover/cover.1733151400.git.anatoly.bura...@intel.com/

Anatoly Burakov (2):
  usertools/devbind: update coding style
  usertools/devbind: replace devbind

 usertools/dpdk-devbind.py | 1678 -
 1 file changed, 911 insertions(+), 767 deletions(-)

-- 
2.43.5



[PATCH v2 1/2] usertools/devbind: update coding style

2024-12-03 Thread Anatoly Burakov
Devbind is one of the oldest tools in DPDK. It is written in a way that
relies on a lot of string matching, has no type safety, uses lots of global
variables, and has a few inconsistencies in the way it handles data (such as
differences between lspci calls and parsing in different circumstances).

This patch is a nigh complete rewrite of devbind, with full 100% feature
and command-line compatibility with the old version (except for dropping
older kernel support), albeit with a few differences in formatting and
error messages. All file handling code has also been replaced with
context managers.

What's different from old code:
- Full PEP-484 compliance
- Formatted with Ruff
- Much better structured code
- Clean and consistent control flow
- More comments
- Better error handling
- Fewer lspci calls
- Unified lspci parsing
- Using /sys/bus/pci/drivers as a source of truth about kernel modules
- Check for iproute2 package
- Use JSON parsing for iproute2 output
- Deprecate --status-dev in favor of optional --status argument
- Deprecate kernel <3.15 support and only use driver_override
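Once only driver_override is used, the bind sequence reduces to a handful of sysfs writes (paths per the Linux PCI sysfs ABI). As a sketch, and not the actual devbind code, here is a pure function that computes the (path, value) writes rather than performing them, which keeps the sequence easy to reason about and test:

```python
import typing as T


def override_bind_writes(
    pci_addr: str, driver: str, currently_bound: bool
) -> T.List[T.Tuple[str, str]]:
    """Compute the sysfs writes needed to bind a device via driver_override."""
    dev = f"/sys/bus/pci/devices/{pci_addr}"
    writes = [
        # pin which driver is allowed to claim this device (kernel >= 3.15)
        (f"{dev}/driver_override", driver),
    ]
    if currently_bound:
        # release the device from its current driver first
        writes.append((f"{dev}/driver/unbind", pci_addr))
    # ask the kernel to re-run driver matching for this device
    writes.append(("/sys/bus/pci/drivers_probe", pci_addr))
    # clear the override so future hotplug events behave normally
    writes.append((f"{dev}/driver_override", "\n"))
    return writes
```

The pre-3.15 new_id path, by contrast, required re-scanning the driver list afterwards to detect devices that were bound as a side effect, which is exactly the complexity being dropped.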

Signed-off-by: Anatoly Burakov 
---

Notes:
v1 -> v2:
- Use dictionary syntax to get raw string values from devices
- Fixed rollback not working correctly due to stale device state
- Fixed attempts to bind to empty driver on rollback
- Simplified bind/rollback and removed recursion
- Unified command-line and device type handling
- Dropped support for kernels <3.15
- Use JSON parsing for ip route output
- Used a new filename to aid in review, rename in next patch


 usertools/dpdk-devbind-new.py | 996 ++
 1 file changed, 996 insertions(+)
 create mode 100755 usertools/dpdk-devbind-new.py

diff --git a/usertools/dpdk-devbind-new.py b/usertools/dpdk-devbind-new.py
new file mode 100755
index 00..9f2ee27cf3
--- /dev/null
+++ b/usertools/dpdk-devbind-new.py
@@ -0,0 +1,996 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2010-2024 Intel Corporation
+#
+"""Script to bind PCI devices to DPDK-compatible userspace IO drivers."""
+
+import argparse
+import glob
+import grp
+import json
+import os
+import pwd
+import subprocess
+import sys
+import typing as T
+
+# the following list of modules is supported by DPDK
+DPDK_KERNEL_MODULES = {"igb_uio", "vfio-pci", "uio_pci_generic"}
+
+# pattern matching criteria for various devices and devices classes. keys are entries in lspci,
+# while values, if present are further matches for lspci criteria. values can be either strings or
+# list of strings, in which case any match is sufficient.
+StrOrList = T.Union[str, T.List[str]]
+DeviceMatchPattern = T.Dict[str, StrOrList]
+CLASS_NETWORK: DeviceMatchPattern = {
+"Class": "02",
+}
+CLASS_ACCELERATION: DeviceMatchPattern = {
+"Class": "12",
+}
+CLASS_IFPGA: DeviceMatchPattern = {
+"Class": "12",
+"Vendor": "8086",
+"Device": "0b30",
+}
+CLASS_ENCRYPTION: DeviceMatchPattern = {
+"Class": "10",
+}
+CLASS_INTEL_PROCESSOR: DeviceMatchPattern = {
+"Class": "0b",
+"Vendor": "8086",
+}
+DEVICE_CAVIUM_SSO: DeviceMatchPattern = {
+"Class": "08",
+"Vendor": "177d",
+"Device": ["a04b", "a04d"],
+}
+DEVICE_CAVIUM_FPA: DeviceMatchPattern = {
+"Class": "08",
+"Vendor": "177d",
+"Device": "a053",
+}
+DEVICE_CAVIUM_PKX: DeviceMatchPattern = {
+"Class": "08",
+"Vendor": "177d",
+"Device": ["a0dd", "a049"],
+}
+DEVICE_CAVIUM_TIM: DeviceMatchPattern = {
+"Class": "08",
+"Vendor": "177d",
+"Device": "a051",
+}
+DEVICE_CAVIUM_ZIP: DeviceMatchPattern = {
+"Class": "12",
+"Vendor": "177d",
+"Device": "a037",
+}
+DEVICE_AVP_VNIC: DeviceMatchPattern = {
+"Class": "05",
+"Vendor": "1af4",
+"Device": "1110",
+}
+DEVICE_CNXK_BPHY: DeviceMatchPattern = {
+"Class": "08",
+"Vendor": "177d",
+"Device": "a089",
+}
+DEVICE_CNXK_BPHY_CGX: DeviceMatchPattern = {
+"Class": "08",
+"Vendor": "

RE: rte_fib network order bug

2024-12-03 Thread Medvedkin, Vladimir
Hi Robin,

I like the second approach, with one more suggestion. It would be nice to have
two different flags: an existing flag (RTE_FIB_F_LOOKUP_NETWORK_ORDER) for the
data plane bswap, and a new one for the control plane operations (something
like RTE_FIB/RIB_F_CP_NETWORK_ORDER).
Also, for user convenience, RTE_FIB_F_NETWORK_ORDER may be introduced as
(RTE_FIB_F_LOOKUP_NETWORK_ORDER | RTE_FIB_F_CP_NETWORK_ORDER).

> There would need to be an additional RTE_IPV4_BE() macro to declare IPv4
> addresses in network order.
This may be useful as well.

Thanks!

-Original Message-
From: Robin Jarry  
Sent: Friday, November 22, 2024 4:15 PM
To: Vladimir Medvedkin ; Stephen Hemminger 

Cc: Morten Brørup ; Medvedkin, Vladimir 
; dev@dpdk.org; Richardson, Bruce 
; Marchand, David ; 
Thomas Monjalon ; Konstantin Ananyev 

Subject: Re: rte_fib network order bug

Vladimir Medvedkin, Nov 17, 2024 at 16:04:
> [Robin]
>> I had not understood that it was *only* the lookups that were network 
>> order
>
> [Morten]
>> When I saw the byte order flag the first time, it was not clear to me 
>> either that it only affected lookups - I too thought it covered the 
>> entire API of the library. This needs to be emphasized in the 
>> description of the flag. And the flag's name should contain LOOKUP
>
> [Morten]
>> And/or rename RTE_FIB_F_NETWORK_ORDER to
>> RTE_FIB_F_NETWORK_ORDER_LOOKUP or similar.
>
> There is a clear comment for this flag that it has effects on lookup. 
> Repeating the statement with an exclamation mark seems too much. 
> Moreover, at first this flag was named "RTE_FIB_FLAG_LOOKUP_BE" and it 
> was suggested for renaming here:
> https://inbox.dpdk.org/dev/d4swpkoprd5z.87yiet3y...@redhat.com/

This is my bad then. I had misunderstood what this flag was for. 
I should have been more careful. You had clearly stated that it was only 
affecting the lookup.

> So, feel free to submit patches adding this feature to the control 
> plane API, but let's consider:

I can commit to working on that topic if we can get a consensus. In my opinion 
there are two different approaches:

1) Change all IPv4 routing *APIs* to only use network order addresses
=====================================================================

This would make them consistent with all networking stacks (linux, vpp, bsd, 
etc.) and would avoid confusion from users (like me) who naively used these 
libraries with addresses generated with inet_pton() or addresses taken verbatim 
from IPv4 packet headers.

More importantly, it would make them consistent on big-endian and little-endian
architectures. Currently, the same code could work (without any byte swap) on
aarch64, but would not work on x86_64.

It would also make them consistent with their IPv6 counterparts which do not 
require any byteswap.

This would be a drastic and breaking change but I think this would be the 
better solution in the long run.

To ensure that potential users of these libraries will not miss this change,
the uint32_t parameters should be changed to an rte_ipv4_addr structure that
follows the same idea as rte_ipv6_addr.

We could also simply use rte_be32_t types everywhere, but that would expose
potential users of these APIs to bugs that could not be caught at compile time.

Internally, all these routing libraries would continue using host order 
integers, the changes I am suggesting only affect the public API.

2) Implement network order via opt-in flags
===========================================

This would allow the same thing as solution 1) but would keep the default 
behaviour which I find confusing and inconsistent with IPv6 and with all IPv4 
networking stacks that I know.

The other concern I have with the second solution is that the public APIs
would continue using uint32_t parameters, which would only be correct when the
network-order mode is not enabled.

On the other hand, it does not break any API for users that do not use the 
flags.

There would need to be an additional RTE_IPV4_BE() macro to declare IPv4 
addresses in network order.
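As a sketch of what such a macro could look like next to the existing host-order RTE_IPV4() constructor: the _BE name and its definition below are the proposal, not current DPDK API, so both are written here as local macros to show that the relationship is simply a byte swap.

```c
#include <stdint.h>

/* Host-order constructor, mirroring rte_ip.h's RTE_IPV4(). */
#define IPV4_HOST(a, b, c, d)                        \
	((uint32_t)((((uint32_t)(a) & 0xff) << 24) | \
	            (((uint32_t)(b) & 0xff) << 16) | \
	            (((uint32_t)(c) & 0xff) << 8)  | \
	            ((uint32_t)(d) & 0xff)))

/* Proposed network-order counterpart: the value is the byte swap of the
 * host-order constant, so on a little-endian host the bytes a.b.c.d
 * appear in address order, matching an address copied from a packet
 * header. (A production macro would compile to the identity on
 * big-endian hosts.) */
#define IPV4_BE(a, b, c, d)                          \
	((uint32_t)((((uint32_t)(d) & 0xff) << 24) | \
	            (((uint32_t)(c) & 0xff) << 16) | \
	            (((uint32_t)(b) & 0xff) << 8)  | \
	            ((uint32_t)(a) & 0xff)))
```

With this, code that enables the network-order flags can declare constants without an explicit bswap at each call site.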

Any thoughts?



Re: [PATCH v1 0/1] Rewrite devbind

2024-12-03 Thread Burakov, Anatoly

On 12/2/2024 5:14 PM, Bruce Richardson wrote:

On Mon, Dec 02, 2024 at 03:09:33PM +, Anatoly Burakov wrote:

It has been suggested [1] that a major cleanup/rewrite of devbind would be
beneficial in terms of long term maintainability of the code. I was in a
coding mood over the weekend, so I went ahead and rewrote devbind.

Note that this is one giant patch, rather than a series of patches adjusting
existing code. Making it a patch series is possible, however the internal
code architecture diverges quite significantly from the original devbind
script due to its copious usage of string operations/pattern matching and
global variables, so it is unclear whether subdividing this patch would be
worth the effort.


One suggestion here which might help reviewing. Since it is essentially a
rewrite, is it worth making this a two-patch set, where:

Patch 1: introduces a new script called e.g. dpdk-devbind-new.py, which
  contains just the rewrite without any of old code. This then can be
  reviewed in isolation
Patch 2: moves dpdk-devbind-new.py to overwrite dpdk-devbind.py

WDYT?

Regards,
/Bruce


I think it's a good idea, provided it gets squashed on apply.

--
Thanks,
Anatoly


[PATCH] net/mlx5: fix hypervisor detection in VLAN workaround

2024-12-03 Thread Viacheslav Ovsiienko
The mlx5 PMD provides a specific workaround for the VMware ESXi
hypervisor, enabling on-demand routing configuration to virtual
machines. This workaround activates when the device type is
a Virtual Function and either an ESXi hypervisor is detected
or the hypervisor type is unknown.

For non-x86 architectures the function rte_hypervisor_get()
consistently returns an unknown type, which triggers the workaround
automatically without any actual need. If there are VLAN support
requirements, this can lead to failures in inserting default control
flows.

Do not trigger the workaround for unknown hypervisor type
in non-x86 environments.

Fixes: dfedf3e3f9d2 ("net/mlx5: add workaround for VLAN in virtual machine")
Cc: sta...@dpdk.org

Signed-off-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/linux/mlx5_vlan_os.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/mlx5/linux/mlx5_vlan_os.c 
b/drivers/net/mlx5/linux/mlx5_vlan_os.c
index 81611a8d3f..017953d5cc 100644
--- a/drivers/net/mlx5/linux/mlx5_vlan_os.c
+++ b/drivers/net/mlx5/linux/mlx5_vlan_os.c
@@ -112,7 +112,9 @@ mlx5_vlan_vmwa_init(struct rte_eth_dev *dev, uint32_t 
ifindex)
/* Check whether there is desired virtual environment */
hv_type = rte_hypervisor_get();
switch (hv_type) {
+#if defined(RTE_ARCH_X86) || defined(RTE_ARCH_X86_64)
case RTE_HYPERVISOR_UNKNOWN:
+#endif
case RTE_HYPERVISOR_VMWARE:
/*
 * The "white list" of configurations
-- 
2.34.1



[PATCH v2 02/22] net/_common_intel: provide common Tx entry structures

2024-12-03 Thread Bruce Richardson
The Tx entry structures, both vector and scalar, are common across Intel
drivers, so provide a single definition to be used everywhere.

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/tx.h| 27 +++
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 +-
 drivers/net/i40e/i40e_rxtx.c  | 18 ++---
 drivers/net/i40e/i40e_rxtx.h  | 14 +++---
 drivers/net/i40e/i40e_rxtx_vec_altivec.c  |  2 +-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c |  2 +-
 drivers/net/i40e/i40e_rxtx_vec_avx512.c   |  6 ++---
 drivers/net/i40e/i40e_rxtx_vec_common.h   |  4 +--
 drivers/net/i40e/i40e_rxtx_vec_neon.c |  2 +-
 drivers/net/i40e/i40e_rxtx_vec_sse.c  |  2 +-
 drivers/net/iavf/iavf_rxtx.c  | 12 -
 drivers/net/iavf/iavf_rxtx.h  | 14 +++---
 drivers/net/iavf/iavf_rxtx_vec_avx2.c |  2 +-
 drivers/net/iavf/iavf_rxtx_vec_avx512.c   | 10 +++
 drivers/net/iavf/iavf_rxtx_vec_common.h   |  4 +--
 drivers/net/iavf/iavf_rxtx_vec_sse.c  |  2 +-
 drivers/net/ice/ice_dcf_ethdev.c  |  2 +-
 drivers/net/ice/ice_rxtx.c| 16 +--
 drivers/net/ice/ice_rxtx.h| 13 ++---
 drivers/net/ice/ice_rxtx_vec_avx2.c   |  2 +-
 drivers/net/ice/ice_rxtx_vec_avx512.c |  6 ++---
 drivers/net/ice/ice_rxtx_vec_common.h |  6 ++---
 drivers/net/ice/ice_rxtx_vec_sse.c|  2 +-
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c|  2 +-
 drivers/net/ixgbe/ixgbe_rxtx.c| 16 +--
 drivers/net/ixgbe/ixgbe_rxtx.h| 22 +++
 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h |  8 +++---
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c   |  2 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c|  2 +-
 29 files changed, 105 insertions(+), 117 deletions(-)
 create mode 100644 drivers/net/_common_intel/tx.h

diff --git a/drivers/net/_common_intel/tx.h b/drivers/net/_common_intel/tx.h
new file mode 100644
index 00..384352b9db
--- /dev/null
+++ b/drivers/net/_common_intel/tx.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Intel Corporation
+ */
+
+#ifndef _COMMON_INTEL_TX_H_
+#define _COMMON_INTEL_TX_H_
+
+#include 
+#include 
+
+/**
+ * Structure associated with each descriptor of the TX ring of a TX queue.
+ */
+struct ci_tx_entry {
+   struct rte_mbuf *mbuf; /* mbuf associated with TX desc, if any. */
+   uint16_t next_id; /* Index of next descriptor in ring. */
+   uint16_t last_id; /* Index of last scattered descriptor. */
+};
+
+/**
+ * Structure associated with each descriptor of the TX ring of a TX queue in 
vector Tx.
+ */
+struct ci_tx_entry_vec {
+   struct rte_mbuf *mbuf; /* mbuf associated with TX desc, if any. */
+};
+
+#endif /* _COMMON_INTEL_TX_H_ */
diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c 
b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c
index 14424c9921..260d238ce4 100644
--- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c
+++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c
@@ -56,7 +56,7 @@ i40e_recycle_tx_mbufs_reuse_vec(void *tx_queue,
struct rte_eth_recycle_rxq_info *recycle_rxq_info)
 {
struct i40e_tx_queue *txq = tx_queue;
-   struct i40e_tx_entry *txep;
+   struct ci_tx_entry *txep;
struct rte_mbuf **rxep;
int i, n;
uint16_t nb_recycle_mbufs;
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 839c8a5442..2e1f07d2a1 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -378,7 +378,7 @@ i40e_build_ctob(uint32_t td_cmd,
 static inline int
 i40e_xmit_cleanup(struct i40e_tx_queue *txq)
 {
-   struct i40e_tx_entry *sw_ring = txq->sw_ring;
+   struct ci_tx_entry *sw_ring = txq->sw_ring;
volatile struct i40e_tx_desc *txd = txq->tx_ring;
uint16_t last_desc_cleaned = txq->last_desc_cleaned;
uint16_t nb_tx_desc = txq->nb_tx_desc;
@@ -1081,8 +1081,8 @@ uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
struct i40e_tx_queue *txq;
-   struct i40e_tx_entry *sw_ring;
-   struct i40e_tx_entry *txe, *txn;
+   struct ci_tx_entry *sw_ring;
+   struct ci_tx_entry *txe, *txn;
volatile struct i40e_tx_desc *txd;
volatile struct i40e_tx_desc *txr;
struct rte_mbuf *tx_pkt;
@@ -1331,7 +1331,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts)
 static __rte_always_inline int
 i40e_tx_free_bufs(struct i40e_tx_queue *txq)
 {
-   struct i40e_tx_entry *txep;
+   struct ci_tx_entry *txep;
uint16_t tx_rs_thresh = txq->tx_rs_thresh;
uint16_t i = 0, j = 0;
struct rte_mbuf *free[RTE_I40E_TX_MAX_FREE_BUF_SZ];
@@ -1418,7 +1418,7 @@ i40e_tx_fill_hw_ring(struct i40e_tx_queue *txq,
 uint16_t nb_pkts)

[PATCH v2 01/22] net/_common_intel: add pkt reassembly fn for intel drivers

2024-12-03 Thread Bruce Richardson
The code for reassembling a single, multi-mbuf packet from multiple
buffers received from the NIC is duplicated across many drivers. Rather
than having multiple copies of this function, we can create an
"_common_intel" directory to hold such functions and consolidate
multiple functions down to a single one for easier maintenance.
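The contract of the consolidated helper hinges on `split_flags`: a non-zero flag means the buffer is continued by the next one, and a zero flag marks a packet's last segment. As a minimal sketch (not the real helper), the number of complete packets a burst yields can be derived from the flags alone, ignoring any trailing partial packet that the real function carries over to the next burst:

```c
#include <assert.h>
#include <stdint.h>

/* split_flags[i] != 0: buffer i is followed by more segments of the
 * same packet; a zero flag closes the current packet. Count the
 * complete packets in a burst under that convention. */
static uint16_t count_complete_pkts(const uint8_t *split_flags, uint16_t nb_bufs)
{
	uint16_t pkts = 0;

	for (uint16_t i = 0; i < nb_bufs; i++)
		if (!split_flags[i])
			pkts++; /* zero flag closes the current packet */
	return pkts;
}
```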

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/rx.h| 79 +++
 drivers/net/i40e/i40e_rxtx_vec_altivec.c  |  4 +-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c |  4 +-
 drivers/net/i40e/i40e_rxtx_vec_avx512.c   |  4 +-
 drivers/net/i40e/i40e_rxtx_vec_common.h   | 64 +-
 drivers/net/i40e/i40e_rxtx_vec_neon.c |  4 +-
 drivers/net/i40e/i40e_rxtx_vec_sse.c  |  4 +-
 drivers/net/i40e/meson.build  |  2 +-
 drivers/net/iavf/iavf_rxtx_vec_avx2.c |  8 +--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c   |  8 +--
 drivers/net/iavf/iavf_rxtx_vec_common.h   | 65 +--
 drivers/net/iavf/iavf_rxtx_vec_sse.c  |  8 +--
 drivers/net/iavf/meson.build  |  2 +-
 drivers/net/ice/ice_rxtx_vec_avx2.c   |  4 +-
 drivers/net/ice/ice_rxtx_vec_avx512.c |  8 +--
 drivers/net/ice/ice_rxtx_vec_common.h | 66 +--
 drivers/net/ice/ice_rxtx_vec_sse.c|  4 +-
 drivers/net/ice/meson.build   |  2 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 63 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c   |  4 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c|  4 +-
 drivers/net/ixgbe/meson.build |  2 +-
 22 files changed, 121 insertions(+), 292 deletions(-)
 create mode 100644 drivers/net/_common_intel/rx.h

diff --git a/drivers/net/_common_intel/rx.h b/drivers/net/_common_intel/rx.h
new file mode 100644
index 00..5bd2fea7e3
--- /dev/null
+++ b/drivers/net/_common_intel/rx.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Intel Corporation
+ */
+
+#ifndef _COMMON_INTEL_RX_H_
+#define _COMMON_INTEL_RX_H_
+
+#include 
+#include 
+#include 
+
+#define CI_RX_BURST 32
+
+static inline uint16_t
+ci_rx_reassemble_packets(struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t 
*split_flags,
+   struct rte_mbuf **pkt_first_seg, struct rte_mbuf **pkt_last_seg,
+   const uint8_t crc_len)
+{
+   struct rte_mbuf *pkts[CI_RX_BURST] = {0}; /*finished pkts*/
+   struct rte_mbuf *start = *pkt_first_seg;
+   struct rte_mbuf *end = *pkt_last_seg;
+   unsigned int pkt_idx, buf_idx;
+
+   for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
+   if (end) {
+   /* processing a split packet */
+   end->next = rx_bufs[buf_idx];
+   rx_bufs[buf_idx]->data_len += crc_len;
+
+   start->nb_segs++;
+   start->pkt_len += rx_bufs[buf_idx]->data_len;
+   end = end->next;
+
+   if (!split_flags[buf_idx]) {
+   /* it's the last packet of the set */
+   start->hash = end->hash;
+   start->vlan_tci = end->vlan_tci;
+   start->ol_flags = end->ol_flags;
+   /* we need to strip crc for the whole packet */
+   start->pkt_len -= crc_len;
+   if (end->data_len > crc_len) {
+   end->data_len -= crc_len;
+   } else {
+   /* free up last mbuf */
+   struct rte_mbuf *secondlast = start;
+
+   start->nb_segs--;
+   while (secondlast->next != end)
+   secondlast = secondlast->next;
+   secondlast->data_len -= (crc_len - 
end->data_len);
+   secondlast->next = NULL;
+   rte_pktmbuf_free_seg(end);
+   }
+   pkts[pkt_idx++] = start;
+   start = NULL;
+   end = NULL;
+   }
+   } else {
+   /* not processing a split packet */
+   if (!split_flags[buf_idx]) {
+   /* not a split packet, save and skip */
+   pkts[pkt_idx++] = rx_bufs[buf_idx];
+   continue;
+   }
+   start = rx_bufs[buf_idx];
+   end = start;
+   rx_bufs[buf_idx]->data_len += crc_len;
+   rx_bufs[buf_idx]->pkt_len += crc_len;
+   }
+   }
+
+   /* save the partial packet for ne

[PATCH v2 03/22] net/_common_intel: add Tx mbuf ring replenish fn

2024-12-03 Thread Bruce Richardson
Move the short function used to place mbufs on the SW Tx ring to common
code to avoid duplication.

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/tx.h   |  7 +++
 drivers/net/i40e/i40e_rxtx_vec_altivec.c |  4 ++--
 drivers/net/i40e/i40e_rxtx_vec_avx2.c|  4 ++--
 drivers/net/i40e/i40e_rxtx_vec_common.h  | 10 --
 drivers/net/i40e/i40e_rxtx_vec_neon.c|  4 ++--
 drivers/net/i40e/i40e_rxtx_vec_sse.c |  4 ++--
 drivers/net/iavf/iavf_rxtx_vec_avx2.c|  4 ++--
 drivers/net/iavf/iavf_rxtx_vec_common.h  | 10 --
 drivers/net/iavf/iavf_rxtx_vec_sse.c |  4 ++--
 drivers/net/ice/ice_rxtx_vec_avx2.c  |  4 ++--
 drivers/net/ice/ice_rxtx_vec_common.h| 10 --
 drivers/net/ice/ice_rxtx_vec_sse.c   |  4 ++--
 12 files changed, 23 insertions(+), 46 deletions(-)

diff --git a/drivers/net/_common_intel/tx.h b/drivers/net/_common_intel/tx.h
index 384352b9db..5397007411 100644
--- a/drivers/net/_common_intel/tx.h
+++ b/drivers/net/_common_intel/tx.h
@@ -24,4 +24,11 @@ struct ci_tx_entry_vec {
struct rte_mbuf *mbuf; /* mbuf associated with TX desc, if any. */
 };
 
+static __rte_always_inline void
+ci_tx_backlog_entry(struct ci_tx_entry *txep, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts)
+{
+   for (uint16_t i = 0; i < nb_pkts; ++i)
+   txep[i].mbuf = tx_pkts[i];
+}
+
 #endif /* _COMMON_INTEL_TX_H_ */
diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c 
b/drivers/net/i40e/i40e_rxtx_vec_altivec.c
index ca1038eaa6..80f07a3e10 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c
@@ -575,7 +575,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,
 
n = (uint16_t)(txq->nb_tx_desc - tx_id);
if (nb_commit >= n) {
-   tx_backlog_entry(txep, tx_pkts, n);
+   ci_tx_backlog_entry(txep, tx_pkts, n);
 
for (i = 0; i < n - 1; ++i, ++tx_pkts, ++txdp)
vtx1(txdp, *tx_pkts, flags);
@@ -592,7 +592,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,
txep = &txq->sw_ring[tx_id];
}
 
-   tx_backlog_entry(txep, tx_pkts, nb_commit);
+   ci_tx_backlog_entry(txep, tx_pkts, nb_commit);
 
vtx(txdp, tx_pkts, nb_commit, flags);
 
diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c 
b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
index e8441de759..b26bae4757 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
@@ -765,7 +765,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct 
rte_mbuf **tx_pkts,
 
n = (uint16_t)(txq->nb_tx_desc - tx_id);
if (nb_commit >= n) {
-   tx_backlog_entry(txep, tx_pkts, n);
+   ci_tx_backlog_entry(txep, tx_pkts, n);
 
vtx(txdp, tx_pkts, n - 1, flags);
tx_pkts += (n - 1);
@@ -783,7 +783,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct 
rte_mbuf **tx_pkts,
txep = &txq->sw_ring[tx_id];
}
 
-   tx_backlog_entry(txep, tx_pkts, nb_commit);
+   ci_tx_backlog_entry(txep, tx_pkts, nb_commit);
 
vtx(txdp, tx_pkts, nb_commit, flags);
 
diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h 
b/drivers/net/i40e/i40e_rxtx_vec_common.h
index 619fb89110..325e99c1a4 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -84,16 +84,6 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
return txq->tx_rs_thresh;
 }
 
-static __rte_always_inline void
-tx_backlog_entry(struct ci_tx_entry *txep,
-struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
-{
-   int i;
-
-   for (i = 0; i < (int)nb_pkts; ++i)
-   txep[i].mbuf = tx_pkts[i];
-}
-
 static inline void
 _i40e_rx_queue_release_mbufs_vec(struct i40e_rx_queue *rxq)
 {
diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c 
b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index 9b90a32e28..26bc345a0a 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -702,7 +702,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 
n = (uint16_t)(txq->nb_tx_desc - tx_id);
if (nb_commit >= n) {
-   tx_backlog_entry(txep, tx_pkts, n);
+   ci_tx_backlog_entry(txep, tx_pkts, n);
 
for (i = 0; i < n - 1; ++i, ++tx_pkts, ++txdp)
vtx1(txdp, *tx_pkts, flags);
@@ -719,7 +719,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
txep = &txq->sw_ring[tx_id];
}
 
-   tx_backlog_entry(txep, tx_pkts, nb_commit);
+   ci_tx_backlog_entry(txep, tx_pkts, nb_commit);
 
vtx(txdp, tx_pkts, nb_commit, flags);
 
diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c 
b/drivers/net/i40e/i40e_rxtx_vec_sse.c
index e1fa2ed543..ebc32b0d27 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_sse.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c
@

[PATCH v2 06/22] net/_common_intel: merge ice and i40e Tx queue struct

2024-12-03 Thread Bruce Richardson
The queue structures of the i40e and ice drivers are virtually identical,
so merge them into a common struct. This should allow functions to be
merged more easily in future using that common struct.
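The merge technique is anonymous unions over the driver-specific members, so each driver keeps a typed ring pointer while sharing one struct layout. A self-contained sketch of the pattern, with placeholder `foo`/`bar` descriptor types rather than the real i40e/ice ones:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder per-driver descriptor formats (not DPDK types). */
struct foo_desc { uint64_t addr; uint64_t cmd; };
struct bar_desc { uint64_t addr; uint32_t cmd; uint32_t len; };

/* One queue struct for both drivers: the ring pointers overlay each
 * other in an anonymous union, as ci_tx_queue does for
 * i40e_tx_ring/ice_tx_ring. Only one member is ever live per queue. */
struct common_txq {
	union { /* Tx ring virtual address; one member per driver */
		volatile struct foo_desc *foo_ring;
		volatile struct bar_desc *bar_ring;
	};
	uint16_t nb_desc; /* number of Tx descriptors */
	uint16_t tail;    /* current value of the tail register */
};
```

Since the union members share storage, the struct costs no more than either original, and common code can pass `struct common_txq *` around without knowing which driver owns it.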

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/tx.h| 55 +
 drivers/net/i40e/i40e_ethdev.c|  4 +-
 drivers/net/i40e/i40e_ethdev.h|  4 +-
 drivers/net/i40e/i40e_fdir.c  |  4 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 +-
 drivers/net/i40e/i40e_rxtx.c  | 58 +-
 drivers/net/i40e/i40e_rxtx.h  | 50 ++--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c  |  4 +-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c |  4 +-
 drivers/net/i40e/i40e_rxtx_vec_avx512.c   |  6 +-
 drivers/net/i40e/i40e_rxtx_vec_common.h   |  2 +-
 drivers/net/i40e/i40e_rxtx_vec_neon.c |  4 +-
 drivers/net/i40e/i40e_rxtx_vec_sse.c  |  4 +-
 drivers/net/ice/ice_dcf.c |  4 +-
 drivers/net/ice/ice_dcf_ethdev.c  | 10 ++--
 drivers/net/ice/ice_diagnose.c|  2 +-
 drivers/net/ice/ice_ethdev.c  |  2 +-
 drivers/net/ice/ice_ethdev.h  |  4 +-
 drivers/net/ice/ice_rxtx.c| 60 +--
 drivers/net/ice/ice_rxtx.h| 41 +
 drivers/net/ice/ice_rxtx_vec_avx2.c   |  4 +-
 drivers/net/ice/ice_rxtx_vec_avx512.c |  8 +--
 drivers/net/ice/ice_rxtx_vec_common.h |  8 +--
 drivers/net/ice/ice_rxtx_vec_sse.c|  6 +-
 24 files changed, 165 insertions(+), 185 deletions(-)

diff --git a/drivers/net/_common_intel/tx.h b/drivers/net/_common_intel/tx.h
index 5397007411..c965f5ee6c 100644
--- a/drivers/net/_common_intel/tx.h
+++ b/drivers/net/_common_intel/tx.h
@@ -8,6 +8,9 @@
 #include 
 #include 
 
+/* forward declaration of the common intel (ci) queue structure */
+struct ci_tx_queue;
+
 /**
  * Structure associated with each descriptor of the TX ring of a TX queue.
  */
@@ -24,6 +27,58 @@ struct ci_tx_entry_vec {
struct rte_mbuf *mbuf; /* mbuf associated with TX desc, if any. */
 };
 
+typedef void (*ice_tx_release_mbufs_t)(struct ci_tx_queue *txq);
+
+struct ci_tx_queue {
+   union { /* TX ring virtual address */
+   volatile struct ice_tx_desc *ice_tx_ring;
+   volatile struct i40e_tx_desc *i40e_tx_ring;
+   };
+   volatile uint8_t *qtx_tail;   /* register address of tail */
+   struct ci_tx_entry *sw_ring; /* virtual address of SW ring */
+   rte_iova_t tx_ring_dma;/* TX ring DMA address */
+   uint16_t nb_tx_desc;   /* number of TX descriptors */
+   uint16_t tx_tail; /* current value of tail register */
+   uint16_t nb_tx_used; /* number of TX desc used since RS bit set */
+   /* index to last TX descriptor to have been cleaned */
+   uint16_t last_desc_cleaned;
+   /* Total number of TX descriptors ready to be allocated. */
+   uint16_t nb_tx_free;
+   /* Start freeing TX buffers if there are less free descriptors than
+* this value.
+*/
+   uint16_t tx_free_thresh;
+   /* Number of TX descriptors to use before RS bit is set. */
+   uint16_t tx_rs_thresh;
+   uint8_t pthresh;   /**< Prefetch threshold register. */
+   uint8_t hthresh;   /**< Host threshold register. */
+   uint8_t wthresh;   /**< Write-back threshold reg. */
+   uint16_t port_id;  /* Device port identifier. */
+   uint16_t queue_id; /* TX queue index. */
+   uint16_t reg_idx;
+   uint64_t offloads;
+   uint16_t tx_next_dd;
+   uint16_t tx_next_rs;
+   uint64_t mbuf_errors;
+   bool tx_deferred_start; /* don't start this queue in dev start */
+   bool q_set; /* indicate if tx queue has been configured */
+   union {  /* the VSI this queue belongs to */
+   struct ice_vsi *ice_vsi;
+   struct i40e_vsi *i40e_vsi;
+   };
+   const struct rte_memzone *mz;
+
+   union {
+   struct { /* ICE driver specific values */
+   ice_tx_release_mbufs_t tx_rel_mbufs;
+   uint32_t q_teid; /* TX schedule node id. */
+   };
+   struct { /* I40E driver specific values */
+   uint8_t dcb_tc;
+   };
+   };
+};
+
 static __rte_always_inline void
 ci_tx_backlog_entry(struct ci_tx_entry *txep, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts)
 {
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 30dcdc68a8..bf5560ccc8 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -3685,7 +3685,7 @@ i40e_dev_update_mbuf_stats(struct rte_eth_dev *ethdev,
struct i40e_mbuf_stats *mbuf_stats)
 {
uint16_t idx;
-   struct i40e_tx_queue *txq;
+   struct c

[PATCH v2 05/22] drivers/net: add prefix for driver-specific structs

2024-12-03 Thread Bruce Richardson
In preparation for merging the Tx structs for multiple drivers into a
single struct, rename the driver-specific pointers in each struct to
carry a driver prefix, to avoid conflicts.

Signed-off-by: Bruce Richardson 
---
 drivers/net/i40e/i40e_fdir.c  |  6 +--
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 +-
 drivers/net/i40e/i40e_rxtx.c  | 30 ++--
 drivers/net/i40e/i40e_rxtx.h  |  4 +-
 drivers/net/i40e/i40e_rxtx_vec_altivec.c  |  6 +--
 drivers/net/i40e/i40e_rxtx_vec_avx2.c |  6 +--
 drivers/net/i40e/i40e_rxtx_vec_avx512.c   |  8 ++--
 drivers/net/i40e/i40e_rxtx_vec_common.h   |  2 +-
 drivers/net/i40e/i40e_rxtx_vec_neon.c |  6 +--
 drivers/net/i40e/i40e_rxtx_vec_sse.c  |  6 +--
 drivers/net/iavf/iavf_rxtx.c  | 24 +-
 drivers/net/iavf/iavf_rxtx.h  |  4 +-
 drivers/net/iavf/iavf_rxtx_vec_avx2.c |  6 +--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c   | 14 +++---
 drivers/net/iavf/iavf_rxtx_vec_common.h   |  2 +-
 drivers/net/iavf/iavf_rxtx_vec_sse.c  |  6 +--
 drivers/net/ice/ice_dcf_ethdev.c  |  4 +-
 drivers/net/ice/ice_rxtx.c| 48 +--
 drivers/net/ice/ice_rxtx.h|  4 +-
 drivers/net/ice/ice_rxtx_vec_avx2.c   |  6 +--
 drivers/net/ice/ice_rxtx_vec_avx512.c |  8 ++--
 drivers/net/ice/ice_rxtx_vec_common.h |  4 +-
 drivers/net/ice/ice_rxtx_vec_sse.c|  6 +--
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c|  2 +-
 drivers/net/ixgbe/ixgbe_rxtx.c| 22 -
 drivers/net/ixgbe/ixgbe_rxtx.h|  2 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h |  6 +--
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c   |  6 +--
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c|  6 +--
 29 files changed, 128 insertions(+), 128 deletions(-)

diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
index 47f79ecf11..c600167634 100644
--- a/drivers/net/i40e/i40e_fdir.c
+++ b/drivers/net/i40e/i40e_fdir.c
@@ -1383,7 +1383,7 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
volatile struct i40e_tx_desc *tmp_txdp;
 
tmp_tail = txq->tx_tail;
-   tmp_txdp = &txq->tx_ring[tmp_tail + 1];
+   tmp_txdp = &txq->i40e_tx_ring[tmp_tail + 1];
 
do {
if ((tmp_txdp->cmd_type_offset_bsz &
@@ -1640,7 +1640,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 
PMD_DRV_LOG(INFO, "filling filter programming descriptor.");
fdirdp = (volatile struct i40e_filter_program_desc *)
-   (&txq->tx_ring[txq->tx_tail]);
+   (&txq->i40e_tx_ring[txq->tx_tail]);
 
fdirdp->qindex_flex_ptype_vsi =
rte_cpu_to_le_32((fdir_action->rx_queue <<
@@ -1710,7 +1710,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
fdirdp->fd_id = rte_cpu_to_le_32(filter->soft_id);
 
PMD_DRV_LOG(INFO, "filling transmit descriptor.");
-   txdp = &txq->tx_ring[txq->tx_tail + 1];
+   txdp = &txq->i40e_tx_ring[txq->tx_tail + 1];
txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr[txq->tx_tail >> 
1]);
 
td_cmd = I40E_TX_DESC_CMD_EOP |
diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c 
b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c
index 260d238ce4..8679e5c1fd 100644
--- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c
+++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c
@@ -75,7 +75,7 @@ i40e_recycle_tx_mbufs_reuse_vec(void *tx_queue,
return 0;
 
/* check DD bits on threshold descriptor */
-   if ((txq->tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+   if ((txq->i40e_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
return 0;
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index b0bb20fe9a..34ef931859 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -379,7 +379,7 @@ static inline int
 i40e_xmit_cleanup(struct i40e_tx_queue *txq)
 {
struct ci_tx_entry *sw_ring = txq->sw_ring;
-   volatile struct i40e_tx_desc *txd = txq->tx_ring;
+   volatile struct i40e_tx_desc *txd = txq->i40e_tx_ring;
uint16_t last_desc_cleaned = txq->last_desc_cleaned;
uint16_t nb_tx_desc = txq->nb_tx_desc;
uint16_t desc_to_clean_to;
@@ -1103,7 +1103,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts)
 
txq = tx_queue;
sw_ring = txq->sw_ring;
-   txr = txq->tx_ring;
+   txr = txq->i40e_tx_ring;
tx_id = txq->tx_tail;
txe = &sw_ring[tx_id];
 
@@ -1338,7 +1338,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *

[PATCH v2 08/22] net/ixgbe: convert Tx queue context cache field to ptr

2024-12-03 Thread Bruce Richardson
Rather than having a two element array of context cache values inside
the Tx queue structure, convert it to a pointer to a cache at the end of
the structure. This makes future merging of the structure easier as we
don't need the "ixgbe_advctx_info" struct defined when defining a
combined queue structure.
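The allocation pattern used in the diff below — one block holding the queue struct followed immediately by its context cache, with the pointer fixed up after allocation — can be sketched in plain C. This uses `calloc` in place of `rte_zmalloc_socket()` and `(q + 1)` in place of `RTE_PTR_ADD`, and simplified placeholder types:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CTX_NUM 2 /* mirrors IXGBE_CTX_NUM */

/* Simplified stand-in for ixgbe_advctx_info. */
struct ctx_info { uint64_t flags; uint32_t tunnel_len; };

/* The queue now holds a pointer instead of an inline array, so the
 * struct definition no longer needs ctx_info to be complete. */
struct txq {
	uint32_t ctx_curr;
	struct ctx_info *ctx_cache; /* points just past the struct */
};

/* Single allocation covers the queue plus its context cache. */
static struct txq *txq_alloc(void)
{
	struct txq *q = calloc(1, sizeof(*q) + sizeof(struct ctx_info) * CTX_NUM);

	if (q != NULL)
		q->ctx_cache = (struct ctx_info *)(q + 1); /* trailing cache */
	return q;
}
```

Because the cache lives in the same allocation, freeing the queue frees the cache too, and resetting it is a single `memset(txq->ctx_cache, 0, ...)` as in the diff.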

Signed-off-by: Bruce Richardson 
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 7 ---
 drivers/net/ixgbe/ixgbe_rxtx.h | 4 ++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index f7ddbba1b6..2ca26cd132 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -2522,8 +2522,7 @@ ixgbe_reset_tx_queue(struct ixgbe_tx_queue *txq)
txq->last_desc_cleaned = (uint16_t)(txq->nb_tx_desc - 1);
txq->nb_tx_free = (uint16_t)(txq->nb_tx_desc - 1);
txq->ctx_curr = 0;
-   memset((void *)&txq->ctx_cache, 0,
-   IXGBE_CTX_NUM * sizeof(struct ixgbe_advctx_info));
+   memset(txq->ctx_cache, 0, IXGBE_CTX_NUM * sizeof(struct 
ixgbe_advctx_info));
 }
 
 static const struct ixgbe_txq_ops def_txq_ops = {
@@ -2741,10 +2740,12 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
}
 
/* First allocate the tx queue data structure */
-   txq = rte_zmalloc_socket("ethdev TX queue", sizeof(struct 
ixgbe_tx_queue),
+   txq = rte_zmalloc_socket("ethdev TX queue", sizeof(struct 
ixgbe_tx_queue) +
+   sizeof(struct ixgbe_advctx_info) * 
IXGBE_CTX_NUM,
 RTE_CACHE_LINE_SIZE, socket_id);
if (txq == NULL)
return -ENOMEM;
+   txq->ctx_cache = RTE_PTR_ADD(txq, sizeof(struct ixgbe_tx_queue));
 
/*
 * Allocate TX ring hardware descriptors. A memzone large enough to
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index f6bae37cf3..847cacf7b5 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -215,8 +215,8 @@ struct ixgbe_tx_queue {
uint8_t wthresh;   /**< Write-back threshold reg. */
uint64_t offloads; /**< Tx offload flags of RTE_ETH_TX_OFFLOAD_* */
uint32_tctx_curr;  /**< Hardware context states. */
-   /** Hardware context0 history. */
-   struct ixgbe_advctx_info ctx_cache[IXGBE_CTX_NUM];
+   /** Hardware context history. */
+   struct ixgbe_advctx_info *ctx_cache;
const struct ixgbe_txq_ops *ops;   /**< txq ops */
booltx_deferred_start; /**< not in global dev start. */
 #ifdef RTE_LIB_SECURITY
-- 
2.43.0



[PATCH v2 09/22] net/ixgbe: use common Tx queue structure

2024-12-03 Thread Bruce Richardson
Merge in the additional fields used by the ixgbe driver and then convert
it over to use the common Tx queue structure.

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/tx.h| 14 +++-
 drivers/net/ixgbe/ixgbe_ethdev.c  |  4 +-
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c|  2 +-
 drivers/net/ixgbe/ixgbe_rxtx.c| 64 +--
 drivers/net/ixgbe/ixgbe_rxtx.h| 56 ++--
 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 26 
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c   | 14 ++--
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c| 14 ++--
 8 files changed, 80 insertions(+), 114 deletions(-)

diff --git a/drivers/net/_common_intel/tx.h b/drivers/net/_common_intel/tx.h
index c4a1a0c816..51ae3b051d 100644
--- a/drivers/net/_common_intel/tx.h
+++ b/drivers/net/_common_intel/tx.h
@@ -34,9 +34,13 @@ struct ci_tx_queue {
volatile struct i40e_tx_desc *i40e_tx_ring;
volatile struct iavf_tx_desc *iavf_tx_ring;
volatile struct ice_tx_desc *ice_tx_ring;
+   volatile union ixgbe_adv_tx_desc *ixgbe_tx_ring;
};
volatile uint8_t *qtx_tail;   /* register address of tail */
-   struct ci_tx_entry *sw_ring; /* virtual address of SW ring */
+   union {
+   struct ci_tx_entry *sw_ring; /* virtual address of SW ring */
+   struct ci_tx_entry_vec *sw_ring_vec;
+   };
rte_iova_t tx_ring_dma;/* TX ring DMA address */
uint16_t nb_tx_desc;   /* number of TX descriptors */
uint16_t tx_tail; /* current value of tail register */
@@ -87,6 +91,14 @@ struct ci_tx_queue {
uint8_t tc;
bool use_ctx;  /* with ctx info, each pkt needs two 
descriptors */
};
+   struct { /* ixgbe specific values */
+   const struct ixgbe_txq_ops *ops;
+   struct ixgbe_advctx_info *ctx_cache;
+   uint32_t ctx_curr;
+#ifdef RTE_LIB_SECURITY
+   uint8_t using_ipsec;  /**< indicates that IPsec TX 
feature is in use */
+#endif
+   };
};
 };
 
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 8bee97d191..5f18fbaad5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1118,7 +1118,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void 
*init_params __rte_unused)
 * RX and TX function.
 */
if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
-   struct ixgbe_tx_queue *txq;
+   struct ci_tx_queue *txq;
/* TX queue function in primary, set by last queue initialized
 * Tx queue may not initialized by primary process
 */
@@ -1623,7 +1623,7 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
 * RX function
 */
if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
-   struct ixgbe_tx_queue *txq;
+   struct ci_tx_queue *txq;
/* TX queue function in primary, set by last queue initialized
 * Tx queue may not initialized by primary process
 */
diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c 
b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c
index a878db3150..3fd05ed5eb 100644
--- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c
+++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c
@@ -51,7 +51,7 @@ uint16_t
 ixgbe_recycle_tx_mbufs_reuse_vec(void *tx_queue,
struct rte_eth_recycle_rxq_info *recycle_rxq_info)
 {
-   struct ixgbe_tx_queue *txq = tx_queue;
+   struct ci_tx_queue *txq = tx_queue;
struct ci_tx_entry *txep;
struct rte_mbuf **rxep;
int i, n;
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 2ca26cd132..344ef85685 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -98,7 +98,7 @@
  * Return the total number of buffers freed.
  */
 static __rte_always_inline int
-ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
+ixgbe_tx_free_bufs(struct ci_tx_queue *txq)
 {
struct ci_tx_entry *txep;
uint32_t status;
@@ -195,7 +195,7 @@ tx1(volatile union ixgbe_adv_tx_desc *txdp, struct rte_mbuf 
**pkts)
  * Copy mbuf pointers to the S/W ring.
  */
 static inline void
-ixgbe_tx_fill_hw_ring(struct ixgbe_tx_queue *txq, struct rte_mbuf **pkts,
+ixgbe_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
  uint16_t nb_pkts)
 {
volatile union ixgbe_adv_tx_desc *txdp = 
&txq->ixgbe_tx_ring[txq->tx_tail];
@@ -231,7 +231,7 @@ static inline uint16_t
 tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t nb_pkts)
 {
-   struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+   struct ci_

[PATCH v2 07/22] net/iavf: use common Tx queue structure

2024-12-03 Thread Bruce Richardson
Merge in the few additional fields used by the iavf driver and convert
it to use the common Tx queue structure as well.

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/tx.h  | 15 +++-
 drivers/net/iavf/iavf.h |  2 +-
 drivers/net/iavf/iavf_ethdev.c  |  4 +-
 drivers/net/iavf/iavf_rxtx.c| 42 ++---
 drivers/net/iavf/iavf_rxtx.h| 49 +++--
 drivers/net/iavf/iavf_rxtx_vec_avx2.c   |  4 +-
 drivers/net/iavf/iavf_rxtx_vec_avx512.c | 14 +++
 drivers/net/iavf/iavf_rxtx_vec_common.h |  8 ++--
 drivers/net/iavf/iavf_rxtx_vec_sse.c|  8 ++--
 drivers/net/iavf/iavf_vchnl.c   |  6 +--
 10 files changed, 62 insertions(+), 90 deletions(-)

diff --git a/drivers/net/_common_intel/tx.h b/drivers/net/_common_intel/tx.h
index c965f5ee6c..c4a1a0c816 100644
--- a/drivers/net/_common_intel/tx.h
+++ b/drivers/net/_common_intel/tx.h
@@ -31,8 +31,9 @@ typedef void (*ice_tx_release_mbufs_t)(struct ci_tx_queue 
*txq);
 
 struct ci_tx_queue {
union { /* TX ring virtual address */
-   volatile struct ice_tx_desc *ice_tx_ring;
volatile struct i40e_tx_desc *i40e_tx_ring;
+   volatile struct iavf_tx_desc *iavf_tx_ring;
+   volatile struct ice_tx_desc *ice_tx_ring;
};
volatile uint8_t *qtx_tail;   /* register address of tail */
struct ci_tx_entry *sw_ring; /* virtual address of SW ring */
@@ -63,8 +64,9 @@ struct ci_tx_queue {
bool tx_deferred_start; /* don't start this queue in dev start */
bool q_set; /* indicate if tx queue has been configured */
union {  /* the VSI this queue belongs to */
-   struct ice_vsi *ice_vsi;
struct i40e_vsi *i40e_vsi;
+   struct iavf_vsi *iavf_vsi;
+   struct ice_vsi *ice_vsi;
};
const struct rte_memzone *mz;
 
@@ -76,6 +78,15 @@ struct ci_tx_queue {
struct { /* I40E driver specific values */
uint8_t dcb_tc;
};
+   struct { /* iavf driver specific values */
+   uint16_t ipsec_crypto_pkt_md_offset;
+   uint8_t rel_mbufs_type;
+#define IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1 BIT(0)
+#define IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 BIT(1)
+   uint8_t vlan_flag;
+   uint8_t tc;
+   bool use_ctx;  /* with ctx info, each pkt needs two 
descriptors */
+   };
};
 };
 
diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index ad526c644c..956c60ef45 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -98,7 +98,7 @@
 
 struct iavf_adapter;
 struct iavf_rx_queue;
-struct iavf_tx_queue;
+struct ci_tx_queue;
 
 
 struct iavf_ipsec_crypto_stats {
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index 7f80cd6258..328c224c93 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -954,7 +954,7 @@ static int
 iavf_start_queues(struct rte_eth_dev *dev)
 {
struct iavf_rx_queue *rxq;
-   struct iavf_tx_queue *txq;
+   struct ci_tx_queue *txq;
int i;
uint16_t nb_txq, nb_rxq;
 
@@ -1885,7 +1885,7 @@ iavf_dev_update_mbuf_stats(struct rte_eth_dev *ethdev,
struct iavf_mbuf_stats *mbuf_stats)
 {
uint16_t idx;
-   struct iavf_tx_queue *txq;
+   struct ci_tx_queue *txq;
 
for (idx = 0; idx < ethdev->data->nb_tx_queues; idx++) {
txq = ethdev->data->tx_queues[idx];
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index 6eda91e76b..7e381b2a17 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -213,7 +213,7 @@ check_rx_vec_allow(struct iavf_rx_queue *rxq)
 }
 
 static inline bool
-check_tx_vec_allow(struct iavf_tx_queue *txq)
+check_tx_vec_allow(struct ci_tx_queue *txq)
 {
if (!(txq->offloads & IAVF_TX_NO_VECTOR_FLAGS) &&
txq->tx_rs_thresh >= IAVF_VPMD_TX_MAX_BURST &&
@@ -282,7 +282,7 @@ reset_rx_queue(struct iavf_rx_queue *rxq)
 }
 
 static inline void
-reset_tx_queue(struct iavf_tx_queue *txq)
+reset_tx_queue(struct ci_tx_queue *txq)
 {
struct ci_tx_entry *txe;
uint32_t i, size;
@@ -388,7 +388,7 @@ release_rxq_mbufs(struct iavf_rx_queue *rxq)
 }
 
 static inline void
-release_txq_mbufs(struct iavf_tx_queue *txq)
+release_txq_mbufs(struct ci_tx_queue *txq)
 {
uint16_t i;
 
@@ -778,7 +778,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
struct iavf_info *vf =
IAVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
struct iavf_vsi *vsi = &vf->vsi;
-   struct iavf_tx_queue *txq;
+   struct ci_tx_queue *txq;
const struct rte_memzone *mz;
uint32_t ring_size;
uint16_t tx_rs_thresh, tx_free_thresh;
@@ -814,7 +814,7 @@ i

[PATCH v2 04/22] drivers/net: align Tx queue struct field names

2024-12-03 Thread Bruce Richardson
Across the various Intel drivers, different names are sometimes given
to fields in the Tx queue structure which have the same function. Do
some renaming to align things better for future merging.

Signed-off-by: Bruce Richardson 
---
 drivers/net/i40e/i40e_rxtx.c|  6 +--
 drivers/net/i40e/i40e_rxtx.h|  2 +-
 drivers/net/iavf/iavf_rxtx.c| 60 -
 drivers/net/iavf/iavf_rxtx.h| 14 +++---
 drivers/net/iavf/iavf_rxtx_vec_avx2.c   | 19 
 drivers/net/iavf/iavf_rxtx_vec_avx512.c | 57 +++
 drivers/net/iavf/iavf_rxtx_vec_common.h | 24 +-
 drivers/net/iavf/iavf_rxtx_vec_sse.c| 18 
 drivers/net/iavf/iavf_vchnl.c   |  2 +-
 drivers/net/ixgbe/base/ixgbe_osdep.h|  2 +-
 drivers/net/ixgbe/ixgbe_rxtx.c  | 16 +++
 drivers/net/ixgbe/ixgbe_rxtx.h  |  6 +--
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c |  2 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c  |  2 +-
 14 files changed, 116 insertions(+), 114 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 2e1f07d2a1..b0bb20fe9a 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -2549,7 +2549,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
txq->vsi = vsi;
txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
-   txq->tx_ring_phys_addr = tz->iova;
+   txq->tx_ring_dma = tz->iova;
txq->tx_ring = (struct i40e_tx_desc *)tz->addr;
 
/* Allocate software ring */
@@ -2923,7 +2923,7 @@ i40e_tx_queue_init(struct i40e_tx_queue *txq)
/* clear the context structure first */
memset(&tx_ctx, 0, sizeof(tx_ctx));
tx_ctx.new_context = 1;
-   tx_ctx.base = txq->tx_ring_phys_addr / I40E_QUEUE_BASE_ADDR_UNIT;
+   tx_ctx.base = txq->tx_ring_dma / I40E_QUEUE_BASE_ADDR_UNIT;
tx_ctx.qlen = txq->nb_tx_desc;
 
 #ifdef RTE_LIBRTE_IEEE1588
@@ -3209,7 +3209,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
txq->reg_idx = pf->fdir.fdir_vsi->base_queue;
txq->vsi = pf->fdir.fdir_vsi;
 
-   txq->tx_ring_phys_addr = tz->iova;
+   txq->tx_ring_dma = tz->iova;
txq->tx_ring = (struct i40e_tx_desc *)tz->addr;
 
/*
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 0f5d3cb0b7..f420c98687 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -129,7 +129,7 @@ struct i40e_rx_queue {
  */
 struct i40e_tx_queue {
uint16_t nb_tx_desc; /**< number of TX descriptors */
-   uint64_t tx_ring_phys_addr; /**< TX ring DMA address */
+   rte_iova_t tx_ring_dma; /**< TX ring DMA address */
volatile struct i40e_tx_desc *tx_ring; /**< TX ring virtual address */
struct ci_tx_entry *sw_ring; /**< virtual address of SW ring */
uint16_t tx_tail; /**< current value of tail register */
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index e337f20073..adaaeb4625 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -216,8 +216,8 @@ static inline bool
 check_tx_vec_allow(struct iavf_tx_queue *txq)
 {
if (!(txq->offloads & IAVF_TX_NO_VECTOR_FLAGS) &&
-   txq->rs_thresh >= IAVF_VPMD_TX_MAX_BURST &&
-   txq->rs_thresh <= IAVF_VPMD_TX_MAX_FREE_BUF) {
+   txq->tx_rs_thresh >= IAVF_VPMD_TX_MAX_BURST &&
+   txq->tx_rs_thresh <= IAVF_VPMD_TX_MAX_FREE_BUF) {
PMD_INIT_LOG(DEBUG, "Vector tx can be enabled on this txq.");
return true;
}
@@ -309,13 +309,13 @@ reset_tx_queue(struct iavf_tx_queue *txq)
}
 
txq->tx_tail = 0;
-   txq->nb_used = 0;
+   txq->nb_tx_used = 0;
 
txq->last_desc_cleaned = txq->nb_tx_desc - 1;
-   txq->nb_free = txq->nb_tx_desc - 1;
+   txq->nb_tx_free = txq->nb_tx_desc - 1;
 
-   txq->next_dd = txq->rs_thresh - 1;
-   txq->next_rs = txq->rs_thresh - 1;
+   txq->tx_next_dd = txq->tx_rs_thresh - 1;
+   txq->tx_next_rs = txq->tx_rs_thresh - 1;
 }
 
 static int
@@ -845,8 +845,8 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
}
 
txq->nb_tx_desc = nb_desc;
-   txq->rs_thresh = tx_rs_thresh;
-   txq->free_thresh = tx_free_thresh;
+   txq->tx_rs_thresh = tx_rs_thresh;
+   txq->tx_free_thresh = tx_free_thresh;
txq->queue_id = queue_idx;
txq->port_id = dev->data->port_id;
txq->offloads = offloads;
@@ -881,7 +881,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
rte_free(txq);
return -ENOMEM;
}
-   txq->tx_ring_phys_addr = mz->iova;
+   txq->tx_ring_dma = mz->iova;
txq->tx_ring = (struct iavf_tx_desc *)mz->addr;
 
txq->mz = mz;
@@ -2387,7 +2387,7 @@ iavf_xmit_cleanup(struct iavf_tx_queue *txq)
 
volatile struct iavf_tx_desc *txd = txq->tx_ring;
 
-   desc_to_clean_to = (uint16_t)(last_desc_clea

[PATCH v2 10/22] net/_common_intel: pack Tx queue structure

2024-12-03 Thread Bruce Richardson
Move some fields about to better pack the Tx queue structure and make
sure all data used by the vector codepaths is on the first cacheline of
the structure. Checking with "pahole" on a 64-bit build, only one 6-byte
hole is left in the structure - on the second cacheline - after this
patch.

As part of the reordering, move the p/h/wthresh values to the
ixgbe-specific part of the union. That is the only driver which actually
uses those values. i40e and ice drivers just record the values for later
return, so we can drop them from the Tx queue structure for those
drivers and just report the defaults in all cases.

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/tx.h | 12 +---
 drivers/net/i40e/i40e_rxtx.c   |  9 +++--
 drivers/net/ice/ice_rxtx.c |  9 +++--
 3 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/drivers/net/_common_intel/tx.h b/drivers/net/_common_intel/tx.h
index 51ae3b051d..c372d2838b 100644
--- a/drivers/net/_common_intel/tx.h
+++ b/drivers/net/_common_intel/tx.h
@@ -41,7 +41,6 @@ struct ci_tx_queue {
struct ci_tx_entry *sw_ring; /* virtual address of SW ring */
struct ci_tx_entry_vec *sw_ring_vec;
};
-   rte_iova_t tx_ring_dma;/* TX ring DMA address */
uint16_t nb_tx_desc;   /* number of TX descriptors */
uint16_t tx_tail; /* current value of tail register */
uint16_t nb_tx_used; /* number of TX desc used since RS bit set */
@@ -55,16 +54,14 @@ struct ci_tx_queue {
uint16_t tx_free_thresh;
/* Number of TX descriptors to use before RS bit is set. */
uint16_t tx_rs_thresh;
-   uint8_t pthresh;   /**< Prefetch threshold register. */
-   uint8_t hthresh;   /**< Host threshold register. */
-   uint8_t wthresh;   /**< Write-back threshold reg. */
uint16_t port_id;  /* Device port identifier. */
uint16_t queue_id; /* TX queue index. */
uint16_t reg_idx;
-   uint64_t offloads;
uint16_t tx_next_dd;
uint16_t tx_next_rs;
+   uint64_t offloads;
uint64_t mbuf_errors;
+   rte_iova_t tx_ring_dma;/* TX ring DMA address */
bool tx_deferred_start; /* don't start this queue in dev start */
bool q_set; /* indicate if tx queue has been configured */
union {  /* the VSI this queue belongs to */
@@ -95,9 +92,10 @@ struct ci_tx_queue {
const struct ixgbe_txq_ops *ops;
struct ixgbe_advctx_info *ctx_cache;
uint32_t ctx_curr;
-#ifdef RTE_LIB_SECURITY
+   uint8_t pthresh;   /**< Prefetch threshold register. */
+   uint8_t hthresh;   /**< Host threshold register. */
+   uint8_t wthresh;   /**< Write-back threshold reg. */
uint8_t using_ipsec;  /**< indicates that IPsec TX 
feature is in use */
-#endif
};
};
 };
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 305bc53480..539b170266 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -2539,9 +2539,6 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
txq->nb_tx_desc = nb_desc;
txq->tx_rs_thresh = tx_rs_thresh;
txq->tx_free_thresh = tx_free_thresh;
-   txq->pthresh = tx_conf->tx_thresh.pthresh;
-   txq->hthresh = tx_conf->tx_thresh.hthresh;
-   txq->wthresh = tx_conf->tx_thresh.wthresh;
txq->queue_id = queue_idx;
txq->reg_idx = reg_idx;
txq->port_id = dev->data->port_id;
@@ -3310,9 +3307,9 @@ i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t 
queue_id,
 
qinfo->nb_desc = txq->nb_tx_desc;
 
-   qinfo->conf.tx_thresh.pthresh = txq->pthresh;
-   qinfo->conf.tx_thresh.hthresh = txq->hthresh;
-   qinfo->conf.tx_thresh.wthresh = txq->wthresh;
+   qinfo->conf.tx_thresh.pthresh = I40E_DEFAULT_TX_PTHRESH;
+   qinfo->conf.tx_thresh.hthresh = I40E_DEFAULT_TX_HTHRESH;
+   qinfo->conf.tx_thresh.wthresh = I40E_DEFAULT_TX_WTHRESH;
 
qinfo->conf.tx_free_thresh = txq->tx_free_thresh;
qinfo->conf.tx_rs_thresh = txq->tx_rs_thresh;
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index bcc7c7a016..e2e147ba3e 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1492,9 +1492,6 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
txq->nb_tx_desc = nb_desc;
txq->tx_rs_thresh = tx_rs_thresh;
txq->tx_free_thresh = tx_free_thresh;
-   txq->pthresh = tx_conf->tx_thresh.pthresh;
-   txq->hthresh = tx_conf->tx_thresh.hthresh;
-   txq->wthresh = tx_conf->tx_thresh.wthresh;
txq->queue_id = queue_idx;
 
txq->reg_idx = vsi->base_queue + queue_idx;
@@ -1583,9 +1580,9 @@ ice_txq_info_get(struct rte_eth_dev *dev, uint16_t 
queue_id,
 
qinfo->nb_desc = txq->nb_tx_desc;
 
-   qinfo->conf.tx_th

[PATCH v2 11/22] net/_common_intel: add post-Tx buffer free function

2024-12-03 Thread Bruce Richardson
The actions taken for post-Tx buffer free in the SSE and AVX code paths
of the i40e, iavf and ice drivers are all common, so centralize them in
the common/intel_eth driver.

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/tx.h  | 71 
 drivers/net/i40e/i40e_rxtx_vec_common.h | 72 -
 drivers/net/iavf/iavf_rxtx_vec_common.h | 61 -
 drivers/net/ice/ice_rxtx_vec_common.h   | 61 -
 4 files changed, 98 insertions(+), 167 deletions(-)

diff --git a/drivers/net/_common_intel/tx.h b/drivers/net/_common_intel/tx.h
index c372d2838b..a930309c05 100644
--- a/drivers/net/_common_intel/tx.h
+++ b/drivers/net/_common_intel/tx.h
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include 
 
 /* forward declaration of the common intel (ci) queue structure */
 struct ci_tx_queue;
@@ -107,4 +108,74 @@ ci_tx_backlog_entry(struct ci_tx_entry *txep, struct 
rte_mbuf **tx_pkts, uint16_
txep[i].mbuf = tx_pkts[i];
 }
 
+#define IETH_VPMD_TX_MAX_FREE_BUF 64
+
+typedef int (*ci_desc_done_fn)(struct ci_tx_queue *txq, uint16_t idx);
+
+static __rte_always_inline int
+ci_tx_free_bufs(struct ci_tx_queue *txq, ci_desc_done_fn desc_done)
+{
+   struct ci_tx_entry *txep;
+   uint32_t n;
+   uint32_t i;
+   int nb_free = 0;
+   struct rte_mbuf *m, *free[IETH_VPMD_TX_MAX_FREE_BUF];
+
+   /* check DD bits on threshold descriptor */
+   if (!desc_done(txq, txq->tx_next_dd))
+   return 0;
+
+   n = txq->tx_rs_thresh;
+
+/* first buffer to free from S/W ring is at index
+ * tx_next_dd - (tx_rs_thresh-1)
+ */
+   txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
+
+   if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
+   for (i = 0; i < n; i++) {
+   free[i] = txep[i].mbuf;
+   /* no need to reset txep[i].mbuf in vector path */
+   }
+   rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
+   goto done;
+   }
+
+   m = rte_pktmbuf_prefree_seg(txep[0].mbuf);
+   if (likely(m != NULL)) {
+   free[0] = m;
+   nb_free = 1;
+   for (i = 1; i < n; i++) {
+   m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
+   if (likely(m != NULL)) {
+   if (likely(m->pool == free[0]->pool)) {
+   free[nb_free++] = m;
+   } else {
+   rte_mempool_put_bulk(free[0]->pool,
+(void *)free,
+nb_free);
+   free[0] = m;
+   nb_free = 1;
+   }
+   }
+   }
+   rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+   } else {
+   for (i = 1; i < n; i++) {
+   m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
+   if (m != NULL)
+   rte_mempool_put(m->pool, m);
+   }
+   }
+
+done:
+   /* buffers were freed, update counters */
+   txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
+   txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
+   if (txq->tx_next_dd >= txq->nb_tx_desc)
+   txq->tx_next_dd = (uint16_t)(txq->tx_rs_thresh - 1);
+
+   return txq->tx_rs_thresh;
+}
+
 #endif /* _COMMON_INTEL_TX_H_ */
diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h 
b/drivers/net/i40e/i40e_rxtx_vec_common.h
index 57d6263ccf..907d32dd0b 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -16,72 +16,18 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
 
+static inline int
+i40e_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
+{
+   return (txq->i40e_tx_ring[idx].cmd_type_offset_bsz &
+   rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
+   rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
+}
+
 static __rte_always_inline int
 i40e_tx_free_bufs(struct ci_tx_queue *txq)
 {
-   struct ci_tx_entry *txep;
-   uint32_t n;
-   uint32_t i;
-   int nb_free = 0;
-   struct rte_mbuf *m, *free[RTE_I40E_TX_MAX_FREE_BUF_SZ];
-
-   /* check DD bits on threshold descriptor */
-   if ((txq->i40e_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-   rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-   rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
-   return 0;
-
-   n = txq->tx_rs_thresh;
-
-/* first buffer to free from S/W ring is at index
- * tx_next_dd - (tx_rs_thresh-1)
- */
-   txep = &txq->

[PATCH v2 19/22] net/i40e: use vector SW ring for all vector paths

2024-12-03 Thread Bruce Richardson
The AVX-512 code path used a smaller SW ring structure containing only
the mbuf pointer and no other fields. Those other fields are used only
in the scalar code path, so update all vector code paths (AVX2, SSE,
Neon, Altivec) to use the smaller, faster structure.

Signed-off-by: Bruce Richardson 
---
 drivers/net/i40e/i40e_rxtx.c |  8 +---
 drivers/net/i40e/i40e_rxtx_vec_altivec.c | 12 ++--
 drivers/net/i40e/i40e_rxtx_vec_avx2.c| 12 ++--
 drivers/net/i40e/i40e_rxtx_vec_avx512.c  | 14 ++
 drivers/net/i40e/i40e_rxtx_vec_common.h  |  6 --
 drivers/net/i40e/i40e_rxtx_vec_neon.c| 12 ++--
 drivers/net/i40e/i40e_rxtx_vec_sse.c | 12 ++--
 7 files changed, 31 insertions(+), 45 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 081d743e62..745c467912 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1891,7 +1891,7 @@ i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t 
tx_queue_id)
tx_queue_id);
 
txq->vector_tx = ad->tx_vec_allowed;
-   txq->vector_sw_ring = ad->tx_use_avx512;
+   txq->vector_sw_ring = txq->vector_tx;
 
/*
 * tx_queue_id is queue id application refers to, while
@@ -3550,9 +3550,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
}
}
 
+   if (rte_vect_get_max_simd_bitwidth() < RTE_VECT_SIMD_128)
+   ad->tx_vec_allowed = false;
+
if (ad->tx_simple_allowed) {
-   if (ad->tx_vec_allowed &&
-   rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_128) {
+   if (ad->tx_vec_allowed) {
 #ifdef RTE_ARCH_X86
if (ad->tx_use_avx512) {
 #ifdef CC_AVX512_SUPPORT
diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c 
b/drivers/net/i40e/i40e_rxtx_vec_altivec.c
index 500bba2cef..b6900a3e15 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c
@@ -553,14 +553,14 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,
 {
struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
volatile struct i40e_tx_desc *txdp;
-   struct ci_tx_entry *txep;
+   struct ci_tx_entry_vec *txep;
uint16_t n, nb_commit, tx_id;
uint64_t flags = I40E_TD_CMD;
uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
int i;
 
if (txq->nb_tx_free < txq->tx_free_thresh)
-   i40e_tx_free_bufs(txq);
+   ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
 
nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
nb_commit = nb_pkts;
@@ -569,13 +569,13 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,
 
tx_id = txq->tx_tail;
txdp = &txq->i40e_tx_ring[tx_id];
-   txep = &txq->sw_ring[tx_id];
+   txep = &txq->sw_ring_vec[tx_id];
 
txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 
n = (uint16_t)(txq->nb_tx_desc - tx_id);
if (nb_commit >= n) {
-   ci_tx_backlog_entry(txep, tx_pkts, n);
+   ci_tx_backlog_entry_vec(txep, tx_pkts, n);
 
for (i = 0; i < n - 1; ++i, ++tx_pkts, ++txdp)
vtx1(txdp, *tx_pkts, flags);
@@ -589,10 +589,10 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,
 
/* avoid reach the end of ring */
txdp = &txq->i40e_tx_ring[tx_id];
-   txep = &txq->sw_ring[tx_id];
+   txep = &txq->sw_ring_vec[tx_id];
}
 
-   ci_tx_backlog_entry(txep, tx_pkts, nb_commit);
+   ci_tx_backlog_entry_vec(txep, tx_pkts, nb_commit);
 
vtx(txdp, tx_pkts, nb_commit, flags);
 
diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c 
b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
index 29bef64287..2477573c01 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
@@ -745,13 +745,13 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct 
rte_mbuf **tx_pkts,
 {
struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
volatile struct i40e_tx_desc *txdp;
-   struct ci_tx_entry *txep;
+   struct ci_tx_entry_vec *txep;
uint16_t n, nb_commit, tx_id;
uint64_t flags = I40E_TD_CMD;
uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
 
if (txq->nb_tx_free < txq->tx_free_thresh)
-   i40e_tx_free_bufs(txq);
+   ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
 
nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
if (unlikely(nb_pkts == 0))
@@ -759,13 +759,13 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct 
rte_mbuf **tx_pkts,
 
tx_id = txq->tx_tail;
txdp = &txq->i40e_tx_ring[tx_id];
-   txep = &txq->sw_ring[tx_id];
+   txep = &txq->sw_ring_vec[tx_id];
 
txq->nb_tx_free = (uint16_t

[PATCH v2 21/22] net/_common_intel: remove unneeded code

2024-12-03 Thread Bruce Richardson
With all drivers using the common Tx structure updated so that their
vector paths all use the simplified Tx mbuf ring format, it's no longer
necessary to have a separate flag for the ring format and for use of a
vector driver.

Remove the former flag and base all decisions off the vector flag. With
that done, we go from having three paths to consider for releasing all
mbufs in the ring down to only two. That allows further simplification
of the "ci_txq_release_all_mbufs" function.

The separate function to free buffers from the vector driver not using
the simplified ring format can similarly be removed as no longer
necessary.

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/tx.h| 97 +++
 drivers/net/i40e/i40e_rxtx.c  |  1 -
 drivers/net/iavf/iavf_rxtx_vec_sse.c  |  1 -
 drivers/net/ice/ice_rxtx.c|  1 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h |  1 -
 5 files changed, 10 insertions(+), 91 deletions(-)

diff --git a/drivers/net/_common_intel/tx.h b/drivers/net/_common_intel/tx.h
index aa42b9b49f..d9cf4474fc 100644
--- a/drivers/net/_common_intel/tx.h
+++ b/drivers/net/_common_intel/tx.h
@@ -66,7 +66,6 @@ struct ci_tx_queue {
bool tx_deferred_start; /* don't start this queue in dev start */
bool q_set; /* indicate if tx queue has been configured */
bool vector_tx; /* port is using vector TX */
-   bool vector_sw_ring;/* port is using vectorized SW ring 
(ieth_tx_entry_vec) */
union {  /* the VSI this queue belongs to */
struct i40e_vsi *i40e_vsi;
struct iavf_vsi *iavf_vsi;
@@ -120,72 +119,6 @@ ci_tx_backlog_entry_vec(struct ci_tx_entry_vec *txep, 
struct rte_mbuf **tx_pkts,
 
 typedef int (*ci_desc_done_fn)(struct ci_tx_queue *txq, uint16_t idx);
 
-static __rte_always_inline int
-ci_tx_free_bufs(struct ci_tx_queue *txq, ci_desc_done_fn desc_done)
-{
-   struct ci_tx_entry *txep;
-   uint32_t n;
-   uint32_t i;
-   int nb_free = 0;
-   struct rte_mbuf *m, *free[IETH_VPMD_TX_MAX_FREE_BUF];
-
-   /* check DD bits on threshold descriptor */
-   if (!desc_done(txq, txq->tx_next_dd))
-   return 0;
-
-   n = txq->tx_rs_thresh;
-
-/* first buffer to free from S/W ring is at index
- * tx_next_dd - (tx_rs_thresh-1)
- */
-   txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
-
-   if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
-   for (i = 0; i < n; i++) {
-   free[i] = txep[i].mbuf;
-   /* no need to reset txep[i].mbuf in vector path */
-   }
-   rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
-   goto done;
-   }
-
-   m = rte_pktmbuf_prefree_seg(txep[0].mbuf);
-   if (likely(m != NULL)) {
-   free[0] = m;
-   nb_free = 1;
-   for (i = 1; i < n; i++) {
-   m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
-   if (likely(m != NULL)) {
-   if (likely(m->pool == free[0]->pool)) {
-   free[nb_free++] = m;
-   } else {
-   rte_mempool_put_bulk(free[0]->pool,
-(void *)free,
-nb_free);
-   free[0] = m;
-   nb_free = 1;
-   }
-   }
-   }
-   rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
-   } else {
-   for (i = 1; i < n; i++) {
-   m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
-   if (m != NULL)
-   rte_mempool_put(m->pool, m);
-   }
-   }
-
-done:
-   /* buffers were freed, update counters */
-   txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
-   txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
-   if (txq->tx_next_dd >= txq->nb_tx_desc)
-   txq->tx_next_dd = (uint16_t)(txq->tx_rs_thresh - 1);
-
-   return txq->tx_rs_thresh;
-}
-
 static __rte_always_inline int
 ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool 
ctx_descs)
 {
@@ -278,21 +211,6 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, 
ci_desc_done_fn desc_done, bool ctx
return txq->tx_rs_thresh;
 }
 
-#define IETH_FREE_BUFS_LOOP(swr, nb_desc, start, end) do { \
-   uint16_t i = start; \
-   if (end < i) { \
-   for (; i < nb_desc; i++) { \
-   rte_pktmbuf_free_seg(swr[i].mbuf); \
-   swr[i].mbuf = NULL; \
-   } \

[PATCH v2 14/22] net/ice: move Tx queue mbuf cleanup fn to common

2024-12-03 Thread Bruce Richardson
The function to loop over the Tx queue and clean up all the mbufs on
it, e.g. for queue shutdown, is not device specific and so can move into
the common_intel headers. The only complication is ensuring that the
correct ring format, either minimal vector or full structure, is used.
The ice driver currently uses two functions and a function pointer to
handle this - and one of those functions performs a further check
internally - so we can simplify this down to just one common function,
with a flag set in the appropriate place. This avoids checking for
AVX-512-specific functions, which were the only ones using the
smaller struct in this driver.

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/tx.h| 49 -
 drivers/net/ice/ice_dcf_ethdev.c  |  5 +--
 drivers/net/ice/ice_ethdev.h  |  3 +-
 drivers/net/ice/ice_rxtx.c| 33 +
 drivers/net/ice/ice_rxtx_vec_common.h | 51 ---
 drivers/net/ice/ice_rxtx_vec_sse.c|  4 ---
 6 files changed, 60 insertions(+), 85 deletions(-)

diff --git a/drivers/net/_common_intel/tx.h b/drivers/net/_common_intel/tx.h
index 26aef528fa..1bf2a61b2f 100644
--- a/drivers/net/_common_intel/tx.h
+++ b/drivers/net/_common_intel/tx.h
@@ -65,6 +65,8 @@ struct ci_tx_queue {
rte_iova_t tx_ring_dma;/* TX ring DMA address */
bool tx_deferred_start; /* don't start this queue in dev start */
bool q_set; /* indicate if tx queue has been configured */
+   bool vector_tx; /* port is using vector TX */
+   bool vector_sw_ring;/* port is using vectorized SW ring 
(ieth_tx_entry_vec) */
union {  /* the VSI this queue belongs to */
struct i40e_vsi *i40e_vsi;
struct iavf_vsi *iavf_vsi;
@@ -74,7 +76,6 @@ struct ci_tx_queue {
 
union {
struct { /* ICE driver specific values */
-   ice_tx_release_mbufs_t tx_rel_mbufs;
uint32_t q_teid; /* TX schedule node id. */
};
struct { /* I40E driver specific values */
@@ -270,4 +271,50 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, 
ci_desc_done_fn desc_done, bool ctx
return txq->tx_rs_thresh;
 }
 
+#define IETH_FREE_BUFS_LOOP(txq, swr, start) do { \
+   uint16_t i = start; \
+   if (txq->tx_tail < i) { \
+   for (; i < txq->nb_tx_desc; i++) { \
+   rte_pktmbuf_free_seg(swr[i].mbuf); \
+   swr[i].mbuf = NULL; \
+   } \
+   i = 0; \
+   } \
+   for (; i < txq->tx_tail; i++) { \
+   rte_pktmbuf_free_seg(swr[i].mbuf); \
+   swr[i].mbuf = NULL; \
+   } \
+} while (0)
+
+static inline void
+ci_txq_release_all_mbufs(struct ci_tx_queue *txq)
+{
+   if (unlikely(!txq || !txq->sw_ring))
+   return;
+
+   if (!txq->vector_tx) {
+   for (uint16_t i = 0; i < txq->nb_tx_desc; i++) {
+   if (txq->sw_ring[i].mbuf != NULL) {
+   rte_pktmbuf_free_seg(txq->sw_ring[i].mbuf);
+   txq->sw_ring[i].mbuf = NULL;
+   }
+   }
+   return;
+   }
+
+   /**
+*  vPMD tx will not set sw_ring's mbuf to NULL after free,
+*  so need to free remains more carefully.
+*/
+   const uint16_t start = txq->tx_next_dd - txq->tx_rs_thresh + 1;
+
+   if (txq->vector_sw_ring) {
+   struct ci_tx_entry_vec *swr = txq->sw_ring_vec;
+   IETH_FREE_BUFS_LOOP(txq, swr, start);
+   } else {
+   struct ci_tx_entry *swr = txq->sw_ring;
+   IETH_FREE_BUFS_LOOP(txq, swr, start);
+   }
+}
+
 #endif /* _COMMON_INTEL_TX_H_ */
diff --git a/drivers/net/ice/ice_dcf_ethdev.c b/drivers/net/ice/ice_dcf_ethdev.c
index a0c065d78c..c20399cd84 100644
--- a/drivers/net/ice/ice_dcf_ethdev.c
+++ b/drivers/net/ice/ice_dcf_ethdev.c
@@ -24,6 +24,7 @@
 #include "ice_generic_flow.h"
 #include "ice_dcf_ethdev.h"
 #include "ice_rxtx.h"
+#include "_common_intel/tx.h"
 
 #define DCF_NUM_MACADDR_MAX  64
 
@@ -500,7 +501,7 @@ ice_dcf_tx_queue_stop(struct rte_eth_dev *dev, uint16_t 
tx_queue_id)
}
 
txq = dev->data->tx_queues[tx_queue_id];
-   txq->tx_rel_mbufs(txq);
+   ci_txq_release_all_mbufs(txq);
reset_tx_queue(txq);
dev->data->tx_queue_state[tx_queue_id] = RTE_ETH_QUEUE_STATE_STOPPED;
 
@@ -650,7 +651,7 @@ ice_dcf_stop_queues(struct rte_eth_dev *dev)
txq = dev->data->tx_queues[i];
if (!txq)
continue;
-   txq->tx_rel_mbufs(txq);
+   ci_txq_release_all_mbufs(txq);
reset_tx_queue(txq);
dev->data->tx_queue_s

[PATCH v2 15/22] net/i40e: use common Tx queue mbuf cleanup fn

2024-12-03 Thread Bruce Richardson
Update the driver to be similar to the "ice" driver and use the common
mbuf ring cleanup code on shutdown of a Tx queue.

Signed-off-by: Bruce Richardson 
---
 drivers/net/i40e/i40e_ethdev.h |  4 +-
 drivers/net/i40e/i40e_rxtx.c   | 70 --
 drivers/net/i40e/i40e_rxtx.h   |  1 -
 3 files changed, 9 insertions(+), 66 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index d351193ed9..ccc8732d7d 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -1260,12 +1260,12 @@ struct i40e_adapter {
 
/* For RSS reta table update */
uint8_t rss_reta_updated;
-#ifdef RTE_ARCH_X86
+
+   /* used only on x86, zero on other architectures */
bool rx_use_avx2;
bool rx_use_avx512;
bool tx_use_avx2;
bool tx_use_avx512;
-#endif
 };
 
 /**
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 539b170266..b70919c5dc 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1875,6 +1875,7 @@ i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t 
tx_queue_id)
int err;
struct ci_tx_queue *txq;
struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   const struct i40e_adapter *ad = 
I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
 
PMD_INIT_FUNC_TRACE();
 
@@ -1889,6 +1890,9 @@ i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t 
tx_queue_id)
PMD_DRV_LOG(WARNING, "TX queue %u is deferred start",
tx_queue_id);
 
+   txq->vector_tx = ad->tx_vec_allowed;
+   txq->vector_sw_ring = ad->tx_use_avx512;
+
/*
 * tx_queue_id is queue id application refers to, while
 * rxq->reg_idx is the real queue index.
@@ -1929,7 +1933,7 @@ i40e_dev_tx_queue_stop(struct rte_eth_dev *dev, uint16_t 
tx_queue_id)
return err;
}
 
-   i40e_tx_queue_release_mbufs(txq);
+   ci_txq_release_all_mbufs(txq);
i40e_reset_tx_queue(txq);
dev->data->tx_queue_state[tx_queue_id] = RTE_ETH_QUEUE_STATE_STOPPED;
 
@@ -2604,7 +2608,7 @@ i40e_tx_queue_release(void *txq)
return;
}
 
-   i40e_tx_queue_release_mbufs(q);
+   ci_txq_release_all_mbufs(q);
rte_free(q->sw_ring);
rte_memzone_free(q->mz);
rte_free(q);
@@ -2701,66 +2705,6 @@ i40e_reset_rx_queue(struct i40e_rx_queue *rxq)
rxq->rxrearm_nb = 0;
 }
 
-void
-i40e_tx_queue_release_mbufs(struct ci_tx_queue *txq)
-{
-   struct rte_eth_dev *dev;
-   uint16_t i;
-
-   if (!txq || !txq->sw_ring) {
-   PMD_DRV_LOG(DEBUG, "Pointer to txq or sw_ring is NULL");
-   return;
-   }
-
-   dev = &rte_eth_devices[txq->port_id];
-
-   /**
-*  vPMD tx will not set sw_ring's mbuf to NULL after free,
-*  so need to free remains more carefully.
-*/
-#ifdef CC_AVX512_SUPPORT
-   if (dev->tx_pkt_burst == i40e_xmit_pkts_vec_avx512) {
-   struct ci_tx_entry_vec *swr = (void *)txq->sw_ring;
-
-   i = txq->tx_next_dd - txq->tx_rs_thresh + 1;
-   if (txq->tx_tail < i) {
-   for (; i < txq->nb_tx_desc; i++) {
-   rte_pktmbuf_free_seg(swr[i].mbuf);
-   swr[i].mbuf = NULL;
-   }
-   i = 0;
-   }
-   for (; i < txq->tx_tail; i++) {
-   rte_pktmbuf_free_seg(swr[i].mbuf);
-   swr[i].mbuf = NULL;
-   }
-   return;
-   }
-#endif
-   if (dev->tx_pkt_burst == i40e_xmit_pkts_vec_avx2 ||
-   dev->tx_pkt_burst == i40e_xmit_pkts_vec) {
-   i = txq->tx_next_dd - txq->tx_rs_thresh + 1;
-   if (txq->tx_tail < i) {
-   for (; i < txq->nb_tx_desc; i++) {
-   rte_pktmbuf_free_seg(txq->sw_ring[i].mbuf);
-   txq->sw_ring[i].mbuf = NULL;
-   }
-   i = 0;
-   }
-   for (; i < txq->tx_tail; i++) {
-   rte_pktmbuf_free_seg(txq->sw_ring[i].mbuf);
-   txq->sw_ring[i].mbuf = NULL;
-   }
-   } else {
-   for (i = 0; i < txq->nb_tx_desc; i++) {
-   if (txq->sw_ring[i].mbuf) {
-   rte_pktmbuf_free_seg(txq->sw_ring[i].mbuf);
-   txq->sw_ring[i].mbuf = NULL;
-   }
-   }
-   }
-}
-
 static int
 i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
uint32_t free_cnt)
@@ -3127,7 +3071,7 @@ i40e_dev_clear_queues(struct rte_eth_dev *dev)
for (i = 0; i < dev->data->nb_tx_queues; i++) {
if (!dev->data->tx_queues[i])
 

[PATCH v2 17/22] net/iavf: use common Tx queue mbuf cleanup fn

2024-12-03 Thread Bruce Richardson
Adjust the iavf driver to also use the common mbuf freeing functions on
Tx queue release/cleanup. The implementation is complicated a little by
the need to integrate the additional "has_ctx" parameter for the iavf
code, but the changes in other drivers are minimal - just a constant
"false" parameter.

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/tx.h  | 27 +-
 drivers/net/i40e/i40e_rxtx.c|  6 ++--
 drivers/net/iavf/iavf_rxtx.c| 37 ++---
 drivers/net/iavf/iavf_rxtx_vec_avx512.c | 24 ++--
 drivers/net/iavf/iavf_rxtx_vec_common.h | 18 
 drivers/net/iavf/iavf_rxtx_vec_sse.c|  9 ++
 drivers/net/ice/ice_dcf_ethdev.c|  4 +--
 drivers/net/ice/ice_rxtx.c  |  6 ++--
 drivers/net/ixgbe/ixgbe_rxtx.c  |  6 ++--
 9 files changed, 31 insertions(+), 106 deletions(-)

diff --git a/drivers/net/_common_intel/tx.h b/drivers/net/_common_intel/tx.h
index 1bf2a61b2f..310b51adcf 100644
--- a/drivers/net/_common_intel/tx.h
+++ b/drivers/net/_common_intel/tx.h
@@ -271,23 +271,23 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, 
ci_desc_done_fn desc_done, bool ctx
return txq->tx_rs_thresh;
 }
 
-#define IETH_FREE_BUFS_LOOP(txq, swr, start) do { \
+#define IETH_FREE_BUFS_LOOP(swr, nb_desc, start, end) do { \
uint16_t i = start; \
-   if (txq->tx_tail < i) { \
-   for (; i < txq->nb_tx_desc; i++) { \
+   if (end < i) { \
+   for (; i < nb_desc; i++) { \
rte_pktmbuf_free_seg(swr[i].mbuf); \
swr[i].mbuf = NULL; \
} \
i = 0; \
} \
-   for (; i < txq->tx_tail; i++) { \
+   for (; i < end; i++) { \
rte_pktmbuf_free_seg(swr[i].mbuf); \
swr[i].mbuf = NULL; \
} \
 } while (0)
 
 static inline void
-ci_txq_release_all_mbufs(struct ci_tx_queue *txq)
+ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 {
if (unlikely(!txq || !txq->sw_ring))
return;
@@ -306,15 +306,14 @@ ci_txq_release_all_mbufs(struct ci_tx_queue *txq)
 *  vPMD tx will not set sw_ring's mbuf to NULL after free,
 *  so need to free remains more carefully.
 */
-   const uint16_t start = txq->tx_next_dd - txq->tx_rs_thresh + 1;
-
-   if (txq->vector_sw_ring) {
-   struct ci_tx_entry_vec *swr = txq->sw_ring_vec;
-   IETH_FREE_BUFS_LOOP(txq, swr, start);
-   } else {
-   struct ci_tx_entry *swr = txq->sw_ring;
-   IETH_FREE_BUFS_LOOP(txq, swr, start);
-   }
+   const uint16_t start = (txq->tx_next_dd - txq->tx_rs_thresh + 1) >> use_ctx;
+   const uint16_t nb_desc = txq->nb_tx_desc >> use_ctx;
+   const uint16_t end = txq->tx_tail >> use_ctx;
+
+   if (txq->vector_sw_ring)
+   IETH_FREE_BUFS_LOOP(txq->sw_ring_vec, nb_desc, start, end);
+   else
+   IETH_FREE_BUFS_LOOP(txq->sw_ring, nb_desc, start, end);
 }
 
 #endif /* _COMMON_INTEL_TX_H_ */
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index b70919c5dc..081d743e62 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1933,7 +1933,7 @@ i40e_dev_tx_queue_stop(struct rte_eth_dev *dev, uint16_t 
tx_queue_id)
return err;
}
 
-   ci_txq_release_all_mbufs(txq);
+   ci_txq_release_all_mbufs(txq, false);
i40e_reset_tx_queue(txq);
dev->data->tx_queue_state[tx_queue_id] = RTE_ETH_QUEUE_STATE_STOPPED;
 
@@ -2608,7 +2608,7 @@ i40e_tx_queue_release(void *txq)
return;
}
 
-   ci_txq_release_all_mbufs(q);
+   ci_txq_release_all_mbufs(q, false);
rte_free(q->sw_ring);
rte_memzone_free(q->mz);
rte_free(q);
@@ -3071,7 +3071,7 @@ i40e_dev_clear_queues(struct rte_eth_dev *dev)
for (i = 0; i < dev->data->nb_tx_queues; i++) {
if (!dev->data->tx_queues[i])
continue;
-   ci_txq_release_all_mbufs(dev->data->tx_queues[i]);
+   ci_txq_release_all_mbufs(dev->data->tx_queues[i], false);
i40e_reset_tx_queue(dev->data->tx_queues[i]);
}
 
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index 7e381b2a17..f0ab881ac5 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -387,24 +387,6 @@ release_rxq_mbufs(struct iavf_rx_queue *rxq)
rxq->rx_nb_avail = 0;
 }
 
-static inline void
-release_txq_mbufs(struct ci_tx_queue *txq)
-{
-   uint16_t i;
-
-   if (!txq || !txq->sw_ring) {
-   PMD_DRV_LOG(DEBUG, "Pointer to rxq or sw_ring is NULL");
-   return;
-   }
-
-   for (i = 0; i < txq->nb_tx_desc; i++) {
-   if (txq->sw

[PATCH v2 16/22] net/ixgbe: use common Tx queue mbuf cleanup fn

2024-12-03 Thread Bruce Richardson
Update driver to use the common cleanup function.

Signed-off-by: Bruce Richardson 
---
 drivers/net/ixgbe/ixgbe_rxtx.c| 22 +++---
 drivers/net/ixgbe/ixgbe_rxtx.h|  1 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 28 ++-
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c   |  7 --
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c|  7 --
 5 files changed, 5 insertions(+), 60 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 344ef85685..bf9d461b06 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -2334,21 +2334,6 @@ ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct 
rte_mbuf **rx_pkts,
  *
  **/
 
-static void __rte_cold
-ixgbe_tx_queue_release_mbufs(struct ci_tx_queue *txq)
-{
-   unsigned i;
-
-   if (txq->sw_ring != NULL) {
-   for (i = 0; i < txq->nb_tx_desc; i++) {
-   if (txq->sw_ring[i].mbuf != NULL) {
-   rte_pktmbuf_free_seg(txq->sw_ring[i].mbuf);
-   txq->sw_ring[i].mbuf = NULL;
-   }
-   }
-   }
-}
-
 static int
 ixgbe_tx_done_cleanup_full(struct ci_tx_queue *txq, uint32_t free_cnt)
 {
@@ -2472,7 +2457,7 @@ static void __rte_cold
 ixgbe_tx_queue_release(struct ci_tx_queue *txq)
 {
if (txq != NULL && txq->ops != NULL) {
-   txq->ops->release_mbufs(txq);
+   ci_txq_release_all_mbufs(txq);
txq->ops->free_swring(txq);
rte_memzone_free(txq->mz);
rte_free(txq);
@@ -2526,7 +2511,6 @@ ixgbe_reset_tx_queue(struct ci_tx_queue *txq)
 }
 
 static const struct ixgbe_txq_ops def_txq_ops = {
-   .release_mbufs = ixgbe_tx_queue_release_mbufs,
.free_swring = ixgbe_tx_free_swring,
.reset = ixgbe_reset_tx_queue,
 };
@@ -3380,7 +3364,7 @@ ixgbe_dev_clear_queues(struct rte_eth_dev *dev)
struct ci_tx_queue *txq = dev->data->tx_queues[i];
 
if (txq != NULL) {
-   txq->ops->release_mbufs(txq);
+   ci_txq_release_all_mbufs(txq);
txq->ops->reset(txq);
dev->data->tx_queue_state[i] = 
RTE_ETH_QUEUE_STATE_STOPPED;
}
@@ -5655,7 +5639,7 @@ ixgbe_dev_tx_queue_stop(struct rte_eth_dev *dev, uint16_t 
tx_queue_id)
}
 
if (txq->ops != NULL) {
-   txq->ops->release_mbufs(txq);
+   ci_txq_release_all_mbufs(txq);
txq->ops->reset(txq);
}
dev->data->tx_queue_state[tx_queue_id] = RTE_ETH_QUEUE_STATE_STOPPED;
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 4333e5bf2f..11689eb432 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -181,7 +181,6 @@ struct ixgbe_advctx_info {
 };
 
 struct ixgbe_txq_ops {
-   void (*release_mbufs)(struct ci_tx_queue *txq);
void (*free_swring)(struct ci_tx_queue *txq);
void (*reset)(struct ci_tx_queue *txq);
 };
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h 
b/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
index 81fd8bb64d..65794e45cb 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
@@ -78,32 +78,6 @@ tx_backlog_entry(struct ci_tx_entry_vec *txep,
txep[i].mbuf = tx_pkts[i];
 }
 
-static inline void
-_ixgbe_tx_queue_release_mbufs_vec(struct ci_tx_queue *txq)
-{
-   unsigned int i;
-   struct ci_tx_entry_vec *txe;
-   const uint16_t max_desc = (uint16_t)(txq->nb_tx_desc - 1);
-
-   if (txq->sw_ring == NULL || txq->nb_tx_free == max_desc)
-   return;
-
-   /* release the used mbufs in sw_ring */
-   for (i = txq->tx_next_dd - (txq->tx_rs_thresh - 1);
-i != txq->tx_tail;
-i = (i + 1) % txq->nb_tx_desc) {
-   txe = &txq->sw_ring_vec[i];
-   rte_pktmbuf_free_seg(txe->mbuf);
-   }
-   txq->nb_tx_free = max_desc;
-
-   /* reset tx_entry */
-   for (i = 0; i < txq->nb_tx_desc; i++) {
-   txe = &txq->sw_ring_vec[i];
-   txe->mbuf = NULL;
-   }
-}
-
 static inline void
 _ixgbe_rx_queue_release_mbufs_vec(struct ixgbe_rx_queue *rxq)
 {
@@ -208,6 +182,8 @@ ixgbe_txq_vec_setup_default(struct ci_tx_queue *txq,
/* leave the first one for overflow */
txq->sw_ring_vec = txq->sw_ring_vec + 1;
txq->ops = txq_ops;
+   txq->vector_tx = 1;
+   txq->vector_sw_ring = 1;
 
return 0;
 }
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index cb749a3760..2ccb399b64 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -633,12 +633,6 @@ ixgbe_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf 

[PATCH v2 18/22] net/ice: use vector SW ring for all vector paths

2024-12-03 Thread Bruce Richardson
The AVX-512 code path used a smaller SW ring structure only containing
the mbuf pointer, but no other fields. The other fields are only used in
the scalar code path, so update all vector driver code paths to use the
smaller, faster structure.
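
The size difference driving this change can be illustrated with two simplified entry layouts (these are sketches, not the exact DPDK definitions — the real scalar entry carries next/last-id bookkeeping that the vector paths never read):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the scalar and vector SW ring entries. */
struct tx_entry {		/* scalar path: mbuf plus bookkeeping */
	void *mbuf;
	uint16_t next_id;
	uint16_t last_id;
};

struct tx_entry_vec {		/* vector path: just the mbuf pointer */
	void *mbuf;
};
```

On a 64-bit target the vector entry is half the size of the scalar one, so the vector backlog and free loops touch half as much sw_ring memory per burst — the "smaller, faster structure" referred to above.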

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/tx.h|  7 +++
 drivers/net/ice/ice_rxtx.c|  2 +-
 drivers/net/ice/ice_rxtx_vec_avx2.c   | 12 ++--
 drivers/net/ice/ice_rxtx_vec_avx512.c | 14 ++
 drivers/net/ice/ice_rxtx_vec_common.h |  6 --
 drivers/net/ice/ice_rxtx_vec_sse.c| 12 ++--
 6 files changed, 22 insertions(+), 31 deletions(-)

diff --git a/drivers/net/_common_intel/tx.h b/drivers/net/_common_intel/tx.h
index 310b51adcf..aa42b9b49f 100644
--- a/drivers/net/_common_intel/tx.h
+++ b/drivers/net/_common_intel/tx.h
@@ -109,6 +109,13 @@ ci_tx_backlog_entry(struct ci_tx_entry *txep, struct 
rte_mbuf **tx_pkts, uint16_
txep[i].mbuf = tx_pkts[i];
 }
 
+static __rte_always_inline void
+ci_tx_backlog_entry_vec(struct ci_tx_entry_vec *txep, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+   for (uint16_t i = 0; i < nb_pkts; ++i)
+   txep[i].mbuf = tx_pkts[i];
+}
+
 #define IETH_VPMD_TX_MAX_FREE_BUF 64
 
 typedef int (*ci_desc_done_fn)(struct ci_tx_queue *txq, uint16_t idx);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index ad0ddf6a88..77cb6688a7 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -825,7 +825,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t 
tx_queue_id)
 
/* record what kind of descriptor cleanup we need on teardown */
txq->vector_tx = ad->tx_vec_allowed;
-   txq->vector_sw_ring = ad->tx_use_avx512;
+   txq->vector_sw_ring = txq->vector_tx;
 
dev->data->tx_queue_state[tx_queue_id] = RTE_ETH_QUEUE_STATE_STARTED;
 
diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c 
b/drivers/net/ice/ice_rxtx_vec_avx2.c
index 12ffa0fa9a..98bab322b4 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -858,7 +858,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct 
rte_mbuf **tx_pkts,
 {
struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
volatile struct ice_tx_desc *txdp;
-   struct ci_tx_entry *txep;
+   struct ci_tx_entry_vec *txep;
uint16_t n, nb_commit, tx_id;
uint64_t flags = ICE_TD_CMD;
uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
@@ -867,7 +867,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct 
rte_mbuf **tx_pkts,
nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
 
if (txq->nb_tx_free < txq->tx_free_thresh)
-   ice_tx_free_bufs_vec(txq);
+   ci_tx_free_bufs_vec(txq, ice_tx_desc_done, false);
 
nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
if (unlikely(nb_pkts == 0))
@@ -875,13 +875,13 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct 
rte_mbuf **tx_pkts,
 
tx_id = txq->tx_tail;
txdp = &txq->ice_tx_ring[tx_id];
-   txep = &txq->sw_ring[tx_id];
+   txep = &txq->sw_ring_vec[tx_id];
 
txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 
n = (uint16_t)(txq->nb_tx_desc - tx_id);
if (nb_commit >= n) {
-   ci_tx_backlog_entry(txep, tx_pkts, n);
+   ci_tx_backlog_entry_vec(txep, tx_pkts, n);
 
ice_vtx(txdp, tx_pkts, n - 1, flags, offload);
tx_pkts += (n - 1);
@@ -896,10 +896,10 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct 
rte_mbuf **tx_pkts,
 
/* avoid reach the end of ring */
txdp = &txq->ice_tx_ring[tx_id];
-   txep = &txq->sw_ring[tx_id];
+   txep = &txq->sw_ring_vec[tx_id];
}
 
-   ci_tx_backlog_entry(txep, tx_pkts, nb_commit);
+   ci_tx_backlog_entry_vec(txep, tx_pkts, nb_commit);
 
ice_vtx(txdp, tx_pkts, nb_commit, flags, offload);
 
diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c 
b/drivers/net/ice/ice_rxtx_vec_avx512.c
index f6ec593f96..481f784e34 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx512.c
@@ -924,16 +924,6 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf 
**pkt,
}
 }
 
-static __rte_always_inline void
-ice_tx_backlog_entry_avx512(struct ci_tx_entry_vec *txep,
-   struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
-{
-   int i;
-
-   for (i = 0; i < (int)nb_pkts; ++i)
-   txep[i].mbuf = tx_pkts[i];
-}
-
 static __rte_always_inline uint16_t
 ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts, bool do_offload)
@@ -964,7 +954,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct 
rte_mbuf **tx_pkts,
 
n = (uint16_t)(txq->nb_tx_desc - tx_id);
if (nb_commit >= n) {
-   ice_tx

[PATCH v2 13/22] net/iavf: use common Tx free fn for AVX-512

2024-12-03 Thread Bruce Richardson
Switch the iavf driver to use the common Tx free function. This requires
one additional parameter to that function, since iavf sometimes uses
context descriptors which means that we have double the descriptors per
SW ring slot.
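
The "double the descriptors per SW ring slot" adjustment is done with a shift by the boolean flag: when context descriptors are enabled, each SW ring slot accounts for two HW descriptors, so HW-descriptor counts and indexes are halved before indexing the SW ring. A minimal sketch of that convention (illustrative name, not the DPDK API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert a HW-descriptor count or index into SW-ring units.
 * With ctx_descs == false this is the identity; with ctx_descs == true
 * each SW slot covers two HW descriptors, so the value is halved. */
static inline uint32_t
hw_to_sw_units(uint32_t n, bool ctx_descs)
{
	return n >> ctx_descs;	/* shift by 0 or 1 */
}
```

This mirrors expressions in the patch such as `txq->tx_rs_thresh >> ctx_descs` and `txq->tx_next_dd >> ctx_descs`; since the flag is a compile-time constant bool at each call site, the shift folds away for drivers that never use context descriptors.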

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/tx.h  |   6 +-
 drivers/net/i40e/i40e_rxtx_vec_avx512.c |   2 +-
 drivers/net/iavf/iavf_rxtx_vec_avx512.c | 119 +---
 drivers/net/ice/ice_rxtx_vec_avx512.c   |   2 +-
 4 files changed, 7 insertions(+), 122 deletions(-)

diff --git a/drivers/net/_common_intel/tx.h b/drivers/net/_common_intel/tx.h
index 84ff839672..26aef528fa 100644
--- a/drivers/net/_common_intel/tx.h
+++ b/drivers/net/_common_intel/tx.h
@@ -179,7 +179,7 @@ ci_tx_free_bufs(struct ci_tx_queue *txq, ci_desc_done_fn 
desc_done)
 }
 
 static __rte_always_inline int
-ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done)
+ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool ctx_descs)
 {
int nb_free = 0;
struct rte_mbuf *free[IETH_VPMD_TX_MAX_FREE_BUF];
@@ -189,13 +189,13 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, 
ci_desc_done_fn desc_done)
if (!desc_done(txq, txq->tx_next_dd))
return 0;
 
-   const uint32_t n = txq->tx_rs_thresh;
+   const uint32_t n = txq->tx_rs_thresh >> ctx_descs;
 
/* first buffer to free from S/W ring is at index
 * tx_next_dd - (tx_rs_thresh - 1)
 */
struct ci_tx_entry_vec *txep = txq->sw_ring_vec;
-   txep += txq->tx_next_dd - (n - 1);
+   txep += (txq->tx_next_dd >> ctx_descs) - (n - 1);
 
	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE && (n & 31) == 0) {
struct rte_mempool *mp = txep[0].mbuf->pool;
diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c 
b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
index 9bb2a44231..c555c3491d 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
@@ -829,7 +829,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct 
rte_mbuf **tx_pkts,
uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
 
if (txq->nb_tx_free < txq->tx_free_thresh)
-   ci_tx_free_bufs_vec(txq, i40e_tx_desc_done);
+   ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
 
nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
if (unlikely(nb_pkts == 0))
diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c 
b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
index 9cf7171524..8543490c70 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
@@ -1844,121 +1844,6 @@ 
iavf_recv_scattered_pkts_vec_avx512_flex_rxd_offload(void *rx_queue,
true);
 }
 
-static __rte_always_inline int
-iavf_tx_free_bufs_avx512(struct ci_tx_queue *txq)
-{
-   struct ci_tx_entry_vec *txep;
-   uint32_t n;
-   uint32_t i;
-   int nb_free = 0;
-   struct rte_mbuf *m, *free[IAVF_VPMD_TX_MAX_FREE_BUF];
-
-   /* check DD bits on threshold descriptor */
-   if ((txq->iavf_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) !=
-   rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE))
-   return 0;
-
-   n = txq->tx_rs_thresh >> txq->use_ctx;
-
-/* first buffer to free from S/W ring is at index
- * tx_next_dd - (tx_rs_thresh-1)
- */
-   txep = (void *)txq->sw_ring;
-   txep += (txq->tx_next_dd >> txq->use_ctx) - (n - 1);
-
-   if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE && (n & 31) == 0) {
-   struct rte_mempool *mp = txep[0].mbuf->pool;
-   struct rte_mempool_cache *cache = rte_mempool_default_cache(mp,
-   rte_lcore_id());
-   void **cache_objs;
-
-   if (!cache || cache->len == 0)
-   goto normal;
-
-   cache_objs = &cache->objs[cache->len];
-
-   if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
-   rte_mempool_ops_enqueue_bulk(mp, (void *)txep, n);
-   goto done;
-   }
-
-   /* The cache follows the following algorithm
-*   1. Add the objects to the cache
-*   2. Anything greater than the cache min value (if it crosses the
-*   cache flush threshold) is flushed to the ring.
-*/
-   /* Add elements back into the cache */
-   uint32_t copied = 0;
-   /* n is multiple of 32 */
-   while (copied < n) {
-#ifdef RTE_ARCH_64
-   const __m512i a = _mm512_loadu_si512(&txep[copied]);
-   const __m512i b = _mm512_loadu_si512(&txep[copied + 8]);
-   const __m512i c = _mm

[PATCH v2 20/22] net/iavf: use vector SW ring for all vector paths

2024-12-03 Thread Bruce Richardson
The AVX-512 code path used a smaller SW ring structure only containing
the mbuf pointer, but no other fields. The other fields are only used in
the scalar code path, so update all vector driver code paths (AVX2, SSE)
to use the smaller, faster structure.

Signed-off-by: Bruce Richardson 
---
 drivers/net/iavf/iavf_rxtx.c|  7 ---
 drivers/net/iavf/iavf_rxtx_vec_avx2.c   | 12 ++--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c |  8 
 drivers/net/iavf/iavf_rxtx_vec_common.h |  6 --
 drivers/net/iavf/iavf_rxtx_vec_sse.c| 14 +++---
 5 files changed, 13 insertions(+), 34 deletions(-)

diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index f0ab881ac5..6692f6992b 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -4193,14 +4193,7 @@ iavf_set_tx_function(struct rte_eth_dev *dev)
txq = dev->data->tx_queues[i];
if (!txq)
continue;
-#ifdef CC_AVX512_SUPPORT
-   if (use_avx512)
-   iavf_txq_vec_setup_avx512(txq);
-   else
-   iavf_txq_vec_setup(txq);
-#else
iavf_txq_vec_setup(txq);
-#endif
}
 
if (no_poll_on_link_down) {
diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c 
b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
index fdb98b417a..b847886081 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
@@ -1736,14 +1736,14 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct 
rte_mbuf **tx_pkts,
 {
struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
volatile struct iavf_tx_desc *txdp;
-   struct ci_tx_entry *txep;
+   struct ci_tx_entry_vec *txep;
uint16_t n, nb_commit, tx_id;
/* bit2 is reserved and must be set to 1 according to Spec */
uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
 
if (txq->nb_tx_free < txq->tx_free_thresh)
-   iavf_tx_free_bufs(txq);
+   ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, false);
 
nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
if (unlikely(nb_pkts == 0))
@@ -1752,13 +1752,13 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct 
rte_mbuf **tx_pkts,
 
tx_id = txq->tx_tail;
txdp = &txq->iavf_tx_ring[tx_id];
-   txep = &txq->sw_ring[tx_id];
+   txep = &txq->sw_ring_vec[tx_id];
 
txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 
n = (uint16_t)(txq->nb_tx_desc - tx_id);
if (nb_commit >= n) {
-   ci_tx_backlog_entry(txep, tx_pkts, n);
+   ci_tx_backlog_entry_vec(txep, tx_pkts, n);
 
iavf_vtx(txdp, tx_pkts, n - 1, flags, offload);
tx_pkts += (n - 1);
@@ -1773,10 +1773,10 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct 
rte_mbuf **tx_pkts,
 
/* avoid reach the end of ring */
txdp = &txq->iavf_tx_ring[tx_id];
-   txep = &txq->sw_ring[tx_id];
+   txep = &txq->sw_ring_vec[tx_id];
}
 
-   ci_tx_backlog_entry(txep, tx_pkts, nb_commit);
+   ci_tx_backlog_entry_vec(txep, tx_pkts, nb_commit);
 
iavf_vtx(txdp, tx_pkts, nb_commit, flags, offload);
 
diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c 
b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
index 007759e451..641f3311eb 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
@@ -2357,14 +2357,6 @@ iavf_xmit_pkts_vec_avx512(void *tx_queue, struct 
rte_mbuf **tx_pkts,
return iavf_xmit_pkts_vec_avx512_cmn(tx_queue, tx_pkts, nb_pkts, false);
 }
 
-int __rte_cold
-iavf_txq_vec_setup_avx512(struct ci_tx_queue *txq)
-{
-   txq->vector_tx = true;
-   txq->vector_sw_ring = true;
-   return 0;
-}
-
 uint16_t
 iavf_xmit_pkts_vec_avx512_offload(void *tx_queue, struct rte_mbuf **tx_pkts,
  uint16_t nb_pkts)
diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h 
b/drivers/net/iavf/iavf_rxtx_vec_common.h
index 6f94587eee..c69399a173 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/iavf/iavf_rxtx_vec_common.h
@@ -24,12 +24,6 @@ iavf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 }
 
-static __rte_always_inline int
-iavf_tx_free_bufs(struct ci_tx_queue *txq)
-{
-   return ci_tx_free_bufs(txq, iavf_tx_desc_done);
-}
-
 static inline void
 _iavf_rx_queue_release_mbufs_vec(struct iavf_rx_queue *rxq)
 {
diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c 
b/drivers/net/iavf/iavf_rxtx_vec_sse.c
index 3adf2a59e4..9f7db80bfd 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_sse.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c
@@ -1368,14 +1368,14 @@ iavf_xm

Re: About DPDK l3fwd

2024-12-03 Thread Liang Ma
On Fri, Nov 22, 2024 at 09:45:58AM +, 王颢 wrote:
> Dear all,
> 
> I have noticed that the performance reports on the official DPDK website 
> predominantly use l3fwd, while we typically utilize pktgen. Out of curiosity, 
> I attempted to replicate the experiments from the performance reports. 
> However, I encountered some issues, as shown in the attached image.
> 
> I ran pktgen on PC1 and l3fwd on PC2. After establishing the connection, I 
> entered the command str on pktgen. It appears that pktgen is able to send and 
> receive packets normally, but there is no output from l3fwd. Based on my 
> understanding, l3fwd should produce some output, correct?
> 
> Could you please share how you usually conduct these tests for your 
> performance reports?
> 
> Thank you for your assistance.
> 
> Best regards,
> Howard Wang
> 
> [cid:image001.png@01DB3D05.9B91B530]
> 
> [cid:image002.png@01DB3D05.9B91B530]
Hi Howard, 
   You can check out the performance test report. In most cases, the report 
will include all setup details (including the command line).
Regards
Liang




[PATCH v2 22/22] net/ixgbe: use common Tx backlog entry fn

2024-12-03 Thread Bruce Richardson
Remove the custom vector Tx backlog entry function and use the standard
intel_common one, now that all vector drivers are using the same,
smaller ring structure.

Signed-off-by: Bruce Richardson 
---
 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 10 --
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c   |  4 ++--
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c|  4 ++--
 3 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h 
b/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
index 3d4840c3b7..7316fc6c3b 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
@@ -68,16 +68,6 @@ ixgbe_tx_free_bufs(struct ci_tx_queue *txq)
return txq->tx_rs_thresh;
 }
 
-static __rte_always_inline void
-tx_backlog_entry(struct ci_tx_entry_vec *txep,
-struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
-{
-   int i;
-
-   for (i = 0; i < (int)nb_pkts; ++i)
-   txep[i].mbuf = tx_pkts[i];
-}
-
 static inline void
 _ixgbe_rx_queue_release_mbufs_vec(struct ixgbe_rx_queue *rxq)
 {
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index 2ccb399b64..f879f6fa9a 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -597,7 +597,7 @@ ixgbe_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,
 
n = (uint16_t)(txq->nb_tx_desc - tx_id);
if (nb_commit >= n) {
-   tx_backlog_entry(txep, tx_pkts, n);
+   ci_tx_backlog_entry_vec(txep, tx_pkts, n);
 
for (i = 0; i < n - 1; ++i, ++tx_pkts, ++txdp)
vtx1(txdp, *tx_pkts, flags);
@@ -614,7 +614,7 @@ ixgbe_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,
txep = &txq->sw_ring_vec[tx_id];
}
 
-   tx_backlog_entry(txep, tx_pkts, nb_commit);
+   ci_tx_backlog_entry_vec(txep, tx_pkts, nb_commit);
 
vtx(txdp, tx_pkts, nb_commit, flags);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
index fa26365f06..915358e16b 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
@@ -720,7 +720,7 @@ ixgbe_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,
n = (uint16_t)(txq->nb_tx_desc - tx_id);
if (nb_commit >= n) {
 
-   tx_backlog_entry(txep, tx_pkts, n);
+   ci_tx_backlog_entry_vec(txep, tx_pkts, n);
 
for (i = 0; i < n - 1; ++i, ++tx_pkts, ++txdp)
vtx1(txdp, *tx_pkts, flags);
@@ -737,7 +737,7 @@ ixgbe_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,
txep = &txq->sw_ring_vec[tx_id];
}
 
-   tx_backlog_entry(txep, tx_pkts, nb_commit);
+   ci_tx_backlog_entry_vec(txep, tx_pkts, nb_commit);
 
vtx(txdp, tx_pkts, nb_commit, flags);
 
-- 
2.43.0



[PATCH v2 00/22] Reduce code duplication across Intel NIC drivers

2024-12-03 Thread Bruce Richardson
This RFC attempts to reduce the amount of code duplication across a
number of Intel NIC drivers, specifically: ixgbe, i40e, iavf, and ice.

The first patch extracts a function on the Rx side; otherwise the
majority of the changes are on the Tx side, leading to a converged Tx
queue structure across the 4 drivers, and a large number of common
functions.

v1->v2:
* Fix two additional checkpatch issues that were flagged.
* Added in patch 21, which performs additional cleanup that is possible
  once all vector drivers use the same mbuf free/release process.
  [This brings the patchset to having over twice as many lines removed
  as added (1887 vs 930), and close to having a net removal of 1kloc]

RFC->v1:
* Moved the location of the common code from "common/intel_eth" to
  "net/_common_intel", and added only ".." to the driver include path so
  that the paths included "_common_intel" in them, to make it clear it's
  not driver-local headers.
* Due to change in location, structure/fn prefix changes from "ieth" to
  "ci" for "common intel".
* Removed the seeming-arbitrary split of vector and non-vector code -
  since much of the code taken from vector files was scalar code which
  was used by the vector drivers.
* Split code into separate Rx and Tx files.
* Fixed multiple checkpatch issues (but not all).
* Attempted to improve name standardization, by using "_vec" as a common
  suffix for all vector-related fns and data. Previously, some names had
  "vec" in the middle, others had just "_v" suffix or full word "vector"
  as suffix.
* Other minor changes...

Bruce Richardson (22):
  net/_common_intel: add pkt reassembly fn for intel drivers
  net/_common_intel: provide common Tx entry structures
  net/_common_intel: add Tx mbuf ring replenish fn
  drivers/net: align Tx queue struct field names
  drivers/net: add prefix for driver-specific structs
  net/_common_intel: merge ice and i40e Tx queue struct
  net/iavf: use common Tx queue structure
  net/ixgbe: convert Tx queue context cache field to ptr
  net/ixgbe: use common Tx queue structure
  net/_common_intel: pack Tx queue structure
  net/_common_intel: add post-Tx buffer free function
  net/_common_intel: add Tx buffer free fn for AVX-512
  net/iavf: use common Tx free fn for AVX-512
  net/ice: move Tx queue mbuf cleanup fn to common
  net/i40e: use common Tx queue mbuf cleanup fn
  net/ixgbe: use common Tx queue mbuf cleanup fn
  net/iavf: use common Tx queue mbuf cleanup fn
  net/ice: use vector SW ring for all vector paths
  net/i40e: use vector SW ring for all vector paths
  net/iavf: use vector SW ring for all vector paths
  net/_common_intel: remove unneeded code
  net/ixgbe: use common Tx backlog entry fn

 drivers/net/_common_intel/rx.h|  79 ++
 drivers/net/_common_intel/tx.h| 249 ++
 drivers/net/i40e/i40e_ethdev.c|   4 +-
 drivers/net/i40e/i40e_ethdev.h|   8 +-
 drivers/net/i40e/i40e_fdir.c  |  10 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |   6 +-
 drivers/net/i40e/i40e_rxtx.c  | 192 +-
 drivers/net/i40e/i40e_rxtx.h  |  61 +
 drivers/net/i40e/i40e_rxtx_vec_altivec.c  |  26 +-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c |  26 +-
 drivers/net/i40e/i40e_rxtx_vec_avx512.c   | 144 +-
 drivers/net/i40e/i40e_rxtx_vec_common.h   | 144 +-
 drivers/net/i40e/i40e_rxtx_vec_neon.c |  26 +-
 drivers/net/i40e/i40e_rxtx_vec_sse.c  |  26 +-
 drivers/net/i40e/meson.build  |   2 +-
 drivers/net/iavf/iavf.h   |   2 +-
 drivers/net/iavf/iavf_ethdev.c|   4 +-
 drivers/net/iavf/iavf_rxtx.c  | 180 +
 drivers/net/iavf/iavf_rxtx.h  |  61 +
 drivers/net/iavf/iavf_rxtx_vec_avx2.c |  47 ++--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c   | 214 +++
 drivers/net/iavf/iavf_rxtx_vec_common.h   | 160 +--
 drivers/net/iavf/iavf_rxtx_vec_sse.c  |  56 ++--
 drivers/net/iavf/iavf_vchnl.c |   8 +-
 drivers/net/iavf/meson.build  |   2 +-
 drivers/net/ice/ice_dcf.c |   4 +-
 drivers/net/ice/ice_dcf_ethdev.c  |  21 +-
 drivers/net/ice/ice_diagnose.c|   2 +-
 drivers/net/ice/ice_ethdev.c  |   2 +-
 drivers/net/ice/ice_ethdev.h  |   7 +-
 drivers/net/ice/ice_rxtx.c| 163 +---
 drivers/net/ice/ice_rxtx.h|  52 +---
 drivers/net/ice/ice_rxtx_vec_avx2.c   |  26 +-
 drivers/net/ice/ice_rxtx_vec_avx512.c | 153 +--
 drivers/net/ice/ice_rxtx_vec_common.h | 190 +
 drivers/net/ice/ice_rxtx_vec_sse.c|  32 +--
 drivers/net/ice/meson.build   |   2 +-
 drivers/net/ixgbe/base/ixgbe_osdep.h  |   2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c  

Re: [PATCH] net/mlx5: fix hypervisor detection in VLAN workaround

2024-12-03 Thread Stephen Hemminger
On Tue, 3 Dec 2024 18:22:00 +0200
Viacheslav Ovsiienko  wrote:

> diff --git a/drivers/net/mlx5/linux/mlx5_vlan_os.c 
> b/drivers/net/mlx5/linux/mlx5_vlan_os.c
> index 81611a8d3f..017953d5cc 100644
> --- a/drivers/net/mlx5/linux/mlx5_vlan_os.c
> +++ b/drivers/net/mlx5/linux/mlx5_vlan_os.c
> @@ -112,7 +112,9 @@ mlx5_vlan_vmwa_init(struct rte_eth_dev *dev, uint32_t 
> ifindex)
>   /* Check whether there is desired virtual environment */
>   hv_type = rte_hypervisor_get();
>   switch (hv_type) {
> +#if defined(RTE_ARCH_X86) || defined(RTE_ARCH_X86_64)
>   case RTE_HYPERVISOR_UNKNOWN:
> +#endif
>   case RTE_HYPERVISOR_VMWARE:
>   /*
>* The "white list" of configurations

Could you fix that comment as well?
We got rid of all uses of "whitelist" in DPDK; it looks like this one was missed.


[PATCH v2 12/22] net/_common_intel: add Tx buffer free fn for AVX-512

2024-12-03 Thread Bruce Richardson
AVX-512 code paths for ice and i40e drivers are common, and differ from
the regular post-Tx free function in that the SW ring from which the
buffers are freed does not contain anything other than the mbuf pointer.
Merge these into a common function in intel_common to reduce
duplication.
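
The fast-free branch of the merged function leans on the mempool's per-lcore cache: freed mbufs are appended to the cache, and only once the cache crosses its flush threshold is the excess above its nominal size pushed back to the shared ring in one bulk operation. A toy model of that bookkeeping (field names echo `rte_mempool_cache`, but this is only a sketch of the counting, not the real mempool code):

```c
#include <assert.h>

/* Toy model of the per-lcore mempool cache fill/flush bookkeeping. */
struct toy_cache {
	unsigned int len;		/* objects currently cached */
	unsigned int size;		/* nominal cache size */
	unsigned int flushthresh;	/* flush when len reaches this */
};

/* Add n objects to the cache; return how many get flushed to the ring. */
static unsigned int
toy_cache_put(struct toy_cache *c, unsigned int n)
{
	unsigned int flushed = 0;

	c->len += n;
	if (c->len >= c->flushthresh) {
		flushed = c->len - c->size;	/* bulk-enqueue the excess */
		c->len = c->size;
	}
	return flushed;
}
```

The point of the design is amortization: most frees are a handful of pointer copies into the cache, and the shared (contended) ring is only touched occasionally, in large bulk enqueues.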

Signed-off-by: Bruce Richardson 
---
 drivers/net/_common_intel/tx.h  |  92 +++
 drivers/net/i40e/i40e_rxtx_vec_avx512.c | 114 +--
 drivers/net/ice/ice_rxtx_vec_avx512.c   | 117 +---
 3 files changed, 94 insertions(+), 229 deletions(-)

diff --git a/drivers/net/_common_intel/tx.h b/drivers/net/_common_intel/tx.h
index a930309c05..84ff839672 100644
--- a/drivers/net/_common_intel/tx.h
+++ b/drivers/net/_common_intel/tx.h
@@ -178,4 +178,96 @@ ci_tx_free_bufs(struct ci_tx_queue *txq, ci_desc_done_fn 
desc_done)
return txq->tx_rs_thresh;
 }
 
+static __rte_always_inline int
+ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done)
+{
+   int nb_free = 0;
+   struct rte_mbuf *free[IETH_VPMD_TX_MAX_FREE_BUF];
+   struct rte_mbuf *m;
+
+   /* check DD bits on threshold descriptor */
+   if (!desc_done(txq, txq->tx_next_dd))
+   return 0;
+
+   const uint32_t n = txq->tx_rs_thresh;
+
+   /* first buffer to free from S/W ring is at index
+* tx_next_dd - (tx_rs_thresh - 1)
+*/
+   struct ci_tx_entry_vec *txep = txq->sw_ring_vec;
+   txep += txq->tx_next_dd - (n - 1);
+
+   if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE && (n & 31) == 0) {
+   struct rte_mempool *mp = txep[0].mbuf->pool;
+   void **cache_objs;
+   struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
+
+   if (!cache || cache->len == 0)
+   goto normal;
+
+   cache_objs = &cache->objs[cache->len];
+
+   if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
+   rte_mempool_ops_enqueue_bulk(mp, (void *)txep, n);
+   goto done;
+   }
+
+   /* The cache follows the following algorithm
+*   1. Add the objects to the cache
+*   2. Anything greater than the cache min value (if it
+*   crosses the cache flush threshold) is flushed to the ring.
+*/
+   /* Add elements back into the cache */
+   uint32_t copied = 0;
+   /* n is multiple of 32 */
+   while (copied < n) {
+   memcpy(&cache_objs[copied], &txep[copied], 32 * sizeof(void *));
+   copied += 32;
+   }
+   cache->len += n;
+
+   if (cache->len >= cache->flushthresh) {
+   rte_mempool_ops_enqueue_bulk(mp, &cache->objs[cache->size],
+   cache->len - cache->size);
+   cache->len = cache->size;
+   }
+   goto done;
+   }
+
+normal:
+   m = rte_pktmbuf_prefree_seg(txep[0].mbuf);
+   if (likely(m)) {
+   free[0] = m;
+   nb_free = 1;
+   for (uint32_t i = 1; i < n; i++) {
+   m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
+   if (likely(m)) {
+   if (likely(m->pool == free[0]->pool)) {
+   free[nb_free++] = m;
+   } else {
+   rte_mempool_put_bulk(free[0]->pool, (void *)free, nb_free);
+   free[0] = m;
+   nb_free = 1;
+   }
+   }
+   }
+   rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+   } else {
+   for (uint32_t i = 1; i < n; i++) {
+   m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
+   if (m)
+   rte_mempool_put(m->pool, m);
+   }
+   }
+
+done:
+   /* buffers were freed, update counters */
+   txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
+   txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
+   if (txq->tx_next_dd >= txq->nb_tx_desc)
+   txq->tx_next_dd = (uint16_t)(txq->tx_rs_thresh - 1);
+
+   return txq->tx_rs_thresh;
+}
+
 #endif /* _COMMON_INTEL_TX_H_ */
diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
index a3f6d1667f..9bb2a44231 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
@@ -754,118 +754,6 @@ i40e_recv_scattered_pkts_vec_avx512(void *rx_queue,
rx_pkts + retval, nb_pkts);
 }
 
-static __rte_always_inline int
-i40e_tx_free_bufs_avx512(struct c

Re: [PATCH v2 1/2] usertools/devbind: update coding style

2024-12-03 Thread Stephen Hemminger
On Tue,  3 Dec 2024 11:25:00 +
Anatoly Burakov  wrote:

> +
> +def check_installed(program: str, package: str) -> None:
> +"""Check if a program is installed."""
> +if subprocess.call(
> +["which", program], stdout=subprocess.DEVNULL, 
> stderr=subprocess.DEVNULL
> +):
> +raise DevbindError(f"'{program}' not found - please install 
> '{package}'.")
> +

Apparently the POSIX way to do this is to use "command -v", not "which":


   command [-pVv] command [arg ...]
  Run  command  with  args  suppressing  the  normal shell function
  lookup.  Only builtin commands or commands found in the PATH  are
  executed.   If  the -p option is given, the search for command is
  performed using a default value for PATH that  is  guaranteed  to
  find  all  of the standard utilities.  If either the -V or -v op‐
  tion is supplied, a description of command is  printed.   The  -v
  option  causes  a  single word indicating the command or filename
  used to invoke command to be displayed; the -V option produces  a
  more  verbose  description.   If the -V or -v option is supplied,
  the exit status is 0 if command was found, and 1 if not.  If nei‐
  ther option is supplied and an error occurred or  command  cannot
  be  found, the exit status is 127.  Otherwise, the exit status of
  the command builtin is the exit status of command.
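For a Python script, one way to follow this advice without shelling out at all is `shutil.which()`, which performs the PATH lookup in-process, so it depends neither on a non-POSIX `which` binary nor on a shell providing `command -v`. A hedged sketch of how `check_installed()` might look — the names mirror the patch, but this is illustrative, not the submitted code:

```python
import shutil


class DevbindError(Exception):
    """Error type mirroring the patch's exception (name assumed from context)."""


def check_installed(program: str, package: str) -> None:
    """Check that a program is on PATH without spawning an external process."""
    # shutil.which() walks PATH in-process, avoiding both the non-POSIX
    # 'which' binary and the need for a shell running 'command -v'.
    if shutil.which(program) is None:
        raise DevbindError(f"'{program}' not found - please install '{package}'.")


check_installed("sh", "shell")  # 'sh' exists on any POSIX system
```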


Re: [PATCH v16 1/4] lib: add generic support for reading PMU events

2024-12-03 Thread Stephen Hemminger
On Mon, 18 Nov 2024 08:37:03 +0100
Tomasz Duszynski  wrote:

> diff --git a/doc/guides/prog_guide/profile_app.rst 
> b/doc/guides/prog_guide/profile_app.rst
> index a6b5fb4d5e..ecb90a0d94 100644
> --- a/doc/guides/prog_guide/profile_app.rst
> +++ b/doc/guides/prog_guide/profile_app.rst
> @@ -7,6 +7,32 @@ Profile Your Application
>  The following sections describe methods of profiling DPDK applications on
>  different architectures.
>  
> +Performance counter based profiling
> +---
> +
> +Majority of architectures support some performance monitoring unit (PMU).
> +Such unit provides programmable counters that monitor specific events.
> +
> +Different tools gather that information, like for example perf.
> +However, in some scenarios when CPU cores are isolated and run
> +dedicated tasks interrupting those tasks with perf may be undesirable.

The data should be folded into telemetry rather than introducing yet another
DPDK API for applications to deal with.


> +
> +In such cases, an application can use the PMU library to read such events 
> via ``rte_pmu_read()``.
> +
> +By default, userspace applications are not allowed to access PMU internals. 
> That can be changed
> +by setting ``/proc/sys/kernel/perf_event_paranoid`` to 2 (that should be a 
> default value anyway) and
> +adding ``CAP_PERFMON`` capability to DPDK application. Please refer to
> +``Documentation/admin-guide/perf-security.rst`` under Linux sources for more 
> information. Fairly
> +recent kernel, i.e >= 5.9, is advised too.

What happens on older kernels?

> +
> +As of now implementation imposes certain limitations:
> +
> +* Only EAL lcores are supported
> +
> +* EAL lcores must not share a cpu
> +
> +* Each EAL lcore measures same group of events
> +
>  
>  Profiling on x86
>  
> diff --git a/doc/guides/rel_notes/release_24_11.rst 
> b/doc/guides/rel_notes/release_24_11.rst
> index 5063badf39..1c299293e0 100644
> --- a/doc/guides/rel_notes/release_24_11.rst
> +++ b/doc/guides/rel_notes/release_24_11.rst

Well 24.11 is released, so as a minimum will need rebase for 25.03


Re: [PATCH v16 4/4] eal: add PMU support to tracing library

2024-12-03 Thread Stephen Hemminger
On Mon, 18 Nov 2024 08:37:06 +0100
Tomasz Duszynski  wrote:

> +static int
> +add_events(const char *pattern)
> +{
> + char *token, *copy;
> + int ret = 0;
> +
> + copy = strdup(pattern);
> + if (copy == NULL)
> + return -ENOMEM;
> +
> + token = strtok(copy, ",");

Since strtok is not thread safe, either use strtok_r or another
way to parse the comma-separated list.

Maybe rte_strsplit could help?


Re: [RFC v3 1/2] pci: introduce the PCIe TLP Processing Hints API

2024-12-03 Thread Stephen Hemminger
On Mon, 21 Oct 2024 01:52:45 +
Wathsala Vithanage  wrote:

> Extend the PCI driver and the library to extract the Steering Tag (ST)
> for a given Processor/Processor Container and Cache ID pair and validate
> a Processing Hint from a TPH _DSM associated with a root port device.
> The rte_pci_device structure passed into the rte_pci_extract_tph_st()
> function could be a device or a root port. If it's a device, the
> function should trace it back to the root port and use its TPH _DSM to
> extract STs. The implementation of rte_pci_extract_tph_st() is dependent
> on the operating system.
> 
> rte_pci_extract_tph_st() should also be supplied with a
> rte_tph_acpi__dsm_args, and a rte_tph_acpi__dsm_return structures.
> These two structures are defined in the PCI library and comply with the
> TPH _DSM argument and return encoding specified in the PCI firmware ECN
> titled "Revised _DSM for Cache Locality TPH Features.". Use of
> rte_init_tph_acpi__dsm_args() is recommended for initializing the
> rte_tph_acpi__dsm_args struct which is capable of converting lcore ID,
> the cache level into values understood by the ACPI _DSM function.
> rte_tph_acpi__dsm_return struct will be initialized with the values
> returned by the TPH _DSM; it is up to the caller to use these values per
> the device's capabilities.
> 
> Signed-off-by: Wathsala Vithanage 
> Reviewed-by: Honnappa Nagarahalli 
> Reviewed-by: Dhruv Tripathi 

While doing review, noticed that patch has minor whitespace issue.

/home/shemminger/DPDK/main/.git/worktrees/stash/rebase-apply/patch:123: new 
blank line at EOF.
+
warning: 1 line adds whitespace errors.


Re: [RFC v3 2/2] ethdev: introduce the cache stashing hints API

2024-12-03 Thread Stephen Hemminger
On Mon, 21 Oct 2024 01:52:46 +
Wathsala Vithanage  wrote:

> Extend the ethdev library to enable the stashing of different data
> objects, such as the ones listed below, into CPU caches directly
> from the NIC.
> 
> - Rx/Tx queue descriptors
> - Rx packets
> - Packet headers
> - packet payloads
> - Data of a packet at an offset from the start of the packet
> 
> The APIs are designed in a hardware/vendor agnostic manner such that
> supporting PMDs could use any capabilities available in the underlying
> hardware for fine-grained stashing of data objects into a CPU cache
> (e.g., Steering Tags in PCIe TLP Processing Hints).
> 
> The API provides an interface to query the availability of stashing
> capabilities, i.e., platform/NIC support, stashable object types, etc,
> via the rte_eth_dev_stashing_capabilities_get interface.
> 
> The function pair rte_eth_dev_stashing_rx_config_set and
> rte_eth_dev_stashing_tx_config_set sets the stashing hint (the CPU, 
> cache level, and data object types) on the Rx and Tx queues.
> 
> PMDs that support stashing must register their implementations with the
> following eth_dev_ops callbacks, which are invoked by the ethdev
> functions listed above.
> 
> - stashing_capabilities_get
> - stashing_rx_hints_set
> - stashing_tx_hints_set
> 
> Signed-off-by: Wathsala Vithanage 
> Reviewed-by: Honnappa Nagarahalli 
> Reviewed-by: Dhruv Tripathi 
> 
> ---
>  lib/ethdev/ethdev_driver.h |  66 +++
>  lib/ethdev/rte_ethdev.c| 120 +++
>  lib/ethdev/rte_ethdev.h| 161 +
>  lib/ethdev/version.map |   4 +
>  4 files changed, 351 insertions(+)
> 
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 1fd4562b40..7caaea54a8 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -1367,6 +1367,68 @@ enum rte_eth_dev_operation {
>  typedef uint64_t (*eth_get_restore_flags_t)(struct rte_eth_dev *dev,
>   enum rte_eth_dev_operation op);
>  
> +/**
> + * @internal
> + * Set cache stashing hints in Rx queue.
> + *
> + * @param dev
> + *   Port (ethdev) handle.
> + * @param queue_id
> + *   Rx queue.
> + * @param config
> + *   Stashing hints configuration for the queue.
> + *
> + * @return
> + *   -ENOTSUP if the device or the platform does not support cache stashing.
> + *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing 
> feature.
> + *   -EINVAL  on invalid arguments.
> + *   0 on success.
> + */
> +typedef int (*eth_stashing_rx_hints_set_t)(struct rte_eth_dev *dev, uint16_t 
> queue_id,
> +struct rte_eth_stashing_config 
> *config);
> +
> +/**
> + * @internal
> + * Set cache stashing hints in Tx queue.
> + *
> + * @param dev
> + *   Port (ethdev) handle.
> + * @param queue_id
> + *   Tx queue.
> + * @param config
> + *   Stashing hints configuration for the queue.
> + *
> + * @return
> + *   -ENOTSUP if the device or the platform does not support cache stashing.
> + *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing 
> feature.
> + *   -EINVAL  on invalid arguments.
> + *   0 on success.
> + */
> +typedef int (*eth_stashing_tx_hints_set_t)(struct rte_eth_dev *dev, uint16_t 
> queue_id,
> +struct rte_eth_stashing_config 
> *config);
> +
> +/**
> + * @internal
> + * Get cache stashing object types supported in the ethernet device.
> + * The return value indicates availability of stashing hints support
> + * in the hardware and the PMD.
> + *
> + * @param dev
> + *   Port (ethdev) handle.
> + * @param objects
> + *   PMD sets supported bits on return.
> + *
> + * @return
> + *   -ENOTSUP if the device or the platform does not support cache stashing.
> + *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing 
> feature.
> + *   -EINVAL  on NULL values for types or hints parameters.
> + *   On return, types and hints parameters will have bits set for supported
> + *   object types and hints.
> + *   0 on success.
> + */
> +typedef int (*eth_stashing_capabilities_get_t)(struct rte_eth_dev *dev,
> +  uint16_t *objects);
> +
>  /**
>   * @internal A structure containing the functions exported by an Ethernet 
> driver.
>   */
> @@ -1393,6 +1455,10 @@ struct eth_dev_ops {
>   eth_mac_addr_remove_t  mac_addr_remove; /**< Remove MAC address */
>   eth_mac_addr_add_t mac_addr_add;  /**< Add a MAC address */
>   eth_mac_addr_set_t mac_addr_set;  /**< Set a MAC address */
> + eth_stashing_rx_hints_set_t   stashing_rx_hints_set; /**< Set Rx cache 
> stashing*/
> + eth_stashing_tx_hints_set_t   stashing_tx_hints_set; /**< Set Tx cache 
> stashing*/
> + /** Get supported stashing hints*/
> + eth_stashing_capabilities_get_t stashing_capabilities_get;
>   /** Set list of multicast addresses */
>   eth_set_mc_addr_li

Re: [PATCH 1/2] net/bonding: standard the log message

2024-12-03 Thread Stephen Hemminger
On Wed,  5 Jun 2024 13:55:19 +0800
Chaoyong He  wrote:

> From: Long Wu 
> 
> According to the check rules in the patch check script,
> drivers and libraries must use the logging framework.
> 
> So standard the log message of bonding driver by using
> the logging framework.
> 
> Signed-off-by: Long Wu 
> Reviewed-by: Peng Zhang 
> Reviewed-by: Chaoyong He 

Applied to next-net but reworded the commit log to better express the intent.


Re: [PATCH v16 4/4] eal: add PMU support to tracing library

2024-12-03 Thread Stephen Hemminger
On Mon, 18 Nov 2024 08:37:06 +0100
Tomasz Duszynski  wrote:

> +
> + /* events are matched against occurrences of e=ev1[,ev2,..] pattern */
> + ret = regcomp(®, "e=([_[:alnum:]-],?)+", REG_EXTENDED);
> + if (ret) {
> + PMU_LOG(ERR, "Failed to compile event matching regexp");
> + return -EINVAL;
> + }
> +
> + for (;;) {
> + if (regexec(®, pattern, 1, &rmatch, 0))
> + break;
> +

As with log parameters. Regex is harder to work with than a glob based syntax.


Re: [PATCH v16 0/4] add support for self monitoring

2024-12-03 Thread Stephen Hemminger
On Mon, 18 Nov 2024 08:37:02 +0100
Tomasz Duszynski  wrote:

> This series adds self monitoring support i.e allows to configure and
> read performance measurement unit (PMU) counters in runtime without
> using perf utility. This has certain advantages when application runs on
> isolated cores running dedicated tasks.
> 
> Events can be read directly using rte_pmu_read() or using dedicated
> tracepoint rte_eal_trace_pmu_read(). The latter will cause events to be
> stored inside CTF file.
> 
> By design, all enabled events are grouped together and the same group
> is attached to lcores that use self monitoring functionality.
> 
> Events are enabled by names, which need to be read from standard
> location under sysfs i.e
> 
> /sys/bus/event_source/devices/PMU/events
> 
> where PMU is a core pmu i.e one measuring cpu events. As of today
> raw events are not supported.

It would be good to have a working, useful example of what these
events are and how they could be used.

Given that most DPDK applications are poll mode, it is not clear
how these could be useful.

There is a lot of work that perf does to go from the raw counters
to useful data, and without that exposing the raw stuff doesn't
seem that useful


Re: [v22,13/13] compress/zsda: add zsda compressdev capabilities

2024-12-03 Thread Hanxiao Li
Hi akhil:

I noticed that DPDK 24.11 has been released and the state of my patches has been 
changed to "New" from "Defer".

And I have some questions about how to proceed with submitting patches.

1.  In the last version, I submitted v22. If I want to continue 
submitting, should I submit v23 or v1?

2.  Should I submit patches based on the latest version as soon as 
possible, or wait for your comments before submitting?

3.  I think I need to add something to release_25_03.rst. But I did not 
find the file in the code.
So, what should I do? 
Don't write this document for now, and add it after it is released 
later?
Or should I wait until this file exists before submitting?

Thanks
Hanxiao Li.

RE: [PATCH v2 2/2] net/bonding: add command to set dedicated queue size

2024-12-03 Thread Chaoyong He
> On Fri, 11 Oct 2024 11:24:12 +0800
> Chaoyong He  wrote:
> 
> > From: Long Wu 
> >
> > The testpmd application can not modify the value of dedicated hardware
> > Rx/Tx queue size, and hardcoded them as (128/512). This will cause the
> > bonding port start fail if some NIC requires more Rx/Tx descriptors
> > than the hardcoded number.
> >
> > Therefore, add a command into testpmd application to support the
> > modification of the size of the dedicated hardware Rx/Tx queue. Also
> > export an external interface to also let other applications can change
> > it.
> >
> > Signed-off-by: Long Wu 
> > Reviewed-by: Peng Zhang 
> > Reviewed-by: Chaoyong He 
> 
> 24.11 is released, this patch if still of interest will need to be rebased.

Okay, we will send new version patch later.

> 
> The definition of a "dedicated queue" is a bit confusing.
> If it is only for LACP packets, it should never need to be very big.
> Only under a mis-configuration and DoS kind of flood should there ever be
> many packets.

Yes, the dedicated queue is only for LACP packets now, and it doesn't need to be 
set very big.

But if we use a hardware queue as the "dedicated queue", we must consider the 
hardware capability. The minimum queue size of some NICs may be larger than the 
hardcoded dedicated queue size. In this case, I think it is better to add an 
interface to set the dedicated queue size.


Re: [PATCH] lib/lpm: use standard atomic_store_explicit

2024-12-03 Thread David Marchand
Hello Andre,

On Wed, Dec 4, 2024 at 3:20 AM Andre Muezerie
 wrote:
>
> MSVC issues the warning below:
>
> ../lib/lpm/rte_lpm.c(297): warning C4013
> '__atomic_store' undefined; assuming extern returning int
> ../lib/lpm/rte_lpm.c(298): error C2065:
> '__ATOMIC_RELAXED': undeclared identifier
>
> The fix is to use standard atomic_store_explicit() instead of
> gcc specific __atomic_store().
> atomic_store_explicit() was already being used in other parts
> of DPDK and is compatible
> with many compilers, including MSVC.
>
> Signed-off-by: Andre Muezerie 

With this change, is there anything remaining that blocks this library
compilation with MSVC?
If not, please update meson.build so that CI can test lpm compilation
with MSVC on this patch (and that will detect regressions once
merged).


-- 
David Marchand



[PATCH] lib/lpm: use standard atomic_store_explicit

2024-12-03 Thread Andre Muezerie
MSVC issues the warning below:

../lib/lpm/rte_lpm.c(297): warning C4013
'__atomic_store' undefined; assuming extern returning int
../lib/lpm/rte_lpm.c(298): error C2065:
'__ATOMIC_RELAXED': undeclared identifier

The fix is to use standard atomic_store_explicit() instead of
gcc specific __atomic_store().
atomic_store_explicit() was already being used in other parts
of DPDK and is compatible
with many compilers, including MSVC.

Signed-off-by: Andre Muezerie 
---
 lib/lpm/rte_lpm.c | 108 ++
 lib/lpm/rte_lpm.h |  56 ++--
 2 files changed, 104 insertions(+), 60 deletions(-)

diff --git a/lib/lpm/rte_lpm.c b/lib/lpm/rte_lpm.c
index a5c9e7c9fc..7ec85f1718 100644
--- a/lib/lpm/rte_lpm.c
+++ b/lib/lpm/rte_lpm.c
@@ -294,8 +294,8 @@ __lpm_rcu_qsbr_free_resource(void *p, void *data, unsigned 
int n)
 
RTE_SET_USED(n);
/* Set tbl8 group invalid */
-   __atomic_store(&tbl8[tbl8_group_index], &zero_tbl8_entry,
-   __ATOMIC_RELAXED);
+   rte_atomic_store_explicit(&tbl8[tbl8_group_index].val,
+   zero_tbl8_entry.val, rte_memory_order_relaxed);
 }
 
 /* Associate QSBR variable with an LPM object.
@@ -515,8 +515,8 @@ _tbl8_alloc(struct __rte_lpm *i_lpm)
RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
sizeof(tbl8_entry[0]));
 
-   __atomic_store(tbl8_entry, &new_tbl8_entry,
-   __ATOMIC_RELAXED);
+   rte_atomic_store_explicit(&tbl8_entry->val, 
new_tbl8_entry.val,
+   rte_memory_order_relaxed);
 
/* Return group index for allocated tbl8 group. */
return group_idx;
@@ -551,15 +551,19 @@ tbl8_free(struct __rte_lpm *i_lpm, uint32_t 
tbl8_group_start)
 
if (i_lpm->v == NULL) {
/* Set tbl8 group invalid*/
-   __atomic_store(&i_lpm->lpm.tbl8[tbl8_group_start], 
&zero_tbl8_entry,
-   __ATOMIC_RELAXED);
+   struct rte_lpm_tbl_entry *tbl8_entry =
+   &i_lpm->lpm.tbl8[tbl8_group_start];
+   rte_atomic_store_explicit(&tbl8_entry->val, zero_tbl8_entry.val,
+   rte_memory_order_relaxed);
} else if (i_lpm->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
/* Wait for quiescent state change. */
rte_rcu_qsbr_synchronize(i_lpm->v,
RTE_QSBR_THRID_INVALID);
/* Set tbl8 group invalid*/
-   __atomic_store(&i_lpm->lpm.tbl8[tbl8_group_start], 
&zero_tbl8_entry,
-   __ATOMIC_RELAXED);
+   struct rte_lpm_tbl_entry *tbl8_entry =
+   &i_lpm->lpm.tbl8[tbl8_group_start];
+   rte_atomic_store_explicit(&tbl8_entry->val, zero_tbl8_entry.val,
+   rte_memory_order_relaxed);
} else if (i_lpm->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
/* Push into QSBR defer queue. */
status = rte_rcu_qsbr_dq_enqueue(i_lpm->dq,
@@ -602,8 +606,10 @@ add_depth_small(struct __rte_lpm *i_lpm, uint32_t ip, 
uint8_t depth,
/* Setting tbl24 entry in one go to avoid race
 * conditions
 */
-   __atomic_store(&i_lpm->lpm.tbl24[i], &new_tbl24_entry,
-   __ATOMIC_RELEASE);
+   struct rte_lpm_tbl_entry *tbl24_entry =
+   &i_lpm->lpm.tbl24[i];
+   rte_atomic_store_explicit(&tbl24_entry->val, 
new_tbl24_entry.val,
+   rte_memory_order_release);
 
continue;
}
@@ -632,9 +638,11 @@ add_depth_small(struct __rte_lpm *i_lpm, uint32_t ip, 
uint8_t depth,
 * Setting tbl8 entry in one go to avoid
 * race conditions
 */
-   __atomic_store(&i_lpm->lpm.tbl8[j],
-   &new_tbl8_entry,
-   __ATOMIC_RELAXED);
+   struct rte_lpm_tbl_entry *tbl8_entry =
+   &i_lpm->lpm.tbl8[j];
+   
rte_atomic_store_explicit(&tbl8_entry->val,
+   new_tbl8_entry.val,
+   
rte_memory_order_relaxed);
 
continue;
}
@@ -679,8 +687,10 @@ add_depth_big(struct __rte_lpm *i_lpm, uint32_t ip_masked, 
uint8_t depth,
  

[PATCH v2] examples/l3fwd: add option to set Tx burst size

2024-12-03 Thread Jie Hai
The application sends packets only when the buffer is full, or when the
buffer is empty and the number of packets to be sent exceeds TX_PKT_BURST.
The change of MAX_PKT_BURST makes the TX buffer size and TX_PKT_BURST
increase, while the default cache size is 256. The packets in
the TX direction occupy the cache. As a result, the performance
deteriorates.

Restore the default Tx burst and add option '--tx-burst' to set
the Tx burst size. To ensure consistency, rename the option
'--burst' to '--rx-burst'. The valid range of the user-provided
value is (0, MAX_PKT_BURST] for both directions.

Fixes: d5c4897ecfb2 ("examples/l3fwd: add option to set RX burst size")
Cc: sta...@dpdk.org

Signed-off-by: Jie Hai 
---
 examples/l3fwd/l3fwd.h| 14 +++---
 examples/l3fwd/l3fwd_acl.c|  2 +-
 examples/l3fwd/l3fwd_common.h |  2 +-
 examples/l3fwd/l3fwd_em.c |  2 +-
 examples/l3fwd/l3fwd_fib.c|  2 +-
 examples/l3fwd/l3fwd_lpm.c|  2 +-
 examples/l3fwd/main.c | 89 ++-
 7 files changed, 59 insertions(+), 54 deletions(-)

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 0cce3406ee7d..a4e23b817edf 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -32,10 +32,6 @@
 
 #define VECTOR_SIZE_DEFAULT   MAX_PKT_BURST
 #define VECTOR_TMO_NS_DEFAULT 1E6 /* 1ms */
-/*
- * Try to avoid TX buffering if we have at least MAX_TX_BURST packets to send.
- */
-#defineMAX_TX_BURST  (MAX_PKT_BURST / 2)
 
 #define NB_SOCKETS8
 
@@ -116,7 +112,11 @@ extern struct acl_algorithms acl_alg[];
 
 extern uint32_t max_pkt_len;
 
-extern uint32_t nb_pkt_per_burst;
+extern uint32_t rx_pkt_burst;
+/*
+ * Try to avoid TX buffering if we have at least tx_pkt_burst packets to send.
+ */
+extern uint32_t tx_pkt_burst;
 extern uint32_t mb_mempool_cache_size;
 
 /* Send burst of packets on an output interface */
@@ -152,8 +152,8 @@ send_single_packet(struct lcore_conf *qconf,
len++;
 
/* enough pkts to be sent */
-   if (unlikely(len == MAX_PKT_BURST)) {
-   send_burst(qconf, MAX_PKT_BURST, port);
+   if (unlikely(len == tx_pkt_burst)) {
+   send_burst(qconf, tx_pkt_burst, port);
len = 0;
}
 
diff --git a/examples/l3fwd/l3fwd_acl.c b/examples/l3fwd/l3fwd_acl.c
index 4fc4b986cce6..a5af82357a03 100644
--- a/examples/l3fwd/l3fwd_acl.c
+++ b/examples/l3fwd/l3fwd_acl.c
@@ -1136,7 +1136,7 @@ acl_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid,
-   pkts_burst, nb_pkt_per_burst);
+   pkts_burst, rx_pkt_burst);
 
if (nb_rx > 0) {
nb_drop = acl_process_pkts(pkts_burst, hops,
diff --git a/examples/l3fwd/l3fwd_common.h b/examples/l3fwd/l3fwd_common.h
index d94e5f135791..34fe70b9415c 100644
--- a/examples/l3fwd/l3fwd_common.h
+++ b/examples/l3fwd/l3fwd_common.h
@@ -71,7 +71,7 @@ send_packetsx4(struct lcore_conf *qconf, uint16_t port, 
struct rte_mbuf *m[],
 * If TX buffer for that queue is empty, and we have enough packets,
 * then send them straightway.
 */
-   if (num >= MAX_TX_BURST && len == 0) {
+   if (num >= tx_pkt_burst / 2 && len == 0) {
n = rte_eth_tx_burst(port, qconf->tx_queue_id[port], m, num);
if (unlikely(n < num)) {
do {
diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index da9c45e3a482..ea74506ed971 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -644,7 +644,7 @@ em_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
-   nb_pkt_per_burst);
+   rx_pkt_burst);
if (nb_rx == 0)
continue;
 
diff --git a/examples/l3fwd/l3fwd_fib.c b/examples/l3fwd/l3fwd_fib.c
index 82f1739df778..4223540b30ae 100644
--- a/examples/l3fwd/l3fwd_fib.c
+++ b/examples/l3fwd/l3fwd_fib.c
@@ -239,7 +239,7 @@ fib_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
-   nb_pkt_per_burst);
+   rx_pkt_burst);
if (nb_rx == 0)
continue;
 
diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index fec0aeb79c6a..bd1307c43e70 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/example

[PATCH] cryptodev: not close device when secondary exit

2024-12-03 Thread Yang Ming
The secondary process should not close the crypto device when
it exits, because the primary process still manages the device.
There is no reason for the error log below to occur when the
secondary process exits without any operation on the crypto
device while the primary process starts the device.

Case situation:
eal_bus_cleanup has been added in rte_eal_cleanup. But for the
secondary process, eal_bus_cleanup will trigger vdev_cleanup,
which triggers the rte_vdev_driver remove. Then crypto devices
will execute ipsec_mb_remove to rte_cryptodev_pmd_destroy.
Finally, rte_cryptodev_close will be called on secondary
process exit.

Error logs occur as below when the secondary process exits:
CRYPTODEV: rte_cryptodev_close() line 1453: Device 0 must be
stopped before closing

Function call trace: rte_eal_cleanup->eal_bus_cleanup->
vdev_cleanup->rte_vdev_driver_remove->ipsec_mb_remove->
rte_cryptodev_pmd_destroy->rte_cryptodev_pmd_release_device->
rte_cryptodev_close

Signed-off-by: Yang Ming 
---
 lib/cryptodev/rte_cryptodev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/cryptodev/rte_cryptodev.c b/lib/cryptodev/rte_cryptodev.c
index 85a4b46ac9..ed1021f635 100644
--- a/lib/cryptodev/rte_cryptodev.c
+++ b/lib/cryptodev/rte_cryptodev.c
@@ -1142,7 +1142,7 @@ rte_cryptodev_pmd_release_device(struct rte_cryptodev 
*cryptodev)
cryptodev_fp_ops_reset(rte_crypto_fp_ops + dev_id);
 
/* Close device only if device operations have been set */
-   if (cryptodev->dev_ops) {
+   if (cryptodev->dev_ops && (rte_eal_process_type() == RTE_PROC_PRIMARY)) 
{
ret = rte_cryptodev_close(dev_id);
if (ret < 0)
return ret;
-- 
2.34.1



[PATCH] lib/fib: remove warning about implicit 64-bit conversion

2024-12-03 Thread Andre Muezerie
MSVC issues the warning below:

../lib/fib/trie.c(341): warning C4334: '<<':
result of 32-bit shift implicitly converted to 64 bits
(was 64-bit shift intended?)

The fix is to cast the result explicitly to ptrdiff_t since it is used
in pointer arithmetic.

Signed-off-by: Andre Muezerie 
---
 lib/fib/trie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/fib/trie.c b/lib/fib/trie.c
index 4893f6c636..997b7cc338 100644
--- a/lib/fib/trie.c
+++ b/lib/fib/trie.c
@@ -338,7 +338,7 @@ write_edge(struct rte_trie_tbl *dp, const uint8_t *ip_part, 
uint64_t next_hop,
if (ret < 0)
return ret;
if (edge == LEDGE) {
-   write_to_dp((uint8_t *)p + (1 << dp->nh_sz),
+   write_to_dp((uint8_t *)p + (ptrdiff_t)(1 << dp->nh_sz),
next_hop << 1, dp->nh_sz, UINT8_MAX - *ip_part);
} else {
write_to_dp(get_tbl_p_by_idx(dp->tbl8, tbl8_idx *
-- 
2.47.0.vfs.0.3



Re: [PATCH v1] app/testpmd: use Tx preparation in txonly engine

2024-12-03 Thread Stephen Hemminger
On Wed,  3 Jan 2024 09:29:12 +0800
Kaiwen Deng  wrote:

> Txonly forwarding engine does not call the Tx preparation API
> before transmitting packets. This may cause some problems.
> 
> TSO breaks when MSS spans more than 8 data fragments. Those
> packets will be dropped by Tx preparation API, but it will cause
> MDD event if txonly forwarding engine does not call the Tx preparation
> API before transmitting packets.
> 
> We can reproduce this issue by these steps list blow on ICE and I40e.
> 
> ./x86_64-native-linuxapp-gcc/app/dpdk-testpmd -c 0xf -n 4 -- -i
> --tx-offloads=0x8000
> 
> testpmd>set txpkts 64,128,256,512,64,128,256,512,512
> testpmd>set burst 1
> testpmd>start tx_first 1  
> 
> This commit will use Tx preparation API in txonly forwarding engine.
> 
> Fixes: 655131ccf727 ("app/testpmd: factorize fwd engines Tx")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Kaiwen Deng 
> ---
>  app/test-pmd/txonly.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
> index c2b88764be..60d69be3f6 100644
> --- a/app/test-pmd/txonly.c
> +++ b/app/test-pmd/txonly.c
> @@ -339,6 +339,7 @@ pkt_burst_transmit(struct fwd_stream *fs)
>   struct rte_ether_hdr eth_hdr;
>   uint16_t nb_tx;
>   uint16_t nb_pkt;
> + uint16_t nb_prep;
>   uint16_t vlan_tci, vlan_tci_outer;
>   uint64_t ol_flags = 0;
>   uint64_t tx_offloads;
> @@ -396,7 +397,17 @@ pkt_burst_transmit(struct fwd_stream *fs)
>   if (nb_pkt == 0)
>   return false;
>  
> - nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_pkt);
> + nb_prep = rte_eth_tx_prepare(fs->tx_port, fs->tx_queue,
> + pkts_burst, nb_pkt);
> + if (unlikely(nb_prep != nb_pkt)) {
> + fprintf(stderr,
> + "Preparing packet burst to transmit failed: %s\n",
> + rte_strerror(rte_errno));
> + fs->fwd_dropped += (nb_pkt - nb_prep);
> + rte_pktmbuf_free_bulk(&pkts_burst[nb_prep], nb_pkt - nb_prep);
> + }
> +
> + nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_prep);

The comment section on this patch raises lots of good points.

1. Testpmd and example applications are not calling tx_prepare.
2. Testpmd and examples are not checking descriptor limits.
3. It is not clear from documentation when tx_prepare is required.

On a practical level, if testpmd is being used in txonly mode and
the condition ever triggers, you really don't want to print a message
there, since it would likely be an endless stream of messages.

Please consider the comments and come back with a better solution in v2.



Re: [PATCH v16 1/4] lib: add generic support for reading PMU events

2024-12-03 Thread Stephen Hemminger
On Wed, 4 Dec 2024 00:49:58 +0100
Morten Brørup  wrote:

> > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > Sent: Tuesday, 3 December 2024 22.39
> > 
> > On Mon, 18 Nov 2024 08:37:03 +0100
> > Tomasz Duszynski  wrote:
> >   
> > > +Performance counter based profiling
> > > +---
> > > +
> > > +Majority of architectures support some performance monitoring unit  
> > (PMU).  
> > > +Such unit provides programmable counters that monitor specific  
> > events.  
> > > +
> > > +Different tools gather that information, like for example perf.
> > > +However, in some scenarios when CPU cores are isolated and run
> > > +dedicated tasks interrupting those tasks with perf may be  
> > undesirable.
> > 
> > The data should be folded into telemetry rather than introducing yet
> > another
> > DPDK API for applications to deal with.  
> 
> I strongly prefer the dedicated high-performance PMU API rather than using 
> telemetry for this.
> Please keep the PMU API.
> 
> I expect to call the PMU API in our (proprietary) run-time profiling library, 
> where reading PMU counters should be as lean as calling rte_rdtsc(). I sure 
> don't want any superfluous overhead when profiling with a very high sampling 
> rate.
> 
> For reference, many other libraries have dedicated APIs for reading the 
> statistics structures of those libraries.
> 
> A wrapper around the PMU API can be added for Telemetry.
> 
> IMO, the Telemetry library should be made optional, like the Trace library 
> recently was. For embedded systems, they are not only bloat, but potentially 
> helpful for hackers trying to break in. And Security is one of the DPDK 
> Governing Board's focus areas.
> 

Can this data go right into perf?
It is not clear why this is better than just using perf.
The one use case I can think of is a cloud provider with lots and lots of 
embedded systems.
But in that case they already have much more detailed and integrated tools; the 
DPDK stuff is not needed.


Re: [PATCH v2 2/2] net/bonding: add command to set dedicated queue size

2024-12-03 Thread Stephen Hemminger
On Fri, 11 Oct 2024 11:24:12 +0800
Chaoyong He  wrote:

> From: Long Wu 
> 
> The testpmd application can not modify the value of
> dedicated hardware Rx/Tx queue size, and hardcoded
> them as (128/512). This will cause the bonding port
> start fail if some NIC requires more Rx/Tx descriptors
> than the hardcoded number.
> 
> Therefore, add a command into testpmd application to
> support the modification of the size of the dedicated
> hardware Rx/Tx queue. Also export an external interface
> to also let other applications can change it.
> 
> Signed-off-by: Long Wu 
> Reviewed-by: Peng Zhang 
> Reviewed-by: Chaoyong He 

24.11 is released, this patch if still of interest will need to be rebased.

The definition of what a "dedicated queue" is remains a bit confusing.
If it is only for LACP packets, it should never need to be very big;
only under a misconfiguration or a DoS-style flood should there
ever be many packets.


fbdev

2024-12-03 Thread Lewis Donzis
fbdev is now running FreeBSD 14.2. 

The former version (14.1) is now running on fbdev141. 

Thanks, 
lew 


RE: [PATCH v16 1/4] lib: add generic support for reading PMU events

2024-12-03 Thread Morten Brørup
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Tuesday, 3 December 2024 22.39
> 
> On Mon, 18 Nov 2024 08:37:03 +0100
> Tomasz Duszynski  wrote:
> 
> > +Performance counter based profiling
> > +---
> > +
> > +Majority of architectures support some performance monitoring unit
> (PMU).
> > +Such unit provides programmable counters that monitor specific
> events.
> > +
> > +Different tools gather that information, like for example perf.
> > +However, in some scenarios when CPU cores are isolated and run
> > +dedicated tasks interrupting those tasks with perf may be
> undesirable.
> 
> The data should be folded into telemetry rather than introducing yet
> another
> DPDK API for applications to deal with.

I strongly prefer the dedicated high-performance PMU API rather than using 
telemetry for this.
Please keep the PMU API.

I expect to call the PMU API in our (proprietary) run-time profiling library, 
where reading PMU counters should be as lean as calling rte_rdtsc(). I sure 
don't want any superfluous overhead when profiling with a very high sampling 
rate.

For reference, many other libraries have dedicated APIs for reading the 
statistics structures of those libraries.

A wrapper around the PMU API can be added for Telemetry.

IMO, the Telemetry library should be made optional, like the Trace library 
recently was. For embedded systems, they are not only bloat, but potentially 
helpful for hackers trying to break in. And Security is one of the DPDK 
Governing Board's focus areas.



Re: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM

2024-12-03 Thread Roger Melton (rmelton)
After looking at this a bit closer today, I realize that my assertion that 
CLANG14 does support vcopyq_laneq_u32() for 32bit ARM was incorrect.  It does 
not.  The reason that disabling the implementation in rte_vect.h works for our 
clang builds is that we do not build the l3fwd app nor the ixgbe PMD for our 
application, and they are the only libraries that reference that function.

The clang compile errors appear to be related to how clang handles compile-time 
constants, but I am again unsure how to resolve them in a way that would work 
for both GNU and clang.

Any suggestions?

Regards,
Roger


On 12/2/24 8:26 PM, Ruifeng Wang wrote:
+Arm folks.

From: Roger Melton (rmelton) 
Date: Tuesday, December 3, 2024 at 3:39 AM
To: dev@dpdk.org , 
Ruifeng Wang 
Subject: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit 
ARM

Hey folks,
We are building DPDK with clang14 for a 32bit armv8-a based CPU and ran into a 
compile error with the following from lib/eal/arm/include/rte_vect.h:



#if (defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \
	(defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 7))
/* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7-A(AArch32)
 * On AArch64, this intrinsic is supported since GCC version 7.
 */
static inline uint32x4_t
vcopyq_laneq_u32(uint32x4_t a, const int lane_a,
		 uint32x4_t b, const int lane_b)
{
	return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a);
}
#endif

clang14 compile fails as follows:

In file included from 
../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36:
../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/arm/include/rte_vect.h:80:24:
 error: argument to '__builtin_neon_vgetq_lane_i32' must be a constant integer
return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a);
^ ~~
/auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:7697:22:
 note: expanded from macro 'vgetq_lane_u32'
__ret = (uint32_t) __builtin_neon_vgetq_lane_i32((int32x4_t)__s0, __p1); \
^ 
/auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:24148:19:
 note: expanded from macro 'vsetq_lane_u32'
uint32_t __s0 = __p0; \
^~~~
In file included from 
../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36:
../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/arm/include/rte_vect.h:80:9:
 error: argument to '__builtin_neon_vsetq_lane_i32' must be a constant integer
return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a);
^ ~~
/auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:24150:24:
 note: expanded from macro 'vsetq_lane_u32'
__ret = (uint32x4_t) __builtin_neon_vsetq_lane_i32(__s0, (int32x4_t)__s1, 
__p2); \
^ 
2 errors generated.



clang14 does appear to support the vcopyq_laneq_u32() intrinsic, so we want to 
skip the conditional implementation.

Two approaches I have tested to resolve the error are:

1) skip if building with clang:

#if !defined(__clang__) && ((defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \
	(defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 7)))


2) skip if not building for ARMv7:


#if (defined(RTE_ARCH_ARMv7) && defined(RTE_ARCH_32)) || \
(defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 7))


Both address our immediate problem, but may not be appropriate for all cases.

Can anyone suggest the proper way to address this?  I'll be submitting a patch 
once I have a solution that is acceptable to the community.
Regards,
Roger


rte_mempool_create fails with --no-huge

2024-12-03 Thread Alipour, Mehrdad
Hi,

I am facing a problem with rte_mempool_create when running my app with --no-huge 
after rte_eal_init succeeds.
Note that this app works fine with hugepages, but since its purpose is unit 
testing certain packet processing logic, it has no requirement for ports, 
and the Linux kernel may not necessarily have hugepages set up in the boot args.
Hence --no-huge.

To demonstrate the issue, I used DPDK 23.11 dpdk-testpmd (please ignore the 
fact that there are no ports; the objective is to demonstrate the rte_malloc/ 
rte_mempool_create issue).

Running directly on Linux:

dpdk-testpmd -c 000F -n 2 --log-level=eal,8 --no-huge -m 4095 --no-pci -- -i 
--nb-cores=2 --total-num-mbufs=2048
==
2024-12-03 13:22:39.078164  EAL: lib.eal log level changed from info to debug
2024-12-03 13:22:39.078299  EAL: Detected lcore 0 as core 0 on socket 0
2024-12-03 13:22:39.078325  EAL: Detected lcore 1 as core 0 on socket 0
2024-12-03 13:22:39.078344  EAL: Detected lcore 2 as core 0 on socket 0
2024-12-03 13:22:39.078362  EAL: Detected lcore 3 as core 0 on socket 0
2024-12-03 13:22:39.078391  EAL: Detected lcore 4 as core 0 on socket 0
2024-12-03 13:22:39.078411  EAL: Detected lcore 5 as core 0 on socket 0
2024-12-03 13:22:39.078433  EAL: Detected lcore 6 as core 0 on socket 0
2024-12-03 13:22:39.078451  EAL: Detected lcore 7 as core 0 on socket 0
2024-12-03 13:22:39.078469  EAL: Detected lcore 8 as core 0 on socket 0
2024-12-03 13:22:39.078490  EAL: Detected lcore 9 as core 0 on socket 0
2024-12-03 13:22:39.083517  EAL: Maximum logical cores by configuration: 128
2024-12-03 13:22:39.083527  EAL: Detected CPU lcores: 10
2024-12-03 13:22:39.083537  EAL: Detected NUMA nodes: 1
2024-12-03 13:22:39.083580  EAL: Checking presence of .so 'librte_eal.so.24.0'
2024-12-03 13:22:39.083663  EAL: Detected shared linkage of DPDK
2024-12-03 13:22:39.084799  EAL: Ask a virtual area of 0x7000 bytes
2024-12-03 13:22:39.084820  EAL: Virtual area found at 0x1 (size = 
0x7000)
2024-12-03 13:22:39.086041  EAL: Multi-process socket 
/var/run/dpdk/rte/mp_socket
2024-12-03 13:22:39.086231  EAL: Bus vdev wants IOVA as 'DC'
2024-12-03 13:22:39.086243  EAL: Bus pci wants IOVA as 'DC'
2024-12-03 13:22:39.086247  EAL: Buses did not request a specific IOVA mode.
2024-12-03 13:22:39.086252  EAL: Physical addresses are unavailable, selecting 
IOVA as VA mode.
2024-12-03 13:22:39.086257  EAL: Selected IOVA mode 'VA'
2024-12-03 13:22:39.088780  EAL: Probing VFIO support...
2024-12-03 13:22:39.088847  EAL: IOMMU type 1 (Type 1) is supported
2024-12-03 13:22:39.088857  EAL: IOMMU type 7 (sPAPR) is not supported
2024-12-03 13:22:39.088877  EAL: IOMMU type 8 (No-IOMMU) is not supported
2024-12-03 13:22:39.088891  EAL: VFIO support initialized
2024-12-03 13:22:39.088905  EAL: Ask a virtual area of 0x2d2000 bytes
2024-12-03 13:22:39.088916  EAL: Virtual area found at 0x17000 (size = 
0x2d2000)
2024-12-03 13:22:39.090053  EAL: Setting up physically contiguous memory...
2024-12-03 13:22:39.090084  EAL: Setting maximum number of open files to 262144
2024-12-03 13:22:39.090107  EAL: Ask a virtual area of 0x301d000 bytes
2024-12-03 13:22:39.090126  EAL: Virtual area found at 0x1002d9000 (size = 
0x301d000)
2024-12-03 13:22:39.108399  EAL: Memseg list allocated at socket 0, page size 
0x4kB
2024-12-03 13:22:39.108495  EAL: Using memfd for anonymous memory
2024-12-03 13:22:39.108507  EAL: Ask a virtual area of 0xfff0 bytes
2024-12-03 13:22:39.108522  EAL: Virtual area found at 0x1032f6000 (size = 
0xfff0)
2024-12-03 13:22:39.108531  EAL: VA reserved for memseg list at 0x1032f6000, 
size fff0
2024-12-03 13:22:39.128891  EAL: Added 4095M to heap on socket 0
2024-12-03 13:22:39.629348  EAL: TSC frequency is ~1395 KHz
2024-12-03 13:22:39.630008  EAL: Main lcore 0 is ready 
(tid=7fee0d440900;cpuset=[0])
2024-12-03 13:22:39.630259  EAL: lcore 1 is ready (tid=7fee0c43c640;cpuset=[1])
2024-12-03 13:22:39.630336  EAL: lcore 2 is ready (tid=7fee0bc3b640;cpuset=[2])
2024-12-03 13:22:39.632917  EAL: lcore 3 is ready (tid=7fee03fff640;cpuset=[3])
2024-12-03 13:22:39.644341  TELEMETRY: No legacy callbacks, legacy socket not 
created
2024-12-03 13:22:39.647156  testpmd: No probed ethernet devices
Interactive-mode selected

2024-12-03 13:22:52.251198  testpmd: create a new mbuf pool : 
n=2048, size=2176, socket=0
2024-12-03 13:22:52.251320  testpmd: preferred mempool ops selected: ring_mp_mc
2024-12-03 13:22:52.251633  EAL: Error - exiting with code: 1
  Cause: 2024-12-03 13:22:52.251660  Creation of mbuf pool for socket 0 failed: 
No such file or directory

==

Comparing with running on a VM with boot args (default_hugepagesz=2M 
hugepagesz=1G hugepages=4) but still using --no-huge:
Note that this VM runs on the same machine as the above Linux test.

dpdk-testpmd -c 000F -n 2 --log-level=eal,8 --no-huge -m 4095 --no-pci -- -i 
--nb-cores=2 --total-num-mbufs=2048
==
2024-12-03 19:32:36.6