Re: [EXT] Re: [PATCH 1/2] config/arm: avoid mcpu and march conflicts

2024-01-29 Thread Juraj Linkeš
On Wed, Jan 24, 2024 at 4:22 PM Pavan Nikhilesh Bhagavatula
 wrote:
>
>
>
> > -Original Message-
> > From: Juraj Linkeš 
> > Sent: Monday, January 22, 2024 9:57 PM
> > To: Pavan Nikhilesh Bhagavatula 
> > Cc: Jerin Jacob ; ruifeng.w...@arm.com;
> > n...@arm.com; Bruce Richardson ;
> > dev@dpdk.org
> > Subject: Re: [EXT] Re: [PATCH 1/2] config/arm: avoid mcpu and march
> > conflicts
> >
> > On Mon, Jan 22, 2024 at 12:54 PM Pavan Nikhilesh Bhagavatula
> >  wrote:
> > >
> > > > On Sun, Jan 21, 2024 at 10:37 AM  wrote:
> > > > >
> > > > > From: Pavan Nikhilesh 
> > > > >
> > > > > The compiler options march and mtune are a subset
> > > > > of mcpu and will lead to conflicts if improper march
> > > > > is chosen for a given mcpu.
> > > > > To avoid conflicts, force part number march when
> > > > > mcpu is available and is supported by the compiler.
> > > > >
> > > > > Example:
> > > > > march = armv9-a
> > > > > mcpu = neoverse-n2
> > > > >
> > > > > mcpu supported, march supported
> > > > > machine_args = ['-mcpu=neoverse-n2', '-march=armv9-a']
> > > > >
> > > > > mcpu supported, march not supported
> > > > > machine_args = ['-mcpu=neoverse-n2']
> > > > >
> > > > > mcpu not supported, march supported
> > > > > machine_args = ['-march=armv9-a']
> > > > >
> > > > > mcpu not supported, march not supported
> > > > > machine_args = ['-march=armv8.6-a']
> > > > >
> > > > > Signed-off-by: Pavan Nikhilesh 
> > > > > ---
> > > > >  config/arm/meson.build | 109 +-
> > -
> > > > --
> > > > >  1 file changed, 67 insertions(+), 42 deletions(-)
> > > > >
> > > > > diff --git a/config/arm/meson.build b/config/arm/meson.build
> > > > > index 36f21d2259..8c8cfccca0 100644
> > > > > --- a/config/arm/meson.build
> > > > > +++ b/config/arm/meson.build
> > > > 
> > > > > @@ -127,21 +128,22 @@ implementer_cavium = {
> > > > >  ],
> > > > >  'part_number_config': {
> > > > >  '0xa1': {
> > > > > -'compiler_options': ['-mcpu=thunderxt88'],
> > > > > +'mcpu': 'thunderxt88',
> > > > >  'flags': flags_part_number_thunderx
> > > > >  },
> > > > >  '0xa2': {
> > > > > -'compiler_options': ['-mcpu=thunderxt81'],
> > > > > +'mcpu': 'thunderxt81',
> > > > >  'flags': flags_part_number_thunderx
> > > > >  },
> > > > >  '0xa3': {
> > > > > -'compiler_options': ['-march=armv8-a+crc', '-
> > mcpu=thunderxt83'],
> > > > > +'mcpu': 'thunderxt83',
> > > > > +'compiler_options': ['-march=armv8-a+crc'],
> > > >
> > > > Let's unify this with the rest and specify 'march': 'armv8-a+crc'
> > > > instead of having it under compiler_options.
> > >
> > > Ack.
> > >
> > > >
> > > > >  'flags': flags_part_number_thunderx
> > > > >  },
> > > > >  '0xaf': {
> > > > >  'march': 'armv8.1-a',
> > > > >  'march_features': ['crc', 'crypto'],
> > > > > -'compiler_options': ['-mcpu=thunderx2t99'],
> > > > > +'mcpu': 'thunderx2t99',
> > > > >  'flags': [
> > > > >  ['RTE_MACHINE', '"thunderx2"'],
> > > > >  ['RTE_ARM_FEATURE_ATOMICS', true],
> > > > > @@ -153,7 +155,7 @@ implementer_cavium = {
> > > > >  '0xb2': {
> > > > >  'march': 'armv8.2-a',
> > > > >  'march_features': ['crc', 'crypto', 'lse'],
> > > > > -'compiler_options': ['-mcpu=octeontx2'],
> > > > > +'mcpu': 'octeontx2',
> > > > >  'flags': [
> > > > >  ['RTE_MACHINE', '"cn9k"'],
> > > > >  ['RTE_ARM_FEATURE_ATOMICS', true],
> > > > > @@ -176,7 +178,7 @@ implementer_ampere = {
> > > > >  '0x0': {
> > > > >  'march': 'armv8-a',
> > > > >  'march_features': ['crc', 'crypto'],
> > > > > -'compiler_options':  ['-mtune=emag'],
> > > > > +'mcpu': 'emag',
> > > >
> > > > We're changing mtune to mcpu, is this equivalent?
> > > >
> > >
> > > Both march and mtune are a subset of mcpu.
> > >
> >
> > Sure, but we replaced '-mtune=emag' with '-mcpu=emag'. Are these two
> > builds going to be different or the same?
> >
>
> Yeah, I believe both are same.
>

Ok, the change is fine then.

> > > > >  'flags': [
> > > > >  ['RTE_MACHINE', '"eMAG"'],
> > > > >  ['RTE_MAX_LCORE', 32],
> > > > > @@ -186,7 +188,7 @@ implementer_ampere = {
> > > > >  '0xac3': {
> > > > >  'march': 'armv8.6-a',
> > > > >  'march_features': ['crc', 'crypto'],
> > > > > -'compiler_options':  ['-mcpu=ampere1'],
> > > > > +'mcpu': 'ampere1',
> > > > >  'flags': [
> > > > >  ['RTE_MACHINE', '"AmpereOne"'],
> > > > >  ['RTE_MAX_LCORE', 320],
> > > > > @@ -206,7 +208,7

[PATCH v2 0/2] fix secondary process PCI UIO resource problem

2024-01-29 Thread Chaoyong He
This patch series aims to fix some problems in secondary process PCI UIO
resource map logic.

Zerun Fu (2):
  bus/pci: fix secondary process PCI uio resource map problem
  bus/pci: fix secondary process save 'FD' problem

 drivers/bus/pci/linux/pci_uio.c  |   5 +-
 drivers/bus/pci/pci_common_uio.c | 102 +++
 2 files changed, 68 insertions(+), 39 deletions(-)

-- 
2.39.1



[PATCH v2 1/2] bus/pci: fix secondary process PCI uio resource map problem

2024-01-29 Thread Chaoyong He
From: Zerun Fu 

For the primary process, the logic loops all BARs and will skip
the map of BAR with an invalid physical address (0), also will
assign 'uio_res->nb_maps' with the real mapped BARs number. But
for the secondary process, instead of loops all BARs, the logic
using the 'uio_res->nb_map' as index. If the device uses continuous
BARs there will be no problem, whereas if it uses discrete BARs,
it will lead to mapping errors.

Fix this problem by also loops all BARs and skip the map of BAR
with an invalid physical address in secondary process.

Fixes: 9b957f378abf ("pci: merge uio functions for linux and bsd")
Cc: muk...@igel.co.jp
Cc: sta...@dpdk.org

Signed-off-by: Zerun Fu 
Reviewed-by: Chaoyong He 
Reviewed-by: Long Wu 
Reviewed-by: Peng Zhang 
---
 drivers/bus/pci/pci_common_uio.c | 94 
 1 file changed, 60 insertions(+), 34 deletions(-)

diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 76c661f054..fcd8a49daf 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -23,10 +23,57 @@ static struct rte_tailq_elem rte_uio_tailq = {
 };
 EAL_REGISTER_TAILQ(rte_uio_tailq)
 
+static int
+pci_uio_map_secondary_resource_by_index(struct rte_pci_device *dev,
+   int res_idx, struct mapped_pci_resource *uio_res, int map_idx)
+{
+   int fd, i;
+
+   if (map_idx >= uio_res->nb_maps)
+   return -1;
+
+   /*
+* open devname, to mmap it
+*/
+   fd = open(uio_res->maps[map_idx].path, O_RDWR);
+   if (fd < 0) {
+   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+   uio_res->maps[map_idx].path, strerror(errno));
+   return -1;
+   }
+
+   void *mapaddr = pci_map_resource(uio_res->maps[map_idx].addr,
+   fd, (off_t)uio_res->maps[map_idx].offset,
+   (size_t)uio_res->maps[map_idx].size, 0);
+
+   /* fd is not needed in secondary process, close it */
+   close(fd);
+   if (mapaddr != uio_res->maps[map_idx].addr) {
+   RTE_LOG(ERR, EAL,
+   "Cannot mmap device resource file %s to address: %p\n",
+   uio_res->maps[map_idx].path,
+   uio_res->maps[map_idx].addr);
+   if (mapaddr != NULL) {
+   /* unmap addrs correctly mapped */
+   for (i = 0; i < map_idx; i++)
+   pci_unmap_resource(
+   uio_res->maps[i].addr,
+   (size_t)uio_res->maps[i].size);
+   /* unmap addr wrongly mapped */
+   pci_unmap_resource(mapaddr,
+   (size_t)uio_res->maps[map_idx].size);
+   }
+   return -1;
+   }
+   dev->mem_resource[res_idx].addr = mapaddr;
+
+   return 0;
+}
+
 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
-   int fd, i, j;
+   int map_idx = 0, res_idx, ret;
struct mapped_pci_resource *uio_res;
struct mapped_pci_res_list *uio_res_list =
RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
@@ -37,41 +84,20 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
if (rte_pci_addr_cmp(&uio_res->pci_addr, &dev->addr))
continue;
 
-   for (i = 0; i != uio_res->nb_maps; i++) {
-   /*
-* open devname, to mmap it
-*/
-   fd = open(uio_res->maps[i].path, O_RDWR);
-   if (fd < 0) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-   uio_res->maps[i].path, strerror(errno));
-   return -1;
+   /* Map all BARs */
+   for (res_idx = 0; res_idx != PCI_MAX_RESOURCE; res_idx++) {
+/* skip empty BAR */
+   if (dev->mem_resource[res_idx].phys_addr == 0)
+   continue;
+
+   ret = pci_uio_map_secondary_resource_by_index(dev, 
res_idx,
+   uio_res, map_idx);
+   if (ret < 0) {
+   RTE_LOG(ERR, EAL, "Failed to map resources\n");
+   return ret;
}
 
-   void *mapaddr = pci_map_resource(uio_res->maps[i].addr,
-   fd, (off_t)uio_res->maps[i].offset,
-   (size_t)uio_res->maps[i].size, 0);
-
-   /* fd is not needed in secondary process, close it */
-   close(fd);
-   if (mapaddr != uio_res->maps[i].addr) {
-   RTE_LOG(ERR, EAL,
- 

[PATCH v2 2/2] bus/pci: fix secondary process save 'FD' problem

2024-01-29 Thread Chaoyong He
From: Zerun Fu 

In the previous logic the 'fd' was only saved in the primary process,
but for some devices this value is also used in the secondary logic.

For example, the call of 'rte_pci_find_ext_capability()' will fail in
the secondary process.

Fix this problem by getting and saving the value of 'fd' also in the
secondary process logic.

Fixes: 9b957f378abf ("pci: merge uio functions for linux and bsd")
Cc: muk...@igel.co.jp
Cc: sta...@dpdk.org

Signed-off-by: Zerun Fu 
Reviewed-by: Chaoyong He 
Reviewed-by: Long Wu 
Reviewed-by: Peng Zhang 
---
 drivers/bus/pci/linux/pci_uio.c  | 5 -
 drivers/bus/pci/pci_common_uio.c | 8 
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c
index 97d740dfe5..6680e42efb 100644
--- a/drivers/bus/pci/linux/pci_uio.c
+++ b/drivers/bus/pci/linux/pci_uio.c
@@ -237,7 +237,7 @@ pci_uio_alloc_resource(struct rte_pci_device *dev,
}
snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
 
-   /* save fd if in primary process */
+   /* save fd */
fd = open(devname, O_RDWR);
if (fd < 0) {
RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
@@ -261,6 +261,9 @@ pci_uio_alloc_resource(struct rte_pci_device *dev,
if (rte_intr_dev_fd_set(dev->intr_handle, uio_cfg_fd))
goto error;
 
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+   return 0;
+
if (dev->kdrv == RTE_PCI_KDRV_IGB_UIO) {
if (rte_intr_type_set(dev->intr_handle, RTE_INTR_HANDLE_UIO))
goto error;
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index fcd8a49daf..b6f79b067d 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -122,15 +122,15 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (rte_intr_dev_fd_set(dev->intr_handle, -1))
return -1;
 
-   /* secondary processes - use already recorded details */
-   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-   return pci_uio_map_secondary(dev);
-
/* allocate uio resource */
ret = pci_uio_alloc_resource(dev, &uio_res);
if (ret)
return ret;
 
+   /* secondary processes - use already recorded details */
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+   return pci_uio_map_secondary(dev);
+
/* Map all BARs */
for (i = 0; i != PCI_MAX_RESOURCE; i++) {
/* skip empty BAR */
-- 
2.39.1



Minutes of Technical Board meeting 24-January-2024

2024-01-29 Thread Kevin Traynor
Members Attending
-
Aaron Conole
Bruce Richardson
Hemant Agrawal
Honappa Nagarahalli
Jerin Jacob
Kevin Traynor (Chair)
Konstantin Ananyev
Maxime Coquelin
Morten Brorup
Stephen Hemminger
Thomas Monjalon

NOTE: The technical board meetings are on every second Wednesday at 3 pm
UTC.
Meetings are public, and DPDK community members are welcome to attend.
Agenda and minutes can be found at http://core.dpdk.org/techboard/minutes

Next meeting will be on Wednesday 2024-Feb-07 @ 3pm UTC, and will
be chaired by Konstantin.

Agenda Items


Arg parsing (Bruce)
---
- 3 series proposed as per last TB minutes
http://inbox.dpdk.org/dev/dbapr08mb581405e7cf23732b8941b10c98...@dbapr08mb5814.eurprd08.prod.outlook.com
- Discussion about these series and if merging it contradicts with
longer term function based initialization
- There is still a need to handle arguments for the foreseeable future
- Consensus for adding helpers to argparse lib that eal will depend on
- Drivers may use an extension of argparse library in time
- Call for patch reviews

3 year LTS maintenance? [Kevin]
---
- Current documented policy is for 2 years
- 19.11/20.11 LTS were maintained for 3 year and it went smoothly
- Proposal is to formally document 3 year LTS maintenance
- http://inbox.dpdk.org/dev/20240117162419.223820-1-ktray...@redhat.com
- Needs ack from LTS maintainers and companies involved in validation


Application logging [Thomas]

- Should applications use rte_logs or printf ?
- rte_log is useful because it can be enabled/disabled
- interactive items like testpmd stats are better suited to printf
- driver specific directories should use rte_log

gov/tech board meeting items [All]
--
- Thomas is starting investigation about DPDK with AI
- Discussed possibility of cloud companies publishing DPDK capability
and performance reports on dpdk website
- Some may prefer to publish on their own website
- If anyone has contacts or can help coordinate please reach out to Nathan



Re: [PATCH v3 0/2] ethdev: add the check for PTP capability

2024-01-29 Thread Ferruh Yigit
On 1/27/2024 1:52 AM, lihuisong (C) wrote:
> 
> 在 2024/1/27 0:54, Ferruh Yigit 写道:
>> On 1/11/2024 6:25 AM, lihuisong (C) wrote:
>>> Hi Ferruh,
>>>
>>> 在 2023/11/23 19:56, lihuisong (C) 写道:
 在 2023/11/2 7:39, Ferruh Yigit 写道:
> timesync_read_rx_timestamp
> On 9/21/2023 12:59 PM, lihuisong (C) wrote:
>> add ice & igc maintainers
>>
>> 在 2023/9/21 19:06, Ferruh Yigit 写道:
>>> On 9/21/2023 11:02 AM, lihuisong (C) wrote:
 Hi Ferruh,

 Sorry for my delay reply because of taking a look at all PMDs
 implementation.


 在 2023/9/16 1:46, Ferruh Yigit 写道:
> On 8/17/2023 9:42 AM, Huisong Li wrote:
>> From the first version of ptpclient, it seems that this
>> example
>> assume that
>> the PMDs support the PTP feature and enable PTP by default.
>> Please see
>> commit ab129e9065a5 ("examples/ptpclient: add minimal PTP
>> client")
>> which are introduced in 2015.
>>
>> And two years later, Rx HW timestamp offload was introduced to
>> enable or
>> disable PTP feature in HW via rte_eth_rxmode. Please see
>> commit 42ffc45aa340 ("ethdev: add Rx HW timestamp capability").
>>
> Hi Huisong,
>
> As far as I know this offload is not for PTP.
> PTP and TIMESTAMP are different.
 If TIMESTAMP offload cannot stand for PTP, we may need to add
 one new
 offlaod for PTP.

>>> Can you please detail what is "PTP offload"?
>>>
>> It indicates whether the device supports PTP or enable  PTP feature.
>>
> We have 'rte_eth_timesync_enable()' and 'rte_eth_timesync_disable()'
> APIs to control PTP support.
 No, this is just to control it.
 we still need to like a device capablity to report application if the
 port support to call this API, right?
> But when mention from "offload", it is something device itself does.
>
> PTP is a protocol (IEEE 1588), and used to synchronize clocks.
> What I get is protocol can be parsed by networking stack and it can be
> used by application to synchronize clock.
>
> When you are refer to "PTP offload", does it mean device (NIC)
> understands the protocol and parse it to synchronize device clock with
> other devices?
 Good point. PTP offload is unreasonable.
 But the capablity is required indeed.
 What do you think of introducing a RTE_ETH_DEV_PTP in
 dev->data->dev_flags for PTP feature?
>>> Can you take a look at this discussion line again?
>>>
>>> Let's introduce a  RTE_ETH_DEV_PTP in dev->data->dev_flags to reveal if
>>> the device support PTP feature.
>>>
> Hi Ferruh,
> 
> Thanks for taking your time to reply.
> 
>> Hi Huisong,
>>
>> First let me try to summarize the discussion since it has been some time.
>>
>> HW timer block can be used for Rx timestamp offload
>> (RTE_ETH_RX_OFFLOAD_TIMESTAMP) and/or PTP support, but they are two
>> different things.
>>
>> This patch uses 'RTE_ETH_RX_OFFLOAD_TIMESTAMP' capability for PTP
>> support, which is wrong.
>>
> correct.
>>
>> After we agreed on above, your next question is to use 'dev_flag' to
>> report PTP capability.
>>
>> First, can you please describe what is the motivation to learn if HW
>> supports PTP or now, what is the benefit of knowing this.
> There are a couple of device which have the same driver on a platform or
> OS.
> But just allow one device to support or be responsible for PTP feature.
> The firmware will report a capability to tell the device if it is
> support PTP.
> But, currently, driver doesn't know how to report user which device
> support PTP feature.
> 

Driver can hold a private data that records if PTP supported by the
device or not, and according this value PTP dev_ops can return error or
success.

This may not be ideal but it lets user to know about the support status,
can this work?


> In addition, many drivers use RTE_LIBRTE_IEEE1588 to control PTP code flow.
> Whether the device supports PTP is irrelevant to this macro.
>

Yes, I guess because both features use same HW, there is confusion there.

>>
>> If we agree that there is a need to know the PTP capability, question is
>> where to report this capability.
>>
>> Suggested 'dev_flags' is used for various things, some are internal
>> flags and some are status, I don't think overloading this variable is
>> not good idea.
> Yes, we need to consider  carefully.
>>
>> Other option is an update 'rte_eth_dev_info_get()' for it but it has the
>> same problem, this function is already overloaded with many different
>> things.
>>
>> We can have an API just to get PTP capability, but this will require a
>> new dev_ops, this can be an option but my concern is having a capability
>> dev_ops for each feature create a mess in dev_ops.
>>
>> Perhaps we can have single get_capability() API, and it gets features as
>> flags to return if th

Re: [PATCH 1/4] dts: constrain DPDK source flag

2024-01-29 Thread Juraj Linkeš
On Mon, Jan 22, 2024 at 7:26 PM Luca Vizzarro  wrote:
>
> DTS needs an input to gather the DPDK source code from. This is then
> built on the remote target. This commit makes sure that this input is
> more constrained, separating the Git revision ID – used to create a
> tarball using Git – and providing tarballed source code directly, while
> retaining mutual exclusion.
>

I didn't see the mutual exclusion being enforced in the code. From
what I can tell, I could pass both --tarball FILEPATH and --revision
and the former would be used without checking the latter.

> This makes the code more readable and easier to handle for input
> validation, of which this commit introduces a basic one based on the
> pre-existing code.
>

I wanted to have just one argument for basically the same thing, but
the input is pretty different so a two argument solution actually
sounds better.

> Moreover it ensures that these flags are explicitly required to be set
> by the user, dropping a default value. It also aids the user understand
> how to use the DTS in the scenario it is ran without any arguments set.
>
> Reviewed-by: Paul Szczepanek 
> Signed-off-by: Luca Vizzarro 
> ---
>  doc/guides/tools/dts.rst  |  8 +++--
>  dts/framework/settings.py | 64 ---
>  dts/framework/utils.py| 43 --
>  3 files changed, 79 insertions(+), 36 deletions(-)
>
> diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
> index 846696e14e..6e2da317e8 100644
> --- a/doc/guides/tools/dts.rst
> +++ b/doc/guides/tools/dts.rst
> @@ -215,12 +215,16 @@ DTS is run with ``main.py`` located in the ``dts`` 
> directory after entering Poet
>  .. code-block:: console
>
> (dts-py3.10) $ ./main.py --help
> -   usage: main.py [-h] [--config-file CONFIG_FILE] [--output-dir OUTPUT_DIR] 
> [-t TIMEOUT] [-v] [-s] [--tarball TARBALL] [--compile-timeout 
> COMPILE_TIMEOUT] [--test-cases TEST_CASES] [--re-run RE_RUN]
> +   usage: main.py [-h] (--tarball FILEPATH | --revision ID) [--config-file 
> CONFIG_FILE] [--output-dir OUTPUT_DIR] [-t TIMEOUT] [-v] [-s] 
> [--compile-timeout COMPILE_TIMEOUT] [--test-cases TEST_CASES] [--re-run 
> RE_RUN]
>
> Run DPDK test suites. All options may be specified with the environment 
> variables provided in brackets. Command line arguments have higher priority.
>
> options:
> -h, --helpshow this help message and exit
> +   --tarball FILEPATH, --snapshot FILEPATH
> + Path to DPDK source code tarball to test. (default: 
> None)
> +   --revision ID, --rev ID, --git-ref ID
> + Git revision ID to test. Could be commit, tag, tree 
> ID and vice versa. To test local changes, first commit them, then use their 
> commit ID (default: None)
> --config-file CONFIG_FILE
>   [DTS_CFG_FILE] configuration file that describes 
> the test cases, SUTs and targets. (default: ./conf.yaml)
> --output-dir OUTPUT_DIR, --output OUTPUT_DIR
> @@ -229,8 +233,6 @@ DTS is run with ``main.py`` located in the ``dts`` 
> directory after entering Poet
>   [DTS_TIMEOUT] The default timeout for all DTS 
> operations except for compiling DPDK. (default: 15)
> -v, --verbose [DTS_VERBOSE] Specify to enable verbose output, 
> logging all messages to the console. (default: False)
> -s, --skip-setup  [DTS_SKIP_SETUP] Specify to skip all setup steps on 
> SUT and TG nodes. (default: None)
> -   --tarball TARBALL, --snapshot TARBALL, --git-ref TARBALL
> - [DTS_DPDK_TARBALL] Path to DPDK source code tarball 
> or a git commit ID, tag ID or tree ID to test. To test local changes, first 
> commit them, then use the commit ID with this option. (default: dpdk.tar.xz)
> --compile-timeout COMPILE_TIMEOUT
>   [DTS_COMPILE_TIMEOUT] The timeout for compiling 
> DPDK. (default: 1200)
> --test-cases TEST_CASES
> diff --git a/dts/framework/settings.py b/dts/framework/settings.py
> index 609c8d0e62..2d0365e763 100644
> --- a/dts/framework/settings.py
> +++ b/dts/framework/settings.py
> @@ -76,7 +76,8 @@
>  from pathlib import Path
>  from typing import Any, TypeVar
>
> -from .utils import DPDKGitTarball
> +from .exception import ConfigurationError
> +from .utils import DPDKGitTarball, get_commit_id
>
>  _T = TypeVar("_T")
>
> @@ -149,6 +150,26 @@ def __call__(
>  return _EnvironmentArgument
>
>
> +def _parse_tarball(filepath: str) -> Path:
> +"""Validate if the filepath is valid and return a Path object."""

whether `filepath` is valid

Even though private methods don't get included in the API docs, I like
to be consistent. In this case, it doesn't really detract (but maybe
some disability would prove this wrong) while adding a bit (the fact
that we're referencing the argument).

I think the name should either be _validate_tarball or
_parse_tarball_path. The argument name is two words, so let's put an
undersco

Re: [EXT] Re: [PATCH] app/testpmd: add command to get Tx queue used count

2024-01-29 Thread Ferruh Yigit
On 1/29/2024 6:37 AM, Satha Koteswara Rao Kottidi wrote:
>> 
>> --
>> Please doc this command in doc/guides/testpmd_app_ug/testpmd_funcs.rst
>> 
>
> Thanks, will update in next version
>
>> 
>> Also why not extend "show port rxq xxx" command to support txq ?
> 
> txq and rxq are different directions, so  extended "show port " command to 
> support txq similar to rxq command. Could you please give more details if I 
> missed something here.
> 

There is an existing "show port  rxq  desc used
count" command, which is for Rx.

As you are adding support for Tx, instead of adding it as a new command,
existing 'cmd_show_rx_queue_desc_used_count_parsed()' can be extended to
support both Rx and Tx.
You can check 'cmd_show_rx_tx_desc_status_parsed()' as sample.

This also helps to have a unified syntax for Rx and Tx, as your version
is slightly diverging from the Rx one.


And please update 'cmd_help_long_parsed()' help string with relevant change.



RE: [EXT] Re: [PATCH] app/testpmd: add command to get Tx queue used count

2024-01-29 Thread Satha Koteswara Rao Kottidi


>> 
>> -
>> - Please doc this command in 
>> doc/guides/testpmd_app_ug/testpmd_funcs.rst
>> 
>
> Thanks, will update in next version
>
>> 
>> Also why not extend "show port rxq xxx" command to support txq ?
> 
> txq and rxq are different directions, so  extended "show port " command to 
> support txq similar to rxq command. Could you please give more details if I 
> missed something here.
> 

There is an existing "show port  rxq  desc used count" 
command, which is for Rx.

As you are adding support for Tx, instead of adding it as a new command, 
existing 'cmd_show_rx_queue_desc_used_count_parsed()' can be extended to 
support both Rx and Tx.
You can check 'cmd_show_rx_tx_desc_status_parsed()' as sample.

This also helps to have a unified syntax for Rx and Tx, as your version is 
slightly diverging from the Rx one.


And please update 'cmd_help_long_parsed()' help string with relevant change.

>>> Thanks Ferruh, will update unified command for Rx/Tx in next version.




RE: [PATCH v2 00/23] net/mlx5: support Geneve and options for HWS

2024-01-29 Thread Raslan Darawsheh
Hi,


> -Original Message-
> From: Michael Baum 
> Sent: Thursday, January 25, 2024 3:30 PM
> To: dev@dpdk.org
> Cc: Matan Azrad ; Raslan Darawsheh
> ; Dariusz Sosnowski ; Slava
> Ovsiienko ; Ori Kam ;
> Suanming Mou 
> Subject: [PATCH v2 00/23] net/mlx5: support Geneve and options for HWS
> 
> Add HWS support for both GENEVE and GENEVE TLV option headers.
> This patchset supports:
> 
>  - Add HW support for "RTE_FLOW_ITEM_TYPE_GENEVE" flow item.
>  - Add HW support for "RTE_FLOW_ITEM_TYPE_GENEVE_OPT" flow item.
>  - Add HW support for "RTE_FLOW_FIELD_GENEVE_VNI" for modify field flow
>action.
>  - Add HW support for "RTE_FLOW_FIELD_GENEVE_OPT_TYPE" for modify
> field
>flow action.
>  - Add HW support for "RTE_FLOW_FIELD_GENEVE_OPT_CLASS" for modify
> field
>flow action.
>  - Add HW support for "RTE_FLOW_FIELD_GENEVE_OPT_DATA" for modify
> field
>flow action.
> 
> The GENEVE TLV options support using flex parser.
> The profile should be specified to either 8 for multiple option or 0 for 
> single
> option.
> A new API is added to create the GENEVE option parser before using it in
> templates API.
> 
> v2:
>  - Rebase.
>  - Add "Acked-by" from v1.
> 
> Alex Vesker (4):
>   net/mlx5/hws: fix tunnel protocol checks
>   net/mlx5/hws: increase hl size for future compatibility
>   net/mlx5/hws: support GENEVE matching
>   net/mlx5/hws: support GENEVE options header
> 
> Michael Baum (19):
>   common/mlx5: fix duplicate read of general capabilities
>   common/mlx5: fix query sample info capability
>   net/mlx5: remove GENEVE options length limitation
>   net/mlx5: fix GENEVE option item translation
>   common/mlx5: add system image GUID attribute
>   common/mlx5: add GENEVE TLV option attribute structure
>   common/mlx5: add PRM attribute for TLV sample
>   common/mlx5: add sample info query syndrome into error log
>   common/mlx5: query GENEVE option sample ID from HCA attr
>   common/mlx5: add function to query GENEVE TLV option
>   net/mlx5: add physical device handle
>   net/mlx5: add GENEVE TLV options parser API
>   net/mlx5: add API to expose GENEVE option FW information
>   net/mlx5: add testpmd support for GENEVE TLV parser
>   net/mlx5: add support for GENEVE and option item in HWS
>   net/mlx5: add GENEVE option support for profile 0
>   net/mlx5: add GENEVE option support for group 0
>   net/mlx5: add support for GENEVE VNI modify field
>   net/mlx5: add support for modify GENEVE option header
> 
>  doc/guides/nics/mlx5.rst   |  251 +-
>  doc/guides/platform/mlx5.rst   |2 +
>  doc/guides/rel_notes/release_24_03.rst |9 +
>  drivers/common/mlx5/mlx5_devx_cmds.c   |  139 +++-
>  drivers/common/mlx5/mlx5_devx_cmds.h   |   29 +-
>  drivers/common/mlx5/mlx5_prm.h |   20 +-
>  drivers/common/mlx5/version.map|1 +
>  drivers/net/mlx5/hws/mlx5dr_definer.c  |  281 ++-
>  drivers/net/mlx5/hws/mlx5dr_definer.h  |   49 +-
>  drivers/net/mlx5/meson.build   |1 +
>  drivers/net/mlx5/mlx5.c|  115 ++-
>  drivers/net/mlx5/mlx5.h|   21 +
>  drivers/net/mlx5/mlx5_flow.c   |   30 +
>  drivers/net/mlx5/mlx5_flow.h   |   92 ++-
>  drivers/net/mlx5/mlx5_flow_dv.c|  158 ++--
>  drivers/net/mlx5/mlx5_flow_geneve.c| 1011
> 
>  drivers/net/mlx5/mlx5_flow_hw.c|  127 ++-
>  drivers/net/mlx5/mlx5_testpmd.c|  556 -
>  drivers/net/mlx5/rte_pmd_mlx5.h|  102 +++
>  drivers/net/mlx5/version.map   |3 +
>  20 files changed, 2811 insertions(+), 186 deletions(-)  create mode 100644
> drivers/net/mlx5/mlx5_flow_geneve.c
> 
> --
> 2.25.1
Series applied to next-net-mlx,
Kindest regards
Raslan Darawsheh



[PATCH] meson: link static libs with whole-archive in subproject

2024-01-29 Thread Robin Jarry
When DPDK is used as a subproject, declare static libs to be linked with
-Wl,--whole-archive along with the drivers. This is already done this
way in pkg-config.

Fixes: f93a605f2d6e ("build: add definitions for use as Meson subproject")
Cc: sta...@dpdk.org

Signed-off-by: Robin Jarry 
---
 buildtools/subproject/meson.build | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/buildtools/subproject/meson.build 
b/buildtools/subproject/meson.build
index 985ce76a9384..aa28f5fae152 100644
--- a/buildtools/subproject/meson.build
+++ b/buildtools/subproject/meson.build
@@ -8,8 +8,7 @@ if get_option('default_library') == 'static'
 dependencies: dpdk_static_lib_deps,
 # static library deps in DPDK build don't include "link_with" 
parameters,
 # so explicitly link-in both libs and drivers
-link_with: dpdk_static_libraries,
-link_whole: dpdk_drivers,
+link_whole: dpdk_static_libraries + dpdk_drivers,
 link_args: dpdk_extra_ldflags)
 else
 dpdk_dep = declare_dependency(
-- 
2.43.0



RE: [RFC] service: extend service function call statistics

2024-01-29 Thread Van Haaren, Harry
> -Original Message-
> From: Mattias Rönnblom 
> Sent: Saturday, January 27, 2024 7:32 PM
> To: Morten Brørup ; Mattias Rönnblom
> ; dev@dpdk.org
> Cc: Van Haaren, Harry ; Stefan Sundkvist
> 
> Subject: Re: [RFC] service: extend service function call statistics

Hi Mattias,

Thanks for the patch, looks good to me! Some responses to discussion below;

Acked-by: Harry van Haaren 

> On 2024-01-26 11:07, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> >> Sent: Friday, 26 January 2024 09.28
> >>
> >> On 2024-01-26 00:19, Morten Brørup wrote:
>  From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
>  Sent: Thursday, 25 January 2024 20.15
> 
>  Add two new per-service counters.
> 
>  RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service
> >> function
>  invocations where no work was performed.
> 
>  RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number invocations
>  resulting in an error.
> 
>  The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
>  counting all invocations, regardless of return value).
> 
>  The new statistics may be useful for both debugging and profiling
>  (e.g., calculate the average per-call processing latency for non-
> >> idle
>  service calls).
> 
>  Service core tests are extended to cover the new counters, and
>  coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.
> >>>
> >>> OK to all of the above. Good stuff.
> >>>
> 
>  The documentation for the CYCLES attributes are updated to reflect
>  their actual semantics.
> >>>
> >>> If this is intended behavior, then updating the documentation seems
> >> appropriate - I would even go so far as considering it a bug fix.
> >>>
> >>> However, quite a few cycles may be consumed by a service before it
> >> can conclude that it had no work to do. Shouldn't that be considered
> >> time spent by the service? I.e. should the code be fixed instead of the
> >> documentation?
> >>>
> >>
> >> Generally, polling for new work in the service is cheap, in my
> >> experience. But there's nothing in principle that prevents the
> >> situation
> >> your describe from occurring. You could add an "IDLE_CYCLES" counter in
> >> case that would ever be a real-world problem.
> >>
> >> That wouldn't be a fix, but rather just returning to the old, subtly
> >> broken, (pre-22.11?) semantics.
> >>
> >> Have a look at 809bd24 to see the rationale for the change. There's an
> >> example in 4689c57.
> >>
> >> The cause of this ambiguity is due to the fact that the platform/OS
> >> (i.e., DPDK) doesn't know which service to "wake up" (which in turn is
> >> the result of the platform not being able to tracking input sources,
> >> like NIC RX queues or timer wheels) and thus must ask every service to
> >> check if it has something to do.
> >
> > OK. Makes good sense.
> > So definitely fix the documentation, not the code. :-)

Agreed.

> >>
> >>> Alternatively, keep the behavior (for backwards compatibility) and
> >> fix the documentation, as this patch does, and add an IDLE_CYCLES
> >> counter for time spent in idle calls.
> >>>
> >>> PS: We're not using DPDK service cores in our applications, so I'm
> >> not familiar with the details. We are using something somewhat similar
> >> (but homegrown), also for profiling and power management purposes, and
> >> my feedback is based on my experience with our own variant of service
> >> cores.
> >>>
> >>
> >> When are you making the switch to service cores? :)
> >
> > Our own "service cores" implementation has some slightly different 
> > properties,
> which we are still experimenting with.
> >
> > E.g. in addition to the special return value "idle (no work to do)", we 
> > also have a
> special return value for "incomplete (more work urgently pending)" when a 
> service
> processed a full burst and still has more work pending its input queue.
> >
> > We are also considering returning a value to inform what time it needs to 
> > be called
> again. This concept is only an idea, and we haven't started experimenting 
> with it yet.
> >
> >
> >  From a high level perspective, the service cores library is much like an 
> > operating
> system's CPU scheduler, although based on voluntary time sharing. Many 
> algorithms
> and many parameters can be considered. It can also tie into power management 
> and
> prioritization of different tasks.

This was brought up as a concern when initially adding it to DPDK; the scope of 
service-cores
is quite easily going to change from "way to run a dataplane function on a 
core, to abstract
away the different environments that DPDK runs" to "userspace scheduler with 
bells-and-whistles".

The reason service-cores was built was to allow applications run with HW & SW 
eventdev PMDs,
and not have to handle the different threading requirements at the application 
level. This goal is
achieved by the current service-cores infrastructure.


> Servi

Re: [PATCH] meson: link static libs with whole-archive in subproject

2024-01-29 Thread David Marchand
On Mon, Jan 29, 2024 at 1:47 PM Robin Jarry  wrote:
>
> When DPDK is used as a subproject, declare static libs to be linked with
> -Wl,--whole-archive along with the drivers. This is already done this
> way in pkg-config.
>
> Fixes: f93a605f2d6e ("build: add definitions for use as Meson subproject")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Robin Jarry 
Tested-by: David Marchand 

-- 
David Marchand



Re: [PATCH 2/4] dts: customise argparse error message

2024-01-29 Thread Juraj Linkeš
On Mon, Jan 22, 2024 at 7:26 PM Luca Vizzarro  wrote:
>
> This commit customises the arguments parsing class' error message,
> making it so the confusing usage is not displayed in these occurrences,

I'm curious, what exactly is confusing about the message?

> but the user is redirected to use the help argument instead.
>
> Reviewed-by: Paul Szczepanek 
> Signed-off-by: Luca Vizzarro 
> ---
>  dts/framework/settings.py | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/dts/framework/settings.py b/dts/framework/settings.py
> index 2d0365e763..acfe5cad44 100644
> --- a/dts/framework/settings.py
> +++ b/dts/framework/settings.py
> @@ -170,6 +170,15 @@ def _parse_revision_id(rev_id: str) -> str:
>  )
>
>
> +class ArgumentParser(argparse.ArgumentParser):
> +"""ArgumentParser with a custom error message."""
> +def error(self, message):
> +print(f"{self.prog}: error: {message}\n", file=sys.stderr)
> +self.exit(2,
> +  "For help and usage, "
> +  "run the command with the --help flag.\n")
> +
> +
>  @dataclass(slots=True)
>  class Settings:
>  """Default framework-wide user settings.
> @@ -200,8 +209,8 @@ class Settings:
>  SETTINGS: Settings = Settings()
>
>
> -def _get_parser() -> argparse.ArgumentParser:
> -parser = argparse.ArgumentParser(
> +def _get_parser() -> ArgumentParser:
> +parser = ArgumentParser(
>  description="Run DPDK test suites. All options may be specified with 
> the environment "
>  "variables provided in brackets. Command line arguments have higher 
> priority.",
>  formatter_class=argparse.ArgumentDefaultsHelpFormatter,
> --
> 2.34.1
>


Re: [PATCH 4/4] dts: log stderr with failed remote commands

2024-01-29 Thread Juraj Linkeš
On Mon, Jan 22, 2024 at 7:26 PM Luca Vizzarro  wrote:
>
> Add the executed command stderr to RemoteCommandExecutionError. So that
> it is logged as an error, instead of just as debug.

Here's I'd add logged additionally as an error, as this sounds as if
we're changing debug to error.

>
> Reviewed-by: Paul Szczepanek 
> Signed-off-by: Luca Vizzarro 
> ---
>  dts/framework/exception.py | 10 +++---
>  dts/framework/remote_session/remote_session.py |  2 +-
>  2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/dts/framework/exception.py b/dts/framework/exception.py
> index 658eee2c38..9fc3fa096a 100644
> --- a/dts/framework/exception.py
> +++ b/dts/framework/exception.py
> @@ -130,20 +130,24 @@ class RemoteCommandExecutionError(DTSError):
>  #: The executed command.
>  command: str
>  _command_return_code: int
> +_command_stderr: str
>

I'd change the order here (and all other places) so that stderr is
before the return code.

> -def __init__(self, command: str, command_return_code: int):
> +def __init__(self, command: str, command_return_code: int, 
> command_stderr: str):
>  """Define the meaning of the first two arguments.
>
>  Args:
>  command: The executed command.
>  command_return_code: The return code of the executed command.
> +command_stderr: The stderr of the executed command.
>  """
>  self.command = command
>  self._command_return_code = command_return_code
> +self._command_stderr = command_stderr
>
>  def __str__(self) -> str:
> -"""Include both the command and return code in the string 
> representation."""
> -return f"Command {self.command} returned a non-zero exit code: 
> {self._command_return_code}"
> +"""Include the command, its return code and stderr in the string 
> representation."""
> +return (f"Command '{self.command}' returned a non-zero exit code: "
> +f"{self._command_return_code}\n{self._command_stderr}")

We should mention that the last string is the stderr output. Maybe we
just add 'Stderr:' before {self._command_stderr}. And maybe we should
put quotes around {self._command_stderr}.

>
>
>  class RemoteDirectoryExistsError(DTSError):
> diff --git a/dts/framework/remote_session/remote_session.py 
> b/dts/framework/remote_session/remote_session.py
> index 2059f9a981..345439f2de 100644
> --- a/dts/framework/remote_session/remote_session.py
> +++ b/dts/framework/remote_session/remote_session.py
> @@ -157,7 +157,7 @@ def send_command(
>  )
>  self._logger.debug(f"stdout: '{result.stdout}'")
>  self._logger.debug(f"stderr: '{result.stderr}'")
> -raise RemoteCommandExecutionError(command, result.return_code)
> +raise RemoteCommandExecutionError(command, result.return_code, 
> result.stderr)
>  self._logger.debug(f"Received from '{command}':\n{result}")
>  self.history.append(result)
>  return result
> --
> 2.34.1
>


Re: [PATCH] meson: link static libs with whole-archive in subproject

2024-01-29 Thread Bruce Richardson
On Mon, Jan 29, 2024 at 01:55:36PM +0100, David Marchand wrote:
> On Mon, Jan 29, 2024 at 1:47 PM Robin Jarry  wrote:
> >
> > When DPDK is used as a subproject, declare static libs to be linked with
> > -Wl,--whole-archive along with the drivers. This is already done this
> > way in pkg-config.
> >
> > Fixes: f93a605f2d6e ("build: add definitions for use as Meson subproject")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Robin Jarry 
> Tested-by: David Marchand 
> 
Acked-by: Bruce Richardson 


RE: [PATCH v1] net/mlx5: fix secondary process query stats segfault

2024-01-29 Thread Raslan Darawsheh
Hi,
> -Original Message-
> From: Rongwei Liu 
> Sent: Monday, January 22, 2024 9:44 AM
> To: dev@dpdk.org; Matan Azrad ; Slava Ovsiienko
> ; Ori Kam ; Suanming Mou
> ; NBU-Contact-Thomas Monjalon (EXTERNAL)
> 
> Cc: sta...@dpdk.org; Anatoly Burakov 
> Subject: [PATCH v1] net/mlx5: fix secondary process query stats segfault
> 
> The "outer_of_buffer" counter is owned by the primary process devx object
> and it is pointer by pointer in mlx5_priv structure. Actually, there are 4 
> levels'
> pointers in this piece of code.
> 
> The secondary process can't access this part directly since it belongs to 
> another
> process' heap.
[Fixed] typo = process' => process's 
> 
> Return ENOTSUP as workaround.
> 
> Signed-off-by: Rongwei Liu 
> Acked-by: Matan Azrad 
> Fixes: 750e48c7d ("common/mlx5: add DevX commands for queue counters")
> Cc: ma...@nvidia.com
> Cc: sta...@dpdk.org[
[Fixed] fixes tag and CC should be before signed-off by tag.

Patch applied to next-net-mlx,

Kindest regards
Raslan Darawsheh



RE: [RFC] ethdev: fast path async flow API

2024-01-29 Thread Dariusz Sosnowski
Hi all,

> -Original Message-
> From: Dariusz Sosnowski 
> Sent: Tuesday, January 23, 2024 12:37
> To: Stephen Hemminger 
> Cc: NBU-Contact-Thomas Monjalon (EXTERNAL) ;
> Ferruh Yigit ; Andrew Rybchenko
> ; Ori Kam ;
> dev@dpdk.org
> Subject: RE: [RFC] ethdev: fast path async flow API
> 
> Hi Stephen,
> 
> Sorry for such a late response.
> 
> From: Stephen Hemminger 
> Sent: Thursday, January 4, 2024 02:08
> > On Wed, 3 Jan 2024 19:14:49 +
> > Dariusz Sosnowski  wrote:
> > > In summary, in my opinion extending the async flow API with bulking
> > capabilities or exposing the queue directly to the application is not 
> > desirable.
> > > This proposal aims to reduce the I-cache overhead in async flow API
> > > by
> > reusing the existing design pattern in DPDK - fast path functions are
> > inlined to the application code and they call cached PMD callbacks.
> >
> > Inline needs to more discouraged in DPDK, because it only works if
> > application ends up building with DPDK from source.
> > It doesn't work for the Linux distro packaging model and symbol
> > versioning, etc.
> 
> I understand the problem. In that case, I have a proposal.
> I had a chat with Thomas regarding this RFC, and he noticed that there are 2
> separate changes proposed here:
> 
> 1. Per-port callbacks for async flow API.
> - Moves specified callbacks to rte_flow_fp_ops struct and allow PMDs to
> register them dynamically.
> - Removes indirection at API level - no need to call rte_flow_ops_get().
> - Removes checking if callbacks are defined - either the default DPDK 
> callback
> is used or the one provided by PMD.
> 2. Make async flow API functions inlineable.
> 
> Change (1) won't break ABI (existing callbacks in rte_flow_ops can be marked
> as deprecated for now and phased out later) and in my opinion is less
> controversial compared to change (2).
> 
> I'll rerun the benchmarks for both changes separately to compare their
> benefits in isolation.
> This would allow us to decide if change (2) is worth the drawbacks it
> introduces.
> 
> What do you think?

I split the RFC into 2 parts:

1. Introduce per-port callbacks:
- Introduce rte_flow_fp_ops struct - holds callbacks for fast path calls, 
for each port. PMD registers callbacks through rte_flow_fp_ops_register().
- Relevant rte_flow_async_* functions just pass arguments to fast path 
callbacks. Validation checks are done only if RTE_FLOW_DEBUG macro is defined.
- Biggest difference is the removal of callback access through 
rte_flow_get_ops().
2. Inline async flow API functions.
- Relevant rte_flow_async_* functions definitions are moved to rte_flow.h 
and made inlineable.

Here are the results of the benchmark:

|   | Avg Insertion | Diff over baseline | Avg Deletion | 
Diff over baseline |
|---|---||--||
| baseline (v24.03-rc0) | 5855.4||9026.8|   
 |
| applied (1)   | 6384.2|+528.8 (+9%)|10054.2   |  
+1027.4 (+11.4%)  |
| applied (2)   | 6434.6|   +579.2 (+9.9%)   |10011.4   |   
+984.6 (+10.9%)  |

Results are in KFlows/sec.
The benchmark is continuously inserting and deleting 2M flow rules.
These rules match on IPv4 destination address and with a single action DROP.
Flow rules are inserted and deleted using a single flow queue.

Change (2) improves insertion rate performance by ~1% compared to (1), but 
decreases deletion rate by ~0.5%.
Based on these results, I think we can say that making rte_flow_async_*() calls 
inlineable may not be worth it compared to the issues it causes.

What do you all think about the results?

Best regards,
Dariusz Sosnowski
 


[PATCH v3 0/2] net/mlx5: add random compare support

2024-01-29 Thread Michael Baum
Add support for compare item with "RTE_FLOW_FIELD_RANDOM".

Depends-on: series-30606 ("ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE")

v2:
 - Rebase.
 - Add "RTE_FLOW_FIELD_META" compare support.
 - Reduce the "Depends-on" list.

v3:
 - Rebase.
 - Fix typo in function name, r/tranlate/translate.
 - Fix adding a line without newline at end of file.

Hamdan Igbaria (1):
  net/mlx5/hws: add support for compare matcher

Michael Baum (1):
  net/mlx5: add support to compare random value

 drivers/common/mlx5/mlx5_prm.h|  16 ++
 drivers/net/mlx5/hws/mlx5dr_cmd.c |   9 +-
 drivers/net/mlx5/hws/mlx5dr_cmd.h |   1 +
 drivers/net/mlx5/hws/mlx5dr_debug.c   |   4 +-
 drivers/net/mlx5/hws/mlx5dr_debug.h   |   1 +
 drivers/net/mlx5/hws/mlx5dr_definer.c | 243 +-
 drivers/net/mlx5/hws/mlx5dr_definer.h |  33 
 drivers/net/mlx5/hws/mlx5dr_matcher.c |  48 +
 drivers/net/mlx5/hws/mlx5dr_matcher.h |  12 +-
 drivers/net/mlx5/mlx5_flow_hw.c   |  70 ++--
 10 files changed, 410 insertions(+), 27 deletions(-)

-- 
2.25.1



[PATCH v3 1/2] net/mlx5/hws: add support for compare matcher

2024-01-29 Thread Michael Baum
From: Hamdan Igbaria 

Add support for compare matcher, this matcher will allow
direct comparison between two packet fields, or a packet
field and a value, with fully masked DW.
For now this matcher hash table is limited to size 1x1,
thus it supports only 1 rule STE.

Signed-off-by: Hamdan Igbaria 
Signed-off-by: Michael Baum 
---
 drivers/common/mlx5/mlx5_prm.h|  16 ++
 drivers/net/mlx5/hws/mlx5dr_cmd.c |   9 +-
 drivers/net/mlx5/hws/mlx5dr_cmd.h |   1 +
 drivers/net/mlx5/hws/mlx5dr_debug.c   |   4 +-
 drivers/net/mlx5/hws/mlx5dr_debug.h   |   1 +
 drivers/net/mlx5/hws/mlx5dr_definer.c | 243 +-
 drivers/net/mlx5/hws/mlx5dr_definer.h |  33 
 drivers/net/mlx5/hws/mlx5dr_matcher.c |  48 +
 drivers/net/mlx5/hws/mlx5dr_matcher.h |  12 +-
 9 files changed, 358 insertions(+), 9 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index f64f25dbb7..c8477658d7 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3455,6 +3455,7 @@ enum mlx5_ifc_rtc_ste_format {
MLX5_IFC_RTC_STE_FORMAT_8DW = 0x4,
MLX5_IFC_RTC_STE_FORMAT_11DW = 0x5,
MLX5_IFC_RTC_STE_FORMAT_RANGE = 0x7,
+   MLX5_IFC_RTC_STE_FORMAT_4DW_RANGE = 0x8,
 };
 
 enum mlx5_ifc_rtc_reparse_mode {
@@ -3493,6 +3494,21 @@ struct mlx5_ifc_rtc_bits {
u8 reserved_at_1a0[0x260];
 };
 
+struct mlx5_ifc_ste_match_4dw_range_ctrl_dw_bits {
+   u8 match[0x1];
+   u8 reserved_at_1[0x2];
+   u8 base1[0x1];
+   u8 inverse1[0x1];
+   u8 reserved_at_5[0x1];
+   u8 operator1[0x2];
+   u8 reserved_at_8[0x3];
+   u8 base0[0x1];
+   u8 inverse0[0x1];
+   u8 reserved_at_a[0x1];
+   u8 operator0[0x2];
+   u8 compare_delta[0x10];
+};
+
 struct mlx5_ifc_alias_context_bits {
u8 vhca_id_to_be_accessed[0x10];
u8 reserved_at_10[0xd];
diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.c 
b/drivers/net/mlx5/hws/mlx5dr_cmd.c
index 876a47147d..702d6fadac 100644
--- a/drivers/net/mlx5/hws/mlx5dr_cmd.c
+++ b/drivers/net/mlx5/hws/mlx5dr_cmd.c
@@ -370,9 +370,12 @@ mlx5dr_cmd_rtc_create(struct ibv_context *ctx,
 attr, obj_type, MLX5_GENERAL_OBJ_TYPE_RTC);
 
attr = MLX5_ADDR_OF(create_rtc_in, in, rtc);
-   MLX5_SET(rtc, attr, ste_format_0, rtc_attr->is_frst_jumbo ?
-   MLX5_IFC_RTC_STE_FORMAT_11DW :
-   MLX5_IFC_RTC_STE_FORMAT_8DW);
+   if (rtc_attr->is_compare) {
+   MLX5_SET(rtc, attr, ste_format_0, 
MLX5_IFC_RTC_STE_FORMAT_4DW_RANGE);
+   } else {
+   MLX5_SET(rtc, attr, ste_format_0, rtc_attr->is_frst_jumbo ?
+MLX5_IFC_RTC_STE_FORMAT_11DW : 
MLX5_IFC_RTC_STE_FORMAT_8DW);
+   }
 
if (rtc_attr->is_scnd_range) {
MLX5_SET(rtc, attr, ste_format_1, 
MLX5_IFC_RTC_STE_FORMAT_RANGE);
diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.h 
b/drivers/net/mlx5/hws/mlx5dr_cmd.h
index 18c2b07fc8..073ffd9633 100644
--- a/drivers/net/mlx5/hws/mlx5dr_cmd.h
+++ b/drivers/net/mlx5/hws/mlx5dr_cmd.h
@@ -82,6 +82,7 @@ struct mlx5dr_cmd_rtc_create_attr {
uint8_t reparse_mode;
bool is_frst_jumbo;
bool is_scnd_range;
+   bool is_compare;
 };
 
 struct mlx5dr_cmd_alias_obj_create_attr {
diff --git a/drivers/net/mlx5/hws/mlx5dr_debug.c 
b/drivers/net/mlx5/hws/mlx5dr_debug.c
index 11557bcab8..a9094cd35b 100644
--- a/drivers/net/mlx5/hws/mlx5dr_debug.c
+++ b/drivers/net/mlx5/hws/mlx5dr_debug.c
@@ -99,6 +99,7 @@ static int
 mlx5dr_debug_dump_matcher_match_template(FILE *f, struct mlx5dr_matcher 
*matcher)
 {
bool is_root = matcher->tbl->level == MLX5DR_ROOT_LEVEL;
+   bool is_compare = mlx5dr_matcher_is_compare(matcher);
enum mlx5dr_debug_res_type type;
int i, ret;
 
@@ -117,7 +118,8 @@ mlx5dr_debug_dump_matcher_match_template(FILE *f, struct 
mlx5dr_matcher *matcher
return rte_errno;
}
 
-   type = MLX5DR_DEBUG_RES_TYPE_MATCHER_TEMPLATE_MATCH_DEFINER;
+   type = is_compare ? 
MLX5DR_DEBUG_RES_TYPE_MATCHER_TEMPLATE_COMPARE_MATCH_DEFINER :
+   
MLX5DR_DEBUG_RES_TYPE_MATCHER_TEMPLATE_MATCH_DEFINER;
ret = mlx5dr_debug_dump_matcher_template_definer(f, mt, 
mt->definer, type);
if (ret)
return ret;
diff --git a/drivers/net/mlx5/hws/mlx5dr_debug.h 
b/drivers/net/mlx5/hws/mlx5dr_debug.h
index 5cffdb10b5..a89a6a0b1d 100644
--- a/drivers/net/mlx5/hws/mlx5dr_debug.h
+++ b/drivers/net/mlx5/hws/mlx5dr_debug.h
@@ -24,6 +24,7 @@ enum mlx5dr_debug_res_type {
MLX5DR_DEBUG_RES_TYPE_MATCHER_ACTION_TEMPLATE = 4204,
MLX5DR_DEBUG_RES_TYPE_MATCHER_TEMPLATE_HASH_DEFINER = 4205,
MLX5DR_DEBUG_RES_TYPE_MATCHER_TEMPLATE_RANGE_DEFINER = 4206,
+   MLX5DR_DEBUG_RES_TYPE_MATCHER_TEMPLATE_COMPARE_MATCH_DEFINER = 4207,
 };
 
 static inline uint64_t
diff --git a/drivers/net/mlx5/hws/mlx5dr_defi

[PATCH v3 2/2] net/mlx5: add support to compare random value

2024-01-29 Thread Michael Baum
Add support to use "RTE_FLOW_ITEM_TYPE_COMPARE" with
"RTE_FLOW_FIELD_RAMDOM" as an argument.
The random field is supported only when base is an immediate value,
random field cannot be compared with enother field.

Signed-off-by: Michael Baum 
---
 drivers/net/mlx5/mlx5_flow_hw.c | 70 -
 1 file changed, 52 insertions(+), 18 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 08e211a5b8..63e76c86b1 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -6717,18 +6717,55 @@ flow_hw_prepend_item(const struct rte_flow_item *items,
return copied_items;
 }
 
-static inline bool
-flow_hw_item_compare_field_supported(enum rte_flow_field_id field)
+static int
+flow_hw_item_compare_field_validate(enum rte_flow_field_id arg_field,
+   enum rte_flow_field_id base_field,
+   struct rte_flow_error *error)
 {
-   switch (field) {
+   switch (arg_field) {
+   case RTE_FLOW_FIELD_TAG:
+   case RTE_FLOW_FIELD_META:
+   break;
+   case RTE_FLOW_FIELD_RANDOM:
+   if (base_field == RTE_FLOW_FIELD_VALUE)
+   return 0;
+   return rte_flow_error_set(error, EINVAL,
+ RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+ NULL,
+ "compare random is supported only 
with immediate value");
+   default:
+   return rte_flow_error_set(error, ENOTSUP,
+ RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+ NULL,
+ "compare item argument field is not 
supported");
+   }
+   switch (base_field) {
case RTE_FLOW_FIELD_TAG:
case RTE_FLOW_FIELD_META:
case RTE_FLOW_FIELD_VALUE:
-   return true;
+   break;
+   default:
+   return rte_flow_error_set(error, ENOTSUP,
+ RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+ NULL,
+ "compare item base field is not 
supported");
+   }
+   return 0;
+}
+
+static inline uint32_t
+flow_hw_item_compare_width_supported(enum rte_flow_field_id field)
+{
+   switch (field) {
+   case RTE_FLOW_FIELD_TAG:
+   case RTE_FLOW_FIELD_META:
+   return 32;
+   case RTE_FLOW_FIELD_RANDOM:
+   return 16;
default:
break;
}
-   return false;
+   return 0;
 }
 
 static int
@@ -6737,6 +6774,7 @@ flow_hw_validate_item_compare(const struct rte_flow_item 
*item,
 {
const struct rte_flow_item_compare *comp_m = item->mask;
const struct rte_flow_item_compare *comp_v = item->spec;
+   int ret;
 
if (unlikely(!comp_m))
return rte_flow_error_set(error, EINVAL,
@@ -6748,19 +6786,13 @@ flow_hw_validate_item_compare(const struct 
rte_flow_item *item,
   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
   NULL,
   "compare item only support full mask");
-   if (!flow_hw_item_compare_field_supported(comp_m->a.field) ||
-   !flow_hw_item_compare_field_supported(comp_m->b.field))
-   return rte_flow_error_set(error, ENOTSUP,
-  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-  NULL,
-  "compare item field not support");
-   if (comp_m->a.field == RTE_FLOW_FIELD_VALUE &&
-   comp_m->b.field == RTE_FLOW_FIELD_VALUE)
-   return rte_flow_error_set(error, EINVAL,
-  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-  NULL,
-  "compare between value is not valid");
+   ret = flow_hw_item_compare_field_validate(comp_m->a.field,
+ comp_m->b.field, error);
+   if (ret < 0)
+   return ret;
if (comp_v) {
+   uint32_t width;
+
if (comp_v->operation != comp_m->operation ||
comp_v->a.field != comp_m->a.field ||
comp_v->b.field != comp_m->b.field)
@@ -6768,7 +6800,9 @@ flow_hw_validate_item_compare(const struct rte_flow_item 
*item,
   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
   NULL,
   "compare item spec/mask not 
matching");
-   if ((comp_v->width & comp_m->width) != 32)
+   width = flow_hw_item_compare_width_supported(comp_v->a.field);
+   MLX5_ASSERT(width > 0);
+   if ((comp_v->width & comp_m->width) !=

Re: [PATCH v3 0/2] ethdev: add the check for PTP capability

2024-01-29 Thread lihuisong (C)



在 2024/1/29 19:16, Ferruh Yigit 写道:

On 1/27/2024 1:52 AM, lihuisong (C) wrote:

在 2024/1/27 0:54, Ferruh Yigit 写道:

On 1/11/2024 6:25 AM, lihuisong (C) wrote:

Hi Ferruh,

在 2023/11/23 19:56, lihuisong (C) 写道:

在 2023/11/2 7:39, Ferruh Yigit 写道:

timesync_read_rx_timestamp
On 9/21/2023 12:59 PM, lihuisong (C) wrote:

add ice & igc maintainers

在 2023/9/21 19:06, Ferruh Yigit 写道:

On 9/21/2023 11:02 AM, lihuisong (C) wrote:

Hi Ferruh,

Sorry for my delay reply because of taking a look at all PMDs
implementation.


在 2023/9/16 1:46, Ferruh Yigit 写道:

On 8/17/2023 9:42 AM, Huisong Li wrote:

 From the first version of ptpclient, it seems that this
example
assume that
the PMDs support the PTP feature and enable PTP by default.
Please see
commit ab129e9065a5 ("examples/ptpclient: add minimal PTP
client")
which are introduced in 2015.

And two years later, Rx HW timestamp offload was introduced to
enable or
disable PTP feature in HW via rte_eth_rxmode. Please see
commit 42ffc45aa340 ("ethdev: add Rx HW timestamp capability").


Hi Huisong,

As far as I know this offload is not for PTP.
PTP and TIMESTAMP are different.

If TIMESTAMP offload cannot stand for PTP, we may need to add
one new
offlaod for PTP.


Can you please detail what is "PTP offload"?


It indicates whether the device supports PTP or enable  PTP feature.


We have 'rte_eth_timesync_enable()' and 'rte_eth_timesync_disable()'
APIs to control PTP support.

No, this is just to control it.
we still need to like a device capablity to report application if the
port support to call this API, right?

But when mention from "offload", it is something device itself does.

PTP is a protocol (IEEE 1588), and used to synchronize clocks.
What I get is protocol can be parsed by networking stack and it can be
used by application to synchronize clock.

When you are refer to "PTP offload", does it mean device (NIC)
understands the protocol and parse it to synchronize device clock with
other devices?

Good point. PTP offload is unreasonable.
But the capablity is required indeed.
What do you think of introducing a RTE_ETH_DEV_PTP in
dev->data->dev_flags for PTP feature?

Can you take a look at this discussion line again?

Let's introduce a  RTE_ETH_DEV_PTP in dev->data->dev_flags to reveal if
the device support PTP feature.


Hi Ferruh,

Thanks for taking your time to reply.


Hi Huisong,

First let me try to summarize the discussion since it has been some time.

HW timer block can be used for Rx timestamp offload
(RTE_ETH_RX_OFFLOAD_TIMESTAMP) and/or PTP support, but they are two
different things.

This patch uses 'RTE_ETH_RX_OFFLOAD_TIMESTAMP' capability for PTP
support, which is wrong.


correct.

After we agreed on above, your next question is to use 'dev_flag' to
report PTP capability.

First, can you please describe what is the motivation to learn if HW
supports PTP or now, what is the benefit of knowing this.

There are a couple of device which have the same driver on a platform or
OS.
But just allow one device to support or be responsible for PTP feature.
The firmware will report a capability to tell the device if it is
support PTP.
But, currently, driver doesn't know how to report user which device
support PTP feature.


Driver can hold a private data that records if PTP supported by the
device or not, and according this value PTP dev_ops can return error or
success.

This may not be ideal but it lets user to know about the support status,
can this work?

I don't think it is user friendly.
Users know which port supports the PTP feature only when the API fails 
to be invoked, right?
In addition, this is a common issue for all supported PTP device. So It 
is better to do this check in PMD.




In addition, many drivers use RTE_LIBRTE_IEEE1588 to control PTP code flow.
Whether the device supports PTP is irrelevant to this macro.


Yes, I guess because both features use same HW, there is confusion there.


If we agree that there is a need to know the PTP capability, question is
where to report this capability.

Suggested 'dev_flags' is used for various things, some are internal
flags and some are status, I don't think overloading this variable is
not good idea.

Yes, we need to consider  carefully.

Other option is an update 'rte_eth_dev_info_get()' for it but it has the
same problem, this function is already overloaded with many different
things.

We can have an API just to get PTP capability, but this will require a
new dev_ops, this can be an option but my concern is having a capability
dev_ops for each feature create a mess in dev_ops.

Perhaps we can have single get_capability() API, and it gets features as
flags to return if that feature is supported or not, but this requires a
wider discussion.

Instead can we deduce the capability from PTP relevant dev_ops, if they
are implemented we can say it is supported? This doesn't require new
support.

Thank you mentioning so many ways.
For the end of advice, I don't think it is appropriate.
Because we hav

Re: [PATCH] ethdev: add template table resize API

2024-01-29 Thread Ferruh Yigit
On 12/17/2023 9:32 AM, Gregory Etelson wrote:
> Template table creation API sets table flows capacity.
> If application needs more flows then the table was designed for,
> the following procedures must be completed:
> 1. Create a new template table with larger flows capacity.
> 2. Re-create existing flows in the new table and delete flows from
>the original table.
> 3. Destroy original table.
> 
> Application cannot always execute that procedure:
> * Port may not have sufficient resources to allocate a new table
>   while maintaining original table.
> * Application may not have existing flows "recipes" to re-create
>   flows in a new table.
> 
> The patch defines a new API that allows application to resize
> existing template table:
> 
> * Resizable template table must be created with the
> RTE_FLOW_TABLE_SPECIALIZE_RESIZABLE_TABLE bit set.
> 
> * Application resizes existing table with the
>   `rte_flow_template_table_resize()` function call.
>   The table resize procedure updates the table maximal flow number
>   only. Other table attributes are not affected by the table resize.
>   ** The table resize procedure must not interrupt
>  existing table flows operations in hardware.
>   ** The table resize procedure must not alter flow handlers held by
>  application.
> 
> * After `rte_flow_template_table_resize()` returned, application must
>   update all existing table flow rules by calling
>   `rte_flow_async_update_resized()`.
>   The table resize procedure does not change application flow handler.
>   However, flow object can reference internal PMD resources that are
>   obsolete after table resize.
>   `rte_flow_async_update_resized()` moves internal flow references
>   to the updated table resources.
>   The flow update must not interrupt hardware flow operations.
> 
> * When all table flow were updated, application must call
>   `rte_flow_template_table_resize_complete()`.
>   The function releases PMD resources related to the original
>   table.
>   Application can start new table resize after
>   `rte_flow_template_table_resize_complete()` returned.
> 

Hi Gregory, Ori,

Why we need three separate APIs,
 rte_flow_template_table_resize
 rte_flow_async_update_resized
 rte_flow_template_table_resize_complete

Why not 'rte_flow_template_table_resize()' update existing flows and
release resources related to the original tables automatically?


Re: [PATCH v3 0/2] ethdev: add the check for PTP capability

2024-01-29 Thread Ferruh Yigit
On 1/29/2024 1:58 PM, lihuisong (C) wrote:
> 
> 在 2024/1/29 19:16, Ferruh Yigit 写道:
>> On 1/27/2024 1:52 AM, lihuisong (C) wrote:
>>> 在 2024/1/27 0:54, Ferruh Yigit 写道:
 On 1/11/2024 6:25 AM, lihuisong (C) wrote:
> Hi Ferruh,
>
> 在 2023/11/23 19:56, lihuisong (C) 写道:
>> 在 2023/11/2 7:39, Ferruh Yigit 写道:
>>> timesync_read_rx_timestamp
>>> On 9/21/2023 12:59 PM, lihuisong (C) wrote:
 add ice & igc maintainers

 在 2023/9/21 19:06, Ferruh Yigit 写道:
> On 9/21/2023 11:02 AM, lihuisong (C) wrote:
>> Hi Ferruh,
>>
>> Sorry for my delay reply because of taking a look at all PMDs
>> implementation.
>>
>>
>> 在 2023/9/16 1:46, Ferruh Yigit 写道:
>>> On 8/17/2023 9:42 AM, Huisong Li wrote:
  From the first version of ptpclient, it seems that this
 example
 assume that
 the PMDs support the PTP feature and enable PTP by default.
 Please see
 commit ab129e9065a5 ("examples/ptpclient: add minimal PTP
 client")
 which are introduced in 2015.

 And two years later, Rx HW timestamp offload was introduced to
 enable or
 disable PTP feature in HW via rte_eth_rxmode. Please see
 commit 42ffc45aa340 ("ethdev: add Rx HW timestamp capability").

>>> Hi Huisong,
>>>
>>> As far as I know this offload is not for PTP.
>>> PTP and TIMESTAMP are different.
>> If TIMESTAMP offload cannot stand for PTP, we may need to add
>> one new
>> offlaod for PTP.
>>
> Can you please detail what is "PTP offload"?
>
 It indicates whether the device supports PTP or enable  PTP
 feature.

>>> We have 'rte_eth_timesync_enable()' and 'rte_eth_timesync_disable()'
>>> APIs to control PTP support.
>> No, this is just to control it.
>> we still need to like a device capablity to report application if the
>> port support to call this API, right?
>>> But when mention from "offload", it is something device itself does.
>>>
>>> PTP is a protocol (IEEE 1588), and used to synchronize clocks.
>>> What I get is protocol can be parsed by networking stack and it
>>> can be
>>> used by application to synchronize clock.
>>>
>>> When you are refer to "PTP offload", does it mean device (NIC)
>>> understands the protocol and parse it to synchronize device clock
>>> with
>>> other devices?
>> Good point. PTP offload is unreasonable.
>> But the capablity is required indeed.
>> What do you think of introducing a RTE_ETH_DEV_PTP in
>> dev->data->dev_flags for PTP feature?
> Can you take a look at this discussion line again?
>
> Let's introduce a  RTE_ETH_DEV_PTP in dev->data->dev_flags to
> reveal if
> the device support PTP feature.
>
>>> Hi Ferruh,
>>>
>>> Thanks for taking your time to reply.
>>>
 Hi Huisong,

 First let me try to summarize the discussion since it has been some
 time.

 HW timer block can be used for Rx timestamp offload
 (RTE_ETH_RX_OFFLOAD_TIMESTAMP) and/or PTP support, but they are two
 different things.

 This patch uses 'RTE_ETH_RX_OFFLOAD_TIMESTAMP' capability for PTP
 support, which is wrong.

>>> correct.
 After we agreed on above, your next question is to use 'dev_flag' to
 report PTP capability.

 First, can you please describe what is the motivation to learn if HW
 supports PTP or now, what is the benefit of knowing this.
>>> There are a couple of device which have the same driver on a platform or
>>> OS.
>>> But just allow one device to support or be responsible for PTP feature.
>>> The firmware will report a capability to tell the device if it is
>>> support PTP.
>>> But, currently, driver doesn't know how to report user which device
>>> support PTP feature.
>>>
>> Driver can hold a private data that records if PTP supported by the
>> device or not, and according this value PTP dev_ops can return error or
>> success.
>>
>> This may not be ideal but it lets user to know about the support status,
>> can this work?
> I don't think it is user friendly.
> Users know which port supports the PTP feature only when the API fails
> to be invoked, right?
> In addition, this is a common issue for all supported PTP device. So It
> is better to do this check in PMD.
>>
>>
>>> In addition, many drivers use RTE_LIBRTE_IEEE1588 to control PTP code
>>> flow.
>>> Whether the device supports PTP is irrelevant to this macro.
>>>
>> Yes, I guess because both features use same HW, there is confusion there.
>>
 If we agree that there is a need to know the PTP capability,
 question is
 where to report this capability.

 Suggested 'dev_flags' is used for various things, some are internal
>>>

Re: [dpdk-dev] [v2] ethdev: support Tx queue used count

2024-01-29 Thread Ferruh Yigit
On 1/18/2024 9:47 AM, jer...@marvell.com wrote:
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index a050baab0f..73a788d91a 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -319,6 +319,7 @@ EXPERIMENTAL {
>  
>   # added in 24.03
>   rte_eth_find_rss_algo;
> + rte_eth_tx_queue_count;
>

As new API, 'rte_eth_tx_queue_count()', is static inline function, no
need to add it into .map file.
Patch is already merged but I will update it in next-net, and will
remove above update to 'ethdev/version.map' file.



Re: [PATCH] ethdev: add template table resize API

2024-01-29 Thread Etelson, Gregory

Hello Ferruh,



Hi Gregory, Ori,

Why we need three separate APIs,
rte_flow_template_table_resize
rte_flow_async_update_resized
rte_flow_template_table_resize_complete

Why not 'rte_flow_template_table_resize()' update existing flows and
release resources related to the original tables automatically?




Template table resize API allows to add new flows immediately after 
rte_flow_template_table_resize completed.

A multi-threaded application can add new and update old flows simultaneously.

A single resize-and-update API would require to lock PMD for entire operation.
For application with 1e6 flows doubling a table would end up with 
considerable down time.


The rte_flow_template_table_resize_complete was added for PMDs that cannot 
differentiate flows created before and after table resize.


Regards,
Gregory


[PATCH] net/ark: add PCIe ids for Atomic Rules TK242 devices

2024-01-29 Thread Ed Czeck
Update documentation and release notes.

Signed-off-by: Ed Czeck 
---
 doc/guides/nics/ark.rst| 3 +++
 doc/guides/rel_notes/release_24_03.rst | 4 
 drivers/net/ark/ark_ethdev.c   | 3 +++
 3 files changed, 10 insertions(+)

diff --git a/doc/guides/nics/ark.rst b/doc/guides/nics/ark.rst
index bcc9f505df..a9f6d4cdb8 100644
--- a/doc/guides/nics/ark.rst
+++ b/doc/guides/nics/ark.rst
@@ -307,6 +307,9 @@ ARK PMD supports the following Arkville RTL PCIe instances 
including:
 * ``1d6c:101e`` - AR-ARKA-FX1 [Arkville 64B DPDK Data Mover for Agilex R-Tile]
 * ``1d6c:101f`` - AR-TK242 [2x100GbE Packet Capture Device]
 * ``1d6c:1022`` - AR-ARKA-FX2 [Arkville 128B DPDK Data Mover for Agilex]
+* ``1d6c:1024`` - AR-TK242 [2x100GbE Packet Capture Device]
+* ``1d6c:1025`` - AR-TK242-FX2 [2x100GbE Gen5 Packet Capture Device]
+* ``1d6c:1026`` - AR-TK242-FX2 [1x200GbE Gen5 Packet Capture Device]
 
 Arkville RTL Core Configurations
 
diff --git a/doc/guides/rel_notes/release_24_03.rst 
b/doc/guides/rel_notes/release_24_03.rst
index 6f8ad27808..54a321f1c2 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -55,6 +55,10 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Updated Atomic Rules' Arkville PMD.**
+
+  * Added support for Atomic Rules' TK242 packet-capture family of devices
+with PCI IDs: ``0x1024, 0x1025, 0x1026``.
 
 Removed Items
 -
diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c
index 0ffd4b9e9e..c029dc46b3 100644
--- a/drivers/net/ark/ark_ethdev.c
+++ b/drivers/net/ark/ark_ethdev.c
@@ -99,6 +99,9 @@ static const struct rte_pci_id pci_id_ark_map[] = {
{RTE_PCI_DEVICE(AR_VENDOR_ID, 0x101e)},
{RTE_PCI_DEVICE(AR_VENDOR_ID, 0x101f)},
{RTE_PCI_DEVICE(AR_VENDOR_ID, 0x1022)},
+   {RTE_PCI_DEVICE(AR_VENDOR_ID, 0x1024)},
+   {RTE_PCI_DEVICE(AR_VENDOR_ID, 0x1025)},
+   {RTE_PCI_DEVICE(AR_VENDOR_ID, 0x1026)},
{.vendor_id = 0, /* sentinel */ },
 };
 
-- 
2.34.1



Re: [PATCH] maintainers: update for e1000/igc

2024-01-29 Thread Thomas Monjalon
16/01/2024 08:54, Simei Su:
> Remove myself from maintainers.
> 
> Signed-off-by: Simei Su 

Sorry to see you leaving.
Applied




Re: [PATCH] maintainers: update for e1000/ixgbe

2024-01-29 Thread Thomas Monjalon
16/01/2024 07:33, Wenjun Wu:
> Remove myself from maintainers.
> 
> Signed-off-by: Wenjun Wu 

Sorry to see you leaving.
Applied




Re: [PATCH] maintainers: updated for Intel drivers

2024-01-29 Thread Thomas Monjalon
17/01/2024 11:17, qiming.y...@intel.com:
> From: Qiming Yang 
> 
> Remove the maintainer that no longer work.
> 
> Signed-off-by: Qiming Yang 

Sorry to see you leaving.
Applied




Re: [PATCH] maintainers: update for i40e/iavf/idpf/cpfl

2024-01-29 Thread Thomas Monjalon
16/01/2024 14:32, beilei.x...@intel.com:
> From: Beilei Xing 
> 
> Remove myself from maintainers.
> 
> Signed-off-by: Beilei Xing 

Sorry to see you leaving.
Applied




Re: [PATCH] maintainers: update for intel PMD

2024-01-29 Thread Thomas Monjalon
18/01/2024 20:01, Qi Zhang:
> Remove my name for next-net-intel, fm10k, ice and af_xdp.
> 
> Signed-off-by: Qi Zhang 

Sorry to see you leaving.
Applied




Re: [PATCH v1] maintainers: remove tianfei from ifpga

2024-01-29 Thread Thomas Monjalon
29/11/2023 02:43, Xu, Rosen:
> Hi,
> 
> > -Original Message-
> > From: Huang, Wei 
> > Sent: Wednesday, November 29, 2023 9:45 AM
> > To: dev@dpdk.org; tho...@monjalon.net
> > Cc: sta...@dpdk.org; Xu, Rosen ; Mcnamara, John
> > ; Huang, Wei 
> > Subject: [PATCH v1] maintainers: remove tianfei from ifpga
> > 
> > Signed-off-by: Wei Huang 
> 
> Acked-by: Rosen Xu 

Applied





Re: [PATCH v3 3/3] dts: add API doc generation

2024-01-29 Thread Jeremy Spewock
On Mon, Jan 22, 2024 at 11:35 AM Juraj Linkeš
 wrote:
>
> The tool used to generate developer docs is Sphinx, which is already in
> use in DPDK. The same configuration is used to preserve style, but it's
> been augmented with doc-generating configuration. There's a change that
> modifies how the sidebar displays the content hierarchy that's been put
> into an if block to not interfere with regular docs.
>
> Sphinx generates the documentation from Python docstrings. The docstring
> format is the Google format [0] which requires the sphinx.ext.napoleon
> extension. The other extension, sphinx.ext.intersphinx, enables linking
> to object in external documentations, such as the Python documentation.
>
> There are two requirements for building DTS docs:
> * The same Python version as DTS or higher, because Sphinx imports the
>   code.
> * Also the same Python packages as DTS, for the same reason.
>
> [0] 
> https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings
>
> Signed-off-by: Juraj Linkeš 
> ---
>  buildtools/call-sphinx-build.py | 33 +++-
>  doc/api/doxy-api-index.md   |  3 +++
>  doc/api/doxy-api.conf.in|  2 ++
>  doc/api/meson.build | 11 +++---
>  doc/guides/conf.py  | 39 -
>  doc/guides/meson.build  |  1 +
>  doc/guides/tools/dts.rst| 34 +++-
>  dts/doc/meson.build | 27 +++
>  dts/meson.build | 16 ++
>  meson.build |  1 +
>  10 files changed, 148 insertions(+), 19 deletions(-)
>  create mode 100644 dts/doc/meson.build
>  create mode 100644 dts/meson.build
>
> diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
> index 39a60d09fa..aea771a64e 100755
> --- a/buildtools/call-sphinx-build.py
> +++ b/buildtools/call-sphinx-build.py
> @@ -3,37 +3,50 @@
>  # Copyright(c) 2019 Intel Corporation
>  #
>
> +import argparse
>  import sys
>  import os
>  from os.path import join
>  from subprocess import run, PIPE, STDOUT
>  from packaging.version import Version
>
> -# assign parameters to variables
> -(sphinx, version, src, dst, *extra_args) = sys.argv[1:]
> +parser = argparse.ArgumentParser()
> +parser.add_argument('sphinx')
> +parser.add_argument('version')
> +parser.add_argument('src')
> +parser.add_argument('dst')
> +parser.add_argument('--dts-root', default=None)
> +args, extra_args = parser.parse_known_args()
>
>  # set the version in environment for sphinx to pick up
> -os.environ['DPDK_VERSION'] = version
> +os.environ['DPDK_VERSION'] = args.version
> +if args.dts_root:
> +os.environ['DTS_ROOT'] = args.dts_root
>
>  # for sphinx version >= 1.7 add parallelism using "-j auto"
> -ver = run([sphinx, '--version'], stdout=PIPE,
> +ver = run([args.sphinx, '--version'], stdout=PIPE,
>stderr=STDOUT).stdout.decode().split()[-1]
> -sphinx_cmd = [sphinx] + extra_args
> +sphinx_cmd = [args.sphinx] + extra_args
>  if Version(ver) >= Version('1.7'):
>  sphinx_cmd += ['-j', 'auto']
>
>  # find all the files sphinx will process so we can write them as dependencies
>  srcfiles = []
> -for root, dirs, files in os.walk(src):
> +for root, dirs, files in os.walk(args.src):
>  srcfiles.extend([join(root, f) for f in files])
>
> +if not os.path.exists(args.dst):
> +os.makedirs(args.dst)
> +
>  # run sphinx, putting the html output in a "html" directory
> -with open(join(dst, 'sphinx_html.out'), 'w') as out:
> -process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
> -  stdout=out)
> +with open(join(args.dst, 'sphinx_html.out'), 'w') as out:
> +process = run(
> +sphinx_cmd + ['-b', 'html', args.src, join(args.dst, 'html')],
> +stdout=out
> +)
>
>  # create a gcc format .d file giving all the dependencies of this doc build
> -with open(join(dst, '.html.d'), 'w') as d:
> +with open(join(args.dst, '.html.d'), 'w') as d:
>  d.write('html: ' + ' '.join(srcfiles) + '\n')
>
>  sys.exit(process.returncode)
> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> index a6a768bd7c..b49b24acce 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -241,3 +241,6 @@ The public API headers are grouped by topics:
>[experimental APIs](@ref rte_compat.h),
>[ABI versioning](@ref rte_function_versioning.h),
>[version](@ref rte_version.h)
> +
> +- **tests**:
> +  [**DTS**](@dts_api_main_page)
> diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
> index e94c9e4e46..d53edeba57 100644
> --- a/doc/api/doxy-api.conf.in
> +++ b/doc/api/doxy-api.conf.in
> @@ -121,6 +121,8 @@ SEARCHENGINE= YES
>  SORT_MEMBER_DOCS= NO
>  SOURCE_BROWSER  = YES
>
> +ALIASES = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
> +
>  EXAMPLE_PATH= @TOPDIR@/examples
>  EXAMPLE_PATTERNS= *.c
>  EXAMPLE_RECURSIVE   = 

Re: [PATCH] maintainers: add new maintainer to DMA perf tool

2024-01-29 Thread Thomas Monjalon
15/12/2023 01:35, Chengwen Feng:
> Add myself as a new maintainer to DMA device performance tool.
> 
> Signed-off-by: Chengwen Feng 
Acked-by: Thomas Monjalon 

Applied, thanks.




Re: [RFC] ethdev: fast path async flow API

2024-01-29 Thread Ferruh Yigit
On 1/29/2024 1:38 PM, Dariusz Sosnowski wrote:
> Hi all,
> 
>> -Original Message-
>> From: Dariusz Sosnowski 
>> Sent: Tuesday, January 23, 2024 12:37
>> To: Stephen Hemminger 
>> Cc: NBU-Contact-Thomas Monjalon (EXTERNAL) ;
>> Ferruh Yigit ; Andrew Rybchenko
>> ; Ori Kam ;
>> dev@dpdk.org
>> Subject: RE: [RFC] ethdev: fast path async flow API
>>
>> Hi Stephen,
>>
>> Sorry for such a late response.
>>
>> From: Stephen Hemminger 
>> Sent: Thursday, January 4, 2024 02:08
>>> On Wed, 3 Jan 2024 19:14:49 +
>>> Dariusz Sosnowski  wrote:
 In summary, in my opinion extending the async flow API with bulking
>>> capabilities or exposing the queue directly to the application is not 
>>> desirable.
 This proposal aims to reduce the I-cache overhead in async flow API
 by
>>> reusing the existing design pattern in DPDK - fast path functions are
>>> inlined to the application code and they call cached PMD callbacks.
>>>
>>> Inline needs to more discouraged in DPDK, because it only works if
>>> application ends up building with DPDK from source.
>>> It doesn't work for the Linux distro packaging model and symbol
>>> versioning, etc.
>>
>> I understand the problem. In that case, I have a proposal.
>> I had a chat with Thomas regarding this RFC, and he noticed that there are 2
>> separate changes proposed here:
>>
>> 1. Per-port callbacks for async flow API.
>> - Moves specified callbacks to rte_flow_fp_ops struct and allow PMDs to
>> register them dynamically.
>> - Removes indirection at API level - no need to call rte_flow_ops_get().
>> - Removes checking if callbacks are defined - either the default DPDK 
>> callback
>> is used or the one provided by PMD.
>> 2. Make async flow API functions inlineable.
>>
>> Change (1) won't break ABI (existing callbacks in rte_flow_ops can be marked
>> as deprecated for now and phased out later) and in my opinion is less
>> controversial compared to change (2).
>>
>> I'll rerun the benchmarks for both changes separately to compare their
>> benefits in isolation.
>> This would allow us to decide if change (2) is worth the drawbacks it
>> introduces.
>>
>> What do you think?
> 
> I split the RFC into 2 parts:
> 
> 1. Introduce per-port callbacks:
> - Introduce rte_flow_fp_ops struct - holds callbacks for fast path calls, 
> for each port. PMD registers callbacks through rte_flow_fp_ops_register().
> - Relevant rte_flow_async_* functions just pass arguments to fast path 
> callbacks. Validation checks are done only if RTE_FLOW_DEBUG macro is defined.
> - Biggest difference is the removal of callback access through 
> rte_flow_get_ops().
> 2. Inline async flow API functions.
> - Relevant rte_flow_async_* functions definitions are moved to rte_flow.h 
> and made inlineable.
> 
> Here are the results of the benchmark:
> 
> |   | Avg Insertion | Diff over baseline | Avg Deletion | 
> Diff over baseline |
> |---|---||--||
> | baseline (v24.03-rc0) | 5855.4||9026.8| 
>|
> | applied (1)   | 6384.2|+528.8 (+9%)|10054.2   | 
>  +1027.4 (+11.4%)  |
> | applied (2)   | 6434.6|   +579.2 (+9.9%)   |10011.4   | 
>   +984.6 (+10.9%)  |
> 
> Results are in KFlows/sec.
> The benchmark is continuously inserting and deleting 2M flow rules.
> These rules match on IPv4 destination address and with a single action DROP.
> Flow rules are inserted and deleted using a single flow queue.
> 
> Change (2) improves insertion rate performance by ~1% compared to (1), but 
> decreases deletion rate by ~0.5%.
> Based on these results, I think we can say that making rte_flow_async_*() 
> calls inlineable may not be worth it compared to the issues it causes.
> 
> What do you all think about the results?
> 

Hi Dariusz,

As discussed before, converting APIs to static inline functions or
exposing 'rte_eth_dev' has cons but with only applying first part above
seems get rid of them with reasonable performance gain, so I think we
can continue with this approach and continue to reviews.

Most of the 'rte_flow_async_*' are already missing the function validity
check, so having a default (dummy?) rte_flow_fp_ops for them seems even
an improvement there.


Only previously 'struct rte_flow_ops' was coming from drivers, ethdev
layer doesn't need to maintain anything.
But with 'rte_flow_fp_ops' struct, ethdev needs to store another array
with fixed ('RTE_MAX_ETHPORTS') size which will be another blocker in
the future to convert this fixed arrays to dynamically allocated arrays.

For this, does it still help we add an a new field to "struct
rte_eth_dev", like "struct rte_flow_ops *flow_ops", similar to 'dev_ops'?
The indirection still will be there, but eliminate 'rte_flow_get_ops()'
call and checks comes with it.
If makes sense, is there a chance to experiment this and get some
p

Re: [PATCH] net/ark: add PCIe ids for Atomic Rules TK242 devices

2024-01-29 Thread Ferruh Yigit
On 1/29/2024 3:48 PM, Ed Czeck wrote:
> Update documentation and release notes.
> 
> Signed-off-by: Ed Czeck 
>

Applied to dpdk-next-net/main, thanks.



RE: [EXT] Re: [PATCH v4 1/1] ethdev: parsing multiple representor devargs string

2024-01-29 Thread Harman Kalra
Hi Ferruh

Thanks for the review
Please find response inline


> -Original Message-
> From: Ferruh Yigit 
> Sent: Friday, January 26, 2024 7:13 PM
> To: Harman Kalra ; dev@dpdk.org; Thomas Monjalon
> ; Andrew Rybchenko
> ; Ajit Khaparde
> ; Somnath Kotur
> ; John Daley ;
> Hyong Youb Kim ; Yuying Zhang
> ; Beilei Xing ; Qiming Yang
> ; Qi Zhang ; Wenjun Wu
> ; Dariusz Sosnowski ;
> Viacheslav Ovsiienko ; Ori Kam
> ; Suanming Mou ; Matan
> Azrad ; Chaoyong He 
> Subject: [EXT] Re: [PATCH v4 1/1] ethdev: parsing multiple representor
> devargs string
> 
> External Email
> 
> --
> On 1/21/2024 7:19 PM, Harman Kalra wrote:
> > Adding support for parsing multiple representor devargs strings passed
> > to a PCI BDF. There may be scenario where port representors for
> > various PFs or VFs under PFs are required and all these are
> > representor ports shall be backed by single pci device. In such case
> > port representors can be created using devargs string:
> > ,representor=[pf[0-1],pf2vf[1,2-3],[4-5]]
> >
> 
> This patch is to be able to parse multiple representor device argument, but I
> am concerned how it is used.

In Marvell CNXK port representor solution all representors are backed by a 
single 
PCI device (we call it as eswitch device). Eswitch device is not DPDK ethdev 
device
but an internal device with NIC capabilities and is configured with as many no 
of
Rxqs and Txqs as port representor required.
Hence each port representor will have dedicated RxQ/TxQ pair to communicate with
respective represented PF or VF. 


> 
> When there are multiple representor ports backed up by same device, can't
> it cause syncronization issues, like if two representor interfaces used for
> conflicting configurations. Or if datapath will be supported, what if two
> representator used simultaneously.

As I mentioned above, each port representor will have dedicated RxQ and TxQ,
worker cores can poll on respective queues without any synchronization issue.
I hope I am able to explain the use case.

> 
> 




> 
> Meanwhile please check some comments below related to the parser code.
> 
> <...>
> 
> > @@ -459,9 +460,26 @@ eth_dev_devargs_tokenise(struct rte_kvargs
> *arglist, const char *str_in)
> > break;
> >
> > case 3: /* Parsing list */
> > -   if (*letter == ']')
> > -   state = 2;
> > -   else if (*letter == '\0')
> > +   if (*letter == ']') {
> > +   /* For devargs having singles lists move to
> state 2 once letter
> > +* becomes ']' so each can be considered as
> different pair key
> > +* value. But in nested lists case e.g. multiple
> representors
> > +* case i.e. [pf[0-3],pfvf[3,4-6]], complete
> nested list should
> > +* be considered as one pair value, hence
> checking if end of outer
> > +* list ']' is reached else stay on state 3.
> > +*/
> > +   if ((strcmp("representor", pair->key) == 0)
>   &&
> > +   (*(letter + 1) != '\0' && *(letter + 2) != 
> > '\0'
> &&
> > +*(letter + 3) != '\0') 
> > &&
> > +   ((*(letter + 2) == 'p' && *(letter + 3) == 
> > 'f')
> ||
> > +(*(letter + 2) == 'v' && *(letter + 3) == 
> > 'f')
> ||
> > +(*(letter + 2) == 's' && *(letter + 3) == 
> > 'f')
> ||
> > +(*(letter + 2) == 'c' && isdigit(*(letter 
> > + 3)))
> ||
> > +(*(letter + 2) == '[' && isdigit(*(letter +
> 3)
> >
> 
> Above is hard to understand but there are some assumptions in the input,
> can we list supported syntax in the comment to make it more clear.
> 
> For example following seems not support, can you please check if
> intentional:
> [vf0,vf1] // I am aware this can be put as vf[0,1] too [vf[0,1],3] 
> [vf[0],vf[1]]

Intention behind above change is just to detect if its nested list i.e. multiple
devargs (constituting lists as well) or a single devarg with a list.

pf0vf[1-4] is a single list devarg example, while [pf[0-3],pfvf[3,4-6]] is
nested list example, where multiple devargs pf[0-3] and pfvf[3,4-6]
(which are also lists) are bind to a list of devargs.

And as I mentioned in the comment, complete "nested list should be considered 
as one pair value", hence entire list i.e. [vf[0,1],3] [vf[0],vf[1]] will be one
value of arglist and later eth_dev_tokenise_representor_list() will tokenize
and feed to rte_eth_devargs_parse_representor_ports().

Whether any format is supported or not should be handled by 
rte_eth_devargs_parse_representor_ports()


> 
> I am not saying above shoul

[PATCH 2/3] crypto/qat: add sm2 encryption/decryption function

2024-01-29 Thread Arkadiusz Kusztal
This commit adds SM2 elliptic curve based asymmetric
encryption and decryption to the Intel QuickAssist
Technology PMD.

Signed-off-by: Arkadiusz Kusztal 
---
 .../common/qat/qat_adf/icp_qat_fw_mmp_ids.h   |   3 +
 drivers/common/qat/qat_adf/qat_pke.h  |  20 +++
 drivers/crypto/qat/qat_asym.c | 140 +-
 3 files changed, 157 insertions(+), 6 deletions(-)

diff --git a/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h 
b/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h
index 630c6e1a9b..aa49612ca1 100644
--- a/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h
+++ b/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h
@@ -1542,6 +1542,9 @@ icp_qat_fw_mmp_ecdsa_verify_gfp_521_input::in in @endlink
  * @li no output parameters
  */
 
+#define PKE_ECSM2_ENCRYPTION 0x25221720
+#define PKE_ECSM2_DECRYPTION 0x201716e6
+
 #define PKE_LIVENESS 0x0001
 /**< Functionality ID for PKE_LIVENESS
  * @li 0 input parameter(s)
diff --git a/drivers/common/qat/qat_adf/qat_pke.h 
b/drivers/common/qat/qat_adf/qat_pke.h
index f88932a275..ac051e965d 100644
--- a/drivers/common/qat/qat_adf/qat_pke.h
+++ b/drivers/common/qat/qat_adf/qat_pke.h
@@ -334,4 +334,24 @@ get_sm2_ecdsa_verify_function(void)
return qat_function;
 }
 
+static struct qat_asym_function
+get_sm2_encryption_function(void)
+{
+   struct qat_asym_function qat_function = {
+   PKE_ECSM2_ENCRYPTION, 32
+   };
+
+   return qat_function;
+}
+
+static struct qat_asym_function
+get_sm2_decryption_function(void)
+{
+   struct qat_asym_function qat_function = {
+   PKE_ECSM2_DECRYPTION, 32
+   };
+
+   return qat_function;
+}
+
 #endif
diff --git a/drivers/crypto/qat/qat_asym.c b/drivers/crypto/qat/qat_asym.c
index 2bf3060278..7407821a6e 100644
--- a/drivers/crypto/qat/qat_asym.c
+++ b/drivers/crypto/qat/qat_asym.c
@@ -925,6 +925,15 @@ sm2_ecdsa_sign_set_input(struct icp_qat_fw_pke_request 
*qat_req,
qat_req->input_param_count = 3;
qat_req->output_param_count = 2;
 
+   HEXDUMP("SM2 K test", asym_op->sm2.k.data,
+   cookie->alg_bytesize);
+   HEXDUMP("SM2 K", cookie->input_array[0],
+   cookie->alg_bytesize);
+   HEXDUMP("SM2 msg", cookie->input_array[1],
+   cookie->alg_bytesize);
+   HEXDUMP("SM2 pkey", cookie->input_array[2],
+   cookie->alg_bytesize);
+
return RTE_CRYPTO_OP_STATUS_SUCCESS;
 }
 
@@ -975,6 +984,114 @@ sm2_ecdsa_sign_collect(struct rte_crypto_asym_op *asym_op,
return RTE_CRYPTO_OP_STATUS_SUCCESS;
 }
 
+static int
+sm2_encryption_set_input(struct icp_qat_fw_pke_request *qat_req,
+   struct qat_asym_op_cookie *cookie,
+   const struct rte_crypto_asym_op *asym_op,
+   const struct rte_crypto_asym_xform *xform)
+{
+   const struct qat_asym_function qat_function =
+   get_sm2_encryption_function();
+   const uint32_t qat_func_alignsize =
+   qat_function.bytesize;
+
+   SET_PKE_LN(asym_op->sm2.k, qat_func_alignsize, 0);
+   SET_PKE_LN(xform->ec.q.x, qat_func_alignsize, 1);
+   SET_PKE_LN(xform->ec.q.y, qat_func_alignsize, 2);
+
+   cookie->alg_bytesize = qat_function.bytesize;
+   cookie->qat_func_alignsize = qat_function.bytesize;
+   qat_req->pke_hdr.cd_pars.func_id = qat_function.func_id;
+   qat_req->input_param_count = 3;
+   qat_req->output_param_count = 4;
+
+   HEXDUMP("SM2 K", cookie->input_array[0],
+   qat_func_alignsize);
+   HEXDUMP("SM2 Q.x", cookie->input_array[1],
+   qat_func_alignsize);
+   HEXDUMP("SM2 Q.y", cookie->input_array[2],
+   qat_func_alignsize);
+
+   return RTE_CRYPTO_OP_STATUS_SUCCESS;
+}
+
+static uint8_t
+sm2_encryption_collect(struct rte_crypto_asym_op *asym_op,
+   const struct qat_asym_op_cookie *cookie)
+{
+   uint32_t alg_bytesize = cookie->alg_bytesize;
+
+   rte_memcpy(asym_op->sm2.C1.x.data, cookie->output_array[0], 
alg_bytesize);
+   rte_memcpy(asym_op->sm2.C1.y.data, cookie->output_array[1], 
alg_bytesize);
+   rte_memcpy(asym_op->sm2.kP.x.data, cookie->output_array[2], 
alg_bytesize);
+   rte_memcpy(asym_op->sm2.kP.y.data, cookie->output_array[3], 
alg_bytesize);
+   asym_op->sm2.C1.x.length = alg_bytesize;
+   asym_op->sm2.C1.y.length = alg_bytesize;
+   asym_op->sm2.kP.x.length = alg_bytesize;
+   asym_op->sm2.kP.y.length = alg_bytesize;
+
+   HEXDUMP("C1[x1]", cookie->output_array[0],
+   alg_bytesize);
+   HEXDUMP("C1[y]", cookie->output_array[1],
+   alg_bytesize);
+   HEXDUMP("kP[x]", cookie->output_array[2],
+   alg_bytesize);
+   HEXDUMP("kP[y]", cookie->output_array[3],
+   alg_bytesize);
+   return RTE_CRYPTO_OP_STATUS_SUCCESS;
+}
+
+
+static int
+sm2_decryption_set_input(struct icp_qat_fw_pke_request *qat_req,
+   struct qat_asym_op_cookie *cookie,
+   const struct rte_crypto_a

[PATCH 1/3] cryptodev: add ec points to sm2 op

2024-01-29 Thread Arkadiusz Kusztal
In the case when PMD cannot support full process of the SM2,
but elliptic curve computation only, additional fields
are needed to handle such a case.

Points C1, kP therefore were added to the SM2 crypto operation struct.

Signed-off-by: Arkadiusz Kusztal 
---
 lib/cryptodev/rte_crypto_asym.h | 119 +++-
 1 file changed, 71 insertions(+), 48 deletions(-)

diff --git a/lib/cryptodev/rte_crypto_asym.h b/lib/cryptodev/rte_crypto_asym.h
index 39d3da3952..55620d2d3a 100644
--- a/lib/cryptodev/rte_crypto_asym.h
+++ b/lib/cryptodev/rte_crypto_asym.h
@@ -599,40 +599,6 @@ struct rte_crypto_ecpm_op_param {
/**< Scalar to multiply the input point */
 };
 
-/**
- * Asymmetric crypto transform data
- *
- * Structure describing asym xforms.
- */
-struct rte_crypto_asym_xform {
-   struct rte_crypto_asym_xform *next;
-   /**< Pointer to next xform to set up xform chain.*/
-   enum rte_crypto_asym_xform_type xform_type;
-   /**< Asymmetric crypto transform */
-
-   union {
-   struct rte_crypto_rsa_xform rsa;
-   /**< RSA xform parameters */
-
-   struct rte_crypto_modex_xform modex;
-   /**< Modular Exponentiation xform parameters */
-
-   struct rte_crypto_modinv_xform modinv;
-   /**< Modular Multiplicative Inverse xform parameters */
-
-   struct rte_crypto_dh_xform dh;
-   /**< DH xform parameters */
-
-   struct rte_crypto_dsa_xform dsa;
-   /**< DSA xform parameters */
-
-   struct rte_crypto_ec_xform ec;
-   /**< EC xform parameters, used by elliptic curve based
-* operations.
-*/
-   };
-};
-
 /**
  * SM2 operation params.
  */
@@ -658,20 +624,43 @@ struct rte_crypto_sm2_op_param {
 * will be overwritten by the PMD with the decrypted length.
 */
 
-   rte_crypto_param cipher;
-   /**<
-* Pointer to input data
-* - to be decrypted for SM2 private decrypt.
-*
-* Pointer to output data
-* - for SM2 public encrypt.
-* In this case the underlying array should have been allocated
-* with enough memory to hold ciphertext output (at least X bytes
-* for prime field curve of N bytes and for message M bytes,
-* where X = (C1 || C2 || C3) and computed based on SM2 RFC as
-* C1 (1 + N + N), C2 = M, C3 = N. The cipher.length field will
-* be overwritten by the PMD with the encrypted length.
-*/
+   union {
+   rte_crypto_param cipher;
+   /**<
+* Pointer to input data
+* - to be decrypted for SM2 private decrypt.
+*
+* Pointer to output data
+* - for SM2 public encrypt.
+* In this case the underlying array should have been allocated
+* with enough memory to hold ciphertext output (at least X 
bytes
+* for prime field curve of N bytes and for message M bytes,
+* where X = (C1 || C2 || C3) and computed based on SM2 RFC as
+* C1 (1 + N + N), C2 = M, C3 = N. The cipher.length field will
+* be overwritten by the PMD with the encrypted length.
+*/
+   struct {
+   struct rte_crypto_ec_point C1;
+   /**<
+* This field is used only when PMD does not support 
full
+* process of the SM2 encryption/decryption, but 
elliptic
+* curve part only.
+*
+* In the case of encryption, it is an output - point 
C1 = (x1,y1).
+* In the case of decryption, if is an input - point C1 
= (x1,y1)
+*
+*/
+   struct rte_crypto_ec_point kP;
+   /**<
+* This field is used only when PMD does not support 
full
+* process of the SM2 encryption/decryption, but 
elliptic
+* curve part only.
+*
+* It is an output in the encryption case, it is a point
+* [k]P = (x2,y2)
+*/
+   };
+   };
 
rte_crypto_uint id;
/**< The SM2 id used by signer and verifier. */
@@ -697,6 +686,40 @@ struct rte_crypto_sm2_op_param {
 */
 };
 
+/**
+ * Asymmetric crypto transform data
+ *
+ * Structure describing asym xforms.
+ */
+struct rte_crypto_asym_xform {
+   struct rte_crypto_asym_xform *next;
+   /**< Pointer to next xform to set up xform chain.*/
+   enum rte_crypto_asym_xform_type xform_type;
+   /**< Asymmetric crypto transform */
+
+   union {
+   struct rte_crypto_rsa_xform rsa;
+   /**

[PATCH 3/3] app/test: add test sm2 C1/Kp test cases

2024-01-29 Thread Arkadiusz Kusztal
This commit adds tests cases to be used when C1 or kP elliptic
curve points need to be computed.

Signed-off-by: Arkadiusz Kusztal 
---
 app/test/test_cryptodev_asym.c | 116 -
 app/test/test_cryptodev_sm2_test_vectors.h | 112 +++-
 2 files changed, 224 insertions(+), 4 deletions(-)

diff --git a/app/test/test_cryptodev_asym.c b/app/test/test_cryptodev_asym.c
index 17daf734e8..20da0fd815 100644
--- a/app/test/test_cryptodev_asym.c
+++ b/app/test/test_cryptodev_asym.c
@@ -2639,6 +2639,8 @@ test_sm2_sign(void)
asym_op->sm2.k.data = input_params.k.data;
asym_op->sm2.k.length = input_params.k.length;
}
+   asym_op->sm2.k.data = input_params.k.data;
+   asym_op->sm2.k.length = input_params.k.length;
 
/* Init out buf */
asym_op->sm2.r.data = output_buf_r;
@@ -3188,7 +3190,7 @@ static int send_one(void)
ticks++;
if (ticks >= DEQ_TIMEOUT) {
RTE_LOG(ERR, USER1,
-   "line %u FAILED: Cannot dequeue the crypto op 
on device %d",
+   "line %u FAILED: Cannot dequeue the crypto op 
on device, timeout %d",
__LINE__, params->valid_devs[0]);
return TEST_FAILED;
}
@@ -3467,6 +3469,110 @@ KAT_RSA_Decrypt_CRT(const void *data)
return 0;
 }
 
+static int
+test_sm2_encryption(const void *data)
+{
+   struct rte_crypto_asym_xform xform = { 0 };
+   const uint8_t dev_id = params->valid_devs[0];
+   const struct crypto_testsuite_sm2_params *test_vector = data;
+   uint8_t result_C1_x1[TEST_DATA_SIZE] = { 0 };
+   uint8_t result_C1_y1[TEST_DATA_SIZE] = { 0 };
+   uint8_t result_kP_x1[TEST_DATA_SIZE] = { 0 };
+   uint8_t result_kP_y1[TEST_DATA_SIZE] = { 0 };
+
+   xform.xform_type = RTE_CRYPTO_ASYM_XFORM_SM2;
+   xform.ec.curve_id = RTE_CRYPTO_EC_GROUP_SM2;
+   xform.ec.q = test_vector->pubkey;
+   self->op->asym->sm2.op_type = RTE_CRYPTO_ASYM_OP_ENCRYPT;
+   self->op->asym->sm2.k = test_vector->k;
+   if (rte_cryptodev_asym_session_create(dev_id, &xform,
+   params->session_mpool, &self->sess) < 0) {
+   RTE_LOG(ERR, USER1, "line %u FAILED: Session creation failed",
+   __LINE__);
+   return TEST_FAILED;
+   }
+   rte_crypto_op_attach_asym_session(self->op, self->sess);
+
+   self->op->asym->sm2.C1.x.data = result_C1_x1;
+   self->op->asym->sm2.C1.y.data = result_C1_y1;
+   self->op->asym->sm2.kP.x.data = result_kP_x1;
+   self->op->asym->sm2.kP.y.data = result_kP_y1;
+   TEST_ASSERT_SUCCESS(send_one(),
+   "Failed to process crypto op");
+
+   debug_hexdump(stdout, "C1[x]", self->op->asym->sm2.C1.x.data,
+   self->op->asym->sm2.C1.x.length);
+   debug_hexdump(stdout, "C1[y]", self->op->asym->sm2.C1.y.data,
+   self->op->asym->sm2.C1.y.length);
+   debug_hexdump(stdout, "kP[x]", self->op->asym->sm2.kP.x.data,
+   self->op->asym->sm2.kP.x.length);
+   debug_hexdump(stdout, "kP[y]", self->op->asym->sm2.kP.y.data,
+   self->op->asym->sm2.kP.y.length);
+
+   TEST_ASSERT_BUFFERS_ARE_EQUAL(test_vector->C1.x.data,
+   self->op->asym->sm2.C1.x.data,
+   test_vector->C1.x.length,
+   "Incorrect value of C1[x]\n");
+   TEST_ASSERT_BUFFERS_ARE_EQUAL(test_vector->C1.y.data,
+   self->op->asym->sm2.C1.y.data,
+   test_vector->C1.y.length,
+   "Incorrect value of C1[y]\n");
+   TEST_ASSERT_BUFFERS_ARE_EQUAL(test_vector->kP.x.data,
+   self->op->asym->sm2.kP.x.data,
+   test_vector->kP.x.length,
+   "Incorrect value of kP[x]\n");
+   TEST_ASSERT_BUFFERS_ARE_EQUAL(test_vector->kP.y.data,
+   self->op->asym->sm2.kP.y.data,
+   test_vector->kP.y.length,
+   "Incorrect value of kP[y]\n");
+
+   return TEST_SUCCESS;
+}
+
+static int
+test_sm2_decryption(const void *data)
+{
+   struct rte_crypto_asym_xform xform = {};
+   const uint8_t dev_id = params->valid_devs[0];
+   const struct crypto_testsuite_sm2_params *test_vector = data;
+   uint8_t result_kP_x1[TEST_DATA_SIZE] = { 0 };
+   uint8_t result_kP_y1[TEST_DATA_SIZE] = { 0 };
+
+   xform.xform_type = RTE_CRYPTO_ASYM_XFORM_SM2;
+   xform.ec.pkey = test_vector->pkey;
+   self->op->asym->sm2.op_type = RTE_CRYPTO_ASYM_OP_DECRYPT;
+   self->op->asym->sm2.C1 = test_vector->C1;
+
+   if (rte_cryptodev_asym_session_create(dev_id, &xform,
+   params->session_mpool, &self->sess) < 0) {
+   RTE_LOG(ERR, USER1, "line %u FAILED: Session creation failed",
+   __LINE__);
+   return TEST_FAILED;
+   }
+   rte_crypto_op_attach_asy

Re: [PATCH] RFC: use C11 alignas instead of GCC attribute aligned

2024-01-29 Thread Tyler Retzlaff
On Sun, Jan 28, 2024 at 11:00:31AM +0100, Mattias Rönnblom wrote:
> On 2024-01-28 09:57, Morten Brørup wrote:
> >>From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> >>Sent: Saturday, 27 January 2024 20.15
> >>
> >>On 2024-01-26 11:18, Morten Brørup wrote:
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Friday, 26 January 2024 11.05
> 
> On 2024-01-25 23:53, Morten Brørup wrote:
> >>From: Tyler Retzlaff [mailto:roret...@linux.microsoft.com]
> >>Sent: Thursday, 25 January 2024 19.37
> >>
> >>ping.
> >>
> >>Please review this thread if you have time, the main point of
> >>discussion
> >>I would like to receive consensus on the following questions.
> >>
> >>1. Should we continue to expand common alignments behind an
> __rte_macro
> >>
> >> i.e. what do we prefer to appear in code
> >>
> >> alignas(RTE_CACHE_LINE_MIN_SIZE)
> >>
> >> -- or --
> >>
> >> __rte_cache_aligned
> >>
> >>One of the benefits of dropping the macro is it provides a clear
> visual
> >>indicator that it is not placed in the same location or get
> >>applied
> >>to types as is done with __attribute__((__aligned__(n))).
> >
> >We don't want our own proprietary variant of something that already
> exists in the C standard. Now that we have moved to C11, the __rte
> alignment macros should be considered obsolete.
> 
> Making so something cache-line aligned is not in C11.
> >>>
> >>>We are talking about the __rte_aligned() macro, not the cache
> >>alignment macro.
> >>>
> >>
> >>OK, in that case, what is the relevance of question 1 above?
> >
> >With this in mind, try re-reading Tyler's clarifications in this tread.
> >
> >Briefly: alignas() can be attached to variables and structure fields, but 
> >not to types (like __rte_aligned()), so to align a structure:
> >
> >struct foo {
> > int alignas(64) bar; /* alignas(64) must be here */
> > int baz;
> >}; /* __rte_aligned(64) was here, but alignas(64) cannot be here. */
> >
> >So the question is: Do we want to eliminate the __rte_aligned() macro - 
> >which relies on compiler attributes - and migrate to using the C11 standard 
> >alignas()?
> >
> >I think yes; after updating to C11, the workaround for pre-C11 not offering 
> >alignment is obsolete, and its removal should be on the roadmap.
> >
> 
> OK, thanks for the explanation. Interesting limitation in the standard.
> 
> If the construct the standard is offering is less effective (in this
> case, less readable) and the non-standard-based option is possible
> to implement on all compilers (i.e., on MSVC too), then we should
> keep the custom option. Especially if it's already there, but also
> in cases where it isn't.
> 
> In fact, one could argue *everything* related to alignment should go
> through something rte_, __rte_ or RTE_-prefixed. So, "int
> RTE_ALIGNAS(64) bar;". Maybe that would be silly, but it would be
> consistent with RTE_CACHE_ALIGNAS.
> 
> I would worry more about allowing DPDK developers writing clean and
> readable code, than very slightly lowering the bar for the fraction
> of newcomers experienced with the latest and greatest from the C
> standard, and *not* familiar with age-old GCC extensions.

I’d just like to summarize where my understanding is at after reviewing
this discussion and my downstream branch. But I also want to make it
clear that we probably need to use both standard C and non-standard
attribute/declspec for object and struct/union type alignment
respectively.

I've assumed we prefer avoiding per-compiler conditional expansion when
possible through the use of standard C mechanisms. But there are
instances when alignas is awkward.

So I think the following is consistent with what Mattias is advocating
sans any discussions related to actual naming of macros.

We should have 2 macros, upon which others may be built to expand to
well-known values for e.g. cache line size.

RTE_ALIGNAS(n) object;

* This macro is used to align C objects i.e. variable, array, struct/union
  fields etc.
* Trivially expands to alignas(n) for all toolchains.
* Placed in a location that both C and C++ translation units accept that
  is on the same line preceeding the object type.
  example:
  // RTE_ALIGNAS(n) object;
  RTE_ALIGNAS(16) char somearray[16];

RTE_ALIGN_TYPE(n)

* This macro is used to align struct/union types.
* Conditionally expands to __declspec(align(n)) (msvc) and
  __attribute__((__aligned__(n))) (for all other toolchains)
* Placed in a location that for all gcc,clang,msvc and both C and C++
  translation units accept.
  example:
  // {struct,union} RTE_ALIGN_TYPE(n) tag { ... };
  struct RTE_ALIGN_TYPE(64) sometype { ... };

I'm not picky about what the names actualy are if you have better
suggestions i'm happy to adopt them.

Thoughts? Comments?

Appreciate the discussion this has been helpful.

ty



[Patch v2] net/mana: use rte_pktmbuf_alloc_bulk for allocating RX WQEs

2024-01-29 Thread longli
From: Long Li 

Instead of allocating mbufs one by one during RX, use rte_pktmbuf_alloc_bulk()
to allocate them in a batch.

Signed-off-by: Long Li 
---
Change in v2:
use rte_calloc_socket() in place of rte_calloc()

 drivers/net/mana/rx.c | 68 ---
 1 file changed, 44 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mana/rx.c b/drivers/net/mana/rx.c
index acad5e26cd..b011bf3ea1 100644
--- a/drivers/net/mana/rx.c
+++ b/drivers/net/mana/rx.c
@@ -2,6 +2,7 @@
  * Copyright 2022 Microsoft Corporation
  */
 #include 
+#include 
 
 #include 
 #include 
@@ -59,9 +60,8 @@ mana_rq_ring_doorbell(struct mana_rxq *rxq)
 }
 
 static int
-mana_alloc_and_post_rx_wqe(struct mana_rxq *rxq)
+mana_post_rx_wqe(struct mana_rxq *rxq, struct rte_mbuf *mbuf)
 {
-   struct rte_mbuf *mbuf = NULL;
struct gdma_sgl_element sgl[1];
struct gdma_work_request request;
uint32_t wqe_size_in_bu;
@@ -69,12 +69,6 @@ mana_alloc_and_post_rx_wqe(struct mana_rxq *rxq)
int ret;
struct mana_mr_cache *mr;
 
-   mbuf = rte_pktmbuf_alloc(rxq->mp);
-   if (!mbuf) {
-   rxq->stats.nombuf++;
-   return -ENOMEM;
-   }
-
mr = mana_alloc_pmd_mr(&rxq->mr_btree, priv, mbuf);
if (!mr) {
DP_LOG(ERR, "failed to register RX MR");
@@ -121,19 +115,32 @@ mana_alloc_and_post_rx_wqe(struct mana_rxq *rxq)
  * Post work requests for a Rx queue.
  */
 static int
-mana_alloc_and_post_rx_wqes(struct mana_rxq *rxq)
+mana_alloc_and_post_rx_wqes(struct mana_rxq *rxq, uint32_t count)
 {
int ret;
uint32_t i;
+   struct rte_mbuf **mbufs;
+
+   mbufs = rte_calloc_socket("mana_rx_mbufs", count, sizeof(struct 
rte_mbuf *),
+ 0, rxq->mp->socket_id);
+   if (!mbufs)
+   return -ENOMEM;
+
+   ret = rte_pktmbuf_alloc_bulk(rxq->mp, mbufs, count);
+   if (ret) {
+   DP_LOG(ERR, "failed to allocate mbufs for RX");
+   rxq->stats.nombuf += count;
+   goto fail;
+   }
 
 #ifdef RTE_ARCH_32
rxq->wqe_cnt_to_short_db = 0;
 #endif
-   for (i = 0; i < rxq->num_desc; i++) {
-   ret = mana_alloc_and_post_rx_wqe(rxq);
+   for (i = 0; i < count; i++) {
+   ret = mana_post_rx_wqe(rxq, mbufs[i]);
if (ret) {
DP_LOG(ERR, "failed to post RX ret = %d", ret);
-   return ret;
+   goto fail;
}
 
 #ifdef RTE_ARCH_32
@@ -146,6 +153,8 @@ mana_alloc_and_post_rx_wqes(struct mana_rxq *rxq)
 
mana_rq_ring_doorbell(rxq);
 
+fail:
+   rte_free(mbufs);
return ret;
 }
 
@@ -404,7 +413,9 @@ mana_start_rx_queues(struct rte_eth_dev *dev)
}
 
for (i = 0; i < priv->num_queues; i++) {
-   ret = mana_alloc_and_post_rx_wqes(dev->data->rx_queues[i]);
+   struct mana_rxq *rxq = dev->data->rx_queues[i];
+
+   ret = mana_alloc_and_post_rx_wqes(rxq, rxq->num_desc);
if (ret)
goto fail;
}
@@ -423,7 +434,7 @@ uint16_t
 mana_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 {
uint16_t pkt_received = 0;
-   uint16_t wqe_posted = 0;
+   uint16_t wqe_consumed = 0;
struct mana_rxq *rxq = dpdk_rxq;
struct mana_priv *priv = rxq->priv;
struct rte_mbuf *mbuf;
@@ -535,18 +546,23 @@ mana_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
 
rxq->gdma_rq.tail += desc->wqe_size_in_bu;
 
-   /* Consume this request and post another request */
-   ret = mana_alloc_and_post_rx_wqe(rxq);
-   if (ret) {
-   DP_LOG(ERR, "failed to post rx wqe ret=%d", ret);
-   break;
-   }
-
-   wqe_posted++;
+   /* Record the number of the RX WQE we need to post to replenish
+* consumed RX requests
+*/
+   wqe_consumed++;
if (pkt_received == pkts_n)
break;
 
 #ifdef RTE_ARCH_32
+   /* Always post WQE as soon as it's consumed for short DB */
+   ret = mana_alloc_and_post_rx_wqes(rxq, wqe_consumed);
+   if (ret) {
+   DRV_LOG(ERR, "failed to post %d WQEs, ret %d",
+   wqe_consumed, ret);
+   return pkt_received;
+   }
+   wqe_consumed = 0;
+
/* Ring short doorbell if approaching the wqe increment
 * limit.
 */
@@ -569,8 +585,12 @@ mana_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
goto repoll;
}
 
-   if (wqe_posted)
-   mana_rq_ring_doorbell(rxq);
+   if (wqe_consumed) {
+   ret = mana_alloc_and_post_rx_wqes(rxq, wqe_c

[PATCH 1/2] net/mana: use a MR variable on the stack instead of allocating it

2024-01-29 Thread longli
From: Long Li 

The content of the MR is copied to the cache trees, it's not necessary to
allocate a MR to do this. Use a variable on the stack instead.

This also fixes the memory leak in the code where a MR is allocated but
never freed.

Signed-off-by: Long Li 
---
 drivers/net/mana/mr.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mana/mr.c b/drivers/net/mana/mr.c
index d6a5ad1460..c9d0f7ef5a 100644
--- a/drivers/net/mana/mr.c
+++ b/drivers/net/mana/mr.c
@@ -40,7 +40,7 @@ mana_new_pmd_mr(struct mana_mr_btree *local_tree, struct 
mana_priv *priv,
struct ibv_mr *ibv_mr;
struct mana_range ranges[pool->nb_mem_chunks];
uint32_t i;
-   struct mana_mr_cache *mr;
+   struct mana_mr_cache mr;
int ret;
 
rte_mempool_mem_iter(pool, mana_mempool_chunk_cb, ranges);
@@ -75,14 +75,13 @@ mana_new_pmd_mr(struct mana_mr_btree *local_tree, struct 
mana_priv *priv,
DP_LOG(DEBUG, "MR lkey %u addr %p len %zu",
   ibv_mr->lkey, ibv_mr->addr, ibv_mr->length);
 
-   mr = rte_calloc("MANA MR", 1, sizeof(*mr), 0);
-   mr->lkey = ibv_mr->lkey;
-   mr->addr = (uintptr_t)ibv_mr->addr;
-   mr->len = ibv_mr->length;
-   mr->verb_obj = ibv_mr;
+   mr.lkey = ibv_mr->lkey;
+   mr.addr = (uintptr_t)ibv_mr->addr;
+   mr.len = ibv_mr->length;
+   mr.verb_obj = ibv_mr;
 
rte_spinlock_lock(&priv->mr_btree_lock);
-   ret = mana_mr_btree_insert(&priv->mr_btree, mr);
+   ret = mana_mr_btree_insert(&priv->mr_btree, &mr);
rte_spinlock_unlock(&priv->mr_btree_lock);
if (ret) {
ibv_dereg_mr(ibv_mr);
@@ -90,7 +89,7 @@ mana_new_pmd_mr(struct mana_mr_btree *local_tree, struct 
mana_priv *priv,
return ret;
}
 
-   ret = mana_mr_btree_insert(local_tree, mr);
+   ret = mana_mr_btree_insert(local_tree, &mr);
if (ret) {
/* Don't need to clean up MR as it's already
 * in the global tree
-- 
2.17.1



[PATCH 2/2] net/mana: properly deal with MR cache expansion failure

2024-01-29 Thread longli
From: Long Li 

On MR cache expension failure, the request should fail as there is no path
to get a new MR into the tree. Attempting to insert a new MR to the cache
tree will result in memory violation.

Signed-off-by: Long Li 
---
 drivers/net/mana/mana.h |  6 +++---
 drivers/net/mana/mr.c   | 45 -
 2 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index eadcd01858..309d553956 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -522,9 +522,9 @@ void mana_del_pmd_mr(struct mana_mr_cache *mr);
 void mana_mempool_chunk_cb(struct rte_mempool *mp, void *opaque,
   struct rte_mempool_memhdr *memhdr, unsigned int idx);
 
-struct mana_mr_cache *mana_mr_btree_lookup(struct mana_mr_btree *bt,
-  uint16_t *idx,
-  uintptr_t addr, size_t len);
+int mana_mr_btree_lookup(struct mana_mr_btree *bt, uint16_t *idx,
+uintptr_t addr, size_t len,
+struct mana_mr_cache **cache);
 int mana_mr_btree_insert(struct mana_mr_btree *bt, struct mana_mr_cache 
*entry);
 int mana_mr_btree_init(struct mana_mr_btree *bt, int n, int socket);
 void mana_mr_btree_free(struct mana_mr_btree *bt);
diff --git a/drivers/net/mana/mr.c b/drivers/net/mana/mr.c
index c9d0f7ef5a..c4045141bc 100644
--- a/drivers/net/mana/mr.c
+++ b/drivers/net/mana/mr.c
@@ -138,8 +138,12 @@ mana_alloc_pmd_mr(struct mana_mr_btree *local_mr_btree, 
struct mana_priv *priv,
 
 try_again:
/* First try to find the MR in local queue tree */
-   mr = mana_mr_btree_lookup(local_mr_btree, &idx,
- (uintptr_t)mbuf->buf_addr, mbuf->buf_len);
+   ret = mana_mr_btree_lookup(local_mr_btree, &idx,
+  (uintptr_t)mbuf->buf_addr, mbuf->buf_len,
+  &mr);
+   if (ret)
+   return NULL;
+
if (mr) {
DP_LOG(DEBUG, "Local mr lkey %u addr 0x%" PRIxPTR " len %zu",
   mr->lkey, mr->addr, mr->len);
@@ -148,11 +152,14 @@ mana_alloc_pmd_mr(struct mana_mr_btree *local_mr_btree, 
struct mana_priv *priv,
 
/* If not found, try to find the MR in global tree */
rte_spinlock_lock(&priv->mr_btree_lock);
-   mr = mana_mr_btree_lookup(&priv->mr_btree, &idx,
- (uintptr_t)mbuf->buf_addr,
- mbuf->buf_len);
+   ret = mana_mr_btree_lookup(&priv->mr_btree, &idx,
+  (uintptr_t)mbuf->buf_addr,
+  mbuf->buf_len, &mr);
rte_spinlock_unlock(&priv->mr_btree_lock);
 
+   if (ret)
+   return NULL;
+
/* If found in the global tree, add it to the local tree */
if (mr) {
ret = mana_mr_btree_insert(local_mr_btree, mr);
@@ -228,22 +235,23 @@ mana_mr_btree_expand(struct mana_mr_btree *bt, int n)
 /*
  * Look for a region of memory in MR cache.
  */
-struct mana_mr_cache *
-mana_mr_btree_lookup(struct mana_mr_btree *bt, uint16_t *idx,
-uintptr_t addr, size_t len)
+int mana_mr_btree_lookup(struct mana_mr_btree *bt, uint16_t *idx,
+uintptr_t addr, size_t len,
+struct mana_mr_cache **cache)
 {
struct mana_mr_cache *table;
uint16_t n;
uint16_t base = 0;
int ret;
 
-   n = bt->len;
+   *cache = NULL;
 
+   n = bt->len;
/* Try to double the cache if it's full */
if (n == bt->size) {
ret = mana_mr_btree_expand(bt, bt->size << 1);
if (ret)
-   return NULL;
+   return ret;
}
 
table = bt->table;
@@ -262,14 +270,16 @@ mana_mr_btree_lookup(struct mana_mr_btree *bt, uint16_t 
*idx,
 
*idx = base;
 
-   if (addr + len <= table[base].addr + table[base].len)
-   return &table[base];
+   if (addr + len <= table[base].addr + table[base].len) {
+   *cache = &table[base];
+   return 0;
+   }
 
DP_LOG(DEBUG,
   "addr 0x%" PRIxPTR " len %zu idx %u sum 0x%" PRIxPTR " not 
found",
   addr, len, *idx, addr + len);
 
-   return NULL;
+   return 0;
 }
 
 int
@@ -314,14 +324,21 @@ mana_mr_btree_insert(struct mana_mr_btree *bt, struct 
mana_mr_cache *entry)
struct mana_mr_cache *table;
uint16_t idx = 0;
uint16_t shift;
+   int ret;
+
+   ret = mana_mr_btree_lookup(bt, &idx, entry->addr, entry->len, &table);
+   if (ret)
+   return ret;
 
-   if (mana_mr_btree_lookup(bt, &idx, entry->addr, entry->len)) {
+   if (table) {
DP_LOG(DEBUG, "Addr 0x%" PRIxPTR " len %zu exists in btree",
   entry->addr, entry->len);

Re: [PATCH] app/testpmd: fix crash in multi-process packet forwarding

2024-01-29 Thread huangdengdui



On 2024/1/26 14:23, fengchengwen wrote:
> Hi Dengdui,
> 
> On 2024/1/26 10:41, Dengdui Huang wrote:
>> On multi-process scenario, each process creates flows based on the
>> number of queues. When nbcore is greater than 1, multiple cores may
>> use the same queue to forward packet, like:
>> dpdk-testpmd -a BDF --proc-type=auto -- -i --rxq=4 --txq=4
>> --nb-cores=2 --num-procs=2 --proc-id=0
>> testpmd> start
>> mac packet forwarding - ports=1 - cores=2 - streams=4 - NUMA support
>> enabled, MP allocation mode: native
>> Logical Core 2 (socket 0) forwards packets on 2 streams:
>> RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
>> RX P=0/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
>> Logical Core 3 (socket 0) forwards packets on 2 streams:
>> RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
>> RX P=0/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
> 
> tip: it would be more readable if with an indent, just like below example.
> 
> Acked-by: Chengwen Feng 
> 
> Thanks
> 
OK, Thanks
>>
>> After this commit, the result will be:
>> dpdk-testpmd -a BDF --proc-type=auto -- -i --rxq=4 --txq=4
>> --nb-cores=2 --num-procs=2 --proc-id=0
>> testpmd> start
>> io packet forwarding - ports=1 - cores=2 - streams=2 - NUMA support
>> enabled, MP allocation mode: native
>> Logical Core 2 (socket 0) forwards packets on 1 streams:
>>   RX P=0/Q=0 (socket 2) -> TX P=0/Q=0 (socket 2) peer=02:00:00:00:00:00
>> Logical Core 3 (socket 0) forwards packets on 1 streams:
>>   RX P=0/Q=1 (socket 2) -> TX P=0/Q=1 (socket 2) peer=02:00:00:00:00:00
>>
>> Fixes: a550baf24af9 ("app/testpmd: support multi-process")
>> Cc: sta...@dpdk.org
>>
>> Signed-off-by: Dengdui Huang 
>> ---
>>  app/test-pmd/config.c | 6 +-
>>  1 file changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
>> index cad7537bc6..2c4dedd603 100644
>> --- a/app/test-pmd/config.c
>> +++ b/app/test-pmd/config.c
>> @@ -4794,7 +4794,6 @@ rss_fwd_config_setup(void)
>>  queueid_t  nb_q;
>>  streamid_t  sm_id;
>>  int start;
>> -int end;
>>  
>>  nb_q = nb_rxq;
>>  if (nb_q > nb_txq)
>> @@ -4802,7 +4801,7 @@ rss_fwd_config_setup(void)
>>  cur_fwd_config.nb_fwd_lcores = (lcoreid_t) nb_fwd_lcores;
>>  cur_fwd_config.nb_fwd_ports = nb_fwd_ports;
>>  cur_fwd_config.nb_fwd_streams =
>> -(streamid_t) (nb_q * cur_fwd_config.nb_fwd_ports);
>> +(streamid_t) (nb_q / num_procs * cur_fwd_config.nb_fwd_ports);
>>  
>>  if (cur_fwd_config.nb_fwd_streams < cur_fwd_config.nb_fwd_lcores)
>>  cur_fwd_config.nb_fwd_lcores =
>> @@ -4824,7 +4823,6 @@ rss_fwd_config_setup(void)
>>   * the 2~3 queue for secondary process.
>>   */
>>  start = proc_id * nb_q / num_procs;
>> -end = start + nb_q / num_procs;
>>  rxp = 0;
>>  rxq = start;
>>  for (sm_id = 0; sm_id < cur_fwd_config.nb_fwd_streams; sm_id++) {
>> @@ -4843,8 +4841,6 @@ rss_fwd_config_setup(void)
>>  continue;
>>  rxp = 0;
>>  rxq++;
>> -if (rxq >= end)
>> -rxq = start;
>>  }
>>  }
>>  
>>


[PATCH v2] app/testpmd: fix crash in multi-process packet forwarding

2024-01-29 Thread Dengdui Huang
On multi-process scenario, each process creates flows based on the
number of queues. When nbcore is greater than 1, multiple cores may
use the same queue to forward packet, like:
dpdk-testpmd -a BDF --proc-type=auto -- -i --rxq=4 --txq=4
--nb-cores=2 --num-procs=2 --proc-id=0
testpmd> start
mac packet forwarding - ports=1 - cores=2 - streams=4 - NUMA support
enabled, MP allocation mode: native
Logical Core 2 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
  RX P=0/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
Logical Core 3 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
  RX P=0/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00

After this commit, the result will be:
dpdk-testpmd -a BDF --proc-type=auto -- -i --rxq=4 --txq=4
--nb-cores=2 --num-procs=2 --proc-id=0
testpmd> start
io packet forwarding - ports=1 - cores=2 - streams=2 - NUMA support
enabled, MP allocation mode: native
Logical Core 2 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=0 (socket 2) -> TX P=0/Q=0 (socket 2) peer=02:00:00:00:00:00
Logical Core 3 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=1 (socket 2) -> TX P=0/Q=1 (socket 2) peer=02:00:00:00:00:00

Fixes: a550baf24af9 ("app/testpmd: support multi-process")
Cc: sta...@dpdk.org

Signed-off-by: Dengdui Huang 
Acked-by: Chengwen Feng 
---
 app/test-pmd/config.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index cad7537bc6..2c4dedd603 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4794,7 +4794,6 @@ rss_fwd_config_setup(void)
queueid_t  nb_q;
streamid_t  sm_id;
int start;
-   int end;
 
nb_q = nb_rxq;
if (nb_q > nb_txq)
@@ -4802,7 +4801,7 @@ rss_fwd_config_setup(void)
cur_fwd_config.nb_fwd_lcores = (lcoreid_t) nb_fwd_lcores;
cur_fwd_config.nb_fwd_ports = nb_fwd_ports;
cur_fwd_config.nb_fwd_streams =
-   (streamid_t) (nb_q * cur_fwd_config.nb_fwd_ports);
+   (streamid_t) (nb_q / num_procs * cur_fwd_config.nb_fwd_ports);
 
if (cur_fwd_config.nb_fwd_streams < cur_fwd_config.nb_fwd_lcores)
cur_fwd_config.nb_fwd_lcores =
@@ -4824,7 +4823,6 @@ rss_fwd_config_setup(void)
 * the 2~3 queue for secondary process.
 */
start = proc_id * nb_q / num_procs;
-   end = start + nb_q / num_procs;
rxp = 0;
rxq = start;
for (sm_id = 0; sm_id < cur_fwd_config.nb_fwd_streams; sm_id++) {
@@ -4843,8 +4841,6 @@ rss_fwd_config_setup(void)
continue;
rxp = 0;
rxq++;
-   if (rxq >= end)
-   rxq = start;
}
 }
 
-- 
2.33.0



[PATCH] [v3]lib/telemetry:fix telemetry conns leak in case of socket write fail

2024-01-29 Thread Shaowei Sun
Telemetry can only create 10 conns by default, each of which is processed
by a thread.

When a thread fails to write using socket, the thread will end directly
without reducing the total number of conns.

This will result in the machine running for a long time, and if there are
10 failures, the telemetry will be unavailable

Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")

Signed-off-by: Shaowei Sun <1819846...@qq.com>
Acked-by: Bruce Richardson 
Acked-by: Ciara Power 
Acked-by: Chengwen Feng 
---
 lib/telemetry/telemetry.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index 31e2391867..0b00c04090 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -378,8 +378,8 @@ client_handler(void *sock_id)
"{\"version\":\"%s\",\"pid\":%d,\"max_output_len\":%d}",
telemetry_version, getpid(), MAX_OUTPUT_LEN);
if (write(s, info_str, strlen(info_str)) < 0) {
-   close(s);
-   return NULL;
+   TMTY_LOG_LINE(ERR, "Socket write base info to client failed");
+   goto exit;
}
 
/* receive data is not null terminated */
@@ -404,6 +404,7 @@ client_handler(void *sock_id)
 
bytes = read(s, buffer, sizeof(buffer) - 1);
}
+exit:
close(s);
rte_atomic_fetch_sub_explicit(&v2_clients, 1, rte_memory_order_relaxed);
return NULL;
-- 
2.37.1 (Apple Git-137.1)



[RFC 0/2] net/tap RSS BPF rewrite

2024-01-29 Thread Stephen Hemminger
This is a early (alpha) view of my recent work to
to update (gut job) of the RSS handling in the
tap device. The rework was so big that it did not
make sense to do it in incremental stages.

Overall it comes out much simpler and cleaner.
Much more testing is needed, at this point it will
load the program and try and setup filter (but fails).

Stephen Hemminger (2):
  tap: stop "vendoring" linux bpf headers
  tap: rework BPF handling

 .gitignore| 3 -
 drivers/net/tap/bpf/.gitignore| 2 +
 drivers/net/tap/bpf/Makefile  |21 +-
 drivers/net/tap/bpf/README|12 +
 drivers/net/tap/bpf/bpf_api.h |   276 -
 drivers/net/tap/bpf/bpf_elf.h |53 -
 drivers/net/tap/bpf/bpf_extract.py|86 -
 drivers/net/tap/bpf/tap_bpf_program.c |   255 -
 drivers/net/tap/bpf/tap_rss.c |   269 +
 drivers/net/tap/meson.build   |26 +-
 drivers/net/tap/rte_eth_tap.c | 2 +
 drivers/net/tap/rte_eth_tap.h | 9 +-
 drivers/net/tap/tap_bpf.h |   121 -
 drivers/net/tap/tap_bpf_api.c |   190 -
 drivers/net/tap/tap_bpf_insns.h   |  1743 
 drivers/net/tap/tap_flow.c|   530 +-
 drivers/net/tap/tap_flow.h|11 +-
 drivers/net/tap/tap_rss.h |14 +-
 drivers/net/tap/tap_rss.skel.h| 11625 
 drivers/net/tap/tap_rss.stub.h|45 +
 20 files changed, 12137 insertions(+), 3156 deletions(-)
 create mode 100644 drivers/net/tap/bpf/.gitignore
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c
 delete mode 100644 drivers/net/tap/tap_bpf.h
 delete mode 100644 drivers/net/tap/tap_bpf_api.c
 delete mode 100644 drivers/net/tap/tap_bpf_insns.h
 create mode 100644 drivers/net/tap/tap_rss.skel.h
 create mode 100644 drivers/net/tap/tap_rss.stub.h

-- 
2.43.0



[RFC 1/2] tap: stop "vendoring" linux bpf headers

2024-01-29 Thread Stephen Hemminger
The proper place for finding bpf structures and functions is
in linux/bpf.h. The original version was trying to workaround the
case where the build environment was running on old pre BPF
version of Glibc, but the target environment had BPF. This is not
a supportable build method, and not how rest of DPDK works.

Having own private (and divergent) version headers leads to future
problems when BPF definitions evolve.

Since DPDK officially supports only LTS or later kernel
there is no need for the #ifdef workarounds in the TAP flow
code. Cloning headers leads to problems and no longer needed.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/tap/bpf/bpf_extract.py |   1 -
 drivers/net/tap/tap_bpf.h  | 121 -
 drivers/net/tap/tap_bpf_api.c  |  20 +++--
 drivers/net/tap/tap_bpf_insns.h|   1 -
 drivers/net/tap/tap_flow.c |  89 -
 5 files changed, 13 insertions(+), 219 deletions(-)
 delete mode 100644 drivers/net/tap/tap_bpf.h

diff --git a/drivers/net/tap/bpf/bpf_extract.py 
b/drivers/net/tap/bpf/bpf_extract.py
index b630c42b809f..73c4dafe4eca 100644
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ b/drivers/net/tap/bpf/bpf_extract.py
@@ -65,7 +65,6 @@ def write_header(out, source):
 print(f' * Auto-generated from {source}', file=out)
 print(" * This not the original source file. Do NOT edit it.", file=out)
 print(" */\n", file=out)
-print("#include ", file=out)
 
 
 def main():
diff --git a/drivers/net/tap/tap_bpf.h b/drivers/net/tap/tap_bpf.h
deleted file mode 100644
index 0d38bc111fe0..
--- a/drivers/net/tap/tap_bpf.h
+++ /dev/null
@@ -1,121 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#ifndef __TAP_BPF_H__
-#define __TAP_BPF_H__
-
-#include 
-
-/* Do not #include  since eBPF must compile on different
- * distros which may include partial definitions for eBPF (while the
- * kernel itself may support eBPF). Instead define here all that is needed
- */
-
-/* BPF_MAP_UPDATE_ELEM command flags */
-#defineBPF_ANY 0 /* create a new element or update an existing */
-
-/* BPF architecture instruction struct */
-struct bpf_insn {
-   __u8code;
-   __u8dst_reg:4;
-   __u8src_reg:4;
-   __s16   off;
-   __s32   imm; /* immediate value */
-};
-
-/* BPF program types */
-enum bpf_prog_type {
-   BPF_PROG_TYPE_UNSPEC,
-   BPF_PROG_TYPE_SOCKET_FILTER,
-   BPF_PROG_TYPE_KPROBE,
-   BPF_PROG_TYPE_SCHED_CLS,
-   BPF_PROG_TYPE_SCHED_ACT,
-};
-
-/* BPF commands types */
-enum bpf_cmd {
-   BPF_MAP_CREATE,
-   BPF_MAP_LOOKUP_ELEM,
-   BPF_MAP_UPDATE_ELEM,
-   BPF_MAP_DELETE_ELEM,
-   BPF_MAP_GET_NEXT_KEY,
-   BPF_PROG_LOAD,
-};
-
-/* BPF maps types */
-enum bpf_map_type {
-   BPF_MAP_TYPE_UNSPEC,
-   BPF_MAP_TYPE_HASH,
-};
-
-/* union of anonymous structs used with TAP BPF commands */
-union bpf_attr {
-   /* BPF_MAP_CREATE command */
-   struct {
-   __u32   map_type;
-   __u32   key_size;
-   __u32   value_size;
-   __u32   max_entries;
-   __u32   map_flags;
-   __u32   inner_map_fd;
-   };
-
-   /* BPF_MAP_UPDATE_ELEM, BPF_MAP_DELETE_ELEM commands */
-   struct {
-   __u32   map_fd;
-   __aligned_u64   key;
-   union {
-   __aligned_u64 value;
-   __aligned_u64 next_key;
-   };
-   __u64   flags;
-   };
-
-   /* BPF_PROG_LOAD command */
-   struct {
-   __u32   prog_type;
-   __u32   insn_cnt;
-   __aligned_u64   insns;
-   __aligned_u64   license;
-   __u32   log_level;
-   __u32   log_size;
-   __aligned_u64   log_buf;
-   __u32   kern_version;
-   __u32   prog_flags;
-   };
-} __rte_aligned(8);
-
-#ifndef __NR_bpf
-# if defined(__i386__)
-#  define __NR_bpf 357
-# elif defined(__x86_64__)
-#  define __NR_bpf 321
-# elif defined(__arm__)
-#  define __NR_bpf 386
-# elif defined(__aarch64__)
-#  define __NR_bpf 280
-# elif defined(__sparc__)
-#  define __NR_bpf 349
-# elif defined(__s390__)
-#  define __NR_bpf 351
-# elif defined(__powerpc__)
-#  define __NR_bpf 361
-# elif defined(__riscv)
-#  define __NR_bpf 280
-# elif defined(__loongarch__)
-#  define __NR_bpf 280
-# else
-#  error __NR_bpf not defined
-# endif
-#endif
-
-enum {
-   BPF_MAP_ID_KEY,
-   BPF_MAP_ID_SIMPLE,
-};
-
-static int bpf_load(enum bpf_prog_type type, const struct bpf_insn *insns,
-   size_t insns_cnt, const char *license);
-
-#endif /* __TAP_BPF_H__ */
diff --git a/drivers/net/tap/tap_bpf_api.c b/drivers/net/tap/tap_bpf_api.c
index 15283f8917ed..9e05e2ddf19b 100644
--- a/drivers/net/tap/tap_bpf_api.c

[RFC] ethdev: introduce PTP device capability

2024-01-29 Thread Huisong Li
Currently, the PTP feature is a little messy and has some issue.
1> There is different implementation for PTP. This feature of some
   hardware depends on the Rx HW timestamp offload, and use this
   offload to detect if enable PTP feature. Others can enable PTP
   feature with only ethdev ops.
2> For some drivers, enabling PTP feature also depends on the macro
   RTE_LIBRTE_IEEE1588. Actually, whether device support, enable
   or disable this feature should not be determined at compilation
   time. This has been discussed in thread [1].
3> The user haven't a good way to distinguish which port supports
   the PTP feature in the case that a couple of device belong to the
   same driver. And PTP related API in ethdev layer doesn't do check
   for PTP capability. This has been mentioned and discussed in
   thread [2].

In the thread [1], there is a conclusion that remove the compiling
macro RTE_LIBRTE_IEEE1588. And in the thread [2], there is a common
opinion that the RTE_ETH_RX_OFFLOAD_TIMESTAMP cannot be used as the
PTP capability.

For the above issuse, this patch introduces a PTP capability in
rte_eth_dev_info.dev_capa to report PTP capability.

Welcome to jumping into discussion.

[1] 
https://patchwork.dpdk.org/project/dpdk/patch/20230203132810.14187-1-tho...@monjalon.net/
[2] https://inbox.dpdk.org/dev/20230817084226.55327-1-lihuis...@huawei.com/

Signed-off-by: Huisong Li 
---
 lib/ethdev/rte_ethdev.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index efa4f12b2a..4c8bd691bd 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1613,6 +1613,12 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP RTE_BIT64(3)
 /** Device supports keeping shared flow objects across restart. */
 #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
+/**
+ * Device supports PTP feature.
+ * For some hardware, this feature also need to set the offload
+ * RTE_ETH_RX_OFFLOAD_TIMESTAMP, please check rte_eth_dev_info.rx_offload_capa.
+ */
+#define RTE_ETH_DEV_CAPA_PTP RTE_BIT64(5)
 /**@}*/
 
 /*
-- 
2.30.0



Re: [PATCH v2 1/2] bus/pci: fix secondary process PCI uio resource map problem

2024-01-29 Thread fengchengwen
Hi,

Nice to see this fix.
ps: our team also founds this problem, and the bugfix is still being reviewed 
internally.

As for this patch, I prefer not extract subfunction, because the father is 
simple and just
need do some minor adjustment.

map_idx = 0;
for (i = 0; i != PCI_MAX_RESOURCE; i++) {
/* skip empty BAR */
if (dev->mem_resource[i].phys_addr == 0)
continue;
... // use map_idx instead i
map_idx++;
dev->mem_resource[i].addr = mapaddr;
}

Also suggest use pci_uio_find_resource() to refactor pci_uio_map_secondary() 
function.

Thanks

On 2024/1/29 17:22, Chaoyong He wrote:
> From: Zerun Fu 
> 
> For the primary process, the logic loops all BARs and will skip
> the map of BAR with an invalid physical address (0), also will
> assign 'uio_res->nb_maps' with the real mapped BARs number. But
> for the secondary process, instead of loops all BARs, the logic
> using the 'uio_res->nb_map' as index. If the device uses continuous
> BARs there will be no problem, whereas if it uses discrete BARs,
> it will lead to mapping errors.
> 
> Fix this problem by also loops all BARs and skip the map of BAR
> with an invalid physical address in secondary process.
> 
> Fixes: 9b957f378abf ("pci: merge uio functions for linux and bsd")
> Cc: muk...@igel.co.jp
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Zerun Fu 
> Reviewed-by: Chaoyong He 
> Reviewed-by: Long Wu 
> Reviewed-by: Peng Zhang 
> ---
>  drivers/bus/pci/pci_common_uio.c | 94 
>  1 file changed, 60 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/bus/pci/pci_common_uio.c 
> b/drivers/bus/pci/pci_common_uio.c
> index 76c661f054..fcd8a49daf 100644
> --- a/drivers/bus/pci/pci_common_uio.c
> +++ b/drivers/bus/pci/pci_common_uio.c
> @@ -23,10 +23,57 @@ static struct rte_tailq_elem rte_uio_tailq = {
>  };
>  EAL_REGISTER_TAILQ(rte_uio_tailq)
>  
> +static int
> +pci_uio_map_secondary_resource_by_index(struct rte_pci_device *dev,
> + int res_idx, struct mapped_pci_resource *uio_res, int map_idx)
> +{
> + int fd, i;
> +
> + if (map_idx >= uio_res->nb_maps)
> + return -1;
> +
> + /*
> +  * open devname, to mmap it
> +  */
> + fd = open(uio_res->maps[map_idx].path, O_RDWR);
> + if (fd < 0) {
> + RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> + uio_res->maps[map_idx].path, strerror(errno));
> + return -1;
> + }
> +
> + void *mapaddr = pci_map_resource(uio_res->maps[map_idx].addr,
> + fd, (off_t)uio_res->maps[map_idx].offset,
> + (size_t)uio_res->maps[map_idx].size, 0);
> +
> + /* fd is not needed in secondary process, close it */
> + close(fd);
> + if (mapaddr != uio_res->maps[map_idx].addr) {
> + RTE_LOG(ERR, EAL,
> + "Cannot mmap device resource file %s to address: %p\n",
> + uio_res->maps[map_idx].path,
> + uio_res->maps[map_idx].addr);
> + if (mapaddr != NULL) {
> + /* unmap addrs correctly mapped */
> + for (i = 0; i < map_idx; i++)
> + pci_unmap_resource(
> + uio_res->maps[i].addr,
> + (size_t)uio_res->maps[i].size);
> + /* unmap addr wrongly mapped */
> + pci_unmap_resource(mapaddr,
> + (size_t)uio_res->maps[map_idx].size);
> + }
> + return -1;
> + }
> + dev->mem_resource[res_idx].addr = mapaddr;
> +
> + return 0;
> +}
> +
>  static int
>  pci_uio_map_secondary(struct rte_pci_device *dev)
>  {
> - int fd, i, j;
> + int map_idx = 0, res_idx, ret;
>   struct mapped_pci_resource *uio_res;
>   struct mapped_pci_res_list *uio_res_list =
>   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
> @@ -37,41 +84,20 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
>   if (rte_pci_addr_cmp(&uio_res->pci_addr, &dev->addr))
>   continue;
>  
> - for (i = 0; i != uio_res->nb_maps; i++) {
> - /*
> -  * open devname, to mmap it
> -  */
> - fd = open(uio_res->maps[i].path, O_RDWR);
> - if (fd < 0) {
> - RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> - uio_res->maps[i].path, strerror(errno));
> - return -1;
> + /* Map all BARs */
> + for (res_idx = 0; res_idx != PCI_MAX_RESOURCE; res_idx++) {
> +  /* skip empty BAR */
> + if (dev->mem_resource[r

Re: [PATCH v2 2/2] bus/pci: fix secondary process save 'FD' problem

2024-01-29 Thread fengchengwen
Hi,

The pci_uio_alloc_resource in drivers/bus/pci/bsd should also modified.

How about add subfunction: pci_uio_init_intr() which includes 
access/rte_intr_fd_set/rte_intr_fd_get/rte_intr_type_set?

Thanks

On 2024/1/29 17:22, Chaoyong He wrote:
> From: Zerun Fu 
> 
> In the previous logic the 'fd' was only saved in the primary process,
> but for some devices this value is also used in the secondary logic.
> 
> For example, the call of 'rte_pci_find_ext_capability()' will fail in
> the secondary process.
> 
> Fix this problem by getting and saving the value of 'fd' also in the
> secondary process logic.
> 
> Fixes: 9b957f378abf ("pci: merge uio functions for linux and bsd")
> Cc: muk...@igel.co.jp
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Zerun Fu 
> Reviewed-by: Chaoyong He 
> Reviewed-by: Long Wu 
> Reviewed-by: Peng Zhang 
> ---
>  drivers/bus/pci/linux/pci_uio.c  | 5 -
>  drivers/bus/pci/pci_common_uio.c | 8 
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c
> index 97d740dfe5..6680e42efb 100644
> --- a/drivers/bus/pci/linux/pci_uio.c
> +++ b/drivers/bus/pci/linux/pci_uio.c
> @@ -237,7 +237,7 @@ pci_uio_alloc_resource(struct rte_pci_device *dev,
>   }
>   snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
>  
> - /* save fd if in primary process */
> + /* save fd */
>   fd = open(devname, O_RDWR);
>   if (fd < 0) {
>   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> @@ -261,6 +261,9 @@ pci_uio_alloc_resource(struct rte_pci_device *dev,
>   if (rte_intr_dev_fd_set(dev->intr_handle, uio_cfg_fd))
>   goto error;
>  
> + if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> + return 0;
> +
>   if (dev->kdrv == RTE_PCI_KDRV_IGB_UIO) {
>   if (rte_intr_type_set(dev->intr_handle, RTE_INTR_HANDLE_UIO))
>   goto error;
> diff --git a/drivers/bus/pci/pci_common_uio.c 
> b/drivers/bus/pci/pci_common_uio.c
> index fcd8a49daf..b6f79b067d 100644
> --- a/drivers/bus/pci/pci_common_uio.c
> +++ b/drivers/bus/pci/pci_common_uio.c
> @@ -122,15 +122,15 @@ pci_uio_map_resource(struct rte_pci_device *dev)
>   if (rte_intr_dev_fd_set(dev->intr_handle, -1))
>   return -1;
>  
> - /* secondary processes - use already recorded details */
> - if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> - return pci_uio_map_secondary(dev);
> -
>   /* allocate uio resource */
>   ret = pci_uio_alloc_resource(dev, &uio_res);
>   if (ret)
>   return ret;
>  
> + /* secondary processes - use already recorded details */
> + if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> + return pci_uio_map_secondary(dev);
> +
>   /* Map all BARs */
>   for (i = 0; i != PCI_MAX_RESOURCE; i++) {
>   /* skip empty BAR */
> 


[PATCH v7 0/5] app/testpmd: support multiple process attach and detach port

2024-01-29 Thread Huisong Li
This patchset fix some bugs and support attaching and detaching port
in primary and secondary.

---
 -v7: fix conflicts
 -v6: adjust rte_eth_dev_is_used position based on alphabetical order
  in version.map
 -v5: move 'ALLOCATED' state to the back of 'REMOVED' to avoid abi break.
 -v4: fix a misspelling. 
 -v3:
   #1 merge patch 1/6 and patch 2/6 into patch 1/5, and add modification
  for other bus type.
   #2 add a RTE_ETH_DEV_ALLOCATED state in rte_eth_dev_state to resolve
  the probelm in patch 2/5. 
 -v2: resend due to CI unexplained failure.

Huisong Li (5):
  drivers/bus: restore driver assignment at front of probing
  ethdev: fix skip valid port in probing callback
  app/testpmd: check the validity of the port
  app/testpmd: add attach and detach port for multiple process
  app/testpmd: stop forwarding in new or destroy event

 app/test-pmd/testpmd.c   | 47 +++-
 app/test-pmd/testpmd.h   |  1 -
 drivers/bus/auxiliary/auxiliary_common.c |  9 -
 drivers/bus/dpaa/dpaa_bus.c  |  9 -
 drivers/bus/fslmc/fslmc_bus.c|  8 +++-
 drivers/bus/ifpga/ifpga_bus.c| 12 --
 drivers/bus/pci/pci_common.c |  9 -
 drivers/bus/vdev/vdev.c  | 10 -
 drivers/bus/vmbus/vmbus_common.c |  9 -
 drivers/net/bnxt/bnxt_ethdev.c   |  3 +-
 drivers/net/bonding/bonding_testpmd.c|  1 -
 drivers/net/mlx5/mlx5.c  |  2 +-
 lib/ethdev/ethdev_driver.c   | 13 +--
 lib/ethdev/ethdev_driver.h   | 12 ++
 lib/ethdev/ethdev_pci.h  |  2 +-
 lib/ethdev/rte_class_eth.c   |  2 +-
 lib/ethdev/rte_ethdev.c  |  4 +-
 lib/ethdev/rte_ethdev.h  |  4 +-
 lib/ethdev/version.map   |  1 +
 19 files changed, 114 insertions(+), 44 deletions(-)

-- 
2.33.0



[PATCH v7 3/5] app/testpmd: check the validity of the port

2024-01-29 Thread Huisong Li
This patch checks the validity of port id for all events in
'eth_event_callback()'.

Signed-off-by: Huisong Li 
Acked-by: Aman Singh 
Acked-by: Chengwen Feng 
---
 app/test-pmd/testpmd.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9e4e99e53b..859d4aad7d 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3960,14 +3960,15 @@ eth_event_callback(portid_t port_id, enum 
rte_eth_event_type type, void *param,
fflush(stdout);
}
 
+   if (port_id_is_invalid(port_id, DISABLED_WARN))
+   return 0;
+
switch (type) {
case RTE_ETH_EVENT_NEW:
ports[port_id].need_setup = 1;
ports[port_id].port_status = RTE_PORT_HANDLING;
break;
case RTE_ETH_EVENT_INTR_RMV:
-   if (port_id_is_invalid(port_id, DISABLED_WARN))
-   break;
if (rte_eal_alarm_set(10,
rmv_port_callback, (void *)(intptr_t)port_id))
fprintf(stderr,
-- 
2.33.0



[PATCH v7 2/5] ethdev: fix skip valid port in probing callback

2024-01-29 Thread Huisong Li
The event callback in application may use the macro RTE_ETH_FOREACH_DEV to
iterate over all enabled ports to do something(like, verifying the port id
validity) when receive a probing event. If the ethdev state of a port is
not RTE_ETH_DEV_UNUSED, this port will be considered as a valid port.

However, this state is set to RTE_ETH_DEV_ATTACHED after pushing probing
event. It means that probing callback will skip this port. But this
assignment can not move to front of probing notification. See
commit be8cd210379a ("ethdev: fix port probing notification")

So this patch has to add a new state, RTE_ETH_DEV_ALLOCATED. Set the ethdev
state to RTE_ETH_DEV_ALLOCATED before pushing probing event and set it to
RTE_ETH_DEV_ATTACHED after definitely probed. And this port is valid if its
device state is 'ALLOCATED' or 'ATTACHED'.

In addition, the new state has to be placed behind 'REMOVED' to avoid ABI
break. Fortunately, this ethdev state is internal and applications can not
access it directly. So this patch encapsulates an API, rte_eth_dev_is_used,
for ethdev or PMD to call and eliminate concerns about using this state
enum value comparison.

Fixes: be8cd210379a ("ethdev: fix port probing notification")
Cc: sta...@dpdk.org

Signed-off-by: Huisong Li 
Acked-by: Chengwen Feng 
---
 drivers/net/bnxt/bnxt_ethdev.c |  3 ++-
 drivers/net/mlx5/mlx5.c|  2 +-
 lib/ethdev/ethdev_driver.c | 13 ++---
 lib/ethdev/ethdev_driver.h | 12 
 lib/ethdev/ethdev_pci.h|  2 +-
 lib/ethdev/rte_class_eth.c |  2 +-
 lib/ethdev/rte_ethdev.c|  4 ++--
 lib/ethdev/rte_ethdev.h|  4 +++-
 lib/ethdev/version.map |  1 +
 9 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index acf7e6e46e..cb4c4cfb0e 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -6109,7 +6109,8 @@ bnxt_dev_uninit(struct rte_eth_dev *eth_dev)
 
PMD_DRV_LOG(DEBUG, "Calling Device uninit\n");
 
-   if (eth_dev->state != RTE_ETH_DEV_UNUSED)
+
+   if (rte_eth_dev_is_used(eth_dev->state))
bnxt_dev_close_op(eth_dev);
 
return 0;
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 3a182de248..1f1d9b7d96 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -3258,7 +3258,7 @@ mlx5_eth_find_next(uint16_t port_id, struct rte_device 
*odev)
while (port_id < RTE_MAX_ETHPORTS) {
struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 
-   if (dev->state != RTE_ETH_DEV_UNUSED &&
+   if (rte_eth_dev_is_used(dev->state) &&
dev->device &&
(dev->device == odev ||
 (dev->device->driver &&
diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
index bd917a15fc..2b4a6e2b2c 100644
--- a/lib/ethdev/ethdev_driver.c
+++ b/lib/ethdev/ethdev_driver.c
@@ -52,8 +52,8 @@ eth_dev_find_free_port(void)
for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
/* Using shared name field to find a free port. */
if (eth_dev_shared_data->data[i].name[0] == '\0') {
-   RTE_ASSERT(rte_eth_devices[i].state ==
-  RTE_ETH_DEV_UNUSED);
+   RTE_ASSERT(!rte_eth_dev_is_used(
+   rte_eth_devices[i].state));
return i;
}
}
@@ -217,11 +217,18 @@ rte_eth_dev_probing_finish(struct rte_eth_dev *dev)
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
 
+   dev->state = RTE_ETH_DEV_ALLOCATED;
rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_NEW, NULL);
 
dev->state = RTE_ETH_DEV_ATTACHED;
 }
 
+bool rte_eth_dev_is_used(uint16_t dev_state)
+{
+   return dev_state == RTE_ETH_DEV_ALLOCATED ||
+   dev_state == RTE_ETH_DEV_ATTACHED;
+}
+
 int
 rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
 {
@@ -239,7 +246,7 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
if (ret != 0)
return ret;
 
-   if (eth_dev->state != RTE_ETH_DEV_UNUSED)
+   if (rte_eth_dev_is_used(eth_dev->state))
rte_eth_dev_callback_process(eth_dev,
RTE_ETH_EVENT_DESTROY, NULL);
 
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index f05f68a67c..238bd72ab0 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1596,6 +1596,18 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 __rte_internal
 void rte_eth_dev_probing_finish(struct rte_eth_dev *dev);
 
+/**
+ * Check if a Ethernet device state is used or not
+ *
+ * @param dev_state
+ *   The state of the Ethernet device
+ * @return
+ *   - true if the state of the Ethernet device is allocated or attached
+ *   - false 

[PATCH v7 1/5] drivers/bus: restore driver assignment at front of probing

2024-01-29 Thread Huisong Li
The driver assignment was moved back at the end of the device probing
because there is no something to use rte_driver during the phase of
probing. See commit 391797f04208 ("drivers/bus: move driver assignment
to end of probing")

However, it is necessary for probing callback to reference rte_driver
before probing. For example, probing callback may call some APIs which
access the rte_pci_driver::driver by the device::driver pointer to get
driver information. In this case, a segment fault will occur in probing
callback if there is not this assignment.

Further, some comments in code need to be updated if we do that. The
driver pointer in rte_device is set before probing and needs to be reset
if probing failed. And rte_dev_is_probed can not be called inside probing.

Fixes: 391797f04208 ("drivers/bus: move driver assignment to end of probing")
Cc: sta...@dpdk.org

Signed-off-by: Huisong Li 
Acked-by: Chengwen Feng 
---
 drivers/bus/auxiliary/auxiliary_common.c |  9 +++--
 drivers/bus/dpaa/dpaa_bus.c  |  9 +++--
 drivers/bus/fslmc/fslmc_bus.c|  8 +++-
 drivers/bus/ifpga/ifpga_bus.c| 12 +---
 drivers/bus/pci/pci_common.c |  9 +++--
 drivers/bus/vdev/vdev.c  | 10 --
 drivers/bus/vmbus/vmbus_common.c |  9 +++--
 7 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/drivers/bus/auxiliary/auxiliary_common.c 
b/drivers/bus/auxiliary/auxiliary_common.c
index 29f99342a7..6313684b8f 100644
--- a/drivers/bus/auxiliary/auxiliary_common.c
+++ b/drivers/bus/auxiliary/auxiliary_common.c
@@ -132,16 +132,21 @@ rte_auxiliary_probe_one_driver(struct 
rte_auxiliary_driver *drv,
}
 
dev->driver = drv;
+   /*
+* Reference rte_driver before probing so as to this pointer can
+* be used to get driver information in case of segment fault in
+* probing callback.
+*/
+   dev->device.driver = &drv->driver;
 
AUXILIARY_LOG(INFO, "Probe auxiliary driver: %s device: %s (NUMA node 
%i)",
  drv->driver.name, dev->name, dev->device.numa_node);
ret = drv->probe(drv, dev);
if (ret != 0) {
dev->driver = NULL;
+   dev->device.driver = NULL;
rte_intr_instance_free(dev->intr_handle);
dev->intr_handle = NULL;
-   } else {
-   dev->device.driver = &drv->driver;
}
 
return ret;
diff --git a/drivers/bus/dpaa/dpaa_bus.c b/drivers/bus/dpaa/dpaa_bus.c
index e57159f5d8..f1b817e58c 100644
--- a/drivers/bus/dpaa/dpaa_bus.c
+++ b/drivers/bus/dpaa/dpaa_bus.c
@@ -693,17 +693,22 @@ rte_dpaa_bus_probe(void)
(dev->device.devargs &&
 dev->device.devargs->policy == RTE_DEV_BLOCKED))
continue;
-
+   /*
+* Reference rte_driver before probing so as to this
+* pointer can be used to get driver information in case
+* of segment fault in probing callback.
+*/
+   dev->device.driver = &drv->driver;
if (probe_all ||
(dev->device.devargs &&
 dev->device.devargs->policy == RTE_DEV_ALLOWED)) {
ret = drv->probe(drv, dev);
if (ret) {
+   dev->device.driver = NULL;
DPAA_BUS_ERR("unable to probe:%s",
 dev->name);
} else {
dev->driver = drv;
-   dev->device.driver = &drv->driver;
}
}
break;
diff --git a/drivers/bus/fslmc/fslmc_bus.c b/drivers/bus/fslmc/fslmc_bus.c
index 57bfb5111a..4bc0c6d3d4 100644
--- a/drivers/bus/fslmc/fslmc_bus.c
+++ b/drivers/bus/fslmc/fslmc_bus.c
@@ -471,15 +471,21 @@ rte_fslmc_probe(void)
continue;
}
 
+   /*
+* Reference rte_driver before probing so as to this
+* pointer can be used to get driver information in case
+* of segment fault in probing callback.
+*/
+   dev->device.driver = &drv->driver;
if (probe_all ||
   (dev->device.devargs &&
dev->device.devargs->policy == RTE_DEV_ALLOWED)) {
ret = drv->probe(drv, dev);
if (ret) {
+   dev->device.driver = NULL;
DPAA2_BUS_ERR("Unable

[PATCH v7 5/5] app/testpmd: stop forwarding in new or destroy event

2024-01-29 Thread Huisong Li
When testpmd receives the new or destroy event, the port related
information will be updated. Testpmd must stop packet forwarding
before updating the information to avoid some serious problems.

Signed-off-by: Huisong Li 
Acked-by: Chengwen Feng 
---
 app/test-pmd/testpmd.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index db985d33a3..dfdd7e723d 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3968,6 +3968,8 @@ eth_event_callback(portid_t port_id, enum 
rte_eth_event_type type, void *param,
 
switch (type) {
case RTE_ETH_EVENT_NEW:
+   if (test_done == 0)
+   stop_packet_forwarding();
if (setup_on_probe_event)
setup_attached_port(port_id);
break;
@@ -3978,6 +3980,8 @@ eth_event_callback(portid_t port_id, enum 
rte_eth_event_type type, void *param,
"Could not set up deferred device removal\n");
break;
case RTE_ETH_EVENT_DESTROY:
+   if (test_done == 0)
+   stop_packet_forwarding();
ports[port_id].port_status = RTE_PORT_CLOSED;
printf("Port %u is closed\n", port_id);
if (rte_eal_alarm_set(10, remove_invalid_ports_callback,
-- 
2.33.0



[PATCH v7 4/5] app/testpmd: add attach and detach port for multiple process

2024-01-29 Thread Huisong Li
The port information needs to be updated due to attaching and detaching
port. Currently, it is done in the same thread as removing or probing
device, which doesn't satisfy the operation of attaching and detaching
device in multiple process.

If this operation is performed in one process, the other process can
receive 'new' or 'destroy' event. So we can move updating port information
to event callback to support attaching and detaching port in primary and
secondary process.

The reason for adding an alarm callback in 'destroy' event is that the
ethdev state is changed from 'ATTACHED' to 'UNUSED' only after the event
callback finished. But the remove_invalid_ports() function removes invalid
port only if ethdev state is 'UNUSED'. If we don't add alarm callback, this
detached port information can not be removed.

Signed-off-by: Huisong Li 
Signed-off-by: Dongdong Liu 
Acked-by: Chengwen Feng 
---
 app/test-pmd/testpmd.c| 38 ---
 app/test-pmd/testpmd.h|  1 -
 drivers/net/bonding/bonding_testpmd.c |  1 -
 3 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 859d4aad7d..db985d33a3 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3625,15 +3625,12 @@ attach_port(char *identifier)
return;
}
 
-   /* first attach mode: event */
-   if (setup_on_probe_event) {
-   /* new ports are detected on RTE_ETH_EVENT_NEW event */
-   for (pi = 0; pi < RTE_MAX_ETHPORTS; pi++)
-   if (ports[pi].port_status == RTE_PORT_HANDLING &&
-   ports[pi].need_setup != 0)
-   setup_attached_port(pi);
+   /*
+* first attach mode: event, setting up attached port is done in
+* probing callback.
+*/
+   if (setup_on_probe_event)
return;
-   }
 
/* second attach mode: iterator */
RTE_ETH_FOREACH_MATCHING_DEV(pi, identifier, &iterator) {
@@ -3664,7 +3661,6 @@ setup_attached_port(portid_t pi)
ports_ids[nb_ports++] = pi;
fwd_ports_ids[nb_fwd_ports++] = pi;
nb_cfg_ports = nb_fwd_ports;
-   ports[pi].need_setup = 0;
ports[pi].port_status = RTE_PORT_STOPPED;
 
printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
@@ -3698,10 +3694,8 @@ detach_device(struct rte_device *dev)
TESTPMD_LOG(ERR, "Failed to detach device %s\n", 
rte_dev_name(dev));
return;
}
-   remove_invalid_ports();
 
printf("Device is detached\n");
-   printf("Now total ports is %d\n", nb_ports);
printf("Done\n");
return;
 }
@@ -3768,11 +3762,9 @@ detach_devargs(char *identifier)
return;
}
 
-   remove_invalid_ports();
-
printf("Device %s is detached\n", identifier);
-   printf("Now total ports is %d\n", nb_ports);
printf("Done\n");
+
rte_devargs_reset(&da);
 }
 
@@ -3936,11 +3928,22 @@ rmv_port_callback(void *arg)
struct rte_device *device = dev_info.device;
close_port(port_id);
detach_device(device); /* might be already removed or have more 
ports */
+   remove_invalid_ports();
+   printf("Now total ports is %d\n", nb_ports);
}
if (need_to_start)
start_packet_forwarding(0);
 }
 
+static void
+remove_invalid_ports_callback(void *arg)
+{
+   RTE_SET_USED(arg);
+
+   remove_invalid_ports();
+   printf("Now total ports is %d\n", nb_ports);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -3965,8 +3968,8 @@ eth_event_callback(portid_t port_id, enum 
rte_eth_event_type type, void *param,
 
switch (type) {
case RTE_ETH_EVENT_NEW:
-   ports[port_id].need_setup = 1;
-   ports[port_id].port_status = RTE_PORT_HANDLING;
+   if (setup_on_probe_event)
+   setup_attached_port(port_id);
break;
case RTE_ETH_EVENT_INTR_RMV:
if (rte_eal_alarm_set(10,
@@ -3977,6 +3980,9 @@ eth_event_callback(portid_t port_id, enum 
rte_eth_event_type type, void *param,
case RTE_ETH_EVENT_DESTROY:
ports[port_id].port_status = RTE_PORT_CLOSED;
printf("Port %u is closed\n", port_id);
+   if (rte_eal_alarm_set(10, remove_invalid_ports_callback,
+   (void *)(intptr_t)port_id))
+   fprintf(stderr, "Could not set up deferred device 
released\n");
break;
case RTE_ETH_EVENT_RX_AVAIL_THRESH: {
uint16_t rxq_id;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9b10a9ea1c..0e860c2e67 100644
--- a/app/test-pmd/testpmd.h
+++ b/ap

[PATCH V1] net/mlx5: store IPv6 TC detection result in physical device

2024-01-29 Thread Gavin Li
Previously, discovering of IPv6 traffic class would happen on each device
not sharing context with others. However, It's not necessary to repeat it
on devices of the same physical device. A flow will be created and
destroyed in the detection, which may trigger cache allocation and take
more memory in scale cases.

To solve the problem, store the discovering of IPv6 traffic class result
in physical device, and do it only once per physical device.

Fixes: 569b8340a012 ("net/mlx5: discover IPv6 traffic class support in RDMA 
core")
Signed-off-by: Gavin Li 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/linux/mlx5_os.c | 12 +++-
 drivers/net/mlx5/mlx5.h  | 13 -
 drivers/net/mlx5/mlx5_flow_dv.c  |  2 +-
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index e47d0d0238..dd140e9934 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1602,11 +1602,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
goto error;
}
rte_rwlock_init(&priv->ind_tbls_lock);
-   if (!priv->sh->cdev->config.hca_attr.modify_outer_ipv6_traffic_class ||
-   (sh->config.dv_flow_en == 1 &&
-   !priv->sh->ipv6_tc_fallback &&
-   mlx5_flow_discover_ipv6_tc_support(eth_dev)))
-   priv->sh->ipv6_tc_fallback = 1;
+   if (sh->phdev->config.ipv6_tc_fallback == MLX5_IPV6_TC_UNKNOWN) {
+   if (!sh->cdev->config.hca_attr.modify_outer_ipv6_traffic_class 
||
+   (sh->config.dv_flow_en == 1 && 
mlx5_flow_discover_ipv6_tc_support(eth_dev)))
+   sh->phdev->config.ipv6_tc_fallback = 
MLX5_IPV6_TC_FALLBACK;
+   else
+   sh->phdev->config.ipv6_tc_fallback = MLX5_IPV6_TC_OK;
+   }
if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_MLX5_HWS_SUPPORT
if (priv->sh->config.dv_esw_en) {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 683029023e..ce9aa64a1d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1421,6 +1421,17 @@ struct mlx5_dev_registers {
 
 struct mlx5_geneve_tlv_options;
 
+enum mlx5_ipv6_tc_support {
+   MLX5_IPV6_TC_UNKNOWN = 0,
+   MLX5_IPV6_TC_FALLBACK,
+   MLX5_IPV6_TC_OK,
+};
+
+struct mlx5_common_nic_config {
+   enum mlx5_ipv6_tc_support ipv6_tc_fallback;
+   /* Whether ipv6 traffic class should use old value. */
+};
+
 /**
  * Physical device structure.
  * This device is created once per NIC to manage recourses shared by all ports
@@ -1431,6 +1442,7 @@ struct mlx5_physical_device {
struct mlx5_dev_ctx_shared *sh; /* Created on sherd context. */
uint64_t guid; /* System image guid, the uniq ID of physical device. */
struct mlx5_geneve_tlv_options *tlv_options;
+   struct mlx5_common_nic_config config;
uint32_t refcnt;
 };
 
@@ -1459,7 +1471,6 @@ struct mlx5_dev_ctx_shared {
uint32_t lag_rx_port_affinity_en:1;
/* lag_rx_port_affinity is supported. */
uint32_t hws_max_log_bulk_sz:5;
-   uint32_t ipv6_tc_fallback:1;
/* Log of minimal HWS counters created hard coded. */
uint32_t hws_max_nb_counters; /* Maximal number for HWS counters. */
uint32_t max_port; /* Maximal IB device port index. */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 6998be107f..1d2fdd3391 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1608,7 +1608,7 @@ mlx5_modify_flex_item(const struct rte_eth_dev *dev,
 static inline bool
 mlx5_dv_modify_ipv6_traffic_class_supported(struct mlx5_priv *priv)
 {
-   return !priv->sh->ipv6_tc_fallback;
+   return priv->sh->phdev->config.ipv6_tc_fallback == MLX5_IPV6_TC_OK;
 }
 
 void
-- 
2.34.1



Re: [RFC] service: extend service function call statistics

2024-01-29 Thread Mattias Rönnblom

On 2024-01-29 13:50, Van Haaren, Harry wrote:

-Original Message-
From: Mattias Rönnblom 
Sent: Saturday, January 27, 2024 7:32 PM
To: Morten Brørup ; Mattias Rönnblom
; dev@dpdk.org
Cc: Van Haaren, Harry ; Stefan Sundkvist

Subject: Re: [RFC] service: extend service function call statistics


Hi Mattias,

Thanks for the patch, looks good to me! Some responses to discussion below;

Acked-by: Harry van Haaren 


On 2024-01-26 11:07, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Friday, 26 January 2024 09.28

On 2024-01-26 00:19, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
Sent: Thursday, 25 January 2024 20.15

Add two new per-service counters.

RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service

function

invocations where no work was performed.

RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number invocations
resulting in an error.

The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
counting all invocations, regardless of return value).

The new statistics may be useful for both debugging and profiling
(e.g., calculate the average per-call processing latency for non-

idle

service calls).

Service core tests are extended to cover the new counters, and
coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.


OK to all of the above. Good stuff.



The documentation for the CYCLES attributes are updated to reflect
their actual semantics.


If this is intended behavior, then updating the documentation seems

appropriate - I would even go so far as considering it a bug fix.


However, quite a few cycles may be consumed by a service before it

can conclude that it had no work to do. Shouldn't that be considered
time spent by the service? I.e. should the code be fixed instead of the
documentation?




Generally, polling for new work in the service is cheap, in my
experience. But there's nothing in principle that prevents the
situation
your describe from occurring. You could add an "IDLE_CYCLES" counter in
case that would ever be a real-world problem.

That wouldn't be a fix, but rather just returning to the old, subtly
broken, (pre-22.11?) semantics.

Have a look at 809bd24 to see the rationale for the change. There's an
example in 4689c57.

The cause of this ambiguity is due to the fact that the platform/OS
(i.e., DPDK) doesn't know which service to "wake up" (which in turn is
the result of the platform not being able to tracking input sources,
like NIC RX queues or timer wheels) and thus must ask every service to
check if it has something to do.


OK. Makes good sense.
So definitely fix the documentation, not the code. :-)


Agreed.




Alternatively, keep the behavior (for backwards compatibility) and

fix the documentation, as this patch does, and add an IDLE_CYCLES
counter for time spent in idle calls.


PS: We're not using DPDK service cores in our applications, so I'm

not familiar with the details. We are using something somewhat similar
(but homegrown), also for profiling and power management purposes, and
my feedback is based on my experience with our own variant of service
cores.




When are you making the switch to service cores? :)


Our own "service cores" implementation has some slightly different properties,

which we are still experimenting with.


E.g. in addition to the special return value "idle (no work to do)", we also 
have a

special return value for "incomplete (more work urgently pending)" when a 
service
processed a full burst and still has more work pending its input queue.


We are also considering returning a value to inform what time it needs to be 
called

again. This concept is only an idea, and we haven't started experimenting with 
it yet.



  From a high level perspective, the service cores library is much like an 
operating

system's CPU scheduler, although based on voluntary time sharing. Many 
algorithms
and many parameters can be considered. It can also tie into power management and
prioritization of different tasks.


This was brought up as a concern when initially adding it to DPDK; the scope of 
service-cores
is quite easily going to change from "way to run a dataplane function on a 
core, to abstract
away the different environments that DPDK runs" to "userspace scheduler with 
bells-and-whistles".



Yes, and in a hypothetical DPDK without Eventdev or competing deferred 
work mechanisms that would have been a natural evolution.



The reason service-cores was built was to allow applications run with HW & SW 
eventdev PMDs,
and not have to handle the different threading requirements at the application 
level. This goal is
achieved by the current service-cores infrastructure.



It's a generic and public API. I'm not sure I think it matters much what 
service it was initially built. There's no "PMD use only" label on it, 
which is a good thing.





Service cores in their current form is more like a primitive kernel
bottom half implementation.

The primary wo