Re: [RFC 1/3] uapi: introduce kernel uAPI headers importation

2024-09-06 Thread Maxime Coquelin




On 9/6/24 08:46, Morten Brørup wrote:

From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
Sent: Friday, 6 September 2024 00.15

This patch introduces uAPI headers importation into the
DPDK repository. This import is possible thanks to Linux
Kernel licence exception for syscalls:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/LICENSES/exceptions/Linux-syscall-note

Header files have to be explicitly imported, and
libraries and drivers have to explicitly enable their
inclusion.

Guidelines are provided in the documentation, and a helper
script is also provided to ensure proper importation of the
header (unmodified content from a released Kernel version).

Next version will introduce a script to check headers are
valid.

Signed-off-by: Maxime Coquelin 
---


Excellent solution, Maxime.

Minor suggestions and typos mentioned below.

Acked-by: Morten Brørup 



Thanks Morten, I'll fix the typos below and several build failures caught by
CI in the next revision.


+print_usage()
+{
+   echo "Usage: $(basename $0) [-h] [file] [version]"
+   echo "Example of valid file is linux/vfio.h"
+   echo "Example of valid version is v6.10"


Suggest:
+   echo "Example of valid file: linux/vfio.h"
+   echo "Example of valid version: v6.10"



+Once imported, the header files should be committed without any other change,
+and the commit message MUST specify the imported version using ``uAPI ID:``
+tag and title MUST be prefixed with uapi keywork. For example::


"uAPI ID:" -> "uAPI Version"
"keywork" -> "keyword"





Re: [RFC 1/3] uapi: introduce kernel uAPI headers importation

2024-09-06 Thread David Marchand
On Fri, Sep 6, 2024 at 12:15 AM Maxime Coquelin
 wrote:
>
> This patch introduces uAPI headers importation into the
> DPDK repository. This import is possible thanks to Linux
> Kernel licence exception for syscalls:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/LICENSES/exceptions/Linux-syscall-note
>
> Header files have to be explicitly imported, and
> libraries and drivers have to explicitly enable their
> inclusion.
>
> Guidelines are provided in the documentation, and a helper
> script is also provided to ensure proper importation of the
> header (unmodified content from a released Kernel version).
>
> Next version will introduce a script to check headers are
> valid.
>
> Signed-off-by: Maxime Coquelin 
> ---
>  devtools/import-linux-uapi.sh  | 48 
>  doc/guides/contributing/index.rst  |  1 +
>  doc/guides/contributing/linux_uapi.rst | 63 ++
>  meson.build|  4 ++
>  4 files changed, 116 insertions(+)
>  create mode 100755 devtools/import-linux-uapi.sh
>  create mode 100644 doc/guides/contributing/linux_uapi.rst
>
> diff --git a/devtools/import-linux-uapi.sh b/devtools/import-linux-uapi.sh
> new file mode 100755
> index 00..efeffdd332
> --- /dev/null
> +++ b/devtools/import-linux-uapi.sh
> @@ -0,0 +1,48 @@
> +#!/bin/sh -e
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright (c) 2024 Red Hat, Inc.
> +
> +#
> +# Import Linux Kernel uAPI header file
> +#
> +
> +base_url="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/include/uapi/";
> +base_path="linux-headers/uapi/"
> +
> +print_usage()
> +{
> +   echo "Usage: $(basename $0) [-h] [file] [version]"

file and version are not optional.
So they should not be surrounded with [].


> +   echo "Example of valid file is linux/vfio.h"
> +   echo "Example of valid version is v6.10"
> +}
> +
> +while getopts hv ARG ; do
> +   case $ARG in
> +   h ) print_usage; exit 0 ;;
> +   ? ) print_usage; exit 1 ;;
> +   esac
> +done
> +shift $(($OPTIND - 1))
> +
> +if [ $# -ne 2 ]; then
> +   print_usage; exit 1;

For consistency with the rest of the script, don't use ;


> +fi
> +
> +file=$1
> +version=$2
> +
> +url="${base_url}${file}?h=${version}"
> +path="${base_path}${file}"
> +
> +# Move to the root of the DPDK tree
> +cd $(dirname $0)/..
> +
> +# Check file and version are valid
> +curl -s -o /dev/null -w "%{http_code}" $url | grep -q "200"

Can we rely on curl to report such errors?
-f is probably the right option.

@@ -37,12 +37,9 @@ path="${base_path}${file}"
 # Move to the root of the DPDK tree
 cd $(dirname $0)/..

-# Check file and version are valid
-curl -s -o /dev/null -w "%{http_code}" $url | grep -q "200"
-
 # Create path if needed
 mkdir -p $(dirname $path)

 # Download the file
-curl -s -o $path $url
+curl -s -f -o $path $url

$ ./devtools/import-linux-uapi.sh linux/vdplop.h v6.10; echo $?
22


> +
> +# Create path if needed
> +mkdir -p $(dirname $path)
> +
> +# Download the file
> +curl -s -o $path $url
> +

No need for a blank line at the end of the file.


> diff --git a/doc/guides/contributing/index.rst 
> b/doc/guides/contributing/index.rst
> index dcb9b1fbf0..603dc72654 100644
> --- a/doc/guides/contributing/index.rst
> +++ b/doc/guides/contributing/index.rst
> @@ -19,3 +19,4 @@ Contributor's Guidelines
>  vulnerability
>  stable
>  cheatsheet
> +linux_uapi
> diff --git a/doc/guides/contributing/linux_uapi.rst 
> b/doc/guides/contributing/linux_uapi.rst
> new file mode 100644
> index 00..3bfd05eb62
> --- /dev/null
> +++ b/doc/guides/contributing/linux_uapi.rst
> @@ -0,0 +1,63 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> +   Copyright(c) 2024 Red Hat, Inc.
> +
> +Linux uAPI header files
> +=======================
> +
> +

Single empty line.


> +Rationale
> +---------
> +
> +The system a DPDK library or driver is built on is not necessarily running 
> the
> +same Kernel version than the system that will run it. Importing Linux Kernel

Please start sentences on a new line.
It won't affect the generated documentation, and it slightly improves
readability and reduces churn when another sentence is updated.


> +uAPI headers enable to build features that are not supported yet by the build
> +system.
> +
> +For example, the build system runs upstream Kernel v5.19 and we would like to
> +build a VDUSE application that will use VDUSE_IOTLB_GET_INFO ioctl() 
> introduced
> +in Linux Kernel v6.0.
> +
> +`Linux Kernel licence exception regarding syscalls
> +`_
> +enable importing unmodified Linux Kernel uAPI header files.
> +
> +Importing or updating an uAPI header file
> +-----------------------------------------
> +
> +In order to ensure the imported uAPI headers are both unmodified and from a
> +released version of the linu

Re: [PATCH 2/2] vhost: add reconnection support to VDUSE

2024-09-06 Thread Chenbo Xia
Hi Maxime,

> On Sep 5, 2024, at 22:26, Maxime Coquelin  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> This patch enables VDUSE reconnection support making use of
> the newly introduced reconnection mechanism in Vhost
> library.
> 
> At DPDK VDUSE device creation time, there are two
> possibilities:
> 1. The Kernel VDUSE device does not exist:
>  a. A reconnection file named after the VDUSE device name
> is created in VDUSE tmpfs.
>  b. The file is truncated to 'struct vhost_reconnect_data'
> size, and mmapped.
>  c. Negotiated features, Virtio status... are saved for
> sanity checks at reconnect time.
> 2. The Kernel VDUSE device already exists:
>  a. Exit with failure if no reconnect file exists for
> this device.
>  b. Open and mmap the reconnect file.
>  c. Perform sanity check to ensure features are compatible.
>  d. Restore virtqueues' available indexes at startup time.
> 
> Then at runtime, the virtqueues' available index are logged by
> the Vhost reconnection mechanism.
> 
> At DPDK VDUSE device destruction time, there are two
> possibilities:
> 1. The Kernel VDUSE device destruction succeeds, which
>means it is no longer attached to the vDPA bus. The
>reconnection file is unmapped and then removed.
> 2. The Kernel VDUSE device destruction fails, meaning it
>is still attached to the vDPA bus. The reconnection
>file is unmapped but not removed to make possible later
>reconnection.
> 
> Signed-off-by: Maxime Coquelin 
> ---
> lib/vhost/vduse.c | 280 +++---
> 1 file changed, 241 insertions(+), 39 deletions(-)
> 
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index c66602905c..bd0e492d62 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -136,7 +136,7 @@ vduse_control_queue_event(int fd, void *arg, int *remove 
> __rte_unused)
> }
> 
> static void
> -vduse_vring_setup(struct virtio_net *dev, unsigned int index)
> +vduse_vring_setup(struct virtio_net *dev, unsigned int index, bool reconnect)
> {
>struct vhost_virtqueue *vq = dev->virtqueue[index];
>struct vhost_vring_addr *ra = &vq->ring_addrs;
> @@ -152,6 +152,19 @@ vduse_vring_setup(struct virtio_net *dev, unsigned int 
> index)
>return;
>}
> 
> +   if (reconnect) {
> +   vq->last_avail_idx = vq->reconnect_log->last_avail_idx;
> +   vq->last_used_idx = vq->reconnect_log->last_avail_idx;
> +   } else {
> +   vq->last_avail_idx = vq_info.split.avail_index;
> +   vq->last_used_idx = vq_info.split.avail_index;
> +   }
> +   vq->size = vq_info.num;
> +   vq->ready = true;
> +   vq->enabled = vq_info.ready;
> +   ra->desc_user_addr = vq_info.desc_addr;
> +   ra->avail_user_addr = vq_info.driver_addr;
> +   ra->used_user_addr = vq_info.device_addr;
>VHOST_CONFIG_LOG(dev->ifname, INFO, "VQ %u info:", index);
>VHOST_CONFIG_LOG(dev->ifname, INFO, "\tnum: %u", vq_info.num);
>VHOST_CONFIG_LOG(dev->ifname, INFO, "\tdesc_addr: %llx",
> @@ -162,15 +175,6 @@ vduse_vring_setup(struct virtio_net *dev, unsigned int 
> index)
>(unsigned long long)vq_info.device_addr);
>VHOST_CONFIG_LOG(dev->ifname, INFO, "\tavail_idx: %u", 
> vq_info.split.avail_index);
>VHOST_CONFIG_LOG(dev->ifname, INFO, "\tready: %u", vq_info.ready);
> -
> -   vq->last_avail_idx = vq_info.split.avail_index;
> -   vq->size = vq_info.num;
> -   vq->ready = true;
> -   vq->enabled = vq_info.ready;
> -   ra->desc_user_addr = vq_info.desc_addr;
> -   ra->avail_user_addr = vq_info.driver_addr;
> -   ra->used_user_addr = vq_info.device_addr;
> -
>vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
>if (vq->kickfd < 0) {
>VHOST_CONFIG_LOG(dev->ifname, ERR, "Failed to init kickfd for 
> VQ %u: %s",
> @@ -267,7 +271,7 @@ vduse_vring_cleanup(struct virtio_net *dev, unsigned int 
> index)
> }
> 
> static void
> -vduse_device_start(struct virtio_net *dev)
> +vduse_device_start(struct virtio_net *dev, bool reconnect)
> {
>unsigned int i, ret;
> 
> @@ -287,6 +291,15 @@ vduse_device_start(struct virtio_net *dev)
>return;
>}
> 
> +   if (reconnect && dev->features != dev->reconnect_log->features) {
> +   VHOST_CONFIG_LOG(dev->ifname, ERR,
> +   "Mismatch between reconnect file features 
> 0x%" PRIx64 " & device features 0x%" PRIx64,

Checkpatch reports long line

> +   dev->reconnect_log->features, dev->features);
> +   return;
> +   }
> +
> +   dev->reconnect_log->features = dev->features;
> +
>VHOST_CONFIG_LOG(dev->ifname, INFO, "Negotiated Virtio features: 0x%" 
> PRIx64,
>dev->features);
> 
> @@ -300,7 +313,7 @@ vduse_device_start(struct virtio_net *dev)
>}
> 
>for (i = 0; i < dev->n

Re: [PATCH v1 1/1] usertools: add DPDK build directory setup script

2024-09-06 Thread fengchengwen
On 2024/9/5 15:29, David Marchand wrote:
> On Wed, Sep 4, 2024 at 5:17 PM Anatoly Burakov
>  wrote:
>>
>> Currently, the only way to set up a build directory for DPDK development
>> is through running Meson directly. This has a number of drawbacks.
>>
>> For one, the default configuration is very "fat", meaning everything gets
>> enabled and built (aside from examples, which have to be enabled
>> manually), so while Meson is very good at minimizing work needed to rebuild
>> DPDK, for any change that affects a lot of components (such as editing an
>> EAL header), there's a lot of rebuilding to do.
>>
>> It is of course possible to reduce the number of components built through
>> meson options, but this mechanism isn't perfect, as the user needs to
>> remember exact spelling of all the options and components, and currently
>> it doesn't handle inter-component dependencies very well (e.g. if net/ice
>> is enabled, common/iavf is not automatically enabled, so net/ice can't be
>> built unless user also doesn't forget to specify common/iavf).
> 
> There should be an explicit error explaining why the driver is not enabled.
> Is it not the case?
> 
> 
>>
>> Enter this script. It relies on Meson's introspection capabilities as well
>> as the dependency graphs generated by our build system to display all
>> available components, and handle any dependencies for them automatically,
>> while also not forcing user to remember any command-line options and lists
>> of drivers, and instead relying on interactive TUI to display list of
>> available options. It can also produce builds that are as minimal as
>> possible (including cutting down libraries being built) by utilizing the
>> fact that our dependency graphs report which dependency is mandatory and
>> which one is optional.
>>
>> Because it is not meant to replace native Meson build configuration but
>> is rather targeted at users who are not intimately familiar wtih DPDK's
>> build system, it is run in interactive mode by default. However, it is
>> also possible to run it without interaction, in which case it will pass
>> all its parameters to Meson directly, with added benefit of dependency
>> tracking and producing minimal builds if desired.
>>
>> Signed-off-by: Anatoly Burakov 
> 
> There is no documentation.
> And it is a devtools script and not a usertools, iow, no point in
> installing this along a built dpdk.
> 
> I don't see a lot of value in such script.

+1
I just ran this script, and it provides a UI just like the Linux kernel's
"make menuconfig", but I don't think DPDK is complicated enough to need
such a menuconfig.

> In my opinion, people who really want to tune their dpdk build should
> enter the details carefully and understand the implications.
> But other than that, I have no strong objection.
> 
> 


RE: [EXTERNAL] Re: [PATCH] [RFC] cryptodev: replace LIST_END enumerators with APIs

2024-09-06 Thread Akhil Goyal
> >
> > Here's an idea...
> >
> > We can introduce a generic design pattern where we keep the _LIST_END enum
> value at the end, somehow marking it private (and not part of the API/ABI), 
> and
> move the _list_end() function inside the C file, so it uses the _LIST_END enum
> value that the library was built with. E.g. like this:
> >
> >
> > In the header file:
> >
> > enum rte_crypto_asym_op_type {
> > RTE_CRYPTO_ASYM_OP_VERIFY,
> > /**< Signature Verification operation */
> > #if RTE_BUILDING_INTERNAL
> > __RTE_CRYPTO_ASYM_OP_LIST_END /* internal */
> > #endif
> > }
> >
> > int rte_crypto_asym_op_list_end(void);
> >
> >
> > And in the associated library code file, when including rte_crypto_asym.h:
> >
> > #define RTE_BUILDING_INTERNAL
> > #include <rte_crypto_asym.h>
> >
> > int
> > rte_crypto_asym_op_list_end(void)
> > {
> > return __RTE_CRYPTO_ASYM_OP_LIST_END;
> > }
> 
> It's more generic, and also keep LIST_END in the define, we just add new enum
> before it.
> But based on my understanding of ABI compatibility, from the point view of
> application,
> this API should return old-value even with the new library, but it will 
> return new-
> value
> with new library. It could also break ABI.
> 
> So this API should force inline, just as this patch did. But it seem can't 
> work if
> move
> this API to header file and add static inline.
> 
Yes, moving it to the C file does not seem to solve the problem.
So should we go ahead with the patch as submitted, or is there another
suggestion?

Regards,
Akhil


Re: [PATCH 2/2] vhost: add reconnection support to VDUSE

2024-09-06 Thread Maxime Coquelin

Hi Chenbo,

Thanks for the review!

On 9/6/24 09:14, Chenbo Xia wrote:

Hi Maxime,


On Sep 5, 2024, at 22:26, Maxime Coquelin  wrote:

External email: Use caution opening links or attachments


This patch enables VDUSE reconnection support making use of
the newly introduced reconnection mechanism in Vhost
library.

At DPDK VDUSE device creation time, there are two
possibilities:
1. The Kernel VDUSE device does not exist:
  a. A reconnection file named after the VDUSE device name
 is created in VDUSE tmpfs.
  b. The file is truncated to 'struct vhost_reconnect_data'
 size, and mmapped.
  c. Negotiated features, Virtio status... are saved for
 sanity checks at reconnect time.
2. The Kernel VDUSE device already exists:
  a. Exit with failure if no reconnect file exists for
 this device.
  b. Open and mmap the reconnect file.
  c. Perform sanity check to ensure features are compatible.
  d. Restore virtqueues' available indexes at startup time.

Then at runtime, the virtqueues' available index are logged by
the Vhost reconnection mechanism.

At DPDK VDUSE device destruction time, there are two
possibilities:
1. The Kernel VDUSE device destruction succeeds, which
means it is no longer attached to the vDPA bus. The
reconnection file is unmapped and then removed.
2. The Kernel VDUSE device destruction fails, meaning it
is still attached to the vDPA bus. The reconnection
file is unmapped but not removed to make possible later
reconnection.
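
For illustration only, the creation-time handling of the reconnection file described above (create, truncate to the structure size and mmap when the Kernel device does not exist; open and mmap, or fail, when it does) could look roughly like the sketch below. It is not the code from this patch: struct demo_reconnect_data, demo_reconnect_map() and the file layout are hypothetical stand-ins, and the real 'struct vhost_reconnect_data' is not reproduced here.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for 'struct vhost_reconnect_data'. */
struct demo_reconnect_data {
	uint64_t features;
	uint16_t last_avail_idx[8];
};

/*
 * Map the reconnection file at 'path'.
 * kernel_dev_exists == false: create the file and size it (case 1).
 * kernel_dev_exists == true: the file must already exist (case 2).
 * Returns NULL on failure.
 */
static struct demo_reconnect_data *
demo_reconnect_map(const char *path, bool kernel_dev_exists)
{
	const size_t len = sizeof(struct demo_reconnect_data);
	struct stat st;
	void *addr;
	int fd;

	if (kernel_dev_exists)
		fd = open(path, O_RDWR);
	else
		fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return NULL;

	/* A new file is empty: grow it to the structure size. */
	if (!kernel_dev_exists && ftruncate(fd, (off_t)len) < 0)
		goto err;

	/* Refuse to map a file too short to hold the structure. */
	if (fstat(fd, &st) < 0 || (size_t)st.st_size < len)
		goto err;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping stays valid after close() */
	return addr == MAP_FAILED ? NULL : addr;
err:
	close(fd);
	return NULL;
}
```

Because the mapping is MAP_SHARED, values stored through the returned pointer end up in the file, which is what would make them available to a later reconnecting process.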

Signed-off-by: Maxime Coquelin 
---
lib/vhost/vduse.c | 280 +++---
1 file changed, 241 insertions(+), 39 deletions(-)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index c66602905c..bd0e492d62 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -136,7 +136,7 @@ vduse_control_queue_event(int fd, void *arg, int *remove 
__rte_unused)
}

static void
-vduse_vring_setup(struct virtio_net *dev, unsigned int index)
+vduse_vring_setup(struct virtio_net *dev, unsigned int index, bool reconnect)
{
struct vhost_virtqueue *vq = dev->virtqueue[index];
struct vhost_vring_addr *ra = &vq->ring_addrs;
@@ -152,6 +152,19 @@ vduse_vring_setup(struct virtio_net *dev, unsigned int 
index)
return;
}

+   if (reconnect) {
+   vq->last_avail_idx = vq->reconnect_log->last_avail_idx;
+   vq->last_used_idx = vq->reconnect_log->last_avail_idx;
+   } else {
+   vq->last_avail_idx = vq_info.split.avail_index;
+   vq->last_used_idx = vq_info.split.avail_index;
+   }
+   vq->size = vq_info.num;
+   vq->ready = true;
+   vq->enabled = vq_info.ready;
+   ra->desc_user_addr = vq_info.desc_addr;
+   ra->avail_user_addr = vq_info.driver_addr;
+   ra->used_user_addr = vq_info.device_addr;
VHOST_CONFIG_LOG(dev->ifname, INFO, "VQ %u info:", index);
VHOST_CONFIG_LOG(dev->ifname, INFO, "\tnum: %u", vq_info.num);
VHOST_CONFIG_LOG(dev->ifname, INFO, "\tdesc_addr: %llx",
@@ -162,15 +175,6 @@ vduse_vring_setup(struct virtio_net *dev, unsigned int 
index)
(unsigned long long)vq_info.device_addr);
VHOST_CONFIG_LOG(dev->ifname, INFO, "\tavail_idx: %u", 
vq_info.split.avail_index);
VHOST_CONFIG_LOG(dev->ifname, INFO, "\tready: %u", vq_info.ready);
-
-   vq->last_avail_idx = vq_info.split.avail_index;
-   vq->size = vq_info.num;
-   vq->ready = true;
-   vq->enabled = vq_info.ready;
-   ra->desc_user_addr = vq_info.desc_addr;
-   ra->avail_user_addr = vq_info.driver_addr;
-   ra->used_user_addr = vq_info.device_addr;
-
vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
if (vq->kickfd < 0) {
VHOST_CONFIG_LOG(dev->ifname, ERR, "Failed to init kickfd for VQ %u: 
%s",
@@ -267,7 +271,7 @@ vduse_vring_cleanup(struct virtio_net *dev, unsigned int 
index)
}

static void
-vduse_device_start(struct virtio_net *dev)
+vduse_device_start(struct virtio_net *dev, bool reconnect)
{
unsigned int i, ret;

@@ -287,6 +291,15 @@ vduse_device_start(struct virtio_net *dev)
return;
}

+   if (reconnect && dev->features != dev->reconnect_log->features) {
+   VHOST_CONFIG_LOG(dev->ifname, ERR,
+   "Mismatch between reconnect file features 0x%" PRIx64 " 
& device features 0x%" PRIx64,


Checkpatch reports long line


I noticed it, but preferred to keep it as is.
Better to have an overly long line than a split string when grepping for
log messages, IMHO.




+   dev->reconnect_log->features, dev->features);
+   return;
+   }
+
+   dev->reconnect_log->features = dev->features;
+
VHOST_CONFIG_LOG(dev->ifname, INFO, "Negotiated Virtio features: 0x%" 
PRIx64,
dev->features);

@@ -300,7 +313,7 @@ vduse_device_start(struct virtio_net *dev)
}

for (i = 0; i < dev->nr_vring; i++)
-   

Re: [PATCH] net/hns3: dump queue head and tail pointer info

2024-09-06 Thread Ferruh Yigit
On 9/5/2024 7:48 AM, Jie Hai wrote:
> From: Dengdui Huang 
> 
> Add dump the head and tail pointer of RxTx queue.
> -- Rx queue head and tail info:
>  qid  sw_head  sw_hold  hw_head  hw_tail
>  0    288      32       256      320
>  1    248      56       192      280
>  2    264      72       192      296
>  3    256      64       192      292
> -- Tx queue head and tail info:
>  qid  sw_head  sw_tail  hw_head  hw_tail
>  0    0        92       84       92
>  1    32       131      128      139
>  2    32       128      120      128
>  3    96       184      176      184
> 
> Signed-off-by: Dengdui Huang 
>

Applied to dpdk-next-net/main, thanks.


RE: [PATCH] [RFC] cryptodev: replace LIST_END enumerators with APIs

2024-09-06 Thread Morten Brørup
> From: fengchengwen [mailto:fengcheng...@huawei.com]
> Sent: Friday, 6 September 2024 08.33
> 
> On 2024/9/5 23:09, Morten Brørup wrote:
> >> +++ b/app/test/test_cryptodev_asym.c
> >> @@ -581,7 +581,7 @@ static inline void print_asym_capa(
> >>rte_cryptodev_asym_get_xform_string(capa->xform_type));
> >>printf("operation supported -");
> >>
> >> -  for (i = 0; i < RTE_CRYPTO_ASYM_OP_LIST_END; i++) {
> >> +  for (i = 0; i < rte_crypto_asym_op_list_end(); i++) {
> >
> >> +++ b/lib/cryptodev/rte_crypto_asym.h
> >> +static inline int
> >> +rte_crypto_asym_xform_type_list_end(void)
> >> +{
> >> +  return RTE_CRYPTO_ASYM_XFORM_SM2 + 1;
> >> +}
> >> +
> >>  /**
> >>   * Asymmetric crypto operation type variants
> >> + * Note: Update rte_crypto_asym_op_list_end for every new type added.
> >>   */
> >>  enum rte_crypto_asym_op_type {
> >>RTE_CRYPTO_ASYM_OP_ENCRYPT,
> >> @@ -135,9 +141,14 @@ enum rte_crypto_asym_op_type {
> >>/**< Signature Generation operation */
> >>RTE_CRYPTO_ASYM_OP_VERIFY,
> >>/**< Signature Verification operation */
> >> -  RTE_CRYPTO_ASYM_OP_LIST_END
> >>  };
> >>
> >> +static inline int
> >> +rte_crypto_asym_op_list_end(void)
> >> +{
> >> +  return RTE_CRYPTO_ASYM_OP_VERIFY + 1;
> >> +}
> >
> > I like the concept of replacing an "last enum value" with a "last enum
> function" for API/ABI compatibility purposes.
> 
> +1
> There are many such define in DPDK, e.g. RTE_ETH_EVENT_MAX
> 
> >
> > Here's an idea...
> >
> > We can introduce a generic design pattern where we keep the _LIST_END enum
> value at the end, somehow marking it private (and not part of the API/ABI),
> and move the _list_end() function inside the C file, so it uses the _LIST_END
> enum value that the library was built with. E.g. like this:
> >
> >
> > In the header file:
> >
> > enum rte_crypto_asym_op_type {
> > RTE_CRYPTO_ASYM_OP_VERIFY,
> > /**< Signature Verification operation */
> > #if RTE_BUILDING_INTERNAL
> > __RTE_CRYPTO_ASYM_OP_LIST_END /* internal */
> > #endif
> > }
> >
> > int rte_crypto_asym_op_list_end(void);
> >
> >
> > And in the associated library code file, when including rte_crypto_asym.h:
> >
> > #define RTE_BUILDING_INTERNAL
> > #include <rte_crypto_asym.h>
> >
> > int
> > rte_crypto_asym_op_list_end(void)
> > {
> > return __RTE_CRYPTO_ASYM_OP_LIST_END;
> > }
> 
> It's more generic, and also keep LIST_END in the define, we just add new enum
> before it.
> But based on my understanding of ABI compatibility, from the point view of
> application,
> this API should return old-value even with the new library, but it will return
> new-value
> with new library. It could also break ABI.
> 
> So this API should force inline, just as this patch did. But it seem can't
> work if move
> this API to header file and add static inline.

Maybe a combination, returning the lower of the two versions' list ends,
would work...

----------------------------------
Common header file (rte_common.h):
----------------------------------

/* Add at end of enum list in the header file. */
#define RTE_ENUM_LIST_END(name) \
    _ ## name ## _ENUM_LIST_END /**< @internal */

/* Add somewhere in header file, preferably after the enum list. */
#define rte_declare_enum_list_end(name) \
    /** @internal */ \
    int _ ## name ## _enum_list_end(void); \
    \
    static int name ## _enum_list_end(void) \
    { \
        static int cached = 0; \
        \
        if (likely(cached != 0)) \
            return cached; \
        \
        return cached = RTE_MIN( \
            RTE_ENUM_LIST_END(name), \
            _ ## name ## _enum_list_end()); \
    } \
    \
    int _ ## name ## _enum_list_end(void)

/* Add in the library/driver implementation. */
#define rte_define_enum_list_end(name) \
    int _ ## name ## _enum_list_end(void) \
    { \
        return RTE_ENUM_LIST_END(name); \
    } \
    \
    int _ ## name ## _enum_list_end(void)


--------------------
Library header file:
--------------------


enum rte_crypto_asym_op_type {
RTE_CRYPTO_ASYM_OP_VERIFY,
/**< Signature Verification operation */
RTE_ENUM_LIST_END(rte_crypto_asym_op)
}

rte_declare_enum_list_end(rte_crypto_asym_op);

---------------
Library C file:
---------------

rte_define_enum_list_end(rte_crypto_asym_op);
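
To make the intended behaviour concrete, here is a minimal macro-free sketch of the "lower of the two" idea. Everything in it is hypothetical: the demo_ names are invented, and the newer library is simulated by a plain function in the same file returning a larger value, whereas in reality that function would live in the shared library and be built against a newer header.

```c
#include <stdio.h>

/* Enum as seen by an application built against the OLD header. */
enum demo_op_type {
	DEMO_OP_ENCRYPT,
	DEMO_OP_DECRYPT,
	DEMO_OP_SIGN,
	DEMO_OP_VERIFY,
	DEMO_OP_LIST_END_APP /* value compiled into the application: 4 */
};

/*
 * Stand-in for the value compiled into a NEWER library, which has
 * gained two enumerators since the application was built.
 */
#define DEMO_OP_LIST_END_LIB 6

/* Would be an exported (non-inline) function in the library C file. */
static int
demo_lib_list_end(void)
{
	return DEMO_OP_LIST_END_LIB;
}

/* Would be the static function emitted in the public header. */
static int
demo_op_list_end(void)
{
	static int cached;
	int lib_end;

	if (cached != 0)
		return cached;

	/* Only values known to BOTH application and library are safe. */
	lib_end = demo_lib_list_end();
	cached = DEMO_OP_LIST_END_APP < lib_end ?
			DEMO_OP_LIST_END_APP : lib_end;
	return cached;
}
```

With these stand-in values, demo_op_list_end() yields the application's own end value (4), so an old application does not iterate over enumerators it was not built to understand, and an old library does not get asked about enumerators it does not know.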



Re: [PATCH] eal: increase max file descriptor for secondary process device

2024-09-06 Thread Ferruh Yigit
On 9/5/2024 5:20 PM, Stephen Hemminger wrote:
> The TAP and XDP drivers are both limited to only 8 queues
> because of the small limit imposed by EAL. Increase the limit
> now since this release allows changing ABI.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  doc/guides/rel_notes/release_24_11.rst | 5 +
>  lib/eal/include/rte_eal.h  | 2 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/rel_notes/release_24_11.rst 
> b/doc/guides/rel_notes/release_24_11.rst
> index 0ff70d9057..5af70e04c5 100644
> --- a/doc/guides/rel_notes/release_24_11.rst
> +++ b/doc/guides/rel_notes/release_24_11.rst
> @@ -100,6 +100,11 @@ ABI Changes
> Also, make sure to start the actual text at the margin.
> ===
>  
> +* The maximum number of file descriptors that can be passed to a secondary 
> process
> +  has been increased from 8 to 253 (which is the maximum possible with Unix 
> domain
> +  socket). This allows for more queues when using software devices such as 
> TAP
> +  and XDP.
> +
>  
>  Known Issues
>  
> diff --git a/lib/eal/include/rte_eal.h b/lib/eal/include/rte_eal.h
> index c2256f832e..c826e143f1 100644
> --- a/lib/eal/include/rte_eal.h
> +++ b/lib/eal/include/rte_eal.h
> @@ -155,7 +155,7 @@ int rte_eal_primary_proc_alive(const char 
> *config_file_path);
>   */
>  bool rte_mp_disable(void);
>  
> -#define RTE_MP_MAX_FD_NUM    8    /* The max amount of fds */
> +#define RTE_MP_MAX_FD_NUM    253  /* The max amount of fds (see SCM_MAX_FD) */
>  #define RTE_MP_MAX_NAME_LEN  64   /* The max length of action name */
>  #define RTE_MP_MAX_PARAM_LEN 256  /* The max length of param */
>  struct rte_mp_msg {
>

It would have been nice to add this to the deprecation notices first, but
it seems we missed that. I think it is safe to update in an ABI-breaking
release, and good to remove this restriction.

Acked-by: Ferruh Yigit 
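
For readers unfamiliar with where this limit comes from: file descriptors are passed between processes as SCM_RIGHTS ancillary data on a Unix domain socket, and the number of descriptors per message is bounded by the kernel. The sketch below shows that mechanism in isolation; it is not the EAL multi-process code. The demo_ helpers and DEMO_MAX_FDS are hypothetical names, with DEMO_MAX_FDS simply mirroring the value proposed in the patch.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define DEMO_MAX_FDS 253 /* mirrors the proposed RTE_MP_MAX_FD_NUM */

/* Control buffer large enough for DEMO_MAX_FDS descriptors. */
union demo_ctrl {
	char buf[CMSG_SPACE(sizeof(int) * DEMO_MAX_FDS)];
	struct cmsghdr align; /* forces suitable alignment */
};

/* Send nfds descriptors in one message. Returns 0 on success. */
static int
demo_send_fds(int sock, const int *fds, int nfds)
{
	char payload = 'F'; /* at least one byte of real data is sent */
	struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
	union demo_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	if (nfds <= 0 || nfds > DEMO_MAX_FDS)
		return -1;

	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = CMSG_SPACE(sizeof(int) * nfds);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int) * nfds);
	memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * nfds);

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive descriptors. Returns how many were stored in fds, or -1. */
static int
demo_recv_fds(int sock, int *fds, int max_fds)
{
	char payload;
	struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
	union demo_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int nfds;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
			cmsg->cmsg_type != SCM_RIGHTS)
		return -1;

	nfds = (int)((cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int));
	if (nfds > max_fds) {
		/* Caller's array is too small: close what was received. */
		for (int i = 0; i < nfds; i++) {
			int fd;

			memcpy(&fd, CMSG_DATA(cmsg) + i * sizeof(int), sizeof(fd));
			close(fd);
		}
		return -1;
	}
	memcpy(fds, CMSG_DATA(cmsg), sizeof(int) * nfds);
	return nfds;
}
```

The received descriptors are new numbers in the receiving process that refer to the same open files, which is how a secondary process would obtain one descriptor per queue from the primary.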




Re: [PATCH v2] doc: add new driver guidelines

2024-09-06 Thread fengchengwen
On 2024/8/15 3:08, Stephen Hemminger wrote:
> From: Nandini Persad 
> 
> This document was created to assist contributors in creating DPDK drivers
> and provides suggestions and guidelines on how to upstream effectively.
> 
> Co-authored-by: Ferruh Yigit 
> Co-authored-by: Thomas Monjalon 
> Signed-off-by: Nandini Persad 
> Reviewed-by: Stephen Hemminger 
> ---
> 
> v2 - review feedback
>- add co-author and reviewed-by
> 
>  doc/guides/contributing/index.rst  |   1 +
>  doc/guides/contributing/new_driver.rst | 202 +
>  2 files changed, 203 insertions(+)
>  create mode 100644 doc/guides/contributing/new_driver.rst
> 

...

> +
> +Finalizing
> +----------
> +
> +Once the driver has been upstreamed, the author has
> +a responsibility to the community to maintain it.
> +
> +This includes the public test report. Authors must send a public
> +test report after the first upstreaming of the PMD. The same
> +public test procedure may be reproduced regularly per release.
> +
> +After the PMD is upstreamed, the author should send a patch
> +to update the website with the name of the new PMD and supported devices
> +via the DPDK mailing list..

.. -> .

> +
> +For more information about the role of maintainers, see :doc:`patches`.
> +
> +
> +
> +Splitting into Patches
> +----------------------
> +

...

> +
> +
> +The following order in the patch series is as suggested below.
> +
> +The first patch should have the driver's skeleton which should include:
> +
> +* Maintainer's file update
> +* Driver documentation
> +* Document must have links to official product documentation web page
> +* The  new document should be added into the index (`doc/guides/index.rst`)

The  new -> The new

...

> +
> +Additional Suggestions
> +----------------------
> +
> +* We recommend using DPDK macros instead of inventing new ones in the PMD.
> +* Do not include unused headers. Use the ./devtools/process-iwyu.py tool.
> +* Do not disable compiler warnings in the build file.
> +* Do not use #ifdef with driver-defined macros, instead prefer runtime 
> configuration.
> +* Document device parameters in the driver guide.
> +* Make device operations struct 'const'.
> +* Use dynamic logging.
> +* Do not use DPDK version checks in the upstream code.

Could you explain it (DPDK version check) ?

> +* Be sure to have SPDX license tags and copyright notice on each side.
> +  Use ./devtools/check-spdx-tag.sh
> +* Run the Coccinelle scripts ./devtools/cocci.sh which check for common 
> cleanups such as
> +  useless null checks before calling free routines.
> +
> +Dependencies
> +------------
> +
> +At times, drivers may have dependencies to external software.
> +For driver dependencies, same DPDK rules for dependencies applies.
> +Dependencies should be publicly and freely available,
> +or this is a blocker for upstreaming the driver.

Could you explain it (what's the blocker) ?

> +
> +
> +.. _tool_list:
> +
> +Test Tools
> +----------
> +
> +Build and check the driver's documentation. Make sure there are no
> +warnings and driver shows up in the relevant index page.
> +
> +Be sure to run the following test tools per patch in a patch series:
> +
> +* checkpatches.sh
> +* check-git-log.sh
> +* check-meson.py
> +* check-doc-vs-code.sh
> 

Some drivers already provide private APIs. I think we should add a note
for new drivers: "do not add private APIs; prefer extending the
corresponding framework API".


RE: [PATCH] net/af_packet: add timestamp offloading support

2024-09-06 Thread Morten Brørup
> From: Stefan Lässer [mailto:stefan.laes...@omicronenergy.com]
> Sent: Friday, 6 September 2024 08.23
> 
> > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > Sent: Tuesday, 3 September 2024 18.22
> > >
> > > On Tue, 3 Sep 2024 13:43:06 +0200
> > > Stefan Laesser  wrote:
> > >
> > > > Add the packet timestamp from TPACKET_V2 to the mbuf dynamic rx
> > > > timestamp register if offload RTE_ETH_RX_OFFLOAD_TIMESTAMP is
> > > > enabled.
> > > >
> > > > TPACKET_V2 provides the timestamp with nanosecond resolution.
> > > >
> > > > Signed-off-by: Stefan Laesser 
> > > > ---
> > > >  .mailmap  |  1 +
> > > >  doc/guides/nics/af_packet.rst |  8 --
> > > >  drivers/net/af_packet/rte_eth_af_packet.c | 34
> > > > +-
> > > -
> > > >  3 files changed, 38 insertions(+), 5 deletions(-)
> > >
> > > Adding timestamp is good, but it would be better if the timestamp
> > > field was generic. The pcap PMD also has a timestamp, and pdump API
> > > could/should use timestamp as well.
> >
> > As far as I can see, this patch does use the existing cross-driver/generic
> > timestamp dynamic field, like the pcap driver.
> 
> Yes, I use the generic timestamp dynamic field as used in all the other PMDs I
> have looked at.
> 
> >
> > >
> > > What makes sense is for there to be a standard dynamic field for
> > > nanosecond resolution timestamp, and add a make sure that all drivers
> > > use the same base  1/1/1970 same as Linux/Unix.
> >
> > Yes, standardizing on nanosecond resolution and a common base might have
> > been a better choice than using driver-specific units for the generic
> > timestamp dynamic field.
> > > If the driver can use the NIC's native clock system, the driver doesn't need
> > > to convert to nanoseconds, which has a performance cost.
> > > However, I suppose any application using timestamps needs to do this
> > > conversion in the application instead, so the total performance is the same
> > > as if the drivers did it. I.e. from a performance perspective, the drivers
> > > might as well do the conversion, and from a usability perspective, it would
> > > be easier with a standard unit and base.
> >
> > We should define a roadmap towards dynamic mbuf field timestamps using
> > fixed unit and base (instead of driver-specific) and migrate towards it.
> >
> > Perhaps start by adding an ethdev capability flag,
> > RTE_ETH_RX_OFFLOAD_TIMESTAMP_NS used together with
> > RTE_ETH_RX_OFFLOAD_TIMESTAMP to indicate that the timestamp unit and
> > base follows a common standard, i.e. nanoseconds since UNIX epoch.
> >
> > There may be other considerations, though: The NIC's clock may drift
> > compared to the CPU's clock, and compared to the clock of other NICs in the
> > same system. So the "base" and "nanoseconds" will still be using the NIC's
> > clock as reference, and it might be way out of sync with the CPU's clock.
> >
> > > Also, having
> > > standard helpers in ethdev for the conversion from TSC to NS would
> > > help.
> >
> > Helpers to convert from CPU TSC to nanoseconds have broader scope than
> > ethdev and belong in the EAL, perhaps in
> > /lib/eal/include/generic/rte_cycles.h?
> 
> Should I extend my patch to include the new RTE_ETH_RX_OFFLOAD_TIMESTAMP_NS
> capability?

That would be nice, but not a requirement. :-)

Please do it as a series of patches, maybe three:
1. This patch.
2. A patch to generally introduce TIMESTAMP_NS RX offload and capability flags.
3. A patch to implement TIMESTAMP_NS in af_packet.

The new TIMESTAMP_NS feature might trigger some discussions, and you don't want 
this patch caught up too much in that discussion.

> What happens if the user only enables RTE_ETH_RX_OFFLOAD_TIMESTAMP in the
> AF_PACKET PMD?
> I would suggest that in this case the timestamp will have microsecond accuracy
> and only if RTE_ETH_RX_OFFLOAD_TIMESTAMP_NS is also enabled, then the
> timestamp will have nanosecond accuracy.

There's no need for different timestamp accuracy if TIMESTAMP_NS is not enabled.
RTE_ETH_RX_OFFLOAD_TIMESTAMP means that a timestamp is present, with driver 
dependent clock and base.
The driver is allowed to use nanoseconds as unit and the UNIX epoch as base, 
regardless.
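
A minimal sketch of what "nanoseconds since UNIX epoch" could look like in
practice: the helper below folds a seconds/nanoseconds pair, such as the
tp_sec and tp_nsec fields delivered by TPACKET_V2, into one 64-bit value.
The function and macro names are hypothetical, not part of the patch or of
DPDK.

```c
#include <stdint.h>

/* Hypothetical helper for illustration; not part of the patch or of DPDK. */
#define EXAMPLE_NS_PER_SEC 1000000000ULL

static inline uint64_t
example_ts_to_ns(uint32_t sec, uint32_t nsec)
{
	/* Widen before multiplying so the product cannot overflow 32 bits. */
	return (uint64_t)sec * EXAMPLE_NS_PER_SEC + nsec;
}
```

A driver that already stores such a value in the generic timestamp dynamic
field would need no further conversion on the application side.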




Re: [PATCH v2] doc: add new driver guidelines

2024-09-06 Thread Ferruh Yigit
On 9/6/2024 9:05 AM, fengchengwen wrote:
> On 2024/8/15 3:08, Stephen Hemminger wrote:
>> From: Nandini Persad 
>>
>> This document was created to assist contributors in creating DPDK drivers
>> and provides suggestions and guidelines on how to upstream effectively.
>>
>> Co-authored-by: Ferruh Yigit 
>> Co-authored-by: Thomas Monjalon 
>> Signed-off-by: Nandini Persad 
>> Reviewed-by: Stephen Hemminger 
>> ---
>>
>> v2 - review feedback
>>- add co-author and reviewed-by
>>
>>  doc/guides/contributing/index.rst  |   1 +
>>  doc/guides/contributing/new_driver.rst | 202 +
>>  2 files changed, 203 insertions(+)
>>  create mode 100644 doc/guides/contributing/new_driver.rst
>>
> 
> ...
> 
>> +
>> +Finalizing
>> +----------
>> +
>> +Once the driver has been upstreamed, the author has
>> +a responsibility to the community to maintain it.
>> +
>> +This includes the public test report. Authors must send a public
>> +test report after the first upstreaming of the PMD. The same
>> +public test procedure may be reproduced regularly per release.
>> +
>> +After the PMD is upstreamed, the author should send a patch
>> +to update the website with the name of the new PMD and supported devices
>> +via the DPDK mailing list..
> 
> .. -> .
> 
>> +
>> +For more information about the role of maintainers, see :doc:`patches`.
>> +
>> +
>> +
>> +Splitting into Patches
>> +----------------------
>> +
> 
> ...
> 
>> +
>> +
>> +The following order in the patch series is as suggested below.
>> +
>> +The first patch should have the driver's skeleton which should include:
>> +
>> +* Maintainer's file update
>> +* Driver documentation
>> +* Document must have links to official product documentation web page
>> +* The  new document should be added into the index (`doc/guides/index.rst`)
> 
> The  new -> The new
> 
> ...
> 
>> +
>> +Additional Suggestions
>> +----------------------
>> +
>> +* We recommend using DPDK macros instead of inventing new ones in the PMD.
>> +* Do not include unused headers. Use the ./devtools/process-iwyu.py tool.
>> +* Do not disable compiler warnings in the build file.
>> +* Do not use #ifdef with driver-defined macros, instead prefer runtime configuration.
>> +* Document device parameters in the driver guide.
>> +* Make device operations struct 'const'.
>> +* Use dynamic logging.
>> +* Do not use DPDK version checks in the upstream code.
> 
> Could you explain it (DPDK version check) ?
> 

It refers to usage of the 'RTE_VERSION_NUM' macro. This may be required for
out-of-tree drivers, as they may support multiple DPDK versions.

I am not sure about adding too much detail; what about the following update:
`* Do not use DPDK version checks (via RTE_VERSION_NUM) in the upstream
code.`


>> +* Be sure to have SPDX license tags and copyright notice on each side.
>> +  Use ./devtools/check-spdx-tag.sh
>> +* Run the Coccinelle scripts ./devtools/cocci.sh which check for common cleanups such as
>> +  useless null checks before calling free routines.
>> +
>> +Dependencies
>> +
>> +
>> +At times, drivers may have dependencies to external software.
>> +For driver dependencies, same DPDK rules for dependencies applies.
>> +Dependencies should be publicly and freely available,
>> +or this is a blocker for upstreaming the driver.
> 
> Could you explain it (what's the blocker) ?
> 

It is trying to say that this prevents upstreaming. The wording can be
updated to clarify; what about the following:

`Dependencies should be publicly and freely available to be able to
upstream the driver.`


>> +
>> +
>> +.. _tool_list:
>> +
>> +Test Tools
>> +----------
>> +
>> +Build and check the driver's documentation. Make sure there are no
>> +warnings and driver shows up in the relevant index page.
>> +
>> +Be sure to run the following test tools per patch in a patch series:
>> +
>> +* checkpatches.sh
>> +* check-git-log.sh
>> +* check-meson.py
>> +* check-doc-vs-code.sh
>>
> 
> Some drivers already provide private APIs, I think we should add note
> for "not add private APIs, prefer to extend the corresponding framework API" 
> for new drivers.
>

Ack.
What about adding this to "Additional Suggestions", like the following:
`Do not introduce public APIs directly from the driver.`



RE: [PATCH v1 1/1] usertools: add DPDK build directory setup script

2024-09-06 Thread Morten Brørup
> From: fengchengwen [mailto:fengcheng...@huawei.com]
> Sent: Friday, 6 September 2024 09.41
> 
> On 2024/9/5 15:29, David Marchand wrote:
> > On Wed, Sep 4, 2024 at 5:17 PM Anatoly Burakov
> >  wrote:
> >>
> >> Enter this script. It relies on Meson's introspection capabilities as well
> >> as the dependency graphs generated by our build system to display all
> >> available components, and handle any dependencies for them automatically,
> >> while also not forcing user to remember any command-line options and lists
> >> of drivers, and instead relying on interactive TUI to display list of
> >> available options. It can also produce builds that are as minimal as
> >> possible (including cutting down libraries being built) by utilizing the
> >> fact that our dependency graphs report which dependency is mandatory and
> >> which one is optional.
> >>
> >> Because it is not meant to replace native Meson build configuration but
> >> is rather targeted at users who are not intimately familiar with DPDK's
> >> build system, it is run in interactive mode by default. However, it is
> >> also possible to run it without interaction, in which case it will pass
> >> all its parameters to Meson directly, with added benefit of dependency
> >> tracking and producing minimal builds if desired.
> >>
> >> Signed-off-by: Anatoly Burakov 
> >
> > There is no documentation.

+1

> > And it is a devtools script and not a usertools, iow, no point in
> > installing this along a built dpdk.

+1

> >
> > I don't see a lot of value in such script.
> 
> +1
> I just ran this script, and it provides a UI just like the Linux kernel's
> "make menuconfig",
> but I think DPDK is not complicated enough to need such a menuconfig.
> 
> > In my opinion, people who really want to tune their dpdk build should
> > enter the details carefully and understand the implications.
> > But other than that, I have no strong objection.

I think this script is a good step on the roadmap towards making DPDK build 
time configuration more developer friendly.

The idea of making DPDK 100 % runtime configurable and 0 % build time 
configurable has failed.

DPDK should be buildable by distros with a lot of features and drivers enabled, 
and projects using it for special use cases should have the ability to build a 
purpose-specific variant. Just like the kernel.



Re: [RFC 1/3] uapi: introduce kernel uAPI headers importation

2024-09-06 Thread Maxime Coquelin




On 9/6/24 09:13, David Marchand wrote:

On Fri, Sep 6, 2024 at 12:15 AM Maxime Coquelin
 wrote:


This patch introduces uAPI headers importation into the
DPDK repository. This import is possible thanks to Linux
Kernel licence exception for syscalls:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/LICENSES/exceptions/Linux-syscall-note

Header files have to be explicitly imported, and
libraries and drivers have to explicitly enable their
inclusion.

Guidelines are provided in the documentation, and a helper
script is also provided to ensure proper importation of the
header (unmodified content from a released Kernel version).

Next version will introduce a script to check headers are
valid.

Signed-off-by: Maxime Coquelin 
---
  devtools/import-linux-uapi.sh  | 48 
  doc/guides/contributing/index.rst  |  1 +
  doc/guides/contributing/linux_uapi.rst | 63 ++
  meson.build|  4 ++
  4 files changed, 116 insertions(+)
  create mode 100755 devtools/import-linux-uapi.sh
  create mode 100644 doc/guides/contributing/linux_uapi.rst

diff --git a/devtools/import-linux-uapi.sh b/devtools/import-linux-uapi.sh
new file mode 100755
index 00..efeffdd332
--- /dev/null
+++ b/devtools/import-linux-uapi.sh
@@ -0,0 +1,48 @@
+#!/bin/sh -e
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2024 Red Hat, Inc.
+
+#
+# Import Linux Kernel uAPI header file
+#
+
+base_url="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/include/uapi/";
+base_path="linux-headers/uapi/"
+
+print_usage()
+{
+   echo "Usage: $(basename $0) [-h] [file] [version]"


file and version are not optional.
So they should not be surrounded with [].


Ok





+   echo "Example of valid file is linux/vfio.h"
+   echo "Example of valid version is v6.10"
+}
+
+while getopts hv ARG ; do
+   case $ARG in
+   h ) print_usage; exit 0 ;;
+   ? ) print_usage; exit 1 ;;
+   esac
+done
+shift $(($OPTIND - 1))
+
+if [ $# -ne 2 ]; then
+   print_usage; exit 1;


For consistency with the rest of the script, don't use ;


Ok




+fi
+
+file=$1
+version=$2
+
+url="${base_url}${file}?h=${version}"
+path="${base_path}${file}"
+
+# Move to the root of the DPDK tree
+cd $(dirname $0)/..
+
+# Check file and version are valid
+curl -s -o /dev/null -w "%{http_code}" $url | grep -q "200"


Can we rely on curl to report such errors?
-f is probably the right option.

@@ -37,12 +37,9 @@ path="${base_path}${file}"
  # Move to the root of the DPDK tree
  cd $(dirname $0)/..

-# Check file and version are valid
-curl -s -o /dev/null -w "%{http_code}" $url | grep -q "200"
-
  # Create path if needed
  mkdir -p $(dirname $path)

  # Download the file
-curl -s -o $path $url
+curl -s -f -o $path $url

$ ./devtools/import-linux-uapi.sh linux/vdplop.h v6.10; echo $?
22


OK, what about this to get rid of the mkdir?

diff --git a/devtools/import-linux-uapi.sh b/devtools/import-linux-uapi.sh
index efeffdd332..3769da80bb 100755
--- a/devtools/import-linux-uapi.sh
+++ b/devtools/import-linux-uapi.sh
@@ -37,12 +37,6 @@ path="${base_path}${file}"
 # Move to the root of the DPDK tree
 cd $(dirname $0)/..

-# Check file and version are valid
-curl -s -o /dev/null -w "%{http_code}" $url | grep -q "200"
-
-# Create path if needed
-mkdir -p $(dirname $path)
-
 # Download the file
-curl -s -o $path $url
+curl -s -f --create-dirs -o $path $url


The only downside of both your version and this one, versus the
initial one, is that the directory gets created even if curl fails.


We can, though, combine the best of both worlds:

$ git diff
diff --git a/devtools/import-linux-uapi.sh b/devtools/import-linux-uapi.sh
index efeffdd332..857d3dd33b 100755
--- a/devtools/import-linux-uapi.sh
+++ b/devtools/import-linux-uapi.sh
@@ -34,15 +34,9 @@ version=$2
 url="${base_url}${file}?h=${version}"
 path="${base_path}${file}"

-# Move to the root of the DPDK tree
-cd $(dirname $0)/..
-
-# Check file and version are valid
-curl -s -o /dev/null -w "%{http_code}" $url | grep -q "200"
-
-# Create path if needed
-mkdir -p $(dirname $path)
+# Check URL is valid
+curl -s -f -o /dev/null $url

 # Download the file
-curl -s -o $path $url
+curl -s -f --create-dirs -o $path $url

$ ./devtools/import-linux-uapi.sh linux/vduse.h v6.10
$ ./devtools/import-linux-uapi.sh linuxxx/vduse.h v6.10
$ find linux-headers/
linux-headers/
linux-headers/uapi
linux-headers/uapi/.gitignore
linux-headers/uapi/linux
linux-headers/uapi/linux/vduse.h


What do you prefer?





+
+# Create path if needed
+mkdir -p $(dirname $path)
+
+# Download the file
+curl -s -o $path $url
+


No need for a blank line at the end of the file.


Ack




diff --git a/doc/guides/contributing/index.rst 
b/doc/guides/contributing/index.rst
index dcb9b1fbf0..603dc72654 100644
--- a/doc/guides/contributing/index.rst
+++ b/doc/guides/contributing/index.rst
@@ -19,3 +19,4 @@ Contributor

Re: [RFC 0/2] introduce LLC aware functions

2024-09-06 Thread Burakov, Anatoly




Yes, this does help clarify things a lot as to why current NUMA support
would be insufficient to express what you are describing.

However, in that case I would echo sentiment others have expressed
already as this kind of deep sysfs parsing doesn't seem like it would be
in scope for EAL, it sounds more like something a sysadmin/orchestration
(or the application itself) would do.

I mean, in principle I'm not opposed to having such an API, it just
seems like the abstraction would perhaps need to be a bit more robust
than directly referencing cache structure? Maybe something that
degenerates into NUMA nodes would be better, so that applications
wouldn't have to *specifically* worry about cache locality but instead
have a more generic API they can use to group cores together?



Unfortunately, sysadmin/orchestration can't cover all use cases (such as
the graph use case above), and this is definitely too much HW detail for
the application; that is why we require some programmatic way (APIs) for
applications.

And we are on the same page that the more we can get away from
architecture details in the abstraction (APIs), the better. The overall
intention is to provide ways for the application to find lcores that work
efficiently with each other.

For this, what do you think about a slightly different API *, like:
```
rte_get_next_lcore_ex(uint i, u32 flag)
```

Based on the flag, we can grab the next eligible lcore, for this patch
the flag can be `RTE_LCORE_LLC`, but options are wide and different
architectures can have different grouping to benefit most from HW in a
vendor agnostic way.
I like the idea, what do you think about this abstraction?

* Kudos to Vipin 😉



Hi Ferruh,

In principle, having flags for this sort of thing sounds like a better 
way to go. I do like this idea as well! It of course remains to be seen 
how it can work in practice but to me it certainly looks like a path 
worth exploring.
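
Purely to make the idea concrete, here is a rough sketch of how such a
flag-based iterator might behave. Everything in it is an assumption for
illustration: the EXAMPLE_ flag names, the toy topology table, and the
choice that the LLC flag means "shares a last-level cache with lcore i".
None of this is an existing DPDK API.

```c
#include <stdint.h>

/* Everything here is hypothetical: the flags, the table and the function
 * only illustrate the idea discussed in this thread. */
#define EXAMPLE_LCORE_ANY 0u
#define EXAMPLE_LCORE_LLC 1u
#define EXAMPLE_MAX_LCORE 8

/* Toy topology: lcores 0-3 share one last-level cache, 4-7 another. */
static const int example_llc_of[EXAMPLE_MAX_LCORE] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Return the next lcore after i that is eligible under flag, or
 * EXAMPLE_MAX_LCORE when there is none. With EXAMPLE_LCORE_LLC, only
 * lcores sharing the LLC of lcore i qualify. */
static unsigned int
example_get_next_lcore_ex(unsigned int i, uint32_t flag)
{
	unsigned int n;

	if (i >= EXAMPLE_MAX_LCORE)
		return EXAMPLE_MAX_LCORE;
	for (n = i + 1; n < EXAMPLE_MAX_LCORE; n++) {
		if (flag == EXAMPLE_LCORE_ANY)
			return n;
		if (flag == EXAMPLE_LCORE_LLC &&
		    example_llc_of[n] == example_llc_of[i])
			return n;
	}
	return EXAMPLE_MAX_LCORE;
}
```

The point of the flag is that the caller never looks at the topology table
itself; a different architecture could map the same flag, or a new one, to
whatever grouping suits its hardware.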


--
Thanks,
Anatoly



Re: [PATCH 0/3] eal: mark API's as stable

2024-09-06 Thread Ferruh Yigit
On 9/5/2024 3:01 PM, Jerin Jacob wrote:
> On Thu, Sep 5, 2024 at 3:14 PM Morten Brørup  
> wrote:
>>
>>> From: David Marchand [mailto:david.march...@redhat.com]
>>> Sent: Thursday, 5 September 2024 11.03
>>>
>>> On Thu, Sep 5, 2024 at 10:55 AM Morten Brørup 
>>> wrote:

> From: David Marchand [mailto:david.march...@redhat.com]
> Sent: Thursday, 5 September 2024 09.59
>
> On Wed, Sep 4, 2024 at 8:10 PM Stephen Hemminger
>  wrote:
>>
>> The API's in ethtool from before 23.11 should be marked stable.
>
> EAL* ?
>
>> Should probably include the trace api's but that is more complex change.
>
> On the trace API itself it should be ok.

 No!
>>>
>>> *sigh*
>>>

 Trace must remain experimental until controlled by a meson option, e.g.
>>> "enable_trace", whereby trace can be completely disabled and omitted from 
>>> the
>>> compiled application/libraries/drivers at build time.
>>>
>>> This seems unrelated to marking the API stable as regardless of the
>>> API state at the moment, this code is always present.
>>
>> I cannot foresee if disabling trace at build time will require changes to 
>> the trace API. So I'm being cautious here.
>>
>> However, if Jerin (as author of the trace subsystem) foresees that it will 
>> be possible to disable trace at build time without affecting the trace API, 
>> I don't object to marking the trace API (or some of it) stable.
> 
> I don't foresee any ABI changes when adding support for disabling trace
> at compile time. However, I don't understand why we need to do
> that. In the sense, fast path functions already have an option
> to be compiled out.
> Slow path functions can be disabled at runtime at the cost of 1 cycle
> as instrumentation cost. Having said that, I don't have any concern
> about disabling trace as an option.
> 

I agree with Jerin, I don't see the motivation to disable slow path traces
when they can be disabled at runtime.
And fast path traces already have a compile flag to disable them.

Build time configurations have problems in the long term too, so I am for not
using them unless we have to.

> 
>>
>> Before doing that, rte_trace_mode_get/set() and the accompanying enum 
>> rte_trace_mode should be changed to rte_trace_config_get/set() using a new 
>> struct rte_trace_config (containing the enum rte_trace_mode, and expandable 
>> with new fields as the need arises). This will prepare for e.g. tracing to 
>> other destinations than system memory, such as a remote trace collector on 
>> the network, like SYSLOG.
>>
>>> Patches welcome if you want it stripped.
>>
>> Don't have time myself, so I suggested it as a code challenge instead. :-)
>>



Re: [PATCH v1 1/1] usertools: add DPDK build directory setup script

2024-09-06 Thread Burakov, Anatoly

On 9/6/2024 10:28 AM, Morten Brørup wrote:

From: fengchengwen [mailto:fengcheng...@huawei.com]
Sent: Friday, 6 September 2024 09.41

On 2024/9/5 15:29, David Marchand wrote:

On Wed, Sep 4, 2024 at 5:17 PM Anatoly Burakov
 wrote:


Enter this script. It relies on Meson's introspection capabilities as well
as the dependency graphs generated by our build system to display all
available components, and handle any dependencies for them automatically,
while also not forcing user to remember any command-line options and lists
of drivers, and instead relying on interactive TUI to display list of
available options. It can also produce builds that are as minimal as
possible (including cutting down libraries being built) by utilizing the
fact that our dependency graphs report which dependency is mandatory and
which one is optional.

Because it is not meant to replace native Meson build configuration but
is rather targeted at users who are not intimately familiar with DPDK's
build system, it is run in interactive mode by default. However, it is
also possible to run it without interaction, in which case it will pass
all its parameters to Meson directly, with added benefit of dependency
tracking and producing minimal builds if desired.

Signed-off-by: Anatoly Burakov 


There is no documentation.


+1


And it is a devtools script and not a usertools, iow, no point in
installing this along a built dpdk.


+1



I don't see a lot of value in such script.


+1
I just ran this script, and it provides a UI just like the Linux kernel's
"make menuconfig",
but I think DPDK is not complicated enough to need such a menuconfig.


In my opinion, people who really want to tune their dpdk build should
enter the details carefully and understand the implications.
But other than that, I have no strong objection.


I think this script is a good step on the roadmap towards making DPDK build 
time configuration more developer friendly.

The idea of making DPDK 100 % runtime configurable and 0 % build time 
configurable has failed.

DPDK should be buildable by distros with a lot of features and drivers enabled, 
and projects using it for special use cases should have the ability to build a 
purpose-specific variant. Just like the kernel.



Well, technically, this doesn't enable this use case any more than it is 
already enabled by Meson, it's just a more friendly frontend for doing 
that sort of thing. menuconfig is a good analogy, although the script is 
way more limited in scope than menuconfig, and doesn't cover nearly as 
many DPDK options as a proper menuconfig-like script would, as for 
example it doesn't cover things like CPU instruction sets or other 
build-time configuration that we have in Meson. (I did have this in my 
internal prototype, but I decided to remove this feature because the 
script was getting positively giant, it's pushing 800 lines as it is)


Still, I think it'll be easier to use for people unfamiliar with DPDK 
(or people who don't like typing a lot, of which I am one).


--
Thanks,
Anatoly



Re: 32-bit virtio failing on DPDK v23.11.1 (and tags)

2024-09-06 Thread Maxime Coquelin

Hello Chris,

On 9/3/24 16:43, Chris Brezovec (cbrezove) wrote:

Hi Maxime / others,

I am just following up to see if you have had any chance to look at what 
I previously sent and had any ideas regarding the issue.


It seems there are not a lot of people testing 32-bit builds with
Virtio if it has been broken since v23.03.

As it looks important to you, could you please work on setting up a CI?

For the issue itself, nothing has caught my eye for now. I will continue to
have a look.

Regards,
Maxime


Thanks in advance!

-ChrisB

*From: *Chris Brezovec (cbrezove) 
*Date: *Wednesday, August 28, 2024 at 5:27 PM
*To: *dev@dpdk.org , maxime.coque...@redhat.com 


*Cc: *common-dpio-core-team(mailer list) 
*Subject: *32-bit virtio failing on DPDK v23.11.1 (and tags)

Hi Maxime,

My name is Chris Brezovec, we met and talked about some 32 bit virtio 
issues we were seeing at Cisco during the DPDK summit last year.  There 
was also a back and forth between you and Dave Johnson at Cisco last 
September regarding the same issue.  I have attached some of the email 
chain from that conversation that resulted in this commit being made to 
dpdk v23.11 
(https://github.com/DPDK/dpdk/commit/8c41645be010ec7fa0df4f6c3790b167945154b4 ).


We recently picked up the v23.11.1 DPDK release and saw that 32 bit 
virtio is not working again, but 64-bit virtio is working.  We are 
noticing CVQ timeouts - PMD receives no response from host and this 
leads to failure of the port to start.  We were able to recreate this 
issue using testpmd.  We have done some tracing through the virtio 
changes made during the development of the v23.xx DPDK release, and 
believe we have identified the following rework commit to have caused a 
failure 
(https://github.com/DPDK/dpdk/commit/a632f0f64ffba3553a18bdb51a670c1b603c0ce6 ).


We have also tested v23.07, v23.11, v23.11.2-rc2, v24.07 and they all 
seem to see the same issue when running in 32-bit mode using testpmd.


We were hoping you might be able to take a quick look at the two commits 
and see if there might be something obvious missing in the refactor work 
that might have caused this issue.  I am thinking there might be a location 
or two in the code that should be using the VIRTIO_MBUF_ADDR() or 
similar macro that might have been missed.


Regards,

ChrisB

This is some of the testpmd output seen on v23.11.2-rc2:

LD_LIBRARY_PATH=/home/rmelton/scratch/dpdk-v23.11.2-rc2.git/build/lib 
/home/rmelton/scratch/dpdk-v23.11.2-rc2.git/build/app/dpdk-testpmd -l 
2-3 -a :07:00.0 --log-level pmd.net.iavf.*,8 --log-level lib.eal.*,8 
--log-level=lib.eal:info --log-level=lib.eal:debug 
--log-level=lib.ethdev:info --log-level=lib.ethdev:debug 
--log-level=lib.virtio:warning --log-level=lib.virtio:info 
--log-level=lib.virtio:debug --log-level=pmd.*:debug --iova-mode=pa -- -i


— snip —

virtio_send_command(): vq->vq_desc_head_idx = 0, status = 255, 
vq->hw->cvq = 0x76d9acc0 vq = 0x76d9ac80


virtio_send_command_split(): vq->vq_queue_index = 2

virtio_send_command_split(): vq->vq_free_cnt=64

vq->vq_desc_head_idx=0

virtio_dev_promiscuous_disable(): Failed to disable promisc

Failed to disable promiscuous mode for device (port 0): Resource 
temporarily unavailable


Error during restoring configuration for device (port 0): Resource 
temporarily unavailable


virtio_dev_stop(): stop

Fail to start port 0: Resource temporarily unavailable

Done

virtio_send_command(): vq->vq_desc_head_idx = 0, status = 255, 
vq->hw->cvq = 0x76d9acc0 vq = 0x76d9ac80


virtio_send_command_split(): vq->vq_queue_index = 2

virtio_send_command_split(): vq->vq_free_cnt=64

vq->vq_desc_head_idx=0

virtio_dev_promiscuous_enable(): Failed to enable promisc

Error during enabling promiscuous mode for port 0: Resource temporarily 
unavailable - ignore






Re: [PATCH 0/3] eal: mark API's as stable

2024-09-06 Thread Ferruh Yigit
On 9/5/2024 8:58 AM, David Marchand wrote:
> On Wed, Sep 4, 2024 at 8:10 PM Stephen Hemminger
>  wrote:
>>
>> The API's in ethtool from before 23.11 should be marked stable.
> 
> EAL* ?
> 
>> Should probably include the trace api's but that is more complex change.
> 
> On the trace API itself it should be ok.
> The problem is with the tracepoint variables themselves, and I don't
> think we should mark them stable.
> 

We cleaned tracepoint variables from the ethdev map file, why do they exist
for 'eal'?

I can see the .map file has a bunch of "__rte_eal_trace_generic_*", I think
they exist to support 'rte_eal_trace_generic_*()' APIs which can be
called from other libraries.

Do we really need them?
Why not have whoever calls them directly call 'rte_trace_point_emit_*' instead?
As these 'rte_eal_trace_generic_*()' are not used at all, I assume this is
what is done already.

@Jerin,
what do you think about removing 'rte_eal_trace_generic_*()' APIs, so trace
always stays local to the library and doesn't bloat the eal .map file?




Re: [PATCH 0/3] eal: mark API's as stable

2024-09-06 Thread David Marchand
On Fri, Sep 6, 2024 at 11:34 AM Ferruh Yigit  wrote:
> > On the trace API itself it should be ok.
> > The problem is with the tracepoint variables themselves, and I don't
> > think we should mark them stable.
> >
>
> We cleaned tracepoint variables from the ethdev map file, why do they exist
> for 'eal'?
>
> I can see the .map file has a bunch of "__rte_eal_trace_generic_*", I think
> they exist to support 'rte_eal_trace_generic_*()' APIs which can be
> called from other libraries.
>
> Do we really need them?
> Why not have whoever calls them directly call 'rte_trace_point_emit_*' instead?
> As these 'rte_eal_trace_generic_*()' are not used at all, I assume this is
> what is done already.
>
> @Jerin,
> what do you think about removing 'rte_eal_trace_generic_*()' APIs, so trace
> always stays local to the library and doesn't bloat the eal .map file?

IIRC, we still need to export them for inline helpers.


-- 
David Marchand



Re: 21.11.8 patches review and test

2024-09-06 Thread Kevin Traynor
On 05/09/2024 15:02, Kevin Traynor wrote:
> On 05/09/2024 14:29, Ali Alnubani wrote:
>>> -Original Message-
>>> From: Kevin Traynor 
>>> Sent: Thursday, September 5, 2024 3:38 PM
>>> To: sta...@dpdk.org
>>> Cc: dev@dpdk.org; Abhishek Marathe ; Ali
>>> Alnubani ; David Christensen ;
>>> Hemant Agrawal ; Ian Stokes
>>> ; Jerin Jacob ; John McNamara
>>> ; Ju-Hyoung Lee ; Kevin
>>> Traynor ; Luca Boccassi ; Pei Zhang
>>> ; Raslan Darawsheh ; NBU-
>>> Contact-Thomas Monjalon (EXTERNAL) ;
>>> yangh...@redhat.com
>>> Subject: 21.11.8 patches review and test
>>>
>>> Hi all,
>>>
>>> Here is a list of patches targeted for stable release 21.11.8.
>>>
>>> The planned date for the final release is 18th September.
>>>
>>> Please help with testing and validation of your use cases and report
>>> any issues/results with reply-all to this mail. For the final release
>>> the fixes and reported validations will be added to the release notes.
>>>
>>> A release candidate tarball can be found at:
>>>
>>> https://dpdk.org/browse/dpdk-stable/tag/?id=v21.11.8-rc1
>>>
>>> These patches are located at branch 21.11 of dpdk-stable repo:
>>> https://dpdk.org/browse/dpdk-stable/
>>>
>>> Thanks.
>>>
>>> Kevin
>>>
>>> ---
>>
>> Hi Kevin,
>>
>> I see this build failure in Debian 12 and Fedora 40:
>>
>> $ meson --werror --buildtype=debugoptimized build && ninja -C build
>> [..]
>> drivers/net/softnic/rte_eth_softnic_meter.c:916:25: error: 's' may be used 
>> uninitialized [-Werror=maybe-uninitialized]
>>
>> Will update with the rest of our functional testing later during the next 
>> couple of weeks.
>>
>> Regards,
>> Ali
> 
> ok, thanks. I will check it out.
Hi Ali,

It looks like a false positive, as the stats [0] are initialised in
mtr_stats_convert() before they are used. The code is unchanged since
the last release, so it is probably a compiler/distro change for this release.
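
For illustration only, this is the general shape of code where
-Wmaybe-uninitialized can fire even though a helper fills the struct before
it is read, along with one common way to quiet it. The types and function
names are hypothetical and this is not the softnic driver code; whether the
warning appears depends on the compiler version and inlining decisions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical types and names; this is not the softnic driver code. */
struct example_stats {
	uint64_t pkts;
	uint64_t bytes;
};

/* Fills every field of *out, so the caller never reads stale data. */
static void
example_stats_convert(const uint64_t *raw, struct example_stats *out)
{
	out->pkts = raw[0];
	out->bytes = raw[1];
}

static uint64_t
example_stats_read(const uint64_t *raw)
{
	struct example_stats s;

	/* Zeroing first is redundant for correctness, but it gives the
	 * compiler a definite initialisation to see. */
	memset(&s, 0, sizeof(s));
	example_stats_convert(raw, &s);
	return s.pkts + s.bytes;
}
```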

I've built with this meson command using the latest gcc and clang on F40
and am not seeing this issue [1].

Are you using the same compiler versions? Are any other details needed to
reproduce?

thanks,
Kevin.

[0]
https://git.dpdk.org/dpdk-stable/tree/drivers/net/softnic/rte_eth_softnic_meter.c?h=21.11#n906

[1]
$ clang --version
clang version 18.1.6 (Fedora 18.1.6-3.fc40)
$ gcc --version
gcc (GCC) 14.2.1 20240801 (Red Hat 14.2.1-1)

commit 680818068d31764357075cde440232ce5ab8b786 (HEAD -> 21.11, tag:
v21.11.8-rc1, origin/21.11)
Author: Kevin Traynor 
Date:   Thu Sep 5 10:34:16 2024 +0100

version: 21.11.8-rc1

e.g.
$ meson --werror --buildtype=debugoptimized build-gcc
...
$ ninja -C build-gcc
ninja: Entering directory `build-gcc'
[3071/3071] Linking target app/test/dpdk-test



> Kevin.




RE: [PATCH 0/3] eal: mark API's as stable

2024-09-06 Thread Morten Brørup
> From: Ferruh Yigit [mailto:ferruh.yi...@amd.com]
> Sent: Friday, 6 September 2024 10.54
> 
> On 9/5/2024 3:01 PM, Jerin Jacob wrote:
> > On Thu, Sep 5, 2024 at 3:14 PM Morten Brørup 
> wrote:
> >>
> >>> From: David Marchand [mailto:david.march...@redhat.com]
> >>> Sent: Thursday, 5 September 2024 11.03
> >>>
> >>> On Thu, Sep 5, 2024 at 10:55 AM Morten Brørup 
> >>> wrote:
> 
> > From: David Marchand [mailto:david.march...@redhat.com]
> > Sent: Thursday, 5 September 2024 09.59
> >
> > On Wed, Sep 4, 2024 at 8:10 PM Stephen Hemminger
> >  wrote:
> >>
> >> The API's in ethtool from before 23.11 should be marked stable.
> >
> > EAL* ?
> >
> >> Should probably include the trace api's but that is more complex
> change.
> >
> > On the trace API itself it should be ok.
> 
>  No!
> >>>
> >>> *sigh*
> >>>
> 
>  Trace must remain experimental until controlled by a meson option, e.g.
> >>> "enable_trace", whereby trace can be completely disabled and omitted from
> the
> >>> compiled application/libraries/drivers at build time.
> >>>
> >>> This seems unrelated to marking the API stable as regardless of the
> >>> API state at the moment, this code is always present.
> >>
> >> I cannot foresee if disabling trace at build time will require changes to
> the trace API. So I'm being cautious here.
> >>
> >> However, if Jerin (as author of the trace subsystem) foresees that it will
> be possible to disable trace at build time without affecting the trace API, I
> don't object to marking the trace API (or some of it) stable.
> >
> > I don't foresee any ABI changes when adding compile-time support for
> > disabling trace. However, I don't understand why we need to do
> > that. In the sense, fast path functions are already having an option
> > to compile out.
> > Slow path functions can be disabled at runtime at the cost of 1 cycle
> > as instrumentation cost. Having said that, I don't have any concern
> > about disabling trace as an option.
> >
> 
> I agree with Jerin, I don't see motivation to disable slow path traces
> when they can be disabled in runtime.
> And fast path traces already have compile flag to disable them.
> 
> Build-time configurations have problems in the long term too, so I am for not
> using them unless we have to.

For some use cases, trace is dead code, and should be omitted.
You don't want dead code in production systems.

Please remember that DPDK is also being used in highly optimized embedded 
systems, hardware appliances and other systems where memory is not abundant.

DPDK is not only for cloud and distros. ;-)

The CI only tests DPDK with a build time configuration expected to be usable 
for distros.
I'm not asking to change that.
I'm only asking for more build time configurability to support other use cases.
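
To illustrate the kind of build-time switch being asked for, here is a
minimal sketch in C. APP_ENABLE_TRACE and APP_TRACE() are hypothetical
names, not existing DPDK macros or meson options. The only point is that
when the macro is undefined, the trace call site expands to nothing, so no
trace code ends up in the binary.

```c
#include <stdio.h>

/* Hypothetical build-time switch: a build system would pass
 * -DAPP_ENABLE_TRACE to keep the trace code. Not a DPDK macro. */
#ifdef APP_ENABLE_TRACE
#define APP_TRACE(fmt, ...) \
	fprintf(stderr, "trace: " fmt "\n", __VA_ARGS__)
#else
/* Trace disabled: call sites expand to an empty statement. */
#define APP_TRACE(fmt, ...) do { } while (0)
#endif

/* Example slow-path function with a single trace call site. */
static int
app_configure_port(int port_id, int nb_queues)
{
	APP_TRACE("configure port %d with %d queues", port_id, nb_queues);
	return (port_id >= 0 && nb_queues > 0) ? 0 : -1;
}
```

Compiling with -DAPP_ENABLE_TRACE keeps the fprintf() call; compiling
without it leaves app_configure_port() with no trace code at all, which is
different from a runtime check that is always present.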



RE: [EXTERNAL] [PATCH] net/virtio-user: reset used index counter in dev reset

2024-09-06 Thread Shiva Shankar Kommula
Hello Maxime,
Could you please review the following change?

Thanks

> Subject: [EXTERNAL] [PATCH] net/virtio-user: reset used index counter in dev
> reset
> 
> When the virtio device is reinitialized during ethdev reconfiguration, all the
> virtio rings are recreated and repopulated on the device.
> Accordingly, reset the used index counter value back to zero.
> 
> Signed-off-by: Kommula Shiva Shankar 
> ---
>  drivers/net/virtio/virtio_user_ethdev.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/virtio/virtio_user_ethdev.c
> b/drivers/net/virtio/virtio_user_ethdev.c
> index ae6593ba0b..d60c7e188c 100644
> --- a/drivers/net/virtio/virtio_user_ethdev.c
> +++ b/drivers/net/virtio/virtio_user_ethdev.c
> @@ -204,6 +204,7 @@ virtio_user_setup_queue_packed(struct virtqueue
> *vq,
>   vring->device = (void *)(uintptr_t)used_addr;
>   dev->packed_queues[queue_idx].avail_wrap_counter = true;
>   dev->packed_queues[queue_idx].used_wrap_counter = true;
> + dev->packed_queues[queue_idx].used_idx = 0;
> 
>   for (i = 0; i < vring->num; i++)
>   vring->desc[i].flags = 0;
> --
> 2.43.0


[PATCH 0/3] Error report improvement and fix

2024-09-06 Thread Gavin Li
This patch set improves error handling in the PMD and the underlying layer.

Gavin Li (3):
  net/mlx5: set rte errno if malloc failed
  net/mlx5/hws: add log for failing to create rule in HWS
  net/mlx5/hws: print CQE error syndrome and more information

 drivers/net/mlx5/hws/mlx5dr_rule.c |  6 ++
 drivers/net/mlx5/hws/mlx5dr_send.c |  9 -
 drivers/net/mlx5/mlx5_flow_hw.c| 31 +++---
 3 files changed, 38 insertions(+), 8 deletions(-)

-- 
2.34.1



[PATCH 2/3] net/mlx5/hws: add log for failing to create rule in HWS

2024-09-06 Thread Gavin Li
From: "Minggang Li (Gavin)" 

Signed-off-by: Gavin Li 
Acked-by: Alex Vesker 
---
 drivers/net/mlx5/hws/mlx5dr_rule.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/mlx5/hws/mlx5dr_rule.c 
b/drivers/net/mlx5/hws/mlx5dr_rule.c
index 1edb7eac74..5d66d81ea5 100644
--- a/drivers/net/mlx5/hws/mlx5dr_rule.c
+++ b/drivers/net/mlx5/hws/mlx5dr_rule.c
@@ -638,6 +638,7 @@ static int mlx5dr_rule_destroy_hws(struct mlx5dr_rule *rule,
 
/* Rule is not completed yet */
if (rule->status == MLX5DR_RULE_STATUS_CREATING) {
+   DR_LOG(NOTICE, "Cannot destroy, rule creation still in 
progress");
rte_errno = EBUSY;
return rte_errno;
}
@@ -806,12 +807,14 @@ static int mlx5dr_rule_enqueue_precheck(struct 
mlx5dr_rule *rule,
struct mlx5dr_context *ctx = rule->matcher->tbl->ctx;
 
if (unlikely(!attr->user_data)) {
+   DR_LOG(DEBUG, "User data must be provided for rule operations");
rte_errno = EINVAL;
return rte_errno;
}
 
/* Check if there is room in queue */
if 
(unlikely(mlx5dr_send_engine_full(&ctx->send_queue[attr->queue_id]))) {
+   DR_LOG(NOTICE, "No room in queue[%d]", attr->queue_id);
rte_errno = EBUSY;
return rte_errno;
}
@@ -823,6 +826,7 @@ static int mlx5dr_rule_enqueue_precheck_move(struct 
mlx5dr_rule *rule,
 struct mlx5dr_rule_attr *attr)
 {
if (unlikely(rule->status != MLX5DR_RULE_STATUS_CREATED)) {
+   DR_LOG(DEBUG, "Cannot move, rule status is invalid");
rte_errno = EINVAL;
return rte_errno;
}
@@ -835,6 +839,7 @@ static int mlx5dr_rule_enqueue_precheck_create(struct 
mlx5dr_rule *rule,
 {
if (unlikely(mlx5dr_matcher_is_in_resize(rule->matcher))) {
/* Matcher in resize - new rules are not allowed */
+   DR_LOG(NOTICE, "Resizing in progress, cannot create rule");
rte_errno = EAGAIN;
return rte_errno;
}
@@ -1068,6 +1073,7 @@ int mlx5dr_rule_hash_calculate(struct mlx5dr_matcher 
*matcher,
mlx5dr_table_is_root(matcher->tbl) ||
matcher->tbl->ctx->caps->access_index_mode == 
MLX5DR_MATCHER_INSERT_BY_HASH ||
matcher->tbl->ctx->caps->flow_table_hash_type != 
MLX5_FLOW_TABLE_HASH_TYPE_CRC32) {
+   DR_LOG(DEBUG, "Matcher is not supported");
rte_errno = ENOTSUP;
return -rte_errno;
}
-- 
2.34.1



[PATCH 1/3] net/mlx5: set rte errno if malloc failed

2024-09-06 Thread Gavin Li
From: "Minggang Li (Gavin)" 

rte_errno should be set if anything goes wrong in the underlying layer so
that the user can figure out what's going on.

There were some cases that did not set it when ipool allocation failed. To
fix the issue, set rte_errno to ENOMEM if mlx5_ipool_malloc fails to
allocate an ID.
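
As a rough illustration of the pattern described above, the sketch below
uses a stand-in pool and a stand-in error variable. sketch_errno,
struct sketch_pool, sketch_pool_malloc() and sketch_flow_create() are all
hypothetical names and are not the mlx5 ipool or rte_errno; the sketch only
shows the caller recording ENOMEM before taking its error path.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for rte_errno (hypothetical, a plain variable here). */
static int sketch_errno;

/* Hypothetical fixed-size index pool; not the mlx5 ipool. */
struct sketch_pool {
	uint32_t next;  /* next free 1-based index */
	uint32_t size;  /* number of usable entries */
	int entries[4];
};

/* Returns an entry and its 1-based index, or NULL when exhausted.
 * Failure is reported only through the return value. */
static void *
sketch_pool_malloc(struct sketch_pool *pool, uint32_t *idx)
{
	if (pool->next > pool->size) {
		*idx = 0;
		return NULL;
	}
	*idx = pool->next++;
	return &pool->entries[*idx - 1];
}

/* Caller-side fix: record ENOMEM before returning on failure. */
static void *
sketch_flow_create(struct sketch_pool *pool)
{
	uint32_t idx = 0;
	void *flow = sketch_pool_malloc(pool, &idx);

	if (flow == NULL) {
		sketch_errno = ENOMEM;
		return NULL;
	}
	return flow;
}
```

With this shape, a caller that sees NULL can read the error variable and
tell an allocation failure apart from other error paths.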

Fixes: c40c061a02 ("net/mlx5: add basic flow queue operation")
Fixes: 48fbb0e93d ("net/mlx5: support flow meter mark indirect action with HWS")
cc: sta...@dpdk.org
Signed-off-by: Gavin Li 
Acked-by: Bing Zhao 
---
 drivers/net/mlx5/mlx5_flow_hw.c | 31 ---
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 50888944a5..509de2a6a4 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -1897,7 +1897,7 @@ flow_hw_meter_mark_alloc(struct rte_eth_dev *dev, 
uint32_t queue,
const struct rte_flow_action_meter_mark *meter_mark = action->conf;
struct mlx5_aso_mtr *aso_mtr;
struct mlx5_flow_meter_info *fm;
-   uint32_t mtr_id;
+   uint32_t mtr_id = 0;
uintptr_t handle = (uintptr_t)MLX5_INDIRECT_ACTION_TYPE_METER_MARK <<
MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 
@@ -1909,8 +1909,15 @@ flow_hw_meter_mark_alloc(struct rte_eth_dev *dev, 
uint32_t queue,
if (meter_mark->profile == NULL)
return NULL;
aso_mtr = mlx5_ipool_malloc(pool->idx_pool, &mtr_id);
-   if (!aso_mtr)
+   if (!aso_mtr) {
+   rte_flow_error_set(error, ENOMEM,
+  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+  NULL,
+  "failed to allocate aso meter entry");
+   if (mtr_id)
+   mlx5_ipool_free(pool->idx_pool, mtr_id);
return NULL;
+   }
/* Fill the flow meter parameters. */
aso_mtr->type = ASO_METER_INDIRECT;
fm = &aso_mtr->fm;
@@ -3918,8 +3925,10 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
return NULL;
}
flow = mlx5_ipool_malloc(table->flow, &flow_idx);
-   if (!flow)
+   if (!flow) {
+   rte_errno = ENOMEM;
goto error;
+   }
rule_acts = flow_hw_get_dr_action_buffer(priv, table, 
action_template_index, queue);
/*
 * Set the table here in order to know the destination table
@@ -3930,8 +3939,10 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
flow->idx = flow_idx;
if (table->resource) {
mlx5_ipool_malloc(table->resource, &res_idx);
-   if (!res_idx)
+   if (!res_idx) {
+   rte_errno = ENOMEM;
goto error;
+   }
flow->res_idx = res_idx;
} else {
flow->res_idx = flow_idx;
@@ -4062,8 +4073,10 @@ flow_hw_async_flow_create_by_index(struct rte_eth_dev 
*dev,
return NULL;
}
flow = mlx5_ipool_malloc(table->flow, &flow_idx);
-   if (!flow)
+   if (!flow) {
+   rte_errno = ENOMEM;
goto error;
+   }
rule_acts = flow_hw_get_dr_action_buffer(priv, table, 
action_template_index, queue);
/*
 * Set the table here in order to know the destination table
@@ -4074,8 +4087,10 @@ flow_hw_async_flow_create_by_index(struct rte_eth_dev 
*dev,
flow->idx = flow_idx;
if (table->resource) {
mlx5_ipool_malloc(table->resource, &res_idx);
-   if (!res_idx)
+   if (!res_idx) {
+   rte_errno = ENOMEM;
goto error;
+   }
flow->res_idx = res_idx;
} else {
flow->res_idx = flow_idx;
@@ -4210,8 +4225,10 @@ flow_hw_async_flow_update(struct rte_eth_dev *dev,
nf->idx = of->idx;
if (table->resource) {
mlx5_ipool_malloc(table->resource, &res_idx);
-   if (!res_idx)
+   if (!res_idx) {
+   rte_errno = ENOMEM;
goto error;
+   }
nf->res_idx = res_idx;
} else {
nf->res_idx = of->res_idx;
-- 
2.34.1



[PATCH 3/3] net/mlx5/hws: print CQE error syndrome and more information

2024-09-06 Thread Gavin Li
From: "Minggang Li (Gavin)" 

Signed-off-by: Gavin Li 
Acked-by: Alex Vesker 
---
 drivers/net/mlx5/hws/mlx5dr_send.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/hws/mlx5dr_send.c 
b/drivers/net/mlx5/hws/mlx5dr_send.c
index 3022c50260..c931896a79 100644
--- a/drivers/net/mlx5/hws/mlx5dr_send.c
+++ b/drivers/net/mlx5/hws/mlx5dr_send.c
@@ -598,8 +598,15 @@ static void mlx5dr_send_engine_poll_cq(struct 
mlx5dr_send_engine *queue,
cqe_owner != sw_own)
return;
 
-   if (unlikely(cqe_opcode != MLX5_CQE_REQ))
+   if (unlikely(cqe_opcode != MLX5_CQE_REQ)) {
+   struct mlx5_err_cqe *err_cqe = (struct mlx5_err_cqe *)cqe;
+
+   DR_LOG(ERR, "CQE ERR:0x%x, Vender_ERR:0x%x, OP:0x%x, QPN:0x%x, 
WQE_CNT:0x%x",
+   err_cqe->syndrome, err_cqe->vendor_err_synd, cqe_opcode,
+   (rte_be_to_cpu_32(err_cqe->s_wqe_opcode_qpn) & 
0xff),
+   rte_be_to_cpu_16(err_cqe->wqe_counter));
queue->err = true;
+   }
 
rte_io_rmb();
 
-- 
2.34.1



Re: [PATCH 0/3] eal: mark API's as stable

2024-09-06 Thread Ferruh Yigit
On 9/6/2024 10:48 AM, David Marchand wrote:
> On Fri, Sep 6, 2024 at 11:34 AM Ferruh Yigit  wrote:
>>> On the trace API itself it should be ok.
>>> The problem is with the tracepoint variables themselves, and I don't
>>> think we should mark them stable.
>>>
>>
>> We cleaned tracepoint variables from ethdev map file, why they exist for
>> 'eal'?
>>
>> I can see .map file has bunch of "__rte_eal_trace_generic_*", I think
>> they exists to support 'rte_eal_trace_generic_*()' APIs which can be
>> called from other libraries.
>>
>> Do we really need them?
>> Why not whoever calls them directly call 'rte_trace_point_emit_*' instead?
>> As these rte_eal_trace_generic_*()' not used at all, I assume this is
>> what done already.
>>
>> @Jerin,
>> what do think to remove 'rte_eal_trace_generic_*()' APIs, so trace
>> always keeps local to library, and don't bloat the eal .map file?
> 
> IIRC, we still need to export them for inline helpers.
> 

As far as I can see they are only used for 'rte_eal_trace_generic_*()'
trace helper APIs, but does eal really expose these helper APIs?

Are there any other inline helpers I am missing?


Yunsilicon Roadmap for 24.11

2024-09-06 Thread WanRenyong
Hello, 
Please find below Yunsilicon roadmap for 24.11. 

 * Introduce XSC PMD for Yunsilicon metaScale SmartNIC

Support Features
--
- MTU update
- TSO
- RSS hash
- RSS key update
- RSS reta update
- L3 checksum offload
- L4 checksum offload
- Inner L3 checksum
- Inner L4 checksum
- Basic stats

Support NICs
--
- metaScale-200S   Single QSFP56 Port 200GE SmartNIC
- metaScale-200     Quad QSFP28 Ports 100GE SmartNIC
- metaScale-50       Dual QSFP28 Port 25GE SmartNIC
- metaScale-100Q   Quad QSFP28 Port 25GE SmartNIC

About Yunsilicon
-
Yunsilicon Technology Co., Ltd is a high tech startup focused on cloud 
datacenter ASIC product development and technology innovation.  MetaScale 
SmartNIC is designed for modern data centers, cloud environments, and 
high-performance networks and storage in AI computing centers.  
For more about Yunsilicon products, please see: 
https://www.yunsilicon.com/#/productInformation

Thanks,
WanRenyong


Re: Crash in tap pmd when using more than 8 rx queues

2024-09-06 Thread Ferruh Yigit
On 9/5/2024 1:55 PM, Edwin Brossette wrote:
> Hello,
> 
> I have recently stumbled into an issue with my DPDK-based application
> running the failsafe pmd. This pmd uses a tap device, with which my
> application fails to start if more than 8 rx queues are used. This issue
> appears to be related to this patch:
> https://git.dpdk.org/dpdk/commit/?id=c36ce7099c2187926cd62cff7ebd479823554929
> 
> I have seen in the documentation that there was a limitation to 8 max
> queues shared when using a tap device shared between multiple processes.
> However, my application uses a single primary process, with no secondary
> process, but it appears that I am still running into this limitation.
> 
> Now if we look at this small chunk of code:
> 
> memset(&msg, 0, sizeof(msg));
> strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
> strlcpy(request_param->port_name, dev->data->name,
>     sizeof(request_param->port_name));
> msg.len_param = sizeof(*request_param);
> for (i = 0; i < dev->data->nb_tx_queues; i++) {
>     msg.fds[fd_iterator++] = process_private->txq_fds[i];
>     msg.num_fds++;
>     request_param->txq_count++;
> }
> for (i = 0; i < dev->data->nb_rx_queues; i++) {
>     msg.fds[fd_iterator++] = process_private->rxq_fds[i];
>     msg.num_fds++;
>     request_param->rxq_count++;
> }
> (Note that I am not using the latest DPDK version, but stable v23.11.1.
> But I believe the issue is still present on latest.)
> 
> There are no checks on the maximum value i can take in the for loops.
> Since the size of msg.fds is limited by the maximum of 8 queues shared
> between process because of the IPC API, there is a potential buffer
> overflow which can happen here.
> 
> See the struct declaration:
> struct rte_mp_msg {
>  char name[RTE_MP_MAX_NAME_LEN];
>  int len_param;
>  int num_fds;
>  uint8_t param[RTE_MP_MAX_PARAM_LEN];
>  int fds[RTE_MP_MAX_FD_NUM];
> };
> 
> This means that if the number of queues used is more than 8, the program
> will crash. This is what happens on my end as I get the following log:
> *** stack smashing detected ***: terminated
> 
> Reverting the commit mentioned above fixes my issue. Also setting a
> check like this works for me:
> 
> if (dev->data->nb_tx_queues + dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM)
>  return -1;
> 
> I've made the changes on my local branch to fix my issue. This mail is
> just to draw attention to this problem.
> Thank you in advance for considering it.
> 

Hi Edwin,

Thanks for the report. I confirm the issue is valid, although that code
has changed a little (to increase the limit of 8) [3].

And in this release Stephen put up another patch [1] to increase the limit
even more, but regardless of the limit, the tap code needs to be fixed.

To fix:
1. We need to add the "nb_rx_queues > RTE_MP_MAX_FD_NUM" check you
mentioned, to not blindly update 'msg.fds[]'.
2. We should prevent this from being a limit for the tap PMD when there is
only a primary process; this seems to have been an oversight on our end.


Can you work on the issue, or are you just reporting it?
Can you please report the bug in Bugzilla [2], to record the issue?
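
For reference, the first fix item could look roughly like the sketch below.
It is self-contained and uses stand-ins: SKETCH_MAX_FD_NUM plays the role of
RTE_MP_MAX_FD_NUM, struct sketch_mp_msg keeps only the fd fields of
struct rte_mp_msg, and sketch_fill_fds() is a hypothetical helper, not
existing tap PMD code. It does not address the second item.

```c
#include <errno.h>

#define SKETCH_MAX_FD_NUM 8 /* stand-in for RTE_MP_MAX_FD_NUM */

/* Reduced stand-in for struct rte_mp_msg: only the fd fields. */
struct sketch_mp_msg {
	int num_fds;
	int fds[SKETCH_MAX_FD_NUM];
};

/* Copy Tx then Rx queue fds into msg.
 * Returns 0 on success, -E2BIG if the fds would not fit in msg->fds[]. */
static int
sketch_fill_fds(struct sketch_mp_msg *msg,
		const int *txq_fds, unsigned int nb_txq,
		const int *rxq_fds, unsigned int nb_rxq)
{
	unsigned int i;

	/* Check the total before writing anything into the array. */
	if (nb_txq > SKETCH_MAX_FD_NUM ||
	    nb_rxq > SKETCH_MAX_FD_NUM - nb_txq)
		return -E2BIG;

	msg->num_fds = 0;
	for (i = 0; i < nb_txq; i++)
		msg->fds[msg->num_fds++] = txq_fds[i];
	for (i = 0; i < nb_rxq; i++)
		msg->fds[msg->num_fds++] = rxq_fds[i];
	return 0;
}
```

The check covers the sum of Tx and Rx queues, since both loops write into
the same array.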



[1]
https://patches.dpdk.org/project/dpdk/patch/20240905162018.74301-1-step...@networkplumber.org/

[2]
https://bugs.dpdk.org/

[3]
https://git.dpdk.org/dpdk/commit/?id=72ab1dc1598e



[RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in DPDK

2024-09-06 Thread Anatoly Burakov
While DPDK initially used the term "socket ID" to refer to physical package
ID, the last time DPDK read "physical_package_id" for socket ID was ~9 years
ago, so it's been a while since we've actually switched over to using the term
"socket" to mean "NUMA node".

This wasn't a problem before, as most systems had one NUMA node per physical
socket. However, in the last few years, more and more systems have multiple NUMA
nodes per physical CPU socket. Since DPDK used NUMA nodes already, the
transition was pretty seamless, however now we're faced with a situation when
most of our documentation still uses outdated terms, and our API is rife with
references to "sockets" when in actuality we mean "NUMA nodes". This could be a
source of confusion.

While completely renaming all of our API's would be a huge effort, would take a
long time and arguably wouldn't even be worth the API breakages (given that this
mismatch between terminology and reality is implicitly understood by most people
working on DPDK, and so this isn't so much of a problem in practice), we can do
some tweaks around the edges and at least document this unfortunate reality.

This patchset suggests the following changes:

- Update rte_socket/rte_lcore documentation to refer to NUMA nodes rather than
  sockets
- Rename internal structures' fields to better reflect this intention
- Rename --socket-mem/--socket-limit flags to refer to NUMA rather than sockets
- Add internal API to get physical package ID [1]

The documentation is updated to refer to new EAL flags, but is otherwise left
untouched, and instead the entry in "glossary" is amended to indicate that when
DPDK documentation refers to "sockets", it actually means "NUMA ID's". As next
steps, we could rename all API parameters to refer to NUMA ID rather than socket
ID - this would break neither API nor ABI, and instead would be a
documentation change in practice.
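
To make the index-versus-ID distinction concrete, here is a small
self-contained sketch. It does not call into DPDK: sketch_numa_ids and
sketch_numa_id_by_idx() are hypothetical stand-ins that mirror the contract
documented for rte_socket_id_by_idx() in this series (return the ID at a
given position, or -1 with errno set to EINVAL), assuming a machine that
reports NUMA nodes 0 and 8.

```c
#include <errno.h>

/* Hypothetical detection result: two NUMA nodes with non-contiguous IDs. */
static const int sketch_numa_ids[] = { 0, 8 };
static const unsigned int sketch_numa_count =
	sizeof(sketch_numa_ids) / sizeof(sketch_numa_ids[0]);

/* Return the NUMA node ID at position idx in the detected list,
 * or -1 with errno = EINVAL when idx is out of range. */
static int
sketch_numa_id_by_idx(unsigned int idx)
{
	if (idx >= sketch_numa_count) {
		errno = EINVAL;
		return -1;
	}
	return sketch_numa_ids[idx];
}
```

Here the count is 2, but the valid IDs are 0 and 8, so code that loops from
0 to count-1 and treats the loop variable as a NUMA ID would be wrong for
the second node.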

[1] This could be used to group lcores by physical package, see e.g. discussion
under this patch: 
https://patches.dpdk.org/project/dpdk/cover/20240827151014.201-1-vipin.vargh...@amd.com/

Anatoly Burakov (5):
  eal: update socket ID API documentation
  lcore: rename socket ID to NUMA ID
  eal: rename socket ID to NUMA ID in internal config
  eal: rename --socket-mem/--socket-limit
  lcore: store physical package ID internally

 doc/guides/faq/faq.rst|  4 +--
 doc/guides/howto/lm_bond_virtio_sriov.rst |  2 +-
 doc/guides/howto/lm_virtio_vhost_user.rst |  2 +-
 doc/guides/howto/pvp_reference_benchmark.rst  |  4 +--
 .../virtio_user_for_container_networking.rst  |  2 +-
 doc/guides/linux_gsg/build_sample_apps.rst| 20 +--
 doc/guides/linux_gsg/linux_eal_parameters.rst | 16 -
 doc/guides/nics/mlx4.rst  |  2 +-
 doc/guides/nics/mlx5.rst  |  2 +-
 .../prog_guide/env_abstraction_layer.rst  | 12 +++
 doc/guides/prog_guide/glossary.rst|  5 ++-
 doc/guides/prog_guide/multi_proc_support.rst  |  2 +-
 doc/guides/sample_app_ug/bbdev_app.rst|  6 ++--
 doc/guides/sample_app_ug/ipsec_secgw.rst  |  6 ++--
 doc/guides/sample_app_ug/vdpa.rst |  2 +-
 doc/guides/sample_app_ug/vhost.rst|  4 +--
 lib/eal/common/eal_common_dynmem.c| 14 
 lib/eal/common/eal_common_lcore.c | 28 +---
 lib/eal/common/eal_common_options.c   | 33 ++-
 lib/eal/common/eal_common_thread.c| 12 +++
 lib/eal/common/eal_internal_cfg.h | 10 +++---
 lib/eal/common/eal_options.h  |  8 +++--
 lib/eal/common/eal_private.h  |  5 ++-
 lib/eal/common/eal_thread.h   | 11 +++
 lib/eal/common/malloc_heap.c  |  2 +-
 lib/eal/freebsd/eal.c |  2 +-
 lib/eal/freebsd/eal_lcore.c   |  6 
 lib/eal/include/rte_lcore.h   | 25 +++---
 lib/eal/linux/eal.c   | 22 ++---
 lib/eal/linux/eal_lcore.c | 28 
 lib/eal/linux/eal_memory.c| 22 ++---
 lib/eal/windows/eal.c |  2 +-
 lib/eal/windows/eal_lcore.c   |  7 
 33 files changed, 204 insertions(+), 124 deletions(-)

-- 
2.43.5



[RFC PATCH v1 1/5] eal: update socket ID API documentation

2024-09-06 Thread Anatoly Burakov
Currently, even though throughout DPDK we refer to "socket ID's", in
actuality we are referring to NUMA node ID's, which do not necessarily
correspond to physical sockets.

This is not an API change nor a semantics change, it is merely an update
of API documentation to match what is already the case (the semantics
have changed back when systems started reporting multiple NUMA nodes per
physical socket).

Signed-off-by: Anatoly Burakov 
---
 doc/guides/prog_guide/glossary.rst |  5 -
 lib/eal/include/rte_lcore.h| 25 -
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/doc/guides/prog_guide/glossary.rst 
b/doc/guides/prog_guide/glossary.rst
index 8d6349701e..d09d7bf5f6 100644
--- a/doc/guides/prog_guide/glossary.rst
+++ b/doc/guides/prog_guide/glossary.rst
@@ -191,7 +191,10 @@ Slave lcore
Deprecated name for *worker lcore*. No longer used.
 
 Socket
-   A physical CPU, that includes several *cores*.
+   For historical reasons, the term "socket" is used in the DPDK to refer to
+   both physical sockets, as well as NUMA nodes. As a general rule, the term
+   should be understood to mean "NUMA node" unless it is clear from context
+   that it is referring to physical CPU sockets.
 
 SLA
Service Level Agreement
diff --git a/lib/eal/include/rte_lcore.h b/lib/eal/include/rte_lcore.h
index 7deae47af3..de9e940b76 100644
--- a/lib/eal/include/rte_lcore.h
+++ b/lib/eal/include/rte_lcore.h
@@ -113,22 +113,21 @@ unsigned int rte_lcore_count(void);
 int rte_lcore_index(int lcore_id);
 
 /**
- * Return the ID of the physical socket of the logical core we are
- * running on.
+ * Return the ID of NUMA node of the logical core we are running on.
  * @return
- *   the ID of current lcoreid's physical socket
+ *   the ID of current lcoreid's NUMA node
  */
 unsigned int rte_socket_id(void);
 
 /**
- * Return number of physical sockets detected on the system.
+ * Return number of NUMA nodes detected on the system.
  *
- * Note that number of nodes may not be correspondent to their physical id's:
- * for example, a system may report two socket id's, but the actual socket id's
+ * Note that number of nodes may not be correspondent to their NUMA ID's:
+ * for example, a system may report two NUMA ID's, but the actual NUMA ID's
  * may be 0 and 8.
  *
  * @return
- *   the number of physical sockets as recognized by EAL
+ *   the number of NUMA ID's as recognized by EAL
  */
 unsigned int
 rte_socket_count(void);
@@ -137,26 +136,26 @@ rte_socket_count(void);
  * Return socket id with a particular index.
  *
  * This will return socket id at a particular position in list of all detected
- * physical socket id's. For example, on a machine with sockets [0, 8], passing
- * 1 as a parameter will return 8.
+ * NUMA node ID's. For example, on a machine with NUMA nodes [0, 8], passing 1
+ * as a parameter will return 8.
  *
  * @param idx
- *   index of physical socket id to return
+ *   index of NUMA node ID to return
  *
  * @return
- *   - physical socket id as recognized by EAL
+ *   - NUMA node ID as recognized by EAL
  *   - -1 on error, with errno set to EINVAL
  */
 int
 rte_socket_id_by_idx(unsigned int idx);
 
 /**
- * Get the ID of the physical socket of the specified lcore
+ * Get the ID of the NUMA node of the specified lcore
  *
  * @param lcore_id
  *   the targeted lcore, which MUST be between 0 and RTE_MAX_LCORE-1.
  * @return
- *   the ID of lcoreid's physical socket
+ *   the ID of lcoreid's NUMA node
  */
 unsigned int
 rte_lcore_to_socket_id(unsigned int lcore_id);
-- 
2.43.5



[RFC PATCH v1 2/5] lcore: rename socket ID to NUMA ID

2024-09-06 Thread Anatoly Burakov
Rename socket ID to NUMA ID in internal lcore structure. This does not
change any user facing API's, although it does alter a couple of log
messages.

In particular, telemetry API and lcore dump API changes have been omitted
as there may be consumers of these API that depend on specifics of messages
generated by these API's.

Signed-off-by: Anatoly Burakov 
---
 lib/eal/common/eal_common_lcore.c  | 10 +-
 lib/eal/common/eal_common_thread.c | 12 ++--
 lib/eal/common/eal_private.h   |  2 +-
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/eal/common/eal_common_lcore.c 
b/lib/eal/common/eal_common_lcore.c
index 2ff9252c52..ba8fce6607 100644
--- a/lib/eal/common/eal_common_lcore.c
+++ b/lib/eal/common/eal_common_lcore.c
@@ -115,7 +115,7 @@ unsigned int rte_get_next_lcore(unsigned int i, int 
skip_main, int wrap)
 unsigned int
 rte_lcore_to_socket_id(unsigned int lcore_id)
 {
-   return lcore_config[lcore_id].socket_id;
+   return lcore_config[lcore_id].numa_id;
 }
 
 static int
@@ -173,17 +173,17 @@ rte_eal_cpu_init(void)
config->lcore_role[lcore_id] = ROLE_RTE;
lcore_config[lcore_id].core_role = ROLE_RTE;
lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
-   lcore_config[lcore_id].socket_id = socket_id;
+   lcore_config[lcore_id].numa_id = socket_id;
EAL_LOG(DEBUG, "Detected lcore %u as "
-   "core %u on socket %u",
+   "core %u on NUMA node %u",
lcore_id, lcore_config[lcore_id].core_id,
-   lcore_config[lcore_id].socket_id);
+   lcore_config[lcore_id].numa_id);
count++;
}
for (; lcore_id < CPU_SETSIZE; lcore_id++) {
if (eal_cpu_detected(lcore_id) == 0)
continue;
-   EAL_LOG(DEBUG, "Skipped lcore %u as core %u on socket %u",
+   EAL_LOG(DEBUG, "Skipped lcore %u as core %u on NUMA node %u",
lcore_id, eal_cpu_core_id(lcore_id),
eal_cpu_socket_id(lcore_id));
}
diff --git a/lib/eal/common/eal_common_thread.c 
b/lib/eal/common/eal_common_thread.c
index a53bc639ae..aa98bdc3ff 100644
--- a/lib/eal/common/eal_common_thread.c
+++ b/lib/eal/common/eal_common_thread.c
@@ -24,13 +24,13 @@
 
 RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
 RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
-static RTE_DEFINE_PER_LCORE(unsigned int, _socket_id) =
+static RTE_DEFINE_PER_LCORE(unsigned int, _numa_id) =
(unsigned int)SOCKET_ID_ANY;
 static RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
 
 unsigned rte_socket_id(void)
 {
-   return RTE_PER_LCORE(_socket_id);
+   return RTE_PER_LCORE(_numa_id);
 }
 
 static int
@@ -66,8 +66,8 @@ thread_update_affinity(rte_cpuset_t *cpusetp)
 {
unsigned int lcore_id = rte_lcore_id();
 
-   /* store socket_id in TLS for quick access */
-   RTE_PER_LCORE(_socket_id) =
+   /* store numa_id in TLS for quick access */
+   RTE_PER_LCORE(_numa_id) =
eal_cpuset_socket_id(cpusetp);
 
/* store cpuset in TLS for quick access */
@@ -76,7 +76,7 @@ thread_update_affinity(rte_cpuset_t *cpusetp)
 
if (lcore_id != (unsigned)LCORE_ID_ANY) {
/* EAL thread will update lcore_config */
-   lcore_config[lcore_id].socket_id = RTE_PER_LCORE(_socket_id);
+   lcore_config[lcore_id].numa_id = RTE_PER_LCORE(_numa_id);
memmove(&lcore_config[lcore_id].cpuset, cpusetp,
sizeof(rte_cpuset_t));
}
@@ -256,7 +256,7 @@ static int control_thread_init(void *arg)
/* Set control thread socket ID to SOCKET_ID_ANY
 * as control threads may be scheduled on any NUMA node.
 */
-   RTE_PER_LCORE(_socket_id) = SOCKET_ID_ANY;
+   RTE_PER_LCORE(_numa_id) = SOCKET_ID_ANY;
params->ret = rte_thread_set_affinity_by_id(rte_thread_self(), cpuset);
if (params->ret != 0) {
rte_atomic_store_explicit(&params->status,
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index af09620426..196dadc8a2 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -30,7 +30,7 @@ struct lcore_config {
volatile int ret;  /**< return value of function */
 
volatile RTE_ATOMIC(enum rte_lcore_state_t) state; /**< lcore state */
-   unsigned int socket_id;/**< physical socket id for this lcore */
+   unsigned int numa_id;/**< NUMA node ID for this lcore */
unsigned int core_id;  /**< core number on socket for this lcore */
int core_index;/**< relative index, starting from 0 */
uint8_t core_role; /**< role of core eg: OFF, RTE, SERVICE */
-- 
2.43.5



[RFC PATCH v1 3/5] eal: rename socket ID to NUMA ID in internal config

2024-09-06 Thread Anatoly Burakov
This patch renames socket ID-related fields in internal EAL config
structure to refer to NUMA ID instead. No user-facing API's are changed.

Signed-off-by: Anatoly Burakov 
---
 lib/eal/common/eal_common_dynmem.c  | 14 +++---
 lib/eal/common/eal_common_options.c | 16 
 lib/eal/common/eal_internal_cfg.h   | 10 +-
 lib/eal/common/malloc_heap.c|  2 +-
 lib/eal/freebsd/eal.c   |  2 +-
 lib/eal/linux/eal.c | 10 +-
 lib/eal/linux/eal_memory.c  | 22 +++---
 lib/eal/windows/eal.c   |  2 +-
 8 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/lib/eal/common/eal_common_dynmem.c 
b/lib/eal/common/eal_common_dynmem.c
index b4dc231940..4377af5aab 100644
--- a/lib/eal/common/eal_common_dynmem.c
+++ b/lib/eal/common/eal_common_dynmem.c
@@ -264,9 +264,9 @@ eal_dynmem_hugepage_init(void)
 #endif
}
 
-   /* make a copy of socket_mem, needed for balanced allocation. */
+   /* make a copy of numa_mem, needed for balanced allocation. */
for (hp_sz_idx = 0; hp_sz_idx < RTE_MAX_NUMA_NODES; hp_sz_idx++)
-   memory[hp_sz_idx] = internal_conf->socket_mem[hp_sz_idx];
+   memory[hp_sz_idx] = internal_conf->numa_mem[hp_sz_idx];
 
/* calculate final number of pages */
if (eal_dynmem_calc_num_pages_per_socket(memory,
@@ -334,10 +334,10 @@ eal_dynmem_hugepage_init(void)
}
 
/* if socket limits were specified, set them */
-   if (internal_conf->force_socket_limits) {
+   if (internal_conf->force_numa_limits) {
unsigned int i;
for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
-   uint64_t limit = internal_conf->socket_limit[i];
+   uint64_t limit = internal_conf->numa_limit[i];
if (limit == 0)
continue;
if (rte_mem_alloc_validator_register("socket-limit",
@@ -382,7 +382,7 @@ eal_dynmem_calc_num_pages_per_socket(
return -1;
 
/* if specific memory amounts per socket weren't requested */
-   if (internal_conf->force_sockets == 0) {
+   if (internal_conf->force_numa == 0) {
size_t total_size;
 #ifdef RTE_ARCH_64
int cpu_per_socket[RTE_MAX_NUMA_NODES];
@@ -509,10 +509,10 @@ eal_dynmem_calc_num_pages_per_socket(
 
/* if we didn't satisfy all memory requirements per socket */
if (memory[socket] > 0 &&
-   internal_conf->socket_mem[socket] != 0) {
+   internal_conf->numa_mem[socket] != 0) {
/* to prevent icc errors */
requested = (unsigned int)(
-   internal_conf->socket_mem[socket] / 0x10);
+   internal_conf->numa_mem[socket] / 0x10);
available = requested -
((unsigned int)(memory[socket] / 0x10));
EAL_LOG(ERR, "Not enough memory available on "
diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index f1a5e329a5..73fbb8587b 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -333,14 +333,14 @@ eal_reset_internal_config(struct internal_config 
*internal_cfg)
internal_cfg->hugepage_dir = NULL;
internal_cfg->hugepage_file.unlink_before_mapping = false;
internal_cfg->hugepage_file.unlink_existing = true;
-   internal_cfg->force_sockets = 0;
+   internal_cfg->force_numa = 0;
/* zero out the NUMA config */
for (i = 0; i < RTE_MAX_NUMA_NODES; i++)
-   internal_cfg->socket_mem[i] = 0;
-   internal_cfg->force_socket_limits = 0;
+   internal_cfg->numa_mem[i] = 0;
+   internal_cfg->force_numa_limits = 0;
/* zero out the NUMA limits config */
for (i = 0; i < RTE_MAX_NUMA_NODES; i++)
-   internal_cfg->socket_limit[i] = 0;
+   internal_cfg->numa_limit[i] = 0;
/* zero out hugedir descriptors */
for (i = 0; i < MAX_HUGEPAGE_SIZES; i++) {
memset(&internal_cfg->hugepage_info[i], 0,
@@ -2041,7 +2041,7 @@ eal_adjust_config(struct internal_config *internal_cfg)
/* if no memory amounts were requested, this will result in 0 and
 * will be overridden later, right after eal_hugepage_info_init() */
for (i = 0; i < RTE_MAX_NUMA_NODES; i++)
-   internal_cfg->memory += internal_cfg->socket_mem[i];
+   internal_cfg->memory += internal_cfg->numa_mem[i];
 
return 0;
 }
@@ -2082,12 +2082,12 @@ eal_check_common_options(struct internal_config 
*internal_cfg)
"option");
return -1;
}
-   if (mem_parsed && internal_cfg->force_sockets == 1) {
+   if (mem_p

[RFC PATCH v1 4/5] eal: rename --socket-mem/--socket-limit

2024-09-06 Thread Anatoly Burakov
Currently, --socket-mem and --socket-limit EAL flags effectively refer to
NUMA nodes, not CPU sockets. Update the flag names to reflect this. Old
flag names are still supported for backward compatibility.

Signed-off-by: Anatoly Burakov 
---

Notes:
Technically, this is a user-facing change and so would require a
deprecation notice. We can do it the other way around, and add
support for --numa-mem/--numa-limit but do not expose it in
documentation yet, and instead add a deprecation notice for next
release. However, since old flags are kept for compatibility,
nothing will break as a result of merging this series even if we
didn't announce this change in advance. I'm open to feedback on
how to best do this change.
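
A minimal sketch of the backward-compatibility idea described in the notes above: the old and new flag spellings map to the same option identifier, so both reach the same handler. This is not the code from this series; the names demo_long_opts, demo_cfg, demo_parse and the DEMO_OPT_* values are hypothetical, and only the standard getopt_long() interface is assumed.

```c
#include <getopt.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option identifiers; not the values used by DPDK. */
enum {
	DEMO_OPT_NUMA_MEM = 256,
	DEMO_OPT_NUMA_LIMIT,
};

/* Old and new spellings share one identifier, so they share one handler. */
static const struct option demo_long_opts[] = {
	{"numa-mem",     required_argument, NULL, DEMO_OPT_NUMA_MEM},
	{"socket-mem",   required_argument, NULL, DEMO_OPT_NUMA_MEM},   /* legacy alias */
	{"numa-limit",   required_argument, NULL, DEMO_OPT_NUMA_LIMIT},
	{"socket-limit", required_argument, NULL, DEMO_OPT_NUMA_LIMIT}, /* legacy alias */
	{NULL, 0, NULL, 0},
};

struct demo_cfg {
	const char *numa_mem;   /* raw argument of --numa-mem or --socket-mem */
	const char *numa_limit; /* raw argument of --numa-limit or --socket-limit */
};

/* Returns 0 on success, -1 on an unknown option or a missing argument. */
static int
demo_parse(int argc, char **argv, struct demo_cfg *cfg)
{
	int opt;

	memset(cfg, 0, sizeof(*cfg));
	optind = 1; /* restart scanning so the function can be called repeatedly */
	while ((opt = getopt_long(argc, argv, "", demo_long_opts, NULL)) != -1) {
		switch (opt) {
		case DEMO_OPT_NUMA_MEM:
			cfg->numa_mem = optarg;
			break;
		case DEMO_OPT_NUMA_LIMIT:
			cfg->numa_limit = optarg;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

With this shape, "--socket-mem 1024,0" and "--numa-mem=1024,0" both end up in cfg->numa_mem, so existing command lines keep working while documentation can move to the new names.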

 doc/guides/faq/faq.rst|  4 ++--
 doc/guides/howto/lm_bond_virtio_sriov.rst |  2 +-
 doc/guides/howto/lm_virtio_vhost_user.rst |  2 +-
 doc/guides/howto/pvp_reference_benchmark.rst  |  4 ++--
 .../virtio_user_for_container_networking.rst  |  2 +-
 doc/guides/linux_gsg/build_sample_apps.rst| 20 +--
 doc/guides/linux_gsg/linux_eal_parameters.rst | 16 +++
 doc/guides/nics/mlx4.rst  |  2 +-
 doc/guides/nics/mlx5.rst  |  2 +-
 .../prog_guide/env_abstraction_layer.rst  | 12 +--
 doc/guides/prog_guide/multi_proc_support.rst  |  2 +-
 doc/guides/sample_app_ug/bbdev_app.rst|  6 +++---
 doc/guides/sample_app_ug/ipsec_secgw.rst  |  6 +++---
 doc/guides/sample_app_ug/vdpa.rst |  2 +-
 doc/guides/sample_app_ug/vhost.rst|  4 ++--
 lib/eal/common/eal_common_options.c   | 17 +---
 lib/eal/common/eal_options.h  |  8 +---
 lib/eal/linux/eal.c   | 12 +--
 18 files changed, 64 insertions(+), 59 deletions(-)

diff --git a/doc/guides/faq/faq.rst b/doc/guides/faq/faq.rst
index 2aec432d75..8557d9daf9 100644
--- a/doc/guides/faq/faq.rst
+++ b/doc/guides/faq/faq.rst
@@ -31,7 +31,7 @@ If I execute "l2fwd -l 0-3 -m 64 -n 3 -- -p 3", I get the following output, indi
 I have set up a total of 1024 Hugepages (that is, allocated 512 2M pages to each NUMA node).
 
 The -m command line parameter does not guarantee that huge pages will be reserved on specific sockets. Therefore, allocated huge pages may not be on socket 0.
-To request memory to be reserved on a specific socket, please use the --socket-mem command-line parameter instead of -m.
+To request memory to be reserved on a specific socket, please use the --numa-mem command-line parameter instead of -m.
 
 
 I am running a 32-bit DPDK application on a NUMA system, and sometimes the application initializes fine but cannot allocate memory. Why is that happening?
@@ -54,7 +54,7 @@ For example, if your EAL coremask is 0xff0, the main core will usually be the fi
 .. Note: Instead of '-c 0xff0' use the '-l 4-11' as a cleaner way to define lcores.
 
 In this way, the hugepages have a greater chance of being allocated to the correct socket.
-Additionally, a ``--socket-mem`` option could be used to ensure the availability of memory for each socket, so that if hugepages were allocated on
+Additionally, a ``--numa-mem`` option could be used to ensure the availability of memory for each socket, so that if hugepages were allocated on
 the wrong socket, the application simply will not start.
 
 
diff --git a/doc/guides/howto/lm_bond_virtio_sriov.rst b/doc/guides/howto/lm_bond_virtio_sriov.rst
index 60b4462c2c..1859508559 100644
--- a/doc/guides/howto/lm_bond_virtio_sriov.rst
+++ b/doc/guides/howto/lm_bond_virtio_sriov.rst
@@ -614,7 +614,7 @@ Run testpmd in the Virtual Machine.
# use for bonding of virtio and vf tests in VM
 
/root/dpdk//app/dpdk-testpmd \
-   -l 0-3 -n 4 --socket-mem 350 --  --i --port-topology=chained
+   -l 0-3 -n 4 --numa-mem 350 --  --i --port-topology=chained
 
 .. _lm_bond_virtio_sriov_switch_conf:
 
diff --git a/doc/guides/howto/lm_virtio_vhost_user.rst b/doc/guides/howto/lm_virtio_vhost_user.rst
index c5c48f10a9..b84ef0dc29 100644
--- a/doc/guides/howto/lm_virtio_vhost_user.rst
+++ b/doc/guides/howto/lm_virtio_vhost_user.rst
@@ -438,4 +438,4 @@ run_testpmd_in_vm.sh
# test system has 8 cpus (0-7), use cpus 2-7 for VM
 
/root/dpdk//app/dpdk-testpmd \
-   -l 0-5 -n 4 --socket-mem 350 -- --burst=64 --i
+   -l 0-5 -n 4 --numa-mem 350 -- --burst=64 --i
diff --git a/doc/guides/howto/pvp_reference_benchmark.rst b/doc/guides/howto/pvp_reference_benchmark.rst
index 1043356b3d..073e72ea6f 100644
--- a/doc/guides/howto/pvp_reference_benchmark.rst
+++ b/doc/guides/howto/pvp_reference_benchmark.rst
@@ -122,7 +122,7 @@ Testpmd launch
 
.. code-block:: console
 
-  /app/dpdk-testpmd -l 0,2,3,4,5 --socket-mem=1024 -n 4 \
+  /app/dpdk-testpmd -l 0,2,3,4,5 --numa-mem=1024 -n 4 \
   --vdev 'net_vhost0,iface=/tmp/vhost-user1' \
   --vdev 'net_vhost1,iface=/tmp/vhost-user2' --

[RFC PATCH v1 5/5] lcore: store physical package ID internally

2024-09-06 Thread Anatoly Burakov
This patch introduces a new field in the lcore structure that stores the
physical package ID of the core. This field is populated during EAL init.
It is not exposed through any external APIs for now.

Signed-off-by: Anatoly Burakov 
---
 lib/eal/common/eal_common_lcore.c | 18 ++
 lib/eal/common/eal_private.h  |  3 +++
 lib/eal/common/eal_thread.h   | 11 +++
 lib/eal/freebsd/eal_lcore.c   |  6 ++
 lib/eal/linux/eal_lcore.c | 28 
 lib/eal/windows/eal_lcore.c   |  7 +++
 6 files changed, 73 insertions(+)

diff --git a/lib/eal/common/eal_common_lcore.c b/lib/eal/common/eal_common_lcore.c
index ba8fce6607..9e937c2d6a 100644
--- a/lib/eal/common/eal_common_lcore.c
+++ b/lib/eal/common/eal_common_lcore.c
@@ -144,7 +144,9 @@ rte_eal_cpu_init(void)
unsigned lcore_id;
unsigned count = 0;
unsigned int socket_id, prev_socket_id;
+   unsigned int package_id, prev_package_id;
int lcore_to_socket_id[RTE_MAX_LCORE];
+   int lcore_to_package_id[RTE_MAX_LCORE];
 
/*
 * Parse the maximum set of logical cores, detect the subset of running
@@ -160,6 +162,10 @@ rte_eal_cpu_init(void)
socket_id = eal_cpu_socket_id(lcore_id);
lcore_to_socket_id[lcore_id] = socket_id;
 
+   /* find physical package ID */
+   package_id = eal_cpu_package_id(lcore_id);
+   lcore_to_package_id[lcore_id] = package_id;
+
if (eal_cpu_detected(lcore_id) == 0) {
config->lcore_role[lcore_id] = ROLE_OFF;
lcore_config[lcore_id].core_index = -1;
@@ -174,6 +180,7 @@ rte_eal_cpu_init(void)
lcore_config[lcore_id].core_role = ROLE_RTE;
lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
lcore_config[lcore_id].numa_id = socket_id;
+   lcore_config[lcore_id].package_id = package_id;
EAL_LOG(DEBUG, "Detected lcore %u as "
"core %u on NUMA node %u",
lcore_id, lcore_config[lcore_id].core_id,
@@ -199,14 +206,25 @@ rte_eal_cpu_init(void)
qsort(lcore_to_socket_id, RTE_DIM(lcore_to_socket_id),
sizeof(lcore_to_socket_id[0]), socket_id_cmp);
 
+   /* sort all package id's in ascending order */
+   qsort(lcore_to_package_id, RTE_DIM(lcore_to_package_id),
+   sizeof(lcore_to_package_id[0]), socket_id_cmp);
+
prev_socket_id = -1;
+   prev_package_id = -1;
config->numa_node_count = 0;
+   config->package_count = 0;
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
socket_id = lcore_to_socket_id[lcore_id];
+   package_id = lcore_to_package_id[lcore_id];
if (socket_id != prev_socket_id)
config->numa_nodes[config->numa_node_count++] =
socket_id;
+   if (package_id != prev_package_id)
+   config->packages[config->package_count++] =
+   package_id;
prev_socket_id = socket_id;
+   prev_package_id = package_id;
}
EAL_LOG(INFO, "Detected NUMA nodes: %u", config->numa_node_count);
 
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index 196dadc8a2..611c0de640 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -31,6 +31,7 @@ struct lcore_config {
 
volatile RTE_ATOMIC(enum rte_lcore_state_t) state; /**< lcore state */
unsigned int numa_id;/**< NUMA node ID for this lcore */
+   unsigned int package_id;   /**< Physical package ID for this lcore */
unsigned int core_id;  /**< core number on socket for this lcore */
int core_index;/**< relative index, starting from 0 */
uint8_t core_role; /**< role of core eg: OFF, RTE, SERVICE */
@@ -48,6 +49,8 @@ struct rte_config {
uint32_t lcore_count;/**< Number of available logical cores. */
uint32_t numa_node_count;/**< Number of detected NUMA nodes. */
uint32_t numa_nodes[RTE_MAX_NUMA_NODES]; /**< List of detected NUMA nodes. */
+   uint32_t package_count;  /**< Number of detected physical packages. */
+   uint32_t packages[RTE_MAX_NUMA_NODES]; /**< List of detected physical packages. */
uint32_t service_lcore_count;/**< Number of available service cores. */
enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
 
diff --git a/lib/eal/common/eal_thread.h b/lib/eal/common/eal_thread.h
index 1c3c3442d3..32ba36589e 100644
--- a/lib/eal/common/eal_thread.h
+++ b/lib/eal/common/eal_thread.h
@@ -27,6 +27,17 @@ __rte_noreturn uint32_t eal_thread_loop(void *arg);
  */
 unsigned eal_cpu_socket_id(unsigned cpu_id);
 
+/**
+ 

RE: [RFC 1/2] eal: add llc aware functions

2024-09-06 Thread Varghese, Vipin

> >Some SOCs may only show upper-level caches here, therefore
> > cannot be use blindly without knowing the SOC.
> >
> > Can you please help us understand
> >
>
> For instance, in Neoverse N1 can disable the use of SLC as LLC (a BIOS setting)
> If SLC is not used as LLC, then your script would report the unified L2 as an LLC.

Does disabling SLC as LLC also disable L3? I think not. What you are implying
is that ` ls -d /sys/bus/cpu/devices/cpu%u/cache/index[0-9] | sort -r …… ` will
return index2 and not index3. Is that the correct understanding?


> I don't think that's what you are interested in.
My intention, as shared earlier, is this: whether or not the BIOS setting for
CPU NUMA is enabled, I would like to allow the end customer to get the core
complexes (tiles) which are under one group.
So, whether the Last Level Cache seen by the OS is L3 or L2, the API allows the
end user to get the DPDK lcores sharing the last level cache.

But as per the earlier communication, specific SoCs do not behave as expected
when some settings are configured differently. For AMD SoCs, we are trying to
help the end user choose the right settings through tuning guides such as
"12. How to get best performance on AMD platform" in the Data Plane
Development Kit 24.11.0-rc0 documentation (dpdk.org).

Can you please confirm whether such tuning guides or recommended settings are
shared? If not, could I set up a technical call to sync on this?

>
> > 1. if there are specific SoC which do not populate the information at
> > all? If yes are they in DTS?
>
> This information is populated correctly for all SOCs, comment was on the
> script.

Please note, I am not running any script. The command LCORE_GET_LLC is executed
using the C function `open`. Following Stephen's suggestion, we have replied
that we will change this to C function logic to get the details.
I hope this clears up the confusion.
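
As a rough sketch of the "C function logic" mentioned above (and not the code from the RFC), the highest cache index exposed for a CPU can be found by scanning its sysfs cache directory. The helper names demo_cache_index and demo_highest_cache_index are hypothetical. The sketch also assumes that the highest indexN entry corresponds to the last cache level the OS sees; the per-index `level` file would be the authoritative source.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Return N for a directory entry named "indexN", or -1 for anything else. */
static int
demo_cache_index(const char *name)
{
	char *end = NULL;
	long val;

	if (strncmp(name, "index", 5) != 0)
		return -1;
	if (name[5] < '0' || name[5] > '9')
		return -1;
	val = strtol(name + 5, &end, 10);
	if (*end != '\0' || val > 1000)
		return -1;
	return (int)val;
}

/*
 * Return the highest cache index found under cache_dir, for example
 * "/sys/bus/cpu/devices/cpu0/cache", or -1 if none is found.
 */
static int
demo_highest_cache_index(const char *cache_dir)
{
	struct dirent *ent;
	int best = -1;
	DIR *dir = opendir(cache_dir);

	if (dir == NULL)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		int idx = demo_cache_index(ent->d_name);

		if (idx > best)
			best = idx;
	}
	closedir(dir);
	return best;
}
```

On a system where only L1 and L2 are exposed, this would return 2 rather than 3, which is the situation discussed in this thread.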




[PATCH 00/19] XSC PMD for Yunsilicon NICs

2024-09-06 Thread WanRenyong
This xsc PMD (**librte_net_xsc**) provides poll mode driver support for
Yunsilicon metaScale series NICs.

Features:
-
- MTU update
- TSO
- RSS hash
- RSS key update
- RSS reta update
- L3 checksum offload
- L4 checksum offload
- Inner L3 checksum
- Inner L4 checksum
- Basic stats 

Supported NICs:
-
- metaScale-200S   Single QSFP56 Port 200GE SmartNIC
- metaScale-200    Quad QSFP28 Ports 100GE SmartNIC
- metaScale-50     Dual QSFP28 Port 25GE SmartNIC
- metaScale-100Q   Quad QSFP28 Port 25GE SmartNIC


-

WanRenyong (19):
  net/xsc: add doc and minimum build framework
  net/xsc: add log macro
  net/xsc: add PCI device probe and remove
  net/xsc: add xsc device init and uninit
  net/xsc: add ioctl command interface
  net/xsc: initialize hardware information
  net/xsc: add representor ports probe
  net/xsc: create eth devices for representor ports
  net/xsc: initial representor eth device
  net/xsc: add ethdev configure and rxtx queue setup ops
  net/xsc: add mailbox and structure
  net/xsc: add ethdev RSS hash ops
  net/xsc: add ethdev start and stop ops
  net/xsc: add ethdev Rx burst
  net/xsc: add ethdev Tx burst
  net/xsc: configure xsc device hardware table
  net/xsc: add dev link and MTU ops
  net/xsc: add dev infos get
  net/xsc: add dev basic stats ops

 .mailmap |4 +
 MAINTAINERS  |9 +
 doc/guides/nics/features/xsc.ini |   18 +
 doc/guides/nics/index.rst|1 +
 doc/guides/nics/xsc.rst  |   31 +
 drivers/net/meson.build  |1 +
 drivers/net/xsc/meson.build  |   36 +
 drivers/net/xsc/xsc_ctrl.c   |   64 ++
 drivers/net/xsc/xsc_ctrl.h   |  314 +++
 drivers/net/xsc/xsc_defs.h   |   61 ++
 drivers/net/xsc/xsc_dev.c|  326 +++
 drivers/net/xsc/xsc_dev.h|   99 +++
 drivers/net/xsc/xsc_ethdev.c | 1434 ++
 drivers/net/xsc/xsc_ethdev.h |   81 ++
 drivers/net/xsc/xsc_flow.c   |  167 
 drivers/net/xsc/xsc_flow.h   |   67 ++
 drivers/net/xsc/xsc_log.h|   44 +
 drivers/net/xsc/xsc_rxtx.c   |  445 +
 drivers/net/xsc/xsc_rxtx.h   |  214 +
 drivers/net/xsc/xsc_utils.c  |  346 +++
 drivers/net/xsc/xsc_utils.h  |   27 +
 21 files changed, 3789 insertions(+)
 create mode 100644 doc/guides/nics/features/xsc.ini
 create mode 100644 doc/guides/nics/xsc.rst
 create mode 100644 drivers/net/xsc/meson.build
 create mode 100644 drivers/net/xsc/xsc_ctrl.c
 create mode 100644 drivers/net/xsc/xsc_ctrl.h
 create mode 100644 drivers/net/xsc/xsc_defs.h
 create mode 100644 drivers/net/xsc/xsc_dev.c
 create mode 100644 drivers/net/xsc/xsc_dev.h
 create mode 100644 drivers/net/xsc/xsc_ethdev.c
 create mode 100644 drivers/net/xsc/xsc_ethdev.h
 create mode 100644 drivers/net/xsc/xsc_flow.c
 create mode 100644 drivers/net/xsc/xsc_flow.h
 create mode 100644 drivers/net/xsc/xsc_log.h
 create mode 100644 drivers/net/xsc/xsc_rxtx.c
 create mode 100644 drivers/net/xsc/xsc_rxtx.h
 create mode 100644 drivers/net/xsc/xsc_utils.c
 create mode 100644 drivers/net/xsc/xsc_utils.h

-- 
2.25.1


[PATCH 02/19] net/xsc: add log macro

2024-09-06 Thread WanRenyong
Add log macros to print runtime messages and trace functions.

Signed-off-by: WanRenyong 
---
 drivers/net/xsc/xsc_ethdev.c | 11 +
 drivers/net/xsc/xsc_log.h| 44 
 2 files changed, 55 insertions(+)
 create mode 100644 drivers/net/xsc/xsc_log.h

diff --git a/drivers/net/xsc/xsc_ethdev.c b/drivers/net/xsc/xsc_ethdev.c
index 0e48cb76fa..58ceaa3940 100644
--- a/drivers/net/xsc/xsc_ethdev.c
+++ b/drivers/net/xsc/xsc_ethdev.c
@@ -1,3 +1,14 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2024 Yunsilicon Technology Co., Ltd.
  */
+
+#include "xsc_log.h"
+
+RTE_LOG_REGISTER_SUFFIX(xsc_logtype_init, init, NOTICE);
+RTE_LOG_REGISTER_SUFFIX(xsc_logtype_driver, driver, NOTICE);
+#ifdef RTE_ETHDEV_DEBUG_RX
+RTE_LOG_REGISTER_SUFFIX(xsc_logtype_rx, rx, DEBUG);
+#endif
+#ifdef RTE_ETHDEV_DEBUG_TX
+RTE_LOG_REGISTER_SUFFIX(xsc_logtype_tx, tx, DEBUG);
+#endif
diff --git a/drivers/net/xsc/xsc_log.h b/drivers/net/xsc/xsc_log.h
new file mode 100644
index 00..163145ff09
--- /dev/null
+++ b/drivers/net/xsc/xsc_log.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2024 Yunsilicon Technology Co., Ltd.
+ */
+
+#ifndef _XSC_LOG_H_
+#define _XSC_LOG_H_
+
+#include 
+
+extern int xsc_logtype_init;
+extern int xsc_logtype_driver;
+
+#define PMD_INIT_LOG(level, fmt, ...) \
+   rte_log(RTE_LOG_ ## level, xsc_logtype_init, "%s(): " fmt "\n", \
+   __func__, ##__VA_ARGS__)
+
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+
+#ifdef RTE_ETHDEV_DEBUG_RX
+extern int xsc_logtype_rx;
+#define PMD_RX_LOG(level, fmt, ...)\
+   rte_log(RTE_LOG_ ## level, xsc_logtype_rx,  \
+   "%s(): " fmt "\n", __func__, ##__VA_ARGS__)
+#else
+#define PMD_RX_LOG(level, fmt, ...) do { } while (0)
+#endif
+
+#ifdef RTE_ETHDEV_DEBUG_TX
+extern int xsc_logtype_tx;
+#define PMD_TX_LOG(level, fmt, ...)\
+   rte_log(RTE_LOG_ ## level, xsc_logtype_tx,  \
+   "%s(): " fmt "\n", __func__, ##__VA_ARGS__)
+#else
+#define PMD_TX_LOG(level, fmt, ...) do { } while (0)
+#endif
+
+#define PMD_DRV_LOG_RAW(level, fmt, ...) \
+   rte_log(RTE_LOG_ ## level, xsc_logtype_driver, "%s(): " fmt, \
+   __func__, ##__VA_ARGS__)
+
+#define PMD_DRV_LOG(level, fmt, ...) \
+   PMD_DRV_LOG_RAW(level, fmt "\n", ##__VA_ARGS__)
+
+#endif /* _XSC_LOG_H_ */
-- 
2.25.1
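
To show how the macro layering in the patch above expands, here is a self-contained sketch that follows the same shape as PMD_DRV_LOG_RAW and PMD_DRV_LOG but logs through a stand-in function. Everything prefixed demo_ or DEMO_ is hypothetical: demo_log() replaces rte_log() and formats into a buffer so the result can be inspected. The `##__VA_ARGS__` form is a GNU extension accepted by GCC and Clang, as in the patch.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for rte_log(): formats into a buffer. */
static char demo_last_line[256];

static int
demo_log(int level, int logtype, const char *fmt, ...)
{
	va_list ap;
	int ret;

	(void)level;
	(void)logtype;
	va_start(ap, fmt);
	ret = vsnprintf(demo_last_line, sizeof(demo_last_line), fmt, ap);
	va_end(ap);
	return ret;
}

/* Hypothetical level values and log type, for illustration only. */
enum { DEMO_LOG_ERR = 4, DEMO_LOG_DEBUG = 8 };
static int demo_logtype_driver;

/* RAW variant: prefixes the function name, adds no newline. */
#define DEMO_DRV_LOG_RAW(level, fmt, ...) \
	demo_log(DEMO_LOG_ ## level, demo_logtype_driver, "%s(): " fmt, \
		__func__, ##__VA_ARGS__)

/* Convenience variant: appends the newline by string concatenation. */
#define DEMO_DRV_LOG(level, fmt, ...) \
	DEMO_DRV_LOG_RAW(level, fmt "\n", ##__VA_ARGS__)

/* Emits one message and returns the number of characters formatted. */
static int
demo_emit(int queue)
{
	return DEMO_DRV_LOG(ERR, "queue %d failed", queue);
}
```

Calling demo_emit(7) leaves "demo_emit(): queue 7 failed" plus a newline in demo_last_line. Because the newline is added by the macro, callers pass format strings without a trailing "\n".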


[PATCH 01/19] net/xsc: add doc and minimum build framework

2024-09-06 Thread WanRenyong
Add minimum PMD code, doc and build infrastructure for xsc.

Signed-off-by: WanRenyong 
---
 .mailmap |  4 
 MAINTAINERS  |  9 +
 doc/guides/nics/features/xsc.ini |  9 +
 doc/guides/nics/index.rst|  1 +
 doc/guides/nics/xsc.rst  | 31 +++
 drivers/net/meson.build  |  1 +
 drivers/net/xsc/meson.build  | 13 +
 drivers/net/xsc/xsc_ethdev.c |  3 +++
 8 files changed, 71 insertions(+)
 create mode 100644 doc/guides/nics/features/xsc.ini
 create mode 100644 doc/guides/nics/xsc.rst
 create mode 100644 drivers/net/xsc/meson.build
 create mode 100644 drivers/net/xsc/xsc_ethdev.c

diff --git a/.mailmap b/.mailmap
index 09fa253e12..d09ed30e16 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1034,6 +1034,7 @@ Nagadheeraj Rottela 
 Naga Harish K S V 
 Naga Suresh Somarowthu 
 Nalla Pradeep 
+Na Na 
 Na Na 
 Nan Chen 
 Nannan Lu 
@@ -1268,6 +1269,7 @@ Ronak Doshi  
 Ron Beider 
 Ronghua Zhang 
 RongQiang Xie 
+Rong Qian 
 RongQing Li 
 Rongwei Liu 
 Rory Sexton 
@@ -1586,6 +1588,7 @@ Waldemar Dworakowski 
 Walter Heymans 
 Wang Sheng-Hui 
 Wangyu (Eric) 
+WanRenyong 
 Waterman Cao 
 Wathsala Vithanage 
 Weichun Chen 
@@ -1638,6 +1641,7 @@ Xiaonan Zhang 
 Xiao Wang 
 Xiaoxiao Zeng 
 Xiaoxin Peng 
+Xiaoxiong Zhang 
 Xiaoyu Min  
 Xiaoyun Li 
 Xiaoyun Wang 
diff --git a/MAINTAINERS b/MAINTAINERS
index c5a703b5c0..f87d802b24 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -994,6 +994,15 @@ F: drivers/net/txgbe/
 F: doc/guides/nics/txgbe.rst
 F: doc/guides/nics/features/txgbe.ini
 
+Yunsilicon xsc
+M: WanRenyong 
+M: Na Na 
+M: Rong Qian 
+M: Xiaoxiong Zhang 
+F: drivers/net/xsc/
+F: doc/guides/nics/xsc.rst
+F: doc/guides/nics/features/xsc.ini
+
 VMware vmxnet3
 M: Jochen Behrens 
 F: drivers/net/vmxnet3/
diff --git a/doc/guides/nics/features/xsc.ini b/doc/guides/nics/features/xsc.ini
new file mode 100644
index 00..b5c44ce535
--- /dev/null
+++ b/doc/guides/nics/features/xsc.ini
@@ -0,0 +1,9 @@
+;
+; Supported features of the 'xsc' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Linux= Y
+ARMv8= Y
+x86-64   = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index c14bc7988a..9781097a21 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -69,3 +69,4 @@ Network Interface Controller Drivers
 vhost
 virtio
 vmxnet3
+xsc
diff --git a/doc/guides/nics/xsc.rst b/doc/guides/nics/xsc.rst
new file mode 100644
index 00..d34447a259
--- /dev/null
+++ b/doc/guides/nics/xsc.rst
@@ -0,0 +1,31 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+Copyright 2024 Yunsilicon Technology Co., Ltd
+
+XSC Poll Mode Driver
+==
+
+The xsc PMD (**librte_net_xsc**) provides poll mode driver support for
+10/25/50/100/200 Gbps Yunsilicon metaScale Series Network Adapters.
+
+Supported NICs
+--
+
+The following Yunsilicon device models are supported by the same xsc driver:
+
+  - metaScale-200S
+  - metaScale-200
+  - metaScale-100Q
+  - metaScale-50
+
+Prerequisites
+--
+
+- Follow the DPDK :ref:`Getting Started Guide for Linux ` to setup the basic DPDK environment.
+
+- Learning about Yunsilicon metaScale Series NICs using
+  ``_.
+
+Limitations or Known issues
+---
+32bit ARCHs have not been tested and may not be supported.
+Windows and BSD are not supported yet.
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index fb6d34b782..67fbe81861 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -62,6 +62,7 @@ drivers = [
 'vhost',
 'virtio',
 'vmxnet3',
+'xsc',
 ]
 std_deps = ['ethdev', 'kvargs'] # 'ethdev' also pulls in mbuf, net, eal etc
 std_deps += ['bus_pci'] # very many PMDs depend on PCI, so make std
diff --git a/drivers/net/xsc/meson.build b/drivers/net/xsc/meson.build
new file mode 100644
index 00..11cdcf912b
--- /dev/null
+++ b/drivers/net/xsc/meson.build
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Yunsilicon Technology Co., Ltd.
+
+if not is_linux
+build = false
+reason = 'only supported on Linux'
+endif
+
+sources = files(
+'xsc_ethdev.c',
+)
+
+
diff --git a/drivers/net/xsc/xsc_ethdev.c b/drivers/net/xsc/xsc_ethdev.c
new file mode 100644
index 00..0e48cb76fa
--- /dev/null
+++ b/drivers/net/xsc/xsc_ethdev.c
@@ -0,0 +1,3 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2024 Yunsilicon Technology Co., Ltd.
+ */
-- 
2.25.1


[PATCH 04/19] net/xsc: add xsc device init and uninit

2024-09-06 Thread WanRenyong
The XSC device is a low-level device abstraction used to manage
hardware resources and to interact with firmware.

Signed-off-by: WanRenyong 
---
 drivers/net/xsc/meson.build  |  20 +
 drivers/net/xsc/xsc_defs.h   |  23 +
 drivers/net/xsc/xsc_dev.c| 162 +++
 drivers/net/xsc/xsc_dev.h|  34 
 drivers/net/xsc/xsc_ethdev.c |  22 -
 drivers/net/xsc/xsc_ethdev.h |   1 +
 drivers/net/xsc/xsc_utils.c  |  96 +
 drivers/net/xsc/xsc_utils.h  |  14 +++
 8 files changed, 371 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/xsc/xsc_dev.c
 create mode 100644 drivers/net/xsc/xsc_dev.h
 create mode 100644 drivers/net/xsc/xsc_utils.c
 create mode 100644 drivers/net/xsc/xsc_utils.h

diff --git a/drivers/net/xsc/meson.build b/drivers/net/xsc/meson.build
index 11cdcf912b..96b4e59ac4 100644
--- a/drivers/net/xsc/meson.build
+++ b/drivers/net/xsc/meson.build
@@ -8,6 +8,26 @@ endif
 
 sources = files(
 'xsc_ethdev.c',
+'xsc_dev.c',
+'xsc_utils.c',
 )
 
+libnames = ['ibverbs']
+foreach libname:libnames
+lib = dependency('lib' + libname, method : 'pkg-config')
+if lib.found()
+ext_deps += lib
+else
+build = false
+reason = 'missing dependency, "' + lib + '"'
+subdir_done()
+endif
+endforeach
 
+lib = dependency('libxscale', required: false, method : 'pkg-config')
+if lib.found()
+ext_deps += lib
+cflags += '-DHAVE_XSC_DV_PROVIDER=1'
+else
+cflags += '-DHAVE_XSC_DV_PROVIDER=0'
+endif
diff --git a/drivers/net/xsc/xsc_defs.h b/drivers/net/xsc/xsc_defs.h
index b4ede6eca6..97cd61b2d1 100644
--- a/drivers/net/xsc/xsc_defs.h
+++ b/drivers/net/xsc/xsc_defs.h
@@ -8,5 +8,28 @@
 #define XSC_PCI_VENDOR_ID  0x1f67
 #define XSC_PCI_DEV_ID_MS  0x
 
+enum xsc_nic_mode {
+   XSC_NIC_MODE_LEGACY,
+   XSC_NIC_MODE_SWITCHDEV,
+   XSC_NIC_MODE_SOC,
+};
+
+enum xsc_pph_type {
+   XSC_PPH_NONE= 0,
+   XSC_RX_PPH  = 0x1,
+   XSC_TX_PPH  = 0x2,
+   XSC_VFREP_PPH   = 0x4,
+   XSC_UPLINK_PPH  = 0x8,
+};
+
+enum xsc_flow_mode {
+   XSC_FLOW_OFF_HW_ONLY,
+   XSC_FLOW_ON_HW_ONLY,
+   XSC_FLOW_ON_HW_FIRST,
+   XSC_FLOW_HOTSPOT,
+   XSC_FLOW_MODE_NULL = 7,
+   XSC_FLOW_MODE_MAX,
+};
+
 #endif /* XSC_DEFS_H_ */
 
diff --git a/drivers/net/xsc/xsc_dev.c b/drivers/net/xsc/xsc_dev.c
new file mode 100644
index 00..9673049628
--- /dev/null
+++ b/drivers/net/xsc/xsc_dev.c
@@ -0,0 +1,162 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2024 Yunsilicon Technology Co., Ltd.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "xsc_log.h"
+#include "xsc_defs.h"
+#include "xsc_dev.h"
+#include "xsc_utils.h"
+
+#define XSC_DEV_DEF_FLOW_MODE  XSC_FLOW_MODE_NULL
+#define XSC_DEV_CTRL_FILE_FMT  "/dev/yunsilicon/port_ctrl_" PCI_PRI_FMT
+
+static
+void xsc_dev_args_parse(struct xsc_dev *dev, struct rte_devargs *devargs)
+{
+   struct rte_kvargs *kvlist;
+   struct xsc_devargs *xdevargs = &dev->devargs;
+   const char *tmp;
+
+   kvlist = rte_kvargs_parse(devargs->args, NULL);
+   if (kvlist == NULL)
+   return;
+
+   tmp = rte_kvargs_get(kvlist, XSC_PPH_MODE_ARG);
+   if (tmp != NULL)
+   xdevargs->pph_mode = atoi(tmp);
+   else
+   xdevargs->pph_mode = XSC_PPH_NONE;
+   tmp = rte_kvargs_get(kvlist, XSC_NIC_MODE_ARG);
+   if (tmp != NULL)
+   xdevargs->nic_mode = atoi(tmp);
+   else
+   xdevargs->nic_mode = XSC_NIC_MODE_LEGACY;
+   tmp = rte_kvargs_get(kvlist, XSC_FLOW_MODE_ARG);
+   if (tmp != NULL)
+   xdevargs->flow_mode = atoi(tmp);
+   else
+   xdevargs->flow_mode = XSC_DEV_DEF_FLOW_MODE;
+
+   rte_kvargs_free(kvlist);
+}
+
+static int
+xsc_dev_open(struct xsc_dev *dev, struct rte_pci_device *pci_dev)
+{
+   struct ibv_device *ib_dev;
+   char ctrl_file[PATH_MAX];
+   struct rte_pci_addr *pci_addr = &pci_dev->addr;
+   int ret;
+
+   ib_dev = xsc_get_ibv_device(&pci_dev->addr);
+   if (ib_dev == NULL) {
+   PMD_DRV_LOG(ERR, "Could not get ibv device");
+   return -ENODEV;
+   }
+
+   dev->ibv_ctx = ibv_open_device(ib_dev);
+   if (dev->ibv_ctx == NULL) {
+   PMD_DRV_LOG(ERR, "Could not open ibv device: %s", ib_dev->name);
+   return -ENODEV;
+   }
+
+   dev->ibv_pd = ibv_alloc_pd(dev->ibv_ctx);
+   if (dev->ibv_pd == NULL) {
+   PMD_DRV_LOG(ERR, "Failed to create pd:%s", ib_dev->name);
+   ret = -EINVAL;
+   goto alloc_pd_fail;
+   }
+
+   strcpy(dev->ibv_name, ib_dev->name);
+
+   snprintf(ctrl_file, PATH_MAX, XSC_DEV_CTRL_FILE_FMT,
+pci_addr->domain, pci_addr->bus, pci_addr->devid, pci_addr->function);
+
+   ret = op

[PATCH 05/19] net/xsc: add ioctl command interface

2024-09-06 Thread WanRenyong
The ioctl command interface is one of the methods the PMD uses to
interact with firmware. Through the ioctl interface, the PMD sends a
command to the kernel module, the kernel module translates the command
and sends it to firmware, and finally the kernel module returns the
result from firmware to the PMD.
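
The request/response flow described above can be sketched as a fixed header followed by a payload area that is used for both input and output. This is a simplified illustration, not the driver code: struct demo_hdr is a hypothetical, reduced header (the real layout is struct xsc_ioctl_hdr in the patch), and the transport callback stands in for the ioctl() call so the sketch runs without the kernel module.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified header for illustration only. */
struct demo_hdr {
	uint32_t check_field; /* sanity marker checked by the receiver */
	uint16_t opcode;      /* command identifier */
	uint16_t length;      /* payload bytes that follow the header */
	uint32_t error;       /* filled in by the receiver */
};

/* Stand-in for ioctl(fd, cmd, hdr): gets the header plus trailing payload. */
typedef int (*demo_transport_t)(struct demo_hdr *hdr);

static int
demo_cmd(demo_transport_t send, uint16_t opcode,
	const void *in, size_t in_len, void *out, size_t out_len)
{
	/* One buffer serves both directions, so size it for the larger one. */
	size_t data_len = in_len > out_len ? in_len : out_len;
	struct demo_hdr *hdr;
	int ret;

	if (data_len > UINT16_MAX)
		return -1; /* would not fit the 16-bit length field */
	hdr = calloc(1, sizeof(*hdr) + data_len);
	if (hdr == NULL)
		return -1;

	hdr->check_field = 0x01234567;
	hdr->opcode = opcode;
	hdr->length = (uint16_t)data_len;
	if (in != NULL && in_len > 0)
		memcpy(hdr + 1, in, in_len); /* payload starts after the header */

	ret = send(hdr);
	if (ret == 0 && hdr->error != 0)
		ret = (int)hdr->error;
	else if (ret == 0 && out != NULL && out_len > 0)
		memcpy(out, hdr + 1, out_len);

	free(hdr);
	return ret;
}

/* Fake receiver used for illustration: increments every payload byte. */
static int
demo_fake_kernel(struct demo_hdr *hdr)
{
	uint8_t *payload = (uint8_t *)(hdr + 1);
	uint16_t i;

	if (hdr->check_field != 0x01234567)
		return -1;
	for (i = 0; i < hdr->length; i++)
		payload[i]++;
	return 0;
}
```

The range check on data_len is an addition made for this sketch, since the length field here is 16 bits wide. Whether the driver needs an equivalent check is left to the author and reviewers.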

Signed-off-by: WanRenyong 
---
 drivers/net/xsc/meson.build |  1 +
 drivers/net/xsc/xsc_ctrl.c  | 56 
 drivers/net/xsc/xsc_ctrl.h  | 86 +
 3 files changed, 143 insertions(+)
 create mode 100644 drivers/net/xsc/xsc_ctrl.c
 create mode 100644 drivers/net/xsc/xsc_ctrl.h

diff --git a/drivers/net/xsc/meson.build b/drivers/net/xsc/meson.build
index 96b4e59ac4..5c989dba13 100644
--- a/drivers/net/xsc/meson.build
+++ b/drivers/net/xsc/meson.build
@@ -10,6 +10,7 @@ sources = files(
 'xsc_ethdev.c',
 'xsc_dev.c',
 'xsc_utils.c',
+'xsc_ctrl.c',
 )
 
 libnames = ['ibverbs']
diff --git a/drivers/net/xsc/xsc_ctrl.c b/drivers/net/xsc/xsc_ctrl.c
new file mode 100644
index 00..3e37bd914e
--- /dev/null
+++ b/drivers/net/xsc/xsc_ctrl.c
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2024 Yunsilicon Technology Co., Ltd.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "xsc_log.h"
+#include "xsc_dev.h"
+#include "xsc_ctrl.h"
+
+int
+xsc_ioctl(struct xsc_dev *dev, int cmd, int opcode,
+ void *data_in, int in_len, void *data_out, int out_len)
+{
+   struct xsc_ioctl_hdr *hdr;
+   int data_len = RTE_MAX(in_len, out_len);
+   int alloc_len = sizeof(struct xsc_ioctl_hdr) + data_len;
+   int ret = 0;
+
+   hdr = rte_zmalloc(NULL, alloc_len, RTE_CACHE_LINE_SIZE);
+   if (hdr == NULL) {
+   PMD_DRV_LOG(ERR, "Failed to allocate xsc ioctl cmd memory");
+   return -ENOMEM;
+   }
+
+   hdr->check_field = XSC_IOCTL_CHECK_FIELD;
+   hdr->attr.opcode = opcode;
+   hdr->attr.length = data_len;
+   hdr->attr.error = 0;
+
+   if (data_in != NULL && in_len > 0)
+   rte_memcpy(hdr + 1, data_in, in_len);
+
+   ret = ioctl(dev->ctrl_fd, cmd, hdr);
+   if (ret == 0) {
+   if (hdr->attr.error != 0)
+   ret = hdr->attr.error;
+   else if (data_out != NULL && out_len > 0)
+   rte_memcpy(data_out, hdr + 1, out_len);
+   }
+
+   rte_free(hdr);
+   return ret;
+}
diff --git a/drivers/net/xsc/xsc_ctrl.h b/drivers/net/xsc/xsc_ctrl.h
new file mode 100644
index 00..d343e1b1a7
--- /dev/null
+++ b/drivers/net/xsc/xsc_ctrl.h
@@ -0,0 +1,86 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2024 Yunsilicon Technology Co., Ltd.
+ */
+
+#ifndef _XSC_CTRL_H_
+#define _XSC_CTRL_H_
+
+#include 
+
+#define XSC_IOCTL_CHECK_FIELD  0x01234567
+
+#define XSC_IOCTL_MAGIC0x1b
+#define XSC_IOCTL_CMDQ \
+   _IOWR(XSC_IOCTL_MAGIC, 1, struct xsc_ioctl_hdr)
+#define XSC_IOCTL_DRV_GET \
+   _IOR(XSC_IOCTL_MAGIC, 2, struct xsc_ioctl_hdr)
+#define XSC_IOCTL_CMDQ_RAW \
+   _IOWR(XSC_IOCTL_MAGIC, 5, struct xsc_ioctl_hdr)
+
+enum xsc_ioctl_opcode {
+   XSC_IOCTL_GET_HW_INFO   = 0x100,
+};
+
+enum xsc_ioctl_opmod {
+   XSC_IOCTL_OP_GET_LOCAL,
+};
+
+struct xsc_ioctl_attr {
+   uint16_t opcode; /* ioctl cmd */
+   uint16_t length; /* data length */
+   uint32_t error;  /* ioctl error info */
+   uint8_t data[0]; /* specific table info */
+};
+
+struct xsc_ioctl_hdr {
+   uint32_t check_field;
+   uint32_t domain;
+   uint32_t bus;
+   uint32_t devfn;
+   struct xsc_ioctl_attr attr;
+};
+
+struct xsc_ioctl_data_tl {
+   uint16_t table;
+   uint16_t opmod;
+   uint16_t length;
+   uint16_t rsvd;
+};
+
+struct xsc_ioctl_get_hwinfo {
+   uint32_t domain;
+   uint32_t bus;
+   uint32_t devfn;
+   uint32_t pcie_no;
+   uint32_t func_id;
+   uint32_t pcie_host;
+   uint32_t mac_phy_port;
+   uint32_t funcid_to_logic_port_off;
+   uint16_t lag_id;
+   uint16_t raw_qp_id_base;
+   uint16_t raw_rss_qp_id_base;
+   uint16_t pf0_vf_funcid_base;
+   uint16_t pf0_vf_funcid_top;
+   uint16_t pf1_vf_funcid_base;
+   uint16_t pf1_vf_funcid_top;
+   uint16_t pcie0_pf_funcid_base;
+   uint16_t pcie0_pf_funcid_top;
+   uint16_t pcie1_pf_funcid_base;
+   uint16_t pcie1_pf_funcid_top;
+   uint16_t lag_port_start;
+   uint16_t raw_tpe_qp_num;
+   int send_seg_num;
+   int recv_seg_num;
+   uint8_t on_chip_tbl_vld;
+   uint8_t dma_rw_tbl_vld;
+   uint8_t pct_compress_vld;
+   uint32_t chip_version;
+   uint32_t hca_core_clock;
+   uint8_t mac_bit;
+   uint8_t esw_mode;
+};
+
+int xsc_ioctl(struct xsc_dev *dev, int cmd, int opcode,
+ void *data_in, int in_len, void *data_out, int out_len);
+
+#

[PATCH 06/19] net/xsc: initialize hardware information

2024-09-06 Thread WanRenyong
Hardware information is retrieved with an ioctl command. It contains
information about the xsc device as well as common information about
the NIC board.

Signed-off-by: WanRenyong 
---
 drivers/net/xsc/xsc_dev.c | 63 +++
 drivers/net/xsc/xsc_dev.h | 32 
 2 files changed, 95 insertions(+)

diff --git a/drivers/net/xsc/xsc_dev.c b/drivers/net/xsc/xsc_dev.c
index 9673049628..1eb68ac95d 100644
--- a/drivers/net/xsc/xsc_dev.c
+++ b/drivers/net/xsc/xsc_dev.c
@@ -18,10 +18,64 @@
 #include "xsc_defs.h"
 #include "xsc_dev.h"
 #include "xsc_utils.h"
+#include "xsc_ctrl.h"
 
 #define XSC_DEV_DEF_FLOW_MODE  XSC_FLOW_MODE_NULL
 #define XSC_DEV_CTRL_FILE_FMT  "/dev/yunsilicon/port_ctrl_" PCI_PRI_FMT
 
+static int xsc_hwinfo_init(struct xsc_dev *dev)
+{
+   struct {
+   struct xsc_ioctl_data_tl tl;
+   struct xsc_ioctl_get_hwinfo hwinfo;
+   } data;
+   struct xsc_ioctl_get_hwinfo *info = &data.hwinfo;
+   int data_len;
+   int ret;
+
+   PMD_INIT_FUNC_TRACE();
+
+   data_len = sizeof(data);
+   data.tl.opmod = XSC_IOCTL_OP_GET_LOCAL;
+   ret = xsc_ioctl(dev, XSC_IOCTL_DRV_GET, XSC_IOCTL_GET_HW_INFO, &data, data_len,
+   &data, data_len);
+   if (ret != 0) {
+   PMD_DRV_LOG(ERR, "Failed to get hardware info");
+   return ret;
+   }
+
+   dev->hwinfo.valid = 1;
+   dev->hwinfo.pcie_no = info->pcie_no;
+   dev->hwinfo.func_id = info->func_id;
+   dev->hwinfo.pcie_host = info->pcie_host;
+   dev->hwinfo.mac_phy_port = info->mac_phy_port;
+   dev->hwinfo.funcid_to_logic_port_off = info->funcid_to_logic_port_off;
+   dev->hwinfo.lag_id = info->lag_id;
+   dev->hwinfo.raw_qp_id_base = info->raw_qp_id_base;
+   dev->hwinfo.raw_rss_qp_id_base = info->raw_rss_qp_id_base;
+   dev->hwinfo.pf0_vf_funcid_base = info->pf0_vf_funcid_base;
+   dev->hwinfo.pf0_vf_funcid_top = info->pf0_vf_funcid_top;
+   dev->hwinfo.pf1_vf_funcid_base = info->pf1_vf_funcid_base;
+   dev->hwinfo.pf1_vf_funcid_top = info->pf1_vf_funcid_top;
+   dev->hwinfo.pcie0_pf_funcid_base = info->pcie0_pf_funcid_base;
+   dev->hwinfo.pcie0_pf_funcid_top = info->pcie0_pf_funcid_top;
+   dev->hwinfo.pcie1_pf_funcid_base = info->pcie1_pf_funcid_base;
+   dev->hwinfo.pcie1_pf_funcid_top = info->pcie1_pf_funcid_top;
+   dev->hwinfo.lag_port_start = info->lag_port_start;
+   dev->hwinfo.raw_tpe_qp_num = info->raw_tpe_qp_num;
+   dev->hwinfo.send_seg_num = info->send_seg_num;
+   dev->hwinfo.recv_seg_num = info->recv_seg_num;
+   dev->hwinfo.on_chip_tbl_vld = info->on_chip_tbl_vld;
+   dev->hwinfo.dma_rw_tbl_vld = info->dma_rw_tbl_vld;
+   dev->hwinfo.pct_compress_vld = info->pct_compress_vld;
+   dev->hwinfo.chip_version = info->chip_version;
+   dev->hwinfo.hca_core_clock = info->hca_core_clock;
+   dev->hwinfo.mac_bit = info->mac_bit;
+   dev->hwinfo.esw_mode = info->esw_mode;
+
+   return 0;
+}
+
 static
 void xsc_dev_args_parse(struct xsc_dev *dev, struct rte_devargs *devargs)
 {
@@ -142,11 +196,19 @@ xsc_dev_init(struct rte_pci_device *pci_dev, struct xsc_dev **dev)
goto dev_open_fail;
}
 
+   ret = xsc_hwinfo_init(d);
+   if (ret) {
+   PMD_DRV_LOG(ERR, "Failed to initialize hardware info");
+   goto hwinfo_init_fail;
+   }
+
d->pci_dev = pci_dev;
*dev = d;
 
return 0;
 
+hwinfo_init_fail:
+   xsc_dev_close(d);
 dev_open_fail:
rte_free(d);
return ret;
diff --git a/drivers/net/xsc/xsc_dev.h b/drivers/net/xsc/xsc_dev.h
index ce9dd65400..5f0e911b42 100644
--- a/drivers/net/xsc/xsc_dev.h
+++ b/drivers/net/xsc/xsc_dev.h
@@ -11,6 +11,37 @@
 #define XSC_NIC_MODE_ARG "nic_mode"
 #define XSC_FLOW_MODE_ARG "flow_mode"
 
+struct xsc_hwinfo {
+   uint8_t valid; /* 1: current phy info is valid, 0 : invalid */
+   uint32_t pcie_no; /* pcie number , 0 or 1 */
+   uint32_t func_id; /* pf glb func id */
+   uint32_t pcie_host; /* host pcie number */
+   uint32_t mac_phy_port; /* mac port */
+   uint32_t funcid_to_logic_port_off; /* port func id offset  */
+   uint16_t lag_id;
+   uint16_t raw_qp_id_base;
+   uint16_t raw_rss_qp_id_base;
+   uint16_t pf0_vf_funcid_base;
+   uint16_t pf0_vf_funcid_top;
+   uint16_t pf1_vf_funcid_base;
+   uint16_t pf1_vf_funcid_top;
+   uint16_t pcie0_pf_funcid_base;
+   uint16_t pcie0_pf_funcid_top;
+   uint16_t pcie1_pf_funcid_base;
+   uint16_t pcie1_pf_funcid_top;
+   uint16_t lag_port_start;
+   uint16_t raw_tpe_qp_num;
+   int send_seg_num;
+   int recv_seg_num;
+   uint8_t on_chip_tbl_vld;
+   uint8_t dma_rw_tbl_vld;
+   uint8_t pct_compress_vld;
+   uint32_t chip_version;
+   uint32_t hca_core_clock;
+   uint8_t mac_bit;
+   
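
Review note on the error path in xsc_dev_init() above: the labels unwind
in reverse order of acquisition, so a failure in xsc_hwinfo_init() closes
the device before freeing it. A minimal sketch of that pattern follows.
Every demo_* name is a hypothetical stand-in, not the driver's API, and
calloc()/free() stand in for rte_zmalloc()/rte_free().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xsc_dev. */
struct demo_dev {
	int opened;   /* set by demo_open(), cleared by demo_close() */
	int hw_valid; /* set once hardware info has been read */
};

/* Counts cleanup calls so a caller can observe the unwind. */
static int demo_close_calls;

static int demo_open(struct demo_dev *d)
{
	d->opened = 1;
	return 0;
}

static void demo_close(struct demo_dev *d)
{
	d->opened = 0;
	demo_close_calls++;
}

/* Fails on request to exercise the error path. */
static int demo_hwinfo_init(struct demo_dev *d, int fail)
{
	if (fail)
		return -EIO;
	d->hw_valid = 1;
	return 0;
}

int demo_dev_init(struct demo_dev **out, int fail_hwinfo)
{
	struct demo_dev *d;
	int ret;

	assert(out != NULL);
	d = calloc(1, sizeof(*d));
	if (d == NULL)
		return -ENOMEM;

	ret = demo_open(d);
	if (ret != 0)
		goto open_fail;

	ret = demo_hwinfo_init(d, fail_hwinfo);
	if (ret != 0)
		goto hwinfo_fail; /* the goto leaves the block; nothing after it runs */

	*out = d;
	return 0;

hwinfo_fail:
	demo_close(d); /* undo demo_open() */
open_fail:
	free(d);       /* undo calloc() */
	return ret;
}
```

Each label undoes exactly one earlier step, so adding a new init step only
needs one new label placed above the existing ones.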

[PATCH 03/19] net/xsc: add PCI device probe and remove

2024-09-06 Thread WanRenyong
Support the following Yunsilicon NICs to be probed:

- metaScale-200
- metaScale-200S
- metaScale-50
- metaScale-100Q

Signed-off-by: WanRenyong 
Signed-off-by: Na Na 
---
 drivers/net/xsc/xsc_defs.h   | 12 ++
 drivers/net/xsc/xsc_ethdev.c | 74 
 drivers/net/xsc/xsc_ethdev.h | 16 
 3 files changed, 102 insertions(+)
 create mode 100644 drivers/net/xsc/xsc_defs.h
 create mode 100644 drivers/net/xsc/xsc_ethdev.h

diff --git a/drivers/net/xsc/xsc_defs.h b/drivers/net/xsc/xsc_defs.h
new file mode 100644
index 00..b4ede6eca6
--- /dev/null
+++ b/drivers/net/xsc/xsc_defs.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2024 Yunsilicon Technology Co., Ltd.
+ */
+
+#ifndef XSC_DEFS_H_
+#define XSC_DEFS_H_
+
+#define XSC_PCI_VENDOR_ID  0x1f67
+#define XSC_PCI_DEV_ID_MS  0x
+
+#endif /* XSC_DEFS_H_ */
+
diff --git a/drivers/net/xsc/xsc_ethdev.c b/drivers/net/xsc/xsc_ethdev.c
index 58ceaa3940..8f4d539848 100644
--- a/drivers/net/xsc/xsc_ethdev.c
+++ b/drivers/net/xsc/xsc_ethdev.c
@@ -2,7 +2,81 @@
  * Copyright 2024 Yunsilicon Technology Co., Ltd.
  */
 
+#include 
+
 #include "xsc_log.h"
+#include "xsc_defs.h"
+#include "xsc_ethdev.h"
+
+static int
+xsc_ethdev_init(struct rte_eth_dev *eth_dev)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(eth_dev);
+
+   PMD_INIT_FUNC_TRACE();
+
+   priv->eth_dev = eth_dev;
+   priv->pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
+
+   return 0;
+}
+
+static int
+xsc_ethdev_uninit(struct rte_eth_dev *eth_dev)
+{
+   RTE_SET_USED(eth_dev);
+   PMD_INIT_FUNC_TRACE();
+
+   return 0;
+}
+
+static int
+xsc_ethdev_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+struct rte_pci_device *pci_dev)
+{
+   int ret;
+
+   PMD_INIT_FUNC_TRACE();
+
+   ret = rte_eth_dev_pci_generic_probe(pci_dev,
+   sizeof(struct xsc_ethdev_priv),
+   xsc_ethdev_init);
+   if (ret) {
+   PMD_DRV_LOG(ERR, "Failed to probe ethdev: %s", pci_dev->name);
+   return ret;
+   }
+
+   return 0;
+}
+
+static int
+xsc_ethdev_pci_remove(struct rte_pci_device *pci_dev)
+{
+   int ret;
+
+   PMD_INIT_FUNC_TRACE();
+
+   ret = rte_eth_dev_pci_generic_remove(pci_dev, xsc_ethdev_uninit);
+   if (ret) {
+   PMD_DRV_LOG(ERR, "Could not remove ethdev: %s", pci_dev->name);
+   return ret;
+   }
+
+   return 0;
+}
+
+static const struct rte_pci_id xsc_ethdev_pci_id_map[] = {
+   { RTE_PCI_DEVICE(XSC_PCI_VENDOR_ID, XSC_PCI_DEV_ID_MS) },
+};
+
+static struct rte_pci_driver xsc_ethdev_pci_driver = {
+   .id_table  = xsc_ethdev_pci_id_map,
+   .probe = xsc_ethdev_pci_probe,
+   .remove = xsc_ethdev_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_xsc, xsc_ethdev_pci_driver);
+RTE_PMD_REGISTER_PCI_TABLE(net_xsc, xsc_ethdev_pci_id_map);
 
 RTE_LOG_REGISTER_SUFFIX(xsc_logtype_init, init, NOTICE);
 RTE_LOG_REGISTER_SUFFIX(xsc_logtype_driver, driver, NOTICE);
diff --git a/drivers/net/xsc/xsc_ethdev.h b/drivers/net/xsc/xsc_ethdev.h
new file mode 100644
index 00..75aa34dc63
--- /dev/null
+++ b/drivers/net/xsc/xsc_ethdev.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2024 Yunsilicon Technology Co., Ltd.
+ */
+
+#ifndef _XSC_ETHDEV_H_
+#define _XSC_ETHDEV_H_
+
+struct xsc_ethdev_priv {
+   struct rte_eth_dev *eth_dev;
+   struct rte_pci_device *pci_dev;
+};
+
+#define TO_XSC_ETHDEV_PRIV(dev) \
+   ((struct xsc_ethdev_priv *)(dev)->data->dev_private)
+
+#endif /* _XSC_ETHDEV_H_ */
-- 
2.25.1
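
Review note on xsc_ethdev_pci_id_map: as far as I know, the DPDK PCI bus
walks a driver's id_table until it reaches an entry whose vendor_id is 0,
so tables are normally closed with a zeroed sentinel entry, and the table
as posted has none. The sketch below shows the idea with a simplified,
hypothetical struct demo_pci_id instead of struct rte_pci_id. The device
ID 0x1234 is a placeholder, because the real value is elided in this
archive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for struct rte_pci_id. */
struct demo_pci_id {
	uint16_t vendor_id;
	uint16_t device_id;
};

#define DEMO_VENDOR_ID 0x1f67 /* same value as XSC_PCI_VENDOR_ID in the patch */
#define DEMO_DEVICE_ID 0x1234 /* placeholder, not the real device ID */

static const struct demo_pci_id demo_id_map[] = {
	{ DEMO_VENDOR_ID, DEMO_DEVICE_ID },
	{ 0, 0 }, /* sentinel: vendor_id == 0 ends the table */
};

/* Walk the table until the sentinel; return the matching entry or NULL. */
const struct demo_pci_id *
demo_pci_match(const struct demo_pci_id *table, uint16_t vendor, uint16_t device)
{
	assert(table != NULL);
	for (; table->vendor_id != 0; table++) {
		if (table->vendor_id == vendor && table->device_id == device)
			return table;
	}
	return NULL;
}
```

Without the sentinel, a walk like this one would read past the end of the
array for any device that does not match the first entry.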


[PATCH 09/19] net/xsc: initial representor eth device

2024-09-06 Thread WanRenyong
Initialize xsc eth device private data.

Signed-off-by: WanRenyong 
---
 drivers/net/xsc/xsc_defs.h   |   2 +-
 drivers/net/xsc/xsc_dev.h|   3 +
 drivers/net/xsc/xsc_ethdev.c |  64 +
 drivers/net/xsc/xsc_ethdev.h |  30 ++
 drivers/net/xsc/xsc_utils.c  | 105 +++
 drivers/net/xsc/xsc_utils.h  |   8 ++-
 6 files changed, 210 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xsc/xsc_defs.h b/drivers/net/xsc/xsc_defs.h
index 8cb67ed2e1..7dc57e5717 100644
--- a/drivers/net/xsc/xsc_defs.h
+++ b/drivers/net/xsc/xsc_defs.h
@@ -10,7 +10,7 @@
 
 #define XSC_VFREP_BASE_LOGICAL_PORT 1081
 
-
+#define XSC_MAX_MAC_ADDRESSES 3
 
 enum xsc_nic_mode {
XSC_NIC_MODE_LEGACY,
diff --git a/drivers/net/xsc/xsc_dev.h b/drivers/net/xsc/xsc_dev.h
index 93ab1e24fe..f77551f1c5 100644
--- a/drivers/net/xsc/xsc_dev.h
+++ b/drivers/net/xsc/xsc_dev.h
@@ -15,6 +15,9 @@
 
 #define XSC_DEV_REPR_PORT  0
 
+#define FUNCID_TYPE_MASK 0x1c000
+#define FUNCID_MASK 0x3fff
+
 struct xsc_hwinfo {
uint8_t valid; /* 1: current phy info is valid, 0 : invalid */
uint32_t pcie_no; /* pcie number , 0 or 1 */
diff --git a/drivers/net/xsc/xsc_ethdev.c b/drivers/net/xsc/xsc_ethdev.c
index d6efc3c9a0..aacce8b90d 100644
--- a/drivers/net/xsc/xsc_ethdev.c
+++ b/drivers/net/xsc/xsc_ethdev.c
@@ -8,15 +8,79 @@
 #include "xsc_defs.h"
 #include "xsc_dev.h"
 #include "xsc_ethdev.h"
+#include "xsc_utils.h"
+
+#include "xsc_ctrl.h"
+
+const struct eth_dev_ops xsc_dev_ops = {
+};
 
 static int
 xsc_ethdev_init_one_representor(struct rte_eth_dev *eth_dev, void *init_params)
 {
struct xsc_repr_port *repr_port = (struct xsc_repr_port *)init_params;
struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(eth_dev);
+   struct xsc_dev_config *config = &priv->config;
+   struct rte_ether_addr mac;
 
priv->repr_port = repr_port;
repr_port->drv_data = eth_dev;
+   priv->xdev = repr_port->xdev;
+   priv->mtu = RTE_ETHER_MTU;
+   priv->funcid_type = (repr_port->info.funcid & FUNCID_TYPE_MASK) >> 14;
+   priv->funcid = repr_port->info.funcid & FUNCID_MASK;
+   if (repr_port->info.port_type == XSC_PORT_TYPE_UPLINK ||
+   repr_port->info.port_type == XSC_PORT_TYPE_UPLINK_BOND)
+   priv->eth_type = RTE_ETH_REPRESENTOR_PF;
+   else
+   priv->eth_type = RTE_ETH_REPRESENTOR_VF;
+   priv->representor_id = repr_port->info.repr_id;
+   priv->dev_data = eth_dev->data;
+   priv->ifindex = repr_port->info.ifindex;
+
+   eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
+   eth_dev->data->mac_addrs = priv->mac;
+   if (rte_is_zero_ether_addr(eth_dev->data->mac_addrs)) {
+   if (priv->ifindex > 0) {
+   int ret = xsc_get_mac(mac.addr_bytes, priv->ifindex);
+   if (ret != 0) {
+   PMD_DRV_LOG(ERR, "port %u cannot get MAC address",
+   eth_dev->data->port_id);
+   return -ENODEV;
+   }
+   } else {
+   rte_eth_random_addr(mac.addr_bytes);
+   }
+   }
+
+   xsc_mac_addr_add(eth_dev, &mac, 0);
+
+   if (priv->ifindex > 0)
+   xsc_get_mtu(&priv->mtu, priv->ifindex);
+
+   config->hw_csum = 1;
+
+   config->pph_flag = priv->xdev->devargs.pph_mode;
+   if ((config->pph_flag & XSC_TX_PPH) != 0) {
+   config->tso = 0;
+   } else {
+   config->tso = 1;
+   if (config->tso)
+   config->tso_max_payload_sz = 1500;
+   }
+
+   priv->representor = !!priv->eth_type;
+   if (priv->representor) {
+   eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
+   eth_dev->data->representor_id = priv->representor_id;
+   eth_dev->data->backer_port_id = eth_dev->data->port_id;
+   }
+   eth_dev->dev_ops = &xsc_dev_ops;
+
+   eth_dev->rx_pkt_burst = rte_eth_pkt_burst_dummy;
+   eth_dev->tx_pkt_burst = rte_eth_pkt_burst_dummy;
+
+   rte_eth_dev_probing_finish(eth_dev);
 
return 0;
 }
diff --git a/drivers/net/xsc/xsc_ethdev.h b/drivers/net/xsc/xsc_ethdev.h
index a05a63193c..7c7e71d618 100644
--- a/drivers/net/xsc/xsc_ethdev.h
+++ b/drivers/net/xsc/xsc_ethdev.h
@@ -5,11 +5,41 @@
 #ifndef _XSC_ETHDEV_H_
 #define _XSC_ETHDEV_H_
 
+struct xsc_dev_config {
+   uint8_t pph_flag;
+   unsigned int hw_csum:1;
+   unsigned int tso:1;
+   unsigned int tso_max_payload_sz;
+};
+
 struct xsc_ethdev_priv {
struct rte_eth_dev *eth_dev;
struct rte_pci_device *pci_dev;
struct xsc_dev *xdev;
struct xsc_repr_port *repr_port;
+   struct xsc_dev_config config;
+   struct rte_eth_dev_data *dev_data;
+   struct rte_ether_addr mac[XSC_MAX_MAC_ADDRESSES];
+   struct rte_eth_rss_co
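
Review note on the funcid handling in xsc_ethdev_init_one_representor():
the code reads the function type from the bits covered by
FUNCID_TYPE_MASK (shifted down by 14) and the function index from the low
14 bits covered by FUNCID_MASK. A small sketch of that packing, assuming
the two masks from the patch. FUNCID_TYPE_SHIFT and the demo_* helpers are
hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FUNCID_TYPE_MASK  0x1c000 /* from the patch: bits 14..16 */
#define FUNCID_MASK       0x3fff  /* from the patch: bits 0..13 */
#define FUNCID_TYPE_SHIFT 14      /* hypothetical name for the literal 14 */

/* Combine a function type and a function index into one funcid word. */
static inline uint32_t demo_funcid_pack(uint32_t type, uint32_t id)
{
	assert(type <= (FUNCID_TYPE_MASK >> FUNCID_TYPE_SHIFT));
	return ((type << FUNCID_TYPE_SHIFT) & FUNCID_TYPE_MASK) |
	       (id & FUNCID_MASK);
}

/* Extract the type field, as the patch does for priv->funcid_type. */
static inline uint32_t demo_funcid_type(uint32_t funcid)
{
	return (funcid & FUNCID_TYPE_MASK) >> FUNCID_TYPE_SHIFT;
}

/* Extract the index field, as the patch does for priv->funcid. */
static inline uint32_t demo_funcid_id(uint32_t funcid)
{
	return funcid & FUNCID_MASK;
}
```

A named shift constant keeps the pack and unpack sides in sync if the
field layout ever changes.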

[PATCH 10/19] net/xsc: add ethdev configure and rxtx queue setup ops

2024-09-06 Thread WanRenyong
Implement xsc ethdev configure, Rx and Tx queue setup functions.

Signed-off-by: WanRenyong 
---
 drivers/net/xsc/xsc_ethdev.c | 171 +++
 drivers/net/xsc/xsc_ethdev.h |   6 ++
 drivers/net/xsc/xsc_rxtx.h   | 115 +++
 3 files changed, 292 insertions(+)
 create mode 100644 drivers/net/xsc/xsc_rxtx.h

diff --git a/drivers/net/xsc/xsc_ethdev.c b/drivers/net/xsc/xsc_ethdev.c
index aacce8b90d..5ad9567eb3 100644
--- a/drivers/net/xsc/xsc_ethdev.c
+++ b/drivers/net/xsc/xsc_ethdev.c
@@ -11,8 +11,179 @@
 #include "xsc_utils.h"
 
 #include "xsc_ctrl.h"
+#include "xsc_rxtx.h"
+
+static int
+xsc_rss_modify_cmd(struct xsc_ethdev_priv *priv, uint8_t *rss_key,
+  uint8_t rss_key_len)
+{
+   return 0;
+}
+
+static int
+xsc_ethdev_rss_hash_update(struct rte_eth_dev *dev,
+  struct rte_eth_rss_conf *rss_conf)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(dev);
+   int ret = 0;
+
+   if (rss_conf->rss_key_len > XSC_RSS_HASH_KEY_LEN ||
+   rss_conf->rss_key == NULL) {
+   PMD_DRV_LOG(ERR, "Xsc pmd key len is %d bigger than %d",
+   rss_conf->rss_key_len, XSC_RSS_HASH_KEY_LEN);
+   return -EINVAL;
+   }
+
+   ret = xsc_rss_modify_cmd(priv, rss_conf->rss_key, rss_conf->rss_key_len);
+   if (ret == 0) {
+   rte_memcpy(priv->rss_conf.rss_key, rss_conf->rss_key,
+   rss_conf->rss_key_len);
+   priv->rss_conf.rss_key_len = rss_conf->rss_key_len;
+   priv->rss_conf.rss_hf = rss_conf->rss_hf;
+   }
+
+   return ret;
+}
+
+static int
+xsc_ethdev_configure(struct rte_eth_dev *dev)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(dev);
+   struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+   int ret;
+   struct rte_eth_rss_conf *rss_conf;
+
+   priv->num_sq = dev->data->nb_tx_queues;
+   priv->num_rq = dev->data->nb_rx_queues;
+
+   if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
+   dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
+
+   if (priv->rss_conf.rss_key == NULL) {
+   priv->rss_conf.rss_key = rte_zmalloc(NULL, XSC_RSS_HASH_KEY_LEN,
+   RTE_CACHE_LINE_SIZE);
+   if (priv->rss_conf.rss_key == NULL) {
+   PMD_DRV_LOG(ERR, "Failed to alloc rss_key");
+   rte_errno = ENOMEM;
+   ret = -rte_errno;
+   goto error;
+   }
+   priv->rss_conf.rss_key_len = XSC_RSS_HASH_KEY_LEN;
+   }
+
+   if (dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key != NULL) {
+   rss_conf = &dev->data->dev_conf.rx_adv_conf.rss_conf;
+   ret = xsc_ethdev_rss_hash_update(dev, rss_conf);
+   if (ret != 0) {
+   PMD_DRV_LOG(ERR, "Xsc pmd set rss key error!");
+   rte_errno = ENOEXEC;
+   goto error;
+   }
+   }
+
+   if (rxmode->offloads & RTE_ETH_RX_OFFLOAD_VLAN_FILTER) {
+   PMD_DRV_LOG(ERR, "xsc pmd does not support vlan filter now!");
+   rte_errno = EINVAL;
+   goto error;
+   }
+
+   if (rxmode->offloads & RTE_ETH_RX_OFFLOAD_VLAN_STRIP) {
+   PMD_DRV_LOG(ERR, "xsc pmd does not support vlan strip now!");
+   rte_errno = EINVAL;
+   goto error;
+   }
+
+   priv->txqs = (void *)dev->data->tx_queues;
+   priv->rxqs = (void *)dev->data->rx_queues;
+   return 0;
+
+error:
+   return -rte_errno;
+}
+
+static int
+xsc_ethdev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+ uint32_t socket, const struct rte_eth_rxconf *conf,
+ struct rte_mempool *mp)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(dev);
+   struct xsc_rxq_data *rxq_data = NULL;
+   uint16_t desc_n;
+   uint16_t rx_free_thresh;
+   uint64_t offloads = conf->offloads |
+   dev->data->dev_conf.rxmode.offloads;
+
+   desc = (desc > XSC_MAX_DESC_NUMBER) ? XSC_MAX_DESC_NUMBER : desc;
+   desc_n = desc;
+
+   if (!rte_is_power_of_2(desc))
+   desc_n = 1 << rte_log2_u32(desc);
+
+   rxq_data = rte_malloc_socket(NULL, sizeof(*rxq_data) + desc_n * sizeof(struct rte_mbuf *),
+   RTE_CACHE_LINE_SIZE, socket);
+   if (rxq_data == NULL) {
+   PMD_DRV_LOG(ERR, "Port %u create rxq idx %d failure",
+   dev->data->port_id, idx);
+   rte_errno = ENOMEM;
+   return -rte_errno;
+   }
+   rxq_data->idx = idx;
+   rxq_data->priv = priv;
+   (*priv->rxqs)[idx] = rxq_data;
+
+   rx_free_thresh = (conf
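
Review note on the offload checks in xsc_ethdev_configure(): testing one
offload bit needs a bitwise AND. A logical AND is true whenever any
offload at all is set, which would reject every non-empty configuration.
The sketch below contrasts the two. The DEMO_* flag values are
hypothetical; the real RTE_ETH_RX_OFFLOAD_* bits come from rte_ethdev.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_RX_OFFLOAD_VLAN_STRIP  (UINT64_C(1) << 0)
#define DEMO_RX_OFFLOAD_CHECKSUM    (UINT64_C(1) << 1)
#define DEMO_RX_OFFLOAD_VLAN_FILTER (UINT64_C(1) << 9)

/* Bitwise test: true only if one of the unsupported bits is requested. */
static bool demo_has_unsupported(uint64_t offloads)
{
	const uint64_t unsupported = DEMO_RX_OFFLOAD_VLAN_STRIP |
				     DEMO_RX_OFFLOAD_VLAN_FILTER;

	return (offloads & unsupported) != 0;
}

/* Logical test, shown for contrast: true if both operands are non-zero. */
static bool demo_logical_and(uint64_t offloads, uint64_t flag)
{
	assert(flag != 0);
	return offloads && flag;
}
```

With only the checksum bit requested, demo_has_unsupported() returns false
while demo_logical_and() against the VLAN filter flag returns true.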

[PATCH 08/19] net/xsc: create eth devices for representor ports

2024-09-06 Thread WanRenyong
Each representor port is exposed as an rte ethernet device.

Signed-off-by: WanRenyong 
---
 drivers/net/xsc/xsc_ethdev.c | 87 
 drivers/net/xsc/xsc_ethdev.h |  1 +
 2 files changed, 88 insertions(+)

diff --git a/drivers/net/xsc/xsc_ethdev.c b/drivers/net/xsc/xsc_ethdev.c
index 6a33cbb2cd..d6efc3c9a0 100644
--- a/drivers/net/xsc/xsc_ethdev.c
+++ b/drivers/net/xsc/xsc_ethdev.c
@@ -9,6 +9,83 @@
 #include "xsc_dev.h"
 #include "xsc_ethdev.h"
 
+static int
+xsc_ethdev_init_one_representor(struct rte_eth_dev *eth_dev, void *init_params)
+{
+   struct xsc_repr_port *repr_port = (struct xsc_repr_port *)init_params;
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(eth_dev);
+
+   priv->repr_port = repr_port;
+   repr_port->drv_data = eth_dev;
+
+   return 0;
+}
+
+static int
+xsc_ethdev_init_representors(struct rte_eth_dev *eth_dev)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(eth_dev);
+   struct rte_eth_devargs eth_da = { .nb_representor_ports = 0 };
+   struct rte_device *dev;
+   struct xsc_dev *xdev;
+   struct xsc_repr_port *repr_port;
+   char name[RTE_ETH_NAME_MAX_LEN];
+   int i;
+   int ret;
+
+   PMD_INIT_FUNC_TRACE();
+
+   dev = &priv->pci_dev->device;
+   if (dev->devargs != NULL) {
+   ret = rte_eth_devargs_parse(dev->devargs->args, &eth_da, 1);
+   if (ret < 0) {
+   PMD_DRV_LOG(ERR, "Failed to parse device arguments: %s",
+   dev->devargs->args);
+   return -EINVAL;
+   }
+   }
+
+   xdev = priv->xdev;
+   ret = xsc_repr_ports_probe(xdev, eth_da.nb_representor_ports, RTE_MAX_ETHPORTS);
+   if (ret != 0) {
+   PMD_DRV_LOG(ERR, "Failed to probe %d xsc device representors",
+   eth_da.nb_representor_ports);
+   return ret;
+   }
+
+   repr_port = &xdev->repr_ports[XSC_DEV_REPR_PORT];
+   ret = xsc_ethdev_init_one_representor(eth_dev, repr_port);
+   if (ret != 0) {
+   PMD_DRV_LOG(ERR, "Failed to init backing representor");
+   return ret;
+   }
+
+   for (i = 1; i < xdev->num_repr_ports; i++) {
+   repr_port = &xdev->repr_ports[i];
+   snprintf(name, sizeof(name), "%s_rep_%d",
+xdev->ibv_name, repr_port->info.repr_id);
+   ret = rte_eth_dev_create(&xdev->pci_dev->device,
+name,
+sizeof(struct xsc_ethdev_priv),
+NULL, NULL,
+xsc_ethdev_init_one_representor,
+repr_port);
+   if (ret != 0) {
+   PMD_DRV_LOG(ERR, "Failed to create representor: %d", i);
+   goto destroy_reprs;
+   }
+   }
+
+   return 0;
+
+destroy_reprs:
+   while ((i--) > 1) {
+   repr_port = &xdev->repr_ports[i];
+   rte_eth_dev_destroy((struct rte_eth_dev *)repr_port->drv_data, NULL);
+   }
+   return ret;
+}
+
 static int
 xsc_ethdev_init(struct rte_eth_dev *eth_dev)
 {
@@ -26,7 +103,17 @@ xsc_ethdev_init(struct rte_eth_dev *eth_dev)
return ret;
}
 
+   ret = xsc_ethdev_init_representors(eth_dev);
+   if (ret != 0) {
+   PMD_DRV_LOG(ERR, "Failed to initialize representors");
+   goto uninit_xsc_dev;
+   }
+
return 0;
+
+uninit_xsc_dev:
+   xsc_dev_uninit(priv->xdev);
+   return ret;
 }
 
 static int
diff --git a/drivers/net/xsc/xsc_ethdev.h b/drivers/net/xsc/xsc_ethdev.h
index 22fc462e25..a05a63193c 100644
--- a/drivers/net/xsc/xsc_ethdev.h
+++ b/drivers/net/xsc/xsc_ethdev.h
@@ -9,6 +9,7 @@ struct xsc_ethdev_priv {
struct rte_eth_dev *eth_dev;
struct rte_pci_device *pci_dev;
struct xsc_dev *xdev;
+   struct xsc_repr_port *repr_port;
 };
 
 #define TO_XSC_ETHDEV_PRIV(dev) \
-- 
2.25.1
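
Review note on the destroy_reprs path in xsc_ethdev_init_representors():
when creation fails at index i, the loop "while ((i--) > 1)" destroys
ports i-1 down to 1 and leaves port 0, the backing device, alone. A
minimal sketch of that rollback, with hypothetical demo_* helpers standing
in for rte_eth_dev_create() and rte_eth_dev_destroy():

```c
#include <assert.h>
#include <errno.h>

#define DEMO_MAX_PORTS 8

/* 1 if the port at that index currently exists. */
static int demo_created[DEMO_MAX_PORTS];

static void demo_reset(void)
{
	for (int i = 0; i < DEMO_MAX_PORTS; i++)
		demo_created[i] = 0;
}

/* Creation fails at index fail_at; pass -1 for no failure. */
static int demo_create(int i, int fail_at)
{
	if (i == fail_at)
		return -EIO;
	demo_created[i] = 1;
	return 0;
}

static void demo_destroy(int i)
{
	demo_created[i] = 0;
}

int demo_create_representors(int num_ports, int fail_at)
{
	int ret = 0;
	int i;

	assert(num_ports <= DEMO_MAX_PORTS);

	/* Index 0 is the backing port and is set up elsewhere. */
	for (i = 1; i < num_ports; i++) {
		ret = demo_create(i, fail_at);
		if (ret != 0)
			goto rollback;
	}
	return 0;

rollback:
	/* i is the index that failed; destroy i-1 .. 1 in reverse order. */
	while ((i--) > 1)
		demo_destroy(i);
	return ret;
}
```

The post-decrement in the loop condition is what skips both the failed
index and index 0.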


[PATCH 07/19] net/xsc: add representor ports probe

2024-09-06 Thread WanRenyong
The XSC representor port structure stores per-representor resources.
In addition to the common representor ports, the xsc device itself is
treated as a special representor port.

Signed-off-by: WanRenyong 
Signed-off-by: Na Na 
---
 drivers/net/xsc/xsc_defs.h  |  24 +++
 drivers/net/xsc/xsc_dev.c   | 103 +-
 drivers/net/xsc/xsc_dev.h   |  27 
 drivers/net/xsc/xsc_utils.c | 122 
 drivers/net/xsc/xsc_utils.h |   3 +
 5 files changed, 278 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xsc/xsc_defs.h b/drivers/net/xsc/xsc_defs.h
index 97cd61b2d1..8cb67ed2e1 100644
--- a/drivers/net/xsc/xsc_defs.h
+++ b/drivers/net/xsc/xsc_defs.h
@@ -8,6 +8,10 @@
 #define XSC_PCI_VENDOR_ID  0x1f67
 #define XSC_PCI_DEV_ID_MS  0x
 
+#define XSC_VFREP_BASE_LOGICAL_PORT 1081
+
+
+
 enum xsc_nic_mode {
XSC_NIC_MODE_LEGACY,
XSC_NIC_MODE_SWITCHDEV,
@@ -31,5 +35,25 @@ enum xsc_flow_mode {
XSC_FLOW_MODE_MAX,
 };
 
+enum xsc_funcid_type {
+   XSC_FUNCID_TYPE_INVAL   = 0x0,
+   XSC_EMU_FUNCID  = 0x1,
+   XSC_PHYPORT_MAC_FUNCID  = 0x2,
+   XSC_VF_IOCTL_FUNCID = 0x3,
+   XSC_PHYPORT_LAG_FUNCID  = 0x4,
+   XSC_FUNCID_TYPE_UNKNOWN = 0x5,
+};
+
+enum xsc_phy_port_type {
+   XSC_PORT_TYPE_NONE = 0,
+   XSC_PORT_TYPE_UPLINK, /* mac0rep */
+   XSC_PORT_TYPE_UPLINK_BOND, /* bondrep */
+   XSC_PORT_TYPE_PFVF, /*hasreps: vfrep*/
+   XSC_PORT_TYPE_PFHPF, /*hasreps: host pf rep*/
+   XSC_PORT_TYPE_UNKNOWN,
+};
+
+#define XSC_PHY_PORT_NUM 1
+
 #endif /* XSC_DEFS_H_ */
 
diff --git a/drivers/net/xsc/xsc_dev.c b/drivers/net/xsc/xsc_dev.c
index 1eb68ac95d..3ba9a16116 100644
--- a/drivers/net/xsc/xsc_dev.c
+++ b/drivers/net/xsc/xsc_dev.c
@@ -23,6 +23,31 @@
 #define XSC_DEV_DEF_FLOW_MODE  XSC_FLOW_MODE_NULL
 #define XSC_DEV_CTRL_FILE_FMT  "/dev/yunsilicon/port_ctrl_" PCI_PRI_FMT
 
+static int
+xsc_dev_alloc_vfos_info(struct xsc_dev *dev)
+{
+   struct xsc_hwinfo *hwinfo;
+   int vfrep_offset = 0;
+   int base_lp = 0;
+
+   hwinfo = &dev->hwinfo;
+   if (hwinfo->pcie_no == 1) {
+   vfrep_offset = hwinfo->func_id -
+  hwinfo->pcie1_pf_funcid_base +
+  hwinfo->pcie0_pf_funcid_top -
+  hwinfo->pcie0_pf_funcid_base  + 1;
+   } else {
+   vfrep_offset = hwinfo->func_id - hwinfo->pcie0_pf_funcid_base;
+   }
+
+   base_lp = XSC_VFREP_BASE_LOGICAL_PORT;
+   if (dev->devargs.nic_mode == XSC_NIC_MODE_LEGACY)
+   base_lp = base_lp + vfrep_offset;
+
+   dev->vfos_logical_in_port = base_lp;
+   return 0;
+}
+
 static int xsc_hwinfo_init(struct xsc_dev *dev)
 {
struct {
@@ -174,6 +199,73 @@ xsc_dev_close(struct xsc_dev *dev)
ibv_close_device(dev->ibv_ctx);
 }
 
+static void
+xsc_repr_info_init(struct xsc_repr_info *info, enum xsc_phy_port_type port_type,
+  enum xsc_funcid_type funcid_type, int32_t repr_id)
+{
+   info->repr_id = repr_id;
+   info->port_type = port_type;
+   if (port_type == XSC_PORT_TYPE_UPLINK_BOND) {
+   info->pf_bond = 1;
+   info->funcid = XSC_PHYPORT_LAG_FUNCID << 14;
+   } else if (port_type == XSC_PORT_TYPE_UPLINK) {
+   info->pf_bond = -1;
+   info->funcid = XSC_PHYPORT_MAC_FUNCID << 14;
+   } else if (port_type == XSC_PORT_TYPE_PFVF) {
+   info->funcid = funcid_type << 14;
+   }
+}
+
+int
+xsc_repr_ports_probe(struct xsc_dev *dev, int nb_ports, int max_nb_ports)
+{
+   int funcid_type;
+   struct xsc_repr_port *repr_port;
+   int i;
+   int ret;
+
+   PMD_INIT_FUNC_TRACE();
+
+   ret = xsc_get_ifindex_by_pci_addr(&dev->pci_dev->addr, &dev->ifindex);
+   if (ret) {
+   PMD_DRV_LOG(ERR, "Could not get xsc dev ifindex");
+   return ret;
+   }
+
+   dev->num_repr_ports = nb_ports + 1;
+
+   dev->repr_ports = rte_zmalloc(NULL,
+ sizeof(struct xsc_repr_port) * dev->num_repr_ports,
+ RTE_CACHE_LINE_SIZE);
+   if (dev->repr_ports == NULL) {
+   PMD_DRV_LOG(ERR, "Failed to allocate memory for repr_ports");
+   return -ENOMEM;
+   }
+
+   funcid_type = (dev->devargs.nic_mode == XSC_NIC_MODE_SWITCHDEV) ?
+   XSC_VF_IOCTL_FUNCID : XSC_PHYPORT_MAC_FUNCID;
+
+   repr_port = &dev->repr_ports[XSC_DEV_REPR_PORT];
+   xsc_repr_info_init(&repr_port->info,
+  XSC_PORT_TYPE_UPLINK, XSC_FUNCID_TYPE_UNKNOWN, -1);
+   repr_port->info.ifindex = dev->ifindex;
+   repr_port->xdev = dev;
+
+   if ((dev->devargs.pph_mode & XSC_TX_PPH) == 0)
+   repr_port->info.repr_id = 510;
+   else
+   repr_port->info.repr_id = max_nb_ports - 1;
+
+   for (i = 1; i < dev->num_repr_ports; i++) {
+   repr_port = &dev->r
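
Review note on xsc_dev_alloc_vfos_info(): my reading of the arithmetic is
that PFs on PCIe 1 are numbered after all PFs on PCIe 0, so the offset for
a PCIe 1 function is its index within PCIe 1 plus the PCIe 0 PF count.
The sketch below restates that computation. struct demo_hwinfo is a
hypothetical subset of struct xsc_hwinfo, and the function IDs in the
usage note are made-up values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XSC_VFREP_BASE_LOGICAL_PORT 1081 /* from the patch */

/* Hypothetical subset of struct xsc_hwinfo. */
struct demo_hwinfo {
	uint32_t pcie_no;
	uint32_t func_id;
	uint16_t pcie0_pf_funcid_base;
	uint16_t pcie0_pf_funcid_top;
	uint16_t pcie1_pf_funcid_base;
};

/* Offset of this PF among all PFs, counting PCIe 0 first. */
static int demo_vfrep_offset(const struct demo_hwinfo *hw)
{
	int func_id;

	assert(hw != NULL);
	func_id = (int)hw->func_id;

	if (hw->pcie_no == 1) {
		int pcie0_pf_count = hw->pcie0_pf_funcid_top -
				     hw->pcie0_pf_funcid_base + 1;

		return func_id - hw->pcie1_pf_funcid_base + pcie0_pf_count;
	}
	return func_id - hw->pcie0_pf_funcid_base;
}

/* The offset is applied in legacy NIC mode only, as in the patch. */
static int demo_base_logical_port(const struct demo_hwinfo *hw, int legacy_mode)
{
	int base_lp = XSC_VFREP_BASE_LOGICAL_PORT;

	if (legacy_mode)
		base_lp += demo_vfrep_offset(hw);
	return base_lp;
}
```

For example, with PCIe 0 PFs at 100..101 and PCIe 1 PFs starting at 200,
function 201 on PCIe 1 gets offset 3 and legacy base port 1084.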

[PATCH 11/19] net/xsc: add mailbox and structure

2024-09-06 Thread WanRenyong
The mailbox is a communication channel between the driver and the firmware.

Signed-off-by: WanRenyong 
---
 drivers/net/xsc/xsc_ctrl.c |  8 
 drivers/net/xsc/xsc_ctrl.h | 31 +++
 2 files changed, 39 insertions(+)

diff --git a/drivers/net/xsc/xsc_ctrl.c b/drivers/net/xsc/xsc_ctrl.c
index 3e37bd914e..4d5e4b4d07 100644
--- a/drivers/net/xsc/xsc_ctrl.c
+++ b/drivers/net/xsc/xsc_ctrl.c
@@ -54,3 +54,11 @@ xsc_ioctl(struct xsc_dev *dev, int cmd, int opcode,
rte_free(hdr);
return ret;
 }
+
+int
+xsc_mailbox_exec(struct xsc_dev *dev, void *data_in,
+int in_len, void *data_out, int out_len)
+{
+   /* ignore opcode in hdr->attr when cmd = XSC_IOCTL_CMDQ_RAW */
+   return xsc_ioctl(dev, XSC_IOCTL_CMDQ_RAW, 0, data_in, in_len, data_out, out_len);
+}
diff --git a/drivers/net/xsc/xsc_ctrl.h b/drivers/net/xsc/xsc_ctrl.h
index d343e1b1a7..a7259c5fcb 100644
--- a/drivers/net/xsc/xsc_ctrl.h
+++ b/drivers/net/xsc/xsc_ctrl.h
@@ -40,6 +40,35 @@ struct xsc_ioctl_hdr {
struct xsc_ioctl_attr attr;
 };
 
+/* ioctl */
+struct xsc_inbox_hdr {
+   __be16 opcode;
+   uint8_t rsvd[4];
+   __be16 opmod;
+};
+
+struct xsc_outbox_hdr {
+   uint8_t status;
+   uint8_t rsvd[3];
+   __be32  syndrome;
+};
+
+/* ioctl mbox */
+struct xsc_ioctl_mbox_in {
+   struct xsc_inbox_hdr hdr;
+   __be16  len;
+   __be16  rsvd;
+   uint8_t data[];
+};
+
+struct xsc_ioctl_mbox_out {
+   struct xsc_outbox_hdr   hdr;
+   __be32  error;
+   __be16  len;
+   __be16  rsvd;
+   uint8_t data[];
+};
+
 struct xsc_ioctl_data_tl {
uint16_t table;
uint16_t opmod;
@@ -82,5 +111,7 @@ struct xsc_ioctl_get_hwinfo {
 
 int xsc_ioctl(struct xsc_dev *dev, int cmd, int opcode,
  void *data_in, int in_len, void *data_out, int out_len);
+int xsc_mailbox_exec(struct xsc_dev *dev, void *data_in,
+int in_len, void *data_out, int out_len);
 
 #endif /* _XSC_CTRL_H_ */
-- 
2.25.1
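
Review note on the mailbox layout: the header fields are declared __be16
and __be32, so callers are expected to convert to big-endian before
handing a buffer to xsc_mailbox_exec(). A sketch of filling an input
mailbox with a trailing payload, under these assumptions: the demo_*
structs mirror the patch's layout with plain integer types, htons() from
<arpa/inet.h> stands in for DPDK's rte_cpu_to_be_16(), calloc() stands in
for the DPDK allocator, and the opcode in the usage note is made up.

```c
#include <arpa/inet.h> /* htons(), ntohs() */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors struct xsc_inbox_hdr; opcode and opmod are big-endian on the wire. */
struct demo_inbox_hdr {
	uint16_t opcode;
	uint8_t rsvd[4];
	uint16_t opmod;
};

/* Mirrors struct xsc_ioctl_mbox_in, with a flexible array for the payload. */
struct demo_mbox_in {
	struct demo_inbox_hdr hdr;
	uint16_t len; /* big-endian payload length */
	uint16_t rsvd;
	uint8_t data[];
};

/* Allocate and fill an input mailbox; the caller frees it with free(). */
static struct demo_mbox_in *
demo_mbox_in_alloc(uint16_t opcode, const void *payload, uint16_t len)
{
	struct demo_mbox_in *in;

	assert(payload != NULL || len == 0);
	in = calloc(1, sizeof(*in) + len);
	if (in == NULL)
		return NULL;

	in->hdr.opcode = htons(opcode);
	in->len = htons(len);
	if (len != 0)
		memcpy(in->data, payload, len);
	return in;
}
```

Sizing the allocation as sizeof(*in) + len keeps the header and payload in
one contiguous buffer, which is the shape the flexible array member implies.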


[PATCH 14/19] net/xsc: add ethdev Rx burst

2024-09-06 Thread WanRenyong
Implement xsc PMD receive function.

Signed-off-by: WanRenyong 
Signed-off-by: Xiaoxiong Zhang 
---
 drivers/net/xsc/xsc_rxtx.c | 189 -
 drivers/net/xsc/xsc_rxtx.h |   9 ++
 2 files changed, 197 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xsc/xsc_rxtx.c b/drivers/net/xsc/xsc_rxtx.c
index 66b1511c6a..28360e62ff 100644
--- a/drivers/net/xsc/xsc_rxtx.c
+++ b/drivers/net/xsc/xsc_rxtx.c
@@ -7,11 +7,198 @@
 #include "xsc_dev.h"
 #include "xsc_ethdev.h"
 #include "xsc_rxtx.h"
+#include "xsc_utils.h"
+#include "xsc_ctrl.h"
+
+#define XSC_CQE_OWNER_MASK 0x1
+#define XSC_CQE_OWNER_HW   0x2
+#define XSC_CQE_OWNER_SW   0x4
+#define XSC_CQE_OWNER_ERR  0x8
+
+#define XSC_MAX_RX_BURST_MBUFS 64
+
+static __rte_always_inline int
+check_cqe_own(volatile struct xsc_cqe *cqe, const uint16_t cqe_n,
+ const uint16_t ci)
+
+{
+   if (unlikely(((cqe->owner & XSC_CQE_OWNER_MASK) !=
+   ((ci >> cqe_n) & XSC_CQE_OWNER_MASK))))
+   return XSC_CQE_OWNER_HW;
+
+   rte_io_rmb();
+   if (cqe->msg_len <= 0 && cqe->is_error)
+   return XSC_CQE_OWNER_ERR;
+
+   return XSC_CQE_OWNER_SW;
+}
+
+static inline void
+xsc_cq_to_mbuf(struct xsc_rxq_data *rxq, struct rte_mbuf *pkt,
+  volatile struct xsc_cqe *cqe)
+{
+   uint32_t rss_hash_res = 0;
+   pkt->port = rxq->port_id;
+   if (rxq->rss_hash) {
+   rss_hash_res = rte_be_to_cpu_32(cqe->vni);
+   if (rss_hash_res) {
+   pkt->hash.rss = rss_hash_res;
+   pkt->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
+   }
+   }
+}
+
+static inline int
+xsc_rx_poll_len(struct xsc_rxq_data *rxq, volatile struct xsc_cqe *cqe)
+{
+   int len;
+
+   do {
+   len = 0;
+   int ret;
+
+   ret = check_cqe_own(cqe, rxq->cqe_n, rxq->cq_ci);
+   if (unlikely(ret != XSC_CQE_OWNER_SW)) {
+   if (unlikely(ret == XSC_CQE_OWNER_ERR)) {
+   /* TODO */
+   if (ret == XSC_CQE_OWNER_HW ||
+   ret == -1)
+   return 0;
+   } else {
+   return 0;
+   }
+   }
+
+   rxq->cq_ci += 1;
+   len = rte_le_to_cpu_32(cqe->msg_len);
+   return len;
+   } while (1);
+}
+
+static __rte_always_inline void
+xsc_pkt_info_sync(struct rte_mbuf *rep, struct rte_mbuf *seg)
+{
+   if (rep != NULL && seg != NULL) {
+   rep->data_len = seg->data_len;
+   rep->pkt_len = seg->pkt_len;
+   rep->data_off = seg->data_off;
+   rep->port = seg->port;
+   }
+}
 
 uint16_t
 xsc_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 {
-   return 0;
+   struct xsc_rxq_data *rxq = dpdk_rxq;
+   const uint32_t wqe_m = rxq->wqe_m;
+   const uint32_t cqe_m = rxq->cqe_m;
+   const uint32_t sge_n = rxq->sge_n;
+   struct rte_mbuf *pkt = NULL;
+   struct rte_mbuf *seg = NULL;
+   volatile struct xsc_cqe *cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_m];
+   uint32_t nb_pkts = 0;
+   uint32_t rq_ci = rxq->rq_ci;
+   int len = 0;
+   uint32_t cq_ci_two = 0;
+   int read_cqe_num = 0;
+   int read_cqe_num_len = 0;
+   volatile struct xsc_cqe_u64 *cqe_u64 = NULL;
+   struct rte_mbuf *rep;
+
+   while (pkts_n) {
+   uint32_t idx = rq_ci & wqe_m;
+   volatile struct xsc_wqe_data_seg *wqe =
+   &((volatile struct xsc_wqe_data_seg *)rxq->wqes)[idx << sge_n];
+   seg = (*rxq->elts)[idx];
+   rte_prefetch0(cqe);
+   rte_prefetch0(wqe);
+
+   rep = rte_mbuf_raw_alloc(seg->pool);
+   if (unlikely(rep == NULL))
+   break;
+
+   if (!pkt) {
+   if (read_cqe_num) {
+   cqe = cqe + 1;
+   len = read_cqe_num_len;
+   read_cqe_num = 0;
+   } else if ((rxq->cq_ci % 2 == 0) && (pkts_n > 1)) {
+   cq_ci_two = (rxq->cq_ci & rxq->cqe_m) / 2;
+   cqe_u64 = &(*rxq->cqes_u64)[cq_ci_two];
+   cqe = (volatile struct xsc_cqe *)cqe_u64;
+   len = xsc_rx_poll_len(rxq, cqe);
+   if (len > 0) {
+   read_cqe_num_len = xsc_rx_poll_len(rxq, cqe + 1);
+   if (read_cqe_num_len > 0)
+   read_cqe_num = 1;
+   }
+   } else {
+   cqe = &(*rxq->cqes)[rxq->c

[PATCH 13/19] net/xsc: add ethdev start and stop ops

2024-09-06 Thread WanRenyong
Implement xsc ethdev start and stop functions.

Signed-off-by: WanRenyong 
Signed-off-by: Rong Qian 
---
 drivers/net/xsc/meson.build  |   1 +
 drivers/net/xsc/xsc_ctrl.h   | 152 ++-
 drivers/net/xsc/xsc_defs.h   |   2 +
 drivers/net/xsc/xsc_dev.h|   3 +
 drivers/net/xsc/xsc_ethdev.c | 740 ++-
 drivers/net/xsc/xsc_ethdev.h |  10 +
 drivers/net/xsc/xsc_rxtx.c   |  22 ++
 drivers/net/xsc/xsc_rxtx.h   |  68 +++-
 8 files changed, 994 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/xsc/xsc_rxtx.c

diff --git a/drivers/net/xsc/meson.build b/drivers/net/xsc/meson.build
index 5c989dba13..2fc4e5ace7 100644
--- a/drivers/net/xsc/meson.build
+++ b/drivers/net/xsc/meson.build
@@ -11,6 +11,7 @@ sources = files(
 'xsc_dev.c',
 'xsc_utils.c',
 'xsc_ctrl.c',
+'xsc_rxtx.c',
 )
 
 libnames = ['ibverbs']
diff --git a/drivers/net/xsc/xsc_ctrl.h b/drivers/net/xsc/xsc_ctrl.h
index c33e625097..e51847d68f 100644
--- a/drivers/net/xsc/xsc_ctrl.h
+++ b/drivers/net/xsc/xsc_ctrl.h
@@ -5,7 +5,17 @@
 #ifndef _XSC_CTRL_H_
 #define _XSC_CTRL_H_
 
+#include 
+#include 
+#include 
+#include 
+#include 
 #include 
+#include 
+
+#ifndef PAGE_SIZE
+#define PAGE_SIZE  4096
+#endif
 
 #define XSC_IOCTL_CHECK_FIELD  0x01234567
 
@@ -25,6 +35,17 @@ enum xsc_ioctl_opmod {
XSC_IOCTL_OP_GET_LOCAL,
 };
 
+#define XSC_DIV_ROUND_UP(n, d) ({ \
+   typeof(d) _d = (d); \
+   typeof(n) _n = (n); \
+   ((_n) + (_d) - 1) / (_d); \
+})
+
+enum {
+   XSC_IOCTL_SET_QP_STATUS = 0x200,
+   XSC_IOCTL_SET_MAX
+};
+
 struct xsc_ioctl_attr {
uint16_t opcode; /* ioctl cmd */
uint16_t length; /* data length */
@@ -40,7 +61,18 @@ struct xsc_ioctl_hdr {
struct xsc_ioctl_attr attr;
 };
 
-/* ioctl */
+enum {
+   XSC_QUEUE_TYPE_RDMA_RC= 0,
+   XSC_QUEUE_TYPE_RDMA_MAD   = 1,
+   XSC_QUEUE_TYPE_RAW= 2,
+   XSC_QUEUE_TYPE_VIRTIO_NET = 3,
+   XSC_QUEUE_TYPE_VIRTIO_BLK = 4,
+   XSC_QUEUE_TYPE_RAW_TPE= 5,
+   XSC_QUEUE_TYPE_RAW_TSO= 6,
+   XSC_QUEUE_TYPE_RAW_TX = 7,
+   XSC_QUEUE_TYPE_INVALID= 0xFF,
+};
+
 struct xsc_inbox_hdr {
__be16 opcode;
uint8_t rsvd[4];
@@ -53,7 +85,6 @@ struct xsc_outbox_hdr {
__be32  syndrome;
 };
 
-/* ioctl mbox */
 struct xsc_ioctl_mbox_in {
struct xsc_inbox_hdr hdr;
__be16  len;
@@ -96,6 +127,54 @@ struct xsc_cmd_modify_nic_hca_mbox_out {
uint8_t rsvd0[4];
 };
 
+struct xsc_create_qp_request {
+   __be16  input_qpn;
+   __be16  pa_num;
+   uint8_t qp_type;
+   uint8_t log_sq_sz;
+   uint8_t log_rq_sz;
+   uint8_t dma_direct;
+   __be32  pdn;
+   __be16  cqn_send;
+   __be16  cqn_recv;
+   __be16  glb_funcid;
+   uint8_t page_shift;
+   uint8_t rsvd;
+   __be64  pas[];
+};
+
+struct xsc_create_multiqp_mbox_in {
+   struct xsc_inbox_hdr hdr;
+   __be16  qp_num;
+   uint8_t qp_type;
+   uint8_t rsvd;
+   __be32  req_len;
+   uint8_t data[];
+};
+
+struct xsc_create_multiqp_mbox_out {
+   struct xsc_outbox_hdr   hdr;
+   __be32  qpn_base;
+};
+
+
+struct xsc_destroy_qp_mbox_in {
+   struct xsc_inbox_hdr hdr;
+   __be32  qpn;
+   uint8_t rsvd[4];
+};
+
+struct xsc_destroy_qp_mbox_out {
+   struct xsc_outbox_hdr   hdr;
+   uint8_t rsvd[8];
+};
+
+struct xsc_ioctl_qp_range {
+   uint16_t opcode;
+   int num;
+   uint32_t qpn;
+};
+
 struct xsc_ioctl_data_tl {
uint16_t table;
uint16_t opmod;
@@ -136,6 +215,75 @@ struct xsc_ioctl_get_hwinfo {
uint8_t esw_mode;
 };
 
+/* for xscdv providers */
+#if !HAVE_XSC_DV_PROVIDER
+enum xscdv_obj_type {
+   XSCDV_OBJ_QP= 1 << 0,
+   XSCDV_OBJ_CQ= 1 << 1,
+   XSCDV_OBJ_SRQ   = 1 << 2,
+   XSCDV_OBJ_RWQ   = 1 << 3,
+   XSCDV_OBJ_DM= 1 << 4,
+   XSCDV_OBJ_AH= 1 << 5,
+   XSCDV_OBJ_PD= 1 << 6,
+};
+
+enum xsc_qp_create_flags {
+   XSC_QP_CREATE_RAWPACKE_TSO  = 1 << 0,
+   XSC_QP_CREATE_RAWPACKET_TSO = 1 << 0,
+   XSC_QP_CREATE_RAWPACKET_TX  = 1 << 1,
+};
+
+struct xscdv_cq_init_attr {
+   uint64_t comp_mask; /* Use enum xscdv_cq_init_attr_mask */
+   uint8_t cqe_comp_res_format; /* Use enum xscdv_cqe_comp_res_format */
+   uint32_t flags;
+   uint16_t cqe_size; /* when XSCDV_CQ_INIT_ATTR_MASK_CQE_SIZE set */
+};
+
+struct xscdv_obj {
+   struct {
+   struct ibv_qp   *in;
+   str
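
Review note on XSC_DIV_ROUND_UP: the macro relies on GNU C extensions
(statement expressions and, before C23, typeof). For unsigned operands a
plain inline function gives the same result and evaluates each argument
once. The sketch below uses it to count the pages needed for a buffer.
The demo_* names are hypothetical, and DEMO_PAGE_SIZE mirrors the 4096
fallback the patch defines for PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u /* same value as the PAGE_SIZE fallback in the patch */

/* Ceiling division for unsigned values; d must be non-zero. */
static size_t demo_div_round_up(size_t n, size_t d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}

/* Number of whole pages needed to hold `bytes` bytes. */
static size_t demo_pages_for(size_t bytes)
{
	return demo_div_round_up(bytes, DEMO_PAGE_SIZE);
}
```

Like the macro, this form can overflow when n is within d - 1 of the
maximum value of the type, which is not a concern for queue buffer sizes.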

[PATCH 15/19] net/xsc: add ethdev Tx burst

2024-09-06 Thread WanRenyong
Implement xsc PMD transmit function.

Signed-off-by: WanRenyong 
Signed-off-by: Rong Qian 
---
 doc/guides/nics/features/xsc.ini |   4 +
 drivers/net/xsc/xsc_rxtx.c   | 231 ++-
 drivers/net/xsc/xsc_rxtx.h   |   9 ++
 3 files changed, 242 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/features/xsc.ini b/doc/guides/nics/features/xsc.ini
index bdeb7a984b..772c6418c4 100644
--- a/doc/guides/nics/features/xsc.ini
+++ b/doc/guides/nics/features/xsc.ini
@@ -7,6 +7,10 @@
 RSS hash = Y
 RSS key update   = Y
 RSS reta update  = Y
+L3 checksum offload  = Y
+L4 checksum offload  = Y
+Inner L3 checksum= Y
+Inner L4 checksum= Y
 Linux= Y
 ARMv8= Y
 x86-64   = Y
diff --git a/drivers/net/xsc/xsc_rxtx.c b/drivers/net/xsc/xsc_rxtx.c
index 28360e62ff..7a31cd428c 100644
--- a/drivers/net/xsc/xsc_rxtx.c
+++ b/drivers/net/xsc/xsc_rxtx.c
@@ -14,6 +14,8 @@
 #define XSC_CQE_OWNER_HW   0x2
 #define XSC_CQE_OWNER_SW   0x4
 #define XSC_CQE_OWNER_ERR  0x8
+#define XSC_OPCODE_RAW 0x7
+#define XSC_TX_COMP_CQE_HANDLE_MAX 2
 
 #define XSC_MAX_RX_BURST_MBUFS 64
 
@@ -201,9 +203,234 @@ xsc_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
return nb_pkts;
 }
 
+static __rte_always_inline void
+xsc_tx_elts_free(struct xsc_txq_data *__rte_restrict txq, uint16_t tail)
+{
+   uint16_t elts_n = tail - txq->elts_tail;
+   uint32_t free_n;
+
+   do {
+   free_n = txq->elts_s - (txq->elts_tail & txq->elts_m);
+   free_n = RTE_MIN(free_n, elts_n);
+   rte_pktmbuf_free_bulk(&txq->elts[txq->elts_tail & txq->elts_m], free_n);
+   txq->elts_tail += free_n;
+   elts_n -= free_n;
+   } while (elts_n > 0);
+}
+
+static void
+xsc_tx_cqes_handle(struct xsc_txq_data *__rte_restrict txq)
+{
+   uint32_t count = XSC_TX_COMP_CQE_HANDLE_MAX;
+   volatile struct xsc_cqe *last_cqe = NULL;
+   volatile struct xsc_cqe *cqe;
+   bool doorbell = false;
+   int ret;
+   uint16_t tail;
+
+   do {
+   cqe = &txq->cqes[txq->cq_ci & txq->cqe_m];
+   ret = check_cqe_own(cqe, txq->cqe_n, txq->cq_ci);
+   if (unlikely(ret != XSC_CQE_OWNER_SW)) {
+   if (likely(ret != XSC_CQE_OWNER_ERR))
+   /* No new CQEs in completion queue. */
+   break;
+   doorbell = true;
+   ++txq->cq_ci;
+   txq->cq_pi = txq->cq_ci;
+   last_cqe = NULL;
+   continue;
+   }
+
+   doorbell = true;
+   ++txq->cq_ci;
+   last_cqe = cqe;
+   } while (--count > 0);
+
+   if (likely(doorbell)) {
+   union xsc_cq_doorbell cq_db = {
+   .cq_data = 0
+   };
+   cq_db.next_cid = txq->cq_ci;
+   cq_db.cq_num = txq->cqn;
+
+   /* Ring doorbell */
+   rte_compiler_barrier();
+   *txq->cq_db = rte_cpu_to_le_32(cq_db.cq_data);
+
+   /* Release completed elts */
+   if (likely(last_cqe != NULL)) {
+   txq->wqe_pi = rte_le_to_cpu_16(last_cqe->wqe_id) >> txq->wqe_ds_n;
+   tail = txq->fcqs[(txq->cq_ci - 1) & txq->cqe_m];
+   if (likely(tail != txq->elts_tail))
+   xsc_tx_elts_free(txq, tail);
+   }
+   }
+}
+
+static __rte_always_inline void
+xsc_tx_wqe_ctrl_seg_init(struct xsc_txq_data *__rte_restrict txq,
+struct rte_mbuf *__rte_restrict mbuf,
+struct xsc_wqe *__rte_restrict wqe)
+{
+   struct xsc_send_wqe_ctrl_seg *cs = &wqe->cseg;
+   int i = 0;
+   int ds_max = (1 << txq->wqe_ds_n) - 1;
+
+   cs->msg_opcode = XSC_OPCODE_RAW;
+   cs->wqe_id = rte_cpu_to_le_16(txq->wqe_ci << txq->wqe_ds_n);
+   cs->has_pph = 0;
+   /* clear dseg's seg len */
+   if (cs->ds_data_num > 1 && cs->ds_data_num <= ds_max) {
+   for (i = 1; i < cs->ds_data_num; i++)
+   wqe->dseg[i].seg_len = 0;
+   }
+
+   cs->ds_data_num = mbuf->nb_segs;
+   if (mbuf->ol_flags & RTE_MBUF_F_TX_IP_CKSUM)
+   cs->csum_en = 0x2;
+   else
+   cs->csum_en = 0;
+
+   if (txq->tso_en == 1 && (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG)) {
+   cs->has_pph = 0;
+   cs->so_type = 1;
+   cs->so_hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len;
+   cs->so_data_size = rte_cpu_to_le_16(mbuf->tso_segsz);
+   }
+
+   cs->msg_len = rte_cpu_to_le_32(rte_pktmbuf_pkt_len(mbuf));
+   if (unlikely(cs->msg_len == 0))
+   cs->msg_len = rte_cpu_to_le_32(rte_pktmbuf_data_len(mbuf));
+
+   /* do not generate cqe for every

[PATCH 12/19] net/xsc: add ethdev RSS hash ops

2024-09-06 Thread WanRenyong
Implement xsc ethdev RSS hash config get and update functions.

Signed-off-by: WanRenyong 
---
 doc/guides/nics/features/xsc.ini |  3 +++
 drivers/net/xsc/xsc_ctrl.h   | 27 
 drivers/net/xsc/xsc_ethdev.c | 43 +++-
 drivers/net/xsc/xsc_ethdev.h | 17 +
 drivers/net/xsc/xsc_utils.h  |  5 +++-
 5 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/features/xsc.ini b/doc/guides/nics/features/xsc.ini
index b5c44ce535..bdeb7a984b 100644
--- a/doc/guides/nics/features/xsc.ini
+++ b/doc/guides/nics/features/xsc.ini
@@ -4,6 +4,9 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+RSS hash = Y
+RSS key update   = Y
+RSS reta update  = Y
 Linux= Y
 ARMv8= Y
 x86-64   = Y
diff --git a/drivers/net/xsc/xsc_ctrl.h b/drivers/net/xsc/xsc_ctrl.h
index a7259c5fcb..c33e625097 100644
--- a/drivers/net/xsc/xsc_ctrl.h
+++ b/drivers/net/xsc/xsc_ctrl.h
@@ -69,6 +69,33 @@ struct xsc_ioctl_mbox_out {
uint8_t data[];
 };
 
+struct xsc_nic_attr {
+   __be16   caps;
+   __be16   caps_mask;
+   uint8_t  mac_addr[6];
+};
+
+struct xsc_rss_modify_attr {
+   uint8_t  caps_mask;
+   uint8_t  rss_en;
+   __be16   rqn_base;
+   __be16   rqn_num;
+   uint8_t  hfunc;
+   __be32   hash_tmpl;
+   uint8_t  hash_key[52];
+};
+
+struct xsc_cmd_modify_nic_hca_mbox_in {
+   struct xsc_inbox_hdr hdr;
+   struct xsc_nic_attr nic;
+   struct xsc_rss_modify_attr  rss;
+};
+
+struct xsc_cmd_modify_nic_hca_mbox_out {
+   struct xsc_outbox_hdr   hdr;
+   uint8_t rsvd0[4];
+};
+
 struct xsc_ioctl_data_tl {
uint16_t table;
uint16_t opmod;
diff --git a/drivers/net/xsc/xsc_ethdev.c b/drivers/net/xsc/xsc_ethdev.c
index 5ad9567eb3..1f09ab9735 100644
--- a/drivers/net/xsc/xsc_ethdev.c
+++ b/drivers/net/xsc/xsc_ethdev.c
@@ -17,6 +17,45 @@ static int
 xsc_rss_modify_cmd(struct xsc_ethdev_priv *priv, uint8_t *rss_key,
   uint8_t rss_key_len)
 {
+   struct xsc_cmd_modify_nic_hca_mbox_in in = {};
+   struct xsc_cmd_modify_nic_hca_mbox_out out = {};
+   uint8_t rss_caps_mask = 0;
+   int ret, key_len = 0;
+
+   in.hdr.opcode = rte_cpu_to_be_16(XSC_CMD_OP_MODIFY_NIC_HCA);
+
+   key_len = RTE_MIN(rss_key_len, XSC_RSS_HASH_KEY_LEN);
+   rte_memcpy(in.rss.hash_key, rss_key, key_len);
+   rss_caps_mask |= BIT(XSC_RSS_HASH_KEY_UPDATE);
+
+   in.rss.caps_mask = rss_caps_mask;
+   in.rss.rss_en = 1;
+   in.nic.caps_mask = rte_cpu_to_be_16(BIT(XSC_TBM_CAP_RSS));
+   in.nic.caps = in.nic.caps_mask;
+
+   ret = xsc_mailbox_exec(priv->xdev, &in, sizeof(in), &out, sizeof(out));
+   if (ret != 0 || out.hdr.status != 0)
+   return -1;
+   return 0;
+}
+
+static int
+xsc_ethdev_rss_hash_conf_get(struct rte_eth_dev *dev,
+struct rte_eth_rss_conf *rss_conf)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(dev);
+
+   if (!rss_conf) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+   if (rss_conf->rss_key != NULL &&
+   rss_conf->rss_key_len >= priv->rss_conf.rss_key_len) {
+   memcpy(rss_conf->rss_key, priv->rss_conf.rss_key,
+  priv->rss_conf.rss_key_len);
+   }
+   rss_conf->rss_key_len = priv->rss_conf.rss_key_len;
+   rss_conf->rss_hf = priv->rss_conf.rss_hf;
return 0;
 }
 
@@ -30,7 +69,7 @@ xsc_ethdev_rss_hash_update(struct rte_eth_dev *dev,
if (rss_conf->rss_key_len > XSC_RSS_HASH_KEY_LEN ||
rss_conf->rss_key == NULL) {
PMD_DRV_LOG(ERR, "Xsc pmd key len is %d bigger than %d",
-   rss_conf->rss_key_len, XSC_RSS_HASH_KEY_LEN);
+   rss_conf->rss_key_len, XSC_RSS_HASH_KEY_LEN);
return -EINVAL;
}
 
@@ -184,6 +223,8 @@ const struct eth_dev_ops xsc_dev_ops = {
.dev_configure = xsc_ethdev_configure,
.rx_queue_setup = xsc_ethdev_rx_queue_setup,
.tx_queue_setup = xsc_ethdev_tx_queue_setup,
+   .rss_hash_update = xsc_ethdev_rss_hash_update,
+   .rss_hash_conf_get = xsc_ethdev_rss_hash_conf_get,
 };
 
 static int
diff --git a/drivers/net/xsc/xsc_ethdev.h b/drivers/net/xsc/xsc_ethdev.h
index 10c3d8cc87..fb92d47dd0 100644
--- a/drivers/net/xsc/xsc_ethdev.h
+++ b/drivers/net/xsc/xsc_ethdev.h
@@ -9,6 +9,8 @@
 #define XSC_MAX_DESC_NUMBER 1024
 #define XSC_RX_FREE_THRESH 32
 
+#define XSC_CMD_OP_MODIFY_NIC_HCA 0x812
+
 struct xsc_dev_config {
uint8_t pph_flag;
unsigned int hw_csum:1;
@@ -51,4 +53,19 @@ struct xsc_ethdev_priv {
 #define TO_XSC_ETHDEV_PRIV(dev) \
((struct xsc_ethdev_priv *)(dev)->data->dev_private)
 
+enum {
+

[PATCH 17/19] net/xsc: add dev link and MTU ops

2024-09-06 Thread WanRenyong
The XSC PMD does not support link update yet. So that the device can
start successfully, the link_update function always returns 0.

Signed-off-by: WanRenyong 
---
 doc/guides/nics/features/xsc.ini |  1 +
 drivers/net/xsc/xsc_ethdev.c | 50 
 drivers/net/xsc/xsc_utils.c  | 23 +++
 drivers/net/xsc/xsc_utils.h  |  1 +
 4 files changed, 75 insertions(+)

diff --git a/doc/guides/nics/features/xsc.ini b/doc/guides/nics/features/xsc.ini
index 772c6418c4..84c5ff4b6b 100644
--- a/doc/guides/nics/features/xsc.ini
+++ b/doc/guides/nics/features/xsc.ini
@@ -4,6 +4,7 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+MTU update   = Y
 RSS hash = Y
 RSS key update   = Y
 RSS reta update  = Y
diff --git a/drivers/net/xsc/xsc_ethdev.c b/drivers/net/xsc/xsc_ethdev.c
index d3e044e740..54b7e79145 100644
--- a/drivers/net/xsc/xsc_ethdev.c
+++ b/drivers/net/xsc/xsc_ethdev.c
@@ -187,6 +187,20 @@ xsc_ethdev_configure(struct rte_eth_dev *dev)
return -rte_errno;
 }
 
+static int
+xsc_ethdev_set_link_down(struct rte_eth_dev *dev)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(dev);
+   return xsc_link_process(dev, priv->ifindex, ~IFF_UP);
+}
+
+static int
+xsc_ethdev_set_link_up(struct rte_eth_dev *dev)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(dev);
+   return xsc_link_process(dev, priv->ifindex, IFF_UP);
+}
+
 static int
 xsc_init_obj(struct xscdv_obj *obj, uint64_t obj_type)
 {
@@ -983,6 +997,39 @@ xsc_ethdev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
return 0;
 }
 
+static int
+xsc_ethdev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(dev);
+   uint16_t get_mtu = 0;
+   int ret = 0;
+
+   if (priv->eth_type != RTE_ETH_REPRESENTOR_PF) {
+   priv->mtu = mtu;
+   return 0;
+   }
+
+   ret = xsc_get_mtu(&priv->mtu, priv->ifindex);
+   if (ret)
+   return ret;
+
+   ret = xsc_set_mtu(mtu, priv->ifindex);
+   if (ret)
+   return ret;
+
+   ret = xsc_get_mtu(&get_mtu, priv->ifindex);
+   if (ret)
+   return ret;
+
+   if (get_mtu != mtu) {
+   PMD_DRV_LOG(ERR, "mtu set to %u failure", mtu);
+   return -EAGAIN;
+   }
+
+   priv->mtu = mtu;
+   return 0;
+}
+
 static int
 xsc_ethdev_link_update(__rte_unused struct rte_eth_dev *dev,
   __rte_unused int wait_to_complete)
@@ -994,12 +1041,15 @@ const struct eth_dev_ops xsc_dev_ops = {
.dev_configure = xsc_ethdev_configure,
.dev_start = xsc_ethdev_start,
.dev_stop = xsc_ethdev_stop,
+   .dev_set_link_down = xsc_ethdev_set_link_down,
+   .dev_set_link_up = xsc_ethdev_set_link_up,
.dev_close = xsc_ethdev_close,
.link_update = xsc_ethdev_link_update,
.rx_queue_setup = xsc_ethdev_rx_queue_setup,
.tx_queue_setup = xsc_ethdev_tx_queue_setup,
.rx_queue_release = xsc_ethdev_rxq_release,
.tx_queue_release = xsc_ethdev_txq_release,
+   .mtu_set = xsc_ethdev_set_mtu,
.rss_hash_update = xsc_ethdev_rss_hash_update,
.rss_hash_conf_get = xsc_ethdev_rss_hash_conf_get,
 };
diff --git a/drivers/net/xsc/xsc_utils.c b/drivers/net/xsc/xsc_utils.c
index e40b0904b7..788cdfa54a 100644
--- a/drivers/net/xsc/xsc_utils.c
+++ b/drivers/net/xsc/xsc_utils.c
@@ -321,3 +321,26 @@ xsc_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac, uint32_t i
dev->data->mac_addrs[index] = *mac;
return 0;
 }
+
+int
+xsc_link_process(struct rte_eth_dev *dev __rte_unused,
+uint32_t ifindex, unsigned int flags)
+{
+   struct ifreq request;
+   struct ifreq *ifr = &request;
+   char ifname[sizeof(ifr->ifr_name)];
+   int ret;
+   unsigned int keep = ~IFF_UP;
+
+   if (if_indextoname(ifindex, ifname) == NULL)
+   return -rte_errno;
+
+   ret = xsc_ifreq_by_ifname(ifname, SIOCGIFFLAGS, &request);
+   if (ret)
+   return ret;
+
+   request.ifr_flags &= keep;
+   request.ifr_flags |= flags & ~keep;
+
+   return xsc_ifreq_by_ifname(ifname, SIOCSIFFLAGS, &request);
+}
diff --git a/drivers/net/xsc/xsc_utils.h b/drivers/net/xsc/xsc_utils.h
index 672ba3871e..d9327020cd 100644
--- a/drivers/net/xsc/xsc_utils.h
+++ b/drivers/net/xsc/xsc_utils.h
@@ -22,5 +22,6 @@ int xsc_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac, uint32
 int xsc_get_mtu(uint16_t *mtu, uint32_t ifindex);
 int xsc_set_mtu(uint16_t mtu, uint32_t ifindex);
 int xsc_get_mac(uint8_t *mac, uint32_t ifindex);
+int xsc_link_process(struct rte_eth_dev *dev, uint32_t ifindex, unsigned int flags);
 
 #endif
-- 
2.25.1
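
A note on the flag handling in xsc_link_process() above: the masking can be checked in isolation. The sketch below is a standalone model, not driver code. demo_apply_link_flags() and DEMO_IFF_UP are hypothetical names, and DEMO_IFF_UP assumes the Linux value of IFF_UP (0x1).

```c
#include <assert.h>

/* Assumption: same bit value as IFF_UP on Linux. */
#define DEMO_IFF_UP 0x1u

/*
 * Model of the read-modify-write done on ifr_flags:
 * every bit except IFF_UP is kept, and IFF_UP is taken from 'flags'.
 */
static unsigned int
demo_apply_link_flags(unsigned int current, unsigned int flags)
{
	const unsigned int keep = ~DEMO_IFF_UP;

	current &= keep;          /* drop the old IFF_UP bit */
	current |= flags & ~keep; /* copy IFF_UP from the caller */
	return current;
}

/* Self-check of both directions; returns 0 when the model behaves as described. */
static int
demo_link_flags_selfcheck(void)
{
	unsigned int up = demo_apply_link_flags(0x1002u, DEMO_IFF_UP);
	unsigned int down = demo_apply_link_flags(0x1003u, ~DEMO_IFF_UP);

	assert((up & DEMO_IFF_UP) != 0);   /* IFF_UP requested: bit is set */
	assert((down & DEMO_IFF_UP) == 0); /* ~IFF_UP requested: bit is cleared */
	return 0;
}
```

With this masking, a caller that passes IFF_UP brings the interface up, and a caller that passes ~IFF_UP brings it down; all other interface flags are left unchanged.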


[PATCH 18/19] net/xsc: add dev infos get

2024-09-06 Thread WanRenyong
Implement xsc ethdev information get function.

Signed-off-by: WanRenyong 
---
 drivers/net/xsc/xsc_ethdev.c | 60 
 1 file changed, 60 insertions(+)

diff --git a/drivers/net/xsc/xsc_ethdev.c b/drivers/net/xsc/xsc_ethdev.c
index 54b7e79145..0c8a620d03 100644
--- a/drivers/net/xsc/xsc_ethdev.c
+++ b/drivers/net/xsc/xsc_ethdev.c
@@ -918,6 +918,65 @@ xsc_ethdev_close(struct rte_eth_dev *dev)
return 0;
 }
 
+static uint64_t
+xsc_get_rx_queue_offloads(struct rte_eth_dev *dev)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(dev);
+   struct xsc_dev_config *config = &priv->config;
+   uint64_t offloads = 0;
+
+   if (config->hw_csum)
+   offloads |= (RTE_ETH_RX_OFFLOAD_IPV4_CKSUM |
+RTE_ETH_RX_OFFLOAD_UDP_CKSUM |
+RTE_ETH_RX_OFFLOAD_TCP_CKSUM);
+
+   return offloads;
+}
+
+static uint64_t
+xsc_get_tx_port_offloads(struct rte_eth_dev *dev)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(dev);
+   uint64_t offloads = 0;
+   struct xsc_dev_config *config = &priv->config;
+
+   if (config->hw_csum)
+   offloads |= (RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
+RTE_ETH_TX_OFFLOAD_UDP_CKSUM |
+RTE_ETH_TX_OFFLOAD_TCP_CKSUM);
+   if (config->tso)
+   offloads |= RTE_ETH_TX_OFFLOAD_TCP_TSO;
+   return offloads;
+}
+
+static int
+xsc_ethdev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(dev);
+
+   info->min_rx_bufsize = 64;
+   info->max_rx_pktlen = 65536;
+   info->max_lro_pkt_size = 0;
+   info->max_rx_queues = 256;
+   info->max_tx_queues = 1024;
+   info->rx_desc_lim.nb_max = 4096;
+   info->rx_desc_lim.nb_min = 16;
+   info->tx_desc_lim.nb_max = 8192;
+   info->tx_desc_lim.nb_min = 128;
+
+   info->rx_queue_offload_capa = xsc_get_rx_queue_offloads(dev);
+   info->rx_offload_capa = info->rx_queue_offload_capa;
+   info->tx_offload_capa = xsc_get_tx_port_offloads(dev);
+
+   info->if_index = priv->ifindex;
+   info->hash_key_size = XSC_RSS_HASH_KEY_LEN;
+   info->tx_desc_lim.nb_seg_max = 8;
+   info->tx_desc_lim.nb_mtu_seg_max = 8;
+   info->switch_info.name = dev->data->name;
+   info->switch_info.port_id = priv->representor_id;
+   return 0;
+}
+
 static int
 xsc_ethdev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
  uint32_t socket, const struct rte_eth_rxconf *conf,
@@ -1045,6 +1104,7 @@ const struct eth_dev_ops xsc_dev_ops = {
.dev_set_link_up = xsc_ethdev_set_link_up,
.dev_close = xsc_ethdev_close,
.link_update = xsc_ethdev_link_update,
+   .dev_infos_get = xsc_ethdev_infos_get,
.rx_queue_setup = xsc_ethdev_rx_queue_setup,
.tx_queue_setup = xsc_ethdev_tx_queue_setup,
.rx_queue_release = xsc_ethdev_rxq_release,
-- 
2.25.1


[PATCH 19/19] net/xsc: add dev basic stats ops

2024-09-06 Thread WanRenyong
Implement xsc ethdev basic stats get and reset functions.

Signed-off-by: WanRenyong 
---
 doc/guides/nics/features/xsc.ini |  1 +
 drivers/net/xsc/xsc_ethdev.c | 76 
 drivers/net/xsc/xsc_rxtx.c   | 11 -
 drivers/net/xsc/xsc_rxtx.h   | 15 +++
 4 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features/xsc.ini b/doc/guides/nics/features/xsc.ini
index 84c5ff4b6b..d73cf9d136 100644
--- a/doc/guides/nics/features/xsc.ini
+++ b/doc/guides/nics/features/xsc.ini
@@ -12,6 +12,7 @@ L3 checksum offload  = Y
 L4 checksum offload  = Y
 Inner L3 checksum= Y
 Inner L4 checksum= Y
+Basic stats  = Y
 Linux= Y
 ARMv8= Y
 x86-64   = Y
diff --git a/drivers/net/xsc/xsc_ethdev.c b/drivers/net/xsc/xsc_ethdev.c
index 0c8a620d03..b3ae7ba75e 100644
--- a/drivers/net/xsc/xsc_ethdev.c
+++ b/drivers/net/xsc/xsc_ethdev.c
@@ -1089,6 +1089,80 @@ xsc_ethdev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
return 0;
 }
 
+static int
+xsc_ethdev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(dev);
+   uint32_t rxqs_n = priv->num_rq;
+   uint32_t txqs_n = priv->num_sq;
+   uint32_t i, idx;
+   struct xsc_rxq_data *rxq;
+   struct xsc_txq_data *txq;
+
+   memset(stats, 0, sizeof(struct rte_eth_stats));
+   for (i = 0; i < rxqs_n; ++i) {
+   rxq = xsc_rxq_get(dev, i);
+   if (unlikely(rxq == NULL))
+   continue;
+
+   idx = rxq->idx;
+   if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+   stats->q_ipackets[idx] += rxq->stats.rx_pkts;
+   stats->q_ibytes[idx] += rxq->stats.rx_bytes;
+   stats->q_errors[idx] += (rxq->stats.rx_errors +
+rxq->stats.rx_nombuf);
+   }
+   stats->ipackets += rxq->stats.rx_pkts;
+   stats->ibytes += rxq->stats.rx_bytes;
+   stats->ierrors += rxq->stats.rx_errors;
+   stats->rx_nombuf += rxq->stats.rx_nombuf;
+   }
+
+   for (i = 0; i < txqs_n; ++i) {
+   txq = xsc_txq_get(dev, i);
+   if (unlikely(txq == NULL))
+   continue;
+
+   idx = txq->idx;
+   if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+   stats->q_opackets[idx] += txq->stats.tx_pkts;
+   stats->q_obytes[idx] += txq->stats.tx_bytes;
+   stats->q_errors[idx] += txq->stats.tx_errors;
+   }
+   stats->opackets += txq->stats.tx_pkts;
+   stats->obytes += txq->stats.tx_bytes;
+   stats->oerrors += txq->stats.tx_errors;
+   }
+
+   return 0;
+}
+
+static int
+xsc_ethdev_stats_reset(struct rte_eth_dev *dev)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(dev);
+   uint32_t rxqs_n = priv->num_rq;
+   uint32_t txqs_n = priv->num_sq;
+   uint32_t i;
+   struct xsc_rxq_data *rxq;
+   struct xsc_txq_data *txq;
+
+   for (i = 0; i < rxqs_n; ++i) {
+   rxq = xsc_rxq_get(dev, i);
+   if (unlikely(rxq == NULL))
+   continue;
+   memset(&rxq->stats, 0, sizeof(struct xsc_rxq_stats));
+   }
+   for (i = 0; i < txqs_n; ++i) {
+   txq = xsc_txq_get(dev, i);
+   if (unlikely(txq == NULL))
+   continue;
+   memset(&txq->stats, 0, sizeof(struct xsc_txq_stats));
+   }
+
+   return 0;
+}
+
 static int
 xsc_ethdev_link_update(__rte_unused struct rte_eth_dev *dev,
   __rte_unused int wait_to_complete)
@@ -1104,6 +1178,8 @@ const struct eth_dev_ops xsc_dev_ops = {
.dev_set_link_up = xsc_ethdev_set_link_up,
.dev_close = xsc_ethdev_close,
.link_update = xsc_ethdev_link_update,
+   .stats_get = xsc_ethdev_stats_get,
+   .stats_reset = xsc_ethdev_stats_reset,
.dev_infos_get = xsc_ethdev_infos_get,
.rx_queue_setup = xsc_ethdev_rx_queue_setup,
.tx_queue_setup = xsc_ethdev_tx_queue_setup,
diff --git a/drivers/net/xsc/xsc_rxtx.c b/drivers/net/xsc/xsc_rxtx.c
index 7a31cd428c..8aed8f4b12 100644
--- a/drivers/net/xsc/xsc_rxtx.c
+++ b/drivers/net/xsc/xsc_rxtx.c
@@ -62,6 +62,7 @@ xsc_rx_poll_len(struct xsc_rxq_data *rxq, volatile struct xsc_cqe *cqe)
ret = check_cqe_own(cqe, rxq->cqe_n, rxq->cq_ci);
if (unlikely(ret != XSC_CQE_OWNER_SW)) {
if (unlikely(ret == XSC_CQE_OWNER_ERR)) {
+   ++rxq->stats.rx_errors;
/* TODO */
if (ret == XSC_CQE_OWNER_HW ||
ret == -1)
@@ -116,8 +117,10 @@ xsc_rx_burst(void *dpdk_rxq, st

[PATCH 16/19] net/xsc: configure xsc device hardware table

2024-09-06 Thread WanRenyong
Configure hardware table to enable transmission and reception of
the queues.

Signed-off-by: WanRenyong 
Signed-off-by: Xiaoxiong Zhang 
---
 drivers/net/xsc/meson.build  |   1 +
 drivers/net/xsc/xsc_ctrl.h   |  22 +
 drivers/net/xsc/xsc_ethdev.c |  39 
 drivers/net/xsc/xsc_flow.c   | 167 +++
 drivers/net/xsc/xsc_flow.h   |  67 ++
 5 files changed, 296 insertions(+)
 create mode 100644 drivers/net/xsc/xsc_flow.c
 create mode 100644 drivers/net/xsc/xsc_flow.h

diff --git a/drivers/net/xsc/meson.build b/drivers/net/xsc/meson.build
index 2fc4e5ace7..348e8ed145 100644
--- a/drivers/net/xsc/meson.build
+++ b/drivers/net/xsc/meson.build
@@ -12,6 +12,7 @@ sources = files(
 'xsc_utils.c',
 'xsc_ctrl.c',
 'xsc_rxtx.c',
+'xsc_flow.c',
 )
 
 libnames = ['ibverbs']
diff --git a/drivers/net/xsc/xsc_ctrl.h b/drivers/net/xsc/xsc_ctrl.h
index e51847d68f..51bda47ca8 100644
--- a/drivers/net/xsc/xsc_ctrl.h
+++ b/drivers/net/xsc/xsc_ctrl.h
@@ -41,6 +41,12 @@ enum xsc_ioctl_opmod {
((_n) + (_d) - 1) / (_d); \
 })
 
+enum {
+   XSC_CMD_OP_MODIFY_RAW_QP = 0x81f,
+   XSC_CMD_OP_IOCTL_FLOW= 0x900,
+   XSC_CMD_OP_MAX
+};
+
 enum {
XSC_IOCTL_SET_QP_STATUS = 0x200,
XSC_IOCTL_SET_MAX
@@ -72,6 +78,22 @@ enum {
XSC_QUEUE_TYPE_RAW_TX = 7,
XSC_QUEUE_TYPE_INVALID= 0xFF,
 };
+enum  xsc_flow_tbl_id {
+   XSC_FLOW_TBL_IPAT = 0,
+   XSC_FLOW_TBL_PCT_V4 = 4,
+   XSC_FLOW_TBL_EPAT = 19,
+   XSC_FLOW_TBL_MAX
+};
+
+enum xsc_ioctl_op {
+   XSC_IOCTL_OP_ADD,
+   XSC_IOCTL_OP_DEL,
+   XSC_IOCTL_OP_GET,
+   XSC_IOCTL_OP_CLR,
+   XSC_IOCTL_OP_MOD,
+   XSC_IOCTL_OP_MAX
+};
+
 
 struct xsc_inbox_hdr {
__be16 opcode;
diff --git a/drivers/net/xsc/xsc_ethdev.c b/drivers/net/xsc/xsc_ethdev.c
index 991978dd1c..d3e044e740 100644
--- a/drivers/net/xsc/xsc_ethdev.c
+++ b/drivers/net/xsc/xsc_ethdev.c
@@ -11,6 +11,7 @@
 #include "xsc_dev.h"
 #include "xsc_ethdev.h"
 #include "xsc_utils.h"
+#include "xsc_flow.h"
 #include "xsc_ctrl.h"
 #include "xsc_rxtx.h"
 
@@ -787,6 +788,43 @@ xsc_rxq_start(struct rte_eth_dev *dev)
return -rte_errno;
 }
 
+static int
+xsc_dev_start_config_hw(struct rte_eth_dev *dev)
+{
+   struct xsc_ethdev_priv *priv = TO_XSC_ETHDEV_PRIV(dev);
+   struct xsc_hwinfo *hwinfo;
+   int peer_dstinfo = 0;
+   int peer_logicalport = 0;
+   int logical_port = 0;
+   int local_dstinfo = 0;
+   int pcie_logic_port = 0;
+   int qp_set_id = 0;
+   int rep_id;
+   struct xsc_rxq_data *rxq = xsc_rxq_get(dev, 0);
+   uint16_t rx_qpn = (uint16_t)rxq->qpn;
+   static int xsc_global_pct_priority_idx = 128;
+
+   if (priv->funcid_type != XSC_PHYPORT_MAC_FUNCID)
+   return -1;
+
+   hwinfo = &priv->xdev->hwinfo;
+   rep_id = priv->representor_id;
+   peer_dstinfo = hwinfo->mac_phy_port;
+   peer_logicalport = hwinfo->mac_phy_port;
+
+   qp_set_id = rep_id % 511 + 1;
+   logical_port = priv->xdev->vfos_logical_in_port + qp_set_id - 1;
+   local_dstinfo = logical_port;
+   pcie_logic_port = hwinfo->pcie_no + 8;
+
+   xsc_create_ipat(dev, logical_port, peer_dstinfo);
+   xsc_create_epat(dev, local_dstinfo, pcie_logic_port,
+   rx_qpn - hwinfo->raw_rss_qp_id_base, priv->num_rq);
+   xsc_create_pct(dev, logical_port, peer_dstinfo, xsc_global_pct_priority_idx++);
+   xsc_create_pct(dev, peer_logicalport, local_dstinfo, xsc_global_pct_priority_idx++);
+   return 0;
+}
+
 static int
 xsc_ethdev_start(struct rte_eth_dev *dev)
 {
@@ -812,6 +850,7 @@ xsc_ethdev_start(struct rte_eth_dev *dev)
dev->rx_pkt_burst = xsc_rx_burst;
dev->tx_pkt_burst = xsc_tx_burst;
 
+   ret = xsc_dev_start_config_hw(dev);
return 0;
 
 error:
diff --git a/drivers/net/xsc/xsc_flow.c b/drivers/net/xsc/xsc_flow.c
new file mode 100644
index 00..f1101b29d0
--- /dev/null
+++ b/drivers/net/xsc/xsc_flow.c
@@ -0,0 +1,167 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2024 Yunsilicon Technology Co., Ltd.
+ */
+
+#include 
+#include 
+
+#include "xsc_log.h"
+#include "xsc_defs.h"
+#include "xsc_dev.h"
+#include "xsc_ethdev.h"
+#include "xsc_utils.h"
+#include "xsc_flow.h"
+#include "xsc_ctrl.h"
+
+
+static int
+xsc_flow_exec(struct xsc_dev *dev, void *cmd, int len, int table, int opmod)
+{
+   struct xsc_ioctl_data_tl *tl;
+   struct xsc_ioctl_mbox_in *in;
+   struct xsc_ioctl_mbox_out *out;
+   int in_len;
+   int out_len;
+   int data_len;
+   int cmd_len;
+   int ret;
+
+   data_len = sizeof(struct xsc_ioctl_data_tl) + len;
+   in_len = sizeof(struct xsc_ioctl_mbox_in) + data_len;
+   out_len = sizeof(struct xsc_ioctl_mbox_out) + data_len;
+   cmd_len = RTE_MAX(in_len, out_len);
+   in = rte_zmalloc(NULL, cmd_len, RTE_CACHE_LINE_SIZE);
+   if (in == NULL) {
+

RE: [RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in DPDK

2024-09-06 Thread Morten Brørup
> From: Anatoly Burakov [mailto:anatoly.bura...@intel.com]
> Sent: Friday, 6 September 2024 13.47
> To: dev@dpdk.org
> Subject: [RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in DPDK
>
> While initially, DPDK has used the term "socket ID" to refer to physical
> package ID, the last time DPDK read "physical_package_id" for socket ID was
> ~9 years ago, so it's been a while since we've actually switched over to
> using the term "socket" to mean "NUMA node".
>
> This wasn't a problem before, as most systems had one NUMA node per physical
> socket. However, in the last few years, more and more systems have multiple
> NUMA nodes per physical CPU socket. Since DPDK used NUMA nodes already, the
> transition was pretty seamless, however now we're faced with a situation when
> most of our documentation still uses outdated terms, and our API is rife with
> references to "sockets" when in actuality we mean "NUMA nodes". This could be
> a source of confusion.
>
> While completely renaming all of our APIs would be a huge effort, would take
> a long time and arguably wouldn't even be worth the API breakages (given that
> this mismatch between terminology and reality is implicitly understood by
> most people working on DPDK, and so this isn't so much of a problem in
> practice), we can do some tweaks around the edges and at least document this
> unfortunate reality.
>
> This patchset suggests the following changes:
>
> - Update rte_socket/rte_lcore documentation to refer to NUMA nodes rather
>   than sockets
> - Rename internal structures' fields to better reflect this intention
> - Rename --socket-mem/--socket-limit flags to refer to NUMA rather than
>   sockets
> - Add internal API to get physical package ID [1]
>
> The documentation is updated to refer to new EAL flags, but is otherwise left
> untouched, and instead the entry in "glossary" is amended to indicate that
> when DPDK documentation refers to "sockets", it actually means "NUMA IDs". As
> next steps, we could rename all API parameters to refer to NUMA ID rather
> than socket ID - this would break neither API nor ABI, and instead would be a
> documentation change in practice.
>
> [1] This could be used to group lcores by physical package, see e.g.
> discussion under this patch:
> https://patches.dpdk.org/project/dpdk/cover/20240827151014.201-1-vipin.vargh...@amd.com/

Thank you for cleaning this up, Anatoly.

I would prefer to take one more step and also rename functions and parameters, 
e.g. rte_socket_id() -> rte_numa_id().

For backwards compatibility, macros/functions with the old names can be added.
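
To make the suggestion concrete, here is a rough sketch of what such a compatibility layer could look like. It is standalone and does not include any DPDK header; every demo_/DEMO_ name is hypothetical, and the stub body only stands in for the real per-thread lookup.

```c
#include <assert.h>

/* Hypothetical value and names, for illustration only. */
#define DEMO_NUMA_ID_ANY   (-1)
#define DEMO_SOCKET_ID_ANY DEMO_NUMA_ID_ANY /* old spelling, same value */

static_assert(DEMO_SOCKET_ID_ANY == DEMO_NUMA_ID_ANY,
	      "old and new constants must stay interchangeable");

/* New name. The body is a stub standing in for the real per-thread lookup. */
static inline unsigned int
demo_numa_id(void)
{
	return 1;
}

/* Old name kept as a thin wrapper so existing callers keep building. */
static inline unsigned int
demo_socket_id(void)
{
	return demo_numa_id();
}
```

A wrapper function (rather than a bare macro alias) leaves room to mark the old name as deprecated later, for instance with DPDK's __rte_deprecated attribute, without touching callers right away.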



[PATCH v1 1/3] crypto/ipsec_mb: add SM3 algorithm support

2024-09-06 Thread Brian Dooley
This patch introduces SM3 algorithm support to the AESNI_MB PMD.

Signed-off-by: Brian Dooley 
---
 drivers/crypto/ipsec_mb/pmd_aesni_mb.c  |  5 
 drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h | 26 -
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/ipsec_mb/pmd_aesni_mb.c b/drivers/crypto/ipsec_mb/pmd_aesni_mb.c
index ef4228bd38..4711b7f590 100644
--- a/drivers/crypto/ipsec_mb/pmd_aesni_mb.c
+++ b/drivers/crypto/ipsec_mb/pmd_aesni_mb.c
@@ -377,6 +377,11 @@ aesni_mb_set_session_auth_parameters(IMB_MGR *mb_mgr,
sess->template_job.hash_alg = IMB_AUTH_SHA_512;
auth_precompute = 0;
break;
+#if IMB_VERSION(1, 5, 0) <= IMB_VERSION_NUM
+   case RTE_CRYPTO_AUTH_SM3:
+   sess->template_job.hash_alg = IMB_AUTH_SM3;
+   break;
+#endif
default:
IPSEC_MB_LOG(ERR,
"Unsupported authentication algorithm selection");
diff --git a/drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h b/drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h
index d6af2d4ded..90d2702a01 100644
--- a/drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h
+++ b/drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h
@@ -732,6 +732,27 @@ static const struct rte_cryptodev_capabilities aesni_mb_capabilities[] = {
}, }
}, }
},
+   {   /* SM3 */
+   .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+   {.sym = {
+   .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH,
+   {.auth = {
+   .algo = RTE_CRYPTO_AUTH_SM3,
+   .block_size = 64,
+   .key_size = {
+   .min = 0,
+   .max = 0,
+   .increment = 0
+   },
+   .digest_size = {
+   .min = 32,
+   .max = 32,
+   .increment = 1
+   },
+   .iv_size = { 0 }
+   }, }
+   }, }
+   },
RTE_CRYPTODEV_END_OF_CAPABILITIES_LIST()
 };
 
@@ -840,7 +861,10 @@ static const unsigned int auth_digest_byte_lengths[] = {
[IMB_AUTH_SHA_512]  = 64,
[IMB_AUTH_ZUC_EIA3_BITLEN]  = 4,
[IMB_AUTH_SNOW3G_UIA2_BITLEN]   = 4,
-   [IMB_AUTH_KASUMI_UIA1]  = 4
+   [IMB_AUTH_KASUMI_UIA1]  = 4,
+#if IMB_VERSION(1, 5, 0) <= IMB_VERSION_NUM
+   [IMB_AUTH_SM3]  = 32
+#endif
/**< Vector mode dependent pointer table of the multi-buffer APIs */
 
 };
-- 
2.25.1
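
A note on the `#if IMB_VERSION(1, 5, 0) <= IMB_VERSION_NUM` guards used in this patch: the sketch below models how such a guard compiles code in or out. It is a standalone illustration under stated assumptions. The DEMO_ macros and enum are hypothetical, and the bit packing of the version number is assumed here rather than taken from the intel-ipsec-mb headers.

```c
#include <assert.h>

/* Assumption: major.minor.patch packed so that versions compare as integers. */
#define DEMO_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Pretend the library headers report 1.5.0; lower it to see the guard flip. */
#define DEMO_VERSION_NUM DEMO_VERSION(1, 5, 0)

static_assert(DEMO_VERSION(1, 5, 0) > DEMO_VERSION(1, 4, 255),
	      "packed versions must order like major.minor.patch");

enum demo_auth {
	DEMO_AUTH_SHA_512,
#if DEMO_VERSION(1, 5, 0) <= DEMO_VERSION_NUM
	DEMO_AUTH_SM3, /* only exists when the library is new enough */
#endif
	DEMO_AUTH_MAX
};

/* Digest length in bytes, or -1 for an algorithm this build does not know. */
static int
demo_digest_len(enum demo_auth alg)
{
	switch (alg) {
	case DEMO_AUTH_SHA_512:
		return 64;
#if DEMO_VERSION(1, 5, 0) <= DEMO_VERSION_NUM
	case DEMO_AUTH_SM3:
		return 32;
#endif
	default:
		return -1;
	}
}
```

Because the guard is evaluated by the preprocessor, an older library build never sees the SM3 identifiers at all, and requests for them fall through to the existing "unsupported" default path.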



[PATCH v1 2/3] crypto/ipsec_mb: add HMAC SM3 algorithm support

2024-09-06 Thread Brian Dooley
This patch introduces HMAC SM3 algorithm support to the AESNI_MB PMD.

Signed-off-by: Brian Dooley 
---
 drivers/crypto/ipsec_mb/pmd_aesni_mb.c  |  3 +++
 drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h | 24 -
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/ipsec_mb/pmd_aesni_mb.c b/drivers/crypto/ipsec_mb/pmd_aesni_mb.c
index 4711b7f590..019867fe1c 100644
--- a/drivers/crypto/ipsec_mb/pmd_aesni_mb.c
+++ b/drivers/crypto/ipsec_mb/pmd_aesni_mb.c
@@ -381,6 +381,9 @@ aesni_mb_set_session_auth_parameters(IMB_MGR *mb_mgr,
case RTE_CRYPTO_AUTH_SM3:
sess->template_job.hash_alg = IMB_AUTH_SM3;
break;
+   case RTE_CRYPTO_AUTH_SM3_HMAC:
+   sess->template_job.hash_alg = IMB_AUTH_HMAC_SM3;
+   break;
 #endif
default:
IPSEC_MB_LOG(ERR,
diff --git a/drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h b/drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h
index 90d2702a01..24c2686952 100644
--- a/drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h
+++ b/drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h
@@ -753,6 +753,27 @@ static const struct rte_cryptodev_capabilities aesni_mb_capabilities[] = {
}, }
}, }
},
+   {   /* HMAC SM3 */
+   .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+   {.sym = {
+   .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH,
+   {.auth = {
+   .algo = RTE_CRYPTO_AUTH_SM3_HMAC,
+   .block_size = 64,
+   .key_size = {
+   .min = 1,
+   .max = 65535,
+   .increment = 1
+   },
+   .digest_size = {
+   .min = 32,
+   .max = 32,
+   .increment = 1
+   },
+   .iv_size = { 0 }
+   }, }
+   }, }
+   },
RTE_CRYPTODEV_END_OF_CAPABILITIES_LIST()
 };
 
@@ -863,7 +884,8 @@ static const unsigned int auth_digest_byte_lengths[] = {
[IMB_AUTH_SNOW3G_UIA2_BITLEN]   = 4,
[IMB_AUTH_KASUMI_UIA1]  = 4,
 #if IMB_VERSION(1, 5, 0) <= IMB_VERSION_NUM
-   [IMB_AUTH_SM3]  = 32
+   [IMB_AUTH_SM3]  = 32,
+   [IMB_AUTH_HMAC_SM3] = 32,
 #endif
/**< Vector mode dependent pointer table of the multi-buffer APIs */
 
-- 
2.25.1



[PATCH v1 3/3] crypto/ipsec_mb: add SM4 algorithm support

2024-09-06 Thread Brian Dooley
This patch introduces SM4 algorithm support to the AESNI_MB PMD.

Signed-off-by: Brian Dooley 
---
 drivers/crypto/ipsec_mb/pmd_aesni_mb.c  | 22 ++
 drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h | 47 +
 2 files changed, 69 insertions(+)

diff --git a/drivers/crypto/ipsec_mb/pmd_aesni_mb.c b/drivers/crypto/ipsec_mb/pmd_aesni_mb.c
index 019867fe1c..1dbffb1337 100644
--- a/drivers/crypto/ipsec_mb/pmd_aesni_mb.c
+++ b/drivers/crypto/ipsec_mb/pmd_aesni_mb.c
@@ -451,6 +451,9 @@ aesni_mb_set_session_cipher_parameters(const IMB_MGR *mb_mgr,
uint8_t is_zuc = 0;
uint8_t is_snow3g = 0;
uint8_t is_kasumi = 0;
+#if IMB_VERSION(1, 5, 0) <= IMB_VERSION_NUM
+   uint8_t is_sm4 = 0;
+#endif
 
if (xform == NULL) {
sess->template_job.cipher_mode = IMB_CIPHER_NULL;
@@ -521,6 +524,16 @@ aesni_mb_set_session_cipher_parameters(const IMB_MGR *mb_mgr,
sess->iv.offset = xform->cipher.iv.offset;
sess->template_job.iv_len_in_bytes = xform->cipher.iv.length;
return 0;
+#if IMB_VERSION(1, 5, 0) <= IMB_VERSION_NUM
+   case RTE_CRYPTO_CIPHER_SM4_CBC:
+   sess->template_job.cipher_mode = IMB_CIPHER_SM4_CBC;
+   is_sm4 = 1;
+   break;
+   case RTE_CRYPTO_CIPHER_SM4_ECB:
+   sess->template_job.cipher_mode = IMB_CIPHER_SM4_ECB;
+   is_sm4 = 1;
+   break;
+#endif
default:
IPSEC_MB_LOG(ERR, "Unsupported cipher mode parameter");
return -ENOTSUP;
@@ -655,6 +668,15 @@ aesni_mb_set_session_cipher_parameters(const IMB_MGR 
*mb_mgr,
&sess->cipher.pKeySched_kasumi_cipher);
sess->template_job.enc_keys = 
&sess->cipher.pKeySched_kasumi_cipher;
sess->template_job.dec_keys = 
&sess->cipher.pKeySched_kasumi_cipher;
+#if IMB_VERSION(1, 5, 0) <= IMB_VERSION_NUM
+   } else if (is_sm4) {
+   sess->template_job.key_len_in_bytes = IMB_KEY_128_BYTES;
+   IMB_SM4_KEYEXP(mb_mgr, xform->cipher.key.data,
+   sess->cipher.expanded_sm4_keys.encode,
+   sess->cipher.expanded_sm4_keys.decode);
+   sess->template_job.enc_keys = 
sess->cipher.expanded_sm4_keys.encode;
+   sess->template_job.dec_keys = 
sess->cipher.expanded_sm4_keys.decode;
+#endif
} else {
if (xform->cipher.key.length != 8) {
IPSEC_MB_LOG(ERR, "Invalid cipher key length");
diff --git a/drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h 
b/drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h
index 24c2686952..e9d605d646 100644
--- a/drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h
+++ b/drivers/crypto/ipsec_mb/pmd_aesni_mb_priv.h
@@ -774,6 +774,42 @@ static const struct rte_cryptodev_capabilities 
aesni_mb_capabilities[] = {
}, }
}, }
},
+   {   /* SM4 CBC */
+   .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+   {.sym = {
+   .xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER,
+   {.cipher = {
+   .algo = RTE_CRYPTO_CIPHER_SM4_CBC,
+   .block_size = 16,
+   .key_size = {
+   .min = 16,
+   .max = 16,
+   .increment = 0
+   },
+   .iv_size = {
+   .min = 16,
+   .max = 16,
+   .increment = 0
+   }
+   }, }
+   }, }
+   },
+   {   /* SM4 ECB */
+   .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+   {.sym = {
+   .xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER,
+   {.cipher = {
+   .algo = RTE_CRYPTO_CIPHER_SM4_ECB,
+   .block_size = 16,
+   .key_size = {
+   .min = 16,
+   .max = 16,
+   .increment = 0
+   },
+   .iv_size = { 0 }
+   }, }
+   }, }
+   },
RTE_CRYPTODEV_END_OF_CAPABILITIES_LIST()
 };
 
@@ -951,6 +987,17 @@ struct __rte_cache_aligned aesni_mb_session {
/* *< SNOW3G scheduled cipher key */
kasumi_key_sched_t pKeySched_kasumi_cipher;
/* *< KASUMI scheduled cipher key */
+#if IMB_VERSION(1, 5, 0) <= IMB_VERSION_NUM
+   struct {
+   alignas(16) uint32_t 
encode[IMB_SM4_KEY_SCHED

Re: [RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in DPDK

2024-09-06 Thread Burakov, Anatoly

On 9/6/2024 2:37 PM, Morten Brørup wrote:

From: Anatoly Burakov [mailto:anatoly.bura...@intel.com]
Sent: Friday, 6 September 2024 13.47
To: dev@dpdk.org
Subject: [RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in DPDK

While initially DPDK used the term "socket ID" to refer to the physical package
ID, the last time DPDK read "physical_package_id" for socket ID was ~9 years
ago, so it's been a while since we actually switched over to using the term
"socket" to mean "NUMA node".

This wasn't a problem before, as most systems had one NUMA node per physical
socket. However, in the last few years, more and more systems have multiple NUMA
nodes per physical CPU socket. Since DPDK used NUMA nodes already, the
transition was pretty seamless; however, we're now faced with a situation where
most of our documentation still uses outdated terms, and our API is rife with
references to "sockets" when in actuality we mean "NUMA nodes". This could be a
source of confusion.

While completely renaming all of our APIs would be a huge effort, would take a
long time, and arguably wouldn't even be worth the API breakages (given that this
mismatch between terminology and reality is implicitly understood by most people
working on DPDK, and so isn't much of a problem in practice), we can do
some tweaks around the edges and at least document this unfortunate reality.

This patchset suggests the following changes:

- Update rte_socket/rte_lcore documentation to refer to NUMA nodes rather than
  sockets
- Rename internal structures' fields to better reflect this intention
- Rename --socket-mem/--socket-limit flags to refer to NUMA rather than sockets
- Add internal API to get physical package ID [1]

The documentation is updated to refer to new EAL flags, but is otherwise left
untouched, and instead the entry in "glossary" is amended to indicate that when
DPDK documentation refers to "sockets", it actually means "NUMA IDs". As next
steps, we could rename all API parameters to refer to NUMA ID rather than socket
ID - this would break neither API nor ABI, and instead would be a
documentation change in practice.

[1] This could be used to group lcores by physical package, see e.g. discussion
    under this patch:
https://patches.dpdk.org/project/dpdk/cover/20240827151014.201-1-
vipin.vargh...@amd.com/


Thank you for cleaning this up, Anatoly.

I would prefer to take one more step and also rename functions and parameters, 
e.g. rte_socket_id() -> rte_numa_id().

For backwards compatibility, macros/functions with the old names can be added.



I don't think we can do such changes without deprecation notices, but 
it's a good candidate for next release.


I have thought about including parameter renames in this patchset, but 
for now I decided against doing so. I can certainly include this in the 
next revision if that's something community is willing to accept.


--
Thanks,
Anatoly



RE: [PATCH v2 1/3] app/testpmd: add register keyword

2024-09-06 Thread Varghese, Vipin




> > > >> --- a/app/test-pmd/macswap_sse.h
> > > >> +++ b/app/test-pmd/macswap_sse.h
> > > >> @@ -16,13 +16,13 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t
> nb,
> > > >>uint64_t ol_flags;
> > > >>int i;
> > > >>int r;
> > > >> - __m128i addr0, addr1, addr2, addr3;
> > > >> + register __m128i addr0, addr1, addr2, addr3;
> > > > Some compilers treat register as a no-op. Are you sure? Did you check
> with godbolt.
> > >
> > > Thank you Stephen, I have tested the code changes on Linux using GCC
> > > and Clang compiler.
> > >
> > > In both cases in the Linux environment, we have seen the values
> > > loaded onto register `xmm`.
> > >
> > > ```
> > > register const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4,
> > > 3, 2, 1, 0, 11, 10, 9, 8, 7, 6);
> > > vmovdqa xmm0, xmmword ptr [rip + .LCPI0_0]
>
> Yep, that what I would probably expect: one time load before the loop starts,
> right?
> Curious  what exactly it would generate then if 'register' keyword is missed?
> BTW, on my box,  gcc-11  with '-O3 -msse4.2 ...'  I am seeing expected
> behavior without 'register' keyword.
> Is it some particular compiler version that misbehaves?

Thank you, Konstantin, for this pointer. I have been trying to understand 
this a bit more internally. Here are my observations:

1. The shuffle SIMD instruction works on XMM registers only.
2. Any value held in a variable has to be loaded into an `xmm` register before 
processing.
3. When compiled with `-march=native` by a compiler that is not aware of the 
target (SoC arch gcc weights), the code without the patch might be generated 
with `movzx   eax, BYTE PTR [rbp-48]`.
4. When the register keyword is applied to both shfl_msk and addr, the compiler 
tries to get the variables directly into xmm using `vmovdqu (%rsi),%xmm1`.

So, I think you are right: from gcc 12.3 and gcc 13.1 onward, which support 
`-march=znver4`, this problem will not occur.

>
> > >
> > > ```
> > >
> > > Both cases we have performance improvement.
> > >
> > >
> > > Can you please help us understand if we have missed out something?
> >
> > Ok, not sure why compiler would not decide to already use a register here?


RE: [RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in DPDK

2024-09-06 Thread Morten Brørup
> From: Burakov, Anatoly [mailto:anatoly.bura...@intel.com]
> Sent: Friday, 6 September 2024 14.46
> 
> On 9/6/2024 2:37 PM, Morten Brørup wrote:
> >> From: Anatoly Burakov [mailto:anatoly.bura...@intel.com]
> >> Sent: Friday, 6 September 2024 13.47
> >> To: dev@dpdk.org
> >> Subject: [RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in DPDK
> >>
> >> While initially, DPDK has used the term "socket ID" to refer to physical
> >> package
> >> ID, the last time DPDK read "physical_package_id" for socket ID was ~9
> years
> >> ago, so it's been a while since we've actually switched over to using the
> term
> >> "socket" to mean "NUMA node".
> >>
> >> This wasn't a problem before, as most systems had one NUMA node per
> physical
> >> socket. However, in the last few years, more and more systems have multiple
> >> NUMA
> >> nodes per physical CPU socket. Since DPDK used NUMA nodes already, the
> >> transition was pretty seamless, however now we're faced with a situation
> when
> >> most of our documentation still uses outdated terms, and our API is ripe
> with
> >> references to "sockets" when in actuality we mean "NUMA nodes". This could
> be
> >> a
> >> source of confusion.
> >>
> >> While completely renaming all of our API's would be a huge effort, will
> take a
> >> long time and arguably wouldn't even be worth the API breakages (given that
> >> this
> >> mismatch between terminology and reality is implicitly understood by most
> >> people
> >> working on DPDK, and so this isn't so much of a problem in practice), we
> can
> >> do
> >> some tweaks around the edges and at least document this unfortunate
> reality.
> >>
> >> This patchset suggests the following changes:
> >>
> >> - Update rte_socket/rte_lcore documentation to refer to NUMA nodes rather
> than
> >> sockets - Rename internal structures' fields to better reflect this
> intention
> >> -
> >> Rename --socket-mem/--socket-limit flags to refer to NUMA rather than
> sockets
> >> -
> >> Add internal API to get physical package ID [1]
> >>
> >> The documentation is updated to refer to new EAL flags, but is otherwise
> left
> >> untouched, and instead the entry in "glossary" is amended to indicate that
> >> when
> >> DPDK documentation refers to "sockets", it actually means "NUMA ID's". As
> next
> >> steps, we could rename all API parameters to refer to NUMA ID rather than
> >> socket
> >> ID - this would not break neither API nor ABI, and instead would be a
> >> documentation change in practice.
> >>
> >> [1] This could be used to group lcores by physical package, see e.g.
> >> discussion
> >>  under this patch:
> >> https://patches.dpdk.org/project/dpdk/cover/20240827151014.201-1-
> >> vipin.vargh...@amd.com/
> >
> > Thank you for cleaning this up, Anatoly.
> >
> > I would prefer to take one more step and also rename functions and
> parameters, e.g. rte_socket_id() -> rte_numa_id().
> >
> > For backwards compatibility, macros/functions with the old names can be
> added.
> >
> 
> I don't think we can do such changes without deprecation notices, but
> it's a good candidate for next release.

Perhaps we can keep ABI compatibility by adding wrapper functions with the old 
names/parameters, which simply call the same functions with the new 
names/parameters.

The Devil is in the details, and I haven't looked deeply into this. So take 
with a grain of salt.
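
To make the wrapper idea concrete, here is a minimal sketch. All names (`demo_numa_id`, `demo_socket_id`, `demo_current_node`) are hypothetical stand-ins, not the real DPDK symbols, and the sketch assumes the old entry point stays an exported function so that already-built binaries keep resolving it.

```c
#include <stdint.h>

/* Hypothetical state standing in for whatever the library tracks. */
static int demo_current_node = 3;

/* New name: carries the actual implementation. */
int
demo_numa_id(void)
{
	return demo_current_node;
}

/*
 * Old name: kept as a thin wrapper with the original signature.
 * Existing callers still link against this symbol and get the
 * same result as callers of the new name.
 */
int
demo_socket_id(void)
{
	return demo_numa_id();
}
```

Whether the old symbol would additionally be marked deprecated, and for how long it is kept, is a policy question this sketch does not try to answer.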

> 
> I have thought about including parameter renames in this patchset, but
> for now I decided against doing so. I can certainly include this in the
> next revision if that's something community is willing to accept.

I agree with your decision on this. Renaming the parameters without renaming 
the functions could be confusing.

If we cannot take the additional step to rename the functions, let's also not 
rename their parameters.

> 
> --
> Thanks,
> Anatoly



Re: [RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in DPDK

2024-09-06 Thread Bruce Richardson
On Fri, Sep 06, 2024 at 03:02:53PM +0200, Morten Brørup wrote:
> > From: Burakov, Anatoly [mailto:anatoly.bura...@intel.com]
> > Sent: Friday, 6 September 2024 14.46
> > 
> > On 9/6/2024 2:37 PM, Morten Brørup wrote:
> > >> From: Anatoly Burakov [mailto:anatoly.bura...@intel.com]
> > >> Sent: Friday, 6 September 2024 13.47
> > >> To: dev@dpdk.org
> > >> Subject: [RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in DPDK
> > >>
> > >> While initially, DPDK has used the term "socket ID" to refer to physical
> > >> package
> > >> ID, the last time DPDK read "physical_package_id" for socket ID was ~9
> > years
> > >> ago, so it's been a while since we've actually switched over to using the
> > term
> > >> "socket" to mean "NUMA node".
> > >>
> > >> This wasn't a problem before, as most systems had one NUMA node per
> > physical
> > >> socket. However, in the last few years, more and more systems have 
> > >> multiple
> > >> NUMA
> > >> nodes per physical CPU socket. Since DPDK used NUMA nodes already, the
> > >> transition was pretty seamless, however now we're faced with a situation
> > when
> > >> most of our documentation still uses outdated terms, and our API is ripe
> > with
> > >> references to "sockets" when in actuality we mean "NUMA nodes". This 
> > >> could
> > be
> > >> a
> > >> source of confusion.
> > >>
> > >> While completely renaming all of our API's would be a huge effort, will
> > take a
> > >> long time and arguably wouldn't even be worth the API breakages (given 
> > >> that
> > >> this
> > >> mismatch between terminology and reality is implicitly understood by most
> > >> people
> > >> working on DPDK, and so this isn't so much of a problem in practice), we
> > can
> > >> do
> > >> some tweaks around the edges and at least document this unfortunate
> > reality.
> > >>
> > >> This patchset suggests the following changes:
> > >>
> > >> - Update rte_socket/rte_lcore documentation to refer to NUMA nodes rather
> > than
> > >> sockets - Rename internal structures' fields to better reflect this
> > intention
> > >> -
> > >> Rename --socket-mem/--socket-limit flags to refer to NUMA rather than
> > sockets
> > >> -
> > >> Add internal API to get physical package ID [1]
> > >>
> > >> The documentation is updated to refer to new EAL flags, but is otherwise
> > left
> > >> untouched, and instead the entry in "glossary" is amended to indicate 
> > >> that
> > >> when
> > >> DPDK documentation refers to "sockets", it actually means "NUMA ID's". As
> > next
> > >> steps, we could rename all API parameters to refer to NUMA ID rather than
> > >> socket
> > >> ID - this would not break neither API nor ABI, and instead would be a
> > >> documentation change in practice.
> > >>
> > >> [1] This could be used to group lcores by physical package, see e.g.
> > >> discussion
> > >>  under this patch:
> > >> https://patches.dpdk.org/project/dpdk/cover/20240827151014.201-1-
> > >> vipin.vargh...@amd.com/
> > >
> > > Thank you for cleaning this up, Anatoly.
> > >
> > > I would prefer to take one more step and also rename functions and
> > parameters, e.g. rte_socket_id() -> rte_numa_id().
> > >
> > > For backwards compatibility, macros/functions with the old names can be
> > added.
> > >
> > 
> > I don't think we can do such changes without deprecation notices, but
> > it's a good candidate for next release.
> 
> Perhaps we can keep ABI compatibility by adding wrapper functions with the 
> old names/parameters, which simply call the same functions with the new 
> names/parameters.
> 
> The Devil is in the details, and I haven't looked deeply into this. So take 
> with a grain of salt.
> 
> > 
> > I have thought about including parameter renames in this patchset, but
> > for now I decided against doing so. I can certainly include this in the
> > next revision if that's something community is willing to accept.
> 
> I agree with your decision on this. Renaming the parameters without renaming 
> the functions could be confusing.
> 

I actually wonder if that is true. If we are simply renaming the parameters
without:
a) changing their types
b) changing the function behaviour
then it is neither an API nor an ABI break. If we were to do so, it would
be like changing a comment, since the actual parameter name is purely a
convenience to hint to the user what the value being passed actually does.

That only applies for function parameters though. For any defines or macros
that need renaming, then we are into API break territory and we would want
backward compatible versions of same.
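
A small sketch of the two cases, using made-up names (`demo_mem_node_size`, `DEMO_NUMA_ID_ANY`, `DEMO_SOCKET_ID_ANY`) rather than real DPDK ones: C treats declarations that differ only in parameter names as the same function, while a renamed macro needs an alias under the old name to keep existing source code compiling.

```c
#include <stdint.h>

/* Declaration as older headers might have spelled it. */
int demo_mem_node_size(int socket_id);

/* Same function redeclared with a renamed parameter: the type and the
 * behaviour are unchanged, so this is a compatible declaration. */
int demo_mem_node_size(int numa_id);

int
demo_mem_node_size(int numa_id)
{
	return numa_id < 0 ? 0 : 1024 * (numa_id + 1);
}

/* A renamed define is different: old source still spells the old name,
 * so a backward compatible alias is kept next to the new one. */
#define DEMO_NUMA_ID_ANY   (-1)
#define DEMO_SOCKET_ID_ANY DEMO_NUMA_ID_ANY
```

Callers that pass arguments positionally are unaffected by the parameter rename; only code that spells the macro name depends on the alias.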

/Bruce


Re: [PATCH 0/3] eal: mark API's as stable

2024-09-06 Thread Jerin Jacob
On Fri, Sep 6, 2024 at 3:04 PM Ferruh Yigit  wrote:
>
> On 9/5/2024 8:58 AM, David Marchand wrote:
> > On Wed, Sep 4, 2024 at 8:10 PM Stephen Hemminger
> >  wrote:
> >>
> >> The API's in ethtool from before 23.11 should be marked stable.
> >
> > EAL* ?
> >
> >> Should probably include the trace api's but that is more complex change.
> >
> > On the trace API itself it should be ok.
> > The problem is with the tracepoint variables themselves, and I don't
> > think we should mark them stable.
> >
>
> We cleaned tracepoint variables from ethdev map file, why they exist for
> 'eal'?
>
> I can see .map file has bunch of "__rte_eal_trace_generic_*", I think
> they exists to support 'rte_eal_trace_generic_*()' APIs which can be
> called from other libraries.
>
> Do we really need them?
> Why not whoever calls them directly call 'rte_trace_point_emit_*' instead?
> As these rte_eal_trace_generic_*()' not used at all, I assume this is
> what done already.
>
> @Jerin,
> what do think to remove 'rte_eal_trace_generic_*()' APIs, so trace
> always keeps local to library, and don't bloat the eal .map file?

The purpose of exposing rte_eal_trace_generic_* is that, applications
can add generic trace points
in the application.


>
>


[RFCv2 0/6] Stage-Ordered API and other extensions for ring library

2024-09-06 Thread Konstantin Ananyev
From: Konstantin Ananyev 

v1 -> v2:
- dropped 
- rename 'elmst/objst' to 'meta' (Morten)
- introduce new data-path APIs set: one with both meta{} and objs[],
  second with just objs[] (Morten)
- split data-path APIs into burst/bulk flavours (same as rte_ring)
- added dump function for rte_soring and improved dump() for rte_ring.
- dropped patch from v1:
  " ring: minimize reads of the counterpart cache-line"
  - no performance gain observed
  - actually it does change behavior of conventional rte_ring
    enqueue/dequeue APIs -
    it could return fewer available/free entries than actually exist in
    the ring. As in some other libs we rely on that information - it will
    introduce problems.

The main aim of this series is to extend the ring library with a
new API that allows the user to create/use a Staged-Ordered-Ring (SORING)
abstraction. In addition to that, there are a few other patches that serve
different purposes:
- first two patches are just code reordering to de-duplicate
  and generalize existing rte_ring code.
- patch #3 extends rte_ring_dump() to correctly print head/tail metadata
  for different sync modes.
- next two patches introduce SORING API into the ring library and
  provide UT for it.
- patch #6 extends l3fwd sample app to work in pipeline (worker-pool) mode.
  Right now it is done for demonstration and performance comparison
  purposes, as it makes possible to run l3fwd in different modes:
  run-to-completion, eventdev, pipeline
  and perform sort-of 'apple-to-apple' performance comparisons.
  I am aware that the general community consensus on l3fwd is to keep its
  functionality simple and limited. On the other hand, we already have an
  eventdev mode for it, so why should pipeline be prohibited?
  Though if l3fwd is not an option, then we need to select some other
  existing sample app to integrate with. Probably ipsec-secgw would be the
  second best choice from my perspective, though it would require much more
  effort.
  I have to say that the current l3fwd patch is way too big and unfinished,
  so if we decide to go forward with it, it has to be split and reworked.

Seeking community help/feedback (apart from usual patch review activity):
=========================================================================
- While we tested these changes quite extensively, our platform coverage
  is limited to x86.
  So we would appreciate feedback on how it behaves on other architectures
  DPDK supports (ARM, PPC, etc.).
- Adding new (pipeline) mode for l3fwd sample app.
  Is it worth it? If not, what other sample app should be used to
  demonstrate new functionality we worked on? ipsec-secgw? Something else?

SORING overview
===============
Staged-Ordered-Ring (SORING) provides a SW abstraction for 'ordered' queues
with multiple processing 'stages'. It is based on conventional DPDK rte_ring,
re-uses many of its concepts, and even substantial part of its code.
It can be viewed as an 'extension' of rte_ring functionality.
In particular, main SORING properties:
- circular ring buffer with fixed size objects
- producer, consumer plus multiple processing stages in between.
- allows to split objects processing into multiple stages.
- objects remain in the same ring while moving from one stage to the other,
  initial order is preserved, no extra copying needed.
- preserves the ingress order of objects within the queue across multiple
  stages
- each stage (and producer/consumer) can be served by single and/or
  multiple threads.
- number of stages, size and number of objects in the ring are
 configurable at ring initialization time.

Data-path API provides four main operations:
- enqueue/dequeue works in the same manner as for conventional rte_ring,
  all rte_ring synchronization types are supported.
- acquire/release - for each stage there is an acquire (start) and
  release (finish) operation. After some objects are 'acquired' - given thread
  can safely assume that it has exclusive ownership of these objects till
  it will invoke 'release' for them.
  After 'release', objects can be 'acquired' by next stage and/or dequeued
  by the consumer (in case of last stage).
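
The four operations above can be pictured with a deliberately simplified, single-threaded toy model with one processing stage. Everything here (`struct toy_soring`, the `toy_*` functions, the fixed size) is a hypothetical illustration and not the rte_soring API from this series; it only shows that objects stay in place and become visible to the consumer after 'release'.

```c
#include <stdint.h>

#define TOY_RING_SIZE 8u                     /* must be a power of two */
#define TOY_RING_MASK (TOY_RING_SIZE - 1u)

/* Toy model: producer -> one stage -> consumer, no threading. */
struct toy_soring {
	uint32_t objs[TOY_RING_SIZE];
	uint32_t prod;       /* next free slot for enqueue */
	uint32_t stage_head; /* next object the stage may acquire */
	uint32_t stage_tail; /* objects before this index are released */
	uint32_t cons;       /* next object to dequeue */
};

static uint32_t
toy_enqueue(struct toy_soring *r, const uint32_t *objs, uint32_t n)
{
	uint32_t i, free_slots = TOY_RING_SIZE - (r->prod - r->cons);

	if (n > free_slots)
		n = free_slots;
	for (i = 0; i < n; i++)
		r->objs[(r->prod + i) & TOY_RING_MASK] = objs[i];
	r->prod += n;
	return n;
}

/* Stage start: take up to n enqueued objects; *first is their ring index. */
static uint32_t
toy_acquire(struct toy_soring *r, uint32_t *first, uint32_t n)
{
	uint32_t avail = r->prod - r->stage_head;

	if (n > avail)
		n = avail;
	*first = r->stage_head;
	r->stage_head += n;
	return n;
}

/* Stage finish: hand up to n acquired objects over to the consumer. */
static void
toy_release(struct toy_soring *r, uint32_t n)
{
	uint32_t held = r->stage_head - r->stage_tail;

	if (n > held)
		n = held;
	r->stage_tail += n;
}

/* The consumer only sees objects that the stage has released. */
static uint32_t
toy_dequeue(struct toy_soring *r, uint32_t *objs, uint32_t n)
{
	uint32_t i, avail = r->stage_tail - r->cons;

	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++)
		objs[i] = r->objs[(r->cons + i) & TOY_RING_MASK];
	r->cons += n;
	return n;
}
```

In this model a dequeue attempted right after enqueue returns 0, because nothing has been released yet; the objects are processed in place between acquire and release, with no copy to a second ring.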

Expected use-case: applications that use a pipeline model
(probably with multiple stages) for packet processing, when preserving
incoming packet order is important.

The concept of ‘ring with stages’ is similar to DPDK OPDL eventdev PMD [1],
but the internals are different.
In particular, SORING maintains an internal array of 'states' for each element
in the ring that is shared by all threads/processes that access the ring.
That allows 'release' to avoid excessive waits on the tail value and helps
to improve performance and scalability.
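
One way to picture why a per-element state array helps, as a single-threaded sketch under assumptions of my own: `struct toy_stage`, the `done[]` flags and the function names are hypothetical and do not reflect the actual rte_soring internals. A later batch can be marked finished immediately, and the stage tail moves forward only over a contiguous run of finished elements, so ingress order is kept without the releasing caller waiting for earlier batches.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_STAGE_SIZE 8u                    /* must be a power of two */
#define TOY_STAGE_MASK (TOY_STAGE_SIZE - 1u)

struct toy_stage {
	bool done[TOY_STAGE_SIZE]; /* per-element state: finished or not */
	uint32_t head;             /* next element to hand out */
	uint32_t tail;             /* everything before this is released */
};

/* Hand out n consecutive elements; returns the index of the first one.
 * (Bounds against the producer are omitted to keep the sketch short.) */
static uint32_t
toy_stage_acquire(struct toy_stage *s, uint32_t n)
{
	uint32_t first = s->head;

	s->head += n;
	return first;
}

/* Mark a batch as finished, then advance the tail over every
 * contiguous finished element, starting from the current tail. */
static void
toy_stage_release(struct toy_stage *s, uint32_t first, uint32_t n)
{
	uint32_t i;

	for (i = 0; i < n; i++)
		s->done[(first + i) & TOY_STAGE_MASK] = true;

	while (s->tail != s->head && s->done[s->tail & TOY_STAGE_MASK]) {
		s->done[s->tail & TOY_STAGE_MASK] = false;
		s->tail++;
	}
}
```

If batch B is released before the earlier batch A, the tail stays put; once A is released, the tail jumps past both batches in one step.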
In terms of performance, with our measurements rte_soring and
conventional rte_ring provide nearly identical numbers.
As an example, on our SUT: Intel ICX CPU @ 2.00GHz,
l3fwd (--lookup=acl) in pipeline mode (see patch #6 for details) both
rte_ring and rte_soring reach ~20Mpps for single I/O lcore and same
number of worker 

[RFCv2 1/6] ring: common functions for 'move head' ops

2024-09-06 Thread Konstantin Ananyev
From: Konstantin Ananyev 

Note upfront: that change doesn't introduce any functional or
performance changes.
It is just a code-reordering for:
 - code deduplication
 - ability in future to re-use the same code to introduce new functionality

For each sync mode corresponding move_prod_head() and
move_cons_head() are nearly identical to each other,
the only differences are:
 - do we need to use a @capacity to calculate number of entries or not.
 - what we need to update (prod/cons) and what is used as
   read-only counterpart.
So instead of having 2 copies of nearly identical functions,
introduce a new common one that could be used by both functions:
move_prod_head() and move_cons_head().

As another positive thing - we can get rid of referencing whole rte_ring
structure in that new common sub-function.
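
As a rough illustration of the idea, here is a single-threaded sketch without atomics. The `toy_*` names and the `fixed` flag are hypothetical simplifications, not the functions touched by this patch. The only differences between the producer and consumer paths are which head/tail pair is updated and whether the capacity takes part in the calculation.

```c
#include <stdint.h>

struct toy_headtail {
	uint32_t head;
	uint32_t tail;
};

struct toy_ring {
	uint32_t capacity;
	struct toy_headtail prod;
	struct toy_headtail cons;
};

/*
 * Common helper: 'd' is the side being moved, 's' is the read-only
 * counterpart. Unsigned wrap-around keeps the arithmetic valid.
 */
static uint32_t
toy_headtail_move_head(struct toy_headtail *d, const struct toy_headtail *s,
	uint32_t capacity, uint32_t n, int fixed,
	uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
{
	*old_head = d->head;
	*entries = capacity + s->tail - *old_head;

	if (n > *entries)
		n = fixed ? 0 : *entries;
	if (n == 0)
		return 0;

	*new_head = *old_head + n;
	d->head = *new_head;
	return n;
}

/* Producer: free space = capacity + cons.tail - prod.head. */
static uint32_t
toy_move_prod_head(struct toy_ring *r, uint32_t n, int fixed,
	uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
{
	return toy_headtail_move_head(&r->prod, &r->cons, r->capacity,
			n, fixed, old_head, new_head, entries);
}

/* Consumer: available = prod.tail - cons.head, i.e. capacity of 0. */
static uint32_t
toy_move_cons_head(struct toy_ring *r, uint32_t n, int fixed,
	uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
{
	return toy_headtail_move_head(&r->cons, &r->prod, 0,
			n, fixed, old_head, new_head, entries);
}
```

Note that the helper never touches `struct toy_ring` itself, which mirrors the point about not referencing the whole ring structure in the common sub-function.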

Signed-off-by: Konstantin Ananyev 
---
 lib/ring/rte_ring_c11_pvt.h  | 134 +--
 lib/ring/rte_ring_elem_pvt.h |  66 +++
 lib/ring/rte_ring_generic_pvt.h  | 121 
 lib/ring/rte_ring_hts_elem_pvt.h |  85 ++--
 lib/ring/rte_ring_rts_elem_pvt.h |  85 ++--
 5 files changed, 149 insertions(+), 342 deletions(-)

diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index 629b2d9288..048933ddc6 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -28,41 +28,19 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, 
uint32_t old_val,
rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
 }
 
-/**
- * @internal This function updates the producer head for enqueue
- *
- * @param r
- *   A pointer to the ring structure
- * @param is_sp
- *   Indicates whether multi-producer path is needed or not
- * @param n
- *   The number of elements we will want to enqueue, i.e. how far should the
- *   head be moved
- * @param behavior
- *   RTE_RING_QUEUE_FIXED:Enqueue a fixed number of items from a ring
- *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
- * @param old_head
- *   Returns head value as it was before the move, i.e. where enqueue starts
- * @param new_head
- *   Returns the current/new head value i.e. where enqueue finishes
- * @param free_entries
- *   Returns the amount of free space in the ring BEFORE head was moved
- * @return
- *   Actual number of objects enqueued.
- *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
- */
 static __rte_always_inline unsigned int
-__rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
-   unsigned int n, enum rte_ring_queue_behavior behavior,
-   uint32_t *old_head, uint32_t *new_head,
-   uint32_t *free_entries)
+__rte_ring_headtail_move_head(struct rte_ring_headtail *d,
+   const struct rte_ring_headtail *s, uint32_t capacity,
+   unsigned int is_st, unsigned int n,
+   enum rte_ring_queue_behavior behavior,
+   uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
 {
-   const uint32_t capacity = r->capacity;
-   uint32_t cons_tail;
-   unsigned int max = n;
+   uint32_t stail;
int success;
+   unsigned int max = n;
 
-   *old_head = rte_atomic_load_explicit(&r->prod.head, 
rte_memory_order_relaxed);
+   *old_head = rte_atomic_load_explicit(&d->head,
+   rte_memory_order_relaxed);
do {
/* Reset n to the initial burst count */
n = max;
@@ -73,112 +51,36 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int 
is_sp,
/* load-acquire synchronize with store-release of ht->tail
 * in update_tail.
 */
-   cons_tail = rte_atomic_load_explicit(&r->cons.tail,
+   stail = rte_atomic_load_explicit(&s->tail,
rte_memory_order_acquire);
 
/* The subtraction is done between two unsigned 32bits value
 * (the result is always modulo 32 bits even if we have
-* *old_head > cons_tail). So 'free_entries' is always between 0
+* *old_head > s->tail). So 'free_entries' is always between 0
 * and capacity (which is < size).
 */
-   *free_entries = (capacity + cons_tail - *old_head);
+   *entries = (capacity + stail - *old_head);
 
/* check that we have enough room in ring */
-   if (unlikely(n > *free_entries))
+   if (unlikely(n > *entries))
n = (behavior == RTE_RING_QUEUE_FIXED) ?
-   0 : *free_entries;
+   0 : *entries;
 
if (n == 0)
return 0;
 
*new_head = *old_head + n;
-   if (is_sp) {
-   r->prod.head = *new_head;
+   if

[RFCv2 2/6] ring: make copying functions generic

2024-09-06 Thread Konstantin Ananyev
From: Konstantin Ananyev 

Note upfront: that change doesn't introduce any functional
or performance changes.
It is just a code-reordering for:
 - improve code modularity and re-usability
 - ability in future to re-use the same code to introduce new functionality

There is no real need for enqueue_elems()/dequeue_elems()
to get a pointer to the actual rte_ring structure; instead it is enough to pass
a pointer to the actual elements buffer inside the ring.
In return, we'll get copying functions that could be used for other
queueing abstractions that have a circular ring buffer inside.
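
A stripped-down sketch of such a copy routine, assuming 32-bit elements and leaving out the loop unrolling of the real code; `toy_enqueue_elems_32` is a hypothetical name. It needs only the buffer pointer, the buffer size in elements, and the start index, so any structure that embeds a circular buffer could call it.

```c
#include <stdint.h>

/*
 * Copy n 32-bit elements from obj_table into the circular buffer
 * ring_table (size elements long), starting at index idx and
 * wrapping around to index 0 when the end of the buffer is reached.
 */
static void
toy_enqueue_elems_32(void *ring_table, const void *obj_table,
	uint32_t size, uint32_t idx, uint32_t n)
{
	uint32_t i;
	uint32_t *ring = ring_table;
	const uint32_t *obj = obj_table;

	if (idx + n <= size) {
		/* No wrap-around: one linear copy. */
		for (i = 0; i < n; i++)
			ring[idx + i] = obj[i];
	} else {
		/* Fill up to the end of the buffer, then restart at 0. */
		for (i = 0; idx < size; i++, idx++)
			ring[idx] = obj[i];
		for (idx = 0; i < n; i++, idx++)
			ring[idx] = obj[i];
	}
}
```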

Signed-off-by: Konstantin Ananyev 
---
 lib/ring/rte_ring_elem_pvt.h | 117 ---
 1 file changed, 68 insertions(+), 49 deletions(-)

diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h
index 3a83668a08..216cb6089f 100644
--- a/lib/ring/rte_ring_elem_pvt.h
+++ b/lib/ring/rte_ring_elem_pvt.h
@@ -17,12 +17,14 @@
 #endif
 
 static __rte_always_inline void
-__rte_ring_enqueue_elems_32(struct rte_ring *r, const uint32_t size,
-   uint32_t idx, const void *obj_table, uint32_t n)
+__rte_ring_enqueue_elems_32(void *ring_table, const void *obj_table,
+   uint32_t size, uint32_t idx, uint32_t n)
 {
unsigned int i;
-   uint32_t *ring = (uint32_t *)&r[1];
+
+   uint32_t *ring = ring_table;
const uint32_t *obj = (const uint32_t *)obj_table;
+
if (likely(idx + n <= size)) {
for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
ring[idx] = obj[i];
@@ -60,14 +62,14 @@ __rte_ring_enqueue_elems_32(struct rte_ring *r, const 
uint32_t size,
 }
 
 static __rte_always_inline void
-__rte_ring_enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
-   const void *obj_table, uint32_t n)
+__rte_ring_enqueue_elems_64(void *ring_table, const void *obj_table,
+   uint32_t size, uint32_t idx, uint32_t n)
 {
unsigned int i;
-   const uint32_t size = r->size;
-   uint32_t idx = prod_head & r->mask;
-   uint64_t *ring = (uint64_t *)&r[1];
+
+   uint64_t *ring = ring_table;
const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
+
if (likely(idx + n <= size)) {
for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
ring[idx] = obj[i];
@@ -93,14 +95,14 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r, uint32_t 
prod_head,
 }
 
 static __rte_always_inline void
-__rte_ring_enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
-   const void *obj_table, uint32_t n)
+__rte_ring_enqueue_elems_128(void *ring_table, const void *obj_table,
+   uint32_t size, uint32_t idx, uint32_t n)
 {
unsigned int i;
-   const uint32_t size = r->size;
-   uint32_t idx = prod_head & r->mask;
-   rte_int128_t *ring = (rte_int128_t *)&r[1];
+
+   rte_int128_t *ring = ring_table;
const rte_int128_t *obj = (const rte_int128_t *)obj_table;
+
if (likely(idx + n <= size)) {
for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
memcpy((void *)(ring + idx),
@@ -126,37 +128,47 @@ __rte_ring_enqueue_elems_128(struct rte_ring *r, uint32_t 
prod_head,
  * single and multi producer enqueue functions.
  */
 static __rte_always_inline void
-__rte_ring_enqueue_elems(struct rte_ring *r, uint32_t prod_head,
-   const void *obj_table, uint32_t esize, uint32_t num)
+__rte_ring_do_enqueue_elems(void *ring_table, const void *obj_table,
+   uint32_t size, uint32_t idx, uint32_t esize, uint32_t num)
 {
/* 8B and 16B copies implemented individually to retain
 * the current performance.
 */
if (esize == 8)
-   __rte_ring_enqueue_elems_64(r, prod_head, obj_table, num);
+   __rte_ring_enqueue_elems_64(ring_table, obj_table, size,
+   idx, num);
else if (esize == 16)
-   __rte_ring_enqueue_elems_128(r, prod_head, obj_table, num);
+   __rte_ring_enqueue_elems_128(ring_table, obj_table, size,
+   idx, num);
else {
-   uint32_t idx, scale, nr_idx, nr_num, nr_size;
+   uint32_t scale, nr_idx, nr_num, nr_size;
 
/* Normalize to uint32_t */
scale = esize / sizeof(uint32_t);
nr_num = num * scale;
-   idx = prod_head & r->mask;
nr_idx = idx * scale;
-   nr_size = r->size * scale;
-   __rte_ring_enqueue_elems_32(r, nr_size, nr_idx,
-   obj_table, nr_num);
+   nr_size = size * scale;
+   __rte_ring_enqueue_elems_32(ring_table, obj_table, nr_size,
+   nr_idx, nr_num);
}
 }
 
 static __rte_always_inline void
-__rte_ring_dequeue_elems_32(struct rte_ring *r, const uint32_t size,
-   uint32_t idx, void *obj_table, uint32_t n)

[RFCv2 3/6] ring: make dump function more verbose

2024-09-06 Thread Konstantin Ananyev
From: Eimear Morrissey 

The current rte_ring_dump function uses the generic rte_ring_headtail
structure to access head/tail positions. This is incorrect for the RTS
case where the head is stored at a different offset in the union of
structs. Switching to a separate function for each sync type allows
dumping the correct head/tail values and extra metadata.
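
The problem can be reproduced with a toy union. The layouts below are hypothetical and only loosely modelled on the description above, not copied from rte_ring.h. Reading `head` through the generic view returns whatever bytes sit at that offset, whereas dispatching on the sync type picks the right member.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_sync_type { TOY_SYNC_MT, TOY_SYNC_MT_RTS };

/* Generic view: head first, then tail. */
struct toy_headtail {
	uint32_t head;
	uint32_t tail;
	enum toy_sync_type sync_type;
};

struct toy_poscnt {
	uint32_t cnt;
	uint32_t pos;
};

/* RTS-like view: head lives after sync_type, at a different offset. */
struct toy_rts_headtail {
	struct toy_poscnt tail;
	enum toy_sync_type sync_type;
	uint32_t htd_max;
	struct toy_poscnt head;
};

union toy_ht_union {
	struct toy_headtail ht;
	struct toy_rts_headtail rts;
};

/* Pick the view that matches the sync type before reading the head. */
static uint32_t
toy_head_pos(const union toy_ht_union *u)
{
	switch (u->ht.sync_type) {
	case TOY_SYNC_MT_RTS:
		return u->rts.head.pos;
	default:
		return u->ht.head;
	}
}
```

Both views keep `sync_type` at the same offset in this toy layout, which is what makes the dispatch itself safe.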

Signed-off-by: Eimear Morrissey 
---
 .mailmap |  1 +
 app/test/test_ring_stress_impl.h |  1 +
 lib/ring/rte_ring.c  | 87 ++--
 lib/ring/rte_ring.h  | 15 ++
 lib/ring/version.map |  7 +++
 5 files changed, 107 insertions(+), 4 deletions(-)

diff --git a/.mailmap b/.mailmap
index 4a508bafad..3da86393c0 100644
--- a/.mailmap
+++ b/.mailmap
@@ -379,6 +379,7 @@ Eduard Serra 
 Edward Makarov 
 Edwin Brossette 
 Eelco Chaudron 
+Eimear Morrissey 
 Elad Nachman 
 Elad Persiko 
 Elena Agostini 
diff --git a/app/test/test_ring_stress_impl.h b/app/test/test_ring_stress_impl.h
index 8b0bfb11fe..8449cb4b15 100644
--- a/app/test/test_ring_stress_impl.h
+++ b/app/test/test_ring_stress_impl.h
@@ -380,6 +380,7 @@ test_mt1(int (*test)(void *))
}
 
lcore_stat_dump(stdout, UINT32_MAX, &arg[mc].stats);
+   rte_ring_dump(stdout, r);
mt1_fini(r, data);
return rc;
 }
diff --git a/lib/ring/rte_ring.c b/lib/ring/rte_ring.c
index aebb6d6728..261f2a06db 100644
--- a/lib/ring/rte_ring.c
+++ b/lib/ring/rte_ring.c
@@ -364,20 +364,99 @@ rte_ring_free(struct rte_ring *r)
rte_free(te);
 }
 
+static const char *
+ring_get_sync_type(const enum rte_ring_sync_type st)
+{
+   switch (st) {
+   case RTE_RING_SYNC_ST:
+   return "single thread";
+   case RTE_RING_SYNC_MT:
+   return "multi thread";
+   case RTE_RING_SYNC_MT_RTS:
+   return "multi thread - RTS";
+   case RTE_RING_SYNC_MT_HTS:
+   return "multi thread - HTS";
+   default:
+   return "unknown";
+   }
+}
+
+static void
+ring_dump_ht_headtail(FILE *f, const char *prefix,
+   const struct rte_ring_headtail *ht)
+{
+   fprintf(f, "%ssync_type=%s\n", prefix,
+   ring_get_sync_type(ht->sync_type));
+   fprintf(f, "%shead=%"PRIu32"\n", prefix, ht->head);
+   fprintf(f, "%stail=%"PRIu32"\n", prefix, ht->tail);
+}
+
+static void
+ring_dump_rts_headtail(FILE *f, const char *prefix,
+   const struct rte_ring_rts_headtail *rts)
+{
+   fprintf(f, "%ssync_type=%s\n", prefix,
+   ring_get_sync_type(rts->sync_type));
+   fprintf(f, "%shead.pos=%"PRIu32"\n", prefix, rts->head.val.pos);
+   fprintf(f, "%shead.cnt=%"PRIu32"\n", prefix, rts->head.val.cnt);
+   fprintf(f, "%stail.pos=%"PRIu32"\n", prefix, rts->tail.val.pos);
+   fprintf(f, "%stail.cnt=%"PRIu32"\n", prefix, rts->tail.val.cnt);
+   fprintf(f, "%shtd_max=%"PRIu32"\n", prefix, rts->htd_max);
+}
+
+static void
+ring_dump_hts_headtail(FILE *f, const char *prefix,
+   const struct rte_ring_hts_headtail *hts)
+{
+   fprintf(f, "%ssync_type=%s\n", prefix,
+   ring_get_sync_type(hts->sync_type));
+   fprintf(f, "%shead=%"PRIu32"\n", prefix, hts->ht.pos.head);
+   fprintf(f, "%stail=%"PRIu32"\n", prefix, hts->ht.pos.tail);
+}
+
+void
+rte_ring_headtail_dump(FILE *f, const char *prefix,
+   const struct rte_ring_headtail *r)
+{
+   if (f == NULL || r == NULL)
+   return;
+
+   prefix = (prefix != NULL) ? prefix : "";
+
+   switch (r->sync_type) {
+   case RTE_RING_SYNC_ST:
+   case RTE_RING_SYNC_MT:
+   ring_dump_ht_headtail(f, prefix, r);
+   break;
+   case RTE_RING_SYNC_MT_RTS:
+   ring_dump_rts_headtail(f, prefix,
+   (const struct rte_ring_rts_headtail *)r);
+   break;
+   case RTE_RING_SYNC_MT_HTS:
+   ring_dump_hts_headtail(f, prefix,
+   (const struct rte_ring_hts_headtail *)r);
+   break;
+   default:
+   RING_LOG(ERR, "Invalid ring sync type detected");
+   }
+}
+
 /* dump the status of the ring on the console */
 void
 rte_ring_dump(FILE *f, const struct rte_ring *r)
 {
+   if (f == NULL || r == NULL)
+   return;
+
fprintf(f, "ring <%s>@%p\n", r->name, r);
fprintf(f, "  flags=%x\n", r->flags);
fprintf(f, "  size=%"PRIu32"\n", r->size);
fprintf(f, "  capacity=%"PRIu32"\n", r->capacity);
-   fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
-   fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
-   fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
-   fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
fprintf(f, "  used=%u\n", rte_ring_count(r));
fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
+
+   rte_ring_headtail_dump(f, "  cons.", &(r->cons));
+   rte_rin

[RFCv2 4/6] ring/soring: introduce Staged Ordered Ring

2024-09-06 Thread Konstantin Ananyev
From: Konstantin Ananyev 

Staged-Ordered-Ring (SORING) provides a SW abstraction for 'ordered' queues
with multiple processing 'stages'.
It is based on the conventional DPDK rte_ring, re-uses many of its concepts,
and even a substantial part of its code.
It can be viewed as an 'extension' of rte_ring functionality.
The main SORING properties are:
- circular ring buffer with fixed size objects
- producer, consumer plus multiple processing stages in the middle.
- allows splitting object processing into multiple stages.
- objects remain in the same ring while moving from one stage to the other,
  initial order is preserved, no extra copying needed.
- preserves the ingress order of objects within the queue across multiple
  stages, i.e.:
  at the same stage multiple threads can process objects from the ring in
  any order, but for the next stage objects will always appear in the
  original order.
- each stage (and producer/consumer) can be served by single and/or
  multiple threads.
- number of stages, size and number of objects in the ring are
  configurable at ring initialization time.

Data-path API provides four main operations:
- enqueue/dequeue works in the same manner as for conventional rte_ring,
  all rte_ring synchronization types are supported.
- acquire/release - for each stage there is an acquire (start) and
  release (finish) operation.
  After some objects are 'acquired', the given thread can safely assume that
  it has exclusive possession of these objects until 'release' for them is
  invoked.
  Note that right now the user has to release exactly the same number of
  objects that were acquired before.
  After 'release', objects can be 'acquired' by the next stage and/or dequeued
  by the consumer (in case of the last stage).

Expected use-case: applications that use a pipeline model
(probably with multiple stages) for packet processing, where preserving
incoming packet order is important, e.g. IPsec processing.
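
The ordering property described above (workers within a stage may finish
in any order, but the next stage only ever sees objects in the original
order) can be sketched with a toy, single-threaded model. This is only an
illustration of the idea: the toy_* names, the fixed ring size and the
per-slot 'done' flag are assumptions and do not reflect how rte_soring is
actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u   /* hypothetical size, power of two */

struct toy_stage {
	uint32_t head;              /* next position to hand out to a worker */
	uint32_t tail;              /* all positions before this are finished */
	bool done[TOY_RING_SIZE];   /* per-slot completion flag */
};

/*
 * Hand out up to n positions, limited by avail_end, the position up to
 * which the previous stage (or the producer) has made objects available.
 */
static uint32_t
toy_acquire(struct toy_stage *s, uint32_t avail_end, uint32_t n,
	uint32_t *first)
{
	uint32_t avail;

	assert(s != NULL && first != NULL);

	avail = avail_end - s->head;
	if (n > avail)
		n = avail;
	*first = s->head;
	s->head += n;
	return n;
}

/*
 * Mark [first, first + n) as finished, then advance the tail only over
 * a contiguous run of finished slots, so completion is published in order.
 */
static void
toy_release(struct toy_stage *s, uint32_t first, uint32_t n)
{
	uint32_t i;

	assert(s != NULL && n <= TOY_RING_SIZE);

	for (i = 0; i < n; i++)
		s->done[(first + i) % TOY_RING_SIZE] = true;

	while (s->tail != s->head && s->done[s->tail % TOY_RING_SIZE]) {
		s->done[s->tail % TOY_RING_SIZE] = false;
		s->tail++;
	}
}
```

If one worker acquires positions 0-1 and another acquires 2-4, releasing
2-4 first leaves the tail at 0. Only once 0-1 are released does the tail
jump to 5, which is what makes all five objects visible to the next stage
in ingress order.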

Signed-off-by: Eimear Morrissey 
Signed-off-by: Konstantin Ananyev 
---
 lib/ring/meson.build  |   4 +-
 lib/ring/rte_soring.c | 182 ++
 lib/ring/rte_soring.h | 547 +
 lib/ring/soring.c | 548 ++
 lib/ring/soring.h | 124 ++
 lib/ring/version.map  |  19 ++
 6 files changed, 1422 insertions(+), 2 deletions(-)
 create mode 100644 lib/ring/rte_soring.c
 create mode 100644 lib/ring/rte_soring.h
 create mode 100644 lib/ring/soring.c
 create mode 100644 lib/ring/soring.h

diff --git a/lib/ring/meson.build b/lib/ring/meson.build
index 7fca958ed7..21f2c12989 100644
--- a/lib/ring/meson.build
+++ b/lib/ring/meson.build
@@ -1,8 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-sources = files('rte_ring.c')
-headers = files('rte_ring.h')
+sources = files('rte_ring.c', 'rte_soring.c', 'soring.c')
+headers = files('rte_ring.h', 'rte_soring.h')
 # most sub-headers are not for direct inclusion
 indirect_headers += files (
 'rte_ring_core.h',
diff --git a/lib/ring/rte_soring.c b/lib/ring/rte_soring.c
new file mode 100644
index 00..b6bc71b8c9
--- /dev/null
+++ b/lib/ring/rte_soring.c
@@ -0,0 +1,182 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Huawei Technologies Co., Ltd
+ */
+
+#include 
+
+#include "soring.h"
+#include 
+
+RTE_LOG_REGISTER_DEFAULT(soring_logtype, INFO);
+#define RTE_LOGTYPE_SORING soring_logtype
+#define SORING_LOG(level, ...) \
+   RTE_LOG_LINE(level, SORING, "" __VA_ARGS__)
+
+static uint32_t
+soring_calc_elem_num(uint32_t count)
+{
+   return rte_align32pow2(count + 1);
+}
+
+static int
+soring_check_param(uint32_t esize, uint32_t stsize, uint32_t count,
+   uint32_t stages)
+{
+   if (stages == 0) {
+   SORING_LOG(ERR, "invalid number of stages: %u", stages);
+   return -EINVAL;
+   }
+
+   /* Check if element size is a multiple of 4B */
+   if (esize == 0 || esize % 4 != 0) {
+   SORING_LOG(ERR, "invalid element size: %u", esize);
+   return -EINVAL;
+   }
+
+   /* Check if ret-code size is a multiple of 4B */
+   if (stsize % 4 != 0) {
+   SORING_LOG(ERR, "invalid retcode size: %u", stsize);
+   return -EINVAL;
+   }
+
+/* count must be a power of 2 */
+   if (rte_is_power_of_2(count) == 0 ||
+   (count > RTE_SORING_ELEM_MAX + 1)) {
+   SORING_LOG(ERR, "invalid number of elements: %u", count);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+/*
+ * Calculate size offsets for SORING internal data layout.
+ */
+static size_t
+soring_get_szofs(uint32_t esize, uint32_t stsize, uint32_t count,
+   uint32_t stages, size_t *elst_ofs, size_t *state_ofs,
+   size_t *stage_ofs)
+{
+   size_t sz;
+   const struct rte_soring * const r = NULL;
+
+   sz = sizeof(r[0]) + (size_t)count * esize;
+   sz = RTE_ALIGN(sz, RTE_CA

[RFCv2 5/6] app/test: add unit tests for soring API

2024-09-06 Thread Konstantin Ananyev
From: Konstantin Ananyev 

Add both functional and stress test-cases for soring API.
Stress test serves as both functional and performance test of soring
enqueue/dequeue/acquire/release operations under high contention
(for both over committed and non-over committed scenarios).

Signed-off-by: Eimear Morrissey 
Signed-off-by: Konstantin Ananyev 
---
 app/test/meson.build   |   3 +
 app/test/test_soring.c | 442 +++
 app/test/test_soring_mt_stress.c   |  40 ++
 app/test/test_soring_stress.c  |  48 ++
 app/test/test_soring_stress.h  |  35 ++
 app/test/test_soring_stress_impl.h | 827 +
 6 files changed, 1395 insertions(+)
 create mode 100644 app/test/test_soring.c
 create mode 100644 app/test/test_soring_mt_stress.c
 create mode 100644 app/test/test_soring_stress.c
 create mode 100644 app/test/test_soring_stress.h
 create mode 100644 app/test/test_soring_stress_impl.h

diff --git a/app/test/meson.build b/app/test/meson.build
index e29258e6ec..c290162e43 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -175,6 +175,9 @@ source_file_deps = {
 'test_security_proto.c' : ['cryptodev', 'security'],
 'test_seqlock.c': [],
 'test_service_cores.c': [],
+'test_soring.c': [],
+'test_soring_mt_stress.c': [],
+'test_soring_stress.c': [],
 'test_spinlock.c': [],
 'test_stack.c': ['stack'],
 'test_stack_perf.c': ['stack'],
diff --git a/app/test/test_soring.c b/app/test/test_soring.c
new file mode 100644
index 00..b2110305a7
--- /dev/null
+++ b/app/test/test_soring.c
@@ -0,0 +1,442 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Huawei Technologies Co., Ltd
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "test.h"
+
+#define MAX_ACQUIRED 20
+
+#define SORING_TEST_ASSERT(val, expected) do { \
+   RTE_TEST_ASSERT(expected == val, \
+   "%s: expected %u got %u\n", #val, expected, val); \
+} while (0)
+
+static void
+set_soring_init_param(struct rte_soring_param *prm,
+   const char *name, uint32_t esize, uint32_t elems,
+   uint32_t stages, uint32_t stsize,
+   enum rte_ring_sync_type rst_prod,
+   enum rte_ring_sync_type rst_cons)
+{
+   prm->name = name;
+   prm->elem_size = esize;
+   prm->elems = elems;
+   prm->stages = stages;
+   prm->meta_size = stsize;
+   prm->prod_synt = rst_prod;
+   prm->cons_synt = rst_cons;
+}
+
+static int
+move_forward_stage(struct rte_soring *sor,
+   uint32_t num_packets, uint32_t stage)
+{
+   uint32_t acquired;
+   uint32_t ftoken;
+   uint32_t *acquired_objs[MAX_ACQUIRED];
+
+   acquired = rte_soring_acquire_bulk(sor, acquired_objs, stage,
+   num_packets, &ftoken, NULL);
+   SORING_TEST_ASSERT(acquired, num_packets);
+   rte_soring_release(sor, NULL, stage, num_packets,
+   ftoken);
+
+   return 0;
+}
+
+/*
+ * struct rte_soring_param param checking.
+ */
+static int
+test_soring_init(void)
+{
+   struct rte_soring *sor = NULL;
+   struct rte_soring_param prm;
+   int rc;
+   size_t sz;
+   memset(&prm, 0, sizeof(prm));
+
+/*init memory*/
+   set_soring_init_param(&prm, "alloc_memory", sizeof(uintptr_t),
+   4, 1, 4, RTE_RING_SYNC_MT, RTE_RING_SYNC_MT);
+   sz = rte_soring_get_memsize(&prm);
+   sor = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);
+   RTE_TEST_ASSERT_NOT_NULL(sor, "could not allocate memory for soring");
+
+   set_soring_init_param(&prm, "test_invalid_stages", sizeof(uintptr_t),
+   4, 0, 4, RTE_RING_SYNC_MT, RTE_RING_SYNC_MT);
+   rc = rte_soring_init(sor, &prm);
+   RTE_TEST_ASSERT_FAIL(rc, "initted soring with invalid num stages");
+
+   set_soring_init_param(&prm, "test_invalid_esize", 0,
+   4, 1, 4, RTE_RING_SYNC_MT, RTE_RING_SYNC_MT);
+   rc = rte_soring_init(sor, &prm);
+   RTE_TEST_ASSERT_FAIL(rc, "initted soring with 0 esize");
+
+   set_soring_init_param(&prm, "test_invalid_esize", 9,
+   4, 1, 4, RTE_RING_SYNC_MT, RTE_RING_SYNC_MT);
+   rc = rte_soring_init(sor, &prm);
+   RTE_TEST_ASSERT_FAIL(rc, "initted soring with esize not multiple of 4");
+
+   set_soring_init_param(&prm, "test_invalid_rsize", sizeof(uintptr_t),
+   4, 1, 3, RTE_RING_SYNC_MT, RTE_RING_SYNC_MT);
+   rc = rte_soring_init(sor, &prm);
+   RTE_TEST_ASSERT_FAIL(rc, "initted soring with rcsize not multiple of 4");
+
+   set_soring_init_param(&prm, "test_invalid_elems", sizeof(uintptr_t),
+   RTE_SORING_ELEM_MAX + 1, 1, 4, RTE_RING_SYNC_MT,
+   

[RFCv2 6/6] examples/l3fwd: make ACL work in pipeline and eventdev modes

2024-09-06 Thread Konstantin Ananyev
From: Konstantin Ananyev 

Note upfront:
This is a huge commit that is combined from several ones.
For now, I submit it just for reference and demonstration purposes and
will probably remove it in future versions.
If we decide to go ahead with it, it will need to be reworked and split
into several proper commits.

It adds to l3fwd:
 - eventdev mode for ACL lookup-mode
 - a worker-pool mode
   (right now implemented for ACL lookup-mode only).
Worker-Pool mode is a simple pipeline model, with the following stages:
 1) I/O thread receives packets from NIC RX HW queues and enqueues them
    into the work queue.
 2) Worker thread reads packets from the work queue(s),
    processes them and then puts the processed packets back into the
    work queue along with the processing status (routing info/error code).
 3) I/O thread dequeues packets and their status from the work queue,
    and based on it either transmits the packet or drops it.
This is very similar to the l3fwd-eventdev working model.

Note that there could be several I/O threads, each serving one or multiple
HW RX queues. There could also be several Worker threads, each of which can
process packets from multiple work queues in round-robin fashion.

Work queue can be one of the following types:
 - wqorder: allows Worker threads to process packets in any order,
   but guarantees that at the dequeue stage the ingress order of packets
   will be preserved, i.e. at stage #3 the I/O thread will get packets
   in exactly the same order as they were enqueued at stage #1.
 - wqunorder: doesn't provide any ordering guarantees.

'wqunorder' mode is implemented using 2 rte_ring structures per queue.
'wqorder' mode is implemented using an rte_soring structure per queue.

To facilitate this new functionality, command line parameters were
extended:
 --mode:
   Possible values, one of: poll/eventdev/wqorder/wqorderS/wqunorder/wqunorderS
   Default value: poll
   - wqorder: Worker-Pool ordered mode with a separate work queue for each
     HW RX queue.
   - wqorderS: Worker-Pool ordered mode with one work queue per I/O thread.
   - wqunorder: Worker-Pool un-ordered mode with a separate work queue for each
     HW RX queue.
   - wqunorderS: Worker-Pool un-ordered mode with one work queue per I/O thread.
 --wqsize: number of elements for each worker queue.
 --lookup-iter: forces the ACL lookup to be performed several times over the
   same packet. This is an artificial parameter, added temporarily for
   benchmarking purposes. It will be removed in later versions (if any).

Note that in Worker-Pool mode all free lcores that were not assigned as
I/O threads will be used as Worker threads.
As an example:
dpdk-l3fwd --lcores=53,55,57,59,61 ... -- \
-P -p f --config '(0,0,53)(1,0,53)(2,0,53)(3,0,53)' --lookup acl \
--parse-ptype --mode=wqorder ...
In that case lcore 53 will be used as I/O thread (stages #1,3)
to serve 4 HW RX queues,
while lcores 55,57,59,61 will serve as Worker threads (stage #2).
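
As a rough illustration of how the Worker-Pool --mode values listed above
could map onto a (mode, single) pair like the one in struct
l3fwd_wqp_param, here is a small sketch. The toy_* names and the table
are assumptions made for this example and are not taken from the patch;
'eventdev' is left out on the assumption that it is handled by the
existing eventdev code path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the worker-mode enum, for illustration only. */
enum toy_worker_mode {
	TOY_WORKER_POLL,
	TOY_WORKER_UNQUE,   /* un-ordered work queue */
	TOY_WORKER_ORQUE,   /* ordered work queue */
};

struct toy_wqp_param {
	enum toy_worker_mode mode;
	int32_t single;     /* non-zero: one work queue per I/O thread */
};

/* Translate a --mode string into (mode, single); -1 on unknown value. */
static int
toy_parse_wq_mode(const char *arg, struct toy_wqp_param *prm)
{
	static const struct {
		const char *name;
		enum toy_worker_mode mode;
		int32_t single;
	} tbl[] = {
		{ "poll",       TOY_WORKER_POLL,  0 },
		{ "wqorder",    TOY_WORKER_ORQUE, 0 },
		{ "wqorderS",   TOY_WORKER_ORQUE, 1 },
		{ "wqunorder",  TOY_WORKER_UNQUE, 0 },
		{ "wqunorderS", TOY_WORKER_UNQUE, 1 },
	};
	size_t i;

	assert(prm != NULL);
	if (arg == NULL)
		return -1;

	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (strcmp(arg, tbl[i].name) == 0) {
			prm->mode = tbl[i].mode;
			prm->single = tbl[i].single;
			return 0;
		}
	}
	return -1;
}
```

The trailing 'S' only toggles the 'single' flag, so the ordered and
un-ordered variants share the same mode value in this sketch.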

Signed-off-by: Konstantin Ananyev 
---
 examples/l3fwd/l3fwd.h   |  55 +++
 examples/l3fwd/l3fwd_acl.c   | 125 +++---
 examples/l3fwd/l3fwd_acl_event.h | 258 +
 examples/l3fwd/l3fwd_event.c |  14 ++
 examples/l3fwd/l3fwd_event.h |   1 +
 examples/l3fwd/l3fwd_sse.h   |  49 +-
 examples/l3fwd/l3fwd_wqp.c   | 274 +++
 examples/l3fwd/l3fwd_wqp.h   | 130 +++
 examples/l3fwd/main.c|  75 -
 examples/l3fwd/meson.build   |   1 +
 10 files changed, 954 insertions(+), 28 deletions(-)
 create mode 100644 examples/l3fwd/l3fwd_acl_event.h
 create mode 100644 examples/l3fwd/l3fwd_wqp.c
 create mode 100644 examples/l3fwd/l3fwd_wqp.h

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 93ce652d02..218f363764 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -77,6 +77,42 @@ struct __rte_cache_aligned lcore_rx_queue {
uint16_t queue_id;
 };
 
+enum L3FWD_WORKER_MODE {
+   L3FWD_WORKER_POLL,
+   L3FWD_WORKER_UNQUE,
+   L3FWD_WORKER_ORQUE,
+};
+
+struct l3fwd_wqp_param {
+   enum L3FWD_WORKER_MODE mode;
+   uint32_t qsize;/**< Number of elems in worker queue */
+   int32_t single;/**< use single queue per I/O (poll) thread */
+};
+
+extern struct l3fwd_wqp_param l3fwd_wqp_param;
+
+enum {
+   LCORE_WQ_IN,
+   LCORE_WQ_OUT,
+   LCORE_WQ_NUM,
+};
+
+union lcore_wq {
+   struct rte_ring *r[LCORE_WQ_NUM];
+   struct {
+   struct rte_soring *sor;
+   /* used by WQ, sort of thread-local var */
+   uint32_t ftoken;
+   };
+};
+
+struct lcore_wq_pool {
+   uint32_t nb_queue;
+   uint32_t qmask;
+   union lcore_wq queue[MAX_RX_QUEUE_PER_LCORE];
+   struct l3fwd_wqp_param prm;
+};
+
 struct __rte_cache_aligned lcore_conf {
uint16_t n_rx_queue;
struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
@@ -86,6 +122,7 @@ struct __rte_cache_aligned l

Re: [RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in DPDK

2024-09-06 Thread Burakov, Anatoly

On 9/6/2024 3:07 PM, Bruce Richardson wrote:

On Fri, Sep 06, 2024 at 03:02:53PM +0200, Morten Brørup wrote:

From: Burakov, Anatoly [mailto:anatoly.bura...@intel.com]
Sent: Friday, 6 September 2024 14.46

On 9/6/2024 2:37 PM, Morten Brørup wrote:

From: Anatoly Burakov [mailto:anatoly.bura...@intel.com]
Sent: Friday, 6 September 2024 13.47
To: dev@dpdk.org
Subject: [RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in DPDK

While initially DPDK used the term "socket ID" to refer to physical package
ID, the last time DPDK read "physical_package_id" for socket ID was ~9 years
ago, so it's been a while since we've actually switched over to using the term
"socket" to mean "NUMA node".

This wasn't a problem before, as most systems had one NUMA node per physical
socket. However, in the last few years, more and more systems have multiple
NUMA nodes per physical CPU socket. Since DPDK used NUMA nodes already, the
transition was pretty seamless, however now we're faced with a situation when
most of our documentation still uses outdated terms, and our API is rife with
references to "sockets" when in actuality we mean "NUMA nodes". This could be
a source of confusion.

While completely renaming all of our API's would be a huge effort, will take a
long time and arguably wouldn't even be worth the API breakages (given that
this mismatch between terminology and reality is implicitly understood by most
people working on DPDK, and so this isn't so much of a problem in practice), we
can do some tweaks around the edges and at least document this unfortunate
reality.

This patchset suggests the following changes:

- Update rte_socket/rte_lcore documentation to refer to NUMA nodes rather than
  sockets
- Rename internal structures' fields to better reflect this intention
- Rename --socket-mem/--socket-limit flags to refer to NUMA rather than sockets
- Add internal API to get physical package ID [1]

The documentation is updated to refer to new EAL flags, but is otherwise left
untouched, and instead the entry in "glossary" is amended to indicate that
when DPDK documentation refers to "sockets", it actually means "NUMA ID's". As
next steps, we could rename all API parameters to refer to NUMA ID rather than
socket ID - this would break neither API nor ABI, and instead would be a
documentation change in practice.

[1] This could be used to group lcores by physical package, see e.g.
discussion under this patch:
https://patches.dpdk.org/project/dpdk/cover/20240827151014.201-1-vipin.vargh...@amd.com/


Thank you for cleaning this up, Anatoly.

I would prefer to take one more step and also rename functions and
parameters, e.g. rte_socket_id() -> rte_numa_id().

For backwards compatibility, macros/functions with the old names can be
added.




I don't think we can do such changes without deprecation notices, but
it's a good candidate for next release.


Perhaps we can keep ABI compatibility by adding wrapper functions with the old 
names/parameters, which simply call the same functions with the new 
names/parameters.

The Devil is in the details, and I haven't looked deeply into this. So take 
with a grain of salt.
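
For reference, the wrapper idea above could look roughly like the sketch
below. All names here (toy_lcore_to_numa_id, toy_lcore_to_socket_id, the
fixed lcores-per-node topology) are made up for illustration and are not
existing or proposed DPDK symbols. In a shared library, the old function
would have to stay exported for old binaries to keep resolving it.

```c
#include <assert.h>

#define TOY_MAX_LCORE       128u  /* hypothetical limit */
#define TOY_LCORES_PER_NODE 4u    /* hypothetical topology */

/* New name, reflecting what the returned value really is. */
unsigned int
toy_lcore_to_numa_id(unsigned int lcore_id)
{
	assert(lcore_id < TOY_MAX_LCORE);
	return lcore_id / TOY_LCORES_PER_NODE;
}

/*
 * Old name kept as a thin wrapper: same parameters, same return type,
 * so existing callers (and already-built binaries) are unaffected.
 */
unsigned int
toy_lcore_to_socket_id(unsigned int lcore_id)
{
	return toy_lcore_to_numa_id(lcore_id);
}
```

Whether the old name can later be marked deprecated, and on what
schedule, is a policy question this sketch does not try to answer.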



I have thought about including parameter renames in this patchset, but
for now I decided against doing so. I can certainly include this in the
next revision if that's something the community is willing to accept.


I agree with your decision on this. Renaming the parameters without renaming 
the functions could be confusing.



I actually wonder if that is true. If we are simply renaming the parameters
without:
a) changing their types
b) changing the function behaviour
then it is neither an API nor an ABI break. If we were to do so, it would
be like changing a comment, since the actual parameter name is purely a
convenience to hint to the user what the value being passed actually does.

That only applies for function parameters though. For any defines or macros
that need renaming, then we are into API break territory and we would want
backward compatible versions of same.
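
A small sketch of the distinction made above, using a made-up function
and made-up macros (toy_reserve, TOY_NUMA_ID_ANY and TOY_SOCKET_ID_ANY
are assumptions for illustration, not DPDK API). In C, parameter names in
a prototype are not part of the function's type, so the two declarations
below are compatible and callers compile unchanged. A renamed macro, on
the other hand, needs an alias if old code is to keep building.

```c
#include <assert.h>
#include <stddef.h>

/* Renamed constant, plus an alias so that old source still compiles. */
#define TOY_NUMA_ID_ANY   (-1)
#define TOY_SOCKET_ID_ANY TOY_NUMA_ID_ANY

/* Declaration as an older header might have spelled it. */
size_t toy_reserve(size_t len, int socket_id);

/* Same function, parameter renamed: a compatible redeclaration. */
size_t toy_reserve(size_t len, int numa_id);

/* Returns len on success, 0 if the node ID is invalid. */
size_t
toy_reserve(size_t len, int numa_id)
{
	assert(len > 0);

	if (numa_id < 0 && numa_id != TOY_NUMA_ID_ANY)
		return 0;
	return len;
}
```

Dropping the TOY_SOCKET_ID_ANY alias would break any caller still using
the old spelling, which is the API break territory mentioned above.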



To be clear, I was referring to the former rather than the latter; 
renaming public API function parameters/structure fields can be done 
relatively easily and won't break anything. If there is consensus on 
going further than I have with this patchset, I can certainly do so.


--
Thanks,
Anatoly



[RFC PATCH v1 00/12] DTS external DPDK build and stats

2024-09-06 Thread Juraj Linkeš
Add support for externally built DPDK. The supported scenarios are:
* DPDK built on remote node
* DPDK built locally
* DPDK not built anywhere, source tree or tarball on remote node
* DPDK not built anywhere, local source tree or tarball

Remove multiple build targets per test run. If different build targets
are to be tested, these can be specified in multiple test runs.

Remove the git-ref option since it's redundant with the new features.

Improve statistics with a json output that includes more complete
results.

Tomáš Ďurovec (12):
  dts: rename build target to DPDK build
  dts: one dpdk build per test run
  dts: fix remote session transferring files
  dts: improve path handling for local and remote paths
  dts: add the ability to copy directories via remote
  dts: add ability to prevent overwriting files/dirs
  dts: update argument option for prevent overwriting
  dts: add support for externally compiled DPDK
  doc: update argument options for external DPDK build
  dts: remove git ref option
  doc: remove git-ref argument
  dts: improve statistics

 doc/guides/tools/dts.rst  |  17 +-
 dts/conf.yaml |   6 +-
 dts/framework/config/__init__.py  | 106 -
 dts/framework/config/conf_yaml_schema.json|  51 ++-
 dts/framework/config/types.py |  19 +-
 dts/framework/exception.py|   4 +-
 dts/framework/logger.py   |   4 -
 dts/framework/remote_session/dpdk_shell.py|   2 +-
 .../remote_session/remote_session.py  |  18 +-
 dts/framework/remote_session/ssh_session.py   |  12 +-
 dts/framework/runner.py   | 150 +++
 dts/framework/settings.py | 188 ++---
 dts/framework/test_result.py  | 372 ++
 dts/framework/test_suite.py   |   2 +-
 dts/framework/testbed_model/node.py   |  22 +-
 dts/framework/testbed_model/os_session.py | 160 ++--
 dts/framework/testbed_model/posix_session.py  | 135 ++-
 dts/framework/testbed_model/sut_node.py   | 337 ++--
 dts/framework/utils.py| 168 
 dts/tests/TestSuite_smoke_tests.py|   2 +-
 20 files changed, 1110 insertions(+), 665 deletions(-)

-- 
2.43.0



[RFC PATCH v1 01/12] dts: rename build target to DPDK build

2024-09-06 Thread Juraj Linkeš
From: Tomáš Ďurovec 

Signed-off-by: Tomáš Ďurovec 
---
 dts/conf.yaml  |   2 +-
 dts/framework/config/__init__.py   |  26 ++---
 dts/framework/config/conf_yaml_schema.json |  10 +-
 dts/framework/config/types.py  |   4 +-
 dts/framework/logger.py|   4 +-
 dts/framework/runner.py| 112 ++---
 dts/framework/settings.py  |   2 +-
 dts/framework/test_result.py   |  72 +++--
 dts/framework/test_suite.py|   2 +-
 dts/framework/testbed_model/sut_node.py|  55 +-
 dts/tests/TestSuite_smoke_tests.py |   2 +-
 11 files changed, 142 insertions(+), 149 deletions(-)

diff --git a/dts/conf.yaml b/dts/conf.yaml
index 7d95016e68..d43e6fcfeb 100644
--- a/dts/conf.yaml
+++ b/dts/conf.yaml
@@ -4,7 +4,7 @@
 
 test_runs:
   # define one test run environment
-  - build_targets:
+  - dpdk_builds:
   - arch: x86_64
 os: linux
 cpu: native
diff --git a/dts/framework/config/__init__.py b/dts/framework/config/__init__.py
index df60a5030e..598d7101ed 100644
--- a/dts/framework/config/__init__.py
+++ b/dts/framework/config/__init__.py
@@ -45,8 +45,8 @@
 from typing_extensions import Self
 
 from framework.config.types import (
-BuildTargetConfigDict,
 ConfigurationDict,
+DPDKBuildConfigDict,
 NodeConfigDict,
 PortConfigDict,
 TestRunConfigDict,
@@ -335,7 +335,7 @@ class NodeInfo:
 
 
 @dataclass(slots=True, frozen=True)
-class BuildTargetConfiguration:
+class DPDKBuildConfiguration:
 """DPDK build configuration.
 
 The configuration used for building DPDK.
@@ -358,7 +358,7 @@ class BuildTargetConfiguration:
 name: str
 
 @classmethod
-def from_dict(cls, d: BuildTargetConfigDict) -> Self:
+def from_dict(cls, d: DPDKBuildConfigDict) -> Self:
         r"""A convenience method that processes the inputs before creating an instance.
 
 `arch`, `os`, `cpu` and `compiler` are converted to :class:`Enum`\s and
@@ -368,7 +368,7 @@ def from_dict(cls, d: BuildTargetConfigDict) -> Self:
 d: The configuration dictionary.
 
 Returns:
-The build target configuration instance.
+The DPDK build configuration instance.
 """
 return cls(
 arch=Architecture(d["arch"]),
@@ -381,8 +381,8 @@ def from_dict(cls, d: BuildTargetConfigDict) -> Self:
 
 
 @dataclass(slots=True, frozen=True)
-class BuildTargetInfo:
-"""Various versions and other information about a build target.
+class DPDKBuildInfo:
+"""Various versions and other information about a DPDK build.
 
 Attributes:
 dpdk_version: The DPDK version that was built.
@@ -437,7 +437,7 @@ class TestRunConfiguration:
 and with what DPDK build.
 
 Attributes:
-build_targets: A list of DPDK builds to test.
+dpdk_builds: A list of DPDK builds to test.
 perf: Whether to run performance tests.
 func: Whether to run functional tests.
 skip_smoke_tests: Whether to skip smoke tests.
@@ -447,7 +447,7 @@ class TestRunConfiguration:
 vdevs: The names of virtual devices to test.
 """
 
-build_targets: list[BuildTargetConfiguration]
+dpdk_builds: list[DPDKBuildConfiguration]
 perf: bool
 func: bool
 skip_smoke_tests: bool
@@ -464,7 +464,7 @@ def from_dict(
 ) -> Self:
 """A convenience method that processes the inputs before creating an instance.
 
-The build target and the test suite config are transformed into their respective objects.
+The DPDK build and the test suite config are transformed into their respective objects.
 SUT and TG configurations are taken from `node_map`. The other (:class:`bool`) attributes
 are just stored.
 
@@ -475,8 +475,8 @@ def from_dict(
 Returns:
 The test run configuration instance.
 """
-build_targets: list[BuildTargetConfiguration] = list(
-map(BuildTargetConfiguration.from_dict, d["build_targets"])
+dpdk_builds: list[DPDKBuildConfiguration] = list(
+map(DPDKBuildConfiguration.from_dict, d["dpdk_builds"])
 )
 test_suites: list[TestSuiteConfig] = list(map(TestSuiteConfig.from_dict, d["test_suites"]))
 sut_name = d["system_under_test_node"]["node_name"]
@@ -498,7 +498,7 @@ def from_dict(
 d["system_under_test_node"]["vdevs"] if "vdevs" in d["system_under_test_node"] else []
 )
 return cls(
-build_targets=build_targets,
+dpdk_builds=dpdk_builds,
 perf=d["perf"],
 func=d["func"],
 skip_smoke_tests=skip_smoke_tests,
@@ -548,7 +548,7 @@ class Configuration:
 def from_dict(cls, d: ConfigurationDict) -> Self:
 """A convenience method that processes the inputs before creating an instance.
 
-Build target and test suite config are transfor

[RFC PATCH v1 03/12] dts: fix remote session transferring files

2024-09-06 Thread Juraj Linkeš
From: Tomáš Ďurovec 

Fix the order of the source and destination parameters in file
transfers, according to the docs.

Signed-off-by: Tomáš Ďurovec 
---
 dts/framework/remote_session/remote_session.py | 14 --
 dts/framework/remote_session/ssh_session.py|  8 
 dts/framework/testbed_model/os_session.py  | 18 ++
 dts/framework/testbed_model/posix_session.py   |  8 
 4 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/dts/framework/remote_session/remote_session.py b/dts/framework/remote_session/remote_session.py
index 8c580b070f..6ca8593c90 100644
--- a/dts/framework/remote_session/remote_session.py
+++ b/dts/framework/remote_session/remote_session.py
@@ -199,32 +199,34 @@ def is_alive(self) -> bool:
 def copy_from(
 self,
 source_file: str | PurePath,
-destination_file: str | PurePath,
+destination_dir: str | PurePath,
 ) -> None:
 """Copy a file from the remote Node to the local filesystem.
 
 Copy `source_file` from the remote Node associated with this remote session
-to `destination_file` on the local filesystem.
+to `destination_dir` on the local filesystem.
 
 Args:
 source_file: The file on the remote Node.
-destination_file: A file or directory path on the local filesystem.
+destination_dir: A dir path on the local filesystem, where the `source_file`
+will be saved.
 """
 
 @abstractmethod
 def copy_to(
 self,
 source_file: str | PurePath,
-destination_file: str | PurePath,
+destination_dir: str | PurePath,
 ) -> None:
 """Copy a file from local filesystem to the remote Node.
 
-Copy `source_file` from local filesystem to `destination_file` on the remote Node
+Copy `source_file` from local filesystem to `destination_dir` on the remote Node
 associated with this remote session.
 
 Args:
 source_file: The file on the local filesystem.
-destination_file: A file or directory path on the remote Node.
+destination_dir: A dir path on the remote Node, where the `source_file`
+will be saved.
 """
 
 @abstractmethod
diff --git a/dts/framework/remote_session/ssh_session.py b/dts/framework/remote_session/ssh_session.py
index 66f8176833..a756bfecef 100644
--- a/dts/framework/remote_session/ssh_session.py
+++ b/dts/framework/remote_session/ssh_session.py
@@ -106,18 +106,18 @@ def is_alive(self) -> bool:
 def copy_from(
 self,
 source_file: str | PurePath,
-destination_file: str | PurePath,
+destination_dir: str | PurePath,
 ) -> None:
 """Overrides :meth:`~.remote_session.RemoteSession.copy_from`."""
-self.session.get(str(destination_file), str(source_file))
+self.session.get(str(source_file), str(destination_dir))
 
 def copy_to(
 self,
 source_file: str | PurePath,
-destination_file: str | PurePath,
+destination_dir: str | PurePath,
 ) -> None:
 """Overrides :meth:`~.remote_session.RemoteSession.copy_to`."""
-self.session.put(str(source_file), str(destination_file))
+self.session.put(str(source_file), str(destination_dir))
 
 def close(self) -> None:
 """Overrides :meth:`~.remote_session.RemoteSession.close`."""
diff --git a/dts/framework/testbed_model/os_session.py b/dts/framework/testbed_model/os_session.py
index 79f56b289b..8928a47d6f 100644
--- a/dts/framework/testbed_model/os_session.py
+++ b/dts/framework/testbed_model/os_session.py
@@ -181,32 +181,34 @@ def join_remote_path(self, *args: str | PurePath) -> PurePath:
 def copy_from(
 self,
 source_file: str | PurePath,
-destination_file: str | PurePath,
+destination_dir: str | PurePath,
 ) -> None:
 """Copy a file from the remote node to the local filesystem.
 
 Copy `source_file` from the remote node associated with this remote
-session to `destination_file` on the local filesystem.
+session to `destination_dir` on the local filesystem.
 
 Args:
-source_file: the file on the remote node.
-destination_file: a file or directory path on the local filesystem.
+source_file: The file on the remote node.
+destination_dir: A dir path on the local filesystem, where the 
`source_file`
+will be saved.
 """
 
 @abstractmethod
 def copy_to(
 self,
 source_file: str | PurePath,
-destination_file: str | PurePath,
+destination_dir: str | PurePath,
 ) -> None:
 """Copy a file from local filesystem to the remote node.
 
-Copy `source_file` from local filesystem to `destination_file`
+Copy `source_file` from local filesystem to `destination_dir`
 on the remote node associated with thi

[RFC PATCH v1 02/12] dts: one dpdk build per test run

2024-09-06 Thread Juraj Linkeš
From: Tomáš Ďurovec 

Signed-off-by: Tomáš Ďurovec 
---
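A minimal sketch, not part of the patch, of what the switch from a list of
builds to a single build means for config parsing. `BuildConfig` and
`parse_test_run` are hypothetical stand-ins for the framework's
`DPDKBuildConfiguration` and `TestRunConfiguration.from_dict`; rejecting the
old `dpdk_builds` key explicitly is an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class BuildConfig:
    """Hypothetical stand-in for the framework's DPDK build configuration."""

    arch: str
    os: str
    cpu: str
    compiler: str
    compiler_wrapper: str = ""

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> BuildConfig:
        return cls(
            arch=d["arch"],
            os=d["os"],
            cpu=d["cpu"],
            compiler=d["compiler"],
            compiler_wrapper=d.get("compiler_wrapper", ""),
        )


def parse_test_run(d: dict[str, Any]) -> BuildConfig:
    """Read the single `dpdk_build` mapping of one test run."""
    if "dpdk_builds" in d:
        # Assumption: an old-style config is rejected instead of being converted.
        raise ValueError("'dpdk_builds' (a list) was replaced by 'dpdk_build' (one mapping)")
    return BuildConfig.from_dict(d["dpdk_build"])


test_run = {
    "dpdk_build": {
        "arch": "x86_64",
        "os": "linux",
        "cpu": "native",
        "compiler": "gcc",
        "compiler_wrapper": "ccache",
    },
    "perf": False,
    "func": True,
}
print(parse_test_run(test_run))
```

With one build per test run, the per-build setup and teardown stages are no
longer needed, which matches the removal of `dpdk_build_setup` and
`dpdk_build_teardown` from `DtsStage` in this patch.
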
 dts/conf.yaml  |  14 +--
 dts/framework/config/__init__.py   |   9 +-
 dts/framework/config/conf_yaml_schema.json |  10 +-
 dts/framework/config/types.py  |   2 +-
 dts/framework/logger.py|   4 -
 dts/framework/runner.py| 117 +---
 dts/framework/test_result.py   | 119 ++---
 dts/framework/test_suite.py|   2 +-
 dts/framework/testbed_model/sut_node.py|   6 +-
 dts/tests/TestSuite_smoke_tests.py |   2 +-
 10 files changed, 80 insertions(+), 205 deletions(-)

diff --git a/dts/conf.yaml b/dts/conf.yaml
index d43e6fcfeb..3d5ee5aee5 100644
--- a/dts/conf.yaml
+++ b/dts/conf.yaml
@@ -4,13 +4,13 @@
 
 test_runs:
   # define one test run environment
-  - dpdk_builds:
-  - arch: x86_64
-os: linux
-cpu: native
-# the combination of the following two makes CC="ccache gcc"
-compiler: gcc
-compiler_wrapper: ccache
+  - dpdk_build:
+  arch: x86_64
+  os: linux
+  cpu: native
+  # the combination of the following two makes CC="ccache gcc"
+  compiler: gcc
+  compiler_wrapper: ccache
 perf: false # disable performance testing
 func: true # enable functional testing
 skip_smoke_tests: false # optional
diff --git a/dts/framework/config/__init__.py b/dts/framework/config/__init__.py
index 598d7101ed..aba49143ae 100644
--- a/dts/framework/config/__init__.py
+++ b/dts/framework/config/__init__.py
@@ -437,7 +437,7 @@ class TestRunConfiguration:
 and with what DPDK build.
 
 Attributes:
-dpdk_builds: A list of DPDK builds to test.
+dpdk_build: A DPDK build to test.
 perf: Whether to run performance tests.
 func: Whether to run functional tests.
 skip_smoke_tests: Whether to skip smoke tests.
@@ -447,7 +447,7 @@ class TestRunConfiguration:
 vdevs: The names of virtual devices to test.
 """
 
-dpdk_builds: list[DPDKBuildConfiguration]
+dpdk_build: DPDKBuildConfiguration
 perf: bool
 func: bool
 skip_smoke_tests: bool
@@ -475,9 +475,6 @@ def from_dict(
 Returns:
 The test run configuration instance.
 """
-dpdk_builds: list[DPDKBuildConfiguration] = list(
-map(DPDKBuildConfiguration.from_dict, d["dpdk_builds"])
-)
 test_suites: list[TestSuiteConfig] = 
list(map(TestSuiteConfig.from_dict, d["test_suites"]))
 sut_name = d["system_under_test_node"]["node_name"]
 skip_smoke_tests = d.get("skip_smoke_tests", False)
@@ -498,7 +495,7 @@ def from_dict(
 d["system_under_test_node"]["vdevs"] if "vdevs" in 
d["system_under_test_node"] else []
 )
 return cls(
-dpdk_builds=dpdk_builds,
+dpdk_build=DPDKBuildConfiguration.from_dict(d["dpdk_build"]),
 perf=d["perf"],
 func=d["func"],
 skip_smoke_tests=skip_smoke_tests,
diff --git a/dts/framework/config/conf_yaml_schema.json b/dts/framework/config/conf_yaml_schema.json
index 4b63e9710e..c0c347199e 100644
--- a/dts/framework/config/conf_yaml_schema.json
+++ b/dts/framework/config/conf_yaml_schema.json
@@ -327,12 +327,8 @@
   "items": {
 "type": "object",
 "properties": {
-  "dpdk_builds": {
-"type": "array",
-"items": {
-  "$ref": "#/definitions/dpdk_build"
-},
-"minimum": 1
+  "dpdk_build": {
+"$ref": "#/definitions/dpdk_build"
   },
   "perf": {
 "type": "boolean",
@@ -383,7 +379,7 @@
 },
 "additionalProperties": false,
 "required": [
-  "dpdk_builds",
+  "dpdk_build",
   "perf",
   "func",
   "test_suites",
diff --git a/dts/framework/config/types.py b/dts/framework/config/types.py
index 703d9eb48e..9b3c997c80 100644
--- a/dts/framework/config/types.py
+++ b/dts/framework/config/types.py
@@ -108,7 +108,7 @@ class TestRunConfigDict(TypedDict):
 """Allowed keys and values."""
 
 #:
-dpdk_builds: list[DPDKBuildConfigDict]
+dpdk_build: DPDKBuildConfigDict
 #:
 perf: bool
 #:
diff --git a/dts/framework/logger.py b/dts/framework/logger.py
index 3fbe618219..d2b8e37da4 100644
--- a/dts/framework/logger.py
+++ b/dts/framework/logger.py
@@ -33,16 +33,12 @@ class DtsStage(StrEnum):
 #:
 test_run_setup = auto()
 #:
-dpdk_build_setup = auto()
-#:
 test_suite_setup = auto()
 #:
 test_suite = auto()
 #:
 test_suite_teardown = auto()
 #:
-dpdk_build_teardown = auto()
-#:
 test_run_teardown = auto()
 #:
 post_run = auto()
diff --git a/dts/framework/runner.py b/dts/framework/runner.py
index 2b5403e51c..a212ca2470 100644
--- a/dts/framework/runner.py
+++ b/dts/framework/runner.py
@@ -12,7 +12,7 @

[RFC PATCH v1 04/12] dts: improve path handling for local and remote paths

2024-09-06 Thread Juraj Linkeš
From: Tomáš Ďurovec 

Update remote session to clearly differentiate between
local and remote paths. Local paths now accept OS-aware
path objects, while remote paths handle OS-agnostic paths.

Signed-off-by: Tomáš Ďurovec 
---
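A minimal sketch, not part of the patch, of the local/remote split described
in the commit message. The helper `remote_destination` and the example paths
are hypothetical; only the `pathlib` classes are real. It assumes a POSIX
remote node.

```python
from __future__ import annotations

from pathlib import Path, PurePath, PurePosixPath


def remote_destination(source_file: str | Path, destination_dir: str | PurePath) -> PurePosixPath:
    """Return where `source_file` would land inside a POSIX remote `destination_dir`.

    Hypothetical helper for illustration only; it performs no I/O.
    """
    # Local side: Path is OS-aware, so .name follows the rules of the local OS.
    local = Path(source_file)
    # Remote side: a pure path is only manipulated as text and never touches
    # the local filesystem, so it can describe a path on another machine.
    remote_dir = PurePosixPath(destination_dir)
    return remote_dir / local.name


print(remote_destination("output/dpdk.tar.xz", "/tmp/dts"))
```

Typing local arguments as `Path` and remote ones as `PurePath` lets a type
checker flag calls that swap the two.
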
 dts/framework/remote_session/remote_session.py | 6 +++---
 dts/framework/remote_session/ssh_session.py| 6 +++---
 dts/framework/testbed_model/os_session.py  | 6 +++---
 dts/framework/testbed_model/posix_session.py   | 6 +++---
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/dts/framework/remote_session/remote_session.py b/dts/framework/remote_session/remote_session.py
index 6ca8593c90..ce311e70b6 100644
--- a/dts/framework/remote_session/remote_session.py
+++ b/dts/framework/remote_session/remote_session.py
@@ -12,7 +12,7 @@
 
 from abc import ABC, abstractmethod
 from dataclasses import InitVar, dataclass, field
-from pathlib import PurePath
+from pathlib import Path, PurePath
 
 from framework.config import NodeConfiguration
 from framework.exception import RemoteCommandExecutionError
@@ -199,7 +199,7 @@ def is_alive(self) -> bool:
 def copy_from(
 self,
 source_file: str | PurePath,
-destination_dir: str | PurePath,
+destination_dir: str | Path,
 ) -> None:
 """Copy a file from the remote Node to the local filesystem.
 
@@ -215,7 +215,7 @@ def copy_from(
 @abstractmethod
 def copy_to(
 self,
-source_file: str | PurePath,
+source_file: str | Path,
 destination_dir: str | PurePath,
 ) -> None:
 """Copy a file from local filesystem to the remote Node.
diff --git a/dts/framework/remote_session/ssh_session.py b/dts/framework/remote_session/ssh_session.py
index a756bfecef..88a000912e 100644
--- a/dts/framework/remote_session/ssh_session.py
+++ b/dts/framework/remote_session/ssh_session.py
@@ -5,7 +5,7 @@
 
 import socket
 import traceback
-from pathlib import PurePath
+from pathlib import Path, PurePath
 
 from fabric import Connection  # type: ignore[import-untyped]
 from invoke.exceptions import (  # type: ignore[import-untyped]
@@ -106,14 +106,14 @@ def is_alive(self) -> bool:
 def copy_from(
 self,
 source_file: str | PurePath,
-destination_dir: str | PurePath,
+destination_dir: str | Path,
 ) -> None:
 """Overrides :meth:`~.remote_session.RemoteSession.copy_from`."""
 self.session.get(str(source_file), str(destination_dir))
 
 def copy_to(
 self,
-source_file: str | PurePath,
+source_file: str | Path,
 destination_dir: str | PurePath,
 ) -> None:
 """Overrides :meth:`~.remote_session.RemoteSession.copy_to`."""
diff --git a/dts/framework/testbed_model/os_session.py b/dts/framework/testbed_model/os_session.py
index 8928a47d6f..d24f44df10 100644
--- a/dts/framework/testbed_model/os_session.py
+++ b/dts/framework/testbed_model/os_session.py
@@ -25,7 +25,7 @@
 from abc import ABC, abstractmethod
 from collections.abc import Iterable
 from ipaddress import IPv4Interface, IPv6Interface
-from pathlib import PurePath
+from pathlib import Path, PurePath
 from typing import Union
 
 from framework.config import Architecture, NodeConfiguration, NodeInfo
@@ -181,7 +181,7 @@ def join_remote_path(self, *args: str | PurePath) -> PurePath:
 def copy_from(
 self,
 source_file: str | PurePath,
-destination_dir: str | PurePath,
+destination_dir: str | Path,
 ) -> None:
 """Copy a file from the remote node to the local filesystem.
 
@@ -197,7 +197,7 @@ def copy_from(
 @abstractmethod
 def copy_to(
 self,
-source_file: str | PurePath,
+source_file: str | Path,
 destination_dir: str | PurePath,
 ) -> None:
 """Copy a file from local filesystem to the remote node.
diff --git a/dts/framework/testbed_model/posix_session.py b/dts/framework/testbed_model/posix_session.py
index 7f0b1f2036..0d8c5f91a6 100644
--- a/dts/framework/testbed_model/posix_session.py
+++ b/dts/framework/testbed_model/posix_session.py
@@ -13,7 +13,7 @@
 
 import re
 from collections.abc import Iterable
-from pathlib import PurePath, PurePosixPath
+from pathlib import Path, PurePath, PurePosixPath
 
 from framework.config import Architecture, NodeInfo
 from framework.exception import DPDKBuildError, RemoteCommandExecutionError
@@ -88,14 +88,14 @@ def join_remote_path(self, *args: str | PurePath) -> PurePosixPath:
 def copy_from(
 self,
 source_file: str | PurePath,
-destination_dir: str | PurePath,
+destination_dir: str | Path,
 ) -> None:
 """Overrides :meth:`~.os_session.OSSession.copy_from`."""
 self.remote_session.copy_from(source_file, destination_dir)
 
 def copy_to(
 self,
-source_file: str | PurePath,
+source_file: str | Path,
 destination_dir: str | PurePath,
 ) -> None:
 """Overrides :meth:`~.os_

[RFC PATCH v1 05/12] dts: add the ability to copy directories via remote

2024-09-06 Thread Juraj Linkeš
From: Tomáš Ďurovec 

Signed-off-by: Tomáš Ďurovec 
---
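A minimal sketch, not part of the patch, of the tarball step that a directory
copy with an `exclude` argument implies. The function name `create_tarball`
and its exact matching rule (shell-style patterns applied to each member's
base name) are assumptions; the patch's own helper may differ.

```python
from __future__ import annotations

import fnmatch
import tarfile
from pathlib import Path


def create_tarball(
    source_dir: Path, tarball_path: Path, exclude: str | list[str] | None = None
) -> Path:
    """Pack `source_dir` into `tarball_path`, skipping names that match `exclude`."""
    patterns = [exclude] if isinstance(exclude, str) else list(exclude or [])

    def skip_excluded(member: tarfile.TarInfo) -> tarfile.TarInfo | None:
        # Returning None from a tarfile filter drops the member; for a
        # directory this also skips everything below it.
        name = Path(member.name).name
        if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
            return None
        return member

    # Mode "w" writes an uncompressed tar; "w:gz" or "w:xz" would compress.
    with tarfile.open(tarball_path, "w") as tar:
        tar.add(source_dir, arcname=source_dir.name, filter=skip_excluded)
    return tarball_path
```

Packing the directory into one archive means a single file transfer over the
remote session, after which the archive is extracted at the destination.
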
 dts/framework/testbed_model/os_session.py| 88 +++---
 dts/framework/testbed_model/posix_session.py | 93 ---
 dts/framework/utils.py   | 97 ++--
 3 files changed, 246 insertions(+), 32 deletions(-)

diff --git a/dts/framework/testbed_model/os_session.py b/dts/framework/testbed_model/os_session.py
index d24f44df10..92b1a09d94 100644
--- a/dts/framework/testbed_model/os_session.py
+++ b/dts/framework/testbed_model/os_session.py
@@ -38,7 +38,7 @@
 )
 from framework.remote_session.remote_session import CommandResult
 from framework.settings import SETTINGS
-from framework.utils import MesonArgs
+from framework.utils import MesonArgs, TarCompressionFormat
 
 from .cpu import LogicalCore
 from .port import Port
@@ -178,11 +178,7 @@ def join_remote_path(self, *args: str | PurePath) -> PurePath:
 """
 
 @abstractmethod
-def copy_from(
-self,
-source_file: str | PurePath,
-destination_dir: str | Path,
-) -> None:
+def copy_from(self, source_file: str | PurePath, destination_dir: str | 
Path) -> None:
 """Copy a file from the remote node to the local filesystem.
 
 Copy `source_file` from the remote node associated with this remote
@@ -195,11 +191,7 @@ def copy_from(
 """
 
 @abstractmethod
-def copy_to(
-self,
-source_file: str | Path,
-destination_dir: str | PurePath,
-) -> None:
+def copy_to(self, source_file: str | Path, destination_dir: str | 
PurePath) -> None:
 """Copy a file from local filesystem to the remote node.
 
 Copy `source_file` from local filesystem to `destination_dir`
@@ -211,6 +203,57 @@ def copy_to(
 will be saved.
 """
 
+@abstractmethod
+def copy_dir_from(
+self,
+source_dir: str | PurePath,
+destination_dir: str | Path,
+compress_format: TarCompressionFormat = TarCompressionFormat.none,
+exclude: str | list[str] | None = None,
+) -> None:
+"""Copy a dir from the remote node to the local filesystem.
+
+Copy `source_dir` from the remote node associated with this remote 
session to
+`destination_dir` on the local filesystem. The new local dir will be 
created
+at `destination_dir` path.
+
+Args:
+source_dir: The dir on the remote node.
+destination_dir: A dir path on the local filesystem.
+compress_format: The compression format to use. Default is no 
compression.
+exclude: Files or dirs to exclude before creating the tarball.
+"""
+
+@abstractmethod
+def copy_dir_to(
+self,
+source_dir: str | Path,
+destination_dir: str | PurePath,
+compress_format: TarCompressionFormat = TarCompressionFormat.none,
+exclude: str | list[str] | None = None,
+) -> None:
+"""Copy a dir from the local filesystem to the remote node.
+
+Copy `source_dir` from the local filesystem to `destination_dir` on 
the remote node
+associated with this remote session. The new remote dir will be 
created at
+`destination_dir` path.
+
+Args:
+source_dir: The dir on the local filesystem.
+destination_dir: A dir path on the remote node.
+compress_format: The compression format to use. Default is no 
compression.
+exclude: Files or dirs to exclude before creating the tarball.
+"""
+
+@abstractmethod
+def remove_remote_file(self, remote_file_path: str | PurePath, force: bool 
= True) -> None:
+"""Remove remote file, by default remove forcefully.
+
+Args:
+remote_file_path: The path of the file to remove.
+force: If :data:`True`, ignore all warnings and try to remove at 
all costs.
+"""
+
 @abstractmethod
 def remove_remote_dir(
 self,
@@ -218,14 +261,31 @@ def remove_remote_dir(
 recursive: bool = True,
 force: bool = True,
 ) -> None:
-"""Remove remote directory, by default remove recursively and 
forcefully.
+"""Remove remote dir, by default remove recursively and forcefully.
 
 Args:
-remote_dir_path: The path of the directory to remove.
-recursive: If :data:`True`, also remove all contents inside the 
directory.
+remote_dir_path: The path of the dir to remove.
+recursive: If :data:`True`, also remove all contents inside the 
dir.
 force: If :data:`True`, ignore all warnings and try to remove at 
all costs.
 """
 
+@abstractmethod
+def create_remote_tarball(
+self,
+remote_dir_path: str | PurePath,
+compress_format: TarCompressionFormat = TarCompressionFormat.none,
+exclude: str | list[str] | None = None,
+) -> None:
+"""Create a tarbal

[RFC PATCH v1 06/12] dts: add ability to prevent overwriting files/dirs

2024-09-06 Thread Juraj Linkeš
From: Tomáš Ďurovec 

Signed-off-by: Tomáš Ďurovec 
---
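A minimal sketch, not part of the patch, of how a `--force` flag with an
environment-variable default can guard a copy. `build_parser` and `copy_file`
are hypothetical names. Two behaviours are assumptions: any non-empty
`DTS_FORCE` value enables the flag, and without `force` an existing file is
left untouched.

```python
from __future__ import annotations

import argparse
import os
import shutil
from collections.abc import Mapping
from pathlib import Path


def build_parser(env: Mapping[str, str] | None = None) -> argparse.ArgumentParser:
    """Build a parser whose --force default comes from DTS_FORCE."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-f",
        "--force",
        action="store_true",
        # Assumption: any non-empty value of DTS_FORCE turns the flag on.
        default=bool(env.get("DTS_FORCE")),
        help="Specify to remove an already existing dpdk tarball before "
        "copying/extracting a new one.",
    )
    return parser


def copy_file(source_file: Path, destination_dir: Path, force: bool = False) -> Path:
    """Copy `source_file` into `destination_dir`, replacing a stale copy only if forced."""
    target = destination_dir / source_file.name
    if target.exists():
        if not force:
            # Assumption: without force the existing file is kept as-is.
            return target
        target.unlink()
    shutil.copy2(source_file, target)
    return target
```

Passing an explicit argument list, for example
`build_parser().parse_args(["--force"])`, keeps the sketch independent of the
real command line.
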
 dts/framework/settings.py| 17 ++
 dts/framework/testbed_model/os_session.py| 31 +++---
 dts/framework/testbed_model/posix_session.py | 33 +---
 3 files changed, 71 insertions(+), 10 deletions(-)

diff --git a/dts/framework/settings.py b/dts/framework/settings.py
index 2b8c583853..2f7089a26b 100644
--- a/dts/framework/settings.py
+++ b/dts/framework/settings.py
@@ -55,6 +55,11 @@
 Git revision ID to test. Could be commit, tag, tree ID etc.
 To test local changes, first commit them, then use their commit ID.
 
+.. option:: -f, --force
+.. envvar:: DTS_FORCE
+
+Specify to remove an already existing dpdk tarball before 
copying/extracting a new one.
+
 .. option:: --test-suite
 .. envvar:: DTS_TEST_SUITES
 
@@ -110,6 +115,8 @@ class Settings:
 #:
 dpdk_tarball_path: Path | str = ""
 #:
+force: bool = False
+#:
 compile_timeout: float = 1200
 #:
 test_suites: list[TestSuiteConfig] = field(default_factory=list)
@@ -337,6 +344,16 @@ def _get_parser() -> _DTSArgumentParser:
 )
 _add_env_var_to_action(action)
 
+action = parser.add_argument(
+"-f",
+"--force",
+action="store_true",
+default=SETTINGS.force,
+help="Specify to remove an already existing dpdk tarball before 
copying/extracting a "
+"new one.",
+)
+_add_env_var_to_action(action)
+
 action = parser.add_argument(
 "--compile-timeout",
 default=SETTINGS.compile_timeout,
diff --git a/dts/framework/testbed_model/os_session.py b/dts/framework/testbed_model/os_session.py
index 92b1a09d94..afc9ffb814 100644
--- a/dts/framework/testbed_model/os_session.py
+++ b/dts/framework/testbed_model/os_session.py
@@ -178,7 +178,9 @@ def join_remote_path(self, *args: str | PurePath) -> PurePath:
 """
 
 @abstractmethod
-def copy_from(self, source_file: str | PurePath, destination_dir: str | 
Path) -> None:
+def copy_from(
+self, source_file: str | PurePath, destination_dir: str | Path, force: 
bool = SETTINGS.force
+) -> None:
 """Copy a file from the remote node to the local filesystem.
 
 Copy `source_file` from the remote node associated with this remote
@@ -188,10 +190,14 @@ def copy_from(self, source_file: str | PurePath, destination_dir: str | Path) ->
 source_file: The file on the remote node.
 destination_dir: A dir path on the local filesystem, where the 
`source_file`
 will be saved.
+force: If :data:`True`, remove an already existing `source_file` 
at the
+`destination_dir` before copying to prevent overwriting data.
 """
 
 @abstractmethod
-def copy_to(self, source_file: str | Path, destination_dir: str | 
PurePath) -> None:
+def copy_to(
+self, source_file: str | Path, destination_dir: str | PurePath, force: 
bool = SETTINGS.force
+) -> None:
 """Copy a file from local filesystem to the remote node.
 
 Copy `source_file` from local filesystem to `destination_dir`
@@ -201,6 +207,8 @@ def copy_to(self, source_file: str | Path, destination_dir: str | PurePath) -> N
 source_file: The file on the local filesystem.
 destination_dir: A dir path on the remote Node, where the 
`source_file`
 will be saved.
+force: If :data:`True`, remove an already existing `source_file` 
at the
+`destination_dir` before copying to prevent overwriting data.
 """
 
 @abstractmethod
@@ -210,6 +218,7 @@ def copy_dir_from(
 destination_dir: str | Path,
 compress_format: TarCompressionFormat = TarCompressionFormat.none,
 exclude: str | list[str] | None = None,
+force: bool = SETTINGS.force,
 ) -> None:
 """Copy a dir from the remote node to the local filesystem.
 
@@ -222,6 +231,8 @@ def copy_dir_from(
 destination_dir: A dir path on the local filesystem.
 compress_format: The compression format to use. Default is no 
compression.
 exclude: Files or dirs to exclude before creating the tarball.
+force: If :data:`True`, remove an already existing `source_dir` at 
the `destination_dir`
+before copying to prevent overwriting data.
 """
 
 @abstractmethod
@@ -231,18 +242,21 @@ def copy_dir_to(
 destination_dir: str | PurePath,
 compress_format: TarCompressionFormat = TarCompressionFormat.none,
 exclude: str | list[str] | None = None,
+force: bool = SETTINGS.force,
 ) -> None:
 """Copy a dir from the local filesystem to the remote node.
 
 Copy `source_dir` from the local filesystem to `destination_dir` on 
the remote node
-associated with this remote session. The new remote dir will be 
created at
-`destination_dir` path.
+ 

[RFC PATCH v1 07/12] dts: update argument option for prevent overwriting

2024-09-06 Thread Juraj Linkeš
From: Tomáš Ďurovec 

Signed-off-by: Tomáš Ďurovec 
---
 doc/guides/tools/dts.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..059776c888 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -241,6 +241,7 @@ DTS is run with ``main.py`` located in the ``dts`` directory after entering Poet
  --revision ID, --rev ID, --git-ref ID
                        [DTS_DPDK_REVISION_ID] Git revision ID to test. Could be commit, tag, tree ID etc. To test local changes, first
                        commit them, then use their commit ID. (default: None)
+ -f, --force           [DTS_FORCE] Specify to remove an already existing dpdk tarball before copying/extracting a new one. (default: False)
  --compile-timeout SECONDS
                        [DTS_COMPILE_TIMEOUT] The timeout for compiling DPDK. (default: 1200)
  --test-suite TEST_SUITE [TEST_CASES ...]
-- 
2.43.0



[RFC PATCH v1 08/12] dts: add support for externally compiled DPDK

2024-09-06 Thread Juraj Linkeš
From: Tomáš Ďurovec 

Signed-off-by: Tomáš Ďurovec 
---
 dts/conf.yaml|  14 +-
 dts/framework/config/__init__.py |  87 -
 dts/framework/config/conf_yaml_schema.json   |  41 ++-
 dts/framework/config/types.py|  17 +-
 dts/framework/exception.py   |   4 +-
 dts/framework/remote_session/dpdk_shell.py   |   2 +-
 dts/framework/runner.py  |  16 +-
 dts/framework/settings.py| 160 --
 dts/framework/test_result.py |  27 +-
 dts/framework/testbed_model/node.py  |  22 +-
 dts/framework/testbed_model/os_session.py|  43 ++-
 dts/framework/testbed_model/posix_session.py |  23 +-
 dts/framework/testbed_model/sut_node.py  | 314 +--
 13 files changed, 562 insertions(+), 208 deletions(-)

diff --git a/dts/conf.yaml b/dts/conf.yaml
index 3d5ee5aee5..a38aaca7f7 100644
--- a/dts/conf.yaml
+++ b/dts/conf.yaml
@@ -5,12 +5,14 @@
 test_runs:
   # define one test run environment
   - dpdk_build:
-  arch: x86_64
-  os: linux
-  cpu: native
-  # the combination of the following two makes CC="ccache gcc"
-  compiler: gcc
-  compiler_wrapper: ccache
+  tarball: "" # define path to DPDK tarball
+  build:
+arch: x86_64
+os: linux
+cpu: native
+# the combination of the following two makes CC="ccache gcc"
+compiler: gcc
+compiler_wrapper: ccache
 perf: false # disable performance testing
 func: true # enable functional testing
 skip_smoke_tests: false # optional
diff --git a/dts/framework/config/__init__.py b/dts/framework/config/__init__.py
index aba49143ae..0896f4e495 100644
--- a/dts/framework/config/__init__.py
+++ b/dts/framework/config/__init__.py
@@ -47,6 +47,7 @@
 from framework.config.types import (
 ConfigurationDict,
 DPDKBuildConfigDict,
+DPDKSetupDict,
 NodeConfigDict,
 PortConfigDict,
 TestRunConfigDict,
@@ -380,6 +381,67 @@ def from_dict(cls, d: DPDKBuildConfigDict) -> Self:
 )
 
 
+@dataclass(slots=True, frozen=True)
+class DPDKLocation:
+"""DPDK location.
+
+The path to the DPDK sources, build dir and type of location.
+
+Attributes:
+dpdk_tree: The path to the DPDK tree.
+tarball: The path to the DPDK tarball.
+remote: If :data:`True`, `dpdk_tree` or `tarball` is on the SUT node.
+build_dir: A directory name, which would be located in the `dpdk tree` 
or `tarball`.
+"""
+
+dpdk_tree: str | None
+tarball: str | None
+remote: bool
+build_dir: str | None
+
+@classmethod
+def from_dict(cls, d: DPDKSetupDict) -> Self | None:
+"""A convenience method that processes and validates the inputs before creating an instance.
+
+Ensures that either `dpdk_tree` or `tarball` is provided and, if local
+(`remote` is False), verifies their existence. Constructs and returns
+a `DPDKLocation` object with the provided parameters if validation is
+successful, or `None` if neither `dpdk_tree` nor `tarball` is given.
+
+Args:
+d: The configuration dictionary.
+
+Returns:
+A DPDK location if construction is successful, otherwise None.
+
+Raises:
+ConfigurationError: If `dpdk_tree` or `tarball` is not found in the local filesystem.
+"""
+dpdk_tree = d.get("dpdk_tree")
+tarball = d.get("tarball")
+remote = d.get("remote", False)
+
+if dpdk_tree or tarball:
+if not remote:
+if dpdk_tree and not Path(dpdk_tree).is_dir():
+raise ConfigurationError(
+f"DPDK tree '{dpdk_tree}' not found in local filesystem."
+)
+if tarball and not Path(tarball).is_file():
+raise ConfigurationError(
+f"DPDK tarball '{tarball}' not found in local filesystem."
+)
+
+return cls(
+dpdk_tree=dpdk_tree,
+tarball=tarball,
+remote=remote,
+build_dir=d.get("dir_name"),
+)
+
+return None
+
+
 @dataclass(slots=True, frozen=True)
 class DPDKBuildInfo:
 """Various versions and other information about a DPDK build.
@@ -389,8 +451,8 @@ class DPDKBuildInfo:
 compiler_version: The version of the compiler used to build DPDK.
 """
 
-dpdk_version: str
-compiler_version: str
+dpdk_version: str | None
+compiler_version: str | None
 
 
 @dataclass(slots=True, frozen=True)
@@ -437,7 +499,8 @@ class TestRunConfiguration:
 and with what DPDK build.
 
 Attributes:
-dpdk_build: A DPDK build to test.
+dpdk_location: The target source of the DPDK tree.
+dpdk_build_config: A DPDK build configuration to test.
 perf: Whether to run performance tests.
 func: Wheth

[RFC PATCH v1 09/12] doc: update argument options for external DPDK build

2024-09-06 Thread Juraj Linkeš
From: Tomáš Ďurovec 

Signed-off-by: Tomáš Ďurovec 
---
 doc/guides/tools/dts.rst | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 059776c888..8aac22bc60 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -235,12 +235,14 @@ DTS is run with ``main.py`` located in the ``dts`` directory after entering Poet
  -t SECONDS, --timeout SECONDS
                        [DTS_TIMEOUT] The default timeout for all DTS operations except for compiling DPDK. (default: 15)
  -v, --verbose         [DTS_VERBOSE] Specify to enable verbose output, logging all messages to the console. (default: False)
- -s, --skip-setup      [DTS_SKIP_SETUP] Specify to skip all setup steps on SUT and TG nodes. (default: False)
+ --dpdk-tree DIR_PATH  [DTS_DPDK_TREE] Path to DPDK source code tree to test. (default: None)
  --tarball FILE_PATH, --snapshot FILE_PATH
                        [DTS_DPDK_TARBALL] Path to DPDK source code tarball to test. (default: None)
  --revision ID, --rev ID, --git-ref ID
                        [DTS_DPDK_REVISION_ID] Git revision ID to test. Could be commit, tag, tree ID etc. To test local changes, first
                        commit them, then use their commit ID. (default: None)
+ --remote-source       [DTS_REMOTE_SOURCE] Set when the DPDK source tree or tarball is located on the SUT node. (default: False)
+ --build-dir DIR_NAME  [DTS_BUILD_DIR] A directory name, which would be located in the `dpdk tree` or `tarball`. (default: None)
  -f, --force           [DTS_FORCE] Specify to remove an already existing dpdk tarball before copying/extracting a new one. (default: False)
  --compile-timeout SECONDS
                        [DTS_COMPILE_TIMEOUT] The timeout for compiling DPDK. (default: 1200)
@@ -255,8 +257,8 @@ DTS is run with ``main.py`` located in the ``dts`` directory after entering Poet
 
 
 The brackets contain the names of environment variables that set the same thing.
-The minimum DTS needs is a config file and a DPDK tarball or git ref ID.
-You may pass those to DTS using the command line arguments or use the default paths.
+The minimum DTS needs is a config file and a DPDK source, which can be specified
+in the config file or with a command line argument/environment variable.
 
 Example command for running DTS with the template configuration and DPDK tag 
v23.11:
 
-- 
2.43.0



[RFC PATCH v1 10/12] dts: remove git ref option

2024-09-06 Thread Juraj Linkeš
From: Tomáš Ďurovec 

Signed-off-by: Tomáš Ďurovec 
---
 dts/framework/settings.py |  31 --
 dts/framework/utils.py| 117 --
 2 files changed, 148 deletions(-)

diff --git a/dts/framework/settings.py b/dts/framework/settings.py
index 97acd62fd8..d514e887d3 100644
--- a/dts/framework/settings.py
+++ b/dts/framework/settings.py
@@ -49,12 +49,6 @@
 
 Path to DPDK source code tarball to test.
 
-.. option:: --revision, --rev, --git-ref
-.. envvar:: DTS_DPDK_REVISION_ID
-
-Git revision ID to test. Could be commit, tag, tree ID etc.
-To test local changes, first commit them, then use their commit ID.
-
 .. option:: --remote-source
 .. envvar:: DTS_REMOTE_SOURCE
 
@@ -101,8 +95,6 @@
 from typing import Callable
 
 from .config import DPDKLocation, TestSuiteConfig
-from .exception import ConfigurationError
-from .utils import DPDKGitTarball, get_commit_id
 
 
 @dataclass(slots=True)
@@ -249,14 +241,6 @@ def _get_help_string(self, action):
 return help
 
 
-def _parse_revision_id(rev_id: str) -> str:
-"""Validate revision ID and retrieve corresponding commit ID."""
-try:
-return get_commit_id(rev_id)
-except ConfigurationError:
-raise argparse.ArgumentTypeError("The Git revision ID supplied is invalid or ambiguous")
-
-
 def _required_with_one_of(parser: _DTSArgumentParser, action: Action, 
*required_dests: str) -> None:
 """Verify that `action` is listed together with `required_dests`.
 
@@ -372,18 +356,6 @@ def _get_parser() -> _DTSArgumentParser:
 )
 _add_env_var_to_action(action, "DPDK_TARBALL")
 
-action = dpdk_source.add_argument(
-"--revision",
-"--rev",
-"--git-ref",
-type=_parse_revision_id,
-help="Git revision ID to test. Could be commit, tag, tree ID etc. "
-"To test local changes, first commit them, then use their commit ID.",
-metavar="ID",
-dest="dpdk_revision_id",
-)
-_add_env_var_to_action(action)
-
 action = parser.add_argument(
 "--remote-source",
 action="store_true",
@@ -526,9 +498,6 @@ def get_settings() -> Settings:
 parser = _get_parser()
 args = parser.parse_args()
 
-if args.dpdk_revision_id:
-args.dpdk_tarball_path = Path(DPDKGitTarball(args.dpdk_revision_id, 
args.output_dir))
-
 args.dpdk_location = _process_dpdk_location(
 args.dpdk_tree_path, args.dpdk_tarball_path, args.remote_source, 
args.build_dir
 )
diff --git a/dts/framework/utils.py b/dts/framework/utils.py
index 5757872fbd..37313c268b 100644
--- a/dts/framework/utils.py
+++ b/dts/framework/utils.py
@@ -14,21 +14,16 @@
 REGEX_FOR_PCI_ADDRESS: The regex representing a PCI address, e.g. ``0000:00:08.0``.
 """
 
-import atexit
 import fnmatch
 import json
 import os
-import subprocess
 import tarfile
 from enum import Enum
 from pathlib import Path
-from subprocess import SubprocessError
 from typing import Any
 
 from scapy.packet import Packet  # type: ignore[import-untyped]
 
-from .exception import ConfigurationError
-
 REGEX_FOR_PCI_ADDRESS: str = "/[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}.[0-9]{1}/"
 
 
@@ -74,31 +69,6 @@ def get_packet_summaries(packets: list[Packet]) -> str:
 return f"Packet contents: \n{packet_summaries}"
 
 
-def get_commit_id(rev_id: str) -> str:
-"""Given a Git revision ID, return the corresponding commit ID.
-
-Args:
-rev_id: The Git revision ID.
-
-Raises:
-ConfigurationError: The ``git rev-parse`` command failed, suggesting
-an invalid or ambiguous revision ID was supplied.
-"""
-result = subprocess.run(
-["git", "rev-parse", "--verify", rev_id],
-text=True,
-capture_output=True,
-)
-if result.returncode != 0:
-raise ConfigurationError(
-f"{rev_id} is not a valid git reference.\n"
-f"Command: {result.args}\n"
-f"Stdout: {result.stdout}\n"
-f"Stderr: {result.stderr}"
-)
-return result.stdout.strip()
-
-
 class StrEnum(Enum):
 """Enum with members stored as strings."""
 
@@ -174,93 +144,6 @@ def extension(self):
 return f".{self.value}" if self == self.none else f".{self.none.value}.{self.value}"
 
 
-class DPDKGitTarball:
-"""Compressed tarball of DPDK from the repository.
-
-The class supports the :class:`os.PathLike` protocol,
-which is used to get the Path of the tarball::
-
-from pathlib import Path
-tarball = DPDKGitTarball("HEAD", "output")
-tarball_path = Path(tarball)
-"""
-
-_git_ref: str
-_tar_compression_format: TarCompressionFormat
-_tarball_dir: Path
-_tarball_name: str
-_tarball_path: Path | None
-
-def __init__(
-self,
-git_ref: str,
-output_dir: str,
-tar_compression_format: TarCompressionFormat = TarCompressionFormat.xz,
-):
-"""Create the tarball during initialization.
-

[RFC PATCH v1 11/12] doc: remove git-ref argument

2024-09-06 Thread Juraj Linkeš
From: Tomáš Ďurovec 

Signed-off-by: Tomáš Ďurovec 
---
 doc/guides/tools/dts.rst | 8 
 1 file changed, 8 deletions(-)

diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 8aac22bc60..55e9c37c9b 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -236,8 +236,6 @@ DTS is run with ``main.py`` located in the ``dts`` directory after entering Poet
                        [DTS_TIMEOUT] The default timeout for all DTS operations except for compiling DPDK. (default: 15)
  -v, --verbose         [DTS_VERBOSE] Specify to enable verbose output, logging all messages to the console. (default: False)
  --dpdk-tree DIR_PATH  [DTS_DPDK_TREE] Path to DPDK source code tree to test. (default: None)
- --tarball FILE_PATH, --snapshot FILE_PATH
-                       [DTS_DPDK_TARBALL] Path to DPDK source code tarball to test. (default: None)
  --revision ID, --rev ID, --git-ref ID
                        [DTS_DPDK_REVISION_ID] Git revision ID to test. Could be commit, tag, tree ID etc. To test local changes, first
                        commit them, then use their commit ID. (default: None)
@@ -260,12 +258,6 @@ The brackets contain the names of environment variables 
that set the same thing.
 The minimum DTS needs is a config file and a DPDK source which can add in 
config
 or command line argument/environment variable option.
 
-Example command for running DTS with the template configuration and DPDK tag 
v23.11:
-
-.. code-block:: console
-
-   (dts-py3.10) $ ./main.py --git-ref v23.11
-
 
 DTS Results
 ~~~
-- 
2.43.0



[RFC PATCH v1 12/12] dts: improve statistics

2024-09-06 Thread Juraj Linkeš
From: Tomáš Ďurovec 

Signed-off-by: Tomáš Ďurovec 
---
 dts/framework/runner.py  |   5 +-
 dts/framework/test_result.py | 272 +++
 2 files changed, 187 insertions(+), 90 deletions(-)

diff --git a/dts/framework/runner.py b/dts/framework/runner.py
index c4ac5db194..ff8270a8d7 100644
--- a/dts/framework/runner.py
+++ b/dts/framework/runner.py
@@ -419,7 +419,8 @@ def _run_test_run(
 self._logger.info(
 f"Running test run with SUT 
'{test_run_config.system_under_test_node.name}'."
 )
-test_run_result.add_sut_info(sut_node.node_info)
+test_run_result.ports = sut_node.ports
+test_run_result.sut_info = sut_node.node_info
 try:
 dpdk_location = SETTINGS.dpdk_location or 
test_run_config.dpdk_location
 if not dpdk_location:
@@ -431,7 +432,7 @@ def _run_test_run(
 )
 
 sut_node.set_up_test_run(test_run_config, dpdk_location)
-test_run_result.add_dpdk_build_info(sut_node.get_dpdk_build_info())
+test_run_result.dpdk_build_info = sut_node.get_dpdk_build_info()
 tg_node.set_up_test_run(test_run_config, dpdk_location)
 test_run_result.update_setup(Result.PASS)
 except Exception as e:
diff --git a/dts/framework/test_result.py b/dts/framework/test_result.py
index c4343602aa..cfa1171d7b 100644
--- a/dts/framework/test_result.py
+++ b/dts/framework/test_result.py
@@ -22,18 +22,20 @@
 variable modify the directory where the files with results will be stored.
 """
 
-import os.path
+import json
 from collections.abc import MutableSequence
-from dataclasses import dataclass
+from dataclasses import asdict, dataclass
 from enum import Enum, auto
+from pathlib import Path
 from types import FunctionType
-from typing import Union
+from typing import Any, TypedDict
 
 from .config import DPDKBuildInfo, NodeInfo, TestRunConfiguration, 
TestSuiteConfig
 from .exception import DTSError, ErrorSeverity
 from .logger import DTSLogger
 from .settings import SETTINGS
 from .test_suite import TestSuite
+from .testbed_model.port import Port
 
 
 @dataclass(slots=True, frozen=True)
@@ -85,6 +87,29 @@ def __bool__(self) -> bool:
 return self is self.PASS
 
 
+class TestCaseResultDict(TypedDict):
+test_case_name: str
+result: str
+
+
+class TestSuiteResultDict(TypedDict):
+test_suite_name: str
+test_cases: list[TestCaseResultDict]
+
+
+class TestRunResultDict(TypedDict, total=False):
+compiler_version: str | None
+dpdk_version: str | None
+ports: list[dict[str, Any]] | None
+test_suites: list[TestSuiteResultDict]
+summary: dict[str, Any]
+
+
+class DtsRunResultDict(TypedDict):
+test_runs: list[TestRunResultDict]
+summary: dict[str, Any]
+
+
 class FixtureResult:
 """A record that stores the result of a setup or a teardown.
 
@@ -198,14 +223,12 @@ def get_errors(self) -> list[Exception]:
 """
 return self._get_setup_teardown_errors() + self._get_child_errors()
 
-def add_stats(self, statistics: "Statistics") -> None:
-"""Collate stats from the whole result hierarchy.
+def to_dict(self):
+""" """
 
-Args:
-statistics: The :class:`Statistics` object where the stats will be 
collated.
-"""
+def add_result(self, results: dict[str, Any] | dict[str, float]):
 for child_result in self.child_results:
-child_result.add_stats(statistics)
+child_result.add_result(results)
 
 
 class DTSResult(BaseResult):
@@ -229,8 +252,6 @@ class DTSResult(BaseResult):
 _logger: DTSLogger
 _errors: list[Exception]
 _return_code: ErrorSeverity
-_stats_result: Union["Statistics", None]
-_stats_filename: str
 
 def __init__(self, logger: DTSLogger):
 """Extend the constructor with top-level specifics.
@@ -243,8 +264,6 @@ def __init__(self, logger: DTSLogger):
 self._logger = logger
 self._errors = []
 self._return_code = ErrorSeverity.NO_ERR
-self._stats_result = None
-self._stats_filename = os.path.join(SETTINGS.output_dir, 
"statistics.txt")
 
 def add_test_run(self, test_run_config: TestRunConfiguration) -> 
"TestRunResult":
 """Add and return the child result (test run).
@@ -281,10 +300,8 @@ def process(self) -> None:
 for error in self._errors:
 self._logger.debug(repr(error))
 
-self._stats_result = Statistics(self.dpdk_version)
-self.add_stats(self._stats_result)
-with open(self._stats_filename, "w+") as stats_file:
-stats_file.write(str(self._stats_result))
+TextSummary(self).save(Path(SETTINGS.output_dir, 
"results_summary.txt"))
+JsonResults(self).save(Path(SETTINGS.output_dir, "results.json"))
 
 def get_return_code(self) -> int:
 """Go through all stored Exceptions and return the final DTS error 
code.
@@ -302,6 +319,16 @@ def get_r

[PATCH] eal/x86: fix 32-bit write-combined stores

2024-09-06 Thread Bruce Richardson
The "movdiri" instruction is given as a series of bytes in rte_io.h so
that it works on compilers/assemblers which are unaware of the
instruction. The REX prefix (0x40) on this instruction is invalid for
32-bit code, causing issues. Thankfully, the prefix is unnecessary in
64-bit code, since the data size used is 32 bits.
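
For readers unfamiliar with the encoding, the sketch below decodes the last byte of the sequence (the ModRM byte, 0x02) to show which operands it selects. It is an illustration only, not DPDK code: `decode_modrm` and `describe` are hypothetical helper names, and the register table is the standard x86 numbering. The result lines up with the "a" (value) and "d" (addr) constraints in the asm statement. As far as the x86 encoding rules go, byte 0x40 only acts as a REX prefix in 64-bit mode; in 32-bit mode it is decoded as a separate one-byte instruction, which is why it has to go.

```python
# Illustrative decoder for the operand byte of the MOVDIRI encoding in the patch.
# Register names follow the standard x86 numbering; nothing here is DPDK code.
REGS = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

MOVDIRI_OPCODE = bytes([0x0F, 0x38, 0xF9])  # opcode bytes from the patch
MODRM = 0x02                                # trailing byte from the patch


def decode_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def describe(byte):
    """Return (source register, address register) for a ModRM byte with mod == 0."""
    mod, reg, rm = decode_modrm(byte)
    # rm values 4 and 5 select SIB / displacement forms, which this sketch skips.
    if mod != 0 or rm in (4, 5):
        raise ValueError("only plain register-indirect forms are handled here")
    return f"e{REGS[reg]}", f"e{REGS[rm]}/r{REGS[rm]}"


if __name__ == "__main__":
    print(describe(MODRM))
```

With the byte from the patch, the source register is eax and the address register is edx (rdx in 64-bit code).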

Fixes: 8a00dfc738fe ("eal: add write combining store")
Cc: radu.nico...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Bruce Richardson 
---
 lib/eal/x86/include/rte_io.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/eal/x86/include/rte_io.h b/lib/eal/x86/include/rte_io.h
index 0e1fefdee1..5366e09c47 100644
--- a/lib/eal/x86/include/rte_io.h
+++ b/lib/eal/x86/include/rte_io.h
@@ -24,7 +24,7 @@ __rte_x86_movdiri(uint32_t value, volatile void *addr)
 {
asm volatile(
/* MOVDIRI */
-   ".byte 0x40, 0x0f, 0x38, 0xf9, 0x02"
+   ".byte 0x0f, 0x38, 0xf9, 0x02"
:
: "a" (value), "d" (addr));
 }
--
2.43.0



[DPDK/ethdev Bug 1536] net/tap: crash in tap pmd when using more than RTE_MP_MAX_FD_NUM rx queues

2024-09-06 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1536

Bug ID: 1536
   Summary: net/tap: crash in tap pmd when using more than
RTE_MP_MAX_FD_NUM rx queues
   Product: DPDK
   Version: 22.03
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: ethdev
  Assignee: dev@dpdk.org
  Reporter: edwin.brosse...@6wind.com
  Target Milestone: ---

Hello,

I have recently stumbled into an issue with my DPDK-based application running
the failsafe pmd. This pmd uses a tap device, with which my application fails
to start if more than 8 rx queues are used. This issue appears to be related to
this patch:
https://git.dpdk.org/dpdk/commit/?id=c36ce7099c2187926cd62cff7ebd479823554929

I have seen in the documentation that there is a limit of 8 queues when a tap
device is shared between multiple processes. However, my application uses a
single primary process, with no secondary process, but it appears that I am
still running into this limitation.

Now if we look at this small chunk of code:

    memset(&msg, 0, sizeof(msg));
    strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
    strlcpy(request_param->port_name, dev->data->name,
            sizeof(request_param->port_name));
    msg.len_param = sizeof(*request_param);
    for (i = 0; i < dev->data->nb_tx_queues; i++) {
        msg.fds[fd_iterator++] = process_private->txq_fds[i];
        msg.num_fds++;
        request_param->txq_count++;
    }
    for (i = 0; i < dev->data->nb_rx_queues; i++) {
        msg.fds[fd_iterator++] = process_private->rxq_fds[i];
        msg.num_fds++;
        request_param->rxq_count++;
    }

(Note that I am not using the latest DPDK version, but stable v23.11.1. But I
believe the issue is still present on latest.)

There are no checks on the maximum value i can take in the for loops. Since the
size of msg.fds is limited to 8 file descriptors by the IPC API (the maximum
number of queues that can be shared between processes), a buffer overflow can
happen here.

See the struct declaration:
struct rte_mp_msg {
 char name[RTE_MP_MAX_NAME_LEN];
 int len_param;
 int num_fds;
 uint8_t param[RTE_MP_MAX_PARAM_LEN];
 int fds[RTE_MP_MAX_FD_NUM];
};

This means that if the number of queues used is more than 8, the program will
crash. This is what happens on my end as I get the following log:
*** stack smashing detected ***: terminated

Reverting the commit mentioned above fixes my issue. Also setting a check like
this works for me:

if (dev->data->nb_tx_queues + dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM)
 return -1;

I've made the changes on my local branch to fix my issue.

--

Potential fixes discussed:

1. Add a "nb_rx_queues > RTE_MP_MAX_FD_NUM" check so that 'msg.fds[]' is not
updated blindly.

2. Prevent this from being a limit for the tap PMD when there is only a primary
process.
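
The first fix can be modelled outside DPDK. The following is a minimal Python sketch, not the tap PMD code: `collect_fds` is a hypothetical helper, and `RTE_MP_MAX_FD_NUM = 8` is an assumed value taken from the 8-queue limit described in this report. It follows the reporter's combined tx + rx check rather than checking the rx count alone, since both loops write into the same array.

```python
# Hypothetical model of the fd-collection loops in the tap PMD; not DPDK code.
RTE_MP_MAX_FD_NUM = 8  # assumed value, per the 8-queue limit described in the report


def collect_fds(txq_fds, rxq_fds, max_fds=RTE_MP_MAX_FD_NUM):
    """Return the fds to send over IPC, or raise if they do not fit in msg.fds[]."""
    # Check the combined count up front, before any slot of the fixed-size
    # array would be written.
    total = len(txq_fds) + len(rxq_fds)
    if total > max_fds:
        raise ValueError(f"{total} queue fds exceed the {max_fds} available slots")
    # Same order as the original loops: tx queues first, then rx queues.
    return list(txq_fds) + list(rxq_fds)


if __name__ == "__main__":
    print(collect_fds([10, 11], [20, 21]))
    try:
        collect_fds(list(range(8)), list(range(8)))
    except ValueError as err:
        print("rejected:", err)
```

Rejecting the request early only stops the overflow; it does not address the second point, that a single primary process should not be bound by the IPC limit at all.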


Re: [PATCH v3 1/1] dts: add methods for modifying MTU to testpmd shell

2024-09-06 Thread Juraj Linkeš

diff --git a/dts/framework/remote_session/testpmd_shell.py 
b/dts/framework/remote_session/testpmd_shell.py
index ca24b28070..c1462ba2d3 100644
--- a/dts/framework/remote_session/testpmd_shell.py
+++ b/dts/framework/remote_session/testpmd_shell.py
@@ -888,6 +888,51 @@ def show_port_stats(self, port_id: int) -> 
TestPmdPortStats:



+def set_port_mtu_all(self, mtu: int, verify: bool = True) -> None:
+"""Change the MTU of all ports using testpmd.
+
+Runs :meth:`set_port_mtu` for every port that testpmd is aware of.
+
+Args:
+mtu: Desired value for the MTU to be set to.
+verify: Whether to verify that setting the MTU on each port was 
successful or not.
+Defaults to :data:`True`.
+
+Raises:
+InteractiveCommandExecutionError: If `verify` is :data:`True` and 
the MTU was not
+properly updated on at least one port.
+"""
+if self._app_params.ports is not None:


We should utilize the port info caching patch here:
https://patches.dpdk.org/project/dpdk/patch/20240823074137.13989-1-juraj.lin...@pantheon.tech/

Other than that, the patch looks good.


+for port_id in range(len(self._app_params.ports)):
+self.set_port_mtu(port_id, mtu, verify)
+
  def _close(self) -> None:
  """Overrides :meth:`~.interactive_shell.close`."""
  self.stop()




RE: [RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in DPDK

2024-09-06 Thread Morten Brørup
> From: Burakov, Anatoly [mailto:anatoly.bura...@intel.com]
> Sent: Friday, 6 September 2024 15.18
> Subject: Re: [RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in DPDK
> 
> On 9/6/2024 3:07 PM, Bruce Richardson wrote:
> > On Fri, Sep 06, 2024 at 03:02:53PM +0200, Morten Brørup wrote:
> >>> From: Burakov, Anatoly [mailto:anatoly.bura...@intel.com]
> >>> Sent: Friday, 6 September 2024 14.46
> >>>
> >>> On 9/6/2024 2:37 PM, Morten Brørup wrote:
> > From: Anatoly Burakov [mailto:anatoly.bura...@intel.com]
> > Sent: Friday, 6 September 2024 13.47
> > To: dev@dpdk.org
> > Subject: [RFC PATCH v1 0/5] Adjust wording for NUMA vs. socket ID in
> DPDK
> >
> > While initially, DPDK has used the term "socket ID" to refer to physical
> > package
> > ID, the last time DPDK read "physical_package_id" for socket ID was ~9
> >>> years
> > ago, so it's been a while since we've actually switched over to using
> the
> >>> term
> > "socket" to mean "NUMA node".
> >
> > This wasn't a problem before, as most systems had one NUMA node per
> >>> physical
> > socket. However, in the last few years, more and more systems have
> multiple
> > NUMA
> > nodes per physical CPU socket. Since DPDK used NUMA nodes already, the
> > transition was pretty seamless, however now we're faced with a situation
> >>> when
> > most of our documentation still uses outdated terms, and our API is ripe
> >>> with
> > references to "sockets" when in actuality we mean "NUMA nodes". This
> could
> >>> be
> > a
> > source of confusion.
> >
> > While completely renaming all of our API's would be a huge effort, will
> >>> take a
> > long time and arguably wouldn't even be worth the API breakages (given
> that
> > this
> > mismatch between terminology and reality is implicitly understood by
> most
> > people
> > working on DPDK, and so this isn't so much of a problem in practice), we
> >>> can
> > do
> > some tweaks around the edges and at least document this unfortunate
> >>> reality.
> >
> > This patchset suggests the following changes:
> >
> > - Update rte_socket/rte_lcore documentation to refer to NUMA nodes
> rather
> >>> than
> > sockets - Rename internal structures' fields to better reflect this
> >>> intention
> > -
> > Rename --socket-mem/--socket-limit flags to refer to NUMA rather than
> >>> sockets
> > -
> > Add internal API to get physical package ID [1]
> >
> > The documentation is updated to refer to new EAL flags, but is otherwise
> >>> left
> > untouched, and instead the entry in "glossary" is amended to indicate
> that
> > when
> > DPDK documentation refers to "sockets", it actually means "NUMA ID's".
> As
> >>> next
> > steps, we could rename all API parameters to refer to NUMA ID rather
> than
> > socket
> > ID - this would not break neither API nor ABI, and instead would be a
> > documentation change in practice.
> >
> > [1] This could be used to group lcores by physical package, see e.g.
> > discussion
> >   under this patch:
> > https://patches.dpdk.org/project/dpdk/cover/20240827151014.201-1-
> > vipin.vargh...@amd.com/
> 
>  Thank you for cleaning this up, Anatoly.
> 
>  I would prefer to take one more step and also rename functions and
> >>> parameters, e.g. rte_socket_id() -> rte_numa_id().
> 
>  For backwards compatibility, macros/functions with the old names can be
> >>> added.
> 
> >>>
> >>> I don't think we can do such changes without deprecation notices, but
> >>> it's a good candidate for next release.
> >>
> >> Perhaps we can keep ABI compatibility by adding wrapper functions with the
> old names/parameters, which simply call the same functions with the new
> names/parameters.
> >>
> >> The Devil is in the details, and I haven't looked deeply into this. So take
> with a grain of salt.
> >>
> >>>
> >>> I have thought about including parameter renames in this patchset, but
> >>> for now I decided against doing so. I can certainly include this in the
> >>> next revision if that's something community is willing to accept.
> >>
> >> I agree with your decision on this. Renaming the parameters without
> renaming the functions could be confusing.
> >>
> >
> > I actually wonder if that is true. If we are simply renaming the parameters
> > without:
> > a) changing their types
> > b) changing the function behaviour
> > then it is neither an API nor an ABI break. If we were to do so, it would
> > be like changing a comment, since the actual parameter name is purely a
> > convenience to hint to the user what the value being passed actually does.
> >
> > That only applies for function parameters though. For any defines or macros
> > that need renaming, then we are into API break territory and we would want
> > backward compatible versions of same.
> >
> 
> To be clear, I was referring to the 

Re: [PATCH 0/3] eal: mark API's as stable

2024-09-06 Thread Ferruh Yigit
On 9/6/2024 2:11 PM, Jerin Jacob wrote:
> On Fri, Sep 6, 2024 at 3:04 PM Ferruh Yigit  wrote:
>>
>> On 9/5/2024 8:58 AM, David Marchand wrote:
>>> On Wed, Sep 4, 2024 at 8:10 PM Stephen Hemminger
>>>  wrote:

>>>> The API's in ethtool from before 23.11 should be marked stable.
>>>
>>> EAL* ?
>>>
>>>> Should probably include the trace api's but that is more complex change.
>>>
>>> On the trace API itself it should be ok.
>>> The problem is with the tracepoint variables themselves, and I don't
>>> think we should mark them stable.
>>>
>>
>> We cleaned tracepoint variables from ethdev map file, why they exist for
>> 'eal'?
>>
>> I can see .map file has bunch of "__rte_eal_trace_generic_*", I think
>> they exists to support 'rte_eal_trace_generic_*()' APIs which can be
>> called from other libraries.
>>
>> Do we really need them?
>> Why not whoever calls them directly call 'rte_trace_point_emit_*' instead?
>> As these rte_eal_trace_generic_*()' not used at all, I assume this is
>> what done already.
>>
>> @Jerin,
>> what do think to remove 'rte_eal_trace_generic_*()' APIs, so trace
>> always keeps local to library, and don't bloat the eal .map file?
> 
> The purpose of exposing rte_eal_trace_generic_* is that, applications
> can add generic trace points
> in the application.
> 

Can't applications use 'rte_trace_point_emit_*()' directly, as libraries
do?



Re: Crash in tap pmd when using more than 8 rx queues

2024-09-06 Thread Edwin Brossette
Hello,

I created a Bugzilla PR, just as you requested:
https://bugs.dpdk.org/show_bug.cgi?id=1536

As for the bug resolution, I have other matters to attend to and I'm afraid
I cannot spend more time on this issue, so I was only planning to report it.

Regards,
Edwin Brossette.

On Fri, Sep 6, 2024 at 1:16 PM Ferruh Yigit  wrote:

> On 9/5/2024 1:55 PM, Edwin Brossette wrote:
> > Hello,
> >
> > I have recently stumbled into an issue with my DPDK-based application
> > running the failsafe pmd. This pmd uses a tap device, with which my
> > application fails to start if more than 8 rx queues are used. This issue
> > appears to be related to this patch:
> > https://git.dpdk.org/dpdk/commit/?id=c36ce7099c2187926cd62cff7ebd479823554929
> >
> > I have seen in the documentation that there was a limitation to 8 max
> > queues shared when using a tap device shared between multiple processes.
> > However, my application uses a single primary process, with no secondary
> > process, but it appears that I am still running into this limitation.
> >
> > Now if we look at this small chunk of code:
> >
> > memset(&msg, 0, sizeof(msg));
> > strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
> > strlcpy(request_param->port_name, dev->data->name,
> >         sizeof(request_param->port_name));
> > msg.len_param = sizeof(*request_param);
> > for (i = 0; i < dev->data->nb_tx_queues; i++) {
> >     msg.fds[fd_iterator++] = process_private->txq_fds[i];
> >     msg.num_fds++;
> >     request_param->txq_count++;
> > }
> > for (i = 0; i < dev->data->nb_rx_queues; i++) {
> >     msg.fds[fd_iterator++] = process_private->rxq_fds[i];
> >     msg.num_fds++;
> >     request_param->rxq_count++;
> > }
> > (Note that I am not using the latest DPDK version, but stable v23.11.1.
> > But I believe the issue is still present on latest.)
> >
> > There are no checks on the maximum value i can take in the for loops.
> > Since the size of msg.fds is limited by the maximum of 8 queues shared
> > between processes because of the IPC API, there is a potential buffer
> > overflow which can happen here.
> >
> > See the struct declaration:
> > struct rte_mp_msg {
> >  char name[RTE_MP_MAX_NAME_LEN];
> >  int len_param;
> >  int num_fds;
> >  uint8_t param[RTE_MP_MAX_PARAM_LEN];
> >  int fds[RTE_MP_MAX_FD_NUM];
> > };
> >
> > This means that if the number of queues used is more than 8, the program
> > will crash. This is what happens on my end as I get the following log:
> > *** stack smashing detected ***: terminated
> >
> > Reverting the commit mentioned above fixes my issue. Also setting a
> > check like this works for me:
> >
> > if (dev->data->nb_tx_queues + dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM)
> >     return -1;
> >
> > I've made the changes on my local branch to fix my issue. This mail is
> > just to bring attention on this problem.
> > Thank you in advance for considering it.
> >
>
> Hi Edwin,
>
> Thanks for the report. I confirm the issue is valid, although that code
> has changed a little (to increase the limit of 8) [3].
>
> And in this release Stephen put up another patch [1] to increase the limit
> even more, but regardless of the limit, the tap code needs to be fixed.
>
> To fix:
> 1. We need to add the "nb_rx_queues > RTE_MP_MAX_FD_NUM" check you
> mentioned, to not blindly update the 'msg.fds[]'
> 2. We should prevent this from being a limit for the tap PMD when there is
> only a primary process; this seems to have been an oversight on our end.
>
>
> Can you work on the issue, or are you just reporting it?
> Can you please report the bug in Bugzilla [2], to record the issue?
>
>
>
> [1]
>
> https://patches.dpdk.org/project/dpdk/patch/20240905162018.74301-1-step...@networkplumber.org/
>
> [2]
> https://bugs.dpdk.org/
>
> [3]
> https://git.dpdk.org/dpdk/commit/?id=72ab1dc1598e
>
>


RE: DPDK Summit Montreal - Schedule

2024-09-06 Thread Konstantin Ananyev


> > We will talk about the future of DPDK, the best userland networking 
> > libraries
> > having an incredible hardware support from our large community.
> > It will be an opportunity to connect, learn and collaborate with developers
> > from around the world who contribute to and utilize DPDK.
> >
> > Talks will cover CPU optimizations, GPU processing, machine learning,
> > hashing, packet offload, cryptography, testing and more.
> >
> > The schedule can be found here, almost complete:
> > https://events.linuxfoundation.org/dpdk-summit/program/schedule/
> >
> >
> > A workshop session is planned to allow debating and making progress
> > in smaller group discussions about specific topics to be determined.
> > Examples of such topics could be:
> > - debuggability
> > - power management techniques & efficiency
> > - config restore bypass in ethdev port start
> > - secondary process usage and limitations
> >
> > Feel free to propose your ideas in advance so we can come prepared.
> > Then we will organize ourselves in discussion groups
> > in order to progress and hopefully reach some new conclusions.
> >
> 
> What do you think about 'rte_flow'? It is a powerful tool but complex,
> and not adopted at the same level by all vendors. As it is hard to test, we
> are having difficulty providing consistency between vendor implementations.
> We can discuss how to spread understanding among various vendors and
> users, how to increase adoption, and future targets/plans.
> 
> 
> Another one can be 'tooling': as a result of kernel bypass, some well-known
> Linux networking tools do not work with DPDK solutions, and this
> creates confusion and an entry barrier for some people. This problem
> has been mentioned a few times before.
> Perhaps we should address this problem in a more structured way, to
> design and later gradually implement some solutions. We can discuss
> methods and plans to improve our tooling support.
> 

Thanks Ferruh, these sound like interesting ones to me, especially the second one.
As another possible subject: could we talk about how to make core DPDK data structures
(mempool, hash table, ring, etc.) less static, i.e. add the ability to grow/shrink
on demand?
Another long-standing thing is RTE_MAX_LCORE: can it be a runtime parameter
instead of a build-time parameter?
All these things, I think, would help overall by reducing memory footprint,
improving usability, etc.



[PATCH 2/4] net/ice: fix AVX-512 pointer copy on 32-bit

2024-09-06 Thread Bruce Richardson
The size of a pointer on 32-bit is only 4 rather than 8 bytes, so
copying 32 pointers only requires half the number of AVX-512 load store
operations.

Fixes: a4e480de268e ("net/ice: optimize Tx by using AVX512")
Cc: sta...@dpdk.org

Signed-off-by: Bruce Richardson 
---
 drivers/net/ice/ice_rxtx_vec_avx512.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c 
b/drivers/net/ice/ice_rxtx_vec_avx512.c
index 04148e8ea2..add095ef06 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx512.c
@@ -907,6 +907,7 @@ ice_tx_free_bufs_avx512(struct ice_tx_queue *txq)
uint32_t copied = 0;
/* n is multiple of 32 */
while (copied < n) {
+#ifdef RTE_ARCH_64
const __m512i a = _mm512_loadu_si512(&txep[copied]);
const __m512i b = _mm512_loadu_si512(&txep[copied + 8]);
const __m512i c = _mm512_loadu_si512(&txep[copied + 
16]);
@@ -916,6 +917,12 @@ ice_tx_free_bufs_avx512(struct ice_tx_queue *txq)
_mm512_storeu_si512(&cache_objs[copied + 8], b);
_mm512_storeu_si512(&cache_objs[copied + 16], c);
_mm512_storeu_si512(&cache_objs[copied + 24], d);
+#else
+   const __m512i a = _mm512_loadu_si512(&txep[copied]);
+   const __m512i b = _mm512_loadu_si512(&txep[copied + 
16]);
+   _mm512_storeu_si512(&cache_objs[copied], a);
+   _mm512_storeu_si512(&cache_objs[copied + 16], b);
+#endif
copied += 32;
}
cache->len += n;
-- 
2.43.0



[PATCH 0/4] fix issues with using AVX-512 drivers on 32-bit

2024-09-06 Thread Bruce Richardson
The AVX-512 copy code in multiple drivers was incorrect for 32-bit as it
assumed that each pointer was always 8B in size.
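
The arithmetic behind the fix can be sketched as follows. This is illustrative Python, not driver code: `copy_plan` and the constants are hypothetical names, and the only inputs are the 64-byte width of an AVX-512 register and the 32-pointer batch used by the loops in these patches.

```python
# Illustrative arithmetic only; the names here are not from the DPDK sources.
ZMM_BYTES = 64  # one AVX-512 register holds 512 bits = 64 bytes
BATCH = 32      # the drivers copy 32 pointers per loop iteration


def copy_plan(pointer_bytes):
    """Return (loads per batch, element offset of each load) for a pointer size."""
    per_load = ZMM_BYTES // pointer_bytes  # pointers moved by one load/store pair
    loads = BATCH // per_load
    offsets = [i * per_load for i in range(loads)]
    return loads, offsets


if __name__ == "__main__":
    print(copy_plan(8))  # 64-bit build
    print(copy_plan(4))  # 32-bit build
```

For 8-byte pointers this gives four loads at element offsets 0, 8, 16 and 24; for 4-byte pointers it gives two loads at offsets 0 and 16, which matches the indices used in the two branches of each patch.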

Bruce Richardson (4):
  net/i40e: fix AVX-512 pointer copy on 32-bit
  net/ice: fix AVX-512 pointer copy on 32-bit
  net/iavf: fix AVX-512 pointer copy on 32-bit
  common/idpf: fix AVX-512 pointer copy on 32-bit

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 7 +++
 drivers/net/i40e/i40e_rxtx_vec_avx512.c   | 7 +++
 drivers/net/iavf/iavf_rxtx_vec_avx512.c   | 7 +++
 drivers/net/ice/ice_rxtx_vec_avx512.c | 7 +++
 4 files changed, 28 insertions(+)

--
2.43.0



[PATCH 1/4] net/i40e: fix AVX-512 pointer copy on 32-bit

2024-09-06 Thread Bruce Richardson
The size of a pointer on 32-bit is only 4 rather than 8 bytes, so
copying 32 pointers only requires half the number of AVX-512 load store
operations.

Fixes: 5171b4ee6b6b ("net/i40e: optimize Tx by using AVX512")
Cc: sta...@dpdk.org

Signed-off-by: Bruce Richardson 
---
 drivers/net/i40e/i40e_rxtx_vec_avx512.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c 
b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
index 0238b03f8a..3b2750221b 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
@@ -799,6 +799,7 @@ i40e_tx_free_bufs_avx512(struct i40e_tx_queue *txq)
uint32_t copied = 0;
/* n is multiple of 32 */
while (copied < n) {
+#ifdef RTE_ARCH_64
const __m512i a = _mm512_load_si512(&txep[copied]);
const __m512i b = _mm512_load_si512(&txep[copied + 8]);
const __m512i c = _mm512_load_si512(&txep[copied + 16]);
@@ -808,6 +809,12 @@ i40e_tx_free_bufs_avx512(struct i40e_tx_queue *txq)
_mm512_storeu_si512(&cache_objs[copied + 8], b);
_mm512_storeu_si512(&cache_objs[copied + 16], c);
_mm512_storeu_si512(&cache_objs[copied + 24], d);
+#else
+   const __m512i a = _mm512_load_si512(&txep[copied]);
+   const __m512i b = _mm512_load_si512(&txep[copied + 16]);
+   _mm512_storeu_si512(&cache_objs[copied], a);
+   _mm512_storeu_si512(&cache_objs[copied + 16], b);
+#endif
copied += 32;
}
cache->len += n;
-- 
2.43.0



Re: [PATCH 0/3] eal: mark API's as stable

2024-09-06 Thread Ferruh Yigit
On 9/6/2024 11:04 AM, Morten Brørup wrote:
>> From: Ferruh Yigit [mailto:ferruh.yi...@amd.com]
>> Sent: Friday, 6 September 2024 10.54
>>
>> On 9/5/2024 3:01 PM, Jerin Jacob wrote:
>>> On Thu, Sep 5, 2024 at 3:14 PM Morten Brørup 
>> wrote:

> From: David Marchand [mailto:david.march...@redhat.com]
> Sent: Thursday, 5 September 2024 11.03
>
> On Thu, Sep 5, 2024 at 10:55 AM Morten Brørup 
> wrote:
>>
>>> From: David Marchand [mailto:david.march...@redhat.com]
>>> Sent: Thursday, 5 September 2024 09.59
>>>
>>> On Wed, Sep 4, 2024 at 8:10 PM Stephen Hemminger
>>>  wrote:

 The API's in ethtool from before 23.11 should be marked stable.
>>>
>>> EAL* ?
>>>
 Should probably include the trace api's but that is more complex
>> change.
>>>
>>> On the trace API itself it should be ok.
>>
>> No!
>
> *sigh*
>
>>
>> Trace must remain experimental until controlled by a meson option, e.g.
> "enable_trace", whereby trace can be completely disabled and omitted from
>> the
> compiled application/libraries/drivers at build time.
>
> This seems unrelated to marking the API stable as regardless of the
> API state at the moment, this code is always present.

 I cannot foresee if disabling trace at build time will require changes to
>> the trace API. So I'm being cautious here.

 However, if Jerin (as author of the trace subsystem) foresees that it will
>> be possible to disable trace at build time without affecting the trace API, I
>> don't object to marking the trace API (or some of it) stable.
>>>
>>> I don't for foresee any ABI changes when adding disabling trace
>>> compile time support. However, I don't understand why we need to do
>>> that. In the sense, fast path functions are already having an option
>>> to compile out.
>>> Slow path functions can be disabled at runtime at the cost of 1 cycle
>>> as instrumentation cost. Having said that, I don't have any concern
>>> about disabling trace as an option.
>>>
>>
>> I agree with Jerin, I don't see motivation to disable slow path traces
>> when they can be disabled in runtime.
>> And fast path traces already have compile flag to disable them.
>>
>> Build time configurations in long term has problems too, so I am for not
>> using them unless we don't have to.
> 
> For some use cases, trace is dead code, and should be omitted.
> You don't want dead code in production systems.
> 
> Please remember that DPDK is also being used in highly optimized embedded 
> systems, hardware appliances and other systems where memory is not abundant.
> 
> DPDK is not only for cloud and distros. ;-)
> 
> The CI only tests DPDK with a build time configuration expected to be usable 
> for distros.
> I'm not asking to change that.
> I'm only asking for more build time configurability to support other use 
> cases.
> 

I see, but that build-time configuration argument applies in multiple
areas. And with the meson switch we lean towards a dynamic configuration approach.

When a build-time config is introduced, again and again we have cases
where specific code enabled by a compile-time macro is broken and nobody
notices.
Having code always enabled and configured at runtime produces a more
robust deliverable.

We are aware that DPDK is used in embedded devices, but most of them are not
very resource-restricted devices. Does it really matter to have a DPDK binary
that is a few megabytes larger (I didn't check, but I expect this is the most
the binary size can increase with tracing code)? Does it really make any
difference?



[PATCH 3/4] net/iavf: fix AVX-512 pointer copy on 32-bit

2024-09-06 Thread Bruce Richardson
The size of a pointer on 32-bit is only 4 rather than 8 bytes, so
copying 32 pointers only requires half the number of AVX-512 load store
operations.

Fixes: 9ab9514c150e ("net/iavf: enable AVX512 for Tx")
Cc: sta...@dpdk.org

Signed-off-by: Bruce Richardson 
---
 drivers/net/iavf/iavf_rxtx_vec_avx512.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c 
b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
index 3bb6f305df..d6a861bf80 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
@@ -1892,6 +1892,7 @@ iavf_tx_free_bufs_avx512(struct iavf_tx_queue *txq)
uint32_t copied = 0;
/* n is multiple of 32 */
while (copied < n) {
+#ifdef RTE_ARCH_64
const __m512i a = _mm512_loadu_si512(&txep[copied]);
const __m512i b = _mm512_loadu_si512(&txep[copied + 8]);
const __m512i c = _mm512_loadu_si512(&txep[copied + 
16]);
@@ -1901,6 +1902,12 @@ iavf_tx_free_bufs_avx512(struct iavf_tx_queue *txq)
_mm512_storeu_si512(&cache_objs[copied + 8], b);
_mm512_storeu_si512(&cache_objs[copied + 16], c);
_mm512_storeu_si512(&cache_objs[copied + 24], d);
+#else
+   const __m512i a = _mm512_loadu_si512(&txep[copied]);
+   const __m512i b = _mm512_loadu_si512(&txep[copied + 
16]);
+   _mm512_storeu_si512(&cache_objs[copied], a);
+   _mm512_storeu_si512(&cache_objs[copied + 16], b);
+#endif
copied += 32;
}
cache->len += n;
-- 
2.43.0



[PATCH 4/4] common/idpf: fix AVX-512 pointer copy on 32-bit

2024-09-06 Thread Bruce Richardson
The size of a pointer on 32-bit is only 4 rather than 8 bytes, so
copying 32 pointers only requires half the number of AVX-512 load store
operations.

Fixes: 5bf87b45b2c8 ("net/idpf: add AVX512 data path for single queue model")
Cc: sta...@dpdk.org

Signed-off-by: Bruce Richardson 
---
 drivers/common/idpf/idpf_common_rxtx_avx512.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c 
b/drivers/common/idpf/idpf_common_rxtx_avx512.c
index 3b5e124ec8..b8450b03ae 100644
--- a/drivers/common/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c
@@ -1043,6 +1043,7 @@ idpf_tx_singleq_free_bufs_avx512(struct idpf_tx_queue 
*txq)
uint32_t copied = 0;
/* n is multiple of 32 */
while (copied < n) {
+#ifdef RTE_ARCH_64
const __m512i a = _mm512_loadu_si512(&txep[copied]);
const __m512i b = _mm512_loadu_si512(&txep[copied + 8]);
const __m512i c = _mm512_loadu_si512(&txep[copied + 
16]);
@@ -1052,6 +1053,12 @@ idpf_tx_singleq_free_bufs_avx512(struct idpf_tx_queue 
*txq)
_mm512_storeu_si512(&cache_objs[copied + 8], b);
_mm512_storeu_si512(&cache_objs[copied + 16], c);
_mm512_storeu_si512(&cache_objs[copied + 24], d);
+#else
+   const __m512i a = _mm512_loadu_si512(&txep[copied]);
+   const __m512i b = _mm512_loadu_si512(&txep[copied + 
16]);
+   _mm512_storeu_si512(&cache_objs[copied], a);
+   _mm512_storeu_si512(&cache_objs[copied + 16], b);
+#endif
copied += 32;
}
cache->len += n;
-- 
2.43.0


