Re: [PATCH 1/3] net/enic: add support for eCPRI matching

2022-01-27 Thread Thomas Monjalon
26/01/2022 15:01, Ferruh Yigit:
> On 1/26/2022 2:00 PM, Ferruh Yigit wrote:
> > On 1/14/2022 3:10 AM, John Daley wrote:
> >> The eCPRI message can be over the Ethernet layer (802.1Q also
> >> supported) or over the UDP layer. The message header format is the
> >> same in both variants.
> >>
> >> Only up through the first packet header in the PDU can be matched.
> >> RSS on the eCPRI header fields is not supported.
> >>
> >> Signed-off-by: John Daley
> >> Reviewed-by: Hyong Youb Kim
> >> ---
> >>   doc/guides/rel_notes/release_22_03.rst |  1 +
> >>   drivers/net/enic/enic_fm_flow.c| 65 ++
> >>   2 files changed, 66 insertions(+)
> > 
> > Documentation update is missing, can you please fix?
> > 
> > $ ./devtools/check-doc-vs-code.sh
> > rte_flow doc out of sync for enic
> >  item ecpri
> 
> Hi Thomas,
> 
> Can we add './devtools/check-doc-vs-code.sh' check to CI, what do you think?

Yes of course




RE: [PATCH v2] doc: update matching versions in ice guide

2022-01-27 Thread Guo, Junfeng



> -Original Message-
> From: Zhang, Qi Z 
> Sent: Tuesday, January 25, 2022 09:26
> To: Yang, Qiming 
> Cc: Guo, Junfeng ; dev@dpdk.org;
> david.march...@redhat.com; Zhang, Qi Z ;
> sta...@dpdk.org
> Subject: [PATCH v2] doc: update matching versions in ice guide
> 
> Add recommended matching list for ice PMD in DPDK 21.08 and DPDK
> 21.11.
> 
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Qi Zhang 
> ---
> v2:
> - cc stable as backport is required.
> 
>  doc/guides/nics/ice.rst | 4 
>  1 file changed, 4 insertions(+)
> 
> --
> 2.26.2

Acked-by: Junfeng Guo 

Regards,
Junfeng Guo


Re: [PATCH v3 0/9] vhost: improve logging

2022-01-27 Thread David Marchand
On Thu, Jan 27, 2022 at 6:37 AM Xia, Chenbo  wrote:
> > From: Maxime Coquelin 
> >
> > This series aims at easing Vhost logs analysis, by
> > prepending the Vhost-user socket path to all logs and to
> > remove multi-line comments. Doing so, filtering Vhost-user
> > ports logs is much easier.
> >
> > Changes in v3:
> > ==
> > - Fix various typos reported (Chenbo)
> > - Revert one multi-line comment removal (Chenbo)
> >
> > Changes in v2:
> > ==
> > - Add missing socket paths (David)
> > - avoid identical logs in iotlb code (David)
> > - Use data log type when used in datapath (David)
> >
> > Maxime Coquelin (9):
> >   vhost: improve IOTLB logs
> >   vhost: improve vDPA registration failure log
> >   vhost: improve Vhost layer logs
> >   vhost: improve Vhost-user layer logs
> >   vhost: improve socket layer logs
> >   vhost: improve Virtio-net layer logs
> >   vhost: remove multi-line logs
> >   vhost: differentiate IOTLB logs
> >   vhost: use proper logging type for data path
> >
> >  lib/vhost/iotlb.c  |  30 +-
> >  lib/vhost/iotlb.h  |  10 +-
> >  lib/vhost/socket.c | 148 -
> >  lib/vhost/vdpa.c   |   4 +-
> >  lib/vhost/vhost.c  | 108 ---
> >  lib/vhost/vhost_user.c | 678 -
> >  lib/vhost/vhost_user.h |   4 +-
> >  lib/vhost/virtio_net.c | 165 +-
> >  8 files changed, 548 insertions(+), 599 deletions(-)
> >
> > --
> > 2.34.1
>
> Series applied to next-virtio/main, thanks

For the series:
Reviewed-by: David Marchand 


-- 
David Marchand



RE: [PATCH v2] net/af_xdp: use libxdp if available

2022-01-27 Thread Loftus, Ciara
> 
> On Tue, Jan 25, 2022 at 07:20:43AM +, Ciara Loftus wrote:
> > AF_XDP support is deprecated in libbpf since v0.7.0 [1]. The
> > libxdp library now provides the functionality which once was in
> > libbpf and which the AF_XDP PMD relies on. This commit updates the
> > AF_XDP meson build to use the libxdp library if a version >= v1.2.2 is
> > available. If it is not available, only versions of libbpf prior to v0.7.0
> > are allowed, as they still contain the required AF_XDP functionality.
> >
> > libbpf still remains a dependency even if libxdp is present, as we
> > use libbpf APIs for program loading.
> >
> > The minimum required kernel version for libxdp for use with AF_XDP is v5.3.
> > For the library to be fully-featured, a kernel v5.10 or newer is
> > recommended. The full compatibility information can be found in the
> > libxdp README.
> >
> > v1.2.2 of libxdp includes an important fix required for linking with
> > DPDK which is why this version or greater is required. Meson uses
> > pkg-config to verify the version of libxdp on the system, so it is
> > necessary that the library is discoverable using pkg-config in order for
> > the PMD to use it. To verify this, you can run:
> > pkg-config --modversion libxdp
> >
> > [1] https://github.com/libbpf/libbpf/commit/277846bc6c15
> >
> > Signed-off-by: Ciara Loftus 
> 
> Hi Ciara,
> 
> couple of comments inline below.
> 
> /Bruce
> 
> > ---
> > v2:
> > * Set minimum libxdp version at v1.2.2
> >
> > RFC -> v1:
> > * Set minimum libxdp version at v1.3.0
> > * Don't provide alternative to discovery via pkg-config
> > * Add missing newline to end of file
> > ---
> >  doc/guides/nics/af_xdp.rst |  6 ++--
> >  doc/guides/rel_notes/release_22_03.rst |  4 +++
> >  drivers/net/af_xdp/compat.h|  4 +++
> >  drivers/net/af_xdp/meson.build | 39 +-
> >  drivers/net/af_xdp/rte_eth_af_xdp.c|  1 -
> >  5 files changed, 42 insertions(+), 12 deletions(-)
> >
> > diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> > index c9d0e1ad6c..db02ea1984 100644
> > --- a/doc/guides/nics/af_xdp.rst
> > +++ b/doc/guides/nics/af_xdp.rst
> > @@ -43,9 +43,7 @@ Prerequisites
> >  This is a Linux-specific PMD, thus the following prerequisites apply:
> >
> >  *  A Linux Kernel (version > v4.18) with XDP sockets configuration enabled;
> > -*  libbpf (within kernel version > v5.1-rc4) with latest af_xdp support installed,
> > -   User can install libbpf via `make install_lib` && `make install_headers` in
> > -   /tools/lib/bpf;
> > +*  Both libxdp >=v1.2.2 and libbpf libraries installed, or, libbpf <=v0.6.0
> >  *  A Kernel bound interface to attach to;
> >  *  For need_wakeup feature, it requires kernel version later than v5.3-rc1;
> >  *  For PMD zero copy, it requires kernel version later than v5.4-rc1;
> > @@ -143,4 +141,4 @@ Limitations
> >   NAPI context from a watchdog timer instead of from softirqs. More
> >   information on this feature can be found at [1].
> >
> > -  [1] https://lwn.net/Articles/837010/
> > \ No newline at end of file
> > +  [1] https://lwn.net/Articles/837010/
> > diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
> > index 8a202ec4f4..ad7283df65 100644
> > --- a/doc/guides/rel_notes/release_22_03.rst
> > +++ b/doc/guides/rel_notes/release_22_03.rst
> > @@ -55,6 +55,10 @@ New Features
> >   Also, make sure to start the actual text at the margin.
> >   ===
> >
> > +* **Update AF_XDP PMD**
> > +
> > +  * Added support for libxdp >=v1.2.2.
> > +
> >
> >  Removed Items
> >  -
> > diff --git a/drivers/net/af_xdp/compat.h b/drivers/net/af_xdp/compat.h
> > index 3880dc7dd7..245df1b109 100644
> > --- a/drivers/net/af_xdp/compat.h
> > +++ b/drivers/net/af_xdp/compat.h
> > @@ -2,7 +2,11 @@
> >   * Copyright(c) 2020 Intel Corporation.
> >   */
> >
> > +#ifdef RTE_LIBRTE_AF_XDP_PMD_LIBXDP
> 
> This is a really long macro name. With meson builds we have largely moved
> away from using "RTE_LIBRTE_" as a prefix, and also have dropped "PMD"
> from names too. The global enable macro for AF_XDP driver is now
> "RTE_NET_AF_XDP" so I'd suggest this macro could be shortened to
> "RTE_NET_AF_XDP_LIBXDP".

+1

> 
> > +#include 
> > +#else
> >  #include 
> > +#endif
> >  #include 
> >  #include 
> >
> > diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
> > index 3ed2b29784..981d4c6087 100644
> > --- a/drivers/net/af_xdp/meson.build
> > +++ b/drivers/net/af_xdp/meson.build
> > @@ -9,19 +9,44 @@ endif
> >
> >  sources = files('rte_eth_af_xdp.c')
> >
> > +xdp_dep = dependency('libxdp', version : '>=1.2.2', required: false, method: 'pkg-config')
> >  bpf_dep = dependency('libbpf', required: false, method: 'pkg-config')
> >  if not bpf_dep.found()
> >  bpf_dep = cc.find_library('bpf', required: false)
> >  endif
> >
> > -if bpf_dep.found() and cc.h

[PATCH v2] vhost: add vDPA resource cleanup callback

2022-01-27 Thread Xueming Li
This patch adds vDPA device cleanup callback to release resources on
vhost user connection close.

Signed-off-by: Xueming Li 
---
 lib/vhost/vdpa_driver.h | 3 +++
 lib/vhost/vhost_user.c  | 6 ++
 2 files changed, 9 insertions(+)

diff --git a/lib/vhost/vdpa_driver.h b/lib/vhost/vdpa_driver.h
index fc2d6acedd1..fddbd506523 100644
--- a/lib/vhost/vdpa_driver.h
+++ b/lib/vhost/vdpa_driver.h
@@ -34,6 +34,9 @@ struct rte_vdpa_dev_ops {
/** Driver close the device (Mandatory) */
int (*dev_close)(int vid);
 
+   /** Connection closed, clean up resources */
+   int (*dev_cleanup)(int vid);
+
/** Enable/disable this vring (Mandatory) */
int (*set_vring_state)(int vid, int vring, int state);
 
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 5eb1dd68123..798b0ca4c0d 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -220,6 +220,12 @@ free_mem_region(struct virtio_net *dev)
 void
 vhost_backend_cleanup(struct virtio_net *dev)
 {
+   struct rte_vdpa_device *vdpa_dev;
+
+   vdpa_dev = dev->vdpa_dev;
+   if (vdpa_dev && vdpa_dev->ops->dev_cleanup != NULL)
+   vdpa_dev->ops->dev_cleanup(dev->vid);
+
if (dev->mem) {
free_mem_region(dev);
rte_free(dev->mem);
-- 
2.34.1



Re: [PATCH] vhost: add vDPA resource cleanup callback

2022-01-27 Thread Xueming(Steven) Li
On Wed, 2022-01-26 at 11:03 +0100, Maxime Coquelin wrote:
> Hi Xueming,
> 
> On 11/3/21 14:49, Maxime Coquelin wrote:
> > 
> > 
> > On 11/3/21 14:45, Xueming(Steven) Li wrote:
> > > On Wed, 2021-11-03 at 09:46 +0100, Maxime Coquelin wrote:
> > > > 
> > > > On 11/3/21 09:41, Xia, Chenbo wrote:
> > > > > Hi Xueming,
> > > > > 
> > > > > > -Original Message-
> > > > > > From: Xueming(Steven) Li 
> > > > > > Sent: Thursday, October 21, 2021 8:36 PM
> > > > > > To: maxime.coque...@redhat.com; dev@dpdk.org
> > > > > > Cc: Xia, Chenbo 
> > > > > > Subject: Re: [PATCH] vhost: add vDPA resource cleanup callback
> > > > > > 
> > > > > > On Thu, 2021-10-21 at 14:00 +0200, Maxime Coquelin wrote:
> > > > > > > Hi Xueming,
> > > > > > > 
> > > > > > > On 10/19/21 13:39, Xueming Li wrote:
> > > > > > > > This patch adds vDPA device cleanup callback to release 
> > > > > > > > resources on
> > > > > > > > vhost user connection close.
> > > > > > > > 
> > > > > > > > Signed-off-by: Xueming Li 
> > > > > > > > ---
> > > > > > > >     lib/vhost/rte_vdpa_dev.h | 3 +++
> > > > > > > >     lib/vhost/vhost_user.c   | 6 ++
> > > > > > > >     2 files changed, 9 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/lib/vhost/rte_vdpa_dev.h b/lib/vhost/rte_vdpa_dev.h
> > > > > > > > index b0f494815fa..2711004fe05 100644
> > > > > > > > --- a/lib/vhost/rte_vdpa_dev.h
> > > > > > > > +++ b/lib/vhost/rte_vdpa_dev.h
> > > > > > > > @@ -32,6 +32,9 @@ struct rte_vdpa_dev_ops {
> > > > > > > >     /** Driver close the device (Mandatory) */
> > > > > > > >     int (*dev_close)(int vid);
> > > > > > > > 
> > > > > > > > +    /** Connection closed, clean up resources */
> > > > > > > > +    int (*dev_cleanup)(int vid);
> > > > > > > > +
> > > > > > > >     /** Enable/disable this vring (Mandatory) */
> > > > > > > >     int (*set_vring_state)(int vid, int vring, int state);
> > > > > > > > 
> > > > > > > > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> > > > > > > > index 5a894ca0cc7..032b621c86c 100644
> > > > > > > > --- a/lib/vhost/vhost_user.c
> > > > > > > > +++ b/lib/vhost/vhost_user.c
> > > > > > > > @@ -162,6 +162,12 @@ free_mem_region(struct virtio_net *dev)
> > > > > > > >     void
> > > > > > > >     vhost_backend_cleanup(struct virtio_net *dev)
> > > > > > > >     {
> > > > > > > > +    struct rte_vdpa_device *vdpa_dev;
> > > > > > > > +
> > > > > > > > +    vdpa_dev = dev->vdpa_dev;
> > > > > > > > +    if (vdpa_dev && vdpa_dev->ops->dev_cleanup != NULL)
> > > > > > > > +    vdpa_dev->ops->dev_cleanup(dev->vid);
> > > > > > > > +
> > > > > > > >     if (dev->mem) {
> > > > > > > >     free_mem_region(dev);
> > > > > > > >     rte_free(dev->mem);
> > > > > > > > 
> > > > > > > 
> > > > > > > What will be done there that cannot be done in .dev_close()?
> > > > > > 
> > > > > > .dev_close() mainly handles VM suspend and driver reset. If
> > > > > > everything is released inside dev_close(), suspend and resume
> > > > > > take longer when the number of VQs is huge. Customers want to
> > > > > > upgrade the VM configuration using suspend and resume, and
> > > > > > pausing the customer VM for too long is not acceptable.
> > > > > 
> > > > > By saying 'upgrade VM configuration', do you mean VM memory hotplug? 
> > > > > Or something
> > > > > more?
> > > > > 
> > > > > Is this patch a next-step improvement of this commit?
> > > > > 
> > > > > commit 127f9c6f7b78a47b73b3e1c39e021cc81a30b4c9
> > > > > Author: Matan Azrad 
> > > > > Date:   Mon Jun 29 14:08:19 2020 +
> > > > > 
> > > > >   vhost: handle memory hotplug with vDPA devices
> > > > > 
> > > > >   Some vDPA drivers' basic configurations should be updated when 
> > > > > the
> > > > >   guest memory is hotplugged.
> > > > > 
> > > > >   Close vDPA device before hotplug operation and recreate it 
> > > > > after the
> > > > >   hotplug operation is done.
> > > > > 
> > > > >   Signed-off-by: Matan Azrad 
> > > > >   Reviewed-by: Maxime Coquelin 
> > > > >   Reviewed-by: Chenbo Xia 
> > > > > 
> > > > > > So the idea is to cache and reuse resources between dev_close()
> > > > > > and dev_conf(). Actually, the two functions look more like
> > > > > > dev_stop() and dev_start().
> > > > > > 
> > > > > > dev_cleanup hooks into the vhost backend cleanup, which is called
> > > > > > when the socket is closed for both client and server mode, a safe
> > > > > > point to clean up all cached resources.
> > > > > > 
> > > > > > > Having the mlx5 implementation of this callback alongside this
> > > > > > > patch may help to understand.
> > > > > > 
> > > > > > The mlx5 implementation is still a prototype, pending internal
> > > > > > review. So I just posted the vhost part to get suggestions and
> > > > > > comments. Let me know if the ugly code does help :)
> > > > > 
> > > > > I would prefer to see the mlx implementation with this patch in the 
> > > > > sa

[Bug 928] [dpdk-next-*] drivers/librte_event_cnxk meson build failure with icc 19.1.3.304 on RedHat8.4/64 and UB20.04/64

2022-01-27 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=928

Bug ID: 928
   Summary: [dpdk-next-*] drivers/librte_event_cnxk  meson build
failure with icc 19.1.3.304  on RedHat8.4/64 and
UB20.04/64
   Product: DPDK
   Version: unspecified
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: core
  Assignee: dev@dpdk.org
  Reporter: daxuex@intel.com
  Target Milestone: ---

[DPDK version]

dpdk-next-net/dpdk-next-net-mrvl/dpdk-next-intel/dpdk-next-net-brcm/dpdk-next-net-mlx/dpdk-next-virtio:
 
dpdk-next-net's bad commit:
commit 48cfbe601149989fe5a7c032d2d63329f26aecb9
Author: Pavan Nikhilesh 
Date:   Sat Jan 22 21:18:17 2022 +0530

net/cnxk: add cn10k template Rx functions to build

Add cn10k segregated Rx and event dequeue functions to build,
add macros to make future modifications simpler.

Signed-off-by: Pavan Nikhilesh 
Acked-by: Jerin Jacob 


[OS version]:
RedHat Enterprise Linux 8.4(Ootpa)/kernel 4.18.0-305
ICC19.1.3.304

UB20.04/5.8.0-48-generic
ICC19.1.3.304

[Test Setup]:
source /opt/intel/compilers_and_libraries_2020.4.304/linux/bin/iccvars.sh
intel64
CC=icc meson --werror -Denable_kmods=True -Dlibdir=lib -Dexamples=all
--default-library=static x86_64-native-linuxapp-icc
ninja -C x86_64-native-linuxapp-icc


[RedHat8.4 and ubuntu20.04 log as below]
[196/3458] Compiling C object lib/librte_acl.a.p/acl_acl_run_avx512.c.o
In file included from ../lib/acl/acl_run_avx512.c(110):
../lib/acl/acl_run_avx512x8.h(162): warning #300: const variable "zero"
requires an initializer
static const uint32_t zero;
  ^

In file included from ../lib/acl/acl_run_avx512.c(137):
../lib/acl/acl_run_avx512x16.h(198): warning #300: const variable "zero"
requires an initializer
static const uint32_t zero;
  ^

[531/3458] Compiling C object lib/librte_graph.a.p/graph_graph_stats.c.o
../lib/graph/graph_stats.c(39): warning #2405: array of elements containing a
flexible array member is nonstandard
struct cluster_node clusters[];
^

[2897/3458] Compiling C object
drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_base_dlb2_resource.c.o
../drivers/event/dlb2/pf/base/dlb2_resource.c(392): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(1110): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(1154): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(1301): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(1348): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(1406): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(1428): warning #592: variable
"iter1" is used before its value is set
RTE_SET_USED(iter1);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(1429): warning #592: variable
"iter2" is used before its value is set
RTE_SET_USED(iter2);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(1462): warning #592: variable
"iteration" is used before its value is set
RTE_SET_USED(iteration);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(1653): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(1676): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(1952): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(2188): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(2394): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(2413): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(2443): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../drivers/event/dlb2/pf/base/dlb2_resource.c(2468): warning #592: variable
"iter" is used before its value is set
RTE_SET_USED(iter);
^

../dr

RE: [PATCH v1] raw/ifpga: fix ifpga devices cleanup function

2022-01-27 Thread Huang, Wei
Hi,

> -Original Message-
> From: Yigit, Ferruh 
> Sent: Wednesday, January 26, 2022 21:25
> To: Huang, Wei ; dev@dpdk.org; Xu, Rosen
> ; Zhang, Qi Z ; Nipun Gupta
> ; Hemant Agrawal 
> Cc: sta...@dpdk.org; Zhang, Tianfei 
> Subject: Re: [PATCH v1] raw/ifpga: fix ifpga devices cleanup function
> 
> On 1/26/2022 3:29 AM, Wei Huang wrote:
> > Use rte_dev_remove() to replace rte_rawdev_pmd_release() in
> > ifpga_rawdev_cleanup(), resources occupied by ifpga raw devices such
> > as threads can be released correctly.
> >
> 
> As far as I understand you are fixing an issue that not all resources are
> released, is this correct?
> What are these not released resources?
> 
> And 'rte_rawdev_pmd_release()' rawdev API seems intended to do the
> cleanup, is it expected that some resources are not freed after this call, or
> should we fix that API?
> If the device remove API needs to be used, what is the point of
> 'rte_rawdev_pmd_release()' API?
> 
> cc'ed rawdev maintainers for comment.

Yes, this patch is to release all the resources of ifpga_rawdev after testpmd
exits; the resources that are not released are the interrupt and the thread.

rte_rawdev_pmd_release as implemented in ifpga_rawdev only releases memory
allocated by the ifpga driver; that is the expected behavior.

I think calling rte_dev_remove is a simple and safe way to release the
resources completely.

> 
> > Fixes: f724a802 ("raw/ifpga: add miscellaneous APIs")
> >
> > Signed-off-by: Wei Huang 
> > ---
> >   drivers/raw/ifpga/ifpga_rawdev.c | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/raw/ifpga/ifpga_rawdev.c b/drivers/raw/ifpga/ifpga_rawdev.c
> > index fdf3c23..88c38aa 100644
> > --- a/drivers/raw/ifpga/ifpga_rawdev.c
> > +++ b/drivers/raw/ifpga/ifpga_rawdev.c
> > @@ -1787,12 +1787,14 @@ int ifpga_rawdev_partial_reconfigure(struct
> rte_rawdev *dev, int port,
> >   void ifpga_rawdev_cleanup(void)
> >   {
> > struct ifpga_rawdev *dev;
> > +   struct rte_rawdev *rdev;
> > unsigned int i;
> >
> > for (i = 0; i < IFPGA_RAWDEV_NUM; i++) {
> > dev = &ifpga_rawdevices[i];
> > if (dev->rawdev) {
> > -   rte_rawdev_pmd_release(dev->rawdev);
> > +   rdev = dev->rawdev;
> > +   rte_dev_remove(rdev->device);
> > dev->rawdev = NULL;
> > }
> > }



[DPDK 1/3] net/ice: display/reset VF stats on DCF representor

2022-01-27 Thread Ke Zhang
This feature requires an updated ice kernel driver (newer than v1.8.0_3).

Signed-off-by: Ke Zhang 
---
 drivers/net/ice/ice_dcf_vf_representor.c | 128 +++
 1 file changed, 128 insertions(+)

diff --git a/drivers/net/ice/ice_dcf_vf_representor.c b/drivers/net/ice/ice_dcf_vf_representor.c
index b9fcfc80ad..26d29b5bee 100644
--- a/drivers/net/ice/ice_dcf_vf_representor.c
+++ b/drivers/net/ice/ice_dcf_vf_representor.c
@@ -10,6 +10,9 @@
 #include "ice_dcf_ethdev.h"
 #include "ice_rxtx.h"
 
+#define ICE_DCF_REPR_32_BIT_WIDTH (CHAR_BIT * 4)
+#define ICE_DCF_REPR_48_BIT_WIDTH (CHAR_BIT * 6)
#define ICE_DCF_REPR_48_BIT_MASK  RTE_LEN2MASK(ICE_DCF_REPR_48_BIT_WIDTH, uint64_t)
 static uint16_t
 ice_dcf_vf_repr_rx_burst(__rte_unused void *rxq,
 __rte_unused struct rte_mbuf **rx_pkts,
@@ -387,6 +390,129 @@ ice_dcf_vf_repr_vlan_tpid_set(struct rte_eth_dev *dev,
return 0;
 }
 
+static int
+ice_dcf_repr_query_stats(struct ice_dcf_hw *hw,
+uint16_t vf_id, struct virtchnl_eth_stats *pstats)
+{
+   struct virtchnl_queue_select q_stats;
+   struct dcf_virtchnl_cmd args;
+   int err;
+
+   memset(&q_stats, 0, sizeof(q_stats));
+   q_stats.vsi_id = hw->vf_vsi_map[vf_id] & ~VIRTCHNL_DCF_VF_VSI_VALID;
+
+   args.v_op = VIRTCHNL_OP_GET_STATS;
+   args.req_msg = (uint8_t *)&q_stats;
+   args.req_msglen = sizeof(q_stats);
+   args.rsp_msglen = sizeof(struct virtchnl_eth_stats);
+   args.rsp_msgbuf = (uint8_t *)pstats;
+   args.rsp_buflen = sizeof(struct virtchnl_eth_stats);
+
+   err = ice_dcf_execute_virtchnl_cmd(hw, &args);
+   if (err) {
+   PMD_DRV_LOG(ERR, "fail to execute command OP_GET_STATS");
+   return err;
+   }
+
+   return 0;
+}
+
+static int
+ice_dcf_vf_repr_stats_reset(struct rte_eth_dev *dev)
+{
+   struct ice_dcf_vf_repr *repr = dev->data->dev_private;
+   struct ice_dcf_hw *hw = ice_dcf_vf_repr_hw(repr);
+   struct virtchnl_eth_stats pstats;
+   int ret;
+
+   if (hw->resetting)
+   return 0;
+
+   /* read stat values to clear hardware registers */
+   ret = ice_dcf_repr_query_stats(hw, repr->vf_id, &pstats);
+   if (ret != 0)
+   return ret;
+
+   /* set stats offset base on current values */
+   hw->eth_stats_offset = pstats;
+
+   return 0;
+}
+
+static void
+ice_dcf_stat_update_48(uint64_t *offset, uint64_t *stat)
+{
+   if (*stat >= *offset)
+   *stat = *stat - *offset;
+   else
+   *stat = (uint64_t)((*stat +
+   ((uint64_t)1 << ICE_DCF_REPR_48_BIT_WIDTH)) - *offset);
+
+   *stat &= ICE_DCF_REPR_48_BIT_MASK;
+}
+
+static void
+ice_dcf_stat_update_32(uint64_t *offset, uint64_t *stat)
+{
+   if (*stat >= *offset)
+   *stat = (uint64_t)(*stat - *offset);
+   else
+   *stat = (uint64_t)((*stat +
+   ((uint64_t)1 << ICE_DCF_REPR_32_BIT_WIDTH)) - *offset);
+}
+
+static void
+ice_dcf_update_stats(struct ice_dcf_hw *hw, struct virtchnl_eth_stats *nes)
+{
+   struct virtchnl_eth_stats *oes = &hw->eth_stats_offset;
+
+   ice_dcf_stat_update_48(&oes->rx_bytes, &nes->rx_bytes);
+   ice_dcf_stat_update_48(&oes->rx_unicast, &nes->rx_unicast);
+   ice_dcf_stat_update_48(&oes->rx_multicast, &nes->rx_multicast);
+   ice_dcf_stat_update_48(&oes->rx_broadcast, &nes->rx_broadcast);
+   ice_dcf_stat_update_32(&oes->rx_discards, &nes->rx_discards);
+   ice_dcf_stat_update_48(&oes->tx_bytes, &nes->tx_bytes);
+   ice_dcf_stat_update_48(&oes->tx_unicast, &nes->tx_unicast);
+   ice_dcf_stat_update_48(&oes->tx_multicast, &nes->tx_multicast);
+   ice_dcf_stat_update_48(&oes->tx_broadcast, &nes->tx_broadcast);
+   ice_dcf_stat_update_32(&oes->tx_errors, &nes->tx_errors);
+   ice_dcf_stat_update_32(&oes->tx_discards, &nes->tx_discards);
+}
+
+static int
+ice_dcf_vf_repr_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+   struct ice_dcf_vf_repr *repr = dev->data->dev_private;
+   struct ice_dcf_hw *hw = ice_dcf_vf_repr_hw(repr);
+   struct virtchnl_eth_stats pstats;
+   int ret;
+
+   if (hw->resetting) {
+   PMD_DRV_LOG(ERR,
+   "The DCF has been reset by PF, please reinit first");
+   return -EIO;
+   }
+
+   ret = ice_dcf_repr_query_stats(hw, repr->vf_id, &pstats);
+   if (ret == 0) {
+   uint8_t crc_stats_len = (dev->data->dev_conf.rxmode.offloads &
+RTE_ETH_RX_OFFLOAD_KEEP_CRC) ? 0 :
+RTE_ETHER_CRC_LEN;
+   ice_dcf_update_stats(hw, &pstats);
+   stats->ipackets = pstats.rx_unicast + pstats.rx_multicast +
+   pstats.rx_broadcast - pstats.rx_discards;
+   stats->opackets = pstats.tx_broadcast + pstats.tx_multica

[DPDK 2/3] net/ice: configure the VLAN filter for VFs on DCF representor

2022-01-27 Thread Ke Zhang
This feature requires an updated ice kernel driver (newer than v1.8.0_3).

Signed-off-by: Ke Zhang 
---
 drivers/net/ice/ice_dcf_vf_representor.c | 51 
 1 file changed, 51 insertions(+)

diff --git a/drivers/net/ice/ice_dcf_vf_representor.c b/drivers/net/ice/ice_dcf_vf_representor.c
index 26d29b5bee..bb353fb45f 100644
--- a/drivers/net/ice/ice_dcf_vf_representor.c
+++ b/drivers/net/ice/ice_dcf_vf_representor.c
@@ -513,6 +513,56 @@ ice_dcf_vf_repr_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
}
return ret;
 }
+static int
+ice_dcf_add_del_vlan_v2(struct rte_eth_dev *dev, uint16_t vlanid, bool add)
+{
+   struct ice_dcf_vf_repr *repr = dev->data->dev_private;
+   struct ice_dcf_hw *hw = ice_dcf_vf_repr_hw(repr);
+   struct virtchnl_vlan_filter_list_v2 vlan_filter;
+   struct dcf_virtchnl_cmd args;
+   struct virtchnl_vlan *vlan_setting;
+   int err;
+
+   vlan_setting = &vlan_filter.filters[0].outer;
+   memset(&vlan_filter, 0, sizeof(vlan_filter));
+   vlan_filter.vport_id = hw->vf_vsi_map[repr->vf_id] & ~VIRTCHNL_DCF_VF_VSI_VALID;
+   vlan_filter.num_elements = 1;
+   vlan_setting->tpid = RTE_ETHER_TYPE_VLAN;
+   vlan_setting->tci = vlanid;
+
+   memset(&args, 0, sizeof(args));
+   args.v_op = add ? VIRTCHNL_OP_ADD_VLAN_V2 : VIRTCHNL_OP_DEL_VLAN_V2;
+   args.req_msg = (uint8_t *)&vlan_filter;
+   args.req_msglen = sizeof(vlan_filter);
+
+   err = ice_dcf_execute_virtchnl_cmd(hw, &args);
+   if (err) {
+   PMD_DRV_LOG(ERR, "Fail to execute command %s",
+   add ? "OP_ADD_VLAN_V2" : "OP_DEL_VLAN_V2");
+   return err;
+   }
+   return 0;
+}
+
+static int
ice_dcf_vf_repr_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+   struct ice_dcf_vf_repr *repr = dev->data->dev_private;
+   int err;
+
+   if (!ice_dcf_vlan_offload_ena(repr)) {
+   PMD_DRV_LOG(ERR, "It is not VLAN_V2");
+   return -ENOTSUP;
+   }
+
+   err = ice_dcf_add_del_vlan_v2(dev, vlan_id, on);
+   if (err) {
+   PMD_DRV_LOG(ERR, "Failed to set vlan filter, err:%d", err);
+   return -ENOTSUP;
+   }
+   return 0;
+}
+
 static const struct eth_dev_ops ice_dcf_vf_repr_dev_ops = {
.dev_configure= ice_dcf_vf_repr_dev_configure,
.dev_start= ice_dcf_vf_repr_dev_start,
@@ -531,6 +581,7 @@ static const struct eth_dev_ops ice_dcf_vf_repr_dev_ops = {
.vlan_tpid_set= ice_dcf_vf_repr_vlan_tpid_set,
.stats_reset  = ice_dcf_vf_repr_stats_reset,
.stats_get= ice_dcf_vf_repr_stats_get,
+   .vlan_filter_set  = ice_dcf_vf_repr_vlan_filter_set,
 };
 
 int
-- 
2.25.1



[DPDK 3/3] net/ice: Add / Remove VF mac address on DCF representor

2022-01-27 Thread Ke Zhang
This feature requires an updated ice kernel driver (newer than v1.8.0_3).

Signed-off-by: Ke Zhang 
---
 drivers/net/ice/ice_dcf_ethdev.h |  1 +
 drivers/net/ice/ice_dcf_vf_representor.c | 81 +++-
 2 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ice/ice_dcf_ethdev.h b/drivers/net/ice/ice_dcf_ethdev.h
index 8510e37119..b1bdf39a74 100644
--- a/drivers/net/ice/ice_dcf_ethdev.h
+++ b/drivers/net/ice/ice_dcf_ethdev.h
@@ -50,6 +50,7 @@ struct ice_dcf_vf_repr {
struct rte_ether_addr mac_addr;
uint16_t switch_domain_id;
uint16_t vf_id;
+   uint16_t mac_num; /* Number of MAC addresses */
 
struct ice_dcf_vlan outer_vlan_info; /* DCF always handle outer VLAN */
 };
diff --git a/drivers/net/ice/ice_dcf_vf_representor.c b/drivers/net/ice/ice_dcf_vf_representor.c
index bb353fb45f..9df3553508 100644
--- a/drivers/net/ice/ice_dcf_vf_representor.c
+++ b/drivers/net/ice/ice_dcf_vf_representor.c
@@ -136,7 +136,7 @@ ice_dcf_vf_repr_dev_info_get(struct rte_eth_dev *dev,
return -EIO;
 
dev_info->device = dev->device;
-   dev_info->max_mac_addrs = 1;
+   dev_info->max_mac_addrs = ICE_NUM_MACADDR_MAX;
dev_info->max_rx_queues = dcf_hw->vsi_res->num_queue_pairs;
dev_info->max_tx_queues = dcf_hw->vsi_res->num_queue_pairs;
dev_info->min_rx_bufsize = ICE_BUF_SIZE_MIN;
@@ -513,6 +513,82 @@ ice_dcf_vf_repr_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
}
return ret;
 }
+static int
+ice_dcf_repr_add_del_eth_addr(struct ice_dcf_hw *hw,
+   uint16_t vf_id,
+   struct rte_ether_addr *addr,
+   bool add, uint8_t type)
+{
+   struct virtchnl_ether_addr_list *list;
+   struct dcf_virtchnl_cmd args;
+   uint8_t cmd_buffer[sizeof(struct virtchnl_ether_addr_list) +
+  sizeof(struct virtchnl_ether_addr)];
+   int err;
+
+   list = (struct virtchnl_ether_addr_list *)cmd_buffer;
+   list->vsi_id = hw->vf_vsi_map[vf_id] & ~VIRTCHNL_DCF_VF_VSI_VALID;
+   list->num_elements = 1;
+   list->list[0].type = type;
+   rte_memcpy(list->list[0].addr, addr->addr_bytes,
+   sizeof(addr->addr_bytes));
+
+   args.v_op = add ? VIRTCHNL_OP_ADD_ETH_ADDR : VIRTCHNL_OP_DEL_ETH_ADDR;
+   args.req_msg = cmd_buffer;
+   args.req_msglen = sizeof(cmd_buffer);
+
+   err = ice_dcf_execute_virtchnl_cmd(hw, &args);
+   if (err) {
+   PMD_DRV_LOG(ERR, "Fail to execute command %s",
+   add ? "OP_ADD_ETH_ADDR" :  "OP_DEL_ETH_ADDR");
+   return err;
+   }
+
+   return 0;
+}
+
+static int
+ice_dcf_vf_repr_add_mac_addr(struct rte_eth_dev *dev, struct rte_ether_addr *addr,
+__rte_unused uint32_t index,
+__rte_unused uint32_t pool)
+{
+   struct ice_dcf_vf_repr *repr = dev->data->dev_private;
+   struct ice_dcf_hw *hw = ice_dcf_vf_repr_hw(repr);
+   int err;
+
+   if (rte_is_zero_ether_addr(addr)) {
+   PMD_DRV_LOG(ERR, "Invalid Ethernet Address");
+   return -EINVAL;
+   }
+
+   err = ice_dcf_repr_add_del_eth_addr(hw, repr->vf_id, addr, true, VIRTCHNL_ETHER_ADDR_EXTRA);
+   if (err) {
+   PMD_DRV_LOG(ERR, "fail to add MAC address");
+   return -EIO;
+   }
+
+   repr->mac_num++;
+
+   return 0;
+}
+
+static void
+ice_dcf_vf_repr_del_mac_addr(struct rte_eth_dev *dev, uint32_t index)
+{
+   struct ice_dcf_vf_repr *repr = dev->data->dev_private;
+   struct ice_dcf_hw *hw = ice_dcf_vf_repr_hw(repr);
+   struct rte_ether_addr *addr;
+   int err;
+
+   addr = &dev->data->mac_addrs[index];
+
+   err = ice_dcf_repr_add_del_eth_addr(hw, repr->vf_id, addr,
+   false, VIRTCHNL_ETHER_ADDR_EXTRA);
+   if (err)
+   PMD_DRV_LOG(ERR, "fail to del MAC address");
+
+   repr->mac_num--;
+}
+
 static int
 ice_dcf_add_del_vlan_v2(struct rte_eth_dev *dev, uint16_t vlanid, bool add)
 {
@@ -581,6 +657,8 @@ static const struct eth_dev_ops ice_dcf_vf_repr_dev_ops = {
.vlan_tpid_set= ice_dcf_vf_repr_vlan_tpid_set,
.stats_reset  = ice_dcf_vf_repr_stats_reset,
.stats_get= ice_dcf_vf_repr_stats_get,
+   .mac_addr_add = ice_dcf_vf_repr_add_mac_addr,
+   .mac_addr_remove  = ice_dcf_vf_repr_del_mac_addr,
.vlan_filter_set  = ice_dcf_vf_repr_vlan_filter_set,
 };
 
@@ -596,6 +674,7 @@ ice_dcf_vf_repr_init(struct rte_eth_dev *vf_rep_eth_dev, 
void *init_param)
repr->outer_vlan_info.port_vlan_ena = false;
repr->outer_vlan_info.stripping_ena = false;
repr->outer_vlan_info.tpid = RTE_ETHER_TYPE_VLAN;
+   repr->mac_num = 1;
 
vf_rep_eth_dev->dev_ops = &ice_dcf_vf_repr_dev_ops;
 
-- 
2.25.1



[PATCH] net/ice: Add / Remove VF mac address on DCF representor

2022-01-27 Thread Ke Zhang
This feature needs the ice kernel driver to be updated (newer than v1.8.0_3).

Signed-off-by: Ke Zhang 
---
 drivers/net/ice/ice_dcf_ethdev.h |  1 +
 drivers/net/ice/ice_dcf_vf_representor.c | 81 +++-
 2 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ice/ice_dcf_ethdev.h b/drivers/net/ice/ice_dcf_ethdev.h
index 8510e37119..b1bdf39a74 100644
--- a/drivers/net/ice/ice_dcf_ethdev.h
+++ b/drivers/net/ice/ice_dcf_ethdev.h
@@ -50,6 +50,7 @@ struct ice_dcf_vf_repr {
struct rte_ether_addr mac_addr;
uint16_t switch_domain_id;
uint16_t vf_id;
+   uint16_t mac_num; /* Number of MAC addresses */
 
struct ice_dcf_vlan outer_vlan_info; /* DCF always handle outer VLAN */
 };
diff --git a/drivers/net/ice/ice_dcf_vf_representor.c 
b/drivers/net/ice/ice_dcf_vf_representor.c
index bb353fb45f..9df3553508 100644
--- a/drivers/net/ice/ice_dcf_vf_representor.c
+++ b/drivers/net/ice/ice_dcf_vf_representor.c
@@ -136,7 +136,7 @@ ice_dcf_vf_repr_dev_info_get(struct rte_eth_dev *dev,
return -EIO;
 
dev_info->device = dev->device;
-   dev_info->max_mac_addrs = 1;
+   dev_info->max_mac_addrs = ICE_NUM_MACADDR_MAX;
dev_info->max_rx_queues = dcf_hw->vsi_res->num_queue_pairs;
dev_info->max_tx_queues = dcf_hw->vsi_res->num_queue_pairs;
dev_info->min_rx_bufsize = ICE_BUF_SIZE_MIN;
@@ -513,6 +513,82 @@ ice_dcf_vf_repr_stats_get(struct rte_eth_dev *dev, struct 
rte_eth_stats *stats)
}
return ret;
 }
+static int
+ice_dcf_repr_add_del_eth_addr(struct ice_dcf_hw *hw,
+   uint16_t vf_id,
+   struct rte_ether_addr *addr,
+   bool add, uint8_t type)
+{
+   struct virtchnl_ether_addr_list *list;
+   struct dcf_virtchnl_cmd args;
+   uint8_t cmd_buffer[sizeof(struct virtchnl_ether_addr_list) +
+  sizeof(struct virtchnl_ether_addr)];
+   int err;
+
+   list = (struct virtchnl_ether_addr_list *)cmd_buffer;
+   list->vsi_id = hw->vf_vsi_map[vf_id] & ~VIRTCHNL_DCF_VF_VSI_VALID;
+   list->num_elements = 1;
+   list->list[0].type = type;
+   rte_memcpy(list->list[0].addr, addr->addr_bytes,
+   sizeof(addr->addr_bytes));
+
+   args.v_op = add ? VIRTCHNL_OP_ADD_ETH_ADDR : VIRTCHNL_OP_DEL_ETH_ADDR;
+   args.req_msg = cmd_buffer;
+   args.req_msglen = sizeof(cmd_buffer);
+
+   err = ice_dcf_execute_virtchnl_cmd(hw, &args);
+   if (err) {
+   PMD_DRV_LOG(ERR, "Fail to execute command %s",
+   add ? "OP_ADD_ETH_ADDR" :  "OP_DEL_ETH_ADDR");
+   return err;
+   }
+
+   return 0;
+}
+
+static int
+ice_dcf_vf_repr_add_mac_addr(struct rte_eth_dev *dev, struct rte_ether_addr 
*addr,
+__rte_unused uint32_t index,
+__rte_unused uint32_t pool)
+{
+   struct ice_dcf_vf_repr *repr = dev->data->dev_private;
+   struct ice_dcf_hw *hw = ice_dcf_vf_repr_hw(repr);
+   int err;
+
+   if (rte_is_zero_ether_addr(addr)) {
+   PMD_DRV_LOG(ERR, "Invalid Ethernet Address");
+   return -EINVAL;
+   }
+
+   err = ice_dcf_repr_add_del_eth_addr(hw, repr->vf_id, addr, true, 
VIRTCHNL_ETHER_ADDR_EXTRA);
+   if (err) {
+   PMD_DRV_LOG(ERR, "fail to add MAC address");
+   return -EIO;
+   }
+
+   repr->mac_num++;
+
+   return 0;
+}
+
+static void
+ice_dcf_vf_repr_del_mac_addr(struct rte_eth_dev *dev, uint32_t index)
+{
+   struct ice_dcf_vf_repr *repr = dev->data->dev_private;
+   struct ice_dcf_hw *hw = ice_dcf_vf_repr_hw(repr);
+   struct rte_ether_addr *addr;
+   int err;
+
+   addr = &dev->data->mac_addrs[index];
+
+   err = ice_dcf_repr_add_del_eth_addr(hw, repr->vf_id, addr,
+   false, VIRTCHNL_ETHER_ADDR_EXTRA);
+   if (err)
+   PMD_DRV_LOG(ERR, "fail to del MAC address");
+
+   repr->mac_num--;
+}
+
 static int
 ice_dcf_add_del_vlan_v2(struct rte_eth_dev *dev, uint16_t vlanid, bool add)
 {
@@ -581,6 +657,8 @@ static const struct eth_dev_ops ice_dcf_vf_repr_dev_ops = {
.vlan_tpid_set= ice_dcf_vf_repr_vlan_tpid_set,
.stats_reset  = ice_dcf_vf_repr_stats_reset,
.stats_get= ice_dcf_vf_repr_stats_get,
+   .mac_addr_add = ice_dcf_vf_repr_add_mac_addr,
+   .mac_addr_remove  = ice_dcf_vf_repr_del_mac_addr,
.vlan_filter_set  = ice_dcf_vf_repr_vlan_filter_set,
 };
 
@@ -596,6 +674,7 @@ ice_dcf_vf_repr_init(struct rte_eth_dev *vf_rep_eth_dev, 
void *init_param)
repr->outer_vlan_info.port_vlan_ena = false;
repr->outer_vlan_info.stripping_ena = false;
repr->outer_vlan_info.tpid = RTE_ETHER_TYPE_VLAN;
+   repr->mac_num = 1;
 
vf_rep_eth_dev->dev_ops = &ice_dcf_vf_repr_dev_ops;
 
-- 
2.25.1
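The command buffer in ice_dcf_repr_add_del_eth_addr() above is sized as the list header plus one address element, the usual flexible-array-member pattern. A minimal standalone sketch of that sizing follows; the struct layouts here are illustrative stand-ins, not the exact virtchnl definitions:

```c
/* Illustrative model of sizing a variable-length message as
 * "header + N elements" via a flexible array member. The field
 * layouts are examples only, not the virtchnl structs. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ether_addr_elem {
	uint8_t addr[6]; /* MAC address bytes */
	uint8_t type;    /* e.g. primary vs extra address */
	uint8_t pad;
};

struct ether_addr_list {
	uint16_t vsi_id;
	uint16_t num_elements;
	struct ether_addr_elem list[]; /* flexible array member */
};

/* Total message length for a list carrying n address elements. */
static size_t list_msg_len(uint16_t n)
{
	return sizeof(struct ether_addr_list) +
	       (size_t)n * sizeof(struct ether_addr_elem);
}
```

With n fixed at 1, this reduces to the `sizeof(list) + sizeof(elem)` expression used for `cmd_buffer` in the patch.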



[PATCH] net/ice: configure the VLAN filter for VFs on DCF representor

2022-01-27 Thread Ke Zhang
This feature needs the ice kernel driver to be updated (newer than v1.8.0_3).

Signed-off-by: Ke Zhang 
---
 drivers/net/ice/ice_dcf_vf_representor.c | 51 
 1 file changed, 51 insertions(+)

diff --git a/drivers/net/ice/ice_dcf_vf_representor.c 
b/drivers/net/ice/ice_dcf_vf_representor.c
index 26d29b5bee..bb353fb45f 100644
--- a/drivers/net/ice/ice_dcf_vf_representor.c
+++ b/drivers/net/ice/ice_dcf_vf_representor.c
@@ -513,6 +513,56 @@ ice_dcf_vf_repr_stats_get(struct rte_eth_dev *dev, struct 
rte_eth_stats *stats)
}
return ret;
 }
+static int
+ice_dcf_add_del_vlan_v2(struct rte_eth_dev *dev, uint16_t vlanid, bool add)
+{
+   struct ice_dcf_vf_repr *repr = dev->data->dev_private;
+   struct ice_dcf_hw *hw = ice_dcf_vf_repr_hw(repr);
+   struct virtchnl_vlan_filter_list_v2 vlan_filter;
+   struct dcf_virtchnl_cmd args;
+   struct virtchnl_vlan *vlan_setting;
+   int err;
+
+   vlan_setting = &vlan_filter.filters[0].outer;
+   memset(&vlan_filter, 0, sizeof(vlan_filter));
+   vlan_filter.vport_id = hw->vf_vsi_map[repr->vf_id] & 
~VIRTCHNL_DCF_VF_VSI_VALID;
+   vlan_filter.num_elements = 1;
+   vlan_setting->tpid = RTE_ETHER_TYPE_VLAN;
+   vlan_setting->tci = vlanid;
+
+   memset(&args, 0, sizeof(args));
+   args.v_op = add ? VIRTCHNL_OP_ADD_VLAN_V2 : VIRTCHNL_OP_DEL_VLAN_V2;
+   args.req_msg = (uint8_t *)&vlan_filter;
+   args.req_msglen = sizeof(vlan_filter);
+
+   err = ice_dcf_execute_virtchnl_cmd(hw, &args);
+   if (err) {
+   PMD_DRV_LOG(ERR, "Fail to execute command %s",
+   add ? "OP_ADD_VLAN_V2" : "OP_DEL_VLAN_V2");
+   return err;
+   }
+   return 0;
+}
+
+static int
+ice_dcf_vf_repr_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int 
on)
+{
+   struct ice_dcf_vf_repr *repr = dev->data->dev_private;
+   int err;
+
+   if (!ice_dcf_vlan_offload_ena(repr)) {
+   PMD_DRV_LOG(ERR, "It is not VLAN_V2");
+   return -ENOTSUP;
+   }
+
+   err = ice_dcf_add_del_vlan_v2(dev, vlan_id, on);
+   if (err) {
+   PMD_DRV_LOG(ERR, "Failed to set vlan filter, err:%d", err);
+   return -ENOTSUP;
+   }
+   return 0;
+}
+
 static const struct eth_dev_ops ice_dcf_vf_repr_dev_ops = {
.dev_configure= ice_dcf_vf_repr_dev_configure,
.dev_start= ice_dcf_vf_repr_dev_start,
@@ -531,6 +581,7 @@ static const struct eth_dev_ops ice_dcf_vf_repr_dev_ops = {
.vlan_tpid_set= ice_dcf_vf_repr_vlan_tpid_set,
.stats_reset  = ice_dcf_vf_repr_stats_reset,
.stats_get= ice_dcf_vf_repr_stats_get,
+   .vlan_filter_set  = ice_dcf_vf_repr_vlan_filter_set,
 };
 
 int
-- 
2.25.1
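The patch above stores the VLAN ID directly in `vlan_setting->tci`, i.e. with PCP and DEI left at zero. For reference, the 16-bit VLAN TCI packs three fields; a small illustrative helper (not driver code):

```c
/* VLAN TCI layout: PCP(3 bits) | DEI(1 bit) | VID(12 bits).
 * Writing only the VLAN ID, as the patch does, leaves PCP/DEI zero.
 * Illustrative helpers, not part of the driver. */
#include <assert.h>
#include <stdint.h>

static uint16_t vlan_tci(uint8_t pcp, uint8_t dei, uint16_t vid)
{
	return (uint16_t)(((pcp & 0x7) << 13) |
			  ((dei & 0x1) << 12) |
			  (vid & 0xFFF));
}

/* Extract the 12-bit VLAN ID back out of a TCI value. */
static uint16_t vlan_tci_vid(uint16_t tci)
{
	return tci & 0xFFF;
}
```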



[PATCH] net/ice: display/reset VF stats on DCF representor

2022-01-27 Thread Ke Zhang
This feature needs the ice kernel driver to be updated (newer than v1.8.0_3).

Signed-off-by: Ke Zhang 
---
 drivers/net/ice/ice_dcf_vf_representor.c | 128 +++
 1 file changed, 128 insertions(+)

diff --git a/drivers/net/ice/ice_dcf_vf_representor.c 
b/drivers/net/ice/ice_dcf_vf_representor.c
index b9fcfc80ad..26d29b5bee 100644
--- a/drivers/net/ice/ice_dcf_vf_representor.c
+++ b/drivers/net/ice/ice_dcf_vf_representor.c
@@ -10,6 +10,9 @@
 #include "ice_dcf_ethdev.h"
 #include "ice_rxtx.h"
 
+#define ICE_DCF_REPR_32_BIT_WIDTH (CHAR_BIT * 4)
+#define ICE_DCF_REPR_48_BIT_WIDTH (CHAR_BIT * 6)
+#define ICE_DCF_REPR_48_BIT_MASK  RTE_LEN2MASK(ICE_DCF_REPR_48_BIT_WIDTH, 
uint64_t)
 static uint16_t
 ice_dcf_vf_repr_rx_burst(__rte_unused void *rxq,
 __rte_unused struct rte_mbuf **rx_pkts,
@@ -387,6 +390,129 @@ ice_dcf_vf_repr_vlan_tpid_set(struct rte_eth_dev *dev,
return 0;
 }
 
+static int
+ice_dcf_repr_query_stats(struct ice_dcf_hw *hw,
+uint16_t vf_id, struct virtchnl_eth_stats *pstats)
+{
+   struct virtchnl_queue_select q_stats;
+   struct dcf_virtchnl_cmd args;
+   int err;
+
+   memset(&q_stats, 0, sizeof(q_stats));
+   q_stats.vsi_id = hw->vf_vsi_map[vf_id] & ~VIRTCHNL_DCF_VF_VSI_VALID;
+
+   args.v_op = VIRTCHNL_OP_GET_STATS;
+   args.req_msg = (uint8_t *)&q_stats;
+   args.req_msglen = sizeof(q_stats);
+   args.rsp_msglen = sizeof(struct virtchnl_eth_stats);
+   args.rsp_msgbuf = (uint8_t *)pstats;
+   args.rsp_buflen = sizeof(struct virtchnl_eth_stats);
+
+   err = ice_dcf_execute_virtchnl_cmd(hw, &args);
+   if (err) {
+   PMD_DRV_LOG(ERR, "fail to execute command OP_GET_STATS");
+   return err;
+   }
+
+   return 0;
+}
+
+static int
+ice_dcf_vf_repr_stats_reset(struct rte_eth_dev *dev)
+{
+   struct ice_dcf_vf_repr *repr = dev->data->dev_private;
+   struct ice_dcf_hw *hw = ice_dcf_vf_repr_hw(repr);
+   struct virtchnl_eth_stats pstats;
+   int ret;
+
+   if (hw->resetting)
+   return 0;
+
+   /* read stat values to clear hardware registers */
+   ret = ice_dcf_repr_query_stats(hw, repr->vf_id, &pstats);
+   if (ret != 0)
+   return ret;
+
+   /* set stats offset base on current values */
+   hw->eth_stats_offset = pstats;
+
+   return 0;
+}
+
+static void
+ice_dcf_stat_update_48(uint64_t *offset, uint64_t *stat)
+{
+   if (*stat >= *offset)
+   *stat = *stat - *offset;
+   else
+   *stat = (uint64_t)((*stat +
+   ((uint64_t)1 << ICE_DCF_REPR_48_BIT_WIDTH)) - *offset);
+
+   *stat &= ICE_DCF_REPR_48_BIT_MASK;
+}
+
+static void
+ice_dcf_stat_update_32(uint64_t *offset, uint64_t *stat)
+{
+   if (*stat >= *offset)
+   *stat = (uint64_t)(*stat - *offset);
+   else
+   *stat = (uint64_t)((*stat +
+   ((uint64_t)1 << ICE_DCF_REPR_32_BIT_WIDTH)) - *offset);
+}
+
+static void
+ice_dcf_update_stats(struct ice_dcf_hw *hw, struct virtchnl_eth_stats *nes)
+{
+   struct virtchnl_eth_stats *oes = &hw->eth_stats_offset;
+
+   ice_dcf_stat_update_48(&oes->rx_bytes, &nes->rx_bytes);
+   ice_dcf_stat_update_48(&oes->rx_unicast, &nes->rx_unicast);
+   ice_dcf_stat_update_48(&oes->rx_multicast, &nes->rx_multicast);
+   ice_dcf_stat_update_48(&oes->rx_broadcast, &nes->rx_broadcast);
+   ice_dcf_stat_update_32(&oes->rx_discards, &nes->rx_discards);
+   ice_dcf_stat_update_48(&oes->tx_bytes, &nes->tx_bytes);
+   ice_dcf_stat_update_48(&oes->tx_unicast, &nes->tx_unicast);
+   ice_dcf_stat_update_48(&oes->tx_multicast, &nes->tx_multicast);
+   ice_dcf_stat_update_48(&oes->tx_broadcast, &nes->tx_broadcast);
+   ice_dcf_stat_update_32(&oes->tx_errors, &nes->tx_errors);
+   ice_dcf_stat_update_32(&oes->tx_discards, &nes->tx_discards);
+}
+
+static int
+ice_dcf_vf_repr_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+   struct ice_dcf_vf_repr *repr = dev->data->dev_private;
+   struct ice_dcf_hw *hw = ice_dcf_vf_repr_hw(repr);
+   struct virtchnl_eth_stats pstats;
+   int ret;
+
+   if (hw->resetting) {
+   PMD_DRV_LOG(ERR,
+   "The DCF has been reset by PF, please reinit 
first");
+   return -EIO;
+   }
+
+   ret = ice_dcf_repr_query_stats(hw, repr->vf_id, &pstats);
+   if (ret == 0) {
+   uint8_t crc_stats_len = (dev->data->dev_conf.rxmode.offloads &
+RTE_ETH_RX_OFFLOAD_KEEP_CRC) ? 0 :
+RTE_ETHER_CRC_LEN;
+   ice_dcf_update_stats(hw, &pstats);
+   stats->ipackets = pstats.rx_unicast + pstats.rx_multicast +
+   pstats.rx_broadcast - pstats.rx_discards;
+   stats->opackets = pstats.tx_broadcast + pstats.tx_multica
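The rollover handling in ice_dcf_stat_update_48()/_32() above can be exercised in isolation. The following standalone sketch reproduces the same delta computation; the macro and function names are local to this example, not the driver's:

```c
/* Sketch of rollover-aware counter deltas, mirroring the 48-bit and
 * 32-bit update helpers in the stats patch above. Standalone
 * illustration only, not the driver code. */
#include <assert.h>
#include <stdint.h>

#define REPR_32_BIT_WIDTH (8 * 4)
#define REPR_48_BIT_WIDTH (8 * 6)
#define REPR_48_BIT_MASK  ((((uint64_t)1) << REPR_48_BIT_WIDTH) - 1)

/* Subtract the snapshot 'offset' from the raw counter 'stat',
 * accounting for one wrap of the 48-bit hardware register. */
static uint64_t stat_delta_48(uint64_t offset, uint64_t stat)
{
	if (stat >= offset)
		stat -= offset;
	else
		stat = (stat + (((uint64_t)1) << REPR_48_BIT_WIDTH)) - offset;
	return stat & REPR_48_BIT_MASK;
}

/* Same idea for 32-bit counters such as discards and errors. */
static uint64_t stat_delta_32(uint64_t offset, uint64_t stat)
{
	if (stat >= offset)
		return stat - offset;
	return (stat + (((uint64_t)1) << REPR_32_BIT_WIDTH)) - offset;
}
```

A snapshot taken just below the 48-bit rollover point still yields the correct small delta after the counter wraps.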

Re: [PATCH v2 01/10] ethdev: introduce flow pre-configuration hints

2022-01-27 Thread Jerin Jacob
On Thu, Jan 27, 2022 at 3:32 AM Alexander Kozyrev  wrote:
>
> On Tuesday, January 25, 2022 13:44 Jerin Jacob  wrote:
> > On Tue, Jan 25, 2022 at 6:58 AM Alexander Kozyrev 
> > wrote:
> > >
> > > On Monday, January 24, 2022 12:41 Ajit Khaparde
> >  wrote:
> > > > On Mon, Jan 24, 2022 at 6:37 AM Jerin Jacob 
> > > > wrote:
> > > > >
> >
> > > Ok, I'll adopt this wording in the v3.
> > >
> > > > > > + *
> > > > > > + * @param port_id
> > > > > > + *   Port identifier of Ethernet device.
> > > > > > + * @param[in] port_attr
> > > > > > + *   Port configuration attributes.
> > > > > > + * @param[out] error
> > > > > > + *   Perform verbose error reporting if not NULL.
> > > > > > + *   PMDs initialize this structure in case of error only.
> > > > > > + *
> > > > > > + * @return
> > > > > > + *   0 on success, a negative errno value otherwise and rte_errno 
> > > > > > is
> > set.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +int
> > > > > > +rte_flow_configure(uint16_t port_id,
> > > > >
> > > > > Should we couple, setting resource limit hint to configure function as
> > > > > if we add future items in
> > > > > configuration, we may pain to manage all state. Instead how about,
> > > > > rte_flow_resource_reserve_hint_set()?
> > > > +1
> > > Port attributes are the hints, PMD can safely ignore anything that is not
> > supported/deemed unreasonable.
> > > Having several functions to call instead of one configuration function 
> > > seems
> > like a burden to me.
> >
> > If we add a lot of features which has different state it will be
> > difficult to manage.
> > Since it is the slow path and OPTIONAL API. IMO, it should be fine to
> > have a separate API for a specific purpose
> > to have a clean interface.
>
> This approach contradicts to the DPDK way of configuring devices.
> It you look at the rte_eth_dev_configure or rte_eth_rx_queue_setup API
> you will see that the configuration is propagated via config structures.
> I would like to conform to this approach with my new API as well.

There is a subtle difference: those are mandatory APIs, i.e. the application must
call those APIs to use the subsequent APIs.

I am OK with introducing rte_flow_configure() for such use cases.
Probably, we can add these parameters in rte_flow_configure() for the
new features.
And make it mandatory API for the next ABI to avoid application breakage.

Also, please update the git commit description to mention adding the configure
state for the rte_flow API.

BTW: your queue patch [3/3] probably needs to add the nb_queue parameter to
configure, so the driver knows the number of queues needed upfront, like the
ethdev API scheme.
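A rough sketch of the struct-based configure-with-hints scheme under discussion, including a defined errno for the resource-shortage case raised later in the thread. All names here are hypothetical; this is not the eventual rte_flow API:

```c
/* Hypothetical sketch of a pre-configuration call in the spirit of
 * rte_eth_dev_configure(): hints are bundled in one attribute struct
 * so interdependent resources can be validated together, and a
 * defined errno distinguishes "not enough resources" from other
 * failures. None of these names are the real rte_flow API. */
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct flow_port_attr {
	uint32_t nb_counters; /* hint: max counters the app expects */
	uint32_t nb_meters;   /* hint: max meters the app expects */
	uint16_t nb_queues;   /* flow rule insertion queues */
};

/* Device capabilities, as an info_get-style call could report them. */
struct flow_port_info {
	uint32_t max_counters;
	uint32_t max_meters;
	uint16_t max_queues;
};

static int flow_configure(const struct flow_port_info *caps,
			  const struct flow_port_attr *attr)
{
	/* Reject up front instead of failing at rule-creation time. */
	if (attr->nb_counters > caps->max_counters ||
	    attr->nb_meters > caps->max_meters ||
	    attr->nb_queues > caps->max_queues)
		return -ENOSPC; /* defined value the app can act on */
	return 0;
}
```

An application can then fall back to smaller hints on -ENOSPC instead of treating the failure as fatal.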


>
> Another question is how to deal with interdependencies with separate hints?
> There could be some resources that requires other resources to be present.
> Or one resource shares the hardware registers with another one and needs to
> be accounted for. That is not easy to do with separate function calls.

I got the use case now.

>
> > >
> > > >
> > > > >
> > > > >
> > > > > > +  const struct rte_flow_port_attr *port_attr,
> > > > > > +  struct rte_flow_error *error);
> > > > >
> > > > > I think, we should have _get function to get those limit numbers
> > otherwise,
> > > > > we can not write portable applications as the return value is  kind of
> > > > > boolean now if
> > > > > don't define exact values for rte_errno for reasons.
> > > > +1
> > > We had this discussion in RFC. The limits will vary from NIC to NIC and 
> > > from
> > system to
> > > system, depending on hardware capabilities and amount of free memory
> > for example.
> > > It is easier to reject a configuration with a clear error description as 
> > > we do
> > for flow creation.
> >
> > In that case, we can return a "defined" return value or "defined"
> > errno to capture this case so that
> > the application can make forward progress to differentiate between API
> > failed vs dont having enough resources
> > and move on.
>
> I think you are right and it will be useful to provide some hardware 
> capabilities.
> I'll add something like rte_flow_info_get() to obtain available flow rule 
> resources.

Ack.


Re: [PATCH] mempool: fix rte primary program coredump

2022-01-27 Thread Olivier Matz
Hi Tianli,

On Wed, Nov 10, 2021 at 11:57:19PM +0800, Tianli Lai wrote:
> the primary program(such as ofp app) run first, then run the secondary
> program(such as dpdk-pdump), the primary program would receive signal
> SIGSEGV. the function stack as follow:
> 
> aived signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffee60e700 (LWP 112613)]
> 0x75f2cc0b in bucket_stack_pop (stack=0x0001) at
> /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:95
> 95  if (stack->top == 0)
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.17-196.el7.x86_64 libatomic-4.8.5-16.el7.x86_64
> libconfig-1.4.9-5.el7.x86_64 libgcc-4.8.5-16.el7.x86_64
> libpcap-1.5.3-12.el7.x86_64 numactl-libs-2.0.9-6.el7_2.x86_64
> openssl-libs-1.0.2k-8.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
>  #0  0x75f2cc0b in bucket_stack_pop (stack=0x0001) at 
> /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:95
>  #1  0x75f2e5dc in bucket_dequeue_orphans 
> (bd=0x2209e5fac0,obj_table=0x220b083710, n_orphans=251) at 
> /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:190
>  #2  0x75f30192 in bucket_dequeue 
> (mp=0x220b07d5c0,obj_table=0x220b083710, n=251) at 
> /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:288
>  #3  0x75f47e18 in rte_mempool_ops_dequeue_bulk 
> (mp=0x220b07d5c0,obj_table=0x220b083710, n=251) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:739
>  #4  0x75f4819d in __mempool_generic_get (cache=0x220b083700, n=1, 
> obj_table=0x7fffee5deb18, mp=0x220b07d5c0) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1443
>  #5  rte_mempool_generic_get (cache=0x220b083700, n=1, 
> obj_table=0x7fffee5deb18, mp=0x220b07d5c0) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1506
>  #6  rte_mempool_get_bulk (n=1, obj_table=0x7fffee5deb18, mp=0x220b07d5c0) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1539
>  #7  rte_mempool_get (obj_p=0x7fffee5deb18, mp=0x220b07d5c0) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1565
>  #8  rte_mbuf_raw_alloc (mp=0x220b07d5c0) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:551
>  #9  0x75f483a4 in rte_pktmbuf_alloc (mp=0x220b07d5c0) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:804
>  #10 0x75f4c9d9 in pdump_pktmbuf_copy (m=0x220746ad80, 
> mp=0x220b07d5c0) at /ofp/dpdk/lib/librte_pdump/rte_pdump.c:99
>  #11 0x75f4e42e in pdump_copy (pkts=0x7fffee5dfdf0, nb_pkts=1, 
> user_params=0x776d7cc0 ) at 
> /ofp/dpdk/lib/librte_pdump/rte_pdump.c:151
>  #12 0x75f4eadd in pdump_rx (port=0, qidx=0, pkts=0x7fffee5dfdf0, 
> nb_pkts=1, max_pkts=16, user_params=0x776d7cc0 ) at 
> /ofp/dpdk/lib/librte_pdump/rte_pdump.c:172
>  #13 0x75d0e9e8 in rte_eth_rx_burst (port_id=0, queue_id=0, 
> rx_pkts=0x7fffee5dfdf0, nb_pkts=16) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/usr/local/include/dpdk/rte_ethdev.h:4396
>  #14 0x75d114c3 in recv_pkt_dpdk (pktio_entry=0x22005436c0, index=0, 
> pkt_table=0x7fffee5dfdf0, num=16) at odp_packet_dpdk.c:1081
>  #15 0x75d2f931 in odp_pktin_recv (queue=...,packets=0x7fffee5dfdf0, 
> num=16) at ../linux-generic/odp_packet_io.c:1896
>  #16 0x0040a344 in rx_burst (pktin=...) at app_main.c:223
>  #17 0x0040aca4 in run_server_single (arg=0x7fffe2b0) at 
> app_main.c:417
>  #18 0x77bd6883 in run_thread (arg=0x7fffe3b8) at threads.c:67
>  #19 0x753c8e25 in start_thread () from /lib64/libpthread.so.0
>  #20 0x7433e34d in clone () from /lib64/libc.so.6.c:67
> 
> The program crash down reason is:
> 
> In primary program and secondary program , the global array 
> rte_mempool_ops.ops[]:
> primary namesecondary name
>  [0]:   "bucket""ring_mp_mc"
>  [1]:   "dpaa"  "ring_sp_sc"
>  [2]:   "dpaa2" "ring_mp_sc"
>  [3]:   "octeontx_fpavf""ring_sp_mc"
>  [4]:   "octeontx2_npa" "octeontx2_npa"
>  [5]:   "ring_mp_mc""bucket"
>  [6]:   "ring_sp_sc""stack"
>  [7]:   "ring_mp_sc""if_stack"
>  [8]:   "ring_sp_mc""dpaa"
>  [9]:   "stack" "dpaa2"
>  [10]:  "if_stack"  "octeontx_fpavf"
>  [11]:  NULLNULL
> 
>  this array in primary program is different with secondary program.
>  so when secondary program call rte_pktmbuf_pool_create_by_ops() with
>  mempool name “ring_mp_mc”, but the primary program use "bucket" type
>  to alloc rte_mbuf.
> 
>  so sort this array both primary program and secondary program when init
>  memzone.
> 
> Signed-off-by: Tianli Lai 

I think it is the same problem as the one described here:
http://inbox.dpdk.org/dev/1583114253-15345-1-git-send-email-xiangxia.m@gmail.com/#r

To summarize what is said in the thread, sorting the ops looks dangerous because it
changes the index during the lifetime
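The primary/secondary mismatch described in the commit message can be reduced to a tiny registry sketch. This is illustrative only, not the rte_mempool implementation: the same ops name lands at a different index in each process, so an index stored in shared memory must be resolved per-process by name:

```c
/* Toy model of the primary/secondary ops-table mismatch: indexes
 * depend on registration order, which differs between processes,
 * so only name-based lookup is stable across them.
 * Illustrative only, not the rte_mempool code. */
#include <assert.h>
#include <string.h>

#define MAX_OPS 4

struct ops_table {
	const char *name[MAX_OPS];
	int num;
};

/* Register an ops name; the returned index depends on call order. */
static int ops_register(struct ops_table *t, const char *name)
{
	t->name[t->num] = name;
	return t->num++;
}

/* Resolve an ops name to this process's local index. */
static int ops_lookup(const struct ops_table *t, const char *name)
{
	for (int i = 0; i < t->num; i++)
		if (strcmp(t->name[i], name) == 0)
			return i;
	return -1;
}
```

If the secondary blindly reused the primary's index instead of looking the name up locally, it would dereference the wrong ops, which is exactly the crash in the backtrace above.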

RE: [PATCH] vhost: fix data-plane access to released vq

2022-01-27 Thread Wang, YuanX
Hi Maxime,

> -Original Message-
> From: Maxime Coquelin 
> Sent: Wednesday, January 26, 2022 10:03 PM
> To: Wang, YuanX ; Xia, Chenbo
> 
> Cc: dev@dpdk.org; Hu, Jiayu ; Ding, Xuan
> ; Ma, WenwuX ; Ling,
> WeiX 
> Subject: Re: [PATCH] vhost: fix data-plane access to released vq
> 
> Hi Yuan,
> 
> On 12/3/21 17:34, Yuan Wang wrote:
> > From: yuan wang 
> >
> > When numa reallocation occurs, numa_realoc() on the control plane will
> > free the old vq. If rte_vhost_dequeue_burst() on the data plane get
> > the vq just before release, then it will access the released vq. We
> > need to put the
> > vq->access_lock into struct virtio_net to ensure that it
> > can prevents this situation.
> 
> 
> This patch is a fix, so the Fixes tag would be needed.
> 
> But are you really facing this issue, or this is just based on code review?

This issue was caught at run time by AddressSanitizer, which can be turned on with:
meson configure -Db_sanitize=address

> 
> Currently NUMA reallocation is called whenever
> translate_ring_addresses() is called.
> 
> translate_ring_addresses() is primarly called at device initialization, before
> the .new_device() callback is called. At that stage, there is no risk to
> performa NUMA reallocation as the application is not expected to use APIs
> requiring vq->access_lock acquisition.
> 
> But I agree there are possibilities that numa_realloc() gets called while 
> device
> is in running state. But even if that happened, I don't think it is possible 
> that
> numa_realloc() ends-up reallocating the virtqueue on a different NUMA
> node (the vring should not have moved from a physical memory standpoint).
> And if even it happened, we should be safe because we ensure the VQ was
> not ready (so not usable by the
> application) before proceeding with reallocation:

Here is a scenario where the VQ ready flag has not been set:
1. Run testpmd and then start the data-plane process.
2. Run the front-end.
3. new_device() gets called once the first two queues are ready, even if the
later queues are not.
4. When processing messages for the later queues, numa_realloc() may be called;
since their ready flag has not been set, they can be reallocated.

If all the queues were ready before new_device() is called, this issue would not
occur. I think maybe that is another possible solution.

Thanks,
Yuan

> 
> static struct virtio_net*
> numa_realloc(struct virtio_net *dev, int index) {
>   int node, dev_node;
>   struct virtio_net *old_dev;
>   struct vhost_virtqueue *vq;
>   struct batch_copy_elem *bce;
>   struct guest_page *gp;
>   struct rte_vhost_memory *mem;
>   size_t mem_size;
>   int ret;
> 
>   old_dev = dev;
>   vq = dev->virtqueue[index];
> 
>   /*
>* If VQ is ready, it is too late to reallocate, it certainly already
>* happened anyway on VHOST_USER_SET_VRING_ADRR.
>*/
>   if (vq->ready)
>   return dev;
> 
> So, if this is fixing a real issue, I would need more details on the issue in 
> order
> to understand why vq->ready was not set when it should have been.
> 
> On a side note, while trying to understand how you could face an issue, I
> noticed that translate_ring_addresses() may be called by
> vhost_user_iotlb_msg(). In that case, vq->access_lock is not held as this is
> the handler for VHOST_USER_IOTLB_MSG. We may want to protect
> translate_ring_addresses() calls with locking the VQ locks. I will post a fix 
> for
> it.
> 
> > Signed-off-by: Yuan Wang 
> > ---
> >   lib/vhost/vhost.c  | 26 +-
> >   lib/vhost/vhost.h  |  4 +---
> >   lib/vhost/vhost_user.c |  4 ++--
> >   lib/vhost/virtio_net.c | 16 
> >   4 files changed, 24 insertions(+), 26 deletions(-)
> >
> 
> ...
> 
> > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h index
> > 7085e0885c..f85ce4fda5 100644
> > --- a/lib/vhost/vhost.h
> > +++ b/lib/vhost/vhost.h
> > @@ -185,9 +185,6 @@ struct vhost_virtqueue {
> > boolaccess_ok;
> > boolready;
> >
> > -   rte_spinlock_t  access_lock;
> > -
> > -
> > union {
> > struct vring_used_elem  *shadow_used_split;
> > struct vring_used_elem_packed *shadow_used_packed;
> @@ -384,6
> > +381,7 @@ struct virtio_net {
> > int extbuf;
> > int linearbuf;
> > struct vhost_virtqueue  *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
> > +   rte_spinlock_t  vq_access_lock[VHOST_MAX_QUEUE_PAIRS
> * 2];
> 
> The problem here is that you'll be introducing false sharing, so I expect
> performance to no more scale with the number of queues.
> 
> It also consumes unnecessary memory.
> 
> > struct inflight_mem_info *inflight_info;
> >   #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
> > charifname[IF_NAME_SZ];
> 
> Thanks,
> Maxime



Re: [EXT] Re: [dpdk-dev] [PATCH v2 2/2] app/testpmd: add queue based pfc CLI options

2022-01-27 Thread Ferruh Yigit

On 1/27/2022 7:13 AM, Sunil Kumar Kori wrote:



-Original Message-
From: Ferruh Yigit 
Sent: Tuesday, January 25, 2022 11:07 PM
To: Jerin Jacob Kollanukkaran ; dev@dpdk.org; Xiaoyun
Li ; Aman Singh ; Yuying
Zhang 
Cc: tho...@monjalon.net; ajit.khapa...@broadcom.com;
abo...@pensando.io; andrew.rybche...@oktetlabs.ru;
beilei.x...@intel.com; bruce.richard...@intel.com; ch...@att.com;
chenbo@intel.com; ciara.lof...@intel.com; Devendra Singh Rawat
; ed.cz...@atomicrules.com;
evge...@amazon.com; gr...@u256.net; g.si...@nxp.com;
zhouguoy...@huawei.com; haiyue.w...@intel.com; Harman Kalra
; heinrich.k...@corigine.com;
hemant.agra...@nxp.com; hyon...@cisco.com; igo...@amazon.com; Igor
Russkikh ; jgraj...@cisco.com;
jasvinder.si...@intel.com; jianw...@trustnetic.com;
jiawe...@trustnetic.com; jingjing...@intel.com; johnd...@cisco.com;
john.mil...@atomicrules.com; linvi...@tuxdriver.com; keith.wi...@intel.com;
Kiran Kumar Kokkilagadda ;
ouli...@huawei.com; Liron Himi ;
lon...@microsoft.com; m...@semihalf.com; spin...@cesnet.cz;
ma...@nvidia.com; matt.pet...@windriver.com;
maxime.coque...@redhat.com; m...@semihalf.com; humi...@huawei.com;
Pradeep Kumar Nalla ; Nithin Kumar Dabilpuram
; qiming.y...@intel.com; qi.z.zh...@intel.com;
Radha Chintakuntla ; rahul.lakkire...@chelsio.com;
Rasesh Mody ; rosen...@intel.com;
sachin.sax...@oss.nxp.com; Satha Koteswara Rao Kottidi
; Shahed Shaikh ;
shaib...@amazon.com; shepard.sie...@atomicrules.com;
asoma...@amd.com; somnath.ko...@broadcom.com;
sthem...@microsoft.com; steven.webs...@windriver.com; Sunil Kumar Kori
; mtetsu...@gmail.com; Veerasenareddy Burru
; viachesl...@nvidia.com; xiao.w.w...@intel.com;
cloud.wangxiao...@huawei.com; yisen.zhu...@huawei.com;
yongw...@vmware.com; xuanziya...@huawei.com
Subject: [EXT] Re: [dpdk-dev] [PATCH v2 2/2] app/testpmd: add queue based
pfc CLI options

External Email

--
On 1/13/2022 10:27 AM, jer...@marvell.com wrote:

From: Sunil Kumar Kori 

Patch adds command line options to configure queue based priority flow
control.

- Syntax command is given as below:

set pfc_queue_ctrl <port_id> rx <on|off> <tx_qid> <tx_tc> \
tx <on|off> <rx_qid> <rx_tc> <pause_time>



Isn't the order of the parameters odd? It is mixing Rx/Tx config; what about
grouping the Rx and Tx parameters?


It's been kept like this to portray the config for rx_pause and tx_pause separately,
i.e. the mode and its corresponding config.



What do you mean 'separately'? You need to provide all arguments anyway, right?

I was thinking of having the Rx arguments first, then Tx, like:

rx <Rx params> tx <Tx params>

Am I missing something, is there a benefit of what you did in this patch?


- Example command to configure queue based priority flow control
on rx and tx side for port 0, Rx queue 0, Tx queue 0 with pause
time 2047

testpmd> set pfc_queue_ctrl 0 rx on 0 0 tx on 0 0 2047

Signed-off-by: Sunil Kumar Kori 


<...>




Re: [PATCH] vhost: fix data-plane access to released vq

2022-01-27 Thread Maxime Coquelin

Hi,

On 1/27/22 11:30, Wang, YuanX wrote:

Hi Maxime,


-Original Message-
From: Maxime Coquelin 
Sent: Wednesday, January 26, 2022 10:03 PM
To: Wang, YuanX ; Xia, Chenbo

Cc: dev@dpdk.org; Hu, Jiayu ; Ding, Xuan
; Ma, WenwuX ; Ling,
WeiX 
Subject: Re: [PATCH] vhost: fix data-plane access to released vq

Hi Yuan,

On 12/3/21 17:34, Yuan Wang wrote:

From: yuan wang 

When numa reallocation occurs, numa_realoc() on the control plane will
free the old vq. If rte_vhost_dequeue_burst() on the data plane get
the vq just before release, then it will access the released vq. We
need to put the
vq->access_lock into struct virtio_net to ensure that it
can prevents this situation.



This patch is a fix, so the Fixes tag would be needed.

But are you really facing this issue, or this is just based on code review?


This issue is run-time checked with AddressSanitizer which can be turned on by:
meson configure -Db_sanitize=address 



Currently NUMA reallocation is called whenever
translate_ring_addresses() is called.

translate_ring_addresses() is primarly called at device initialization, before
the .new_device() callback is called. At that stage, there is no risk to
performa NUMA reallocation as the application is not expected to use APIs
requiring vq->access_lock acquisition.

But I agree there are possibilities that numa_realloc() gets called while device
is in running state. But even if that happened, I don't think it is possible 
that
numa_realloc() ends-up reallocating the virtqueue on a different NUMA
node (the vring should not have moved from a physical memory standpoint).
And if even it happened, we should be safe because we ensure the VQ was
not ready (so not usable by the
application) before proceeding with reallocation:


Here is a scenario where VQ ready has not been set:
1. run the testpmd and then start the data plane process.
2. run the front-end.
3. new_device() gets called when the first two queues are ready, even if the 
later queues are not.
4. when processing messages from the later queues, it may go to numa_realloc(), 
the ready flag has not been set and therefore can be reallocated.


I will need a bit more details here.

AFAICT, if the ready flag is not set for a given virtqueue, the
virtqueue is not supposed to be exposed to the application. Is there a
case where it happens? If so, the fix should consist in ensuring the
application cannot use the virtqueue if it is not ready.

Regards,
Maxime



If all the queues are ready before call new_deivce(), this issue does not occur.
I think maybe it is another solution.


No, that was the older behaviour, but it causes issues with vDPA.
We cannot just revert to the older behaviour.

Thanks,
Maxime


Thanks,
Yuan



static struct virtio_net*
numa_realloc(struct virtio_net *dev, int index) {
int node, dev_node;
struct virtio_net *old_dev;
struct vhost_virtqueue *vq;
struct batch_copy_elem *bce;
struct guest_page *gp;
struct rte_vhost_memory *mem;
size_t mem_size;
int ret;

old_dev = dev;
vq = dev->virtqueue[index];

/*
 * If VQ is ready, it is too late to reallocate, it certainly already
 * happened anyway on VHOST_USER_SET_VRING_ADDR.
 */
if (vq->ready)
return dev;

So, if this is fixing a real issue, I would need more details on the issue in 
order
to understand why vq->ready was not set when it should have been.

On a side note, while trying to understand how you could face an issue, I
noticed that translate_ring_addresses() may be called by
vhost_user_iotlb_msg(). In that case, vq->access_lock is not held as this is
the handler for VHOST_USER_IOTLB_MSG. We may want to protect
translate_ring_addresses() calls with locking the VQ locks. I will post a fix 
for
it.


Signed-off-by: Yuan Wang 
---
   lib/vhost/vhost.c  | 26 +-
   lib/vhost/vhost.h  |  4 +---
   lib/vhost/vhost_user.c |  4 ++--
   lib/vhost/virtio_net.c | 16 
   4 files changed, 24 insertions(+), 26 deletions(-)



...


diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h index
7085e0885c..f85ce4fda5 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -185,9 +185,6 @@ struct vhost_virtqueue {
boolaccess_ok;
boolready;

-   rte_spinlock_t  access_lock;
-
-
union {
struct vring_used_elem  *shadow_used_split;
struct vring_used_elem_packed *shadow_used_packed;

@@ -384,6 +381,7 @@ struct virtio_net {
int extbuf;
int linearbuf;
struct vhost_virtqueue  *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
+   rte_spinlock_t  vq_access_lock[VHOST_MAX_QUEUE_PAIRS * 2];

The problem here is that you'll be introducing false sharing, so I expect
performance to no more scale with the number of queues.

It also consumes unnecessary memory.



[PATCH] devtools: fix comment detection in forbidden token check

2022-01-27 Thread David Marchand
After a comment section was detected, passing to a new hunk was not seen
as ending the section and all subsequent hunks were ignored.

Fixes: 7413e7f2aeb3 ("devtools: alert on new calls to exit from libs")
Cc: sta...@dpdk.org

Reported-by: Thomas Monjalon 
Signed-off-by: David Marchand 
---
 devtools/check-forbidden-tokens.awk | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/devtools/check-forbidden-tokens.awk 
b/devtools/check-forbidden-tokens.awk
index 61ba707c9b..026844141c 100755
--- a/devtools/check-forbidden-tokens.awk
+++ b/devtools/check-forbidden-tokens.awk
@@ -20,6 +20,9 @@ BEGIN {
 # state machine assumes the comments structure is enforced by
 # checkpatches.pl
 (in_file) {
+   if ($0 ~ "^@@") {
+   in_comment = 0
+   }
# comment start
if (index($0,comment_start) > 0) {
in_comment = 1
-- 
2.23.0
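The behaviour this fixes can be reproduced with a tiny standalone state machine modelled on the script (a sketch, not the actual devtools file): if `in_comment` is not cleared on a `@@` hunk header, a comment left open at the end of one hunk hides forbidden tokens in every later hunk.

```shell
#!/bin/sh
# Synthetic diff: hunk 1 opens a C comment whose closing "*/" lies outside
# the hunk context; hunk 2 adds a forbidden token (rte_exit).
cat > /tmp/sample.diff <<'EOF'
@@ -1,2 +1,2 @@
+/* comment opened here, closed outside the hunk context
@@ -10,2 +10,2 @@
+rte_exit(1, "boom");
EOF

check() {
	awk -v reset="$1" '
	/^@@/ && reset { in_comment = 0 }       # the fix: a new hunk ends any comment
	index($0, "/*") > 0 { in_comment = 1 }  # comment start
	!in_comment && /^\+/ && /rte_exit/ { hits++ }
	index($0, "*/") > 0 { in_comment = 0 }  # comment end
	END { print hits + 0 }
	' /tmp/sample.diff
}

echo "without reset: $(check 0) hit(s)"   # prints 0: the token is missed
echo "with reset:    $(check 1) hit(s)"   # prints 1: the token is flagged
```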



RE: [PATCH v2 1/4] crypto: use single buffer for asymmetric session

2022-01-27 Thread Zhang, Roy Fan
> -Original Message-
> From: Power, Ciara 
> Sent: Monday, January 24, 2022 3:04 PM
> To: dev@dpdk.org
> Cc: Zhang, Roy Fan ; gak...@marvell.com;
> ano...@marvell.com; m...@ashroe.eu; Power, Ciara
> ; Doherty, Declan ;
> Ankur Dwivedi ; Tejasree Kondoj
> ; Griffin, John ; Trahe,
> Fiona ; Jain, Deepak K 
> Subject: [PATCH v2 1/4] crypto: use single buffer for asymmetric session
> 
> Rather than using a session buffer that contains pointers to private
> session data elsewhere, have a single session buffer.
> This session is created for a driver ID, and the mempool element
> contains space for the max session private data needed for any driver.
> 
> Signed-off-by: Ciara Power 
> 
> ---
Although Anoob had a slightly different idea on the missing IOVA address for
the session private data, the general approach looks ok to me.
Acked-by: Fan Zhang 


RE: [PATCH v2 2/4] crypto: hide asym session structure

2022-01-27 Thread Zhang, Roy Fan
> -Original Message-
> From: Power, Ciara 
> Sent: Monday, January 24, 2022 3:04 PM
> To: dev@dpdk.org
> Cc: Zhang, Roy Fan ; gak...@marvell.com;
> ano...@marvell.com; m...@ashroe.eu; Power, Ciara
> ; Doherty, Declan ;
> Ankur Dwivedi ; Tejasree Kondoj
> ; Griffin, John ; Trahe,
> Fiona ; Jain, Deepak K 
> Subject: [PATCH v2 2/4] crypto: hide asym session structure
> 
> The rte_cryptodev_asym_session structure is now moved to an internal
> header. This will no longer be used directly by apps,
> private session data can be accessed via get API.
> 
> Signed-off-by: Ciara Power 
> ---
Acked-by: Fan Zhang 


RE: [PATCH v2 3/4] crypto: add asym session user data API

2022-01-27 Thread Zhang, Roy Fan
> -Original Message-
> From: Power, Ciara 
> Sent: Monday, January 24, 2022 3:04 PM
> To: dev@dpdk.org
> Cc: Zhang, Roy Fan ; gak...@marvell.com;
> ano...@marvell.com; m...@ashroe.eu; Power, Ciara
> ; Doherty, Declan 
> Subject: [PATCH v2 3/4] crypto: add asym session user data API
> 
> A user data field is added to the asymmetric session structure.
> Relevant API added to get/set the field.
> 
> Signed-off-by: Ciara Power 
> 
> ---
Acked-by: Fan Zhang 


RE: [PATCH v2 4/4] crypto: modify return value for asym session create

2022-01-27 Thread Zhang, Roy Fan
> -Original Message-
> From: Power, Ciara 
> Sent: Monday, January 24, 2022 3:04 PM
> To: dev@dpdk.org
> Cc: Zhang, Roy Fan ; gak...@marvell.com;
> ano...@marvell.com; m...@ashroe.eu; Power, Ciara
> ; Doherty, Declan 
> Subject: [PATCH v2 4/4] crypto: modify return value for asym session create
> 
> Rather than the asym session create function returning a session on
> success, and a NULL value on error, it is modified to now return int
> values - 0 on success or -EINVAL/-ENOTSUP/-ENOMEM on failure.
> The session to be used is passed as input.
> 
> This adds clarity on the failure of the create function, which enables
> treating the -ENOTSUP return as TEST_SKIPPED in test apps.
> 
> Signed-off-by: Ciara Power 
> ---
Acked-by: Fan Zhang 


RE: [PATCH v2 77/83] compressdev: remove unnecessary NULL checks

2022-01-27 Thread Zhang, Roy Fan
> -Original Message-
> From: Stephen Hemminger 
> Sent: Monday, January 24, 2022 5:47 PM
> To: dev@dpdk.org
> Cc: Stephen Hemminger ; Zhang, Roy Fan
> ; Ashish Gupta 
> Subject: [PATCH v2 77/83] compressdev: remove unnecessary NULL checks
> 
> Remove redundant NULL pointer checks before free functions
> found by nullfree.cocci
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  lib/compressdev/rte_compressdev.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/lib/compressdev/rte_compressdev.c
> b/lib/compressdev/rte_compressdev.c
> index 2e9218af68f6..d4f7d4d3daf2 100644
> --- a/lib/compressdev/rte_compressdev.c
> +++ b/lib/compressdev/rte_compressdev.c
> @@ -405,8 +405,7 @@ rte_compressdev_queue_pairs_release(struct
> rte_compressdev *dev)
>   return ret;
>   }
> 
> - if (dev->data->queue_pairs != NULL)
> - rte_free(dev->data->queue_pairs);
> + rte_free(dev->data->queue_pairs);
>   dev->data->queue_pairs = NULL;
>   dev->data->nb_queue_pairs = 0;
> 
> --
> 2.30.2
Acked-by: Fan Zhang 


RE: [PATCH v2 32/83] crypto/ipsec_mb: remove unnecessary NULL checks

2022-01-27 Thread Zhang, Roy Fan
> -Original Message-
> From: Stephen Hemminger 
> Sent: Monday, January 24, 2022 5:46 PM
> To: dev@dpdk.org
> Cc: Stephen Hemminger ; Zhang, Roy Fan
> ; De Lara Guarch, Pablo
> 
> Subject: [PATCH v2 32/83] crypto/ipsec_mb: remove unnecessary NULL
> checks
> 
> Remove redundant NULL pointer checks before free functions
> found by nullfree.cocci
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/crypto/ipsec_mb/ipsec_mb_ops.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/crypto/ipsec_mb/ipsec_mb_ops.c
> b/drivers/crypto/ipsec_mb/ipsec_mb_ops.c
> index 189262c4ad52..f808da9edf89 100644
> --- a/drivers/crypto/ipsec_mb/ipsec_mb_ops.c
> +++ b/drivers/crypto/ipsec_mb/ipsec_mb_ops.c
> @@ -102,8 +102,7 @@ ipsec_mb_qp_release(struct rte_cryptodev *dev,
> uint16_t qp_id)
> 
>   if (qp != NULL && rte_eal_process_type() == RTE_PROC_PRIMARY) {
>   r = rte_ring_lookup(qp->name);
> - if (r)
> - rte_ring_free(r);
> + rte_ring_free(r);
> 
>  #if IMB_VERSION(1, 1, 0) > IMB_VERSION_NUM
>   if (qp->mb_mgr)
> @@ -291,8 +290,7 @@ ipsec_mb_qp_setup(struct rte_cryptodev *dev,
> uint16_t qp_id,
>   if (qp->mb_mgr_mz)
>   rte_memzone_free(qp->mb_mgr_mz);
>  #endif
> - if (qp)
> - rte_free(qp);
> + rte_free(qp);
>   return ret;
>  }
> 
> --
> 2.30.2
Acked-by: Fan Zhang 


[PATCH] vhost: fix unsafe vrings addresses modifications

2022-01-27 Thread Maxime Coquelin
This patch adds missing protection around vring_invalidate
and translate_ring_addresses calls in vhost_user_iotlb_msg.

Fixes: eefac9536a90 ("vhost: postpone device creation until rings are mapped")
Cc: sta...@dpdk.org

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost_user.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 5eb1dd6812..ae8513c465 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -2566,8 +2566,11 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct 
VhostUserMsg *msg,
vhost_user_iotlb_cache_insert(vq, imsg->iova, vva,
len, imsg->perm);
 
-   if (is_vring_iotlb(dev, vq, imsg))
+   if (is_vring_iotlb(dev, vq, imsg)) {
+   rte_spinlock_lock(&vq->access_lock);
*pdev = dev = translate_ring_addresses(dev, i);
+   rte_spinlock_unlock(&vq->access_lock);
+   }
}
break;
case VHOST_IOTLB_INVALIDATE:
@@ -2580,8 +2583,11 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct 
VhostUserMsg *msg,
vhost_user_iotlb_cache_remove(vq, imsg->iova,
imsg->size);
 
-   if (is_vring_iotlb(dev, vq, imsg))
+   if (is_vring_iotlb(dev, vq, imsg)) {
+   rte_spinlock_lock(&vq->access_lock);
vring_invalidate(dev, vq);
+   rte_spinlock_unlock(&vq->access_lock);
+   }
}
break;
default:
-- 
2.34.1



Re: [PATCH v2 0/6] Fast restart with many hugepages

2022-01-27 Thread Bruce Richardson
On Wed, Jan 19, 2022 at 11:09:11PM +0200, Dmitry Kozlyuk wrote:
> This patchset is a new design and implementation of [1].
> 
> v2:
>   * Fix hugepage file removal when they are no longer used.
> Disable removal with --huge-unlink=never as intended.
> Document this behavior difference. (Bruce)
>   * Improve documentation, commit messages, and naming. (Thomas)
> 
Thanks for the v2, I now see the promised perf improvements when running
some quick tests with testpmd. Some quick numbers below, summary version is
that for testpmd with default mempool size startup/exit time drops from
1.7s to 1.4s, and when I increase mempool size to 4M mbufs, time drops
from 7.6s to 3.9s.

/Bruce

cmd: "time echo "quit" | sudo ./build/app/dpdk-testpmd -c F --no-pci -- -i"

Baseline (no patches) - 1.7 sec
Baseline (with patches) - 1.7 sec
Huge-unlink=never - 1.4 sec

Adding --total-num-mbufs=4096000

Baseline (with patches) - 7.6 sec
Huge-unlink=never - 3.9 sec


Re: [PATCH v2] net/enic: add support for eCPRI matching

2022-01-27 Thread Ferruh Yigit

On 1/26/2022 9:48 PM, John Daley wrote:

eCPRI message can be over Ethernet layer (.1Q supported also) or over
UDP layer. Message header formats are the same in these two variants.

Only up through the first packet header in the PDU can be matched.
RSS on the eCPRI header fields is not supported.

Signed-off-by: John Daley 
Reviewed-by: Hyong Youb Kim 
---
v2 - include enic.ini update

  doc/guides/nics/features/enic.ini  |  1 +
  doc/guides/rel_notes/release_22_03.rst |  1 +
  drivers/net/enic/enic_fm_flow.c| 65 ++
  3 files changed, 67 insertions(+)

diff --git a/doc/guides/nics/features/enic.ini 
b/doc/guides/nics/features/enic.ini
index c3bcead05e..88e4ef8c64 100644
--- a/doc/guides/nics/features/enic.ini
+++ b/doc/guides/nics/features/enic.ini
@@ -53,6 +53,7 @@ vlan = Y
  vxlan= Y
  geneve   = Y
  geneve_opt   = Y
+ecpri= Y


Can you please add in alphabetical order, as in 
'doc/guides/nics/features/default.ini'?

  
  [rte_flow actions]

  count= Y
diff --git a/doc/guides/rel_notes/release_22_03.rst 
b/doc/guides/rel_notes/release_22_03.rst
index b38dc54e62..52d1e32cf6 100644
--- a/doc/guides/rel_notes/release_22_03.rst
+++ b/doc/guides/rel_notes/release_22_03.rst
@@ -58,6 +58,7 @@ New Features
  * **Updated Cisco enic driver.**
  
* Added rte_flow support for matching GENEVE packets.

+  * Added rte_flow support for matching eCPRI packets.
  
  Removed Items

  -
diff --git a/drivers/net/enic/enic_fm_flow.c b/drivers/net/enic/enic_fm_flow.c
index 752ffeb5c5..589c9253e1 100644
--- a/drivers/net/enic/enic_fm_flow.c
+++ b/drivers/net/enic/enic_fm_flow.c
@@ -237,6 +237,7 @@ static enic_copy_item_fn enic_fm_copy_item_vxlan;
  static enic_copy_item_fn enic_fm_copy_item_gtp;
  static enic_copy_item_fn enic_fm_copy_item_geneve;
  static enic_copy_item_fn enic_fm_copy_item_geneve_opt;
+static enic_copy_item_fn enic_fm_copy_item_ecpri;
  
  /* Ingress actions */

  static const enum rte_flow_action_type enic_fm_supported_ig_actions[] = {
@@ -392,6 +393,15 @@ static const struct enic_fm_items enic_fm_items[] = {
   RTE_FLOW_ITEM_TYPE_END,
},
},
+   [RTE_FLOW_ITEM_TYPE_ECPRI] = {
+   .copy_item = enic_fm_copy_item_ecpri,
+   .valid_start_item = 1,
+   .prev_items = (const enum rte_flow_item_type[]) {
+  RTE_FLOW_ITEM_TYPE_ETH,
+  RTE_FLOW_ITEM_TYPE_UDP,
+  RTE_FLOW_ITEM_TYPE_END,
+   },
+   },
  };
  
  static int

@@ -877,6 +887,61 @@ enic_fm_copy_item_geneve_opt(struct copy_item_args *arg)
return 0;
  }
  
+/* Match eCPRI combined message header */

+static int
+enic_fm_copy_item_ecpri(struct copy_item_args *arg)
+{
+   const struct rte_flow_item *item = arg->item;
+   const struct rte_flow_item_ecpri *spec = item->spec;
+   const struct rte_flow_item_ecpri *mask = item->mask;
+   struct fm_tcam_match_entry *entry = arg->fm_tcam_entry;
+   struct fm_header_set *fm_data, *fm_mask;
+   uint8_t *fm_data_to, *fm_mask_to;
+
+   ENICPMD_FUNC_TRACE();
+
+   /* Tunneling not supported- only matching on inner eCPRI fields. */
+   if (arg->header_level > 0)
+   return -EINVAL;
+
+   /* Need both spec and mask */
+   if (!spec || !mask)
+   return -EINVAL;
+
+   fm_data = &entry->ftm_data.fk_hdrset[0];
+   fm_mask = &entry->ftm_mask.fk_hdrset[0];
+
+   /* eCPRI can only follow L2/VLAN layer if ethernet type is 0xAEFE. */
+   if (!(fm_data->fk_metadata & FKM_UDP) &&
+   (fm_mask->l2.eth.fk_ethtype != UINT16_MAX ||
+   rte_cpu_to_be_16(fm_data->l2.eth.fk_ethtype) !=
+   RTE_ETHER_TYPE_ECPRI))
+   return -EINVAL;
+
+   if (fm_data->fk_metadata & FKM_UDP) {
+   /* eCPRI on UDP */
+   fm_data->fk_header_select |= FKH_L4RAW;
+   fm_mask->fk_header_select |= FKH_L4RAW;
+   fm_data_to = &fm_data->l4.rawdata[sizeof(fm_data->l4.udp)];
+   fm_mask_to = &fm_mask->l4.rawdata[sizeof(fm_data->l4.udp)];
+   } else {
+   /* eCPRI directly after Ethernet header */
+   fm_data->fk_header_select |= FKH_L3RAW;
+   fm_mask->fk_header_select |= FKH_L3RAW;
+   fm_data_to = &fm_data->l3.rawdata[0];
+   fm_mask_to = &fm_mask->l3.rawdata[0];
+   }
+
+   /*
+* Use the raw L3 or L4 buffer to match eCPRI since fm_header_set does
+* not have eCPRI header. Only 1st message header of PDU can be matched.
+* "C" bit ignored.
+*/
+   memcpy(fm_data_to, spec, sizeof(*spec));
+   memcpy(fm_mask_to, mask, sizeof(*mask));
+   return 0;
+}
+
  /*
   * Currently, raw pattern match is very limited. It is intended for matching
   * UDP t

Re: [PATCH v1] raw/ifpga: fix ifpga devices cleanup function

2022-01-27 Thread Ferruh Yigit

On 1/27/2022 8:57 AM, Huang, Wei wrote:

Hi,


-Original Message-
From: Yigit, Ferruh 
Sent: Wednesday, January 26, 2022 21:25
To: Huang, Wei ; dev@dpdk.org; Xu, Rosen
; Zhang, Qi Z ; Nipun Gupta
; Hemant Agrawal 
Cc: sta...@dpdk.org; Zhang, Tianfei 
Subject: Re: [PATCH v1] raw/ifpga: fix ifpga devices cleanup function

On 1/26/2022 3:29 AM, Wei Huang wrote:

Use rte_dev_remove() to replace rte_rawdev_pmd_release() in
ifpga_rawdev_cleanup(), resources occupied by ifpga raw devices such
as threads can be released correctly.



As far as I understand you are fixing an issue that not all resources are
released, is this correct?
What are these not released resources?

And 'rte_rawdev_pmd_release()' rawdev API seems intended to do the
cleanup, is it expected that some resources are not freed after this call, or
should we fix that API?
If the device remove API needs to be used, what is the point of
'rte_rawdev_pmd_release()' API?

cc'ed rawdev maintainers for comment.


Yes, this patch is to release all the resources of ifpga_rawdev after testpmd
exits; the resources not currently released are the interrupt and the thread.

rte_rawdev_pmd_release implemented in ifpga_rawdev only releases memory
allocated by the ifpga driver, that's the expected behavior.

I think it's a simple and safe way to release resources completely by calling 
rte_dev_remove.



If device hot remove is better option, why 'rte_rawdev_pmd_release()' API 
exists?




Fixes: f724a802 ("raw/ifpga: add miscellaneous APIs")

Signed-off-by: Wei Huang 
---
   drivers/raw/ifpga/ifpga_rawdev.c | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/raw/ifpga/ifpga_rawdev.c
b/drivers/raw/ifpga/ifpga_rawdev.c
index fdf3c23..88c38aa 100644
--- a/drivers/raw/ifpga/ifpga_rawdev.c
+++ b/drivers/raw/ifpga/ifpga_rawdev.c
@@ -1787,12 +1787,14 @@ int ifpga_rawdev_partial_reconfigure(struct

rte_rawdev *dev, int port,

   void ifpga_rawdev_cleanup(void)
   {
struct ifpga_rawdev *dev;
+   struct rte_rawdev *rdev;
unsigned int i;

for (i = 0; i < IFPGA_RAWDEV_NUM; i++) {
dev = &ifpga_rawdevices[i];
if (dev->rawdev) {
-   rte_rawdev_pmd_release(dev->rawdev);
+   rdev = dev->rawdev;
+   rte_dev_remove(rdev->device);
dev->rawdev = NULL;
}
}






Re: [PATCH v2 00/15] fix and feature for hns3 PMD

2022-01-27 Thread Ferruh Yigit

On 1/22/2022 1:51 AM, Min Hu (Connor) wrote:

This patch contains 15 patches, which include fixing codecheck warning
,code refactor and indirect counter action support.

Chengwen Feng (4):
   net/hns3: remove invalid encapsulation function
   net/hns3: delete strerror invoke
   net/hns3: rename function
   net/hns3: support indirect counter action

Huisong Li (10):
   net/hns3: remove unnecessary assignment
   net/hns3: fix a misjudgment expression
   net/hns3: extract a common API to initialize MAC addrs
   net/hns3: remove unnecessary black lines
   net/hns3: extract a function to handle reset fail
   net/hns3: remove unused variables
   net/hns3: remove the number of queue descriptors
   net/hns3: remove the printing of memory addresses
   net/hns3: extract a common interface to obtain revision ID
   net/hns3: extract functions to create RSS and FDIR flow rule

Jie Hai (1):
   net/hns3: remove unnecessary 'inline'



Hi Connor,

There are some patches in the set that are sent by non maintainers,
for them it requires maintainer ack.
Since you are sending the patch, it implies that you are OK with them
but can you please add your explicit review/ack tags to them?


Re: [PATCH v2 11/15] net/hns3: remove invalid encapsulation function

2022-01-27 Thread Ferruh Yigit

On 1/22/2022 1:51 AM, Min Hu (Connor) wrote:

From: Chengwen Feng 

This patch remove invalid encapsulation functions.


removed functions don't look like encapsulation functions, updating
commit log while merging.



Signed-off-by: Chengwen Feng 
---
  drivers/net/hns3/hns3_ethdev.c| 27 ---
  drivers/net/hns3/hns3_ethdev_vf.c | 13 ++---
  2 files changed, 10 insertions(+), 30 deletions(-)

diff --git a/drivers/net/hns3/hns3_ethdev.c b/drivers/net/hns3/hns3_ethdev.c
index 9db0cb19f8..491fa41888 100644
--- a/drivers/net/hns3/hns3_ethdev.c
+++ b/drivers/net/hns3/hns3_ethdev.c
@@ -593,22 +593,6 @@ hns3_set_vlan_rx_offload_cfg(struct hns3_adapter *hns,
return ret;
  }
  
-static void

-hns3_update_rx_offload_cfg(struct hns3_adapter *hns,
-  struct hns3_rx_vtag_cfg *vcfg)
-{
-   struct hns3_pf *pf = &hns->pf;
-   memcpy(&pf->vtag_config.rx_vcfg, vcfg, sizeof(pf->vtag_config.rx_vcfg));
-}
-
-static void
-hns3_update_tx_offload_cfg(struct hns3_adapter *hns,
-  struct hns3_tx_vtag_cfg *vcfg)
-{
-   struct hns3_pf *pf = &hns->pf;
-   memcpy(&pf->vtag_config.tx_vcfg, vcfg, sizeof(pf->vtag_config.tx_vcfg));
-}
-
  static int
  hns3_en_hw_strip_rxvtag(struct hns3_adapter *hns, bool enable)
  {
@@ -638,7 +622,8 @@ hns3_en_hw_strip_rxvtag(struct hns3_adapter *hns, bool 
enable)
return ret;
}
  
-	hns3_update_rx_offload_cfg(hns, &rxvlan_cfg);

+   memcpy(&hns->pf.vtag_config.rx_vcfg, &rxvlan_cfg,
+  sizeof(struct hns3_rx_vtag_cfg));
  
  	return ret;

  }
@@ -836,7 +821,9 @@ hns3_vlan_txvlan_cfg(struct hns3_adapter *hns, uint16_t 
port_base_vlan_state,
return ret;
}
  
-	hns3_update_tx_offload_cfg(hns, &txvlan_cfg);

+   memcpy(&hns->pf.vtag_config.tx_vcfg, &txvlan_cfg,
+  sizeof(struct hns3_tx_vtag_cfg));
+
return ret;
  }
  
@@ -962,7 +949,9 @@ hns3_en_pvid_strip(struct hns3_adapter *hns, int on)

if (ret)
return ret;
  
-	hns3_update_rx_offload_cfg(hns, &rx_vlan_cfg);

+   memcpy(&hns->pf.vtag_config.rx_vcfg, &rx_vlan_cfg,
+  sizeof(struct hns3_rx_vtag_cfg));
+
return ret;
  }
  
diff --git a/drivers/net/hns3/hns3_ethdev_vf.c b/drivers/net/hns3/hns3_ethdev_vf.c
index a9e129288b..1af2e07e81 100644
--- a/drivers/net/hns3/hns3_ethdev_vf.c
+++ b/drivers/net/hns3/hns3_ethdev_vf.c
@@ -1026,15 +1026,6 @@ hns3vf_get_configuration(struct hns3_hw *hw)
return hns3vf_get_port_base_vlan_filter_state(hw);
  }
  
-static int

-hns3vf_set_tc_queue_mapping(struct hns3_adapter *hns, uint16_t nb_rx_q,
-   uint16_t nb_tx_q)
-{
-   struct hns3_hw *hw = &hns->hw;
-
-   return hns3_queue_to_tc_mapping(hw, nb_rx_q, nb_tx_q);
-}
-
  static void
  hns3vf_request_link_info(struct hns3_hw *hw)
  {
@@ -1530,7 +1521,7 @@ hns3vf_init_vf(struct rte_eth_dev *eth_dev)
goto err_set_tc_queue;
}
  
-	ret = hns3vf_set_tc_queue_mapping(hns, hw->tqps_num, hw->tqps_num);

+   ret = hns3_queue_to_tc_mapping(hw, hw->tqps_num, hw->tqps_num);
if (ret) {
PMD_INIT_LOG(ERR, "failed to set tc info, ret = %d.", ret);
goto err_set_tc_queue;
@@ -1739,7 +1730,7 @@ hns3vf_do_start(struct hns3_adapter *hns, bool 
reset_queue)
uint16_t nb_tx_q = hw->data->nb_tx_queues;
int ret;
  
-	ret = hns3vf_set_tc_queue_mapping(hns, nb_rx_q, nb_tx_q);

+   ret = hns3_queue_to_tc_mapping(hw, nb_rx_q, nb_tx_q);
if (ret)
return ret;
  




RE: [PATCH v2 5/6] net/axgbe: add support for new port mode

2022-01-27 Thread Namburu, Chandu-babu
[Public]

Acked-by: Chandubabu Namburu 

-Original Message-
From: sseba...@amd.com  
Sent: Tuesday, January 25, 2022 5:48 PM
To: dev@dpdk.org
Subject: [PATCH v2 5/6] net/axgbe: add support for new port mode

From: Selwin Sebastian 

Add support for a new port mode that is a backplane connection without support 
for auto negotiation.

Signed-off-by: Selwin Sebastian 
---
 drivers/net/axgbe/axgbe_phy_impl.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/axgbe/axgbe_phy_impl.c 
b/drivers/net/axgbe/axgbe_phy_impl.c
index eefb03e94e..b0e1c267b1 100644
--- a/drivers/net/axgbe/axgbe_phy_impl.c
+++ b/drivers/net/axgbe/axgbe_phy_impl.c
@@ -46,6 +46,7 @@ enum axgbe_port_mode {
AXGBE_PORT_MODE_10GBASE_T,
AXGBE_PORT_MODE_10GBASE_R,
AXGBE_PORT_MODE_SFP,
+   AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG,
AXGBE_PORT_MODE_MAX,
 };
 
@@ -885,6 +886,7 @@ static enum axgbe_mode axgbe_phy_an73_redrv_outcome(struct 
axgbe_port *pdata)
if (ad_reg & 0x80) {
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
mode = AXGBE_MODE_KR;
break;
default:
@@ -894,6 +896,7 @@ static enum axgbe_mode axgbe_phy_an73_redrv_outcome(struct 
axgbe_port *pdata)
} else if (ad_reg & 0x20) {
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
mode = AXGBE_MODE_KX_1000;
break;
case AXGBE_PORT_MODE_1000BASE_X:
@@ -1052,6 +1055,7 @@ static unsigned int axgbe_phy_an_advertising(struct 
axgbe_port *pdata)
 
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
advertising |= ADVERTISED_1baseKR_Full;
break;
case AXGBE_PORT_MODE_BACKPLANE_2500:
@@ -1122,6 +1126,7 @@ static enum axgbe_an_mode axgbe_phy_an_mode(struct 
axgbe_port *pdata)
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
return AXGBE_AN_MODE_CL73;
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
case AXGBE_PORT_MODE_BACKPLANE_2500:
return AXGBE_AN_MODE_NONE;
case AXGBE_PORT_MODE_1000BASE_T:
@@ -1400,6 +1405,7 @@ static enum axgbe_mode axgbe_phy_switch_mode(struct 
axgbe_port *pdata)
 
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
return axgbe_phy_switch_bp_mode(pdata);
case AXGBE_PORT_MODE_BACKPLANE_2500:
return axgbe_phy_switch_bp_2500_mode(pdata);
@@ -1495,6 +1501,7 @@ static enum axgbe_mode axgbe_phy_get_mode(struct 
axgbe_port *pdata,
 
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
return axgbe_phy_get_bp_mode(speed);
case AXGBE_PORT_MODE_BACKPLANE_2500:
return axgbe_phy_get_bp_2500_mode(speed);
@@ -1644,6 +1651,7 @@ static bool axgbe_phy_use_mode(struct axgbe_port *pdata, 
enum axgbe_mode mode)
 
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
return axgbe_phy_use_bp_mode(pdata, mode);
case AXGBE_PORT_MODE_BACKPLANE_2500:
return axgbe_phy_use_bp_2500_mode(pdata, mode); @@ -1806,6 
+1814,7 @@ static bool axgbe_phy_port_mode_mismatch(struct axgbe_port *pdata)
 
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
if ((phy_data->port_speeds & AXGBE_PHY_PORT_SPEED_1000) ||
(phy_data->port_speeds & AXGBE_PHY_PORT_SPEED_1))
return false;
@@ -1858,6 +1867,7 @@ static bool axgbe_phy_conn_type_mismatch(struct 
axgbe_port *pdata)
 
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
case AXGBE_PORT_MODE_BACKPLANE_2500:
if (phy_data->conn_type == AXGBE_CONN_TYPE_BACKPLANE)
return false;
@@ -2122,6 +2132,8 @@ static int axgbe_phy_init(struct axgbe_port *pdata)
/* Backplane support */
case AXGBE_PORT_MODE_BACKPLANE:
pdata->phy.supported |= SUPPORTED_Autoneg;
+   /* Fallthrough */
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
pdata->phy.supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
pdata->phy.supported |= SUPPORTED_Backplane;
if (phy_data->port_speeds & AXGBE_PHY_PORT_SPEED_1000) {
--
2.25.1


RE: [PATCH v2 4/6] net/axgbe: reset PHY Rx when mailbox command timeout

2022-01-27 Thread Namburu, Chandu-babu
[Public]

Acked-by: Chandubabu Namburu 

-Original Message-
From: sseba...@amd.com  
Sent: Tuesday, January 25, 2022 5:48 PM
To: dev@dpdk.org
Subject: [PATCH v2 4/6] net/axgbe: reset PHY Rx when mailbox command timeout

From: Selwin Sebastian 

Sometimes mailbox commands time out when the RX data path becomes unresponsive. 
This prevents the submission of new mailbox commands to DXIO. This patch 
identifies the timeout and resets the RX data path so that the next message can 
be submitted properly.

Signed-off-by: Selwin Sebastian 
---
 drivers/net/axgbe/axgbe_common.h   | 14 ++
 drivers/net/axgbe/axgbe_phy_impl.c | 29 -
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/net/axgbe/axgbe_common.h b/drivers/net/axgbe/axgbe_common.h
index 5a7ac35b6a..a5431dd998 100644
--- a/drivers/net/axgbe/axgbe_common.h
+++ b/drivers/net/axgbe/axgbe_common.h
@@ -1270,10 +1270,18 @@
 #define MDIO_PMA_10GBR_FECCTRL 0x00ab
 #endif
 
+#ifndef MDIO_PMA_RX_CTRL1
+#define MDIO_PMA_RX_CTRL1  0x8051
+#endif
+
 #ifndef MDIO_PCS_DIG_CTRL
 #define MDIO_PCS_DIG_CTRL  0x8000
 #endif
 
+#ifndef MDIO_PCS_DIGITAL_STAT
+#define MDIO_PCS_DIGITAL_STAT  0x8010
+#endif
+
 #ifndef MDIO_AN_XNP
 #define MDIO_AN_XNP0x0016
 #endif
@@ -1354,6 +1362,8 @@
 #define AXGBE_KR_TRAINING_ENABLE   BIT(1)
 
 #define AXGBE_PCS_CL37_BP  BIT(12)
+#define XGBE_PCS_PSEQ_STATE_MASK   0x1c
+#define XGBE_PCS_PSEQ_STATE_POWER_GOOD 0x10
 
 #define AXGBE_AN_CL37_INT_CMPLTBIT(0)
 #define AXGBE_AN_CL37_INT_MASK 0x01
@@ -1401,6 +1411,10 @@ static inline uint32_t high32_value(uint64_t addr)
 #define XGBE_PMA_PLL_CTRL_SET  BIT(15)
 #define XGBE_PMA_PLL_CTRL_CLEAR0x
 
+#define XGBE_PMA_RX_RST_0_MASK BIT(4)
+#define XGBE_PMA_RX_RST_0_RESET_ON 0x10
+#define XGBE_PMA_RX_RST_0_RESET_OFF0x00
+
 /*END*/
 
 /* Bit setting and getting macros
diff --git a/drivers/net/axgbe/axgbe_phy_impl.c 
b/drivers/net/axgbe/axgbe_phy_impl.c
index 2ed94868b8..eefb03e94e 100644
--- a/drivers/net/axgbe/axgbe_phy_impl.c
+++ b/drivers/net/axgbe/axgbe_phy_impl.c
@@ -1196,6 +1196,28 @@ static void axgbe_phy_set_redrv_mode(struct axgbe_port 
*pdata)
axgbe_phy_put_comm_ownership(pdata);
 }
 
+static void axgbe_phy_rx_reset(struct axgbe_port *pdata) {
+   int reg;
+
+   reg = XMDIO_READ_BITS(pdata, MDIO_MMD_PCS, MDIO_PCS_DIGITAL_STAT,
+ XGBE_PCS_PSEQ_STATE_MASK);
+   if (reg == XGBE_PCS_PSEQ_STATE_POWER_GOOD) {
+   /* Mailbox command timed out, reset of RX block is required.
+* This can be done by asserting the reset bit and waiting for
+* its completion.
+*/
+   XMDIO_WRITE_BITS(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_RX_CTRL1,
+XGBE_PMA_RX_RST_0_MASK, 
XGBE_PMA_RX_RST_0_RESET_ON);
+   rte_delay_us(20);
+   XMDIO_WRITE_BITS(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_RX_CTRL1,
+XGBE_PMA_RX_RST_0_MASK, 
XGBE_PMA_RX_RST_0_RESET_OFF);
+   rte_delay_us(45);
+   PMD_DRV_LOG(ERR, "firmware mailbox reset performed\n");
+   }
+}
+
+
 static void axgbe_phy_pll_ctrl(struct axgbe_port *pdata, bool enable)  {
XMDIO_WRITE_BITS(pdata, MDIO_MMD_PMAPMD, MDIO_VEND2_PMA_MISC_CTRL0, @@ 
-1216,8 +1238,10 @@ static void axgbe_phy_perform_ratechange(struct axgbe_port 
*pdata,
axgbe_phy_pll_ctrl(pdata, false);
 
/* Log if a previous command did not complete */
-   if (XP_IOREAD_BITS(pdata, XP_DRIVER_INT_RO, STATUS))
+   if (XP_IOREAD_BITS(pdata, XP_DRIVER_INT_RO, STATUS)) {
PMD_DRV_LOG(NOTICE, "firmware mailbox not ready for command\n");
+   axgbe_phy_rx_reset(pdata);
+   }
 
/* Construct the command */
XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, COMMAND, cmd); @@ -1235,6 +1259,9 
@@ static void axgbe_phy_perform_ratechange(struct axgbe_port *pdata,
goto reenable_pll;
rte_delay_us(1500);
}
+   PMD_DRV_LOG(NOTICE, "firmware mailbox command did not complete\n");
+   /* Reset on error */
+   axgbe_phy_rx_reset(pdata);
 
 reenable_pll:
 /* Re-enable the PLL control */
--
2.25.1


RE: [PATCH v2 3/6] net/axgbe: simplify mailbox interface rate change code

2022-01-27 Thread Namburu, Chandu-babu
[Public]

Acked-by: Chandubabu Namburu 

-Original Message-
From: sseba...@amd.com  
Sent: Tuesday, January 25, 2022 5:48 PM
To: dev@dpdk.org
Subject: [PATCH v2 3/6] net/axgbe: simplify mailbox interface rate change code

From: Selwin Sebastian 

Simplify and centralize the mailbox command rate change interface by having a 
single function perform the writes to the mailbox registers to issue the 
request.

Signed-off-by: Selwin Sebastian 
---
 drivers/net/axgbe/axgbe_phy_impl.c | 95 --
 1 file changed, 23 insertions(+), 72 deletions(-)

diff --git a/drivers/net/axgbe/axgbe_phy_impl.c 
b/drivers/net/axgbe/axgbe_phy_impl.c
index 08d3484a11..2ed94868b8 100644
--- a/drivers/net/axgbe/axgbe_phy_impl.c
+++ b/drivers/net/axgbe/axgbe_phy_impl.c
@@ -1207,21 +1207,26 @@ static void axgbe_phy_pll_ctrl(struct axgbe_port 
*pdata, bool enable)
rte_delay_us(150);
 }
 
-static void axgbe_phy_start_ratechange(struct axgbe_port *pdata)
+static void axgbe_phy_perform_ratechange(struct axgbe_port *pdata,
+   unsigned int cmd, unsigned int sub_cmd)
 {
+   unsigned int s0 = 0;
+   unsigned int wait;
/* Clear the PLL so that it helps in power down sequence */
axgbe_phy_pll_ctrl(pdata, false);
 
/* Log if a previous command did not complete */
if (XP_IOREAD_BITS(pdata, XP_DRIVER_INT_RO, STATUS))
PMD_DRV_LOG(NOTICE, "firmware mailbox not ready for command\n");
-   else
-   return;
-}
 
-static void axgbe_phy_complete_ratechange(struct axgbe_port *pdata) -{
-   unsigned int wait;
+   /* Construct the command */
+   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, COMMAND, cmd);
+   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, sub_cmd);
+
+   /* Issue the command */
+   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_0, s0);
+   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_1, 0);
+   XP_IOWRITE_BITS(pdata, XP_DRIVER_INT_REQ, REQUEST, 1);
 
/* Wait for command to complete */
wait = AXGBE_RATECHANGE_COUNT;
@@ -1240,21 +1245,10 @@ static void axgbe_phy_complete_ratechange(struct 
axgbe_port *pdata)
 
 static void axgbe_phy_rrc(struct axgbe_port *pdata)  {
-   unsigned int s0;
 
-   axgbe_phy_start_ratechange(pdata);
 
/* Receiver Reset Cycle */
-   s0 = 0;
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, COMMAND, 5);
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 0);
-
-   /* Call FW to make the change */
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_0, s0);
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_1, 0);
-   XP_IOWRITE_BITS(pdata, XP_DRIVER_INT_REQ, REQUEST, 1);
-
-   axgbe_phy_complete_ratechange(pdata);
+   axgbe_phy_perform_ratechange(pdata, 5, 0);
 
PMD_DRV_LOG(DEBUG, "receiver reset complete\n");  } @@ -1263,13 +1257,9 
@@ static void axgbe_phy_power_off(struct axgbe_port *pdata)  {
struct axgbe_phy_data *phy_data = pdata->phy_data;
 
-   axgbe_phy_start_ratechange(pdata);
+   /* Power off */
+   axgbe_phy_perform_ratechange(pdata, 0, 0);
 
-   /* Call FW to make the change */
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_0, 0);
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_1, 0);
-   XP_IOWRITE_BITS(pdata, XP_DRIVER_INT_REQ, REQUEST, 1);
-   axgbe_phy_complete_ratechange(pdata);
phy_data->cur_mode = AXGBE_MODE_UNKNOWN;
 
PMD_DRV_LOG(DEBUG, "phy powered off\n"); @@ -1278,31 +1268,21 @@ static 
void axgbe_phy_power_off(struct axgbe_port *pdata)  static void 
axgbe_phy_sfi_mode(struct axgbe_port *pdata)  {
struct axgbe_phy_data *phy_data = pdata->phy_data;
-   unsigned int s0;
 
axgbe_phy_set_redrv_mode(pdata);
 
-   axgbe_phy_start_ratechange(pdata);
-
/* 10G/SFI */
-   s0 = 0;
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, COMMAND, 3);
if (phy_data->sfp_cable != AXGBE_SFP_CABLE_PASSIVE) {
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 0);
+   axgbe_phy_perform_ratechange(pdata, 3, 0);
} else {
if (phy_data->sfp_cable_len <= 1)
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 1);
+   axgbe_phy_perform_ratechange(pdata, 3, 1);
else if (phy_data->sfp_cable_len <= 3)
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 2);
+   axgbe_phy_perform_ratechange(pdata, 3, 2);
else
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 3);
+   axgbe_phy_perform_ratechange(pdata, 3, 3);
}
 
-   /* Call FW to make the change */
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_0, s0);
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_1, 0);
-   XP_IOWRITE_BITS(pdata, XP_DRIVER_INT_REQ, REQUEST, 1);
-   axgbe_phy_complete_ratechange(pdata);
phy_data->cur_mode = AXGBE_MODE_SFI;
 
PMD_DRV_LOG(DEBUG, "10GbE SFI 

RE: [PATCH v2 2/6] net/axgbe: toggle PLL settings during rate change

2022-01-27 Thread Namburu, Chandu-babu

Acked-by: Chandubabu Namburu 

-Original Message-
From: sseba...@amd.com  
Sent: Tuesday, January 25, 2022 5:48 PM
To: dev@dpdk.org
Subject: [PATCH v2 2/6] net/axgbe: toggle PLL settings during rate change

From: Selwin Sebastian 

For each rate change command submission, the FW has to do a phy power off 
sequence internally. For this to happen correctly, the PLL re-initialization 
control setting has to be turned off before sending mailbox commands and 
re-enabled once the command submission is complete. Without toggling the PLL 
control setting, link-up takes longer in a fixed phy configuration.

Signed-off-by: Selwin Sebastian 
---
 drivers/net/axgbe/axgbe_common.h   |  9 +
 drivers/net/axgbe/axgbe_phy_impl.c | 22 --
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/net/axgbe/axgbe_common.h b/drivers/net/axgbe/axgbe_common.h
index df0aa21a9b..5a7ac35b6a 100644
--- a/drivers/net/axgbe/axgbe_common.h
+++ b/drivers/net/axgbe/axgbe_common.h
@@ -1314,6 +1314,11 @@
 #define MDIO_VEND2_PMA_CDR_CONTROL 0x8056
 #endif
 
+#ifndef MDIO_VEND2_PMA_MISC_CTRL0
+#define MDIO_VEND2_PMA_MISC_CTRL0  0x8090
+#endif
+
+
 #ifndef MDIO_CTRL1_SPEED1G
 #define MDIO_CTRL1_SPEED1G (MDIO_CTRL1_SPEED10G & ~BMCR_SPEED100)
 #endif
@@ -1392,6 +1397,10 @@ static inline uint32_t high32_value(uint64_t addr)
return (addr >> 32) & 0x0;
 }
 
+#define XGBE_PMA_PLL_CTRL_MASK BIT(15)
+#define XGBE_PMA_PLL_CTRL_SET  BIT(15)
+#define XGBE_PMA_PLL_CTRL_CLEAR0x
+
 /*END*/
 
 /* Bit setting and getting macros
diff --git a/drivers/net/axgbe/axgbe_phy_impl.c 
b/drivers/net/axgbe/axgbe_phy_impl.c
index 72104f8a3f..08d3484a11 100644
--- a/drivers/net/axgbe/axgbe_phy_impl.c
+++ b/drivers/net/axgbe/axgbe_phy_impl.c
@@ -1196,8 +1196,22 @@ static void axgbe_phy_set_redrv_mode(struct axgbe_port 
*pdata)
axgbe_phy_put_comm_ownership(pdata);
 }
 
+static void axgbe_phy_pll_ctrl(struct axgbe_port *pdata, bool enable) {
+   XMDIO_WRITE_BITS(pdata, MDIO_MMD_PMAPMD, MDIO_VEND2_PMA_MISC_CTRL0,
+   XGBE_PMA_PLL_CTRL_MASK,
+   enable ? XGBE_PMA_PLL_CTRL_SET
+   : XGBE_PMA_PLL_CTRL_CLEAR);
+
+   /* Wait for command to complete */
+   rte_delay_us(150);
+}
+
 static void axgbe_phy_start_ratechange(struct axgbe_port *pdata)  {
+   /* Clear the PLL so that it helps in power down sequence */
+   axgbe_phy_pll_ctrl(pdata, false);
+
/* Log if a previous command did not complete */
if (XP_IOREAD_BITS(pdata, XP_DRIVER_INT_RO, STATUS))
PMD_DRV_LOG(NOTICE, "firmware mailbox not ready for 
command\n"); @@ -1213,10 +1227,14 @@ static void 
axgbe_phy_complete_ratechange(struct axgbe_port *pdata)
wait = AXGBE_RATECHANGE_COUNT;
while (wait--) {
if (!XP_IOREAD_BITS(pdata, XP_DRIVER_INT_RO, STATUS))
-   return;
-
+   goto reenable_pll;
rte_delay_us(1500);
}
+
+reenable_pll:
+/* Re-enable the PLL control */
+   axgbe_phy_pll_ctrl(pdata, true);
+
PMD_DRV_LOG(NOTICE, "firmware mailbox command did not complete\n");  }
 
--
2.25.1


RE: [PATCH v2 1/6] net/axgbe: always attempt link training in KR mode

2022-01-27 Thread Namburu, Chandu-babu

Acked-by: Chandubabu Namburu 

-Original Message-
From: sseba...@amd.com  
Sent: Tuesday, January 25, 2022 5:48 PM
To: dev@dpdk.org
Subject: [PATCH v2 1/6] net/axgbe: always attempt link training in KR mode

From: Selwin Sebastian 

Link training is always attempted when in KR mode, but the code is structured 
to check whether link training has been enabled before attempting to perform 
it. Since that check will always be true, simplify the code to always enable 
and start link training during KR auto-negotiation.

Signed-off-by: Selwin Sebastian 
---
 drivers/net/axgbe/axgbe_mdio.c | 62 --
 1 file changed, 15 insertions(+), 47 deletions(-)

diff --git a/drivers/net/axgbe/axgbe_mdio.c b/drivers/net/axgbe/axgbe_mdio.c 
index 32d8c666f9..913ceada0d 100644
--- a/drivers/net/axgbe/axgbe_mdio.c
+++ b/drivers/net/axgbe/axgbe_mdio.c
@@ -80,31 +80,10 @@ static void axgbe_an_clear_interrupts_all(struct axgbe_port 
*pdata)
axgbe_an37_clear_interrupts(pdata);
 }
 
-static void axgbe_an73_enable_kr_training(struct axgbe_port *pdata) -{
-   unsigned int reg;
-
-   reg = XMDIO_READ(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL);
 
-   reg |= AXGBE_KR_TRAINING_ENABLE;
-   XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL, reg);
-}
-
-static void axgbe_an73_disable_kr_training(struct axgbe_port *pdata) -{
-   unsigned int reg;
-
-   reg = XMDIO_READ(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL);
-
-   reg &= ~AXGBE_KR_TRAINING_ENABLE;
-   XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL, reg);
-}
 
 static void axgbe_kr_mode(struct axgbe_port *pdata)  {
-   /* Enable KR training */
-   axgbe_an73_enable_kr_training(pdata);
-
/* Set MAC to 10G speed */
pdata->hw_if.set_speed(pdata, SPEED_1);
 
@@ -114,9 +93,6 @@ static void axgbe_kr_mode(struct axgbe_port *pdata)
 
 static void axgbe_kx_2500_mode(struct axgbe_port *pdata)  {
-   /* Disable KR training */
-   axgbe_an73_disable_kr_training(pdata);
-
/* Set MAC to 2.5G speed */
pdata->hw_if.set_speed(pdata, SPEED_2500);
 
@@ -126,9 +102,6 @@ static void axgbe_kx_2500_mode(struct axgbe_port *pdata)
 
 static void axgbe_kx_1000_mode(struct axgbe_port *pdata)  {
-   /* Disable KR training */
-   axgbe_an73_disable_kr_training(pdata);
-
/* Set MAC to 1G speed */
pdata->hw_if.set_speed(pdata, SPEED_1000);
 
@@ -142,8 +115,6 @@ static void axgbe_sfi_mode(struct axgbe_port *pdata)
if (pdata->kr_redrv)
return axgbe_kr_mode(pdata);
 
-   /* Disable KR training */
-   axgbe_an73_disable_kr_training(pdata);
 
/* Set MAC to 10G speed */
pdata->hw_if.set_speed(pdata, SPEED_1); @@ -154,8 +125,6 @@ static 
void axgbe_sfi_mode(struct axgbe_port *pdata)
 
 static void axgbe_x_mode(struct axgbe_port *pdata)  {
-   /* Disable KR training */
-   axgbe_an73_disable_kr_training(pdata);
 
/* Set MAC to 1G speed */
pdata->hw_if.set_speed(pdata, SPEED_1000); @@ -166,8 +135,6 @@ static 
void axgbe_x_mode(struct axgbe_port *pdata)
 
 static void axgbe_sgmii_1000_mode(struct axgbe_port *pdata)  {
-   /* Disable KR training */
-   axgbe_an73_disable_kr_training(pdata);
 
/* Set MAC to 1G speed */
pdata->hw_if.set_speed(pdata, SPEED_1000); @@ -178,8 +145,6 @@ static 
void axgbe_sgmii_1000_mode(struct axgbe_port *pdata)
 
 static void axgbe_sgmii_100_mode(struct axgbe_port *pdata)  {
-   /* Disable KR training */
-   axgbe_an73_disable_kr_training(pdata);
 
/* Set MAC to 1G speed */
pdata->hw_if.set_speed(pdata, SPEED_1000); @@ -284,6 +249,12 @@ static 
void axgbe_an73_set(struct axgbe_port *pdata, bool enable,  {
unsigned int reg;
 
+   /* Disable KR training for now */
+   reg = XMDIO_READ(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL);
+   reg &= ~AXGBE_KR_TRAINING_ENABLE;
+   XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL, reg);
+
+   /* Update AN settings */
reg = XMDIO_READ(pdata, MDIO_MMD_AN, MDIO_CTRL1);
reg &= ~MDIO_AN_CTRL1_ENABLE;
 
@@ -379,20 +350,17 @@ static enum axgbe_an axgbe_an73_tx_training(struct 
axgbe_port *pdata,
XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_FECCTRL, reg);
 
/* Start KR training */
-   reg = XMDIO_READ(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL);
-   if (reg & AXGBE_KR_TRAINING_ENABLE) {
-   if (pdata->phy_if.phy_impl.kr_training_pre)
-   pdata->phy_if.phy_impl.kr_training_pre(pdata);
+   if (pdata->phy_if.phy_impl.kr_training_pre)
+   pdata->phy_if.phy_impl.kr_training_pre(pdata);
 
-   reg |= AXGBE_KR_TRAINING_START;
-   XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL,
-   reg);
-
-   PMD_DRV_LOG(DEBUG, "KR training initiated\n");
+   reg = XMDIO_R

Re: [PATCH v2 00/15] fix and feature for hns3 PMD

2022-01-27 Thread Ferruh Yigit

On 1/27/2022 12:49 PM, Ferruh Yigit wrote:

On 1/22/2022 1:51 AM, Min Hu (Connor) wrote:

This series contains 15 patches, which include codecheck warning fixes,
code refactoring, and indirect counter action support.

Chengwen Feng (4):
   net/hns3: remove invalid encapsulation function
   net/hns3: delete strerror invoke
   net/hns3: rename function
   net/hns3: support indirect counter action

Huisong Li (10):
   net/hns3: remove unnecessary assignment
   net/hns3: fix a misjudgment expression
   net/hns3: extract a common API to initialize MAC addrs
   net/hns3: remove unnecessary black lines
   net/hns3: extract a function to handle reset fail
   net/hns3: remove unused variables
   net/hns3: remove the number of queue descriptors
   net/hns3: remove the printing of memory addresses
   net/hns3: extract a common interface to obtain revision ID
   net/hns3: extract functions to create RSS and FDIR flow rule

Jie Hai (1):
   net/hns3: remove unnecessary 'inline'



Hi Connor,

There are some patches in the set that are sent by non maintainers,
for them it requires maintainer ack.
Since you are sending the patch, it implies that you are OK with them
but can you please add your explicit review/ack tags to them?



Implicit ack converted to an explicit one:

For series,
Acked-by: Min Hu (Connor) 

Series applied to dpdk-next-net/main, thanks.


Re: [PATCH v2 1/6] doc: add hugepage mapping details

2022-01-27 Thread Bruce Richardson
On Wed, Jan 19, 2022 at 11:09:12PM +0200, Dmitry Kozlyuk wrote:
> Hugepage mapping is a layer that EAL malloc builds upon.
> There were implicit references to its details,
> like mentions of segment file descriptors,
> but no explicit description of its modes and operation.
> Add an overview of mechanics used on each supported OS.
> Convert memory management subsections from list items
> to level 4 headers: they are big and important enough.
> 
> Signed-off-by: Dmitry Kozlyuk 
> ---

Some good cleanup and doc enhancements here. Some comments inline below.
One could argue that this patch should perhaps be split in two, with the
conversion of bullets to subsections being separate, but personally I think
it's fine having it in one patch as here.

Acked-by: Bruce Richardson 

>  .../prog_guide/env_abstraction_layer.rst  | 95 +--
>  1 file changed, 86 insertions(+), 9 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst 
> b/doc/guides/prog_guide/env_abstraction_layer.rst
> index c6accce701..fede7fe69d 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -86,7 +86,7 @@ See chapter
>  Memory Mapping Discovery and Memory Reservation
>  ~~~
>  
> -The allocation of large contiguous physical memory is done using the 
> hugetlbfs kernel filesystem.
> +The allocation of large contiguous physical memory is done using hugepages.
>  The EAL provides an API to reserve named memory zones in this contiguous 
> memory.
>  The physical address of the reserved memory for that memory zone is also 
> returned to the user by the memory zone reservation API.
>  
> @@ -95,11 +95,13 @@ and legacy mode. Both modes are explained below.
>  
>  .. note::
>  
> -Memory reservations done using the APIs provided by rte_malloc are also 
> backed by pages from the hugetlbfs filesystem.
> +Memory reservations done using the APIs provided by rte_malloc
> +are also backed by hugepages unless ``--no-huge`` option is given.
>  
> -+ Dynamic memory mode
> +Dynamic Memory Mode
> +^^^
>  
> -Currently, this mode is only supported on Linux.
> +Currently, this mode is only supported on Linux and Windows.
>  
>  In this mode, usage of hugepages by DPDK application will grow and shrink 
> based
>  on application's requests. Any memory allocation through ``rte_malloc()``,
> @@ -155,7 +157,8 @@ of memory that can be used by DPDK application.
>  :ref:`Multi-process Support ` for more details 
> about
>  DPDK IPC.
>  
> -+ Legacy memory mode
> +Legacy Memory Mode
> +^^
>  
>  This mode is enabled by specifying ``--legacy-mem`` command-line switch to 
> the
>  EAL. This switch will have no effect on FreeBSD as FreeBSD only supports
> @@ -168,7 +171,8 @@ not allow acquiring or releasing hugepages from the 
> system at runtime.
>  If neither ``-m`` nor ``--socket-mem`` were specified, the entire available
>  hugepage memory will be preallocated.
>  
> -+ Hugepage allocation matching
> +Hugepage Allocation Matching
> +
>  
>  This behavior is enabled by specifying the ``--match-allocations`` 
> command-line
>  switch to the EAL. This switch is Linux-only and not supported with
> @@ -182,7 +186,8 @@ matching can be used by these types of applications to 
> satisfy both of these
>  requirements. This can result in some increased memory usage which is
>  very dependent on the memory allocation patterns of the application.
>  
> -+ 32-bit support
> +32-bit Support
> +^^
>  
>  Additional restrictions are present when running in 32-bit mode. In dynamic
>  memory mode, by default maximum of 2 gigabytes of VA space will be 
> preallocated,
> @@ -192,7 +197,8 @@ used.
>  In legacy mode, VA space will only be preallocated for segments that were
>  requested (plus padding, to keep IOVA-contiguousness).
>  
> -+ Maximum amount of memory
> +Maximum Amount of Memory
> +
>  
>  All possible virtual memory space that can ever be used for hugepage mapping 
> in
>  a DPDK process is preallocated at startup, thereby placing an upper limit on 
> how
> @@ -222,7 +228,77 @@ Normally, these options do not need to be changed.
>  can later be mapped into that preallocated VA space (if dynamic memory 
> mode
>  is enabled), and can optionally be mapped into it at startup.
>  
> -+ Segment file descriptors
> +Hugepage Mapping
> +
> +
> +Below is an overview of methods used for each OS to obtain hugepages,
> +explaining why certain limitations and options exist in EAL.
> +See the user guide for a specific OS for configuration details.
> +
> +FreeBSD uses ``contigmem`` kernel module
> +to reserve a fixed number of hugepages at system start,
> +which are mapped by EAL at initialization using a specific ``sysctl()``.
> +
> +Windows EAL allocates hugepages from the OS as needed using Win32 API,
> +so avail

Re: [PATCH v2 6/6] net/axgbe: alter the port speed bit range

2022-01-27 Thread Ferruh Yigit

On 1/25/2022 4:21 PM, Namburu, Chandu-babu wrote:

[Public]


Moving ack down, please don't top post.



-Original Message-
From: sseba...@amd.com 
Sent: Tuesday, January 25, 2022 5:48 PM
To: dev@dpdk.org
Subject: [PATCH v2 6/6] net/axgbe: alter the port speed bit range

From: Selwin Sebastian 

Newer generation hardware uses slightly different port speed bit widths, so 
alter the existing port speed bit range to extend support to the newer 
generation hardware while maintaining backward compatibility with older 
generation hardware.

The previously reserved bits are now being used, which requires an adjustment 
to the BIT values, e.g.:

Before:
PORT_PROPERTY_0[22:21] - Reserved
PORT_PROPERTY_0[26:23] - Supported Speeds

After:
PORT_PROPERTY_0[21] - Reserved
PORT_PROPERTY_0[26:22] - Supported Speeds

To make this backwards compatible, the existing BIT definitions for the port 
speeds are incremented by one to maintain the original position.

Signed-off-by: Selwin Sebastian 

For series,
Acked-by: Chandubabu Namburu 



Series applied to dpdk-next-net/main, thanks.


[PATCH 0/5] Add JSON vector set support to fips validation

2022-01-27 Thread Brandon Lo
Adds very basic support for JSON vector sets to
the fips validation example application. This patch set
only introduces the AES-GCM test using a JSON request
file, because the other algorithms need more information
than is given in the new JSON format.

Brandon Lo (5):
  examples/fips_validation: add jansson dependency
  examples/fips_validation: add json info to header
  examples/fips_validation: add json parsing
  examples/fips_validation: allow json file as input
  examples/fips_validation: add json to gcm test

 examples/fips_validation/fips_validation.c|  84 
 examples/fips_validation/fips_validation.h|  42 +++-
 .../fips_validation/fips_validation_gcm.c | 149 ++
 examples/fips_validation/main.c   | 192 +-
 examples/fips_validation/meson.build  |   4 +
 5 files changed, 467 insertions(+), 4 deletions(-)

-- 
2.25.1



[PATCH 1/5] examples/fips_validation: add jansson dependency

2022-01-27 Thread Brandon Lo
Added a check for RTE_HAS_JANSSON into the meson
configuration file for JSON support.

Signed-off-by: Brandon Lo 
---
 examples/fips_validation/meson.build | 4 
 1 file changed, 4 insertions(+)

diff --git a/examples/fips_validation/meson.build 
b/examples/fips_validation/meson.build
index 7eef456318..8cd63066b5 100644
--- a/examples/fips_validation/meson.build
+++ b/examples/fips_validation/meson.build
@@ -21,3 +21,7 @@ sources = files(
 'fips_dev_self_test.c',
 'main.c',
 )
+
+if dpdk_conf.has('RTE_HAS_JANSSON')
+ext_deps += jansson_dep
+endif
-- 
2.25.1



[PATCH 2/5] examples/fips_validation: add json info to header

2022-01-27 Thread Brandon Lo
Added json-specific functions and other information needed to
test the new FIPS test vectors.

Signed-off-by: Brandon Lo 
---
 examples/fips_validation/fips_validation.h | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/examples/fips_validation/fips_validation.h 
b/examples/fips_validation/fips_validation.h
index aaadf01ba8..e2789df93a 100644
--- a/examples/fips_validation/fips_validation.h
+++ b/examples/fips_validation/fips_validation.h
@@ -5,6 +5,10 @@
 #ifndef _FIPS_VALIDATION_H_
 #define _FIPS_VALIDATION_H_
 
+#ifdef RTE_HAS_JANSSON
+#include <jansson.h>
+#endif /* RTE_HAS_JANSSON */
+
 #define FIPS_PARSE_ERR(fmt, args)  \
RTE_LOG(ERR, USER1, "FIPS parse error" ## fmt ## "\n", ## args)
 
@@ -24,6 +28,9 @@
 #define REQ_FILE_PERFIX"req"
 #define RSP_FILE_PERFIX"rsp"
 #define FAX_FILE_PERFIX"fax"
+#define JSON_FILE_PERFIX   "json"
+
+#define ACVVERSION "1.0"
 
 enum fips_test_algorithms {
FIPS_TEST_ALGO_AES = 0,
@@ -40,7 +47,8 @@ enum fips_test_algorithms {
 enum file_types {
FIPS_TYPE_REQ = 1,
FIPS_TYPE_FAX,
-   FIPS_TYPE_RSP
+   FIPS_TYPE_RSP,
+   FIPS_TYPE_JSON,
 };
 
 enum fips_test_op {
@@ -161,6 +169,23 @@ struct gcm_interim_data {
uint8_t gen_iv;
 };
 
+#ifdef RTE_HAS_JANSSON
+struct fips_test_json_info {
+   /* Information used for reading from json */
+   json_t *json_root;
+   json_t *json_vector_set;
+   json_t *json_test_group;
+   json_t *json_test_case;
+   /* Location of json write output */
+   json_t *json_write_root;
+   json_t *json_write_group;
+   json_t *json_write_set;
+   json_t *json_write_case;
+   /* Other info */
+   uint8_t is_sample;
+};
+#endif /* RTE_HAS_JANSSON */
+
 struct fips_test_interim_info {
FILE *fp_rd;
FILE *fp_wr;
@@ -196,6 +221,10 @@ struct fips_test_interim_info {
 extern struct fips_test_vector vec;
 extern struct fips_test_interim_info info;
 
+#ifdef RTE_HAS_JANSSON
+extern struct fips_test_json_info json_info;
+#endif /* RTE_HAS_JANSSON */
+
 int
 fips_test_init(const char *req_file_path, const char *rsp_file_path,
const char *device_name);
@@ -212,6 +241,17 @@ fips_test_parse_one_case(void);
 void
 fips_test_write_one_case(void);
 
+#ifdef RTE_HAS_JANSSON
+int
+fips_test_parse_one_json_vector_set(void);
+
+int
+fips_test_parse_one_json_group(void);
+
+int
+fips_test_parse_one_json_case(void);
+#endif /* RTE_HAS_JANSSON */
+
 int
 parse_test_aes_init(void);
 
-- 
2.25.1



[PATCH 3/5] examples/fips_validation: add json parsing

2022-01-27 Thread Brandon Lo
Added functions to parse the required information from a vector set
given in the new json format.

Signed-off-by: Brandon Lo 
---
 examples/fips_validation/fips_validation.c | 84 ++
 1 file changed, 84 insertions(+)

diff --git a/examples/fips_validation/fips_validation.c 
b/examples/fips_validation/fips_validation.c
index 52a7bf952d..40254a9181 100644
--- a/examples/fips_validation/fips_validation.c
+++ b/examples/fips_validation/fips_validation.c
@@ -276,6 +276,8 @@ parse_file_type(const char *path)
info.file_type = FIPS_TYPE_RSP;
else if (strstr(path, FAX_FILE_PERFIX))
info.file_type = FIPS_TYPE_FAX;
+   else if (strstr(path, JSON_FILE_PERFIX))
+   info.file_type = FIPS_TYPE_JSON;
else
return -EINVAL;
 
@@ -311,6 +313,21 @@ fips_test_init(const char *req_file_path, const char 
*rsp_file_path,
return -EINVAL;
}
 
+   if (info.file_type == FIPS_TYPE_JSON) {
+#ifdef RTE_HAS_JANSSON
+   json_error_t error;
+   json_info.json_root = json_loadf(info.fp_rd, 0, &error);
+   if (!json_info.json_root) {
+   RTE_LOG(ERR, USER1, "Cannot parse json file %s (line 
%d, column %d)\n",
+   req_file_path, error.line, error.column);
+   return -EINVAL;
+   }
+#else /* RTE_HAS_JANSSON */
+   RTE_LOG(ERR, USER1, "No json library configured.\n");
+   return -EINVAL;
+#endif /* RTE_HAS_JANSSON */
+   }
+
info.fp_wr = fopen(rsp_file_path, "w");
if (!info.fp_wr) {
RTE_LOG(ERR, USER1, "Cannot open file %s\n", rsp_file_path);
@@ -329,6 +346,8 @@ fips_test_init(const char *req_file_path, const char 
*rsp_file_path,
return -EINVAL;
}
 
+   if (info.file_type == FIPS_TYPE_JSON) return 0;
+
if (fips_test_parse_header() < 0) {
RTE_LOG(ERR, USER1, "Failed parsing header\n");
return -1;
@@ -429,6 +448,71 @@ fips_test_write_one_case(void)
fprintf(info.fp_wr, "%s\n", info.vec[i]);
 }
 
+#ifdef RTE_HAS_JANSSON
+int
+fips_test_parse_one_json_vector_set(void)
+{
+   json_t *algo_obj = json_object_get(json_info.json_vector_set, 
"algorithm");
+   const char *algo_str = json_string_value(algo_obj);
+
+   /* Vector sets contain the algorithm type, and nothing else we need. */
+   if (strstr(algo_str, "AES-GCM")) info.algo = FIPS_TEST_ALGO_AES_GCM;
+   else return -EINVAL;
+
+   return 0;
+}
+
+int
+fips_test_parse_one_json_group(void)
+{
+   int ret;
+
+   if (info.interim_callbacks) {
+   char json_value[256];
+   for (int i = 0; info.interim_callbacks[i].key != NULL; i++) {
+   json_t *param = 
json_object_get(json_info.json_test_group, info.interim_callbacks[i].key);
+   json_int_t val = json_integer_value(param);
+   sprintf(json_value, "%lld", val);
+   /* First argument is blank because the key
+  is not included in the string being parsed. */
+   ret = info.interim_callbacks[i].cb(
+   "", json_value,
+   info.interim_callbacks[i].val
+   );
+   if (ret < 0)
+   return ret;
+   }
+   }
+
+   return 0;
+}
+
+int
+fips_test_parse_one_json_case(void)
+{
+   uint32_t i;
+   int ret = 0;
+
+   for (i = 0; info.callbacks[i].key != NULL; i++) {
+   json_t *param = json_object_get(json_info.json_test_case, 
info.callbacks[i].key);
+   if (param) {
+   const char *json_string = json_string_value(param);
+   strcpy(info.one_line_text, json_string);
+   /* First argument is blank because the key
+  is not included in the string being parsed. */
+   ret = info.callbacks[i].cb(
+   "", info.one_line_text,
+   info.callbacks[i].val
+   );
+   if (ret < 0)
+   return ret;
+   }
+   }
+
+   return 0;
+}
+#endif /* RTE_HAS_JANSSON */
+
 static int
 parser_read_uint64_hex(uint64_t *value, const char *p)
 {
-- 
2.25.1



[PATCH 4/5] examples/fips_validation: allow json file as input

2022-01-27 Thread Brandon Lo
Added the ability to use the json format as the input
and output of the example application.

Signed-off-by: Brandon Lo 
---
 examples/fips_validation/main.c | 192 +++-
 1 file changed, 189 insertions(+), 3 deletions(-)

diff --git a/examples/fips_validation/main.c b/examples/fips_validation/main.c
index dc40bffe7d..40d0d10ec7 100644
--- a/examples/fips_validation/main.c
+++ b/examples/fips_validation/main.c
@@ -34,11 +34,17 @@ enum {
OPT_CRYPTODEV_BK_ID_NUM,
 #define OPT_CRYPTODEV_BK_DIR_KEY"broken-test-dir"
OPT_CRYPTODEV_BK_DIR_KEY_NUM,
+#define OPT_USE_JSON"use-json"
+   OPT_USE_JSON_NUM,
 };
 
 struct fips_test_vector vec;
 struct fips_test_interim_info info;
 
+#ifdef RTE_HAS_JANSSON
+struct fips_test_json_info json_info;
+#endif /* RTE_HAS_JANSSON */
+
 struct cryptodev_fips_validate_env {
const char *req_path;
const char *rsp_path;
@@ -169,6 +175,11 @@ cryptodev_fips_validate_app_uninit(void)
 static int
 fips_test_one_file(void);
 
+#ifdef RTE_HAS_JANSSON
+static int
+fips_test_one_json_file(void);
+#endif /* RTE_HAS_JANSSON */
+
 static int
 parse_cryptodev_arg(char *arg)
 {
@@ -392,6 +403,7 @@ int
 main(int argc, char *argv[])
 {
int ret;
+   char use_json;
 
ret = rte_eal_init(argc, argv);
if (ret < 0) {
@@ -427,9 +439,16 @@ main(int argc, char *argv[])
ret, env.req_path);
goto exit;
}
+   use_json = info.file_type == FIPS_TYPE_JSON;
 
-
+#ifdef RTE_HAS_JANSSON
+   ret = info.file_type == FIPS_TYPE_JSON ?
+   fips_test_one_json_file() : fips_test_one_file();
+   if (use_json) json_decref(json_info.json_root);
+#else /* RTE_HAS_JANSSON */
ret = fips_test_one_file();
+#endif /* RTE_HAS_JANSSON */
+
if (ret < 0) {
RTE_LOG(ERR, USER1, "Error %i: Failed test %s\n",
ret, env.req_path);
@@ -483,8 +502,19 @@ main(int argc, char *argv[])
ret, req_path);
break;
}
+   use_json = info.file_type == FIPS_TYPE_JSON;
 
+#ifdef RTE_HAS_JANSSON
+   if (use_json) {
+   ret = fips_test_one_json_file();
+   json_decref(json_info.json_root);
+   } else {
+   ret = fips_test_one_file();
+   }
+#else /* RTE_HAS_JANSSON */
ret = fips_test_one_file();
+#endif /* RTE_HAS_JANSSON */
+
if (ret < 0) {
RTE_LOG(ERR, USER1, "Error %i: Failed test 
%s\n",
ret, req_path);
@@ -1226,7 +1256,7 @@ fips_generic_test(void)
struct fips_val val = {NULL, 0};
int ret;
 
-   fips_test_write_one_case();
+   if (info.file_type != FIPS_TYPE_JSON) fips_test_write_one_case();
 
ret = fips_run_test();
if (ret < 0) {
@@ -1245,6 +1275,7 @@ fips_generic_test(void)
switch (info.file_type) {
case FIPS_TYPE_REQ:
case FIPS_TYPE_RSP:
+   case FIPS_TYPE_JSON:
if (info.parse_writeback == NULL)
return -EPERM;
ret = info.parse_writeback(&val);
@@ -1260,7 +1291,7 @@ fips_generic_test(void)
break;
}
 
-   fprintf(info.fp_wr, "\n");
+   if (info.file_type != FIPS_TYPE_JSON) fprintf(info.fp_wr, "\n");
free(val.val);
 
return 0;
@@ -1856,3 +1887,158 @@ fips_test_one_file(void)
 
return ret;
 }
+
+#ifdef RTE_HAS_JANSSON
+static int
+fips_test_json_init_writeback(void)
+{
+   json_t *session_info, *session_write;
+   session_info = json_array_get(json_info.json_root, 0);
+   session_write = json_object();
+   json_info.json_write_root = json_array();
+
+   json_object_set(session_write, "jwt",
+   json_object_get(session_info, "jwt"));
+   json_object_set(session_write, "url",
+   json_object_get(session_info, "url"));
+   json_object_set(session_write, "isSample",
+   json_object_get(session_info, "isSample"));
+
+   json_info.is_sample = json_boolean_value(
+   json_object_get(session_info, "isSample"));
+
+   json_array_append_new(json_info.json_write_root, session_write);
+   return 0;
+}
+
+static int
+fips_test_one_test_case(void)
+{
+   int ret;
+
+   ret = fips_test_parse_one_json_case();
+
+   switch (ret) {
+   case 0:
+   ret = test_ops.test();
+   if (ret == 0)
+   break;
+   RTE_LOG(ERR, USER1, "Error %i: test block\n",
+   ret);
+   break;
+   case 

[PATCH 5/5] examples/fips_validation: add json to gcm test

2022-01-27 Thread Brandon Lo
Adds json-specific testing and writeback functions, allowing
the user to test AES-GCM vector sets.

Signed-off-by: Brandon Lo 
---
 .../fips_validation/fips_validation_gcm.c | 149 ++
 1 file changed, 149 insertions(+)

diff --git a/examples/fips_validation/fips_validation_gcm.c 
b/examples/fips_validation/fips_validation_gcm.c
index 250d09bf90..4df20370b6 100644
--- a/examples/fips_validation/fips_validation_gcm.c
+++ b/examples/fips_validation/fips_validation_gcm.c
@@ -6,6 +6,10 @@
 #include 
 #include 
 
+#ifdef RTE_HAS_JANSSON
+#include <jansson.h>
+#endif /* RTE_HAS_JANSSON */
+
 #include 
 #include 
 
@@ -37,6 +41,27 @@
 #define OP_ENC_EXT_STR "ExtIV"
 #define OP_ENC_INT_STR "IntIV"
 
+#define KEYLEN_JSON_STR"keyLen"
+#define IVLEN_JSON_STR "ivLen"
+#define PAYLOADLEN_JSON_STR"payloadLen"
+#define AADLEN_JSON_STR"aadLen"
+#define TAGLEN_JSON_STR"tagLen"
+
+#define KEY_JSON_STR   "key"
+#define IV_JSON_STR"iv"
+#define PT_JSON_STR"pt"
+#define CT_JSON_STR"ct"
+#define AAD_JSON_STR   "aad"
+#define TAG_JSON_STR   "tag"
+#define DIR_JSON_STR   "direction"
+
+#define OP_ENC_JSON_STR"encrypt"
+#define OP_DEC_JSON_STR"decrypt"
+
+#define IVGEN_JSON_STR "ivGen"
+#define OP_ENC_EXT_JSON_STR"external"
+#define OP_ENC_INT_JSON_STR"internal"
+
 #define NEG_TEST_STR   "FAIL"
 
 /**
@@ -136,6 +161,40 @@ struct fips_test_callback gcm_enc_vectors[] = {
{NULL, NULL, NULL} /**< end pointer */
 };
 
+#ifdef RTE_HAS_JANSSON
+struct fips_test_callback gcm_dec_json_vectors[] = {
+   {KEY_JSON_STR, parse_uint8_known_len_hex_str, &vec.aead.key},
+   {IV_JSON_STR, parse_uint8_known_len_hex_str, &vec.iv},
+   {CT_JSON_STR, parse_gcm_pt_ct_str, &vec.ct},
+   {AAD_JSON_STR, parse_gcm_aad_str, &vec.aead.aad},
+   {TAG_JSON_STR, parse_uint8_known_len_hex_str,
+   &vec.aead.digest},
+   {NULL, NULL, NULL} /**< end pointer */
+};
+
+struct fips_test_callback gcm_interim_json_vectors[] = {
+   {KEYLEN_JSON_STR, parser_read_uint32_bit_val, &vec.aead.key},
+   {IVLEN_JSON_STR, parser_read_uint32_bit_val, &vec.iv},
+   {PAYLOADLEN_JSON_STR, parser_read_gcm_pt_len, &vec.pt},
+   {PAYLOADLEN_JSON_STR, parser_read_uint32_bit_val, &vec.ct},
+   /**< The NIST json test vectors use 'payloadLen' to denote 
input text
+*  length in case of decrypt & encrypt operations.
+*/
+   {AADLEN_JSON_STR, parser_read_uint32_bit_val, &vec.aead.aad},
+   {TAGLEN_JSON_STR, parser_read_uint32_bit_val,
+   &vec.aead.digest},
+   {NULL, NULL, NULL} /**< end pointer */
+};
+
+struct fips_test_callback gcm_enc_json_vectors[] = {
+   {KEY_JSON_STR, parse_uint8_known_len_hex_str, &vec.aead.key},
+   {IV_JSON_STR, parse_uint8_known_len_hex_str, &vec.iv},
+   {PT_JSON_STR, parse_gcm_pt_ct_str, &vec.pt},
+   {AAD_JSON_STR, parse_gcm_aad_str, &vec.aead.aad},
+   {NULL, NULL, NULL} /**< end pointer */
+};
+#endif /* RTE_HAS_JANSSON */
+
 static int
 parse_test_gcm_writeback(struct fips_val *val)
 {
@@ -188,12 +247,102 @@ parse_test_gcm_writeback(struct fips_val *val)
return 0;
 }
 
+#ifdef RTE_HAS_JANSSON
+static int
+parse_test_gcm_json_writeback(struct fips_val *val)
+{
+   struct fips_val tmp_val;
+   json_t *tcId, *tag;
+
+   tcId = json_object_get(json_info.json_test_case, "tcId");
+
+   json_info.json_write_case = json_object();
+   json_object_set(json_info.json_write_case, "tcId", tcId);
+
+   if (info.op == FIPS_TEST_ENC_AUTH_GEN) {
+   json_t *ct;
+
+   tmp_val.val = val->val;
+   tmp_val.len = vec.pt.len;
+
+   writeback_hex_str("", info.one_line_text, &tmp_val);
+   ct = json_string(info.one_line_text);
+   json_object_set_new(json_info.json_write_case, CT_JSON_STR, ct);
+
+   if (info.interim_info.gcm_data.gen_iv) {
+   json_t *iv;
+   tmp_val.val = vec.iv.val;
+   tmp_val.len = vec.iv.len;
+
+   writeback_hex_str("", info.one_line_text, &tmp_val);
+   iv = json_string(info.one_line_text);
+   json_object_set_new(json_info.json_write_case, IV_JSON_STR, iv);
+
+   rte_free(vec.iv.val);
+   vec.iv.val = NULL;
+   }
+
+   tmp_val.val = val->val + vec.pt.len;
+   tmp_val.len = val->len - vec.pt.len;
+
+   writeback_hex_str("", info.one_line_text, &tmp_val);
+   tag = json_string(info.one_line_text);
+   json_object_set_new(json_info.json_write_case, TAG_JSON_STR, 
tag)

[PATCH 0/5] vhost: introduce per-virtqueue stats API

2022-01-27 Thread Maxime Coquelin
This series introduces a new Vhost API that provides
per-virtqueue statistics to the application. It will be
generally useful, but the initial motivation for this series
was to be able to get virtqueue stats when the Virtio RSS
feature is supported in the Vhost library.

First patch is a fix that should be considered even if the
series does not make it in v22.03. Second patch introduces
the new API and generic statistics. Patch 3 makes use of
this API in Vhost PMD. The two last patches introduce more
specific counters (syscalls in DP, IOTLB cache hits and
misses).

I understand we are late in the v22.03 release cycle, so the
series may be postponed to the next release, even though it
introduces little risk of regression. At least patch 1 has to
be considered for this release.

Maxime Coquelin (5):
  vhost: fix missing virtqueue lock protection
  vhost: add per-virtqueue statistics support
  net/vhost: move to Vhost library stats API
  vhost: add statistics for guest notifications
  vhost: add statistics for IOTLB

 drivers/net/vhost/rte_eth_vhost.c | 348 +++---
 lib/vhost/rte_vhost.h |  89 
 lib/vhost/socket.c|   4 +-
 lib/vhost/version.map |   5 +
 lib/vhost/vhost.c | 124 ++-
 lib/vhost/vhost.h |  26 ++-
 lib/vhost/virtio_net.c|  53 +
 7 files changed, 416 insertions(+), 233 deletions(-)

-- 
2.34.1



[PATCH 1/5] vhost: fix missing virtqueue lock protection

2022-01-27 Thread Maxime Coquelin
This patch ensures virtqueue metadata are not being
modified while rte_vhost_vring_call() is executed.

Fixes: 6c299bb7322f ("vhost: introduce vring call API")
Cc: sta...@dpdk.org

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index f59ca6c157..42c01abf25 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -1294,11 +1294,15 @@ rte_vhost_vring_call(int vid, uint16_t vring_idx)
if (!vq)
return -1;
 
+   rte_spinlock_lock(&vq->access_lock);
+
if (vq_is_packed(dev))
vhost_vring_call_packed(dev, vq);
else
vhost_vring_call_split(dev, vq);
 
+   rte_spinlock_unlock(&vq->access_lock);
+
return 0;
 }
 
-- 
2.34.1



[PATCH 2/5] vhost: add per-virtqueue statistics support

2022-01-27 Thread Maxime Coquelin
This patch introduces new APIs for the application
to query and reset per-virtqueue statistics. The
patch also introduces generic counters.

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/rte_vhost.h  |  89 +
 lib/vhost/socket.c |   4 +-
 lib/vhost/version.map  |   5 ++
 lib/vhost/vhost.c  | 109 -
 lib/vhost/vhost.h  |  18 ++-
 lib/vhost/virtio_net.c |  53 
 6 files changed, 274 insertions(+), 4 deletions(-)

diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index b454c05868..e739091ca0 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -37,6 +37,7 @@ extern "C" {
 #define RTE_VHOST_USER_LINEARBUF_SUPPORT   (1ULL << 6)
 #define RTE_VHOST_USER_ASYNC_COPY  (1ULL << 7)
 #define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS  (1ULL << 8)
+#define RTE_VHOST_USER_NET_STATS_ENABLE(1ULL << 9)
 
 /* Features. */
 #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE
@@ -317,6 +318,32 @@ struct rte_vhost_power_monitor_cond {
uint8_t match;
 };
 
+/** Maximum name length for the statistics counters */
+#define RTE_VHOST_STATS_NAME_SIZE 64
+
+/**
+ * Vhost virtqueue statistics structure
+ *
+ * This structure is used by rte_vhost_vring_stats_get() to provide
+ * virtqueue statistics to the calling application.
+ * It maps a name ID, corresponding to an index in the array returned
+ * by rte_vhost_vring_stats_get_names(), to a statistic value.
+ */
+struct rte_vhost_stat {
+   uint64_t id;/**< The index in xstats name array. */
+   uint64_t value; /**< The statistic counter value. */
+};
+
+/**
+ * Vhost virtqueue statistic name element
+ *
+ * This structure is used by rte_vhost_vring_stats_get_names() to
+ * provide virtqueue statistics names to the calling application.
+ */
+struct rte_vhost_stat_name {
+   char name[RTE_VHOST_STATS_NAME_SIZE]; /**< The statistic name. */
+};
+
 /**
  * Convert guest physical address to host virtual address
  *
@@ -1059,6 +1086,68 @@ __rte_experimental
 int
 rte_vhost_slave_config_change(int vid, bool need_reply);
 
+/**
+ * Retrieve names of statistics of a Vhost virtqueue.
+ *
+ * There is an assumption that 'stat_names' and 'stats' arrays are matched
+ * by array index: stats_names[i].name => stats[i].value
+ *
+ * @param vid
+ *   vhost device ID
+ * @param queue_id
+ *   vhost queue index
+ * @param stats_names
+ *   array of at least size elements to be filled.
+ *   If set to NULL, the function returns the required number of elements.
+ * @param size
+ *   The number of elements in stats_names array.
+ * @return
+ *   A negative value on error, otherwise the number of entries filled in the
+ *   stats name array.
+ */
+__rte_experimental
+int
+rte_vhost_vring_stats_get_names(int vid, uint16_t queue_id,
+   struct rte_vhost_stat_name *name, unsigned int size);
+
+/**
+ * Retrieve statistics of a Vhost virtqueue.
+ *
+ * There is an assumption that 'stat_names' and 'stats' arrays are matched
+ * by array index: stats_names[i].name => stats[i].value
+ *
+ * @param vid
+ *   vhost device ID
+ * @param queue_id
+ *   vhost queue index
+ * @param stats
+ *   A pointer to a table of structure of type rte_vhost_stat to be filled with
+ *   virtqueue statistics ids and values.
+ * @param n
+ *   The number of elements in stats array.
+ * @return
+ *   A negative value on error, otherwise the number of entries filled in the
+ *   stats table.
+ */
+__rte_experimental
+int
+rte_vhost_vring_stats_get(int vid, uint16_t queue_id,
+   struct rte_vhost_stat *stats, unsigned int n);
+
+/**
+ * Reset statistics of a Vhost virtqueue.
+ *
+ * @param vid
+ *   vhost device ID
+ * @param queue_id
+ *   vhost queue index
+ * @return
+ *   0 on success, a negative value on error.
+ */
+__rte_experimental
+int
+rte_vhost_vring_stats_reset(int vid, uint16_t queue_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index c2f8013cd5..6020565fb6 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -43,6 +43,7 @@ struct vhost_user_socket {
bool linearbuf;
bool async_copy;
bool net_compliant_ol_flags;
+   bool stats_enabled;
 
/*
 * The "supported_features" indicates the feature bits the
@@ -228,7 +229,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket 
*vsocket)
vhost_set_ifname(vid, vsocket->path, size);
 
vhost_setup_virtio_net(vid, vsocket->use_builtin_virtio_net,
-   vsocket->net_compliant_ol_flags);
+   vsocket->net_compliant_ol_flags, vsocket->stats_enabled);
 
vhost_attach_vdpa_device(vid, vsocket->vdpa_dev);
 
@@ -864,6 +865,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
vsocket->net_compliant_ol_flags = flags & 
RTE_VHOST_U

[PATCH 4/5] vhost: add statistics for guest notifications

2022-01-27 Thread Maxime Coquelin
This patch adds a new virtqueue statistic for guest
notifications. It is useful for deducing, from the hypervisor
side, whether the corresponding guest Virtio device is using
the Kernel Virtio-net driver or the DPDK Virtio PMD.

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost.c | 1 +
 lib/vhost/vhost.h | 5 +
 2 files changed, 6 insertions(+)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 0c6a737aca..2d0d9e7f51 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -46,6 +46,7 @@ static const struct vhost_vq_stats_name_off vhost_vq_stat_strings[] = {
	{"size_512_1023_packets",  offsetof(struct vhost_virtqueue, stats.size_bins[5])},
	{"size_1024_1518_packets", offsetof(struct vhost_virtqueue, stats.size_bins[6])},
	{"size_1519_max_packets",  offsetof(struct vhost_virtqueue, stats.size_bins[7])},
+	{"guest_notifications",    offsetof(struct vhost_virtqueue, stats.guest_notifications)},
 };
 
 #define VHOST_NB_VQ_STATS RTE_DIM(vhost_vq_stat_strings)
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 4c151244c7..0c7669e8c9 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -131,6 +131,7 @@ struct virtqueue_stats {
uint64_t broadcast;
/* Size bins in array as RFC 2819, undersized [0], 64 [1], etc */
uint64_t size_bins[8];
+   uint64_t guest_notifications;
 };
 
 /**
@@ -787,6 +788,8 @@ vhost_vring_call_split(struct virtio_net *dev, struct 
vhost_virtqueue *vq)
(vq->callfd >= 0)) ||
unlikely(!signalled_used_valid)) {
eventfd_write(vq->callfd, (eventfd_t) 1);
+   if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+   vq->stats.guest_notifications++;
if (dev->notify_ops->guest_notified)
dev->notify_ops->guest_notified(dev->vid);
}
@@ -795,6 +798,8 @@ vhost_vring_call_split(struct virtio_net *dev, struct 
vhost_virtqueue *vq)
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
&& (vq->callfd >= 0)) {
eventfd_write(vq->callfd, (eventfd_t)1);
+   if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+   vq->stats.guest_notifications++;
if (dev->notify_ops->guest_notified)
dev->notify_ops->guest_notified(dev->vid);
}
-- 
2.34.1



[PATCH 5/5] vhost: add statistics for IOTLB

2022-01-27 Thread Maxime Coquelin
This patch adds statistics for IOTLB hits and misses.

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost.c | 10 +-
 lib/vhost/vhost.h |  3 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 2d0d9e7f51..bda974d34f 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -47,6 +47,8 @@ static const struct vhost_vq_stats_name_off vhost_vq_stat_strings[] = {
	{"size_1024_1518_packets", offsetof(struct vhost_virtqueue, stats.size_bins[6])},
	{"size_1519_max_packets",  offsetof(struct vhost_virtqueue, stats.size_bins[7])},
	{"guest_notifications",    offsetof(struct vhost_virtqueue, stats.guest_notifications)},
+	{"iotlb_hits",             offsetof(struct vhost_virtqueue, stats.iotlb_hits)},
+	{"iotlb_misses",           offsetof(struct vhost_virtqueue, stats.iotlb_misses)},
 };
 
 #define VHOST_NB_VQ_STATS RTE_DIM(vhost_vq_stat_strings)
@@ -64,8 +66,14 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
tmp_size = *size;
 
vva = vhost_user_iotlb_cache_find(vq, iova, &tmp_size, perm);
-   if (tmp_size == *size)
+   if (tmp_size == *size) {
+   if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+   vq->stats.iotlb_hits++;
return vva;
+   }
+
+   if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+   vq->stats.iotlb_misses++;
 
iova += tmp_size;
 
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 0c7669e8c9..8c078decab 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -132,6 +132,9 @@ struct virtqueue_stats {
/* Size bins in array as RFC 2819, undersized [0], 64 [1], etc */
uint64_t size_bins[8];
uint64_t guest_notifications;
+   uint64_t iotlb_hits;
+   uint64_t iotlb_misses;
+   uint64_t iotlb_errors;
 };
 
 /**
-- 
2.34.1



[PATCH 3/5] net/vhost: move to Vhost library stats API

2022-01-27 Thread Maxime Coquelin
Now that we have Vhost statistics APIs, this patch replaces
the Vhost PMD extended statistics implementation with calls
to the new API. It will enable getting more statistics, for
counters that cannot be implemented at the PMD level.

Signed-off-by: Maxime Coquelin 
---
 drivers/net/vhost/rte_eth_vhost.c | 348 +++---
 1 file changed, 120 insertions(+), 228 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 070f0e6dfd..bac1c0acba 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -59,33 +59,10 @@ static struct rte_ether_addr base_eth_addr = {
}
 };
 
-enum vhost_xstats_pkts {
-   VHOST_UNDERSIZE_PKT = 0,
-   VHOST_64_PKT,
-   VHOST_65_TO_127_PKT,
-   VHOST_128_TO_255_PKT,
-   VHOST_256_TO_511_PKT,
-   VHOST_512_TO_1023_PKT,
-   VHOST_1024_TO_1522_PKT,
-   VHOST_1523_TO_MAX_PKT,
-   VHOST_BROADCAST_PKT,
-   VHOST_MULTICAST_PKT,
-   VHOST_UNICAST_PKT,
-   VHOST_PKT,
-   VHOST_BYTE,
-   VHOST_MISSED_PKT,
-   VHOST_ERRORS_PKT,
-   VHOST_ERRORS_FRAGMENTED,
-   VHOST_ERRORS_JABBER,
-   VHOST_UNKNOWN_PROTOCOL,
-   VHOST_XSTATS_MAX,
-};
-
 struct vhost_stats {
uint64_t pkts;
uint64_t bytes;
uint64_t missed_pkts;
-   uint64_t xstats[VHOST_XSTATS_MAX];
 };
 
 struct vhost_queue {
@@ -140,138 +117,92 @@ struct rte_vhost_vring_state {
 
 static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
 
-#define VHOST_XSTATS_NAME_SIZE 64
-
-struct vhost_xstats_name_off {
-   char name[VHOST_XSTATS_NAME_SIZE];
-   uint64_t offset;
-};
-
-/* [rx]_is prepended to the name string here */
-static const struct vhost_xstats_name_off vhost_rxport_stat_strings[] = {
-   {"good_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_PKT])},
-   {"total_bytes",
-offsetof(struct vhost_queue, stats.xstats[VHOST_BYTE])},
-   {"missed_pkts",
-offsetof(struct vhost_queue, stats.xstats[VHOST_MISSED_PKT])},
-   {"broadcast_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_BROADCAST_PKT])},
-   {"multicast_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_MULTICAST_PKT])},
-   {"unicast_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_UNICAST_PKT])},
-{"undersize_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_UNDERSIZE_PKT])},
-   {"size_64_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_64_PKT])},
-   {"size_65_to_127_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_65_TO_127_PKT])},
-   {"size_128_to_255_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_128_TO_255_PKT])},
-   {"size_256_to_511_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_256_TO_511_PKT])},
-   {"size_512_to_1023_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_512_TO_1023_PKT])},
-   {"size_1024_to_1522_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_1024_TO_1522_PKT])},
-   {"size_1523_to_max_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_1523_TO_MAX_PKT])},
-   {"errors_with_bad_CRC",
-offsetof(struct vhost_queue, stats.xstats[VHOST_ERRORS_PKT])},
-   {"fragmented_errors",
-offsetof(struct vhost_queue, stats.xstats[VHOST_ERRORS_FRAGMENTED])},
-   {"jabber_errors",
-offsetof(struct vhost_queue, stats.xstats[VHOST_ERRORS_JABBER])},
-   {"unknown_protos_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_UNKNOWN_PROTOCOL])},
-};
-
-/* [tx]_ is prepended to the name string here */
-static const struct vhost_xstats_name_off vhost_txport_stat_strings[] = {
-   {"good_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_PKT])},
-   {"total_bytes",
-offsetof(struct vhost_queue, stats.xstats[VHOST_BYTE])},
-   {"missed_pkts",
-offsetof(struct vhost_queue, stats.xstats[VHOST_MISSED_PKT])},
-   {"broadcast_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_BROADCAST_PKT])},
-   {"multicast_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_MULTICAST_PKT])},
-   {"unicast_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_UNICAST_PKT])},
-   {"undersize_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_UNDERSIZE_PKT])},
-   {"size_64_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_64_PKT])},
-   {"size_65_to_127_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_65_TO_127_PKT])},
-   {"size_128_to_255_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_128_TO_255_PKT])},
-   {"size_256_to_511_packets",
-offsetof(struct vhost_queue, stats.xstats[VHOST_256_TO_511_PKT])},
-   {"size_512_to_1023_packets",
-o

[PATCH 00/20] mlx5: refactor devargs management

2022-01-27 Thread Michael Baum
These patches rearrange the management of the devargs on two different
levels.

The first splits the net driver's devargs into two categories,
device-dependent devargs and port-dependent devargs.
Arguments that depend on the device are updated once in the creation of
the shared device context structure, and do not change even if the user
has sent new devargs in the probe again. In contrast, the arguments that
depend on the port are updated separately for each port.

The second layer deals with the parsing of devargs in the common driver.
The common driver parses the devargs once into a dictionary, then sends
it to all the drivers that will use it during their probing. Each
driver marks within the dictionary which keys it has used, then the
common driver checks the updated dictionary and reports any unknown
devargs.

Michael Baum (20):
  net/mlx5: fix wrong check sibling device config mismatch
  net/mlx5: fix ineffective metadata argument adjustment
  net/mlx5: fix wrong place of ASO CT object release
  net/mlx5: fix inconsistency errno update in SH creation
  net/mlx5: remove declaration duplications
  net/mlx5: remove checking devargs duplication
  net/mlx5: remove HCA attr structure duplication
  net/mlx5: remove DevX flag duplication
  net/mlx5: remove Verbs query device duplication
  common/mlx5: share VF checking function
  net/mlx5: share realtime timestamp configure
  net/mlx5: share counter config function
  net/mlx5: add E-switch mode flag
  net/mlx5: rearrange device attribute structure
  net/mlx5: concentrate all device configurations
  net/mlx5: add share device context config structure
  net/mlx5: using function to detect operation by DevX
  net/mlx5: separate per port configuration
  common/mlx5: add check for common devargs in probing again
  common/mlx5: refactor devargs management

 drivers/common/mlx5/mlx5_common.c   | 345 +++--
 drivers/common/mlx5/mlx5_common.h   |  51 +-
 drivers/common/mlx5/mlx5_common_pci.c   |  18 +
 drivers/common/mlx5/version.map |   3 +
 drivers/compress/mlx5/mlx5_compress.c   |  38 +-
 drivers/crypto/mlx5/mlx5_crypto.c   |  39 +-
 drivers/net/mlx5/linux/mlx5_flow_os.c   |   3 +-
 drivers/net/mlx5/linux/mlx5_os.c| 885 +---
 drivers/net/mlx5/linux/mlx5_verbs.c |   9 +-
 drivers/net/mlx5/linux/mlx5_vlan_os.c   |   3 +-
 drivers/net/mlx5/mlx5.c | 872 +--
 drivers/net/mlx5/mlx5.h | 216 +++---
 drivers/net/mlx5/mlx5_devx.c|  19 +-
 drivers/net/mlx5/mlx5_ethdev.c  |  31 +-
 drivers/net/mlx5/mlx5_flow.c|  50 +-
 drivers/net/mlx5/mlx5_flow.h|   2 +-
 drivers/net/mlx5/mlx5_flow_dv.c |  93 ++-
 drivers/net/mlx5/mlx5_flow_flex.c   |   4 +-
 drivers/net/mlx5/mlx5_flow_meter.c  |  14 +-
 drivers/net/mlx5/mlx5_rxmode.c  |   8 +-
 drivers/net/mlx5/mlx5_rxq.c |  49 +-
 drivers/net/mlx5/mlx5_trigger.c |  35 +-
 drivers/net/mlx5/mlx5_tx.c  |   2 +-
 drivers/net/mlx5/mlx5_txpp.c|  14 +-
 drivers/net/mlx5/mlx5_txq.c |  62 +-
 drivers/net/mlx5/mlx5_vlan.c|   4 +-
 drivers/net/mlx5/windows/mlx5_flow_os.c |   2 +-
 drivers/net/mlx5/windows/mlx5_os.c  | 342 +++--
 drivers/regex/mlx5/mlx5_regex.c |   3 +-
 drivers/vdpa/mlx5/mlx5_vdpa.c   |  32 +-
 30 files changed, 1841 insertions(+), 1407 deletions(-)

-- 
2.25.1



[PATCH 01/20] net/mlx5: fix wrong check sibling device config mismatch

2022-01-27 Thread Michael Baum
The MLX5 net driver supports "probe again". In probing again, it
creates a new ethdev under an existing infiniband device context.

Sibling devices sharing an infiniband device context should have
compatible configurations, so the devargs given in the probe again that
are mainly relevant to the shared device context are sent to the
mlx5_dev_check_sibling_config function, which makes sure that they are
compatible with their siblings.
However, the arguments are adjusted according to the capability of the
device, and the function compares the arguments of the probe again
before the adjustment with the arguments of the siblings after the
adjustment. A user who sends the same values to all siblings may fail
this comparison if they requested something that the device does not
support, and which was therefore adjusted.

This patch moves the call to the mlx5_dev_check_sibling_config function
after the relevant adjustments.

Fixes: 92d5dd483450 ("net/mlx5: check sibling device configurations mismatch")
Fixes: 2d241515ebaf ("net/mlx5: add devarg for extensive metadata support")
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c   | 41 --
 drivers/net/mlx5/windows/mlx5_os.c | 28 
 2 files changed, 39 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index aecdc5a68a..de0bb87460 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1241,6 +1241,28 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
}
/* Override some values set by hardware configuration. */
mlx5_args(config, dpdk_dev->devargs);
+   /* Update final values for devargs before check sibling config. */
+#if !defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_MLX5DV_DR)
+   if (config->dv_flow_en) {
+   DRV_LOG(WARNING, "DV flow is not supported.");
+   config->dv_flow_en = 0;
+   }
+#endif
+#ifdef HAVE_MLX5DV_DR_ESWITCH
+   if (!(sh->cdev->config.hca_attr.eswitch_manager && config->dv_flow_en &&
+ (switch_info->representor || switch_info->master)))
+   config->dv_esw_en = 0;
+#else
+   config->dv_esw_en = 0;
+#endif
+   if (!priv->config.dv_esw_en &&
+   priv->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY) {
+   DRV_LOG(WARNING,
+   "Metadata mode %u is not supported (no E-Switch).",
+   priv->config.dv_xmeta_en);
+   priv->config.dv_xmeta_en = MLX5_XMETA_MODE_LEGACY;
+   }
+   /* Check sibling device configurations. */
err = mlx5_dev_check_sibling_config(priv, config, dpdk_dev);
if (err)
goto error;
@@ -1251,12 +1273,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 #if !defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) && \
!defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
DRV_LOG(DEBUG, "counters are not supported");
-#endif
-#if !defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_MLX5DV_DR)
-   if (config->dv_flow_en) {
-   DRV_LOG(WARNING, "DV flow is not supported");
-   config->dv_flow_en = 0;
-   }
 #endif
config->ind_table_max_size =
sh->device_attr.max_rwq_indirection_table_size;
@@ -1652,13 +1668,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 * Verbs context returned by ibv_open_device().
 */
mlx5_link_update(eth_dev, 0);
-#ifdef HAVE_MLX5DV_DR_ESWITCH
-   if (!(config->hca_attr.eswitch_manager && config->dv_flow_en &&
- (switch_info->representor || switch_info->master)))
-   config->dv_esw_en = 0;
-#else
-   config->dv_esw_en = 0;
-#endif
/* Detect minimal data bytes to inline. */
mlx5_set_min_inline(spawn, config);
/* Store device configuration on private structure. */
@@ -1725,12 +1734,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
err = -err;
goto error;
}
-   if (!priv->config.dv_esw_en &&
-   priv->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY) {
-   DRV_LOG(WARNING, "metadata mode %u is not supported "
-"(no E-Switch)", priv->config.dv_xmeta_en);
-   priv->config.dv_xmeta_en = MLX5_XMETA_MODE_LEGACY;
-   }
mlx5_set_metadata_mask(eth_dev);
if (priv->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
!priv->sh->dv_regc0_mask) {
diff --git a/drivers/net/mlx5/windows/mlx5_os.c 
b/drivers/net/mlx5/windows/mlx5_os.c
index ac0af0ff7d..8eb53f2cb7 100644
--- a/drivers/net/mlx5/windows/mlx5_os.c
+++ b/drivers/net/mlx5/windows/mlx5_os.c
@@ -439,6 +439,21 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
}
/* Override some values set by hardware configuration. */
mlx5_args(config, dpdk_dev->devargs);
+   /* Update final values for devargs before check sibling config. *

[PATCH 02/20] net/mlx5: fix ineffective metadata argument adjustment

2022-01-27 Thread Michael Baum
The "dv_xmeta_en" devarg has an option, dv_xmeta_en=3, which
engages tunnel offload mode. In E-Switch configuration, that mode
implicitly activates dv_xmeta_en=1.

The update according to E-switch support is done immediately after the
first parsing of the devargs, but there is another adjustment later.

This patch moves the adjustment after the second parsing.

Fixes: 4ec6360de37d ("net/mlx5: implement tunnel offload")
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index de0bb87460..e45e56f4b6 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -977,10 +977,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
strerror(rte_errno));
goto error;
}
-   if (config->dv_miss_info) {
-   if (switch_info->master || switch_info->representor)
-   config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
-   }
sh = mlx5_alloc_shared_dev_ctx(spawn, config);
if (!sh)
return NULL;
@@ -1242,6 +1238,10 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
/* Override some values set by hardware configuration. */
mlx5_args(config, dpdk_dev->devargs);
/* Update final values for devargs before check sibling config. */
+   if (config->dv_miss_info) {
+   if (switch_info->master || switch_info->representor)
+   config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
+   }
 #if !defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_MLX5DV_DR)
if (config->dv_flow_en) {
DRV_LOG(WARNING, "DV flow is not supported.");
-- 
2.25.1



[PATCH 03/20] net/mlx5: fix wrong place of ASO CT object release

2022-01-27 Thread Michael Baum
The ASO connection tracking structure is initialized once for sharing
device context.

Its release takes place in the close function, which is called for each
ethdev individually, i.e. when there is more than one ethdev under the
same shared device context, the structure is destroyed when the first of
them is closed. If another ethdev wants to use it later, it may crash.

In addition, this structure is created in the spawn function. If the
creation of one of the objects following it fails, the structure is
supposed to be destroyed, but this does not happen.

This patch moves its release to the sharing device context free function
and thus solves both problems.

Fixes: 0af8a2298a42 ("net/mlx5: release connection tracking management")
Fixes: ee9e5fad03eb ("net/mlx5: initialize connection tracking management")
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 67eda41a60..d1d398f49a 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1321,6 +1321,8 @@ mlx5_free_shared_dev_ctx(struct mlx5_dev_ctx_shared *sh)
 *  Only primary process handles async device events.
 **/
mlx5_flow_counters_mng_close(sh);
+   if (sh->ct_mng)
+   mlx5_flow_aso_ct_mng_close(sh);
if (sh->aso_age_mng) {
mlx5_flow_aso_age_mng_close(sh);
sh->aso_age_mng = NULL;
@@ -1594,8 +1596,6 @@ mlx5_dev_close(struct rte_eth_dev *dev)
if (priv->mreg_cp_tbl)
mlx5_hlist_destroy(priv->mreg_cp_tbl);
mlx5_mprq_free_mp(dev);
-   if (priv->sh->ct_mng)
-   mlx5_flow_aso_ct_mng_close(priv->sh);
mlx5_os_free_shared_dr(priv);
if (priv->rss_conf.rss_key != NULL)
mlx5_free(priv->rss_conf.rss_key);
-- 
2.25.1



[PATCH 04/20] net/mlx5: fix inconsistency errno update in SH creation

2022-01-27 Thread Michael Baum
The mlx5_alloc_shared_dev_ctx() function has a local variable named
"err" which contains the errno value in case of failure.

When functions called by this function fail, this variable is updated
with their return value (which should be a positive errno value).
However, some functions do not update the errno value themselves, or
return a negative errno value. If one of them fails, the "err" variable
contains a negative value, which causes an assertion failure.

This patch updates all functions used by the mlx5_alloc_shared_dev_ctx()
function to update rte_errno, and takes this value instead of the "err"
value.

Fixes: 5dfa003db53f ("common/mlx5: fix post doorbell barrier")
Fixes: 5d55a494f4e6 ("net/mlx5: split multi-thread flow handling per OS")
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_flow_os.c   |  3 ++-
 drivers/net/mlx5/linux/mlx5_os.c| 16 ++--
 drivers/net/mlx5/mlx5.c | 24 +++-
 drivers/net/mlx5/windows/mlx5_flow_os.c |  2 +-
 drivers/net/mlx5/windows/mlx5_os.c  | 24 
 5 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_flow_os.c 
b/drivers/net/mlx5/linux/mlx5_flow_os.c
index 893f00b824..a5956c255a 100644
--- a/drivers/net/mlx5/linux/mlx5_flow_os.c
+++ b/drivers/net/mlx5/linux/mlx5_flow_os.c
@@ -14,7 +14,8 @@ mlx5_flow_os_init_workspace_once(void)
 {
if (rte_thread_key_create(&key_workspace, flow_release_workspace)) {
DRV_LOG(ERR, "Can't create flow workspace data thread key.");
-   return -ENOMEM;
+   rte_errno = ENOMEM;
+   return -rte_errno;
}
return 0;
 }
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index e45e56f4b6..f587a47f6e 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -138,7 +138,7 @@ mlx5_os_set_nonblock_channel_fd(int fd)
  *   Pointer to mlx5 device attributes.
  *
  * @return
- *   0 on success, non zero error number otherwise
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
 mlx5_os_get_dev_attr(struct mlx5_common_device *cdev,
@@ -150,8 +150,10 @@ mlx5_os_get_dev_attr(struct mlx5_common_device *cdev,
 
memset(device_attr, 0, sizeof(*device_attr));
err = mlx5_glue->query_device_ex(ctx, NULL, &attr_ex);
-   if (err)
-   return err;
+   if (err) {
+   rte_errno = errno;
+   return -rte_errno;
+   }
device_attr->device_cap_flags_ex = attr_ex.device_cap_flags_ex;
device_attr->max_qp_wr = attr_ex.orig_attr.max_qp_wr;
device_attr->max_sge = attr_ex.orig_attr.max_sge;
@@ -170,8 +172,10 @@ mlx5_os_get_dev_attr(struct mlx5_common_device *cdev,
 
struct mlx5dv_context dv_attr = { .comp_mask = 0 };
err = mlx5_glue->dv_query_device(ctx, &dv_attr);
-   if (err)
-   return err;
+   if (err) {
+   rte_errno = errno;
+   return -rte_errno;
+   }
 
device_attr->flags = dv_attr.flags;
device_attr->comp_mask = dv_attr.comp_mask;
@@ -195,7 +199,7 @@ mlx5_os_get_dev_attr(struct mlx5_common_device *cdev,
strlcpy(device_attr->fw_ver, attr_ex.orig_attr.fw_ver,
sizeof(device_attr->fw_ver));
 
-   return err;
+   return 0;
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d1d398f49a..25d4d2082b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1172,12 +1172,11 @@ mlx5_alloc_shared_dev_ctx(const struct 
mlx5_dev_spawn_data *spawn,
MLX5_ASSERT(spawn->max_port);
sh = mlx5_malloc(MLX5_MEM_ZERO | MLX5_MEM_RTE,
 sizeof(struct mlx5_dev_ctx_shared) +
-spawn->max_port *
-sizeof(struct mlx5_dev_shared_port),
+spawn->max_port * sizeof(struct mlx5_dev_shared_port),
 RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
if (!sh) {
-   DRV_LOG(ERR, "shared context allocation failure");
-   rte_errno  = ENOMEM;
+   DRV_LOG(ERR, "Shared context allocation failure.");
+   rte_errno = ENOMEM;
goto exit;
}
pthread_mutex_init(&sh->txpp.mutex, NULL);
@@ -1199,9 +1198,8 @@ mlx5_alloc_shared_dev_ctx(const struct 
mlx5_dev_spawn_data *spawn,
strncpy(sh->ibdev_path, mlx5_os_get_ctx_device_path(sh->cdev->ctx),
sizeof(sh->ibdev_path) - 1);
/*
-* Setting port_id to max unallowed value means
-* there is no interrupt subhandler installed for
-* the given port index i.
+* Setting port_id to max unallowed value means there is no interrupt
+* subhandler installed for the given port index i.
 */
for (i = 0; i < sh->max_port; i++) {
sh->

[PATCH 05/20] net/mlx5: remove declaration duplications

2022-01-27 Thread Michael Baum
The following 4 functions are implemented in the mlx5_ethdev.c file:
 - mlx5_dev_infos_get
 - mlx5_fw_version_get
 - mlx5_dev_set_mtu
 - mlx5_hairpin_cap_get

In the mlx5.h file they are declared twice: first under the mlx5.c
section and again under the mlx5_ethdev.c section.

This patch removes the redundant declarations.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5.h | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 737ad6895c..823a943978 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1532,11 +1532,6 @@ void mlx5_set_metadata_mask(struct rte_eth_dev *dev);
 int mlx5_dev_check_sibling_config(struct mlx5_priv *priv,
  struct mlx5_dev_config *config,
  struct rte_device *dpdk_dev);
-int mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info);
-int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size);
-int mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu);
-int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
-struct rte_eth_hairpin_cap *cap);
 bool mlx5_flex_parser_ecpri_exist(struct rte_eth_dev *dev);
 int mlx5_flex_parser_ecpri_alloc(struct rte_eth_dev *dev);
 int mlx5_flow_aso_age_mng_init(struct mlx5_dev_ctx_shared *sh);
@@ -1556,10 +1551,8 @@ int mlx5_representor_info_get(struct rte_eth_dev *dev,
(((repr_id) >> 12) & 3)
 uint16_t mlx5_representor_id_encode(const struct mlx5_switch_info *info,
enum rte_eth_representor_type hpf_type);
-int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver,
-   size_t fw_size);
-int mlx5_dev_infos_get(struct rte_eth_dev *dev,
-  struct rte_eth_dev_info *info);
+int mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info);
+int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size);
 const uint32_t *mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev);
 int mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu);
 int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
-- 
2.25.1



[PATCH 06/20] net/mlx5: remove checking devargs duplication

2022-01-27 Thread Michael Baum
The device arguments are parsed and updated twice during spawning: first
before creating the shared device context, and again later after
updating a default value of one of the arguments.

This patch consolidates them into a single parsing pass and updates the
default values before it.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c   | 73 +-
 drivers/net/mlx5/mlx5.c| 23 +-
 drivers/net/mlx5/mlx5.h|  2 +-
 drivers/net/mlx5/windows/mlx5_os.c | 49 
 4 files changed, 65 insertions(+), 82 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index f587a47f6e..11d15d0aef 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -968,22 +968,45 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
mlx5_dev_close(eth_dev);
return NULL;
}
-   /*
-* Some parameters ("tx_db_nc" in particularly) are needed in
-* advance to create dv/verbs device context. We proceed the
-* devargs here to get ones, and later proceed devargs again
-* to override some hardware settings.
-*/
+   /* Process parameters. */
err = mlx5_args(config, dpdk_dev->devargs);
if (err) {
-   err = rte_errno;
DRV_LOG(ERR, "failed to process device arguments: %s",
strerror(rte_errno));
-   goto error;
+   return NULL;
}
sh = mlx5_alloc_shared_dev_ctx(spawn, config);
if (!sh)
return NULL;
+   /* Update final values for devargs before check sibling config. */
+   if (config->dv_miss_info) {
+   if (switch_info->master || switch_info->representor)
+   config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
+   }
+#if !defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_MLX5DV_DR)
+   if (config->dv_flow_en) {
+   DRV_LOG(WARNING, "DV flow is not supported.");
+   config->dv_flow_en = 0;
+   }
+#endif
+#ifdef HAVE_MLX5DV_DR_ESWITCH
+   if (!(sh->cdev->config.hca_attr.eswitch_manager && config->dv_flow_en &&
+ (switch_info->representor || switch_info->master)))
+   config->dv_esw_en = 0;
+#else
+   config->dv_esw_en = 0;
+#endif
+   if (!config->dv_esw_en &&
+   config->dv_xmeta_en != MLX5_XMETA_MODE_LEGACY) {
+   DRV_LOG(WARNING,
+   "Metadata mode %u is not supported (no E-Switch).",
+   config->dv_xmeta_en);
+   config->dv_xmeta_en = MLX5_XMETA_MODE_LEGACY;
+   }
+   /* Check sibling device configurations. */
+   err = mlx5_dev_check_sibling_config(sh, config, dpdk_dev);
+   if (err)
+   goto error;
 #ifdef HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR
config->dest_tir = 1;
 #endif
@@ -1049,8 +1072,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
mprq_caps.max_single_wqe_log_num_of_strides;
}
 #endif
-   /* Rx CQE compression is enabled by default. */
-   config->cqe_comp = 1;
 #ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
if (dv_attr.comp_mask & MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS) {
config->tunnel_en = dv_attr.tunnel_offloads_caps &
@@ -1239,37 +1260,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
DRV_LOG(DEBUG, "dev_port-%u new domain_id=%u\n",
priv->dev_port, priv->domain_id);
}
-   /* Override some values set by hardware configuration. */
-   mlx5_args(config, dpdk_dev->devargs);
-   /* Update final values for devargs before check sibling config. */
-   if (config->dv_miss_info) {
-   if (switch_info->master || switch_info->representor)
-   config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
-   }
-#if !defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_MLX5DV_DR)
-   if (config->dv_flow_en) {
-   DRV_LOG(WARNING, "DV flow is not supported.");
-   config->dv_flow_en = 0;
-   }
-#endif
-#ifdef HAVE_MLX5DV_DR_ESWITCH
-   if (!(sh->cdev->config.hca_attr.eswitch_manager && config->dv_flow_en &&
- (switch_info->representor || switch_info->master)))
-   config->dv_esw_en = 0;
-#else
-   config->dv_esw_en = 0;
-#endif
-   if (!priv->config.dv_esw_en &&
-   priv->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY) {
-   DRV_LOG(WARNING,
-   "Metadata mode %u is not supported (no E-Switch).",
-   priv->config.dv_xmeta_en);
-   priv->config.dv_xmeta_en = MLX5_XMETA_MODE_LEGACY;
-   }
-   /* Check sibling device configurations. */
-   err = mlx5_dev_check_sibling_config(priv, config, dpdk_dev);
-   if (err)
-   goto error;
config->hw_csum = !!(sh->device_at

[PATCH 08/20] net/mlx5: remove DevX flag duplication

2022-01-27 Thread Michael Baum
The shared device context structure has a field named "devx" which
indicates whether DevX is supported.
The common configuration structure also has a field named "devx" with
the same meaning.

There is no need for this duplication, because the shared device context
structure holds a reference to the common structure.

This patch removes the field from the shared device context structure
and uses the common configuration structure instead.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c| 16 
 drivers/net/mlx5/linux/mlx5_verbs.c |  4 ++--
 drivers/net/mlx5/mlx5.c |  3 +--
 drivers/net/mlx5/mlx5.h |  1 -
 drivers/net/mlx5/mlx5_ethdev.c  |  3 ++-
 drivers/net/mlx5/mlx5_flow.c|  2 +-
 drivers/net/mlx5/mlx5_flow_dv.c | 23 ---
 drivers/net/mlx5/mlx5_trigger.c |  2 +-
 drivers/net/mlx5/windows/mlx5_os.c  |  6 +++---
 9 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 39ca145e4a..b579be25cb 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -682,7 +682,7 @@ mlx5_flow_counter_mode_config(struct rte_eth_dev *dev 
__rte_unused)
fallback = true;
 #else
fallback = false;
-   if (!sh->devx || !priv->config.dv_flow_en ||
+   if (!sh->cdev->config.devx || !priv->config.dv_flow_en ||
!hca_attr->flow_counters_dump ||
!(hca_attr->flow_counter_bulk_alloc_bitmap & 0x4) ||
(mlx5_flow_dv_discover_counter_offset_support(dev) == -ENOTSUP))
@@ -1316,7 +1316,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
config->mps == MLX5_MPW_ENHANCED ? "enhanced " :
config->mps == MLX5_MPW ? "legacy " : "",
config->mps != MLX5_MPW_DISABLED ? "enabled" : "disabled");
-   if (sh->devx) {
+   if (sh->cdev->config.devx) {
sh->steering_format_version = hca_attr->steering_format_version;
/* Check for LRO support. */
if (config->dest_tir && hca_attr->lro_cap &&
@@ -1434,13 +1434,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
config->cqe_comp = 0;
}
if (config->cqe_comp_fmt == MLX5_CQE_RESP_FORMAT_FTAG_STRIDX &&
-   (!sh->devx || !hca_attr->mini_cqe_resp_flow_tag)) {
+   (!sh->cdev->config.devx || !hca_attr->mini_cqe_resp_flow_tag)) {
DRV_LOG(WARNING, "Flow Tag CQE compression"
 " format isn't supported.");
config->cqe_comp = 0;
}
if (config->cqe_comp_fmt == MLX5_CQE_RESP_FORMAT_L34H_STRIDX &&
-   (!sh->devx || !hca_attr->mini_cqe_resp_l3_l4_tag)) {
+   (!sh->cdev->config.devx || !hca_attr->mini_cqe_resp_l3_l4_tag)) {
DRV_LOG(WARNING, "L3/L4 Header CQE compression"
 " format isn't supported.");
config->cqe_comp = 0;
@@ -1463,7 +1463,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
hca_attr->log_max_static_sq_wq);
DRV_LOG(DEBUG, "WQE rate PP mode is %ssupported",
hca_attr->qos.wqe_rate_pp ? "" : "not ");
-   if (!sh->devx) {
+   if (!sh->cdev->config.devx) {
DRV_LOG(ERR, "DevX is required for packet pacing");
err = ENODEV;
goto error;
@@ -1519,7 +1519,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
priv->dev_port);
}
}
-   if (sh->devx) {
+   if (sh->cdev->config.devx) {
uint32_t reg[MLX5_ST_SZ_DW(register_mtutc)];
 
err = hca_attr->access_register_user ?
@@ -1676,7 +1676,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
if (mlx5_flex_item_port_init(eth_dev) < 0)
goto error;
}
-   if (sh->devx && config->dv_flow_en && config->dest_tir) {
+   if (sh->cdev->config.devx && config->dv_flow_en && config->dest_tir) {
priv->obj_ops = devx_obj_ops;
mlx5_queue_counter_id_prepare(eth_dev);
priv->obj_ops.lb_dummy_queue_create =
@@ -2735,7 +2735,7 @@ mlx5_os_dev_shared_handler_install(struct 
mlx5_dev_ctx_shared *sh)
rte_intr_fd_set(sh->intr_handle, -1);
}
}
-   if (sh->devx) {
+   if (sh->cdev->config.devx) {
 #ifdef HAVE_IBV_DEVX_ASYNC
sh->intr_handle_devx =
rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
diff --git a/drivers/net/mlx5/linux/mlx5_verbs.c 
b/drivers/net/mlx5/linux/mlx5_verbs.c
index 2b6eef44a7..722017efa4 100644
--- a/drivers/net/mlx5/linux/mlx5_verbs.c
+++ b/drivers/net/mlx5/linux/mlx5_verbs.c
@@ -998,7 +998,7 @@ mlx5_txq_ibv_obj_new(struct rte_eth_dev *dev, uint16_t idx)
qp.comp_mask

[PATCH 07/20] net/mlx5: remove HCA attr structure duplication

2022-01-27 Thread Michael Baum
The HCA attribute structure is a field of the net configuration
structure. It is also a field of the common configuration structure.

There is no need for this duplication, because the net structures hold a
reference to the common structure.

This patch removes it from the net configuration structure and uses the
common configuration structure instead.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c   | 95 ++
 drivers/net/mlx5/mlx5.c| 14 +++--
 drivers/net/mlx5/mlx5.h|  1 -
 drivers/net/mlx5/mlx5_devx.c   |  8 ++-
 drivers/net/mlx5/mlx5_ethdev.c |  2 +-
 drivers/net/mlx5/mlx5_flow.c   | 16 ++---
 drivers/net/mlx5/mlx5_flow_dv.c| 13 ++--
 drivers/net/mlx5/mlx5_flow_flex.c  |  4 +-
 drivers/net/mlx5/mlx5_flow_meter.c |  4 +-
 drivers/net/mlx5/mlx5_rxq.c|  4 +-
 drivers/net/mlx5/mlx5_trigger.c| 12 ++--
 drivers/net/mlx5/mlx5_txpp.c   |  2 +-
 drivers/net/mlx5/windows/mlx5_os.c | 25 
 13 files changed, 100 insertions(+), 100 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 11d15d0aef..39ca145e4a 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -675,6 +675,7 @@ mlx5_flow_counter_mode_config(struct rte_eth_dev *dev 
__rte_unused)
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
struct mlx5_priv *priv = dev->data->dev_private;
struct mlx5_dev_ctx_shared *sh = priv->sh;
+   struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
bool fallback;
 
 #ifndef HAVE_IBV_DEVX_ASYNC
@@ -682,16 +683,16 @@ mlx5_flow_counter_mode_config(struct rte_eth_dev *dev 
__rte_unused)
 #else
fallback = false;
if (!sh->devx || !priv->config.dv_flow_en ||
-   !priv->config.hca_attr.flow_counters_dump ||
-   !(priv->config.hca_attr.flow_counter_bulk_alloc_bitmap & 0x4) ||
+   !hca_attr->flow_counters_dump ||
+   !(hca_attr->flow_counter_bulk_alloc_bitmap & 0x4) ||
(mlx5_flow_dv_discover_counter_offset_support(dev) == -ENOTSUP))
fallback = true;
 #endif
if (fallback)
DRV_LOG(INFO, "Use fall-back DV counter management. Flow "
"counter dump:%d, bulk_alloc_bitmap:0x%hhx.",
-   priv->config.hca_attr.flow_counters_dump,
-   priv->config.hca_attr.flow_counter_bulk_alloc_bitmap);
+   hca_attr->flow_counters_dump,
+   hca_attr->flow_counter_bulk_alloc_bitmap);
/* Initialize fallback mode only on the port initializes sh. */
if (sh->refcnt == 1)
sh->cmng.counter_fallback = fallback;
@@ -875,6 +876,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 {
const struct mlx5_switch_info *switch_info = &spawn->info;
struct mlx5_dev_ctx_shared *sh = NULL;
+   struct mlx5_hca_attr *hca_attr = &spawn->cdev->config.hca_attr;
struct ibv_port_attr port_attr = { .state = IBV_PORT_NOP };
struct mlx5dv_context dv_attr = { .comp_mask = 0 };
struct rte_eth_dev *eth_dev = NULL;
@@ -990,7 +992,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
}
 #endif
 #ifdef HAVE_MLX5DV_DR_ESWITCH
-   if (!(sh->cdev->config.hca_attr.eswitch_manager && config->dv_flow_en &&
+   if (!(hca_attr->eswitch_manager && config->dv_flow_en &&
  (switch_info->representor || switch_info->master)))
config->dv_esw_en = 0;
 #else
@@ -1315,14 +1317,12 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
config->mps == MLX5_MPW ? "legacy " : "",
config->mps != MLX5_MPW_DISABLED ? "enabled" : "disabled");
if (sh->devx) {
-   config->hca_attr = sh->cdev->config.hca_attr;
-   sh->steering_format_version =
-   config->hca_attr.steering_format_version;
+   sh->steering_format_version = hca_attr->steering_format_version;
/* Check for LRO support. */
-   if (config->dest_tir && config->hca_attr.lro_cap &&
+   if (config->dest_tir && hca_attr->lro_cap &&
config->dv_flow_en) {
/* TBD check tunnel lro caps. */
-   config->lro.supported = config->hca_attr.lro_cap;
+   config->lro.supported = hca_attr->lro_cap;
DRV_LOG(DEBUG, "Device supports LRO");
/*
 * If LRO timeout is not configured by application,
@@ -1330,21 +1330,19 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 */
if (!config->lro.timeout)
config->lro.timeout =
-   config->hca_attr.lro_timer_supported_periods[0];
+  hca_attr->lro_timer_supported_periods[0];
DRV_LOG(DEBUG, "

[PATCH 09/20] net/mlx5: remove Verbs query device duplication

2022-01-27 Thread Michael Baum
The shared device context structure has a field named "device_attr"
which is filled by the mlx5_os_get_dev_attr() function.
The spawn function calls mlx5_os_get_dev_attr() again and saves the
result to a local variable identical to the "device_attr" field.

There is no need for this duplication, because the spawn function holds
a reference to the shared device context structure.

This patch removes the local "device_attr" from the spawn function and
uses the context's field instead.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c   | 63 ++
 drivers/net/mlx5/windows/mlx5_os.c |  6 +--
 2 files changed, 32 insertions(+), 37 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index b579be25cb..e8e842a09e 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -171,6 +171,15 @@ mlx5_os_get_dev_attr(struct mlx5_common_device *cdev,
device_attr->tso_supported_qpts = attr_ex.tso_caps.supported_qpts;
 
struct mlx5dv_context dv_attr = { .comp_mask = 0 };
+#ifdef HAVE_IBV_MLX5_MOD_SWP
+   dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_SWP;
+#endif
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+   dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS;
+#endif
+#ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
+   dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_STRIDING_RQ;
+#endif
err = mlx5_glue->dv_query_device(ctx, &dv_attr);
if (err) {
rte_errno = errno;
@@ -183,6 +192,7 @@ mlx5_os_get_dev_attr(struct mlx5_common_device *cdev,
device_attr->sw_parsing_offloads =
dv_attr.sw_parsing_caps.sw_parsing_offloads;
 #endif
+#ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
device_attr->min_single_stride_log_num_of_bytes =
dv_attr.striding_rq_caps.min_single_stride_log_num_of_bytes;
device_attr->max_single_stride_log_num_of_bytes =
@@ -193,6 +203,7 @@ mlx5_os_get_dev_attr(struct mlx5_common_device *cdev,
dv_attr.striding_rq_caps.max_single_wqe_log_num_of_strides;
device_attr->stride_supported_qpts =
dv_attr.striding_rq_caps.supported_qpts;
+#endif
 #ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
device_attr->tunnel_offloads_caps = dv_attr.tunnel_offloads_caps;
 #endif
@@ -878,7 +889,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
struct mlx5_dev_ctx_shared *sh = NULL;
struct mlx5_hca_attr *hca_attr = &spawn->cdev->config.hca_attr;
struct ibv_port_attr port_attr = { .state = IBV_PORT_NOP };
-   struct mlx5dv_context dv_attr = { .comp_mask = 0 };
struct rte_eth_dev *eth_dev = NULL;
struct mlx5_priv *priv = NULL;
int err = 0;
@@ -1011,23 +1021,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
goto error;
 #ifdef HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR
config->dest_tir = 1;
-#endif
-#ifdef HAVE_IBV_MLX5_MOD_SWP
-   dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_SWP;
 #endif
/*
 * Multi-packet send is supported by ConnectX-4 Lx PF as well
 * as all ConnectX-5 devices.
 */
-#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
-   dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS;
-#endif
-#ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
-   dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_STRIDING_RQ;
-#endif
-   mlx5_glue->dv_query_device(sh->cdev->ctx, &dv_attr);
-   if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) {
-   if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW) {
+   if (sh->device_attr.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) {
+   if (sh->device_attr.flags & MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW) {
DRV_LOG(DEBUG, "enhanced MPW is supported");
mps = MLX5_MPW_ENHANCED;
} else {
@@ -1039,44 +1039,41 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
mps = MLX5_MPW_DISABLED;
}
 #ifdef HAVE_IBV_MLX5_MOD_SWP
-   if (dv_attr.comp_mask & MLX5DV_CONTEXT_MASK_SWP)
-   swp = dv_attr.sw_parsing_caps.sw_parsing_offloads;
+   if (sh->device_attr.comp_mask & MLX5DV_CONTEXT_MASK_SWP)
+   swp = sh->device_attr.sw_parsing_offloads;
DRV_LOG(DEBUG, "SWP support: %u", swp);
 #endif
config->swp = swp & (MLX5_SW_PARSING_CAP | MLX5_SW_PARSING_CSUM_CAP |
MLX5_SW_PARSING_TSO_CAP);
 #ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
-   if (dv_attr.comp_mask & MLX5DV_CONTEXT_MASK_STRIDING_RQ) {
-   struct mlx5dv_striding_rq_caps mprq_caps =
-   dv_attr.striding_rq_caps;
-
+   if (sh->device_attr.comp_mask & MLX5DV_CONTEXT_MASK_STRIDING_RQ) {
DRV_LOG(DEBUG, "\tmin_single_stride_log_num_of_bytes: %d",
-   mprq_caps.min_single_stride_log_num_of_bytes);
+   sh->device_attr.min_single_stride_log_num_of_bytes);
DRV_LOG(DEBUG,

[PATCH 11/20] net/mlx5: share realtime timestamp configure

2022-01-27 Thread Michael Baum
The realtime timestamp configuration works the same on Linux as on
Windows. This patch moves it to a function implemented in the folder
shared between the operating systems, removing the duplication.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c   | 23 ++-
 drivers/net/mlx5/mlx5.c| 37 ++
 drivers/net/mlx5/mlx5.h|  3 +++
 drivers/net/mlx5/windows/mlx5_os.c | 22 +-
 4 files changed, 43 insertions(+), 42 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 2fb91fec06..bb90cc4426 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1516,27 +1516,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
priv->dev_port);
}
}
-   if (sh->cdev->config.devx) {
-   uint32_t reg[MLX5_ST_SZ_DW(register_mtutc)];
-
-   err = hca_attr->access_register_user ?
-   mlx5_devx_cmd_register_read
-   (sh->cdev->ctx, MLX5_REGISTER_ID_MTUTC, 0,
-   reg, MLX5_ST_SZ_DW(register_mtutc)) : ENOTSUP;
-   if (!err) {
-   uint32_t ts_mode;
-
-   /* MTUTC register is read successfully. */
-   ts_mode = MLX5_GET(register_mtutc, reg,
-  time_stamp_mode);
-   if (ts_mode == MLX5_MTUTC_TIMESTAMP_MODE_REAL_TIME)
-   config->rt_timestamp = 1;
-   } else {
-   /* Kernel does not support register reading. */
-   if (hca_attr->dev_freq_khz == (NS_PER_S / MS_PER_S))
-   config->rt_timestamp = 1;
-   }
-   }
+   if (sh->cdev->config.devx)
+   mlx5_rt_timestamp_config(sh, config, hca_attr);
/*
 * If HW has bug working with tunnel packet decapsulation and
 * scatter FCS, and decapsulation is needed, clear the hw_fcs_strip
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b371a87355..5146359100 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1129,6 +1129,43 @@ mlx5_setup_tis(struct mlx5_dev_ctx_shared *sh)
return 0;
 }
 
+/**
+ * Configure realtime timestamp format.
+ *
+ * @param sh
+ *   Pointer to mlx5_dev_ctx_shared object.
+ * @param config
+ *   Device configuration parameters.
+ * @param hca_attr
+ *   Pointer to DevX HCA capabilities structure.
+ */
+void
+mlx5_rt_timestamp_config(struct mlx5_dev_ctx_shared *sh,
+struct mlx5_dev_config *config,
+struct mlx5_hca_attr *hca_attr)
+{
+   uint32_t dw_cnt = MLX5_ST_SZ_DW(register_mtutc);
+   uint32_t reg[dw_cnt];
+   int ret = ENOTSUP;
+
+   if (hca_attr->access_register_user)
+   ret = mlx5_devx_cmd_register_read(sh->cdev->ctx,
+ MLX5_REGISTER_ID_MTUTC, 0,
+ reg, dw_cnt);
+   if (!ret) {
+   uint32_t ts_mode;
+
+   /* MTUTC register is read successfully. */
+   ts_mode = MLX5_GET(register_mtutc, reg, time_stamp_mode);
+   if (ts_mode == MLX5_MTUTC_TIMESTAMP_MODE_REAL_TIME)
+   config->rt_timestamp = 1;
+   } else {
+   /* Kernel does not support register reading. */
+   if (hca_attr->dev_freq_khz == (NS_PER_S / MS_PER_S))
+   config->rt_timestamp = 1;
+   }
+}
+
 /**
  * Allocate shared device context. If there is multiport device the
  * master and representors will share this context, if there is single
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 6bc7a34f60..0f90d757e9 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1517,6 +1517,9 @@ void mlx5_age_event_prepare(struct mlx5_dev_ctx_shared 
*sh);
 port_id < RTE_MAX_ETHPORTS; \
 port_id = mlx5_eth_find_next(port_id + 1, dev))
 int mlx5_args(struct mlx5_dev_config *config, struct rte_devargs *devargs);
+void mlx5_rt_timestamp_config(struct mlx5_dev_ctx_shared *sh,
+ struct mlx5_dev_config *config,
+ struct mlx5_hca_attr *hca_attr);
 struct mlx5_dev_ctx_shared *
 mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data *spawn,
   const struct mlx5_dev_config *config);
diff --git a/drivers/net/mlx5/windows/mlx5_os.c 
b/drivers/net/mlx5/windows/mlx5_os.c
index 178e58b4d7..a9c7ba2a14 100644
--- a/drivers/net/mlx5/windows/mlx5_os.c
+++ b/drivers/net/mlx5/windows/mlx5_os.c
@@ -483,27 +483,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
DRV_LOG(DEBUG, "VLAN stripping is %ssupported",

[PATCH 10/20] common/mlx5: share VF checking function

2022-01-27 Thread Michael Baum
The check whether a device is a VF works the same on Linux as on
Windows. This patch moves it to a function implemented in the folder
shared between the operating systems, removing the duplication.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/common/mlx5/mlx5_common.h | 15 +++
 drivers/common/mlx5/mlx5_common_pci.c | 18 ++
 drivers/common/mlx5/version.map   |  1 +
 drivers/net/mlx5/linux/mlx5_os.c  | 18 +-
 drivers/net/mlx5/windows/mlx5_os.c| 16 +---
 5 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common.h 
b/drivers/common/mlx5/mlx5_common.h
index e8809844af..80f59c81fb 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -8,6 +8,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -487,6 +488,20 @@ __rte_internal
 bool
 mlx5_dev_is_pci(const struct rte_device *dev);
 
+/**
+ * Test PCI device is a VF device.
+ *
+ * @param pci_dev
+ *   Pointer to PCI device.
+ *
+ * @return
+ *   - True on PCI device is a VF device.
+ *   - False otherwise.
+ */
+__rte_internal
+bool
+mlx5_dev_is_vf_pci(struct rte_pci_device *pci_dev);
+
 __rte_internal
 int
 mlx5_dev_mempool_subscribe(struct mlx5_common_device *cdev);
diff --git a/drivers/common/mlx5/mlx5_common_pci.c 
b/drivers/common/mlx5/mlx5_common_pci.c
index 8b38091d87..8fd2cb076c 100644
--- a/drivers/common/mlx5/mlx5_common_pci.c
+++ b/drivers/common/mlx5/mlx5_common_pci.c
@@ -108,6 +108,24 @@ mlx5_dev_is_pci(const struct rte_device *dev)
return strcmp(dev->bus->name, "pci") == 0;
 }
 
+bool
+mlx5_dev_is_vf_pci(struct rte_pci_device *pci_dev)
+{
+   switch (pci_dev->id.device_id) {
+   case PCI_DEVICE_ID_MELLANOX_CONNECTX4VF:
+   case PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF:
+   case PCI_DEVICE_ID_MELLANOX_CONNECTX5VF:
+   case PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF:
+   case PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF:
+   case PCI_DEVICE_ID_MELLANOX_CONNECTX6VF:
+   case PCI_DEVICE_ID_MELLANOX_CONNECTXVF:
+   return true;
+   default:
+   break;
+   }
+   return false;
+}
+
 bool
 mlx5_dev_pci_match(const struct mlx5_class_driver *drv,
   const struct rte_device *dev)
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 462b7cea5e..59ab434631 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -13,6 +13,7 @@ INTERNAL {
mlx5_common_verbs_dereg_mr; # WINDOWS_NO_EXPORT
 
mlx5_dev_is_pci;
+   mlx5_dev_is_vf_pci;
mlx5_dev_mempool_unregister;
mlx5_dev_mempool_subscribe;
 
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index e8e842a09e..2fb91fec06 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2100,7 +2100,6 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev,
struct rte_pci_device *pci_dev = RTE_DEV_TO_PCI(cdev->dev);
struct mlx5_dev_spawn_data *list = NULL;
struct mlx5_dev_config dev_config;
-   unsigned int dev_config_vf;
struct rte_eth_devargs eth_da = *req_eth_da;
struct rte_pci_addr owner_pci = pci_dev->addr; /* Owner PF. */
struct mlx5_bond_info bond_info;
@@ -2421,21 +2420,6 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev,
 * (i.e. master first, then representors from lowest to highest ID).
 */
qsort(list, ns, sizeof(*list), mlx5_dev_spawn_data_cmp);
-   /* Device specific configuration. */
-   switch (pci_dev->id.device_id) {
-   case PCI_DEVICE_ID_MELLANOX_CONNECTX4VF:
-   case PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF:
-   case PCI_DEVICE_ID_MELLANOX_CONNECTX5VF:
-   case PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF:
-   case PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF:
-   case PCI_DEVICE_ID_MELLANOX_CONNECTX6VF:
-   case PCI_DEVICE_ID_MELLANOX_CONNECTXVF:
-   dev_config_vf = 1;
-   break;
-   default:
-   dev_config_vf = 0;
-   break;
-   }
if (eth_da.type != RTE_ETH_REPRESENTOR_NONE) {
/* Set devargs default values. */
if (eth_da.nb_mh_controllers == 0) {
@@ -2459,7 +2443,7 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev,
 
/* Default configuration. */
mlx5_os_config_default(&dev_config, &cdev->config);
-   dev_config.vf = dev_config_vf;
+   dev_config.vf = mlx5_dev_is_vf_pci(pci_dev);
list[i].eth_dev = mlx5_dev_spawn(cdev->dev, &list[i],
 &dev_config, ð_da);
if (!list[i].eth_dev) {
diff --git a/drivers/net/mlx5/windows/mlx5_os.c 
b/drivers/net/mlx5/windows/mlx5_os.c
index 31f0247be7..178e58b4d7 100644
--- a/drivers/net/mlx5/windows/mlx5_os.c
+++ b/drivers/net/mlx5/windows/mlx5_o

[PATCH 13/20] net/mlx5: add E-switch mode flag

2022-01-27 Thread Michael Baum
This patch adds to the SH structure a flag which indicates whether the
device is in E-Switch mode.
When "dv_esw_en" is configured from devargs, it is enabled only in
E-Switch mode. So, once "dv_esw_en" has been configured, it is enough
to check whether "dv_esw_en" is valid.
This patch also removes the E-Switch mode check from places where
"dv_esw_en" is already checked.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c | 14 +-
 drivers/net/mlx5/mlx5.c  |  1 +
 drivers/net/mlx5/mlx5.h  |  1 +
 drivers/net/mlx5/mlx5_ethdev.c   |  4 ++--
 drivers/net/mlx5/mlx5_flow_dv.c  | 12 +++-
 drivers/net/mlx5/mlx5_trigger.c  |  5 ++---
 6 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 9a05c1ba44..47b088db83 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -951,10 +951,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
if (!sh)
return NULL;
/* Update final values for devargs before check sibling config. */
-   if (config->dv_miss_info) {
-   if (switch_info->master || switch_info->representor)
-   config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
-   }
 #if !defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_MLX5DV_DR)
if (config->dv_flow_en) {
DRV_LOG(WARNING, "DV flow is not supported.");
@@ -962,12 +958,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
}
 #endif
 #ifdef HAVE_MLX5DV_DR_ESWITCH
-   if (!(hca_attr->eswitch_manager && config->dv_flow_en &&
- (switch_info->representor || switch_info->master)))
+   if (!(hca_attr->eswitch_manager && config->dv_flow_en && sh->esw_mode))
config->dv_esw_en = 0;
 #else
config->dv_esw_en = 0;
 #endif
+   if (config->dv_miss_info && config->dv_esw_en)
+   config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
if (!config->dv_esw_en &&
config->dv_xmeta_en != MLX5_XMETA_MODE_LEGACY) {
DRV_LOG(WARNING,
@@ -1133,7 +1130,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 * register to match on vport index. The engaged part of metadata
 * register is defined by mask.
 */
-   if (switch_info->representor || switch_info->master) {
+   if (sh->esw_mode) {
err = mlx5_glue->devx_port_query(sh->cdev->ctx,
 spawn->phys_port,
 &vport_info);
@@ -1164,8 +1161,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
}
if (vport_info.query_flags & MLX5_PORT_QUERY_VPORT) {
priv->vport_id = vport_info.vport_id;
-   } else if (spawn->pf_bond >= 0 &&
-  (switch_info->representor || switch_info->master)) {
+   } else if (spawn->pf_bond >= 0 && sh->esw_mode) {
DRV_LOG(ERR,
"Cannot deduce vport index for port %d on bonding 
device %s",
spawn->phys_port, spawn->phys_dev_name);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 27bcca9012..e1fe8f9375 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1259,6 +1259,7 @@ mlx5_alloc_shared_dev_ctx(const struct 
mlx5_dev_spawn_data *spawn,
pthread_mutex_init(&sh->txpp.mutex, NULL);
sh->numa_node = spawn->cdev->dev->numa_node;
sh->cdev = spawn->cdev;
+   sh->esw_mode = !!(spawn->info.master || spawn->info.representor);
if (spawn->bond_info)
sh->bond = *spawn->bond_info;
err = mlx5_os_get_dev_attr(sh->cdev, &sh->device_attr);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d69b6a357b..a713e61572 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1146,6 +1146,7 @@ struct mlx5_flex_item {
 struct mlx5_dev_ctx_shared {
LIST_ENTRY(mlx5_dev_ctx_shared) next;
uint32_t refcnt;
+   uint32_t esw_mode:1; /* Whether is E-Switch mode. */
uint32_t flow_hit_aso_en:1; /* Flow Hit ASO is supported. */
uint32_t steering_format_version:4;
/* Indicates the device steering logic format. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 801c467bba..06d5acb75f 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -672,7 +672,7 @@ mlx5_port_to_eswitch_info(uint16_t port, bool valid)
}
dev = &rte_eth_devices[port];
priv = dev->data->dev_private;
-   if (!(priv->representor || priv->master)) {
+   if (!priv->sh->esw_mode) {
rte_errno = EINVAL;
return NULL;
}
@@ -699,7 +699,7 @@ mlx5_dev_to_eswitch_info(struct rte_eth_dev *dev)
struct mlx5_priv *priv;
 
priv = dev->data->dev_private;
-   if (!(priv->representor || priv->master)) {
+   if (!priv->sh->esw_mode) {

[PATCH 14/20] net/mlx5: rearrange device attribute structure

2022-01-27 Thread Michael Baum
Rearrange the mlx5_os_get_dev_attr() function so that it first
executes the queries and only then updates the fields.
In addition, rename it in preparation for expanding its operations to
configure the capabilities inside it.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c| 122 +---
 drivers/net/mlx5/linux/mlx5_verbs.c |   5 +-
 drivers/net/mlx5/mlx5.c |   4 +-
 drivers/net/mlx5/mlx5.h |  56 ++---
 drivers/net/mlx5/mlx5_devx.c|   2 +-
 drivers/net/mlx5/mlx5_ethdev.c  |   5 +-
 drivers/net/mlx5/mlx5_trigger.c |   8 +-
 drivers/net/mlx5/mlx5_txq.c |  18 ++--
 drivers/net/mlx5/windows/mlx5_os.c  |  67 ++-
 9 files changed, 127 insertions(+), 160 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 47b088db83..b6848fc34c 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -131,46 +131,25 @@ mlx5_os_set_nonblock_channel_fd(int fd)
  * with out parameter of type 'struct ibv_device_attr_ex *'. Then fill in mlx5
  * device attributes from the glue out parameter.
  *
- * @param cdev
- *   Pointer to mlx5 device.
- *
- * @param device_attr
- *   Pointer to mlx5 device attributes.
+ * @param sh
+ *   Pointer to shared device context.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_os_get_dev_attr(struct mlx5_common_device *cdev,
-struct mlx5_dev_attr *device_attr)
+mlx5_os_capabilities_prepare(struct mlx5_dev_ctx_shared *sh)
 {
int err;
-   struct ibv_context *ctx = cdev->ctx;
+   struct ibv_context *ctx = sh->cdev->ctx;
struct ibv_device_attr_ex attr_ex;
+   struct mlx5dv_context dv_attr = { .comp_mask = 0 };
 
-   memset(device_attr, 0, sizeof(*device_attr));
err = mlx5_glue->query_device_ex(ctx, NULL, &attr_ex);
if (err) {
rte_errno = errno;
return -rte_errno;
}
-   device_attr->device_cap_flags_ex = attr_ex.device_cap_flags_ex;
-   device_attr->max_qp_wr = attr_ex.orig_attr.max_qp_wr;
-   device_attr->max_sge = attr_ex.orig_attr.max_sge;
-   device_attr->max_cq = attr_ex.orig_attr.max_cq;
-   device_attr->max_cqe = attr_ex.orig_attr.max_cqe;
-   device_attr->max_mr = attr_ex.orig_attr.max_mr;
-   device_attr->max_pd = attr_ex.orig_attr.max_pd;
-   device_attr->max_qp = attr_ex.orig_attr.max_qp;
-   device_attr->max_srq = attr_ex.orig_attr.max_srq;
-   device_attr->max_srq_wr = attr_ex.orig_attr.max_srq_wr;
-   device_attr->raw_packet_caps = attr_ex.raw_packet_caps;
-   device_attr->max_rwq_indirection_table_size =
-   attr_ex.rss_caps.max_rwq_indirection_table_size;
-   device_attr->max_tso = attr_ex.tso_caps.max_tso;
-   device_attr->tso_supported_qpts = attr_ex.tso_caps.supported_qpts;
-
-   struct mlx5dv_context dv_attr = { .comp_mask = 0 };
 #ifdef HAVE_IBV_MLX5_MOD_SWP
dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_SWP;
 #endif
@@ -185,31 +164,40 @@ mlx5_os_get_dev_attr(struct mlx5_common_device *cdev,
rte_errno = errno;
return -rte_errno;
}
-
-   device_attr->flags = dv_attr.flags;
-   device_attr->comp_mask = dv_attr.comp_mask;
+   memset(&sh->dev_cap, 0, sizeof(struct mlx5_dev_cap));
+   sh->dev_cap.device_cap_flags_ex = attr_ex.device_cap_flags_ex;
+   sh->dev_cap.max_qp_wr = attr_ex.orig_attr.max_qp_wr;
+   sh->dev_cap.max_sge = attr_ex.orig_attr.max_sge;
+   sh->dev_cap.max_cq = attr_ex.orig_attr.max_cq;
+   sh->dev_cap.max_qp = attr_ex.orig_attr.max_qp;
+   sh->dev_cap.raw_packet_caps = attr_ex.raw_packet_caps;
+   sh->dev_cap.max_rwq_indirection_table_size =
+   attr_ex.rss_caps.max_rwq_indirection_table_size;
+   sh->dev_cap.max_tso = attr_ex.tso_caps.max_tso;
+   sh->dev_cap.tso_supported_qpts = attr_ex.tso_caps.supported_qpts;
+   strlcpy(sh->dev_cap.fw_ver, attr_ex.orig_attr.fw_ver,
+   sizeof(sh->dev_cap.fw_ver));
+   sh->dev_cap.flags = dv_attr.flags;
+   sh->dev_cap.comp_mask = dv_attr.comp_mask;
 #ifdef HAVE_IBV_MLX5_MOD_SWP
-   device_attr->sw_parsing_offloads =
+   sh->dev_cap.sw_parsing_offloads =
dv_attr.sw_parsing_caps.sw_parsing_offloads;
 #endif
 #ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
-   device_attr->min_single_stride_log_num_of_bytes =
+   sh->dev_cap.min_single_stride_log_num_of_bytes =
dv_attr.striding_rq_caps.min_single_stride_log_num_of_bytes;
-   device_attr->max_single_stride_log_num_of_bytes =
+   sh->dev_cap.max_single_stride_log_num_of_bytes =
dv_attr.striding_rq_caps.max_single_stride_log_num_of_bytes;
-   device_attr->min_single_wqe_log_num_of_strides =
+   sh->dev_cap.min_single_wqe_log_num_o

[PATCH 12/20] net/mlx5: share counter config function

2022-01-27 Thread Michael Baum
The mlx5_flow_counter_mode_config function exists for both Linux and
Windows with the same name and content.
This patch moves its implementation to the folder shared between the
operating systems, removing the duplication.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c   | 40 --
 drivers/net/mlx5/mlx5.c| 40 ++
 drivers/net/mlx5/mlx5.h|  1 +
 drivers/net/mlx5/windows/mlx5_os.c | 40 --
 4 files changed, 41 insertions(+), 80 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index bb90cc4426..9a05c1ba44 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -673,46 +673,6 @@ mlx5_init_once(void)
return ret;
 }
 
-/**
- * DV flow counter mode detect and config.
- *
- * @param dev
- *   Pointer to rte_eth_dev structure.
- *
- */
-static void
-mlx5_flow_counter_mode_config(struct rte_eth_dev *dev __rte_unused)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-   struct mlx5_priv *priv = dev->data->dev_private;
-   struct mlx5_dev_ctx_shared *sh = priv->sh;
-   struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
-   bool fallback;
-
-#ifndef HAVE_IBV_DEVX_ASYNC
-   fallback = true;
-#else
-   fallback = false;
-   if (!sh->cdev->config.devx || !priv->config.dv_flow_en ||
-   !hca_attr->flow_counters_dump ||
-   !(hca_attr->flow_counter_bulk_alloc_bitmap & 0x4) ||
-   (mlx5_flow_dv_discover_counter_offset_support(dev) == -ENOTSUP))
-   fallback = true;
-#endif
-   if (fallback)
-   DRV_LOG(INFO, "Use fall-back DV counter management. Flow "
-   "counter dump:%d, bulk_alloc_bitmap:0x%hhx.",
-   hca_attr->flow_counters_dump,
-   hca_attr->flow_counter_bulk_alloc_bitmap);
-   /* Initialize fallback mode only on the port initializes sh. */
-   if (sh->refcnt == 1)
-   sh->cmng.counter_fallback = fallback;
-   else if (fallback != sh->cmng.counter_fallback)
-   DRV_LOG(WARNING, "Port %d in sh has different fallback mode "
-   "with others:%d.", PORT_ID(priv), fallback);
-#endif
-}
-
 /**
  * DR flow drop action support detect.
  *
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 5146359100..27bcca9012 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -513,6 +513,46 @@ mlx5_flow_aging_init(struct mlx5_dev_ctx_shared *sh)
}
 }
 
+/**
+ * DV flow counter mode detect and config.
+ *
+ * @param dev
+ *   Pointer to rte_eth_dev structure.
+ *
+ */
+void
+mlx5_flow_counter_mode_config(struct rte_eth_dev *dev __rte_unused)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+   struct mlx5_priv *priv = dev->data->dev_private;
+   struct mlx5_dev_ctx_shared *sh = priv->sh;
+   struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
+   bool fallback;
+
+#ifndef HAVE_IBV_DEVX_ASYNC
+   fallback = true;
+#else
+   fallback = false;
+   if (!sh->cdev->config.devx || !priv->config.dv_flow_en ||
+   !hca_attr->flow_counters_dump ||
+   !(hca_attr->flow_counter_bulk_alloc_bitmap & 0x4) ||
+   (mlx5_flow_dv_discover_counter_offset_support(dev) == -ENOTSUP))
+   fallback = true;
+#endif
+   if (fallback)
+   DRV_LOG(INFO, "Use fall-back DV counter management. Flow "
+   "counter dump:%d, bulk_alloc_bitmap:0x%hhx.",
+   hca_attr->flow_counters_dump,
+   hca_attr->flow_counter_bulk_alloc_bitmap);
+   /* Initialize fallback mode only on the port initializes sh. */
+   if (sh->refcnt == 1)
+   sh->cmng.counter_fallback = fallback;
+   else if (fallback != sh->cmng.counter_fallback)
+   DRV_LOG(WARNING, "Port %d in sh has different fallback mode "
+   "with others:%d.", PORT_ID(priv), fallback);
+#endif
+}
+
 /**
  * Initialize the counters management structure.
  *
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0f90d757e9..d69b6a357b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1535,6 +1535,7 @@ int mlx5_dev_check_sibling_config(struct 
mlx5_dev_ctx_shared *sh,
  struct rte_device *dpdk_dev);
 bool mlx5_flex_parser_ecpri_exist(struct rte_eth_dev *dev);
 int mlx5_flex_parser_ecpri_alloc(struct rte_eth_dev *dev);
+void mlx5_flow_counter_mode_config(struct rte_eth_dev *dev);
 int mlx5_flow_aso_age_mng_init(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_flow_mtrs_mng_init(struct mlx5_dev_ctx_shared *sh);
 int mlx5_flow_aso_ct_mng_init(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/windows/mlx5_os.c 
b/drivers/net/mlx5/windows/mlx5_os.c
index a9c7ba2a14..eaa63ad50f 100644
--- a/drivers/net/mlx5/windows

[PATCH 15/20] net/mlx5: concentrate all device configurations

2022-01-27 Thread Michael Baum
Move all device configuration to be performed by the
mlx5_os_cap_config() function instead of the spawn function.
In addition, move all relevant fields from the mlx5_dev_config
structure to mlx5_dev_cap.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c  | 497 +-
 drivers/net/mlx5/linux/mlx5_vlan_os.c |   3 +-
 drivers/net/mlx5/mlx5.c   |  11 +-
 drivers/net/mlx5/mlx5.h   |  78 ++--
 drivers/net/mlx5/mlx5_devx.c  |   6 +-
 drivers/net/mlx5/mlx5_ethdev.c|  12 +-
 drivers/net/mlx5/mlx5_flow.c  |   4 +-
 drivers/net/mlx5/mlx5_rxmode.c|   8 +-
 drivers/net/mlx5/mlx5_rxq.c   |  34 +-
 drivers/net/mlx5/mlx5_trigger.c   |   5 +-
 drivers/net/mlx5/mlx5_txq.c   |  36 +-
 drivers/net/mlx5/mlx5_vlan.c  |   4 +-
 drivers/net/mlx5/windows/mlx5_os.c| 101 +++---
 13 files changed, 380 insertions(+), 419 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index b6848fc34c..13db399b5e 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -141,11 +141,12 @@ int
 mlx5_os_capabilities_prepare(struct mlx5_dev_ctx_shared *sh)
 {
int err;
-   struct ibv_context *ctx = sh->cdev->ctx;
+   struct mlx5_common_device *cdev = sh->cdev;
+   struct mlx5_hca_attr *hca_attr = &cdev->config.hca_attr;
struct ibv_device_attr_ex attr_ex;
struct mlx5dv_context dv_attr = { .comp_mask = 0 };
 
-   err = mlx5_glue->query_device_ex(ctx, NULL, &attr_ex);
+   err = mlx5_glue->query_device_ex(cdev->ctx, NULL, &attr_ex);
if (err) {
rte_errno = errno;
return -rte_errno;
@@ -159,45 +160,229 @@ mlx5_os_capabilities_prepare(struct mlx5_dev_ctx_shared 
*sh)
 #ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_STRIDING_RQ;
 #endif
-   err = mlx5_glue->dv_query_device(ctx, &dv_attr);
+   err = mlx5_glue->dv_query_device(cdev->ctx, &dv_attr);
if (err) {
rte_errno = errno;
return -rte_errno;
}
memset(&sh->dev_cap, 0, sizeof(struct mlx5_dev_cap));
-   sh->dev_cap.device_cap_flags_ex = attr_ex.device_cap_flags_ex;
+   if (mlx5_dev_is_pci(cdev->dev))
+   sh->dev_cap.vf = mlx5_dev_is_vf_pci(RTE_DEV_TO_PCI(cdev->dev));
+   else
+   sh->dev_cap.sf = 1;
sh->dev_cap.max_qp_wr = attr_ex.orig_attr.max_qp_wr;
sh->dev_cap.max_sge = attr_ex.orig_attr.max_sge;
sh->dev_cap.max_cq = attr_ex.orig_attr.max_cq;
sh->dev_cap.max_qp = attr_ex.orig_attr.max_qp;
-   sh->dev_cap.raw_packet_caps = attr_ex.raw_packet_caps;
-   sh->dev_cap.max_rwq_indirection_table_size =
-   attr_ex.rss_caps.max_rwq_indirection_table_size;
-   sh->dev_cap.max_tso = attr_ex.tso_caps.max_tso;
-   sh->dev_cap.tso_supported_qpts = attr_ex.tso_caps.supported_qpts;
+#ifdef HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR
+   sh->dev_cap.dest_tir = 1;
+#endif
+#if defined(HAVE_IBV_FLOW_DV_SUPPORT) && defined(HAVE_MLX5DV_DR)
+   DRV_LOG(DEBUG, "DV flow is supported.");
+   sh->dev_cap.dv_flow_en = 1;
+#endif
+#ifdef HAVE_MLX5DV_DR_ESWITCH
+   if (hca_attr->eswitch_manager && sh->dev_cap.dv_flow_en && sh->esw_mode)
+   sh->dev_cap.dv_esw_en = 1;
+#endif
+   /*
+* Multi-packet send is supported by ConnectX-4 Lx PF as well
+* as all ConnectX-5 devices.
+*/
+   if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) {
+   if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW) {
+   DRV_LOG(DEBUG, "Enhanced MPW is supported.");
+   sh->dev_cap.mps = MLX5_MPW_ENHANCED;
+   } else {
+   DRV_LOG(DEBUG, "MPW is supported.");
+   sh->dev_cap.mps = MLX5_MPW;
+   }
+   } else {
+   DRV_LOG(DEBUG, "MPW isn't supported.");
+   sh->dev_cap.mps = MLX5_MPW_DISABLED;
+   }
+#if (RTE_CACHE_LINE_SIZE == 128)
+   if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP)
+   sh->dev_cap.cqe_comp = 1;
+   DRV_LOG(DEBUG, "Rx CQE 128B compression is %ssupported.",
+   sh->dev_cap.cqe_comp ? "" : "not ");
+#else
+   sh->dev_cap.cqe_comp = 1;
+#endif
+#ifdef HAVE_IBV_DEVICE_MPLS_SUPPORT
+   sh->dev_cap.mpls_en =
+   ((dv_attr.tunnel_offloads_caps &
+ MLX5DV_RAW_PACKET_CAP_TUNNELED_OFFLOAD_CW_MPLS_OVER_GRE) &&
+(dv_attr.tunnel_offloads_caps &
+ MLX5DV_RAW_PACKET_CAP_TUNNELED_OFFLOAD_CW_MPLS_OVER_UDP));
+   DRV_LOG(DEBUG, "MPLS over GRE/UDP tunnel offloading is %ssupported.",
+   sh->dev_cap.mpls_en ? "" : "not ");
+#else
+   DRV_LOG(WARNING,
+   "MPLS over GRE/UDP tunnel offloading disabled due to old 
OFED/rdma-core version o

[PATCH 16/20] net/mlx5: add share device context config structure

2022-01-27 Thread Michael Baum
Add a configuration structure for the shared device context. This
structure contains all configurations coming from devargs which are
oriented to the device. It is a field of the shared device context (SH)
structure and is updated once, in the mlx5_alloc_shared_dev_ctx()
function.
This structure cannot be changed when probing again, so add a function
to prevent that. The mlx5_probe_again_args_validate() function creates
a temporary IB context configuration structure according to the new
devargs attached in probing again, then checks for a match between the
temporary structure and the existing IB context configuration structure.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c   |  95 ++
 drivers/net/mlx5/mlx5.c| 453 +
 drivers/net/mlx5/mlx5.h|  43 +--
 drivers/net/mlx5/mlx5_ethdev.c |   3 +-
 drivers/net/mlx5/mlx5_flow.c   |  30 +-
 drivers/net/mlx5/mlx5_flow.h   |   2 +-
 drivers/net/mlx5/mlx5_flow_dv.c|  45 +--
 drivers/net/mlx5/mlx5_flow_meter.c |  10 +-
 drivers/net/mlx5/mlx5_rxq.c|   7 +-
 drivers/net/mlx5/mlx5_trigger.c|  10 +-
 drivers/net/mlx5/mlx5_txpp.c   |  12 +-
 drivers/net/mlx5/mlx5_txq.c|   2 +-
 drivers/net/mlx5/windows/mlx5_os.c |  35 +--
 13 files changed, 457 insertions(+), 290 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 13db399b5e..50cc287e73 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -436,7 +436,7 @@ __mlx5_discovery_misc5_cap(struct mlx5_priv *priv)
dv_attr.priority = 3;
 #ifdef HAVE_MLX5DV_DR_ESWITCH
void *misc2_m;
-   if (priv->config.dv_esw_en) {
+   if (priv->sh->config.dv_esw_en) {
/* FDB enabled reg_c_0 */
dv_attr.match_criteria_enable |=
(1 << MLX5_MATCH_CRITERIA_ENABLE_MISC2_BIT);
@@ -557,7 +557,7 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
}
sh->tx_domain = domain;
 #ifdef HAVE_MLX5DV_DR_ESWITCH
-   if (priv->config.dv_esw_en) {
+   if (sh->config.dv_esw_en) {
domain = mlx5_glue->dr_create_domain(sh->cdev->ctx,
 MLX5DV_DR_DOMAIN_TYPE_FDB);
if (!domain) {
@@ -579,20 +579,20 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
goto error;
}
 #endif
-   if (!sh->tunnel_hub && priv->config.dv_miss_info)
+   if (!sh->tunnel_hub && sh->config.dv_miss_info)
err = mlx5_alloc_tunnel_hub(sh);
if (err) {
DRV_LOG(ERR, "mlx5_alloc_tunnel_hub failed err=%d", err);
goto error;
}
-   if (priv->config.reclaim_mode == MLX5_RCM_AGGR) {
+   if (sh->config.reclaim_mode == MLX5_RCM_AGGR) {
mlx5_glue->dr_reclaim_domain_memory(sh->rx_domain, 1);
mlx5_glue->dr_reclaim_domain_memory(sh->tx_domain, 1);
if (sh->fdb_domain)
mlx5_glue->dr_reclaim_domain_memory(sh->fdb_domain, 1);
}
sh->pop_vlan_action = mlx5_glue->dr_create_flow_action_pop_vlan();
-   if (!priv->config.allow_duplicate_pattern) {
+   if (!sh->config.allow_duplicate_pattern) {
 #ifndef HAVE_MLX5_DR_ALLOW_DUPLICATE
DRV_LOG(WARNING, "Disallow duplicate pattern is not supported - 
maybe old rdma-core version?");
 #endif
@@ -859,7 +859,7 @@ mlx5_flow_drop_action_config(struct rte_eth_dev *dev 
__rte_unused)
 #ifdef HAVE_MLX5DV_DR
struct mlx5_priv *priv = dev->data->dev_private;
 
-   if (!priv->config.dv_flow_en || !priv->sh->dr_drop_action)
+   if (!priv->sh->config.dv_flow_en || !priv->sh->dr_drop_action)
return;
/**
 * DR supports drop action placeholder when it is supported;
@@ -1115,31 +1115,9 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
strerror(rte_errno));
return NULL;
}
-   sh = mlx5_alloc_shared_dev_ctx(spawn, config);
+   sh = mlx5_alloc_shared_dev_ctx(spawn);
if (!sh)
return NULL;
-   /* Update final values for devargs before check sibling config. */
-   if (config->dv_flow_en && !sh->dev_cap.dv_flow_en) {
-   DRV_LOG(WARNING, "DV flow is not supported.");
-   config->dv_flow_en = 0;
-   }
-   if (config->dv_esw_en && !sh->dev_cap.dv_esw_en) {
-   DRV_LOG(WARNING, "E-Switch DV flow is not supported.");
-   config->dv_esw_en = 0;
-   }
-   if (config->dv_miss_info && config->dv_esw_en)
-   config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
-   if (!config->dv_esw_en &&
-   config->dv_xmeta_en != MLX5_XMETA_MODE_LEGACY) {
-   DRV_LOG(WARNING,
-   "Metadata mode %u is not supported (no E-Switch).",
-   config->dv_xmeta_en);
-   config->dv_xmeta_en = MLX5_XM

[PATCH 17/20] net/mlx5: using function to detect operation by DevX

2022-01-27 Thread Michael Baum
Add an inline function indicating whether HW object operations can be
created by DevX. It makes the code more readable.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c |  6 ++
 drivers/net/mlx5/mlx5.h  | 24 
 drivers/net/mlx5/mlx5_ethdev.c   |  3 +--
 drivers/net/mlx5/mlx5_trigger.c  |  3 +--
 4 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 50cc287e73..c432cf0858 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -370,8 +370,7 @@ mlx5_os_capabilities_prepare(struct mlx5_dev_ctx_shared *sh)
sh->dev_cap.txpp_en = 0;
 #endif
/* Check for LRO support. */
-   if (sh->dev_cap.dest_tir && sh->dev_cap.dv_flow_en &&
-   hca_attr->lro_cap) {
+   if (mlx5_devx_obj_ops_en(sh) && hca_attr->lro_cap) {
/* TBD check tunnel lro caps. */
sh->dev_cap.lro_supported = 1;
DRV_LOG(DEBUG, "Device supports LRO.");
@@ -1550,8 +1549,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
if (mlx5_flex_item_port_init(eth_dev) < 0)
goto error;
}
-   if (sh->cdev->config.devx && sh->config.dv_flow_en &&
-   sh->dev_cap.dest_tir) {
+   if (mlx5_devx_obj_ops_en(sh)) {
priv->obj_ops = devx_obj_ops;
mlx5_queue_counter_id_prepare(eth_dev);
priv->obj_ops.lb_dummy_queue_create =
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5ca48ef68f..46fa5131a7 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1496,6 +1496,30 @@ enum dr_dump_rec_type {
DR_DUMP_REC_TYPE_PMD_COUNTER = 4430,
 };
 
+/**
+ * Indicates whether HW objects operations can be created by DevX.
+ *
+ * This function is used for both:
+ *  Before creation - deciding whether to create HW objects operations by DevX.
+ *  After creation - indicator if HW objects operations were created by DevX.
+ *
+ * @param sh
+ *   Pointer to shared device context.
+ *
+ * @return
+ *   True if HW objects were created by DevX, False otherwise.
+ */
+static inline bool
+mlx5_devx_obj_ops_en(struct mlx5_dev_ctx_shared *sh)
+{
+   /*
+* When advanced DR API is available and DV flow is supported and
+* DevX is supported, HW objects operations are created by DevX.
+*/
+   return (sh->cdev->config.devx && sh->config.dv_flow_en &&
+   sh->dev_cap.dest_tir);
+}
+
 /* mlx5.c */
 
 int mlx5_getenv_int(const char *);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 9e478db8df..d637dee98d 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -721,8 +721,7 @@ mlx5_hairpin_cap_get(struct rte_eth_dev *dev, struct 
rte_eth_hairpin_cap *cap)
 {
struct mlx5_priv *priv = dev->data->dev_private;
 
-   if (!priv->sh->cdev->config.devx || !priv->sh->dev_cap.dest_tir ||
-   !priv->sh->config.dv_flow_en) {
+   if (!mlx5_devx_obj_ops_en(priv->sh)) {
rte_errno = ENOTSUP;
return -rte_errno;
}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index eb03e9f7b1..e234d11215 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1104,8 +1104,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
dev->data->port_id, strerror(rte_errno));
goto error;
}
-   if ((priv->sh->cdev->config.devx && priv->sh->config.dv_flow_en &&
-priv->sh->dev_cap.dest_tir) &&
+   if (mlx5_devx_obj_ops_en(priv->sh) &&
priv->obj_ops.lb_dummy_queue_create) {
ret = priv->obj_ops.lb_dummy_queue_create(dev);
if (ret)
-- 
2.25.1



[PATCH 19/20] common/mlx5: add check for common devargs in probing again

2022-01-27 Thread Michael Baum
MLX5 common driver supports probing again in two scenarios:
 - Add a new driver under an existing device. The common probe function
   gets it in devargs, then calls the requested driver's probe function
   (regardless of the driver's own support for probing again) with the
   existing device as parameter.
 - Transfer the probing-again support of the drivers themselves
   (currently only net). In this scenario, the existing device is sent
   as a parameter to the existing driver's probe too.

In both cases it gets a new set of arguments that do not necessarily
match the arguments configured in the existing device.
Some of the arguments belong to the configuration of the existing
device, so they cannot be updated when probing again. On the other
hand, there are arguments that belong to a specific driver or a
specific port and might get a new value when probing again.
The user might pass any argument when probing again, but arguments
belonging to the common device configuration have no effect.
This patch adds an explicit check for the devargs belonging to the
common device configuration. If there is no match to the existing
configuration, it returns an error.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/common/mlx5/mlx5_common.c | 100 ++
 1 file changed, 100 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_common.c 
b/drivers/common/mlx5/mlx5_common.c
index 47a541f5ef..f74d27e74d 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -613,6 +613,86 @@ mlx5_common_dev_create(struct rte_device *eal_dev, 
uint32_t classes)
return cdev;
 }
 
+/**
+ * Validate common devargs when probing again.
+ *
+ * When probing the common device again, its configurations cannot change.
+ * If the user requests incompatible configurations in devargs, it is an error.
+ * This function checks the match between:
+ *  - Common device configurations requested by probe again devargs.
+ *  - Existing common device configurations.
+ *
+ * @param cdev
+ *   Pointer to mlx5 device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_common_probe_again_args_validate(struct mlx5_common_device *cdev)
+{
+   struct mlx5_common_dev_config *config;
+   int ret;
+
+   /* Secondary process should not handle devargs. */
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+   return 0;
+   /* Probe again doesn't have to generate devargs. */
+   if (cdev->dev->devargs == NULL)
+   return 0;
+   config = mlx5_malloc(MLX5_MEM_ZERO | MLX5_MEM_RTE,
+sizeof(struct mlx5_common_dev_config),
+RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+   if (config == NULL) {
+   rte_errno = ENOMEM;
+   return -rte_errno;
+   }
+   /*
+* Creates a temporary common configure structure according to new
+* devargs attached in probing again.
+*/
+   ret = mlx5_common_config_get(cdev->dev->devargs, config);
+   if (ret) {
+   DRV_LOG(ERR, "Failed to process device configure: %s",
+   strerror(rte_errno));
+   mlx5_free(config);
+   return ret;
+   }
+   /*
+* Checks the match between the temporary structure and the existing
+* common device structure.
+*/
+   if (cdev->config.mr_ext_memseg_en ^ config->mr_ext_memseg_en) {
+   DRV_LOG(ERR, "\"mr_ext_memseg_en\" "
+   "configuration mismatch for device %s.",
+   cdev->dev->name);
+   goto error;
+   }
+   if (cdev->config.mr_mempool_reg_en ^ config->mr_mempool_reg_en) {
+   DRV_LOG(ERR, "\"mr_mempool_reg_en\" "
+   "configuration mismatch for device %s.",
+   cdev->dev->name);
+   goto error;
+   }
+   if (cdev->config.sys_mem_en ^ config->sys_mem_en) {
+   DRV_LOG(ERR,
+   "\"sys_mem_en\" configuration mismatch for device %s.",
+   cdev->dev->name);
+   goto error;
+   }
+   if (cdev->config.dbnc ^ config->dbnc) {
+   DRV_LOG(ERR, "\"dbnc\" configuration mismatch for device %s.",
+   cdev->dev->name);
+   goto error;
+   }
+   mlx5_free(config);
+   return 0;
+error:
+   mlx5_free(config);
+   rte_errno = EINVAL;
+   return -rte_errno;
+}
+
 static int
 drivers_remove(struct mlx5_common_device *cdev, uint32_t enabled_classes)
 {
@@ -699,12 +779,32 @@ mlx5_common_dev_probe(struct rte_device *eal_dev)
if (classes == 0)
/* Default to net class. */
classes = MLX5_CLASS_ETH;
+   /*
+* MLX5 common driver supports probing again in two scenarios:
+* - Add new 

[PATCH 18/20] net/mlx5: separate per port configuration

2022-01-27 Thread Michael Baum
Add a configuration structure for the port (ethdev). This structure
contains all configurations coming from devargs which are oriented to
the port. It is a field of the mlx5_priv structure and is updated in
the spawn function for each port.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c   | 121 ++--
 drivers/net/mlx5/mlx5.c| 178 -
 drivers/net/mlx5/mlx5.h|  21 ++--
 drivers/net/mlx5/mlx5_devx.c   |   3 +-
 drivers/net/mlx5/mlx5_ethdev.c |   7 +-
 drivers/net/mlx5/mlx5_rxq.c|   4 +-
 drivers/net/mlx5/mlx5_tx.c |   2 +-
 drivers/net/mlx5/mlx5_txq.c|   6 +-
 drivers/net/mlx5/windows/mlx5_os.c |  55 ++---
 9 files changed, 188 insertions(+), 209 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index c432cf0858..6979385782 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -999,8 +999,6 @@ mlx5_representor_match(struct mlx5_dev_spawn_data *spawn,
  *   Backing DPDK device.
  * @param spawn
  *   Verbs device parameters (name, port, switch_info) to spawn.
- * @param config
- *   Device configuration parameters.
  * @param eth_da
  *   Device arguments.
  *
@@ -1014,12 +1012,10 @@ mlx5_representor_match(struct mlx5_dev_spawn_data 
*spawn,
 static struct rte_eth_dev *
 mlx5_dev_spawn(struct rte_device *dpdk_dev,
   struct mlx5_dev_spawn_data *spawn,
-  struct mlx5_dev_config *config,
   struct rte_eth_devargs *eth_da)
 {
const struct mlx5_switch_info *switch_info = &spawn->info;
struct mlx5_dev_ctx_shared *sh = NULL;
-   struct mlx5_hca_attr *hca_attr = &spawn->cdev->config.hca_attr;
struct ibv_port_attr port_attr = { .state = IBV_PORT_NOP };
struct rte_eth_dev *eth_dev = NULL;
struct mlx5_priv *priv = NULL;
@@ -1029,7 +1025,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
int own_domain_id = 0;
uint16_t port_id;
struct mlx5_port_info vport_info = { .query_flags = 0 };
-   int nl_rdma = -1;
+   int nl_rdma;
int i;
 
/* Determine if this port representor is supposed to be spawned. */
@@ -1107,13 +1103,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
mlx5_dev_close(eth_dev);
return NULL;
}
-   /* Process parameters. */
-   err = mlx5_args(config, dpdk_dev->devargs);
-   if (err) {
-   DRV_LOG(ERR, "failed to process device arguments: %s",
-   strerror(rte_errno));
-   return NULL;
-   }
sh = mlx5_alloc_shared_dev_ctx(spawn);
if (!sh)
return NULL;
@@ -1269,41 +1258,10 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
DRV_LOG(DEBUG, "dev_port-%u new domain_id=%u\n",
priv->dev_port, priv->domain_id);
}
-   if (config->hw_padding && !sh->dev_cap.hw_padding) {
-   DRV_LOG(DEBUG, "Rx end alignment padding isn't supported");
-   config->hw_padding = 0;
-   } else if (config->hw_padding) {
-   DRV_LOG(DEBUG, "Rx end alignment padding is enabled");
-   }
-   /*
-* MPW is disabled by default, while the Enhanced MPW is enabled
-* by default.
-*/
-   if (config->mps == MLX5_ARG_UNSET)
-   config->mps = (sh->dev_cap.mps == MLX5_MPW_ENHANCED) ?
- MLX5_MPW_ENHANCED : MLX5_MPW_DISABLED;
-   else
-   config->mps = config->mps ? sh->dev_cap.mps : MLX5_MPW_DISABLED;
-   DRV_LOG(INFO, "%sMPS is %s",
-   config->mps == MLX5_MPW_ENHANCED ? "enhanced " :
-   config->mps == MLX5_MPW ? "legacy " : "",
-   config->mps != MLX5_MPW_DISABLED ? "enabled" : "disabled");
if (sh->cdev->config.devx) {
+   struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
+
sh->steering_format_version = hca_attr->steering_format_version;
-   /* LRO is supported only when DV flow enabled. */
-   if (sh->dev_cap.lro_supported && sh->config.dv_flow_en)
-   sh->dev_cap.lro_supported = 0;
-   if (sh->dev_cap.lro_supported) {
-   /*
-* If LRO timeout is not configured by application,
-* use the minimal supported value.
-*/
-   if (!config->lro_timeout)
-   config->lro_timeout =
-  hca_attr->lro_timer_supported_periods[0];
-   DRV_LOG(DEBUG, "LRO session timeout set to %d usec",
-   config->lro_timeout);
-   }
 #if defined(HAVE_MLX5DV_DR) && \
(defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER) || \
 defined(HAVE_MLX5_DR_CREATE_ACTION_ASO))
@@ -1395,39

[PATCH 20/20] common/mlx5: refactor devargs management

2022-01-27 Thread Michael Baum
Improve the devargs handling in two aspects:
 - Parse the devargs string only once.
 - Return an error and report unknown keys.

The common driver parses the devargs string once into a dictionary, then
provides it to all the drivers' probe functions. Each driver marks
within it which keys it has used, then the common driver receives the
updated dictionary and reports any unknown devargs.

Signed-off-by: Michael Baum 
Acked-by: Matan Azrad 
---
 drivers/common/mlx5/mlx5_common.c | 255 +-
 drivers/common/mlx5/mlx5_common.h |  36 +++-
 drivers/common/mlx5/version.map   |   2 +
 drivers/compress/mlx5/mlx5_compress.c |  38 ++--
 drivers/crypto/mlx5/mlx5_crypto.c |  39 ++--
 drivers/net/mlx5/linux/mlx5_os.c  |  47 +++--
 drivers/net/mlx5/mlx5.c   | 212 ++---
 drivers/net/mlx5/mlx5.h   |  14 +-
 drivers/net/mlx5/windows/mlx5_os.c|  18 +-
 drivers/regex/mlx5/mlx5_regex.c   |   3 +-
 drivers/vdpa/mlx5/mlx5_vdpa.c |  32 ++--
 11 files changed, 498 insertions(+), 198 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common.c 
b/drivers/common/mlx5/mlx5_common.c
index f74d27e74d..96906d3f39 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -21,6 +21,24 @@
 
 uint8_t haswell_broadwell_cpu;
 
+/* Driver type key for new device global syntax. */
+#define MLX5_DRIVER_KEY "driver"
+
+/* Enable extending memsegs when creating a MR. */
+#define MLX5_MR_EXT_MEMSEG_EN "mr_ext_memseg_en"
+
+/* Device parameter to configure implicit registration of mempool memory. */
+#define MLX5_MR_MEMPOOL_REG_EN "mr_mempool_reg_en"
+
+/* The default memory allocator used in PMD. */
+#define MLX5_SYS_MEM_EN "sys_mem_en"
+
+/*
+ * Device parameter to force doorbell register mapping
 * to non-cached region, eliminating the extra write memory barrier.
+ */
+#define MLX5_TX_DB_NC "tx_db_nc"
+
 /* In case this is an x86_64 intel processor to check if
  * we should use relaxed ordering.
  */
@@ -92,6 +110,122 @@ driver_get(uint32_t class)
return NULL;
 }
 
+int
+mlx5_kvargs_process(struct mlx5_kvargs_ctrl *mkvlist, const char *const keys[],
+   arg_handler_t handler, void *opaque_arg)
+{
+   const struct rte_kvargs_pair *pair;
+   uint32_t i, j;
+
+   MLX5_ASSERT(mkvlist && mkvlist->kvlist);
+   /* Process parameters. */
+   for (i = 0; i < mkvlist->kvlist->count; i++) {
+   pair = &mkvlist->kvlist->pairs[i];
+   for (j = 0; keys[j] != NULL; ++j) {
+   if (strcmp(pair->key, keys[j]) != 0)
+   continue;
+   if ((*handler)(pair->key, pair->value, opaque_arg) < 0)
+   return -1;
+   mkvlist->is_used[i] = true;
+   break;
+   }
+   }
+   return 0;
+}
+
+/**
+ * Prepare a mlx5 kvargs control.
+ *
+ * @param[out] mkvlist
+ *   Pointer to mlx5 kvargs control.
+ * @param[in] devargs
+ *   The input string containing the key/value associations.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_kvargs_prepare(struct mlx5_kvargs_ctrl *mkvlist,
+   const struct rte_devargs *devargs)
+{
+   struct rte_kvargs *kvlist;
+   uint32_t i;
+
+   if (devargs == NULL)
+   return 0;
+   kvlist = rte_kvargs_parse(devargs->args, NULL);
+   if (kvlist == NULL) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+   /*
+* rte_kvargs_parse() accepts a key without a value, but in mlx5 PMDs
+* we disallow this syntax.
+*/
+   for (i = 0; i < kvlist->count; i++) {
+   const struct rte_kvargs_pair *pair = &kvlist->pairs[i];
+   if (pair->value == NULL || *(pair->value) == '\0') {
+   DRV_LOG(ERR, "Key %s is missing value.", pair->key);
+   rte_kvargs_free(kvlist);
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+   }
+   /* Make sure the whole devargs used array is false. */
+   memset(mkvlist, 0, sizeof(*mkvlist));
+   mkvlist->kvlist = kvlist;
+   DRV_LOG(DEBUG, "Parse successfully %u devargs.",
+   mkvlist->kvlist->count);
+   return 0;
+}
+
+/**
+ * Release a mlx5 kvargs control.
+ *
+ * @param[out] mkvlist
+ *   Pointer to mlx5 kvargs control.
+ */
+static void
+mlx5_kvargs_release(struct mlx5_kvargs_ctrl *mkvlist)
+{
+   if (mkvlist == NULL)
+   return;
+   rte_kvargs_free(mkvlist->kvlist);
+   memset(mkvlist, 0, sizeof(*mkvlist));
+}
+
+/**
+ * Validate device arguments list.
+ * It reports the first unknown parameter.
+ *
+ * @param[in] mkvlist
+ *   Pointer to mlx5 kvargs control.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static in
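The validate-and-mark flow the commit message describes — parse once, let each driver mark the keys it consumed, then report the first unmarked key — can be modeled in standalone C (a simplified sketch with stand-in types; `kv_list` and friends are not the real rte_kvargs structures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct kv_pair { const char *key; const char *value; };

struct kv_list {
	struct kv_pair pairs[8];
	bool is_used[8];	/* mirrors mkvlist->is_used[] */
	size_t count;
};

/* Process every pair whose key appears in 'keys', marking it as used. */
static int
kv_process(struct kv_list *l, const char *const keys[],
	   int (*handler)(const char *, const char *, void *), void *opaque)
{
	for (size_t i = 0; i < l->count; i++) {
		for (size_t j = 0; keys[j] != NULL; j++) {
			if (strcmp(l->pairs[i].key, keys[j]) != 0)
				continue;
			if (handler(l->pairs[i].key, l->pairs[i].value,
				    opaque) < 0)
				return -1;
			l->is_used[i] = true;
			break;
		}
	}
	return 0;
}

/* Return the index of the first key no driver consumed, or -1 if none. */
static int
kv_first_unknown(const struct kv_list *l)
{
	for (size_t i = 0; i < l->count; i++)
		if (!l->is_used[i])
			return (int)i;
	return -1;
}

static int
count_handler(const char *k, const char *v, void *opaque)
{
	(void)k; (void)v;
	++*(int *)opaque;
	return 0;
}
```

After all drivers' probes have run, a single pass over the used flags is enough to flag leftover devargs — the same idea as the patch's validation step.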

Re: [PATCH] devtools: fix comment detection in forbidden token check

2022-01-27 Thread Thomas Monjalon
27/01/2022 11:55, David Marchand:
> After a comment section was detected, passing to a new hunk was not seen
> as ending the section and all subsequent hunks were ignored.
> 
> Fixes: 7413e7f2aeb3 ("devtools: alert on new calls to exit from libs")
> Cc: sta...@dpdk.org
> 
> Reported-by: Thomas Monjalon 
> Signed-off-by: David Marchand 

Applied, thanks





RE: Re: [RFC PATCH v1 0/4] Direct re-arming of buffers on receive side

2022-01-27 Thread Ananyev, Konstantin


> > > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > Sent: Tuesday, 18 January 2022 17.54
> > >
> > > [quick summary: ethdev API to bypass mempool]
> > >
> > > 18/01/2022 16:51, Ferruh Yigit:
> > > > On 12/28/2021 6:55 AM, Feifei Wang wrote:
> > > > > Morten Brørup :
> > > > >> The patch provides a significant performance improvement, but I
> > > > >> am wondering if any real world applications exist that would use
> > > this. Only a
> > > > >> "router on a stick" (i.e. a single-port router) comes to my mind,
> > > and that is
> > > > >> probably sufficient to call it useful in the real world. Do you
> > > have any other
> > > > >> examples to support the usefulness of this patch?
> > > > >>
> > > > > One case I have is about network security. For network firewall,
> > > all packets need
> > > > > to ingress on the specified port and egress on the specified port
> > > to do packet filtering.
> > > > > In this case, we can know flow direction in advance.
> > > >
> > > > I also have some concerns on how useful this API will be in real
> > > life,
> > > > and does the use case worth the complexity it brings.
> > > > And it looks too much low level detail for the application.
> > >
> > > That's difficult to judge.
> > > The use case is limited and the API has some severe limitations.
> > > The benefit is measured with l3fwd, which is not exactly a real app.
> > > Do we want an API which improves performance in limited scenarios at
> > > the cost of breaking some general design assumptions?
> > >
> > > Can we achieve the same level of performance with a mempool trick?
> >
> > Perhaps the mbuf library could offer bulk functions for alloc/free of raw
> > mbufs - essentially a shortcut directly to the mempool library.
> >
> > There might be a few more details to micro-optimize in the mempool library,
> > if approached with this use case in mind. E.g. the
> > rte_mempool_default_cache() could do with a few unlikely() in its
> > comparisons.
> >
> > Also, for this use case, the mempool library adds tracing overhead, which 
> > this
> > API bypasses. And considering how short the code path through the mempool
> > cache is, the tracing overhead is relatively much. I.e.: memcpy(NIC->NIC) 
> > vs.
> > trace() memcpy(NIC->cache) trace() memcpy(cache->NIC).
> >
> > A key optimization point could be the number of mbufs being moved to/from
> > the mempool cache. If that number was fixed at compile time, a faster
> > memcpy() could be used. However, it seems that different PMDs use bursts of
> > either 4, 8, or in this case 32 mbufs. If only they could agree on such a 
> > simple
> > detail.
> This patch removes the stores and loads which saves on the backend cycles. I 
> do not think, other optimizations can do the same.

My thought here was that we can try to introduce for mempool-cache ZC API,
similar to one we have for the ring.
Then on TX free path we wouldn't need to copy mbufs to be freed to temporary 
array on the stack.
Instead we can put them straight from TX SW ring to the mempool cache.
That should save extra store/load for mbuf and might help to achieve some 
performance gain
without by-passing mempool.
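A rough sketch of what such a zero-copy cache put could look like (entirely hypothetical — `zc_put_bulk` is an invented name, and a real implementation would live inside the mempool library and flush to the backing ring on overflow):

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 512

struct toy_cache {
	void *objs[CACHE_SIZE];
	size_t len;
};

/* Reserve 'n' slots in the cache and return a pointer to them, or NULL
 * if the cache cannot absorb the burst.  The caller writes the freed
 * object pointers directly into the returned area, skipping the
 * intermediate array on the stack. */
static void **
zc_put_bulk(struct toy_cache *c, size_t n)
{
	if (c->len + n > CACHE_SIZE)
		return NULL;	/* a real API would flush to the ring here */
	void **slot = &c->objs[c->len];
	c->len += n;
	return slot;
}
```

With an API of this shape, the TX free path could copy mbuf pointers once, TX SW ring to cache, instead of TX SW ring to stack array to cache.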

> 
> >
> > Overall, I strongly agree that it is preferable to optimize the core 
> > libraries,
> > rather than bypass them. Bypassing will eventually lead to "spaghetti code".
> IMO, this is not "spaghetti code". There is no design rule in DPDK that says 
> the RX side must allocate buffers from a mempool or TX side
> must free buffers to a mempool. This patch does not break any modular 
> boundaries. For ex: access internal details of another library.

I also have a few concerns about that approach:
- the proposed implementation breaks the logical boundary between RX/TX code.
  Right now they co-exist independently, and the design of the TX path doesn't
  directly affect the RX path and vice versa. With the proposed approach the RX
  path needs to be aware of TX queue details and the mbuf freeing strategy. So
  if we decide to change the TX code, we probably wouldn't be able to do that
  without affecting the RX path.
  That probably can be fixed by formalizing things a bit more by introducing a
  new dev-ops API:
  eth_dev_tx_queue_free_mbufs(port id, queue id, mbufs_to_free[], ...)
  But that would probably eat up a significant portion of the gain you are
  seeing right now.

- very limited usage scenario - it will have a positive effect only when we 
have a fixed forwarding mapping:
  all (or nearly all) packets from the RX queue are forwarded into the same TX 
queue. 
  Even for l3fwd it doesn’t look like a generic scenario.

- we effectively link RX and TX queues - when this feature is enabled, user 
can't stop TX queue,
  without stopping RX queue first. 
  
  


Re: [PATCH v3 1/1] dma/cnxk: fix installing internal cnxk DMA headers

2022-01-27 Thread Thomas Monjalon
> > DMA module internal header files are currently being installed to the prefix
> > directory. This patch updates DMA meson config file to exclude internal
> > headers during install stage.
> > 
> > Fixes: 53f6d7328b ("dma/cnxk: create and initialize device on PCI probing")
> > Cc: sta...@dpdk.org
> > 
> > Signed-off-by: Srikanth Yalavarthi 
> Acked-by: Radha Mohan Chintakuntla 

Applied, thanks





Re: [EXT] Re: [dpdk-dev] [PATCH v2 2/2] app/testpmd: add queue based pfc CLI options

2022-01-27 Thread Ajit Khaparde
On Thu, Jan 27, 2022 at 2:40 AM Ferruh Yigit  wrote:
>
> On 1/27/2022 7:13 AM, Sunil Kumar Kori wrote:
> >
> >> -Original Message-
> >> From: Ferruh Yigit 
> >> Sent: Tuesday, January 25, 2022 11:07 PM
> >> To: Jerin Jacob Kollanukkaran ; dev@dpdk.org; Xiaoyun
> >> Li ; Aman Singh ; Yuying
> >> Zhang 
> >> Cc: tho...@monjalon.net; ajit.khapa...@broadcom.com;
> >> abo...@pensando.io; andrew.rybche...@oktetlabs.ru;
> >> beilei.x...@intel.com; bruce.richard...@intel.com; ch...@att.com;
> >> chenbo@intel.com; ciara.lof...@intel.com; Devendra Singh Rawat
> >> ; ed.cz...@atomicrules.com;
> >> evge...@amazon.com; gr...@u256.net; g.si...@nxp.com;
> >> zhouguoy...@huawei.com; haiyue.w...@intel.com; Harman Kalra
> >> ; heinrich.k...@corigine.com;
> >> hemant.agra...@nxp.com; hyon...@cisco.com; igo...@amazon.com; Igor
> >> Russkikh ; jgraj...@cisco.com;
> >> jasvinder.si...@intel.com; jianw...@trustnetic.com;
> >> jiawe...@trustnetic.com; jingjing...@intel.com; johnd...@cisco.com;
> >> john.mil...@atomicrules.com; linvi...@tuxdriver.com; keith.wi...@intel.com;
> >> Kiran Kumar Kokkilagadda ;
> >> ouli...@huawei.com; Liron Himi ;
> >> lon...@microsoft.com; m...@semihalf.com; spin...@cesnet.cz;
> >> ma...@nvidia.com; matt.pet...@windriver.com;
> >> maxime.coque...@redhat.com; m...@semihalf.com; humi...@huawei.com;
> >> Pradeep Kumar Nalla ; Nithin Kumar Dabilpuram
> >> ; qiming.y...@intel.com; qi.z.zh...@intel.com;
> >> Radha Chintakuntla ; rahul.lakkire...@chelsio.com;
> >> Rasesh Mody ; rosen...@intel.com;
> >> sachin.sax...@oss.nxp.com; Satha Koteswara Rao Kottidi
> >> ; Shahed Shaikh ;
> >> shaib...@amazon.com; shepard.sie...@atomicrules.com;
> >> asoma...@amd.com; somnath.ko...@broadcom.com;
> >> sthem...@microsoft.com; steven.webs...@windriver.com; Sunil Kumar Kori
> >> ; mtetsu...@gmail.com; Veerasenareddy Burru
> >> ; viachesl...@nvidia.com; xiao.w.w...@intel.com;
> >> cloud.wangxiao...@huawei.com; yisen.zhu...@huawei.com;
> >> yongw...@vmware.com; xuanziya...@huawei.com
> >> Subject: [EXT] Re: [dpdk-dev] [PATCH v2 2/2] app/testpmd: add queue based
> >> pfc CLI options
> >>
> >> External Email
> >>
> >> --
> >> On 1/13/2022 10:27 AM, jer...@marvell.com wrote:
> >>> From: Sunil Kumar Kori 
> >>>
> >>> Patch adds command line options to configure queue based priority flow
> >>> control.
> >>>
> >>> - Syntax command is given as below:
> >>>
> >>> set pfc_queue_ctrl  rx\
> >>> tx
> >>>
> >>
> >> Isn't the order of the paramters odd, it is mixing Rx/Tx config, what about
> >> ordering Rx and Tx paramters?
> >>
> > It's been kept like this to portray config for rx_pause and tx_pause 
> > separately i.e. mode and corresponding config.
> >
>
> What do you mean 'separately'? You need to provide all arguments anyway, 
> right?
>
> I was thinking first have the Rx arguments, later Tx, like:
>
> rxtx
I think this grouping is better.

>
> Am I missing something, is there a benefit of what you did in this patch?

>
> >>> - Example command to configure queue based priority flow control
> >>> on rx and tx side for port 0, Rx queue 0, Tx queue 0 with pause
> >>> time 2047
> >>>
> >>> testpmd> set pfc_queue_ctrl 0 rx on 0 0 tx on 0 0 2047
> >>>
> >>> Signed-off-by: Sunil Kumar Kori 
> >>
> >> <...>
>


RE: [RFC PATCH v1 0/4] Direct re-arming of buffers on receive side

2022-01-27 Thread Morten Brørup
> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> Sent: Thursday, 27 January 2022 05.07
> 
> Thanks Morten, appreciate your comments. Few responses inline.
> 
> > -Original Message-
> > From: Morten Brørup 
> > Sent: Sunday, December 26, 2021 4:25 AM
> >
> > > From: Feifei Wang [mailto:feifei.wa...@arm.com]
> > > Sent: Friday, 24 December 2021 17.46
> > >
> 
> 
> > >
> > > However, this solution poses several constraint:
> > >
> > > 1)The receive queue needs to know which transmit queue it should
> take
> > > the buffers from. The application logic decides which transmit port
> to
> > > use to send out the packets. In many use cases the NIC might have a
> > > single port ([1], [2], [3]), in which case a given transmit queue
> is
> > > always mapped to a single receive queue (1:1 Rx queue: Tx queue).
> This
> > > is easy to configure.
> > >
> > > If the NIC has 2 ports (there are several references), then we will
> > > have
> > > 1:2 (RX queue: TX queue) mapping which is still easy to configure.
> > > However, if this is generalized to 'N' ports, the configuration can
> be
> > > long. More over the PMD would have to scan a list of transmit
> queues
> > > to pull the buffers from.
> >
> > I disagree with the description of this constraint.
> >
> > As I understand it, it doesn't matter now many ports or queues are in
> a NIC or
> > system.
> >
> > The constraint is more narrow:
> >
> > This patch requires that all packets ingressing on some port/queue
> must
> > egress on the specific port/queue that it has been configured to rearm
> its
> > buffers from. I.e. an application cannot route packets between
> multiple ports
> > with this patch.
> Agree, this patch as is has this constraint. It is not a constraint
> that would apply for NICs with single port. The above text is
> describing some of the issues associated with generalizing the solution
> for N number of ports. If N is small, the configuration is small and
> scanning should not be bad.
> 

Perhaps we can live with the 1:1 limitation, if that is the primary use case.

Alternatively, the feature could fall back to using the mempool if unable to 
get/put buffers directly from/to a participating NIC. In this case, I envision 
a library serving as a shim layer between the NICs and the mempool. In other 
words: Take a step back from the implementation, and discuss the high level 
requirements and architecture of the proposed feature.

> >
> > >
> 
> 
> 
> > >
> >
> > You are missing the fourth constraint:
> >
> > 4) The application must transmit all received packets immediately,
> i.e. QoS
> > queueing and similar is prohibited.
> I do not understand this, can you please elaborate?. Even if there is
> QoS queuing, there would be steady stream of packets being transmitted.
> These transmitted packets will fill the buffers on the RX side.

E.g. an appliance may receive packets on a 10 Gbps backbone port, and queue 
some of the packets up for a customer with a 20 Mbit/s subscription. When there 
is a large burst of packets towards that subscriber, they will queue up in the 
QoS queue dedicated to that subscriber. During that traffic burst, there is 
much more RX than TX. And after the traffic burst, there will be more TX than 
RX.
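Back-of-the-envelope numbers make the imbalance concrete (illustrative arithmetic only): at 10 Gbps with 1500-byte frames the port receives roughly 833 kpps, while a 20 Mbit/s subscriber queue drains at roughly 1.7 kpps, so during the burst almost no buffers return from TX to re-arm RX:

```c
#include <assert.h>

/* Rough packets-per-second for 1500-byte frames: bits / (1500 * 8).
 * Framing overhead (preamble, IFG, FCS) is ignored for simplicity. */
static unsigned long long
pps(unsigned long long bits_per_sec)
{
	return bits_per_sec / (1500ULL * 8ULL);
}
```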

> 
> >
> 
> 
> > >
> >
> > The patch provides a significant performance improvement, but I am
> > wondering if any real world applications exist that would use this.
> Only a
> > "router on a stick" (i.e. a single-port router) comes to my mind, and
> that is
> > probably sufficient to call it useful in the real world. Do you have
> any other
> > examples to support the usefulness of this patch?
> SmartNIC is a clear and dominant use case, typically they have a single
> port for data plane traffic (dual ports are mostly for redundancy)
> This patch avoids good amount of store operations. The smaller CPUs
> found in SmartNICs have smaller store buffers which can become
> bottlenecks. Avoiding the lcore cache saves valuable HW cache space.

OK. This is an important use case!

> 
> >
> > Anyway, the patch doesn't do any harm if unused, and the only
> performance
> > cost is the "if (rxq->direct_rxrearm_enable)" branch in the Ethdev
> driver. So I
> > don't oppose to it.
> >
> 



[PATCH v3] app/test-fib: fix possible division by zero

2022-01-27 Thread Vladimir Medvedkin
This patch fixes the division by 0,
which occurs if the number of routes is less than 10.
Can be triggered by passing -n argument with value < 10:

./dpdk-test-fib -- -n 9
...
Floating point exception (core dumped)

Fixes: 103809d032cd ("app/test-fib: add test application for FIB")
Cc: sta...@dpdk.org

Signed-off-by: Vladimir Medvedkin 
---
 app/test-fib/main.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/app/test-fib/main.c b/app/test-fib/main.c
index ecd420116a..067c5284f9 100644
--- a/app/test-fib/main.c
+++ b/app/test-fib/main.c
@@ -711,6 +711,10 @@ parse_opts(int argc, char **argv)
print_usage();
rte_exit(-EINVAL, "Invalid option -n\n");
}
+
+   if (config.nb_routes < config.print_fract)
+   config.print_fract = config.nb_routes;
+
break;
case 'd':
distrib_string = optarg;
@@ -1242,6 +1246,10 @@ main(int argc, char **argv)
config.nb_routes = 0;
while (fgets(line, sizeof(line), fr) != NULL)
config.nb_routes++;
+
+   if (config.nb_routes < config.print_fract)
+   config.print_fract = config.nb_routes;
+
rewind(fr);
}
 
-- 
2.25.1
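The failure mode is a zero divisor in the progress-reporting step; the sketch below illustrates the pattern and the clamp applied by the patch (how `print_fract` feeds the divisor is an assumption made for illustration):

```c
#include <assert.h>

/* If progress is printed every nb_routes / print_fract insertions, then
 * nb_routes < print_fract makes the step 0, and a later i % step raises
 * SIGFPE.  Clamping print_fract to nb_routes, as the patch does, keeps
 * the divisor >= 1. */
static unsigned
progress_step(unsigned nb_routes, unsigned print_fract)
{
	if (nb_routes == 0)
		return 1;		/* avoid 0/0 in this toy helper */
	if (nb_routes < print_fract)
		print_fract = nb_routes;	/* the fix */
	return nb_routes / print_fract;
}
```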



[PATCH v3] net/enic: add support for eCPRI matching

2022-01-27 Thread John Daley
eCPRI message can be over Ethernet layer (.1Q supported also) or over
UDP layer. Message header formats are the same in these two variants.

Only up through the first packet header in the PDU can be matched.
RSS on the eCPRI payload is not supported.

Signed-off-by: John Daley 
Reviewed-by: Hyong Youb Kim 
---

v3: put new rte flow item feature in alphabetical order

 doc/guides/nics/features/enic.ini  |  1 +
 doc/guides/rel_notes/release_22_03.rst |  1 +
 drivers/net/enic/enic_fm_flow.c| 67 +-
 3 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features/enic.ini 
b/doc/guides/nics/features/enic.ini
index 00231baf85..61bec4910e 100644
--- a/doc/guides/nics/features/enic.ini
+++ b/doc/guides/nics/features/enic.ini
@@ -39,6 +39,7 @@ x86-64   = Y
 Usage doc= Y
 
 [rte_flow items]
+ecpri= Y
 eth  = Y
 geneve   = Y
 geneve_opt   = Y
diff --git a/doc/guides/rel_notes/release_22_03.rst 
b/doc/guides/rel_notes/release_22_03.rst
index 33be3241b9..6786eb3b48 100644
--- a/doc/guides/rel_notes/release_22_03.rst
+++ b/doc/guides/rel_notes/release_22_03.rst
@@ -58,6 +58,7 @@ New Features
 * **Updated Cisco enic driver.**
 
   * Added rte_flow support for matching GENEVE packets.
+  * Added rte_flow support for matching eCPRI packets.
 
 Removed Items
 -
diff --git a/drivers/net/enic/enic_fm_flow.c b/drivers/net/enic/enic_fm_flow.c
index bf04d714d0..f0bda19a70 100644
--- a/drivers/net/enic/enic_fm_flow.c
+++ b/drivers/net/enic/enic_fm_flow.c
@@ -237,6 +237,7 @@ static enic_copy_item_fn enic_fm_copy_item_vxlan;
 static enic_copy_item_fn enic_fm_copy_item_gtp;
 static enic_copy_item_fn enic_fm_copy_item_geneve;
 static enic_copy_item_fn enic_fm_copy_item_geneve_opt;
+static enic_copy_item_fn enic_fm_copy_item_ecpri;
 
 /* Ingress actions */
 static const enum rte_flow_action_type enic_fm_supported_ig_actions[] = {
@@ -392,6 +393,15 @@ static const struct enic_fm_items enic_fm_items[] = {
   RTE_FLOW_ITEM_TYPE_END,
},
},
+   [RTE_FLOW_ITEM_TYPE_ECPRI] = {
+   .copy_item = enic_fm_copy_item_ecpri,
+   .valid_start_item = 1,
+   .prev_items = (const enum rte_flow_item_type[]) {
+  RTE_FLOW_ITEM_TYPE_ETH,
+  RTE_FLOW_ITEM_TYPE_UDP,
+  RTE_FLOW_ITEM_TYPE_END,
+   },
+   },
 };
 
 static int
@@ -877,6 +887,61 @@ enic_fm_copy_item_geneve_opt(struct copy_item_args *arg)
return 0;
 }
 
+/* Match eCPRI combined message header */
+static int
+enic_fm_copy_item_ecpri(struct copy_item_args *arg)
+{
+   const struct rte_flow_item *item = arg->item;
+   const struct rte_flow_item_ecpri *spec = item->spec;
+   const struct rte_flow_item_ecpri *mask = item->mask;
+   struct fm_tcam_match_entry *entry = arg->fm_tcam_entry;
+   struct fm_header_set *fm_data, *fm_mask;
+   uint8_t *fm_data_to, *fm_mask_to;
+
+   ENICPMD_FUNC_TRACE();
+
+   /* Tunneling not supported - no matching on inner eCPRI fields. */
+   if (arg->header_level > 0)
+   return -EINVAL;
+
+   /* Need both spec and mask */
+   if (!spec || !mask)
+   return -EINVAL;
+
+   fm_data = &entry->ftm_data.fk_hdrset[0];
+   fm_mask = &entry->ftm_mask.fk_hdrset[0];
+
+   /* eCPRI can only follow L2/VLAN layer if ethernet type is 0xAEFE. */
+   if (!(fm_data->fk_metadata & FKM_UDP) &&
+   (fm_mask->l2.eth.fk_ethtype != UINT16_MAX ||
+   rte_cpu_to_be_16(fm_data->l2.eth.fk_ethtype) !=
+   RTE_ETHER_TYPE_ECPRI))
+   return -EINVAL;
+
+   if (fm_data->fk_metadata & FKM_UDP) {
+   /* eCPRI on UDP */
+   fm_data->fk_header_select |= FKH_L4RAW;
+   fm_mask->fk_header_select |= FKH_L4RAW;
+   fm_data_to = &fm_data->l4.rawdata[sizeof(fm_data->l4.udp)];
+   fm_mask_to = &fm_mask->l4.rawdata[sizeof(fm_data->l4.udp)];
+   } else {
+   /* eCPRI directly after Ethernet header */
+   fm_data->fk_header_select |= FKH_L3RAW;
+   fm_mask->fk_header_select |= FKH_L3RAW;
+   fm_data_to = &fm_data->l3.rawdata[0];
+   fm_mask_to = &fm_mask->l3.rawdata[0];
+   }
+
+   /*
+* Use the raw L3 or L4 buffer to match eCPRI since fm_header_set does
+* not have eCPRI header. Only 1st message header of PDU can be matched.
+* "C" bit ignored.
+*/
+   memcpy(fm_data_to, spec, sizeof(*spec));
+   memcpy(fm_mask_to, mask, sizeof(*mask));
+   return 0;
+}
+
 /*
  * Currently, raw pattern match is very limited. It is intended for matching
  * UDP tunnel header (e.g. vxlan or geneve).
@@ -2521,11 +2586,11 @@ enic_action_handle_get(struct enic_flowman *fm, struct 
fm_action *action_
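For background, the eCPRI common header matched via the raw L3/L4 buffer above is 4 bytes — revision (4 bits), 3 reserved bits, the concatenation "C" bit, a message-type byte (0 = IQ data), and a 16-bit payload size — with ethertype 0xAEFE when carried directly over Ethernet. A standalone sketch of those fields (struct and helper names are illustrative, not DPDK API):

```c
#include <assert.h>
#include <stdint.h>

#define ECPRI_ETHERTYPE    0xAEFE	/* eCPRI directly over Ethernet */
#define ECPRI_MSG_IQ_DATA  0x00		/* message type 0: IQ data */

struct ecpri_common_hdr {
	uint8_t rev_c;		/* revision(7:4) | reserved(3:1) | C bit(0) */
	uint8_t msg_type;
	uint8_t size_hi;	/* payload size, big endian */
	uint8_t size_lo;
};

static unsigned
ecpri_revision(const struct ecpri_common_hdr *h)
{
	return h->rev_c >> 4;
}

static unsigned
ecpri_c_bit(const struct ecpri_common_hdr *h)
{
	return h->rev_c & 0x1;
}

static unsigned
ecpri_payload_size(const struct ecpri_common_hdr *h)
{
	return ((unsigned)h->size_hi << 8) | h->size_lo;
}
```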

[PATCH v3] net/enic: support max descriptors allowed by adapter

2022-01-27 Thread John Daley
Newer VIC adapters have the max number of supported RX and TX
descriptors in their configuration. Use these values as the
maximums.

Signed-off-by: John Daley 
Reviewed-by: Hyong Youb Kim 
---
v3: add line just below so 0-day bot applies dependency
Depends-on: patch-105799 ("net/enic: update VIC firmware API")

 drivers/net/enic/base/cq_enet_desc.h |  6 -
 drivers/net/enic/enic_res.c  | 20 
 drivers/net/enic/enic_res.h  |  6 +++--
 drivers/net/enic/enic_rxtx.c | 35 +++-
 4 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/drivers/net/enic/base/cq_enet_desc.h 
b/drivers/net/enic/base/cq_enet_desc.h
index a34a4f5400..02db85b9a0 100644
--- a/drivers/net/enic/base/cq_enet_desc.h
+++ b/drivers/net/enic/base/cq_enet_desc.h
@@ -67,7 +67,8 @@ struct cq_enet_rq_desc_64 {
uint16_t vlan;
uint16_t checksum_fcoe;
uint8_t flags;
-   uint8_t unused[48];
+   uint8_t fetch_idx_flags;
+   uint8_t unused[47];
uint8_t type_color;
 };
 
@@ -92,6 +93,9 @@ struct cq_enet_rq_desc_64 {
 #define CQ_ENET_RQ_DESC_BYTES_WRITTEN_BITS  14
 #define CQ_ENET_RQ_DESC_BYTES_WRITTEN_MASK \
((1 << CQ_ENET_RQ_DESC_BYTES_WRITTEN_BITS) - 1)
+#define CQ_ENET_RQ_DESC_FETCH_IDX_BITS  2
+#define CQ_ENET_RQ_DESC_FETCH_IDX_MASK \
+   ((1 << CQ_ENET_RQ_DESC_FETCH_IDX_BITS) - 1)
 #define CQ_ENET_RQ_DESC_FLAGS_TRUNCATED (0x1 << 14)
 #define CQ_ENET_RQ_DESC_FLAGS_VLAN_STRIPPED (0x1 << 15)
 
diff --git a/drivers/net/enic/enic_res.c b/drivers/net/enic/enic_res.c
index 9cfb857939..caf773bab2 100644
--- a/drivers/net/enic/enic_res.c
+++ b/drivers/net/enic/enic_res.c
@@ -26,6 +26,7 @@ int enic_get_vnic_config(struct enic *enic)
struct vnic_enet_config *c = &enic->config;
int err;
uint64_t sizes;
+   uint32_t max_rq_descs, max_wq_descs;
 
err = vnic_dev_get_mac_addr(enic->vdev, enic->mac_addr);
if (err) {
@@ -57,6 +58,8 @@ int enic_get_vnic_config(struct enic *enic)
GET_CONFIG(loop_tag);
GET_CONFIG(num_arfs);
GET_CONFIG(max_pkt_size);
+   GET_CONFIG(max_rq_ring);
+   GET_CONFIG(max_wq_ring);
 
/* max packet size is only defined in newer VIC firmware
 * and will be 0 for legacy firmware and VICs
@@ -101,20 +104,29 @@ int enic_get_vnic_config(struct enic *enic)
((enic->filter_actions & FILTER_ACTION_COUNTER_FLAG) ?
 "count " : ""));
 
-   c->wq_desc_count = RTE_MIN((uint32_t)ENIC_MAX_WQ_DESCS,
+   /* The max size of RQ and WQ rings are specified in 1500 series VICs and
+* beyond. If they are not specified by the VIC or if 64B CQ descriptors
+* are not being used, the max number of descriptors is 4096.
+*/
+   max_wq_descs = (enic->cq64_request && c->max_wq_ring) ? c->max_wq_ring :
+  ENIC_LEGACY_MAX_WQ_DESCS;
+   c->wq_desc_count = RTE_MIN(max_wq_descs,
RTE_MAX((uint32_t)ENIC_MIN_WQ_DESCS, c->wq_desc_count));
c->wq_desc_count &= 0xffe0; /* must be aligned to groups of 32 */
-
-   c->rq_desc_count = RTE_MIN((uint32_t)ENIC_MAX_RQ_DESCS,
+   max_rq_descs = (enic->cq64_request && c->max_rq_ring) ? c->max_rq_ring
+  : ENIC_LEGACY_MAX_RQ_DESCS;
+   c->rq_desc_count = RTE_MIN(max_rq_descs,
RTE_MAX((uint32_t)ENIC_MIN_RQ_DESCS, c->rq_desc_count));
c->rq_desc_count &= 0xffe0; /* must be aligned to groups of 32 */
+   dev_debug(NULL, "Max supported VIC descriptors: WQ:%u, RQ:%u\n",
+ max_wq_descs, max_rq_descs);
 
c->intr_timer_usec = RTE_MIN(c->intr_timer_usec,
  vnic_dev_get_intr_coal_timer_max(enic->vdev));
 
dev_info(enic_get_dev(enic),
"vNIC MAC addr " RTE_ETHER_ADDR_PRT_FMT
-   "wq/rq %d/%d mtu %d, max mtu:%d\n",
+   " wq/rq %d/%d mtu %d, max mtu:%d\n",
enic->mac_addr[0], enic->mac_addr[1], enic->mac_addr[2],
enic->mac_addr[3], enic->mac_addr[4], enic->mac_addr[5],
c->wq_desc_count, c->rq_desc_count,
diff --git a/drivers/net/enic/enic_res.h b/drivers/net/enic/enic_res.h
index 34f15d5a42..ae979d52be 100644
--- a/drivers/net/enic/enic_res.h
+++ b/drivers/net/enic/enic_res.h
@@ -12,9 +12,11 @@
 #include "vnic_rq.h"
 
 #define ENIC_MIN_WQ_DESCS  64
-#define ENIC_MAX_WQ_DESCS  4096
 #define ENIC_MIN_RQ_DESCS  64
-#define ENIC_MAX_RQ_DESCS  4096
+
+/* 1400 series VICs and prior all have 4K max, after that it's in the config */
+#define ENIC_LEGACY_MAX_WQ_DESCS4096
+#define ENIC_LEGACY_MAX_RQ_DESCS4096
 
 /* A descriptor ring has a multiple of 32 descriptors */
 #define ENIC_ALIGN_DESCS   32
diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c
index 33e96b480e..74a90694c7 
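The clamp-and-align arithmetic in enic_get_vnic_config() above (RTE_MIN/RTE_MAX plus the 0xffe0 mask) reduces to a small helper that can be checked in isolation (a sketch; constants mirror enic_res.h, but the function itself is illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define MIN_DESCS   64u
#define ALIGN_DESCS 32u		/* rings hold a multiple of 32 descriptors */

/* Clamp the requested count into [MIN_DESCS, max_descs] and round down
 * to a multiple of 32.  For counts below 65536 the mask is the same as
 * the driver's "& 0xffe0". */
static uint32_t
clamp_desc_count(uint32_t requested, uint32_t max_descs)
{
	uint32_t n = requested;

	if (n < MIN_DESCS)
		n = MIN_DESCS;
	if (n > max_descs)
		n = max_descs;
	return n & ~(ALIGN_DESCS - 1);
}
```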

[Bug 929] rte_dump_stack() is not safe to call from signal handler

2022-01-27 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=929

Bug ID: 929
   Summary: rte_dump_stack() is not safe to call from signal
handler
   Product: DPDK
   Version: 21.11
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: core
  Assignee: dev@dpdk.org
  Reporter: step...@networkplumber.org
  Target Milestone: ---

A common scenario is to have a signal handler call rte_dump_stack().
Unfortunately, the existing version is not safe in a signal handler because it
assumes that the malloc pool is not corrupted.

https://www.gnu.org/software/libc/manual/html_node/Backtraces.html

A better alternative is libunwind or libbacktrace.
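One safer pattern using facilities glibc already provides: capture frames with backtrace() and print them with backtrace_symbols_fd(), which writes straight to a file descriptor instead of calling malloc() the way backtrace_symbols() does. Note that backtrace() itself may allocate on first use, so it is worth priming it once at startup (sketch below):

```c
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Signal handler that avoids backtrace_symbols()'s malloc() by writing
 * symbol lines directly to stderr via backtrace_symbols_fd(). */
static void
crash_handler(int sig)
{
	void *frames[MAX_FRAMES];
	int n = backtrace(frames, MAX_FRAMES);

	(void)sig;
	backtrace_symbols_fd(frames, n, STDERR_FILENO);
	_exit(1);
}

/* Call once at startup: forces the unwinder to initialize so that the
 * first backtrace() inside a signal handler does not allocate. */
static int
prime_backtrace(void)
{
	void *frames[MAX_FRAMES];

	return backtrace(frames, MAX_FRAMES);
}
```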

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace

2022-01-27 Thread Stephen Hemminger
On Mon, 6 Sep 2021 09:47:29 +0530
 wrote:

> From: Jerin Jacob 
> 
> adding optional libunwind library dependency to DPDK for
> enhanced backtrace based on ucontext.
> 
> Signed-off-by: Jerin Jacob 


Was looking for better backtrace and noticed that there is libbacktrace
on github (BSD licensed). It provides more information like file and line 
number.
Maybe DPDK should integrate it?


PS: the existing rte_dump_stack() is not safe to call from signal handlers.
https://bugs.dpdk.org/show_bug.cgi?id=929


RE: [PATCH v2 03/10] ethdev: bring in async queue-based flow rules

2022-01-27 Thread Alexander Kozyrev
On Wednesday, January 26, 2022 13:54 Ajit Khaparde  
wrote:
>
> On Tue, Jan 25, 2022 at 9:03 PM Alexander Kozyrev 
> wrote:
> >
> > On Monday, January 24, 2022 19:00 Ivan Malov 
> wrote:
> > > This series is very helpful as it draws attention to
> > > the problem of making flow API efficient. That said,
> > > there is much room for improvement, especially in
> > > what comes to keeping things clear and concise.
> > >
> > > In example, the following APIs
> > >
> > > - rte_flow_q_flow_create()
> > > - rte_flow_q_flow_destroy()
> > > - rte_flow_q_action_handle_create()
> > > - rte_flow_q_action_handle_destroy()
> > > - rte_flow_q_action_handle_update()
> > >
> > > should probably be transformed into a unified one
> > >
> > > int
> > > rte_flow_q_submit_task(uint16_t  port_id,
> > > uint32_t  queue_id,
> > > const struct rte_flow_q_ops_attr *q_ops_attr,
> > > enum rte_flow_q_task_type task_type,
> > > const void   *task_data,
> > > rte_flow_q_task_cookie_t  cookie,
> > > struct rte_flow_error*error);
> > >
> > > with a handful of corresponding enum defines and data structures
> > > for these 5 operations.
> > We were thinking about the unified function for all queue operations.
> > But it has too many drawbacks in our opinion:
> > 1. Different operations return different results and take different parameters.
> > q_flow_create gives a flow handle, q_action_handle returns indirect action
> handle.
> > destroy functions return the status. All these cases needs to be handled
> differently.
> > Also, the unified function is bloated with various parameters not needed
> for all operations.
> > Both of these points result in a hard-to-understand API and messy
> documentation with
> > various structures on how to use it in every particular case.
> > 2. Performance consideration.
> > We aimed the new API with the insertion/deletion rate in mind.
> > Adding if conditions to distinguish between requested operation will cause
> some degradation.
> > It is preferred to have separate small functions that do one job and make it
> efficient.
> > 3. Conforming to the current API.
> > The idea is to have the same API as we had before and extend it with
> asynchronous counterparts.
> > That is why we took the familiar functions and added queue-based version
> s for them.
> > It is easier for application to switch to new API with this approach.
> Interfaces are still the same.
> Alexander, I think you have made some good points here.
> Dedicated API is better compared to the unified function.

Glad I made it clearer. Ivan, what do you think about these considerations? 

> >
> > > By the way, shouldn't this variety of operation types cover such
> > > from the tunnel offload model? Getting PMD's opaque "tunnel
> > > match" items and "tunnel set" actions - things like that.
> > Don't quite get the idea. Could you please elaborate more on this?
> >
> > > Also, I suggest that the attribute "drain"
> > > be replaced by "postpone" (invert the meaning).
> > > rte_flow_q_drain() should then be renamed to
> > > rte_flow_q_push_postponed().
> > >
> > > The rationale behind my suggestion is that "drain" tricks readers into
> > > thinking that the enqueued operations are going to be completely
> purged,
> > > whilst the true intention of the API is to "push" them to the hardware.
> > I don't have a strong opinion on this naming, if you think "postpone" is
> better.
> > Or we can name it as "doorbell" to signal a NIC about some work to be
> done
> > and "rte_flow_q_doorbell" to do this explicitly after a few operations.
> >
> > > rte_flow_q_dequeue() also needs to be revisited. The name suggests
> that
> > > some "tasks" be cancelled, whereas in reality this API implies "poll"
> > > semantics. So why not name it "rte_flow_q_poll"?
> > Polling implies actively busy-waiting for the result, which is not the
> case here.
> > What we do is only getting results for already processed operations, hence
> "dequeue" as opposite to "queue".
> > What do you think? Or we can have push for drain and pull for dequeue as
> an alternative.
> >
> > > I believe this function should return an array of completions, just
> > > like it does in the current version, but provide a "cookie" (can be
> > > represented by a uintptr_t value) for each completion entry.
> > >
> > > The rationale behind choosing "cookie" over "user_data" is clarity.
> > > Term "user_data" sounds like "flow action conf" or "counter data",
> > > whilst in reality it likely stands for something normally called
> > > a cookie. Please correct me if I've got that wrong.
> > I haven't heard about cookies in a context not related to web browsing.
> > I think that user data is more than a simple cookie, it can contain
> > anything that application wants to associate with th

[PATCH v2] net/ice: fix missing clock initialization

2022-01-27 Thread Simei Su
The Rx PHY timer init value is not the same as the primary timer init
value at power-up, which leads to the Rx timestamp always having a big
gap compared with the PTP timestamp. This patch initializes the PHC
time when initializing the PTP hardware clock.

Fixes: 646dcbe6c701 ("net/ice: support IEEE 1588 PTP")
Cc: sta...@dpdk.org

Signed-off-by: Simei Su 
---
v2:
* Rename commit title.

 drivers/net/ice/ice_ethdev.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index d01acb8..dbf822e 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -5661,6 +5661,8 @@ ice_timesync_enable(struct rte_eth_dev *dev)
struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct ice_adapter *ad =
ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+   uint64_t start_time;
+   struct timespec system_time;
int ret;
 
if (dev->data->dev_started && !(dev->data->dev_conf.rxmode.offloads &
@@ -5682,6 +5684,15 @@ ice_timesync_enable(struct rte_eth_dev *dev)
"Failed to write PHC increment time value");
return -1;
}
+
+   clock_gettime(CLOCK_MONOTONIC, &system_time);
+   start_time = system_time.tv_sec * NSEC_PER_SEC +
+system_time.tv_nsec;
+   ret = ice_ptp_init_time(hw, start_time);
+   if (ret) {
+   PMD_DRV_LOG(ERR, "Failed to write PHC initial time");
+   return -1;
+   }
}
 
/* Initialize cycle counters for system time/RX/TX timestamp */
-- 
2.9.5



Re: [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace

2022-01-27 Thread Jerin Jacob
On Fri, Jan 28, 2022 at 2:18 AM Stephen Hemminger
 wrote:
>
> On Mon, 6 Sep 2021 09:47:29 +0530
>  wrote:
>
> > From: Jerin Jacob 
> >
> > adding an optional libunwind library dependency to DPDK for
> > enhanced backtrace based on ucontext.
> >
> > Signed-off-by: Jerin Jacob 
>
>
> Was looking for better backtrace and noticed that there is libbacktrace
> on github (BSD licensed). It provides more information like file and line 
> number.
> Maybe DPDK should integrate it?

TB already decided to NOT pursue that path.


>
>
> PS: existing rte_dump_stack() is not safe from signal handlers.
> https://bugs.dpdk.org/show_bug.cgi?id=929


[Bug 930] ConnectX6 DPDK dpdk-testpmd Receive tcp, udp Mixed flow performance is very low!

2022-01-27 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=930

Bug ID: 930
   Summary: ConnectX6 DPDK dpdk-testpmd Receive tcp ,udp Mixed
flow performance is very low!
   Product: DPDK
   Version: 21.11
  Hardware: x86
OS: Linux
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: ethdev
  Assignee: dev@dpdk.org
  Reporter: killerst...@gmail.com
  Target Milestone: ---

I use Ixia to construct two streams
Total: 20 Gbps, 29.76 Mpps (64-byte packets at line rate)

flow1: UDP 64-byte small packets
Sent per second: 10 Gbps, 14.88 Mpps

flow2: TCP 64-byte small packets
Sent per second: 10 Gbps, 14.88 Mpps



./dpdk-testpmd -l 4-22 -n 8 -- -i --rxq 19 --txq 19 --nb-cores 18 --rxd 2048
--txd 2048 --portmask 0xff

set fwd rxonly
start
show port stats all

testpmd> show port stats all

   NIC statistics for port 0  
  RX-packets: 103906391  RX-missed: 369790696  RX-bytes:  6234383466
  RX-errors: 0
  RX-nombuf:  0 
  TX-packets: 0  TX-errors: 0  TX-bytes:  0

  Throughput (since last show)
  Rx-pps:  4205026  Rx-bps:   2018412608
  Tx-pps:0  Tx-bps:0
  

Received per second:
2018412608 bps (~2 Gbps)
4205026 pps (~4.2 Mpps)

rx_discards_phy drops per second: ~10 million pps

[root@localhost ~]# ethtool -S enp202s0f0 |grep dis
 rx_discards_phy: 35892329864
 tx_discards_phy: 0
 rx_prio0_discards: 35892164419
 rx_prio1_discards: 0
 rx_prio2_discards: 0
 rx_prio3_discards: 0
 rx_prio4_discards: 0
 rx_prio5_discards: 0
 rx_prio6_discards: 0
 rx_prio7_discards: 0


If both flows are TCP, there are no rx_discards_phy drops.
flow1 tcp 64size small packet
flow2 tcp 64size small packet


testpmd> show port stats all

   NIC statistics for port 0  
  RX-packets: 7177423122 RX-missed: 369790696  RX-bytes:  430645390083
  RX-errors: 0
  RX-nombuf:  0 
  TX-packets: 0  TX-errors: 0  TX-bytes:  0

  Throughput (since last show)
  Rx-pps: 29779180  Rx-bps:  14294006816
  Tx-pps:0  Tx-bps:0
  
29779180 pps (~29.8 Mpps)

rx_discards_phy no drop

[root@localhost ~]# ethtool -S enp202s0f0 |grep dis
 rx_discards_phy: 0
 tx_discards_phy: 0
 rx_prio0_discards: 0
 rx_prio1_discards: 0
 rx_prio2_discards: 0
 rx_prio3_discards: 0
 rx_prio4_discards: 0
 rx_prio5_discards: 0
 rx_prio6_discards: 0
 rx_prio7_discards: 0


server 
dell poweredge r750
[root@localhost proc]# lscpu
Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):64
On-line CPU(s) list:   0-63
Thread(s) per core:1
Core(s) per socket:32
Socket(s): 2
NUMA node(s):  2
Vendor ID: GenuineIntel
CPU family:6
Model: 106
Model name:Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
Stepping:  6
CPU MHz:   2900.000
BogoMIPS:  5800.00
Virtualization:VT-x
L1d cache: 48K
L1i cache: 32K
L2 cache:  1280K
L3 cache:  55296K
NUMA node0 CPU(s):
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62
NUMA node1 CPU(s):
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl
vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
3dnowprefetch epb cat_l3 invpcid_single intel_pt ssbd mba ibrs ibpb stibp
ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1
hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a avx512f avx512dq rdseed adx smap
avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec
xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln
pts avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni
avx512_bitalg avx512_vpopcntdq md_clear pconfig spec_ctrl intel_stibp flush_l1d
arch_capabilities





MLX ConnectX6  100G PCIE4 x16

[root@localhost ~]# ofed_info -s
MLNX_OFED_LINUX-5.5-1.0.3.2:

[root@localhost ~]# mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
--

  Device Type:  ConnectX6
  Part Number:  MCX653106A-ECA_Ax

RE: [PATCH v1] raw/ifpga: fix ifpga devices cleanup function

2022-01-27 Thread Huang, Wei
Hi, 

> -Original Message-
> From: Yigit, Ferruh 
> Sent: Thursday, January 27, 2022 20:34
> To: Huang, Wei ; dev@dpdk.org; Xu, Rosen
> ; Zhang, Qi Z ; Nipun Gupta
> ; Hemant Agrawal 
> Cc: sta...@dpdk.org; Zhang, Tianfei 
> Subject: Re: [PATCH v1] raw/ifpga: fix ifpga devices cleanup function
> 
> On 1/27/2022 8:57 AM, Huang, Wei wrote:
> > Hi,
> >
> >> -Original Message-
> >> From: Yigit, Ferruh 
> >> Sent: Wednesday, January 26, 2022 21:25
> >> To: Huang, Wei ; dev@dpdk.org; Xu, Rosen
> >> ; Zhang, Qi Z ; Nipun Gupta
> >> ; Hemant Agrawal 
> >> Cc: sta...@dpdk.org; Zhang, Tianfei 
> >> Subject: Re: [PATCH v1] raw/ifpga: fix ifpga devices cleanup function
> >>
> >> On 1/26/2022 3:29 AM, Wei Huang wrote:
> >>> Use rte_dev_remove() to replace rte_rawdev_pmd_release() in
> >>> ifpga_rawdev_cleanup(), resources occupied by ifpga raw devices such
> >>> as threads can be released correctly.
> >>>
> >>
> >> As far as I understand you are fixing an issue that not all resources
> >> are released, is this correct?
> >> What are these not released resources?
> >>
> >> And 'rte_rawdev_pmd_release()' rawdev API seems intended to do the
> >> cleanup, is it expected that some resources are not freed after this
> >> call, or should we fix that API?
> >> If the device remove API needs to be used, what is the point of
> >> 'rte_rawdev_pmd_release()' API?
> >>
> >> cc'ed rawdev maintainers for comment.
> >
> > Yes, this patch is to release all the resources of ifpga_rawdev after
> > testpmd
> exits; the resources that are not released are an interrupt and a thread.
> >
> > rte_rawdev_pmd_release implemented in ifpga_rawdev only releases
> memory allocated by the ifpga driver; that's the expected behavior.
> >
> > I think it's a simple and safe way to release resources completely by 
> > calling
> rte_dev_remove.
> >
> 
> If device hot remove is better option, why 'rte_rawdev_pmd_release()' API
> exists?

Agree, let me try to free all resources in rte_rawdev_pmd_release().

> 
> >>
> >>> Fixes: f724a802 ("raw/ifpga: add miscellaneous APIs")
> >>>
> >>> Signed-off-by: Wei Huang 
> >>> ---
> >>>drivers/raw/ifpga/ifpga_rawdev.c | 4 +++-
> >>>1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/raw/ifpga/ifpga_rawdev.c
> >>> b/drivers/raw/ifpga/ifpga_rawdev.c
> >>> index fdf3c23..88c38aa 100644
> >>> --- a/drivers/raw/ifpga/ifpga_rawdev.c
> >>> +++ b/drivers/raw/ifpga/ifpga_rawdev.c
> >>> @@ -1787,12 +1787,14 @@ int ifpga_rawdev_partial_reconfigure(struct
> >> rte_rawdev *dev, int port,
> >>>void ifpga_rawdev_cleanup(void)
> >>>{
> >>>   struct ifpga_rawdev *dev;
> >>> + struct rte_rawdev *rdev;
> >>>   unsigned int i;
> >>>
> >>>   for (i = 0; i < IFPGA_RAWDEV_NUM; i++) {
> >>>   dev = &ifpga_rawdevices[i];
> >>>   if (dev->rawdev) {
> >>> - rte_rawdev_pmd_release(dev->rawdev);
> >>> + rdev = dev->rawdev;
> >>> + rte_dev_remove(rdev->device);
> >>>   dev->rawdev = NULL;
> >>>   }
> >>>   }
> >



RE: [PATCH v3] net/i40e: reduce redundant reset operation

2022-01-27 Thread Zhang, Qi Z



> -Original Message-
> From: Feifei Wang 
> Sent: Thursday, January 27, 2022 3:40 PM
> To: Xing, Beilei 
> Cc: dev@dpdk.org; n...@arm.com; Feifei Wang ;
> Ruifeng Wang 
> Subject: [PATCH v3] net/i40e: reduce redundant reset operation
> 
> For the free-buffer operation in the i40e vector path, it is unnecessary
> to store 'NULL' into txep.mbuf. This is because when putting an mbuf into
> the Tx queue, tx_tail is the sentinel, and when doing tx_free, tx_next_dd
> is the sentinel. In no code path is mbuf == NULL used as a check.
> Thus the reset of mbuf is unnecessary and can be omitted.
> 
> Signed-off-by: Feifei Wang 
> Reviewed-by: Ruifeng Wang 

Acked-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks
Qi