RE: [v1 3/4] net/mlx5: add support to modify ECN field

2022-06-06 Thread Slava Ovsiienko
> -Original Message-
> From: Sean Zhang (Networking SW) 
> Sent: Saturday, April 2, 2022 10:12
> To: Matan Azrad ; Slava Ovsiienko
> 
> Cc: dev@dpdk.org
> Subject: [v1 3/4] net/mlx5: add support to modify ECN field
> 
> This patch adds support for modifying the ECN field in the IPv4/IPv6 header.
> 
> Signed-off-by: Sean Zhang 
Acked-by: Viacheslav Ovsiienko 


RE: [v1 4/4] net/mlx5: add modify field support in meter

2022-06-06 Thread Slava Ovsiienko
> -Original Message-
> From: Sean Zhang (Networking SW) 
> Sent: Saturday, April 2, 2022 10:12
> To: Matan Azrad ; Slava Ovsiienko
> 
> Cc: dev@dpdk.org
> Subject: [v1 4/4] net/mlx5: add modify field support in meter
> 
> This patch introduces MODIFY_FIELD action support in meter. Users can create a
> meter policy with the MODIFY_FIELD action as the green/yellow action.
> 
> For example:
> 
> testpmd> add port meter policy 0 21 g_actions modify_field op set
>   dst_type ipv4_ecn src_type value src_value 3 width 2 / ...
> 
> Signed-off-by: Sean Zhang 
Acked-by: Viacheslav Ovsiienko 



RE: [v1] net/mlx5: support represented port item

2022-06-06 Thread Slava Ovsiienko
> -Original Message-
> From: Sean Zhang (Networking SW) 
> Sent: Saturday, April 2, 2022 9:40
> To: Matan Azrad ; Slava Ovsiienko
> 
> Cc: dev@dpdk.org
> Subject: [v1] net/mlx5: support represented port item
> 
> Add support for the represented_port item in patterns. If the spec and mask
> are both NULL, the translate function will not add the source vport to the matcher.
> 
> For example, testpmd starts with PF, VF-rep0 and VF-rep1, below command
> will redirect packets from VF0 and VF1 to wire:
> testpmd> flow create 0 ingress transfer group 0 pattern eth /
> represented_port / end actions represented_port ethdev_id is 0 / end
> 
> Signed-off-by: Sean Zhang 
Acked-by: Viacheslav Ovsiienko 



[PATCH v3] kni: fix warning about discarding const qualifier

2022-06-06 Thread Ke Zhang
The warning is:
warning: passing argument 1 of ‘memcpy’ discards ‘const’
qualifier from pointer target type

The dev_addr variable was intentionally made const in v5.17 to
prevent it from being modified directly. See kernel series [1] for more
information.

[1] https://lore.kernel.org/netdev/YZYAb4X%2FVQFy0iks@shredder/T/

Fixes: ea6b39b5b847 ("kni: remove ethtool support")
Cc: sta...@dpdk.org

Signed-off-by: Ke Zhang 
Signed-off-by: Andrew Rybchenko 
---
 kernel/linux/kni/compat.h   | 4 
 kernel/linux/kni/kni_misc.c | 6 +-
 kernel/linux/kni/kni_net.c  | 5 -
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/linux/kni/compat.h b/kernel/linux/kni/compat.h
index 664785674f..ef1526ef85 100644
--- a/kernel/linux/kni/compat.h
+++ b/kernel/linux/kni/compat.h
@@ -141,3 +141,7 @@
 #if KERNEL_VERSION(5, 9, 0) > LINUX_VERSION_CODE
 #define HAVE_TSK_IN_GUP
 #endif
+
+#if KERNEL_VERSION(5, 15, 0) <= LINUX_VERSION_CODE
+#define HAVE_ETH_HW_ADDR_SET
+#endif
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index 780187d8bf..11fea961b3 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -403,10 +403,14 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 
/* if user has provided a valid mac address */
if (is_valid_ether_addr(dev_info.mac_addr))
+#ifdef HAVE_ETH_HW_ADDR_SET
+   eth_hw_addr_set(net_dev, dev_info.mac_addr);
+#else
memcpy(net_dev->dev_addr, dev_info.mac_addr, ETH_ALEN);
+#endif
else
/* Generate random MAC address. */
-   eth_random_addr(net_dev->dev_addr);
+   eth_hw_addr_random(net_dev);
 
if (dev_info.mtu)
net_dev->mtu = dev_info.mtu;
diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index 29e5b9e21f..496ce7e4ae 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -779,8 +779,11 @@ kni_net_set_mac(struct net_device *netdev, void *p)
return -EADDRNOTAVAIL;
 
memcpy(req.mac_addr, addr->sa_data, netdev->addr_len);
+#ifdef HAVE_ETH_HW_ADDR_SET
+   eth_hw_addr_set(netdev, addr->sa_data);
+#else
memcpy(netdev->dev_addr, addr->sa_data, netdev->addr_len);
-
+#endif
ret = kni_net_process_request(netdev, &req);
 
return (ret == 0 ? req.result : ret);
-- 
2.25.1



[Bug 1025] [dpdk 22.07/dpdk-next-net] kernel/linux/kni meson build failed on Ub22.04/Ub20.04/Fedora36/Centos7.9/SUSE15/RHEL8.6

2022-06-06 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1025

Bug ID: 1025
   Summary: [dpdk 22.07/dpdk-next-net] kernel/linux/kni meson
build failed on
Ub22.04/Ub20.04/Fedora36/Centos7.9/SUSE15/RHEL8.6
   Product: DPDK
   Version: unspecified
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: critical
  Priority: Normal
 Component: core
  Assignee: dev@dpdk.org
  Reporter: daxuex@intel.com
  Target Milestone: ---

[Dpdk version]
dpdk branch: fb96caa56aabaa425ae66cd638ce9b9065828044


[OS version]
OS: FC36-64
Kernel Version: 5.17.7-300.fc36.x86_64
GCC Version: gcc (GCC) 12.1.1 20220507 (Red Hat 12.1.1-1)
Clang Version: 14.0.0 (Fedora 14.0.0-1.fc36)

OS: RHEL86-64
Kernel Version: 4.18.0-372.9.1.el8.x86_64
GCC Version: gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-13)
Clang Version: 14.0.0 (Red Hat 14.0.0-1.module_el8.7.0+1142+5343df54)

OS: SUSE15-64
Kernel Version: 5.3.18-57-default
GCC Version: gcc (SUSE Linux) 7.5.0
Clang Version: 11.0.1

OS: UB2004-64
Kernel Version: 5.8.0-48-generic
GCC Version: gcc (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
Clang Version: 10.0.0-4ubuntu1

OS: UB2204-64
Kernel Version: 5.15.0-25-generic
GCC Version: gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0


OS: CentOS79-64
Kernel Version: 3.10.0-1160.el7.x86_64
GCC Version: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)


[Test Setup]
#  meson --werror -Denable_kmods=True -Dlibdir=lib -Dexamples=all
--default-library=static gcc-linux-app
# ninja -C gcc-linux-app

[log]

[3506/3587] Generating kernel/linux/kni/rte_kni with a custom command
FAILED: kernel/linux/kni/rte_kni.ko
/usr/bin/make -j4 -C /lib/modules/5.15.0-25-generic/build
M=/tmp/dpdk/gcc-linux-app/kernel/linux/kni src=/tmp/dpdk/kernel/linux/kni
'MODULE_CFLAGS=  -include /tmp/dpdk/config/rte_config.h
-I/tmp/dpdk/lib/eal/include -I/tmp/dpdk/lib/kni -I/tmp/dpdk/gcc-linux-app
-I/tmp/dpdk/kernel/linux/kni' modules
make: Entering directory '/usr/src/linux-headers-5.15.0-25-generic'
  CC [M]  /tmp/dpdk/gcc-linux-app/kernel/linux/kni/kni_misc.o
  CC [M]  /tmp/dpdk/gcc-linux-app/kernel/linux/kni/kni_net.o
/tmp/dpdk/kernel/linux/kni/kni_net.c: In function ‘kni_net_rx_normal’:
/tmp/dpdk/kernel/linux/kni/kni_net.c:448:2: error: #else after #else
  448 | #else
  |  ^~~~
/tmp/dpdk/kernel/linux/kni/kni_net.c:444: error: the conditional began here
  444 | #ifdef HAVE_NETIF_RX_NI
  |
/tmp/dpdk/kernel/linux/kni/kni_net.c:444: error: unterminated #else
/tmp/dpdk/kernel/linux/kni/kni_net.c:445:17: error: expected declaration or
statement at end of input
  445 | netif_rx_ni(skb);
  | ^~~
/tmp/dpdk/kernel/linux/kni/kni_net.c:445:17: error: expected declaration or
statement at end of input
/tmp/dpdk/kernel/linux/kni/kni_net.c:382:18: warning: unused variable ‘ret’
[-Wunused-variable]
  382 | uint32_t ret;
  |  ^~~
At top level:
/tmp/dpdk/kernel/linux/kni/kni_net.c:297:1: warning: ‘kni_net_tx’ defined but
not used [-Wunused-function]
  297 | kni_net_tx(struct sk_buff *skb, struct net_device *dev)
  | ^~
/tmp/dpdk/kernel/linux/kni/kni_net.c:284:1: warning: ‘kni_net_config’ defined
but not used [-Wunused-function]
  284 | kni_net_config(struct net_device *dev, struct ifmap *map)
  | ^~
/tmp/dpdk/kernel/linux/kni/kni_net.c:202:1: warning: ‘kni_net_release’ defined
but not used [-Wunused-function]
  202 | kni_net_release(struct net_device *dev)
  | ^~~
/tmp/dpdk/kernel/linux/kni/kni_net.c:180:1: warning: ‘kni_net_open’ defined but
not used [-Wunused-function]
  180 | kni_net_open(struct net_device *dev)
  | ^~~~
/tmp/dpdk/kernel/linux/kni/kni_net.c:38:21: warning: ‘kni_net_rx_func’ defined
but not used [-Wunused-variable]
   38 | static kni_net_rx_t kni_net_rx_func = kni_net_rx_normal;
  | ^~~
make[1]: *** [scripts/Makefile.build:285:
/tmp/dpdk/gcc-linux-app/kernel/linux/kni/kni_net.o] Error 1
make[1]: *** Waiting for unfinished jobs
make: *** [Makefile:1875: /tmp/dpdk/gcc-linux-app/kernel/linux/kni] Error 2
make: Leaving directory '/usr/src/linux-headers-5.15.0-25-generic'
[3513/3587] Compiling C object app/test/dpdk-test.p/test_ring.c.o
ninja: build stopped: subcommand failed.


[bad commit]
commit c98600d4bed6d15599e448990f2ba117ca938a2d
Author: Jiri Slaby 
Date:   Wed Jun 1 08:53:58 2022 +0200

kni: fix build with Linux 5.18

Since commit 2655926aea9b (net: Remove netif_rx_any_context() and
netif_rx_ni().) in 5.18, netif_rx_ni() no longer exists as netif_rx()
can be called from any context. So define HAVE_NETIF_RX_NI for older
releases and call the appropriate function in kni_net.

netif_rx_ni() must be used on older kernel since netif_rx() might
migh

Re: [dpdk-dev] [PATCH RFC] net/ena: Add Windows support.

2022-06-06 Thread Michał Krawczyk
On Sat, 21 May 2022 at 00:08, Ferruh Yigit  wrote:
>
> On 8/30/2021 3:05 PM, William Tu wrote:
> > On Mon, Aug 30, 2021 at 12:12 AM Michał Krawczyk  wrote:
> > [...]
> >> Hi William,
> >>
> >> It's great to hear that you're working on ENA support for Windows!
> >>
> >> ENA PMD uses admin interrupt for processing all the commands like
> >> creating the IO queues, setting up the MTU, etc., and also for the
> >> AENQ events. With the current driver design it's critical to have this
> >> admin interrupt working.
> >>
> >> It looks like the admin interrupt is not functional and from what I've
> >> seen in the email regarding the v21.11 roadmap for the Windows
> >> support, the netuio interrupt support is going to be added in the
> >> future. That might be the reason for you seeing those errors.
> >
> > Hi Michal,
> > Thank you! Then I will wait for netuio support.
> > William
>
>
> Hi William, Michał,
>
> This is a very old thread, but it is still in patchwork. I wonder if
> there is any update on the issue?

Hi Ferruh,

sorry for the late reply - nothing new from my side.

I'm not sure what the current state of the netuio interrupt support
for Windows is, but if it still hasn't landed, then the original issue
still persists.

Thanks,
Michal


RE: [v1 2/4] common/mlx5: add modify ECN capability check

2022-06-06 Thread Slava Ovsiienko
> -Original Message-
> From: Sean Zhang (Networking SW) 
> Sent: Saturday, April 2, 2022 10:12
> To: Matan Azrad ; Slava Ovsiienko
> 
> Cc: dev@dpdk.org
> Subject: [v1 2/4] common/mlx5: add modify ECN capability check
> 
> The outer_ip_ecn flag is added to the header modify capabilities properties layout in
> order to check whether the firmware supports modification of the ECN field.
> 
> Signed-off-by: Sean Zhang 
Acked-by: Viacheslav Ovsiienko 



[PATCH v2] net/virtio: unmap PCI device in secondary process

2022-06-06 Thread Yuan Wang
In multi-process mode, the secondary process remaps PCI during
initialization, but the mapping is not removed in the uninit path,
so the device is not closed and a device-busy error is reported
when the device is hotplugged.

This patch unmaps the PCI device at secondary process uninitialization,
based on virtio_remap_pci.

Fixes: 36a7a2e7a53 ("net/virtio: move PCI device init in dedicated file")
Cc: sta...@dpdk.org

Signed-off-by: Yuan Wang 
Tested-by: Wei Ling 
---
 drivers/net/virtio/virtio_pci_ethdev.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio/virtio_pci_ethdev.c 
b/drivers/net/virtio/virtio_pci_ethdev.c
index 54645dc62e..1f6bdeddda 100644
--- a/drivers/net/virtio/virtio_pci_ethdev.c
+++ b/drivers/net/virtio/virtio_pci_ethdev.c
@@ -122,10 +122,20 @@ static int
 eth_virtio_pci_uninit(struct rte_eth_dev *eth_dev)
 {
int ret;
+   struct virtio_pci_dev *dev;
+   struct virtio_hw *hw;
PMD_INIT_FUNC_TRACE();
 
-   if (rte_eal_process_type() == RTE_PROC_SECONDARY)
+   if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+   dev = eth_dev->data->dev_private;
+   hw = &dev->hw;
+
+   if (dev->modern)
+   rte_pci_unmap_device(RTE_ETH_DEV_TO_PCI(eth_dev));
+   else
+   vtpci_legacy_ioport_unmap(hw);
return 0;
+   }
 
ret = virtio_dev_stop(eth_dev);
virtio_dev_close(eth_dev);
-- 
2.25.1



RE: [PATCH] testpmd: optimize forward stream statistics

2022-06-06 Thread Guo, Junfeng


> -Original Message-
> From: Singh, Aman Deep 
> Sent: Wednesday, June 1, 2022 16:46
> To: Guo, Junfeng ; Zhang, Qi Z
> ; Wu, Jingjing ; Xing,
> Beilei 
> Cc: dev@dpdk.org; Wang, Xiao W 
> Subject: Re: [PATCH] testpmd: optimize forward stream statistics
> 
> Hi Junfeng
> 
> On 5/24/2022 1:57 PM, Junfeng Guo wrote:
> > 1. add throughput statistics for forward stream
> > 2. display forward statistics for every forward stream
> >
> > Signed-off-by: Xiao Wang 
> > Signed-off-by: Junfeng Guo 
> > ---
> >   app/test-pmd/testpmd.c | 41
> ++---
> >   app/test-pmd/testpmd.h |  6 ++
> >   2 files changed, 44 insertions(+), 3 deletions(-)
> >
> > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> > index fe2ce19f99..076a042e77 100644
> > --- a/app/test-pmd/testpmd.c
> > +++ b/app/test-pmd/testpmd.c
> > @@ -1926,6 +1926,8 @@ fwd_stream_stats_display(streamid_t
> stream_id)
> >   {
> > struct fwd_stream *fs;
> > static const char *fwd_top_stats_border = "---";
> > +   uint64_t diff_pkts_rx, diff_pkts_tx, diff_cycles;
> > +   uint64_t pps_rx, pps_tx;
> >
> > fs = fwd_streams[stream_id];
> > if ((fs->rx_packets == 0) && (fs->tx_packets == 0) &&
> > @@ -1939,6 +1941,21 @@ fwd_stream_stats_display(streamid_t
> stream_id)
> >" TX-dropped: %-14"PRIu64,
> >fs->rx_packets, fs->tx_packets, fs->fwd_dropped);
> >
> > +   diff_pkts_rx = fs->rx_packets - fs->pre_rx;
> > +   diff_pkts_tx = fs->tx_packets - fs->pre_tx;
> > +   diff_cycles = fs->pre_cycles;
> > +
> > +   fs->pre_rx = fs->rx_packets;
> > +   fs->pre_tx = fs->tx_packets;
> > +   fs->pre_cycles = rte_rdtsc();
> > +   if (diff_cycles > 0)
> > +   diff_cycles = fs->pre_cycles - diff_cycles;
> > +
> > +   pps_rx = diff_cycles > 0 ?
> > +   (double)diff_pkts_rx * rte_get_tsc_hz() / diff_cycles : 0;
> > +   pps_tx = diff_cycles > 0 ?
> > +   (double)diff_pkts_tx * rte_get_tsc_hz() / diff_cycles : 0;
> > +
> > /* if checksum mode */
> > if (cur_fwd_eng == &csum_fwd_engine) {
> > printf("  RX- bad IP checksum: %-14"PRIu64
> > @@ -1952,6 +1969,11 @@ fwd_stream_stats_display(streamid_t
> stream_id)
> > printf("\n");
> > }
> >
> > +   printf("\n  Throughput (since last show)\n");
> > +   printf("  Rx-pps: %12"PRIu64"\n  Tx-pps: %12"PRIu64"\n", pps_rx,
> pps_tx);
> > +   fs->rx_pps = pps_rx;
> > +   fs->tx_pps = pps_tx;
> > +
> > if (record_burst_stats) {
> > pkt_burst_stats_display("RX", &fs->rx_burst_stats);
> > pkt_burst_stats_display("TX", &fs->tx_burst_stats);
> > @@ -1979,6 +2001,8 @@ fwd_stats_display(void)
> > uint64_t fwd_cycles = 0;
> > uint64_t total_recv = 0;
> > uint64_t total_xmit = 0;
> > +   uint64_t total_rx_pps = 0;
> > +   uint64_t total_tx_pps = 0;
> > struct rte_port *port;
> > streamid_t sm_id;
> > portid_t pt_id;
> > @@ -1989,10 +2013,9 @@ fwd_stats_display(void)
> > for (sm_id = 0; sm_id < cur_fwd_config.nb_fwd_streams;
> sm_id++) {
> > struct fwd_stream *fs = fwd_streams[sm_id];
> >
> > -   if (cur_fwd_config.nb_fwd_streams >
> > +   fwd_stream_stats_display(sm_id);
> > +   if (cur_fwd_config.nb_fwd_streams ==
> > cur_fwd_config.nb_fwd_ports) {
> > -   fwd_stream_stats_display(sm_id);
> > -   } else {
> > ports_stats[fs->tx_port].tx_stream = fs;
> > ports_stats[fs->rx_port].rx_stream = fs;
> > }
> > @@ -2008,7 +2031,14 @@ fwd_stats_display(void)
> >
> > if (record_core_cycles)
> > fwd_cycles += fs->core_cycles;
> > +
> > +   total_rx_pps += fs->rx_pps;
> > +   total_tx_pps += fs->tx_pps;
> > }
> > +
> > +   printf("\n  Total Rx-pps: %12"PRIu64"  Tx-pps: %12"PRIu64"\n",
> > +  total_rx_pps, total_tx_pps);
> > +
> > for (i = 0; i < cur_fwd_config.nb_fwd_ports; i++) {
> > pt_id = fwd_ports_ids[i];
> > port = &ports[pt_id];
> > @@ -2124,6 +2154,11 @@ fwd_stats_reset(void)
> > fs->rx_bad_l4_csum = 0;
> > fs->rx_bad_outer_l4_csum = 0;
> > fs->rx_bad_outer_ip_csum = 0;
> > +   fs->pre_rx = 0;
> > +   fs->pre_tx = 0;
> > +   fs->pre_cycles = 0;
> > +   fs->rx_pps = 0;
> > +   fs->tx_pps = 0;
> >
> > memset(&fs->rx_burst_stats, 0, sizeof(fs-
> >rx_burst_stats));
> > memset(&fs->tx_burst_stats, 0, sizeof(fs-
> >tx_burst_stats));
> > diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> > index 31f766c965..cad57af27e 100644
> > --- a/app/test-pmd/testpmd.h
> > +++ b/app/test-pmd/testpmd.h
> > @@ -155,6 +155,12 @@ struct fwd_stream {
> > struct pkt_burst_stats rx_burst_stats;
> > struct pkt_burst_stats tx_burst_stats;
> > struct fwd_lcore *lcore; /**< Lcore being scheduled. */
> > +
> > +   uint64_t pre_rx;
> > +   uint6

RE: [PATCH] testpmd: optimize forward stream statistics

2022-06-06 Thread Guo, Junfeng


> -Original Message-
> From: Andrew Rybchenko 
> Sent: Tuesday, May 31, 2022 23:28
> To: Guo, Junfeng ; Zhang, Qi Z
> ; Wu, Jingjing ; Xing,
> Beilei 
> Cc: dev@dpdk.org; Wang, Xiao W 
> Subject: Re: [PATCH] testpmd: optimize forward stream statistics
> 
> On 5/24/22 11:27, Junfeng Guo wrote:
> > 1. add throughput statistics for forward stream
> > 2. display forward statistics for every forward stream
> >
> > Signed-off-by: Xiao Wang 
> > Signed-off-by: Junfeng Guo 
> 
> Sorry, I don't understand why the summary says "optimize", but the description
> does not say what is optimized and how. It just mentions some additions.
> 
> Shouldn't summary say:
> app/testpmd: add throughput stats for forward streams

Thanks for your review and comments!
I'll update the patch in the coming version.

Regards,
Junfeng Guo

> 
> Any reviews from testpmd maintainers?


RE: [PATCH v2] net/virtio: unmap PCI device in secondary process

2022-06-06 Thread Ling, WeiX
> -Original Message-
> From: Wang, YuanX 
> Sent: Monday, June 6, 2022 11:56 PM
> To: maxime.coque...@redhat.com; Xia, Chenbo ;
> dev@dpdk.org
> Cc: Hu, Jiayu ; He, Xingguang
> ; Ling, WeiX ; Wang, YuanX
> ; sta...@dpdk.org
> Subject: [PATCH v2] net/virtio: unmap PCI device in secondary process
> 
> In multi-process mode, the secondary process remaps PCI during initialization,
> but the mapping is not removed in the uninit path, so the device is not closed
> and a device-busy error is reported when the device is hotplugged.
> 
> This patch unmaps the PCI device at secondary process uninitialization, based on
> virtio_remap_pci.
> 
> Fixes: 36a7a2e7a53 ("net/virtio: move PCI device init in dedicated file")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Yuan Wang 
> Tested-by: Wei Ling 
> ---

Tested-by: Wei Ling 


Re: Optimizations are not features

2022-06-06 Thread Konstantin Ananyev

04/06/2022 13:51, Andrew Rybchenko wrote:

On 6/4/22 15:19, Morten Brørup wrote:

From: Jerin Jacob [mailto:jerinjac...@gmail.com]
Sent: Saturday, 4 June 2022 13.10

On Sat, Jun 4, 2022 at 3:30 PM Andrew Rybchenko
 wrote:


On 6/4/22 12:33, Jerin Jacob wrote:

On Sat, Jun 4, 2022 at 2:39 PM Morten Brørup

 wrote:


I would like the DPDK community to change its view on compile time

options. Here is why:




Application specific performance micro-optimizations like “fast

mbuf free” and “mbuf direct re-arm” are being added to DPDK and
presented as features.




They are not features, but optimizations, and I don’t understand

the need for them to be available at run-time!




Instead of adding a bunch of exotic exceptions to the fast path of

the PMDs, they should be compile time options. This will improve
performance by avoiding branches in the fast path, both for the
applications using them, and for generic applications (where the exotic
code is omitted).


Agree. I think, keeping the best of both worlds would be

-Enable the feature/optimization as runtime
-Have a compile-time option to disable the feature/optimization as

an override.


It is hard to find the right balance, but in general compile
time options are a nightmare for maintenance. The number of
required builds will grow exponentially.


Test combinations are exponential for N features, regardless of whether the N are 
runtime or compile time options.


But since I'm talking about build checks I don't care about exponential
growth in run time. Yes, testing should care, but it is a separate story.




Of course, we can
limit number of checked combinations, but it will result in
flow of patches to fix build in other cases.


The build breakage can be fixed if we use (2) vs (1)

1)
#ifdef ...
My feature
#endif

2)
static __rte_always_inline int
rte_has_xyz_feature(void)
{
#ifdef RTE_LIBRTE_XYZ_FEATURE
 return RTE_LIBRTE_XYZ_FEATURE;
#else
 return 0;
#endif
}

if (rte_has_xyz_feature()) {
My feature code

}



Jerin, thanks, very good example.

I'm not sure all the features can be covered by that, e.g. added 
fields in structures.


+1



Also, I would consider such features "opt in" at compile time only. As 
such, they could be allowed to break the ABI/API.






Also compile time options tend to make code less readable
which makes all aspects of the development harder.

Yes, compile time is nice for micro optimizations, but
I have great concerns that it is a right way to go.


Please note that I am only talking about the performance

optimizations that are limited to application specific use cases. I
think it makes sense to require that performance optimizing an
application also requires recompiling the performance critical
libraries used by it.

abandon some of existing functionality to create a 'short-cut'


Allowing compile time options for application specific performance

optimizations in DPDK would also open a path for other optimizations,
which can only be achieved at compile time, such as “no fragmented
packets”, “no attached mbufs” and “single mbuf pool”. And even more
exotic optimizations, such as the “indexed mempool cache”, which was
rejected due to ABI violations – they could be marked as “risky and
untested” or similar, but still be part of the DPDK main repository.





Thanks Morten for bringing it up, it is an interesting topic.
Though I look at it from different angle.
All optimizations you mentioned above introduce new limitations:
MBUF_FAST_FREE - no indirect mbufs and multiple mempools,
mempool object indexes - mempool size is limited to 4GB,
direct rearm - drop ability to stop/reconfigure TX queue,
while RX queue is still running,
etc.
Note that none of these limitations are forced by HW.
All of them are pure SW limitations that developers forced in
(or tried to) to get a little extra performance.
That's concerning tendency.

As more and more such 'optimization via limitation' will come in:
- DPDK feature list will become more and more fragmented.
- Would cause more and more confusion for the users.
- Unmet expectations - difference in performance between 'default'
  and 'optimized' version of DPDK will become bigger and bigger.
- As Andrew already mentioned, maintaining all these 'sub-flavours'
  of DPDK will become more and more difficult.

So, probably instead of making such changes easier,
we need somehow to persuade developers to think more about
optimizations that would be generic and transparent to the user.
I do realize that it is not always possible due to various reasons
(HW limitations, external dependencies, etc.)
but that's another story.

Let's take for example MBUF_FAST_FREE.
In fact, I am not sure that we need it as tx offload flag at all.
The PMD TX-path has all the necessary information to decide at run-time
whether it can do fast_free() or not:
at tx_burst() the PMD can check whether all mbufs satisfy these conditions
(same mempool, refcnt==1) and update some fields and/or counters
inside the TXQ to reflect it.
Then, at tx_free() we ca
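
For illustration only, a minimal sketch of such a per-burst run-time check
(a hypothetical helper, not taken from any existing PMD) could look like:

/* The burst qualifies for "fast free" only if every mbuf is direct,
 * unshared (refcnt == 1), single-segment and comes from the same mempool,
 * so the whole burst can later be returned with one rte_mempool_put_bulk(). */
#include <stdbool.h>
#include <rte_mbuf.h>

static inline bool
txq_burst_can_fast_free(struct rte_mbuf **pkts, uint16_t nb_pkts,
                        struct rte_mempool **mp_out)
{
        struct rte_mempool *mp = nb_pkts ? pkts[0]->pool : NULL;
        uint16_t i;

        for (i = 0; i < nb_pkts; i++) {
                if (pkts[i]->pool != mp ||
                    !RTE_MBUF_DIRECT(pkts[i]) ||
                    rte_mbuf_refcnt_read(pkts[i]) != 1 ||
                    pkts[i]->next != NULL)
                        return false;
        }
        *mp_out = mp; /* remember the mempool for a later bulk put */
        return true;
}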

[PATCH v2] app/testpmd: add throughput stats for forward streams

2022-06-06 Thread Junfeng Guo
1. add throughput statistics (in pps) for forward streams.
2. display the forward statistics for every forward stream.

v2:
add parameter descriptions and fix commit title.

Signed-off-by: Xiao Wang 
Signed-off-by: Junfeng Guo 
---
 app/test-pmd/testpmd.c | 41 ++---
 app/test-pmd/testpmd.h |  6 ++
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index fe2ce19f99..076a042e77 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1926,6 +1926,8 @@ fwd_stream_stats_display(streamid_t stream_id)
 {
struct fwd_stream *fs;
static const char *fwd_top_stats_border = "---";
+   uint64_t diff_pkts_rx, diff_pkts_tx, diff_cycles;
+   uint64_t pps_rx, pps_tx;
 
fs = fwd_streams[stream_id];
if ((fs->rx_packets == 0) && (fs->tx_packets == 0) &&
@@ -1939,6 +1941,21 @@ fwd_stream_stats_display(streamid_t stream_id)
   " TX-dropped: %-14"PRIu64,
   fs->rx_packets, fs->tx_packets, fs->fwd_dropped);
 
+   diff_pkts_rx = fs->rx_packets - fs->pre_rx;
+   diff_pkts_tx = fs->tx_packets - fs->pre_tx;
+   diff_cycles = fs->pre_cycles;
+
+   fs->pre_rx = fs->rx_packets;
+   fs->pre_tx = fs->tx_packets;
+   fs->pre_cycles = rte_rdtsc();
+   if (diff_cycles > 0)
+   diff_cycles = fs->pre_cycles - diff_cycles;
+
+   pps_rx = diff_cycles > 0 ?
+   (double)diff_pkts_rx * rte_get_tsc_hz() / diff_cycles : 0;
+   pps_tx = diff_cycles > 0 ?
+   (double)diff_pkts_tx * rte_get_tsc_hz() / diff_cycles : 0;
+
/* if checksum mode */
if (cur_fwd_eng == &csum_fwd_engine) {
printf("  RX- bad IP checksum: %-14"PRIu64
@@ -1952,6 +1969,11 @@ fwd_stream_stats_display(streamid_t stream_id)
printf("\n");
}
 
+   printf("\n  Throughput (since last show)\n");
+   printf("  Rx-pps: %12"PRIu64"\n  Tx-pps: %12"PRIu64"\n", pps_rx, 
pps_tx);
+   fs->rx_pps = pps_rx;
+   fs->tx_pps = pps_tx;
+
if (record_burst_stats) {
pkt_burst_stats_display("RX", &fs->rx_burst_stats);
pkt_burst_stats_display("TX", &fs->tx_burst_stats);
@@ -1979,6 +2001,8 @@ fwd_stats_display(void)
uint64_t fwd_cycles = 0;
uint64_t total_recv = 0;
uint64_t total_xmit = 0;
+   uint64_t total_rx_pps = 0;
+   uint64_t total_tx_pps = 0;
struct rte_port *port;
streamid_t sm_id;
portid_t pt_id;
@@ -1989,10 +2013,9 @@ fwd_stats_display(void)
for (sm_id = 0; sm_id < cur_fwd_config.nb_fwd_streams; sm_id++) {
struct fwd_stream *fs = fwd_streams[sm_id];
 
-   if (cur_fwd_config.nb_fwd_streams >
+   fwd_stream_stats_display(sm_id);
+   if (cur_fwd_config.nb_fwd_streams ==
cur_fwd_config.nb_fwd_ports) {
-   fwd_stream_stats_display(sm_id);
-   } else {
ports_stats[fs->tx_port].tx_stream = fs;
ports_stats[fs->rx_port].rx_stream = fs;
}
@@ -2008,7 +2031,14 @@ fwd_stats_display(void)
 
if (record_core_cycles)
fwd_cycles += fs->core_cycles;
+
+   total_rx_pps += fs->rx_pps;
+   total_tx_pps += fs->tx_pps;
}
+
+   printf("\n  Total Rx-pps: %12"PRIu64"  Tx-pps: %12"PRIu64"\n",
+  total_rx_pps, total_tx_pps);
+
for (i = 0; i < cur_fwd_config.nb_fwd_ports; i++) {
pt_id = fwd_ports_ids[i];
port = &ports[pt_id];
@@ -2124,6 +2154,11 @@ fwd_stats_reset(void)
fs->rx_bad_l4_csum = 0;
fs->rx_bad_outer_l4_csum = 0;
fs->rx_bad_outer_ip_csum = 0;
+   fs->pre_rx = 0;
+   fs->pre_tx = 0;
+   fs->pre_cycles = 0;
+   fs->rx_pps = 0;
+   fs->tx_pps = 0;
 
memset(&fs->rx_burst_stats, 0, sizeof(fs->rx_burst_stats));
memset(&fs->tx_burst_stats, 0, sizeof(fs->tx_burst_stats));
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 31f766c965..dc1bba5637 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -155,6 +155,12 @@ struct fwd_stream {
struct pkt_burst_stats rx_burst_stats;
struct pkt_burst_stats tx_burst_stats;
struct fwd_lcore *lcore; /**< Lcore being scheduled. */
+
+   uint64_t pre_rx; /**< previously recorded received packets */
+   uint64_t pre_tx; /**< previously recorded transmitted packets */
+   uint64_t pre_cycles; /**< previously recorded processor's time-stamp 
counter cycles */
+   uint64_t rx_pps; /**< throughput of received packets in pps */
+   uint64_t tx_pps; /**< throughput of transmitted packets in pps */
 };
 
 /**
-- 
2.25.1



RE: [PATCH v3] pcap: support MTU set

2022-06-06 Thread Ido Goshen


> -Original Message-
> From: Ferruh Yigit 
> Sent: Monday, 30 May 2022 21:06
> To: Ido Goshen ; ferruh.yi...@xilinx.com;
> step...@networkplumber.org
> Cc: dev@dpdk.org; Tianli Lai 
> Subject: Re: [PATCH v3] pcap: support MTU set
> 
> On 5/30/2022 11:36 AM, Ido Goshen wrote:
> > Support rte_eth_dev_set_mtu by pcap vdevs. Enforce the MTU on Rx/Tx.
> >

> 
> > +   rte_pktmbuf_free(mbuf);
> > +   continue;
> 
> Normally a PMD should not silently free a packet itself, it should return
> an error and the
> application will decide whether to free the packet or not.
> 

[idog]
The doc says:
'The return value can be less than the value of the *tx_pkts* parameter when
 the transmit ring is full or has been filled up.'
which is not the case here.
It would also force failing all the following packets of the burst, even those under the MTU.
I think in the HW case an oversized TX packet is dropped by the HW and not left to the app.
Freeing might mimic that better, and is simpler and safer.

I do miss incrementing oerrors for that case, though.
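
A minimal sketch of what that could look like in a tx_burst loop (illustrative
only, not the actual pcap PMD code; max_frame_len would be derived from the
configured MTU):

#include <rte_mbuf.h>

/* Drop oversized packets the way HW would, counting them as output errors. */
static uint16_t
tx_enforce_mtu(struct rte_mbuf **bufs, uint16_t nb_pkts,
               uint32_t max_frame_len, uint64_t *oerrors)
{
        uint16_t i, nb_ok = 0;

        for (i = 0; i < nb_pkts; i++) {
                if (rte_pktmbuf_pkt_len(bufs[i]) > max_frame_len) {
                        rte_pktmbuf_free(bufs[i]); /* mimic HW dropping the frame */
                        (*oerrors)++;              /* surfaced via rte_eth_stats.oerrors */
                        continue;
                }
                bufs[nb_ok++] = bufs[i];           /* keep packets that fit */
        }
        return nb_ok;                              /* caller transmits only these */
}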



RE: [PATCH 6/6] net/vhost: perform SW checksum in Tx path

2022-06-06 Thread Ma, WenwuX


> -Original Message-
> From: Maxime Coquelin 
> Sent: 2022年6月2日 17:07
> To: Ma, WenwuX ; dev@dpdk.org;
> jasow...@redhat.com; Xia, Chenbo ;
> david.march...@redhat.com; Matz, Olivier 
> Cc: sta...@dpdk.org
> Subject: Re: [PATCH 6/6] net/vhost: perform SW checksum in Tx path
> 
> Hi Wenwu,
> 
> Sorry, I missed your review.
> 
> On 5/7/22 05:20, Ma, WenwuX wrote:
> >
> >> -Original Message-
> >> From: Maxime Coquelin 
> >> Sent: 2022年5月5日 18:27
> >> To: dev@dpdk.org; jasow...@redhat.com; Xia, Chenbo
> >> ; david.march...@redhat.com;
> >> olivier.m...@6wind.com
> >> Cc: sta...@dpdk.org; Maxime Coquelin 
> >> Subject: [PATCH 6/6] net/vhost: perform SW checksum in Tx path
> >>
> >> Virtio specification supports guest checksum offloading for L4, which
> >> is enabled with VIRTIO_NET_F_GUEST_CSUM feature negotiation.
> However,
> >> the Vhost PMD does not advertise Tx checksum offload capabilities.
> >>
> >> Advertising these offload capabilities at the ethdev level is not
> >> enough, because we could still end-up with the application enabling
> >> these offloads while the guest not negotiating it.
> >>
> >> This patch advertizes the Tx checksum offload capabilities, and
> >> introduces a compatibility layer to cover the case
> >> VIRTIO_NET_F_GUEST_CSUM has not been negotiated but the
> application
> >> does configure the Tx checksum offloads. This function performs the L4 Tx
> checksum in SW for UDP and TCP.
> >> Compared to Rx SW checksum, the Tx SW checksum function needs to
> >> compute the pseudo-header checksum, as we cannot knwo whether it
> was
> >> done before.
> >>
> >> This patch does not advertize SCTP checksum offloading capability for
> >> now, but it could be handled later if the need arises.
> >
> > In virtio_enqueue_offload(), if RTE_MBUF_F_TX_IP_CKSUM is set, we will
> > performs the L3 Tx checksum, why do not we advertise IPV4 checksum
> offloading capability?
> > Will we advertise it later?
> >
> 
> Indeed, we have an IPv4 SW checksum fallback in Vhost library.
> We could think about adding the capability, but that's not urgent I think. Do
> you have a use-case where it is needed?
> 
The GRO/GSO library doesn't re-calculate IPv4 checksums for merged/fragmented
packets, which causes iperf in the VM to fail.

> Regards,
> Maxime
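
For context, a SW L4 Tx checksum of the kind discussed above could look
roughly like the sketch below (single-segment IPv4 only, assuming l2_len and
l3_len are set in the mbuf; this is a simplified illustration, not the vhost
library code):

#include <netinet/in.h>
#include <rte_ip.h>
#include <rte_mbuf.h>
#include <rte_tcp.h>
#include <rte_udp.h>

/* Compute the L4 checksum in SW, including the pseudo-header checksum,
 * which the Tx path (unlike Rx) cannot assume was done before. */
static void
sw_l4_tx_cksum(struct rte_mbuf *m)
{
        struct rte_ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m,
                        struct rte_ipv4_hdr *, m->l2_len);
        void *l4 = (char *)ip + m->l3_len;

        if (ip->next_proto_id == IPPROTO_TCP) {
                struct rte_tcp_hdr *tcp = l4;

                tcp->cksum = 0;
                tcp->cksum = rte_ipv4_udptcp_cksum(ip, tcp); /* pseudo-header + payload */
        } else if (ip->next_proto_id == IPPROTO_UDP) {
                struct rte_udp_hdr *udp = l4;

                udp->dgram_cksum = 0;
                udp->dgram_cksum = rte_ipv4_udptcp_cksum(ip, udp);
        }
}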



RE: [RFC v2 2/2] ethdev: queue-based flow aged report

2022-06-06 Thread Ori Kam
Hi,
For some reason this mail stopped being plain text.
Please find my comments marked with [Ori].

Best,
Ori

From: Jack Min 
Sent: Thursday, June 2, 2022 1:24 PM
To: Ori Kam ; Andrew Rybchenko 
; NBU-Contact-Thomas Monjalon (EXTERNAL) 
; Ferruh Yigit 
Cc: dev@dpdk.org
Subject: Re: [RFC v2 2/2] ethdev: queue-based flow aged report

On 6/2/22 14:10, Ori Kam wrote:

Hi,
Hello,






-Original Message-

From: Andrew Rybchenko 


Sent: Wednesday, June 1, 2022 9:21 PM

Subject: Re: [RFC v2 2/2] ethdev: queue-based flow aged report



Again, summary must not be a statement.



On 6/1/22 10:39, Xiaoyu Min wrote:

When the application uses queue-based flow rule management and operates on the

same flow rule on the same queue, e.g. create/destroy/query, the API for

querying aged flow rules should also have a queue id parameter, just like

other queue-based flow APIs.



This way, the PMD can work in a more optimized way since resources are

isolated by queue and need no synchronization.



If the application does use queue-based flow management but configures the port

without RTE_FLOW_PORT_FLAG_STRICT_QUEUE, which means the application operates on

a given flow rule on different queues, the queue id parameter will

be ignored.



In addition to the above change, another new API is added which helps the

application get information about which queues have aged-out flows after

the RTE_ETH_EVENT_FLOW_AGED event is received. The queried queue ids can be

used in the above queue-based aged-flow query API.



Signed-off-by: Xiaoyu Min 

---

  lib/ethdev/rte_flow.h| 82 

  lib/ethdev/rte_flow_driver.h | 13 ++

  2 files changed, 95 insertions(+)



diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h

index 38439fcd1d..a12becfe3b 100644

--- a/lib/ethdev/rte_flow.h

+++ b/lib/ethdev/rte_flow.h

@@ -2810,6 +2810,7 @@ enum rte_flow_action_type {

 * See function rte_flow_get_aged_flows

 * see enum RTE_ETH_EVENT_FLOW_AGED

 * See struct rte_flow_query_age

+* See function rte_flow_get_q_aged_flows

 */

   RTE_FLOW_ACTION_TYPE_AGE,



@@ -5624,6 +5625,87 @@ rte_flow_async_action_handle_update(uint16_t port_id,

const void *update,

void *user_data,

struct rte_flow_error *error);

+

+/**

+ * @warning

+ * @b EXPERIMENTAL: this API may change without prior notice.

+ *

+ * Get flow queues which have aged out flows on a given port.

+ *

+ * The application can use this function to query which queues have aged out 
flows after

+ * a RTE_ETH_EVENT_FLOW_AGED event is received so the returned queue id can be 
used to

+ * get aged out flows on this given queue by call rte_flow_get_q_aged_flows.

+ *

+ * This function can be called from the event callback or synchronously 
regardless of the event.

+ *

+ * @param port_id

+ *   Port identifier of Ethernet device.

+ * @param[in, out] queue_id

+ *   Array of queue id that will be set.

+ * @param[in] nb_queue_id

+ *   Maximum number of the queue id that can be returned.

+ *   This value should be equal to the size of the queue_id array.

+ * @param[out] error

+ *   Perform verbose error reporting if not NULL. Initialized in case of

+ *   error only.

+ *

+ * @return

+ *   if nb_queue_id is 0, return the amount of all queues which have aged out 
flows.

+ *   if nb_queue_id is not 0 , return the amount of queues which have aged out 
flows

+ *   reported in the queue_id array, otherwise negative errno value.



I'm sorry, but it is unclear to me what happens if the provided array is

insufficient to return all queues. IMHO, we should still provide as

much as we can. The question is how to report that we have more queues.

It looks like the only sensible way is to return a value greater than

nb_queue_id.



I think that, just like any other function, this function should return at most
the requested number.

Returning a bigger number may result in out-of-buffer issues, or require an extra
validation step from the application.

In addition, as far as I can see, the common practice in DPDK is to return the
requested number.
Yes, it is just like other functions.




I have another concern with this function: from my understanding, this function
will be called on the service thread

that handles the aging event, and after calling this function the application still
needs to propagate the event to the

correct threads.

I think it will be better if the event itself will hold which queue triggered 
the aging. Or even better to get the

As discussed in v1, there seems no good place in the current callback function 
to pass this kind of information from driver to application.

Or you have a better idea?

[Ori] Maybe use the new queues; for example, maybe the application can get the
notification as part of the polling function.
Maybe it can even get the aged rules.



notification on the correct thread. (I know it is much more complicated but 
maybe it is worth the

[PATCH v4 0/2] support to clear in-flight packets for async

2022-06-06 Thread Yuan Wang
These patches add support for clearing in-flight packets for async dequeue
and introduce a thread-safe version of this function.

v4:
- Rebase to latest DPDK

v3:
- Rebase to latest DPDK

v2:
- Rebase to latest DPDK
- Use the thread-safe version in destroy_device

v1:
- Protect vq access with splitlock

Yuan Wang (2):
  vhost: support clear in-flight packets for async dequeue
  example/vhost: support to clear in-flight packets for async dequeue

 doc/guides/prog_guide/vhost_lib.rst|  8 ++-
 doc/guides/rel_notes/release_22_07.rst |  5 ++
 examples/vhost/main.c  | 26 ++--
 lib/vhost/rte_vhost_async.h| 25 
 lib/vhost/version.map  |  1 +
 lib/vhost/virtio_net.c | 82 +-
 6 files changed, 139 insertions(+), 8 deletions(-)

-- 
2.25.1



[PATCH v4 1/2] vhost: support clear in-flight packets for async dequeue

2022-06-06 Thread Yuan Wang
rte_vhost_clear_queue_thread_unsafe() supports clearing
in-flight packets for async enqueue only. But now that
async dequeue is supported, this API should support async dequeue too.

This patch also adds a thread-safe version of this API;
the difference between the two APIs is that the thread-safe one takes a lock.

These APIs may be used to clean up packets in the async channel
to prevent packet loss when the device state changes or
when the device is destroyed.

Signed-off-by: Yuan Wang 
---
 doc/guides/prog_guide/vhost_lib.rst|  8 ++-
 doc/guides/rel_notes/release_22_07.rst |  5 ++
 lib/vhost/rte_vhost_async.h| 25 
 lib/vhost/version.map  |  1 +
 lib/vhost/virtio_net.c | 82 +-
 5 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst 
b/doc/guides/prog_guide/vhost_lib.rst
index cd3f6caa9a..b9545770d0 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -288,7 +288,13 @@ The following is an overview of some key Vhost API 
functions:
 
 * ``rte_vhost_clear_queue_thread_unsafe(vid, queue_id, **pkts, count, dma_id, 
vchan_id)``
 
-  Clear inflight packets which are submitted to DMA engine in vhost async data
+  Clear in-flight packets which are submitted to async channel in vhost
+  async data path without performing any locking. Completed packets are
+  returned to applications through ``pkts``.
+
+* ``rte_vhost_clear_queue(vid, queue_id, **pkts, count, dma_id, vchan_id)``
+
+  Clear in-flight packets which are submitted to async channel in vhost async 
data
   path. Completed packets are returned to applications through ``pkts``.
 
 * ``rte_vhost_vring_stats_get_names(int vid, uint16_t queue_id, struct 
rte_vhost_stat_name *names, unsigned int size)``
diff --git a/doc/guides/rel_notes/release_22_07.rst 
b/doc/guides/rel_notes/release_22_07.rst
index c81383f4a3..2ca06b543c 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -147,6 +147,11 @@ New Features
   Added vhost async dequeue API which can leverage DMA devices to
   accelerate receiving pkts from guest.
 
+* **Added thread-safe version of inflight packet clear API in vhost library.**
+
+  Added an API which can clear the inflight packets submitted to
+  the async channel in a thread-safe manner in the vhost async data path.
+
 Removed Items
 -
 
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index a1e7f674ed..1db2a10124 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -183,6 +183,31 @@ uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, 
uint16_t queue_id,
struct rte_mbuf **pkts, uint16_t count, int16_t dma_id,
uint16_t vchan_id);
 
+/**
+ * This function checks async completion status and clear packets for
+ * a specific vhost device queue. Packets which are inflight will be
+ * returned in an array.
+ *
+ * @param vid
+ *  ID of vhost device to clear data
+ * @param queue_id
+ *  Queue id to clear data
+ * @param pkts
+ *  Blank array to get return packet pointer
+ * @param count
+ *  Size of the packet array
+ * @param dma_id
+ *  The identifier of the DMA device
+ * @param vchan_id
+ *  The identifier of virtual DMA channel
+ * @return
+ *  Number of packets returned
+ */
+__rte_experimental
+uint16_t rte_vhost_clear_queue(int vid, uint16_t queue_id,
+   struct rte_mbuf **pkts, uint16_t count, int16_t dma_id,
+   uint16_t vchan_id);
+
 /**
  * The DMA vChannels used in asynchronous data path must be configured
  * first. So this function needs to be called before enabling DMA
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 4880b9a422..9329f88e79 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -95,6 +95,7 @@ EXPERIMENTAL {
rte_vhost_vring_stats_reset;
rte_vhost_async_try_dequeue_burst;
rte_vhost_driver_get_vdpa_dev_type;
+   rte_vhost_clear_queue;
 };
 
 INTERNAL {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 68a26eb17d..a90ae3cb96 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -26,6 +26,11 @@
 
 #define MAX_BATCH_LEN 256
 
+static __rte_always_inline uint16_t
+async_poll_dequeue_completed_split(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
+   struct rte_mbuf **pkts, uint16_t count, int16_t dma_id,
+   uint16_t vchan_id, bool legacy_ol_flags);
+
 /* DMA device copy operation tracking array. */
 struct async_dma_info dma_copy_track[RTE_DMADEV_DEFAULT_MAX];
 
@@ -2155,7 +2160,7 @@ rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t 
queue_id,
return 0;
 
VHOST_LOG_DATA(DEBUG, "(%s) %s\n", dev->ifname, __func__);
-   if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->nr_vring))) {
+   if (unlikely(queue_id >= dev->nr_vring)) {
VHOST_LOG_DATA(ER

[PATCH v4 2/2] example/vhost: support to clear in-flight packets for async dequeue

2022-06-06 Thread Yuan Wang
This patch allows vring_state_changed() to clear in-flight
dequeue packets. It also clears the in-flight packets in
a thread-safe way in destroy_device().

Signed-off-by: Yuan Wang 
---
 examples/vhost/main.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 9aae340c46..1e36c35565 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1543,6 +1543,25 @@ vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, 
uint16_t queue_id)
}
 }
 
+static void
+vhost_clear_queue(struct vhost_dev *vdev, uint16_t queue_id)
+{
+   uint16_t n_pkt = 0;
+   int pkts_inflight;
+
+   int16_t dma_id = 
dma_bind[vid2socketid[vdev->vid]].dmas[queue_id].dev_id;
+   pkts_inflight = rte_vhost_async_get_inflight(vdev->vid, queue_id);
+
+   struct rte_mbuf *m_cpl[pkts_inflight];
+
+   while (pkts_inflight) {
+   n_pkt = rte_vhost_clear_queue(vdev->vid, queue_id, m_cpl,
+   pkts_inflight, dma_id, 0);
+   free_pkts(m_cpl, n_pkt);
+   pkts_inflight = rte_vhost_async_get_inflight(vdev->vid, 
queue_id);
+   }
+}
+
 /*
  * Remove a device from the specific data core linked list and from the
  * main linked list. Synchronization  occurs through the use of the
@@ -1600,13 +1619,13 @@ destroy_device(int vid)
vdev->vid);
 
if (dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled) {
-   vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
+   vhost_clear_queue(vdev, VIRTIO_RXQ);
rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = false;
}
 
if (dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled) {
-   vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
+   vhost_clear_queue(vdev, VIRTIO_TXQ);
rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled = false;
}
@@ -1765,9 +1784,6 @@ vring_state_changed(int vid, uint16_t queue_id, int 
enable)
if (!vdev)
return -1;
 
-   if (queue_id != VIRTIO_RXQ)
-   return 0;
-
if (dma_bind[vid2socketid[vid]].dmas[queue_id].async_enabled) {
if (!enable)
vhost_clear_queue_thread_unsafe(vdev, queue_id);
-- 
2.25.1



[Bug 1026] ethdev API: rte_eth_dev_adjust_nb_rx_tx_desc

2022-06-06 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1026

Bug ID: 1026
   Summary: ethdev API: rte_eth_dev_adjust_nb_rx_tx_desc
   Product: DPDK
   Version: 20.11
  Hardware: All
OS: Linux
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: ethdev
  Assignee: dev@dpdk.org
  Reporter: giovanni.tosa...@infovista.com
  Target Milestone: ---

We are using the "rte_eth_dev_adjust_nb_rx_tx_desc" API in order to get the
descriptor limits of a NIC. Basically, by initializing the desc parameters
with the maximum "uint16_t" value it should return the top limit, and it works fine
with some NICs (e.g. Mellanox Cx5).

The problem is that some NICs (e.g. Intel XXV710) do not seem to really support
the full "unsigned" range and return the lower limit instead. In contrast, it works
correctly, returning the top limit, when using 32767 instead of 65535.
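
For reference, the call pattern described above is roughly the following
(illustrative sketch; EAL init and error handling are omitted):

#include <stdio.h>
#include <stdint.h>
#include <rte_ethdev.h>

/* Start from the max uint16_t value and let the API clamp it to the limits. */
static void
query_desc_limits(uint16_t port_id)
{
        uint16_t nb_rxd = UINT16_MAX;
        uint16_t nb_txd = UINT16_MAX;

        if (rte_eth_dev_adjust_nb_rx_tx_desc(port_id, &nb_rxd, &nb_txd) == 0)
                printf("adjusted descriptor limits: rx=%u tx=%u\n", nb_rxd, nb_txd);
}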

Please, could you confirm whether this is a known issue?

Thanks,
Giovanni

-- 
You are receiving this mail because:
You are the assignee for the bug.

[PATCH] kni: fix build

2022-06-06 Thread Thomas Monjalon
A previous fix had #else instead of #endif.
The error message is:
kernel/linux/kni/kni_net.c: In function ‘kni_net_rx_normal’:
kernel/linux/kni/kni_net.c:448:2: error: #else after #else

Bugzilla ID: 1025
Fixes: c98600d4bed6 ("kni: fix build with Linux 5.18")
Cc: sta...@dpdk.org

Signed-off-by: Thomas Monjalon 
---
 kernel/linux/kni/kni_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index a8b092b756..41805fcabf 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -445,7 +445,7 @@ kni_net_rx_normal(struct kni_dev *kni)
netif_rx_ni(skb);
 #else
netif_rx(skb);
-#else
+#endif
 
/* Update statistics */
dev->stats.rx_bytes += len;
-- 
2.36.0



Re: [PATCH v2] net/igc: add I226 support

2022-06-06 Thread Thomas Monjalon
06/06/2022 01:12, Zhang, Qi Z:
> 
> > -Original Message-
> > From: Thomas Monjalon 
> > Sent: Monday, June 6, 2022 12:42 AM
> > To: Zhang, Qi Z ; Yang, Qiming
> > 
> > Cc: dev@dpdk.org; Liu, KevinX 
> > Subject: Re: [PATCH v2] net/igc: add I226 support
> > 
> > 25/05/2022 07:57, Qiming Yang:
> > > Added I226 Series device ID in igc driver and updated igc guide
> > > document for new devices.
> > >
> > > Signed-off-by: Qiming Yang 
> > > Signed-off-by: Kevin Liu 
> > > ---
> > > v2:
> > > * rebased
> > > ---
> > >  doc/guides/nics/igc.rst| 14 +++---
> > >  doc/guides/rel_notes/release_22_03.rst |  5 +
> > 
> > You are sending a patch after 22.03 is closed, so it should be listed in 
> > 22.07!
> > 
> > I will fix while pulling the tree prepared by Qi.
> > Please be more careful with the basic checks.
> 
> > Thanks for catching this; I have dropped this patch in dpdk-next-net-intel.
> A new version is required.

Too late, it is in the main tree with release notes fixed.
Do you need any more fixes?




RE: [PATCH v2] net/igc: add I226 support

2022-06-06 Thread Zhang, Qi Z



> -Original Message-
> From: Thomas Monjalon 
> Sent: Monday, June 6, 2022 6:49 PM
> To: Yang, Qiming ; Zhang, Qi Z
> 
> Cc: dev@dpdk.org; Liu, KevinX 
> Subject: Re: [PATCH v2] net/igc: add I226 support
> 
> 06/06/2022 01:12, Zhang, Qi Z:
> >
> > > -Original Message-
> > > From: Thomas Monjalon 
> > > Sent: Monday, June 6, 2022 12:42 AM
> > > To: Zhang, Qi Z ; Yang, Qiming
> > > 
> > > Cc: dev@dpdk.org; Liu, KevinX 
> > > Subject: Re: [PATCH v2] net/igc: add I226 support
> > >
> > > 25/05/2022 07:57, Qiming Yang:
> > > > Added I226 Series device ID in igc driver and updated igc guide
> > > > document for new devices.
> > > >
> > > > Signed-off-by: Qiming Yang 
> > > > Signed-off-by: Kevin Liu 
> > > > ---
> > > > v2:
> > > > * rebased
> > > > ---
> > > >  doc/guides/nics/igc.rst| 14 +++---
> > > >  doc/guides/rel_notes/release_22_03.rst |  5 +
> > >
> > > You are sending a patch after 22.03 is closed, so it should be listed in
> 22.07!
> > >
> > > I will fix while pulling the tree prepared by Qi.
> > > Please be more careful with the basic checks.
> >
> > Thanks for catching this; I have dropped this patch in dpdk-next-net-intel.
> > A new version is required.
> 
> Too late, it is in the main tree with release notes fixed.
> Do you need more fix?

OK, I guess we need to revert it with a new fix.
Sorry for the chaos...

> 



[PATCH v1 00/17] Add vDPA multi-threads optimization

2022-06-06 Thread Li Zhang
Allow the driver to use internal threads to
speed up configuration.
All the threads will be opened on the same core as
the event completion queue scheduling thread.

Add a max_conf_threads parameter to configure
the maximum number of internal threads in addition to
the caller thread (8 is suggested).
These internal threads pipeline the handling of vDPA tasks
in the system and are shared among all vDPA devices.
The default is 0: do not use internal threads for configuration.

Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
http://patchwork.dpdk.org/project/dpdk/list/?series=21868

RFC ("Add vDPA multi-threads optiomization")
https://patchwork.dpdk.org/project/dpdk/cover/20220408075606.33056-1-l...@nvidia.com/

Li Zhang (12):
  vdpa/mlx5: fix usage of capability for max number of virtqs
  common/mlx5: extend virtq modifiable fields
  vdpa/mlx5: pre-create virtq in the prob
  vdpa/mlx5: optimize datapath-control synchronization
  vdpa/mlx5: add multi-thread management for configuration
  vdpa/mlx5: add task ring for MT management
  vdpa/mlx5: add MT task for VM memory registration
  vdpa/mlx5: add virtq creation task for MT management
  vdpa/mlx5: add virtq LM log task
  vdpa/mlx5: add device close task
  vdpa/mlx5: add virtq sub-resources creation
  vdpa/mlx5: prepare virtqueue resource creation

Yajun Wu (5):
  eal: add device removal in rte cleanup
  examples/vdpa: fix devices cleanup
  vdpa/mlx5: support pre create virtq resource
  common/mlx5: add DevX API to move QP to reset state
  vdpa/mlx5: support event qp reuse

 doc/guides/vdpadevs/mlx5.rst  |  25 +
 drivers/common/mlx5/mlx5_devx_cmds.c  |  77 ++-
 drivers/common/mlx5/mlx5_devx_cmds.h  |   6 +-
 drivers/common/mlx5/mlx5_prm.h|  30 +-
 drivers/vdpa/mlx5/meson.build |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c | 270 +--
 drivers/vdpa/mlx5/mlx5_vdpa.h | 152 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 360 ++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   | 160 +--
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c  | 128 -
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c | 270 +++
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c   |  22 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 654 +++---
 examples/vdpa/main.c  |   5 +-
 lib/eal/freebsd/eal.c |  33 ++
 lib/eal/include/rte_dev.h |   6 +
 lib/eal/linux/eal.c   |  33 ++
 lib/eal/windows/eal.c |  33 ++
 18 files changed, 1878 insertions(+), 387 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

-- 
2.31.1



[PATCH v1 01/17] vdpa/mlx5: fix usage of capability for max number of virtqs

2022-06-06 Thread Li Zhang
The driver wrongly takes the capability value for
the number of virtq pairs instead of just the number of virtqs.

Adjust all the usages of it to be the number of virtqs.

Fixes: c2eb33a ("vdpa/mlx5: manage virtqs by array")
Cc: sta...@dpdk.org

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   | 12 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 76fa5d4299..ee71339b78 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -84,7 +84,7 @@ mlx5_vdpa_get_queue_num(struct rte_vdpa_device *vdev, 
uint32_t *queue_num)
DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
return -1;
}
-   *queue_num = priv->caps.max_num_virtio_queues;
+   *queue_num = priv->caps.max_num_virtio_queues / 2;
return 0;
 }
 
@@ -141,7 +141,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
return -EINVAL;
}
-   if (vring >= (int)priv->caps.max_num_virtio_queues * 2) {
+   if (vring >= (int)priv->caps.max_num_virtio_queues) {
DRV_LOG(ERR, "Too big vring id: %d.", vring);
return -E2BIG;
}
@@ -388,7 +388,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
return -ENODEV;
}
-   if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+   if (qid >= (int)priv->caps.max_num_virtio_queues) {
DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
vdev->device->name);
return -E2BIG;
@@ -411,7 +411,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
return -ENODEV;
}
-   if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+   if (qid >= (int)priv->caps.max_num_virtio_queues) {
DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
vdev->device->name);
return -E2BIG;
@@ -624,7 +624,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
DRV_LOG(DEBUG, "No capability to support virtq statistics.");
priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv) +
   sizeof(struct mlx5_vdpa_virtq) *
-  attr->vdpa.max_num_virtio_queues * 2,
+  attr->vdpa.max_num_virtio_queues,
   RTE_CACHE_LINE_SIZE);
if (!priv) {
DRV_LOG(ERR, "Failed to allocate private memory.");
@@ -685,7 +685,7 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
uint32_t i;
 
mlx5_vdpa_dev_cache_clean(priv);
-   for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+   for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
if (!priv->virtqs[i].counters)
continue;
claim_zero(mlx5_devx_cmd_destroy(priv->virtqs[i].counters));
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index e025be47d2..c258eb3024 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -72,7 +72,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 {
unsigned int i, j;
 
-   for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+   for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
@@ -492,9 +492,9 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
DRV_LOG(INFO, "TSO is enabled without CSUM, force CSUM.");
priv->features |= (1ULL << VIRTIO_NET_F_CSUM);
}
-   if (nr_vring > priv->caps.max_num_virtio_queues * 2) {
+   if (nr_vring > priv->caps.max_num_virtio_queues) {
DRV_LOG(ERR, "Do not support more than %d virtqs(%d).",
-   (int)priv->caps.max_num_virtio_queues * 2,
+   (int)priv->caps.max_num_virtio_queues,
(int)nr_vring);
return -1;
}
-- 
2.31.1



[PATCH v1 02/17] eal: add device removal in rte cleanup

2022-06-06 Thread Li Zhang
From: Yajun Wu 

Add device removal to the rte_eal_cleanup function. This is the last chance
for device remove to get called, as a sanity measure. Loop over the vdev bus
first and then over all buses for all devices, calling rte_dev_remove.

Cc: sta...@dpdk.org

Signed-off-by: Yajun Wu 
---
 lib/eal/freebsd/eal.c | 33 +
 lib/eal/include/rte_dev.h |  6 ++
 lib/eal/linux/eal.c   | 33 +
 lib/eal/windows/eal.c | 33 +
 4 files changed, 105 insertions(+)

diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index a6b20960f2..5ffd9146b6 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -886,11 +886,44 @@ rte_eal_init(int argc, char **argv)
return fctret;
 }
 
+static int
+bus_match_all(const struct rte_bus *bus, const void *data)
+{
+   RTE_SET_USED(bus);
+   RTE_SET_USED(data);
+   return 0;
+}
+
+static void
+remove_all_device(void)
+{
+   struct rte_bus *start = NULL, *next;
+   struct rte_dev_iterator dev_iter = {0};
+   struct rte_device *dev = NULL;
+   struct rte_device *tdev = NULL;
+   char devstr[128];
+
+   RTE_DEV_FOREACH_SAFE(dev, "bus=vdev", &dev_iter, tdev) {
+   (void)rte_dev_remove(dev);
+   }
+   while ((next = rte_bus_find(start, bus_match_all, NULL)) != NULL) {
+   start = next;
+   /* Skip buses that don't have iterate method */
+   if (!next->dev_iterate || !next->name)
+   continue;
+   snprintf(devstr, sizeof(devstr), "bus=%s", next->name);
+   RTE_DEV_FOREACH_SAFE(dev, devstr, &dev_iter, tdev) {
+   (void)rte_dev_remove(dev);
+   }
+   };
+}
+
 int
 rte_eal_cleanup(void)
 {
struct internal_config *internal_conf =
eal_get_internal_configuration();
+   remove_all_device();
rte_service_finalize();
rte_mp_channel_cleanup();
/* after this point, any DPDK pointers will become dangling */
diff --git a/lib/eal/include/rte_dev.h b/lib/eal/include/rte_dev.h
index e6ff1218f9..382d548ea3 100644
--- a/lib/eal/include/rte_dev.h
+++ b/lib/eal/include/rte_dev.h
@@ -492,6 +492,12 @@ int
 rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
  size_t len);
 
+#define RTE_DEV_FOREACH_SAFE(dev, devstr, it, tdev) \
+   for (rte_dev_iterator_init(it, devstr), \
+   (dev) = rte_dev_iterator_next(it); \
+   (dev) && ((tdev) = rte_dev_iterator_next(it), 1); \
+   (dev) = (tdev))
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 1ef263434a..30b295916e 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1248,6 +1248,38 @@ mark_freeable(const struct rte_memseg_list *msl, const 
struct rte_memseg *ms,
return 0;
 }
 
+static int
+bus_match_all(const struct rte_bus *bus, const void *data)
+{
+   RTE_SET_USED(bus);
+   RTE_SET_USED(data);
+   return 0;
+}
+
+static void
+remove_all_device(void)
+{
+   struct rte_bus *start = NULL, *next;
+   struct rte_dev_iterator dev_iter = {0};
+   struct rte_device *dev = NULL;
+   struct rte_device *tdev = NULL;
+   char devstr[128];
+
+   RTE_DEV_FOREACH_SAFE(dev, "bus=vdev", &dev_iter, tdev) {
+   (void)rte_dev_remove(dev);
+   }
+   while ((next = rte_bus_find(start, bus_match_all, NULL)) != NULL) {
+   start = next;
+   /* Skip buses that don't have iterate method */
+   if (!next->dev_iterate || !next->name)
+   continue;
+   snprintf(devstr, sizeof(devstr), "bus=%s", next->name);
+   RTE_DEV_FOREACH_SAFE(dev, devstr, &dev_iter, tdev) {
+   (void)rte_dev_remove(dev);
+   }
+   };
+}
+
 int
 rte_eal_cleanup(void)
 {
@@ -1257,6 +1289,7 @@ rte_eal_cleanup(void)
struct internal_config *internal_conf =
eal_get_internal_configuration();
 
+   remove_all_device();
if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
internal_conf->hugepage_file.unlink_existing)
rte_memseg_walk(mark_freeable, NULL);
diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
index 122de2a319..3d7d411293 100644
--- a/lib/eal/windows/eal.c
+++ b/lib/eal/windows/eal.c
@@ -254,12 +254,45 @@ __rte_trace_point_register(rte_trace_point_t *trace, 
const char *name,
return -ENOTSUP;
 }
 
+static int
+bus_match_all(const struct rte_bus *bus, const void *data)
+{
+   RTE_SET_USED(bus);
+   RTE_SET_USED(data);
+   return 0;
+}
+
+static void
+remove_all_device(void)
+{
+   struct rte_bus *start = NULL, *next;
+   struct rte_dev_iterator dev_iter = {0};
+   struct rte_device *dev = NULL;
+   struct rte_device *tdev = NULL;
+   char devstr[128];
+
+   RTE_DEV_FOREACH_SAFE(dev, "b

[PATCH 02/16] examples/vdpa: fix vDPA device remove

2022-06-06 Thread Li Zhang
From: Yajun Wu 

Call rte_dev_remove on vDPA example application exit. Otherwise
rte_dev_remove never gets called.

Fixes: edbed86d1cc ("examples/vdpa: introduce a new sample for vDPA")
Cc: sta...@dpdk.org

Signed-off-by: Yajun Wu 
---
 examples/vdpa/main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/examples/vdpa/main.c b/examples/vdpa/main.c
index 7e11ef4e26..534f1e9715 100644
--- a/examples/vdpa/main.c
+++ b/examples/vdpa/main.c
@@ -632,6 +632,10 @@ main(int argc, char *argv[])
vdpa_sample_quit();
}
 
+   RTE_DEV_FOREACH(dev, "class=vdpa", &dev_iter) {
+   rte_dev_remove(dev);
+   }
+
/* clean up the EAL */
rte_eal_cleanup();
 
-- 
2.31.1



[PATCH v1 03/17] examples/vdpa: fix devices cleanup

2022-06-06 Thread Li Zhang
From: Yajun Wu 

Move rte_eal_cleanup into vdpa_sample_quit, which handles all example
application exits.
Otherwise rte_eal_cleanup won't be called when a signal
like SIGINT (Ctrl+C) is received.

Fixes: 10aa3757 ("examples: add eal cleanup to examples")
Cc: sta...@dpdk.org

Signed-off-by: Yajun Wu 
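
For context, a minimal sketch of how the sample can reach the new cleanup
path on a signal (the handler name and registration below are illustrative
assumptions, not part of this patch):

	#include <signal.h>
	#include <stdio.h>
	#include <stdlib.h>

	/* Hypothetical handler; the sample wires its own handler to SIGINT. */
	static void
	sigint_handler(int signum)
	{
		printf("\nSignal %d received, cleaning up...\n", signum);
		vdpa_sample_quit(); /* closes vDPA ports and now also calls rte_eal_cleanup() */
		exit(0);
	}

	/* In main(), after rte_eal_init(): signal(SIGINT, sigint_handler); */
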
---
 examples/vdpa/main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/examples/vdpa/main.c b/examples/vdpa/main.c
index 7e11ef4e26..62e32b633d 100644
--- a/examples/vdpa/main.c
+++ b/examples/vdpa/main.c
@@ -286,6 +286,8 @@ vdpa_sample_quit(void)
if (vports[i].ifname[0] != '\0')
close_vdpa(&vports[i]);
}
+   /* clean up the EAL */
+   rte_eal_cleanup();
 }
 
 static void
@@ -632,8 +634,5 @@ main(int argc, char *argv[])
vdpa_sample_quit();
}
 
-   /* clean up the EAL */
-   rte_eal_cleanup();
-
return 0;
 }
-- 
2.31.1



[PATCH 04/16] common/mlx5: add DevX API to move QP to reset state

2022-06-06 Thread Li Zhang
From: Yajun Wu 

Support setting the QP to the RESET state.

Signed-off-by: Yajun Wu 
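
As an illustration only, recycling an event QP with the new opcode could
look like the sketch below; it assumes the existing state-walk helper of
the vDPA driver (mlx5_vdpa_qps2rts) and passes the peer QP number as the
third argument, mirroring the current RST2INIT call:

	/* Sketch: move both QPs back to RESET, then bring them up again. */
	static int
	event_qp_recycle(struct mlx5_vdpa_event_qp *eqp)
	{
		if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
						  eqp->sw_qp.qp->id))
			return -1;
		if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp, MLX5_CMD_OP_QP_2RST,
						  eqp->fw_qp->id))
			return -1;
		/* Re-run the usual RST2INIT -> INIT2RTR -> RTR2RTS sequence. */
		return mlx5_vdpa_qps2rts(eqp);
	}
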
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  7 +++
 drivers/common/mlx5/mlx5_prm.h   | 17 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c 
b/drivers/common/mlx5/mlx5_devx_cmds.c
index c6bdbc12bb..1d6d6578d6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2264,11 +2264,13 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, 
uint32_t qp_st_mod_op,
uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+   uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_in)];
} in;
union {
uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+   uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_out)];
} out;
void *qpc;
int ret;
@@ -2311,6 +2313,11 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, 
uint32_t qp_st_mod_op,
inlen = sizeof(in.rtr2rts);
outlen = sizeof(out.rtr2rts);
break;
+   case MLX5_CMD_OP_QP_2RST:
+   MLX5_SET(2rst_qp_in, &in, qpn, qp->id);
+   inlen = sizeof(in.qp2rst);
+   outlen = sizeof(out.qp2rst);
+   break;
default:
DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
qp_st_mod_op);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index bc3e70a1d1..8a2f55c33e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3657,6 +3657,23 @@ struct mlx5_ifc_init2init_qp_in_bits {
u8 reserved_at_800[0x80];
 };
 
+struct mlx5_ifc_2rst_qp_out_bits {
+   u8 status[0x8];
+   u8 reserved_at_8[0x18];
+   u8 syndrome[0x20];
+   u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_2rst_qp_in_bits {
+   u8 opcode[0x10];
+   u8 uid[0x10];
+   u8 vhca_tunnel_id[0x10];
+   u8 op_mod[0x10];
+   u8 reserved_at_80[0x8];
+   u8 qpn[0x18];
+   u8 reserved_at_a0[0x20];
+};
+
 struct mlx5_ifc_dealloc_pd_out_bits {
u8 status[0x8];
u8 reserved_0[0x18];
-- 
2.31.1



[PATCH v1 04/17] vdpa/mlx5: support pre create virtq resource

2022-06-06 Thread Li Zhang
From: Yajun Wu 

The motivation of this change is to reduce vDPA device queue creation
time by creating some queue resources in the vDPA device probe stage.

In a VM live migration scenario, this can save 0.8 ms for each queue
creation, thus reducing LM network downtime.

To create queue resources (umem/counter) in advance, we need to know
the virtio queue depth and the maximum number of queues the VM will use.

Introduce two new devargs: queues (max queue pair number) and queue_size
(queue depth). Both arguments must be provided; if only one of them is
provided, it is ignored and no pre-creation is done.

The queues and queue_size must also be identical to the vhost
configuration the driver later receives. Otherwise either the pre-created
resources are wasted or missing, or the resources need to be destroyed
and recreated (in case of a queue_size mismatch).

Pre-created umem/counter resources are kept alive until vDPA device removal.

Signed-off-by: Yajun Wu 
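
For illustration, with a hypothetical device address, the pre-creation could
be requested at probe time with devargs such as
"0000:08:00.2,class=vdpa,queues=8,queue_size=256", where the queue pair
count and depth match what the vhost front-end will later negotiate.
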
---
 doc/guides/vdpadevs/mlx5.rst  | 14 +++
 drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++-
 drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
 3 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 3ded142311..0ad77bf535 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -101,6 +101,20 @@ for an additional list of options shared with other mlx5 
drivers.
 
   - 0, HW default.
 
+- ``queue_size`` parameter [int]
+
+  - 1 - 1024, Virio Queue depth for pre-creating queue resource to speed up
+first time queue creation. Set it together with queues devarg.
+
+  - 0, default value, no pre-create virtq resource.
+
+- ``queues`` parameter [int]
+
+  - 1 - 128, Max number of virio queue pair(including 1 rx queue and 1 tx 
queue)
+for pre-create queue resource to speed up first time queue creation. Set it
+together with queue_size devarg.
+
+  - 0, default value, no pre-create virtq resource.
 
 Error handling
 ^^
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee71339b78..faf833ee2f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -244,7 +244,9 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 static void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
-   mlx5_vdpa_virtqs_cleanup(priv);
+   /* Clean pre-created resource in dev removal only. */
+   if (!priv->queues)
+   mlx5_vdpa_virtqs_cleanup(priv);
mlx5_vdpa_mem_dereg(priv);
 }
 
@@ -494,6 +496,12 @@ mlx5_vdpa_args_check_handler(const char *key, const char 
*val, void *opaque)
priv->hw_max_latency_us = (uint32_t)tmp;
} else if (strcmp(key, "hw_max_pending_comp") == 0) {
priv->hw_max_pending_comp = (uint32_t)tmp;
+   } else if (strcmp(key, "queue_size") == 0) {
+   priv->queue_size = (uint16_t)tmp;
+   } else if (strcmp(key, "queues") == 0) {
+   priv->queues = (uint16_t)tmp;
+   } else {
+   DRV_LOG(WARNING, "Invalid key %s.", key);
}
return 0;
 }
@@ -524,9 +532,68 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
if (!priv->event_us &&
priv->event_mode == MLX5_VDPA_EVENT_MODE_DYNAMIC_TIMER)
priv->event_us = MLX5_VDPA_DEFAULT_TIMER_STEP_US;
+   if ((priv->queue_size && !priv->queues) ||
+   (!priv->queue_size && priv->queues)) {
+   priv->queue_size = 0;
+   priv->queues = 0;
+   DRV_LOG(WARNING, "Please provide both queue_size and queues.");
+   }
DRV_LOG(DEBUG, "event mode is %d.", priv->event_mode);
DRV_LOG(DEBUG, "event_us is %u us.", priv->event_us);
DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
+   DRV_LOG(DEBUG, "queues is %u, queue_size is %u.", priv->queues,
+   priv->queue_size);
+}
+
+static int
+mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
+{
+   uint32_t index;
+   uint32_t i;
+
+   if (!priv->queues)
+   return 0;
+   for (index = 0; index < (priv->queues * 2); ++index) {
+   struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+
+   if (priv->caps.queue_counters_valid) {
+   if (!virtq->counters)
+   virtq->counters =
+   mlx5_devx_cmd_create_virtio_q_counters
+   (priv->cdev->ctx);
+   if (!virtq->counters) {
+   DRV_LOG(ERR, "Failed to create virtq couners 
for virtq"
+   " %d.", index);
+   return -1;
+   }
+   }
+   for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+   uint32_t size;
+   void *buf;
+   struct mlx5dv_devx_umem *obj;
+
+

[PATCH 03/16] vdpa/mlx5: support pre create virtq resource

2022-06-06 Thread Li Zhang
From: Yajun Wu 

The motivation of this change is to reduce vDPA device queue creation
time by creating some queue resources in the vDPA device probe stage.

In a VM live migration scenario, this can save 0.8 ms for each queue
creation, thus reducing LM network downtime.

To create queue resources (umem/counter) in advance, we need to know
the virtio queue depth and the maximum number of queues the VM will use.

Introduce two new devargs: queues (max queue pair number) and queue_size
(queue depth). Both arguments must be provided; if only one of them is
provided, it is ignored and no pre-creation is done.

The queues and queue_size must also be identical to the vhost
configuration the driver later receives. Otherwise either the pre-created
resources are wasted or missing, or the resources need to be destroyed
and recreated (in case of a queue_size mismatch).

Pre-created umem/counter resources are kept alive until vDPA device removal.

Signed-off-by: Yajun Wu 
---
 doc/guides/vdpadevs/mlx5.rst  | 14 +++
 drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++-
 drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
 3 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 3ded142311..0ad77bf535 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -101,6 +101,20 @@ for an additional list of options shared with other mlx5 
drivers.
 
   - 0, HW default.
 
+- ``queue_size`` parameter [int]
+
+  - 1 - 1024, Virio Queue depth for pre-creating queue resource to speed up
+first time queue creation. Set it together with queues devarg.
+
+  - 0, default value, no pre-create virtq resource.
+
+- ``queues`` parameter [int]
+
+  - 1 - 128, Max number of virio queue pair(including 1 rx queue and 1 tx 
queue)
+for pre-create queue resource to speed up first time queue creation. Set it
+together with queue_size devarg.
+
+  - 0, default value, no pre-create virtq resource.
 
 Error handling
 ^^
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee71339b78..faf833ee2f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -244,7 +244,9 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 static void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
-   mlx5_vdpa_virtqs_cleanup(priv);
+   /* Clean pre-created resource in dev removal only. */
+   if (!priv->queues)
+   mlx5_vdpa_virtqs_cleanup(priv);
mlx5_vdpa_mem_dereg(priv);
 }
 
@@ -494,6 +496,12 @@ mlx5_vdpa_args_check_handler(const char *key, const char 
*val, void *opaque)
priv->hw_max_latency_us = (uint32_t)tmp;
} else if (strcmp(key, "hw_max_pending_comp") == 0) {
priv->hw_max_pending_comp = (uint32_t)tmp;
+   } else if (strcmp(key, "queue_size") == 0) {
+   priv->queue_size = (uint16_t)tmp;
+   } else if (strcmp(key, "queues") == 0) {
+   priv->queues = (uint16_t)tmp;
+   } else {
+   DRV_LOG(WARNING, "Invalid key %s.", key);
}
return 0;
 }
@@ -524,9 +532,68 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
if (!priv->event_us &&
priv->event_mode == MLX5_VDPA_EVENT_MODE_DYNAMIC_TIMER)
priv->event_us = MLX5_VDPA_DEFAULT_TIMER_STEP_US;
+   if ((priv->queue_size && !priv->queues) ||
+   (!priv->queue_size && priv->queues)) {
+   priv->queue_size = 0;
+   priv->queues = 0;
+   DRV_LOG(WARNING, "Please provide both queue_size and queues.");
+   }
DRV_LOG(DEBUG, "event mode is %d.", priv->event_mode);
DRV_LOG(DEBUG, "event_us is %u us.", priv->event_us);
DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
+   DRV_LOG(DEBUG, "queues is %u, queue_size is %u.", priv->queues,
+   priv->queue_size);
+}
+
+static int
+mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
+{
+   uint32_t index;
+   uint32_t i;
+
+   if (!priv->queues)
+   return 0;
+   for (index = 0; index < (priv->queues * 2); ++index) {
+   struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+
+   if (priv->caps.queue_counters_valid) {
+   if (!virtq->counters)
+   virtq->counters =
+   mlx5_devx_cmd_create_virtio_q_counters
+   (priv->cdev->ctx);
+   if (!virtq->counters) {
+   DRV_LOG(ERR, "Failed to create virtq couners 
for virtq"
+   " %d.", index);
+   return -1;
+   }
+   }
+   for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+   uint32_t size;
+   void *buf;
+   struct mlx5dv_devx_umem *obj;
+
+

[PATCH v1 05/17] common/mlx5: add DevX API to move QP to reset state

2022-06-06 Thread Li Zhang
From: Yajun Wu 

Support setting the QP to the RESET state.

Signed-off-by: Yajun Wu 
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  7 +++
 drivers/common/mlx5/mlx5_prm.h   | 17 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c 
b/drivers/common/mlx5/mlx5_devx_cmds.c
index c6bdbc12bb..1d6d6578d6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2264,11 +2264,13 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, 
uint32_t qp_st_mod_op,
uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+   uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_in)];
} in;
union {
uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+   uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_out)];
} out;
void *qpc;
int ret;
@@ -2311,6 +2313,11 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, 
uint32_t qp_st_mod_op,
inlen = sizeof(in.rtr2rts);
outlen = sizeof(out.rtr2rts);
break;
+   case MLX5_CMD_OP_QP_2RST:
+   MLX5_SET(2rst_qp_in, &in, qpn, qp->id);
+   inlen = sizeof(in.qp2rst);
+   outlen = sizeof(out.qp2rst);
+   break;
default:
DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
qp_st_mod_op);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index bc3e70a1d1..8a2f55c33e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3657,6 +3657,23 @@ struct mlx5_ifc_init2init_qp_in_bits {
u8 reserved_at_800[0x80];
 };
 
+struct mlx5_ifc_2rst_qp_out_bits {
+   u8 status[0x8];
+   u8 reserved_at_8[0x18];
+   u8 syndrome[0x20];
+   u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_2rst_qp_in_bits {
+   u8 opcode[0x10];
+   u8 uid[0x10];
+   u8 vhca_tunnel_id[0x10];
+   u8 op_mod[0x10];
+   u8 reserved_at_80[0x8];
+   u8 qpn[0x18];
+   u8 reserved_at_a0[0x20];
+};
+
 struct mlx5_ifc_dealloc_pd_out_bits {
u8 status[0x8];
u8 reserved_0[0x18];
-- 
2.31.1



[PATCH 05/16] vdpa/mlx5: support event qp reuse

2022-06-06 Thread Li Zhang
From: Yajun Wu 

To speed up queue creation time, the event QP and CQ are created only
once. Each virtq creation then reuses the same event QP and CQ.

Because FW sets the event QP to the error state during virtq destroy,
the event QP needs to be modified to the RESET state and then moved to
the RTS state as usual. This saves about 1.5 ms for each virtq creation.

After a SW QP reset, the QP PI/CI both become 0 while the CQ PI/CI keep
their previous values. Add a new variable qp_pi to save the SW QP PI and
move the QP PI independently of the CQ CI.

Add a new function mlx5_vdpa_drain_cq to drain the CQ CQEs after virtq
release.

Signed-off-by: Yajun Wu 
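
A rough sketch of the intended reuse cycle, using the names introduced in
this patch (simplified; not the literal driver flow):

	/* Sketch: close then reconfigure while keeping the same event QP/CQ. */
	static int
	event_qp_reuse_example(struct mlx5_vdpa_priv *priv,
			       struct mlx5_vdpa_virtq *virtq,
			       uint16_t vq_size, int callfd)
	{
		mlx5_vdpa_virtqs_release(priv);	/* FW moves the event QPs to error state */
		mlx5_vdpa_drain_cq(priv);	/* flush stale CQEs, reset qp_pi, re-arm the CQ */
		/* Same QP/CQ objects are reused; only the QP states are cycled
		 * back through RESET -> INIT -> RTR -> RTS inside the prepare call.
		 */
		return mlx5_vdpa_event_qp_prepare(priv, vq_size, callfd, &virtq->eqp);
	}
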
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   |  8 
 drivers/vdpa/mlx5/mlx5_vdpa.h   | 12 +-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +--
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index faf833ee2f..ee99952e11 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -269,6 +269,7 @@ mlx5_vdpa_dev_close(int vid)
}
mlx5_vdpa_steer_unset(priv);
mlx5_vdpa_virtqs_release(priv);
+   mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
priv->state = MLX5_VDPA_STATE_PROBED;
@@ -555,7 +556,14 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv 
*priv)
return 0;
for (index = 0; index < (priv->queues * 2); ++index) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+   int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
+   -1, &virtq->eqp);
 
+   if (ret) {
+   DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+   index);
+   return -1;
+   }
if (priv->caps.queue_counters_valid) {
if (!virtq->counters)
virtq->counters =
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f6719a3c60..bf82026e37 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -55,6 +55,7 @@ struct mlx5_vdpa_event_qp {
struct mlx5_vdpa_cq cq;
struct mlx5_devx_obj *fw_qp;
struct mlx5_devx_qp sw_qp;
+   uint16_t qp_pi;
 };
 
 struct mlx5_vdpa_query_mr {
@@ -226,7 +227,7 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  int callfd, struct mlx5_vdpa_event_qp *eqp);
 
 /**
@@ -479,4 +480,13 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int 
qid,
  */
 int
 mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
+
+/**
+ * Drain virtq CQ CQE.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 7167a98db0..b43dca9255 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -137,7 +137,7 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
};
uint32_t word;
} last_word;
-   uint16_t next_wqe_counter = cq->cq_ci;
+   uint16_t next_wqe_counter = eqp->qp_pi;
uint16_t cur_wqe_counter;
uint16_t comp;
 
@@ -156,9 +156,10 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
rte_io_wmb();
/* Ring CQ doorbell record. */
cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+   eqp->qp_pi += comp;
rte_io_wmb();
/* Ring SW QP doorbell record. */
-   eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size);
+   eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size);
}
return comp;
 }
@@ -232,6 +233,25 @@ mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
return max;
 }
 
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
+{
+   unsigned int i;
+
+   for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+   struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
+
+   mlx5_vdpa_queue_complete(cq);
+   if (cq->cq_obj.cq) {
+   cq->cq_obj.cqes[0].wqe_counter =
+   rte_cpu_to_be_16(UINT16_MAX);
+   priv->virtqs[i].eqp.qp_pi = 0;
+   if (!cq->armed)
+   mlx5_vdpa_cq_arm(priv, cq);
+   }
+   }
+}
+
 /* Wait on all CQs channel for completion event. */
 static st

[PATCH 06/16] common/mlx5: extend virtq modifiable fields

2022-06-06 Thread Li Zhang
A virtq configuration can be modified after the virtq creation.
Added the following modifiable fields:
1. address fields: desc_addr/used_addr/available_addr
2. hw_available_index
3. hw_used_index
4. virtio_q_type
5. version type
6. queue mkey
7. feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8. event mode: event_mode/event_qpn_or_msix

Signed-off-by: Li Zhang 
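
As an illustration, several fields can now be combined in a single modify
command (sketch with placeholder parameter names; the real callers are the
vDPA virtq patches that follow):

	/* Sketch: update ring addresses and indexes of one virtq in one command. */
	static int
	virtq_update_example(struct mlx5_devx_obj *virtq_obj, uint16_t vq_index,
			     uint64_t desc_iova, uint64_t used_iova,
			     uint64_t avail_iova,
			     uint16_t last_avail_idx, uint16_t last_used_idx)
	{
		struct mlx5_devx_virtq_attr attr = {
			.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_ADDR |
					     MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
					     MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX,
			.queue_index = vq_index,
			.desc_addr = desc_iova,
			.used_addr = used_iova,
			.available_addr = avail_iova,
			.hw_available_index = last_avail_idx,
			.hw_used_index = last_used_idx,
		};

		return mlx5_devx_cmd_modify_virtq(virtq_obj, &attr);
	}
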
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++-
 drivers/common/mlx5/mlx5_devx_cmds.h |  6 ++-
 drivers/common/mlx5/mlx5_prm.h   | 13 +-
 3 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c 
b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1d6d6578d6..1b68c37092 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -545,6 +545,15 @@ mlx5_devx_cmd_query_hca_vdpa_attr(void *ctx,
vdpa_attr->log_doorbell_stride =
MLX5_GET(virtio_emulation_cap, hcattr,
 log_doorbell_stride);
+   vdpa_attr->vnet_modify_ext =
+   MLX5_GET(virtio_emulation_cap, hcattr,
+vnet_modify_ext);
+   vdpa_attr->virtio_net_q_addr_modify =
+   MLX5_GET(virtio_emulation_cap, hcattr,
+virtio_net_q_addr_modify);
+   vdpa_attr->virtio_q_index_modify =
+   MLX5_GET(virtio_emulation_cap, hcattr,
+virtio_q_index_modify);
vdpa_attr->log_doorbell_bar_size =
MLX5_GET(virtio_emulation_cap, hcattr,
 log_doorbell_bar_size);
@@ -2074,27 +2083,66 @@ mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj 
*virtq_obj,
MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
-   MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+   MLX5_SET64(virtio_net_q, virtq, modify_field_select,
+   attr->mod_fields_bitmap);
MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
-   switch (attr->type) {
-   case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+   if (!attr->mod_fields_bitmap) {
+   DRV_LOG(ERR, "Failed to modify VIRTQ for no type set.");
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_STATE)
MLX5_SET16(virtio_net_q, virtq, state, attr->state);
-   break;
-   case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+   if (attr->mod_fields_bitmap &
+   MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS) {
MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
 attr->dirty_bitmap_mkey);
MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
 attr->dirty_bitmap_addr);
MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
 attr->dirty_bitmap_size);
-   break;
-   case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+   }
+   if (attr->mod_fields_bitmap &
+   MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE)
MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
 attr->dirty_bitmap_dump_enable);
-   break;
-   default:
-   rte_errno = EINVAL;
-   return -rte_errno;
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD) {
+   MLX5_SET(virtio_q, virtctx, queue_period_mode,
+   attr->hw_latency_mode);
+   MLX5_SET(virtio_q, virtctx, queue_period_us,
+   attr->hw_max_latency_us);
+   MLX5_SET(virtio_q, virtctx, queue_max_count,
+   attr->hw_max_pending_comp);
+   }
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_ADDR) {
+   MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+   MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+   MLX5_SET64(virtio_q, virtctx, available_addr,
+   attr->available_addr);
+   }
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX)
+   MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+  attr->hw_available_index);
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX)
+   MLX5_SET16(virtio_net_q, virtq, hw_used_index,
+   attr->hw_used_index);
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE)
+   MLX5_SET16(virtio_q, virtctx, virtio_q_type, attr->q_type);
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0)
+   MLX5_SET16(virtio_q, virtctx, virtio_version

[PATCH v1 06/17] vdpa/mlx5: support event qp reuse

2022-06-06 Thread Li Zhang
From: Yajun Wu 

To speed up queue creation time, the event QP and CQ are created only
once. Each virtq creation then reuses the same event QP and CQ.

Because FW sets the event QP to the error state during virtq destroy,
the event QP needs to be modified to the RESET state and then moved to
the RTS state as usual. This saves about 1.5 ms for each virtq creation.

After a SW QP reset, the QP PI/CI both become 0 while the CQ PI/CI keep
their previous values. Add a new variable qp_pi to save the SW QP PI and
move the QP PI independently of the CQ CI.

Add a new function mlx5_vdpa_drain_cq to drain the CQ CQEs after virtq
release.

Signed-off-by: Yajun Wu 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   |  8 
 drivers/vdpa/mlx5/mlx5_vdpa.h   | 12 +-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +--
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index faf833ee2f..ee99952e11 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -269,6 +269,7 @@ mlx5_vdpa_dev_close(int vid)
}
mlx5_vdpa_steer_unset(priv);
mlx5_vdpa_virtqs_release(priv);
+   mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
priv->state = MLX5_VDPA_STATE_PROBED;
@@ -555,7 +556,14 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv 
*priv)
return 0;
for (index = 0; index < (priv->queues * 2); ++index) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+   int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
+   -1, &virtq->eqp);
 
+   if (ret) {
+   DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+   index);
+   return -1;
+   }
if (priv->caps.queue_counters_valid) {
if (!virtq->counters)
virtq->counters =
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f6719a3c60..bf82026e37 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -55,6 +55,7 @@ struct mlx5_vdpa_event_qp {
struct mlx5_vdpa_cq cq;
struct mlx5_devx_obj *fw_qp;
struct mlx5_devx_qp sw_qp;
+   uint16_t qp_pi;
 };
 
 struct mlx5_vdpa_query_mr {
@@ -226,7 +227,7 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  int callfd, struct mlx5_vdpa_event_qp *eqp);
 
 /**
@@ -479,4 +480,13 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int 
qid,
  */
 int
 mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
+
+/**
+ * Drain virtq CQ CQE.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 7167a98db0..b43dca9255 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -137,7 +137,7 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
};
uint32_t word;
} last_word;
-   uint16_t next_wqe_counter = cq->cq_ci;
+   uint16_t next_wqe_counter = eqp->qp_pi;
uint16_t cur_wqe_counter;
uint16_t comp;
 
@@ -156,9 +156,10 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
rte_io_wmb();
/* Ring CQ doorbell record. */
cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+   eqp->qp_pi += comp;
rte_io_wmb();
/* Ring SW QP doorbell record. */
-   eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size);
+   eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size);
}
return comp;
 }
@@ -232,6 +233,25 @@ mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
return max;
 }
 
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
+{
+   unsigned int i;
+
+   for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+   struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
+
+   mlx5_vdpa_queue_complete(cq);
+   if (cq->cq_obj.cq) {
+   cq->cq_obj.cqes[0].wqe_counter =
+   rte_cpu_to_be_16(UINT16_MAX);
+   priv->virtqs[i].eqp.qp_pi = 0;
+   if (!cq->armed)
+   mlx5_vdpa_cq_arm(priv, cq);
+   }
+   }
+}
+
 /* Wait on all CQs channel for completion event. */
 static st

[PATCH 07/16] vdpa/mlx5: pre-create virtq in the prob

2022-06-06 Thread Li Zhang
The dev_config operation is called during the LM process.
LM time is very critical because all the VM packets are dropped
directly at that time.

Move the virtq creation to probe time and only modify the
configuration later, in the dev_config stage, using the new
ability to modify the virtq.

This optimization accelerates the LM process and reduces its
time by 70%.

Signed-off-by: Li Zhang 
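
Conceptually, the split introduced here is: create the virtq DevX objects
ahead of time and, in the LM-critical dev_config stage, only issue modify
commands. A minimal sketch of the dev_config side, assuming the virtq object
already exists from the probe stage and the attr contents are filled by the
caller:

	/* Sketch: dev_config only patches the existing object instead of
	 * destroying and recreating it.
	 */
	static int
	virtq_reconfigure_example(struct mlx5_vdpa_virtq *virtq,
				  struct mlx5_devx_virtq_attr *attr)
	{
		if (mlx5_devx_cmd_modify_virtq(virtq->virtq, attr))
			return -1;
		virtq->configured = 1;	/* virtq is now usable for LM operations */
		return 0;
	}
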
---
 drivers/vdpa/mlx5/mlx5_vdpa.h   |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c|  13 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 257 +---
 3 files changed, 170 insertions(+), 104 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index bf82026e37..e5553079fe 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq {
uint16_t vq_size;
uint8_t notifier_state;
bool stopped;
+   uint32_t configured:1;
uint32_t version;
struct mlx5_vdpa_priv *priv;
struct mlx5_devx_obj *virtq;
@@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, 
int qid);
  */
 void
 mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index 43a2b98255..a8faf0c116 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -12,14 +12,17 @@ int
 mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 {
struct mlx5_devx_virtq_attr attr = {
-   .type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+   .mod_fields_bitmap =
+   MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
.dirty_bitmap_dump_enable = enable,
};
+   struct mlx5_vdpa_virtq *virtq;
int i;
 
for (i = 0; i < priv->nr_virtqs; ++i) {
attr.queue_index = i;
-   if (!priv->virtqs[i].virtq) {
+   virtq = &priv->virtqs[i];
+   if (!virtq->configured) {
DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
"enabling.", i);
} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
@@ -37,10 +40,11 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, 
uint64_t log_base,
   uint64_t log_size)
 {
struct mlx5_devx_virtq_attr attr = {
-   .type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+   .mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
.dirty_bitmap_addr = log_base,
.dirty_bitmap_size = log_size,
};
+   struct mlx5_vdpa_virtq *virtq;
int i;
int ret = mlx5_os_wrapped_mkey_create(priv->cdev->ctx, priv->cdev->pd,
  priv->cdev->pdn,
@@ -54,7 +58,8 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, 
uint64_t log_base,
attr.dirty_bitmap_mkey = priv->lm_mr.lkey;
for (i = 0; i < priv->nr_virtqs; ++i) {
attr.queue_index = i;
-   if (!priv->virtqs[i].virtq) {
+   virtq = &priv->virtqs[i];
+   if (!virtq->configured) {
DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
  &attr)) {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6637ba1503..55cbc9fad2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -75,6 +75,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+   virtq->configured = 0;
for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
if (virtq->umems[j].obj) {
claim_zero(mlx5_glue->devx_umem_dereg
@@ -111,11 +112,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
rte_intr_fd_set(virtq->intr_handle, -1);
}
rte_intr_instance_free(virtq->intr_handle);
-   if (virtq->virtq) {
+   if (virtq->configured) {
ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
if (ret)
DRV_LOG(WARNING, "Failed to stop virtq %d.",
virtq->index);
+   virtq->configured = 0;
claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
}
virtq->virtq = NULL;
@@ -138,7 +140,7 @@ int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
struct mlx5_devx_virtq_attr attr = {
-   .type = MLX5_VIRTQ_MODIF

[PATCH v1 07/17] common/mlx5: extend virtq modifiable fields

2022-06-06 Thread Li Zhang
A virtq configuration can be modified after the virtq creation.
Added the following modifiable fields:
1. address fields: desc_addr/used_addr/available_addr
2. hw_available_index
3. hw_used_index
4. virtio_q_type
5. version type
6. queue mkey
7. feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8. event mode: event_mode/event_qpn_or_msix

Signed-off-by: Li Zhang 
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++-
 drivers/common/mlx5/mlx5_devx_cmds.h |  6 ++-
 drivers/common/mlx5/mlx5_prm.h   | 13 +-
 3 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c 
b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1d6d6578d6..1b68c37092 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -545,6 +545,15 @@ mlx5_devx_cmd_query_hca_vdpa_attr(void *ctx,
vdpa_attr->log_doorbell_stride =
MLX5_GET(virtio_emulation_cap, hcattr,
 log_doorbell_stride);
+   vdpa_attr->vnet_modify_ext =
+   MLX5_GET(virtio_emulation_cap, hcattr,
+vnet_modify_ext);
+   vdpa_attr->virtio_net_q_addr_modify =
+   MLX5_GET(virtio_emulation_cap, hcattr,
+virtio_net_q_addr_modify);
+   vdpa_attr->virtio_q_index_modify =
+   MLX5_GET(virtio_emulation_cap, hcattr,
+virtio_q_index_modify);
vdpa_attr->log_doorbell_bar_size =
MLX5_GET(virtio_emulation_cap, hcattr,
 log_doorbell_bar_size);
@@ -2074,27 +2083,66 @@ mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj 
*virtq_obj,
MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
-   MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+   MLX5_SET64(virtio_net_q, virtq, modify_field_select,
+   attr->mod_fields_bitmap);
MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
-   switch (attr->type) {
-   case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+   if (!attr->mod_fields_bitmap) {
+   DRV_LOG(ERR, "Failed to modify VIRTQ for no type set.");
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_STATE)
MLX5_SET16(virtio_net_q, virtq, state, attr->state);
-   break;
-   case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+   if (attr->mod_fields_bitmap &
+   MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS) {
MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
 attr->dirty_bitmap_mkey);
MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
 attr->dirty_bitmap_addr);
MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
 attr->dirty_bitmap_size);
-   break;
-   case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+   }
+   if (attr->mod_fields_bitmap &
+   MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE)
MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
 attr->dirty_bitmap_dump_enable);
-   break;
-   default:
-   rte_errno = EINVAL;
-   return -rte_errno;
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD) {
+   MLX5_SET(virtio_q, virtctx, queue_period_mode,
+   attr->hw_latency_mode);
+   MLX5_SET(virtio_q, virtctx, queue_period_us,
+   attr->hw_max_latency_us);
+   MLX5_SET(virtio_q, virtctx, queue_max_count,
+   attr->hw_max_pending_comp);
+   }
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_ADDR) {
+   MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+   MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+   MLX5_SET64(virtio_q, virtctx, available_addr,
+   attr->available_addr);
+   }
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX)
+   MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+  attr->hw_available_index);
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX)
+   MLX5_SET16(virtio_net_q, virtq, hw_used_index,
+   attr->hw_used_index);
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE)
+   MLX5_SET16(virtio_q, virtctx, virtio_q_type, attr->q_type);
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0)
+   MLX5_SET16(virtio_q, virtctx, virtio_version

[PATCH v1 08/17] vdpa/mlx5: pre-create virtq in the prob

2022-06-06 Thread Li Zhang
The dev_config operation is called during the LM process.
LM time is very critical because all the VM packets are dropped
directly at that time.

Move the virtq creation to probe time and only modify the
configuration later, in the dev_config stage, using the new
ability to modify the virtq.

This optimization accelerates the LM process and reduces its
time by 70%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h   |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c|  13 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 257 +---
 3 files changed, 170 insertions(+), 104 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index bf82026e37..e5553079fe 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq {
uint16_t vq_size;
uint8_t notifier_state;
bool stopped;
+   uint32_t configured:1;
uint32_t version;
struct mlx5_vdpa_priv *priv;
struct mlx5_devx_obj *virtq;
@@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, 
int qid);
  */
 void
 mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index 43a2b98255..a8faf0c116 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -12,14 +12,17 @@ int
 mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 {
struct mlx5_devx_virtq_attr attr = {
-   .type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+   .mod_fields_bitmap =
+   MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
.dirty_bitmap_dump_enable = enable,
};
+   struct mlx5_vdpa_virtq *virtq;
int i;
 
for (i = 0; i < priv->nr_virtqs; ++i) {
attr.queue_index = i;
-   if (!priv->virtqs[i].virtq) {
+   virtq = &priv->virtqs[i];
+   if (!virtq->configured) {
DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
"enabling.", i);
} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
@@ -37,10 +40,11 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, 
uint64_t log_base,
   uint64_t log_size)
 {
struct mlx5_devx_virtq_attr attr = {
-   .type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+   .mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
.dirty_bitmap_addr = log_base,
.dirty_bitmap_size = log_size,
};
+   struct mlx5_vdpa_virtq *virtq;
int i;
int ret = mlx5_os_wrapped_mkey_create(priv->cdev->ctx, priv->cdev->pd,
  priv->cdev->pdn,
@@ -54,7 +58,8 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, 
uint64_t log_base,
attr.dirty_bitmap_mkey = priv->lm_mr.lkey;
for (i = 0; i < priv->nr_virtqs; ++i) {
attr.queue_index = i;
-   if (!priv->virtqs[i].virtq) {
+   virtq = &priv->virtqs[i];
+   if (!virtq->configured) {
DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
  &attr)) {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6637ba1503..55cbc9fad2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -75,6 +75,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+   virtq->configured = 0;
for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
if (virtq->umems[j].obj) {
claim_zero(mlx5_glue->devx_umem_dereg
@@ -111,11 +112,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
rte_intr_fd_set(virtq->intr_handle, -1);
}
rte_intr_instance_free(virtq->intr_handle);
-   if (virtq->virtq) {
+   if (virtq->configured) {
ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
if (ret)
DRV_LOG(WARNING, "Failed to stop virtq %d.",
virtq->index);
+   virtq->configured = 0;
claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
}
virtq->virtq = NULL;
@@ -138,7 +140,7 @@ int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
struct mlx5_devx_virtq_attr attr = {
-   .type = MLX5_VIRTQ_MODIF

[PATCH 09/16] vdpa/mlx5: add multi-thread management for configuration

2022-06-06 Thread Li Zhang
The LM process includes a lot of object creations and destructions
on the source and destination servers.
As LM time increases, the VM packet drop increases.
To improve LM time, the configurations sent to the mlx5 FW need to be
parallelized. Add internal multi-thread management in the driver for it.

A new devarg defines the number of threads and their CPU core.
The management is shared between all the devices of the driver.
Since the event_core also affects the datapath event thread,
reduce the priority of the datapath event thread to
allow fast configuration of the devices doing the LM.

Signed-off-by: Li Zhang 
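
For example (with an illustrative device address), probing with
"0000:08:00.2,class=vdpa,event_core=4,max_conf_threads=8" would run the
eight configuration threads on the same core as the event thread (core 4);
the first probed device fixes the thread count for all devices of the
driver.
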
---
 doc/guides/vdpadevs/mlx5.rst  |  11 +++
 drivers/vdpa/mlx5/meson.build |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c |  41 
 drivers/vdpa/mlx5/mlx5_vdpa.h |  36 +++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 +-
 7 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 0ad77bf535..b75a01688d 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -78,6 +78,17 @@ for an additional list of options shared with other mlx5 
drivers.
   CPU core number to set polling thread affinity to, default to control plane
   cpu.
 
+- ``max_conf_threads`` parameter [int]
+
+  Allow the driver to use internal threads to obtain fast configuration.
+  All the threads will be open on the same core of the event completion queue 
scheduling thread.
+
+  - 0, default, don't use internal threads for configuration.
+
+  - 1 - 256, number of internal threads in addition to the caller thread (8 is 
suggested).
+This value, if not 0, should be the same for all the devices;
+the first prob will take it with the event_core for all the multi-thread 
configurations in the driver.
+
 - ``hw_latency_mode`` parameter [int]
 
   The completion queue moderation mode:
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 0fa82ad257..9d8dbb1a82 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
 'mlx5_vdpa_virtq.c',
 'mlx5_vdpa_steer.c',
 'mlx5_vdpa_lm.c',
+'mlx5_vdpa_cthread.c',
 )
 cflags_options = [
 '-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e5a11f72fd..a9d023ed08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -50,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
  TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
 
 static struct mlx5_vdpa_priv *
@@ -493,6 +495,29 @@ mlx5_vdpa_args_check_handler(const char *key, const char 
*val, void *opaque)
DRV_LOG(WARNING, "Invalid event_core %s.", val);
else
priv->event_core = tmp;
+   } else if (strcmp(key, "max_conf_threads") == 0) {
+   if (tmp) {
+   priv->use_c_thread = true;
+   if (!conf_thread_mng.initializer_priv) {
+   conf_thread_mng.initializer_priv = priv;
+   if (tmp > MLX5_VDPA_MAX_C_THRD) {
+   DRV_LOG(WARNING,
+   "Invalid max_conf_threads %s "
+   "and set max_conf_threads to %d",
+   val, MLX5_VDPA_MAX_C_THRD);
+   tmp = MLX5_VDPA_MAX_C_THRD;
+   }
+   conf_thread_mng.max_thrds = tmp;
+   } else if (tmp != conf_thread_mng.max_thrds) {
+   DRV_LOG(WARNING,
+   "max_conf_threads is PMD argument and not per device, "
+   "only the first device configuration set it, current value is %d "
+   "and will not be changed to %d.",
+   conf_thread_mng.max_thrds, (int)tmp);
+   }
+   } else {
+   priv->use_c_thread = false;
+   }
} else if (strcmp(key, "hw_latency_mode") == 0) {
priv->hw_latency_mode = (uint32_t)tmp;
} else if (strcmp(key, "hw_max_latency_us") == 0) {
@@ -521,6 +546,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
"hw_max_latency_us",
"hw_max_pending_comp",
"no_traffic_time",
+   "queue_size",
+   "queues",
+   "max_conf_threads",
NULL,
 

[PATCH 08/16] vdpa/mlx5: optimize datapath-control synchronization

2022-06-06 Thread Li Zhang
The driver used a single global lock for any synchronization
needed for the datapath and control path.
It is better to group the critical sections with
the other ones that should be synchronized.

Replace the global lock with the following locks:

1. Virtq locks (per virtq) synchronize datapath polling and
   parallel configurations on the same virtq.
2. A doorbell lock synchronizes doorbell updates, which is
   shared by all the virtqs in the device.
3. A steering lock for updates of the shared steering objects.

Signed-off-by: Li Zhang 
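
The intended lock usage, roughly (fragments only; names follow this patch):

	/* Per-virtq configuration vs. datapath polling. */
	pthread_mutex_lock(&virtq->virtq_lock);
	ret = mlx5_vdpa_virtq_enable(priv, vring, state);
	pthread_mutex_unlock(&virtq->virtq_lock);

	/* Doorbell writes shared by all virtqs of the device. */
	rte_spinlock_lock(&priv->db_lock);
	/* ... ring the doorbell ... */
	rte_spinlock_unlock(&priv->db_lock);

	/* Shared steering objects. */
	pthread_mutex_lock(&priv->steer_update_lock);
	mlx5_vdpa_steer_unset(priv);
	pthread_mutex_unlock(&priv->steer_update_lock);
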
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   | 24 ---
 drivers/vdpa/mlx5/mlx5_vdpa.h   | 13 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++---
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c| 34 +++---
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 88 +++---
 6 files changed, 184 insertions(+), 79 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee99952e11..e5a11f72fd 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -135,6 +135,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
struct mlx5_vdpa_priv *priv =
mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+   struct mlx5_vdpa_virtq *virtq;
int ret;
 
if (priv == NULL) {
@@ -145,9 +146,10 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
DRV_LOG(ERR, "Too big vring id: %d.", vring);
return -E2BIG;
}
-   pthread_mutex_lock(&priv->vq_config_lock);
+   virtq = &priv->virtqs[vring];
+   pthread_mutex_lock(&virtq->virtq_lock);
ret = mlx5_vdpa_virtq_enable(priv, vring, state);
-   pthread_mutex_unlock(&priv->vq_config_lock);
+   pthread_mutex_unlock(&virtq->virtq_lock);
return ret;
 }
 
@@ -267,7 +269,9 @@ mlx5_vdpa_dev_close(int vid)
ret |= mlx5_vdpa_lm_log(priv);
priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
}
+   pthread_mutex_lock(&priv->steer_update_lock);
mlx5_vdpa_steer_unset(priv);
+   pthread_mutex_unlock(&priv->steer_update_lock);
mlx5_vdpa_virtqs_release(priv);
mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
@@ -276,8 +280,6 @@ mlx5_vdpa_dev_close(int vid)
if (!priv->connected)
mlx5_vdpa_dev_cache_clean(priv);
priv->vid = 0;
-   /* The mutex may stay locked after event thread cancel - initiate it. */
-   pthread_mutex_init(&priv->vq_config_lock, NULL);
DRV_LOG(INFO, "vDPA device %d was closed.", vid);
return ret;
 }
@@ -549,15 +551,21 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
+   struct mlx5_vdpa_virtq *virtq;
uint32_t index;
uint32_t i;
 
+   for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+   index++) {
+   virtq = &priv->virtqs[index];
+   pthread_mutex_init(&virtq->virtq_lock, NULL);
+   }
if (!priv->queues)
return 0;
for (index = 0; index < (priv->queues * 2); ++index) {
-   struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+   virtq = &priv->virtqs[index];
int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-   -1, &virtq->eqp);
+   -1, virtq);
 
if (ret) {
DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
@@ -713,7 +721,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
priv->num_lag_ports = attr->num_lag_ports;
if (attr->num_lag_ports == 0)
priv->num_lag_ports = 1;
-   pthread_mutex_init(&priv->vq_config_lock, NULL);
+   rte_spinlock_init(&priv->db_lock);
+   pthread_mutex_init(&priv->steer_update_lock, NULL);
priv->cdev = cdev;
mlx5_vdpa_config_get(mkvlist, priv);
if (mlx5_vdpa_create_dev_resources(priv))
@@ -797,7 +806,6 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
mlx5_vdpa_release_dev_resources(priv);
if (priv->vdev)
rte_vdpa_unregister_device(priv->vdev);
-   pthread_mutex_destroy(&priv->vq_config_lock);
rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e5553079fe..3fd5eefc5e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -82,6 +82,7 @@ struct mlx5_vdpa_virtq {
bool stopped;
uint32_t configured:1;
uint32_t version;
+   pthread_mutex_t virtq_lock;
struct mlx5_vdpa_priv *priv;
struct mlx5_devx_obj *virtq;
struct mlx5_devx_obj *counters;
@@ -126,7 +127,8 @@ struct mlx5_vdpa_priv {
TAILQ_

[PATCH v1 09/17] vdpa/mlx5: optimize datapath-control synchronization

2022-06-06 Thread Li Zhang
The driver used a single global lock for any synchronization
needed for the datapath and control path.
It is better to group the critical sections with
the other ones that should be synchronized.

Replace the global lock with the following locks:

1. Virtq locks (per virtq) synchronize datapath polling and
   parallel configurations on the same virtq.
2. A doorbell lock synchronizes doorbell updates, which is
   shared by all the virtqs in the device.
3. A steering lock for updates of the shared steering objects.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   | 24 ---
 drivers/vdpa/mlx5/mlx5_vdpa.h   | 13 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++---
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c| 34 +++---
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 88 +++---
 6 files changed, 184 insertions(+), 79 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee99952e11..e5a11f72fd 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -135,6 +135,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
struct mlx5_vdpa_priv *priv =
mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+   struct mlx5_vdpa_virtq *virtq;
int ret;
 
if (priv == NULL) {
@@ -145,9 +146,10 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
DRV_LOG(ERR, "Too big vring id: %d.", vring);
return -E2BIG;
}
-   pthread_mutex_lock(&priv->vq_config_lock);
+   virtq = &priv->virtqs[vring];
+   pthread_mutex_lock(&virtq->virtq_lock);
ret = mlx5_vdpa_virtq_enable(priv, vring, state);
-   pthread_mutex_unlock(&priv->vq_config_lock);
+   pthread_mutex_unlock(&virtq->virtq_lock);
return ret;
 }
 
@@ -267,7 +269,9 @@ mlx5_vdpa_dev_close(int vid)
ret |= mlx5_vdpa_lm_log(priv);
priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
}
+   pthread_mutex_lock(&priv->steer_update_lock);
mlx5_vdpa_steer_unset(priv);
+   pthread_mutex_unlock(&priv->steer_update_lock);
mlx5_vdpa_virtqs_release(priv);
mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
@@ -276,8 +280,6 @@ mlx5_vdpa_dev_close(int vid)
if (!priv->connected)
mlx5_vdpa_dev_cache_clean(priv);
priv->vid = 0;
-   /* The mutex may stay locked after event thread cancel - initiate it. */
-   pthread_mutex_init(&priv->vq_config_lock, NULL);
DRV_LOG(INFO, "vDPA device %d was closed.", vid);
return ret;
 }
@@ -549,15 +551,21 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
+   struct mlx5_vdpa_virtq *virtq;
uint32_t index;
uint32_t i;
 
+   for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+   index++) {
+   virtq = &priv->virtqs[index];
+   pthread_mutex_init(&virtq->virtq_lock, NULL);
+   }
if (!priv->queues)
return 0;
for (index = 0; index < (priv->queues * 2); ++index) {
-   struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+   virtq = &priv->virtqs[index];
int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-   -1, &virtq->eqp);
+   -1, virtq);
 
if (ret) {
DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
@@ -713,7 +721,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
priv->num_lag_ports = attr->num_lag_ports;
if (attr->num_lag_ports == 0)
priv->num_lag_ports = 1;
-   pthread_mutex_init(&priv->vq_config_lock, NULL);
+   rte_spinlock_init(&priv->db_lock);
+   pthread_mutex_init(&priv->steer_update_lock, NULL);
priv->cdev = cdev;
mlx5_vdpa_config_get(mkvlist, priv);
if (mlx5_vdpa_create_dev_resources(priv))
@@ -797,7 +806,6 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
mlx5_vdpa_release_dev_resources(priv);
if (priv->vdev)
rte_vdpa_unregister_device(priv->vdev);
-   pthread_mutex_destroy(&priv->vq_config_lock);
rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e5553079fe..3fd5eefc5e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -82,6 +82,7 @@ struct mlx5_vdpa_virtq {
bool stopped;
uint32_t configured:1;
uint32_t version;
+   pthread_mutex_t virtq_lock;
struct mlx5_vdpa_priv *priv;
struct mlx5_devx_obj *virtq;
struct mlx5_devx_obj *counters;
@@ -126,7 +127,8 @@ struct mlx5_vdpa_priv {
TAILQ_

[PATCH 10/16] vdpa/mlx5: add task ring for MT management

2022-06-06 Thread Li Zhang
The configuration threads' tasks need a container to
support multiple tasks assigned to a thread in parallel.
Use an rte_ring container per thread to manage
the thread tasks without locks.
The caller thread from the user context opens a task for
a thread and enqueues it to the thread's ring.
The thread polls its ring and dequeues tasks.
That's why the ring should be in multi-producer,
single-consumer mode.
An atomic counter manages the task completion notification.
The threads report errors to the caller through
a dedicated error counter per task.

Signed-off-by: Li Zhang 
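
A sketch of the caller-side pattern this enables, assuming the task's
remaining_cnt/err_cnt pointers are wired to the caller's counters (that
wiring lands in a follow-up patch; shown here only to illustrate the
completion scheme):

	uint32_t remaining = 0, err = 0;

	/* Producer (user context): push work to one worker thread. */
	if (mlx5_vdpa_task_add(priv, thrd_idx, num_tasks))
		return -1;
	/* Each completed task decrements remaining; failures also bump err. */
	while (__atomic_load_n(&remaining, __ATOMIC_RELAXED))
		rte_pause();
	if (__atomic_load_n(&err, __ATOMIC_RELAXED))
		DRV_LOG(ERR, "%u configuration tasks failed.", err);
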
---
 drivers/vdpa/mlx5/mlx5_vdpa.h |  17 
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 115 +-
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 4e7c2557b7..2bbb868ec6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -74,10 +74,22 @@ enum {
 };
 
 #define MLX5_VDPA_MAX_C_THRD 256
+#define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
+#define MLX5_VDPA_TASKS_PER_DEV 64
+
+/* Generic task information and size must be multiple of 4B. */
+struct mlx5_vdpa_task {
+   struct mlx5_vdpa_priv *priv;
+   uint32_t *remaining_cnt;
+   uint32_t *err_cnt;
+   uint32_t idx;
+} __rte_packed __rte_aligned(4);
 
 /* Generic mlx5_vdpa_c_thread information. */
 struct mlx5_vdpa_c_thread {
pthread_t tid;
+   struct rte_ring *rng;
+   pthread_cond_t c_cond;
 };
 
 struct mlx5_vdpa_conf_thread_mng {
@@ -532,4 +544,9 @@ mlx5_vdpa_mult_threads_create(int cpu_core);
  */
 void
 mlx5_vdpa_mult_threads_destroy(bool need_unlock);
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+   uint32_t thrd_idx,
+   uint32_t num);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index ba7d8b63b3..1fdc92d3ad 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -11,17 +11,103 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_dequeue_bulk(struct rte_ring *r,
+   void **obj, uint32_t n, uint32_t *avail)
+{
+   uint32_t m;
+
+   m = rte_ring_dequeue_bulk_elem_start(r, obj,
+   sizeof(struct mlx5_vdpa_task), n, avail);
+   n = (m == n) ? n : 0;
+   rte_ring_dequeue_elem_finish(r, n);
+   return n;
+}
+
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
+   void * const *obj, uint32_t n, uint32_t *free)
+{
+   uint32_t m;
+
+   m = rte_ring_enqueue_bulk_elem_start(r, n, free);
+   n = (m == n) ? n : 0;
+   rte_ring_enqueue_elem_finish(r, obj,
+   sizeof(struct mlx5_vdpa_task), n);
+   return n;
+}
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+   uint32_t thrd_idx,
+   uint32_t num)
+{
+   struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
+   struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+   uint32_t i;
+
+   MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
+   for (i = 0 ; i < num; i++) {
+   task[i].priv = priv;
+   /* To be added later. */
+   }
+   if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
+   return -1;
+   for (i = 0 ; i < num; i++)
+   if (task[i].remaining_cnt)
+   __atomic_fetch_add(task[i].remaining_cnt, 1,
+   __ATOMIC_RELAXED);
+   /* wake up conf thread. */
+   pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+   pthread_cond_signal(&conf_thread_mng.cthrd[thrd_idx].c_cond);
+   pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+   return 0;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
-   /* To be added later. */
-   return arg;
+   struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
+   pthread_t thread_id = pthread_self();
+   struct mlx5_vdpa_priv *priv;
+   struct mlx5_vdpa_task task;
+   struct rte_ring *rng;
+   uint32_t thrd_idx;
+   uint32_t task_num;
+
+   for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
+   thrd_idx++)
+   if (multhrd->cthrd[thrd_idx].tid == thread_id)
+   break;
+   if (thrd_idx >= multhrd->max_thrds)
+   return NULL;
+   rng = multhrd->cthrd[thrd_idx].rng;
+   while (1) {
+   task_num = mlx5_vdpa_c_thrd_ring_dequeue_bulk(rng,
+   (void **)&task, 1, NULL);
+   if (!task_num) {
+   /* No task and condition wait. */
+   pthread_mutex_lock(&multhrd->cthrd_lock);
+   pthread_cond_wait(
+   &multhrd->cthrd[thrd_idx].c_cond,
+   &m

[PATCH v1 10/17] vdpa/mlx5: add multi-thread management for configuration

2022-06-06 Thread Li Zhang
The LM process includes a lot of object creations and
destructions in the source and the destination servers.
As the LM time increases, the packet drop of the VM increases.
To improve the LM time, the configurations for mlx5 FW need to be parallelized.
Add internal multi-thread management in the driver for it.

A new devarg defines the number of threads and their CPU.
The management is shared between all the devices of the driver.
Since the event_core also affects the datapath events thread,
reduce the priority of the datapath event thread to
allow fast configuration of the devices doing the LM.

Signed-off-by: Li Zhang 
---
 doc/guides/vdpadevs/mlx5.rst  |  11 +++
 drivers/vdpa/mlx5/meson.build |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c |  41 
 drivers/vdpa/mlx5/mlx5_vdpa.h |  36 +++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 +-
 7 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
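
As a usage illustration only (the PCI addresses below are placeholders, not
taken from the patch), a device would be probed with the new devarg alongside
the existing event_core one, and every device of the driver should pass the
same value since only the first probed device applies it:

	-a 0000:06:00.2,class=vdpa,event_core=2,max_conf_threads=8 \
	-a 0000:06:00.3,class=vdpa,event_core=2,max_conf_threads=8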

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 0ad77bf535..b75a01688d 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -78,6 +78,17 @@ for an additional list of options shared with other mlx5 
drivers.
   CPU core number to set polling thread affinity to, default to control plane
   cpu.
 
+- ``max_conf_threads`` parameter [int]
+
+  Allow the driver to use internal threads to obtain fast configuration.
+  All the threads will be open on the same core of the event completion queue 
scheduling thread.
+
+  - 0, default, don't use internal threads for configuration.
+
+  - 1 - 256, number of internal threads in addition to the caller thread (8 is 
suggested).
+This value, if not 0, should be the same for all the devices;
+the first prob will take it with the event_core for all the multi-thread 
configurations in the driver.
+
 - ``hw_latency_mode`` parameter [int]
 
   The completion queue moderation mode:
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 0fa82ad257..9d8dbb1a82 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
 'mlx5_vdpa_virtq.c',
 'mlx5_vdpa_steer.c',
 'mlx5_vdpa_lm.c',
+'mlx5_vdpa_cthread.c',
 )
 cflags_options = [
 '-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e5a11f72fd..a9d023ed08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -50,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
  TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
 
 static struct mlx5_vdpa_priv *
@@ -493,6 +495,29 @@ mlx5_vdpa_args_check_handler(const char *key, const char 
*val, void *opaque)
DRV_LOG(WARNING, "Invalid event_core %s.", val);
else
priv->event_core = tmp;
+   } else if (strcmp(key, "max_conf_threads") == 0) {
+   if (tmp) {
+   priv->use_c_thread = true;
+   if (!conf_thread_mng.initializer_priv) {
+   conf_thread_mng.initializer_priv = priv;
+   if (tmp > MLX5_VDPA_MAX_C_THRD) {
+   DRV_LOG(WARNING,
+   "Invalid max_conf_threads %s "
+   "and set max_conf_threads to %d",
+   val, MLX5_VDPA_MAX_C_THRD);
+   tmp = MLX5_VDPA_MAX_C_THRD;
+   }
+   conf_thread_mng.max_thrds = tmp;
+   } else if (tmp != conf_thread_mng.max_thrds) {
+   DRV_LOG(WARNING,
+   "max_conf_threads is PMD argument and not per device, "
+   "only the first device configuration set it, current value is %d "
+   "and will not be changed to %d.",
+   conf_thread_mng.max_thrds, (int)tmp);
+   }
+   } else {
+   priv->use_c_thread = false;
+   }
} else if (strcmp(key, "hw_latency_mode") == 0) {
priv->hw_latency_mode = (uint32_t)tmp;
} else if (strcmp(key, "hw_max_latency_us") == 0) {
@@ -521,6 +546,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
"hw_max_latency_us",
"hw_max_pending_comp",
"no_traffic_time",
+   "queue_size",
+   "queues",
+   "max_conf_threads",
NULL,
 

[PATCH 11/16] vdpa/mlx5: add MT task for VM memory registration

2022-06-06 Thread Li Zhang
The driver creates a direct MR object of
the HW for each VM memory region,
which maps the VM physical address to
the actual physical address.

Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.

Create direct MRs in parallel using the MT mechanism.
After completion, the primary thread creates the indirect MR
needed for the following virtq configurations.

This optimization accelerates the LM process and
reduces its time by 5%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.h |  31 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 -
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c | 270 ++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
 5 files changed, 258 insertions(+), 97 deletions(-)
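
A rough sketch of the flow described above, with hypothetical helpers
(post_reg_mr_task, create_indirect_mr) standing in for the driver's
mlx5_vdpa_register_mr() and indirect-mkey creation; the point is only the
ordering: every direct MR must be complete before the indirect MR is built.

#include <stdint.h>
#include <rte_cycles.h>

/* Hypothetical helpers standing in for the real driver functions. */
int post_reg_mr_task(void *priv, uint32_t idx,
		     uint32_t *remaining, uint32_t *errors);
int create_indirect_mr(void *priv);

static int
register_all_mrs(void *priv, uint32_t nregions)
{
	uint32_t remaining = 0, errors = 0, i;

	/* Fan the direct-MR registrations out to the config threads. */
	for (i = 0; i < nregions; i++)
		if (post_reg_mr_task(priv, i, &remaining, &errors) != 0)
			return -1;
	/* Join point: wait until every worker dropped the counter. */
	while (__atomic_load_n(&remaining, __ATOMIC_RELAXED) != 0)
		rte_delay_us_sleep(100);
	if (__atomic_load_n(&errors, __ATOMIC_RELAXED) != 0)
		return -1;
	/* Only now can the indirect MR grouping all direct MRs be built. */
	return create_indirect_mr(priv);
}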

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index a9d023ed08..e3b32fa087 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
rte_errno = rte_errno ? rte_errno : EINVAL;
goto error;
}
-   SLIST_INIT(&priv->mr_list);
pthread_mutex_lock(&priv_list_lock);
TAILQ_INSERT_TAIL(&priv_list, priv, next);
pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2bbb868ec6..3316ce42be 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp {
 };
 
 struct mlx5_vdpa_query_mr {
-   SLIST_ENTRY(mlx5_vdpa_query_mr) next;
union {
struct ibv_mr *mr;
struct mlx5_devx_obj *mkey;
@@ -76,10 +75,17 @@ enum {
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
+#define MLX5_VDPA_MAX_MRS 0x
+
+/* Vdpa task types. */
+enum mlx5_vdpa_task_type {
+   MLX5_VDPA_TASK_REG_MR = 1,
+};
 
 /* Generic task information and size must be multiple of 4B. */
 struct mlx5_vdpa_task {
struct mlx5_vdpa_priv *priv;
+   enum mlx5_vdpa_task_type type;
uint32_t *remaining_cnt;
uint32_t *err_cnt;
uint32_t idx;
@@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng {
 };
 extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
 
+struct mlx5_vdpa_vmem_info {
+   struct rte_vhost_memory *vmem;
+   uint32_t entries_num;
+   uint64_t gcd;
+   uint64_t size;
+   uint8_t mode;
+};
+
 struct mlx5_vdpa_virtq {
SLIST_ENTRY(mlx5_vdpa_virtq) next;
uint8_t enable;
@@ -176,7 +190,7 @@ struct mlx5_vdpa_priv {
struct mlx5_hca_vdpa_attr caps;
uint32_t gpa_mkey_index;
struct ibv_mr *null_mr;
-   struct rte_vhost_memory *vmem;
+   struct mlx5_vdpa_vmem_info vmem_info;
struct mlx5dv_devx_event_channel *eventc;
struct mlx5dv_devx_event_channel *err_chnl;
struct mlx5_uar uar;
@@ -187,11 +201,13 @@ struct mlx5_vdpa_priv {
uint8_t num_lag_ports;
uint64_t features; /* Negotiated features. */
uint16_t log_max_rqt_size;
+   uint16_t last_c_thrd_idx;
+   uint16_t num_mrs; /* Number of memory regions. */
struct mlx5_vdpa_steer steer;
struct mlx5dv_var *var;
void *virtq_db_addr;
struct mlx5_pmd_wrapped_mr lm_mr;
-   SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+   struct mlx5_vdpa_query_mr **mrs;
struct mlx5_vdpa_virtq virtqs[];
 };
 
@@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
uint32_t thrd_idx,
-   uint32_t num);
+   enum mlx5_vdpa_task_type task_type,
+   uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+   void **task_data, uint32_t num);
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+   uint32_t *err_cnt, uint32_t sleep_time);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1fdc92d3ad..10391931ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
uint32_t thrd_idx,
-   uint32_t num)
+   enum mlx5_vdpa_task_type task_type,
+   uint32_t *remaining_cnt, uint32_t *err_cnt,
+   void **task_data, uint32_t num)
 {
struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+   uint32_t *data = (uint32_t *)task_data;
uint32_t i;
 

[PATCH v1 11/17] vdpa/mlx5: add task ring for MT management

2022-06-06 Thread Li Zhang
The configuration threads' tasks need a container to
support multiple tasks assigned to a thread in parallel.
Use an rte_ring container per thread to manage
the thread tasks without locks.
The caller thread from the user context opens a task to
a thread and enqueues it to the thread ring.
The thread polls its ring and dequeues tasks.
That is why the ring should be in multi-producer
and single-consumer mode.
An atomic counter manages the task completion notification.
The threads report errors to the caller through
a dedicated error counter per task.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h |  17 
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 115 +-
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 4e7c2557b7..2bbb868ec6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -74,10 +74,22 @@ enum {
 };
 
 #define MLX5_VDPA_MAX_C_THRD 256
+#define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
+#define MLX5_VDPA_TASKS_PER_DEV 64
+
+/* Generic task information and size must be multiple of 4B. */
+struct mlx5_vdpa_task {
+   struct mlx5_vdpa_priv *priv;
+   uint32_t *remaining_cnt;
+   uint32_t *err_cnt;
+   uint32_t idx;
+} __rte_packed __rte_aligned(4);
 
 /* Generic mlx5_vdpa_c_thread information. */
 struct mlx5_vdpa_c_thread {
pthread_t tid;
+   struct rte_ring *rng;
+   pthread_cond_t c_cond;
 };
 
 struct mlx5_vdpa_conf_thread_mng {
@@ -532,4 +544,9 @@ mlx5_vdpa_mult_threads_create(int cpu_core);
  */
 void
 mlx5_vdpa_mult_threads_destroy(bool need_unlock);
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+   uint32_t thrd_idx,
+   uint32_t num);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index ba7d8b63b3..1fdc92d3ad 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -11,17 +11,103 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_dequeue_bulk(struct rte_ring *r,
+   void **obj, uint32_t n, uint32_t *avail)
+{
+   uint32_t m;
+
+   m = rte_ring_dequeue_bulk_elem_start(r, obj,
+   sizeof(struct mlx5_vdpa_task), n, avail);
+   n = (m == n) ? n : 0;
+   rte_ring_dequeue_elem_finish(r, n);
+   return n;
+}
+
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
+   void * const *obj, uint32_t n, uint32_t *free)
+{
+   uint32_t m;
+
+   m = rte_ring_enqueue_bulk_elem_start(r, n, free);
+   n = (m == n) ? n : 0;
+   rte_ring_enqueue_elem_finish(r, obj,
+   sizeof(struct mlx5_vdpa_task), n);
+   return n;
+}
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+   uint32_t thrd_idx,
+   uint32_t num)
+{
+   struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
+   struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+   uint32_t i;
+
+   MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
+   for (i = 0 ; i < num; i++) {
+   task[i].priv = priv;
+   /* To be added later. */
+   }
+   if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
+   return -1;
+   for (i = 0 ; i < num; i++)
+   if (task[i].remaining_cnt)
+   __atomic_fetch_add(task[i].remaining_cnt, 1,
+   __ATOMIC_RELAXED);
+   /* wake up conf thread. */
+   pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+   pthread_cond_signal(&conf_thread_mng.cthrd[thrd_idx].c_cond);
+   pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+   return 0;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
-   /* To be added later. */
-   return arg;
+   struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
+   pthread_t thread_id = pthread_self();
+   struct mlx5_vdpa_priv *priv;
+   struct mlx5_vdpa_task task;
+   struct rte_ring *rng;
+   uint32_t thrd_idx;
+   uint32_t task_num;
+
+   for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
+   thrd_idx++)
+   if (multhrd->cthrd[thrd_idx].tid == thread_id)
+   break;
+   if (thrd_idx >= multhrd->max_thrds)
+   return NULL;
+   rng = multhrd->cthrd[thrd_idx].rng;
+   while (1) {
+   task_num = mlx5_vdpa_c_thrd_ring_dequeue_bulk(rng,
+   (void **)&task, 1, NULL);
+   if (!task_num) {
+   /* No task and condition wait. */
+   pthread_mutex_lock(&multhrd->cthrd_lock);
+   pthread_cond_wait(
+   &multhrd->cthrd[thrd_idx].c_cond,
+   &m

[PATCH 12/16] vdpa/mlx5: add virtq creation task for MT management

2022-06-06 Thread Li Zhang
The virtq object and all its sub-resources use a lot of
FW commands and can be accelerated by the MT management.
Split the creation of the virtqs between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h |   9 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  14 +++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 149 +++---
 4 files changed, 134 insertions(+), 40 deletions(-)
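
A trimmed-down sketch of the worker-side handling of a virtq setup task;
struct vq and setup_virtq_fw() are simplified stand-ins for the driver's
mlx5_vdpa_virtq and its FW setup path. The point is the per-virtq mutex that
serializes the worker against the caller touching the same queue.

#include <stdint.h>
#include <pthread.h>

struct vq {
	pthread_mutex_t lock; /* per-virtq lock added by this series */
};

int setup_virtq_fw(uint32_t idx); /* hypothetical FW setup helper */

static void
handle_setup_virtq_task(struct vq *vq, uint32_t idx, uint32_t *err_cnt)
{
	pthread_mutex_lock(&vq->lock);
	if (setup_virtq_fw(idx) != 0)
		__atomic_fetch_add(err_cnt, 1, __ATOMIC_RELAXED);
	pthread_mutex_unlock(&vq->lock);
}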

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3316ce42be..35221f5ddc 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ enum {
 /* Vdpa task types. */
 enum mlx5_vdpa_task_type {
MLX5_VDPA_TASK_REG_MR = 1,
+   MLX5_VDPA_TASK_SETUP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -117,12 +118,12 @@ struct mlx5_vdpa_vmem_info {
 
 struct mlx5_vdpa_virtq {
SLIST_ENTRY(mlx5_vdpa_virtq) next;
-   uint8_t enable;
uint16_t index;
uint16_t vq_size;
uint8_t notifier_state;
-   bool stopped;
uint32_t configured:1;
+   uint32_t enable:1;
+   uint32_t stopped:1;
uint32_t version;
pthread_mutex_t virtq_lock;
struct mlx5_vdpa_priv *priv;
@@ -565,11 +566,13 @@ bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
uint32_t thrd_idx,
enum mlx5_vdpa_task_type task_type,
-   uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+   uint32_t *remaining_cnt, uint32_t *err_cnt,
void **task_data, uint32_t num);
 int
 mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
 bool
 mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
uint32_t *err_cnt, uint32_t sleep_time);
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 10391931ae..1389d369ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -100,6 +100,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 {
struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
pthread_t thread_id = pthread_self();
+   struct mlx5_vdpa_virtq *virtq;
struct mlx5_vdpa_priv *priv;
struct mlx5_vdpa_task task;
struct rte_ring *rng;
@@ -139,6 +140,19 @@ mlx5_vdpa_c_thread_handle(void *arg)
__ATOMIC_RELAXED);
}
break;
+   case MLX5_VDPA_TASK_SETUP_VIRTQ:
+   virtq = &priv->virtqs[task.idx];
+   pthread_mutex_lock(&virtq->virtq_lock);
+   ret = mlx5_vdpa_virtq_setup(priv,
+   task.idx, false);
+   if (ret) {
+   DRV_LOG(ERR,
+   "Failed to setup virtq %d.", task.idx);
+   __atomic_fetch_add(
+   task.err_cnt, 1, __ATOMIC_RELAXED);
+   }
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
default:
DRV_LOG(ERR, "Invalid vdpa task type %d.",
task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b45fbac146..f782b6b832 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -371,7 +371,7 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
goto unlock;
if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
goto unlock;
-   virtq->stopped = true;
+   virtq->stopped = 1;
/* Query error info. */
if (mlx5_vdpa_virtq_query(priv, vq_index))
goto log;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 0b317655db..db05220e76 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -111,8 +111,9 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+   if (virtq->index != i)
+   continue;
pthread_mutex_lock(&virtq->virtq_lock);
-   virtq->configured = 0;
for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
if (virtq->umems[j].obj) {
claim_zero(mlx5_glue->devx_umem_dereg
@@ -131,7 +132,6 @@ mlx5_vdpa_virtqs_cleanup(struct m

[PATCH 13/16] vdpa/mlx5: add virtq LM log task

2022-06-06 Thread Li Zhang
Split the LM logging of the virtqs between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h |  3 +
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 +++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c  | 85 +--
 3 files changed, 105 insertions(+), 17 deletions(-)
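
The per-queue work each task performs boils down to the sketch below
(stop_virtq_fw() is a hypothetical stand-in for the FW stop call; the macro
mirrors MLX5_VDPA_USED_RING_LEN from the diff): stop the queue, then dirty-log
its used ring only if the guest negotiated logging.

#include <stdint.h>
#include <linux/virtio_ring.h>
#include <rte_vhost.h>

#define USED_RING_LEN(size) \
	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)

int stop_virtq_fw(int qidx); /* hypothetical FW stop helper */

static int
stop_and_log_virtq(int vid, int qidx, uint16_t vq_size, uint64_t features)
{
	if (stop_virtq_fw(qidx) != 0)
		return -1;
	if (RTE_VHOST_NEED_LOG(features))
		rte_vhost_log_used_vring(vid, qidx, 0,
					 USED_RING_LEN(vq_size));
	return 0;
}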

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 35221f5ddc..e08931719f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -72,6 +72,8 @@ enum {
MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_USED_RING_LEN(size) \
+   ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
@@ -81,6 +83,7 @@ enum {
 enum mlx5_vdpa_task_type {
MLX5_VDPA_TASK_REG_MR = 1,
MLX5_VDPA_TASK_SETUP_VIRTQ,
+   MLX5_VDPA_TASK_STOP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1389d369ae..98369f0887 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -104,6 +104,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
struct mlx5_vdpa_priv *priv;
struct mlx5_vdpa_task task;
struct rte_ring *rng;
+   uint64_t features;
uint32_t thrd_idx;
uint32_t task_num;
int ret;
@@ -153,6 +154,39 @@ mlx5_vdpa_c_thread_handle(void *arg)
}
pthread_mutex_unlock(&virtq->virtq_lock);
break;
+   case MLX5_VDPA_TASK_STOP_VIRTQ:
+   virtq = &priv->virtqs[task.idx];
+   pthread_mutex_lock(&virtq->virtq_lock);
+   ret = mlx5_vdpa_virtq_stop(priv,
+   task.idx);
+   if (ret) {
+   DRV_LOG(ERR,
+   "Failed to stop virtq %d.",
+   task.idx);
+   __atomic_fetch_add(
+   task.err_cnt, 1,
+   __ATOMIC_RELAXED);
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
+   }
+   ret = rte_vhost_get_negotiated_features(
+   priv->vid, &features);
+   if (ret) {
+   DRV_LOG(ERR,
+   "Failed to get negotiated features virtq %d.",
+   task.idx);
+   __atomic_fetch_add(
+   task.err_cnt, 1,
+   __ATOMIC_RELAXED);
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
+   }
+   if (RTE_VHOST_NEED_LOG(features))
+   rte_vhost_log_used_vring(
+   priv->vid, task.idx, 0,
+   MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
default:
DRV_LOG(ERR, "Invalid vdpa task type %d.",
task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index efebf364d0..c2e78218ca 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -89,39 +89,90 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, 
uint64_t log_base,
return -1;
 }
 
-#define MLX5_VDPA_USED_RING_LEN(size) \
-   ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
-
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+   uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+   uint32_t i, thrd_idx, data[1];
struct mlx5_vdpa_virtq *virtq;
uint64_t features;
-   int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
-   int i;
+   int ret;
 
+   ret = rte_vhost_get_negotiated_features(priv->vid, &features);
if (ret) {
DRV_LOG(ERR, "Failed to get negotiated features.");
return -1;
}
-   if (!RTE_VHOST_NEED_LOG(features))
-   return 0;
-   for (i = 0; i < priv->nr_virtqs; ++i) {
-   virtq = &priv->virtqs[i];
-   if (!priv->virtqs[i].virtq) {
-   DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
-   } else {
+   if (priv->use_c_thread && priv->nr_virtqs) {
+   uint32_t main_task_idx[priv->nr_virtqs];
+
+   for (i = 0; i < p

[PATCH v1 12/17] vdpa/mlx5: add MT task for VM memory registration

2022-06-06 Thread Li Zhang
The driver creates a direct MR object of
the HW for each VM memory region,
which maps the VM physical address to
the actual physical address.

Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.

Create direct MRs in parallel using the MT mechanism.
After completion, the primary thread creates the indirect MR
needed for the following virtq configurations.

This optimization accelerates the LM process and
reduces its time by 5%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.h |  31 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 -
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c | 270 ++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
 5 files changed, 258 insertions(+), 97 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index a9d023ed08..e3b32fa087 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
rte_errno = rte_errno ? rte_errno : EINVAL;
goto error;
}
-   SLIST_INIT(&priv->mr_list);
pthread_mutex_lock(&priv_list_lock);
TAILQ_INSERT_TAIL(&priv_list, priv, next);
pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2bbb868ec6..3316ce42be 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp {
 };
 
 struct mlx5_vdpa_query_mr {
-   SLIST_ENTRY(mlx5_vdpa_query_mr) next;
union {
struct ibv_mr *mr;
struct mlx5_devx_obj *mkey;
@@ -76,10 +75,17 @@ enum {
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
+#define MLX5_VDPA_MAX_MRS 0x
+
+/* Vdpa task types. */
+enum mlx5_vdpa_task_type {
+   MLX5_VDPA_TASK_REG_MR = 1,
+};
 
 /* Generic task information and size must be multiple of 4B. */
 struct mlx5_vdpa_task {
struct mlx5_vdpa_priv *priv;
+   enum mlx5_vdpa_task_type type;
uint32_t *remaining_cnt;
uint32_t *err_cnt;
uint32_t idx;
@@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng {
 };
 extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
 
+struct mlx5_vdpa_vmem_info {
+   struct rte_vhost_memory *vmem;
+   uint32_t entries_num;
+   uint64_t gcd;
+   uint64_t size;
+   uint8_t mode;
+};
+
 struct mlx5_vdpa_virtq {
SLIST_ENTRY(mlx5_vdpa_virtq) next;
uint8_t enable;
@@ -176,7 +190,7 @@ struct mlx5_vdpa_priv {
struct mlx5_hca_vdpa_attr caps;
uint32_t gpa_mkey_index;
struct ibv_mr *null_mr;
-   struct rte_vhost_memory *vmem;
+   struct mlx5_vdpa_vmem_info vmem_info;
struct mlx5dv_devx_event_channel *eventc;
struct mlx5dv_devx_event_channel *err_chnl;
struct mlx5_uar uar;
@@ -187,11 +201,13 @@ struct mlx5_vdpa_priv {
uint8_t num_lag_ports;
uint64_t features; /* Negotiated features. */
uint16_t log_max_rqt_size;
+   uint16_t last_c_thrd_idx;
+   uint16_t num_mrs; /* Number of memory regions. */
struct mlx5_vdpa_steer steer;
struct mlx5dv_var *var;
void *virtq_db_addr;
struct mlx5_pmd_wrapped_mr lm_mr;
-   SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+   struct mlx5_vdpa_query_mr **mrs;
struct mlx5_vdpa_virtq virtqs[];
 };
 
@@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
uint32_t thrd_idx,
-   uint32_t num);
+   enum mlx5_vdpa_task_type task_type,
+   uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+   void **task_data, uint32_t num);
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+   uint32_t *err_cnt, uint32_t sleep_time);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1fdc92d3ad..10391931ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
uint32_t thrd_idx,
-   uint32_t num)
+   enum mlx5_vdpa_task_type task_type,
+   uint32_t *remaining_cnt, uint32_t *err_cnt,
+   void **task_data, uint32_t num)
 {
struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+   uint32_t *data = (uint32_t *)task_data;
uint32_t i;
 

[PATCH 14/16] vdpa/mlx5: add device close task

2022-06-06 Thread Li Zhang
Split the virtq device close tasks, after
stopping the virt-queues, between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 56 +--
 drivers/vdpa/mlx5/mlx5_vdpa.h |  8 
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 14 +++
 4 files changed, 94 insertions(+), 4 deletions(-)
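
The essence of the change is sketched below with simplified names
(post_dev_close_task() is hypothetical and the flag would be cleared by the
worker once the teardown finishes): the close path only raises a progress flag
and queues the heavy teardown to a worker, while any later reconfiguration
first waits for the flag to drop, like mlx5_vdpa_wait_dev_close_tasks_done()
in the diff below.

#include <stdint.h>
#include <stdbool.h>
#include <rte_cycles.h>

int post_dev_close_task(void); /* hypothetical helper */

/* 1 while a close task is in flight; cleared by the worker when done. */
static uint32_t dev_close_progress;

static int
close_async(void)
{
	__atomic_store_n(&dev_close_progress, 1, __ATOMIC_RELAXED);
	if (post_dev_close_task() != 0) {
		/* Fall back to a synchronous close in the caller. */
		__atomic_store_n(&dev_close_progress, 0, __ATOMIC_RELAXED);
		return -1;
	}
	return 0;
}

static bool
wait_close_done(void)
{
	uint32_t timeout = 0;

	/* Bounded polling in 1us steps, as the driver does. */
	while (__atomic_load_n(&dev_close_progress, __ATOMIC_RELAXED) != 0 &&
	       timeout < 1000) {
		rte_delay_us_sleep(1);
		timeout++;
	}
	return __atomic_load_n(&dev_close_progress, __ATOMIC_RELAXED) == 0;
}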

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e3b32fa087..d000854c08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -245,7 +245,7 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
-static void
+void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
/* Clean pre-created resource in dev removal only. */
@@ -254,6 +254,26 @@ mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
mlx5_vdpa_mem_dereg(priv);
 }
 
+static bool
+mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
+{
+   uint32_t timeout = 0;
+
+   /* Check and wait all close tasks done. */
+   while (__atomic_load_n(&priv->dev_close_progress,
+   __ATOMIC_RELAXED) != 0 && timeout < 1000) {
+   rte_delay_us_sleep(1);
+   timeout++;
+   }
+   if (priv->dev_close_progress) {
+   DRV_LOG(ERR,
+   "Failed to wait close device tasks done vid %d.",
+   priv->vid);
+   return true;
+   }
+   return false;
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -271,6 +291,27 @@ mlx5_vdpa_dev_close(int vid)
ret |= mlx5_vdpa_lm_log(priv);
priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
}
+   if (priv->use_c_thread) {
+   if (priv->last_c_thrd_idx >=
+   (conf_thread_mng.max_thrds - 1))
+   priv->last_c_thrd_idx = 0;
+   else
+   priv->last_c_thrd_idx++;
+   __atomic_store_n(&priv->dev_close_progress,
+   1, __ATOMIC_RELAXED);
+   if (mlx5_vdpa_task_add(priv,
+   priv->last_c_thrd_idx,
+   MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+   NULL, NULL, NULL, 1)) {
+   DRV_LOG(ERR,
+   "Fail to add dev close task. ");
+   goto single_thrd;
+   }
+   priv->state = MLX5_VDPA_STATE_PROBED;
+   DRV_LOG(INFO, "vDPA device %d was closed.", vid);
+   return ret;
+   }
+single_thrd:
pthread_mutex_lock(&priv->steer_update_lock);
mlx5_vdpa_steer_unset(priv);
pthread_mutex_unlock(&priv->steer_update_lock);
@@ -278,10 +319,12 @@ mlx5_vdpa_dev_close(int vid)
mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-   priv->state = MLX5_VDPA_STATE_PROBED;
if (!priv->connected)
mlx5_vdpa_dev_cache_clean(priv);
priv->vid = 0;
+   __atomic_store_n(&priv->dev_close_progress, 0,
+   __ATOMIC_RELAXED);
+   priv->state = MLX5_VDPA_STATE_PROBED;
DRV_LOG(INFO, "vDPA device %d was closed.", vid);
return ret;
 }
@@ -302,6 +345,8 @@ mlx5_vdpa_dev_config(int vid)
DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
return -1;
}
+   if (mlx5_vdpa_wait_dev_close_tasks_done(priv))
+   return -1;
priv->vid = vid;
priv->connected = true;
if (mlx5_vdpa_mtu_set(priv))
@@ -444,8 +489,11 @@ mlx5_vdpa_dev_cleanup(int vid)
DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
return -1;
}
-   if (priv->state == MLX5_VDPA_STATE_PROBED)
+   if (priv->state == MLX5_VDPA_STATE_PROBED) {
+   if (priv->use_c_thread)
+   mlx5_vdpa_wait_dev_close_tasks_done(priv);
mlx5_vdpa_dev_cache_clean(priv);
+   }
priv->connected = false;
return 0;
 }
@@ -839,6 +887,8 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
mlx5_vdpa_dev_close(priv->vid);
+   if (priv->use_c_thread)
+   mlx5_vdpa_wait_dev_close_tasks_done(priv);
mlx5_vdpa_release_dev_resources(priv);
if (priv->vdev)
rte_vdpa_unregister_device(priv->vdev);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e08931719f..b6392b9d66 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -84,6 +84,7 @@ enum mlx5_vdpa_task_type {
MLX5_VDPA_TASK_REG_MR = 1,
MLX5_VDPA_TASK_SETUP_VIRTQ,
MLX5_VDPA_TASK_STOP_VIRTQ,
+   MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
 };
 

[PATCH v1 13/17] vdpa/mlx5: add virtq creation task for MT management

2022-06-06 Thread Li Zhang
The virtq object and all its sub-resources use a lot of
FW commands and can be accelerated by the MT management.
Split the creation of the virtqs between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h |   9 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  14 +++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 149 +++---
 4 files changed, 134 insertions(+), 40 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3316ce42be..35221f5ddc 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ enum {
 /* Vdpa task types. */
 enum mlx5_vdpa_task_type {
MLX5_VDPA_TASK_REG_MR = 1,
+   MLX5_VDPA_TASK_SETUP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -117,12 +118,12 @@ struct mlx5_vdpa_vmem_info {
 
 struct mlx5_vdpa_virtq {
SLIST_ENTRY(mlx5_vdpa_virtq) next;
-   uint8_t enable;
uint16_t index;
uint16_t vq_size;
uint8_t notifier_state;
-   bool stopped;
uint32_t configured:1;
+   uint32_t enable:1;
+   uint32_t stopped:1;
uint32_t version;
pthread_mutex_t virtq_lock;
struct mlx5_vdpa_priv *priv;
@@ -565,11 +566,13 @@ bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
uint32_t thrd_idx,
enum mlx5_vdpa_task_type task_type,
-   uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+   uint32_t *remaining_cnt, uint32_t *err_cnt,
void **task_data, uint32_t num);
 int
 mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
 bool
 mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
uint32_t *err_cnt, uint32_t sleep_time);
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 10391931ae..1389d369ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -100,6 +100,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 {
struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
pthread_t thread_id = pthread_self();
+   struct mlx5_vdpa_virtq *virtq;
struct mlx5_vdpa_priv *priv;
struct mlx5_vdpa_task task;
struct rte_ring *rng;
@@ -139,6 +140,19 @@ mlx5_vdpa_c_thread_handle(void *arg)
__ATOMIC_RELAXED);
}
break;
+   case MLX5_VDPA_TASK_SETUP_VIRTQ:
+   virtq = &priv->virtqs[task.idx];
+   pthread_mutex_lock(&virtq->virtq_lock);
+   ret = mlx5_vdpa_virtq_setup(priv,
+   task.idx, false);
+   if (ret) {
+   DRV_LOG(ERR,
+   "Failed to setup virtq %d.", task.idx);
+   __atomic_fetch_add(
+   task.err_cnt, 1, __ATOMIC_RELAXED);
+   }
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
default:
DRV_LOG(ERR, "Invalid vdpa task type %d.",
task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b45fbac146..f782b6b832 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -371,7 +371,7 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
goto unlock;
if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
goto unlock;
-   virtq->stopped = true;
+   virtq->stopped = 1;
/* Query error info. */
if (mlx5_vdpa_virtq_query(priv, vq_index))
goto log;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 0b317655db..db05220e76 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -111,8 +111,9 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+   if (virtq->index != i)
+   continue;
pthread_mutex_lock(&virtq->virtq_lock);
-   virtq->configured = 0;
for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
if (virtq->umems[j].obj) {
claim_zero(mlx5_glue->devx_umem_dereg
@@ -131,7 +132,6 @@ mlx5_vdpa_virtqs_cleanup(struct m

[PATCH v1 14/17] vdpa/mlx5: add virtq LM log task

2022-06-06 Thread Li Zhang
Split the LM logging of the virtqs between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h |  3 +
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 +++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c  | 85 +--
 3 files changed, 105 insertions(+), 17 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 35221f5ddc..e08931719f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -72,6 +72,8 @@ enum {
MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_USED_RING_LEN(size) \
+   ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
@@ -81,6 +83,7 @@ enum {
 enum mlx5_vdpa_task_type {
MLX5_VDPA_TASK_REG_MR = 1,
MLX5_VDPA_TASK_SETUP_VIRTQ,
+   MLX5_VDPA_TASK_STOP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1389d369ae..98369f0887 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -104,6 +104,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
struct mlx5_vdpa_priv *priv;
struct mlx5_vdpa_task task;
struct rte_ring *rng;
+   uint64_t features;
uint32_t thrd_idx;
uint32_t task_num;
int ret;
@@ -153,6 +154,39 @@ mlx5_vdpa_c_thread_handle(void *arg)
}
pthread_mutex_unlock(&virtq->virtq_lock);
break;
+   case MLX5_VDPA_TASK_STOP_VIRTQ:
+   virtq = &priv->virtqs[task.idx];
+   pthread_mutex_lock(&virtq->virtq_lock);
+   ret = mlx5_vdpa_virtq_stop(priv,
+   task.idx);
+   if (ret) {
+   DRV_LOG(ERR,
+   "Failed to stop virtq %d.",
+   task.idx);
+   __atomic_fetch_add(
+   task.err_cnt, 1,
+   __ATOMIC_RELAXED);
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
+   }
+   ret = rte_vhost_get_negotiated_features(
+   priv->vid, &features);
+   if (ret) {
+   DRV_LOG(ERR,
+   "Failed to get negotiated features virtq %d.",
+   task.idx);
+   __atomic_fetch_add(
+   task.err_cnt, 1,
+   __ATOMIC_RELAXED);
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
+   }
+   if (RTE_VHOST_NEED_LOG(features))
+   rte_vhost_log_used_vring(
+   priv->vid, task.idx, 0,
+   MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
default:
DRV_LOG(ERR, "Invalid vdpa task type %d.",
task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index efebf364d0..c2e78218ca 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -89,39 +89,90 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, 
uint64_t log_base,
return -1;
 }
 
-#define MLX5_VDPA_USED_RING_LEN(size) \
-   ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
-
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+   uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+   uint32_t i, thrd_idx, data[1];
struct mlx5_vdpa_virtq *virtq;
uint64_t features;
-   int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
-   int i;
+   int ret;
 
+   ret = rte_vhost_get_negotiated_features(priv->vid, &features);
if (ret) {
DRV_LOG(ERR, "Failed to get negotiated features.");
return -1;
}
-   if (!RTE_VHOST_NEED_LOG(features))
-   return 0;
-   for (i = 0; i < priv->nr_virtqs; ++i) {
-   virtq = &priv->virtqs[i];
-   if (!priv->virtqs[i].virtq) {
-   DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
-   } else {
+   if (priv->use_c_thread && priv->nr_virtqs) {
+   uint32_t main_task_idx[priv->nr_virtqs];
+
+   for (i = 0; i < p

[PATCH v1 15/17] vdpa/mlx5: add device close task

2022-06-06 Thread Li Zhang
Split the virtq device close tasks, after
stopping the virt-queues, between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 56 +--
 drivers/vdpa/mlx5/mlx5_vdpa.h |  8 
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 14 +++
 4 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e3b32fa087..d000854c08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -245,7 +245,7 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
-static void
+void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
/* Clean pre-created resource in dev removal only. */
@@ -254,6 +254,26 @@ mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
mlx5_vdpa_mem_dereg(priv);
 }
 
+static bool
+mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
+{
+   uint32_t timeout = 0;
+
+   /* Check and wait all close tasks done. */
+   while (__atomic_load_n(&priv->dev_close_progress,
+   __ATOMIC_RELAXED) != 0 && timeout < 1000) {
+   rte_delay_us_sleep(1);
+   timeout++;
+   }
+   if (priv->dev_close_progress) {
+   DRV_LOG(ERR,
+   "Failed to wait close device tasks done vid %d.",
+   priv->vid);
+   return true;
+   }
+   return false;
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -271,6 +291,27 @@ mlx5_vdpa_dev_close(int vid)
ret |= mlx5_vdpa_lm_log(priv);
priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
}
+   if (priv->use_c_thread) {
+   if (priv->last_c_thrd_idx >=
+   (conf_thread_mng.max_thrds - 1))
+   priv->last_c_thrd_idx = 0;
+   else
+   priv->last_c_thrd_idx++;
+   __atomic_store_n(&priv->dev_close_progress,
+   1, __ATOMIC_RELAXED);
+   if (mlx5_vdpa_task_add(priv,
+   priv->last_c_thrd_idx,
+   MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+   NULL, NULL, NULL, 1)) {
+   DRV_LOG(ERR,
+   "Fail to add dev close task. ");
+   goto single_thrd;
+   }
+   priv->state = MLX5_VDPA_STATE_PROBED;
+   DRV_LOG(INFO, "vDPA device %d was closed.", vid);
+   return ret;
+   }
+single_thrd:
pthread_mutex_lock(&priv->steer_update_lock);
mlx5_vdpa_steer_unset(priv);
pthread_mutex_unlock(&priv->steer_update_lock);
@@ -278,10 +319,12 @@ mlx5_vdpa_dev_close(int vid)
mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-   priv->state = MLX5_VDPA_STATE_PROBED;
if (!priv->connected)
mlx5_vdpa_dev_cache_clean(priv);
priv->vid = 0;
+   __atomic_store_n(&priv->dev_close_progress, 0,
+   __ATOMIC_RELAXED);
+   priv->state = MLX5_VDPA_STATE_PROBED;
DRV_LOG(INFO, "vDPA device %d was closed.", vid);
return ret;
 }
@@ -302,6 +345,8 @@ mlx5_vdpa_dev_config(int vid)
DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
return -1;
}
+   if (mlx5_vdpa_wait_dev_close_tasks_done(priv))
+   return -1;
priv->vid = vid;
priv->connected = true;
if (mlx5_vdpa_mtu_set(priv))
@@ -444,8 +489,11 @@ mlx5_vdpa_dev_cleanup(int vid)
DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
return -1;
}
-   if (priv->state == MLX5_VDPA_STATE_PROBED)
+   if (priv->state == MLX5_VDPA_STATE_PROBED) {
+   if (priv->use_c_thread)
+   mlx5_vdpa_wait_dev_close_tasks_done(priv);
mlx5_vdpa_dev_cache_clean(priv);
+   }
priv->connected = false;
return 0;
 }
@@ -839,6 +887,8 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
mlx5_vdpa_dev_close(priv->vid);
+   if (priv->use_c_thread)
+   mlx5_vdpa_wait_dev_close_tasks_done(priv);
mlx5_vdpa_release_dev_resources(priv);
if (priv->vdev)
rte_vdpa_unregister_device(priv->vdev);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e08931719f..b6392b9d66 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -84,6 +84,7 @@ enum mlx5_vdpa_task_type {
MLX5_VDPA_TASK_REG_MR = 1,
MLX5_VDPA_TASK_SETUP_VIRTQ,
MLX5_VDPA_TASK_STOP_VIRTQ,
+   MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
 };
 

[PATCH 15/16] vdpa/mlx5: add virtq sub-resources creation

2022-06-06 Thread Li Zhang
Pre-create the virt-queue sub-resources in the device probe stage
and then modify the virtqueue in the device config stage.
The steering table also needs to support a dummy virt-queue.
This accelerates the LM process and reduces its time by 40%.

Signed-off-by: Li Zhang 
Signed-off-by: Yajun Wu 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   | 72 +++--
 drivers/vdpa/mlx5/mlx5_vdpa.h   | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 11 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 +
 5 files changed, 123 insertions(+), 93 deletions(-)
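
Conceptually the change moves from "create everything at guest config time" to
the two-stage flow sketched below. All names here are illustrative placeholders
rather than driver APIs: heavy FW objects are pre-created once at probe, and
guest configuration only issues cheap virtq modify commands.

#include <stdint.h>
#include <stdbool.h>

struct dev {
	uint32_t nb_prealloc_queues; /* from the "queues" devarg */
	uint32_t nb_active_queues;   /* negotiated with the guest */
};

/* Illustrative placeholders for the FW interactions. */
int prepare_virtq_resources(struct dev *d, uint32_t q);
int steer_update(struct dev *d, bool dummy);
int modify_virtq(struct dev *d, uint32_t q);

static int
probe_stage(struct dev *d)
{
	uint32_t q;

	/* Event QP, counters, umems and a dummy virtq per queue. */
	for (q = 0; q < d->nb_prealloc_queues; q++)
		if (prepare_virtq_resources(d, q) != 0)
			return -1;
	/* Steering can already reference the dummy virtqs. */
	return steer_update(d, true);
}

static int
config_stage(struct dev *d)
{
	uint32_t q;

	/* Only a MODIFY_VIRTQ per queue instead of full creation. */
	for (q = 0; q < d->nb_active_queues; q++)
		if (modify_virtq(d, q) != 0)
			return -1;
	return 0;
}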

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index d000854c08..f006a9cd3f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -627,65 +627,39 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-   struct mlx5_vdpa_virtq *virtq;
+   uint32_t max_queues;
uint32_t index;
-   uint32_t i;
+   struct mlx5_vdpa_virtq *virtq;
 
-   for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+   for (index = 0; index < priv->caps.max_num_virtio_queues;
index++) {
virtq = &priv->virtqs[index];
pthread_mutex_init(&virtq->virtq_lock, NULL);
}
-   if (!priv->queues)
+   if (!priv->queues || !priv->queue_size)
return 0;
-   for (index = 0; index < (priv->queues * 2); ++index) {
+   max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
+   (priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+   for (index = 0; index < max_queues; ++index)
+   if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+   index))
+   goto error;
+   if (mlx5_vdpa_is_modify_virtq_supported(priv))
+   if (mlx5_vdpa_steer_update(priv, true))
+   goto error;
+   return 0;
+error:
+   for (index = 0; index < max_queues; ++index) {
virtq = &priv->virtqs[index];
-   int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-   -1, virtq);
-
-   if (ret) {
-   DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-   index);
-   return -1;
-   }
-   if (priv->caps.queue_counters_valid) {
-   if (!virtq->counters)
-   virtq->counters =
-   mlx5_devx_cmd_create_virtio_q_counters
-   (priv->cdev->ctx);
-   if (!virtq->counters) {
-   DRV_LOG(ERR, "Failed to create virtq couners 
for virtq"
-   " %d.", index);
-   return -1;
-   }
-   }
-   for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-   uint32_t size;
-   void *buf;
-   struct mlx5dv_devx_umem *obj;
-
-   size = priv->caps.umems[i].a * priv->queue_size +
-   priv->caps.umems[i].b;
-   buf = rte_zmalloc(__func__, size, 4096);
-   if (buf == NULL) {
-   DRV_LOG(ERR, "Cannot allocate umem %d memory 
for virtq"
-   " %u.", i, index);
-   return -1;
-   }
-   obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
-   size, IBV_ACCESS_LOCAL_WRITE);
-   if (obj == NULL) {
-   rte_free(buf);
-   DRV_LOG(ERR, "Failed to register umem %d for 
virtq %u.",
-   i, index);
-   return -1;
-   }
-   virtq->umems[i].size = size;
-   virtq->umems[i].buf = buf;
-   virtq->umems[i].obj = obj;
+   if (virtq->virtq) {
+   pthread_mutex_lock(&virtq->virtq_lock);
+   mlx5_vdpa_virtq_unset(virtq);
+   pthread_mutex_unlock(&virtq->virtq_lock);
}
}
-   return 0;
+   if (mlx5_vdpa_is_modify_virtq_supported(priv))
+   mlx5_vdpa_steer_unset(priv);
+   return -1;
 }
 
 static int
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index b6392b9d66..f353db62ac 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -277,13 +277,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  

[PATCH 16/16] vdpa/mlx5: prepare virtqueue resource creation

2022-06-06 Thread Li Zhang
Split the virt-queue resource creation between
the configuration threads.
The virt-queue resources also need to be pre-created again
after virtq destruction.
This accelerates the LM process and reduces its time by 30%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 115 --
 drivers/vdpa/mlx5/mlx5_vdpa.h |  12 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  15 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 111 +
 4 files changed, 208 insertions(+), 45 deletions(-)
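
The splitting itself follows the pattern sketched below; the helper names and
the exact worker-index mapping are illustrative, the driver's real loop being
in the mlx5_vdpa_virtq_resource_prepare() hunk further down. Queues are spread
round-robin over the workers, the caller keeps every (max_thrds + 1)-th queue
for itself, and everything joins before steering references the new virtqs.

#include <stdint.h>

/* Illustrative placeholders. */
int post_prepare_task(uint32_t thrd, uint32_t q,
		      uint32_t *remaining, uint32_t *errors);
int prepare_one_virtq(uint32_t q);
int wait_tasks_done(uint32_t *remaining, uint32_t *errors);

static int
prepare_virtqs_mt(uint32_t max_queues, uint32_t max_thrds,
		  uint32_t *remaining, uint32_t *errors)
{
	uint32_t own[max_queues]; /* queues the caller prepares itself */
	uint32_t own_num = 0, q, thrd;

	if (max_queues == 0)
		return 0;
	for (q = 0; q < max_queues; q++) {
		thrd = q % (max_thrds + 1);
		if (thrd == 0) {
			own[own_num++] = q;
			continue;
		}
		if (post_prepare_task(thrd - 1, q, remaining, errors) != 0)
			return -1;
	}
	/* The caller contributes its own share while the workers run. */
	for (q = 0; q < own_num; q++)
		if (prepare_one_virtq(own[q]) != 0)
			return -1;
	/* Join before steering references the new virtqs. */
	return wait_tasks_done(remaining, errors);
}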

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f006a9cd3f..c5d82872c7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -275,23 +275,18 @@ mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv 
*priv)
 }
 
 static int
-mlx5_vdpa_dev_close(int vid)
+_internal_mlx5_vdpa_dev_close(struct mlx5_vdpa_priv *priv,
+   bool release_resource)
 {
-   struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
-   struct mlx5_vdpa_priv *priv =
-   mlx5_vdpa_find_priv_resource_by_vdev(vdev);
int ret = 0;
+   int vid = priv->vid;
 
-   if (priv == NULL) {
-   DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
-   return -1;
-   }
mlx5_vdpa_cqe_event_unset(priv);
if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
ret |= mlx5_vdpa_lm_log(priv);
priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
}
-   if (priv->use_c_thread) {
+   if (priv->use_c_thread && !release_resource) {
if (priv->last_c_thrd_idx >=
(conf_thread_mng.max_thrds - 1))
priv->last_c_thrd_idx = 0;
@@ -315,7 +310,7 @@ mlx5_vdpa_dev_close(int vid)
pthread_mutex_lock(&priv->steer_update_lock);
mlx5_vdpa_steer_unset(priv);
pthread_mutex_unlock(&priv->steer_update_lock);
-   mlx5_vdpa_virtqs_release(priv);
+   mlx5_vdpa_virtqs_release(priv, release_resource);
mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
@@ -329,6 +324,24 @@ mlx5_vdpa_dev_close(int vid)
return ret;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+   struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
+   struct mlx5_vdpa_priv *priv;
+
+   if (!vdev) {
+   DRV_LOG(ERR, "Invalid vDPA device.");
+   return -1;
+   }
+   priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+   if (priv == NULL) {
+   DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
+   return -1;
+   }
+   return _internal_mlx5_vdpa_dev_close(priv, false);
+}
+
 static int
 mlx5_vdpa_dev_config(int vid)
 {
@@ -624,11 +637,33 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
priv->queue_size);
 }
 
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv)
+{
+   uint32_t max_queues, index;
+   struct mlx5_vdpa_virtq *virtq;
+
+   if (!priv->queues || !priv->queue_size)
+   return;
+   max_queues = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+   (priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+   if (mlx5_vdpa_is_modify_virtq_supported(priv))
+   mlx5_vdpa_steer_unset(priv);
+   for (index = 0; index < max_queues; ++index) {
+   virtq = &priv->virtqs[index];
+   if (virtq->virtq) {
+   pthread_mutex_lock(&virtq->virtq_lock);
+   mlx5_vdpa_virtq_unset(virtq);
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   }
+   }
+}
+
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-   uint32_t max_queues;
-   uint32_t index;
+   uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+   uint32_t max_queues, index, thrd_idx, data[1];
struct mlx5_vdpa_virtq *virtq;
 
for (index = 0; index < priv->caps.max_num_virtio_queues;
@@ -640,25 +675,53 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv 
*priv)
return 0;
max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
-   for (index = 0; index < max_queues; ++index)
-   if (mlx5_vdpa_virtq_single_resource_prepare(priv,
-   index))
+   if (priv->use_c_thread) {
+   uint32_t main_task_idx[max_queues];
+
+   for (index = 0; index < max_queues; ++index) {
+   thrd_idx = index % (conf_thread_mng.max_thrds + 1);
+   if (!thrd_idx) {
+   main_task_idx[task_num] = index;
+   task_num++;
+   continue;
+   }
+

[PATCH v1 16/17] vdpa/mlx5: add virtq sub-resources creation

2022-06-06 Thread Li Zhang
Pre-create the virt-queue sub-resources in the device probe stage
and then modify the virtqueue in the device config stage.
The steering table also needs to support a dummy virt-queue.
This accelerates the LM process and reduces its time by 40%.

Signed-off-by: Li Zhang 
Signed-off-by: Yajun Wu 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   | 72 +++--
 drivers/vdpa/mlx5/mlx5_vdpa.h   | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 11 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 +
 5 files changed, 123 insertions(+), 93 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index d000854c08..f006a9cd3f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -627,65 +627,39 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-   struct mlx5_vdpa_virtq *virtq;
+   uint32_t max_queues;
uint32_t index;
-   uint32_t i;
+   struct mlx5_vdpa_virtq *virtq;
 
-   for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+   for (index = 0; index < priv->caps.max_num_virtio_queues;
index++) {
virtq = &priv->virtqs[index];
pthread_mutex_init(&virtq->virtq_lock, NULL);
}
-   if (!priv->queues)
+   if (!priv->queues || !priv->queue_size)
return 0;
-   for (index = 0; index < (priv->queues * 2); ++index) {
+   max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
+   (priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+   for (index = 0; index < max_queues; ++index)
+   if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+   index))
+   goto error;
+   if (mlx5_vdpa_is_modify_virtq_supported(priv))
+   if (mlx5_vdpa_steer_update(priv, true))
+   goto error;
+   return 0;
+error:
+   for (index = 0; index < max_queues; ++index) {
virtq = &priv->virtqs[index];
-   int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-   -1, virtq);
-
-   if (ret) {
-   DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-   index);
-   return -1;
-   }
-   if (priv->caps.queue_counters_valid) {
-   if (!virtq->counters)
-   virtq->counters =
-   mlx5_devx_cmd_create_virtio_q_counters
-   (priv->cdev->ctx);
-   if (!virtq->counters) {
-   DRV_LOG(ERR, "Failed to create virtq couners 
for virtq"
-   " %d.", index);
-   return -1;
-   }
-   }
-   for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-   uint32_t size;
-   void *buf;
-   struct mlx5dv_devx_umem *obj;
-
-   size = priv->caps.umems[i].a * priv->queue_size +
-   priv->caps.umems[i].b;
-   buf = rte_zmalloc(__func__, size, 4096);
-   if (buf == NULL) {
-   DRV_LOG(ERR, "Cannot allocate umem %d memory 
for virtq"
-   " %u.", i, index);
-   return -1;
-   }
-   obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
-   size, IBV_ACCESS_LOCAL_WRITE);
-   if (obj == NULL) {
-   rte_free(buf);
-   DRV_LOG(ERR, "Failed to register umem %d for 
virtq %u.",
-   i, index);
-   return -1;
-   }
-   virtq->umems[i].size = size;
-   virtq->umems[i].buf = buf;
-   virtq->umems[i].obj = obj;
+   if (virtq->virtq) {
+   pthread_mutex_lock(&virtq->virtq_lock);
+   mlx5_vdpa_virtq_unset(virtq);
+   pthread_mutex_unlock(&virtq->virtq_lock);
}
}
-   return 0;
+   if (mlx5_vdpa_is_modify_virtq_supported(priv))
+   mlx5_vdpa_steer_unset(priv);
+   return -1;
 }
 
 static int
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index b6392b9d66..f353db62ac 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -277,13 +277,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  

[PATCH v1 17/17] vdpa/mlx5: prepare virtqueue resource creation

2022-06-06 Thread Li Zhang
Split the virt-queue resource creation between
the configuration threads.
The virt-queue resources also need to be pre-created again
after virtq destruction.
This accelerates the LM process and reduces its time by 30%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 115 --
 drivers/vdpa/mlx5/mlx5_vdpa.h |  12 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  15 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 111 +
 4 files changed, 208 insertions(+), 45 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f006a9cd3f..c5d82872c7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -275,23 +275,18 @@ mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv 
*priv)
 }
 
 static int
-mlx5_vdpa_dev_close(int vid)
+_internal_mlx5_vdpa_dev_close(struct mlx5_vdpa_priv *priv,
+   bool release_resource)
 {
-   struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
-   struct mlx5_vdpa_priv *priv =
-   mlx5_vdpa_find_priv_resource_by_vdev(vdev);
int ret = 0;
+   int vid = priv->vid;
 
-   if (priv == NULL) {
-   DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
-   return -1;
-   }
mlx5_vdpa_cqe_event_unset(priv);
if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
ret |= mlx5_vdpa_lm_log(priv);
priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
}
-   if (priv->use_c_thread) {
+   if (priv->use_c_thread && !release_resource) {
if (priv->last_c_thrd_idx >=
(conf_thread_mng.max_thrds - 1))
priv->last_c_thrd_idx = 0;
@@ -315,7 +310,7 @@ mlx5_vdpa_dev_close(int vid)
pthread_mutex_lock(&priv->steer_update_lock);
mlx5_vdpa_steer_unset(priv);
pthread_mutex_unlock(&priv->steer_update_lock);
-   mlx5_vdpa_virtqs_release(priv);
+   mlx5_vdpa_virtqs_release(priv, release_resource);
mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
@@ -329,6 +324,24 @@ mlx5_vdpa_dev_close(int vid)
return ret;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+   struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
+   struct mlx5_vdpa_priv *priv;
+
+   if (!vdev) {
+   DRV_LOG(ERR, "Invalid vDPA device.");
+   return -1;
+   }
+   priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+   if (priv == NULL) {
+   DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
+   return -1;
+   }
+   return _internal_mlx5_vdpa_dev_close(priv, false);
+}
+
 static int
 mlx5_vdpa_dev_config(int vid)
 {
@@ -624,11 +637,33 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
priv->queue_size);
 }
 
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv)
+{
+   uint32_t max_queues, index;
+   struct mlx5_vdpa_virtq *virtq;
+
+   if (!priv->queues || !priv->queue_size)
+   return;
+   max_queues = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+   (priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+   if (mlx5_vdpa_is_modify_virtq_supported(priv))
+   mlx5_vdpa_steer_unset(priv);
+   for (index = 0; index < max_queues; ++index) {
+   virtq = &priv->virtqs[index];
+   if (virtq->virtq) {
+   pthread_mutex_lock(&virtq->virtq_lock);
+   mlx5_vdpa_virtq_unset(virtq);
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   }
+   }
+}
+
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-   uint32_t max_queues;
-   uint32_t index;
+   uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+   uint32_t max_queues, index, thrd_idx, data[1];
struct mlx5_vdpa_virtq *virtq;
 
for (index = 0; index < priv->caps.max_num_virtio_queues;
@@ -640,25 +675,53 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv 
*priv)
return 0;
max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
-   for (index = 0; index < max_queues; ++index)
-   if (mlx5_vdpa_virtq_single_resource_prepare(priv,
-   index))
+   if (priv->use_c_thread) {
+   uint32_t main_task_idx[max_queues];
+
+   for (index = 0; index < max_queues; ++index) {
+   thrd_idx = index % (conf_thread_mng.max_thrds + 1);
+   if (!thrd_idx) {
+   main_task_idx[task_num] = index;
+   task_num++;
+   continue;
+   }
+

Re: [PATCH v1 5/5] examples/l3fwd: enable direct rearm mode

2022-06-06 Thread Konstantin Ananyev

31/05/2022 18:14, Honnappa Nagarahalli wrote:




25/05/2022 01:24, Honnappa Nagarahalli wrote:

From: Konstantin Ananyev 

20/04/2022 09:16, Feifei Wang wrote:

Enable direct rearm mode. The mapping is decided in the data plane
based on the first packet received.

Suggested-by: Honnappa Nagarahalli 
Signed-off-by: Feifei Wang 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Honnappa Nagarahalli 
---
   examples/l3fwd/l3fwd_lpm.c | 16 +++-
   1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index bec22c44cd..38ffdf4636 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -147,7 +147,7 @@ lpm_main_loop(__rte_unused void *dummy)
   unsigned lcore_id;
   uint64_t prev_tsc, diff_tsc, cur_tsc;
   int i, nb_rx;
-    uint16_t portid;
+    uint16_t portid, tx_portid;
   uint8_t queueid;
   struct lcore_conf *qconf;
   const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) /
@@ -158,6 +158,8 @@ lpm_main_loop(__rte_unused void *dummy)
   const uint16_t n_rx_q = qconf->n_rx_queue;
   const uint16_t n_tx_p = qconf->n_tx_port;
+    int direct_rearm_map[n_rx_q];
+
   if (n_rx_q == 0) {
   RTE_LOG(INFO, L3FWD, "lcore %u has nothing to do\n",
lcore_id);
   return 0;
@@ -169,6 +171,7 @@ lpm_main_loop(__rte_unused void *dummy)
   portid = qconf->rx_queue_list[i].port_id;
   queueid = qconf->rx_queue_list[i].queue_id;
+    direct_rearm_map[i] = 0;
   RTE_LOG(INFO, L3FWD,
   " -- lcoreid=%u portid=%u rxqueueid=%hhu\n",
   lcore_id, portid, queueid); @@ -209,6 +212,17 @@
lpm_main_loop(__rte_unused void *dummy)
   if (nb_rx == 0)
   continue;
+    /* Determine the direct rearm mapping based on the
+first
+ * packet received on the rx queue
+ */
+    if (direct_rearm_map[i] == 0) {
+    tx_portid = lpm_get_dst_port(qconf, pkts_burst[0],
+    portid);
+    rte_eth_direct_rxrearm_map(portid, queueid,
+    tx_portid, queueid);
+    direct_rearm_map[i] = 1;
+    }
+



That just doesn't look right to me: why to make decision based on the
first packet?

The TX queue depends on the incoming packet. So, this method covers
more scenarios than doing it in the control plane where the outgoing
queue is not known.



What would happen if second and all other packets have to be routed
to different ports?

This is an example application and it should be fine to make this
assumption.
More over, it does not cause any problems if packets change in between.
When
the packets change back, the feature works again.


In fact, this direct-rearm mode seems suitable only for hard-coded
one to one mapped forwarding (examples/l2fwd, testpmd).
For l3fwd it can be used safely only when we have one port in use.

Can you elaborate more on the safety issue when more than one port is

used?



Also I think it should be selected at init-time and it shouldn't be
on by default.
To summarize, my opinion:
special cmd-line parameter to enable it.

Can you please elaborate why a command line parameter is required?
Other similar features like RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE are
enabled without a command line parameter. IMO, this is how it should
be. Essentially we are trying to measure how different PMDs perform,
the ones that have implemented performance improvement features

would

show better performance (i.e. the PMDs implementing the features
should not be penalized by asking for additional user input).


  From my perspective, main purpose of l3fwd application is to demonstrate
DPDK ability to do packet routing based on input packet contents.
Making guesses about packet contents is a change in expected behavior.
For some cases it might improve performance, for many others - will most
likely cause performance drop.
I think that performance drop as default behavior (running the same
parameters as before) should not be allowed.
Plus you did not provided ability to switch off that behavior, if undesired.

There is no drop in L3fwd performance due to this patch.


Hmm..
Are you saying even when your guess is wrong, and you constantly hitting
slow-path (check tx_queue first - failure, then allocate from mempool)
you didn't observe any performance drop?
There is more work to do, and if workload is cpu-bound,
my guess - it should be noticeable.
Also, from previous experience, quite often even after
tiny changes in rx/tx code-path some slowdown was reported.
Usually that happened on some low-end ARM cpus (Marvell, NXP).




About comparison with RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE default
enablement - I don't think it is correct.
Within l3fwd app we can safely guarantee that all
RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE pre-requirements are met:
in each TX queue all mbufs will belong to the same mempool and th

Re: [PATCH v2] net/igc: add I226 support

2022-06-06 Thread Thomas Monjalon
06/06/2022 12:54, Zhang, Qi Z:
> 
> > -Original Message-
> > From: Thomas Monjalon 
> > Sent: Monday, June 6, 2022 6:49 PM
> > To: Yang, Qiming ; Zhang, Qi Z
> > 
> > Cc: dev@dpdk.org; Liu, KevinX 
> > Subject: Re: [PATCH v2] net/igc: add I226 support
> > 
> > 06/06/2022 01:12, Zhang, Qi Z:
> > >
> > > > -Original Message-
> > > > From: Thomas Monjalon 
> > > > Sent: Monday, June 6, 2022 12:42 AM
> > > > To: Zhang, Qi Z ; Yang, Qiming
> > > > 
> > > > Cc: dev@dpdk.org; Liu, KevinX 
> > > > Subject: Re: [PATCH v2] net/igc: add I226 support
> > > >
> > > > 25/05/2022 07:57, Qiming Yang:
> > > > > Added I226 Series device ID in igc driver and updated igc guide
> > > > > document for new devices.
> > > > >
> > > > > Signed-off-by: Qiming Yang 
> > > > > Signed-off-by: Kevin Liu 
> > > > > ---
> > > > > v2:
> > > > > * rebased
> > > > > ---
> > > > >  doc/guides/nics/igc.rst| 14 +++---
> > > > >  doc/guides/rel_notes/release_22_03.rst |  5 +
> > > >
> > > > You are sending a patch after 22.03 is closed, so it should be listed in
> > 22.07!
> > > >
> > > > I will fix while pulling the tree prepared by Qi.
> > > > Please be more careful with the basic checks.
> > >
> > > Thanks for capture this, have dropped this patch in dpdk-next-net-intel.
> > > A new version is required.
> > 
> > Too late, it is in the main tree with release notes fixed.
> > Do you need more fix?
> 
> OK, I guess we need to revert it with a new fix.
> Sorry for the chaos...

Why revert? If there is a bug, just fix it.





[PATCH v1 00/17] Add vDPA multi-threads optimization

2022-06-06 Thread Li Zhang
Allow the driver to use internal threads to
speed up configuration.
All the threads are opened on the same core as
the event completion queue scheduling thread.

Add a max_conf_threads parameter to configure
the maximum number of internal threads in addition to
the caller thread (8 is suggested).
These internal threads pipeline the vDPA tasks
in the system and are shared by all vDPA devices.
The default is 0, i.e. internal threads are not used for configuration.
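
For illustration only (the PCI address and the application name below
are placeholders; the devargs are the ones documented in this series),
a probe line combining the new options could look like:

  <vdpa-app> -a 0000:3b:00.2,class=vdpa,event_core=2,max_conf_threads=8,queues=8,queue_size=256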

Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
http://patchwork.dpdk.org/project/dpdk/list/?series=21868

RFC ("Add vDPA multi-threads optimization")
https://patchwork.dpdk.org/project/dpdk/cover/20220408075606.33056-1-l...@nvidia.com/

Li Zhang (12):
  vdpa/mlx5: fix usage of capability for max number of virtqs
  common/mlx5: extend virtq modifiable fields
  vdpa/mlx5: pre-create virtq in the probe
  vdpa/mlx5: optimize datapath-control synchronization
  vdpa/mlx5: add multi-thread management for configuration
  vdpa/mlx5: add task ring for MT management
  vdpa/mlx5: add MT task for VM memory registration
  vdpa/mlx5: add virtq creation task for MT management
  vdpa/mlx5: add virtq LM log task
  vdpa/mlx5: add device close task
  vdpa/mlx5: add virtq sub-resources creation
  vdpa/mlx5: prepare virtqueue resource creation

Yajun Wu (5):
  eal: add device removal in rte cleanup
  examples/vdpa: fix devices cleanup
  vdpa/mlx5: support pre create virtq resource
  common/mlx5: add DevX API to move QP to reset state
  vdpa/mlx5: support event qp reuse

 doc/guides/vdpadevs/mlx5.rst  |  25 +
 drivers/common/mlx5/mlx5_devx_cmds.c  |  77 ++-
 drivers/common/mlx5/mlx5_devx_cmds.h  |   6 +-
 drivers/common/mlx5/mlx5_prm.h|  30 +-
 drivers/vdpa/mlx5/meson.build |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c | 270 +--
 drivers/vdpa/mlx5/mlx5_vdpa.h | 152 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 360 ++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   | 160 +--
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c  | 128 -
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c | 270 +++
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c   |  22 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 654 +++---
 examples/vdpa/main.c  |   5 +-
 lib/eal/freebsd/eal.c |  33 ++
 lib/eal/include/rte_dev.h |   6 +
 lib/eal/linux/eal.c   |  33 ++
 lib/eal/windows/eal.c |  33 ++
 18 files changed, 1878 insertions(+), 387 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

-- 
2.31.1



[PATCH v1 01/17] vdpa/mlx5: fix usage of capability for max number of virtqs

2022-06-06 Thread Li Zhang
The driver wrongly takes the capability value for
the number of virtq pairs instead of just the number of virtqs.

Adjust all the usages of it to be the number of virtqs.

Fixes: c2eb33a ("vdpa/mlx5: manage virtqs by array")
Cc: sta...@dpdk.org

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   | 12 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 76fa5d4299..ee71339b78 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -84,7 +84,7 @@ mlx5_vdpa_get_queue_num(struct rte_vdpa_device *vdev, 
uint32_t *queue_num)
DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
return -1;
}
-   *queue_num = priv->caps.max_num_virtio_queues;
+   *queue_num = priv->caps.max_num_virtio_queues / 2;
return 0;
 }
 
@@ -141,7 +141,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
return -EINVAL;
}
-   if (vring >= (int)priv->caps.max_num_virtio_queues * 2) {
+   if (vring >= (int)priv->caps.max_num_virtio_queues) {
DRV_LOG(ERR, "Too big vring id: %d.", vring);
return -E2BIG;
}
@@ -388,7 +388,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
return -ENODEV;
}
-   if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+   if (qid >= (int)priv->caps.max_num_virtio_queues) {
DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
vdev->device->name);
return -E2BIG;
@@ -411,7 +411,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
return -ENODEV;
}
-   if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+   if (qid >= (int)priv->caps.max_num_virtio_queues) {
DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
vdev->device->name);
return -E2BIG;
@@ -624,7 +624,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
DRV_LOG(DEBUG, "No capability to support virtq statistics.");
priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv) +
   sizeof(struct mlx5_vdpa_virtq) *
-  attr->vdpa.max_num_virtio_queues * 2,
+  attr->vdpa.max_num_virtio_queues,
   RTE_CACHE_LINE_SIZE);
if (!priv) {
DRV_LOG(ERR, "Failed to allocate private memory.");
@@ -685,7 +685,7 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
uint32_t i;
 
mlx5_vdpa_dev_cache_clean(priv);
-   for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+   for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
if (!priv->virtqs[i].counters)
continue;
claim_zero(mlx5_devx_cmd_destroy(priv->virtqs[i].counters));
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index e025be47d2..c258eb3024 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -72,7 +72,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 {
unsigned int i, j;
 
-   for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+   for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
@@ -492,9 +492,9 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
DRV_LOG(INFO, "TSO is enabled without CSUM, force CSUM.");
priv->features |= (1ULL << VIRTIO_NET_F_CSUM);
}
-   if (nr_vring > priv->caps.max_num_virtio_queues * 2) {
+   if (nr_vring > priv->caps.max_num_virtio_queues) {
DRV_LOG(ERR, "Do not support more than %d virtqs(%d).",
-   (int)priv->caps.max_num_virtio_queues * 2,
+   (int)priv->caps.max_num_virtio_queues,
(int)nr_vring);
return -1;
}
-- 
2.31.1



[PATCH v1 02/17] eal: add device removal in rte cleanup

2022-06-06 Thread Li Zhang
From: Yajun Wu 

Add device removal to the rte_eal_cleanup() function. This is the last
chance for device remove to be called, as a sanity measure. Loop over
the vdev bus first and then over all buses for all devices, calling
rte_dev_remove().

Cc: sta...@dpdk.org

Signed-off-by: Yajun Wu 
---
 lib/eal/freebsd/eal.c | 33 +
 lib/eal/include/rte_dev.h |  6 ++
 lib/eal/linux/eal.c   | 33 +
 lib/eal/windows/eal.c | 33 +
 4 files changed, 105 insertions(+)

diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index a6b20960f2..5ffd9146b6 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -886,11 +886,44 @@ rte_eal_init(int argc, char **argv)
return fctret;
 }
 
+static int
+bus_match_all(const struct rte_bus *bus, const void *data)
+{
+   RTE_SET_USED(bus);
+   RTE_SET_USED(data);
+   return 0;
+}
+
+static void
+remove_all_device(void)
+{
+   struct rte_bus *start = NULL, *next;
+   struct rte_dev_iterator dev_iter = {0};
+   struct rte_device *dev = NULL;
+   struct rte_device *tdev = NULL;
+   char devstr[128];
+
+   RTE_DEV_FOREACH_SAFE(dev, "bus=vdev", &dev_iter, tdev) {
+   (void)rte_dev_remove(dev);
+   }
+   while ((next = rte_bus_find(start, bus_match_all, NULL)) != NULL) {
+   start = next;
+   /* Skip buses that don't have iterate method */
+   if (!next->dev_iterate || !next->name)
+   continue;
+   snprintf(devstr, sizeof(devstr), "bus=%s", next->name);
+   RTE_DEV_FOREACH_SAFE(dev, devstr, &dev_iter, tdev) {
+   (void)rte_dev_remove(dev);
+   }
+   };
+}
+
 int
 rte_eal_cleanup(void)
 {
struct internal_config *internal_conf =
eal_get_internal_configuration();
+   remove_all_device();
rte_service_finalize();
rte_mp_channel_cleanup();
/* after this point, any DPDK pointers will become dangling */
diff --git a/lib/eal/include/rte_dev.h b/lib/eal/include/rte_dev.h
index e6ff1218f9..382d548ea3 100644
--- a/lib/eal/include/rte_dev.h
+++ b/lib/eal/include/rte_dev.h
@@ -492,6 +492,12 @@ int
 rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
  size_t len);
 
+#define RTE_DEV_FOREACH_SAFE(dev, devstr, it, tdev) \
+   for (rte_dev_iterator_init(it, devstr), \
+   (dev) = rte_dev_iterator_next(it); \
+   (dev) && ((tdev) = rte_dev_iterator_next(it), 1); \
+   (dev) = (tdev))
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 1ef263434a..30b295916e 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1248,6 +1248,38 @@ mark_freeable(const struct rte_memseg_list *msl, const 
struct rte_memseg *ms,
return 0;
 }
 
+static int
+bus_match_all(const struct rte_bus *bus, const void *data)
+{
+   RTE_SET_USED(bus);
+   RTE_SET_USED(data);
+   return 0;
+}
+
+static void
+remove_all_device(void)
+{
+   struct rte_bus *start = NULL, *next;
+   struct rte_dev_iterator dev_iter = {0};
+   struct rte_device *dev = NULL;
+   struct rte_device *tdev = NULL;
+   char devstr[128];
+
+   RTE_DEV_FOREACH_SAFE(dev, "bus=vdev", &dev_iter, tdev) {
+   (void)rte_dev_remove(dev);
+   }
+   while ((next = rte_bus_find(start, bus_match_all, NULL)) != NULL) {
+   start = next;
+   /* Skip buses that don't have iterate method */
+   if (!next->dev_iterate || !next->name)
+   continue;
+   snprintf(devstr, sizeof(devstr), "bus=%s", next->name);
+   RTE_DEV_FOREACH_SAFE(dev, devstr, &dev_iter, tdev) {
+   (void)rte_dev_remove(dev);
+   }
+   };
+}
+
 int
 rte_eal_cleanup(void)
 {
@@ -1257,6 +1289,7 @@ rte_eal_cleanup(void)
struct internal_config *internal_conf =
eal_get_internal_configuration();
 
+   remove_all_device();
if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
internal_conf->hugepage_file.unlink_existing)
rte_memseg_walk(mark_freeable, NULL);
diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
index 122de2a319..3d7d411293 100644
--- a/lib/eal/windows/eal.c
+++ b/lib/eal/windows/eal.c
@@ -254,12 +254,45 @@ __rte_trace_point_register(rte_trace_point_t *trace, 
const char *name,
return -ENOTSUP;
 }
 
+static int
+bus_match_all(const struct rte_bus *bus, const void *data)
+{
+   RTE_SET_USED(bus);
+   RTE_SET_USED(data);
+   return 0;
+}
+
+static void
+remove_all_device(void)
+{
+   struct rte_bus *start = NULL, *next;
+   struct rte_dev_iterator dev_iter = {0};
+   struct rte_device *dev = NULL;
+   struct rte_device *tdev = NULL;
+   char devstr[128];
+
+   RTE_DEV_FOREACH_SAFE(dev, "b

[PATCH v1 03/17] examples/vdpa: fix devices cleanup

2022-06-06 Thread Li Zhang
From: Yajun Wu 

Move rte_eal_cleanup() into the vdpa_sample_quit() function, which
handles all the example app quit paths.
Otherwise rte_eal_cleanup() won't be called when a signal such as
SIGINT (Ctrl+C) is received.
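
A minimal sketch of the resulting pattern (not the exact example code;
the function names here are illustrative): the quit helper is the
single place that ends with rte_eal_cleanup(), and the signal handler
funnels into it:

#include <signal.h>
#include <stdlib.h>
#include <rte_eal.h>

static void
sample_quit(void)
{
	/* close vDPA ports / free application resources here ... */
	rte_eal_cleanup(); /* now reached on normal exit and on signals */
}

static void
signal_handler(int signum)
{
	if (signum == SIGINT || signum == SIGTERM) {
		sample_quit();
		exit(0);
	}
}

int
main(int argc, char **argv)
{
	if (rte_eal_init(argc, argv) < 0)
		return -1;
	signal(SIGINT, signal_handler);
	signal(SIGTERM, signal_handler);
	/* ... main work loop ... */
	sample_quit();
	return 0;
}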

Fixes: 10aa3757 ("examples: add eal cleanup to examples")
Cc: sta...@dpdk.org

Signed-off-by: Yajun Wu 
---
 examples/vdpa/main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/examples/vdpa/main.c b/examples/vdpa/main.c
index 7e11ef4e26..62e32b633d 100644
--- a/examples/vdpa/main.c
+++ b/examples/vdpa/main.c
@@ -286,6 +286,8 @@ vdpa_sample_quit(void)
if (vports[i].ifname[0] != '\0')
close_vdpa(&vports[i]);
}
+   /* clean up the EAL */
+   rte_eal_cleanup();
 }
 
 static void
@@ -632,8 +634,5 @@ main(int argc, char *argv[])
vdpa_sample_quit();
}
 
-   /* clean up the EAL */
-   rte_eal_cleanup();
-
return 0;
 }
-- 
2.31.1



[PATCH v1 04/17] vdpa/mlx5: support pre create virtq resource

2022-06-06 Thread Li Zhang
From: Yajun Wu 

The motivation of this change is to reduce vDPA device queue creation
time by creating some queue resources in the vDPA device probe stage.

In the VM live migration scenario, this can save 0.8ms for each queue
creation, thus reducing LM network downtime.

To create queue resources (umem/counter) in advance, we need to know
the virtio queue depth and the max number of queues the VM will use.

Introduce two new devargs: queues (max queue pair number) and queue_size
(queue depth). Both arguments must be provided; if only one of them is
provided, it is ignored and no pre-creation is done.

The queues and queue_size values must also be identical to the vhost
configuration the driver receives later. Otherwise the pre-created
resources are either wasted or missing, or need to be destroyed and
recreated (in case of a queue_size mismatch).

The pre-created umem/counter resources are kept alive until vDPA
device removal.

Signed-off-by: Yajun Wu 
---
 doc/guides/vdpadevs/mlx5.rst  | 14 +++
 drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++-
 drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
 3 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 3ded142311..0ad77bf535 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -101,6 +101,20 @@ for an additional list of options shared with other mlx5 
drivers.
 
   - 0, HW default.
 
+- ``queue_size`` parameter [int]
+
+  - 1 - 1024, Virio Queue depth for pre-creating queue resource to speed up
+first time queue creation. Set it together with queues devarg.
+
+  - 0, default value, no pre-create virtq resource.
+
+- ``queues`` parameter [int]
+
+  - 1 - 128, Max number of virio queue pair(including 1 rx queue and 1 tx 
queue)
+for pre-create queue resource to speed up first time queue creation. Set it
+together with queue_size devarg.
+
+  - 0, default value, no pre-create virtq resource.
 
 Error handling
 ^^
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee71339b78..faf833ee2f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -244,7 +244,9 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 static void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
-   mlx5_vdpa_virtqs_cleanup(priv);
+   /* Clean pre-created resource in dev removal only. */
+   if (!priv->queues)
+   mlx5_vdpa_virtqs_cleanup(priv);
mlx5_vdpa_mem_dereg(priv);
 }
 
@@ -494,6 +496,12 @@ mlx5_vdpa_args_check_handler(const char *key, const char 
*val, void *opaque)
priv->hw_max_latency_us = (uint32_t)tmp;
} else if (strcmp(key, "hw_max_pending_comp") == 0) {
priv->hw_max_pending_comp = (uint32_t)tmp;
+   } else if (strcmp(key, "queue_size") == 0) {
+   priv->queue_size = (uint16_t)tmp;
+   } else if (strcmp(key, "queues") == 0) {
+   priv->queues = (uint16_t)tmp;
+   } else {
+   DRV_LOG(WARNING, "Invalid key %s.", key);
}
return 0;
 }
@@ -524,9 +532,68 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
if (!priv->event_us &&
priv->event_mode == MLX5_VDPA_EVENT_MODE_DYNAMIC_TIMER)
priv->event_us = MLX5_VDPA_DEFAULT_TIMER_STEP_US;
+   if ((priv->queue_size && !priv->queues) ||
+   (!priv->queue_size && priv->queues)) {
+   priv->queue_size = 0;
+   priv->queues = 0;
+   DRV_LOG(WARNING, "Please provide both queue_size and queues.");
+   }
DRV_LOG(DEBUG, "event mode is %d.", priv->event_mode);
DRV_LOG(DEBUG, "event_us is %u us.", priv->event_us);
DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
+   DRV_LOG(DEBUG, "queues is %u, queue_size is %u.", priv->queues,
+   priv->queue_size);
+}
+
+static int
+mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
+{
+   uint32_t index;
+   uint32_t i;
+
+   if (!priv->queues)
+   return 0;
+   for (index = 0; index < (priv->queues * 2); ++index) {
+   struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+
+   if (priv->caps.queue_counters_valid) {
+   if (!virtq->counters)
+   virtq->counters =
+   mlx5_devx_cmd_create_virtio_q_counters
+   (priv->cdev->ctx);
+   if (!virtq->counters) {
+   DRV_LOG(ERR, "Failed to create virtq couners 
for virtq"
+   " %d.", index);
+   return -1;
+   }
+   }
+   for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+   uint32_t size;
+   void *buf;
+   struct mlx5dv_devx_umem *obj;
+
+

[PATCH v1 05/17] common/mlx5: add DevX API to move QP to reset state

2022-06-06 Thread Li Zhang
From: Yajun Wu 

Support set QP to RESET state.
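
As a hedged sketch of the intended use (the wrapper function is
illustrative; the command constants and mlx5_devx_cmd_modify_qp_state()
are existing DevX helpers, and "remote_qpn" is the peer QP number), a
QP can now be cycled back to RTS through RESET instead of being
destroyed and recreated:

#include "mlx5_devx_cmds.h"
#include "mlx5_prm.h"

static int
example_qp_reset_to_rts(struct mlx5_devx_obj *qp, uint32_t remote_qpn)
{
	if (mlx5_devx_cmd_modify_qp_state(qp, MLX5_CMD_OP_QP_2RST, remote_qpn))
		return -1;
	if (mlx5_devx_cmd_modify_qp_state(qp, MLX5_CMD_OP_RST2INIT_QP, remote_qpn))
		return -1;
	if (mlx5_devx_cmd_modify_qp_state(qp, MLX5_CMD_OP_INIT2RTR_QP, remote_qpn))
		return -1;
	return mlx5_devx_cmd_modify_qp_state(qp, MLX5_CMD_OP_RTR2RTS_QP, remote_qpn);
}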

Signed-off-by: Yajun Wu 
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  7 +++
 drivers/common/mlx5/mlx5_prm.h   | 17 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c 
b/drivers/common/mlx5/mlx5_devx_cmds.c
index c6bdbc12bb..1d6d6578d6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2264,11 +2264,13 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, 
uint32_t qp_st_mod_op,
uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+   uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_in)];
} in;
union {
uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+   uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_out)];
} out;
void *qpc;
int ret;
@@ -2311,6 +2313,11 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, 
uint32_t qp_st_mod_op,
inlen = sizeof(in.rtr2rts);
outlen = sizeof(out.rtr2rts);
break;
+   case MLX5_CMD_OP_QP_2RST:
+   MLX5_SET(2rst_qp_in, &in, qpn, qp->id);
+   inlen = sizeof(in.qp2rst);
+   outlen = sizeof(out.qp2rst);
+   break;
default:
DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
qp_st_mod_op);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index bc3e70a1d1..8a2f55c33e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3657,6 +3657,23 @@ struct mlx5_ifc_init2init_qp_in_bits {
u8 reserved_at_800[0x80];
 };
 
+struct mlx5_ifc_2rst_qp_out_bits {
+   u8 status[0x8];
+   u8 reserved_at_8[0x18];
+   u8 syndrome[0x20];
+   u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_2rst_qp_in_bits {
+   u8 opcode[0x10];
+   u8 uid[0x10];
+   u8 vhca_tunnel_id[0x10];
+   u8 op_mod[0x10];
+   u8 reserved_at_80[0x8];
+   u8 qpn[0x18];
+   u8 reserved_at_a0[0x20];
+};
+
 struct mlx5_ifc_dealloc_pd_out_bits {
u8 status[0x8];
u8 reserved_0[0x18];
-- 
2.31.1



[PATCH v1 06/17] vdpa/mlx5: support event qp reuse

2022-06-06 Thread Li Zhang
From: Yajun Wu 

To speed up queue creation time, the event qp and cq are created only
once. Each virtq creation reuses the same event qp and cq.

Because FW sets the event qp to the error state during virtq destroy,
the event qp needs to be modified to the RESET state and then back to
the RTS state as usual. This saves about 1.5ms for each virtq creation.

After the SW qp reset, the qp pi/ci both become 0 while the cq pi/ci
keep their previous values. Add a new variable qp_pi to save the SW qp
ci and move the qp pi independently from the cq ci.

Add a new function mlx5_vdpa_drain_cq() to drain the cq CQEs after
virtq release.

Signed-off-by: Yajun Wu 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   |  8 
 drivers/vdpa/mlx5/mlx5_vdpa.h   | 12 +-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +--
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index faf833ee2f..ee99952e11 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -269,6 +269,7 @@ mlx5_vdpa_dev_close(int vid)
}
mlx5_vdpa_steer_unset(priv);
mlx5_vdpa_virtqs_release(priv);
+   mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
priv->state = MLX5_VDPA_STATE_PROBED;
@@ -555,7 +556,14 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv 
*priv)
return 0;
for (index = 0; index < (priv->queues * 2); ++index) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+   int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
+   -1, &virtq->eqp);
 
+   if (ret) {
+   DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+   index);
+   return -1;
+   }
if (priv->caps.queue_counters_valid) {
if (!virtq->counters)
virtq->counters =
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f6719a3c60..bf82026e37 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -55,6 +55,7 @@ struct mlx5_vdpa_event_qp {
struct mlx5_vdpa_cq cq;
struct mlx5_devx_obj *fw_qp;
struct mlx5_devx_qp sw_qp;
+   uint16_t qp_pi;
 };
 
 struct mlx5_vdpa_query_mr {
@@ -226,7 +227,7 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  int callfd, struct mlx5_vdpa_event_qp *eqp);
 
 /**
@@ -479,4 +480,13 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int 
qid,
  */
 int
 mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
+
+/**
+ * Drain virtq CQ CQE.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 7167a98db0..b43dca9255 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -137,7 +137,7 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
};
uint32_t word;
} last_word;
-   uint16_t next_wqe_counter = cq->cq_ci;
+   uint16_t next_wqe_counter = eqp->qp_pi;
uint16_t cur_wqe_counter;
uint16_t comp;
 
@@ -156,9 +156,10 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
rte_io_wmb();
/* Ring CQ doorbell record. */
cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+   eqp->qp_pi += comp;
rte_io_wmb();
/* Ring SW QP doorbell record. */
-   eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size);
+   eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size);
}
return comp;
 }
@@ -232,6 +233,25 @@ mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
return max;
 }
 
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
+{
+   unsigned int i;
+
+   for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+   struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
+
+   mlx5_vdpa_queue_complete(cq);
+   if (cq->cq_obj.cq) {
+   cq->cq_obj.cqes[0].wqe_counter =
+   rte_cpu_to_be_16(UINT16_MAX);
+   priv->virtqs[i].eqp.qp_pi = 0;
+   if (!cq->armed)
+   mlx5_vdpa_cq_arm(priv, cq);
+   }
+   }
+}
+
 /* Wait on all CQs channel for completion event. */
 static st

[PATCH v1 07/17] common/mlx5: extend virtq modifiable fields

2022-06-06 Thread Li Zhang
A virtq configuration can be modified after the virtq creation.
Added the following modifiable fields:
1.address fields: desc_addr/used_addr/available_addr
2.hw_available_index
3.hw_used_index
4.virtio_q_type
5.version type
6.queue mkey
7.feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8.event mode: event_mode/event_qpn_or_msix
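
As a hedged sketch of how a caller can batch several of these
modifications in one command (the attribute and bit names are from this
patch; the wrapper function and its parameters are illustrative):

static int
example_modify_virtq(struct mlx5_devx_obj *virtq_obj, uint16_t vq_index,
		     uint64_t desc_iova, uint64_t used_iova,
		     uint64_t avail_iova, uint16_t last_avail_idx)
{
	struct mlx5_devx_virtq_attr attr = {
		.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_ADDR |
				MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
				MLX5_VIRTQ_MODIFY_TYPE_STATE,
		.queue_index = vq_index,
		.desc_addr = desc_iova,
		.used_addr = used_iova,
		.available_addr = avail_iova,
		.hw_available_index = last_avail_idx,
		.state = MLX5_VIRTQ_STATE_RDY,
	};

	/* only the fields selected in mod_fields_bitmap are applied */
	return mlx5_devx_cmd_modify_virtq(virtq_obj, &attr);
}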

Signed-off-by: Li Zhang 
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++-
 drivers/common/mlx5/mlx5_devx_cmds.h |  6 ++-
 drivers/common/mlx5/mlx5_prm.h   | 13 +-
 3 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c 
b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1d6d6578d6..1b68c37092 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -545,6 +545,15 @@ mlx5_devx_cmd_query_hca_vdpa_attr(void *ctx,
vdpa_attr->log_doorbell_stride =
MLX5_GET(virtio_emulation_cap, hcattr,
 log_doorbell_stride);
+   vdpa_attr->vnet_modify_ext =
+   MLX5_GET(virtio_emulation_cap, hcattr,
+vnet_modify_ext);
+   vdpa_attr->virtio_net_q_addr_modify =
+   MLX5_GET(virtio_emulation_cap, hcattr,
+virtio_net_q_addr_modify);
+   vdpa_attr->virtio_q_index_modify =
+   MLX5_GET(virtio_emulation_cap, hcattr,
+virtio_q_index_modify);
vdpa_attr->log_doorbell_bar_size =
MLX5_GET(virtio_emulation_cap, hcattr,
 log_doorbell_bar_size);
@@ -2074,27 +2083,66 @@ mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj 
*virtq_obj,
MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
-   MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+   MLX5_SET64(virtio_net_q, virtq, modify_field_select,
+   attr->mod_fields_bitmap);
MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
-   switch (attr->type) {
-   case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+   if (!attr->mod_fields_bitmap) {
+   DRV_LOG(ERR, "Failed to modify VIRTQ for no type set.");
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_STATE)
MLX5_SET16(virtio_net_q, virtq, state, attr->state);
-   break;
-   case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+   if (attr->mod_fields_bitmap &
+   MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS) {
MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
 attr->dirty_bitmap_mkey);
MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
 attr->dirty_bitmap_addr);
MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
 attr->dirty_bitmap_size);
-   break;
-   case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+   }
+   if (attr->mod_fields_bitmap &
+   MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE)
MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
 attr->dirty_bitmap_dump_enable);
-   break;
-   default:
-   rte_errno = EINVAL;
-   return -rte_errno;
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD) {
+   MLX5_SET(virtio_q, virtctx, queue_period_mode,
+   attr->hw_latency_mode);
+   MLX5_SET(virtio_q, virtctx, queue_period_us,
+   attr->hw_max_latency_us);
+   MLX5_SET(virtio_q, virtctx, queue_max_count,
+   attr->hw_max_pending_comp);
+   }
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_ADDR) {
+   MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+   MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+   MLX5_SET64(virtio_q, virtctx, available_addr,
+   attr->available_addr);
+   }
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX)
+   MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+  attr->hw_available_index);
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX)
+   MLX5_SET16(virtio_net_q, virtq, hw_used_index,
+   attr->hw_used_index);
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE)
+   MLX5_SET16(virtio_q, virtctx, virtio_q_type, attr->q_type);
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0)
+   MLX5_SET16(virtio_q, virtctx, virtio_version

[PATCH v1 08/17] vdpa/mlx5: pre-create virtq in the probe

2022-06-06 Thread Li Zhang
The dev_config operation is called during the LM process.
The LM time is very critical because all
the VM packets are dropped directly at that time.

Move the virtq creation to probe time and
only modify the configuration later in
the dev_config stage using the new ability
to modify virtq.

This optimization accelerates the LM process and
reduces its time by 70%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h   |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c|  13 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 257 +---
 3 files changed, 170 insertions(+), 104 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index bf82026e37..e5553079fe 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq {
uint16_t vq_size;
uint8_t notifier_state;
bool stopped;
+   uint32_t configured:1;
uint32_t version;
struct mlx5_vdpa_priv *priv;
struct mlx5_devx_obj *virtq;
@@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, 
int qid);
  */
 void
 mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index 43a2b98255..a8faf0c116 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -12,14 +12,17 @@ int
 mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 {
struct mlx5_devx_virtq_attr attr = {
-   .type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+   .mod_fields_bitmap =
+   MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
.dirty_bitmap_dump_enable = enable,
};
+   struct mlx5_vdpa_virtq *virtq;
int i;
 
for (i = 0; i < priv->nr_virtqs; ++i) {
attr.queue_index = i;
-   if (!priv->virtqs[i].virtq) {
+   virtq = &priv->virtqs[i];
+   if (!virtq->configured) {
DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
"enabling.", i);
} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
@@ -37,10 +40,11 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, 
uint64_t log_base,
   uint64_t log_size)
 {
struct mlx5_devx_virtq_attr attr = {
-   .type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+   .mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
.dirty_bitmap_addr = log_base,
.dirty_bitmap_size = log_size,
};
+   struct mlx5_vdpa_virtq *virtq;
int i;
int ret = mlx5_os_wrapped_mkey_create(priv->cdev->ctx, priv->cdev->pd,
  priv->cdev->pdn,
@@ -54,7 +58,8 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, 
uint64_t log_base,
attr.dirty_bitmap_mkey = priv->lm_mr.lkey;
for (i = 0; i < priv->nr_virtqs; ++i) {
attr.queue_index = i;
-   if (!priv->virtqs[i].virtq) {
+   virtq = &priv->virtqs[i];
+   if (!virtq->configured) {
DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
  &attr)) {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6637ba1503..55cbc9fad2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -75,6 +75,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+   virtq->configured = 0;
for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
if (virtq->umems[j].obj) {
claim_zero(mlx5_glue->devx_umem_dereg
@@ -111,11 +112,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
rte_intr_fd_set(virtq->intr_handle, -1);
}
rte_intr_instance_free(virtq->intr_handle);
-   if (virtq->virtq) {
+   if (virtq->configured) {
ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
if (ret)
DRV_LOG(WARNING, "Failed to stop virtq %d.",
virtq->index);
+   virtq->configured = 0;
claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
}
virtq->virtq = NULL;
@@ -138,7 +140,7 @@ int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
struct mlx5_devx_virtq_attr attr = {
-   .type = MLX5_VIRTQ_MODIF

[PATCH v1 09/17] vdpa/mlx5: optimize datapath-control synchronization

2022-06-06 Thread Li Zhang
The driver used a single global lock for any synchronization
needed for the datapath and control path.
It is better to group the critical sections with
the other ones that should be synchronized.

Replace the global lock with the following locks:

1. Virtq locks (per virtq) synchronize datapath polling and
   parallel configurations on the same virtq.
2. A doorbell lock synchronizes the doorbell update, which
   is shared by all the virtqs in the device.
3. A steering lock for the shared steering objects updates.
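
A short sketch of the intended split (the names are from this patch;
the call sites are simplified):

/* 1. per-virtq configuration, e.g. the vring state change entry point */
pthread_mutex_lock(&virtq->virtq_lock);
ret = mlx5_vdpa_virtq_enable(priv, vring, state);
pthread_mutex_unlock(&virtq->virtq_lock);

/* 2. device-wide doorbell updates */
rte_spinlock_lock(&priv->db_lock);
/* ... write the virtq doorbell record ... */
rte_spinlock_unlock(&priv->db_lock);

/* 3. shared steering objects */
pthread_mutex_lock(&priv->steer_update_lock);
mlx5_vdpa_steer_unset(priv);
pthread_mutex_unlock(&priv->steer_update_lock);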

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   | 24 ---
 drivers/vdpa/mlx5/mlx5_vdpa.h   | 13 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++---
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c| 34 +++---
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 88 +++---
 6 files changed, 184 insertions(+), 79 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee99952e11..e5a11f72fd 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -135,6 +135,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
struct mlx5_vdpa_priv *priv =
mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+   struct mlx5_vdpa_virtq *virtq;
int ret;
 
if (priv == NULL) {
@@ -145,9 +146,10 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
DRV_LOG(ERR, "Too big vring id: %d.", vring);
return -E2BIG;
}
-   pthread_mutex_lock(&priv->vq_config_lock);
+   virtq = &priv->virtqs[vring];
+   pthread_mutex_lock(&virtq->virtq_lock);
ret = mlx5_vdpa_virtq_enable(priv, vring, state);
-   pthread_mutex_unlock(&priv->vq_config_lock);
+   pthread_mutex_unlock(&virtq->virtq_lock);
return ret;
 }
 
@@ -267,7 +269,9 @@ mlx5_vdpa_dev_close(int vid)
ret |= mlx5_vdpa_lm_log(priv);
priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
}
+   pthread_mutex_lock(&priv->steer_update_lock);
mlx5_vdpa_steer_unset(priv);
+   pthread_mutex_unlock(&priv->steer_update_lock);
mlx5_vdpa_virtqs_release(priv);
mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
@@ -276,8 +280,6 @@ mlx5_vdpa_dev_close(int vid)
if (!priv->connected)
mlx5_vdpa_dev_cache_clean(priv);
priv->vid = 0;
-   /* The mutex may stay locked after event thread cancel - initiate it. */
-   pthread_mutex_init(&priv->vq_config_lock, NULL);
DRV_LOG(INFO, "vDPA device %d was closed.", vid);
return ret;
 }
@@ -549,15 +551,21 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
+   struct mlx5_vdpa_virtq *virtq;
uint32_t index;
uint32_t i;
 
+   for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+   index++) {
+   virtq = &priv->virtqs[index];
+   pthread_mutex_init(&virtq->virtq_lock, NULL);
+   }
if (!priv->queues)
return 0;
for (index = 0; index < (priv->queues * 2); ++index) {
-   struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+   virtq = &priv->virtqs[index];
int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-   -1, &virtq->eqp);
+   -1, virtq);
 
if (ret) {
DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
@@ -713,7 +721,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
priv->num_lag_ports = attr->num_lag_ports;
if (attr->num_lag_ports == 0)
priv->num_lag_ports = 1;
-   pthread_mutex_init(&priv->vq_config_lock, NULL);
+   rte_spinlock_init(&priv->db_lock);
+   pthread_mutex_init(&priv->steer_update_lock, NULL);
priv->cdev = cdev;
mlx5_vdpa_config_get(mkvlist, priv);
if (mlx5_vdpa_create_dev_resources(priv))
@@ -797,7 +806,6 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
mlx5_vdpa_release_dev_resources(priv);
if (priv->vdev)
rte_vdpa_unregister_device(priv->vdev);
-   pthread_mutex_destroy(&priv->vq_config_lock);
rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e5553079fe..3fd5eefc5e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -82,6 +82,7 @@ struct mlx5_vdpa_virtq {
bool stopped;
uint32_t configured:1;
uint32_t version;
+   pthread_mutex_t virtq_lock;
struct mlx5_vdpa_priv *priv;
struct mlx5_devx_obj *virtq;
struct mlx5_devx_obj *counters;
@@ -126,7 +127,8 @@ struct mlx5_vdpa_priv {
TAILQ_

[PATCH v1 10/17] vdpa/mlx5: add multi-thread management for configuration

2022-06-06 Thread Li Zhang
The LM process includes a lot of object creations and
destructions in the source and the destination servers.
The longer the LM takes, the more VM packets are dropped.
To improve the LM time, the mlx5 FW configurations need to be done in
parallel, so add internal multi-thread management in the driver for it.

A new devarg defines the number of threads and their CPU core.
The management is shared between all the devices of the driver.
Since the event_core also affects the datapath event thread,
reduce the priority of the datapath event thread to
allow fast configuration of the devices doing the LM.

Signed-off-by: Li Zhang 
---
 doc/guides/vdpadevs/mlx5.rst  |  11 +++
 drivers/vdpa/mlx5/meson.build |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c |  41 
 drivers/vdpa/mlx5/mlx5_vdpa.h |  36 +++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 +-
 7 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 0ad77bf535..b75a01688d 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -78,6 +78,17 @@ for an additional list of options shared with other mlx5 
drivers.
   CPU core number to set polling thread affinity to, default to control plane
   cpu.
 
+- ``max_conf_threads`` parameter [int]
+
+  Allow the driver to use internal threads to obtain fast configuration.
+  All the threads will be open on the same core of the event completion queue 
scheduling thread.
+
+  - 0, default, don't use internal threads for configuration.
+
+  - 1 - 256, number of internal threads in addition to the caller thread (8 is 
suggested).
+This value, if not 0, should be the same for all the devices;
+the first prob will take it with the event_core for all the multi-thread 
configurations in the driver.
+
 - ``hw_latency_mode`` parameter [int]
 
   The completion queue moderation mode:
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 0fa82ad257..9d8dbb1a82 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
 'mlx5_vdpa_virtq.c',
 'mlx5_vdpa_steer.c',
 'mlx5_vdpa_lm.c',
+'mlx5_vdpa_cthread.c',
 )
 cflags_options = [
 '-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e5a11f72fd..a9d023ed08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -50,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
  TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
 
 static struct mlx5_vdpa_priv *
@@ -493,6 +495,29 @@ mlx5_vdpa_args_check_handler(const char *key, const char 
*val, void *opaque)
DRV_LOG(WARNING, "Invalid event_core %s.", val);
else
priv->event_core = tmp;
+   } else if (strcmp(key, "max_conf_threads") == 0) {
+   if (tmp) {
+   priv->use_c_thread = true;
+   if (!conf_thread_mng.initializer_priv) {
+   conf_thread_mng.initializer_priv = priv;
+   if (tmp > MLX5_VDPA_MAX_C_THRD) {
+   DRV_LOG(WARNING,
+   "Invalid max_conf_threads %s "
+   "and set max_conf_threads to %d",
+   val, MLX5_VDPA_MAX_C_THRD);
+   tmp = MLX5_VDPA_MAX_C_THRD;
+   }
+   conf_thread_mng.max_thrds = tmp;
+   } else if (tmp != conf_thread_mng.max_thrds) {
+   DRV_LOG(WARNING,
+   "max_conf_threads is PMD argument and not per device, "
+   "only the first device configuration set it, current value is %d "
+   "and will not be changed to %d.",
+   conf_thread_mng.max_thrds, (int)tmp);
+   }
+   } else {
+   priv->use_c_thread = false;
+   }
} else if (strcmp(key, "hw_latency_mode") == 0) {
priv->hw_latency_mode = (uint32_t)tmp;
} else if (strcmp(key, "hw_max_latency_us") == 0) {
@@ -521,6 +546,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
"hw_max_latency_us",
"hw_max_pending_comp",
"no_traffic_time",
+   "queue_size",
+   "queues",
+   "max_conf_threads",
NULL,
 

[PATCH v1 11/17] vdpa/mlx5: add task ring for MT management

2022-06-06 Thread Li Zhang
The configuration thread tasks need a container to
support multiple tasks assigned to a thread in parallel.
Use an rte_ring container per thread to manage
the thread tasks without locks.
The caller thread from the user context opens a task for
a thread and enqueues it to the thread ring.
The thread polls its ring and dequeues tasks.
That's why the ring should be in multi-producer
and single-consumer mode.
An atomic counter manages the task completion notification.
The threads report errors to the caller by
a dedicated error counter per task.
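
A minimal standalone sketch of this ring usage (the struct and the
names below are illustrative stand-ins, not the driver code): the ring
keeps the default multi-producer enqueue, restricts dequeue to a single
consumer, uses fixed-size task elements, and pairs them with an atomic
pending counter:

#include <rte_ring.h>
#include <rte_ring_elem.h>
#include <rte_memory.h>

struct task {			/* stand-in for struct mlx5_vdpa_task */
	void *priv;
	uint32_t *remaining_cnt;
	uint32_t *err_cnt;
	uint32_t idx;
};

static int
example_task_ring(void)
{
	struct rte_ring *r = rte_ring_create_elem("thrd0_tasks",
			sizeof(struct task), 4096, SOCKET_ID_ANY,
			RING_F_SC_DEQ);	/* MP enqueue (default), SC dequeue */
	uint32_t remaining = 0, errs = 0;
	struct task t = { .remaining_cnt = &remaining, .err_cnt = &errs };
	struct task cur;

	if (r == NULL)
		return -1;
	/* producer side (any caller thread): enqueue, then count it pending */
	if (rte_ring_enqueue_elem(r, &t, sizeof(t)) == 0)
		__atomic_fetch_add(&remaining, 1, __ATOMIC_RELAXED);
	/* consumer side (the owning configuration thread) */
	if (rte_ring_dequeue_elem(r, &cur, sizeof(cur)) == 0) {
		/* handle the task, then update *cur.remaining_cnt/err_cnt */
	}
	return 0;
}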

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h |  17 
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 115 +-
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 4e7c2557b7..2bbb868ec6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -74,10 +74,22 @@ enum {
 };
 
 #define MLX5_VDPA_MAX_C_THRD 256
+#define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
+#define MLX5_VDPA_TASKS_PER_DEV 64
+
+/* Generic task information and size must be multiple of 4B. */
+struct mlx5_vdpa_task {
+   struct mlx5_vdpa_priv *priv;
+   uint32_t *remaining_cnt;
+   uint32_t *err_cnt;
+   uint32_t idx;
+} __rte_packed __rte_aligned(4);
 
 /* Generic mlx5_vdpa_c_thread information. */
 struct mlx5_vdpa_c_thread {
pthread_t tid;
+   struct rte_ring *rng;
+   pthread_cond_t c_cond;
 };
 
 struct mlx5_vdpa_conf_thread_mng {
@@ -532,4 +544,9 @@ mlx5_vdpa_mult_threads_create(int cpu_core);
  */
 void
 mlx5_vdpa_mult_threads_destroy(bool need_unlock);
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+   uint32_t thrd_idx,
+   uint32_t num);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index ba7d8b63b3..1fdc92d3ad 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -11,17 +11,103 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_dequeue_bulk(struct rte_ring *r,
+   void **obj, uint32_t n, uint32_t *avail)
+{
+   uint32_t m;
+
+   m = rte_ring_dequeue_bulk_elem_start(r, obj,
+   sizeof(struct mlx5_vdpa_task), n, avail);
+   n = (m == n) ? n : 0;
+   rte_ring_dequeue_elem_finish(r, n);
+   return n;
+}
+
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
+   void * const *obj, uint32_t n, uint32_t *free)
+{
+   uint32_t m;
+
+   m = rte_ring_enqueue_bulk_elem_start(r, n, free);
+   n = (m == n) ? n : 0;
+   rte_ring_enqueue_elem_finish(r, obj,
+   sizeof(struct mlx5_vdpa_task), n);
+   return n;
+}
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+   uint32_t thrd_idx,
+   uint32_t num)
+{
+   struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
+   struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+   uint32_t i;
+
+   MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
+   for (i = 0 ; i < num; i++) {
+   task[i].priv = priv;
+   /* To be added later. */
+   }
+   if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
+   return -1;
+   for (i = 0 ; i < num; i++)
+   if (task[i].remaining_cnt)
+   __atomic_fetch_add(task[i].remaining_cnt, 1,
+   __ATOMIC_RELAXED);
+   /* wake up conf thread. */
+   pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+   pthread_cond_signal(&conf_thread_mng.cthrd[thrd_idx].c_cond);
+   pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+   return 0;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
-   /* To be added later. */
-   return arg;
+   struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
+   pthread_t thread_id = pthread_self();
+   struct mlx5_vdpa_priv *priv;
+   struct mlx5_vdpa_task task;
+   struct rte_ring *rng;
+   uint32_t thrd_idx;
+   uint32_t task_num;
+
+   for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
+   thrd_idx++)
+   if (multhrd->cthrd[thrd_idx].tid == thread_id)
+   break;
+   if (thrd_idx >= multhrd->max_thrds)
+   return NULL;
+   rng = multhrd->cthrd[thrd_idx].rng;
+   while (1) {
+   task_num = mlx5_vdpa_c_thrd_ring_dequeue_bulk(rng,
+   (void **)&task, 1, NULL);
+   if (!task_num) {
+   /* No task and condition wait. */
+   pthread_mutex_lock(&multhrd->cthrd_lock);
+   pthread_cond_wait(
+   &multhrd->cthrd[thrd_idx].c_cond,
+   &m

[PATCH v1 12/17] vdpa/mlx5: add MT task for VM memory registration

2022-06-06 Thread Li Zhang
The driver creates a direct MR object of
the HW for each VM memory region,
which maps the VM physical address to
the actual physical address.

Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.

Create direct MRs in parallel using the MT mechanism.
After completion, the primary thread creates the indirect MR
needed for the following virtq configurations.

This optimization accelerates the LM process and
reduces its time by 5%.
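
A hedged sketch of the dispatch/wait pattern this patch adds (the
wrapper function is illustrative; the prototypes are the ones declared
in mlx5_vdpa.h by this series, and the exact wait semantics are in the
full patch):

static int
example_register_mrs_mt(struct mlx5_vdpa_priv *priv, uint32_t nregions)
{
	uint32_t remaining = 0, err = 0;
	uint32_t data[MLX5_VDPA_TASKS_PER_DEV];
	uint32_t i, num = nregions < MLX5_VDPA_TASKS_PER_DEV ?
			nregions : MLX5_VDPA_TASKS_PER_DEV;

	for (i = 0; i < num; i++)
		data[i] = i;	/* one task per VM memory region index */
	/* hand the bulk of MLX5_VDPA_TASK_REG_MR tasks to worker thread 0 */
	if (mlx5_vdpa_task_add(priv, 0, MLX5_VDPA_TASK_REG_MR,
			       &remaining, &err, (void **)&data, num))
		return -1;
	/*
	 * The caller registers its own share of regions here, then waits
	 * for "remaining" to drop to zero, e.g. with
	 * mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining, &err, 100);
	 * a non-zero "err" means one of the workers failed.
	 */
	return 0;
}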

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.h |  31 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 -
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c | 270 ++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
 5 files changed, 258 insertions(+), 97 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index a9d023ed08..e3b32fa087 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
rte_errno = rte_errno ? rte_errno : EINVAL;
goto error;
}
-   SLIST_INIT(&priv->mr_list);
pthread_mutex_lock(&priv_list_lock);
TAILQ_INSERT_TAIL(&priv_list, priv, next);
pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2bbb868ec6..3316ce42be 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp {
 };
 
 struct mlx5_vdpa_query_mr {
-   SLIST_ENTRY(mlx5_vdpa_query_mr) next;
union {
struct ibv_mr *mr;
struct mlx5_devx_obj *mkey;
@@ -76,10 +75,17 @@ enum {
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
+#define MLX5_VDPA_MAX_MRS 0x
+
+/* Vdpa task types. */
+enum mlx5_vdpa_task_type {
+   MLX5_VDPA_TASK_REG_MR = 1,
+};
 
 /* Generic task information and size must be multiple of 4B. */
 struct mlx5_vdpa_task {
struct mlx5_vdpa_priv *priv;
+   enum mlx5_vdpa_task_type type;
uint32_t *remaining_cnt;
uint32_t *err_cnt;
uint32_t idx;
@@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng {
 };
 extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
 
+struct mlx5_vdpa_vmem_info {
+   struct rte_vhost_memory *vmem;
+   uint32_t entries_num;
+   uint64_t gcd;
+   uint64_t size;
+   uint8_t mode;
+};
+
 struct mlx5_vdpa_virtq {
SLIST_ENTRY(mlx5_vdpa_virtq) next;
uint8_t enable;
@@ -176,7 +190,7 @@ struct mlx5_vdpa_priv {
struct mlx5_hca_vdpa_attr caps;
uint32_t gpa_mkey_index;
struct ibv_mr *null_mr;
-   struct rte_vhost_memory *vmem;
+   struct mlx5_vdpa_vmem_info vmem_info;
struct mlx5dv_devx_event_channel *eventc;
struct mlx5dv_devx_event_channel *err_chnl;
struct mlx5_uar uar;
@@ -187,11 +201,13 @@ struct mlx5_vdpa_priv {
uint8_t num_lag_ports;
uint64_t features; /* Negotiated features. */
uint16_t log_max_rqt_size;
+   uint16_t last_c_thrd_idx;
+   uint16_t num_mrs; /* Number of memory regions. */
struct mlx5_vdpa_steer steer;
struct mlx5dv_var *var;
void *virtq_db_addr;
struct mlx5_pmd_wrapped_mr lm_mr;
-   SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+   struct mlx5_vdpa_query_mr **mrs;
struct mlx5_vdpa_virtq virtqs[];
 };
 
@@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
uint32_t thrd_idx,
-   uint32_t num);
+   enum mlx5_vdpa_task_type task_type,
+   uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+   void **task_data, uint32_t num);
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+   uint32_t *err_cnt, uint32_t sleep_time);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1fdc92d3ad..10391931ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
uint32_t thrd_idx,
-   uint32_t num)
+   enum mlx5_vdpa_task_type task_type,
+   uint32_t *remaining_cnt, uint32_t *err_cnt,
+   void **task_data, uint32_t num)
 {
struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+   uint32_t *data = (uint32_t *)task_data;
uint32_t i;
 

[PATCH v1 13/17] vdpa/mlx5: add virtq creation task for MT management

2022-06-06 Thread Li Zhang
The virtq object and all its sub-resources use a lot of
FW commands and can be accelerated by MT management.
Split the virtq creation between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h |   9 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  14 +++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 149 +++---
 4 files changed, 134 insertions(+), 40 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3316ce42be..35221f5ddc 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ enum {
 /* Vdpa task types. */
 enum mlx5_vdpa_task_type {
MLX5_VDPA_TASK_REG_MR = 1,
+   MLX5_VDPA_TASK_SETUP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -117,12 +118,12 @@ struct mlx5_vdpa_vmem_info {
 
 struct mlx5_vdpa_virtq {
SLIST_ENTRY(mlx5_vdpa_virtq) next;
-   uint8_t enable;
uint16_t index;
uint16_t vq_size;
uint8_t notifier_state;
-   bool stopped;
uint32_t configured:1;
+   uint32_t enable:1;
+   uint32_t stopped:1;
uint32_t version;
pthread_mutex_t virtq_lock;
struct mlx5_vdpa_priv *priv;
@@ -565,11 +566,13 @@ bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
uint32_t thrd_idx,
enum mlx5_vdpa_task_type task_type,
-   uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+   uint32_t *remaining_cnt, uint32_t *err_cnt,
void **task_data, uint32_t num);
 int
 mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
 bool
 mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
uint32_t *err_cnt, uint32_t sleep_time);
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 10391931ae..1389d369ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -100,6 +100,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 {
struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
pthread_t thread_id = pthread_self();
+   struct mlx5_vdpa_virtq *virtq;
struct mlx5_vdpa_priv *priv;
struct mlx5_vdpa_task task;
struct rte_ring *rng;
@@ -139,6 +140,19 @@ mlx5_vdpa_c_thread_handle(void *arg)
__ATOMIC_RELAXED);
}
break;
+   case MLX5_VDPA_TASK_SETUP_VIRTQ:
+   virtq = &priv->virtqs[task.idx];
+   pthread_mutex_lock(&virtq->virtq_lock);
+   ret = mlx5_vdpa_virtq_setup(priv,
+   task.idx, false);
+   if (ret) {
+   DRV_LOG(ERR,
+   "Failed to setup virtq %d.", task.idx);
+   __atomic_fetch_add(
+   task.err_cnt, 1, __ATOMIC_RELAXED);
+   }
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
default:
DRV_LOG(ERR, "Invalid vdpa task type %d.",
task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b45fbac146..f782b6b832 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -371,7 +371,7 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
goto unlock;
if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
goto unlock;
-   virtq->stopped = true;
+   virtq->stopped = 1;
/* Query error info. */
if (mlx5_vdpa_virtq_query(priv, vq_index))
goto log;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 0b317655db..db05220e76 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -111,8 +111,9 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+   if (virtq->index != i)
+   continue;
pthread_mutex_lock(&virtq->virtq_lock);
-   virtq->configured = 0;
for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
if (virtq->umems[j].obj) {
claim_zero(mlx5_glue->devx_umem_dereg
@@ -131,7 +132,6 @@ mlx5_vdpa_virtqs_cleanup(struct m

[PATCH v1 14/17] vdpa/mlx5: add virtq LM log task

2022-06-06 Thread Li Zhang
Split the virtq LM logging between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h |  3 +
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 +++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c  | 85 +--
 3 files changed, 105 insertions(+), 17 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 35221f5ddc..e08931719f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -72,6 +72,8 @@ enum {
MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_USED_RING_LEN(size) \
+   ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
@@ -81,6 +83,7 @@ enum {
 enum mlx5_vdpa_task_type {
MLX5_VDPA_TASK_REG_MR = 1,
MLX5_VDPA_TASK_SETUP_VIRTQ,
+   MLX5_VDPA_TASK_STOP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1389d369ae..98369f0887 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -104,6 +104,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
struct mlx5_vdpa_priv *priv;
struct mlx5_vdpa_task task;
struct rte_ring *rng;
+   uint64_t features;
uint32_t thrd_idx;
uint32_t task_num;
int ret;
@@ -153,6 +154,39 @@ mlx5_vdpa_c_thread_handle(void *arg)
}
pthread_mutex_unlock(&virtq->virtq_lock);
break;
+   case MLX5_VDPA_TASK_STOP_VIRTQ:
+   virtq = &priv->virtqs[task.idx];
+   pthread_mutex_lock(&virtq->virtq_lock);
+   ret = mlx5_vdpa_virtq_stop(priv,
+   task.idx);
+   if (ret) {
+   DRV_LOG(ERR,
+   "Failed to stop virtq %d.",
+   task.idx);
+   __atomic_fetch_add(
+   task.err_cnt, 1,
+   __ATOMIC_RELAXED);
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
+   }
+   ret = rte_vhost_get_negotiated_features(
+   priv->vid, &features);
+   if (ret) {
+   DRV_LOG(ERR,
+   "Failed to get negotiated features virtq %d.",
+   task.idx);
+   __atomic_fetch_add(
+   task.err_cnt, 1,
+   __ATOMIC_RELAXED);
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
+   }
+   if (RTE_VHOST_NEED_LOG(features))
+   rte_vhost_log_used_vring(
+   priv->vid, task.idx, 0,
+   MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
default:
DRV_LOG(ERR, "Invalid vdpa task type %d.",
task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index efebf364d0..c2e78218ca 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -89,39 +89,90 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, 
uint64_t log_base,
return -1;
 }
 
-#define MLX5_VDPA_USED_RING_LEN(size) \
-   ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
-
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+   uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+   uint32_t i, thrd_idx, data[1];
struct mlx5_vdpa_virtq *virtq;
uint64_t features;
-   int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
-   int i;
+   int ret;
 
+   ret = rte_vhost_get_negotiated_features(priv->vid, &features);
if (ret) {
DRV_LOG(ERR, "Failed to get negotiated features.");
return -1;
}
-   if (!RTE_VHOST_NEED_LOG(features))
-   return 0;
-   for (i = 0; i < priv->nr_virtqs; ++i) {
-   virtq = &priv->virtqs[i];
-   if (!priv->virtqs[i].virtq) {
-   DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
-   } else {
+   if (priv->use_c_thread && priv->nr_virtqs) {
+   uint32_t main_task_idx[priv->nr_virtqs];
+
+   for (i = 0; i < p

[PATCH v1 15/17] vdpa/mlx5: add device close task

2022-06-06 Thread Li Zhang
Split the virtq device close tasks, performed after
stopping the virt-queues, between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 56 +--
 drivers/vdpa/mlx5/mlx5_vdpa.h |  8 
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 14 +++
 4 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e3b32fa087..d000854c08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -245,7 +245,7 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
-static void
+void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
/* Clean pre-created resource in dev removal only. */
@@ -254,6 +254,26 @@ mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
mlx5_vdpa_mem_dereg(priv);
 }
 
+static bool
+mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
+{
+   uint32_t timeout = 0;
+
+   /* Check and wait all close tasks done. */
+   while (__atomic_load_n(&priv->dev_close_progress,
+   __ATOMIC_RELAXED) != 0 && timeout < 1000) {
+   rte_delay_us_sleep(1);
+   timeout++;
+   }
+   if (priv->dev_close_progress) {
+   DRV_LOG(ERR,
+   "Failed to wait close device tasks done vid %d.",
+   priv->vid);
+   return true;
+   }
+   return false;
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -271,6 +291,27 @@ mlx5_vdpa_dev_close(int vid)
ret |= mlx5_vdpa_lm_log(priv);
priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
}
+   if (priv->use_c_thread) {
+   if (priv->last_c_thrd_idx >=
+   (conf_thread_mng.max_thrds - 1))
+   priv->last_c_thrd_idx = 0;
+   else
+   priv->last_c_thrd_idx++;
+   __atomic_store_n(&priv->dev_close_progress,
+   1, __ATOMIC_RELAXED);
+   if (mlx5_vdpa_task_add(priv,
+   priv->last_c_thrd_idx,
+   MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+   NULL, NULL, NULL, 1)) {
+   DRV_LOG(ERR,
+   "Fail to add dev close task. ");
+   goto single_thrd;
+   }
+   priv->state = MLX5_VDPA_STATE_PROBED;
+   DRV_LOG(INFO, "vDPA device %d was closed.", vid);
+   return ret;
+   }
+single_thrd:
pthread_mutex_lock(&priv->steer_update_lock);
mlx5_vdpa_steer_unset(priv);
pthread_mutex_unlock(&priv->steer_update_lock);
@@ -278,10 +319,12 @@ mlx5_vdpa_dev_close(int vid)
mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-   priv->state = MLX5_VDPA_STATE_PROBED;
if (!priv->connected)
mlx5_vdpa_dev_cache_clean(priv);
priv->vid = 0;
+   __atomic_store_n(&priv->dev_close_progress, 0,
+   __ATOMIC_RELAXED);
+   priv->state = MLX5_VDPA_STATE_PROBED;
DRV_LOG(INFO, "vDPA device %d was closed.", vid);
return ret;
 }
@@ -302,6 +345,8 @@ mlx5_vdpa_dev_config(int vid)
DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
return -1;
}
+   if (mlx5_vdpa_wait_dev_close_tasks_done(priv))
+   return -1;
priv->vid = vid;
priv->connected = true;
if (mlx5_vdpa_mtu_set(priv))
@@ -444,8 +489,11 @@ mlx5_vdpa_dev_cleanup(int vid)
DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
return -1;
}
-   if (priv->state == MLX5_VDPA_STATE_PROBED)
+   if (priv->state == MLX5_VDPA_STATE_PROBED) {
+   if (priv->use_c_thread)
+   mlx5_vdpa_wait_dev_close_tasks_done(priv);
mlx5_vdpa_dev_cache_clean(priv);
+   }
priv->connected = false;
return 0;
 }
@@ -839,6 +887,8 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
mlx5_vdpa_dev_close(priv->vid);
+   if (priv->use_c_thread)
+   mlx5_vdpa_wait_dev_close_tasks_done(priv);
mlx5_vdpa_release_dev_resources(priv);
if (priv->vdev)
rte_vdpa_unregister_device(priv->vdev);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e08931719f..b6392b9d66 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -84,6 +84,7 @@ enum mlx5_vdpa_task_type {
MLX5_VDPA_TASK_REG_MR = 1,
MLX5_VDPA_TASK_SETUP_VIRTQ,
MLX5_VDPA_TASK_STOP_VIRTQ,
+   MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
 };
 

[PATCH v1 16/17] vdpa/mlx5: add virtq sub-resources creation

2022-06-06 Thread Li Zhang
Pre-create virt-queue sub-resources in the device probe stage
and then modify the virtqueues in the device config stage.
The steer table also needs to support dummy virt-queues.
This accelerates the LM process and reduces its time by 40%.

Signed-off-by: Li Zhang 
Signed-off-by: Yajun Wu 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   | 72 +++--
 drivers/vdpa/mlx5/mlx5_vdpa.h   | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 11 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 +
 5 files changed, 123 insertions(+), 93 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index d000854c08..f006a9cd3f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -627,65 +627,39 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-   struct mlx5_vdpa_virtq *virtq;
+   uint32_t max_queues;
uint32_t index;
-   uint32_t i;
+   struct mlx5_vdpa_virtq *virtq;
 
-   for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+   for (index = 0; index < priv->caps.max_num_virtio_queues;
index++) {
virtq = &priv->virtqs[index];
pthread_mutex_init(&virtq->virtq_lock, NULL);
}
-   if (!priv->queues)
+   if (!priv->queues || !priv->queue_size)
return 0;
-   for (index = 0; index < (priv->queues * 2); ++index) {
+   max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
+   (priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+   for (index = 0; index < max_queues; ++index)
+   if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+   index))
+   goto error;
+   if (mlx5_vdpa_is_modify_virtq_supported(priv))
+   if (mlx5_vdpa_steer_update(priv, true))
+   goto error;
+   return 0;
+error:
+   for (index = 0; index < max_queues; ++index) {
virtq = &priv->virtqs[index];
-   int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-   -1, virtq);
-
-   if (ret) {
-   DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-   index);
-   return -1;
-   }
-   if (priv->caps.queue_counters_valid) {
-   if (!virtq->counters)
-   virtq->counters =
-   mlx5_devx_cmd_create_virtio_q_counters
-   (priv->cdev->ctx);
-   if (!virtq->counters) {
-   DRV_LOG(ERR, "Failed to create virtq couners 
for virtq"
-   " %d.", index);
-   return -1;
-   }
-   }
-   for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-   uint32_t size;
-   void *buf;
-   struct mlx5dv_devx_umem *obj;
-
-   size = priv->caps.umems[i].a * priv->queue_size +
-   priv->caps.umems[i].b;
-   buf = rte_zmalloc(__func__, size, 4096);
-   if (buf == NULL) {
-   DRV_LOG(ERR, "Cannot allocate umem %d memory 
for virtq"
-   " %u.", i, index);
-   return -1;
-   }
-   obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
-   size, IBV_ACCESS_LOCAL_WRITE);
-   if (obj == NULL) {
-   rte_free(buf);
-   DRV_LOG(ERR, "Failed to register umem %d for 
virtq %u.",
-   i, index);
-   return -1;
-   }
-   virtq->umems[i].size = size;
-   virtq->umems[i].buf = buf;
-   virtq->umems[i].obj = obj;
+   if (virtq->virtq) {
+   pthread_mutex_lock(&virtq->virtq_lock);
+   mlx5_vdpa_virtq_unset(virtq);
+   pthread_mutex_unlock(&virtq->virtq_lock);
}
}
-   return 0;
+   if (mlx5_vdpa_is_modify_virtq_supported(priv))
+   mlx5_vdpa_steer_unset(priv);
+   return -1;
 }
 
 static int
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index b6392b9d66..f353db62ac 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -277,13 +277,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  

[PATCH v1 17/17] vdpa/mlx5: prepare virtqueue resource creation

2022-06-06 Thread Li Zhang
Split the virt-queue resource creation between
the configuration threads.
The virt-queue resources also need to be pre-created again
after virtq destruction.
This accelerates the LM process and reduces its time by 30%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 115 --
 drivers/vdpa/mlx5/mlx5_vdpa.h |  12 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  15 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 111 +
 4 files changed, 208 insertions(+), 45 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f006a9cd3f..c5d82872c7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -275,23 +275,18 @@ mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv 
*priv)
 }
 
 static int
-mlx5_vdpa_dev_close(int vid)
+_internal_mlx5_vdpa_dev_close(struct mlx5_vdpa_priv *priv,
+   bool release_resource)
 {
-   struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
-   struct mlx5_vdpa_priv *priv =
-   mlx5_vdpa_find_priv_resource_by_vdev(vdev);
int ret = 0;
+   int vid = priv->vid;
 
-   if (priv == NULL) {
-   DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
-   return -1;
-   }
mlx5_vdpa_cqe_event_unset(priv);
if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
ret |= mlx5_vdpa_lm_log(priv);
priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
}
-   if (priv->use_c_thread) {
+   if (priv->use_c_thread && !release_resource) {
if (priv->last_c_thrd_idx >=
(conf_thread_mng.max_thrds - 1))
priv->last_c_thrd_idx = 0;
@@ -315,7 +310,7 @@ mlx5_vdpa_dev_close(int vid)
pthread_mutex_lock(&priv->steer_update_lock);
mlx5_vdpa_steer_unset(priv);
pthread_mutex_unlock(&priv->steer_update_lock);
-   mlx5_vdpa_virtqs_release(priv);
+   mlx5_vdpa_virtqs_release(priv, release_resource);
mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
@@ -329,6 +324,24 @@ mlx5_vdpa_dev_close(int vid)
return ret;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+   struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
+   struct mlx5_vdpa_priv *priv;
+
+   if (!vdev) {
+   DRV_LOG(ERR, "Invalid vDPA device.");
+   return -1;
+   }
+   priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+   if (priv == NULL) {
+   DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
+   return -1;
+   }
+   return _internal_mlx5_vdpa_dev_close(priv, false);
+}
+
 static int
 mlx5_vdpa_dev_config(int vid)
 {
@@ -624,11 +637,33 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
priv->queue_size);
 }
 
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv)
+{
+   uint32_t max_queues, index;
+   struct mlx5_vdpa_virtq *virtq;
+
+   if (!priv->queues || !priv->queue_size)
+   return;
+   max_queues = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+   (priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+   if (mlx5_vdpa_is_modify_virtq_supported(priv))
+   mlx5_vdpa_steer_unset(priv);
+   for (index = 0; index < max_queues; ++index) {
+   virtq = &priv->virtqs[index];
+   if (virtq->virtq) {
+   pthread_mutex_lock(&virtq->virtq_lock);
+   mlx5_vdpa_virtq_unset(virtq);
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   }
+   }
+}
+
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-   uint32_t max_queues;
-   uint32_t index;
+   uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+   uint32_t max_queues, index, thrd_idx, data[1];
struct mlx5_vdpa_virtq *virtq;
 
for (index = 0; index < priv->caps.max_num_virtio_queues;
@@ -640,25 +675,53 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv 
*priv)
return 0;
max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
-   for (index = 0; index < max_queues; ++index)
-   if (mlx5_vdpa_virtq_single_resource_prepare(priv,
-   index))
+   if (priv->use_c_thread) {
+   uint32_t main_task_idx[max_queues];
+
+   for (index = 0; index < max_queues; ++index) {
+   thrd_idx = index % (conf_thread_mng.max_thrds + 1);
+   if (!thrd_idx) {
+   main_task_idx[task_num] = index;
+   task_num++;
+   continue;
+   }
+

[Bug 1025] [dpdk 22.07 && dpdk-next-net] kernel/linux/kni meson build failed on Ub22.04/Ub20.04/Fedora36/Centos7.9/SUSE15/RHEL8.6

2022-06-06 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1025

Thomas Monjalon (tho...@monjalon.net) changed:

   What|Removed |Added

 Resolution|--- |FIXED
 CC||tho...@monjalon.net
 Status|IN_PROGRESS |RESOLVED

--- Comment #5 from Thomas Monjalon (tho...@monjalon.net) ---
I didn't receive your patch,
but I had done one which is already pushed:
https://git.dpdk.org/dpdk/commit/?id=327ef506599

By the way, I would like to understand what happened.
You did not test the compilation before sending the patch?

-- 
You are receiving this mail because:
You are the assignee for the bug.

RE: [PATCH v2] net/igc: add I226 support

2022-06-06 Thread Zhang, Qi Z



> -Original Message-
> From: Thomas Monjalon 
> Sent: Monday, June 6, 2022 7:46 PM
> To: Yang, Qiming ; Zhang, Qi Z
> 
> Cc: dev@dpdk.org; Liu, KevinX ; Mcnamara, John
> 
> Subject: Re: [PATCH v2] net/igc: add I226 support
> 
> 06/06/2022 12:54, Zhang, Qi Z:
> >
> > > -Original Message-
> > > From: Thomas Monjalon 
> > > Sent: Monday, June 6, 2022 6:49 PM
> > > To: Yang, Qiming ; Zhang, Qi Z
> > > 
> > > Cc: dev@dpdk.org; Liu, KevinX 
> > > Subject: Re: [PATCH v2] net/igc: add I226 support
> > >
> > > 06/06/2022 01:12, Zhang, Qi Z:
> > > >
> > > > > -Original Message-
> > > > > From: Thomas Monjalon 
> > > > > Sent: Monday, June 6, 2022 12:42 AM
> > > > > To: Zhang, Qi Z ; Yang, Qiming
> > > > > 
> > > > > Cc: dev@dpdk.org; Liu, KevinX 
> > > > > Subject: Re: [PATCH v2] net/igc: add I226 support
> > > > >
> > > > > 25/05/2022 07:57, Qiming Yang:
> > > > > > Added I226 Series device ID in igc driver and updated igc
> > > > > > guide document for new devices.
> > > > > >
> > > > > > Signed-off-by: Qiming Yang 
> > > > > > Signed-off-by: Kevin Liu 
> > > > > > ---
> > > > > > v2:
> > > > > > * rebased
> > > > > > ---
> > > > > >  doc/guides/nics/igc.rst| 14 +++---
> > > > > >  doc/guides/rel_notes/release_22_03.rst |  5 +
> > > > >
> > > > > You are sending a patch after 22.03 is closed, so it should be
> > > > > listed in
> > > 22.07!
> > > > >
> > > > > I will fix while pulling the tree prepared by Qi.
> > > > > Please be more careful with the basic checks.
> > > >
> > > > Thanks for capture this, have dropped this patch in dpdk-next-net-intel.
> > > > A new version is required.
> > >
> > > Too late, it is in the main tree with release notes fixed.
> > > Do you need more fix?
> >
> > OK, I guess we need to revert it with a new fix.
> > Sorry for the chaos...
> 
> Why revert? If there is a bug, just fix it.

Not a revert patch; I mean a fix patch that reverts the change in release_22_03.rst

> 
> 



[v3 07/24] eal/loongarch: add dummy vector memcpy for LoongArch

2022-06-06 Thread Min Zhou
The hardware-instruction-based vector implementation of memcpy
will come later. For now, this dummy implementation also works.

Signed-off-by: Min Zhou 
---
 lib/eal/loongarch/include/rte_memcpy.h | 193 +
 lib/eal/loongarch/include/rte_vect.h   |  46 ++
 2 files changed, 239 insertions(+)
 create mode 100644 lib/eal/loongarch/include/rte_memcpy.h
 create mode 100644 lib/eal/loongarch/include/rte_vect.h

diff --git a/lib/eal/loongarch/include/rte_memcpy.h 
b/lib/eal/loongarch/include/rte_memcpy.h
new file mode 100644
index 00..98dc3dfc3b
--- /dev/null
+++ b/lib/eal/loongarch/include/rte_memcpy.h
@@ -0,0 +1,193 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#ifndef _RTE_MEMCPY_LOONGARCH_H_
+#define _RTE_MEMCPY_LOONGARCH_H_
+
+#include 
+#include 
+#include 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_memcpy.h"
+
+static inline void
+rte_mov16(uint8_t *dst, const uint8_t *src)
+{
+   *(xmm_t *)dst = *(const xmm_t *)src;
+}
+
+static inline void
+rte_mov32(uint8_t *dst, const uint8_t *src)
+{
+   rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16);
+   rte_mov16((uint8_t *)dst + 1 * 16, (const uint8_t *)src + 1 * 16);
+}
+
+static inline void
+rte_mov48(uint8_t *dst, const uint8_t *src)
+{
+   rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16);
+   rte_mov16((uint8_t *)dst + 1 * 16, (const uint8_t *)src + 1 * 16);
+   rte_mov16((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32);
+}
+
+static inline void
+rte_mov64(uint8_t *dst, const uint8_t *src)
+{
+   rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16);
+   rte_mov16((uint8_t *)dst + 1 * 16, (const uint8_t *)src + 1 * 16);
+   rte_mov16((uint8_t *)dst + 2 * 16, (const uint8_t *)src + 2 * 16);
+   rte_mov16((uint8_t *)dst + 3 * 16, (const uint8_t *)src + 3 * 16);
+}
+
+static inline void
+rte_mov128(uint8_t *dst, const uint8_t *src)
+{
+   rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16);
+   rte_mov16((uint8_t *)dst + 1 * 16, (const uint8_t *)src + 1 * 16);
+   rte_mov16((uint8_t *)dst + 2 * 16, (const uint8_t *)src + 2 * 16);
+   rte_mov16((uint8_t *)dst + 3 * 16, (const uint8_t *)src + 3 * 16);
+   rte_mov16((uint8_t *)dst + 4 * 16, (const uint8_t *)src + 4 * 16);
+   rte_mov16((uint8_t *)dst + 5 * 16, (const uint8_t *)src + 5 * 16);
+   rte_mov16((uint8_t *)dst + 6 * 16, (const uint8_t *)src + 6 * 16);
+   rte_mov16((uint8_t *)dst + 7 * 16, (const uint8_t *)src + 7 * 16);
+}
+
+static inline void
+rte_mov256(uint8_t *dst, const uint8_t *src)
+{
+   rte_mov128(dst, src);
+   rte_mov128(dst + 128, src + 128);
+}
+
+#define rte_memcpy(dst, src, n)  \
+   rte_memcpy_func((dst), (src), (n))
+
+static inline void *
+rte_memcpy_func(void *dst, const void *src, size_t n)
+{
+   void *ret = dst;
+
+   /* We can't copy < 16 bytes using XMM registers so do it manually. */
+   if (n < 16) {
+   if (n & 0x01) {
+   *(uint8_t *)dst = *(const uint8_t *)src;
+   dst = (uint8_t *)dst + 1;
+   src = (const uint8_t *)src + 1;
+   }
+   if (n & 0x02) {
+   *(uint16_t *)dst = *(const uint16_t *)src;
+   dst = (uint16_t *)dst + 1;
+   src = (const uint16_t *)src + 1;
+   }
+   if (n & 0x04) {
+   *(uint32_t *)dst = *(const uint32_t *)src;
+   dst = (uint32_t *)dst + 1;
+   src = (const uint32_t *)src + 1;
+   }
+   if (n & 0x08)
+   *(uint64_t *)dst = *(const uint64_t *)src;
+   return ret;
+   }
+
+   /* Special fast cases for <= 128 bytes */
+   if (n <= 32) {
+   rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+   rte_mov16((uint8_t *)dst - 16 + n,
+   (const uint8_t *)src - 16 + n);
+   return ret;
+   }
+
+   if (n <= 64) {
+   rte_mov32((uint8_t *)dst, (const uint8_t *)src);
+   rte_mov32((uint8_t *)dst - 32 + n,
+   (const uint8_t *)src - 32 + n);
+   return ret;
+   }
+
+   if (n <= 128) {
+   rte_mov64((uint8_t *)dst, (const uint8_t *)src);
+   rte_mov64((uint8_t *)dst - 64 + n,
+   (const uint8_t *)src - 64 + n);
+   return ret;
+   }
+
+   /*
+* For large copies > 128 bytes. This combination of 256, 64 and 16 byte
+* copies was found to be faster than doing 128 and 32 byte copies as
+* well.
+*/
+   for ( ; n >= 256; n -= 256) {
+   rte_mov256((uint8_t *)dst, (const uint8_t *)src);
+   dst = (uint8_t *)dst + 256;
+  

[v3 05/24] eal/loongarch: add spinlock operations for LoongArch

2022-06-06 Thread Min Zhou
This patch adds spinlock operations for the LoongArch architecture.
These implementations are based on the toolchain's standard atomics
and closely follow the generic spinlock code.

Signed-off-by: Min Zhou 
---
 lib/eal/loongarch/include/rte_spinlock.h | 90 
 1 file changed, 90 insertions(+)
 create mode 100644 lib/eal/loongarch/include/rte_spinlock.h

diff --git a/lib/eal/loongarch/include/rte_spinlock.h 
b/lib/eal/loongarch/include/rte_spinlock.h
new file mode 100644
index 00..9ad46a3c91
--- /dev/null
+++ b/lib/eal/loongarch/include/rte_spinlock.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#ifndef _RTE_SPINLOCK_LOONGARCH_H_
+#define _RTE_SPINLOCK_LOONGARCH_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include "generic/rte_spinlock.h"
+
+#ifndef RTE_FORCE_INTRINSICS
+static inline void
+rte_spinlock_lock(rte_spinlock_t *sl)
+{
+   int exp = 0;
+
+   while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
+   __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+   rte_wait_until_equal_32((volatile uint32_t *)&sl->locked,
+  0, __ATOMIC_RELAXED);
+   exp = 0;
+   }
+}
+
+static inline void
+rte_spinlock_unlock(rte_spinlock_t *sl)
+{
+   __atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
+}
+
+static inline int
+rte_spinlock_trylock(rte_spinlock_t *sl)
+{
+   int exp = 0;
+   return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
+   0, /* disallow spurious failure */
+   __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
+}
+#endif
+
+static inline int rte_tm_supported(void)
+{
+   return 0;
+}
+
+static inline void
+rte_spinlock_lock_tm(rte_spinlock_t *sl)
+{
+   rte_spinlock_lock(sl); /* fall-back */
+}
+
+static inline int
+rte_spinlock_trylock_tm(rte_spinlock_t *sl)
+{
+   return rte_spinlock_trylock(sl);
+}
+
+static inline void
+rte_spinlock_unlock_tm(rte_spinlock_t *sl)
+{
+   rte_spinlock_unlock(sl);
+}
+
+static inline void
+rte_spinlock_recursive_lock_tm(rte_spinlock_recursive_t *slr)
+{
+   rte_spinlock_recursive_lock(slr); /* fall-back */
+}
+
+static inline void
+rte_spinlock_recursive_unlock_tm(rte_spinlock_recursive_t *slr)
+{
+   rte_spinlock_recursive_unlock(slr);
+}
+
+static inline int
+rte_spinlock_recursive_trylock_tm(rte_spinlock_recursive_t *slr)
+{
+   return rte_spinlock_recursive_trylock(slr);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_SPINLOCK_LOONGARCH_H_ */
-- 
2.31.1



[v3 19/24] test/xmmt_ops: add dummy vector implementation for LoongArch

2022-06-06 Thread Min Zhou
The hardware-instruction-based vector implementation will come
in a future patch. This dummy implementation also works for now.

Signed-off-by: Min Zhou 
---
 app/test/test_xmmt_ops.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/app/test/test_xmmt_ops.h b/app/test/test_xmmt_ops.h
index 3a82d5ecac..7b2c3c37dd 100644
--- a/app/test/test_xmmt_ops.h
+++ b/app/test/test_xmmt_ops.h
@@ -49,6 +49,23 @@ vect_set_epi32(int i3, int i2, int i1, int i0)
return data;
 }
 
+#elif defined(RTE_ARCH_LOONGARCH)
+/* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/
+static __rte_always_inline xmm_t
+vect_loadu_sil128(void *p)
+{
+   xmm_t data;
+   data = *(const xmm_t *)p;
+   return data;
+}
+
+/* sets the 4 signed 32-bit integer values and returns the xmm_t variable */
+static __rte_always_inline xmm_t
+vect_set_epi32(int i3, int i2, int i1, int i0)
+{
+   xmm_t data = (xmm_t){.u32 = {i0, i1, i2, i3} };
+   return data;
+}
 #endif
 
 #endif /* _TEST_XMMT_OPS_H_ */
-- 
2.31.1



[v3 21/24] i40e: add dummy vector implementation for LoongArch

2022-06-06 Thread Min Zhou
The purpose of this patch is to fix build issues for the
LoongArch architecture. The hardware-instruction-based vector
implementation will come in a future patch.

Signed-off-by: Min Zhou 
---
 drivers/net/i40e/i40e_rxtx_vec_lsx.c | 54 
 drivers/net/i40e/meson.build |  2 ++
 2 files changed, 56 insertions(+)
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_lsx.c

diff --git a/drivers/net/i40e/i40e_rxtx_vec_lsx.c 
b/drivers/net/i40e/i40e_rxtx_vec_lsx.c
new file mode 100644
index 00..727dc178f2
--- /dev/null
+++ b/drivers/net/i40e/i40e_rxtx_vec_lsx.c
@@ -0,0 +1,54 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#include "i40e_ethdev.h"
+#include "i40e_rxtx.h"
+
+/* The vector support will come later */
+#ifdef RTE_ARCH_NO_VECTOR
+uint16_t
+i40e_recv_scattered_pkts_vec(__rte_unused void *rx_queue,
+   __rte_unused struct rte_mbuf **rx_pkts,
+   __rte_unused uint16_t nb_pkts)
+{
+   return 0;
+}
+
+uint16_t
+i40e_recv_pkts_vec(__rte_unused void *rx_queue,
+   __rte_unused struct rte_mbuf **rx_pkts,
+   __rte_unused uint16_t nb_pkts)
+{
+   return 0;
+}
+uint16_t
+i40e_xmit_fixed_burst_vec(__rte_unused void *tx_queue,
+   __rte_unused struct rte_mbuf **tx_pkts,
+   __rte_unused uint16_t nb_pkts)
+{
+   return 0;
+}
+void __rte_cold
+i40e_rx_queue_release_mbufs_vec(__rte_unused struct i40e_rx_queue *rxq)
+{
+}
+int __rte_cold
+i40e_rxq_vec_setup(__rte_unused struct i40e_rx_queue *rxq)
+{
+   return -1;
+}
+int __rte_cold
+i40e_txq_vec_setup(__rte_unused struct i40e_tx_queue *txq)
+{
+   return -1;
+}
+int __rte_cold
+i40e_rx_vec_dev_conf_condition_check(__rte_unused struct rte_eth_dev *dev)
+{
+   return -1;
+}
+#else
+#error "The current version of LoongArch does not support vector!"
+#endif
diff --git a/drivers/net/i40e/meson.build b/drivers/net/i40e/meson.build
index efc5f93e35..9775f05da1 100644
--- a/drivers/net/i40e/meson.build
+++ b/drivers/net/i40e/meson.build
@@ -75,6 +75,8 @@ elif arch_subdir == 'ppc'
sources += files('i40e_rxtx_vec_altivec.c')
 elif arch_subdir == 'arm'
sources += files('i40e_rxtx_vec_neon.c')
+elif arch_subdir == 'loongarch'
+   sources += files('i40e_rxtx_vec_lsx.c')
 endif
 
 headers = files('rte_pmd_i40e.h')
-- 
2.31.1



[v3 13/24] eal/loongarch: add ticketlock operations for LoongArch

2022-06-06 Thread Min Zhou
This patch adds ticketlock operations for the LoongArch architecture.
It uses the generic ticketlock implementation.

Signed-off-by: Min Zhou 
---
 lib/eal/loongarch/include/rte_ticketlock.h | 18 ++
 1 file changed, 18 insertions(+)
 create mode 100644 lib/eal/loongarch/include/rte_ticketlock.h

diff --git a/lib/eal/loongarch/include/rte_ticketlock.h 
b/lib/eal/loongarch/include/rte_ticketlock.h
new file mode 100644
index 00..3959bcae7b
--- /dev/null
+++ b/lib/eal/loongarch/include/rte_ticketlock.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#ifndef _RTE_TICKETLOCK_LOONGARCH_H_
+#define _RTE_TICKETLOCK_LOONGARCH_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_ticketlock.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_TICKETLOCK_LOONGARCH_H_ */
-- 
2.31.1



[v3 00/24] Support LoongArch architecture

2022-06-06 Thread Min Zhou
Dear team,
The following patch set is intended to support DPDK running on LoongArch
architecture.

LoongArch is the general processor architecture of Loongson and is a new
RISC ISA, which is a bit like MIPS or RISC-V.

The online documents of LoongArch are here:
https://loongson.github.io/LoongArch-Documentation/README-EN.html

The latest cross compile tool chain can be downloaded from:
https://github.com/loongson/build-tools

v3:
- add URL for cross compile tool chain
- remove rte_lpm_lsx.h which was a dummy vector implementation
  because there is already a scalar implementation, thanks to
  Michal Mazurek
- modify the name of compiler for cross compiling
- remove useless variable in meson.build

v2:
- use standard atomics of toolchain to implement
  atomic operations
- implement spinlock based on standard atomics

Min Zhou (24):
  eal/loongarch: add atomic operations for LoongArch
  eal/loongarch: add byte order operations for LoongArch
  eal/loongarch: add cpu cycle operations for LoongArch
  eal/loongarch: add prefetch operations for LoongArch
  eal/loongarch: add spinlock operations for LoongArch
  eal/loongarch: add cpu flag checks for LoongArch
  eal/loongarch: add dummy vector memcpy for LoongArch
  eal/loongarch: add io operations for LoongArch
  eal/loongarch: add mcslock operations for LoongArch
  eal/loongarch: add pause operations for LoongArch
  eal/loongarch: add pflock operations for LoongArch
  eal/loongarch: add rwlock operations for LoongArch
  eal/loongarch: add ticketlock operations for LoongArch
  eal/loongarch: add power operations for LoongArch
  eal/loongarch: add hypervisor operations for LoongArch
  mem: add huge page size definition for LoongArch
  eal/linux: set eal base address for LoongArch
  meson: introduce LoongArch architecture
  test/xmmt_ops: add dummy vector implementation for LoongArch
  ixgbe: add dummy vector implementation for LoongArch
  i40e: add dummy vector implementation for LoongArch
  tap: add system call number for LoongArch
  memif: add system call number for LoongArch
  maintainers: claim responsibility for LoongArch

 MAINTAINERS   |   9 +
 app/test/test_xmmt_ops.h  |  17 ++
 .../loongarch/loongarch_loongarch64_linux_gcc |  16 ++
 config/loongarch/meson.build  |  43 +++
 drivers/net/i40e/i40e_rxtx_vec_lsx.c  |  54 
 drivers/net/i40e/meson.build  |   2 +
 drivers/net/ixgbe/ixgbe_rxtx_vec_lsx.c|  60 +
 drivers/net/ixgbe/meson.build |   2 +
 drivers/net/memif/rte_eth_memif.h |   2 +-
 drivers/net/tap/tap_bpf.h |   2 +-
 lib/eal/include/rte_memory.h  |   1 +
 lib/eal/include/rte_memzone.h |   1 +
 lib/eal/linux/eal_memory.c|   4 +
 lib/eal/loongarch/include/meson.build |  21 ++
 lib/eal/loongarch/include/rte_atomic.h| 253 ++
 lib/eal/loongarch/include/rte_byteorder.h |  46 
 lib/eal/loongarch/include/rte_cpuflags.h  |  39 +++
 lib/eal/loongarch/include/rte_cycles.h|  53 
 lib/eal/loongarch/include/rte_io.h|  18 ++
 lib/eal/loongarch/include/rte_mcslock.h   |  18 ++
 lib/eal/loongarch/include/rte_memcpy.h| 193 +
 lib/eal/loongarch/include/rte_pause.h |  24 ++
 lib/eal/loongarch/include/rte_pflock.h|  17 ++
 .../loongarch/include/rte_power_intrinsics.h  |  20 ++
 lib/eal/loongarch/include/rte_prefetch.h  |  47 
 lib/eal/loongarch/include/rte_rwlock.h|  42 +++
 lib/eal/loongarch/include/rte_spinlock.h  |  90 +++
 lib/eal/loongarch/include/rte_ticketlock.h|  18 ++
 lib/eal/loongarch/include/rte_vect.h  |  46 
 lib/eal/loongarch/meson.build |  11 +
 lib/eal/loongarch/rte_cpuflags.c  |  94 +++
 lib/eal/loongarch/rte_cycles.c|  45 
 lib/eal/loongarch/rte_hypervisor.c|  11 +
 lib/eal/loongarch/rte_power_intrinsics.c  |  51 
 meson.build   |   2 +
 35 files changed, 1370 insertions(+), 2 deletions(-)
 create mode 100644 config/loongarch/loongarch_loongarch64_linux_gcc
 create mode 100644 config/loongarch/meson.build
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_lsx.c
 create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_lsx.c
 create mode 100644 lib/eal/loongarch/include/meson.build
 create mode 100644 lib/eal/loongarch/include/rte_atomic.h
 create mode 100644 lib/eal/loongarch/include/rte_byteorder.h
 create mode 100644 lib/eal/loongarch/include/rte_cpuflags.h
 create mode 100644 lib/eal/loongarch/include/rte_cycles.h
 create mode 100644 lib/eal/loongarch/include/rte_io.h
 create mode 100644 lib/eal/loongarch/include/rte_mcslock.h
 create mode 100644 lib/eal/loongarch/include/rte_memcpy.h
 create mode 100644 lib/eal/loongarch/include/rte_pause.h
 crea

[v3 09/24] eal/loongarch: add mcslock operations for LoongArch

2022-06-06 Thread Min Zhou
This patch adds mcslock operations for the LoongArch architecture.
It uses the generic mcslock implementation.

Signed-off-by: Min Zhou 
---
 lib/eal/loongarch/include/rte_mcslock.h | 18 ++
 1 file changed, 18 insertions(+)
 create mode 100644 lib/eal/loongarch/include/rte_mcslock.h

diff --git a/lib/eal/loongarch/include/rte_mcslock.h 
b/lib/eal/loongarch/include/rte_mcslock.h
new file mode 100644
index 00..c4484b66fa
--- /dev/null
+++ b/lib/eal/loongarch/include/rte_mcslock.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#ifndef _RTE_MCSLOCK_LOONGARCH_H_
+#define _RTE_MCSLOCK_LOONGARCH_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_mcslock.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MCSLOCK_LOONGARCH_H_ */
-- 
2.31.1



[v3 24/24] maintainers: claim responsibility for LoongArch

2022-06-06 Thread Min Zhou
This patch claims maintainership responsibility for the LoongArch architecture.

Signed-off-by: Min Zhou 
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f34f6fa2e9..eb38bf473b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -269,6 +269,15 @@ F: lib/eal/include/rte_random.h
 F: lib/eal/common/rte_random.c
 F: app/test/test_rand_perf.c
 
+LoongArch
+M: Min Zhou 
+F: config/loongarch/
+F: lib/eal/loongarch/
+F: lib/*/*_lsx.*
+F: drivers/*/*/*_lsx.*
+F: app/*/*_lsx.*
+F: examples/*/*_lsx.*
+
 ARM v7
 M: Jan Viktorin 
 M: Ruifeng Wang 
-- 
2.31.1



[v3 14/24] eal/loongarch: add power operations for LoongArch

2022-06-06 Thread Min Zhou
This patch adds power operations for the LoongArch architecture.
These operations are currently not supported on LoongArch, so the
added functions are stubs that return -ENOTSUP.

Signed-off-by: Min Zhou 
---
 .../loongarch/include/rte_power_intrinsics.h  | 20 
 lib/eal/loongarch/rte_power_intrinsics.c  | 51 +++
 2 files changed, 71 insertions(+)
 create mode 100644 lib/eal/loongarch/include/rte_power_intrinsics.h
 create mode 100644 lib/eal/loongarch/rte_power_intrinsics.c

diff --git a/lib/eal/loongarch/include/rte_power_intrinsics.h 
b/lib/eal/loongarch/include/rte_power_intrinsics.h
new file mode 100644
index 00..b6a2c0d82e
--- /dev/null
+++ b/lib/eal/loongarch/include/rte_power_intrinsics.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#ifndef _RTE_POWER_INTRINSIC_LOONGARCH_H_
+#define _RTE_POWER_INTRINSIC_LOONGARCH_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+
+#include "generic/rte_power_intrinsics.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_POWER_INTRINSIC_LOONGARCH_H_ */
diff --git a/lib/eal/loongarch/rte_power_intrinsics.c 
b/lib/eal/loongarch/rte_power_intrinsics.c
new file mode 100644
index 00..3dd1375ce4
--- /dev/null
+++ b/lib/eal/loongarch/rte_power_intrinsics.c
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#include "rte_power_intrinsics.h"
+
+/**
+ * This function is not supported on LOONGARCH.
+ */
+int
+rte_power_monitor(const struct rte_power_monitor_cond *pmc,
+   const uint64_t tsc_timestamp)
+{
+   RTE_SET_USED(pmc);
+   RTE_SET_USED(tsc_timestamp);
+
+   return -ENOTSUP;
+}
+
+/**
+ * This function is not supported on LOONGARCH.
+ */
+int
+rte_power_pause(const uint64_t tsc_timestamp)
+{
+   RTE_SET_USED(tsc_timestamp);
+
+   return -ENOTSUP;
+}
+
+/**
+ * This function is not supported on LOONGARCH.
+ */
+int
+rte_power_monitor_wakeup(const unsigned int lcore_id)
+{
+   RTE_SET_USED(lcore_id);
+
+   return -ENOTSUP;
+}
+
+int
+rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+   const uint32_t num, const uint64_t tsc_timestamp)
+{
+   RTE_SET_USED(pmc);
+   RTE_SET_USED(num);
+   RTE_SET_USED(tsc_timestamp);
+
+   return -ENOTSUP;
+}
-- 
2.31.1



[v3 08/24] eal/loongarch: add io operations for LoongArch

2022-06-06 Thread Min Zhou
This patch adds I/O operations for the LoongArch architecture. It
uses the generic I/O implementation.

Signed-off-by: Min Zhou 
---
 lib/eal/loongarch/include/rte_io.h | 18 ++
 1 file changed, 18 insertions(+)
 create mode 100644 lib/eal/loongarch/include/rte_io.h

diff --git a/lib/eal/loongarch/include/rte_io.h 
b/lib/eal/loongarch/include/rte_io.h
new file mode 100644
index 00..af152a727a
--- /dev/null
+++ b/lib/eal/loongarch/include/rte_io.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#ifndef _RTE_IO_LOONGARCH_H_
+#define _RTE_IO_LOONGARCH_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_io.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_IO_LOONGARCH_H_ */
-- 
2.31.1



[v3 17/24] eal/linux: set eal base address for LoongArch

2022-06-06 Thread Min Zhou
This patch sets a different EAL base address for the LoongArch
architecture.

Signed-off-by: Min Zhou 
---
 lib/eal/linux/eal_memory.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c
index c890c42106..60fc8cc6ca 100644
--- a/lib/eal/linux/eal_memory.c
+++ b/lib/eal/linux/eal_memory.c
@@ -77,7 +77,11 @@ uint64_t eal_get_baseaddr(void)
 * rte_mem_check_dma_mask for ensuring all memory is within supported
 * range.
 */
+#if defined(RTE_ARCH_LOONGARCH)
+   return 0x70ULL;
+#else
return 0x1ULL;
+#endif
 }
 
 /*
-- 
2.31.1



RE: [PATCH v4 3/7] ethdev: introduce Rx queue based fill threshold

2022-06-06 Thread Spike Du
Hi Andrew,
	Please see below for the "fill threshold" concept; I'm OK with the other
comments about the code.

Regards,
Spike.


> -Original Message-
> From: Andrew Rybchenko 
> Sent: Saturday, June 4, 2022 8:46 PM
> To: Spike Du ; Matan Azrad ;
> Slava Ovsiienko ; Ori Kam ;
> NBU-Contact-Thomas Monjalon (EXTERNAL) ;
> Wenzhuo Lu ; Beilei Xing ;
> Bernard Iremonger ; Ray Kinsella
> ; Neil Horman 
> Cc: step...@networkplumber.org; m...@smartsharesystems.com;
> dev@dpdk.org; Raslan Darawsheh 
> Subject: Re: [PATCH v4 3/7] ethdev: introduce Rx queue based fill threshold
> 
> External email: Use caution opening links or attachments
> 
> 
> On 6/3/22 15:48, Spike Du wrote:
> > Fill threshold describes the fullness of a Rx queue. If the Rx queue
> > fullness is above the threshold, the device will trigger the event
> > RTE_ETH_EVENT_RX_FILL_THRESH.
> 
> Sorry, I'm not sure that I understand. As far as I know the process to add
> more Rx buffers to Rx queue is called 'refill' in many drivers. So fill level 
> is a
> number (or percentage) of free buffers in an Rx queue.
> If so, fill threashold should be a minimum fill level and below the level we
> should generate an event.
> 
> However reading the first paragraph of the descrition it looks like you mean
> oposite thing - a number (or percentage) of ready Rx buffers with received
> packets.
> 
> I think that the term "fill threshold" is suggested by me, but I did it with 
> mine
> understanding of the added feature. Now I'm confused.
> 
> Moreover, I don't understand how "fill threshold" could be in terms of ready
> Rx buffers. HW simply don't really know when ready Rx buffers are
> processed by SW. So, HW can't say for sure how many ready Rx buffers are
> pending. It could be calculated as Rx queue size minus number of free Rx
> buffers, but it is imprecise. First of all not all Rx descriptors could be 
> used.
> Second, HW ring size could differ queue size specified in SW.
> Queue size specified in SW could just limit maximum nubmer of free Rx
> buffers provided by the driver.
> 

Let me use other terms because "fill"/"refill" is also ambiguous to me.
In an Rx ring, there are Rx buffers holding received packets; you call them
"ready Rx buffers", and there is an RTE API rte_eth_rx_queue_count() to get
their number. They are also called "used descriptors" in the code.
There are also Rx buffers provided by SW to allow HW to fill in received
packets; we can call them "usable Rx buffers" (here "usable" means usable by HW).
Let's define Rx queue "fullness":
Fullness = ready-Rx-buffers/Rxq-size
Conversely, we have "emptiness":
Emptiness = usable-Rx-buffers/Rxq-size
Here "fill threshold" describes "fullness", it's not "refill" described in you 
above words. Because in your words, "refill" is the opposite, it's filling 
"usable/free Rx buffers", or "emptiness".

I can only briefly explain how mlx5 works to get LWM, because I'm not a 
Firmware guy.
The mlx5 Rx queue is basically an RDMA queue. It has two indexes: a producer
index which increases when HW fills in a packet, and a consumer index which
increases when SW consumes a packet.
The queue size is known when it's created. The fullness is something like
(producer_index - consumer_index) (I don't consider wrap-around here).
So mlx5 has a way to get the fullness or emptiness in HW or FW.
Another detail is that mlx5 uses the term "LWM" (limit watermark), which
describes "emptiness". When usable-Rx-buffers drops below the LWM, we trigger
an event. But Thomas thinks "fullness" is easier to understand, so we use
"fullness" in the rte APIs and translate it to LWM in the mlx5 PMD.
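
To make the mapping concrete (the numbers below are only an illustration, not
taken from the patch): with an Rx queue of 512 descriptors and fill_thresh set
to 30, the event should fire once ready-Rx-buffers / 512 rises above 30%, i.e.
above roughly 153 ready buffers. In mlx5 terms that corresponds to an LWM of
about 512 - 153 = 359 usable buffers, so HW raises the event when
usable-Rx-buffers drops below that mark.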


> > Fill threshold is defined as a percentage of Rx queue size with valid
> > value of [0,99].
> > Setting fill threshold to 0 means disable it, which is the default.
> > Add fill threshold configuration and query driver callbacks in eth_dev_ops.
> > Add command line options to support fill_thresh per-rxq configure.
> > - Command syntax:
> >set port  rxq  fill_thresh 
> >
> > - Example commands:
> > To configure fill_thresh as 30% of rxq size on port 1 rxq 0:
> > testpmd> set port 1 rxq 0 fill_thresh 30
> >
> > To disable fill_thresh on port 1 rxq 0:
> > testpmd> set port 1 rxq 0 fill_thresh 0
> >
> > Signed-off-by: Spike Du 
> > ---
> >   app/test-pmd/cmdline.c | 68
> +++
> >   app/test-pmd/config.c  | 21 ++
> >   app/test-pmd/testpmd.c | 18 
> >   app/test-pmd/testpmd.h |  2 ++
> >   lib/ethdev/ethdev_driver.h | 22 ++
> >   lib/ethdev/rte_ethdev.c| 52 +
> >   lib/ethdev/rte_ethdev.h| 72
> ++
> >   lib/ethdev/version.map |  2 ++
> >   8 files changed, 257 insertions(+)
> >
> > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> > 0410bad..918581e 100644
> > --- a/app/test-pmd/cmdline.c
> > +++ b/app/test-pmd/cmdline.c
> > @@ -17823,6 +17823,73 @@ struct
> cmd_

[v3 10/24] eal/loongarch: add pause operations for LoongArch

2022-06-06 Thread Min Zhou
This patch adds architecture-specific pause operations for the
LoongArch architecture.

Signed-off-by: Min Zhou 
---
 lib/eal/loongarch/include/rte_pause.h | 24 
 1 file changed, 24 insertions(+)
 create mode 100644 lib/eal/loongarch/include/rte_pause.h

diff --git a/lib/eal/loongarch/include/rte_pause.h 
b/lib/eal/loongarch/include/rte_pause.h
new file mode 100644
index 00..438de23128
--- /dev/null
+++ b/lib/eal/loongarch/include/rte_pause.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#ifndef _RTE_PAUSE_LOONGARCH_H_
+#define _RTE_PAUSE_LOONGARCH_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_atomic.h"
+
+#include "generic/rte_pause.h"
+
+static inline void rte_pause(void)
+{
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PAUSE_LOONGARCH_H_ */
-- 
2.31.1



[v3 16/24] mem: add huge page size definition for LoongArch

2022-06-06 Thread Min Zhou
LoongArch architecture has a different huge page size (32MB) than
other architectures. This patch adds a new huge page size for
LoongArch architecture.
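
As a hypothetical usage sketch (not part of the patch), an application could
ask for a memzone backed by 32MB huge pages with the new flag, falling back to
whatever page size is available:

    #include <stdio.h>
    #include <rte_memory.h>
    #include <rte_memzone.h>

    static void
    reserve_32mb_zone(void)
    {
            const struct rte_memzone *mz;

            /* Prefer 32MB huge pages; SIZE_HINT_ONLY allows falling back
             * to another page size if none are available.
             */
            mz = rte_memzone_reserve("example_mz", 1 << 20, SOCKET_ID_ANY,
                            RTE_MEMZONE_32MB | RTE_MEMZONE_SIZE_HINT_ONLY);
            if (mz == NULL)
                    printf("memzone reservation failed\n");
    }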

Signed-off-by: Min Zhou 
---
 lib/eal/include/rte_memory.h  | 1 +
 lib/eal/include/rte_memzone.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index 68b069fd04..ff4b5695db 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -30,6 +30,7 @@ extern "C" {
 #define RTE_PGSIZE_256K (1ULL << 18)
 #define RTE_PGSIZE_2M   (1ULL << 21)
 #define RTE_PGSIZE_16M  (1ULL << 24)
+#define RTE_PGSIZE_32M  (1ULL << 25)
 #define RTE_PGSIZE_256M (1ULL << 28)
 #define RTE_PGSIZE_512M (1ULL << 29)
 #define RTE_PGSIZE_1G   (1ULL << 30)
diff --git a/lib/eal/include/rte_memzone.h b/lib/eal/include/rte_memzone.h
index 5db1210831..a3305d9e97 100644
--- a/lib/eal/include/rte_memzone.h
+++ b/lib/eal/include/rte_memzone.h
@@ -35,6 +35,7 @@ extern "C" {
 #define RTE_MEMZONE_1GB0x0002   /**< Use 1GB pages. */
 #define RTE_MEMZONE_16MB   0x0100   /**< Use 16MB pages. */
 #define RTE_MEMZONE_16GB   0x0200   /**< Use 16GB pages. */
+#define RTE_MEMZONE_32MB   0x0400  /**< Use 32MB pages. */
 #define RTE_MEMZONE_256KB  0x0001   /**< Use 256KB pages. */
 #define RTE_MEMZONE_256MB  0x0002   /**< Use 256MB pages. */
 #define RTE_MEMZONE_512MB  0x0004   /**< Use 512MB pages. */
-- 
2.31.1



[v3 04/24] eal/loongarch: add prefetch operations for LoongArch

2022-06-06 Thread Min Zhou
This patch adds architecture-specific prefetch operations
for the LoongArch architecture.

Signed-off-by: Min Zhou 
---
 lib/eal/loongarch/include/rte_prefetch.h | 47 
 1 file changed, 47 insertions(+)
 create mode 100644 lib/eal/loongarch/include/rte_prefetch.h

diff --git a/lib/eal/loongarch/include/rte_prefetch.h 
b/lib/eal/loongarch/include/rte_prefetch.h
new file mode 100644
index 00..0fd9262ea8
--- /dev/null
+++ b/lib/eal/loongarch/include/rte_prefetch.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#ifndef _RTE_PREFETCH_LOONGARCH_H_
+#define _RTE_PREFETCH_LOONGARCH_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include "generic/rte_prefetch.h"
+
+static inline void rte_prefetch0(const volatile void *p)
+{
+   __builtin_prefetch((const void *)(uintptr_t)p, 0, 3);
+}
+
+static inline void rte_prefetch1(const volatile void *p)
+{
+   __builtin_prefetch((const void *)(uintptr_t)p, 0, 2);
+}
+
+static inline void rte_prefetch2(const volatile void *p)
+{
+   __builtin_prefetch((const void *)(uintptr_t)p, 0, 1);
+}
+
+static inline void rte_prefetch_non_temporal(const volatile void *p)
+{
+   /* non-temporal version not available, fallback to rte_prefetch0 */
+   rte_prefetch0(p);
+}
+
+__rte_experimental
+static inline void
+rte_cldemote(const volatile void *p)
+{
+   RTE_SET_USED(p);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PREFETCH_LOONGARCH_H_ */
-- 
2.31.1



[v3 06/24] eal/loongarch: add cpu flag checks for LoongArch

2022-06-06 Thread Min Zhou
This patch uses the auxiliary vector software register to get CPU flags
and adds CPU flag checking support for the LoongArch architecture.
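
For illustration only (a hypothetical snippet, not part of the patch), runtime
code could then probe one of the new flags before selecting a vector code path:

    #include <stdio.h>
    #include <rte_cpuflags.h>

    static void
    report_lsx(void)
    {
            /* rte_cpu_get_flag_enabled() returns 1 when the HWCAP bit is set. */
            if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_LSX) > 0)
                    printf("%s is supported\n",
                            rte_cpu_get_flag_name(RTE_CPUFLAG_LSX));
    }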

Signed-off-by: Min Zhou 
---
 lib/eal/loongarch/include/rte_cpuflags.h | 39 ++
 lib/eal/loongarch/rte_cpuflags.c | 94 
 2 files changed, 133 insertions(+)
 create mode 100644 lib/eal/loongarch/include/rte_cpuflags.h
 create mode 100644 lib/eal/loongarch/rte_cpuflags.c

diff --git a/lib/eal/loongarch/include/rte_cpuflags.h 
b/lib/eal/loongarch/include/rte_cpuflags.h
new file mode 100644
index 00..d9121a00a8
--- /dev/null
+++ b/lib/eal/loongarch/include/rte_cpuflags.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#ifndef _RTE_CPUFLAGS_LOONGARCH_H_
+#define _RTE_CPUFLAGS_LOONGARCH_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Enumeration of all CPU features supported
+ */
+enum rte_cpu_flag_t {
+   RTE_CPUFLAG_CPUCFG = 0,
+   RTE_CPUFLAG_LAM,
+   RTE_CPUFLAG_UAL,
+   RTE_CPUFLAG_FPU,
+   RTE_CPUFLAG_LSX,
+   RTE_CPUFLAG_LASX,
+   RTE_CPUFLAG_CRC32,
+   RTE_CPUFLAG_COMPLEX,
+   RTE_CPUFLAG_CRYPTO,
+   RTE_CPUFLAG_LVZ,
+   RTE_CPUFLAG_LBT_X86,
+   RTE_CPUFLAG_LBT_ARM,
+   RTE_CPUFLAG_LBT_MIPS,
+   /* The last item */
+   RTE_CPUFLAG_NUMFLAGS /**< This should always be the last! */
+};
+
+#include "generic/rte_cpuflags.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_CPUFLAGS_LOONGARCH_H_ */
diff --git a/lib/eal/loongarch/rte_cpuflags.c b/lib/eal/loongarch/rte_cpuflags.c
new file mode 100644
index 00..4abcd0fdb3
--- /dev/null
+++ b/lib/eal/loongarch/rte_cpuflags.c
@@ -0,0 +1,94 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#include "rte_cpuflags.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Symbolic values for the entries in the auxiliary table */
+#define AT_HWCAP  16
+#define AT_HWCAP2 26
+
+/* software based registers */
+enum cpu_register_t {
+   REG_NONE = 0,
+   REG_HWCAP,
+   REG_MAX
+};
+
+typedef uint32_t hwcap_registers_t[REG_MAX];
+
+struct feature_entry {
+   uint32_t reg;
+   uint32_t bit;
+#define CPU_FLAG_NAME_MAX_LEN 64
+   char name[CPU_FLAG_NAME_MAX_LEN];
+};
+
+#define FEAT_DEF(name, reg, bit) \
+   [RTE_CPUFLAG_##name] = {reg, bit, #name},
+
+const struct feature_entry rte_cpu_feature_table[] = {
+   FEAT_DEF(CPUCFG, REG_HWCAP,   0)
+   FEAT_DEF(LAM,REG_HWCAP,   1)
+   FEAT_DEF(UAL,REG_HWCAP,   2)
+   FEAT_DEF(FPU,REG_HWCAP,   3)
+   FEAT_DEF(LSX,REG_HWCAP,   4)
+   FEAT_DEF(LASX,   REG_HWCAP,   5)
+   FEAT_DEF(CRC32,  REG_HWCAP,   6)
+   FEAT_DEF(COMPLEX,REG_HWCAP,   7)
+   FEAT_DEF(CRYPTO, REG_HWCAP,   8)
+   FEAT_DEF(LVZ,REG_HWCAP,   9)
+   FEAT_DEF(LBT_X86,REG_HWCAP,  10)
+   FEAT_DEF(LBT_ARM,REG_HWCAP,  11)
+   FEAT_DEF(LBT_MIPS,   REG_HWCAP,  12)
+};
+
+/*
+ * Read AUXV software register and get cpu features for LoongArch
+ */
+static void
+rte_cpu_get_features(hwcap_registers_t out)
+{
+   out[REG_HWCAP] = rte_cpu_getauxval(AT_HWCAP);
+}
+
+/*
+ * Checks if a particular flag is available on current machine.
+ */
+int
+rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
+{
+   const struct feature_entry *feat;
+   hwcap_registers_t regs = {0};
+
+   if (feature >= RTE_CPUFLAG_NUMFLAGS)
+   return -ENOENT;
+
+   feat = &rte_cpu_feature_table[feature];
+   if (feat->reg == REG_NONE)
+   return -EFAULT;
+
+   rte_cpu_get_features(regs);
+   return (regs[feat->reg] >> feat->bit) & 1;
+}
+
+const char *
+rte_cpu_get_flag_name(enum rte_cpu_flag_t feature)
+{
+   if (feature >= RTE_CPUFLAG_NUMFLAGS)
+   return NULL;
+   return rte_cpu_feature_table[feature].name;
+}
+
+void
+rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics)
+{
+   memset(intrinsics, 0, sizeof(*intrinsics));
+}
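
For reference, a minimal usage sketch (not part of the patch) showing how an
application could probe the new LoongArch flags through the generic cpuflags
API added above:

#include <stdio.h>
#include <rte_cpuflags.h>

static void
print_lsx_support(void)
{
        int ret = rte_cpu_get_flag_enabled(RTE_CPUFLAG_LSX);

        if (ret > 0)
                printf("%s: supported\n",
                       rte_cpu_get_flag_name(RTE_CPUFLAG_LSX));
        else if (ret == 0)
                printf("LSX: not supported\n");
        else
                printf("LSX: lookup failed (%d)\n", ret);
}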
-- 
2.31.1



[v3 15/24] eal/loongarch: add hypervisor operations for LoongArch

2022-06-06 Thread Min Zhou
This patch adds the hypervisor operations for the LoongArch
architecture. Hypervisor detection is not currently supported on
LoongArch, so rte_hypervisor_get() always returns
RTE_HYPERVISOR_UNKNOWN.

Signed-off-by: Min Zhou 
---
 lib/eal/loongarch/rte_hypervisor.c | 11 +++
 1 file changed, 11 insertions(+)
 create mode 100644 lib/eal/loongarch/rte_hypervisor.c

diff --git a/lib/eal/loongarch/rte_hypervisor.c 
b/lib/eal/loongarch/rte_hypervisor.c
new file mode 100644
index 00..d044906f71
--- /dev/null
+++ b/lib/eal/loongarch/rte_hypervisor.c
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#include "rte_hypervisor.h"
+
+enum rte_hypervisor
+rte_hypervisor_get(void)
+{
+   return RTE_HYPERVISOR_UNKNOWN;
+}
-- 
2.31.1



[v3 18/24] meson: introduce LoongArch architecture

2022-06-06 Thread Min Zhou
This patch adds the meson build files and a GCC cross compilation
file needed to build DPDK on the LoongArch architecture.

Signed-off-by: Min Zhou 
---
 .../loongarch/loongarch_loongarch64_linux_gcc | 16 +++
 config/loongarch/meson.build  | 43 +++
 lib/eal/loongarch/include/meson.build | 21 +
 lib/eal/loongarch/meson.build | 11 +
 meson.build   |  2 +
 5 files changed, 93 insertions(+)
 create mode 100644 config/loongarch/loongarch_loongarch64_linux_gcc
 create mode 100644 config/loongarch/meson.build
 create mode 100644 lib/eal/loongarch/include/meson.build
 create mode 100644 lib/eal/loongarch/meson.build

diff --git a/config/loongarch/loongarch_loongarch64_linux_gcc 
b/config/loongarch/loongarch_loongarch64_linux_gcc
new file mode 100644
index 00..0c44ae96e6
--- /dev/null
+++ b/config/loongarch/loongarch_loongarch64_linux_gcc
@@ -0,0 +1,16 @@
+[binaries]
+c = 'loongarch64-unknown-linux-gnu-gcc'
+cpp = 'loongarch64-unknown-linux-gnu-cpp'
+ar = 'loongarch64-unknown-linux-gnu-gcc-ar'
+strip = 'loongarch64-unknown-linux-gnu-strip'
+pcap-config = ''
+
+[host_machine]
+system = 'linux'
+cpu_family = 'loongarch64'
+cpu = '3a5000'
+endian = 'little'
+
+[properties]
+implementor_id = 'generic'
+implementor_pn = 'default'
diff --git a/config/loongarch/meson.build b/config/loongarch/meson.build
new file mode 100644
index 00..d58e1ea6e9
--- /dev/null
+++ b/config/loongarch/meson.build
@@ -0,0 +1,43 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2022 Loongson Technology Corporation Limited
+
+if not dpdk_conf.get('RTE_ARCH_64')
+   error('Only 64-bit compiles are supported for this platform type')
+endif
+dpdk_conf.set('RTE_ARCH', 'loongarch')
+dpdk_conf.set('RTE_ARCH_LOONGARCH', 1)
+dpdk_conf.set('RTE_ARCH_NO_VECTOR', 1)
+
+machine_args_generic = [
+['default', ['-march=loongarch64']],
+]
+
+flags_generic = [
+['RTE_MACHINE', '"loongarch64"'],
+['RTE_MAX_LCORE', 64],
+['RTE_MAX_NUMA_NODES', 16],
+['RTE_CACHE_LINE_SIZE', 64]]
+
+impl_generic = ['Generic loongarch', flags_generic, machine_args_generic]
+
+machine = []
+machine_args = []
+
+machine = impl_generic
+impl_pn = 'default'
+
+message('Implementer : ' + machine[0])
+foreach flag: machine[1]
+if flag.length() > 0
+dpdk_conf.set(flag[0], flag[1])
+endif
+endforeach
+
+foreach marg: machine[2]
+if marg[0] == impl_pn
+foreach f: marg[1]
+   machine_args += f
+endforeach
+endif
+endforeach
+message(machine_args)
diff --git a/lib/eal/loongarch/include/meson.build 
b/lib/eal/loongarch/include/meson.build
new file mode 100644
index 00..d5699c5373
--- /dev/null
+++ b/lib/eal/loongarch/include/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2022 Loongson Technology Corporation Limited
+
+arch_headers = files(
+   'rte_atomic.h',
+   'rte_byteorder.h',
+   'rte_cpuflags.h',
+   'rte_cycles.h',
+   'rte_io.h',
+   'rte_mcslock.h',
+   'rte_memcpy.h',
+   'rte_pause.h',
+   'rte_pflock.h',
+   'rte_power_intrinsics.h',
+   'rte_prefetch.h',
+   'rte_rwlock.h',
+   'rte_spinlock.h',
+   'rte_ticketlock.h',
+   'rte_vect.h',
+)
+install_headers(arch_headers, subdir: get_option('include_subdir_arch'))
diff --git a/lib/eal/loongarch/meson.build b/lib/eal/loongarch/meson.build
new file mode 100644
index 00..e14b1ed431
--- /dev/null
+++ b/lib/eal/loongarch/meson.build
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2022 Loongson Technology Corporation Limited
+
+subdir('include')
+
+sources += files(
+   'rte_cpuflags.c',
+   'rte_cycles.c',
+   'rte_hypervisor.c',
+   'rte_power_intrinsics.c',
+)
diff --git a/meson.build b/meson.build
index 5561171617..bfad7b28a0 100644
--- a/meson.build
+++ b/meson.build
@@ -52,6 +52,8 @@ elif host_machine.cpu_family().startswith('arm') or 
host_machine.cpu_family().st
 arch_subdir = 'arm'
 elif host_machine.cpu_family().startswith('ppc')
 arch_subdir = 'ppc'
+elif host_machine.cpu_family().startswith('loongarch')
+arch_subdir = 'loongarch'
 endif
 
 # configure the build, and make sure configs here and in config folder are
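
For reference, a cross build using the new files could look like the
following (assuming the loongarch64-unknown-linux-gnu toolchain referenced by
the cross file is installed and on PATH):

  meson setup build-loongarch \
      --cross-file config/loongarch/loongarch_loongarch64_linux_gcc
  ninja -C build-loongarch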
-- 
2.31.1



[v3 12/24] eal/loongarch: add rwlock operations for LoongArch

2022-06-06 Thread Min Zhou
This patch adds the rwlock operations for the LoongArch architecture.
The implementation follows rte_rwlock.h of PPC and maps the
transactional (_tm) variants to the plain lock operations.

Signed-off-by: Min Zhou 
---
 lib/eal/loongarch/include/rte_rwlock.h | 42 ++
 1 file changed, 42 insertions(+)
 create mode 100644 lib/eal/loongarch/include/rte_rwlock.h

diff --git a/lib/eal/loongarch/include/rte_rwlock.h 
b/lib/eal/loongarch/include/rte_rwlock.h
new file mode 100644
index 00..aac6f60120
--- /dev/null
+++ b/lib/eal/loongarch/include/rte_rwlock.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#ifndef _RTE_RWLOCK_LOONGARCH_H_
+#define _RTE_RWLOCK_LOONGARCH_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_rwlock.h"
+
+static inline void
+rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
+{
+   rte_rwlock_read_lock(rwl);
+}
+
+static inline void
+rte_rwlock_read_unlock_tm(rte_rwlock_t *rwl)
+{
+   rte_rwlock_read_unlock(rwl);
+}
+
+static inline void
+rte_rwlock_write_lock_tm(rte_rwlock_t *rwl)
+{
+   rte_rwlock_write_lock(rwl);
+}
+
+static inline void
+rte_rwlock_write_unlock_tm(rte_rwlock_t *rwl)
+{
+   rte_rwlock_write_unlock(rwl);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RWLOCK_LOONGARCH_H_ */
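
For reference, a minimal usage sketch (not part of the patch); since no
transactional memory support is available, the _tm variants above simply take
the lock:

#include <stdint.h>
#include <rte_rwlock.h>

static rte_rwlock_t stats_lock = RTE_RWLOCK_INITIALIZER;
static uint64_t stats_counter;

static uint64_t
read_counter(void)
{
        uint64_t v;

        rte_rwlock_read_lock_tm(&stats_lock);
        v = stats_counter;
        rte_rwlock_read_unlock_tm(&stats_lock);
        return v;
}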
-- 
2.31.1



[v3 22/24] tap: add system call number for LoongArch

2022-06-06 Thread Min Zhou
This patch adds the bpf system call number for the LoongArch
architecture.

Signed-off-by: Min Zhou 
---
 drivers/net/tap/tap_bpf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/tap/tap_bpf.h b/drivers/net/tap/tap_bpf.h
index f0b9fc7a2c..b1c9600ed8 100644
--- a/drivers/net/tap/tap_bpf.h
+++ b/drivers/net/tap/tap_bpf.h
@@ -93,7 +93,7 @@ union bpf_attr {
 #  define __NR_bpf 321
 # elif defined(__arm__)
 #  define __NR_bpf 386
-# elif defined(__aarch64__)
+# elif defined(__aarch64__) || defined(__loongarch__)
 #  define __NR_bpf 280
 # elif defined(__sparc__)
 #  define __NR_bpf 349
-- 
2.31.1



[v3 02/24] eal/loongarch: add byte order operations for LoongArch

2022-06-06 Thread Min Zhou
This patch adds architecture-specific byte order operations for the
LoongArch architecture. LoongArch byte order is always little-endian.

Signed-off-by: Min Zhou 
---
 lib/eal/loongarch/include/rte_byteorder.h | 46 +++
 1 file changed, 46 insertions(+)
 create mode 100644 lib/eal/loongarch/include/rte_byteorder.h

diff --git a/lib/eal/loongarch/include/rte_byteorder.h 
b/lib/eal/loongarch/include/rte_byteorder.h
new file mode 100644
index 00..2cda010256
--- /dev/null
+++ b/lib/eal/loongarch/include/rte_byteorder.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#ifndef _RTE_BYTEORDER_LOONGARCH_H_
+#define _RTE_BYTEORDER_LOONGARCH_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_byteorder.h"
+
+#ifndef RTE_FORCE_INTRINSICS
+#define rte_bswap16(x) rte_constant_bswap16(x)
+#define rte_bswap32(x) rte_constant_bswap32(x)
+#define rte_bswap64(x) rte_constant_bswap64(x)
+#endif
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+
+#define rte_cpu_to_le_16(x) (x)
+#define rte_cpu_to_le_32(x) (x)
+#define rte_cpu_to_le_64(x) (x)
+
+#define rte_cpu_to_be_16(x) rte_bswap16(x)
+#define rte_cpu_to_be_32(x) rte_bswap32(x)
+#define rte_cpu_to_be_64(x) rte_bswap64(x)
+
+#define rte_le_to_cpu_16(x) (x)
+#define rte_le_to_cpu_32(x) (x)
+#define rte_le_to_cpu_64(x) (x)
+
+#define rte_be_to_cpu_16(x) rte_bswap16(x)
+#define rte_be_to_cpu_32(x) rte_bswap32(x)
+#define rte_be_to_cpu_64(x) rte_bswap64(x)
+
+#else /* RTE_BIG_ENDIAN */
+#error "LoongArch does not support big endian!"
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BYTEORDER_LOONGARCH_H_ */
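
For reference, a minimal usage sketch (not part of the patch): on
little-endian LoongArch the rte_cpu_to_le_*() macros are no-ops while the
rte_cpu_to_be_*() macros expand to byte swaps:

#include <stdint.h>
#include <rte_byteorder.h>

/* Convert a host-order IPv4 address to network (big-endian) order. */
static inline rte_be32_t
host_to_net_ipv4(uint32_t addr)
{
        return rte_cpu_to_be_32(addr);
}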
-- 
2.31.1



[v3 20/24] ixgbe: add dummy vector implementation for LoongArch

2022-06-06 Thread Min Zhou
The purpose of this patch is to fix build issues for the LoongArch
architecture. The stubs make the vector condition check fail, so the
driver falls back to the scalar datapath on LoongArch. A vector
implementation based on hardware instructions will come in a future
patch.

Signed-off-by: Min Zhou 
---
 drivers/net/ixgbe/ixgbe_rxtx_vec_lsx.c | 60 ++
 drivers/net/ixgbe/meson.build  |  2 +
 2 files changed, 62 insertions(+)
 create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_lsx.c

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_lsx.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec_lsx.c
new file mode 100644
index 00..412c8f937a
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_lsx.c
@@ -0,0 +1,60 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Loongson Technology Corporation Limited
+ */
+
+#include "base/ixgbe_common.h"
+#include "ixgbe_ethdev.h"
+#include "ixgbe_rxtx.h"
+
+/* The vector support will come later */
+#ifdef RTE_ARCH_NO_VECTOR
+int
+ixgbe_rx_vec_dev_conf_condition_check(__rte_unused struct rte_eth_dev *dev)
+{
+   return -1;
+}
+
+uint16_t
+ixgbe_recv_pkts_vec(__rte_unused void *rx_queue,
+   __rte_unused struct rte_mbuf **rx_pkts,
+   __rte_unused uint16_t nb_pkts)
+{
+   return 0;
+}
+
+uint16_t
+ixgbe_recv_scattered_pkts_vec(__rte_unused void *rx_queue,
+   __rte_unused struct rte_mbuf **rx_pkts,
+   __rte_unused uint16_t nb_pkts)
+{
+   return 0;
+}
+
+int
+ixgbe_rxq_vec_setup(__rte_unused struct ixgbe_rx_queue *rxq)
+{
+   return -1;
+}
+
+uint16_t
+ixgbe_xmit_fixed_burst_vec(__rte_unused void *tx_queue,
+   __rte_unused struct rte_mbuf **tx_pkts,
+   __rte_unused uint16_t nb_pkts)
+{
+   return 0;
+}
+
+int
+ixgbe_txq_vec_setup(__rte_unused struct ixgbe_tx_queue *txq)
+{
+   return -1;
+}
+
+void
+ixgbe_rx_queue_release_mbufs_vec(__rte_unused struct ixgbe_rx_queue *rxq)
+{
+}
+#else
+#error "The current version of LoongArch does not support vector!"
+#endif
diff --git a/drivers/net/ixgbe/meson.build b/drivers/net/ixgbe/meson.build
index 162f8d5f46..33c9a58ac8 100644
--- a/drivers/net/ixgbe/meson.build
+++ b/drivers/net/ixgbe/meson.build
@@ -29,6 +29,8 @@ if arch_subdir == 'x86'
 endif
 elif arch_subdir == 'arm'
 sources += files('ixgbe_rxtx_vec_neon.c')
+elif arch_subdir == 'loongarch'
+sources += files('ixgbe_rxtx_vec_lsx.c')
 endif
 
 includes += include_directories('base')
-- 
2.31.1


