Re: [PATCH 1/1] test/pmd_perf: handling of unknown connection speed

2022-06-27 Thread Heinrich Schuchardt

On 6/26/22 17:15, Thomas Monjalon wrote:

11/05/2022 18:33, Heinrich Schuchardt:

When running DPDK in QEMU, it cannot determine the connection speed.
pmd_perf_autotest treats this as if the connection speed were
UINT32_MAX Mbps:

 RTE>>pmd_perf_autotest
 Start PMD RXTX cycles cost test.
 Allocated mbuf pool on socket 0
 CONFIG RXD=1024 TXD=1024
 Performance test runs on lcore 1 socket 0
 Port 0 Address:52:54:00:12:34:57
 Port 1 Address:52:54:00:12:34:58
 Checking link statuses...
 Port 0 Link up at Unknown FDX Autoneg
 Port 1 Link up at Unknown FDX Autoneg
 IPv4 pktlen 46
 UDP pktlen 26
 Generate 4096 packets @socket 0
 inject 2048 packet to port 0
 inject 2048 packet to port 1
 Total packets inject to prime ports = 4096
 Each port will do 6391320379464 packets per second
 Test will stop after at least 25565281517856 packets received

This will not allow the test to terminate in a reasonable timespan.
Just assume 10 Gbps in this case instead:

 ...
 Test will stop after at least 59523808 packets received

Signed-off-by: Heinrich Schuchardt 
---
  app/test/test_pmd_perf.c | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 25611bfe9b..ee08c8aade 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -486,10 +486,17 @@ main_loop(__rte_unused void *args)
}
printf("Total packets inject to prime ports = %u\n", idx);
  
-	packets_per_second = (link_mbps * 1000 * 1000) /
-   ((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
-   printf("Each port will do %"PRIu64" packets per second\n",
-  packets_per_second);
+   if (link_mbps != RTE_ETH_SPEED_NUM_UNKNOWN) {
+   packets_per_second = (link_mbps * 1000 * 1000) /
+   ((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
+   printf("Each port will do %"PRIu64" packets per second\n",
+  packets_per_second);
+   total_packets = RTE_TEST_DURATION * conf->nb_ports * packets_per_second;


Yes this line should be removed.



This is redundant with below.


+   } else {
+   /* We don't know the speed. Pretend it is 10G */
+   packets_per_second = ((uint64_t)RTE_ETH_SPEED_NUM_10G * 1000 * 1000) /
+   ((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
+   }
  
  	total_packets = RTE_TEST_DURATION * conf->nb_ports * packets_per_second;


Why not just insert this:

if (link_mbps == RTE_ETH_SPEED_NUM_UNKNOWN)
link_mbps = RTE_ETH_SPEED_NUM_10G;


Following your suggestion, the message "Each port will do %"PRIu64" 
packets per second\n" would provide misleading information to the user. 
This should be avoided.


Best regards

Heinrich


[PATCH] net/iavf: fix issue of VF resetting

2022-06-27 Thread Yiding Zhou
When the VF is in the closed state, the vf_reset flag cannot be reverted
if the VF is reset asynchronously. This prevents all virtchnl commands
from executing, causing subsequent calls to iavf_dev_reset() to fail.

So the vf_reset flag needs to be reverted even when the VF is in the
closed state.

Fixes: 676d986b4b86 ("net/iavf: fix crash after VF reset failure")
Cc: sta...@dpdk.org

Signed-off-by: Yiding Zhou 
---
 drivers/net/iavf/iavf_ethdev.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index 7df0bf8118..506fcff6e3 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -2702,8 +2702,10 @@ iavf_dev_close(struct rte_eth_dev *dev)
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
return 0;
 
-   if (adapter->closed)
-   return 0;
+   if (adapter->closed) {
+   ret = 0;
+   goto out;
+   }
 
ret = iavf_dev_stop(dev);
adapter->closed = true;
@@ -2763,6 +2765,7 @@ iavf_dev_close(struct rte_eth_dev *dev)
 * the bus master bit will not be disabled, and this call will have no
 * effect.
 */
+out:
if (vf->vf_reset && !rte_pci_set_bus_master(pci_dev, true))
vf->vf_reset = false;
 
-- 
2.34.1



Re: [PATCH 22.07] doc: make doc roadmap common for Linux/BSD GSGs

2022-06-27 Thread Bruce Richardson
On Sun, Jun 26, 2022 at 11:34:30PM +0200, Thomas Monjalon wrote:
> 14/06/2022 16:12, Mcnamara, John:
> > From: Bruce Richardson 
> > > Both the Linux and FreeBSD GSG docs had a "Documentation Roadmap"
> > > section as part of the introduction page, and this contained the same
> > > information, with only the reference to the GSGs themselves being
> > > different. This text can be consolidated into a single text file which is
> > > included by both GSG intro sections - using relative links for the self
> > > reference.
> > 
> > Good idea.
> > 
> > Acked-by: John McNamara 
> 
> We already have a file doc/guides/linux_gsg/eal_args.include.rst
> I'll keep the format .include.rst for such file.
> 
Good idea. I missed that common file.

> Applied with this update, thanks.
> 
Thanks


Re: [PATCH 1/1] test/pmd_perf: handling of unknown connection speed

2022-06-27 Thread Thomas Monjalon
27/06/2022 09:18, Heinrich Schuchardt:
> On 6/26/22 17:15, Thomas Monjalon wrote:
> > 11/05/2022 18:33, Heinrich Schuchardt:
> >> When running DPDK in QEMU, it cannot determine the connection speed.
> >> pmd_perf_autotest treats this as if the connection speed were
> >> UINT32_MAX Mbps:
> >>
> >>  RTE>>pmd_perf_autotest
> >>  Start PMD RXTX cycles cost test.
> >>  Allocated mbuf pool on socket 0
> >>  CONFIG RXD=1024 TXD=1024
> >>  Performance test runs on lcore 1 socket 0
> >>  Port 0 Address:52:54:00:12:34:57
> >>  Port 1 Address:52:54:00:12:34:58
> >>  Checking link statuses...
> >>  Port 0 Link up at Unknown FDX Autoneg
> >>  Port 1 Link up at Unknown FDX Autoneg
> >>  IPv4 pktlen 46
> >>  UDP pktlen 26
> >>  Generate 4096 packets @socket 0
> >>  inject 2048 packet to port 0
> >>  inject 2048 packet to port 1
> >>  Total packets inject to prime ports = 4096
> >>  Each port will do 6391320379464 packets per second
> >>  Test will stop after at least 25565281517856 packets received
> >>
> >> This will not allow the test to terminate in a reasonable timespan.
> >> Just assume 10 Gbps in this case instead:
> >>
> >>  ...
> >>  Test will stop after at least 59523808 packets received
> >>
> >> Signed-off-by: Heinrich Schuchardt 
> >> ---
> >>   app/test/test_pmd_perf.c | 15 +++
> >>   1 file changed, 11 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
> >> index 25611bfe9b..ee08c8aade 100644
> >> --- a/app/test/test_pmd_perf.c
> >> +++ b/app/test/test_pmd_perf.c
> >> @@ -486,10 +486,17 @@ main_loop(__rte_unused void *args)
> >>}
> >>printf("Total packets inject to prime ports = %u\n", idx);
> >>   
> >> -  packets_per_second = (link_mbps * 1000 * 1000) /
> >> -  ((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
> >> -  printf("Each port will do %"PRIu64" packets per second\n",
> >> - packets_per_second);
> >> +  if (link_mbps != RTE_ETH_SPEED_NUM_UNKNOWN) {
> >> +  packets_per_second = (link_mbps * 1000 * 1000) /
> >> +  ((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
> >> +  printf("Each port will do %"PRIu64" packets per second\n",
> >> + packets_per_second);
> >> +  total_packets = RTE_TEST_DURATION * conf->nb_ports * packets_per_second;
> 
> Yes this line should be removed.
> 
> > 
> > This is redundant with below.
> > 
> >> +  } else {
> >> +  /* We don't know the speed. Pretend it is 10G */
> >> +  packets_per_second = ((uint64_t)RTE_ETH_SPEED_NUM_10G * 1000 * 1000) /
> >> +  ((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
> >> +  }
> >>   
> >>total_packets = RTE_TEST_DURATION * conf->nb_ports * packets_per_second;
> > 
> > Why not just insert this:
> > 
> > if (link_mbps == RTE_ETH_SPEED_NUM_UNKNOWN)
> > link_mbps = RTE_ETH_SPEED_NUM_10G;
> 
> Following your suggestion, the message "Each port will do %"PRIu64" 
> packets per second\n" would provide misleading information to the user. 
> This should be avoided.

OK, so we can have the printf inside an "if" condition:

bool speed_unknown = (link_mbps == RTE_ETH_SPEED_NUM_UNKNOWN);
if (speed_unknown)
link_mbps = RTE_ETH_SPEED_NUM_10G;
packets_per_second = ...;
if (!speed_unknown)
printf(...);
total_packets = ...;
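
Putting the two remarks together, a minimal sketch of the combined fix
(the placement inside main_loop() and the exact indentation are
assumptions; link_mbps is a uint64_t in this test, so the multiplication
does not overflow):

	bool speed_unknown = (link_mbps == RTE_ETH_SPEED_NUM_UNKNOWN);

	if (speed_unknown)
		link_mbps = RTE_ETH_SPEED_NUM_10G;
	packets_per_second = (link_mbps * 1000 * 1000) /
		((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
	/* only print the rate when it is based on a real link speed */
	if (!speed_unknown)
		printf("Each port will do %"PRIu64" packets per second\n",
				packets_per_second);
	total_packets = RTE_TEST_DURATION * conf->nb_ports * packets_per_second;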





[PATCH] crypto/cnxk: predecrement esn value to be used in session

2022-06-27 Thread Anoob Joseph
The ESN provided in the session is the next sequence number to be
used. Hence, pre-decrement the value so that the incremented value
in the datapath will be as expected.
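
For illustration, a sketch of the intended interaction (the datapath
line is hypothetical, not the driver's actual code):

	/* session setup (this patch): store the configured ESN minus one */
	if (ipsec->esn.value)
		sa->esn = ipsec->esn.value - 1;

	/* datapath (hypothetical sketch): the increment then yields the
	 * configured ESN for the first packet */
	uint64_t seq = ++sa->esn;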

Signed-off-by: Anoob Joseph 
---
 drivers/crypto/cnxk/cn9k_ipsec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/cnxk/cn9k_ipsec.c b/drivers/crypto/cnxk/cn9k_ipsec.c
index cb9cf174a4..6d26b0cc01 100644
--- a/drivers/crypto/cnxk/cn9k_ipsec.c
+++ b/drivers/crypto/cnxk/cn9k_ipsec.c
@@ -41,7 +41,8 @@ cn9k_ipsec_outb_sa_create(struct cnxk_cpt_qp *qp,
/* Initialize lookaside IPsec private data */
sa->dir = RTE_SECURITY_IPSEC_SA_DIR_EGRESS;
 
-   sa->esn = ipsec->esn.value;
+   if (ipsec->esn.value)
+   sa->esn = ipsec->esn.value - 1;
 
ret = cnxk_ipsec_outb_rlens_get(&sa->rlens, ipsec, crypto_xform);
if (ret)
-- 
2.25.1



[PATCH] vdpa/sfc: handle sync issue between qemu and vhost-user

2022-06-27 Thread abhimanyu.saini
From: Abhimanyu Saini 

When a DPDK app is running in the VF, it sometimes rings the doorbell
before dev_config has had a chance to complete, and hence the event is
missed. As a workaround, ring the doorbell when vDPA reports the
notify_area to QEMU.

Fixes: 5e7596ba7cb3 ("vdpa/sfc: introduce Xilinx vDPA driver")
Cc: sta...@dpdk.org

Signed-off-by: Vijay Kumar Srivastava 
Signed-off-by: Abhimanyu Saini 
---
 drivers/vdpa/sfc/sfc_vdpa_ops.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c
index b3d9b6c..63aa52d 100644
--- a/drivers/vdpa/sfc/sfc_vdpa_ops.c
+++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c
@@ -794,6 +794,8 @@
int vfio_dev_fd;
efx_rc_t rc;
unsigned int bar_offset;
+   volatile void *doorbell;
+   struct rte_pci_device *pci_dev;
struct rte_vdpa_device *vdpa_dev;
struct sfc_vdpa_ops_data *ops_data;
struct vfio_region_info reg = { .argsz = sizeof(reg) };
@@ -856,6 +858,18 @@
sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64,
  *offset);
 
+   pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev;
+   doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr + *offset;
+
+   /*
+* virtio-net driver in VM sends queue notifications before
+* vDPA has a chance to set up the queues and notification area,
+* and hence the HW misses these doorbell notifications.
+* Since it is safe to send a duplicate doorbell, send another
+* doorbell from the vDPA driver as a workaround for this timing issue.
+*/
+   rte_write16(qid, doorbell);
+
return 0;
 }
 
-- 
1.8.3.1



[PATCH 2/4] vhost: restore device information in log messages

2022-06-27 Thread David Marchand
Device information in the log messages was dropped.

Fixes: 52ade97e3641 ("vhost: fix physical address mapping")

Signed-off-by: David Marchand 
---
 lib/vhost/vhost_user.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 2b9a3b69fa..f324f822d6 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -144,7 +144,8 @@ async_dma_map(struct virtio_net *dev, bool do_map)
return;
 
/* DMA mapping errors won't stop VHOST_USER_SET_MEM_TABLE. */
-   VHOST_LOG_CONFIG(ERR, "DMA engine map failed\n");
+   VHOST_LOG_CONFIG(ERR, "(%s) DMA engine map failed\n",
+   dev->ifname);
}
}
 
@@ -160,7 +161,8 @@ async_dma_map(struct virtio_net *dev, bool do_map)
if (rte_errno == EINVAL)
return;
 
-   VHOST_LOG_CONFIG(ERR, "DMA engine unmap failed\n");
+   VHOST_LOG_CONFIG(ERR, "(%s) DMA engine unmap failed\n",
+   dev->ifname);
}
}
}
@@ -945,7 +947,8 @@ add_one_guest_page(struct virtio_net *dev, uint64_t guest_phys_addr,
dev->max_guest_pages * sizeof(*page),
RTE_CACHE_LINE_SIZE);
if (dev->guest_pages == NULL) {
-   VHOST_LOG_CONFIG(ERR, "cannot realloc guest_pages\n");
+   VHOST_LOG_CONFIG(ERR, "(%s) cannot realloc guest_pages\n",
+   dev->ifname);
rte_free(old_pages);
return -1;
}
-- 
2.36.1



[PATCH 4/4] vhost: prefix logs with context

2022-06-27 Thread David Marchand
We recently improved the log messages in the vhost library, adding some
context that helps filtering for a given vhost-user device.
However, some parts of the code were missed, and some later code changes
broke this new convention (fixes were sent prior to this patch).

Change the VHOST_LOG_CONFIG/DATA helpers and always ask for a string
used as context. This should help limit regressions on this topic.

Most of the time, the context is the vhost-user device socket path.
For the rest when a vhost-user device can not be related, generic
names were chosen:
- "dma", for vhost-user async DMA operations,
- "device", for vhost-user device creation and lookup,
- "thread", for threads management,

Signed-off-by: David Marchand 
---
 lib/vhost/iotlb.c  |  30 +-
 lib/vhost/socket.c | 129 -
 lib/vhost/vdpa.c   |   4 +-
 lib/vhost/vhost.c  | 144 -
 lib/vhost/vhost.h  |  20 +-
 lib/vhost/vhost_user.c | 642 +
 lib/vhost/virtio_net.c | 258 +
 7 files changed, 634 insertions(+), 593 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 5a5ba8b82a..35b4193606 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -70,18 +70,18 @@ vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *
 
ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
if (ret) {
-   VHOST_LOG_CONFIG(DEBUG,
-   "(%s) IOTLB pool %s empty, clear entries for pending insertion\n",
-   dev->ifname, vq->iotlb_pool->name);
+   VHOST_LOG_CONFIG(dev->ifname, DEBUG,
+   "IOTLB pool %s empty, clear entries for pending insertion\n",
+   vq->iotlb_pool->name);
if (!TAILQ_EMPTY(&vq->iotlb_pending_list))
vhost_user_iotlb_pending_remove_all(vq);
else
vhost_user_iotlb_cache_random_evict(vq);
ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
if (ret) {
-   VHOST_LOG_CONFIG(ERR,
-   "(%s) IOTLB pool %s still empty, pending insertion failure\n",
-   dev->ifname, vq->iotlb_pool->name);
+   VHOST_LOG_CONFIG(dev->ifname, ERR,
+   "IOTLB pool %s still empty, pending insertion failure\n",
+   vq->iotlb_pool->name);
return;
}
}
@@ -169,18 +169,18 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
 
ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
if (ret) {
-   VHOST_LOG_CONFIG(DEBUG,
-   "(%s) IOTLB pool %s empty, clear entries for cache insertion\n",
-   dev->ifname, vq->iotlb_pool->name);
+   VHOST_LOG_CONFIG(dev->ifname, DEBUG,
+   "IOTLB pool %s empty, clear entries for cache insertion\n",
+   vq->iotlb_pool->name);
if (!TAILQ_EMPTY(&vq->iotlb_list))
vhost_user_iotlb_cache_random_evict(vq);
else
vhost_user_iotlb_pending_remove_all(vq);
ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
if (ret) {
-   VHOST_LOG_CONFIG(ERR,
-   "(%s) IOTLB pool %s still empty, cache insertion failed\n",
-   dev->ifname, vq->iotlb_pool->name);
+   VHOST_LOG_CONFIG(dev->ifname, ERR,
+   "IOTLB pool %s still empty, cache insertion failed\n",
+   vq->iotlb_pool->name);
return;
}
}
@@ -320,7 +320,7 @@ vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
 
snprintf(pool_name, sizeof(pool_name), "iotlb_%u_%d_%d",
getpid(), dev->vid, vq_index);
-   VHOST_LOG_CONFIG(DEBUG, "(%s) IOTLB cache name: %s\n", dev->ifname, pool_name);
+   VHOST_LOG_CONFIG(dev->ifname, DEBUG, "IOTLB cache name: %s\n", pool_name);
 
/* If already created, free it and recreate */
vq->iotlb_pool = rte_mempool_lookup(pool_name);
@@ -332,8 +332,8 @@ vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
RTE_MEMPOOL_F_NO_CACHE_ALIGN |
RTE_MEMPOOL_F_SP_PUT);
if (!vq->iotlb_pool) {
-   VHOST_LOG_CONFIG(ERR, "(%s) Failed to create IOTLB cache pool %s\n",
-   dev->ifname, pool_name);
+   VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to create IOTLB cache pool %s\n",
+   pool_name);
return -1;
}
 
diff --g

[PATCH 0/4] Vhost logs fixes and improvement

2022-06-27 Thread David Marchand
Here is a series that fixes log messages (with one regression being
fixed in patch 2) and changes the VHOST_LOG_* helpers to enforce that
vhost log messages will always have some context/prefix to help
debugging on setups with many vhost ports.

The first three patches are low risk and can probably be merged in
v22.07.


-- 
David Marchand

David Marchand (4):
  vhost: add some trailing newline in log messages
  vhost: restore device information in log messages
  vhost: improve some datapath log messages
  vhost: prefix logs with context

 lib/vhost/iotlb.c  |  30 +-
 lib/vhost/socket.c | 129 -
 lib/vhost/vdpa.c   |   4 +-
 lib/vhost/vhost.c  | 144 +-
 lib/vhost/vhost.h  |  20 +-
 lib/vhost/vhost_user.c | 639 +
 lib/vhost/virtio_net.c | 258 +
 7 files changed, 634 insertions(+), 590 deletions(-)

-- 
2.36.1



[Bug 1042] [dpdk-22.07](ABI) unit_tests_eal/link_bonding_rssconf: link_bonding_rssconf_autotest test failed

2022-06-27 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1042

Bug ID: 1042
   Summary: [dpdk-22.07](ABI) unit_tests_eal/link_bonding_rssconf:
link_bonding_rssconf_autotest test failed
   Product: DPDK
   Version: 22.03
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: examples
  Assignee: dev@dpdk.org
  Reporter: weix.l...@intel.com
  Target Milestone: ---

[Environment]

DPDK version: Use make showversion or for a non-released version: git remote -v
&& git show-ref --heads
commit 7cac53f205ebd04d8ebd3ee6a9dd84f698d4ada3 (HEAD -> main, tag: v22.07-rc2,
origin/main, origin/HEAD)
Author: Thomas Monjalon 
Date:   Mon Jun 27 04:03:44 2022 +0200

    version: 22.07-rc2
Signed-off-by: Thomas Monjalon 

Other software versions: N/A
OS: Red Hat Enterprise Linux 8.4 (Ootpa)/Linux 4.18.0-305.el8.x86_64
Compiler: gcc version 8.5.0 20210514 (Red Hat 8.5.0-4) (GCC)
Hardware platform: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
NIC hardware:  Intel Ethernet Controller XL710 for 40GbE QSFP+ 1583
NIC firmware:   i40e-4.18.0-305.el8.x86_64/8.70 0x8000c40f 1.3179.0

[Test Setup]
Steps to reproduce
List the steps to reproduce the issue.

1. Build the DPDK-22.07-rc2 lib with the following steps:
Note: /tmp/dpdk.tar.gz is the DPDK-22.07-rc2 packet.

tar zxfm /tmp/dpdk.tar.gz -C ~
cd ~/dpdk
cd .. && rm -rf dpdk_lib && mv dpdk dpdk_lib && cd dpdk_lib

rm -rf x86_64-native-linuxapp-gcc
CC=gcc meson -Denable_kmods=True -Dlibdir=lib  --default-library=shared
x86_64-native-linuxapp-gcc
ninja -C x86_64-native-linuxapp-gcc

rm -rf /root/tmp/dpdk_share_lib
DESTDIR=/root/tmp/dpdk_share_lib ninja -C x86_64-native-linuxapp-gcc -j 110
install

rm -rf /root/shared_lib_dpdk
mv /root/tmp/dpdk_share_lib/usr/local/lib /root/shared_lib_dpdk


2. Build the DPDK-21.11 APP with the following steps:
Note: /tmp/dpdk_abi.tar.gz is the DPDK-21.11 packet.

cd ..
tar zxf /tmp/dpdk_abi.tar.gz -C ~
cd ~/dpdk/
rm -rf x86_64-native-linuxapp-gcc
CC=gcc meson -Denable_kmods=True -Dlibdir=lib  --default-library=shared
x86_64-native-linuxapp-gcc
ninja -C x86_64-native-linuxapp-gcc

# delete the DPDK-21.11 target/lib and drivers directory
rm -rf x86_64-native-linuxapp-gcc/lib
rm -rf x86_64-native-linuxapp-gcc/drivers

3. Bind 2 NIC port to vfio-pci driver:

dpdk-devbind.py --force --bind=vfio-pci :18:00.0 :18:00.1

4. Start dpdk-test APP:

x86_64-native-linuxapp-gcc/app/test/dpdk-test -l 1-4 -n 4 -a :18:00.0 -a
:18:00.1 --file-prefix=dpdk_63552_20220624173253 -d
/root/shared_lib_dpdk

5. Execute `link_bonding_rssconf_autotest` command to test:

RTE>>link_bonding_rssconf_autotest


Show the output from the previous commands.

[root@abi80 dpdk]# x86_64-native-linuxapp-gcc/app/test/dpdk-test -l 1-4 -n 4 -a
:18:00.0 -a :18:00.1 --file-prefix=dpdk_63552_20220624173253 -d
/root/shared_lib_dpdk
EAL: Detected CPU lcores: 112
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/dpdk_63552_20220624173253/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found
for that size
EAL: VFIO support initialized
EAL: Using IOMMU type 1 (Type 1)
EAL: Ignore mapping IO port bar(1)
EAL: Ignore mapping IO port bar(4)
EAL: Probe PCI driver: net_i40e (8086:1583) device: :18:00.0 (socket 0)
i40e_GLQF_reg_init(): i40e device :18:00.0 changed global register
[0x002689a0]. original: 0x, new: 0x0029
i40e_GLQF_reg_init(): i40e device :18:00.0 changed global register
[0x00268ca4]. original: 0x1840, new: 0x9420
i40e_aq_debug_write_global_register(): i40e device :18:00.0 changed global
register [0x0026c7a0]. original: 0xa8, after: 0x28
EAL: Ignore mapping IO port bar(1)
EAL: Ignore mapping IO port bar(4)
EAL: Probe PCI driver: net_i40e (8086:1583) device: :18:00.1 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
APP: HPET is not enabled, using TSC as default timer
RTE>>link_bonding_rssconf_autotest
 + --- +
 + Test Suite : RSS Dynamic Configuration for Bonding Unit Test Suite
 + --- +
 + TestCase [ 0] : test_setup succeeded
Device with port_id=2 already stopped
Device with port_id=3 already stopped
Device with port_id=4 already stopped
Device with port_id=5 already stopped
bond_ethdev_promiscuous_disable(2684) - Failed to disable promiscuous mode for
port 2: Operation not supported
bond_ethdev_promiscuous_disable(2684) - Failed to disable promiscuous mode for
port 3: Operation not supported
bond_ethdev_promiscuous_disable(2684) - Failed to disable promiscuous mode for
port 4: Operation not supported
bond_ethdev_promiscuous_disable(2684) - Failed to disable promiscuous mode for
port 5: Operation not supported
bond_ethdev_allmulticast_disa

[PATCH 1/4] vhost: add some trailing newline in log messages

2022-06-27 Thread David Marchand
VHOST_LOG_* macros don't append a newline.
Add missing ones.

Fixes: e623e0c6d8a5 ("vhost: add reconnect ability")
Fixes: af1475918124 ("vhost: introduce API to start a specific driver")
Fixes: 2dfeebe26546 ("vhost: check return of mutex initialization")
Cc: sta...@dpdk.org

Signed-off-by: David Marchand 
---
 lib/vhost/socket.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index 7a0f63af14..24d60ca149 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -499,7 +499,7 @@ vhost_user_reconnect_init(void)
 
ret = pthread_mutex_init(&reconn_list.mutex, NULL);
if (ret < 0) {
-   VHOST_LOG_CONFIG(ERR, "%s: failed to initialize mutex", __func__);
+   VHOST_LOG_CONFIG(ERR, "%s: failed to initialize mutex\n", __func__);
return ret;
}
TAILQ_INIT(&reconn_list.head);
@@ -507,9 +507,9 @@ vhost_user_reconnect_init(void)
ret = rte_ctrl_thread_create(&reconn_tid, "vhost_reconn", NULL,
 vhost_user_client_reconnect, NULL);
if (ret != 0) {
-   VHOST_LOG_CONFIG(ERR, "failed to create reconnect thread");
+   VHOST_LOG_CONFIG(ERR, "failed to create reconnect thread\n");
if (pthread_mutex_destroy(&reconn_list.mutex))
-   VHOST_LOG_CONFIG(ERR, "%s: failed to destroy reconnect mutex", __func__);
+   VHOST_LOG_CONFIG(ERR, "%s: failed to destroy reconnect mutex\n", __func__);
}
 
return ret;
@@ -1170,8 +1170,8 @@ rte_vhost_driver_start(const char *path)
"vhost-events", NULL, fdset_event_dispatch,
&vhost_user.fdset);
if (ret != 0) {
-   VHOST_LOG_CONFIG(ERR, "(%s) failed to create fdset handling thread", path);
-
+   VHOST_LOG_CONFIG(ERR, "(%s) failed to create fdset handling thread\n",
+   path);
fdset_pipe_uninit(&vhost_user.fdset);
return -1;
}
-- 
2.36.1



[PATCH 3/4] vhost: improve some datapath log messages

2022-06-27 Thread David Marchand
Those messages were missed when adding socket context.
Fix this.

Signed-off-by: David Marchand 
---
 lib/vhost/vhost.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 4ebcb7448a..810bc71c9d 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -652,7 +652,7 @@ extern int vhost_data_log_level;
} \
snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), "\n"); \
\
-   VHOST_LOG_DATA(DEBUG, "%s", packet); \
+   VHOST_LOG_DATA(DEBUG, "(%s) %s", device->ifname, packet); \
 } while (0)
 #else
 #define PRINT_PACKET(device, addr, size, header) do {} while (0)
@@ -866,8 +866,8 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
vq->signalled_used = new;
vq->signalled_used_valid = true;
 
-   VHOST_LOG_DATA(DEBUG, "%s: used_event_idx=%d, old=%d, new=%d\n",
-   __func__,
+   VHOST_LOG_DATA(DEBUG, "(%s) %s: used_event_idx=%d, old=%d, new=%d\n",
+   dev->ifname, __func__,
vhost_used_event(vq),
old, new);
 
-- 
2.36.1



[PATCH 2/2] eventdev: add function to enq new events to the same queue

2022-06-27 Thread pbhagavatula
From: Pavan Nikhilesh 

Introduce a new fastpath function to enqueue events with type *OP_NEW*
to the same destination event queue.
This function can be used as a hint to the PMD to use an optimized
enqueue sequence.

Signed-off-by: Pavan Nikhilesh 
---
 lib/eventdev/eventdev_pmd.h  |  5 +-
 lib/eventdev/eventdev_private.c  | 13 ++
 lib/eventdev/rte_eventdev.h  | 80 +++-
 lib/eventdev/rte_eventdev_core.h | 11 -
 4 files changed, 105 insertions(+), 4 deletions(-)

diff --git a/lib/eventdev/eventdev_pmd.h b/lib/eventdev/eventdev_pmd.h
index 69402668d8..f0bb97fb89 100644
--- a/lib/eventdev/eventdev_pmd.h
+++ b/lib/eventdev/eventdev_pmd.h
@@ -178,7 +178,10 @@ struct rte_eventdev {
/**< Pointer to PMD eth Tx adapter enqueue function. */
event_crypto_adapter_enqueue_t ca_enqueue;
 
-   uint64_t reserved_64s[4]; /**< Reserved for future fields */
+   event_enqueue_queue_burst_t enqueue_new_same_dest;
+   /**< PMD enqueue burst queue new function to same destination queue. */
+
+   uint64_t reserved_64s[3]; /**< Reserved for future fields */
void *reserved_ptrs[3];   /**< Reserved for future fields */
 } __rte_cache_aligned;
 
diff --git a/lib/eventdev/eventdev_private.c b/lib/eventdev/eventdev_private.c
index 1d3d9d357e..53d1db281b 100644
--- a/lib/eventdev/eventdev_private.c
+++ b/lib/eventdev/eventdev_private.c
@@ -24,6 +24,17 @@ dummy_event_enqueue_burst(__rte_unused void *port,
return 0;
 }
 
+static uint16_t
+dummy_event_enqueue_queue_burst(__rte_unused void *port,
+   __rte_unused uint8_t queue,
+   __rte_unused const struct rte_event ev[],
+   __rte_unused uint16_t nb_events)
+{
+   RTE_EDEV_LOG_ERR(
+   "event enqueue burst requested for unconfigured event device");
+   return 0;
+}
+
 static uint16_t
 dummy_event_dequeue(__rte_unused void *port, __rte_unused struct rte_event *ev,
__rte_unused uint64_t timeout_ticks)
@@ -90,6 +101,7 @@ event_dev_fp_ops_reset(struct rte_event_fp_ops *fp_op)
.enqueue_burst = dummy_event_enqueue_burst,
.enqueue_new_burst = dummy_event_enqueue_burst,
.enqueue_forward_burst = dummy_event_enqueue_burst,
+   .enqueue_new_same_dest = dummy_event_enqueue_queue_burst,
.dequeue = dummy_event_dequeue,
.dequeue_burst = dummy_event_dequeue_burst,
.maintain = dummy_event_maintain,
@@ -111,6 +123,7 @@ event_dev_fp_ops_set(struct rte_event_fp_ops *fp_op,
fp_op->enqueue_burst = dev->enqueue_burst;
fp_op->enqueue_new_burst = dev->enqueue_new_burst;
fp_op->enqueue_forward_burst = dev->enqueue_forward_burst;
+   fp_op->enqueue_new_same_dest = dev->enqueue_new_same_dest;
fp_op->dequeue = dev->dequeue;
fp_op->dequeue_burst = dev->dequeue_burst;
fp_op->maintain = dev->maintain;
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index 6a6f6ea4c1..2aa563740b 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -425,8 +425,9 @@ struct rte_event_dev_info {
 * A device that does not support bulk dequeue will set this as 1.
 */
uint32_t max_event_port_enqueue_depth;
-   /**< Maximum number of events can be enqueued at a time from an
-* event port by this device.
+   /**< Maximum number of events that can be enqueued at a time to an
+* event port by this device, applicable when rte_event::op is either
+* *RTE_EVENT_OP_FORWARD* or *RTE_EVENT_OP_RELEASE*.
 * A device that does not support bulk enqueue will set this as 1.
 */
uint8_t max_event_port_links;
@@ -446,6 +447,12 @@ struct rte_event_dev_info {
 * device. These ports and queues are not accounted for in
 * max_event_ports or max_event_queues.
 */
+   int16_t max_event_port_enqueue_new_burst;
+   /**< Maximum number of events that can be enqueued at a time to an
+* event port by this device, applicable when rte_event::op is set to
+* *RTE_EVENT_OP_NEW*.
+* A device with no limits will set this value to -1.
+*/
 };
 
 /**
@@ -2082,6 +2089,75 @@ rte_event_enqueue_forward_burst(uint8_t dev_id, uint8_t 
port_id,
 fp_ops->enqueue_forward_burst);
 }
 
+/**
+ * Enqueue a burst of event objects of operation type *RTE_EVENT_OP_NEW* on
+ * an event device designated by its *dev_id* through the event port specified
+ * by *port_id* to the same queue specified by *queue_id*.
+ *
+ * Provides the same functionality as rte_event_enqueue_burst(), except that
+ * the application can use this API when all objects in the burst contain
+ * the enqueue operation of type *RTE_EVENT_OP_NEW* and are destined to the
+ * same queue. This specialized function can provide the addi

[PATCH 1/2] doc: add enqueue depth for new event type

2022-06-27 Thread pbhagavatula
From: Pavan Nikhilesh 

A new field ``max_event_port_enqueue_new_burst`` will be added to the
structure ``rte_event_dev_info``. The field defines the max enqueue
burst size of new events (OP_NEW) supported by the underlying event
device.

Signed-off-by: Pavan Nikhilesh 
---
 doc/guides/rel_notes/deprecation.rst | 5 +
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 4e5b23c53d..071317e8e3 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -125,3 +125,8 @@ Deprecation Notices
   applications should be updated to use the ``dmadev`` library instead,
   with the underlying HW-functionality being provided by the ``ioat`` or
   ``idxd`` dma drivers
+
+* eventdev: The structure ``rte_event_dev_info`` will be extended to include the
+  max enqueue burst size of new events supported by the underlying event device.
+  A new field ``max_event_port_enqueue_new_burst`` will be added to the structure
+  ``rte_event_dev_info`` in DPDK 22.11.
-- 
2.25.1



[PATCH v1 1/1] eventdev/eth_tx: use timestamp as dynamic mbuf field

2022-06-27 Thread Ganapati Kundapura
Added support to register timestamp dynamic field in mbuf.

Signed-off-by: Ganapati Kundapura 

diff --git a/lib/eventdev/rte_event_eth_tx_adapter.c b/lib/eventdev/rte_event_eth_tx_adapter.c
index c700fb7..23d5df3 100644
--- a/lib/eventdev/rte_event_eth_tx_adapter.c
+++ b/lib/eventdev/rte_event_eth_tx_adapter.c
@@ -74,6 +74,10 @@ do {\
} \
 } while (0)
 
+/* enable dynamic timestamp field in mbuf */
+uint64_t event_eth_tx_timestamp_dynflag;
+int event_eth_tx_timestamp_dynfield_offset = -1;
+
 /* Tx retry callback structure */
 struct txa_retry {
/* Ethernet port id */
@@ -197,7 +201,7 @@ static int
 txa_dev_id_array_init(void)
 {
if (txa_dev_id_array == NULL) {
-   int i;
+   int i, ret;
 
txa_dev_id_array = txa_memzone_array_get("txa_adapter_array",
sizeof(int),
@@ -207,6 +211,16 @@ txa_dev_id_array_init(void)
 
for (i = 0; i < RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE; i++)
txa_dev_id_array[i] = TXA_INVALID_DEV_ID;
+
+   /* Register mbuf dynamic timestamp field */
+   ret = rte_mbuf_dyn_tx_timestamp_register(
+   &event_eth_tx_timestamp_dynfield_offset,
+   &event_eth_tx_timestamp_dynflag);
+   if (ret != 0) {
+   RTE_EDEV_LOG_ERR("Error registering timestamp "
+"field/flag");
+   return -ENOMEM;
+   }
}
 
return 0;
diff --git a/lib/eventdev/rte_event_eth_tx_adapter.h b/lib/eventdev/rte_event_eth_tx_adapter.h
index 3908c2d..12e80a9 100644
--- a/lib/eventdev/rte_event_eth_tx_adapter.h
+++ b/lib/eventdev/rte_event_eth_tx_adapter.h
@@ -77,9 +77,19 @@ extern "C" {
 #include 
 
 #include 
+#include 
 
 #include "rte_eventdev.h"
 
+extern int event_eth_tx_timestamp_dynfield_offset;
+
+static inline rte_mbuf_timestamp_t *
+rte_event_eth_tx_timestamp_dynfield(struct rte_mbuf *mbuf)
+{
+   return RTE_MBUF_DYNFIELD(mbuf,
+   event_eth_tx_timestamp_dynfield_offset, rte_mbuf_timestamp_t *);
+}
+
 /**
  * Adapter configuration structure
  *
-- 
2.6.4
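
A hypothetical usage sketch from the application side (the surrounding
mbuf handling is assumed; only the dynfield/dynflag names come from the
patch):

	/* stamp a packet before handing it to the Tx adapter (sketch only) */
	if (event_eth_tx_timestamp_dynflag != 0) {
		*rte_event_eth_tx_timestamp_dynfield(mbuf) = rte_get_tsc_cycles();
		mbuf->ol_flags |= event_eth_tx_timestamp_dynflag;
	}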



[PATCH v5] gro: bug fix in identifying fragmented packets

2022-06-27 Thread Kumara Parameshwaran
From: Kumara Parameshwaran 

A packet with RTE_PTYPE_L4_FRAG (0x300) contains both RTE_PTYPE_L4_TCP
(0x100) & RTE_PTYPE_L4_UDP (0x200). A fragmented packet as defined in
rte_mbuf_ptype.h cannot be recognized as other L4 types and hence the
GRO layer should not use IS_IPV4_TCP_PKT or IS_IPV4_UDP_PKT for
RTE_PTYPE_L4_FRAG. Hence, if the packet type is RTE_PTYPE_L4_FRAG, the
IP header should be parsed to recognize the appropriate IP type and to
invoke the respective GRO handler.
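
Spelled out with the ptype values from rte_mbuf_ptype.h, the overlap
looks like this:

	/* RTE_PTYPE_L4_TCP  = 0x100
	 * RTE_PTYPE_L4_UDP  = 0x200
	 * RTE_PTYPE_L4_FRAG = 0x300 = RTE_PTYPE_L4_TCP | RTE_PTYPE_L4_UDP
	 * so a fragment satisfies (ptype & RTE_PTYPE_L4_TCP) == RTE_PTYPE_L4_TCP
	 * unless the FRAG bits are excluded explicitly, as the patch does. */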

Fixes: 1ca5e6740852 ("gro: support UDP/IPv4")
Cc: sta...@dpdk.org
Signed-off-by: Kumara Parameshwaran 
---
v1:
* Introduce IS_IPV4_FRAGMENT macro to check if fragmented packet and
  if true extract the IP header to identify the protocol type and
  invoke the appropriate gro handler. This is done for both
  rte_gro_reassemble and rte_gro_reassemble_burst APIs.
v2,v3,v4:
* Fix extra whitespace and column limit warnings
v5
* Use RTE_PTYPE_L4_FRAG to identify the fragmented packets in 
  IS_IPV4_TCP_PKT and IS_IPV4_VXLAN_TCP4_PKT

 lib/gro/rte_gro.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/gro/rte_gro.c b/lib/gro/rte_gro.c
index 6f7dd4d709..e35399fd42 100644
--- a/lib/gro/rte_gro.c
+++ b/lib/gro/rte_gro.c
@@ -32,6 +32,7 @@ static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM] = {
 
 #define IS_IPV4_TCP_PKT(ptype) (RTE_ETH_IS_IPV4_HDR(ptype) && \
((ptype & RTE_PTYPE_L4_TCP) == RTE_PTYPE_L4_TCP) && \
+   ((ptype & RTE_PTYPE_L4_FRAG) != RTE_PTYPE_L4_FRAG) && \
(RTE_ETH_IS_TUNNEL_PKT(ptype) == 0))
 
 #define IS_IPV4_UDP_PKT(ptype) (RTE_ETH_IS_IPV4_HDR(ptype) && \
@@ -40,6 +41,7 @@ static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM] = {
 
 #define IS_IPV4_VXLAN_TCP4_PKT(ptype) (RTE_ETH_IS_IPV4_HDR(ptype) && \
((ptype & RTE_PTYPE_L4_UDP) == RTE_PTYPE_L4_UDP) && \
+   ((ptype & RTE_PTYPE_L4_FRAG) != RTE_PTYPE_L4_FRAG) && \
((ptype & RTE_PTYPE_TUNNEL_VXLAN) == \
 RTE_PTYPE_TUNNEL_VXLAN) && \
((ptype & RTE_PTYPE_INNER_L4_TCP) == \
-- 
2.25.1



RE: [PATCH v4] net: fix checksum with unaligned buffer

2022-06-27 Thread Morten Brørup
> From: Emil Berg [mailto:emil.b...@ericsson.com]
> Sent: Monday, 27 June 2022 09.57
> 
> > From: Morten Brørup 
> > Sent: den 23 juni 2022 14:51
> >
> > > From: Morten Brørup [mailto:m...@smartsharesystems.com]
> > > Sent: Thursday, 23 June 2022 14.39
> > >
> > > With this patch, the checksum can be calculated on an unaligned
> buffer.
> > > I.e. the buf parameter is no longer required to be 16 bit aligned.
> > >
> > > The checksum is still calculated using a 16 bit aligned pointer, so
> > > the compiler can auto-vectorize the function's inner loop.
> > >
> > > When the buffer is unaligned, the first byte of the buffer is
> handled
> > > separately. Furthermore, the calculated checksum of the buffer is
> byte
> > > shifted before being added to the initial checksum, to compensate
> for
> > > the checksum having been calculated on the buffer shifted by one
> byte.
> > >
> > > v4:
> > > * Add copyright notice.
> > > * Include stdbool.h (Emil Berg).
> > > * Use RTE_PTR_ADD (Emil Berg).
> > > * Fix one more typo in commit message. Is 'unligned' even a word?
> > > v3:
> > > * Remove braces from single statement block.
> > > * Fix typo in commit message.
> > > v2:
> > > * Do not assume that the buffer is part of an aligned packet
> buffer.
> > >
> > > Bugzilla ID: 1035
> > > Cc: sta...@dpdk.org
> > >
> > > Signed-off-by: Morten Brørup 
> > > Tested-by: Emil Berg 
> > > ---
> > >  lib/net/rte_ip.h | 32 +++-
> > >  1 file changed, 27 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
> > > b502481670..738d643da0 100644
> > > --- a/lib/net/rte_ip.h
> > > +++ b/lib/net/rte_ip.h
> > > @@ -3,6 +3,7 @@
> > >   *  The Regents of the University of California.
> > >   * Copyright(c) 2010-2014 Intel Corporation.
> > >   * Copyright(c) 2014 6WIND S.A.
> > > + * Copyright(c) 2022 SmartShare Systems.
> > >   * All rights reserved.
> > >   */
> > >
> > > @@ -15,6 +16,7 @@
> > >   * IP-related defines
> > >   */
> > >
> > > +#include 
> > >  #include 
> > >
> > >  #ifdef RTE_EXEC_ENV_WINDOWS
> > > @@ -162,20 +164,40 @@ __rte_raw_cksum(const void *buf, size_t len,
> > > uint32_t sum)  {
> > >   /* extend strict-aliasing rules */
> > >   typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > > - const u16_p *u16_buf = (const u16_p *)buf;
> > > - const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > + const u16_p *u16_buf;
> > > + const u16_p *end;
> > > + uint32_t bsum = 0;
> > > + const bool unaligned = (uintptr_t)buf & 1;
> > > +
> > > + /* if buffer is unaligned, keeping it byte order independent */
> > > + if (unlikely(unaligned)) {
> > > + uint16_t first = 0;
> > > + if (unlikely(len == 0))
> > > + return 0;
> > > >> +  ((unsigned char *)&first)[1] = *(const unsigned char *)buf;
> > > + bsum += first;
> > > + buf = RTE_PTR_ADD(buf, 1);
> > > + len--;
> > > + }
> > >
> > > + /* aligned access for compiler auto-vectorization */
> > > + u16_buf = (const u16_p *)buf;
> > > + end = u16_buf + len / sizeof(*u16_buf);
> > >   for (; u16_buf != end; ++u16_buf)
> > > - sum += *u16_buf;
> > > + bsum += *u16_buf;
> > >
> > >   /* if length is odd, keeping it byte order independent */
> > >   if (unlikely(len % 2)) {
> > >   uint16_t left = 0;
> > > >>   *(unsigned char *)&left = *(const unsigned char *)end;
> > > - sum += left;
> > > + bsum += left;
> > >   }
> > >
> > > - return sum;
> > > + /* if buffer is unaligned, swap the checksum bytes */
> > > + if (unlikely(unaligned))
> > > >> + bsum = (bsum & 0xFF00FF00) >> 8 | (bsum & 0x00FF00FF) << 8;
> > > +
> > > + return sum + bsum;
> > >  }
> > >
> > >  /**
> > > --
> > > 2.17.1
> >
> > @Emil, thank you for thoroughly reviewing the previous versions.
> >
> > If your test succeeds and you are satisfied with the patch, remember
> to reply
> > with a "Tested-by" tag for patchwork.
> 
> The test succeeded and I'm satisfied with the patch. I added 'Tested-
> by: Emil Berg ' to the patch above, hopefully
> as you intended.

Thank you for testing! You don't need to put the tag inside the patch. Next 
time, just put the tag in your reply, like here, and Patchwork will catch it.

Tested-by: Emil Berg 

The same goes for other tags, like Acked-by, Reviewed-by, etc..

-Morten


Service core statistics MT safety

2022-06-27 Thread Mattias Rönnblom
Hi.

Is it safe to enable stats on MT safe services?

https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366

It seems to me this would have to be an __atomic_add for this code to 
produce deterministic results.
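
A minimal sketch of the atomic alternative (field and variable names are
assumed from rte_service.c, and relaxed ordering is assumed to be
sufficient for statistics counters):

	/* instead of the plain read-modify-write updates: */
	__atomic_fetch_add(&s->cycles_spent, end - start, __ATOMIC_RELAXED);
	__atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);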

Best regards,
Mattias


RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path

2022-06-27 Thread Ruifeng Wang
> -Original Message-
> From: Slava Ovsiienko 
> Sent: Monday, June 20, 2022 1:38 PM
> To: Ali Alnubani ; Ruifeng Wang
> ; Matan Azrad 
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> ; sta...@dpdk.org; nd 
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
> 
> Hi, Ruifeng

Hi Slava,

Thanks for your review.
> 
> My apologies for review delay.

Apologies too. I was on something else.

> As far I understand the hypothetical problem scenario is:
> - CPU core reorders reading of qwords of 16B vector
> - core reads the second 8B of CQE (old CQE values)
> - CQE update
> - core reads the first 8B of CQE (new CQE values)

Yes, This is the problem.
> 
> How the re-reading of CQEs can resolve the issue?
> This wrong scenario might happen on the second read and we would run into
> the same issue.

Here we are trying to order the reading of a 16B vector (the high 8B
containing op_own, and the low 8B without op_own).
The first read loads the full 16B. The second read loads and updates the
low 8B (no op_own).
There are 2 possible statuses indicated by op_own: valid and invalid.
If the CQE status is invalid, there is no problem; it will be ignored this time.
If the CQE status is valid, the second read ensures the rest of the CQE is
no older than the high 8B (with op_own).
Assuming the NIC updates op_own no earlier than the rest of the CQE, I think
the second read ensures the CQE content retrieved is correct.
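
A simplified sketch of that sequence (NEON intrinsics; the variable
names are illustrative and the actual patch code may differ):

	/* load both halves of the CQE, then refresh the low half after a
	 * read barrier, so it is no older than the op_own half already seen */
	uint64x2_t cqe = vld1q_u64((const uint64_t *)cqe_addr);
	rte_io_rmb();
	cqe = vld1q_lane_u64((const uint64_t *)cqe_addr, cqe, 0);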

> 
> In my opinion, the right solution to cover potential reordering should be:
> - read CQE
> - check CQE status (first 8B)

We don't need to check CQE status at the moment. See explanation above.
> - read memory barrier
> - read the rest of CQE
> 
> With best regards,
> Slava
> 
> > -Original Message-
> > From: Ali Alnubani 
> > Sent: Thursday, May 19, 2022 17:56
> > To: Ruifeng Wang ; Matan Azrad
> > ; Slava Ovsiienko 
> > Cc: dev@dpdk.org; honnappa.nagaraha...@arm.com; sta...@dpdk.org;
> > n...@arm.com
> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > > -Original Message-
> > > From: Ruifeng Wang 
> > > Sent: Tuesday, January 4, 2022 5:01 AM
> > > To: Matan Azrad ; Slava Ovsiienko
> > > 
> > > Cc: dev@dpdk.org; honnappa.nagaraha...@arm.com; sta...@dpdk.org;
> > > n...@arm.com; Ruifeng Wang 
> > > Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > > vector path
> > >
> > > In NEON vector PMD, vector load loads two contiguous 8B of
> > > descriptor data into vector register. Given vector load ensures no
> > > 16B atomicity, read of the word that includes op_own field could be
> > > reordered after read of other words. In this case, some words could
> > > contain invalid data.
> > >
> > > Reloaded qword0 after read barrier to update vector register. This
> > > ensures that the fetched data is correct.
> > >
> > > Testpmd single core test on N1SDP/ThunderX2 showed no performance
> > > drop.
> > >
> > > Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> > > completions")
> > > Cc: sta...@dpdk.org
> > >
> > > Signed-off-by: Ruifeng Wang 
> > > ---
> >
> > Tested with BlueField-2 and didn't see a performance impact.
> >
> > Tested-by: Ali Alnubani 
> >
> > Thanks,
> > Ali


Re: [PATCH] ip_frag: replace the rte memcpy with memcpy

2022-06-27 Thread Liang Ma
On Fri, Jun 24, 2022 at 06:25:10PM +0100, Konstantin Ananyev wrote:
> 23/06/2022 03:35, Stephen Hemminger пишет:
> > On Wed, 22 Jun 2022 23:49:39 +0100
> > Konstantin Ananyev  wrote:
> > 
> > > > @@ -26,7 +25,7 @@ static inline void __fill_ipv4hdr_frag(struct 
> > > > rte_ipv4_hdr *dst,
> > > > const struct rte_ipv4_hdr *src, uint16_t header_len,
> > > > uint16_t len, uint16_t fofs, uint16_t dofs, uint32_t mf)
> > > >{
> > > > -   rte_memcpy(dst, src, header_len);
> > > > +   memcpy(dst, src, header_len);
> > > 
> > > 
> > > I am fine with replacements in test and inside the lib, for cases
> > > where 'len' parameter is constant value.
> > > Though as I said before, here 'header_len' is not a constant value.
> > > Are you sure it will not introduce any performance regression?
> > 
> > Do you have any performance tests. The ip header options are very small.
> 
> 
> From my experience - usually it is not about how big or small amount
> we need to copy. It is about can compiler evaluate 'size' parameter
> for memcpy() at compilation time or not.
> If it can, great - it will most likely replace memcpy()
> with some really well optimized code.
> If not it has to generate a proper call to actual
> memcpy() function. Which again, can be well optimized, but the
> overhead of the function call itself can still be noticeable,
> specially for small copies.
> Anyway, as I can see, David already integrated these changes anyway.
> So now, we'll have to wait and see would anyone complain or not.
> About performance testing, the only one I am aware about:
> examples/ip_fragmentation
> 
> Konstantin
> 
> 
For some small (<1k) sizes, "rep movsb" is very fast, much faster than I
expected. I just noticed this in another application.
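
To illustrate the point about compile-time-constant sizes, a standalone
sketch (not code from the patch):

	#include <string.h>
	#include <rte_ip.h>

	/* size known at compile time: compilers typically expand the copy inline */
	static inline void
	copy_fixed_hdr(struct rte_ipv4_hdr *dst, const struct rte_ipv4_hdr *src)
	{
		memcpy(dst, src, sizeof(*dst));
	}

	/* size only known at run time (header_len may include IPv4 options):
	 * this can end up as a real call into libc memcpy() */
	static inline void
	copy_var_hdr(struct rte_ipv4_hdr *dst, const struct rte_ipv4_hdr *src,
			uint16_t header_len)
	{
		memcpy(dst, src, header_len);
	}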



Re: [PATCH v4] net: fix checksum with unaligned buffer

2022-06-27 Thread Mattias Rönnblom

On 2022-06-23 14:51, Morten Brørup wrote:

From: Morten Brørup [mailto:m...@smartsharesystems.com]
Sent: Thursday, 23 June 2022 14.39

With this patch, the checksum can be calculated on an unaligned buffer.
I.e. the buf parameter is no longer required to be 16 bit aligned.

The checksum is still calculated using a 16 bit aligned pointer, so the
compiler can auto-vectorize the function's inner loop.

When the buffer is unaligned, the first byte of the buffer is handled
separately. Furthermore, the calculated checksum of the buffer is byte
shifted before being added to the initial checksum, to compensate for
the
checksum having been calculated on the buffer shifted by one byte.

v4:
* Add copyright notice.
* Include stdbool.h (Emil Berg).
* Use RTE_PTR_ADD (Emil Berg).
* Fix one more typo in commit message. Is 'unligned' even a word?
v3:
* Remove braces from single statement block.
* Fix typo in commit message.
v2:
* Do not assume that the buffer is part of an aligned packet buffer.

Bugzilla ID: 1035
Cc: sta...@dpdk.org

Signed-off-by: Morten Brørup 
---
  lib/net/rte_ip.h | 32 +++-
  1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index b502481670..738d643da0 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -3,6 +3,7 @@
   *  The Regents of the University of California.
   * Copyright(c) 2010-2014 Intel Corporation.
   * Copyright(c) 2014 6WIND S.A.
+ * Copyright(c) 2022 SmartShare Systems.
   * All rights reserved.
   */

@@ -15,6 +16,7 @@
   * IP-related defines
   */

+#include 
  #include 

  #ifdef RTE_EXEC_ENV_WINDOWS
@@ -162,20 +164,40 @@ __rte_raw_cksum(const void *buf, size_t len,
uint32_t sum)
  {
/* extend strict-aliasing rules */
typedef uint16_t __attribute__((__may_alias__)) u16_p;
-   const u16_p *u16_buf = (const u16_p *)buf;
-   const u16_p *end = u16_buf + len / sizeof(*u16_buf);
+   const u16_p *u16_buf;
+   const u16_p *end;
+   uint32_t bsum = 0;
+   const bool unaligned = (uintptr_t)buf & 1;
+
+   /* if buffer is unaligned, keeping it byte order independent */
+   if (unlikely(unaligned)) {
+   uint16_t first = 0;
+   if (unlikely(len == 0))
+   return 0;
+   ((unsigned char *)&first)[1] = *(const unsigned char *)buf;
+   bsum += first;
+   buf = RTE_PTR_ADD(buf, 1);
+   len--;
+   }

+   /* aligned access for compiler auto-vectorization */


The compiler will be able to auto vectorize even unaligned accesses, 
just with different instructions. From what I can tell, there's no 
performance impact, at least not on the x86_64 systems I tried on.


I think you should remove the first special case conditional and use 
memcpy() instead of the cumbersome __may_alias__ construct to retrieve 
the data.
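
For reference, a standalone sketch of the suggested memcpy() variant
(assuming <string.h>, <stdint.h> and <stddef.h>; this is not the code
under review, and the odd-length handling follows the existing function):

	static inline uint32_t
	raw_cksum_memcpy(const void *buf, size_t len, uint32_t sum)
	{
		const unsigned char *p = buf;
		size_t i;

		for (i = 0; i + 1 < len; i += 2) {
			uint16_t v;

			/* a fixed 2-byte memcpy() typically compiles to one load */
			memcpy(&v, p + i, sizeof(v));
			sum += v;
		}
		/* if length is odd, keep it byte order independent */
		if (len % 2) {
			uint16_t left = 0;

			*(unsigned char *)&left = p[len - 1];
			sum += left;
		}
		return sum;
	}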



+   u16_buf = (const u16_p *)buf;
+   end = u16_buf + len / sizeof(*u16_buf);
for (; u16_buf != end; ++u16_buf)
-   sum += *u16_buf;
+   bsum += *u16_buf;

/* if length is odd, keeping it byte order independent */
if (unlikely(len % 2)) {
uint16_t left = 0;
*(unsigned char *)&left = *(const unsigned char *)end;
-   sum += left;
+   bsum += left;
}

-   return sum;
+   /* if buffer is unaligned, swap the checksum bytes */
+   if (unlikely(unaligned))
+   bsum = (bsum & 0xFF00FF00) >> 8 | (bsum & 0x00FF00FF) << 8;
+
+   return sum + bsum;
  }

  /**
--
2.17.1


@Emil, thank you for thoroughly reviewing the previous versions.

If your test succeeds and you are satisfied with the patch, remember to reply with a 
"Tested-by" tag for patchwork.



RE: Service core statistics MT safety

2022-06-27 Thread Morten Brørup
> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
> Sent: Monday, 27 June 2022 13.06
> 
> Hi.
> 
> Is it safe to enable stats on MT safe services?
> 
> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L36
> 6
> 
> It seems to me this would have to be an __atomic_add for this code to
> produce deterministic results.

I agree. The same goes for the 'calls' field.



[PATCH] lib/hash: fix the return value description of rte_hash

2022-06-27 Thread Chenming C
The rte_hash lookup can return ZERO, which is not a positive value.
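
For example, a caller sketch showing why the wording matters (handle,
key and user_data_array are hypothetical):

	int ret = rte_hash_lookup(handle, &key);

	if (ret >= 0)			/* 0 is a valid key id: test >= 0, not > 0 */
		data = user_data_array[ret];
	else if (ret == -ENOENT)
		data = NULL;		/* key not found */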

Signed-off-by: Chenming C 
---
 lib/hash/rte_hash.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/hash/rte_hash.h b/lib/hash/rte_hash.h
index 7fa9702..1bf1aac 100644
--- a/lib/hash/rte_hash.h
+++ b/lib/hash/rte_hash.h
@@ -288,7 +288,7 @@ struct rte_hash *
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOSPC if there is no space in the hash for this key.
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  * array of user data. This value is unique for this key. This
  * unique key id may be larger than the user specified entry count
  * when RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD flag is set.
@@ -312,7 +312,7 @@ struct rte_hash *
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOSPC if there is no space in the hash for this key.
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  * array of user data. This value is unique for this key. This
  * unique key ID may be larger than the user specified entry count
  * when RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD flag is set.
@@ -343,7 +343,7 @@ struct rte_hash *
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOENT if the key is not found.
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  * array of user data. This value is unique for this key, and is the same
  * value that was returned when the key was added.
  */
@@ -375,7 +375,7 @@ struct rte_hash *
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOENT if the key is not found.
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  * array of user data. This value is unique for this key, and is the same
  * value that was returned when the key was added.
  */
@@ -442,7 +442,7 @@ struct rte_hash *
  * @param data
  *   Output with pointer to data returned from the hash table.
  * @return
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  * array of user data. This value is unique for this key, and is the same
  * value that was returned when the key was added.
  *   - -EINVAL if the parameters are invalid.
@@ -467,7 +467,7 @@ struct rte_hash *
  * @param data
  *   Output with pointer to data returned from the hash table.
  * @return
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  * array of user data. This value is unique for this key, and is the same
  * value that was returned when the key was added.
  *   - -EINVAL if the parameters are invalid.
@@ -490,7 +490,7 @@ struct rte_hash *
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOENT if the key is not found.
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  * array of user data. This value is unique for this key, and is the same
  * value that was returned when the key was added.
  */
@@ -512,7 +512,7 @@ struct rte_hash *
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOENT if the key is not found.
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  * array of user data. This value is unique for this key, and is the same
  * value that was returned when the key was added.
  */
-- 
1.8.3.1




RE: [PATCH v4] net: fix checksum with unaligned buffer

2022-06-27 Thread Morten Brørup
> From: Emil Berg [mailto:emil.b...@ericsson.com]
> Sent: Monday, 27 June 2022 14.51
> 
> > From: Emil Berg
> > Sent: den 27 juni 2022 14:46
> >
> > > From: Mattias Rönnblom 
> > > Sent: den 27 juni 2022 14:28
> > >
> > > On 2022-06-23 14:51, Morten Brørup wrote:
> > > >> From: Morten Brørup [mailto:m...@smartsharesystems.com]
> > > >> Sent: Thursday, 23 June 2022 14.39
> > > >>
> > > >> With this patch, the checksum can be calculated on an unaligned
> buffer.
> > > >> I.e. the buf parameter is no longer required to be 16 bit
> aligned.
> > > >>
> > > >> The checksum is still calculated using a 16 bit aligned pointer,
> so
> > > >> the compiler can auto-vectorize the function's inner loop.
> > > >>
> > > >> When the buffer is unaligned, the first byte of the buffer is
> > > >> handled separately. Furthermore, the calculated checksum of the
> > > >> buffer is byte shifted before being added to the initial
> checksum,
> > > >> to compensate for the checksum having been calculated on the
> buffer
> > > >> shifted by one byte.
> > > >>
> > > >> v4:
> > > >> * Add copyright notice.
> > > >> * Include stdbool.h (Emil Berg).
> > > >> * Use RTE_PTR_ADD (Emil Berg).
> > > >> * Fix one more typo in commit message. Is 'unligned' even a
> word?
> > > >> v3:
> > > >> * Remove braces from single statement block.
> > > >> * Fix typo in commit message.
> > > >> v2:
> > > >> * Do not assume that the buffer is part of an aligned packet
> buffer.
> > > >>
> > > >> Bugzilla ID: 1035
> > > >> Cc: sta...@dpdk.org
> > > >>
> > > >> Signed-off-by: Morten Brørup 
> > > >> ---
> > > >>   lib/net/rte_ip.h | 32 +++-
> > > >>   1 file changed, 27 insertions(+), 5 deletions(-)
> > > >>
> > > >> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
> > > >> b502481670..738d643da0 100644
> > > >> --- a/lib/net/rte_ip.h
> > > >> +++ b/lib/net/rte_ip.h
> > > >> @@ -3,6 +3,7 @@
> > > >>*  The Regents of the University of California.
> > > >>* Copyright(c) 2010-2014 Intel Corporation.
> > > >>* Copyright(c) 2014 6WIND S.A.
> > > >> + * Copyright(c) 2022 SmartShare Systems.
> > > >>* All rights reserved.
> > > >>*/
> > > >>
> > > >> @@ -15,6 +16,7 @@
> > > >>* IP-related defines
> > > >>*/
> > > >>
> > > >> +#include 
> > > >>   #include 
> > > >>
> > > >>   #ifdef RTE_EXEC_ENV_WINDOWS
> > > >> @@ -162,20 +164,40 @@ __rte_raw_cksum(const void *buf, size_t len,
> > > >> uint32_t sum)
> > > >>   {
> > > >>/* extend strict-aliasing rules */
> > > >>typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > > >> -  const u16_p *u16_buf = (const u16_p *)buf;
> > > >> -  const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > >> +  const u16_p *u16_buf;
> > > >> +  const u16_p *end;
> > > >> +  uint32_t bsum = 0;
> > > >> +  const bool unaligned = (uintptr_t)buf & 1;
> > > >> +
> > > >> +  /* if buffer is unaligned, keeping it byte order independent */
> > > >> +  if (unlikely(unaligned)) {
> > > >> +  uint16_t first = 0;
> > > >> +  if (unlikely(len == 0))
> > > >> +  return 0;
> > > >> +  ((unsigned char *)&first)[1] = *(const unsigned char *)buf;
> > > >> +  bsum += first;
> > > >> +  buf = RTE_PTR_ADD(buf, 1);
> > > >> +  len--;
> > > >> +  }
> > > >>
> > > >> +  /* aligned access for compiler auto-vectorization */
> > >
> > > The compiler will be able to auto vectorize even unaligned
> accesses,
> > > just with different instructions. From what I can tell, there's no
> > > performance impact, at least not on the x86_64 systems I tried on.
> > >
> > > I think you should remove the first special case conditional and
> use
> > > memcpy() instead of the cumbersome __may_alias__ construct to
> retrieve
> > > the data.
> > >
> >
> > Here:
> > https://www.agner.org/optimize/instruction_tables.pdf
> > it lists the latency of vmovdqa (aligned) as 6 cycles and the latency
> for
> > vmovdqu (unaligned) as 7 cycles. So I guess there can be some
> difference.
> > Although in practice I'm not sure what difference it makes. I've not
> seen any
> > difference in runtime between the two versions.
> >
> 
> Correction to my comment:
> Those stats are for some older CPU. For some newer CPUs such as Tiger
> Lake the stats seem to be the same regardless of aligned or unaligned.
> 

I agree that the memcpy method is more elegant and easy to read.

However, we would need to performance test the modified checksum function with 
a large number of CPUs to prove that we don't introduce a performance 
regression on any CPU architecture still supported by DPDK. And Emil already 
found a CPU where it costs 1 extra cycle per 16 bytes, which adds up to a total 
of ca. 91 extra cycles on a 1460 byte TCP packet.

So I opted for a solution with zero changes to the inner loop, so no 
performance retesting is required (for the previously supported use cases, 
where the buffer is aligned).
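
A minimal, endian-independent sketch of the identity this relies on (RFC 1071
byte-order independence: summing the stream shifted by one byte yields the
byte-swapped sum). The helper names are illustrative, not the patch code:

#include <stdint.h>
#include <stddef.h>

/* fold a 32-bit accumulator to a 16-bit ones'-complement sum */
static uint16_t
fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* reference: ones'-complement sum of len bytes as big-endian 16-bit
 * words, byte-at-a-time so alignment does not matter */
static uint16_t
sum16_be(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* zero-pad the tail */
	return fold16(sum);
}

/* same result for an odd start address, computed the patch's way: sum
 * from buf + 1 (which an implementation can do with aligned, vectorizable
 * loads), byte-swap the folded partial sum to undo the one-byte shift,
 * then add the first byte back in its true high-byte position;
 * assumes len >= 1 */
static uint16_t
sum16_shifted(const uint8_t *buf, size_t len)
{
	uint16_t part = sum16_be(buf + 1, len - 1);

	part = (uint16_t)((part << 8) | (part >> 8));
	return fold16((uint32_t)part + ((uint32_t)buf[0] << 8));
}

sum16_shifted() and sum16_be() agree for any buffer and length, which is why
only the first byte and one final swap are needed to support the unaligned
case without touching the inner loop.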

RE: [PATCH] net/af_xdp: make compatible with libbpf v0.8.0

2022-06-27 Thread Loftus, Ciara
> 
> On 6/24/22 13:23, Ciara Loftus wrote:
> > libbpf v0.8.0 deprecates the bpf_get_link_xdp_id and bpf_set_link_xdp_fd
> > functions. Use meson to detect if libbpf >= v0.7.0 is linked and if so, use
> > the recommended replacement functions bpf_xdp_query_id,
> bpf_xdp_attach
> > and bpf_xdp_detach which are available to use since libbpf v0.7.0.
> >
> > Also prevent linking with libbpf versions > v0.8.0.
> >
> > Signed-off-by: Ciara Loftus 
> > ---
> >   doc/guides/nics/af_xdp.rst  |  3 ++-
> >   drivers/net/af_xdp/compat.h | 36
> -
> >   drivers/net/af_xdp/meson.build  |  7 ++
> >   drivers/net/af_xdp/rte_eth_af_xdp.c | 19 +++
> >   4 files changed, 42 insertions(+), 23 deletions(-)
> 
> Don't we need to mention these changes in release notes?
> 
> >
> > diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> > index 56681c8365..9edb48df67 100644
> > --- a/doc/guides/nics/af_xdp.rst
> > +++ b/doc/guides/nics/af_xdp.rst
> > @@ -43,7 +43,8 @@ Prerequisites
> >   This is a Linux-specific PMD, thus the following prerequisites apply:
> >
> >   *  A Linux Kernel (version > v4.18) with XDP sockets configuration 
> > enabled;
> > -*  Both libxdp >=v1.2.2 and libbpf libraries installed, or, libbpf <=v0.6.0
> > +*  Both libxdp >=v1.2.2 and libbpf <=v0.8.0 libraries installed, or, libbpf
> > +   <=v0.6.0.
> >   *  If using libxdp, it requires an environment variable called
> >  LIBXDP_OBJECT_PATH to be set to the location of where libxdp placed its
> bpf
> >  object files. This is usually in /usr/local/lib/bpf or 
> > /usr/local/lib64/bpf.
> > diff --git a/drivers/net/af_xdp/compat.h b/drivers/net/af_xdp/compat.h
> > index 28ea64aeaa..8f4ac8b5ea 100644
> > --- a/drivers/net/af_xdp/compat.h
> > +++ b/drivers/net/af_xdp/compat.h
> > @@ -60,7 +60,7 @@ tx_syscall_needed(struct xsk_ring_prod *q
> __rte_unused)
> >   }
> >   #endif
> >
> > -#ifdef RTE_NET_AF_XDP_LIBBPF_OBJ_OPEN
> > +#ifdef RTE_NET_AF_XDP_LIBBPF_V070
> 
> Typically version-based checks are considered bad practice. Isn't it
> better to use feature-based checks/defines?

Hi Andrew,

Thank you for the feedback. Is the feature-based checking something that we can 
push to the next release?

We are already using the pkg-config version-check method for other 
libraries/features in the meson.build file:
* libxdp >= v1.2.2 # earliest compatible libxdp release
* libbpf >= v0.7.0 # bpf_object__* functions
* libbpf >= v0.2.0 # shared umem feature

If we change to your suggested method I think we should change them all in one 
patch. IMO it's probably too close to the release to change them all right now. 
What do you think?

Thanks,
Ciara


Re: [PATCH] net/af_xdp: make compatible with libbpf v0.8.0

2022-06-27 Thread Andrew Rybchenko

On 6/27/22 17:17, Loftus, Ciara wrote:


On 6/24/22 13:23, Ciara Loftus wrote:

libbpf v0.8.0 deprecates the bpf_get_link_xdp_id and bpf_set_link_xdp_fd
functions. Use meson to detect if libbpf >= v0.7.0 is linked and if so, use
the recommended replacement functions bpf_xdp_query_id,

bpf_xdp_attach

and bpf_xdp_detach which are available to use since libbpf v0.7.0.

Also prevent linking with libbpf versions > v0.8.0.

Signed-off-by: Ciara Loftus 
---
   doc/guides/nics/af_xdp.rst  |  3 ++-
   drivers/net/af_xdp/compat.h | 36

-

   drivers/net/af_xdp/meson.build  |  7 ++
   drivers/net/af_xdp/rte_eth_af_xdp.c | 19 +++
   4 files changed, 42 insertions(+), 23 deletions(-)


Don't we need to mention these changes in release notes?



diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 56681c8365..9edb48df67 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -43,7 +43,8 @@ Prerequisites
   This is a Linux-specific PMD, thus the following prerequisites apply:

   *  A Linux Kernel (version > v4.18) with XDP sockets configuration enabled;
-*  Both libxdp >=v1.2.2 and libbpf libraries installed, or, libbpf <=v0.6.0
+*  Both libxdp >=v1.2.2 and libbpf <=v0.8.0 libraries installed, or, libbpf
+   <=v0.6.0.
   *  If using libxdp, it requires an environment variable called
  LIBXDP_OBJECT_PATH to be set to the location of where libxdp placed its

bpf

  object files. This is usually in /usr/local/lib/bpf or 
/usr/local/lib64/bpf.
diff --git a/drivers/net/af_xdp/compat.h b/drivers/net/af_xdp/compat.h
index 28ea64aeaa..8f4ac8b5ea 100644
--- a/drivers/net/af_xdp/compat.h
+++ b/drivers/net/af_xdp/compat.h
@@ -60,7 +60,7 @@ tx_syscall_needed(struct xsk_ring_prod *q

__rte_unused)

   }
   #endif

-#ifdef RTE_NET_AF_XDP_LIBBPF_OBJ_OPEN
+#ifdef RTE_NET_AF_XDP_LIBBPF_V070


Typically version-based checks are considered bad practice. Isn't it
better to use feature-based checks/defines?


Hi Andrew,

Thank you for the feedback. Is the feature-based checking something that we can 
push to the next release?

We are already using the pkg-config version-check method for other 
libraries/features in the meson.build file:
* libxdp >= v1.2.2 # earliest compatible libxdp release
* libbpf >= v0.7.0 # bpf_object__* functions
* libbpf >= v0.2.0 # shared umem feature

If we change to your suggested method I think we should change them all in one 
patch. IMO it's probably too close to the release to change them all right now. 
What do you think?

Thanks,
Ciara


Hi Ciara,

yes, ideally we should avoid usage of version-based checks everywhere,
but I don't think that it is critical to switch at once. We can use it
for new checks right now and rewrite old/existing checks a bit later in
the next release.

Please note that my notes are related to review notes from Thomas, who
asked why the find_library() method was removed. Yes, it is confusing and it
is better to avoid it. Usage of feature-based checks would allow us to
preserve find_library() as well.

Andrew.
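
For illustration, a feature-gated compat wrapper could look like the sketch
below. The RTE_NET_AF_XDP_HAVE_BPF_XDP_ATTACH macro is a hypothetical name
that the build system would define after probing for the symbol (e.g. with
meson's cc.has_function()); it is not an existing DPDK define:

#include <stdint.h>
#include <bpf/libbpf.h>

#ifdef RTE_NET_AF_XDP_HAVE_BPF_XDP_ATTACH
/* libbpf >= v0.7.0 path, selected because the symbol exists */
static int
load_xdp_prog(int ifindex, int prog_fd, uint32_t flags)
{
	return bpf_xdp_attach(ifindex, prog_fd, flags, NULL);
}
#else
/* fallback to the API deprecated in libbpf v0.8.0 */
static int
load_xdp_prog(int ifindex, int prog_fd, uint32_t flags)
{
	return bpf_set_link_xdp_fd(ifindex, prog_fd, flags);
}
#endif

This keeps the decision tied to what the linked library actually provides
rather than to its version number.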



Re: [PATCH] vdpa/sfc: handle sync issue between qemu and vhost-user

2022-06-27 Thread Andrew Rybchenko

When you send a new version, please don't forget to specify
-v<N> on patch format and use --in-reply-to
with the first mail ID. See the contributor guidelines.

Also, a new version should make it clear what has changed. See below.

On 6/27/22 11:49, abhimanyu.sa...@xilinx.com wrote:

From: Abhimanyu Saini 

When a DPDK app is running in the VF, it sometimes rings the doorbell
before dev_config has had a chance to complete and hence misses
the event. As a workaround, ring the doorbell when vDPA reports the
notify_area to QEMU.

Fixes: 5e7596ba7cb3 ("vdpa/sfc: introduce Xilinx vDPA driver")


The above seems to be wrong. The nearby code was added later, so it should be:

Fixes: 630be406dcbf ("vdpa/sfc: get queue notify area info")



Cc: sta...@dpdk.org

Signed-off-by: Vijay Kumar Srivastava 
Signed-off-by: Abhimanyu Saini 
---


Version changelog here.


  drivers/vdpa/sfc/sfc_vdpa_ops.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c
index b3d9b6c..63aa52d 100644
--- a/drivers/vdpa/sfc/sfc_vdpa_ops.c
+++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c
@@ -794,6 +794,8 @@
int vfio_dev_fd;
efx_rc_t rc;
unsigned int bar_offset;
+   volatile void *doorbell;
+   struct rte_pci_device *pci_dev;
struct rte_vdpa_device *vdpa_dev;
struct sfc_vdpa_ops_data *ops_data;
struct vfio_region_info reg = { .argsz = sizeof(reg) };
@@ -856,6 +858,18 @@
sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64,
  *offset);
  
+	pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev;

+   doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr + *offset;
+
+   /*
+* virtio-net driver in VM sends queue notifications before
+* vDPA has a chance to setup the queues and notification area,
+* and hence the HW misses these doorbell notifications.
+* Since, it is safe to send duplicate doorbell, send another
+* doorbell from vDPA driver as workaround for this timing issue.
+*/
+   rte_write16(qid, doorbell);
+
return 0;
  }
  




RE: [PATCH] net/af_xdp: make compatible with libbpf v0.8.0

2022-06-27 Thread Loftus, Ciara
> 
> On 6/27/22 17:17, Loftus, Ciara wrote:
> >>
> >> On 6/24/22 13:23, Ciara Loftus wrote:
> >>> libbpf v0.8.0 deprecates the bpf_get_link_xdp_id and
> bpf_set_link_xdp_fd
> >>> functions. Use meson to detect if libbpf >= v0.7.0 is linked and if so, 
> >>> use
> >>> the recommended replacement functions bpf_xdp_query_id,
> >> bpf_xdp_attach
> >>> and bpf_xdp_detach which are available to use since libbpf v0.7.0.
> >>>
> >>> Also prevent linking with libbpf versions > v0.8.0.
> >>>
> >>> Signed-off-by: Ciara Loftus 
> >>> ---
> >>>doc/guides/nics/af_xdp.rst  |  3 ++-
> >>>drivers/net/af_xdp/compat.h | 36
> >> -
> >>>drivers/net/af_xdp/meson.build  |  7 ++
> >>>drivers/net/af_xdp/rte_eth_af_xdp.c | 19 +++
> >>>4 files changed, 42 insertions(+), 23 deletions(-)
> >>
> >> Don't we need to mention these changes in release notes?
> >>
> >>>
> >>> diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> >>> index 56681c8365..9edb48df67 100644
> >>> --- a/doc/guides/nics/af_xdp.rst
> >>> +++ b/doc/guides/nics/af_xdp.rst
> >>> @@ -43,7 +43,8 @@ Prerequisites
> >>>This is a Linux-specific PMD, thus the following prerequisites apply:
> >>>
> >>>*  A Linux Kernel (version > v4.18) with XDP sockets configuration
> enabled;
> >>> -*  Both libxdp >=v1.2.2 and libbpf libraries installed, or, libbpf 
> >>> <=v0.6.0
> >>> +*  Both libxdp >=v1.2.2 and libbpf <=v0.8.0 libraries installed, or, 
> >>> libbpf
> >>> +   <=v0.6.0.
> >>>*  If using libxdp, it requires an environment variable called
> >>>   LIBXDP_OBJECT_PATH to be set to the location of where libxdp
> placed its
> >> bpf
> >>>   object files. This is usually in /usr/local/lib/bpf or 
> >>> /usr/local/lib64/bpf.
> >>> diff --git a/drivers/net/af_xdp/compat.h
> b/drivers/net/af_xdp/compat.h
> >>> index 28ea64aeaa..8f4ac8b5ea 100644
> >>> --- a/drivers/net/af_xdp/compat.h
> >>> +++ b/drivers/net/af_xdp/compat.h
> >>> @@ -60,7 +60,7 @@ tx_syscall_needed(struct xsk_ring_prod *q
> >> __rte_unused)
> >>>}
> >>>#endif
> >>>
> >>> -#ifdef RTE_NET_AF_XDP_LIBBPF_OBJ_OPEN
> >>> +#ifdef RTE_NET_AF_XDP_LIBBPF_V070
> >>
> >> Typically version-based checks are considered bad practice. Isn't it
> >> better to use feature-based checks/defines?
> >
> > Hi Andrew,
> >
> > Thank you for the feedback. Is the feature-based checking something that
> we can push to the next release?
> >
> > We are already using the pkg-config version-check method for other
> libraries/features in the meson.build file:
> > * libxdp >= v1.2.2 # earliest compatible libxdp release
> > * libbpf >= v0.7.0 # bpf_object__* functions
> > * libbpf >= v0.2.0 # shared umem feature
> >
> > If we change to your suggested method I think we should change them all
> in one patch. IMO it's probably too close to the release to change them all
> right now. What do you think?
> >
> > Thanks,
> > Ciara
> 
> Hi Ciara,
> 
> yes, ideally we should avoid usage of version-based checks everywhere,
> but I don't think that it is critical to switch at once. We can use it
> for new checks right now and rewrite old/existing checks a bit later in
> the next release.
> 
> Please note that my notes are related to review notes from Thomas, who
> asked why the find_library() method was removed. Yes, it is confusing and it
> is better to avoid it. Usage of feature-based checks would allow us to
> preserve find_library() as well.

Thank you for the explanation.
In this case we want to check that the libbpf library is <=v0.8.0. At this 
moment in time v0.8.0 is the latest version of libbpf so we cannot check for a 
symbol that tells us the library is > v0.8.0. Can you think of a way to 
approach this without using the pkg-config version check method?

I've introduced this check to future-proof the PMD and ensure we only ever link 
with versions of libbpf that we've validated to be compatible with the PMD. 
When say v0.9.0 is released we can patch the PMD allowing for libbpf <= v0.9.0 
and make any necessary API changes as part of that patch. This should hopefully 
help avoid the scenario Thomas encountered.

Ciara

> 
> Andrew.



Re: [PATCH] examples/distributor: update dynamic configuration

2022-06-27 Thread Hunt, David

Hi Ömer,

I've a few comments:

On 21/06/2022 21:15, Abdullah Ömer Yamaç wrote:

In this patch,
* It is possible to switch the running mode of the distributor
using a command line argument.
* With the "-c" parameter, you can run RX and Distributor
on the same core.
* Without the "-c" parameter, you can run RX and Distributor
on different cores.
* A syntax error in the single RX and distributor core code is fixed.
* When the "-c" parameter is active, the wasted distributor core is
also deactivated in the main function.

Fixes: 4a7f40c0ff9a ("examples/distributor: add dedicated core")
Cc: sta...@dpdk.org

Signed-off-by: Abdullah Ömer Yamaç 

---
Cc: david.h...@intel.com
---
  doc/guides/sample_app_ug/dist_app.rst |   3 +-
  examples/distributor/main.c   | 205 +++---
  2 files changed, 152 insertions(+), 56 deletions(-)

diff --git a/doc/guides/sample_app_ug/dist_app.rst 
b/doc/guides/sample_app_ug/dist_app.rst
index 3bd03905c3..5c80561187 100644
--- a/doc/guides/sample_app_ug/dist_app.rst
+++ b/doc/guides/sample_app_ug/dist_app.rst
@@ -42,11 +42,12 @@ Running the Application
  
 ..  code-block:: console
  
-   .//examples/dpdk-distributor [EAL options] -- -p PORTMASK

+   .//examples/dpdk-distributor [EAL options] -- -p PORTMASK 
[-c]
  
 where,
  
 *   -p PORTMASK: Hexadecimal bitmask of ports to configure

+   *   -c: Combines the RX core with distribution core
  
  #. To run the application in linux environment with 10 lcores, 4 ports,

 issue the command:
diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 02bf91f555..6e98f78054 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -39,6 +39,7 @@ volatile uint8_t quit_signal_rx;
  volatile uint8_t quit_signal_dist;
  volatile uint8_t quit_signal_work;
  unsigned int power_lib_initialised;
+bool enable_lcore_rx_distributor;
  
  static volatile struct app_stats {

struct {
@@ -256,14 +257,82 @@ lcore_rx(struct lcore_params *p)
}
app_stats.rx.rx_pkts += nb_rx;
  
-/*

- * You can run the distributor on the rx core with this code. Returned
- * packets are then send straight to the tx core.
- */
-#if 0
-   rte_distributor_process(d, bufs, nb_rx);
-   const uint16_t nb_ret = rte_distributor_returned_pktsd,
-   bufs, BURST_SIZE*2);
+   /*
+* Swap the following two lines if you want the rx traffic
+* to go directly to tx, no distribution.
+*/
+   struct rte_ring *out_ring = p->rx_dist_ring;
+   /* struct rte_ring *out_ring = p->dist_tx_ring; */
+
+   uint16_t sent = rte_ring_enqueue_burst(out_ring,
+   (void *)bufs, nb_rx, NULL);
+
+   app_stats.rx.enqueued_pkts += sent;
+   if (unlikely(sent < nb_rx)) {
+   app_stats.rx.enqdrop_pkts +=  nb_rx - sent;
+   RTE_LOG_DP(DEBUG, DISTRAPP,
+   "%s:Packet loss due to full ring\n", __func__);
+   while (sent < nb_rx)
+   rte_pktmbuf_free(bufs[sent++]);
+   }
+   if (++port == nb_ports)
+   port = 0;
+   }
+   if (power_lib_initialised)
+   rte_power_exit(rte_lcore_id());
+   /* set worker & tx threads quit flag */
+   printf("\nCore %u exiting rx task.\n", rte_lcore_id());
+   quit_signal = 1;
+   return 0;
+}
+
+static int
+lcore_rx_and_distributor(struct lcore_params *p)
+{
+   struct rte_distributor *d = p->d;
+   const uint16_t nb_ports = rte_eth_dev_count_avail();
+   const int socket_id = rte_socket_id();
+   uint16_t port;
+   struct rte_mbuf *bufs[BURST_SIZE*2];
+
+   RTE_ETH_FOREACH_DEV(port) {
+   /* skip ports that are not enabled */
+   if ((enabled_port_mask & (1 << port)) == 0)
+   continue;
+
+   if (rte_eth_dev_socket_id(port) > 0 &&
+   rte_eth_dev_socket_id(port) != socket_id)
+   printf("WARNING, port %u is on remote NUMA node to "
+   "RX thread.\n\tPerformance will not "
+   "be optimal.\n", port);
+   }
+
+   printf("\nCore %u doing packet RX and Distributor.\n", rte_lcore_id());
+   port = 0;
+   while (!quit_signal_rx) {
+
+   /* skip ports that are not enabled */
+   if ((enabled_port_mask & (1 << port)) == 0) {
+   if (++port == nb_ports)
+   port = 0;
+   continue;
+   }
+   const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs,
+   BURST_SIZE);
+   if (unlikely(nb_rx == 0)) {
+ 

[PATCH] doc: add event timer expiry drop stat

2022-06-27 Thread Naga Harish K S V
The structure ``rte_event_timer_adapter_stats`` will be
extended by adding a new field, ``evtim_drop_count``. This stat
will represent the number of times an event timer expiry is
dropped by the event timer adapter.

Signed-off-by: Naga Harish K S V 
---
 doc/guides/rel_notes/deprecation.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 4e5b23c53d..ab4ea67115 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -125,3 +125,9 @@ Deprecation Notices
   applications should be updated to use the ``dmadev`` library instead,
   with the underlying HW-functionality being provided by the ``ioat`` or
   ``idxd`` dma drivers
+
+* eventdev/timer: The structure ``rte_event_timer_adapter_stats`` will be
+  extended by adding a new field, ``evtim_drop_count``. This stat will
+  represent the number of times an event timer expiry is dropped
+  by the timer adapter. This field will be used by a future patch adding
+  support for periodic mode to the software timer adapter in DPDK 22.11.
-- 
2.25.1
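
For illustration, the extended structure could look as sketched below. Only
the evtim_drop_count field name comes from this notice; the other fields
follow the existing counters in rte_event_timer_adapter.h and the comments
are assumptions:

#include <stdint.h>

struct event_timer_adapter_stats_sketch {
	uint64_t evtim_exp_count;	/* expired event timers */
	uint64_t evtim_retry_count;	/* retried expiry events */
	uint64_t adapter_tick_count;	/* adapter ticks at its resolution */
	uint64_t evtim_drop_count;	/* new: expiries dropped by adapter */
};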



Re: [PATCH] examples/distributor: update dynamic configuration

2022-06-27 Thread Omer Yamac

Hi David,

Thank you for your review. I have two questions. The first one is about the
new release. As I remember, a new DPDK release will be published in a short
time and my previous fix is in that release. Therefore, should I wait for
that release to submit this patch?


The other question is below,

On 27.06.2022 18:51, Hunt, David wrote:

Hi Ömer,

I've a few comments:

On 21/06/2022 21:15, Abdullah Ömer Yamaç wrote:

In this patch,
* It is possible to switch the running mode of the distributor
using a command line argument.
* With the "-c" parameter, you can run RX and Distributor
on the same core.
* Without the "-c" parameter, you can run RX and Distributor
on different cores.
* A syntax error in the single RX and distributor core code is fixed.
* When the "-c" parameter is active, the wasted distributor core is
also deactivated in the main function.

Fixes: 4a7f40c0ff9a ("examples/distributor: add dedicated core")
Cc: sta...@dpdk.org

Signed-off-by: Abdullah Ömer Yamaç 

---
Cc: david.h...@intel.com
---
  doc/guides/sample_app_ug/dist_app.rst |   3 +-
  examples/distributor/main.c   | 205 
+++---

  2 files changed, 152 insertions(+), 56 deletions(-)

diff --git a/doc/guides/sample_app_ug/dist_app.rst 
b/doc/guides/sample_app_ug/dist_app.rst

index 3bd03905c3..5c80561187 100644
--- a/doc/guides/sample_app_ug/dist_app.rst
+++ b/doc/guides/sample_app_ug/dist_app.rst
@@ -42,11 +42,12 @@ Running the Application
   ..  code-block:: console
  -   .//examples/dpdk-distributor [EAL options] -- -p 
PORTMASK
+   .//examples/dpdk-distributor [EAL options] -- -p 
PORTMASK [-c]

   where,
   *   -p PORTMASK: Hexadecimal bitmask of ports to configure
+   *   -c: Combines the RX core with distribution core
#. To run the application in linux environment with 10 lcores, 4 
ports,

 issue the command:
diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 02bf91f555..6e98f78054 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -39,6 +39,7 @@ volatile uint8_t quit_signal_rx;
  volatile uint8_t quit_signal_dist;
  volatile uint8_t quit_signal_work;
  unsigned int power_lib_initialised;
+bool enable_lcore_rx_distributor;
static volatile struct app_stats {
struct {
@@ -256,14 +257,82 @@ lcore_rx(struct lcore_params *p)
}
app_stats.rx.rx_pkts += nb_rx;
  -/*
- * You can run the distributor on the rx core with this code. 
Returned

- * packets are then send straight to the tx core.
- */
-#if 0
-   rte_distributor_process(d, bufs, nb_rx);
-   const uint16_t nb_ret = rte_distributor_returned_pktsd,
-   bufs, BURST_SIZE*2);
+   /*
+* Swap the following two lines if you want the rx traffic
+* to go directly to tx, no distribution.
+*/
+   struct rte_ring *out_ring = p->rx_dist_ring;
+   /* struct rte_ring *out_ring = p->dist_tx_ring; */
+
+   uint16_t sent = rte_ring_enqueue_burst(out_ring,
+   (void *)bufs, nb_rx, NULL);
+
+   app_stats.rx.enqueued_pkts += sent;
+   if (unlikely(sent < nb_rx)) {
+   app_stats.rx.enqdrop_pkts +=  nb_rx - sent;
+   RTE_LOG_DP(DEBUG, DISTRAPP,
+   "%s:Packet loss due to full ring\n", __func__);
+   while (sent < nb_rx)
+   rte_pktmbuf_free(bufs[sent++]);
+   }
+   if (++port == nb_ports)
+   port = 0;
+   }
+   if (power_lib_initialised)
+   rte_power_exit(rte_lcore_id());
+   /* set worker & tx threads quit flag */
+   printf("\nCore %u exiting rx task.\n", rte_lcore_id());
+   quit_signal = 1;
+   return 0;
+}
+
+static int
+lcore_rx_and_distributor(struct lcore_params *p)
+{
+   struct rte_distributor *d = p->d;
+   const uint16_t nb_ports = rte_eth_dev_count_avail();
+   const int socket_id = rte_socket_id();
+   uint16_t port;
+   struct rte_mbuf *bufs[BURST_SIZE*2];
+
+   RTE_ETH_FOREACH_DEV(port) {
+   /* skip ports that are not enabled */
+   if ((enabled_port_mask & (1 << port)) == 0)
+   continue;
+
+   if (rte_eth_dev_socket_id(port) > 0 &&
+   rte_eth_dev_socket_id(port) != socket_id)
+   printf("WARNING, port %u is on remote NUMA node to "
+   "RX thread.\n\tPerformance will not "
+   "be optimal.\n", port);
+   }
+
+	printf("\nCore %u doing packet RX and Distributor.\n", 
rte_lcore_id());

+   port = 0;
+   while (!quit_signal_rx) {
+
+   /* skip ports that are not enabled */
+   if ((enabled_port_mask & (1 << port)) == 0) 

[PATCH] crypto/qat: fix docsis segmentation fault

2022-06-27 Thread Rebecca Troy
Currently, if the AES or DES algorithms fail for the Docsis test suite,
a segmentation fault occurs when cryptodev_qat_autotest is run.

This is due to a duplicate call of EVP_CIPHER_CTX_free for the
session context. The ctx is freed first in the bpi_cipher_ctx_init
function and then again at the end of the qat_sym_session_configure_cipher
function.

This commit fixes this bug by removing the first instance
of EVP_CIPHER_CTX_free, leaving just the dedicated function in
the upper level to free the ctx.

Fixes: 98f060891615 ("crypto/qat: add symmetric session file")
Cc: fiona.tr...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Rebecca Troy 
---
 drivers/crypto/qat/qat_sym_session.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/qat/qat_sym_session.c 
b/drivers/crypto/qat/qat_sym_session.c
index e40c042ba9..568792b753 100644
--- a/drivers/crypto/qat/qat_sym_session.c
+++ b/drivers/crypto/qat/qat_sym_session.c
@@ -111,12 +111,12 @@ bpi_cipher_ctx_init(enum rte_crypto_cipher_algorithm 
cryptodev_algo,
const uint8_t *key, uint16_t key_length, void **ctx)
 {
const EVP_CIPHER *algo = NULL;
-   int ret;
+   int ret = 0;
*ctx = EVP_CIPHER_CTX_new();
 
if (*ctx == NULL) {
ret = -ENOMEM;
-   goto ctx_init_err;
+   return ret;
}
 
if (cryptodev_algo == RTE_CRYPTO_CIPHER_DES_DOCSISBPI)
@@ -130,14 +130,9 @@ bpi_cipher_ctx_init(enum rte_crypto_cipher_algorithm 
cryptodev_algo,
/* IV will be ECB encrypted whether direction is encrypt or decrypt*/
if (EVP_EncryptInit_ex(*ctx, algo, NULL, key, 0) != 1) {
ret = -EINVAL;
-   goto ctx_init_err;
+   return ret;
}
 
-   return 0;
-
-ctx_init_err:
-   if (*ctx != NULL)
-   EVP_CIPHER_CTX_free(*ctx);
return ret;
 }
 
-- 
2.34.1
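
A generic sketch of the single-owner cleanup rule the fix adopts, with
illustrative names rather than the driver code: the init helper reports
failure but never frees, so EVP_CIPHER_CTX_free() is called exactly once,
by the caller:

#include <stdint.h>
#include <errno.h>
#include <openssl/evp.h>

static int
ctx_init(const EVP_CIPHER *algo, const uint8_t *key, EVP_CIPHER_CTX **ctx)
{
	*ctx = EVP_CIPHER_CTX_new();
	if (*ctx == NULL)
		return -ENOMEM;
	if (EVP_EncryptInit_ex(*ctx, algo, NULL, key, NULL) != 1)
		return -EINVAL;	/* no free here: the caller owns *ctx */
	return 0;
}

static int
session_configure(const EVP_CIPHER *algo, const uint8_t *key)
{
	EVP_CIPHER_CTX *ctx = NULL;
	int ret = ctx_init(algo, key, &ctx);

	if (ret != 0 && ctx != NULL)
		EVP_CIPHER_CTX_free(ctx);	/* the single free point */
	return ret;
}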



[PATCH v4 0/6] add thread lifetime and attributes API

2022-06-27 Thread Tyler Retzlaff
add rte thread lifetime and attributes api. with these api additions
there is now sufficient platform abstracted thread api to remove the
use of pthread in the unit tests.

v4:
  * update version.map to show api from series added in 22.11 instead
of 22.07.
  * fix missing parameter name in rte_thread_func declaration causing
doxygen ci failure.

v3:
  * change rte_thread_func return type to uint32_t for exit value.
  * change rte_thread_join retval to be uint32_t (matched with the
return value from rte_thread_func).
  * introduce a wrapper for rte_thread_func on posix platforms to
adapt differences between rte_thread_func and pthread
start_routine.
  * remove interpretation / dereference of result from pthread_join
in posix implementation of rte_thread_join.
  * fix leak of dynamically allocated thread_routine_ctx on windows
in error paths.
  * don't cast and truncate NULL to integer value for rte_thread_join
when pthread_join returns no result.

v2:
  * split implementation of rte_thread_equal for windows / posix
and use pthread_equal for posix platforms.
  * remove parameter validation assertions and instead return
EINVAL for mandatory pointers to type that are NULL.
  * correct doxygen comment parameter name args -> arg

Tyler Retzlaff (6):
  eal: add thread attributes
  eal: add thread lifetime management
  eal: add basic rte thread ID equal API
  test/threads: add tests for thread lifetime API
  test/threads: add tests for thread attributes API
  test/threads: remove unit test use of pthread

 app/test/test_threads.c | 134 ++--
 lib/eal/common/meson.build  |   1 +
 lib/eal/common/rte_thread.c |  60 +++
 lib/eal/include/rte_thread.h| 187 ++
 lib/eal/unix/rte_thread.c   | 141 ++
 lib/eal/version.map |  10 ++
 lib/eal/windows/include/sched.h |   2 +-
 lib/eal/windows/rte_thread.c| 219 
 8 files changed, 705 insertions(+), 49 deletions(-)
 create mode 100644 lib/eal/common/rte_thread.c

-- 
1.8.3.1
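
For reference, a minimal usage sketch of the new lifetime API, based only on
the signatures shown in this series:

#include <stdio.h>
#include <rte_thread.h>

static uint32_t
worker(void *arg)
{
	printf("hello from %s\n", (const char *)arg);
	return 42;	/* exit value, retrieved via rte_thread_join() */
}

static int
run_worker(void)
{
	static char name[] = "w0";
	rte_thread_t tid;
	uint32_t exit_val;
	/* NULL attributes: default affinity and priority */
	int ret = rte_thread_create(&tid, NULL, worker, name);

	if (ret != 0)
		return ret;	/* positive errno-style error */
	ret = rte_thread_join(tid, &exit_val);
	return ret != 0 ? ret : (exit_val == 42 ? 0 : -1);
}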



[PATCH v4 3/6] eal: add basic rte thread ID equal API

2022-06-27 Thread Tyler Retzlaff
Add rte_thread_equal() that tests if two rte_thread_id are equal.

Signed-off-by: Narcisa Vasile 
Signed-off-by: Tyler Retzlaff 
Acked-by: Chengwen Feng 
---
 lib/eal/include/rte_thread.h | 19 +++
 lib/eal/unix/rte_thread.c|  6 ++
 lib/eal/version.map  |  1 +
 lib/eal/windows/rte_thread.c |  6 ++
 4 files changed, 32 insertions(+)

diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 3c8ff50..8359c1c 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -144,6 +144,25 @@ int rte_thread_create(rte_thread_t *thread_id,
 __rte_experimental
 rte_thread_t rte_thread_self(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Check if 2 thread ids are equal.
+ *
+ * @param t1
+ *   First thread id.
+ *
+ * @param t2
+ *   Second thread id.
+ *
+ * @return
+ *   If the ids are equal, return nonzero.
+ *   Otherwise, return 0.
+ */
+__rte_experimental
+int rte_thread_equal(rte_thread_t t1, rte_thread_t t2);
+
 #ifdef RTE_HAS_CPUSET
 
 /**
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index d4c1a7f..37ebfcf 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -210,6 +210,12 @@ struct thread_routine_ctx {
return pthread_detach((pthread_t)thread_id.opaque_id);
 }
 
+int
+rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
+{
+   return pthread_equal((pthread_t)t1.opaque_id, (pthread_t)t2.opaque_id);
+}
+
 rte_thread_t
 rte_thread_self(void)
 {
diff --git a/lib/eal/version.map b/lib/eal/version.map
index b404343..b5dc7f1 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -432,6 +432,7 @@ EXPERIMENTAL {
# added in 22.11
rte_thread_create;
rte_thread_detach;
+   rte_thread_equal;
rte_thread_join;
 };
 
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index ad71be4..1bc648e 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -287,6 +287,12 @@ struct thread_routine_ctx {
return 0;
 }
 
+int
+rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
+{
+   return t1.opaque_id == t2.opaque_id;
+}
+
 rte_thread_t
 rte_thread_self(void)
 {
-- 
1.8.3.1



[PATCH v4 2/6] eal: add thread lifetime management

2022-06-27 Thread Tyler Retzlaff
The *rte_thread_create()* function can optionally receive an
rte_thread_attr_t object that will cause the thread to be created with
the affinity and priority described by the attributes object. If
no rte_thread_attr_t is passed (parameter is NULL), the default
affinity and priority are used.

On Windows, the function executed by a thread when the thread starts is
represented by a function pointer of type DWORD (*func) (void*).
On other platforms, the function pointer is a void* (*func) (void*).

Performing a cast between these two types of function pointers to
uniformize the API on all platforms may result in undefined behavior.
To fix this issue, a wrapper that respects the signature required by
CreateThread() has been created on Windows.

Signed-off-by: Narcisa Vasile 
Signed-off-by: Tyler Retzlaff 
---
 lib/eal/include/rte_thread.h|  75 ++
 lib/eal/unix/rte_thread.c   | 135 +
 lib/eal/version.map |   5 +
 lib/eal/windows/include/sched.h |   2 +-
 lib/eal/windows/rte_thread.c| 213 
 5 files changed, 389 insertions(+), 41 deletions(-)

diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 062308d..3c8ff50 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -31,6 +31,18 @@
 } rte_thread_t;
 
 /**
+ * Thread function
+ *
+ * Function pointer to thread start routine.
+ *
+ * @param arg
+ *   Argument passed to rte_thread_create().
+ * @return
+ *   Thread function exit value.
+ */
+typedef uint32_t (*rte_thread_func) (void *arg);
+
+/**
  * Thread priority values.
  */
 enum rte_thread_priority {
@@ -61,6 +73,69 @@ enum rte_thread_priority {
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
  *
+ * Create a new thread that will invoke the 'thread_func' routine.
+ *
+ * @param thread_id
+ *A pointer that will store the id of the newly created thread.
+ *
+ * @param thread_attr
+ *Attributes that are used at the creation of the new thread.
+ *
+ * @param thread_func
+ *The routine that the new thread will invoke when starting execution.
+ *
+ * @param arg
+ *Argument to be passed to the 'thread_func' routine.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_create(rte_thread_t *thread_id,
+   const rte_thread_attr_t *thread_attr,
+   rte_thread_func thread_func, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Waits for the thread identified by 'thread_id' to terminate
+ *
+ * @param thread_id
+ *The identifier of the thread.
+ *
+ * @param value_ptr
+ *Stores the exit status of the thread.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_join(rte_thread_t thread_id, uint32_t *value_ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Indicate that the return value of the thread is not needed and
+ * all thread resources should be release when the thread terminates.
+ *
+ * @param thread_id
+ *The id of the thread to be detached.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_detach(rte_thread_t thread_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
  * Get the id of the calling thread.
  *
  * @return
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 9126595..d4c1a7f 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -16,6 +16,11 @@ struct eal_tls_key {
pthread_key_t thread_index;
 };
 
+struct thread_routine_ctx {
+   rte_thread_func thread_func;
+   void *routine_args;
+};
+
 static int
 thread_map_priority_to_os_value(enum rte_thread_priority eal_pri, int *os_pri,
int *pol)
@@ -75,6 +80,136 @@ struct eal_tls_key {
return 0;
 }
 
+static void *
+thread_func_wrapper(void *arg)
+{
+   struct thread_routine_ctx ctx = *(struct thread_routine_ctx *)arg;
+
+   free(arg);
+
+   return (void *)(uintptr_t)ctx.thread_func(ctx.routine_args);
+}
+
+int
+rte_thread_create(rte_thread_t *thread_id,
+   const rte_thread_attr_t *thread_attr,
+   rte_thread_func thread_func, void *args)
+{
+   int ret = 0;
+   pthread_attr_t attr;
+   pthread_attr_t *attrp = NULL;
+   struct thread_routine_ctx *ctx;
+   struct sched_param param = {
+   .sched_priority = 0,
+   };
+   int policy = SCHED_OTHER;
+
+   ctx = calloc(1, sizeof(*ctx));
+   if (ctx == NULL) {
+   RTE_LOG(DEBUG, EAL, "Insufficient memory for thread context 
allocations\n");
+   ret = ENOMEM;
+   goto cleanup;
+ 

[PATCH v4 1/6] eal: add thread attributes

2022-06-27 Thread Tyler Retzlaff
Implement thread attributes for:

* thread affinity
* thread priority

Implement functions for managing thread attributes.

  Priority is represented through an enum that allows for two levels:

* RTE_THREAD_PRIORITY_NORMAL
* RTE_THREAD_PRIORITY_REALTIME_CRITICAL

  Affinity is described by the rte_cpuset_t type.

An rte_thread_attr_t object can be set to the default values
by calling rte_thread_attr_init().

Signed-off-by: Narcisa Vasile 
Signed-off-by: Tyler Retzlaff 
---
 lib/eal/common/meson.build   |  1 +
 lib/eal/common/rte_thread.c  | 60 
 lib/eal/include/rte_thread.h | 93 
 lib/eal/version.map  |  4 ++
 4 files changed, 158 insertions(+)
 create mode 100644 lib/eal/common/rte_thread.c

diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 917758c..1e77dba 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -36,6 +36,7 @@ sources += files(
 'rte_random.c',
 'rte_reciprocal.c',
 'rte_service.c',
+'rte_thread.c',
 'rte_version.c',
 )
 if is_linux or is_windows
diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
new file mode 100644
index 000..d204cc5
--- /dev/null
+++ b/lib/eal/common/rte_thread.c
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C) 2022 Microsoft Corporation
+ */
+
+#include 
+#include 
+
+int
+rte_thread_attr_init(rte_thread_attr_t *attr)
+{
+   if (attr == NULL)
+   return EINVAL;
+
+   CPU_ZERO(&attr->cpuset);
+   attr->priority = RTE_THREAD_PRIORITY_NORMAL;
+
+   return 0;
+}
+
+int
+rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
+   rte_cpuset_t *cpuset)
+{
+   if (thread_attr == NULL)
+   return EINVAL;
+
+   if (cpuset == NULL)
+   return EINVAL;
+
+   thread_attr->cpuset = *cpuset;
+
+   return 0;
+}
+
+int
+rte_thread_attr_get_affinity(rte_thread_attr_t *thread_attr,
+   rte_cpuset_t *cpuset)
+{
+   if (thread_attr == NULL)
+   return EINVAL;
+
+   if (cpuset == NULL)
+   return EINVAL;
+
+   *cpuset = thread_attr->cpuset;
+
+   return 0;
+}
+
+int
+rte_thread_attr_set_priority(rte_thread_attr_t *thread_attr,
+   enum rte_thread_priority priority)
+{
+   if (thread_attr == NULL)
+   return EINVAL;
+
+   thread_attr->priority = priority;
+
+   return 0;
+}
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 9e261bf..062308d 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -40,6 +40,18 @@ enum rte_thread_priority {
/**< highest thread priority allowed */
 };
 
+#ifdef RTE_HAS_CPUSET
+
+/**
+ * Representation for thread attributes.
+ */
+typedef struct {
+   enum rte_thread_priority priority; /**< thread priority */
+   rte_cpuset_t cpuset; /**< thread affinity */
+} rte_thread_attr_t;
+
+#endif /* RTE_HAS_CPUSET */
+
 /**
  * TLS key type, an opaque pointer.
  */
@@ -63,6 +75,87 @@ enum rte_thread_priority {
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
  *
+ * Initialize the attributes of a thread.
+ * These attributes can be passed to the rte_thread_create() function
+ * that will create a new thread and set its attributes according to attr.
+ *
+ * @param attr
+ *   Thread attributes to initialize.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_attr_init(rte_thread_attr_t *attr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set the CPU affinity value in the thread attributes pointed to
+ * by 'thread_attr'.
+ *
+ * @param thread_attr
+ *   Points to the thread attributes in which affinity will be updated.
+ *
+ * @param cpuset
+ *   Points to the value of the affinity to be set.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
+   rte_cpuset_t *cpuset);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the value of CPU affinity that is set in the thread attributes pointed
+ * to by 'thread_attr'.
+ *
+ * @param thread_attr
+ *   Points to the thread attributes from which affinity will be retrieved.
+ *
+ * @param cpuset
+ *   Pointer to the memory that will store the affinity.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_attr_get_affinity(rte_thread_attr_t *thread_attr,
+   rte_cpuset_t *cpuset);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set the thread priority value in the thread attributes pointed to
+ * by 'thread_attr'.

[PATCH v4 6/6] test/threads: remove unit test use of pthread

2022-06-27 Thread Tyler Retzlaff
now that rte_thread provides thread lifetime functions, stop using
pthread in the unit tests.

Signed-off-by: Tyler Retzlaff 
---
 app/test/test_threads.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/app/test/test_threads.c b/app/test/test_threads.c
index 3c22cec..e0f18e4 100644
--- a/app/test/test_threads.c
+++ b/app/test/test_threads.c
@@ -3,7 +3,6 @@
  */
 
 #include 
-#include 
 
 #include 
 #include 
@@ -79,12 +78,11 @@
 static int
 test_thread_priority(void)
 {
-   pthread_t id;
rte_thread_t thread_id;
enum rte_thread_priority priority;
 
thread_id_ready = 0;
-   RTE_TEST_ASSERT(pthread_create(&id, NULL, thread_main, &thread_id) == 0,
+   RTE_TEST_ASSERT(rte_thread_create(&thread_id, NULL, thread_main, NULL) 
== 0,
"Failed to create thread");
 
while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0)
@@ -131,13 +129,12 @@
 static int
 test_thread_affinity(void)
 {
-   pthread_t id;
rte_thread_t thread_id;
rte_cpuset_t cpuset0;
rte_cpuset_t cpuset1;
 
thread_id_ready = 0;
-   RTE_TEST_ASSERT(pthread_create(&id, NULL, thread_main, &thread_id) == 0,
+   RTE_TEST_ASSERT(rte_thread_create(&thread_id, NULL, thread_main, NULL) 
== 0,
"Failed to create thread");
 
while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0)
-- 
1.8.3.1



[PATCH v4 5/6] test/threads: add tests for thread attributes API

2022-06-27 Thread Tyler Retzlaff
test basic functionality and demonstrate use of the following thread
attributes api. additionally, test that attributes are processed when
supplied to rte_thread_create().

* rte_thread_attr_init
* rte_thread_attr_set_affinity
* rte_thread_attr_get_affinity
* rte_thread_attr_set_priority

Signed-off-by: Narcisa Vasile 
Signed-off-by: Tyler Retzlaff 
---
 app/test/test_threads.c | 73 -
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/app/test/test_threads.c b/app/test/test_threads.c
index 1077373..3c22cec 100644
--- a/app/test/test_threads.c
+++ b/app/test/test_threads.c
@@ -17,7 +17,9 @@
 static uint32_t
 thread_main(void *arg)
 {
-   *(rte_thread_t *)arg = rte_thread_self();
+   if (arg != NULL)
+   *(rte_thread_t *)arg = rte_thread_self();
+
__atomic_store_n(&thread_id_ready, 1, __ATOMIC_RELEASE);
 
while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 1)
@@ -166,6 +168,73 @@
return 0;
 }
 
+static int
+test_thread_attributes_affinity(void)
+{
+   rte_thread_t thread_id;
+   rte_thread_attr_t attr;
+   rte_cpuset_t cpuset0;
+   rte_cpuset_t cpuset1;
+
+   RTE_TEST_ASSERT(rte_thread_attr_init(&attr) == 0,
+   "Failed to initialize thread attributes");
+
+   CPU_ZERO(&cpuset0);
+   RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(rte_thread_self(), 
&cpuset0) == 0,
+   "Failed to get thread affinity");
+   RTE_TEST_ASSERT(rte_thread_attr_set_affinity(&attr, &cpuset0) == 0,
+   "Failed to set thread attributes affinity");
+   RTE_TEST_ASSERT(rte_thread_attr_get_affinity(&attr, &cpuset1) == 0,
+   "Failed to get thread attributes affinity");
+   RTE_TEST_ASSERT(memcmp(&cpuset0, &cpuset1, sizeof(rte_cpuset_t)) == 0,
+   "Affinity should be stable");
+
+   thread_id_ready = 0;
+   RTE_TEST_ASSERT(rte_thread_create(&thread_id, &attr, thread_main, NULL) 
== 0,
+   "Failed to create attributes affinity thread.");
+
+   while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0)
+   ;
+
+   RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(thread_id, &cpuset1) == 0,
+   "Failed to get attributes thread affinity");
+   RTE_TEST_ASSERT(memcmp(&cpuset0, &cpuset1, sizeof(rte_cpuset_t)) == 0,
+   "Failed to apply affinity attributes");
+
+   __atomic_store_n(&thread_id_ready, 2, __ATOMIC_RELEASE);
+
+   return 0;
+}
+
+static int
+test_thread_attributes_priority(void)
+{
+   rte_thread_t thread_id;
+   rte_thread_attr_t attr;
+   enum rte_thread_priority priority;
+
+   RTE_TEST_ASSERT(rte_thread_attr_init(&attr) == 0,
+   "Failed to initialize thread attributes");
+   RTE_TEST_ASSERT(rte_thread_attr_set_priority(&attr, 
RTE_THREAD_PRIORITY_NORMAL) == 0,
+   "Failed to set thread attributes priority");
+
+   thread_id_ready = 0;
+   RTE_TEST_ASSERT(rte_thread_create(&thread_id, &attr, thread_main, NULL) 
== 0,
+   "Failed to create attributes priority thread.");
+
+   while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0)
+   ;
+
+   RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
+   "Failed to get thread priority");
+   RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_NORMAL,
+   "Failed to apply priority attributes");
+
+   __atomic_store_n(&thread_id_ready, 2, __ATOMIC_RELEASE);
+
+   return 0;
+}
+
 static struct unit_test_suite threads_test_suite = {
.suite_name = "threads autotest",
.setup = NULL,
@@ -175,6 +244,8 @@
TEST_CASE(test_thread_create_detach),
TEST_CASE(test_thread_affinity),
TEST_CASE(test_thread_priority),
+   TEST_CASE(test_thread_attributes_affinity),
+   TEST_CASE(test_thread_attributes_priority),
TEST_CASES_END()
}
 };
-- 
1.8.3.1



[PATCH v4 4/6] test/threads: add tests for thread lifetime API

2022-06-27 Thread Tyler Retzlaff
test basic functionality and demonstrate use of the following thread
lifetime api.

* rte_thread_create
* rte_thread_detach
* rte_thread_join

Signed-off-by: Narcisa Vasile 
Signed-off-by: Tyler Retzlaff 
---
 app/test/test_threads.c | 54 +++--
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/app/test/test_threads.c b/app/test/test_threads.c
index b9d8b4e..1077373 100644
--- a/app/test/test_threads.c
+++ b/app/test/test_threads.c
@@ -14,7 +14,7 @@
 
 static uint32_t thread_id_ready;
 
-static void *
+static uint32_t
 thread_main(void *arg)
 {
*(rte_thread_t *)arg = rte_thread_self();
@@ -23,7 +23,55 @@
while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 1)
;
 
-   return NULL;
+   return 0;
+}
+
+static int
+test_thread_create_join(void)
+{
+   rte_thread_t thread_id;
+   rte_thread_t thread_main_id;
+
+   thread_id_ready = 0;
+   RTE_TEST_ASSERT(rte_thread_create(&thread_id, NULL, thread_main, 
&thread_main_id) == 0,
+   "Failed to create thread.");
+
+   while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0)
+   ;
+
+   RTE_TEST_ASSERT(rte_thread_equal(thread_id, thread_main_id) != 0,
+   "Unexpected thread id.");
+
+   __atomic_store_n(&thread_id_ready, 2, __ATOMIC_RELEASE);
+
+   RTE_TEST_ASSERT(rte_thread_join(thread_id, NULL) == 0,
+   "Failed to join thread.");
+
+   return 0;
+}
+
+static int
+test_thread_create_detach(void)
+{
+   rte_thread_t thread_id;
+   rte_thread_t thread_main_id;
+
+   thread_id_ready = 0;
+   RTE_TEST_ASSERT(rte_thread_create(&thread_id, NULL, thread_main,
+   &thread_main_id) == 0, "Failed to create thread.");
+
+   while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0)
+   ;
+
+   RTE_TEST_ASSERT(rte_thread_equal(thread_id, thread_main_id) != 0,
+   "Unexpected thread id.");
+
+   __atomic_store_n(&thread_id_ready, 2, __ATOMIC_RELEASE);
+
+   RTE_TEST_ASSERT(rte_thread_detach(thread_id) == 0,
+   "Failed to detach thread.");
+
+   return 0;
 }
 
 static int
@@ -123,6 +171,8 @@
.setup = NULL,
.teardown = NULL,
.unit_test_cases = {
+   TEST_CASE(test_thread_create_join),
+   TEST_CASE(test_thread_create_detach),
TEST_CASE(test_thread_affinity),
TEST_CASE(test_thread_priority),
TEST_CASES_END()
-- 
1.8.3.1



Re: [PATCH v4] net: fix checksum with unaligned buffer

2022-06-27 Thread Mattias Rönnblom

On 2022-06-27 15:22, Morten Brørup wrote:

From: Emil Berg [mailto:emil.b...@ericsson.com]
Sent: Monday, 27 June 2022 14.51


From: Emil Berg
Sent: den 27 juni 2022 14:46


From: Mattias Rönnblom 
Sent: den 27 juni 2022 14:28

On 2022-06-23 14:51, Morten Brørup wrote:

From: Morten Brørup [mailto:m...@smartsharesystems.com]
Sent: Thursday, 23 June 2022 14.39

With this patch, the checksum can be calculated on an unaligned

buffer.

I.e. the buf parameter is no longer required to be 16 bit

aligned.


The checksum is still calculated using a 16 bit aligned pointer,

so

the compiler can auto-vectorize the function's inner loop.

When the buffer is unaligned, the first byte of the buffer is
handled separately. Furthermore, the calculated checksum of the
buffer is byte shifted before being added to the initial

checksum,

to compensate for the checksum having been calculated on the

buffer

shifted by one byte.

v4:
* Add copyright notice.
* Include stdbool.h (Emil Berg).
* Use RTE_PTR_ADD (Emil Berg).
* Fix one more typo in commit message. Is 'unligned' even a

word?

v3:
* Remove braces from single statement block.
* Fix typo in commit message.
v2:
* Do not assume that the buffer is part of an aligned packet

buffer.


Bugzilla ID: 1035
Cc: sta...@dpdk.org

Signed-off-by: Morten Brørup 
---
   lib/net/rte_ip.h | 32 +++-
   1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
b502481670..738d643da0 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -3,6 +3,7 @@
*  The Regents of the University of California.
* Copyright(c) 2010-2014 Intel Corporation.
* Copyright(c) 2014 6WIND S.A.
+ * Copyright(c) 2022 SmartShare Systems.
* All rights reserved.
*/

@@ -15,6 +16,7 @@
* IP-related defines
*/

+#include 
   #include 

   #ifdef RTE_EXEC_ENV_WINDOWS
@@ -162,20 +164,40 @@ __rte_raw_cksum(const void *buf, size_t

len,

uint32_t sum)
   {
/* extend strict-aliasing rules */
typedef uint16_t __attribute__((__may_alias__)) u16_p;
-   const u16_p *u16_buf = (const u16_p *)buf;
-   const u16_p *end = u16_buf + len / sizeof(*u16_buf);
+   const u16_p *u16_buf;
+   const u16_p *end;
+   uint32_t bsum = 0;
+   const bool unaligned = (uintptr_t)buf & 1;
+
+   /* if buffer is unaligned, keeping it byte order

independent */

+   if (unlikely(unaligned)) {
+   uint16_t first = 0;
+   if (unlikely(len == 0))
+   return 0;
+   ((unsigned char *)&first)[1] = *(const unsigned

char *)buf;

+   bsum += first;
+   buf = RTE_PTR_ADD(buf, 1);
+   len--;
+   }

+   /* aligned access for compiler auto-vectorization */


The compiler will be able to auto vectorize even unaligned

accesses,

just with different instructions. From what I can tell, there's no
performance impact, at least not on the x86_64 systems I tried on.

I think you should remove the first special case conditional and

use

memcpy() instead of the cumbersome __may_alias__ construct to

retrieve

the data.



Here:
https://www.agner.org/optimize/instruction_tables.pdf
it lists the latency of vmovdqa (aligned) as 6 cycles and the latency

for

vmovdqu (unaligned) as 7 cycles. So I guess there can be some

difference.

Although in practice I'm not sure what difference it makes. I've not

seen any

difference in runtime between the two versions.



Correction to my comment:
Those stats are for some older CPU. For some newer CPUs such as Tiger
Lake the stats seem to be the same regardless of aligned or unaligned.



I agree that the memcpy method is more elegant and easy to read.

However, we would need to performance test the modified checksum function with 
a large number of CPUs to prove that we don't introduce a performance 
regression on any CPU architecture still supported by DPDK. And Emil already 
found a CPU where it costs 1 extra cycle per 16 bytes, which adds up to a total 
of ca. 91 extra cycles on a 1460 byte TCP packet.



I think you've misunderstood what latency means in such tables. It's a 
data dependency thing, not a measure of throughput. The throughput is 
*much* higher. My guess would be two such instructions per clock.
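
To illustrate: independent accumulators split the loop into separate
dependency chains, so loads and adds from consecutive iterations overlap and
per-instruction latency is hidden (sketch only):

#include <stdint.h>
#include <stddef.h>

static uint32_t
sum16_two_chains(const uint16_t *buf, size_t n_words)
{
	uint32_t a = 0, b = 0;	/* two independent dependency chains */
	size_t i;

	for (i = 0; i + 1 < n_words; i += 2) {
		a += buf[i];
		b += buf[i + 1];
	}
	if (i < n_words)
		a += buf[i];	/* odd word count */
	return a + b;
}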


For your 1460 byte example, my Zen3 AMD performs identically with
both the current DPDK implementation, your patch, and a memcpy()-ified 
version of the current implementation. They all need ~130 clock 
cycles/packet, with warm caches. IPC is 3 instructions per cycle, but 
obviously not all instructions are SIMD.


The main issue with checksumming on the CPU is, in my experience, not 
that you don't have enough compute, but that you trash the caches.



So I opted for a solution with zero changes to the inner loop, so no 
performance retesting is required (for the previously supported use cases, 
where the buffer is aligned).



You will see performance degradation wit

Re: Service core statistics MT safety

2022-06-27 Thread Mattias Rönnblom

On 2022-06-27 19:39, Honnappa Nagarahalli wrote:





From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
Sent: Monday, 27 June 2022 13.06

Hi.

Is it safe to enable stats on MT safe services?

https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366

It seems to me this would have to be an __atomic_add for this code to
produce deterministic results.


I agree. The same goes for the 'calls' field.

The calling function does the locking. 
https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L398

For more information you can look at: 
https://github.com/DPDK/dpdk/blob/main/lib/eal/include/rte_service.h#L120



What about the
https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L404
call (for MT safe services)?

There's no lock held there.


RE: Service core statistics MT safety

2022-06-27 Thread Honnappa Nagarahalli

> 
> > From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
> > Sent: Monday, 27 June 2022 13.06
> >
> > Hi.
> >
> > Is it safe to enable stats on MT safe services?
> >
> > https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366
> >
> > It seems to me this would have to be an __atomic_add for this code to
> > produce deterministic results.
> 
> I agree. The same goes for the 'calls' field.
The calling function does the locking. 
https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L398

For more information you can look at: 
https://github.com/DPDK/dpdk/blob/main/lib/eal/include/rte_service.h#L120



[PATCH] raw/ioat: Check for the NULL pointer after calling malloc

2022-06-27 Thread 835703180
From: Shiqi Liu <835703...@qq.com>

As the possible failure of the malloc(), the not_checked and
checked could be NULL pointer.
Therefore, it should be better to check it in order to avoid
the dereference of the NULL pointer.

Fixes: b7aaf417f93 ("raw/ioat: add bus driver for device scanning 
automatically")

Signed-off-by: Shiqi Liu <835703...@qq.com>
---
 drivers/raw/ioat/idxd_bus.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/raw/ioat/idxd_bus.c b/drivers/raw/ioat/idxd_bus.c
index 539f51b1b1..8ab4ed5885 100644
--- a/drivers/raw/ioat/idxd_bus.c
+++ b/drivers/raw/ioat/idxd_bus.c
@@ -301,6 +301,10 @@ dsa_scan(void)
IOAT_PMD_DEBUG("%s(): found %s/%s", __func__, path, wq->d_name);
 
dev = malloc(sizeof(*dev));
+   if (dev == NULL) {
+   closedir(dev_dir);
+   return ENOMEM;
+   }
if (dsa_addr_parse(wq->d_name, &dev->addr) < 0) {
IOAT_PMD_ERR("Error parsing WQ name: %s", wq->d_name);
free(dev);
-- 
2.35.1.windows.2



RE: Service core statistics MT safety

2022-06-27 Thread Honnappa Nagarahalli

> >>
> >>> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
> >>> Sent: Monday, 27 June 2022 13.06
> >>>
> >>> Hi.
> >>>
> >>> Is it safe to enable stats on MT safe services?
> >>>
> >>> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366
> >>>
> >>> It seems to me this would have to be an __atomic_add for this code
> >>> to produce deterministic results.
> >>
> >> I agree. The same goes for the 'calls' field.
> > The calling function does the locking.
> > https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L398
> >
> > For more information you can look at:
> > https://github.com/DPDK/dpdk/blob/main/lib/eal/include/rte_service.h#L120
> >
> 
> What about the
> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L404
> call (for MT safe services)?
> 
> There's no lock held there.
Good point.
This is the case where the service running on service cores is MT safe.
However, the stats are incremented outside of the MT safety mechanism employed
by the service. So, yes, this and other updates in the function 
'service_runner_do_callback' need to be updated atomically.
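
A minimal sketch of that change, assuming the counter fields from the linked
code; relaxed atomics are enough here since only lossless totals are needed,
not ordering:

#include <stdint.h>
#include <rte_cycles.h>

/* illustrative stand-in for the per-service stats fields */
struct svc_stats {
	uint64_t calls;
	uint64_t cycles_spent;
};

static inline void
run_and_count(struct svc_stats *s, void (*cb)(void *), void *userdata)
{
	uint64_t start = rte_rdtsc();

	cb(userdata);
	__atomic_fetch_add(&s->cycles_spent, rte_rdtsc() - start,
			__ATOMIC_RELAXED);
	__atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
}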


Re: Service core statistics MT safety

2022-06-27 Thread Mattias Rönnblom
On 2022-06-27 20:19, Honnappa Nagarahalli wrote:
> 

> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
> Sent: Monday, 27 June 2022 13.06
>
> Hi.
>
> Is it safe to enable stats on MT safe services?
>
> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366
>
> It seems to me this would have to be an __atomic_add for this code
> to produce deterministic results.

 I agree. The same goes for the 'calls' field.
>>> The calling function does the locking.
>>> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L398
>>>
>>> For more information you can look at:
>>> https://github.com/DPDK/dpdk/blob/main/lib/eal/include/rte_service.h#L120
>>>
>>
>> What about the
>> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L404
>> call (for MT safe services)?
>>
>> There's no lock held there.
> Good point.
> This is the case where the service running on service cores is MT safe.
> However, the stats are incremented outside of the MT safety mechanism
> employed by the service. So, yes, this and other updates in the function 
> 'service_runner_do_callback' need to be updated atomically.

Maybe a better solution would be to move this to the core_state struct 
(and eliminate the "calls" field since the same information is already 
in the core_state struct). The contention on these cache lines will be 
pretty crazy for services with short run time (say a thousand cycles or 
less per call), assuming they are mapped to many cores.

Idle service cores will basically do nothing else than stall waiting for 
these lines, I suspect, hampering the progress of more busy cores.
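
A sketch of that alternative, with illustrative names: per-lcore counters on
their own cache lines, written only by the owning core and aggregated on read:

#include <stdint.h>

#define MAX_LCORES 128

struct core_svc_stats {
	uint64_t calls;
	uint64_t cycles_spent;
} __attribute__((aligned(64)));	/* one cache line each: no false sharing */

static struct core_svc_stats per_core_stats[MAX_LCORES];

static inline void
count_call(unsigned int lcore_id, uint64_t cycles)
{
	/* single writer per entry, so plain stores suffice */
	per_core_stats[lcore_id].calls++;
	per_core_stats[lcore_id].cycles_spent += cycles;
}

static uint64_t
total_calls(void)
{
	uint64_t sum = 0;
	unsigned int i;

	for (i = 0; i < MAX_LCORES; i++)
		sum += __atomic_load_n(&per_core_stats[i].calls,
				__ATOMIC_RELAXED);
	return sum;
}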


RE: [PATCH v4] net: fix checksum with unaligned buffer

2022-06-27 Thread Morten Brørup
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Monday, 27 June 2022 19.23
> 
> On 2022-06-27 15:22, Morten Brørup wrote:
> >> From: Emil Berg [mailto:emil.b...@ericsson.com]
> >> Sent: Monday, 27 June 2022 14.51
> >>
> >>> From: Emil Berg
> >>> Sent: den 27 juni 2022 14:46
> >>>
>  From: Mattias Rönnblom 
>  Sent: den 27 juni 2022 14:28
> 
>  On 2022-06-23 14:51, Morten Brørup wrote:
> >> From: Morten Brørup [mailto:m...@smartsharesystems.com]
> >> Sent: Thursday, 23 June 2022 14.39
> >>
> >> With this patch, the checksum can be calculated on an unaligned
> >> buffer.
> >> I.e. the buf parameter is no longer required to be 16 bit
> >> aligned.
> >>
> >> The checksum is still calculated using a 16 bit aligned pointer,
> >> so
> >> the compiler can auto-vectorize the function's inner loop.
> >>
> >> When the buffer is unaligned, the first byte of the buffer is
> >> handled separately. Furthermore, the calculated checksum of the
> >> buffer is byte shifted before being added to the initial
> >> checksum,
> >> to compensate for the checksum having been calculated on the
> >> buffer
> >> shifted by one byte.
> >>
> >> v4:
> >> * Add copyright notice.
> >> * Include stdbool.h (Emil Berg).
> >> * Use RTE_PTR_ADD (Emil Berg).
> >> * Fix one more typo in commit message. Is 'unligned' even a
> >> word?
> >> v3:
> >> * Remove braces from single statement block.
> >> * Fix typo in commit message.
> >> v2:
> >> * Do not assume that the buffer is part of an aligned packet
> >> buffer.
> >>
> >> Bugzilla ID: 1035
> >> Cc: sta...@dpdk.org
> >>
> >> Signed-off-by: Morten Brørup 

[...]

> 
>  The compiler will be able to auto vectorize even unaligned
> >> accesses,
>  just with different instructions. From what I can tell, there's no
>  performance impact, at least not on the x86_64 systems I tried on.
> 
>  I think you should remove the first special case conditional and
> >> use
>  memcpy() instead of the cumbersome __may_alias__ construct to
> >> retrieve
>  the data.
> 
> >>>
> >>> Here:
> >>> https://www.agner.org/optimize/instruction_tables.pdf
> >>> it lists the latency of vmovdqa (aligned) as 6 cycles and the
> latency
> >> for
> >>> vmovdqu (unaligned) as 7 cycles. So I guess there can be some
> >> difference.
> >>> Although in practice I'm not sure what difference it makes. I've
> not
> >> seen any
> >>> difference in runtime between the two versions.
> >>>
> >>
> >> Correction to my comment:
> >> Those stats are for some older CPU. For some newer CPUs such as
> Tiger
> >> Lake the stats seem to be the same regardless of aligned or
> unaligned.
> >>
> >
> > I agree that the memcpy method is more elegant and easy to read.
> >
> > However, we would need to performance test the modified checksum
> function with a large number of CPUs to prove that we don't introduce a
> performance regression on any CPU architecture still supported by DPDK.
> And Emil already found a CPU where it costs 1 extra cycle per 16 bytes,
> which adds up to a total of ca. 91 extra cycles on a 1460 byte TCP
> packet.
> >
> 
> I think you've misunderstood what latency means in such tables. It's a
> data dependency thing, not a measure of throughput. The throughput is
> *much* higher. My guess would be two such instruction per clock.
> 
> For your 1460 bytes example, my Zen3 AMD performs identically with
> both the current DPDK implementation, your patch, and a memcpy()-ified
> version of the current implementation. They all need ~130 clock
> cycles/packet, with warm caches. IPC is 3 instructions per cycle, but
> obvious not all instructions are SIMD.

You're right, I wasn't thinking deeper about it before extrapolating.

Great to see some real numbers! I wish someone would do the same testing on an 
old ARM CPU, so we could also see the other end of the scale.

> The main issue with checksumming on the CPU is, in my experience, not
> that you don't have enough compute, but that you trash the caches.

Agree. I have noticed that x86 has "non-temporal" instruction variants to 
load/store data without trashing the cache entirely.

A variant of the checksum function using such instructions might be handy.

Variants of the memcpy function using such instructions might also be handy for 
some purposes, e.g. copying the contents of packets, where the original and/or 
copy will not be accessed shortly thereafter.

> > So I opted for a solution with zero changes to the inner loop, so no
> performance retesting is required (for the previously supported use
> cases, where the buffer is aligned).
> >
> 
> You will see performance degradation with this solution as well, under
> certain conditions. For unaligned 100 bytes of data, the current DPDK
> implementation and the memcpy()-fied version needs ~21 cc/packet. Your
> patch needs 54 cc/packet.

Yes, it's a tradeoff. I 
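
(For readers of the archive: a sketch of the memcpy() approach referred
to above -- not the patch under review. A fixed-size memcpy() compiles
to a single, possibly unaligned, load and avoids the __may_alias__
pointer cast, while keeping the inner loop auto-vectorizable. Tail-byte
endian handling is deliberately simplified here.)

#include <stdint.h>
#include <string.h>

/* Read one 16-bit word from a possibly unaligned address. */
static inline uint16_t
load_u16(const void *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Simplified raw sum; the real rte_raw_cksum() also folds the 32-bit
 * accumulator and handles the byte order of the odd tail byte.
 */
static uint32_t
raw_sum(const void *buf, size_t len, uint32_t sum)
{
	const uint8_t *p = buf;

	while (len >= 2) {
		sum += load_u16(p); /* auto-vectorizable loop body */
		p += 2;
		len -= 2;
	}
	if (len == 1)
		sum += *p; /* endian handling omitted */
	return sum;
}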

[PATCH v2] baseband/turbo_sw: Remove flexran_sdk meson option

2022-06-27 Thread Nicolas Chautru
v2: typo in documentation

Hi Thomas,
This is the change you requested earlier this year. It is targeting 22.11.
Basically we no longer pass a specific option but rely on pkg-config.
There is a small change to generate the .pc files, which Intel will make
available by the end of the year. Still, the related change in DPDK is
available now.

Thanks
Nic



Nicolas Chautru (1):
  baseband/turbo_sw: remove Flexran SDK meson option

 doc/guides/bbdevs/turbo_sw.rst|  6 --
 drivers/baseband/turbo_sw/meson.build | 36 +--
 meson_options.txt |  2 --
 3 files changed, 17 insertions(+), 27 deletions(-)

-- 
1.8.3.1



[PATCH v2] baseband/turbo_sw: remove Flexran SDK meson option

2022-06-27 Thread Nicolas Chautru
The related dependency to build the PMD based on the
SDK libraries is now enabled through pkg-config.

Signed-off-by: Nicolas Chautru 
---
 doc/guides/bbdevs/turbo_sw.rst|  6 --
 drivers/baseband/turbo_sw/meson.build | 36 +--
 meson_options.txt |  2 --
 3 files changed, 17 insertions(+), 27 deletions(-)

diff --git a/doc/guides/bbdevs/turbo_sw.rst b/doc/guides/bbdevs/turbo_sw.rst
index 1e23e37..24a9621 100644
--- a/doc/guides/bbdevs/turbo_sw.rst
+++ b/doc/guides/bbdevs/turbo_sw.rst
@@ -136,7 +136,8 @@ In order to enable this virtual bbdev PMD, the user may:
   FlexRAN SDK libraries were installed. And ``DIR_WIRELESS_SDK`` to the path
   where the libraries were extracted.
 
-* Tune the meson build option pointing the location of the FlexRAN SDK 
libraries ``flexran_sdk``
+* Point pkgconfig towards these libraries so that they can be automatically 
found by meson.
+  If not, DPDK will still compile, but the related functionality will be 
stubbed out.
 
 Example:
 
@@ -144,8 +145,9 @@ Example:
 
 export 
FLEXRAN_SDK=/FlexRAN-FEC-SDK-19-04/sdk/build-avx2-icc/install
 export 
DIR_WIRELESS_SDK=/FlexRAN-FEC-SDK-19-04/sdk/build-avx2-icc/
+export PKG_CONFIG_PATH=$DIR_WIRELESS_SDK/pkgcfg:$PKG_CONFIG_PATH
 cd build
-meson configure 
-Dflexran_sdk=/FlexRAN-FEC-SDK-19-04/sdk/build-avx512-icc/install
+meson configure
 
 * For AVX512 machines with SDK libraries installed then both 4G and 5G can be 
enabled for full real time FEC capability.
   For AVX2 machines it is possible to only enable the 4G libraries and the PMD 
capabilities will be limited to 4G FEC.
diff --git a/drivers/baseband/turbo_sw/meson.build 
b/drivers/baseband/turbo_sw/meson.build
index 477b8b3..9e35156 100644
--- a/drivers/baseband/turbo_sw/meson.build
+++ b/drivers/baseband/turbo_sw/meson.build
@@ -1,38 +1,28 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2019 Intel Corporation
 
-path = get_option('flexran_sdk')
+# check for FlexRAN SDK libraries
+dep_turbo = dependency('flexran_sdk_turbo', required: false)
+dep_dec5g = dependency('flexran_sdk_ldpc_decoder_5gnr', required: false)
 
-# check for FlexRAN SDK libraries for AVX2
-lib4g = cc.find_library('libturbo', dirs: [path + '/lib_turbo'], required: 
false)
-if lib4g.found()
-ext_deps += cc.find_library('libturbo', dirs: [path + '/lib_turbo'], 
required: true)
-ext_deps += cc.find_library('libcrc', dirs: [path + '/lib_crc'], required: 
true)
-ext_deps += cc.find_library('librate_matching', dirs: [path + 
'/lib_rate_matching'], required: true)
-ext_deps += cc.find_library('libcommon', dirs: [path + '/lib_common'], 
required: true)
+if dep_turbo.found()
 ext_deps += cc.find_library('libstdc++', required: true)
 ext_deps += cc.find_library('libirc', required: true)
 ext_deps += cc.find_library('libimf', required: true)
 ext_deps += cc.find_library('libipps', required: true)
 ext_deps += cc.find_library('libsvml', required: true)
-includes += include_directories(path + '/lib_turbo')
-includes += include_directories(path + '/lib_crc')
-includes += include_directories(path + '/lib_rate_matching')
-includes += include_directories(path + '/lib_common')
+   ext_deps += dep_turbo
+   ext_deps += dependency('flexran_sdk_crc', required: true)
+   ext_deps += dependency('flexran_sdk_rate_matching', required: true)
+   ext_deps += dependency('flexran_sdk_common', required: true)
 cflags += ['-DRTE_BBDEV_SDK_AVX2']
 endif
 
-# check for FlexRAN SDK libraries for AVX512
-lib5g = cc.find_library('libldpc_decoder_5gnr', dirs: [path + 
'/lib_ldpc_decoder_5gnr'], required: false)
-if lib5g.found()
-ext_deps += cc.find_library('libldpc_encoder_5gnr', dirs: [path + 
'/lib_ldpc_encoder_5gnr'], required: true)
-ext_deps += cc.find_library('libldpc_decoder_5gnr', dirs: [path + 
'/lib_ldpc_decoder_5gnr'], required: true)
-ext_deps += cc.find_library('libLDPC_ratematch_5gnr', dirs: [path + 
'/lib_LDPC_ratematch_5gnr'], required: true)
-ext_deps += cc.find_library('librate_dematching_5gnr', dirs: [path + 
'/lib_rate_dematching_5gnr'], required: true)
-includes += include_directories(path + '/lib_ldpc_encoder_5gnr')
-includes += include_directories(path + '/lib_ldpc_decoder_5gnr')
-includes += include_directories(path + '/lib_LDPC_ratematch_5gnr')
-includes += include_directories(path + '/lib_rate_dematching_5gnr')
+if dep_dec5g.found()
+   ext_deps += dep_dec5g
+   ext_deps += dependency('flexran_sdk_ldpc_encoder_5gnr', required: true)
+   ext_deps += dependency('flexran_sdk_LDPC_ratematch_5gnr', required: 
true)
+   ext_deps += dependency('flexran_sdk_rate_dematching_5gnr', required: 
true)
 cflags += ['-DRTE_BBDEV_SDK_AVX512']
 endif
 
diff --git a/meson_options.txt b/meson_options.txt
index 7c220ad..abaa304 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -22,8 +22,6 @@ option('enable_kmods', ty

RE: [PATCH v4] doc: announce changes in bbdev related to enum extension

2022-06-27 Thread Chautru, Nicolas
Hi Thomas, 
Kind reminder on this one.
Thanks
Nic

> -Original Message-
> From: Chautru, Nicolas
> Sent: Friday, June 17, 2022 9:13 AM
> To: dev@dpdk.org; tho...@monjalon.net
> Cc: t...@redhat.com; Kinsella, Ray ; Richardson,
> Bruce ; hemant.agra...@nxp.com;
> david.march...@redhat.com; step...@networkplumber.org; Maxime
> Coquelin ; gak...@marvell.com
> Subject: RE: [PATCH v4] doc: announce changes in bbdev related to enum
> extension
> 
> Hi Thomas,
> Can this one be applied based on your review?
> Thanks
> Nic
> 
> > -Original Message-
> > From: Maxime Coquelin 
> > Sent: Thursday, June 9, 2022 12:54 AM
> > To: Chautru, Nicolas ; dev@dpdk.org;
> > gak...@marvell.com; tho...@monjalon.net
> > Cc: t...@redhat.com; Kinsella, Ray ;
> > Richardson, Bruce ;
> > hemant.agra...@nxp.com; david.march...@redhat.com;
> > step...@networkplumber.org
> > Subject: Re: [PATCH v4] doc: announce changes in bbdev related to enum
> > extension
> >
> > Hi Nicolas,
> >
> > On 6/9/22 02:34, Nicolas Chautru wrote:
> > > Intent to resolve in DPDK 22.11 historical usage which prevents
> > > graceful extension of enum and API without troublesome ABI breakage
> > > as well as extending API RTE_BBDEV_OP_FFT for new operation type in
> > > bbdev as well as other new members in existing structures.
> > >
> > > Signed-off-by: Nicolas Chautru 
> > > ---
> > >   doc/guides/rel_notes/deprecation.rst | 11 +++
> > >   1 file changed, 11 insertions(+)
> > >
> > > diff --git a/doc/guides/rel_notes/deprecation.rst
> > > b/doc/guides/rel_notes/deprecation.rst
> > > index 4e5b23c..c8ab1ec 100644
> > > --- a/doc/guides/rel_notes/deprecation.rst
> > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > @@ -112,6 +112,17 @@ Deprecation Notices
> > > session and the private data of session. An opaque pointer can be
> exposed
> > > directly to application which can be attached to the 
> > > ``rte_crypto_op``.
> > >
> > > +* bbdev: ``RTE_BBDEV_OP_TYPE_COUNT`` terminating the
> > > +``rte_bbdev_op_type``
> > > +  enum will be deprecated and instead use fixed array size when
> > > +required to allow for
> > > +  future enum extension.
> > > +  Will extend API to support new operation type
> > > +``RTE_BBDEV_OP_FFT`` as per this
> > > +  RFC https://patchwork.dpdk.org/project/dpdk/list/?series=22111
> > > +  New members will be added in ``rte_bbdev_driver_info`` to expose
> > > +PMD queue topology inspired
> > > +  by this RFC
> > > +https://patches.dpdk.org/project/dpdk/list/?series=22076
> > > +  New member will be added in ``rte_bbdev_driver_info`` to expose
> > > +the device status as per
> > > +  this RFC https://patches.dpdk.org/project/dpdk/list/?series=23367
> > > +  This should be updated in DPDK 22.11.
> > > +
> > >   * security: Hide structure ``rte_security_session`` and expose an opaque
> > > pointer for the private data to the application which can be attached
> > > to the packet while enqueuing.
> >
> > Thanks for rewording the notice.
> >
> > Acked-by: Maxime Coquelin 
> >
> > Maxime



RE: Service core statistics MT safety

2022-06-27 Thread Honnappa Nagarahalli

> 
> > From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
> > Sent: Monday, 27 June 2022 13.06
> >
> > Hi.
> >
> > Is it safe to enable stats on MT safe services?
> >
> > https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366
> >
> > It seems to me this would have to be an __atomic_add for this code
> > to produce deterministic results.
> 
>  I agree. The same goes for the 'calls' field.
> >>> The calling function does the locking.
> >>> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L398
> >>>
> >>> For more information you can look at:
> >>> https://github.com/DPDK/dpdk/blob/main/lib/eal/include/rte_service.h#L120
> >>>
> >>
> >> What about the
> >> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L404
> >> call (for MT safe services)?
> >>
> >> There's no lock held there.
> > Good point.
> > This is the case where the service running in service cores is MT safe.
> > However, the stats are incremented outside of the MT Safety mechanism
> > employed by the service. So, yes, this and other updates in the function
> > 'service_runner_do_callback' need to be updated atomically.
> 
> Maybe a better solution would be to move this to the core_state struct (and
> eliminate the "calls" field since the same information is already in the 
> core_state
> struct). The contention on these cache lines will be pretty crazy for 
> services with
> short run time (say a thousand cycles or less per call), assuming they are
> mapped to many cores.
That's one option; the structures are internal as well. With this option,
stats need to be aggregated, which will not give an accurate instantaneous
view. But that is the nature of statistics.

I am also wondering if these stats are of any use other than for debugging. 
Adding a capability to disable stats might help as well.

> 
> Idle service cores will basically do nothing else than stall waiting for 
> these lines, I
> suspect, hampering the progress of more busy cores.
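
(A rough sketch of the per-core alternative being discussed -- field and
macro names are illustrative stand-ins, not the actual rte_service
internals. Each service lcore owns its counters, so the fast path shares
no cache lines; totals are computed only when stats are read:)

#include <stdint.h>

#define MAX_LCORES 128   /* stand-in for RTE_MAX_LCORE */
#define NUM_SERVICES 64  /* stand-in for RTE_SERVICE_NUM_MAX */

struct core_service_stats {
	uint64_t calls;
	uint64_t cycles;
};

static struct core_service_stats stats[MAX_LCORES][NUM_SERVICES];

/* Fast path: plain stores, single writer per lcore. */
static inline void
stats_update(uint32_t lcore, uint32_t id, uint64_t cycles)
{
	stats[lcore][id].calls++;
	stats[lcore][id].cycles += cycles;
}

/* Slow path: aggregate on read; the snapshot is approximate. */
static uint64_t
stats_total_calls(uint32_t id)
{
	uint64_t total = 0;
	uint32_t i;

	for (i = 0; i < MAX_LCORES; i++)
		total += __atomic_load_n(&stats[i][id].calls,
				__ATOMIC_RELAXED);
	return total;
}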


[PATCH v3 0/7] bbdev changes for 22.11

2022-06-27 Thread Nicolas Chautru
v3: updated the device status info to also use the padded size for the
related array. Also adding 2 additional commits to allow the API struct
to expose more information related to queue corner cases/warnings, as
well as an optional rwlock.
Hemant, Maxime, this is planned for DPDK 22.11 but I would like a
review/ack early if possible, to get this applied earlier, due to time
off this summer.
Thanks
Nic

-- 

Hi,

Aggregating together in a single series a number of bbdev API changes
previously submitted over the last few months, all targeted for 22.11
(4 different series detailed below). The related deprecation notice is
being pushed in 22.07 in parallel.
* bbdev: add device status info
* bbdev: add new operation for FFT processing
* bbdev: add device info on queue topology
* bbdev: allow operation type enum for growth

v2: Update to the RTE_BBDEV_COUNT removal based on feedback from Thomas/Stephen 
: rejecting out of range op type and adjusting the new name for the padded 
maximum value used for fixed size arrays. 

---

Previous cover letters agregated below:

* bbdev: add device status info
https://patches.dpdk.org/project/dpdk/list/?series=23367

The updated structure will allow PMDs to expose through info_get what may
be the status of the underlying accelerator, notably in case an HW error
event has happened.

* bbdev: add new operation for FFT processing
https://patches.dpdk.org/project/dpdk/list/?series=22111

This contribution adds a new operation type to the existing ones already 
supported by the bbdev PMDs.
This set of operation is FFT-based processing for 5GNR baseband processing 
acceleration. This operates in the same lookaside fashion as other existing 
bbdev operation with a dedicated set of capabilities and parameters (marked as 
experimental).

I plan to also include a new PMD supporting this operation (and most of the 
related capabilities) in the next couple of months (either in 22.06 or 22.09) 
as well as extending the related bbdev-test.

* bbdev: add device info on queue topology
https://patches.dpdk.org/project/dpdk/list/?series=22076

Addressing a historical concern that the device info struct only
imperfectly captured what queues are available on the device
(number per operation type and priority). This ended up being an
iterative process for the application to find out how each queue could
be configured.

ie. the gap was captured as technical debt previously  in comments
/* This isn't ideal because it reports the maximum number of queues but
 * does not provide info on how many can be uplink/downlink or different
 * priorities
 */

This is now being exposed explicitly, based on what the device actually
supports, using the existing info_get API.

* bbdev: allow operation type enum for growth
https://patches.dpdk.org/project/dpdk/list/?series=23509

This is related to the general intent to remove using MAX value for enums. 
There is consensus that we should avoid this for a while notably for 
future-proofed ABI concerns 
https://patches.dpdk.org/project/dpdk/patch/20200130142003.2645765-1-ferruh.yi...@intel.com/.
But still there is arguably not yet an explicit best recommendation on
how to handle this, especially when we actually need to expose an array
whose index is such an enum.
As a specific example here I am referring to RTE_BBDEV_OP_TYPE_COUNT in
enum rte_bbdev_op_type, which is being extended for new operation types
being supported in bbdev (such as
https://patches.dpdk.org/project/dpdk/patch/1646956157-245769-2-git-send-email-nicolas.chau...@intel.com/
adding the new FFT operation).

There is also the intent to be able to expose information for each operation 
type through the bbdev api such as dynamically configured queues information 
per such operation type 
https://patches.dpdk.org/project/dpdk/patch/1646785355-168133-2-git-send-email-nicolas.chau...@intel.com/

Basically we are considering the best way to accommodate this, notably
based on discussions with Ray Kinsella and Bruce Richardson, to handle
such a case moving forward: specifically for the example with
RTE_BBDEV_OP_TYPE_COUNT and also more generally.

One possible option is captured in this patchset and is basically based
on the simple principle of allowing for growth to prevent ABI breakage,
i.e. the last value of the enum is set to a higher value than currently
required, so that new enum values can be inserted outside of major ABI
versions.
In that case RTE_BBDEV_OP_TYPE_COUNT is still present and can be exposed
and used, while still allowing for additions thanks to the implicit
padding-like room. As an alternate variant, instead of using that last
enum value, the extended size could be exposed as a #define outside of
the enum, but that would be fundamentally the same (public).

Another option would be to avoid arrays altogether and instead use a new
dedicated API function each time (the operation type enum being an input
argument, instead of an index into an array in an existing structure) so
as to get access to the structure related to a given operation.
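
(To make the padded-enum option concrete, a small illustration -- names
are generic here, not the exact bbdev ones:)

/* The last enumerator is deliberately set past the values in use, so
 * new op types can be added later without changing the constant that
 * sizes fixed arrays -- and thus without breaking the ABI.
 */
enum op_type {
	OP_NONE,
	OP_TURBO_DEC,
	OP_TURBO_ENC,
	OP_LDPC_DEC,
	OP_LDPC_ENC,
	/* values 5..7 are reserved for future op types */
	OP_TYPE_PADDED_MAX = 8,
};

struct driver_info {
	/* Sized by the padded maximum: the layout stays stable when
	 * new op types are added within the reserved range.
	 */
	unsigned int num_queues[OP_TYPE_PADDED_MAX];
};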

[PATCH v3 1/7] bbdev: allow operation type enum for growth

2022-06-27 Thread Nicolas Chautru
Updating the rte_bbdev_op_type enum
to keep the ABI compatible across enum insertions,
while adding a padded maximum value for array needs.
Removing RTE_BBDEV_OP_TYPE_COUNT and instead exposing
RTE_BBDEV_OP_TYPE_PADDED_MAX.

Signed-off-by: Nicolas Chautru 
---
 app/test-bbdev/test_bbdev.c  | 2 +-
 app/test-bbdev/test_bbdev_perf.c | 4 ++--
 examples/bbdev_app/main.c| 2 +-
 lib/bbdev/rte_bbdev.c| 9 +
 lib/bbdev/rte_bbdev_op.h | 2 +-
 5 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/app/test-bbdev/test_bbdev.c b/app/test-bbdev/test_bbdev.c
index ac06d73..1063f6e 100644
--- a/app/test-bbdev/test_bbdev.c
+++ b/app/test-bbdev/test_bbdev.c
@@ -521,7 +521,7 @@ struct bbdev_testsuite_params {
rte_mempool_free(mp);
 
TEST_ASSERT((mp = rte_bbdev_op_pool_create("Test_INV",
-   RTE_BBDEV_OP_TYPE_COUNT, size, cache_size, 0)) == NULL,
+   RTE_BBDEV_OP_TYPE_PADDED_MAX, size, cache_size, 0)) == 
NULL,
"Failed test for rte_bbdev_op_pool_create: "
"returned value is not NULL for invalid type");
 
diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c
index fad3b1e..1abda2d 100644
--- a/app/test-bbdev/test_bbdev_perf.c
+++ b/app/test-bbdev/test_bbdev_perf.c
@@ -2428,13 +2428,13 @@ typedef int (test_case_function)(struct active_device 
*ad,
 
/* Find capabilities */
const struct rte_bbdev_op_cap *cap = info.drv.capabilities;
-   for (i = 0; i < RTE_BBDEV_OP_TYPE_COUNT; i++) {
+   do {
if (cap->type == test_vector.op_type) {
capabilities = cap;
break;
}
cap++;
-   }
+   } while (cap->type != RTE_BBDEV_OP_NONE);
TEST_ASSERT_NOT_NULL(capabilities,
"Couldn't find capabilities");
 
diff --git a/examples/bbdev_app/main.c b/examples/bbdev_app/main.c
index fc7e8b8..ef0ba76 100644
--- a/examples/bbdev_app/main.c
+++ b/examples/bbdev_app/main.c
@@ -1041,7 +1041,7 @@ uint16_t bbdev_parse_number(const char *mask)
void *sigret;
struct app_config_params app_params = def_app_config;
struct rte_mempool *ethdev_mbuf_mempool, *bbdev_mbuf_mempool;
-   struct rte_mempool *bbdev_op_pools[RTE_BBDEV_OP_TYPE_COUNT];
+   struct rte_mempool *bbdev_op_pools[RTE_BBDEV_OP_TYPE_PADDED_MAX];
struct lcore_conf lcore_conf[RTE_MAX_LCORE] = { {0} };
struct lcore_statistics lcore_stats[RTE_MAX_LCORE] = { {0} };
struct stats_lcore_params stats_lcore;
diff --git a/lib/bbdev/rte_bbdev.c b/lib/bbdev/rte_bbdev.c
index aaee7b7..22bd894 100644
--- a/lib/bbdev/rte_bbdev.c
+++ b/lib/bbdev/rte_bbdev.c
@@ -23,6 +23,8 @@
 
 #define DEV_NAME "BBDEV"
 
+/* Number of supported operation types */
+#define BBDEV_OP_TYPE_COUNT 5
 
 /* BBDev library logging ID */
 RTE_LOG_REGISTER_DEFAULT(bbdev_logtype, NOTICE);
@@ -890,10 +892,10 @@ struct rte_mempool *
return NULL;
}
 
-   if (type >= RTE_BBDEV_OP_TYPE_COUNT) {
+   if (type >= BBDEV_OP_TYPE_COUNT) {
rte_bbdev_log(ERR,
"Invalid op type (%u), should be less than %u",
-   type, RTE_BBDEV_OP_TYPE_COUNT);
+   type, BBDEV_OP_TYPE_COUNT);
return NULL;
}
 
@@ -1122,10 +1124,10 @@ struct rte_mempool *
"RTE_BBDEV_OP_TURBO_DEC",
"RTE_BBDEV_OP_TURBO_ENC",
"RTE_BBDEV_OP_LDPC_DEC",
-   "RTE_BBDEV_OP_LDPC_ENC",
};
 
-   if (op_type < RTE_BBDEV_OP_TYPE_COUNT)
+   if (op_type < BBDEV_OP_TYPE_COUNT)
return op_types[op_type];
 
rte_bbdev_log(ERR, "Invalid operation type");
diff --git a/lib/bbdev/rte_bbdev_op.h b/lib/bbdev/rte_bbdev_op.h
index 6d56133..cd82418 100644
--- a/lib/bbdev/rte_bbdev_op.h
+++ b/lib/bbdev/rte_bbdev_op.h
@@ -748,7 +748,7 @@ enum rte_bbdev_op_type {
RTE_BBDEV_OP_TURBO_ENC,  /**< Turbo encode */
RTE_BBDEV_OP_LDPC_DEC,  /**< LDPC decode */
RTE_BBDEV_OP_LDPC_ENC,  /**< LDPC encode */
-   RTE_BBDEV_OP_TYPE_COUNT,  /**< Count of different op types */
+   RTE_BBDEV_OP_TYPE_PADDED_MAX = 8,  /**< Maximum op type number 
including padding */
 };
 
 /** Bit indexes of possible errors reported through status field */
-- 
1.8.3.1



[PATCH v3 2/7] bbdev: add device status info

2022-06-27 Thread Nicolas Chautru
Added device status information, so that the PMD can
expose information related to the underlying accelerator device status.
Minor order change in the structure to fit into a padding hole.

Signed-off-by: Nicolas Chautru 
---
 drivers/baseband/acc100/rte_acc100_pmd.c   |  1 +
 drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c |  1 +
 drivers/baseband/fpga_lte_fec/fpga_lte_fec.c   |  1 +
 drivers/baseband/la12xx/bbdev_la12xx.c |  1 +
 drivers/baseband/null/bbdev_null.c |  1 +
 drivers/baseband/turbo_sw/bbdev_turbo_software.c   |  1 +
 lib/bbdev/rte_bbdev.c  | 24 +++
 lib/bbdev/rte_bbdev.h  | 35 --
 lib/bbdev/version.map  |  6 
 9 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/baseband/acc100/rte_acc100_pmd.c 
b/drivers/baseband/acc100/rte_acc100_pmd.c
index de7e4bc..17ba798 100644
--- a/drivers/baseband/acc100/rte_acc100_pmd.c
+++ b/drivers/baseband/acc100/rte_acc100_pmd.c
@@ -1060,6 +1060,7 @@
 
/* Read and save the populated config from ACC100 registers */
fetch_acc100_config(dev);
+   dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED;
 
/* This isn't ideal because it reports the maximum number of queues but
 * does not provide info on how many can be uplink/downlink or different
diff --git a/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c 
b/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c
index 82ae6ba..57b12af 100644
--- a/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c
+++ b/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c
@@ -369,6 +369,7 @@
dev_info->capabilities = bbdev_capabilities;
dev_info->cpu_flag_reqs = NULL;
dev_info->data_endianness = RTE_LITTLE_ENDIAN;
+   dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED;
 
/* Calculates number of queues assigned to device */
dev_info->max_num_queues = 0;
diff --git a/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c 
b/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c
index 21d3529..2a330c4 100644
--- a/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c
+++ b/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c
@@ -645,6 +645,7 @@ struct __rte_cache_aligned fpga_queue {
dev_info->capabilities = bbdev_capabilities;
dev_info->cpu_flag_reqs = NULL;
dev_info->data_endianness = RTE_LITTLE_ENDIAN;
+   dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED;
 
/* Calculates number of queues assigned to device */
dev_info->max_num_queues = 0;
diff --git a/drivers/baseband/la12xx/bbdev_la12xx.c 
b/drivers/baseband/la12xx/bbdev_la12xx.c
index 4d1bd16..c1f88c6 100644
--- a/drivers/baseband/la12xx/bbdev_la12xx.c
+++ b/drivers/baseband/la12xx/bbdev_la12xx.c
@@ -100,6 +100,7 @@ struct bbdev_la12xx_params {
dev_info->capabilities = bbdev_capabilities;
dev_info->cpu_flag_reqs = NULL;
dev_info->min_alignment = 64;
+   dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED;
 
rte_bbdev_log_debug("got device info from %u", dev->data->dev_id);
 }
diff --git a/drivers/baseband/null/bbdev_null.c 
b/drivers/baseband/null/bbdev_null.c
index 248e129..94a1976 100644
--- a/drivers/baseband/null/bbdev_null.c
+++ b/drivers/baseband/null/bbdev_null.c
@@ -82,6 +82,7 @@ struct bbdev_queue {
 * here for code completeness.
 */
dev_info->data_endianness = RTE_LITTLE_ENDIAN;
+   dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED;
 
rte_bbdev_log_debug("got device info from %u", dev->data->dev_id);
 }
diff --git a/drivers/baseband/turbo_sw/bbdev_turbo_software.c 
b/drivers/baseband/turbo_sw/bbdev_turbo_software.c
index af7bc41..dbc5524 100644
--- a/drivers/baseband/turbo_sw/bbdev_turbo_software.c
+++ b/drivers/baseband/turbo_sw/bbdev_turbo_software.c
@@ -254,6 +254,7 @@ struct turbo_sw_queue {
dev_info->min_alignment = 64;
dev_info->harq_buffer_size = 0;
dev_info->data_endianness = RTE_LITTLE_ENDIAN;
+   dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED;
 
rte_bbdev_log_debug("got device info from %u\n", dev->data->dev_id);
 }
diff --git a/lib/bbdev/rte_bbdev.c b/lib/bbdev/rte_bbdev.c
index 22bd894..555bda9 100644
--- a/lib/bbdev/rte_bbdev.c
+++ b/lib/bbdev/rte_bbdev.c
@@ -25,6 +25,8 @@
 
 /* Number of supported operation types */
 #define BBDEV_OP_TYPE_COUNT 5
+/* Number of supported device status */
+#define BBDEV_DEV_STATUS_COUNT 9
 
 /* BBDev library logging ID */
 RTE_LOG_REGISTER_DEFAULT(bbdev_logtype, NOTICE);
@@ -1132,3 +1134,25 @@ struct rte_mempool *
rte_bbdev_log(ERR, "Invalid operation type");
return NULL;
 }
+
+const char *
+rte_bbdev_device_status_str(enum rte_bbdev_device_status status)
+{
+   static const char * const dev_sta_string[] = {
+   "RTE_BBDEV_DEV_NOSTATUS",
+   "RTE_BBDEV_DEV_NOT_SUPPORTED",
+   "RTE_BBDEV_DE

[PATCH v3 3/7] bbdev: add device info on queue topology

2022-06-27 Thread Nicolas Chautru
Adding more fields in the API to expose the number
of queues per operation type and the related priority.

Signed-off-by: Nicolas Chautru 
---
 lib/bbdev/rte_bbdev.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index 9b1ffa4..ac941d6 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -289,6 +289,10 @@ struct rte_bbdev_driver_info {
 
/** Maximum number of queues supported by the device */
unsigned int max_num_queues;
+   /** Maximum number of queues supported per operation type */
+   unsigned int num_queues[RTE_BBDEV_OP_TYPE_PADDED_MAX];
+   /** Priority level supported per operation type */
+   unsigned int queue_priority[RTE_BBDEV_OP_TYPE_PADDED_MAX];
/** Queue size limit (queue size must also be power of 2) */
uint32_t queue_size_lim;
/** Set if device off-loads operation to hardware  */
-- 
1.8.3.1
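
(A short sketch of how an application might consume these new fields,
together with the padded enum from patch 1/7 -- illustrative only:)

#include <stdio.h>
#include <stdint.h>
#include <rte_bbdev.h>

static void
print_queue_topology(uint16_t dev_id)
{
	struct rte_bbdev_info info;
	int op;

	rte_bbdev_info_get(dev_id, &info);
	for (op = RTE_BBDEV_OP_NONE; op < RTE_BBDEV_OP_TYPE_PADDED_MAX; op++)
		printf("op %d: %u queues, priority levels %u\n", op,
		       info.drv.num_queues[op],
		       info.drv.queue_priority[op]);
}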



[PATCH v3 4/7] drivers/baseband: update PMDs to expose queue per operation

2022-06-27 Thread Nicolas Chautru
Add support in existing bbdev PMDs for exposing the explicit number of
queues and priorities for each operation type configured on the device.

Signed-off-by: Nicolas Chautru 
---
 drivers/baseband/acc100/rte_acc100_pmd.c   | 29 +-
 drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c |  8 ++
 drivers/baseband/fpga_lte_fec/fpga_lte_fec.c   |  8 ++
 drivers/baseband/la12xx/bbdev_la12xx.c |  7 ++
 drivers/baseband/turbo_sw/bbdev_turbo_software.c   | 11 
 5 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/drivers/baseband/acc100/rte_acc100_pmd.c 
b/drivers/baseband/acc100/rte_acc100_pmd.c
index 17ba798..d568d0d 100644
--- a/drivers/baseband/acc100/rte_acc100_pmd.c
+++ b/drivers/baseband/acc100/rte_acc100_pmd.c
@@ -966,6 +966,7 @@
struct rte_bbdev_driver_info *dev_info)
 {
struct acc100_device *d = dev->data->dev_private;
+   int i;
 
static const struct rte_bbdev_op_cap bbdev_capabilities[] = {
{
@@ -1062,19 +1063,23 @@
fetch_acc100_config(dev);
dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED;
 
-   /* This isn't ideal because it reports the maximum number of queues but
-* does not provide info on how many can be uplink/downlink or different
-* priorities
-*/
-   dev_info->max_num_queues =
-   d->acc100_conf.q_dl_5g.num_aqs_per_groups *
-   d->acc100_conf.q_dl_5g.num_qgroups +
-   d->acc100_conf.q_ul_5g.num_aqs_per_groups *
-   d->acc100_conf.q_ul_5g.num_qgroups +
-   d->acc100_conf.q_dl_4g.num_aqs_per_groups *
-   d->acc100_conf.q_dl_4g.num_qgroups +
-   d->acc100_conf.q_ul_4g.num_aqs_per_groups *
+   /* Expose number of queues */
+   dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0;
+   dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = 
d->acc100_conf.q_ul_4g.num_aqs_per_groups *
d->acc100_conf.q_ul_4g.num_qgroups;
+   dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = 
d->acc100_conf.q_dl_4g.num_aqs_per_groups *
+   d->acc100_conf.q_dl_4g.num_qgroups;
+   dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = 
d->acc100_conf.q_ul_5g.num_aqs_per_groups *
+   d->acc100_conf.q_ul_5g.num_qgroups;
+   dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = 
d->acc100_conf.q_dl_5g.num_aqs_per_groups *
+   d->acc100_conf.q_dl_5g.num_qgroups;
+   dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = 
d->acc100_conf.q_ul_4g.num_qgroups;
+   dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = 
d->acc100_conf.q_dl_4g.num_qgroups;
+   dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = 
d->acc100_conf.q_ul_5g.num_qgroups;
+   dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = 
d->acc100_conf.q_dl_5g.num_qgroups;
+   dev_info->max_num_queues = 0;
+   for (i = RTE_BBDEV_OP_TURBO_DEC; i <= RTE_BBDEV_OP_LDPC_ENC; i++)
+   dev_info->max_num_queues += dev_info->num_queues[i];
dev_info->queue_size_lim = ACC100_MAX_QUEUE_DEPTH;
dev_info->hardware_accelerated = true;
dev_info->max_dl_queue_priority =
diff --git a/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c 
b/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c
index 57b12af..b4982af 100644
--- a/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c
+++ b/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c
@@ -379,6 +379,14 @@
if (hw_q_id != FPGA_INVALID_HW_QUEUE_ID)
dev_info->max_num_queues++;
}
+   /* Expose number of queue per operation type */
+   dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0;
+   dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = 0;
+   dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = 0;
+   dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = dev_info->max_num_queues 
/ 2;
+   dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = dev_info->max_num_queues 
/ 2;
+   dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = 1;
+   dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = 1;
 }
 
 /**
diff --git a/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c 
b/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c
index 2a330c4..dc7f479 100644
--- a/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c
+++ b/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c
@@ -655,6 +655,14 @@ struct __rte_cache_aligned fpga_queue {
if (hw_q_id != FPGA_INVALID_HW_QUEUE_ID)
dev_info->max_num_queues++;
}
+   /* Expose number of queue per operation type */
+   dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0;
+   dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = dev_info->max_num_queues 
/ 2;
+   dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = dev_info->max_num_queues 
/ 2;
+   dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = 0;
+   dev_info->num_queues[RTE_BBDEV_OP_LD

[PATCH v3 6/7] bbdev: add queue related warning and status information

2022-06-27 Thread Nicolas Chautru
This allows exposing more information with regard to any
queue-related failure and warning, which cannot be conveyed
through the existing API.

Signed-off-by: Nicolas Chautru 
---
 app/test-bbdev/test_bbdev_perf.c |  2 ++
 lib/bbdev/rte_bbdev.c|  2 ++
 lib/bbdev/rte_bbdev.h| 19 +++
 3 files changed, 23 insertions(+)

diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c
index 1abda2d..653b21f 100644
--- a/app/test-bbdev/test_bbdev_perf.c
+++ b/app/test-bbdev/test_bbdev_perf.c
@@ -4360,6 +4360,8 @@ typedef int (test_case_function)(struct active_device *ad,
stats->dequeued_count = q_stats->dequeued_count;
stats->enqueue_err_count = q_stats->enqueue_err_count;
stats->dequeue_err_count = q_stats->dequeue_err_count;
+   stats->enqueue_warn_count = q_stats->enqueue_warn_count;
+   stats->dequeue_warn_count = q_stats->dequeue_warn_count;
stats->acc_offload_cycles = q_stats->acc_offload_cycles;
 
return 0;
diff --git a/lib/bbdev/rte_bbdev.c b/lib/bbdev/rte_bbdev.c
index 28b105d..fb59b51 100644
--- a/lib/bbdev/rte_bbdev.c
+++ b/lib/bbdev/rte_bbdev.c
@@ -723,6 +723,8 @@ struct rte_bbdev *
stats->dequeued_count += q_stats->dequeued_count;
stats->enqueue_err_count += q_stats->enqueue_err_count;
stats->dequeue_err_count += q_stats->dequeue_err_count;
+   stats->enqueue_warn_count += q_stats->enqueue_warn_count;
+   stats->dequeue_warn_count += q_stats->dequeue_warn_count;
}
rte_bbdev_log_debug("Got stats on %u", dev->data->dev_id);
 }
diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index ed528b8..c625a14 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -224,6 +224,19 @@ struct rte_bbdev_queue_conf {
 rte_bbdev_queue_stop(uint16_t dev_id, uint16_t queue_id);
 
 /**
+ * Flags to indicate the reason why a previous enqueue may not have
+ * consumed all requested operations.
+ * In case of multiple reasons, the latter supersedes a previous one.
+ */
+enum rte_bbdev_enqueue_status {
+   RTE_BBDEV_ENQ_STATUS_NONE, /**< Nothing to report */
+   RTE_BBDEV_ENQ_STATUS_QUEUE_FULL,   /**< Not enough room in queue */
+   RTE_BBDEV_ENQ_STATUS_RING_FULL,/**< Not enough room in ring */
+   RTE_BBDEV_ENQ_STATUS_INVALID_OP,   /**< Operation was rejected as 
invalid */
+   RTE_BBDEV_ENQ_STATUS_PADDED_MAX = 6,   /**< Maximum enq status number 
including padding */
+};
+
+/**
  * Flags indicate the status of the device
  */
 enum rte_bbdev_device_status {
@@ -246,6 +259,12 @@ struct rte_bbdev_stats {
uint64_t enqueue_err_count;
/** Total error count on operations dequeued */
uint64_t dequeue_err_count;
+   /** Total warning count on operations enqueued */
+   uint64_t enqueue_warn_count;
+   /** Total warning count on operations dequeued */
+   uint64_t dequeue_warn_count;
+   /** Total enqueue status count based on rte_bbdev_enqueue_status enum */
+   uint64_t enqueue_status_count[RTE_BBDEV_ENQ_STATUS_PADDED_MAX];
/** CPU cycles consumed by the (HW/SW) accelerator device to offload
 *  the enqueue request to its internal queues.
 *  - For a HW device this is the cycles consumed in MMIO write
-- 
1.8.3.1
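
(Sketch of how a PMD could feed the new counters when an enqueue is only
partially consumed. 'reason' stands for whatever cause the PMD detected
and the surrounding enqueue path is hypothetical; only the field and
enum names come from this patch:)

if (nb_enqueued < nb_ops) {
	enum rte_bbdev_enqueue_status reason =
		RTE_BBDEV_ENQ_STATUS_QUEUE_FULL; /* example cause */

	q_stats->enqueue_warn_count++;
	q_stats->enqueue_status_count[reason]++;
}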



[PATCH v3 5/7] bbdev: add new operation for FFT processing

2022-06-27 Thread Nicolas Chautru
Extension of bbdev operation to support FFT based operations.

Signed-off-by: Nicolas Chautru 
Acked-by: Hemant Agrawal 
---
 doc/guides/prog_guide/bbdev.rst | 130 +++
 lib/bbdev/rte_bbdev.c   |  11 ++-
 lib/bbdev/rte_bbdev.h   |  76 
 lib/bbdev/rte_bbdev_op.h| 149 
 lib/bbdev/version.map   |   4 ++
 5 files changed, 369 insertions(+), 1 deletion(-)

diff --git a/doc/guides/prog_guide/bbdev.rst b/doc/guides/prog_guide/bbdev.rst
index 70fa01a..4a055b5 100644
--- a/doc/guides/prog_guide/bbdev.rst
+++ b/doc/guides/prog_guide/bbdev.rst
@@ -1118,6 +1118,136 @@ Figure :numref:`figure_turbo_tb_decode` above
 showing the Turbo decoding of CBs using BBDEV interface in TB-mode
 is also valid for LDPC decode.
 
+BBDEV FFT Operation
+
+
+This operation allows running a combination of DFT and/or IDFT and/or 
time-domain windowing.
+These can be used in a modular fashion (using bypass modes) or as a processing 
pipeline
+which can be used for FFT-based baseband signal processing.
+In more detail, it allows:
+- to process the data first through an IDFT of adjustable size and padding;
+- to perform the windowing as a programmable cyclic shift offset of the data 
followed by a
+pointwise multiplication by a time domain window;
+- to process the related data through a DFT of adjustable size and depadding 
for each such cyclic
+shift output.
+
+A flexible number of Rx antennas are being processed in parallel with the same 
configuration.
+The API allows more generally for flexibility in what the PMD may support 
(capability flags) and
+flexibility to adjust some of the parameters of the processing.
+
+The operation/capability flags that can be set for each FFT operation are 
given below.
+
+  **NOTE:** The actual operation flags that may be used with a specific
+  BBDEV PMD are dependent on the driver capabilities as reported via
+  ``rte_bbdev_info_get()``, and may be a subset of those below.
+
+++
+|Description of FFT capability flags |
+++
+|RTE_BBDEV_FFT_WINDOWING |
+| Set to enable/support windowing in time domain |
+++
+|RTE_BBDEV_FFT_CS_ADJUSTMENT |
+| Set to enable/support  the cyclic shift time offset adjustment |
+++
+|RTE_BBDEV_FFT_DFT_BYPASS|
+| Set to bypass the DFT and use directly the IDFT as an option   |
+++
+|RTE_BBDEV_FFT_IDFT_BYPASS   |
+| Set to bypass the IDFT and use directly the DFT as an option   |
+++
+|RTE_BBDEV_FFT_WINDOWING_BYPASS  |
+| Set to bypass the time domain windowing  as an option  |
+++
+|RTE_BBDEV_FFT_POWER_MEAS|
| Set to provide an optional power measurement of the DFT output |
+++
+|RTE_BBDEV_FFT_FP16_INPUT|
+| Set if the input data shall use FP16 format instead of INT16   |
+++
+|RTE_BBDEV_FFT_FP16_OUTPUT   |
+| Set if the output data shall use FP16 format instead of INT16  |
+++
+
+The structure passed for each FFT operation is given below,
+with the operation flags forming a bitmask in the ``op_flags`` field.
+
+.. code-block:: c
+
+struct rte_bbdev_op_fft {
+struct rte_bbdev_op_data base_input;
+struct rte_bbdev_op_data base_output;
+struct rte_bbdev_op_data power_meas_output;
+uint32_t op_flags;
+uint16_t input_sequence_size;
+uint16_t input_leading_padding;
+uint16_t output_sequence_size;
+uint16_t output_leading_depadding;
+uint8_t window_index[RTE_BBDEV_MAX_CS_2];
+uint16_t cs_bitmap;
+uint8_t num_antennas_log2;
+uint8_t idft_log2;
+uint8_t dft_log2;
+int8_t cs_time_adjustment;
+int8_t idft_shift;
+int8_t dft_shift;
+uint16_t ncs_reciprocal;
+uint16_t power_shift;
+uint16_t fp16_exp_adjust;
+};
+
+The FFT parameters are set out in the table below.
+
++--+

[PATCH v3 7/7] bbdev: add a lock option for enqueue/dequeue operation

2022-06-27 Thread Nicolas Chautru
Locking is not explicitly required, but can be valuable
in case the application cannot guarantee to be thread-safe,
or specifically is at risk of using the same queue from multiple threads.
This is an option for the PMD to use.

Signed-off-by: Nicolas Chautru 
---
 lib/bbdev/rte_bbdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index c625a14..e0aa52e 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -458,6 +458,8 @@ struct rte_bbdev_data {
int socket_id;  /**< NUMA socket that device is on */
bool started;  /**< Device run-time state */
uint16_t process_cnt;  /** Counter of processes using the device */
+   rte_rwlock_t lock_enq; /**< lock protection for the Enqueue */
+   rte_rwlock_t lock_deq; /**< lock protection for the Dequeue */
 };
 
 /* Forward declarations */
-- 
1.8.3.1
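
(An illustration of how an opting-in PMD might use the new locks -- a
sketch only; 'enqueue_burst_internal' is a hypothetical driver function,
and drivers that can assume single-threaded queue access would skip the
lock entirely:)

#include <stdint.h>
#include <rte_bbdev.h>
#include <rte_rwlock.h>

/* Hypothetical PMD-internal burst function. */
static uint16_t enqueue_burst_internal(void *q_data,
		struct rte_bbdev_enc_op **ops, uint16_t nb_ops);

static uint16_t
locked_enqueue(struct rte_bbdev_data *data, void *q_data,
		struct rte_bbdev_enc_op **ops, uint16_t nb_ops)
{
	uint16_t nb;

	rte_rwlock_write_lock(&data->lock_enq);
	nb = enqueue_burst_internal(q_data, ops, nb_ops);
	rte_rwlock_write_unlock(&data->lock_enq);

	return nb;
}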



[PATCH v3] doc: add release notes for async vhost dequeue data-path

2022-06-27 Thread Cheng Jiang
Add release notes for asynchronous vhost dequeue data-path. Emphasize
that split virtqueue and packed virtqueue are both supported in
asynchronous vhost dequeue data-path.

Signed-off-by: Cheng Jiang 
---
v3: code rebased.
v2: fixed a full stop missing in the commit message.

 doc/guides/rel_notes/release_22_07.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_22_07.rst 
b/doc/guides/rel_notes/release_22_07.rst
index 6365800313..e43ab15260 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -107,7 +107,8 @@ New Features
 * **Added vhost async dequeue API to receive packets from guest.**

   Added vhost async dequeue API which can leverage DMA devices to
-  accelerate receiving packets from guest.
+  accelerate receiving packets from guest. Split virtqueue and packed
+  virtqueue are both supported.

 * **Added thread-safe version of in-flight packet clear API in vhost library.**

--
2.35.1



[PATCH v2] doc: add event timer expiry drop stat

2022-06-27 Thread Naga Harish K S V
The structure ``rte_event_timer_adapter_stats`` will be
extended by adding a new field, ``evtim_drop_count``. This stat
will represent the number of times an event_timer expiry event
is dropped by the event timer adapter.

Signed-off-by: Naga Harish K S V 
---
v2:
* update commit message
---
 doc/guides/rel_notes/deprecation.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 4e5b23c53d..597a457a37 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -125,3 +125,9 @@ Deprecation Notices
   applications should be updated to use the ``dmadev`` library instead,
   with the underlying HW-functionality being provided by the ``ioat`` or
   ``idxd`` dma drivers
+
+* eventdev/timer: The structure ``rte_event_timer_adapter_stats`` will be
+  extended by adding a new field, ``evtim_drop_count``. This stat will
+  represent the number of times an event_timer expiry event is dropped
+  by the timer adapter. This field will be used by a future patch adding
+  support for periodic mode to the software timer adapter in DPDK 22.11.
-- 
2.25.1



[PATCH v1] net/ice: fix memory allocation issue of packet flag

2022-06-27 Thread Yuying Zhang
Current code doesn't allocate memory of lookup element to add packet
flag. This patch adds one lookup item in the list to fix this memory
issue.

Fixes: 8b95092b7f69 ("net/ice/base: fix direction of flow that matches any")
Cc: sta...@dpdk.org

Signed-off-by: Yuying Zhang 
---
 drivers/net/ice/ice_switch_filter.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ice/ice_switch_filter.c 
b/drivers/net/ice/ice_switch_filter.c
index 36c9bffb73..e84283fec1 100644
--- a/drivers/net/ice/ice_switch_filter.c
+++ b/drivers/net/ice/ice_switch_filter.c
@@ -1863,7 +1863,10 @@ ice_switch_parse_pattern_action(struct ice_adapter *ad,
else if (vlan_num == 2)
tun_type = ICE_NON_TUN_QINQ;
 
-   list = rte_zmalloc(NULL, item_num * sizeof(*list), 0);
+   /* reserve one more memory slot for direction flag which may
+* consume 1 lookup item.
+*/
+   list = rte_zmalloc(NULL, (item_num + 1) * sizeof(*list), 0);
if (!list) {
rte_flow_error_set(error, EINVAL,
   RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
-- 
2.25.1



[PATCH v2] vdpa/sfc: handle sync issue between qemu and vhost-user

2022-06-27 Thread abhimanyu.saini
From: Abhimanyu Saini 

When DPDK app is running in the VF, it sometimes rings the doorbell
before dev_config has had a chance to complete and hence it misses
the event. As a workaround, ring the doorbell when vDPA reports the
notify_area to QEMU.

Fixes: 630be406dcbf ("vdpa/sfc: get queue notify area info")
Cc: sta...@dpdk.org

Signed-off-by: Vijay Kumar Srivastava 
Signed-off-by: Abhimanyu Saini 
---
v2:
* Update the commit id that this patch fixes

 drivers/vdpa/sfc/sfc_vdpa_ops.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c
index b3d9b6c..63aa52d 100644
--- a/drivers/vdpa/sfc/sfc_vdpa_ops.c
+++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c
@@ -794,6 +794,8 @@
int vfio_dev_fd;
efx_rc_t rc;
unsigned int bar_offset;
+   volatile void *doorbell;
+   struct rte_pci_device *pci_dev;
struct rte_vdpa_device *vdpa_dev;
struct sfc_vdpa_ops_data *ops_data;
struct vfio_region_info reg = { .argsz = sizeof(reg) };
@@ -856,6 +858,18 @@
sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64,
  *offset);
 
+   pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev;
+   doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr + *offset;
+
+   /*
+* virtio-net driver in VM sends queue notifications before
+* vDPA has a chance to setup the queues and notification area,
+* and hence the HW misses these doorbell notifications.
+* Since, it is safe to send duplicate doorbell, send another
+* doorbell from vDPA driver as workaround for this timing issue.
+*/
+   rte_write16(qid, doorbell);
+
return 0;
 }
 
-- 
1.8.3.1



[PATCH v3] net/igc: move the initialization of data path into dev_init

2022-06-27 Thread zhichaox . zeng
From: Zhichao Zeng 

The upper-layer application usually calls the common interface "dev_init"
to initialize the data path, but in the igc driver the initialization
of the data path is done in "igc_rx_init" and "eth_igc_tx_queue_setup",
while in other drivers it is done in "dev_init". So when an upper-layer
application calls these function pointers, segmentation faults occur.

This patch moves the initialization of data path into "eth_igc_dev_init"
to avoid segmentation faults, which is consistent with other drivers.

Fixes: a5aeb2b9e225 ("net/igc: support Rx and Tx")
Cc: alvinx.zh...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Zhichao Zeng 

---
v2:
remove unnecessary parameters, move declaration to relevant header file
---
v3:
remove redundant code, optimize commit log
---
 drivers/net/igc/igc_ethdev.c |  3 +++
 drivers/net/igc/igc_txrx.c   | 10 +++---
 drivers/net/igc/igc_txrx.h   |  4 
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/net/igc/igc_ethdev.c b/drivers/net/igc/igc_ethdev.c
index b9933b395d..25fb91bfec 100644
--- a/drivers/net/igc/igc_ethdev.c
+++ b/drivers/net/igc/igc_ethdev.c
@@ -1234,6 +1234,9 @@ eth_igc_dev_init(struct rte_eth_dev *dev)
dev->rx_queue_count = eth_igc_rx_queue_count;
dev->rx_descriptor_status = eth_igc_rx_descriptor_status;
dev->tx_descriptor_status = eth_igc_tx_descriptor_status;
+   dev->rx_pkt_burst = igc_recv_pkts;
+   dev->tx_pkt_burst = igc_xmit_pkts;
+   dev->tx_pkt_prepare = eth_igc_prep_pkts;
 
/*
 * for secondary processes, we don't initialize any further as primary
diff --git a/drivers/net/igc/igc_txrx.c b/drivers/net/igc/igc_txrx.c
index e48d5df11a..f38c5e7863 100644
--- a/drivers/net/igc/igc_txrx.c
+++ b/drivers/net/igc/igc_txrx.c
@@ -345,7 +345,7 @@ rx_desc_get_pkt_info(struct igc_rx_queue *rxq, struct 
rte_mbuf *rxm,
rxm->packet_type = rx_desc_pkt_info_to_pkt_type(pkt_info);
 }
 
-static uint16_t
+uint16_t
 igc_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 {
struct igc_rx_queue * const rxq = rx_queue;
@@ -1071,8 +1071,6 @@ igc_rx_init(struct rte_eth_dev *dev)
uint16_t i;
int ret;
 
-   dev->rx_pkt_burst = igc_recv_pkts;
-
/*
 * Make sure receives are disabled while setting
 * up the descriptor ring.
@@ -1397,7 +1395,7 @@ eth_igc_rx_queue_setup(struct rte_eth_dev *dev,
 }
 
 /* prepare packets for transmit */
-static uint16_t
+uint16_t
 eth_igc_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts)
 {
@@ -1604,7 +1602,7 @@ tx_desc_cksum_flags_to_olinfo(uint64_t ol_flags)
return tmp;
 }
 
-static uint16_t
+uint16_t
 igc_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
struct igc_tx_queue * const txq = tx_queue;
@@ -2030,8 +2028,6 @@ int eth_igc_tx_queue_setup(struct rte_eth_dev *dev, 
uint16_t queue_idx,
txq->sw_ring, txq->tx_ring, txq->tx_ring_phys_addr);
 
igc_reset_tx_queue(txq);
-   dev->tx_pkt_burst = igc_xmit_pkts;
-   dev->tx_pkt_prepare = ð_igc_prep_pkts;
dev->data->tx_queues[queue_idx] = txq;
txq->offloads = tx_conf->offloads;
 
diff --git a/drivers/net/igc/igc_txrx.h b/drivers/net/igc/igc_txrx.h
index 535108a868..a5240df9d7 100644
--- a/drivers/net/igc/igc_txrx.h
+++ b/drivers/net/igc/igc_txrx.h
@@ -49,6 +49,10 @@ void eth_igc_txq_info_get(struct rte_eth_dev *dev, uint16_t 
queue_id,
struct rte_eth_txq_info *qinfo);
 void eth_igc_vlan_strip_queue_set(struct rte_eth_dev *dev,
uint16_t rx_queue_id, int on);
+uint16_t igc_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t 
nb_pkts);
+uint16_t igc_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t 
nb_pkts);
+uint16_t eth_igc_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf 
**tx_pkts,
+   uint16_t nb_pkts);
 #ifdef __cplusplus
 }
 #endif
-- 
2.25.1



Re: [PATCH v4] net: fix checksum with unaligned buffer

2022-06-27 Thread Mattias Rönnblom

On 2022-06-27 22:21, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
>> Sent: Monday, 27 June 2022 19.23
>>
>> On 2022-06-27 15:22, Morten Brørup wrote:
>>
>> [... earlier exchange on the v4 patch and the vmovdqa/vmovdqu latency
>> figures elided; it is quoted in full in the thread above ...]
>>
>> I think you've misunderstood what latency means in such tables. It's a
>> data dependency thing, not a measure of throughput. The throughput is
>> *much* higher. My guess would be two such instructions per clock.
>>
>> For your 1460 bytes example, my Zen3 AMD performs identically with
>> both the current DPDK implementation, your patch, and a memcpy()-ified
>> version of the current implementation. They all need ~130 clock
>> cycles/packet, with warm caches. IPC is 3 instructions per cycle, but
>> obviously not all instructions are SIMD.
>
> You're right, I wasn't thinking deeper about it before extrapolating.
>
> Great to see some real numbers! I wish someone would do the same
> testing on an old ARM CPU, so we could also see the other end of the
> scale.

I've run it on an ARM A72. For the aligned 1460 bytes case I got:
current DPDK ~572 cc, your patch ~578 cc, memcpy-fied ~573 cc. They
performed about the same for all unaligned/aligned cases and sizes I
tested. This platform (or it could be the GCC version as well) doesn't
suffer from the unaligned performance degradation your patch showed on
my AMD machine.

>> The main issue with checksumming on the CPU is, in my experience, not
>> that you don't have enough compute, but that you trash the caches.
>
> Agree. I have noticed that x86 has "non-temporal" instruction variants
> to load/store data without trashing the cache entirely.
>
> A variant of the checksum function using such instructions might be
> handy.

Yes, although you may need to prefetch the payload for good performance.

> Variants of the memcpy function using such instructions might also be
> handy for some purposes, e.g. copying the contents of packets, where
> the original and/or copy will not be accessed shortly thereafter.

Indeed, and I think it's been discussed on the list. There's some work
to get it right, since alignment requirements and the fact that a
different memory model is used for those SIMD instructions cause
trouble for a generic implementation. (For x86_64.)

>>> So I opted for a solution with zero changes to the inner loop, so no
>>> performance retesting is required (for the previously supported use
>>> cases, where the buffer is aligned).
>>
>> You will see performance degradation with this solution as well, under
>> certain con
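
(On the non-temporal variants mentioned above: a minimal x86_64 sketch
using SSE2 streaming stores -- illustrative only. A production memcpy
variant needs alignment handling, tail handling, and care about the
weaker memory ordering of streaming stores.)

#include <emmintrin.h>
#include <stddef.h>

/* Copy without filling the destination's cache lines. Assumes dst is
 * 16-byte aligned and len is a multiple of 16; src may be unaligned.
 */
static void
copy_nt(void *dst, const void *src, size_t len)
{
	__m128i *d = dst;
	const __m128i *s = src;
	size_t i;

	for (i = 0; i < len / 16; i++)
		_mm_stream_si128(d + i, _mm_loadu_si128(s + i));

	_mm_sfence(); /* order streaming stores before later accesses */
}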

RE: [PATCH v1] net/ice: fix memory allocation issue of packet flag

2022-06-27 Thread Zhang, Qi Z



> -Original Message-
> From: Zhang, Yuying 
> Sent: Tuesday, June 28, 2022 12:07 PM
> To: dev@dpdk.org; Zhang, Qi Z 
> Cc: Zhang, Yuying ; sta...@dpdk.org
> Subject: [PATCH v1] net/ice: fix memory allocation issue of packet flag
> 
> Current code doesn't allocate memory of lookup element to add packet flag.
> This patch adds one lookup item in the list to fix this memory issue.
> 
> Fixes: 8b95092b7f69 ("net/ice/base: fix direction of flow that matches any")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Yuying Zhang 

Acked-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks
Qi


RE: [PATCH] net/iavf: fix issue of VF resetting

2022-06-27 Thread Zhang, Qi Z



> -Original Message-
> From: Zhou, YidingX 
> Sent: Monday, June 27, 2022 3:23 PM
> To: dev@dpdk.org
> Cc: sta...@dpdk.org; Zhou, YidingX ; Wu, Jingjing
> ; Xing, Beilei ; Zhang, Qi Z
> 
> Subject: [PATCH] net/iavf: fix issue of VF resetting
> 
> When the VF is in closed state, the vf_reset flag can not be reverted if the 
> VF is
> reset asynchronously. This prevents all virtchnl commands from executing,
> causing subsequent calls to iavf_dev_reset() to fail.
> 
> So the vf_reset flag needs to be reverted even when VF is in closed state.
> 
> Fixes: 676d986b4b86 ("net/iavf: fix crash after VF reset failure")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Yiding Zhou 

Acked-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks
Qi



Re: Service core statistics MT safety

2022-06-27 Thread Mattias Rönnblom
On 2022-06-28 02:14, Honnappa Nagarahalli wrote:
> 
>>
>>> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
>>> Sent: Monday, 27 June 2022 13.06
>>>
>>> Hi.
>>>
>>> Is it safe to enable stats on MT safe services?
>>>
>>> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366
>>>
>>> It seems to me this would have to be an __atomic_add for this code
>>> to produce deterministic results.
>>
>> I agree. The same goes for the 'calls' field.
> The calling function does the locking.
> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L398
>
> For more information you can look at:
> https://github.com/DPDK/dpdk/blob/main/lib/eal/include/rte_service.h#L120
>

 What about the
 https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L404
 call (for MT safe services)?

 There's no lock held there.
>>> Good point.
>>> This is the case where the service running in service cores is MT
>>> safe. However, the stats are incremented outside of the MT safety
>>> mechanism employed by the service. So, yes, this and other updates
>>> in the function 'service_runner_do_callback' need to be updated
>>> atomically.
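
Something like this, roughly (illustrative struct and field names, not
the actual rte_service internals):

#include <stdint.h>

struct service_stats { /* illustrative, not the real layout */
	uint64_t calls;
	uint64_t cycles;
};

/* An MT-safe service may run its callback on many lcores at once, so
 * the shared counters must be updated atomically. Relaxed ordering is
 * enough; the counters carry no synchronization between threads. */
static inline void
stats_update(struct service_stats *s, uint64_t cycles)
{
	__atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
	__atomic_fetch_add(&s->cycles, cycles, __ATOMIC_RELAXED);
}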
>>
>> Maybe a better solution would be to move this to the core_state
>> struct (and eliminate the "calls" field since the same information is
>> already in the core_state struct). The contention on these cache
>> lines will be pretty crazy for services with short run time (say a
>> thousand cycles or less per call), assuming they are mapped to many
>> cores.
> That's one option; the structures are internal as well. With this
> option, stats need to be aggregated, which will not give an accurate
> view. But that is the nature of statistics.
> 

Per-core counters are a very common pattern. They are used for the
Linux MIB counters, for example. I'm not sure it's much less accurate.
I mean, you just load in quick succession what's globally visible for
the different per-lcore counters. If you do a relaxed store on an ARM,
how long does it take until it's seen by someone doing a relaxed load
on a different core? Roughly.
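
A sketch of the pattern (made-up names and sizes, not the actual
rte_service data structures):

#include <stdint.h>

#define MAX_LCORES 128

struct lcore_stats {
	uint64_t calls;
	uint64_t cycles;
} __attribute__((aligned(64))); /* one cache line per lcore slot */

static struct lcore_stats stats[MAX_LCORES];

/* Writer: only the owning lcore updates its slot, so relaxed atomic
 * stores suffice and there is no cross-core cache line contention. */
static inline void
lcore_stats_add(unsigned int lcore_id, uint64_t cycles)
{
	struct lcore_stats *ls = &stats[lcore_id];

	__atomic_store_n(&ls->calls, ls->calls + 1, __ATOMIC_RELAXED);
	__atomic_store_n(&ls->cycles, ls->cycles + cycles,
			 __ATOMIC_RELAXED);
}

/* Reader: aggregate by loading each lcore's counter in quick
 * succession; the result is at worst a slightly stale snapshot of
 * monotonically increasing counters. */
static uint64_t
lcore_stats_total_cycles(void)
{
	uint64_t sum = 0;
	unsigned int i;

	for (i = 0; i < MAX_LCORES; i++)
		sum += __atomic_load_n(&stats[i].cycles, __ATOMIC_RELAXED);
	return sum;
}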

> I am also wondering if these stats are of any use other than for debugging. 
> Adding a capability to disable stats might help as well.
> 

They could be used as a crude tool to determine service core
utilization. Comparing utilization between different services running
on the same core should be straightforward, but lcore utilization is
harder in absolute terms. If you just look at "cycles", a completely
idle core would look like it's very busy (basically the rdtsc latency
added for every loop). I assume you'd have to do some heuristic based
on both "calls" and "cycles" to get an estimate.

I think service core utilization would be very useful, although that 
would require some changes in the service function signature, so the 
service can report back if it did some useful work for a particular call.

That would make for a DPDK 'top'. Just like 'top', to be really useful
it can't impose any serious performance degradation when used, I think.

Sure, it should be possible to turn it on and off. I thought that was 
the case already?

>>
>> Idle service cores will basically do nothing else than stall waiting for 
>> these lines, I
>> suspect, hampering the progress of more busy cores.