Re: [PATCH 1/1] test/pmd_perf: handling of unknown connection speed
On 6/26/22 17:15, Thomas Monjalon wrote:
> 11/05/2022 18:33, Heinrich Schuchardt:
>> When running DPDK in QEMU it cannot determine the connection speed.
>> pmd_perf_autotest treats this as if the connection speed were
>> UINT32_MAX Mbps:
>>
>> RTE>>pmd_perf_autotest
>> Start PMD RXTX cycles cost test.
>> Allocated mbuf pool on socket 0
>> CONFIG RXD=1024 TXD=1024
>> Performance test runs on lcore 1 socket 0
>> Port 0 Address:52:54:00:12:34:57
>> Port 1 Address:52:54:00:12:34:58
>> Checking link statuses...
>> Port 0 Link up at Unknown FDX Autoneg
>> Port 1 Link up at Unknown FDX Autoneg
>> IPv4 pktlen 46
>> UDP pktlen 26
>> Generate 4096 packets @socket 0
>> inject 2048 packet to port 0
>> inject 2048 packet to port 1
>> Total packets inject to prime ports = 4096
>> Each port will do 6391320379464 packets per second
>> Test will stop after at least 25565281517856 packets received
>>
>> This will not allow the test to terminate in a reasonable timespan.
>> Just assume 10 Gbps in this case instead:
>>
>> ...
>> Test will stop after at least 59523808 packets received
>>
>> Signed-off-by: Heinrich Schuchardt
>> ---
>>  app/test/test_pmd_perf.c | 15 +++
>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
>> index 25611bfe9b..ee08c8aade 100644
>> --- a/app/test/test_pmd_perf.c
>> +++ b/app/test/test_pmd_perf.c
>> @@ -486,10 +486,17 @@ main_loop(__rte_unused void *args)
>>  	}
>>  	printf("Total packets inject to prime ports = %u\n", idx);
>>
>> -	packets_per_second = (link_mbps * 1000 * 1000) /
>> -		((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
>> -	printf("Each port will do %"PRIu64" packets per second\n",
>> -		packets_per_second);
>> +	if (link_mbps != RTE_ETH_SPEED_NUM_UNKNOWN) {
>> +		packets_per_second = (link_mbps * 1000 * 1000) /
>> +			((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
>> +		printf("Each port will do %"PRIu64" packets per second\n",
>> +			packets_per_second);
>> +		total_packets = RTE_TEST_DURATION * conf->nb_ports * packets_per_second;

Yes this line should be removed.

> This is redundant with below.

>> +	} else {
>> +		/* We don't know the speed. Pretend it is 10G */
>> +		packets_per_second = ((uint64_t)RTE_ETH_SPEED_NUM_10G * 1000 * 1000) /
>> +			((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
>> +	}
>>
>>  	total_packets = RTE_TEST_DURATION * conf->nb_ports * packets_per_second;

> Why not just inserting this:
>
> 	if (link_mbps == RTE_ETH_SPEED_NUM_UNKNOWN)
> 		link_mbps = RTE_ETH_SPEED_NUM_10G;

Following your suggestion the message "Each port will do %"PRIu64"
packets per second\n" would provide misleading information to the user.
This should be avoided.

Best regards

Heinrich
[PATCH] net/iavf: fix issue of VF resetting
When the VF is in closed state, the vf_reset flag cannot be reverted
if the VF is reset asynchronously. This prevents all virtchnl commands
from executing, causing subsequent calls to iavf_dev_reset() to fail.

So the vf_reset flag needs to be reverted even when VF is in closed
state.

Fixes: 676d986b4b86 ("net/iavf: fix crash after VF reset failure")
Cc: sta...@dpdk.org

Signed-off-by: Yiding Zhou
---
 drivers/net/iavf/iavf_ethdev.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index 7df0bf8118..506fcff6e3 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -2702,8 +2702,10 @@ iavf_dev_close(struct rte_eth_dev *dev)
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
-	if (adapter->closed)
-		return 0;
+	if (adapter->closed) {
+		ret = 0;
+		goto out;
+	}
 
 	ret = iavf_dev_stop(dev);
 	adapter->closed = true;
@@ -2763,6 +2765,7 @@ iavf_dev_close(struct rte_eth_dev *dev)
 	 * the bus master bit will not be disabled, and this call will have no
 	 * effect.
 	 */
+out:
 	if (vf->vf_reset && !rte_pci_set_bus_master(pci_dev, true))
 		vf->vf_reset = false;
-- 
2.34.1
Re: [PATCH 22.07] doc: make doc roadmap common for Linux/BSD GSGs
On Sun, Jun 26, 2022 at 11:34:30PM +0200, Thomas Monjalon wrote:
> 14/06/2022 16:12, Mcnamara, John:
> > From: Bruce Richardson
> > > Both the Linux and FreeBSD GSG docs had a "Documentation Roadmap"
> > > section as part of the introduction page, and this contained the same
> > > information, with only the reference to the GSGs themselves being
> > > different. This text can be consolidated into a single text file which is
> > > included by both GSG intro sections - using relative links for the self
> > > reference.
> >
> > Good idea.
> >
> > Acked-by: John McNamara
>
> We already have a file doc/guides/linux_gsg/eal_args.include.rst
> I'll keep the format .include.rst for such file.
>
Good idea. I missed that common file.

> Applied with this update, thanks.
>
Thanks
Re: [PATCH 1/1] test/pmd_perf: handling of unknown connection speed
27/06/2022 09:18, Heinrich Schuchardt:
> On 6/26/22 17:15, Thomas Monjalon wrote:
> > 11/05/2022 18:33, Heinrich Schuchardt:
> >> When running DPDK in QEMU it cannot determine the connection speed.
> >> pmd_perf_autotest treats this as if the connection speed were
> >> UINT32_MAX Mbps:
> >>
> >> RTE>>pmd_perf_autotest
> >> Start PMD RXTX cycles cost test.
> >> Allocated mbuf pool on socket 0
> >> CONFIG RXD=1024 TXD=1024
> >> Performance test runs on lcore 1 socket 0
> >> Port 0 Address:52:54:00:12:34:57
> >> Port 1 Address:52:54:00:12:34:58
> >> Checking link statuses...
> >> Port 0 Link up at Unknown FDX Autoneg
> >> Port 1 Link up at Unknown FDX Autoneg
> >> IPv4 pktlen 46
> >> UDP pktlen 26
> >> Generate 4096 packets @socket 0
> >> inject 2048 packet to port 0
> >> inject 2048 packet to port 1
> >> Total packets inject to prime ports = 4096
> >> Each port will do 6391320379464 packets per second
> >> Test will stop after at least 25565281517856 packets received
> >>
> >> This will not allow the test to terminate in a reasonable timespan.
> >> Just assume 10 Gbps in this case instead:
> >>
> >> ...
> >> Test will stop after at least 59523808 packets received
> >>
> >> Signed-off-by: Heinrich Schuchardt
> >> ---
> >>  app/test/test_pmd_perf.c | 15 +++
> >>  1 file changed, 11 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
> >> index 25611bfe9b..ee08c8aade 100644
> >> --- a/app/test/test_pmd_perf.c
> >> +++ b/app/test/test_pmd_perf.c
> >> @@ -486,10 +486,17 @@ main_loop(__rte_unused void *args)
> >>  	}
> >>  	printf("Total packets inject to prime ports = %u\n", idx);
> >>
> >> -	packets_per_second = (link_mbps * 1000 * 1000) /
> >> -		((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
> >> -	printf("Each port will do %"PRIu64" packets per second\n",
> >> -		packets_per_second);
> >> +	if (link_mbps != RTE_ETH_SPEED_NUM_UNKNOWN) {
> >> +		packets_per_second = (link_mbps * 1000 * 1000) /
> >> +			((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
> >> +		printf("Each port will do %"PRIu64" packets per second\n",
> >> +			packets_per_second);
> >> +		total_packets = RTE_TEST_DURATION * conf->nb_ports * packets_per_second;
>
> Yes this line should be removed.
>
> > This is redundant with below.
>
> >> +	} else {
> >> +		/* We don't know the speed. Pretend it is 10G */
> >> +		packets_per_second = ((uint64_t)RTE_ETH_SPEED_NUM_10G * 1000 * 1000) /
> >> +			((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
> >> +	}
> >>
> >>  	total_packets = RTE_TEST_DURATION * conf->nb_ports * packets_per_second;
> >
> > Why not just inserting this:
> >
> > 	if (link_mbps == RTE_ETH_SPEED_NUM_UNKNOWN)
> > 		link_mbps = RTE_ETH_SPEED_NUM_10G;
>
> Following your suggestion the message "Each port will do %"PRIu64"
> packets per second\n" would provide misleading information to the user.
> This should be avoided.

OK so we can have the printf inside an "if condition":

	bool speed_unknown = (link_mbps == RTE_ETH_SPEED_NUM_UNKNOWN);
	if (speed_unknown)
		link_mbps = RTE_ETH_SPEED_NUM_10G;
	packets_per_second = ...;
	if (!speed_unknown)
		printf(...);
	total_packets = ...;
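[Editor's illustration: a fleshed-out sketch of the approach agreed
above, filling in the elided expressions with the ones from the quoted
patch; untested, for illustration only.]

	bool speed_unknown = (link_mbps == RTE_ETH_SPEED_NUM_UNKNOWN);

	if (speed_unknown) {
		/* We don't know the speed. Pretend it is 10G. */
		link_mbps = RTE_ETH_SPEED_NUM_10G;
	}

	packets_per_second = ((uint64_t)link_mbps * 1000 * 1000) /
		((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);

	/* Only print the rate when it reflects a real link speed, so the
	 * user is not shown a misleading number. */
	if (!speed_unknown)
		printf("Each port will do %"PRIu64" packets per second\n",
			packets_per_second);

	total_packets = RTE_TEST_DURATION * conf->nb_ports * packets_per_second;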
[PATCH] crypto/cnxk: predecrement esn value to be used in session
ESN provided in the session would be the next sequence number to be
used. Hence predecrement the value, so that in datapath, incremented
value will be as expected.

Signed-off-by: Anoob Joseph
---
 drivers/crypto/cnxk/cn9k_ipsec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/cnxk/cn9k_ipsec.c b/drivers/crypto/cnxk/cn9k_ipsec.c
index cb9cf174a4..6d26b0cc01 100644
--- a/drivers/crypto/cnxk/cn9k_ipsec.c
+++ b/drivers/crypto/cnxk/cn9k_ipsec.c
@@ -41,7 +41,8 @@ cn9k_ipsec_outb_sa_create(struct cnxk_cpt_qp *qp,
 
 	/* Initialize lookaside IPsec private data */
 	sa->dir = RTE_SECURITY_IPSEC_SA_DIR_EGRESS;
-	sa->esn = ipsec->esn.value;
+	if (ipsec->esn.value)
+		sa->esn = ipsec->esn.value - 1;
 
 	ret = cnxk_ipsec_outb_rlens_get(&sa->rlens, ipsec, crypto_xform);
 	if (ret)
-- 
2.25.1
[PATCH] vdpa/sfc: handle sync issue between qemu and vhost-user
From: Abhimanyu Saini

When DPDK app is running in the VF, it sometimes rings the doorbell
before dev_config has had a chance to complete and hence it misses
the event. As workaround, ring the doorbell when vDPA reports the
notify_area to QEMU.

Fixes: 5e7596ba7cb3 ("vdpa/sfc: introduce Xilinx vDPA driver")
Cc: sta...@dpdk.org

Signed-off-by: Vijay Kumar Srivastava
Signed-off-by: Abhimanyu Saini
---
 drivers/vdpa/sfc/sfc_vdpa_ops.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c
index b3d9b6c..63aa52d 100644
--- a/drivers/vdpa/sfc/sfc_vdpa_ops.c
+++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c
@@ -794,6 +794,8 @@
 	int vfio_dev_fd;
 	efx_rc_t rc;
 	unsigned int bar_offset;
+	volatile void *doorbell;
+	struct rte_pci_device *pci_dev;
 	struct rte_vdpa_device *vdpa_dev;
 	struct sfc_vdpa_ops_data *ops_data;
 	struct vfio_region_info reg = { .argsz = sizeof(reg) };
@@ -856,6 +858,18 @@
 	sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64,
 		      *offset);
 
+	pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev;
+	doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr + *offset;
+
+	/*
+	 * virtio-net driver in VM sends queue notifications before
+	 * vDPA has a chance to setup the queues and notification area,
+	 * and hence the HW misses these doorbell notifications.
+	 * Since, it is safe to send duplicate doorbell, send another
+	 * doorbell from vDPA driver as workaround for this timing issue.
+	 */
+	rte_write16(qid, doorbell);
+
 	return 0;
 }
-- 
1.8.3.1
[PATCH 2/4] vhost: restore device information in log messages
Device information in the log messages was dropped.

Fixes: 52ade97e3641 ("vhost: fix physical address mapping")

Signed-off-by: David Marchand
---
 lib/vhost/vhost_user.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 2b9a3b69fa..f324f822d6 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -144,7 +144,8 @@ async_dma_map(struct virtio_net *dev, bool do_map)
 				return;
 
 			/* DMA mapping errors won't stop VHOST_USER_SET_MEM_TABLE. */
-			VHOST_LOG_CONFIG(ERR, "DMA engine map failed\n");
+			VHOST_LOG_CONFIG(ERR, "(%s) DMA engine map failed\n",
+				dev->ifname);
 		}
 	}
 
@@ -160,7 +161,8 @@ async_dma_map(struct virtio_net *dev, bool do_map)
 			if (rte_errno == EINVAL)
 				return;
 
-			VHOST_LOG_CONFIG(ERR, "DMA engine unmap failed\n");
+			VHOST_LOG_CONFIG(ERR, "(%s) DMA engine unmap failed\n",
+				dev->ifname);
 		}
 	}
 }
@@ -945,7 +947,8 @@ add_one_guest_page(struct virtio_net *dev, uint64_t guest_phys_addr,
 					dev->max_guest_pages * sizeof(*page),
 					RTE_CACHE_LINE_SIZE);
 		if (dev->guest_pages == NULL) {
-			VHOST_LOG_CONFIG(ERR, "cannot realloc guest_pages\n");
+			VHOST_LOG_CONFIG(ERR, "(%s) cannot realloc guest_pages\n",
+				dev->ifname);
 			rte_free(old_pages);
 			return -1;
 		}
-- 
2.36.1
[PATCH 4/4] vhost: prefix logs with context
We recently improved the log messages in the vhost library, adding some
context that helps filtering for a given vhost-user device. However,
some parts of the code were missed, and some later code changes broke
this new convention (fixes were sent previous to this patch).

Change the VHOST_LOG_CONFIG/DATA helpers and always ask for a string
used as context. This should help limit regressions on this topic.

Most of the time, the context is the vhost-user device socket path.
For the rest when a vhost-user device can not be related, generic names
were chosen:
- "dma", for vhost-user async DMA operations,
- "device", for vhost-user device creation and lookup,
- "thread", for threads management,

Signed-off-by: David Marchand
---
 lib/vhost/iotlb.c      |  30 +-
 lib/vhost/socket.c     | 129 -
 lib/vhost/vdpa.c       |   4 +-
 lib/vhost/vhost.c      | 144 -
 lib/vhost/vhost.h      |  20 +-
 lib/vhost/vhost_user.c | 642 +
 lib/vhost/virtio_net.c | 258 +
 7 files changed, 634 insertions(+), 593 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 5a5ba8b82a..35b4193606 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -70,18 +70,18 @@ vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *
 
 	ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
 	if (ret) {
-		VHOST_LOG_CONFIG(DEBUG,
-			"(%s) IOTLB pool %s empty, clear entries for pending insertion\n",
-			dev->ifname, vq->iotlb_pool->name);
+		VHOST_LOG_CONFIG(dev->ifname, DEBUG,
+			"IOTLB pool %s empty, clear entries for pending insertion\n",
+			vq->iotlb_pool->name);
 		if (!TAILQ_EMPTY(&vq->iotlb_pending_list))
 			vhost_user_iotlb_pending_remove_all(vq);
 		else
 			vhost_user_iotlb_cache_random_evict(vq);
 		ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
 		if (ret) {
-			VHOST_LOG_CONFIG(ERR,
-				"(%s) IOTLB pool %s still empty, pending insertion failure\n",
-				dev->ifname, vq->iotlb_pool->name);
+			VHOST_LOG_CONFIG(dev->ifname, ERR,
+				"IOTLB pool %s still empty, pending insertion failure\n",
+				vq->iotlb_pool->name);
 			return;
 		}
 	}
@@ -169,18 +169,18 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
 
 	ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
 	if (ret) {
-		VHOST_LOG_CONFIG(DEBUG,
-			"(%s) IOTLB pool %s empty, clear entries for cache insertion\n",
-			dev->ifname, vq->iotlb_pool->name);
+		VHOST_LOG_CONFIG(dev->ifname, DEBUG,
+			"IOTLB pool %s empty, clear entries for cache insertion\n",
+			vq->iotlb_pool->name);
 		if (!TAILQ_EMPTY(&vq->iotlb_list))
 			vhost_user_iotlb_cache_random_evict(vq);
 		else
 			vhost_user_iotlb_pending_remove_all(vq);
 		ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
 		if (ret) {
-			VHOST_LOG_CONFIG(ERR,
-				"(%s) IOTLB pool %s still empty, cache insertion failed\n",
-				dev->ifname, vq->iotlb_pool->name);
+			VHOST_LOG_CONFIG(dev->ifname, ERR,
+				"IOTLB pool %s still empty, cache insertion failed\n",
+				vq->iotlb_pool->name);
 			return;
 		}
 	}
@@ -320,7 +320,7 @@ vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
 	snprintf(pool_name, sizeof(pool_name), "iotlb_%u_%d_%d",
 			getpid(), dev->vid, vq_index);
-	VHOST_LOG_CONFIG(DEBUG, "(%s) IOTLB cache name: %s\n", dev->ifname, pool_name);
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "IOTLB cache name: %s\n", pool_name);
 
 	/* If already created, free it and recreate */
 	vq->iotlb_pool = rte_mempool_lookup(pool_name);
@@ -332,8 +332,8 @@ vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
 			RTE_MEMPOOL_F_NO_CACHE_ALIGN |
 			RTE_MEMPOOL_F_SP_PUT);
 	if (!vq->iotlb_pool) {
-		VHOST_LOG_CONFIG(ERR, "(%s) Failed to create IOTLB cache pool %s\n",
-				dev->ifname, pool_name);
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to create IOTLB cache pool %s\n",
+				pool_name);
 		return -1;
 	}
 
diff --g
[PATCH 0/4] Vhost logs fixes and improvement
Here is a series that fixes log messages (with one regression being
fixed in patch 2) and changes the VHOST_LOG_* helpers to enforce that
vhost log messages will always have some context/prefix to help
debugging on setups with many vhost ports.

The first three patches are low risk and can probably be merged in
v22.07.

-- 
David Marchand

David Marchand (4):
  vhost: add some trailing newline in log messages
  vhost: restore device information in log messages
  vhost: improve some datapath log messages
  vhost: prefix logs with context

 lib/vhost/iotlb.c      |  30 +-
 lib/vhost/socket.c     | 129 -
 lib/vhost/vdpa.c       |   4 +-
 lib/vhost/vhost.c      | 144 +-
 lib/vhost/vhost.h      |  20 +-
 lib/vhost/vhost_user.c | 639 +
 lib/vhost/virtio_net.c | 258 +
 7 files changed, 634 insertions(+), 590 deletions(-)

-- 
2.36.1
[Bug 1042] [dpdk-22.07](ABI) unit_tests_eal/link_bonding_rssconf: link_bonding_rssconf_autotest test failed
https://bugs.dpdk.org/show_bug.cgi?id=1042

            Bug ID: 1042
           Summary: [dpdk-22.07](ABI) unit_tests_eal/link_bonding_rssconf:
                    link_bonding_rssconf_autotest test failed
           Product: DPDK
           Version: 22.03
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: examples
          Assignee: dev@dpdk.org
          Reporter: weix.l...@intel.com
  Target Milestone: ---

[Environment]

DPDK version: Use make showversion or for a non-released version:
git remote -v && git show-ref --heads
commit 7cac53f205ebd04d8ebd3ee6a9dd84f698d4ada3 (HEAD -> main, tag: v22.07-rc2, origin/main, origin/HEAD)
Author: Thomas Monjalon
Date:   Mon Jun 27 04:03:44 2022 +0200

    version: 22.07-rc2

    Signed-off-by: Thomas Monjalon

Other software versions: N/A
OS: Red Hat Enterprise Linux 8.4 (Ootpa)/Linux 4.18.0-305.el8.x86_64
Compiler: gcc version 8.5.0 20210514 (Red Hat 8.5.0-4) (GCC)
Hardware platform: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
NIC hardware: Intel Ethernet Controller XL710 for 40GbE QSFP+ 1583
NIC firmware: i40e-4.18.0-305.el8.x86_64/8.70 0x8000c40f 1.3179.0

[Test Setup]
Steps to reproduce
List the steps to reproduce the issue.

1. Build the DPDK-22.07-rc2 lib with the following steps:
Note: /tmp/dpdk.tar.gz is the DPDK-22.07-rc2 packet.

tar zxfm /tmp/dpdk.tar.gz -C ~
cd ~/dpdk
cd .. && rm -rf dpdk_lib && mv dpdk dpdk_lib && cd dpdk_lib
rm -rf x86_64-native-linuxapp-gcc
CC=gcc meson -Denable_kmods=True -Dlibdir=lib --default-library=shared x86_64-native-linuxapp-gcc
ninja -C x86_64-native-linuxapp-gcc
rm -rf /root/tmp/dpdk_share_lib
DESTDIR=/root/tmp/dpdk_share_lib ninja -C x86_64-native-linuxapp-gcc -j 110 install
rm -rf /root/shared_lib_dpdk
mv /root/tmp/dpdk_share_lib/usr/local/lib /root/shared_lib_dpdk

2. Build the DPDK-21.11 APP with the following steps:
Note: /tmp/dpdk_abi.tar.gz is the DPDK-21.11 packet.

cd ..
tar zxf /tmp/dpdk_abi.tar.gz -C ~
cd ~/dpdk/
rm -rf x86_64-native-linuxapp-gcc
CC=gcc meson -Denable_kmods=True -Dlibdir=lib --default-library=shared x86_64-native-linuxapp-gcc
ninja -C x86_64-native-linuxapp-gcc
# delete the DPDK-21.11 target/lib and drivers directory
rm -rf x86_64-native-linuxapp-gcc/lib
rm -rf x86_64-native-linuxapp-gcc/drivers

3. Bind 2 NIC port to vfio-pci driver:
dpdk-devbind.py --force --bind=vfio-pci :18:00.0 :18:00.1

4. Start dpdk-test APP:
x86_64-native-linuxapp-gcc/app/test/dpdk-test -l 1-4 -n 4 -a :18:00.0 -a :18:00.1 --file-prefix=dpdk_63552_20220624173253 -d /root/shared_lib_dpdk

5. Execute `link_bonding_rssconf_autotest` command to test:
RTE>>link_bonding_rssconf_autotest

Show the output from the previous commands.

[root@abi80 dpdk]# x86_64-native-linuxapp-gcc/app/test/dpdk-test -l 1-4 -n 4 -a :18:00.0 -a :18:00.1 --file-prefix=dpdk_63552_20220624173253 -d /root/shared_lib_dpdk
EAL: Detected CPU lcores: 112
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/dpdk_63552_20220624173253/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
EAL: VFIO support initialized
EAL: Using IOMMU type 1 (Type 1)
EAL: Ignore mapping IO port bar(1)
EAL: Ignore mapping IO port bar(4)
EAL: Probe PCI driver: net_i40e (8086:1583) device: :18:00.0 (socket 0)
i40e_GLQF_reg_init(): i40e device :18:00.0 changed global register [0x002689a0]. original: 0x, new: 0x0029
i40e_GLQF_reg_init(): i40e device :18:00.0 changed global register [0x00268ca4]. original: 0x1840, new: 0x9420
i40e_aq_debug_write_global_register(): i40e device :18:00.0 changed global register [0x0026c7a0]. original: 0xa8, after: 0x28
EAL: Ignore mapping IO port bar(1)
EAL: Ignore mapping IO port bar(4)
EAL: Probe PCI driver: net_i40e (8086:1583) device: :18:00.1 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
APP: HPET is not enabled, using TSC as default timer
RTE>>link_bonding_rssconf_autotest
 + ------------------------------------------------------------------ +
 + Test Suite : RSS Dynamic Configuration for Bonding Unit Test Suite
 + ------------------------------------------------------------------ +
 + TestCase [ 0] : test_setup succeeded
Device with port_id=2 already stopped
Device with port_id=3 already stopped
Device with port_id=4 already stopped
Device with port_id=5 already stopped
bond_ethdev_promiscuous_disable(2684) - Failed to disable promiscuous mode for port 2: Operation not supported
bond_ethdev_promiscuous_disable(2684) - Failed to disable promiscuous mode for port 3: Operation not supported
bond_ethdev_promiscuous_disable(2684) - Failed to disable promiscuous mode for port 4: Operation not supported
bond_ethdev_promiscuous_disable(2684) - Failed to disable promiscuous mode for port 5: Operation not supported
bond_ethdev_allmulticast_disa
[PATCH 1/4] vhost: add some trailing newline in log messages
VHOST_LOG_* macros don't append a newline. Add missing ones.

Fixes: e623e0c6d8a5 ("vhost: add reconnect ability")
Fixes: af1475918124 ("vhost: introduce API to start a specific driver")
Fixes: 2dfeebe26546 ("vhost: check return of mutex initialization")
Cc: sta...@dpdk.org

Signed-off-by: David Marchand
---
 lib/vhost/socket.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index 7a0f63af14..24d60ca149 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -499,7 +499,7 @@ vhost_user_reconnect_init(void)
 
 	ret = pthread_mutex_init(&reconn_list.mutex, NULL);
 	if (ret < 0) {
-		VHOST_LOG_CONFIG(ERR, "%s: failed to initialize mutex", __func__);
+		VHOST_LOG_CONFIG(ERR, "%s: failed to initialize mutex\n", __func__);
 		return ret;
 	}
 	TAILQ_INIT(&reconn_list.head);
@@ -507,9 +507,9 @@ vhost_user_reconnect_init(void)
 	ret = rte_ctrl_thread_create(&reconn_tid, "vhost_reconn",
 			     NULL, vhost_user_client_reconnect, NULL);
 	if (ret != 0) {
-		VHOST_LOG_CONFIG(ERR, "failed to create reconnect thread");
+		VHOST_LOG_CONFIG(ERR, "failed to create reconnect thread\n");
 		if (pthread_mutex_destroy(&reconn_list.mutex))
-			VHOST_LOG_CONFIG(ERR, "%s: failed to destroy reconnect mutex", __func__);
+			VHOST_LOG_CONFIG(ERR, "%s: failed to destroy reconnect mutex\n", __func__);
 	}
 
 	return ret;
@@ -1170,8 +1170,8 @@ rte_vhost_driver_start(const char *path)
 			"vhost-events", NULL, fdset_event_dispatch,
 			&vhost_user.fdset);
 		if (ret != 0) {
-			VHOST_LOG_CONFIG(ERR, "(%s) failed to create fdset handling thread", path);
-
+			VHOST_LOG_CONFIG(ERR, "(%s) failed to create fdset handling thread\n",
+				path);
 			fdset_pipe_uninit(&vhost_user.fdset);
 			return -1;
 		}
-- 
2.36.1
[PATCH 3/4] vhost: improve some datapath log messages
Those messages were missed when adding socket context. Fix this.

Signed-off-by: David Marchand
---
 lib/vhost/vhost.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 4ebcb7448a..810bc71c9d 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -652,7 +652,7 @@ extern int vhost_data_log_level;
 	} \
 	snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), "\n"); \
 	\
-	VHOST_LOG_DATA(DEBUG, "%s", packet); \
+	VHOST_LOG_DATA(DEBUG, "(%s) %s", device->ifname, packet); \
 } while (0)
 #else
 #define PRINT_PACKET(device, addr, size, header) do {} while (0)
@@ -866,8 +866,8 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		vq->signalled_used = new;
 		vq->signalled_used_valid = true;
 
-		VHOST_LOG_DATA(DEBUG, "%s: used_event_idx=%d, old=%d, new=%d\n",
-			__func__,
+		VHOST_LOG_DATA(DEBUG, "(%s) %s: used_event_idx=%d, old=%d, new=%d\n",
+			dev->ifname, __func__,
 			vhost_used_event(vq),
 			old, new);
-- 
2.36.1
[PATCH 2/2] eventdev: add function to enq new events to the same queue
From: Pavan Nikhilesh

Introduce new fastpath function to enqueue events with type *OP_NEW*
to the same destination event queue. This function can be used as a
hint to the PMD to use an optimized enqueue sequence.

Signed-off-by: Pavan Nikhilesh
---
 lib/eventdev/eventdev_pmd.h      |  5 +-
 lib/eventdev/eventdev_private.c  | 13 ++
 lib/eventdev/rte_eventdev.h      | 80 +++-
 lib/eventdev/rte_eventdev_core.h | 11 -
 4 files changed, 105 insertions(+), 4 deletions(-)

diff --git a/lib/eventdev/eventdev_pmd.h b/lib/eventdev/eventdev_pmd.h
index 69402668d8..f0bb97fb89 100644
--- a/lib/eventdev/eventdev_pmd.h
+++ b/lib/eventdev/eventdev_pmd.h
@@ -178,7 +178,10 @@ struct rte_eventdev {
 	/**< Pointer to PMD eth Tx adapter enqueue function. */
 	event_crypto_adapter_enqueue_t ca_enqueue;
 
-	uint64_t reserved_64s[4]; /**< Reserved for future fields */
+	event_enqueue_queue_burst_t enqueue_new_same_dest;
+	/**< PMD enqueue burst queue new function to same destination queue. */
+
+	uint64_t reserved_64s[3]; /**< Reserved for future fields */
 	void *reserved_ptrs[3];   /**< Reserved for future fields */
 } __rte_cache_aligned;
 
diff --git a/lib/eventdev/eventdev_private.c b/lib/eventdev/eventdev_private.c
index 1d3d9d357e..53d1db281b 100644
--- a/lib/eventdev/eventdev_private.c
+++ b/lib/eventdev/eventdev_private.c
@@ -24,6 +24,17 @@ dummy_event_enqueue_burst(__rte_unused void *port,
 	return 0;
 }
 
+static uint16_t
+dummy_event_enqueue_queue_burst(__rte_unused void *port,
+		__rte_unused uint8_t queue,
+		__rte_unused const struct rte_event ev[],
+		__rte_unused uint16_t nb_events)
+{
+	RTE_EDEV_LOG_ERR(
+		"event enqueue burst requested for unconfigured event device");
+	return 0;
+}
+
 static uint16_t
 dummy_event_dequeue(__rte_unused void *port, __rte_unused struct rte_event *ev,
 		__rte_unused uint64_t timeout_ticks)
@@ -90,6 +101,7 @@ event_dev_fp_ops_reset(struct rte_event_fp_ops *fp_op)
 		.enqueue_burst = dummy_event_enqueue_burst,
 		.enqueue_new_burst = dummy_event_enqueue_burst,
 		.enqueue_forward_burst = dummy_event_enqueue_burst,
+		.enqueue_new_same_dest = dummy_event_enqueue_queue_burst,
 		.dequeue = dummy_event_dequeue,
 		.dequeue_burst = dummy_event_dequeue_burst,
 		.maintain = dummy_event_maintain,
@@ -111,6 +123,7 @@ event_dev_fp_ops_set(struct rte_event_fp_ops *fp_op,
 	fp_op->enqueue_burst = dev->enqueue_burst;
 	fp_op->enqueue_new_burst = dev->enqueue_new_burst;
 	fp_op->enqueue_forward_burst = dev->enqueue_forward_burst;
+	fp_op->enqueue_new_same_dest = dev->enqueue_new_same_dest;
 	fp_op->dequeue = dev->dequeue;
 	fp_op->dequeue_burst = dev->dequeue_burst;
 	fp_op->maintain = dev->maintain;
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index 6a6f6ea4c1..2aa563740b 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -425,8 +425,9 @@ struct rte_event_dev_info {
 	 * A device that does not support bulk dequeue will set this as 1.
 	 */
 	uint32_t max_event_port_enqueue_depth;
-	/**< Maximum number of events can be enqueued at a time from an
-	 * event port by this device.
+	/**< Maximum number of events that can be enqueued at a time to a
+	 * event port by this device, applicable for rte_event::op is either
+	 * *RTE_EVENT_OP_FORWARD* or *RTE_EVENT_OP_RELEASE*
 	 * A device that does not support bulk enqueue will set this as 1.
 	 */
 	uint8_t max_event_port_links;
@@ -446,6 +447,12 @@ struct rte_event_dev_info {
 	 * device. These ports and queues are not accounted for in
 	 * max_event_ports or max_event_queues.
 	 */
+	int16_t max_event_port_enqueue_new_burst;
+	/**< Maximum number of events that can be enqueued at a time to a
+	 * event port by this device, applicable when rte_event::op is set to
+	 * *RTE_EVENT_OP_NEW*.
+	 * A device with no limits will set this value to -1.
+	 */
 };
 
 /**
@@ -2082,6 +2089,75 @@ rte_event_enqueue_forward_burst(uint8_t dev_id, uint8_t port_id,
 					 fp_ops->enqueue_forward_burst);
 }
 
+/**
+ * Enqueue a burst of events objects of operation type *RTE_EVENT_OP_NEW* on
+ * an event device designated by its *dev_id* through the event port specified
+ * by *port_id* to the same queue specified by *queue_id*.
+ *
+ * Provides the same functionality as rte_event_enqueue_burst(), expect that
+ * application can use this API when the all objects in the burst contains
+ * the enqueue operation of the type *RTE_EVENT_OP_NEW* and are destined to the
+ * same queue. This specialized function can provide the addi
[PATCH 1/2] doc: add enqueue depth for new event type
From: Pavan Nikhilesh

A new field ``max_event_port_enqueue_new_burst`` will be added to the
structure ``rte_event_dev_info``. The field defines the max enqueue
burst size of new events (OP_NEW) supported by the underlying event
device.

Signed-off-by: Pavan Nikhilesh
---
 doc/guides/rel_notes/deprecation.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 4e5b23c53d..071317e8e3 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -125,3 +125,8 @@ Deprecation Notices
   applications should be updated to use the ``dmadev`` library instead,
   with the underlying HW-functionality being provided by the ``ioat`` or
   ``idxd`` dma drivers
+
+* eventdev: The structure ``rte_event_dev_info`` will be extended to include the
+  max enqueue burst size of new events supported by the underlying event device.
+  A new field ``max_event_port_enqueue_new_burst`` will be added to the structure
+  ``rte_event_dev_info`` in DPDK 22.11.
-- 
2.25.1
[PATCH v1 1/1] eventdev/eth_tx: use timestamp as dynamic mbuf field
Added support to register timestamp dynamic field in mbuf.

Signed-off-by: Ganapati Kundapura

diff --git a/lib/eventdev/rte_event_eth_tx_adapter.c b/lib/eventdev/rte_event_eth_tx_adapter.c
index c700fb7..23d5df3 100644
--- a/lib/eventdev/rte_event_eth_tx_adapter.c
+++ b/lib/eventdev/rte_event_eth_tx_adapter.c
@@ -74,6 +74,10 @@ do {\
 	} \
 } while (0)
 
+/* enable dynamic timestamp field in mbuf */
+uint64_t event_eth_tx_timestamp_dynflag;
+int event_eth_tx_timestamp_dynfield_offset = -1;
+
 /* Tx retry callback structure */
 struct txa_retry {
 	/* Ethernet port id */
@@ -197,7 +201,7 @@ static int
 txa_dev_id_array_init(void)
 {
 	if (txa_dev_id_array == NULL) {
-		int i;
+		int i, ret;
 
 		txa_dev_id_array = txa_memzone_array_get("txa_adapter_array",
 					sizeof(int),
@@ -207,6 +211,16 @@ txa_dev_id_array_init(void)
 
 		for (i = 0; i < RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE; i++)
 			txa_dev_id_array[i] = TXA_INVALID_DEV_ID;
+
+		/* Register mbuf dynamic timestamp field */
+		ret = rte_mbuf_dyn_tx_timestamp_register(
+				&event_eth_tx_timestamp_dynfield_offset,
+				&event_eth_tx_timestamp_dynflag);
+		if (ret != 0) {
+			RTE_EDEV_LOG_ERR("Error registering timestamp "
+					 "field/flag");
+			return -ENOMEM;
+		}
 	}
 
 	return 0;
diff --git a/lib/eventdev/rte_event_eth_tx_adapter.h b/lib/eventdev/rte_event_eth_tx_adapter.h
index 3908c2d..12e80a9 100644
--- a/lib/eventdev/rte_event_eth_tx_adapter.h
+++ b/lib/eventdev/rte_event_eth_tx_adapter.h
@@ -77,9 +77,19 @@ extern "C" {
 
 #include <stdint.h>
 #include <rte_mbuf.h>
+#include <rte_mbuf_dyn.h>
 
 #include "rte_eventdev.h"
 
+extern int event_eth_tx_timestamp_dynfield_offset;
+
+static inline rte_mbuf_timestamp_t *
+rte_event_eth_tx_timestamp_dynfield(struct rte_mbuf *mbuf)
+{
+	return RTE_MBUF_DYNFIELD(mbuf,
+		event_eth_tx_timestamp_dynfield_offset, rte_mbuf_timestamp_t *);
+}
+
 /**
  * Adapter configuration structure
-- 
2.6.4
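[Editor's illustration: a minimal sketch of how an application might use
the accessor added above to stamp an mbuf before handing the event to an
already configured Tx adapter. The extern declaration of the dynflag and
the stamp_and_enqueue() helper are assumptions for illustration, not part
of the patch.]

	#include <rte_cycles.h>
	#include <rte_event_eth_tx_adapter.h>

	extern uint64_t event_eth_tx_timestamp_dynflag;	/* assumed exported */

	static void
	stamp_and_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
	{
		struct rte_mbuf *m = ev->mbuf;

		/* Write the timestamp through the accessor added above. */
		*rte_event_eth_tx_timestamp_dynfield(m) = rte_get_tsc_cycles();
		m->ol_flags |= event_eth_tx_timestamp_dynflag;

		/* Hand the event to the Tx adapter (flags = 0). */
		rte_event_eth_tx_adapter_enqueue(dev_id, port_id, ev, 1, 0);
	}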
[PATCH v5] gro: bug fix in identifying fragmented packets
From: Kumara Parameshwaran

A packet with RTE_PTYPE_L4_FRAG(0x300) contains both RTE_PTYPE_L4_TCP
(0x100) & RTE_PTYPE_L4_UDP (0x200). A fragmented packet as defined in
rte_mbuf_ptype.h cannot be recognized as other L4 types and hence the
GRO layer should not use IS_IPV4_TCP_PKT or IS_IPV4_UDP_PKT for
RTE_PTYPE_L4_FRAG. Hence, if the packet type is RTE_PTYPE_L4_FRAG the
ip header should be parsed to recognize the appropriate IP type and
invoke the respective gro handler.

Fixes: 1ca5e6740852 ("gro: support UDP/IPv4")
Cc: sta...@dpdk.org

Signed-off-by: Kumara Parameshwaran
---
v1:
* Introduce IS_IPV4_FRAGMENT macro to check if fragmented packet and
  if true extract the IP header to identify the protocol type and
  invoke the appropriate gro handler. This is done for both
  rte_gro_reassemble and rte_gro_reassemble_burst APIs.
v2,v3,v4:
* Fix extra whitespace and column limit warnings
v5:
* Use RTE_PTYPE_L4_FRAG to identify the fragmented packets in
  IS_IPV4_TCP_PKT and IS_IPV4_VXLAN_TCP4_PKT

 lib/gro/rte_gro.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/gro/rte_gro.c b/lib/gro/rte_gro.c
index 6f7dd4d709..e35399fd42 100644
--- a/lib/gro/rte_gro.c
+++ b/lib/gro/rte_gro.c
@@ -32,6 +32,7 @@ static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM] = {
 
 #define IS_IPV4_TCP_PKT(ptype) (RTE_ETH_IS_IPV4_HDR(ptype) && \
 		((ptype & RTE_PTYPE_L4_TCP) == RTE_PTYPE_L4_TCP) && \
+		((ptype & RTE_PTYPE_L4_FRAG) != RTE_PTYPE_L4_FRAG) && \
 		(RTE_ETH_IS_TUNNEL_PKT(ptype) == 0))
 
 #define IS_IPV4_UDP_PKT(ptype) (RTE_ETH_IS_IPV4_HDR(ptype) && \
@@ -40,6 +41,7 @@ static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM] = {
 
 #define IS_IPV4_VXLAN_TCP4_PKT(ptype) (RTE_ETH_IS_IPV4_HDR(ptype) && \
 		((ptype & RTE_PTYPE_L4_UDP) == RTE_PTYPE_L4_UDP) && \
+		((ptype & RTE_PTYPE_L4_FRAG) != RTE_PTYPE_L4_FRAG) && \
 		((ptype & RTE_PTYPE_TUNNEL_VXLAN) == \
 		 RTE_PTYPE_TUNNEL_VXLAN) && \
 		((ptype & RTE_PTYPE_INNER_L4_TCP) == \
-- 
2.25.1
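[Editor's illustration: the ptype values quoted in the commit message
make the overlap concrete. A sketch, not code from the patch.]

	/* From rte_mbuf_ptype.h, as quoted above:
	 *   RTE_PTYPE_L4_TCP  = 0x100
	 *   RTE_PTYPE_L4_UDP  = 0x200
	 *   RTE_PTYPE_L4_FRAG = 0x300 (= TCP bit | UDP bit)
	 * so a fragment "passes" a pure bit-mask test for TCP or UDP. */
	uint32_t ptype = RTE_PTYPE_L4_FRAG;

	/* Old check: also true for fragments, which was the bug. */
	bool looks_tcp = (ptype & RTE_PTYPE_L4_TCP) == RTE_PTYPE_L4_TCP;

	/* Fixed check: additionally reject fragments, as the patch does. */
	bool is_tcp = looks_tcp &&
		(ptype & RTE_PTYPE_L4_FRAG) != RTE_PTYPE_L4_FRAG;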
RE: [PATCH v4] net: fix checksum with unaligned buffer
> From: Emil Berg [mailto:emil.b...@ericsson.com]
> Sent: Monday, 27 June 2022 09.57
>
> > From: Morten Brørup
> > Sent: den 23 juni 2022 14:51
> >
> > > From: Morten Brørup [mailto:m...@smartsharesystems.com]
> > > Sent: Thursday, 23 June 2022 14.39
> > >
> > > With this patch, the checksum can be calculated on an unaligned buffer.
> > > I.e. the buf parameter is no longer required to be 16 bit aligned.
> > >
> > > The checksum is still calculated using a 16 bit aligned pointer, so
> > > the compiler can auto-vectorize the function's inner loop.
> > >
> > > When the buffer is unaligned, the first byte of the buffer is handled
> > > separately. Furthermore, the calculated checksum of the buffer is byte
> > > shifted before being added to the initial checksum, to compensate for
> > > the checksum having been calculated on the buffer shifted by one byte.
> > >
> > > v4:
> > > * Add copyright notice.
> > > * Include stdbool.h (Emil Berg).
> > > * Use RTE_PTR_ADD (Emil Berg).
> > > * Fix one more typo in commit message. Is 'unligned' even a word?
> > > v3:
> > > * Remove braces from single statement block.
> > > * Fix typo in commit message.
> > > v2:
> > > * Do not assume that the buffer is part of an aligned packet buffer.
> > >
> > > Bugzilla ID: 1035
> > > Cc: sta...@dpdk.org
> > >
> > > Signed-off-by: Morten Brørup
> > > Tested-by: Emil Berg
> > > ---
> > >  lib/net/rte_ip.h | 32 +++-
> > >  1 file changed, 27 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > > index b502481670..738d643da0 100644
> > > --- a/lib/net/rte_ip.h
> > > +++ b/lib/net/rte_ip.h
> > > @@ -3,6 +3,7 @@
> > >   * The Regents of the University of California.
> > >   * Copyright(c) 2010-2014 Intel Corporation.
> > >   * Copyright(c) 2014 6WIND S.A.
> > > + * Copyright(c) 2022 SmartShare Systems.
> > >   * All rights reserved.
> > >   */
> > >
> > > @@ -15,6 +16,7 @@
> > >   * IP-related defines
> > >   */
> > >
> > > +#include <stdbool.h>
> > >  #include <stdint.h>
> > >
> > >  #ifdef RTE_EXEC_ENV_WINDOWS
> > > @@ -162,20 +164,40 @@ __rte_raw_cksum(const void *buf, size_t len,
> > > uint32_t sum)
> > >  {
> > >  	/* extend strict-aliasing rules */
> > >  	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > > -	const u16_p *u16_buf = (const u16_p *)buf;
> > > -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > +	const u16_p *u16_buf;
> > > +	const u16_p *end;
> > > +	uint32_t bsum = 0;
> > > +	const bool unaligned = (uintptr_t)buf & 1;
> > > +
> > > +	/* if buffer is unaligned, keeping it byte order independent */
> > > +	if (unlikely(unaligned)) {
> > > +		uint16_t first = 0;
> > > +		if (unlikely(len == 0))
> > > +			return 0;
> > > +		((unsigned char *)&first)[1] = *(const unsigned char *)buf;
> > > +		bsum += first;
> > > +		buf = RTE_PTR_ADD(buf, 1);
> > > +		len--;
> > > +	}
> > >
> > > +	/* aligned access for compiler auto-vectorization */
> > > +	u16_buf = (const u16_p *)buf;
> > > +	end = u16_buf + len / sizeof(*u16_buf);
> > >  	for (; u16_buf != end; ++u16_buf)
> > > -		sum += *u16_buf;
> > > +		bsum += *u16_buf;
> > >
> > >  	/* if length is odd, keeping it byte order independent */
> > >  	if (unlikely(len % 2)) {
> > >  		uint16_t left = 0;
> > >  		*(unsigned char *)&left = *(const unsigned char *)end;
> > > -		sum += left;
> > > +		bsum += left;
> > >  	}
> > >
> > > -	return sum;
> > > +	/* if buffer is unaligned, swap the checksum bytes */
> > > +	if (unlikely(unaligned))
> > > +		bsum = (bsum & 0xFF00FF00) >> 8 | (bsum & 0x00FF00FF) << 8;
> > > +
> > > +	return sum + bsum;
> > >  }
> > >
> > >  /**
> > > --
> > > 2.17.1
> >
> > @Emil, thank you for thoroughly reviewing the previous versions.
> >
> > If your test succeeds and you are satisfied with the patch, remember
> > to reply with a "Tested-by" tag for patchwork.
>
> The test succeeded and I'm satisfied with the patch. I added
> 'Tested-by: Emil Berg' to the patch above, hopefully as you intended.

Thank you for testing! You don't need to put the tag inside the patch.
Next time, just put the tag in your reply, like here, and Patchwork will
catch it.

Tested-by: Emil Berg

The same goes for other tags, like Acked-by, Reviewed-by, etc..

-Morten
Service core statistics MT safety
Hi.

Is it safe to enable stats on MT safe services?

https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366

It seems to me this would have to be an __atomic_add for this code to
produce deterministic results.

Best regards,
	Mattias
RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
> -----Original Message-----
> From: Slava Ovsiienko
> Sent: Monday, June 20, 2022 1:38 PM
> To: Ali Alnubani ; Ruifeng Wang ; Matan Azrad
> Cc: dev@dpdk.org; Honnappa Nagarahalli ; sta...@dpdk.org; nd
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
>
> Hi, Ruifeng

Hi Slava,

Thanks for your review.

> My apologies for review delay.

Apologies too. I was on something else.

> As far I understand the hypothetical problem scenario is:
> - CPU core reorders reading of qwords of 16B vector
> - core reads the second 8B of CQE (old CQE values)
> - CQE update
> - core reads the first 8B of CQE (new CQE values)

Yes, this is the problem.

> How the re-reading of CQEs can resolve the issue?
> This wrong scenario might happen on the second read and we would run into
> the same issue.

Here we are trying to order the reading of a 16B vector (8B with op_own
- high, and 8B without op_own - low).
The first read will load 16B. The second read will load and update the
low 8B (no op_own).
There are 2 possible statuses indicated by op_own: valid, invalid.
If CQE status is invalid, no problem, it will be ignored this time.
If CQE status is valid, the second read ensures the rest of the CQE is
no older than the high 8B (with op_own).
Assuming the NIC updates op_own no earlier than the rest of the CQE, I
think the second read ensures the CQE content retrieved is correct.

> In my opinion, the right solution to cover potential reordering should be:
> - read CQE
> - check CQE status (first 8B)

We don't need to check CQE status at the moment. See explanation above.

> - read memory barrier
> - read the rest of CQE
>
> With best regards,
> Slava
>
> > -----Original Message-----
> > From: Ali Alnubani
> > Sent: Thursday, May 19, 2022 17:56
> > To: Ruifeng Wang ; Matan Azrad ; Slava Ovsiienko
> > Cc: dev@dpdk.org; honnappa.nagaraha...@arm.com; sta...@dpdk.org;
> > n...@arm.com
> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > > -----Original Message-----
> > > From: Ruifeng Wang
> > > Sent: Tuesday, January 4, 2022 5:01 AM
> > > To: Matan Azrad ; Slava Ovsiienko
> > > Cc: dev@dpdk.org; honnappa.nagaraha...@arm.com; sta...@dpdk.org;
> > > n...@arm.com; Ruifeng Wang
> > > Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > > vector path
> > >
> > > In NEON vector PMD, vector load loads two contiguous 8B of
> > > descriptor data into vector register. Given vector load ensures no
> > > 16B atomicity, read of the word that includes op_own field could be
> > > reordered after read of other words. In this case, some words could
> > > contain invalid data.
> > >
> > > Reloaded qword0 after read barrier to update vector register. This
> > > ensures that the fetched data is correct.
> > >
> > > Testpmd single core test on N1SDP/ThunderX2 showed no performance
> > > drop.
> > >
> > > Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> > > completions")
> > > Cc: sta...@dpdk.org
> > >
> > > Signed-off-by: Ruifeng Wang
> > > ---
> >
> > Tested with BlueField-2 and didn't see a performance impact.
> >
> > Tested-by: Ali Alnubani
> >
> > Thanks,
> > Ali
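[Editor's illustration: a rough sketch of the ordering Slava proposes.
The cqe_* helpers and cq_ci variable are hypothetical placeholders;
only rte_io_rmb() is a real DPDK barrier.]

	/* Hypothetical CQE polling sequence, for illustration only. */
	uint64_t owner_qw = cqe_read_qword_with_op_own(cqe);	/* status 8B */

	if (!cqe_status_valid(owner_qw, cq_ci))
		return;		/* CQE not yet written by the NIC */

	rte_io_rmb();		/* body reads may not be hoisted above this */

	uint64_t body_qw = cqe_read_qword_without_op_own(cqe);	/* the rest */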
Re: [PATCH] ip_frag: replace the rte memcpy with memcpy
On Fri, Jun 24, 2022 at 06:25:10PM +0100, Konstantin Ananyev wrote:
> 23/06/2022 03:35, Stephen Hemminger wrote:
> > On Wed, 22 Jun 2022 23:49:39 +0100
> > Konstantin Ananyev wrote:
> >
> > > > @@ -26,7 +25,7 @@ static inline void __fill_ipv4hdr_frag(struct rte_ipv4_hdr *dst,
> > > >  	const struct rte_ipv4_hdr *src, uint16_t header_len,
> > > >  	uint16_t len, uint16_t fofs, uint16_t dofs, uint32_t mf)
> > > >  {
> > > > -	rte_memcpy(dst, src, header_len);
> > > > +	memcpy(dst, src, header_len);
> > >
> > > I am fine with replacements in test and inside the lib, for cases
> > > where 'len' parameter is constant value.
> > > Though as I said before, here 'header_len' is not a constant value.
> > > Are you sure it will not introduce any performance regression?
> >
> > Do you have any performance tests. The ip header options are very small.
>
> From my experience - usually it is not about how big or small amount
> we need to copy. It is about can compiler evaluate 'size' parameter
> for memcpy() at compilation time or not.
> If it can, great - it will most likely replace memcpy()
> with some really well optimized code.
> If not it has to generate a proper call to actual
> memcpy() function. Which again, can be well optimized, but the
> overhead of the function call itself can still be noticeable,
> specially for small copies.
> Anyway, as I can see, David already integrated these changes anyway.
> So now, we'll have to wait and see would anyone complain or not.
> About performance testing, the only one I am aware about:
> examples/ip_fragmentation
>
> Konstantin

For some small (<1k) sizes, "rep movsb" is very fast, much faster than
I expected. I just noticed this in another application.
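[Editor's illustration of Konstantin's point about compile-time-constant
sizes; a standalone sketch, not code from the patch.]

	#include <stdint.h>
	#include <string.h>

	struct hdr20 { uint8_t b[20]; };	/* e.g. a fixed IPv4 header */

	/* Constant size: typically inlined by the compiler into a few
	 * loads and stores, with no function call at all. */
	static inline void copy_fixed(struct hdr20 *dst, const struct hdr20 *src)
	{
		memcpy(dst, src, sizeof(*src));
	}

	/* Runtime size (header_len may include IPv4 options): usually
	 * compiled as a real call to memcpy(), whose call overhead is
	 * what may be noticeable for small copies. */
	static inline void copy_hdr(void *dst, const void *src, size_t header_len)
	{
		memcpy(dst, src, header_len);
	}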
Re: [PATCH v4] net: fix checksum with unaligned buffer
On 2022-06-23 14:51, Morten Brørup wrote:
>> From: Morten Brørup [mailto:m...@smartsharesystems.com]
>> Sent: Thursday, 23 June 2022 14.39
>>
>> With this patch, the checksum can be calculated on an unaligned buffer.
>> I.e. the buf parameter is no longer required to be 16 bit aligned.
>>
>> The checksum is still calculated using a 16 bit aligned pointer, so
>> the compiler can auto-vectorize the function's inner loop.
>>
>> When the buffer is unaligned, the first byte of the buffer is handled
>> separately. Furthermore, the calculated checksum of the buffer is byte
>> shifted before being added to the initial checksum, to compensate for
>> the checksum having been calculated on the buffer shifted by one byte.
>>
>> v4:
>> * Add copyright notice.
>> * Include stdbool.h (Emil Berg).
>> * Use RTE_PTR_ADD (Emil Berg).
>> * Fix one more typo in commit message. Is 'unligned' even a word?
>> v3:
>> * Remove braces from single statement block.
>> * Fix typo in commit message.
>> v2:
>> * Do not assume that the buffer is part of an aligned packet buffer.
>>
>> Bugzilla ID: 1035
>> Cc: sta...@dpdk.org
>>
>> Signed-off-by: Morten Brørup
>> ---
>>  lib/net/rte_ip.h | 32 +++-
>>  1 file changed, 27 insertions(+), 5 deletions(-)
>>
>> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
>> index b502481670..738d643da0 100644
>> --- a/lib/net/rte_ip.h
>> +++ b/lib/net/rte_ip.h
>> @@ -3,6 +3,7 @@
>>   * The Regents of the University of California.
>>   * Copyright(c) 2010-2014 Intel Corporation.
>>   * Copyright(c) 2014 6WIND S.A.
>> + * Copyright(c) 2022 SmartShare Systems.
>>   * All rights reserved.
>>   */
>>
>> @@ -15,6 +16,7 @@
>>   * IP-related defines
>>   */
>>
>> +#include <stdbool.h>
>>  #include <stdint.h>
>>
>>  #ifdef RTE_EXEC_ENV_WINDOWS
>> @@ -162,20 +164,40 @@ __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
>>  {
>>  	/* extend strict-aliasing rules */
>>  	typedef uint16_t __attribute__((__may_alias__)) u16_p;
>> -	const u16_p *u16_buf = (const u16_p *)buf;
>> -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
>> +	const u16_p *u16_buf;
>> +	const u16_p *end;
>> +	uint32_t bsum = 0;
>> +	const bool unaligned = (uintptr_t)buf & 1;
>> +
>> +	/* if buffer is unaligned, keeping it byte order independent */
>> +	if (unlikely(unaligned)) {
>> +		uint16_t first = 0;
>> +		if (unlikely(len == 0))
>> +			return 0;
>> +		((unsigned char *)&first)[1] = *(const unsigned char *)buf;
>> +		bsum += first;
>> +		buf = RTE_PTR_ADD(buf, 1);
>> +		len--;
>> +	}
>>
>> +	/* aligned access for compiler auto-vectorization */

The compiler will be able to auto vectorize even unaligned accesses,
just with different instructions. From what I can tell, there's no
performance impact, at least not on the x86_64 systems I tried on.

I think you should remove the first special case conditional and use
memcpy() instead of the cumbersome __may_alias__ construct to retrieve
the data.

>> +	u16_buf = (const u16_p *)buf;
>> +	end = u16_buf + len / sizeof(*u16_buf);
>>  	for (; u16_buf != end; ++u16_buf)
>> -		sum += *u16_buf;
>> +		bsum += *u16_buf;
>>
>>  	/* if length is odd, keeping it byte order independent */
>>  	if (unlikely(len % 2)) {
>>  		uint16_t left = 0;
>>  		*(unsigned char *)&left = *(const unsigned char *)end;
>> -		sum += left;
>> +		bsum += left;
>>  	}
>>
>> -	return sum;
>> +	/* if buffer is unaligned, swap the checksum bytes */
>> +	if (unlikely(unaligned))
>> +		bsum = (bsum & 0xFF00FF00) >> 8 | (bsum & 0x00FF00FF) << 8;
>> +
>> +	return sum + bsum;
>>  }
>>
>>  /**
>> --
>> 2.17.1
>
> @Emil, thank you for thoroughly reviewing the previous versions.
>
> If your test succeeds and you are satisfied with the patch, remember
> to reply with a "Tested-by" tag for patchwork.
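[Editor's illustration: one possible reading of the memcpy() suggestion
above; an untested sketch, not submitted code. RTE_PTR_ADD comes from
rte_common.h.]

	#include <stdint.h>
	#include <string.h>
	#include <rte_common.h>

	static inline uint32_t
	raw_cksum_memcpy(const void *buf, size_t len, uint32_t sum)
	{
		while (len >= sizeof(uint16_t)) {
			uint16_t v;

			/* A constant-size memcpy() compiles to a plain
			 * (possibly unaligned) 16-bit load; no
			 * __may_alias__ cast is needed. */
			memcpy(&v, buf, sizeof(v));
			sum += v;
			buf = RTE_PTR_ADD(buf, sizeof(v));
			len -= sizeof(v);
		}

		/* if length is odd, keeping it byte order independent */
		if (len == 1) {
			uint16_t left = 0;

			*(unsigned char *)&left = *(const unsigned char *)buf;
			sum += left;
		}

		return sum;
	}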
RE: Service core statistics MT safety
> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
> Sent: Monday, 27 June 2022 13.06
>
> Hi.
>
> Is it safe to enable stats on MT safe services?
>
> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366
>
> It seems to me this would have to be an __atomic_add for this code to
> produce deterministic results.

I agree. The same goes for the 'calls' field.
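[Editor's illustration of the kind of change being discussed; a sketch,
not a submitted patch. The field names follow rte_service.c at the
linked revision, but treat them as assumptions.]

	/* Today the stats are plain read-modify-write updates, roughly:
	 *	s->cycles_spent += cycles;
	 *	s->calls++;
	 * which race when several lcores run the same MT-safe service.
	 * Deterministic counters would need atomic updates, e.g.: */
	__atomic_fetch_add(&s->cycles_spent, cycles, __ATOMIC_RELAXED);
	__atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);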
[PATCH] lib/hash: fix the return value description of rte_hash
The rte_hash lookup can return ZERO which is not a positive value.

Signed-off-by: Chenming C
---
 lib/hash/rte_hash.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/hash/rte_hash.h b/lib/hash/rte_hash.h
index 7fa9702..1bf1aac 100644
--- a/lib/hash/rte_hash.h
+++ b/lib/hash/rte_hash.h
@@ -288,7 +288,7 @@ struct rte_hash *
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOSPC if there is no space in the hash for this key.
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  *     array of user data. This value is unique for this key. This
  *     unique key id may be larger than the user specified entry count
  *     when RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD flag is set.
@@ -312,7 +312,7 @@ struct rte_hash *
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOSPC if there is no space in the hash for this key.
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  *     array of user data. This value is unique for this key. This
  *     unique key ID may be larger than the user specified entry count
  *     when RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD flag is set.
@@ -343,7 +343,7 @@ struct rte_hash *
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOENT if the key is not found.
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  *     array of user data. This value is unique for this key, and is the same
  *     value that was returned when the key was added.
  */
@@ -375,7 +375,7 @@ struct rte_hash *
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOENT if the key is not found.
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  *     array of user data. This value is unique for this key, and is the same
  *     value that was returned when the key was added.
  */
@@ -442,7 +442,7 @@ struct rte_hash *
  * @param data
  *   Output with pointer to data returned from the hash table.
  * @return
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  *     array of user data. This value is unique for this key, and is the same
  *     value that was returned when the key was added.
  *   - -EINVAL if the parameters are invalid.
@@ -467,7 +467,7 @@ struct rte_hash *
  * @param data
  *   Output with pointer to data returned from the hash table.
  * @return
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  *     array of user data. This value is unique for this key, and is the same
  *     value that was returned when the key was added.
  *   - -EINVAL if the parameters are invalid.
@@ -490,7 +490,7 @@ struct rte_hash *
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOENT if the key is not found.
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  *     array of user data. This value is unique for this key, and is the same
  *     value that was returned when the key was added.
  */
@@ -512,7 +512,7 @@ struct rte_hash *
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOENT if the key is not found.
- *   - A positive value that can be used by the caller as an offset into an
+ *   - A non-negative value that can be used by the caller as an offset into an
  *     array of user data. This value is unique for this key, and is the same
  *     value that was returned when the key was added.
  */
-- 
1.8.3.1
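[Editor's illustration: the wording matters because 0 is a valid return
value - a usable offset into the user-data array. A short usage sketch;
handle_error() and user_data are placeholders.]

	int32_t pos = rte_hash_add_key(handle, &key);

	if (pos < 0) {
		/* -EINVAL or -ENOSPC */
		handle_error(pos);
	} else {
		/* pos may legitimately be 0: it is an offset into the
		 * caller's user-data array, unique for this key. */
		user_data[pos] = value;
	}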
RE: [PATCH v4] net: fix checksum with unaligned buffer
> From: Emil Berg [mailto:emil.b...@ericsson.com]
> Sent: Monday, 27 June 2022 14.51
>
> > From: Emil Berg
> > Sent: den 27 juni 2022 14:46
> >
> > > From: Mattias Rönnblom
> > > Sent: den 27 juni 2022 14:28
> > >
> > > On 2022-06-23 14:51, Morten Brørup wrote:
> > > >> From: Morten Brørup [mailto:m...@smartsharesystems.com]
> > > >> Sent: Thursday, 23 June 2022 14.39
> > > >>
> > > >> With this patch, the checksum can be calculated on an unaligned buffer.
> > > >> I.e. the buf parameter is no longer required to be 16 bit aligned.
> > > >>
> > > >> The checksum is still calculated using a 16 bit aligned pointer, so
> > > >> the compiler can auto-vectorize the function's inner loop.
> > > >>
> > > >> When the buffer is unaligned, the first byte of the buffer is
> > > >> handled separately. Furthermore, the calculated checksum of the
> > > >> buffer is byte shifted before being added to the initial checksum,
> > > >> to compensate for the checksum having been calculated on the buffer
> > > >> shifted by one byte.
> > > >>
> > > >> v4:
> > > >> * Add copyright notice.
> > > >> * Include stdbool.h (Emil Berg).
> > > >> * Use RTE_PTR_ADD (Emil Berg).
> > > >> * Fix one more typo in commit message. Is 'unligned' even a word?
> > > >> v3:
> > > >> * Remove braces from single statement block.
> > > >> * Fix typo in commit message.
> > > >> v2:
> > > >> * Do not assume that the buffer is part of an aligned packet buffer.
> > > >>
> > > >> Bugzilla ID: 1035
> > > >> Cc: sta...@dpdk.org
> > > >>
> > > >> Signed-off-by: Morten Brørup
> > > >> ---
> > > >>  lib/net/rte_ip.h | 32 +++-
> > > >>  1 file changed, 27 insertions(+), 5 deletions(-)
> > > >>
> > > >> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > > >> index b502481670..738d643da0 100644
> > > >> --- a/lib/net/rte_ip.h
> > > >> +++ b/lib/net/rte_ip.h
> > > >> @@ -3,6 +3,7 @@
> > > >>   * The Regents of the University of California.
> > > >>   * Copyright(c) 2010-2014 Intel Corporation.
> > > >>   * Copyright(c) 2014 6WIND S.A.
> > > >> + * Copyright(c) 2022 SmartShare Systems.
> > > >>   * All rights reserved.
> > > >>   */
> > > >>
> > > >> @@ -15,6 +16,7 @@
> > > >>   * IP-related defines
> > > >>   */
> > > >>
> > > >> +#include <stdbool.h>
> > > >>  #include <stdint.h>
> > > >>
> > > >>  #ifdef RTE_EXEC_ENV_WINDOWS
> > > >> @@ -162,20 +164,40 @@ __rte_raw_cksum(const void *buf, size_t len,
> > > >> uint32_t sum)
> > > >>  {
> > > >>  	/* extend strict-aliasing rules */
> > > >>  	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > > >> -	const u16_p *u16_buf = (const u16_p *)buf;
> > > >> -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > >> +	const u16_p *u16_buf;
> > > >> +	const u16_p *end;
> > > >> +	uint32_t bsum = 0;
> > > >> +	const bool unaligned = (uintptr_t)buf & 1;
> > > >> +
> > > >> +	/* if buffer is unaligned, keeping it byte order independent */
> > > >> +	if (unlikely(unaligned)) {
> > > >> +		uint16_t first = 0;
> > > >> +		if (unlikely(len == 0))
> > > >> +			return 0;
> > > >> +		((unsigned char *)&first)[1] = *(const unsigned char *)buf;
> > > >> +		bsum += first;
> > > >> +		buf = RTE_PTR_ADD(buf, 1);
> > > >> +		len--;
> > > >> +	}
> > > >>
> > > >> +	/* aligned access for compiler auto-vectorization */
> > >
> > > The compiler will be able to auto vectorize even unaligned accesses,
> > > just with different instructions. From what I can tell, there's no
> > > performance impact, at least not on the x86_64 systems I tried on.
> > >
> > > I think you should remove the first special case conditional and use
> > > memcpy() instead of the cumbersome __may_alias__ construct to retrieve
> > > the data.
> > >
> >
> > Here:
> > https://www.agner.org/optimize/instruction_tables.pdf
> > it lists the latency of vmovdqa (aligned) as 6 cycles and the latency for
> > vmovdqu (unaligned) as 7 cycles. So I guess there can be some difference.
> > Although in practice I'm not sure what difference it makes. I've not seen any
> > difference in runtime between the two versions.
> >
>
> Correction to my comment:
> Those stats are for some older CPU. For some newer CPUs such as Tiger
> Lake the stats seem to be the same regardless of aligned or unaligned.

I agree that the memcpy method is more elegant and easy to read.

However, we would need to performance test the modified checksum
function with a large number of CPUs to prove that we don't introduce a
performance regression on any CPU architecture still supported by DPDK.
And Emil already found a CPU where it costs 1 extra cycle per 16 bytes,
which adds up to a total of ca. 91 extra cycles on a 1460 byte TCP
packet.

So I opted for a solution with zero changes to the inner loop, so no
performance retesting is required (for the previously supported use
cases, whe
RE: [PATCH] net/af_xdp: make compatible with libbpf v0.8.0
> On 6/24/22 13:23, Ciara Loftus wrote:
> > libbpf v0.8.0 deprecates the bpf_get_link_xdp_id and bpf_set_link_xdp_fd
> > functions. Use meson to detect if libbpf >= v0.7.0 is linked and if so, use
> > the recommended replacement functions bpf_xdp_query_id, bpf_xdp_attach
> > and bpf_xdp_detach which are available to use since libbpf v0.7.0.
> >
> > Also prevent linking with libbpf versions > v0.8.0.
> >
> > Signed-off-by: Ciara Loftus
> > ---
> >  doc/guides/nics/af_xdp.rst          |  3 ++-
> >  drivers/net/af_xdp/compat.h         | 36 -
> >  drivers/net/af_xdp/meson.build      |  7 ++
> >  drivers/net/af_xdp/rte_eth_af_xdp.c | 19 +++
> >  4 files changed, 42 insertions(+), 23 deletions(-)
>
> Don't we need to mention these changes in release notes?
>
> >
> > diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> > index 56681c8365..9edb48df67 100644
> > --- a/doc/guides/nics/af_xdp.rst
> > +++ b/doc/guides/nics/af_xdp.rst
> > @@ -43,7 +43,8 @@ Prerequisites
> >  This is a Linux-specific PMD, thus the following prerequisites apply:
> >
> >  *  A Linux Kernel (version > v4.18) with XDP sockets configuration enabled;
> > -*  Both libxdp >=v1.2.2 and libbpf libraries installed, or, libbpf <=v0.6.0
> > +*  Both libxdp >=v1.2.2 and libbpf <=v0.8.0 libraries installed, or, libbpf
> > +   <=v0.6.0.
> >  *  If using libxdp, it requires an environment variable called
> >     LIBXDP_OBJECT_PATH to be set to the location of where libxdp placed its bpf
> >     object files. This is usually in /usr/local/lib/bpf or /usr/local/lib64/bpf.
> > diff --git a/drivers/net/af_xdp/compat.h b/drivers/net/af_xdp/compat.h
> > index 28ea64aeaa..8f4ac8b5ea 100644
> > --- a/drivers/net/af_xdp/compat.h
> > +++ b/drivers/net/af_xdp/compat.h
> > @@ -60,7 +60,7 @@ tx_syscall_needed(struct xsk_ring_prod *q __rte_unused)
> >  }
> >  #endif
> >
> > -#ifdef RTE_NET_AF_XDP_LIBBPF_OBJ_OPEN
> > +#ifdef RTE_NET_AF_XDP_LIBBPF_V070
>
> Typically version-based checks are considered as bad. Isn't it
> better use feature-based checks/defines?

Hi Andrew,

Thank you for the feedback. Is the feature-based checking something that
we can push to the next release? We are already using the pkg-config
version-check method for other libraries/features in the meson.build
file:
* libxdp >= v1.2.2 # earliest compatible libxdp release
* libbpf >= v0.7.0 # bpf_object__* functions
* libbpf >= v0.2.0 # shared umem feature

If we change to your suggested method I think we should change them all
in one patch. IMO it's probably too close to the release to change them
all right now. What do you think?

Thanks,
Ciara
Re: [PATCH] net/af_xdp: make compatible with libbpf v0.8.0
On 6/27/22 17:17, Loftus, Ciara wrote: On 6/24/22 13:23, Ciara Loftus wrote: libbpf v0.8.0 deprecates the bpf_get_link_xdp_id and bpf_set_link_xdp_fd functions. Use meson to detect if libbpf >= v0.7.0 is linked and if so, use the recommended replacement functions bpf_xdp_query_id, bpf_xdp_attach and bpf_xdp_detach which are available to use since libbpf v0.7.0. Also prevent linking with libbpf versions > v0.8.0. Signed-off-by: Ciara Loftus --- doc/guides/nics/af_xdp.rst | 3 ++- drivers/net/af_xdp/compat.h | 36 - drivers/net/af_xdp/meson.build | 7 ++ drivers/net/af_xdp/rte_eth_af_xdp.c | 19 +++ 4 files changed, 42 insertions(+), 23 deletions(-) Don't we need to mention these changes in release notes? diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst index 56681c8365..9edb48df67 100644 --- a/doc/guides/nics/af_xdp.rst +++ b/doc/guides/nics/af_xdp.rst @@ -43,7 +43,8 @@ Prerequisites This is a Linux-specific PMD, thus the following prerequisites apply: * A Linux Kernel (version > v4.18) with XDP sockets configuration enabled; -* Both libxdp >=v1.2.2 and libbpf libraries installed, or, libbpf <=v0.6.0 +* Both libxdp >=v1.2.2 and libbpf <=v0.8.0 libraries installed, or, libbpf + <=v0.6.0. * If using libxdp, it requires an environment variable called LIBXDP_OBJECT_PATH to be set to the location of where libxdp placed its bpf object files. This is usually in /usr/local/lib/bpf or /usr/local/lib64/bpf. diff --git a/drivers/net/af_xdp/compat.h b/drivers/net/af_xdp/compat.h index 28ea64aeaa..8f4ac8b5ea 100644 --- a/drivers/net/af_xdp/compat.h +++ b/drivers/net/af_xdp/compat.h @@ -60,7 +60,7 @@ tx_syscall_needed(struct xsk_ring_prod *q __rte_unused) } #endif -#ifdef RTE_NET_AF_XDP_LIBBPF_OBJ_OPEN +#ifdef RTE_NET_AF_XDP_LIBBPF_V070 Typically version-based checks are considered as bad. Isn't it better use feature-based checks/defines? Hi Andrew, Thank you for the feedback. Is the feature-based checking something that we can push to the next release? We are already using the pkg-config version-check method for other libraries/features in the meson.build file: * libxdp >= v1.2.2 # earliest compatible libxdp release * libbpf >= v0.7.0 # bpf_object__* functions * libbpf >= v0.2.0 # shared umem feature If we change to your suggested method I think we should change them all in one patch. IMO it's probably too close to the release to change them all right now. What do you think? Thanks, Ciara Hi Ciara, yes, ideally we should avoid usage of version-based check everywhere, but I don't think that it is critical to switch at once. We can use it for new checks right now and rewrite old/existing checks a bit later in the next release. Please, note that my notes are related to review notes from Thomas who asked by file_library() method is removed. Yes, it is confusing and it is better to avoid it. Usage of feature-based checks would allow to preserve find_library() as well. Andrew.
Re: [PATCH] vdpa/sfc: handle sync issue between qemu and vhost-user
When you send a new version, please, don't forget to specify -v on patch format and use --in-reply-to with the first mail ID. See the contributor guidelines. Also, a new version should make it clear what has changed. See below. On 6/27/22 11:49, abhimanyu.sa...@xilinx.com wrote: From: Abhimanyu Saini When DPDK app is running in the VF, it sometimes rings the doorbell before dev_config has had a chance to complete and hence it misses the event. As workaround, ring the doorbell when vDPA reports the notify_area to QEMU. Fixes: 5e7596ba7cb3 ("vdpa/sfc: introduce Xilinx vDPA driver") Above seems to be wrong. Nearby code is added later, so, it should be: Fixes: 630be406dcbf ("vdpa/sfc: get queue notify area info") Cc: sta...@dpdk.org Signed-off-by: Vijay Kumar Srivastava Signed-off-by: Abhimanyu Saini --- Version changelog here. drivers/vdpa/sfc/sfc_vdpa_ops.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c index b3d9b6c..63aa52d 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_ops.c +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c @@ -794,6 +794,8 @@ int vfio_dev_fd; efx_rc_t rc; unsigned int bar_offset; + volatile void *doorbell; + struct rte_pci_device *pci_dev; struct rte_vdpa_device *vdpa_dev; struct sfc_vdpa_ops_data *ops_data; struct vfio_region_info reg = { .argsz = sizeof(reg) }; @@ -856,6 +858,18 @@ sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64, *offset); + pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev; + doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr + *offset; + + /* +* virtio-net driver in VM sends queue notifications before +* vDPA has a chance to setup the queues and notification area, +* and hence the HW misses these doorbell notifications. +* Since, it is safe to send duplicate doorbell, send another +* doorbell from vDPA driver as workaround for this timing issue. +*/ + rte_write16(qid, doorbell); + return 0; }
RE: [PATCH] net/af_xdp: make compatible with libbpf v0.8.0
> > On 6/27/22 17:17, Loftus, Ciara wrote: > >> > >> On 6/24/22 13:23, Ciara Loftus wrote: > >>> libbpf v0.8.0 deprecates the bpf_get_link_xdp_id and > bpf_set_link_xdp_fd > >>> functions. Use meson to detect if libbpf >= v0.7.0 is linked and if so, > >>> use > >>> the recommended replacement functions bpf_xdp_query_id, > >> bpf_xdp_attach > >>> and bpf_xdp_detach which are available to use since libbpf v0.7.0. > >>> > >>> Also prevent linking with libbpf versions > v0.8.0. > >>> > >>> Signed-off-by: Ciara Loftus > >>> --- > >>>doc/guides/nics/af_xdp.rst | 3 ++- > >>>drivers/net/af_xdp/compat.h | 36 > >> - > >>>drivers/net/af_xdp/meson.build | 7 ++ > >>>drivers/net/af_xdp/rte_eth_af_xdp.c | 19 +++ > >>>4 files changed, 42 insertions(+), 23 deletions(-) > >> > >> Don't we need to mention these changes in release notes? > >> > >>> > >>> diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst > >>> index 56681c8365..9edb48df67 100644 > >>> --- a/doc/guides/nics/af_xdp.rst > >>> +++ b/doc/guides/nics/af_xdp.rst > >>> @@ -43,7 +43,8 @@ Prerequisites > >>>This is a Linux-specific PMD, thus the following prerequisites apply: > >>> > >>>* A Linux Kernel (version > v4.18) with XDP sockets configuration > enabled; > >>> -* Both libxdp >=v1.2.2 and libbpf libraries installed, or, libbpf > >>> <=v0.6.0 > >>> +* Both libxdp >=v1.2.2 and libbpf <=v0.8.0 libraries installed, or, > >>> libbpf > >>> + <=v0.6.0. > >>>* If using libxdp, it requires an environment variable called > >>> LIBXDP_OBJECT_PATH to be set to the location of where libxdp > placed its > >> bpf > >>> object files. This is usually in /usr/local/lib/bpf or > >>> /usr/local/lib64/bpf. > >>> diff --git a/drivers/net/af_xdp/compat.h > b/drivers/net/af_xdp/compat.h > >>> index 28ea64aeaa..8f4ac8b5ea 100644 > >>> --- a/drivers/net/af_xdp/compat.h > >>> +++ b/drivers/net/af_xdp/compat.h > >>> @@ -60,7 +60,7 @@ tx_syscall_needed(struct xsk_ring_prod *q > >> __rte_unused) > >>>} > >>>#endif > >>> > >>> -#ifdef RTE_NET_AF_XDP_LIBBPF_OBJ_OPEN > >>> +#ifdef RTE_NET_AF_XDP_LIBBPF_V070 > >> > >> Typically version-based checks are considered as bad. Isn't it > >> better use feature-based checks/defines? > > > > Hi Andrew, > > > > Thank you for the feedback. Is the feature-based checking something that > we can push to the next release? > > > > We are already using the pkg-config version-check method for other > libraries/features in the meson.build file: > > * libxdp >= v1.2.2 # earliest compatible libxdp release > > * libbpf >= v0.7.0 # bpf_object__* functions > > * libbpf >= v0.2.0 # shared umem feature > > > > If we change to your suggested method I think we should change them all > in one patch. IMO it's probably too close to the release to change them all > right now. What do you think? > > > > Thanks, > > Ciara > > Hi Ciara, > > yes, ideally we should avoid usage of version-based check everywhere, > but I don't think that it is critical to switch at once. We can use it > for new checks right now and rewrite old/existing checks a bit later in > the next release. > > Please, note that my notes are related to review notes from Thomas who > asked by file_library() method is removed. Yes, it is confusing and it > is better to avoid it. Usage of feature-based checks would allow to > preserve find_library() as well. Thank you for the explanation. In this case we want to check that the libbpf library is <=v0.8.0. 
At this moment in time v0.8.0 is the latest version of libbpf, so we cannot check for a symbol that tells us the library is > v0.8.0. Can you think of a way to approach this without using the pkg-config version check method? I've introduced this check to future-proof the PMD and ensure we only ever link with versions of libbpf that we've validated to be compatible with the PMD. When, say, v0.9.0 is released, we can patch the PMD allowing for libbpf <= v0.9.0 and make any necessary API changes as part of that patch. This should hopefully help avoid the scenario Thomas encountered. Ciara > > Andrew.
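For what it's worth, a feature-based probe along the lines Andrew suggests might look roughly like this in meson -- the symbol to test (bpf_xdp_attach, available since libbpf v0.7.0) comes from this thread, but the define name is invented here:

# hypothetical feature-based check instead of a pkg-config version comparison
bpf_dep = dependency('libbpf', required: false, method: 'pkg-config')
if bpf_dep.found() and cc.has_function('bpf_xdp_attach', dependencies: bpf_dep)
    cflags += ['-DRTE_NET_AF_XDP_LIBBPF_XDP_ATTACH']
endif

Note that a feature probe can only prove a symbol exists; it cannot express Ciara's "not newer than v0.8.0" requirement, which is why a version ceiling still needs the pkg-config check.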
Re: [PATCH] examples/distributor: update dynamic configuration
Hi Ömer, I've a few comments: On 21/06/2022 21:15, Abdullah Ömer Yamaç wrote: In this patch, * It is possible to switch the running mode of the distributor using the command line argument. * With "-c" parameter, you can run RX and Distributor on the same core. * Without "-c" parameter, you can run RX and Distributor on the different core. * Syntax error of the single RX and distributor core is fixed. * When "-c" parameter is active, the wasted distributor core is also deactivated in the main function. Fixes: 4a7f40c0ff9a ("examples/distributor: add dedicated core") Cc: sta...@dpdk.org Signed-off-by: Abdullah Ömer Yamaç --- Cc: david.h...@intel.com --- doc/guides/sample_app_ug/dist_app.rst | 3 +- examples/distributor/main.c | 205 +++--- 2 files changed, 152 insertions(+), 56 deletions(-) diff --git a/doc/guides/sample_app_ug/dist_app.rst b/doc/guides/sample_app_ug/dist_app.rst index 3bd03905c3..5c80561187 100644 --- a/doc/guides/sample_app_ug/dist_app.rst +++ b/doc/guides/sample_app_ug/dist_app.rst @@ -42,11 +42,12 @@ Running the Application .. code-block:: console - .//examples/dpdk-distributor [EAL options] -- -p PORTMASK + .//examples/dpdk-distributor [EAL options] -- -p PORTMASK [-c] where, * -p PORTMASK: Hexadecimal bitmask of ports to configure + * -c: Combines the RX core with distribution core #. To run the application in linux environment with 10 lcores, 4 ports, issue the command: diff --git a/examples/distributor/main.c b/examples/distributor/main.c index 02bf91f555..6e98f78054 100644 --- a/examples/distributor/main.c +++ b/examples/distributor/main.c @@ -39,6 +39,7 @@ volatile uint8_t quit_signal_rx; volatile uint8_t quit_signal_dist; volatile uint8_t quit_signal_work; unsigned int power_lib_initialised; +bool enable_lcore_rx_distributor; static volatile struct app_stats { struct { @@ -256,14 +257,82 @@ lcore_rx(struct lcore_params *p) } app_stats.rx.rx_pkts += nb_rx; -/* - * You can run the distributor on the rx core with this code. Returned - * packets are then send straight to the tx core. - */ -#if 0 - rte_distributor_process(d, bufs, nb_rx); - const uint16_t nb_ret = rte_distributor_returned_pktsd, - bufs, BURST_SIZE*2); + /* +* Swap the following two lines if you want the rx traffic +* to go directly to tx, no distribution. 
+*/ + struct rte_ring *out_ring = p->rx_dist_ring; + /* struct rte_ring *out_ring = p->dist_tx_ring; */ + + uint16_t sent = rte_ring_enqueue_burst(out_ring, + (void *)bufs, nb_rx, NULL); + + app_stats.rx.enqueued_pkts += sent; + if (unlikely(sent < nb_rx)) { + app_stats.rx.enqdrop_pkts += nb_rx - sent; + RTE_LOG_DP(DEBUG, DISTRAPP, + "%s:Packet loss due to full ring\n", __func__); + while (sent < nb_rx) + rte_pktmbuf_free(bufs[sent++]); + } + if (++port == nb_ports) + port = 0; + } + if (power_lib_initialised) + rte_power_exit(rte_lcore_id()); + /* set worker & tx threads quit flag */ + printf("\nCore %u exiting rx task.\n", rte_lcore_id()); + quit_signal = 1; + return 0; +} + +static int +lcore_rx_and_distributor(struct lcore_params *p) +{ + struct rte_distributor *d = p->d; + const uint16_t nb_ports = rte_eth_dev_count_avail(); + const int socket_id = rte_socket_id(); + uint16_t port; + struct rte_mbuf *bufs[BURST_SIZE*2]; + + RTE_ETH_FOREACH_DEV(port) { + /* skip ports that are not enabled */ + if ((enabled_port_mask & (1 << port)) == 0) + continue; + + if (rte_eth_dev_socket_id(port) > 0 && + rte_eth_dev_socket_id(port) != socket_id) + printf("WARNING, port %u is on remote NUMA node to " + "RX thread.\n\tPerformance will not " + "be optimal.\n", port); + } + + printf("\nCore %u doing packet RX and Distributor.\n", rte_lcore_id()); + port = 0; + while (!quit_signal_rx) { + + /* skip ports that are not enabled */ + if ((enabled_port_mask & (1 << port)) == 0) { + if (++port == nb_ports) + port = 0; + continue; + } + const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs, + BURST_SIZE); + if (unlikely(nb_rx == 0)) { +
[PATCH] doc: add event timer expiry drop stat
The structure ``rte_event_timer_adapter_stats`` will be extended by adding a new field, ``evtim_drop_count``. This stat will represent the number of times an event timer expiry is dropped by the event timer adapter. Signed-off-by: Naga Harish K S V --- doc/guides/rel_notes/deprecation.rst | 6 ++ 1 file changed, 6 insertions(+) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 4e5b23c53d..ab4ea67115 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -125,3 +125,9 @@ Deprecation Notices applications should be updated to use the ``dmadev`` library instead, with the underlying HW-functionality being provided by the ``ioat`` or ``idxd`` dma drivers + +* eventdev/timer: The structure ``rte_event_timer_adapter_stats`` will be + extended by adding a new field, ``evtim_drop_count``. This stat will + represent the number of times an event timer expiry is dropped + by the timer adapter. This field will be used by a future patch adding + support for periodic mode to the software timer adapter in DPDK 22.11. -- 2.25.1
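Sketched from the notice above, the extension would look along these lines (existing counters elided; only the new field name comes from the notice):

struct rte_event_timer_adapter_stats {
	/* ... existing counters (armed/expired/retry/tick) ... */
	uint64_t evtim_drop_count;
	/**< Number of event timer expiries dropped by the adapter. */
};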
Re: [PATCH] examples/distributor: update dynamic configuration
Hi David, Thank you for your review. I have two questions. The first one is about new release. As I remember new DPDK realize will be published in a short time and my previous fix in that release. Therefore, Should I wait for that release to submit patch? The other question is below, On 27.06.2022 18:51, Hunt, David wrote: Hi Ömer, I've a few comments: On 21/06/2022 21:15, Abdullah Ömer Yamaç wrote: In this patch, * It is possible to switch the running mode of the distributor using the command line argument. * With "-c" parameter, you can run RX and Distributor on the same core. * Without "-c" parameter, you can run RX and Distributor on the different core. * Syntax error of the single RX and distributor core is fixed. * When "-c" parameter is active, the wasted distributor core is also deactivated in the main function. Fixes: 4a7f40c0ff9a ("examples/distributor: add dedicated core") Cc: sta...@dpdk.org Signed-off-by: Abdullah Ömer Yamaç --- Cc: david.h...@intel.com --- doc/guides/sample_app_ug/dist_app.rst | 3 +- examples/distributor/main.c | 205 +++--- 2 files changed, 152 insertions(+), 56 deletions(-) diff --git a/doc/guides/sample_app_ug/dist_app.rst b/doc/guides/sample_app_ug/dist_app.rst index 3bd03905c3..5c80561187 100644 --- a/doc/guides/sample_app_ug/dist_app.rst +++ b/doc/guides/sample_app_ug/dist_app.rst @@ -42,11 +42,12 @@ Running the Application .. code-block:: console - .//examples/dpdk-distributor [EAL options] -- -p PORTMASK + .//examples/dpdk-distributor [EAL options] -- -p PORTMASK [-c] where, * -p PORTMASK: Hexadecimal bitmask of ports to configure + * -c: Combines the RX core with distribution core #. To run the application in linux environment with 10 lcores, 4 ports, issue the command: diff --git a/examples/distributor/main.c b/examples/distributor/main.c index 02bf91f555..6e98f78054 100644 --- a/examples/distributor/main.c +++ b/examples/distributor/main.c @@ -39,6 +39,7 @@ volatile uint8_t quit_signal_rx; volatile uint8_t quit_signal_dist; volatile uint8_t quit_signal_work; unsigned int power_lib_initialised; +bool enable_lcore_rx_distributor; static volatile struct app_stats { struct { @@ -256,14 +257,82 @@ lcore_rx(struct lcore_params *p) } app_stats.rx.rx_pkts += nb_rx; -/* - * You can run the distributor on the rx core with this code. Returned - * packets are then send straight to the tx core. - */ -#if 0 - rte_distributor_process(d, bufs, nb_rx); - const uint16_t nb_ret = rte_distributor_returned_pktsd, - bufs, BURST_SIZE*2); + /* +* Swap the following two lines if you want the rx traffic +* to go directly to tx, no distribution. 
+*/ + struct rte_ring *out_ring = p->rx_dist_ring; + /* struct rte_ring *out_ring = p->dist_tx_ring; */ + + uint16_t sent = rte_ring_enqueue_burst(out_ring, + (void *)bufs, nb_rx, NULL); + + app_stats.rx.enqueued_pkts += sent; + if (unlikely(sent < nb_rx)) { + app_stats.rx.enqdrop_pkts += nb_rx - sent; + RTE_LOG_DP(DEBUG, DISTRAPP, + "%s:Packet loss due to full ring\n", __func__); + while (sent < nb_rx) + rte_pktmbuf_free(bufs[sent++]); + } + if (++port == nb_ports) + port = 0; + } + if (power_lib_initialised) + rte_power_exit(rte_lcore_id()); + /* set worker & tx threads quit flag */ + printf("\nCore %u exiting rx task.\n", rte_lcore_id()); + quit_signal = 1; + return 0; +} + +static int +lcore_rx_and_distributor(struct lcore_params *p) +{ + struct rte_distributor *d = p->d; + const uint16_t nb_ports = rte_eth_dev_count_avail(); + const int socket_id = rte_socket_id(); + uint16_t port; + struct rte_mbuf *bufs[BURST_SIZE*2]; + + RTE_ETH_FOREACH_DEV(port) { + /* skip ports that are not enabled */ + if ((enabled_port_mask & (1 << port)) == 0) + continue; + + if (rte_eth_dev_socket_id(port) > 0 && + rte_eth_dev_socket_id(port) != socket_id) + printf("WARNING, port %u is on remote NUMA node to " + "RX thread.\n\tPerformance will not " + "be optimal.\n", port); + } + + printf("\nCore %u doing packet RX and Distributor.\n", rte_lcore_id()); + port = 0; + while (!quit_signal_rx) { + + /* skip ports that are not enabled */ + if ((enabled_port_mask & (1 << port)) == 0)
[PATCH] crypto/qat: fix docsis segmentation fault
Currently, if the AES or DES algorithms fail for the DOCSIS test suite, a segmentation fault occurs when cryptodev_qat_autotest is run. This is due to a duplicate call of EVP_CIPHER_CTX_free for the session context. The ctx is freed first in the bpi_cipher_ctx_init function and then again at the end of the qat_sym_session_configure_cipher function. This commit fixes this bug by removing the first instance of EVP_CIPHER_CTX_free, leaving just the dedicated function in the upper level to free the ctx. Fixes: 98f060891615 ("crypto/qat: add symmetric session file") Cc: fiona.tr...@intel.com Cc: sta...@dpdk.org Signed-off-by: Rebecca Troy --- drivers/crypto/qat/qat_sym_session.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/crypto/qat/qat_sym_session.c b/drivers/crypto/qat/qat_sym_session.c index e40c042ba9..568792b753 100644 --- a/drivers/crypto/qat/qat_sym_session.c +++ b/drivers/crypto/qat/qat_sym_session.c @@ -111,12 +111,12 @@ bpi_cipher_ctx_init(enum rte_crypto_cipher_algorithm cryptodev_algo, const uint8_t *key, uint16_t key_length, void **ctx) { const EVP_CIPHER *algo = NULL; - int ret; + int ret = 0; *ctx = EVP_CIPHER_CTX_new(); if (*ctx == NULL) { ret = -ENOMEM; - goto ctx_init_err; + return ret; } if (cryptodev_algo == RTE_CRYPTO_CIPHER_DES_DOCSISBPI) @@ -130,14 +130,9 @@ bpi_cipher_ctx_init(enum rte_crypto_cipher_algorithm cryptodev_algo, /* IV will be ECB encrypted whether direction is encrypt or decrypt*/ if (EVP_EncryptInit_ex(*ctx, algo, NULL, key, 0) != 1) { ret = -EINVAL; - goto ctx_init_err; + return ret; } - return 0; - -ctx_init_err: - if (*ctx != NULL) - EVP_CIPHER_CTX_free(*ctx); return ret; } -- 2.34.1
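The resulting ownership rule, sketched from the caller's side (hypothetical snippet; the actual caller in the patch is qat_sym_session_configure_cipher()):

void *ctx = NULL;
int ret;

ret = bpi_cipher_ctx_init(cipher_algo, key, key_length, &ctx);
if (ret != 0 && ctx != NULL) {
	/* single, dedicated free point in the upper level */
	EVP_CIPHER_CTX_free(ctx);
	ctx = NULL;
}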
[PATCH v4 0/6] add thread lifetime and attributes API
add rte thread lifetime and attributes api. with these api additions there is now sufficient platform abstracted thread api to remove the use of pthread in the unit tests. v4: * update version.map to show api from series added in 22.11 instead of 22.07. * fix missing parameter name in rte_thread_func declaration causing doxygen ci failure. v3: * change rte_thread_func return type to uint32_t for exit value. * change rte_thread_join retval to be uint32_t (matched with the return value from rte_thread_func). * introduce a wrapper for rte_thread_func on posix platforms to adapt differences between rte_thread_func and pthread start_routine. * remove interpretation / dereference of result from pthread_join in posix implementation of rte_thread_join. * fix leak of dynamically allocated thread_routine_ctx on windows in error paths. * don't cast and truncate NULL to integer value for rte_thread_join when pthread_join returns no result. v2: * split implementation of rte_thread_equal for windows / posix and use pthread_equal for posix platforms. * remove parameter validation assertions and instead return EINVAL for mandatory pointers to type that are NULL. * correct doxygen comment parameter name args -> arg Tyler Retzlaff (6): eal: add thread attributes eal: add thread lifetime management eal: add basic rte thread ID equal API test/threads: add tests for thread lifetime API test/threads: add tests for thread attributes API test/threads: remove unit test use of pthread app/test/test_threads.c | 134 ++-- lib/eal/common/meson.build | 1 + lib/eal/common/rte_thread.c | 60 +++ lib/eal/include/rte_thread.h| 187 ++ lib/eal/unix/rte_thread.c | 141 ++ lib/eal/version.map | 10 ++ lib/eal/windows/include/sched.h | 2 +- lib/eal/windows/rte_thread.c| 219 8 files changed, 705 insertions(+), 49 deletions(-) create mode 100644 lib/eal/common/rte_thread.c -- 1.8.3.1
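Taken together, a minimal usage sketch of the API this series adds (signatures as introduced in patches 1/6 and 2/6; error handling shortened):

#include <rte_thread.h>

static uint32_t
worker(void *arg)
{
	*(int *)arg = 42;
	return 0; /* exit value retrieved via rte_thread_join() */
}

static int
spawn_and_wait(void)
{
	rte_thread_t tid;
	rte_thread_attr_t attr;
	uint32_t exit_value;
	int out = 0;

	rte_thread_attr_init(&attr);
	rte_thread_attr_set_priority(&attr, RTE_THREAD_PRIORITY_NORMAL);

	if (rte_thread_create(&tid, &attr, worker, &out) != 0)
		return -1;
	if (rte_thread_join(tid, &exit_value) != 0)
		return -1;
	return (int)exit_value; /* 0 here, with out == 42 */
}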
[PATCH v4 3/6] eal: add basic rte thread ID equal API
Add rte_thread_equal() that tests if two rte_thread_id are equal. Signed-off-by: Narcisa Vasile Signed-off-by: Tyler Retzlaff Acked-by: Chengwen Feng --- lib/eal/include/rte_thread.h | 19 +++ lib/eal/unix/rte_thread.c| 6 ++ lib/eal/version.map | 1 + lib/eal/windows/rte_thread.c | 6 ++ 4 files changed, 32 insertions(+) diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h index 3c8ff50..8359c1c 100644 --- a/lib/eal/include/rte_thread.h +++ b/lib/eal/include/rte_thread.h @@ -144,6 +144,25 @@ int rte_thread_create(rte_thread_t *thread_id, __rte_experimental rte_thread_t rte_thread_self(void); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Check if 2 thread ids are equal. + * + * @param t1 + * First thread id. + * + * @param t2 + * Second thread id. + * + * @return + * If the ids are equal, return nonzero. + * Otherwise, return 0. + */ +__rte_experimental +int rte_thread_equal(rte_thread_t t1, rte_thread_t t2); + #ifdef RTE_HAS_CPUSET /** diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c index d4c1a7f..37ebfcf 100644 --- a/lib/eal/unix/rte_thread.c +++ b/lib/eal/unix/rte_thread.c @@ -210,6 +210,12 @@ struct thread_routine_ctx { return pthread_detach((pthread_t)thread_id.opaque_id); } +int +rte_thread_equal(rte_thread_t t1, rte_thread_t t2) +{ + return pthread_equal((pthread_t)t1.opaque_id, (pthread_t)t2.opaque_id); +} + rte_thread_t rte_thread_self(void) { diff --git a/lib/eal/version.map b/lib/eal/version.map index b404343..b5dc7f1 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -432,6 +432,7 @@ EXPERIMENTAL { # added in 22.11 rte_thread_create; rte_thread_detach; + rte_thread_equal; rte_thread_join; }; diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c index ad71be4..1bc648e 100644 --- a/lib/eal/windows/rte_thread.c +++ b/lib/eal/windows/rte_thread.c @@ -287,6 +287,12 @@ struct thread_routine_ctx { return 0; } +int +rte_thread_equal(rte_thread_t t1, rte_thread_t t2) +{ + return t1.opaque_id == t2.opaque_id; +} + rte_thread_t rte_thread_self(void) { -- 1.8.3.1
[PATCH v4 2/6] eal: add thread lifetime management
The *rte_thread_create()* function can optionally receive an rte_thread_attr_t object that will cause the thread to be created with the affinity and priority described by the attributes object. If no rte_thread_attr_t is passed (parameter is NULL), the default affinity and priority are used. On Windows, the function executed by a thread when the thread starts is represeneted by a function pointer of type DWORD (*func) (void*). On other platforms, the function pointer is a void* (*func) (void*). Performing a cast between these two types of function pointers to uniformize the API on all platforms may result in undefined behavior. TO fix this issue, a wrapper that respects the signature required by CreateThread() has been created on Windows. Signed-off-by: Narcisa Vasile Signed-off-by: Tyler Retzlaff --- lib/eal/include/rte_thread.h| 75 ++ lib/eal/unix/rte_thread.c | 135 + lib/eal/version.map | 5 + lib/eal/windows/include/sched.h | 2 +- lib/eal/windows/rte_thread.c| 213 5 files changed, 389 insertions(+), 41 deletions(-) diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h index 062308d..3c8ff50 100644 --- a/lib/eal/include/rte_thread.h +++ b/lib/eal/include/rte_thread.h @@ -31,6 +31,18 @@ } rte_thread_t; /** + * Thread function + * + * Function pointer to thread start routine. + * + * @param arg + * Argument passed to rte_thread_create(). + * @return + * Thread function exit value. + */ +typedef uint32_t (*rte_thread_func) (void *arg); + +/** * Thread priority values. */ enum rte_thread_priority { @@ -61,6 +73,69 @@ enum rte_thread_priority { * @warning * @b EXPERIMENTAL: this API may change without prior notice. * + * Create a new thread that will invoke the 'thread_func' routine. + * + * @param thread_id + *A pointer that will store the id of the newly created thread. + * + * @param thread_attr + *Attributes that are used at the creation of the new thread. + * + * @param thread_func + *The routine that the new thread will invoke when starting execution. + * + * @param arg + *Argument to be passed to the 'thread_func' routine. + * + * @return + * On success, return 0. + * On failure, return a positive errno-style error number. + */ +__rte_experimental +int rte_thread_create(rte_thread_t *thread_id, + const rte_thread_attr_t *thread_attr, + rte_thread_func thread_func, void *arg); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Waits for the thread identified by 'thread_id' to terminate + * + * @param thread_id + *The identifier of the thread. + * + * @param value_ptr + *Stores the exit status of the thread. + * + * @return + * On success, return 0. + * On failure, return a positive errno-style error number. + */ +__rte_experimental +int rte_thread_join(rte_thread_t thread_id, uint32_t *value_ptr); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Indicate that the return value of the thread is not needed and + * all thread resources should be release when the thread terminates. + * + * @param thread_id + *The id of the thread to be detached. + * + * @return + * On success, return 0. + * On failure, return a positive errno-style error number. + */ +__rte_experimental +int rte_thread_detach(rte_thread_t thread_id); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * * Get the id of the calling thread. 
* * @return diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c index 9126595..d4c1a7f 100644 --- a/lib/eal/unix/rte_thread.c +++ b/lib/eal/unix/rte_thread.c @@ -16,6 +16,11 @@ struct eal_tls_key { pthread_key_t thread_index; }; +struct thread_routine_ctx { + rte_thread_func thread_func; + void *routine_args; +}; + static int thread_map_priority_to_os_value(enum rte_thread_priority eal_pri, int *os_pri, int *pol) @@ -75,6 +80,136 @@ struct eal_tls_key { return 0; } +static void * +thread_func_wrapper(void *arg) +{ + struct thread_routine_ctx ctx = *(struct thread_routine_ctx *)arg; + + free(arg); + + return (void *)(uintptr_t)ctx.thread_func(ctx.routine_args); +} + +int +rte_thread_create(rte_thread_t *thread_id, + const rte_thread_attr_t *thread_attr, + rte_thread_func thread_func, void *args) +{ + int ret = 0; + pthread_attr_t attr; + pthread_attr_t *attrp = NULL; + struct thread_routine_ctx *ctx; + struct sched_param param = { + .sched_priority = 0, + }; + int policy = SCHED_OTHER; + + ctx = calloc(1, sizeof(*ctx)); + if (ctx == NULL) { + RTE_LOG(DEBUG, EAL, "Insufficient memory for thread context allocations\n"); + ret = ENOMEM; + goto cleanup; +
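For comparison with the POSIX thread_func_wrapper() above, the Windows-side adapter described in the commit message would look roughly like this (a sketch, not copied from the patch):

static DWORD WINAPI
thread_func_wrapper(LPVOID arg)
{
	struct thread_routine_ctx ctx = *(struct thread_routine_ctx *)arg;

	free(arg);
	/* rte_thread_func returns uint32_t, which fits a DWORD exactly,
	 * so no undefined function-pointer cast is needed */
	return (DWORD)ctx.thread_func(ctx.routine_args);
}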
[PATCH v4 1/6] eal: add thread attributes
Implement thread attributes for: * thread affinity * thread priority Implement functions for managing thread attributes. Priority is represented through an enum that allows for two levels: * RTE_THREAD_PRIORITY_NORMAL * RTE_THREAD_PRIORITY_REALTIME_CRITICAL Affinity is described by the rte_cpuset_t type. An rte_thread_attr_t object can be set to the default values by calling rte_thread_attr_init(). Signed-off-by: Narcisa Vasile Signed-off-by: Tyler Retzlaff --- lib/eal/common/meson.build | 1 + lib/eal/common/rte_thread.c | 60 lib/eal/include/rte_thread.h | 93 lib/eal/version.map | 4 ++ 4 files changed, 158 insertions(+) create mode 100644 lib/eal/common/rte_thread.c diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index 917758c..1e77dba 100644 --- a/lib/eal/common/meson.build +++ b/lib/eal/common/meson.build @@ -36,6 +36,7 @@ sources += files( 'rte_random.c', 'rte_reciprocal.c', 'rte_service.c', +'rte_thread.c', 'rte_version.c', ) if is_linux or is_windows diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c new file mode 100644 index 000..d204cc5 --- /dev/null +++ b/lib/eal/common/rte_thread.c @@ -0,0 +1,60 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (C) 2022 Microsoft Corporation + */ + +#include +#include + +int +rte_thread_attr_init(rte_thread_attr_t *attr) +{ + if (attr == NULL) + return EINVAL; + + CPU_ZERO(&attr->cpuset); + attr->priority = RTE_THREAD_PRIORITY_NORMAL; + + return 0; +} + +int +rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr, + rte_cpuset_t *cpuset) +{ + if (thread_attr == NULL) + return EINVAL; + + if (cpuset == NULL) + return EINVAL; + + thread_attr->cpuset = *cpuset; + + return 0; +} + +int +rte_thread_attr_get_affinity(rte_thread_attr_t *thread_attr, + rte_cpuset_t *cpuset) +{ + if (thread_attr == NULL) + return EINVAL; + + if (cpuset == NULL) + return EINVAL; + + *cpuset = thread_attr->cpuset; + + return 0; +} + +int +rte_thread_attr_set_priority(rte_thread_attr_t *thread_attr, + enum rte_thread_priority priority) +{ + if (thread_attr == NULL) + return EINVAL; + + thread_attr->priority = priority; + + return 0; +} diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h index 9e261bf..062308d 100644 --- a/lib/eal/include/rte_thread.h +++ b/lib/eal/include/rte_thread.h @@ -40,6 +40,18 @@ enum rte_thread_priority { /**< highest thread priority allowed */ }; +#ifdef RTE_HAS_CPUSET + +/** + * Representation for thread attributes. + */ +typedef struct { + enum rte_thread_priority priority; /**< thread priority */ + rte_cpuset_t cpuset; /**< thread affinity */ +} rte_thread_attr_t; + +#endif /* RTE_HAS_CPUSET */ + /** * TLS key type, an opaque pointer. */ @@ -63,6 +75,87 @@ enum rte_thread_priority { * @warning * @b EXPERIMENTAL: this API may change without prior notice. * + * Initialize the attributes of a thread. + * These attributes can be passed to the rte_thread_create() function + * that will create a new thread and set its attributes according to attr. + * + * @param attr + * Thread attributes to initialize. + * + * @return + * On success, return 0. + * On failure, return a positive errno-style error number. + */ +__rte_experimental +int rte_thread_attr_init(rte_thread_attr_t *attr); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Set the CPU affinity value in the thread attributes pointed to + * by 'thread_attr'. + * + * @param thread_attr + * Points to the thread attributes in which affinity will be updated. 
+ * + * @param cpuset + * Points to the value of the affinity to be set. + * + * @return + * On success, return 0. + * On failure, return a positive errno-style error number. + */ +__rte_experimental +int rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr, + rte_cpuset_t *cpuset); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Get the value of CPU affinity that is set in the thread attributes pointed + * to by 'thread_attr'. + * + * @param thread_attr + * Points to the thread attributes from which affinity will be retrieved. + * + * @param cpuset + * Pointer to the memory that will store the affinity. + * + * @return + * On success, return 0. + * On failure, return a positive errno-style error number. + */ +__rte_experimental +int rte_thread_attr_get_affinity(rte_thread_attr_t *thread_attr, + rte_cpuset_t *cpuset); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Set the thread priority valu
[PATCH v4 6/6] test/threads: remove unit test use of pthread
now that rte_thread provides thread lifetime functions stop using pthread in unit tests. Signed-off-by: Tyler Retzlaff --- app/test/test_threads.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/app/test/test_threads.c b/app/test/test_threads.c index 3c22cec..e0f18e4 100644 --- a/app/test/test_threads.c +++ b/app/test/test_threads.c @@ -3,7 +3,6 @@ */ #include -#include #include #include @@ -79,12 +78,11 @@ static int test_thread_priority(void) { - pthread_t id; rte_thread_t thread_id; enum rte_thread_priority priority; thread_id_ready = 0; - RTE_TEST_ASSERT(pthread_create(&id, NULL, thread_main, &thread_id) == 0, + RTE_TEST_ASSERT(rte_thread_create(&thread_id, NULL, thread_main, NULL) == 0, "Failed to create thread"); while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0) @@ -131,13 +129,12 @@ static int test_thread_affinity(void) { - pthread_t id; rte_thread_t thread_id; rte_cpuset_t cpuset0; rte_cpuset_t cpuset1; thread_id_ready = 0; - RTE_TEST_ASSERT(pthread_create(&id, NULL, thread_main, &thread_id) == 0, + RTE_TEST_ASSERT(rte_thread_create(&thread_id, NULL, thread_main, NULL) == 0, "Failed to create thread"); while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0) -- 1.8.3.1
[PATCH v4 5/6] test/threads: add tests for thread attributes API
test basic functionality and demonstrate use of following thread attributes api. additionally, test attributes are processed when supplied to rte_thread_create(). * rte_thread_attr_init * rte_thread_attr_set_affinity * rte_thread_attr_get_affinity * rte_thread_attr_set_priority Signed-off-by: Narcisa Vasile Signed-off-by: Tyler Retzlaff --- app/test/test_threads.c | 73 - 1 file changed, 72 insertions(+), 1 deletion(-) diff --git a/app/test/test_threads.c b/app/test/test_threads.c index 1077373..3c22cec 100644 --- a/app/test/test_threads.c +++ b/app/test/test_threads.c @@ -17,7 +17,9 @@ static uint32_t thread_main(void *arg) { - *(rte_thread_t *)arg = rte_thread_self(); + if (arg != NULL) + *(rte_thread_t *)arg = rte_thread_self(); + __atomic_store_n(&thread_id_ready, 1, __ATOMIC_RELEASE); while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 1) @@ -166,6 +168,73 @@ return 0; } +static int +test_thread_attributes_affinity(void) +{ + rte_thread_t thread_id; + rte_thread_attr_t attr; + rte_cpuset_t cpuset0; + rte_cpuset_t cpuset1; + + RTE_TEST_ASSERT(rte_thread_attr_init(&attr) == 0, + "Failed to initialize thread attributes"); + + CPU_ZERO(&cpuset0); + RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(rte_thread_self(), &cpuset0) == 0, + "Failed to get thread affinity"); + RTE_TEST_ASSERT(rte_thread_attr_set_affinity(&attr, &cpuset0) == 0, + "Failed to set thread attributes affinity"); + RTE_TEST_ASSERT(rte_thread_attr_get_affinity(&attr, &cpuset1) == 0, + "Failed to get thread attributes affinity"); + RTE_TEST_ASSERT(memcmp(&cpuset0, &cpuset1, sizeof(rte_cpuset_t)) == 0, + "Affinity should be stable"); + + thread_id_ready = 0; + RTE_TEST_ASSERT(rte_thread_create(&thread_id, &attr, thread_main, NULL) == 0, + "Failed to create attributes affinity thread."); + + while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0) + ; + + RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(thread_id, &cpuset1) == 0, + "Failed to get attributes thread affinity"); + RTE_TEST_ASSERT(memcmp(&cpuset0, &cpuset1, sizeof(rte_cpuset_t)) == 0, + "Failed to apply affinity attributes"); + + __atomic_store_n(&thread_id_ready, 2, __ATOMIC_RELEASE); + + return 0; +} + +static int +test_thread_attributes_priority(void) +{ + rte_thread_t thread_id; + rte_thread_attr_t attr; + enum rte_thread_priority priority; + + RTE_TEST_ASSERT(rte_thread_attr_init(&attr) == 0, + "Failed to initialize thread attributes"); + RTE_TEST_ASSERT(rte_thread_attr_set_priority(&attr, RTE_THREAD_PRIORITY_NORMAL) == 0, + "Failed to set thread attributes priority"); + + thread_id_ready = 0; + RTE_TEST_ASSERT(rte_thread_create(&thread_id, &attr, thread_main, NULL) == 0, + "Failed to create attributes priority thread."); + + while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0) + ; + + RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0, + "Failed to get thread priority"); + RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_NORMAL, + "Failed to apply priority attributes"); + + __atomic_store_n(&thread_id_ready, 2, __ATOMIC_RELEASE); + + return 0; +} + static struct unit_test_suite threads_test_suite = { .suite_name = "threads autotest", .setup = NULL, @@ -175,6 +244,8 @@ TEST_CASE(test_thread_create_detach), TEST_CASE(test_thread_affinity), TEST_CASE(test_thread_priority), + TEST_CASE(test_thread_attributes_affinity), + TEST_CASE(test_thread_attributes_priority), TEST_CASES_END() } }; -- 1.8.3.1
[PATCH v4 4/6] test/threads: add tests for thread lifetime API
test basic functionality and demonstrate use of following thread lifetime api. * rte_thread_create * rte_thread_detach * rte_thread_join Signed-off-by: Narcisa Vasile Signed-off-by: Tyler Retzlaff --- app/test/test_threads.c | 54 +++-- 1 file changed, 52 insertions(+), 2 deletions(-) diff --git a/app/test/test_threads.c b/app/test/test_threads.c index b9d8b4e..1077373 100644 --- a/app/test/test_threads.c +++ b/app/test/test_threads.c @@ -14,7 +14,7 @@ static uint32_t thread_id_ready; -static void * +static uint32_t thread_main(void *arg) { *(rte_thread_t *)arg = rte_thread_self(); @@ -23,7 +23,55 @@ while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 1) ; - return NULL; + return 0; +} + +static int +test_thread_create_join(void) +{ + rte_thread_t thread_id; + rte_thread_t thread_main_id; + + thread_id_ready = 0; + RTE_TEST_ASSERT(rte_thread_create(&thread_id, NULL, thread_main, &thread_main_id) == 0, + "Failed to create thread."); + + while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0) + ; + + RTE_TEST_ASSERT(rte_thread_equal(thread_id, thread_main_id) != 0, + "Unexpected thread id."); + + __atomic_store_n(&thread_id_ready, 2, __ATOMIC_RELEASE); + + RTE_TEST_ASSERT(rte_thread_join(thread_id, NULL) == 0, + "Failed to join thread."); + + return 0; +} + +static int +test_thread_create_detach(void) +{ + rte_thread_t thread_id; + rte_thread_t thread_main_id; + + thread_id_ready = 0; + RTE_TEST_ASSERT(rte_thread_create(&thread_id, NULL, thread_main, + &thread_main_id) == 0, "Failed to create thread."); + + while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0) + ; + + RTE_TEST_ASSERT(rte_thread_equal(thread_id, thread_main_id) != 0, + "Unexpected thread id."); + + __atomic_store_n(&thread_id_ready, 2, __ATOMIC_RELEASE); + + RTE_TEST_ASSERT(rte_thread_detach(thread_id) == 0, + "Failed to detach thread."); + + return 0; } static int @@ -123,6 +171,8 @@ .setup = NULL, .teardown = NULL, .unit_test_cases = { + TEST_CASE(test_thread_create_join), + TEST_CASE(test_thread_create_detach), TEST_CASE(test_thread_affinity), TEST_CASE(test_thread_priority), TEST_CASES_END() -- 1.8.3.1
Re: [PATCH v4] net: fix checksum with unaligned buffer
On 2022-06-27 15:22, Morten Brørup wrote: From: Emil Berg [mailto:emil.b...@ericsson.com] Sent: Monday, 27 June 2022 14.51 From: Emil Berg Sent: den 27 juni 2022 14:46 From: Mattias Rönnblom Sent: den 27 juni 2022 14:28 On 2022-06-23 14:51, Morten Brørup wrote: From: Morten Brørup [mailto:m...@smartsharesystems.com] Sent: Thursday, 23 June 2022 14.39 With this patch, the checksum can be calculated on an unaligned buffer. I.e. the buf parameter is no longer required to be 16 bit aligned. The checksum is still calculated using a 16 bit aligned pointer, so the compiler can auto-vectorize the function's inner loop. When the buffer is unaligned, the first byte of the buffer is handled separately. Furthermore, the calculated checksum of the buffer is byte shifted before being added to the initial checksum, to compensate for the checksum having been calculated on the buffer shifted by one byte. v4: * Add copyright notice. * Include stdbool.h (Emil Berg). * Use RTE_PTR_ADD (Emil Berg). * Fix one more typo in commit message. Is 'unligned' even a word? v3: * Remove braces from single statement block. * Fix typo in commit message. v2: * Do not assume that the buffer is part of an aligned packet buffer. Bugzilla ID: 1035 Cc: sta...@dpdk.org Signed-off-by: Morten Brørup --- lib/net/rte_ip.h | 32 +++- 1 file changed, 27 insertions(+), 5 deletions(-) diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index b502481670..738d643da0 100644 --- a/lib/net/rte_ip.h +++ b/lib/net/rte_ip.h @@ -3,6 +3,7 @@ * The Regents of the University of California. * Copyright(c) 2010-2014 Intel Corporation. * Copyright(c) 2014 6WIND S.A. + * Copyright(c) 2022 SmartShare Systems. * All rights reserved. */ @@ -15,6 +16,7 @@ * IP-related defines */ +#include #include #ifdef RTE_EXEC_ENV_WINDOWS @@ -162,20 +164,40 @@ __rte_raw_cksum(const void *buf, size_t len, uint32_t sum) { /* extend strict-aliasing rules */ typedef uint16_t __attribute__((__may_alias__)) u16_p; - const u16_p *u16_buf = (const u16_p *)buf; - const u16_p *end = u16_buf + len / sizeof(*u16_buf); + const u16_p *u16_buf; + const u16_p *end; + uint32_t bsum = 0; + const bool unaligned = (uintptr_t)buf & 1; + + /* if buffer is unaligned, keeping it byte order independent */ + if (unlikely(unaligned)) { + uint16_t first = 0; + if (unlikely(len == 0)) + return 0; + ((unsigned char *)&first)[1] = *(const unsigned char *)buf; + bsum += first; + buf = RTE_PTR_ADD(buf, 1); + len--; + } + /* aligned access for compiler auto-vectorization */ The compiler will be able to auto vectorize even unaligned accesses, just with different instructions. From what I can tell, there's no performance impact, at least not on the x86_64 systems I tried on. I think you should remove the first special case conditional and use memcpy() instead of the cumbersome __may_alias__ construct to retrieve the data. Here: https://www.agner.org/optimize/instruction_tables.pdf it lists the latency of vmovdqa (aligned) as 6 cycles and the latency for vmovdqu (unaligned) as 7 cycles. So I guess there can be some difference. Although in practice I'm not sure what difference it makes. I've not seen any difference in runtime between the two versions. Correction to my comment: Those stats are for some older CPU. For some newer CPUs such as Tiger Lake the stats seem to be the same regardless of aligned or unaligned. I agree that the memcpy method is more elegant and easy to read. 
However, we would need to performance test the modified checksum function with a large number of CPUs to prove that we don't introduce a performance regression on any CPU architecture still supported by DPDK. And Emil already found a CPU where it costs 1 extra cycle per 16 bytes, which adds up to a total of ca. 91 extra cycles on a 1460 byte TCP packet. I think you've misunderstood what latency means in such tables. It's a data dependency thing, not a measure of throughput. The throughput is *much* higher. My guess would be two such instructions per clock. For your 1460 bytes example, my Zen3 AMD performs identically with the current DPDK implementation, your patch, and a memcpy()-ified version of the current implementation. They all need ~130 clock cycles/packet, with warm caches. IPC is 3 instructions per cycle, but obviously not all instructions are SIMD. The main issue with checksumming on the CPU is, in my experience, not that you don't have enough compute, but that you trash the caches. So I opted for a solution with zero changes to the inner loop, so no performance retesting is required (for the previously supported use cases, where the buffer is aligned). You will see performance degradation with this solution as well, under certain conditions. For unaligned 100 bytes of data, the current DPDK implementation and the memcpy()-ified version need ~21 cc/packet. Your patch needs 54 cc/packet.
Re: Service core statistics MT safety
On 2022-06-27 19:39, Honnappa Nagarahalli wrote: From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com] Sent: Monday, 27 June 2022 13.06 Hi. Is it safe to enable stats on MT safe services? https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366 It seems to me this would have to be an __atomic_add for this code to produce deterministic results. I agree. The same goes for the 'calls' field. The calling function does the locking. https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L398 For more information you can look at: https://github.com/DPDK/dpdk/blob/main/lib/eal/include/rte_service.h#L120 What about the https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L404 call (for MT safe services)? There's no lock held there.
RE: Service core statistics MT safety
> > > From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com] > > Sent: Monday, 27 June 2022 13.06 > > > > Hi. > > > > Is it safe to enable stats on MT safe services? > > > > https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366 > > > > It seems to me this would have to be an __atomic_add for this code to > > produce deterministic results. > > I agree. The same goes for the 'calls' field. The calling function does the locking. https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L398 For more information you can look at: https://github.com/DPDK/dpdk/blob/main/lib/eal/include/rte_service.h#L120
[PATCH] raw/ioat: Check for the NULL pointer after calling malloc
From: Shiqi Liu <835703...@qq.com> As malloc() may fail, 'dev' could be a NULL pointer. Therefore, it should be checked before use in order to avoid dereferencing a NULL pointer. Fixes: b7aaf417f93 ("raw/ioat: add bus driver for device scanning automatically") Signed-off-by: Shiqi Liu <835703...@qq.com> --- drivers/raw/ioat/idxd_bus.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/raw/ioat/idxd_bus.c b/drivers/raw/ioat/idxd_bus.c index 539f51b1b1..8ab4ed5885 100644 --- a/drivers/raw/ioat/idxd_bus.c +++ b/drivers/raw/ioat/idxd_bus.c @@ -301,6 +301,10 @@ dsa_scan(void) IOAT_PMD_DEBUG("%s(): found %s/%s", __func__, path, wq->d_name); dev = malloc(sizeof(*dev)); + if (dev == NULL) { + closedir(dev_dir); + return ENOMEM; + } if (dsa_addr_parse(wq->d_name, &dev->addr) < 0) { IOAT_PMD_ERR("Error parsing WQ name: %s", wq->d_name); free(dev); -- 2.35.1.windows.2
RE: Service core statistics MT safety
> >> > >>> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com] > >>> Sent: Monday, 27 June 2022 13.06 > >>> > >>> Hi. > >>> > >>> Is it safe to enable stats on MT safe services? > >>> > >>> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366 > >>> > >>> It seems to me this would have to be an __atomic_add for this code > >>> to produce deterministic results. > >> > >> I agree. The same goes for the 'calls' field. > > The calling function does the locking. > > https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L398 > > > > For more information you can look at: > > https://github.com/DPDK/dpdk/blob/main/lib/eal/include/rte_service.h#L120 > > > > What about the > https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L404 > call (for MT safe services)? > > There's no lock held there. Good point. This is the case where the service running in service cores is MT safe. However, the stats are incremented outside of the MT Safety mechanism employed by the service. So, yes, this and other updates in the function 'service_runner_do_callback' need to be updated atomically.
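Concretely, the update under discussion would become something like this (a sketch using the field names mentioned in this thread, with relaxed ordering since only counter integrity is needed, not synchronization):

uint64_t start = rte_rdtsc();
s->spec.callback(userdata);
uint64_t end = rte_rdtsc();

/* safe for concurrent MT-safe service invocations on multiple lcores */
__atomic_fetch_add(&s->cycles_spent, end - start, __ATOMIC_RELAXED);
__atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);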
Re: Service core statistics MT safety
On 2022-06-27 20:19, Honnappa Nagarahalli wrote: > > From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com] > Sent: Monday, 27 June 2022 13.06 > > Hi. > > Is it safe to enable stats on MT safe services? > > https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366 > > It seems to me this would have to be an __atomic_add for this code > to produce deterministic results. I agree. The same goes for the 'calls' field. >>> The calling function does the locking. >>> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L398 >>> >>> For more information you can look at: >>> https://github.com/DPDK/dpdk/blob/main/lib/eal/include/rte_service.h#L120 >>> >> >> What about the >> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L404 >> call (for MT safe services)? >> >> There's no lock held there. > Good point. > This is the case where the service running in service cores is MT safe. > However, the stats are incremented outside of the MT Safety mechanism > employed by the service. So, yes, this and other updates in the function > 'service_runner_do_callback' need to be updated atomically. Maybe a better solution would be to move this to the core_state struct (and eliminate the "calls" field since the same information is already in the core_state struct). The contention on these cache lines will be pretty crazy for services with short run time (say a thousand cycles or less per call), assuming they are mapped to many cores. Idle service cores will basically do nothing else than stall waiting for these lines, I suspect, hampering the progress of more busy cores.
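A rough sketch of that alternative (member names invented here; core_state and RTE_SERVICE_NUM_MAX are from rte_service.c):

struct core_state {
	/* ... existing members ... */
	uint64_t service_calls[RTE_SERVICE_NUM_MAX];
	uint64_t service_cycles[RTE_SERVICE_NUM_MAX];
} __rte_cache_aligned;

/* in the service runner: each lcore exclusively owns its own slot, so
 * plain non-atomic increments are race-free and cause no line bouncing */
cs->service_calls[service_idx]++;
cs->service_cycles[service_idx] += end - start;

/* a stats read would then sum the slots across all service lcores */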
RE: [PATCH v4] net: fix checksum with unaligned buffer
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Monday, 27 June 2022 19.23 > > On 2022-06-27 15:22, Morten Brørup wrote: > >> From: Emil Berg [mailto:emil.b...@ericsson.com] > >> Sent: Monday, 27 June 2022 14.51 > >> > >>> From: Emil Berg > >>> Sent: den 27 juni 2022 14:46 > >>> > From: Mattias Rönnblom > Sent: den 27 juni 2022 14:28 > > On 2022-06-23 14:51, Morten Brørup wrote: > >> From: Morten Brørup [mailto:m...@smartsharesystems.com] > >> Sent: Thursday, 23 June 2022 14.39 > >> > >> With this patch, the checksum can be calculated on an unaligned > >> buffer. > >> I.e. the buf parameter is no longer required to be 16 bit > >> aligned. > >> > >> The checksum is still calculated using a 16 bit aligned pointer, > >> so > >> the compiler can auto-vectorize the function's inner loop. > >> > >> When the buffer is unaligned, the first byte of the buffer is > >> handled separately. Furthermore, the calculated checksum of the > >> buffer is byte shifted before being added to the initial > >> checksum, > >> to compensate for the checksum having been calculated on the > >> buffer > >> shifted by one byte. > >> > >> v4: > >> * Add copyright notice. > >> * Include stdbool.h (Emil Berg). > >> * Use RTE_PTR_ADD (Emil Berg). > >> * Fix one more typo in commit message. Is 'unligned' even a > >> word? > >> v3: > >> * Remove braces from single statement block. > >> * Fix typo in commit message. > >> v2: > >> * Do not assume that the buffer is part of an aligned packet > >> buffer. > >> > >> Bugzilla ID: 1035 > >> Cc: sta...@dpdk.org > >> > >> Signed-off-by: Morten Brørup [...] > > The compiler will be able to auto vectorize even unaligned > >> accesses, > just with different instructions. From what I can tell, there's no > performance impact, at least not on the x86_64 systems I tried on. > > I think you should remove the first special case conditional and > >> use > memcpy() instead of the cumbersome __may_alias__ construct to > >> retrieve > the data. > > >>> > >>> Here: > >>> https://www.agner.org/optimize/instruction_tables.pdf > >>> it lists the latency of vmovdqa (aligned) as 6 cycles and the > latency > >> for > >>> vmovdqu (unaligned) as 7 cycles. So I guess there can be some > >> difference. > >>> Although in practice I'm not sure what difference it makes. I've > not > >> seen any > >>> difference in runtime between the two versions. > >>> > >> > >> Correction to my comment: > >> Those stats are for some older CPU. For some newer CPUs such as > Tiger > >> Lake the stats seem to be the same regardless of aligned or > unaligned. > >> > > > > I agree that the memcpy method is more elegant and easy to read. > > > > However, we would need to performance test the modified checksum > function with a large number of CPUs to prove that we don't introduce a > performance regression on any CPU architecture still supported by DPDK. > And Emil already found a CPU where it costs 1 extra cycle per 16 bytes, > which adds up to a total of ca. 91 extra cycles on a 1460 byte TCP > packet. > > > > I think you've misunderstood what latency means in such tables. It's a > data dependency thing, not a measure of throughput. The throughput is > *much* higher. My guess would be two such instruction per clock. > > For your 1460 bytes example, my Zen3 AMD needs performs identical with > both the current DPDK implementation, your patch, and a memcpy()-ified > version of the current implementation. They all need ~130 clock > cycles/packet, with warm caches. 
IPC is 3 instructions per cycle, but > obvious not all instructions are SIMD. You're right, I wasn't thinking deeper about it before extrapolating. Great to see some real numbers! I wish someone would do the same testing on an old ARM CPU, so we could also see the other end of the scale. > The main issue with checksumming on the CPU is, in my experience, not > that you don't have enough compute, but that you trash the caches. Agree. I have noticed that x86 has "non-temporal" instruction variants to load/store data without trashing the cache entirely. A variant of the checksum function using such instructions might be handy. Variants of the memcpy function using such instructions might also be handy for some purposes, e.g. copying the contents of packets, where the original and/or copy will not accessed shortly thereafter. > > So I opted for a solution with zero changes to the inner loop, so no > performance retesting is required (for the previously supported use > cases, where the buffer is aligned). > > > > You will see performance degradation with this solution as well, under > certain conditions. For unaligned 100 bytes of data, the current DPDK > implementation and the memcpy()-fied version needs ~21 cc/packet. Your > patch needs 54 cc/packet. Yes, it's a tradeoff. I
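To make the non-temporal idea concrete, a loose SSE2 sketch (purely speculative, not a proposal; NT stores require a 16-byte aligned destination, a length that is a multiple of 16, and a store fence before the data is consumed):

#include <emmintrin.h>
#include <stddef.h>

static void
copy_nt(void *dst, const void *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i += 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)((const char *)src + i));
		/* store that bypasses the cache hierarchy */
		_mm_stream_si128((__m128i *)((char *)dst + i), v);
	}
	_mm_sfence(); /* order the NT stores before subsequent reads */
}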
[PATCH v2] baseband/turbo_sw: Remove flexran_sdk meson option
v2: typo in documentation Hi Thomas, This is the change you requested earlier this year. It is targeting 22.11. Basically, we no longer pass a specific option but rely on pkg-config. There is a small change needed to generate the .pc files, which Intel will make available by end of year. Still, the related change in DPDK is available now. Thanks Nic Nicolas Chautru (1): baseband/turbo_sw: remove Flexran SDK meson option doc/guides/bbdevs/turbo_sw.rst| 6 -- drivers/baseband/turbo_sw/meson.build | 36 +-- meson_options.txt | 2 -- 3 files changed, 17 insertions(+), 27 deletions(-) -- 1.8.3.1
[PATCH v2] baseband/turbo_sw: remove Flexran SDK meson option
The related dependency to build the PMD based on the SDK libraries is now enabled through pkfconfig. Signed-off-by: Nicolas Chautru --- doc/guides/bbdevs/turbo_sw.rst| 6 -- drivers/baseband/turbo_sw/meson.build | 36 +-- meson_options.txt | 2 -- 3 files changed, 17 insertions(+), 27 deletions(-) diff --git a/doc/guides/bbdevs/turbo_sw.rst b/doc/guides/bbdevs/turbo_sw.rst index 1e23e37..24a9621 100644 --- a/doc/guides/bbdevs/turbo_sw.rst +++ b/doc/guides/bbdevs/turbo_sw.rst @@ -136,7 +136,8 @@ In order to enable this virtual bbdev PMD, the user may: FlexRAN SDK libraries were installed. And ``DIR_WIRELESS_SDK`` to the path where the libraries were extracted. -* Tune the meson build option pointing the location of the FlexRAN SDK libraries ``flexran_sdk`` +* Point pkgconfig towards these libraries so that they can be automatically found by meson. + If not DPDK will still compile but the related functionality would be stubbed out. Example: @@ -144,8 +145,9 @@ Example: export FLEXRAN_SDK=/FlexRAN-FEC-SDK-19-04/sdk/build-avx2-icc/install export DIR_WIRELESS_SDK=/FlexRAN-FEC-SDK-19-04/sdk/build-avx2-icc/ +export PKG_CONFIG_PATH=$DIR_WIRELESS_SDK/pkgcfg:$PKG_CONFIG_PATH cd build -meson configure -Dflexran_sdk=/FlexRAN-FEC-SDK-19-04/sdk/build-avx512-icc/install +meson configure * For AVX512 machines with SDK libraries installed then both 4G and 5G can be enabled for full real time FEC capability. For AVX2 machines it is possible to only enable the 4G libraries and the PMD capabilities will be limited to 4G FEC. diff --git a/drivers/baseband/turbo_sw/meson.build b/drivers/baseband/turbo_sw/meson.build index 477b8b3..9e35156 100644 --- a/drivers/baseband/turbo_sw/meson.build +++ b/drivers/baseband/turbo_sw/meson.build @@ -1,38 +1,28 @@ # SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2019 Intel Corporation -path = get_option('flexran_sdk') +# check for FlexRAN SDK libraries +dep_turbo = dependency('flexran_sdk_turbo', required: false) +dep_dec5g = dependency('flexran_sdk_ldpc_decoder_5gnr', required: false) -# check for FlexRAN SDK libraries for AVX2 -lib4g = cc.find_library('libturbo', dirs: [path + '/lib_turbo'], required: false) -if lib4g.found() -ext_deps += cc.find_library('libturbo', dirs: [path + '/lib_turbo'], required: true) -ext_deps += cc.find_library('libcrc', dirs: [path + '/lib_crc'], required: true) -ext_deps += cc.find_library('librate_matching', dirs: [path + '/lib_rate_matching'], required: true) -ext_deps += cc.find_library('libcommon', dirs: [path + '/lib_common'], required: true) +if dep_turbo.found() ext_deps += cc.find_library('libstdc++', required: true) ext_deps += cc.find_library('libirc', required: true) ext_deps += cc.find_library('libimf', required: true) ext_deps += cc.find_library('libipps', required: true) ext_deps += cc.find_library('libsvml', required: true) -includes += include_directories(path + '/lib_turbo') -includes += include_directories(path + '/lib_crc') -includes += include_directories(path + '/lib_rate_matching') -includes += include_directories(path + '/lib_common') + ext_deps += dep_turbo + ext_deps += dependency('flexran_sdk_crc', required: true) + ext_deps += dependency('flexran_sdk_rate_matching', required: true) + ext_deps += dependency('flexran_sdk_common', required: true) cflags += ['-DRTE_BBDEV_SDK_AVX2'] endif -# check for FlexRAN SDK libraries for AVX512 -lib5g = cc.find_library('libldpc_decoder_5gnr', dirs: [path + '/lib_ldpc_decoder_5gnr'], required: false) -if lib5g.found() -ext_deps += cc.find_library('libldpc_encoder_5gnr', dirs: [path 
+ '/lib_ldpc_encoder_5gnr'], required: true) -ext_deps += cc.find_library('libldpc_decoder_5gnr', dirs: [path + '/lib_ldpc_decoder_5gnr'], required: true) -ext_deps += cc.find_library('libLDPC_ratematch_5gnr', dirs: [path + '/lib_LDPC_ratematch_5gnr'], required: true) -ext_deps += cc.find_library('librate_dematching_5gnr', dirs: [path + '/lib_rate_dematching_5gnr'], required: true) -includes += include_directories(path + '/lib_ldpc_encoder_5gnr') -includes += include_directories(path + '/lib_ldpc_decoder_5gnr') -includes += include_directories(path + '/lib_LDPC_ratematch_5gnr') -includes += include_directories(path + '/lib_rate_dematching_5gnr') +if dep_dec5g.found() + ext_deps += dep_dec5g + ext_deps += dependency('flexran_sdk_ldpc_encoder_5gnr', required: true) + ext_deps += dependency('flexran_sdk_LDPC_ratematch_5gnr', required: true) + ext_deps += dependency('flexran_sdk_rate_dematching_5gnr', required: true) cflags += ['-DRTE_BBDEV_SDK_AVX512'] endif diff --git a/meson_options.txt b/meson_options.txt index 7c220ad..abaa304 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -22,8 +22,6 @@ option('enable_kmods', ty
RE: [PATCH v4] doc: announce changes in bbdev related to enum extension
Hi Thomas, Kind reminder on this one. Thanks Nic > -Original Message- > From: Chautru, Nicolas > Sent: Friday, June 17, 2022 9:13 AM > To: dev@dpdk.org; tho...@monjalon.net > Cc: t...@redhat.com; Kinsella, Ray ; Richardson, > Bruce ; hemant.agra...@nxp.com; > david.march...@redhat.com; step...@networkplumber.org; Maxime > Coquelin ; gak...@marvell.com > Subject: RE: [PATCH v4] doc: announce changes in bbdev related to enum > extension > > Hi Thomas, > Can this one be applied based on your review? > Thanks > Nic > > > -Original Message- > > From: Maxime Coquelin > > Sent: Thursday, June 9, 2022 12:54 AM > > To: Chautru, Nicolas ; dev@dpdk.org; > > gak...@marvell.com; tho...@monjalon.net > > Cc: t...@redhat.com; Kinsella, Ray ; > > Richardson, Bruce ; > > hemant.agra...@nxp.com; david.march...@redhat.com; > > step...@networkplumber.org > > Subject: Re: [PATCH v4] doc: announce changes in bbdev related to enum > > extension > > > > Hi Nicolas, > > > > On 6/9/22 02:34, Nicolas Chautru wrote: > > > Intent to resolve in DPDK 22.11 historical usage which prevents > > > graceful extension of enum and API without troublesome ABI breakage > > > as well as extending API RTE_BBDEV_OP_FFT for new operation type in > > > bbdev as well as other new members in existing structures. > > > > > > Signed-off-by: Nicolas Chautru > > > --- > > > doc/guides/rel_notes/deprecation.rst | 11 +++ > > > 1 file changed, 11 insertions(+) > > > > > > diff --git a/doc/guides/rel_notes/deprecation.rst > > > b/doc/guides/rel_notes/deprecation.rst > > > index 4e5b23c..c8ab1ec 100644 > > > --- a/doc/guides/rel_notes/deprecation.rst > > > +++ b/doc/guides/rel_notes/deprecation.rst > > > @@ -112,6 +112,17 @@ Deprecation Notices > > > session and the private data of session. An opaque pointer can be > exposed > > > directly to application which can be attached to the > > > ``rte_crypto_op``. > > > > > > +* bbdev: ``RTE_BBDEV_OP_TYPE_COUNT`` terminating the > > > +``rte_bbdev_op_type`` > > > + enum will be deprecated and instead use fixed array size when > > > +required to allow for > > > + future enum extension. > > > + Will extend API to support new operation type > > > +``RTE_BBDEV_OP_FFT`` as per this > > > + RFC https://patchwork.dpdk.org/project/dpdk/list/?series=22111 > > > + New members will be added in ``rte_bbdev_driver_info`` to expose > > > +PMD queue topology inspired > > > + by this RFC > > > +https://patches.dpdk.org/project/dpdk/list/?series=22076 > > > + New member will be added in ``rte_bbdev_driver_info`` to expose > > > +the device status as per > > > + this RFC https://patches.dpdk.org/project/dpdk/list/?series=23367 > > > + This should be updated in DPDK 22.11. > > > + > > > * security: Hide structure ``rte_security_session`` and expose an opaque > > > pointer for the private data to the application which can be attached > > > to the packet while enqueuing. > > > > Thanks for rewording the notice. > > > > Acked-by: Maxime Coquelin > > > > Maxime
RE: Service core statistics MT safety
> > > From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com] > > Sent: Monday, 27 June 2022 13.06 > > > > Hi. > > > > Is it safe to enable stats on MT safe services? > > > > https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366 > > > > It seems to me this would have to be an __atomic_add for this code > > to produce deterministic results. > > I agree. The same goes for the 'calls' field. > >>> The calling function does the locking. > >>> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L398 > >>> > >>> For more information you can look at: > >>> https://github.com/DPDK/dpdk/blob/main/lib/eal/include/rte_service.h#L120 > >>> > >> > >> What about the > >> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L404 > >> call (for MT safe services)? > >> > >> There's no lock held there. > > Good point. > > This is the case where the service running in service cores is MT safe. > > However, the stats are incremented outside of the MT Safety mechanism employed by the service. So, yes, this and other updates in the function 'service_runner_do_callback' need to be updated atomically. > > Maybe a better solution would be to move this to the core_state struct (and eliminate the "calls" field since the same information is already in the core_state struct). The contention on these cache lines will be pretty crazy for services with short run time (say a thousand cycles or less per call), assuming they are mapped to many cores. That's one option, the structures are internal as well. With this option stats need to be aggregated which will not give an accurate view. But, that is the nature of the statistics. I am also wondering if these stats are of any use other than for debugging. Adding a capability to disable stats might help as well. > > Idle service cores will basically do nothing else than stall waiting for these lines, I suspect, hampering the progress of more busy cores.
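For illustration, below is a minimal sketch of the atomic variant discussed in this thread, assuming the stats stay in their current per-service location. The struct is a stand-in for the internal one in rte_service.c; this is an illustration, not the actual patch.

    #include <stdint.h>

    struct service_stats {          /* stand-in for the internal struct */
            uint64_t calls;
            uint64_t cycles_spent;
    };

    static inline void
    service_stats_add(struct service_stats *s, uint64_t cycles)
    {
            /* Relaxed ordering is enough here: these are statistics,
             * not synchronization; the only requirement is that
             * concurrent increments from multiple lcores are not lost.
             */
            __atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
            __atomic_fetch_add(&s->cycles_spent, cycles, __ATOMIC_RELAXED);
    }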
[PATCH v3 0/7] bbdev changes for 22.11
v3: update to device status info to also use padded size for the related array. Also adding 2 additional commits to allow the API struct to expose more information related to queue corner cases/warnings as well as an optional rw lock. Hemant, Maxime, this is planned for DPDK 22.11 but I would like review/ack early if possible, to get this applied earlier, also due to time off this summer. Thanks Nic -- Hi, Aggregating together in a single series a number of bbdev API changes previously submitted over the last few months and all targeted for 22.11 (4 different series detailed below). Related deprecation notice being pushed in 22.07 in parallel. * bbdev: add device status info * bbdev: add new operation for FFT processing * bbdev: add device info on queue topology * bbdev: allow operation type enum for growth v2: Update to the RTE_BBDEV_COUNT removal based on feedback from Thomas/Stephen: rejecting out of range op type and adjusting the new name for the padded maximum value used for fixed size arrays. --- Previous cover letters aggregated below: * bbdev: add device status info https://patches.dpdk.org/project/dpdk/list/?series=23367 The updated structure will allow PMDs to expose through info_get what may be the status of the underlying accelerator, notably in case a HW error event has happened. * bbdev: add new operation for FFT processing https://patches.dpdk.org/project/dpdk/list/?series=22111 This contribution adds a new operation type to the existing ones already supported by the bbdev PMDs. This set of operations is FFT-based processing for 5GNR baseband processing acceleration. This operates in the same lookaside fashion as other existing bbdev operations with a dedicated set of capabilities and parameters (marked as experimental). I plan to also include a new PMD supporting this operation (and most of the related capabilities) in the next couple of months (either in 22.06 or 22.09) as well as extending the related bbdev-test. * bbdev: add device info on queue topology https://patches.dpdk.org/project/dpdk/list/?series=22076 Addressing a historical concern that the device info struct only imperfectly captured what queues are available on the device (number of operations and priorities). This ended up being an iterative process for the application to find how each queue could be configured, i.e. the gap was captured as technical debt previously in comments /* This isn't ideal because it reports the maximum number of queues but * does not provide info on how many can be uplink/downlink or different * priorities */ This is now being exposed explicitly, based on what the device actually supports, using the existing info_get API * bbdev: allow operation type enum for growth https://patches.dpdk.org/project/dpdk/list/?series=23509 This is related to the general intent to remove using MAX values for enums. There is consensus that we should avoid this for a while, notably for future-proofing ABI concerns https://patches.dpdk.org/project/dpdk/patch/20200130142003.2645765-1-ferruh.yi...@intel.com/. But still there is arguably not yet an explicit best recommendation to handle this, especially when we actually need to expose an array whose index is such an enum.
As a specific example here I am referring to RTE_BBDEV_OP_TYPE_COUNT in enum rte_bbdev_op_type, which is being extended for new operation types being supported in bbdev (such as https://patches.dpdk.org/project/dpdk/patch/1646956157-245769-2-git-send-email-nicolas.chau...@intel.com/ adding the new FFT operation) There is also the intent to be able to expose information for each operation type through the bbdev API, such as dynamically configured queue information per such operation type https://patches.dpdk.org/project/dpdk/patch/1646785355-168133-2-git-send-email-nicolas.chau...@intel.com/ Basically we are considering the best way to accommodate this, notably based on discussions with Ray Kinsella and Bruce Richardson, to handle such a case moving forward: specifically for the example with RTE_BBDEV_OP_TYPE_COUNT and also more generally. One possible option is captured in that patchset and is basically based on the simple principle of allowing for growth while preventing ABI breakage. I.e. the last value of the enum is set to a higher value than required, so as to allow insertion of new enum values outside of the major ABI versions. In that case the RTE_BBDEV_OP_TYPE_COUNT is still present and can be exposed and used while still allowing for additions thanks to the implicit padding-like room. As an alternate variant, instead of using that last enum value, that extended size could be exposed as a #define outside of the enum, but it would be fundamentally the same (public). Another option would be to avoid the array altogether and each time use a new dedicated API function (the operation type enum being an input argument instead of an index to an array in an existing structure, so as to get access to the structure related to a given operation
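A rough sketch of that last option follows, with purely hypothetical names (no such accessor exists in the bbdev API today): the operation type enum becomes an input argument, so the enum can grow without any fixed-size array being baked into the public ABI.

    #include <stdint.h>
    #include <rte_bbdev_op.h>   /* for enum rte_bbdev_op_type */

    /* hypothetical result struct, not part of the current API */
    struct bbdev_op_queue_topology {
            unsigned int num_queues;
            unsigned int priority_levels;
    };

    /* hypothetical accessor: new enum values later need no array
     * resize in any public structure, hence no ABI breakage
     */
    int
    rte_bbdev_queue_topology_get(uint16_t dev_id,
                    enum rte_bbdev_op_type op_type,
                    struct bbdev_op_queue_topology *topo);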
[PATCH v3 1/7] bbdev: allow operation type enum for growth
Updating the rte_bbdev_op_type enum so that the ABI stays compatible when new enum values are inserted, while adding a padded maximum value for arrays that need a fixed size. Removing RTE_BBDEV_OP_TYPE_COUNT and instead exposing RTE_BBDEV_OP_TYPE_PADDED_MAX. Signed-off-by: Nicolas Chautru --- app/test-bbdev/test_bbdev.c | 2 +- app/test-bbdev/test_bbdev_perf.c | 4 ++-- examples/bbdev_app/main.c| 2 +- lib/bbdev/rte_bbdev.c| 9 + lib/bbdev/rte_bbdev_op.h | 2 +- 5 files changed, 10 insertions(+), 9 deletions(-) diff --git a/app/test-bbdev/test_bbdev.c b/app/test-bbdev/test_bbdev.c index ac06d73..1063f6e 100644 --- a/app/test-bbdev/test_bbdev.c +++ b/app/test-bbdev/test_bbdev.c @@ -521,7 +521,7 @@ struct bbdev_testsuite_params { rte_mempool_free(mp); TEST_ASSERT((mp = rte_bbdev_op_pool_create("Test_INV", - RTE_BBDEV_OP_TYPE_COUNT, size, cache_size, 0)) == NULL, + RTE_BBDEV_OP_TYPE_PADDED_MAX, size, cache_size, 0)) == NULL, "Failed test for rte_bbdev_op_pool_create: " "returned value is not NULL for invalid type"); diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c index fad3b1e..1abda2d 100644 --- a/app/test-bbdev/test_bbdev_perf.c +++ b/app/test-bbdev/test_bbdev_perf.c @@ -2428,13 +2428,13 @@ typedef int (test_case_function)(struct active_device *ad, /* Find capabilities */ const struct rte_bbdev_op_cap *cap = info.drv.capabilities; - for (i = 0; i < RTE_BBDEV_OP_TYPE_COUNT; i++) { + do { if (cap->type == test_vector.op_type) { capabilities = cap; break; } cap++; - } + } while (cap->type != RTE_BBDEV_OP_NONE); TEST_ASSERT_NOT_NULL(capabilities, "Couldn't find capabilities"); diff --git a/examples/bbdev_app/main.c b/examples/bbdev_app/main.c index fc7e8b8..ef0ba76 100644 --- a/examples/bbdev_app/main.c +++ b/examples/bbdev_app/main.c @@ -1041,7 +1041,7 @@ uint16_t bbdev_parse_number(const char *mask) void *sigret; struct app_config_params app_params = def_app_config; struct rte_mempool *ethdev_mbuf_mempool, *bbdev_mbuf_mempool; - struct rte_mempool *bbdev_op_pools[RTE_BBDEV_OP_TYPE_COUNT]; + struct rte_mempool *bbdev_op_pools[RTE_BBDEV_OP_TYPE_PADDED_MAX]; struct lcore_conf lcore_conf[RTE_MAX_LCORE] = { {0} }; struct lcore_statistics lcore_stats[RTE_MAX_LCORE] = { {0} }; struct stats_lcore_params stats_lcore; diff --git a/lib/bbdev/rte_bbdev.c b/lib/bbdev/rte_bbdev.c index aaee7b7..22bd894 100644 --- a/lib/bbdev/rte_bbdev.c +++ b/lib/bbdev/rte_bbdev.c @@ -23,6 +23,8 @@ #define DEV_NAME "BBDEV" +/* Number of supported operation types */ +#define BBDEV_OP_TYPE_COUNT 5 /* BBDev library logging ID */ RTE_LOG_REGISTER_DEFAULT(bbdev_logtype, NOTICE); @@ -890,10 +892,10 @@ struct rte_mempool * return NULL; } - if (type >= RTE_BBDEV_OP_TYPE_COUNT) { + if (type >= BBDEV_OP_TYPE_COUNT) { rte_bbdev_log(ERR, "Invalid op type (%u), should be less than %u", - type, RTE_BBDEV_OP_TYPE_COUNT); + type, BBDEV_OP_TYPE_COUNT); return NULL; } @@ -1122,10 +1124,9 @@ struct rte_mempool * "RTE_BBDEV_OP_TURBO_DEC", "RTE_BBDEV_OP_TURBO_ENC", "RTE_BBDEV_OP_LDPC_DEC", - "RTE_BBDEV_OP_LDPC_ENC", }; - if (op_type < RTE_BBDEV_OP_TYPE_COUNT) + if (op_type < BBDEV_OP_TYPE_COUNT) return op_types[op_type]; rte_bbdev_log(ERR, "Invalid operation type"); diff --git a/lib/bbdev/rte_bbdev_op.h b/lib/bbdev/rte_bbdev_op.h index 6d56133..cd82418 100644 --- a/lib/bbdev/rte_bbdev_op.h +++ b/lib/bbdev/rte_bbdev_op.h @@ -748,7 +748,7 @@ enum rte_bbdev_op_type { RTE_BBDEV_OP_TURBO_ENC, /**< Turbo encode */ RTE_BBDEV_OP_LDPC_DEC, /**< LDPC decode */ RTE_BBDEV_OP_LDPC_ENC, /**< LDPC encode */ - RTE_BBDEV_OP_TYPE_COUNT, /**< Count of different op 
types */ + RTE_BBDEV_OP_TYPE_PADDED_MAX = 8, /**< Maximum op type number including padding */ }; /** Bit indexes of possible errors reported through status field */ -- 1.8.3.1
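In a nutshell, the practical effect of the padded maximum is that arrays indexed by the op type keep a fixed, forward-compatible size. A minimal sketch (the stand-in enum mirrors the patched one; values below the pad can be appended without ABI impact):

    enum bbdev_op_type {            /* stand-in for the patched enum */
            OP_NONE = 0,
            OP_TURBO_DEC,
            OP_TURBO_ENC,
            OP_LDPC_DEC,
            OP_LDPC_ENC,
            /* future op types (e.g. FFT) are inserted here, staying
             * below the padded maximum, without breaking the ABI
             */
            OP_TYPE_PADDED_MAX = 8,
    };

    /* sized once, stable across enum growth */
    static void *op_pools[OP_TYPE_PADDED_MAX];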
[PATCH v3 2/7] bbdev: add device status info
Added device status information, so that the PMD can expose information related to the underlying accelerator device status. Minor order change in structure to fit into padding hole. Signed-off-by: Nicolas Chautru --- drivers/baseband/acc100/rte_acc100_pmd.c | 1 + drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c | 1 + drivers/baseband/fpga_lte_fec/fpga_lte_fec.c | 1 + drivers/baseband/la12xx/bbdev_la12xx.c | 1 + drivers/baseband/null/bbdev_null.c | 1 + drivers/baseband/turbo_sw/bbdev_turbo_software.c | 1 + lib/bbdev/rte_bbdev.c | 24 +++ lib/bbdev/rte_bbdev.h | 35 -- lib/bbdev/version.map | 6 9 files changed, 69 insertions(+), 2 deletions(-) diff --git a/drivers/baseband/acc100/rte_acc100_pmd.c b/drivers/baseband/acc100/rte_acc100_pmd.c index de7e4bc..17ba798 100644 --- a/drivers/baseband/acc100/rte_acc100_pmd.c +++ b/drivers/baseband/acc100/rte_acc100_pmd.c @@ -1060,6 +1060,7 @@ /* Read and save the populated config from ACC100 registers */ fetch_acc100_config(dev); + dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED; /* This isn't ideal because it reports the maximum number of queues but * does not provide info on how many can be uplink/downlink or different diff --git a/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c b/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c index 82ae6ba..57b12af 100644 --- a/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c +++ b/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c @@ -369,6 +369,7 @@ dev_info->capabilities = bbdev_capabilities; dev_info->cpu_flag_reqs = NULL; dev_info->data_endianness = RTE_LITTLE_ENDIAN; + dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED; /* Calculates number of queues assigned to device */ dev_info->max_num_queues = 0; diff --git a/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c b/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c index 21d3529..2a330c4 100644 --- a/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c +++ b/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c @@ -645,6 +645,7 @@ struct __rte_cache_aligned fpga_queue { dev_info->capabilities = bbdev_capabilities; dev_info->cpu_flag_reqs = NULL; dev_info->data_endianness = RTE_LITTLE_ENDIAN; + dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED; /* Calculates number of queues assigned to device */ dev_info->max_num_queues = 0; diff --git a/drivers/baseband/la12xx/bbdev_la12xx.c b/drivers/baseband/la12xx/bbdev_la12xx.c index 4d1bd16..c1f88c6 100644 --- a/drivers/baseband/la12xx/bbdev_la12xx.c +++ b/drivers/baseband/la12xx/bbdev_la12xx.c @@ -100,6 +100,7 @@ struct bbdev_la12xx_params { dev_info->capabilities = bbdev_capabilities; dev_info->cpu_flag_reqs = NULL; dev_info->min_alignment = 64; + dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED; rte_bbdev_log_debug("got device info from %u", dev->data->dev_id); } diff --git a/drivers/baseband/null/bbdev_null.c b/drivers/baseband/null/bbdev_null.c index 248e129..94a1976 100644 --- a/drivers/baseband/null/bbdev_null.c +++ b/drivers/baseband/null/bbdev_null.c @@ -82,6 +82,7 @@ struct bbdev_queue { * here for code completeness. 
*/ dev_info->data_endianness = RTE_LITTLE_ENDIAN; + dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED; rte_bbdev_log_debug("got device info from %u", dev->data->dev_id); } diff --git a/drivers/baseband/turbo_sw/bbdev_turbo_software.c b/drivers/baseband/turbo_sw/bbdev_turbo_software.c index af7bc41..dbc5524 100644 --- a/drivers/baseband/turbo_sw/bbdev_turbo_software.c +++ b/drivers/baseband/turbo_sw/bbdev_turbo_software.c @@ -254,6 +254,7 @@ struct turbo_sw_queue { dev_info->min_alignment = 64; dev_info->harq_buffer_size = 0; dev_info->data_endianness = RTE_LITTLE_ENDIAN; + dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED; rte_bbdev_log_debug("got device info from %u\n", dev->data->dev_id); } diff --git a/lib/bbdev/rte_bbdev.c b/lib/bbdev/rte_bbdev.c index 22bd894..555bda9 100644 --- a/lib/bbdev/rte_bbdev.c +++ b/lib/bbdev/rte_bbdev.c @@ -25,6 +25,8 @@ /* Number of supported operation types */ #define BBDEV_OP_TYPE_COUNT 5 +/* Number of supported device status */ +#define BBDEV_DEV_STATUS_COUNT 9 /* BBDev library logging ID */ RTE_LOG_REGISTER_DEFAULT(bbdev_logtype, NOTICE); @@ -1132,3 +1134,25 @@ struct rte_mempool * rte_bbdev_log(ERR, "Invalid operation type"); return NULL; } + +const char * +rte_bbdev_device_status_str(enum rte_bbdev_device_status status) +{ + static const char * const dev_sta_string[] = { + "RTE_BBDEV_DEV_NOSTATUS", + "RTE_BBDEV_DEV_NOT_SUPPORTED", + "RTE_BBDEV_DE
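Assuming this series is applied, an application could surface the new field roughly as sketched below; rte_bbdev_info_get() is existing API, while the device_status field and the string helper are the additions from this patch.

    #include <stdio.h>
    #include <rte_bbdev.h>

    static void
    print_bbdev_status(uint16_t dev_id)
    {
            struct rte_bbdev_info info;

            if (rte_bbdev_info_get(dev_id, &info) != 0)
                    return;
            /* device_status is the field added by this patch */
            printf("bbdev %u status: %s\n", dev_id,
                            rte_bbdev_device_status_str(info.drv.device_status));
    }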
[PATCH v3 3/7] bbdev: add device info on queue topology
Adding fields in the API to expose the number of queues supported per operation type and the related priority levels. Signed-off-by: Nicolas Chautru --- lib/bbdev/rte_bbdev.h | 4 1 file changed, 4 insertions(+) diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h index 9b1ffa4..ac941d6 100644 --- a/lib/bbdev/rte_bbdev.h +++ b/lib/bbdev/rte_bbdev.h @@ -289,6 +289,10 @@ struct rte_bbdev_driver_info { /** Maximum number of queues supported by the device */ unsigned int max_num_queues; + /** Maximum number of queues supported per operation type */ + unsigned int num_queues[RTE_BBDEV_OP_TYPE_PADDED_MAX]; + /** Priority level supported per operation type */ + unsigned int queue_priority[RTE_BBDEV_OP_TYPE_PADDED_MAX]; /** Queue size limit (queue size must also be power of 2) */ uint32_t queue_size_lim; /** Set if device off-loads operation to hardware */ -- 1.8.3.1
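With these fields the discovery loop the old comment asked for becomes trivial. A sketch, assuming the whole series (including the padded maximum from patch 1/7) is applied:

    #include <stdio.h>
    #include <rte_bbdev.h>

    static void
    print_queue_topology(uint16_t dev_id)
    {
            struct rte_bbdev_info info;
            int op;

            if (rte_bbdev_info_get(dev_id, &info) != 0)
                    return;
            for (op = 0; op < RTE_BBDEV_OP_TYPE_PADDED_MAX; op++)
                    printf("op type %d: %u queues, %u priority levels\n",
                                    op, info.drv.num_queues[op],
                                    info.drv.queue_priority[op]);
    }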
[PATCH v3 4/7] drivers/baseband: update PMDs to expose queue per operation
Add support in existing bbdev PMDs for the explicit number of queue and priority for each operation type configured on the device. Signed-off-by: Nicolas Chautru --- drivers/baseband/acc100/rte_acc100_pmd.c | 29 +- drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c | 8 ++ drivers/baseband/fpga_lte_fec/fpga_lte_fec.c | 8 ++ drivers/baseband/la12xx/bbdev_la12xx.c | 7 ++ drivers/baseband/turbo_sw/bbdev_turbo_software.c | 11 5 files changed, 51 insertions(+), 12 deletions(-) diff --git a/drivers/baseband/acc100/rte_acc100_pmd.c b/drivers/baseband/acc100/rte_acc100_pmd.c index 17ba798..d568d0d 100644 --- a/drivers/baseband/acc100/rte_acc100_pmd.c +++ b/drivers/baseband/acc100/rte_acc100_pmd.c @@ -966,6 +966,7 @@ struct rte_bbdev_driver_info *dev_info) { struct acc100_device *d = dev->data->dev_private; + int i; static const struct rte_bbdev_op_cap bbdev_capabilities[] = { { @@ -1062,19 +1063,23 @@ fetch_acc100_config(dev); dev_info->device_status = RTE_BBDEV_DEV_NOT_SUPPORTED; - /* This isn't ideal because it reports the maximum number of queues but -* does not provide info on how many can be uplink/downlink or different -* priorities -*/ - dev_info->max_num_queues = - d->acc100_conf.q_dl_5g.num_aqs_per_groups * - d->acc100_conf.q_dl_5g.num_qgroups + - d->acc100_conf.q_ul_5g.num_aqs_per_groups * - d->acc100_conf.q_ul_5g.num_qgroups + - d->acc100_conf.q_dl_4g.num_aqs_per_groups * - d->acc100_conf.q_dl_4g.num_qgroups + - d->acc100_conf.q_ul_4g.num_aqs_per_groups * + /* Expose number of queues */ + dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0; + dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = d->acc100_conf.q_ul_4g.num_aqs_per_groups * d->acc100_conf.q_ul_4g.num_qgroups; + dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = d->acc100_conf.q_dl_4g.num_aqs_per_groups * + d->acc100_conf.q_dl_4g.num_qgroups; + dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = d->acc100_conf.q_ul_5g.num_aqs_per_groups * + d->acc100_conf.q_ul_5g.num_qgroups; + dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = d->acc100_conf.q_dl_5g.num_aqs_per_groups * + d->acc100_conf.q_dl_5g.num_qgroups; + dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = d->acc100_conf.q_ul_4g.num_qgroups; + dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = d->acc100_conf.q_dl_4g.num_qgroups; + dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = d->acc100_conf.q_ul_5g.num_qgroups; + dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = d->acc100_conf.q_dl_5g.num_qgroups; + dev_info->max_num_queues = 0; + for (i = RTE_BBDEV_OP_TURBO_DEC; i < RTE_BBDEV_OP_LDPC_ENC; i++) + dev_info->max_num_queues += dev_info->num_queues[i]; dev_info->queue_size_lim = ACC100_MAX_QUEUE_DEPTH; dev_info->hardware_accelerated = true; dev_info->max_dl_queue_priority = diff --git a/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c b/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c index 57b12af..b4982af 100644 --- a/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c +++ b/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c @@ -379,6 +379,14 @@ if (hw_q_id != FPGA_INVALID_HW_QUEUE_ID) dev_info->max_num_queues++; } + /* Expose number of queue per operation type */ + dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0; + dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = dev_info->max_num_queues / 2; + dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = dev_info->max_num_queues / 2; + dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = 1; + dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = 1; } /** diff --git 
a/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c b/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c index 2a330c4..dc7f479 100644 --- a/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c +++ b/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c @@ -655,6 +655,14 @@ struct __rte_cache_aligned fpga_queue { if (hw_q_id != FPGA_INVALID_HW_QUEUE_ID) dev_info->max_num_queues++; } + /* Expose number of queue per operation type */ + dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0; + dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = dev_info->max_num_queues / 2; + dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = dev_info->max_num_queues / 2; + dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_LD
[PATCH v3 6/7] bbdev: add queue related warning and status information
This allows exposing more information regarding any queue-related failure and warning which cannot be supported in the existing API. Signed-off-by: Nicolas Chautru --- app/test-bbdev/test_bbdev_perf.c | 2 ++ lib/bbdev/rte_bbdev.c| 2 ++ lib/bbdev/rte_bbdev.h| 19 +++ 3 files changed, 23 insertions(+) diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c index 1abda2d..653b21f 100644 --- a/app/test-bbdev/test_bbdev_perf.c +++ b/app/test-bbdev/test_bbdev_perf.c @@ -4360,6 +4360,8 @@ typedef int (test_case_function)(struct active_device *ad, stats->dequeued_count = q_stats->dequeued_count; stats->enqueue_err_count = q_stats->enqueue_err_count; stats->dequeue_err_count = q_stats->dequeue_err_count; + stats->enqueue_warn_count = q_stats->enqueue_warn_count; + stats->dequeue_warn_count = q_stats->dequeue_warn_count; stats->acc_offload_cycles = q_stats->acc_offload_cycles; return 0; diff --git a/lib/bbdev/rte_bbdev.c b/lib/bbdev/rte_bbdev.c index 28b105d..fb59b51 100644 --- a/lib/bbdev/rte_bbdev.c +++ b/lib/bbdev/rte_bbdev.c @@ -723,6 +723,8 @@ struct rte_bbdev * stats->dequeued_count += q_stats->dequeued_count; stats->enqueue_err_count += q_stats->enqueue_err_count; stats->dequeue_err_count += q_stats->dequeue_err_count; + stats->enqueue_warn_count += q_stats->enqueue_warn_count; + stats->dequeue_warn_count += q_stats->dequeue_warn_count; } rte_bbdev_log_debug("Got stats on %u", dev->data->dev_id); } diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h index ed528b8..c625a14 100644 --- a/lib/bbdev/rte_bbdev.h +++ b/lib/bbdev/rte_bbdev.h @@ -224,6 +224,19 @@ struct rte_bbdev_queue_conf { rte_bbdev_queue_stop(uint16_t dev_id, uint16_t queue_id); /** + * Flags indicate the reason why a previous enqueue may not have + * consumed all requested operations + * In case of multiple reasons, the latter supersedes a previous one + */ +enum rte_bbdev_enqueue_status { + RTE_BBDEV_ENQ_STATUS_NONE, /**< Nothing to report */ + RTE_BBDEV_ENQ_STATUS_QUEUE_FULL, /**< Not enough room in queue */ + RTE_BBDEV_ENQ_STATUS_RING_FULL,/**< Not enough room in ring */ + RTE_BBDEV_ENQ_STATUS_INVALID_OP, /**< Operation was rejected as invalid */ + RTE_BBDEV_ENQ_STATUS_PADDED_MAX = 6, /**< Maximum enq status number including padding */ +}; + +/** * Flags indicate the status of the device */ enum rte_bbdev_device_status { @@ -246,6 +259,12 @@ struct rte_bbdev_stats { uint64_t enqueue_err_count; /** Total error count on operations dequeued */ uint64_t dequeue_err_count; + /** Total warning count on operations enqueued */ + uint64_t enqueue_warn_count; + /** Total warning count on operations dequeued */ + uint64_t dequeue_warn_count; + /** Total enqueue status count based on rte_bbdev_enqueue_status enum */ + uint64_t enqueue_status_count[RTE_BBDEV_ENQ_STATUS_PADDED_MAX]; /** CPU cycles consumed by the (HW/SW) accelerator device to offload * the enqueue request to its internal queues. * - For a HW device this is the cycles consumed in MMIO write -- 1.8.3.1
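How a PMD might feed the new counters is sketched below. The helper and the struct subset are illustrative only; just the enum values mirror the patch above.

    #include <stdint.h>

    enum enq_status {               /* mirrors rte_bbdev_enqueue_status */
            ENQ_STATUS_NONE,
            ENQ_STATUS_QUEUE_FULL,
            ENQ_STATUS_RING_FULL,
            ENQ_STATUS_INVALID_OP,
            ENQ_STATUS_PADDED_MAX = 6,
    };

    struct q_stats {                /* subset of the per-queue stats */
            uint64_t enqueue_warn_count;
            uint64_t enqueue_status_count[ENQ_STATUS_PADDED_MAX];
    };

    /* record why an enqueue consumed fewer ops than requested */
    static inline void
    note_partial_enqueue(struct q_stats *st, enum enq_status why)
    {
            if (why != ENQ_STATUS_NONE) {
                    st->enqueue_warn_count++;
                    st->enqueue_status_count[why]++;
            }
    }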
[PATCH v3 5/7] bbdev: add new operation for FFT processing
Extension of bbdev operation to support FFT based operations. Signed-off-by: Nicolas Chautru Acked-by: Hemant Agrawal --- doc/guides/prog_guide/bbdev.rst | 130 +++ lib/bbdev/rte_bbdev.c | 11 ++- lib/bbdev/rte_bbdev.h | 76 lib/bbdev/rte_bbdev_op.h| 149 lib/bbdev/version.map | 4 ++ 5 files changed, 369 insertions(+), 1 deletion(-) diff --git a/doc/guides/prog_guide/bbdev.rst b/doc/guides/prog_guide/bbdev.rst index 70fa01a..4a055b5 100644 --- a/doc/guides/prog_guide/bbdev.rst +++ b/doc/guides/prog_guide/bbdev.rst @@ -1118,6 +1118,136 @@ Figure :numref:`figure_turbo_tb_decode` above showing the Turbo decoding of CBs using BBDEV interface in TB-mode is also valid for LDPC decode. +BBDEV FFT Operation + + +This operation allows running a combination of DFT and/or IDFT and/or time-domain windowing. +These can be used in a modular fashion (using bypass modes) or as a processing pipeline +which can be used for FFT-based baseband signal processing. +In more detail it allows: +- to process the data first through an IDFT of adjustable size and padding; +- to perform the windowing as a programmable cyclic shift offset of the data followed by a +pointwise multiplication by a time domain window; +- to process the related data through a DFT of adjustable size and depadding for each such cyclic +shift output. + +A flexible number of Rx antennas are being processed in parallel with the same configuration. +The API allows more generally for flexibility in what the PMD may support (capability flags) and +flexibility to adjust some of the parameters of the processing. + +The operation/capability flags that can be set for each FFT operation are given below. + + **NOTE:** The actual operation flags that may be used with a specific + BBDEV PMD are dependent on the driver capabilities as reported via + ``rte_bbdev_info_get()``, and may be a subset of those below. + +++ +|Description of FFT capability flags | +++ +|RTE_BBDEV_FFT_WINDOWING | +| Set to enable/support windowing in time domain | +++ +|RTE_BBDEV_FFT_CS_ADJUSTMENT | +| Set to enable/support the cyclic shift time offset adjustment | +++ +|RTE_BBDEV_FFT_DFT_BYPASS| +| Set to bypass the DFT and use directly the IDFT as an option | +++ +|RTE_BBDEV_FFT_IDFT_BYPASS | +| Set to bypass the IDFT and use directly the DFT as an option | +++ +|RTE_BBDEV_FFT_WINDOWING_BYPASS | +| Set to bypass the time domain windowing as an option | +++ +|RTE_BBDEV_FFT_POWER_MEAS| +| Set to provide an optional power measurement of the DFT output | +++ +|RTE_BBDEV_FFT_FP16_INPUT| +| Set if the input data shall use FP16 format instead of INT16 | +++ +|RTE_BBDEV_FFT_FP16_OUTPUT | +| Set if the output data shall use FP16 format instead of INT16 | +++ + +The structure passed for each FFT operation is given below, +with the operation flags forming a bitmask in the ``op_flags`` field. + +.. code-block:: c + +struct rte_bbdev_op_fft { +struct rte_bbdev_op_data base_input; +struct rte_bbdev_op_data base_output; +struct rte_bbdev_op_data power_meas_output; +uint32_t op_flags; +uint16_t input_sequence_size; +uint16_t input_leading_padding; +uint16_t output_sequence_size; +uint16_t output_leading_depadding; +uint8_t window_index[RTE_BBDEV_MAX_CS_2]; +uint16_t cs_bitmap; +uint8_t num_antennas_log2; +uint8_t idft_log2; +uint8_t dft_log2; +int8_t cs_time_adjustment; +int8_t idft_shift; +int8_t dft_shift; +uint16_t ncs_reciprocal; +uint16_t power_shift; +uint16_t fp16_exp_adjust; +}; + +The FFT parameters are set out in the table below. + ++--+
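To make the modular (bypass-based) usage concrete, here is a hedged sketch configuring a DFT-only pipeline. The flag and field names are taken from the patch above; the numeric sizes are illustrative only.

    #include <string.h>
    #include <rte_bbdev_op.h>   /* once this series is applied */

    static void
    setup_dft_only(struct rte_bbdev_op_fft *fft)
    {
            memset(fft, 0, sizeof(*fft));
            /* run only the DFT stage: bypass IDFT and windowing */
            fft->op_flags = RTE_BBDEV_FFT_IDFT_BYPASS |
                            RTE_BBDEV_FFT_WINDOWING_BYPASS;
            fft->dft_log2 = 10;                /* 1024-point DFT */
            fft->input_sequence_size = 1024;   /* illustrative sizes */
            fft->output_sequence_size = 1024;
            fft->num_antennas_log2 = 2;        /* 4 antennas in parallel */
    }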
[PATCH v3 7/7] bbdev: add a lock option for enqueue/dequeue operation
Locking is not explicitly required but can be valuable in case the application cannot guarantee to be thread-safe, or specifically is at risk of using the same queue from multiple threads. This is an option for PMDs to use. Signed-off-by: Nicolas Chautru --- lib/bbdev/rte_bbdev.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h index c625a14..e0aa52e 100644 --- a/lib/bbdev/rte_bbdev.h +++ b/lib/bbdev/rte_bbdev.h @@ -458,6 +458,8 @@ struct rte_bbdev_data { int socket_id; /**< NUMA socket that device is on */ bool started; /**< Device run-time state */ uint16_t process_cnt; /** Counter of processes using the device */ + rte_rwlock_t lock_enq; /**< lock protection for the Enqueue */ + rte_rwlock_t lock_deq; /**< lock protection for the Dequeue */ }; /* Forward declarations */ -- 1.8.3.1
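A sketch of how a PMD could opt in to the new locks is below; the inner enqueue function is hypothetical, while rte_rwlock_write_lock()/rte_rwlock_write_unlock() are the existing EAL primitives.

    #include <stdint.h>
    #include <rte_rwlock.h>

    /* hypothetical inner function doing the actual ring writes */
    uint16_t enqueue_raw(void *q_data, void **ops, uint16_t num);

    static uint16_t
    enqueue_locked(rte_rwlock_t *lock_enq, void *q_data,
                    void **ops, uint16_t num)
    {
            uint16_t done;

            /* serialize enqueuers that may share this queue */
            rte_rwlock_write_lock(lock_enq);
            done = enqueue_raw(q_data, ops, num);
            rte_rwlock_write_unlock(lock_enq);
            return done;
    }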
[PATCH v3] doc: add release notes for async vhost dequeue data-path
Add release notes for asynchronous vhost dequeue data-path. Emphasize that split virtqueue and packed virtqueue are both supported in asynchronous vhost dequeue data-path. Signed-off-by: Cheng Jiang --- v3: code rebased. v2: fixed a full stop missing in the commit message. doc/guides/rel_notes/release_22_07.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst index 6365800313..e43ab15260 100644 --- a/doc/guides/rel_notes/release_22_07.rst +++ b/doc/guides/rel_notes/release_22_07.rst @@ -107,7 +107,8 @@ New Features * **Added vhost async dequeue API to receive packets from guest.** Added vhost async dequeue API which can leverage DMA devices to - accelerate receiving packets from guest. + accelerate receiving packets from guest. Split virtqueue and packed + virtqueue are both supported. * **Added thread-safe version of in-flight packet clear API in vhost library.** -- 2.35.1
[PATCH v2] doc: add event timer expiry drop stat
The structure ``rte_event_timer_adapter_stats`` will be extended by adding a new field, ``evtim_drop_count``. This stat will represent the number of times an event_timer expiry event is dropped by the event timer adapter. Signed-off-by: Naga Harish K S V --- v2: * update commit message --- doc/guides/rel_notes/deprecation.rst | 6 ++ 1 file changed, 6 insertions(+) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 4e5b23c53d..597a457a37 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -125,3 +125,9 @@ Deprecation Notices applications should be updated to use the ``dmadev`` library instead, with the underlying HW-functionality being provided by the ``ioat`` or ``idxd`` dma drivers + +* eventdev/timer: The structure ``rte_event_timer_adapter_stats`` will be + extended by adding a new field, ``evtim_drop_count``. This stat will + represent the number of times an event_timer expiry event is dropped + by the timer adapter. This field will be used by a future patch adding + support for periodic mode to the software timer adapter in DPDK 22.11. -- 2.25.1
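For reference, reading the adapter stats today looks like the sketch below; once the proposed field lands in 22.11, it would be read the same way (shown only in a comment, since it does not exist yet).

    #include <inttypes.h>
    #include <stdio.h>
    #include <rte_event_timer_adapter.h>

    static void
    print_timer_adapter_stats(struct rte_event_timer_adapter *adapter)
    {
            struct rte_event_timer_adapter_stats st;

            if (rte_event_timer_adapter_stats_get(adapter, &st) != 0)
                    return;
            printf("expiry events: %" PRIu64 "\n", st.evtim_exp_count);
            /* after the proposed 22.11 change:
             * printf("dropped expiries: %" PRIu64 "\n", st.evtim_drop_count);
             */
    }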
[PATCH v1] net/ice: fix memory allocation issue of packet flag
The current code doesn't allocate memory for the lookup element used to add the packet flag. This patch adds one lookup item in the list to fix this memory issue. Fixes: 8b95092b7f69 ("net/ice/base: fix direction of flow that matches any") Cc: sta...@dpdk.org Signed-off-by: Yuying Zhang --- drivers/net/ice/ice_switch_filter.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/ice/ice_switch_filter.c b/drivers/net/ice/ice_switch_filter.c index 36c9bffb73..e84283fec1 100644 --- a/drivers/net/ice/ice_switch_filter.c +++ b/drivers/net/ice/ice_switch_filter.c @@ -1863,7 +1863,10 @@ ice_switch_parse_pattern_action(struct ice_adapter *ad, else if (vlan_num == 2) tun_type = ICE_NON_TUN_QINQ; - list = rte_zmalloc(NULL, item_num * sizeof(*list), 0); + /* reserve one more memory slot for direction flag which may +* consume 1 lookup item. +*/ + list = rte_zmalloc(NULL, (item_num + 1) * sizeof(*list), 0); if (!list) { rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_HANDLE, NULL, -- 2.25.1
[PATCH v2] vdpa/sfc: handle sync issue between qemu and vhost-user
From: Abhimanyu Saini When the DPDK app is running in the VF, it sometimes rings the doorbell before dev_config has had a chance to complete and hence it misses the event. As a workaround, ring the doorbell when vDPA reports the notify_area to QEMU. Fixes: 630be406dcbf ("vdpa/sfc: get queue notify area info") Cc: sta...@dpdk.org Signed-off-by: Vijay Kumar Srivastava Signed-off-by: Abhimanyu Saini --- v1: * Update the commit id that this patch fixes drivers/vdpa/sfc/sfc_vdpa_ops.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c index b3d9b6c..63aa52d 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_ops.c +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c @@ -794,6 +794,8 @@ int vfio_dev_fd; efx_rc_t rc; unsigned int bar_offset; + volatile void *doorbell; + struct rte_pci_device *pci_dev; struct rte_vdpa_device *vdpa_dev; struct sfc_vdpa_ops_data *ops_data; struct vfio_region_info reg = { .argsz = sizeof(reg) }; @@ -856,6 +858,18 @@ sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64, *offset); + pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev; + doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr + *offset; + + /* +* virtio-net driver in VM sends queue notifications before +* vDPA has a chance to setup the queues and notification area, +* and hence the HW misses these doorbell notifications. +* Since it is safe to send a duplicate doorbell, send another +* doorbell from the vDPA driver as a workaround for this timing issue. +*/ + rte_write16(qid, doorbell); + return 0; } -- 1.8.3.1
[PATCH v3] net/igc: move the initialization of data path into dev_init
From: Zhichao Zeng The upper-layer application usually calls the common interface "dev_init" to initialize the data path, but in the igc driver, the initialization of the data path is in "igc_rx_init" and "eth_igc_tx_queue_setup", while in other drivers it is in "dev_init". So when the upper-layer application calls these function pointers, segmentation faults occur. This patch moves the initialization of the data path into "eth_igc_dev_init" to avoid segmentation faults, which is consistent with other drivers. Fixes: a5aeb2b9e225 ("net/igc: support Rx and Tx") Cc: alvinx.zh...@intel.com Cc: sta...@dpdk.org Signed-off-by: Zhichao Zeng --- v2: remove unnecessary parameters, move declaration to relevant header file --- v3: remove redundant code, optimize commit log --- drivers/net/igc/igc_ethdev.c | 3 +++ drivers/net/igc/igc_txrx.c | 10 +++--- drivers/net/igc/igc_txrx.h | 4 3 files changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/net/igc/igc_ethdev.c b/drivers/net/igc/igc_ethdev.c index b9933b395d..25fb91bfec 100644 --- a/drivers/net/igc/igc_ethdev.c +++ b/drivers/net/igc/igc_ethdev.c @@ -1234,6 +1234,9 @@ eth_igc_dev_init(struct rte_eth_dev *dev) dev->rx_queue_count = eth_igc_rx_queue_count; dev->rx_descriptor_status = eth_igc_rx_descriptor_status; dev->tx_descriptor_status = eth_igc_tx_descriptor_status; + dev->rx_pkt_burst = igc_recv_pkts; + dev->tx_pkt_burst = igc_xmit_pkts; + dev->tx_pkt_prepare = eth_igc_prep_pkts; /* * for secondary processes, we don't initialize any further as primary diff --git a/drivers/net/igc/igc_txrx.c b/drivers/net/igc/igc_txrx.c index e48d5df11a..f38c5e7863 100644 --- a/drivers/net/igc/igc_txrx.c +++ b/drivers/net/igc/igc_txrx.c @@ -345,7 +345,7 @@ rx_desc_get_pkt_info(struct igc_rx_queue *rxq, struct rte_mbuf *rxm, rxm->packet_type = rx_desc_pkt_info_to_pkt_type(pkt_info); } -static uint16_t +uint16_t igc_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) { struct igc_rx_queue * const rxq = rx_queue; @@ -1071,8 +1071,6 @@ igc_rx_init(struct rte_eth_dev *dev) uint16_t i; int ret; - dev->rx_pkt_burst = igc_recv_pkts; - /* * Make sure receives are disabled while setting * up the descriptor ring. 
@@ -1397,7 +1395,7 @@ eth_igc_rx_queue_setup(struct rte_eth_dev *dev, } /* prepare packets for transmit */ -static uint16_t +uint16_t eth_igc_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) { @@ -1604,7 +1602,7 @@ tx_desc_cksum_flags_to_olinfo(uint64_t ol_flags) return tmp; } -static uint16_t +uint16_t igc_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) { struct igc_tx_queue * const txq = tx_queue; @@ -2030,8 +2028,6 @@ int eth_igc_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx, txq->sw_ring, txq->tx_ring, txq->tx_ring_phys_addr); igc_reset_tx_queue(txq); - dev->tx_pkt_burst = igc_xmit_pkts; - dev->tx_pkt_prepare = &eth_igc_prep_pkts; dev->data->tx_queues[queue_idx] = txq; txq->offloads = tx_conf->offloads; diff --git a/drivers/net/igc/igc_txrx.h b/drivers/net/igc/igc_txrx.h index 535108a868..a5240df9d7 100644 --- a/drivers/net/igc/igc_txrx.h +++ b/drivers/net/igc/igc_txrx.h @@ -49,6 +49,10 @@ void eth_igc_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id, struct rte_eth_txq_info *qinfo); void eth_igc_vlan_strip_queue_set(struct rte_eth_dev *dev, uint16_t rx_queue_id, int on); +uint16_t igc_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts); +uint16_t igc_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); +uint16_t eth_igc_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts, + uint16_t nb_pkts); #ifdef __cplusplus } #endif -- 2.25.1
Re: [PATCH v4] net: fix checksum with unaligned buffer
On 2022-06-27 22:21, Morten Brørup wrote: From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] Sent: Monday, 27 June 2022 19.23 On 2022-06-27 15:22, Morten Brørup wrote: From: Emil Berg [mailto:emil.b...@ericsson.com] Sent: Monday, 27 June 2022 14.51 From: Emil Berg Sent: den 27 juni 2022 14:46 From: Mattias Rönnblom Sent: den 27 juni 2022 14:28 On 2022-06-23 14:51, Morten Brørup wrote: From: Morten Brørup [mailto:m...@smartsharesystems.com] Sent: Thursday, 23 June 2022 14.39 With this patch, the checksum can be calculated on an unaligned buffer. I.e. the buf parameter is no longer required to be 16 bit aligned. The checksum is still calculated using a 16 bit aligned pointer, so the compiler can auto-vectorize the function's inner loop. When the buffer is unaligned, the first byte of the buffer is handled separately. Furthermore, the calculated checksum of the buffer is byte shifted before being added to the initial checksum, to compensate for the checksum having been calculated on the buffer shifted by one byte. v4: * Add copyright notice. * Include stdbool.h (Emil Berg). * Use RTE_PTR_ADD (Emil Berg). * Fix one more typo in commit message. Is 'unligned' even a word? v3: * Remove braces from single statement block. * Fix typo in commit message. v2: * Do not assume that the buffer is part of an aligned packet buffer. Bugzilla ID: 1035 Cc: sta...@dpdk.org Signed-off-by: Morten Brørup [...] The compiler will be able to auto vectorize even unaligned accesses, just with different instructions. From what I can tell, there's no performance impact, at least not on the x86_64 systems I tried on. I think you should remove the first special case conditional and use memcpy() instead of the cumbersome __may_alias__ construct to retrieve the data. Here: https://www.agner.org/optimize/instruction_tables.pdf it lists the latency of vmovdqa (aligned) as 6 cycles and the latency for vmovdqu (unaligned) as 7 cycles. So I guess there can be some difference. Although in practice I'm not sure what difference it makes. I've not seen any difference in runtime between the two versions. Correction to my comment: Those stats are for some older CPU. For some newer CPUs such as Tiger Lake the stats seem to be the same regardless of aligned or unaligned. I agree that the memcpy method is more elegant and easy to read. However, we would need to performance test the modified checksum function with a large number of CPUs to prove that we don't introduce a performance regression on any CPU architecture still supported by DPDK. And Emil already found a CPU where it costs 1 extra cycle per 16 bytes, which adds up to a total of ca. 91 extra cycles on a 1460 byte TCP packet. I think you've misunderstood what latency means in such tables. It's a data dependency thing, not a measure of throughput. The throughput is *much* higher. My guess would be two such instructions per clock. For your 1460 bytes example, my Zen3 AMD performs identically with the current DPDK implementation, your patch, and a memcpy()-ified version of the current implementation. They all need ~130 clock cycles/packet, with warm caches. IPC is 3 instructions per cycle, but obviously not all instructions are SIMD. You're right, I wasn't thinking deeper about it before extrapolating. Great to see some real numbers! I wish someone would do the same testing on an old ARM CPU, so we could also see the other end of the scale. I've run it on an ARM A72. For the aligned 1460 bytes case I got: Current DPDK ~572 cc. Your patch: ~578 cc. Memcpy-fied: ~573 cc.
They performed about the same for all unaligned/aligned cases and sizes I tested. This platform (or could be the GCC version as well) doesn't suffer from the unaligned performance degradation your patch showed on my AMD machine. The main issue with checksumming on the CPU is, in my experience, not that you don't have enough compute, but that you trash the caches. Agree. I have noticed that x86 has "non-temporal" instruction variants to load/store data without trashing the cache entirely. A variant of the checksum function using such instructions might be handy. Yes, although you may need to prefetch the payload for good performance. Variants of the memcpy function using such instructions might also be handy for some purposes, e.g. copying the contents of packets, where the original and/or copy will not be accessed shortly thereafter. Indeed and I think it's been discussed on the list. There's some work to get it right, since the alignment requirement and the fact that a different memory model is used for those SIMD instructions cause trouble for a generic implementation. (For x86_64.) So I opted for a solution with zero changes to the inner loop, so no performance retesting is required (for the previously supported use cases, where the buffer is aligned). You will see performance degradation with this solution as well, under certain con
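For readers following along, the "memcpy-ified" inner loop mentioned in this thread boils down to something like the sketch below: alignment-safe loads that compilers can typically still auto-vectorize. This is an illustration of the technique, not the proposed rte_raw_cksum() replacement.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    static uint32_t
    sum16(const void *buf, size_t len, uint32_t sum)
    {
            const uint8_t *p = buf;
            uint16_t v;

            while (len >= sizeof(v)) {
                    memcpy(&v, p, sizeof(v)); /* alignment-safe load */
                    sum += v;
                    p += sizeof(v);
                    len -= sizeof(v);
            }
            if (len == 1) /* trailing byte; byte order matters on big endian */
                    sum += *p;
            return sum;
    }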
RE: [PATCH v1] net/ice: fix memory allocation issue of packet flag
> -Original Message- > From: Zhang, Yuying > Sent: Tuesday, June 28, 2022 12:07 PM > To: dev@dpdk.org; Zhang, Qi Z > Cc: Zhang, Yuying ; sta...@dpdk.org > Subject: [PATCH v1] net/ice: fix memory allocation issue of packet flag > > Current code doesn't allocate memory of lookup element to add packet flag. > This patch adds one lookup item in the list to fix this memory issue. > > Fixes: 8b95092b7f69 ("net/ice/base: fix direction of flow that matches any") > Cc: sta...@dpdk.org > > Signed-off-by: Yuying Zhang Acked-by: Qi Zhang Applied to dpdk-next-net-intel. Thanks Qi
RE: [PATCH] net/iavf: fix issue of VF resetting
> -Original Message- > From: Zhou, YidingX > Sent: Monday, June 27, 2022 3:23 PM > To: dev@dpdk.org > Cc: sta...@dpdk.org; Zhou, YidingX ; Wu, Jingjing > ; Xing, Beilei ; Zhang, Qi Z > > Subject: [PATCH] net/iavf: fix issue of VF resetting > > When the VF is in closed state, the vf_reset flag can not be reverted if the > VF is > reset asynchronously. This prevents all virtchnl commands from executing, > causing subsequent calls to iavf_dev_reset() to fail. > > So the vf_reset flag needs to be reverted even when VF is in closed state. > > Fixes: 676d986b4b86 ("net/iavf: fix crash after VF reset failure") > Cc: sta...@dpdk.org > > Signed-off-by: Yiding Zhou Acked-by: Qi Zhang Applied to dpdk-next-net-intel. Thanks Qi
Re: Service core statistics MT safety
On 2022-06-28 02:14, Honnappa Nagarahalli wrote: > >> >>> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com] >>> Sent: Monday, 27 June 2022 13.06 >>> >>> Hi. >>> >>> Is it safe to enable stats on MT safe services? >>> >>> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366 >>> >>> It seems to me this would have to be an __atomic_add for this code >>> to produce deterministic results. >> >> I agree. The same goes for the 'calls' field. > The calling function does the locking. > https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L398 > > For more information you can look at: > https://github.com/DPDK/dpdk/blob/main/lib/eal/include/rte_service.h#L120 > What about the https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L404 call (for MT safe services)? There's no lock held there. >>> Good point. >>> This is the case where the service running in service cores is MT safe. >>> However, >> the stats are incremented outside of the MT Safety mechanism employed by the >> service. So, yes, this and other updates in the function >> 'service_runner_do_callback' need to be updated atomically. >> >> Maybe a better solution would be to move this to the core_state struct (and >> eliminate the "calls" field since the same information is already in the >> core_state >> struct). The contention on these cache lines will be pretty crazy for >> services with >> short run time (say a thousand cycles or less per call), assuming they are >> mapped to many cores. > That's one option, the structures are internal as well. With this option > stats need to be aggregated which will not give an accurate view. But, that > is the nature of the statistics. > Per-core counters are a very common pattern. Used for Linux MIB counters, for example. I'm not sure I think it's much less accurate. I mean, you just load in quick succession what's globally visible for the different per-lcore counters. If you do a relaxed store on an ARM, how long does it take until it's seen by someone doing a relaxed load on a different core? Roughly. > I am also wondering if these stats are of any use other than for debugging. > Adding a capability to disable stats might help as well. > They could be used as a crude tool to determine service core utilization. Comparing utilization between different services running on the same core should be straight-forward, but lcore utilization is harder in absolute terms. If you just look at "cycles", a completely idle core would look like it's very busy (basically rdtsc latency added for every loop). I assume you'd have to do some heuristic based on both "calls" and "cycles" to get an estimate.
I think service core utilization would be very useful, although that would require some changes in the service function signature, so the service can report back if it did some useful work for a particular call. That would make for a DPDK 'top'. Just like 'top', it can't impose any serious performance degradation when used, to be really useful, I think. Sure, it should be possible to turn it on and off. I thought that was the case already? >> >> Idle service cores will basically do nothing else than stall waiting for >> these lines, I >> suspect, hampering the progress of more busy cores.
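A bare-bones sketch of the per-core layout discussed in this thread (names illustrative, not the actual rte_service internals): each service lcore updates only its own cache-line-aligned slot, and a reader aggregates the slots with relaxed loads, accepting the slight staleness mentioned above.

    #include <stdint.h>

    #define MAX_LCORES 128

    struct lcore_service_stats {
            uint64_t calls;
            uint64_t cycles;
    } __attribute__((aligned(64)));   /* one cache line per lcore */

    static struct lcore_service_stats stats[MAX_LCORES];

    static inline void
    stats_update(unsigned int lcore_id, uint64_t cycles)
    {
            /* single writer per slot: relaxed atomics are sufficient,
             * and the adds stay uncontended and cheap
             */
            __atomic_fetch_add(&stats[lcore_id].calls, 1, __ATOMIC_RELAXED);
            __atomic_fetch_add(&stats[lcore_id].cycles, cycles, __ATOMIC_RELAXED);
    }

    static uint64_t
    total_calls(void)
    {
            uint64_t total = 0;
            unsigned int i;

            for (i = 0; i < MAX_LCORES; i++)
                    total += __atomic_load_n(&stats[i].calls, __ATOMIC_RELAXED);
            return total;
    }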