Re: [PATCH] net/dpaa2: change threshold value
On 5/8/2023 4:11 PM, Tianli Lai wrote:
> This threshold value can be changed with the function argument
> nb_rx_desc.
>
> Signed-off-by: Tianli Lai
> ---
>  drivers/net/dpaa2/dpaa2_ethdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
> index 679f33ae1a..ad892ded4a 100644
> --- a/drivers/net/dpaa2/dpaa2_ethdev.c
> +++ b/drivers/net/dpaa2/dpaa2_ethdev.c
> @@ -829,7 +829,7 @@ dpaa2_dev_rx_queue_setup(struct rte_eth_dev *dev,
> 			dpaa2_q->cgid, &taildrop);
> 	} else {
> 		/*enabling per rx queue congestion control */
> -		taildrop.threshold = CONG_THRESHOLD_RX_BYTES_Q;
> +		taildrop.threshold = nb_rx_desc * 1024;
> 		taildrop.units = DPNI_CONGESTION_UNIT_BYTES;
> 		taildrop.oal = CONG_RX_OAL;
> 		DPAA2_PMD_DEBUG("Enabling Byte based Drop on queue= %d",
> --
> 2.27.0

Hi Tianli,

The byte-based tail drop threshold value "CONG_THRESHOLD_RX_BYTES_Q" is a value optimized for the dpaa2 platform; we arrived at it after multiple benchmark experiments in the past. The frame-based threshold, on the other hand, is already derived from "nb_rx_desc". We will review this suggestion further and get back.

--
Thanks,
Sachin Saxena (NXP)
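For reference, the change under discussion replaces a fixed, platform-tuned byte threshold with one scaled by the Rx ring size. A minimal sketch of the proposed computation (the function name is illustrative, not part of the driver):

```c
#include <stdint.h>

/* Proposed byte-based tail-drop threshold: scale with the ring size.
 * The 1024 multiplier assumes a budget of roughly 1 KiB per
 * descriptor, per the patch under discussion. */
static uint64_t
taildrop_threshold_bytes(uint16_t nb_rx_desc)
{
	return (uint64_t)nb_rx_desc * 1024;
}
```

Whether a per-queue value derived this way beats the benchmark-tuned constant is exactly the open question in the reply above.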
RE: [v1 1/3] net/mlx5/hws: support dest root table action
From: Hamdan Igbaria
> Add support for dest root table action creation.
> This support will allow redirecting packets to the kernel from
> non-root tables.
>
> Signed-off-by: Hamdan Igbaria
> Reviewed-by: Alex Vesker

Series-acked-by: Matan Azrad
[PATCH v2 0/4] Support VFIO sparse mmap in PCI bus
This series introduces a VFIO standard capability, called sparse mmap, to the PCI bus. In the Linux kernel it is defined as VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means that instead of mmap-ing the whole BAR region into the DPDK process, only parts of the BAR region are mmap-ed, based on the sparse mmap information obtained from the kernel. For the rest of the BAR region that is not mmap-ed, the DPDK process can use the pread/pwrite system calls for access.

Sparse mmap is useful when the kernel does not want userspace to mmap the whole BAR region, or when the kernel wants control over access to specific parts of the BAR. Vendors can choose whether to enable this feature for their devices in their specific kernel modules.

In this patchset:

Patches 1-3 introduce BAR access APIs so that a driver can access a specific BAR using the pread/pwrite system calls when part of the BAR is not mmap-able. Patch 4 adds the VFIO sparse mmap support itself.

v2:
1. add PCI device internal structure in bus/pci/windows/pci.c
2. fix parameter type error

Chenbo Xia (3):
  bus/pci: introduce an internal representation of PCI device
  bus/pci: avoid depending on private value in kernel source
  bus/pci: introduce helper for MMIO read and write

Miao Li (1):
  bus/pci: add VFIO sparse mmap support

 drivers/bus/pci/bsd/pci.c        |  35 +++-
 drivers/bus/pci/linux/pci.c      |  78 +--
 drivers/bus/pci/linux/pci_init.h |  14 +-
 drivers/bus/pci/linux/pci_uio.c  |  22 ++
 drivers/bus/pci/linux/pci_vfio.c | 335 +--
 drivers/bus/pci/pci_common.c     |  12 +-
 drivers/bus/pci/private.h        |  25 ++-
 drivers/bus/pci/rte_bus_pci.h    |  48 +
 drivers/bus/pci/version.map      |   3 +
 drivers/bus/pci/windows/pci.c    |  14 +-
 lib/eal/include/rte_vfio.h       |   1 -
 11 files changed, 491 insertions(+), 96 deletions(-)

--
2.25.1
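The access-path decision the cover letter describes can be illustrated in isolation: a driver checks whether a BAR access falls inside one of the kernel-reported mmap-able windows, and otherwise falls back to pread/pwrite on the region fd. The struct and function names below are hypothetical simplifications, not the series' actual API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct vfio_region_sparse_mmap_area. */
struct sparse_area {
	uint64_t offset; /* offset of the mmap-able window within the BAR */
	uint64_t size;   /* length of the window */
};

/* Example window layout: two 4 KiB windows with a hole between them. */
static const struct sparse_area demo_areas[2] = {
	{ 0x0000, 0x1000 },
	{ 0x4000, 0x1000 },
};

/* Return true if [offset, offset+len) lies entirely inside one
 * mmap-ed window; otherwise the caller must fall back to
 * pread()/pwrite() on the region fd. */
static bool
access_is_mmapable(const struct sparse_area *areas, size_t nr_areas,
		   uint64_t offset, uint64_t len)
{
	for (size_t i = 0; i < nr_areas; i++) {
		if (offset >= areas[i].offset &&
		    offset + len <= areas[i].offset + areas[i].size)
			return true;
	}
	return false;
}
```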
[PATCH v2 1/4] bus/pci: introduce an internal representation of PCI device
From: Chenbo Xia This patch introduces an internal representation of the PCI device which will be used to store the internal information that don't have to be exposed to drivers, e.g., the VFIO region sizes/offsets. In this patch, the internal structure is simply a wrapper of the rte_pci_device structure. More fields will be added. Signed-off-by: Chenbo Xia --- drivers/bus/pci/bsd/pci.c | 13 - drivers/bus/pci/linux/pci.c | 28 drivers/bus/pci/pci_common.c | 12 ++-- drivers/bus/pci/private.h | 14 +- drivers/bus/pci/windows/pci.c | 14 +- 5 files changed, 52 insertions(+), 29 deletions(-) diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c index 7459d15c7e..a747eca58c 100644 --- a/drivers/bus/pci/bsd/pci.c +++ b/drivers/bus/pci/bsd/pci.c @@ -208,16 +208,19 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx, static int pci_scan_one(int dev_pci_fd, struct pci_conf *conf) { + struct rte_pci_device_internal *pdev; struct rte_pci_device *dev; struct pci_bar_io bar; unsigned i, max; - dev = malloc(sizeof(*dev)); - if (dev == NULL) { + pdev = malloc(sizeof(*pdev)); + if (pdev == NULL) { + RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n"); return -1; } - memset(dev, 0, sizeof(*dev)); + memset(pdev, 0, sizeof(*pdev)); + dev = &pdev->device; dev->device.bus = &rte_pci_bus.bus; dev->addr.domain = conf->pc_sel.pc_domain; @@ -303,7 +306,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf) memmove(dev2->mem_resource, dev->mem_resource, sizeof(dev->mem_resource)); - pci_free(dev); + pci_free(pdev); } return 0; } @@ -313,7 +316,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf) return 0; skipdev: - pci_free(dev); + pci_free(pdev); return 0; } diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index ebd1395502..4c2c5ba382 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -211,22 +211,26 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr) { char 
filename[PATH_MAX]; unsigned long tmp; + struct rte_pci_device_internal *pdev; struct rte_pci_device *dev; char driver[PATH_MAX]; int ret; - dev = malloc(sizeof(*dev)); - if (dev == NULL) + pdev = malloc(sizeof(*pdev)); + if (pdev == NULL) { + RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n"); return -1; + } - memset(dev, 0, sizeof(*dev)); + memset(pdev, 0, sizeof(*pdev)); + dev = &pdev->device; dev->device.bus = &rte_pci_bus.bus; dev->addr = *addr; /* get vendor id */ snprintf(filename, sizeof(filename), "%s/vendor", dirname); if (eal_parse_sysfs_value(filename, &tmp) < 0) { - pci_free(dev); + pci_free(pdev); return -1; } dev->id.vendor_id = (uint16_t)tmp; @@ -234,7 +238,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr) /* get device id */ snprintf(filename, sizeof(filename), "%s/device", dirname); if (eal_parse_sysfs_value(filename, &tmp) < 0) { - pci_free(dev); + pci_free(pdev); return -1; } dev->id.device_id = (uint16_t)tmp; @@ -243,7 +247,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr) snprintf(filename, sizeof(filename), "%s/subsystem_vendor", dirname); if (eal_parse_sysfs_value(filename, &tmp) < 0) { - pci_free(dev); + pci_free(pdev); return -1; } dev->id.subsystem_vendor_id = (uint16_t)tmp; @@ -252,7 +256,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr) snprintf(filename, sizeof(filename), "%s/subsystem_device", dirname); if (eal_parse_sysfs_value(filename, &tmp) < 0) { - pci_free(dev); + pci_free(pdev); return -1; } dev->id.subsystem_device_id = (uint16_t)tmp; @@ -261,7 +265,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr) snprintf(filename, sizeof(filename), "%s/class", dirname); if (eal_parse_sysfs_value(filename, &tmp) < 0) { - pci_free(dev); + pci_free(pdev); return -1; } /* the least 24 bits are valid: class, subclass, program interface */ @@ -297,7 +
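The wrapper pattern this patch introduces — embedding the public device structure as the first member of an internal one, so bus code can recover the internal view from the pointer handed to drivers — can be sketched as follows (the struct names are simplified stand-ins, not the patch's exact definitions):

```c
#include <stddef.h>

/* Stand-in for the public struct rte_pci_device. */
struct pci_device {
	int vendor_id;
};

/* Stand-in for struct rte_pci_device_internal: the public struct is
 * embedded first, followed by fields drivers never see (e.g. the VFIO
 * region sizes/offsets added later in this series). */
struct pci_device_internal {
	struct pci_device device;       /* must stay the first member */
	unsigned long region_offset[6]; /* internal-only data */
};

static struct pci_device_internal g_pdev;

/* Mirror of the conversion the bus code performs: recover the
 * internal structure from a public device pointer. */
static struct pci_device_internal *
to_internal(struct pci_device *dev)
{
	return (struct pci_device_internal *)
		((char *)dev - offsetof(struct pci_device_internal, device));
}
```

Because the public struct is the first member, allocation and free of the internal struct (as in the pci_free(pdev) changes above) covers both views at once.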
[PATCH v2 2/4] bus/pci: avoid depending on private value in kernel source
From: Chenbo Xia The value 40 used in VFIO_GET_REGION_ADDR() is a private value (VFIO_PCI_OFFSET_SHIFT) defined in Linux kernel source [1]. It is not part of VFIO API, and we should not depend on it. [1] https://github.com/torvalds/linux/blob/v6.2/include/linux/vfio_pci_core.h Signed-off-by: Chenbo Xia --- drivers/bus/pci/linux/pci.c | 4 +- drivers/bus/pci/linux/pci_init.h | 4 +- drivers/bus/pci/linux/pci_vfio.c | 195 +++ drivers/bus/pci/private.h| 9 ++ lib/eal/include/rte_vfio.h | 1 - 5 files changed, 158 insertions(+), 55 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 4c2c5ba382..04e21ae20f 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -645,7 +645,7 @@ int rte_pci_read_config(const struct rte_pci_device *device, return pci_uio_read_config(intr_handle, buf, len, offset); #ifdef VFIO_PRESENT case RTE_PCI_KDRV_VFIO: - return pci_vfio_read_config(intr_handle, buf, len, offset); + return pci_vfio_read_config(device, buf, len, offset); #endif default: rte_pci_device_name(&device->addr, devname, @@ -669,7 +669,7 @@ int rte_pci_write_config(const struct rte_pci_device *device, return pci_uio_write_config(intr_handle, buf, len, offset); #ifdef VFIO_PRESENT case RTE_PCI_KDRV_VFIO: - return pci_vfio_write_config(intr_handle, buf, len, offset); + return pci_vfio_write_config(device, buf, len, offset); #endif default: rte_pci_device_name(&device->addr, devname, diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h index dcea726186..9f6659ba6e 100644 --- a/drivers/bus/pci/linux/pci_init.h +++ b/drivers/bus/pci/linux/pci_init.h @@ -66,9 +66,9 @@ int pci_uio_ioport_unmap(struct rte_pci_ioport *p); #endif /* access config space */ -int pci_vfio_read_config(const struct rte_intr_handle *intr_handle, +int pci_vfio_read_config(const struct rte_pci_device *dev, void *buf, size_t len, off_t offs); -int pci_vfio_write_config(const struct rte_intr_handle *intr_handle, +int 
pci_vfio_write_config(const struct rte_pci_device *dev, const void *buf, size_t len, off_t offs); int pci_vfio_ioport_map(struct rte_pci_device *dev, int bar, diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c index fab3483d9f..1748ad2ae0 100644 --- a/drivers/bus/pci/linux/pci_vfio.c +++ b/drivers/bus/pci/linux/pci_vfio.c @@ -43,45 +43,82 @@ static struct rte_tailq_elem rte_vfio_tailq = { }; EAL_REGISTER_TAILQ(rte_vfio_tailq) +static int +pci_vfio_get_region(const struct rte_pci_device *dev, int index, + uint64_t *size, uint64_t *offset) +{ + const struct rte_pci_device_internal *pdev = + RTE_PCI_DEVICE_INTERNAL_CONST(dev); + + if (index >= VFIO_PCI_NUM_REGIONS || index >= RTE_MAX_PCI_REGIONS) + return -1; + + if (pdev->region[index].size == 0 && pdev->region[index].offset == 0) + return -1; + + *size = pdev->region[index].size; + *offset = pdev->region[index].offset; + + return 0; +} + int -pci_vfio_read_config(const struct rte_intr_handle *intr_handle, +pci_vfio_read_config(const struct rte_pci_device *dev, void *buf, size_t len, off_t offs) { - int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle); + uint64_t size, offset; + int fd; - if (vfio_dev_fd < 0) + fd = rte_intr_dev_fd_get(dev->intr_handle); + + if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX, + &size, &offset) != 0) + return -1; + + if ((uint64_t)len + offs > size) return -1; - return pread64(vfio_dev_fd, buf, len, - VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs); + return pread64(fd, buf, len, offset + offs); } int -pci_vfio_write_config(const struct rte_intr_handle *intr_handle, +pci_vfio_write_config(const struct rte_pci_device *dev, const void *buf, size_t len, off_t offs) { - int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle); + uint64_t size, offset; + int fd; - if (vfio_dev_fd < 0) + fd = rte_intr_dev_fd_get(dev->intr_handle); + + if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX, + &size, &offset) != 0) return -1; - return 
pwrite64(vfio_dev_fd, buf, len, - VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs); + if ((uint64_t)len + offs > size) + return -1; + + return pwrite64(fd, buf, len, offset + offs); } /* get PCI BAR number where MSI-X interrupts are */ static int -pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table) +pci_vfi
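The difference between the removed macro and the new scheme can be shown side by side: the old code derived the pread/pwrite file offset from a kernel-private shift, while the new code uses the size/offset pair the kernel reports via VFIO_DEVICE_GET_REGION_INFO and caches per region. The names below are illustrative, not the patch's exact helpers:

```c
#include <stdint.h>

/* The removed macro hard-coded the kernel-private layout:
 * region offset = index << VFIO_PCI_OFFSET_SHIFT (40). */
#define OLD_VFIO_GET_REGION_ADDR(idx) ((uint64_t)(idx) << 40)

/* After this patch, the offset comes from the kernel at probe time
 * and is cached per region, so no layout assumption is needed. */
struct cached_region {
	uint64_t size;
	uint64_t offset;
};

/* Example: a 4 KiB config region whose kernel-reported offset happens
 * to match the old layout for index 7. */
static const struct cached_region demo_cfg = { 4096, (uint64_t)7 << 40 };

/* Compute the file offset for a pread64()/pwrite64() of 'len' bytes
 * at 'offs' within the region, with the bounds check the patch adds;
 * returns -1 on an out-of-range access. */
static int64_t
config_access_offset(const struct cached_region *r, uint64_t offs,
		     uint64_t len)
{
	if (offs + len > r->size)
		return -1;
	return (int64_t)(r->offset + offs);
}
```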
[PATCH v2 3/4] bus/pci: introduce helper for MMIO read and write
From: Chenbo Xia The MMIO regions may not be mmap-able for VFIO-PCI devices. In this case, the driver should explicitly do read and write to access these regions. Signed-off-by: Chenbo Xia --- drivers/bus/pci/bsd/pci.c| 22 +++ drivers/bus/pci/linux/pci.c | 46 ++ drivers/bus/pci/linux/pci_init.h | 10 +++ drivers/bus/pci/linux/pci_uio.c | 22 +++ drivers/bus/pci/linux/pci_vfio.c | 36 drivers/bus/pci/rte_bus_pci.h| 48 drivers/bus/pci/version.map | 3 ++ 7 files changed, 187 insertions(+) diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c index a747eca58c..27f12590d4 100644 --- a/drivers/bus/pci/bsd/pci.c +++ b/drivers/bus/pci/bsd/pci.c @@ -489,6 +489,28 @@ int rte_pci_write_config(const struct rte_pci_device *dev, return -1; } +/* Read PCI MMIO space. */ +int rte_pci_mmio_read(const struct rte_pci_device *dev, int bar, + void *buf, size_t len, off_t offset) +{ + if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL || + (uint64_t)offset + len > dev->mem_resource[bar].len) + return -1; + memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len); + return len; +} + +/* Write PCI MMIO space. */ +int rte_pci_mmio_write(const struct rte_pci_device *dev, int bar, + const void *buf, size_t len, off_t offset) +{ + if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL || + (uint64_t)offset + len > dev->mem_resource[bar].len) + return -1; + memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len); + return len; +} + int rte_pci_ioport_map(struct rte_pci_device *dev, int bar, struct rte_pci_ioport *p) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 04e21ae20f..3d237398d9 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -680,6 +680,52 @@ int rte_pci_write_config(const struct rte_pci_device *device, } } +/* Read PCI MMIO space. 
*/ +int rte_pci_mmio_read(const struct rte_pci_device *device, int bar, + void *buf, size_t len, off_t offset) +{ + char devname[RTE_DEV_NAME_MAX_LEN] = ""; + + switch (device->kdrv) { + case RTE_PCI_KDRV_IGB_UIO: + case RTE_PCI_KDRV_UIO_GENERIC: + return pci_uio_mmio_read(device, bar, buf, len, offset); +#ifdef VFIO_PRESENT + case RTE_PCI_KDRV_VFIO: + return pci_vfio_mmio_read(device, bar, buf, len, offset); +#endif + default: + rte_pci_device_name(&device->addr, devname, + RTE_DEV_NAME_MAX_LEN); + RTE_LOG(ERR, EAL, + "Unknown driver type for %s\n", devname); + return -1; + } +} + +/* Write PCI MMIO space. */ +int rte_pci_mmio_write(const struct rte_pci_device *device, int bar, + const void *buf, size_t len, off_t offset) +{ + char devname[RTE_DEV_NAME_MAX_LEN] = ""; + + switch (device->kdrv) { + case RTE_PCI_KDRV_IGB_UIO: + case RTE_PCI_KDRV_UIO_GENERIC: + return pci_uio_mmio_write(device, bar, buf, len, offset); +#ifdef VFIO_PRESENT + case RTE_PCI_KDRV_VFIO: + return pci_vfio_mmio_write(device, bar, buf, len, offset); +#endif + default: + rte_pci_device_name(&device->addr, devname, + RTE_DEV_NAME_MAX_LEN); + RTE_LOG(ERR, EAL, + "Unknown driver type for %s\n", devname); + return -1; + } +} + int rte_pci_ioport_map(struct rte_pci_device *dev, int bar, struct rte_pci_ioport *p) diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h index 9f6659ba6e..d842809ccd 100644 --- a/drivers/bus/pci/linux/pci_init.h +++ b/drivers/bus/pci/linux/pci_init.h @@ -37,6 +37,11 @@ int pci_uio_read_config(const struct rte_intr_handle *intr_handle, int pci_uio_write_config(const struct rte_intr_handle *intr_handle, const void *buf, size_t len, off_t offs); +int pci_uio_mmio_read(const struct rte_pci_device *dev, int bar, + void *buf, size_t len, off_t offset); +int pci_uio_mmio_write(const struct rte_pci_device *dev, int bar, + const void *buf, size_t len, off_t offset); + int pci_uio_ioport_map(struct rte_pci_device *dev, int bar, struct rte_pci_ioport *p); 
void pci_uio_ioport_read(struct rte_pci_ioport *p, @@ -71,6 +76,11 @@ int pci_vfio_read_config(const struct rte_pci_device *dev, int pci_vfio_write_config(const struct rte_pci_device *dev, const void *buf, size_t len, off_t offs); +int pci_vfio_mmio_read(const struct rte_pci_device *dev, int b
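The core of the UIO/BSD MMIO helpers added here is a bounds-checked memcpy against the mapped BAR. A self-contained sketch of that logic (the helper names are illustrative; the real API is rte_pci_mmio_read()/rte_pci_mmio_write()):

```c
#include <stdint.h>
#include <string.h>

/* Example "BAR": 16 bytes of device memory, first four bytes set. */
static uint8_t demo_bar[16] = { 0x11, 0x22, 0x33, 0x44 };

/* Reject out-of-range or unmapped accesses, otherwise copy from the
 * mapped BAR and return the length read, mirroring the new helpers. */
static int
mmio_read(const uint8_t *bar_addr, uint64_t bar_len,
	  void *buf, uint64_t len, uint64_t offset)
{
	if (bar_addr == NULL || offset + len > bar_len)
		return -1;
	memcpy(buf, bar_addr + offset, len);
	return (int)len;
}

/* Convenience wrapper: read one byte at 'offset', or -1 on error. */
static int
demo_read_byte(uint64_t offset)
{
	uint8_t b;

	if (mmio_read(demo_bar, sizeof(demo_bar), &b, 1, offset) != 1)
		return -1;
	return b;
}
```

The VFIO variant of the same helpers replaces the memcpy with pread64()/pwrite64() on the region fd when the BAR is not mmap-ed.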
[PATCH v2 4/4] bus/pci: add VFIO sparse mmap support
This patch adds sparse mmap support in PCI bus. Sparse mmap is a capability defined in VFIO which allows multiple mmap areas in one VFIO region. In this patch, the sparse mmap regions are mapped to one continuous virtual address region that follows device-specific BAR layout. So, driver can still access all mapped sparse mmap regions by using 'bar_base_address + bar_offset'. Signed-off-by: Miao Li Signed-off-by: Chenbo Xia --- drivers/bus/pci/linux/pci_vfio.c | 104 +++ drivers/bus/pci/private.h| 2 + 2 files changed, 94 insertions(+), 12 deletions(-) diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c index f6289c907f..dd57566083 100644 --- a/drivers/bus/pci/linux/pci_vfio.c +++ b/drivers/bus/pci/linux/pci_vfio.c @@ -673,6 +673,54 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res, return 0; } +static int +pci_vfio_sparse_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res, + int bar_index, int additional_flags) +{ + struct pci_map *bar = &vfio_res->maps[bar_index]; + struct vfio_region_sparse_mmap_area *sparse; + void *bar_addr; + uint32_t i; + + if (bar->size == 0) { + RTE_LOG(DEBUG, EAL, "Bar size is 0, skip BAR%d\n", bar_index); + return 0; + } + + /* reserve the address using an inaccessible mapping */ + bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE | + MAP_ANONYMOUS | additional_flags, -1, 0); + if (bar_addr != MAP_FAILED) { + void *map_addr = NULL; + for (i = 0; i < bar->nr_areas; i++) { + sparse = &bar->areas[i]; + if (sparse->size) { + void *addr = RTE_PTR_ADD(bar_addr, (uintptr_t)sparse->offset); + map_addr = pci_map_resource(addr, vfio_dev_fd, + bar->offset + sparse->offset, sparse->size, + RTE_MAP_FORCE_ADDRESS); + if (map_addr == NULL) { + munmap(bar_addr, bar->size); + RTE_LOG(ERR, EAL, "Failed to map pci BAR%d\n", + bar_index); + goto err_map; + } + } + } + } else { + RTE_LOG(ERR, EAL, "Failed to create inaccessible mapping for BAR%d\n", + bar_index); + goto err_map; + } + + 
bar->addr = bar_addr; + return 0; + +err_map: + bar->nr_areas = 0; + return -1; +} + /* * region info may contain capability headers, so we need to keep reallocating * the memory until we match allocated memory size with argsz. @@ -875,6 +923,8 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev) for (i = 0; i < vfio_res->nb_maps; i++) { void *bar_addr; + struct vfio_info_cap_header *hdr; + struct vfio_region_info_cap_sparse_mmap *sparse; ret = pci_vfio_get_region_info(vfio_dev_fd, ®, i); if (ret < 0) { @@ -920,12 +970,33 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev) maps[i].size = reg->size; maps[i].path = NULL; /* vfio doesn't have per-resource paths */ - ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0); - if (ret < 0) { - RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n", - pci_addr, i, strerror(errno)); - free(reg); - goto err_vfio_res; + hdr = pci_vfio_info_cap(reg, VFIO_REGION_INFO_CAP_SPARSE_MMAP); + + if (hdr != NULL) { + sparse = container_of(hdr, + struct vfio_region_info_cap_sparse_mmap, header); + if (sparse->nr_areas > 0) { + maps[i].nr_areas = sparse->nr_areas; + maps[i].areas = sparse->areas; + } + } + + if (maps[i].nr_areas > 0) { + ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res, i, 0); + if (ret < 0) { + RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n", + pci_addr, i, strerror(errno)); + free(reg); + goto err_vfio_res; + } + } else { + ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0); + if (ret < 0) { + RTE_LOG(ERR, EAL, "%s m
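The address arithmetic that keeps 'bar_base_address + bar_offset' valid across sparse holes can be sketched in isolation: the whole BAR-sized range is reserved once, and each mmap-able window lands at the reserved base plus its area offset. The function names below are hypothetical:

```c
#include <stdint.h>

/* Place one sparse window inside the reserved BAR-sized range.
 * Returns -1 if the window does not fit in the BAR. */
static int
map_one_area(uintptr_t bar_base, uint64_t bar_size,
	     uint64_t area_off, uint64_t area_size, uintptr_t *vaddr)
{
	if (area_off + area_size > bar_size)
		return -1;
	*vaddr = bar_base + (uintptr_t)area_off;
	return 0;
}

/* Convenience wrapper with a fixed example layout: reserved base
 * 0x100000, BAR size 0x10000; returns 0 on failure. */
static uintptr_t
demo_map(uint64_t area_off, uint64_t area_size)
{
	uintptr_t va;

	if (map_one_area(0x100000, 0x10000, area_off, area_size, &va) != 0)
		return 0;
	return va;
}
```

In the real patch the reservation is an inaccessible anonymous mmap and each window is mapped over it with RTE_MAP_FORCE_ADDRESS, so unmapped holes stay inaccessible while mapped windows sit at the expected offsets.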
RE: [PATCH] examples/ptpclient: add signal handler for cleanup
Ping. > -Original Message- > From: Rahul Bhansali > Sent: Friday, January 20, 2023 11:26 AM > To: 'dev@dpdk.org' ; 'Kirill Rybalchenko' > > Subject: RE: [PATCH] examples/ptpclient: add signal handler for cleanup > > Ping. > > > -Original Message- > > From: Rahul Bhansali > > Sent: Wednesday, November 2, 2022 10:21 PM > > To: dev@dpdk.org; Kirill Rybalchenko > > Subject: RE: [PATCH] examples/ptpclient: add signal handler for > > cleanup > > > > Ping. > > > > > -Original Message- > > > From: Rahul Bhansali > > > Sent: Wednesday, August 31, 2022 12:19 PM > > > To: dev@dpdk.org; Kirill Rybalchenko > > > Cc: Rahul Bhansali > > > Subject: [PATCH] examples/ptpclient: add signal handler for cleanup > > > > > > This adds the signal handler for SIGINT, SIGTERM. > > > Also, this will come out from infinite loop and do cleanup once it > > > receives any of the registered signal. > > > > > > Signed-off-by: Rahul Bhansali > > > --- > > > examples/ptpclient/ptpclient.c | 32 > > > ++-- > > > 1 file changed, 30 insertions(+), 2 deletions(-) > > > > > > diff --git a/examples/ptpclient/ptpclient.c > > > b/examples/ptpclient/ptpclient.c index 1f1c9c9c52..8b69716be1 100644 > > > --- a/examples/ptpclient/ptpclient.c > > > +++ b/examples/ptpclient/ptpclient.c > > > @@ -19,6 +19,9 @@ > > > #include > > > #include > > > #include > > > +#include > > > + > > > +static volatile bool force_quit; > > > > > > #define RX_RING_SIZE 1024 > > > #define TX_RING_SIZE 1024 > > > @@ -609,7 +612,7 @@ parse_ptp_frames(uint16_t portid, struct > > > rte_mbuf > > > *m) { > > > * The lcore main. This is the main thread that does the work, reading > > > from > an > > > * input port and writing to an output port. > > > */ > > > -static __rte_noreturn void > > > +static void > > > lcore_main(void) > > > { > > > uint16_t portid; > > > @@ -621,7 +624,7 @@ lcore_main(void) > > > > > > /* Run until the application is quit or killed. 
*/ > > > > > > - while (1) { > > > + while (!force_quit) { > > > /* Read packet from RX queues. 8< */ > > > for (portid = 0; portid < ptp_enabled_port_nb; portid++) { > > > > > > @@ -734,6 +737,13 @@ ptp_parse_args(int argc, char **argv) > > > return 0; > > > } > > > > > > +static void > > > +signal_handler(int signum) > > > +{ > > > + if (signum == SIGINT || signum == SIGTERM) > > > + force_quit = true; > > > +} > > > + > > > /* > > > * The main function, which does initialization and calls the per-lcore > > > * functions. > > > @@ -758,6 +768,10 @@ main(int argc, char *argv[]) > > > argc -= ret; > > > argv += ret; > > > > > > + force_quit = false; > > > + signal(SIGINT, signal_handler); > > > + signal(SIGTERM, signal_handler); > > > + > > > ret = ptp_parse_args(argc, argv); > > > if (ret < 0) > > > rte_exit(EXIT_FAILURE, "Error with PTP initialization\n"); @@ - > > > 802,6 +816,20 @@ main(int argc, char *argv[]) > > > /* Call lcore_main on the main core only. */ > > > lcore_main(); > > > > > > + RTE_ETH_FOREACH_DEV(portid) { > > > + if ((ptp_enabled_port_mask & (1 << portid)) == 0) > > > + continue; > > > + > > > + /* Disable timesync timestamping for the Ethernet device */ > > > + rte_eth_timesync_disable(portid); > > > + > > > + ret = rte_eth_dev_stop(portid); > > > + if (ret != 0) > > > + printf("rte_eth_dev_stop: err=%d, port=%d\n", ret, > > > portid); > > > + > > > + rte_eth_dev_close(portid); > > > + } > > > + > > > /* clean up the EAL */ > > > rte_eal_cleanup(); > > > > > > -- > > > 2.25.1
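The shutdown pattern this patch adds — a volatile flag flipped from a signal handler that ends the polling loop so cleanup can run — can be demonstrated standalone (this sketch uses sig_atomic_t and delivers SIGINT to itself; the real patch uses a volatile bool set from the same handler):

```c
#include <signal.h>

static volatile sig_atomic_t force_quit;

static void
signal_handler(int signum)
{
	if (signum == SIGINT || signum == SIGTERM)
		force_quit = 1;
}

/* Install handlers, then poll until the flag is set; raise(SIGINT)
 * on the third iteration simulates Ctrl-C. Returns the iteration
 * count at which the loop exited. */
static int
demo_run(void)
{
	int iterations = 0;

	force_quit = 0;
	signal(SIGINT, signal_handler);
	signal(SIGTERM, signal_handler);

	while (!force_quit) {
		iterations++;
		if (iterations == 3)
			raise(SIGINT);
	}
	return iterations;
}
```

Once the loop exits, the application is free to stop and close its ports and call rte_eal_cleanup(), as the patch does.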
Re: [PATCH v2] vfio: do not coalesce DMA mappings
On Wed, Jan 4, 2023 at 6:19 AM Nipun Gupta wrote:
> At cleanup time, when DMA unmap is done, the Linux kernel does not
> allow unmapping of individual segments that were coalesced together
> while creating the DMA map for type1 IOMMU mappings. So, this change
> updates the mapping of the memory segments (hugepages) on a per-page
> basis.
>
> Signed-off-by: Nipun Gupta
> Signed-off-by: Nikhil Agarwal

Reviewed-by: Anatoly Burakov

Applied, thanks.

--
David Marchand
[PATCH v2] vhost: avoid sleeping under mutex
Covscan reported: 2. dpdk-21.11/lib/vhost/socket.c:852: lock_acquire: Calling function "pthread_mutex_lock" acquires lock "vhost_user.mutex". 23. dpdk-21.11/lib/vhost/socket.c:955: sleep: Call to "vhost_user_reconnect_init" might sleep while holding lock "vhost_user.mutex". # 953| vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT); # 954| if (vsocket->reconnect && reconn_tid == 0) { # 955|-> if (vhost_user_reconnect_init() != 0) # 956| goto out_mutex; # 957| } The reason for this warning is that vhost_user_reconnect_init() creates a ctrl thread and calls nanosleep waiting for this thread to be ready, while vhost_user.mutex is taken. Move the call to vhost_user_reconnect_init() out of this mutex. While at it, a pthread_t value should be considered opaque. Instead of relying reconn_tid == 0, use an internal flag in vhost_user_reconnect_init(). Coverity issue: 373686 Bugzilla ID: 981 Fixes: e623e0c6d8a5 ("vhost: add reconnect ability") Cc: sta...@dpdk.org Signed-off-by: David Marchand --- Changes since v1: - moved reconn_tid in vhost_user_reconnect_init as this variable is not used anywhere else, --- lib/vhost/socket.c | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c index 669c322e12..00a912c59e 100644 --- a/lib/vhost/socket.c +++ b/lib/vhost/socket.c @@ -427,7 +427,6 @@ struct vhost_user_reconnect_list { }; static struct vhost_user_reconnect_list reconn_list; -static pthread_t reconn_tid; static int vhost_user_connect_nonblock(char *path, int fd, struct sockaddr *un, size_t sz) @@ -498,8 +497,13 @@ vhost_user_client_reconnect(void *arg __rte_unused) static int vhost_user_reconnect_init(void) { + static bool reconn_init_done; + static pthread_t reconn_tid; int ret; + if (reconn_init_done) + return 0; + ret = pthread_mutex_init(&reconn_list.mutex, NULL); if (ret < 0) { VHOST_LOG_CONFIG("thread", ERR, "%s: failed to initialize mutex\n", __func__); @@ -515,6 +519,8 @@ vhost_user_reconnect_init(void) 
VHOST_LOG_CONFIG("thread", ERR, "%s: failed to destroy reconnect mutex\n", __func__); + } else { + reconn_init_done = true; } return ret; @@ -866,6 +872,11 @@ rte_vhost_driver_register(const char *path, uint64_t flags) if (!path) return -1; + if ((flags & RTE_VHOST_USER_CLIENT) != 0 && + (flags & RTE_VHOST_USER_NO_RECONNECT) == 0 && + vhost_user_reconnect_init() != 0) + return -1; + pthread_mutex_lock(&vhost_user.mutex); if (vhost_user.vsocket_cnt == MAX_VHOST_SOCKET) { @@ -961,11 +972,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags) } if ((flags & RTE_VHOST_USER_CLIENT) != 0) { - vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT); - if (vsocket->reconnect && reconn_tid == 0) { - if (vhost_user_reconnect_init() != 0) - goto out_mutex; - } + vsocket->reconnect = (flags & RTE_VHOST_USER_NO_RECONNECT) == 0; } else { vsocket->is_server = true; } -- 2.40.0
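The fix's guard pattern — replacing the `reconn_tid == 0` check on an opaque pthread_t with an internal once-only flag, called before the mutex is taken — reduces to this shape (instrumented with a counter for illustration; the real function also creates the reconnect thread and has failure paths):

```c
#include <stdbool.h>

static int init_calls; /* instrumentation for the example only */

/* One-time initialization guarded by an internal flag, as in the
 * reworked vhost_user_reconnect_init(): safe to call repeatedly,
 * and callable before vhost_user.mutex is acquired. */
static int
reconnect_init(void)
{
	static bool init_done;

	if (init_done)
		return 0;

	init_calls++; /* stands in for mutex/thread creation work */
	init_done = true;
	return 0;
}
```

Note the flag is only set on the success path, matching the patch, so a failed initialization can be retried.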
Re: [PATCH] vhost: avoid sleeping under mutex
On Thu, May 11, 2023 at 10:10 AM Xia, Chenbo wrote:
> > Covscan reported:
> >
> > 2. dpdk-21.11/lib/vhost/socket.c:852: lock_acquire: Calling function
> > "pthread_mutex_lock" acquires lock "vhost_user.mutex".
> > 23. dpdk-21.11/lib/vhost/socket.c:955: sleep: Call to
> > "vhost_user_reconnect_init" might sleep while holding lock
> > "vhost_user.mutex".
> > # 953| vsocket->reconnect =
> > !(flags & RTE_VHOST_USER_NO_RECONNECT);
> > # 954| if (vsocket->reconnect && reconn_tid == 0) {
> > # 955|-> if (vhost_user_reconnect_init() != 0)
> > # 956| goto out_mutex;
> > # 957| }
> >
> > The reason for this warning is that vhost_user_reconnect_init() creates a
> > ctrl thread and calls nanosleep waiting for this thread to be ready,
> > while vhost_user.mutex is taken.
> >
> > Move the call to vhost_user_reconnect_init() out of this mutex.
> >
> > While at it, a pthread_t value should be considered opaque.
> > Instead of relying on reconn_tid == 0, use an internal flag in
> > vhost_user_reconnect_init().
>
> Should we make reconn_tid not a global variable anymore then?
> Maybe a static pthread_t in vhost_user_reconnect_init?

Yep, I just sent a v2.
Thanks Chenbo.

--
David Marchand
RE: [EXT] Re: [PATCH v7] mem: telemetry support for memseg and element information
Hi Anatoly and David,

Could you review this patch? It has been pending since October 2022. If there are no comments, please accept the patch to make forward progress.

Thanks,
Amit Shukla

> -----Original Message-----
> From: Amit Prakash Shukla
> Sent: Tuesday, February 28, 2023 1:00 PM
> To: Thomas Monjalon ; dev@dpdk.org
> Cc: Anatoly Burakov ; Jerin Jacob Kollanukkaran ;
> david.march...@redhat.com; bruce.richard...@intel.com;
> ciara.po...@intel.com; dmitry.kozl...@gmail.com
> Subject: RE: [EXT] Re: [PATCH v7] mem: telemetry support for memseg and
> element information
>
> Hi Thomas,
>
> No changes pending from my side.
>
> Thanks,
> Amit Shukla
>
> > -----Original Message-----
> > From: Thomas Monjalon
> > Sent: Monday, February 20, 2023 4:41 PM
> > To: dev@dpdk.org
> > Cc: Anatoly Burakov ; Jerin Jacob Kollanukkaran ;
> > david.march...@redhat.com; bruce.richard...@intel.com;
> > ciara.po...@intel.com; dmitry.kozl...@gmail.com; Amit Prakash Shukla
> > Subject: [EXT] Re: [PATCH v7] mem: telemetry support for memseg and
> > element information
> >
> > 25/10/2022 15:02, Amit Prakash Shukla:
> > > This change adds telemetry support to display memory occupancy in
> > > memseg and the information of the elements allocated from a memseg
> > > based on arguments provided by the user.
> >
> > There was no comment since October.
> > Can we merge?
Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
On Fri, May 12, 2023 at 7:26 PM Morten Brørup wrote: > > > From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com] > > Sent: Friday, 12 May 2023 15.15 > > > > On 2023-05-12 13:59, Jerin Jacob wrote: > > > On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom > > > wrote: > > >> > > >> Use non-burst event enqueue and dequeue calls from burst enqueue and > > >> dequeue only when the burst size is compile-time constant (and equal > > >> to one). > > >> > > >> Signed-off-by: Mattias Rönnblom > > >> > > >> --- > > >> > > >> v3: Actually include the change v2 claimed to contain. > > >> v2: Wrap builtin call in __extension__, to avoid compiler warnings if > > >> application is compiled with -pedantic. (Morten Brørup) > > >> --- > > >> lib/eventdev/rte_eventdev.h | 4 ++-- > > >> 1 file changed, 2 insertions(+), 2 deletions(-) > > >> > > >> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h > > >> index a90e23ac8b..a471caeb6d 100644 > > >> --- a/lib/eventdev/rte_eventdev.h > > >> +++ b/lib/eventdev/rte_eventdev.h > > >> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t > > port_id, > > >> * Allow zero cost non burst mode routine invocation if > > application > > >> * requests nb_events as const one > > >> */ > > >> - if (nb_events == 1) > > >> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events > > >> == > > 1) > > > > > > "Why" part is not clear from the commit message. Is this to avoid > > > nb_events read if it is built-in const. > > > > The __builtin_constant_p() is introduced to avoid having the compiler > > generate a conditional branch and two different code paths in case > > nb_elem is a run-time variable. > > > > In particular, this matters if nb_elems is run-time variable and varies > > between 1 and some larger value. > > > > I should have mention this in the commit message. > > > > A very slight performance improvement. It also makes the code better > > match the comment, imo. 
Zero cost for const one enqueues, but no impact on > > non-compile-time-constant-length enqueues. > > > > Feel free to ignore. > > > > > If so, check should be following. Right? > > > > > > if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1) > > > || nb_events == 1) > > @Mattias: You missed the second part of this comparison, also catching > nb_events == 1 with non-constant nb_events. > > @Jerin: Such a change has no effect, compared to the original code. Yes. That was the reason I skipped __builtin_constant_p in the initial version.
Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom wrote: > > On 2023-05-12 13:59, Jerin Jacob wrote: > > On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom > > wrote: > >> > >> Use non-burst event enqueue and dequeue calls from burst enqueue and > >> dequeue only when the burst size is compile-time constant (and equal > >> to one). > >> > >> Signed-off-by: Mattias Rönnblom > >> > >> --- > >> > >> v3: Actually include the change v2 claimed to contain. > >> v2: Wrap builtin call in __extension__, to avoid compiler warnings if > >> application is compiled with -pedantic. (Morten Brørup) > >> --- > >> lib/eventdev/rte_eventdev.h | 4 ++-- > >> 1 file changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h > >> index a90e23ac8b..a471caeb6d 100644 > >> --- a/lib/eventdev/rte_eventdev.h > >> +++ b/lib/eventdev/rte_eventdev.h > >> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t > >> port_id, > >> * Allow zero cost non burst mode routine invocation if > >> application > >> * requests nb_events as const one > >> */ > >> - if (nb_events == 1) > >> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == > >> 1) > > > > "Why" part is not clear from the commit message. Is this to avoid > > nb_events read if it is built-in const. > > The __builtin_constant_p() is introduced to avoid having the compiler > generate a conditional branch and two different code paths in case > nb_elem is a run-time variable. > > In particular, this matters if nb_elems is run-time variable and varies > between 1 and some larger value. > > I should have mention this in the commit message. > > A very slight performance improvement. It also makes the code better > match the comment, imo. Zero cost for const one enqueues, but no impact > non-compile-time-constant-length enqueues. > > Feel free to ignore. I did some performance comparison of the patch. 
A low-end ARM machine shows a 0.7% drop in the single event case. No regression is seen with high-end ARM cores in the single event case. IMO, optimizing the check for burst mode (the new patch) may not show any real improvement as the cost is divided by the number of events. Whereas optimizing the check for the single event case (the current code) shows better performance in the single event case and no regression in burst mode, as the cost is divided by the number of events. If you agree, then we can skip this patch. > > > If so, check should be following. Right? > > > > if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1) > > || nb_events == 1) > > > > At least, It was my original intention in the code. > > > > > > > >> return (fp_ops->enqueue)(port, ev); > >> else > >> return fn(port, ev, nb_events); > >> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t > >> port_id, struct rte_event ev[], > >> * Allow zero cost non burst mode routine invocation if > >> application > >> * requests nb_events as const one > >> */ > >> - if (nb_events == 1) > >> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == > >> 1) > >> return (fp_ops->dequeue)(port, ev, timeout_ticks); > >> else > >> return (fp_ops->dequeue_burst)(port, ev, nb_events, > >> -- > >> 2.34.1 > >> >
Re: [RFC v2 1/3] ethdev: add frequency adjustment API
Hello, 03/04/2023 11:22, Simei Su: > This patch introduces a new timesync API "rte_eth_timesync_adjust_freq" > which enables frequency adjustment during PTP timesync. You should explain how it compares with existing functions like rte_eth_timesync_adjust_time(). [...] > /** > + * Adjust the clock increment rate on an Ethernet device. > + * > + * This is usually used in conjunction with other Ethdev timesync functions > to > + * synchronize the device time using the IEEE1588/802.1AS protocol. All timesync functions have this sentence, but it does not help to understand how to combine them. > + * > + * @param port_id > + * The port identifier of the Ethernet device. > + * @param ppm > + * Parts per million with 16-bit fractional field Sorry, I don't understand what this parameter is about, probably because I'm not an expert in the IEEE1588/802.1AS protocol. Please make it possible to understand for basic users. > + * > + * @return > + * - 0: Success. > + * - -ENODEV: The port ID is invalid. > + * - -EIO: if device is removed. > + * - -ENOTSUP: The function is not supported by the Ethernet driver. > + */ > +int rte_eth_timesync_adjust_freq(uint16_t port_id, int64_t ppm); In general, I think an effort is needed to make the timesync API more understandable. Example of something to explain: how are the Rx and Tx timestamps of a port different? Is it different from rte_eth_timesync_read_time()? The function rte_eth_read_clock() was provided with a better explanation. Please make improvements in the API documentation as part of this API; I don't want to get more functions without improving the explanation of older functions. Thank you
[PATCH v1] doc: updated libcrypto dependencies in QAT guide
The documentation needs extra steps to explicitly show customers how to install libcrypto, which is needed for QAT. This patch adds the commands to install libcrypto on Ubuntu and RHEL. Signed-off-by: Samina Arshad --- .mailmap | 1 + doc/guides/cryptodevs/qat.rst | 14 ++ 2 files changed, 15 insertions(+) diff --git a/.mailmap b/.mailmap index 0859104404..3c2c1fba83 100644 --- a/.mailmap +++ b/.mailmap @@ -1179,6 +1179,7 @@ Salem Sol Sameh Gobriel Sam Grove Samik Gupta +Samina Arshad Samuel Gauthier Sangjin Han Sankar Chokkalingam diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst index ef754106a8..a3694f7131 100644 --- a/doc/guides/cryptodevs/qat.rst +++ b/doc/guides/cryptodevs/qat.rst @@ -234,6 +234,20 @@ These are the build configuration options affecting QAT, and their default value Both QAT SYM PMD and QAT ASYM PMD have an external dependency on libcrypto, so are not built by default. +Ubuntu + +.. code-block:: console + + apt install libssl-dev + + +RHEL + +.. code-block:: console + + dnf install openssl-devel + + The QAT compressdev PMD has no external dependencies, so is built by default. The number of VFs per PF varies - see table below. If multiple QAT packages are -- 2.25.1
Re: [PATCH v2] vhost: avoid sleeping under mutex
On Mon, 15 May 2023 13:18:44 +0200 David Marchand wrote: > Covscan reported: > > 2. dpdk-21.11/lib/vhost/socket.c:852: lock_acquire: Calling function > "pthread_mutex_lock" acquires lock "vhost_user.mutex". > 23. dpdk-21.11/lib/vhost/socket.c:955: sleep: Call to > "vhost_user_reconnect_init" might sleep while holding lock > "vhost_user.mutex". > # 953| vsocket->reconnect = > !(flags & RTE_VHOST_USER_NO_RECONNECT); > # 954| if (vsocket->reconnect && reconn_tid == 0) { > # 955|-> if (vhost_user_reconnect_init() != 0) > # 956| goto out_mutex; > # 957| } > > The reason for this warning is that vhost_user_reconnect_init() creates a > ctrl thread and calls nanosleep waiting for this thread to be ready, > while vhost_user.mutex is taken. > > Move the call to vhost_user_reconnect_init() out of this mutex. > > While at it, a pthread_t value should be considered opaque. > Instead of relying reconn_tid == 0, use an internal flag in > vhost_user_reconnect_init(). > > Coverity issue: 373686 > Bugzilla ID: 981 > Fixes: e623e0c6d8a5 ("vhost: add reconnect ability") > Cc: sta...@dpdk.org > > Signed-off-by: David Marchand > --- > Changes since v1: > - moved reconn_tid in vhost_user_reconnect_init as this > variable is not used anywhere else, > > --- > lib/vhost/socket.c | 19 +-- > 1 file changed, 13 insertions(+), 6 deletions(-) > > diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c > index 669c322e12..00a912c59e 100644 > --- a/lib/vhost/socket.c > +++ b/lib/vhost/socket.c > @@ -427,7 +427,6 @@ struct vhost_user_reconnect_list { > }; > > static struct vhost_user_reconnect_list reconn_list; > -static pthread_t reconn_tid; > > static int > vhost_user_connect_nonblock(char *path, int fd, struct sockaddr *un, size_t > sz) > @@ -498,8 +497,13 @@ vhost_user_client_reconnect(void *arg __rte_unused) > static int > vhost_user_reconnect_init(void) > { > + static bool reconn_init_done; > + static pthread_t reconn_tid; This would not be thread safe. 
Perhaps use a pthread condition variable instead of spinning here.
Re: [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support
On Mon, 15 May 2023 06:47:00 + Miao Li wrote: > + map_addr = pci_map_resource(addr, vfio_dev_fd, > + bar->offset + sparse->offset, > sparse->size, > + RTE_MAP_FORCE_ADDRESS); > + if (map_addr == NULL) { > + munmap(bar_addr, bar->size); > + RTE_LOG(ERR, EAL, "Failed to map pci > BAR%d\n", > + bar_index); If mmap() fails then printing errno would help diagnose why.
[PATCH] dma/idxd: add support for multi-process when using VFIO
When using vfio-pci/uio for hardware access, we need to avoid reinitializing the hardware when mapping from a secondary process. Instead, just configure the function pointers and reuse the data mappings from the primary process. With the code change, update driver doc with the information that vfio-pci can be used for multi-process support, and explicitly state the limitation on multi-process support being unavailable when using idxd kernel driver. Signed-off-by: Bruce Richardson --- doc/guides/dmadevs/idxd.rst| 5 + drivers/dma/idxd/idxd_common.c | 6 -- drivers/dma/idxd/idxd_pci.c| 30 ++ 3 files changed, 39 insertions(+), 2 deletions(-) diff --git a/doc/guides/dmadevs/idxd.rst b/doc/guides/dmadevs/idxd.rst index bdfd3e78ad..f75d1d0a85 100644 --- a/doc/guides/dmadevs/idxd.rst +++ b/doc/guides/dmadevs/idxd.rst @@ -35,6 +35,11 @@ Device Setup Intel\ |reg| DSA devices can use the IDXD kernel driver or DPDK-supported drivers, such as ``vfio-pci``. Both are supported by the IDXD PMD. +.. note:: +To use Intel\ |reg| DSA devices in DPDK multi-process applications, +the devices should be bound to the vfio-pci driver. +Multi-process is not supported when using the kernel IDXD driver. 
+ Intel\ |reg| DSA devices using IDXD kernel driver ~~ diff --git a/drivers/dma/idxd/idxd_common.c b/drivers/dma/idxd/idxd_common.c index 6fe8ad4884..83d53942eb 100644 --- a/drivers/dma/idxd/idxd_common.c +++ b/drivers/dma/idxd/idxd_common.c @@ -599,6 +599,10 @@ idxd_dmadev_create(const char *name, struct rte_device *dev, dmadev->fp_obj->completed = idxd_completed; dmadev->fp_obj->completed_status = idxd_completed_status; dmadev->fp_obj->burst_capacity = idxd_burst_capacity; + dmadev->fp_obj->dev_private = dmadev->data->dev_private; + + if (rte_eal_process_type() != RTE_PROC_PRIMARY) + return 0; idxd = dmadev->data->dev_private; *idxd = *base_idxd; /* copy over the main fields already passed in */ @@ -619,8 +623,6 @@ idxd_dmadev_create(const char *name, struct rte_device *dev, idxd->batch_idx_ring = (void *)&idxd->batch_comp_ring[idxd->max_batches+1]; idxd->batch_iova = rte_mem_virt2iova(idxd->batch_comp_ring); - dmadev->fp_obj->dev_private = idxd; - idxd->dmadev->state = RTE_DMA_DEV_READY; return 0; diff --git a/drivers/dma/idxd/idxd_pci.c b/drivers/dma/idxd/idxd_pci.c index 781fa02db3..5fe9314d01 100644 --- a/drivers/dma/idxd/idxd_pci.c +++ b/drivers/dma/idxd/idxd_pci.c @@ -309,6 +309,36 @@ idxd_dmadev_probe_pci(struct rte_pci_driver *drv, struct rte_pci_device *dev) IDXD_PMD_INFO("Init %s on NUMA node %d", name, dev->device.numa_node); dev->device.driver = &drv->driver; + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + char qname[32]; + int max_qid; + + /* look up queue 0 to get the pci structure */ + snprintf(qname, sizeof(qname), "%s-q0", name); + IDXD_PMD_INFO("Looking up %s\n", qname); + ret = idxd_dmadev_create(qname, &dev->device, NULL, &idxd_pci_ops); + if (ret != 0) { + IDXD_PMD_ERR("Failed to create dmadev %s", name); + return ret; + } + qid = rte_dma_get_dev_id_by_name(qname); + max_qid = rte_atomic16_read( + &((struct idxd_dmadev *)rte_dma_fp_objs[qid].dev_private)->u.pci->ref_count); + + /* we have queue 0 done, now configure the rest of the 
queues */ + for (qid = 1; qid < max_qid; qid++) { + /* add the queue number to each device name */ + snprintf(qname, sizeof(qname), "%s-q%d", name, qid); + IDXD_PMD_INFO("Looking up %s\n", qname); + ret = idxd_dmadev_create(qname, &dev->device, NULL, &idxd_pci_ops); + if (ret != 0) { + IDXD_PMD_ERR("Failed to create dmadev %s", name); + return ret; + } + } + return 0; + } + if (dev->device.devargs && dev->device.devargs->args[0] != '\0') { /* if the number of devargs grows beyond just 1, use rte_kvargs */ if (sscanf(dev->device.devargs->args, -- 2.39.2
Re: [PATCH v5] app/testpmd: txonly multiflow port change support
Does this patch need anything else to be done before it can be merged? I'm hoping to get this patch merged as part of the 23.07 release. Thanks, Josh
Re: [PATCH] net/gve: support queue start and stop operations
tested-by: Rushil Gupta On Mon, May 8, 2023 at 8:07 PM Junfeng Guo wrote: > Add support for queue operations for GQI: > - gve_rx_queue_start > - gve_tx_queue_start > - gve_rx_queue_stop > - gve_tx_queue_stop > > Add support for queue operations for DQO: > - gve_rx_queue_start_dqo > - gve_tx_queue_start_dqo > - gve_rx_queue_stop_dqo > - gve_tx_queue_stop_dqo > > Also move the funcs of rxq_mbufs_alloc into the corresponding files. > > Signed-off-by: Junfeng Guo > --- > drivers/net/gve/gve_ethdev.c | 166 +++ > drivers/net/gve/gve_ethdev.h | 36 > drivers/net/gve/gve_rx.c | 96 ++-- > drivers/net/gve/gve_rx_dqo.c | 97 ++-- > drivers/net/gve/gve_tx.c | 54 ++-- > drivers/net/gve/gve_tx_dqo.c | 54 ++-- > 6 files changed, 364 insertions(+), 139 deletions(-) > > diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c > index 8b6861a24f..1dcb3b3a01 100644 > --- a/drivers/net/gve/gve_ethdev.c > +++ b/drivers/net/gve/gve_ethdev.c > @@ -104,81 +104,6 @@ gve_dev_configure(struct rte_eth_dev *dev) > return 0; > } > > -static int > -gve_refill_pages(struct gve_rx_queue *rxq) > -{ > - struct rte_mbuf *nmb; > - uint16_t i; > - int diag; > - > - diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0], > rxq->nb_rx_desc); > - if (diag < 0) { > - for (i = 0; i < rxq->nb_rx_desc - 1; i++) { > - nmb = rte_pktmbuf_alloc(rxq->mpool); > - if (!nmb) > - break; > - rxq->sw_ring[i] = nmb; > - } > - if (i < rxq->nb_rx_desc - 1) > - return -ENOMEM; > - } > - rxq->nb_avail = 0; > - rxq->next_avail = rxq->nb_rx_desc - 1; > - > - for (i = 0; i < rxq->nb_rx_desc; i++) { > - if (rxq->is_gqi_qpl) { > - rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(i * > PAGE_SIZE); > - } else { > - if (i == rxq->nb_rx_desc - 1) > - break; > - nmb = rxq->sw_ring[i]; > - rxq->rx_data_ring[i].addr = > rte_cpu_to_be_64(rte_mbuf_data_iova(nmb)); > - } > - } > - > - rte_write32(rte_cpu_to_be_32(rxq->next_avail), rxq->qrx_tail); > - > - return 0; > -} > - > -static int > -gve_refill_dqo(struct gve_rx_queue 
*rxq) > -{ > - struct rte_mbuf *nmb; > - uint16_t i; > - int diag; > - > - diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0], > rxq->nb_rx_desc); > - if (diag < 0) { > - rxq->stats.no_mbufs_bulk++; > - for (i = 0; i < rxq->nb_rx_desc - 1; i++) { > - nmb = rte_pktmbuf_alloc(rxq->mpool); > - if (!nmb) > - break; > - rxq->sw_ring[i] = nmb; > - } > - if (i < rxq->nb_rx_desc - 1) { > - rxq->stats.no_mbufs += rxq->nb_rx_desc - 1 - i; > - return -ENOMEM; > - } > - } > - > - for (i = 0; i < rxq->nb_rx_desc; i++) { > - if (i == rxq->nb_rx_desc - 1) > - break; > - nmb = rxq->sw_ring[i]; > - rxq->rx_ring[i].buf_addr = > rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb)); > - rxq->rx_ring[i].buf_id = rte_cpu_to_le_16(i); > - } > - > - rxq->nb_rx_hold = 0; > - rxq->bufq_tail = rxq->nb_rx_desc - 1; > - > - rte_write32(rxq->bufq_tail, rxq->qrx_tail); > - > - return 0; > -} > - > static int > gve_link_update(struct rte_eth_dev *dev, __rte_unused int > wait_to_complete) > { > @@ -208,65 +133,68 @@ gve_link_update(struct rte_eth_dev *dev, > __rte_unused int wait_to_complete) > } > > static int > -gve_dev_start(struct rte_eth_dev *dev) > +gve_start_queues(struct rte_eth_dev *dev) > { > - uint16_t num_queues = dev->data->nb_tx_queues; > struct gve_priv *priv = dev->data->dev_private; > - struct gve_tx_queue *txq; > - struct gve_rx_queue *rxq; > + uint16_t num_queues; > uint16_t i; > - int err; > + int ret; > > + num_queues = dev->data->nb_tx_queues; > priv->txqs = (struct gve_tx_queue **)dev->data->tx_queues; > - err = gve_adminq_create_tx_queues(priv, num_queues); > - if (err) { > - PMD_DRV_LOG(ERR, "failed to create %u tx queues.", > num_queues); > - return err; > - } > - for (i = 0; i < num_queues; i++) { > - txq = priv->txqs[i]; > - txq->qtx_tail = > - &priv->db_bar2[rte_be_to_cpu_32(txq->qres->db_index)]; > - txq->qtx_head = > - > &priv->cnt_array[rte_be_to_cpu_32(txq->qres->counter_index)]; > - > -
Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
On 2023-05-15 14:38, Jerin Jacob wrote: On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom wrote: On 2023-05-12 13:59, Jerin Jacob wrote: On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom wrote: Use non-burst event enqueue and dequeue calls from burst enqueue and dequeue only when the burst size is compile-time constant (and equal to one). Signed-off-by: Mattias Rönnblom --- v3: Actually include the change v2 claimed to contain. v2: Wrap builtin call in __extension__, to avoid compiler warnings if application is compiled with -pedantic. (Morten Brørup) --- lib/eventdev/rte_eventdev.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h index a90e23ac8b..a471caeb6d 100644 --- a/lib/eventdev/rte_eventdev.h +++ b/lib/eventdev/rte_eventdev.h @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id, * Allow zero cost non burst mode routine invocation if application * requests nb_events as const one */ - if (nb_events == 1) + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1) "Why" part is not clear from the commit message. Is this to avoid nb_events read if it is built-in const. The __builtin_constant_p() is introduced to avoid having the compiler generate a conditional branch and two different code paths in case nb_elem is a run-time variable. In particular, this matters if nb_elems is run-time variable and varies between 1 and some larger value. I should have mention this in the commit message. A very slight performance improvement. It also makes the code better match the comment, imo. Zero cost for const one enqueues, but no impact non-compile-time-constant-length enqueues. Feel free to ignore. I did some performance comparison of the patch. A low-end ARM machines shows 0.7% drop with single event case. No regression see with high-end ARM cores with single event case. 
IMO, optimizing the check for burst mode (the new patch) may not show any real improvement as the cost is divided by the number of events. Whereas optimizing the check for the single event case (the current code) shows better performance in the single event case and no regression in burst mode, as the cost is divided by the number of events.

I ran some tests on an AMD Zen 3 with DSW. In the tests below, the enqueue burst size is not compile-time constant.

    Enqueue burst size       Performance improvement
    Run-time constant 1      ~5%
    Run-time constant 2      ~0%
    Run-time variable 1-2    ~9%
    Run-time variable 1-16   ~0%

The run-time variable enqueue sizes were randomly (uniformly) distributed in the specified range.

The first result may come as a surprise. The benchmark is using RTE_EVENT_OP_FORWARD type events (which likely is the dominating op type in most apps). The single-event enqueue function only exists in a generic variant (i.e., there is no rte_event_enqueue_forward_burst() equivalent). I suspect that is the reason for the performance improvement. This effect is large enough to make it somewhat beneficial (+~1%) to use run-time variable single-event enqueue compared to keeping the burst size compile-time constant. The performance gain is counted toward both enqueue and dequeue costs (+benchmark app overhead), so it is an under-estimation if seen as an enqueue performance improvement.

If you agree, then we can skip this patch. I have no strong opinion on whether this should be included or not. If it were up to me, I would drop the single-enqueue special case handling altogether in the next ABI update.

If so, check should be following. Right? if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1) || nb_events == 1) At least, it was my original intention in the code.
return (fp_ops->enqueue)(port, ev); else return fn(port, ev, nb_events); @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[], * Allow zero cost non burst mode routine invocation if application * requests nb_events as const one */ - if (nb_events == 1) + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1) return (fp_ops->dequeue)(port, ev, timeout_ticks); else return (fp_ops->dequeue_burst)(port, ev, nb_events, -- 2.34.1
DTS roadmap for 23.07 release
Hello, Following is the roadmap for DTS for the 23.07 release.

1) Tooling for developing automated documentation for DTS API docs. Uses Google docstring format. Link: http://patches.dpdk.org/project/dpdk/cover/20230511091408.236638-1-juraj.lin...@pantheon.tech/

2) Integrate Fabric. This will contain the support for sudo with no password, removing use of scp. Link: http://patches.dpdk.org/project/dpdk/patch/20230424133537.58698-1-juraj.lin...@pantheon.tech/

3) Traffic Generator abstraction http://patches.dpdk.org/project/dpdk/cover/20230420093109.594704-1-juraj.lin...@pantheon.tech/

4) Non traffic based smoke tests http://patches.dpdk.org/project/dpdk/cover/20230413175415.7683-2-jspew...@iol.unh.edu/

5) Minor changes: a) Using a git commit ID instead of a tarball. Link: http://patches.dpdk.org/project/dpdk/patch/20230420140244.701467-1-juraj.lin...@pantheon.tech/ b) Updates to the linter tools. Link: http://patches.dpdk.org/project/dpdk/patch/20230331091355.1224059-1-juraj.lin...@pantheon.tech/

6) Docker file for build and run environment Link: http://patches.dpdk.org/project/dpdk/patch/20221103134633.446646-1-juraj.lin...@pantheon.tech/

Thank you, Honnappa
Re: [PATCH v5] app/testpmd: txonly multiflow port change support
On 4/22/2023 12:20 AM, Joshua Washington wrote: > Google cloud routes traffic using IP addresses without the support of MAC > addresses, so changing source IP address for txonly-multi-flow can have > negative performance implications for net/gve when using testpmd. This > patch updates txonly multiflow mode to modify source ports instead of > source IP addresses. > > The change can be tested with the following command: > dpdk-testpmd -- --forward-mode=txonly --txonly-multi-flow \ > --tx-ip=, > > Signed-off-by: Joshua Washington > Reviewed-by: Rushil Gupta Hi Joshua, Aman, The reason to support multi-flow in Tx-only mode is to test RSS on the Rx side. When used in a guest, in a hypervisor infrastructure, what is the use case for multi-flow; why not use the default single IP? And if there is a need to support multi-flow without updating the source IP, what about supporting various modes as "--txonly-multi-flow=XXX", where XXX can be src_ip, dst_ip, src_port, dst_port? If possible, keep the no-param '--txonly-multi-flow' the same as the current usage (src_ip) for backward compatibility. This extends testing capability and lets different platforms select different configs based on their needs.
RE: [PATCH] examples/l3fwd: add hard code to collect empty poll and NIC counters
> > > > > > > > On Thu, May 11, 2023 at 1:55 PM Feifei Wang > > > wrote: > > > > > > > > This patch is to collect empty poll of 'rte_eth_rx_burst' > > > > functions in dpdk l3fwd application. Empty poll means Rx burst > > > > function receives no pkts in one loop. > > > > > > > > Furthermore, we also add 'nic_xstats_display' API to show NIC counters. > > > > > > > > Usage: > > > > With this patch, no special settings, just run l3fwd, and when you > > > > stoping l3fwd, thread will print the info above. > > > > > > > > Note: > > > > This patch has just a slight impact on performance and can be ignored. > > How much is the regression? > > > > > > > > IMO, We should not introduce regression as l3fwd kind of uses as > > > reference application. > > > I think, l3fwd should limit to stats exposed by ethdev(i.e directly > > > from NIC, without performance regression). > > Agree L3fwd is the reference app. Unfortunately, it is not in a state to > > debug > any problems. May be many are just believing the numbers without > understanding that there are problems. > > Can we place these stats under a run time flag and reduce the impact > further? > > I think, example applications, we can have compile time option for new > feature addtions in fastpath or add new forwarding mode in testpmd. New forwarding (L3fwd in this case) mode for testpmd was rejected overwhelmingly. So, that's not an option. I thought compile time options are discouraged as well. But, I am fine with the compile time approach to get some debugging capabilities in this application. May be we could understand the performance difference with run time flag? 
> > > > > > > > > > > > > > > > > > > > dpdk version:23.03 > > > > > > > > Suggested-by: Lijian Zhang > > > > Signed-off-by: Feifei Wang > > > > Reviewed-by: Ruifeng Wang > > > > Reviewed-by: Honnappa Nagarahalli > > > > > --- > > > > examples/l3fwd/l3fwd.h | 68 > > > ++ > > > > examples/l3fwd/l3fwd_lpm.c | 26 +-- > > > > examples/l3fwd/main.c | 22 > > > > 3 files changed, 114 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h index > > > > b55855c932..2b3fca62f3 100644 > > > > --- a/examples/l3fwd/l3fwd.h > > > > +++ b/examples/l3fwd/l3fwd.h > > > > @@ -56,6 +56,17 @@ > > > > #define L3FWD_HASH_ENTRIES (1024*1024*1) > > > > #endif > > > > > > > > +struct lcore_stats { > > > > +uint32_t nb_rx_pkts[16]; > > > > +uint32_t num_loop[16]; > > > > +uint32_t none_loop[16]; > > > > +uint32_t no_full_loop[16]; > > > > +float none_loop_per[16]; > > > > +float no_full_loop_per[16]; > > > > +} __rte_cache_aligned; > > > > + > > > > +extern struct lcore_stats stats[RTE_MAX_LCORE]; > > > > + > > > > struct parm_cfg { > > > > const char *rule_ipv4_name; > > > > const char *rule_ipv6_name; @@ -115,6 +126,63 @@ extern > > > > struct acl_algorithms acl_alg[]; > > > > > > > > extern uint32_t max_pkt_len; > > > > > > > > +static inline void > > > > +nic_xstats_display(uint32_t port_id) { > > > > +struct rte_eth_xstat *xstats; > > > > +int cnt_xstats, idx_xstat; > > > > +struct rte_eth_xstat_name *xstats_names; > > > > + > > > > +printf("## NIC extended statistics for port %-2d\n", > > > > port_id); > > > > +if (!rte_eth_dev_is_valid_port(port_id)) { > > > > +fprintf(stderr, "Error: Invalid port number %i\n", > > > > port_id); > > > > +return; > > > > +} > > > > + > > > > +/* Get count */ > > > > +cnt_xstats = rte_eth_xstats_get_names(port_id, NULL, 0); > > > > +if (cnt_xstats < 0) { > > > > +fprintf(stderr, "Error: Cannot get count of xstats\n"); > > > > +return; > > > > +} > > > > + > > > > +/* Get id-name lookup table */ > > > 
> +xstats_names = malloc(sizeof(struct rte_eth_xstat_name) * > cnt_xstats); > > > > +if (xstats_names == NULL) { > > > > +fprintf(stderr, "Cannot allocate memory for xstats > > > > lookup\n"); > > > > +return; > > > > +} > > > > +if (cnt_xstats != rte_eth_xstats_get_names( > > > > +port_id, xstats_names, cnt_xstats)) { > > > > +fprintf(stderr, "Error: Cannot get xstats lookup\n"); > > > > +free(xstats_names); > > > > +return; > > > > +} > > > > + > > > > +/* Get stats themselves */ > > > > +xstats = malloc(sizeof(struct rte_eth_xstat) * cnt_xstats); > > > > +if (xstats == NULL) { > > > > +fprintf(stderr, "Cannot allocate memory for xstats\n"); > > > > +free(xstats_names); > > > > +return; > > > > +} > > > > +if (cnt_xstats != rte_eth_xstats_get(port_id, xstats, > > > > cnt_xstats)) { > > > > +fpr
Re: [PATCH v2] net/tap: resolve stringop-overflow with gcc 12 on ppc64le
On 3/23/2023 5:01 PM, David Christensen wrote: > Building DPDK with gcc 12 on a ppc64le system generates a > stringop-overflow warning. Replace the local MAC address > validation function parse_user_mac() with a call to > rte_ether_unformat_addr() instead. > > Bugzilla ID: 1197 > Cc: sta...@dpdk.org > > Signed-off-by: David Christensen > --- > v2: > * Added NULL checks previously performed in parse_user_mac() > --- > drivers/net/tap/rte_eth_tap.c | 26 ++ > 1 file changed, 2 insertions(+), 24 deletions(-) > > diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c > index 089ac202fa..8c50801fd4 100644 > --- a/drivers/net/tap/rte_eth_tap.c > +++ b/drivers/net/tap/rte_eth_tap.c > @@ -2267,29 +2267,6 @@ set_remote_iface(const char *key __rte_unused, > return 0; > } > > -static int parse_user_mac(struct rte_ether_addr *user_mac, > - const char *value) > -{ > - unsigned int index = 0; > - char mac_temp[strlen(ETH_TAP_USR_MAC_FMT) + 1], *mac_byte = NULL; > - > - if (user_mac == NULL || value == NULL) > - return 0; > - > - strlcpy(mac_temp, value, sizeof(mac_temp)); > - mac_byte = strtok(mac_temp, ":"); > - > - while ((mac_byte != NULL) && > - (strlen(mac_byte) <= 2) && > - (strlen(mac_byte) == strspn(mac_byte, > - ETH_TAP_CMP_MAC_FMT))) { > - user_mac->addr_bytes[index++] = strtoul(mac_byte, NULL, 16); > - mac_byte = strtok(NULL, ":"); > - } > - > - return index; > -} > - > static int > set_mac_type(const char *key __rte_unused, >const char *value, > @@ -2311,7 +2288,8 @@ set_mac_type(const char *key __rte_unused, > goto success; > } > > - if (parse_user_mac(user_mac, value) != 6) > + if (value == NULL || user_mac == NULL || > + rte_ether_unformat_addr(value, user_mac) < 0) > goto error; > success: > TAP_LOG(DEBUG, "TAP user MAC param (%s)", value); Hi David, I confirm the build error, btw it helps to future references to put build failure to the commit log, and change is reasonable to convert PMD local parse function to an API, BUT my concern is they 
don't behave exactly the same, which changes the user interface of the driver. The 'rte_ether_unformat_addr()' API expects the exact "XX:XX:XX:XX:XX:XX or ::" format. For example, 'parse_user_mac()' accepts 'a:a:a:a:a:a' as input, but the API requires '0A:0A:0A:0A:0A:0A'. This is a small change but may still create a bad experience if an existing user/script is hit by it, and I believe we don't have a strong reason to change the interface. To keep the behavior the same, we can either update 'rte_ether_unformat_addr()' to accept single chars between ':', or fix the existing 'parse_user_mac()' for the compiler warning. What do you think?
Re: [PATCH v2] net/tap: resolve stringop-overflow with gcc 12 on ppc64le
On Tue, 16 May 2023 00:14:52 +0100 Ferruh Yigit wrote: > Hi David, > > I confirm the build error, btw it helps to future references to put > build failure to the commit log, > > and change is reasonable to convert PMD local parse function to an API, > BUT my concern is they don't behave exactly same, which changes user > interface of the driver. > > The 'rte_ether_unformat_addr()' API expects exact "XX:XX:XX:XX:XX:XX or > ::" format. > Like 'parse_user_mac()' accepts 'a:a:a:a:a:a' as input, but API requires > '0A:0A:0A:0A:0A:0A'. > > This is a small change but still may create a bad experience if an > existing user/script hit by this, and I believe we don't have a strong > reason to change the interface. > > > To keep behavior same, we can either update 'rte_ether_unformat_addr()' > to accept singe chars between ':', > or fix the existing 'parse_user_mac()' for compiler warning, what do you > think? This is the kind of change where a simple release note will suffice. Not sure if anyone beyond some test script would ever use this anyway.
Re: [PATCH v2] net/tap: resolve stringop-overflow with gcc 12 on ppc64le
On 5/16/2023 12:20 AM, Stephen Hemminger wrote: > On Tue, 16 May 2023 00:14:52 +0100 > Ferruh Yigit wrote: > >> Hi David, >> >> I confirm the build error, btw it helps to future references to put >> build failure to the commit log, >> >> and change is reasonable to convert PMD local parse function to an API, >> BUT my concern is they don't behave exactly same, which changes user >> interface of the driver. >> >> The 'rte_ether_unformat_addr()' API expects exact "XX:XX:XX:XX:XX:XX or >> ::" format. >> Like 'parse_user_mac()' accepts 'a:a:a:a:a:a' as input, but API requires >> '0A:0A:0A:0A:0A:0A'. >> >> This is a small change but still may create a bad experience if an >> existing user/script hit by this, and I believe we don't have a strong >> reason to change the interface. >> >> >> To keep behavior same, we can either update 'rte_ether_unformat_addr()' >> to accept singe chars between ':', >> or fix the existing 'parse_user_mac()' for compiler warning, what do you >> think? > > This is the kind of change where a simple release note will suffice. > > Not sure if anyone beyond some test script would ever use this anyway. Yes, only some scripts and possibly applications that hotplug a tap interface with hardcoded parameters may be impacted. I don't know how big this amount is, but this ends up breaking something that was working before upgrading DPDK for them. And I believe the motivation is weak to break the behavior. Won't it be better to update 'rte_ether_unformat_addr()' to accept more flexible syntax, and use it? Is there any disadvantage to this approach?
Re: [PATCH v2] net/tap: resolve stringop-overflow with gcc 12 on ppc64le
On Tue, 16 May 2023 00:35:56 +0100 Ferruh Yigit wrote: > Yes only some scripts and possible applications that hotplug tap > interface with hardcoded parameters may impacted, don't know how big is > this amount but this ends up breaking something that was working before > upgrading DPDK for them. > > And I believe the motivation is weak to break the behavior. > > Won't it be better to update 'rte_ether_unformat_addr()' to accept more > flexible syntax, and use it? Is there any disadvantage of this approach? It is already more flexible than the standard ether_aton().
RE: [PATCH v2] net/i40e: remove redundant judgment
> -Original Message- > From: Zhang, Qi Z > Sent: Monday, May 15, 2023 9:59 AM > To: Zhang, Qi Z ; Feifei Wang > ; Richardson, Bruce ; > Konstantin Ananyev ; Zhang, Yuying > ; Xing, Beilei ; David > Christensen ; Ruifeng Wang > > Cc: dev@dpdk.org; nd ; Honnappa Nagarahalli > > Subject: RE: [PATCH v2] net/i40e: remove redundant judgment > > > > > -Original Message- > > From: Zhang, Qi Z > > Sent: Thursday, April 27, 2023 3:38 PM > > To: Feifei Wang ; Richardson, Bruce > > ; Konstantin Ananyev > > ; Zhang, Yuying > > ; Xing, Beilei ; David > > Christensen ; Ruifeng Wang > > > > Cc: dev@dpdk.org; n...@arm.com; Honnappa Nagarahalli > > > > Subject: RE: [PATCH v2] net/i40e: remove redundant judgment > > > > > > > > > -Original Message- > > > From: Feifei Wang > > > Sent: Tuesday, March 28, 2023 3:28 PM > > > To: Richardson, Bruce ; Konstantin > > > Ananyev ; Zhang, Yuying > > > ; Xing, Beilei ; > > > David Christensen ; Ruifeng Wang > > > > > > Cc: dev@dpdk.org; n...@arm.com; Feifei Wang ; > > > Honnappa Nagarahalli > > > Subject: [PATCH v2] net/i40e: remove redundant judgment > > > > > > Merged variable updates under the same condition. It reduces branch. > > > > > > In ampere-altra, there is no performance improvement with this patch. > > > In x86 sse and avx2 path, there is also no performance improvement. > > > > Thanks for sharing the results. While the code implements some best > > practices, such as reducing branching and adding compiler hints, which > > should generally improve performance, it's not necessary to highlight > > that it didn't provide benefits on certain specific platforms. > > > > Would it be ok to remove the last two lines when merging the patch? > > Ping > Sorry for I did not reply this. I agree with this when merging the patch. Thanks for the comments~. > > > > Otherwise > > Acked-by: Qi Zhang > > > > > > > > > > > > > v2: > > > 1. add change for avx and altivec path. 
> > > > > > Suggested-by: Honnappa Nagarahalli > > > Signed-off-by: Feifei Wang > > > Reviewed-by: Ruifeng Wang > > > --- > > > drivers/net/i40e/i40e_rxtx_common_avx.h | 9 + > > > drivers/net/i40e/i40e_rxtx_vec_altivec.c | 9 + > > > drivers/net/i40e/i40e_rxtx_vec_neon.c| 9 + > > > drivers/net/i40e/i40e_rxtx_vec_sse.c | 9 + > > > 4 files changed, 20 insertions(+), 16 deletions(-) > > > > > > diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h > > > b/drivers/net/i40e/i40e_rxtx_common_avx.h > > > index cfc1e63173..85958d6c81 100644 > > > --- a/drivers/net/i40e/i40e_rxtx_common_avx.h > > > +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h > > > @@ -198,14 +198,15 @@ i40e_rxq_rearm_common(struct > i40e_rx_queue > > *rxq, > > > __rte_unused bool avx512) #endif > > > > > > rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; > > > - if (rxq->rxrearm_start >= rxq->nb_rx_desc) > > > + rx_id = rxq->rxrearm_start - 1; > > > + > > > + if (unlikely(rxq->rxrearm_start >= rxq->nb_rx_desc)) { > > > rxq->rxrearm_start = 0; > > > + rx_id = rxq->nb_rx_desc - 1; > > > + } > > > > > > rxq->rxrearm_nb -= RTE_I40E_RXQ_REARM_THRESH; > > > > > > - rx_id = (uint16_t)((rxq->rxrearm_start == 0) ? 
> > > - (rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1)); > > > - > > > /* Update the tail pointer on the NIC */ > > > I40E_PCI_REG_WC_WRITE(rxq->qrx_tail, rx_id); } diff --git > > > a/drivers/net/i40e/i40e_rxtx_vec_altivec.c > > > b/drivers/net/i40e/i40e_rxtx_vec_altivec.c > > > index 2dfa04599c..8672ad1c41 100644 > > > --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c > > > +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c > > > @@ -89,14 +89,15 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) > > > } > > > > > > rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; > > > - if (rxq->rxrearm_start >= rxq->nb_rx_desc) > > > + rx_id = rxq->rxrearm_start - 1; > > > + > > > + if (unlikely(rxq->rxrearm_start >= rxq->nb_rx_desc)) { > > > rxq->rxrearm_start = 0; > > > + rx_id = rxq->nb_rx_desc - 1; > > > + } > > > > > > rxq->rxrearm_nb -= RTE_I40E_RXQ_REARM_THRESH; > > > > > > - rx_id = (uint16_t)((rxq->rxrearm_start == 0) ? > > > - (rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1)); > > > - > > > /* Update the tail pointer on the NIC */ > > > I40E_PCI_REG_WRITE(rxq->qrx_tail, rx_id); } diff --git > > > a/drivers/net/i40e/i40e_rxtx_vec_neon.c > > > b/drivers/net/i40e/i40e_rxtx_vec_neon.c > > > index 12e6f1cbcb..49391fe4c7 100644 > > > --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c > > > +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c > > > @@ -64,14 +64,15 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) > > > } > > > > > > rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; > > > - if (rxq->rxrearm_start >= rxq->nb_rx_desc) > > > + rx_id = rxq->rxrearm_start - 1; > > > + > > > + if (unlikely(rxq->rxrearm_start >= rxq->nb_rx_desc)) { > > > rxq->rxrearm_start = 0; > > > + rx_id = rxq->nb_rx_des
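As a side note for readers following the diff, the rearranged index update can be sketched in isolation like this (illustrative names and a plain struct, not the i40e driver's actual code):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the rearm-index update pattern from the patch above.
 * Names are illustrative stand-ins for the driver's ring state. */
struct ring {
	uint16_t rearm_start;
	uint16_t nb_desc;
};

/* Returns the tail index to write to hardware after advancing the
 * rearm position by 'thresh' descriptors. The wrap check is a single
 * (unlikely) branch; the common path computes rx_id with no ternary. */
static uint16_t ring_advance(struct ring *r, uint16_t thresh)
{
	uint16_t rx_id;

	r->rearm_start += thresh;
	rx_id = r->rearm_start - 1;
	if (r->rearm_start >= r->nb_desc) {	/* unlikely wrap-around */
		r->rearm_start = 0;
		rx_id = r->nb_desc - 1;
	}
	return rx_id;
}
```

The common path needs no conditional expression; only the unlikely wrap-around takes the extra work, which is the branch reduction the patch aims for.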
[Bug 1230] [dpdk-22.07][meson test] malloc_autotest test: failed to test
https://bugs.dpdk.org/show_bug.cgi?id=1230 Bug ID: 1230 Summary: [dpdk-22.07][meson test] malloc_autotest test: failed to test Product: DPDK Version: 23.07 Hardware: All OS: FreeBSD Status: UNCONFIRMED Severity: normal Priority: Normal Component: meson Assignee: dev@dpdk.org Reporter: yux.ji...@intel.com Target Milestone: --- Environment: - DPDK version: 23.03 OS: FreeBSD 13.2-RELEASE Compiler: gcc version 12.2.0 (FreeBSD Ports Collection) NIC hardware: Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 Test Setup: - rm -rf x86_64-native-bsdapp-gcc/ CC=gcc meson -Denable_kmods=True -Dlibdir=lib --default-library=static x86_64-native-bsdapp-gcc ninja -C x86_64-native-bsdapp-gcc kldunload contigmem.ko kldunload nic_uio.ko kldload x86_64-native-bsdapp-gcc/kernel/freebsd/contigmem.ko kldload x86_64-native-bsdapp-gcc/kernel/freebsd/nic_uio.ko kenv hw.nic_uio.bdfs="2:0:0,2:0:1" MALLOC_PERTURB_=162 DPDK_TEST=malloc_autotest /root/dpdk/x86_64-native-bsdapp-gcc/app/test/dpdk-test Show the output from the previous commands: - Incorrect heap statistics: Allocated size test_multi_alloc_statistics() failed Test Failed RTE>> Expected Result: - Test OK Regression: - Is this issue a regression: (Y/N) Y Version the regression was introduced: commit f62f4a375ff496abf66e48d5e1b1c442b86a82c1 Author: Fengnan Chang Date: Fri Feb 10 14:30:22 2023 +0800 malloc: optimize 4K allocations Here is a simple test case: -- You are receiving this mail because: You are the assignee for the bug.
Arm's roadmap for 23.07
(Bcc: Arm internal stakeholders) Hello, The following are the work items planned for 23.07: 1. Buffer recycle (a.k.a. direct-rearm of Rx side buffers) APIs and implementation. 2. Zero-copy mempool APIs, integration with i40e PMD and test cases. 3. Remove the rte_ring generic implementation and improve the C11 implementation (to allow for moving to C11 atomic APIs). 4. QAT on ARM support announcement. 5. IPSecMB support for tag SECLIB-IPSEC-2023.3.21. Thank you, Honnappa
RE: [PATCH] examples/ipsec-secgw: fix zero address in ethernet header
Ping. > -Original Message- > From: Rahul Bhansali > Sent: Thursday, March 30, 2023 3:39 PM > To: dev@dpdk.org; Radu Nicolau ; Akhil Goyal > > Cc: Rahul Bhansali > Subject: [PATCH] examples/ipsec-secgw: fix zero address in ethernet header > > During port init, src address stored in ethaddr_tbl is typecast which > violates the > stric-aliasing rule and not reflecting the updated source address in processed > packets too. > > Fixes: 6eb3ba0399 ("examples/ipsec-secgw: support poll mode NEON LPM > lookup") > > Signed-off-by: Rahul Bhansali > --- > examples/ipsec-secgw/ipsec-secgw.c | 20 ++-- > examples/ipsec-secgw/ipsec-secgw.h | 2 +- > 2 files changed, 11 insertions(+), 11 deletions(-) > > diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec- > secgw/ipsec-secgw.c > index d2d9d85b4a..029749e522 100644 > --- a/examples/ipsec-secgw/ipsec-secgw.c > +++ b/examples/ipsec-secgw/ipsec-secgw.c > @@ -99,10 +99,10 @@ uint32_t qp_desc_nb = 2048; > #define MTU_TO_FRAMELEN(x) ((x) + RTE_ETHER_HDR_LEN + > RTE_ETHER_CRC_LEN) > > struct ethaddr_info ethaddr_tbl[RTE_MAX_ETHPORTS] = { > - { 0, ETHADDR(0x00, 0x16, 0x3e, 0x7e, 0x94, 0x9a) }, > - { 0, ETHADDR(0x00, 0x16, 0x3e, 0x22, 0xa1, 0xd9) }, > - { 0, ETHADDR(0x00, 0x16, 0x3e, 0x08, 0x69, 0x26) }, > - { 0, ETHADDR(0x00, 0x16, 0x3e, 0x49, 0x9e, 0xdd) } > + { {{0}}, {{0x00, 0x16, 0x3e, 0x7e, 0x94, 0x9a}} }, > + { {{0}}, {{0x00, 0x16, 0x3e, 0x22, 0xa1, 0xd9}} }, > + { {{0}}, {{0x00, 0x16, 0x3e, 0x08, 0x69, 0x26}} }, > + { {{0}}, {{0x00, 0x16, 0x3e, 0x49, 0x9e, 0xdd}} } > }; > > struct offloads tx_offloads; > @@ -1427,9 +1427,8 @@ add_dst_ethaddr(uint16_t port, const struct > rte_ether_addr *addr) > if (port >= RTE_DIM(ethaddr_tbl)) > return -EINVAL; > > - ethaddr_tbl[port].dst = ETHADDR_TO_UINT64(addr); > - rte_ether_addr_copy((struct rte_ether_addr *)ðaddr_tbl[port].dst, > - (struct rte_ether_addr *)(val_eth + port)); > + rte_ether_addr_copy(addr, ðaddr_tbl[port].dst); > + rte_ether_addr_copy(addr, (struct rte_ether_addr 
*)(val_eth + port)); > return 0; > } > > @@ -1907,11 +1906,12 @@ port_init(uint16_t portid, uint64_t req_rx_offloads, > uint64_t req_tx_offloads, > "Error getting MAC address (port %u): %s\n", > portid, rte_strerror(-ret)); > > - ethaddr_tbl[portid].src = ETHADDR_TO_UINT64(ðaddr); > + rte_ether_addr_copy(ðaddr, ðaddr_tbl[portid].src); > > - rte_ether_addr_copy((struct rte_ether_addr *)ðaddr_tbl[portid].dst, > + rte_ether_addr_copy(ðaddr_tbl[portid].dst, > (struct rte_ether_addr *)(val_eth + portid)); > - rte_ether_addr_copy((struct rte_ether_addr *)ðaddr_tbl[portid].src, > + > + rte_ether_addr_copy(ðaddr_tbl[portid].src, > (struct rte_ether_addr *)(val_eth + portid) + 1); > > print_ethaddr("Address: ", ðaddr); > diff --git a/examples/ipsec-secgw/ipsec-secgw.h b/examples/ipsec- > secgw/ipsec-secgw.h > index 0e0012d058..53665adf03 100644 > --- a/examples/ipsec-secgw/ipsec-secgw.h > +++ b/examples/ipsec-secgw/ipsec-secgw.h > @@ -84,7 +84,7 @@ struct ipsec_traffic_nb { > > /* port/source ethernet addr and destination ethernet addr */ struct > ethaddr_info { > - uint64_t src, dst; > + struct rte_ether_addr src, dst; > }; > > struct ipsec_spd_stats { > -- > 2.25.1
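For context on the strict-aliasing issue this patch fixes, a reduced sketch (stand-in types, not DPDK's): storing a 6-byte MAC in a uint64_t and then accessing it through a casted struct pointer is an aliasing violation that the compiler may miscompile, whereas keeping a real address struct (or copying with memcpy) is well-defined:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for rte_ether_addr; not DPDK's definition. */
struct ether_addr {
	uint8_t bytes[6];
};

/* After the fix: real struct fields instead of "uint64_t src, dst;"
 * that were accessed through (struct rte_ether_addr *) casts. */
struct addr_info {
	struct ether_addr src, dst;
};

/* memcpy-based copy: no type punning, so no strict-aliasing hazard
 * and the stored address is always the one subsequently read back. */
static void addr_copy(const struct ether_addr *from, struct ether_addr *to)
{
	memcpy(to, from, sizeof(*to));
}
```

With the old uint64_t layout, the compiler was free to assume the cast-based writes and the integer reads never alias, which is consistent with the reported symptom of the source address not reaching processed packets.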
RE: [PATCH v6] enhance NUMA affinity heuristic
> -Original Message- > From: Thomas Monjalon > Sent: 2023年4月27日 14:58 > To: You, KaisenX > Cc: dev@dpdk.org; Zhou, YidingX ; > david.march...@redhat.com; Matz, Olivier ; > ferruh.yi...@amd.com; Burakov, Anatoly ; You, > KaisenX ; sta...@dpdk.org > Subject: Re: [PATCH v6] enhance NUMA affinity heuristic > > 25/04/2023 07:16, Kaisen You: > > Trying to allocate memory on the first detected numa node,it has less > > chance to find some memory actually available rather than on the main > > lcore numa node (especially when the DPDK application is started only > > on one numa node). > > You didn't change the explanations here as discussed previously. > > > > > Fixes: 8b0a1b8cb481 ("eal: stop using pthread for lcores and control > > threads") > > Fixes: 770d41bf3309 ("malloc: fix allocation with unknown socket ID") > > Cc: sta...@dpdk.org > > > > Signed-off-by: David Marchand > > Signed-off-by: Kaisen You > > > > --- > > Changes since v5: > > - Add comments to the code, > > > > Changes since v4: > > - mod the patch title, > > > > Changes since v3: > > - add the assignment of socket_id in thread initialization, > > > > Changes since v2: > > - add uncommitted local change and fix compilation, > > > > Changes since v1: > > - accomodate for configurations with main lcore running on multiples > > physical cores belonging to different numa, > > --- > > lib/eal/common/eal_common_thread.c | 4 > > lib/eal/common/malloc_heap.c | 6 ++ > > 2 files changed, 10 insertions(+) > > > > diff --git a/lib/eal/common/eal_common_thread.c > > b/lib/eal/common/eal_common_thread.c > > index 079a385630..d65bfe251b 100644 > > --- a/lib/eal/common/eal_common_thread.c > > +++ b/lib/eal/common/eal_common_thread.c > > @@ -252,6 +252,10 @@ static int ctrl_thread_init(void *arg) > > struct rte_thread_ctrl_params *params = arg; > > > > __rte_thread_init(rte_lcore_id(), cpuset); > > + /* set the value of the per-core variable _socket_id. > > +* Convenient for threads to find memory. 
> > +*/ > > That's an useless comment. > We want to know WHY you are setting to SOCKET_ID_ANY. > > > + RTE_PER_LCORE(_socket_id) = SOCKET_ID_ANY; > > params->ret = rte_thread_set_affinity_by_id(rte_thread_self(), > cpuset); > > if (params->ret != 0) { > > __atomic_store_n(¶ms->ctrl_thread_status, > > diff --git a/lib/eal/common/malloc_heap.c > > b/lib/eal/common/malloc_heap.c index d25bdc98f9..a624f08cf7 100644 > > --- a/lib/eal/common/malloc_heap.c > > +++ b/lib/eal/common/malloc_heap.c > > @@ -716,6 +716,12 @@ malloc_get_numa_socket(void) > > if (conf->socket_mem[socket_id] != 0) > > return socket_id; > > } > > + /* Trying to allocate memory on the main lcore numa node. > > +* especially when the DPDK application is started only on one numa > node. > > +*/ > > + socket_id = rte_lcore_to_socket_id(rte_get_main_lcore()); > > + if (socket_id != (unsigned int)SOCKET_ID_ANY) > > + return socket_id; > > You should explain why we could reach this point, i.e. SOCKET_ID_ANY. I'm sorry I didn't reply to you in time due to COVID-19 infection, I will send the v7 version as soon as possible to fix the problem you mentioned. > > > > > return rte_socket_id_by_idx(0); > > } > >
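To make the fallback order under discussion concrete, here is a hypothetical sketch of the heuristic (illustrative constants and signature, not the actual malloc_get_numa_socket()):

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SOCKETS 4
#define SOCKET_ID_ANY (-1)

/* Sketch of the allocation-socket fallback chain discussed in the
 * patch (illustrative arrays and parameters, not EAL's real state). */
static int pick_numa_socket(const uint64_t socket_mem[NUM_SOCKETS],
			    int cur_socket_id, int main_lcore_socket)
{
	int i;

	/* 1. the calling thread's own socket, if known */
	if (cur_socket_id != SOCKET_ID_ANY)
		return cur_socket_id;
	/* 2. any socket explicitly given memory via --socket-mem */
	for (i = 0; i < NUM_SOCKETS; i++)
		if (socket_mem[i] != 0)
			return i;
	/* 3. the main lcore's socket: more likely to actually have
	 *    memory than the first detected socket, especially when
	 *    the application runs on a single NUMA node */
	if (main_lcore_socket != SOCKET_ID_ANY)
		return main_lcore_socket;
	/* 4. last resort: first detected socket */
	return 0;
}
```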
[PATCH v1 0/2] Fix VXLAN matching
Fix VXLAN matching with zero value and misc5 layout size calculation. Rongwei Liu (2): net/mlx5: fix matcher layout size calculation net/mlx5: fix VXLAN matching with zero value drivers/net/mlx5/mlx5_flow_dv.c | 37 ++--- 1 file changed, 11 insertions(+), 26 deletions(-) -- 2.27.0
[PATCH v1 1/2] net/mlx5: fix matcher layout size calculation
Initially, the rdma-core library only supported misc0 to misc3 fields in matching resources, misc4 and misc5 fields were added to handle new features. The matcher layout, passing from DPDK to rdma-core, shouldn't exceed the size of the engaged library version capabilities. For now, there is no way to know what is the maximum capability of rdma-core, and DPDK should limit the matcher layout to misc3 if possible (no matching on fields misc4 and misc5 are requested by the application). The check if misc4 and misc5 features were requested was based on checking the values against zeroes. The matching mask should be checked instead. Fixes: 630a587bfb37 ("net/mlx5: support matching on VXLAN reserved field") Cc: rongw...@nvidia.com Cc: sta...@dpdk.org Signed-off-by: Rongwei Liu Acked-by: Viacheslav Ovsiienko --- drivers/net/mlx5/mlx5_flow_dv.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c index f136f43b0a..f44d621600 100644 --- a/drivers/net/mlx5/mlx5_flow_dv.c +++ b/drivers/net/mlx5/mlx5_flow_dv.c @@ -15167,7 +15167,7 @@ flow_dv_apply(struct rte_eth_dev *dev, struct rte_flow *flow, } dv->actions[n++] = priv->sh->default_miss_action; } - misc_mask = flow_dv_matcher_enable(dv->value.buf); + misc_mask = flow_dv_matcher_enable(dv_h->matcher->mask.buf); __flow_dv_adjust_buf_size(&dv->value.size, misc_mask); err = mlx5_flow_os_create_flow(dv_h->matcher->matcher_object, (void *)&dv->value, n, @@ -17367,7 +17367,7 @@ flow_dv_destroy_def_policy(struct rte_eth_dev *dev) static int __flow_dv_create_policy_flow(struct rte_eth_dev *dev, uint32_t color_reg_c_idx, - enum rte_color color, void *matcher_object, + enum rte_color color, struct mlx5_flow_dv_matcher *matcher, int actions_n, void *actions, bool match_src_port, const struct rte_flow_item *item, void **rule, const struct rte_flow_attr *attr) @@ -17397,9 +17397,9 @@ __flow_dv_create_policy_flow(struct rte_eth_dev *dev, } 
flow_dv_match_meta_reg(value.buf, (enum modify_reg)color_reg_c_idx, rte_col_2_mlx5_col(color), UINT32_MAX); - misc_mask = flow_dv_matcher_enable(value.buf); + misc_mask = flow_dv_matcher_enable(matcher->mask.buf); __flow_dv_adjust_buf_size(&value.size, misc_mask); - ret = mlx5_flow_os_create_flow(matcher_object, (void *)&value, + ret = mlx5_flow_os_create_flow(matcher->matcher_object, (void *)&value, actions_n, actions, rule); if (ret) { DRV_LOG(ERR, "Failed to create meter policy%d flow.", color); @@ -17553,7 +17553,7 @@ __flow_dv_create_domain_policy_rules(struct rte_eth_dev *dev, /* Create flow, matching color. */ if (__flow_dv_create_policy_flow(dev, color_reg_c_idx, (enum rte_color)i, - color_rule->matcher->matcher_object, + color_rule->matcher, acts[i].actions_n, acts[i].dv_actions, svport_match, NULL, &color_rule->rule, &attr)) { @@ -18021,7 +18021,7 @@ flow_dv_create_mtr_tbls(struct rte_eth_dev *dev, actions[i++] = priv->sh->dr_drop_action; flow_dv_match_meta_reg_all(matcher_para.buf, value.buf, (enum modify_reg)mtr_id_reg_c, 0, 0); - misc_mask = flow_dv_matcher_enable(value.buf); + misc_mask = flow_dv_matcher_enable(mtrmng->def_matcher[domain]->mask.buf); __flow_dv_adjust_buf_size(&value.size, misc_mask); ret = mlx5_flow_os_create_flow (mtrmng->def_matcher[domain]->matcher_object, @@ -18066,7 +18066,7 @@ flow_dv_create_mtr_tbls(struct rte_eth_dev *dev, fm->drop_cnt, NULL); actions[i++] = cnt->action; actions[i++] = priv->sh->dr_drop_action; - misc_mask = flow_dv_matcher_enable(value.buf); + misc_mask = flow_dv_matcher_enable(drop_matcher->mask.buf); __flow_dv_adjust_buf_size(&value.size, misc_mask); ret = mlx5_flow_os_create_flow(drop_matcher->matcher_object, (void *)&value, i, actions, @@ -18546,7 +18546,7 @@ flow_dv_meter_hierarchy_rule_create(struct rte_eth_dev *dev, goto err_exit; } if (__flow_dv_create_policy
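A reduced sketch of the rationale behind this fix (plain C stand-in, not the mlx5 code): whether an optional match block such as misc4/misc5 is engaged must be decided from the mask buffer, because a legitimate match on value zero with a non-zero mask would otherwise look disengaged:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: a match block is "engaged" iff any mask byte is non-zero.
 * Checking the VALUE buffer instead (the old behavior) wrongly treats
 * "match value 0 with mask 0xff..." as not engaged. Illustrative
 * layout, not the rdma-core matcher structures. */
static int block_engaged(const uint8_t *mask_blk, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (mask_blk[i] != 0)
			return 1;
	return 0;
}
```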
[PATCH v1 2/2] net/mlx5: fix VXLAN matching with zero value
When an application wants to match VxLAN last_rsvd value zero, PMD sets the matching mask field to zero by mistake and it causes traffic with any last_rsvd value hits. The matching mask should be taken from application input directly, no need to perform the bit reset operation. Fixes: cd4ab742064a ("net/mlx5: split flow item matcher and value translation") Cc: suanmi...@nvidia.com Cc: sta...@dpdk.org Signed-off-by: Rongwei Liu Acked-by: Viacheslav Ovsiienko --- drivers/net/mlx5/mlx5_flow_dv.c | 19 ++- 1 file changed, 2 insertions(+), 17 deletions(-) diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c index f44d621600..1abc4acad7 100644 --- a/drivers/net/mlx5/mlx5_flow_dv.c +++ b/drivers/net/mlx5/mlx5_flow_dv.c @@ -9480,12 +9480,10 @@ flow_dv_translate_item_vxlan(struct rte_eth_dev *dev, { const struct rte_flow_item_vxlan *vxlan_m; const struct rte_flow_item_vxlan *vxlan_v; - const struct rte_flow_item_vxlan *vxlan_vv = item->spec; void *headers_v; void *misc_v; void *misc5_v; uint32_t tunnel_v; - uint32_t *tunnel_header_v; char *vni_v; uint16_t dport; int size; @@ -9537,24 +9535,11 @@ flow_dv_translate_item_vxlan(struct rte_eth_dev *dev, vni_v[i] = vxlan_m->hdr.vni[i] & vxlan_v->hdr.vni[i]; return; } - tunnel_header_v = (uint32_t *)MLX5_ADDR_OF(fte_match_set_misc5, - misc5_v, - tunnel_header_1); tunnel_v = (vxlan_v->hdr.vni[0] & vxlan_m->hdr.vni[0]) | (vxlan_v->hdr.vni[1] & vxlan_m->hdr.vni[1]) << 8 | (vxlan_v->hdr.vni[2] & vxlan_m->hdr.vni[2]) << 16; - *tunnel_header_v = tunnel_v; - if (key_type == MLX5_SET_MATCHER_SW_M) { - tunnel_v = (vxlan_vv->hdr.vni[0] & vxlan_m->hdr.vni[0]) | - (vxlan_vv->hdr.vni[1] & vxlan_m->hdr.vni[1]) << 8 | - (vxlan_vv->hdr.vni[2] & vxlan_m->hdr.vni[2]) << 16; - if (!tunnel_v) - *tunnel_header_v = 0x0; - if (vxlan_vv->hdr.rsvd1 & vxlan_m->hdr.rsvd1) - *tunnel_header_v |= vxlan_v->hdr.rsvd1 << 24; - } else { - *tunnel_header_v |= (vxlan_v->hdr.rsvd1 & vxlan_m->hdr.rsvd1) << 24; - } + tunnel_v |= 
(vxlan_v->hdr.rsvd1 & vxlan_m->hdr.rsvd1) << 24; + MLX5_SET(fte_match_set_misc5, misc5_v, tunnel_header_1, RTE_BE32(tunnel_v)); } /** -- 2.27.0
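The fixed composition can be sketched in plain C (stand-in helper, not the mlx5 macros): each byte is value AND mask, and an all-zero result is kept as a valid match rather than special-cased to "match anything":

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of composing the 32-bit tunnel_header_1 field from the VNI
 * bytes and rsvd1, as in the fixed code path above. Plain C stand-in
 * for the MLX5_SET()-based code. */
static uint32_t vxlan_tunnel_header(const uint8_t vni_v[3],
				    const uint8_t vni_m[3],
				    uint8_t rsvd1_v, uint8_t rsvd1_m)
{
	uint32_t t;

	t  = (uint32_t)(vni_v[0] & vni_m[0]);
	t |= (uint32_t)(vni_v[1] & vni_m[1]) << 8;
	t |= (uint32_t)(vni_v[2] & vni_m[2]) << 16;
	t |= (uint32_t)(rsvd1_v & rsvd1_m) << 24;
	return t;
}
```

The point of the fix is visible here: a VNI of zero under a full mask produces a zero header that is still installed as a real match condition, instead of being reset so that any last_rsvd/VNI value would hit.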
[PATCH v1] net/mlx5: add test for hot upgrade
This patch adds testpmd app a runtime function to test the hot upgrade API. testpmd> mlx5 set flow_engine <0|1> (flag) 0 stands for active mode while 1 for standby mode. Signed-off-by: Rongwei Liu Acked-by: Viacheslav Ovsiienko --- doc/guides/nics/mlx5.rst| 10 ++ drivers/net/mlx5/mlx5_testpmd.c | 63 + 2 files changed, 73 insertions(+) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 7a137d5f6a..1867b2a3c0 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -2057,3 +2057,13 @@ where: * ``sw_queue_id``: queue index in range [64536, 65535]. This range is the highest 1000 numbers. * ``hw_queue_id``: queue index given by HW in queue creation. + +Set Flow Engine Mode + + +Set the flow engine to active(0) or standby(1) mode with specific flags:: + testpmd> mlx5 set flow_engine <0|1> (flags) + +This command works for software steering only. +Default FDB jump should be disabled if switchdev is enabled. +The mode will propagate to all the probed ports. diff --git a/drivers/net/mlx5/mlx5_testpmd.c b/drivers/net/mlx5/mlx5_testpmd.c index 879ea2826e..cb0cafe1e9 100644 --- a/drivers/net/mlx5/mlx5_testpmd.c +++ b/drivers/net/mlx5/mlx5_testpmd.c @@ -561,6 +561,64 @@ cmdline_parse_inst_t mlx5_cmd_unmap_ext_rxq = { } }; +/* Set flow engine mode with flags command. */ +struct mlx5_cmd_set_flow_engine_mode { + cmdline_fixed_string_t mlx5; + cmdline_fixed_string_t set; + cmdline_fixed_string_t flow_engine; + uint8_t mode; + uint32_t flags; +}; + + +static void +mlx5_cmd_set_flow_engine_mode_parsed(void *parsed_result, +__rte_unused struct cmdline *cl, +__rte_unused void *data) +{ + struct mlx5_cmd_set_flow_engine_mode *res = parsed_result; + int ret; + + ret = rte_pmd_mlx5_flow_engine_set_mode((enum mlx5_flow_engine_mode)res->mode, res->flags); + + if (ret < 0) + fprintf(stderr, "Fail to set flow_engine to %s mode with flags %x, error %s\n", + !res->mode ? 
"active" : "standby", res->flags, strerror(-ret)); + else + TESTPMD_LOG(DEBUG, "Set %d ports flow_engine to %s mode with flags %x\n", ret, + !res->mode ? "active" : "standby", res->flags); +} + +cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_mlx5 = + TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, mlx5, +"mlx5"); +cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_set = + TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, set, +"set"); +cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_flow_engine = + TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, flow_engine, +"flow_engine"); +cmdline_parse_token_num_t mlx5_cmd_set_flow_engine_mode_mode = + TOKEN_NUM_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, mode, + RTE_UINT8); +cmdline_parse_token_num_t mlx5_cmd_set_flow_engine_mode_flags = + TOKEN_NUM_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, flags, + RTE_UINT32); + +cmdline_parse_inst_t mlx5_cmd_set_flow_engine_mode = { + .f = &mlx5_cmd_set_flow_engine_mode_parsed, + .data = NULL, + .help_str = "mlx5 set flow_engine <0|1> (flag)", + .tokens = { + (void *)&mlx5_cmd_set_flow_engine_mode_mlx5, + (void *)&mlx5_cmd_set_flow_engine_mode_set, + (void *)&mlx5_cmd_set_flow_engine_mode_flow_engine, + (void *)&mlx5_cmd_set_flow_engine_mode_mode, + (void *)&mlx5_cmd_set_flow_engine_mode_flags, + NULL, + } +}; + static struct testpmd_driver_commands mlx5_driver_cmds = { .commands = { { @@ -588,6 +646,11 @@ static struct testpmd_driver_commands mlx5_driver_cmds = { .help = "mlx5 port (port_id) ext_rxq unmap (sw_queue_id)\n" "Unmap external Rx queue ethdev index mapping\n\n", }, + { + .ctx = &mlx5_cmd_set_flow_engine_mode, + .help = "mlx5 set flow_engine (mode) (flag)\n" + "Set flow_engine to the specific mode with flags.\n\n" + }, { .ctx = NULL, }, -- 2.27.0
[PATCH v1 0/2] disallow duplicated tag index
Disallow duplicated tag index matching. Rongwei Liu (2): net/mlx5: disallow duplicated tag index in pattern template net/mlx5: fix sws duplicated tag index matching drivers/net/mlx5/mlx5_flow_dv.c | 21 +++-- drivers/net/mlx5/mlx5_flow_hw.c | 25 + 2 files changed, 40 insertions(+), 6 deletions(-) -- 2.27.0
[PATCH v1 2/2] net/mlx5: fix sws duplicated tag index matching
Duplicated matching tag index is not allowed in sws because they are using the same matching field in the underlayer layout. For example: "tag index is 0 data spec 0x12 mask 0xff / tag index is 0 data spec 0x1234 mask 0x" is paradoxical matching condition and "tag index is 0 data spec 0x12 mask 0xff / tag index is 0 data spec 0x3400 mask 0xff00" should be "tag index is 0 data spec 0x3412 mask 0x" Add checking logic against it. Fixes: e554b672aa05 ("net/mlx5: support flow tag") Cc: viachesl...@nvidia.com Cc: sta...@dpdk.org Signed-off-by: Rongwei Liu Acked-by: Viacheslav Ovsiienko --- drivers/net/mlx5/mlx5_flow_dv.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c index 1abc4acad7..db2bf615db 100644 --- a/drivers/net/mlx5/mlx5_flow_dv.c +++ b/drivers/net/mlx5/mlx5_flow_dv.c @@ -2294,6 +2294,8 @@ flow_dv_validate_item_meta(struct rte_eth_dev *dev __rte_unused, * Pointer to the rte_eth_dev structure. * @param[in] item * Item specification. + * @param[in] tag_bitmap + * Tag index bitmap. * @param[in] attr * Attributes of flow that includes this item. 
* @param[out] error @@ -2305,6 +2307,7 @@ flow_dv_validate_item_meta(struct rte_eth_dev *dev __rte_unused, static int flow_dv_validate_item_tag(struct rte_eth_dev *dev, const struct rte_flow_item *item, + uint32_t *tag_bitmap, const struct rte_flow_attr *attr __rte_unused, struct rte_flow_error *error) { @@ -2348,6 +2351,12 @@ flow_dv_validate_item_tag(struct rte_eth_dev *dev, if (ret < 0) return ret; MLX5_ASSERT(ret != REG_NON); + if (*tag_bitmap & (1 << ret)) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM_SPEC, + item->spec, + "Duplicated tag index"); + *tag_bitmap |= 1 << ret; return 0; } @@ -7290,9 +7299,10 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr, bool def_policy = false; bool shared_count = false; uint16_t udp_dport = 0; - uint32_t tag_id = 0; + uint32_t tag_id = 0, tag_bitmap = 0; const struct rte_flow_action_age *non_shared_age = NULL; const struct rte_flow_action_count *count = NULL; + const struct mlx5_rte_flow_item_tag *mlx5_tag; struct mlx5_priv *act_priv = NULL; int aso_after_sample = 0; @@ -7621,7 +7631,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr, last_item = MLX5_FLOW_LAYER_ICMP6; break; case RTE_FLOW_ITEM_TYPE_TAG: - ret = flow_dv_validate_item_tag(dev, items, + ret = flow_dv_validate_item_tag(dev, items, &tag_bitmap, attr, error); if (ret < 0) return ret; @@ -7631,6 +7641,13 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr, last_item = MLX5_FLOW_ITEM_SQ; break; case MLX5_RTE_FLOW_ITEM_TYPE_TAG: + mlx5_tag = (const struct mlx5_rte_flow_item_tag *)items->spec; + if (tag_bitmap & (1 << mlx5_tag->id)) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM_SPEC, + items->spec, + "Duplicated tag index"); + tag_bitmap |= 1 << mlx5_tag->id; break; case RTE_FLOW_ITEM_TYPE_GTP: ret = flow_dv_validate_item_gtp(dev, items, item_flags, -- 2.27.0
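The duplicate check added here boils down to a small bitmap "claim" helper; a hypothetical standalone sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the duplicate-tag-index check from the patch above: each
 * register index may appear at most once per flow, tracked in a 32-bit
 * bitmap. Illustrative helper, not the mlx5 validation code. */
static int tag_index_claim(uint32_t *bitmap, unsigned int idx)
{
	if (idx >= 32)
		return -1;		/* out of bitmap range */
	if (*bitmap & (1u << idx))
		return -1;		/* duplicated tag index */
	*bitmap |= 1u << idx;
	return 0;
}
```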
[PATCH v1 1/2] net/mlx5: disallow duplicated tag index in pattern template
Duplicated tag index in pattern template will most likely cause matching failures such as "template tag index is 0 data mask 0xff / tag index is 0 data mask 0x / end" If the upper layer application needs to match the same tag twice with different masks, it should be consolidated into one rte_item with the desired mask. "template tag index is 0 data mask 0xff / tag index is 0 data mask 0xff00 / end" should be present as "template tag index is 0 data mask 0x / end" Cc: sta...@dpdk.org Signed-off-by: Rongwei Liu Acked-by: Viacheslav Ovsiienko --- drivers/net/mlx5/mlx5_flow_hw.c | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c index 7e0ee8d883..78011584eb 100644 --- a/drivers/net/mlx5/mlx5_flow_hw.c +++ b/drivers/net/mlx5/mlx5_flow_hw.c @@ -4831,8 +4831,9 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev, struct rte_flow_error *error) { struct mlx5_priv *priv = dev->data->dev_private; - int i; + int i, tag_idx; bool items_end = false; + uint32_t tag_bitmap = 0; if (!attr->ingress && !attr->egress && !attr->transfer) return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL, @@ -4874,16 +4875,26 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev, switch (type) { case RTE_FLOW_ITEM_TYPE_TAG: { - int reg; const struct rte_flow_item_tag *tag = (const struct rte_flow_item_tag *)items[i].spec; - reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, tag->index); - if (reg == REG_NON) + if (tag == NULL) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, + NULL, + "Tag spec is NULL"); + tag_idx = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, tag->index); + if (tag_idx == REG_NON) return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, "Unsupported tag index"); + if (tag_bitmap & (1 << tag_idx)) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, + NULL, + "Duplicated tag index"); + tag_bitmap |= 1 << tag_idx; break; 
} case MLX5_RTE_FLOW_ITEM_TYPE_TAG: @@ -4897,6 +4908,12 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, "Unsupported internal tag index"); + if (tag_bitmap & (1 << tag->index)) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, + NULL, + "Duplicated tag index"); + tag_bitmap |= 1 << tag->index; break; } case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT: -- 2.27.0
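The consolidation rule from the commit message can be sketched as a small helper (hypothetical, for illustration): merging two partial-mask matches on the same tag ORs the masks and the masked values, so mask 0xff plus mask 0xff00 becomes one item with mask 0xffff:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of consolidating two matches on the same tag index into one
 * rte_flow item, as the commit message above recommends. Illustrative
 * helper, not part of the PMD. */
static void tag_merge(uint32_t *val, uint32_t *mask,
		      uint32_t v2, uint32_t m2)
{
	*val  = (*val & *mask) | (v2 & m2);
	*mask |= m2;
}
```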
RE: [PATCH] examples/l3fwd: add hard code to collect empty poll and NIC counters
Thanks for the comment. > -Original Message- > From: Stephen Hemminger > Sent: Thursday, May 11, 2023 11:45 PM > To: Feifei Wang > Cc: dev@dpdk.org; nd ; Lijian Zhang > ; Ruifeng Wang ; > Honnappa Nagarahalli > Subject: Re: [PATCH] examples/l3fwd: add hard code to collect empty poll and > NIC counters > > On Thu, 11 May 2023 16:25:19 +0800 > Feifei Wang wrote: > > > This patch is to collect empty poll of 'rte_eth_rx_burst' functions in > > dpdk l3fwd application. Empty poll means Rx burst function receives no > > pkts in one loop. > > > > Furthermore, we also add 'nic_xstats_display' API to show NIC counters. > > > > Usage: > > With this patch, no special settings, just run l3fwd, and when you > > stoping l3fwd, thread will print the info above. > > > > Note: > > This patch has just a slight impact on performance and can be ignored. > > There was a set of other patches around telemetry. > Shouldn't this example use some of that rather than "roll your own"? But for 'empty poll', it seems have no API to collect this. By the way, would you please share the API or patch which can do like this? Thanks very much. > > > > > dpdk version:23.03 > > > > Suggested-by: Lijian Zhang > > Signed-off-by: Feifei Wang > > Reviewed-by: Ruifeng Wang > > Reviewed-by: Honnappa Nagarahalli > > --- > > examples/l3fwd/l3fwd.h | 68 > ++ > > examples/l3fwd/l3fwd_lpm.c | 26 +-- > > examples/l3fwd/main.c | 22 > > 3 files changed, 114 insertions(+), 2 deletions(-) > > > > diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h index > > b55855c932..2b3fca62f3 100644 > > --- a/examples/l3fwd/l3fwd.h > > +++ b/examples/l3fwd/l3fwd.h > > @@ -56,6 +56,17 @@ > > #define L3FWD_HASH_ENTRIES (1024*1024*1) > > #endif > > > > +struct lcore_stats { > > +uint32_t nb_rx_pkts[16]; > > +uint32_t num_loop[16]; > > +uint32_t none_loop[16]; > > +uint32_t no_full_loop[16]; > > +float none_loop_per[16]; > > +float no_full_loop_per[16]; > > +} __rte_cache_aligned; > > What is the 16 magic value? 
I think MAX_RX_QUEUE_PER_LCORE should be used here instead of 16.

> > Use double instead of float to keep more accuracy?

Agree.

> > There may be holes in this structure?
> > You may want to allocate it at runtime based on number of actual cores
> > and get it on the right NUMA node.

Ok. So maybe each lcore is responsible to define this struct for itself?

> >
> > +
> > +extern struct lcore_stats stats[RTE_MAX_LCORE];
> > +
> >  struct parm_cfg {
> >      const char *rule_ipv4_name;
> >      const char *rule_ipv6_name;
> > @@ -115,6 +126,63 @@ extern struct acl_algorithms acl_alg[];
> >
> >  extern uint32_t max_pkt_len;
> >
> > +static inline void
> > +nic_xstats_display(uint32_t port_id)
> > +{
> > +    struct rte_eth_xstat *xstats;
> > +    int cnt_xstats, idx_xstat;
> > +    struct rte_eth_xstat_name *xstats_names;
> > +
> > +    printf("## NIC extended statistics for port %-2d\n", port_id);
> > +    if (!rte_eth_dev_is_valid_port(port_id)) {
> > +        fprintf(stderr, "Error: Invalid port number %i\n", port_id);
> > +        return;
> > +    }
> > +
> > +    /* Get count */
> > +    cnt_xstats = rte_eth_xstats_get_names(port_id, NULL, 0);
> > +    if (cnt_xstats < 0) {
> > +        fprintf(stderr, "Error: Cannot get count of xstats\n");
> > +        return;
> > +    }
> > +
> > +    /* Get id-name lookup table */
> > +    xstats_names = malloc(sizeof(struct rte_eth_xstat_name) * cnt_xstats);
> > +    if (xstats_names == NULL) {
> > +        fprintf(stderr, "Cannot allocate memory for xstats lookup\n");
> > +        return;
> > +    }
> > +    if (cnt_xstats != rte_eth_xstats_get_names(
> > +            port_id, xstats_names, cnt_xstats)) {
> > +        fprintf(stderr, "Error: Cannot get xstats lookup\n");
> > +        free(xstats_names);
> > +        return;
> > +    }
> > +
> > +    /* Get stats themselves */
> > +    xstats = malloc(sizeof(struct rte_eth_xstat) * cnt_xstats);
> > +    if (xstats == NULL) {
> > +        fprintf(stderr, "Cannot allocate memory for xstats\n");
> > +        free(xstats_names);
> > +        return;
> > +    }
> > +    if (cnt_xstats != rte_eth_xstats_get(port_id, xstats, cnt_xstats)) {
> > +        fprintf(stderr, "Error: Unable to get xstats\n");
> > +        free(xstats_names);
> > +        free(xstats);
> > +        return;
> > +    }
> > +
> > +    /* Display xstats */
> > +    for (idx_xstat = 0; idx_xstat < cnt_xstats; idx_xstat++) {
> > +        printf("%s: %"PRIu64"\n",
> > +            xstats_names[idx_xstat].name,
> > +            xstats[idx_xstat].value);
> > +    }
> > +    free(xstats_names);
[PATCH v1 0/7] ethdev: modify field API for multiple headers
This patch set extends the modify field action API to support multiple MPLS headers and GENEVE option headers.

In the current API, the header type is provided by the rte_flow_field_id enumeration and the encapsulation level (inner/outer/tunnel) is specified by the data.level field. However, there is no way to specify a header inside an encapsulation level. For example, for this packet:

    eth / mpls / mpls / mpls / ipv4 / udp

neither the second nor the third MPLS header can be modified using this API.

RFC: https://patchwork.dpdk.org/project/dpdk/cover/20230420092145.522389-1-michae...@nvidia.com/

Michael Baum (7):
  doc: fix blank lines in modify field action description
  doc: fix blank line in asynchronous operations description
  doc: fix wrong indentation in RSS action description
  net/mlx5: reduce modify field encapsulation level size
  ethdev: add GENEVE TLV option modification support
  ethdev: add MPLS header modification support
  net/mlx5: add MPLS modify field support

 app/test-pmd/cmdline_flow.c            | 70 -
 doc/guides/prog_guide/rte_flow.rst     | 38 +++---
 doc/guides/rel_notes/release_23_07.rst |  3 ++
 drivers/common/mlx5/mlx5_prm.h         |  5 ++
 drivers/net/mlx5/mlx5_flow_dv.c        | 23 
 drivers/net/mlx5/mlx5_flow_hw.c        | 38 +++---
 lib/ethdev/rte_flow.h                  | 72 --
 7 files changed, 219 insertions(+), 30 deletions(-)

-- 
2.25.1
[PATCH v1 1/7] doc: fix blank lines in modify field action description
The modify field action description inside the "Generic flow API (rte_flow)" documentation lists all operations supported for a destination field. In addition, it lists the values supported for an encapsulation level field.

In both cases, a blank line is missing before the list, causing it to look like regular text lines. This patch adds the blank lines.

Fixes: 73b68f4c54a0 ("ethdev: introduce generic modify flow action")
Cc: akozy...@nvidia.com
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum
---
 doc/guides/prog_guide/rte_flow.rst | 15 +-
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 32fc45516a..e7faa368a1 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2917,20 +2917,23 @@ The immediate value ``RTE_FLOW_FIELD_VALUE`` (or a pointer to it
 ``RTE_FLOW_FIELD_START`` is used to point to the beginning of a packet.
 See ``enum rte_flow_field_id`` for the list of supported fields.
 
-``op`` selects the operation to perform on a destination field.
+``op`` selects the operation to perform on a destination field:
+
 - ``set`` copies the data from ``src`` field to ``dst`` field.
 - ``add`` adds together ``dst`` and ``src`` and stores the result into ``dst``.
-- ``sub`` subtracts ``src`` from ``dst`` and stores the result into ``dst``
+- ``sub`` subtracts ``src`` from ``dst`` and stores the result into ``dst``.
 
 ``width`` defines a number of bits to use from ``src`` field.
 
 ``level`` is used to access any packet field on any encapsulation level
-as well as any tag element in the tag array.
-- ``0`` means the default behaviour. Depending on the packet type, it can
-mean outermost, innermost or anything in between.
+as well as any tag element in the tag array:
+
+- ``0`` means the default behaviour. Depending on the packet type,
+  it can mean outermost, innermost or anything in between.
 - ``1`` requests access to the outermost packet encapsulation level.
 - ``2`` and subsequent values requests access to the specified packet
-encapsulation level, from outermost to innermost (lower to higher values).
+  encapsulation level, from outermost to innermost (lower to higher values).
+
 For the tag array (in case of multiple tags are supported and present)
 ``level`` translates directly into the array index.
-- 
2.25.1
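The reStructuredText rule behind this fix: a bullet list is only recognized when a blank line separates it from the preceding paragraph; without one, the dash lines are rendered as ordinary paragraph text. A minimal illustration (not copied from the patch itself):

```rst
.. Without a separating blank line the dashes are parsed as plain text:

``op`` selects the operation to perform on a destination field.
- ``set`` copies the data from ``src`` field to ``dst`` field.

.. With the blank line, Sphinx renders a proper bullet list:

``op`` selects the operation to perform on a destination field:

- ``set`` copies the data from ``src`` field to ``dst`` field.
```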
[PATCH v1 2/7] doc: fix blank line in asynchronous operations description
The asynchronous operations description inside the "Generic flow API (rte_flow)" documentation adds some bullets to describe asynchronous operation behavior. A blank line is missing before the first bullet, causing it to look like a regular text line. This patch adds the blank line.

Fixes: 197e820c6685 ("ethdev: bring in async queue-based flow rules operations")
Cc: akozy...@nvidia.com
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum
---
 doc/guides/prog_guide/rte_flow.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index e7faa368a1..76e69190fc 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3702,6 +3702,7 @@ Asynchronous operations
 ---
 
 Flow rules management can be done via special lockless flow management queues.
+
 - Queue operations are asynchronous and not thread-safe.
 
 - Operations can thus be invoked by the app's datapath,
-- 
2.25.1
[PATCH v1 3/7] doc: fix wrong indentation in RSS action description
The RSS action description inside the "Generic flow API (rte_flow)" documentation lists the values supported for an encapsulation level field. For the value "2", it uses 3 spaces as indentation instead of 2 after the line break, causing the first line to be rendered bold. This patch fixes the number of spaces in the indentation.

Fixes: 18aee2861a1f ("ethdev: add encap level to RSS flow API action")
Cc: adrien.mazarg...@6wind.com
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum
---
 doc/guides/prog_guide/rte_flow.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 76e69190fc..25b57bf86d 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1954,8 +1954,8 @@ Also, regarding packet encapsulation ``level``:
   level.
 
 - ``2`` and subsequent values request RSS to be performed on the specified
-   inner packet encapsulation level, from outermost to innermost (lower to
-   higher values).
+  inner packet encapsulation level, from outermost to innermost (lower to
+  higher values).
 
 Values other than ``0`` are not necessarily supported.
-- 
2.25.1
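The reStructuredText rule behind this fix: continuation lines of a bullet item must align exactly with the text after the "- " marker (2 spaces); a 3-space indent no longer belongs to the bullet, which makes Sphinx mis-render the item. A minimal illustration (not copied from the patch itself):

```rst
.. A continuation indented by 3 spaces does not line up with the
   bullet text, so the item is rendered incorrectly:

- ``2`` and subsequent values request RSS to be performed on the specified
   inner packet encapsulation level.

.. Aligning the continuation with the text after "- " fixes it:

- ``2`` and subsequent values request RSS to be performed on the specified
  inner packet encapsulation level.
```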
[PATCH v1 4/7] net/mlx5: reduce modify field encapsulation level size
The type of the "level" field in the "rte_flow_action_modify_data" structure is uint32_t for now, but it is going to be changed to uint8_t in the next patch. For representing an encapsulation level, 8 bits are more than enough, and this change should not affect the current implementation.

However, when an action template is created, the PMD requests this field "fully masked" in the action mask. The "fully masked" value is different between the uint32_t and uint8_t types.

This patch changes all modify field encapsulation level "fully masked" initializations to use UINT8_MAX instead of UINT32_MAX, which avoids a compilation warning once the field is changed to uint8_t by the API.

Signed-off-by: Michael Baum
---
 drivers/net/mlx5/mlx5_flow_hw.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 7e0ee8d883..1b68a19900 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3565,7 +3565,7 @@ flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
         return rte_flow_error_set(error, EINVAL,
                 RTE_FLOW_ERROR_TYPE_ACTION, action,
                 "immediate value, pointer and hash result cannot be used as destination");
-    if (mask_conf->dst.level != UINT32_MAX)
+    if (mask_conf->dst.level != UINT8_MAX)
         return rte_flow_error_set(error, EINVAL,
                 RTE_FLOW_ERROR_TYPE_ACTION, action,
                 "destination encapsulation level must be fully masked");
@@ -3579,7 +3579,7 @@ flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
                 "destination field mask and template are not equal");
     if (action_conf->src.field != RTE_FLOW_FIELD_POINTER &&
         action_conf->src.field != RTE_FLOW_FIELD_VALUE) {
-        if (mask_conf->src.level != UINT32_MAX)
+        if (mask_conf->src.level != UINT8_MAX)
             return rte_flow_error_set(error, EINVAL,
                     RTE_FLOW_ERROR_TYPE_ACTION, action,
                     "source encapsulation level must be fully masked");
@@ -4450,7 +4450,7 @@ flow_hw_set_vlan_vid(struct rte_eth_dev *dev,
         .operation = RTE_FLOW_MODIFY_SET,
         .dst = {
             .field = RTE_FLOW_FIELD_VLAN_ID,
-            .level = 0x, .offset = 0x,
+            .level = 0xff, .offset = 0x,
         },
         .src = {
             .field = RTE_FLOW_FIELD_VALUE,
@@ -4583,12 +4583,12 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
         .operation = RTE_FLOW_MODIFY_SET,
         .dst = {
             .field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-            .level = UINT32_MAX,
+            .level = UINT8_MAX,
             .offset = UINT32_MAX,
         },
         .src = {
             .field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-            .level = UINT32_MAX,
+            .level = UINT8_MAX,
             .offset = UINT32_MAX,
         },
         .width = UINT32_MAX,
@@ -5653,7 +5653,7 @@ flow_hw_create_tx_repr_tag_jump_acts_tmpl(struct rte_eth_dev *dev)
         .operation = RTE_FLOW_MODIFY_SET,
         .dst = {
             .field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-            .level = UINT32_MAX,
+            .level = UINT8_MAX,
             .offset = UINT32_MAX,
         },
         .src = {
@@ -5677,12 +5677,12 @@ flow_hw_create_tx_repr_tag_jump_acts_tmpl(struct rte_eth_dev *dev)
         .operation = RTE_FLOW_MODIFY_SET,
         .dst = {
             .field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-            .level = UINT32_MAX,
+            .level = UINT8_MAX,
             .offset = UINT32_MAX,
         },
         .src = {
             .field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-            .level = UINT32_MAX,
+            .level = UINT8_MAX,
             .offset = UINT32_MAX,
         },
         .width = UINT32_MAX,
@@ -6009,7 +6009,7 @@ flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
         .operation = RTE_FLOW_MODIFY_SET,
         .dst = {
             .field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-            .level = UINT32_MAX,
+            .level = UINT8_MAX,
             .offset = UINT32_MAX,
         },
         .src = {
@@ -618
[PATCH v1 6/7] ethdev: add MPLS header modification support
Add support for MPLS header modification using the "RTE_FLOW_FIELD_MPLS" ID.

Since the MPLS header might appear more than once in inner/outer/tunnel, a new field was added to the "rte_flow_action_modify_data" structure in addition to the "level" field. The "sub_level" field is the index of the header inside the encapsulation level. It is used to modify multiple MPLS headers in the same encapsulation level.

This addition enables modifying multiple VLAN headers too, so the description of "RTE_FLOW_FIELD_VLAN_" was updated.

Signed-off-by: Michael Baum
---
 app/test-pmd/cmdline_flow.c        | 24 ++-
 doc/guides/prog_guide/rte_flow.rst |  6
 lib/ethdev/rte_flow.h              | 47 --
 3 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 8c1dea53c0..5370be1e3c 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -636,6 +636,7 @@ enum index {
     ACTION_MODIFY_FIELD_DST_TYPE_VALUE,
     ACTION_MODIFY_FIELD_DST_LEVEL,
     ACTION_MODIFY_FIELD_DST_LEVEL_VALUE,
+    ACTION_MODIFY_FIELD_DST_SUB_LEVEL,
     ACTION_MODIFY_FIELD_DST_TYPE_ID,
     ACTION_MODIFY_FIELD_DST_CLASS_ID,
     ACTION_MODIFY_FIELD_DST_OFFSET,
@@ -643,6 +644,7 @@ enum index {
     ACTION_MODIFY_FIELD_SRC_TYPE_VALUE,
     ACTION_MODIFY_FIELD_SRC_LEVEL,
     ACTION_MODIFY_FIELD_SRC_LEVEL_VALUE,
+    ACTION_MODIFY_FIELD_SRC_SUB_LEVEL,
     ACTION_MODIFY_FIELD_SRC_TYPE_ID,
     ACTION_MODIFY_FIELD_SRC_CLASS_ID,
     ACTION_MODIFY_FIELD_SRC_OFFSET,
@@ -859,7 +861,7 @@ static const char *const modify_field_ids[] = {
     "ipv6_proto", "flex_item",
     "hash_result",
-    "geneve_opt_type", "geneve_opt_class", "geneve_opt_data",
+    "geneve_opt_type", "geneve_opt_class", "geneve_opt_data", "mpls",
     NULL
 };
 
@@ -2301,6 +2303,7 @@ static const enum index next_action_sample[] = {
 
 static const enum index action_modify_field_dst[] = {
     ACTION_MODIFY_FIELD_DST_LEVEL,
+    ACTION_MODIFY_FIELD_DST_SUB_LEVEL,
     ACTION_MODIFY_FIELD_DST_TYPE_ID,
     ACTION_MODIFY_FIELD_DST_CLASS_ID,
     ACTION_MODIFY_FIELD_DST_OFFSET,
@@ -2310,6 +2313,7 @@ static const enum index action_modify_field_dst[] = {
 
 static const enum index action_modify_field_src[] = {
     ACTION_MODIFY_FIELD_SRC_LEVEL,
+    ACTION_MODIFY_FIELD_SRC_SUB_LEVEL,
     ACTION_MODIFY_FIELD_SRC_TYPE_ID,
     ACTION_MODIFY_FIELD_SRC_CLASS_ID,
     ACTION_MODIFY_FIELD_SRC_OFFSET,
@@ -6398,6 +6402,15 @@ static const struct token token_list[] = {
         .call = parse_vc_modify_field_level,
         .comp = comp_none,
     },
+    [ACTION_MODIFY_FIELD_DST_SUB_LEVEL] = {
+        .name = "dst_sub_level",
+        .help = "destination field sub level",
+        .next = NEXT(action_modify_field_dst,
+                     NEXT_ENTRY(COMMON_UNSIGNED)),
+        .args = ARGS(ARGS_ENTRY(struct rte_flow_action_modify_field,
+                                dst.sub_level)),
+        .call = parse_vc_conf,
+    },
     [ACTION_MODIFY_FIELD_DST_TYPE_ID] = {
         .name = "dst_type_id",
         .help = "destination field type ID",
@@ -6451,6 +6464,15 @@ static const struct token token_list[] = {
         .call = parse_vc_modify_field_level,
         .comp = comp_none,
     },
+    [ACTION_MODIFY_FIELD_SRC_SUB_LEVEL] = {
+        .name = "src_sub_level",
+        .help = "source field sub level",
+        .next = NEXT(action_modify_field_src,
+                     NEXT_ENTRY(COMMON_UNSIGNED)),
+        .args = ARGS(ARGS_ENTRY(struct rte_flow_action_modify_field,
+                                src.sub_level)),
+        .call = parse_vc_conf,
+    },
     [ACTION_MODIFY_FIELD_SRC_TYPE_ID] = {
         .name = "src_type_id",
         .help = "source field type ID",
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index cd38f0de46..1f681a38e4 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2937,6 +2937,10 @@ as well as any tag element in the tag array:
 For the tag array (in case of multiple tags are supported and present)
 ``level`` translates directly into the array index.
 
+- ``sub_level`` is the index of the header inside the encapsulation level.
+  It is used to modify either ``VLAN`` or ``MPLS`` headers, multiple of
+  which might be supported in the same encapsulation level.
+
 ``type`` is used to specify (along with ``class_id``) the Geneve option
 which is being modified.
 This field is relevant only for ``RTE_FLOW_FIELD_GENEVE_OPT_`` type.
@@ -3002,6 +3006,8 @@ value as sequence of bytes {xxx, xxx, 0x85, xxx, xxx, xxx}.
 +-+--+
 | ``lev
[PATCH v1 5/7] ethdev: add GENEVE TLV option modification support
Add modify field support for the GENEVE option fields:
 - "RTE_FLOW_FIELD_GENEVE_OPT_TYPE"
 - "RTE_FLOW_FIELD_GENEVE_OPT_CLASS"
 - "RTE_FLOW_FIELD_GENEVE_OPT_DATA"

Each GENEVE TLV option is identified by both its "class" and "type", so 2 new fields were added to the "rte_flow_action_modify_data" structure to help specify which option to modify.

To make room for those 2 new fields, the "level" field moves to "uint8_t", which is more than enough for an encapsulation level.

Signed-off-by: Michael Baum
---
 app/test-pmd/cmdline_flow.c            | 48 +++-
 doc/guides/prog_guide/rte_flow.rst     | 12 ++
 doc/guides/rel_notes/release_23_07.rst |  3 ++
 lib/ethdev/rte_flow.h                  | 51 +-
 4 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 58939ec321..8c1dea53c0 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -636,11 +636,15 @@ enum index {
     ACTION_MODIFY_FIELD_DST_TYPE_VALUE,
     ACTION_MODIFY_FIELD_DST_LEVEL,
     ACTION_MODIFY_FIELD_DST_LEVEL_VALUE,
+    ACTION_MODIFY_FIELD_DST_TYPE_ID,
+    ACTION_MODIFY_FIELD_DST_CLASS_ID,
     ACTION_MODIFY_FIELD_DST_OFFSET,
     ACTION_MODIFY_FIELD_SRC_TYPE,
     ACTION_MODIFY_FIELD_SRC_TYPE_VALUE,
     ACTION_MODIFY_FIELD_SRC_LEVEL,
     ACTION_MODIFY_FIELD_SRC_LEVEL_VALUE,
+    ACTION_MODIFY_FIELD_SRC_TYPE_ID,
+    ACTION_MODIFY_FIELD_SRC_CLASS_ID,
     ACTION_MODIFY_FIELD_SRC_OFFSET,
     ACTION_MODIFY_FIELD_SRC_VALUE,
     ACTION_MODIFY_FIELD_SRC_POINTER,
@@ -854,7 +858,9 @@ static const char *const modify_field_ids[] = {
     "ipv4_ecn", "ipv6_ecn", "gtp_psc_qfi", "meter_color",
     "ipv6_proto", "flex_item",
-    "hash_result", NULL
+    "hash_result",
+    "geneve_opt_type", "geneve_opt_class", "geneve_opt_data",
+    NULL
 };
 
 static const char *const meter_colors[] = {
@@ -2295,6 +2301,8 @@ static const enum index next_action_sample[] = {
 
 static const enum index action_modify_field_dst[] = {
     ACTION_MODIFY_FIELD_DST_LEVEL,
+    ACTION_MODIFY_FIELD_DST_TYPE_ID,
+    ACTION_MODIFY_FIELD_DST_CLASS_ID,
     ACTION_MODIFY_FIELD_DST_OFFSET,
     ACTION_MODIFY_FIELD_SRC_TYPE,
     ZERO,
@@ -2302,6 +2310,8 @@ static const enum index action_modify_field_dst[] = {
 
 static const enum index action_modify_field_src[] = {
     ACTION_MODIFY_FIELD_SRC_LEVEL,
+    ACTION_MODIFY_FIELD_SRC_TYPE_ID,
+    ACTION_MODIFY_FIELD_SRC_CLASS_ID,
     ACTION_MODIFY_FIELD_SRC_OFFSET,
     ACTION_MODIFY_FIELD_SRC_VALUE,
     ACTION_MODIFY_FIELD_SRC_POINTER,
@@ -6388,6 +6398,24 @@ static const struct token token_list[] = {
         .call = parse_vc_modify_field_level,
         .comp = comp_none,
     },
+    [ACTION_MODIFY_FIELD_DST_TYPE_ID] = {
+        .name = "dst_type_id",
+        .help = "destination field type ID",
+        .next = NEXT(action_modify_field_dst,
+                     NEXT_ENTRY(COMMON_UNSIGNED)),
+        .args = ARGS(ARGS_ENTRY(struct rte_flow_action_modify_field,
+                                dst.type)),
+        .call = parse_vc_conf,
+    },
+    [ACTION_MODIFY_FIELD_DST_CLASS_ID] = {
+        .name = "dst_class",
+        .help = "destination field class ID",
+        .next = NEXT(action_modify_field_dst,
+                     NEXT_ENTRY(COMMON_UNSIGNED)),
+        .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_action_modify_field,
+                                     dst.class_id)),
+        .call = parse_vc_conf,
+    },
     [ACTION_MODIFY_FIELD_DST_OFFSET] = {
         .name = "dst_offset",
         .help = "destination field bit offset",
@@ -6423,6 +6451,24 @@ static const struct token token_list[] = {
         .call = parse_vc_modify_field_level,
         .comp = comp_none,
     },
+    [ACTION_MODIFY_FIELD_SRC_TYPE_ID] = {
+        .name = "src_type_id",
+        .help = "source field type ID",
+        .next = NEXT(action_modify_field_src,
+                     NEXT_ENTRY(COMMON_UNSIGNED)),
+        .args = ARGS(ARGS_ENTRY(struct rte_flow_action_modify_field,
+                                src.type)),
+        .call = parse_vc_conf,
+    },
+    [ACTION_MODIFY_FIELD_SRC_CLASS_ID] = {
+        .name = "src_class",
+        .help = "source field class ID",
+        .next = NEXT(action_modify_field_src,
+                     NEXT_ENTRY(COMMON_UNSIGNED)),
+        .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_action_modify_field,
+                                     src.class_id)),
+        .call = parse_vc_conf,
+    },
     [ACTION_MODIFY_FIELD_SRC_OFFSET] = {
         .name = "src_offset",
[PATCH v1 7/7] net/mlx5: add MPLS modify field support
Add support for modify field in the tunnel MPLS header. For now, only copying from it is supported.

Signed-off-by: Michael Baum
---
 drivers/common/mlx5/mlx5_prm.h  |  5 +
 drivers/net/mlx5/mlx5_flow_dv.c | 23 +++
 drivers/net/mlx5/mlx5_flow_hw.c | 16 +---
 3 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index ed3d5efbb7..04c1400a1e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -787,6 +787,11 @@ enum mlx5_modification_field {
     MLX5_MODI_TUNNEL_HDR_DW_1 = 0x75,
     MLX5_MODI_GTPU_FIRST_EXT_DW_0 = 0x76,
     MLX5_MODI_HASH_RESULT = 0x81,
+    MLX5_MODI_IN_MPLS_LABEL_0 = 0x8a,
+    MLX5_MODI_IN_MPLS_LABEL_1,
+    MLX5_MODI_IN_MPLS_LABEL_2,
+    MLX5_MODI_IN_MPLS_LABEL_3,
+    MLX5_MODI_IN_MPLS_LABEL_4,
     MLX5_MODI_OUT_IPV6_NEXT_HDR = 0x4A,
     MLX5_MODI_INVALID = INT_MAX,
 };
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index f136f43b0a..93cce16a1e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1388,6 +1388,7 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
     case RTE_FLOW_FIELD_GENEVE_VNI:
         return 24;
     case RTE_FLOW_FIELD_GTP_TEID:
+    case RTE_FLOW_FIELD_MPLS:
     case RTE_FLOW_FIELD_TAG:
         return 32;
     case RTE_FLOW_FIELD_MARK:
@@ -1435,6 +1436,12 @@ flow_modify_info_mask_32_masked(uint32_t length, uint32_t off, uint32_t post_mask)
     return rte_cpu_to_be_32(mask & post_mask);
 }
 
+static __rte_always_inline enum mlx5_modification_field
+mlx5_mpls_modi_field_get(const struct rte_flow_action_modify_data *data)
+{
+    return MLX5_MODI_IN_MPLS_LABEL_0 + data->sub_level;
+}
+
 static void
 mlx5_modify_flex_item(const struct rte_eth_dev *dev,
                       const struct mlx5_flex_item *flex,
@@ -1893,6 +1900,16 @@ mlx5_flow_field_id_to_modify_info
         else
             info[idx].offset = off_be;
         break;
+    case RTE_FLOW_FIELD_MPLS:
+        MLX5_ASSERT(data->offset + width <= 32);
+        off_be = 32 - (data->offset + width);
+        info[idx] = (struct field_modify_info){4, 0,
+                mlx5_mpls_modi_field_get(data)};
+        if (mask)
+            mask[idx] = flow_modify_info_mask_32(width, off_be);
+        else
+            info[idx].offset = off_be;
+        break;
     case RTE_FLOW_FIELD_TAG:
         {
             MLX5_ASSERT(data->offset + width <= 32);
@@ -5344,6 +5361,12 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
                     RTE_FLOW_ERROR_TYPE_ACTION, action,
                     "modifications of the GENEVE Network"
                     " Identifier is not supported");
+    if (action_modify_field->dst.field == RTE_FLOW_FIELD_MPLS ||
+        action_modify_field->src.field == RTE_FLOW_FIELD_MPLS)
+        return rte_flow_error_set(error, ENOTSUP,
+                RTE_FLOW_ERROR_TYPE_ACTION, action,
+                "modifications of the MPLS header "
+                "is not supported");
     if (action_modify_field->dst.field == RTE_FLOW_FIELD_MARK ||
         action_modify_field->src.field == RTE_FLOW_FIELD_MARK)
         if (config->dv_xmeta_en == MLX5_XMETA_MODE_LEGACY ||
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 1b68a19900..80e6398992 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3546,10 +3546,8 @@ flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
                                      const struct rte_flow_action *mask,
                                      struct rte_flow_error *error)
 {
-    const struct rte_flow_action_modify_field *action_conf =
-        action->conf;
-    const struct rte_flow_action_modify_field *mask_conf =
-        mask->conf;
+    const struct rte_flow_action_modify_field *action_conf = action->conf;
+    const struct rte_flow_action_modify_field *mask_conf = mask->conf;
 
     if (action_conf->operation != mask_conf->operation)
         return rte_flow_error_set(error, EINVAL,
@@ -3604,6 +3602,11 @@ flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
         return rte_flow_error_set(error, EINVAL,
             RTE_FLOW_ERROR_TYPE_ACTION, action,
             "modifying Geneve VNI is not supported");
+    /* Due to HW bug, tunnel MPLS header is read only. */
+    if (action_conf->dst.field == RTE_FLOW_FIELD_MPLS)
+        return rte_flow_error_set(error, EINVAL,
+            RTE_FLOW_ERROR_TYPE_A