Re: [PATCH] net/dpaa2: change threshold value

2023-05-15 Thread Sachin Saxena (OSS)

On 5/8/2023 4:11 PM, Tianli Lai wrote:



This threshold value can be changed with the function argument nb_rx_desc.

Signed-off-by: Tianli Lai 
---
  drivers/net/dpaa2/dpaa2_ethdev.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
index 679f33ae1a..ad892ded4a 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.c
+++ b/drivers/net/dpaa2/dpaa2_ethdev.c
@@ -829,7 +829,7 @@ dpaa2_dev_rx_queue_setup(struct rte_eth_dev *dev,
 dpaa2_q->cgid, &taildrop);
 } else {
 /*enabling per rx queue congestion control */
-   taildrop.threshold = CONG_THRESHOLD_RX_BYTES_Q;
+   taildrop.threshold = nb_rx_desc * 1024;
 taildrop.units = DPNI_CONGESTION_UNIT_BYTES;
 taildrop.oal = CONG_RX_OAL;
 DPAA2_PMD_DEBUG("Enabling Byte based Drop on queue= %d",
--
2.27.0


Hi Tianli,

The byte-based tail drop threshold value
"CONG_THRESHOLD_RX_BYTES_Q" is an optimized value for the dpaa2 platform. We
concluded on this value after multiple benchmark experiments in the past.
The frame-based threshold, on the other hand, is based on "nb_rx_desc" only.

We will further review this suggestion and get back.
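For reference, a minimal sketch of the two schemes discussed here; the constant below is a hypothetical stand-in, not the driver's real CONG_THRESHOLD_RX_BYTES_Q value, and the 1024 bytes-per-descriptor scaling is simply what the patch proposes:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in; the real CONG_THRESHOLD_RX_BYTES_Q constant
 * lives in the dpaa2 driver headers and may have a different value. */
#define HYPOTHETICAL_CONG_THRESHOLD_RX_BYTES_Q (64 * 1024)

/* Byte-based threshold as proposed in the patch: scale with the ring
 * size (1024 bytes per descriptor) instead of a fixed constant. */
static inline uint32_t
taildrop_threshold_bytes(uint16_t nb_rx_desc)
{
	return (uint32_t)nb_rx_desc * 1024;
}
```

With a 512-descriptor ring this yields a 512 KiB threshold, which is why a benchmarked fixed constant and a ring-scaled value can diverge significantly.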

--
Thanks,
Sachin Saxena
(NXP)



RE: [v1 1/3] net/mlx5/hws: support dest root table action

2023-05-15 Thread Matan Azrad


From: Hamdan Igbaria
> Add support for dest root table action creation.
> This support allows redirecting the packets to the kernel from non-root
> tables.
> 
> Signed-off-by: Hamdan Igbaria 
> Reviewed-by: Alex Vesker 
Series-acked-by: Matan Azrad 


[PATCH v2 0/4] Support VFIO sparse mmap in PCI bus

2023-05-15 Thread Miao Li
This series introduces a VFIO standard capability, called sparse
mmap, to the PCI bus. In the Linux kernel, it is defined as
VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means that instead of
mmap-ing the whole BAR region into the DPDK process, only part of the
BAR region is mmap-ed, after getting the sparse mmap information from
the kernel. The DPDK process can access the rest of the BAR region,
which is not mmap-ed, with the pread/pwrite system calls. Sparse mmap
is useful when the kernel does not want userspace to mmap the whole BAR
region, or wants control over access to specific BAR regions. Vendors
can choose whether to enable this feature for their devices in their
specific kernel modules.

In this patchset:

Patches 1-3 mainly introduce BAR access APIs so that
drivers can use them to access a specific BAR with the pread/pwrite
system calls when part of the BAR is not mmap-able. Patch 4
finally adds the VFIO sparse mmap support.
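To illustrate the idea, here is a small self-contained sketch (not the actual bus/pci code) of deciding when an access falls inside an mmap-ed sparse area and when the pread/pwrite fallback is needed; the area layout is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One mmap-able window of a BAR, as reported by the sparse mmap cap. */
struct sparse_area {
	uint64_t offset;
	uint64_t size;
};

/* True if [offset, offset + len) lies entirely inside one mmap-ed
 * area; otherwise the caller falls back to pread()/pwrite(). */
static bool
access_is_mmapable(const struct sparse_area *areas, size_t nr_areas,
		   uint64_t offset, uint64_t len)
{
	size_t i;

	for (i = 0; i < nr_areas; i++) {
		if (offset >= areas[i].offset &&
		    offset + len <= areas[i].offset + areas[i].size)
			return true;
	}
	return false;
}

/* Hypothetical layout: two 4 KiB mmap-able windows in a 64 KiB BAR. */
static const struct sparse_area example_areas[] = {
	{ 0x0000, 0x1000 },
	{ 0x4000, 0x1000 },
};
```

An access straddling a window boundary counts as not mmap-able, matching the "whole access or fallback" behavior a driver would want.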

v2:
1. add PCI device internal structure in bus/pci/windows/pci.c
2. fix parameter type error

Chenbo Xia (3):
  bus/pci: introduce an internal representation of PCI device
  bus/pci: avoid depending on private value in kernel source
  bus/pci: introduce helper for MMIO read and write

Miao Li (1):
  bus/pci: add VFIO sparse mmap support

 drivers/bus/pci/bsd/pci.c|  35 +++-
 drivers/bus/pci/linux/pci.c  |  78 +--
 drivers/bus/pci/linux/pci_init.h |  14 +-
 drivers/bus/pci/linux/pci_uio.c  |  22 ++
 drivers/bus/pci/linux/pci_vfio.c | 335 +--
 drivers/bus/pci/pci_common.c |  12 +-
 drivers/bus/pci/private.h|  25 ++-
 drivers/bus/pci/rte_bus_pci.h|  48 +
 drivers/bus/pci/version.map  |   3 +
 drivers/bus/pci/windows/pci.c|  14 +-
 lib/eal/include/rte_vfio.h   |   1 -
 11 files changed, 491 insertions(+), 96 deletions(-)

-- 
2.25.1



[PATCH v2 1/4] bus/pci: introduce an internal representation of PCI device

2023-05-15 Thread Miao Li
From: Chenbo Xia 

This patch introduces an internal representation of the PCI device
which will be used to store the internal information that does not
have to be exposed to drivers, e.g., the VFIO region sizes/offsets.

In this patch, the internal structure is simply a wrapper of the
rte_pci_device structure. More fields will be added.
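The wrapper pattern can be sketched as follows; the struct and field names here are illustrative stand-ins, not the actual DPDK definitions, but the key invariant is the same: the public struct is the first member, so the internal representation can be recovered from the public pointer (as the RTE_PCI_DEVICE_INTERNAL-style macros do):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for rte_pci_device and its internal wrapper. */
struct pub_device {
	int id;
};

struct internal_device {
	struct pub_device device;	/* must stay the first member */
	unsigned long region_size;	/* internal-only detail */
};

/* Recover the internal representation from the public pointer. */
static struct internal_device *
to_internal(struct pub_device *dev)
{
	return (struct internal_device *)((char *)dev -
			offsetof(struct internal_device, device));
}

static int
round_trip_ok(void)
{
	static struct internal_device idev;

	return to_internal(&idev.device) == &idev;
}
```

Because the public struct sits at offset 0, malloc/free of the wrapper and casts between the two pointers stay well-defined.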

Signed-off-by: Chenbo Xia 
---
 drivers/bus/pci/bsd/pci.c | 13 -
 drivers/bus/pci/linux/pci.c   | 28 
 drivers/bus/pci/pci_common.c  | 12 ++--
 drivers/bus/pci/private.h | 14 +-
 drivers/bus/pci/windows/pci.c | 14 +-
 5 files changed, 52 insertions(+), 29 deletions(-)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 7459d15c7e..a747eca58c 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -208,16 +208,19 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 static int
 pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 {
+   struct rte_pci_device_internal *pdev;
struct rte_pci_device *dev;
struct pci_bar_io bar;
unsigned i, max;
 
-   dev = malloc(sizeof(*dev));
-   if (dev == NULL) {
+   pdev = malloc(sizeof(*pdev));
+   if (pdev == NULL) {
+   RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
return -1;
}
 
-   memset(dev, 0, sizeof(*dev));
+   memset(pdev, 0, sizeof(*pdev));
+   dev = &pdev->device;
dev->device.bus = &rte_pci_bus.bus;
 
dev->addr.domain = conf->pc_sel.pc_domain;
@@ -303,7 +306,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
memmove(dev2->mem_resource,
dev->mem_resource,
sizeof(dev->mem_resource));
-   pci_free(dev);
+   pci_free(pdev);
}
return 0;
}
@@ -313,7 +316,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
return 0;
 
 skipdev:
-   pci_free(dev);
+   pci_free(pdev);
return 0;
 }
 
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index ebd1395502..4c2c5ba382 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -211,22 +211,26 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 {
char filename[PATH_MAX];
unsigned long tmp;
+   struct rte_pci_device_internal *pdev;
struct rte_pci_device *dev;
char driver[PATH_MAX];
int ret;
 
-   dev = malloc(sizeof(*dev));
-   if (dev == NULL)
+   pdev = malloc(sizeof(*pdev));
+   if (pdev == NULL) {
+   RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
return -1;
+   }
 
-   memset(dev, 0, sizeof(*dev));
+   memset(pdev, 0, sizeof(*pdev));
+   dev = &pdev->device;
dev->device.bus = &rte_pci_bus.bus;
dev->addr = *addr;
 
/* get vendor id */
snprintf(filename, sizeof(filename), "%s/vendor", dirname);
if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-   pci_free(dev);
+   pci_free(pdev);
return -1;
}
dev->id.vendor_id = (uint16_t)tmp;
@@ -234,7 +238,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
/* get device id */
snprintf(filename, sizeof(filename), "%s/device", dirname);
if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-   pci_free(dev);
+   pci_free(pdev);
return -1;
}
dev->id.device_id = (uint16_t)tmp;
@@ -243,7 +247,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
snprintf(filename, sizeof(filename), "%s/subsystem_vendor",
 dirname);
if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-   pci_free(dev);
+   pci_free(pdev);
return -1;
}
dev->id.subsystem_vendor_id = (uint16_t)tmp;
@@ -252,7 +256,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
snprintf(filename, sizeof(filename), "%s/subsystem_device",
 dirname);
if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-   pci_free(dev);
+   pci_free(pdev);
return -1;
}
dev->id.subsystem_device_id = (uint16_t)tmp;
@@ -261,7 +265,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
snprintf(filename, sizeof(filename), "%s/class",
 dirname);
if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-   pci_free(dev);
+   pci_free(pdev);
return -1;
}
/* the least 24 bits are valid: class, subclass, program interface */
@@ -297,7 +

[PATCH v2 2/4] bus/pci: avoid depending on private value in kernel source

2023-05-15 Thread Miao Li
From: Chenbo Xia 

The value 40 used in VFIO_GET_REGION_ADDR() is a private value
(VFIO_PCI_OFFSET_SHIFT) defined in Linux kernel source [1]. It
is not part of VFIO API, and we should not depend on it.

[1] https://github.com/torvalds/linux/blob/v6.2/include/linux/vfio_pci_core.h
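A simplified sketch of the change: the old macro bakes in the kernel-private shift, while the portable approach queries per-region size/offset once via VFIO_DEVICE_GET_REGION_INFO and caches them. The lookup below models only the cached values (no real ioctl is issued), and the cache contents are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme: hard-codes the kernel-private VFIO_PCI_OFFSET_SHIFT (40),
 * i.e. region N is assumed to start at offset N << 40 in the device fd. */
#define OLD_VFIO_GET_REGION_ADDR(x) ((uint64_t)(x) << 40)

/* New scheme (modeled): per-region size/offset come from the kernel
 * and are cached; lookups then use only the cached values. */
struct region_info {
	uint64_t size;
	uint64_t offset;
};

static int
get_region(const struct region_info *cache, int nr, int index,
	   uint64_t *size, uint64_t *offset)
{
	if (index >= nr)
		return -1;
	if (cache[index].size == 0 && cache[index].offset == 0)
		return -1;	/* region absent or never queried */
	*size = cache[index].size;
	*offset = cache[index].offset;
	return 0;
}

/* Hypothetical cache: region 0 present, region 1 absent. */
static int
example_lookup(void)
{
	static const struct region_info cache[2] = {
		{ 4096, 1ULL << 40 },
		{ 0, 0 },
	};
	uint64_t size, offset;

	return get_region(cache, 2, 0, &size, &offset) == 0 &&
	       get_region(cache, 2, 1, &size, &offset) == -1;
}
```

If the kernel ever changed its internal offset layout, only the cached values would change; callers of the lookup would be unaffected.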

Signed-off-by: Chenbo Xia 
---
 drivers/bus/pci/linux/pci.c  |   4 +-
 drivers/bus/pci/linux/pci_init.h |   4 +-
 drivers/bus/pci/linux/pci_vfio.c | 195 +++
 drivers/bus/pci/private.h|   9 ++
 lib/eal/include/rte_vfio.h   |   1 -
 5 files changed, 158 insertions(+), 55 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 4c2c5ba382..04e21ae20f 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -645,7 +645,7 @@ int rte_pci_read_config(const struct rte_pci_device *device,
return pci_uio_read_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
case RTE_PCI_KDRV_VFIO:
-   return pci_vfio_read_config(intr_handle, buf, len, offset);
+   return pci_vfio_read_config(device, buf, len, offset);
 #endif
default:
rte_pci_device_name(&device->addr, devname,
@@ -669,7 +669,7 @@ int rte_pci_write_config(const struct rte_pci_device *device,
return pci_uio_write_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
case RTE_PCI_KDRV_VFIO:
-   return pci_vfio_write_config(intr_handle, buf, len, offset);
+   return pci_vfio_write_config(device, buf, len, offset);
 #endif
default:
rte_pci_device_name(&device->addr, devname,
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index dcea726186..9f6659ba6e 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -66,9 +66,9 @@ int pci_uio_ioport_unmap(struct rte_pci_ioport *p);
 #endif
 
 /* access config space */
-int pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_read_config(const struct rte_pci_device *dev,
 void *buf, size_t len, off_t offs);
-int pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_write_config(const struct rte_pci_device *dev,
  const void *buf, size_t len, off_t offs);
 
 int pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index fab3483d9f..1748ad2ae0 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -43,45 +43,82 @@ static struct rte_tailq_elem rte_vfio_tailq = {
 };
 EAL_REGISTER_TAILQ(rte_vfio_tailq)
 
+static int
+pci_vfio_get_region(const struct rte_pci_device *dev, int index,
+   uint64_t *size, uint64_t *offset)
+{
+   const struct rte_pci_device_internal *pdev =
+   RTE_PCI_DEVICE_INTERNAL_CONST(dev);
+
+   if (index >= VFIO_PCI_NUM_REGIONS || index >= RTE_MAX_PCI_REGIONS)
+   return -1;
+
+   if (pdev->region[index].size == 0 && pdev->region[index].offset == 0)
+   return -1;
+
+   *size   = pdev->region[index].size;
+   *offset = pdev->region[index].offset;
+
+   return 0;
+}
+
 int
-pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_read_config(const struct rte_pci_device *dev,
void *buf, size_t len, off_t offs)
 {
-   int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+   uint64_t size, offset;
+   int fd;
 
-   if (vfio_dev_fd < 0)
+   fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+   if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+   &size, &offset) != 0)
+   return -1;
+
+   if ((uint64_t)len + offs > size)
return -1;
 
-   return pread64(vfio_dev_fd, buf, len,
-  VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+   return pread64(fd, buf, len, offset + offs);
 }
 
 int
-pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_write_config(const struct rte_pci_device *dev,
const void *buf, size_t len, off_t offs)
 {
-   int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+   uint64_t size, offset;
+   int fd;
 
-   if (vfio_dev_fd < 0)
+   fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+   if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+   &size, &offset) != 0)
return -1;
 
-   return pwrite64(vfio_dev_fd, buf, len,
-  VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+   if ((uint64_t)len + offs > size)
+   return -1;
+
+   return pwrite64(fd, buf, len, offset + offs);
 }
 
 /* get PCI BAR number where MSI-X interrupts are */
 static int
-pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
+pci_vfi

[PATCH v2 3/4] bus/pci: introduce helper for MMIO read and write

2023-05-15 Thread Miao Li
From: Chenbo Xia 

The MMIO regions may not be mmap-able for VFIO-PCI devices.
In this case, the driver should explicitly use read and write
calls to access these regions.
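The helper boils down to a bounds check plus a copy. A self-contained sketch against a fake in-memory BAR (not the actual rte_pci_mmio_read() implementation, which dispatches per kernel driver):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* A fake in-memory "BAR" stands in for a real mapped PCI resource. */
static uint8_t fake_bar[64];

/* Bounds-check the BAR, then copy -- the essence of the new helper. */
static ssize_t
mmio_read(void *buf, size_t len, off_t offset)
{
	if ((uint64_t)offset + len > sizeof(fake_bar))
		return -1;
	memcpy(buf, fake_bar + offset, len);
	return (ssize_t)len;
}

/* Read 4 bytes at offset 8 and return the first one. */
static int
demo(void)
{
	uint8_t buf[4] = {0};

	fake_bar[8] = 0x2a;
	if (mmio_read(buf, sizeof(buf), 8) != (ssize_t)sizeof(buf))
		return -1;
	return buf[0];
}

/* A read crossing the end of the BAR must be rejected. */
static int
oob_rejected(void)
{
	uint8_t buf[4];

	return mmio_read(buf, sizeof(buf), 62) == -1;
}
```

The `(uint64_t)offset + len` cast mirrors the patch: it avoids signed overflow in the bounds check for large offsets.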

Signed-off-by: Chenbo Xia 
---
 drivers/bus/pci/bsd/pci.c| 22 +++
 drivers/bus/pci/linux/pci.c  | 46 ++
 drivers/bus/pci/linux/pci_init.h | 10 +++
 drivers/bus/pci/linux/pci_uio.c  | 22 +++
 drivers/bus/pci/linux/pci_vfio.c | 36 
 drivers/bus/pci/rte_bus_pci.h| 48 
 drivers/bus/pci/version.map  |  3 ++
 7 files changed, 187 insertions(+)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index a747eca58c..27f12590d4 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -489,6 +489,28 @@ int rte_pci_write_config(const struct rte_pci_device *dev,
return -1;
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *dev, int bar,
+ void *buf, size_t len, off_t offset)
+{
+   if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+   (uint64_t)offset + len > dev->mem_resource[bar].len)
+   return -1;
+   memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+   return len;
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *dev, int bar,
+  const void *buf, size_t len, off_t offset)
+{
+   if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+   (uint64_t)offset + len > dev->mem_resource[bar].len)
+   return -1;
+   memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+   return len;
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04e21ae20f..3d237398d9 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -680,6 +680,52 @@ int rte_pci_write_config(const struct rte_pci_device *device,
}
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *device, int bar,
+   void *buf, size_t len, off_t offset)
+{
+   char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+   switch (device->kdrv) {
+   case RTE_PCI_KDRV_IGB_UIO:
+   case RTE_PCI_KDRV_UIO_GENERIC:
+   return pci_uio_mmio_read(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+   case RTE_PCI_KDRV_VFIO:
+   return pci_vfio_mmio_read(device, bar, buf, len, offset);
+#endif
+   default:
+   rte_pci_device_name(&device->addr, devname,
+   RTE_DEV_NAME_MAX_LEN);
+   RTE_LOG(ERR, EAL,
+   "Unknown driver type for %s\n", devname);
+   return -1;
+   }
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *device, int bar,
+   const void *buf, size_t len, off_t offset)
+{
+   char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+   switch (device->kdrv) {
+   case RTE_PCI_KDRV_IGB_UIO:
+   case RTE_PCI_KDRV_UIO_GENERIC:
+   return pci_uio_mmio_write(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+   case RTE_PCI_KDRV_VFIO:
+   return pci_vfio_mmio_write(device, bar, buf, len, offset);
+#endif
+   default:
+   rte_pci_device_name(&device->addr, devname,
+   RTE_DEV_NAME_MAX_LEN);
+   RTE_LOG(ERR, EAL,
+   "Unknown driver type for %s\n", devname);
+   return -1;
+   }
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index 9f6659ba6e..d842809ccd 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -37,6 +37,11 @@ int pci_uio_read_config(const struct rte_intr_handle *intr_handle,
 int pci_uio_write_config(const struct rte_intr_handle *intr_handle,
 const void *buf, size_t len, off_t offs);
 
+int pci_uio_mmio_read(const struct rte_pci_device *dev, int bar,
+   void *buf, size_t len, off_t offset);
+int pci_uio_mmio_write(const struct rte_pci_device *dev, int bar,
+   const void *buf, size_t len, off_t offset);
+
 int pci_uio_ioport_map(struct rte_pci_device *dev, int bar,
   struct rte_pci_ioport *p);
 void pci_uio_ioport_read(struct rte_pci_ioport *p,
@@ -71,6 +76,11 @@ int pci_vfio_read_config(const struct rte_pci_device *dev,
 int pci_vfio_write_config(const struct rte_pci_device *dev,
  const void *buf, size_t len, off_t offs);
 
+int pci_vfio_mmio_read(const struct rte_pci_device *dev, int b

[PATCH v2 4/4] bus/pci: add VFIO sparse mmap support

2023-05-15 Thread Miao Li
This patch adds sparse mmap support in the PCI bus. Sparse mmap is a
capability defined in VFIO which allows multiple mmap areas in one
VFIO region.

In this patch, the sparse mmap regions are mapped to one contiguous
virtual address region that follows the device-specific BAR layout. So,
the driver can still access all mapped sparse mmap regions by using
'bar_base_address + bar_offset'.

Signed-off-by: Miao Li 
Signed-off-by: Chenbo Xia 
---
 drivers/bus/pci/linux/pci_vfio.c | 104 +++
 drivers/bus/pci/private.h|   2 +
 2 files changed, 94 insertions(+), 12 deletions(-)

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index f6289c907f..dd57566083 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -673,6 +673,54 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
return 0;
 }
 
+static int
+pci_vfio_sparse_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
+   int bar_index, int additional_flags)
+{
+   struct pci_map *bar = &vfio_res->maps[bar_index];
+   struct vfio_region_sparse_mmap_area *sparse;
+   void *bar_addr;
+   uint32_t i;
+
+   if (bar->size == 0) {
+   RTE_LOG(DEBUG, EAL, "Bar size is 0, skip BAR%d\n", bar_index);
+   return 0;
+   }
+
+   /* reserve the address using an inaccessible mapping */
+   bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
+   MAP_ANONYMOUS | additional_flags, -1, 0);
+   if (bar_addr != MAP_FAILED) {
+   void *map_addr = NULL;
+   for (i = 0; i < bar->nr_areas; i++) {
+   sparse = &bar->areas[i];
+   if (sparse->size) {
+   void *addr = RTE_PTR_ADD(bar_addr, (uintptr_t)sparse->offset);
+   map_addr = pci_map_resource(addr, vfio_dev_fd,
+   bar->offset + sparse->offset, sparse->size,
+   RTE_MAP_FORCE_ADDRESS);
+   if (map_addr == NULL) {
+   munmap(bar_addr, bar->size);
+   RTE_LOG(ERR, EAL, "Failed to map pci BAR%d\n",
+   bar_index);
+   goto err_map;
+   }
+   }
+   }
+   } else {
+   RTE_LOG(ERR, EAL, "Failed to create inaccessible mapping for BAR%d\n",
+   bar_index);
+   goto err_map;
+   }
+
+   bar->addr = bar_addr;
+   return 0;
+
+err_map:
+   bar->nr_areas = 0;
+   return -1;
+}
+
 /*
  * region info may contain capability headers, so we need to keep reallocating
  * the memory until we match allocated memory size with argsz.
@@ -875,6 +923,8 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 
for (i = 0; i < vfio_res->nb_maps; i++) {
void *bar_addr;
+   struct vfio_info_cap_header *hdr;
+   struct vfio_region_info_cap_sparse_mmap *sparse;
 
ret = pci_vfio_get_region_info(vfio_dev_fd, ®, i);
if (ret < 0) {
@@ -920,12 +970,33 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
maps[i].size = reg->size;
maps[i].path = NULL; /* vfio doesn't have per-resource paths */
 
-   ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
-   if (ret < 0) {
-   RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
-   pci_addr, i, strerror(errno));
-   free(reg);
-   goto err_vfio_res;
+   hdr = pci_vfio_info_cap(reg, VFIO_REGION_INFO_CAP_SPARSE_MMAP);
+
+   if (hdr != NULL) {
+   sparse = container_of(hdr,
+   struct vfio_region_info_cap_sparse_mmap, header);
+   if (sparse->nr_areas > 0) {
+   maps[i].nr_areas = sparse->nr_areas;
+   maps[i].areas = sparse->areas;
+   }
+   }
+
+   if (maps[i].nr_areas > 0) {
+   ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+   if (ret < 0) {
+   RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n",
+   pci_addr, i, strerror(errno));
+   free(reg);
+   goto err_vfio_res;
+   }
+   } else {
+   ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+   if (ret < 0) {
+   RTE_LOG(ERR, EAL, "%s m

RE: [PATCH] examples/ptpclient: add signal handler for cleanup

2023-05-15 Thread Rahul Bhansali
Ping.

> -Original Message-
> From: Rahul Bhansali
> Sent: Friday, January 20, 2023 11:26 AM
> To: 'dev@dpdk.org' ; 'Kirill Rybalchenko'
> 
> Subject: RE: [PATCH] examples/ptpclient: add signal handler for cleanup
> 
> Ping.
> 
> > -Original Message-
> > From: Rahul Bhansali
> > Sent: Wednesday, November 2, 2022 10:21 PM
> > To: dev@dpdk.org; Kirill Rybalchenko 
> > Subject: RE: [PATCH] examples/ptpclient: add signal handler for
> > cleanup
> >
> > Ping.
> >
> > > -Original Message-
> > > From: Rahul Bhansali 
> > > Sent: Wednesday, August 31, 2022 12:19 PM
> > > To: dev@dpdk.org; Kirill Rybalchenko 
> > > Cc: Rahul Bhansali 
> > > Subject: [PATCH] examples/ptpclient: add signal handler for cleanup
> > >
> > > This adds a signal handler for SIGINT and SIGTERM.
> > > Also, it will break out of the infinite loop and do cleanup once it
> > > receives any of the registered signals.
> > >
> > > Signed-off-by: Rahul Bhansali 
> > > ---
> > >  examples/ptpclient/ptpclient.c | 32 ++--
> > >  1 file changed, 30 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/examples/ptpclient/ptpclient.c b/examples/ptpclient/ptpclient.c
> > > index 1f1c9c9c52..8b69716be1 100644
> > > --- a/examples/ptpclient/ptpclient.c
> > > +++ b/examples/ptpclient/ptpclient.c
> > > @@ -19,6 +19,9 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > > +
> > > +static volatile bool force_quit;
> > >
> > >  #define RX_RING_SIZE 1024
> > >  #define TX_RING_SIZE 1024
> > > @@ -609,7 +612,7 @@ parse_ptp_frames(uint16_t portid, struct rte_mbuf *m) {
> > >   * The lcore main. This is the main thread that does the work, reading from an
> > >   * input port and writing to an output port.
> > >   */
> > > -static __rte_noreturn void
> > > +static void
> > >  lcore_main(void)
> > >  {
> > >   uint16_t portid;
> > > @@ -621,7 +624,7 @@ lcore_main(void)
> > >
> > >   /* Run until the application is quit or killed. */
> > >
> > > - while (1) {
> > > + while (!force_quit) {
> > >   /* Read packet from RX queues. 8< */
> > >   for (portid = 0; portid < ptp_enabled_port_nb; portid++) {
> > >
> > > @@ -734,6 +737,13 @@ ptp_parse_args(int argc, char **argv)
> > >   return 0;
> > >  }
> > >
> > > +static void
> > > +signal_handler(int signum)
> > > +{
> > > + if (signum == SIGINT || signum == SIGTERM)
> > > + force_quit = true;
> > > +}
> > > +
> > >  /*
> > >   * The main function, which does initialization and calls the per-lcore
> > >   * functions.
> > > @@ -758,6 +768,10 @@ main(int argc, char *argv[])
> > >   argc -= ret;
> > >   argv += ret;
> > >
> > > + force_quit = false;
> > > + signal(SIGINT, signal_handler);
> > > + signal(SIGTERM, signal_handler);
> > > +
> > >   ret = ptp_parse_args(argc, argv);
> > >   if (ret < 0)
> > >   rte_exit(EXIT_FAILURE, "Error with PTP initialization\n");
> > > @@ -802,6 +816,20 @@ main(int argc, char *argv[])
> > >   /* Call lcore_main on the main core only. */
> > >   lcore_main();
> > >
> > > + RTE_ETH_FOREACH_DEV(portid) {
> > > + if ((ptp_enabled_port_mask & (1 << portid)) == 0)
> > > + continue;
> > > +
> > > + /* Disable timesync timestamping for the Ethernet device */
> > > + rte_eth_timesync_disable(portid);
> > > +
> > > + ret = rte_eth_dev_stop(portid);
> > > + if (ret != 0)
> > > + printf("rte_eth_dev_stop: err=%d, port=%d\n", ret,
> > > portid);
> > > +
> > > + rte_eth_dev_close(portid);
> > > + }
> > > +
> > >   /* clean up the EAL */
> > >   rte_eal_cleanup();
> > >
> > > --
> > > 2.25.1



Re: [PATCH v2] vfio: do not coalesce DMA mappings

2023-05-15 Thread David Marchand
On Wed, Jan 4, 2023 at 6:19 AM Nipun Gupta  wrote:
> At cleanup time, when DMA unmap is done, the Linux kernel
> does not allow unmapping individual segments which were
> coalesced together while creating the DMA map for type1 IOMMU
> mappings. So, this change updates the mapping of the memory
> segments (hugepages) on a per-page basis.
>
> Signed-off-by: Nipun Gupta 
> Signed-off-by: Nikhil Agarwal 
Reviewed-by: Anatoly Burakov 


Applied, thanks.


-- 
David Marchand



[PATCH v2] vhost: avoid sleeping under mutex

2023-05-15 Thread David Marchand
Covscan reported:

 2. dpdk-21.11/lib/vhost/socket.c:852: lock_acquire: Calling function
"pthread_mutex_lock" acquires lock "vhost_user.mutex".
23. dpdk-21.11/lib/vhost/socket.c:955: sleep: Call to
"vhost_user_reconnect_init" might sleep while holding lock
"vhost_user.mutex".
 #   953|   vsocket->reconnect =
!(flags & RTE_VHOST_USER_NO_RECONNECT);
 #   954|   if (vsocket->reconnect && reconn_tid == 0) {
 #   955|-> if (vhost_user_reconnect_init() != 0)
 #   956|   goto out_mutex;
 #   957|   }

The reason for this warning is that vhost_user_reconnect_init() creates a
ctrl thread and calls nanosleep waiting for this thread to be ready,
while vhost_user.mutex is taken.

Move the call to vhost_user_reconnect_init() out of this mutex.

While at it, a pthread_t value should be considered opaque.
Instead of relying on reconn_tid == 0, use an internal flag in
vhost_user_reconnect_init().

Coverity issue: 373686
Bugzilla ID: 981
Fixes: e623e0c6d8a5 ("vhost: add reconnect ability")
Cc: sta...@dpdk.org

Signed-off-by: David Marchand 
---
Changes since v1:
- moved reconn_tid in vhost_user_reconnect_init as this
  variable is not used anywhere else,

---
 lib/vhost/socket.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index 669c322e12..00a912c59e 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -427,7 +427,6 @@ struct vhost_user_reconnect_list {
 };
 
 static struct vhost_user_reconnect_list reconn_list;
-static pthread_t reconn_tid;
 
 static int
 vhost_user_connect_nonblock(char *path, int fd, struct sockaddr *un, size_t sz)
@@ -498,8 +497,13 @@ vhost_user_client_reconnect(void *arg __rte_unused)
 static int
 vhost_user_reconnect_init(void)
 {
+   static bool reconn_init_done;
+   static pthread_t reconn_tid;
int ret;
 
+   if (reconn_init_done)
+   return 0;
+
ret = pthread_mutex_init(&reconn_list.mutex, NULL);
if (ret < 0) {
VHOST_LOG_CONFIG("thread", ERR, "%s: failed to initialize mutex\n", __func__);
@@ -515,6 +519,8 @@ vhost_user_reconnect_init(void)
VHOST_LOG_CONFIG("thread", ERR,
"%s: failed to destroy reconnect mutex\n",
__func__);
+   } else {
+   reconn_init_done = true;
}
 
return ret;
@@ -866,6 +872,11 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
if (!path)
return -1;
 
+   if ((flags & RTE_VHOST_USER_CLIENT) != 0 &&
+   (flags & RTE_VHOST_USER_NO_RECONNECT) == 0 &&
+   vhost_user_reconnect_init() != 0)
+   return -1;
+
pthread_mutex_lock(&vhost_user.mutex);
 
if (vhost_user.vsocket_cnt == MAX_VHOST_SOCKET) {
@@ -961,11 +972,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
}
 
if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
-   vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
-   if (vsocket->reconnect && reconn_tid == 0) {
-   if (vhost_user_reconnect_init() != 0)
-   goto out_mutex;
-   }
+   vsocket->reconnect = (flags & RTE_VHOST_USER_NO_RECONNECT) == 0;
} else {
vsocket->is_server = true;
}
-- 
2.40.0
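The once-init idea used in vhost_user_reconnect_init() above can be reduced to this small sketch: a static flag inside the function makes it idempotent, so callers no longer need to test an opaque pthread_t against 0 (the counter merely stands in for the real ctrl-thread setup):

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how many times the "expensive" setup actually ran. */
static int init_calls;

/* Idempotent init: a static flag inside the function replaces the
 * fragile reconn_tid == 0 test on an opaque pthread_t. */
static int
reconnect_init(void)
{
	static bool init_done;

	if (init_done)
		return 0;

	init_calls++;	/* stands in for the real thread creation */
	init_done = true;
	return 0;
}
```

Note this sketch, like the patch, assumes callers are serialized or the first call happens before concurrency starts; the patch achieves that by calling init before taking vhost_user.mutex in a single registration path.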



Re: [PATCH] vhost: avoid sleeping under mutex

2023-05-15 Thread David Marchand
On Thu, May 11, 2023 at 10:10 AM Xia, Chenbo  wrote:
> > Covscan reported:
> >
> >  2. dpdk-21.11/lib/vhost/socket.c:852: lock_acquire: Calling function
> >   "pthread_mutex_lock" acquires lock "vhost_user.mutex".
> > 23. dpdk-21.11/lib/vhost/socket.c:955: sleep: Call to
> >   "vhost_user_reconnect_init" might sleep while holding lock
> >   "vhost_user.mutex".
> >  #   953| vsocket->reconnect =
> >   !(flags & RTE_VHOST_USER_NO_RECONNECT);
> >  #   954| if (vsocket->reconnect && reconn_tid == 0) {
> >  #   955|->   if (vhost_user_reconnect_init() != 0)
> >  #   956| goto out_mutex;
> >  #   957| }
> >
> > The reason for this warning is that vhost_user_reconnect_init() creates a
> > ctrl thread and calls nanosleep waiting for this thread to be ready,
> > while vhost_user.mutex is taken.
> >
> > Move the call to vhost_user_reconnect_init() out of this mutex.
> >
> > While at it, a pthread_t value should be considered opaque.
> > Instead of relying on reconn_tid == 0, use an internal flag in
> > vhost_user_reconnect_init().
>
> Should we make reconn_tid not a global variable anymore then?
> Maybe a static pthread_t in vhost_user_reconnect_init?

Yep, I just sent a v2.
Thanks Chenbo.


-- 
David Marchand



RE: [EXT] Re: [PATCH v7] mem: telemetry support for memseg and element information

2023-05-15 Thread Amit Prakash Shukla
Hi Anatoly and David,

Could you review this patch, this is pending since October-22. If there are no 
comments, please accept the patch to make forward progress.

Thanks,
Amit Shukla

> -Original Message-
> From: Amit Prakash Shukla 
> Sent: Tuesday, February 28, 2023 1:00 PM
> To: Thomas Monjalon ; dev@dpdk.org
> Cc: Anatoly Burakov ; Jerin Jacob
> Kollanukkaran ; david.march...@redhat.com;
> bruce.richard...@intel.com; ciara.po...@intel.com;
> dmitry.kozl...@gmail.com
> Subject: RE: [EXT] Re: [PATCH v7] mem: telemetry support for memseg and
> element information
> 
> Hi Thomas,
> 
> No changes pending from my side.
> 
> Thanks,
> Amit Shukla
> 
> > -Original Message-
> > From: Thomas Monjalon 
> > Sent: Monday, February 20, 2023 4:41 PM
> > To: dev@dpdk.org
> > Cc: Anatoly Burakov ; Jerin Jacob
> > Kollanukkaran ; david.march...@redhat.com;
> > bruce.richard...@intel.com; ciara.po...@intel.com;
> > dmitry.kozl...@gmail.com; Amit Prakash Shukla
> > 
> > Subject: [EXT] Re: [PATCH v7] mem: telemetry support for memseg and
> > element information
> >
> > 25/10/2022 15:02, Amit Prakash Shukla:
> > > This change adds telemetry support to display memory occupancy in
> > > memseg and the information of the elements allocated from a memseg
> > > based on arguments provided by the user.
> >
> > There was no comment since October.
> > Can we merge?
> >
> >



Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts

2023-05-15 Thread Jerin Jacob
On Fri, May 12, 2023 at 7:26 PM Morten Brørup  wrote:
>
> > From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
> > Sent: Friday, 12 May 2023 15.15
> >
> > On 2023-05-12 13:59, Jerin Jacob wrote:
> > > On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
> > >  wrote:
> > >>
> > >> Use non-burst event enqueue and dequeue calls from burst enqueue and
> > >> dequeue only when the burst size is compile-time constant (and equal
> > >> to one).
> > >>
> > >> Signed-off-by: Mattias Rönnblom 
> > >>
> > >> ---
> > >>
> > >> v3: Actually include the change v2 claimed to contain.
> > >> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
> > >>  application is compiled with -pedantic. (Morten Brørup)
> > >> ---
> > >>   lib/eventdev/rte_eventdev.h | 4 ++--
> > >>   1 file changed, 2 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> > >> index a90e23ac8b..a471caeb6d 100644
> > >> --- a/lib/eventdev/rte_eventdev.h
> > >> +++ b/lib/eventdev/rte_eventdev.h
> > >> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
> > >>   * Allow zero cost non burst mode routine invocation if application
> > >>   * requests nb_events as const one
> > >>   */
> > >> -   if (nb_events == 1)
> > >> +   if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
> > >
> > > "Why" part is not clear from the commit message. Is this to avoid
> > > nb_events read if it is built-in const.
> >
> > The __builtin_constant_p() is introduced to avoid having the compiler
> > generate a conditional branch and two different code paths in case
> > nb_elem is a run-time variable.
> >
> > In particular, this matters if nb_elems is run-time variable and varies
> > between 1 and some larger value.
> >
> > I should have mentioned this in the commit message.
> >
> > A very slight performance improvement. It also makes the code better
> > match the comment, imo. Zero cost for const one enqueues, but no impact
> > on non-compile-time-constant-length enqueues.
> >
> > Feel free to ignore.
> >
> > > If so, check should be following. Right?
> > >
> > > if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
> > > || nb_events  == 1)
>
> @Mattias: You missed the second part of this comparison, also catching 
> nb_events == 1 with non-constant nb_events.
>
> @Jerin: Such a change has no effect, compared to the original code.


Yes. That was the reason I was skipping __builtin_constant_p in
the initial version.


Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts

2023-05-15 Thread Jerin Jacob
On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom
 wrote:
>
> On 2023-05-12 13:59, Jerin Jacob wrote:
> > On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
> >  wrote:
> >>
> >> Use non-burst event enqueue and dequeue calls from burst enqueue and
> >> dequeue only when the burst size is compile-time constant (and equal
> >> to one).
> >>
> >> Signed-off-by: Mattias Rönnblom 
> >>
> >> ---
> >>
> >> v3: Actually include the change v2 claimed to contain.
> >> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
> >>  application is compiled with -pedantic. (Morten Brørup)
> >> ---
> >>   lib/eventdev/rte_eventdev.h | 4 ++--
> >>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> >> index a90e23ac8b..a471caeb6d 100644
> >> --- a/lib/eventdev/rte_eventdev.h
> >> +++ b/lib/eventdev/rte_eventdev.h
> >> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t 
> >> port_id,
> >>   * Allow zero cost non burst mode routine invocation if 
> >> application
> >>   * requests nb_events as const one
> >>   */
> >> -   if (nb_events == 1)
> >> +   if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 
> >> 1)
> >
> > "Why" part is not clear from the commit message. Is this to avoid
> > nb_events read if it is built-in const.
>
> The __builtin_constant_p() is introduced to avoid having the compiler
> generate a conditional branch and two different code paths in case
> nb_elem is a run-time variable.
>
> In particular, this matters if nb_elems is run-time variable and varies
> between 1 and some larger value.
>
> I should have mentioned this in the commit message.
>
> A very slight performance improvement. It also makes the code better
> match the comment, imo. Zero cost for const one enqueues, but no impact
> on non-compile-time-constant-length enqueues.
>
> Feel free to ignore.


I did some performance comparison of the patch.
A low-end ARM machine shows a 0.7% drop with the single event case. No
regression seen with high-end ARM cores with the single event case.

IMO, optimizing the check for burst mode (the new patch) may not show
any real improvement as the cost is divided by the number of events.
Whereas optimizing the check for the single event case (the current code)
shows better performance with the single event case and no regression
with burst mode as the cost is divided by the number of events.
If you agree, then we can skip this patch.


>
> > If so, check should be following. Right?
> >
> > if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
> > || nb_events  == 1)
> >
> > At least, It was my original intention in the code.
> >
> >
> >
> >>  return (fp_ops->enqueue)(port, ev);
> >>  else
> >>  return fn(port, ev, nb_events);
> >> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t 
> >> port_id, struct rte_event ev[],
> >>   * Allow zero cost non burst mode routine invocation if 
> >> application
> >>   * requests nb_events as const one
> >>   */
> >> -   if (nb_events == 1)
> >> +   if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 
> >> 1)
> >>  return (fp_ops->dequeue)(port, ev, timeout_ticks);
> >>  else
> >>  return (fp_ops->dequeue_burst)(port, ev, nb_events,
> >> --
> >> 2.34.1
> >>
>


Re: [RFC v2 1/3] ethdev: add frequency adjustment API

2023-05-15 Thread Thomas Monjalon
Hello,

03/04/2023 11:22, Simei Su:
> This patch introduces a new timesync API "rte_eth_timesync_adjust_freq"
> which enables frequency adjustment during PTP timesync.

You should explain how it compares with existing functions
like rte_eth_timesync_adjust_time().

[...]
>  /**
> + * Adjust the clock increment rate on an Ethernet device.
> + *
> + * This is usually used in conjunction with other Ethdev timesync functions 
> to
> + * synchronize the device time using the IEEE1588/802.1AS protocol.

All timesync functions have this sentence,
but it does not help to understand how to combine them.

> + *
> + * @param port_id
> + *  The port identifier of the Ethernet device.
> + * @param ppm
> + *  Parts per million with 16-bit fractional field

Sorry I don't understand what this parameter is about.
Probably because I'm not an expert of IEEE1588/802.1AS protocol.
Please make it possible to understand for basic users.

> + *
> + * @return
> + *   - 0: Success.
> + *   - -ENODEV: The port ID is invalid.
> + *   - -EIO: if device is removed.
> + *   - -ENOTSUP: The function is not supported by the Ethernet driver.
> + */
> +int rte_eth_timesync_adjust_freq(uint16_t port_id, int64_t ppm);

In general, I think an effort is needed to make the timesync API
easier to understand.
Example of something to explain: how are the Rx and Tx timestamps of a
port different? How do they differ from rte_eth_timesync_read_time()?

The function rte_eth_read_clock() was provided with a better explanation.

Please make improvements in API documentation as part of this API,
I don't want to get more functions without improving explanation
of older functions.
Thank you




[PATCH v1] doc: updated libcrypto dependencies in QAT guide

2023-05-15 Thread Samina Arshad
 The documentation needs extra steps to explicitly
 show customers how to install libcrypto, which is needed for QAT.
 Add the commands to install libcrypto on
 Ubuntu and RHEL.

Signed-off-by: Samina Arshad 
---
 .mailmap  |  1 +
 doc/guides/cryptodevs/qat.rst | 14 ++
 2 files changed, 15 insertions(+)

diff --git a/.mailmap b/.mailmap
index 0859104404..3c2c1fba83 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1179,6 +1179,7 @@ Salem Sol 
 Sameh Gobriel 
 Sam Grove 
 Samik Gupta 
+Samina Arshad 
 Samuel Gauthier 
 Sangjin Han 
 Sankar Chokkalingam 
diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst
index ef754106a8..a3694f7131 100644
--- a/doc/guides/cryptodevs/qat.rst
+++ b/doc/guides/cryptodevs/qat.rst
@@ -234,6 +234,20 @@ These are the build configuration options affecting QAT, 
and their default value
 Both QAT SYM PMD and QAT ASYM PMD have an external dependency on libcrypto, so 
are not
 built by default.
 
+Ubuntu
+
+.. code-block:: console
+
+  apt install libssl-dev
+
+
+RHEL
+
+.. code-block:: console
+
+  dnf install openssl-devel
+
+
 The QAT compressdev PMD has no external dependencies, so is built by default.
 
 The number of VFs per PF varies - see table below. If multiple QAT packages are
-- 
2.25.1



Re: [PATCH v2] vhost: avoid sleeping under mutex

2023-05-15 Thread Stephen Hemminger
On Mon, 15 May 2023 13:18:44 +0200
David Marchand  wrote:

> Covscan reported:
> 
>  2. dpdk-21.11/lib/vhost/socket.c:852: lock_acquire: Calling function
>   "pthread_mutex_lock" acquires lock "vhost_user.mutex".
> 23. dpdk-21.11/lib/vhost/socket.c:955: sleep: Call to
>   "vhost_user_reconnect_init" might sleep while holding lock
>   "vhost_user.mutex".
>  #   953| vsocket->reconnect =
>   !(flags & RTE_VHOST_USER_NO_RECONNECT);
>  #   954| if (vsocket->reconnect && reconn_tid == 0) {
>  #   955|->   if (vhost_user_reconnect_init() != 0)
>  #   956| goto out_mutex;
>  #   957| }
> 
> The reason for this warning is that vhost_user_reconnect_init() creates a
> ctrl thread and calls nanosleep waiting for this thread to be ready,
> while vhost_user.mutex is taken.
> 
> Move the call to vhost_user_reconnect_init() out of this mutex.
> 
> While at it, a pthread_t value should be considered opaque.
> Instead of relying reconn_tid == 0, use an internal flag in
> vhost_user_reconnect_init().
> 
> Coverity issue: 373686
> Bugzilla ID: 981
> Fixes: e623e0c6d8a5 ("vhost: add reconnect ability")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: David Marchand 
> ---
> Changes since v1:
> - moved reconn_tid in vhost_user_reconnect_init as this
>   variable is not used anywhere else,
> 
> ---
>  lib/vhost/socket.c | 19 +--
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
> index 669c322e12..00a912c59e 100644
> --- a/lib/vhost/socket.c
> +++ b/lib/vhost/socket.c
> @@ -427,7 +427,6 @@ struct vhost_user_reconnect_list {
>  };
>  
>  static struct vhost_user_reconnect_list reconn_list;
> -static pthread_t reconn_tid;
>  
>  static int
>  vhost_user_connect_nonblock(char *path, int fd, struct sockaddr *un, size_t 
> sz)
> @@ -498,8 +497,13 @@ vhost_user_client_reconnect(void *arg __rte_unused)
>  static int
>  vhost_user_reconnect_init(void)
>  {
> + static bool reconn_init_done;
> + static pthread_t reconn_tid;

This would not be thread safe.
Perhaps use a pthread condition variable instead of spinning here.


Re: [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support

2023-05-15 Thread Stephen Hemminger
On Mon, 15 May 2023 06:47:00 +
Miao Li  wrote:

> + map_addr = pci_map_resource(addr, vfio_dev_fd,
> + bar->offset + sparse->offset, 
> sparse->size,
> + RTE_MAP_FORCE_ADDRESS);
> + if (map_addr == NULL) {
> + munmap(bar_addr, bar->size);
> + RTE_LOG(ERR, EAL, "Failed to map pci 
> BAR%d\n",
> + bar_index);

If mmap() fails then printing errno would help diagnose why.


[PATCH] dma/idxd: add support for multi-process when using VFIO

2023-05-15 Thread Bruce Richardson
When using vfio-pci/uio for hardware access, we need to avoid
reinitializing the hardware when mapping from a secondary process.
Instead, just configure the function pointers and reuse the data
mappings from the primary process.

With the code change, update driver doc with the information that
vfio-pci can be used for multi-process support, and explicitly state the
limitation on multi-process support being unavailable when using idxd
kernel driver.

Signed-off-by: Bruce Richardson 
---
 doc/guides/dmadevs/idxd.rst|  5 +
 drivers/dma/idxd/idxd_common.c |  6 --
 drivers/dma/idxd/idxd_pci.c| 30 ++
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/doc/guides/dmadevs/idxd.rst b/doc/guides/dmadevs/idxd.rst
index bdfd3e78ad..f75d1d0a85 100644
--- a/doc/guides/dmadevs/idxd.rst
+++ b/doc/guides/dmadevs/idxd.rst
@@ -35,6 +35,11 @@ Device Setup
 Intel\ |reg| DSA devices can use the IDXD kernel driver or DPDK-supported 
drivers,
 such as ``vfio-pci``. Both are supported by the IDXD PMD.
 
+.. note::
+To use Intel\ |reg| DSA devices in DPDK multi-process applications,
+the devices should be bound to the vfio-pci driver.
+Multi-process is not supported when using the kernel IDXD driver.
+
 Intel\ |reg| DSA devices using IDXD kernel driver
 ~~
 
diff --git a/drivers/dma/idxd/idxd_common.c b/drivers/dma/idxd/idxd_common.c
index 6fe8ad4884..83d53942eb 100644
--- a/drivers/dma/idxd/idxd_common.c
+++ b/drivers/dma/idxd/idxd_common.c
@@ -599,6 +599,10 @@ idxd_dmadev_create(const char *name, struct rte_device 
*dev,
dmadev->fp_obj->completed = idxd_completed;
dmadev->fp_obj->completed_status = idxd_completed_status;
dmadev->fp_obj->burst_capacity = idxd_burst_capacity;
+   dmadev->fp_obj->dev_private = dmadev->data->dev_private;
+
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+   return 0;
 
idxd = dmadev->data->dev_private;
*idxd = *base_idxd; /* copy over the main fields already passed in */
@@ -619,8 +623,6 @@ idxd_dmadev_create(const char *name, struct rte_device *dev,
idxd->batch_idx_ring = (void 
*)&idxd->batch_comp_ring[idxd->max_batches+1];
idxd->batch_iova = rte_mem_virt2iova(idxd->batch_comp_ring);
 
-   dmadev->fp_obj->dev_private = idxd;
-
idxd->dmadev->state = RTE_DMA_DEV_READY;
 
return 0;
diff --git a/drivers/dma/idxd/idxd_pci.c b/drivers/dma/idxd/idxd_pci.c
index 781fa02db3..5fe9314d01 100644
--- a/drivers/dma/idxd/idxd_pci.c
+++ b/drivers/dma/idxd/idxd_pci.c
@@ -309,6 +309,36 @@ idxd_dmadev_probe_pci(struct rte_pci_driver *drv, struct 
rte_pci_device *dev)
IDXD_PMD_INFO("Init %s on NUMA node %d", name, dev->device.numa_node);
dev->device.driver = &drv->driver;
 
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+   char qname[32];
+   int max_qid;
+
+   /* look up queue 0 to get the pci structure */
+   snprintf(qname, sizeof(qname), "%s-q0", name);
+   IDXD_PMD_INFO("Looking up %s\n", qname);
+   ret = idxd_dmadev_create(qname, &dev->device, NULL, 
&idxd_pci_ops);
+   if (ret != 0) {
+   IDXD_PMD_ERR("Failed to create dmadev %s", name);
+   return ret;
+   }
+   qid = rte_dma_get_dev_id_by_name(qname);
+   max_qid = rte_atomic16_read(
+   &((struct idxd_dmadev 
*)rte_dma_fp_objs[qid].dev_private)->u.pci->ref_count);
+
+   /* we have queue 0 done, now configure the rest of the queues */
+   for (qid = 1; qid < max_qid; qid++) {
+   /* add the queue number to each device name */
+   snprintf(qname, sizeof(qname), "%s-q%d", name, qid);
+   IDXD_PMD_INFO("Looking up %s\n", qname);
+   ret = idxd_dmadev_create(qname, &dev->device, NULL, 
&idxd_pci_ops);
+   if (ret != 0) {
+   IDXD_PMD_ERR("Failed to create dmadev %s", 
name);
+   return ret;
+   }
+   }
+   return 0;
+   }
+
if (dev->device.devargs && dev->device.devargs->args[0] != '\0') {
/* if the number of devargs grows beyond just 1, use rte_kvargs 
*/
if (sscanf(dev->device.devargs->args,
-- 
2.39.2



Re: [PATCH v5] app/testpmd: txonly multiflow port change support

2023-05-15 Thread Joshua Washington
Does this patch need anything else to be done before it can be merged? I'm
hoping to get this patch merged as part of the 23.07 release.

Thanks,
 Josh


Re: [PATCH] net/gve: support queue start and stop operations

2023-05-15 Thread Rushil Gupta
Tested-by: Rushil Gupta 

On Mon, May 8, 2023 at 8:07 PM Junfeng Guo  wrote:

> Add support for queue operations for GQI:
>  - gve_rx_queue_start
>  - gve_tx_queue_start
>  - gve_rx_queue_stop
>  - gve_tx_queue_stop
>
> Add support for queue operations for DQO:
>  - gve_rx_queue_start_dqo
>  - gve_tx_queue_start_dqo
>  - gve_rx_queue_stop_dqo
>  - gve_tx_queue_stop_dqo
>
> Also move the funcs of rxq_mbufs_alloc into the corresponding files.
>
> Signed-off-by: Junfeng Guo 
> ---
>  drivers/net/gve/gve_ethdev.c | 166 +++
>  drivers/net/gve/gve_ethdev.h |  36 
>  drivers/net/gve/gve_rx.c |  96 ++--
>  drivers/net/gve/gve_rx_dqo.c |  97 ++--
>  drivers/net/gve/gve_tx.c |  54 ++--
>  drivers/net/gve/gve_tx_dqo.c |  54 ++--
>  6 files changed, 364 insertions(+), 139 deletions(-)
>
> diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
> index 8b6861a24f..1dcb3b3a01 100644
> --- a/drivers/net/gve/gve_ethdev.c
> +++ b/drivers/net/gve/gve_ethdev.c
> @@ -104,81 +104,6 @@ gve_dev_configure(struct rte_eth_dev *dev)
> return 0;
>  }
>
> -static int
> -gve_refill_pages(struct gve_rx_queue *rxq)
> -{
> -   struct rte_mbuf *nmb;
> -   uint16_t i;
> -   int diag;
> -
> -   diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0],
> rxq->nb_rx_desc);
> -   if (diag < 0) {
> -   for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
> -   nmb = rte_pktmbuf_alloc(rxq->mpool);
> -   if (!nmb)
> -   break;
> -   rxq->sw_ring[i] = nmb;
> -   }
> -   if (i < rxq->nb_rx_desc - 1)
> -   return -ENOMEM;
> -   }
> -   rxq->nb_avail = 0;
> -   rxq->next_avail = rxq->nb_rx_desc - 1;
> -
> -   for (i = 0; i < rxq->nb_rx_desc; i++) {
> -   if (rxq->is_gqi_qpl) {
> -   rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(i *
> PAGE_SIZE);
> -   } else {
> -   if (i == rxq->nb_rx_desc - 1)
> -   break;
> -   nmb = rxq->sw_ring[i];
> -   rxq->rx_data_ring[i].addr =
> rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
> -   }
> -   }
> -
> -   rte_write32(rte_cpu_to_be_32(rxq->next_avail), rxq->qrx_tail);
> -
> -   return 0;
> -}
> -
> -static int
> -gve_refill_dqo(struct gve_rx_queue *rxq)
> -{
> -   struct rte_mbuf *nmb;
> -   uint16_t i;
> -   int diag;
> -
> -   diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0],
> rxq->nb_rx_desc);
> -   if (diag < 0) {
> -   rxq->stats.no_mbufs_bulk++;
> -   for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
> -   nmb = rte_pktmbuf_alloc(rxq->mpool);
> -   if (!nmb)
> -   break;
> -   rxq->sw_ring[i] = nmb;
> -   }
> -   if (i < rxq->nb_rx_desc - 1) {
> -   rxq->stats.no_mbufs += rxq->nb_rx_desc - 1 - i;
> -   return -ENOMEM;
> -   }
> -   }
> -
> -   for (i = 0; i < rxq->nb_rx_desc; i++) {
> -   if (i == rxq->nb_rx_desc - 1)
> -   break;
> -   nmb = rxq->sw_ring[i];
> -   rxq->rx_ring[i].buf_addr =
> rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
> -   rxq->rx_ring[i].buf_id = rte_cpu_to_le_16(i);
> -   }
> -
> -   rxq->nb_rx_hold = 0;
> -   rxq->bufq_tail = rxq->nb_rx_desc - 1;
> -
> -   rte_write32(rxq->bufq_tail, rxq->qrx_tail);
> -
> -   return 0;
> -}
> -
>  static int
>  gve_link_update(struct rte_eth_dev *dev, __rte_unused int
> wait_to_complete)
>  {
> @@ -208,65 +133,68 @@ gve_link_update(struct rte_eth_dev *dev,
> __rte_unused int wait_to_complete)
>  }
>
>  static int
> -gve_dev_start(struct rte_eth_dev *dev)
> +gve_start_queues(struct rte_eth_dev *dev)
>  {
> -   uint16_t num_queues = dev->data->nb_tx_queues;
> struct gve_priv *priv = dev->data->dev_private;
> -   struct gve_tx_queue *txq;
> -   struct gve_rx_queue *rxq;
> +   uint16_t num_queues;
> uint16_t i;
> -   int err;
> +   int ret;
>
> +   num_queues = dev->data->nb_tx_queues;
> priv->txqs = (struct gve_tx_queue **)dev->data->tx_queues;
> -   err = gve_adminq_create_tx_queues(priv, num_queues);
> -   if (err) {
> -   PMD_DRV_LOG(ERR, "failed to create %u tx queues.",
> num_queues);
> -   return err;
> -   }
> -   for (i = 0; i < num_queues; i++) {
> -   txq = priv->txqs[i];
> -   txq->qtx_tail =
> -   &priv->db_bar2[rte_be_to_cpu_32(txq->qres->db_index)];
> -   txq->qtx_head =
> -
>  &priv->cnt_array[rte_be_to_cpu_32(txq->qres->counter_index)];
> -
> -

Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts

2023-05-15 Thread Mattias Rönnblom

On 2023-05-15 14:38, Jerin Jacob wrote:

On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom
 wrote:


On 2023-05-12 13:59, Jerin Jacob wrote:

On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
 wrote:


Use non-burst event enqueue and dequeue calls from burst enqueue and
dequeue only when the burst size is compile-time constant (and equal
to one).

Signed-off-by: Mattias Rönnblom 

---

v3: Actually include the change v2 claimed to contain.
v2: Wrap builtin call in __extension__, to avoid compiler warnings if
  application is compiled with -pedantic. (Morten Brørup)
---
   lib/eventdev/rte_eventdev.h | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index a90e23ac8b..a471caeb6d 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
   * Allow zero cost non burst mode routine invocation if application
   * requests nb_events as const one
   */
-   if (nb_events == 1)
+   if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)


"Why" part is not clear from the commit message. Is this to avoid
nb_events read if it is built-in const.


The __builtin_constant_p() is introduced to avoid having the compiler
generate a conditional branch and two different code paths in case
nb_elem is a run-time variable.

In particular, this matters if nb_elems is run-time variable and varies
between 1 and some larger value.

I should have mentioned this in the commit message.

A very slight performance improvement. It also makes the code better
match the comment, imo. Zero cost for const one enqueues, but no impact
on non-compile-time-constant-length enqueues.

Feel free to ignore.



I did some performance comparison of the patch.
A low-end ARM machine shows a 0.7% drop with the single event case. No
regression seen with high-end ARM cores with the single event case.

IMO, optimizing the check for burst mode (the new patch) may not show
any real improvement as the cost is divided by the number of events.
Whereas optimizing the check for the single event case (the current code)
shows better performance with the single event case and no regression
with burst mode as the cost is divided by the number of events.


I ran some tests on an AMD Zen 3 with DSW.

In the below tests the enqueue burst size is not compile-time constant.

Enqueue burst size  Performance improvement
Run-time constant 1 ~5%
Run-time constant 2 ~0%
Run-time variable 1-2   ~9%
Run-time variable 1-16  ~0%

The run-time variable enqueue sizes were randomly (uniformly) distributed
in the specified range.


The first result may come as a surprise. The benchmark is using 
RTE_EVENT_OP_FORWARD type events (which likely is the dominating op type 
in most apps). The single-event enqueue function only exists in a 
generic variant (i.e., no rte_event_enqueue_forward_burst() equivalent). 
I suspect that is the reason for the performance improvement.


This effect is large enough to make it somewhat beneficial (+~1%) to use 
run-time variable single-event enqueue compared to keeping the burst 
size compile-time constant.


The performance gain is counted toward both enqueue and dequeue costs 
(+benchmark app overhead), so it is an under-estimate if you see this as 
an enqueue performance improvement.



If you agree, then we can skip this patch.



I have no strong opinion if this should be included or not.

If it were up to me, I would drop the single-enqueue special case 
handling altogether in the next ABI update.







If so, check should be following. Right?

if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
|| nb_events  == 1)

At least, It was my original intention in the code.




  return (fp_ops->enqueue)(port, ev);
  else
  return fn(port, ev, nb_events);
@@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, 
struct rte_event ev[],
   * Allow zero cost non burst mode routine invocation if application
   * requests nb_events as const one
   */
-   if (nb_events == 1)
+   if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
  return (fp_ops->dequeue)(port, ev, timeout_ticks);
  else
  return (fp_ops->dequeue_burst)(port, ev, nb_events,
--
2.34.1





DTS roadmap for 23.07 release

2023-05-15 Thread Honnappa Nagarahalli
Hello,
Following is the roadmap for DTS for 23.07 release.

1) Tooling for developing automated documentation for DTS API docs. Uses Google 
docstring format.
Link: 
http://patches.dpdk.org/project/dpdk/cover/20230511091408.236638-1-juraj.lin...@pantheon.tech/

2) Integrate Fabric. This will contain the support for sudo with no password, 
removing use of scp.
Link: 
http://patches.dpdk.org/project/dpdk/patch/20230424133537.58698-1-juraj.lin...@pantheon.tech/

3) Traffic Generator abstraction
http://patches.dpdk.org/project/dpdk/cover/20230420093109.594704-1-juraj.lin...@pantheon.tech/

4) Non traffic based smoke tests
http://patches.dpdk.org/project/dpdk/cover/20230413175415.7683-2-jspew...@iol.unh.edu/

5) Minor changes:
a) Using a git commit ID instead of tarball.
Link: 
http://patches.dpdk.org/project/dpdk/patch/20230420140244.701467-1-juraj.lin...@pantheon.tech/

b) Updates to the linter tools.
Link: 
http://patches.dpdk.org/project/dpdk/patch/20230331091355.1224059-1-juraj.lin...@pantheon.tech/

6) Docker file for build and run environment
Link: 
http://patches.dpdk.org/project/dpdk/patch/20221103134633.446646-1-juraj.lin...@pantheon.tech/

Thank you,
Honnappa


Re: [PATCH v5] app/testpmd: txonly multiflow port change support

2023-05-15 Thread Ferruh Yigit
On 4/22/2023 12:20 AM, Joshua Washington wrote:
> Google cloud routes traffic using IP addresses without the support of MAC
> addresses, so changing source IP address for txonly-multi-flow can have
> negative performance implications for net/gve when using testpmd. This
> patch updates txonly multiflow mode to modify source ports instead of
> source IP addresses.
> 
> The change can be tested with the following command:
> dpdk-testpmd -- --forward-mode=txonly --txonly-multi-flow \
> --tx-ip=,
> 
> Signed-off-by: Joshua Washington 
> Reviewed-by: Rushil Gupta 

Hi Joshua, Aman,

The reason to support multi-flow in Tx-only mode is to test RSS on the
Rx side.
When used in a guest, in a hypervisor infrastructure, what is the use case
for multi-flow, and why not use the default single IP?

And if there is a need to support multi-flow without updating the source IP,
what about supporting various modes as "--txonly-multi-flow=XXX", where
XXX can be src_ip, dst_ip, src_port, dst_port?
If possible, keep the bare '--txonly-multi-flow' (no parameter) the same as
the current usage (src_ip) for backward compatibility.

This extends testing capability and lets different platforms select
different config based on their needs.


RE: [PATCH] examples/l3fwd: add hard code to collect empty poll and NIC counters

2023-05-15 Thread Honnappa Nagarahalli


> >
> > >
> > > On Thu, May 11, 2023 at 1:55 PM Feifei Wang 
> > > wrote:
> > > >
> > > > This patch is to collect empty poll of 'rte_eth_rx_burst'
> > > > functions in dpdk l3fwd application. Empty poll means Rx burst
> > > > function receives no pkts in one loop.
> > > >
> > > > Furthermore, we also add 'nic_xstats_display' API to show NIC counters.
> > > >
> > > > Usage:
> > > > With this patch, no special settings, just run l3fwd, and when you
> > > > stoping l3fwd, thread will print the info above.
> > > >
> > > > Note:
> > > > This patch has just a slight impact on performance and can be ignored.
> > How much is the regression?
> >
> > >
> > > IMO, We should not introduce regression as l3fwd kind of uses as
> > > reference application.
> > > I think, l3fwd should limit to stats exposed by ethdev(i.e directly
> > > from NIC, without performance regression).
> > Agree L3fwd is the reference app. Unfortunately, it is not in a state
> > to debug any problems. Maybe many are just believing the numbers
> > without understanding that there are problems.
> > Can we place these stats under a run time flag and reduce the impact
> > further?
> 
> I think, for example applications, we can have a compile time option for
> new feature additions in the fastpath or add a new forwarding mode in
> testpmd.
New forwarding (L3fwd in this case) mode for testpmd was rejected 
overwhelmingly. So, that's not an option.

I thought compile time options are discouraged as well. But, I am fine with the 
compile time approach to get some debugging capabilities in this application.

Maybe we could understand the performance difference with a run time flag?

> 
> >
> > >
> > >
> > >
> > > >
> > > > dpdk version:23.03
> > > >
> > > > Suggested-by: Lijian Zhang 
> > > > Signed-off-by: Feifei Wang 
> > > > Reviewed-by: Ruifeng Wang 
> > > > Reviewed-by: Honnappa Nagarahalli
> 
> > > > ---
> > > >  examples/l3fwd/l3fwd.h | 68
> > > ++
> > > >  examples/l3fwd/l3fwd_lpm.c | 26 +--
> > > >  examples/l3fwd/main.c  | 22 
> > > >  3 files changed, 114 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h index
> > > > b55855c932..2b3fca62f3 100644
> > > > --- a/examples/l3fwd/l3fwd.h
> > > > +++ b/examples/l3fwd/l3fwd.h
> > > > @@ -56,6 +56,17 @@
> > > >  #define L3FWD_HASH_ENTRIES (1024*1024*1)
> > > >  #endif
> > > >
> > > > +struct lcore_stats {
> > > > +uint32_t nb_rx_pkts[16];
> > > > +uint32_t num_loop[16];
> > > > +uint32_t none_loop[16];
> > > > +uint32_t no_full_loop[16];
> > > > +float  none_loop_per[16];
> > > > +float no_full_loop_per[16];
> > > > +} __rte_cache_aligned;
> > > > +
> > > > +extern struct lcore_stats stats[RTE_MAX_LCORE];
> > > > +
> > > >  struct parm_cfg {
> > > > const char *rule_ipv4_name;
> > > > const char *rule_ipv6_name; @@ -115,6 +126,63 @@ extern
> > > > struct acl_algorithms acl_alg[];
> > > >
> > > >  extern uint32_t max_pkt_len;
> > > >
> > > > +static inline void
> > > > +nic_xstats_display(uint32_t port_id) {
> > > > +struct rte_eth_xstat *xstats;
> > > > +int cnt_xstats, idx_xstat;
> > > > +struct rte_eth_xstat_name *xstats_names;
> > > > +
> > > > +printf("## NIC extended statistics for port %-2d\n", 
> > > > port_id);
> > > > +if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > +fprintf(stderr, "Error: Invalid port number %i\n", 
> > > > port_id);
> > > > +return;
> > > > +}
> > > > +
> > > > +/* Get count */
> > > > +cnt_xstats = rte_eth_xstats_get_names(port_id, NULL, 0);
> > > > +if (cnt_xstats  < 0) {
> > > > +fprintf(stderr, "Error: Cannot get count of xstats\n");
> > > > +return;
> > > > +}
> > > > +
> > > > +/* Get id-name lookup table */
> > > > +xstats_names = malloc(sizeof(struct rte_eth_xstat_name) *
> cnt_xstats);
> > > > +if (xstats_names == NULL) {
> > > > +fprintf(stderr, "Cannot allocate memory for xstats 
> > > > lookup\n");
> > > > +return;
> > > > +}
> > > > +if (cnt_xstats != rte_eth_xstats_get_names(
> > > > +port_id, xstats_names, cnt_xstats)) {
> > > > +fprintf(stderr, "Error: Cannot get xstats lookup\n");
> > > > +free(xstats_names);
> > > > +return;
> > > > +}
> > > > +
> > > > +/* Get stats themselves */
> > > > +xstats = malloc(sizeof(struct rte_eth_xstat) * cnt_xstats);
> > > > +if (xstats == NULL) {
> > > > +fprintf(stderr, "Cannot allocate memory for xstats\n");
> > > > +free(xstats_names);
> > > > +return;
> > > > +}
> > > > +if (cnt_xstats != rte_eth_xstats_get(port_id, xstats, 
> > > > cnt_xstats)) {
> > > > +fpr

Re: [PATCH v2] net/tap: resolve stringop-overflow with gcc 12 on ppc64le

2023-05-15 Thread Ferruh Yigit
On 3/23/2023 5:01 PM, David Christensen wrote:
> Building DPDK with gcc 12 on a ppc64le system generates a
> stringop-overflow warning. Replace the local MAC address
> validation function parse_user_mac() with a call to
> rte_ether_unformat_addr() instead.
> 
> Bugzilla ID: 1197
> Cc: sta...@dpdk.org
> 
> Signed-off-by: David Christensen 
> ---
> v2:
> * Added NULL checks previously performed in parse_user_mac()
> ---
>  drivers/net/tap/rte_eth_tap.c | 26 ++
>  1 file changed, 2 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> index 089ac202fa..8c50801fd4 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -2267,29 +2267,6 @@ set_remote_iface(const char *key __rte_unused,
>   return 0;
>  }
>  
> -static int parse_user_mac(struct rte_ether_addr *user_mac,
> - const char *value)
> -{
> - unsigned int index = 0;
> - char mac_temp[strlen(ETH_TAP_USR_MAC_FMT) + 1], *mac_byte = NULL;
> -
> - if (user_mac == NULL || value == NULL)
> - return 0;
> -
> - strlcpy(mac_temp, value, sizeof(mac_temp));
> - mac_byte = strtok(mac_temp, ":");
> -
> - while ((mac_byte != NULL) &&
> - (strlen(mac_byte) <= 2) &&
> - (strlen(mac_byte) == strspn(mac_byte,
> - ETH_TAP_CMP_MAC_FMT))) {
> - user_mac->addr_bytes[index++] = strtoul(mac_byte, NULL, 16);
> - mac_byte = strtok(NULL, ":");
> - }
> -
> - return index;
> -}
> -
>  static int
>  set_mac_type(const char *key __rte_unused,
>const char *value,
> @@ -2311,7 +2288,8 @@ set_mac_type(const char *key __rte_unused,
>   goto success;
>   }
>  
> - if (parse_user_mac(user_mac, value) != 6)
> + if (value == NULL || user_mac == NULL ||
> + rte_ether_unformat_addr(value, user_mac) < 0)
>   goto error;
>  success:
>   TAP_LOG(DEBUG, "TAP user MAC param (%s)", value);

Hi David,

I confirm the build error; btw, for future reference it helps to put the
build failure in the commit log,

and the change converting the PMD-local parse function to an API call is
reasonable, BUT my concern is that they don't behave exactly the same,
which changes the user interface of the driver.

The 'rte_ether_unformat_addr()' API expects the exact "XX:XX:XX:XX:XX:XX"
or "XXXX:XXXX:XXXX" format.
For example, 'parse_user_mac()' accepts 'a:a:a:a:a:a' as input, but the
API requires '0A:0A:0A:0A:0A:0A'.

This is a small change but still may create a bad experience if an
existing user/script hit by this, and I believe we don't have a strong
reason to change the interface.


To keep the behavior the same, we can either update
'rte_ether_unformat_addr()' to accept single chars between ':',
or fix the existing 'parse_user_mac()' for the compiler warning. What do
you think?




Re: [PATCH v2] net/tap: resolve stringop-overflow with gcc 12 on ppc64le

2023-05-15 Thread Stephen Hemminger
On Tue, 16 May 2023 00:14:52 +0100
Ferruh Yigit  wrote:

> Hi David,
> 
> I confirm the build error, btw it helps to future references to put
> build failure to the commit log,
> 
> and change is reasonable to convert PMD local parse function to an API,
> BUT my concern is they don't behave exactly same, which changes user
> interface of the driver.
> 
> The 'rte_ether_unformat_addr()' API expects exact "XX:XX:XX:XX:XX:XX or
> ::" format.
> Like 'parse_user_mac()' accepts 'a:a:a:a:a:a' as input, but API requires
> '0A:0A:0A:0A:0A:0A'.
> 
> This is a small change but still may create a bad experience if an
> existing user/script hit by this, and I believe we don't have a strong
> reason to change the interface.
> 
> 
> To keep behavior same, we can either update 'rte_ether_unformat_addr()'
> to accept singe chars between ':',
> or fix the existing 'parse_user_mac()' for compiler warning, what do you
> think?

This is the kind of change where a simple release note will suffice.

Not sure if anyone beyond some test script would ever use this anyway.


Re: [PATCH v2] net/tap: resolve stringop-overflow with gcc 12 on ppc64le

2023-05-15 Thread Ferruh Yigit
On 5/16/2023 12:20 AM, Stephen Hemminger wrote:
> On Tue, 16 May 2023 00:14:52 +0100
> Ferruh Yigit  wrote:
> 
>> Hi David,
>>
>> I confirm the build error, btw it helps to future references to put
>> build failure to the commit log,
>>
>> and change is reasonable to convert PMD local parse function to an API,
>> BUT my concern is they don't behave exactly same, which changes user
>> interface of the driver.
>>
>> The 'rte_ether_unformat_addr()' API expects exact "XX:XX:XX:XX:XX:XX or
>> ::" format.
>> Like 'parse_user_mac()' accepts 'a:a:a:a:a:a' as input, but API requires
>> '0A:0A:0A:0A:0A:0A'.
>>
>> This is a small change but still may create a bad experience if an
>> existing user/script hit by this, and I believe we don't have a strong
>> reason to change the interface.
>>
>>
>> To keep behavior same, we can either update 'rte_ether_unformat_addr()'
>> to accept singe chars between ':',
>> or fix the existing 'parse_user_mac()' for compiler warning, what do you
>> think?
> 
> This is the kind of change where a simple release note will suffice.
> 
> Not sure if anyone beyond some test script would ever use this anyway.


Yes, only some scripts and possibly applications that hotplug a tap
interface with hardcoded parameters may be impacted. I don't know how
many that is, but for them this ends up breaking something that was
working before upgrading DPDK.

And I believe the motivation is weak to break the behavior.

Wouldn't it be better to update 'rte_ether_unformat_addr()' to accept a
more flexible syntax, and use it? Is there any disadvantage to this approach?



Re: [PATCH v2] net/tap: resolve stringop-overflow with gcc 12 on ppc64le

2023-05-15 Thread Stephen Hemminger
On Tue, 16 May 2023 00:35:56 +0100
Ferruh Yigit  wrote:

> Yes only some scripts and possible applications that hotplug tap
> interface with hardcoded parameters may impacted, don't know how big is
> this amount but this ends up breaking something that was working before
> upgrading DPDK for them.
> 
> And I believe the motivation is weak to break the behavior.
> 
> Won't it be better to update 'rte_ether_unformat_addr()' to accept more
> flexible syntax, and use it? Is there any disadvantage of this approach?

It is already more flexible than the standard ether_aton().


RE: [PATCH v2] net/i40e: remove redundant judgment

2023-05-15 Thread Feifei Wang
> -Original Message-
> From: Zhang, Qi Z 
> Sent: Monday, May 15, 2023 9:59 AM
> To: Zhang, Qi Z ; Feifei Wang
> ; Richardson, Bruce ;
> Konstantin Ananyev ; Zhang, Yuying
> ; Xing, Beilei ; David
> Christensen ; Ruifeng Wang
> 
> Cc: dev@dpdk.org; nd ; Honnappa Nagarahalli
> 
> Subject: RE: [PATCH v2] net/i40e: remove redundant judgment
> 
> 
> 
> > -Original Message-
> > From: Zhang, Qi Z 
> > Sent: Thursday, April 27, 2023 3:38 PM
> > To: Feifei Wang ; Richardson, Bruce
> > ; Konstantin Ananyev
> > ; Zhang, Yuying
> > ; Xing, Beilei ; David
> > Christensen ; Ruifeng Wang
> > 
> > Cc: dev@dpdk.org; n...@arm.com; Honnappa Nagarahalli
> > 
> > Subject: RE: [PATCH v2] net/i40e: remove redundant judgment
> >
> >
> >
> > > -Original Message-
> > > From: Feifei Wang 
> > > Sent: Tuesday, March 28, 2023 3:28 PM
> > > To: Richardson, Bruce ; Konstantin
> > > Ananyev ; Zhang, Yuying
> > > ; Xing, Beilei ;
> > > David Christensen ; Ruifeng Wang
> > > 
> > > Cc: dev@dpdk.org; n...@arm.com; Feifei Wang ;
> > > Honnappa Nagarahalli 
> > > Subject: [PATCH v2] net/i40e: remove redundant judgment
> > >
> > > Merged variable updates under the same condition. It reduces branch.
> > >
> > > In ampere-altra, there is no performance improvement with this patch.
> > > In x86 sse and avx2 path, there is also no performance improvement.
> >
> > Thanks for sharing the results. While the code implements some best
> > practices, such as reducing branching and adding compiler hints, which
> > should generally improve performance, it's not necessary to highlight
> > that it didn't provide benefits on certain specific platforms.
> >
> > Would it be ok to remove the last two lines when merging the patch?
> 
> Ping
> 
Sorry that I did not reply to this. I agree with doing this when merging the patch.
Thanks for the comments~.
> >
> > Otherwise
> > Acked-by: Qi Zhang 
> >
> >
> > >
> > >
> > > v2:
> > > 1. add change for avx and altivec path.
> > >
> > > Suggested-by: Honnappa Nagarahalli 
> > > Signed-off-by: Feifei Wang 
> > > Reviewed-by: Ruifeng Wang 
> > > ---
> > >  drivers/net/i40e/i40e_rxtx_common_avx.h  | 9 +
> > > drivers/net/i40e/i40e_rxtx_vec_altivec.c | 9 +
> > >  drivers/net/i40e/i40e_rxtx_vec_neon.c| 9 +
> > >  drivers/net/i40e/i40e_rxtx_vec_sse.c | 9 +
> > >  4 files changed, 20 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h
> > > b/drivers/net/i40e/i40e_rxtx_common_avx.h
> > > index cfc1e63173..85958d6c81 100644
> > > --- a/drivers/net/i40e/i40e_rxtx_common_avx.h
> > > +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h
> > > @@ -198,14 +198,15 @@ i40e_rxq_rearm_common(struct
> i40e_rx_queue
> > *rxq,
> > > __rte_unused bool avx512)  #endif
> > >
> > >   rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
> > > - if (rxq->rxrearm_start >= rxq->nb_rx_desc)
> > > + rx_id = rxq->rxrearm_start - 1;
> > > +
> > > + if (unlikely(rxq->rxrearm_start >= rxq->nb_rx_desc)) {
> > >   rxq->rxrearm_start = 0;
> > > + rx_id = rxq->nb_rx_desc - 1;
> > > + }
> > >
> > >   rxq->rxrearm_nb -= RTE_I40E_RXQ_REARM_THRESH;
> > >
> > > - rx_id = (uint16_t)((rxq->rxrearm_start == 0) ?
> > > -  (rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1));
> > > -
> > >   /* Update the tail pointer on the NIC */
> > >   I40E_PCI_REG_WC_WRITE(rxq->qrx_tail, rx_id);  } diff --git
> > > a/drivers/net/i40e/i40e_rxtx_vec_altivec.c
> > > b/drivers/net/i40e/i40e_rxtx_vec_altivec.c
> > > index 2dfa04599c..8672ad1c41 100644
> > > --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c
> > > +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c
> > > @@ -89,14 +89,15 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
> > >   }
> > >
> > >   rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
> > > - if (rxq->rxrearm_start >= rxq->nb_rx_desc)
> > > + rx_id = rxq->rxrearm_start - 1;
> > > +
> > > + if (unlikely(rxq->rxrearm_start >= rxq->nb_rx_desc)) {
> > >   rxq->rxrearm_start = 0;
> > > + rx_id = rxq->nb_rx_desc - 1;
> > > + }
> > >
> > >   rxq->rxrearm_nb -= RTE_I40E_RXQ_REARM_THRESH;
> > >
> > > - rx_id = (uint16_t)((rxq->rxrearm_start == 0) ?
> > > -  (rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1));
> > > -
> > >   /* Update the tail pointer on the NIC */
> > >   I40E_PCI_REG_WRITE(rxq->qrx_tail, rx_id);  } diff --git
> > > a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > > b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > > index 12e6f1cbcb..49391fe4c7 100644
> > > --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > > +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > > @@ -64,14 +64,15 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
> > >   }
> > >
> > >   rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
> > > - if (rxq->rxrearm_start >= rxq->nb_rx_desc)
> > > + rx_id = rxq->rxrearm_start - 1;
> > > +
> > > + if (unlikely(rxq->rxrearm_start >= rxq->nb_rx_desc)) {
> > >   rxq->rxrearm_start = 0;
> > > + rx_id = rxq->nb_rx_des

[Bug 1230] [dpdk-22.07][meson test] malloc_autotest test: failed to test

2023-05-15 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1230

Bug ID: 1230
   Summary: [dpdk-22.07][meson test] malloc_autotest test: failed
to test
   Product: DPDK
   Version: 23.07
  Hardware: All
OS: FreeBSD
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: meson
  Assignee: dev@dpdk.org
  Reporter: yux.ji...@intel.com
  Target Milestone: ---

Environment:
-
DPDK version: 23.03
OS: FreeBSD 13.2-RELEASE
Compiler: gcc version 12.2.0 (FreeBSD Ports Collection)
NIC hardware:  Ethernet controller: Intel Corporation Ethernet
Controller XXV710 for 25GbE SFP28

Test Setup:
-
rm -rf x86_64-native-bsdapp-gcc/
CC=gcc meson -Denable_kmods=True -Dlibdir=lib  --default-library=static
x86_64-native-bsdapp-gcc
ninja -C x86_64-native-bsdapp-gcc

kldunload contigmem.ko
kldunload nic_uio.ko
kldload x86_64-native-bsdapp-gcc/kernel/freebsd/contigmem.ko
kldload x86_64-native-bsdapp-gcc/kernel/freebsd/nic_uio.ko
kenv hw.nic_uio.bdfs="2:0:0,2:0:1"

MALLOC_PERTURB_=162 DPDK_TEST=malloc_autotest
/root/dpdk/x86_64-native-bsdapp-gcc/app/test/dpdk-test

Show the output from the previous commands:
-
Incorrect heap statistics: Allocated size 
test_multi_alloc_statistics() failed
Test Failed
RTE>>

Expected Result:
-
Test OK

Regression:
-
Is this issue a regression: (Y/N) Y
Version the regression was introduced: 
commit f62f4a375ff496abf66e48d5e1b1c442b86a82c1
Author: Fengnan Chang 
Date:   Fri Feb 10 14:30:22 2023 +0800

malloc: optimize 4K allocations

Here is a simple test case:

-- 
You are receiving this mail because:
You are the assignee for the bug.

Arm's roadmap for 23.07

2023-05-15 Thread Honnappa Nagarahalli
(Bcc: Arm internal stake holders)

Hello,
Following are the work items planned for 23.07:

1. Buffer recycle (a.k.a direct-rearm of Rx side buffers) APIs and 
implementation.
2. Zero-copy mempool APIs, integration with i40e PMD and test cases.
3. Remove rte_ring generic implementation and improve c11 implementation (to 
allow for moving to C11 atomic APIs).
4. QAT on ARM support announcement.
5. IPSecMB support for tag SECLIB-IPSEC-2023.3.21.

Thank you,
Honnappa


RE: [PATCH] examples/ipsec-secgw: fix zero address in ethernet header

2023-05-15 Thread Rahul Bhansali
Ping.

> -Original Message-
> From: Rahul Bhansali 
> Sent: Thursday, March 30, 2023 3:39 PM
> To: dev@dpdk.org; Radu Nicolau ; Akhil Goyal
> 
> Cc: Rahul Bhansali 
> Subject: [PATCH] examples/ipsec-secgw: fix zero address in ethernet header
> 
> During port init, the src address stored in ethaddr_tbl is typecast, which
> violates the strict-aliasing rule, and the updated source address is not
> reflected in processed packets either.
> 
> Fixes: 6eb3ba0399 ("examples/ipsec-secgw: support poll mode NEON LPM
> lookup")
> 
> Signed-off-by: Rahul Bhansali 
> ---
>  examples/ipsec-secgw/ipsec-secgw.c | 20 ++--
> examples/ipsec-secgw/ipsec-secgw.h |  2 +-
>  2 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec-
> secgw/ipsec-secgw.c
> index d2d9d85b4a..029749e522 100644
> --- a/examples/ipsec-secgw/ipsec-secgw.c
> +++ b/examples/ipsec-secgw/ipsec-secgw.c
> @@ -99,10 +99,10 @@ uint32_t qp_desc_nb = 2048;
>  #define MTU_TO_FRAMELEN(x)   ((x) + RTE_ETHER_HDR_LEN +
> RTE_ETHER_CRC_LEN)
> 
>  struct ethaddr_info ethaddr_tbl[RTE_MAX_ETHPORTS] = {
> - { 0, ETHADDR(0x00, 0x16, 0x3e, 0x7e, 0x94, 0x9a) },
> - { 0, ETHADDR(0x00, 0x16, 0x3e, 0x22, 0xa1, 0xd9) },
> - { 0, ETHADDR(0x00, 0x16, 0x3e, 0x08, 0x69, 0x26) },
> - { 0, ETHADDR(0x00, 0x16, 0x3e, 0x49, 0x9e, 0xdd) }
> + { {{0}}, {{0x00, 0x16, 0x3e, 0x7e, 0x94, 0x9a}} },
> + { {{0}}, {{0x00, 0x16, 0x3e, 0x22, 0xa1, 0xd9}} },
> + { {{0}}, {{0x00, 0x16, 0x3e, 0x08, 0x69, 0x26}} },
> + { {{0}}, {{0x00, 0x16, 0x3e, 0x49, 0x9e, 0xdd}} }
>  };
> 
>  struct offloads tx_offloads;
> @@ -1427,9 +1427,8 @@ add_dst_ethaddr(uint16_t port, const struct
> rte_ether_addr *addr)
>   if (port >= RTE_DIM(ethaddr_tbl))
>   return -EINVAL;
> 
> - ethaddr_tbl[port].dst = ETHADDR_TO_UINT64(addr);
> - rte_ether_addr_copy((struct rte_ether_addr *)ðaddr_tbl[port].dst,
> - (struct rte_ether_addr *)(val_eth + port));
> + rte_ether_addr_copy(addr, ðaddr_tbl[port].dst);
> + rte_ether_addr_copy(addr, (struct rte_ether_addr *)(val_eth + port));
>   return 0;
>  }
> 
> @@ -1907,11 +1906,12 @@ port_init(uint16_t portid, uint64_t req_rx_offloads,
> uint64_t req_tx_offloads,
>   "Error getting MAC address (port %u): %s\n",
>   portid, rte_strerror(-ret));
> 
> - ethaddr_tbl[portid].src = ETHADDR_TO_UINT64(ðaddr);
> + rte_ether_addr_copy(ðaddr, ðaddr_tbl[portid].src);
> 
> - rte_ether_addr_copy((struct rte_ether_addr *)ðaddr_tbl[portid].dst,
> + rte_ether_addr_copy(ðaddr_tbl[portid].dst,
>   (struct rte_ether_addr *)(val_eth + portid));
> - rte_ether_addr_copy((struct rte_ether_addr *)ðaddr_tbl[portid].src,
> +
> + rte_ether_addr_copy(ðaddr_tbl[portid].src,
>   (struct rte_ether_addr *)(val_eth + portid) + 1);
> 
>   print_ethaddr("Address: ", ðaddr);
> diff --git a/examples/ipsec-secgw/ipsec-secgw.h b/examples/ipsec-
> secgw/ipsec-secgw.h
> index 0e0012d058..53665adf03 100644
> --- a/examples/ipsec-secgw/ipsec-secgw.h
> +++ b/examples/ipsec-secgw/ipsec-secgw.h
> @@ -84,7 +84,7 @@ struct ipsec_traffic_nb {
> 
>  /* port/source ethernet addr and destination ethernet addr */  struct
> ethaddr_info {
> - uint64_t src, dst;
> + struct rte_ether_addr src, dst;
>  };
> 
>  struct ipsec_spd_stats {
> --
> 2.25.1



RE: [PATCH v6] enhance NUMA affinity heuristic

2023-05-15 Thread You, KaisenX



> -Original Message-
> From: Thomas Monjalon 
> Sent: 2023年4月27日 14:58
> To: You, KaisenX 
> Cc: dev@dpdk.org; Zhou, YidingX ;
> david.march...@redhat.com; Matz, Olivier ;
> ferruh.yi...@amd.com; Burakov, Anatoly ; You,
> KaisenX ; sta...@dpdk.org
> Subject: Re: [PATCH v6] enhance NUMA affinity heuristic
> 
> 25/04/2023 07:16, Kaisen You:
> > Trying to allocate memory on the first detected numa node,it has less
> > chance to find some memory actually available rather than on the main
> > lcore numa node (especially when the DPDK application is started only
> > on one numa node).
> 
> You didn't change the explanations here as discussed previously.
> 
> >
> > Fixes: 8b0a1b8cb481 ("eal: stop using pthread for lcores and control
> > threads")
> > Fixes: 770d41bf3309 ("malloc: fix allocation with unknown socket ID")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: David Marchand 
> > Signed-off-by: Kaisen You 
> >
> > ---
> > Changes since v5:
> > - Add comments to the code,
> >
> > Changes since v4:
> > - mod the patch title,
> >
> > Changes since v3:
> > - add the assignment of socket_id in thread initialization,
> >
> > Changes since v2:
> > - add uncommitted local change and fix compilation,
> >
> > Changes since v1:
> > - accomodate for configurations with main lcore running on multiples
> >   physical cores belonging to different numa,
> > ---
> >  lib/eal/common/eal_common_thread.c | 4 
> >  lib/eal/common/malloc_heap.c   | 6 ++
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/lib/eal/common/eal_common_thread.c
> > b/lib/eal/common/eal_common_thread.c
> > index 079a385630..d65bfe251b 100644
> > --- a/lib/eal/common/eal_common_thread.c
> > +++ b/lib/eal/common/eal_common_thread.c
> > @@ -252,6 +252,10 @@ static int ctrl_thread_init(void *arg)
> > struct rte_thread_ctrl_params *params = arg;
> >
> > __rte_thread_init(rte_lcore_id(), cpuset);
> > +   /* set the value of the per-core variable _socket_id.
> > +* Convenient for threads to find memory.
> > +*/
> 
> That's an useless comment.
> We want to know WHY you are setting to SOCKET_ID_ANY.
> 
> > +   RTE_PER_LCORE(_socket_id) = SOCKET_ID_ANY;
> > params->ret = rte_thread_set_affinity_by_id(rte_thread_self(),
> cpuset);
> > if (params->ret != 0) {
> > __atomic_store_n(¶ms->ctrl_thread_status,
> > diff --git a/lib/eal/common/malloc_heap.c
> > b/lib/eal/common/malloc_heap.c index d25bdc98f9..a624f08cf7 100644
> > --- a/lib/eal/common/malloc_heap.c
> > +++ b/lib/eal/common/malloc_heap.c
> > @@ -716,6 +716,12 @@ malloc_get_numa_socket(void)
> > if (conf->socket_mem[socket_id] != 0)
> > return socket_id;
> > }
> > +   /* Trying to allocate memory on the main lcore numa node.
> > +* especially when the DPDK application is started only on one numa
> node.
> > +*/
> > +   socket_id = rte_lcore_to_socket_id(rte_get_main_lcore());
> > +   if (socket_id != (unsigned int)SOCKET_ID_ANY)
> > +   return socket_id;
> 
> You should explain why we could reach this point, i.e. SOCKET_ID_ANY.

I'm sorry I didn't reply to you in time due to a COVID-19 infection. I will
send the v7 version as soon as possible to fix the problem you mentioned.
> 
> >
> > return rte_socket_id_by_idx(0);
> >  }
> 
> 



[PATCH v1 0/2] Fix VXLAN matching

2023-05-15 Thread Rongwei Liu
Fix VXLAN matching with zero value and misc5 layout size calculation.

Rongwei Liu (2):
  net/mlx5: fix matcher layout size calculation
  net/mlx5: fix VXLAN matching with zero value

 drivers/net/mlx5/mlx5_flow_dv.c | 37 ++---
 1 file changed, 11 insertions(+), 26 deletions(-)

-- 
2.27.0



[PATCH v1 1/2] net/mlx5: fix matcher layout size calculation

2023-05-15 Thread Rongwei Liu
Initially, the rdma-core library only supported misc0 to misc3 fields in
matching resources; misc4 and misc5 fields were added later to handle new
features. The matcher layout, passed from DPDK to rdma-core, shouldn't
exceed the size of the engaged library version capabilities.

For now, there is no way to know what is the maximum capability of
rdma-core, and DPDK should limit the matcher layout to misc3 if
possible (no matching on fields misc4 and misc5 are requested by the
application).

The check if misc4 and misc5 features were requested was based on checking
the values against zeroes. The matching mask should be checked instead.

Fixes: 630a587bfb37 ("net/mlx5: support matching on VXLAN reserved field")
Cc: rongw...@nvidia.com
Cc: sta...@dpdk.org
Signed-off-by: Rongwei Liu 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_flow_dv.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index f136f43b0a..f44d621600 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -15167,7 +15167,7 @@ flow_dv_apply(struct rte_eth_dev *dev, struct rte_flow 
*flow,
}
dv->actions[n++] = priv->sh->default_miss_action;
}
-   misc_mask = flow_dv_matcher_enable(dv->value.buf);
+   misc_mask = flow_dv_matcher_enable(dv_h->matcher->mask.buf);
__flow_dv_adjust_buf_size(&dv->value.size, misc_mask);
err = mlx5_flow_os_create_flow(dv_h->matcher->matcher_object,
   (void *)&dv->value, n,
@@ -17367,7 +17367,7 @@ flow_dv_destroy_def_policy(struct rte_eth_dev *dev)
 static int
 __flow_dv_create_policy_flow(struct rte_eth_dev *dev,
uint32_t color_reg_c_idx,
-   enum rte_color color, void *matcher_object,
+   enum rte_color color, struct mlx5_flow_dv_matcher 
*matcher,
int actions_n, void *actions,
bool match_src_port, const struct rte_flow_item *item,
void **rule, const struct rte_flow_attr *attr)
@@ -17397,9 +17397,9 @@ __flow_dv_create_policy_flow(struct rte_eth_dev *dev,
}
flow_dv_match_meta_reg(value.buf, (enum modify_reg)color_reg_c_idx,
   rte_col_2_mlx5_col(color), UINT32_MAX);
-   misc_mask = flow_dv_matcher_enable(value.buf);
+   misc_mask = flow_dv_matcher_enable(matcher->mask.buf);
__flow_dv_adjust_buf_size(&value.size, misc_mask);
-   ret = mlx5_flow_os_create_flow(matcher_object, (void *)&value,
+   ret = mlx5_flow_os_create_flow(matcher->matcher_object, (void *)&value,
   actions_n, actions, rule);
if (ret) {
DRV_LOG(ERR, "Failed to create meter policy%d flow.", color);
@@ -17553,7 +17553,7 @@ __flow_dv_create_domain_policy_rules(struct rte_eth_dev 
*dev,
/* Create flow, matching color. */
if (__flow_dv_create_policy_flow(dev,
color_reg_c_idx, (enum rte_color)i,
-   color_rule->matcher->matcher_object,
+   color_rule->matcher,
acts[i].actions_n, acts[i].dv_actions,
svport_match, NULL, &color_rule->rule,
&attr)) {
@@ -18021,7 +18021,7 @@ flow_dv_create_mtr_tbls(struct rte_eth_dev *dev,
actions[i++] = priv->sh->dr_drop_action;
flow_dv_match_meta_reg_all(matcher_para.buf, value.buf,
(enum modify_reg)mtr_id_reg_c, 0, 0);
-   misc_mask = flow_dv_matcher_enable(value.buf);
+   misc_mask = 
flow_dv_matcher_enable(mtrmng->def_matcher[domain]->mask.buf);
__flow_dv_adjust_buf_size(&value.size, misc_mask);
ret = mlx5_flow_os_create_flow
(mtrmng->def_matcher[domain]->matcher_object,
@@ -18066,7 +18066,7 @@ flow_dv_create_mtr_tbls(struct rte_eth_dev *dev,
fm->drop_cnt, NULL);
actions[i++] = cnt->action;
actions[i++] = priv->sh->dr_drop_action;
-   misc_mask = flow_dv_matcher_enable(value.buf);
+   misc_mask = flow_dv_matcher_enable(drop_matcher->mask.buf);
__flow_dv_adjust_buf_size(&value.size, misc_mask);
ret = mlx5_flow_os_create_flow(drop_matcher->matcher_object,
   (void *)&value, i, actions,
@@ -18546,7 +18546,7 @@ flow_dv_meter_hierarchy_rule_create(struct rte_eth_dev 
*dev,
goto err_exit;
}
if (__flow_dv_create_policy

[PATCH v1 2/2] net/mlx5: fix VXLAN matching with zero value

2023-05-15 Thread Rongwei Liu
When an application wants to match a VXLAN last_rsvd value of zero,
the PMD mistakenly sets the matching mask field to zero, which causes
traffic with any last_rsvd value to hit. The matching mask should be
taken from the application input directly; there is no need to perform
the bit reset operation.

Fixes: cd4ab742064a ("net/mlx5: split flow item matcher and value translation")
Cc: suanmi...@nvidia.com
Cc: sta...@dpdk.org
Signed-off-by: Rongwei Liu 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_flow_dv.c | 19 ++-
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index f44d621600..1abc4acad7 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -9480,12 +9480,10 @@ flow_dv_translate_item_vxlan(struct rte_eth_dev *dev,
 {
const struct rte_flow_item_vxlan *vxlan_m;
const struct rte_flow_item_vxlan *vxlan_v;
-   const struct rte_flow_item_vxlan *vxlan_vv = item->spec;
void *headers_v;
void *misc_v;
void *misc5_v;
uint32_t tunnel_v;
-   uint32_t *tunnel_header_v;
char *vni_v;
uint16_t dport;
int size;
@@ -9537,24 +9535,11 @@ flow_dv_translate_item_vxlan(struct rte_eth_dev *dev,
vni_v[i] = vxlan_m->hdr.vni[i] & vxlan_v->hdr.vni[i];
return;
}
-   tunnel_header_v = (uint32_t *)MLX5_ADDR_OF(fte_match_set_misc5,
-  misc5_v,
-  tunnel_header_1);
tunnel_v = (vxlan_v->hdr.vni[0] & vxlan_m->hdr.vni[0]) |
   (vxlan_v->hdr.vni[1] & vxlan_m->hdr.vni[1]) << 8 |
   (vxlan_v->hdr.vni[2] & vxlan_m->hdr.vni[2]) << 16;
-   *tunnel_header_v = tunnel_v;
-   if (key_type == MLX5_SET_MATCHER_SW_M) {
-   tunnel_v = (vxlan_vv->hdr.vni[0] & vxlan_m->hdr.vni[0]) |
-  (vxlan_vv->hdr.vni[1] & vxlan_m->hdr.vni[1]) << 8 |
-  (vxlan_vv->hdr.vni[2] & vxlan_m->hdr.vni[2]) << 16;
-   if (!tunnel_v)
-   *tunnel_header_v = 0x0;
-   if (vxlan_vv->hdr.rsvd1 & vxlan_m->hdr.rsvd1)
-   *tunnel_header_v |= vxlan_v->hdr.rsvd1 << 24;
-   } else {
-   *tunnel_header_v |= (vxlan_v->hdr.rsvd1 & vxlan_m->hdr.rsvd1) 
<< 24;
-   }
+   tunnel_v |= (vxlan_v->hdr.rsvd1 & vxlan_m->hdr.rsvd1) << 24;
+   MLX5_SET(fte_match_set_misc5, misc5_v, tunnel_header_1, 
RTE_BE32(tunnel_v));
 }
 
 /**
-- 
2.27.0



[PATCH v1] net/mlx5: add test for hot upgrade

2023-05-15 Thread Rongwei Liu
This patch adds a runtime command to the testpmd app to test the hot
upgrade API.

testpmd> mlx5 set flow_engine <0|1> (flag)
0 stands for active mode while 1 for standby mode.

Signed-off-by: Rongwei Liu 
Acked-by: Viacheslav Ovsiienko 
---
 doc/guides/nics/mlx5.rst| 10 ++
 drivers/net/mlx5/mlx5_testpmd.c | 63 +
 2 files changed, 73 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 7a137d5f6a..1867b2a3c0 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2057,3 +2057,13 @@ where:
 * ``sw_queue_id``: queue index in range [64536, 65535].
   This range is the highest 1000 numbers.
 * ``hw_queue_id``: queue index given by HW in queue creation.
+
+Set Flow Engine Mode
+
+
+Set the flow engine to active(0) or standby(1) mode with specific flags::
+   testpmd> mlx5 set flow_engine <0|1> (flags)
+
+This command works for software steering only.
+Default FDB jump should be disabled if switchdev is enabled.
+The mode will propagate to all the probed ports.
diff --git a/drivers/net/mlx5/mlx5_testpmd.c b/drivers/net/mlx5/mlx5_testpmd.c
index 879ea2826e..cb0cafe1e9 100644
--- a/drivers/net/mlx5/mlx5_testpmd.c
+++ b/drivers/net/mlx5/mlx5_testpmd.c
@@ -561,6 +561,64 @@ cmdline_parse_inst_t mlx5_cmd_unmap_ext_rxq = {
}
 };
 
+/* Set flow engine mode with flags command. */
+struct mlx5_cmd_set_flow_engine_mode {
+   cmdline_fixed_string_t mlx5;
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t flow_engine;
+   uint8_t mode;
+   uint32_t flags;
+};
+
+
+static void
+mlx5_cmd_set_flow_engine_mode_parsed(void *parsed_result,
+__rte_unused struct cmdline *cl,
+__rte_unused void *data)
+{
+   struct mlx5_cmd_set_flow_engine_mode *res = parsed_result;
+   int ret;
+
+   ret = rte_pmd_mlx5_flow_engine_set_mode((enum 
mlx5_flow_engine_mode)res->mode, res->flags);
+
+   if (ret < 0)
+   fprintf(stderr, "Fail to set flow_engine to %s mode with flags 
%x, error %s\n",
+   !res->mode ? "active" : "standby", res->flags, 
strerror(-ret));
+   else
+   TESTPMD_LOG(DEBUG, "Set %d ports flow_engine to %s mode with 
flags %x\n", ret,
+   !res->mode ? "active" : "standby", res->flags);
+}
+
+cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_mlx5 =
+   TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, mlx5,
+"mlx5");
+cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_set =
+   TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, set,
+"set");
+cmdline_parse_token_string_t mlx5_cmd_set_flow_engine_mode_flow_engine =
+   TOKEN_STRING_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, 
flow_engine,
+"flow_engine");
+cmdline_parse_token_num_t mlx5_cmd_set_flow_engine_mode_mode =
+   TOKEN_NUM_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, mode,
+ RTE_UINT8);
+cmdline_parse_token_num_t mlx5_cmd_set_flow_engine_mode_flags =
+   TOKEN_NUM_INITIALIZER(struct mlx5_cmd_set_flow_engine_mode, flags,
+ RTE_UINT32);
+
+cmdline_parse_inst_t mlx5_cmd_set_flow_engine_mode = {
+   .f = &mlx5_cmd_set_flow_engine_mode_parsed,
+   .data = NULL,
+   .help_str = "mlx5 set flow_engine <0|1> (flag)",
+   .tokens = {
+   (void *)&mlx5_cmd_set_flow_engine_mode_mlx5,
+   (void *)&mlx5_cmd_set_flow_engine_mode_set,
+   (void *)&mlx5_cmd_set_flow_engine_mode_flow_engine,
+   (void *)&mlx5_cmd_set_flow_engine_mode_mode,
+   (void *)&mlx5_cmd_set_flow_engine_mode_flags,
+   NULL,
+   }
+};
+
 static struct testpmd_driver_commands mlx5_driver_cmds = {
.commands = {
{
@@ -588,6 +646,11 @@ static struct testpmd_driver_commands mlx5_driver_cmds = {
.help = "mlx5 port (port_id) ext_rxq unmap 
(sw_queue_id)\n"
"Unmap external Rx queue ethdev index 
mapping\n\n",
},
+   {
+   .ctx = &mlx5_cmd_set_flow_engine_mode,
+   .help = "mlx5 set flow_engine (mode) (flag)\n"
+   "Set flow_engine to the specific mode with 
flags.\n\n"
+   },
{
.ctx = NULL,
},
-- 
2.27.0



[PATCH v1 0/2] disallow duplicated tag index

2023-05-15 Thread Rongwei Liu
Doesn't allow duplicated tag index matching.

Rongwei Liu (2):
  net/mlx5: disallow duplicated tag index in pattern template
  net/mlx5: fix sws duplicated tag index matching

 drivers/net/mlx5/mlx5_flow_dv.c | 21 +++--
 drivers/net/mlx5/mlx5_flow_hw.c | 25 +
 2 files changed, 40 insertions(+), 6 deletions(-)

-- 
2.27.0



[PATCH v1 2/2] net/mlx5: fix sws duplicated tag index matching

2023-05-15 Thread Rongwei Liu
A duplicated matching tag index is not allowed in SWS because both
items use the same matching field in the underlying layout.
For example: "tag index is 0 data spec 0x12 mask 0xff /
tag index is 0 data spec 0x1234 mask 0xffff" is a contradictory
matching condition, and "tag index is 0 data spec 0x12 mask 0xff /
tag index is 0 data spec 0x3400 mask 0xff00" should be written as
"tag index is 0 data spec 0x3412 mask 0xffff".

Add checking logic against it.

Fixes: e554b672aa05 ("net/mlx5: support flow tag")
Cc: viachesl...@nvidia.com
Cc: sta...@dpdk.org
Signed-off-by: Rongwei Liu 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_flow_dv.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1abc4acad7..db2bf615db 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2294,6 +2294,8 @@ flow_dv_validate_item_meta(struct rte_eth_dev *dev 
__rte_unused,
  *   Pointer to the rte_eth_dev structure.
  * @param[in] item
  *   Item specification.
+ * @param[in] tag_bitmap
+ *   Tag index bitmap.
  * @param[in] attr
  *   Attributes of flow that includes this item.
  * @param[out] error
@@ -2305,6 +2307,7 @@ flow_dv_validate_item_meta(struct rte_eth_dev *dev 
__rte_unused,
 static int
 flow_dv_validate_item_tag(struct rte_eth_dev *dev,
  const struct rte_flow_item *item,
+ uint32_t *tag_bitmap,
  const struct rte_flow_attr *attr __rte_unused,
  struct rte_flow_error *error)
 {
@@ -2348,6 +2351,12 @@ flow_dv_validate_item_tag(struct rte_eth_dev *dev,
if (ret < 0)
return ret;
MLX5_ASSERT(ret != REG_NON);
+   if (*tag_bitmap & (1 << ret))
+   return rte_flow_error_set(error, EINVAL,
+ RTE_FLOW_ERROR_TYPE_ITEM_SPEC,
+ item->spec,
+ "Duplicated tag index");
+   *tag_bitmap |= 1 << ret;
return 0;
 }
 
@@ -7290,9 +7299,10 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct 
rte_flow_attr *attr,
bool def_policy = false;
bool shared_count = false;
uint16_t udp_dport = 0;
-   uint32_t tag_id = 0;
+   uint32_t tag_id = 0, tag_bitmap = 0;
const struct rte_flow_action_age *non_shared_age = NULL;
const struct rte_flow_action_count *count = NULL;
+   const struct mlx5_rte_flow_item_tag *mlx5_tag;
struct mlx5_priv *act_priv = NULL;
int aso_after_sample = 0;
 
@@ -7621,7 +7631,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct 
rte_flow_attr *attr,
last_item = MLX5_FLOW_LAYER_ICMP6;
break;
case RTE_FLOW_ITEM_TYPE_TAG:
-   ret = flow_dv_validate_item_tag(dev, items,
+   ret = flow_dv_validate_item_tag(dev, items, &tag_bitmap,
attr, error);
if (ret < 0)
return ret;
@@ -7631,6 +7641,13 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct 
rte_flow_attr *attr,
last_item = MLX5_FLOW_ITEM_SQ;
break;
case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+   mlx5_tag = (const struct mlx5_rte_flow_item_tag 
*)items->spec;
+   if (tag_bitmap & (1 << mlx5_tag->id))
+   return rte_flow_error_set(error, EINVAL,
+ 
RTE_FLOW_ERROR_TYPE_ITEM_SPEC,
+ items->spec,
+ "Duplicated tag 
index");
+   tag_bitmap |= 1 << mlx5_tag->id;
break;
case RTE_FLOW_ITEM_TYPE_GTP:
ret = flow_dv_validate_item_gtp(dev, items, item_flags,
-- 
2.27.0



[PATCH v1 1/2] net/mlx5: disallow duplicated tag index in pattern template

2023-05-15 Thread Rongwei Liu
A duplicated tag index in a pattern template will most likely cause
matching failures such as "template tag index is 0 data mask 0xff /
tag index is 0 data mask 0x / end"

If the upper layer application needs to match the same tag twice
with different masks, the matches should be consolidated into one
rte_flow item with the desired mask.

"template tag index is 0 data mask 0xff / tag index is 0 data mask
0xff00 / end" should be expressed as "template tag index is 0 data mask
0x / end"

Cc: sta...@dpdk.org
Signed-off-by: Rongwei Liu 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_flow_hw.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 7e0ee8d883..78011584eb 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -4831,8 +4831,9 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 struct rte_flow_error *error)
 {
struct mlx5_priv *priv = dev->data->dev_private;
-   int i;
+   int i, tag_idx;
bool items_end = false;
+   uint32_t tag_bitmap = 0;
 
if (!attr->ingress && !attr->egress && !attr->transfer)
return rte_flow_error_set(error, EINVAL, 
RTE_FLOW_ERROR_TYPE_ATTR, NULL,
@@ -4874,16 +4875,26 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
switch (type) {
case RTE_FLOW_ITEM_TYPE_TAG:
{
-   int reg;
const struct rte_flow_item_tag *tag =
(const struct rte_flow_item_tag *)items[i].spec;
 
-   reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, 
tag->index);
-   if (reg == REG_NON)
+   if (tag == NULL)
+   return rte_flow_error_set(error, EINVAL,
+ 
RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+ NULL,
+ "Tag spec is NULL");
+   tag_idx = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, 
tag->index);
+   if (tag_idx == REG_NON)
return rte_flow_error_set(error, EINVAL,
  
RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
  NULL,
  "Unsupported tag 
index");
+   if (tag_bitmap & (1 << tag_idx))
+   return rte_flow_error_set(error, EINVAL,
+ 
RTE_FLOW_ERROR_TYPE_ITEM,
+ NULL,
+ "Duplicated tag 
index");
+   tag_bitmap |= 1 << tag_idx;
break;
}
case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
@@ -4897,6 +4908,12 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
  
RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
  NULL,
  "Unsupported internal 
tag index");
+   if (tag_bitmap & (1 << tag->index))
+   return rte_flow_error_set(error, EINVAL,
+ 
RTE_FLOW_ERROR_TYPE_ITEM,
+ NULL,
+ "Duplicated tag 
index");
+   tag_bitmap |= 1 << tag->index;
break;
}
case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
-- 
2.27.0



RE: [PATCH] examples/l3fwd: add hard code to collect empty poll and NIC counters

2023-05-15 Thread Feifei Wang
Thanks for the comment.

> -Original Message-
> From: Stephen Hemminger 
> Sent: Thursday, May 11, 2023 11:45 PM
> To: Feifei Wang 
> Cc: dev@dpdk.org; nd ; Lijian Zhang
> ; Ruifeng Wang ;
> Honnappa Nagarahalli 
> Subject: Re: [PATCH] examples/l3fwd: add hard code to collect empty poll and
> NIC counters
> 
> On Thu, 11 May 2023 16:25:19 +0800
> Feifei Wang  wrote:
> 
> > This patch is to collect empty poll of 'rte_eth_rx_burst' functions in
> > dpdk l3fwd application. Empty poll means Rx burst function receives no
> > pkts in one loop.
> >
> > Furthermore, we also add 'nic_xstats_display' API to show NIC counters.
> >
> > Usage:
> > With this patch, no special settings are needed: just run l3fwd, and
> > when you stop l3fwd, the thread will print the info above.
> >
> > Note:
> > This patch has just a slight impact on performance and can be ignored.
> 
> There was a set of other patches around telemetry.
> Shouldn't this example use some of that rather than "roll your own"?
But for 'empty poll', there seems to be no API to collect this.
By the way, would you please share the API or patch which can do this?
Thanks very much.
 
> 
> >
> > dpdk version:23.03
> >
> > Suggested-by: Lijian Zhang 
> > Signed-off-by: Feifei Wang 
> > Reviewed-by: Ruifeng Wang 
> > Reviewed-by: Honnappa Nagarahalli 
> > ---
> >  examples/l3fwd/l3fwd.h | 68
> ++
> >  examples/l3fwd/l3fwd_lpm.c | 26 +--
> >  examples/l3fwd/main.c  | 22 
> >  3 files changed, 114 insertions(+), 2 deletions(-)
> >
> > diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h index
> > b55855c932..2b3fca62f3 100644
> > --- a/examples/l3fwd/l3fwd.h
> > +++ b/examples/l3fwd/l3fwd.h
> > @@ -56,6 +56,17 @@
> >  #define L3FWD_HASH_ENTRIES (1024*1024*1)
> >  #endif
> >
> > +struct lcore_stats {
> > +uint32_t nb_rx_pkts[16];
> > +uint32_t num_loop[16];
> > +uint32_t none_loop[16];
> > +uint32_t no_full_loop[16];
> > +float  none_loop_per[16];
> > +float no_full_loop_per[16];
> > +} __rte_cache_aligned;
> 
> What is the 16 magic value?
I think MAX_RX_QUEUE_PER_LCORE should be used here instead of 16.

> 
> Use double instead of float to keep more accuracy?
Agree.
> 
> There maybe holes in this structure?
> 
> You may want to allocate it at runtime based on number of actual cores and
> get it on the right NUMA node.
OK. So maybe each lcore should be responsible for defining this struct for itself?

> 
> > +
> > +extern struct lcore_stats stats[RTE_MAX_LCORE];
> > +
> >  struct parm_cfg {
> > const char *rule_ipv4_name;
> > const char *rule_ipv6_name;
> > @@ -115,6 +126,63 @@ extern struct acl_algorithms acl_alg[];
> >
> >  extern uint32_t max_pkt_len;
> >
> > +static inline void
> > +nic_xstats_display(uint32_t port_id)
> > +{
> > +struct rte_eth_xstat *xstats;
> > +int cnt_xstats, idx_xstat;
> > +struct rte_eth_xstat_name *xstats_names;
> > +
> > +printf("## NIC extended statistics for port %-2d\n", port_id);
> > +if (!rte_eth_dev_is_valid_port(port_id)) {
> > +fprintf(stderr, "Error: Invalid port number %i\n", 
> > port_id);
> > +return;
> > +}
> > +
> > +/* Get count */
> > +cnt_xstats = rte_eth_xstats_get_names(port_id, NULL, 0);
> > +if (cnt_xstats  < 0) {
> > +fprintf(stderr, "Error: Cannot get count of xstats\n");
> > +return;
> > +}
> > +
> > +/* Get id-name lookup table */
> > +xstats_names = malloc(sizeof(struct rte_eth_xstat_name) * 
> > cnt_xstats);
> > +if (xstats_names == NULL) {
> > +fprintf(stderr, "Cannot allocate memory for xstats 
> > lookup\n");
> > +return;
> > +}
> > +if (cnt_xstats != rte_eth_xstats_get_names(
> > +port_id, xstats_names, cnt_xstats)) {
> > +fprintf(stderr, "Error: Cannot get xstats lookup\n");
> > +free(xstats_names);
> > +return;
> > +}
> > +
> > +/* Get stats themselves */
> > +xstats = malloc(sizeof(struct rte_eth_xstat) * cnt_xstats);
> > +if (xstats == NULL) {
> > +fprintf(stderr, "Cannot allocate memory for xstats\n");
> > +free(xstats_names);
> > +return;
> > +}
> > +if (cnt_xstats != rte_eth_xstats_get(port_id, xstats, cnt_xstats)) 
> > {
> > +fprintf(stderr, "Error: Unable to get xstats\n");
> > +free(xstats_names);
> > +free(xstats);
> > +return;
> > +}
> > +
> > +/* Display xstats */
> > +for (idx_xstat = 0; idx_xstat < cnt_xstats; idx_xstat++) {
> > +printf("%s: %"PRIu64"\n",
> > +xstats_names[idx_xstat].name,
> > +xstats[idx_xstat].value);
> > +}
> > +free(xstats_name

[PATCH v1 0/7] ethdev: modify field API for multiple headers

2023-05-15 Thread Michael Baum
This patch-set extends the modify field action API to support both
multiple MPLS and GENEVE option headers.

In the current API, the header type is provided by the rte_flow_field_id
enumeration and the encapsulation level (inner/outer/tunnel) is
specified by the data.level field.
However, there is no way to specify a header inside an encapsulation level.

For example, for this packet:

eth / mpls / mpls / mpls / ipv4 / udp
neither the second nor the third MPLS header can be modified using
this API.

RFC:
https://patchwork.dpdk.org/project/dpdk/cover/20230420092145.522389-1-michae...@nvidia.com/

Michael Baum (7):
  doc: fix blank lines in modify field action description
  doc: fix blank line in asynchronous operations description
  doc: fix wrong indentation in RSS action description
  net/mlx5: reduce modify field encapsulation level size
  ethdev: add GENEVE TLV option modification support
  ethdev: add MPLS header modification support
  net/mlx5: add MPLS modify field support

 app/test-pmd/cmdline_flow.c| 70 -
 doc/guides/prog_guide/rte_flow.rst | 38 +++---
 doc/guides/rel_notes/release_23_07.rst |  3 ++
 drivers/common/mlx5/mlx5_prm.h |  5 ++
 drivers/net/mlx5/mlx5_flow_dv.c| 23 
 drivers/net/mlx5/mlx5_flow_hw.c| 38 +++---
 lib/ethdev/rte_flow.h  | 72 --
 7 files changed, 219 insertions(+), 30 deletions(-)

-- 
2.25.1



[PATCH v1 1/7] doc: fix blank lines in modify field action description

2023-05-15 Thread Michael Baum
The modify field action description inside "Generic flow API (rte_flow)"
documentation lists all operations supported for a destination field.
In addition, it lists the values supported for an encapsulation level
field.

In both cases, a blank line is missing before the list, causing the
list items to look like regular text lines.

This patch adds the blank lines.

Fixes: 73b68f4c54a0 ("ethdev: introduce generic modify flow action")
Cc: akozy...@nvidia.com
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
---
 doc/guides/prog_guide/rte_flow.rst | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst 
b/doc/guides/prog_guide/rte_flow.rst
index 32fc45516a..e7faa368a1 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2917,20 +2917,23 @@ The immediate value ``RTE_FLOW_FIELD_VALUE`` (or a 
pointer to it
 ``RTE_FLOW_FIELD_START`` is used to point to the beginning of a packet.
 See ``enum rte_flow_field_id`` for the list of supported fields.
 
-``op`` selects the operation to perform on a destination field.
+``op`` selects the operation to perform on a destination field:
+
 - ``set`` copies the data from ``src`` field to ``dst`` field.
 - ``add`` adds together ``dst`` and ``src`` and stores the result into ``dst``.
-- ``sub`` subtracts ``src`` from ``dst`` and stores the result into ``dst``
+- ``sub`` subtracts ``src`` from ``dst`` and stores the result into ``dst``.
 
 ``width`` defines a number of bits to use from ``src`` field.
 
 ``level`` is used to access any packet field on any encapsulation level
-as well as any tag element in the tag array.
-- ``0`` means the default behaviour. Depending on the packet type, it can
-mean outermost, innermost or anything in between.
+as well as any tag element in the tag array:
+
+- ``0`` means the default behaviour. Depending on the packet type,
+  it can mean outermost, innermost or anything in between.
 - ``1`` requests access to the outermost packet encapsulation level.
 - ``2`` and subsequent values requests access to the specified packet
-encapsulation level, from outermost to innermost (lower to higher values).
+  encapsulation level, from outermost to innermost (lower to higher values).
+
 For the tag array (in case of multiple tags are supported and present)
 ``level`` translates directly into the array index.
 
-- 
2.25.1



[PATCH v1 2/7] doc: fix blank line in asynchronous operations description

2023-05-15 Thread Michael Baum
The asynchronous operations description inside "Generic flow API
(rte_flow)" documentation adds some bullets to describe asynchronous
operations behavior.

A blank line is missing before the first bullet, causing it to look
like a regular text line.

This patch adds the blank line.

Fixes: 197e820c6685 ("ethdev: bring in async queue-based flow rules operations")
Cc: akozy...@nvidia.com
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
---
 doc/guides/prog_guide/rte_flow.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst 
b/doc/guides/prog_guide/rte_flow.rst
index e7faa368a1..76e69190fc 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3702,6 +3702,7 @@ Asynchronous operations
 ---
 
 Flow rules management can be done via special lockless flow management queues.
+
 - Queue operations are asynchronous and not thread-safe.
 
 - Operations can thus be invoked by the app's datapath,
-- 
2.25.1



[PATCH v1 3/7] doc: fix wrong indentation in RSS action description

2023-05-15 Thread Michael Baum
The RSS action description inside "Generic flow API (rte_flow)"
documentation lists the values supported for an encapsulation level
field.

For the value "2", the indentation uses 3 spaces instead of 2 after the
line break, causing the first line to be rendered bold.

This patch updates the number of spaces in the indentation.

Fixes: 18aee2861a1f ("ethdev: add encap level to RSS flow API action")
Cc: adrien.mazarg...@6wind.com
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
---
 doc/guides/prog_guide/rte_flow.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst 
b/doc/guides/prog_guide/rte_flow.rst
index 76e69190fc..25b57bf86d 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1954,8 +1954,8 @@ Also, regarding packet encapsulation ``level``:
   level.
 
 - ``2`` and subsequent values request RSS to be performed on the specified
-   inner packet encapsulation level, from outermost to innermost (lower to
-   higher values).
+  inner packet encapsulation level, from outermost to innermost (lower to
+  higher values).
 
 Values other than ``0`` are not necessarily supported.
 
-- 
2.25.1



[PATCH v1 4/7] net/mlx5: reduce modify field encapsulation level size

2023-05-15 Thread Michael Baum
The type of the "level" field in the "rte_flow_action_modify_data"
structure is currently uint32_t, but it is going to be changed to
uint8_t in the next patch.

For representing an encapsulation level, 8 bits are more than enough,
and this change shouldn't affect the current implementation.
However, when an action template is created, the PMD requires this
field to be provided "fully masked" in the action mask. The "fully
masked" value differs between the uint32_t and uint8_t types.

This patch reduces all modify field encapsulation level "fully masked"
initializations to use UINT8_MAX instead of UINT32_MAX. This change
avoids a compilation warning once the API changes the type to uint8_t.

Signed-off-by: Michael Baum 
---
 drivers/net/mlx5/mlx5_flow_hw.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 7e0ee8d883..1b68a19900 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3565,7 +3565,7 @@ flow_hw_validate_action_modify_field(const struct 
rte_flow_action *action,
return rte_flow_error_set(error, EINVAL,
RTE_FLOW_ERROR_TYPE_ACTION, action,
"immediate value, pointer and hash result 
cannot be used as destination");
-   if (mask_conf->dst.level != UINT32_MAX)
+   if (mask_conf->dst.level != UINT8_MAX)
return rte_flow_error_set(error, EINVAL,
RTE_FLOW_ERROR_TYPE_ACTION, action,
"destination encapsulation level must be fully masked");
@@ -3579,7 +3579,7 @@ flow_hw_validate_action_modify_field(const struct 
rte_flow_action *action,
"destination field mask and template are not 
equal");
if (action_conf->src.field != RTE_FLOW_FIELD_POINTER &&
action_conf->src.field != RTE_FLOW_FIELD_VALUE) {
-   if (mask_conf->src.level != UINT32_MAX)
+   if (mask_conf->src.level != UINT8_MAX)
return rte_flow_error_set(error, EINVAL,
RTE_FLOW_ERROR_TYPE_ACTION, action,
"source encapsulation level must be fully 
masked");
@@ -4450,7 +4450,7 @@ flow_hw_set_vlan_vid(struct rte_eth_dev *dev,
.operation = RTE_FLOW_MODIFY_SET,
.dst = {
.field = RTE_FLOW_FIELD_VLAN_ID,
-   .level = 0x, .offset = 0x,
+   .level = 0xff, .offset = 0x,
},
.src = {
.field = RTE_FLOW_FIELD_VALUE,
@@ -4583,12 +4583,12 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
.operation = RTE_FLOW_MODIFY_SET,
.dst = {
.field = (enum 
rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-   .level = UINT32_MAX,
+   .level = UINT8_MAX,
.offset = UINT32_MAX,
},
.src = {
.field = (enum 
rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-   .level = UINT32_MAX,
+   .level = UINT8_MAX,
.offset = UINT32_MAX,
},
.width = UINT32_MAX,
@@ -5653,7 +5653,7 @@ flow_hw_create_tx_repr_tag_jump_acts_tmpl(struct 
rte_eth_dev *dev)
.operation = RTE_FLOW_MODIFY_SET,
.dst = {
.field = (enum 
rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-   .level = UINT32_MAX,
+   .level = UINT8_MAX,
.offset = UINT32_MAX,
},
.src = {
@@ -5677,12 +5677,12 @@ flow_hw_create_tx_repr_tag_jump_acts_tmpl(struct 
rte_eth_dev *dev)
.operation = RTE_FLOW_MODIFY_SET,
.dst = {
.field = (enum 
rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-   .level = UINT32_MAX,
+   .level = UINT8_MAX,
.offset = UINT32_MAX,
},
.src = {
.field = (enum 
rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-   .level = UINT32_MAX,
+   .level = UINT8_MAX,
.offset = UINT32_MAX,
},
.width = UINT32_MAX,
@@ -6009,7 +6009,7 @@ flow_hw_create_ctrl_regc_jump_actions_template(struct 
rte_eth_dev *dev)
.operation = RTE_FLOW_MODIFY_SET,
.dst = {
.field = (enum 
rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-   .level = UINT32_MAX,
+   .level = UINT8_MAX,
.offset = UINT32_MAX,
},
.src = {
@@ -618

[PATCH v1 6/7] ethdev: add MPLS header modification support

2023-05-15 Thread Michael Baum
Add support for MPLS header modification using the "RTE_FLOW_FIELD_MPLS" id.

Since an MPLS header might appear more than once in inner/outer/tunnel
levels, a new field was added to the "rte_flow_action_modify_data"
structure in addition to the "level" field.
The "sub_level" field is the index of the header inside the
encapsulation level. It is used to modify multiple MPLS headers in the
same encapsulation level.

This addition also enables modifying multiple VLAN headers, so the
description of "RTE_FLOW_FIELD_VLAN_" was updated.

Signed-off-by: Michael Baum 
---
 app/test-pmd/cmdline_flow.c| 24 ++-
 doc/guides/prog_guide/rte_flow.rst |  6 
 lib/ethdev/rte_flow.h  | 47 --
 3 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 8c1dea53c0..5370be1e3c 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -636,6 +636,7 @@ enum index {
ACTION_MODIFY_FIELD_DST_TYPE_VALUE,
ACTION_MODIFY_FIELD_DST_LEVEL,
ACTION_MODIFY_FIELD_DST_LEVEL_VALUE,
+   ACTION_MODIFY_FIELD_DST_SUB_LEVEL,
ACTION_MODIFY_FIELD_DST_TYPE_ID,
ACTION_MODIFY_FIELD_DST_CLASS_ID,
ACTION_MODIFY_FIELD_DST_OFFSET,
@@ -643,6 +644,7 @@ enum index {
ACTION_MODIFY_FIELD_SRC_TYPE_VALUE,
ACTION_MODIFY_FIELD_SRC_LEVEL,
ACTION_MODIFY_FIELD_SRC_LEVEL_VALUE,
+   ACTION_MODIFY_FIELD_SRC_SUB_LEVEL,
ACTION_MODIFY_FIELD_SRC_TYPE_ID,
ACTION_MODIFY_FIELD_SRC_CLASS_ID,
ACTION_MODIFY_FIELD_SRC_OFFSET,
@@ -859,7 +861,7 @@ static const char *const modify_field_ids[] = {
"ipv6_proto",
"flex_item",
"hash_result",
-   "geneve_opt_type", "geneve_opt_class", "geneve_opt_data",
+   "geneve_opt_type", "geneve_opt_class", "geneve_opt_data", "mpls",
NULL
 };
 
@@ -2301,6 +2303,7 @@ static const enum index next_action_sample[] = {
 
 static const enum index action_modify_field_dst[] = {
ACTION_MODIFY_FIELD_DST_LEVEL,
+   ACTION_MODIFY_FIELD_DST_SUB_LEVEL,
ACTION_MODIFY_FIELD_DST_TYPE_ID,
ACTION_MODIFY_FIELD_DST_CLASS_ID,
ACTION_MODIFY_FIELD_DST_OFFSET,
@@ -2310,6 +2313,7 @@ static const enum index action_modify_field_dst[] = {
 
 static const enum index action_modify_field_src[] = {
ACTION_MODIFY_FIELD_SRC_LEVEL,
+   ACTION_MODIFY_FIELD_SRC_SUB_LEVEL,
ACTION_MODIFY_FIELD_SRC_TYPE_ID,
ACTION_MODIFY_FIELD_SRC_CLASS_ID,
ACTION_MODIFY_FIELD_SRC_OFFSET,
@@ -6398,6 +6402,15 @@ static const struct token token_list[] = {
.call = parse_vc_modify_field_level,
.comp = comp_none,
},
+   [ACTION_MODIFY_FIELD_DST_SUB_LEVEL] = {
+   .name = "dst_sub_level",
+   .help = "destination field sub level",
+   .next = NEXT(action_modify_field_dst,
+NEXT_ENTRY(COMMON_UNSIGNED)),
+   .args = ARGS(ARGS_ENTRY(struct rte_flow_action_modify_field,
+   dst.sub_level)),
+   .call = parse_vc_conf,
+   },
[ACTION_MODIFY_FIELD_DST_TYPE_ID] = {
.name = "dst_type_id",
.help = "destination field type ID",
@@ -6451,6 +6464,15 @@ static const struct token token_list[] = {
.call = parse_vc_modify_field_level,
.comp = comp_none,
},
+   [ACTION_MODIFY_FIELD_SRC_SUB_LEVEL] = {
+   .name = "src_sub_level",
+   .help = "source field sub level",
+   .next = NEXT(action_modify_field_src,
+NEXT_ENTRY(COMMON_UNSIGNED)),
+   .args = ARGS(ARGS_ENTRY(struct rte_flow_action_modify_field,
+   src.sub_level)),
+   .call = parse_vc_conf,
+   },
[ACTION_MODIFY_FIELD_SRC_TYPE_ID] = {
.name = "src_type_id",
.help = "source field type ID",
diff --git a/doc/guides/prog_guide/rte_flow.rst 
b/doc/guides/prog_guide/rte_flow.rst
index cd38f0de46..1f681a38e4 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2937,6 +2937,10 @@ as well as any tag element in the tag array:
 For the tag array (in case of multiple tags are supported and present)
 ``level`` translates directly into the array index.
 
+- ``sub_level`` is the index of the header inside the encapsulation level.
+  It is used to modify either ``VLAN`` or ``MPLS`` headers, of which
+  multiple instances may appear in the same encapsulation level.
+
 ``type`` is used to specify (along with ``class_id``) the Geneve option which
 is being modified.
 This field is relevant only for ``RTE_FLOW_FIELD_GENEVE_OPT_`` type.
@@ -3002,6 +3006,8 @@ value as sequence of bytes {xxx, xxx, 0x85, xxx, xxx, 
xxx}.

+-+--+
| ``lev

[PATCH v1 5/7] ethdev: add GENEVE TLV option modification support

2023-05-15 Thread Michael Baum
Add modify field support for GENEVE option fields:
 - "RTE_FLOW_FIELD_GENEVE_OPT_TYPE"
 - "RTE_FLOW_FIELD_GENEVE_OPT_CLASS"
 - "RTE_FLOW_FIELD_GENEVE_OPT_DATA"

Each GENEVE TLV option is identified by both its "class" and "type",
so 2 new fields were added to the "rte_flow_action_modify_data"
structure to help specify which option to modify.

To make room for those 2 new fields, the "level" field was moved to
"uint8_t", which is more than enough for an encapsulation level.

Signed-off-by: Michael Baum 
---
 app/test-pmd/cmdline_flow.c| 48 +++-
 doc/guides/prog_guide/rte_flow.rst | 12 ++
 doc/guides/rel_notes/release_23_07.rst |  3 ++
 lib/ethdev/rte_flow.h  | 51 +-
 4 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 58939ec321..8c1dea53c0 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -636,11 +636,15 @@ enum index {
ACTION_MODIFY_FIELD_DST_TYPE_VALUE,
ACTION_MODIFY_FIELD_DST_LEVEL,
ACTION_MODIFY_FIELD_DST_LEVEL_VALUE,
+   ACTION_MODIFY_FIELD_DST_TYPE_ID,
+   ACTION_MODIFY_FIELD_DST_CLASS_ID,
ACTION_MODIFY_FIELD_DST_OFFSET,
ACTION_MODIFY_FIELD_SRC_TYPE,
ACTION_MODIFY_FIELD_SRC_TYPE_VALUE,
ACTION_MODIFY_FIELD_SRC_LEVEL,
ACTION_MODIFY_FIELD_SRC_LEVEL_VALUE,
+   ACTION_MODIFY_FIELD_SRC_TYPE_ID,
+   ACTION_MODIFY_FIELD_SRC_CLASS_ID,
ACTION_MODIFY_FIELD_SRC_OFFSET,
ACTION_MODIFY_FIELD_SRC_VALUE,
ACTION_MODIFY_FIELD_SRC_POINTER,
@@ -854,7 +858,9 @@ static const char *const modify_field_ids[] = {
"ipv4_ecn", "ipv6_ecn", "gtp_psc_qfi", "meter_color",
"ipv6_proto",
"flex_item",
-   "hash_result", NULL
+   "hash_result",
+   "geneve_opt_type", "geneve_opt_class", "geneve_opt_data",
+   NULL
 };
 
 static const char *const meter_colors[] = {
@@ -2295,6 +2301,8 @@ static const enum index next_action_sample[] = {
 
 static const enum index action_modify_field_dst[] = {
ACTION_MODIFY_FIELD_DST_LEVEL,
+   ACTION_MODIFY_FIELD_DST_TYPE_ID,
+   ACTION_MODIFY_FIELD_DST_CLASS_ID,
ACTION_MODIFY_FIELD_DST_OFFSET,
ACTION_MODIFY_FIELD_SRC_TYPE,
ZERO,
@@ -2302,6 +2310,8 @@ static const enum index action_modify_field_dst[] = {
 
 static const enum index action_modify_field_src[] = {
ACTION_MODIFY_FIELD_SRC_LEVEL,
+   ACTION_MODIFY_FIELD_SRC_TYPE_ID,
+   ACTION_MODIFY_FIELD_SRC_CLASS_ID,
ACTION_MODIFY_FIELD_SRC_OFFSET,
ACTION_MODIFY_FIELD_SRC_VALUE,
ACTION_MODIFY_FIELD_SRC_POINTER,
@@ -6388,6 +6398,24 @@ static const struct token token_list[] = {
.call = parse_vc_modify_field_level,
.comp = comp_none,
},
+   [ACTION_MODIFY_FIELD_DST_TYPE_ID] = {
+   .name = "dst_type_id",
+   .help = "destination field type ID",
+   .next = NEXT(action_modify_field_dst,
+NEXT_ENTRY(COMMON_UNSIGNED)),
+   .args = ARGS(ARGS_ENTRY(struct rte_flow_action_modify_field,
+   dst.type)),
+   .call = parse_vc_conf,
+   },
+   [ACTION_MODIFY_FIELD_DST_CLASS_ID] = {
+   .name = "dst_class",
+   .help = "destination field class ID",
+   .next = NEXT(action_modify_field_dst,
+NEXT_ENTRY(COMMON_UNSIGNED)),
+   .args = ARGS(ARGS_ENTRY_HTON(struct 
rte_flow_action_modify_field,
+dst.class_id)),
+   .call = parse_vc_conf,
+   },
[ACTION_MODIFY_FIELD_DST_OFFSET] = {
.name = "dst_offset",
.help = "destination field bit offset",
@@ -6423,6 +6451,24 @@ static const struct token token_list[] = {
.call = parse_vc_modify_field_level,
.comp = comp_none,
},
+   [ACTION_MODIFY_FIELD_SRC_TYPE_ID] = {
+   .name = "src_type_id",
+   .help = "source field type ID",
+   .next = NEXT(action_modify_field_src,
+NEXT_ENTRY(COMMON_UNSIGNED)),
+   .args = ARGS(ARGS_ENTRY(struct rte_flow_action_modify_field,
+   src.type)),
+   .call = parse_vc_conf,
+   },
+   [ACTION_MODIFY_FIELD_SRC_CLASS_ID] = {
+   .name = "src_class",
+   .help = "source field class ID",
+   .next = NEXT(action_modify_field_src,
+NEXT_ENTRY(COMMON_UNSIGNED)),
+   .args = ARGS(ARGS_ENTRY_HTON(struct 
rte_flow_action_modify_field,
+src.class_id)),
+   .call = parse_vc_conf,
+   },
[ACTION_MODIFY_FIELD_SRC_OFFSET] = {
.name = "src_offset",
 

[PATCH v1 7/7] net/mlx5: add MPLS modify field support

2023-05-15 Thread Michael Baum
Add support for modify field on the tunnel MPLS header.
For now, only copying from it is supported.

Signed-off-by: Michael Baum 
---
 drivers/common/mlx5/mlx5_prm.h  |  5 +
 drivers/net/mlx5/mlx5_flow_dv.c | 23 +++
 drivers/net/mlx5/mlx5_flow_hw.c | 16 +---
 3 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index ed3d5efbb7..04c1400a1e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -787,6 +787,11 @@ enum mlx5_modification_field {
MLX5_MODI_TUNNEL_HDR_DW_1 = 0x75,
MLX5_MODI_GTPU_FIRST_EXT_DW_0 = 0x76,
MLX5_MODI_HASH_RESULT = 0x81,
+   MLX5_MODI_IN_MPLS_LABEL_0 = 0x8a,
+   MLX5_MODI_IN_MPLS_LABEL_1,
+   MLX5_MODI_IN_MPLS_LABEL_2,
+   MLX5_MODI_IN_MPLS_LABEL_3,
+   MLX5_MODI_IN_MPLS_LABEL_4,
MLX5_MODI_OUT_IPV6_NEXT_HDR = 0x4A,
MLX5_MODI_INVALID = INT_MAX,
 };
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index f136f43b0a..93cce16a1e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1388,6 +1388,7 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
case RTE_FLOW_FIELD_GENEVE_VNI:
return 24;
case RTE_FLOW_FIELD_GTP_TEID:
+   case RTE_FLOW_FIELD_MPLS:
case RTE_FLOW_FIELD_TAG:
return 32;
case RTE_FLOW_FIELD_MARK:
@@ -1435,6 +1436,12 @@ flow_modify_info_mask_32_masked(uint32_t length, 
uint32_t off, uint32_t post_mas
return rte_cpu_to_be_32(mask & post_mask);
 }
 
+static __rte_always_inline enum mlx5_modification_field
+mlx5_mpls_modi_field_get(const struct rte_flow_action_modify_data *data)
+{
+   return MLX5_MODI_IN_MPLS_LABEL_0 + data->sub_level;
+}
+
 static void
 mlx5_modify_flex_item(const struct rte_eth_dev *dev,
  const struct mlx5_flex_item *flex,
@@ -1893,6 +1900,16 @@ mlx5_flow_field_id_to_modify_info
else
info[idx].offset = off_be;
break;
+   case RTE_FLOW_FIELD_MPLS:
+   MLX5_ASSERT(data->offset + width <= 32);
+   off_be = 32 - (data->offset + width);
+   info[idx] = (struct field_modify_info){4, 0,
+   mlx5_mpls_modi_field_get(data)};
+   if (mask)
+   mask[idx] = flow_modify_info_mask_32(width, off_be);
+   else
+   info[idx].offset = off_be;
+   break;
case RTE_FLOW_FIELD_TAG:
{
MLX5_ASSERT(data->offset + width <= 32);
@@ -5344,6 +5361,12 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev 
*dev,
RTE_FLOW_ERROR_TYPE_ACTION, action,
"modifications of the GENEVE Network"
" Identifier is not supported");
+   if (action_modify_field->dst.field == RTE_FLOW_FIELD_MPLS ||
+   action_modify_field->src.field == RTE_FLOW_FIELD_MPLS)
+   return rte_flow_error_set(error, ENOTSUP,
+   RTE_FLOW_ERROR_TYPE_ACTION, action,
+   "modifications of the MPLS header "
+   "is not supported");
if (action_modify_field->dst.field == RTE_FLOW_FIELD_MARK ||
action_modify_field->src.field == RTE_FLOW_FIELD_MARK)
if (config->dv_xmeta_en == MLX5_XMETA_MODE_LEGACY ||
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 1b68a19900..80e6398992 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3546,10 +3546,8 @@ flow_hw_validate_action_modify_field(const struct 
rte_flow_action *action,
 const struct rte_flow_action *mask,
 struct rte_flow_error *error)
 {
-   const struct rte_flow_action_modify_field *action_conf =
-   action->conf;
-   const struct rte_flow_action_modify_field *mask_conf =
-   mask->conf;
+   const struct rte_flow_action_modify_field *action_conf = action->conf;
+   const struct rte_flow_action_modify_field *mask_conf = mask->conf;
 
if (action_conf->operation != mask_conf->operation)
return rte_flow_error_set(error, EINVAL,
@@ -3604,6 +3602,11 @@ flow_hw_validate_action_modify_field(const struct 
rte_flow_action *action,
return rte_flow_error_set(error, EINVAL,
RTE_FLOW_ERROR_TYPE_ACTION, action,
"modifying Geneve VNI is not supported");
+   /* Due to HW bug, tunnel MPLS header is read only. */
+   if (action_conf->dst.field == RTE_FLOW_FIELD_MPLS)
+   return rte_flow_error_set(error, EINVAL,
+   RTE_FLOW_ERROR_TYPE_A