Re: [dpdk-dev] [PATCH v2] mempool: enforce valid flags at creation

2021-10-15 Thread Andrew Rybchenko
On 10/14/21 10:37 PM, Stephen Hemminger wrote:
> On Thu, 14 Oct 2021 21:29:16 +0200
> David Marchand  wrote:
> 
>> If we do not enforce that valid flags are passed by an application, the
>> application might face issues in the future when more flags are added.
>>
>> Signed-off-by: David Marchand 
>> Reviewed-by: Andrew Rybchenko 
>> Acked-by: Ray Kinsella 
>> ---
>> Changes since v1:
>> - fixed checkpatch warning,
>> - moved flags to validate them in the same order as parameters,
> 
> Acked-by: Stephen Hemminger 
> 

Acked-by: Andrew Rybchenko 



[dpdk-dev] [PATCH v5 0/5] Implement rte_power_monitor API in virtio/vhost PMD

2021-10-15 Thread Miao Li
This patchset implements the rte_power_monitor API in the virtio and
vhost PMDs to reduce power consumption when no packets come in. The API
can be called and tested in l3fwd-power after adding vhost and virtio
support to l3fwd-power and ignoring the rx queue information check in
queue_stopped().

v5:
-Rebase on latest repo

v4:
-modify comment
-update the release note
-add IPv4 CKSUM check

v3:
-fix some code format issues
-fix spelling mistake

v2:
-remove flag and add match and size in rte_vhost_power_monitor_cond
-modify power callback function
-add dev and queue id check and remove unnecessary check
-fix the assignment of pmc->size
-update port configuration according to the device information and
remove adding command line arguments
-modify some titles

Miao Li (5):
  net/virtio: implement rte_power_monitor API
  vhost: implement rte_power_monitor API
  net/vhost: implement rte_power_monitor API
  power: modify return of queue_stopped
  examples/l3fwd-power: support virtio/vhost

 doc/guides/rel_notes/release_21_11.rst | 12 ++
 drivers/net/vhost/rte_eth_vhost.c  | 40 ++
 drivers/net/virtio/virtio_ethdev.c | 56 ++
 examples/l3fwd-power/main.c|  9 -
 lib/power/rte_power_pmd_mgmt.c |  9 -
 lib/vhost/rte_vhost.h  | 44 
 lib/vhost/version.map  |  3 ++
 lib/vhost/vhost.c  | 38 +
 8 files changed, 208 insertions(+), 3 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH v5 1/5] net/virtio: implement rte_power_monitor API

2021-10-15 Thread Miao Li
This patch implements the rte_power_monitor API in the virtio PMD to
reduce power consumption when no packets come in. Following the current
power monitor semantics, this commit adds a callback function that
decides whether to abort the sleep by checking the current value against
the expected value, and virtio_get_monitor_addr to provide the address
to monitor. When no packets come in, the value at the monitored address
does not change and the running core sleeps. Once packets arrive, the
value changes and the running core wakes up.

Signed-off-by: Miao Li 
Reviewed-by: Chenbo Xia 
---
 doc/guides/rel_notes/release_21_11.rst |  4 ++
 drivers/net/virtio/virtio_ethdev.c | 56 ++
 2 files changed, 60 insertions(+)

diff --git a/doc/guides/rel_notes/release_21_11.rst 
b/doc/guides/rel_notes/release_21_11.rst
index 4c56cdfeaa..27dc896703 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -72,6 +72,10 @@ New Features
   Added macros ETH_RSS_IPV4_CHKSUM and ETH_RSS_L4_CHKSUM, now IPv4 and
   TCP/UDP/SCTP header checksum field can be used as input set for RSS.
 
+* **Updated virtio PMD.**
+
+  Implement rte_power_monitor API in virtio PMD.
+
 * **Updated af_packet ethdev driver.**
 
   * Default VLAN strip behavior was changed. VLAN tag won't be stripped
diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 6aa36b3f39..1227f3f1f4 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -74,6 +74,8 @@ static int virtio_mac_addr_set(struct rte_eth_dev *dev,
struct rte_ether_addr *mac_addr);
 
 static int virtio_intr_disable(struct rte_eth_dev *dev);
+static int virtio_get_monitor_addr(void *rx_queue,
+   struct rte_power_monitor_cond *pmc);
 
 static int virtio_dev_queue_stats_mapping_set(
struct rte_eth_dev *eth_dev,
@@ -982,6 +984,7 @@ static const struct eth_dev_ops virtio_eth_dev_ops = {
.mac_addr_add= virtio_mac_addr_add,
.mac_addr_remove = virtio_mac_addr_remove,
.mac_addr_set= virtio_mac_addr_set,
+   .get_monitor_addr= virtio_get_monitor_addr,
 };
 
 /*
@@ -1313,6 +1316,59 @@ virtio_mac_addr_set(struct rte_eth_dev *dev, struct 
rte_ether_addr *mac_addr)
return 0;
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+#define CLB_MATCH_IDX 2
+static int
+virtio_monitor_callback(const uint64_t value,
+   const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+   const uint64_t m = opaque[CLB_MSK_IDX];
+   const uint64_t v = opaque[CLB_VAL_IDX];
+   const uint64_t c = opaque[CLB_MATCH_IDX];
+
+   if (c)
+   return (value & m) == v ? -1 : 0;
+   else
+   return (value & m) == v ? 0 : -1;
+}
+
+static int
+virtio_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
+{
+   struct virtnet_rx *rxvq = rx_queue;
+   struct virtqueue *vq = virtnet_rxq_to_vq(rxvq);
+   struct virtio_hw *hw;
+
+   if (vq == NULL)
+   return -EINVAL;
+
+   hw = vq->hw;
+   if (virtio_with_packed_queue(hw)) {
+   struct vring_packed_desc *desc;
+   desc = vq->vq_packed.ring.desc;
+   pmc->addr = &desc[vq->vq_used_cons_idx].flags;
+   if (vq->vq_packed.used_wrap_counter)
+   pmc->opaque[CLB_VAL_IDX] =
+   VRING_PACKED_DESC_F_AVAIL_USED;
+   else
+   pmc->opaque[CLB_VAL_IDX] = 0;
+   pmc->opaque[CLB_MSK_IDX] = VRING_PACKED_DESC_F_AVAIL_USED;
+   pmc->opaque[CLB_MATCH_IDX] = 1;
+   pmc->size = sizeof(desc[vq->vq_used_cons_idx].flags);
+   } else {
+   pmc->addr = &vq->vq_split.ring.used->idx;
+   pmc->opaque[CLB_VAL_IDX] = vq->vq_used_cons_idx
+   & (vq->vq_nentries - 1);
+   pmc->opaque[CLB_MSK_IDX] = vq->vq_nentries - 1;
+   pmc->opaque[CLB_MATCH_IDX] = 0;
+   pmc->size = sizeof(vq->vq_split.ring.used->idx);
+   }
+   pmc->fn = virtio_monitor_callback;
+
+   return 0;
+}
+
 static int
 virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 {
-- 
2.25.1



[dpdk-dev] [PATCH v5 2/5] vhost: implement rte_power_monitor API

2021-10-15 Thread Miao Li
This patch defines rte_vhost_power_monitor_cond, which is used to pass
information to the vhost driver. The information includes the address to
monitor, the expected value, the mask used to extract the value read
from 'addr', the value size of the monitored address, and a match flag
that tells whether the read value should match or not match the expected
value. The vhost driver can use this information to fill
rte_power_monitor_cond.

Signed-off-by: Miao Li 
---
 doc/guides/rel_notes/release_21_11.rst |  4 +++
 lib/vhost/rte_vhost.h  | 44 ++
 lib/vhost/version.map  |  3 ++
 lib/vhost/vhost.c  | 38 ++
 4 files changed, 89 insertions(+)

diff --git a/doc/guides/rel_notes/release_21_11.rst 
b/doc/guides/rel_notes/release_21_11.rst
index 27dc896703..ad6d256a55 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -72,6 +72,10 @@ New Features
   Added macros ETH_RSS_IPV4_CHKSUM and ETH_RSS_L4_CHKSUM, now IPv4 and
   TCP/UDP/SCTP header checksum field can be used as input set for RSS.
 
+* **Added power monitor API in vhost library.**
+
+  Added an API to support power monitor in vhost library.
+
 * **Updated virtio PMD.**
 
   Implement rte_power_monitor API in virtio PMD.
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index fd372d5259..42bda95e96 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -292,6 +292,33 @@ struct vhost_device_ops {
void *reserved[1]; /**< Reserved for future extension */
 };
 
+/**
+ * Power monitor condition.
+ */
+struct rte_vhost_power_monitor_cond {
+   /**< Address to monitor for changes */
+   volatile void *addr;
+   /**< If the `mask` is non-zero, location pointed
+*   to by `addr` will be read and masked, then
+*   compared with this value.
+*/
+   uint64_t val;
+   /**< 64-bit mask to extract value read from `addr` */
+   uint64_t mask;
+   /**< Data size (in bytes) that will be read from the
+*   monitored memory location (`addr`). Can be 1, 2,
+*   4, or 8. Supplying any other value will result in
+*   an error.
+*/
+   uint8_t size;
+   /**< If 1, and masked value that read from 'addr' equals
+*   'val', the driver will skip core sleep. If 0, and
+*  masked value that read from 'addr' does not equal 'val',
+*  the driver will skip core sleep.
+*/
+   uint8_t match;
+};
+
 /**
  * Convert guest physical address to host virtual address
  *
@@ -903,6 +930,23 @@ int rte_vhost_vring_call(int vid, uint16_t vring_idx);
  */
 uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);
 
+/**
+ * Get power monitor address of the vhost device
+ *
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  vhost queue ID
+ * @param pmc
+ *  power monitor condition
+ * @return
+ *  0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_get_monitor_addr(int vid, uint16_t queue_id,
+   struct rte_vhost_power_monitor_cond *pmc);
+
 /**
  * Get log base and log size of the vhost device
  *
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 8ebde3f694..c8599ddb97 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -85,4 +85,7 @@ EXPERIMENTAL {
rte_vhost_async_channel_register_thread_unsafe;
rte_vhost_async_channel_unregister_thread_unsafe;
rte_vhost_clear_queue_thread_unsafe;
+
+   # added in 21.11
+   rte_vhost_get_monitor_addr;
 };
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 9540522dac..36c896c9e2 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -1889,5 +1889,43 @@ rte_vhost_async_get_inflight(int vid, uint16_t queue_id)
return ret;
 }
 
+int
+rte_vhost_get_monitor_addr(int vid, uint16_t queue_id,
+   struct rte_vhost_power_monitor_cond *pmc)
+{
+   struct virtio_net *dev = get_device(vid);
+   struct vhost_virtqueue *vq;
+
+   if (dev == NULL)
+   return -1;
+   if (queue_id >= VHOST_MAX_VRING)
+   return -1;
+
+   vq = dev->virtqueue[queue_id];
+   if (vq == NULL)
+   return -1;
+
+   if (vq_is_packed(dev)) {
+   struct vring_packed_desc *desc;
+   desc = vq->desc_packed;
+   pmc->addr = &desc[vq->last_avail_idx].flags;
+   if (vq->avail_wrap_counter)
+   pmc->val = VRING_DESC_F_AVAIL;
+   else
+   pmc->val = VRING_DESC_F_USED;
+   pmc->mask = VRING_DESC_F_AVAIL | VRING_DESC_F_USED;
+   pmc->size = sizeof(desc[vq->last_avail_idx].flags);
+   pmc->match = 1;
+   } else {
+   pmc->addr = &vq->avail->idx;
+   pmc->val = vq->last_avail_idx & (vq->size - 1);
+   pmc->mask = vq->size - 1;
+   pmc->size = sizeof(vq->avail->idx);
+ 

[dpdk-dev] [PATCH v5 3/5] net/vhost: implement rte_power_monitor API

2021-10-15 Thread Miao Li
This patch implements the rte_power_monitor API in the vhost PMD to
reduce power consumption when no packets come in. Following the current
power monitor semantics, this commit adds a callback function that
decides whether to abort the sleep by checking the current value against
the expected value, and vhost_get_monitor_addr to provide the address
to monitor. When no packets come in, the value at the monitored address
does not change and the running core sleeps. Once packets arrive, the
value changes and the running core wakes up.

Signed-off-by: Miao Li 
---
 doc/guides/rel_notes/release_21_11.rst |  4 +++
 drivers/net/vhost/rte_eth_vhost.c  | 40 ++
 2 files changed, 44 insertions(+)

diff --git a/doc/guides/rel_notes/release_21_11.rst 
b/doc/guides/rel_notes/release_21_11.rst
index ad6d256a55..e6f9c284ae 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -76,6 +76,10 @@ New Features
 
   Added an API to support power monitor in vhost library.
 
+* **Updated vhost PMD.**
+
+  Implement rte_power_monitor API in vhost PMD.
+
 * **Updated virtio PMD.**
 
   Implement rte_power_monitor API in virtio PMD.
diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 2e24e5f7ff..ee665ee64d 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -1386,6 +1386,45 @@ eth_rx_queue_count(struct rte_eth_dev *dev, uint16_t 
rx_queue_id)
return rte_vhost_rx_queue_count(vq->vid, vq->virtqueue_id);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+#define CLB_MATCH_IDX 2
+static int
+vhost_monitor_callback(const uint64_t value,
+   const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+   const uint64_t m = opaque[CLB_MSK_IDX];
+   const uint64_t v = opaque[CLB_VAL_IDX];
+   const uint64_t c = opaque[CLB_MATCH_IDX];
+
+   if (c)
+   return (value & m) == v ? -1 : 0;
+   else
+   return (value & m) == v ? 0 : -1;
+}
+
+static int
+vhost_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
+{
+   struct vhost_queue *vq = rx_queue;
+   struct rte_vhost_power_monitor_cond vhost_pmc;
+   int ret;
+   if (vq == NULL)
+   return -EINVAL;
+   ret = rte_vhost_get_monitor_addr(vq->vid, vq->virtqueue_id,
+   &vhost_pmc);
+   if (ret < 0)
+   return -EINVAL;
+   pmc->addr = vhost_pmc.addr;
+   pmc->opaque[CLB_VAL_IDX] = vhost_pmc.val;
+   pmc->opaque[CLB_MSK_IDX] = vhost_pmc.mask;
+   pmc->opaque[CLB_MATCH_IDX] = vhost_pmc.match;
+   pmc->size = vhost_pmc.size;
+   pmc->fn = vhost_monitor_callback;
+
+   return 0;
+}
+
 static const struct eth_dev_ops ops = {
.dev_start = eth_dev_start,
.dev_stop = eth_dev_stop,
@@ -1405,6 +1444,7 @@ static const struct eth_dev_ops ops = {
.xstats_get_names = vhost_dev_xstats_get_names,
.rx_queue_intr_enable = eth_rxq_intr_enable,
.rx_queue_intr_disable = eth_rxq_intr_disable,
+   .get_monitor_addr= vhost_get_monitor_addr,
 };
 
 static int
-- 
2.25.1



[dpdk-dev] [PATCH v5 4/5] power: modify return of queue_stopped

2021-10-15 Thread Miao Li
Since some vdevs like virtio and vhost do not support rxq_info_get and
queue state inquiry, the error return value -ENOTSUP needs to be ignored
when queue_stopped() cannot get the rx queue information and rx queue
state. This patch changes the return value of queue_stopped() when
rte_eth_rx_queue_info_get() returns -ENOTSUP, so that vdevs which cannot
provide rx queue information and rx queue state can still enable power
management.

Signed-off-by: Miao Li 
Acked-by: Anatoly Burakov 
---
 lib/power/rte_power_pmd_mgmt.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index 0ce40f0875..39a2b4cd23 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -382,8 +382,13 @@ queue_stopped(const uint16_t port_id, const uint16_t 
queue_id)
 {
struct rte_eth_rxq_info qinfo;
 
-   if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
-   return -1;
+   int ret = rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo);
+   if (ret < 0) {
+   if (ret == -ENOTSUP)
+   return 1;
+   else
+   return -1;
+   }
 
return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
 }
-- 
2.25.1



[dpdk-dev] [PATCH v5 5/5] examples/l3fwd-power: support virtio/vhost

2021-10-15 Thread Miao Li
In l3fwd-power, the default port configuration requires RSS and
IPv4/UDP/TCP checksum offloads. If a device does not support these,
l3fwd-power exits and reports an error. This patch updates the port
configuration based on the device capabilities obtained after getting
the device information, to support devices like virtio and vhost.

Signed-off-by: Miao Li 
---
 examples/l3fwd-power/main.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 73a3ab5bc0..61c15e01d2 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -505,7 +505,9 @@ is_valid_ipv4_pkt(struct rte_ipv4_hdr *pkt, uint32_t 
link_len)
return -1;
 
/* 2. The IP checksum must be correct. */
-   /* this is checked in H/W */
+   /* if this is not checked in H/W, check it. */
+   if ((port_conf.rxmode.offloads & DEV_RX_OFFLOAD_IPV4_CKSUM) == 0)
+   rte_ipv4_cksum(pkt);
 
/*
 * 3. The IP version number must be 4. If the version number is not 4
@@ -2637,6 +2639,11 @@ main(int argc, char **argv)
local_port_conf.rx_adv_conf.rss_conf.rss_hf);
}
 
+   if (local_port_conf.rx_adv_conf.rss_conf.rss_hf == 0)
+   local_port_conf.rxmode.mq_mode = ETH_MQ_RX_NONE;
+   local_port_conf.rxmode.offloads &= dev_info.rx_offload_capa;
+   port_conf.rxmode.offloads = local_port_conf.rxmode.offloads;
+
ret = rte_eth_dev_configure(portid, nb_rx_queue,
(uint16_t)n_tx_queue, &local_port_conf);
if (ret < 0)
-- 
2.25.1



Re: [dpdk-dev] [PATCH] examples/vhost: change the default value of NIC's max queues

2021-10-15 Thread Maxime Coquelin

Hi,

On 9/10/21 05:17, Xia, Chenbo wrote:

Hi Wenwu,


-Original Message-
From: Ma, WenwuX 
Sent: Friday, September 10, 2021 9:52 PM
To: dev@dpdk.org
Cc: maxime.coque...@redhat.com; Xia, Chenbo ; Jiang,
Cheng1 ; Hu, Jiayu ; Wang, Yinan
; Ma, WenwuX 
Subject: [PATCH] examples/vhost: change the default value of NIC's max queues

vswitch can't launch with 40G FTV due to Device start fails


Not many people can understand what's FTV. So let's describe it with a driver
name. Example if it's 'i40e':

vswitch can't launch with a 40G i40e port...

And Device -> device


if NIC’s max queues > the default number of 128,
so, we changed the default value from 128 to 512.



I'd say it's not cool to still hard-code the MAX_QUEUES so that only 'some' NICs
can work with the example. The app should have a way to check this kind of info
before init/start. But as I would like to see at some point, this example will
be removed and all our tests go to testpmd. Let's not waste too much effort on
this example.

Besides: it can be a fix. Let's backport it.

Thanks,
Chenbo
  

Signed-off-by: Wenwu Ma 
---
  examples/vhost/main.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index bc3d71c898..36969a4de5 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -29,7 +29,7 @@
  #include "main.h"

  #ifndef MAX_QUEUES
-#define MAX_QUEUES 128
+#define MAX_QUEUES 512
  #endif

  /* the maximum number of external ports supported */
--
2.25.1




Are you planning to post a new revision handling Chenbo's comments?

Thanks,
Maxime



Re: [dpdk-dev] [PATCH v5 2/5] vhost: implement rte_power_monitor API

2021-10-15 Thread Xia, Chenbo
Hi,

> -Original Message-
> From: Li, Miao 
> Sent: Friday, October 15, 2021 11:12 PM
> To: dev@dpdk.org
> Cc: Xia, Chenbo ; maxime.coque...@redhat.com; Li, Miao
> 
> Subject: [PATCH v5 2/5] vhost: implement rte_power_monitor API
> 
> This patch defines rte_vhost_power_monitor_cond which is used to pass
> some information to vhost driver. The information is including the address
> to monitor, the expected value, the mask to extract value read from 'addr',
> the value size of monitor address, the match flag used to distinguish the
> value used to match something or not match something. Vhost driver can use
> these information to fill rte_power_monitor_cond.
> 
> Signed-off-by: Miao Li 
> ---
>  doc/guides/rel_notes/release_21_11.rst |  4 +++
>  lib/vhost/rte_vhost.h  | 44 ++
>  lib/vhost/version.map  |  3 ++
>  lib/vhost/vhost.c  | 38 ++
>  4 files changed, 89 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_21_11.rst
> b/doc/guides/rel_notes/release_21_11.rst
> index 27dc896703..ad6d256a55 100644
> --- a/doc/guides/rel_notes/release_21_11.rst
> +++ b/doc/guides/rel_notes/release_21_11.rst
> @@ -72,6 +72,10 @@ New Features
>Added macros ETH_RSS_IPV4_CHKSUM and ETH_RSS_L4_CHKSUM, now IPv4 and
>TCP/UDP/SCTP header checksum field can be used as input set for RSS.
> 
> +* **Added power monitor API in vhost library.**
> +
> +  Added an API to support power monitor in vhost library.
> +
>  * **Updated virtio PMD.**
> 
>Implement rte_power_monitor API in virtio PMD.
> diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> index fd372d5259..42bda95e96 100644
> --- a/lib/vhost/rte_vhost.h
> +++ b/lib/vhost/rte_vhost.h
> @@ -292,6 +292,33 @@ struct vhost_device_ops {
>   void *reserved[1]; /**< Reserved for future extension */
>  };
> 
> +/**
> + * Power monitor condition.
> + */
> +struct rte_vhost_power_monitor_cond {
> + /**< Address to monitor for changes */
> + volatile void *addr;
> + /**< If the `mask` is non-zero, location pointed
> +  *   to by `addr` will be read and masked, then
> +  *   compared with this value.
> +  */
> + uint64_t val;
> + /**< 64-bit mask to extract value read from `addr` */
> + uint64_t mask;
> + /**< Data size (in bytes) that will be read from the
> +  *   monitored memory location (`addr`). Can be 1, 2,
> +  *   4, or 8. Supplying any other value will result in
> +  *   an error.

'Can be ...' part is not necessary, as this value is defined in vhost
lib and currently only has two different values for packed or split.

> +  */
> + uint8_t size;
> + /**< If 1, and masked value that read from 'addr' equals
> +  *   'val', the driver will skip core sleep. If 0, and

'will' -> 'should'. As it's a suggestion for vhost driver.

> +  *  masked value that read from 'addr' does not equal 'val',
> +  *  the driver will skip core sleep.

Ditto.

Thanks,
Chenbo

> +  */
> + uint8_t match;
> +};
> +
>  /**
>   * Convert guest physical address to host virtual address
>   *
> @@ -903,6 +930,23 @@ int rte_vhost_vring_call(int vid, uint16_t vring_idx);
>   */
>  uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);
> 
> +/**
> + * Get power monitor address of the vhost device
> + *
> + * @param vid
> + *  vhost device ID
> + * @param queue_id
> + *  vhost queue ID
> + * @param pmc
> + *  power monitor condition
> + * @return
> + *  0 on success, -1 on failure
> + */
> +__rte_experimental
> +int
> +rte_vhost_get_monitor_addr(int vid, uint16_t queue_id,
> + struct rte_vhost_power_monitor_cond *pmc);
> +
>  /**
>   * Get log base and log size of the vhost device
>   *
> diff --git a/lib/vhost/version.map b/lib/vhost/version.map
> index 8ebde3f694..c8599ddb97 100644
> --- a/lib/vhost/version.map
> +++ b/lib/vhost/version.map
> @@ -85,4 +85,7 @@ EXPERIMENTAL {
>   rte_vhost_async_channel_register_thread_unsafe;
>   rte_vhost_async_channel_unregister_thread_unsafe;
>   rte_vhost_clear_queue_thread_unsafe;
> +
> + # added in 21.11
> + rte_vhost_get_monitor_addr;
>  };
> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> index 9540522dac..36c896c9e2 100644
> --- a/lib/vhost/vhost.c
> +++ b/lib/vhost/vhost.c
> @@ -1889,5 +1889,43 @@ rte_vhost_async_get_inflight(int vid, uint16_t 
> queue_id)
>   return ret;
>  }
> 
> +int
> +rte_vhost_get_monitor_addr(int vid, uint16_t queue_id,
> + struct rte_vhost_power_monitor_cond *pmc)
> +{
> + struct virtio_net *dev = get_device(vid);
> + struct vhost_virtqueue *vq;
> +
> + if (dev == NULL)
> + return -1;
> + if (queue_id >= VHOST_MAX_VRING)
> + return -1;
> +
> + vq = dev->virtqueue[queue_id];
> + if (vq == NULL)
> + return -1;
> +
> + if (vq_is_packed(dev)) {
> + struct vring_packed_desc *desc;
> + desc = vq->d

Re: [dpdk-dev] [PATCH v5 3/5] net/vhost: implement rte_power_monitor API

2021-10-15 Thread Xia, Chenbo
> -Original Message-
> From: Li, Miao 
> Sent: Friday, October 15, 2021 11:12 PM
> To: dev@dpdk.org
> Cc: Xia, Chenbo ; maxime.coque...@redhat.com; Li, Miao
> 
> Subject: [PATCH v5 3/5] net/vhost: implement rte_power_monitor API
> 
> This patch implements rte_power_monitor API in vhost PMD to reduce
> power consumption when no packet come in. According to current semantics
> of power monitor, this commit adds a callback function to decide whether
> aborts the sleep by checking current value against the expected value and
> vhost_get_monitor_addr to provide address to monitor. When no packet come
> in, the value of address will not be changed and the running core will
> sleep. Once packets arrive, the value of address will be changed and the
> running core will wakeup.
> 
> Signed-off-by: Miao Li 
> ---
>  doc/guides/rel_notes/release_21_11.rst |  4 +++
>  drivers/net/vhost/rte_eth_vhost.c  | 40 ++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_21_11.rst
> b/doc/guides/rel_notes/release_21_11.rst
> index ad6d256a55..e6f9c284ae 100644
> --- a/doc/guides/rel_notes/release_21_11.rst
> +++ b/doc/guides/rel_notes/release_21_11.rst
> @@ -76,6 +76,10 @@ New Features
> 
>Added an API to support power monitor in vhost library.
> 
> +* **Updated vhost PMD.**
> +
> +  Implement rte_power_monitor API in vhost PMD.
> +
>  * **Updated virtio PMD.**
> 
>Implement rte_power_monitor API in virtio PMD.
> diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> index 2e24e5f7ff..ee665ee64d 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -1386,6 +1386,45 @@ eth_rx_queue_count(struct rte_eth_dev *dev, uint16_t
> rx_queue_id)
>   return rte_vhost_rx_queue_count(vq->vid, vq->virtqueue_id);
>  }
> 
> +#define CLB_VAL_IDX 0
> +#define CLB_MSK_IDX 1
> +#define CLB_MATCH_IDX 2
> +static int
> +vhost_monitor_callback(const uint64_t value,
> + const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
> +{
> + const uint64_t m = opaque[CLB_MSK_IDX];
> + const uint64_t v = opaque[CLB_VAL_IDX];
> + const uint64_t c = opaque[CLB_MATCH_IDX];
> +
> + if (c)
> + return (value & m) == v ? -1 : 0;
> + else
> + return (value & m) == v ? 0 : -1;
> +}
> +
> +static int
> +vhost_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
> +{
> + struct vhost_queue *vq = rx_queue;
> + struct rte_vhost_power_monitor_cond vhost_pmc;
> + int ret;
> + if (vq == NULL)
> + return -EINVAL;
> + ret = rte_vhost_get_monitor_addr(vq->vid, vq->virtqueue_id,
> + &vhost_pmc);
> + if (ret < 0)
> + return -EINVAL;
> + pmc->addr = vhost_pmc.addr;
> + pmc->opaque[CLB_VAL_IDX] = vhost_pmc.val;
> + pmc->opaque[CLB_MSK_IDX] = vhost_pmc.mask;
> + pmc->opaque[CLB_MATCH_IDX] = vhost_pmc.match;
> + pmc->size = vhost_pmc.size;
> + pmc->fn = vhost_monitor_callback;
> +
> + return 0;
> +}
> +
>  static const struct eth_dev_ops ops = {
>   .dev_start = eth_dev_start,
>   .dev_stop = eth_dev_stop,
> @@ -1405,6 +1444,7 @@ static const struct eth_dev_ops ops = {
>   .xstats_get_names = vhost_dev_xstats_get_names,
>   .rx_queue_intr_enable = eth_rxq_intr_enable,
>   .rx_queue_intr_disable = eth_rxq_intr_disable,
> + .get_monitor_addr= vhost_get_monitor_addr,

Please align the format with above callbacks: one space is enough after
'get_monitor_addr'

Thanks,
Chenbo

>  };
> 
>  static int
> --
> 2.25.1



Re: [dpdk-dev] [PATCH v5 4/5] power: modify return of queue_stopped

2021-10-15 Thread Xia, Chenbo
> -Original Message-
> From: Li, Miao 
> Sent: Friday, October 15, 2021 11:12 PM
> To: dev@dpdk.org
> Cc: Xia, Chenbo ; maxime.coque...@redhat.com; Li, Miao
> ; Burakov, Anatoly 
> Subject: [PATCH v5 4/5] power: modify return of queue_stopped
> 
> Since some vdevs like virtio and vhost do not support rxq_info_get and
> queue state inquiry, the error return value -ENOTSUP need to be ignored
> when queue_stopped cannot get rx queue information and rx queue state.
> This patch changes the return value of queue_stopped when
> rte_eth_rx_queue_info_get return ENOTSUP to support vdevs which cannot

ENOTSUP -> -ENOTSUP

With this fixed:

Reviewed-by: Chenbo Xia 

> provide rx queue information and rx queue state enable power management.
> 
> Signed-off-by: Miao Li 
> Acked-by: Anatoly Burakov 
> ---
>  lib/power/rte_power_pmd_mgmt.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> index 0ce40f0875..39a2b4cd23 100644
> --- a/lib/power/rte_power_pmd_mgmt.c
> +++ b/lib/power/rte_power_pmd_mgmt.c
> @@ -382,8 +382,13 @@ queue_stopped(const uint16_t port_id, const uint16_t
> queue_id)
>  {
>   struct rte_eth_rxq_info qinfo;
> 
> - if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
> - return -1;
> + int ret = rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo);
> + if (ret < 0) {
> + if (ret == -ENOTSUP)
> + return 1;
> + else
> + return -1;
> + }
> 
>   return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
>  }
> --
> 2.25.1



Re: [dpdk-dev] [EXT] Re: [PATCH v2 1/6] eal/interrupts: implement get set APIs

2021-10-15 Thread Thomas Monjalon
14/10/2021 19:53, Dmitry Kozlyuk:
> 2021-10-14 17:15 (UTC+), Harman Kalra:
> > > > +   rte_intr_type_set;
> > > > +   rte_intr_type_get;
> > > > +   rte_intr_instance_alloc;
> > > > +   rte_intr_instance_free;
> > > >  };  
> > > 
> > > Do I understand correctly that these exports are needed to allow an
> > > application to use DPDK callback facilities for its own interrupt 
> > > sources?  
> > 
>> I exported only those APIs which are currently used by the test suite or
>> example applications; more APIs can be moved from internal to public
>> later on a need basis.
> > 
> > > If so, I'd suggest that instead we export a simpler set of functions:
> > > 1. Create/free a handle instance with automatic fixed type selection.
> > > 2. Trigger an interrupt on the specified handle instance.
> > > The flow would be that the application listens on whatever it wants, 
> > > probably
> > > with OS-specific mechanisms, and just notifies the interrupt thread about
> > > events to trigger callbacks.
> > > Because these APIs are experimental we don't need to change it now, just 
> > > my
> > > thoughts for the future.  
> > 
> > I am sorry but I did not followed your suggestion, can you please explain.
> 
> This API is used as follows. The application has a file descriptor
> that becomes readable on some event. The programmer doesn't want to create
> another thread like EAL interrupt thread, implement thread-safe callback
> registration and invocation. They want to reuse DPDK mechanism instead.
> So they create an instance of type EXT and give it the descriptor.
> In case of the unit test the descriptor is a pipe read end.
> In case of a real application it can be a socket, like in mlx5 PMD.
> This is often convenient, but not always. An event may be a signal,
> or busy-wait end, or it may be Windows with its completely different IO model
> (it's "issue an IO, wait for completion" instead of POSIX
> "wait for IO readiness, do a blocking IO").
> In all these cases the user needs to create a fake pipe (or whatever)
> to fit into how the interrupt thread waits for events.
> But what the application really needs is to say "there's an event, please run
> the callback on this handle". It's a function call that doesn't require any
> explicit file descriptors or handles, doesn't rely on any IO model.
> How it is implemented depends on the EAL, for POSIX it will probably be
> an internal pipe, Windows can use APC as in eal_intr_thread_schedule().
> Again, I'm thinking out loud here, nothing of this needs to be done now.

I like this way of thinking.




[dpdk-dev] [Bug 828] [dpdk-21.11] zuc unit test is failing

2021-10-15 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=828

Bug ID: 828
   Summary: [dpdk-21.11] zuc unit test is failing
   Product: DPDK
   Version: 21.11
  Hardware: All
OS: Linux
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: cryptodev
  Assignee: dev@dpdk.org
  Reporter: varalakshm...@intel.com
  Target Milestone: ---

Steps to reproduce

from dpdk path, the following steps should be followed:
x86_64-native-linuxapp-gcc/app/test/dpdk-test -l 1,2,3 --vdev crypto_zuc0
--socket-mem 2048,0 -n 4 --log-level=6 -a :1a:01.0

RTE>> cryptodev_sw_zuc_autotest

Actual Result:

+ Test Suite Summary : Cryptodev Unit Test Suite
+ --- +
+ AES Chain : 0/57 passed, 57/57 skipped, 0/57 failed, 0/57 unsupported
+ AES Cipher Only : 0/65 passed, 65/65 skipped, 0/65 failed, 0/65 unsupported
+ AES Docsis : 0/24 passed, 24/24 skipped, 0/24 failed, 0/24 unsupported
+ 3DES Chain : 0/20 passed, 20/20 skipped, 0/20 failed, 0/20 unsupported
+ 3DES Cipher Only : 0/10 passed, 10/10 skipped, 0/10 failed, 0/10 unsupported
+ DES Cipher Only : 0/2 passed, 2/2 skipped, 0/2 failed, 0/2 unsupported
+ DES Docsis : 0/12 passed, 12/12 skipped, 0/12 failed, 0/12 unsupported
+ Auth Only : 0/36 passed, 36/36 skipped, 0/36 failed, 0/36 unsupported
+ Multi Session Unit Test Suite : 0/2 passed, 2/2 skipped, 0/2 failed, 0/2 unsupported
+ NULL Test Suite : 0/2 passed, 2/2 skipped, 0/2 failed, 0/2 unsupported
+ AES CCM Authenticated Test Suite : 0/18 passed, 18/18 skipped, 0/18 failed, 0/18 unsupported
+ AES GCM Authenticated Test Suite : 0/58 passed, 58/58 skipped, 0/58 failed, 0/58 unsupported
+ AES GMAC Authentication Test Suite : 0/12 passed, 12/12 skipped, 0/12 failed, 0/12 unsupported
+ SNOW 3G Test Suite : 0/47 passed, 47/47 skipped, 0/47 failed, 0/47 unsupported
+ Chacha20-Poly1305 Test Suite : 0/2 passed, 2/2 skipped, 0/2 failed, 0/2 unsupported
+ ZUC Test Suite : 15/27 passed, 9/27 skipped, 3/27 failed, 0/27 unsupported
+ HMAC_MD5 Authentication Test Suite : 0/4 passed, 4/4 skipped, 0/4 failed, 0/4 unsupported
+ Kasumi Test Suite : 0/36 passed, 36/36 skipped, 0/36 failed, 0/36 unsupported
+ ESN Test Suite : 0/2 passed, 2/2 skipped, 0/2 failed, 0/2 unsupported
+ Negative AES GCM Test Suite : 0/12 passed, 12/12 skipped, 0/12 failed, 0/12 unsupported
+ Negative AES GMAC Test Suite : 0/2 passed, 2/2 skipped, 0/2 failed, 0/2 unsupported
+ Mixed CIPHER + HASH algorithms Test Suite : 0/32 passed, 32/32 skipped, 0/32 failed, 0/32 unsupported
+ Negative HMAC SHA1 Unit Test Suite : 0/4 passed, 4/4 skipped, 0/4 failed, 0/4 unsupported
+ Crypto General Unit Test Suite : 5/6 passed, 1/6 skipped, 0/6 failed, 0/6 unsupported
+ IPsec Proto Unit Test Suite : 0/17 passed, 17/17 skipped, 0/17 failed, 0/17 unsupported
+ PDCP Proto Unit Test Suite : 0/1 passed, 1/1 skipped, 0/1 failed, 0/1 unsupported
+ Docsis Proto Unit Test Suite : 0/1 passed, 1/1 skipped, 0/1 failed, 0/1 unsupported
+ --- +
+ Sub Testsuites Total : 27
+ Sub Testsuites Skipped : 25
+ Sub Testsuites Passed : 1
+ Sub Testsuites Failed : 1
+ --- +
+ Tests Total : 511
+ Tests Skipped : 488
+ Tests Executed : 65
+ Tests Unsupported: 0
+ Tests Passed : 20
+ Tests Failed : 3
+ --- +
Test Failed
RTE>>

Expected Result
Test is expected to Pass with no errors.

Stack Trace or Log

fa5bf9345d4e0141ac40f154b1c1a4b99e8fe9a3 is the first bad commit

commit fa5bf9345d4e0141ac40f154b1c1a4b99e8fe9a3
Author: Vidya Sagar Velumuri 
Date: Wed Sep 15 06:11:03 2021 +

test/crypto: add ZUC cases with 256-bit keys

Add test cases for zuc 256 bit key.
Add test case for zuc 8 and 16 byte digest with
256 bit key mode

Signed-off-by: Vidya Sagar Velumuri 
Acked-by: Akhil Goyal 

--
other tests failing:
cryptodev_aesni_mb_autotest
cryptodev_qat_autotest
cryptodev_sw_zuc_autotest
--

-- 
You are receiving this mail because:
You are the assignee for the bug.

[dpdk-dev] [PATCH v2] examples/vhost: change the default value of NIC's max queues

2021-10-15 Thread Wenwu Ma
vswitch can't launch with a 40G i40e port because device start fails
if the NIC's max queue number exceeds the default of 128, so change
the default value from 128 to 512.

Signed-off-by: Wenwu Ma 
---
 examples/vhost/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index bc3d71c898..36969a4de5 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -29,7 +29,7 @@
 #include "main.h"
 
 #ifndef MAX_QUEUES
-#define MAX_QUEUES 128
+#define MAX_QUEUES 512
 #endif
 
 /* the maximum number of external ports supported */
-- 
2.25.1



Re: [dpdk-dev] [PATCH] app/testpmd: fix l4 sw csum over multi segments

2021-10-15 Thread David Marchand
Hello,

On Fri, Oct 15, 2021 at 7:27 AM Xiaoyun Li  wrote:
>
> In csum forwarding mode, software UDP/TCP csum calculation only takes
> the first segment into account while using the whole packet length, so
> the calculation will read an invalid memory region with multi-segment
> packets and get a wrong value.
> This patch fixes this issue.
>
> Fixes: af75078fece3 ("first public release")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Xiaoyun Li 
> ---
>  app/test-pmd/csumonly.c | 31 +++
>  1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
> index 090797318a..5df3be0a6f 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -18,7 +18,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 

This include caught my eye.


>  #include 
>  #include 
>  #include 
> @@ -56,6 +56,11 @@
>  #define GRE_SUPPORTED_FIELDS   (GRE_CHECKSUM_PRESENT | GRE_KEY_PRESENT |\
>  GRE_SEQUENCE_PRESENT)
>
> +/* When UDP or TCP or outer UDP csum offload is off, sw l4 csum is needed */
> +#define UDP_TCP_CSUM(DEV_TX_OFFLOAD_UDP_CKSUM |\
> +DEV_TX_OFFLOAD_TCP_CKSUM |\
> +DEV_TX_OFFLOAD_OUTER_UDP_CKSUM)
> +
>  /* We cannot use rte_cpu_to_be_16() on a constant in a switch/case */
>  #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
>  #define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8)))
> @@ -602,12 +607,8 @@ process_outer_cksums(void *outer_l3_hdr, struct testpmd_offload_info *info,
> /* do not recalculate udp cksum if it was 0 */
> if (udp_hdr->dgram_cksum != 0) {
> udp_hdr->dgram_cksum = 0;
> -   if (info->outer_ethertype == _htons(RTE_ETHER_TYPE_IPV4))
> -   udp_hdr->dgram_cksum =
> -   rte_ipv4_udptcp_cksum(ipv4_hdr, udp_hdr);
> -   else
> -   udp_hdr->dgram_cksum =
> -   rte_ipv6_udptcp_cksum(ipv6_hdr, udp_hdr);
> +   udp_hdr->dgram_cksum = get_udptcp_checksum(outer_l3_hdr,
> +   udp_hdr, info->outer_ethertype);
> }
>
> return ol_flags;
> @@ -802,6 +803,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
> struct rte_mbuf *m, *p;
> struct rte_ether_hdr *eth_hdr;
> void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
> +   uint8_t *l3_buf = NULL;
> void **gro_ctx;
> uint16_t gro_pkts_num;
> uint8_t gro_enable;
> @@ -877,7 +879,19 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
> rte_ether_addr_copy(&ports[fs->tx_port].eth_addr,
> ð_hdr->src_addr);
> parse_ethernet(eth_hdr, &info);
> -   l3_hdr = (char *)eth_hdr + info.l2_len;
> +   /* When sw csum is needed, multi-segs needs a buf to contain
> +* the whole packet for later UDP/TCP csum calculation.
> +*/
> +   if (m->nb_segs > 1 && !(tx_ol_flags & PKT_TX_TCP_SEG) &&
> +   !(tx_offloads & UDP_TCP_CSUM)) {
> +   l3_buf = rte_zmalloc("csum l3_buf",
> +info.pkt_len - info.l2_len,
> +RTE_CACHE_LINE_SIZE);

Rather than call a dyn allocation in datapath, can't we have a static
buffer on the stack?


> +   rte_pktmbuf_read(m, info.l2_len,
> +info.pkt_len - info.l2_len, l3_buf);
> +   l3_hdr = l3_buf;
> +   } else
> +   l3_hdr = (char *)eth_hdr + info.l2_len;
>
> /* check if it's a supported tunnel */
> if (txp->parse_tunnel) {
> @@ -1051,6 +1065,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
> printf("tx: flags=%s", buf);
> printf("\n");
> }
> +   rte_free(l3_buf);
> }
>
> if (unlikely(gro_enable)) {
> --
> 2.25.1
>


-- 
David Marchand



Re: [dpdk-dev] [PATCH v2 0/7] crypto/security session framework rework

2021-10-15 Thread Zhang, Roy Fan
Hi Akhil,

It shall work but Kasumi tests are passing :-)
It is snow3g and aesni-mb/gcm that are failing.
Thanks

Regards,
Fan

> -Original Message-
> From: Akhil Goyal 
> Sent: Thursday, October 14, 2021 7:24 PM
> To: Zhang, Roy Fan ; dev@dpdk.org
> Cc: tho...@monjalon.net; david.march...@redhat.com;
> hemant.agra...@nxp.com; Anoob Joseph ; De Lara
> Guarch, Pablo ; Trahe, Fiona
> ; Doherty, Declan ;
> ma...@nvidia.com; g.si...@nxp.com; jianjay.z...@huawei.com;
> asoma...@amd.com; ruifeng.w...@arm.com; Ananyev, Konstantin
> ; Nicolau, Radu ;
> ajit.khapa...@broadcom.com; Nagadheeraj Rottela
> ; Ankur Dwivedi ;
> Power, Ciara ; Wang, Haiyue
> ; jiawe...@trustnetic.com;
> jianw...@trustnetic.com
> Subject: RE: [PATCH v2 0/7] crypto/security session framework rework
> 
> Hi Fan,
> >
> > Unfortunately the patches still cause seg-fault at QAT and SW PMDs.
> >
> > - for qat it fails at rte_security_ops->session_size_get not implemented.
> > - for sw pmds the queue pair's session private mempools are not set.
> >
> Can you check if below change works for Kasumi. I will replicate for others.
> 
> diff --git a/drivers/crypto/kasumi/kasumi_pmd_private.h
> b/drivers/crypto/kasumi/kasumi_pmd_private.h
> index abedcd616d..fe0e78e516 100644
> --- a/drivers/crypto/kasumi/kasumi_pmd_private.h
> +++ b/drivers/crypto/kasumi/kasumi_pmd_private.h
> @@ -38,8 +38,6 @@ struct kasumi_qp {
> /**< Ring for placing processed ops */
> struct rte_mempool *sess_mp;
> /**< Session Mempool */
> -   struct rte_mempool *sess_mp_priv;
> -   /**< Session Private Data Mempool */
> struct rte_cryptodev_stats qp_stats;
> /**< Queue pair statistics */
> uint8_t temp_digest[KASUMI_DIGEST_LENGTH];
> diff --git a/drivers/crypto/kasumi/rte_kasumi_pmd.c
> b/drivers/crypto/kasumi/rte_kasumi_pmd.c
> index d6f927417a..1fc59c8b8a 100644
> --- a/drivers/crypto/kasumi/rte_kasumi_pmd.c
> +++ b/drivers/crypto/kasumi/rte_kasumi_pmd.c
> @@ -139,27 +139,24 @@ kasumi_get_session(struct kasumi_qp *qp, struct
> rte_crypto_op *op)
> op->sym->session,
> cryptodev_driver_id);
> } else {
> -   void *_sess = NULL;
> -   void *_sess_private_data = NULL;
> +   struct rte_cryptodev_sym_session *_sess = NULL;
> 
> -   if (rte_mempool_get(qp->sess_mp, (void **)&_sess))
> +   /* Create temporary session */
> +   _sess = rte_cryptodev_sym_session_create(qp->sess_mp);
> +   if (_sess == NULL)
> return NULL;
> 
> -   if (rte_mempool_get(qp->sess_mp_priv,
> -   (void **)&_sess_private_data))
> -   return NULL;
> -
> -   sess = (struct kasumi_session *)_sess_private_data;
> -
> +   _sess->sess_data[cryptodev_driver_id].data =
> +   (void *)((uint8_t *)_sess +
> +   rte_cryptodev_sym_get_header_session_size() +
> +   (cryptodev_driver_id * _sess->priv_sz));
> +   sess = _sess->sess_data[cryptodev_driver_id].data;
> if (unlikely(kasumi_set_session_parameters(qp->mgr, sess,
> op->sym->xform) != 0)) {
> rte_mempool_put(qp->sess_mp, _sess);
> -   rte_mempool_put(qp->sess_mp_priv, _sess_private_data);
> sess = NULL;
> }
> op->sym->session = (struct rte_cryptodev_sym_session *)_sess;
> -   set_sym_session_private_data(op->sym->session,
> -   cryptodev_driver_id, _sess_private_data);
> }
> 
> if (unlikely(sess == NULL))
> @@ -327,7 +324,6 @@ process_ops(struct rte_crypto_op **ops, struct
> kasumi_session *session,
> memset(ops[i]->sym->session, 0,
> rte_cryptodev_sym_get_existing_header_session_size(
> ops[i]->sym->session));
> -   rte_mempool_put(qp->sess_mp_priv, session);
> rte_mempool_put(qp->sess_mp, ops[i]->sym->session);
> ops[i]->sym->session = NULL;
> }



Re: [dpdk-dev] [PATCH v5 5/5] examples/l3fwd-power: support virtio/vhost

2021-10-15 Thread Xia, Chenbo
> -Original Message-
> From: Li, Miao 
> Sent: Friday, October 15, 2021 11:12 PM
> To: dev@dpdk.org
> Cc: Xia, Chenbo ; maxime.coque...@redhat.com; Li, Miao
> 
> Subject: [PATCH v5 5/5] examples/l3fwd-power: support virtio/vhost
> 
> In l3fwd-power, there is default port configuration which requires
> RSS and IPV4/UDP/TCP checksum. Once device does not support these,
> the l3fwd-power will exit and report an error.
> This patch updates the port configuration based on device capabilities
> after getting the device information to support devices like virtio
> and vhost.
> 
> Signed-off-by: Miao Li 
> ---
>  examples/l3fwd-power/main.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
> index 73a3ab5bc0..61c15e01d2 100644
> --- a/examples/l3fwd-power/main.c
> +++ b/examples/l3fwd-power/main.c
> @@ -505,7 +505,9 @@ is_valid_ipv4_pkt(struct rte_ipv4_hdr *pkt, uint32_t link_len)
>   return -1;
> 
>   /* 2. The IP checksum must be correct. */
> - /* this is checked in H/W */
> + /* if this is not checked in H/W, check it. */
> + if ((port_conf.rxmode.offloads & DEV_RX_OFFLOAD_IPV4_CKSUM) == 0)
> + rte_ipv4_cksum(pkt);

This is not correct. The correct handling should be:

1. get actual cksum from pkt and save it
2. set pkt cksum to zero
3. compute correct cksum using rte_ipv4_cksum
4. compare to know if actual cksum == correct cksum

You can refer to test_ipsec_l3_csum_verify in test_cryptodev_security_ipsec.c

Thanks,
Chenbo

> 
>   /*
>* 3. The IP version number must be 4. If the version number is not 4
> @@ -2637,6 +2639,11 @@ main(int argc, char **argv)
>   local_port_conf.rx_adv_conf.rss_conf.rss_hf);
>   }
> 
> + if (local_port_conf.rx_adv_conf.rss_conf.rss_hf == 0)
> + local_port_conf.rxmode.mq_mode = ETH_MQ_RX_NONE;
> + local_port_conf.rxmode.offloads &= dev_info.rx_offload_capa;
> + port_conf.rxmode.offloads = local_port_conf.rxmode.offloads;
> +
>   ret = rte_eth_dev_configure(portid, nb_rx_queue,
>   (uint16_t)n_tx_queue, &local_port_conf);
>   if (ret < 0)
> --
> 2.25.1



Re: [dpdk-dev] [Bug 828] [dpdk-21.11] zuc unit test is failing

2021-10-15 Thread David Marchand
Hello,

On Fri, Oct 15, 2021 at 10:02 AM  wrote:
>
> https://bugs.dpdk.org/show_bug.cgi?id=828
>
> Bug ID: 828
>Summary: [dpdk-21.11] zuc unit test is failing
>Product: DPDK
>Version: 21.11
>   Hardware: All
> OS: Linux
> Status: UNCONFIRMED
>   Severity: normal
>   Priority: Normal
>  Component: cryptodev
>   Assignee: dev@dpdk.org
>   Reporter: varalakshm...@intel.com
>   Target Milestone: ---

I could not assign this bug to you in bz, can you have a look?
Thanks.


>
> Steps to reproduce
>
> from dpdk path, the following steps should be followed:
> x86_64-native-linuxapp-gcc/app/test/dpdk-test -l 1,2,3 --vdev crypto_zuc0
> --socket-mem 2048,0 -n 4 --log-level=6 -a :1a:01.0
>
> RTE>> cryptodev_sw_zuc_autotest
>

[snip]

> + --- +
> + Sub Testsuites Total : 27
> + Sub Testsuites Skipped : 25
> + Sub Testsuites Passed : 1
> + Sub Testsuites Failed : 1
> + --- +
> + Tests Total : 511
> + Tests Skipped : 488
> + Tests Executed : 65
> + Tests Unsupported: 0
> + Tests Passed : 20
> + Tests Failed : 3
> + --- +
> Test Failed
> RTE>>
>

> 
> fa5bf9345d4e0141ac40f154b1c1a4b99e8fe9a3 is the first bad commit
>
> commit fa5bf9345d4e0141ac40f154b1c1a4b99e8fe9a3
> Author: Vidya Sagar Velumuri 
> Date: Wed Sep 15 06:11:03 2021 +
>
> test/crypto: add ZUC cases with 256-bit keys
>
> Add test cases for zuc 256 bit key.
> Add test case for zuc 8 and 16 byte digest with
> 256 bit key mode
>
> Signed-off-by: Vidya Sagar Velumuri 
> Acked-by: Akhil Goyal 



-- 
David Marchand



[dpdk-dev] [PATCH v14 0/5] Add PIE support for HQoS library

2021-10-15 Thread Liguzinski, WojciechX
The DPDK sched library is equipped with a mechanism that protects it from the
bufferbloat problem, a situation in which excess buffers in the network cause
high latency and latency variation. Currently, the library supports RED for
active queue management, which is designed to control the queue length but
does not control latency directly and is now being obsoleted. More advanced
queue management is therefore required to address this problem and provide a
desirable quality of service to users.

This solution (RFC) proposes usage of a new algorithm called "PIE"
(Proportional Integral controller Enhanced) that can effectively and directly
control queuing latency to address the bufferbloat problem.

The implementation of the mentioned functionality includes modification of
existing data structures and the addition of a new set of data structures to
the library, as well as PIE-related APIs. This affects structures in the
public API/ABI, which is why a deprecation notice is going to be prepared
and sent.

Liguzinski, WojciechX (5):
  sched: add PIE based congestion management
  example/qos_sched: add PIE support
  example/ip_pipeline: add PIE support
  doc/guides/prog_guide: added PIE
  app/test: add tests for PIE

 app/test/meson.build |4 +
 app/test/test_pie.c  | 1065 ++
 config/rte_config.h  |1 -
 doc/guides/prog_guide/glossary.rst   |3 +
 doc/guides/prog_guide/qos_framework.rst  |   60 +-
 doc/guides/prog_guide/traffic_management.rst |   13 +-
 drivers/net/softnic/rte_eth_softnic_tm.c |6 +-
 examples/ip_pipeline/tmgr.c  |  142 +--
 examples/qos_sched/app_thread.c  |1 -
 examples/qos_sched/cfg_file.c|  111 +-
 examples/qos_sched/cfg_file.h|5 +
 examples/qos_sched/init.c|   27 +-
 examples/qos_sched/main.h|3 +
 examples/qos_sched/profile.cfg   |  196 ++--
 lib/sched/meson.build|   10 +-
 lib/sched/rte_pie.c  |   86 ++
 lib/sched/rte_pie.h  |  398 +++
 lib/sched/rte_sched.c|  240 ++--
 lib/sched/rte_sched.h|   63 +-
 lib/sched/version.map|3 +
 20 files changed, 2161 insertions(+), 276 deletions(-)
 create mode 100644 app/test/test_pie.c
 create mode 100644 lib/sched/rte_pie.c
 create mode 100644 lib/sched/rte_pie.h

-- 
2.25.1



[dpdk-dev] [PATCH v14 1/5] sched: add PIE based congestion management

2021-10-15 Thread Liguzinski, WojciechX
Implement PIE based congestion management based on RFC 8033

Signed-off-by: Liguzinski, WojciechX 
---
 drivers/net/softnic/rte_eth_softnic_tm.c |   6 +-
 lib/sched/meson.build|  10 +-
 lib/sched/rte_pie.c  |  82 +
 lib/sched/rte_pie.h  | 393 +++
 lib/sched/rte_sched.c| 240 +-
 lib/sched/rte_sched.h|  63 +++-
 lib/sched/version.map|   3 +
 7 files changed, 701 insertions(+), 96 deletions(-)
 create mode 100644 lib/sched/rte_pie.c
 create mode 100644 lib/sched/rte_pie.h

diff --git a/drivers/net/softnic/rte_eth_softnic_tm.c b/drivers/net/softnic/rte_eth_softnic_tm.c
index 90baba15ce..e74092ce7f 100644
--- a/drivers/net/softnic/rte_eth_softnic_tm.c
+++ b/drivers/net/softnic/rte_eth_softnic_tm.c
@@ -420,7 +420,7 @@ pmd_tm_node_type_get(struct rte_eth_dev *dev,
return 0;
 }
 
-#ifdef RTE_SCHED_RED
+#ifdef RTE_SCHED_CMAN
 #define WRED_SUPPORTED 1
 #else
 #define WRED_SUPPORTED 0
@@ -2306,7 +2306,7 @@ tm_tc_wred_profile_get(struct rte_eth_dev *dev, uint32_t tc_id)
return NULL;
 }
 
-#ifdef RTE_SCHED_RED
+#ifdef RTE_SCHED_CMAN
 
 static void
 wred_profiles_set(struct rte_eth_dev *dev, uint32_t subport_id)
@@ -2321,7 +2321,7 @@ wred_profiles_set(struct rte_eth_dev *dev, uint32_t subport_id)
for (tc_id = 0; tc_id < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; tc_id++)
for (color = RTE_COLOR_GREEN; color < RTE_COLORS; color++) {
struct rte_red_params *dst =
-   &pp->red_params[tc_id][color];
+   &pp->cman_params->red_params[tc_id][color];
struct tm_wred_profile *src_wp =
tm_tc_wred_profile_get(dev, tc_id);
struct rte_tm_red_params *src =
diff --git a/lib/sched/meson.build b/lib/sched/meson.build
index b24f7b8775..e7ae9bcf19 100644
--- a/lib/sched/meson.build
+++ b/lib/sched/meson.build
@@ -1,11 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-sources = files('rte_sched.c', 'rte_red.c', 'rte_approx.c')
-headers = files(
-'rte_approx.h',
-'rte_red.h',
-'rte_sched.h',
-'rte_sched_common.h',
-)
+sources = files('rte_sched.c', 'rte_red.c', 'rte_approx.c', 'rte_pie.c')
+headers = files('rte_sched.h', 'rte_sched_common.h',
+   'rte_red.h', 'rte_approx.h', 'rte_pie.h')
 deps += ['mbuf', 'meter']
diff --git a/lib/sched/rte_pie.c b/lib/sched/rte_pie.c
new file mode 100644
index 00..2fcecb2db4
--- /dev/null
+++ b/lib/sched/rte_pie.c
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include 
+
+#include "rte_pie.h"
+#include 
+#include 
+#include 
+
+#ifdef __INTEL_COMPILER
+#pragma warning(disable:2259) /* conversion may lose significant bits */
+#endif
+
+void
+rte_pie_rt_data_init(struct rte_pie *pie)
+{
+   if (pie == NULL) {
+   /* Allocate memory to use the PIE data structure */
+   pie = rte_malloc(NULL, sizeof(struct rte_pie), 0);
+
+   if (pie == NULL)
+   RTE_LOG(ERR, SCHED, "%s: Memory allocation fails\n", __func__);
+   }
+
+   pie->active = 0;
+   pie->in_measurement = 0;
+   pie->departed_bytes_count = 0;
+   pie->start_measurement = 0;
+   pie->last_measurement = 0;
+   pie->qlen = 0;
+   pie->avg_dq_time = 0;
+   pie->burst_allowance = 0;
+   pie->qdelay_old = 0;
+   pie->drop_prob = 0;
+   pie->accu_prob = 0;
+}
+
+int
+rte_pie_config_init(struct rte_pie_config *pie_cfg,
+   const uint16_t qdelay_ref,
+   const uint16_t dp_update_interval,
+   const uint16_t max_burst,
+   const uint16_t tailq_th)
+{
+   uint64_t tsc_hz = rte_get_tsc_hz();
+
+   if (pie_cfg == NULL)
+   return -1;
+
+   if (qdelay_ref <= 0) {
+   RTE_LOG(ERR, SCHED,
+   "%s: Incorrect value for qdelay_ref\n", __func__);
+   return -EINVAL;
+   }
+
+   if (dp_update_interval <= 0) {
+   RTE_LOG(ERR, SCHED,
+   "%s: Incorrect value for dp_update_interval\n", __func__);
+   return -EINVAL;
+   }
+
+   if (max_burst <= 0) {
+   RTE_LOG(ERR, SCHED,
+   "%s: Incorrect value for max_burst\n", __func__);
+   return -EINVAL;
+   }
+
+   if (tailq_th <= 0) {
+   RTE_LOG(ERR, SCHED,
+   "%s: Incorrect value for tailq_th\n", __func__);
+   return -EINVAL;
+   }
+
+   pie_cfg->qdelay_ref = (tsc_hz * qdelay_ref) / 1000;
+   pie_cfg->dp_update_interval = (tsc_hz * dp_update_interval) / 1000;
+   pie_cfg->max_burst = (tsc_hz * max_

[dpdk-dev] [PATCH v14 2/5] example/qos_sched: add PIE support

2021-10-15 Thread Liguzinski, WojciechX
This patch adds support for enabling PIE or RED by
parsing the config file.

Signed-off-by: Liguzinski, WojciechX 
---
 config/rte_config.h |   1 -
 examples/qos_sched/app_thread.c |   1 -
 examples/qos_sched/cfg_file.c   | 111 ++
 examples/qos_sched/cfg_file.h   |   5 +
 examples/qos_sched/init.c   |  27 +++--
 examples/qos_sched/main.h   |   3 +
 examples/qos_sched/profile.cfg  | 196 +---
 7 files changed, 242 insertions(+), 102 deletions(-)

diff --git a/config/rte_config.h b/config/rte_config.h
index 590903c07d..48132f27df 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -89,7 +89,6 @@
 #define RTE_MAX_LCORE_FREQS 64
 
 /* rte_sched defines */
-#undef RTE_SCHED_RED
 #undef RTE_SCHED_COLLECT_STATS
 #undef RTE_SCHED_SUBPORT_TC_OV
 #define RTE_SCHED_PORT_N_GRINDERS 8
diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c
index dbc878b553..895c0d3592 100644
--- a/examples/qos_sched/app_thread.c
+++ b/examples/qos_sched/app_thread.c
@@ -205,7 +205,6 @@ app_worker_thread(struct thread_conf **confs)
if (likely(nb_pkt)) {
int nb_sent = rte_sched_port_enqueue(conf->sched_port, mbufs, nb_pkt);
-
APP_STATS_ADD(conf->stat.nb_drop, nb_pkt - nb_sent);
APP_STATS_ADD(conf->stat.nb_rx, nb_pkt);
}
diff --git a/examples/qos_sched/cfg_file.c b/examples/qos_sched/cfg_file.c
index cd167bd8e6..ea8b078566 100644
--- a/examples/qos_sched/cfg_file.c
+++ b/examples/qos_sched/cfg_file.c
@@ -229,6 +229,40 @@ cfg_load_subport_profile(struct rte_cfgfile *cfg,
return 0;
 }
 
+#ifdef RTE_SCHED_CMAN
+void set_subport_cman_params(struct rte_sched_subport_params *subport_p,
+   struct rte_sched_cman_params cman_p)
+{
+   int j, k;
+   subport_p->cman_params->cman_mode = cman_p.cman_mode;
+
+   for (j = 0; j < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; j++) {
+   if (subport_p->cman_params->cman_mode ==
+   RTE_SCHED_CMAN_WRED) {
+   for (k = 0; k < RTE_COLORS; k++) {
+   subport_p->cman_params->red_params[j][k].min_th =
+   cman_p.red_params[j][k].min_th;
+   subport_p->cman_params->red_params[j][k].max_th =
+   cman_p.red_params[j][k].max_th;
+   subport_p->cman_params->red_params[j][k].maxp_inv =
+   cman_p.red_params[j][k].maxp_inv;
+   subport_p->cman_params->red_params[j][k].wq_log2 =
+   cman_p.red_params[j][k].wq_log2;
+   }
+   } else {
+   subport_p->cman_params->pie_params[j].qdelay_ref =
+   cman_p.pie_params[j].qdelay_ref;
+   subport_p->cman_params->pie_params[j].dp_update_interval =
+   cman_p.pie_params[j].dp_update_interval;
+   subport_p->cman_params->pie_params[j].max_burst =
+   cman_p.pie_params[j].max_burst;
+   subport_p->cman_params->pie_params[j].tailq_th =
+   cman_p.pie_params[j].tailq_th;
+   }
+   }
+}
+#endif
+
 int
cfg_load_subport(struct rte_cfgfile *cfg, struct rte_sched_subport_params *subport_params)
 {
@@ -242,25 +276,26 @@ cfg_load_subport(struct rte_cfgfile *cfg, struct rte_sched_subport_params *subpo
memset(active_queues, 0, sizeof(active_queues));
n_active_queues = 0;
 
-#ifdef RTE_SCHED_RED
-   char sec_name[CFG_NAME_LEN];
-   struct rte_red_params red_params[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE][RTE_COLORS];
+#ifdef RTE_SCHED_CMAN
+   struct rte_sched_cman_params cman_params = {
+   .cman_mode = RTE_SCHED_CMAN_WRED,
+   .red_params = { },
+   };
 
-   snprintf(sec_name, sizeof(sec_name), "red");
-
-   if (rte_cfgfile_has_section(cfg, sec_name)) {
+   if (rte_cfgfile_has_section(cfg, "red")) {
+   cman_params.cman_mode = RTE_SCHED_CMAN_WRED;
 
for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; i++) {
char str[32];
 
/* Parse WRED min thresholds */
snprintf(str, sizeof(str), "tc %d wred min", i);
-   entry = rte_cfgfile_get_entry(cfg, sec_name, str);
+   entry = rte_cfgfile_get_entry(cfg, "red", str);
if (entry) {
char *next;
/* for each packet colour (green, yellow, red) */
for (j = 0; j < RTE_COLORS; j++) {
-

[dpdk-dev] [PATCH v14 3/5] example/ip_pipeline: add PIE support

2021-10-15 Thread Liguzinski, WojciechX
Adding the PIE support for IP Pipeline

Signed-off-by: Liguzinski, WojciechX 
---
 examples/ip_pipeline/tmgr.c | 142 +++-
 1 file changed, 74 insertions(+), 68 deletions(-)

diff --git a/examples/ip_pipeline/tmgr.c b/examples/ip_pipeline/tmgr.c
index e4e364cbc0..b138e885cf 100644
--- a/examples/ip_pipeline/tmgr.c
+++ b/examples/ip_pipeline/tmgr.c
@@ -17,6 +17,77 @@ static uint32_t n_subport_profiles;
 static struct rte_sched_pipe_params
pipe_profile[TMGR_PIPE_PROFILE_MAX];
 
+#ifdef RTE_SCHED_CMAN
+static struct rte_sched_cman_params cman_params = {
+   .red_params = {
+   /* Traffic Class 0 Colors Green / Yellow / Red */
+   [0][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [0][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [0][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+
+   /* Traffic Class 1 - Colors Green / Yellow / Red */
+   [1][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [1][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [1][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+
+   /* Traffic Class 2 - Colors Green / Yellow / Red */
+   [2][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [2][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [2][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+
+   /* Traffic Class 3 - Colors Green / Yellow / Red */
+   [3][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [3][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [3][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+
+   /* Traffic Class 4 - Colors Green / Yellow / Red */
+   [4][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [4][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [4][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+
+   /* Traffic Class 5 - Colors Green / Yellow / Red */
+   [5][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [5][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [5][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+
+   /* Traffic Class 6 - Colors Green / Yellow / Red */
+   [6][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [6][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [6][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+
+   /* Traffic Class 7 - Colors Green / Yellow / Red */
+   [7][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [7][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [7][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+
+   /* Traffic Class 8 - Colors Green / Yellow / Red */
+   [8][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [8][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [8][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+
+   /* Traffic Class 9 - Colors Green / Yellow / Red */
+   [9][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [9][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [9][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+
+   /* Traffic Class 10 - Colors Green / Yellow / Red */
+   [10][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [10][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [10][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+
+   /* Traffic Class 11 - Colors Green / Yellow / Red */
+   [11][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [11][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [11][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+
+   /* Traffic Class 12 - Colors Green / Yellow / Red */
+   [12][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [12][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   [12][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
+   },
+};
+#endif /* RTE_SCHED_CMAN */
+
 static uint32_t n_pipe_profiles;
 
 static const str

[dpdk-dev] [PATCH v14 4/5] doc/guides/prog_guide: added PIE

2021-10-15 Thread Liguzinski, WojciechX
Added PIE related information to documentation.

Signed-off-by: Liguzinski, WojciechX 
---
 doc/guides/prog_guide/glossary.rst   |  3 +
 doc/guides/prog_guide/qos_framework.rst  | 60 +---
 doc/guides/prog_guide/traffic_management.rst | 13 -
 3 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/doc/guides/prog_guide/glossary.rst b/doc/guides/prog_guide/glossary.rst
index 7044a7df2a..fb0910ba5b 100644
--- a/doc/guides/prog_guide/glossary.rst
+++ b/doc/guides/prog_guide/glossary.rst
@@ -158,6 +158,9 @@ PCI
 PHY
An abbreviation for the physical layer of the OSI model.
 
+PIE
+   Proportional Integral Controller Enhanced (RFC8033)
+
 pktmbuf
An *mbuf* carrying a network packet.
 
diff --git a/doc/guides/prog_guide/qos_framework.rst b/doc/guides/prog_guide/qos_framework.rst
index 3b8a1184b0..7c8450181d 100644
--- a/doc/guides/prog_guide/qos_framework.rst
+++ b/doc/guides/prog_guide/qos_framework.rst
@@ -56,7 +56,8 @@ A functional description of each block is provided in the following table.
|   ||  |

+---+++
| 7 | Dropper| Congestion management using the Random Early Detection (RED) algorithm |
-   |   || (specified by the Sally Floyd - Van Jacobson paper) or Weighted RED (WRED).|
+   |   || (specified by the Sally Floyd - Van Jacobson paper) or Weighted RED (WRED) |
+   |   || or Proportional Integral Controller Enhanced (PIE).|
|   || Drop packets based on the current scheduler queue load level and packet|
|   || priority. When congestion is experienced, lower priority packets are dropped   |
|   || first.  |
@@ -421,7 +422,7 @@ No input packet can be part of more than one pipeline stage at a given time.
 The congestion management scheme implemented by the enqueue pipeline described above is very basic:
 packets are enqueued until a specific queue becomes full,
 then all the packets destined to the same queue are dropped until packets are consumed (by the dequeue operation).
-This can be improved by enabling RED/WRED as part of the enqueue pipeline which looks at the queue occupancy and
+This can be improved by enabling RED/WRED or PIE as part of the enqueue pipeline which looks at the queue occupancy and
 packet priority in order to yield the enqueue/drop decision for a specific packet
 (as opposed to enqueuing all packets / dropping all packets indiscriminately).
 
@@ -1155,13 +1156,13 @@ If the number of queues is small,
then the performance of the port scheduler for the same level of active traffic is expected to be worse than
 the performance of a small set of message passing queues.
 
-.. _Dropper:
+.. _Droppers:
 
-Dropper
+Droppers
 ---
 
 The purpose of the DPDK dropper is to drop packets arriving at a packet scheduler to avoid congestion.
-The dropper supports the Random Early Detection (RED),
+The dropper supports the Proportional Integral Controller Enhanced (PIE), Random Early Detection (RED),
 Weighted Random Early Detection (WRED) and tail drop algorithms.
 :numref:`figure_blk_diag_dropper` illustrates how the dropper integrates with the scheduler.
 The DPDK currently does not support congestion management
@@ -1174,9 +1175,13 @@ so the dropper provides the only method for congestion avoidance.
High-level Block Diagram of the DPDK Dropper
 
 
-The dropper uses the Random Early Detection (RED) congestion avoidance algorithm as documented in the reference publication.
-The purpose of the RED algorithm is to monitor a packet queue,
+The dropper uses one of two congestion avoidance algorithms:
+   - the Random Early Detection (RED) as documented in the reference publication.
+   - the Proportional Integral Controller Enhanced (PIE) as documented in RFC8033 publication.
+
+The purpose of the RED/PIE algorithm is to monitor a packet queue,
 determine the current congestion level in the queue and decide whether an arriving packet should be enqueued or dropped.
+
 The RED algorithm uses an Exponential Weighted Moving Average (EWMA) filter to compute average queue size which
 gives an indication of the current congestion level in the queue.
 
@@ -1192,7 +1197,7 @@ This occurs when a packet queue has reached maximum capacity and cannot store an
 In this situation, all arriving packets are dropped.
 
 The flow through the dropper is illustrated in :numref:`figure_flow_tru_droppper`.
-The RED/WRED/PIE algorithm is exercised first and tail drop second.
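The EWMA filter mentioned in the hunk above can be illustrated with a short, self-contained sketch. Everything here (the helper name, the 10-bit fixed-point format) is an illustrative assumption, not the DPDK rte_red/rte_pie code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical EWMA helper: avg += w * (q - avg), with weight
 * w = 2^-wq_log2 and the average kept in fixed point with
 * AVG_FRAC_BITS fractional bits. Illustration only; the real
 * dropper code lives in lib/sched. */
#define AVG_FRAC_BITS 10

static uint32_t
ewma_update(uint32_t avg_fp, uint32_t qlen, unsigned int wq_log2)
{
	int64_t q_fp = (int64_t)qlen << AVG_FRAC_BITS;

	/* move the average a fraction 2^-wq_log2 toward the new sample */
	return (uint32_t)(avg_fp + ((q_fp - avg_fp) >> wq_log2));
}
```

Feeding a constant queue length makes the average converge toward it, which is how the dropper tracks the congestion level without reacting to short bursts.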

[dpdk-dev] [PATCH v14 5/5] app/test: add tests for PIE

2021-10-15 Thread Liguzinski, WojciechX
Tests for PIE code added to test application.

Signed-off-by: Liguzinski, WojciechX 
---
 app/test/meson.build |4 +
 app/test/test_pie.c  | 1065 ++
 lib/sched/rte_pie.c  |6 +-
 lib/sched/rte_pie.h  |   17 +-
 4 files changed, 1085 insertions(+), 7 deletions(-)
 create mode 100644 app/test/test_pie.c

diff --git a/app/test/meson.build b/app/test/meson.build
index f144d8b8ed..00ad7ab368 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -112,6 +112,7 @@ test_sources = files(
 'test_reciprocal_division.c',
 'test_reciprocal_division_perf.c',
 'test_red.c',
+'test_pie.c',
 'test_reorder.c',
 'test_rib.c',
 'test_rib6.c',
@@ -242,6 +243,7 @@ fast_tests = [
 ['prefetch_autotest', true],
 ['rcu_qsbr_autotest', true],
 ['red_autotest', true],
+['pie_autotest', true],
 ['rib_autotest', true],
 ['rib6_autotest', true],
 ['ring_autotest', true],
@@ -293,6 +295,7 @@ perf_test_names = [
 'fib_slow_autotest',
 'fib_perf_autotest',
 'red_all',
+'pie_all',
 'barrier_autotest',
 'hash_multiwriter_autotest',
 'timer_racecond_autotest',
@@ -306,6 +309,7 @@ perf_test_names = [
 'fib6_perf_autotest',
 'rcu_qsbr_perf_autotest',
 'red_perf',
+'pie_perf',
 'distributor_perf_autotest',
 'pmd_perf_autotest',
 'stack_perf_autotest',
diff --git a/app/test/test_pie.c b/app/test/test_pie.c
new file mode 100644
index 00..dfa69d1c7e
--- /dev/null
+++ b/app/test/test_pie.c
@@ -0,0 +1,1065 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#include 
+
+#ifdef __INTEL_COMPILER
+#pragma warning(disable:2259)   /* conversion may lose significant bits */
+#pragma warning(disable:181)/* Arg incompatible with format string */
+#endif
+
+/**< structures for testing rte_pie performance and function */
+struct test_rte_pie_config {	/**< Test structure for RTE_PIE config */
+   struct rte_pie_config *pconfig; /**< RTE_PIE configuration parameters */
+   uint8_t num_cfg;	/**< Number of RTE_PIE configs to test */
+   uint16_t qdelay_ref;	/**< Latency Target (milliseconds) */
+   uint16_t *dp_update_interval;   /**< Update interval for drop probability (milliseconds) */
+   uint16_t *max_burst;	/**< Max Burst Allowance (milliseconds) */
+   uint16_t tailq_th;	/**< Tailq drop threshold (packet counts) */
+};
+
+struct test_queue {	/**< Test structure for RTE_PIE Queues */
+   struct rte_pie *pdata_in;   /**< RTE_PIE runtime data input */
+   struct rte_pie *pdata_out;  /**< RTE_PIE runtime data output */
+   uint32_t num_queues;	/**< Number of RTE_PIE queues to test */
+   uint32_t *qlen;	/**< Queue size */
+   uint32_t q_ramp_up;	/**< Num of enqueues to ramp up the queue */
+   double drop_tolerance;	/**< Drop tolerance of packets not enqueued */
+};
+
+struct test_var {	/**< Test variables used for testing RTE_PIE */
+   uint32_t num_iterations;	/**< Number of test iterations */
+   uint32_t num_ops;	/**< Number of test operations */
+   uint64_t clk_freq;	/**< CPU clock frequency */
+   uint32_t *dropped;	/**< Test operations dropped */
+   uint32_t *enqueued;	/**< Test operations enqueued */
+   uint32_t *dequeued;	/**< Test operations dequeued */
+};
+
+struct test_config {	/**< Primary test structure for RTE_PIE */
+   const char *ifname;	/**< Interface name */
+   const char *msg;	/**< Test message for display */
+   const char *htxt;	/**< Header txt display for result output */
+   struct test_rte_pie_config *tconfig; /**< Test structure for RTE_PIE config */
+   struct test_queue *tqueue;	/**< Test structure for RTE_PIE Queues */
+   struct test_var *tvar;	/**< Test variables used for testing RTE_PIE */
+   uint32_t *tlevel;	/**< Queue levels */
+};
+
+enum test_result {
+   FAIL = 0,
+   PASS
+};
+
+/**< Test structure to define tests to run */
+struct tests {
+   struct test_config *testcfg;
+   enum test_result (*testfn)(struct test_config *cfg);
+};
+
+struct rdtsc_prof {
+   uint64_t clk_start;
+   uint64_t clk_min;   /**< min clocks */
+   uint64_t clk_max;   /**< max clocks */
+   uint64_t clk_avgc;  /**< cou

Re: [dpdk-dev] [PATCH v9 0/4] improve telemetry support with in-memory mode

2021-10-15 Thread Bruce Richardson
On Thu, Oct 14, 2021 at 09:00:09PM +0200, David Marchand wrote:
> On Thu, Oct 14, 2021 at 12:50 PM Bruce Richardson
>  wrote:
> >
> > This patchset cleans up telemetry support for "in-memory" mode, so that
> > multiple independent processes can be run using that mode and still have
> > telemetry support. It also removes problems of one process removing the
> > socket of another - which was the original issue reported. The main changes
> > in this set are to:
> >
> > * disable telemetry for secondary processes, which prevents any socket
> >   conflicts in multi-process cases.
> > * when multiple processes are run using the same runtime directory (i.e.
> >   "in-memory" mode or similar), add a counter suffix to the socket names to
> >   avoid conflicts over the socket. Each process will use the lowest 
> > available
> >   suffix, with the first process using the directory, not adding any suffix.
> > * update the telemetry script and documentation to allow it to connect to
> >   in-memory DPDK processes.
> >
> 
> Thanks a lot for working on this.
> Reading the updated doc, I fixed some -file-prefix to --file-prefix in doc.
> 
Thanks for the fixups.
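The "lowest available suffix" rule from the cover letter can be sketched as follows. The helper and the `:N` suffix formatting are illustrative assumptions; the real code probes actual socket files under the runtime directory:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Pick the lowest free socket name: "base" for the first process,
 * then "base:1", "base:2", ... 'taken' is a NULL-terminated list
 * standing in for sockets already bound by other processes. */
static int
lowest_free_suffix(const char *const *taken, const char *base,
		   char *out, size_t outlen)
{
	for (int i = 0; ; i++) {
		bool used = false;

		if (i == 0)
			snprintf(out, outlen, "%s", base);
		else
			snprintf(out, outlen, "%s:%d", base, i);
		for (const char *const *t = taken; *t != NULL; t++) {
			if (strcmp(*t, out) == 0) {
				used = true;
				break;
			}
		}
		if (!used)
			return i;
	}
}
```

With this scheme the first process keeps the plain socket name, so existing tooling keeps working, and later processes never clobber an earlier process's socket.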


Re: [dpdk-dev] [PATCH v2] examples/vhost: change the default value of NIC's max queues

2021-10-15 Thread Xia, Chenbo
> -Original Message-
> From: Ma, WenwuX 
> Sent: Saturday, October 16, 2021 4:01 AM

Is your server time incorrect? You are sending a patch from the future.

> To: dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Xia, Chenbo ; Jiang,
> Cheng1 ; Hu, Jiayu ; Yang, YvonneX
> ; Ma, WenwuX 
> Subject: [PATCH v2] examples/vhost: change the default value of NIC's max
> queues
> 
> vswitch can't launch with a 40G i40e port because device start fails
> if the NIC's max queue count exceeds the default of 128, so the
> default value is changed from 128 to 512.

Still missing fix and cc-stable tag.

/Chenbo

> 
> Signed-off-by: Wenwu Ma 
> ---
>  examples/vhost/main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index bc3d71c898..36969a4de5 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -29,7 +29,7 @@
>  #include "main.h"
> 
>  #ifndef MAX_QUEUES
> -#define MAX_QUEUES 128
> +#define MAX_QUEUES 512
>  #endif
> 
>  /* the maximum number of external ports supported */
> --
> 2.25.1



Re: [dpdk-dev] [PATCH v2] mempool: enforce valid flags at creation

2021-10-15 Thread David Marchand
On Fri, Oct 15, 2021 at 9:02 AM Andrew Rybchenko
 wrote:
> On 10/14/21 10:37 PM, Stephen Hemminger wrote:
> > On Thu, 14 Oct 2021 21:29:16 +0200
> > David Marchand  wrote:
> >
>> If we do not enforce that valid flags are passed by an application, this
>> application might face issues in the future when we add more flags.
> >>
> >> Signed-off-by: David Marchand 
> >> Reviewed-by: Andrew Rybchenko 
> >> Acked-by: Ray Kinsella 
> > Acked-by: Stephen Hemminger 
> Acked-by: Andrew Rybchenko 

Applied, thanks.


-- 
David Marchand
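The enforcement this patch (and the memzone/mbuf counterparts below) adds boils down to masking against the set of currently defined flags. A minimal sketch, with made-up flag names rather than the real mempool ones:

```c
#include <assert.h>
#include <errno.h>

/* Reject any flag bit outside the currently defined set, so an
 * application passing garbage (or a flag added in a future DPDK
 * version) fails early instead of misbehaving later. Illustrative
 * values, not the mempool flags. */
#define OBJ_FLAG_A 0x1u
#define OBJ_FLAG_B 0x2u
#define OBJ_SUPPORTED_FLAGS (OBJ_FLAG_A | OBJ_FLAG_B)

static int
obj_check_flags(unsigned int flags)
{
	return (flags & ~OBJ_SUPPORTED_FLAGS) ? -EINVAL : 0;
}
```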



Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart

2021-10-15 Thread Andrew Rybchenko
On 10/14/21 5:14 PM, Dmitry Kozlyuk wrote:
> 
> 
>> -Original Message-
>> From: Dmitry Kozlyuk
>> Sent: 13 октября 2021 г. 11:33
>> To: dev@dpdk.org; Andrew Rybchenko ; Ori
>> Kam ; Raslan Darawsheh 
>> Cc: NBU-Contact-Thomas Monjalon ; Ferruh Yigit
>> 
>> Subject: RE: [PATCH 2/5] ethdev: add capability to keep shared objects on
>> restart
>>
>> This thread continues discussions on previous versions to keep everything
>> in the thread with final patches:
>>
>> [1]: http://inbox.dpdk.org/dev/d5673b58-5aa6-ca35-5b60-
>> d938e56cf...@oktetlabs.ru/
>> [2]:
>> http://inbox.dpdk.org/dev/DM8PR12MB5400997CCEC9169AC5AE0C89D6EA9@DM8PR12MB
>> 5400.namprd12.prod.outlook.com/
>>
>> Please see below.
>>
>>> -Original Message-
>>> From: Dmitry Kozlyuk 
>>> Sent: 5 октября 2021 г. 3:52
>>> To: dev@dpdk.org
>>> Cc: Dmitry Kozlyuk ; Ori Kam ;
>>> NBU- Contact-Thomas Monjalon ; Ferruh Yigit
>>> ; Andrew Rybchenko
>>> 
>>> Subject: [PATCH 2/5] ethdev: add capability to keep shared objects on
>>> restart
>>>
>>> From: Dmitry Kozlyuk 
>>>
>>> rte_flow_action_handle_create() did not mention what happens with an
>>> indirect action when a device is stopped, possibly reconfigured, and
>>> started again. It is natural for some indirect actions to be
>>> persistent, like counters and meters; keeping others just saves
>>> application time and complexity. However, not all PMDs can support it.
>>> It is proposed to add a device capability to indicate if indirect
>>> actions are kept across the above sequence or implicitly destroyed.
>>>
>>> In the future, indirect actions may not be the only type of objects
>>> shared between flow rules. The capability bit intends to cover all
>>> possible types of such objects, hence its name.
>>>
>>> It may happen that in the future a PMD acquires support for a type of
>>> shared objects that it cannot keep across a restart. It is undesirable
>>> to stop advertising the capability so that applications that don't use
>>> objects of the problematic type can still take advantage of it.
>>> This is why PMDs are allowed to keep only a subset of shared objects
>>> provided that the vendor mandatorily documents it.
>>>
>>> If the device is being reconfigured in a way that is incompatible with
>>> existing shared objects, the PMD is required to report an error.
>>> This is mandatory, because flow API does not supply users with
>>> capabilities, so this is the only way for a user to learn that
>>> configuration is invalid. For example, if queue count changes and RSS
>>> indirect action specifies queues that are going away, the user must
>>> update the action before removing the queues or remove the action and
>>> all flow rules that were using it.
>>>
>>> Signed-off-by: Dmitry Kozlyuk 
>>> ---
>>> [...]
>>
>> Current pain point is that capability bits may be insufficient and a
>> programmatic way is desired to check which types of objects can be kept
>> across restart, instead of documenting the limitations.
>>
>> I support one of previous Ori's suggestions and want to clarify it [1]:
>>
>> Ori: "Another way is to assume that if the action was created before port
>> start it will be kept after port stop."
>> Andrew: "It does not sound like a solution. May be I simply don't know
>> target usecase."
>>
>> What Ori suggests (offline discussion summary): Suppose an application
>> wants to check whether a shared object (indirect action) or a flow rule of
>> a particular kind can be kept across restart. It calls rte_flow_action_handle_create() or
>> rte_flow_create() before rte_eth_dev_start(). If it succeeds, 1) it means
>> objects of this type can be kept across restart, 2) it's a normal object
>> created that will work after the port is started. This is logical, because
>> if the PMD can keep some kind of objects when the port is stopped, it is
>> likely to be able to create them when the port is not started. It is
>> subject to discussion if "object kind" means only "type" or "type +
>> transfer bit" combination; for mlx5 PMD it doesn't matter. One minor
>> drawback is that applications can only do the test when the port is
>> stopped, but it seems likely that the test really needs to be done at
>> startup anyway.
>>
>> If this is acceptable:
>> 1. Capability bits are not needed anymore.
>> 2. ethdev patches can be accepted in RC1, present behavior is undefined
>> anyway.
>> 3. PMD patches will need update that can be done by RC2.
> 
> Andrew, what do you think?
> If you agree, do we need to include transfer bit into "kind"?
> I'd like to conclude before RC1 and can update the docs quickly.
> 
> I've seen the proposition to advertise capability
> to create flow rules before device start as a flag.
> I don't think it conflicts with Ori's suggestion
> because the flag doesn't imply that _any_ rule can be created,
> neither does it say about indirect actions.
> On the other hand, if PMD can create a flow object (rule, etc.)
> when the device is not started, it is logical to assume that
> after the device is stopped it can mov

Re: [dpdk-dev] [PATCH] memzone: enforce valid flags when reserving

2021-10-15 Thread David Marchand
On Wed, Oct 13, 2021 at 10:47 AM Andrew Rybchenko
 wrote:
>
> On 10/12/21 11:14 PM, Stephen Hemminger wrote:
> > On Tue, 12 Oct 2021 21:39:26 +0200
> > David Marchand  wrote:
> >
> >> If we do not enforce that valid flags are passed by an application, this
> >> application might face issues in the future when we add more flags.
> >>
> >> Signed-off-by: David Marchand 
> > Acked-by: Stephen Hemminger 
> Acked-by: Andrew Rybchenko 
Acked-by: Ray Kinsella 

Applied, thanks.


-- 
David Marchand



Re: [dpdk-dev] [PATCH] mbuf: enforce no option for dynamic fields and flags

2021-10-15 Thread David Marchand
On Wed, Oct 13, 2021 at 9:06 AM Andrew Rybchenko
 wrote:
> On 10/12/21 11:14 PM, Stephen Hemminger wrote:
> > On Tue, 12 Oct 2021 21:39:57 +0200
> > David Marchand  wrote:
> >
> >> As stated in the API, dynamic field and flags should be created with no
> >> additional flag (simply in the API for future changes).
> >>
> >> Fix the dynamic flag register helper which was not enforcing it and add
> >> unit tests.
> >>
> >> Fixes: 4958ca3a443a ("mbuf: support dynamic fields and flags")
> >>
> >> Signed-off-by: David Marchand 
> > Acked-by: Stephen Hemminger 
> Acked-by: Andrew Rybchenko 
Acked-by: Ray Kinsella 

Applied, thanks.


-- 
David Marchand



Re: [dpdk-dev] [PATCH v2] eal: add telemetry callbacks for memory info

2021-10-15 Thread Power, Ciara
Hi Harman,

>-Original Message-
>From: Harman Kalra 
>Sent: Thursday 14 October 2021 18:17
>To: Harman Kalra ; dev@dpdk.org; Richardson, Bruce
>; Power, Ciara ;
>Burakov, Anatoly 
>Subject: RE: [PATCH v2] eal: add telemetry callbacks for memory info
>
>Ping...
>
>> -Original Message-
>> From: Harman Kalra 
>> Sent: Friday, October 8, 2021 6:14 PM
>> To: dev@dpdk.org; bruce.richard...@intel.com; ciara.po...@intel.com;
>> Anatoly Burakov 
>> Cc: Harman Kalra 
>> Subject: [PATCH v2] eal: add telemetry callbacks for memory info
>>
>> Registering new telemetry callbacks to list named (memzones) and
>> unnamed
>> (malloc) memory reserved and return information based on arguments
>> provided by user.
>>
>> Example:
>> Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
>> {"version": "DPDK 21.11.0-rc0", "pid": 59754, "max_output_len": 16384}
>> Connected to application: "dpdk-testpmd"
>> -->
>> --> /eal/memzone_list
>> {"/eal/memzone_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}
>> -->
>> -->
>> --> /eal/memzone_info,0
>> {"/eal/memzone_info": {"Zone": 0, "Name": "rte_eth_dev_data",\
>> "Length": 225408, "Address": "0x13ffc0280", "Socket": 0, "Flags": 0, \
>> "Hugepage_size": 536870912, "Hugepage_base": "0x12000",   \
>> "Hugepage_used": 1}}
>> -->
>> -->
>> --> /eal/memzone_info,6
>> {"/eal/memzone_info": {"Zone": 6, "Name": "MP_mb_pool_0_0",  \
>> "Length": 669918336, "Address": "0x15811db80", "Socket": 0,  \
>> "Flags": 0, "Hugepage_size": 536870912, "Hugepage_base":
>> "0x14000", \
>> "Hugepage_used": 2}}
>> -->
>> -->
>> --> /eal/memzone_info,14
>> {"/eal/memzone_info": null}
>> -->
>> -->
>> --> /eal/heap_list
>> {"/eal/heap_list": [0]}
>> -->
>> -->
>> --> /eal/heap_info,0
>> {"/eal/heap_info": {"Head id": 0, "Name": "socket_0", \
>> "Heap_size": 1610612736, "Free_size": 927645952,  \
>> "Alloc_size": 682966784, "Greatest_free_size": 529153152, \
>> "Alloc_count": 482, "Free_count": 2}}
>>
>> Signed-off-by: Harman Kalra 
>> ---


From a Telemetry usage point of view,

Acked-by: Ciara Power 


Re: [dpdk-dev] [PATCH v25 1/6] dmadev: introduce DMA device library

2021-10-15 Thread Thomas Monjalon
13/10/2021 09:41, Thomas Monjalon:
> 13/10/2021 02:21, fengchengwen:
> > On 2021/10/13 3:09, Thomas Monjalon wrote:
> > > 11/10/2021 09:33, Chengwen Feng:
> > >> +static void
> > >> +dma_release(struct rte_dma_dev *dev)
> > >> +{
> > >> +rte_free(dev->dev_private);
> > >> +memset(dev, 0, sizeof(struct rte_dma_dev));
> > >> +}
> [...]
> > >> +int
> > >> +rte_dma_pmd_release(const char *name)
> > >> +{
> > >> +struct rte_dma_dev *dev;
> > >> +
> > >> +if (dma_check_name(name) != 0)
> > >> +return -EINVAL;
> > >> +
> > >> +dev = dma_find_by_name(name);
> > >> +if (dev == NULL)
> > >> +return -EINVAL;
> > >> +
> > >> +dma_release(dev);
> > >> +return 0;
> > >> +}
> > > 
> > > Trying to understand the logic of creation/destroy.
> > > skeldma_probe
> > > \-> skeldma_create
> > > \-> rte_dma_pmd_allocate
> > > \-> dma_allocate
> > > \-> dma_data_prepare
> > > \-> dma_dev_data_prepare
> > > skeldma_remove
> > > \-> skeldma_destroy
> > > \-> rte_dma_pmd_release
> > > \-> dma_release
> > 
> > This patch only provide device allocate function, the 2st patch provide 
> > extra logic:
> > 
> > diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
> > index 42a4693bd9..a6a5680d2b 100644
> > --- a/lib/dmadev/rte_dmadev.c
> > +++ b/lib/dmadev/rte_dmadev.c
> > @@ -201,6 +201,9 @@ rte_dma_pmd_release(const char *name)
> > if (dev == NULL)
> > return -EINVAL;
> > 
> > +   if (dev->state == RTE_DMA_DEV_READY)
> > +   return rte_dma_close(dev->dev_id);
> > +
> > dma_release(dev);
> > return 0;
> >  }
> > 
> > So the skeldma remove will be:
> > 
> >  skeldma_remove
> >  \-> skeldma_destroy
> >  \-> rte_dma_pmd_release
> >  \-> rte_dma_close
> >  \-> dma_release
> 
> OK, in this case, no need to dma_release from rte_dma_pmd_release,
> because it is already called from rte_dma_close.

Ping for reply please.





Re: [dpdk-dev] [PATCH v2] net/virtio: handle Tx checksums correctly for tunnel packets

2021-10-15 Thread Olivier Matz
On Thu, Oct 14, 2021 at 07:12:29AM +, Xia, Chenbo wrote:
> > -Original Message-
> > From: Ivan Malov 
> > Sent: Friday, September 17, 2021 2:50 AM
> > To: dev@dpdk.org
> > Cc: Maxime Coquelin ; sta...@dpdk.org; Andrew
> > Rybchenko ; Xia, Chenbo 
> > ;
> > Yuanhan Liu ; Olivier Matz
> > 
> > Subject: [PATCH v2] net/virtio: handle Tx checksums correctly for tunnel
> > packets
> > 
> > Tx prepare method calls rte_net_intel_cksum_prepare(), which
> > handles tunnel packets correctly, but Tx burst path does not
> > take tunnel presence into account when computing the offsets.
> > 
> > Fixes: 58169a9c8153 ("net/virtio: support Tx checksum offload")
> > Cc: sta...@dpdk.org
> > 
> > Signed-off-by: Ivan Malov 
> > Reviewed-by: Andrew Rybchenko 
> > ---
> >  drivers/net/virtio/virtqueue.h | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
> > index 03957b2bd0..b83ff32efb 100644
> > --- a/drivers/net/virtio/virtqueue.h
> > +++ b/drivers/net/virtio/virtqueue.h
> > @@ -620,19 +620,21 @@ static inline void
> >  virtqueue_xmit_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *cookie)
> >  {
> > uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
> > +   uint16_t o_l23_len = (cookie->ol_flags & PKT_TX_TUNNEL_MASK) ?
> > +cookie->outer_l2_len + cookie->outer_l3_len : 0;
> > 
> > if (cookie->ol_flags & PKT_TX_TCP_SEG)
> > csum_l4 |= PKT_TX_TCP_CKSUM;
> > 
> > switch (csum_l4) {
> > case PKT_TX_UDP_CKSUM:
> > -   hdr->csum_start = cookie->l2_len + cookie->l3_len;
> > +   hdr->csum_start = o_l23_len + cookie->l2_len + cookie->l3_len;
> > hdr->csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
> > hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> > break;
> > 
> > case PKT_TX_TCP_CKSUM:
> > -   hdr->csum_start = cookie->l2_len + cookie->l3_len;
> > +   hdr->csum_start = o_l23_len + cookie->l2_len + cookie->l3_len;
> > hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum);
> > hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> > break;
> > @@ -650,7 +652,8 @@ virtqueue_xmit_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *cookie)
> > VIRTIO_NET_HDR_GSO_TCPV6 :
> > VIRTIO_NET_HDR_GSO_TCPV4;
> > hdr->gso_size = cookie->tso_segsz;
> > -   hdr->hdr_len = cookie->l2_len + cookie->l3_len + cookie->l4_len;
> > +   hdr->hdr_len = o_l23_len + cookie->l2_len + cookie->l3_len +
> > +  cookie->l4_len;
> > } else {
> > ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
> > ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
> > --
> > 2.20.1
> 
> Reviewed-by: Chenbo Xia 
> 

I have one comment to mention that from application perspective, it has
to take care that the driver does not support outer tunnel offload (this
matches the advertised capabilities). For instance, in case of a vxlan
tunnel, if the outer checksum needs to be calculated, it has to be done
by the application. In short, the application can ask to offload the
inner part if no offload is required on the outer part.

Also, since grep "PKT_TX_TUNNEL" in driver/net/ixgbe gives nothing, it
seems the ixgbe driver does not support the same offload request than
described in this patch:
  (m->ol_flags & PKT_TX_TUNNEL_MASK) == PKT_TX_TUNNEL_X
  m->outer_l2_len = outer l2 length
  m->outer_l3_len = outer l3 length
  m->l2_len = outer l4 length + tunnel len + inner l2 len
  m->l3_len = inner l3 len
  m->l4_len = inner l4 len

An alternative for doing the same (that would work with ixgbe and
current virtio) is to give:
  (m->ol_flags & PKT_TX_TUNNEL_MASK) == 0
  m->l2_len = outer lengths + tunnel len + inner l2 len
  m->l3_len = inner l3 len
  m->l4_len = inner l4 len

I think a capability may be missing to differentiate which drivers
support which mode. Or, all drivers could be fixed to support both modes
(and this would make this patch valid).

Thanks,
Olivier
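To make the two conventions above concrete, here is a small sketch of the checksum-start arithmetic from the virtio patch, applied to both ways of describing the same tunnelled packet. The struct and the tunnel flag bit are stand-ins, not the real rte_mbuf definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the mbuf length fields; TX_TUNNEL is an
 * illustrative flag bit, not the real PKT_TX_TUNNEL_MASK value. */
#define TX_TUNNEL 0x1ULL

struct lens {
	uint64_t ol_flags;
	uint16_t outer_l2_len;
	uint16_t outer_l3_len;
	uint16_t l2_len;
	uint16_t l3_len;
};

/* Mirrors the patched virtqueue_xmit_offload() offset computation:
 * outer lengths count only when a tunnel flag is set. */
static uint16_t
csum_start(const struct lens *m)
{
	uint16_t o_l23 = (m->ol_flags & TX_TUNNEL) ?
			 m->outer_l2_len + m->outer_l3_len : 0;

	return o_l23 + m->l2_len + m->l3_len;
}
```

For a VXLAN packet with 14-byte outer Ethernet, 20-byte outer IPv4, 8-byte UDP, 8-byte VXLAN and 14-byte inner Ethernet headers followed by a 20-byte inner IPv4 header, both descriptions (tunnel flag plus outer lengths, or everything folded into l2_len) put csum_start at byte 84.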


[dpdk-dev] [PATCH] doc: fix default mempool option

2021-10-15 Thread David Marchand
This option should be prefixed with -- for consistency with others.

Fixes: a103a97e7191 ("eal: allow user to override default mempool driver")
Cc: sta...@dpdk.org

Signed-off-by: David Marchand 
---
 doc/guides/freebsd_gsg/build_sample_apps.rst | 2 +-
 doc/guides/linux_gsg/build_sample_apps.rst   | 2 +-
 doc/guides/linux_gsg/eal_args.include.rst| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/guides/freebsd_gsg/build_sample_apps.rst b/doc/guides/freebsd_gsg/build_sample_apps.rst
index 4fba671e4f..c87e982759 100644
--- a/doc/guides/freebsd_gsg/build_sample_apps.rst
+++ b/doc/guides/freebsd_gsg/build_sample_apps.rst
@@ -88,7 +88,7 @@ Other options, specific to Linux and are not supported under FreeBSD are as follows
 *   ``--huge-dir``:
 The directory where hugetlbfs is mounted.
 
-*   ``mbuf-pool-ops-name``:
+*   ``--mbuf-pool-ops-name``:
 Pool ops name for mbuf to use.
 
 *   ``--file-prefix``:
diff --git a/doc/guides/linux_gsg/build_sample_apps.rst b/doc/guides/linux_gsg/build_sample_apps.rst
index 709b301427..efd2dd23f1 100644
--- a/doc/guides/linux_gsg/build_sample_apps.rst
+++ b/doc/guides/linux_gsg/build_sample_apps.rst
@@ -82,7 +82,7 @@ The EAL options are as follows:
 * ``--huge-dir``:
   The directory where hugetlbfs is mounted.
 
-* ``mbuf-pool-ops-name``:
+* ``--mbuf-pool-ops-name``:
   Pool ops name for mbuf to use.
 
 * ``--file-prefix``:
diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
index 96baa4a9b0..3549a0cf56 100644
--- a/doc/guides/linux_gsg/eal_args.include.rst
+++ b/doc/guides/linux_gsg/eal_args.include.rst
@@ -199,7 +199,7 @@ Other options
 
 Display the version information on startup.
 
-*   ``mbuf-pool-ops-name``:
+*   ``--mbuf-pool-ops-name``:
 
 Pool ops name for mbuf to use.
 
-- 
2.23.0



Re: [dpdk-dev] [PATCH v5 2/5] vhost: implement rte_power_monitor API

2021-10-15 Thread Li, Miao
Hi Chenbo,

> -Original Message-
> From: Xia, Chenbo 
> Sent: Friday, October 15, 2021 3:38 PM
> To: Li, Miao ; dev@dpdk.org
> Cc: maxime.coque...@redhat.com
> Subject: RE: [PATCH v5 2/5] vhost: implement rte_power_monitor API
> 
> Hi,
> 
> > -Original Message-
> > From: Li, Miao 
> > Sent: Friday, October 15, 2021 11:12 PM
> > To: dev@dpdk.org
> > Cc: Xia, Chenbo ; maxime.coque...@redhat.com; Li,
> Miao
> > 
> > Subject: [PATCH v5 2/5] vhost: implement rte_power_monitor API
> >
> > This patch defines rte_vhost_power_monitor_cond, which is used to pass
> > some information to the vhost driver: the address to monitor, the expected
> > value, the mask to extract the value read from 'addr', the size of the
> > monitored value, and a match flag that tells whether the sleep should be
> > skipped when the masked value matches or when it does not match. The vhost
> > driver can use this information to fill rte_power_monitor_cond.
> >
> > Signed-off-by: Miao Li 
> > ---
> >  doc/guides/rel_notes/release_21_11.rst |  4 +++
> >  lib/vhost/rte_vhost.h  | 44 ++
> >  lib/vhost/version.map  |  3 ++
> >  lib/vhost/vhost.c  | 38 ++
> >  4 files changed, 89 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
> > index 27dc896703..ad6d256a55 100644
> > --- a/doc/guides/rel_notes/release_21_11.rst
> > +++ b/doc/guides/rel_notes/release_21_11.rst
> > @@ -72,6 +72,10 @@ New Features
> >Added macros ETH_RSS_IPV4_CHKSUM and ETH_RSS_L4_CHKSUM, now IPv4
> and
> >TCP/UDP/SCTP header checksum field can be used as input set for RSS.
> >
> > +* **Added power monitor API in vhost library.**
> > +
> > +  Added an API to support power monitor in vhost library.
> > +
> >  * **Updated virtio PMD.**
> >
> >Implement rte_power_monitor API in virtio PMD.
> > diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> > index fd372d5259..42bda95e96 100644
> > --- a/lib/vhost/rte_vhost.h
> > +++ b/lib/vhost/rte_vhost.h
> > @@ -292,6 +292,33 @@ struct vhost_device_ops {
> > void *reserved[1]; /**< Reserved for future extension */
> >  };
> >
> > +/**
> > + * Power monitor condition.
> > + */
> > +struct rte_vhost_power_monitor_cond {
> > +   /**< Address to monitor for changes */
> > +   volatile void *addr;
> > +   /**< If the `mask` is non-zero, location pointed
> > +*   to by `addr` will be read and masked, then
> > +*   compared with this value.
> > +*/
> > +   uint64_t val;
> > +   /**< 64-bit mask to extract value read from `addr` */
> > +   uint64_t mask;
> > +   /**< Data size (in bytes) that will be read from the
> > +*   monitored memory location (`addr`). Can be 1, 2,
> > +*   4, or 8. Supplying any other value will result in
> > +*   an error.
> 
> 'Can be ...' part is not necessary, as this value is defined in vhost
> lib and currently only has two different values for packed or split.

I will remove it in the next version.

> 
> > +*/
> > +   uint8_t size;
> > +   /**< If 1, and masked value that read from 'addr' equals
> > +*   'val', the driver will skip core sleep. If 0, and
> 
> 'will' -> 'should'. As it's a suggestion for vhost driver.
> 
> > +*  masked value that read from 'addr' does not equal 'val',
> > +*  the driver will skip core sleep.
> 
> Ditto.

I will modify them in the next version.

Thanks,
Miao

> 
> Thanks,
> Chenbo
> 
> > +*/
> > +   uint8_t match;
> > +};
> > +
> >  /**
> >   * Convert guest physical address to host virtual address
> >   *
> > @@ -903,6 +930,23 @@ int rte_vhost_vring_call(int vid, uint16_t vring_idx);
> >   */
> >  uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);
> >
> > +/**
> > + * Get power monitor address of the vhost device
> > + *
> > + * @param vid
> > + *  vhost device ID
> > + * @param queue_id
> > + *  vhost queue ID
> > + * @param pmc
> > + *  power monitor condition
> > + * @return
> > + *  0 on success, -1 on failure
> > + */
> > +__rte_experimental
> > +int
> > +rte_vhost_get_monitor_addr(int vid, uint16_t queue_id,
> > +   struct rte_vhost_power_monitor_cond *pmc);
> > +
> >  /**
> >   * Get log base and log size of the vhost device
> >   *
> > diff --git a/lib/vhost/version.map b/lib/vhost/version.map
> > index 8ebde3f694..c8599ddb97 100644
> > --- a/lib/vhost/version.map
> > +++ b/lib/vhost/version.map
> > @@ -85,4 +85,7 @@ EXPERIMENTAL {
> > rte_vhost_async_channel_register_thread_unsafe;
> > rte_vhost_async_channel_unregister_thread_unsafe;
> > rte_vhost_clear_queue_thread_unsafe;
> > +
> > +   # added in 21.11
> > +   rte_vhost_get_monitor_addr;
> >  };
> > diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> > index 9540522dac..36c896c9e2 100644
> > --- a/lib/vhost/vhost.c
> > +++ b/lib/vhost/vhost.c
> > @@ -1889,5 +1889,43 @@ rte_vhost_async_get_inflight(int vid, uint16_t
> queue_id)
> > return ret;
> >  }
> >
> 
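The match semantics discussed in this thread can be condensed into the comparison below, following the power-monitor convention that a non-zero return from the callback aborts the sleep. This is an illustration of the logic, not the driver code:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the monitor callback semantics: with match == 1 the
 * core should skip/abort the sleep once (value & mask) == val;
 * with match == 0 it should skip/abort once (value & mask) != val.
 * Returns -1 to abort the sleep, 0 to keep sleeping. */
static int
monitor_clb(uint64_t value, uint64_t mask, uint64_t val, uint8_t match)
{
	if (match)
		return (value & mask) == val ? -1 : 0;
	return (value & mask) == val ? 0 : -1;
}
```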

[dpdk-dev] [PATCH v3] examples/vhost: change the default value of NIC's max queues

2021-10-15 Thread Wenwu Ma
vswitch can't launch with a 40G i40e port because device start fails
if the NIC's max queue count exceeds the default of 128, so the
default value is changed from 128 to 512.

Fixes: 4796ad63ba1f ("examples/vhost: import userspace vhost application")
Cc: sta...@dpdk.org

Signed-off-by: Wenwu Ma 
---
 examples/vhost/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index bc3d71c898..36969a4de5 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -29,7 +29,7 @@
 #include "main.h"
 
 #ifndef MAX_QUEUES
-#define MAX_QUEUES 128
+#define MAX_QUEUES 512
 #endif
 
 /* the maximum number of external ports supported */
-- 
2.25.1



Re: [dpdk-dev] [PATCH v5 3/5] net/vhost: implement rte_power_monitor API

2021-10-15 Thread Li, Miao
Hi Chenbo,

> -Original Message-
> From: Xia, Chenbo 
> Sent: Friday, October 15, 2021 3:40 PM
> To: Li, Miao ; dev@dpdk.org
> Cc: maxime.coque...@redhat.com
> Subject: RE: [PATCH v5 3/5] net/vhost: implement rte_power_monitor API
> 
> > -Original Message-
> > From: Li, Miao 
> > Sent: Friday, October 15, 2021 11:12 PM
> > To: dev@dpdk.org
> > Cc: Xia, Chenbo ; maxime.coque...@redhat.com; Li,
> Miao
> > 
> > Subject: [PATCH v5 3/5] net/vhost: implement rte_power_monitor API
> >
> > This patch implements the rte_power_monitor API in the vhost PMD to reduce
> > power consumption when no packets come in. Following the current power
> > monitor semantics, this commit adds a callback function that decides whether
> > to abort the sleep by checking the current value against the expected value,
> > and vhost_get_monitor_addr to provide the address to monitor. When no
> > packets come in, the value at the address does not change and the running
> > core sleeps. Once packets arrive, the value changes and the running core
> > wakes up.
> >
> > Signed-off-by: Miao Li 
> > ---
> >  doc/guides/rel_notes/release_21_11.rst |  4 +++
> >  drivers/net/vhost/rte_eth_vhost.c  | 40 ++
> >  2 files changed, 44 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
> > index ad6d256a55..e6f9c284ae 100644
> > --- a/doc/guides/rel_notes/release_21_11.rst
> > +++ b/doc/guides/rel_notes/release_21_11.rst
> > @@ -76,6 +76,10 @@ New Features
> >
> >Added an API to support power monitor in vhost library.
> >
> > +* **Updated vhost PMD.**
> > +
> > +  Implement rte_power_monitor API in vhost PMD.
> > +
> >  * **Updated virtio PMD.**
> >
> >Implement rte_power_monitor API in virtio PMD.
> > diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> > index 2e24e5f7ff..ee665ee64d 100644
> > --- a/drivers/net/vhost/rte_eth_vhost.c
> > +++ b/drivers/net/vhost/rte_eth_vhost.c
> > @@ -1386,6 +1386,45 @@ eth_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
> > return rte_vhost_rx_queue_count(vq->vid, vq->virtqueue_id);
> >  }
> >
> > +#define CLB_VAL_IDX 0
> > +#define CLB_MSK_IDX 1
> > +#define CLB_MATCH_IDX 2
> > +static int
> > +vhost_monitor_callback(const uint64_t value,
> > +   const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
> > +{
> > +   const uint64_t m = opaque[CLB_MSK_IDX];
> > +   const uint64_t v = opaque[CLB_VAL_IDX];
> > +   const uint64_t c = opaque[CLB_MATCH_IDX];
> > +
> > +   if (c)
> > +   return (value & m) == v ? -1 : 0;
> > +   else
> > +   return (value & m) == v ? 0 : -1;
> > +}
> > +
> > +static int
> > +vhost_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
> > +{
> > +   struct vhost_queue *vq = rx_queue;
> > +   struct rte_vhost_power_monitor_cond vhost_pmc;
> > +   int ret;
> > +   if (vq == NULL)
> > +   return -EINVAL;
> > +   ret = rte_vhost_get_monitor_addr(vq->vid, vq->virtqueue_id,
> > +   &vhost_pmc);
> > +   if (ret < 0)
> > +   return -EINVAL;
> > +   pmc->addr = vhost_pmc.addr;
> > +   pmc->opaque[CLB_VAL_IDX] = vhost_pmc.val;
> > +   pmc->opaque[CLB_MSK_IDX] = vhost_pmc.mask;
> > +   pmc->opaque[CLB_MATCH_IDX] = vhost_pmc.match;
> > +   pmc->size = vhost_pmc.size;
> > +   pmc->fn = vhost_monitor_callback;
> > +
> > +   return 0;
> > +}
> > +
> >  static const struct eth_dev_ops ops = {
> > .dev_start = eth_dev_start,
> > .dev_stop = eth_dev_stop,
> > @@ -1405,6 +1444,7 @@ static const struct eth_dev_ops ops = {
> > .xstats_get_names = vhost_dev_xstats_get_names,
> > .rx_queue_intr_enable = eth_rxq_intr_enable,
> > .rx_queue_intr_disable = eth_rxq_intr_disable,
> > +   .get_monitor_addr= vhost_get_monitor_addr,
> 
> Please align the format with above callbacks: one space is enough after
> 'get_monitor_addr'

I will remove extra space in the next version.

Thanks,
Miao

> 
> Thanks,
> Chenbo
> 
> >  };
> >
> >  static int
> > --
> > 2.25.1



Re: [dpdk-dev] [PATCH v5 4/5] power: modify return of queue_stopped

2021-10-15 Thread Li, Miao
Hi Chenbo,

> -Original Message-
> From: Xia, Chenbo 
> Sent: Friday, October 15, 2021 3:47 PM
> To: Li, Miao ; dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Burakov, Anatoly
> 
> Subject: RE: [PATCH v5 4/5] power: modify return of queue_stopped
> 
> > -Original Message-
> > From: Li, Miao 
> > Sent: Friday, October 15, 2021 11:12 PM
> > To: dev@dpdk.org
> > Cc: Xia, Chenbo ; maxime.coque...@redhat.com; Li,
> Miao
> > ; Burakov, Anatoly 
> > Subject: [PATCH v5 4/5] power: modify return of queue_stopped
> >
> > Since some vdevs like virtio and vhost do not support rxq_info_get and
> > queue state inquiry, the error return value -ENOTSUP needs to be ignored
> > when queue_stopped cannot get rx queue information and rx queue state.
> > This patch changes the return value of queue_stopped when
> > rte_eth_rx_queue_info_get returns ENOTSUP to support vdevs which cannot
> 
> ENOTSUP -> -ENOTSUP
> 
> With this fixed:
> 
> Reviewed-by: Chenbo Xia 

I will fix it in the next version.

Thanks,
Miao

> 
> > provide rx queue information and rx queue state to enable power management.
> >
> > Signed-off-by: Miao Li 
> > Acked-by: Anatoly Burakov 
> > ---
> >  lib/power/rte_power_pmd_mgmt.c | 9 +++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> > index 0ce40f0875..39a2b4cd23 100644
> > --- a/lib/power/rte_power_pmd_mgmt.c
> > +++ b/lib/power/rte_power_pmd_mgmt.c
> > @@ -382,8 +382,13 @@ queue_stopped(const uint16_t port_id, const uint16_t queue_id)
> >  {
> > struct rte_eth_rxq_info qinfo;
> >
> > -   if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
> > -   return -1;
> > +   int ret = rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo);
> > +   if (ret < 0) {
> > +   if (ret == -ENOTSUP)
> > +   return 1;
> > +   else
> > +   return -1;
> > +   }
> >
> > return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
> >  }
> > --
> > 2.25.1



Re: [dpdk-dev] [PATCH v5 5/5] examples/l3fwd-power: support virtio/vhost

2021-10-15 Thread Li, Miao
Hi Chenbo,

> -Original Message-
> From: Xia, Chenbo 
> Sent: Friday, October 15, 2021 4:14 PM
> To: Li, Miao ; dev@dpdk.org
> Cc: maxime.coque...@redhat.com
> Subject: RE: [PATCH v5 5/5] examples/l3fwd-power: support virtio/vhost
> 
> > -Original Message-
> > From: Li, Miao 
> > Sent: Friday, October 15, 2021 11:12 PM
> > To: dev@dpdk.org
> > Cc: Xia, Chenbo ; maxime.coque...@redhat.com; Li,
> Miao
> > 
> > Subject: [PATCH v5 5/5] examples/l3fwd-power: support virtio/vhost
> >
> > In l3fwd-power, the default port configuration requires RSS and
> > IPv4/UDP/TCP checksum offloads. If the device does not support these,
> > l3fwd-power will exit and report an error.
> > This patch updates the port configuration based on device capabilities
> > after getting the device information, to support devices like virtio
> > and vhost.
> >
> > Signed-off-by: Miao Li 
> > ---
> >  examples/l3fwd-power/main.c | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
> > index 73a3ab5bc0..61c15e01d2 100644
> > --- a/examples/l3fwd-power/main.c
> > +++ b/examples/l3fwd-power/main.c
> > @@ -505,7 +505,9 @@ is_valid_ipv4_pkt(struct rte_ipv4_hdr *pkt, uint32_t link_len)
> > return -1;
> >
> > /* 2. The IP checksum must be correct. */
> > -   /* this is checked in H/W */
> > +   /* if this is not checked in H/W, check it. */
> > +   if ((port_conf.rxmode.offloads & DEV_RX_OFFLOAD_IPV4_CKSUM) == 0)
> > +   rte_ipv4_cksum(pkt);
> 
> This is not correct. The correct handling should be:
> 
> 1. get actual cksum from pkt and save it
> 2. set pkt cksum to zero
> 3. compute correct cksum using rte_ipv4_cksum
> 4. compare to know if actual cksum == correct cksum
> 
> You can refer to test_ipsec_l3_csum_verify in test_cryptodev_security_ipsec.c
> 
> Thanks,
> Chenbo

I will fix it in the next version.

Thanks,
Miao

> 
> >
> > /*
> >  * 3. The IP version number must be 4. If the version number is not 4
> > @@ -2637,6 +2639,11 @@ main(int argc, char **argv)
> > local_port_conf.rx_adv_conf.rss_conf.rss_hf);
> > }
> >
> > +   if (local_port_conf.rx_adv_conf.rss_conf.rss_hf == 0)
> > +   local_port_conf.rxmode.mq_mode = ETH_MQ_RX_NONE;
> > +   local_port_conf.rxmode.offloads &= dev_info.rx_offload_capa;
> > +   port_conf.rxmode.offloads = local_port_conf.rxmode.offloads;
> > +
> > ret = rte_eth_dev_configure(portid, nb_rx_queue,
> > (uint16_t)n_tx_queue, &local_port_conf);
> > if (ret < 0)
> > --
> > 2.25.1



Re: [dpdk-dev] [PATCH v2] vhost: add sanity check when operating the split ring

2021-10-15 Thread Maxime Coquelin

The title is too vague, I would put something like:

vhost: add sanity check on inflight last index

On 10/14/21 14:40, Li Feng wrote:
> The idx in rte_vhost_set_last_inflight_io_split is from the frontend

s/idx/index/

> driver, check if it's in the virtqueue range.
> 
> Fixes: bb0c2de9602b ("vhost: add APIs to operate inflight ring")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Li Feng 
> ---
>  lib/vhost/vhost.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> index 9540522dac..3b674ac320 100644
> --- a/lib/vhost/vhost.c
> +++ b/lib/vhost/vhost.c
> @@ -1226,6 +1226,9 @@ rte_vhost_set_last_inflight_io_split(int vid, uint16_t vring_idx,
> 	if (unlikely(!vq->inflight_split))
> 		return -1;
> 
> +	if (unlikely(idx >= vq->size))
> +		return -1;
> +
> 	vq->inflight_split->last_inflight_io = idx;
> 	return 0;
>  }


Other than that, this is fine to me:

Reviewed-by: Maxime Coquelin 

If my suggestions are fine for you, I can fix while applying.

Thanks,
Maxime



Re: [dpdk-dev] [PATCH] examples/vhost: change the default value of NIC's max queues

2021-10-15 Thread David Marchand
On Fri, Sep 10, 2021 at 5:17 AM Xia, Chenbo  wrote:
> > if NIC’s max queues > the default number of 128,
> > so, we changed the default value from 128 to 512.
> >
>
> I'd say it's not cool to still hard-code the MAX_QUEUES so that only 'some' 
> NICs
> can work with the example. The app should have a way to check this kind of 
> info

+1...

> before init/start. But as I would like to see at some point, this example will
> be removed and all our tests go to testpmd. Let's not waste too much effort on
> this example.

And +1, this example is a mess.


-- 
David Marchand



Re: [dpdk-dev] [PATCH v4 1/4] mempool: add event callbacks

2021-10-15 Thread Andrew Rybchenko
On 10/13/21 2:01 PM, Dmitry Kozlyuk wrote:
> Data path performance can benefit if the PMD knows which memory it will
> need to handle in advance, before the first mbuf is sent to the PMD.
> It is impractical, however, to consider all allocated memory for this
> purpose. Most often mbuf memory comes from mempools that can come and
> go. PMD can enumerate existing mempools on device start, but it also
> needs to track creation and destruction of mempools after the forwarding
> starts but before an mbuf from the new mempool is sent to the device.
> 
> Add an API to register callback for mempool life cycle events:
> * rte_mempool_event_callback_register()
> * rte_mempool_event_callback_unregister()
> Currently tracked events are:
> * RTE_MEMPOOL_EVENT_READY (after populating a mempool)
> * RTE_MEMPOOL_EVENT_DESTROY (before freeing a mempool)
> Provide a unit test for the new API.
> The new API is internal, because it is primarily demanded by PMDs that
> may need to deal with any mempools and do not control their creation,
> while an application, on the other hand, knows which mempools it creates
> and doesn't care about internal mempools PMDs might create.
> 
> Signed-off-by: Dmitry Kozlyuk 
> Acked-by: Matan Azrad 

With below review notes processed

Reviewed-by: Andrew Rybchenko 

[snip]

> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> index c5f859ae71..51c0ba2931 100644
> --- a/lib/mempool/rte_mempool.c
> +++ b/lib/mempool/rte_mempool.c

[snip]

> @@ -360,6 +372,10 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
>   STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next);
>   mp->nb_mem_chunks++;
>  
> + /* Report the mempool as ready only when fully populated. */
> + if (mp->populated_size >= mp->size)
> + mempool_event_callback_invoke(RTE_MEMPOOL_EVENT_READY, mp);
> +
>   rte_mempool_trace_populate_iova(mp, vaddr, iova, len, free_cb, opaque);
>   return i;
>  
> @@ -722,6 +738,7 @@ rte_mempool_free(struct rte_mempool *mp)
>   }
>   rte_mcfg_tailq_write_unlock();
>  
> + mempool_event_callback_invoke(RTE_MEMPOOL_EVENT_DESTROY, mp);
>   rte_mempool_trace_free(mp);
>   rte_mempool_free_memchunks(mp);
>   rte_mempool_ops_free(mp);
> @@ -1343,3 +1360,123 @@ void rte_mempool_walk(void (*func)(struct rte_mempool *, void *),
>  
>   rte_mcfg_mempool_read_unlock();
>  }
> +
> +struct mempool_callback {

It sounds like it is a mempool callback itself.
Consider: mempool_event_callback_data.
I think this way it will be consistent.

> + rte_mempool_event_callback *func;
> + void *user_data;
> +};
> +
> +static void
> +mempool_event_callback_invoke(enum rte_mempool_event event,
> +   struct rte_mempool *mp)
> +{
> + struct mempool_callback_list *list;
> + struct rte_tailq_entry *te;
> + void *tmp_te;
> +
> + rte_mcfg_tailq_read_lock();
> + list = RTE_TAILQ_CAST(callback_tailq.head, mempool_callback_list);
> + RTE_TAILQ_FOREACH_SAFE(te, list, next, tmp_te) {
> + struct mempool_callback *cb = te->data;
> + rte_mcfg_tailq_read_unlock();
> + cb->func(event, mp, cb->user_data);
> + rte_mcfg_tailq_read_lock();
> + }
> + rte_mcfg_tailq_read_unlock();
> +}
> +
> +int
> +rte_mempool_event_callback_register(rte_mempool_event_callback *func,
> + void *user_data)
> +{
> + struct mempool_callback_list *list;
> + struct rte_tailq_entry *te = NULL;
> + struct mempool_callback *cb;
> + void *tmp_te;
> + int ret;
> +
> + if (func == NULL) {
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
> +
> + rte_mcfg_mempool_read_lock();
> + rte_mcfg_tailq_write_lock();
> +
> + list = RTE_TAILQ_CAST(callback_tailq.head, mempool_callback_list);
> + RTE_TAILQ_FOREACH_SAFE(te, list, next, tmp_te) {
> + struct mempool_callback *cb =
> + (struct mempool_callback *)te->data;
> + if (cb->func == func && cb->user_data == user_data) {
> + ret = -EEXIST;
> + goto exit;
> + }
> + }
> +
> + te = rte_zmalloc("MEMPOOL_TAILQ_ENTRY", sizeof(*te), 0);
> + if (te == NULL) {
> + RTE_LOG(ERR, MEMPOOL,
> + "Cannot allocate event callback tailq entry!\n");
> + ret = -ENOMEM;
> + goto exit;
> + }
> +
> + cb = rte_malloc("MEMPOOL_EVENT_CALLBACK", sizeof(*cb), 0);
> + if (cb == NULL) {
> + RTE_LOG(ERR, MEMPOOL,
> + "Cannot allocate event callback!\n");
> + rte_free(te);
> + ret = -ENOMEM;
> + goto exit;
> + }
> +
> + cb->func = func;
> + cb->user_data = user_data;
> + te->data = cb;
> + TAILQ_INSERT_TAIL(list, te, next);
> + ret = 0;
> +
> +exit:
> + rte_mcfg_tailq_write_unlock();

Re: [dpdk-dev] [PATCH v3 4/5] lib/kvargs: remove unneeded header includes

2021-10-15 Thread Olivier Matz
Hi Sean,

On Thu, Oct 07, 2021 at 10:25:56AM +, Sean Morrissey wrote:
> These header includes have been flagged by the iwyu_tool
> and removed.
> 
> Signed-off-by: Sean Morrissey 
> ---
>  lib/kvargs/rte_kvargs.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/lib/kvargs/rte_kvargs.c b/lib/kvargs/rte_kvargs.c
> index 38e9d5c1ca..4cce8e953b 100644
> --- a/lib/kvargs/rte_kvargs.c
> +++ b/lib/kvargs/rte_kvargs.c
> @@ -7,7 +7,6 @@
>  #include 
>  #include 
>  
> -#include 
>  #include 
>  
>  #include "rte_kvargs.h"
> -- 
> 2.25.1
> 

Did you check that it still compiles for the Windows platform
after this change?

+CC Dmitry


Re: [dpdk-dev] [PATCH v4 2/4] mempool: add non-IO flag

2021-10-15 Thread Andrew Rybchenko
On 10/13/21 2:01 PM, Dmitry Kozlyuk wrote:
> Mempool is a generic allocator that is not necessarily used for device
> IO operations, in which case its memory is not used for DMA either.
> Add MEMPOOL_F_NON_IO flag to mark such mempools automatically if their
> objects are not contiguous or IOVA are not available. Components can
> inspect this flag in order to optimize their memory management.
> Discussion: https://mails.dpdk.org/archives/dev/2021-August/216654.html
> 
> Signed-off-by: Dmitry Kozlyuk 
> Acked-by: Matan Azrad 

See review notes below. With review notes processed:

Reviewed-by: Andrew Rybchenko 

[snip]

> diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
> index f643a61f44..74e0e6f495 100644
> --- a/doc/guides/rel_notes/release_21_11.rst
> +++ b/doc/guides/rel_notes/release_21_11.rst
> @@ -226,6 +226,9 @@ API Changes
>the crypto/security operation. This field will be used to communicate
>events such as soft expiry with IPsec in lookaside mode.
>  
> +* mempool: Added ``MEMPOOL_F_NON_IO`` flag to give a hint to DPDK components
> +  that objects from this pool will not be used for device IO (e.g. DMA).
> +
>  
>  ABI Changes
>  ---
> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> index 51c0ba2931..2204f140b3 100644
> --- a/lib/mempool/rte_mempool.c
> +++ b/lib/mempool/rte_mempool.c
> @@ -371,6 +371,8 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
>  
>   STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next);
>   mp->nb_mem_chunks++;
> + if (iova == RTE_BAD_IOVA)
> + mp->flags |= MEMPOOL_F_NON_IO;

As I understand, rte_mempool_populate_iova() may be called
several times for one mempool. The flag must be set only if all
invocations are done with RTE_BAD_IOVA. So, it should be
set by default and just cleared when iova != RTE_BAD_IOVA
happens.

Yes, it is a corner case. Maybe it makes sense to
cover it with a unit test as well.

>  
>   /* Report the mempool as ready only when fully populated. */
>   if (mp->populated_size >= mp->size)
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 663123042f..029b62a650 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -262,6 +262,8 @@ struct rte_mempool {
> >  #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is "single-consumer".*/
> >  #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
> > +#define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
> +#define MEMPOOL_F_NON_IO 0x0040
> + /**< Internal: pool is not usable for device IO (DMA). */

Please, put the documentation before the define.
/** Internal: pool is not usable for device IO (DMA). */
#define MEMPOOL_F_NON_IO 0x0040

>  
>  /**
>   * @internal When debug is enabled, store some statistics.
> @@ -991,6 +993,9 @@ typedef void (rte_mempool_ctor_t)(struct rte_mempool *, void *);
>   * "single-consumer". Otherwise, it is "multi-consumers".
>   *   - MEMPOOL_F_NO_IOVA_CONTIG: If set, allocated objects won't
>   * necessarily be contiguous in IO memory.
> + *   - MEMPOOL_F_NON_IO: If set, the mempool is considered to be
> + * never used for device IO, i.e. for DMA operations.
> + * It's a hint to other components and does not affect the mempool behavior.

I tend to say that it should not be here if the flag is
internal.

>   * @return
>   *   The pointer to the new allocated mempool, on success. NULL on error
>   *   with rte_errno set appropriately. Possible rte_errno values include:
> 



Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart

2021-10-15 Thread Dmitry Kozlyuk
> [...]
> >>> From: Dmitry Kozlyuk 
> >>>
> >>> rte_flow_action_handle_create() did not mention what happens with an
> >>> indirect action when a device is stopped, possibly reconfigured, and
> >>> started again. It is natural for some indirect actions to be
> >>> persistent, like counters and meters; keeping others just saves
> >>> application time and complexity. However, not all PMDs can support it.
> >>> It is proposed to add a device capability to indicate if indirect
> >>> actions are kept across the above sequence or implicitly destroyed.
> >>>
> >>> In the future, indirect actions may not be the only type of objects
> >>> shared between flow rules. The capability bit intends to cover all
> >>> possible types of such objects, hence its name.
> >>>
> >>> It may happen that in the future a PMD acquires support for a type
> >>> of shared objects that it cannot keep across a restart. It is
> >>> undesirable to stop advertising the capability so that applications
> >>> that don't use objects of the problematic type can still take
> advantage of it.
> >>> This is why PMDs are allowed to keep only a subset of shared objects
> >>> provided that the vendor mandatorily documents it.
> >>>
> >>> If the device is being reconfigured in a way that is incompatible
> >>> with an existing shared objects, PMD is required to report an error.
> >>> This is mandatory, because flow API does not supply users with
> >>> capabilities, so this is the only way for a user to learn that
> >>> configuration is invalid. For example, if queue count changes and
> >>> RSS indirect action specifies queues that are going away, the user
> >>> must update the action before removing the queues or remove the
> >>> action and all flow rules that were using it.
> >>>
> >>> Signed-off-by: Dmitry Kozlyuk 
> >>> ---
> >>> [...]
> >>
> >> Current pain point is that capability bits may be insufficient and a
> >> programmatic way is desired to check which types of objects can be
> >> kept across restart, instead of documenting the limitations.
> >>
> >> I support one of previous Ori's suggestions and want to clarify it [1]:
> >>
> >> Ori: "Another way is to assume that if the action was created before
> >> port start it will be kept after port stop."
> >> Andrew: "It does not sound like a solution. May be I simply don't
> >> know target usecase."
> >>
> >> What Ori suggests (offline discussion summary): Suppose an
> >> application wants to check whether a shared object (indirect action)
> >> or a flow rule of a particular kind can be kept across restart. It calls
> >> rte_flow_action_handle_create() or
> >> rte_flow_create() before rte_eth_dev_start(). If it succeeds, 1) it
> >> means objects of this type can be kept across restart, 2) it's a
> >> normal object created that will work after the port is started. This
> >> is logical, because if the PMD can keep some kind of objects when the
> >> port is stopped, it is likely to be able to create them when the port
> >> is not started. It is subject to discussion if "object kind" means
> >> only "type" or "type + transfer bit" combination; for mlx5 PMD it
> >> doesn't matter. One minor drawback is that applications can only do
> >> the test when the port is stopped, but it seems likely that the test
> >> really needs to be done at startup anyway.
> >>
> >> If this is acceptable:
> >> 1. Capability bits are not needed anymore.
> >> 2. ethdev patches can be accepted in RC1, present behavior is
> >> undefined anyway.
> >> 3. PMD patches will need update that can be done by RC2.
> >
> > Andrew, what do you think?
> > If you agree, do we need to include transfer bit into "kind"?
> > I'd like to conclude before RC1 and can update the docs quickly.
> >
> > I've seen the proposition to advertise capability to create flow rules
> > before device start as a flag.
> > I don't think it conflicts with Ori's suggestion because the flag
> > doesn't imply that _any_ rule can be created, neither does it say
> > about indirect actions.
> > On the other hand, if PMD can create a flow object (rule, etc.) when
> > the device is not started, it is logical to assume that after the
> > device is stopped it can move existing flow objects to the same state
> > as when the device was not started, then restore when it is started
> > again.
>
> Dmitry, thanks for the explanations. Ori's idea makes sense to me now. The
> problem is to document it properly. We must define rules to check it.
> Which bits in the check request matter and how application should make a
> choice of rule to try.

This is a generalization of the last question about the transfer bit.
I call the bits that matter a "kind". As I see it:

rule kind = seq. of item types + seq. of action types
indirect action kind = action type

As Ori mentioned, for mlx5 PMD transfer bit doesn't affect object persistence.
If you or other PMD maintainers think it may be relevant, no problem,
because PMDs like mlx5 will just ignore it when checking. Then it will be:

rule kind = seq. of item types + seq. 

[dpdk-dev] [PATCH v6 0/5] Implement rte_power_monitor API in virtio/vhost PMD

2021-10-15 Thread Miao Li
This patchset implements the rte_power_monitor API in the virtio and
vhost PMDs to reduce power consumption when no packets come in. This API
can be called and tested in l3fwd-power after adding vhost and virtio
support in l3fwd-power and ignoring the rx queue information check in
queue_stopped().

v6:
-modify comment
-remove extra space
-fix IPv4 CKSUM check

v5:
-Rebase on latest repo

v4:
-modify comment
-update the release note
-add IPv4 CKSUM check

v3:
-fix some code format issues
-fix spelling mistake

v2:
-remove flag and add match and size in rte_vhost_power_monitor_cond
-modify power callback function
-add dev and queue id check and remove unnecessary check
-fix the assignment of pmc->size
-update port configuration according to the device information and
remove adding command line arguments
-modify some titles

Miao Li (5):
  net/virtio: implement rte_power_monitor API
  vhost: implement rte_power_monitor API
  net/vhost: implement rte_power_monitor API
  power: modify return of queue_stopped
  examples/l3fwd-power: support virtio/vhost

 doc/guides/rel_notes/release_21_11.rst | 12 ++
 drivers/net/vhost/rte_eth_vhost.c  | 40 ++
 drivers/net/virtio/virtio_ethdev.c | 56 ++
 examples/l3fwd-power/main.c| 15 ++-
 lib/power/rte_power_pmd_mgmt.c |  9 -
 lib/vhost/rte_vhost.h  | 42 +++
 lib/vhost/version.map  |  3 ++
 lib/vhost/vhost.c  | 38 +
 8 files changed, 212 insertions(+), 3 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH v6 1/5] net/virtio: implement rte_power_monitor API

2021-10-15 Thread Miao Li
This patch implements the rte_power_monitor API in the virtio PMD to
reduce power consumption when no packets come in. Following the current
semantics of power monitor, this commit adds a callback function that
decides whether to abort the sleep by checking the current value against
the expected one, and virtio_get_monitor_addr to provide the address to
monitor. When no packets come in, the value at the address will not
change and the running core will sleep. Once packets arrive, the value
will change and the running core will wake up.

Signed-off-by: Miao Li 
Reviewed-by: Chenbo Xia 
---
 doc/guides/rel_notes/release_21_11.rst |  4 ++
 drivers/net/virtio/virtio_ethdev.c | 56 ++
 2 files changed, 60 insertions(+)

diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 4c56cdfeaa..27dc896703 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -72,6 +72,10 @@ New Features
   Added macros ETH_RSS_IPV4_CHKSUM and ETH_RSS_L4_CHKSUM, now IPv4 and
   TCP/UDP/SCTP header checksum field can be used as input set for RSS.
 
+* **Updated virtio PMD.**
+
+  Implement rte_power_monitor API in virtio PMD.
+
 * **Updated af_packet ethdev driver.**
 
   * Default VLAN strip behavior was changed. VLAN tag won't be stripped
diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 6aa36b3f39..1227f3f1f4 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -74,6 +74,8 @@ static int virtio_mac_addr_set(struct rte_eth_dev *dev,
struct rte_ether_addr *mac_addr);
 
 static int virtio_intr_disable(struct rte_eth_dev *dev);
+static int virtio_get_monitor_addr(void *rx_queue,
+   struct rte_power_monitor_cond *pmc);
 
 static int virtio_dev_queue_stats_mapping_set(
struct rte_eth_dev *eth_dev,
@@ -982,6 +984,7 @@ static const struct eth_dev_ops virtio_eth_dev_ops = {
.mac_addr_add= virtio_mac_addr_add,
.mac_addr_remove = virtio_mac_addr_remove,
.mac_addr_set= virtio_mac_addr_set,
+   .get_monitor_addr= virtio_get_monitor_addr,
 };
 
 /*
@@ -1313,6 +1316,59 @@ virtio_mac_addr_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr)
return 0;
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+#define CLB_MATCH_IDX 2
+static int
+virtio_monitor_callback(const uint64_t value,
+   const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+   const uint64_t m = opaque[CLB_MSK_IDX];
+   const uint64_t v = opaque[CLB_VAL_IDX];
+   const uint64_t c = opaque[CLB_MATCH_IDX];
+
+   if (c)
+   return (value & m) == v ? -1 : 0;
+   else
+   return (value & m) == v ? 0 : -1;
+}
+
+static int
+virtio_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
+{
+   struct virtnet_rx *rxvq = rx_queue;
+   struct virtqueue *vq = virtnet_rxq_to_vq(rxvq);
+   struct virtio_hw *hw;
+
+   if (vq == NULL)
+   return -EINVAL;
+
+   hw = vq->hw;
+   if (virtio_with_packed_queue(hw)) {
+   struct vring_packed_desc *desc;
+   desc = vq->vq_packed.ring.desc;
+   pmc->addr = &desc[vq->vq_used_cons_idx].flags;
+   if (vq->vq_packed.used_wrap_counter)
+   pmc->opaque[CLB_VAL_IDX] =
+   VRING_PACKED_DESC_F_AVAIL_USED;
+   else
+   pmc->opaque[CLB_VAL_IDX] = 0;
+   pmc->opaque[CLB_MSK_IDX] = VRING_PACKED_DESC_F_AVAIL_USED;
+   pmc->opaque[CLB_MATCH_IDX] = 1;
+   pmc->size = sizeof(desc[vq->vq_used_cons_idx].flags);
+   } else {
+   pmc->addr = &vq->vq_split.ring.used->idx;
+   pmc->opaque[CLB_VAL_IDX] = vq->vq_used_cons_idx
+   & (vq->vq_nentries - 1);
+   pmc->opaque[CLB_MSK_IDX] = vq->vq_nentries - 1;
+   pmc->opaque[CLB_MATCH_IDX] = 0;
+   pmc->size = sizeof(vq->vq_split.ring.used->idx);
+   }
+   pmc->fn = virtio_monitor_callback;
+
+   return 0;
+}
+
 static int
 virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 {
-- 
2.25.1



[dpdk-dev] [PATCH v6 2/5] vhost: implement rte_power_monitor API

2021-10-15 Thread Miao Li
This patch defines rte_vhost_power_monitor_cond, which is used to pass
information to the vhost driver: the address to monitor, the expected
value, the mask used to extract the value read from 'addr', the size of
the monitored value, and a match flag telling whether the masked value
should equal the expected value or differ from it. The vhost driver can
use this information to fill rte_power_monitor_cond.

Signed-off-by: Miao Li 
---
 doc/guides/rel_notes/release_21_11.rst |  4 +++
 lib/vhost/rte_vhost.h  | 42 ++
 lib/vhost/version.map  |  3 ++
 lib/vhost/vhost.c  | 38 +++
 4 files changed, 87 insertions(+)

diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 27dc896703..ad6d256a55 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -72,6 +72,10 @@ New Features
   Added macros ETH_RSS_IPV4_CHKSUM and ETH_RSS_L4_CHKSUM, now IPv4 and
   TCP/UDP/SCTP header checksum field can be used as input set for RSS.
 
+* **Added power monitor API in vhost library.**
+
+  Added an API to support power monitor in vhost library.
+
 * **Updated virtio PMD.**
 
   Implement rte_power_monitor API in virtio PMD.
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index fd372d5259..6f0915b98f 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -292,6 +292,31 @@ struct vhost_device_ops {
void *reserved[1]; /**< Reserved for future extension */
 };
 
+/**
+ * Power monitor condition.
+ */
+struct rte_vhost_power_monitor_cond {
+	/** Address to monitor for changes */
+	volatile void *addr;
+	/** If the `mask` is non-zero, the location pointed
+	 *  to by `addr` will be read and masked, then
+	 *  compared with this value.
+	 */
+	uint64_t val;
+	/** 64-bit mask to extract the value read from `addr` */
+	uint64_t mask;
+	/** Data size (in bytes) that will be read from the
+	 *  monitored memory location (`addr`).
+	 */
+	uint8_t size;
+	/** If 1, and the masked value read from 'addr' equals
+	 *  'val', the driver should skip core sleep. If 0, and
+	 *  the masked value read from 'addr' does not equal 'val',
+	 *  the driver should skip core sleep.
+	 */
+	uint8_t match;
+};
+
 /**
  * Convert guest physical address to host virtual address
  *
@@ -903,6 +928,23 @@ int rte_vhost_vring_call(int vid, uint16_t vring_idx);
  */
 uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);
 
+/**
+ * Get power monitor address of the vhost device
+ *
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  vhost queue ID
+ * @param pmc
+ *  power monitor condition
+ * @return
+ *  0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_get_monitor_addr(int vid, uint16_t queue_id,
+   struct rte_vhost_power_monitor_cond *pmc);
+
 /**
  * Get log base and log size of the vhost device
  *
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 8ebde3f694..c8599ddb97 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -85,4 +85,7 @@ EXPERIMENTAL {
rte_vhost_async_channel_register_thread_unsafe;
rte_vhost_async_channel_unregister_thread_unsafe;
rte_vhost_clear_queue_thread_unsafe;
+
+   # added in 21.11
+   rte_vhost_get_monitor_addr;
 };
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 9540522dac..36c896c9e2 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -1889,5 +1889,43 @@ rte_vhost_async_get_inflight(int vid, uint16_t queue_id)
return ret;
 }
 
+int
+rte_vhost_get_monitor_addr(int vid, uint16_t queue_id,
+   struct rte_vhost_power_monitor_cond *pmc)
+{
+   struct virtio_net *dev = get_device(vid);
+   struct vhost_virtqueue *vq;
+
+   if (dev == NULL)
+   return -1;
+   if (queue_id >= VHOST_MAX_VRING)
+   return -1;
+
+   vq = dev->virtqueue[queue_id];
+   if (vq == NULL)
+   return -1;
+
+   if (vq_is_packed(dev)) {
+   struct vring_packed_desc *desc;
+   desc = vq->desc_packed;
+   pmc->addr = &desc[vq->last_avail_idx].flags;
+   if (vq->avail_wrap_counter)
+   pmc->val = VRING_DESC_F_AVAIL;
+   else
+   pmc->val = VRING_DESC_F_USED;
+   pmc->mask = VRING_DESC_F_AVAIL | VRING_DESC_F_USED;
+   pmc->size = sizeof(desc[vq->last_avail_idx].flags);
+   pmc->match = 1;
+   } else {
+   pmc->addr = &vq->avail->idx;
+   pmc->val = vq->last_avail_idx & (vq->size - 1);
+   pmc->mask = vq->size - 1;
+   pmc->size = sizeof(vq->avail->idx);
+   pmc->match = 0;
+   }
+
+   return 0;
+}
+
 RTE_LOG_REGISTER_SUFFIX(vhost_config

[dpdk-dev] [PATCH v6 3/5] net/vhost: implement rte_power_monitor API

2021-10-15 Thread Miao Li
This patch implements the rte_power_monitor API in the vhost PMD to reduce
power consumption when no packets come in. Following the current semantics
of power monitor, this commit adds a callback function that decides whether
to abort the sleep by checking the current value against the expected value,
and vhost_get_monitor_addr to provide the address to monitor. When no
packets come in, the value at the address does not change and the running
core sleeps. Once packets arrive, the value at the address changes and the
running core wakes up.

Signed-off-by: Miao Li 
---
 doc/guides/rel_notes/release_21_11.rst |  4 +++
 drivers/net/vhost/rte_eth_vhost.c  | 40 ++
 2 files changed, 44 insertions(+)

diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index ad6d256a55..e6f9c284ae 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -76,6 +76,10 @@ New Features
 
   Added an API to support power monitor in vhost library.
 
+* **Updated vhost PMD.**
+
+  Implement rte_power_monitor API in vhost PMD.
+
 * **Updated virtio PMD.**
 
   Implement rte_power_monitor API in virtio PMD.
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 2e24e5f7ff..c9947e4db7 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -1386,6 +1386,45 @@ eth_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
return rte_vhost_rx_queue_count(vq->vid, vq->virtqueue_id);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+#define CLB_MATCH_IDX 2
+static int
+vhost_monitor_callback(const uint64_t value,
+   const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+   const uint64_t m = opaque[CLB_MSK_IDX];
+   const uint64_t v = opaque[CLB_VAL_IDX];
+   const uint64_t c = opaque[CLB_MATCH_IDX];
+
+   if (c)
+   return (value & m) == v ? -1 : 0;
+   else
+   return (value & m) == v ? 0 : -1;
+}
+
+static int
+vhost_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
+{
+   struct vhost_queue *vq = rx_queue;
+   struct rte_vhost_power_monitor_cond vhost_pmc;
+   int ret;
+   if (vq == NULL)
+   return -EINVAL;
+   ret = rte_vhost_get_monitor_addr(vq->vid, vq->virtqueue_id,
+   &vhost_pmc);
+   if (ret < 0)
+   return -EINVAL;
+   pmc->addr = vhost_pmc.addr;
+   pmc->opaque[CLB_VAL_IDX] = vhost_pmc.val;
+   pmc->opaque[CLB_MSK_IDX] = vhost_pmc.mask;
+   pmc->opaque[CLB_MATCH_IDX] = vhost_pmc.match;
+   pmc->size = vhost_pmc.size;
+   pmc->fn = vhost_monitor_callback;
+
+   return 0;
+}
+
 static const struct eth_dev_ops ops = {
.dev_start = eth_dev_start,
.dev_stop = eth_dev_stop,
@@ -1405,6 +1444,7 @@ static const struct eth_dev_ops ops = {
.xstats_get_names = vhost_dev_xstats_get_names,
.rx_queue_intr_enable = eth_rxq_intr_enable,
.rx_queue_intr_disable = eth_rxq_intr_disable,
+   .get_monitor_addr = vhost_get_monitor_addr,
 };
 
 static int
-- 
2.25.1



[dpdk-dev] [PATCH v6 4/5] power: modify return of queue_stopped

2021-10-15 Thread Miao Li
Since some vdevs like virtio and vhost do not support rxq_info_get and
queue state inquiry, the error return value -ENOTSUP needs to be ignored
when queue_stopped cannot get Rx queue information and Rx queue state.
This patch changes the return value of queue_stopped when
rte_eth_rx_queue_info_get returns -ENOTSUP, so that vdevs which cannot
provide Rx queue information and state can still enable power management.

Signed-off-by: Miao Li 
Acked-by: Anatoly Burakov 
Reviewed-by: Chenbo Xia 
---
 lib/power/rte_power_pmd_mgmt.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index 0ce40f0875..39a2b4cd23 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -382,8 +382,13 @@ queue_stopped(const uint16_t port_id, const uint16_t queue_id)
 {
struct rte_eth_rxq_info qinfo;
 
-   if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
-   return -1;
+   int ret = rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo);
+   if (ret < 0) {
+   if (ret == -ENOTSUP)
+   return 1;
+   else
+   return -1;
+   }
 
return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
 }
-- 
2.25.1



[dpdk-dev] [PATCH v6 5/5] examples/l3fwd-power: support virtio/vhost

2021-10-15 Thread Miao Li
In l3fwd-power, the default port configuration requires RSS and
IPv4/UDP/TCP checksum offloads. If a device does not support these,
l3fwd-power exits and reports an error.
This patch updates the port configuration based on device capabilities
after getting the device information, to support devices like virtio
and vhost.

Signed-off-by: Miao Li 
---
 examples/l3fwd-power/main.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 73a3ab5bc0..dac946c18f 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -505,7 +505,15 @@ is_valid_ipv4_pkt(struct rte_ipv4_hdr *pkt, uint32_t link_len)
return -1;
 
/* 2. The IP checksum must be correct. */
-   /* this is checked in H/W */
+   /* if this is not checked in H/W, check it. */
+   if ((port_conf.rxmode.offloads & DEV_RX_OFFLOAD_IPV4_CKSUM) == 0) {
+   uint16_t actual_cksum, expected_cksum;
+   actual_cksum = pkt->hdr_checksum;
+   pkt->hdr_checksum = 0;
+   expected_cksum = rte_ipv4_cksum(pkt);
+   if (actual_cksum != expected_cksum)
+   return -2;
+   }
 
/*
 * 3. The IP version number must be 4. If the version number is not 4
@@ -2637,6 +2645,11 @@ main(int argc, char **argv)
local_port_conf.rx_adv_conf.rss_conf.rss_hf);
}
 
+   if (local_port_conf.rx_adv_conf.rss_conf.rss_hf == 0)
+   local_port_conf.rxmode.mq_mode = ETH_MQ_RX_NONE;
+   local_port_conf.rxmode.offloads &= dev_info.rx_offload_capa;
+   port_conf.rxmode.offloads = local_port_conf.rxmode.offloads;
+
ret = rte_eth_dev_configure(portid, nb_rx_queue,
(uint16_t)n_tx_queue, &local_port_conf);
if (ret < 0)
-- 
2.25.1



Re: [dpdk-dev] [PATCH 1/5] hash: add new toeplitz hash implementation

2021-10-15 Thread Medvedkin, Vladimir

Hi Konstantin,

Thanks for the review,

On 07/10/2021 20:23, Ananyev, Konstantin wrote:



This patch adds a new Toeplitz hash implementation using
Galois Field New Instructions (GFNI).

Signed-off-by: Vladimir Medvedkin 
---
  doc/api/doxy-api-index.md |   1 +
  lib/hash/meson.build  |   1 +
  lib/hash/rte_thash.c  |  26 ++
  lib/hash/rte_thash.h  |  22 +
  lib/hash/rte_thash_gfni.h | 229 ++
  lib/hash/version.map  |   2 +
  6 files changed, 281 insertions(+)
  create mode 100644 lib/hash/rte_thash_gfni.h

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107..7549477 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -139,6 +139,7 @@ The public API headers are grouped by topics:
[hash]   (@ref rte_hash.h),
[jhash]  (@ref rte_jhash.h),
[thash]  (@ref rte_thash.h),
+  [thash_gfni] (@ref rte_thash_gfni.h),
[FBK hash]   (@ref rte_fbk_hash.h),
[CRC hash]   (@ref rte_hash_crc.h)

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index 9bc5ef9..40444ac 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -7,6 +7,7 @@ headers = files(
  'rte_hash.h',
  'rte_jhash.h',
  'rte_thash.h',
+'rte_thash_gfni.h',
  )
  indirect_headers += files('rte_crc_arm64.h')

diff --git a/lib/hash/rte_thash.c b/lib/hash/rte_thash.c
index d5a95a6..07447f7 100644
--- a/lib/hash/rte_thash.c
+++ b/lib/hash/rte_thash.c
@@ -11,6 +11,7 @@
  #include 
  #include 
  #include 
+#include 

  #define THASH_NAME_LEN64
  #define TOEPLITZ_HASH_LEN 32
@@ -88,6 +89,23 @@ struct rte_thash_ctx {
uint8_t hash_key[0];
  };

+uint8_t rte_thash_gfni_supported;


.. = 0;
?



This goes against style:
ERROR:GLOBAL_INITIALISERS: do not initialise globals to 0
I'll init it inside the RTE_INIT section


+
+void
+rte_thash_complete_matrix(uint64_t *matrixes, uint8_t *rss_key, int size)
+{
+   int i, j;
+   uint8_t *m = (uint8_t *)matrixes;
+
+   for (i = 0; i < size; i++) {
+   for (j = 0; j < 8; j++) {
+   m[i * 8 + j] = (rss_key[i] << j)|
+   (uint8_t)((uint16_t)(rss_key[i + 1]) >>
+   (8 - j));
+   }
+   }
+}
+
  static inline uint32_t
  get_bit_lfsr(struct thash_lfsr *lfsr)
  {
@@ -759,3 +777,11 @@ rte_thash_adjust_tuple(struct rte_thash_ctx *ctx,

return ret;
  }
+
+RTE_INIT(rte_thash_gfni_init)
+{
+#ifdef __GFNI__
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_GFNI))
+   rte_thash_gfni_supported = 1;
+#endif
+}
diff --git a/lib/hash/rte_thash.h b/lib/hash/rte_thash.h
index 76109fc..e3f1fc6 100644
--- a/lib/hash/rte_thash.h
+++ b/lib/hash/rte_thash.h
@@ -28,6 +28,7 @@ extern "C" {
  #include 
  #include 
  #include 
+#include 

  #if defined(RTE_ARCH_X86) || defined(__ARM_NEON)
  #include 
@@ -113,6 +114,8 @@ union rte_thash_tuple {
  };
  #endif

+extern uint8_t rte_thash_gfni_supported;
+
  /**
   * Prepare special converted key to use with rte_softrss_be()
   * @param orig
@@ -223,6 +226,25 @@ rte_softrss_be(uint32_t *input_tuple, uint32_t input_len,
return ret;
  }

+/**
+ * Converts Toeplitz hash key (RSS key) into matrices required
+ * for GFNI implementation
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param matrixes
+ *  pointer to the memory where matrices will be written.
+ *  Note: the size of this memory must be equal to size * 8
+ * @param rss_key
+ *  pointer to the Toeplitz hash key
+ * @param size
+ *  Size of the rss_key in bytes.
+ */
+__rte_experimental
+void
+rte_thash_complete_matrix(uint64_t *matrixes, uint8_t *rss_key, int size);
+
  /** @internal Logarithm of minimum size of the RSS ReTa */
  #define   RTE_THASH_RETA_SZ_MIN   2U
  /** @internal Logarithm of maximum size of the RSS ReTa */
diff --git a/lib/hash/rte_thash_gfni.h b/lib/hash/rte_thash_gfni.h
new file mode 100644
index 000..8f89d7d
--- /dev/null
+++ b/lib/hash/rte_thash_gfni.h
@@ -0,0 +1,229 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Intel Corporation
+ */
+
+#ifndef _RTE_THASH_GFNI_H_
+#define _RTE_THASH_GFNI_H_
+
+/**
+ * @file
+ *
+ * Optimized Toeplitz hash functions implementation
+ * using Galois Fields New Instructions.
+ */
+
+#include 
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifdef __GFNI__
+
+#define RTE_THASH_FIRST_ITER_MSK   0x0f0f0f0f0f0e0c08
+#define RTE_THASH_PERM_MSK 0x0f0f0f0f0f0f0f0f
+#define RTE_THASH_FIRST_ITER_MSK_2 0xf0f0f0f0f0e0c080
+#define RTE_THASH_PERM_MSK_2   0xf0f0f0f0f0f0f0f0
+#define RTE_THASH_REWIND_MSK   0x00113377
+
+__rte_internal
+static inline void
+__rte_thash_xor_reduce(__m512i xor_acc, uint32_t *val_1, uint32_t *val_2)
+{
+   __m256i tmp_256_1, tmp_256_2;
+   __m128i tmp128_1, 

Re: [dpdk-dev] [PATCH v4 1/4] mempool: add event callbacks

2021-10-15 Thread Dmitry Kozlyuk
> -Original Message-
> From: Andrew Rybchenko 
> [...]
> With below review notes processed
> 
> Reviewed-by: Andrew Rybchenko 
> 

Thanks for the comments, I'll fix them all, just a small note below FYI.

> > + rte_mcfg_mempool_read_lock();
> > + rte_mcfg_tailq_write_lock();
> > + ret = -ENOENT;
> > + list = RTE_TAILQ_CAST(callback_tailq.head, mempool_callback_list);
> > + TAILQ_FOREACH(te, list, next) {
> > + cb = (struct mempool_callback *)te->data;
> > + if (cb->func == func && cb->user_data == user_data)
> > + break;
> > + }
> > + if (te != NULL) {
> 
> Here we rely on the fact that TAILQ_FOREACH() exists with te==NULL in the
> case of no such entry. I'd suggest to avoid the assumption.
> I.e. do below two lines above before break and have not the if condition
> her at all.

Since you asked the question, the code is non-obvious, so I'll change it.
FWIW, man 3 tailq:

TAILQ_FOREACH() traverses the queue referenced by head in the forward
direction, assigning each element in turn to var.  var is set to NULL
if the loop completes normally, or if there were no elements.


Re: [dpdk-dev] [PATCH 2/5] hash: enable gfni thash implementation

2021-10-15 Thread Medvedkin, Vladimir

Hi Konstantin,

On 08/10/2021 13:31, Ananyev, Konstantin wrote:



This patch enables new GFNI Toeplitz hash in
predictable RSS library.

Signed-off-by: Vladimir Medvedkin 
---
  lib/hash/rte_thash.c | 43 +++
  lib/hash/rte_thash.h | 19 +++
  lib/hash/version.map |  1 +
  3 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/lib/hash/rte_thash.c b/lib/hash/rte_thash.c
index 07447f7..86a0e96 100644
--- a/lib/hash/rte_thash.c
+++ b/lib/hash/rte_thash.c
@@ -86,6 +86,8 @@ struct rte_thash_ctx {
uint32_treta_sz_log;/** < size of the RSS ReTa in bits */
uint32_tsubtuples_nb;   /** < number of subtuples */
uint32_tflags;
+   uint64_t*matrices;


Comment, what is that, etc.



I'll rephrase the comment below.


+   /**< rte_thash_complete_matrix generated matrices */
uint8_t hash_key[0];
  };

@@ -253,12 +255,25 @@ rte_thash_init_ctx(const char *name, uint32_t key_len, 
uint32_t reta_sz,
ctx->hash_key[i] = rte_rand();
}

+   if (rte_thash_gfni_supported) {


I think it should be:
if (rte_thash_gfni_supported && rte_vect_get_max_simd_bitwidth() >= 
RTE_VECT_SIMD_512)




Agree


+   ctx->matrices = rte_zmalloc(NULL, key_len * sizeof(uint64_t),
+   RTE_CACHE_LINE_SIZE);


You can do it probably before allocation ctx, at the same place where te is 
allocated.
Might be a bit nicer.



I'd prefer to keep allocation and initialization of matrices in one 
place, below there is rte_thash_complete_matrix() which uses previously 
generated ctx->hash_key.



+   if (ctx->matrices == NULL)


RTE_LOG(ERR, ...);
rte_errno = ENOMEM;



Agree


+   goto free_ctx;
+
+   rte_thash_complete_matrix(ctx->matrices, ctx->hash_key,
+   key_len);
+   }
+
te->data = (void *)ctx;
TAILQ_INSERT_TAIL(thash_list, te, next);

rte_mcfg_tailq_write_unlock();

return ctx;
+
+free_ctx:
+   rte_free(ctx);
  free_te:
rte_free(te);
  exit:
@@ -372,6 +387,10 @@ generate_subkey(struct rte_thash_ctx *ctx, struct 
thash_lfsr *lfsr,
set_bit(ctx->hash_key, get_rev_bit_lfsr(lfsr), i);
}

+   if (rte_thash_gfni_supported)


Here and in data-path functions, I think it would be better:
if (ctx->matrices != NULL)


Agree


+   rte_thash_complete_matrix(ctx->matrices, ctx->hash_key,
+   ctx->key_len);
+
return 0;
  }

@@ -628,6 +647,16 @@ rte_thash_get_key(struct rte_thash_ctx *ctx)
return ctx->hash_key;
  }

+const uint64_t *
+rte_thash_get_gfni_matrices(struct rte_thash_ctx *ctx)
+{
+   if (rte_thash_gfni_supported)
+   return ctx->matrices;


Why not just always:
return ctx->matices;
?



Agree


+
+   rte_errno = ENOTSUP;
+   return NULL;
+}
+
  static inline uint8_t
  read_unaligned_byte(uint8_t *ptr, unsigned int len, unsigned int offset)
  {
@@ -739,11 +768,17 @@ rte_thash_adjust_tuple(struct rte_thash_ctx *ctx,
attempts = RTE_MIN(attempts, 1U << (h->tuple_len - ctx->reta_sz_log));

for (i = 0; i < attempts; i++) {
-   for (j = 0; j < (tuple_len / 4); j++)
-   tmp_tuple[j] =
-   rte_be_to_cpu_32(*(uint32_t *)&tuple[j * 4]);
+   if (rte_thash_gfni_supported)

if (ctx->matrices)


+   hash = rte_thash_gfni(ctx->matrices, tuple, tuple_len);
+   else {
+   for (j = 0; j < (tuple_len / 4); j++)
+   tmp_tuple[j] =
+   rte_be_to_cpu_32(
+   *(uint32_t *)&tuple[j * 4]);
+
+   hash = rte_softrss(tmp_tuple, tuple_len / 4, hash_key);
+   }

-   hash = rte_softrss(tmp_tuple, tuple_len / 4, hash_key);
adj_bits = rte_thash_get_complement(h, hash, desired_value);

/*
diff --git a/lib/hash/rte_thash.h b/lib/hash/rte_thash.h
index e3f1fc6..6e6861c 100644
--- a/lib/hash/rte_thash.h
+++ b/lib/hash/rte_thash.h
@@ -410,6 +410,25 @@ const uint8_t *
  rte_thash_get_key(struct rte_thash_ctx *ctx);

  /**
+ * Get a pointer to the toeplitz hash matrices contained in the context.
+ * These matrices could be used with fast toeplitz hash implementation if
+ * CPU supports GFNI.
+ * Matrices changes after each addition of a helper.
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param ctx
+ *  Thash context
+ * @return
+ *  A pointer to the toeplitz hash key matrices on success
+ *  NULL if GFNI is not supported.
+ */
+__rte_experimental
+const uint64_t *
+rte_thash_get_gfni_matrices(struct rte_thash_ctx *ctx);
+
+/**
   * Function prototype for the rte_thash_adjust_

Re: [dpdk-dev] [PATCH v4 2/4] mempool: add non-IO flag

2021-10-15 Thread Dmitry Kozlyuk
> -Original Message-
> From: Andrew Rybchenko 
> [...]
> > diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> > index 51c0ba2931..2204f140b3 100644
> > --- a/lib/mempool/rte_mempool.c
> > +++ b/lib/mempool/rte_mempool.c
> > @@ -371,6 +371,8 @@ rte_mempool_populate_iova(struct rte_mempool *mp,
> > char *vaddr,
> >
> >   STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next);
> >   mp->nb_mem_chunks++;
> > + if (iova == RTE_BAD_IOVA)
> > + mp->flags |= MEMPOOL_F_NON_IO;
> 
> As I understand rte_mempool_populate_iova() may be called few times for
> one mempool. The flag must be set if all invocations are done with
> RTE_BAD_IOVA. So, it should be set by default and just removed when iova
> != RTE_BAD_IOVA happens.

I don't agree at all. If any object of the pool is unsuitable for IO,
the pool cannot be considered suitable for IO. So if there's a single
invocation with RTE_BAD_IOVA, the flag must be set forever.

> Yes, it is a corner case. May be it makes sense to cover it by unit test
> as well.

True for either your logic or mine, I'll add it.

Ack on the rest of the comments, thanks.


Re: [dpdk-dev] [PATCH v3 4/5] lib/kvargs: remove unneeded header includes

2021-10-15 Thread Morrissey, Sean



On 15/10/2021 10:00, Olivier Matz wrote:

Hi Sean,

On Thu, Oct 07, 2021 at 10:25:56AM +, Sean Morrissey wrote:

These header includes have been flagged by the iwyu_tool
and removed.

Signed-off-by: Sean Morrissey 
---
  lib/kvargs/rte_kvargs.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/lib/kvargs/rte_kvargs.c b/lib/kvargs/rte_kvargs.c
index 38e9d5c1ca..4cce8e953b 100644
--- a/lib/kvargs/rte_kvargs.c
+++ b/lib/kvargs/rte_kvargs.c
@@ -7,7 +7,6 @@
  #include 
  #include 
  
-#include 

  #include 
  
  #include "rte_kvargs.h"

--
2.25.1


Did you check that it still compiles for the Windows platform
after this change?

+CC Dmitry


Hi Olivier,

I cross-compiled with MinGW-64 after this change and it still compiled.

Thanks,

Sean.



Re: [dpdk-dev] [PATCH v2] vhost: add sanity check when operating the split ring

2021-10-15 Thread Li Feng
On Fri, Oct 15, 2021 at 4:52 PM Maxime Coquelin
 wrote:
>
> The title is too vague, I would put something like:
>
> vhost: add sanity check on inflight last index
>
> On 10/14/21 14:40, Li Feng wrote:
> > The idx in rte_vhost_set_last_inflight_io_split is from the frontend
>
> s/idx/index/
>
> > driver, check if it's in the virtqueue range.
> >
> > Fixes: bb0c2de9602b ("vhost: add APIs to operate inflight ring")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Li Feng 
> > ---
> >   lib/vhost/vhost.c | 3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> > index 9540522dac..3b674ac320 100644
> > --- a/lib/vhost/vhost.c
> > +++ b/lib/vhost/vhost.c
> > @@ -1226,6 +1226,9 @@ rte_vhost_set_last_inflight_io_split(int vid, 
> > uint16_t vring_idx,
> >   if (unlikely(!vq->inflight_split))
> >   return -1;
> >
> > + if (unlikely(idx >= vq->size))
> > + return -1;
> > +
> >   vq->inflight_split->last_inflight_io = idx;
> >   return 0;
> >   }
> >
>
> Other than that, this is fine to me:
>
> Reviewed-by: Maxime Coquelin 
>
> If my suggestions are fine for you, I can fix while applying.
>
It's fine.

> Thanks,
> Maxime
>


Re: [dpdk-dev] [PATCH v4 2/4] mempool: add non-IO flag

2021-10-15 Thread David Marchand
Hello Dmitry,

On Wed, Oct 13, 2021 at 1:02 PM Dmitry Kozlyuk  wrote:
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 663123042f..029b62a650 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -262,6 +262,8 @@ struct rte_mempool {
>  #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is 
> "single-consumer".*/
>  #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
>  #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous 
> objs. */
> +#define MEMPOOL_F_NON_IO 0x0040
> +   /**< Internal: pool is not usable for device IO (DMA). */
>
>  /**
>   * @internal When debug is enabled, store some statistics.
> @@ -991,6 +993,9 @@ typedef void (rte_mempool_ctor_t)(struct rte_mempool *, 
> void *);
>   * "single-consumer". Otherwise, it is "multi-consumers".
>   *   - MEMPOOL_F_NO_IOVA_CONTIG: If set, allocated objects won't
>   * necessarily be contiguous in IO memory.
> + *   - MEMPOOL_F_NON_IO: If set, the mempool is considered to be
> + * never used for device IO, i.e. for DMA operations.
> + * It's a hint to other components and does not affect the mempool 
> behavior.
>   * @return
>   *   The pointer to the new allocated mempool, on success. NULL on error
>   *   with rte_errno set appropriately. Possible rte_errno values include:

- When rebasing on main, you probably won't be able to call this new flag.
The diff should be something like:

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index d886f4800c..35c80291fa 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -214,7 +214,7 @@ static int test_mempool_creation_with_unknown_flag(void)
MEMPOOL_ELT_SIZE, 0, 0,
NULL, NULL,
NULL, NULL,
-   SOCKET_ID_ANY, MEMPOOL_F_NO_IOVA_CONTIG << 1);
+   SOCKET_ID_ANY, MEMPOOL_F_NON_IO << 1);

if (mp_cov != NULL) {
rte_mempool_free(mp_cov);
diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
index 8d5f99f7e7..27d197fe86 100644
--- a/lib/mempool/rte_mempool.c
+++ b/lib/mempool/rte_mempool.c
@@ -802,6 +802,7 @@ rte_mempool_cache_free(struct rte_mempool_cache *cache)
| MEMPOOL_F_SC_GET \
| MEMPOOL_F_POOL_CREATED \
| MEMPOOL_F_NO_IOVA_CONTIG \
+   | MEMPOOL_F_NON_IO \
)
 /* create an empty mempool */
 struct rte_mempool *


- While grepping, I noticed that proc-info also dumps mempool flags.
This could be something to enhance, maybe amending current
rte_mempool_dump() and having this tool use it.
But for now, can you update this tool too?


-- 
David Marchand



Re: [dpdk-dev] [PATCH v6 1/5] ethdev: introduce shared Rx queue

2021-10-15 Thread Andrew Rybchenko
On 10/12/21 5:39 PM, Xueming Li wrote:
> In current DPDK framework, each Rx queue is pre-loaded with mbufs to
> save incoming packets. For some PMDs, when number of representors scale
> out in a switch domain, the memory consumption became significant.
> Polling all ports also leads to high cache miss, high latency and low
> throughput.
> 
> This patch introduces shared Rx queue. Ports in the same Rx domain and
> switch domain could share Rx queue set by specifying non-zero sharing
> group in Rx queue configuration.
> 
> No special API is defined to receive packets from shared Rx queue.
> Polling any member port of a shared Rx queue receives packets of that
> queue for all member ports, source port is identified by mbuf->port.
> 
> Shared Rx queue must be polled in same thread or core, polling a queue
> ID of any member port is essentially same.
> 
> Multiple share groups are supported by non-zero share group ID. Device

"by non-zero share group ID" is not required. Since it must be
always non-zero to enable sharing.

> should support mixed configuration by allowing multiple share
> groups and non-shared Rx queue.
> 
> Even Rx queue shared, queue configuration like offloads and RSS should
> not be impacted.

I don't understand the above sentence.
Even when Rx queues are shared, queue configuration like
offloads and RSS may differ. If a PMD has some limitation,
it should care about consistency itself. These limitations
should be documented in the PMD documentation.

> 
> Example grouping and polling model to reflect service priority:
>  Group1, 2 shared Rx queues per port: PF, rep0, rep1
>  Group2, 1 shared Rx queue per port: rep2, rep3, ... rep127
>  Core0: poll PF queue0
>  Core1: poll PF queue1
>  Core2: poll rep2 queue0


Can I have:
PF RxQ#0, RxQ#1
Rep0 RxQ#0 shared with PF RxQ#0
Rep1 RxQ#0 shared with PF RxQ#1

I guess no, since it looks like RxQ ID must be equal.
Or am I missing something? Otherwise grouping rules
are not obvious to me. May be we need dedicated
shared_qid in boundaries of the share_group?

> 
> PMD driver advertise shared Rx queue capability via
> RTE_ETH_DEV_CAPA_RXQ_SHARE.
> 
> PMD driver is responsible for shared Rx queue consistency checks to
> avoid member port's configuration contradict to each other.
> 
> Signed-off-by: Xueming Li 
> ---
>  doc/guides/nics/features.rst  | 13 
>  doc/guides/nics/features/default.ini  |  1 +
>  .../prog_guide/switch_representation.rst  | 10 +
>  doc/guides/rel_notes/release_21_11.rst|  5 +
>  lib/ethdev/rte_ethdev.c   |  9 
>  lib/ethdev/rte_ethdev.h   | 21 +++
>  6 files changed, 59 insertions(+)
> 
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index e346018e4b8..b64433b8ea5 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -615,6 +615,19 @@ Supports inner packet L4 checksum.
>``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_OUTER_UDP_CKSUM``.
>  
>  
> +.. _nic_features_shared_rx_queue:
> +
> +Shared Rx queue
> +---
> +
> +Supports shared Rx queue for ports in same Rx domain of a switch domain.
> +
> +* **[uses] rte_eth_dev_info**: ``dev_capa:RTE_ETH_DEV_CAPA_RXQ_SHARE``.
> +* **[uses] rte_eth_dev_info,rte_eth_switch_info**: ``rx_domain``, 
> ``domain_id``.
> +* **[uses] rte_eth_rxconf**: ``share_group``.
> +* **[provides] mbuf**: ``mbuf.port``.
> +
> +
>  .. _nic_features_packet_type_parsing:
>  
>  Packet type parsing
> diff --git a/doc/guides/nics/features/default.ini 
> b/doc/guides/nics/features/default.ini
> index d473b94091a..93f5d1b46f4 100644
> --- a/doc/guides/nics/features/default.ini
> +++ b/doc/guides/nics/features/default.ini
> @@ -19,6 +19,7 @@ Free Tx mbuf on demand =
>  Queue start/stop =
>  Runtime Rx queue setup =
>  Runtime Tx queue setup =
> +Shared Rx queue  =
>  Burst mode info  =
>  Power mgmt address monitor =
>  MTU update   =
> diff --git a/doc/guides/prog_guide/switch_representation.rst 
> b/doc/guides/prog_guide/switch_representation.rst
> index ff6aa91c806..de41db8385d 100644
> --- a/doc/guides/prog_guide/switch_representation.rst
> +++ b/doc/guides/prog_guide/switch_representation.rst
> @@ -123,6 +123,16 @@ thought as a software "patch panel" front-end for 
> applications.
>  .. [1] `Ethernet switch device driver model (switchdev)
> `_
>  
> +- For some PMDs, memory usage of representors is huge when number of
> +  representor grows, mbufs are allocated for each descriptor of Rx queue.
> +  Polling large number of ports brings more CPU load, cache miss and
> +  latency. Shared Rx queue can be used to share Rx queue between PF and
> +  representors among same Rx domain. ``RTE_ETH_DEV_CAPA_RXQ_SHARE`` is
> +  present in device capability of device info. Setting non-zero share group
> +  in Rx queue configurati

[dpdk-dev] [PATCH v2 0/5] optimized Toeplitz hash implementation

2021-10-15 Thread Vladimir Medvedkin
This patch series adds a new optimized implementation of the Toeplitz hash
function using Galois Field New Instructions (GFNI).
The main use case of this function is to calculate the hash value for a single
data buffer, so there is no bulk implementation.
For performance reasons, the implementation was placed in a public header.
It is the responsibility of the user to ensure the platform supports GFNI
(by doing runtime checks of rte_thash_gfni_supported variable) before calling
these functions.

v2:
- fixed typos
- made big_rss_key static const and indented
- addressed Konstantin's comments

Vladimir Medvedkin (5):
  hash: add new toeplitz hash implementation
  hash: enable gfni thash implementation
  doc/hash: update documentation for the thash library
  test/thash: add tests for a new Toeplitz hash function
  test/thash: add performance tests for the Toeplitz hash

 app/test/meson.build|   2 +
 app/test/test_thash.c   | 231 +++
 app/test/test_thash_perf.c  | 125 +++
 doc/api/doxy-api-index.md   |   1 +
 doc/guides/prog_guide/toeplitz_hash_lib.rst |  37 -
 doc/guides/rel_notes/release_21_11.rst  |   4 +
 lib/hash/meson.build|   1 +
 lib/hash/rte_thash.c|  72 -
 lib/hash/rte_thash.h|  43 ++
 lib/hash/rte_thash_gfni.h   | 232 
 lib/hash/version.map|   3 +
 11 files changed, 743 insertions(+), 8 deletions(-)
 create mode 100644 app/test/test_thash_perf.c
 create mode 100644 lib/hash/rte_thash_gfni.h

-- 
2.7.4



[dpdk-dev] [PATCH v2 1/5] hash: add new toeplitz hash implementation

2021-10-15 Thread Vladimir Medvedkin
This patch adds a new Toeplitz hash implementation using
Galois Field New Instructions (GFNI).

Signed-off-by: Vladimir Medvedkin 
---
 doc/api/doxy-api-index.md |   1 +
 lib/hash/meson.build  |   1 +
 lib/hash/rte_thash.c  |  28 ++
 lib/hash/rte_thash.h  |  24 +
 lib/hash/rte_thash_gfni.h | 232 ++
 lib/hash/version.map  |   2 +
 6 files changed, 288 insertions(+)
 create mode 100644 lib/hash/rte_thash_gfni.h

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107..7549477 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -139,6 +139,7 @@ The public API headers are grouped by topics:
   [hash]   (@ref rte_hash.h),
   [jhash]  (@ref rte_jhash.h),
   [thash]  (@ref rte_thash.h),
+  [thash_gfni] (@ref rte_thash_gfni.h),
   [FBK hash]   (@ref rte_fbk_hash.h),
   [CRC hash]   (@ref rte_hash_crc.h)
 
diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index 9bc5ef9..40444ac 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -7,6 +7,7 @@ headers = files(
 'rte_hash.h',
 'rte_jhash.h',
 'rte_thash.h',
+'rte_thash_gfni.h',
 )
 indirect_headers += files('rte_crc_arm64.h')
 
diff --git a/lib/hash/rte_thash.c b/lib/hash/rte_thash.c
index 696a112..59a8b8e 100644
--- a/lib/hash/rte_thash.c
+++ b/lib/hash/rte_thash.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define THASH_NAME_LEN 64
 #define TOEPLITZ_HASH_LEN  32
@@ -90,6 +91,24 @@ struct rte_thash_ctx {
uint8_t hash_key[0];
 };
 
+/** Flag indicating GFNI support */
+uint8_t rte_thash_gfni_supported;
+
+void
+rte_thash_complete_matrix(uint64_t *matrixes, const uint8_t *rss_key, int size)
+{
+   int i, j;
+   uint8_t *m = (uint8_t *)matrixes;
+
+   for (i = 0; i < size; i++) {
+   for (j = 0; j < 8; j++) {
+   m[i * 8 + j] = (rss_key[i] << j)|
+   (uint8_t)((uint16_t)(rss_key[i + 1]) >>
+   (8 - j));
+   }
+   }
+}
+
 static inline uint32_t
 get_bit_lfsr(struct thash_lfsr *lfsr)
 {
@@ -761,3 +780,12 @@ rte_thash_adjust_tuple(struct rte_thash_ctx *ctx,
 
return ret;
 }
+
+RTE_INIT(rte_thash_gfni_init)
+{
+   rte_thash_gfni_supported = 0;
+#ifdef __GFNI__
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_GFNI))
+   rte_thash_gfni_supported = 1;
+#endif
+}
diff --git a/lib/hash/rte_thash.h b/lib/hash/rte_thash.h
index 76109fc..e4f14a5 100644
--- a/lib/hash/rte_thash.h
+++ b/lib/hash/rte_thash.h
@@ -28,6 +28,7 @@ extern "C" {
 #include 
 #include 
 #include 
+#include 
 
 #if defined(RTE_ARCH_X86) || defined(__ARM_NEON)
 #include 
@@ -113,6 +114,9 @@ union rte_thash_tuple {
 };
 #endif
 
+/** Flag indicating GFNI support */
+extern uint8_t rte_thash_gfni_supported;
+
 /**
  * Prepare special converted key to use with rte_softrss_be()
  * @param orig
@@ -223,6 +227,26 @@ rte_softrss_be(uint32_t *input_tuple, uint32_t input_len,
return ret;
 }
 
+/**
+ * Converts Toeplitz hash key (RSS key) into matrixes required
+ * for GFNI implementation
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param matrixes
+ *  pointer to the memory where matrices will be written.
+ *  Note: the size of this memory must be equal to size * 8
+ * @param rss_key
+ *  pointer to the Toeplitz hash key
+ * @param size
+ *  Size of the rss_key in bytes.
+ */
+__rte_experimental
+void
+rte_thash_complete_matrix(uint64_t *matrixes, const uint8_t *rss_key,
+   int size);
+
 /** @internal Logarithm of minimum size of the RSS ReTa */
 #defineRTE_THASH_RETA_SZ_MIN   2U
 /** @internal Logarithm of maximum size of the RSS ReTa */
diff --git a/lib/hash/rte_thash_gfni.h b/lib/hash/rte_thash_gfni.h
new file mode 100644
index 000..2e5de0d
--- /dev/null
+++ b/lib/hash/rte_thash_gfni.h
@@ -0,0 +1,232 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Intel Corporation
+ */
+
+#ifndef _RTE_THASH_GFNI_H_
+#define _RTE_THASH_GFNI_H_
+
+/**
+ * @file
+ *
+ * Optimized Toeplitz hash functions implementation
+ * using Galois Fields New Instructions.
+ */
+
+#include 
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifdef __GFNI__
+
+#define RTE_THASH_FIRST_ITER_MSK   0x0f0f0f0f0f0e0c08
+#define RTE_THASH_PERM_MSK 0x0f0f0f0f0f0f0f0f
+#define RTE_THASH_FIRST_ITER_MSK_2 0xf0f0f0f0f0e0c080
+#define RTE_THASH_PERM_MSK_2   0xf0f0f0f0f0f0f0f0
+#define RTE_THASH_REWIND_MSK   0x00113377
+
+__rte_internal
+static inline void
+__rte_thash_xor_reduce(__m512i xor_acc, uint32_t *val_1, uint32_t *val_2)
+{
+   __m256i tmp_256_1, tmp_256_2;
+   __m128i tmp128_1, tmp128_2;
+   uint64_t tmp_1, tmp_2;
+
+   tmp_256_1 = _mm512_castsi512_si256(xor_acc);
+   tmp_256_2 = _mm512_extracti32x8_epi

[dpdk-dev] [PATCH v2 2/5] hash: enable gfni thash implementation

2021-10-15 Thread Vladimir Medvedkin
This patch enables the new GFNI Toeplitz hash in the
predictable RSS library.

Signed-off-by: Vladimir Medvedkin 
---
 lib/hash/rte_thash.c | 44 
 lib/hash/rte_thash.h | 19 +++
 lib/hash/version.map |  1 +
 3 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/lib/hash/rte_thash.c b/lib/hash/rte_thash.c
index 59a8b8e..1f1d0cd 100644
--- a/lib/hash/rte_thash.c
+++ b/lib/hash/rte_thash.c
@@ -88,6 +88,8 @@ struct rte_thash_ctx {
uint32_treta_sz_log;/** < size of the RSS ReTa in bits */
uint32_tsubtuples_nb;   /** < number of subtuples */
uint32_tflags;
+   uint64_t*matrices;
+   /**< matrices used with rte_thash_gfni implementation */
uint8_t hash_key[0];
 };
 
@@ -256,12 +258,30 @@ rte_thash_init_ctx(const char *name, uint32_t key_len, uint32_t reta_sz,
ctx->hash_key[i] = rte_rand();
}
 
+   if (rte_thash_gfni_supported &&
+   (rte_vect_get_max_simd_bitwidth() >=
+   RTE_VECT_SIMD_512)) {
+   ctx->matrices = rte_zmalloc(NULL, key_len * sizeof(uint64_t),
+   RTE_CACHE_LINE_SIZE);
+   if (ctx->matrices == NULL) {
+   RTE_LOG(ERR, HASH, "Cannot allocate matrices\n");
+   rte_errno = ENOMEM;
+   goto free_ctx;
+   }
+
+   rte_thash_complete_matrix(ctx->matrices, ctx->hash_key,
+   key_len);
+   }
+
te->data = (void *)ctx;
TAILQ_INSERT_TAIL(thash_list, te, next);
 
rte_mcfg_tailq_write_unlock();
 
return ctx;
+
+free_ctx:
+   rte_free(ctx);
 free_te:
rte_free(te);
 exit:
@@ -375,6 +395,10 @@ generate_subkey(struct rte_thash_ctx *ctx, struct thash_lfsr *lfsr,
set_bit(ctx->hash_key, get_rev_bit_lfsr(lfsr), i);
}
 
+   if (ctx->matrices != NULL)
+   rte_thash_complete_matrix(ctx->matrices, ctx->hash_key,
+   ctx->key_len);
+
return 0;
 }
 
@@ -631,6 +655,12 @@ rte_thash_get_key(struct rte_thash_ctx *ctx)
return ctx->hash_key;
 }
 
+const uint64_t *
+rte_thash_get_gfni_matrices(struct rte_thash_ctx *ctx)
+{
+   return ctx->matrices;
+}
+
 static inline uint8_t
 read_unaligned_byte(uint8_t *ptr, unsigned int len, unsigned int offset)
 {
@@ -742,11 +772,17 @@ rte_thash_adjust_tuple(struct rte_thash_ctx *ctx,
attempts = RTE_MIN(attempts, 1U << (h->tuple_len - ctx->reta_sz_log));
 
for (i = 0; i < attempts; i++) {
-   for (j = 0; j < (tuple_len / 4); j++)
-   tmp_tuple[j] =
-   rte_be_to_cpu_32(*(uint32_t *)&tuple[j * 4]);
+   if (ctx->matrices != NULL)
+   hash = rte_thash_gfni(ctx->matrices, tuple, tuple_len);
+   else {
+   for (j = 0; j < (tuple_len / 4); j++)
+   tmp_tuple[j] =
+   rte_be_to_cpu_32(
+   *(uint32_t *)&tuple[j * 4]);
+
+   hash = rte_softrss(tmp_tuple, tuple_len / 4, hash_key);
+   }
 
-   hash = rte_softrss(tmp_tuple, tuple_len / 4, hash_key);
adj_bits = rte_thash_get_complement(h, hash, desired_value);
 
/*
diff --git a/lib/hash/rte_thash.h b/lib/hash/rte_thash.h
index e4f14a5..7afd359 100644
--- a/lib/hash/rte_thash.h
+++ b/lib/hash/rte_thash.h
@@ -412,6 +412,25 @@ const uint8_t *
 rte_thash_get_key(struct rte_thash_ctx *ctx);
 
 /**
+ * Get a pointer to the toeplitz hash matrices contained in the context.
+ * These matrices could be used with fast toeplitz hash implementation if
+ * CPU supports GFNI.
+ * Matrices changes after each addition of a helper.
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param ctx
+ *  Thash context
+ * @return
+ *  A pointer to the toeplitz hash key matrices on success
+ *  NULL if GFNI is not supported.
+ */
+__rte_experimental
+const uint64_t *
+rte_thash_get_gfni_matrices(struct rte_thash_ctx *ctx);
+
+/**
  * Function prototype for the rte_thash_adjust_tuple
  * to check if adjusted tuple could be used.
  * Generally it is some kind of lookup function to check
diff --git a/lib/hash/version.map b/lib/hash/version.map
index cecf922..3eda695 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -43,6 +43,7 @@ EXPERIMENTAL {
rte_thash_find_existing;
rte_thash_free_ctx;
rte_thash_get_complement;
+   rte_thash_get_gfni_matrices;
rte_thash_get_helper;
rte_thash_get_key;
rte_thash_gfni_supported;
-- 
2.7.4



[dpdk-dev] [PATCH v2 3/5] doc/hash: update documentation for the thash library

2021-10-15 Thread Vladimir Medvedkin
This patch adds documentation for the new optimized Toeplitz hash
implementation using GFNI.

Signed-off-by: Vladimir Medvedkin 
---
 doc/guides/prog_guide/toeplitz_hash_lib.rst | 37 +
 doc/guides/rel_notes/release_21_11.rst  |  4 
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/toeplitz_hash_lib.rst b/doc/guides/prog_guide/toeplitz_hash_lib.rst
index f916857..6f50a18 100644
--- a/doc/guides/prog_guide/toeplitz_hash_lib.rst
+++ b/doc/guides/prog_guide/toeplitz_hash_lib.rst
@@ -19,24 +19,53 @@ to calculate the RSS hash sum to spread the traffic among the queues.
 Toeplitz hash function API
 --
 
-There are two functions that provide calculation of the Toeplitz hash sum:
+There are four functions that provide calculation of the Toeplitz hash sum:
 
 * ``rte_softrss()``
 * ``rte_softrss_be()``
+* ``rte_thash_gfni()``
+* ``rte_thash_gfni_x2()``
 
-Both of these functions take the parameters:
+First two functions are scalar implementation and take the parameters:
 
 * A pointer to the tuple, containing fields extracted from the packet.
 * A length of this tuple counted in double words.
 * A pointer to the RSS hash key corresponding to the one installed on the NIC.
 
-Both functions expect the tuple to be in "host" byte order
-and a multiple of 4 bytes in length.
+Both of abovementioned _softrss_ functions expect the tuple to be in
+"host" byte order and a multiple of 4 bytes in length.
 The ``rte_softrss()`` function expects the ``rss_key``
 to be exactly the same as the one installed on the NIC.
 The ``rte_softrss_be`` function is a faster implementation,
 but it expects ``rss_key`` to be converted to the host byte order.
 
+The last two functions are vectorized implementations using
+Galois Fields New Instructions. Could be used if ``rte_thash_gfni_supported`` 
is true.
+They expect the tuple to be in network byte order.
+
+``rte_thash_gfni()`` calculates the hash value for a single tuple, and
+``rte_thash_gfni_x2()`` calculates for a two independent tuples in one go.
+
+``rte_thash_gfni()`` takes the parameters:
+
+* A pointer to the matrixes derived from the RSS hash key using ``rte_thash_complete_matrix()``.
+* A pointer to the tuple.
+* A length of the tuple in bytes.
+
+``rte_thash_gfni_x2()`` takes the parameters:
+
+* A pointer to the matrices derived from the RSS hash key using ``rte_thash_complete_matrix()``.
+* Two tuple pointers.
+* A length of the longest tuple in bytes.
+* Two pointers on the ``uint32_t`` to write results to.
+
+``rte_thash_complete_matrix()`` is a function that calculates matrices required by
+GFNI implementations from the RSS hash key. It takes the parameters:
+
+* A pointer to the memory where the matrices will be written.
+* A pointer to the RSS hash key.
+* Length of the RSS hash key in bytes.
+
 
 Predictable RSS
 ---
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 4c56cdf..5b53117 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -159,6 +159,10 @@ New Features
   * Added tests to verify tunnel header verification in IPsec inbound.
   * Added tests to verify inner checksum.
 
+* **Added optimized Toeplitz hash implementation.**
+
+  Added optimized Toeplitz hash implementation using Galois Fields New Instructions.
+
 
 Removed Items
 -
-- 
2.7.4



[dpdk-dev] [PATCH v2 4/5] test/thash: add tests for a new Toeplitz hash function

2021-10-15 Thread Vladimir Medvedkin
This patch provides a set of tests for verifying the new
implementation of the Toeplitz hash function using GFNI.

Signed-off-by: Vladimir Medvedkin 
---
 app/test/test_thash.c | 231 ++
 1 file changed, 231 insertions(+)

diff --git a/app/test/test_thash.c b/app/test/test_thash.c
index d8981fb..dac9caa 100644
--- a/app/test/test_thash.c
+++ b/app/test/test_thash.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "test.h"
 
@@ -78,6 +79,34 @@ uint8_t default_rss_key[] = {
 0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
 };
 
+static const uint8_t big_rss_key[] = {
+   0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
+   0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
+   0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
+   0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
+   0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
+   0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
+   0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
+   0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
+   0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
+   0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
+   0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
+   0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
+   0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
+   0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
+   0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
+   0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
+   0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
+   0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
+   0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
+   0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
+   0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
+   0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
+   0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
+   0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
+   0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
+};
+
 static int
 test_toeplitz_hash_calc(void)
 {
@@ -145,6 +174,204 @@ test_toeplitz_hash_calc(void)
 }
 
 static int
+test_toeplitz_hash_gfni(void)
+{
+   uint32_t i, j;
+   union rte_thash_tuple tuple;
+   uint32_t rss_l3, rss_l3l4;
+   uint64_t rss_key_matrixes[RTE_DIM(default_rss_key)];
+
+   if (!rte_thash_gfni_supported)
+   return TEST_SKIPPED;
+
+   /* Convert RSS key into matrixes */
+   rte_thash_complete_matrix(rss_key_matrixes, default_rss_key,
+   RTE_DIM(default_rss_key));
+
+   for (i = 0; i < RTE_DIM(v4_tbl); i++) {
+   tuple.v4.src_addr = rte_cpu_to_be_32(v4_tbl[i].src_ip);
+   tuple.v4.dst_addr = rte_cpu_to_be_32(v4_tbl[i].dst_ip);
+   tuple.v4.sport = rte_cpu_to_be_16(v4_tbl[i].dst_port);
+   tuple.v4.dport = rte_cpu_to_be_16(v4_tbl[i].src_port);
+
+   rss_l3 = rte_thash_gfni(rss_key_matrixes, (uint8_t *)&tuple,
+   RTE_THASH_V4_L3_LEN * 4);
+   rss_l3l4 = rte_thash_gfni(rss_key_matrixes, (uint8_t *)&tuple,
+   RTE_THASH_V4_L4_LEN * 4);
+   if ((rss_l3 != v4_tbl[i].hash_l3) ||
+   (rss_l3l4 != v4_tbl[i].hash_l3l4))
+   return -TEST_FAILED;
+   }
+
+   for (i = 0; i < RTE_DIM(v6_tbl); i++) {
+   for (j = 0; j < RTE_DIM(tuple.v6.src_addr); j++)
+   tuple.v6.src_addr[j] = v6_tbl[i].src_ip[j];
+   for (j = 0; j < RTE_DIM(tuple.v6.dst_addr); j++)
+   tuple.v6.dst_addr[j] = v6_tbl[i].dst_ip[j];
+   tuple.v6.sport = rte_cpu_to_be_16(v6_tbl[i].dst_port);
+   tuple.v6.dport = rte_cpu_to_be_16(v6_tbl[i].src_port);
+   rss_l3 = rte_thash_gfni(rss_key_matrixes, (uint8_t *)&tuple,
+   RTE_THASH_V6_L3_LEN * 4);
+   rss_l3l4 = rte_thash_gfni(rss_key_matrixes, (uint8_t *)&tuple,
+   RTE_THASH_V6_L4_LEN * 4);
+   if ((rss_l3 != v6_tbl[i].hash_l3) ||
+   (rss_l3l4 != v6_tbl[i].hash_l3l4))
+   return -TEST_FAILED;
+   }
+
+   return TEST_SUCCESS;
+}
+
+#define DATA_SZ4
+#define ITER   1000
+
+enum {
+   SCALAR_DATA_BUF_1_HASH_IDX = 0,
+   SCALAR_DATA_BUF_2_HASH_IDX,
+   GFNI_DATA_BUF_1_HASH_IDX,
+   GFNI_DATA_BUF_2_HASH_IDX,
+   GFNI_X2_DATA_BUF_1_HASH_IDX,
+   GFNI_X2_DATA_BUF_2_HASH_IDX,
+   HASH_IDXES
+};
+
+static int
+test_toeplitz_hash_rand_data(void)
+{
+   uint32_t data[2][DATA_SZ];
+   uint32_t scalar_data[2][DATA_SZ];
+   uint32_t hash[HASH_IDXES] = { 0 };
+   uint64_t rss_key_matrixes[RTE_DIM(default_rss_key)];
+   int i, j;
+
+   if (!rte_thash_gfni_supported)
+   return TEST_SKIPPED;
+
+   rte_thash_complete_matrix(rss_key_

[dpdk-dev] [PATCH v2 5/5] test/thash: add performance tests for the Toeplitz hash

2021-10-15 Thread Vladimir Medvedkin
This patch adds performance tests for different implementations
of the Toeplitz hash function.

Signed-off-by: Vladimir Medvedkin 
---
 app/test/meson.build   |   2 +
 app/test/test_thash_perf.c | 125 +
 2 files changed, 127 insertions(+)
 create mode 100644 app/test/test_thash_perf.c

diff --git a/app/test/meson.build b/app/test/meson.build
index f144d8b..b9c4e78 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -141,6 +141,7 @@ test_sources = files(
 'test_table_tables.c',
 'test_tailq.c',
 'test_thash.c',
+'test_thash_perf.c',
 'test_timer.c',
 'test_timer_perf.c',
 'test_timer_racecond.c',
@@ -315,6 +316,7 @@ perf_test_names = [
 'hash_readwrite_lf_perf_autotest',
 'trace_perf_autotest',
 'ipsec_perf_autotest',
+   'thash_perf_autotest',
 ]
 
 driver_test_names = [
diff --git a/app/test/test_thash_perf.c b/app/test/test_thash_perf.c
new file mode 100644
index 000..ccc4710
--- /dev/null
+++ b/app/test/test_thash_perf.c
@@ -0,0 +1,125 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Intel Corporation
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#define ITERATIONS (1 << 15)
+#defineBATCH_SZ(1 << 10)
+
+#define IPV4_2_TUPLE_LEN   (8)
+#define IPV4_4_TUPLE_LEN   (12)
+#define IPV6_2_TUPLE_LEN   (32)
+#define IPV6_4_TUPLE_LEN   (36)
+
+
+static uint8_t default_rss_key[] = {
+   0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
+   0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
+   0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
+   0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
+   0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
+};
+
+static void
+run_thash_test(unsigned int tuple_len)
+{
+   uint32_t *tuples[BATCH_SZ];
+   unsigned int i, j;
+   uint64_t start_tsc, end_tsc;
+   uint32_t len = RTE_ALIGN_CEIL(tuple_len, sizeof(uint32_t));
+   volatile uint32_t hash = 0;
+   uint32_t hash_1 = 0;
+   uint32_t hash_2 = 0;
+
+   for (i = 0; i < BATCH_SZ; i++) {
+   tuples[i] = rte_zmalloc(NULL, len, 0);
+   for (j = 0; j < len / sizeof(uint32_t); j++)
+   tuples[i][j] = rte_rand();
+   }
+
+   start_tsc = rte_rdtsc_precise();
+   for (i = 0; i < ITERATIONS; i++) {
+   for (j = 0; j < BATCH_SZ; j++) {
+   hash ^= rte_softrss(tuples[j], len / sizeof(uint32_t),
+   default_rss_key);
+   }
+   }
+   end_tsc = rte_rdtsc_precise();
+
+   printf("Average rte_softrss() takes \t\t%.1f cycles for key len %d\n",
+   (double)(end_tsc - start_tsc) / (double)(ITERATIONS *
+   BATCH_SZ), len);
+
+   start_tsc = rte_rdtsc_precise();
+   for (i = 0; i < ITERATIONS; i++) {
+   for (j = 0; j < BATCH_SZ; j++) {
+   hash ^= rte_softrss_be(tuples[j], len /
+   sizeof(uint32_t), default_rss_key);
+   }
+   }
+   end_tsc = rte_rdtsc_precise();
+
+   printf("Average rte_softrss_be() takes \t\t%.1f cycles for key len %d\n",
+   (double)(end_tsc - start_tsc) / (double)(ITERATIONS *
+   BATCH_SZ), len);
+
+   if (!rte_thash_gfni_supported)
+   return;
+
+   uint64_t rss_key_matrixes[RTE_DIM(default_rss_key)];
+
+   rte_thash_complete_matrix(rss_key_matrixes, default_rss_key,
+   RTE_DIM(default_rss_key));
+
+   start_tsc = rte_rdtsc_precise();
+   for (i = 0; i < ITERATIONS; i++) {
+   for (j = 0; j < BATCH_SZ; j++)
+   hash ^= rte_thash_gfni(rss_key_matrixes,
+   (uint8_t *)tuples[j], len);
+   }
+   end_tsc = rte_rdtsc_precise();
+
+   printf("Average rte_thash_gfni takes \t\t%.1f cycles for key len %d\n",
+   (double)(end_tsc - start_tsc) / (double)(ITERATIONS *
+   BATCH_SZ), len);
+
+   start_tsc = rte_rdtsc_precise();
+   for (i = 0; i < ITERATIONS; i++) {
+   for (j = 0; j < BATCH_SZ; j += 2) {
+   rte_thash_gfni_x2(rss_key_matrixes,
+   (uint8_t *)tuples[j], (uint8_t *)tuples[j + 1],
+   len, &hash_1, &hash_2);
+
+   hash ^= hash_1 ^ hash_2;
+   }
+   }
+   end_tsc = rte_rdtsc_precise();
+
+   printf("Average rte_thash_gfni_x2 takes \t%.1f cycles for key len %d\n",
+   (double)(end_tsc - start_tsc) / (double)(ITERATIONS *
+   BATCH_SZ), len);
+}
+
+static int
+test_thash_perf(void)
+{
+   run_thash_test(IPV4_2_TUPLE_LEN);
+   run_thash_test(IPV4_4_TUPLE_LEN);
+   run_thash_test(IPV6_2_TUPLE_LEN);
+   run_thash_test(I

Re: [dpdk-dev] [PATCH v4 2/4] mempool: add non-IO flag

2021-10-15 Thread Andrew Rybchenko
On 10/15/21 12:18 PM, Dmitry Kozlyuk wrote:
>> -Original Message-
>> From: Andrew Rybchenko 
>> [...]
>>> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
>>> index 51c0ba2931..2204f140b3 100644
>>> --- a/lib/mempool/rte_mempool.c
>>> +++ b/lib/mempool/rte_mempool.c
>>> @@ -371,6 +371,8 @@ rte_mempool_populate_iova(struct rte_mempool *mp,
>>> char *vaddr,
>>>
>>>   STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next);
>>>   mp->nb_mem_chunks++;
>>> + if (iova == RTE_BAD_IOVA)
>>> + mp->flags |= MEMPOOL_F_NON_IO;
>>
>> As I understand rte_mempool_populate_iova() may be called few times for
>> one mempool. The flag must be set if all invocations are done with
>> RTE_BAD_IOVA. So, it should be set by default and just removed when iova
>> != RTE_BAD_IOVA happens.
> 
> I don't agree at all. If any object of the pool is unsuitable for IO,
> the pool cannot be considered suitable for IO. So if there's a single
> invocation with RTE_BAD_IOVA, the flag must be set forever.

If so, some objects may be used for IO, some cannot be used.
What should happen if an application allocates an object
which is suitable for IO and tries to use it this way?

> 
>> Yes, it is a corner case. May be it makes sense to cover it by unit test
>> as well.
> 
> True for either your logic or mine, I'll add it.
> 
> Ack on the rest of the comments, thanks.
> 



Re: [dpdk-dev] [PATCH v3] test/hash: fix buffer overflow

2021-10-15 Thread David Marchand
On Thu, Oct 14, 2021 at 7:55 PM Vladimir Medvedkin
 wrote:
> @@ -1607,6 +1611,17 @@ static struct rte_hash_parameters hash_params_ex = {
>  };
>
>  /*
> + * Wrapper function around rte_jhash_32b.
> + * It is required because rte_jhash_32b() accepts the length
> + * as size of 4-byte units.
> + */
> +static inline uint32_t
> +test_jhash_32b(const void *k, uint32_t length, uint32_t initval)
> +{
> +   return rte_jhash_32b(k, length >> 2, initval);
> +}

I am confused.
Does it mean that rte_jhash_32b is not compliant with rte_hash_create API?


-- 
David Marchand



Re: [dpdk-dev] [PATCH v12 02/12] librte_pcapng: add new library for writing pcapng files

2021-10-15 Thread Pattan, Reshma



> -Original Message-
> From: dev  On Behalf Of Stephen Hemminger
> See draft RFC
>   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html

The page is not found. You might need to add a new link, I guess.

> +enum pcapng_interface_options {
> + PCAPNG_IFB_NAME  = 2,
> + PCAPNG_IFB_DESCRIPTION,

Can IFB (interface block) be replaced with IF (interface) only? But that's OK,
up to you.


> + buf = calloc(1, len);
> + if (!buf)
> + return -1;

How about returning -ENOMEM?

> +
> + hdr = (struct pcapng_section_header *)buf;
> + *hdr = (struct pcapng_section_header) {
> + .block_type = PCAPNG_SECTION_BLOCK,
> + .block_length = len,
> + .byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
> + .major_version = PCAPNG_MAJOR_VERS,
> + .minor_version = PCAPNG_MINOR_VERS,
> + .section_length = UINT64_MAX,
> + };
> + hdr->block_length = len;

Why assign block_length to len again? It is already done a few lines above.

> + opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);

Some comments around this code, about adding the end-of-options option at the
end of the options list, would be helpful.

> +
> +/* Write the PCAPNG section header at start of file */
> +static ssize_t

:s/section header/ interface header?

> +pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
> + if (mac_addr)
> + len += pcapng_optlen(6);

How about using RTE_ETHER_ADDR_LEN instead of 6?

> +struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue,

> +fail:
> + rte_pktmbuf_free(mc);


Freeing mc, would that take care of freeing up the additional byte prepended
after mc creation?

> + opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
> + &queue, sizeof(queue));

Don't we need to add end of options to the end of the option list, like we did
in the interface block and section header block?

> diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h new file mode
> + *
> + * Packets to be captured are copied by rte_pcapng_mbuf()

Do you mean by rte_pcapng_copy()?




Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart

2021-10-15 Thread Andrew Rybchenko
On 10/15/21 12:04 PM, Dmitry Kozlyuk wrote:
>> [...]
> From: Dmitry Kozlyuk 
>
> rte_flow_action_handle_create() did not mention what happens with an
> indirect action when a device is stopped, possibly reconfigured, and
> started again. It is natural for some indirect actions to be
> persistent, like counters and meters; keeping others just saves
> application time and complexity. However, not all PMDs can support it.
> It is proposed to add a device capability to indicate if indirect
> actions are kept across the above sequence or implicitly destroyed.
>
> In the future, indirect actions may not be the only type of objects
> shared between flow rules. The capability bit intends to cover all
> possible types of such objects, hence its name.
>
> It may happen that in the future a PMD acquires support for a type
> of shared objects that it cannot keep across a restart. It is
> undesirable to stop advertising the capability so that applications
> that don't use objects of the problematic type can still take
>> advantage of it.
> This is why PMDs are allowed to keep only a subset of shared objects
> provided that the vendor mandatorily documents it.
>
> If the device is being reconfigured in a way that is incompatible
> with an existing shared objects, PMD is required to report an error.
> This is mandatory, because flow API does not supply users with
> capabilities, so this is the only way for a user to learn that
> configuration is invalid. For example, if queue count changes and
> RSS indirect action specifies queues that are going away, the user
> must update the action before removing the queues or remove the
> action and all flow rules that were using it.
>
> Signed-off-by: Dmitry Kozlyuk 
> ---
> [...]

 Current pain point is that capability bits may be insufficient and a
 programmatic way is desired to check which types of objects can be
 kept across restart, instead of documenting the limitations.

 I support one of previous Ori's suggestions and want to clarify it [1]:

 Ori: "Another way is to assume that if the action was created before
 port start it will be kept after port stop."
 Andrew: "It does not sound like a solution. May be I simply don't
 know target usecase."

 What Ori suggests (offline discussion summary): Suppose an
 application wants to check whether a shared object (indirect action)
 or a flow rule of a particular kind can be kept across restart.
 rte_flow_action_handle_create() or
 rte_flow_create() before rte_eth_dev_start(). If it succeeds, 1) it
 means objects of this type can be kept across restart, 2) it's a
 normal object created that will work after the port is started. This
 is logical, because if the PMD can keep some kind of objects when the
 port is stopped, it is likely to be able to create them when the port
 is not started. It is subject to discussion if "object kind" means
 only "type" or "type + transfer bit" combination; for mlx5 PMD it
 doesn't matter. One minor drawback is that applications can only do
 the test when the port is stopped, but it seems likely that the test
 really needs to be done at startup anyway.

 If this is acceptable:
 1. Capability bits are not needed anymore.
 2. ethdev patches can be accepted in RC1, present behavior is
 undefined anyway.
 3. PMD patches will need update that can be done by RC2.
>>>
>>> Andrew, what do you think?
>>> If you agree, do we need to include transfer bit into "kind"?
>>> I'd like to conclude before RC1 and can update the docs quickly.
>>>
>>> I've seen the proposition to advertise capability to create flow rules
>>> before device start as a flag.
>>> I don't think it conflicts with Ori's suggestion because the flag
>>> doesn't imply that _any_ rule can be created, neither does it say
>>> about indirect actions.
>>> On the other hand, if PMD can create a flow object (rule, etc.) when
>>> the device is not started, it is logical to assume that after the
>>> device is stopped it can move existing flow objects to the same state
>>> as when the device was not started, then restore when it is started
>>> again.
>>
>> Dmitry, thanks for the explanations. Ori's idea makes sense to me now. The
>> problem is to document it properly. We must define rules to check it.
>> Which bits in the check request matter and how application should make a
>> choice of rule to try.
> 
> This is a generalization of the last question about the transfer bit.
> I call the bits that matter a "kind". As I see it:
> 
> rule kind = seq. of item types + seq. of action types
> indirect action kind = action type
> 
> As Ori mentioned, for mlx5 PMD transfer bit doesn't affect object persistence.
> If you or other PMD maintainers think it may be relevant, no problem,
> because PMDs like mlx5 will just ignore it 

Re: [dpdk-dev] [PATCH] net/i40e: fix IPv6 fragment RSS offload type in flow

2021-10-15 Thread Jiang, YuX
> -Original Message-
> From: dev  On Behalf Of Alvin Zhang
> Sent: Tuesday, October 12, 2021 4:40 PM
> To: Xing, Beilei ; Guo, Junfeng
> 
> Cc: dev@dpdk.org; Zhang, AlvinX ;
> sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/i40e: fix IPv6 fragment RSS offload type in
> flow
> 
> To keep flow format uniform with ice, this patch adds support for this RSS
> rule:
> flow create 0 ingress pattern eth / ipv6_frag_ext / end actions \
> rss types ipv6-frag end queues end queues end / end
> 
> Fixes: ef4c16fd9148 ("net/i40e: refactor RSS flow")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Alvin Zhang 
Tested-by: Yu Jiang 


Re: [dpdk-dev] [PATCH v4 2/4] mempool: add non-IO flag

2021-10-15 Thread Dmitry Kozlyuk


> -Original Message-
> From: Andrew Rybchenko 
> Sent: 15 октября 2021 г. 12:34
> To: Dmitry Kozlyuk ; dev@dpdk.org
> Cc: Matan Azrad ; Olivier Matz 
> Subject: Re: [PATCH v4 2/4] mempool: add non-IO flag
> 
> External email: Use caution opening links or attachments
> 
> 
> On 10/15/21 12:18 PM, Dmitry Kozlyuk wrote:
> >> -Original Message-
> >> From: Andrew Rybchenko  [...]
> >>> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> >>> index 51c0ba2931..2204f140b3 100644
> >>> --- a/lib/mempool/rte_mempool.c
> >>> +++ b/lib/mempool/rte_mempool.c
> >>> @@ -371,6 +371,8 @@ rte_mempool_populate_iova(struct rte_mempool
> >>> *mp, char *vaddr,
> >>>
> >>>   STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next);
> >>>   mp->nb_mem_chunks++;
> >>> + if (iova == RTE_BAD_IOVA)
> >>> + mp->flags |= MEMPOOL_F_NON_IO;
> >>
> >> As I understand rte_mempool_populate_iova() may be called few times
> >> for one mempool. The flag must be set if all invocations are done
> >> with RTE_BAD_IOVA. So, it should be set by default and just removed
> >> when iova != RTE_BAD_IOVA happens.
> >
> > I don't agree at all. If any object of the pool is unsuitable for IO,
> > the pool cannot be considered suitable for IO. So if there's a single
> > invocation with RTE_BAD_IOVA, the flag must be set forever.
> 
> If so, some objects may be used for IO, some cannot be used.
> What should happen if an application allocates an object which is suitable
> for IO and try to use it this way?

Never mind, I was thinking in v1 mode when the application marked mempools
as not suitable for IO. Since now they're marked automatically, you're correct:
the flag must be set if and only if there is no chance
that objects from this pool will be used for IO.


Re: [dpdk-dev] [PATCH v4 2/4] mempool: add non-IO flag

2021-10-15 Thread Olivier Matz
On Fri, Oct 15, 2021 at 12:33:31PM +0300, Andrew Rybchenko wrote:
> On 10/15/21 12:18 PM, Dmitry Kozlyuk wrote:
> >> -Original Message-
> >> From: Andrew Rybchenko 
> >> [...]
> >>> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> >>> index 51c0ba2931..2204f140b3 100644
> >>> --- a/lib/mempool/rte_mempool.c
> >>> +++ b/lib/mempool/rte_mempool.c
> >>> @@ -371,6 +371,8 @@ rte_mempool_populate_iova(struct rte_mempool *mp,
> >>> char *vaddr,
> >>>
> >>>   STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next);
> >>>   mp->nb_mem_chunks++;
> >>> + if (iova == RTE_BAD_IOVA)
> >>> + mp->flags |= MEMPOOL_F_NON_IO;
> >>
> >> As I understand rte_mempool_populate_iova() may be called few times for
> >> one mempool. The flag must be set if all invocations are done with
> >> RTE_BAD_IOVA. So, it should be set by default and just removed when iova
> >> != RTE_BAD_IOVA happens.
> > 
> > I don't agree at all. If any object of the pool is unsuitable for IO,
> > the pool cannot be considered suitable for IO. So if there's a single
> > invocation with RTE_BAD_IOVA, the flag must be set forever.
> 
> If so, some objects may be used for IO, some cannot be used.
> What should happen if an application allocates an object
> which is suitable for IO and try to use it this way?

If the application can predict if the allocated object is usable for IO
before allocating it, I would be surprised to have it used for IO. I agree
with Dmitry here.


[dpdk-dev] [PATCH v3 0/3] support PPPoL2TPv2oUDP RSS Hash

2021-10-15 Thread Jie Wang
Support IAVF PPPoL2TPv2oUDP RSS Hash. Required to distribute packets
based on inner IP src+dest address and TCP/UDP src+dest port.

---
v3:
 * add testpmd match ppp and l2tpv2 protocol header fields value.
 * add the code of l2tpv2_encap.
 * update the title of ethdev patch and adjust the position of
   the added code.

v2:
 * update the rte_flow.rst and release notes.
 * update l2tpv2 header format.

Jie Wang (3):
  ethdev: support PPP and L2TPV2 protocol
  net/iavf: support PPPoL2TPv2oUDP RSS Hash
  app/testpmd: support L2TPV2 and PPP protocol pattern

 app/test-pmd/cmdline.c | 244 +++
 app/test-pmd/cmdline_flow.c| 396 +
 app/test-pmd/testpmd.h |  22 ++
 doc/guides/prog_guide/rte_flow.rst |  25 ++
 doc/guides/rel_notes/release_21_11.rst |   5 +
 drivers/net/iavf/iavf_generic_flow.c   | 131 
 drivers/net/iavf/iavf_generic_flow.h   |  15 +
 drivers/net/iavf/iavf_hash.c   | 108 ++-
 lib/ethdev/rte_flow.c  |   2 +
 lib/ethdev/rte_flow.h  | 117 
 lib/net/rte_l2tpv2.h   | 214 +
 11 files changed, 1277 insertions(+), 2 deletions(-)
 create mode 100644 lib/net/rte_l2tpv2.h

-- 
2.25.1



[dpdk-dev] [PATCH v3 1/3] ethdev: support PPP and L2TPV2 protocol

2021-10-15 Thread Jie Wang
Added flow pattern items and header formats of L2TPv2 and PPP to
support PPP over L2TPv2 over UDP protocol RSS Hash.

Signed-off-by: Wenjun Wu 
Signed-off-by: Jie Wang 
---
 doc/guides/prog_guide/rte_flow.rst |  25 +++
 doc/guides/rel_notes/release_21_11.rst |   5 +
 lib/ethdev/rte_flow.c  |   2 +
 lib/ethdev/rte_flow.h  | 117 ++
 lib/net/rte_l2tpv2.h   | 214 +
 5 files changed, 363 insertions(+)
 create mode 100644 lib/net/rte_l2tpv2.h

diff --git a/doc/guides/prog_guide/rte_flow.rst 
b/doc/guides/prog_guide/rte_flow.rst
index 3cb014c1fa..59fe7e79b5 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1425,6 +1425,31 @@ Matches a conntrack state after conntrack action.
 - ``flags``: conntrack packet state flags.
 - Default ``mask`` matches all state bits.
 
+Item: ``L2TPV2``
+^^^^^^^^^^^^^^^^
+
+Matches a L2TPv2 header.
+
+- ``flags_version``: flags(12b), version(4b).
+- ``length``: total length of the message.
+- ``tunnel_id``: identifier for the control connection.
+- ``session_id``: identifier for a session within a tunnel.
+- ``ns``: sequence number for this data or control message.
+- ``nr``: sequence number expected in the next control message to be received.
+- ``offset_size``: offset of payload data.
+- ``offset_padding``: offset padding, variable length.
+- Default ``mask`` matches flags_version only.
+
+Item: ``PPP``
+^^^^^^^^^^^^^
+
+Matches a PPP header.
+
+- ``addr``: ppp address.
+- ``ctrl``: ppp control.
+- ``proto_id``: ppp protocol identifier.
+- Default ``mask`` matches addr, ctrl, proto_id.
+
 Actions
 ~~~
 
diff --git a/doc/guides/rel_notes/release_21_11.rst 
b/doc/guides/rel_notes/release_21_11.rst
index d5c762df62..503f6dd828 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -81,6 +81,11 @@ New Features
   * Default VLAN strip behavior was changed. VLAN tag won't be stripped
 unless ``DEV_RX_OFFLOAD_VLAN_STRIP`` offload is enabled.
 
+* **Added L2TPV2 and PPP protocol support in rte_flow.**
+
+  Added flow pattern items and header formats of L2TPv2 and PPP to support
+  PPP over L2TPv2 over UDP protocol RSS Hash.
+
 * **Updated AF_XDP PMD.**
 
   * Disabled secondary process support.
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index 8cb7a069c8..1ec739a031 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -100,6 +100,8 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] 
= {
MK_FLOW_ITEM(GENEVE_OPT, sizeof(struct rte_flow_item_geneve_opt)),
MK_FLOW_ITEM(INTEGRITY, sizeof(struct rte_flow_item_integrity)),
MK_FLOW_ITEM(CONNTRACK, sizeof(uint32_t)),
+   MK_FLOW_ITEM(L2TPV2, sizeof(struct rte_flow_item_l2tpv2)),
+   MK_FLOW_ITEM(PPP, sizeof(struct rte_flow_item_ppp)),
 };
 
 /** Generate flow_action[] entry. */
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 5f87851f8c..f8fcf9c1f8 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef __cplusplus
 extern "C" {
@@ -574,6 +575,21 @@ enum rte_flow_item_type {
 * @see struct rte_flow_item_conntrack.
 */
RTE_FLOW_ITEM_TYPE_CONNTRACK,
+
+   /**
+* Matches L2TPV2 Header.
+*
+* See struct rte_flow_item_l2tpv2.
+*/
+   RTE_FLOW_ITEM_TYPE_L2TPV2,
+
+   /**
+* Matches PPP Header.
+*
+* See struct rte_flow_item_ppp.
+*/
+   RTE_FLOW_ITEM_TYPE_PPP,
+
 };
 
 /**
@@ -1799,6 +1815,55 @@ static const struct rte_flow_item_conntrack 
rte_flow_item_conntrack_mask = {
 };
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ * RTE_FLOW_ITEM_TYPE_L2TPV2
+ *
+ * Matches L2TPv2 Header
+ */
+struct rte_flow_item_l2tpv2 {
+   struct rte_l2tpv2_combined_msg_hdr hdr;
+};
+
+/** Default mask for RTE_FLOW_ITEM_TYPE_L2TPV2. */
+#ifndef __cplusplus
+static const struct rte_flow_item_l2tpv2 rte_flow_item_l2tpv2_mask = {
+   /*
+* flags and version bit mask
+* 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+* T L x x S x O P x x x x V V V V
+*/
+   .hdr = {
+   .common = {
+   .flags_version = 0xcb0f,
+   },
+   },
+};
+#endif
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ * RTE_FLOW_ITEM_TYPE_PPP
+ *
+ * Matches PPP Header
+ */
+struct rte_flow_item_ppp {
+   uint8_t addr; /**< ppp address(8) */
+   uint8_t ctrl; /**< ppp control(8) */
+   rte_be16_t proto_id; /**< ppp protocol id(16) */
+};
+
+/** Default mask for RTE_FLOW_ITEM_TYPE_PPP. */
+#ifndef __cplusplus
+static const struct rte_flow_item_ppp rte_flow_item_ppp_mask = {
+   .addr = 0xff,
+   .ctrl = 0xff,
+   .proto_id = 0x,
+};
+#endif
+
 /**
  * Matching pattern 

Re: [dpdk-dev] [PATCH v4 2/4] mempool: add non-IO flag

2021-10-15 Thread Dmitry Kozlyuk



> -Original Message-
> From: Olivier Matz 
> Sent: 15 October 2021 12:43
> To: Andrew Rybchenko 
> Cc: Dmitry Kozlyuk ; dev@dpdk.org; Matan Azrad
> 
> Subject: Re: [PATCH v4 2/4] mempool: add non-IO flag
> 
> 
> On Fri, Oct 15, 2021 at 12:33:31PM +0300, Andrew Rybchenko wrote:
> > On 10/15/21 12:18 PM, Dmitry Kozlyuk wrote:
> > >> -Original Message-
> > >> From: Andrew Rybchenko  [...]
> > >>> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> > >>> index 51c0ba2931..2204f140b3 100644
> > >>> --- a/lib/mempool/rte_mempool.c
> > >>> +++ b/lib/mempool/rte_mempool.c
> > >>> @@ -371,6 +371,8 @@ rte_mempool_populate_iova(struct rte_mempool
> > >>> *mp, char *vaddr,
> > >>>
> > >>>   STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next);
> > >>>   mp->nb_mem_chunks++;
> > >>> + if (iova == RTE_BAD_IOVA)
> > >>> + mp->flags |= MEMPOOL_F_NON_IO;
> > >>
> > >> As I understand rte_mempool_populate_iova() may be called few times
> > >> for one mempool. The flag must be set if all invocations are done
> > >> with RTE_BAD_IOVA. So, it should be set by default and just removed
> > >> when iova != RTE_BAD_IOVA happens.
> > >
> > > I don't agree at all. If any object of the pool is unsuitable for
> > > IO, the pool cannot be considered suitable for IO. So if there's a
> > > single invocation with RTE_BAD_IOVA, the flag must be set forever.
> >
> > If so, some objects may be used for IO, some cannot be used.
> > What should happen if an application allocates an object which is
> > suitable for IO and try to use it this way?
> 
> If the application can predict if the allocated object is usable for IO
> before allocating it, I would be surprised to have it used for IO. I agree
> with Dmitry here.

The flag hints to components, PMDs before all,
that objects from this mempool will never be used for IO,
so that the component can save some memory mapping or DMA configuration.
If the flag is set when even a single object may be used for IO,
the consumer of the flag will not be ready for that.
However much of a corner case it is, Andrew is correct.
There is a subtle difference between "pool is not usable"
(as described now) and "objects from this mempool will never be used"
(as stated above); I'll highlight it in the flag description.



[dpdk-dev] [PATCH v3 2/3] net/iavf: support PPPoL2TPv2oUDP RSS Hash

2021-10-15 Thread Jie Wang
Add support for PPP over L2TPv2 over UDP protocol RSS Hash based
on inner IP src/dst address and TCP/UDP src/dst port.

Patterns are listed below:
eth/ipv4(6)/udp/l2tpv2/ppp/ipv4(6)
eth/ipv4(6)/udp/l2tpv2/ppp/ipv4(6)/udp
eth/ipv4(6)/udp/l2tpv2/ppp/ipv4(6)/tcp

Signed-off-by: Wenjun Wu 
Signed-off-by: Jie Wang 
---
 drivers/net/iavf/iavf_generic_flow.c | 131 +++
 drivers/net/iavf/iavf_generic_flow.h |  15 +++
 drivers/net/iavf/iavf_hash.c | 108 +-
 3 files changed, 252 insertions(+), 2 deletions(-)

diff --git a/drivers/net/iavf/iavf_generic_flow.c 
b/drivers/net/iavf/iavf_generic_flow.c
index b86d99e57d..364904fa02 100644
--- a/drivers/net/iavf/iavf_generic_flow.c
+++ b/drivers/net/iavf/iavf_generic_flow.c
@@ -1611,6 +1611,137 @@ enum rte_flow_item_type 
iavf_pattern_eth_ipv6_gre_ipv6_udp[] = {
RTE_FLOW_ITEM_TYPE_END,
 };
 
+/* PPPoL2TPv2oUDP */
+enum rte_flow_item_type iavf_pattern_eth_ipv4_udp_l2tpv2_ppp_ipv4[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_L2TPV2,
+   RTE_FLOW_ITEM_TYPE_PPP,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_udp_l2tpv2_ppp_ipv6[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_L2TPV2,
+   RTE_FLOW_ITEM_TYPE_PPP,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_udp_l2tpv2_ppp_ipv4_udp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_L2TPV2,
+   RTE_FLOW_ITEM_TYPE_PPP,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_udp_l2tpv2_ppp_ipv4_tcp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_L2TPV2,
+   RTE_FLOW_ITEM_TYPE_PPP,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_TCP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_udp_l2tpv2_ppp_ipv6_udp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_L2TPV2,
+   RTE_FLOW_ITEM_TYPE_PPP,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_udp_l2tpv2_ppp_ipv6_tcp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_L2TPV2,
+   RTE_FLOW_ITEM_TYPE_PPP,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_TCP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv6_udp_l2tpv2_ppp_ipv4[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_L2TPV2,
+   RTE_FLOW_ITEM_TYPE_PPP,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv6_udp_l2tpv2_ppp_ipv6[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_L2TPV2,
+   RTE_FLOW_ITEM_TYPE_PPP,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv6_udp_l2tpv2_ppp_ipv4_udp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_L2TPV2,
+   RTE_FLOW_ITEM_TYPE_PPP,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv6_udp_l2tpv2_ppp_ipv4_tcp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_L2TPV2,
+   RTE_FLOW_ITEM_TYPE_PPP,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_TCP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv6_udp_l2tpv2_ppp_ipv6_udp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_L2TPV2,
+   RTE_FLOW_ITEM_TYPE_PPP,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv6_udp_l2tpv2_ppp_ipv6_tcp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_L2TPV2,
+   RTE_FLOW_ITEM_TYPE_PPP,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_TCP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+
+
 typedef struct iavf_flow_engine * (*parse_engine_t)(struct iavf_adapter *ad,
   

[dpdk-dev] [PATCH v3 3/3] app/testpmd: support L2TPV2 and PPP protocol pattern

2021-10-15 Thread Jie Wang
Add support for test-pmd to parse protocol pattern L2TPv2 and PPP.

Signed-off-by: Wenjun Wu 
Signed-off-by: Jie Wang 
---
 app/test-pmd/cmdline.c  | 244 ++
 app/test-pmd/cmdline_flow.c | 396 
 app/test-pmd/testpmd.h  |  22 ++
 3 files changed, 662 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 36d50fd3c7..bba761ad4b 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -13300,6 +13300,247 @@ cmdline_parse_inst_t cmd_set_nvgre_with_vlan = {
},
 };
 
+/** Set L2TPV2 encapsulation details */
+struct cmd_set_l2tpv2_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t l2tpv2;
+   cmdline_fixed_string_t pos_token;
+   cmdline_fixed_string_t ip_version;
+   uint32_t vlan_present:1;
+   uint16_t flags_version;
+   uint16_t session_id;
+   uint16_t udp_src;
+   uint16_t udp_dst;
+   cmdline_ipaddr_t ip_src;
+   cmdline_ipaddr_t ip_dst;
+   uint16_t tci;
+   uint8_t tos;
+   uint8_t ttl;
+   struct rte_ether_addr eth_src;
+   struct rte_ether_addr eth_dst;
+};
+
+cmdline_parse_token_string_t cmd_set_l2tpv2_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, set, "set");
+cmdline_parse_token_string_t cmd_set_l2tpv2_l2tpv2 =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, l2tpv2, 
"l2tpv2");
+cmdline_parse_token_string_t cmd_set_l2tpv2_l2tpv2_tos_ttl =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, l2tpv2,
+"l2tpv2-tos-ttl");
+cmdline_parse_token_string_t cmd_set_l2tpv2_l2tpv2_with_vlan =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, l2tpv2,
+"l2tpv2-with-vlan");
+cmdline_parse_token_string_t cmd_set_l2tpv2_ip_version =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, pos_token,
+"ip-version");
+cmdline_parse_token_string_t cmd_set_l2tpv2_ip_version_value =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, ip_version,
+"ipv4#ipv6");
+cmdline_parse_token_string_t cmd_set_l2tpv2_flags_version =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, pos_token,
+"flags_version");
+cmdline_parse_token_num_t cmd_set_l2tpv2_flags_version_value =
+   TOKEN_NUM_INITIALIZER(struct cmd_set_l2tpv2_result, flags_version,
+ RTE_UINT16);
+cmdline_parse_token_string_t cmd_set_l2tpv2_session_id =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, pos_token,
+"session_id");
+cmdline_parse_token_num_t cmd_set_l2tpv2_session_id_value =
+   TOKEN_NUM_INITIALIZER(struct cmd_set_l2tpv2_result, session_id,
+ RTE_UINT16);
+cmdline_parse_token_string_t cmd_set_l2tpv2_udp_src =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, pos_token,
+"udp-src");
+cmdline_parse_token_num_t cmd_set_l2tpv2_udp_src_value =
+   TOKEN_NUM_INITIALIZER(struct cmd_set_l2tpv2_result, udp_src,
+ RTE_UINT16);
+cmdline_parse_token_string_t cmd_set_l2tpv2_udp_dst =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, pos_token,
+"udp-dst");
+cmdline_parse_token_num_t cmd_set_l2tpv2_udp_dst_value =
+   TOKEN_NUM_INITIALIZER(struct cmd_set_l2tpv2_result, udp_dst,
+ RTE_UINT16);
+cmdline_parse_token_string_t cmd_set_l2tpv2_ip_tos =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, pos_token,
+"ip-tos");
+cmdline_parse_token_num_t cmd_set_l2tpv2_ip_tos_value =
+   TOKEN_NUM_INITIALIZER(struct cmd_set_l2tpv2_result, tos, RTE_UINT8);
+cmdline_parse_token_string_t cmd_set_l2tpv2_ip_ttl =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, pos_token,
+"ip-ttl");
+cmdline_parse_token_num_t cmd_set_l2tpv2_ip_ttl_value =
+   TOKEN_NUM_INITIALIZER(struct cmd_set_l2tpv2_result, ttl, RTE_UINT8);
+cmdline_parse_token_string_t cmd_set_l2tpv2_ip_src =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, pos_token,
+"ip-src");
+cmdline_parse_token_ipaddr_t cmd_set_l2tpv2_ip_src_value =
+   TOKEN_IPADDR_INITIALIZER(struct cmd_set_l2tpv2_result, ip_src);
+cmdline_parse_token_string_t cmd_set_l2tpv2_ip_dst =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, pos_token,
+"ip-dst");
+cmdline_parse_token_ipaddr_t cmd_set_l2tpv2_ip_dst_value =
+   TOKEN_IPADDR_INITIALIZER(struct cmd_set_l2tpv2_result, ip_dst);
+cmdline_parse_token_string_t cmd_set_l2tpv2_vlan =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_l2tpv2_result, pos_token,
+"vlan-tci");
+cmdline_parse_token_nu

Re: [dpdk-dev] [PATCH v25 1/6] dmadev: introduce DMA device library

2021-10-15 Thread fengchengwen
On 2021/10/15 16:29, Thomas Monjalon wrote:
> 13/10/2021 09:41, Thomas Monjalon:
>> 13/10/2021 02:21, fengchengwen:
>>> On 2021/10/13 3:09, Thomas Monjalon wrote:
 11/10/2021 09:33, Chengwen Feng:
> +static void
> +dma_release(struct rte_dma_dev *dev)
> +{
> + rte_free(dev->dev_private);
> + memset(dev, 0, sizeof(struct rte_dma_dev));
> +}
>> [...]
> +int
> +rte_dma_pmd_release(const char *name)
> +{
> + struct rte_dma_dev *dev;
> +
> + if (dma_check_name(name) != 0)
> + return -EINVAL;
> +
> + dev = dma_find_by_name(name);
> + if (dev == NULL)
> + return -EINVAL;
> +
> + dma_release(dev);
> + return 0;
> +}

 Trying to understand the logic of creation/destroy.
 skeldma_probe
 \-> skeldma_create
 \-> rte_dma_pmd_allocate
 \-> dma_allocate
 \-> dma_data_prepare
 \-> dma_dev_data_prepare
 skeldma_remove
 \-> skeldma_destroy
 \-> rte_dma_pmd_release
 \-> dma_release
>>>
>>> This patch only provide device allocate function, the 2st patch provide 
>>> extra logic:
>>>
>>> diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
>>> index 42a4693bd9..a6a5680d2b 100644
>>> --- a/lib/dmadev/rte_dmadev.c
>>> +++ b/lib/dmadev/rte_dmadev.c
>>> @@ -201,6 +201,9 @@ rte_dma_pmd_release(const char *name)
>>> if (dev == NULL)
>>> return -EINVAL;
>>>
>>> +   if (dev->state == RTE_DMA_DEV_READY)
>>> +   return rte_dma_close(dev->dev_id);
>>> +
>>> dma_release(dev);
>>> return 0;
>>>  }
>>>
>>> So the skeldma remove will be:
>>>
>>>  skeldma_remove
>>>  \-> skeldma_destroy
>>>  \-> rte_dma_pmd_release
>>>  \-> rte_dma_close
>>>  \-> dma_release
>>
>> OK, in this case, no need to dma_release from rte_dma_pmd_release,
>> because it is already called from rte_dma_close.
> 
> Ping for reply please.

Sorry, I thought the previous reply was enough. Let me explain:

The PMD uses the following logic to create a dmadev:
  skeldma_probe
\-> skeldma_create
  \-> rte_dma_pmd_allocate
\-> dma_allocate
  \-> mark dmadev state to READY.

The PMD remove will be:
 skeldma_remove
  \-> skeldma_destroy
  \-> rte_dma_pmd_release
  \-> rte_dma_close
  \-> dma_release

The application closes the dmadev:
  rte_dma_close
   \-> dma_release

In the above case, the PMD remove path and the application close path both call
rte_dma_close, which I think is what you expect.


skeldma is simple, so let me give you a more complicated example:
  hisi_dma_probe
\-> hisi_dma_create
  \-> rte_dma_pmd_allocate
\-> dma_allocate
  \-> hisi_hw_init
\-> if init fail, call rte_dma_pmd_release.
\-> dma_release
\-> if init OK, mark dmadev state to READY.

As you can see, if hisi_hw_init fails, the PMD calls rte_dma_pmd_release to release
the dmadev, which directly calls dma_release.
If hisi_hw_init succeeds, it means the hardware is also OK, so the dmadev state is
marked READY. If the PMD then removes the dmadev, rte_dma_pmd_release calls
rte_dma_close because the dmadev's state is READY, and the application could also
call rte_dma_close to destroy the dmadev.


rte_dma_pmd_release serves two functions:
1. If the dmadev's hardware part fails to init, the PMD can use this function to
release the dmadev.
2. If the dmadev's hardware part inits successfully, the PMD can use this function
to destroy the dmadev.


If we didn't have the rte_dma_pmd_release function, we would have to export the
dma_release function, to be invoked when the hardware init fails.

And if we keep rte_dma_pmd_release, it corresponds to rte_dma_pmd_allocate; the
PMD just invokes rte_dma_pmd_release to handle both of the above cases (hardware
init failure in the probe phase, and the remove phase).

Thanks.




[dpdk-dev] [Bug 827] unit_tests_eal/link_bonding:Failed to execute link_bonding_autotest

2021-10-15 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=827

xuemi...@nvidia.com (xuemi...@nvidia.com) changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||xuemi...@nvidia.com
 Resolution|--- |FIXED

--- Comment #1 from xuemi...@nvidia.com (xuemi...@nvidia.com) ---
Sorry, my bug. Fixed here:
https://patches.dpdk.org/project/dpdk/patch/20211015095548.1614980-1-xuemi...@nvidia.com/

-- 
You are receiving this mail because:
You are the assignee for the bug.

[dpdk-dev] Re: [PATCH v2 1/1] devtools: add relative path support for ABI compatibility check

2021-10-15 Thread Feifei Wang
Hi,

Sorry to disturb you. Are there any more comments for this patch, or can it be applied?
Thanks very much.

 Best Regards
Feifei

> -----Original Message-----
> From: Feifei Wang 
> Sent: Wednesday, August 11, 2021 2:17 PM
> To: Bruce Richardson 
> Cc: dev@dpdk.org; nd ; Phil Yang ;
> Feifei Wang ; Juraj Linkeš
> ; Ruifeng Wang 
> Subject: [PATCH v2 1/1] devtools: add relative path support for ABI compatibility
> check
> 
> From: Phil Yang 
> 
> Because dpdk guide does not limit the relative path for ABI compatibility
> check, users maybe set 'DPDK_ABI_REF_DIR' as a relative
> path:
> 
> ~/dpdk/devtools$ DPDK_ABI_REF_VERSION=v19.11
> DPDK_ABI_REF_DIR=build-gcc-shared ./test-meson-builds.sh
> 
> And if the DESTDIR is not an absolute path, ninja complains:
> + install_target build-gcc-shared/v19.11/build
> + build-gcc-shared/v19.11/build-gcc-shared
> + rm -rf build-gcc-shared/v19.11/build-gcc-shared
> + echo 'DESTDIR=build-gcc-shared/v19.11/build-gcc-shared ninja -C build-gcc-
> shared/v19.11/build install'
> + DESTDIR=build-gcc-shared/v19.11/build-gcc-shared
> + ninja -C build-gcc-shared/v19.11/build install
> ...
> ValueError: dst_dir must be absolute, got build-gcc-shared/v19.11/build-gcc-
> shared/usr/local/share/dpdk/
> examples/bbdev_app
> ...
> Error: install directory 'build-gcc-shared/v19.11/build-gcc-shared' does not
> exist.
> 
> To fix this, add relative path support using 'readlink -f'.
> 
> Signed-off-by: Phil Yang 
> Signed-off-by: Feifei Wang 
> Reviewed-by: Juraj Linkeš 
> Reviewed-by: Ruifeng Wang 
> Acked-by: Bruce Richardson 
> ---
>  devtools/test-meson-builds.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
> index 9ec8e2bc7e..8ddde95276 100755
> --- a/devtools/test-meson-builds.sh
> +++ b/devtools/test-meson-builds.sh
> @@ -168,7 +168,8 @@ build () #check> [meson options]
>   config $srcdir $builds_dir/$targetdir $cross --werror $*
>   compile $builds_dir/$targetdir
>   if [ -n "$DPDK_ABI_REF_VERSION" -a "$abicheck" = ABI ] ; then
> - abirefdir=${DPDK_ABI_REF_DIR:-
> reference}/$DPDK_ABI_REF_VERSION
> + abirefdir=$(readlink -f \
> + ${DPDK_ABI_REF_DIR:-
> reference}/$DPDK_ABI_REF_VERSION)
>   if [ ! -d $abirefdir/$targetdir ]; then
>   # clone current sources
>   if [ ! -d $abirefdir/src ]; then
> --
> 2.25.1



[dpdk-dev] Re: [PATCH v1 0/2] replace tight loop with wait until equal api

2021-10-15 Thread Feifei Wang
Hi,

Would you please help review this patch series?
Thanks very much.

Best Regards
Feifei
> -----Original Message-----
> From: Feifei Wang 
> Sent: Wednesday, August 25, 2021 4:01 PM
> Cc: dev@dpdk.org; nd ; Feifei Wang
> 
> Subject: [PATCH v1 0/2] replace tight loop with wait until equal api
> 
> For dpdk/lib, directly use wait_until_equal API to replace tight loop.
> 
> Feifei Wang (2):
>   eal/common: use wait until equal API for tight loop
>   mcslock: use wait until equal API for tight loop
> 
>  lib/eal/common/eal_common_mcfg.c  | 3 +--
>  lib/eal/include/generic/rte_mcslock.h | 4 ++--
>  2 files changed, 3 insertions(+), 4 deletions(-)
> 
> --
> 2.25.1



[dpdk-dev] [PATCH v8 0/7] iavf: add iAVF IPsec inline crypto support

2021-10-15 Thread Radu Nicolau
Add support for inline crypto for IPsec, for ESP transport and
tunnel over IPv4 and IPv6, as well as supporting the offload for
ESP over UDP, and in conjunction with TSO for UDP and TCP flows.

Depends on series "new features for ipsec and security libraries"
https://patchwork.dpdk.org/project/dpdk/list/?series=19658


Radu Nicolau (7):
  common/iavf: add iAVF IPsec inline crypto support
  net/iavf: rework tx path
  net/iavf: add support for asynchronous virt channel messages
  net/iavf: add iAVF IPsec inline crypto support
  net/iavf: add xstats support for inline IPsec crypto
  net/iavf: add watchdog for VFLR
  net/iavf: update doc with inline crypto support

 doc/guides/nics/features/iavf.ini |2 +
 doc/guides/nics/intel_vf.rst  |   10 +
 doc/guides/rel_notes/release_21_11.rst|1 +
 drivers/common/iavf/iavf_type.h   |1 +
 drivers/common/iavf/virtchnl.h|   17 +-
 drivers/common/iavf/virtchnl_inline_ipsec.h   |  553 +
 drivers/net/iavf/iavf.h   |   52 +-
 drivers/net/iavf/iavf_ethdev.c|  219 +-
 drivers/net/iavf/iavf_generic_flow.c  |   15 +
 drivers/net/iavf/iavf_generic_flow.h  |2 +
 drivers/net/iavf/iavf_ipsec_crypto.c  | 1891 +
 drivers/net/iavf/iavf_ipsec_crypto.h  |  160 ++
 .../net/iavf/iavf_ipsec_crypto_capabilities.h |  383 
 drivers/net/iavf/iavf_rxtx.c  |  710 +--
 drivers/net/iavf/iavf_rxtx.h  |  198 +-
 drivers/net/iavf/iavf_rxtx_vec_sse.c  |   10 +-
 drivers/net/iavf/iavf_vchnl.c |  168 +-
 drivers/net/iavf/meson.build  |3 +-
 drivers/net/iavf/rte_pmd_iavf.h   |1 +
 drivers/net/iavf/version.map  |3 +
 20 files changed, 4088 insertions(+), 311 deletions(-)
 create mode 100644 drivers/common/iavf/virtchnl_inline_ipsec.h
 create mode 100644 drivers/net/iavf/iavf_ipsec_crypto.c
 create mode 100644 drivers/net/iavf/iavf_ipsec_crypto.h
 create mode 100644 drivers/net/iavf/iavf_ipsec_crypto_capabilities.h

-- 
v2: small updates and fixes in the flow related section
v3: split the huge patch and address feedback
v4: small changes due to dependencies changes
v5: updated the watchdog patch
v6: rebased and updated the common section
v7: fixed TSO issue and disabled watchdog by default
v8: rebased to next-net-intel and added doc updates

2.25.1



[dpdk-dev] [PATCH v8 1/7] common/iavf: add iAVF IPsec inline crypto support

2021-10-15 Thread Radu Nicolau
Add support for inline crypto for IPsec.

Signed-off-by: Declan Doherty 
Signed-off-by: Abhijit Sinha 
Signed-off-by: Radu Nicolau 
---
 drivers/common/iavf/iavf_type.h |   1 +
 drivers/common/iavf/virtchnl.h  |  17 +-
 drivers/common/iavf/virtchnl_inline_ipsec.h | 553 
 3 files changed, 569 insertions(+), 2 deletions(-)
 create mode 100644 drivers/common/iavf/virtchnl_inline_ipsec.h

diff --git a/drivers/common/iavf/iavf_type.h b/drivers/common/iavf/iavf_type.h
index 73dfb47e70..51267ca3b3 100644
--- a/drivers/common/iavf/iavf_type.h
+++ b/drivers/common/iavf/iavf_type.h
@@ -723,6 +723,7 @@ enum iavf_tx_desc_dtype_value {
IAVF_TX_DESC_DTYPE_NOP  = 0x1, /* same as Context desc */
IAVF_TX_DESC_DTYPE_CONTEXT  = 0x1,
IAVF_TX_DESC_DTYPE_FCOE_CTX = 0x2,
+   IAVF_TX_DESC_DTYPE_IPSEC= 0x3,
IAVF_TX_DESC_DTYPE_FILTER_PROG  = 0x8,
IAVF_TX_DESC_DTYPE_DDP_CTX  = 0x9,
IAVF_TX_DESC_DTYPE_FLEX_DATA= 0xB,
diff --git a/drivers/common/iavf/virtchnl.h b/drivers/common/iavf/virtchnl.h
index 067f715945..269578f7c0 100644
--- a/drivers/common/iavf/virtchnl.h
+++ b/drivers/common/iavf/virtchnl.h
@@ -38,6 +38,8 @@
  * value in current and future projects
  */
 
+#include "virtchnl_inline_ipsec.h"
+
 /* Error Codes */
 enum virtchnl_status_code {
VIRTCHNL_STATUS_SUCCESS = 0,
@@ -133,7 +135,8 @@ enum virtchnl_ops {
VIRTCHNL_OP_DISABLE_CHANNELS = 31,
VIRTCHNL_OP_ADD_CLOUD_FILTER = 32,
VIRTCHNL_OP_DEL_CLOUD_FILTER = 33,
-   /* opcodes 34, 35, 36, and 37 are reserved */
+   VIRTCHNL_OP_INLINE_IPSEC_CRYPTO = 34,
+   /* opcodes 35 and 36 are reserved */
VIRTCHNL_OP_DCF_CONFIG_BW = 37,
VIRTCHNL_OP_DCF_VLAN_OFFLOAD = 38,
VIRTCHNL_OP_DCF_CMD_DESC = 39,
@@ -225,6 +228,8 @@ static inline const char *virtchnl_op_str(enum virtchnl_ops 
v_opcode)
return "VIRTCHNL_OP_ADD_CLOUD_FILTER";
case VIRTCHNL_OP_DEL_CLOUD_FILTER:
return "VIRTCHNL_OP_DEL_CLOUD_FILTER";
+   case VIRTCHNL_OP_INLINE_IPSEC_CRYPTO:
+   return "VIRTCHNL_OP_INLINE_IPSEC_CRYPTO";
case VIRTCHNL_OP_DCF_CMD_DESC:
return "VIRTCHNL_OP_DCF_CMD_DESC";
case VIRTCHNL_OP_DCF_CMD_BUFF:
@@ -385,7 +390,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
 #define VIRTCHNL_VF_OFFLOAD_REQ_QUEUES BIT(6)
 /* used to negotiate communicating link speeds in Mbps */
 #define VIRTCHNL_VF_CAP_ADV_LINK_SPEED BIT(7)
-   /* BIT(8) is reserved */
+#define VIRTCHNL_VF_OFFLOAD_INLINE_IPSEC_CRYPTOBIT(8)
 #define VIRTCHNL_VF_LARGE_NUM_QPAIRS   BIT(9)
 #define VIRTCHNL_VF_OFFLOAD_CRCBIT(10)
 #define VIRTCHNL_VF_OFFLOAD_VLAN_V2BIT(15)
@@ -2291,6 +2296,14 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info 
*ver, u32 v_opcode,
  sizeof(struct virtchnl_queue_vector);
}
break;
+
+   case VIRTCHNL_OP_INLINE_IPSEC_CRYPTO:
+   {
+   struct inline_ipsec_msg *iim = (struct inline_ipsec_msg *)msg;
+   valid_len =
+   virtchnl_inline_ipsec_val_msg_len(iim->ipsec_opcode);
+   break;
+   }
/* These are always errors coming from the VF. */
case VIRTCHNL_OP_EVENT:
case VIRTCHNL_OP_UNKNOWN:
diff --git a/drivers/common/iavf/virtchnl_inline_ipsec.h 
b/drivers/common/iavf/virtchnl_inline_ipsec.h
new file mode 100644
index 00..1e9134501e
--- /dev/null
+++ b/drivers/common/iavf/virtchnl_inline_ipsec.h
@@ -0,0 +1,553 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2001-2021 Intel Corporation
+ */
+
+#ifndef _VIRTCHNL_INLINE_IPSEC_H_
+#define _VIRTCHNL_INLINE_IPSEC_H_
+
+#define VIRTCHNL_IPSEC_MAX_CRYPTO_CAP_NUM  3
+#define VIRTCHNL_IPSEC_MAX_ALGO_CAP_NUM16
+#define VIRTCHNL_IPSEC_MAX_TX_DESC_NUM 128
+#define VIRTCHNL_IPSEC_MAX_CRYPTO_ITEM_NUMBER  2
+#define VIRTCHNL_IPSEC_MAX_KEY_LEN 128
+#define VIRTCHNL_IPSEC_MAX_SA_DESTROY_NUM  8
+#define VIRTCHNL_IPSEC_SA_DESTROY  0
+#define VIRTCHNL_IPSEC_BROADCAST_VFID  0x
+#define VIRTCHNL_IPSEC_INVALID_REQ_ID  0x
+#define VIRTCHNL_IPSEC_INVALID_SA_CFG_RESP 0x
+#define VIRTCHNL_IPSEC_INVALID_SP_CFG_RESP 0x
+
+/* crypto type */
+#define VIRTCHNL_AUTH  1
+#define VIRTCHNL_CIPHER2
+#define VIRTCHNL_AEAD  3
+
+/* caps enabled */
+#define VIRTCHNL_IPSEC_ESN_ENA BIT(0)
+#define VIRTCHNL_IPSEC_UDP_ENCAP_ENA   BIT(1)
+#define VIRTCHNL_IPSEC_SA_INDEX_SW_ENA BIT(2)
+#define VIRTCHNL_IPSEC_AUDIT_ENA   BIT(3)
+#define VIRTCHNL_IPSEC_BYTE_LIMIT_ENA  BIT(4)
+#define VIRTCHNL_IPSEC_DROP_ON_AUTH_FAIL_ENA   BIT(5)
+#define VIRTCHNL_IPSEC_ARW_CHECK_ENA  

[dpdk-dev] [PATCH v8 2/7] net/iavf: rework tx path

2021-10-15 Thread Radu Nicolau
Rework the TX path and TX descriptor usage in order to
allow for better use of offload flags and to facilitate enabling of the
inline crypto offload feature.

Signed-off-by: Declan Doherty 
Signed-off-by: Abhijit Sinha 
Signed-off-by: Radu Nicolau 
Acked-by: Jingjing Wu 
---
 drivers/net/iavf/iavf_rxtx.c | 538 ---
 drivers/net/iavf/iavf_rxtx.h | 117 +-
 drivers/net/iavf/iavf_rxtx_vec_sse.c |  10 +-
 3 files changed, 431 insertions(+), 234 deletions(-)

diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index 88661e5d74..8a73c929dc 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -1054,27 +1054,31 @@ iavf_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile 
union iavf_rx_desc *rxdp)
 
 static inline void
 iavf_flex_rxd_to_vlan_tci(struct rte_mbuf *mb,
- volatile union iavf_rx_flex_desc *rxdp,
- uint8_t rx_flags)
+ volatile union iavf_rx_flex_desc *rxdp)
 {
-   uint16_t vlan_tci = 0;
-
-   if (rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG1 &&
-   rte_le_to_cpu_64(rxdp->wb.status_error0) &
-   (1 << IAVF_RX_FLEX_DESC_STATUS0_L2TAG1P_S))
-   vlan_tci = rte_le_to_cpu_16(rxdp->wb.l2tag1);
+   if (rte_le_to_cpu_64(rxdp->wb.status_error0) &
+   (1 << IAVF_RX_FLEX_DESC_STATUS0_L2TAG1P_S)) {
+   mb->ol_flags |= PKT_RX_VLAN | PKT_RX_VLAN_STRIPPED;
+   mb->vlan_tci =
+   rte_le_to_cpu_16(rxdp->wb.l2tag1);
+   } else {
+   mb->vlan_tci = 0;
+   }
 
 #ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
-   if (rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2 &&
-   rte_le_to_cpu_16(rxdp->wb.status_error1) &
-   (1 << IAVF_RX_FLEX_DESC_STATUS1_L2TAG2P_S))
-   vlan_tci = rte_le_to_cpu_16(rxdp->wb.l2tag2_2nd);
-#endif
-
-   if (vlan_tci) {
-   mb->ol_flags |= PKT_RX_VLAN | PKT_RX_VLAN_STRIPPED;
-   mb->vlan_tci = vlan_tci;
+   if (rte_le_to_cpu_16(rxdp->wb.status_error1) &
+   (1 << IAVF_RX_FLEX_DESC_STATUS1_L2TAG2P_S)) {
+   mb->ol_flags |= PKT_RX_QINQ_STRIPPED | PKT_RX_QINQ |
+   PKT_RX_VLAN_STRIPPED | PKT_RX_VLAN;
+   mb->vlan_tci_outer = mb->vlan_tci;
+   mb->vlan_tci = rte_le_to_cpu_16(rxdp->wb.l2tag2_2nd);
+   PMD_RX_LOG(DEBUG, "Descriptor l2tag2_1: %u, l2tag2_2: %u",
+  rte_le_to_cpu_16(rxdp->wb.l2tag2_1st),
+  rte_le_to_cpu_16(rxdp->wb.l2tag2_2nd));
+   } else {
+   mb->vlan_tci_outer = 0;
}
+#endif
 }
 
 /* Translate the rx descriptor status and error fields to pkt flags */
@@ -1394,7 +1398,7 @@ iavf_recv_pkts_flex_rxd(void *rx_queue,
rxm->ol_flags = 0;
rxm->packet_type = ptype_tbl[IAVF_RX_FLEX_DESC_PTYPE_M &
rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
-   iavf_flex_rxd_to_vlan_tci(rxm, &rxd, rxq->rx_flags);
+   iavf_flex_rxd_to_vlan_tci(rxm, &rxd);
rxq->rxd_to_pkt_fields(rxq, rxm, &rxd);
pkt_flags = iavf_flex_rxd_error_to_pkt_flags(rx_stat_err0);
rxm->ol_flags |= pkt_flags;
@@ -1536,7 +1540,7 @@ iavf_recv_scattered_pkts_flex_rxd(void *rx_queue, struct 
rte_mbuf **rx_pkts,
first_seg->ol_flags = 0;
first_seg->packet_type = ptype_tbl[IAVF_RX_FLEX_DESC_PTYPE_M &
rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
-   iavf_flex_rxd_to_vlan_tci(first_seg, &rxd, rxq->rx_flags);
+   iavf_flex_rxd_to_vlan_tci(first_seg, &rxd);
rxq->rxd_to_pkt_fields(rxq, first_seg, &rxd);
pkt_flags = iavf_flex_rxd_error_to_pkt_flags(rx_stat_err0);
 
@@ -1774,7 +1778,7 @@ iavf_rx_scan_hw_ring_flex_rxd(struct iavf_rx_queue *rxq)
 
mb->packet_type = ptype_tbl[IAVF_RX_FLEX_DESC_PTYPE_M &
rte_le_to_cpu_16(rxdp[j].wb.ptype_flex_flags0)];
-   iavf_flex_rxd_to_vlan_tci(mb, &rxdp[j], rxq->rx_flags);
+   iavf_flex_rxd_to_vlan_tci(mb, &rxdp[j]);
rxq->rxd_to_pkt_fields(rxq, mb, &rxdp[j]);
stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
pkt_flags = iavf_flex_rxd_error_to_pkt_flags(stat_err0);
@@ -2068,190 +2072,302 @@ iavf_xmit_cleanup(struct iavf_tx_queue *txq)
return 0;
 }
 
-/* Check if the context descriptor is needed for TX offloading */
+
+
+static inline void
+iavf_fill_ctx_desc_cmd_field(volatile uint64_t *field, struct rte_mbuf *m)
+{
+   uint64_t cmd = 0;
+
+   /* TSO enabled */
+   if (m->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_UDP_SEG))
+   cmd = IAVF_TX_CTX_DESC_TSO << IAVF_TXD_DATA_QW1_CMD_SHIFT;
+
+   /* Time Sync - Currently not suppo

[dpdk-dev] [PATCH v8 3/7] net/iavf: add support for asynchronous virt channel messages

2021-10-15 Thread Radu Nicolau
Add support for asynchronous virtual channel messages, specifically for
inline IPsec messages.
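
The bookkeeping this patch adds -- a single in-flight command slot plus a
count of expected responses (one for a normal command, two for an async one:
the immediate ack and the later asynchronous reply) -- can be sketched
standalone with C11 atomics (simplified names, not the driver code itself):

```c
#include <stdatomic.h>

enum { OP_UNKNOWN = 0 };

struct vf_state {
	atomic_int pend_cmd;       /* opcode in flight, OP_UNKNOWN if none */
	atomic_int pend_cmd_count; /* responses still expected */
};

/* Claim the single pending-command slot; async commands expect two
 * responses (the immediate ack plus the later async reply). */
static int
set_pending_cmd(struct vf_state *vf, int ops, int async)
{
	int expected = OP_UNKNOWN;

	if (!atomic_compare_exchange_strong(&vf->pend_cmd, &expected, ops))
		return -1; /* another command is still in flight */
	atomic_store(&vf->pend_cmd_count, async ? 2 : 1);
	return 0;
}

/* Called once per response; frees the slot on the last expected one. */
static void
complete_pending_cmd(struct vf_state *vf)
{
	if (atomic_fetch_sub(&vf->pend_cmd_count, 1) == 1)
		atomic_store(&vf->pend_cmd, OP_UNKNOWN);
}
```

The compare-exchange rejects a new command while one is outstanding, which is
what the `PMD_DRV_LOG(ERR, "There is incomplete cmd ...")` path reports.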

Signed-off-by: Declan Doherty 
Signed-off-by: Abhijit Sinha 
Signed-off-by: Radu Nicolau 
Acked-by: Jingjing Wu 
---
 drivers/net/iavf/iavf.h   |  16 
 drivers/net/iavf/iavf_vchnl.c | 138 +-
 2 files changed, 101 insertions(+), 53 deletions(-)

diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index 34bfa9af47..67051f29a8 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -193,6 +193,7 @@ struct iavf_info {
uint64_t supported_rxdid;
uint8_t *proto_xtr; /* proto xtr type for all queues */
volatile enum virtchnl_ops pend_cmd; /* pending command not finished */
+   rte_atomic32_t pend_cmd_count;
int cmd_retval; /* return value of the cmd response from PF */
uint8_t *aq_resp; /* buffer to store the adminq response from PF */
 
@@ -345,9 +346,24 @@ _atomic_set_cmd(struct iavf_info *vf, enum virtchnl_ops 
ops)
if (!ret)
PMD_DRV_LOG(ERR, "There is incomplete cmd %d", vf->pend_cmd);
 
+   rte_atomic32_set(&vf->pend_cmd_count, 1);
+
return !ret;
 }
 
+/* Check there is pending cmd in execution. If none, set new command. */
+static inline int
+_atomic_set_async_response_cmd(struct iavf_info *vf, enum virtchnl_ops ops)
+{
+   int ret = rte_atomic32_cmpset(&vf->pend_cmd, VIRTCHNL_OP_UNKNOWN, ops);
+
+   if (!ret)
+   PMD_DRV_LOG(ERR, "There is incomplete cmd %d", vf->pend_cmd);
+
+   rte_atomic32_set(&vf->pend_cmd_count, 2);
+
+   return !ret;
+}
 int iavf_check_api_version(struct iavf_adapter *adapter);
 int iavf_get_vf_resource(struct iavf_adapter *adapter);
 void iavf_handle_virtchnl_msg(struct rte_eth_dev *dev);
diff --git a/drivers/net/iavf/iavf_vchnl.c b/drivers/net/iavf/iavf_vchnl.c
index 0f4dd21d44..da4654957a 100644
--- a/drivers/net/iavf/iavf_vchnl.c
+++ b/drivers/net/iavf/iavf_vchnl.c
@@ -24,8 +24,8 @@
 #include "iavf.h"
 #include "iavf_rxtx.h"
 
-#define MAX_TRY_TIMES 200
-#define ASQ_DELAY_MS  10
+#define MAX_TRY_TIMES 2000
+#define ASQ_DELAY_MS  1
 
 static uint32_t
 iavf_convert_link_speed(enum virtchnl_link_speed virt_link_speed)
@@ -143,7 +143,8 @@ iavf_read_msg_from_pf(struct iavf_adapter *adapter, 
uint16_t buf_len,
 }
 
 static int
-iavf_execute_vf_cmd(struct iavf_adapter *adapter, struct iavf_cmd_info *args)
+iavf_execute_vf_cmd(struct iavf_adapter *adapter, struct iavf_cmd_info *args,
+   int async)
 {
struct iavf_hw *hw = IAVF_DEV_PRIVATE_TO_HW(adapter);
struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(adapter);
@@ -155,8 +156,14 @@ iavf_execute_vf_cmd(struct iavf_adapter *adapter, struct 
iavf_cmd_info *args)
if (vf->vf_reset)
return -EIO;
 
-   if (_atomic_set_cmd(vf, args->ops))
-   return -1;
+
+   if (async) {
+   if (_atomic_set_async_response_cmd(vf, args->ops))
+   return -1;
+   } else {
+   if (_atomic_set_cmd(vf, args->ops))
+   return -1;
+   }
 
ret = iavf_aq_send_msg_to_pf(hw, args->ops, IAVF_SUCCESS,
args->in_args, args->in_args_size, NULL);
@@ -252,9 +259,11 @@ static void
 iavf_handle_pf_event_msg(struct rte_eth_dev *dev, uint8_t *msg,
uint16_t msglen)
 {
+   struct iavf_adapter *adapter =
+   IAVF_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+   struct iavf_info *vf = &adapter->vf;
struct virtchnl_pf_event *pf_msg =
(struct virtchnl_pf_event *)msg;
-   struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
 
if (msglen < sizeof(struct virtchnl_pf_event)) {
PMD_DRV_LOG(DEBUG, "Error event");
@@ -330,18 +339,40 @@ iavf_handle_virtchnl_msg(struct rte_eth_dev *dev)
case iavf_aqc_opc_send_msg_to_vf:
if (msg_opc == VIRTCHNL_OP_EVENT) {
iavf_handle_pf_event_msg(dev, info.msg_buf,
-   info.msg_len);
+   info.msg_len);
} else {
+   /* check for inline IPsec events */
+   struct inline_ipsec_msg *imsg =
+   (struct inline_ipsec_msg *)info.msg_buf;
+   struct rte_eth_event_ipsec_desc desc;
+   if (msg_opc == VIRTCHNL_OP_INLINE_IPSEC_CRYPTO
+   && imsg->ipsec_opcode ==
+   INLINE_IPSEC_OP_EVENT) {
+   struct virtchnl_ipsec_event *ev =
+   imsg->ipsec_data.event;
+   desc.subtype =
+ 

[dpdk-dev] [PATCH v8 4/7] net/iavf: add iAVF IPsec inline crypto support

2021-10-15 Thread Radu Nicolau
Add support for inline crypto for IPsec, for ESP transport and
tunnel over IPv4 and IPv6, as well as supporting the offload for
ESP over UDP, in conjunction with TSO for UDP and TCP flows.
Implement support for rte_security packet metadata.

Add definitions for IPsec descriptors, and extend offload support
in the data and context descriptors.

Add support to virtual channel mailbox for IPsec Crypto request
operations. IPsec Crypto requests receive an initial acknowledgement
from the physical function driver of receipt of the request and then an
asynchronous response with success/failure of the request, including any
response data.

Add enhanced descriptor debugging

Refactor of scalar tx burst function to support integration of offload

Signed-off-by: Declan Doherty 
Signed-off-by: Abhijit Sinha 
Signed-off-by: Radu Nicolau 
Reviewed-by: Jingjing Wu 
---
 drivers/net/iavf/iavf.h   |   10 +
 drivers/net/iavf/iavf_ethdev.c|   41 +-
 drivers/net/iavf/iavf_generic_flow.c  |   15 +
 drivers/net/iavf/iavf_generic_flow.h  |2 +
 drivers/net/iavf/iavf_ipsec_crypto.c  | 1891 +
 drivers/net/iavf/iavf_ipsec_crypto.h  |  160 ++
 .../net/iavf/iavf_ipsec_crypto_capabilities.h |  383 
 drivers/net/iavf/iavf_rxtx.c  |  202 +-
 drivers/net/iavf/iavf_rxtx.h  |   93 +-
 drivers/net/iavf/iavf_vchnl.c |   30 +
 drivers/net/iavf/meson.build  |3 +-
 drivers/net/iavf/rte_pmd_iavf.h   |1 +
 drivers/net/iavf/version.map  |3 +
 13 files changed, 2813 insertions(+), 21 deletions(-)
 create mode 100644 drivers/net/iavf/iavf_ipsec_crypto.c
 create mode 100644 drivers/net/iavf/iavf_ipsec_crypto.h
 create mode 100644 drivers/net/iavf/iavf_ipsec_crypto_capabilities.h

diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index 67051f29a8..e98c42ba08 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -221,6 +221,7 @@ struct iavf_info {
rte_spinlock_t flow_ops_lock;
struct iavf_parser_list rss_parser_list;
struct iavf_parser_list dist_parser_list;
+   struct iavf_parser_list ipsec_crypto_parser_list;
 
struct iavf_fdir_info fdir; /* flow director info */
/* indicate large VF support enabled or not */
@@ -245,6 +246,7 @@ enum iavf_proto_xtr_type {
IAVF_PROTO_XTR_IPV6_FLOW,
IAVF_PROTO_XTR_TCP,
IAVF_PROTO_XTR_IP_OFFSET,
+   IAVF_PROTO_XTR_IPSEC_CRYPTO_SAID,
IAVF_PROTO_XTR_MAX,
 };
 
@@ -256,11 +258,14 @@ struct iavf_devargs {
uint8_t proto_xtr[IAVF_MAX_QUEUE_NUM];
 };
 
+struct iavf_security_ctx;
+
 /* Structure to store private data for each VF instance. */
 struct iavf_adapter {
struct iavf_hw hw;
struct rte_eth_dev_data *dev_data;
struct iavf_info vf;
+   struct iavf_security_ctx *security_ctx;
 
bool rx_bulk_alloc_allowed;
/* For vector PMD */
@@ -279,6 +284,8 @@ struct iavf_adapter {
(&((struct iavf_adapter *)adapter)->vf)
 #define IAVF_DEV_PRIVATE_TO_HW(adapter) \
(&((struct iavf_adapter *)adapter)->hw)
+#define IAVF_DEV_PRIVATE_TO_IAVF_SECURITY_CTX(adapter) \
+   (((struct iavf_adapter *)adapter)->security_ctx)
 
 /* IAVF_VSI_TO */
 #define IAVF_VSI_TO_HW(vsi) \
@@ -421,5 +428,8 @@ int iavf_set_q_tc_map(struct rte_eth_dev *dev,
uint16_t size);
 void iavf_tm_conf_init(struct rte_eth_dev *dev);
 void iavf_tm_conf_uninit(struct rte_eth_dev *dev);
+int iavf_ipsec_crypto_request(struct iavf_adapter *adapter,
+   uint8_t *msg, size_t msg_len,
+   uint8_t *resp_msg, size_t resp_msg_len);
 extern const struct rte_tm_ops iavf_tm_ops;
 #endif /* _IAVF_ETHDEV_H_ */
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index 7e4d256122..6663e923db 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -30,6 +30,7 @@
 #include "iavf_rxtx.h"
 #include "iavf_generic_flow.h"
 #include "rte_pmd_iavf.h"
+#include "iavf_ipsec_crypto.h"
 
 /* devargs */
 #define IAVF_PROTO_XTR_ARG "proto_xtr"
@@ -71,6 +72,11 @@ static struct iavf_proto_xtr_ol iavf_proto_xtr_params[] = {
[IAVF_PROTO_XTR_IP_OFFSET] = {
.param = { .name = "intel_pmd_dynflag_proto_xtr_ip_offset" },
.ol_flag = &rte_pmd_ifd_dynflag_proto_xtr_ip_offset_mask },
+   [IAVF_PROTO_XTR_IPSEC_CRYPTO_SAID] = {
+   .param = {
+   .name = "intel_pmd_dynflag_proto_xtr_ipsec_crypto_said" },
+   .ol_flag =
+   &rte_pmd_ifd_dynflag_proto_xtr_ipsec_crypto_said_mask },
 };
 
 static int iavf_dev_configure(struct rte_eth_dev *dev);
@@ -938,6 +944,9 @@ iavf_dev_stop(struct rte_eth_dev *dev)
iavf_add_del_mc_addr_list(adapter, vf->mc_addrs, vf->mc_addrs_num,
  false);
 
+   /* free iAVF security device conte

[dpdk-dev] [PATCH v8 5/7] net/iavf: add xstats support for inline IPsec crypto

2021-10-15 Thread Radu Nicolau
Add per-queue counters for maintaining statistics for inline IPsec
crypto offload, which can be retrieved through
rte_security_session_stats_get(), with more detailed errors available
through the rte_ethdev xstats.
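
The xstats table below pairs counter names with offsetof() offsets into an
aggregate stats struct, so any counter can be fetched generically by index.
A minimal sketch of that pattern (hypothetical struct and counter names):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aggregate stats layout for illustration. */
struct eth_stats   { uint64_t rx_bytes; uint64_t tx_bytes; };
struct ipsec_stats { uint64_t icount; };
struct eth_xstats  { struct eth_stats eth; struct ipsec_stats ips; };

struct xstat_name_off { const char *name; size_t offset; };

#define OFF_OF(f) offsetof(struct eth_xstats, f)
static const struct xstat_name_off xstats_strings[] = {
	{ "rx_bytes", OFF_OF(eth.rx_bytes) },
	{ "tx_bytes", OFF_OF(eth.tx_bytes) },
	{ "inline_ipsec_crypto_ipackets", OFF_OF(ips.icount) },
};

/* Fetch one counter by indexing into the aggregate struct. */
static uint64_t
xstat_value(const struct eth_xstats *xs, size_t i)
{
	return *(const uint64_t *)
	       ((const char *)xs + xstats_strings[i].offset);
}
```

Wrapping the old `struct virtchnl_eth_stats` inside a larger aggregate (as the
patch does with `_OFF_OF`) lets the existing table grow new entries without
changing the lookup code.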

Signed-off-by: Declan Doherty 
Signed-off-by: Radu Nicolau 
Acked-by: Jingjing Wu 
---
 drivers/net/iavf/iavf.h| 21 -
 drivers/net/iavf/iavf_ethdev.c | 84 --
 drivers/net/iavf/iavf_rxtx.h   | 12 -
 3 files changed, 89 insertions(+), 28 deletions(-)

diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index e98c42ba08..90a7344bd5 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -96,6 +96,25 @@ struct iavf_adapter;
 struct iavf_rx_queue;
 struct iavf_tx_queue;
 
+
+struct iavf_ipsec_crypto_stats {
+   uint64_t icount;
+   uint64_t ibytes;
+   struct {
+   uint64_t count;
+   uint64_t sad_miss;
+   uint64_t not_processed;
+   uint64_t icv_check;
+   uint64_t ipsec_length;
+   uint64_t misc;
+   } ierrors;
+};
+
+struct iavf_eth_xstats {
+   struct virtchnl_eth_stats eth_stats;
+   struct iavf_ipsec_crypto_stats ips_stats;
+};
+
 /* Structure that defines a VSI, associated with a adapter. */
 struct iavf_vsi {
struct iavf_adapter *adapter; /* Backreference to associated adapter */
@@ -105,7 +124,7 @@ struct iavf_vsi {
uint16_t max_macaddrs;   /* Maximum number of MAC addresses */
uint16_t base_vector;
uint16_t msix_intr;  /* The MSIX interrupt binds to VSI */
-   struct virtchnl_eth_stats eth_stats_offset;
+   struct iavf_eth_xstats eth_stats_offset;
 };
 
 struct rte_flow;
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index 6663e923db..8f35107f3a 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -90,6 +90,7 @@ static const uint32_t *iavf_dev_supported_ptypes_get(struct 
rte_eth_dev *dev);
 static int iavf_dev_stats_get(struct rte_eth_dev *dev,
 struct rte_eth_stats *stats);
 static int iavf_dev_stats_reset(struct rte_eth_dev *dev);
+static int iavf_dev_xstats_reset(struct rte_eth_dev *dev);
 static int iavf_dev_xstats_get(struct rte_eth_dev *dev,
 struct rte_eth_xstat *xstats, unsigned int n);
 static int iavf_dev_xstats_get_names(struct rte_eth_dev *dev,
@@ -145,21 +146,37 @@ struct rte_iavf_xstats_name_off {
unsigned int offset;
 };
 
+#define _OFF_OF(a) offsetof(struct iavf_eth_xstats, a)
 static const struct rte_iavf_xstats_name_off rte_iavf_stats_strings[] = {
-   {"rx_bytes", offsetof(struct iavf_eth_stats, rx_bytes)},
-   {"rx_unicast_packets", offsetof(struct iavf_eth_stats, rx_unicast)},
-   {"rx_multicast_packets", offsetof(struct iavf_eth_stats, rx_multicast)},
-   {"rx_broadcast_packets", offsetof(struct iavf_eth_stats, rx_broadcast)},
-   {"rx_dropped_packets", offsetof(struct iavf_eth_stats, rx_discards)},
+   {"rx_bytes", _OFF_OF(eth_stats.rx_bytes)},
+   {"rx_unicast_packets", _OFF_OF(eth_stats.rx_unicast)},
+   {"rx_multicast_packets", _OFF_OF(eth_stats.rx_multicast)},
+   {"rx_broadcast_packets", _OFF_OF(eth_stats.rx_broadcast)},
+   {"rx_dropped_packets", _OFF_OF(eth_stats.rx_discards)},
{"rx_unknown_protocol_packets", offsetof(struct iavf_eth_stats,
rx_unknown_protocol)},
-   {"tx_bytes", offsetof(struct iavf_eth_stats, tx_bytes)},
-   {"tx_unicast_packets", offsetof(struct iavf_eth_stats, tx_unicast)},
-   {"tx_multicast_packets", offsetof(struct iavf_eth_stats, tx_multicast)},
-   {"tx_broadcast_packets", offsetof(struct iavf_eth_stats, tx_broadcast)},
-   {"tx_dropped_packets", offsetof(struct iavf_eth_stats, tx_discards)},
-   {"tx_error_packets", offsetof(struct iavf_eth_stats, tx_errors)},
+   {"tx_bytes", _OFF_OF(eth_stats.tx_bytes)},
+   {"tx_unicast_packets", _OFF_OF(eth_stats.tx_unicast)},
+   {"tx_multicast_packets", _OFF_OF(eth_stats.tx_multicast)},
+   {"tx_broadcast_packets", _OFF_OF(eth_stats.tx_broadcast)},
+   {"tx_dropped_packets", _OFF_OF(eth_stats.tx_discards)},
+   {"tx_error_packets", _OFF_OF(eth_stats.tx_errors)},
+
+   {"inline_ipsec_crypto_ipackets", _OFF_OF(ips_stats.icount)},
+   {"inline_ipsec_crypto_ibytes", _OFF_OF(ips_stats.ibytes)},
+   {"inline_ipsec_crypto_ierrors", _OFF_OF(ips_stats.ierrors.count)},
+   {"inline_ipsec_crypto_ierrors_sad_lookup",
+   _OFF_OF(ips_stats.ierrors.sad_miss)},
+   {"inline_ipsec_crypto_ierrors_not_processed",
+   _OFF_OF(ips_stats.ierrors.not_processed)},
+   {"inline_ipsec_crypto_ierrors_icv_fail",
+   _OFF_OF(ips_stats.ierrors.icv_check)},
+   {"inline_ipsec_crypto_ierrors_length",
+   _OFF_OF(ips_stats.ierrors.ipsec_length)},
+   {"inline_ipsec_crypto

[dpdk-dev] [PATCH v8 6/7] net/iavf: add watchdog for VFLR

2021-10-15 Thread Radu Nicolau
Add a watchdog to the iAVF PMD which supports monitoring the VFLR
register. If the device is not already in reset and a VF reset in
progress is detected, notify the user through a callback and enter the
reset state. If the device is already in reset, poll for completion of
the reset.

The watchdog is disabled by default; to enable it, set
IAVF_DEV_WATCHDOG_PERIOD to a non-zero value (microseconds).
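
The per-tick watchdog logic reduces to a two-state machine: if in reset, poll
for completion; otherwise, watch for a new VFLR event. A standalone model of
just that logic (the VFLR register read and the application callback are
abstracted away as a flag and a counter):

```c
#include <stdbool.h>

/* Simplified model of the watchdog state machine. */
struct wd_state {
	bool in_reset;
	int reset_events; /* times a VFLR was reported to the app */
};

static void
watchdog_tick(struct wd_state *wd, bool vfr_in_progress)
{
	if (wd->in_reset) {
		if (!vfr_in_progress)
			wd->in_reset = false; /* reset has completed */
	} else if (vfr_in_progress) {
		/* new VFLR event: enter reset and notify the application */
		wd->in_reset = true;
		wd->reset_events++;
	}
}
```

In the driver the tick is driven by a self-re-arming EAL alarm
(rte_eal_alarm_set() at the end of each callback), as the diff below shows.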

Signed-off-by: Declan Doherty 
Signed-off-by: Radu Nicolau 
---
 drivers/net/iavf/iavf.h|  5 ++
 drivers/net/iavf/iavf_ethdev.c | 94 ++
 2 files changed, 99 insertions(+)

diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index 90a7344bd5..f06979b4da 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -31,6 +31,8 @@
 
 #define IAVF_NUM_MACADDR_MAX  64
 
+#define IAVF_DEV_WATCHDOG_PERIOD 0
+
 #define IAVF_DEFAULT_RX_PTHRESH  8
 #define IAVF_DEFAULT_RX_HTHRESH  8
 #define IAVF_DEFAULT_RX_WTHRESH  0
@@ -216,6 +218,9 @@ struct iavf_info {
int cmd_retval; /* return value of the cmd response from PF */
uint8_t *aq_resp; /* buffer to store the adminq response from PF */
 
+   /** iAVF watchdog enable */
+   bool watchdog_enabled;
+
/* Event from pf */
bool dev_closed;
bool link_up;
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index 8f35107f3a..9df9aeae7f 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "iavf.h"
 #include "iavf_rxtx.h"
@@ -240,6 +241,91 @@ iavf_tm_ops_get(struct rte_eth_dev *dev __rte_unused,
return 0;
 }
 
+__rte_unused
+static int
+iavf_vfr_inprogress(struct iavf_hw *hw)
+{
+   int inprogress = 0;
+
+   if ((IAVF_READ_REG(hw, IAVF_VFGEN_RSTAT) &
+   IAVF_VFGEN_RSTAT_VFR_STATE_MASK) ==
+   VIRTCHNL_VFR_INPROGRESS)
+   inprogress = 1;
+
+   if (inprogress)
+   PMD_DRV_LOG(INFO, "Watchdog detected VFR in progress");
+
+   return inprogress;
+}
+
+__rte_unused
+static void
+iavf_dev_watchdog(void *cb_arg)
+{
+   struct iavf_adapter *adapter = cb_arg;
+   struct iavf_hw *hw = IAVF_DEV_PRIVATE_TO_HW(adapter);
+   int vfr_inprogress = 0, rc = 0;
+
+   /* check if watchdog has been disabled since last call */
+   if (!adapter->vf.watchdog_enabled)
+   return;
+
+   /* If in reset then poll vfr_inprogress register for completion */
+   if (adapter->vf.vf_reset) {
+   vfr_inprogress = iavf_vfr_inprogress(hw);
+
+   if (!vfr_inprogress) {
+   PMD_DRV_LOG(INFO, "VF \"%s\" reset has completed",
+   adapter->vf.eth_dev->data->name);
+   adapter->vf.vf_reset = false;
+   }
+   /* If not in reset then poll vfr_inprogress register for VFLR event */
+   } else {
+   vfr_inprogress = iavf_vfr_inprogress(hw);
+
+   if (vfr_inprogress) {
+   PMD_DRV_LOG(INFO,
+   "VF \"%s\" reset event detected by watchdog",
+   adapter->vf.eth_dev->data->name);
+
+   /* enter reset state with VFLR event */
+   adapter->vf.vf_reset = true;
+
+   rte_eth_dev_callback_process(adapter->vf.eth_dev,
+   RTE_ETH_EVENT_INTR_RESET, NULL);
+   }
+   }
+
+   /* re-alarm watchdog */
+   rc = rte_eal_alarm_set(IAVF_DEV_WATCHDOG_PERIOD,
+   &iavf_dev_watchdog, cb_arg);
+
+   if (rc)
+   PMD_DRV_LOG(ERR, "Failed \"%s\" to reset device watchdog alarm",
+   adapter->vf.eth_dev->data->name);
+}
+
+static void
+iavf_dev_watchdog_enable(struct iavf_adapter *adapter __rte_unused)
+{
+#if (IAVF_DEV_WATCHDOG_PERIOD > 0)
+   PMD_DRV_LOG(INFO, "Enabling device watchdog");
+   adapter->vf.watchdog_enabled = true;
+   if (rte_eal_alarm_set(IAVF_DEV_WATCHDOG_PERIOD,
+   &iavf_dev_watchdog, (void *)adapter))
+   PMD_DRV_LOG(ERR, "Failed to enabled device watchdog");
+#endif
+}
+
+static void
+iavf_dev_watchdog_disable(struct iavf_adapter *adapter __rte_unused)
+{
+#if (IAVF_DEV_WATCHDOG_PERIOD > 0)
+   PMD_DRV_LOG(INFO, "Disabling device watchdog");
+   adapter->vf.watchdog_enabled = false;
+#endif
+}
+
 static int
 iavf_set_mc_addr_list(struct rte_eth_dev *dev,
struct rte_ether_addr *mc_addrs,
@@ -2497,6 +2583,11 @@ iavf_dev_init(struct rte_eth_dev *eth_dev)
 
iavf_default_rss_disable(adapter);
 
+
+   /* Start device watchdog */
+   iavf_dev_watchdog_enable(adapter);
+
+
return 0;
 
 flow_init_err:
@@ -2580,6 +2671,9 @@ iavf_dev_close(struct rte_eth_dev *dev)
if (vf->vf_reset && !rte_pci_set_bus_master(pci_dev,

[dpdk-dev] [PATCH v8 7/7] net/iavf: update doc with inline crypto support

2021-10-15 Thread Radu Nicolau
Update the PMD doc, feature matrix and release notes with the
new inline crypto feature.

Signed-off-by: Radu Nicolau 
---
 doc/guides/nics/features/iavf.ini  |  2 ++
 doc/guides/nics/intel_vf.rst   | 10 ++
 doc/guides/rel_notes/release_21_11.rst |  1 +
 3 files changed, 13 insertions(+)

diff --git a/doc/guides/nics/features/iavf.ini 
b/doc/guides/nics/features/iavf.ini
index d00ca934c3..78f649c25f 100644
--- a/doc/guides/nics/features/iavf.ini
+++ b/doc/guides/nics/features/iavf.ini
@@ -28,6 +28,7 @@ L4 checksum offload  = P
 Packet type parsing  = Y
 Rx descriptor status = Y
 Tx descriptor status = Y
+Inline crypto= Y
 Basic stats  = Y
 Multiprocess aware   = Y
 FreeBSD  = Y
@@ -64,3 +65,4 @@ mark = Y
 passthru = Y
 queue= Y
 rss  = Y
+security = Y
diff --git a/doc/guides/nics/intel_vf.rst b/doc/guides/nics/intel_vf.rst
index 2efdd1a41b..038e7c02b6 100644
--- a/doc/guides/nics/intel_vf.rst
+++ b/doc/guides/nics/intel_vf.rst
@@ -633,3 +633,13 @@ Windows Support
 
 *   To load NetUIO driver, follow the steps mentioned in `dpdk-kmods repository
 `_.
+
+
+Inline IPsec Support
+
+
+*   IAVF PMD supports inline crypto processing depending on the underlying
+hardware crypto capabilities. IPsec Security Gateway Sample Application
+supports inline IPsec processing for IAVF PMD. For more details see the
+IPsec Security Gateway Sample Application and Security library
+documentation.
diff --git a/doc/guides/rel_notes/release_21_11.rst 
b/doc/guides/rel_notes/release_21_11.rst
index 7bb8768b67..3703d11369 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -95,6 +95,7 @@ New Features
 
   * Added Intel iavf support on Windows.
   * Added IPv4 and L4 (TCP/UDP/SCTP) checksum hash support in RSS flow.
+  * Added Intel iavf inline crypto support.
 
 * **Updated Intel ice driver.**
 
-- 
2.25.1



[dpdk-dev] [PATCH] net/enic: fix build with GCC 7.5

2021-10-15 Thread Ferruh Yigit
Build error:
../drivers/net/enic/enic_fm_flow.c: In function 'enic_fm_flow_parse':
../drivers/net/enic/enic_fm_flow.c:1467:24:
error: 'dev' may be used uninitialized in this function
[-Werror=maybe-uninitialized]
struct rte_eth_dev *dev;
^~~
../drivers/net/enic/enic_fm_flow.c:1580:24:
error: 'dev' may be used uninitialized in this function
[-Werror=maybe-uninitialized]
struct rte_eth_dev *dev;
^~~
../drivers/net/enic/enic_fm_flow.c:1599:24:
error: 'dev' may be used uninitialized in this function
[-Werror=maybe-uninitialized]
struct rte_eth_dev *dev;
^~~

The build error looks like a false positive, but to silence the compiler,
the pointer is initialized with NULL.

Fixes: 7968917ccf64 ("net/enic: support meta flow actions to overrule 
destinations")

Reported-by: David Marchand 
Signed-off-by: Ferruh Yigit 
---
Cc: andrew.rybche...@oktetlabs.ru

I am not sure about the solution and I don't have an environment to verify;
sending this patch to verify the solution in CI and trigger discussion
of the fix.
The patch is still in next-net, when a proper fix is found, it can be
squashed in next-net.
---
 drivers/net/enic/enic_fm_flow.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/enic/enic_fm_flow.c b/drivers/net/enic/enic_fm_flow.c
index 4092ff1f6154..2c60bb864e23 100644
--- a/drivers/net/enic/enic_fm_flow.c
+++ b/drivers/net/enic/enic_fm_flow.c
@@ -1464,7 +1464,7 @@ enic_fm_copy_action(struct enic_flowman *fm,
}
case RTE_FLOW_ACTION_TYPE_PORT_ID: {
const struct rte_flow_action_port_id *port;
-   struct rte_eth_dev *dev;
+   struct rte_eth_dev *dev = NULL;
 
if (!ingress && (overlap & PORT_ID)) {
ENICPMD_LOG(DEBUG, "cannot have multiple egress 
PORT_ID actions");
@@ -1577,7 +1577,7 @@ enic_fm_copy_action(struct enic_flowman *fm,
}
case RTE_FLOW_ACTION_TYPE_PORT_REPRESENTOR: {
const struct rte_flow_action_ethdev *ethdev;
-   struct rte_eth_dev *dev;
+   struct rte_eth_dev *dev = NULL;
 
ethdev = actions->conf;
ret = enic_fm_check_transfer_dst(enic, ethdev->port_id,
@@ -1596,7 +1596,7 @@ enic_fm_copy_action(struct enic_flowman *fm,
}
case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT: {
const struct rte_flow_action_ethdev *ethdev;
-   struct rte_eth_dev *dev;
+   struct rte_eth_dev *dev = NULL;
 
if (overlap & PORT_ID) {
ENICPMD_LOG(DEBUG, "cannot have multiple egress 
PORT_ID actions");
-- 
2.31.1



Re: [dpdk-dev] [PATCH v6 1/3] net/thunderx: enable build only on 64-bit Linux

2021-10-15 Thread Ferruh Yigit

On 10/14/2021 8:56 PM, pbhagavat...@marvell.com wrote:

From: Pavan Nikhilesh 

Since the AARCH32 extension is not implemented on the thunderx family, only
enable the build for 64-bit.
Due to a Linux kernel AF (Admin Function) driver dependency, only enable
the build for Linux.



Hi Pavan,

Perhaps this patch took more time than it should, but according to Jerin's
description the problem is that the SoC can't run 32-bit applications.

Why do you still mention the kernel driver dependency? It looks like
that dependency is not the reason to not compile a 32-bit app, am I
missing something?


Signed-off-by: Pavan Nikhilesh 
Acked-by: Jerin Jacob 
---
  v6 Changes:
  - Update commit log to describe why 32bit is not supported.
  v5 Changes:
  - s/fuction/function.
  v4 Changes:
  - Update commit message regarding dependency on AF driver.

  drivers/net/thunderx/meson.build | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/thunderx/meson.build b/drivers/net/thunderx/meson.build
index 4bbcea7f93..da665bd76f 100644
--- a/drivers/net/thunderx/meson.build
+++ b/drivers/net/thunderx/meson.build
@@ -1,9 +1,9 @@
  # SPDX-License-Identifier: BSD-3-Clause
  # Copyright(c) 2017 Cavium, Inc

-if is_windows
+if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
  build = false
-reason = 'not supported on Windows'
+reason = 'only supported on 64-bit Linux'
  subdir_done()
  endif

--
2.17.1





Re: [dpdk-dev] [PATCH v4 2/4] mempool: add non-IO flag

2021-10-15 Thread Dmitry Kozlyuk
Hello David,

> [...]
> - When rebasing on main, you probably won't be able to call this new flag.
> The diff should be something like:
> 
> diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c index
> d886f4800c..35c80291fa 100644
> --- a/app/test/test_mempool.c
> +++ b/app/test/test_mempool.c
> @@ -214,7 +214,7 @@ static int
> test_mempool_creation_with_unknown_flag(void)
> MEMPOOL_ELT_SIZE, 0, 0,
> NULL, NULL,
> NULL, NULL,
> -   SOCKET_ID_ANY, MEMPOOL_F_NO_IOVA_CONTIG << 1);
> +   SOCKET_ID_ANY, MEMPOOL_F_NON_IO << 1);
> 
> if (mp_cov != NULL) {
> rte_mempool_free(mp_cov); diff --git
> a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c index
> 8d5f99f7e7..27d197fe86 100644
> --- a/lib/mempool/rte_mempool.c
> +++ b/lib/mempool/rte_mempool.c
> @@ -802,6 +802,7 @@ rte_mempool_cache_free(struct rte_mempool_cache
> *cache)
> | MEMPOOL_F_SC_GET \
> | MEMPOOL_F_POOL_CREATED \
> | MEMPOOL_F_NO_IOVA_CONTIG \
> +   | MEMPOOL_F_NON_IO \

I wonder why CREATED and NON_IO should be listed here:
they are not supposed to be passed by the user,
which is what MEMPOOL_KNOWN_FLAGS is used for.
The same question stands for the test code.
Could you confirm your suggestion?

> )
>  /* create an empty mempool */
>  struct rte_mempool *
> 
> 
> - While grepping, I noticed that proc-info also dumps mempool flags.
> This could be something to enhance, maybe amending current
> rte_mempool_dump() and having this tool use it.
> But for now, can you update this tool too?

I will, thanks for the hints.
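
For context, the enforcement being discussed is just a mask test: any flag bit
outside the accepted set is rejected at creation time. A minimal sketch with
made-up flag names (the open question above is exactly which internally-set
flags belong in the accepted mask):

```c
/* Hypothetical flag names, mirroring the mempool-style validation. */
#define F_NO_SPREAD      (1u << 0)
#define F_NO_CACHE_ALIGN (1u << 1)
#define F_NO_IOVA_CONTIG (1u << 2)
#define F_POOL_CREATED   (1u << 3) /* set internally, never by the user */
#define KNOWN_USER_FLAGS (F_NO_SPREAD | F_NO_CACHE_ALIGN | F_NO_IOVA_CONTIG)

/* Reject any flag bit outside the user-settable set. */
static int
validate_create_flags(unsigned int flags)
{
	return (flags & ~KNOWN_USER_FLAGS) ? -1 : 0;
}
```

The test case in the patch passes "highest known flag << 1" for exactly this
reason: it is the first bit guaranteed to be outside the accepted mask.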


Re: [dpdk-dev] [PATCH v13 0/2] testpmd shows incorrect rx_offload configuration

2021-10-15 Thread Ferruh Yigit

On 10/14/2021 1:56 PM, Ferruh Yigit wrote:

On 10/14/2021 11:31 AM, Jie Wang wrote:

Launch testpmd with multiple queues, and check rx_offload info.

When testpmd shows the port configuration, it doesn't show RSS_HASH.

---
v13:
  - update the API comment.
  - fix the bug that testpmd failed to run test_pf_tx_rx_queue test case.
v12: update the commit log and the API comment.
v11:
  - update the commit log.
  - rename the function and variable name.
v10:
  - update the commit log.
  - merge the first two patches.
  - rename the new API name.
v9:
  - add a release notes update for the new API.
  - update the description of the new API.
  - optimize the new API.
  - optimize the assignment of the offloads.
v8: delete "rte_exit" and just print error log.
v7:
  - delete struct "rte_eth_dev_conf_info", and reuse struct "rte_eth_conf".
  - add "__rte_experimental" to the new API "rte_eth_dev_conf_info_get" 
declaration.
v6: split this patch into two patches.
v5: add an API to get device configuration info.
v4: delete the whitespace at the end of the line.
v3:
  - check and update the "offloads" of "port->dev_conf.rx/txmode".
  - update the commit log.
v2: copy "rx/txmode.offloads", instead of copying the entire struct
"dev->data->dev_conf.rx/txmode".

Jie Wang (2):
   ethdev: add an API to get device configuration
   app/testpmd: fix testpmd doesn't show RSS hash offload



For series,
Reviewed-by: Ferruh Yigit 

I am waiting CI result to proceed with patch.



CI failed to apply the patch :(, I am proceeding with the patch.


Re: [dpdk-dev] [PATCH] net/bonding: fix Rx queue data destroyed by Tx queue release

2021-10-15 Thread Ferruh Yigit

On 10/15/2021 10:55 AM, Xueming Li wrote:

When releasing a Tx queue, the Rx queue data got freed because the
wrong Tx queue data was located.

This patch fixes the wrong Tx queue data location.

Fixes: 7483341ae553 ("ethdev: change queue release callback")
Signed-off-by: Xueming Li 


Reviewed-by: Ferruh Yigit 

Applied to dpdk-next-net/main, thanks.


Re: [dpdk-dev] [PATCH 1/5] hash: add new toeplitz hash implementation

2021-10-15 Thread Ananyev, Konstantin

> >> +/**
> >> + * Calculate Toeplitz hash.
> >> + *
> >> + * @warning
> >> + * @b EXPERIMENTAL: this API may change without prior notice.
> >> + *
> >> + * @param m
> >> + *  Pointer to the matrices generated from the corresponding
> >> + *  RSS hash key using rte_thash_complete_matrix().
> >> + * @param tuple
> >> + *  Pointer to the data to be hashed. Data must be in network byte order.
> >> + * @param len
> >> + *  Length of the data to be hashed.
> >> + * @return
> >> + *  Calculated Toeplitz hash value.
> >> + */
> >> +__rte_experimental
> >> +static inline uint32_t
> >> +rte_thash_gfni(uint64_t *m, uint8_t *tuple, int len)
> >> +{
> >> +  uint32_t val, val_zero;
> >> +
> >> +  __m512i xor_acc = __rte_thash_gfni(m, tuple, NULL, len);
> >> +  __rte_thash_xor_reduce(xor_acc, &val, &val_zero);
> >> +
> >> +  return val;
> >> +}
> >> +
> >> +/**
> >> + * Calculate Toeplitz hash for two independent data buffers.
> >> + *
> >> + * @warning
> >> + * @b EXPERIMENTAL: this API may change without prior notice.
> >> + *
> >> + * @param m
> >> + *  Pointer to the matrices generated from the corresponding
> >> + *  RSS hash key using rte_thash_complete_matrix().
> >> + * @param tuple_1
> >> + *  Pointer to the data to be hashed. Data must be in network byte order.
> >> + * @param tuple_2
> >> + *  Pointer to the data to be hashed. Data must be in network byte order.
> >> + * @param len
> >> + *  Length of the largest data buffer to be hashed.
> >> + * @param val_1
> >> + *  Pointer to uint32_t where to put calculated Toeplitz hash value for
> >> + *  the first tuple.
> >> + * @param val_2
> >> + *  Pointer to uint32_t where to put calculated Toeplitz hash value for
> >> + *  the second tuple.
> >> + */
> >> +__rte_experimental
> >> +static inline void
> >> +rte_thash_gfni_x2(uint64_t *mtrx, uint8_t *tuple_1, uint8_t *tuple_2, int len,
> >> +  uint32_t *val_1, uint32_t *val_2)
> >
> > Why just two?
> > Why not uint8_t *tuple[]
> > ?
> >
> 
> x2 version was added because there was unused space inside the ZMM which
> holds input key (input tuple) bytes for a second input key, so it helps
> to improve performance in some cases.
> Bulk version wasn't added because for the vast majority of cases it will
> be used with a single input key.
> Hiding this function inside .c will greatly affect performance, because
> it takes just a few cycles to calculate the hash for the most popular
> key sizes.

Ok, but it is still unclear to me why only two?
What stops you from doing:
static inline void
rte_thash_gfni_bulk(const uint64_t *mtrx, uint32_t len, uint8_t *tuple[],
	uint32_t val[], uint32_t num)
{
	uint32_t i, val_zero;
	__m512i xor_acc;

	for (i = 0; i != (num & ~1); i += 2) {
		xor_acc = __rte_thash_gfni(mtrx, tuple[i], tuple[i + 1], len);
		__rte_thash_xor_reduce(xor_acc, val + i, val + i + 1);
	}
	if (num & 1) {
		xor_acc = __rte_thash_gfni(mtrx, tuple[i], NULL, len);
		__rte_thash_xor_reduce(xor_acc, val + i, &val_zero);
	}
}
?



Re: [dpdk-dev] [PATCH] driver: i40evf device initialization

2021-10-15 Thread Xueming(Steven) Li
On Wed, 2021-10-13 at 09:21 -0400, Ben Magistro wrote:
> Hello,
> 
> Replying here as I'm a little stuck and hoping someone has some
> advice for what the next steps should be.
> 
> Going from the list above of how to get this noticed by the LTS
> maintainer(s), the patch, well commit message + subject were revised
> and resent to the list
> (https://patches.dpdk.org/project/dpdk/patch/20211012141752.6376-1-
> konce...@gmail.com/) but the i40evf has since been removed from main
> already so options 1 & 2 seem to no longer apply.  This seems to put
> us into option 3 of a backported patch?  Is it just a subject line
> change then or can this be pulled out of the "not applicable" pile
> still?

Hi Ben,

Since it doesn't apply to upstream, option 3 should be good enough,
just add [20.11] or [19.11] as prefix and send to sta...@dpdk.org.

Regards,
Xueming Li


> 
> Thanks and appreciate the advice,
> 
> Ben Magistro
> 
> On Mon, Sep 13, 2021 at 10:52 PM Ben Magistro 
> wrote:
> > +cc: sta...@dpdk.org
> > 
> > Per discussions here, cc'ing stable for fix to be applied to LTS as
> > i40evf is being removed from next.
> > 
> > On Thu, Sep 2, 2021 at 8:37 AM Xueming(Steven) Li
> >  wrote:
> > > 
> > > 
> > > 
> > > > -Original Message-
> > > > From: Ferruh Yigit 
> > > > Sent: Monday, August 30, 2021 5:43 PM
> > > > To: Xueming(Steven) Li ; Kevin Traynor
> > > > ; Ben Magistro ;
> > > > dev@dpdk.org; Beilei Xing ; Luca
> > > > Boccassi ; Christian Ehrhardt
> > > > 
> > > > Cc: ben.magis...@trinitycyber.com;
> > > > stefan.baran...@trinitycyber.com; Qi Zhang
> > > > 
> > > > Subject: Re: [dpdk-dev] [PATCH] driver: i40evf device
> > > > initialization
> > > > 
> > > > On 8/27/2021 7:28 AM, Xueming(Steven) Li wrote:
> > > > > 
> > > > > 
> > > > > > -Original Message-
> > > > > > From: Kevin Traynor 
> > > > > > Sent: Thursday, August 26, 2021 6:46 PM
> > > > > > To: Ferruh Yigit ; Ben Magistro
> > > > > > ; dev@dpdk.org; Beilei Xing
> > > > > > ; Luca Boccassi ;
> > > > > > Christian
> > > > > > Ehrhardt ;
> > > > > > Xueming(Steven) Li
> > > > > > 
> > > > > > Cc: ben.magis...@trinitycyber.com;
> > > > > > stefan.baran...@trinitycyber.com;
> > > > > > Qi Zhang 
> > > > > > Subject: Re: [dpdk-dev] [PATCH] driver: i40evf device
> > > > > > initialization
> > > > > > 
> > > > > > + Christian and Xueming
> > > > > > 
> > > > > > On 26/08/2021 11:25, Ferruh Yigit wrote:
> > > > > > > On 8/25/2021 8:45 PM, Ben Magistro wrote:
> > > > > > > > The i40evf driver is not initializing the eth_dev
> > > > > > > > attribute which
> > > > > > > > can result in a nullptr dereference. Changes were
> > > > > > > > modeled after the
> > > > > > > > iavf_dev_init() per suggestion from the mailing
> > > > > > > > list[1].
> > > > > > > > 
> > > > > > > > [1] https://mails.dpdk.org/archives/dev/2021-
> > > > > > > > August/217251.html
> > > > > > > > 
> > > > > > > > Signed-off-by: Ben Magistro 
> > > > > > > 
> > > > > > > i40evf will be removed in this release. But I guess it
> > > > > > > helps for
> > > > > > > stable releases to first merge the fixes and later
> > > > > > > removed it, not sure.
> > > > > > > 
> > > > > > > @Luca, @Kevin, do you prefer this patch directly to
> > > > > > > stable repos, or
> > > > > > > through the main repo?
> > > > > > 
> > > > > > I'll leave to Luca/Xueming and Christian to say if they
> > > > > > have a
> > > > > > preference, but I'd guess either way is fine from stable
> > > > > > view once it has fixes/stable tags or LTS patch prefix (it
> > > > > > doesn't have any of
> > > > these at present).
> > > > > 
> > > > > Yes, any option will make it being noticed by LTS maintainer:
> > > > > 1. patches accepted by main with "fix" in subject 2. patches
> > > > > accepted
> > > > > by main with "cc: sta...@dpdk.org" in commit message 3.
> > > > > patches
> > > > > backported to LTS, sent to stable maillist with LTS prefix,
> > > > > for example "[20.11]"
> > > > > 
> > > > 
> > > > Thanks Xueming,
> > > > 
> > > > But is there a preferences for this case?
> > > > 
> > > > The i40evf will be removed from main repo, is it better
> > > > 1- first apply the fix and remove the component from main (I
> > > > assume fix still will be bacported to LTS in this case) or
> > > > 2- remove the i40evf from main (without fix), apply the fix
> > > > directly to the LTS.
> > > 
> > > Both options will work; the first is easier and more common, I
> > > guess. Both the 19.11 and 20.11 LTS maintainers can find it.
> > > 
> > > > 
> > > > Thanks,
> > > > ferruh
> > > > 
> > > > > > 
> > > > > > > i40evf won't be tested in the main anyway, since it would
> > > > > > > be removed
> > > > > > > before -rc1 testing, so it looks like there won't be any
> > > > > > > difference from testing point of view.
> > > > > > > 
> > > > > > > 
> > > > > > > > ---
> > > > > > > >   drivers/net/i40e/i40e_ethdev_vf.c | 8 ++--
> > > > > > > >   1 file changed, 6 insertions(+), 2 deletions(-)
> > > > > > > > 
> > > > > > > 

Re: [dpdk-dev] [PATCH v3 1/3] ethdev: support PPP and L2TPV2 protocol

2021-10-15 Thread Ferruh Yigit

On 10/15/2021 10:58 AM, Jie Wang wrote:

+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ * RTE_FLOW_ITEM_TYPE_PPP
+ *
+ * Matches PPP Header
+ */
+struct rte_flow_item_ppp {
+   uint8_t addr; /**< ppp address(8) */
+   uint8_t ctrl; /**< ppp control(8) */
+   rte_be16_t proto_id; /**< ppp protocol id(16) */
+};


Hi Jie,

Can't we do the same thing for PPP: have the protocol header in lib/net
and use it within 'rte_flow_item_ppp'?


Re: [dpdk-dev] [PATCH] net/enic: fix build with GCC 7.5

2021-10-15 Thread David Marchand
On Fri, Oct 15, 2021 at 12:28 PM Ferruh Yigit  wrote:
>
> Build error:
> ../drivers/net/enic/enic_fm_flow.c: In function 'enic_fm_flow_parse':
> ../drivers/net/enic/enic_fm_flow.c:1467:24:
> error: 'dev' may be used uninitialized in this function
> [-Werror=maybe-uninitialized]
> struct rte_eth_dev *dev;
> ^~~
> ../drivers/net/enic/enic_fm_flow.c:1580:24:
> error: 'dev' may be used uninitialized in this function
> [-Werror=maybe-uninitialized]
> struct rte_eth_dev *dev;
> ^~~
> ../drivers/net/enic/enic_fm_flow.c:1599:24:
> error: 'dev' may be used uninitialized in this function
> [-Werror=maybe-uninitialized]
> struct rte_eth_dev *dev;
> ^~~
>
> The build error looks like a false positive, but to silence the compiler,
> initialize the pointer with NULL.

enic_fm_check_transfer_dst() contains branches where dev is not set,
and those branches return the rte_flow_error_set() return value.
dev is dereferenced later when this return value == 0.

So the compiler probably thinks that rte_flow_error_set may return 0.
rte_flow_error_set is outside of compiler "view" at the moment it
compiles enic_fm_flow.c, so the compiler making the assumption this
function can return 0 is being prudent from my pov.


>
> Fixes: 7968917ccf64 ("net/enic: support meta flow actions to overrule destinations")
>
> Reported-by: David Marchand 
> Signed-off-by: Ferruh Yigit 

Your fix looks good in any case.

Reviewed-by: David Marchand 


-- 
David Marchand



Re: [dpdk-dev] [PATCH v13 0/2] testpmd shows incorrect rx_offload configuration

2021-10-15 Thread Ferruh Yigit

On 10/14/2021 1:56 PM, Ferruh Yigit wrote:

On 10/14/2021 11:31 AM, Jie Wang wrote:

Launch testpmd with multiple queues, and check rx_offload info.

When testpmd shows the port configuration, it doesn't show RSS_HASH.

---
v13:
  - update the API comment.
  - fix the bug that testpmd failed to run test_pf_tx_rx_queue test case.
v12: update the commit log and the API comment.
v11:
  - update the commit log.
  - rename the function and variable name.
v10:
  - update the commit log.
  - merge the first two patches.
  - rename the new API name.
v9:
  - add a release notes update for the new API.
  - update the description of the new API.
  - optimize the new API.
  - optimize the assignment of the offloads.
v8: delete "rte_exit" and just print error log.
v7:
  - delete struct "rte_eth_dev_conf_info", and reuse struct "rte_eth_conf".
  - add "__rte_experimental" to the new API "rte_eth_dev_conf_info_get" 
declaration.
v6: split this patch into two patches.
v5: add an API to get device configuration info.
v4: delete the whitespace at the end of the line.
v3:
  - check and update the "offloads" of "port->dev_conf.rx/txmode".
  - update the commit log.
v2: copy "rx/txmode.offloads", instead of copying the entire struct
"dev->data->dev_conf.rx/txmode".

Jie Wang (2):
   ethdev: add an API to get device configuration
   app/testpmd: fix testpmd doesn't show RSS hash offload



For series,
Reviewed-by: Ferruh Yigit 



Series applied to dpdk-next-net/main, thanks.


Re: [dpdk-dev] [PATCH v4 2/4] mempool: add non-IO flag

2021-10-15 Thread David Marchand
On Fri, Oct 15, 2021 at 12:42 PM Dmitry Kozlyuk  wrote:
> > diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> > index 8d5f99f7e7..27d197fe86 100644
> > --- a/lib/mempool/rte_mempool.c
> > +++ b/lib/mempool/rte_mempool.c
> > @@ -802,6 +802,7 @@ rte_mempool_cache_free(struct rte_mempool_cache *cache)
> > | MEMPOOL_F_SC_GET \
> > | MEMPOOL_F_POOL_CREATED \
> > | MEMPOOL_F_NO_IOVA_CONTIG \
> > +   | MEMPOOL_F_NON_IO \
>
> I wonder why CREATED and NON_IO should be listed here:
> they are not supposed to be passed by the user,
> which is what MEMPOOL_KNOWN_FLAGS is used for.
> The same question stands for the test code.
> Could you confirm your suggestion?

There was no distinction in the API for valid flags so far, and indeed
I did not pay attention to MEMPOOL_F_POOL_CREATED and its internal
aspect.
(That's the problem when mixing stuff together)

We could separate internal and exposed flags in different fields, but
it seems overkill.
It would be seen as an API change too, if applications were checking
for this flag.
So let's keep this as is.

As you suggest, we should exclude those internal flags from
KNOWN_FLAGS (probably rename it too), and we will have to export this
define for the unit test since the check had been written with
contiguous valid flags in mind.
If your new flag is internal only, I agree we must skip it.

I'll prepare a patch for mempool.

-- 
David Marchand


