RE: OVS DPDK DMA-Dev library/Design Discussion

2022-04-08 Thread Hu, Jiayu


> -Original Message-
> From: Ilya Maximets 
> Sent: Thursday, April 7, 2022 10:40 PM
> To: Maxime Coquelin ; Van Haaren, Harry
> ; Morten Brørup
> ; Richardson, Bruce
> 
> Cc: i.maxim...@ovn.org; Pai G, Sunil ; Stokes, Ian
> ; Hu, Jiayu ; Ferriter, Cian
> ; ovs-...@openvswitch.org; dev@dpdk.org;
> Mcnamara, John ; O'Driscoll, Tim
> ; Finn, Emma 
> Subject: Re: OVS DPDK DMA-Dev library/Design Discussion
> 
> On 4/7/22 16:25, Maxime Coquelin wrote:
> > Hi Harry,
> >
> > On 4/7/22 16:04, Van Haaren, Harry wrote:
> >> Hi OVS & DPDK, Maintainers & Community,
> >>
> >> Top posting overview of discussion as replies to thread become slower:
> >> perhaps it is a good time to review and plan for next steps?
> >>
> >>  From my perspective, those most vocal in the thread seem to be in
> >> favour of the clean rx/tx split ("defer work"), with the tradeoff
> >> that the application must be aware of handling the async DMA
> >> completions. If there are any concerns opposing upstreaming of this
> method, please indicate this promptly, and we can continue technical
> discussions here now.
> >
> > Wasn't there some discussions about handling the Virtio completions
> > with the DMA engine? With that, we wouldn't need the deferral of work.
> 
> +1
> 
> With the virtio completions handled by DMA itself, the vhost port turns
> almost into a real HW NIC.  With that we will not need any extra
> manipulations from the OVS side, i.e. no need to defer any work while
> maintaining clear split between rx and tx operations.

First, making the DMA do the 2B copy would sacrifice performance, and I think
we all agree on that. Second, this method comes with an ordering issue.
For example, PMD thread0 enqueues 10 packets to vring0 first, then PMD thread1
enqueues 20 packets to vring0. If PMD thread0 and thread1 have their own
dedicated DMA devices dma0 and dma1, the flag/index update for the first 10
packets is done by dma0, and the flag/index update for the remaining 20
packets is done by dma1. But there is no ordering guarantee among different
DMA devices, so the flag/index update may be wrong. If PMD threads don't have
dedicated DMA devices, which means DMA devices are shared among threads, we
need a lock and pay for lock contention in the data-path. Or we can allocate
DMA devices to vrings dynamically to avoid DMA sharing among threads. But
what's the overhead of the allocation mechanism? Who does it? Any thoughts?
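
To make the locking cost concrete, here is a minimal sketch (illustrative
only, not the vhost async API; all names are made up) of the per-vring
serialization that shared DMA devices would force into the data-path:

    #include <stdint.h>
    #include <rte_spinlock.h>

    /* Hypothetical per-vring completion context. */
    struct vring_completion_ctx {
        rte_spinlock_t lock;  /* taken by every PMD thread completing here */
        uint16_t used_idx;    /* single logical writer once the lock is held */
    };

    static inline void
    complete_in_order(struct vring_completion_ctx *ctx, uint16_t nr_done)
    {
        rte_spinlock_lock(&ctx->lock);
        /* Flag/index updates are published under the lock, so thread0's
         * 10 packets and thread1's 20 packets cannot be reordered. */
        ctx->used_idx += nr_done;
        rte_spinlock_unlock(&ctx->lock);
    }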

Thanks,
Jiayu

> 
> I'd vote for that.
> 
> >
> > Thanks,
> > Maxime
> >
> >> In absence of continued technical discussion here, I suggest Sunil
> >> and Ian collaborate on getting the OVS Defer-work approach, and DPDK
> >> VHost Async patchsets available on GitHub for easier consumption and
> future development (as suggested in slides presented on last call).
> >>
> >> Regards, -Harry
> >>
> >> No inline-replies below; message just for context.
> >>
> >>> -Original Message-
> >>> From: Van Haaren, Harry
> >>> Sent: Wednesday, March 30, 2022 10:02 AM
> >>> To: Morten Brørup ; Richardson, Bruce
> >>> 
> >>> Cc: Maxime Coquelin ; Pai G, Sunil
> >>> ; Stokes, Ian ; Hu,
> >>> Jiayu ; Ferriter, Cian
> >>> ; Ilya Maximets ;
> >>> ovs-...@openvswitch.org; dev@dpdk.org; Mcnamara, John
> >>> ; O'Driscoll, Tim
> >>> ; Finn, Emma 
> >>> Subject: RE: OVS DPDK DMA-Dev library/Design Discussion
> >>>
>  -Original Message-
>  From: Morten Brørup 
>  Sent: Tuesday, March 29, 2022 8:59 PM
>  To: Van Haaren, Harry ; Richardson,
>  Bruce 
>  Cc: Maxime Coquelin ; Pai G, Sunil
>  ; Stokes, Ian ; Hu,
>  Jiayu ; Ferriter, Cian
>  ; Ilya Maximets ;
>  ovs-...@openvswitch.org; dev@dpdk.org; Mcnamara,
> >>> John
>  ; O'Driscoll, Tim
>  ; Finn, Emma 
>  Subject: RE: OVS DPDK DMA-Dev library/Design Discussion
> 
> > From: Van Haaren, Harry [mailto:harry.van.haa...@intel.com]
> > Sent: Tuesday, 29 March 2022 19.46
> >
> >> From: Morten Brørup 
> >> Sent: Tuesday, March 29, 2022 6:14 PM
> >>
> >>> From: Bruce Richardson [mailto:bruce.richard...@intel.com]
> >>> Sent: Tuesday, 29 March 2022 19.03
> >>>
> >>> On Tue, Mar 29, 2022 at 06:45:19PM +0200, Morten Brørup wrote:
> > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> > Sent: Tuesday, 29 March 2022 18.24
> >
> > Hi Morten,
> >
> > On 3/29/22 16:44, Morten Brørup wrote:
> >>> From: Van Haaren, Harry [mailto:harry.van.haa...@intel.com]
> >>> Sent: Tuesday, 29 March 2022 15.02
> >>>
>  From: Morten Brørup 
>  Sent: Tuesday, March 29, 2022 1:51 PM
> 
>  Having thought more about it, I think that a completely
> >>> different
> > architectural approach is required:
> 
>  Many of the DPDK Ethernet PMDs implement a variety of RX
> > and TX
> > packet burst functions, each optimized for different CPU
> > vector instruction sets. The availability o

[RFC 00/15] Add vDPA multi-threads optimization

2022-04-08 Thread Li Zhang
Allow the driver to use internal threads to
speed up configuration.
All the threads are opened on the same core as
the event completion queue scheduling thread.

Add a max_conf_threads parameter to configure
the maximum number of internal threads in addition to
the caller thread (8 is suggested).
These internal threads pipeline the handling of vDPA tasks
in the system and are shared by all vDPA devices.
The default is 0: don't use internal threads for configuration.
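
As an example invocation (illustrative values only, assuming the existing
mlx5 vDPA class and event_core devargs), the new parameter would be given
at probe time, e.g.:

    -a 0000:06:00.2,class=vdpa,event_core=2,max_conf_threads=8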

Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
http://patchwork.dpdk.org/project/dpdk/list/?series=21868

Li Zhang (11):
  common/mlx5: extend virtq modifiable fields
  vdpa/mlx5: pre-create virtq in the probe
  vdpa/mlx5: optimize datapath-control synchronization
  vdpa/mlx5: add multi-thread management for configuration
  vdpa/mlx5: add task ring for MT management
  vdpa/mlx5: add MT task for VM memory registration
  vdpa/mlx5: add virtq creation task for MT management
  vdpa/mlx5: add virtq LM log task
  vdpa/mlx5: add device close task
  vdpa/mlx5: add virtq sub-resources creation
  vdpa/mlx5: prepare virtqueue resource creation

Yajun Wu (4):
  examples/vdpa: fix vDPA device remove
  vdpa/mlx5: support pre create virtq resource
  common/mlx5: add DevX API to move QP to reset state
  vdpa/mlx5: support event qp reuse

 doc/guides/vdpadevs/mlx5.rst  |  25 ++
 drivers/common/mlx5/mlx5_devx_cmds.c  |  77 +++-
 drivers/common/mlx5/mlx5_devx_cmds.h  |   6 +-
 drivers/common/mlx5/mlx5_prm.h|  30 +-
 drivers/vdpa/mlx5/meson.build |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c | 227 +-
 drivers/vdpa/mlx5/mlx5_vdpa.h | 147 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 362 
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   | 160 +--
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c  | 133 --
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c | 268 
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c   |  20 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 582 ++
 examples/vdpa/main.c  |   4 +
 14 files changed, 1674 insertions(+), 368 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

-- 
2.27.0



[RFC 01/15] examples/vdpa: fix vDPA device remove

2022-04-08 Thread Li Zhang
From: Yajun Wu 

Call rte_dev_remove on vDPA example application exit. Otherwise
rte_dev_remove never gets called.

Fixes: edbed86d1cc ("examples/vdpa: introduce a new sample for vDPA")
Cc: sta...@dpdk.org

Signed-off-by: Yajun Wu 
---
 examples/vdpa/main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/examples/vdpa/main.c b/examples/vdpa/main.c
index bd66deca85..19753f6e09 100644
--- a/examples/vdpa/main.c
+++ b/examples/vdpa/main.c
@@ -593,6 +593,10 @@ main(int argc, char *argv[])
vdpa_sample_quit();
}
 
+   RTE_DEV_FOREACH(dev, "class=vdpa", &dev_iter) {
+   rte_dev_remove(dev);
+   }
+
/* clean up the EAL */
rte_eal_cleanup();
 
-- 
2.27.0



[RFC 02/15] vdpa/mlx5: support pre create virtq resource

2022-04-08 Thread Li Zhang
From: Yajun Wu 

The motivation of this change is to reduce vDPA device queue creation
time by creating some queue resources in the vDPA device probe stage.

In a VM live migration scenario, this saves 0.8 ms for each queue
creation and thus reduces the LM network downtime.

To create queue resources (umem/counter) in advance, we need to know
the virtio queue depth and the maximum number of queues the VM will use.

Introduce two new devargs: queues (max queue pair number) and queue_size
(queue depth). Both arguments must be provided; if only one of them is
provided, it is ignored and nothing is pre-created.

The queues and queue_size must also be identical to the vhost
configuration the driver later receives. Otherwise the pre-created
resources are either wasted or missing, or must be destroyed and
recreated (in case of a queue_size mismatch).

Pre-created umem/counter resources are kept alive until vDPA device
removal.
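
For example (values are only an illustration), a device expected to back
a VM with up to 8 queue pairs of depth 256 would be probed with:

    -a 0000:06:00.2,class=vdpa,queues=8,queue_size=256

If the vhost configuration received later does not match these values, the
pre-created resources are wasted, missing, or recreated, so the devargs
should mirror the guest configuration.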

Signed-off-by: Yajun Wu 
---
 doc/guides/vdpadevs/mlx5.rst  | 14 +++
 drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++-
 drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
 3 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 3ded142311..0ad77bf535 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -101,6 +101,20 @@ for an additional list of options shared with other mlx5 
drivers.
 
   - 0, HW default.
 
+- ``queue_size`` parameter [int]
+
+  - 1 - 1024, Virio Queue depth for pre-creating queue resource to speed up
+first time queue creation. Set it together with queues devarg.
+
+  - 0, default value, no pre-create virtq resource.
+
+- ``queues`` parameter [int]
+
+  - 1 - 128, Max number of virio queue pair(including 1 rx queue and 1 tx 
queue)
+for pre-create queue resource to speed up first time queue creation. Set it
+together with queue_size devarg.
+
+  - 0, default value, no pre-create virtq resource.
 
 Error handling
 ^^
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 534ba64b02..57f9b05e35 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -244,7 +244,9 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 static void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
-   mlx5_vdpa_virtqs_cleanup(priv);
+   /* Clean pre-created resource in dev removal only. */
+   if (!priv->queues)
+   mlx5_vdpa_virtqs_cleanup(priv);
mlx5_vdpa_mem_dereg(priv);
 }
 
@@ -494,6 +496,12 @@ mlx5_vdpa_args_check_handler(const char *key, const char 
*val, void *opaque)
priv->hw_max_latency_us = (uint32_t)tmp;
} else if (strcmp(key, "hw_max_pending_comp") == 0) {
priv->hw_max_pending_comp = (uint32_t)tmp;
+   } else if (strcmp(key, "queue_size") == 0) {
+   priv->queue_size = (uint16_t)tmp;
+   } else if (strcmp(key, "queues") == 0) {
+   priv->queues = (uint16_t)tmp;
+   } else {
+   DRV_LOG(WARNING, "Invalid key %s.", key);
}
return 0;
 }
@@ -524,9 +532,68 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
if (!priv->event_us &&
priv->event_mode == MLX5_VDPA_EVENT_MODE_DYNAMIC_TIMER)
priv->event_us = MLX5_VDPA_DEFAULT_TIMER_STEP_US;
+   if ((priv->queue_size && !priv->queues) ||
+   (!priv->queue_size && priv->queues)) {
+   priv->queue_size = 0;
+   priv->queues = 0;
+   DRV_LOG(WARNING, "Please provide both queue_size and queues.");
+   }
DRV_LOG(DEBUG, "event mode is %d.", priv->event_mode);
DRV_LOG(DEBUG, "event_us is %u us.", priv->event_us);
DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
+   DRV_LOG(DEBUG, "queues is %u, queue_size is %u.", priv->queues,
+   priv->queue_size);
+}
+
+static int
+mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
+{
+   uint32_t index;
+   uint32_t i;
+
+   if (!priv->queues)
+   return 0;
+   for (index = 0; index < (priv->queues * 2); ++index) {
+   struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+
+   if (priv->caps.queue_counters_valid) {
+   if (!virtq->counters)
+   virtq->counters =
+   mlx5_devx_cmd_create_virtio_q_counters
+   (priv->cdev->ctx);
+   if (!virtq->counters) {
+   DRV_LOG(ERR, "Failed to create virtq couners 
for virtq"
+   " %d.", index);
+   return -1;
+   }
+   }
+   for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+   uint32_t size;
+   void *buf;
+   struct mlx5dv_devx_umem *obj;
+
+ 

[RFC 04/15] vdpa/mlx5: support event qp reuse

2022-04-08 Thread Li Zhang
From: Yajun Wu 

To speed up queue creation time, the event qp and cq are created only
once. Each virtq creation reuses the same event qp and cq.

Because FW sets the event qp to the error state during virtq destroy,
the event qp needs to be modified to the RESET state and then to the
RTS state as usual. This saves about 1.5 ms for each virtq creation.

After a SW qp reset, the qp pi/ci both become 0 while the cq pi/ci keep
their previous values. Add a new variable qp_pi to track the SW qp
index, moving the qp pi independently of the cq ci.

Add a new function mlx5_vdpa_drain_cq to drain the cq CQEs after virtq
release.
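
A condensed view of the doorbell bookkeeping after this change, taken from
the diff below and simplified for illustration:

    /* Before: the SW QP doorbell followed the CQ consumer index. */
    eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size);

    /* After: the QP producer index is tracked separately, because a QP
     * reset zeroes the qp pi/ci while the CQ indices keep counting. */
    eqp->qp_pi += comp;
    eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size);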

Signed-off-by: Yajun Wu 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   |  8 
 drivers/vdpa/mlx5/mlx5_vdpa.h   | 12 +-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +--
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 57f9b05e35..03ad01c156 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -269,6 +269,7 @@ mlx5_vdpa_dev_close(int vid)
}
mlx5_vdpa_steer_unset(priv);
mlx5_vdpa_virtqs_release(priv);
+   mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
priv->state = MLX5_VDPA_STATE_PROBED;
@@ -555,7 +556,14 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv 
*priv)
return 0;
for (index = 0; index < (priv->queues * 2); ++index) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+   int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
+   -1, &virtq->eqp);
 
+   if (ret) {
+   DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+   index);
+   return -1;
+   }
if (priv->caps.queue_counters_valid) {
if (!virtq->counters)
virtq->counters =
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f6719a3c60..bf82026e37 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -55,6 +55,7 @@ struct mlx5_vdpa_event_qp {
struct mlx5_vdpa_cq cq;
struct mlx5_devx_obj *fw_qp;
struct mlx5_devx_qp sw_qp;
+   uint16_t qp_pi;
 };
 
 struct mlx5_vdpa_query_mr {
@@ -226,7 +227,7 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  int callfd, struct mlx5_vdpa_event_qp *eqp);
 
 /**
@@ -479,4 +480,13 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int 
qid,
  */
 int
 mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
+
+/**
+ * Drain virtq CQ CQE.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 7167a98db0..b43dca9255 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -137,7 +137,7 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
};
uint32_t word;
} last_word;
-   uint16_t next_wqe_counter = cq->cq_ci;
+   uint16_t next_wqe_counter = eqp->qp_pi;
uint16_t cur_wqe_counter;
uint16_t comp;
 
@@ -156,9 +156,10 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
rte_io_wmb();
/* Ring CQ doorbell record. */
cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+   eqp->qp_pi += comp;
rte_io_wmb();
/* Ring SW QP doorbell record. */
-   eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size);
+   eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size);
}
return comp;
 }
@@ -232,6 +233,25 @@ mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
return max;
 }
 
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
+{
+   unsigned int i;
+
+   for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+   struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
+
+   mlx5_vdpa_queue_complete(cq);
+   if (cq->cq_obj.cq) {
+   cq->cq_obj.cqes[0].wqe_counter =
+   rte_cpu_to_be_16(UINT16_MAX);
+   priv->virtqs[i].eqp.qp_pi = 0;
+   if (!cq->armed)
+   mlx5_vdpa_cq_arm(priv, cq);
+   }
+   }
+}
+
 /* Wait on all CQs channel for completion event. */
 static st

[RFC 03/15] common/mlx5: add DevX API to move QP to reset state

2022-04-08 Thread Li Zhang
From: Yajun Wu 

Support setting a QP to the RESET state.
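
A minimal caller-side sketch (assumed usage, not part of this patch) of the
reset-and-bring-up sequence that the later event-qp reuse patch relies on;
the last argument is assumed to be the peer FW QP id, as in the existing
rst2init/init2rtr calls of this driver:

    if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp, MLX5_CMD_OP_QP_2RST,
                                      eqp->fw_qp->id) ||
        mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp, MLX5_CMD_OP_RST2INIT_QP,
                                      eqp->fw_qp->id) ||
        mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp, MLX5_CMD_OP_INIT2RTR_QP,
                                      eqp->fw_qp->id) ||
        mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp, MLX5_CMD_OP_RTR2RTS_QP,
                                      eqp->fw_qp->id))
        return -1; /* rte_errno is set by the failing command */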

Signed-off-by: Yajun Wu 
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  7 +++
 drivers/common/mlx5/mlx5_prm.h   | 17 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c 
b/drivers/common/mlx5/mlx5_devx_cmds.c
index d02ac2a678..a2943c9a58 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2255,11 +2255,13 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, 
uint32_t qp_st_mod_op,
uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+   uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_in)];
} in;
union {
uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+   uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_out)];
} out;
void *qpc;
int ret;
@@ -2302,6 +2304,11 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, 
uint32_t qp_st_mod_op,
inlen = sizeof(in.rtr2rts);
outlen = sizeof(out.rtr2rts);
break;
+   case MLX5_CMD_OP_QP_2RST:
+   MLX5_SET(2rst_qp_in, &in, qpn, qp->id);
+   inlen = sizeof(in.qp2rst);
+   outlen = sizeof(out.qp2rst);
+   break;
default:
DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
qp_st_mod_op);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 44b18225f6..cca6bfc6d4 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3653,6 +3653,23 @@ struct mlx5_ifc_init2init_qp_in_bits {
u8 reserved_at_800[0x80];
 };
 
+struct mlx5_ifc_2rst_qp_out_bits {
+   u8 status[0x8];
+   u8 reserved_at_8[0x18];
+   u8 syndrome[0x20];
+   u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_2rst_qp_in_bits {
+   u8 opcode[0x10];
+   u8 uid[0x10];
+   u8 vhca_tunnel_id[0x10];
+   u8 op_mod[0x10];
+   u8 reserved_at_80[0x8];
+   u8 qpn[0x18];
+   u8 reserved_at_a0[0x20];
+};
+
 struct mlx5_ifc_dealloc_pd_out_bits {
u8 status[0x8];
u8 reserved_0[0x18];
-- 
2.27.0



[RFC 07/15] vdpa/mlx5: optimize datapath-control synchronization

2022-04-08 Thread Li Zhang
The driver used a single global lock for any synchronization
needed for the datapath and control path.
It is better to group the critical sections with
the other ones that should be synchronized.

Replace the global lock with the following locks
(see the usage sketch after this list):

1. virtq locks (per virtq) synchronize datapath polling and
   parallel configurations on the same virtq.
2. A doorbell lock synchronizes doorbell updates,
   which are shared by all the virtqs of the device.
3. A steering lock for updates of the shared steering objects.
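
Illustrative usage of the three locks, using the field and function names
visible in the diff below (simplified, not a full code path):

    /* Per-virtq: configuration vs. datapath polling on the same virtq. */
    pthread_mutex_lock(&virtq->virtq_lock);
    ret = mlx5_vdpa_virtq_enable(priv, vring, state);
    pthread_mutex_unlock(&virtq->virtq_lock);

    /* Device-wide doorbell writes shared by all virtqs. */
    rte_spinlock_lock(&priv->db_lock);
    /* ... write the virtq doorbell record ... */
    rte_spinlock_unlock(&priv->db_lock);

    /* Shared steering object updates. */
    pthread_mutex_lock(&priv->steer_update_lock);
    mlx5_vdpa_steer_unset(priv);
    pthread_mutex_unlock(&priv->steer_update_lock);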

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   | 24 ---
 drivers/vdpa/mlx5/mlx5_vdpa.h   | 13 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++---
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c| 34 +++---
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 83 +---
 6 files changed, 180 insertions(+), 78 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 03ad01c156..e99c86b3d6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -135,6 +135,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
struct mlx5_vdpa_priv *priv =
mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+   struct mlx5_vdpa_virtq *virtq;
int ret;
 
if (priv == NULL) {
@@ -145,9 +146,10 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
DRV_LOG(ERR, "Too big vring id: %d.", vring);
return -E2BIG;
}
-   pthread_mutex_lock(&priv->vq_config_lock);
+   virtq = &priv->virtqs[vring];
+   pthread_mutex_lock(&virtq->virtq_lock);
ret = mlx5_vdpa_virtq_enable(priv, vring, state);
-   pthread_mutex_unlock(&priv->vq_config_lock);
+   pthread_mutex_unlock(&virtq->virtq_lock);
return ret;
 }
 
@@ -267,7 +269,9 @@ mlx5_vdpa_dev_close(int vid)
ret |= mlx5_vdpa_lm_log(priv);
priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
}
+   pthread_mutex_lock(&priv->steer_update_lock);
mlx5_vdpa_steer_unset(priv);
+   pthread_mutex_unlock(&priv->steer_update_lock);
mlx5_vdpa_virtqs_release(priv);
mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
@@ -276,8 +280,6 @@ mlx5_vdpa_dev_close(int vid)
if (!priv->connected)
mlx5_vdpa_dev_cache_clean(priv);
priv->vid = 0;
-   /* The mutex may stay locked after event thread cancel - initiate it. */
-   pthread_mutex_init(&priv->vq_config_lock, NULL);
DRV_LOG(INFO, "vDPA device %d was closed.", vid);
return ret;
 }
@@ -549,15 +551,21 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
+   struct mlx5_vdpa_virtq *virtq;
uint32_t index;
uint32_t i;
 
+   for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+   index++) {
+   virtq = &priv->virtqs[index];
+   pthread_mutex_init(&virtq->virtq_lock, NULL);
+   }
if (!priv->queues)
return 0;
for (index = 0; index < (priv->queues * 2); ++index) {
-   struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+   virtq = &priv->virtqs[index];
int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-   -1, &virtq->eqp);
+   -1, virtq);
 
if (ret) {
DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
@@ -713,7 +721,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
priv->num_lag_ports = attr->num_lag_ports;
if (attr->num_lag_ports == 0)
priv->num_lag_ports = 1;
-   pthread_mutex_init(&priv->vq_config_lock, NULL);
+   rte_spinlock_init(&priv->db_lock);
+   pthread_mutex_init(&priv->steer_update_lock, NULL);
priv->cdev = cdev;
mlx5_vdpa_config_get(mkvlist, priv);
if (mlx5_vdpa_create_dev_resources(priv))
@@ -797,7 +806,6 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
mlx5_vdpa_release_dev_resources(priv);
if (priv->vdev)
rte_vdpa_unregister_device(priv->vdev);
-   pthread_mutex_destroy(&priv->vq_config_lock);
rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e5553079fe..3fd5eefc5e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -82,6 +82,7 @@ struct mlx5_vdpa_virtq {
bool stopped;
uint32_t configured:1;
uint32_t version;
+   pthread_mutex_t virtq_lock;
struct mlx5_vdpa_priv *priv;
struct mlx5_devx_obj *virtq;
struct mlx5_devx_obj *counters;
@@ -126,7 +127,8 @@ struct mlx5_vdpa_priv {
TAILQ_EN

[RFC 06/15] vdpa/mlx5: pre-create virtq in the probe

2022-04-08 Thread Li Zhang
The dev_config operation is called during the LM process.
LM time is very critical because all
the VM packets are dropped directly at that time.

Move the virtq creation to probe time and
only modify the configuration later in
the dev_config stage using the new ability
to modify virtq.

This optimization accelerates the LM process and
reduces its time by 70%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h   |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c|  13 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 257 +---
 3 files changed, 170 insertions(+), 104 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index bf82026e37..e5553079fe 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq {
uint16_t vq_size;
uint8_t notifier_state;
bool stopped;
+   uint32_t configured:1;
uint32_t version;
struct mlx5_vdpa_priv *priv;
struct mlx5_devx_obj *virtq;
@@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, 
int qid);
  */
 void
 mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index 43a2b98255..a8faf0c116 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -12,14 +12,17 @@ int
 mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 {
struct mlx5_devx_virtq_attr attr = {
-   .type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+   .mod_fields_bitmap =
+   MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
.dirty_bitmap_dump_enable = enable,
};
+   struct mlx5_vdpa_virtq *virtq;
int i;
 
for (i = 0; i < priv->nr_virtqs; ++i) {
attr.queue_index = i;
-   if (!priv->virtqs[i].virtq) {
+   virtq = &priv->virtqs[i];
+   if (!virtq->configured) {
DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
"enabling.", i);
} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
@@ -37,10 +40,11 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, 
uint64_t log_base,
   uint64_t log_size)
 {
struct mlx5_devx_virtq_attr attr = {
-   .type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+   .mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
.dirty_bitmap_addr = log_base,
.dirty_bitmap_size = log_size,
};
+   struct mlx5_vdpa_virtq *virtq;
int i;
int ret = mlx5_os_wrapped_mkey_create(priv->cdev->ctx, priv->cdev->pd,
  priv->cdev->pdn,
@@ -54,7 +58,8 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, 
uint64_t log_base,
attr.dirty_bitmap_mkey = priv->lm_mr.lkey;
for (i = 0; i < priv->nr_virtqs; ++i) {
attr.queue_index = i;
-   if (!priv->virtqs[i].virtq) {
+   virtq = &priv->virtqs[i];
+   if (!virtq->configured) {
DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
  &attr)) {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 28cef69a58..ef5bf1ef01 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -75,6 +75,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+   virtq->configured = 0;
for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
if (virtq->umems[j].obj) {
claim_zero(mlx5_glue->devx_umem_dereg
@@ -111,11 +112,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
rte_intr_fd_set(virtq->intr_handle, -1);
}
rte_intr_instance_free(virtq->intr_handle);
-   if (virtq->virtq) {
+   if (virtq->configured) {
ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
if (ret)
DRV_LOG(WARNING, "Failed to stop virtq %d.",
virtq->index);
+   virtq->configured = 0;
claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
}
virtq->virtq = NULL;
@@ -138,7 +140,7 @@ int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
struct mlx5_devx_virtq_attr attr = {
-   .type = MLX5_VIRTQ_M

[RFC 08/15] vdpa/mlx5: add multi-thread management for configuration

2022-04-08 Thread Li Zhang
The LM process includes a lot of object creations and
destructions in the source and the destination servers.
The longer the LM takes, the more VM packets are dropped.
To improve the LM time, the mlx5 FW configurations need to be
done in parallel.
Add internal multi-thread management in the driver for this.

A new devarg defines the number of threads and their CPU.
The management is shared between all the devices of the driver.
Since the event_core also affects the datapath events thread,
reduce the priority of the datapath event thread to
allow fast configuration of the devices doing the LM.
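
One way such a priority reduction can be expressed (a sketch with standard
pthread calls; the actual change in mlx5_vdpa_event.c is not shown in this
excerpt, and the helper name is made up):

    #include <pthread.h>
    #include <sched.h>

    /* Assumed helper: give the datapath event thread the lowest RT
     * priority so configuration threads on the same core run first. */
    static int
    lower_event_thread_priority(pthread_t event_thread)
    {
        struct sched_param sp = {
            .sched_priority = sched_get_priority_min(SCHED_RR),
        };

        return pthread_setschedparam(event_thread, SCHED_RR, &sp);
    }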

Signed-off-by: Li Zhang 
---
 doc/guides/vdpadevs/mlx5.rst  |  11 +++
 drivers/vdpa/mlx5/meson.build |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c |  41 
 drivers/vdpa/mlx5/mlx5_vdpa.h |  36 +++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 +-
 7 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 0ad77bf535..b75a01688d 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -78,6 +78,17 @@ for an additional list of options shared with other mlx5 
drivers.
   CPU core number to set polling thread affinity to, default to control plane
   cpu.
 
+- ``max_conf_threads`` parameter [int]
+
+  Allow the driver to use internal threads to obtain fast configuration.
+  All the threads will be open on the same core of the event completion queue 
scheduling thread.
+
+  - 0, default, don't use internal threads for configuration.
+
+  - 1 - 256, number of internal threads in addition to the caller thread (8 is 
suggested).
+This value, if not 0, should be the same for all the devices;
+the first prob will take it with the event_core for all the multi-thread 
configurations in the driver.
+
 - ``hw_latency_mode`` parameter [int]
 
   The completion queue moderation mode:
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 0fa82ad257..9d8dbb1a82 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
 'mlx5_vdpa_virtq.c',
 'mlx5_vdpa_steer.c',
 'mlx5_vdpa_lm.c',
+'mlx5_vdpa_cthread.c',
 )
 cflags_options = [
 '-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e99c86b3d6..eace0e4c9e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -50,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
  TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
 
 static struct mlx5_vdpa_priv *
@@ -493,6 +495,29 @@ mlx5_vdpa_args_check_handler(const char *key, const char 
*val, void *opaque)
DRV_LOG(WARNING, "Invalid event_core %s.", val);
else
priv->event_core = tmp;
+   } else if (strcmp(key, "max_conf_threads") == 0) {
+   if (tmp) {
+   priv->use_c_thread = true;
+   if (!conf_thread_mng.initializer_priv) {
+   conf_thread_mng.initializer_priv = priv;
+   if (tmp > MLX5_VDPA_MAX_C_THRD) {
+   DRV_LOG(WARNING,
+   "Invalid max_conf_threads %s "
+   "and set max_conf_threads to %d",
+   val, MLX5_VDPA_MAX_C_THRD);
+   tmp = MLX5_VDPA_MAX_C_THRD;
+   }
+   conf_thread_mng.max_thrds = tmp;
+   } else if (tmp != conf_thread_mng.max_thrds) {
+   DRV_LOG(WARNING,
+   "max_conf_threads is PMD argument and not per device, "
+   "only the first device configuration set it, current value is %d "
+   "and will not be changed to %d.",
+   conf_thread_mng.max_thrds, (int)tmp);
+   }
+   } else {
+   priv->use_c_thread = false;
+   }
} else if (strcmp(key, "hw_latency_mode") == 0) {
priv->hw_latency_mode = (uint32_t)tmp;
} else if (strcmp(key, "hw_max_latency_us") == 0) {
@@ -521,6 +546,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
"hw_max_latency_us",
"hw_max_pending_comp",
"no_traffic_time",
+   "queue_size",
+   "queues",
+   "max_conf_threads",
NULL,
 

[RFC 05/15] common/mlx5: extend virtq modifiable fields

2022-04-08 Thread Li Zhang
A virtq configuration can be modified after the virtq creation.
Add the following modifiable fields
(a usage sketch follows the list):

1. address fields: desc_addr/used_addr/available_addr
2. hw_available_index
3. hw_used_index
4. virtio_q_type
5. version type
6. queue mkey
7. feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8. event mode: event_mode/event_qpn_or_msix
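
A condensed caller-side sketch (assumed usage; the attribute and flag names
are taken from the diff below, the local variables are placeholders) of
modifying several fields in one call now that the modify type became a
bitmap:

    struct mlx5_devx_virtq_attr attr = {
        .mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_ADDR |
                             MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
                             MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX,
        .queue_index = index,
        .desc_addr = desc_iova,
        .used_addr = used_iova,
        .available_addr = avail_iova,
        .hw_available_index = avail_idx,
        .hw_used_index = used_idx,
    };

    if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr))
        return -1;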

Signed-off-by: Li Zhang 
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++-
 drivers/common/mlx5/mlx5_devx_cmds.h |  6 ++-
 drivers/common/mlx5/mlx5_prm.h   | 13 +-
 3 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c 
b/drivers/common/mlx5/mlx5_devx_cmds.c
index a2943c9a58..fd5b5dd378 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -545,6 +545,15 @@ mlx5_devx_cmd_query_hca_vdpa_attr(void *ctx,
vdpa_attr->log_doorbell_stride =
MLX5_GET(virtio_emulation_cap, hcattr,
 log_doorbell_stride);
+   vdpa_attr->vnet_modify_ext =
+   MLX5_GET(virtio_emulation_cap, hcattr,
+vnet_modify_ext);
+   vdpa_attr->virtio_net_q_addr_modify =
+   MLX5_GET(virtio_emulation_cap, hcattr,
+virtio_net_q_addr_modify);
+   vdpa_attr->virtio_q_index_modify =
+   MLX5_GET(virtio_emulation_cap, hcattr,
+virtio_q_index_modify);
vdpa_attr->log_doorbell_bar_size =
MLX5_GET(virtio_emulation_cap, hcattr,
 log_doorbell_bar_size);
@@ -2065,27 +2074,66 @@ mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj 
*virtq_obj,
MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
-   MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+   MLX5_SET64(virtio_net_q, virtq, modify_field_select,
+   attr->mod_fields_bitmap);
MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
-   switch (attr->type) {
-   case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+   if (!attr->mod_fields_bitmap) {
+   DRV_LOG(ERR, "Failed to modify VIRTQ for no type set.");
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_STATE)
MLX5_SET16(virtio_net_q, virtq, state, attr->state);
-   break;
-   case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+   if (attr->mod_fields_bitmap &
+   MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS) {
MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
 attr->dirty_bitmap_mkey);
MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
 attr->dirty_bitmap_addr);
MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
 attr->dirty_bitmap_size);
-   break;
-   case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+   }
+   if (attr->mod_fields_bitmap &
+   MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE)
MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
 attr->dirty_bitmap_dump_enable);
-   break;
-   default:
-   rte_errno = EINVAL;
-   return -rte_errno;
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD) {
+   MLX5_SET(virtio_q, virtctx, queue_period_mode,
+   attr->hw_latency_mode);
+   MLX5_SET(virtio_q, virtctx, queue_period_us,
+   attr->hw_max_latency_us);
+   MLX5_SET(virtio_q, virtctx, queue_max_count,
+   attr->hw_max_pending_comp);
+   }
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_ADDR) {
+   MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+   MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+   MLX5_SET64(virtio_q, virtctx, available_addr,
+   attr->available_addr);
+   }
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX)
+   MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+  attr->hw_available_index);
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX)
+   MLX5_SET16(virtio_net_q, virtq, hw_used_index,
+   attr->hw_used_index);
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE)
+   MLX5_SET16(virtio_q, virtctx, virtio_q_type, attr->q_type);
+   if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0)
+   MLX5_SET16(virtio_q, virtctx, virtio_version

[RFC 09/15] vdpa/mlx5: add task ring for MT management

2022-04-08 Thread Li Zhang
The configuration thread tasks need a container to
support multiple tasks assigned to a thread in parallel.
Use an rte_ring container per thread to manage
the thread tasks without locks.
The caller thread from the user context opens a task for
a thread and enqueues it to the thread ring.
The thread polls its ring and dequeues tasks.
That's why the ring should be in multi-producer
and single-consumer mode.
An atomic counter manages the task completion notification.
The threads report errors to the caller by
a dedicated error counter per task.
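
A sketch of how such a per-thread ring could be created (the flags and
sizing are assumptions; the actual creation code is not shown in this
excerpt):

    /* Multi-producer (any caller thread), single-consumer (the owning
     * configuration thread), fixed-size task elements. */
    char name[RTE_RING_NAMESIZE];

    snprintf(name, sizeof(name), "vdpa_c_thrd_rng_%u", thrd_idx);
    conf_thread_mng.cthrd[thrd_idx].rng = rte_ring_create_elem(name,
            sizeof(struct mlx5_vdpa_task), MLX5_VDPA_MAX_TASKS_PER_THRD,
            rte_socket_id(), RING_F_MP_HTS_ENQ | RING_F_SC_DEQ);
    if (conf_thread_mng.cthrd[thrd_idx].rng == NULL)
        return -1;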

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h |  17 
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 118 +-
 2 files changed, 133 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 4e7c2557b7..2bbb868ec6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -74,10 +74,22 @@ enum {
 };
 
 #define MLX5_VDPA_MAX_C_THRD 256
+#define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
+#define MLX5_VDPA_TASKS_PER_DEV 64
+
+/* Generic task information and size must be multiple of 4B. */
+struct mlx5_vdpa_task {
+   struct mlx5_vdpa_priv *priv;
+   uint32_t *remaining_cnt;
+   uint32_t *err_cnt;
+   uint32_t idx;
+} __rte_packed __rte_aligned(4);
 
 /* Generic mlx5_vdpa_c_thread information. */
 struct mlx5_vdpa_c_thread {
pthread_t tid;
+   struct rte_ring *rng;
+   pthread_cond_t c_cond;
 };
 
 struct mlx5_vdpa_conf_thread_mng {
@@ -532,4 +544,9 @@ mlx5_vdpa_mult_threads_create(int cpu_core);
  */
 void
 mlx5_vdpa_mult_threads_destroy(bool need_unlock);
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+   uint32_t thrd_idx,
+   uint32_t num);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index ba7d8b63b3..8475d7788a 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -11,17 +11,106 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_dequeue_bulk(struct rte_ring *r,
+   void **obj, uint32_t n, uint32_t *avail)
+{
+   uint32_t m;
+
+   m = rte_ring_dequeue_bulk_elem_start(r, obj,
+   sizeof(struct mlx5_vdpa_task), n, avail);
+   n = (m == n) ? n : 0;
+   rte_ring_dequeue_elem_finish(r, n);
+   return n;
+}
+
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
+   void * const *obj, uint32_t n, uint32_t *free)
+{
+   uint32_t m;
+
+   m = rte_ring_enqueue_bulk_elem_start(r, n, free);
+   n = (m == n) ? n : 0;
+   rte_ring_enqueue_elem_finish(r, obj,
+   sizeof(struct mlx5_vdpa_task), n);
+   return n;
+}
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+   uint32_t thrd_idx,
+   uint32_t num)
+{
+   struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
+   struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+   uint32_t i;
+
+   MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
+   for (i = 0 ; i < num; i++) {
+   task[i].priv = priv;
+   /* To be added later. */
+   }
+   if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
+   return -1;
+   for (i = 0 ; i < num; i++)
+   if (task[i].remaining_cnt)
+   __atomic_fetch_add(task[i].remaining_cnt, 1,
+   __ATOMIC_RELAXED);
+   /* wake up conf thread. */
+   pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+   pthread_cond_signal(&conf_thread_mng.cthrd[thrd_idx].c_cond);
+   pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+   return 0;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
-   /* To be added later. */
-   return arg;
+   struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
+   pthread_t thread_id = pthread_self();
+   struct mlx5_vdpa_priv *priv;
+   struct mlx5_vdpa_task task;
+   struct rte_ring *rng;
+   uint32_t thrd_idx;
+   uint32_t task_num;
+
+   for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
+   thrd_idx++)
+   if (multhrd->cthrd[thrd_idx].tid == thread_id)
+   break;
+   if (thrd_idx >= multhrd->max_thrds) {
+   DRV_LOG(ERR, "Invalid thread_id 0x%lx in vdpa multi-thread",
+   thread_id);
+   return NULL;
+   }
+   rng = multhrd->cthrd[thrd_idx].rng;
+   while (1) {
+   task_num = mlx5_vdpa_c_thrd_ring_dequeue_bulk(rng,
+   (void **)&task, 1, NULL);
+   if (!task_num) {
+   /* No task and condition wait. */
+   pthread_mutex_lock(&multhrd->cthrd_lock);
+ 

[RFC 10/15] vdpa/mlx5: add MT task for VM memory registration

2022-04-08 Thread Li Zhang
The driver creates a direct MR object of
the HW for each VM memory region,
which maps the VM physical address to
the actual physical address.

Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.

Create direct MRs in parallel using the MT mechanism.
After completion, the master thread creates the indirect MR
needed for the following virtq configurations.

This optimization accelerates the LM process and
reduces its time by 5%.
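
Conceptually, the dispatch-and-wait pattern looks like the sketch below
(the helper names exist in this patch, but the exact arguments, the
fallback path, the sleep value and the use of the rte_vhost_memory regions
as the work unit are assumptions here):

    uint32_t remaining_cnt = 0, err_cnt = 0;
    uint32_t data[1];
    uint32_t i, thrd_idx;

    for (i = 0; i < mem->nregions; i++) {
        thrd_idx = i % conf_thread_mng.max_thrds;
        data[0] = i; /* region index carried inside the task */
        if (mlx5_vdpa_task_add(priv, thrd_idx, MLX5_VDPA_TASK_REG_MR,
                               &remaining_cnt, &err_cnt,
                               (void **)&data, 1)) {
            /* Ring full: register this region in the caller thread. */
            if (mlx5_vdpa_register_mr(priv, i))
                return -1;
        }
    }
    if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
                                                &err_cnt, 100))
        return -1; /* a worker failed or the wait timed out */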

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.h |  31 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 -
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c | 268 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
 5 files changed, 256 insertions(+), 97 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index eace0e4c9e..8dd8e6a2a0 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
rte_errno = rte_errno ? rte_errno : EINVAL;
goto error;
}
-   SLIST_INIT(&priv->mr_list);
pthread_mutex_lock(&priv_list_lock);
TAILQ_INSERT_TAIL(&priv_list, priv, next);
pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2bbb868ec6..3316ce42be 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp {
 };
 
 struct mlx5_vdpa_query_mr {
-   SLIST_ENTRY(mlx5_vdpa_query_mr) next;
union {
struct ibv_mr *mr;
struct mlx5_devx_obj *mkey;
@@ -76,10 +75,17 @@ enum {
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
+#define MLX5_VDPA_MAX_MRS 0x
+
+/* Vdpa task types. */
+enum mlx5_vdpa_task_type {
+   MLX5_VDPA_TASK_REG_MR = 1,
+};
 
 /* Generic task information and size must be multiple of 4B. */
 struct mlx5_vdpa_task {
struct mlx5_vdpa_priv *priv;
+   enum mlx5_vdpa_task_type type;
uint32_t *remaining_cnt;
uint32_t *err_cnt;
uint32_t idx;
@@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng {
 };
 extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
 
+struct mlx5_vdpa_vmem_info {
+   struct rte_vhost_memory *vmem;
+   uint32_t entries_num;
+   uint64_t gcd;
+   uint64_t size;
+   uint8_t mode;
+};
+
 struct mlx5_vdpa_virtq {
SLIST_ENTRY(mlx5_vdpa_virtq) next;
uint8_t enable;
@@ -176,7 +190,7 @@ struct mlx5_vdpa_priv {
struct mlx5_hca_vdpa_attr caps;
uint32_t gpa_mkey_index;
struct ibv_mr *null_mr;
-   struct rte_vhost_memory *vmem;
+   struct mlx5_vdpa_vmem_info vmem_info;
struct mlx5dv_devx_event_channel *eventc;
struct mlx5dv_devx_event_channel *err_chnl;
struct mlx5_uar uar;
@@ -187,11 +201,13 @@ struct mlx5_vdpa_priv {
uint8_t num_lag_ports;
uint64_t features; /* Negotiated features. */
uint16_t log_max_rqt_size;
+   uint16_t last_c_thrd_idx;
+   uint16_t num_mrs; /* Number of memory regions. */
struct mlx5_vdpa_steer steer;
struct mlx5dv_var *var;
void *virtq_db_addr;
struct mlx5_pmd_wrapped_mr lm_mr;
-   SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+   struct mlx5_vdpa_query_mr **mrs;
struct mlx5_vdpa_virtq virtqs[];
 };
 
@@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
uint32_t thrd_idx,
-   uint32_t num);
+   enum mlx5_vdpa_task_type task_type,
+   uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+   void **task_data, uint32_t num);
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+   uint32_t *err_cnt, uint32_t sleep_time);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 8475d7788a..22e24f7e75 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
uint32_t thrd_idx,
-   uint32_t num)
+   enum mlx5_vdpa_task_type task_type,
+   uint32_t *remaining_cnt, uint32_t *err_cnt,
+   void **task_data, uint32_t num)
 {
struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+   uint32_t *data = (uint32_t *)task_data;
uint32_t i;
 

[RFC 11/15] vdpa/mlx5: add virtq creation task for MT management

2022-04-08 Thread Li Zhang
The virtq object and all its sub-resources use a lot of
FW commands and can be accelerated by the MT management.
Split the virtqs creation between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h |   9 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  14 +++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 148 +++---
 4 files changed, 133 insertions(+), 40 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3316ce42be..35221f5ddc 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ enum {
 /* Vdpa task types. */
 enum mlx5_vdpa_task_type {
MLX5_VDPA_TASK_REG_MR = 1,
+   MLX5_VDPA_TASK_SETUP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -117,12 +118,12 @@ struct mlx5_vdpa_vmem_info {
 
 struct mlx5_vdpa_virtq {
SLIST_ENTRY(mlx5_vdpa_virtq) next;
-   uint8_t enable;
uint16_t index;
uint16_t vq_size;
uint8_t notifier_state;
-   bool stopped;
uint32_t configured:1;
+   uint32_t enable:1;
+   uint32_t stopped:1;
uint32_t version;
pthread_mutex_t virtq_lock;
struct mlx5_vdpa_priv *priv;
@@ -565,11 +566,13 @@ bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
uint32_t thrd_idx,
enum mlx5_vdpa_task_type task_type,
-   uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+   uint32_t *remaining_cnt, uint32_t *err_cnt,
void **task_data, uint32_t num);
 int
 mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
 bool
 mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
uint32_t *err_cnt, uint32_t sleep_time);
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 22e24f7e75..a2d1ddb1e1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -100,6 +100,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 {
struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
pthread_t thread_id = pthread_self();
+   struct mlx5_vdpa_virtq *virtq;
struct mlx5_vdpa_priv *priv;
struct mlx5_vdpa_task task;
struct rte_ring *rng;
@@ -142,6 +143,19 @@ mlx5_vdpa_c_thread_handle(void *arg)
__ATOMIC_RELAXED);
}
break;
+   case MLX5_VDPA_TASK_SETUP_VIRTQ:
+   virtq = &priv->virtqs[task.idx];
+   pthread_mutex_lock(&virtq->virtq_lock);
+   ret = mlx5_vdpa_virtq_setup(priv,
+   task.idx, false);
+   if (ret) {
+   DRV_LOG(ERR,
+   "Failed to setup virtq %d.", task.idx);
+   __atomic_fetch_add(
+   task.err_cnt, 1, __ATOMIC_RELAXED);
+   }
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
default:
DRV_LOG(ERR, "Invalid vdpa task type %d.",
task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b45fbac146..f782b6b832 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -371,7 +371,7 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
goto unlock;
if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
goto unlock;
-   virtq->stopped = true;
+   virtq->stopped = 1;
/* Query error info. */
if (mlx5_vdpa_virtq_query(priv, vq_index))
goto log;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 3be09f218f..127b1cee7f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -108,8 +108,9 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+   if (virtq->index != i)
+   continue;
pthread_mutex_lock(&virtq->virtq_lock);
-   virtq->configured = 0;
for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
if (virtq->umems[j].obj) {
claim_zero(mlx5_glue->devx_umem_dereg
@@ -128,7 +129,6 @@ mlx5_vdpa_virtqs_cleanup(stru

[RFC 12/15] vdpa/mlx5: add virtq LM log task

2022-04-08 Thread Li Zhang
Split the virtqs LM log between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.h |  3 +
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 ++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c  | 90 ++-
 3 files changed, 110 insertions(+), 17 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 35221f5ddc..e08931719f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -72,6 +72,8 @@ enum {
MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_USED_RING_LEN(size) \
+   ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
@@ -81,6 +83,7 @@ enum {
 enum mlx5_vdpa_task_type {
MLX5_VDPA_TASK_REG_MR = 1,
MLX5_VDPA_TASK_SETUP_VIRTQ,
+   MLX5_VDPA_TASK_STOP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index a2d1ddb1e1..0e54226a90 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -104,6 +104,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
struct mlx5_vdpa_priv *priv;
struct mlx5_vdpa_task task;
struct rte_ring *rng;
+   uint64_t features;
uint32_t thrd_idx;
uint32_t task_num;
int ret;
@@ -156,6 +157,39 @@ mlx5_vdpa_c_thread_handle(void *arg)
}
pthread_mutex_unlock(&virtq->virtq_lock);
break;
+   case MLX5_VDPA_TASK_STOP_VIRTQ:
+   virtq = &priv->virtqs[task.idx];
+   pthread_mutex_lock(&virtq->virtq_lock);
+   ret = mlx5_vdpa_virtq_stop(priv,
+   task.idx);
+   if (ret) {
+   DRV_LOG(ERR,
+   "Failed to stop virtq %d.",
+   task.idx);
+   __atomic_fetch_add(
+   task.err_cnt, 1,
+   __ATOMIC_RELAXED);
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
+   }
+   ret = rte_vhost_get_negotiated_features(
+   priv->vid, &features);
+   if (ret) {
+   DRV_LOG(ERR,
+   "Failed to get negotiated features virtq %d.",
+   task.idx);
+   __atomic_fetch_add(
+   task.err_cnt, 1,
+   __ATOMIC_RELAXED);
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
+   }
+   if (RTE_VHOST_NEED_LOG(features))
+   rte_vhost_log_used_vring(
+   priv->vid, task.idx, 0,
+   MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+   pthread_mutex_unlock(&virtq->virtq_lock);
+   break;
default:
DRV_LOG(ERR, "Invalid vdpa task type %d.",
task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index efebf364d0..07575ea8a9 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -89,39 +89,95 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, 
uint64_t log_base,
return -1;
 }
 
-#define MLX5_VDPA_USED_RING_LEN(size) \
-   ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
-
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+   uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+   uint32_t i, thrd_idx, data[1];
struct mlx5_vdpa_virtq *virtq;
uint64_t features;
-   int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
-   int i;
+   int ret;
 
+   ret = rte_vhost_get_negotiated_features(priv->vid, &features);
if (ret) {
DRV_LOG(ERR, "Failed to get negotiated features.");
return -1;
}
-   if (!RTE_VHOST_NEED_LOG(features))
-   return 0;
-   for (i = 0; i < priv->nr_virtqs; ++i) {
-   virtq = &priv->virtqs[i];
-   if (!priv->virtqs[i].virtq) {
-   DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
-   } else {
+   if (priv->use_c_thread && priv->nr_virtqs) {
+   uint32_t main_task_idx[priv->nr_virtqs];
+
+   for (i = 0; i < pr

[RFC 13/15] vdpa/mlx5: add device close task

2022-04-08 Thread Li Zhang
Split the virtq device close tasks, done after stopping the
virt-queues, between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 51 +--
 drivers/vdpa/mlx5/mlx5_vdpa.h |  8 +
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 14 
 4 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 8dd8e6a2a0..d349682a83 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -245,7 +245,7 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
-static void
+void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
/* Clean pre-created resource in dev removal only. */
@@ -254,6 +254,26 @@ mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
mlx5_vdpa_mem_dereg(priv);
 }
 
+static bool
+mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
+{
+   uint32_t timeout = 0;
+
+   /* Check and wait all close tasks done. */
+   while (__atomic_load_n(&priv->dev_close_progress,
+   __ATOMIC_RELAXED) != 0 && timeout < 1000) {
+   rte_delay_us_sleep(1);
+   timeout++;
+   }
+   if (priv->dev_close_progress) {
+   DRV_LOG(ERR,
+   "Failed to wait close device tasks done vid %d.",
+   priv->vid);
+   return true;
+   }
+   return false;
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -271,6 +291,27 @@ mlx5_vdpa_dev_close(int vid)
ret |= mlx5_vdpa_lm_log(priv);
priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
}
+   if (priv->use_c_thread) {
+   if (priv->last_c_thrd_idx >=
+   (conf_thread_mng.max_thrds - 1))
+   priv->last_c_thrd_idx = 0;
+   else
+   priv->last_c_thrd_idx++;
+   __atomic_store_n(&priv->dev_close_progress,
+   1, __ATOMIC_RELAXED);
+   if (mlx5_vdpa_task_add(priv,
+   priv->last_c_thrd_idx,
+   MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+   NULL, NULL, NULL, 1)) {
+   DRV_LOG(ERR,
+   "Fail to add dev close task. ");
+   goto single_thrd;
+   }
+   priv->state = MLX5_VDPA_STATE_PROBED;
+   DRV_LOG(INFO, "vDPA device %d was closed.", vid);
+   return ret;
+   }
+single_thrd:
pthread_mutex_lock(&priv->steer_update_lock);
mlx5_vdpa_steer_unset(priv);
pthread_mutex_unlock(&priv->steer_update_lock);
@@ -278,10 +319,12 @@ mlx5_vdpa_dev_close(int vid)
mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-   priv->state = MLX5_VDPA_STATE_PROBED;
if (!priv->connected)
mlx5_vdpa_dev_cache_clean(priv);
priv->vid = 0;
+   __atomic_store_n(&priv->dev_close_progress, 0,
+   __ATOMIC_RELAXED);
+   priv->state = MLX5_VDPA_STATE_PROBED;
DRV_LOG(INFO, "vDPA device %d was closed.", vid);
return ret;
 }
@@ -302,6 +345,8 @@ mlx5_vdpa_dev_config(int vid)
DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
return -1;
}
+   if (mlx5_vdpa_wait_dev_close_tasks_done(priv))
+   return -1;
priv->vid = vid;
priv->connected = true;
if (mlx5_vdpa_mtu_set(priv))
@@ -839,6 +884,8 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
mlx5_vdpa_dev_close(priv->vid);
+   if (priv->use_c_thread)
+   mlx5_vdpa_wait_dev_close_tasks_done(priv);
mlx5_vdpa_release_dev_resources(priv);
if (priv->vdev)
rte_vdpa_unregister_device(priv->vdev);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e08931719f..b6392b9d66 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -84,6 +84,7 @@ enum mlx5_vdpa_task_type {
MLX5_VDPA_TASK_REG_MR = 1,
MLX5_VDPA_TASK_SETUP_VIRTQ,
MLX5_VDPA_TASK_STOP_VIRTQ,
+   MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -206,6 +207,7 @@ struct mlx5_vdpa_priv {
uint64_t features; /* Negotiated features. */
uint16_t log_max_rqt_size;
uint16_t last_c_thrd_idx;
+   uint16_t dev_close_progress;
uint16_t num_mrs; /* Number of memory regions. */
struct mlx5_vdpa_steer steer;
struct mlx5dv_var *var;
@@ -578,4 +580,10 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t 
*remaining_cnt,
   

[RFC 14/15] vdpa/mlx5: add virtq sub-resources creation

2022-04-08 Thread Li Zhang
Pre-create the virt-queue sub-resources in the device probe stage
and then modify the virtqueue in the device config stage.
The steering table also needs to support dummy virt-queues.
This accelerates the LM process and reduces its time by 40%.

Signed-off-by: Li Zhang 
Signed-off-by: Yajun Wu 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c   | 68 ++--
 drivers/vdpa/mlx5/mlx5_vdpa.h   | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c |  9 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 15 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 +
 5 files changed, 117 insertions(+), 91 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index d349682a83..eaca571e3e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -624,65 +624,37 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-   struct mlx5_vdpa_virtq *virtq;
+   uint32_t max_queues = priv->queues * 2;
uint32_t index;
-   uint32_t i;
+   struct mlx5_vdpa_virtq *virtq;
 
for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
index++) {
virtq = &priv->virtqs[index];
pthread_mutex_init(&virtq->virtq_lock, NULL);
}
-   if (!priv->queues)
+   if (!priv->queues || !priv->queue_size)
return 0;
-   for (index = 0; index < (priv->queues * 2); ++index) {
+   for (index = 0; index < max_queues; ++index)
+   if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+   index))
+   goto error;
+   if (mlx5_vdpa_is_modify_virtq_supported(priv))
+   if (mlx5_vdpa_steer_update(priv, true))
+   goto error;
+   return 0;
+error:
+   for (index = 0; index < max_queues; ++index) {
virtq = &priv->virtqs[index];
-   int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-   -1, virtq);
-
-   if (ret) {
-   DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-   index);
-   return -1;
-   }
-   if (priv->caps.queue_counters_valid) {
-   if (!virtq->counters)
-   virtq->counters =
-   mlx5_devx_cmd_create_virtio_q_counters
-   (priv->cdev->ctx);
-   if (!virtq->counters) {
-			DRV_LOG(ERR, "Failed to create virtq couners for virtq"
-   " %d.", index);
-   return -1;
-   }
-   }
-   for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-   uint32_t size;
-   void *buf;
-   struct mlx5dv_devx_umem *obj;
-
-   size = priv->caps.umems[i].a * priv->queue_size +
-   priv->caps.umems[i].b;
-   buf = rte_zmalloc(__func__, size, 4096);
-   if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
-   " %u.", i, index);
-   return -1;
-   }
-   obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
-   size, IBV_ACCESS_LOCAL_WRITE);
-   if (obj == NULL) {
-   rte_free(buf);
-				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
-   i, index);
-   return -1;
-   }
-   virtq->umems[i].size = size;
-   virtq->umems[i].buf = buf;
-   virtq->umems[i].obj = obj;
+   if (virtq->virtq) {
+   pthread_mutex_lock(&virtq->virtq_lock);
+   mlx5_vdpa_virtq_unset(virtq);
+   pthread_mutex_unlock(&virtq->virtq_lock);
}
}
-   return 0;
+   if (mlx5_vdpa_is_modify_virtq_supported(priv))
+   mlx5_vdpa_steer_unset(priv);
+   return -1;
 }
 
 static int
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index b6392b9d66..00700261ec 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -277,13 +277,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   The guest notification file descriptor.
  * @param[in/out] virtq
  *   Pointer to the virt-queue structure.
+ * @param[in] reset
+ *   If true, the event qp will be reset.
  *
  * @return
  *   0

[RFC 15/15] vdpa/mlx5: prepare virtqueue resource creation

2022-04-08 Thread Li Zhang
Split the virtq sub-resource creation between
the configuration threads.
The virt-queue resources also need to be pre-created
again after virtq destruction.
This accelerates the LM process and reduces its time by 30%.
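
A minimal, self-contained sketch of the distribution scheme described above
(generic names, not the driver's): queue indexes are dealt round-robin to the
configuration threads, while every (max_thrds + 1)-th index is kept for the
calling thread, as the patch below does with main_task_idx[].

#include <stdint.h>

static void
spread_virtqs(uint32_t nb_virtqs, uint32_t max_thrds,
	      void (*worker_add)(uint32_t thrd, uint32_t vq),
	      void (*main_do)(uint32_t vq))
{
	uint32_t vq, last = 0;

	for (vq = 0; vq < nb_virtqs; vq++) {
		/* Every (max_thrds + 1)-th index stays on the main thread. */
		if (vq % (max_thrds + 1) == 0) {
			main_do(vq);
			continue;
		}
		/* The rest are spread round-robin over the worker threads. */
		last = (last + 1) % max_thrds;
		worker_add(last, vq);
	}
}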

Signed-off-by: Li Zhang 
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 69 +++
 drivers/vdpa/mlx5/mlx5_vdpa.h |  7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 14 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 35 ++
 4 files changed, 104 insertions(+), 21 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index eaca571e3e..15ce30bc49 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -275,13 +275,17 @@ mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_dev_close(int vid)
+_internal_mlx5_vdpa_dev_close(int vid, bool release_resource)
 {
struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
-   struct mlx5_vdpa_priv *priv =
-   mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+   struct mlx5_vdpa_priv *priv;
int ret = 0;
 
+   if (!vdev) {
+   DRV_LOG(ERR, "Invalid vDPA device.");
+   return -1;
+   }
+   priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
if (priv == NULL) {
DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
return -1;
@@ -291,7 +295,7 @@ mlx5_vdpa_dev_close(int vid)
ret |= mlx5_vdpa_lm_log(priv);
priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
}
-   if (priv->use_c_thread) {
+   if (priv->use_c_thread && !release_resource) {
if (priv->last_c_thrd_idx >=
(conf_thread_mng.max_thrds - 1))
priv->last_c_thrd_idx = 0;
@@ -315,7 +319,7 @@ mlx5_vdpa_dev_close(int vid)
pthread_mutex_lock(&priv->steer_update_lock);
mlx5_vdpa_steer_unset(priv);
pthread_mutex_unlock(&priv->steer_update_lock);
-   mlx5_vdpa_virtqs_release(priv);
+   mlx5_vdpa_virtqs_release(priv, release_resource);
mlx5_vdpa_drain_cq(priv);
if (priv->lm_mr.addr)
mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
@@ -329,6 +333,12 @@ mlx5_vdpa_dev_close(int vid)
return ret;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+   return _internal_mlx5_vdpa_dev_close(vid, false);
+}
+
 static int
 mlx5_vdpa_dev_config(int vid)
 {
@@ -624,8 +634,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
+   uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
uint32_t max_queues = priv->queues * 2;
-   uint32_t index;
+   uint32_t index, thrd_idx, data[1];
struct mlx5_vdpa_virtq *virtq;
 
for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
@@ -635,10 +646,48 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
}
if (!priv->queues || !priv->queue_size)
return 0;
-   for (index = 0; index < max_queues; ++index)
-   if (mlx5_vdpa_virtq_single_resource_prepare(priv,
-   index))
+   if (priv->use_c_thread) {
+   uint32_t main_task_idx[max_queues];
+
+   for (index = 0; index < max_queues; ++index) {
+   virtq = &priv->virtqs[index];
+   thrd_idx = index % (conf_thread_mng.max_thrds + 1);
+   if (!thrd_idx) {
+   main_task_idx[task_num] = index;
+   task_num++;
+   continue;
+   }
+   thrd_idx = priv->last_c_thrd_idx + 1;
+   if (thrd_idx >= conf_thread_mng.max_thrds)
+   thrd_idx = 0;
+   priv->last_c_thrd_idx = thrd_idx;
+   data[0] = index;
+   if (mlx5_vdpa_task_add(priv, thrd_idx,
+   MLX5_VDPA_TASK_PREPARE_VIRTQ,
+   &remaining_cnt, &err_cnt,
+   (void **)&data, 1)) {
+   DRV_LOG(ERR, "Fail to add "
+   "task prepare virtq (%d).", index);
+   main_task_idx[task_num] = index;
+   task_num++;
+   }
+   }
+   for (index = 0; index < task_num; ++index)
+   if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+   main_task_idx[index]))
+   goto error;
+   if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+   &err_cnt, 2000)) {
+   DRV_LOG(ERR,
+   "Failed to wait virt-queue prepare tasks ready.")

[PATCH 0/2] crypto/qat: add gen4 ecdsa and ecpm functions

2022-04-08 Thread Arek Kusztal
This commit adds functions for ECDSA and point multiplication using
named curves. This speeds up point multiplication and
signature generation for curves P-256 and P-384.
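
As a rough, hedged illustration of how an application lands on these
named-curve paths (session creation and op handling abbreviated): the
asymmetric xform only names the curve, and the PMD then picks the P-256 or
P-384 firmware function IDs added by this series.

	struct rte_crypto_asym_xform xform = {
		.next = NULL,
		.xform_type = RTE_CRYPTO_ASYM_XFORM_ECDSA,
		.ec = { .curve_id = RTE_CRYPTO_EC_GROUP_SECP256R1 },
	};
	/* Create an asym session with this xform, then fill op->asym->ecdsa
	 * (op_type = RTE_CRYPTO_ASYM_OP_SIGN, message, k, pkey) and enqueue
	 * with rte_cryptodev_enqueue_burst(). For point multiplication use
	 * RTE_CRYPTO_ASYM_XFORM_ECPM and op->asym->ecpm instead.
	 */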

Depends-on: series-22343 ("crypto/qat: refactor asym algorithm macros and logs")
Depends-on: patch-109417 ("crypto/qat: enable asymmetric crypto on gen4 device")


Arek Kusztal (2):
  crypto/qat: add gen4 ecdsa functions
  crypto/qat: add gen4 ecpm functions

 .../common/qat/qat_adf/icp_qat_fw_mmp_ids.h   |  59 
 drivers/common/qat/qat_adf/qat_pke.h  |  40 +
 drivers/crypto/qat/qat_asym.c | 142 +-
 3 files changed, 234 insertions(+), 7 deletions(-)

-- 
2.30.2



[PATCH 1/2] crypto/qat: add gen4 ecdsa functions

2022-04-08 Thread Arek Kusztal
This commit adds ECDSA signature generation with named
curves. This speeds up signature calculation
for curves P-256 and P-384.

Signed-off-by: Arek Kusztal 
---
 .../common/qat/qat_adf/icp_qat_fw_mmp_ids.h   | 17 +
 drivers/common/qat/qat_adf/qat_pke.h  | 20 +
 drivers/crypto/qat/qat_asym.c | 75 +--
 3 files changed, 107 insertions(+), 5 deletions(-)

diff --git a/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h b/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h
index 00813cffb9..9276f954f1 100644
--- a/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h
+++ b/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h
@@ -1524,6 +1524,23 @@ icp_qat_fw_mmp_ecdsa_verify_gfp_521_input::in in @endlink
  * icp_qat_fw_mmp_kpt_ecdsa_sign_rs_gfp_521_output::s s @endlink
  */
 
+#define PKE_ECDSA_SIGN_RS_P256 0x18133566
+/**< Functionality ID for ECC P256 ECDSA Sign RS
+ * @li 3 input parameters : @link icp_qat_fw_mmp_ecdsa_sign_rs_p256_input_s::k k
+ * @endlink @link icp_qat_fw_mmp_ecdsa_sign_rs_p256_input_s::e e @endlink @link
+ * icp_qat_fw_mmp_ecdsa_sign_rs_p256_input_s::d d @endlink
+ * @li 2 output parameters : @link icp_qat_fw_mmp_ecdsa_sign_rs_p256_output_s::r
+ * r @endlink @link icp_qat_fw_mmp_ecdsa_sign_rs_p256_output_s::s s @endlink
+ */
+#define PKE_ECDSA_SIGN_RS_P384 0x1a1335a6
+/**< Functionality ID for ECC P384 ECDSA Sign RS
+ * @li 3 input parameters : @link icp_qat_fw_mmp_ecdsa_sign_rs_p384_input_s::k k
+ * @endlink @link icp_qat_fw_mmp_ecdsa_sign_rs_p384_input_s::e e @endlink @link
+ * icp_qat_fw_mmp_ecdsa_sign_rs_p384_input_s::d d @endlink
+ * @li 2 output parameters : @link icp_qat_fw_mmp_ecdsa_sign_rs_p384_output_s::r
+ * r @endlink @link icp_qat_fw_mmp_ecdsa_sign_rs_p384_output_s::s s @endlink
+ */
+
 #define PKE_LIVENESS 0x0001
 /**< Functionality ID for PKE_LIVENESS
  * @li 0 input parameter(s)
diff --git a/drivers/common/qat/qat_adf/qat_pke.h b/drivers/common/qat/qat_adf/qat_pke.h
index 6c12bfd989..87b6a383b3 100644
--- a/drivers/common/qat/qat_adf/qat_pke.h
+++ b/drivers/common/qat/qat_adf/qat_pke.h
@@ -266,6 +266,26 @@ get_ecdsa_function(struct rte_crypto_asym_xform *xform)
return qat_function;
 }
 
+static struct qat_asym_function
+get_ecdsa_named_function(struct rte_crypto_asym_xform *xform)
+{
+   struct qat_asym_function qat_function;
+
+   switch (xform->ec.curve_id) {
+   case RTE_CRYPTO_EC_GROUP_SECP256R1:
+   qat_function.func_id = PKE_ECDSA_SIGN_RS_P256;
+   qat_function.bytesize = 32;
+   break;
+   case RTE_CRYPTO_EC_GROUP_SECP384R1:
+   qat_function.func_id = PKE_ECDSA_SIGN_RS_P384;
+   qat_function.bytesize = 48;
+   break;
+   default:
+   qat_function.func_id = 0;
+   }
+   return qat_function;
+}
+
 static struct qat_asym_function
 get_ecpm_function(struct rte_crypto_asym_xform *xform)
 {
diff --git a/drivers/crypto/qat/qat_asym.c b/drivers/crypto/qat/qat_asym.c
index d2041b2efa..0ac2bf7405 100644
--- a/drivers/crypto/qat/qat_asym.c
+++ b/drivers/crypto/qat/qat_asym.c
@@ -548,7 +548,7 @@ rsa_collect(struct rte_crypto_asym_op *asym_op,
 }
 
 static int
-ecdsa_set_input(struct rte_crypto_asym_op *asym_op,
+ecdsa_set_generic_input(struct rte_crypto_asym_op *asym_op,
struct icp_qat_fw_pke_request *qat_req,
struct qat_asym_op_cookie *cookie,
struct rte_crypto_asym_xform *xform)
@@ -649,6 +649,70 @@ ecdsa_set_input(struct rte_crypto_asym_op *asym_op,
return 0;
 }
 
+static int
+ecdsa_set_named_input(struct rte_crypto_asym_op *asym_op,
+   struct icp_qat_fw_pke_request *qat_req,
+   struct qat_asym_op_cookie *cookie,
+   struct rte_crypto_asym_xform *xform)
+{
+   struct qat_asym_function qat_function;
+   uint32_t qat_func_alignsize, func_id;
+   int curve_id;
+
+   curve_id = pick_curve(xform);
+   if (curve_id < 0) {
+   QAT_LOG(DEBUG, "Incorrect elliptic curve");
+   return -EINVAL;
+   }
+
+   qat_function = get_ecdsa_named_function(xform);
+   func_id = qat_function.func_id;
+   if (func_id == 0) {
+   QAT_LOG(ERR, "Cannot obtain functionality id");
+   return -EINVAL;
+   }
+   qat_func_alignsize =
+   RTE_ALIGN_CEIL(qat_function.bytesize, 8);
+
+   SET_PKE_LN(asym_op->ecdsa.k, qat_func_alignsize, 0);
+   SET_PKE_LN(asym_op->ecdsa.message, qat_func_alignsize, 1);
+   SET_PKE_LN(asym_op->ecdsa.pkey, qat_func_alignsize, 2);
+
+   cookie->alg_bytesize = curve[curve_id].bytesize;
+   cookie->qat_func_alignsize = qat_func_alignsize;
+   qat_req->pke_hdr.cd_pars.func_id = func_id;
+   qat_req->input_param_count =
+   3;
+   qat_req->output_param_count =
+   2;
+
+   HEXDUMP("k", cookie->input_array[0], qat_func_alignsize);
+   HEXDUMP("e", cookie->

[PATCH 2/2] crypto/qat: add gen4 ecpm functions

2022-04-08 Thread Arek Kusztal
This commit adds functions to use ECPM with named
curves. This speeds up point multiplication
for curves P-256 and P-384.

Signed-off-by: Arek Kusztal 
---
 .../common/qat/qat_adf/icp_qat_fw_mmp_ids.h   | 42 
 drivers/common/qat/qat_adf/qat_pke.h  | 20 ++
 drivers/crypto/qat/qat_asym.c | 67 ++-
 3 files changed, 127 insertions(+), 2 deletions(-)

diff --git a/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h b/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h
index 9276f954f1..5a92393b40 100644
--- a/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h
+++ b/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h
@@ -1524,6 +1524,48 @@ icp_qat_fw_mmp_ecdsa_verify_gfp_521_input::in in @endlink
  * icp_qat_fw_mmp_kpt_ecdsa_sign_rs_gfp_521_output::s s @endlink
  */
 
+#define PKE_EC_GENERATOR_MULTIPLICATION_P256 0x12073556
+/**< Functionality ID for ECC P256 Generator Point Multiplication [k]G(x)
+ * @li 1 input parameters : @link
+ * icp_qat_fw_mmp_ec_p256_generator_multiplication_input_s::k k @endlink
+ * @li 2 output parameters : @link
+ * icp_qat_fw_mmp_ec_p256_generator_multiplication_output_s::xr xr @endlink
+ * @link icp_qat_fw_mmp_ec_p256_generator_multiplication_output_s::yr yr
+ * @endlink
+ */
+
+#define PKE_EC_GENERATOR_MULTIPLICATION_P384 0x0b073596
+/**< Functionality ID for ECC P384 Generator Point Multiplication [k]G(x)
+ * @li 1 input parameters : @link
+ * icp_qat_fw_mmp_ec_p384_generator_multiplication_input_s::k k @endlink
+ * @li 2 output parameters : @link
+ * icp_qat_fw_mmp_ec_p384_generator_multiplication_output_s::xr xr @endlink
+ * @link icp_qat_fw_mmp_ec_p384_generator_multiplication_output_s::yr yr
+ * @endlink
+ */
+
+#define PKE_EC_POINT_MULTIPLICATION_P256 0x0a083546
+/**< Functionality ID for ECC P256 Variable Point Multiplication [k]P(x)
+ * @li 3 input parameters : @link
+ * icp_qat_fw_mmp_ec_p256_point_multiplication_input_s::xp xp @endlink @link
+ * icp_qat_fw_mmp_ec_p256_point_multiplication_input_s::yp yp @endlink @link
+ * icp_qat_fw_mmp_ec_p256_point_multiplication_input_s::k k @endlink
+ * @li 2 output parameters : @link
+ * icp_qat_fw_mmp_ec_p256_point_multiplication_output_s::xr xr @endlink @link
+ * icp_qat_fw_mmp_ec_p256_point_multiplication_output_s::yr yr @endlink
+ */
+
+#define PKE_EC_POINT_MULTIPLICATION_P384 0x0b083586
+/**< Functionality ID for ECC P384 Variable Point Multiplication [k]P(x)
+ * @li 3 input parameters : @link
+ * icp_qat_fw_mmp_ec_p384_point_multiplication_input_s::xp xp @endlink @link
+ * icp_qat_fw_mmp_ec_p384_point_multiplication_input_s::yp yp @endlink @link
+ * icp_qat_fw_mmp_ec_p384_point_multiplication_input_s::k k @endlink
+ * @li 2 output parameters : @link
+ * icp_qat_fw_mmp_ec_p384_point_multiplication_output_s::xr xr @endlink @link
+ * icp_qat_fw_mmp_ec_p384_point_multiplication_output_s::yr yr @endlink
+ */
+
 #define PKE_ECDSA_SIGN_RS_P256 0x18133566
 /**< Functionality ID for ECC P256 ECDSA Sign RS
  * @li 3 input parameters : @link icp_qat_fw_mmp_ecdsa_sign_rs_p256_input_s::k k
diff --git a/drivers/common/qat/qat_adf/qat_pke.h b/drivers/common/qat/qat_adf/qat_pke.h
index 87b6a383b3..4c3afabb05 100644
--- a/drivers/common/qat/qat_adf/qat_pke.h
+++ b/drivers/common/qat/qat_adf/qat_pke.h
@@ -310,4 +310,24 @@ get_ecpm_function(struct rte_crypto_asym_xform *xform)
return qat_function;
 }
 
+static struct qat_asym_function
+get_ecpm_named_function(struct rte_crypto_asym_xform *xform)
+{
+   struct qat_asym_function qat_function;
+
+   switch (xform->ec.curve_id) {
+   case RTE_CRYPTO_EC_GROUP_SECP256R1:
+   qat_function.func_id = PKE_EC_POINT_MULTIPLICATION_P256;
+   qat_function.bytesize = 32;
+   break;
+   case RTE_CRYPTO_EC_GROUP_SECP384R1:
+   qat_function.func_id = PKE_EC_POINT_MULTIPLICATION_P384;
+   qat_function.bytesize = 48;
+   break;
+   default:
+   qat_function.func_id = 0;
+   }
+   return qat_function;
+}
+
 #endif
diff --git a/drivers/crypto/qat/qat_asym.c b/drivers/crypto/qat/qat_asym.c
index 0ac2bf7405..860046d446 100644
--- a/drivers/crypto/qat/qat_asym.c
+++ b/drivers/crypto/qat/qat_asym.c
@@ -739,7 +739,7 @@ ecdsa_collect(struct rte_crypto_asym_op *asym_op,
 }
 
 static int
-ecpm_set_input(struct rte_crypto_asym_op *asym_op,
+ecpm_set_generic_input(struct rte_crypto_asym_op *asym_op,
struct icp_qat_fw_pke_request *qat_req,
struct qat_asym_op_cookie *cookie,
struct rte_crypto_asym_xform *xform)
@@ -789,6 +789,69 @@ ecpm_set_input(struct rte_crypto_asym_op *asym_op,
return 0;
 }
 
+static int
+ecpm_set_named_input(struct rte_crypto_asym_op *asym_op,
+   struct icp_qat_fw_pke_request *qat_req,
+   struct qat_asym_op_cookie *cookie,
+   struct rte_crypto_asym_xform *xform)
+{
+   struct qat_asym_function qat_function;
+   uint32_t qat_func_alignsize

[PATCH v2 0/3] Enable Protocol Agnostic Flow Offloading FDIR in AVF

2022-04-08 Thread Junfeng Guo
This patch set enables Protocol Agnostic Flow (raw flow) Offloading
for FDIR in AVF.

[PATCH v2 1/3] common/iavf: support raw packet in protocol header
[PATCH v2 2/3] net/iavf: align with proto hdr struct change
[PATCH v2 3/3] net/iavf: enable Protocol Agnostic Flow Offloading FDIR

v2:
add release notes and document update.

Junfeng Guo (3):
  common/iavf: support raw packet in protocol header
  net/iavf: align with proto hdr struct change
  net/iavf: enable Protocol Agnostic Flow Offloading FDIR

 doc/guides/rel_notes/release_22_07.rst |   4 +
 drivers/common/iavf/virtchnl.h |  20 ++-
 drivers/net/iavf/iavf_fdir.c   |  66 +
 drivers/net/iavf/iavf_generic_flow.c   |   6 +
 drivers/net/iavf/iavf_generic_flow.h   |   3 +
 drivers/net/iavf/iavf_hash.c   | 180 +
 6 files changed, 187 insertions(+), 92 deletions(-)

-- 
2.25.1



[PATCH v2 1/3] common/iavf: support raw packet in protocol header

2022-04-08 Thread Junfeng Guo
The patch extends the existing virtchnl_proto_hdrs structure to allow a VF
to pass a pair of buffers as packet data and mask that describe
a match pattern of a filter rule. The kernel PF driver is then requested
to parse the pair of buffers and figure out the low-level hardware metadata
(ptype, profile, field vector, etc.) to program the expected FDIR or RSS
rules.
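
A hedged, VF-side sketch of filling the new raw member (pkt_spec, pkt_mask and
pkt_len are hypothetical application buffers and length, bounded by
VIRTCHNL_MAX_SIZE_RAW_PACKET):

	struct virtchnl_proto_hdrs hdrs = {0};

	hdrs.tunnel_level = 0;                     /* must be 0 for a raw request */
	hdrs.count = 0;                            /* must be 0 for a raw request */
	hdrs.raw.pkt_len = pkt_len;
	memcpy(hdrs.raw.spec, pkt_spec, pkt_len);  /* packet bytes */
	memcpy(hdrs.raw.mask, pkt_mask, pkt_len);  /* per-byte match mask */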

INTERNAL ONLY:

This is a requirement from DPDK to support Protocol Agnostic Flow
Offloading (*1). Previously we integrated the Parser Library (*2)
into DPDK and enabled raw packet based FDIR and RSS support in the DPDK
PF driver (*3, *4). To enable the same feature for the AVF driver, we need
the Virtual Channel to support passing raw packet filter rules.

[1] https://wiki.ith.intel.com/display/NPGCVL/Protocol+Agnostic+Flow+Offloading
[2] http://patchwork.dpdk.org/project/dpdk/list/?series=19057&archive=both&state=*
[3] http://patchwork.dpdk.org/project/dpdk/list/?series=20254&state=%2A&archive=both
[4] http://patchwork.dpdk.org/project/dpdk/list/?series=20291&state=%2A&archive=both

Signed-off-by: Qi Zhang 
Signed-off-by: Junfeng Guo 
---
 drivers/common/iavf/virtchnl.h | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/common/iavf/virtchnl.h b/drivers/common/iavf/virtchnl.h
index 3e44eca7d8..3975229545 100644
--- a/drivers/common/iavf/virtchnl.h
+++ b/drivers/common/iavf/virtchnl.h
@@ -1482,6 +1482,7 @@ enum virtchnl_vfr_states {
 };
 
 #define VIRTCHNL_MAX_NUM_PROTO_HDRS32
+#define VIRTCHNL_MAX_SIZE_RAW_PACKET   1024
 #define PROTO_HDR_SHIFT5
 #define PROTO_HDR_FIELD_START(proto_hdr_type) \
(proto_hdr_type << PROTO_HDR_SHIFT)
@@ -1676,14 +1677,25 @@ VIRTCHNL_CHECK_STRUCT_LEN(72, virtchnl_proto_hdr);
 struct virtchnl_proto_hdrs {
u8 tunnel_level;
/**
-* specify where protocol header start from.
+	 * specify where protocol header start from. must be 0 when sending a raw packet request.
 * 0 - from the outer layer
 * 1 - from the first inner layer
 * 2 - from the second inner layer
 * 
-**/
-   int count; /* the proto layers must < VIRTCHNL_MAX_NUM_PROTO_HDRS */
-   struct virtchnl_proto_hdr proto_hdr[VIRTCHNL_MAX_NUM_PROTO_HDRS];
+*/
+   int count;
+   /**
+* number of proto layers, must < VIRTCHNL_MAX_NUM_PROTO_HDRS
+* must be 0 for a raw packet request.
+*/
+   union {
+		struct virtchnl_proto_hdr proto_hdr[VIRTCHNL_MAX_NUM_PROTO_HDRS];
+   struct {
+   u16 pkt_len;
+   u8 spec[VIRTCHNL_MAX_SIZE_RAW_PACKET];
+   u8 mask[VIRTCHNL_MAX_SIZE_RAW_PACKET];
+   } raw;
+   };
 };
 
 VIRTCHNL_CHECK_STRUCT_LEN(2312, virtchnl_proto_hdrs);
-- 
2.25.1



[PATCH v2 2/3] net/iavf: align with proto hdr struct change

2022-04-08 Thread Junfeng Guo
Structure virtchnl_proto_hdrs is extended with a union of the
proto_hdr table and a raw struct. Thus update the proto_hdrs template
initializers to align with the virtchnl changes.
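
A small, hedged illustration (generic names, not iavf's) of why every template
initializer gains one level of braces: once the proto_hdr table becomes the
first member of an unnamed union, its array initializer must itself be wrapped
in the union's braces.

struct hdr { int type; unsigned int field; };
#define HDR_ETH  { 1, 0x3 }
#define HDR_IPV4 { 4, 0xf }

struct hdrs_old { int level; int count; struct hdr tbl[8]; };
struct hdrs_new {
	int level; int count;
	union {
		struct hdr tbl[8];
		struct { unsigned short pkt_len; } raw;
	};
};

/* Before: array braces only.  After: union braces around the array braces. */
static struct hdrs_old tmpl_old = { 0, 2, {HDR_ETH, HDR_IPV4} };
static struct hdrs_new tmpl_new = { 0, 2, {{HDR_ETH, HDR_IPV4}} };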

Signed-off-by: Junfeng Guo 
---
 drivers/net/iavf/iavf_hash.c | 180 ++-
 1 file changed, 92 insertions(+), 88 deletions(-)

diff --git a/drivers/net/iavf/iavf_hash.c b/drivers/net/iavf/iavf_hash.c
index f35a07653b..278e75117d 100644
--- a/drivers/net/iavf/iavf_hash.c
+++ b/drivers/net/iavf/iavf_hash.c
@@ -181,252 +181,256 @@ iavf_hash_parse_pattern_action(struct iavf_adapter *ad,
 /* proto_hdrs template */
 struct virtchnl_proto_hdrs outer_ipv4_tmplt = {
TUNNEL_LEVEL_OUTER, 4,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv4}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv4}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv4_udp_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
-proto_hdr_ipv4_with_prot,
-proto_hdr_udp}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
+ proto_hdr_ipv4_with_prot,
+ proto_hdr_udp}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv4_tcp_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
-proto_hdr_ipv4_with_prot,
-proto_hdr_tcp}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
+ proto_hdr_ipv4_with_prot,
+ proto_hdr_tcp}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv4_sctp_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv4,
-proto_hdr_sctp}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv4,
+ proto_hdr_sctp}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv6_tmplt = {
TUNNEL_LEVEL_OUTER, 4,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv6}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv6}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv6_frag_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
-proto_hdr_ipv6, proto_hdr_ipv6_frag}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
+ proto_hdr_ipv6, proto_hdr_ipv6_frag}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv6_udp_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
-proto_hdr_ipv6_with_prot,
-proto_hdr_udp}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
+ proto_hdr_ipv6_with_prot,
+ proto_hdr_udp}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv6_tcp_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
-proto_hdr_ipv6_with_prot,
-proto_hdr_tcp}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
+ proto_hdr_ipv6_with_prot,
+ proto_hdr_tcp}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv6_sctp_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv6,
-proto_hdr_sctp}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv6,
+ proto_hdr_sctp}}
 };
 
 struct virtchnl_proto_hdrs inner_ipv4_tmplt = {
-   TUNNEL_LEVEL_INNER, 1, {proto_hdr_ipv4}
+   TUNNEL_LEVEL_INNER, 1, {{proto_hdr_ipv4}}
 };
 
 struct virtchnl_proto_hdrs inner_ipv4_udp_tmplt = {
-   TUNNEL_LEVEL_INNER, 2, {proto_hdr_ipv4_with_prot, proto_hdr_udp}
+   TUNNEL_LEVEL_INNER, 2, {{proto_hdr_ipv4_with_prot, proto_hdr_udp}}
 };
 
 struct virtchnl_proto_hdrs inner_ipv4_tcp_tmplt = {
-   TUNNEL_LEVEL_INNER, 2, {proto_hdr_ipv4_with_prot, proto_hdr_tcp}
+   TUNNEL_LEVEL_INNER, 2, {{proto_hdr_ipv4_with_prot, proto_hdr_tcp}}
 };
 
 struct virtchnl_proto_hdrs second_inner_ipv4_tmplt = {
-   2, 1, {proto_hdr_ipv4}
+   2, 1, {{proto_hdr_ipv4}}
 };
 
 struct virtchnl_proto_hdrs second_inner_ipv4_udp_tmplt = {
-   2, 2, {proto_hdr_ipv4_with_prot, proto_hdr_udp}
+   2, 2, {{proto_hdr_ipv4_with_prot, proto_hdr_udp}}
 };
 
 struct virtchnl_proto_hdrs second_inner_ipv4_tcp_tmplt = {
-   2, 2, {proto_hdr_ipv4_with_prot, proto_hdr_tcp}
+   2, 2, {{proto_hdr_ipv4_with_prot, proto_hdr_tcp}}
 };
 
 struct virtchnl_proto_hdrs second_inner_ipv6_tmplt = {
-   2, 1, {proto_hdr_ipv6}
+   2, 1, {{proto_hdr_ipv6}}
 };
 
 struct virtchnl_proto_hdrs second_inner_ipv6_udp_tmplt = {
-   2, 2, {proto_hdr_ipv6_with_prot, proto_hdr_udp}
+   2, 2, {{proto_hdr_ipv6_with_prot, proto_hdr_udp}}
 };
 
 struct virtchnl_proto_hdrs second_inner_ipv6_tcp_tmplt = {
-   2, 2, {proto_hdr_ipv6_with_prot, proto_hdr_tcp}
+   2, 2, {{proto_hdr_ipv6_with_prot, proto_hdr_tcp}}
 };
 
 struct virtchnl_proto_hdrs inner_ipv4_sctp_tmplt = {
-   TUNNEL_LEVEL_INNER, 2, {proto_hdr_ipv4, proto_hdr_sctp}
+   TUNNEL_LEVEL_INNER, 2, {{proto_hdr_ipv4, proto_hd

[PATCH v2 3/3] net/iavf: enable Protocol Agnostic Flow Offloading FDIR

2022-04-08 Thread Junfeng Guo
This patch enables Protocol Agnostic Flow Offloading FDIR in AVF.
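
A hedged sketch of a protocol-agnostic FDIR rule the way this PMD consumes the
RAW item: spec/mask carry the packet bytes as hexadecimal strings (two
characters per byte) and length is the string length; the strings below are
illustrative placeholders, and in practice the spec must describe a complete,
parseable training packet.

	/* 14B Ethernet + 20B IPv4 header; mask selects only the dst IP (2.2.2.2). */
	static const uint8_t spec_hex[] =
		"0000000000010000000000020800"		/* Ethernet: dst, src, type */
		"45000014000000004011000001010101"	/* IPv4 up to the src addr */
		"02020202";				/* IPv4 dst addr */
	static const uint8_t mask_hex[] =
		"0000000000000000000000000000"
		"00000000000000000000000000000000"
		"ffffffff";

	struct rte_flow_item_raw raw_spec = {
		.length = (uint16_t)(sizeof(spec_hex) - 1),
		.pattern = spec_hex,
	};
	struct rte_flow_item_raw raw_mask = {
		.length = (uint16_t)(sizeof(mask_hex) - 1),
		.pattern = mask_hex,
	};
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_RAW,
		  .spec = &raw_spec, .mask = &raw_mask },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_queue queue = { .index = 1 };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	/* rte_flow_create(port_id, &attr, pattern, actions, &error); */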

Signed-off-by: Junfeng Guo 
---
 doc/guides/rel_notes/release_22_07.rst |  4 ++
 drivers/net/iavf/iavf_fdir.c   | 66 ++
 drivers/net/iavf/iavf_generic_flow.c   |  6 +++
 drivers/net/iavf/iavf_generic_flow.h   |  3 ++
 4 files changed, 79 insertions(+)

diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 42a5f2d990..43eab0b6d5 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -55,6 +55,10 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Updated Intel iavf driver.**
+
+  * Added Protocol Agnostic Flow Offloading support in AVF Flow Director.
+
 
 Removed Items
 -
diff --git a/drivers/net/iavf/iavf_fdir.c b/drivers/net/iavf/iavf_fdir.c
index e9a3566c0d..bd0ae544da 100644
--- a/drivers/net/iavf/iavf_fdir.c
+++ b/drivers/net/iavf/iavf_fdir.c
@@ -194,6 +194,7 @@
IAVF_INSET_TUN_TCP_DST_PORT)
 
 static struct iavf_pattern_match_item iavf_fdir_pattern[] = {
+	{iavf_pattern_raw,			 IAVF_INSET_NONE,		IAVF_INSET_NONE},
	{iavf_pattern_ethertype,		 IAVF_FDIR_INSET_ETH,		IAVF_INSET_NONE},
	{iavf_pattern_eth_ipv4,			 IAVF_FDIR_INSET_ETH_IPV4,	IAVF_INSET_NONE},
	{iavf_pattern_eth_ipv4_udp,		 IAVF_FDIR_INSET_ETH_IPV4_UDP,	IAVF_INSET_NONE},
@@ -720,6 +721,7 @@ iavf_fdir_parse_pattern(__rte_unused struct iavf_adapter *ad,
struct virtchnl_proto_hdrs *hdrs =
&filter->add_fltr.rule_cfg.proto_hdrs;
enum rte_flow_item_type l3 = RTE_FLOW_ITEM_TYPE_END;
+   const struct rte_flow_item_raw *raw_spec, *raw_mask;
const struct rte_flow_item_eth *eth_spec, *eth_mask;
const struct rte_flow_item_ipv4 *ipv4_spec, *ipv4_last, *ipv4_mask;
const struct rte_flow_item_ipv6 *ipv6_spec, *ipv6_mask;
@@ -746,6 +748,7 @@ iavf_fdir_parse_pattern(__rte_unused struct iavf_adapter *ad,
enum rte_flow_item_type next_type;
uint8_t tun_inner = 0;
uint16_t ether_type, flags_version;
+   uint8_t item_num = 0;
int layer = 0;
 
uint8_t  ipv6_addr_mask[16] = {
@@ -763,8 +766,71 @@ iavf_fdir_parse_pattern(__rte_unused struct iavf_adapter *ad,
   RTE_FLOW_ERROR_TYPE_ITEM, item,
   "Not support range");
}
+   item_num++;
 
switch (item_type) {
+   case RTE_FLOW_ITEM_TYPE_RAW:
+   raw_spec = item->spec;
+   raw_mask = item->mask;
+
+   if (item_num != 1)
+   return -rte_errno;
+
+   if (raw_spec->length != raw_mask->length)
+   return -rte_errno;
+
+   uint16_t pkt_len = 0;
+   uint16_t tmp_val = 0;
+   uint8_t tmp = 0;
+   int i, j;
+
+   pkt_len = raw_spec->length;
+
+   for (i = 0, j = 0; i < pkt_len; i += 2, j++) {
+   tmp = raw_spec->pattern[i];
+   if (tmp >= 'a' && tmp <= 'f')
+   tmp_val = tmp - 'a' + 10;
+   if (tmp >= 'A' && tmp <= 'F')
+   tmp_val = tmp - 'A' + 10;
+   if (tmp >= '0' && tmp <= '9')
+   tmp_val = tmp - '0';
+
+   tmp_val *= 16;
+   tmp = raw_spec->pattern[i + 1];
+   if (tmp >= 'a' && tmp <= 'f')
+   tmp_val += (tmp - 'a' + 10);
+   if (tmp >= 'A' && tmp <= 'F')
+   tmp_val += (tmp - 'A' + 10);
+   if (tmp >= '0' && tmp <= '9')
+   tmp_val += (tmp - '0');
+
+   hdrs->raw.spec[j] = tmp_val;
+
+   tmp = raw_mask->pattern[i];
+   if (tmp >= 'a' && tmp <= 'f')
+   tmp_val = tmp - 'a' + 10;
+   if (tmp >= 'A' && tmp <= 'F')
+   tmp_val = tmp - 'A' + 10;
+   if (tmp >= '0' && tmp <= '9')
+   tmp_val = tmp - '0';
+
+   tmp_val *= 16;
+   tmp = raw_mask->pattern[i + 1];
+   if (tmp >= 'a' && tmp <= 'f')
+   t

RE: OVS DPDK DMA-Dev library/Design Discussion

2022-04-08 Thread Morten Brørup
> From: Hu, Jiayu [mailto:jiayu...@intel.com]
> Sent: Friday, 8 April 2022 09.14
> 
> > From: Ilya Maximets 
> >
> > On 4/7/22 16:25, Maxime Coquelin wrote:
> > > Hi Harry,
> > >
> > > On 4/7/22 16:04, Van Haaren, Harry wrote:
> > >> Hi OVS & DPDK, Maintainers & Community,
> > >>
> > >> Top posting overview of discussion as replies to thread become
> slower:
> > >> perhaps it is a good time to review and plan for next steps?
> > >>
> > >>  From my perspective, it those most vocal in the thread seem to be
> in
> > >> favour of the clean rx/tx split ("defer work"), with the tradeoff
> > >> that the application must be aware of handling the async DMA
> > >> completions. If there are any concerns opposing upstreaming of
> this
> > method, please indicate this promptly, and we can continue technical
> > discussions here now.
> > >
> > > Wasn't there some discussions about handling the Virtio completions
> > > with the DMA engine? With that, we wouldn't need the deferral of
> work.
> >
> > +1
> >
> > With the virtio completions handled by DMA itself, the vhost port
> turns
> > almost into a real HW NIC.  With that we will not need any extra
> > manipulations from the OVS side, i.e. no need to defer any work while
> > maintaining clear split between rx and tx operations.
> 
> First, making DMA do 2B copy would sacrifice performance, and I think
> we all agree on that. Second, this method comes with an issue of
> ordering.
> For example, PMD thread0 enqueue 10 packets to vring0 first, then PMD
> thread1
> enqueue 20 packets to vring0. If PMD thread0 and threa1 have own
> dedicated
> DMA device dma0 and dma1, flag/index update for the first 10 packets is
> done by
> dma0, and flag/index update for the left 20 packets is done by dma1.
> But there
> is no ordering guarantee among different DMA devices, so flag/index
> update may
> error. If PMD threads don't have dedicated DMA devices, which means DMA
> devices are shared among threads, we need lock and pay for lock
> contention in
> data-path. Or we can allocate DMA devices for vring dynamically to
> avoid DMA
> sharing among threads. But what's the overhead of allocation mechanism?
> Who
> does it? Any thoughts?
> 

Think of it like a hardware NIC... what are the constraints for a hardware NIC:

Two threads writing simultaneously into the same NIC TX queue is not possible, 
and would be an application design error. With a hardware NIC, you use separate 
TX queues for each thread.

Having two threads writing into the same TX queue (to maintain ordering), is 
not possible without additional application logic. This could be a pipeline 
stage with a lockless multi producer-single consumer ring in front of the NIC 
TX queue, or it could be a critical section preventing one thread from writing 
while another thread is writing.

Either way, multiple threads writing simultaneously into the same NIC TX queue 
is not possible with hardware NIC drivers, but must be implemented in the 
application. So why would anyone expect it to be possible for virtual NIC 
drivers (such as the vhost)?
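
To make the "MP/SC ring in front of the TX queue" option concrete, a minimal
sketch (arbitrary names and sizes, error handling trimmed) could look like
this: any PMD thread enqueues lock-free, and exactly one thread drains the
ring into the real TX queue, preserving ordering.

#include <rte_ring.h>
#include <rte_mbuf.h>
#include <rte_ethdev.h>

static struct rte_ring *txq0_stage;

static int
stage_init(int socket_id)
{
	/* Multi-producer enqueue, single-consumer dequeue. */
	txq0_stage = rte_ring_create("txq0_stage", 1024, socket_id,
				     RING_F_SC_DEQ);
	return txq0_stage == NULL ? -1 : 0;
}

/* Any PMD thread may call this concurrently. */
static inline unsigned int
stage_enqueue(struct rte_mbuf **pkts, unsigned int n)
{
	return rte_ring_enqueue_burst(txq0_stage, (void **)pkts, n, NULL);
}

/* Exactly one thread drains the staging ring into the NIC/vhost TX queue. */
static inline void
stage_drain(uint16_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *pkts[32];
	unsigned int n = rte_ring_dequeue_burst(txq0_stage, (void **)pkts,
						32, NULL);
	if (n) {
		uint16_t sent = rte_eth_tx_burst(port_id, queue_id, pkts,
						 (uint16_t)n);
		/* This sketch simply drops what the queue did not accept. */
		while (sent < n)
			rte_pktmbuf_free(pkts[sent++]);
	}
}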


> Thanks,
> Jiayu
> 
> >
> > I'd vote for that.
> >
> > >
> > > Thanks,
> > > Maxime
> > >
> > >> In absence of continued technical discussion here, I suggest Sunil
> > >> and Ian collaborate on getting the OVS Defer-work approach, and
> DPDK
> > >> VHost Async patchsets available on GitHub for easier consumption
> and
> > future development (as suggested in slides presented on last call).
> > >>
> > >> Regards, -Harry
> > >>
> > >> No inline-replies below; message just for context.
> > >>
> > >>> -Original Message-
> > >>> From: Van Haaren, Harry
> > >>> Sent: Wednesday, March 30, 2022 10:02 AM
> > >>> To: Morten Brørup ; Richardson, Bruce
> > >>> 
> > >>> Cc: Maxime Coquelin ; Pai G, Sunil
> > >>> ; Stokes, Ian ; Hu,
> > >>> Jiayu ; Ferriter, Cian
> > >>> ; Ilya Maximets ;
> > >>> ovs-...@openvswitch.org; dev@dpdk.org; Mcnamara, John
> > >>> ; O'Driscoll, Tim
> > >>> ; Finn, Emma 
> > >>> Subject: RE: OVS DPDK DMA-Dev library/Design Discussion
> > >>>
> >  -Original Message-
> >  From: Morten Brørup 
> >  Sent: Tuesday, March 29, 2022 8:59 PM
> >  To: Van Haaren, Harry ; Richardson,
> >  Bruce 
> >  Cc: Maxime Coquelin ; Pai G, Sunil
> >  ; Stokes, Ian ; Hu,
> >  Jiayu ; Ferriter, Cian
> >  ; Ilya Maximets ;
> >  ovs-...@openvswitch.org; dev@dpdk.org; Mcnamara,
> > >>> John
> >  ; O'Driscoll, Tim
> >  ; Finn, Emma 
> >  Subject: RE: OVS DPDK DMA-Dev library/Design Discussion
> > 
> > > From: Van Haaren, Harry [mailto:harry.van.haa...@intel.com]
> > > Sent: Tuesday, 29 March 2022 19.46
> > >
> > >> From: Morten Brørup 
> > >> Sent: Tuesday, March 29, 2022 6:14 PM
> > >>
> > >>> From: Bruce Richardson [mailto:bruce.richard...@intel.com]
> > >>> Sent: Tuesday, 29 March 2022 19.03
> > >>>
> > >>> On Tue, Mar 29, 2022 at 06:45:19PM +0200, Morten Brørup
> wrote:
> > > From: Max

Re: [RFC] ethdev: datapath-focused meter actions

2022-04-08 Thread Jerin Jacob
+ @Cristian Dumitrescu meter maintainer.


On Fri, Apr 8, 2022 at 8:17 AM Alexander Kozyrev  wrote:
>
> The introduction of asynchronous flow rules operations allowed users
> to create/destroy flow rules as part of the datapath without blocking
> on Flow API and slowing the packet processing down.
>
> That applies to every possible action that has no preparation steps.
> Unfortunately, one notable exception is the meter action.
> There is a separate API to prepare a meter profile and a meter policy
> before any meter object can be used as a flow rule action.
>
> The application logic is the following:
> 1. rte_mtr_meter_profile_add() is called to create the meter profile
> first to define how to classify incoming packets and to assign an
> appropriate color to them.
> 2. rte_mtr_meter_policy_add() is invoked to define the fate of a packet,
> based on its color (practically creating flow rules, matching colors).
> 3. rte_mtr_create() is then needed to search (with locks) for previously
> created profile and policy in order to create the meter object.
> 4. rte_flow_create() is now finally can be used to specify the created
> meter as an action.
>
> This approach doesn't fit into the asynchronous rule creation model
> and can be improved with the following proposal:
> 1. Creating a policy may be replaced with the creation of a group with
> up to 3 different rules for every color using asynchronous Flow API.
> That requires the introduction of a new pattern item - meter color.
> Then creation a flow rule with the meter means a simple jump to a group:
> rte_flow_async_create(group=1, pattern=color, actions=...);
> rte_flow_async_create(group=0, pattern=5-tuple,
>   actions=meter,jump group 1);
> This allows to classify packets and act upon their color classifications.
> The Meter action assigns a color to a packet and an appropriate action
> is selected based on the Meter color in group 1.
>
> 2. Preparing a meter object should be the part of flow rule creation
> and use the same flow queue to benefit from asynchronous operations:
> rte_flow_async_create(group=0, pattern=5-tuple,
>   actions=meter id 1 profile rfc2697, jump group 1);
> Creation of the meter object takes time and flow creation must wait
> until it is ready before inserting the rule. Using the same queue allows
> ensuring that. There is no need to create a meter object outside of the
> Flow API, but this approach won't affect the old Meter API in any way.
>
> 3. Another point of optimization is to prepare all the resources needed
> in advance in rte_flow_configure(). All the policy rules can be created
> during the initialization stage easily and put into several groups.
> These groups can be used by many meter objects by simple jump action to
> an appropriate group. Meter objects can be preallocated as well and
> configured with required profile parameters later at the flow rule
> creation stage. The number of pre-allocated profiles/policies is
> specified in the Flow engine resources settings.
>
> These optimizations alongside already existing pattern/actions templates
> can improve the insertion rate significantly and allow meter usage as
> part of the datapath. The introduction of the new API is intended to be
> used with the asynchronous Flow API. Deprecation of the old Meter API
> is not planned at this point.
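
A hedged sketch of what the group-1 color rules could look like (the
synchronous rte_flow_create() is used here only for brevity; the
RTE_FLOW_ITEM_TYPE_METER_COLOR item and rte_flow_item_meter_color structure
are the additions proposed below and do not exist yet):

static int
install_color_rules(uint16_t port_id)
{
	struct rte_flow_error err;
	struct rte_flow_attr attr = { .group = 1, .ingress = 1 };

	/* Group 1: green packets go to queue 0. */
	struct rte_flow_item_meter_color green = { .color = RTE_COLOR_GREEN };
	struct rte_flow_item pat_green[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_METER_COLOR, .spec = &green },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_queue q0 = { .index = 0 };
	struct rte_flow_action act_green[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &q0 },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	/* Group 1: red packets are dropped. */
	struct rte_flow_item_meter_color red = { .color = RTE_COLOR_RED };
	struct rte_flow_item pat_red[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_METER_COLOR, .spec = &red },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action act_red[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_DROP },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	if (!rte_flow_create(port_id, &attr, pat_green, act_green, &err) ||
	    !rte_flow_create(port_id, &attr, pat_red, act_red, &err))
		return -1;
	/* Group 0 then gets the 5-tuple rule with meter + jump-to-group-1. */
	return 0;
}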
>
> Signed-off-by: Alexander Kozyrev 
> ---
>  lib/ethdev/rte_flow.h | 71 ++-
>  1 file changed, 70 insertions(+), 1 deletion(-)
>
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index d8827dd184..aec36a9f0a 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -671,6 +672,13 @@ enum rte_flow_item_type {
>  * See struct rte_flow_item_gre_opt.
>  */
> RTE_FLOW_ITEM_TYPE_GRE_OPTION,
> +
> +   /**
> +* Matches Meter Color.
> +*
> +* See struct rte_flow_item_meter_color.
> +*/
> +   RTE_FLOW_ITEM_TYPE_METER_COLOR,
>  };
>
>  /**
> @@ -1990,6 +1998,26 @@ static const struct rte_flow_item_ppp rte_flow_item_ppp_mask = {
>  };
>  #endif
>
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ITEM_TYPE_METER_COLOR
> + *
> + * Matches a meter color set in the packet meta-data
> + * (i.e. struct rte_mbuf::sched::color).
> + */
> +struct rte_flow_item_meter_color {
> +   enum rte_color color; /**< Packet color. */
> +};
> +
> +/** Default mask for RTE_FLOW_ITEM_TYPE_METER_COLOR. */
> +#ifndef __cplusplus
> +static const struct rte_flow_item_meter_color rte_flow_item_meter_color_mask = {
> +   .color = 0x3,
> +};
> +#endif
> +
>  /**
>   * Matching pattern item definition.
>   *
> @@ -2376,6 +2404,14 @@ enum rte_flow_action_type {
>  */
> RTE_FLOW_ACTION_TYPE_METER,
>
> +  

Re: [PATCH 0/3] add eal functions for thread affinity

2022-04-08 Thread David Marchand
Hello Tyler,

On Fri, Apr 1, 2022 at 3:30 PM Tyler Retzlaff
 wrote:
>
> this series provides basic dependencies for additional eal thread api
> additions. series includes basic error handling, initial get/set thread
> affinity functions and minimal unit test.
>
> Tyler Retzlaff (3):
>   eal/windows: translate Windows errors to errno-style errors
>   eal: implement functions for get/set thread affinity
>   test/threads: add unit test for thread API
>
>  app/test/meson.build |   2 +
>  app/test/test_threads.c  |  86 +++
>  lib/eal/include/rte_thread.h |  45 ++
>  lib/eal/unix/rte_thread.c|  16 
>  lib/eal/version.map  |   4 +
>  lib/eal/windows/eal_lcore.c  | 173 +++--
>  lib/eal/windows/eal_windows.h|  10 +++
>  lib/eal/windows/include/rte_os.h |   2 +
>  lib/eal/windows/rte_thread.c | 179 
> ++-
>  9 files changed, 472 insertions(+), 45 deletions(-)
>  create mode 100644 app/test/test_threads.c

We have two concurrent series, can you clarify what are the intentions
on this work?
Is this series superseding Narcisa series?

Thanks!

-- 
David Marchand



[PATCH v4 1/4] common/iavf: support queue rate limit and quanta size configuration

2022-04-08 Thread Wenjun Wu
This patch adds new virtchnl opcodes and structures for rate limit
and quanta size configuration, which include:
1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each
VF per queue.
2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue.
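
For illustration (hedged: vsi_id and the rate value are placeholders, and the
virtchnl_shaper_bw field names are assumed), a VF could cap queue 0 like this
before sending the message with VIRTCHNL_OP_CONFIG_QUEUE_BW:

	struct virtchnl_queues_bw_cfg q_bw = {0};

	q_bw.vsi_id = vsi_id;            /* this VF's VSI id */
	q_bw.num_queues = 1;             /* cfg[1] already holds one entry */
	q_bw.cfg[0].queue_id = 0;
	q_bw.cfg[0].tc = 0;
	q_bw.cfg[0].shaper.peak = peak_rate;   /* max bandwidth for queue 0 */
	/* send &q_bw over the adminq with opcode VIRTCHNL_OP_CONFIG_QUEUE_BW */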

Signed-off-by: Ting Xu 
Signed-off-by: Wenjun Wu 
---
 drivers/common/iavf/virtchnl.h | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/drivers/common/iavf/virtchnl.h b/drivers/common/iavf/virtchnl.h
index 3e44eca7d8..249ae6ed23 100644
--- a/drivers/common/iavf/virtchnl.h
+++ b/drivers/common/iavf/virtchnl.h
@@ -164,6 +164,8 @@ enum virtchnl_ops {
VIRTCHNL_OP_ENABLE_QUEUES_V2 = 107,
VIRTCHNL_OP_DISABLE_QUEUES_V2 = 108,
VIRTCHNL_OP_MAP_QUEUE_VECTOR = 111,
+   VIRTCHNL_OP_CONFIG_QUEUE_BW = 112,
+   VIRTCHNL_OP_CONFIG_QUANTA = 113,
VIRTCHNL_OP_MAX,
 };
 
@@ -1872,6 +1874,23 @@ struct virtchnl_queue_tc_mapping {
 
 VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_queue_tc_mapping);
 
+/* VIRTCHNL_OP_CONFIG_QUEUE_BW */
+struct virtchnl_queue_bw {
+   u16 queue_id;
+   u8 tc;
+   u8 pad;
+   struct virtchnl_shaper_bw shaper;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_queue_bw);
+
+struct virtchnl_queues_bw_cfg {
+   u16 vsi_id;
+   u16 num_queues;
+   struct virtchnl_queue_bw cfg[1];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_queues_bw_cfg);
 
 /* TX and RX queue types are valid in legacy as well as split queue models.
  * With Split Queue model, 2 additional types are introduced - TX_COMPLETION
@@ -1978,6 +1997,12 @@ struct virtchnl_queue_vector_maps {
 
 VIRTCHNL_CHECK_STRUCT_LEN(24, virtchnl_queue_vector_maps);
 
+struct virtchnl_quanta_cfg {
+   u16 quanta_size;
+   struct virtchnl_queue_chunk queue_select;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_quanta_cfg);
 
 /* Since VF messages are limited by u16 size, precalculate the maximum possible
  * values of nested elements in virtchnl structures that virtual channel can
@@ -2244,6 +2269,31 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 sizeof(q_tc->tc[0]);
}
break;
+   case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+   valid_len = sizeof(struct virtchnl_queues_bw_cfg);
+   if (msglen >= valid_len) {
+   struct virtchnl_queues_bw_cfg *q_bw =
+   (struct virtchnl_queues_bw_cfg *)msg;
+   if (q_bw->num_queues == 0) {
+   err_msg_format = true;
+   break;
+   }
+   valid_len += (q_bw->num_queues - 1) *
+sizeof(q_bw->cfg[0]);
+   }
+   break;
+   case VIRTCHNL_OP_CONFIG_QUANTA:
+   valid_len = sizeof(struct virtchnl_quanta_cfg);
+   if (msglen >= valid_len) {
+   struct virtchnl_quanta_cfg *q_quanta =
+   (struct virtchnl_quanta_cfg *)msg;
+   if (q_quanta->quanta_size == 0 ||
+   q_quanta->queue_select.num_queues == 0) {
+   err_msg_format = true;
+   break;
+   }
+   }
+   break;
case VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS:
break;
case VIRTCHNL_OP_ADD_VLAN_V2:
-- 
2.25.1



[PATCH v4 0/4] Enable queue rate limit and quanta size configuration

2022-04-08 Thread Wenjun Wu
This patch set adds queue rate limit and quanta size configuration.
Quanta size can be changed by the driver devarg quanta_size=xxx. Quanta
size should be set to a value between 256 and 4096 and be a multiple
of 64.
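
For example (the PCI address is illustrative), the quanta size can be set at
probe time through the devargs string:

    dpdk-testpmd -a 0000:18:01.0,quanta_size=1024 -- -i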

v2: Rework virtchnl.
v3: Add release note.
v4: Quanta size configuration would block device init
if the PF does not support it. Fix this issue.

Wenjun Wu (4):
  common/iavf: support queue rate limit and quanta size configuration
  net/iavf: support queue rate limit configuration
  net/iavf: support quanta size configuration
  doc: add release notes for 22.07

 doc/guides/rel_notes/release_22_07.rst |   4 +
 drivers/common/iavf/virtchnl.h |  50 +++
 drivers/net/iavf/iavf.h|  16 +++
 drivers/net/iavf/iavf_ethdev.c |  38 +
 drivers/net/iavf/iavf_tm.c | 190 +++--
 drivers/net/iavf/iavf_vchnl.c  |  54 +++
 6 files changed, 344 insertions(+), 8 deletions(-)

-- 
2.25.1



[PATCH v4 2/4] net/iavf: support queue rate limit configuration

2022-04-08 Thread Wenjun Wu
This patch adds queue rate limit configuration support.
Only max bandwidth is supported.
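
A hedged sketch of how an application would use this through the generic
rte_tm API that the patch wires up (ids and numbers are arbitrary; only the
peak rate is honored here, per the commit message):

#include <rte_tm.h>

static int
limit_queue_rate(uint16_t port_id)
{
	struct rte_tm_error err;
	struct rte_tm_shaper_params sp = {
		/* ~1 Gbit/s expressed in bytes per second, small burst size. */
		.peak = { .rate = 125000000, .size = 4096 },
	};

	/* Register shaper profile 1 with the port... */
	if (rte_tm_shaper_profile_add(port_id, 1, &sp, &err))
		return -1;
	/*
	 * ...then reference it from the queue node's
	 * rte_tm_node_params::shaper_profile_id in rte_tm_node_add() and
	 * apply the hierarchy with rte_tm_hierarchy_commit().
	 */
	return 0;
}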

Signed-off-by: Ting Xu 
Signed-off-by: Wenjun Wu 
---
 drivers/net/iavf/iavf.h   |  13 +++
 drivers/net/iavf/iavf_tm.c| 190 --
 drivers/net/iavf/iavf_vchnl.c |  23 
 3 files changed, 218 insertions(+), 8 deletions(-)

diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index a01d18e61b..96515a3ee9 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -170,11 +170,21 @@ struct iavf_tm_node {
uint32_t weight;
uint32_t reference_count;
struct iavf_tm_node *parent;
+   struct iavf_tm_shaper_profile *shaper_profile;
struct rte_tm_node_params params;
 };
 
 TAILQ_HEAD(iavf_tm_node_list, iavf_tm_node);
 
+struct iavf_tm_shaper_profile {
+   TAILQ_ENTRY(iavf_tm_shaper_profile) node;
+   uint32_t shaper_profile_id;
+   uint32_t reference_count;
+   struct rte_tm_shaper_params profile;
+};
+
+TAILQ_HEAD(iavf_shaper_profile_list, iavf_tm_shaper_profile);
+
 /* node type of Traffic Manager */
 enum iavf_tm_node_type {
IAVF_TM_NODE_TYPE_PORT,
@@ -188,6 +198,7 @@ struct iavf_tm_conf {
struct iavf_tm_node *root; /* root node - vf vsi */
struct iavf_tm_node_list tc_list; /* node list for all the TCs */
struct iavf_tm_node_list queue_list; /* node list for all the queues */
+   struct iavf_shaper_profile_list shaper_profile_list;
uint32_t nb_tc_node;
uint32_t nb_queue_node;
bool committed;
@@ -451,6 +462,8 @@ int iavf_add_del_mc_addr_list(struct iavf_adapter *adapter,
 int iavf_request_queues(struct rte_eth_dev *dev, uint16_t num);
 int iavf_get_max_rss_queue_region(struct iavf_adapter *adapter);
 int iavf_get_qos_cap(struct iavf_adapter *adapter);
+int iavf_set_q_bw(struct rte_eth_dev *dev,
+ struct virtchnl_queues_bw_cfg *q_bw, uint16_t size);
 int iavf_set_q_tc_map(struct rte_eth_dev *dev,
struct virtchnl_queue_tc_mapping *q_tc_mapping,
uint16_t size);
diff --git a/drivers/net/iavf/iavf_tm.c b/drivers/net/iavf/iavf_tm.c
index 8d92062c7f..32bb3be45e 100644
--- a/drivers/net/iavf/iavf_tm.c
+++ b/drivers/net/iavf/iavf_tm.c
@@ -8,6 +8,13 @@
 static int iavf_hierarchy_commit(struct rte_eth_dev *dev,
 __rte_unused int clear_on_fail,
 __rte_unused struct rte_tm_error *error);
+static int iavf_shaper_profile_add(struct rte_eth_dev *dev,
+  uint32_t shaper_profile_id,
+  struct rte_tm_shaper_params *profile,
+  struct rte_tm_error *error);
+static int iavf_shaper_profile_del(struct rte_eth_dev *dev,
+  uint32_t shaper_profile_id,
+  struct rte_tm_error *error);
 static int iavf_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
  uint32_t parent_node_id, uint32_t priority,
  uint32_t weight, uint32_t level_id,
@@ -30,6 +37,8 @@ static int iavf_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
   int *is_leaf, struct rte_tm_error *error);
 
 const struct rte_tm_ops iavf_tm_ops = {
+   .shaper_profile_add = iavf_shaper_profile_add,
+   .shaper_profile_delete = iavf_shaper_profile_del,
.node_add = iavf_tm_node_add,
.node_delete = iavf_tm_node_delete,
.capabilities_get = iavf_tm_capabilities_get,
@@ -44,6 +53,9 @@ iavf_tm_conf_init(struct rte_eth_dev *dev)
 {
struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
 
+   /* initialize shaper profile list */
+   TAILQ_INIT(&vf->tm_conf.shaper_profile_list);
+
/* initialize node configuration */
vf->tm_conf.root = NULL;
TAILQ_INIT(&vf->tm_conf.tc_list);
@@ -57,6 +69,7 @@ void
 iavf_tm_conf_uninit(struct rte_eth_dev *dev)
 {
struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
+   struct iavf_tm_shaper_profile *shaper_profile;
struct iavf_tm_node *tm_node;
 
/* clear node configuration */
@@ -74,6 +87,14 @@ iavf_tm_conf_uninit(struct rte_eth_dev *dev)
rte_free(vf->tm_conf.root);
vf->tm_conf.root = NULL;
}
+
+   /* Remove all shaper profiles */
+   while ((shaper_profile =
+  TAILQ_FIRST(&vf->tm_conf.shaper_profile_list))) {
+   TAILQ_REMOVE(&vf->tm_conf.shaper_profile_list,
+shaper_profile, node);
+   rte_free(shaper_profile);
+   }
 }
 
 static inline struct iavf_tm_node *
@@ -132,13 +153,6 @@ iavf_node_param_check(struct iavf_info *vf, uint32_t node_id,
return -EINVAL;
}
 
-   /* not support shaper profile */
-   if (params->shaper_profile_id) {
-   error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS_SH

[PATCH v4 3/4] net/iavf: support quanta size configuration

2022-04-08 Thread Wenjun Wu
This patch adds quanta size configuration support.
Quanta size should be between 256 and 4096, and be a multiple of 64.

Signed-off-by: Wenjun Wu 
---
 drivers/net/iavf/iavf.h|  3 +++
 drivers/net/iavf/iavf_ethdev.c | 38 ++
 drivers/net/iavf/iavf_vchnl.c  | 31 +++
 3 files changed, 72 insertions(+)

diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index 96515a3ee9..c0a4a47b04 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -292,6 +292,7 @@ enum iavf_proto_xtr_type {
 struct iavf_devargs {
uint8_t proto_xtr_dflt;
uint8_t proto_xtr[IAVF_MAX_QUEUE_NUM];
+   uint16_t quanta_size;
 };
 
 struct iavf_security_ctx;
@@ -467,6 +468,8 @@ int iavf_set_q_bw(struct rte_eth_dev *dev,
 int iavf_set_q_tc_map(struct rte_eth_dev *dev,
struct virtchnl_queue_tc_mapping *q_tc_mapping,
uint16_t size);
+int iavf_set_vf_quanta_size(struct iavf_adapter *adapter, u16 start_queue_id,
+   u16 num_queues);
 void iavf_tm_conf_init(struct rte_eth_dev *dev);
 void iavf_tm_conf_uninit(struct rte_eth_dev *dev);
 int iavf_ipsec_crypto_request(struct iavf_adapter *adapter,
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index d6190ac24a..7d093bdc24 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -34,9 +34,11 @@
 
 /* devargs */
 #define IAVF_PROTO_XTR_ARG "proto_xtr"
+#define IAVF_QUANTA_SIZE_ARG   "quanta_size"
 
 static const char * const iavf_valid_args[] = {
IAVF_PROTO_XTR_ARG,
+   IAVF_QUANTA_SIZE_ARG,
NULL
 };
 
@@ -950,6 +952,9 @@ iavf_dev_start(struct rte_eth_dev *dev)
return -1;
}
 
+   if (iavf_set_vf_quanta_size(adapter, index, num_queue_pairs) != 0)
+   PMD_DRV_LOG(WARNING, "configure quanta size failed");
+
/* If needed, send configure queues msg multiple times to make the
 * adminq buffer length smaller than the 4K limitation.
 */
@@ -2092,6 +2097,25 @@ iavf_handle_proto_xtr_arg(__rte_unused const char *key, const char *value,
return 0;
 }
 
+static int
+parse_u16(__rte_unused const char *key, const char *value, void *args)
+{
+   u16 *num = (u16 *)args;
+   u16 tmp;
+
+   errno = 0;
+   tmp = strtoull(value, NULL, 10);
+   if (errno || !tmp) {
+   PMD_DRV_LOG(WARNING, "%s: \"%s\" is not a valid u16",
+   key, value);
+   return -1;
+   }
+
+   *num = tmp;
+
+   return 0;
+}
+
 static int iavf_parse_devargs(struct rte_eth_dev *dev)
 {
struct iavf_adapter *ad =
@@ -2118,6 +2142,20 @@ static int iavf_parse_devargs(struct rte_eth_dev *dev)
if (ret)
goto bail;
 
+   ret = rte_kvargs_process(kvlist, IAVF_QUANTA_SIZE_ARG,
+&parse_u16, &ad->devargs.quanta_size);
+   if (ret)
+   goto bail;
+
+   if (ad->devargs.quanta_size == 0)
+   ad->devargs.quanta_size = 1024;
+
+   if (ad->devargs.quanta_size < 256 || ad->devargs.quanta_size > 4096 ||
+	    ad->devargs.quanta_size % 64) {
+   PMD_INIT_LOG(ERR, "invalid quanta size\n");
+   return -EINVAL;
+   }
+
 bail:
rte_kvargs_free(kvlist);
return ret;
diff --git a/drivers/net/iavf/iavf_vchnl.c b/drivers/net/iavf/iavf_vchnl.c
index 537369f736..f9452d14ae 100644
--- a/drivers/net/iavf/iavf_vchnl.c
+++ b/drivers/net/iavf/iavf_vchnl.c
@@ -1828,3 +1828,34 @@ iavf_ipsec_crypto_request(struct iavf_adapter *adapter,
 
return 0;
 }
+
+int
+iavf_set_vf_quanta_size(struct iavf_adapter *adapter, u16 start_queue_id, u16 num_queues)
+{
+   struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(adapter);
+   struct iavf_cmd_info args;
+   struct virtchnl_quanta_cfg q_quanta;
+   int err;
+
+   if (adapter->devargs.quanta_size == 0)
+   return 0;
+
+   q_quanta.quanta_size = adapter->devargs.quanta_size;
+   q_quanta.queue_select.type = VIRTCHNL_QUEUE_TYPE_TX;
+   q_quanta.queue_select.start_queue_id = start_queue_id;
+   q_quanta.queue_select.num_queues = num_queues;
+
+   args.ops = VIRTCHNL_OP_CONFIG_QUANTA;
+   args.in_args = (uint8_t *)&q_quanta;
+   args.in_args_size = sizeof(q_quanta);
+   args.out_buffer = vf->aq_resp;
+   args.out_size = IAVF_AQ_BUF_SZ;
+
+   err = iavf_execute_vf_cmd(adapter, &args, 0);
+   if (err) {
+		PMD_DRV_LOG(ERR, "Failed to execute command VIRTCHNL_OP_CONFIG_QUANTA");
+   return err;
+   }
+
+   return 0;
+}
-- 
2.25.1



[PATCH v4 4/4] doc: add release notes for 22.07

2022-04-08 Thread Wenjun Wu
Add support for queue rate limit and quanta size configuration

Signed-off-by: Wenjun Wu 
---
 doc/guides/rel_notes/release_22_07.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 42a5f2d990..f1b4057d70 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -55,6 +55,10 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Updated Intel iavf driver.**
+
+  * Added Tx QoS queue rate limitation support.
+  * Added quanta size configuration support.
 
 Removed Items
 -
-- 
2.25.1



[PATCH v3 0/3] Enable Protocol Agnostic Flow Offloading FDIR in AVF

2022-04-08 Thread Junfeng Guo
This patch set enables Protocol Agnostic Flow (raw flow) Offloading
for FDIR in AVF.

[PATCH v3 1/3] common/iavf: support raw packet in protocol header
[PATCH v3 2/3] net/iavf: align with proto hdr struct change
[PATCH v3 3/3] net/iavf: enable Protocol Agnostic Flow Offloading FDIR

v3:
fix CI build issue.

v2:
add release notes and document update.

Junfeng Guo (3):
  common/iavf: support raw packet in protocol header
  net/iavf: align with proto hdr struct change
  net/iavf: enable Protocol Agnostic Flow Offloading FDIR

 doc/guides/rel_notes/release_22_07.rst |   4 +
 drivers/common/iavf/virtchnl.h |  20 ++-
 drivers/net/iavf/iavf_fdir.c   |  67 +
 drivers/net/iavf/iavf_generic_flow.c   |   6 +
 drivers/net/iavf/iavf_generic_flow.h   |   3 +
 drivers/net/iavf/iavf_hash.c   | 180 +
 6 files changed, 188 insertions(+), 92 deletions(-)

-- 
2.25.1



[PATCH v3 1/3] common/iavf: support raw packet in protocol header

2022-04-08 Thread Junfeng Guo
The patch extends the existing virtchnl_proto_hdrs structure to allow a VF
to pass a pair of buffers as packet data and mask that describe
a match pattern of a filter rule. The kernel PF driver is then requested
to parse the pair of buffers and figure out the low-level hardware metadata
(ptype, profile, field vector, etc.) to program the expected FDIR or RSS
rules.

INTERNAL ONLY:

This is a requirement from DPDK to support Protocol Agnostic Flow
Offloading (*1). Previously we integrated the Parser Library (*2)
into DPDK and enabled raw packet based FDIR and RSS support in the DPDK
PF driver (*3, *4). To enable the same feature for the AVF driver, we need
the Virtual Channel to support passing raw packet filter rules.

[1] https://wiki.ith.intel.com/display/NPGCVL/Protocol+Agnostic+Flow+Offloading
[2] http://patchwork.dpdk.org/project/dpdk/list/?series=19057&archive=both&state=*
[3] http://patchwork.dpdk.org/project/dpdk/list/?series=20254&state=%2A&archive=both
[4] http://patchwork.dpdk.org/project/dpdk/list/?series=20291&state=%2A&archive=both

Signed-off-by: Qi Zhang 
Signed-off-by: Junfeng Guo 
---
 drivers/common/iavf/virtchnl.h | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/common/iavf/virtchnl.h b/drivers/common/iavf/virtchnl.h
index 3e44eca7d8..3975229545 100644
--- a/drivers/common/iavf/virtchnl.h
+++ b/drivers/common/iavf/virtchnl.h
@@ -1482,6 +1482,7 @@ enum virtchnl_vfr_states {
 };
 
 #define VIRTCHNL_MAX_NUM_PROTO_HDRS32
+#define VIRTCHNL_MAX_SIZE_RAW_PACKET   1024
 #define PROTO_HDR_SHIFT5
 #define PROTO_HDR_FIELD_START(proto_hdr_type) \
(proto_hdr_type << PROTO_HDR_SHIFT)
@@ -1676,14 +1677,25 @@ VIRTCHNL_CHECK_STRUCT_LEN(72, virtchnl_proto_hdr);
 struct virtchnl_proto_hdrs {
u8 tunnel_level;
/**
-* specify where protocol header start from.
+	 * specify where protocol header start from. must be 0 when sending a raw packet request.
 * 0 - from the outer layer
 * 1 - from the first inner layer
 * 2 - from the second inner layer
 * 
-**/
-   int count; /* the proto layers must < VIRTCHNL_MAX_NUM_PROTO_HDRS */
-   struct virtchnl_proto_hdr proto_hdr[VIRTCHNL_MAX_NUM_PROTO_HDRS];
+*/
+   int count;
+   /**
+* number of proto layers, must < VIRTCHNL_MAX_NUM_PROTO_HDRS
+* must be 0 for a raw packet request.
+*/
+   union {
+		struct virtchnl_proto_hdr proto_hdr[VIRTCHNL_MAX_NUM_PROTO_HDRS];
+   struct {
+   u16 pkt_len;
+   u8 spec[VIRTCHNL_MAX_SIZE_RAW_PACKET];
+   u8 mask[VIRTCHNL_MAX_SIZE_RAW_PACKET];
+   } raw;
+   };
 };
 
 VIRTCHNL_CHECK_STRUCT_LEN(2312, virtchnl_proto_hdrs);
-- 
2.25.1



[PATCH v3 2/3] net/iavf: align with proto hdr struct change

2022-04-08 Thread Junfeng Guo
The virtchnl_proto_hdrs structure is extended with a union holding the
proto_hdr table and a raw struct. Update the proto_hdrs template
initializers to align with the virtchnl changes.

Signed-off-by: Junfeng Guo 
---
 drivers/net/iavf/iavf_hash.c | 180 ++-
 1 file changed, 92 insertions(+), 88 deletions(-)

diff --git a/drivers/net/iavf/iavf_hash.c b/drivers/net/iavf/iavf_hash.c
index f35a07653b..278e75117d 100644
--- a/drivers/net/iavf/iavf_hash.c
+++ b/drivers/net/iavf/iavf_hash.c
@@ -181,252 +181,256 @@ iavf_hash_parse_pattern_action(struct iavf_adapter *ad,
 /* proto_hdrs template */
 struct virtchnl_proto_hdrs outer_ipv4_tmplt = {
TUNNEL_LEVEL_OUTER, 4,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv4}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv4}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv4_udp_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
-proto_hdr_ipv4_with_prot,
-proto_hdr_udp}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
+ proto_hdr_ipv4_with_prot,
+ proto_hdr_udp}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv4_tcp_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
-proto_hdr_ipv4_with_prot,
-proto_hdr_tcp}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
+ proto_hdr_ipv4_with_prot,
+ proto_hdr_tcp}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv4_sctp_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv4,
-proto_hdr_sctp}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv4,
+ proto_hdr_sctp}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv6_tmplt = {
TUNNEL_LEVEL_OUTER, 4,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv6}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv6}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv6_frag_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
-proto_hdr_ipv6, proto_hdr_ipv6_frag}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
+ proto_hdr_ipv6, proto_hdr_ipv6_frag}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv6_udp_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
-proto_hdr_ipv6_with_prot,
-proto_hdr_udp}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
+ proto_hdr_ipv6_with_prot,
+ proto_hdr_udp}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv6_tcp_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
-proto_hdr_ipv6_with_prot,
-proto_hdr_tcp}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan,
+ proto_hdr_ipv6_with_prot,
+ proto_hdr_tcp}}
 };
 
 struct virtchnl_proto_hdrs outer_ipv6_sctp_tmplt = {
TUNNEL_LEVEL_OUTER, 5,
-   {proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv6,
-proto_hdr_sctp}
+   {{proto_hdr_eth, proto_hdr_svlan, proto_hdr_cvlan, proto_hdr_ipv6,
+ proto_hdr_sctp}}
 };
 
 struct virtchnl_proto_hdrs inner_ipv4_tmplt = {
-   TUNNEL_LEVEL_INNER, 1, {proto_hdr_ipv4}
+   TUNNEL_LEVEL_INNER, 1, {{proto_hdr_ipv4}}
 };
 
 struct virtchnl_proto_hdrs inner_ipv4_udp_tmplt = {
-   TUNNEL_LEVEL_INNER, 2, {proto_hdr_ipv4_with_prot, proto_hdr_udp}
+   TUNNEL_LEVEL_INNER, 2, {{proto_hdr_ipv4_with_prot, proto_hdr_udp}}
 };
 
 struct virtchnl_proto_hdrs inner_ipv4_tcp_tmplt = {
-   TUNNEL_LEVEL_INNER, 2, {proto_hdr_ipv4_with_prot, proto_hdr_tcp}
+   TUNNEL_LEVEL_INNER, 2, {{proto_hdr_ipv4_with_prot, proto_hdr_tcp}}
 };
 
 struct virtchnl_proto_hdrs second_inner_ipv4_tmplt = {
-   2, 1, {proto_hdr_ipv4}
+   2, 1, {{proto_hdr_ipv4}}
 };
 
 struct virtchnl_proto_hdrs second_inner_ipv4_udp_tmplt = {
-   2, 2, {proto_hdr_ipv4_with_prot, proto_hdr_udp}
+   2, 2, {{proto_hdr_ipv4_with_prot, proto_hdr_udp}}
 };
 
 struct virtchnl_proto_hdrs second_inner_ipv4_tcp_tmplt = {
-   2, 2, {proto_hdr_ipv4_with_prot, proto_hdr_tcp}
+   2, 2, {{proto_hdr_ipv4_with_prot, proto_hdr_tcp}}
 };
 
 struct virtchnl_proto_hdrs second_inner_ipv6_tmplt = {
-   2, 1, {proto_hdr_ipv6}
+   2, 1, {{proto_hdr_ipv6}}
 };
 
 struct virtchnl_proto_hdrs second_inner_ipv6_udp_tmplt = {
-   2, 2, {proto_hdr_ipv6_with_prot, proto_hdr_udp}
+   2, 2, {{proto_hdr_ipv6_with_prot, proto_hdr_udp}}
 };
 
 struct virtchnl_proto_hdrs second_inner_ipv6_tcp_tmplt = {
-   2, 2, {proto_hdr_ipv6_with_prot, proto_hdr_tcp}
+   2, 2, {{proto_hdr_ipv6_with_prot, proto_hdr_tcp}}
 };
 
 struct virtchnl_proto_hdrs inner_ipv4_sctp_tmplt = {
-   TUNNEL_LEVEL_INNER, 2, {proto_hdr_ipv4, proto_hdr_sctp}
+   TUNNEL_LEVEL_INNER, 2, {{proto_hdr_ipv4, proto_hd

[PATCH v3 3/3] net/iavf: enable Protocol Agnostic Flow Offloading FDIR

2022-04-08 Thread Junfeng Guo
This patch enables Protocol Agnostic Flow Offloading for FDIR in AVF.
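
For context, a raw FDIR rule reaches this code through the generic rte_flow
API. The sketch below is not part of the patch; it only illustrates the
shape of such a request, with placeholder hex strings (two characters per
packet byte, matching the parsing loop added to iavf_fdir.c below) and a
hypothetical helper name. Whether a given pattern is accepted still depends
on the underlying parser profile.

#include <stdint.h>
#include <string.h>
#include <rte_flow.h>

/* Illustrative only: request a raw-pattern FDIR rule steering matches to
 * queue 0.  The hex strings are placeholders, not a meaningful packet. */
static struct rte_flow *
create_raw_fdir_rule(uint16_t port_id, struct rte_flow_error *error)
{
        static const char spec_hex[] = "001122334455";  /* packet bytes, hex encoded */
        static const char mask_hex[] = "ffffffffffff";  /* same length as spec */

        struct rte_flow_item_raw raw_spec = {
                .length = (uint16_t)strlen(spec_hex),
                .pattern = (const uint8_t *)spec_hex,
        };
        struct rte_flow_item_raw raw_mask = {
                .length = (uint16_t)strlen(mask_hex),
                .pattern = (const uint8_t *)mask_hex,
        };
        struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_RAW, .spec = &raw_spec, .mask = &raw_mask },
                { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_queue queue = { .index = 0 };
        struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
                { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow_attr attr = { .ingress = 1 };

        return rte_flow_create(port_id, &attr, pattern, actions, error);
}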

Signed-off-by: Junfeng Guo 
---
 doc/guides/rel_notes/release_22_07.rst |  4 ++
 drivers/net/iavf/iavf_fdir.c   | 67 ++
 drivers/net/iavf/iavf_generic_flow.c   |  6 +++
 drivers/net/iavf/iavf_generic_flow.h   |  3 ++
 4 files changed, 80 insertions(+)

diff --git a/doc/guides/rel_notes/release_22_07.rst 
b/doc/guides/rel_notes/release_22_07.rst
index 42a5f2d990..43eab0b6d5 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -55,6 +55,10 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Updated Intel iavf driver.**
+
+  * Added Protocol Agnostic Flow Offloading support in AVF Flow Director.
+
 
 Removed Items
 -
diff --git a/drivers/net/iavf/iavf_fdir.c b/drivers/net/iavf/iavf_fdir.c
index e9a3566c0d..f236260502 100644
--- a/drivers/net/iavf/iavf_fdir.c
+++ b/drivers/net/iavf/iavf_fdir.c
@@ -194,6 +194,7 @@
IAVF_INSET_TUN_TCP_DST_PORT)
 
 static struct iavf_pattern_match_item iavf_fdir_pattern[] = {
+   {iavf_pattern_raw,   IAVF_INSET_NONE,   
IAVF_INSET_NONE},
{iavf_pattern_ethertype, IAVF_FDIR_INSET_ETH,   
IAVF_INSET_NONE},
{iavf_pattern_eth_ipv4,  IAVF_FDIR_INSET_ETH_IPV4,  
IAVF_INSET_NONE},
{iavf_pattern_eth_ipv4_udp,  IAVF_FDIR_INSET_ETH_IPV4_UDP,  
IAVF_INSET_NONE},
@@ -720,6 +721,7 @@ iavf_fdir_parse_pattern(__rte_unused struct iavf_adapter 
*ad,
struct virtchnl_proto_hdrs *hdrs =
&filter->add_fltr.rule_cfg.proto_hdrs;
enum rte_flow_item_type l3 = RTE_FLOW_ITEM_TYPE_END;
+   const struct rte_flow_item_raw *raw_spec, *raw_mask;
const struct rte_flow_item_eth *eth_spec, *eth_mask;
const struct rte_flow_item_ipv4 *ipv4_spec, *ipv4_last, *ipv4_mask;
const struct rte_flow_item_ipv6 *ipv6_spec, *ipv6_mask;
@@ -746,6 +748,7 @@ iavf_fdir_parse_pattern(__rte_unused struct iavf_adapter 
*ad,
enum rte_flow_item_type next_type;
uint8_t tun_inner = 0;
uint16_t ether_type, flags_version;
+   uint8_t item_num = 0;
int layer = 0;
 
uint8_t  ipv6_addr_mask[16] = {
@@ -763,8 +766,72 @@ iavf_fdir_parse_pattern(__rte_unused struct iavf_adapter 
*ad,
   RTE_FLOW_ERROR_TYPE_ITEM, item,
   "Not support range");
}
+   item_num++;
 
switch (item_type) {
+   case RTE_FLOW_ITEM_TYPE_RAW: {
+   raw_spec = item->spec;
+   raw_mask = item->mask;
+
+   if (item_num != 1)
+   return -rte_errno;
+
+   if (raw_spec->length != raw_mask->length)
+   return -rte_errno;
+
+   uint16_t pkt_len = 0;
+   uint16_t tmp_val = 0;
+   uint8_t tmp = 0;
+   int i, j;
+
+   pkt_len = raw_spec->length;
+
+   for (i = 0, j = 0; i < pkt_len; i += 2, j++) {
+   tmp = raw_spec->pattern[i];
+   if (tmp >= 'a' && tmp <= 'f')
+   tmp_val = tmp - 'a' + 10;
+   if (tmp >= 'A' && tmp <= 'F')
+   tmp_val = tmp - 'A' + 10;
+   if (tmp >= '0' && tmp <= '9')
+   tmp_val = tmp - '0';
+
+   tmp_val *= 16;
+   tmp = raw_spec->pattern[i + 1];
+   if (tmp >= 'a' && tmp <= 'f')
+   tmp_val += (tmp - 'a' + 10);
+   if (tmp >= 'A' && tmp <= 'F')
+   tmp_val += (tmp - 'A' + 10);
+   if (tmp >= '0' && tmp <= '9')
+   tmp_val += (tmp - '0');
+
+   hdrs->raw.spec[j] = tmp_val;
+
+   tmp = raw_mask->pattern[i];
+   if (tmp >= 'a' && tmp <= 'f')
+   tmp_val = tmp - 'a' + 10;
+   if (tmp >= 'A' && tmp <= 'F')
+   tmp_val = tmp - 'A' + 10;
+   if (tmp >= '0' && tmp <= '9')
+   tmp_val = tmp - '0';
+
+   tmp_val *= 16;
+   tmp = raw_mask->pattern[i + 1];
+   if (tmp >= 'a' && tmp <= 'f')
+  

Re: [dpdk-dev] [PATCH v5 2/2] hash: unify crc32 selection for x86 and Arm

2022-04-08 Thread David Marchand
On Tue, Jan 4, 2022 at 10:12 AM Ruifeng Wang  wrote:
> > From: pbhagavat...@marvell.com 
[snip]
> > -/**
> > - * Use single crc32 instruction to perform a hash on a 2 bytes value.
> > - * Fall back to software crc32 implementation in case SSE4.2 is
> > - * not supported
> > - *
> > - * @param data
> > - *   Data to perform hash on.
> > - * @param init_val
> > - *   Value to initialise hash generator.
> > - * @return
> > - *   32bit calculated hash value.
> > - */
> > -static inline uint32_t
> > -rte_hash_crc_2byte(uint16_t data, uint32_t init_val) -{ -#if defined
> > RTE_ARCH_X86
> > - if (likely(crc32_alg & CRC32_SSE42))
> > - return crc32c_sse42_u16(data, init_val);
> > +#if defined RTE_ARCH_ARM64
> > + RTE_LOG(WARNING, HASH,
> > + "Incorrect CRC32 algorithm requested setting best
> > available algorithm on the architecture\n");
> > + rte_hash_crc_set_alg(CRC32_ARM64);
> > +#endif
> > + break;
> > + case CRC32_ARM64:
> > +#if defined RTE_ARCH_ARM64
> > + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
> > + crc32_alg = CRC32_ARM64;
> >  #endif
> > + #if defined RTE_ARCH_X86
> > + RTE_LOG(WARNING, HASH,
> > + "Incorrect CRC32 algorithm requested setting best
> > available algorithm on the architecture\n");
> > + rte_hash_crc_set_alg(CRC32_SSE42_x64);
> >  #endif
> > + break;
>
> I edited this part for readability.
> The 'break' needs to be inside the #if, so the algorithm can fall back to
> CRC32_SW when the CRC32 feature is not available in hardware.

I marked this series in patchwork as changes requested.

Thanks.


-- 
David Marchand



Re: [PATCH] eal/windows: add missing C++ include guards

2022-04-08 Thread David Marchand
On Thu, Apr 7, 2022 at 1:20 PM Tyler Retzlaff
 wrote:
> On Tue, Apr 05, 2022 at 03:48:58PM +0200, David Marchand wrote:
> > Add missing 'extern "C"' to file.
> >
> > Fixes: 1db72630da0c ("eal/windows: do not expose private facilities")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: David Marchand 
> Acked-by: Tyler Retzlaff 

Applied, thanks.


-- 
David Marchand



[PATCH] net/nfp: update how MAX MTU is read

2022-04-08 Thread Walter Heymans
The 'max_rx_pktlen' value was previously read from hardware, where it was
set by the running firmware. This caused confusion due to the different
meanings of 'MAX_MTU'. This patch updates 'max_rx_pktlen' to the maximum
value that the NFP NIC can support. The 'max_mtu' value read from hardware
is assigned to the 'dev_info->max_mtu' field.

If more layer 2 metadata must be used, the firmware can be updated to
report a smaller 'max_mtu' value.

The constant NFP_FRAME_SIZE_MAX is derived from the maximum supported
buffer size of 10240 bytes, minus 136 bytes reserved by the hardware and
another 56 bytes reserved for expansion in firmware. This results in a
usable maximum packet length of 10048 bytes.
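
A compact way to read that derivation (a hedged sketch: only
NFP_FRAME_SIZE_MAX itself is defined in the patch below, the other macro
names are hypothetical):

/* 10240 (max buffer) - 136 (hardware reserved) - 56 (firmware expansion) = 10048 */
#define NFP_HW_BUF_SIZE_MAX     10240
#define NFP_HW_RESERVED_BYTES     136
#define NFP_FW_RESERVED_BYTES      56
#define NFP_FRAME_SIZE_MAX \
        (NFP_HW_BUF_SIZE_MAX - NFP_HW_RESERVED_BYTES - NFP_FW_RESERVED_BYTES)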

Signed-off-by: Walter Heymans 
Signed-off-by: Niklas Söderlund 
Reviewed-by: Louis Peens 
Reviewed-by: Chaoyong He 
Reviewed-by: Richard Donkin 
---
 drivers/net/nfp/nfp_common.c | 11 ++-
 drivers/net/nfp/nfp_common.h |  3 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/nfp/nfp_common.c b/drivers/net/nfp/nfp_common.c
index b26770dbfb..52fbda1a79 100644
--- a/drivers/net/nfp/nfp_common.c
+++ b/drivers/net/nfp/nfp_common.c
@@ -692,7 +692,16 @@ nfp_net_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
dev_info->min_rx_bufsize = RTE_ETHER_MIN_MTU;
-   dev_info->max_rx_pktlen = hw->max_mtu;
+   /*
+* The maximum rx packet length (max_rx_pktlen) is set to the
+* maximum supported frame size that the NFP can handle. This
+* includes layer 2 headers, CRC and other metadata that can
+* optionally be used.
+* The maximum layer 3 MTU (max_mtu) is read from hardware,
+* which was set by the firmware loaded onto the card.
+*/
+   dev_info->max_rx_pktlen = NFP_FRAME_SIZE_MAX;
+   dev_info->max_mtu = hw->max_mtu;
/* Next should change when PF support is implemented */
dev_info->max_mac_addrs = 1;
 
diff --git a/drivers/net/nfp/nfp_common.h b/drivers/net/nfp/nfp_common.h
index 8b35fa119c..8db5ec23f8 100644
--- a/drivers/net/nfp/nfp_common.h
+++ b/drivers/net/nfp/nfp_common.h
@@ -98,6 +98,9 @@ struct nfp_net_adapter;
 /* Number of supported physical ports */
 #define NFP_MAX_PHYPORTS   12
 
+/* Maximum supported NFP frame size (MTU + layer 2 headers) */
+#define NFP_FRAME_SIZE_MAX 10048
+
 #include 
 #include 
 
-- 
2.25.1



[PATCH] net/nfp: remove unneeded header inclusion

2022-04-08 Thread David Marchand
Looking at this driver's history, there was never a need to include
execinfo.h.

Signed-off-by: David Marchand 
---
 drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c 
b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
index bad80a5a1c..08bc4e8ef2 100644
--- a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
+++ b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
@@ -16,9 +16,6 @@
 
 #include 
 #include 
-#if defined(RTE_BACKTRACE)
-#include 
-#endif
 #include 
 #include 
 #include 
-- 
2.23.0



Re: OVS DPDK DMA-Dev library/Design Discussion

2022-04-08 Thread Ilya Maximets
On 4/8/22 09:13, Hu, Jiayu wrote:
> 
> 
>> -Original Message-
>> From: Ilya Maximets 
>> Sent: Thursday, April 7, 2022 10:40 PM
>> To: Maxime Coquelin ; Van Haaren, Harry
>> ; Morten Brørup
>> ; Richardson, Bruce
>> 
>> Cc: i.maxim...@ovn.org; Pai G, Sunil ; Stokes, Ian
>> ; Hu, Jiayu ; Ferriter, Cian
>> ; ovs-...@openvswitch.org; dev@dpdk.org;
>> Mcnamara, John ; O'Driscoll, Tim
>> ; Finn, Emma 
>> Subject: Re: OVS DPDK DMA-Dev library/Design Discussion
>>
>> On 4/7/22 16:25, Maxime Coquelin wrote:
>>> Hi Harry,
>>>
>>> On 4/7/22 16:04, Van Haaren, Harry wrote:
 Hi OVS & DPDK, Maintainers & Community,

 Top posting overview of discussion as replies to thread become slower:
 perhaps it is a good time to review and plan for next steps?

  From my perspective, it those most vocal in the thread seem to be in
 favour of the clean rx/tx split ("defer work"), with the tradeoff
 that the application must be aware of handling the async DMA
 completions. If there are any concerns opposing upstreaming of this
>> method, please indicate this promptly, and we can continue technical
>> discussions here now.
>>>
>>> Wasn't there some discussions about handling the Virtio completions
>>> with the DMA engine? With that, we wouldn't need the deferral of work.
>>
>> +1
>>
>> With the virtio completions handled by DMA itself, the vhost port turns
>> almost into a real HW NIC.  With that we will not need any extra
>> manipulations from the OVS side, i.e. no need to defer any work while
>> maintaining clear split between rx and tx operations.
> 
> First, making DMA do 2B copy would sacrifice performance, and I think
> we all agree on that.

I do not agree with that.  Yes, a 2B copy by DMA will likely be slower
than one done by the CPU; however, the CPU goes away for dozens or even
hundreds of thousands of cycles to process a new packet batch or to
service other ports, hence DMA will likely complete the transmission
faster than waiting for the CPU thread to come back to that task.  In any
case, this has to be tested.

> Second, this method comes with an issue of ordering.
> For example, PMD thread0 enqueue 10 packets to vring0 first, then PMD thread1
> enqueue 20 packets to vring0. If PMD thread0 and threa1 have own dedicated
> DMA device dma0 and dma1, flag/index update for the first 10 packets is done 
> by
> dma0, and flag/index update for the left 20 packets is done by dma1. But there
> is no ordering guarantee among different DMA devices, so flag/index update may
> error. If PMD threads don't have dedicated DMA devices, which means DMA
> devices are shared among threads, we need lock and pay for lock contention in
> data-path. Or we can allocate DMA devices for vring dynamically to avoid DMA
> sharing among threads. But what's the overhead of allocation mechanism? Who
> does it? Any thoughts?

1. DMA completion was discussed in context of per-queue allocation, so there
   is no re-ordering in this case.

2. Overhead can be minimal if allocated device can stick to the queue for a
   reasonable amount of time without re-allocation on every send.  You may
   look at XPS implementation in lib/dpif-netdev.c in OVS for example of
   such mechanism.  For sure it can not be the same, but ideas can be re-used.

3. Locking doesn't mean contention if resources are allocated/distributed
   thoughtfully.

4. Allocation can be done by either OVS or the vhost library itself; I'd vote
   for doing that inside the vhost library, so any DPDK application and
   vhost ethdev can use it without re-inventing from scratch.  It also should
   be simpler from the API point of view if allocation and usage are in
   the same place.  But I don't have a strong opinion here as for now, since
   no real code examples exist, so it's hard to evaluate how they could look
   like.

But I feel like we're starting to run in circles here as I did already say
most of that before.

> 
> Thanks,
> Jiayu
> 
>>
>> I'd vote for that.
>>
>>>
>>> Thanks,
>>> Maxime
>>>
 In absence of continued technical discussion here, I suggest Sunil
 and Ian collaborate on getting the OVS Defer-work approach, and DPDK
 VHost Async patchsets available on GitHub for easier consumption and
>> future development (as suggested in slides presented on last call).

 Regards, -Harry

 No inline-replies below; message just for context.

> -Original Message-
> From: Van Haaren, Harry
> Sent: Wednesday, March 30, 2022 10:02 AM
> To: Morten Brørup ; Richardson, Bruce
> 
> Cc: Maxime Coquelin ; Pai G, Sunil
> ; Stokes, Ian ; Hu,
> Jiayu ; Ferriter, Cian
> ; Ilya Maximets ;
> ovs-...@openvswitch.org; dev@dpdk.org; Mcnamara, John
> ; O'Driscoll, Tim
> ; Finn, Emma 
> Subject: RE: OVS DPDK DMA-Dev library/Design Discussion
>
>> -Original Message-
>> From: Morten Brørup 
>> Sent: Tuesday, March 29, 2022 8:59 PM
>> To: Van Haaren, Harry ; Richardson,
>>

[PATCH 0/3] add IPsec AH test cases

2022-04-08 Thread Archana Muniganti
Add IPsec AH known test vectors including combined
mode support.

Archana Muniganti (3):
  test/crypto: add AH under combined mode UT
  test/crypto: add AH test vectors
  test/crypto: add AH AES-GMAC test vectors

 app/test/test_cryptodev.c | 150 +++-
 app/test/test_cryptodev_security_ipsec.c  |  86 -
 app/test/test_cryptodev_security_ipsec.h  |  17 +
 ...st_cryptodev_security_ipsec_test_vectors.h | 326 ++
 doc/guides/rel_notes/release_22_03.rst|   5 +
 5 files changed, 569 insertions(+), 15 deletions(-)

-- 
2.22.0



[PATCH 1/3] test/crypto: add AH under combined mode UT

2022-04-08 Thread Archana Muniganti
Added auth-only and null cipher + auth under combined mode
for the following combinations:
1. Tunnel IPv4
2. Transport IPv4

Signed-off-by: Archana Muniganti 
---
 app/test/test_cryptodev.c| 97 
 app/test/test_cryptodev_security_ipsec.c | 74 +++---
 app/test/test_cryptodev_security_ipsec.h |  8 ++
 doc/guides/rel_notes/release_22_03.rst   |  3 +
 4 files changed, 172 insertions(+), 10 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index a0c8926776..eda4a5b6f1 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -845,6 +845,7 @@ ipsec_proto_testsuite_setup(void)
}
 
test_ipsec_alg_list_populate();
+   test_ipsec_ah_alg_list_populate();
 
/*
 * Stop the device. Device would be started again by individual test
@@ -9238,6 +9239,19 @@ test_ipsec_proto_process(const struct ipsec_test_data 
td[],
"Crypto capabilities not supported\n");
return TEST_SKIPPED;
}
+   } else if (td[0].auth_only) {
+   memcpy(&ut_params->auth_xform, &td[0].xform.chain.auth,
+  sizeof(ut_params->auth_xform));
+   ut_params->auth_xform.auth.key.data = td[0].auth_key.data;
+
+   if (test_ipsec_crypto_caps_auth_verify(
+   sec_cap,
+   &ut_params->auth_xform) != 0) {
+   if (!silent)
+   RTE_LOG(INFO, USER1,
+   "Auth crypto capabilities not 
supported\n");
+   return TEST_SKIPPED;
+   }
} else {
memcpy(&ut_params->cipher_xform, &td[0].xform.chain.cipher,
   sizeof(ut_params->cipher_xform));
@@ -9281,6 +9295,9 @@ test_ipsec_proto_process(const struct ipsec_test_data 
td[],
memcpy(&ipsec_xform.salt, td[0].salt.data, salt_len);
sess_conf.ipsec = ipsec_xform;
sess_conf.crypto_xform = &ut_params->aead_xform;
+   } else if (td[0].auth_only) {
+   sess_conf.ipsec = ipsec_xform;
+   sess_conf.crypto_xform = &ut_params->auth_xform;
} else {
sess_conf.ipsec = ipsec_xform;
if (dir == RTE_SECURITY_IPSEC_SA_DIR_EGRESS) {
@@ -9526,6 +9543,52 @@ test_ipsec_proto_all(const struct ipsec_test_flags 
*flags)
return TEST_SKIPPED;
 }
 
+static int
+test_ipsec_ah_proto_all(const struct ipsec_test_flags *flags)
+{
+   struct ipsec_test_data td_outb[IPSEC_TEST_PACKETS_MAX];
+   struct ipsec_test_data td_inb[IPSEC_TEST_PACKETS_MAX];
+   unsigned int i, nb_pkts = 1, pass_cnt = 0;
+   int ret;
+
+   for (i = 0; i < RTE_DIM(ah_alg_list); i++) {
+   test_ipsec_td_prepare(ah_alg_list[i].param1,
+ ah_alg_list[i].param2,
+ flags,
+ td_outb,
+ nb_pkts);
+
+   ret = test_ipsec_proto_process(td_outb, td_inb, nb_pkts, true,
+  flags);
+   if (ret == TEST_SKIPPED)
+   continue;
+
+   if (ret == TEST_FAILED)
+   return TEST_FAILED;
+
+   test_ipsec_td_update(td_inb, td_outb, nb_pkts, flags);
+
+   ret = test_ipsec_proto_process(td_inb, NULL, nb_pkts, true,
+  flags);
+   if (ret == TEST_SKIPPED)
+   continue;
+
+   if (ret == TEST_FAILED)
+   return TEST_FAILED;
+
+   if (flags->display_alg)
+   test_ipsec_display_alg(ah_alg_list[i].param1,
+  ah_alg_list[i].param2);
+
+   pass_cnt++;
+   }
+
+   if (pass_cnt > 0)
+   return TEST_SUCCESS;
+   else
+   return TEST_SKIPPED;
+}
+
 static int
 test_ipsec_proto_display_list(const void *data __rte_unused)
 {
@@ -9538,6 +9601,32 @@ test_ipsec_proto_display_list(const void *data 
__rte_unused)
return test_ipsec_proto_all(&flags);
 }
 
+static int
+test_ipsec_proto_ah_tunnel_ipv4(const void *data __rte_unused)
+{
+   struct ipsec_test_flags flags;
+
+   memset(&flags, 0, sizeof(flags));
+
+   flags.ah = true;
+   flags.display_alg = true;
+
+   return test_ipsec_ah_proto_all(&flags);
+}
+
+static int
+test_ipsec_proto_ah_transport_ipv4(const void *data __rte_unused)
+{
+   struct ipsec_test_flags flags;
+
+   memset(&flags, 0, sizeof(flags));
+
+   flags.ah = true;
+   flags.transport = true;
+
+   return test_ipsec_ah_proto_all(&flags);
+}
+
 static int
 test_ipsec_proto_iv_gen(const void *data __rte_unused)
 {
@@ -15047,6 +15136,1

[PATCH 2/3] test/crypto: add AH test vectors

2022-04-08 Thread Archana Muniganti
Added tunnel and transport AH known test vectors for
SHA256 HMAC.

Signed-off-by: Archana Muniganti 
---
 app/test/test_cryptodev.c |  33 ++-
 ...st_cryptodev_security_ipsec_test_vectors.h | 210 ++
 doc/guides/rel_notes/release_22_03.rst|   1 +
 3 files changed, 240 insertions(+), 4 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index eda4a5b6f1..e152d45e1c 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -9139,6 +9139,8 @@ test_ipsec_proto_process(const struct ipsec_test_data 
td[],
0x, 0x001a};
uint16_t v6_dst[8] = {0x2001, 0x0470, 0xe5bf, 0xdead, 0x4957, 0x2174,
0xe82c, 0x4887};
+   const struct rte_ipv4_hdr *ipv4 =
+   (const struct rte_ipv4_hdr *)td[0].output_text.data;
struct crypto_testsuite_params *ts_params = &testsuite_params;
struct crypto_unittest_params *ut_params = &unittest_params;
struct rte_security_capability_idx sec_cap_idx;
@@ -9147,11 +9149,10 @@ test_ipsec_proto_process(const struct ipsec_test_data 
td[],
uint8_t dev_id = ts_params->valid_devs[0];
enum rte_security_ipsec_sa_direction dir;
struct ipsec_test_data *res_d_tmp = NULL;
-   uint32_t src = RTE_IPV4(192, 168, 1, 0);
-   uint32_t dst = RTE_IPV4(192, 168, 1, 1);
int salt_len, i, ret = TEST_SUCCESS;
struct rte_security_ctx *ctx;
uint8_t *input_text;
+   uint32_t src, dst;
uint32_t verify;
 
ut_params->type = RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL;
@@ -9165,6 +9166,9 @@ test_ipsec_proto_process(const struct ipsec_test_data 
td[],
dir = ipsec_xform.direction;
verify = flags->tunnel_hdr_verify;
 
+   memcpy(&src, &ipv4->src_addr, sizeof(ipv4->src_addr));
+   memcpy(&dst, &ipv4->dst_addr, sizeof(ipv4->dst_addr));
+
if ((dir == RTE_SECURITY_IPSEC_SA_DIR_INGRESS) && verify) {
if (verify == RTE_SECURITY_IPSEC_TUNNEL_VERIFY_SRC_DST_ADDR)
src += 1;
@@ -9431,8 +9435,9 @@ test_ipsec_proto_known_vec(const void *test_data)
 
memcpy(&td_outb, test_data, sizeof(td_outb));
 
-   if (td_outb.aead ||
-   td_outb.xform.chain.cipher.cipher.algo != RTE_CRYPTO_CIPHER_NULL) {
+   if ((td_outb.ipsec_xform.proto != RTE_SECURITY_IPSEC_SA_PROTO_AH) &&
+   (td_outb.aead || (td_outb.xform.chain.cipher.cipher.algo !=
+   RTE_CRYPTO_CIPHER_NULL))) {
/* Disable IV gen to be able to test with known vectors */
td_outb.ipsec_xform.options.iv_gen_disable = 1;
}
@@ -15082,6 +15087,16 @@ static struct unit_test_suite ipsec_proto_testsuite  = 
{
ut_setup_security, ut_teardown,
test_ipsec_proto_known_vec,
&pkt_null_aes_xcbc),
+   TEST_CASE_NAMED_WITH_DATA(
+   "Outbound known vector (AH tunnel mode IPv4 
HMAC-SHA256)",
+   ut_setup_security, ut_teardown,
+   test_ipsec_proto_known_vec,
+   &pkt_ah_tunnel_sha256),
+   TEST_CASE_NAMED_WITH_DATA(
+   "Outbound known vector (AH transport mode IPv4 
HMAC-SHA256)",
+   ut_setup_security, ut_teardown,
+   test_ipsec_proto_known_vec,
+   &pkt_ah_transport_sha256),
TEST_CASE_NAMED_WITH_DATA(
"Outbound fragmented packet",
ut_setup_security, ut_teardown,
@@ -15132,6 +15147,16 @@ static struct unit_test_suite ipsec_proto_testsuite  = 
{
ut_setup_security, ut_teardown,
test_ipsec_proto_known_vec_inb,
&pkt_null_aes_xcbc),
+   TEST_CASE_NAMED_WITH_DATA(
+   "Inbound known vector (AH tunnel mode IPv4 
HMAC-SHA256)",
+   ut_setup_security, ut_teardown,
+   test_ipsec_proto_known_vec_inb,
+   &pkt_ah_tunnel_sha256),
+   TEST_CASE_NAMED_WITH_DATA(
+   "Inbound known vector (AH transport mode IPv4 
HMAC-SHA256)",
+   ut_setup_security, ut_teardown,
+   test_ipsec_proto_known_vec_inb,
+   &pkt_ah_transport_sha256),
TEST_CASE_NAMED_ST(
"Combined test alg list",
ut_setup_security, ut_teardown,
diff --git a/app/test/test_cryptodev_security_ipsec_test_vectors.h 
b/app/test/test_cryptodev_security_ipsec_test_vectors.h
index fe2fd855df..f50986e9b4 100644
--- a/app/test/test_cryptodev_security_ipsec_test_vectors.h
+++ b/app/test/test_cryptodev_security_ipsec_test_vectors.h
@@ -1153,4 +1153,214 @@ struct ipsec_test_data pkt_null_aes_xcbc = {
},
 };

[PATCH 3/3] test/crypto: add AH AES-GMAC test vectors

2022-04-08 Thread Archana Muniganti
Added AES_GMAC test vectors along with combined mode support.

Signed-off-by: Archana Muniganti 
---
 app/test/test_cryptodev.c |  26 +++-
 app/test/test_cryptodev_security_ipsec.c  |  12 ++
 app/test/test_cryptodev_security_ipsec.h  |   9 ++
 ...st_cryptodev_security_ipsec_test_vectors.h | 116 ++
 doc/guides/rel_notes/release_22_03.rst|   1 +
 5 files changed, 160 insertions(+), 4 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index e152d45e1c..f444144cc6 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -9294,9 +9294,12 @@ test_ipsec_proto_process(const struct ipsec_test_data 
td[],
.protocol = RTE_SECURITY_PROTOCOL_IPSEC,
};
 
-   if (td[0].aead) {
+   if (td[0].aead || td[0].aes_gmac) {
salt_len = RTE_MIN(sizeof(ipsec_xform.salt), td[0].salt.len);
memcpy(&ipsec_xform.salt, td[0].salt.data, salt_len);
+   }
+
+   if (td[0].aead) {
sess_conf.ipsec = ipsec_xform;
sess_conf.crypto_xform = &ut_params->aead_xform;
} else if (td[0].auth_only) {
@@ -9377,6 +9380,8 @@ test_ipsec_proto_process(const struct ipsec_test_data 
td[],
 
if (td[i].aead)
len = td[i].xform.aead.aead.iv.length;
+   else if (td[i].aes_gmac)
+   len = td[i].xform.chain.auth.auth.iv.length;
else
len = td[i].xform.chain.cipher.cipher.iv.length;
 
@@ -9435,9 +9440,9 @@ test_ipsec_proto_known_vec(const void *test_data)
 
memcpy(&td_outb, test_data, sizeof(td_outb));
 
-   if ((td_outb.ipsec_xform.proto != RTE_SECURITY_IPSEC_SA_PROTO_AH) &&
-   (td_outb.aead || (td_outb.xform.chain.cipher.cipher.algo !=
-   RTE_CRYPTO_CIPHER_NULL))) {
+   if (td_outb.aes_gmac || td_outb.aead ||
+   ((td_outb.ipsec_xform.proto != RTE_SECURITY_IPSEC_SA_PROTO_AH) &&
+(td_outb.xform.chain.cipher.cipher.algo != 
RTE_CRYPTO_CIPHER_NULL))) {
/* Disable IV gen to be able to test with known vectors */
td_outb.ipsec_xform.options.iv_gen_disable = 1;
}
@@ -9506,6 +9511,9 @@ test_ipsec_proto_all(const struct ipsec_test_flags *flags)
cipher_alg = td_outb->xform.chain.cipher.cipher.algo;
auth_alg = td_outb->xform.chain.auth.auth.algo;
 
+   if (td_outb->aes_gmac && cipher_alg != 
RTE_CRYPTO_CIPHER_NULL)
+   continue;
+
/* ICV is not applicable for NULL auth */
if (flags->icv_corrupt &&
auth_alg == RTE_CRYPTO_AUTH_NULL)
@@ -15097,6 +15105,11 @@ static struct unit_test_suite ipsec_proto_testsuite  = 
{
ut_setup_security, ut_teardown,
test_ipsec_proto_known_vec,
&pkt_ah_transport_sha256),
+   TEST_CASE_NAMED_WITH_DATA(
+   "Outbound known vector (AH transport mode IPv4 AES-GMAC 
128)",
+   ut_setup_security, ut_teardown,
+   test_ipsec_proto_known_vec,
+   &pkt_ah_ipv4_aes_gmac_128),
TEST_CASE_NAMED_WITH_DATA(
"Outbound fragmented packet",
ut_setup_security, ut_teardown,
@@ -15157,6 +15170,11 @@ static struct unit_test_suite ipsec_proto_testsuite  = 
{
ut_setup_security, ut_teardown,
test_ipsec_proto_known_vec_inb,
&pkt_ah_transport_sha256),
+   TEST_CASE_NAMED_WITH_DATA(
+   "Inbound known vector (AH transport mode IPv4 AES-GMAC 
128)",
+   ut_setup_security, ut_teardown,
+   test_ipsec_proto_known_vec_inb,
+   &pkt_ah_ipv4_aes_gmac_128),
TEST_CASE_NAMED_ST(
"Combined test alg list",
ut_setup_security, ut_teardown,
diff --git a/app/test/test_cryptodev_security_ipsec.c 
b/app/test/test_cryptodev_security_ipsec.c
index 6098c3edc3..14c6ba681f 100644
--- a/app/test/test_cryptodev_security_ipsec.c
+++ b/app/test/test_cryptodev_security_ipsec.c
@@ -412,6 +412,12 @@ test_ipsec_td_prepare(const struct crypto_param *param1,
td->xform.chain.auth.auth.digest_length =
param1->digest_length;
td->auth_only = true;
+
+   if (td->xform.chain.auth.auth.algo == 
RTE_CRYPTO_AUTH_AES_GMAC) {
+   td->xform.chain.auth.auth.iv.length =
+   param1->iv_length;
+ 

[PATCH 0/2] populate mbuf in latency test

2022-04-08 Thread Archana Muniganti
For decrypt, an ICV mismatch can occur because the data is dummy, so
latency would be measured for the error path. Hence, populate the
mbuf with test vector data.

Archana Muniganti (2):
  app/crypto-perf: populate mbuf in latency test
  app/crypto-perf: add vector file for AES-GCM

 app/test-crypto-perf/cperf_ops.c   |  3 +-
 app/test-crypto-perf/cperf_test_common.c   | 36 
 app/test-crypto-perf/cperf_test_common.h   |  5 ++
 app/test-crypto-perf/cperf_test_latency.c  |  6 ++
 app/test-crypto-perf/cperf_test_verify.c   | 36 
 app/test-crypto-perf/data/aes_gcm_128.data | 97 ++
 6 files changed, 146 insertions(+), 37 deletions(-)
 create mode 100644 app/test-crypto-perf/data/aes_gcm_128.data

-- 
2.22.0



[PATCH 1/2] app/crypto-perf: populate mbuf in latency test

2022-04-08 Thread Archana Muniganti
For decrypt, an ICV mismatch can occur because the data is dummy, so
latency would be measured for the error path. Hence, populate the
mbuf with test vector data.

Signed-off-by: Archana Muniganti 
---
 app/test-crypto-perf/cperf_ops.c  |  3 +-
 app/test-crypto-perf/cperf_test_common.c  | 36 +++
 app/test-crypto-perf/cperf_test_common.h  |  5 
 app/test-crypto-perf/cperf_test_latency.c |  6 
 app/test-crypto-perf/cperf_test_verify.c  | 36 ---
 5 files changed, 49 insertions(+), 37 deletions(-)

diff --git a/app/test-crypto-perf/cperf_ops.c b/app/test-crypto-perf/cperf_ops.c
index 8baee12e45..97b719e13b 100644
--- a/app/test-crypto-perf/cperf_ops.c
+++ b/app/test-crypto-perf/cperf_ops.c
@@ -620,7 +620,8 @@ cperf_set_ops_aead(struct rte_crypto_op **ops,
}
}
 
-   if (options->test == CPERF_TEST_TYPE_VERIFY) {
+   if ((options->test == CPERF_TEST_TYPE_VERIFY) ||
+   (options->test == CPERF_TEST_TYPE_LATENCY)) {
for (i = 0; i < nb_ops; i++) {
uint8_t *iv_ptr = rte_crypto_op_ctod_offset(ops[i],
uint8_t *, iv_offset);
diff --git a/app/test-crypto-perf/cperf_test_common.c 
b/app/test-crypto-perf/cperf_test_common.c
index 97a1ea47ad..00aadc9a47 100644
--- a/app/test-crypto-perf/cperf_test_common.c
+++ b/app/test-crypto-perf/cperf_test_common.c
@@ -262,3 +262,39 @@ cperf_alloc_common_memory(const struct cperf_options 
*options,
 
return 0;
 }
+
+void
+cperf_mbuf_set(struct rte_mbuf *mbuf,
+   const struct cperf_options *options,
+   const struct cperf_test_vector *test_vector)
+{
+   uint32_t segment_sz = options->segment_sz;
+   uint8_t *mbuf_data;
+   uint8_t *test_data;
+   uint32_t remaining_bytes = options->max_buffer_size;
+
+   if (options->op_type == CPERF_AEAD) {
+   test_data = (options->aead_op == RTE_CRYPTO_AEAD_OP_ENCRYPT) ?
+   test_vector->plaintext.data :
+   test_vector->ciphertext.data;
+   } else {
+   test_data =
+   (options->cipher_op == RTE_CRYPTO_CIPHER_OP_ENCRYPT) ?
+   test_vector->plaintext.data :
+   test_vector->ciphertext.data;
+   }
+
+   while (remaining_bytes) {
+   mbuf_data = rte_pktmbuf_mtod(mbuf, uint8_t *);
+
+   if (remaining_bytes <= segment_sz) {
+   memcpy(mbuf_data, test_data, remaining_bytes);
+   return;
+   }
+
+   memcpy(mbuf_data, test_data, segment_sz);
+   remaining_bytes -= segment_sz;
+   test_data += segment_sz;
+   mbuf = mbuf->next;
+   }
+}
diff --git a/app/test-crypto-perf/cperf_test_common.h 
b/app/test-crypto-perf/cperf_test_common.h
index 3ace0d2e58..a603a607d5 100644
--- a/app/test-crypto-perf/cperf_test_common.h
+++ b/app/test-crypto-perf/cperf_test_common.h
@@ -21,4 +21,9 @@ cperf_alloc_common_memory(const struct cperf_options *options,
uint32_t *dst_buf_offset,
struct rte_mempool **pool);
 
+void
+cperf_mbuf_set(struct rte_mbuf *mbuf,
+   const struct cperf_options *options,
+   const struct cperf_test_vector *test_vector);
+
 #endif /* _CPERF_TEST_COMMON_H_ */
diff --git a/app/test-crypto-perf/cperf_test_latency.c 
b/app/test-crypto-perf/cperf_test_latency.c
index 9ada431660..6f972cea49 100644
--- a/app/test-crypto-perf/cperf_test_latency.c
+++ b/app/test-crypto-perf/cperf_test_latency.c
@@ -201,6 +201,12 @@ cperf_latency_test_runner(void *arg)
ctx->test_vector, iv_offset,
&imix_idx, &tsc_start);
 
+   /* Populate the mbuf with the test vector */
+   for (i = 0; i < burst_size; i++)
+   cperf_mbuf_set(ops[i]->sym->m_src,
+   ctx->options,
+   ctx->test_vector);
+
tsc_start = rte_rdtsc_precise();
 
 #ifdef CPERF_LINEARIZATION_ENABLE
diff --git a/app/test-crypto-perf/cperf_test_verify.c 
b/app/test-crypto-perf/cperf_test_verify.c
index c031330afc..5c0dc82290 100644
--- a/app/test-crypto-perf/cperf_test_verify.c
+++ b/app/test-crypto-perf/cperf_test_verify.c
@@ -195,42 +195,6 @@ cperf_verify_op(struct rte_crypto_op *op,
return !!res;
 }
 
-static void
-cperf_mbuf_set(struct rte_mbuf *mbuf,
-   const struct cperf_options *options,
-   const struct cperf_test_vector *test_vector)
-{
-   uint32_t segment_sz = options->segment_sz;
-   uint8_t *mbuf_data;
-   uint8_t *test_data;
-   uint32_t remaining_bytes = options->max_buffer_size;
-
-   if (options->op_type 

[PATCH 2/2] app/crypto-perf: add vector file for AES-GCM

2022-04-08 Thread Archana Muniganti
Added a test vector file for AES-128-GCM with
64B and 512B buffer lengths.

Signed-off-by: Archana Muniganti 
---
 app/test-crypto-perf/data/aes_gcm_128.data | 97 ++
 1 file changed, 97 insertions(+)
 create mode 100644 app/test-crypto-perf/data/aes_gcm_128.data

diff --git a/app/test-crypto-perf/data/aes_gcm_128.data 
b/app/test-crypto-perf/data/aes_gcm_128.data
new file mode 100644
index 00..197f0aa99d
--- /dev/null
+++ b/app/test-crypto-perf/data/aes_gcm_128.data
@@ -0,0 +1,97 @@
+# List of tests for AES-128 GCM:
+# 1) [aead_buff_64]
+# 2) [aead_buff_512]
+
+##
+# GLOBAL #
+##
+plaintext =
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 
0xaa, 0xaa, 0xaa
+
+ciphertext =
+0x4c, 0x35, 0x11, 0xb6, 0x13, 0xcf, 0x34, 0x22, 0x1d, 0x33, 0xbd, 0x9b, 0x75, 
0xad, 0x5e, 0x33,
+0xa5, 0x77, 0xda, 0xfb, 0x7e, 0x57, 0x20, 0x3d, 0xc5, 0xe8, 0x19, 0x3a, 0x92, 
0x59, 0xfb, 0xc5,
+0xff, 0x47, 0x49, 0x8e, 0xcb, 0x4f, 0x8e, 0x6c, 0xcd, 0x9f, 0x81, 0x27, 0xa4, 
0xac, 0xaa, 0xe1,
+0xd0, 0x6a, 0xb0, 0x96, 0x05, 0x68, 0x8e, 0xe8, 0x44, 0x63, 0x12, 0x2a, 0xef, 
0x3d, 0xc3, 0xf9,
+0xcf, 0xd6, 0x31, 0x04, 0x88, 0xbb, 0xfb, 0xe0, 0x44, 0xcc, 0xef, 0x10, 0xb7, 
0xaf, 0x5e, 0x90,
+0x07, 0x10, 0xd8, 0x85, 0x59, 0x99, 0x29, 0x2a, 0xa8, 0x83, 0x21, 0x8d, 0x5f, 
0x02, 0xed, 0xa6,
+0x22, 0xa5, 0x9e, 0x09, 0xa6, 0x52, 0x84, 0x88, 0xb8, 0x1f, 0x90, 0x70, 0xab, 
0x2c, 0x2c, 0x45,
+0x6f, 0xdc, 0xca, 0x38, 0x3a, 0x11, 0xd0, 0x27, 0x24, 0x09, 0xf8, 0xbf, 0xa2, 
0x8f, 0xd3, 0x37,
+0xb4, 0x08, 0x0a, 0x61, 0xb9, 0x77, 0x92, 0xbd, 0x49, 0x36, 0x67, 0xe7, 0xef, 
0x81, 0x50, 0x7f,
+0xbb, 0x23, 0x46, 0x4a, 0xb9, 0x34, 0x98, 0xa2, 0xb8, 0x52, 0x86, 0x0e, 0xbd, 
0x6d, 0x11, 0x0a,
+0x91, 0x5c, 0x6d, 0x68, 0xea, 0x05, 0x47, 0x93, 0x33, 0x09, 0x28, 0x8a, 0xe5, 
0x2f, 0x10, 0x9f,
+0xd9, 0xb8, 0x4c, 0x7c, 0x23, 0x8e, 0x08, 0x03, 0xe5, 0x8b, 0x07, 0xd9, 0x29, 
0x52, 0x96, 0x98,
+0xe6, 0x40, 0x55, 0x62, 0xf0, 

[PATCH v1 0/2] vhost: add unsafe API to get DMA inflight packets

2022-04-08 Thread xuan . ding
From: Xuan Ding 

This patchset introduces an unsafe API to get the number of in-flight
packets in the DMA engine. It should only be used within vhost ops that
already hold the lock, e.g. when the vring state changes or the device is
destroyed. Compared with rte_vhost_async_get_inflight(), this is a
lock-free version.

RFC v1->v1:
* refine the doc and commit log

Xuan Ding (2):
  vhost: add unsafe API to check inflight packets
  examples/vhost: use API to check inflight packets

 doc/guides/prog_guide/vhost_lib.rst|  6 ++
 doc/guides/rel_notes/release_22_07.rst |  4 
 examples/vhost/main.c  | 28 ++
 examples/vhost/main.h  |  1 -
 lib/vhost/rte_vhost_async.h| 17 
 lib/vhost/version.map  |  4 
 lib/vhost/vhost.c  | 26 
 7 files changed, 72 insertions(+), 14 deletions(-)

-- 
2.17.1



[PATCH v1 1/2] vhost: add unsafe API to check inflight packets

2022-04-08 Thread xuan . ding
From: Xuan Ding 

In the async data path, when the vring state changes or the device is
destroyed, it is necessary to know the number of in-flight packets in the
DMA engine. This patch provides a thread-unsafe API that returns the
number of in-flight packets for a vhost queue without taking any lock.
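
A minimal usage sketch, assuming the caller is already inside a vhost op
(e.g. vring_state_changed) where the lock is held; the draining helper
below is hypothetical and mirrors what patch 2/2 does in the vhost example:

#include <rte_common.h>
#include <rte_mbuf.h>
#include <rte_vhost_async.h>

static void
drain_queue_unsafe(int vid, uint16_t queue_id, int16_t dma_id)
{
        struct rte_mbuf *pkts[32];
        int inflight;

        inflight = rte_vhost_async_get_inflight_thread_unsafe(vid, queue_id);
        while (inflight > 0) {
                /* Clear up to 32 completed packets per iteration and free them. */
                uint16_t n = rte_vhost_clear_queue_thread_unsafe(vid, queue_id,
                                pkts, RTE_DIM(pkts), dma_id, 0);
                rte_pktmbuf_free_bulk(pkts, n);
                inflight = rte_vhost_async_get_inflight_thread_unsafe(vid,
                                queue_id);
        }
}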

Signed-off-by: Xuan Ding 
---
 doc/guides/prog_guide/vhost_lib.rst|  6 ++
 doc/guides/rel_notes/release_22_07.rst |  4 
 lib/vhost/rte_vhost_async.h| 17 +
 lib/vhost/version.map  |  4 
 lib/vhost/vhost.c  | 26 ++
 5 files changed, 57 insertions(+)

diff --git a/doc/guides/prog_guide/vhost_lib.rst 
b/doc/guides/prog_guide/vhost_lib.rst
index 886f8f5e72..f287b76ebf 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -271,6 +271,12 @@ The following is an overview of some key Vhost API 
functions:
   This function returns the amount of in-flight packets for the vhost
   queue using async acceleration.
 
+ * ``rte_vhost_async_get_inflight_thread_unsafe(vid, queue_id)``
+
+  Get the number of inflight packets for a vhost queue without performing
+  any locking. It should only be used within the vhost ops, which already
+  holds the lock.
+
 * ``rte_vhost_clear_queue_thread_unsafe(vid, queue_id, **pkts, count, dma_id, 
vchan_id)``
 
   Clear inflight packets which are submitted to DMA engine in vhost async data
diff --git a/doc/guides/rel_notes/release_22_07.rst 
b/doc/guides/rel_notes/release_22_07.rst
index 42a5f2d990..a0c5d9459b 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -55,6 +55,10 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Added vhost API to get the number of inflight packets.**
+
+  Added an API which can get the number of inflight packets in
+  vhost async data path without using lock.
 
 Removed Items
 -
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index f1293c6a9d..70234debf9 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -139,6 +139,23 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid, 
uint16_t queue_id,
 __rte_experimental
 int rte_vhost_async_get_inflight(int vid, uint16_t queue_id);
 
+/**
+ * This function is a lock-free version that returns the amount of in-flight
+ * packets for the vhost queue which uses async channel acceleration.
+ *
+ * @note This function does not perform any locking, it should only be
+ * used within the vhost ops, which already holds the lock.
+ *
+ * @param vid
+ * id of vhost device to enqueue data
+ * @param queue_id
+ * queue id to enqueue data
+ * @return
+ * the amount of in-flight packets on success; -1 on failure
+ */
+__rte_experimental
+int rte_vhost_async_get_inflight_thread_unsafe(int vid, uint16_t queue_id);
+
 /**
  * This function checks async completion status and clear packets for
  * a specific vhost device queue. Packets which are inflight will be
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 0a66c5840c..5841315386 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -87,6 +87,10 @@ EXPERIMENTAL {
 
# added in 22.03
rte_vhost_async_dma_configure;
+
+   # added in 22.07
+   rte_vhost_async_get_inflight_thread_unsafe;
+
 };
 
 INTERNAL {
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 2f96a28dac..df0bb9d043 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -1907,6 +1907,32 @@ rte_vhost_async_get_inflight(int vid, uint16_t queue_id)
return ret;
 }
 
+int
+rte_vhost_async_get_inflight_thread_unsafe(int vid, uint16_t queue_id)
+{
+   struct vhost_virtqueue *vq;
+   struct virtio_net *dev = get_device(vid);
+   int ret = -1;
+
+   if (dev == NULL)
+   return ret;
+
+   if (queue_id >= VHOST_MAX_VRING)
+   return ret;
+
+   vq = dev->virtqueue[queue_id];
+
+   if (vq == NULL)
+   return ret;
+
+   if (!vq->async)
+   return ret;
+
+   ret = vq->async->pkts_inflight_n;
+
+   return ret;
+}
+
 int
 rte_vhost_get_monitor_addr(int vid, uint16_t queue_id,
struct rte_vhost_power_monitor_cond *pmc)
-- 
2.17.1



[PATCH v1 2/2] examples/vhost: use API to check inflight packets

2022-04-08 Thread xuan . ding
From: Xuan Ding 

In the async data path, call the rte_vhost_async_get_inflight_thread_unsafe()
API to get the number of in-flight packets directly instead of
maintaining a local variable.

Signed-off-by: Xuan Ding 
Reviewed-by: Maxime Coquelin 
---
 examples/vhost/main.c | 28 +++-
 examples/vhost/main.h |  1 -
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index d94fabb060..c4d46de1c5 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -994,10 +994,8 @@ complete_async_pkts(struct vhost_dev *vdev)
 
complete_count = rte_vhost_poll_enqueue_completed(vdev->vid,
VIRTIO_RXQ, p_cpl, MAX_PKT_BURST, 
dma_id, 0);
-   if (complete_count) {
+   if (complete_count)
free_pkts(p_cpl, complete_count);
-   __atomic_sub_fetch(&vdev->pkts_inflight, complete_count, 
__ATOMIC_SEQ_CST);
-   }
 
 }
 
@@ -1039,7 +1037,6 @@ drain_vhost(struct vhost_dev *vdev)
 
complete_async_pkts(vdev);
ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, 
nr_xmit, dma_id, 0);
-   __atomic_add_fetch(&vdev->pkts_inflight, ret, __ATOMIC_SEQ_CST);
 
enqueue_fail = nr_xmit - ret;
if (enqueue_fail)
@@ -1368,7 +1365,6 @@ drain_eth_rx(struct vhost_dev *vdev)
complete_async_pkts(vdev);
enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
VIRTIO_RXQ, pkts, rx_count, dma_id, 0);
-   __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count, 
__ATOMIC_SEQ_CST);
 
enqueue_fail = rx_count - enqueue_count;
if (enqueue_fail)
@@ -1540,14 +1536,17 @@ destroy_device(int vid)
 
if (dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled) {
uint16_t n_pkt = 0;
+   int pkts_inflight;
int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-   struct rte_mbuf *m_cpl[vdev->pkts_inflight];
+   pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid, 
VIRTIO_RXQ);
+   struct rte_mbuf *m_cpl[pkts_inflight];
 
-   while (vdev->pkts_inflight) {
+   while (pkts_inflight) {
n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, 
VIRTIO_RXQ,
-   m_cpl, vdev->pkts_inflight, 
dma_id, 0);
+   m_cpl, pkts_inflight, dma_id, 
0);
free_pkts(m_cpl, n_pkt);
-   __atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, 
__ATOMIC_SEQ_CST);
+   pkts_inflight = 
rte_vhost_async_get_inflight_thread_unsafe(vid,
+   
VIRTIO_RXQ);
}
 
rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
@@ -1651,14 +1650,17 @@ vring_state_changed(int vid, uint16_t queue_id, int 
enable)
if (dma_bind[vid].dmas[queue_id].async_enabled) {
if (!enable) {
uint16_t n_pkt = 0;
+   int pkts_inflight;
+   pkts_inflight = 
rte_vhost_async_get_inflight_thread_unsafe(vid, queue_id);
int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-   struct rte_mbuf *m_cpl[vdev->pkts_inflight];
+   struct rte_mbuf *m_cpl[pkts_inflight];
 
-   while (vdev->pkts_inflight) {
+   while (pkts_inflight) {
n_pkt = 
rte_vhost_clear_queue_thread_unsafe(vid, queue_id,
-   m_cpl, 
vdev->pkts_inflight, dma_id, 0);
+   m_cpl, pkts_inflight, 
dma_id, 0);
free_pkts(m_cpl, n_pkt);
-   __atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, 
__ATOMIC_SEQ_CST);
+   pkts_inflight = 
rte_vhost_async_get_inflight_thread_unsafe(vid,
+   
queue_id);
}
}
}
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index b4a453e77e..e7f395c3c9 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -52,7 +52,6 @@ struct vhost_dev {
uint64_t features;
size_t hdr_len;
uint16_t nr_vrings;
-   uint16_t pkts_inflight;
struct rte_vhost_memory *mem;
struct device_statistics stats;
TAILQ_ENTRY(vhost_dev) global_vdev_entry;
-- 
2.17.1



[PATCH v2] common/sff_module: add telemetry command to dump module EEPROM

2022-04-08 Thread Robin Zhang
This patch introduces a new telemetry command '/sff_module/info'
to dump formatted module EEPROM information.

Formatting is supported for SFP (Small Form-factor Pluggable)/SFP+/
QSFP+ (Quad Small Form-factor Pluggable)/QSFP28 modules, based on the
SFF (Small Form Factor) Committee specifications
SFF-8079/SFF-8472/SFF-8024/SFF-8636.

Signed-off-by: Robin Zhang 
---

v2:
- Redesign the dump function as a telemetry command, so that the EEPROM
  information can be used by other apps.

- The usage like this:

  Launch the primary application with telemetry:
  Take testpmd as example: ./app/dpdk-testpmd

  Then launch the telemetry client script:
  ./usertools/dpdk-telemetry.py

  In telemetry client run command:
  --> /sff_module/info,

  Both the primary application and the telemetry client will show the
  formatted module EEPROM information.
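
  For reference, the sketch below shows the general pattern for wiring such
  a command into the telemetry library. It is not taken from this patch, and
  the dictionary entries are placeholders; real code would decode the module
  EEPROM according to the SFF specifications.

#include <rte_common.h>
#include <rte_telemetry.h>

/* Illustrative only: register '/sff_module/info' and return a dictionary. */
static int
handle_module_info(const char *cmd __rte_unused, const char *params,
                   struct rte_tel_data *d)
{
        if (params == NULL)
                return -1;      /* expects a port id, e.g. "/sff_module/info,0" */

        rte_tel_data_start_dict(d);
        /* Real code would read the module EEPROM and decode it here. */
        rte_tel_data_add_dict_string(d, "Identifier", "SFP");
        rte_tel_data_add_dict_string(d, "Connector", "LC");
        return 0;
}

RTE_INIT(sff_module_telemetry_init)
{
        rte_telemetry_register_cmd("/sff_module/info", handle_module_info,
                "Returns formatted module EEPROM info. Parameters: port id");
}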

 drivers/common/meson.build|1 +
 drivers/common/sff_module/meson.build |   16 +
 drivers/common/sff_module/sff_8079.c  |  672 ++
 drivers/common/sff_module/sff_8472.c  |  301 ++
 drivers/common/sff_module/sff_8636.c  | 1004 +
 drivers/common/sff_module/sff_8636.h  |  592 
 drivers/common/sff_module/sff_common.c|  415 +
 drivers/common/sff_module/sff_common.h|  192 
 drivers/common/sff_module/sff_telemetry.c |  142 +++
 drivers/common/sff_module/sff_telemetry.h |   41 +
 drivers/common/sff_module/version.map |9 +
 11 files changed, 3385 insertions(+)
 create mode 100644 drivers/common/sff_module/meson.build
 create mode 100644 drivers/common/sff_module/sff_8079.c
 create mode 100644 drivers/common/sff_module/sff_8472.c
 create mode 100644 drivers/common/sff_module/sff_8636.c
 create mode 100644 drivers/common/sff_module/sff_8636.h
 create mode 100644 drivers/common/sff_module/sff_common.c
 create mode 100644 drivers/common/sff_module/sff_common.h
 create mode 100644 drivers/common/sff_module/sff_telemetry.c
 create mode 100644 drivers/common/sff_module/sff_telemetry.h
 create mode 100644 drivers/common/sff_module/version.map

diff --git a/drivers/common/meson.build b/drivers/common/meson.build
index ea261dd70a..7b183769ca 100644
--- a/drivers/common/meson.build
+++ b/drivers/common/meson.build
@@ -8,4 +8,5 @@ drivers = [
 'iavf',
 'mvep',
 'octeontx',
+'sff_module',
 ]
diff --git a/drivers/common/sff_module/meson.build 
b/drivers/common/sff_module/meson.build
new file mode 100644
index 00..1160b07ba2
--- /dev/null
+++ b/drivers/common/sff_module/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019-2021 Intel Corporation
+
+sources = files(
+'sff_common.c',
+'sff_8079.c',
+'sff_8472.c',
+'sff_8636.c',
+'sff_telemetry.c'
+)
+
+deps += ['ethdev']
+
+if cc.has_argument('-Wno-pointer-to-int-cast')
+cflags += '-Wno-pointer-to-int-cast'
+endif
\ No newline at end of file
diff --git a/drivers/common/sff_module/sff_8079.c 
b/drivers/common/sff_module/sff_8079.c
new file mode 100644
index 00..173cb0493e
--- /dev/null
+++ b/drivers/common/sff_module/sff_8079.c
@@ -0,0 +1,672 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2022 Intel Corporation
+ *
+ * Implements SFF-8079 optics diagnostics.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include "sff_common.h"
+#include "sff_telemetry.h"
+
+static void sff_8079_show_identifier(const uint8_t *id, sff_item *items)
+{
+   sff_8024_show_identifier(id, 0, items);
+}
+
+static void sff_8079_show_ext_identifier(const uint8_t *id, sff_item *items)
+{
+   char val_string[TMP_STRING_SIZE];
+
+   printf("%-41s : 0x%02x", "Extended identifier", id[1]);
+   sprintf(val_string, "0x%02x", id[1]);
+   if (id[1] == 0x00) {
+   printf(" (GBIC not specified / not MOD_DEF compliant)\n");
+   strcat(val_string, " (GBIC not specified / not MOD_DEF 
compliant)");
+   } else if (id[1] == 0x04) {
+   printf(" (GBIC/SFP defined by 2-wire interface ID)\n");
+   strcat(val_string, " (GBIC/SFP defined by 2-wire interface 
ID)");
+   } else if (id[1] <= 0x07) {
+   printf(" (GBIC compliant with MOD_DEF %u)\n", id[1]);
+   char tmp[TMP_STRING_SIZE];
+   snprintf(tmp, sizeof(tmp), " (GBIC compliant with MOD_DEF %u)", id[1]);
+   strcat(val_string, tmp);
+   } else {
+   printf(" (unknown)\n");
+   strcat(val_string, " (unknown)");
+   }
+   add_item_string(items, "Extended identifier", val_string);
+}
+
+static void sff_8079_show_connector(const uint8_t *id, sff_item *items)
+{
+   sff_8024_show_connector(id, 2, items);
+}
+
+static void sff_8079_show_transceiver(const uint8_t *id, sff_item *items)
+{
+   static const char *pfx =
+   "Transceiver type  :";
+   char val_string[TMP_STRING_SIZE];
+
+   printf("%-41s : 0x%02x 0x%02x 0x%02x 0x%

Re: [PATCH v2] common/sff_module: add telemetry command to dump module EEPROM

2022-04-08 Thread Bruce Richardson
On Fri, Apr 08, 2022 at 10:23:30AM +, Robin Zhang wrote:
> This patch introduce a new telemetry command '/sff_module/info'
> to dump format module EEPROM information.
> 
> The format support for SFP(Small Formfactor Pluggable)/SFP+
> /QSFP+(Quad Small Formfactor Pluggable)/QSFP28 modules based on
> SFF(Small Form Factor) Committee specifications
> SFF-8079/SFF-8472/SFF-8024/SFF-8636.
> 
> Signed-off-by: Robin Zhang 
> ---
> 
> v2:
> - Redesign the dump function as a telemetry command, so that the EEPROM
>   information can be used by other app.
> 
> - The usage like this:
> 
>   Launch the primary application with telemetry:
>   Take testpmd as example: ./app/dpdk-testpmd
> 
>   Then launch the telemetry client script:
>   ./usertools/dpdk-telemetry.py
> 
>   In telemetry client run command:
>   --> /sff_module/info,
> 
>   Both primary application and telemetry client will show the formated
>   module EEPROM information.
> 
>  drivers/common/meson.build|1 +
>  drivers/common/sff_module/meson.build |   16 +
>  drivers/common/sff_module/sff_8079.c  |  672 ++
>  drivers/common/sff_module/sff_8472.c  |  301 ++
>  drivers/common/sff_module/sff_8636.c  | 1004 +
>  drivers/common/sff_module/sff_8636.h  |  592 
>  drivers/common/sff_module/sff_common.c|  415 +
>  drivers/common/sff_module/sff_common.h|  192 
>  drivers/common/sff_module/sff_telemetry.c |  142 +++
>  drivers/common/sff_module/sff_telemetry.h |   41 +
>  drivers/common/sff_module/version.map |9 +
>  11 files changed, 3385 insertions(+)
>  create mode 100644 drivers/common/sff_module/meson.build
>  create mode 100644 drivers/common/sff_module/sff_8079.c
>  create mode 100644 drivers/common/sff_module/sff_8472.c
>  create mode 100644 drivers/common/sff_module/sff_8636.c
>  create mode 100644 drivers/common/sff_module/sff_8636.h
>  create mode 100644 drivers/common/sff_module/sff_common.c
>  create mode 100644 drivers/common/sff_module/sff_common.h
>  create mode 100644 drivers/common/sff_module/sff_telemetry.c
>  create mode 100644 drivers/common/sff_module/sff_telemetry.h
>  create mode 100644 drivers/common/sff_module/version.map
> 
Is this a whole new driver just to provide telemetry dumps of SFP
information? I can understand the problem somewhat - though I am in some
doubt that telemetry is the best way to expose this information - but
creating a new driver seems the wrong approach here. SFPs are for NIC
devices, so why isn't this available through a common API such as ethdev?

/Bruce


RE: [PATCH v2] common/sff_module: add telemetry command to dump module EEPROM

2022-04-08 Thread Zhang, RobinX
Hi Bruce,

> -Original Message-
> From: Richardson, Bruce 
> Sent: Friday, April 8, 2022 6:33 PM
> To: Zhang, RobinX 
> Cc: dev@dpdk.org; Yang, Qiming ; Zhang, Qi Z
> ; Yang, SteveX 
> Subject: Re: [PATCH v2] common/sff_module: add telemetry command to
> dump module EEPROM
> 
> On Fri, Apr 08, 2022 at 10:23:30AM +, Robin Zhang wrote:
> > This patch introduce a new telemetry command '/sff_module/info'
> > to dump format module EEPROM information.
> >
> > The format support for SFP(Small Formfactor Pluggable)/SFP+
> > /QSFP+(Quad Small Formfactor Pluggable)/QSFP28 modules based on
> > SFF(Small Form Factor) Committee specifications
> > SFF-8079/SFF-8472/SFF-8024/SFF-8636.
> >
> > Signed-off-by: Robin Zhang 
> > ---
> >
> > v2:
> > - Redesign the dump function as a telemetry command, so that the
> EEPROM
> >   information can be used by other app.
> >
> > - The usage like this:
> >
> >   Launch the primary application with telemetry:
> >   Take testpmd as example: ./app/dpdk-testpmd
> >
> >   Then launch the telemetry client script:
> >   ./usertools/dpdk-telemetry.py
> >
> >   In telemetry client run command:
> >   --> /sff_module/info,
> >
> >   Both primary application and telemetry client will show the formated
> >   module EEPROM information.
> >
> >  drivers/common/meson.build|1 +
> >  drivers/common/sff_module/meson.build |   16 +
> >  drivers/common/sff_module/sff_8079.c  |  672 ++
> >  drivers/common/sff_module/sff_8472.c  |  301 ++
> >  drivers/common/sff_module/sff_8636.c  | 1004
> +
> >  drivers/common/sff_module/sff_8636.h  |  592 
> >  drivers/common/sff_module/sff_common.c|  415 +
> >  drivers/common/sff_module/sff_common.h|  192 
> >  drivers/common/sff_module/sff_telemetry.c |  142 +++
> >  drivers/common/sff_module/sff_telemetry.h |   41 +
> >  drivers/common/sff_module/version.map |9 +
> >  11 files changed, 3385 insertions(+)
> >  create mode 100644 drivers/common/sff_module/meson.build
> >  create mode 100644 drivers/common/sff_module/sff_8079.c
> >  create mode 100644 drivers/common/sff_module/sff_8472.c
> >  create mode 100644 drivers/common/sff_module/sff_8636.c
> >  create mode 100644 drivers/common/sff_module/sff_8636.h
> >  create mode 100644 drivers/common/sff_module/sff_common.c
> >  create mode 100644 drivers/common/sff_module/sff_common.h
> >  create mode 100644 drivers/common/sff_module/sff_telemetry.c
> >  create mode 100644 drivers/common/sff_module/sff_telemetry.h
> >  create mode 100644 drivers/common/sff_module/version.map
> >
> Is this is whole new driver just to provide telemetry dumps of SFP
> information? I can understand the problem somewhat - though I am in some
> doubt that telemetry is the best way to expose this information - but
> creating a new driver seems the wrong approach here. SFPs are for NIC
> devices, so why isn't this available in a common API such as ethdev?
> 

I have considered adding this function as a new telemetry command of ethdev (like 
'/ethdev/sff_module_info') to dump this SFP information.
But I'm not sure if it's acceptable to add all this production code 
(sff_8xxx.c) into lib/ethdev.
If it's OK, I can make V3 patches to change it into a telemetry command of ethdev.
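
For illustration, a minimal sketch of how such a command could be registered
with the telemetry library; the command name '/ethdev/module_eeprom' and the
handler below are hypothetical, not taken from any posted patch:

#include <ctype.h>
#include <stdlib.h>
#include <rte_ethdev.h>
#include <rte_telemetry.h>

/* hypothetical handler: report basic module info for one port */
static int
handle_module_info(const char *cmd __rte_unused, const char *params,
		struct rte_tel_data *d)
{
	struct rte_eth_dev_module_info minfo;
	uint16_t port_id;

	if (params == NULL || !isdigit((unsigned char)*params))
		return -EINVAL;
	port_id = (uint16_t)atoi(params);
	if (rte_eth_dev_get_module_info(port_id, &minfo) != 0)
		return -EIO;

	rte_tel_data_start_dict(d);
	rte_tel_data_add_dict_int(d, "type", minfo.type);
	rte_tel_data_add_dict_int(d, "eeprom_len", minfo.eeprom_len);
	return 0;
}

RTE_INIT(module_info_telemetry_init)
{
	rte_telemetry_register_cmd("/ethdev/module_eeprom", handle_module_info,
		"Returns module info for a port. Parameters: unsigned port id");
}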

> /Bruce


Re: [PATCH v2] common/sff_module: add telemetry command to dump module EEPROM

2022-04-08 Thread Bruce Richardson
On Fri, Apr 08, 2022 at 11:55:07AM +0100, Zhang, RobinX wrote:
> Hi Bruce,
> 
> > -Original Message-
> > From: Richardson, Bruce 
> > Sent: Friday, April 8, 2022 6:33 PM
> > To: Zhang, RobinX 
> > Cc: dev@dpdk.org; Yang, Qiming ; Zhang, Qi Z
> > ; Yang, SteveX 
> > Subject: Re: [PATCH v2] common/sff_module: add telemetry command to
> > dump module EEPROM
> >
> > On Fri, Apr 08, 2022 at 10:23:30AM +, Robin Zhang wrote:
> > > This patch introduce a new telemetry command '/sff_module/info'
> > > to dump format module EEPROM information.
> > >
> > > The format support for SFP(Small Formfactor Pluggable)/SFP+
> > > /QSFP+(Quad Small Formfactor Pluggable)/QSFP28 modules based on
> > > SFF(Small Form Factor) Committee specifications
> > > SFF-8079/SFF-8472/SFF-8024/SFF-8636.
> > >
> > > Signed-off-by: Robin Zhang 
> > > ---
> > >
> > > v2:
> > > - Redesign the dump function as a telemetry command, so that the
> > EEPROM
> > >   information can be used by other app.
> > >
> > > - The usage like this:
> > >
> > >   Launch the primary application with telemetry:
> > >   Take testpmd as example: ./app/dpdk-testpmd
> > >
> > >   Then launch the telemetry client script:
> > >   ./usertools/dpdk-telemetry.py
> > >
> > >   In telemetry client run command:
> > >   --> /sff_module/info,
> > >
> > >   Both primary application and telemetry client will show the formated
> > >   module EEPROM information.
> > >
> > >  drivers/common/meson.build|1 +
> > >  drivers/common/sff_module/meson.build |   16 +
> > >  drivers/common/sff_module/sff_8079.c  |  672 ++
> > >  drivers/common/sff_module/sff_8472.c  |  301 ++
> > >  drivers/common/sff_module/sff_8636.c  | 1004
> > +
> > >  drivers/common/sff_module/sff_8636.h  |  592 
> > >  drivers/common/sff_module/sff_common.c|  415 +
> > >  drivers/common/sff_module/sff_common.h|  192 
> > >  drivers/common/sff_module/sff_telemetry.c |  142 +++
> > >  drivers/common/sff_module/sff_telemetry.h |   41 +
> > >  drivers/common/sff_module/version.map |9 +
> > >  11 files changed, 3385 insertions(+)
> > >  create mode 100644 drivers/common/sff_module/meson.build
> > >  create mode 100644 drivers/common/sff_module/sff_8079.c
> > >  create mode 100644 drivers/common/sff_module/sff_8472.c
> > >  create mode 100644 drivers/common/sff_module/sff_8636.c
> > >  create mode 100644 drivers/common/sff_module/sff_8636.h
> > >  create mode 100644 drivers/common/sff_module/sff_common.c
> > >  create mode 100644 drivers/common/sff_module/sff_common.h
> > >  create mode 100644 drivers/common/sff_module/sff_telemetry.c
> > >  create mode 100644 drivers/common/sff_module/sff_telemetry.h
> > >  create mode 100644 drivers/common/sff_module/version.map
> > >
> > Is this is whole new driver just to provide telemetry dumps of SFP
> > information? I can understand the problem somewhat - though I am in some
> > doubt that telemetry is the best way to expose this information - but
> > creating a new driver seems the wrong approach here. SFPs are for NIC
> > devices, so why isn't this available in a common API such as ethdev?
> >
> 
> I have considered add this function as a new telemetry command of ethdev 
> (like '/ethdev/sff_module_info') to dump these SFP information.
> But I'm not sure if it's acceptable to add all these production code 
> (sff_8xxx.c) into lib/ethdev?
> If it's OK, I can make V3 patches to change it as a telemetry command of 
> ethdev.
> 

Hi,

I think some discussion is needed before you go preparing a new version of
this patchset.

Some initial questions:

1. Does SFF code apply only to Intel products/NICs or is it multi-vendor?
2. For the driver approach you previously took, how was the presence of
   hardware detected to load the driver? 
3. Does this work on SFPs need to interact with the NIC drivers in any way?

Thanks,
/Bruce


[PATCH] net/ice: optimize max queue number calculation

2022-04-08 Thread Qi Zhang
Remove the limitation that the max queue pair number must be 2^n.
With this patch, even on an 8-port device, the max queue pair
number increases from 128 to 254.

Signed-off-by: Qi Zhang 
---
 drivers/net/ice/ice_ethdev.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 73e550f5fb..03e6ed97fc 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -819,10 +819,26 @@ ice_vsi_config_tc_queue_mapping(struct ice_vsi *vsi,
return -ENOTSUP;
}
 
-   vsi->nb_qps = RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC);
-   fls = (vsi->nb_qps == 0) ? 0 : rte_fls_u32(vsi->nb_qps) - 1;
-   /* Adjust the queue number to actual queues that can be applied */
-   vsi->nb_qps = (vsi->nb_qps == 0) ? 0 : 0x1 << fls;
+   /* vector 0 is reserved and 1 vector for ctrl vsi */
+   if (vsi->adapter->hw.func_caps.common_cap.num_msix_vectors < 2)
+   vsi->nb_qps = 0;
+   else
+   vsi->nb_qps = RTE_MIN(
+   
(uint16_t)vsi->adapter->hw.func_caps.common_cap.num_msix_vectors - 2,
+   RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC));
+
+   /* nb_qps(hex)  -> fls */
+   /* 0000 -> 0 */
+   /* 0001 -> 0 */
+   /* 0002 -> 1 */
+   /* 0003 ~ 0004  -> 2 */
+   /* 0005 ~ 0008  -> 3 */
+   /* 0009 ~ 0010  -> 4 */
+   /* 0011 ~ 0020  -> 5 */
+   /* 0021 ~ 0040  -> 6 */
+   /* 0041 ~ 0080  -> 7 */
+   /* 0081 ~ 0100  -> 8 */
+   fls = (vsi->nb_qps == 0) ? 0 : rte_fls_u32(vsi->nb_qps - 1);
 
qp_idx = 0;
/* Set tc and queue mapping with VSI */
-- 
2.26.2
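
As a side note for readers, the new computation effectively rounds the
queue-mapping size up to the next power of two while leaving nb_qps itself
unconstrained; a small sketch of that relationship (using rte_fls_u32() from
rte_common.h), based on my reading of the patch and not part of it:

#include <rte_common.h>

/* size of the per-TC queue mapping implied by the new fls computation */
static inline uint32_t
qmap_size(uint16_t nb_qps)
{
	uint32_t fls = (nb_qps == 0) ? 0 : rte_fls_u32(nb_qps - 1);

	return 1u << fls; /* e.g. nb_qps = 5..8 -> 8, nb_qps = 254 -> 256 */
}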



[PATCH v2] net/ice: optimize max queue number calculation

2022-04-08 Thread Qi Zhang
Remove the limitation that the max queue pair number must be 2^n.
With this patch, even on an 8-port device, the max queue pair
number increases from 128 to 254.

Signed-off-by: Qi Zhang 
---

v2:
- fix check patch warning

 drivers/net/ice/ice_ethdev.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 73e550f5fb..ff2b3e45d9 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -819,10 +819,26 @@ ice_vsi_config_tc_queue_mapping(struct ice_vsi *vsi,
return -ENOTSUP;
}
 
-   vsi->nb_qps = RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC);
-   fls = (vsi->nb_qps == 0) ? 0 : rte_fls_u32(vsi->nb_qps) - 1;
-   /* Adjust the queue number to actual queues that can be applied */
-   vsi->nb_qps = (vsi->nb_qps == 0) ? 0 : 0x1 << fls;
+   /* vector 0 is reserved and 1 vector for ctrl vsi */
+   if (vsi->adapter->hw.func_caps.common_cap.num_msix_vectors < 2)
+   vsi->nb_qps = 0;
+   else
+   vsi->nb_qps = RTE_MIN
+   
((uint16_t)vsi->adapter->hw.func_caps.common_cap.num_msix_vectors - 2,
+   RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC));
+
+   /* nb_qps(hex)  -> fls */
+   /* 0000 -> 0 */
+   /* 0001 -> 0 */
+   /* 0002 -> 1 */
+   /* 0003 ~ 0004  -> 2 */
+   /* 0005 ~ 0008  -> 3 */
+   /* 0009 ~ 0010  -> 4 */
+   /* 0011 ~ 0020  -> 5 */
+   /* 0021 ~ 0040  -> 6 */
+   /* 0041 ~ 0080  -> 7 */
+   /* 0081 ~ 0100  -> 8 */
+   fls = (vsi->nb_qps == 0) ? 0 : rte_fls_u32(vsi->nb_qps - 1);
 
qp_idx = 0;
/* Set tc and queue mapping with VSI */
-- 
2.26.2



RE: [PATCH v2] common/sff_module: add telemetry command to dump module EEPROM

2022-04-08 Thread Zhang, RobinX
Hi Bruce

> -Original Message-
> From: Richardson, Bruce 
> Sent: Friday, April 8, 2022 7:01 PM
> To: Zhang, RobinX 
> Cc: dev@dpdk.org; Yang, Qiming ; Zhang, Qi Z
> ; Yang, SteveX 
> Subject: Re: [PATCH v2] common/sff_module: add telemetry command to
> dump module EEPROM
> 
> On Fri, Apr 08, 2022 at 11:55:07AM +0100, Zhang, RobinX wrote:
> > Hi Bruce,
> >
> > > -Original Message-
> > > From: Richardson, Bruce 
> > > Sent: Friday, April 8, 2022 6:33 PM
> > > To: Zhang, RobinX 
> > > Cc: dev@dpdk.org; Yang, Qiming ; Zhang, Qi Z
> > > ; Yang, SteveX 
> > > Subject: Re: [PATCH v2] common/sff_module: add telemetry command
> to
> > > dump module EEPROM
> > >
> > > On Fri, Apr 08, 2022 at 10:23:30AM +, Robin Zhang wrote:
> > > > This patch introduce a new telemetry command '/sff_module/info'
> > > > to dump format module EEPROM information.
> > > >
> > > > The format support for SFP(Small Formfactor Pluggable)/SFP+
> > > > /QSFP+(Quad Small Formfactor Pluggable)/QSFP28 modules based on
> > > > SFF(Small Form Factor) Committee specifications
> > > > SFF-8079/SFF-8472/SFF-8024/SFF-8636.
> > > >
> > > > Signed-off-by: Robin Zhang 
> > > > ---
> > > >
> > > > v2:
> > > > - Redesign the dump function as a telemetry command, so that the
> > > EEPROM
> > > >   information can be used by other app.
> > > >
> > > > - The usage like this:
> > > >
> > > >   Launch the primary application with telemetry:
> > > >   Take testpmd as example: ./app/dpdk-testpmd
> > > >
> > > >   Then launch the telemetry client script:
> > > >   ./usertools/dpdk-telemetry.py
> > > >
> > > >   In telemetry client run command:
> > > >   --> /sff_module/info,
> > > >
> > > >   Both primary application and telemetry client will show the formated
> > > >   module EEPROM information.
> > > >
> > > >  drivers/common/meson.build|1 +
> > > >  drivers/common/sff_module/meson.build |   16 +
> > > >  drivers/common/sff_module/sff_8079.c  |  672 ++
> > > >  drivers/common/sff_module/sff_8472.c  |  301 ++
> > > >  drivers/common/sff_module/sff_8636.c  | 1004
> > > +
> > > >  drivers/common/sff_module/sff_8636.h  |  592 
> > > >  drivers/common/sff_module/sff_common.c|  415 +
> > > >  drivers/common/sff_module/sff_common.h|  192 
> > > >  drivers/common/sff_module/sff_telemetry.c |  142 +++
> > > >  drivers/common/sff_module/sff_telemetry.h |   41 +
> > > >  drivers/common/sff_module/version.map |9 +
> > > >  11 files changed, 3385 insertions(+)  create mode 100644
> > > > drivers/common/sff_module/meson.build
> > > >  create mode 100644 drivers/common/sff_module/sff_8079.c
> > > >  create mode 100644 drivers/common/sff_module/sff_8472.c
> > > >  create mode 100644 drivers/common/sff_module/sff_8636.c
> > > >  create mode 100644 drivers/common/sff_module/sff_8636.h
> > > >  create mode 100644 drivers/common/sff_module/sff_common.c
> > > >  create mode 100644 drivers/common/sff_module/sff_common.h
> > > >  create mode 100644 drivers/common/sff_module/sff_telemetry.c
> > > >  create mode 100644 drivers/common/sff_module/sff_telemetry.h
> > > >  create mode 100644 drivers/common/sff_module/version.map
> > > >
> > > Is this is whole new driver just to provide telemetry dumps of SFP
> > > information? I can understand the problem somewhat - though I am in
> > > some doubt that telemetry is the best way to expose this information
> > > - but creating a new driver seems the wrong approach here. SFPs are
> > > for NIC devices, so why isn't this available in a common API such as
> ethdev?
> > >
> >
> > I have considered add this function as a new telemetry command of
> ethdev (like '/ethdev/sff_module_info') to dump these SFP information.
> > But I'm not sure if it's acceptable to add all these production code
> (sff_8xxx.c) into lib/ethdev?
> > If it's OK, I can make V3 patches to change it as a telemetry command of
> ethdev.
> >
> 
> Hi,
> 
> I think some discussion is needed before you go preparing a new version of
> this patchset.
> 
> Some initial questions:
> 
> 1. Does SFF code apply only to Intel products/NICs or is it multi-vendor?
The SFF code applies to multiple vendors.
In fact, it applies to any NIC driver that implements
dev_ops->get_module_eeprom.

> 2. For the driver approach you previously took, how was the presence of
>hardware detected to load the driver?
The purpose of putting this production code into drivers/common is to treat 
it as a common function for NIC drivers.
It is not tied to the presence of any particular hardware.

> 3. Does this work on SFPs need to interact with the NIC drivers in any way?
> 
Yes, as in my answer to question 1, the module EEPROM raw data is obtained from 
dev_ops->get_module_eeprom.
So the NIC drivers need to implement dev_ops->get_module_eeprom.
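
For reference, a sketch of how an application (or a telemetry handler) can pull
the raw EEPROM bytes through the existing ethdev API; the helper below is
illustrative only:

#include <string.h>
#include <rte_common.h>
#include <rte_ethdev.h>

static int
read_module_eeprom(uint16_t port_id, uint8_t *buf, uint32_t buf_len)
{
	struct rte_eth_dev_module_info minfo;
	struct rte_dev_eeprom_info einfo;
	int ret;

	ret = rte_eth_dev_get_module_info(port_id, &minfo);
	if (ret != 0)
		return ret;

	memset(&einfo, 0, sizeof(einfo));
	einfo.offset = 0;
	einfo.length = RTE_MIN(buf_len, minfo.eeprom_len);
	einfo.data = buf;

	/* raw bytes are then decoded per SFF-8079/8472/8636 */
	return rte_eth_dev_get_module_eeprom(port_id, &einfo);
}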

> Thanks,
> /Bruce


Re: [PATCH v2] common/sff_module: add telemetry command to dump module EEPROM

2022-04-08 Thread Bruce Richardson
On Fri, Apr 08, 2022 at 12:20:23PM +0100, Zhang, RobinX wrote:
> Hi Bruce
> 
> > -Original Message-
> > From: Richardson, Bruce 
> > Sent: Friday, April 8, 2022 7:01 PM
> > To: Zhang, RobinX 
> > Cc: dev@dpdk.org; Yang, Qiming ; Zhang, Qi Z
> > ; Yang, SteveX 
> > Subject: Re: [PATCH v2] common/sff_module: add telemetry command to
> > dump module EEPROM
> >
> > On Fri, Apr 08, 2022 at 11:55:07AM +0100, Zhang, RobinX wrote:
> > > Hi Bruce,
> > >
> > > > -Original Message-
> > > > From: Richardson, Bruce 
> > > > Sent: Friday, April 8, 2022 6:33 PM
> > > > To: Zhang, RobinX 
> > > > Cc: dev@dpdk.org; Yang, Qiming ; Zhang, Qi Z
> > > > ; Yang, SteveX 
> > > > Subject: Re: [PATCH v2] common/sff_module: add telemetry command
> > to
> > > > dump module EEPROM
> > > >
> > > > On Fri, Apr 08, 2022 at 10:23:30AM +, Robin Zhang wrote:
> > > > > This patch introduce a new telemetry command '/sff_module/info'
> > > > > to dump format module EEPROM information.
> > > > >
> > > > > The format support for SFP(Small Formfactor Pluggable)/SFP+
> > > > > /QSFP+(Quad Small Formfactor Pluggable)/QSFP28 modules based on
> > > > > SFF(Small Form Factor) Committee specifications
> > > > > SFF-8079/SFF-8472/SFF-8024/SFF-8636.
> > > > >
> > > > > Signed-off-by: Robin Zhang 
> > > > > ---
> > > > >
> > > > > v2:
> > > > > - Redesign the dump function as a telemetry command, so that the
> > > > EEPROM
> > > > >   information can be used by other app.
> > > > >
> > > > > - The usage like this:
> > > > >
> > > > >   Launch the primary application with telemetry:
> > > > >   Take testpmd as example: ./app/dpdk-testpmd
> > > > >
> > > > >   Then launch the telemetry client script:
> > > > >   ./usertools/dpdk-telemetry.py
> > > > >
> > > > >   In telemetry client run command:
> > > > >   --> /sff_module/info,
> > > > >
> > > > >   Both primary application and telemetry client will show the formated
> > > > >   module EEPROM information.
> > > > >
> > > > >  drivers/common/meson.build|1 +
> > > > >  drivers/common/sff_module/meson.build |   16 +
> > > > >  drivers/common/sff_module/sff_8079.c  |  672 ++
> > > > >  drivers/common/sff_module/sff_8472.c  |  301 ++
> > > > >  drivers/common/sff_module/sff_8636.c  | 1004
> > > > +
> > > > >  drivers/common/sff_module/sff_8636.h  |  592 
> > > > >  drivers/common/sff_module/sff_common.c|  415 +
> > > > >  drivers/common/sff_module/sff_common.h|  192 
> > > > >  drivers/common/sff_module/sff_telemetry.c |  142 +++
> > > > >  drivers/common/sff_module/sff_telemetry.h |   41 +
> > > > >  drivers/common/sff_module/version.map |9 +
> > > > >  11 files changed, 3385 insertions(+)  create mode 100644
> > > > > drivers/common/sff_module/meson.build
> > > > >  create mode 100644 drivers/common/sff_module/sff_8079.c
> > > > >  create mode 100644 drivers/common/sff_module/sff_8472.c
> > > > >  create mode 100644 drivers/common/sff_module/sff_8636.c
> > > > >  create mode 100644 drivers/common/sff_module/sff_8636.h
> > > > >  create mode 100644 drivers/common/sff_module/sff_common.c
> > > > >  create mode 100644 drivers/common/sff_module/sff_common.h
> > > > >  create mode 100644 drivers/common/sff_module/sff_telemetry.c
> > > > >  create mode 100644 drivers/common/sff_module/sff_telemetry.h
> > > > >  create mode 100644 drivers/common/sff_module/version.map
> > > > >
> > > > Is this is whole new driver just to provide telemetry dumps of SFP
> > > > information? I can understand the problem somewhat - though I am in
> > > > some doubt that telemetry is the best way to expose this information
> > > > - but creating a new driver seems the wrong approach here. SFPs are
> > > > for NIC devices, so why isn't this available in a common API such as
> > ethdev?
> > > >
> > >
> > > I have considered add this function as a new telemetry command of
> > ethdev (like '/ethdev/sff_module_info') to dump these SFP information.
> > > But I'm not sure if it's acceptable to add all these production code
> > (sff_8xxx.c) into lib/ethdev?
> > > If it's OK, I can make V3 patches to change it as a telemetry command of
> > ethdev.
> > >
> >
> > Hi,
> >
> > I think some discussion is needed before you go preparing a new version of
> > this patchset.
> >
> > Some initial questions:
> >
> > 1. Does SFF code apply only to Intel products/NICs or is it multi-vendor?
> The SFF code apply to multi-vendor.
> In fact, it's applied to all the NIC driver which implemented 
> dev_ops->get_module_eeprom.
> 
> > 2. For the driver approach you previously took, how was the presence of
> >hardware detected to load the driver?
> The purpose of put these production code into drivers/common is want to treat 
> it as a common function for NIC drivers.
> It will not related to any presence of hardware.
> 
> > 3. Does this work on SFPs need to interact with the NIC drivers in any way?
> >
> Yes, just like my answer

[PATCH] crypto/qat: add curve25519 and curve448 functions

2022-04-08 Thread Arek Kusztal
This commit adds qat functions for curve25519 and
curve448.

Signed-off-by: Arek Kusztal 
---
 .../common/qat/qat_adf/icp_qat_fw_mmp_ids.h   | 74 +++
 1 file changed, 74 insertions(+)

diff --git a/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h 
b/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h
index 00813cffb9..7280fe9459 100644
--- a/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h
+++ b/drivers/common/qat/qat_adf/icp_qat_fw_mmp_ids.h
@@ -1523,6 +1523,80 @@ icp_qat_fw_mmp_ecdsa_verify_gfp_521_input::in in @endlink
  * icp_qat_fw_mmp_kpt_ecdsa_sign_rs_gfp_521_output::r r @endlink @link
  * icp_qat_fw_mmp_kpt_ecdsa_sign_rs_gfp_521_output::s s @endlink
  */
+#define POINT_MULTIPLICATION_C25519 0x0a0634c6
+/**< Functionality ID for ECC curve25519 Variable Point Multiplication [k]P(x),
+ * as specified in RFC7748
+ * @li 2 input parameters : @link
+ * icp_qat_fw_point_multiplication_c25519_input_s::xp xp @endlink @link
+ * icp_qat_fw_point_multiplication_c25519_input_s::k k @endlink
+ * @li 1 output parameters : @link
+ * icp_qat_fw_point_multiplication_c25519_output_s::xr xr @endlink
+ */
+#define GENERATOR_MULTIPLICATION_C25519 0x0a0634d6
+/**< Functionality ID for ECC curve25519 Generator Point Multiplication 
[k]G(x),
+ * as specified in RFC7748
+ * @li 1 input parameters : @link
+ * icp_qat_fw_generator_multiplication_c25519_input_s::k k @endlink
+ * @li 1 output parameters : @link
+ * icp_qat_fw_generator_multiplication_c25519_output_s::xr xr @endlink
+ */
+#define POINT_MULTIPLICATION_ED25519 0x100b34e6
+/**< Functionality ID for ECC edwards25519 Variable Point Multiplication [k]P,
+ * as specified in RFC8032
+ * @li 3 input parameters : @link
+ * icp_qat_fw_point_multiplication_ed25519_input_s::xp xp @endlink @link
+ * icp_qat_fw_point_multiplication_ed25519_input_s::yp yp @endlink @link
+ * icp_qat_fw_point_multiplication_ed25519_input_s::k k @endlink
+ * @li 2 output parameters : @link
+ * icp_qat_fw_point_multiplication_ed25519_output_s::xr xr @endlink @link
+ * icp_qat_fw_point_multiplication_ed25519_output_s::yr yr @endlink
+ */
+#define GENERATOR_MULTIPLICATION_ED25519 0x100a34f6
+/**< Functionality ID for ECC edwards25519 Generator Point Multiplication [k]G,
+ * as specified in RFC8032
+ * @li 1 input parameters : @link
+ * icp_qat_fw_generator_multiplication_ed25519_input_s::k k @endlink
+ * @li 2 output parameters : @link
+ * icp_qat_fw_generator_multiplication_ed25519_output_s::xr xr @endlink @link
+ * icp_qat_fw_generator_multiplication_ed25519_output_s::yr yr @endlink
+ */
+#define POINT_MULTIPLICATION_C448 0x0c063506
+/**< Functionality ID for ECC curve448 Variable Point Multiplication [k]P(x), 
as
+ * specified in RFC7748
+ * @li 2 input parameters : @link
+ * icp_qat_fw_point_multiplication_c448_input_s::xp xp @endlink @link
+ * icp_qat_fw_point_multiplication_c448_input_s::k k @endlink
+ * @li 1 output parameters : @link
+ * icp_qat_fw_point_multiplication_c448_output_s::xr xr @endlink
+ */
+#define GENERATOR_MULTIPLICATION_C448 0x0c063516
+/**< Functionality ID for ECC curve448 Generator Point Multiplication [k]G(x),
+ * as specified in RFC7748
+ * @li 1 input parameters : @link
+ * icp_qat_fw_generator_multiplication_c448_input_s::k k @endlink
+ * @li 1 output parameters : @link
+ * icp_qat_fw_generator_multiplication_c448_output_s::xr xr @endlink
+ */
+#define POINT_MULTIPLICATION_ED448 0x1a0b3526
+/**< Functionality ID for ECC edwards448 Variable Point Multiplication [k]P, as
+ * specified in RFC8032
+ * @li 3 input parameters : @link
+ * icp_qat_fw_point_multiplication_ed448_input_s::xp xp @endlink @link
+ * icp_qat_fw_point_multiplication_ed448_input_s::yp yp @endlink @link
+ * icp_qat_fw_point_multiplication_ed448_input_s::k k @endlink
+ * @li 2 output parameters : @link
+ * icp_qat_fw_point_multiplication_ed448_output_s::xr xr @endlink @link
+ * icp_qat_fw_point_multiplication_ed448_output_s::yr yr @endlink
+ */
+#define GENERATOR_MULTIPLICATION_ED448 0x1a0a3536
+/**< Functionality ID for ECC edwards448 Generator Point Multiplication [k]P, 
as
+ * specified in RFC8032
+ * @li 1 input parameters : @link
+ * icp_qat_fw_generator_multiplication_ed448_input_s::k k @endlink
+ * @li 2 output parameters : @link
+ * icp_qat_fw_generator_multiplication_ed448_output_s::xr xr @endlink @link
+ * icp_qat_fw_generator_multiplication_ed448_output_s::yr yr @endlink
+ */
 
 #define PKE_LIVENESS 0x0001
 /**< Functionality ID for PKE_LIVENESS
-- 
2.30.2



RE: [dpdk][PATCH 1/2] sched: enable/disable TC OV at runtime

2022-04-08 Thread Singh, Jasvinder



> -Original Message-
> From: Marcin Danilewicz 
> Sent: Thursday, April 7, 2022 3:52 PM
> To: dev@dpdk.org
> Cc: Ajmera, Megha 
> Subject: [dpdk][PATCH 1/2] sched: enable/disable TC OV at runtime
> 
> From: Megha Ajmera 
> 
> Added new API to enable or disable TC over subscription for best effort
> traffic class at subport level.
> 
> By default TC OV is disabled for subport.
> 
> Signed-off-by: Megha Ajmera 
> 
> diff --git a/lib/sched/rte_sched.c b/lib/sched/rte_sched.c index
> ec74bee939..1d05089d00 100644
> --- a/lib/sched/rte_sched.c
> +++ b/lib/sched/rte_sched.c
> @@ -155,6 +155,7 @@ struct rte_sched_subport {
>   uint64_t tc_credits[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
> 
>   /* TC oversubscription */
> + uint8_t is_tc_ov_enabled;
>   uint64_t tc_ov_wm;
>   uint64_t tc_ov_wm_min;
>   uint64_t tc_ov_wm_max;
> @@ -1165,6 +1166,45 @@ rte_sched_cman_config(struct rte_sched_port
> *port,  }  #endif
> 
> +int
> +rte_sched_subport_tc_ov_config(struct rte_sched_port *port,
> + uint32_t subport_id,
> + bool tc_ov_enable)
> +{
> + struct rte_sched_subport *s;
> + struct rte_sched_subport_profile *profile;
> +
> + if (port == NULL) {
> + RTE_LOG(ERR, SCHED,
> + "%s: Incorrect value for parameter port\n",
> __func__);
> + return -EINVAL;
> + }
> +
> + if (subport_id >= port->n_subports_per_port) {
> + RTE_LOG(ERR, SCHED,
> + "%s: Incorrect value for parameter subport id\n",
> __func__);
> + return  -EINVAL;
> + }
> +
> + s = port->subports[subport_id];
> + s->is_tc_ov_enabled = tc_ov_enable;
> +
> + if (s->is_tc_ov_enabled) {
> + /* TC oversubscription */
> + s->tc_ov_wm_min = port->mtu;
> + s->tc_ov_period_id = 0;
> + s->tc_ov = 0;
> + s->tc_ov_n = 0;
> + s->tc_ov_rate = 0;
> +
> + profile = port->subport_profiles + s->profile;
> + s->tc_ov_wm_max = rte_sched_time_ms_to_bytes(profile-
> >tc_period,
> + s->pipe_tc_be_rate_max);
> + s->tc_ov_wm = s->tc_ov_wm_max;
> + }
> + return 0;
> +}


This API should be invoked immediately after the subport config function, because 
during pipe configuration the subport tc_ov parameters are updated based on the 
pipe best-effort TC parameters. Given this constraint, wouldn't it be better to add a 
tc_ov enable/disable flag to the subport params instead of adding a new API? 
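
For clarity, the call order being suggested would look roughly like the sketch
below; rte_sched_subport_tc_ov_config() is the API name proposed in this patch,
and the helper itself is illustrative:

#include <rte_sched.h>

/* suggested order: subport config, then TC OV enable, then pipe config */
static int
setup_subport(struct rte_sched_port *port, uint32_t subport_id,
		struct rte_sched_subport_params *sp, uint32_t subport_profile_id,
		uint32_t n_pipes, int32_t pipe_profile_id)
{
	uint32_t pipe_id;
	int ret;

	ret = rte_sched_subport_config(port, subport_id, sp, subport_profile_id);
	if (ret != 0)
		return ret;

	ret = rte_sched_subport_tc_ov_config(port, subport_id, true);
	if (ret != 0)
		return ret;

	for (pipe_id = 0; pipe_id < n_pipes; pipe_id++) {
		ret = rte_sched_pipe_config(port, subport_id, pipe_id,
				pipe_profile_id);
		if (ret != 0)
			return ret;
	}
	return 0;
}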


> +
>  int
>  rte_sched_subport_config(struct rte_sched_port *port,
>   uint32_t subport_id,
> @@ -1317,12 +1357,8 @@ rte_sched_subport_config(struct rte_sched_port
> *port,
>   for (i = 0; i < RTE_SCHED_PORT_N_GRINDERS; i++)
>   s->grinder_base_bmp_pos[i] =
> RTE_SCHED_PIPE_INVALID;
> 
> - /* TC oversubscription */
> - s->tc_ov_wm_min = port->mtu;
> - s->tc_ov_period_id = 0;
> - s->tc_ov = 0;
> - s->tc_ov_n = 0;
> - s->tc_ov_rate = 0;
> + /* TC over-subscription is disabled by default */
> + s->is_tc_ov_enabled = 0;
>   }
> 
>   {
> @@ -1342,9 +1378,6 @@ rte_sched_subport_config(struct rte_sched_port
> *port,
>   else
>   profile->tc_credits_per_period[i] = 0;
> 
> - s->tc_ov_wm_max = rte_sched_time_ms_to_bytes(profile-
> >tc_period,
> - s-
> >pipe_tc_be_rate_max);
> - s->tc_ov_wm = s->tc_ov_wm_max;
>   s->profile = subport_profile_id;
> 
>   }
> @@ -1417,17 +1450,20 @@ rte_sched_pipe_config(struct rte_sched_port
> *port,
>   double pipe_tc_be_rate =
>   (double) params-
> >tc_credits_per_period[RTE_SCHED_TRAFFIC_CLASS_BE]
>   / (double) params->tc_period;
> - uint32_t tc_be_ov = s->tc_ov;
> 
> - /* Unplug pipe from its subport */
> - s->tc_ov_n -= params->tc_ov_weight;
> - s->tc_ov_rate -= pipe_tc_be_rate;
> - s->tc_ov = s->tc_ov_rate > subport_tc_be_rate;
> + if (s->is_tc_ov_enabled) {
> + uint32_t tc_be_ov = s->tc_ov;
> 
> - if (s->tc_ov != tc_be_ov) {
> - RTE_LOG(DEBUG, SCHED,
> - "Subport %u Best-effort TC oversubscription
> is OFF (%.4lf >= %.4lf)\n",
> - subport_id, subport_tc_be_rate, s-
> >tc_ov_rate);
> + /* Unplug pipe from its subport */
> + s->tc_ov_n -= params->tc_ov_weight;
> + s->tc_ov_rate -= pipe_tc_be_rate;
> + s->tc_ov = s->tc_ov_rate > subport_tc_be_rate;
> +
> + if (s->tc_ov != tc_be_ov) {
> + RTE_LOG(DEBUG, SCHED,
> + "Subport %u B

[PATCH v1] ring: correct the comment and figure description

2022-04-08 Thread Haiyue Wang
The index description isn't right; correct it as the Programmer's Guide
says.

Also correct the guide's figure description about 'Dequeue First Step'.

Signed-off-by: Haiyue Wang 
---
 doc/guides/prog_guide/ring_lib.rst | 2 +-
 lib/ring/rte_ring_core.h   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/guides/prog_guide/ring_lib.rst 
b/doc/guides/prog_guide/ring_lib.rst
index 54e0bb4b68..515a715266 100644
--- a/doc/guides/prog_guide/ring_lib.rst
+++ b/doc/guides/prog_guide/ring_lib.rst
@@ -172,7 +172,7 @@ If there are not enough objects in the ring (this is 
detected by checking prod_t
 
 .. figure:: img/ring-dequeue1.*
 
-   Dequeue last step
+   Dequeue first step
 
 
 Dequeue Second Step
diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 1252ca9546..82b237091b 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -111,8 +111,8 @@ struct rte_ring_hts_headtail {
  * An RTE ring structure.
  *
  * The producer and the consumer have a head and a tail index. The 
particularity
- * of these index is that they are not between 0 and size(ring). These indexes
- * are between 0 and 2^32, and we mask their value when we access the ring[]
+ * of these index is that they are not between 0 and size(ring)-1. These 
indexes
+ * are between 0 and 2^32 -1, and we mask their value when we access the ring[]
  * field. Thanks to this assumption, we can do subtractions between 2 index
  * values in a modulo-32bit base: that's why the overflow of the indexes is not
  * a problem.
-- 
2.35.1
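
A minimal standalone illustration of the modulo-2^32 property the corrected
comment describes (not part of the patch):

#include <stdint.h>
#include <assert.h>

int main(void)
{
	uint32_t prod_tail = 5;                   /* producer wrapped past 2^32 - 1 */
	uint32_t cons_head = UINT32_MAX - 2;
	uint32_t entries = prod_tail - cons_head; /* modulo-2^32 subtraction */

	assert(entries == 8);                     /* wrap-around is harmless */
	return 0;
}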



Re: [PATCH v3] eal: add seqlock

2022-04-08 Thread Mattias Rönnblom

On 2022-04-03 19:37, Honnappa Nagarahalli wrote:






+ * Example usage:
+ * @code{.c}
+ * #define MAX_Y_LEN (16)
+ * // Application-defined example data structure, protected by a seqlock.
+ * struct config {
+ * rte_seqlock_t lock;
+ * int param_x;
+ * char param_y[MAX_Y_LEN];
+ * };
+ *
+ * // Accessor function for reading config fields.
+ * void
+ * config_read(const struct config *config, int *param_x, char
+*param_y)
+ * {
+ * // Temporary variables, just to improve readability.

I think the above comment is not necessary. It is beneficial to copy the

protected data to keep the read side critical section small.




The data here would be copied into the buffers supplied by config_read()
anyways, so it's a copy regardless.

I see what you mean here. I would think the local variables add confusion; the 
copy can happen to the passed parameters directly. I will leave it to you to 
decide.



I'll remove the temp variables.




+ * int tentative_x;
+ * char tentative_y[MAX_Y_LEN];
+ * uint32_t sn;
+ *
+ * sn = rte_seqlock_read_lock(&config->lock);
+ * do {
+ * // Loads may be atomic or non-atomic, as in this example.
+ * tentative_x = config->param_x;
+ * strcpy(tentative_y, config->param_y);
+ * } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
+ * // An application could skip retrying, and try again later, if
+ * // progress is possible without the data.
+ *
+ * *param_x = tentative_x;
+ * strcpy(param_y, tentative_y);
+ * }
+ *
+ * // Accessor function for writing config fields.
+ * void
+ * config_update(struct config *config, int param_x, const char
+*param_y)
+ * {
+ * rte_seqlock_write_lock(&config->lock);
+ * // Stores may be atomic or non-atomic, as in this example.
+ * config->param_x = param_x;
+ * strcpy(config->param_y, param_y);
+ * rte_seqlock_write_unlock(&config->lock);
+ * }
+ * @endcode
+ *
+ * @see
+ * https://en.wikipedia.org/wiki/Seqlock.
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+/**
+ * The RTE seqlock type.
+ */
+typedef struct {
+   uint32_t sn; /**< A sequence number for the protected data. */
+   rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */ }

Suggest using ticket lock for the writer side. It should have low overhead

when there is a single writer, but provides better functionality when there are
multiple writers.




Is a seqlock the synchronization primitive of choice for high-contention cases?
I would say no, but I'm not sure what you would use instead.

I think Stephen has come across some use cases of high-contention writers with 
readers; maybe Stephen can provide some input.

IMO, there are no harm/perf issues in using a ticket lock.



OK. I will leave it as a spinlock for now (PATCH v4).
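
For reference, the reader-side accessor from the example, rewritten without the
temporary variables as agreed above; struct config and MAX_Y_LEN are as in the
quoted example, and the rte_seqlock_*() functions are the ones proposed in this
series:

void
config_read(const struct config *config, int *param_x, char *param_y)
{
	uint32_t sn;

	sn = rte_seqlock_read_lock(&config->lock);
	do {
		/* the copy may be retried, but the final result is consistent */
		*param_x = config->param_x;
		strcpy(param_y, config->param_y);
	} while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
}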


Re: [PATCH 0/3] add eal functions for thread affinity

2022-04-08 Thread Tyler Retzlaff
On Fri, Apr 08, 2022 at 10:57:55AM +0200, David Marchand wrote:
> Hello Tyler,
> 
> On Fri, Apr 1, 2022 at 3:30 PM Tyler Retzlaff
>  wrote:
> >
> > this series provides basic dependencies for additional eal thread api
> > additions. series includes basic error handling, initial get/set thread
> > affinity functions and minimal unit test.
> >
> > Tyler Retzlaff (3):
> >   eal/windows: translate Windows errors to errno-style errors
> >   eal: implement functions for get/set thread affinity
> >   test/threads: add unit test for thread API
> >
> >  app/test/meson.build |   2 +
> >  app/test/test_threads.c  |  86 +++
> >  lib/eal/include/rte_thread.h |  45 ++
> >  lib/eal/unix/rte_thread.c|  16 
> >  lib/eal/version.map  |   4 +
> >  lib/eal/windows/eal_lcore.c  | 173 
> > +++--
> >  lib/eal/windows/eal_windows.h|  10 +++
> >  lib/eal/windows/include/rte_os.h |   2 +
> >  lib/eal/windows/rte_thread.c | 179 
> > ++-
> >  9 files changed, 472 insertions(+), 45 deletions(-)
> >  create mode 100644 app/test/test_threads.c
> 
> We have two concurrent series, can you clarify what are the intentions
> on this work?

yes, i should have clarified this up front sorry.

> Is this series superseding Narcisa series?

this series supersedes the series from Narcisa. it was resolved through
discussion that the current series should be abandoned as it is too
large and not making progress.

we've elected to submit a series of smaller patchsets that incorporate
the feedback received to date and build up the api surface for
threading. the patches are still the work of Narcisa but she is
overscheduled so i will assist in upstreaming and addressing feedback.

additionally, rather than port the tree to the new __experimental api as
they are added we will prefer to add unit tests that provide validation
of the api and example usage.

our hope is the smaller scoped series will attract more attention and
have better acknowledgement velocity.

i will have Narcisa mark the monolithic series as superseded on
patchwork.

ty



Re: [PATCH v7] eal: fix rte_memcpy strict aliasing/alignment bugs

2022-04-08 Thread Luc Pelletier
Hi David,

Le jeu. 7 avr. 2022 à 11:24, David Marchand
 a écrit :

> > As a side note, we probably need to check other similar places in DPDK code.
>
> What would be the best way to detect those problematic places?

As far as I'm aware, there is no silver bullet to detect all strict
aliasing violations. A good summary of the different options are
described here:

https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8#catching-strict-aliasing-violations

However, this is not 100% and might still miss some strict aliasing violations.

The bottom line is that anywhere there's a cast to something other
than void* or char*, it could be a strict aliasing violation. So,
doing an exhaustive search throughout the code base for casts seems
like the only (tedious, time-consuming) solution.
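
As an illustration of the kind of cast to look for (a generic example of the
bug class, not a specific DPDK call site): reading a float's bits through a
uint32_t pointer violates strict aliasing, while memcpy() is the portable fix.

#include <stdint.h>
#include <string.h>

static inline uint32_t
float_bits_bad(const float *f)
{
	return *(const uint32_t *)f; /* undefined behaviour under -fstrict-aliasing */
}

static inline uint32_t
float_bits_ok(const float *f)
{
	uint32_t u;

	memcpy(&u, f, sizeof(u)); /* compilers optimize this to a single load */
	return u;
}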


Re: [PATCH 3/3] test/threads: add unit test for thread API

2022-04-08 Thread Dmitry Kozlyuk
2022-04-01 06:29 (UTC-0700), Tyler Retzlaff:
[...]
> +static int
> +test_thread_affinity(void)
> +{
> + pthread_t id;
> + rte_thread_t thread_id;
> +
> + RTE_TEST_ASSERT(pthread_create(&id, NULL, thread_main, NULL) == 0,
> + "Failed to create thread");
> + thread_id.opaque_id = id;

The need for this hack means that the new API is unusable in practice.
I think functions to get the current thread ID and to compare IDs
must be part of this series for this unit test patch to be accepted.
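
To make that concrete, with a getter for the current thread ID the cast from
pthread_t goes away entirely; rte_thread_self() below is a hypothetical helper,
not part of this series:

/* hypothetical: the spawned thread reports its own rte_thread_t */
static void *
thread_main(void *arg)
{
	rte_thread_t *out = arg;

	*out = rte_thread_self();   /* hypothetical helper */
	return NULL;
}

The test could then compare that value against the creator's view of the ID
(e.g. with an equally hypothetical rte_thread_equal()) instead of copying a
pthread_t into opaque_id.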



Re: [PATCH 2/3] eal: implement functions for get/set thread affinity

2022-04-08 Thread Dmitry Kozlyuk
2022-04-01 06:29 (UTC-0700), Tyler Retzlaff:
> Implement functions for getting/setting thread affinity.
> Threads can be pinned to specific cores by setting their
> affinity attribute.
> 
> Signed-off-by: Narcisa Vasile 
> Signed-off-by: Tyler Retzlaff 

Acked-by: Dmitry Kozlyuk 

Please see some small comments below.

> ---
>  lib/eal/include/rte_thread.h |  45 ++
>  lib/eal/unix/rte_thread.c|  16 
>  lib/eal/version.map  |   4 +
>  lib/eal/windows/eal_lcore.c  | 173 
> +--
>  lib/eal/windows/eal_windows.h|  10 +++
>  lib/eal/windows/include/rte_os.h |   2 +
>  lib/eal/windows/rte_thread.c | 131 -
>  7 files changed, 336 insertions(+), 45 deletions(-)
> 
> diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
> index 8be8ed8..4eb113f 100644
> --- a/lib/eal/include/rte_thread.h
> +++ b/lib/eal/include/rte_thread.h
> @@ -2,6 +2,8 @@
>   * Copyright(c) 2021 Mellanox Technologies, Ltd
>   */
>  
> +#include 
> +
>  #include 
>  #include 
>  
> @@ -21,6 +23,13 @@
>  #endif
>  
>  /**
> + * Thread id descriptor.
> + */
> +typedef struct rte_thread_tag {
> + uintptr_t opaque_id; /**< thread identifier */
> +} rte_thread_t;
> +
> +/**
>   * TLS key type, an opaque pointer.
>   */
>  typedef struct eal_tls_key *rte_thread_key;
> @@ -28,6 +37,42 @@
>  #ifdef RTE_HAS_CPUSET
>  
>  /**

Missing a common part for experimental functions:

 * @warning
 * @b EXPERIMENTAL: this API may change without prior notice.

> + * Set the affinity of thread 'thread_id' to the cpu set
> + * specified by 'cpuset'.
> + *
> + * @param thread_id
> + *Id of the thread for which to set the affinity.
> + *
> + * @param cpuset
> + *   Pointer to CPU affinity to set.
> + *
> + * @return
> + *   On success, return 0.
> + *   On failure, return a positive errno-style error number.
> + */
> +__rte_experimental
> +int rte_thread_set_affinity_by_id(rte_thread_t thread_id,
> + const rte_cpuset_t *cpuset);
> +
> +/**

Same here.

> + * Get the affinity of thread 'thread_id' and store it
> + * in 'cpuset'.
> + *
> + * @param thread_id
> + *Id of the thread for which to get the affinity.
> + *
> + * @param cpuset
> + *   Pointer for storing the affinity value.
> + *
> + * @return
> + *   On success, return 0.
> + *   On failure, return a positive errno-style error number.
> + */
> +__rte_experimental
> +int rte_thread_get_affinity_by_id(rte_thread_t thread_id,
> + rte_cpuset_t *cpuset);
> +
> +/**
>   * Set core affinity of the current thread.
>   * Support both EAL and non-EAL thread and update TLS.
>   *

[...]

> +static int
> +eal_query_group_affinity(void)
> +{
> + SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *infos = NULL;
> + unsigned int *cpu_count = &cpu_map.cpu_count;
> + DWORD infos_size = 0;
> + int ret = 0;
> + USHORT group_count;
> + KAFFINITY affinity;
> + USHORT group_no;
> + unsigned int i;
> +
> + if (!GetLogicalProcessorInformationEx(RelationGroup, NULL,
> + &infos_size)) {
> + DWORD error = GetLastError();
> + if (error != ERROR_INSUFFICIENT_BUFFER) {
> + log_early("Cannot get group information size, "
> + "error %lu\n", error);

Please don't break string constants for easy search.

> + rte_errno = EINVAL;
> + ret = -1;
> + goto cleanup;
> + }
> + }
[...]

> +static bool
> +eal_create_lcore_map(const SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *info)
> +{
> + const unsigned int node_id = info->NumaNode.NodeNumber;
> + const GROUP_AFFINITY *cores = &info->NumaNode.GroupMask;
> + struct lcore_map *lcore;
> + unsigned int socket_id;
> + unsigned int i;
> +
> + /* NUMA node may be reported multiple times if it includes
> +  * cores from different processor groups, e. g. 80 cores
> +  * of a physical processor comprise one NUMA node, but two
> +  * processor groups, because group size is limited by 32/64.
> +  */
> + for (socket_id = 0; socket_id < cpu_map.socket_count; socket_id++) {
> + if (cpu_map.sockets[socket_id].node_id == node_id)
> + break;
> + }

Nit: multi-line comments should start with a line containing just "/*",
and {} are no needed here.

[...]
> +static int
> +rte_convert_cpuset_to_affinity(const rte_cpuset_t *cpuset,
> + PGROUP_AFFINITY affinity)
> +{
> + int ret = 0;
> + PGROUP_AFFINITY cpu_affinity = NULL;
> + unsigned int cpu_idx;
> +
> + memset(affinity, 0, sizeof(GROUP_AFFINITY));
> + affinity->Group = (USHORT)-1;
> +
> + /* Check that all cpus of the set belong to the same processor group and
> +  * accumulate thread affinity to be applied.
> +  */
> + for (cpu_idx = 0; cpu_idx < CPU_SETSIZE; cpu_idx++) {
> + if (!CPU_I

[RFC PATCH] cryptodev: add basic asymmetric crypto capability structs

2022-04-08 Thread Arek Kusztal
This commit adds basic structs to handle asymmetric crypto capability.

Signed-off-by: Arek Kusztal 
---
 lib/cryptodev/rte_crypto_asym.h | 47 +
 lib/cryptodev/rte_cryptodev.h   |  8 ++
 2 files changed, 55 insertions(+)

diff --git a/lib/cryptodev/rte_crypto_asym.h b/lib/cryptodev/rte_crypto_asym.h
index cd24d4b07b..2d58fffee5 100644
--- a/lib/cryptodev/rte_crypto_asym.h
+++ b/lib/cryptodev/rte_crypto_asym.h
@@ -386,6 +386,26 @@ struct rte_crypto_rsa_op_param {
 */
 };
 
+struct rte_crypto_rsa_capability {
+   uint64_t padding_type;
+   /* Supported padding */
+   union {
+   uint64_t hash;
+   /* Supported hash functions, at least one
+* shall be supported */
+   uint64_t mgf;
+   /* Supported mask generation functions,
+* at least one shall be supported */
+   } padding;
+   uint32_t max_key_len;
+   /* Maximum supported key length */
+   uint8_t sign_message;
+   /* If zero input should contain message digest,
+* otherwise it should be plain message */
+   uint8_t pkcs_plain_padding;
+   /* PKCS1_5 padding without algorithm identifier */
+};
+
 /**
  * Diffie-Hellman Operations params.
  * @note:
@@ -416,6 +436,19 @@ struct rte_crypto_dh_op_param {
 */
 };
 
+struct rte_crypto_dh_capability {
+   union {
+   uint32_t group_size;
+   /**< Maximum size of underlying mod group */
+   uint64_t curves;
+   /**< Supported elliptic curve ids */
+   /* uint64_t fixed_groups; ? */
+   /**< Supported fixed groups */
+   /* uint8_t custom_curves; ? */
+   /**< Supported custom curves */
+   };
+};
+
 /**
  * DSA Operations params
  *
@@ -484,6 +517,13 @@ struct rte_crypto_ecdsa_op_param {
 */
 };
 
+struct rte_crypto_ecdsa_capability {
+   uint64_t curves;
+   /**< Supported elliptic curve ids */
+   /* uint8_t custom_curves; ? */
+   /**< Supported custom curves */
+};
+
 /**
  * Structure for EC point multiplication operation param
  */
@@ -498,6 +538,13 @@ struct rte_crypto_ecpm_op_param {
/**< Scalar to multiply the input point */
 };
 
+struct rte_crypto_ecpm_capability {
+   uint64_t curves;
+   /**< Supported elliptic curve ids */
+   /* uint8_t custom_curves; ? */
+   /**< Supported custom curves */
+};
+
 /**
  * Asymmetric crypto transform data
  *
diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
index 45d33f4a50..79026dbb80 100644
--- a/lib/cryptodev/rte_cryptodev.h
+++ b/lib/cryptodev/rte_cryptodev.h
@@ -176,6 +176,14 @@ struct rte_cryptodev_asymmetric_xform_capability {
/**< Range of modulus length supported by modulus based xform.
 * Value 0 mean implementation default
 */
+   struct rte_crypto_ecdsa_capability ecdsa;
+   /**< ECDSA capability */
+   struct rte_crypto_ecpm_capability ecpm;
+   /**< ECPM capability */
+   struct rte_crypto_rsa_capability rsa;
+   /**< RSA capability */
+   struct rte_crypto_dh_capability dh;
+   /**< DH capability */
};
 };
 
-- 
2.30.2
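
As an aside, a usage sketch of how an application might consume these structs;
the one-bit-per-rte_crypto_curve_id layout of 'curves' is an assumption on my
part, the RFC does not define it:

#include <stdbool.h>
#include <stdint.h>
#include <rte_cryptodev.h>

/* assumed layout: bit N of 'curves' set => curve id N supported */
static bool
ecdsa_curve_supported(
	const struct rte_cryptodev_asymmetric_xform_capability *cap,
	enum rte_crypto_curve_id curve)
{
	return (cap->ecdsa.curves & (UINT64_C(1) << curve)) != 0;
}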



[PATCH 0/4] Add APIs for configurable power options

2022-04-08 Thread Kevin Laatz
The power library contains some variables which are currently set by
defines, hard-coded values, or sysfs values. Configuring these requires
code changes and recompilation, which makes tuning them tedious.

This patchset introduces some new get/set APIs which allow users and
applications to configure these settings to suit their use-cases.
In addition, CLI options have been added to l3fwd_power to demonstrate
how an application could use these APIs to expose the options to users
without needing code changes to configure them.

Kevin Laatz (4):
  lib/power: add get and set API for emptypoll max
  lib/power: add get and set API for pause duration
  lib/power: add get and set API for scaling freq min and max with
pstate mode
  examples/l3fwd_power: add cli for configurable options

 examples/l3fwd-power/main.c  |  75 -
 lib/power/power_pstate_cpufreq.c |  22 +++--
 lib/power/rte_power_pmd_mgmt.c   | 112 +++--
 lib/power/rte_power_pmd_mgmt.h   | 138 +++
 lib/power/version.map|  10 +++
 5 files changed, 346 insertions(+), 11 deletions(-)

-- 
2.31.1



[PATCH 1/4] lib/power: add get and set API for emptypoll max

2022-04-08 Thread Kevin Laatz
Add new get/set APIs to configure emptypoll max which is used to
determine when a queue can go into sleep state.

Signed-off-by: Kevin Laatz 
---
 lib/power/rte_power_pmd_mgmt.c | 21 ++---
 lib/power/rte_power_pmd_mgmt.h | 27 +++
 lib/power/version.map  |  4 
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index 39a2b4cd23..dfb7ca9187 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -11,7 +11,7 @@
 
 #include "rte_power_pmd_mgmt.h"
 
-#define EMPTYPOLL_MAX  512
+unsigned int emptypoll_max;
 
 /* store some internal state */
 static struct pmd_conf_data {
@@ -206,7 +206,7 @@ queue_can_sleep(struct pmd_core_cfg *cfg, struct 
queue_list_entry *qcfg)
qcfg->n_empty_polls++;
 
/* if we haven't reached threshold for empty polls, we can't sleep */
-   if (qcfg->n_empty_polls <= EMPTYPOLL_MAX)
+   if (qcfg->n_empty_polls <= emptypoll_max)
return false;
 
/*
@@ -290,7 +290,7 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf 
**pkts __rte_unused,
/* this callback can't do more than one queue, omit multiqueue logic */
if (unlikely(nb_rx == 0)) {
queue_conf->n_empty_polls++;
-   if (unlikely(queue_conf->n_empty_polls > EMPTYPOLL_MAX)) {
+   if (unlikely(queue_conf->n_empty_polls > emptypoll_max)) {
struct rte_power_monitor_cond pmc;
int ret;
 
@@ -661,6 +661,18 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
return 0;
 }
 
+void
+rte_power_pmd_mgmt_set_emptypoll_max(unsigned int max)
+{
+   emptypoll_max = max;
+}
+
+unsigned int
+rte_power_pmd_mgmt_get_emptypoll_max(void)
+{
+   return emptypoll_max;
+}
+
 RTE_INIT(rte_power_ethdev_pmgmt_init) {
size_t i;
 
@@ -669,4 +681,7 @@ RTE_INIT(rte_power_ethdev_pmgmt_init) {
struct pmd_core_cfg *cfg = &lcore_cfgs[i];
TAILQ_INIT(&cfg->head);
}
+
+   /* initialize config defaults */
+   emptypoll_max = 512;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 444e7b8a66..d5a94f8187 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -90,6 +90,33 @@ int
 rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
uint16_t port_id, uint16_t queue_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Set emptypoll_max to the specified value. Used to specify the number of empty
+ * polls to wait before entering sleep state.
+ *
+ * @param max
+ *   The value to set emptypoll_max to.
+ */
+__rte_experimental
+void
+rte_power_pmd_mgmt_set_emptypoll_max(unsigned int max);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Get the current value of emptypoll_max.
+ *
+ * @return
+ *   The current emptypoll_max value
+ */
+__rte_experimental
+unsigned int
+rte_power_pmd_mgmt_get_emptypoll_max(void);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/power/version.map b/lib/power/version.map
index 6ec6d5d96d..8bcd497e06 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -38,4 +38,8 @@ EXPERIMENTAL {
# added in 21.02
rte_power_ethdev_pmgmt_queue_disable;
rte_power_ethdev_pmgmt_queue_enable;
+
+   # added in 22.07
+   rte_power_pmd_mgmt_set_emptypoll_max;
+   rte_power_pmd_mgmt_get_emptypoll_max;
 };
-- 
2.31.1



[PATCH 2/4] lib/power: add get and set API for pause duration

2022-04-08 Thread Kevin Laatz
Add new get/set API for configuring 'pause_duration', which is used to adjust
the pause mode callback duration.

Signed-off-by: Kevin Laatz 
---
 lib/power/rte_power_pmd_mgmt.c | 26 --
 lib/power/rte_power_pmd_mgmt.h | 31 +++
 lib/power/version.map  |  2 ++
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index dfb7ca9187..ab92a93aa0 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -12,6 +12,7 @@
 #include "rte_power_pmd_mgmt.h"
 
 unsigned int emptypoll_max;
+unsigned int pause_duration;
 
 /* store some internal state */
 static struct pmd_conf_data {
@@ -315,6 +316,7 @@ clb_pause(uint16_t port_id __rte_unused, uint16_t qidx 
__rte_unused,
struct queue_list_entry *queue_conf = arg;
struct pmd_core_cfg *lcore_conf;
const bool empty = nb_rx == 0;
+   uint32_t pause_duration = rte_power_pmd_mgmt_get_pause_duration();
 
lcore_conf = &lcore_cfgs[lcore];
 
@@ -334,11 +336,11 @@ clb_pause(uint16_t port_id __rte_unused, uint16_t qidx 
__rte_unused,
if (global_data.intrinsics_support.power_pause) {
const uint64_t cur = rte_rdtsc();
const uint64_t wait_tsc =
-   cur + global_data.tsc_per_us;
+   cur + global_data.tsc_per_us * 
pause_duration;
rte_power_pause(wait_tsc);
} else {
uint64_t i;
-   for (i = 0; i < global_data.pause_per_us; i++)
+   for (i = 0; i < global_data.pause_per_us * 
pause_duration; i++)
rte_pause();
}
}
@@ -673,6 +675,25 @@ rte_power_pmd_mgmt_get_emptypoll_max(void)
return emptypoll_max;
 }
 
+int
+rte_power_pmd_mgmt_set_pause_duration(unsigned int duration)
+{
+   if (duration == 0) {
+   printf("Pause duration must be greater than 0, value 
unchanged\n");
+   rte_errno = EINVAL;
+   return -1;
+   }
+   pause_duration = duration;
+
+   return 0;
+}
+
+unsigned int
+rte_power_pmd_mgmt_get_pause_duration(void)
+{
+   return pause_duration;
+}
+
 RTE_INIT(rte_power_ethdev_pmgmt_init) {
size_t i;
 
@@ -684,4 +705,5 @@ RTE_INIT(rte_power_ethdev_pmgmt_init) {
 
/* initialize config defaults */
emptypoll_max = 512;
+   pause_duration = 1;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index d5a94f8187..18a9c3abb5 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -117,6 +117,37 @@ __rte_experimental
 unsigned int
 rte_power_pmd_mgmt_get_emptypoll_max(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Set the pause_duration. Used to adjust the pause mode callback duration.
+ *
+ * @note Duration must be greater than zero.
+ *
+ * @param duration
+ *   The value to set pause_duration to.
+ * @return
+ *   0 on success
+ *   <0 on error
+ */
+__rte_experimental
+int
+rte_power_pmd_mgmt_set_pause_duration(unsigned int duration);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Get the current value of pause_duration.
+ *
+ * @return
+ *   The current pause_duration value.
+ */
+__rte_experimental
+unsigned int
+rte_power_pmd_mgmt_get_pause_duration(void);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/power/version.map b/lib/power/version.map
index 8bcd497e06..5890e6e429 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -42,4 +42,6 @@ EXPERIMENTAL {
# added in 22.07
rte_power_pmd_mgmt_set_emptypoll_max;
rte_power_pmd_mgmt_get_emptypoll_max;
+   rte_power_pmd_mgmt_set_pause_duration;
+   rte_power_pmd_mgmt_get_pause_duration;
 };
-- 
2.31.1



[PATCH 3/4] lib/power: add get and set API for scaling freq min and max with pstate mode

2022-04-08 Thread Kevin Laatz
Add new get/set API to allow the user or application to set the minimum
and maximum frequencies to use when scaling.
Previously, the frequency range was determined by the HW capabilities of
the CPU. With this new API, the user or application can constrain this
if required.

Signed-off-by: Kevin Laatz 
---
 lib/power/power_pstate_cpufreq.c | 22 +++--
 lib/power/rte_power_pmd_mgmt.c   | 65 ++
 lib/power/rte_power_pmd_mgmt.h   | 80 
 lib/power/version.map|  4 ++
 4 files changed, 166 insertions(+), 5 deletions(-)

diff --git a/lib/power/power_pstate_cpufreq.c b/lib/power/power_pstate_cpufreq.c
index f4c36179ec..db10deafe2 100644
--- a/lib/power/power_pstate_cpufreq.c
+++ b/lib/power/power_pstate_cpufreq.c
@@ -12,6 +12,7 @@
 
 #include 
 
+#include "rte_power_pmd_mgmt.h"
 #include "power_pstate_cpufreq.h"
 #include "power_common.h"
 
@@ -354,6 +355,7 @@ power_get_available_freqs(struct pstate_power_info *pi)
FILE *f_min = NULL, *f_max = NULL;
int ret = -1;
uint32_t sys_min_freq = 0, sys_max_freq = 0, base_max_freq = 0;
+   int config_min_freq, config_max_freq;
uint32_t i, num_freqs = 0;
 
/* open all files */
@@ -388,6 +390,16 @@ power_get_available_freqs(struct pstate_power_info *pi)
goto out;
}
 
+   /* check for config set by user or application to limit frequency range 
*/
+   config_min_freq = rte_power_pmd_mgmt_get_scaling_freq_min(pi->lcore_id);
+   if (config_min_freq < 0)
+   goto out;
+   config_max_freq = rte_power_pmd_mgmt_get_scaling_freq_max(pi->lcore_id);
+   if (config_max_freq < 0)
+   goto out;
+   sys_min_freq = RTE_MAX(sys_min_freq, (uint32_t)config_min_freq);
+   sys_max_freq = RTE_MIN(sys_max_freq, (uint32_t)config_max_freq);
+
if (sys_max_freq < sys_min_freq)
goto out;
 
@@ -411,8 +423,8 @@ power_get_available_freqs(struct pstate_power_info *pi)
/* If turbo is available then there is one extra freq bucket
 * to store the sys max freq which value is base_max +1
 */
-   num_freqs = (base_max_freq - sys_min_freq) / BUS_FREQ + 1 +
-   pi->turbo_available;
+   num_freqs = (RTE_MIN(base_max_freq, sys_max_freq) - sys_min_freq) / 
BUS_FREQ
+   + 1 + pi->turbo_available;
if (num_freqs >= RTE_MAX_LCORE_FREQS) {
RTE_LOG(ERR, POWER, "Too many available frequencies: %d\n",
num_freqs);
@@ -427,10 +439,10 @@ power_get_available_freqs(struct pstate_power_info *pi)
 */
for (i = 0, pi->nb_freqs = 0; i < num_freqs; i++) {
if ((i == 0) && pi->turbo_available)
-   pi->freqs[pi->nb_freqs++] = base_max_freq + 1;
+   pi->freqs[pi->nb_freqs++] = RTE_MIN(base_max_freq, 
sys_max_freq) + 1;
else
-   pi->freqs[pi->nb_freqs++] =
-   base_max_freq - (i - pi->turbo_available) * BUS_FREQ;
+   pi->freqs[pi->nb_freqs++] = RTE_MIN(base_max_freq, 
sys_max_freq) -
+   (i - pi->turbo_available) * BUS_FREQ;
}
 
ret = 0;
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index ab92a93aa0..5ba050dc95 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -10,9 +10,12 @@
 #include 
 
 #include "rte_power_pmd_mgmt.h"
+#include "power_common.h"
 
 unsigned int emptypoll_max;
 unsigned int pause_duration;
+unsigned int scale_freq_min[RTE_MAX_LCORE];
+unsigned int scale_freq_max[RTE_MAX_LCORE];
 
 /* store some internal state */
 static struct pmd_conf_data {
@@ -694,8 +697,65 @@ rte_power_pmd_mgmt_get_pause_duration(void)
return pause_duration;
 }
 
+int
+rte_power_pmd_mgmt_set_scaling_freq_min(unsigned int lcore, unsigned int min)
+{
+   if (lcore >= RTE_MAX_LCORE) {
+   RTE_LOG(ERR, POWER, "Invalid lcore ID: %u\n", lcore);
+   rte_errno = EINVAL;
+   return -1;
+   }
+   scale_freq_min[lcore] = min;
+
+   return 0;
+}
+
+int
+rte_power_pmd_mgmt_set_scaling_freq_max(unsigned int lcore, unsigned int max)
+{
+   if (lcore >= RTE_MAX_LCORE) {
+   RTE_LOG(ERR, POWER, "Invalid lcore ID: %u\n", lcore);
+   rte_errno = EINVAL;
+   return -1;
+   }
+   scale_freq_max[lcore] = max;
+
+   return 0;
+}
+
+int
+rte_power_pmd_mgmt_get_scaling_freq_min(unsigned int lcore)
+{
+   if (lcore >= RTE_MAX_LCORE) {
+   RTE_LOG(ERR, POWER, "Invalid lcore ID: %u\n", lcore);
+   rte_errno = EINVAL;
+   return -1;
+   }
+
+   if (scale_freq_min[lcore] == 0)
+   RTE_LOG(DEBUG, POWER, "Scaling freq min config not set. Using 
sysfs min freq.\n");
+
+   return scale_freq_min[lcore];
+}
+
+int
+rte_power_pmd_

[PATCH 4/4] examples/l3fwd_power: add cli for configurable options

2022-04-08 Thread Kevin Laatz
Add CLI options to l3fwd_power to utilize the new power APIs introduced in
this patchset. These options allow the user to configure the heuristics
made available through the new API.
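As an illustration (the option values here are made up, and the frequency
units are assumed to match the sysfs cpufreq values the library compares
them against), a run might add:

   --max-empty-polls 512 --pause-duration 10
   --scale-freq-min 1000000 --scale-freq-max 2000000

Each option maps onto the corresponding new per-lcore API, roughly like
this sketch:

    /* sketch: apply one CLI value to every lcore */
    for (i = 0; i < RTE_MAX_LCORE; i++)
        rte_power_pmd_mgmt_set_scaling_freq_min(i, scale_freq_min);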

Signed-off-by: Kevin Laatz 
---
 examples/l3fwd-power/main.c | 75 -
 1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 20e5b59af9..f480de2420 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -265,6 +265,10 @@ static struct rte_eth_conf port_conf = {
 };
 
 static uint32_t max_pkt_len;
+static uint32_t max_empty_polls;
+static uint32_t pause_duration;
+static uint32_t scale_freq_min;
+static uint32_t scale_freq_max;
 
 static struct rte_mempool * pktmbuf_pool[NB_SOCKETS];
 
@@ -1626,10 +1630,32 @@ print_usage(const char *prgname)
" empty polls, full polls, and core busyness to telemetry\n"
" --interrupt-only: enable interrupt-only mode\n"
" --pmd-mgmt MODE: enable PMD power management mode. "
-   "Currently supported modes: baseline, monitor, pause, scale\n",
+   "Currently supported modes: baseline, monitor, pause, scale\n"
+   "  --max-empty-polls MAX_EMPTY_POLLS: number of empty polls to"
+   " wait before entering sleep state\n"
+   "  --pause-duration DURATION: set the duration, in 
microseconds,"
+   " of the pause callback\n"
+   "  --scale-freq-min FREQ_MIN: set minimum frequency for scaling"
+   " mode for all application lcores\n"
+   "  --scale-freq-max FREQ_MAX: set maximum frequency for scaling"
+   " mode for all application lcores\n",
prgname);
 }
 
+static int
+parse_int(const char *opt)
+{
+   char *end = NULL;
+   unsigned long val;
+
+   /* parse integer string */
+   val = strtoul(opt, &end, 10);
+   if ((opt[0] == '\0') || (end == NULL) || (*end != '\0'))
+   return -1;
+
+   return val;
+}
+
 static int parse_max_pkt_len(const char *pktlen)
 {
char *end = NULL;
@@ -1803,6 +1829,10 @@ parse_ep_config(const char *q_arg)
 #define CMD_LINE_OPT_TELEMETRY "telemetry"
 #define CMD_LINE_OPT_PMD_MGMT "pmd-mgmt"
 #define CMD_LINE_OPT_MAX_PKT_LEN "max-pkt-len"
+#define CMD_LINE_OPT_MAX_EMPTY_POLLS "max-empty-polls"
+#define CMD_LINE_OPT_PAUSE_DURATION "pause-duration"
+#define CMD_LINE_OPT_SCALE_FREQ_MIN "scale-freq-min"
+#define CMD_LINE_OPT_SCALE_FREQ_MAX "scale-freq-max"
 
 /* Parse the argument given in the command line of the application */
 static int
@@ -1812,6 +1842,7 @@ parse_args(int argc, char **argv)
char **argvopt;
int option_index;
uint32_t limit;
+   int i;
char *prgname = argv[0];
static struct option lgopts[] = {
{"config", 1, 0, 0},
@@ -1825,6 +1856,10 @@ parse_args(int argc, char **argv)
{CMD_LINE_OPT_TELEMETRY, 0, 0, 0},
{CMD_LINE_OPT_INTERRUPT_ONLY, 0, 0, 0},
{CMD_LINE_OPT_PMD_MGMT, 1, 0, 0},
+   {CMD_LINE_OPT_MAX_EMPTY_POLLS, 1, 0, 0},
+   {CMD_LINE_OPT_PAUSE_DURATION, 1, 0, 0},
+   {CMD_LINE_OPT_SCALE_FREQ_MIN, 1, 0, 0},
+   {CMD_LINE_OPT_SCALE_FREQ_MAX, 1, 0, 0},
{NULL, 0, 0, 0}
};
 
@@ -1975,6 +2010,44 @@ parse_args(int argc, char **argv)
parse_ptype = 1;
}
 
+   if (!strncmp(lgopts[option_index].name,
+   CMD_LINE_OPT_MAX_EMPTY_POLLS,
+   sizeof(CMD_LINE_OPT_MAX_EMPTY_POLLS))) {
+   printf("Maximum empty polls configured\n");
+   max_empty_polls = parse_int(optarg);
+   rte_power_pmd_mgmt_set_emptypoll_max(max_empty_polls);
+   }
+
+   if (!strncmp(lgopts[option_index].name,
+   CMD_LINE_OPT_PAUSE_DURATION,
+   sizeof(CMD_LINE_OPT_PAUSE_DURATION))) {
+   printf("Pause duration configured\n");
+   pause_duration = parse_int(optarg);
+   rte_power_pmd_mgmt_set_pause_duration(pause_duration);
+   }
+
+   if (!strncmp(lgopts[option_index].name,
+   CMD_LINE_OPT_SCALE_FREQ_MIN,
+   sizeof(CMD_LINE_OPT_SCALE_FREQ_MIN))) {
+   printf("Scaling frequency minimum 
configured\n");
+   scale_freq_min = parse_int(optarg);
+   for (i = 0; i < RTE_MAX_LCORE; i++)
+   if (rt

[PATCH 0/5] Fix IDXD PCI device close

2022-04-08 Thread Kevin Laatz
This patchset addresses the device close for IDXD PCI devices.
Initially, there was a memory leak reported by ASAN for the 'pci' member
of the 'idxd_dmadev' struct due to a missing free. In addition, this
patch set corrects the behaviour of the device close function to ensure
the cleanup is completed as expected.

Applications which use DMA devices should call rte_dma_close() for each
device probed in order to ensure proper cleanup of the devices. This has
been added to applications where DMA devices are commonly used.
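
For reference, a minimal sketch of that cleanup as an application would
write it (this simply mirrors the hunks added by the patches below):

    #include <rte_dmadev.h>

    int i;

    /* stop and close every dmadev created during EAL init/PCI probe */
    RTE_DMA_FOREACH_DEV(i) {
        rte_dma_stop(i);
        rte_dma_close(i);
    }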

Kevin Laatz (5):
  dma/idxd: fix memory leak in pci close
  dma/idxd: fix memory leak due to free on incorrect pointer
  app/test: close dma devices during cleanup
  app/testpmd: stop and close dmadevs at exit
  examples/dma: fix missing dma close

 app/test-pmd/testpmd.c   |  9 
 app/test/test.c  |  6 ++
 drivers/dma/idxd/idxd_common.c   |  1 +
 drivers/dma/idxd/idxd_internal.h |  2 ++
 drivers/dma/idxd/idxd_pci.c  | 36 +---
 examples/dma/dmafwd.c|  6 ++
 6 files changed, 52 insertions(+), 8 deletions(-)

-- 
2.31.1



[PATCH 1/5] dma/idxd: fix memory leak in pci close

2022-04-08 Thread Kevin Laatz
ASAN reports a memory leak for the 'pci' pointer in the 'idxd_dmadev'
struct.

This is fixed by free'ing the struct when the last queue on the PCI
device is being closed.

Fixes: 9449330a8458 ("dma/idxd: create dmadev instances on PCI probe")
Cc: sta...@dpdk.org
Cc: bruce.richard...@intel.com

Reported-by: Xingguang He 
Signed-off-by: Kevin Laatz 
---
 drivers/dma/idxd/idxd_common.c   |  1 +
 drivers/dma/idxd/idxd_internal.h |  2 ++
 drivers/dma/idxd/idxd_pci.c  | 34 +---
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/drivers/dma/idxd/idxd_common.c b/drivers/dma/idxd/idxd_common.c
index ea6413cc7a..57c4c1655d 100644
--- a/drivers/dma/idxd/idxd_common.c
+++ b/drivers/dma/idxd/idxd_common.c
@@ -597,6 +597,7 @@ idxd_dmadev_create(const char *name, struct rte_device *dev,
dmadev->fp_obj->dev_private = idxd;
 
idxd->dmadev->state = RTE_DMA_DEV_READY;
+   rte_atomic16_inc(&idxd->u.pci->ref_count);
 
return 0;
 
diff --git a/drivers/dma/idxd/idxd_internal.h b/drivers/dma/idxd/idxd_internal.h
index 3375600217..180a8587c6 100644
--- a/drivers/dma/idxd/idxd_internal.h
+++ b/drivers/dma/idxd/idxd_internal.h
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include 
 
 #include "idxd_hw_defs.h"
 
@@ -33,6 +34,7 @@ struct idxd_pci_common {
rte_spinlock_t lk;
 
uint8_t wq_cfg_sz;
+   rte_atomic16_t ref_count;
volatile struct rte_idxd_bar0 *regs;
volatile uint32_t *wq_regs_base;
volatile struct rte_idxd_grpcfg *grp_regs;
diff --git a/drivers/dma/idxd/idxd_pci.c b/drivers/dma/idxd/idxd_pci.c
index 9ca1ec64e9..7036eb938d 100644
--- a/drivers/dma/idxd/idxd_pci.c
+++ b/drivers/dma/idxd/idxd_pci.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "idxd_internal.h"
 
@@ -115,20 +116,38 @@ idxd_pci_dev_close(struct rte_dma_dev *dev)
 {
struct idxd_dmadev *idxd = dev->fp_obj->dev_private;
uint8_t err_code;
+   int is_last_wq;
 
-   /* disable the device */
-   err_code = idxd_pci_dev_command(idxd, idxd_disable_dev);
-   if (err_code) {
-   IDXD_PMD_ERR("Error disabling device: code %#x", err_code);
-   return err_code;
+   if (idxd_is_wq_enabled(idxd)) {
+   /* disable the wq */
+   err_code = idxd_pci_dev_command(idxd, idxd_disable_wq);
+   if (err_code) {
+   IDXD_PMD_ERR("Error disabling wq: code %#x", err_code);
+   return err_code;
+   }
+   IDXD_PMD_DEBUG("IDXD WQ disabled OK");
}
-   IDXD_PMD_DEBUG("IDXD Device disabled OK");
 
/* free device memory */
IDXD_PMD_DEBUG("Freeing device driver memory");
rte_free(idxd->batch_idx_ring);
rte_free(idxd->desc_ring);
 
+   /* if this is the last WQ on the device, disable the device and free
+* the PCI struct
+*/
+   is_last_wq = rte_atomic16_dec_and_test(&idxd->u.pci->ref_count);
+   if (is_last_wq) {
+   /* disable the device */
+   err_code = idxd_pci_dev_command(idxd, idxd_disable_dev);
+   if (err_code) {
+   IDXD_PMD_ERR("Error disabling device: code %#x", err_code);
+   return err_code;
+   }
+   IDXD_PMD_DEBUG("IDXD device disabled OK");
+   rte_free(idxd->u.pci);
+   }
+
return 0;
 }
 
@@ -159,12 +178,13 @@ init_pci_device(struct rte_pci_device *dev, struct idxd_dmadev *idxd,
uint8_t lg2_max_batch, lg2_max_copy_size;
unsigned int i, err_code;
 
-   pci = malloc(sizeof(*pci));
+   pci = rte_malloc(NULL, sizeof(*pci), 0);
if (pci == NULL) {
IDXD_PMD_ERR("%s: Can't allocate memory", __func__);
err_code = -1;
goto err;
}
+   memset(pci, 0, sizeof(*pci));
rte_spinlock_init(&pci->lk);
 
/* assign the bar registers, and then configure device */
-- 
2.31.1



[PATCH 3/5] app/test: close dma devices during cleanup

2022-04-08 Thread Kevin Laatz
DMA devices are created during PCI probe of EAL init. These devices
need to be closed in order to perform necessary cleanup for those
devices. This patch adds the call to close() for all DMA devices.

Signed-off-by: Kevin Laatz 
---
 app/test/test.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/app/test/test.c b/app/test/test.c
index e69cae3eea..cc986e5cc9 100644
--- a/app/test/test.c
+++ b/app/test/test.c
@@ -24,6 +24,7 @@ extern cmdline_parse_ctx_t main_ctx[];
 #include 
 #include 
 #include 
+#include 
 #ifdef RTE_LIB_TIMER
 #include 
 #endif
@@ -244,6 +245,11 @@ main(int argc, char **argv)
 #ifdef RTE_LIB_TIMER
rte_timer_subsystem_finalize();
 #endif
+
+   /* close all dmadevs */
+   RTE_DMA_FOREACH_DEV(i)
+   rte_dma_close(i);
+
rte_eal_cleanup();
return ret;
 }
-- 
2.31.1



[PATCH 2/5] dma/idxd: fix memory leak due to free on incorrect pointer

2022-04-08 Thread Kevin Laatz
During PCI device close, any allocated memory needs to be free'd.
Currently, one of the free's is being called on an incorrect idxd_dmadev
struct member, namely 'batch_idx_ring', causing a memleak from the
pointer that should have been free'd.
This patch fixes this memleak by calling free on the correct pointer.

Fixes: 9449330a8458 ("dma/idxd: create dmadev instances on PCI probe")
Cc: sta...@dpdk.org
Cc: bruce.richard...@intel.com

Signed-off-by: Kevin Laatz 
---
 drivers/dma/idxd/idxd_pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/idxd/idxd_pci.c b/drivers/dma/idxd/idxd_pci.c
index 7036eb938d..fdb1f15750 100644
--- a/drivers/dma/idxd/idxd_pci.c
+++ b/drivers/dma/idxd/idxd_pci.c
@@ -130,7 +130,7 @@ idxd_pci_dev_close(struct rte_dma_dev *dev)
 
/* free device memory */
IDXD_PMD_DEBUG("Freeing device driver memory");
-   rte_free(idxd->batch_idx_ring);
+   rte_free(idxd->batch_comp_ring);
rte_free(idxd->desc_ring);
 
/* if this is the last WQ on the device, disable the device and free
-- 
2.31.1



[PATCH 4/5] app/testpmd: stop and close dmadevs at exit

2022-04-08 Thread Kevin Laatz
DMA devices are created during PCI probe in EAL init. In order to
perform cleanup for those devices, they need to be stopped and closed.
This patch adds the necessary cleanup to ensure clean exit.

Signed-off-by: Kevin Laatz 
---
 app/test-pmd/testpmd.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index fe2ce19f99..438749c5b8 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #ifdef RTE_NET_IXGBE
@@ -3402,6 +3403,14 @@ pmd_test_exit(void)
}
}
 
+   /* stop and close all dmadevs */
+   RTE_DMA_FOREACH_DEV(i) {
+   printf("\nStopping and closing dmadev %d...\n", i);
+   fflush(stdout);
+   rte_dma_stop(i);
+   rte_dma_close(i);
+   }
+
if (hot_plug) {
ret = rte_dev_event_monitor_stop();
if (ret) {
-- 
2.31.1



[PATCH 5/5] examples/dma: fix missing dma close

2022-04-08 Thread Kevin Laatz
The application stops all dmadevs that it used but never closes any,
meaning device cleanup is not done.
This patch adds device cleanup for all dmadevs. All devices need to be
closed for completeness, since devices not used by the application may
also have been created during PCI probe of EAL init.

Fixes: d047310407a3 ("examples/ioat: port application to dmadev API")
Cc: sta...@dpdk.org
Cc: bruce.richard...@intel.com

Signed-off-by: Kevin Laatz 
---
 examples/dma/dmafwd.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/examples/dma/dmafwd.c b/examples/dma/dmafwd.c
index 608487e35c..4c612a0e0b 100644
--- a/examples/dma/dmafwd.c
+++ b/examples/dma/dmafwd.c
@@ -1097,6 +1097,12 @@ main(int argc, char **argv)
rte_ring_free(cfg.ports[i].rx_to_tx_ring);
}
 
+   /* close all dmadevs */
+   RTE_DMA_FOREACH_DEV(i) {
+   printf("Closing dmadev %d\n", i);
+   rte_dma_close(i);
+   }
+
/* clean up the EAL */
rte_eal_cleanup();
 
-- 
2.31.1



[PATCH] dma/idxd: fix return value for pci device commands

2022-04-08 Thread Kevin Laatz
When sending a command to an idxd device via the PCI BAR, the response from
HW is checked to ensure it was successful. The response was incorrectly
being negated before being returned by the function, meaning error codes
could not be checked against the HW specification.

This patch fixes the return values of the function by removing the
negation.
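
To illustrate the effect at a call site (the names are ones that already
appear in the driver; this is not a hunk from the patch):

    err_code = idxd_pci_dev_command(idxd, idxd_disable_dev);
    if (err_code != 0) {
        /* err_code is now the raw CMDSTATUS error field, directly
         * comparable against the values in the HW specification */
        IDXD_PMD_ERR("Error disabling device: code %#x", err_code);
        return err_code;
    }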

Signed-off-by: Kevin Laatz 
---
 drivers/dma/idxd/idxd_pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/idxd/idxd_pci.c b/drivers/dma/idxd/idxd_pci.c
index fa0a1ad562..844077b079 100644
--- a/drivers/dma/idxd/idxd_pci.c
+++ b/drivers/dma/idxd/idxd_pci.c
@@ -39,13 +39,13 @@ idxd_pci_dev_command(struct idxd_dmadev *idxd, enum rte_idxd_cmds command)
IDXD_PMD_ERR("Timeout waiting for command response from 
HW");
rte_spinlock_unlock(&idxd->u.pci->lk);
err_code &= CMDSTATUS_ERR_MASK;
-   return -err_code;
+   return err_code;
}
} while (err_code & CMDSTATUS_ACTIVE_MASK);
rte_spinlock_unlock(&idxd->u.pci->lk);
 
err_code &= CMDSTATUS_ERR_MASK;
-   return -err_code;
+   return err_code;
 }
 
 static uint32_t *
-- 
2.31.1



[PATCH v4] eal: add seqlock

2022-04-08 Thread Mattias Rönnblom
A sequence lock (seqlock) is a synchronization primitive which allows
for data-race-free, low-overhead, high-frequency reads, especially for
data structures shared across many cores and which are updated
relatively infrequently.

A seqlock permits multiple parallel readers. The variant of seqlock
implemented in this patch supports multiple writers as well. A
spinlock is used for writer-writer serialization.

To avoid resource reclamation and other issues, the data protected by
a seqlock is best off being self-contained (i.e., no pointers [except
to constant data]).

One way to think about seqlocks is that they provide a means to perform
atomic operations on data objects larger than what the native atomic
machine instructions allow for.

DPDK seqlocks are not preemption safe on the writer side. A thread
preemption affects performance, not correctness.

A seqlock contains a sequence number, which can be thought of as the
generation of the data it protects.

A reader will
  1. Load the sequence number (sn).
  2. Load, in arbitrary order, the seqlock-protected data.
  3. Load the sn again.
  4. Check if the first and second sn are equal, and even numbered.
 If they are not, discard the loaded data, and restart from 1.

The first three steps need to be ordered using suitable memory fences.

A writer will
  1. Take the spinlock, to serialize writer access.
  2. Load the sn.
  3. Store the original sn + 1 as the new sn.
  4. Perform load and stores to the seqlock-protected data.
  5. Store the original sn + 2 as the new sn.
  6. Release the spinlock.

Proper memory fencing is required to make sure the first sn store, the
data stores, and the second sn store appear to the reader in the
mentioned order.

The sn loads and stores must be atomic, but the data loads and stores
need not be.
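
For illustration only (this is not the proposed rte_seqlock API; the
struct layout, names and writer lock below are simplified), the reader
and writer protocols above can be sketched with C11 atomics as:

    #include <stdatomic.h>
    #include <stdint.h>

    struct toy_seqlock {
        atomic_uint sn;                  /* even: no write in progress */
        atomic_flag wlock;               /* stand-in for the writer spinlock */
        struct { uint64_t a, b; } data;  /* the protected data */
    };

    /* Reader: steps 1-4 above. */
    static void
    toy_read(struct toy_seqlock *sl, uint64_t out[2])
    {
        unsigned int begin, end;

        do {
            begin = atomic_load_explicit(&sl->sn, memory_order_acquire); /* 1 */
            out[0] = sl->data.a;                                         /* 2 */
            out[1] = sl->data.b;
            atomic_thread_fence(memory_order_acquire);  /* order 2 before 3 */
            end = atomic_load_explicit(&sl->sn, memory_order_relaxed);   /* 3 */
        } while ((begin & 1) || begin != end);                           /* 4 */
    }

    /* Writer: steps 1-6 above. */
    static void
    toy_write(struct toy_seqlock *sl, uint64_t a, uint64_t b)
    {
        unsigned int sn;

        while (atomic_flag_test_and_set_explicit(&sl->wlock,
                                                 memory_order_acquire))
            ;                                                            /* 1 */
        sn = atomic_load_explicit(&sl->sn, memory_order_relaxed);        /* 2 */
        atomic_store_explicit(&sl->sn, sn + 1, memory_order_relaxed);    /* 3 */
        atomic_thread_fence(memory_order_release);  /* order 3 before 4 */
        sl->data.a = a;                                                  /* 4 */
        sl->data.b = b;
        atomic_store_explicit(&sl->sn, sn + 2, memory_order_release);    /* 5 */
        atomic_flag_clear_explicit(&sl->wlock, memory_order_release);    /* 6 */
    }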

The original seqlock design and implementation was done by Stephen
Hemminger. This is an independent implementation, using C11 atomics.

For more information on seqlocks, see
https://en.wikipedia.org/wiki/Seqlock

PATCH v4:
  * Reverted to Linux kernel style naming on the read side.
  * Bail out early from the retry function if an odd sequence
number is encountered.
  * Added experimental warnings in the API documentation.
  * Static initializer now uses named field initialization.
  * Various tweaks to API documentation (including the example).

PATCH v3:
  * Renamed both read and write-side critical section begin/end functions
to better match rwlock naming, per Ola Liljedahl's suggestion.
  * Added 'extern "C"' guards for C++ compatibility.
  * Refer to the main lcore as the main lcore, and nothing else.

PATCH v2:
  * Skip instead of fail unit test in case too few lcores are available.
  * Use main lcore for testing, reducing the minimum number of lcores
required to run the unit tests to four.
  * Consistently refer to sn field as the "sequence number" in the
documentation.
  * Fixed spelling mistakes in documentation.

Updates since RFC:
  * Added API documentation.
  * Added link to Wikipedia article in the commit message.
  * Changed seqlock sequence number field from uint64_t (which was
overkill) to uint32_t. The sn type needs to be sufficiently large
to assure no reader will read a sn, access the data, and then read
the same sn, but the sn has been incremented enough times to have
wrapped during the read, and arrived back at the original sn.
  * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
  * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
with an anonymous struct typedef:ed to rte_seqlock_t.

Acked-by: Morten Brørup 
Reviewed-by: Ola Liljedahl 
Signed-off-by: Mattias Rönnblom 
---
 app/test/meson.build  |   2 +
 app/test/test_seqlock.c   | 202 +
 lib/eal/common/meson.build|   1 +
 lib/eal/common/rte_seqlock.c  |  12 ++
 lib/eal/include/meson.build   |   1 +
 lib/eal/include/rte_seqlock.h | 319 ++
 lib/eal/version.map   |   3 +
 7 files changed, 540 insertions(+)
 create mode 100644 app/test/test_seqlock.c
 create mode 100644 lib/eal/common/rte_seqlock.c
 create mode 100644 lib/eal/include/rte_seqlock.h

diff --git a/app/test/meson.build b/app/test/meson.build
index 5fc1dd1b7b..5e418e8766 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -125,6 +125,7 @@ test_sources = files(
 'test_rwlock.c',
 'test_sched.c',
 'test_security.c',
+'test_seqlock.c',
 'test_service_cores.c',
 'test_spinlock.c',
 'test_stack.c',
@@ -214,6 +215,7 @@ fast_tests = [
 ['rwlock_rde_wro_autotest', true],
 ['sched_autotest', true],
 ['security_autotest', false],
+['seqlock_autotest', true],
 ['spinlock_autotest', true],
 ['stack_autotest', false],
 ['stack_lf_autotest', false],
diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
new file mode 100644
index 00..3f1ce53678

Re: [PATCH 1/5] dma/idxd: fix memory leak in pci close

2022-04-08 Thread Bruce Richardson
On Fri, Apr 08, 2022 at 03:15:00PM +0100, Kevin Laatz wrote:
> ASAN reports a memory leak for the 'pci' pointer in the 'idxd_dmadev'
> struct.
> 
> This is fixed by free'ing the struct when the last queue on the PCI
> device is being closed.
> 
> Fixes: 9449330a8458 ("dma/idxd: create dmadev instances on PCI probe")
> Cc: sta...@dpdk.org
> Cc: bruce.richard...@intel.com
> 
> Reported-by: Xingguang He 
> Signed-off-by: Kevin Laatz 
> ---
Acked-by: Bruce Richardson 


Re: [PATCH 2/5] dma/idxd: fix memory leak due to free on incorrect pointer

2022-04-08 Thread Bruce Richardson
On Fri, Apr 08, 2022 at 03:15:01PM +0100, Kevin Laatz wrote:
> During PCI device close, any allocated memory needs to be free'd.
> Currently, one of the free's is being called on an incorrect idxd_dmadev
> struct member, namely 'batch_idx_ring', causing a memleak from the
> pointer that should have been free'd.
> This patch fixes this memleak by calling free on the correct pointer.
> 
> Fixes: 9449330a8458 ("dma/idxd: create dmadev instances on PCI probe")
> Cc: sta...@dpdk.org
> Cc: bruce.richard...@intel.com
> 
> Signed-off-by: Kevin Laatz 
> ---
>  drivers/dma/idxd/idxd_pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/idxd/idxd_pci.c b/drivers/dma/idxd/idxd_pci.c
> index 7036eb938d..fdb1f15750 100644
> --- a/drivers/dma/idxd/idxd_pci.c
> +++ b/drivers/dma/idxd/idxd_pci.c
> @@ -130,7 +130,7 @@ idxd_pci_dev_close(struct rte_dma_dev *dev)
>  
>   /* free device memory */
>   IDXD_PMD_DEBUG("Freeing device driver memory");
> - rte_free(idxd->batch_idx_ring);
> + rte_free(idxd->batch_comp_ring);
>   rte_free(idxd->desc_ring);
>  
This is largely my fault, I expect, for being "smart" and allocating the
memory for both arrays from the one allocation. To clarify things, we need
to:
1) update the commit log message explaining why it's the wrong pointer,
i.e. that the two are in the one memory reservation
2) similarly add a comment to the rte_free call noting that it frees the
idx_ring too.

Alternatively, we can also consider adjusting the allocation code so both
arrays are allocated separately, and on free are freed similarly. We would,
however, need to double check that doing so introduces no perf hit.
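
A rough sketch of that second option, purely to illustrate the idea (the
element count 'nb_entries' and the zeroing allocator are assumptions, not
taken from the driver):

    /* allocate the two rings separately ... */
    idxd->batch_idx_ring = rte_zmalloc(NULL,
            sizeof(idxd->batch_idx_ring[0]) * nb_entries, 0);
    idxd->batch_comp_ring = rte_zmalloc(NULL,
            sizeof(idxd->batch_comp_ring[0]) * nb_entries, 0);

    /* ... so that idxd_pci_dev_close() can free each one on its own */
    rte_free(idxd->batch_idx_ring);
    rte_free(idxd->batch_comp_ring);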

/Bruce


Re: [PATCH 3/5] app/test: close dma devices during cleanup

2022-04-08 Thread Bruce Richardson
On Fri, Apr 08, 2022 at 03:15:02PM +0100, Kevin Laatz wrote:
> DMA devices are created during PCI probe of EAL init. These devices need
> to be closed in order to perform necessary cleanup for those devices.
> This patch adds the call to close() for all DMA devices.
> 
> Signed-off-by: Kevin Laatz 
> ---
>  app/test/test.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
Just to clarify the situation here - on EAL init, all buses are probed and
all devices initialized. On eal_cleanup/rte_exit the inverse does not
happen, then, i.e. all probed devices on all buses are not closed, right?
This would seem a better option than requiring each application to manually
close all devices even if it never used them. However, it is probably a
bigger and more complex change.

/Bruce


Re: [PATCH v3] examples/kni: add interrupt mode to receive packets

2022-04-08 Thread Stephen Hemminger
On Fri,  8 Apr 2022 17:12:06 +0800
Tianli Lai  wrote:
> + if (status[lcore].wakeup) {
> + RTE_LOG(INFO, APP,
> + "lcore %u sleeps until interrupt triggers\n",
> + rte_lcore_id());
> + }

Shouldn't this be at DEBUG level?

> + /* initialize spinlock for each port */
> + rte_spinlock_init(&(locks[i]));

This comment seems rather obvious and unneeded.

> +static void
> +turn_on_off_intr(uint16_t port_id, uint16_t queue_id, bool on)
> +{
> + rte_spinlock_lock(&(locks[port_id]));
> + if (on)
> + rte_eth_dev_rx_intr_enable(port_id, queue_id);
> + else
> + rte_eth_dev_rx_intr_disable(port_id, queue_id);
> + rte_spinlock_unlock(&(locks[port_id]));
> +}

Since an Rx queue cannot safely be shared between cores, why do you need
a lock at all?


Re: [PATCH v4] eal: add seqlock

2022-04-08 Thread Stephen Hemminger
On Fri, 8 Apr 2022 16:24:42 +0200
Mattias Rönnblom  wrote:

> +++ b/lib/eal/common/rte_seqlock.c
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2022 Ericsson AB
> + */
> +
> +#include 
> +
> +void
> +rte_seqlock_init(rte_seqlock_t *seqlock)
> +{
> + seqlock->sn = 0;
> + rte_spinlock_init(&seqlock->lock);
> +}

Why not put init in rte_seqlock.h (like other locks)
and not need a .c at all?
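
Something like the below, reusing the body of the quoted .c file (sketch
only; if the API is experimental, the inline function would also carry
the usual __rte_experimental tag):

    /* in rte_seqlock.h */
    static inline void
    rte_seqlock_init(rte_seqlock_t *seqlock)
    {
        seqlock->sn = 0;
        rte_spinlock_init(&seqlock->lock);
    }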



Re: [PATCH v4] eal: add seqlock

2022-04-08 Thread Stephen Hemminger
On Fri, 8 Apr 2022 16:24:42 +0200
Mattias Rönnblom  wrote:

> + /* A writer pegged the sequence number during the read operation. */
> + if (unlikely(begin_sn != end_sn))
> + return true;

In some countries "pegged" might be considered inappropriate slang.
Use incremented or changed instead.


Re: [PATCH v4] eal: add seqlock

2022-04-08 Thread Mattias Rönnblom

On 2022-04-08 16:24, Mattias Rönnblom wrote:





PATCH v4:
   * Reverted to Linux kernel style naming on the read side.


In this version I chose to adhere to kernel naming on the read side, but 
keep the write_lock()/unlock() on the write side.


I think those names communicate better what the functions do, but 
Stephen's comment about keeping naming and semantics close to the Linux 
kernel APIs is very much relevant, also for the write functions.


I don't really have an opinion on whether we keep these names or change 
to rte_seqlock_write_begin()/end().


You might ask yourself which of the two naming options makes the most 
sense in light of the fact that we might extend the proposed seqlock API 
with an "unlocked" (non-writer-serializing) seqlock variant, or variants 
with other types of lock, in the future. What writer-side function names 
would be suitable for those? (I don't know, but it seemed like something 
that might be useful to consider.)





RE: [PATCH v4] examples/l3fwd: merge l3fwd-acl into l3fwd

2022-04-08 Thread Ananyev, Konstantin

Hi Sean,

Few nits, that I didn't spot previously, pls see below.
 
> l3fwd-acl contains duplicate functions to l3fwd.
> For this reason we merge l3fwd-acl code into l3fwd
> with '--lookup acl' cmdline option to run ACL.
> 
> Signed-off-by: Sean Morrissey 
> ---
>  MAINTAINERS   |2 -
>  doc/guides/rel_notes/release_22_07.rst|5 +
>  doc/guides/sample_app_ug/index.rst|1 -
>  doc/guides/sample_app_ug/l3_forward.rst   |   63 +-
>  .../sample_app_ug/l3_forward_access_ctrl.rst  |  340 ---
>  examples/l3fwd-acl/Makefile   |   51 -
>  examples/l3fwd-acl/main.c | 2272 -
>  examples/l3fwd-acl/meson.build|   13 -
>  examples/l3fwd/Makefile   |2 +-
>  examples/l3fwd/l3fwd.h|   39 +-
>  examples/l3fwd/l3fwd_acl.c| 1098 
>  examples/l3fwd/l3fwd_acl.h|   51 +
>  examples/l3fwd/l3fwd_acl_scalar.h |  112 +
>  examples/l3fwd/l3fwd_route.h  |   16 +
>  examples/l3fwd/main.c |   65 +-
>  examples/l3fwd/meson.build|3 +-
>  examples/meson.build  |1 -
>  17 files changed, 1434 insertions(+), 2700 deletions(-)
>  delete mode 100644 doc/guides/sample_app_ug/l3_forward_access_ctrl.rst
>  delete mode 100644 examples/l3fwd-acl/Makefile
>  delete mode 100644 examples/l3fwd-acl/main.c
>  delete mode 100644 examples/l3fwd-acl/meson.build
>  create mode 100644 examples/l3fwd/l3fwd_acl.c
>  create mode 100644 examples/l3fwd/l3fwd_acl.h
>  create mode 100644 examples/l3fwd/l3fwd_acl_scalar.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 15008c03bc..b29ff8929d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1505,8 +1505,6 @@ F: lib/acl/
>  F: doc/guides/prog_guide/packet_classif_access_ctrl.rst
>  F: app/test-acl/
>  F: app/test/test_acl.*
> -F: examples/l3fwd-acl/
> -F: doc/guides/sample_app_ug/l3_forward_access_ctrl.rst
> 
...

> diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
> index ad39496e64..c22cb796f5 100644
> --- a/examples/l3fwd/l3fwd.h
> +++ b/examples/l3fwd/l3fwd.h
> @@ -7,6 +7,7 @@
> 
>  #include 
>  #include 
> +#include 
> 
>  #define DO_RFC_1812_CHECKS
> 
> @@ -61,6 +62,12 @@
>  struct parm_cfg {
>   const char *rule_ipv4_name;
>   const char *rule_ipv6_name;
> + enum rte_acl_classify_alg alg;
> +};
> +
> +struct acl_algorithms {
> + const char *name;
> + enum rte_acl_classify_alg alg;
>  };
> 
>  struct mbuf_table {
> @@ -80,6 +87,7 @@ struct lcore_conf {
>   uint16_t tx_port_id[RTE_MAX_ETHPORTS];
>   uint16_t tx_queue_id[RTE_MAX_ETHPORTS];
>   struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
> + struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];

I don't think it is used anywhere.
Probably remains from previous iterations.


>   void *ipv4_lookup_struct;
>   void *ipv6_lookup_struct;
>  } __rte_cache_aligned;
> @@ -107,6 +115,8 @@ extern struct lcore_conf lcore_conf[RTE_MAX_LCORE];
> 
>  extern struct parm_cfg parm_config;
> 
> +extern struct acl_algorithms acl_alg[];
> +
>  /* Send burst of packets on an output interface */
>  static inline int
>  send_burst(struct lcore_conf *qconf, uint16_t n, uint16_t port)
> @@ -190,10 +200,19 @@ is_valid_ipv4_pkt(struct rte_ipv4_hdr *pkt, uint32_t 
> link_len)
>  }
>  #endif /* DO_RFC_1812_CHECKS */
> 
> +enum rte_acl_classify_alg
> +parse_acl_alg(const char *alg);
> +
> +int
> +usage_acl_alg(char *buf, size_t sz);
> +
>  int
>  init_mem(uint16_t portid, unsigned int nb_mbuf);
> 
> -/* Function pointers for LPM, EM or FIB functionality. */
> +/* Function pointers for ACL, LPM, EM or FIB functionality. */
> +void
> +setup_acl(const int socketid);
> +
>  void
>  setup_lpm(const int socketid);
> 
> @@ -203,12 +222,19 @@ setup_hash(const int socketid);
>  void
>  setup_fib(const int socketid);
> 
> +int
> +acl_check_ptype(int portid);
> +
>  int
>  em_check_ptype(int portid);
> 
>  int
>  lpm_check_ptype(int portid);

Seems not used/defined.

> +uint16_t
> +acl_cb_parse_ptype(uint16_t port, uint16_t queue, struct rte_mbuf *pkts[],
> +   uint16_t nb_pkts, uint16_t max_pkts, void *user_param);
> +

Same not used/defined.


>  uint16_t
>  em_cb_parse_ptype(uint16_t port, uint16_t queue, struct rte_mbuf *pkts[],
> uint16_t nb_pkts, uint16_t max_pkts, void *user_param);
> @@ -217,6 +243,9 @@ uint16_t
>  lpm_cb_parse_ptype(uint16_t port, uint16_t queue, struct rte_mbuf *pkts[],
>  uint16_t nb_pkts, uint16_t max_pkts, void *user_param);
> 
> +int
> +acl_main_loop(__rte_unused void *dummy);
> +
>  int
>  em_main_loop(__rte_unused void *dummy);
> 
> @@ -278,7 +307,13 @@ int
>  fib_event_main_loop_tx_q_burst_vector(__rte_unused void *dummy);
> 
> 
> -/* Return ipv4/ipv6 fwd lookup struct for LPM, EM or FIB. */
> +/* Return ipv4/ipv6 fwd lookup struct f

Re: [PATCH v4] examples/l3fwd: merge l3fwd-acl into l3fwd

2022-04-08 Thread Morrissey, Sean

Hi Konstantin,

Comments on some of your feedback are below. I will make the rest of the 
changes and send a new version.


On 08/04/2022 18:26, Ananyev, Konstantin wrote:

Hi Sean,

Few nits, that I didn't spot previously, pls see below.
  

+
+/* Setup ACL context. 8< */

Looks like some typo within comments.


I believe these characters are needed in the comments to state the start 
and end of the automated code snippets for the docs.



+static struct rte_acl_ctx*
+app_acl_init(struct rte_acl_rule *route_base,
+   struct rte_acl_rule *acl_base, unsigned int route_num,
+   unsigned int acl_num, int ipv6, int socketid)
+{
+   char name[PATH_MAX];
+   struct rte_acl_param acl_param;
+   struct rte_acl_config acl_build_param;
+   struct rte_acl_ctx *context;
+   int dim = ipv6 ? RTE_DIM(ipv6_defs) : RTE_DIM(ipv4_defs);
+
+   /* Create ACL contexts */
+   snprintf(name, sizeof(name), "%s%d",
+   ipv6 ? L3FWD_ACL_IPV6_NAME : L3FWD_ACL_IPV4_NAME,
+   socketid);
+
+   acl_param.name = name;
+   acl_param.socket_id = socketid;
+   acl_param.rule_size = RTE_ACL_RULE_SZ(dim);
+   acl_param.max_rule_num = MAX_ACL_RULE_NUM;
+
+   context = rte_acl_create(&acl_param);
+   if (context == NULL)
+   rte_exit(EXIT_FAILURE, "Failed to create ACL context\n");
+
+   if (parm_config.alg != RTE_ACL_CLASSIFY_DEFAULT &&
+   rte_acl_set_ctx_classify(context, parm_config.alg) != 0)
+   rte_exit(EXIT_FAILURE,
+   "Failed to setup classify method for  ACL context\n");
+
+   if (rte_acl_add_rules(context, route_base, route_num) < 0)
+   rte_exit(EXIT_FAILURE, "add rules failed\n");
+
+   if (rte_acl_add_rules(context, acl_base, acl_num) < 0)
+   rte_exit(EXIT_FAILURE, "add rules failed\n");
+
+   /* Perform builds */
+   memset(&acl_build_param, 0, sizeof(acl_build_param));
+
+   acl_build_param.num_categories = DEFAULT_MAX_CATEGORIES;
+   acl_build_param.num_fields = dim;
+   memcpy(&acl_build_param.defs, ipv6 ? ipv6_defs : ipv4_defs,
+   ipv6 ? sizeof(ipv6_defs) : sizeof(ipv4_defs));
+
+   if (rte_acl_build(context, &acl_build_param) != 0)
+   rte_exit(EXIT_FAILURE, "Failed to build ACL trie\n");
+
+   rte_acl_dump(context);
+
+   return context;
+}
+/* >8 End of ACL context setup. */

Typo in comments.


Same as above.



[RFC 0/1] net/iavf: add vector PMD for Arm for basic Rx path

2022-04-08 Thread Kathleen Capella
This patch aims to add the basic NEON Rx path to the iavf driver. Currently,
the main Rx function (_recv_raw_pkts_vec) and the functions it depends on
have been implemented. Also, the NEON vector path has been added to
iavf_set_rx_function. The code compiles on the N1SDP platform and some
traffic testing has been done with the testpmd application.

Still to be done as part of 22.07:
 - add FDIR extraction
 - functional testing
 - performance testing

Scatter and flex Rx paths will be deferred until a later release.

Kathleen Capella (1):
  net/iavf: add vector PMD for Arm for basic Rx path

 drivers/net/iavf/iavf_rxtx.c  |  12 +-
 drivers/net/iavf/iavf_rxtx_vec_neon.c | 392 ++
 drivers/net/iavf/meson.build  |   2 +
 3 files changed, 404 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/iavf/iavf_rxtx_vec_neon.c

-- 
2.17.1



[RFC 1/1] net/iavf: add vector PMD for Arm for basic Rx path

2022-04-08 Thread Kathleen Capella
This patch adds the basic NEON Rx path to the iavf driver. It does not
include scatter or flex varieties.

Signed-off-by: Kathleen Capella 
---
 drivers/net/iavf/iavf_rxtx.c  |  12 +-
 drivers/net/iavf/iavf_rxtx_vec_neon.c | 392 ++
 drivers/net/iavf/meson.build  |   2 +
 3 files changed, 404 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/iavf/iavf_rxtx_vec_neon.c

diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index 16e8d021f9..27a75bb358 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -2795,11 +2795,12 @@ iavf_set_rx_function(struct rte_eth_dev *dev)
struct iavf_adapter *adapter =
IAVF_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
+   int check_ret;
 
 #ifdef RTE_ARCH_X86
struct iavf_rx_queue *rxq;
int i;
-   int check_ret;
+
bool use_avx2 = false;
bool use_avx512 = false;
bool use_flex = false;
@@ -2921,8 +2922,15 @@ iavf_set_rx_function(struct rte_eth_dev *dev)
 
return;
}
+#endif /* RTE_ARCH_X86 */
 
-#endif
+   check_ret = iavf_rx_vec_dev_check(dev);
+   if (check_ret >= 0) {
+   PMD_DRV_LOG(DEBUG, "Using a Vector Rx callback (port=%d).",
+   dev->data->port_id);
+   dev->rx_pkt_burst = iavf_recv_pkts_vec;
+   return;
+   }
if (dev->data->scattered_rx) {
PMD_DRV_LOG(DEBUG, "Using a Scattered Rx callback (port=%d).",
dev->data->port_id);
diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c
new file mode 100644
index 00..e1f1ce7ba9
--- /dev/null
+++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c
@@ -0,0 +1,392 @@
+#include 
+#include 
+#include 
+#include 
+
+#include "iavf.h"
+#include "iavf_rxtx.h"
+#include "iavf_rxtx_vec_common.h"
+
+
+#pragma GCC diagnostic ignored "-Wcast-qual"
+
+static inline void
+iavf_rxq_rearm(struct iavf_rx_queue *rxq)
+{
+   int i;
+   uint16_t rx_id;
+   volatile union iavf_rx_desc *rxdp;
+   struct rte_mbuf **rxep = &rxq->sw_ring[rxq->rxrearm_start];
+   struct rte_mbuf *mb0, *mb1;
+   uint64x2_t dma_addr0, dma_addr1;
+   uint64x2_t zero = vdupq_n_u64(0);
+   uint64_t paddr;
+
+   rxdp = rxq->rx_ring + rxq->rxrearm_start;
+
+   /* Pull 'n' more MBUFs into the software ring */
+   if (unlikely(rte_mempool_get_bulk(rxq->mp,
+ (void *)rxep,
+ IAVF_RXQ_REARM_THRESH) < 0)) {
+   if (rxq->rxrearm_nb + IAVF_RXQ_REARM_THRESH >=
+   rxq->nb_rx_desc) {
+   for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) {
+   rxep[i] = &rxq->fake_mbuf;
+   vst1q_u64((uint64_t *)&rxdp[i].read, zero);
+   }
+   }
+   rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+   IAVF_RXQ_REARM_THRESH;
+   return;
+   }
+
+   /* Initialize the mbufs in vector, process 2 mbufs in one loop */
+   for (i = 0; i < IAVF_RXQ_REARM_THRESH; i += 2, rxep += 2) {
+   mb0 = rxep[0];
+   mb1 = rxep[1];
+
+   paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM;
+   dma_addr0 = vdupq_n_u64(paddr);
+
+   /* flush desc with pa dma_addr */
+   vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
+
+   paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM;
+   dma_addr1 = vdupq_n_u64(paddr);
+   vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1);
+   }
+
+   rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH;
+   if (rxq->rxrearm_start >= rxq->nb_rx_desc)
+   rxq->rxrearm_start = 0;
+
+   rxq->rxrearm_nb -= IAVF_RXQ_REARM_THRESH;
+
+   rx_id = (uint16_t)((rxq->rxrearm_start == 0) ?
+(rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1));
+
+   rte_io_wmb();
+   /* Update the tail pointer on the NIC */
+   IAVF_PCI_REG_WRITE_RELAXED(rxq->qrx_tail, rx_id);
+}
+
+static inline void
+desc_to_olflags_v(struct iavf_rx_queue *rxq, volatile union iavf_rx_desc *rxdp,
+ uint64x2_t descs[4], struct rte_mbuf **rx_pkts)
+{
+   uint32x4_t vlan0, vlan1, rss, l3_l4e;
+   const uint64x2_t mbuf_init = {rxq->mbuf_initializer, 0};
+   uint64x2_t rearm0, rearm1, rearm2, rearm3;
+
+   /* mask everything except RSS, flow director and VLAN flags
+* bit2 is for VLAN tag, bit11 for flow director indication
+* bit13:12 for RSS indication.
+*/
+   const uint32x4_t rss_vlan_msk = {
+   0x1c03804, 0x1c03804, 0x1c03804, 0x1c03804};
+
+   const uint32x4_t cksum_mask = {

[PATCH] pipeline: support default action arguments

2022-04-08 Thread Cristian Dumitrescu
Add support for default action arguments. Up to now, only default
actions with no arguments were accepted.

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Yogesh Jangra 
---
 lib/pipeline/rte_swx_pipeline.c  | 200 +--
 lib/pipeline/rte_swx_pipeline.h  |  18 +--
 lib/pipeline/rte_swx_pipeline_spec.c |  14 +-
 3 files changed, 176 insertions(+), 56 deletions(-)

diff --git a/lib/pipeline/rte_swx_pipeline.c b/lib/pipeline/rte_swx_pipeline.c
index 17da11c015..e1aee68225 100644
--- a/lib/pipeline/rte_swx_pipeline.c
+++ b/lib/pipeline/rte_swx_pipeline.c
@@ -7337,6 +7337,91 @@ action_arg_src_mov_count(struct action *a,
return n_users;
 }
 
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+#define field_ntoh(val, n_bits) (ntoh64((val) << (64 - n_bits)))
+#define field_hton(val, n_bits) (hton64((val) << (64 - n_bits)))
+#else
+#define field_ntoh(val, n_bits) (val)
+#define field_hton(val, n_bits) (val)
+#endif
+
+#define ACTION_ARGS_TOKENS_MAX 256
+
+static int
+action_args_parse(struct action *a, const char *args, uint8_t *data)
+{
+   char *tokens[ACTION_ARGS_TOKENS_MAX], *s0 = NULL, *s;
+   uint32_t n_tokens = 0, offset = 0, i;
+   int status = 0;
+
+   /* Checks. */
+   if (!a->st || !args || !args[0]) {
+   status = -EINVAL;
+   goto error;
+   }
+
+   /* Memory allocation. */
+   s0 = strdup(args);
+   if (!s0) {
+   status = -ENOMEM;
+   goto error;
+   }
+
+   /* Parse the string into tokens. */
+   for (s = s0; ; ) {
+   char *token;
+
+   token = strtok_r(s, " \f\n\r\t\v", &s);
+   if (!token)
+   break;
+
+   if (n_tokens >= RTE_DIM(tokens)) {
+   status = -EINVAL;
+   goto error;
+   }
+
+   tokens[n_tokens] = token;
+   n_tokens++;
+   }
+
+   /* More checks. */
+   if (n_tokens != a->st->n_fields * 2) {
+   status = -EINVAL;
+   goto error;
+   }
+
+   /* Process the action arguments. */
+   for (i = 0; i < a->st->n_fields; i++) {
+   struct field *f = &a->st->fields[i];
+   char *arg_name = tokens[i * 2];
+   char *arg_val = tokens[i * 2 + 1];
+   uint64_t val;
+
+   if (strcmp(arg_name, f->name)) {
+   status = -EINVAL;
+   goto error;
+   }
+
+   val = strtoull(arg_val, &arg_val, 0);
+   if (arg_val[0]) {
+   status = -EINVAL;
+   goto error;
+   }
+
+   /* Endianness conversion. */
+   if (a->args_endianness[i])
+   val = field_hton(val, f->n_bits);
+
+   /* Copy to entry. */
+   memcpy(&data[offset], (uint8_t *)&val, f->n_bits / 8);
+   offset += f->n_bits / 8;
+   }
+
+error:
+   free(s0);
+   return status;
+}
+
 /*
  * Table.
  */
@@ -7609,8 +7694,8 @@ rte_swx_pipeline_table_config(struct rte_swx_pipeline *p,
  EINVAL);
 
default_action = action_find(p, params->default_action_name);
-   CHECK((default_action->st && params->default_action_data) ||
- !params->default_action_data, EINVAL);
+   CHECK((default_action->st && params->default_action_args) || !params->default_action_args,
+ EINVAL);
 
/* Table type checks. */
if (recommended_table_type_name)
@@ -7631,30 +7716,42 @@ rte_swx_pipeline_table_config(struct rte_swx_pipeline *p,
 
/* Memory allocation. */
t = calloc(1, sizeof(struct table));
-   if (!t)
-   goto nomem;
+   if (!t) {
+   status = -ENOMEM;
+   goto error;
+   }
 
t->fields = calloc(params->n_fields, sizeof(struct match_field));
-   if (!t->fields)
-   goto nomem;
+   if (!t->fields) {
+   status = -ENOMEM;
+   goto error;
+   }
 
t->actions = calloc(params->n_actions, sizeof(struct action *));
-   if (!t->actions)
-   goto nomem;
+   if (!t->actions) {
+   status = -ENOMEM;
+   goto error;
+   }
 
if (action_data_size_max) {
t->default_action_data = calloc(1, action_data_size_max);
-   if (!t->default_action_data)
-   goto nomem;
+   if (!t->default_action_data) {
+   status = -ENOMEM;
+   goto error;
+   }
}
 
t->action_is_for_table_entries = calloc(params->n_actions, sizeof(int));
-   if (!t->action_is_for_table_entries)
-   goto nomem;
+   if (!t->action_is_for_table_entries) {
+   status = -ENOMEM;
+   goto error;
+   }
 
t->action_is_for_default_entry

[PATCH v4] examples/kni: add interrupt mode to receive packets

2022-04-08 Thread Tianli Lai
The kni application has two main-loop threads whose CPU utilization is
up to 100 percent: a writing thread and a reading thread. I think
setting interrupt mode in the reading thread would reduce that thread's
CPU utilization.
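
In outline, the Rx-side change amounts to something like the sketch below
(names are taken from the patch; the exact loop structure in main_loop()
may differ):

    ret = kni_ingress(kni_port_params_array[i]);
    if (ret == 1)
        zero_rx_packet_count++;
    else
        zero_rx_packet_count = 0;

    if (intr_en && zero_rx_packet_count > MIN_ZERO_POLL_COUNT) {
        /* nothing received for a while: arm the Rx interrupt and sleep */
        turn_on_off_intr(kni_port_params_array[i]->port_id, 0, true);
        sleep_until_rx_interrupt(1, lcore_id);
        turn_on_off_intr(kni_port_params_array[i]->port_id, 0, false);
        zero_rx_packet_count = 0;
    }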

Signed-off-by: Tianli Lai 
---
 examples/kni/main.c | 91 +
 1 file changed, 84 insertions(+), 7 deletions(-)

diff --git a/examples/kni/main.c b/examples/kni/main.c
index e99ef5c38a..d4d7a3daa9 100644
--- a/examples/kni/main.c
+++ b/examples/kni/main.c
@@ -73,6 +73,7 @@
 
 #define KNI_US_PER_SECOND   100
 #define KNI_SECOND_PER_DAY  86400
+#define MIN_ZERO_POLL_COUNT 100
 
 #define KNI_MAX_KTHREAD 32
 /*
@@ -107,6 +108,8 @@ static uint32_t ports_mask = 0;
 static int promiscuous_on = 0;
 /* Monitor link status continually. off by default. */
 static int monitor_links;
+/* rx set in interrupt mode off by default. */
+static int intr_rx_en;
 
 /* Structure type for recording kni interface specific stats */
 struct kni_interface_stats {
@@ -206,7 +209,7 @@ kni_burst_free_mbufs(struct rte_mbuf **pkts, unsigned num)
 /**
  * Interface to burst rx and enqueue mbufs into rx_q
  */
-static void
+static int
 kni_ingress(struct kni_port_params *p)
 {
uint8_t i;
@@ -214,9 +217,9 @@ kni_ingress(struct kni_port_params *p)
unsigned nb_rx, num;
uint32_t nb_kni;
struct rte_mbuf *pkts_burst[PKT_BURST_SZ];
-
+   int ret = 0;
if (p == NULL)
-   return;
+   return -1;
 
nb_kni = p->nb_kni;
port_id = p->port_id;
@@ -225,8 +228,10 @@ kni_ingress(struct kni_port_params *p)
nb_rx = rte_eth_rx_burst(port_id, 0, pkts_burst, PKT_BURST_SZ);
if (unlikely(nb_rx > PKT_BURST_SZ)) {
RTE_LOG(ERR, APP, "Error receiving from eth\n");
-   return;
+   return -1;
}
+   if (nb_rx == 0)
+   ret = 1;
/* Burst tx to kni */
num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
if (num)
@@ -239,6 +244,7 @@ kni_ingress(struct kni_port_params *p)
kni_stats[port_id].rx_dropped += nb_rx - num;
}
}
+   return ret;
 }
 
 /**
@@ -277,12 +283,56 @@ kni_egress(struct kni_port_params *p)
}
 }
 
+static int
+sleep_until_rx_interrupt(int num, int lcore)
+{
+   static struct {
+   bool wakeup;
+   } __rte_cache_aligned status[RTE_MAX_LCORE];
+   struct rte_epoll_event event[num];
+   int n;
+
+   if (status[lcore].wakeup) {
+   RTE_LOG(DEBUG, APP,
+   "lcore %u sleeps until interrupt triggers\n",
+   rte_lcore_id());
+   }
+   n = rte_epoll_wait(RTE_EPOLL_PER_THREAD, event, num, 10);
+   status[lcore].wakeup = n != 0;
+
+   return 0;
+}
+
+static void
+turn_on_off_intr(uint16_t port_id, uint16_t queue_id, bool on)
+{
+   if (on)
+   rte_eth_dev_rx_intr_enable(port_id, queue_id);
+   else
+   rte_eth_dev_rx_intr_disable(port_id, queue_id);
+}
+
+static int event_register(void)
+{
+   int ret;
+
+   ret = rte_eth_dev_rx_intr_ctl_q(0, 0,
+   RTE_EPOLL_PER_THREAD,
+   RTE_INTR_EVENT_ADD, NULL);
+   if (ret)
+   return ret;
+
+   return 0;
+}
+
 static int
 main_loop(__rte_unused void *arg)
 {
uint16_t i;
int32_t f_stop;
int32_t f_pause;
+   int ret = 0;
+   uint32_t zero_rx_packet_count = 0;
const unsigned lcore_id = rte_lcore_id();
enum lcore_rxtx {
LCORE_NONE,
@@ -291,12 +341,17 @@ main_loop(__rte_unused void *arg)
LCORE_MAX
};
enum lcore_rxtx flag = LCORE_NONE;
+   int intr_en = 0;
 
RTE_ETH_FOREACH_DEV(i) {
if (!kni_port_params_array[i])
continue;
if (kni_port_params_array[i]->lcore_rx == (uint8_t)lcore_id) {
flag = LCORE_RX;
+   if (intr_rx_en && !event_register())
+   intr_en = 1;
+   else
+   RTE_LOG(DEBUG, APP, "RX interrupt won't enable.\n");
break;
} else if (kni_port_params_array[i]->lcore_tx ==
(uint8_t)lcore_id) {
@@ -316,7 +371,23 @@ main_loop(__rte_unused void *arg)
break;
if (f_pause)
continue;
-   kni_ingress(kni_port_params_array[i]);
+   ret = kni_ingress(kni_port_params_array[i]);
+   if (ret == 1) {
+   zero_rx_packet_count++;
+