RE: [PATCH v2 06/15] vdpa/mlx5: pre-create virtq in the prob
Thanks for your comments and will fix it on V3. Regards, Li Zhang > -Original Message- > From: Maxime Coquelin > Sent: Friday, June 17, 2022 11:54 PM > To: Li Zhang ; Ori Kam ; Slava > Ovsiienko ; Matan Azrad ; > Shahaf Shuler > Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL) > ; Raslan Darawsheh ; Roni > Bar Yanai > Subject: Re: [PATCH v2 06/15] vdpa/mlx5: pre-create virtq in the prob > > External email: Use caution opening links or attachments > > > I would rename the title to something like: > > "vdpa/mlx5: pre-create virtq at probe time" > > On 6/16/22 04:30, Li Zhang wrote: > > dev_config operation is called in LM progress. > > LM time is very critical because all > > the VM packets are dropped directly at that time. > > > > Move the virtq creation to probe time and only modify the > > configuration later in the dev_config stage using the new ability to > > modify virtq. > > > > This optimization accelerates the LM process and reduces its time by > > 70%. > > Nice. > > > Signed-off-by: Li Zhang > > Acked-by: Matan Azrad > > --- > > doc/guides/rel_notes/release_22_07.rst | 4 + > > drivers/vdpa/mlx5/mlx5_vdpa.h | 4 + > > drivers/vdpa/mlx5/mlx5_vdpa_lm.c | 13 +- > > drivers/vdpa/mlx5/mlx5_vdpa_virtq.c| 257 +++-- > > 4 files changed, 174 insertions(+), 104 deletions(-) > > > > diff --git a/doc/guides/rel_notes/release_22_07.rst > > b/doc/guides/rel_notes/release_22_07.rst > > index f2cf41def9..2056cd9ee7 100644 > > --- a/doc/guides/rel_notes/release_22_07.rst > > +++ b/doc/guides/rel_notes/release_22_07.rst > > @@ -175,6 +175,10 @@ New Features > > This is a fall-back implementation for platforms that > > don't support vector operations. > > > > +* **Updated Nvidia mlx5 vDPA driver.** > > + > > + * Added new devargs ``queue_size`` and ``queues`` to allow prior > creation of virtq resources. > > + > > > > Removed Items > > - > > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h > > b/drivers/vdpa/mlx5/mlx5_vdpa.h index bf82026e37..e5553079fe 100644 > > --- a/drivers/vdpa/mlx5/mlx5_vdpa.h > > +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h > > @@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq { > > uint16_t vq_size; > > uint8_t notifier_state; > > bool stopped; > > + uint32_t configured:1; > > uint32_t version; > > struct mlx5_vdpa_priv *priv; > > struct mlx5_devx_obj *virtq; > > @@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct > mlx5_vdpa_priv *priv, int qid); > >*/ > > void > > mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv); > > + > > +bool > > +mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv); > > #endif /* RTE_PMD_MLX5_VDPA_H_ */ > > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c > > b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c > > index 43a2b98255..a8faf0c116 100644 > > --- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c > > +++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c > > @@ -12,14 +12,17 @@ int > > mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable) > > { > > struct mlx5_devx_virtq_attr attr = { > > - .type = > MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE, > > + .mod_fields_bitmap = > > + MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE, > > .dirty_bitmap_dump_enable = enable, > > }; > > + struct mlx5_vdpa_virtq *virtq; > > int i; > > > > for (i = 0; i < priv->nr_virtqs; ++i) { > > attr.queue_index = i; > > - if (!priv->virtqs[i].virtq) { > > + virtq = &priv->virtqs[i]; > > + if (!virtq->configured) { > > DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap " > > "enabling.", i); > > Please avoid cutting logs, it makes it harder to grep in the code. 
> Also, now we can have up to 100 chars, so maybe it would fit anyway. > > > Other than that: > > Reviewed-by: Maxime Coquelin > > Thanks, > Maxime
RE: [PATCH v2 02/15] vdpa/mlx5: support pre create virtq resource
Thanks for your comment and will fix it on V3. Regards, Li Zhang > -Original Message- > From: Maxime Coquelin > Sent: Friday, June 17, 2022 11:37 PM > To: Li Zhang ; Ori Kam ; Slava > Ovsiienko ; Matan Azrad ; > Shahaf Shuler > Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL) > ; Raslan Darawsheh ; Roni > Bar Yanai ; Yajun Wu > Subject: Re: [PATCH v2 02/15] vdpa/mlx5: support pre create virtq resource > > External email: Use caution opening links or attachments > > > On 6/16/22 04:29, Li Zhang wrote: > > From: Yajun Wu > > > > The motivation of this change is to reduce vDPA device queue creation > > time by create some queue resource in vDPA device probe stage. > > s/create/creating/ > > > > > In VM live migration scenario, this can reduce 0.8ms for each queue > > creation, thus reduce LM network downtime. > > > > To create queue resource(umem/counter) in advance, we need to know > > virtio queue depth and max number of queue VM will use. > > > > Introduce two new devargs: queues(max queue pair number) and > > queue_size (queue depth). Two args must be both provided, if only one > > argument provided, the argument will be ignored and no pre-creation. > > > > The queues and queue_size must also be identical to vhost > > configuration driver later receive. Otherwise either the pre-create > > resource is wasted or missing or the resource need destroy and > > recreate(in case queue_size mismatch). > > > > Pre-create umem/counter will keep alive until vDPA device removal. > > > > Signed-off-by: Yajun Wu > > Acked-by: Matan Azrad > > --- > > doc/guides/vdpadevs/mlx5.rst | 14 +++ > > drivers/vdpa/mlx5/mlx5_vdpa.c | 75 > ++- > > drivers/vdpa/mlx5/mlx5_vdpa.h | 2 + > > 3 files changed, 89 insertions(+), 2 deletions(-) > > > > Reviewed-by: Maxime Coquelin > > Thanks, > Maxime
[PATCH v3 00/15] mlx5/vdpa: optimize live migration time
Allow the driver to use internal threads to obtain fast configuration. All the threads will be opened on the same core as the event completion queue scheduling thread. Add a max_conf_threads parameter to configure the maximum number of internal threads in addition to the caller thread (8 is suggested). These internal threads pipeline the handling of vDPA tasks in the system and are shared among all vDPA devices. The default is 0: internal threads are not used for configuration.

Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
http://patchwork.dpdk.org/project/dpdk/list/?series=21868

RFC ("Add vDPA multi-threads optimization")
https://patchwork.dpdk.org/project/dpdk/cover/20220408075606.33056-1-l...@nvidia.com/

V2:
* Drop the eal device removal patch from the series.
* Add a release note in release_22_07.rst.

V3:
* Fix commit log issues raised in review.
* Avoid cutting logs.

Li Zhang (12):
  vdpa/mlx5: fix usage of capability for max number of virtqs
  common/mlx5: extend virtq modifiable fields
  vdpa/mlx5: pre-create virtq at probe time
  vdpa/mlx5: optimize datapath-control synchronization
  vdpa/mlx5: add multi-thread management for configuration
  vdpa/mlx5: add task ring for MT management
  vdpa/mlx5: add MT task for VM memory registration
  vdpa/mlx5: add virtq creation task for MT management
  vdpa/mlx5: add virtq LM log task
  vdpa/mlx5: add device close task
  vdpa/mlx5: add virtq sub-resources creation
  vdpa/mlx5: prepare virtqueue resource creation

Yajun Wu (3):
  vdpa/mlx5: support pre create virtq resource
  common/mlx5: add DevX API to move QP to reset state
  vdpa/mlx5: support event qp reuse

 doc/guides/rel_notes/release_22_07.rst |   5 +
 doc/guides/vdpadevs/mlx5.rst           |  25 +
 drivers/common/mlx5/mlx5_devx_cmds.c   |  77 ++-
 drivers/common/mlx5/mlx5_devx_cmds.h   |   6 +-
 drivers/common/mlx5/mlx5_prm.h         |  30 +-
 drivers/vdpa/mlx5/meson.build          |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c          | 270 --
 drivers/vdpa/mlx5/mlx5_vdpa.h          | 152 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c  | 360 ++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c    | 160 --
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c       | 132 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c      | 270 ++
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c    |  22 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 654 ++---
 14 files changed, 1777 insertions(+), 387 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

-- 
2.31.1
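For reference, the devargs introduced across this series combine on the device probe line. A hypothetical invocation with the vdpa example application (the PCI address, core number, and values are placeholders; class=vdpa is the existing mlx5 class selector):

    dpdk-vdpa -a 0000:01:00.2,class=vdpa,event_core=2,max_conf_threads=8,queues=8,queue_size=256 -- --interactive

Here queues/queue_size pre-create the virtq resources at probe time (patch 02) and max_conf_threads enables the shared configuration thread pool (patch 08).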
[PATCH v3 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs
The driver wrongly takes the capability value for the number of virtq pairs instead of just the number of virtqs. Adjust all the usages of it to be the number of virtqs. Fixes: c2eb33a ("vdpa/mlx5: manage virtqs by array") Cc: sta...@dpdk.org Signed-off-by: Li Zhang Acked-by: Matan Azrad Reviewed-by: Maxime Coquelin --- drivers/vdpa/mlx5/mlx5_vdpa.c | 12 ++-- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 6 +++--- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index 76fa5d4299..ee71339b78 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -84,7 +84,7 @@ mlx5_vdpa_get_queue_num(struct rte_vdpa_device *vdev, uint32_t *queue_num) DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name); return -1; } - *queue_num = priv->caps.max_num_virtio_queues; + *queue_num = priv->caps.max_num_virtio_queues / 2; return 0; } @@ -141,7 +141,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state) DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name); return -EINVAL; } - if (vring >= (int)priv->caps.max_num_virtio_queues * 2) { + if (vring >= (int)priv->caps.max_num_virtio_queues) { DRV_LOG(ERR, "Too big vring id: %d.", vring); return -E2BIG; } @@ -388,7 +388,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid, DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name); return -ENODEV; } - if (qid >= (int)priv->caps.max_num_virtio_queues * 2) { + if (qid >= (int)priv->caps.max_num_virtio_queues) { DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid, vdev->device->name); return -E2BIG; @@ -411,7 +411,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid) DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name); return -ENODEV; } - if (qid >= (int)priv->caps.max_num_virtio_queues * 2) { + if (qid >= (int)priv->caps.max_num_virtio_queues) { DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid, vdev->device->name); return -E2BIG; @@ -624,7 +624,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev, DRV_LOG(DEBUG, "No capability to support virtq statistics."); priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv) + sizeof(struct mlx5_vdpa_virtq) * - attr->vdpa.max_num_virtio_queues * 2, + attr->vdpa.max_num_virtio_queues, RTE_CACHE_LINE_SIZE); if (!priv) { DRV_LOG(ERR, "Failed to allocate private memory."); @@ -685,7 +685,7 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv) uint32_t i; mlx5_vdpa_dev_cache_clean(priv); - for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) { + for (i = 0; i < priv->caps.max_num_virtio_queues; i++) { if (!priv->virtqs[i].counters) continue; claim_zero(mlx5_devx_cmd_destroy(priv->virtqs[i].counters)); diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c index e025be47d2..c258eb3024 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c @@ -72,7 +72,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv) { unsigned int i, j; - for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) { + for (i = 0; i < priv->caps.max_num_virtio_queues; i++) { struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i]; for (j = 0; j < RTE_DIM(virtq->umems); ++j) { @@ -492,9 +492,9 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv) DRV_LOG(INFO, "TSO is enabled without CSUM, force CSUM."); priv->features |= (1ULL << VIRTIO_NET_F_CSUM); } - if (nr_vring > priv->caps.max_num_virtio_queues * 2) { + if (nr_vring > priv->caps.max_num_virtio_queues) { DRV_LOG(ERR, "Do not 
support more than %d virtqs(%d).", - (int)priv->caps.max_num_virtio_queues * 2, + (int)priv->caps.max_num_virtio_queues, (int)nr_vring); return -1; } -- 2.31.1
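As a reading aid for the fix above, condensed from the hunks themselves (all identifiers are from the patch): the capability now consistently counts virtqs, while vhost is told the number of queue pairs:

    /* caps.max_num_virtio_queues counts virtqs (RX + TX together), so the
     * pair count reported through get_queue_num is half of it, and a vring
     * index is valid only below the plain virtq count.
     */
    *queue_num = priv->caps.max_num_virtio_queues / 2;  /* queue pairs */
    if (vring >= (int)priv->caps.max_num_virtio_queues) /* virtq index */
            return -E2BIG;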
[PATCH v3 02/15] vdpa/mlx5: support pre create virtq resource
From: Yajun Wu

The motivation of this change is to reduce vDPA device queue creation time by creating some queue resources in the vDPA device probe stage.

In a VM live migration scenario, this can save 0.8 ms for each queue creation and thus reduce the LM network downtime.

To create the queue resources (umem/counter) in advance, we need to know the virtio queue depth and the maximum number of queues the VM will use.

Introduce two new devargs: queues (max queue pair number) and queue_size (queue depth). Both arguments must be provided; if only one of them is given, it is ignored and no pre-creation is done.

The queues and queue_size values must also be identical to the vhost configuration the driver receives later. Otherwise the pre-created resources are either wasted or missing, or must be destroyed and recreated (in case of a queue_size mismatch).

Pre-created umem/counter resources are kept alive until vDPA device removal.

Signed-off-by: Yajun Wu
Acked-by: Matan Azrad
Reviewed-by: Maxime Coquelin
---
 doc/guides/vdpadevs/mlx5.rst  | 14 +++
 drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++-
 drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
 3 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst index 3ded142311..0ad77bf535 100644 --- a/doc/guides/vdpadevs/mlx5.rst +++ b/doc/guides/vdpadevs/mlx5.rst @@ -101,6 +101,20 @@ for an additional list of options shared with other mlx5 drivers. - 0, HW default. +- ``queue_size`` parameter [int] + + - 1 - 1024, Virtio Queue depth for pre-creating queue resource to speed up +first time queue creation. Set it together with queues devarg. + + - 0, default value, no pre-create virtq resource. + +- ``queues`` parameter [int] + + - 1 - 128, Max number of virtio queue pair (including 1 rx queue and 1 tx queue) +for pre-create queue resource to speed up first time queue creation. Set it +together with queue_size devarg. + + - 0, default value, no pre-create virtq resource. Error handling ^^ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index ee71339b78..faf833ee2f 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -244,7 +244,9 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv) static void mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv) { - mlx5_vdpa_virtqs_cleanup(priv); + /* Clean pre-created resource in dev removal only. 
*/ + if (!priv->queues) + mlx5_vdpa_virtqs_cleanup(priv); mlx5_vdpa_mem_dereg(priv); } @@ -494,6 +496,12 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque) priv->hw_max_latency_us = (uint32_t)tmp; } else if (strcmp(key, "hw_max_pending_comp") == 0) { priv->hw_max_pending_comp = (uint32_t)tmp; + } else if (strcmp(key, "queue_size") == 0) { + priv->queue_size = (uint16_t)tmp; + } else if (strcmp(key, "queues") == 0) { + priv->queues = (uint16_t)tmp; + } else { + DRV_LOG(WARNING, "Invalid key %s.", key); } return 0; } @@ -524,9 +532,68 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist, if (!priv->event_us && priv->event_mode == MLX5_VDPA_EVENT_MODE_DYNAMIC_TIMER) priv->event_us = MLX5_VDPA_DEFAULT_TIMER_STEP_US; + if ((priv->queue_size && !priv->queues) || + (!priv->queue_size && priv->queues)) { + priv->queue_size = 0; + priv->queues = 0; + DRV_LOG(WARNING, "Please provide both queue_size and queues."); + } DRV_LOG(DEBUG, "event mode is %d.", priv->event_mode); DRV_LOG(DEBUG, "event_us is %u us.", priv->event_us); DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max); + DRV_LOG(DEBUG, "queues is %u, queue_size is %u.", priv->queues, + priv->queue_size); +} + +static int +mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv) +{ + uint32_t index; + uint32_t i; + + if (!priv->queues) + return 0; + for (index = 0; index < (priv->queues * 2); ++index) { + struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index]; + + if (priv->caps.queue_counters_valid) { + if (!virtq->counters) + virtq->counters = + mlx5_devx_cmd_create_virtio_q_counters + (priv->cdev->ctx); + if (!virtq->counters) { + DRV_LOG(ERR, "Failed to create virtq couners for virtq" + " %d.", index); + return -1; + } + } + for (i = 0; i < RTE_DIM(virtq->umems); ++i) { + uint32_t size; + void *buf; +
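The umem sizing in the pre-creation loop above is linear in the queue depth, using per-umem HW coefficients from the device capabilities. A condensed sketch of that loop (the calls are the ones used by the patch; error handling is trimmed):

    /* Size each virtq umem as a * queue_size + b from the HW capability
     * coefficients, then register the buffer for DevX access.
     */
    for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
            uint32_t size = priv->caps.umems[i].a * priv->queue_size +
                            priv->caps.umems[i].b;
            void *buf = rte_zmalloc(__func__, size, 4096);
            struct mlx5dv_devx_umem *obj = mlx5_glue->devx_umem_reg
                    (priv->cdev->ctx, buf, size, IBV_ACCESS_LOCAL_WRITE);

            virtq->umems[i].size = size;
            virtq->umems[i].buf = buf;
            virtq->umems[i].obj = obj;
    }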
[PATCH v3 04/15] vdpa/mlx5: support event qp reuse
From: Yajun Wu To speed up queue create time, event qp and cq will create only once. Each virtq creation will reuse same event qp and cq. Because FW will set event qp to error state during virtq destroy, need modify event qp to RESET state, then modify qp to RTS state as usual. This can save about 1.5ms for each virtq creation. After SW qp reset, qp pi/ci all become 0 while cq pi/ci keep as previous. Add new variable qp_ci to save SW qp ci. Move qp pi independently with cq ci. Add new function mlx5_vdpa_drain_cq to drain cq CQE after virtq release. Signed-off-by: Yajun Wu Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.c | 8 drivers/vdpa/mlx5/mlx5_vdpa.h | 12 +- drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++-- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 6 +-- 4 files changed, 78 insertions(+), 8 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index faf833ee2f..ee99952e11 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -269,6 +269,7 @@ mlx5_vdpa_dev_close(int vid) } mlx5_vdpa_steer_unset(priv); mlx5_vdpa_virtqs_release(priv); + mlx5_vdpa_drain_cq(priv); if (priv->lm_mr.addr) mlx5_os_wrapped_mkey_destroy(&priv->lm_mr); priv->state = MLX5_VDPA_STATE_PROBED; @@ -555,7 +556,14 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv) return 0; for (index = 0; index < (priv->queues * 2); ++index) { struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index]; + int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size, + -1, &virtq->eqp); + if (ret) { + DRV_LOG(ERR, "Failed to create event QPs for virtq %d.", + index); + return -1; + } if (priv->caps.queue_counters_valid) { if (!virtq->counters) virtq->counters = diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index f6719a3c60..bf82026e37 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -55,6 +55,7 @@ struct mlx5_vdpa_event_qp { struct mlx5_vdpa_cq cq; struct mlx5_devx_obj *fw_qp; struct mlx5_devx_qp sw_qp; + uint16_t qp_pi; }; struct mlx5_vdpa_query_mr { @@ -226,7 +227,7 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv); * @return * 0 on success, -1 otherwise and rte_errno is set. */ -int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n, +int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n, int callfd, struct mlx5_vdpa_event_qp *eqp); /** @@ -479,4 +480,13 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid, */ int mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid); + +/** + * Drain virtq CQ CQE. + * + * @param[in] priv + * The vdpa driver private structure. + */ +void +mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv); #endif /* RTE_PMD_MLX5_VDPA_H_ */ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c index 7167a98db0..b43dca9255 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c @@ -137,7 +137,7 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq) }; uint32_t word; } last_word; - uint16_t next_wqe_counter = cq->cq_ci; + uint16_t next_wqe_counter = eqp->qp_pi; uint16_t cur_wqe_counter; uint16_t comp; @@ -156,9 +156,10 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq) rte_io_wmb(); /* Ring CQ doorbell record. */ cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci); + eqp->qp_pi += comp; rte_io_wmb(); /* Ring SW QP doorbell record. 
*/ - eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size); + eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size); } return comp; } @@ -232,6 +233,25 @@ mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv) return max; } +void +mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv) +{ + unsigned int i; + + for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) { + struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq; + + mlx5_vdpa_queue_complete(cq); + if (cq->cq_obj.cq) { + cq->cq_obj.cqes[0].wqe_counter = + rte_cpu_to_be_16(UINT16_MAX); + priv->virtqs[i].eqp.qp_pi = 0; + if (!cq->armed) + mlx5_vdpa_cq_arm(priv, cq); + } + } +} + /* Wait on all CQs channel for completi
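A sketch of the reset cycle this patch relies on (MLX5_CMD_OP_QP_2RST comes from the companion DevX patch in this series; the sw_qp.qp field name is assumed from the common mlx5 DevX QP wrapper):

    /* FW moved the event QP to error state on virtq destroy: move both
     * QPs back to RESET, then run the usual RST2INIT -> INIT2RTR ->
     * RTR2RTS bring-up before reusing the event QP for a new virtq.
     */
    mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
                                  eqp->sw_qp.qp->id);
    mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp, MLX5_CMD_OP_QP_2RST,
                                  eqp->fw_qp->id);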
[PATCH v3 03/15] common/mlx5: add DevX API to move QP to reset state
From: Yajun Wu Support set QP to RESET state. Signed-off-by: Yajun Wu Acked-by: Matan Azrad Reviewed-by: Maxime Coquelin --- drivers/common/mlx5/mlx5_devx_cmds.c | 7 +++ drivers/common/mlx5/mlx5_prm.h | 17 + 2 files changed, 24 insertions(+) diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c index c6bdbc12bb..1d6d6578d6 100644 --- a/drivers/common/mlx5/mlx5_devx_cmds.c +++ b/drivers/common/mlx5/mlx5_devx_cmds.c @@ -2264,11 +2264,13 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op, uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)]; uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)]; uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)]; + uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_in)]; } in; union { uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)]; uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)]; uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)]; + uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_out)]; } out; void *qpc; int ret; @@ -2311,6 +2313,11 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op, inlen = sizeof(in.rtr2rts); outlen = sizeof(out.rtr2rts); break; + case MLX5_CMD_OP_QP_2RST: + MLX5_SET(2rst_qp_in, &in, qpn, qp->id); + inlen = sizeof(in.qp2rst); + outlen = sizeof(out.qp2rst); + break; default: DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.", qp_st_mod_op); diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h index bc3e70a1d1..8a2f55c33e 100644 --- a/drivers/common/mlx5/mlx5_prm.h +++ b/drivers/common/mlx5/mlx5_prm.h @@ -3657,6 +3657,23 @@ struct mlx5_ifc_init2init_qp_in_bits { u8 reserved_at_800[0x80]; }; +struct mlx5_ifc_2rst_qp_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + u8 syndrome[0x20]; + u8 reserved_at_40[0x40]; +}; + +struct mlx5_ifc_2rst_qp_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + u8 vhca_tunnel_id[0x10]; + u8 op_mod[0x10]; + u8 reserved_at_80[0x8]; + u8 qpn[0x18]; + u8 reserved_at_a0[0x20]; +}; + struct mlx5_ifc_dealloc_pd_out_bits { u8 status[0x8]; u8 reserved_0[0x18]; -- 2.31.1
[PATCH v3 05/15] common/mlx5: extend virtq modifiable fields
A virtq configuration can be modified after the virtq creation. Added the following modifiable fields: 1.address fields: desc_addr/used_addr/available_addr 2.hw_available_index 3.hw_used_index 4.virtio_q_type 5.version type 6.queue mkey 7.feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum 8.event mode: event_mode/event_qpn_or_msix Signed-off-by: Li Zhang Acked-by: Matan Azrad --- drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++- drivers/common/mlx5/mlx5_devx_cmds.h | 6 ++- drivers/common/mlx5/mlx5_prm.h | 13 +- 3 files changed, 76 insertions(+), 13 deletions(-) diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c index 1d6d6578d6..1b68c37092 100644 --- a/drivers/common/mlx5/mlx5_devx_cmds.c +++ b/drivers/common/mlx5/mlx5_devx_cmds.c @@ -545,6 +545,15 @@ mlx5_devx_cmd_query_hca_vdpa_attr(void *ctx, vdpa_attr->log_doorbell_stride = MLX5_GET(virtio_emulation_cap, hcattr, log_doorbell_stride); + vdpa_attr->vnet_modify_ext = + MLX5_GET(virtio_emulation_cap, hcattr, +vnet_modify_ext); + vdpa_attr->virtio_net_q_addr_modify = + MLX5_GET(virtio_emulation_cap, hcattr, +virtio_net_q_addr_modify); + vdpa_attr->virtio_q_index_modify = + MLX5_GET(virtio_emulation_cap, hcattr, +virtio_q_index_modify); vdpa_attr->log_doorbell_bar_size = MLX5_GET(virtio_emulation_cap, hcattr, log_doorbell_bar_size); @@ -2074,27 +2083,66 @@ mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj, MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type, MLX5_GENERAL_OBJ_TYPE_VIRTQ); MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id); - MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type); + MLX5_SET64(virtio_net_q, virtq, modify_field_select, + attr->mod_fields_bitmap); MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index); - switch (attr->type) { - case MLX5_VIRTQ_MODIFY_TYPE_STATE: + if (!attr->mod_fields_bitmap) { + DRV_LOG(ERR, "Failed to modify VIRTQ for no type set."); + rte_errno = EINVAL; + return -rte_errno; + } + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_STATE) MLX5_SET16(virtio_net_q, virtq, state, attr->state); - break; - case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS: + if (attr->mod_fields_bitmap & + MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS) { MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey, attr->dirty_bitmap_mkey); MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr, attr->dirty_bitmap_addr); MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size, attr->dirty_bitmap_size); - break; - case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE: + } + if (attr->mod_fields_bitmap & + MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE) MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable, attr->dirty_bitmap_dump_enable); - break; - default: - rte_errno = EINVAL; - return -rte_errno; + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD) { + MLX5_SET(virtio_q, virtctx, queue_period_mode, + attr->hw_latency_mode); + MLX5_SET(virtio_q, virtctx, queue_period_us, + attr->hw_max_latency_us); + MLX5_SET(virtio_q, virtctx, queue_max_count, + attr->hw_max_pending_comp); + } + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_ADDR) { + MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr); + MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr); + MLX5_SET64(virtio_q, virtctx, available_addr, + attr->available_addr); + } + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX) + MLX5_SET16(virtio_net_q, virtq, hw_available_index, + attr->hw_available_index); + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX) 
+ MLX5_SET16(virtio_net_q, virtq, hw_used_index, + attr->hw_used_index); + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE) + MLX5_SET16(virtio_q, virtctx, virtio_q_type, attr->q_type); + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0) + MLX5_SET16(virtio_q,
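With the type field turned into a bitmap, several attributes can be pushed to the FW in a single modify command. A minimal sketch (index, the ring addresses, and the two HW indices are placeholders; the flag and field names are the ones defined in this patch, and MLX5_VIRTQ_STATE_RDY is the existing ready-state value in mlx5_prm.h):

    struct mlx5_devx_virtq_attr attr = {
            .mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_ADDR |
                    MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
                    MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX |
                    MLX5_VIRTQ_MODIFY_TYPE_STATE,
            .queue_index = index,
            .desc_addr = desc_addr,
            .used_addr = used_addr,
            .available_addr = avail_addr,
            .hw_available_index = hw_avail_idx,
            .hw_used_index = hw_used_idx,
            .state = MLX5_VIRTQ_STATE_RDY,
    };

    if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr))
            DRV_LOG(ERR, "Failed to modify virtq %d.", index);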
[PATCH v3 06/15] vdpa/mlx5: pre-create virtq at probe time
dev_config operation is called in LM progress. LM time is very critical because all the VM packets are dropped directly at that time. Move the virtq creation to probe time and only modify the configuration later in the dev_config stage using the new ability to modify virtq. This optimization accelerates the LM process and reduces its time by 70%. Signed-off-by: Li Zhang Acked-by: Matan Azrad Reviewed-by: Maxime Coquelin --- doc/guides/rel_notes/release_22_07.rst | 4 + drivers/vdpa/mlx5/mlx5_vdpa.h | 4 + drivers/vdpa/mlx5/mlx5_vdpa_lm.c | 19 +- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c| 257 +++-- 4 files changed, 176 insertions(+), 108 deletions(-) diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst index f2cf41def9..2056cd9ee7 100644 --- a/doc/guides/rel_notes/release_22_07.rst +++ b/doc/guides/rel_notes/release_22_07.rst @@ -175,6 +175,10 @@ New Features This is a fall-back implementation for platforms that don't support vector operations. +* **Updated Nvidia mlx5 vDPA driver.** + + * Added new devargs ``queue_size`` and ``queues`` to allow prior creation of virtq resources. + Removed Items - diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index bf82026e37..e5553079fe 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq { uint16_t vq_size; uint8_t notifier_state; bool stopped; + uint32_t configured:1; uint32_t version; struct mlx5_vdpa_priv *priv; struct mlx5_devx_obj *virtq; @@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid); */ void mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv); + +bool +mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv); #endif /* RTE_PMD_MLX5_VDPA_H_ */ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c index 43a2b98255..284758ad56 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c @@ -12,20 +12,21 @@ int mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable) { struct mlx5_devx_virtq_attr attr = { - .type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE, + .mod_fields_bitmap = + MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE, .dirty_bitmap_dump_enable = enable, }; + struct mlx5_vdpa_virtq *virtq; int i; for (i = 0; i < priv->nr_virtqs; ++i) { attr.queue_index = i; - if (!priv->virtqs[i].virtq) { - DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap " - "enabling.", i); + virtq = &priv->virtqs[i]; + if (!virtq->configured) { + DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap enabling.", i); } else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq, &attr)) { - DRV_LOG(ERR, "Failed to modify virtq %d for dirty " - "bitmap enabling.", i); + DRV_LOG(ERR, "Failed to modify virtq %d for dirty bitmap enabling.", i); return -1; } } @@ -37,10 +38,11 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base, uint64_t log_size) { struct mlx5_devx_virtq_attr attr = { - .type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS, + .mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS, .dirty_bitmap_addr = log_base, .dirty_bitmap_size = log_size, }; + struct mlx5_vdpa_virtq *virtq; int i; int ret = mlx5_os_wrapped_mkey_create(priv->cdev->ctx, priv->cdev->pd, priv->cdev->pdn, @@ -54,7 +56,8 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base, attr.dirty_bitmap_mkey = priv->lm_mr.lkey; for (i = 0; i < priv->nr_virtqs; ++i) { attr.queue_index = i; - if (!priv->virtqs[i].virtq) { + 
virtq = &priv->virtqs[i]; + if (!virtq->configured) { DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i); } else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq, &attr)) { diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c index 6637ba1503..6e08d619e4 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c @@ -75,6 +75,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv) for (i = 0; i < priv->caps.max_num_virtio_queues; i++) { struct mlx5_vdpa
[PATCH v3 09/15] vdpa/mlx5: add task ring for MT management
The configuration thread tasks need a container that supports multiple tasks assigned to a thread in parallel.

Use an rte_ring container per thread to manage the thread tasks without locks. A caller thread in the user context opens a task for a thread and enqueues it to that thread's ring. The thread polls its ring and dequeues the tasks. That is why the ring must be in multi-producer and single-consumer mode.

An atomic counter manages the task completion notification. The threads report errors to the caller through a dedicated error counter per task.

Signed-off-by: Li Zhang
Acked-by: Matan Azrad
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  17
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 115 +-
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index 4e7c2557b7..2bbb868ec6 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -74,10 +74,22 @@ enum { }; #define MLX5_VDPA_MAX_C_THRD 256 +#define MLX5_VDPA_MAX_TASKS_PER_THRD 4096 +#define MLX5_VDPA_TASKS_PER_DEV 64 + +/* Generic task information and size must be multiple of 4B. */ +struct mlx5_vdpa_task { + struct mlx5_vdpa_priv *priv; + uint32_t *remaining_cnt; + uint32_t *err_cnt; + uint32_t idx; +} __rte_packed __rte_aligned(4); /* Generic mlx5_vdpa_c_thread information. */ struct mlx5_vdpa_c_thread { pthread_t tid; + struct rte_ring *rng; + pthread_cond_t c_cond; }; struct mlx5_vdpa_conf_thread_mng { @@ -532,4 +544,9 @@ mlx5_vdpa_mult_threads_create(int cpu_core); */ void mlx5_vdpa_mult_threads_destroy(bool need_unlock); + +bool +mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv, + uint32_t thrd_idx, + uint32_t num); #endif /* RTE_PMD_MLX5_VDPA_H_ */ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c index ba7d8b63b3..1fdc92d3ad 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c @@ -11,17 +11,103 @@ #include #include #include +#include #include #include "mlx5_vdpa_utils.h" #include "mlx5_vdpa.h" +static inline uint32_t +mlx5_vdpa_c_thrd_ring_dequeue_bulk(struct rte_ring *r, + void **obj, uint32_t n, uint32_t *avail) +{ + uint32_t m; + + m = rte_ring_dequeue_bulk_elem_start(r, obj, + sizeof(struct mlx5_vdpa_task), n, avail); + n = (m == n) ? n : 0; + rte_ring_dequeue_elem_finish(r, n); + return n; +} + +static inline uint32_t +mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r, + void * const *obj, uint32_t n, uint32_t *free) +{ + uint32_t m; + + m = rte_ring_enqueue_bulk_elem_start(r, n, free); + n = (m == n) ? n : 0; + rte_ring_enqueue_elem_finish(r, obj, + sizeof(struct mlx5_vdpa_task), n); + return n; +} + +bool +mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv, + uint32_t thrd_idx, + uint32_t num) { + struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng; + struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV]; + uint32_t i; + + MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV); + for (i = 0 ; i < num; i++) { + task[i].priv = priv; + /* To be added later. */ + } + if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL)) + return -1; + for (i = 0 ; i < num; i++) + if (task[i].remaining_cnt) + __atomic_fetch_add(task[i].remaining_cnt, 1, + __ATOMIC_RELAXED); + /* wake up conf thread. */ + pthread_mutex_lock(&conf_thread_mng.cthrd_lock); + pthread_cond_signal(&conf_thread_mng.cthrd[thrd_idx].c_cond); + pthread_mutex_unlock(&conf_thread_mng.cthrd_lock); + return 0; +} + static void * mlx5_vdpa_c_thread_handle(void *arg) { - /* To be added later. 
*/ - return arg; + struct mlx5_vdpa_conf_thread_mng *multhrd = arg; + pthread_t thread_id = pthread_self(); + struct mlx5_vdpa_priv *priv; + struct mlx5_vdpa_task task; + struct rte_ring *rng; + uint32_t thrd_idx; + uint32_t task_num; + + for (thrd_idx = 0; thrd_idx < multhrd->max_thrds; + thrd_idx++) + if (multhrd->cthrd[thrd_idx].tid == thread_id) + break; + if (thrd_idx >= multhrd->max_thrds) + return NULL; + rng = multhrd->cthrd[thrd_idx].rng; + while (1) { + task_num = mlx5_vdpa_c_thrd_ring_dequeue_bulk(rng, + (void **)&task, 1, NULL); + if (!task_num) { + /* No task and condition wait. */ + pthread_mutex_lock(&multhrd->cthrd_lock); + pthread_cond_wait( + &multhrd->cthrd[thrd_idx].c_cond, +
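The creation of the per-thread ring belongs to this patch as well; a minimal sketch of it, assuming the hybrid multi-producer enqueue flag (any caller thread may enqueue) and single-consumer dequeue (only the owning worker dequeues); the ring name is illustrative:

    char name[RTE_RING_NAMESIZE];

    snprintf(name, sizeof(name), "vdpa_task_rng_%u", thrd_idx);
    /* Fixed-size elements: tasks are copied by value into the ring. */
    conf_thread_mng.cthrd[thrd_idx].rng = rte_ring_create_elem(name,
            sizeof(struct mlx5_vdpa_task), MLX5_VDPA_MAX_TASKS_PER_THRD,
            rte_socket_id(), RING_F_MP_HTS_ENQ | RING_F_SC_DEQ);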
[PATCH v3 08/15] vdpa/mlx5: add multi-thread management for configuration
The LM process includes a lot of object creations and destructions in the source and the destination servers. As the LM time increases, so does the VM packet drop.

To improve the LM time, the mlx5 FW configurations need to be done in parallel. Add internal multi-thread management in the driver for it.

A new devarg defines the number of threads and their CPU. The management is shared between all the devices of the driver. Since the event_core also affects the datapath events thread, reduce the priority of the datapath event thread to allow fast configuration of the devices doing the LM.

Signed-off-by: Li Zhang
Acked-by: Matan Azrad
---
 doc/guides/vdpadevs/mlx5.rst          |  11 +++
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  41
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  36 +++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 +-
 7 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst index 0ad77bf535..b75a01688d 100644 --- a/doc/guides/vdpadevs/mlx5.rst +++ b/doc/guides/vdpadevs/mlx5.rst @@ -78,6 +78,17 @@ for an additional list of options shared with other mlx5 drivers. CPU core number to set polling thread affinity to, default to control plane cpu. +- ``max_conf_threads`` parameter [int] + + Allow the driver to use internal threads to obtain fast configuration. + All the threads will be open on the same core of the event completion queue scheduling thread. + + - 0, default, don't use internal threads for configuration. + + - 1 - 256, number of internal threads in addition to the caller thread (8 is suggested). +This value, if not 0, should be the same for all the devices; +the first probe will take it with the event_core for all the multi-thread configurations in the driver. 
+ - ``hw_latency_mode`` parameter [int] The completion queue moderation mode: diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build index 0fa82ad257..9d8dbb1a82 100644 --- a/drivers/vdpa/mlx5/meson.build +++ b/drivers/vdpa/mlx5/meson.build @@ -15,6 +15,7 @@ sources = files( 'mlx5_vdpa_virtq.c', 'mlx5_vdpa_steer.c', 'mlx5_vdpa_lm.c', +'mlx5_vdpa_cthread.c', ) cflags_options = [ '-std=c11', diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index e5a11f72fd..a9d023ed08 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -50,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list = TAILQ_HEAD_INITIALIZER(priv_list); static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER; +struct mlx5_vdpa_conf_thread_mng conf_thread_mng; + static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv); static struct mlx5_vdpa_priv * @@ -493,6 +495,29 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque) DRV_LOG(WARNING, "Invalid event_core %s.", val); else priv->event_core = tmp; + } else if (strcmp(key, "max_conf_threads") == 0) { + if (tmp) { + priv->use_c_thread = true; + if (!conf_thread_mng.initializer_priv) { + conf_thread_mng.initializer_priv = priv; + if (tmp > MLX5_VDPA_MAX_C_THRD) { + DRV_LOG(WARNING, + "Invalid max_conf_threads %s " + "and set max_conf_threads to %d", + val, MLX5_VDPA_MAX_C_THRD); + tmp = MLX5_VDPA_MAX_C_THRD; + } + conf_thread_mng.max_thrds = tmp; + } else if (tmp != conf_thread_mng.max_thrds) { + DRV_LOG(WARNING, + "max_conf_threads is PMD argument and not per device, " + "only the first device configuration set it, current value is %d " + "and will not be changed to %d.", + conf_thread_mng.max_thrds, (int)tmp); + } + } else { + priv->use_c_thread = false; + } } else if (strcmp(key, "hw_latency_mode") == 0) { priv->hw_latency_mode = (uint32_t)tmp; } else if (strcmp(key, "hw_max_latency_us") == 0) { @@ -521,6 +546,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist, "hw_max_latency_us", "hw_max_pending_comp", "no_traffic_time", + "queue_size", + "queues", + "max_conf_threads",
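A sketch of the pool setup described above (conf_thread_mng and mlx5_vdpa_c_thread_handle are names from this patch; the affinity handling shown is an assumption, and error handling is trimmed): all workers are pinned to the event core and created once by the first probed device:

    cpu_set_t cpuset;
    pthread_attr_t attr;
    uint32_t i;

    pthread_attr_init(&attr);
    CPU_ZERO(&cpuset);
    CPU_SET(cpu_core, &cpuset); /* same core as the event thread */
    pthread_attr_setaffinity_np(&attr, sizeof(cpuset), &cpuset);
    for (i = 0; i < conf_thread_mng.max_thrds; i++)
            pthread_create(&conf_thread_mng.cthrd[i].tid, &attr,
                    mlx5_vdpa_c_thread_handle, &conf_thread_mng);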
[PATCH v3 07/15] vdpa/mlx5: optimize datapath-control synchronization
The driver used a single global lock for any synchronization needed for the datapath and control path. It is better to group the critical sections with the other ones that should be synchronized. Replace the global lock with the following locks: 1.virtq locks(per virtq) synchronize datapath polling and parallel configurations on the same virtq. 2.A doorbell lock synchronizes doorbell update, which is shared for all the virtqs in the device. 3.A steering lock for the shared steering objects updates. Signed-off-by: Li Zhang Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.c | 24 --- drivers/vdpa/mlx5/mlx5_vdpa.h | 13 ++-- drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++--- drivers/vdpa/mlx5/mlx5_vdpa_lm.c| 34 +++--- drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 7 ++- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 88 +++--- 6 files changed, 184 insertions(+), 79 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index ee99952e11..e5a11f72fd 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -135,6 +135,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state) struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid); struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev); + struct mlx5_vdpa_virtq *virtq; int ret; if (priv == NULL) { @@ -145,9 +146,10 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state) DRV_LOG(ERR, "Too big vring id: %d.", vring); return -E2BIG; } - pthread_mutex_lock(&priv->vq_config_lock); + virtq = &priv->virtqs[vring]; + pthread_mutex_lock(&virtq->virtq_lock); ret = mlx5_vdpa_virtq_enable(priv, vring, state); - pthread_mutex_unlock(&priv->vq_config_lock); + pthread_mutex_unlock(&virtq->virtq_lock); return ret; } @@ -267,7 +269,9 @@ mlx5_vdpa_dev_close(int vid) ret |= mlx5_vdpa_lm_log(priv); priv->state = MLX5_VDPA_STATE_IN_PROGRESS; } + pthread_mutex_lock(&priv->steer_update_lock); mlx5_vdpa_steer_unset(priv); + pthread_mutex_unlock(&priv->steer_update_lock); mlx5_vdpa_virtqs_release(priv); mlx5_vdpa_drain_cq(priv); if (priv->lm_mr.addr) @@ -276,8 +280,6 @@ mlx5_vdpa_dev_close(int vid) if (!priv->connected) mlx5_vdpa_dev_cache_clean(priv); priv->vid = 0; - /* The mutex may stay locked after event thread cancel - initiate it. 
*/ - pthread_mutex_init(&priv->vq_config_lock, NULL); DRV_LOG(INFO, "vDPA device %d was closed.", vid); return ret; } @@ -549,15 +551,21 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist, static int mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv) { + struct mlx5_vdpa_virtq *virtq; uint32_t index; uint32_t i; + for (index = 0; index < priv->caps.max_num_virtio_queues * 2; + index++) { + virtq = &priv->virtqs[index]; + pthread_mutex_init(&virtq->virtq_lock, NULL); + } if (!priv->queues) return 0; for (index = 0; index < (priv->queues * 2); ++index) { - struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index]; + virtq = &priv->virtqs[index]; int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size, - -1, &virtq->eqp); + -1, virtq); if (ret) { DRV_LOG(ERR, "Failed to create event QPs for virtq %d.", @@ -713,7 +721,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev, priv->num_lag_ports = attr->num_lag_ports; if (attr->num_lag_ports == 0) priv->num_lag_ports = 1; - pthread_mutex_init(&priv->vq_config_lock, NULL); + rte_spinlock_init(&priv->db_lock); + pthread_mutex_init(&priv->steer_update_lock, NULL); priv->cdev = cdev; mlx5_vdpa_config_get(mkvlist, priv); if (mlx5_vdpa_create_dev_resources(priv)) @@ -797,7 +806,6 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv) mlx5_vdpa_release_dev_resources(priv); if (priv->vdev) rte_vdpa_unregister_device(priv->vdev); - pthread_mutex_destroy(&priv->vq_config_lock); rte_free(priv); } diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index e5553079fe..3fd5eefc5e 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -82,6 +82,7 @@ struct mlx5_vdpa_virtq { bool stopped; uint32_t configured:1; uint32_t version; + pthread_mutex_t virtq_lock; struct mlx5_vdpa_priv *priv; struct mlx5_devx_obj *virtq; struct mlx5_devx_obj *counters; @@ -126,7 +127,8 @@ struct mlx5_vdp
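The resulting locking pattern, in sketch form (the names are from this patch): per-virtq configuration serializes on the virtq mutex, while the short doorbell write takes the shared spinlock:

    /* Configuration path: per-virtq critical section. */
    pthread_mutex_lock(&virtq->virtq_lock);
    ret = mlx5_vdpa_virtq_enable(priv, vring, state);
    pthread_mutex_unlock(&virtq->virtq_lock);

    /* Datapath doorbell update: shared, very short critical section. */
    rte_spinlock_lock(&priv->db_lock);
    /* write the virtq doorbell record here */
    rte_spinlock_unlock(&priv->db_lock);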
[PATCH v3 10/15] vdpa/mlx5: add MT task for VM memory registration
The driver creates a direct MR object of the HW for each VM memory region, which maps the VM physical address to the actual physical address.

Later, after all the MRs are ready, the driver creates an indirect MR to group all the direct MRs into one virtual space from the HW perspective.

Create the direct MRs in parallel using the MT mechanism. After completion, the primary thread creates the indirect MR needed for the following virtq configurations.

This optimization accelerates the LM process and reduces its time by 5%.

Signed-off-by: Li Zhang
Acked-by: Matan Azrad
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  31 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 -
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c     | 270 ++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
 5 files changed, 258 insertions(+), 97 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index a9d023ed08..e3b32fa087 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev, rte_errno = rte_errno ? rte_errno : EINVAL; goto error; } - SLIST_INIT(&priv->mr_list); pthread_mutex_lock(&priv_list_lock); TAILQ_INSERT_TAIL(&priv_list, priv, next); pthread_mutex_unlock(&priv_list_lock); diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index 2bbb868ec6..3316ce42be 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp { }; struct mlx5_vdpa_query_mr { - SLIST_ENTRY(mlx5_vdpa_query_mr) next; union { struct ibv_mr *mr; struct mlx5_devx_obj *mkey; @@ -76,10 +75,17 @@ enum { #define MLX5_VDPA_MAX_C_THRD 256 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096 #define MLX5_VDPA_TASKS_PER_DEV 64 +#define MLX5_VDPA_MAX_MRS 0x + +/* Vdpa task types. */ +enum mlx5_vdpa_task_type { + MLX5_VDPA_TASK_REG_MR = 1, +}; /* Generic task information and size must be multiple of 4B. */ struct mlx5_vdpa_task { struct mlx5_vdpa_priv *priv; + enum mlx5_vdpa_task_type type; uint32_t *remaining_cnt; uint32_t *err_cnt; uint32_t idx; @@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng { }; extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng; +struct mlx5_vdpa_vmem_info { + struct rte_vhost_memory *vmem; + uint32_t entries_num; + uint64_t gcd; + uint64_t size; + uint8_t mode; +}; + struct mlx5_vdpa_virtq { SLIST_ENTRY(mlx5_vdpa_virtq) next; uint8_t enable; @@ -176,7 +190,7 @@ struct mlx5_vdpa_priv { struct mlx5_hca_vdpa_attr caps; uint32_t gpa_mkey_index; struct ibv_mr *null_mr; - struct rte_vhost_memory *vmem; + struct mlx5_vdpa_vmem_info vmem_info; struct mlx5dv_devx_event_channel *eventc; struct mlx5dv_devx_event_channel *err_chnl; struct mlx5_uar uar; @@ -187,11 +201,13 @@ struct mlx5_vdpa_priv { uint8_t num_lag_ports; uint64_t features; /* Negotiated features. */ uint16_t log_max_rqt_size; + uint16_t last_c_thrd_idx; + uint16_t num_mrs; /* Number of memory regions. 
*/ struct mlx5_vdpa_steer steer; struct mlx5dv_var *var; void *virtq_db_addr; struct mlx5_pmd_wrapped_mr lm_mr; - SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list; + struct mlx5_vdpa_query_mr **mrs; struct mlx5_vdpa_virtq virtqs[]; }; @@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock); bool mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv, uint32_t thrd_idx, - uint32_t num); + enum mlx5_vdpa_task_type task_type, + uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt, + void **task_data, uint32_t num); +int +mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx); +bool +mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt, + uint32_t *err_cnt, uint32_t sleep_time); #endif /* RTE_PMD_MLX5_VDPA_H_ */ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c index 1fdc92d3ad..10391931ae 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c @@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r, bool mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv, uint32_t thrd_idx, - uint32_t num) + enum mlx5_vdpa_task_type task_type, + uint32_t *remaining_cnt, uint32_t *err_cnt, + void **task_data, uint32_t num) { struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng; struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV]; + uint32_t *data = (uint32_t *)task_data;
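The caller-side pattern for fanning work out to the pool follows from the declarations above; a sketch for MR registration (nregions and the round-robin thread choice are placeholders; the driver's actual chunking policy may differ):

    uint32_t remaining_cnt = 0, err_cnt = 0;
    uint32_t i, thrd_idx, data[1];

    for (i = 0; i < nregions; i++) {
            thrd_idx = i % conf_thread_mng.max_thrds;
            data[0] = i; /* region index consumed by the REG_MR task */
            if (mlx5_vdpa_task_add(priv, thrd_idx, MLX5_VDPA_TASK_REG_MR,
                    &remaining_cnt, &err_cnt, (void **)&data, 1))
                    return -1; /* fallback: register in the caller thread */
    }
    /* Wait until all workers are done, then check the error counter. */
    if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
            &err_cnt, 2000))
            return -1;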
[PATCH v3 11/15] vdpa/mlx5: add virtq creation task for MT management
The virtq object and all its sub-resources use a lot of FW commands and can be accelerated by the MT management. Split the virtqs creation between the configuration threads. This accelerates the LM process and reduces its time by 20%. Signed-off-by: Li Zhang Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.h | 9 +- drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 14 +++ drivers/vdpa/mlx5/mlx5_vdpa_event.c | 2 +- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 149 +++--- 4 files changed, 134 insertions(+), 40 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index 3316ce42be..35221f5ddc 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -80,6 +80,7 @@ enum { /* Vdpa task types. */ enum mlx5_vdpa_task_type { MLX5_VDPA_TASK_REG_MR = 1, + MLX5_VDPA_TASK_SETUP_VIRTQ, }; /* Generic task information and size must be multiple of 4B. */ @@ -117,12 +118,12 @@ struct mlx5_vdpa_vmem_info { struct mlx5_vdpa_virtq { SLIST_ENTRY(mlx5_vdpa_virtq) next; - uint8_t enable; uint16_t index; uint16_t vq_size; uint8_t notifier_state; - bool stopped; uint32_t configured:1; + uint32_t enable:1; + uint32_t stopped:1; uint32_t version; pthread_mutex_t virtq_lock; struct mlx5_vdpa_priv *priv; @@ -565,11 +566,13 @@ bool mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv, uint32_t thrd_idx, enum mlx5_vdpa_task_type task_type, - uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt, + uint32_t *remaining_cnt, uint32_t *err_cnt, void **task_data, uint32_t num); int mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx); bool mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt, uint32_t *err_cnt, uint32_t sleep_time); +int +mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick); #endif /* RTE_PMD_MLX5_VDPA_H_ */ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c index 10391931ae..1389d369ae 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c @@ -100,6 +100,7 @@ mlx5_vdpa_c_thread_handle(void *arg) { struct mlx5_vdpa_conf_thread_mng *multhrd = arg; pthread_t thread_id = pthread_self(); + struct mlx5_vdpa_virtq *virtq; struct mlx5_vdpa_priv *priv; struct mlx5_vdpa_task task; struct rte_ring *rng; @@ -139,6 +140,19 @@ mlx5_vdpa_c_thread_handle(void *arg) __ATOMIC_RELAXED); } break; + case MLX5_VDPA_TASK_SETUP_VIRTQ: + virtq = &priv->virtqs[task.idx]; + pthread_mutex_lock(&virtq->virtq_lock); + ret = mlx5_vdpa_virtq_setup(priv, + task.idx, false); + if (ret) { + DRV_LOG(ERR, + "Failed to setup virtq %d.", task.idx); + __atomic_fetch_add( + task.err_cnt, 1, __ATOMIC_RELAXED); + } + pthread_mutex_unlock(&virtq->virtq_lock); + break; default: DRV_LOG(ERR, "Invalid vdpa task type %d.", task.type); diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c index b45fbac146..f782b6b832 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c @@ -371,7 +371,7 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused) goto unlock; if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC) goto unlock; - virtq->stopped = true; + virtq->stopped = 1; /* Query error info. 
*/ if (mlx5_vdpa_virtq_query(priv, vq_index)) goto log; diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c index 1f81fb8723..50d59a8394 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c @@ -111,8 +111,9 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv) for (i = 0; i < priv->caps.max_num_virtio_queues; i++) { struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i]; + if (virtq->index != i) + continue; pthread_mutex_lock(&virtq->virtq_lock); - virtq->configured = 0; for (j = 0; j < RTE_DIM(virtq->umems); ++j) { if (virtq->umems[j].obj) { claim_zero(mlx5_glue->devx_umem_dereg @@ -131,7 +132,6 @@ mlx5_vdpa_
[PATCH v3 12/15] vdpa/mlx5: add virtq LM log task
Split the virtqs LM log between the configuration threads. This accelerates the LM process and reduces its time by 20%. Signed-off-by: Li Zhang Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.h | 3 + drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 +++ drivers/vdpa/mlx5/mlx5_vdpa_lm.c | 85 +-- 3 files changed, 105 insertions(+), 17 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index 35221f5ddc..e08931719f 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -72,6 +72,8 @@ enum { MLX5_VDPA_NOTIFIER_STATE_ERR }; +#define MLX5_VDPA_USED_RING_LEN(size) \ + ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3) #define MLX5_VDPA_MAX_C_THRD 256 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096 #define MLX5_VDPA_TASKS_PER_DEV 64 @@ -81,6 +83,7 @@ enum { enum mlx5_vdpa_task_type { MLX5_VDPA_TASK_REG_MR = 1, MLX5_VDPA_TASK_SETUP_VIRTQ, + MLX5_VDPA_TASK_STOP_VIRTQ, }; /* Generic task information and size must be multiple of 4B. */ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c index 1389d369ae..98369f0887 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c @@ -104,6 +104,7 @@ mlx5_vdpa_c_thread_handle(void *arg) struct mlx5_vdpa_priv *priv; struct mlx5_vdpa_task task; struct rte_ring *rng; + uint64_t features; uint32_t thrd_idx; uint32_t task_num; int ret; @@ -153,6 +154,39 @@ mlx5_vdpa_c_thread_handle(void *arg) } pthread_mutex_unlock(&virtq->virtq_lock); break; + case MLX5_VDPA_TASK_STOP_VIRTQ: + virtq = &priv->virtqs[task.idx]; + pthread_mutex_lock(&virtq->virtq_lock); + ret = mlx5_vdpa_virtq_stop(priv, + task.idx); + if (ret) { + DRV_LOG(ERR, + "Failed to stop virtq %d.", + task.idx); + __atomic_fetch_add( + task.err_cnt, 1, + __ATOMIC_RELAXED); + pthread_mutex_unlock(&virtq->virtq_lock); + break; + } + ret = rte_vhost_get_negotiated_features( + priv->vid, &features); + if (ret) { + DRV_LOG(ERR, + "Failed to get negotiated features virtq %d.", + task.idx); + __atomic_fetch_add( + task.err_cnt, 1, + __ATOMIC_RELAXED); + pthread_mutex_unlock(&virtq->virtq_lock); + break; + } + if (RTE_VHOST_NEED_LOG(features)) + rte_vhost_log_used_vring( + priv->vid, task.idx, 0, + MLX5_VDPA_USED_RING_LEN(virtq->vq_size)); + pthread_mutex_unlock(&virtq->virtq_lock); + break; default: DRV_LOG(ERR, "Invalid vdpa task type %d.", task.type); diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c index ae495a35f3..016e2a097b 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c @@ -87,39 +87,90 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base, return -1; } -#define MLX5_VDPA_USED_RING_LEN(size) \ - ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3) - int mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv) { + uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0; + uint32_t i, thrd_idx, data[1]; struct mlx5_vdpa_virtq *virtq; uint64_t features; - int ret = rte_vhost_get_negotiated_features(priv->vid, &features); - int i; + int ret; + ret = rte_vhost_get_negotiated_features(priv->vid, &features); if (ret) { DRV_LOG(ERR, "Failed to get negotiated features."); return -1; } - if (!RTE_VHOST_NEED_LOG(features)) - return 0; - for (i = 0; i < priv->nr_virtqs; ++i) { - virtq = &priv->virtqs[i]; - if (!priv->virtqs[i].virtq) { - DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i); - } else { + if (priv->use_c_thread && priv->nr_virtqs) { + uint32_t 
main_task_idx[priv->nr_virtqs]; + +
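For reference, the per-virtq logging step each STOP_VIRTQ task performs is the standard vhost dirty-log call, exactly as in the handler above (qid is a placeholder for the virtq index):

    uint64_t features;

    if (rte_vhost_get_negotiated_features(priv->vid, &features) == 0 &&
        RTE_VHOST_NEED_LOG(features))
            rte_vhost_log_used_vring(priv->vid, qid, 0,
                    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));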
[PATCH v3 13/15] vdpa/mlx5: add device close task
Split the virtqs device close tasks after stopping virt-queue between the configuration threads. This accelerates the LM process and reduces its time by 50%. Signed-off-by: Li Zhang Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.c | 56 +-- drivers/vdpa/mlx5/mlx5_vdpa.h | 8 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 +- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 14 +++ 4 files changed, 94 insertions(+), 4 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index e3b32fa087..d000854c08 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -245,7 +245,7 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv) return kern_mtu == vhost_mtu ? 0 : -1; } -static void +void mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv) { /* Clean pre-created resource in dev removal only. */ @@ -254,6 +254,26 @@ mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv) mlx5_vdpa_mem_dereg(priv); } +static bool +mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv) +{ + uint32_t timeout = 0; + + /* Check and wait all close tasks done. */ + while (__atomic_load_n(&priv->dev_close_progress, + __ATOMIC_RELAXED) != 0 && timeout < 1000) { + rte_delay_us_sleep(1); + timeout++; + } + if (priv->dev_close_progress) { + DRV_LOG(ERR, + "Failed to wait close device tasks done vid %d.", + priv->vid); + return true; + } + return false; +} + static int mlx5_vdpa_dev_close(int vid) { @@ -271,6 +291,27 @@ mlx5_vdpa_dev_close(int vid) ret |= mlx5_vdpa_lm_log(priv); priv->state = MLX5_VDPA_STATE_IN_PROGRESS; } + if (priv->use_c_thread) { + if (priv->last_c_thrd_idx >= + (conf_thread_mng.max_thrds - 1)) + priv->last_c_thrd_idx = 0; + else + priv->last_c_thrd_idx++; + __atomic_store_n(&priv->dev_close_progress, + 1, __ATOMIC_RELAXED); + if (mlx5_vdpa_task_add(priv, + priv->last_c_thrd_idx, + MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT, + NULL, NULL, NULL, 1)) { + DRV_LOG(ERR, + "Fail to add dev close task. 
"); + goto single_thrd; + } + priv->state = MLX5_VDPA_STATE_PROBED; + DRV_LOG(INFO, "vDPA device %d was closed.", vid); + return ret; + } +single_thrd: pthread_mutex_lock(&priv->steer_update_lock); mlx5_vdpa_steer_unset(priv); pthread_mutex_unlock(&priv->steer_update_lock); @@ -278,10 +319,12 @@ mlx5_vdpa_dev_close(int vid) mlx5_vdpa_drain_cq(priv); if (priv->lm_mr.addr) mlx5_os_wrapped_mkey_destroy(&priv->lm_mr); - priv->state = MLX5_VDPA_STATE_PROBED; if (!priv->connected) mlx5_vdpa_dev_cache_clean(priv); priv->vid = 0; + __atomic_store_n(&priv->dev_close_progress, 0, + __ATOMIC_RELAXED); + priv->state = MLX5_VDPA_STATE_PROBED; DRV_LOG(INFO, "vDPA device %d was closed.", vid); return ret; } @@ -302,6 +345,8 @@ mlx5_vdpa_dev_config(int vid) DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid); return -1; } + if (mlx5_vdpa_wait_dev_close_tasks_done(priv)) + return -1; priv->vid = vid; priv->connected = true; if (mlx5_vdpa_mtu_set(priv)) @@ -444,8 +489,11 @@ mlx5_vdpa_dev_cleanup(int vid) DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name); return -1; } - if (priv->state == MLX5_VDPA_STATE_PROBED) + if (priv->state == MLX5_VDPA_STATE_PROBED) { + if (priv->use_c_thread) + mlx5_vdpa_wait_dev_close_tasks_done(priv); mlx5_vdpa_dev_cache_clean(priv); + } priv->connected = false; return 0; } @@ -839,6 +887,8 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv) { if (priv->state == MLX5_VDPA_STATE_CONFIGURED) mlx5_vdpa_dev_close(priv->vid); + if (priv->use_c_thread) + mlx5_vdpa_wait_dev_close_tasks_done(priv); mlx5_vdpa_release_dev_resources(priv); if (priv->vdev) rte_vdpa_unregister_device(priv->vdev); diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index e08931719f..b6392b9d66 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -84,6 +84,7 @@ enum mlx5_vdpa_task_type { MLX5_VDPA_TASK_REG_MR = 1, MLX5_VDPA_TASK_SETUP_VIRTQ, MLX5_VDPA_TASK_STOP_VIRTQ, + MLX5_VDPA_TASK_
[PATCH v3 14/15] vdpa/mlx5: add virtq sub-resources creation
Pre-create the virt-queue sub-resources in the device probe stage and then modify the virtqueue in the device config stage. The steer table also needs to support a dummy virt-queue. This accelerates the LM process and reduces its time by 40%. Signed-off-by: Li Zhang Signed-off-by: Yajun Wu Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.c | 72 +++-- drivers/vdpa/mlx5/mlx5_vdpa.h | 17 +++-- drivers/vdpa/mlx5/mlx5_vdpa_event.c | 11 ++-- drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 17 +++-- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 + 5 files changed, 123 insertions(+), 93 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index d000854c08..f006a9cd3f 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -627,65 +627,39 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist, static int mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv) { - struct mlx5_vdpa_virtq *virtq; + uint32_t max_queues; uint32_t index; - uint32_t i; + struct mlx5_vdpa_virtq *virtq; - for (index = 0; index < priv->caps.max_num_virtio_queues * 2; + for (index = 0; index < priv->caps.max_num_virtio_queues; index++) { virtq = &priv->virtqs[index]; pthread_mutex_init(&virtq->virtq_lock, NULL); } - if (!priv->queues) + if (!priv->queues || !priv->queue_size) return 0; - for (index = 0; index < (priv->queues * 2); ++index) { + max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ? + (priv->queues * 2) : (priv->caps.max_num_virtio_queues); + for (index = 0; index < max_queues; ++index) + if (mlx5_vdpa_virtq_single_resource_prepare(priv, + index)) + goto error; + if (mlx5_vdpa_is_modify_virtq_supported(priv)) + if (mlx5_vdpa_steer_update(priv, true)) + goto error; + return 0; +error: + for (index = 0; index < max_queues; ++index) { virtq = &priv->virtqs[index]; - int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size, - -1, virtq); - - if (ret) { - DRV_LOG(ERR, "Failed to create event QPs for virtq %d.", - index); - return -1; - } - if (priv->caps.queue_counters_valid) { - if (!virtq->counters) - virtq->counters = - mlx5_devx_cmd_create_virtio_q_counters - (priv->cdev->ctx); - if (!virtq->counters) { - DRV_LOG(ERR, "Failed to create virtq couners for virtq" - " %d.", index); - return -1; - } - } - for (i = 0; i < RTE_DIM(virtq->umems); ++i) { - uint32_t size; - void *buf; - struct mlx5dv_devx_umem *obj; - - size = priv->caps.umems[i].a * priv->queue_size + - priv->caps.umems[i].b; - buf = rte_zmalloc(__func__, size, 4096); - if (buf == NULL) { - DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq" - " %u.", i, index); - return -1; - } - obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf, - size, IBV_ACCESS_LOCAL_WRITE); - if (obj == NULL) { - rte_free(buf); - DRV_LOG(ERR, "Failed to register umem %d for virtq %u.", - i, index); - return -1; - } - virtq->umems[i].size = size; - virtq->umems[i].buf = buf; - virtq->umems[i].obj = obj; + if (virtq->virtq) { + pthread_mutex_lock(&virtq->virtq_lock); + mlx5_vdpa_virtq_unset(virtq); + pthread_mutex_unlock(&virtq->virtq_lock); } } - return 0; + if (mlx5_vdpa_is_modify_virtq_supported(priv)) + mlx5_vdpa_steer_unset(priv); + return -1; } static int diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index b6392b9d66..f353db62ac 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -277,13 +277,15 @@ int mlx5_vdpa_mem_register(struct ml
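The error path above is a classic goto-based rollback: on any failure, every queue prepared so far is unwound in one place before returning. A minimal standalone sketch of the idiom (hypothetical names, not the driver's code):

#include <stddef.h>

/* Hypothetical stand-ins for the per-queue prepare/unset steps. */
static int prepare_queue(size_t idx) { (void)idx; return 0; }
static void unset_queue(size_t idx) { (void)idx; }

static int
prepare_all(size_t n)
{
	size_t i;

	for (i = 0; i < n; ++i)
		if (prepare_queue(i))
			goto error;
	return 0;
error:
	/* Unwind everything prepared so far, in one place. */
	while (i--)
		unset_queue(i);
	return -1;
}

Note that the patch itself unwinds all max_queues entries and checks virtq->virtq before unsetting, so it also tolerates holes; the sketch above only unwinds the prefix that was actually prepared.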
[PATCH v3 15/15] vdpa/mlx5: prepare virtqueue resource creation
Split the virt-queue resource preparation between the configuration threads. The virt-queue resources also need to be pre-created again after virtq destruction. This accelerates the LM process and reduces its time by 30%. Signed-off-by: Li Zhang Acked-by: Matan Azrad --- doc/guides/rel_notes/release_22_07.rst | 1 + drivers/vdpa/mlx5/mlx5_vdpa.c | 115 +++-- drivers/vdpa/mlx5/mlx5_vdpa.h | 12 ++- drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 15 +++- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c| 111 5 files changed, 209 insertions(+), 45 deletions(-) diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst index 2056cd9ee7..e1a9796e5c 100644 --- a/doc/guides/rel_notes/release_22_07.rst +++ b/doc/guides/rel_notes/release_22_07.rst @@ -178,6 +178,7 @@ New Features * **Updated Nvidia mlx5 vDPA driver.** * Added new devargs ``queue_size`` and ``queues`` to allow prior creation of virtq resources. + * Added new devarg ``max_conf_threads`` defines the number of multi-thread management to parallel the configurations. Removed Items diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index f006a9cd3f..c5d82872c7 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -275,23 +275,18 @@ mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv) } static int -mlx5_vdpa_dev_close(int vid) +_internal_mlx5_vdpa_dev_close(struct mlx5_vdpa_priv *priv, + bool release_resource) { - struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid); - struct mlx5_vdpa_priv *priv = - mlx5_vdpa_find_priv_resource_by_vdev(vdev); int ret = 0; + int vid = priv->vid; - if (priv == NULL) { - DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name); - return -1; - } mlx5_vdpa_cqe_event_unset(priv); if (priv->state == MLX5_VDPA_STATE_CONFIGURED) { ret |= mlx5_vdpa_lm_log(priv); priv->state = MLX5_VDPA_STATE_IN_PROGRESS; } - if (priv->use_c_thread) { + if (priv->use_c_thread && !release_resource) { if (priv->last_c_thrd_idx >= (conf_thread_mng.max_thrds - 1)) priv->last_c_thrd_idx = 0; else priv->last_c_thrd_idx++; @@ -315,7 +310,7 @@ mlx5_vdpa_dev_close(int vid) pthread_mutex_lock(&priv->steer_update_lock); mlx5_vdpa_steer_unset(priv); pthread_mutex_unlock(&priv->steer_update_lock); - mlx5_vdpa_virtqs_release(priv); + mlx5_vdpa_virtqs_release(priv, release_resource); mlx5_vdpa_drain_cq(priv); if (priv->lm_mr.addr) mlx5_os_wrapped_mkey_destroy(&priv->lm_mr); @@ -329,6 +324,24 @@ mlx5_vdpa_dev_close(int vid) return ret; } +static int +mlx5_vdpa_dev_close(int vid) +{ + struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid); + struct mlx5_vdpa_priv *priv; + + if (!vdev) { + DRV_LOG(ERR, "Invalid vDPA device."); + return -1; + } + priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev); + if (priv == NULL) { + DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name); + return -1; + } + return _internal_mlx5_vdpa_dev_close(priv, false); +} + static int mlx5_vdpa_dev_config(int vid) { @@ -624,11 +637,33 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist, priv->queue_size); } +void +mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv) +{ + uint32_t max_queues, index; + struct mlx5_vdpa_virtq *virtq; + + if (!priv->queues || !priv->queue_size) + return; + max_queues = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ? 
+ (priv->queues * 2) : (priv->caps.max_num_virtio_queues); + if (mlx5_vdpa_is_modify_virtq_supported(priv)) + mlx5_vdpa_steer_unset(priv); + for (index = 0; index < max_queues; ++index) { + virtq = &priv->virtqs[index]; + if (virtq->virtq) { + pthread_mutex_lock(&virtq->virtq_lock); + mlx5_vdpa_virtq_unset(virtq); + pthread_mutex_unlock(&virtq->virtq_lock); + } + } +} + static int mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv) { - uint32_t max_queues; - uint32_t index; + uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0; + uint32_t max_queues, index, thrd_idx, data[1]; struct mlx5_vdpa_virtq *virtq; for (index = 0; index < priv->caps.max_num_virtio_queues; @@ -640,25 +675,53 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv) return 0; max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ? (priv->queues * 2) : (priv->caps.max_num_virtio_queues)
[PATCH v4 00/15] mlx5/vdpa: optimize live migration time
Allow the driver to use internal threads to obtain fast configuration. All the threads will be opened on the same core as the event completion queue scheduling thread. Add a max_conf_threads parameter to configure the maximum number of internal threads in addition to the caller thread (8 is suggested). These internal threads pipeline the handling of vDPA tasks in the system and are shared among all vDPA devices. The default is 0: do not use internal threads for configuration. Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time") http://patchwork.dpdk.org/project/dpdk/list/?series=21868 RFC ("Add vDPA multi-threads optiomization") https://patchwork.dpdk.org/project/dpdk/cover/20220408075606.33056-1-l...@nvidia.com/ V2: * Drop eal device removal patch in series. * Add release note in release_22_07.rst. V3: * Fix comments about commit log issue. * Avoid cutting logs. V4: * Fix coding style issues. Li Zhang (12): vdpa/mlx5: fix usage of capability for max number of virtqs common/mlx5: extend virtq modifiable fields vdpa/mlx5: pre-create virtq at probe time vdpa/mlx5: optimize datapath-control synchronization vdpa/mlx5: add multi-thread management for configuration vdpa/mlx5: add task ring for MT management vdpa/mlx5: add MT task for VM memory registration vdpa/mlx5: add virtq creation task for MT management vdpa/mlx5: add virtq LM log task vdpa/mlx5: add device close task vdpa/mlx5: add virtq sub-resources creation vdpa/mlx5: prepare virtqueue resource creation Yajun Wu (3): vdpa/mlx5: support pre create virtq resource common/mlx5: add DevX API to move QP to reset state vdpa/mlx5: support event qp reuse doc/guides/rel_notes/release_22_07.rst | 5 + doc/guides/vdpadevs/mlx5.rst | 25 + drivers/common/mlx5/mlx5_devx_cmds.c | 77 ++- drivers/common/mlx5/mlx5_devx_cmds.h | 6 +- drivers/common/mlx5/mlx5_prm.h | 30 +- drivers/vdpa/mlx5/meson.build | 1 + drivers/vdpa/mlx5/mlx5_vdpa.c | 270 -- drivers/vdpa/mlx5/mlx5_vdpa.h | 152 +- drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 360 ++ drivers/vdpa/mlx5/mlx5_vdpa_event.c| 160 -- drivers/vdpa/mlx5/mlx5_vdpa_lm.c | 134 +++-- drivers/vdpa/mlx5/mlx5_vdpa_mem.c | 270 ++ drivers/vdpa/mlx5/mlx5_vdpa_steer.c| 22 +- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c| 654 ++--- 14 files changed, 1779 insertions(+), 387 deletions(-) create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c -- 2.31.1
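For context, an illustrative probe line combining the devargs added in this series (the PCI address and the values are made up here, not taken from the patches):

    dpdk-vdpa -a 0000:08:00.2,class=vdpa,event_core=2,max_conf_threads=8,queues=8,queue_size=256

With such a configuration the driver would spawn eight internal configuration threads pinned to the event core and pre-create the resources for 8 queue pairs (16 virtqs) of depth 256 at probe time.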
[PATCH v4 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs
The driver wrongly takes the capability value for the number of virtq pairs instead of just the number of virtqs. Adjust all the usages of it to be the number of virtqs. Fixes: c2eb33a ("vdpa/mlx5: manage virtqs by array") Cc: sta...@dpdk.org Signed-off-by: Li Zhang Acked-by: Matan Azrad Reviewed-by: Maxime Coquelin --- drivers/vdpa/mlx5/mlx5_vdpa.c | 12 ++-- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 6 +++--- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index 76fa5d4299..ee71339b78 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -84,7 +84,7 @@ mlx5_vdpa_get_queue_num(struct rte_vdpa_device *vdev, uint32_t *queue_num) DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name); return -1; } - *queue_num = priv->caps.max_num_virtio_queues; + *queue_num = priv->caps.max_num_virtio_queues / 2; return 0; } @@ -141,7 +141,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state) DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name); return -EINVAL; } - if (vring >= (int)priv->caps.max_num_virtio_queues * 2) { + if (vring >= (int)priv->caps.max_num_virtio_queues) { DRV_LOG(ERR, "Too big vring id: %d.", vring); return -E2BIG; } @@ -388,7 +388,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid, DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name); return -ENODEV; } - if (qid >= (int)priv->caps.max_num_virtio_queues * 2) { + if (qid >= (int)priv->caps.max_num_virtio_queues) { DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid, vdev->device->name); return -E2BIG; @@ -411,7 +411,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid) DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name); return -ENODEV; } - if (qid >= (int)priv->caps.max_num_virtio_queues * 2) { + if (qid >= (int)priv->caps.max_num_virtio_queues) { DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid, vdev->device->name); return -E2BIG; @@ -624,7 +624,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev, DRV_LOG(DEBUG, "No capability to support virtq statistics."); priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv) + sizeof(struct mlx5_vdpa_virtq) * - attr->vdpa.max_num_virtio_queues * 2, + attr->vdpa.max_num_virtio_queues, RTE_CACHE_LINE_SIZE); if (!priv) { DRV_LOG(ERR, "Failed to allocate private memory."); @@ -685,7 +685,7 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv) uint32_t i; mlx5_vdpa_dev_cache_clean(priv); - for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) { + for (i = 0; i < priv->caps.max_num_virtio_queues; i++) { if (!priv->virtqs[i].counters) continue; claim_zero(mlx5_devx_cmd_destroy(priv->virtqs[i].counters)); diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c index e025be47d2..c258eb3024 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c @@ -72,7 +72,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv) { unsigned int i, j; - for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) { + for (i = 0; i < priv->caps.max_num_virtio_queues; i++) { struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i]; for (j = 0; j < RTE_DIM(virtq->umems); ++j) { @@ -492,9 +492,9 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv) DRV_LOG(INFO, "TSO is enabled without CSUM, force CSUM."); priv->features |= (1ULL << VIRTIO_NET_F_CSUM); } - if (nr_vring > priv->caps.max_num_virtio_queues * 2) { + if (nr_vring > priv->caps.max_num_virtio_queues) { DRV_LOG(ERR, "Do not 
support more than %d virtqs(%d).", - (int)priv->caps.max_num_virtio_queues * 2, + (int)priv->caps.max_num_virtio_queues, (int)nr_vring); return -1; } -- 2.31.1
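As a concrete example of the fix: if the device reports max_num_virtio_queues = 256, mlx5_vdpa_get_queue_num() now returns 128 queue pairs (256 / 2), and a vring index is validated directly against the 256 virtqs instead of the previous, incorrect 512.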
[PATCH v4 02/15] vdpa/mlx5: support pre create virtq resource
From: Yajun Wu The motivation of this change is to reduce the vDPA device queue creation time by creating some queue resources in the vDPA device probe stage. In a VM live migration scenario, this can save 0.8 ms for each queue creation and thus reduce the LM network downtime. To create queue resources (umem/counter) in advance, we need to know the virtio queue depth and the maximum number of queues the VM will use. Introduce two new devargs: queues (max queue pair number) and queue_size (queue depth). Both arguments must be provided; if only one of them is given, it is ignored and nothing is pre-created. The queues and queue_size values must also be identical to the vhost configuration the driver receives later; otherwise the pre-created resources are either wasted or missing, or must be destroyed and recreated (in case of a queue_size mismatch). Pre-created umem/counter resources are kept alive until vDPA device removal. Signed-off-by: Yajun Wu Acked-by: Matan Azrad Reviewed-by: Maxime Coquelin --- doc/guides/vdpadevs/mlx5.rst | 14 +++ drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++- drivers/vdpa/mlx5/mlx5_vdpa.h | 2 + 3 files changed, 89 insertions(+), 2 deletions(-) diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst index 3ded142311..0ad77bf535 100644 --- a/doc/guides/vdpadevs/mlx5.rst +++ b/doc/guides/vdpadevs/mlx5.rst @@ -101,6 +101,20 @@ for an additional list of options shared with other mlx5 drivers. - 0, HW default. +- ``queue_size`` parameter [int] + + - 1 - 1024, Virtio Queue depth for pre-creating queue resource to speed up +first time queue creation. Set it together with queues devarg. + + - 0, default value, no pre-create virtq resource. + +- ``queues`` parameter [int] + + - 1 - 128, Max number of virtio queue pair(including 1 rx queue and 1 tx queue) +for pre-create queue resource to speed up first time queue creation. Set it +together with queue_size devarg. + + - 0, default value, no pre-create virtq resource. Error handling ^^ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index ee71339b78..faf833ee2f 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -244,7 +244,9 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv) static void mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv) { - mlx5_vdpa_virtqs_cleanup(priv); + /* Clean pre-created resource in dev removal only. 
*/ + if (!priv->queues) + mlx5_vdpa_virtqs_cleanup(priv); mlx5_vdpa_mem_dereg(priv); } @@ -494,6 +496,12 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque) priv->hw_max_latency_us = (uint32_t)tmp; } else if (strcmp(key, "hw_max_pending_comp") == 0) { priv->hw_max_pending_comp = (uint32_t)tmp; + } else if (strcmp(key, "queue_size") == 0) { + priv->queue_size = (uint16_t)tmp; + } else if (strcmp(key, "queues") == 0) { + priv->queues = (uint16_t)tmp; + } else { + DRV_LOG(WARNING, "Invalid key %s.", key); } return 0; } @@ -524,9 +532,68 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist, if (!priv->event_us && priv->event_mode == MLX5_VDPA_EVENT_MODE_DYNAMIC_TIMER) priv->event_us = MLX5_VDPA_DEFAULT_TIMER_STEP_US; + if ((priv->queue_size && !priv->queues) || + (!priv->queue_size && priv->queues)) { + priv->queue_size = 0; + priv->queues = 0; + DRV_LOG(WARNING, "Please provide both queue_size and queues."); + } DRV_LOG(DEBUG, "event mode is %d.", priv->event_mode); DRV_LOG(DEBUG, "event_us is %u us.", priv->event_us); DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max); + DRV_LOG(DEBUG, "queues is %u, queue_size is %u.", priv->queues, + priv->queue_size); +} + +static int +mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv) +{ + uint32_t index; + uint32_t i; + + if (!priv->queues) + return 0; + for (index = 0; index < (priv->queues * 2); ++index) { + struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index]; + + if (priv->caps.queue_counters_valid) { + if (!virtq->counters) + virtq->counters = + mlx5_devx_cmd_create_virtio_q_counters + (priv->cdev->ctx); + if (!virtq->counters) { + DRV_LOG(ERR, "Failed to create virtq couners for virtq" + " %d.", index); + return -1; + } + } + for (i = 0; i < RTE_DIM(virtq->umems); ++i) { + uint32_t size; + void *buf; +
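For example, probing with ``queues=8,queue_size=256`` pre-creates the umems and counters for 16 virtqs of depth 256; passing only one of the two devargs triggers the "Please provide both queue_size and queues." warning shown above, and both values are reset to 0, i.e. nothing is pre-created.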
[PATCH v4 03/15] common/mlx5: add DevX API to move QP to reset state
From: Yajun Wu Support set QP to RESET state. Signed-off-by: Yajun Wu Acked-by: Matan Azrad Reviewed-by: Maxime Coquelin --- drivers/common/mlx5/mlx5_devx_cmds.c | 7 +++ drivers/common/mlx5/mlx5_prm.h | 17 + 2 files changed, 24 insertions(+) diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c index c6bdbc12bb..1d6d6578d6 100644 --- a/drivers/common/mlx5/mlx5_devx_cmds.c +++ b/drivers/common/mlx5/mlx5_devx_cmds.c @@ -2264,11 +2264,13 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op, uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)]; uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)]; uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)]; + uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_in)]; } in; union { uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)]; uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)]; uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)]; + uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_out)]; } out; void *qpc; int ret; @@ -2311,6 +2313,11 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op, inlen = sizeof(in.rtr2rts); outlen = sizeof(out.rtr2rts); break; + case MLX5_CMD_OP_QP_2RST: + MLX5_SET(2rst_qp_in, &in, qpn, qp->id); + inlen = sizeof(in.qp2rst); + outlen = sizeof(out.qp2rst); + break; default: DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.", qp_st_mod_op); diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h index bc3e70a1d1..8a2f55c33e 100644 --- a/drivers/common/mlx5/mlx5_prm.h +++ b/drivers/common/mlx5/mlx5_prm.h @@ -3657,6 +3657,23 @@ struct mlx5_ifc_init2init_qp_in_bits { u8 reserved_at_800[0x80]; }; +struct mlx5_ifc_2rst_qp_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + u8 syndrome[0x20]; + u8 reserved_at_40[0x40]; +}; + +struct mlx5_ifc_2rst_qp_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + u8 vhca_tunnel_id[0x10]; + u8 op_mod[0x10]; + u8 reserved_at_80[0x8]; + u8 qpn[0x18]; + u8 reserved_at_a0[0x20]; +}; + struct mlx5_ifc_dealloc_pd_out_bits { u8 status[0x8]; u8 reserved_0[0x18]; -- 2.31.1
[PATCH v4 04/15] vdpa/mlx5: support event qp reuse
From: Yajun Wu To speed up queue create time, event qp and cq will create only once. Each virtq creation will reuse same event qp and cq. Because FW will set event qp to error state during virtq destroy, need modify event qp to RESET state, then modify qp to RTS state as usual. This can save about 1.5ms for each virtq creation. After SW qp reset, qp pi/ci all become 0 while cq pi/ci keep as previous. Add new variable qp_ci to save SW qp ci. Move qp pi independently with cq ci. Add new function mlx5_vdpa_drain_cq to drain cq CQE after virtq release. Signed-off-by: Yajun Wu Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.c | 8 drivers/vdpa/mlx5/mlx5_vdpa.h | 12 +- drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++-- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 6 +-- 4 files changed, 78 insertions(+), 8 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index faf833ee2f..ee99952e11 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -269,6 +269,7 @@ mlx5_vdpa_dev_close(int vid) } mlx5_vdpa_steer_unset(priv); mlx5_vdpa_virtqs_release(priv); + mlx5_vdpa_drain_cq(priv); if (priv->lm_mr.addr) mlx5_os_wrapped_mkey_destroy(&priv->lm_mr); priv->state = MLX5_VDPA_STATE_PROBED; @@ -555,7 +556,14 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv) return 0; for (index = 0; index < (priv->queues * 2); ++index) { struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index]; + int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size, + -1, &virtq->eqp); + if (ret) { + DRV_LOG(ERR, "Failed to create event QPs for virtq %d.", + index); + return -1; + } if (priv->caps.queue_counters_valid) { if (!virtq->counters) virtq->counters = diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index f6719a3c60..bf82026e37 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -55,6 +55,7 @@ struct mlx5_vdpa_event_qp { struct mlx5_vdpa_cq cq; struct mlx5_devx_obj *fw_qp; struct mlx5_devx_qp sw_qp; + uint16_t qp_pi; }; struct mlx5_vdpa_query_mr { @@ -226,7 +227,7 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv); * @return * 0 on success, -1 otherwise and rte_errno is set. */ -int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n, +int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n, int callfd, struct mlx5_vdpa_event_qp *eqp); /** @@ -479,4 +480,13 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid, */ int mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid); + +/** + * Drain virtq CQ CQE. + * + * @param[in] priv + * The vdpa driver private structure. + */ +void +mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv); #endif /* RTE_PMD_MLX5_VDPA_H_ */ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c index 7167a98db0..b43dca9255 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c @@ -137,7 +137,7 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq) }; uint32_t word; } last_word; - uint16_t next_wqe_counter = cq->cq_ci; + uint16_t next_wqe_counter = eqp->qp_pi; uint16_t cur_wqe_counter; uint16_t comp; @@ -156,9 +156,10 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq) rte_io_wmb(); /* Ring CQ doorbell record. */ cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci); + eqp->qp_pi += comp; rte_io_wmb(); /* Ring SW QP doorbell record. 
*/ - eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size); + eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size); } return comp; } @@ -232,6 +233,25 @@ mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv) return max; } +void +mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv) +{ + unsigned int i; + + for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) { + struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq; + + mlx5_vdpa_queue_complete(cq); + if (cq->cq_obj.cq) { + cq->cq_obj.cqes[0].wqe_counter = + rte_cpu_to_be_16(UINT16_MAX); + priv->virtqs[i].eqp.qp_pi = 0; + if (!cq->armed) + mlx5_vdpa_cq_arm(priv, cq); + } + } +} + /* Wait on all CQs channel for completi
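Putting the previous patch's new 2RST operation together with the existing transitions, recycling an event QP after FW has moved it to the error state amounts to driving it around the full state machine again. A hedged sketch of that sequence (driver context assumed; the third argument is taken to be the peer FW QP number, as in the other transitions, and error reporting is collapsed):

/* Sketch only: cycle a SW QP back to RTS after virtq destroy. */
static int
mlx5_vdpa_qp_recycle(struct mlx5_devx_obj *sw_qp, uint32_t fw_qpn)
{
	if (mlx5_devx_cmd_modify_qp_state(sw_qp, MLX5_CMD_OP_QP_2RST, fw_qpn) ||
	    mlx5_devx_cmd_modify_qp_state(sw_qp, MLX5_CMD_OP_RST2INIT_QP, fw_qpn) ||
	    mlx5_devx_cmd_modify_qp_state(sw_qp, MLX5_CMD_OP_INIT2RTR_QP, fw_qpn) ||
	    mlx5_devx_cmd_modify_qp_state(sw_qp, MLX5_CMD_OP_RTR2RTS_QP, fw_qpn))
		return -1;
	/* SW pi/ci restart from 0, hence the qp_pi reset in drain_cq. */
	return 0;
}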
[PATCH v4 05/15] common/mlx5: extend virtq modifiable fields
A virtq configuration can be modified after the virtq creation. Added the following modifiable fields: 1.address fields: desc_addr/used_addr/available_addr 2.hw_available_index 3.hw_used_index 4.virtio_q_type 5.version type 6.queue mkey 7.feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum 8.event mode: event_mode/event_qpn_or_msix Signed-off-by: Li Zhang Acked-by: Matan Azrad --- drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++- drivers/common/mlx5/mlx5_devx_cmds.h | 6 ++- drivers/common/mlx5/mlx5_prm.h | 13 +- 3 files changed, 76 insertions(+), 13 deletions(-) diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c index 1d6d6578d6..1b68c37092 100644 --- a/drivers/common/mlx5/mlx5_devx_cmds.c +++ b/drivers/common/mlx5/mlx5_devx_cmds.c @@ -545,6 +545,15 @@ mlx5_devx_cmd_query_hca_vdpa_attr(void *ctx, vdpa_attr->log_doorbell_stride = MLX5_GET(virtio_emulation_cap, hcattr, log_doorbell_stride); + vdpa_attr->vnet_modify_ext = + MLX5_GET(virtio_emulation_cap, hcattr, +vnet_modify_ext); + vdpa_attr->virtio_net_q_addr_modify = + MLX5_GET(virtio_emulation_cap, hcattr, +virtio_net_q_addr_modify); + vdpa_attr->virtio_q_index_modify = + MLX5_GET(virtio_emulation_cap, hcattr, +virtio_q_index_modify); vdpa_attr->log_doorbell_bar_size = MLX5_GET(virtio_emulation_cap, hcattr, log_doorbell_bar_size); @@ -2074,27 +2083,66 @@ mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj, MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type, MLX5_GENERAL_OBJ_TYPE_VIRTQ); MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id); - MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type); + MLX5_SET64(virtio_net_q, virtq, modify_field_select, + attr->mod_fields_bitmap); MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index); - switch (attr->type) { - case MLX5_VIRTQ_MODIFY_TYPE_STATE: + if (!attr->mod_fields_bitmap) { + DRV_LOG(ERR, "Failed to modify VIRTQ for no type set."); + rte_errno = EINVAL; + return -rte_errno; + } + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_STATE) MLX5_SET16(virtio_net_q, virtq, state, attr->state); - break; - case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS: + if (attr->mod_fields_bitmap & + MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS) { MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey, attr->dirty_bitmap_mkey); MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr, attr->dirty_bitmap_addr); MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size, attr->dirty_bitmap_size); - break; - case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE: + } + if (attr->mod_fields_bitmap & + MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE) MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable, attr->dirty_bitmap_dump_enable); - break; - default: - rte_errno = EINVAL; - return -rte_errno; + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD) { + MLX5_SET(virtio_q, virtctx, queue_period_mode, + attr->hw_latency_mode); + MLX5_SET(virtio_q, virtctx, queue_period_us, + attr->hw_max_latency_us); + MLX5_SET(virtio_q, virtctx, queue_max_count, + attr->hw_max_pending_comp); + } + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_ADDR) { + MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr); + MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr); + MLX5_SET64(virtio_q, virtctx, available_addr, + attr->available_addr); + } + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX) + MLX5_SET16(virtio_net_q, virtq, hw_available_index, + attr->hw_available_index); + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX) 
+ MLX5_SET16(virtio_net_q, virtq, hw_used_index, + attr->hw_used_index); + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE) + MLX5_SET16(virtio_q, virtctx, virtio_q_type, attr->q_type); + if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0) + MLX5_SET16(virtio_q,
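Since modify_field_select is now treated as a bitmap, several attributes can be updated with one FW command instead of one command per field. A hedged sketch of such a combined modify (driver context assumed; MLX5_VIRTQ_STATE_RDY and the address variables are illustrative):

	struct mlx5_devx_virtq_attr attr = {
		.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE |
				     MLX5_VIRTQ_MODIFY_TYPE_ADDR,
		.queue_index = index,
		.state = MLX5_VIRTQ_STATE_RDY,
		.desc_addr = desc_iova,
		.used_addr = used_iova,
		.available_addr = avail_iova,
	};

	/* One command sets state and ring addresses together; an empty
	 * bitmap is rejected with EINVAL, per the hunk above. */
	if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr))
		return -1;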
[PATCH v4 06/15] vdpa/mlx5: pre-create virtq at probe time
dev_config operation is called in LM progress. LM time is very critical because all the VM packets are dropped directly at that time. Move the virtq creation to probe time and only modify the configuration later in the dev_config stage using the new ability to modify virtq. This optimization accelerates the LM process and reduces its time by 70%. Signed-off-by: Li Zhang Acked-by: Matan Azrad Reviewed-by: Maxime Coquelin --- doc/guides/rel_notes/release_22_07.rst | 4 + drivers/vdpa/mlx5/mlx5_vdpa.h | 4 + drivers/vdpa/mlx5/mlx5_vdpa_lm.c | 19 +- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c| 257 +++-- 4 files changed, 176 insertions(+), 108 deletions(-) diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst index f2cf41def9..2056cd9ee7 100644 --- a/doc/guides/rel_notes/release_22_07.rst +++ b/doc/guides/rel_notes/release_22_07.rst @@ -175,6 +175,10 @@ New Features This is a fall-back implementation for platforms that don't support vector operations. +* **Updated Nvidia mlx5 vDPA driver.** + + * Added new devargs ``queue_size`` and ``queues`` to allow prior creation of virtq resources. + Removed Items - diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index bf82026e37..e5553079fe 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq { uint16_t vq_size; uint8_t notifier_state; bool stopped; + uint32_t configured:1; uint32_t version; struct mlx5_vdpa_priv *priv; struct mlx5_devx_obj *virtq; @@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid); */ void mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv); + +bool +mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv); #endif /* RTE_PMD_MLX5_VDPA_H_ */ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c index 43a2b98255..284758ad56 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c @@ -12,20 +12,21 @@ int mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable) { struct mlx5_devx_virtq_attr attr = { - .type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE, + .mod_fields_bitmap = + MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE, .dirty_bitmap_dump_enable = enable, }; + struct mlx5_vdpa_virtq *virtq; int i; for (i = 0; i < priv->nr_virtqs; ++i) { attr.queue_index = i; - if (!priv->virtqs[i].virtq) { - DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap " - "enabling.", i); + virtq = &priv->virtqs[i]; + if (!virtq->configured) { + DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap enabling.", i); } else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq, &attr)) { - DRV_LOG(ERR, "Failed to modify virtq %d for dirty " - "bitmap enabling.", i); + DRV_LOG(ERR, "Failed to modify virtq %d for dirty bitmap enabling.", i); return -1; } } @@ -37,10 +38,11 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base, uint64_t log_size) { struct mlx5_devx_virtq_attr attr = { - .type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS, + .mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS, .dirty_bitmap_addr = log_base, .dirty_bitmap_size = log_size, }; + struct mlx5_vdpa_virtq *virtq; int i; int ret = mlx5_os_wrapped_mkey_create(priv->cdev->ctx, priv->cdev->pd, priv->cdev->pdn, @@ -54,7 +56,8 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base, attr.dirty_bitmap_mkey = priv->lm_mr.lkey; for (i = 0; i < priv->nr_virtqs; ++i) { attr.queue_index = i; - if (!priv->virtqs[i].virtq) { + 
virtq = &priv->virtqs[i]; + if (!virtq->configured) { DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i); } else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq, &attr)) { diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c index 6637ba1503..6e08d619e4 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c @@ -75,6 +75,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv) for (i = 0; i < priv->caps.max_num_virtio_queues; i++) { struct mlx5_vdpa
[PATCH v4 07/15] vdpa/mlx5: optimize datapath-control synchronization
The driver used a single global lock for any synchronization needed for the datapath and control path. It is better to group the critical sections with the other ones that should be synchronized. Replace the global lock with the following locks: 1.virtq locks(per virtq) synchronize datapath polling and parallel configurations on the same virtq. 2.A doorbell lock synchronizes doorbell update, which is shared for all the virtqs in the device. 3.A steering lock for the shared steering objects updates. Signed-off-by: Li Zhang Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.c | 24 --- drivers/vdpa/mlx5/mlx5_vdpa.h | 13 ++-- drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++--- drivers/vdpa/mlx5/mlx5_vdpa_lm.c| 36 --- drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 7 ++- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 88 +++--- 6 files changed, 186 insertions(+), 79 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index ee99952e11..e5a11f72fd 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -135,6 +135,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state) struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid); struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev); + struct mlx5_vdpa_virtq *virtq; int ret; if (priv == NULL) { @@ -145,9 +146,10 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state) DRV_LOG(ERR, "Too big vring id: %d.", vring); return -E2BIG; } - pthread_mutex_lock(&priv->vq_config_lock); + virtq = &priv->virtqs[vring]; + pthread_mutex_lock(&virtq->virtq_lock); ret = mlx5_vdpa_virtq_enable(priv, vring, state); - pthread_mutex_unlock(&priv->vq_config_lock); + pthread_mutex_unlock(&virtq->virtq_lock); return ret; } @@ -267,7 +269,9 @@ mlx5_vdpa_dev_close(int vid) ret |= mlx5_vdpa_lm_log(priv); priv->state = MLX5_VDPA_STATE_IN_PROGRESS; } + pthread_mutex_lock(&priv->steer_update_lock); mlx5_vdpa_steer_unset(priv); + pthread_mutex_unlock(&priv->steer_update_lock); mlx5_vdpa_virtqs_release(priv); mlx5_vdpa_drain_cq(priv); if (priv->lm_mr.addr) @@ -276,8 +280,6 @@ mlx5_vdpa_dev_close(int vid) if (!priv->connected) mlx5_vdpa_dev_cache_clean(priv); priv->vid = 0; - /* The mutex may stay locked after event thread cancel - initiate it. 
*/ - pthread_mutex_init(&priv->vq_config_lock, NULL); DRV_LOG(INFO, "vDPA device %d was closed.", vid); return ret; } @@ -549,15 +551,21 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist, static int mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv) { + struct mlx5_vdpa_virtq *virtq; uint32_t index; uint32_t i; + for (index = 0; index < priv->caps.max_num_virtio_queues * 2; + index++) { + virtq = &priv->virtqs[index]; + pthread_mutex_init(&virtq->virtq_lock, NULL); + } if (!priv->queues) return 0; for (index = 0; index < (priv->queues * 2); ++index) { - struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index]; + virtq = &priv->virtqs[index]; int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size, - -1, &virtq->eqp); + -1, virtq); if (ret) { DRV_LOG(ERR, "Failed to create event QPs for virtq %d.", @@ -713,7 +721,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev, priv->num_lag_ports = attr->num_lag_ports; if (attr->num_lag_ports == 0) priv->num_lag_ports = 1; - pthread_mutex_init(&priv->vq_config_lock, NULL); + rte_spinlock_init(&priv->db_lock); + pthread_mutex_init(&priv->steer_update_lock, NULL); priv->cdev = cdev; mlx5_vdpa_config_get(mkvlist, priv); if (mlx5_vdpa_create_dev_resources(priv)) @@ -797,7 +806,6 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv) mlx5_vdpa_release_dev_resources(priv); if (priv->vdev) rte_vdpa_unregister_device(priv->vdev); - pthread_mutex_destroy(&priv->vq_config_lock); rte_free(priv); } diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index e5553079fe..3fd5eefc5e 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -82,6 +82,7 @@ struct mlx5_vdpa_virtq { bool stopped; uint32_t configured:1; uint32_t version; + pthread_mutex_t virtq_lock; struct mlx5_vdpa_priv *priv; struct mlx5_devx_obj *virtq; struct mlx5_devx_obj *counters; @@ -126,7 +127,8 @@ struct mlx5_vd
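The doorbell lock matters because the notify path writes to one doorbell address shared by all virtqs, while each virtq's configuration is serialized only by its own mutex. A hypothetical sketch of the two scopes (the doorbell write shown here is illustrative, not the driver's exact kick code):

	/* Per-virtq scope: configuration under that virtq's mutex only. */
	pthread_mutex_lock(&virtq->virtq_lock);
	ret = mlx5_vdpa_virtq_enable(priv, vring, state);
	pthread_mutex_unlock(&virtq->virtq_lock);

	/* Device scope: the shared doorbell under the spinlock. */
	rte_spinlock_lock(&priv->db_lock);
	*(volatile uint16_t *)priv->virtq_db_addr = virtq->index; /* hypothetical */
	rte_spinlock_unlock(&priv->db_lock);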
[PATCH v4 08/15] vdpa/mlx5: add multi-thread management for configuration
The LM process includes a lot of objects creations and destructions in the source and the destination servers. As much as LM time increases, the packet drop of the VM increases. To improve LM time need to parallel the configurations for mlx5 FW. Add internal multi-thread management in the driver for it. A new devarg defines the number of threads and their CPU. The management is shared between all the devices of the driver. Since the event_core also affects the datapath events thread, reduce the priority of the datapath event thread to allow fast configuration of the devices doing the LM. Signed-off-by: Li Zhang Acked-by: Matan Azrad --- doc/guides/vdpadevs/mlx5.rst | 11 +++ drivers/vdpa/mlx5/meson.build | 1 + drivers/vdpa/mlx5/mlx5_vdpa.c | 41 drivers/vdpa/mlx5/mlx5_vdpa.h | 36 +++ drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++ drivers/vdpa/mlx5/mlx5_vdpa_event.c | 2 +- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 8 +- 7 files changed, 223 insertions(+), 5 deletions(-) create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst index 0ad77bf535..b75a01688d 100644 --- a/doc/guides/vdpadevs/mlx5.rst +++ b/doc/guides/vdpadevs/mlx5.rst @@ -78,6 +78,17 @@ for an additional list of options shared with other mlx5 drivers. CPU core number to set polling thread affinity to, default to control plane cpu. +- ``max_conf_threads`` parameter [int] + + Allow the driver to use internal threads to obtain fast configuration. + All the threads will be open on the same core of the event completion queue scheduling thread. + + - 0, default, don't use internal threads for configuration. + + - 1 - 256, number of internal threads in addition to the caller thread (8 is suggested). +This value, if not 0, should be the same for all the devices; +the first prob will take it with the event_core for all the multi-thread configurations in the driver. 
+ - ``hw_latency_mode`` parameter [int] The completion queue moderation mode: diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build index 0fa82ad257..9d8dbb1a82 100644 --- a/drivers/vdpa/mlx5/meson.build +++ b/drivers/vdpa/mlx5/meson.build @@ -15,6 +15,7 @@ sources = files( 'mlx5_vdpa_virtq.c', 'mlx5_vdpa_steer.c', 'mlx5_vdpa_lm.c', +'mlx5_vdpa_cthread.c', ) cflags_options = [ '-std=c11', diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index e5a11f72fd..a9d023ed08 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -50,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list = TAILQ_HEAD_INITIALIZER(priv_list); static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER; +struct mlx5_vdpa_conf_thread_mng conf_thread_mng; + static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv); static struct mlx5_vdpa_priv * @@ -493,6 +495,29 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque) DRV_LOG(WARNING, "Invalid event_core %s.", val); else priv->event_core = tmp; + } else if (strcmp(key, "max_conf_threads") == 0) { + if (tmp) { + priv->use_c_thread = true; + if (!conf_thread_mng.initializer_priv) { + conf_thread_mng.initializer_priv = priv; + if (tmp > MLX5_VDPA_MAX_C_THRD) { + DRV_LOG(WARNING, + "Invalid max_conf_threads %s " + "and set max_conf_threads to %d", + val, MLX5_VDPA_MAX_C_THRD); + tmp = MLX5_VDPA_MAX_C_THRD; + } + conf_thread_mng.max_thrds = tmp; + } else if (tmp != conf_thread_mng.max_thrds) { + DRV_LOG(WARNING, + "max_conf_threads is PMD argument and not per device, " + "only the first device configuration set it, current value is %d " + "and will not be changed to %d.", + conf_thread_mng.max_thrds, (int)tmp); + } + } else { + priv->use_c_thread = false; + } } else if (strcmp(key, "hw_latency_mode") == 0) { priv->hw_latency_mode = (uint32_t)tmp; } else if (strcmp(key, "hw_max_latency_us") == 0) { @@ -521,6 +546,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist, "hw_max_latency_us", "hw_max_pending_comp", "no_traffic_time", + "queue_size", + "queues", + "max_conf_threads",
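All configuration threads are pinned to the same core as the event thread (event_core). A standalone sketch of creating N worker threads with a fixed CPU affinity, the way such pinning is typically done with pthreads (names are hypothetical; the driver's own creation code lives in mlx5_vdpa_mult_threads_create() and is not shown in this hunk):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static int
create_pinned_threads(pthread_t *tids, int n, int cpu,
		      void *(*fn)(void *), void *arg)
{
	cpu_set_t cpuset;
	pthread_attr_t attr;
	int i;

	CPU_ZERO(&cpuset);
	CPU_SET(cpu, &cpuset);
	if (pthread_attr_init(&attr))
		return -1;
	/* Every worker inherits the same single-CPU affinity mask. */
	if (pthread_attr_setaffinity_np(&attr, sizeof(cpuset), &cpuset))
		return -1;
	for (i = 0; i < n; i++)
		if (pthread_create(&tids[i], &attr, fn, arg))
			return -1;
	pthread_attr_destroy(&attr);
	return 0;
}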
[PATCH v4 09/15] vdpa/mlx5: add task ring for MT management
The configuration threads' tasks need a container to support multiple tasks assigned to a thread in parallel. Use an rte_ring container per thread to manage the thread tasks without locks. The caller thread from the user context opens a task to a thread and enqueues it to the thread's ring. The thread polls its ring and dequeues tasks. That's why the ring should be in multi-producer and single-consumer mode. An atomic counter manages the tasks' completion notification. The threads report errors to the caller by a dedicated error counter per task. Signed-off-by: Li Zhang Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.h | 17 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 115 +- 2 files changed, 130 insertions(+), 2 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index 4e7c2557b7..2bbb868ec6 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -74,10 +74,22 @@ enum { }; #define MLX5_VDPA_MAX_C_THRD 256 +#define MLX5_VDPA_MAX_TASKS_PER_THRD 4096 +#define MLX5_VDPA_TASKS_PER_DEV 64 + +/* Generic task information and size must be multiple of 4B. */ +struct mlx5_vdpa_task { + struct mlx5_vdpa_priv *priv; + uint32_t *remaining_cnt; + uint32_t *err_cnt; + uint32_t idx; +} __rte_packed __rte_aligned(4); /* Generic mlx5_vdpa_c_thread information. */ struct mlx5_vdpa_c_thread { pthread_t tid; + struct rte_ring *rng; + pthread_cond_t c_cond; }; struct mlx5_vdpa_conf_thread_mng { @@ -532,4 +544,9 @@ mlx5_vdpa_mult_threads_create(int cpu_core); */ void mlx5_vdpa_mult_threads_destroy(bool need_unlock); + +bool +mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv, + uint32_t thrd_idx, + uint32_t num); #endif /* RTE_PMD_MLX5_VDPA_H_ */ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c index ba7d8b63b3..1fdc92d3ad 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c @@ -11,17 +11,103 @@ #include #include #include +#include #include #include "mlx5_vdpa_utils.h" #include "mlx5_vdpa.h" +static inline uint32_t +mlx5_vdpa_c_thrd_ring_dequeue_bulk(struct rte_ring *r, + void **obj, uint32_t n, uint32_t *avail) +{ + uint32_t m; + + m = rte_ring_dequeue_bulk_elem_start(r, obj, + sizeof(struct mlx5_vdpa_task), n, avail); + n = (m == n) ? n : 0; + rte_ring_dequeue_elem_finish(r, n); + return n; +} + +static inline uint32_t +mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r, + void * const *obj, uint32_t n, uint32_t *free) +{ + uint32_t m; + + m = rte_ring_enqueue_bulk_elem_start(r, n, free); + n = (m == n) ? n : 0; + rte_ring_enqueue_elem_finish(r, obj, + sizeof(struct mlx5_vdpa_task), n); + return n; +} + +bool +mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv, + uint32_t thrd_idx, + uint32_t num) +{ + struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng; + struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV]; + uint32_t i; + + MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV); + for (i = 0 ; i < num; i++) { + task[i].priv = priv; + /* To be added later. */ + } + if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL)) + return -1; + for (i = 0 ; i < num; i++) + if (task[i].remaining_cnt) + __atomic_fetch_add(task[i].remaining_cnt, 1, + __ATOMIC_RELAXED); + /* wake up conf thread. */ + pthread_mutex_lock(&conf_thread_mng.cthrd_lock); + pthread_cond_signal(&conf_thread_mng.cthrd[thrd_idx].c_cond); + pthread_mutex_unlock(&conf_thread_mng.cthrd_lock); + return 0; +} + static void * mlx5_vdpa_c_thread_handle(void *arg) { - /* To be added later. 
*/ - return arg; + struct mlx5_vdpa_conf_thread_mng *multhrd = arg; + pthread_t thread_id = pthread_self(); + struct mlx5_vdpa_priv *priv; + struct mlx5_vdpa_task task; + struct rte_ring *rng; + uint32_t thrd_idx; + uint32_t task_num; + + for (thrd_idx = 0; thrd_idx < multhrd->max_thrds; + thrd_idx++) + if (multhrd->cthrd[thrd_idx].tid == thread_id) + break; + if (thrd_idx >= multhrd->max_thrds) + return NULL; + rng = multhrd->cthrd[thrd_idx].rng; + while (1) { + task_num = mlx5_vdpa_c_thrd_ring_dequeue_bulk(rng, + (void **)&task, 1, NULL); + if (!task_num) { + /* No task and condition wait. */ + pthread_mutex_lock(&multhrd->cthrd_lock); + pthread_cond_wait( + &multhrd->cthrd[thrd_idx].c_cond, +
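The completion protocol is simple: the caller increments *remaining_cnt once per enqueued task, each worker decrements it when a task finishes (bumping *err_cnt on failure), and the caller polls both counters. A standalone C11 sketch of the same scheme (the driver itself uses the GCC __atomic builtins with relaxed ordering, as seen above):

#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Worker side: account one finished task. */
static void
task_done(atomic_uint *remaining, atomic_uint *err, bool failed)
{
	if (failed)
		atomic_fetch_add_explicit(err, 1, memory_order_relaxed);
	atomic_fetch_sub_explicit(remaining, 1, memory_order_relaxed);
}

/* Caller side: wait until a bulk of tasks completes, then check errors. */
static int
wait_bulk_done(atomic_uint *remaining, atomic_uint *err)
{
	while (atomic_load_explicit(remaining, memory_order_relaxed) != 0)
		usleep(100);	/* the driver sleeps and polls similarly */
	return atomic_load_explicit(err, memory_order_relaxed) ? -1 : 0;
}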
[PATCH v4 10/15] vdpa/mlx5: add MT task for VM memory registration
The driver creates a direct MR object of the HW for each VM memory region, which maps the VM physical address to the actual physical address. Later, after all the MRs are ready, the driver creates an indirect MR to group all the direct MRs into one virtual space from the HW perspective. Create direct MRs in parallel using the MT mechanism. After completion, the primary thread creates the indirect MR needed for the following virtqs configurations. This optimization accelerates the LM process and reduces its time by 5%. Signed-off-by: Li Zhang Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.c | 1 - drivers/vdpa/mlx5/mlx5_vdpa.h | 31 ++- drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 47 - drivers/vdpa/mlx5/mlx5_vdpa_mem.c | 270 ++ drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 6 +- 5 files changed, 258 insertions(+), 97 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index a9d023ed08..e3b32fa087 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev, rte_errno = rte_errno ? rte_errno : EINVAL; goto error; } - SLIST_INIT(&priv->mr_list); pthread_mutex_lock(&priv_list_lock); TAILQ_INSERT_TAIL(&priv_list, priv, next); pthread_mutex_unlock(&priv_list_lock); diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index 2bbb868ec6..3316ce42be 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp { }; struct mlx5_vdpa_query_mr { - SLIST_ENTRY(mlx5_vdpa_query_mr) next; union { struct ibv_mr *mr; struct mlx5_devx_obj *mkey; @@ -76,10 +75,17 @@ enum { #define MLX5_VDPA_MAX_C_THRD 256 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096 #define MLX5_VDPA_TASKS_PER_DEV 64 +#define MLX5_VDPA_MAX_MRS 0x + +/* Vdpa task types. */ +enum mlx5_vdpa_task_type { + MLX5_VDPA_TASK_REG_MR = 1, +}; /* Generic task information and size must be multiple of 4B. */ struct mlx5_vdpa_task { struct mlx5_vdpa_priv *priv; + enum mlx5_vdpa_task_type type; uint32_t *remaining_cnt; uint32_t *err_cnt; uint32_t idx; @@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng { }; extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng; +struct mlx5_vdpa_vmem_info { + struct rte_vhost_memory *vmem; + uint32_t entries_num; + uint64_t gcd; + uint64_t size; + uint8_t mode; +}; + struct mlx5_vdpa_virtq { SLIST_ENTRY(mlx5_vdpa_virtq) next; uint8_t enable; uint16_t index; uint16_t vq_size; uint8_t notifier_state; bool stopped; uint32_t configured:1; uint32_t version; pthread_mutex_t virtq_lock; struct mlx5_vdpa_priv *priv; struct mlx5_devx_obj *virtq; struct mlx5_devx_obj *counters; @@ -176,7 +190,7 @@ struct mlx5_vdpa_priv { struct mlx5_hca_vdpa_attr caps; uint32_t gpa_mkey_index; struct ibv_mr *null_mr; - struct rte_vhost_memory *vmem; + struct mlx5_vdpa_vmem_info vmem_info; struct mlx5dv_devx_event_channel *eventc; struct mlx5dv_devx_event_channel *err_chnl; struct mlx5_uar uar; @@ -187,11 +201,13 @@ struct mlx5_vdpa_priv { uint8_t num_lag_ports; uint64_t features; /* Negotiated features. */ uint16_t log_max_rqt_size; + uint16_t last_c_thrd_idx; + uint16_t num_mrs; /* Number of memory regions. 
*/ struct mlx5_vdpa_steer steer; struct mlx5dv_var *var; void *virtq_db_addr; struct mlx5_pmd_wrapped_mr lm_mr; - SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list; + struct mlx5_vdpa_query_mr **mrs; struct mlx5_vdpa_virtq virtqs[]; }; @@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock); bool mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv, uint32_t thrd_idx, - uint32_t num); + enum mlx5_vdpa_task_type task_type, + uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt, + void **task_data, uint32_t num); +int +mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx); +bool +mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt, + uint32_t *err_cnt, uint32_t sleep_time); #endif /* RTE_PMD_MLX5_VDPA_H_ */ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c index 1fdc92d3ad..10391931ae 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c @@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r, bool mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv, uint32_t thrd_idx, - uint32_t num) + enum mlx5_vdpa_task_type task_type, + uint32_t *remaining_cnt, uint32_t *err_cnt, + void **task_data, uint32_t num) { struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng; struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV]; + uint32_t *data = (uint32_t *)task_data;
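The dispatch-then-wait flow for the direct MRs looks roughly as follows; a hedged sketch built only from the helpers this patch declares (the round-robin thread selection and the error handling are illustrative, not the patch's literal code):

/* Sketch: register num_mrs direct MRs on the worker threads, then
 * let the caller build the indirect MR once all of them are done. */
static int
register_mrs_mt(struct mlx5_vdpa_priv *priv, uint32_t num_mrs)
{
	uint32_t remaining_cnt = 0, err_cnt = 0, data[1], i;

	for (i = 0; i < num_mrs; i++) {
		data[0] = i;	/* MR index consumed by the worker */
		if (mlx5_vdpa_task_add(priv, i % conf_thread_mng.max_thrds,
				       MLX5_VDPA_TASK_REG_MR, &remaining_cnt,
				       &err_cnt, (void **)&data, 1))
			return -1;
	}
	/* All direct MRs must exist before the indirect MR is created. */
	if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
						    &err_cnt, 100))
		return -1;
	return 0;	/* the caller now creates the indirect MR */
}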
[PATCH v4 11/15] vdpa/mlx5: add virtq creation task for MT management
The virtq object and all its sub-resources use a lot of FW commands and can be accelerated by the MT management. Split the virtqs creation between the configuration threads. This accelerates the LM process and reduces its time by 20%. Signed-off-by: Li Zhang Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.h | 9 +- drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 14 +++ drivers/vdpa/mlx5/mlx5_vdpa_event.c | 2 +- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 149 +++--- 4 files changed, 134 insertions(+), 40 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index 3316ce42be..35221f5ddc 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -80,6 +80,7 @@ enum { /* Vdpa task types. */ enum mlx5_vdpa_task_type { MLX5_VDPA_TASK_REG_MR = 1, + MLX5_VDPA_TASK_SETUP_VIRTQ, }; /* Generic task information and size must be multiple of 4B. */ @@ -117,12 +118,12 @@ struct mlx5_vdpa_vmem_info { struct mlx5_vdpa_virtq { SLIST_ENTRY(mlx5_vdpa_virtq) next; - uint8_t enable; uint16_t index; uint16_t vq_size; uint8_t notifier_state; - bool stopped; uint32_t configured:1; + uint32_t enable:1; + uint32_t stopped:1; uint32_t version; pthread_mutex_t virtq_lock; struct mlx5_vdpa_priv *priv; @@ -565,11 +566,13 @@ bool mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv, uint32_t thrd_idx, enum mlx5_vdpa_task_type task_type, - uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt, + uint32_t *remaining_cnt, uint32_t *err_cnt, void **task_data, uint32_t num); int mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx); bool mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt, uint32_t *err_cnt, uint32_t sleep_time); +int +mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick); #endif /* RTE_PMD_MLX5_VDPA_H_ */ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c index 10391931ae..1389d369ae 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c @@ -100,6 +100,7 @@ mlx5_vdpa_c_thread_handle(void *arg) { struct mlx5_vdpa_conf_thread_mng *multhrd = arg; pthread_t thread_id = pthread_self(); + struct mlx5_vdpa_virtq *virtq; struct mlx5_vdpa_priv *priv; struct mlx5_vdpa_task task; struct rte_ring *rng; @@ -139,6 +140,19 @@ mlx5_vdpa_c_thread_handle(void *arg) __ATOMIC_RELAXED); } break; + case MLX5_VDPA_TASK_SETUP_VIRTQ: + virtq = &priv->virtqs[task.idx]; + pthread_mutex_lock(&virtq->virtq_lock); + ret = mlx5_vdpa_virtq_setup(priv, + task.idx, false); + if (ret) { + DRV_LOG(ERR, + "Failed to setup virtq %d.", task.idx); + __atomic_fetch_add( + task.err_cnt, 1, __ATOMIC_RELAXED); + } + pthread_mutex_unlock(&virtq->virtq_lock); + break; default: DRV_LOG(ERR, "Invalid vdpa task type %d.", task.type); diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c index b45fbac146..f782b6b832 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c @@ -371,7 +371,7 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused) goto unlock; if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC) goto unlock; - virtq->stopped = true; + virtq->stopped = 1; /* Query error info. 
*/ if (mlx5_vdpa_virtq_query(priv, vq_index)) goto log; diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c index 1f81fb8723..50d59a8394 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c @@ -111,8 +111,9 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv) for (i = 0; i < priv->caps.max_num_virtio_queues; i++) { struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i]; + if (virtq->index != i) + continue; pthread_mutex_lock(&virtq->virtq_lock); - virtq->configured = 0; for (j = 0; j < RTE_DIM(virtq->umems); ++j) { if (virtq->umems[j].obj) { claim_zero(mlx5_glue->devx_umem_dereg @@ -131,7 +132,6 @@ mlx5_vdpa_
[PATCH v4 12/15] vdpa/mlx5: add virtq LM log task
Split the virtqs LM log between the configuration threads. This accelerates the LM process and reduces its time by 20%. Signed-off-by: Li Zhang Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.h | 3 + drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 +++ drivers/vdpa/mlx5/mlx5_vdpa_lm.c | 85 +-- 3 files changed, 105 insertions(+), 17 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index 35221f5ddc..e08931719f 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -72,6 +72,8 @@ enum { MLX5_VDPA_NOTIFIER_STATE_ERR }; +#define MLX5_VDPA_USED_RING_LEN(size) \ + ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3) #define MLX5_VDPA_MAX_C_THRD 256 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096 #define MLX5_VDPA_TASKS_PER_DEV 64 @@ -81,6 +83,7 @@ enum { enum mlx5_vdpa_task_type { MLX5_VDPA_TASK_REG_MR = 1, MLX5_VDPA_TASK_SETUP_VIRTQ, + MLX5_VDPA_TASK_STOP_VIRTQ, }; /* Generic task information and size must be multiple of 4B. */ diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c index 1389d369ae..98369f0887 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c @@ -104,6 +104,7 @@ mlx5_vdpa_c_thread_handle(void *arg) struct mlx5_vdpa_priv *priv; struct mlx5_vdpa_task task; struct rte_ring *rng; + uint64_t features; uint32_t thrd_idx; uint32_t task_num; int ret; @@ -153,6 +154,39 @@ mlx5_vdpa_c_thread_handle(void *arg) } pthread_mutex_unlock(&virtq->virtq_lock); break; + case MLX5_VDPA_TASK_STOP_VIRTQ: + virtq = &priv->virtqs[task.idx]; + pthread_mutex_lock(&virtq->virtq_lock); + ret = mlx5_vdpa_virtq_stop(priv, + task.idx); + if (ret) { + DRV_LOG(ERR, + "Failed to stop virtq %d.", + task.idx); + __atomic_fetch_add( + task.err_cnt, 1, + __ATOMIC_RELAXED); + pthread_mutex_unlock(&virtq->virtq_lock); + break; + } + ret = rte_vhost_get_negotiated_features( + priv->vid, &features); + if (ret) { + DRV_LOG(ERR, + "Failed to get negotiated features virtq %d.", + task.idx); + __atomic_fetch_add( + task.err_cnt, 1, + __ATOMIC_RELAXED); + pthread_mutex_unlock(&virtq->virtq_lock); + break; + } + if (RTE_VHOST_NEED_LOG(features)) + rte_vhost_log_used_vring( + priv->vid, task.idx, 0, + MLX5_VDPA_USED_RING_LEN(virtq->vq_size)); + pthread_mutex_unlock(&virtq->virtq_lock); + break; default: DRV_LOG(ERR, "Invalid vdpa task type %d.", task.type); diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c index bfa5d4d571..0fa671fc7c 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c @@ -89,39 +89,90 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base, return -1; } -#define MLX5_VDPA_USED_RING_LEN(size) \ - ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3) - int mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv) { + uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0; + uint32_t i, thrd_idx, data[1]; struct mlx5_vdpa_virtq *virtq; uint64_t features; - int ret = rte_vhost_get_negotiated_features(priv->vid, &features); - int i; + int ret; + ret = rte_vhost_get_negotiated_features(priv->vid, &features); if (ret) { DRV_LOG(ERR, "Failed to get negotiated features."); return -1; } - if (!RTE_VHOST_NEED_LOG(features)) - return 0; - for (i = 0; i < priv->nr_virtqs; ++i) { - virtq = &priv->virtqs[i]; - if (!priv->virtqs[i].virtq) { - DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i); - } else { + if (priv->use_c_thread && priv->nr_virtqs) { + uint32_t 
main_task_idx[priv->nr_virtqs]; + +
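For context on the MLX5_VDPA_USED_RING_LEN macro this patch moves into mlx5_vdpa.h: the logged region is the whole split-virtqueue used ring. A minimal sketch of the arithmetic, assuming the standard virtio split-ring layout (the struct below is an illustrative stand-in, not the driver's definition):

#include <stdint.h>
#include <stddef.h>

/* Split-ring used ring layout:
 *   uint16_t flags;                      2 bytes
 *   uint16_t idx;                        2 bytes
 *   struct vring_used_elem ring[size];   8 bytes per descriptor
 *   uint16_t avail_event;                2 bytes (event suppression)
 */
struct vring_used_elem_sketch { uint32_t id; uint32_t len; };

static inline size_t
used_ring_len(uint16_t size)
{
        return (size_t)size * sizeof(struct vring_used_elem_sketch) +
               sizeof(uint16_t) * 3; /* flags + idx + avail_event */
}

Logging this whole area once the virtq is stopped marks every byte the device may have dirtied, which is why the stop task above ends with rte_vhost_log_used_vring() over exactly this length.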
[PATCH v4 13/15] vdpa/mlx5: add device close task
Split the virtqs device close tasks after stopping virt-queue between the configuration threads. This accelerates the LM process and reduces its time by 50%. Signed-off-by: Li Zhang Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.c | 56 +-- drivers/vdpa/mlx5/mlx5_vdpa.h | 8 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 +- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 14 +++ 4 files changed, 94 insertions(+), 4 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index e3b32fa087..d000854c08 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -245,7 +245,7 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv) return kern_mtu == vhost_mtu ? 0 : -1; } -static void +void mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv) { /* Clean pre-created resource in dev removal only. */ @@ -254,6 +254,26 @@ mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv) mlx5_vdpa_mem_dereg(priv); } +static bool +mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv) +{ + uint32_t timeout = 0; + + /* Check and wait all close tasks done. */ + while (__atomic_load_n(&priv->dev_close_progress, + __ATOMIC_RELAXED) != 0 && timeout < 1000) { + rte_delay_us_sleep(1); + timeout++; + } + if (priv->dev_close_progress) { + DRV_LOG(ERR, + "Failed to wait close device tasks done vid %d.", + priv->vid); + return true; + } + return false; +} + static int mlx5_vdpa_dev_close(int vid) { @@ -271,6 +291,27 @@ mlx5_vdpa_dev_close(int vid) ret |= mlx5_vdpa_lm_log(priv); priv->state = MLX5_VDPA_STATE_IN_PROGRESS; } + if (priv->use_c_thread) { + if (priv->last_c_thrd_idx >= + (conf_thread_mng.max_thrds - 1)) + priv->last_c_thrd_idx = 0; + else + priv->last_c_thrd_idx++; + __atomic_store_n(&priv->dev_close_progress, + 1, __ATOMIC_RELAXED); + if (mlx5_vdpa_task_add(priv, + priv->last_c_thrd_idx, + MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT, + NULL, NULL, NULL, 1)) { + DRV_LOG(ERR, + "Fail to add dev close task. 
"); + goto single_thrd; + } + priv->state = MLX5_VDPA_STATE_PROBED; + DRV_LOG(INFO, "vDPA device %d was closed.", vid); + return ret; + } +single_thrd: pthread_mutex_lock(&priv->steer_update_lock); mlx5_vdpa_steer_unset(priv); pthread_mutex_unlock(&priv->steer_update_lock); @@ -278,10 +319,12 @@ mlx5_vdpa_dev_close(int vid) mlx5_vdpa_drain_cq(priv); if (priv->lm_mr.addr) mlx5_os_wrapped_mkey_destroy(&priv->lm_mr); - priv->state = MLX5_VDPA_STATE_PROBED; if (!priv->connected) mlx5_vdpa_dev_cache_clean(priv); priv->vid = 0; + __atomic_store_n(&priv->dev_close_progress, 0, + __ATOMIC_RELAXED); + priv->state = MLX5_VDPA_STATE_PROBED; DRV_LOG(INFO, "vDPA device %d was closed.", vid); return ret; } @@ -302,6 +345,8 @@ mlx5_vdpa_dev_config(int vid) DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid); return -1; } + if (mlx5_vdpa_wait_dev_close_tasks_done(priv)) + return -1; priv->vid = vid; priv->connected = true; if (mlx5_vdpa_mtu_set(priv)) @@ -444,8 +489,11 @@ mlx5_vdpa_dev_cleanup(int vid) DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name); return -1; } - if (priv->state == MLX5_VDPA_STATE_PROBED) + if (priv->state == MLX5_VDPA_STATE_PROBED) { + if (priv->use_c_thread) + mlx5_vdpa_wait_dev_close_tasks_done(priv); mlx5_vdpa_dev_cache_clean(priv); + } priv->connected = false; return 0; } @@ -839,6 +887,8 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv) { if (priv->state == MLX5_VDPA_STATE_CONFIGURED) mlx5_vdpa_dev_close(priv->vid); + if (priv->use_c_thread) + mlx5_vdpa_wait_dev_close_tasks_done(priv); mlx5_vdpa_release_dev_resources(priv); if (priv->vdev) rte_vdpa_unregister_device(priv->vdev); diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index e08931719f..b6392b9d66 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -84,6 +84,7 @@ enum mlx5_vdpa_task_type { MLX5_VDPA_TASK_REG_MR = 1, MLX5_VDPA_TASK_SETUP_VIRTQ, MLX5_VDPA_TASK_STOP_VIRTQ, + MLX5_VDPA_TASK_
[PATCH v4 14/15] vdpa/mlx5: add virtq sub-resources creation
Pre-create the virt-queue sub-resources in the device probe stage and then modify the virtqueue in the device config stage. The steering table also needs to support dummy virt-queues. This accelerates the LM process and reduces its time by 40%. Signed-off-by: Li Zhang Signed-off-by: Yajun Wu Acked-by: Matan Azrad --- drivers/vdpa/mlx5/mlx5_vdpa.c | 72 +++-- drivers/vdpa/mlx5/mlx5_vdpa.h | 17 +++-- drivers/vdpa/mlx5/mlx5_vdpa_event.c | 11 ++-- drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 17 +++-- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 + 5 files changed, 123 insertions(+), 93 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index d000854c08..f006a9cd3f 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -627,65 +627,39 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist, static int mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv) { - struct mlx5_vdpa_virtq *virtq; + uint32_t max_queues; uint32_t index; - uint32_t i; + struct mlx5_vdpa_virtq *virtq; - for (index = 0; index < priv->caps.max_num_virtio_queues * 2; + for (index = 0; index < priv->caps.max_num_virtio_queues; index++) { virtq = &priv->virtqs[index]; pthread_mutex_init(&virtq->virtq_lock, NULL); } - if (!priv->queues) + if (!priv->queues || !priv->queue_size) return 0; - for (index = 0; index < (priv->queues * 2); ++index) { + max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ? + (priv->queues * 2) : (priv->caps.max_num_virtio_queues); + for (index = 0; index < max_queues; ++index) + if (mlx5_vdpa_virtq_single_resource_prepare(priv, + index)) + goto error; + if (mlx5_vdpa_is_modify_virtq_supported(priv)) + if (mlx5_vdpa_steer_update(priv, true)) + goto error; + return 0; +error: + for (index = 0; index < max_queues; ++index) { virtq = &priv->virtqs[index]; - int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size, - -1, virtq); - - if (ret) { - DRV_LOG(ERR, "Failed to create event QPs for virtq %d.", - index); - return -1; - } - if (priv->caps.queue_counters_valid) { - if (!virtq->counters) - virtq->counters = - mlx5_devx_cmd_create_virtio_q_counters - (priv->cdev->ctx); - if (!virtq->counters) { - DRV_LOG(ERR, "Failed to create virtq couners for virtq" - " %d.", index); - return -1; - } - } - for (i = 0; i < RTE_DIM(virtq->umems); ++i) { - uint32_t size; - void *buf; - struct mlx5dv_devx_umem *obj; - - size = priv->caps.umems[i].a * priv->queue_size + - priv->caps.umems[i].b; - buf = rte_zmalloc(__func__, size, 4096); - if (buf == NULL) { - DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq" - " %u.", i, index); - return -1; - } - obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf, - size, IBV_ACCESS_LOCAL_WRITE); - if (obj == NULL) { - rte_free(buf); - DRV_LOG(ERR, "Failed to register umem %d for virtq %u.", - i, index); - return -1; - } - virtq->umems[i].size = size; - virtq->umems[i].buf = buf; - virtq->umems[i].obj = obj; + if (virtq->virtq) { + pthread_mutex_lock(&virtq->virtq_lock); + mlx5_vdpa_virtq_unset(virtq); + pthread_mutex_unlock(&virtq->virtq_lock); } } - return 0; + if (mlx5_vdpa_is_modify_virtq_supported(priv)) + mlx5_vdpa_steer_unset(priv); + return -1; } static int diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h index b6392b9d66..f353db62ac 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h @@ -277,13 +277,15 @@ int mlx5_vdpa_mem_register(struct ml
[PATCH v4 15/15] vdpa/mlx5: prepare virtqueue resource creation
Split the virtq resource creation between the configuration threads. The virt-queue resources also need to be pre-created again after virtq destruction. This accelerates the LM process and reduces its time by 30%. Signed-off-by: Li Zhang Acked-by: Matan Azrad --- doc/guides/rel_notes/release_22_07.rst | 1 + drivers/vdpa/mlx5/mlx5_vdpa.c | 115 +++-- drivers/vdpa/mlx5/mlx5_vdpa.h | 12 ++- drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 15 +++- drivers/vdpa/mlx5/mlx5_vdpa_virtq.c| 111 5 files changed, 209 insertions(+), 45 deletions(-) diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst index 2056cd9ee7..e1a9796e5c 100644 --- a/doc/guides/rel_notes/release_22_07.rst +++ b/doc/guides/rel_notes/release_22_07.rst @@ -178,6 +178,7 @@ New Features * **Updated Nvidia mlx5 vDPA driver.** * Added new devargs ``queue_size`` and ``queues`` to allow prior creation of virtq resources. + * Added new devarg ``max_conf_threads`` to define the number of management threads used to parallelize device configuration. Removed Items diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index f006a9cd3f..c5d82872c7 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -275,23 +275,18 @@ mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv) } static int -mlx5_vdpa_dev_close(int vid) +_internal_mlx5_vdpa_dev_close(struct mlx5_vdpa_priv *priv, + bool release_resource) { - struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid); - struct mlx5_vdpa_priv *priv = - mlx5_vdpa_find_priv_resource_by_vdev(vdev); int ret = 0; + int vid = priv->vid; - if (priv == NULL) { - DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name); - return -1; - } mlx5_vdpa_cqe_event_unset(priv); if (priv->state == MLX5_VDPA_STATE_CONFIGURED) { ret |= mlx5_vdpa_lm_log(priv); priv->state = MLX5_VDPA_STATE_IN_PROGRESS; } - if (priv->use_c_thread) { + if (priv->use_c_thread && !release_resource) { if (priv->last_c_thrd_idx >= (conf_thread_mng.max_thrds - 1)) priv->last_c_thrd_idx = 0; @@ -315,7 +310,7 @@ mlx5_vdpa_dev_close(int vid) pthread_mutex_lock(&priv->steer_update_lock); mlx5_vdpa_steer_unset(priv); pthread_mutex_unlock(&priv->steer_update_lock); - mlx5_vdpa_virtqs_release(priv); + mlx5_vdpa_virtqs_release(priv, release_resource); mlx5_vdpa_drain_cq(priv); if (priv->lm_mr.addr) mlx5_os_wrapped_mkey_destroy(&priv->lm_mr); @@ -329,6 +324,24 @@ mlx5_vdpa_dev_close(int vid) return ret; } +static int +mlx5_vdpa_dev_close(int vid) +{ + struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid); + struct mlx5_vdpa_priv *priv; + + if (!vdev) { + DRV_LOG(ERR, "Invalid vDPA device."); + return -1; + } + priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev); + if (priv == NULL) { + DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name); + return -1; + } + return _internal_mlx5_vdpa_dev_close(priv, false); +} + static int mlx5_vdpa_dev_config(int vid) { @@ -624,11 +637,33 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist, priv->queue_size); } +void +mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv) +{ + uint32_t max_queues, index; + struct mlx5_vdpa_virtq *virtq; + + if (!priv->queues || !priv->queue_size) + return; + max_queues = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+ (priv->queues * 2) : (priv->caps.max_num_virtio_queues); + if (mlx5_vdpa_is_modify_virtq_supported(priv)) + mlx5_vdpa_steer_unset(priv); + for (index = 0; index < max_queues; ++index) { + virtq = &priv->virtqs[index]; + if (virtq->virtq) { + pthread_mutex_lock(&virtq->virtq_lock); + mlx5_vdpa_virtq_unset(virtq); + pthread_mutex_unlock(&virtq->virtq_lock); + } + } +} + static int mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv) { - uint32_t max_queues; - uint32_t index; + uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0; + uint32_t max_queues, index, thrd_idx, data[1]; struct mlx5_vdpa_virtq *virtq; for (index = 0; index < priv->caps.max_num_virtio_queues; @@ -640,25 +675,53 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv) return 0; max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ? (priv->queues * 2) : (priv->caps.max_num_virtio_queues)
RE: [PATCH v2] vdpa/ifc: fix null pointer dereference
Hi Maxime, Thanks for your effort. > -Original Message- > From: Maxime Coquelin > Sent: Friday, June 17, 2022 10:08 PM > To: Pei, Andy ; dev@dpdk.org > Cc: Xia, Chenbo ; Wang, Xiao W > ; Xu, Rosen ; Xiao, QimaiX > > Subject: Re: [PATCH v2] vdpa/ifc: fix null pointer dereference > > > > On 6/15/22 08:23, Andy Pei wrote: > > Fix null pointer dereference reported in coverity scan. > > > > Coverity issue: 378882 > > Fixes: 5d75517beffe ("vdpa/ifc/base: access correct register for blk > > device") > > > > Signed-off-by: Andy Pei > > Acked-by: Xiao Wang > > --- > > drivers/vdpa/ifc/base/ifcvf.c | 9 + > > 1 file changed, 9 insertions(+) > > > > Applied to dpdk-next-virtio/main. > > Thanks, > Maxime
RE: [PATCH v2 1/5] telemetry: escape special char when tel string
+CC: Ciara Power, Telemetry library maintainer > From: fengchengwen [mailto:fengcheng...@huawei.com] > Sent: Saturday, 18 June 2022 05.52 > > On 2022/6/18 1:05, Stephen Hemminger wrote: > > On Fri, 17 Jun 2022 12:25:04 +0100 > > Bruce Richardson wrote: > > > >> On Fri, Jun 17, 2022 at 01:16:08PM +0200, Morten Brørup wrote: > From: Chengwen Feng [mailto:fengcheng...@huawei.com] > Sent: Friday, 17 June 2022 11.46 > > This patch supports escape special characters (including: > \",\\,/,\b, > /f,/n,/r,/t) when telemetry string. > This patch is used to support telemetry xxx-dump commands which > the > string may include special characters. > > Signed-off-by: Chengwen Feng > --- > lib/telemetry/telemetry.c | 96 > +-- > 1 file changed, 93 insertions(+), 3 deletions(-) > > diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c > index c6fd03a5ab..0f762f633e 100644 > --- a/lib/telemetry/telemetry.c > +++ b/lib/telemetry/telemetry.c > @@ -215,6 +215,94 @@ container_to_json(const struct rte_tel_data > *d, > char *out_buf, size_t buf_len) > return used; > } > > +static bool > +json_is_special_char(char ch) > +{ > +static unsigned char is_spec[256] = { 0 }; > +static bool init_once; > + > +if (!init_once) { > +is_spec['\"'] = 1; > +is_spec['\\'] = 1; > +is_spec['/'] = 1; > +is_spec['\b'] = 1; > +is_spec['\f'] = 1; > +is_spec['\n'] = 1; > +is_spec['\r'] = 1; > +is_spec['\t'] = 1; > +init_once = true; > +} > + > +return (bool)is_spec[(unsigned char)ch]; > +} > >> > >> According to the json spec at [1], the characters that need to be > escaped > >> are: > >> a) any characters <0x20 > >> b) inverted commas/quote character \" > >> c) the "reverse solidus character", better known to you and I as the > >> back-slash. > >> > >> Therefore, I think this table generation could be simplified, but > also > >> expanded using this. For completeness we should also see about > handling all > >> control characters if they are encountered. > >> > >> [1] https://www.rfc-editor.org/rfc/rfc8259.txt > >> > >> /Bruce > > > > Since it is trivial could be initializer? > > > > static const uint8_t is_spec[256] = { > >[0 ... 0x20] = 1, > >['\"' ] = 1, > >['\\' ] = 1, > >['/'] = 1, > > > > etc > > > > Or we could change the telemetry API to disallow control characters? > > I was thinking about converting 0~0x20, but I don't think there's a > scenario. > > I prefer change the telemetry API to disallow control characters, and > this may not > be a violation of the ABI, if yes, the dpdk-telemetry.py will returns > an error. I agree with Chengwen Feng. The telemetry data type is STRING, not BLOB. So we need to define exactly what the STRING type contains. I hope we can all agree that control characters should be disallowed. The more complicated question is: Do we want to use the ASCII character set only, or do we want to use UTF-8 encoded Unicode? Personally, think UTF-8 encoded Unicode is more future proof, and would vote for that. But I wouldn't reject limiting it to ASCII, and perhaps in the future introduce another data type for UTF-8 strings. UTF-8 is the modern choice, but it is incompatible with old stuff, e.g. many SNMP MIBs. > > So I think we could add declaring and checking functions to make sure > telemetry string > do not allow control characters. Input validation (when storing a string in the telemetry database) has a performance cost, so it could be a compile time debug option, like the memory cookies and mbuf integrity checks. Just a thought.
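For reference, a minimal sketch of the table-based check combining the two suggestions above: per RFC 8259, everything below 0x20 plus the quote and the backslash must be escaped ('/' may be left alone), and a const designated initializer avoids the unsynchronized init_once pattern in the patch. The range syntax is a GCC/Clang extension; names here are illustrative, not the final patch:

#include <stdbool.h>
#include <stdint.h>

/* 0x00..0x1f are control characters; 0x20 (space) is a legal JSON
 * string character, so the range stops at 0x1f. */
static const uint8_t json_special[256] = {
        [0x00 ... 0x1f] = 1,
        ['"'] = 1,
        ['\\'] = 1,
};

static inline bool
json_is_special_char(char ch)
{
        return json_special[(unsigned char)ch] != 0;
}

Being const, the table lives in read-only data and needs no first-call initialization, which also makes it safe to call from multiple threads.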
Re: Re: [PATCH v7] ip_frag: add IPv4 options fragment and test data
Hi Stephen, Thank you very much for your reply! >I would just replace all of the rte_memcpy with memcpy I will replace all of the rte_memcpy with memcpy. >I expect that rte_memcpy() is able to do better than memcpy() for larger >copies because it is >likely to use bigger vector instructions and check for alignment. >For small copies just doing the mov's directly is going to be as fast or >faster. >In fact, lots of places in DPDK should >replace rte_memcpy() with simple structure assignment to preserve type safety. I don't know the dividing line (the copy size) between rte_memcpy and memcpy. We ran a simple test copying 1500 bytes; memcpy seems to be faster, but maybe our test is not accurate enough. >This is somewhat historical data, it might be wrong. It would be worthwhile to >have benchmarks >across different sizes (variable and fixed), different compilers, and >different CPU's. >There might be surprising results. So I hope this work can continue and produce more authoritative guidance on rte_memcpy. Thanks! Huichao Cai
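A small sketch of the structure-assignment point above (the struct is a made-up stand-in, not a DPDK type): for fixed-size typed data, plain assignment lets the compiler pick the copy sequence and reject mismatched pointer types, while variable-length buffers remain a job for memcpy():

#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct mac_addr_sketch { uint8_t bytes[6]; }; /* illustrative type */

static void
copy_examples(struct mac_addr_sketch *dst, const struct mac_addr_sketch *src,
              void *buf_dst, const void *buf_src, size_t len)
{
        /* Fixed-size, typed copy: passing the wrong pointer type here
         * is a compile error, unlike with a memcpy() of raw bytes. */
        *dst = *src;

        /* Variable-size raw copy: compilers inline and specialize
         * memcpy() for small constant sizes anyway. */
        memcpy(buf_dst, buf_src, len);
}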
Re: [PATCH] eal: fixes the bug where rte_malloc() fails to allocate memory
Hi Fidaullah, Thanks for the fix, Acked-by: Dmitry Kozlyuk Anatoly, I noticed a couple of other things while testing this. 1. Consider: elt_size = pg_sz - MALLOC_ELEM_OVERHEAD rte_malloc(align=0) which is converted to align = 1. Obviously, such an element fits into one page, however: alloc_sz = RTE_ALIGN_CEIL(1 + pg_sz + (MALLOC_ELEM_OVERHEAD - MALLOC_ELEM_OVERHEAD), pg_sz) == 2 * pg_sz. This can unnecessarily hit an allocation limit from the system or EAL. I suggest, in both places: alloc_sz = RTE_ALIGN_CEIL(RTE_ALIGN_CEIL(elt_size, align) + MALLOC_ELEM_OVERHEAD, pg_sz); This would be symmetric with malloc_elem_can_hold(). 2. Alignment calculation depends on whether we allocated new pages or not: malloc_heap_alloc_on_heap_id(align = 0) -> heap_alloc(align = 1) -> find_suitable_element(align = RTE_CACHE_LINE_ROUNDUP(align)) malloc_heap_alloc_on_heap_id(align = 0) -> alloc_more_mem_on_socket(align = 1) -> try_expand_heap() -> ... -> alloc_pages_on_heap(align = 1) -> find_suitable_element(align = 1) Why do we call find_suitable_element() directly and not just return and repeat the heap_alloc() attempt?
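To make point 1 concrete, a worked example taking the expansion above at face value, assuming a 4 KiB page and an illustrative MALLOC_ELEM_OVERHEAD of 128 bytes (the real value is build-dependent), with a local ALIGN_CEIL standing in for RTE_ALIGN_CEIL: the element fits in one page, yet the current rounding requests two.

#include <stdint.h>
#include <stdio.h>

#define PG_SZ          4096u
#define ELEM_OVERHEAD   128u /* illustrative value only */
#define ALIGN_CEIL(v, a) ((((v) + (a) - 1) / (a)) * (a))

int
main(void)
{
        uint32_t align = 1;                        /* rte_malloc(align = 0) */
        uint32_t elt_size = PG_SZ - ELEM_OVERHEAD; /* fits one page exactly */

        /* Current code: align + elt_size + overhead = pg_sz + 1,
         * which rounds up to two pages. */
        uint32_t cur = ALIGN_CEIL(align + elt_size + ELEM_OVERHEAD, PG_SZ);

        /* Suggested: align the element first, then add the overhead;
         * the sum is exactly one page. */
        uint32_t fix = ALIGN_CEIL(ALIGN_CEIL(elt_size, align) + ELEM_OVERHEAD,
                                  PG_SZ);

        printf("current: %u pages, suggested: %u page\n",
               cur / PG_SZ, fix / PG_SZ); /* prints 2 and 1 */
        return 0;
}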
Re: [PATCH v2 2/6] eal: add thread lifetime management
2022-06-14 16:47 (UTC-0700), Tyler Retzlaff: > On Windows, the function executed by a thread when the thread starts is > represeneted by a function pointer of type DWORD (*func) (void*). > On other platforms, the function pointer is a void* (*func) (void*). > > Performing a cast between these two types of function pointers to > uniformize the API on all platforms may result in undefined behavior. > TO fix this issue, a wrapper that respects the signature required by > CreateThread() has been created on Windows. The interface issue is still there: `rte_thread_func` allows the thread routine to have a pointer-sized result. `rte_thread_join()` allows to obtain this value as `unsigned long`, which is pointer-sized on 32-bit platforms and less than that on 64-bit platforms. This can lead to issues when developers assume they can return a pointer and this works on 32 bits, but doesn't work on 64 bits. If you want to keep API as close to phtread as possible, the limitation must be at least clearly documented in Doxygen (`rte_thread_func` is undocumented BTW). I also suggest using `uint32_t` instead of `unsigned long` precisely to avoid "is it pointer-sized?" doubts. (I don't see much benefit in keeping pthread-like signature. When moving from pthread to rte_thread, it is as trivial to change the thread routine return type.) > +int > +rte_thread_create(rte_thread_t *thread_id, > + const rte_thread_attr_t *thread_attr, > + rte_thread_func thread_func, void *args) > +{ > [...] > + if (thread_attr->priority == > + RTE_THREAD_PRIORITY_REALTIME_CRITICAL) { > + ret = ENOTSUP; > + goto cleanup; > + } > + ret = thread_map_priority_to_os_value(thread_attr->priority, > + ¶m.sched_priority, &policy); > + if (ret != 0) > + goto cleanup; thread_map_priority_to_os_value() already checks for unsupported values, why not let it do this particular check? > +int > +rte_thread_join(rte_thread_t thread_id, unsigned long *value_ptr) > +{ > + int ret = 0; > + void *res = NULL; > + void **pres = NULL; > + > + if (value_ptr != NULL) > + pres = &res; > + > + ret = pthread_join((pthread_t)thread_id.opaque_id, pres); > + if (ret != 0) { > + RTE_LOG(DEBUG, EAL, "pthread_join failed\n"); > + return ret; > + } > + > + if (value_ptr != NULL && *pres != NULL) > + *value_ptr = *(unsigned long *)(*pres); > + > + return 0; > +} What makes *pres == NULL special? > +static DWORD > +thread_func_wrapper(void *args) > +{ > + struct thread_routine_ctx *pctx = args; > + struct thread_routine_ctx ctx; > + > + ctx.thread_func = pctx->thread_func; > + ctx.routine_args = pctx->routine_args; ctx = *pctx? > + > + free(pctx); > + > + return (DWORD)(uintptr_t)ctx.thread_func(ctx.routine_args); > +} > + > +int > +rte_thread_create(rte_thread_t *thread_id, > + const rte_thread_attr_t *thread_attr, > + rte_thread_func thread_func, void *args) > +{ > + int ret = 0; > + DWORD tid; > + HANDLE thread_handle = NULL; > + GROUP_AFFINITY thread_affinity; > + struct thread_routine_ctx *ctx = NULL; > + > + ctx = calloc(1, sizeof(*ctx)); > + if (ctx == NULL) { > + RTE_LOG(DEBUG, EAL, "Insufficient memory for thread context > allocations\n"); > + ret = ENOMEM; > + goto cleanup; > + } > + ctx->routine_args = args; > + ctx->thread_func = thread_func; > + > + thread_handle = CreateThread(NULL, 0, thread_func_wrapper, ctx, > + CREATE_SUSPENDED, &tid); > + if (thread_handle == NULL) { > + ret = thread_log_last_error("CreateThread()"); > + free(ctx); > + goto cleanup; Missing `free(ctx)` from other error paths below. 
> + } > + thread_id->opaque_id = tid; > + > + if (thread_attr != NULL) { > + if (CPU_COUNT(&thread_attr->cpuset) > 0) { > + ret = convert_cpuset_to_affinity( > + &thread_attr->cpuset, > + &thread_affinity > + ); > + if (ret != 0) { > + RTE_LOG(DEBUG, EAL, "Unable to convert cpuset > to thread affinity\n"); > + goto cleanup; > + } > + > + if (!SetThreadGroupAffinity(thread_handle, > + &thread_affinity, NULL)) { > + ret = > thread_log_last_error("SetThreadGroupAffinity()"); > + goto cleanup; > + } > + } > + ret = rte_thread_set_priority(*thread_id, > +
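Taken together, the comments above suggest a wrapper along these lines; a Windows-only sketch using the patch's names (not the final code), with the context copied in one struct assignment and freed before the routine runs:

#include <windows.h>
#include <stdint.h>
#include <stdlib.h>

typedef void *(*rte_thread_func)(void *);

struct thread_routine_ctx {
        rte_thread_func thread_func;
        void *routine_args;
};

static DWORD WINAPI
thread_func_wrapper(LPVOID args)
{
        /* One struct copy instead of member-by-member assignment. */
        struct thread_routine_ctx ctx = *(struct thread_routine_ctx *)args;

        free(args); /* the copy is complete, release before running */
        return (DWORD)(uintptr_t)ctx.thread_func(ctx.routine_args);
}

The creating function then owns exactly one other free(ctx): on any error path taken after CreateThread() fails, or after it succeeds in suspended state but a later step (affinity, priority) fails and the thread never runs the wrapper.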
[PATCH] ip_frag: replace rte_memcpy with memcpy
To resolve the compilation warning, replace rte_memcpy with memcpy. Modify the files test_ipfrag.c and rte_ipv4_fragmentation.c. Signed-off-by: Huichao Cai --- app/test/test_ipfrag.c | 13 ++--- lib/ip_frag/rte_ipv4_fragmentation.c | 7 +++ 2 files changed, 9 insertions(+), 11 deletions(-) diff --git a/app/test/test_ipfrag.c b/app/test/test_ipfrag.c index dc62b0e..ba0ffd0 100644 --- a/app/test/test_ipfrag.c +++ b/app/test/test_ipfrag.c @@ -23,7 +23,6 @@ #include #include -#include <rte_memcpy.h> #include #define NUM_MBUFS 128 @@ -147,13 +146,13 @@ static void ut_teardown(void) if (opt_copied) { expected_opt->len = sizeof(expected_first_frag_ipv4_opts_copied); - rte_memcpy(expected_opt->data, + memcpy(expected_opt->data, expected_first_frag_ipv4_opts_copied, sizeof(expected_first_frag_ipv4_opts_copied)); } else { expected_opt->len = sizeof(expected_first_frag_ipv4_opts_nocopied); - rte_memcpy(expected_opt->data, + memcpy(expected_opt->data, expected_first_frag_ipv4_opts_nocopied, sizeof(expected_first_frag_ipv4_opts_nocopied)); } @@ -161,13 +160,13 @@ static void ut_teardown(void) if (opt_copied) { expected_opt->len = sizeof(expected_sub_frag_ipv4_opts_copied); - rte_memcpy(expected_opt->data, + memcpy(expected_opt->data, expected_sub_frag_ipv4_opts_copied, sizeof(expected_sub_frag_ipv4_opts_copied)); } else { expected_opt->len = sizeof(expected_sub_frag_ipv4_opts_nocopied); - rte_memcpy(expected_opt->data, + memcpy(expected_opt->data, expected_sub_frag_ipv4_opts_nocopied, sizeof(expected_sub_frag_ipv4_opts_nocopied)); } @@ -227,7 +226,7 @@ static void ut_teardown(void) hdr->src_addr = rte_cpu_to_be_32(0x8080808); hdr->dst_addr = rte_cpu_to_be_32(0x8080404); - rte_memcpy(hdr + 1, opt.data, opt.len); + memcpy(hdr + 1, opt.data, opt.len); } static void @@ -312,7 +311,7 @@ static void ut_teardown(void) char *iph_opt = rte_pktmbuf_mtod_offset(mb[i], char *, sizeof(struct rte_ipv4_hdr)); opt->len = opt_len; - rte_memcpy(opt->data, iph_opt, opt_len); + memcpy(opt->data, iph_opt, opt_len); } else { opt->len = RTE_IPV4_HDR_OPT_MAX_LEN; memset(opt->data, RTE_IPV4_HDR_OPT_EOL, diff --git a/lib/ip_frag/rte_ipv4_fragmentation.c b/lib/ip_frag/rte_ipv4_fragmentation.c index a19f6fd..27a8ad2 100644 --- a/lib/ip_frag/rte_ipv4_fragmentation.c +++ b/lib/ip_frag/rte_ipv4_fragmentation.c @@ -5,7 +5,6 @@ #include #include -#include <rte_memcpy.h> #include #include "ip_frag_common.h" @@ -26,7 +25,7 @@ static inline void __fill_ipv4hdr_frag(struct rte_ipv4_hdr *dst, const struct rte_ipv4_hdr *src, uint16_t header_len, uint16_t len, uint16_t fofs, uint16_t dofs, uint32_t mf) { - rte_memcpy(dst, src, header_len); + memcpy(dst, src, header_len); fofs = (uint16_t)(fofs + (dofs >> RTE_IPV4_HDR_FO_SHIFT)); fofs = (uint16_t)(fofs | mf << RTE_IPV4_HDR_MF_SHIFT); dst->fragment_offset = rte_cpu_to_be_16(fofs); @@ -48,7 +47,7 @@ static inline uint16_t __create_ipopt_frag_hdr(uint8_t *iph, struct rte_ipv4_hdr *iph_opt = (struct rte_ipv4_hdr *)ipopt_frag_hdr; ipopt_len = 0; - rte_memcpy(ipopt_frag_hdr, iph, sizeof(struct rte_ipv4_hdr)); + memcpy(ipopt_frag_hdr, iph, sizeof(struct rte_ipv4_hdr)); ipopt_frag_hdr += sizeof(struct rte_ipv4_hdr); uint8_t *p_opt = iph + sizeof(struct rte_ipv4_hdr); @@ -65,7 +64,7 @@ static inline uint16_t __create_ipopt_frag_hdr(uint8_t *iph, break; if (RTE_IPV4_HDR_OPT_COPIED(*p_opt)) { - rte_memcpy(ipopt_frag_hdr + ipopt_len, + memcpy(ipopt_frag_hdr + ipopt_len, p_opt, p_opt[1]); ipopt_len += p_opt[1]; } -- 1.8.3.1
Re: dpdk address sanitizer
Thank you for your feedback, i will look into that .. any suggestions on what technique I can use to find memory leaks, invalid accesses, etc etc ? thanks!!! From: Stephen Hemminger Sent: Friday, June 17, 2022 4:17 PM To: David Marchand Cc: Juan Pablo L. ; us...@dpdk.org ; Burakov, Anatoly ; Dmitry Kozlyuk ; dev ; c...@dpdk.org Subject: Re: dpdk address sanitizer On Fri, 17 Jun 2022 07:45:33 +0200 David Marchand wrote: > Hello, > > On Fri, Jun 17, 2022 at 6:08 AM Juan Pablo L. > wrote: > > > > Hello, I am new to dpdk ... i would like to trace memory usage and detect > > memory leaks, valgrind as well as address sanitizer (gcc) report some > > memory loss at application end. For the life of me, i cannot figure it out > > ... i just write a simple program that has the rte_eal_init + > > rte_eal_cleanup and i get the following error (also tried helloworld from > > examples, with same results): > > > > ==3399==ERROR: AddressSanitizer: stack-buffer-overflow on address > > 0x7f6ca3efb480 at pc 0x7f6ca7162b61 bp 0x7f6ca3efb450 sp 0x7f6ca3efac00 > > WRITE of size 24 at 0x7f6ca3efb480 thread T-1 > > #0 0x7f6ca7162b60 in __interceptor_sigaltstack.part.0 > > (/lib64/libasan.so.8+0x61b60) > > #1 0x7f6ca71d9337 in __sanitizer::UnsetAlternateSignalStack() > > (/lib64/libasan.so.8+0xd8337) > > #2 0x7f6ca71c90f4 in __asan::AsanThread::Destroy() > > (/lib64/libasan.so.8+0xc80f4) > > #3 0x7f6ca679b000 in __GI___nptl_deallocate_tsd (/lib64/libc.so.6+0x8a000) > > #4 0x7f6ca679dc9d in start_thread (/lib64/libc.so.6+0x8cc9d) > > #5 0x7f6ca68235df in __GI___clone3 (/lib64/libc.so.6+0x1125df) > > > > Address 0x7f6ca3efb480 is a wild pointer inside of access range of size > > 0x0018. > > SUMMARY: AddressSanitizer: stack-buffer-overflow > > (/lib64/libasan.so.8+0x61b60) in __interceptor_sigaltstack.part.0 > > Shadow bytes around the buggy address: > > 0x0fee147d7640: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00 > > 0x0fee147d7650: 00 00 00 00 00 00 00 00 00 06 f2 f2 f2 f2 00 00 > > 0x0fee147d7660: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 0x0fee147d7670: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 0x0fee147d7680: 00 00 00 00 00 00 00 00 00 00 00 04 f3 f3 f3 f3 > > =>0x0fee147d7690:[f3]f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 > > 0x0fee147d76a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 0x0fee147d76b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 0x0fee147d76c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 0x0fee147d76d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 0x0fee147d76e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > Shadow byte legend (one shadow byte represents 8 application bytes): > > Addressable: 00 > > Partially addressable: 01 02 03 04 05 06 07 > > Heap left redzone: fa > > Freed heap region: fd > > Stack left redzone: f1 > > Stack mid redzone: f2 > > Stack right redzone: f3 > > Stack after return: f5 > > Stack use after scope: f8 > > Global redzone: f9 > > Global init order: f6 > > Poisoned by user: f7 > > Container overflow: fc > > Array cookie: ac > > Intra object redzone: bb > > ASan internal: fe > > Left alloca redzone: ca > > Right alloca redzone: cb > > ==3399==ABORTING > > > > I am not sure what I m doing wrong but it is very frustrating. On top of > > that, I try other scenarios and see if I can just "ignore" that and still > > detect other memory leaks but it does not work. 
I get memory from > > rte_malloc and don't free it and I still get the above report only, I do > > not get any report from the memory I leaked intentionally ... no difference > > what so ever I tried the same with the helloworld example and I get > > the same results > > I experienced the same issue recently on Fedora 36. > I did not investigate. > > I think I waived this warning, by setting > ASAN_OPTIONS="use_sigaltstack=0" in the environment. > HTH. > Looks like the alternate signal stack allocated inside address sanitizer is not big enough.
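On the technique question: note that rte_malloc() hands out memory carved from DPDK's pre-mapped hugepage heaps, not from the libc heap, so ASan's leak checker (which intercepts malloc/free) never sees those allocations at all; that is consistent with the intentional rte_malloc leak going unreported above. For the DPDK side, the heap's own accounting can be dumped instead. A minimal sketch, run with ASAN_OPTIONS="use_sigaltstack=0" to work around the sigaltstack abort discussed above:

#include <stdio.h>
#include <rte_eal.h>
#include <rte_malloc.h>

int
main(int argc, char **argv)
{
        if (rte_eal_init(argc, argv) < 0)
                return -1;
        /* Intentional leak: ASan stays silent, since this memory comes
         * from DPDK's hugepage heaps rather than from malloc(). */
        void *leak = rte_malloc(NULL, 64, 0);
        (void)leak;
        /* DPDK-side view of heap usage, useful for spotting such leaks. */
        rte_malloc_dump_stats(stdout, NULL);
        return rte_eal_cleanup();
}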
[PATCH v1 0/2] Fix meter flow fail when matching E-Switch Manager
When using a meter in a flow that matches the E-Switch Manager, the flow fails because the E-Switch Manager match item is not handled correctly. This series fixes this by parsing the E-Switch Manager item correctly. Shun Hao (2): net/mlx5: add limitation for E-Switch Manager match net/mlx5: fix meter fail when used on E-Switch Manager doc/guides/nics/mlx5.rst| 4 ++ drivers/net/mlx5/mlx5_flow.c| 93 - drivers/net/mlx5/mlx5_flow.h| 10 drivers/net/mlx5/mlx5_flow_dv.c | 42 +++ 4 files changed, 100 insertions(+), 49 deletions(-) -- 2.20.0
[PATCH v1 1/2] net/mlx5: add limitation for E-Switch Manager match
For BF with old FW which doesn't expose the E-Switch Manager vport ID, E-Switch Manager port matching works correctly only when BF is in embedded CPU mode. This patch adds the limitation description. Fixes: a564038699f9 ("net/mlx5: support E-Switch manager egress traffic match") Cc: sta...@dpdk.org Signed-off-by: Shun Hao Acked-by: Matan Azard --- doc/guides/nics/mlx5.rst| 4 drivers/net/mlx5/mlx5_flow.h| 4 drivers/net/mlx5/mlx5_flow_dv.c | 10 -- 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 1b66b2bc33..f10e112d27 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -534,6 +534,10 @@ Limitations - When configuring host shaper with MLX5_HOST_SHAPER_FLAG_AVAIL_THRESH_TRIGGERED flag set, only rates 0 and 100Mbps are supported. +- E-Switch Manager matching: + + - For Bluefield with old FW which doesn't expose the E-Switch Manager vport ID in the capability, matching E-Switch Manager should be used only in Bluefield embedded CPU mode. + Statistics -- diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h index f00c033fc5..7300390070 100644 --- a/drivers/net/mlx5/mlx5_flow.h +++ b/drivers/net/mlx5/mlx5_flow.h @@ -2076,4 +2076,8 @@ int flow_dv_action_query(struct rte_eth_dev *dev, size_t flow_dv_get_item_hdr_len(const enum rte_flow_item_type item_type); int flow_dv_convert_encap_data(const struct rte_flow_item *items, uint8_t *buf, size_t *size, struct rte_flow_error *error); + +#define MLX5_PF_VPORT_ID 0 +#define MLX5_ECPF_VPORT_ID 0xFFFE + #endif /* RTE_PMD_MLX5_FLOW_H_ */ diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c index 65b02b20ce..09f662bdcf 100644 --- a/drivers/net/mlx5/mlx5_flow_dv.c +++ b/drivers/net/mlx5/mlx5_flow_dv.c @@ -99,6 +99,7 @@ flow_dv_get_esw_manager_vport_id(struct rte_eth_dev *dev) struct mlx5_priv *priv = dev->data->dev_private; struct mlx5_common_device *cdev = priv->sh->cdev; + /* New FW exposes E-Switch Manager vport ID, can use it directly. */ if (cdev->config.hca_attr.esw_mgr_vport_id_valid) return (int16_t)cdev->config.hca_attr.esw_mgr_vport_id; @@ -108,9 +109,14 @@ flow_dv_get_esw_manager_vport_id(struct rte_eth_dev *dev) case PCI_DEVICE_ID_MELLANOX_CONNECTX5BF: case PCI_DEVICE_ID_MELLANOX_CONNECTX6DXBF: case PCI_DEVICE_ID_MELLANOX_CONNECTX7BF: - return (int16_t)0xfffe; + /* +* In old FW which doesn't expose the E-Switch Manager vport ID in the capability, +* only the BF embedded CPUs control the E-Switch Manager port. Hence, +* ECPF vport ID is selected and not the host port (0) in any BF case. +*/ + return (int16_t)MLX5_ECPF_VPORT_ID; default: - return 0; + return MLX5_PF_VPORT_ID; } } -- 2.20.0
[PATCH v1 2/2] net/mlx5: fix meter fail when used on E-Switch Manager
When meter is used by E-Switch Manager port, there's an error that cannot get correct port ID. This patch fixes this by using specific parsing process to get port ID for E-Switch Manager. Fixes: 3c481324baf3 ("net/mlx5: fix meter flow direction check") Cc: sta...@dpdk.org Signed-off-by: Shun Hao Acked-by: Matan Azard --- drivers/net/mlx5/mlx5_flow.c| 93 - drivers/net/mlx5/mlx5_flow.h| 6 +++ drivers/net/mlx5/mlx5_flow_dv.c | 48 +++-- 3 files changed, 92 insertions(+), 55 deletions(-) diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 090de0366b..cfcd884b82 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -5297,7 +5297,7 @@ flow_meter_split_prep(struct rte_eth_dev *dev, const struct rte_flow_item *orig_items = items; struct rte_flow_action *hw_mtr_action; struct rte_flow_action *action_pre_head = NULL; - int32_t flow_src_port = priv->representor_id; + uint16_t flow_src_port = priv->representor_id; bool mtr_first; uint8_t mtr_id_offset = priv->mtr_reg_share ? MLX5_MTR_COLOR_BITS : 0; uint8_t mtr_reg_bits = priv->mtr_reg_share ? @@ -5311,22 +5311,12 @@ flow_meter_split_prep(struct rte_eth_dev *dev, /* Prepare the suffix subflow items. */ tag_item = sfx_items++; for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) { - struct mlx5_priv *port_priv; - const struct rte_flow_item_port_id *pid_v; int item_type = items->type; switch (item_type) { case RTE_FLOW_ITEM_TYPE_PORT_ID: - pid_v = items->spec; - MLX5_ASSERT(pid_v); - port_priv = mlx5_port_to_eswitch_info(pid_v->id, false); - if (!port_priv) - return rte_flow_error_set(error, - rte_errno, - RTE_FLOW_ERROR_TYPE_ITEM_SPEC, - pid_v, - "Failed to get port info."); - flow_src_port = port_priv->representor_id; + if (mlx5_flow_get_item_vport_id(dev, items, &flow_src_port, error)) + return -rte_errno; if (!fm->def_policy && wks->policy->is_hierarchy && flow_src_port != priv->representor_id) { if (flow_drv_mtr_hierarchy_rule_create(dev, @@ -10955,3 +10945,80 @@ mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority, } return res; } + +/** + * Get the E-Switch Manager vport id. + * + * @param[in] dev + * Pointer to the Ethernet device structure. + * + * @return + * The vport id. + */ +int16_t mlx5_flow_get_esw_manager_vport_id(struct rte_eth_dev *dev) +{ + struct mlx5_priv *priv = dev->data->dev_private; + struct mlx5_common_device *cdev = priv->sh->cdev; + + /* New FW exposes E-Switch Manager vport ID, can use it directly. */ + if (cdev->config.hca_attr.esw_mgr_vport_id_valid) + return (int16_t)cdev->config.hca_attr.esw_mgr_vport_id; + + if (priv->pci_dev == NULL) + return 0; + switch (priv->pci_dev->id.device_id) { + case PCI_DEVICE_ID_MELLANOX_CONNECTX5BF: + case PCI_DEVICE_ID_MELLANOX_CONNECTX6DXBF: + case PCI_DEVICE_ID_MELLANOX_CONNECTX7BF: + /* +* In old FW which doesn't expose the E-Switch Manager vport ID in the capability, +* only the BF embedded CPUs control the E-Switch Manager port. Hence, +* ECPF vport ID is selected and not the host port (0) in any BF case. +*/ + return (int16_t)MLX5_ECPF_VPORT_ID; + default: + return MLX5_PF_VPORT_ID; + } +} + +/** + * Parse item to get the vport id. + * + * @param[in] dev + * Pointer to the Ethernet device structure. + * @param[in] item + * The src port id match item. + * @param[out] vport_id + * Pointer to put the vport id. + * @param[out] error + * Pointer to error structure. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set. 
+ */ +int mlx5_flow_get_item_vport_id(struct rte_eth_dev *dev, + const struct rte_flow_item *item, + uint16_t *vport_id, + struct rte_flow_error *error) +{ + struct mlx5_priv *port_priv; + const struct rte_flow_item_port_id *pid_v; + + if (item->type != RTE_FLOW_ITEM_TYPE_PORT_ID) + return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM_SPEC, + NULL, "Incorrect item type."); + pid_v = item->spec; + if (!pid_v) + return 0; +