RE: [EXT] [PATCH v5] app/dma-perf: introduce dma-perf application

Anoob Joseph Fri, 09 Jun 2023 04:44:25 -0700

Hi,

Thanks for adding the app. Few comments inline. Please check.


Thanks,
Anoob

> -----Original Message-----
> From: Cheng Jiang <[email protected]>
> Sent: Thursday, June 8, 2023 2:14 PM
> To: [email protected]; [email protected];
> [email protected]; [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> Cheng Jiang <[email protected]>
> Subject: [EXT] [PATCH v5] app/dma-perf: introduce dma-perf application
> 
> External Email
> 
> ----------------------------------------------------------------------
> There are many high-performance DMA devices supported in DPDK now,
> and these DMA devices can also be integrated into other modules of DPDK as
> accelerators, such as Vhost. Before integrating DMA into applications,
> developers need to know the performance of these DMA devices in various
> scenarios and the performance of CPUs in the same scenario, such as
> different buffer lengths. Only in this way can we know the target
> performance of the application accelerated by using them. This patch
> introduces a high-performance testing tool, which supports comparing the
> performance of CPU and DMA in different scenarios automatically with a pre-
> set config file. Memory Copy performance test are supported for now.
> 
> Signed-off-by: Cheng Jiang <[email protected]>
> Signed-off-by: Jiayu Hu <[email protected]>
> Signed-off-by: Yuan Wang <[email protected]>
> Acked-by: Morten Brørup <[email protected]>
> Acked-by: Chenbo Xia <[email protected]>
> ---
> v5:
>   fixed some LONG_LINE warnings;
> v4:
>   fixed inaccuracy of the memory footprint display;
> v3:
>   fixed some typos;
> v2:
>   added lcore/dmadev designation;
>   added error case process;
>   removed worker_threads parameter from config.ini;
>   improved the logs;
>   improved config file;
> 
>  app/meson.build               |   1 +
>  app/test-dma-perf/benchmark.c | 472 ++++++++++++++++++++++++++++
> app/test-dma-perf/config.ini  |  59 ++++
>  app/test-dma-perf/main.c      | 569
> ++++++++++++++++++++++++++++++++++
>  app/test-dma-perf/main.h      |  69 +++++
>  app/test-dma-perf/meson.build |  17 +
>  6 files changed, 1187 insertions(+)
>  create mode 100644 app/test-dma-perf/benchmark.c  create mode 100644
> app/test-dma-perf/config.ini  create mode 100644 app/test-dma-
> perf/main.c  create mode 100644 app/test-dma-perf/main.h  create mode
> 100644 app/test-dma-perf/meson.build
> 

<snip>

> +
> +/* Configuration of device. */
> +static void
> +configure_dmadev_queue(uint32_t dev_id, uint32_t ring_size) {
> +     uint16_t vchan = 0;
> +     struct rte_dma_info info;
> +     struct rte_dma_conf dev_config = { .nb_vchans = 1 };

[Anoob] Is it possible to use more vchans? The code launches as many threads as 
the number of dma devices. Instead it should be total number of vchans.

> +     struct rte_dma_vchan_conf qconf = {
> +             .direction = RTE_DMA_DIR_MEM_TO_MEM,
> +             .nb_desc = ring_size
> +     };
> +
> +     if (rte_dma_configure(dev_id, &dev_config) != 0)
> +             rte_exit(EXIT_FAILURE, "Error with dma configure.\n");
> +
> +     if (rte_dma_vchan_setup(dev_id, vchan, &qconf) != 0)
> +             rte_exit(EXIT_FAILURE, "Error with queue configuration.\n");
> +
> +     rte_dma_info_get(dev_id, &info);
> +     if (info.nb_vchans != 1)
> +             rte_exit(EXIT_FAILURE, "Error, no configured queues
> reported on device id. %u\n",
> +                             dev_id);
> +
> +     if (rte_dma_start(dev_id) != 0)
> +             rte_exit(EXIT_FAILURE, "Error with dma start.\n"); }
> +
> +

<snip>

> +static inline int
> +do_dma_mem_copy(void *p)
> +{
> +     uint16_t *para_idx = (uint16_t *)p;
> +     volatile struct lcore_params *para = worker_params[*para_idx];
> +     volatile struct worker_info *worker_info = &(para->worker_info);
> +     uint16_t dev_id = para->dev_id;
> +     uint32_t nr_buf = para->nr_buf;
> +     uint16_t kick_batch = para->kick_batch;

[Anoob] Some of these variables can be made const. Since this is fast path, 
might be beneficial doing that way.

> +     uint32_t buf_size = para->buf_size;
> +     struct rte_mbuf **srcs = para->srcs;
> +     struct rte_mbuf **dsts = para->dsts;
> +     int64_t async_cnt = 0;
> +     int nr_cpl = 0;
> +     uint32_t i;
> +     uint32_t poll_cnt = 0;
> +
> +     worker_info->stop_flag = false;
> +     worker_info->ready_flag = true;
> +
> +     while (!worker_info->start_flag)
> +             ;
> +
> +     while (1) {
> +             for (i = 0; i < nr_buf; i++) {
> +                     if (unlikely(rte_dma_copy(dev_id,
> +                                             0,
> +                                             rte_pktmbuf_iova(srcs[i]),
> +                                             rte_pktmbuf_iova(dsts[i]),
> +                                             buf_size,
> +                                             0) < 0)) {
> +                             rte_dma_submit(dev_id, 0);
> +                             while (rte_dma_burst_capacity(dev_id, 0) ==
> 0) {
> +                                     nr_cpl = rte_dma_completed(dev_id,
> 0, MAX_DMA_CPL_NB,
> +                                                             NULL, NULL);
> +                                     async_cnt -= nr_cpl;
> +                                     worker_info->total_cpl += nr_cpl;
> +                             }
> +                             if (rte_dma_copy(dev_id,
> +                                             0,
> +                                             rte_pktmbuf_iova(srcs[i]),
> +                                             rte_pktmbuf_iova(dsts[i]),
> +                                             buf_size,
> +                                             0) < 0) {
> +                                     printf("enqueue fail again at %u\n",
> i);
> +                                     printf("space:%d\n",
> rte_dma_burst_capacity(dev_id, 0));
> +                                     rte_exit(EXIT_FAILURE, "DMA
> enqueue failed\n");
> +                             }

[Anoob] Only if the API returns -ENOSPC we should retry submitting, right? 
Other errors should be treated as fatal errors.

Do we need to use rte_dma_burst_capacity() API?

Can't we try something like,

dma_copy:
                ret =  rte_dma_copy(dev_id, 0, rte_pktmbuf_iova(srcs[i]), 
rte_pktmbuf_iova(dsts[i]), buf_size, 0);
                if (unlikely (ret < 0) {
                        if (ret == -ENOSPC) {
                                rte_dma_submit(dev_id, 0);
                                /* DMA completed & other handling */
                                goto dma_copy;
                        } else {
                                /* Error exit */
                        }
                }

                        
> +                     }
> +                     async_cnt++;
> +
> +                     if ((async_cnt % kick_batch) == 0) {
> +                             rte_dma_submit(dev_id, 0);
> +                             /* add a poll to avoid ring full */
> +                             nr_cpl = rte_dma_completed(dev_id, 0,
> MAX_DMA_CPL_NB, NULL, NULL);
> +                             async_cnt -= nr_cpl;
> +                             worker_info->total_cpl += nr_cpl;

[Anoob] Above code can be made as a static inline function so that in cases 
rte_dma_copy returns -ENOSPC, same static inline can be called.

<snip>

RE: [EXT] [PATCH v5] app/dma-perf: introduce dma-perf application

Reply via email to