Hi, Thanks for your comments, the replies are inline.
Thanks, Cheng > -----Original Message----- > From: Anoob Joseph <ano...@marvell.com> > Sent: Friday, June 9, 2023 7:44 PM > To: Jiang, Cheng1 <cheng1.ji...@intel.com>; tho...@monjalon.net; > Richardson, Bruce <bruce.richard...@intel.com>; > m...@smartsharesystems.com; Xia, Chenbo <chenbo....@intel.com> > Cc: dev@dpdk.org; Hu, Jiayu <jiayu...@intel.com>; Ding, Xuan > <xuan.d...@intel.com>; Ma, WenwuX <wenwux...@intel.com>; Wang, > YuanX <yuanx.w...@intel.com>; He, Xingguang <xingguang...@intel.com>; > Jerin Jacob Kollanukkaran <jer...@marvell.com>; Vamsi Krishna Attunuru > <vattun...@marvell.com>; Amit Prakash Shukla > <amitpraka...@marvell.com>; Satha Koteswara Rao Kottidi > <skotesh...@marvell.com>; Gowrishankar Muthukrishnan > <gmuthukri...@marvell.com>; Vidya Sagar Velumuri > <vvelum...@marvell.com> > Subject: RE: [EXT] [PATCH v5] app/dma-perf: introduce dma-perf application > > Hi, > > Thanks for adding the app. Few comments inline. Please check. > > Thanks, > Anoob > > > -----Original Message----- > > From: Cheng Jiang <cheng1.ji...@intel.com> > > Sent: Thursday, June 8, 2023 2:14 PM > > To: tho...@monjalon.net; bruce.richard...@intel.com; > > m...@smartsharesystems.com; chenbo....@intel.com > > Cc: dev@dpdk.org; jiayu...@intel.com; xuan.d...@intel.com; > > wenwux...@intel.com; yuanx.w...@intel.com; xingguang...@intel.com; > > Cheng Jiang <cheng1.ji...@intel.com> > > Subject: [EXT] [PATCH v5] app/dma-perf: introduce dma-perf application > > > > External Email > > > > ---------------------------------------------------------------------- > > There are many high-performance DMA devices supported in DPDK now, > and > > these DMA devices can also be integrated into other modules of DPDK as > > accelerators, such as Vhost. Before integrating DMA into applications, > > developers need to know the performance of these DMA devices in > > various scenarios and the performance of CPUs in the same scenario, > > such as different buffer lengths. Only in this way can we know the > > target performance of the application accelerated by using them. This > > patch introduces a high-performance testing tool, which supports > > comparing the performance of CPU and DMA in different scenarios > > automatically with a pre- set config file. Memory Copy performance test > are supported for now. > > > > Signed-off-by: Cheng Jiang <cheng1.ji...@intel.com> > > Signed-off-by: Jiayu Hu <jiayu...@intel.com> > > Signed-off-by: Yuan Wang <yuanx.w...@intel.com> > > Acked-by: Morten Brørup <m...@smartsharesystems.com> > > Acked-by: Chenbo Xia <chenbo....@intel.com> > > --- > > v5: > > fixed some LONG_LINE warnings; > > v4: > > fixed inaccuracy of the memory footprint display; > > v3: > > fixed some typos; > > v2: > > added lcore/dmadev designation; > > added error case process; > > removed worker_threads parameter from config.ini; > > improved the logs; > > improved config file; > > > > app/meson.build | 1 + > > app/test-dma-perf/benchmark.c | 472 ++++++++++++++++++++++++++++ > > app/test-dma-perf/config.ini | 59 ++++ > > app/test-dma-perf/main.c | 569 > > ++++++++++++++++++++++++++++++++++ > > app/test-dma-perf/main.h | 69 +++++ > > app/test-dma-perf/meson.build | 17 + > > 6 files changed, 1187 insertions(+) > > create mode 100644 app/test-dma-perf/benchmark.c create mode > 100644 > > app/test-dma-perf/config.ini create mode 100644 app/test-dma- > > perf/main.c create mode 100644 app/test-dma-perf/main.h create mode > > 100644 app/test-dma-perf/meson.build > > > > <snip> > > > + > > +/* Configuration of device. */ > > +static void > > +configure_dmadev_queue(uint32_t dev_id, uint32_t ring_size) { > > + uint16_t vchan = 0; > > + struct rte_dma_info info; > > + struct rte_dma_conf dev_config = { .nb_vchans = 1 }; > > [Anoob] Is it possible to use more vchans? The code launches as many > threads as the number of dma devices. Instead it should be total number of > vchans. [Cheng] Really good suggestion. This is feasible, but in the initial stage, we want to keep things simple. Perhaps in the future, we can add a parameter to configure the number of vchans for each device and then launch the corresponding number of threads for each vchan. > > > + struct rte_dma_vchan_conf qconf = { > > + .direction = RTE_DMA_DIR_MEM_TO_MEM, > > + .nb_desc = ring_size > > + }; > > + > > + if (rte_dma_configure(dev_id, &dev_config) != 0) > > + rte_exit(EXIT_FAILURE, "Error with dma configure.\n"); > > + > > + if (rte_dma_vchan_setup(dev_id, vchan, &qconf) != 0) > > + rte_exit(EXIT_FAILURE, "Error with queue configuration.\n"); > > + > > + rte_dma_info_get(dev_id, &info); > > + if (info.nb_vchans != 1) > > + rte_exit(EXIT_FAILURE, "Error, no configured queues > > reported on device id. %u\n", > > + dev_id); > > + > > + if (rte_dma_start(dev_id) != 0) > > + rte_exit(EXIT_FAILURE, "Error with dma start.\n"); } > > + > > + > > <snip> > > > +static inline int > > +do_dma_mem_copy(void *p) > > +{ > > + uint16_t *para_idx = (uint16_t *)p; > > + volatile struct lcore_params *para = worker_params[*para_idx]; > > + volatile struct worker_info *worker_info = &(para->worker_info); > > + uint16_t dev_id = para->dev_id; > > + uint32_t nr_buf = para->nr_buf; > > + uint16_t kick_batch = para->kick_batch; > > [Anoob] Some of these variables can be made const. Since this is fast path, > might be beneficial doing that way. [Cheng] Good idea, I'll improve it in the next version. > > > + uint32_t buf_size = para->buf_size; > > + struct rte_mbuf **srcs = para->srcs; > > + struct rte_mbuf **dsts = para->dsts; > > + int64_t async_cnt = 0; > > + int nr_cpl = 0; > > + uint32_t i; > > + uint32_t poll_cnt = 0; > > + > > + worker_info->stop_flag = false; > > + worker_info->ready_flag = true; > > + > > + while (!worker_info->start_flag) > > + ; > > + > > + while (1) { > > + for (i = 0; i < nr_buf; i++) { > > + if (unlikely(rte_dma_copy(dev_id, > > + 0, > > + rte_pktmbuf_iova(srcs[i]), > > + rte_pktmbuf_iova(dsts[i]), > > + buf_size, > > + 0) < 0)) { > > + rte_dma_submit(dev_id, 0); > > + while (rte_dma_burst_capacity(dev_id, 0) == > > 0) { > > + nr_cpl = rte_dma_completed(dev_id, > > 0, MAX_DMA_CPL_NB, > > + NULL, NULL); > > + async_cnt -= nr_cpl; > > + worker_info->total_cpl += nr_cpl; > > + } > > + if (rte_dma_copy(dev_id, > > + 0, > > + rte_pktmbuf_iova(srcs[i]), > > + rte_pktmbuf_iova(dsts[i]), > > + buf_size, > > + 0) < 0) { > > + printf("enqueue fail again at %u\n", > > i); > > + printf("space:%d\n", > > rte_dma_burst_capacity(dev_id, 0)); > > + rte_exit(EXIT_FAILURE, "DMA > > enqueue failed\n"); > > + } > > [Anoob] Only if the API returns -ENOSPC we should retry submitting, right? > Other errors should be treated as fatal errors. > > Do we need to use rte_dma_burst_capacity() API? > > Can't we try something like, > > dma_copy: > ret = rte_dma_copy(dev_id, 0, rte_pktmbuf_iova(srcs[i]), > rte_pktmbuf_iova(dsts[i]), buf_size, 0); > if (unlikely (ret < 0) { > if (ret == -ENOSPC) { > rte_dma_submit(dev_id, 0); > /* DMA completed & other handling */ > goto dma_copy; > } else { > /* Error exit */ > } > } > > [Cheng] Good idea, we don't have to check the capacity explicitly. I think your implementation is more clear, thanks. I will fix it in the next version. > > + } > > + async_cnt++; > > + > > + if ((async_cnt % kick_batch) == 0) { > > + rte_dma_submit(dev_id, 0); > > + /* add a poll to avoid ring full */ > > + nr_cpl = rte_dma_completed(dev_id, 0, > > MAX_DMA_CPL_NB, NULL, NULL); > > + async_cnt -= nr_cpl; > > + worker_info->total_cpl += nr_cpl; > > [Anoob] Above code can be made as a static inline function so that in cases > rte_dma_copy returns -ENOSPC, same static inline can be called. > [Cheng] sure, got it. Thanks! > <snip>