Hi Anoob,

Replies are inline.
Thanks,
Cheng

> -----Original Message-----
> From: Anoob Joseph <ano...@marvell.com>
> Sent: Monday, June 26, 2023 1:42 PM
> To: Jiang, Cheng1 <cheng1.ji...@intel.com>; tho...@monjalon.net;
> Richardson, Bruce <bruce.richard...@intel.com>; m...@smartsharesystems.com;
> Xia, Chenbo <chenbo....@intel.com>; Amit Prakash Shukla
> <amitpraka...@marvell.com>; huangdeng...@huawei.com; Laatz, Kevin
> <kevin.la...@intel.com>; fengcheng...@huawei.com; Jerin Jacob
> Kollanukkaran <jer...@marvell.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu...@intel.com>; Ding, Xuan
> <xuan.d...@intel.com>; Ma, WenwuX <wenwux...@intel.com>; Wang, YuanX
> <yuanx.w...@intel.com>; He, Xingguang <xingguang...@intel.com>; Ling,
> WeiX <weix.l...@intel.com>
> Subject: RE: [EXT] [PATCH v8] app/dma-perf: introduce dma-perf application
>
> Hi Cheng,
>
> Please see inline.
>
> Thanks,
> Anoob
>
> > -----Original Message-----
> > From: Jiang, Cheng1 <cheng1.ji...@intel.com>
> > Sent: Saturday, June 24, 2023 5:23 PM
> > To: Anoob Joseph <ano...@marvell.com>; tho...@monjalon.net;
> > Richardson, Bruce <bruce.richard...@intel.com>; m...@smartsharesystems.com;
> > Xia, Chenbo <chenbo....@intel.com>; Amit Prakash Shukla
> > <amitpraka...@marvell.com>; huangdeng...@huawei.com; Laatz, Kevin
> > <kevin.la...@intel.com>; fengcheng...@huawei.com; Jerin Jacob
> > Kollanukkaran <jer...@marvell.com>
> > Cc: dev@dpdk.org; Hu, Jiayu <jiayu...@intel.com>; Ding, Xuan
> > <xuan.d...@intel.com>; Ma, WenwuX <wenwux...@intel.com>; Wang, YuanX
> > <yuanx.w...@intel.com>; He, Xingguang <xingguang...@intel.com>; Ling,
> > WeiX <weix.l...@intel.com>
> > Subject: RE: [EXT] [PATCH v8] app/dma-perf: introduce dma-perf
> > application
> >
> > Hi Anoob,
> >
> > Replies are inline.
> >
> > Thanks,
> > Cheng
> >
> > > -----Original Message-----
> > > From: Anoob Joseph <ano...@marvell.com>
> > > Sent: Friday, June 23, 2023 2:53 PM
> > > To: Jiang, Cheng1 <cheng1.ji...@intel.com>; tho...@monjalon.net;
> > > Richardson, Bruce <bruce.richard...@intel.com>; m...@smartsharesystems.com;
> > > Xia, Chenbo <chenbo....@intel.com>; Amit Prakash Shukla
> > > <amitpraka...@marvell.com>; huangdeng...@huawei.com; Laatz, Kevin
> > > <kevin.la...@intel.com>; fengcheng...@huawei.com; Jerin Jacob
> > > Kollanukkaran <jer...@marvell.com>
> > > Cc: dev@dpdk.org; Hu, Jiayu <jiayu...@intel.com>; Ding, Xuan
> > > <xuan.d...@intel.com>; Ma, WenwuX <wenwux...@intel.com>; Wang, YuanX
> > > <yuanx.w...@intel.com>; He, Xingguang <xingguang...@intel.com>; Ling,
> > > WeiX <weix.l...@intel.com>
> > > Subject: RE: [EXT] [PATCH v8] app/dma-perf: introduce dma-perf
> > > application
> > >
> > > Hi Cheng,
> > >
> > > Thanks for the new version. Please see inline.
> > >
> > > Thanks,
> > > Anoob
> > >
> > > > -----Original Message-----
> > > > From: Cheng Jiang <cheng1.ji...@intel.com>
> > > > Sent: Tuesday, June 20, 2023 12:24 PM
> > > > To: tho...@monjalon.net; bruce.richard...@intel.com;
> > > > m...@smartsharesystems.com; chenbo....@intel.com; Amit Prakash
> > > > Shukla <amitpraka...@marvell.com>; Anoob Joseph <ano...@marvell.com>;
> > > > huangdeng...@huawei.com; kevin.la...@intel.com;
> > > > fengcheng...@huawei.com; Jerin Jacob Kollanukkaran
> > > > <jer...@marvell.com>
> > > > Cc: dev@dpdk.org; jiayu...@intel.com; xuan.d...@intel.com;
> > > > wenwux...@intel.com; yuanx.w...@intel.com; xingguang...@intel.com;
> > > > weix.l...@intel.com; Cheng Jiang <cheng1.ji...@intel.com>
> > > > Subject: [EXT] [PATCH v8] app/dma-perf: introduce dma-perf
> > > > application
> > > >
> > > > External Email
> > > >
> > > > ----------------------------------------------------------------------
> > > > There are many high-performance DMA devices supported in DPDK now,
> > > > and these DMA devices can also be integrated into other modules of
> > > > DPDK as accelerators, such as Vhost. Before integrating DMA into
> > > > applications, developers need to know the performance of these DMA
> > > > devices in various scenarios, and the performance of CPUs in the
> > > > same scenarios, such as with different buffer lengths. Only in this
> > > > way can we know the target performance of an application accelerated
> > > > by using them. This patch introduces a high-performance testing
> > > > tool, which supports automatically comparing the performance of CPU
> > > > and DMA in different scenarios with a pre-set config file. Memory
> > > > copy performance tests are supported for now.
> > > >
> > > > Signed-off-by: Cheng Jiang <cheng1.ji...@intel.com>
> > > > Signed-off-by: Jiayu Hu <jiayu...@intel.com>
> > > > Signed-off-by: Yuan Wang <yuanx.w...@intel.com>
> > > > Acked-by: Morten Brørup <m...@smartsharesystems.com>
> > > > Acked-by: Chenbo Xia <chenbo....@intel.com>
> > > > ---
> > > > v8:
> > > > fixed string copy issue in parse_lcore();
> > > > improved some data display format;
> > > > added doc in doc/guides/tools;
> > > > updated release notes;
> > > >
> > > > v7:
> > > > fixed some strcpy issues;
> > > > removed cache setup in calling rte_pktmbuf_pool_create();
> > > > fixed some typos;
> > > > added some memory free and null set operations;
> > > > improved result calculation;
> > > > v6:
> > > > improved code based on Anoob's comments;
> > > > fixed some code structure issues;
> > > > v5:
> > > > fixed some LONG_LINE warnings;
> > > > v4:
> > > > fixed inaccuracy of the memory footprint display;
> > > > v3:
> > > > fixed some typos;
> > > > v2:
> > > > added lcore/dmadev designation;
> > > > added error case process;
> > > > removed worker_threads parameter from config.ini;
> > > > improved the logs;
> > > > improved config file;
> > > >
> > > >  app/meson.build                        |   1 +
> > > >  app/test-dma-perf/benchmark.c          | 498 +++++++++++++++++++++
> > > >  app/test-dma-perf/config.ini           |  61 +++
> > > >  app/test-dma-perf/main.c               | 594 +++++++++++++++++++++++++
> > > >  app/test-dma-perf/main.h               |  69 +++
> > > >  app/test-dma-perf/meson.build          |  17 +
> > > >  doc/guides/rel_notes/release_23_07.rst |   6 +
> > > >  doc/guides/tools/dmaperf.rst           | 103 +++++
> > > >  doc/guides/tools/index.rst             |   1 +
> > > >  9 files changed, 1350 insertions(+)
> > > >  create mode 100644 app/test-dma-perf/benchmark.c
> > > >  create mode 100644 app/test-dma-perf/config.ini
> > > >  create mode 100644 app/test-dma-perf/main.c
> > > >  create mode 100644 app/test-dma-perf/main.h
> > > >  create mode 100644 app/test-dma-perf/meson.build
> > > >  create mode 100644 doc/guides/tools/dmaperf.rst
> > > >
> > >
> > > <snip>
> > >
> > > > +		fprintf(stderr, "Error: Fail to find DMA %s.\n", dma_name);
> > > > +		goto end;
> > > > +	}
> > > > +
> > > > +	ldm->dma_ids[i] = dev_id;
> > > > +	configure_dmadev_queue(dev_id, ring_size);
> > > > +	++nb_dmadevs;
> > > > +	}
> > > > +
> > > > +end:
> > > > +	if (nb_dmadevs < nb_workers) {
> > > > +		printf("Not enough dmadevs (%u) for all workers (%u).\n",
> > > > +				nb_dmadevs, nb_workers);
> > > > +		return -1;
> > > > +	}
> > > > +
> > > > +	printf("Number of used dmadevs: %u.\n", nb_dmadevs);
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static inline void
> > > > +do_dma_submit_and_poll(uint16_t dev_id, uint64_t *async_cnt,
> > > > +			volatile struct worker_info *worker_info)
> > > > +{
> > > > +	int ret;
> > > > +	uint16_t nr_cpl;
> > > > +
> > > > +	ret = rte_dma_submit(dev_id, 0);
> > > > +	if (ret < 0) {
> > > > +		rte_dma_stop(dev_id);
> > > > +		rte_dma_close(dev_id);
> > > > +		rte_exit(EXIT_FAILURE, "Error with dma submit.\n");
> > > > +	}
> > > > +
> > > > +	nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);
> > > > +	*async_cnt -= nr_cpl;
> > > > +	worker_info->total_cpl += nr_cpl;
> > > > +}
> > > > +
> > > > +static inline int
> > > > +do_dma_mem_copy(void *p)
> > >
> > > [Anoob] Just curious, why not pass struct lcore_params *para itself?
> > > Is it because the pointer is volatile? If yes, then we can take an AI
> > > to split the struct into volatile and non-volatile parts.
> >
> > [Cheng] The reason I did it this way is that I want to launch this
> > function on another core by spawning a new thread, and
> > rte_eal_remote_launch() takes a void * as its parameter. That's why I
> > passed void *p. Your suggestion to split the struct into volatile and
> > non-volatile parts is quite reasonable. I am thinking about the best
> > way to implement it. Thanks.
>
> [Anoob] Instead of passing the address of the index variable as void *,
> you can easily send the lcore_params pointer, right?

[Cheng] Yes, you are right. I can pass the lcore_params pointer, and I'll fix it in the next version. The new lcore_params will be non-volatile, with a volatile worker_info inside it. This is more reasonable.

> > > > +{
> > > > +	const uint16_t *para_idx = (uint16_t *)p;
> > > > +	volatile struct lcore_params *para = lcores_p[*para_idx].v_ptr;
> > > > +	volatile struct worker_info *worker_info = &(para->worker_info);
> > > > +	const uint16_t dev_id = para->dev_id;
> > > > +	const uint32_t nr_buf = para->nr_buf;
> > > > +	const uint16_t kick_batch = para->kick_batch;
> > > > +	const uint32_t buf_size = para->buf_size;
> > > > +	struct rte_mbuf **srcs = para->srcs;
> > > > +	struct rte_mbuf **dsts = para->dsts;
> > > > +	uint16_t nr_cpl;
> > > > +	uint64_t async_cnt = 0;
> > > > +	uint32_t i;
> > > > +	uint32_t poll_cnt = 0;
> > > > +	int ret;
> > > > +
> > > > +	worker_info->stop_flag = false;
> > > > +	worker_info->ready_flag = true;
> > > > +
> > > > +	while (!worker_info->start_flag)
> > > > +		;
> > > > +
> > > > +	while (1) {
> > > > +		for (i = 0; i < nr_buf; i++) {
> > > > +dma_copy:
> > > > +			ret = rte_dma_copy(dev_id, 0, rte_pktmbuf_iova(srcs[i]),
> > > > +					rte_pktmbuf_iova(dsts[i]), buf_size, 0);
> > >
> > > [Anoob] Do we need to use 'rte_mbuf_data_iova' here instead of
> > > 'rte_pktmbuf_iova'?
> >
> > [Cheng] Yes, rte_mbuf_data_iova is more appropriate. I'll fix it in the
> > next version. Thanks.
> >
> > > > +			if (unlikely(ret < 0)) {
> > > > +				if (ret == -ENOSPC) {
> > > > +					do_dma_submit_and_poll(dev_id,
> > > > +						&async_cnt, worker_info);
> > > > +					goto dma_copy;
> > > > +				} else {
> > > > +					/* Error exit */
> > > > +					rte_dma_stop(dev_id);
> > >
> > > [Anoob] Missing rte_dma_close() here. Also, maybe introduce a static
> > > void function so that the rte_exit() call etc. won't be part of the
> > > fastpath loop.
> > >
> > > Maybe something like below, which you can call from here and from
> > > "do_dma_submit_and_poll":
> > >
> > > static void
> > > error_exit(int dev_id)
> > > {
> > > 	/* Error exit */
> > > 	rte_dma_stop(dev_id);
> > > 	rte_dma_close(dev_id);
> > > 	rte_exit(EXIT_FAILURE, "DMA enqueue failed\n");
> > > }
> >
> > [Cheng] I'm not so sure here. rte_dma_close() is called in rte_exit().
> > Do we still call it explicitly before rte_exit()?
>
> [Anoob] In 'do_dma_submit_and_poll', there is rte_dma_close() before
> rte_exit(). I'm fine either way as long as it is consistent. That said,
> I think it is better to call close() from the app, rather than relying
> on rte_exit().

[Cheng] Sure, it makes sense to me that the app should call rte_dma_close(). I'll fix it in the next version. Thanks.

> <snip>