On 2024/3/1 16:06, Gowrishankar Muthukrishnan wrote:
> Hi Fengcheng,
> 
> <cut>
>>>>> -output_result(uint8_t scenario_id, uint32_t lcore_id, char
>>>>> *dma_name,
>>>> uint16_t ring_size,
>>>>> -                 uint16_t kick_batch, uint64_t ave_cycle, uint32_t
>>>> buf_size, uint32_t nr_buf,
>>>>> -                 float memory, float bandwidth, float mops, bool
>>>> is_dma)
>>>>> +output_result(struct test_configure *cfg, struct lcore_params *para,
>>>>> +                 uint16_t kick_batch, uint64_t ave_cycle, uint32_t
>>>> buf_size,
>>>>> +                 uint32_t nr_buf, float memory, float bandwidth, float
>>>> mops)
>>>>>  {
>>>>> - if (is_dma)
>>>>> -         printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch Size:
>>>> %u.\n",
>>>>> -                         lcore_id, dma_name, ring_size, kick_batch);
>>>>> - else
>>>>> + uint16_t ring_size = cfg->ring_size.cur;
>>>>> + uint8_t scenario_id = cfg->scenario_id;
>>>>> + uint32_t lcore_id = para->lcore_id;
>>>>> + char *dma_name = para->dma_name;
>>>>> +
>>>>> + if (cfg->is_dma) {
>>>>> +         printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch Size:
>>>> %u", lcore_id,
>>>>> +                dma_name, ring_size, kick_batch);
>>>>> +         if (cfg->is_sg)
>>>>> +                 printf(" DMA src ptrs: %u, dst ptrs: %u",
>>>>> +                        para->src_ptrs, para->dst_ptrs);
>>>>
>>>> DMA src sges: %u DMA dst sges: %u
>>>>
>>>> I think we should add a column which title maybe misc, some like
>>>> sg-src[4]- dst[1], and later we may add fill test, then this field
>>>> could be pattern-
>>>> 0x12345678
>>>>
>>>> And in "[PATCH v10 2/4] app/dma-perf: add PCI device support" commit,
>>>> if the DMA was worked in non-mem2mem direction, we could add simple
>>>> descriptor of direction and pcie.info in the above misc column.
>>>>
>>>
>>> I am sorry, I could not understand complete picture here. Do you mean
>>> we reserve a column and use it as per test type.
>>>
>>> For plain mem copy, nothing added.
>>> For SG mem copy, instead of showing "DMA src sges: 1, dst sges: 4", print
>> "sg-src[1]-dst[4]".
>>> In future, when we add fill test in benchmark, this line instead be 
>>> "pattern-
>> 0x12345678".
>>>
>>> Is my understanding correct over here ?
>>
>> Yes, some like this.
>>
> This patch adds SGE info in an alignment with existing output.
> 
> I think it is better to add further extensions as we add new features. Since 
> the app doesn't support the features that you mentioned, it is difficult to 
> anticipate the requirements.
> In fact, if the additional frameworks that we put in are not useful for those 
> features, it could lead to stale code.
> I would prefer if we can make these changes as we add new features.
> 
>>>
> <cut>
>>>>>   }
>>>>>
>>>>>   while (1) {
>>>>> @@ -541,13 +702,53 @@ mem_copy_benchmark(struct test_configure
>>>> *cfg, bool is_dma)
>>>>>
>>>>>   rte_eal_mp_wait_lcore();
>>>>>
>>>>> - for (i = 0; i < (nr_buf / nb_workers) * nb_workers; i++) {
>>>>> -         if (memcmp(rte_pktmbuf_mtod(srcs[i], void *),
>>>>> -                    rte_pktmbuf_mtod(dsts[i], void *),
>>>>> -                    cfg->buf_size.cur) != 0) {
>>>>> -                 printf("Copy validation fails for buffer number %d\n",
>>>> i);
>>>>> -                 ret = -1;
>>>>> -                 goto out;
>>>>> + if (!cfg->is_sg && cfg->transfer_dir == RTE_DMA_DIR_MEM_TO_MEM)
>>>> {
>>>>> +         for (i = 0; i < (nr_buf / nb_workers) * nb_workers; i++) {
>>>>> +                 if (memcmp(rte_pktmbuf_mtod(srcs[i], void *),
>>>>> +                                 rte_pktmbuf_mtod(dsts[i], void *),
>>>>> +                                 cfg->buf_size.cur) != 0) {
>>>>> +                         printf("Copy validation fails for buffer number
>>>> %d\n", i);
>>>>> +                         ret = -1;
>>>>> +                         goto out;
>>>>> +                 }
>>>>> +         }
>>>>> + } else if (cfg->is_sg && cfg->transfer_dir ==
>>>> RTE_DMA_DIR_MEM_TO_MEM) {
>>>>> +         size_t src_remsz = buf_size % cfg->src_ptrs;
>>>>> +         size_t dst_remsz = buf_size % cfg->dst_ptrs;
>>>>> +         size_t src_sz = buf_size / cfg->src_ptrs;
>>>>> +         size_t dst_sz = buf_size / cfg->dst_ptrs;
>>>>> +         uint8_t src[buf_size], dst[buf_size];
>>>>> +         uint8_t *sbuf, *dbuf, *ptr;
>>>>> +
>>>>> +         for (i = 0; i < (nr_buf / RTE_MAX(cfg->src_ptrs, 
>>>>> cfg->dst_ptrs));
>>>> i++) {
>>>>> +                 sbuf = src;
>>>>> +                 dbuf = dst;
>>>>> +                 ptr = NULL;
>>>>> +
>>>>> +                 for (j = 0; j < cfg->src_ptrs; j++) {
>>>>> +                         ptr = rte_pktmbuf_mtod(srcs[i * cfg->src_ptrs
>>>> + j], uint8_t *);
>>>>> +                         memcpy(sbuf, ptr, src_sz);
>>>>> +                         sbuf += src_sz;
>>>>> +                 }
>>>>> +
>>>>> +                 if (src_remsz)
>>>>> +                         memcpy(sbuf, ptr + src_sz, src_remsz);
>>>>> +
>>>>> +                 for (j = 0; j < cfg->dst_ptrs; j++) {
>>>>> +                         ptr = rte_pktmbuf_mtod(dsts[i * cfg->dst_ptrs
>>>> + j], uint8_t *);
>>>>> +                         memcpy(dbuf, ptr, dst_sz);
>>>>> +                         dbuf += dst_sz;
>>>>> +                 }
>>>>> +
>>>>> +                 if (dst_remsz)
>>>>> +                         memcpy(dbuf, ptr + dst_sz, dst_remsz);
>>>>> +
>>>>> +                 if (memcmp(src, dst, buf_size) != 0) {
>>>>> +                         printf("SG Copy validation fails for buffer
>>>> number %d\n",
>>>>> +                                 i * cfg->src_ptrs);
>>>>> +                         ret = -1;
>>>>> +                         goto out;
>>>>> +                 }
>>>>
>>>> Now I doubt the value of verify, this verify can't find the middle
>>>> round copy failure, because as long as the last round copy is
>>>> successful, the validation will pass.
>>>>
>>> Validation is on entire buffer. If any middle copy is a failure,
>>> entire memcmp would have failed. Or do I miss something ?
>>>
>>>> And adding validatation in every round copy will impact performance.
>>>>
>>> This validation is just after worker function is stopped measuring perf.
>>> How would this impact performance ?
>>
>> Yes, it will don't impact performance.
>>
>> What I said before is that is not valid, pls consider following scene:
>>
>>
>>      while (1) {
>>              for (i = 0; i < nr_buf; i++) {  // this for loop will copy all 
>> nr_bufs,
>> let's defind this is a round copy.
>> dma_copy:
>>                      ret = rte_dma_copy(dev_id, 0,
>> rte_mbuf_data_iova(srcs[i]),
>>                              rte_mbuf_data_iova(dsts[i]), buf_size, 0);
>>                      if (unlikely(ret < 0)) {
>>                              if (ret == -ENOSPC) {
>>                                      do_dma_submit_and_poll(dev_id,
>> &async_cnt, worker_info);
>>                                      goto dma_copy;
>>                              } else
>>                                      error_exit(dev_id);
>>                      }
>>                      async_cnt++;
>>
>>                      if ((async_cnt % kick_batch) == 0)
>>                              do_dma_submit_and_poll(dev_id,
>> &async_cnt, worker_info);
>>              }
>>
>>              if (worker_info->stop_flag)   // if don't stop, it will do many
>> round copies.
>>                      break;
>>      }
>>
>> and the later validation just verify the last round, let's assume there are 
>> 100
>> round, and if the last round copy work well, but round 0~98 both copy fail,
>> then the validation will not detect it.
>>
>>
>> So if we want do all the validation, then we should add the velidation after
>> every round copy, but it will impact the performance.
>>
>>
>>>
>>>> Also app/test_dmadev already verify data. so I think we should drop
>>>> the validation commit.
>>>
>>> Even in some corner cases or unknown issues, copy would have failed
>>> and taking perf cycles then is meaningless. That is the reason, this
>>> validation is added after perf function doing its job.
>>
>> How about:
>>
>>      while (1) {
>>              for (i = 0; i < nr_buf; i++) {  // this for loop will copy all 
>> nr_bufs,
>> let's defind this is a round copy.
>> dma_copy:
>>                      ret = rte_dma_copy(dev_id, 0,
>> rte_mbuf_data_iova(srcs[i]),
>>                              rte_mbuf_data_iova(dsts[i]), buf_size, 0);
>>                      if (unlikely(ret < 0)) {
>>                              if (ret == -ENOSPC) {
>>                                      do_dma_submit_and_poll(dev_id,
>> &async_cnt, worker_info);
>>                                      goto dma_copy;
>>                              } else
>>                                      error_exit(dev_id);
>>                      }
>>                      async_cnt++;
>>
>>                      if ((async_cnt % kick_batch) == 0)
>>                              do_dma_submit_and_poll(dev_id,
>> &async_cnt, worker_info);
>>              }
>>
>>              if (unlikely(work_info->verify)) {
>>                      ret = verify();
>>                      if (ret != 0) {
>>                              // error trace,
>>                              break;
>>                      }
>>              }
>>
>>              if (worker_info->stop_flag)   // if don't stop, it will do many
>> round copies.
>>                      break;
>>      }
>>
>> and make this verify as a config entry
> 
> I believe there is a difference in understanding of what this is intended to 
> do. Intention here is not to validate every operation done by DMA, and that 
> is already taken care by UT.
> 
> Is it possible that we are we misreporting numbers if the application is 
> buggy or PMD is misbehaving for the scenario under test and the copies are 
> not actually performed? Yes. Think about a scenario where PMD is buggy when 
> trying bursts of more than 1.
> 
> Checking last set of buffers is more like testing a sample from the perf test 
> to make sure perf test was indeed performing what it is claiming to do. If 
> you think it is unnecessary to do so, we can drop this from upstream. But 
> adding complete verification in performance app would be repeating what a 
> unit test is expected to do. I would suggest not to do that.

I think this commit is mainly test the dma-perf itself.

OK with continue this commit.

Thanks

> 
> Thanks,
> Gowrishankar
> 

Reply via email to