On 2024/3/1 16:06, Gowrishankar Muthukrishnan wrote:
> Hi Fengcheng,
>
> <cut>
>>>>> -output_result(uint8_t scenario_id, uint32_t lcore_id, char
>>>>> *dma_name,
>>>> uint16_t ring_size,
>>>>> - uint16_t kick_batch, uint64_t ave_cycle, uint32_t
>>>> buf_size, uint32_t nr_buf,
>>>>> - float memory, float bandwidth, float mops, bool
>>>> is_dma)
>>>>> +output_result(struct test_configure *cfg, struct lcore_params *para,
>>>>> + uint16_t kick_batch, uint64_t ave_cycle, uint32_t
>>>> buf_size,
>>>>> + uint32_t nr_buf, float memory, float bandwidth, float
>>>> mops)
>>>>> {
>>>>> - if (is_dma)
>>>>> - printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch Size:
>>>> %u.\n",
>>>>> - lcore_id, dma_name, ring_size, kick_batch);
>>>>> - else
>>>>> + uint16_t ring_size = cfg->ring_size.cur;
>>>>> + uint8_t scenario_id = cfg->scenario_id;
>>>>> + uint32_t lcore_id = para->lcore_id;
>>>>> + char *dma_name = para->dma_name;
>>>>> +
>>>>> + if (cfg->is_dma) {
>>>>> + printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch Size:
>>>> %u", lcore_id,
>>>>> + dma_name, ring_size, kick_batch);
>>>>> + if (cfg->is_sg)
>>>>> + printf(" DMA src ptrs: %u, dst ptrs: %u",
>>>>> + para->src_ptrs, para->dst_ptrs);
>>>>
>>>> DMA src sges: %u DMA dst sges: %u
>>>>
>>>> I think we should add a column which title maybe misc, some like
>>>> sg-src[4]- dst[1], and later we may add fill test, then this field
>>>> could be pattern-
>>>> 0x12345678
>>>>
>>>> And in "[PATCH v10 2/4] app/dma-perf: add PCI device support" commit,
>>>> if the DMA was worked in non-mem2mem direction, we could add simple
>>>> descriptor of direction and pcie.info in the above misc column.
>>>>
>>>
>>> I am sorry, I could not understand complete picture here. Do you mean
>>> we reserve a column and use it as per test type.
>>>
>>> For plain mem copy, nothing added.
>>> For SG mem copy, instead of showing "DMA src sges: 1, dst sges: 4", print
>> "sg-src[1]-dst[4]".
>>> In future, when we add fill test in benchmark, this line instead be
>>> "pattern-
>> 0x12345678".
>>>
>>> Is my understanding correct over here ?
>>
>> Yes, some like this.
>>
> This patch adds SGE info in an alignment with existing output.
>
> I think it is better to add further extensions as we add new features. Since
> the app doesn't support the features that you mentioned, it is difficult to
> anticipate the requirements.
> In fact, if the additional frameworks that we put in are not useful for those
> features, it could lead to stale code.
> I would prefer if we can make these changes as we add new features.
>
>>>
> <cut>
>>>>> }
>>>>>
>>>>> while (1) {
>>>>> @@ -541,13 +702,53 @@ mem_copy_benchmark(struct test_configure
>>>> *cfg, bool is_dma)
>>>>>
>>>>> rte_eal_mp_wait_lcore();
>>>>>
>>>>> - for (i = 0; i < (nr_buf / nb_workers) * nb_workers; i++) {
>>>>> - if (memcmp(rte_pktmbuf_mtod(srcs[i], void *),
>>>>> - rte_pktmbuf_mtod(dsts[i], void *),
>>>>> - cfg->buf_size.cur) != 0) {
>>>>> - printf("Copy validation fails for buffer number %d\n",
>>>> i);
>>>>> - ret = -1;
>>>>> - goto out;
>>>>> + if (!cfg->is_sg && cfg->transfer_dir == RTE_DMA_DIR_MEM_TO_MEM)
>>>> {
>>>>> + for (i = 0; i < (nr_buf / nb_workers) * nb_workers; i++) {
>>>>> + if (memcmp(rte_pktmbuf_mtod(srcs[i], void *),
>>>>> + rte_pktmbuf_mtod(dsts[i], void *),
>>>>> + cfg->buf_size.cur) != 0) {
>>>>> + printf("Copy validation fails for buffer number
>>>> %d\n", i);
>>>>> + ret = -1;
>>>>> + goto out;
>>>>> + }
>>>>> + }
>>>>> + } else if (cfg->is_sg && cfg->transfer_dir ==
>>>> RTE_DMA_DIR_MEM_TO_MEM) {
>>>>> + size_t src_remsz = buf_size % cfg->src_ptrs;
>>>>> + size_t dst_remsz = buf_size % cfg->dst_ptrs;
>>>>> + size_t src_sz = buf_size / cfg->src_ptrs;
>>>>> + size_t dst_sz = buf_size / cfg->dst_ptrs;
>>>>> + uint8_t src[buf_size], dst[buf_size];
>>>>> + uint8_t *sbuf, *dbuf, *ptr;
>>>>> +
>>>>> + for (i = 0; i < (nr_buf / RTE_MAX(cfg->src_ptrs,
>>>>> cfg->dst_ptrs));
>>>> i++) {
>>>>> + sbuf = src;
>>>>> + dbuf = dst;
>>>>> + ptr = NULL;
>>>>> +
>>>>> + for (j = 0; j < cfg->src_ptrs; j++) {
>>>>> + ptr = rte_pktmbuf_mtod(srcs[i * cfg->src_ptrs
>>>> + j], uint8_t *);
>>>>> + memcpy(sbuf, ptr, src_sz);
>>>>> + sbuf += src_sz;
>>>>> + }
>>>>> +
>>>>> + if (src_remsz)
>>>>> + memcpy(sbuf, ptr + src_sz, src_remsz);
>>>>> +
>>>>> + for (j = 0; j < cfg->dst_ptrs; j++) {
>>>>> + ptr = rte_pktmbuf_mtod(dsts[i * cfg->dst_ptrs
>>>> + j], uint8_t *);
>>>>> + memcpy(dbuf, ptr, dst_sz);
>>>>> + dbuf += dst_sz;
>>>>> + }
>>>>> +
>>>>> + if (dst_remsz)
>>>>> + memcpy(dbuf, ptr + dst_sz, dst_remsz);
>>>>> +
>>>>> + if (memcmp(src, dst, buf_size) != 0) {
>>>>> + printf("SG Copy validation fails for buffer
>>>> number %d\n",
>>>>> + i * cfg->src_ptrs);
>>>>> + ret = -1;
>>>>> + goto out;
>>>>> + }
>>>>
>>>> Now I doubt the value of verify, this verify can't find the middle
>>>> round copy failure, because as long as the last round copy is
>>>> successful, the validation will pass.
>>>>
>>> Validation is on entire buffer. If any middle copy is a failure,
>>> entire memcmp would have failed. Or do I miss something ?
>>>
>>>> And adding validatation in every round copy will impact performance.
>>>>
>>> This validation is just after worker function is stopped measuring perf.
>>> How would this impact performance ?
>>
>> Yes, it will don't impact performance.
>>
>> What I said before is that is not valid, pls consider following scene:
>>
>>
>> while (1) {
>> for (i = 0; i < nr_buf; i++) { // this for loop will copy all
>> nr_bufs,
>> let's defind this is a round copy.
>> dma_copy:
>> ret = rte_dma_copy(dev_id, 0,
>> rte_mbuf_data_iova(srcs[i]),
>> rte_mbuf_data_iova(dsts[i]), buf_size, 0);
>> if (unlikely(ret < 0)) {
>> if (ret == -ENOSPC) {
>> do_dma_submit_and_poll(dev_id,
>> &async_cnt, worker_info);
>> goto dma_copy;
>> } else
>> error_exit(dev_id);
>> }
>> async_cnt++;
>>
>> if ((async_cnt % kick_batch) == 0)
>> do_dma_submit_and_poll(dev_id,
>> &async_cnt, worker_info);
>> }
>>
>> if (worker_info->stop_flag) // if don't stop, it will do many
>> round copies.
>> break;
>> }
>>
>> and the later validation just verify the last round, let's assume there are
>> 100
>> round, and if the last round copy work well, but round 0~98 both copy fail,
>> then the validation will not detect it.
>>
>>
>> So if we want do all the validation, then we should add the velidation after
>> every round copy, but it will impact the performance.
>>
>>
>>>
>>>> Also app/test_dmadev already verify data. so I think we should drop
>>>> the validation commit.
>>>
>>> Even in some corner cases or unknown issues, copy would have failed
>>> and taking perf cycles then is meaningless. That is the reason, this
>>> validation is added after perf function doing its job.
>>
>> How about:
>>
>> while (1) {
>> for (i = 0; i < nr_buf; i++) { // this for loop will copy all
>> nr_bufs,
>> let's defind this is a round copy.
>> dma_copy:
>> ret = rte_dma_copy(dev_id, 0,
>> rte_mbuf_data_iova(srcs[i]),
>> rte_mbuf_data_iova(dsts[i]), buf_size, 0);
>> if (unlikely(ret < 0)) {
>> if (ret == -ENOSPC) {
>> do_dma_submit_and_poll(dev_id,
>> &async_cnt, worker_info);
>> goto dma_copy;
>> } else
>> error_exit(dev_id);
>> }
>> async_cnt++;
>>
>> if ((async_cnt % kick_batch) == 0)
>> do_dma_submit_and_poll(dev_id,
>> &async_cnt, worker_info);
>> }
>>
>> if (unlikely(work_info->verify)) {
>> ret = verify();
>> if (ret != 0) {
>> // error trace,
>> break;
>> }
>> }
>>
>> if (worker_info->stop_flag) // if don't stop, it will do many
>> round copies.
>> break;
>> }
>>
>> and make this verify as a config entry
>
> I believe there is a difference in understanding of what this is intended to
> do. Intention here is not to validate every operation done by DMA, and that
> is already taken care by UT.
>
> Is it possible that we are we misreporting numbers if the application is
> buggy or PMD is misbehaving for the scenario under test and the copies are
> not actually performed? Yes. Think about a scenario where PMD is buggy when
> trying bursts of more than 1.
>
> Checking last set of buffers is more like testing a sample from the perf test
> to make sure perf test was indeed performing what it is claiming to do. If
> you think it is unnecessary to do so, we can drop this from upstream. But
> adding complete verification in performance app would be repeating what a
> unit test is expected to do. I would suggest not to do that.
I think this commit is mainly test the dma-perf itself.
OK with continue this commit.
Thanks
>
> Thanks,
> Gowrishankar
>