> From: Wenwu Ma [mailto:wenwux...@intel.com]
> Sent: Monday, 29 August 2022 02.57
> 
> Offloading small packets to DMA degrades throughput by 10%~20%,
> because DMA offloading is not free and DMA is not good at
> processing small packets. In addition, control plane packets are
> usually small, and assigning them to DMA significantly increases
> latency, which may cause timeouts for packets such as TCP
> handshake packets. Therefore, this patch uses the CPU to perform
> small copies in vhost.
> 
> Signed-off-by: Wenwu Ma <wenwux...@intel.com>
> ---

[...]

> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> index 35fa4670fd..cf796183a0 100644
> --- a/lib/vhost/virtio_net.c
> +++ b/lib/vhost/virtio_net.c
> @@ -26,6 +26,8 @@
> 
>  #define MAX_BATCH_LEN 256
> 
> +#define CPU_COPY_THRESHOLD_LEN 256

This threshold may not be optimal for all CPU architectures and/or DMA engines.
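For reference, the kind of length-based dispatch this constant implies would
look roughly like the sketch below. This is my own illustration, not the
patch itself: the helper name, the dma_id/vchan parameters and the separate
virtual/IOVA addresses are assumptions.

#include <rte_memcpy.h>
#include <rte_dmadev.h>

#define CPU_COPY_THRESHOLD_LEN 256	/* the tuning point questioned above */

/* Hypothetical helper: copy small payloads on the CPU, enqueue the rest
 * on the DMA engine. Completions for the DMA path are polled elsewhere.
 */
static inline int
copy_dispatch(int16_t dma_id, uint16_t vchan,
	      void *dst, rte_iova_t dst_iova,
	      const void *src, rte_iova_t src_iova,
	      uint32_t len)
{
	if (len <= CPU_COPY_THRESHOLD_LEN) {
		/* Small copy: cheaper on the CPU than ring + doorbell + completion. */
		rte_memcpy(dst, src, len);
		return 0;
	}

	/* Large copy: hand it to the DMA engine. */
	return rte_dma_copy(dma_id, vchan, src_iova, dst_iova, len, 0);
}

Whether 256 is the right crossover point is exactly what the measurement
below should tell us.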

Could you please provide a test application to compare the performance of DMA 
copy with CPU rte_memcpy?

The performance metric should be simple: How many cycles does the CPU spend 
copying various packet sizes using each of the two methods?

You could provide test_dmadev_perf.c in addition to the existing test_dmadev.c.
You can probably copy some of the concepts and code from test_memcpy_perf.c.
Alternatively, you might be able to add DMA copy to test_memcpy_perf.c.
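Something along the lines of the following rough sketch would do. It assumes
an already-configured dmadev identified by dev_id/vchan and DMA-capable
buffers; error handling, warm-up and buffer setup are omitted, so it is not
a full test_dmadev_perf.c.

#include <stdio.h>
#include <stdbool.h>
#include <rte_cycles.h>
#include <rte_memcpy.h>
#include <rte_dmadev.h>

static void
copy_cost(int16_t dev_id, uint16_t vchan,
	  void *dst, void *src, rte_iova_t dst_iova, rte_iova_t src_iova,
	  uint32_t len, unsigned int iters)
{
	uint64_t start, cpu_cycles, dma_cycles;
	unsigned int i;
	uint16_t idx;
	bool err;

	/* CPU path: plain rte_memcpy. */
	start = rte_rdtsc_precise();
	for (i = 0; i < iters; i++)
		rte_memcpy(dst, src, len);
	cpu_cycles = rte_rdtsc_precise() - start;

	/* DMA path: enqueue, submit, and busy-wait for the completion, so the
	 * whole CPU cost of driving the DMA engine is counted.
	 */
	start = rte_rdtsc_precise();
	for (i = 0; i < iters; i++) {
		rte_dma_copy(dev_id, vchan, src_iova, dst_iova, len, 0);
		rte_dma_submit(dev_id, vchan);
		while (rte_dma_completed(dev_id, vchan, 1, &idx, &err) == 0)
			;
	}
	dma_cycles = rte_rdtsc_precise() - start;

	printf("len %u: CPU %.1f cycles/copy, DMA %.1f cycles/copy\n",
	       len, (double)cpu_cycles / iters, (double)dma_cycles / iters);
}

Running this across a range of packet sizes on the relevant CPU/DMA engine
combinations would show where the crossover really is.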

I'm sorry to push this on you - it should have been done as part of DMAdev 
development already.

-Morten
