> From: Wenwu Ma [mailto:wenwux...@intel.com]
> Sent: Monday, 29 August 2022 02.57
>
> Offloading small packets to DMA degrades throughput by 10%~20%,
> because DMA offloading is not free and DMA is not good at processing
> small packets. In addition, control plane packets are usually small,
> and assigning those packets to DMA will significantly increase
> latency, which may cause timeouts for packets such as TCP handshake
> packets. Therefore, this patch uses the CPU to perform small copies
> in vhost.
>
> Signed-off-by: Wenwu Ma <wenwux...@intel.com>
> ---
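
Just to confirm I read the intent correctly: the patch boils down to a size check along the lines of the sketch below. This is only my own illustration, not the actual vhost code; copy_one_seg() and the dma_id/vchan parameters are made-up placeholders, and only rte_memcpy() and the 256 byte threshold come from the patch itself.

#include <rte_memcpy.h>
#include <rte_dmadev.h>

#define CPU_COPY_THRESHOLD_LEN 256

/* Copy one segment, letting the CPU handle short copies and the DMA
 * engine handle long ones. Returns the dmadev ring index (>= 0) for a
 * DMA copy, 0 for a CPU copy, or a negative errno on enqueue failure. */
static inline int
copy_one_seg(int16_t dma_id, uint16_t vchan,
		void *dst, const void *src,
		rte_iova_t dst_iova, rte_iova_t src_iova, uint32_t len)
{
	if (len <= CPU_COPY_THRESHOLD_LEN) {
		/* Programming the DMA engine costs more than the copy
		 * itself for packets this small, so do it on the CPU. */
		rte_memcpy(dst, src, len);
		return 0;
	}
	/* Larger copies are enqueued; they are submitted and their
	 * completions polled elsewhere in the data path. */
	return rte_dma_copy(dma_id, vchan, src_iova, dst_iova, len, 0);
}
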
[...]
> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> index 35fa4670fd..cf796183a0 100644
> --- a/lib/vhost/virtio_net.c
> +++ b/lib/vhost/virtio_net.c
> @@ -26,6 +26,8 @@
>
>  #define MAX_BATCH_LEN 256
>
> +#define CPU_COPY_THRESHOLD_LEN 256

This threshold may not be optimal for all CPU architectures and/or DMA engines.

Could you please provide a test application to compare the performance of DMA copy with CPU rte_memcpy? The performance metric should be simple: how many cycles the CPU spends copying various packet sizes using each of the two methods.

You could provide test_dmadev_perf.c in addition to the existing test_dmadev.c. You can probably copy some of the concepts and code from test_memcpy_perf.c. Alternatively, you might be able to add DMA copy to test_memcpy_perf.c.

I'm sorry to push this on you - it should have been done as part of DMAdev development already.

-Morten
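
PS: A rough sketch of the kind of measurement loop I have in mind, not a finished test. It assumes dev_id refers to a dmadev that has already been configured and started with one vchan, and that the buffers come from rte_malloc() so they are DMA-able; the function and macro names are placeholders. Note that polling each copy to completion charges the full DMA latency to the CPU, which is the worst case; a real test could also batch enqueues to show the amortized cost. Error handling is omitted for brevity.

#include <stdio.h>
#include <stdbool.h>
#include <rte_cycles.h>
#include <rte_memcpy.h>
#include <rte_malloc.h>
#include <rte_dmadev.h>

#define ITERATIONS 10000

static void
measure_one_size(int16_t dev_id, uint32_t len)
{
	void *src = rte_malloc(NULL, len, RTE_CACHE_LINE_SIZE);
	void *dst = rte_malloc(NULL, len, RTE_CACHE_LINE_SIZE);
	rte_iova_t src_iova = rte_malloc_virt2iova(src);
	rte_iova_t dst_iova = rte_malloc_virt2iova(dst);
	uint64_t start, cpu_cycles, dma_cycles;
	uint16_t completed, last_idx;
	bool error;
	int i;

	/* CPU copy: cycles spent in rte_memcpy(). */
	start = rte_rdtsc_precise();
	for (i = 0; i < ITERATIONS; i++)
		rte_memcpy(dst, src, len);
	cpu_cycles = rte_rdtsc_precise() - start;

	/* DMA copy: cycles the CPU spends enqueuing, submitting and
	 * polling one copy at a time until it completes. */
	start = rte_rdtsc_precise();
	for (i = 0; i < ITERATIONS; i++) {
		rte_dma_copy(dev_id, 0, src_iova, dst_iova, len,
				RTE_DMA_OP_FLAG_SUBMIT);
		do {
			completed = rte_dma_completed(dev_id, 0, 1,
					&last_idx, &error);
		} while (completed == 0 && !error);
	}
	dma_cycles = rte_rdtsc_precise() - start;

	printf("len %u: CPU %.1f cycles/copy, DMA %.1f cycles/copy\n",
			len, (double)cpu_cycles / ITERATIONS,
			(double)dma_cycles / ITERATIONS);

	rte_free(src);
	rte_free(dst);
}

Running this over a range of lengths (e.g. 64, 128, 256, 512, 1024, 1500 bytes) should show where the crossover point lies on a given platform, which is exactly what is needed to justify (or tune) CPU_COPY_THRESHOLD_LEN.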