Hi Pierre,

On 11/09/2016 01:42 PM, Pierre Pfister (ppfister) wrote:
> Hello Maxime,
>
> Sorry for the late reply.
>
>
>> Le 8 nov. 2016 à 10:44, Maxime Coquelin <maxime.coquelin at redhat.com> a
>> écrit :
>>
>> Hi Pierre,
>>
>> On 11/08/2016 10:31 AM, Pierre Pfister (ppfister) wrote:
>>> Current virtio driver advertises VERSION_1 support,
>>> but does not handle device's VERSION_1 support when
>>> sending packets (it looks for ANY_LAYOUT feature,
>>> which is absent).
>>>
>>> This patch enables 'can_push' in tx path when VERSION_1
>>> is advertised by the device.
>>>
>>> This significantly improves small packets forwarding rate
>>> towards devices advertising VERSION_1 feature.
>> I think it depends on whether offloading is enabled or not.
>> If no offloading is enabled, I measured a significant drop.
>> Indeed, when no offloading is enabled, the Tx path in Virtio
>> does not access the virtio header before your patch, as the header is memset
>> to zero at device init time.
>> With your patch, it gets memset to zero at every transmit in the hot
>> path.
>
> Right. On the virtio side that is true, but on the device side, we have to
> access the header anyway.

Not anymore, if no offload features have been negotiated: I did a patch,
which landed in v16.11, that skips the header parsing in this case.

That said, we still have to access its descriptor.
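To make the can_push trade-off concrete, here is a minimal, self-contained C
sketch. It is not the virtio PMD code; apart from the virtio feature-bit
values, all names and the layout are hypothetical. It only illustrates the
point under discussion: with ANY_LAYOUT or VERSION_1 negotiated, the header
can be pushed in front of the packet data (one descriptor per packet, but a
memset in the hot path); without it, the header lives in a separate region
zeroed once at init (no per-packet memset, but an extra descriptor per packet).

/* Hypothetical sketch of the Tx-path trade-off discussed above.
 * Names and layout are illustrative, not the DPDK virtio PMD. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define VIRTIO_F_ANY_LAYOUT  (1ULL << 27)
#define VIRTIO_F_VERSION_1   (1ULL << 32)

struct virtio_net_hdr {        /* simplified virtio-net header */
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* can_push: the header may be placed right in front of the packet data,
 * so a single descriptor covers header + data. */
static int can_push(uint64_t feat, unsigned int headroom)
{
    return (feat & (VIRTIO_F_ANY_LAYOUT | VIRTIO_F_VERSION_1)) &&
           headroom >= sizeof(struct virtio_net_hdr);
}

static void tx_one(uint64_t feat, uint8_t *headroom, unsigned int hr_len,
                   struct virtio_net_hdr *static_hdr /* zeroed once at init */)
{
    if (can_push(feat, hr_len)) {
        /* Hot-path cost: the header is zeroed for every packet... */
        struct virtio_net_hdr *hdr =
            (struct virtio_net_hdr *)(headroom + hr_len - sizeof(*hdr));
        memset(hdr, 0, sizeof(*hdr));
        /* ...but only one descriptor is needed for header + data. */
    } else {
        /* Header area was memset once at init; nothing to write per packet,
         * but header and data need two descriptors (or an indirect table). */
        (void)static_hdr;
    }
}

int main(void)
{
    uint8_t headroom[128];
    struct virtio_net_hdr init_hdr;
    memset(&init_hdr, 0, sizeof(init_hdr));

    tx_one(VIRTIO_F_VERSION_1, headroom, sizeof(headroom), &init_hdr);
    printf("can_push with VERSION_1: %d\n",
           can_push(VIRTIO_F_VERSION_1, sizeof(headroom)));
    printf("can_push without ANY_LAYOUT/VERSION_1: %d\n",
           can_push(0, sizeof(headroom)));
    return 0;
}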
> And accessing two descriptors (with the address resolution and memory fetch
> which comes with it)
> is a costly operation compared to a single one.
> In the case indirect descriptors are used, this is 1 desc access instead of 3.

I agree this is far from being optimal.

> And in the case chained descriptors are used, this doubles the number of
> packets that you can put in your queue.
>
> Those are the results in my PHY -> VM (testpmd) -> PHY setup.
> Traffic is flowing bidirectionally. Numbers are for lossless rates.
>
> When chained buffers are used for dpdk's TX: 2x2.13Mpps
> When indirect descriptors are used for dpdk's TX: 2x2.38Mpps
> When shallow buffers are used for dpdk's TX (with this patch): 2x2.42Mpps

When I tried it, I also ran a PVP 0% loss benchmark, and I got the opposite
results: the chained and indirect cases were significantly better.
My PVP setup was using a single NIC and a single Virtio PMD. NIC-to-VM
forwarding was done in io mode with testpmd on the host, and Rx->Tx
forwarding was done in macswap mode on the guest side.

I also saw some performance regression when running a simple testpmd test on
both ends.

Yuanhan, did you run some benchmarks with your series enabling ANY_LAYOUT?

>
> I must also note that qemu 2.5 does not seem to deal with VERSION_1 and
> ANY_LAYOUT correctly.
> The patch I am proposing here works for qemu 2.7, but with qemu 2.5, testpmd
> still behaves as if ANY_LAYOUT (or VERSION_1) was not available. This is not
> catastrophic. But just note that you will not see performance in some cases
> with qemu 2.5.

Thanks for the info.

Regards,
Maxime
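As an aside on the descriptor-count argument quoted in this mail, here is a
small standalone C sketch of the arithmetic. It is illustrative only: the
virtqueue size of 256 is an assumption, and the per-packet counts describe the
generic chained / indirect / pushed-header layouts, not measurements.

/* Illustration of the descriptor-count argument:
 * chained descriptors use two ring slots per packet (header + data), so the
 * ring holds half as many packets; indirect descriptors use one slot but
 * still fetch three descriptor structures per packet (the ring slot plus the
 * two indirect-table entries); a pushed header uses one slot and one fetch. */
#include <stdio.h>

#define VQ_SIZE 256  /* assumed Tx virtqueue size, for the example only */

int main(void)
{
    /* ring slots consumed per packet */
    const int slots_chained  = 2; /* virtio-net header desc + data desc     */
    const int slots_indirect = 1; /* one slot pointing to an indirect table */
    const int slots_pushed   = 1; /* header pushed in front of the data     */

    /* descriptor structures touched per packet (address resolution + fetch) */
    const int fetch_chained  = 2;
    const int fetch_indirect = 3; /* ring slot + 2 indirect-table entries */
    const int fetch_pushed   = 1;

    printf("chained : %3d pkts in flight, %d desc fetches/pkt\n",
           VQ_SIZE / slots_chained, fetch_chained);
    printf("indirect: %3d pkts in flight, %d desc fetches/pkt\n",
           VQ_SIZE / slots_indirect, fetch_indirect);
    printf("pushed  : %3d pkts in flight, %d desc fetches/pkt\n",
           VQ_SIZE / slots_pushed, fetch_pushed);
    return 0;
}

In other words, chained descriptors halve how many packets the ring can hold,
while indirect descriptors keep the ring full at the cost of extra descriptor
accesses per packet; the pushed-header case avoids both, which is the trade-off
the lossless-rate numbers above are probing.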