Hi Pierre,

On 11/22/2016 10:54 AM, Pierre Pfister (ppfister) wrote:
> Hello Maxime,
>
>> On 9 Nov 2016, at 15:51, Maxime Coquelin <maxime.coquelin at redhat.com> wrote:
>>
>> Hi Pierre,
>>
>> On 11/09/2016 01:42 PM, Pierre Pfister (ppfister) wrote:
>>> Hello Maxime,
>>>
>>> Sorry for the late reply.
>>>
>>>
>>>> On 8 Nov 2016, at 10:44, Maxime Coquelin <maxime.coquelin at redhat.com> wrote:
>>>>
>>>> Hi Pierre,
>>>>
>>>> On 11/08/2016 10:31 AM, Pierre Pfister (ppfister) wrote:
>>>>> The current virtio driver advertises VERSION_1 support,
>>>>> but does not handle the device's VERSION_1 support when
>>>>> sending packets (it looks for the ANY_LAYOUT feature,
>>>>> which is absent).
>>>>>
>>>>> This patch enables 'can_push' in the tx path when VERSION_1
>>>>> is advertised by the device.
>>>>>
>>>>> This significantly improves the small-packet forwarding rate
>>>>> towards devices advertising the VERSION_1 feature.
>>>> I think it depends on whether offloading is enabled or not.
>>>> If no offloading is enabled, I measured a significant drop.
>>>> Indeed, when no offloading is enabled, the Tx path in Virtio
>>>> does not access the virtio header before your patch, as the header is
>>>> memset to zero at device init time.
>>>> With your patch, it gets memset to zero at every transmit in the hot
>>>> path.
>>>
>>> Right. On the virtio side that is true, but on the device side, we have to
>>> access the header anyway.
>> Not any more, if no offload features have been negotiated.
>> I have done a patch that landed in v16.11 to skip header parsing in
>> this case.
>> That said, we still have to access its descriptor.
>>
>>> And accessing two descriptors (with the address resolution and memory
>>> fetch which come with it) is a costly operation compared to a single one.
>>> In the case indirect descriptors are used, this is 1 desc access instead
>>> of 3.
>> I agree this is far from being optimal.
>>
>>> And in the case chained descriptors are used, this doubles the number of
>>> packets that you can put in your queue.
>>>
>>> Those are the results in my PHY -> VM (testpmd) -> PHY setup.
>>> Traffic is flowing bidirectionally. Numbers are for lossless rates.
>>>
>>> When chained buffers are used for dpdk's TX: 2x2.13Mpps
>>> When indirect descriptors are used for dpdk's TX: 2x2.38Mpps
>>> When shallow buffers are used for dpdk's TX (with this patch): 2x2.42Mpps
>> When I tried it, I also did a PVP 0% loss benchmark, and I got the opposite
>> results: the chained and indirect cases were significantly better.
>>
>> My PVP setup was using a single NIC and a single Virtio PMD; NIC2VM
>> forwarding was done in IO mode with testpmd on the host, and Rx->Tx
>> forwarding was done in macswap mode on the guest side.
>>
>> I also saw some perf regression when running a simple testpmd test on
>> both ends.
>>
>> Yuanhan, did you run some benchmarks with your series enabling
>> ANY_LAYOUT?
>
> It was enabled. But the spec specifies that VERSION_1 includes ANY_LAYOUT.
> Therefore, QEMU removes ANY_LAYOUT when VERSION_1 is set.
>
> We can keep arguing about which is fastest. I guess we have different
> setups and different results, so we are probably deadlocked here.
> But in any case, the current code is inconsistent, as it uses a single
> descriptor when ANY_LAYOUT is set, but not when VERSION_1 is set.
>
> I believe it makes sense to use a single descriptor any time it is
> possible, but you are free to think otherwise.
> Please make a call and make the code consistent (remove single
> descriptors altogether, or use them when VERSION_1 is set too).
> Otherwise it just creates yet another testing headache.

I also think it makes sense to use a single descriptor, but I had to
highlight that I measured a significant performance degradation with
single descriptors on my setup.
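For the record, the consistent check we are discussing would look
roughly like the sketch below. This is illustrative only, not the
actual PMD code: 'use_single_desc' is a hypothetical name, while
vtpci_with_feature(), the feature bits and vtnet_hdr_size follow the
virtio PMD's conventions. It also shows the per-packet header write in
the single-descriptor path, which is the hot-path cost I mentioned
above:

    #include <string.h>     /* memset */
    #include <rte_mbuf.h>   /* rte_pktmbuf_headroom/prepend */

    /* Single-descriptor Tx is only valid when the device accepts any
     * descriptor layout: either ANY_LAYOUT is negotiated, or VERSION_1
     * is negotiated (which implies ANY_LAYOUT per the virtio 1.0 spec).
     * The mbuf must also have enough headroom for the header. */
    int use_single_desc =
            (vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
             vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) &&
            rte_pktmbuf_headroom(txm) >= hw->vtnet_hdr_size;

    if (use_single_desc) {
            /* Header and data share one descriptor: the virtio-net
             * header is written into the mbuf headroom, so it gets
             * zeroed for every packet even when no offload is used. */
            struct virtio_net_hdr *hdr = (struct virtio_net_hdr *)
                    rte_pktmbuf_prepend(txm, hw->vtnet_hdr_size);
            memset(hdr, 0, hw->vtnet_hdr_size);
    }

With something like this, VERSION_1-only devices (e.g. QEMU 2.7 with
virtio 1.0) would take the same single-descriptor path as ANY_LAYOUT
devices.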
I'm fine with taking your patch into virtio-next, so that more testing
can be conducted.

Thanks,
Maxime

>
> Thanks,
>
> - Pierre
>
>>
>>>
>>> I must also note that qemu 2.5 does not seem to deal with VERSION_1
>>> and ANY_LAYOUT correctly.
>>> The patch I am proposing here works with qemu 2.7, but with qemu 2.5,
>>> testpmd still behaves as if ANY_LAYOUT (or VERSION_1) was not
>>> available.
>>> This is not catastrophic. But just note that you will not see the
>>> performance gain in some cases with qemu 2.5.
>>
>> Thanks for the info.
>>
>> Regards,
>> Maxime
>