Hi Pierre,

On 11/09/2016 01:42 PM, Pierre Pfister (ppfister) wrote:
> Hello Maxime,
>
> Sorry for the late reply.
>
>
>> On 8 Nov 2016, at 10:44, Maxime Coquelin <maxime.coquelin at redhat.com> wrote:
>>
>> Hi Pierre,
>>
>> On 11/08/2016 10:31 AM, Pierre Pfister (ppfister) wrote:
>>> Current virtio driver advertises VERSION_1 support,
>>> but does not handle device's VERSION_1 support when
>>> sending packets (it looks for ANY_LAYOUT feature,
>>> which is absent).
>>>
>>> This patch enables 'can_push' in tx path when VERSION_1
>>> is advertised by the device.
>>>
>>> This significantly improves small packets forwarding rate
>>> towards devices advertising VERSION_1 feature.
>> I think it depends on whether offloading is enabled or not.
>> If no offloading is enabled, I measured a significant drop.
>> Indeed, when no offloading is enabled, the Tx path in Virtio
>> does not access the virtio header before your patch, as the header is memset 
>> to zero at device init time.
>> With your patch, it gets memset to zero at every transmit in the hot
>> path.
>
> Right. On the virtio side that is true, but on the device side, we have to 
> access the header anyway.
Not anymore, if no offload features have been negotiated.
I made a patch, which landed in v16.11, that skips header parsing in
this case.
That said, we still have to access its descriptor.
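
To illustrate what I mean, here is a rough sketch of the idea only, not the
actual lib/librte_vhost code (the has_offload flag is made up, and the helper
name is from memory):

/*
 * Sketch: on the vhost dequeue side, when no offload feature (csum/TSO)
 * has been negotiated, the virtio-net header fields can simply be
 * ignored, so the header bytes are never read. The descriptor covering
 * the header still has to be fetched either way.
 */
static inline void
parse_hdr_if_needed(struct virtio_net_hdr *hdr, struct rte_mbuf *m,
		    int has_offload /* made-up flag */)
{
	if (!has_offload)
		return;		/* header bytes are never read */

	/* Otherwise translate the csum/gso fields into the mbuf, as the
	 * existing vhost_dequeue_offload() helper does. */
	vhost_dequeue_offload(hdr, m);
}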

> And accessing two descriptors (with the address resolution and memory fetch
> which come with it) is a costly operation compared to a single one.
> In the case indirect descriptors are used, this is 1 desc access instead of 3.
I agree this is far from being optimal.
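
For readers following the thread, what enabling can_push changes in the Tx
path is roughly the following. This is a simplified sketch based on
drivers/net/virtio/virtio_rxtx.c, not the exact diff, and it assumes the
PMD-internal virtio_pci.h helpers plus <rte_mbuf.h>:

/*
 * With ANY_LAYOUT -- or VERSION_1, which implies it -- the virtio-net
 * header may live in the mbuf headroom, so header + data fit in a
 * single descriptor instead of two (or three when indirect descriptors
 * are not used).
 */
static inline int
tx_can_push(struct virtio_hw *hw, struct rte_mbuf *txm)
{
	return rte_mbuf_refcnt_read(txm) == 1 &&
	       RTE_MBUF_DIRECT(txm) &&
	       txm->nb_segs == 1 &&
	       rte_pktmbuf_headroom(txm) >= hw->vtnet_hdr_size &&
	       (vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
	        vtpci_with_feature(hw, VIRTIO_F_VERSION_1)); /* <- the patch */
}

static inline void
tx_prepend_hdr(struct virtio_hw *hw, struct rte_mbuf *txm)
{
	/* Header and data end up in one contiguous buffer. */
	struct virtio_net_hdr *hdr = (struct virtio_net_hdr *)
		rte_pktmbuf_prepend(txm, hw->vtnet_hdr_size);

	/* This is the hot-path cost I mentioned above: with no offloads
	 * negotiated the header is all zeroes anyway, but it now gets
	 * cleared for every single packet. */
	memset(hdr, 0, hw->vtnet_hdr_size);

	/* csum/gso fields would be filled here when offloads are on */
}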

> And compared to the case where chained descriptors are used, this doubles the
> number of packets that you can put in your queue.
>
> Those are the results in my PHY -> VM (testpmd) -> PHY setup
> Traffic is flowing bidirectionally. Numbers are for lossless-rates.
>
> When chained buffers are used for dpdk's TX: 2x2.13Mpps
> When indirect descriptors are used for dpdk's TX: 2x2.38Mpps
> When shallow buffers are used for dpdk's TX (with this patch): 2x2.42Mpps
When I tried it, I also ran a PVP 0% loss benchmark, and I got the opposite
results: the chained and indirect cases were significantly better.

My PVP setup used a single NIC and a single Virtio PMD: NIC-to-VM forwarding
was done in IO mode with testpmd on the host, and Rx->Tx forwarding used
macswap mode on the guest side.

I also saw some performance regression when running a simple testpmd test on
both ends.

Yuanhan, did you run any benchmarks with your series enabling ANY_LAYOUT?

>
> I must also note that qemu 2.5 does not seem to deal with VERSION_1 and 
> ANY_LAYOUT correctly.
> The patch I am proposing here works with qemu 2.7, but with qemu 2.5, testpmd
> still behaves as if ANY_LAYOUT (or VERSION_1) were not available. This is not
> catastrophic, but note that you will not see the performance improvement in
> some cases with qemu 2.5.

Thanks for the info.

Regards,
Maxime
