Thanks Yinan for reporting the regression and Gavin for the analysis.

On 9/10/19 11:48 AM, Gavin Hu (Arm Technology China) wrote:
> Hi Yinan,
> 
> We have done a comparative analysis and found that, with the old code, the
> if (weak_barriers) and else branches were optimized away on x86, as
> rte_smp_wmb and rte_cio_wmb are identical there.
> http://git.dpdk.org/dpdk/tree/drivers/net/virtio/virtqueue.h#n49
> With the new code, i.e. with Joyce's patches applied, the branches are no
> longer optimized away, which requires additional CPU cycles and causes the
> slight degradation on x86.
> 
> The patches improved performance on aarch64 by about 9%, as indicated in the
> cover letter. While I am thinking over a solution to the degradation on
> x86, could you help answer:
> 1. Is rte_cio_wmb sufficient for the non-weak-barrier case (HW offloading)?
>  I ask because in the Intel NIC PMDs it is almost never used; rte_wmb is
> more widely used to notify the NIC device. Is there any difference between
> a virtio-ring-compatible smartNIC device (or vDPA?) and i40e-like devices?
> 2. If rte_cio_wmb is not sufficient for this case and is replaced by a
> stronger barrier, like sfence, then the compiler will not be able to
> optimize the branches away, and the problem becomes one of correct barrier
> use rather than of the degradation.
> 
> Any comments are welcome!

It may be worth having Yinan try rte_wmb instead of rte_cio_wmb
without the series applied, just to confirm this is caused by the extra
branch.
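
To make the "extra branch" hypothesis concrete, here is a rough sketch of
the two shapes being compared. It is not the DPDK code: the sketch_* names
and the descriptor struct are hypothetical, and both write barriers are
modeled as the same C11 release fence only to mirror Gavin's observation
that they are identical on x86. Whether a given compiler actually folds the
first branch depends on the compiler and its flags.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor, not DPDK's struct vring_packed_desc. */
struct sketch_desc {
	_Atomic uint16_t flags;
};

/*
 * Stand-ins for rte_smp_wmb()/rte_cio_wmb(). Assumption: both are modeled
 * as the same release fence, mirroring the report that they are identical
 * on x86.
 */
static inline void sketch_smp_wmb(void)
{
	atomic_thread_fence(memory_order_release);
}

static inline void sketch_cio_wmb(void)
{
	atomic_thread_fence(memory_order_release);
}

/* Old shape: both arms are the same, so the compiler may drop the test. */
static inline void
sketch_store_flags_old(struct sketch_desc *d, uint16_t flags, int weak_barriers)
{
	if (weak_barriers)
		sketch_smp_wmb();
	else
		sketch_cio_wmb();
	atomic_store_explicit(&d->flags, flags, memory_order_relaxed);
}

/* New shape: the arms differ, so the test on weak_barriers may remain. */
static inline void
sketch_store_flags_new(struct sketch_desc *d, uint16_t flags, int weak_barriers)
{
	if (weak_barriers) {
		/* One-way barrier: store-release on the flags themselves. */
		atomic_store_explicit(&d->flags, flags, memory_order_release);
	} else {
		sketch_cio_wmb();
		atomic_store_explicit(&d->flags, flags, memory_order_relaxed);
	}
}

/* Both shapes publish the same value; returns the number of mismatches. */
static int
sketch_check(void)
{
	int mismatches = 0;

	for (int weak = 0; weak <= 1; weak++) {
		struct sketch_desc a = { 0 };
		struct sketch_desc b = { 0 };

		sketch_store_flags_old(&a, 0x80, weak);
		sketch_store_flags_new(&b, 0x80, weak);
		if (atomic_load_explicit(&a.flags, memory_order_relaxed) !=
		    atomic_load_explicit(&b.flags, memory_order_relaxed))
			mismatches++;
	}
	return mismatches;
}
```

Calling sketch_check() from a small main() only confirms the two shapes are
functionally equivalent; the interesting comparison is the generated
assembly for the two store helpers.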

Maxime

> Best Regards,
> Gavin
> 
>> -----Original Message-----
>> From: Wang, Yinan <yinan.w...@intel.com>
>> Sent: Tuesday, September 10, 2019 11:54 AM
>> To: Maxime Coquelin <maxime.coque...@redhat.com>; Joyce Kong (Arm
>> Technology China) <joyce.k...@arm.com>; dev@dpdk.org
>> Cc: nd <n...@arm.com>; Bie, Tiwei <tiwei....@intel.com>; Wang, Zhihong
>> <zhihong.w...@intel.com>; amore...@redhat.com; Wang, Xiao W
>> <xiao.w.w...@intel.com>; Liu, Yong <yong....@intel.com>;
>> jfreim...@redhat.com; Honnappa Nagarahalli
>> <honnappa.nagaraha...@arm.com>; Gavin Hu (Arm Technology China)
>> <gavin...@arm.com>
>> Subject: RE: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
>> vring desc avail flags
>>
>>
>> Hi Joyce,
>>
>> I just tested the performance impact of your patch set on code base commit id
>> d03d8622db48918d14bfe805641b1766ecc40088. After applying your v3 patch
>> set, seven paths of the vhost/virtio PVP test show a performance drop, as below:
>>
>> PVP vhost/virtio 1c1q test          before patch   after patch
>> test_perf_pvp_inorder_mergeable     7.603          7.474
>> test_perf_pvp_inorder_no_mergeable  7.642          7.525
>> test_perf_pvp_mergeable             7.556          7.431
>> test_perf_pvp_normal                7.554          7.478
>> test_perf_pvp_vector_rx             7.581          7.469
>> test_perf_pvp_virtio11_mergeable    7.068          6.905
>> test_perf_pvp_virtio11_normal       7.088          6.888
>>
>> Thanks,
>> Yinan
>>
>>> -----Original Message-----
>>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime Coquelin
>>> Sent: September 9, 2019 18:10
>>> To: Joyce Kong <joyce.k...@arm.com>; dev@dpdk.org
>>> Cc: n...@arm.com; Bie, Tiwei <tiwei....@intel.com>; Wang, Zhihong
>>> <zhihong.w...@intel.com>; amore...@redhat.com; Wang, Xiao W
>>> <xiao.w.w...@intel.com>; Liu, Yong <yong....@intel.com>;
>>> jfreim...@redhat.com; honnappa.nagaraha...@arm.com; gavin...@arm.com
>>> Subject: Re: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
>>> vring desc avail flags
>>>
>>>
>>>
>>> On 9/9/19 11:14 AM, Joyce Kong wrote:
>>>> In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, the
>>>> frontend and backend are assumed to be implemented in software, that
>>>> is, they can run on identical CPUs in an SMP configuration.
>>>> Thus a weak form of memory barriers like rte_smp_r/wmb, rather than
>>>> rte_cio_r/wmb, is sufficient for this case (vq->hw->weak_barriers == 1)
>>>> and yields better performance.
>>>> For the above case, this patch helps yield even better performance
>>>> by replacing the two-way barriers with C11 one-way barriers for avail
>>>> flags in packed ring.
>>>>
>>>> Meanwhile, a read barrier is required to ensure ordering between
>>>> the reads of a descriptor's flags and of its content[1]. With C11,
>>>> load-acquire can enforce this ordering instead of an rmb barrier.
>>>>
>>>> [1]https://patchwork.dpdk.org/patch/49109/
>>>>
>>>> Signed-off-by: Joyce Kong <joyce.k...@arm.com>
>>>> Reviewed-by: Gavin Hu <gavin...@arm.com>
>>>> Reviewed-by: Phil Yang <phil.y...@arm.com>
>>>> ---
>>>>  drivers/net/virtio/virtio_rxtx.c                 | 13 +++++++------
>>>>  drivers/net/virtio/virtio_user/virtio_user_dev.c |  6 +++++-
>>>>  drivers/net/virtio/virtqueue.h                   | 11 +++++++++++
>>>>  lib/librte_vhost/vhost.h                         |  2 +-
>>>>  lib/librte_vhost/virtio_net.c                    | 11 +++++------
>>>>  5 files changed, 29 insertions(+), 14 deletions(-)
>>>
>>> Reviewed-by: Maxime Coquelin <maxime.coque...@redhat.com>
>>>
>>> Thanks,
>>> Maxime
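
For readers following the thread, the one-way barrier idea from the commit
message quoted above can be sketched roughly as below. This is only an
illustration under stated assumptions, not the patch itself: the
sketch_packed_desc layout, the SKETCH_F_AVAIL bit and both helpers are
hypothetical, and C11 <stdatomic.h> is used in place of the DPDK macros.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical packed-ring style descriptor, for illustration only. */
struct sketch_packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	_Atomic uint16_t flags;
};

/* Hypothetical "available" bit; the value is an assumption of this sketch. */
#define SKETCH_F_AVAIL ((uint16_t)(1u << 7))

/*
 * Producer side: fill in the content first, then publish the flags with a
 * store-release. The release keeps the content stores from being reordered
 * after the flags store, so no separate write barrier is needed.
 */
static inline void
sketch_publish(struct sketch_packed_desc *d, uint64_t addr, uint32_t len,
	       uint16_t id)
{
	d->addr = addr;
	d->len = len;
	d->id = id;
	atomic_store_explicit(&d->flags, SKETCH_F_AVAIL, memory_order_release);
}

/*
 * Consumer side: read the flags with a load-acquire, then the content.
 * The acquire keeps the content loads from being reordered before the
 * flags load, replacing an explicit read barrier.
 * Returns 1 if a descriptor was available, 0 otherwise.
 */
static inline int
sketch_consume(struct sketch_packed_desc *d, uint64_t *addr, uint32_t *len)
{
	uint16_t flags = atomic_load_explicit(&d->flags, memory_order_acquire);

	if (!(flags & SKETCH_F_AVAIL))
		return 0;
	*addr = d->addr;
	*len = d->len;
	return 1;
}
```

The acquire/release pair only orders accesses between CPU threads, which is
why it fits the weak_barriers == 1 case discussed above and says nothing
about the hardware-offload case.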
