Thanks Marvin, my comments are inline.

> -----Original Message-----
> From: Liu, Yong <yong....@intel.com>
> Sent: Wednesday, September 11, 2019 2:30 PM
> To: Gavin Hu (Arm Technology China) <gavin...@arm.com>; Wang, Yinan
> <yinan.w...@intel.com>; Maxime Coquelin <maxime.coque...@redhat.com>;
> Joyce Kong (Arm Technology China) <joyce.k...@arm.com>; dev@dpdk.org
> Cc: nd <n...@arm.com>; Bie, Tiwei <tiwei....@intel.com>; Wang, Zhihong
> <zhihong.w...@intel.com>; amore...@redhat.com; Wang, Xiao W
> <xiao.w.w...@intel.com>; jfreim...@redhat.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; Steve Capper <steve.cap...@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
> vring desc avail flags
>
> Thanks Gavin, my answers are inline.
>
> > -----Original Message-----
> > From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> > Sent: Wednesday, September 11, 2019 11:35 AM
> > To: Liu, Yong <yong....@intel.com>; Wang, Yinan <yinan.w...@intel.com>;
> > Maxime Coquelin <maxime.coque...@redhat.com>; Joyce Kong (Arm Technology
> > China) <joyce.k...@arm.com>; dev@dpdk.org
> > Cc: nd <n...@arm.com>; Bie, Tiwei <tiwei....@intel.com>; Wang, Zhihong
> > <zhihong.w...@intel.com>; amore...@redhat.com; Wang, Xiao W
> > <xiao.w.w...@intel.com>; jfreim...@redhat.com; Honnappa Nagarahalli
> > <honnappa.nagaraha...@arm.com>; Steve Capper <steve.cap...@arm.com>
> > Subject: RE: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
> > vring desc avail flags
> >
> > Hi Marvin,
> >
> > Thanks for your answers, one more question for x86:
> > 1. For CIO memory alone or MMIO memory(eg PCI BAR) alone, the compiler
> > barrier is enough to keep ordering, that's why both rte_io_mb and
> > rte_cio_mb are defined as compiler barriers, right?
>
> Yes, that's right for x86.
>
> > 2. How about the ordering of interleaved CIO and MMIO accesses, for example,
> > a young store to MMIO can be reordered before an older store to CIO? CIO
> > may be faster than devices, but store buffers or caching may cause the CIO
> > update not visible to the device(in a common doorbell case)?
> >
>
> There's always one kind of cache coherent engine in x86 uncore sub-system.
> When CIO write instruction was retired, data will be in CPU LLC.
> When device doing inbound read, request will go to cache engine first and
> then check memory state and retrieve latest value.

If I understand correctly, the cache coherent engine works like a
hub/coordinator/arbiter for all accesses to the three types of memory
(1 - normal memory, 2 - CIO memory, 3 - MMIO memory), and their ordering
behaviors are no different?
Then in what scenarios should mfence/sfence/lfence be used? Maybe mfence
alone is enough to keep the ordering of store/load (which is the only one
that might be reordered on x86)?

>
> > Best regards,
> > Gavin
> >
> > > -----Original Message-----
> > > From: Liu, Yong <yong....@intel.com>
> > > Sent: Wednesday, September 11, 2019 10:39 AM
> > > To: Gavin Hu (Arm Technology China) <gavin...@arm.com>; Wang, Yinan
> > > <yinan.w...@intel.com>; Maxime Coquelin <maxime.coque...@redhat.com>;
> > > Joyce Kong (Arm Technology China) <joyce.k...@arm.com>; dev@dpdk.org
> > > Cc: nd <n...@arm.com>; Bie, Tiwei <tiwei....@intel.com>; Wang, Zhihong
> > > <zhihong.w...@intel.com>; amore...@redhat.com; Wang, Xiao W
> > > <xiao.w.w...@intel.com>; jfreim...@redhat.com; Honnappa Nagarahalli
> > > <honnappa.nagaraha...@arm.com>; Steve Capper <steve.cap...@arm.com>
> > > Subject: RE: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
> > > vring desc avail flags
> > >
> > > > -----Original Message-----
> > > > From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> > > > Sent: Tuesday, September 10, 2019 5:49 PM
> > > > To: Wang, Yinan <yinan.w...@intel.com>; Maxime Coquelin
> > > > <maxime.coque...@redhat.com>; Joyce Kong (Arm Technology China)
> > > > <joyce.k...@arm.com>; dev@dpdk.org
> > > > Cc: nd <n...@arm.com>; Bie, Tiwei <tiwei....@intel.com>; Wang, Zhihong
> > > > <zhihong.w...@intel.com>; amore...@redhat.com; Wang, Xiao W
> > > > <xiao.w.w...@intel.com>; Liu, Yong <yong....@intel.com>;
> > > > jfreim...@redhat.com; Honnappa Nagarahalli
> > > > <honnappa.nagaraha...@arm.com>; Steve Capper <steve.cap...@arm.com>
> > > > Subject: RE: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
> > > > vring desc avail flags
> > > >
> > > > Hi Yinan,
> > > >
> > > > We have done a comparative analysis and found with the old code the
> > > > if(weak_barriers) and else branches were saved on x86 as rte_smp_wmb and
> > > > rte_cio_wmb are identical.
> > > > http://git.dpdk.org/dpdk/tree/drivers/net/virtio/virtqueue.h#n49
> > > > For the new code, with Joyce's patches applied, the branches were not saved,
> > > > which require additional cpu cycles, this caused slight degradation on x86.
> > > >
> > > > The patches uplifted the performance on aarch64 about 9% as indicated in
> > > > the cover letter. While I am thinking over a solution to the degradation on
> > > > x86, could you help answer:
> > > > 1. Is rte_cio_wmb sufficient for the non weak-barrier case(HW
> > > > offloading)?
> > > > I got this question because I see in Intel NIC PMDs, it is almost never
> > > > used, it is rte_wmb that is more widely used to notify the NIC device, any
> > > > difference between the virtio ring compatible smartNIC device(or vDPA?) and
> > > > i40e like devices?
> > >
> > > Hi Gavin,
> > > X86 architecture can guarantee that young store happen later than old store.
> > > So rte_cio_wmb is just compiler memory barrier in x86.
> > >
> > > I think compiler barrier is also enough in pmd, rte_wmb is in pmd because of
> > > it was inherit from first implementation :)
> > >
> > > Thanks,
> > > Marvin
> > >
> > > > 2. If the rte_cio_wmb is not sufficient for this case and replaced by
> > > > stronger barriers, like sfence, then the branches will not be saved by the
> > > > compiler, then the problem becomes with the correct use of barriers, other
> > > > than the degradation.
> > > >
> > > > Any comments are welcome!
> > > >
> > > > Best Regards,
> > > > Gavin
> > > >
> > > > > -----Original Message-----
> > > > > From: Wang, Yinan <yinan.w...@intel.com>
> > > > > Sent: Tuesday, September 10, 2019 11:54 AM
> > > > > To: Maxime Coquelin <maxime.coque...@redhat.com>; Joyce Kong (Arm
> > > > > Technology China) <joyce.k...@arm.com>; dev@dpdk.org
> > > > > Cc: nd <n...@arm.com>; Bie, Tiwei <tiwei....@intel.com>; Wang, Zhihong
> > > > > <zhihong.w...@intel.com>; amore...@redhat.com; Wang, Xiao W
> > > > > <xiao.w.w...@intel.com>; Liu, Yong <yong....@intel.com>;
> > > > > jfreim...@redhat.com; Honnappa Nagarahalli
> > > > > <honnappa.nagaraha...@arm.com>; Gavin Hu (Arm Technology China)
> > > > > <gavin...@arm.com>
> > > > > Subject: RE: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
> > > > > vring desc avail flags
> > > > >
> > > > > Hi Joyce,
> > > > >
> > > > > I just test performance impact of your patch set with code base commit id:
> > > > > d03d8622db48918d14bfe805641b1766ecc40088, after applying your v3 patch
> > > > > set, seven paths of vhost/virtio pvp test shows performance drop as below:
> > > > >
> > > > > PVP vhost/virtio 1c1q test            before apply patch   apply patch
> > > > > test_perf_pvp_inorder_mergeable       7.603                7.474
> > > > > test_perf_pvp_inorder_no_mergeable    7.642                7.525
> > > > > test_perf_pvp_mergeable               7.556                7.431
> > > > > test_perf_pvp_normal                  7.554                7.478
> > > > > test_perf_pvp_vector_rx               7.581                7.469
> > > > > test_perf_pvp_virtio11_mergeable      7.068                6.905
> > > > > test_perf_pvp_virtio11_normal         7.088                6.888
> > > > >
> > > > > Thanks,
> > > > > Yinan
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime Coquelin
> > > > > > Sent: September 9, 2019 18:10
> > > > > > To: Joyce Kong <joyce.k...@arm.com>; dev@dpdk.org
> > > > > > Cc: n...@arm.com; Bie, Tiwei <tiwei....@intel.com>; Wang, Zhihong
> > > > > > <zhihong.w...@intel.com>; amore...@redhat.com; Wang, Xiao W
> > > > > > <xiao.w.w...@intel.com>; Liu, Yong <yong....@intel.com>;
> > > > > > jfreim...@redhat.com; honnappa.nagaraha...@arm.com;
> > > > > > gavin...@arm.com
> > > > > > Subject: Re: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
> > > > > > vring desc avail flags
> > > > > >
> > > > > > On 9/9/19 11:14 AM, Joyce Kong wrote:
> > > > > > > In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the
> > > > > > > frontend and backend are assumed to be implemented in software, that
> > > > > > > is they can run on identical CPUs in an SMP configuration.
> > > > > > > Thus a weak form of memory barriers like rte_smp_r/wmb, other than
> > > > > > > rte_cio_r/wmb, is sufficient for this case(vq->hw->weak_barriers == 1)
> > > > > > > and yields better performance.
> > > > > > > For the above case, this patch helps yielding even better performance
> > > > > > > by replacing the two-way barriers with C11 one-way barriers for avail
> > > > > > > flags in packed ring.
> > > > > > >
> > > > > > > Meanwhile, a read barrier is required to ensure ordering between
> > > > > > > descriptor's flags and content reads[1]. With C11, load-acquire can
> > > > > > > enforce the ordering instead of rmb barrier.
> > > > > > >
> > > > > > > [1]https://patchwork.dpdk.org/patch/49109/
> > > > > > >
> > > > > > > Signed-off-by: Joyce Kong <joyce.k...@arm.com>
> > > > > > > Reviewed-by: Gavin Hu <gavin...@arm.com>
> > > > > > > Reviewed-by: Phil Yang <phil.y...@arm.com>
> > > > > > > ---
> > > > > > > drivers/net/virtio/virtio_rxtx.c                 | 13 +++++++------
> > > > > > > drivers/net/virtio/virtio_user/virtio_user_dev.c |  6 +++++-
> > > > > > > drivers/net/virtio/virtqueue.h                   | 11 +++++++++++
> > > > > > > lib/librte_vhost/vhost.h                         |  2 +-
> > > > > > > lib/librte_vhost/virtio_net.c                    | 11 +++++------
> > > > > > > 5 files changed, 29 insertions(+), 14 deletions(-)
> > > > > >
> > > > > > Reviewed-by: Maxime Coquelin <maxime.coque...@redhat.com>
> > > > > >
> > > > > > Thanks,
> > > > > > Maxime