Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coque...@redhat.com>
> Sent: Saturday, September 7, 2019 12:02 AM
> To: Joyce Kong (Arm Technology China) <joyce.k...@arm.com>;
> dev@dpdk.org
> Cc: nd <n...@arm.com>; tiwei....@intel.com; zhihong.w...@intel.com;
> amore...@redhat.com; xiao.w.w...@intel.com; yong....@intel.com;
> jfreim...@redhat.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; Gavin Hu (Arm Technology China)
> <gavin...@arm.com>
> Subject: Re: [PATCH v2 1/2] virtio: one way barrier for packed vring desc
> avail flags
>
> Hi Joyce,
>
> On 9/6/19 1:34 PM, Joyce Kong wrote:
> > In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the
> > frontend and backend are assumed to be implemented in software, that
> > is they can run on identical CPUs in an SMP configuration. Thus a weak
> > form of memory barriers like rte_smp_r/wmb, other than rte_cio_r/wmb,
> > is sufficient for this case(vq->hw->weak_barriers == 1) and yields better
> > performance.
> > For the above case, this patch helps yielding even better performance
> > by replacing the two-way barriers with C11 one-way barriers.
> >
> > Meanwhile, a read barrier is required to ensure ordering between
> > descriptor's flags and content reads[1]. With C11, load-acquire can
> > enforce the ordering instead of rmb barrier.
> >
> > [1]https://patchwork.dpdk.org/patch/49109/
> >
> > Signed-off-by: Joyce Kong <joyce.k...@arm.com>
> > Reviewed-by: Gavin Hu <gavin...@arm.com>
> > Reviewed-by: Phil Yang <phil.y...@arm.com>
> > ---
> >  drivers/net/virtio/virtio_rxtx.c                 | 26 ++++++++++++++++++------
> >  drivers/net/virtio/virtio_user/virtio_user_dev.c |  6 +++++-
> >  lib/librte_vhost/vhost.h                         |  2 +-
> >  lib/librte_vhost/virtio_net.c                    | 11 +++++-----
> >  4 files changed, 31 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/net/virtio/virtio_rxtx.c
> > b/drivers/net/virtio/virtio_rxtx.c
> > index 27ead19..2a2153c 100644
> > --- a/drivers/net/virtio/virtio_rxtx.c
> > +++ b/drivers/net/virtio/virtio_rxtx.c
> > @@ -456,8 +456,14 @@ virtqueue_enqueue_recv_refill_packed(struct virtqueue *vq,
> >          vq->vq_desc_head_idx = dxp->next;
> >          if (vq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
> >                  vq->vq_desc_tail_idx = vq->vq_desc_head_idx;
> > -        virtio_wmb(hw->weak_barriers);
> > -        start_dp[idx].flags = flags;
> > +
> > +        if (hw->weak_barriers)
> > +                __atomic_store_n(&start_dp[idx].flags, flags,
> > +                                __ATOMIC_RELEASE);
> > +        else {
> > +                rte_cio_wmb();
> > +                start_dp[idx].flags = flags;
> > +        }
>
> It looks good to me.
> I just wonder whether it would be cleaner to put that in an inline
> function:
>
> static inline void
> virtqueue_store_flags_packed()
>
> Same for the fetch.

I have wrapped the store/fetch operations in inline functions in v3.

Best Regards,
Joyce
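
For readers following the thread, here is a minimal sketch of what such store/fetch helpers might look like. It is not the v3 patch itself. The struct, the function names with the _sketch suffix, and the sketch_cio_wmb/sketch_cio_rmb macros are hypothetical stand-ins so that the example builds outside DPDK. The full fences used in the non-weak path are an assumption and are likely stronger than what rte_cio_wmb/rte_cio_rmb emit on a given platform.

```c
#include <stdint.h>

/* Hypothetical stand-in for a packed ring descriptor (illustration only). */
struct desc_sketch {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/*
 * Assumption: full fences stand in for rte_cio_wmb()/rte_cio_rmb() so the
 * sketch compiles with GCC or Clang without the DPDK headers.
 */
#define sketch_cio_wmb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define sketch_cio_rmb() __atomic_thread_fence(__ATOMIC_SEQ_CST)

/*
 * Publish the descriptor flags. With weak barriers, a store-release keeps
 * the earlier descriptor writes ordered before the flags store. Otherwise
 * an explicit write barrier precedes a plain store.
 */
static inline void
store_flags_sketch(struct desc_sketch *dp, uint16_t flags,
		   uint8_t weak_barriers)
{
	if (weak_barriers) {
		__atomic_store_n(&dp->flags, flags, __ATOMIC_RELEASE);
	} else {
		sketch_cio_wmb();
		dp->flags = flags;
	}
}

/*
 * Read the descriptor flags. With weak barriers, a load-acquire keeps the
 * later descriptor content reads ordered after the flags load. Otherwise
 * a plain load is followed by an explicit read barrier.
 */
static inline uint16_t
fetch_flags_sketch(struct desc_sketch *dp, uint8_t weak_barriers)
{
	uint16_t flags;

	if (weak_barriers) {
		flags = __atomic_load_n(&dp->flags, __ATOMIC_ACQUIRE);
	} else {
		flags = dp->flags;
		sketch_cio_rmb();
	}
	return flags;
}
```

With helpers of this shape, the refill path quoted above would reduce to a single call along the lines of store_flags_sketch(&start_dp[idx], flags, hw->weak_barriers), and the matching read side would use the fetch helper before touching the descriptor content.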