On 04/06, Joyce Kong wrote: >In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the frontend >and backend are assumed to be implemented in software, that is they can >run on identical CPUs in an SMP configuration. >Thus a weak form of memory barriers like rte_smp_r/wmb, other than >rte_cio_r/wmb, is sufficient for this case(vq->hw->weak_barriers == 1) >and yields better performance. >For the above case, this patch helps yielding even better performance >by replacing the two-way barriers with C11 one-way barriers for used >index in split ring. > >Signed-off-by: Joyce Kong <joyce.k...@arm.com> >Reviewed-by: Gavin Hu <gavin...@arm.com> >--- > drivers/net/virtio/virtio_ethdev.c | 9 ++-- > drivers/net/virtio/virtio_ring.h | 2 +- > drivers/net/virtio/virtio_rxtx.c | 46 +++++++++---------- > drivers/net/virtio/virtio_rxtx_simple_neon.c | 5 +- > drivers/net/virtio/virtio_rxtx_simple_sse.c | 5 +- > .../net/virtio/virtio_user/virtio_user_dev.c | 8 ++-- > drivers/net/virtio/virtqueue.c | 2 +- > drivers/net/virtio/virtqueue.h | 37 ++++++++++++--- > lib/librte_vhost/virtio_net.c | 5 +- > 9 files changed, 71 insertions(+), 48 deletions(-) > >diff --git a/drivers/net/virtio/virtio_ethdev.c >b/drivers/net/virtio/virtio_ethdev.c >index f9d0ea70d..a4a865bfa 100644 >--- a/drivers/net/virtio/virtio_ethdev.c >+++ b/drivers/net/virtio/virtio_ethdev.c >@@ -285,13 +285,12 @@ virtio_send_command_split(struct virtnet_ctl *cvq, > > virtqueue_notify(vq); > >- rte_rmb(); >- while (VIRTQUEUE_NUSED(vq) == 0) { >- rte_rmb(); >+ /* virtqueue_nused has a load-acquire or rte_cio_rmb inside */ >+ while (virtqueue_nused(vq) == 0) > usleep(100); >- } > >- while (VIRTQUEUE_NUSED(vq)) { >+ /* virtqueue_nused has a load-acquire or rte_cio_rmb inside */ >+ while (virtqueue_nused(vq)) { > uint32_t idx, desc_idx, used_idx; > struct vring_used_elem *uep; > >diff --git a/drivers/net/virtio/virtio_ring.h >b/drivers/net/virtio/virtio_ring.h >index 7ba34662e..0f6574f68 100644 >--- a/drivers/net/virtio/virtio_ring.h >+++ b/drivers/net/virtio/virtio_ring.h >@@ -59,7 +59,7 @@ struct vring_used_elem { > > struct vring_used { > uint16_t flags; >- volatile uint16_t idx; >+ uint16_t idx; > struct vring_used_elem ring[0]; > }; > >diff --git a/drivers/net/virtio/virtio_rxtx.c >b/drivers/net/virtio/virtio_rxtx.c >index 752faa0f6..9ba26fd95 100644 >--- a/drivers/net/virtio/virtio_rxtx.c >+++ b/drivers/net/virtio/virtio_rxtx.c >@@ -45,7 +45,7 @@ virtio_dev_rx_queue_done(void *rxq, uint16_t offset) > struct virtnet_rx *rxvq = rxq; > struct virtqueue *vq = rxvq->vq; > >- return VIRTQUEUE_NUSED(vq) >= offset; >+ return virtqueue_nused(vq) >= offset; > } > > void >@@ -1243,9 +1243,8 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf >**rx_pkts, uint16_t nb_pkts) > if (unlikely(hw->started == 0)) > return nb_rx; > >- nb_used = VIRTQUEUE_NUSED(vq); >- >- virtio_rmb(hw->weak_barriers); >+ /* virtqueue_nused has a load-acquire or rte_cio_rmb inside */
Small nit, I don't think we need to add this comment to every occurrence of virtqueue_nused, what about moving it to the definition of this function? Thanks, Xiaolong