On 4/30/2020 10:14 AM, Joyce Kong wrote:
> In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the frontend
> and backend are assumed to be implemented in software, that is they can
> run on identical CPUs in an SMP configuration.
> Thus a weak form of memory barriers like rte_smp_r/wmb, other than
> rte_cio_r/wmb, is sufficient for this case(vq->hw->weak_barriers == 1)
> and yields better performance.
> For the above case, this patch helps yielding even better performance
> by replacing the two-way barriers with C11 one-way barriers for used
> index in split ring.
>
> Signed-off-by: Joyce Kong <joyce.k...@arm.com>
> Reviewed-by: Gavin Hu <gavin...@arm.com>
> Reviewed-by: Maxime Coquelin <maxime.coque...@redhat.com>
<...>

> @@ -464,8 +464,33 @@ virtio_get_queue_type(struct virtio_hw *hw, uint16_t vtpci_queue_idx)
>  	return VTNET_TQ;
>  }
>
> -#define VIRTQUEUE_NUSED(vq) ((uint16_t)((vq)->vq_split.ring.used->idx - \
> -	(vq)->vq_used_cons_idx))
> +/* virtqueue_nused has load-acquire or rte_cio_rmb insed */
> +static inline uint16_t
> +virtqueue_nused(const struct virtqueue *vq)
> +{
> +	uint16_t idx;
> +
> +	if (vq->hw->weak_barriers) {
> +		/**
> +		 * x86 prefers to using rte_smp_rmb over __atomic_load_n as it
> +		 * reports a slightly better perf, which comes from the saved
> +		 * branch by the compiler.
> +		 * The if and else branches are identical with the smp and cio
> +		 * barriers both defined as compiler barriers on x86.
> +		 */
> +#ifdef RTE_ARCH_X86_64
> +		idx = vq->vq_split.ring.used->idx;
> +		rte_smp_rmb();
> +#else
> +		idx = __atomic_load_n(&(vq)->vq_split.ring.used->idx,
> +				__ATOMIC_ACQUIRE);
> +#endif
> +	} else {
> +		idx = vq->vq_split.ring.used->idx;
> +		rte_cio_rmb();
> +	}
> +	return idx - vq->vq_used_cons_idx;
> +}

The AltiVec implementation (virtio_rxtx_simple_altivec.c) also uses the
'VIRTQUEUE_NUSED' macro, so it needs to be updated with this change as
well.
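
As a rough sketch, the call-site update in the AltiVec path could look
something like the hunk below. This assumes the existing code in
virtio_rxtx_simple_altivec.c reads the used index via VIRTQUEUE_NUSED and
then issues an explicit rte_smp_rmb(); the variable name 'nb_used' and the
exact surrounding lines are assumptions for illustration, not taken from
the patch:

	-	nb_used = VIRTQUEUE_NUSED(vq);
	-
	-	rte_smp_rmb();
	+	/* virtqueue_nused() already performs the load-acquire (or
	+	 * rte_cio_rmb) internally, so the explicit read barrier at
	+	 * the call site is no longer needed.
	+	 */
	+	nb_used = virtqueue_nused(vq);

That keeps the vectorized Rx path consistent with the split-ring code and
avoids leaving a stale reference to the removed macro.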