On 5/1/20 12:58 AM, Ferruh Yigit wrote:
> On 4/30/2020 10:14 AM, Joyce Kong wrote:
>> In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the frontend
>> and backend are assumed to be implemented in software, that is they can
>> run on identical CPUs in an SMP configuration.
>> Thus a weaker form of memory barrier such as rte_smp_r/wmb, rather than
>> rte_cio_r/wmb, is sufficient for this case (vq->hw->weak_barriers == 1)
>> and yields better performance.
>> For the above case, this patch yields even better performance
>> by replacing the two-way barriers with C11 one-way barriers for the used
>> index in the split ring.
>>
>> Signed-off-by: Joyce Kong <joyce.k...@arm.com>
>> Reviewed-by: Gavin Hu <gavin...@arm.com>
>> Reviewed-by: Maxime Coquelin <maxime.coque...@redhat.com>
> 
> <...>
> 
>> @@ -464,8 +464,33 @@ virtio_get_queue_type(struct virtio_hw *hw, uint16_t vtpci_queue_idx)
>>              return VTNET_TQ;
>>  }
>>  
>> -#define VIRTQUEUE_NUSED(vq) ((uint16_t)((vq)->vq_split.ring.used->idx - \
>> -                                    (vq)->vq_used_cons_idx))
>> +/* virtqueue_nused has load-acquire or rte_cio_rmb inside */
>> +static inline uint16_t
>> +virtqueue_nused(const struct virtqueue *vq)
>> +{
>> +    uint16_t idx;
>> +
>> +    if (vq->hw->weak_barriers) {
>> +    /**
>> +     * x86 prefers using rte_smp_rmb over __atomic_load_n as it
>> +     * reports a slightly better perf, which comes from the saved
>> +     * branch by the compiler.
>> +     * The if and else branches are identical with the smp and cio
>> +     * barriers both defined as compiler barriers on x86.
>> +     */
>> +#ifdef RTE_ARCH_X86_64
>> +            idx = vq->vq_split.ring.used->idx;
>> +            rte_smp_rmb();
>> +#else
>> +            idx = __atomic_load_n(&(vq)->vq_split.ring.used->idx,
>> +                            __ATOMIC_ACQUIRE);
>> +#endif
>> +    } else {
>> +            idx = vq->vq_split.ring.used->idx;
>> +            rte_cio_rmb();
>> +    }
>> +    return idx - vq->vq_used_cons_idx;
>> +}
> 
> AltiVec implementation (virtio_rxtx_simple_altivec.c) is also using
> 'VIRTQUEUE_NUSED' macro, it also needs to be updated with this change.
> 

I reproduced and fixed the build issue.
You can fetch my tree with the fixed series.

Thanks,
Maxime
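
For readers following the thread, here is a minimal standalone sketch of the
one-way barrier pattern the patch describes: the consumer reads the used index
with a load-acquire, and the unsigned 16-bit subtraction handles index
wrap-around. The names demo_used_ring, demo_nused and demo_publish are
hypothetical and are not part of the patch or of DPDK; the release store on
the producer side is an assumption about how a software backend would pair
with the acquire load. It relies on the GCC/Clang __atomic builtins.

```c
#include <stdint.h>

/* Hypothetical stand-in for the split ring's used index; not a DPDK type. */
struct demo_used_ring {
    uint16_t idx;   /* advanced by the producer (backend) */
};

/*
 * Consumer side: number of used entries not yet consumed.
 * The acquire load orders later reads of ring entries after the index read,
 * which is the one-way ordering the patch relies on.
 */
static inline uint16_t
demo_nused(struct demo_used_ring *used, uint16_t cons_idx)
{
    uint16_t idx = __atomic_load_n(&used->idx, __ATOMIC_ACQUIRE);

    /* uint16_t arithmetic wraps modulo 65536, so this stays correct
     * after the index overflows. */
    return (uint16_t)(idx - cons_idx);
}

/*
 * Producer side (assumed pairing): publish n new entries with a release
 * store so that earlier writes to the entries are visible first.
 */
static inline void
demo_publish(struct demo_used_ring *used, uint16_t n)
{
    uint16_t next = (uint16_t)(used->idx + n);

    __atomic_store_n(&used->idx, next, __ATOMIC_RELEASE);
}
```

Any remaining user of the old VIRTQUEUE_NUSED macro would call the inline
function in the same way, passing the queue and comparing the result to zero.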

