The broadcast_rarp field in the virtio_net struct is checked in the dequeue datapath regardless of whether descriptors are available or not.
As it is checked with cmpset leading to a write, false sharing on the virtio_net struct can happen between enqueue and dequeue datapaths regardless of whether a RARP is requested. In OVS, the issue can cause a uni-directional performance drop of up to 15%. Fix that by only performing the cmpset if a read of broadcast_rarp indicates that the cmpset is likely to succeed. Fixes: a66bcad32240 ("vhost: arrange struct fields for better cache sharing") Cc: sta...@dpdk.org Signed-off-by: Kevin Traynor <ktray...@redhat.com> --- V2: Change from fixing by moving broadcast_rarp location in virtio_net struct to performing a read before cmpset. lib/librte_vhost/virtio_net.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 337470d..d0a3b11 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -1057,7 +1057,19 @@ static inline bool __attribute__((always_inline)) * * Check user_send_rarp() for more information. + * + * broadcast_rarp shares a cacheline in the virtio_net structure + * with some fields that are accessed during enqueue and + * rte_atomic16_cmpset() causes a write if using cmpxchg. This could + * result in false sharing between enqueue and dequeue. + * + * Prevent unnecessary false sharing by reading broadcast_rarp first + * and only performing cmpset if the read indicates it is likely to + * be set. */ - if (unlikely(rte_atomic16_cmpset((volatile uint16_t *) - &dev->broadcast_rarp.cnt, 1, 0))) { + + if (unlikely(rte_atomic16_read(&dev->broadcast_rarp) && + rte_atomic16_cmpset((volatile uint16_t *) + &dev->broadcast_rarp.cnt, 1, 0))) { + rarp_mbuf = rte_pktmbuf_alloc(mbuf_pool); if (rarp_mbuf == NULL) { -- 1.8.3.1