Hi Jianfeng,
On 12/08/2017 09:51 AM, Tan, Jianfeng wrote:
-----Original Message-----
From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
Sent: Friday, December 8, 2017 4:36 PM
To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; y...@fridaylinux.org; Bie,
Tiwei
Cc: sta...@dpdk.org; jfrei...@redhat.com
Subject: Re: [PATCH] vhost_user: protect active rings from async ring
changes
On 12/08/2017 03:14 AM, Tan, Jianfeng wrote:
-----Original Message-----
From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
Sent: Thursday, December 7, 2017 6:02 PM
To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; y...@fridaylinux.org;
Bie,
Tiwei
Cc: sta...@dpdk.org; jfrei...@redhat.com
Subject: Re: [PATCH] vhost_user: protect active rings from async ring
changes
On 12/07/2017 10:33 AM, Tan, Jianfeng wrote:
-----Original Message-----
From: Victor Kaplansky [mailto:vkapl...@redhat.com]
Sent: Wednesday, December 6, 2017 9:56 PM
To: dev@dpdk.org; y...@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng;
vkapl...@redhat.com
Cc: sta...@dpdk.org; jfrei...@redhat.com; Maxime Coquelin
Subject: [PATCH] vhost_user: protect active rings from async ring
changes
When performing live migration or memory hot-plugging,
the changes to the device and vrings made by message handler
done independently from vring usage by PMD threads.
This causes for example segfauls during live-migration
segfauls -> segfaults?
with MQ enable, but in general virtually any request
sent by qemu changing the state of device can cause
problems.
These patches fixes all above issues by adding a spinlock
to every vring and requiring message handler to start operation
only after ensuring that all PMD threads related to the divece
Another typo: divece.
are out of critical section accessing the vring data.
Each vring has its own lock in order to not create contention
between PMD threads of different vrings and to prevent
performance degradation by scaling queue pair number.
Also wonder how much overhead it brings.
Instead of locking each vring, can we just wait a while (10us for
example) after calling the destroy_device() callback, so that every
PMD thread has enough time to get out of the critical area?
No, because we are not destroying the device in these cases.
Actually, once destroy_device() is called, it is likely that the
application has taken care that the rings aren't being processed
anymore before returning from the callback (this is at least the case
with the Vhost PMD).
OK, I did not put it the right way, as there are multiple cases above:
migration and memory hot plug. Let me try again:
Whenever a vhost thread handles a message affecting PMD threads (like
SET_MEM_TABLE, GET_VRING_BASE, etc.), we can clear the dev flag
VIRTIO_DEV_RUNNING and wait for a while so that PMD threads get out of
those critical areas. After message handling, set the flag
VIRTIO_DEV_RUNNING again.
I think you mean clearing vq's enabled flag, because PMD threads never
check the VIRTIO_DEV_RUNNING flag.
Ah, yes.
I suppose it can work, based on the assumption that PMD threads work in
polling mode and will get out of the critical area quickly and inevitably.
That sounds very fragile, because if the CPUs aren't perfectly isolated,
your PMD thread can be preempted, for interrupt handling for example.
Or what if for some reason the PMD thread's CPU stalls for a short while?
The latter is unlikely, but if it happens, it will be hard to debug.
Let's see first the performance impact of using the spinlock. It might
not be that important because 99.9999% of the times, it will not even
spin.
Fair enough.
I did some benchmarks on my Broadwell test bench (see patch below), and
it seems that depending on the benchmark, perfs are on par or even
better with the spinlock! I guess this is because with the spinlock
there is better batching and fewer concurrent accesses on the rings,
but I'm not sure.
Please find my results below (CPU E5-2667 v4 @ 3.20GHz):
Bench            v17.11      v17.11 + spinlock
---------------  ----------  -----------------
PVP Bidir run1   19.29Mpps   19.26Mpps
PVP Bidir run2   19.26Mpps   19.28Mpps
TxOnly           18.47Mpps   18.83Mpps
RxOnly           13.83Mpps   13.83Mpps
IO Loopback       7.94Mpps    7.97Mpps
Patch:
---------------------------------------------------------
From 7e18cefce682235558fed66a1fb87ab937ec9297 Mon Sep 17 00:00:00 2001
From: Maxime Coquelin <maxime.coque...@redhat.com>
Date: Fri, 8 Dec 2017 04:21:25 -0500
Subject: [PATCH] vhost: spinlock benchmarking
Signed-off-by: Maxime Coquelin <maxime.coque...@redhat.com>
---
lib/librte_vhost/vhost.c | 2 ++
lib/librte_vhost/vhost.h | 2 ++
lib/librte_vhost/virtio_net.c | 12 ++++++++++++
3 files changed, 16 insertions(+)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 4f8b73a..d9752cf 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -219,6 +219,8 @@ struct virtio_net *
vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
+ rte_spinlock_init(&vq->access_lock);
+
vhost_user_iotlb_init(dev, vring_idx);
/* Backends are set to -1 indicating an inactive device. */
vq->backend = -1;
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 1cc81c1..56242a8 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -131,6 +131,8 @@ struct vhost_virtqueue {
struct batch_copy_elem *batch_copy_elems;
uint16_t batch_copy_nb_elems;
+ rte_spinlock_t access_lock;
+
rte_rwlock_t iotlb_lock;
rte_rwlock_t iotlb_pending_lock;
struct rte_mempool *iotlb_pool;
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 6fee16e..50cc3de 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -329,6 +329,8 @@
if (unlikely(vq->enabled == 0))
return 0;
+ rte_spinlock_lock(&vq->access_lock);
+
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_lock(vq);
@@ -419,6 +421,8 @@
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_unlock(vq);
+ rte_spinlock_unlock(&vq->access_lock);
+
return count;
}
@@ -654,6 +658,8 @@
if (unlikely(vq->enabled == 0))
return 0;
+ rte_spinlock_lock(&vq->access_lock);
+
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_lock(vq);
@@ -715,6 +721,8 @@
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_unlock(vq);
+ rte_spinlock_unlock(&vq->access_lock);
+
return pkt_idx;
}
@@ -1183,6 +1191,8 @@
if (unlikely(vq->enabled == 0))
return 0;
+ rte_spinlock_lock(&vq->access_lock);
+
vq->batch_copy_nb_elems = 0;
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
@@ -1356,6 +1366,8 @@
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_unlock(vq);
+ rte_spinlock_unlock(&vq->access_lock);
+
if (unlikely(rarp_mbuf != NULL)) {
/*
* Inject it to the head of "pkts" array, so that switch's mac
--
1.8.3.1