Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coque...@redhat.com>
> Sent: Thursday, January 27, 2022 6:47 PM
> To: Wang, YuanX <yuanx.w...@intel.com>; Xia, Chenbo <chenbo....@intel.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu...@intel.com>; Ding, Xuan <xuan.d...@intel.com>; Ma, WenwuX <wenwux...@intel.com>; Ling, WeiX <weix.l...@intel.com>
> Subject: Re: [PATCH] vhost: fix data-plane access to released vq
>
> Hi,
>
> On 1/27/22 11:30, Wang, YuanX wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >> Sent: Wednesday, January 26, 2022 10:03 PM
> >> To: Wang, YuanX <yuanx.w...@intel.com>; Xia, Chenbo <chenbo....@intel.com>
> >> Cc: dev@dpdk.org; Hu, Jiayu <jiayu...@intel.com>; Ding, Xuan <xuan.d...@intel.com>; Ma, WenwuX <wenwux...@intel.com>; Ling, WeiX <weix.l...@intel.com>
> >> Subject: Re: [PATCH] vhost: fix data-plane access to released vq
> >>
> >> Hi Yuan,
> >>
> >> On 12/3/21 17:34, Yuan Wang wrote:
> >>> From: yuan wang <yuanx.w...@intel.com>
> >>>
> >>> When NUMA reallocation occurs, numa_realloc() on the control plane
> >>> will free the old vq. If rte_vhost_dequeue_burst() on the data plane
> >>> gets the vq just before release, it will access the released vq.
> >>> We need to move vq->access_lock into struct virtio_net to ensure
> >>> that it can prevent this situation.
> >>
> >> This patch is a fix, so the Fixes tag would be needed.
> >>
> >> But are you really facing this issue, or is this just based on code
> >> review?
> >
> > This issue is run-time checked with AddressSanitizer, which can be
> > turned on by:
> > meson configure -Db_sanitize=address <build_dir>
> >
> >> Currently, NUMA reallocation is called whenever
> >> translate_ring_addresses() is called.
> >>
> >> translate_ring_addresses() is primarily called at device
> >> initialization, before the .new_device() callback is called.
> >> At that stage, there is no risk to perform a NUMA reallocation, as
> >> the application is not expected to use APIs requiring
> >> vq->access_lock acquisition.
> >>
> >> But I agree there are possibilities that numa_realloc() gets called
> >> while the device is in running state. But even if that happened, I
> >> don't think it is possible that numa_realloc() ends up reallocating
> >> the virtqueue on a different NUMA node (the vring should not have
> >> moved from a physical memory standpoint).
> >> And even if it happened, we should be safe because we ensure the VQ
> >> was not ready (so not usable by the application) before proceeding
> >> with reallocation:
> >
> > Here is a scenario where VQ ready has not been set:
> > 1. Run testpmd and then start the data-plane process.
> > 2. Run the front-end.
> > 3. new_device() gets called when the first two queues are ready, even
> >    if the later queues are not.
> > 4. When processing messages for the later queues, the flow may reach
> >    numa_realloc(); the ready flag has not been set, so the queue can
> >    be reallocated.
>
> I will need a bit more details here.
For this scenario I used QEMU as the front end and set up 8 queues, with
the front and back ends on different NUMA nodes.

> AFAICT, if the ready flag is not set for a given virtqueue, the
> virtqueue is not supposed to be exposed to the application. Is there a
> case where it happens? If so, the fix should consist in ensuring the
> application cannot use the virtqueue if it is not ready.
>
> Regards,
> Maxime

Thanks for the suggestion, I will look for more details on this.

Regards,
Yuan

> >
> > If all the queues are ready before new_device() is called, this issue
> > does not occur. I think maybe that is another solution.
>
> No, that was the older behaviour, but it causes issues with vDPA.
> We cannot just revert to the older behaviour.
>
> Thanks,
> Maxime
>
> > Thanks,
> > Yuan
> >
> >> static struct virtio_net*
> >> numa_realloc(struct virtio_net *dev, int index)
> >> {
> >> 	int node, dev_node;
> >> 	struct virtio_net *old_dev;
> >> 	struct vhost_virtqueue *vq;
> >> 	struct batch_copy_elem *bce;
> >> 	struct guest_page *gp;
> >> 	struct rte_vhost_memory *mem;
> >> 	size_t mem_size;
> >> 	int ret;
> >>
> >> 	old_dev = dev;
> >> 	vq = dev->virtqueue[index];
> >>
> >> 	/*
> >> 	 * If VQ is ready, it is too late to reallocate, it certainly
> >> 	 * already happened anyway on VHOST_USER_SET_VRING_ADDR.
> >> 	 */
> >> 	if (vq->ready)
> >> 		return dev;
> >>
> >> So, if this is fixing a real issue, I would need more details on the
> >> issue in order to understand why vq->ready was not set when it
> >> should have been.
> >>
> >> On a side note, while trying to understand how you could face an
> >> issue, I noticed that translate_ring_addresses() may be called by
> >> vhost_user_iotlb_msg(). In that case, vq->access_lock is not held,
> >> as this is the handler for VHOST_USER_IOTLB_MSG. We may want to
> >> protect translate_ring_addresses() calls by taking the VQ locks.
> >> I will post a fix for it.
> >>> Signed-off-by: Yuan Wang <yuanx.w...@intel.com>
> >>> ---
> >>>  lib/vhost/vhost.c      | 26 +++++++++++++-------------
> >>>  lib/vhost/vhost.h      |  4 +---
> >>>  lib/vhost/vhost_user.c |  4 ++--
> >>>  lib/vhost/virtio_net.c | 16 ++++++++--------
> >>>  4 files changed, 24 insertions(+), 26 deletions(-)
> >>>
> >>
> >> ...
> >>
> >>> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> >>> index 7085e0885c..f85ce4fda5 100644
> >>> --- a/lib/vhost/vhost.h
> >>> +++ b/lib/vhost/vhost.h
> >>> @@ -185,9 +185,6 @@ struct vhost_virtqueue {
> >>>  	bool			access_ok;
> >>>  	bool			ready;
> >>>
> >>> -	rte_spinlock_t		access_lock;
> >>> -
> >>> -
> >>>  	union {
> >>>  		struct vring_used_elem *shadow_used_split;
> >>>  		struct vring_used_elem_packed *shadow_used_packed;
> >>> @@ -384,6 +381,7 @@ struct virtio_net {
> >>>  	int			extbuf;
> >>>  	int			linearbuf;
> >>>  	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
> >>> +	rte_spinlock_t		vq_access_lock[VHOST_MAX_QUEUE_PAIRS * 2];
> >>
> >> The problem here is that you'll be introducing false sharing, so I
> >> expect performance to no longer scale with the number of queues.
> >>
> >> It also consumes unnecessary memory.
> >>
> >>>  	struct inflight_mem_info *inflight_info;
> >>>  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
> >>>  	char ifname[IF_NAME_SZ];
> >>
> >> Thanks,
> >> Maxime
> >