Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coque...@redhat.com>
> Sent: Thursday, January 27, 2022 6:47 PM
> To: Wang, YuanX <yuanx.w...@intel.com>; Xia, Chenbo
> <chenbo....@intel.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu...@intel.com>; Ding, Xuan
> <xuan.d...@intel.com>; Ma, WenwuX <wenwux...@intel.com>; Ling,
> WeiX <weix.l...@intel.com>
> Subject: Re: [PATCH] vhost: fix data-plane access to released vq
> 
> Hi,
> 
> On 1/27/22 11:30, Wang, YuanX wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >> Sent: Wednesday, January 26, 2022 10:03 PM
> >> To: Wang, YuanX <yuanx.w...@intel.com>; Xia, Chenbo
> >> <chenbo....@intel.com>
> >> Cc: dev@dpdk.org; Hu, Jiayu <jiayu...@intel.com>; Ding, Xuan
> >> <xuan.d...@intel.com>; Ma, WenwuX <wenwux...@intel.com>; Ling,
> WeiX
> >> <weix.l...@intel.com>
> >> Subject: Re: [PATCH] vhost: fix data-plane access to released vq
> >>
> >> Hi Yuan,
> >>
> >> On 12/3/21 17:34, Yuan Wang wrote:
> >>> From: yuan wang <yuanx.w...@intel.com>
> >>>
> >>> When NUMA reallocation occurs, numa_realloc() on the control plane
> >>> will free the old vq. If rte_vhost_dequeue_burst() on the data plane
> >>> gets the vq just before the release, it will access the released vq.
> >>> We need to move vq->access_lock into struct virtio_net to prevent
> >>> this situation.
> >>
> >>
> >> This patch is a fix, so the Fixes tag would be needed.
> >>
> >> But are you really facing this issue, or this is just based on code review?
> >
> > This issue is detected at run time with AddressSanitizer, which can be
> > turned on by:
> > meson configure -Db_sanitize=address <build_dir>
> >
> >>
> >> Currently NUMA reallocation is called whenever
> >> translate_ring_addresses() is called.
> >>
> >> translate_ring_addresses() is primarily called at device
> >> initialization, before the .new_device() callback is called. At that
> >> stage, there is no risk in performing a NUMA reallocation, as the
> >> application is not expected to use APIs requiring vq->access_lock
> >> acquisition.
> >>
> >> But I agree there are possibilities that numa_realloc() gets called
> >> while the device is in running state. But even if that happened, I
> >> don't think it is possible that numa_realloc() ends up reallocating
> >> the virtqueue on a different NUMA node (the vring should not have
> >> moved from a physical memory standpoint).
> >> And even if it happened, we should be safe because we ensure the VQ
> >> was not ready (so not usable by the application) before proceeding
> >> with reallocation:
> >
> > Here is a scenario where the VQ ready flag has not been set:
> > 1. Run testpmd and then start the data-plane process.
> > 2. Run the front-end.
> > 3. new_device() gets called when the first two queues are ready, even
> >    if the later queues are not.
> > 4. When messages for the later queues are processed, numa_realloc()
> >    may be reached; their ready flag has not been set yet, so they can
> >    be reallocated.
> 
> I will need a bit more details here.

For this scenario I used QEMU as the front end and set up 8 queues, with
the front end and back end on different NUMA nodes.
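
To make the race I am seeing clearer, here is a much simplified,
self-contained sketch of the interleaving that ASan reports (the names
and the pthread lock are only placeholders for the vhost structures,
not the actual code):

#include <stdlib.h>
#include <string.h>
#include <pthread.h>

struct vq {
	pthread_mutex_t access_lock;	/* stands in for rte_spinlock_t */
	/* ring addresses, indexes, ... */
};

static struct vq *virtqueue[8];

/* control plane: roughly what numa_realloc() does to a vq */
static void control_plane_realloc(int i)
{
	struct vq *old_vq = virtqueue[i];
	struct vq *new_vq = malloc(sizeof(*new_vq));

	memcpy(new_vq, old_vq, sizeof(*new_vq));
	free(old_vq);			/* old vq released here */
	virtqueue[i] = new_vq;
}

/* data plane: roughly what rte_vhost_dequeue_burst() does */
static void data_plane_dequeue(int i)
{
	struct vq *vq = virtqueue[i];	/* may still be the old pointer */

	/*
	 * If control_plane_realloc() runs between the load above and the
	 * lock below, the lock is taken on freed memory: the use-after-free
	 * that ASan flags.
	 */
	pthread_mutex_lock(&vq->access_lock);
	/* process the ring */
	pthread_mutex_unlock(&vq->access_lock);
}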

> 
> AFAICT, if the ready flag is not set for a given virtqueue, the virtqueue
> is not supposed to be exposed to the application. Is there a case where
> it happens? If so, the fix should consist in ensuring the application
> cannot use the virtqueue if it is not ready.
> 
> Regards,
> Maxime

Thanks for the suggestion, I will look into this and come back with more details.
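
As a first thought (only my current understanding, not a tested fix), the
direction you describe could look like re-checking the ready flag after
taking the per-vq lock on the data path, so that a vq the control plane
may still reallocate is never used. The types below are simplified
stand-ins, not the actual vhost code:

#include <stdbool.h>
#include <stdint.h>
#include <pthread.h>

struct sk_vq {
	pthread_mutex_t access_lock;	/* stands in for rte_spinlock_t */
	bool ready;
};

static uint16_t sk_dequeue(struct sk_vq *vq)
{
	uint16_t count = 0;

	pthread_mutex_lock(&vq->access_lock);
	if (!vq->ready) {
		/* vq is not (or no longer) exposed to the application */
		pthread_mutex_unlock(&vq->access_lock);
		return 0;
	}
	/* normal dequeue work would go here */
	pthread_mutex_unlock(&vq->access_lock);
	return count;
}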

Regards,
Yuan

> 
> >
> > If all the queues are ready before new_device() is called, this issue
> > does not occur. Maybe that could be another solution.
> 
> No, that was the older behaviour, but it causes issues with vDPA.
> We cannot just revert to the older behaviour.
> 
> Thanks,
> Maxime
> 
> > Thanks,
> > Yuan
> >
> >>
> >> static struct virtio_net*
> >> numa_realloc(struct virtio_net *dev, int index) {
> >>    int node, dev_node;
> >>    struct virtio_net *old_dev;
> >>    struct vhost_virtqueue *vq;
> >>    struct batch_copy_elem *bce;
> >>    struct guest_page *gp;
> >>    struct rte_vhost_memory *mem;
> >>    size_t mem_size;
> >>    int ret;
> >>
> >>    old_dev = dev;
> >>    vq = dev->virtqueue[index];
> >>
> >>    /*
> >>     * If VQ is ready, it is too late to reallocate, it certainly already
> >>     * happened anyway on VHOST_USER_SET_VRING_ADRR.
> >>     */
> >>    if (vq->ready)
> >>            return dev;
> >>
> >> So, if this is fixing a real issue, I would need more details on the
> >> issue in order to understand why vq->ready was not set when it should
> >> have been.
> >>
> >> On a side note, while trying to understand how you could face an
> >> issue, I noticed that translate_ring_addresses() may be called by
> >> vhost_user_iotlb_msg(). In that case, vq->access_lock is not held as
> >> this is the handler for VHOST_USER_IOTLB_MSG. We may want to protect
> >> translate_ring_addresses() calls by taking the VQ locks. I will
> >> post a fix for it.
> >>
> >>> Signed-off-by: Yuan Wang <yuanx.w...@intel.com>
> >>> ---
> >>>    lib/vhost/vhost.c      | 26 +++++++++++++-------------
> >>>    lib/vhost/vhost.h      |  4 +---
> >>>    lib/vhost/vhost_user.c |  4 ++--
> >>>    lib/vhost/virtio_net.c | 16 ++++++++--------
> >>>    4 files changed, 24 insertions(+), 26 deletions(-)
> >>>
> >>
> >> ...
> >>
> >>> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h index
> >>> 7085e0885c..f85ce4fda5 100644
> >>> --- a/lib/vhost/vhost.h
> >>> +++ b/lib/vhost/vhost.h
> >>> @@ -185,9 +185,6 @@ struct vhost_virtqueue {
> >>>           bool                    access_ok;
> >>>           bool                    ready;
> >>>
> >>> - rte_spinlock_t          access_lock;
> >>> -
> >>> -
> >>>           union {
> >>>                   struct vring_used_elem  *shadow_used_split;
> >>>                   struct vring_used_elem_packed *shadow_used_packed;
> >>> @@ -384,6 +381,7 @@ struct virtio_net {
> >>>           int                     extbuf;
> >>>           int                     linearbuf;
> >>>           struct vhost_virtqueue  *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
> >>> + rte_spinlock_t          vq_access_lock[VHOST_MAX_QUEUE_PAIRS * 2];
> >>
> >> The problem here is that you'll be introducing false sharing, so I
> >> expect performance to no longer scale with the number of queues.
> >>
> >> It also consumes unnecessary memory.
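
If the per-queue lock has to live outside struct vhost_virtqueue, one
option I could look at (only a sketch, the struct and field names here
are hypothetical and this is not the posted patch) is to pad each lock
to its own cache line, so locks of different queues never share a line:

#include <rte_common.h>		/* __rte_cache_aligned */
#include <rte_spinlock.h>

#define SKETCH_NR_VRINGS 256	/* placeholder for VHOST_MAX_QUEUE_PAIRS * 2 */

/* one lock per virtqueue, each padded to a full cache line */
struct sketch_vq_lock {
	rte_spinlock_t lock;
} __rte_cache_aligned;

struct sketch_dev {
	/* other virtio_net fields would go here */
	struct sketch_vq_lock vq_access_lock[SKETCH_NR_VRINGS];
};

That would avoid the false sharing, but it makes the memory overhead even
larger, so it may not be the right trade-off either.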
> >>
> >>>           struct inflight_mem_info *inflight_info;
> >>>    #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
> >>>           char                    ifname[IF_NAME_SZ];
> >>
> >> Thanks,
> >> Maxime
> >
