On Tue, Feb 2, 2021 at 1:23 AM Eli Cohen <e...@nvidia.com> wrote:
>
> On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote:
> > Thanks Eli and Jason for the clarifications. See inline.
> >
> > On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen <e...@nvidia.com> wrote:
> > >
> > > On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
> > > >
> > > > On 2021/2/2 12:15 PM, Si-Wei Liu wrote:
> > > > > On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasow...@redhat.com> wrote:
> > > > > >
> > > > > > On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
> > > > > > > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu
> > > > > > > <siwliu.ker...@gmail.com> wrote:
> > > > > > > > On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <e...@nvidia.com> wrote:
> > > > > > > > > suspend_vq should only suspend the VQ and not save the current available
> > > > > > > > > index. That is done when a change of map occurs and the driver calls
> > > > > > > > > save_channel_info().
> > > > > > > > Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > > > > > > > which doesn't save the available index, as save_channel_info() doesn't
> > > > > > > > get called in that path at all. How does it handle the case where
> > > > > > > > get_vq_state() is called from userspace (e.g. QEMU) after the hardware
> > > > > > > > VQ object has been torn down, but userspace still wants to access the
> > > > > > > > queue index?
> > > > > > > >
> > > > > > > > Refer to
> > > > > > > > https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei....@oracle.com/
> > > > > > > >
> > > > > > > > vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> > > > > > > > vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> > > > > > > >
> > > > > > > > QEMU will complain with the above warnings while the VM is being
> > > > > > > > rebooted or shut down.
> > > > > > > >
> > > > > > > > It looks to me like either the kernel driver should cover this
> > > > > > > > requirement, or userspace has to bear the burden of saving the index
> > > > > > > > and not calling into the kernel if the VQ has been destroyed.
> > > > > > > Actually, userspace doesn't have insight into whether the virtqueue
> > > > > > > will be destroyed when it just changes the device status via
> > > > > > > set_status(). Looking at the other vdpa driver in tree, i.e. ifcvf,
> > > > > > > it doesn't behave like this. Hence this still looks to me like a
> > > > > > > Mellanox-specific mlx5_vdpa implementation detail that shouldn't be
> > > > > > > exposed to userspace.
> > > > > >
> > > > > > So I think we can simply drop this patch?
> > > > > Yep, I think so. To be honest I don't know why it has anything to do
> > > > > with the memory hotplug issue.
> > > >
> > > > Eli may know more. My understanding is that, during memory hotplug, qemu
> > > > needs to propagate the new memory mappings via set_map(). For Mellanox,
> > > > it means it needs to rebuild the memory keys, so the virtqueue needs to
> > > > be suspended.
> > > >
> > >
> > > I think Siwei was asking why the first patch was related to the hotplug
> > > issue.
> >
> > I was thinking about how consistency is assured when saving/restoring
> > this h/w avail_index against the one in the virtq memory, particularly
> > in the .region_add()/.region_del() context (e.g. the hotplug case).
> > The problem is I don't see an explicit memory barrier when the guest
> > thread updates the avail_index, so how does the device make sure the
> > h/w avail_index is up to date while the guest may be updating the
> > virtq's avail_index in the meanwhile? Maybe I'm completely missing
> > something obvious?
>
> If you're asking about synchronization upon hot plug of memory, the
> hardware always goes to read the available index from memory when a new
> hardware object is associated with a virtqueue. You could argue then
> that you don't need to restore the available index, and you may be
> right, but this is the current interface to the firmware.
>
> If you're asking more generally how sync is assured when the guest
> updates the available index, can you please send a pointer to the code
> where you see the update without a memory barrier?
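On the hot plug part: thanks, that helps. Just to confirm I follow the
flow you describe on a map change, this is roughly what I have in mind.
It is only a rough sketch of my reading of the thread; every name below
is made up for illustration and is not the actual mlx5_vdpa driver or
firmware interface.

	#include <stdbool.h>
	#include <stdint.h>

	/* illustrative stand-ins for the firmware commands, not real mlx5 API */
	static void fw_suspend_vq(void) {}
	static uint16_t fw_query_avail_idx(void) { return 0; }
	static void fw_destroy_vq(void) {}
	static void fw_rebuild_mkey(void) {}
	static void fw_create_vq(uint16_t start_idx) { (void)start_idx; }

	struct sketch_vq {
		bool initialized;
		uint16_t avail_idx;	/* available index carried across the rebuild */
	};

	static void sketch_handle_new_map(struct sketch_vq *vq)
	{
		if (!vq->initialized)
			return;

		/* stop the queue so the device no longer advances the index */
		fw_suspend_vq();

		/* save the device's current view of the available index */
		vq->avail_idx = fw_query_avail_idx();

		/* drop the queue object referencing the old memory key and
		 * rebuild the memory key for the new map ... */
		fw_destroy_vq();
		fw_rebuild_mkey();

		/* ... then recreate the queue object against the new key,
		 * passing the saved index back in (per your note, the new
		 * object re-reads avail->idx from guest memory anyway, but
		 * the create interface still takes a starting index) */
		fw_create_vq(vq->avail_idx);
	}

If that matches what save_channel_info() is meant to do, then the part I
still need to convince myself of is the question below about the memory
barrier.
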
This is a snippet of virtqueue_add_split() where the avail_index gets
updated by the guest:

	/* Put entry in available array (but don't update avail->idx until they
	 * do sync). */
	avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
	vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);

	/* Descriptors and available array need to be set before we expose the
	 * new available array entries. */
	virtio_wmb(vq->weak_barriers);
	vq->split.avail_idx_shadow++;
	vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
						vq->split.avail_idx_shadow);
	vq->num_added++;

There's a memory barrier to make sure the updates to the descriptors and
the available ring are seen before the write to avail->idx, but there
seems to be no guarantee that this update is flushed to memory
immediately, either before or after mlx5-vdpa is suspended and gets the
old avail_index value stashed somewhere. In this case, how does the
hardware ensure consistency between the guest virtio and the host
mlx5-vdpa? Or does it completely rely on the guest to update the
avail_index once the next buffer is available, so that the index gets
back in sync? (A sketch of how I picture the consumer side of this
pairing is at the bottom of this mail.)

Thanks,
-Siwei

>
> >
> > Thanks,
> > -Siwei
> >
> > >
> > > But you're correct. When memory is added, I get a new memory map. This
> > > requires me to build a new memory key object which covers the new memory
> > > map. Since the virtqueue objects are referencing this memory key, I need
> > > to destroy them and build new ones referencing the new memory key.
> > >
> > > > Thanks
> > > >
> > > > >
> > > > > -Siwei
> > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > -Siwei
> > > > > > >
> > > > > > > > >
> > > > > > > > > Signed-off-by: Eli Cohen <e...@nvidia.com>
> > > > > > > > > ---
> > > > > > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> > > > > > > > >  1 file changed, 8 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > index 88dde3455bfd..549ded074ff3 100644
> > > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > > > > >
> > > > > > > > >  static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > > > > >  {
> > > > > > > > > -	struct mlx5_virtq_attr attr;
> > > > > > > > > -
> > > > > > > > >  	if (!mvq->initialized)
> > > > > > > > >  		return;
> > > > > > > > >
> > > > > > > > > @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > > > > > > > >
> > > > > > > > >  	if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > > > > > > > >  		mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> > > > > > > > > -
> > > > > > > > > -	if (query_virtqueue(ndev, mvq, &attr)) {
> > > > > > > > > -		mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> > > > > > > > > -		return;
> > > > > > > > > -	}
> > > > > > > > > -	mvq->avail_idx = attr.available_index;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > >  static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > > > > > > > > --
> > > > > > > > > 2.29.2
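P.S. In case it helps clarify what I mean by the pairing above, this is
roughly how I picture the consumer side in a software implementation
(vhost-style). It is purely illustrative; the names are made up and this
is obviously not the mlx5 firmware path, which is opaque to me.

	#include <stdint.h>

	struct avail_ring_sketch {
		uint16_t flags;
		uint16_t idx;		/* the index the guest stores after its wmb */
		uint16_t ring[];	/* head indices exposed before idx is bumped */
	};

	/* stand-in for the consumer-side read barrier (virtio_rmb()/smp_rmb()) */
	#define consumer_rmb()	__atomic_thread_fence(__ATOMIC_ACQUIRE)

	/* returns the next available head, or -1 if nothing new was published */
	static int next_avail_head(const struct avail_ring_sketch *avail,
				   uint16_t last_avail_idx, uint16_t num)
	{
		/* re-read the index the guest published to memory ... */
		uint16_t avail_idx = avail->idx;

		/* ... and order that read before reading the ring entry it
		 * exposes; this pairs with the guest's virtio_wmb() in the
		 * virtqueue_add_split() snippet above */
		consumer_rmb();

		if (avail_idx == last_avail_idx)
			return -1;

		return avail->ring[last_avail_idx & (num - 1)];
	}

My (possibly wrong) understanding is that the wmb only orders the
stores; whether and when they reach memory is a cache coherency matter,
and the consumer simply picks up the new value whenever it re-reads
avail->idx, which may be exactly what you meant by the new hardware
object reading the available index from memory.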