On Fri, Nov 3, 2023 at 9:41 PM Si-Wei Liu <si-wei....@oracle.com> wrote:
>
> On 11/2/2023 3:12 AM, Si-Wei Liu wrote:
> >
> > On 10/19/2023 7:34 AM, Eugenio Pérez wrote:
> >> Current memory operations like pinning may take a lot of time at the
> >> destination. Currently they are done after the source of the migration is
> >> stopped, and before the workload is resumed at the destination. This is a
> >> period where neither traffic can flow, nor the VM workload can continue
> >> (downtime).
> >>
> >> We can do better, as we know the memory layout of the guest RAM at the
> >> destination from the moment the migration starts. Moving that operation
> >> allows QEMU to communicate the maps to the kernel while the workload is
> >> still running in the source, so Linux can start mapping them. Ideally, all
> >> the IOMMU is configured, but even if the vDPA parent driver uses an on-chip
> >> IOMMU and .set_map we're still saving all the pinning time.
> > I get what you want to say, though I'm not sure how pinning is relevant to
> > on-chip IOMMU and .set_map here; essentially pinning is required for
> > all parent vdpa drivers that perform DMA and hence don't want VM pages to
> > move around.
> >>
> >> Note that further device setup at the end of the migration may alter the
> >> guest memory layout. But, same as the previous point, many operations are
> >> still done incrementally, like memory pinning, so we're saving time anyway.
> >>
> >> The first bunch of patches just reorganizes the code, so memory-related
> >> operation parameters are shared between all vhost_vdpa devices. This is
> >> because the destination does not know which vhost_vdpa struct will have the
> >> registered listener member, so it is easier to place them in a shared struct
> >> rather than keep them in the vhost_vdpa struct. A future version may squash
> >> or omit these patches.
> > It looks like this VhostVDPAShared facility (patches 1-13) is also what I
> > need in my SVQ descriptor group series [*], for which I've built a
> > similar construct there. If possible please try to merge this in ASAP.
> > I'll rework my series on top of that.
> >
> > [*]
> > https://github.com/siwliu-kernel/qemu/commit/813518354af5ee8a6e867b2bf7dff3d6004fbcd5
> >
> >> Only tested with vdpa_sim. I'm sending this before the full benchmark, as
> >> some work like [1] can be based on it, and Si-Wei agreed to benchmark this
> >> series with his experience.
> > Haven't done the full benchmark compared to pre-map at destination yet,
> Hi Eugenio,
>
> I just noticed one thing that affects the performance benchmark for this
> series in terms of migration total_time (to be fair, it's mlx5_vdpa
> specific). It looks like iotlb map batching is not acked (via
> vhost_vdpa_set_backend_cap) at the point of vhost_vdpa_load_setup,
> effectively causing considerable time to be spent on hundreds of dma_map
> calls from listener_register(). The equivalent code had been implemented
> in my destination pre-map patch [1]. Although I can benchmark the current
> patchset by removing batching from my code, I guess that's not the goal of
> this benchmark, right?
>
> It would be best to have map batching in place, so the benchmarks for
> both options can match. What do you think?
>
This is definitely a bug I need to fix too, thanks for catching it! :). But
I'm not sure how this happens, as net_init_vhost_vdpa should be called
before .load_setup. I'll take a deeper look, thanks!

> Thanks,
> -Siwei
>
> [1]
> https://github.com/siwliu-kernel/qemu/commit/0ce225b0c7e618163ea09da3846c93c4de2f85ed#diff-45489c6f25dc36fd84e1cd28cbf3b8ff03301e2d24dadb6d1c334c9e8f14c00cR639
>
> > though an observation is that the destination QEMU seems very easy to
> > get stuck for a very long time while in the middle of pinning pages.
> > During this period, any client doing a read-only QMP query or executing
> > an HMP info command gets frozen indefinitely (subject to how large the
> > memory being pinned is). Is it possible to unblock those QMP requests
> > or HMP commands from being executed (at least the read-only ones) while
> > in migration? Yield from the load_setup coroutine and spawn another
> > thread?
> >
> > Having said that, I'm not sure if .load_setup is a good fit for what we
> > want to do. Searching all current users of .load_setup, either the job
> > can be done instantly or the task is time-bound without trapping into
> > the kernel for too long. Maybe pinning is too special a use case here...
> >
> > -Siwei
> >>
> >> Future directions on top of this series may include:
> >> * Iterative migration of virtio-net devices, as it may reduce downtime
> >>   per [1]. vhost-vdpa net can apply the configuration through CVQ in the
> >>   destination while the source is still migrating.
> >> * Move more things ahead of migration time, like DRIVER_OK.
> >> * Check that the devices of the destination are valid, and cancel the
> >>   migration in case they are not.
> >>
> >> [1]
> >> https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/
> >>
> >> Eugenio Pérez (18):
> >>   vdpa: add VhostVDPAShared
> >>   vdpa: move iova tree to the shared struct
> >>   vdpa: move iova_range to vhost_vdpa_shared
> >>   vdpa: move shadow_data to vhost_vdpa_shared
> >>   vdpa: use vdpa shared for tracing
> >>   vdpa: move file descriptor to vhost_vdpa_shared
> >>   vdpa: move iotlb_batch_begin_sent to vhost_vdpa_shared
> >>   vdpa: move backend_cap to vhost_vdpa_shared
> >>   vdpa: remove msg type of vhost_vdpa
> >>   vdpa: move iommu_list to vhost_vdpa_shared
> >>   vdpa: use VhostVDPAShared in vdpa_dma_map and unmap
> >>   vdpa: use dev_shared in vdpa_iommu
> >>   vdpa: move memory listener to vhost_vdpa_shared
> >>   vdpa: do not set virtio status bits if unneeded
> >>   vdpa: add vhost_vdpa_load_setup
> >>   vdpa: add vhost_vdpa_net_load_setup NetClient callback
> >>   vdpa: use shadow_data instead of first device v->shadow_vqs_enabled
> >>   virtio_net: register incremental migration handlers
> >>
> >>  include/hw/virtio/vhost-vdpa.h |  43 +++++---
> >>  include/net/net.h              |   4 +
> >>  hw/net/virtio-net.c            |  23 +++++
> >>  hw/virtio/vdpa-dev.c           |   7 +-
> >>  hw/virtio/vhost-vdpa.c         | 183 ++++++++++++++++++---------------
> >>  net/vhost-vdpa.c               | 127 ++++++++++++-----------
> >>  hw/virtio/trace-events         |  14 +--
> >>  7 files changed, 239 insertions(+), 162 deletions(-)
> >>
> >
>
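For reference, judging only from the patch subjects in the cover letter
above, the VhostVDPAShared refactor of patches 1-13 could end up looking
roughly like the sketch below. The field names, types, and include paths
are guesses that mirror today's per-device vhost_vdpa members; they are
not the actual patch contents:

/* Hypothetical sketch only -- not the actual patches. */
#include "qemu/osdep.h"
#include "qemu/queue.h"
#include "exec/memory.h"                        /* MemoryListener */
#include "hw/virtio/vhost-iova-tree.h"          /* VhostIOVATree */
#include "standard-headers/linux/vhost_types.h" /* vhost_vdpa_iova_range */

typedef struct VhostVDPAShared {
    int device_fd;                        /* moved from vhost_vdpa */
    MemoryListener listener;              /* one listener for all devices */
    struct vhost_vdpa_iova_range iova_range;
    VhostIOVATree *iova_tree;             /* SVQ IOVA->HVA translations */
    bool shadow_data;
    bool iotlb_batch_begin_sent;
    uint64_t backend_cap;
    QLIST_HEAD(, vdpa_iommu) iommu_list;
} VhostVDPAShared;

/* Each device would then keep a pointer to the single shared instance: */
struct vhost_vdpa {
    VhostVDPAShared *shared;
    /* ... per-device fields (index, shadow_vqs, ...) stay here ... */
};

That way the destination can register the memory listener (and start the
maps/pinning) before it knows which vhost_vdpa will own it, which is the
whole point of doing it at .load_setup time.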
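On the map batching point above: here is a minimal, standalone sketch (not
QEMU code, helper names made up) of the batching flow the benchmark depends
on, using only the Linux vhost uapi. The point is just that
VHOST_BACKEND_F_IOTLB_BATCH has to be acked through
VHOST_SET_BACKEND_FEATURES before BATCH_BEGIN/END can wrap the
VHOST_IOTLB_UPDATE messages, so a .set_map parent like mlx5_vdpa can map and
pin once at batch end instead of on every dma_map call:

/*
 * Minimal standalone sketch of vhost-vdpa IOTLB map batching.
 * Error handling is trimmed on purpose.
 */
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>        /* VHOST_SET_BACKEND_FEATURES */
#include <linux/vhost_types.h>  /* vhost_msg_v2, VHOST_IOTLB_* */

/* Send one v2 IOTLB message to the vhost-vdpa device fd. */
static int send_iotlb(int fd, const struct vhost_iotlb_msg *iotlb)
{
    struct vhost_msg_v2 msg = {
        .type = VHOST_IOTLB_MSG_V2,
        .iotlb = *iotlb,
    };

    return write(fd, &msg, sizeof(msg)) == sizeof(msg) ? 0 : -1;
}

/*
 * Map 'n' regions in one batch.  The caller fills iova/size/uaddr/perm of
 * each entry; a real implementation would check the features returned by
 * VHOST_GET_BACKEND_FEATURES before acking them.
 */
int map_regions_batched(int fd, struct vhost_iotlb_msg *maps, int n)
{
    uint64_t features = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) |
                        (1ULL << VHOST_BACKEND_F_IOTLB_BATCH);
    struct vhost_iotlb_msg begin = { .type = VHOST_IOTLB_BATCH_BEGIN };
    struct vhost_iotlb_msg end   = { .type = VHOST_IOTLB_BATCH_END };

    /* Without this ack, every update below is a separate map operation. */
    if (ioctl(fd, VHOST_SET_BACKEND_FEATURES, &features) < 0) {
        return -1;
    }

    send_iotlb(fd, &begin);
    for (int i = 0; i < n; i++) {
        maps[i].type = VHOST_IOTLB_UPDATE;   /* add IOVA -> uaddr mapping */
        send_iotlb(fd, &maps[i]);
    }
    send_iotlb(fd, &end);   /* parent driver commits/pins the whole batch */
    return 0;
}

In QEMU the equivalent ack is what vhost_vdpa_set_backend_cap does, which is
why it needs to have run before the listener starts issuing maps from
.load_setup.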