On Fri, Nov 3, 2023 at 9:41 PM Si-Wei Liu <si-wei....@oracle.com> wrote:
>
> On 11/2/2023 3:12 AM, Si-Wei Liu wrote:
> >
> > On 10/19/2023 7:34 AM, Eugenio Pérez wrote:
> >> Current memory operations like pinning may take a lot of time at the
> >> destination. Currently they are done after the source of the migration is
> >> stopped, and before the workload is resumed at the destination. This is a
> >> period where neither traffic can flow, nor the VM workload can continue
> >> (downtime).
> >>
> >> We can do better, as we know the memory layout of the guest RAM at the
> >> destination from the moment the migration starts. Moving that operation
> >> allows QEMU to communicate the maps to the kernel while the workload is
> >> still running in the source, so Linux can start mapping them. Ideally, all
> >> the IOMMU is configured, but even if the vDPA parent driver uses an on-chip
> >> IOMMU and .set_map we're still saving all the pinning time.
> > I get what you want to say, though I'm not sure how pinning is relevant to
> > on-chip IOMMU and .set_map here; essentially pinning is required for
> > all parent vdpa drivers that perform DMA and hence don't want VM pages to
> > move around.
> >>
> >> Note that further device setup at the end of the migration may alter the
> >> guest memory layout. But, same as the previous point, many operations are
> >> still done incrementally, like memory pinning, so we're saving time anyway.
> >>
> >> The first bunch of patches just reorganizes the code, so memory-related
> >> operation parameters are shared between all vhost_vdpa devices. This is
> >> because the destination does not know which vhost_vdpa struct will have the
> >> registered listener member, so it is easier to place them in a shared struct
> >> rather than keep them in the vhost_vdpa struct. A future version may squash
> >> or omit these patches.
> > It looks like this VhostVDPAShared facility (patches 1-13) is also what I
> > need in my SVQ descriptor group series [*], for which I've built a
> > similar construct there. If possible please try to merge this in ASAP.
> > I'll rework my series on top of that.
> >
> > [*]
> > https://github.com/siwliu-kernel/qemu/commit/813518354af5ee8a6e867b2bf7dff3d6004fbcd5
> >
> >> Only tested with vdpa_sim. I'm sending this before the full benchmark, as
> >> some work like [1] can be based on it, and Si-Wei agreed to benchmark this
> >> series with his experience.
> > Haven't done the full benchmark compared to pre-map at destination yet,
> Hi Eugenio,
>
> I just noticed one thing that affects the performance benchmark for this
> series in terms of migration total_time (to be fair, it's mlx5_vdpa
> specific). It looks like iotlb map batching is not acked (via
> vhost_vdpa_set_backend_cap) at the point of vhost_vdpa_load_setup,
> effectively causing considerable time to be spent on hundreds of dma_map
> calls from listener_register(). The equivalent code had been implemented
> in my destination pre-map patch [1]. Although I can benchmark the current
> patchset by removing batching from my code, I guess that's not the goal of
> this benchmark, right?
>
> It would be best to have map batching in place, so the benchmarks for
> both options can match. What do you think?
>
This is definitely a bug I need to fix too, thanks for catching it! :). But
I'm not sure how this happens, as net_init_vhost_vdpa should be called
before .load_setup. I'll take a deeper look, thanks!

> Thanks,
> -Siwei
>
> [1]
> https://github.com/siwliu-kernel/qemu/commit/0ce225b0c7e618163ea09da3846c93c4de2f85ed#diff-45489c6f25dc36fd84e1cd28cbf3b8ff03301e2d24dadb6d1c334c9e8f14c00cR639
>
> > though an observation is that the destination QEMU seems very easy to
> > get stuck for a very long time while in the middle of pinning pages.
> > During this period, any client doing a read-only QMP query or executing
> > an HMP info command gets frozen indefinitely (subject to how large the
> > memory being pinned is). Is it possible to unblock those QMP requests
> > or HMP commands from being executed (at least the read-only ones) while
> > in migration? Yield from the load_setup coroutine and spawn another
> > thread?
> >
> > Having said that, I'm not sure if .load_setup is a good fit for what we
> > want to do. Searching all current users of .load_setup, either the job
> > can be done instantly or the task is time-bound without trapping into
> > the kernel for too long. Maybe pinning is too special a use case here...
> >
> > -Siwei
> >>
> >> Future directions on top of this series may include:
> >> * Iterative migration of virtio-net devices, as it may reduce downtime
> >>   per [1]. vhost-vdpa net can apply the configuration through CVQ in the
> >>   destination while the source is still migrating.
> >> * Move more things ahead of migration time, like DRIVER_OK.
> >> * Check that the devices of the destination are valid, and cancel the
> >>   migration in case they are not.
> >>
> >> [1]
> >> https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/
> >>
> >> Eugenio Pérez (18):
> >>   vdpa: add VhostVDPAShared
> >>   vdpa: move iova tree to the shared struct
> >>   vdpa: move iova_range to vhost_vdpa_shared
> >>   vdpa: move shadow_data to vhost_vdpa_shared
> >>   vdpa: use vdpa shared for tracing
> >>   vdpa: move file descriptor to vhost_vdpa_shared
> >>   vdpa: move iotlb_batch_begin_sent to vhost_vdpa_shared
> >>   vdpa: move backend_cap to vhost_vdpa_shared
> >>   vdpa: remove msg type of vhost_vdpa
> >>   vdpa: move iommu_list to vhost_vdpa_shared
> >>   vdpa: use VhostVDPAShared in vdpa_dma_map and unmap
> >>   vdpa: use dev_shared in vdpa_iommu
> >>   vdpa: move memory listener to vhost_vdpa_shared
> >>   vdpa: do not set virtio status bits if unneeded
> >>   vdpa: add vhost_vdpa_load_setup
> >>   vdpa: add vhost_vdpa_net_load_setup NetClient callback
> >>   vdpa: use shadow_data instead of first device v->shadow_vqs_enabled
> >>   virtio_net: register incremental migration handlers
> >>
> >>  include/hw/virtio/vhost-vdpa.h |  43 +++++---
> >>  include/net/net.h              |   4 +
> >>  hw/net/virtio-net.c            |  23 +++++
> >>  hw/virtio/vdpa-dev.c           |   7 +-
> >>  hw/virtio/vhost-vdpa.c         | 183 ++++++++++++++++++---------------
> >>  net/vhost-vdpa.c               | 127 ++++++++++++-----------
> >>  hw/virtio/trace-events         |  14 +--
> >>  7 files changed, 239 insertions(+), 162 deletions(-)
> >>
> >
>
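For reference, judging only from the patch subjects in the cover letter
above, the VhostVDPAShared refactor of patches 1-13 could end up looking
roughly like the sketch below. The field names, types, and include paths
are guesses that mirror today's per-device vhost_vdpa members; they are
not the actual patch contents:

/* Hypothetical sketch only -- not the actual patches. */
#include "qemu/osdep.h"
#include "qemu/queue.h"
#include "exec/memory.h"                        /* MemoryListener */
#include "hw/virtio/vhost-iova-tree.h"          /* VhostIOVATree */
#include "standard-headers/linux/vhost_types.h" /* vhost_vdpa_iova_range */

typedef struct VhostVDPAShared {
    int device_fd;                        /* moved from vhost_vdpa */
    MemoryListener listener;              /* one listener for all devices */
    struct vhost_vdpa_iova_range iova_range;
    VhostIOVATree *iova_tree;             /* SVQ IOVA->HVA translations */
    bool shadow_data;
    bool iotlb_batch_begin_sent;
    uint64_t backend_cap;
    QLIST_HEAD(, vdpa_iommu) iommu_list;
} VhostVDPAShared;

/* Each device would then keep a pointer to the single shared instance: */
struct vhost_vdpa {
    VhostVDPAShared *shared;
    /* ... per-device fields (index, shadow_vqs, ...) stay here ... */
};

That way the destination can register the memory listener (and start the
maps/pinning) before it knows which vhost_vdpa will own it, which is the
whole point of doing it at .load_setup time.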
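On the map batching point above: here is a minimal, standalone sketch (not
QEMU code, helper names made up) of the batching flow the benchmark depends
on, using only the Linux vhost uapi. The point is just that
VHOST_BACKEND_F_IOTLB_BATCH has to be acked through
VHOST_SET_BACKEND_FEATURES before BATCH_BEGIN/END can wrap the
VHOST_IOTLB_UPDATE messages, so a .set_map parent like mlx5_vdpa can map and
pin once at batch end instead of on every dma_map call:

/*
 * Minimal standalone sketch of vhost-vdpa IOTLB map batching.
 * Error handling is trimmed on purpose.
 */
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>        /* VHOST_SET_BACKEND_FEATURES */
#include <linux/vhost_types.h>  /* vhost_msg_v2, VHOST_IOTLB_* */

/* Send one v2 IOTLB message to the vhost-vdpa device fd. */
static int send_iotlb(int fd, const struct vhost_iotlb_msg *iotlb)
{
    struct vhost_msg_v2 msg = {
        .type = VHOST_IOTLB_MSG_V2,
        .iotlb = *iotlb,
    };

    return write(fd, &msg, sizeof(msg)) == sizeof(msg) ? 0 : -1;
}

/*
 * Map 'n' regions in one batch.  The caller fills iova/size/uaddr/perm of
 * each entry; a real implementation would check the features returned by
 * VHOST_GET_BACKEND_FEATURES before acking them.
 */
int map_regions_batched(int fd, struct vhost_iotlb_msg *maps, int n)
{
    uint64_t features = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) |
                        (1ULL << VHOST_BACKEND_F_IOTLB_BATCH);
    struct vhost_iotlb_msg begin = { .type = VHOST_IOTLB_BATCH_BEGIN };
    struct vhost_iotlb_msg end   = { .type = VHOST_IOTLB_BATCH_END };

    /* Without this ack, every update below is a separate map operation. */
    if (ioctl(fd, VHOST_SET_BACKEND_FEATURES, &features) < 0) {
        return -1;
    }

    send_iotlb(fd, &begin);
    for (int i = 0; i < n; i++) {
        maps[i].type = VHOST_IOTLB_UPDATE;   /* add IOVA -> uaddr mapping */
        send_iotlb(fd, &maps[i]);
    }
    send_iotlb(fd, &end);   /* parent driver commits/pins the whole batch */
    return 0;
}

In QEMU the equivalent ack is what vhost_vdpa_set_backend_cap does, which is
why it needs to have run before the listener starts issuing maps from
.load_setup.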