On Thu, Oct 19, 2023 at 10:35 PM Eugenio Pérez <epere...@redhat.com> wrote:
>
> Current memory operations like pinning may take a lot of time at the
> destination. Currently they are done after the source of the migration is
> stopped, and before the workload is resumed at the destination. This is a
> period where neither traffic can flow nor the VM workload can continue
> (downtime).
>
> We can do better, as we know the memory layout of the guest RAM at the
> destination from the moment the migration starts. Moving that operation
> earlier allows QEMU to communicate the maps to the kernel while the workload
> is still running on the source, so Linux can start mapping them. Ideally,
> all of the IOMMU is configured, but even if the vDPA parent driver uses an
> on-chip IOMMU and .set_map, we still save all the pinning time.
>
> Note that further device setup at the end of the migration may alter the
> guest memory layout. But, as with the previous point, many operations are
> still done incrementally, like memory pinning, so we save time anyway.
>
> The first bunch of patches just reorganizes the code so that memory-related
> operation parameters are shared between all vhost_vdpa devices. This is
> because the destination does not know which vhost_vdpa struct will have the
> registered listener member, so it is easier to place them in a shared struct
> than to keep them in the vhost_vdpa struct. A future version may squash or
> omit these patches.
>
> Only tested with vdpa_sim. I'm sending this before a full benchmark, as some
> work like [1] can be based on it, and Si-Wei agreed to benchmark this series
> with his experience.
I'd expect we can see some improvement even without other optimizations?
For example, do we see an improvement on mlx5? (Or we could probably add
some delay to the simulator to check.)

Thanks

>
> Future directions on top of this series may include:
> * Iterative migration of virtio-net devices, as it may reduce downtime per [1].
>   vhost-vdpa net can apply the configuration through CVQ in the destination
>   while the source is still migrating.
> * Move more things ahead of migration time, like DRIVER_OK.
> * Check that the devices of the destination are valid, and cancel the
>   migration in case they are not.
>
> [1] https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/
>
> Eugenio Pérez (18):
>   vdpa: add VhostVDPAShared
>   vdpa: move iova tree to the shared struct
>   vdpa: move iova_range to vhost_vdpa_shared
>   vdpa: move shadow_data to vhost_vdpa_shared
>   vdpa: use vdpa shared for tracing
>   vdpa: move file descriptor to vhost_vdpa_shared
>   vdpa: move iotlb_batch_begin_sent to vhost_vdpa_shared
>   vdpa: move backend_cap to vhost_vdpa_shared
>   vdpa: remove msg type of vhost_vdpa
>   vdpa: move iommu_list to vhost_vdpa_shared
>   vdpa: use VhostVDPAShared in vdpa_dma_map and unmap
>   vdpa: use dev_shared in vdpa_iommu
>   vdpa: move memory listener to vhost_vdpa_shared
>   vdpa: do not set virtio status bits if unneeded
>   vdpa: add vhost_vdpa_load_setup
>   vdpa: add vhost_vdpa_net_load_setup NetClient callback
>   vdpa: use shadow_data instead of first device v->shadow_vqs_enabled
>   virtio_net: register incremental migration handlers
>
>  include/hw/virtio/vhost-vdpa.h |  43 +++++---
>  include/net/net.h              |   4 +
>  hw/net/virtio-net.c            |  23 +++++
>  hw/virtio/vdpa-dev.c           |   7 +-
>  hw/virtio/vhost-vdpa.c         | 183 ++++++++++++++++++---------------
>  net/vhost-vdpa.c               | 127 ++++++++++++-----------
>  hw/virtio/trace-events         |  14 +--
>  7 files changed, 239 insertions(+), 162 deletions(-)
>
> --
> 2.39.3
>
>
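
To make the reorganization in the first patches concrete, here is a rough
sketch of the kind of shared state they collect, inferred purely from the
patch titles ("vdpa: add VhostVDPAShared", "vdpa: move iova tree / iova_range /
shadow_data / file descriptor / iotlb_batch_begin_sent / backend_cap /
iommu_list / memory listener to vhost_vdpa_shared"). The field names and types
below are guesses, not the actual definitions in the series; the IOVA range is
flattened to two integers to keep the sketch self-contained.

    /* Hypothetical sketch only; inferred from the patch titles, not the
     * series' real code. */
    #include "qemu/osdep.h"
    #include "exec/memory.h"               /* MemoryListener */
    #include "hw/virtio/vhost-iova-tree.h" /* VhostIOVATree  */

    typedef struct VhostVDPAShared {
        int device_fd;               /* vhost-vdpa char device, one per parent  */
        MemoryListener listener;     /* registered once, not once per vhost_vdpa */
        uint64_t iova_first;         /* usable IOVA range reported by the kernel */
        uint64_t iova_last;
        VhostIOVATree *iova_tree;    /* IOVA -> HVA translations for shadow vqs  */
        uint64_t backend_cap;        /* negotiated backend features              */
        bool iotlb_batch_begin_sent; /* batching state for IOTLB update messages */
        bool shadow_data;            /* whether data virtqueues are shadowed     */
    } VhostVDPAShared;

Every vhost_vdpa instance of the same parent device would then point at one
VhostVDPAShared, which is what lets the destination register the listener
before it knows which vhost_vdpa will end up owning it.

The downtime saving itself comes from the last patches ("vdpa: add
vhost_vdpa_load_setup", "vdpa: add vhost_vdpa_net_load_setup NetClient
callback", "virtio_net: register incremental migration handlers"): the
destination can start sending DMA maps, and therefore pinning, as soon as the
incoming migration starts. Below is a hedged sketch of how virtio-net might
dispatch that to its backends; the hook name, the callback signature and the
handler registration are assumptions, not the series' actual code.

    /* Hypothetical incoming-migration setup hook for virtio-net. */
    #include "qemu/osdep.h"
    #include "hw/virtio/virtio-net.h"
    #include "net/net.h"

    static int virtio_net_load_setup_sketch(VirtIONet *n)
    {
        for (int i = 0; i < n->max_queue_pairs; i++) {
            NetClientState *nc = qemu_get_subqueue(n->nic, i)->peer;

            /* load_setup stands for the NetClient callback the series adds
             * (assumed name and signature).  For a vhost-vdpa backend it
             * would register the shared memory listener now, so Linux pins
             * and maps guest RAM while the source is still running instead
             * of during the stop-and-copy window. */
            if (nc && nc->info->load_setup) {
                int r = nc->info->load_setup(nc, n->nic);
                if (r < 0) {
                    return r;
                }
            }
        }
        return 0;
    }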