On Thu, Aug 14, 2025 at 11:28:24AM +0200, Eugenio Perez Martin wrote:
> On Wed, Aug 13, 2025 at 4:06 PM Peter Xu <pet...@redhat.com> wrote:
> >
> > On Wed, Aug 13, 2025 at 11:25:00AM +0200, Eugenio Perez Martin wrote:
> > > On Mon, Aug 11, 2025 at 11:56 PM Peter Xu <pet...@redhat.com> wrote:
> > > >
> > > > On Mon, Aug 11, 2025 at 05:26:05PM -0400, Jonah Palmer wrote:
> > > > > This effort was started to reduce the guest-visible downtime of
> > > > > virtio-net/vhost-net/vhost-vDPA during live migration, especially
> > > > > vhost-vDPA.
> > > > >
> > > > > The downtime contributed by vhost-vDPA, for example, is not from
> > > > > having to migrate a lot of state, but rather from expensive backend
> > > > > control-plane latency such as CVQ configuration (e.g. MQ queue
> > > > > pairs, RSS, MAC/VLAN filters, offload settings, MTU, etc.). Doing
> > > > > this requires kernel/HW NIC operations, which dominate its
> > > > > downtime.
> > > > >
> > > > > In other words, by migrating the state of virtio-net early (before
> > > > > the stop-and-copy phase), we can also start staging backend
> > > > > configurations, which is the main contributor to downtime when
> > > > > migrating a vhost-vDPA device.
> > > > >
> > > > > I apologize if this series gives the impression that we're
> > > > > migrating a lot of data here. It's more along the lines of moving
> > > > > control-plane latency out of the stop-and-copy phase.
> > > >
> > > > I see, thanks.
> > > >
> > > > Please add these into the cover letter of the next post. IMHO it's
> > > > extremely important information to explain the real goal of this
> > > > work. I bet it is not expected by most people when reading the
> > > > current cover letter.
> > > >
> > > > Then it could have nothing to do with the iterative phase, am I
> > > > right?
> > > >
> > > > What data does the dest QEMU need to start staging backend
> > > > configurations to the HWs underneath? Does dest QEMU already have
> > > > them in the cmdlines?
> > > >
> > > > Asking this because I want to know whether it can be done completely
> > > > without src QEMU at all, e.g. when dest QEMU starts.
> > > >
> > > > If src QEMU's data is still needed, please also first consider
> > > > providing such a facility using an "early VMSD" if it is ever
> > > > possible: feel free to refer to commit 3b95a71b22827d26178.
> > > >
> > >
> > > While it works for this series, it does not allow resending the state
> > > when the src device changes. For example, if the number of virtqueues
> > > is modified.
> >
> > Some explanation of "how syncing the number of vqueues helps downtime"
> > would help. Not "it might preheat things", but exactly why, and how
> > that differs when it's pure software versus when hardware is involved.
> >
>
> Per nvidia engineers, configuring vqs (number, size, RSS, etc.) takes
> about ~200ms:
> https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/
>
> Adding Dragos here in case he can provide more details. Maybe the
> numbers have changed though.

For kernel mlx5_vdpa it can be even more on larger systems (256 GB VM
with 32 VQs):
https://lore.kernel.org/virtualization/20240830105838.2666587-2-dtatu...@nvidia.com/
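To put the scaling in perspective: each VQ costs a handful of ioctls
through the vhost-vDPA uAPI before the device can run, and on HW vDPA
several of those turn into device/firmware commands. A rough sketch of
the per-VQ sequence from userspace (simplified; restore_vq, vdpa_fd and
the error handling here are illustrative, not QEMU's actual code):

#include <sys/ioctl.h>
#include <linux/vhost.h>

/*
 * Simplified sketch of the per-VQ control-plane cost.  Each ioctl goes
 * through the kernel vDPA framework; on mlx5_vdpa several of them end
 * up as device/firmware commands.
 */
static int restore_vq(int vdpa_fd, unsigned int idx,
                      unsigned int ring_size,
                      struct vhost_vring_addr *addr)
{
    struct vhost_vring_state num    = { .index = idx, .num = ring_size };
    struct vhost_vring_state base   = { .index = idx, .num = 0 };
    struct vhost_vring_state enable = { .index = idx, .num = 1 };

    addr->index = idx;
    if (ioctl(vdpa_fd, VHOST_SET_VRING_NUM, &num) ||    /* ring size */
        ioctl(vdpa_fd, VHOST_SET_VRING_BASE, &base) ||  /* avail index */
        ioctl(vdpa_fd, VHOST_SET_VRING_ADDR, addr) ||   /* ring placement */
        ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &enable)) {
        return -1;
    }
    return 0;
}

On top of that come the CVQ commands (VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET,
RSS, MAC/VLAN filters, ...), each a round trip of its own, so with 32
VQs the per-VQ cost dominates.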
As pointed out in the link above, configuring VQs can take a lot of
time when many VQs are used (32 in our example). So having them
pre-configured during migration would be a worthwhile optimization.

Thanks,
Dragos
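P.S. Since the "early VMSD" option came up above: as I understand it,
it would look something like the below for virtio-net. This is a sketch
only, assuming QEMU's existing early_setup VMSD flag; the VMSD itself
and the choice of fields are hypothetical:

static const VMStateDescription vmstate_virtio_net_early = {
    .name = "virtio-net-device/early",
    .version_id = 1,
    .minimum_version_id = 1,
    .early_setup = true,   /* sent during migration setup, before RAM */
    .fields = (const VMStateField[]) {
        /* hypothetical: enough for the dest to start staging VQ config */
        VMSTATE_UINT16(curr_queue_pairs, VirtIONet),
        VMSTATE_END_OF_LIST()
    },
};

As Eugenio notes, a VMSD like this is sent once at setup, so it cannot
follow up if the source reconfigures (e.g. the guest changes the number
of queue pairs) afterwards.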