On Thu, Aug 14, 2025 at 11:28:24AM +0200, Eugenio Perez Martin wrote:
> On Wed, Aug 13, 2025 at 4:06 PM Peter Xu <pet...@redhat.com> wrote:
> >
> > On Wed, Aug 13, 2025 at 11:25:00AM +0200, Eugenio Perez Martin wrote:
> > > On Mon, Aug 11, 2025 at 11:56 PM Peter Xu <pet...@redhat.com> wrote:
> > > >
> > > > On Mon, Aug 11, 2025 at 05:26:05PM -0400, Jonah Palmer wrote:
> > > > > This effort was started to reduce the guest-visible downtime caused
> > > > > by virtio-net/vhost-net/vhost-vDPA during live migration, especially
> > > > > vhost-vDPA.
> > > > >
> > > > > The downtime contributed by vhost-vDPA, for example, is not from
> > > > > having to migrate a lot of state but rather from expensive backend
> > > > > control-plane latency, like CVQ configurations (e.g. MQ queue pairs,
> > > > > RSS, MAC/VLAN filters, offload settings, MTU, etc.). These require
> > > > > kernel/HW NIC operations, which dominate its downtime.
> > > > >
> > > > > In other words, by migrating the state of virtio-net early (before
> > > > > the stop-and-copy phase), we can also start staging backend
> > > > > configurations, which is the main contributor to downtime when
> > > > > migrating a vhost-vDPA device.
> > > > >
> > > > > I apologize if this series gives the impression that we're migrating
> > > > > a lot of data here. It's more along the lines of moving control-plane
> > > > > latency out of the stop-and-copy phase.
> > > >
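To make that control-plane cost concrete, here is a rough sketch in plain C
of the commands that would have to be replayed on the destination's CVQ. The
class/command values are the ones from the virtio spec headers; replay_ctrl()
is a hypothetical stand-in for one CVQ round trip:

#include <stdint.h>
#include <stdio.h>

/* Illustrative only.  The class/command values below are from the virtio
 * spec (see linux/virtio_net.h); replay_ctrl() is a hypothetical stand-in
 * for one control-virtqueue round trip to the device. */
#define VIRTIO_NET_CTRL_MAC                 1
#define VIRTIO_NET_CTRL_MAC_TABLE_SET       0
#define VIRTIO_NET_CTRL_VLAN                2
#define VIRTIO_NET_CTRL_VLAN_ADD            0
#define VIRTIO_NET_CTRL_MQ                  4
#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET     0
#define VIRTIO_NET_CTRL_MQ_RSS_CONFIG       1
#define VIRTIO_NET_CTRL_GUEST_OFFLOADS      5
#define VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET  0

static void replay_ctrl(uint8_t cls, uint8_t cmd)
{
    /* In reality: build a virtio_net_ctrl_hdr, post it on the CVQ and wait
     * for the ack.  With vhost-vDPA each ack means a trip through the
     * kernel driver and usually the HW NIC -- that is the latency. */
    printf("CVQ command: class=%u cmd=%u\n", (unsigned)cls, (unsigned)cmd);
}

static void stage_net_config(void)
{
    replay_ctrl(VIRTIO_NET_CTRL_MQ, VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET); /* MQ pairs */
    replay_ctrl(VIRTIO_NET_CTRL_MQ, VIRTIO_NET_CTRL_MQ_RSS_CONFIG);   /* RSS */
    replay_ctrl(VIRTIO_NET_CTRL_MAC, VIRTIO_NET_CTRL_MAC_TABLE_SET);  /* MAC filters */
    replay_ctrl(VIRTIO_NET_CTRL_VLAN, VIRTIO_NET_CTRL_VLAN_ADD);      /* VLAN filters */
    replay_ctrl(VIRTIO_NET_CTRL_GUEST_OFFLOADS,
                VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET);                  /* offloads */
}
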
> > > > I see, thanks.
> > > >
> > > > Please add these into the cover letter of the next post.  IMHO it's
> > > > extremely important information to explain the real goal of this work.
> > > > I bet it is not what most people expect when reading the current cover
> > > > letter.
> > > >
> > > > Then it could have nothing to do with the iterative phase, am I right?
> > > >
> > > > What are the data needed for the dest QEMU to start staging backend
> > > > configurations to the HWs underneath?  Does dest QEMU already have
> > > > them in the cmdlines?
> > > >
> > > > Asking this because I want to know whether it can be done completely
> > > > without src QEMU at all, e.g. when dest QEMU starts.
> > > >
> > > > If src QEMU's data is still needed, please also first consider
> > > > providing such a facility using an "early VMSD" if at all possible:
> > > > feel free to refer to commit 3b95a71b22827d26178.
> > > >
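For reference, a minimal sketch of what such an "early VMSD" might look
like, assuming the .early_setup flag from that commit; the section name and
field here are illustrative, not the series' actual design:

/* Sketch only -- assumes QEMU's "migration/vmstate.h" and the VirtIONet
 * type from hw/net.  Marking a VMSD with .early_setup makes its state go
 * out during migration setup, before RAM, so the destination could start
 * staging backend configuration early. */
static const VMStateDescription vmstate_virtio_net_early = {
    .name = "virtio-net/early-config",   /* hypothetical section name */
    .version_id = 1,
    .minimum_version_id = 1,
    .early_setup = true,
    .fields = (const VMStateField[]) {
        VMSTATE_UINT16(curr_queue_pairs, VirtIONet), /* illustrative field */
        VMSTATE_END_OF_LIST()
    },
};

Such state is sent once, at setup time, which is where the limitation
raised below comes from.
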
> > >
> > > While it works for this series, it does not allow resending the state
> > > when the src device changes, for example, if the number of virtqueues
> > > is modified.
> >
> > Some explanation of "how syncing the number of vqueues helps downtime"
> > would help. Not "it might preheat things", but exactly why, and how that
> > differs when it's pure software versus when hardware is involved.
> >
> 
> According to nvidia engineers, configuring vqs (number, size, RSS, etc.)
> takes about 200ms:
> https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/
> 
> Adding Dragos here in case he can provide more details. Maybe the
> numbers have changed though.
For kernel mlx5_vdpa it can take even longer on larger systems (a 256 GB VM
with 32 VQs):
https://lore.kernel.org/virtualization/20240830105838.2666587-2-dtatu...@nvidia.com/

As pointed out in the above link, configuring VQs can amount to a lot of
time when many VQs are used (32 in our example). So having them
pre-configured during migration would be a worthwhile optimization.
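
For illustration, here is a rough sketch (not the actual QEMU or kernel
code) of the per-virtqueue setup that goes through a /dev/vhost-vdpa fd.
In mlx5_vdpa each of these ioctls can end up as firmware commands, so a
32-VQ device pays this sequence 32 times:

#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Sketch of per-VQ setup on a vhost-vDPA fd; error handling trimmed. */
static int setup_vq(int vdpa_fd, unsigned idx, unsigned ring_size,
                    struct vhost_vring_addr *addr, int kick_fd, int call_fd)
{
    struct vhost_vring_state num  = { .index = idx, .num = ring_size };
    struct vhost_vring_file kick  = { .index = idx, .fd = kick_fd };
    struct vhost_vring_file call  = { .index = idx, .fd = call_fd };

    if (ioctl(vdpa_fd, VHOST_SET_VRING_NUM, &num) ||   /* ring size */
        ioctl(vdpa_fd, VHOST_SET_VRING_ADDR, addr) ||  /* desc/avail/used */
        ioctl(vdpa_fd, VHOST_SET_VRING_KICK, &kick) || /* guest->host notify */
        ioctl(vdpa_fd, VHOST_SET_VRING_CALL, &call)) { /* host->guest irq */
        return -1;
    }
    return 0;
}

If this sequence can run while the source is still in the iterative phase,
the stop-and-copy window presumably only has to pay for the final
ring-index sync instead of the whole configuration.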

Thanks,
Dragos
