On Mon, Aug 18, 2025 at 4:46 PM Jonah Palmer <jonah.pal...@oracle.com> wrote:
>
> On 8/18/25 2:51 AM, Eugenio Perez Martin wrote:
> > On Fri, Aug 15, 2025 at 4:50 PM Jonah Palmer
> > <jonah.pal...@oracle.com> wrote:
> >>
> >> On 8/14/25 5:28 AM, Eugenio Perez Martin wrote:
> >>> On Wed, Aug 13, 2025 at 4:06 PM Peter Xu <pet...@redhat.com> wrote:
> >>>>
> >>>> On Wed, Aug 13, 2025 at 11:25:00AM +0200, Eugenio Perez Martin wrote:
> >>>>> On Mon, Aug 11, 2025 at 11:56 PM Peter Xu <pet...@redhat.com> wrote:
> >>>>>>
> >>>>>> On Mon, Aug 11, 2025 at 05:26:05PM -0400, Jonah Palmer wrote:
> >>>>>>> This effort was started to reduce the guest-visible downtime
> >>>>>>> caused by virtio-net/vhost-net/vhost-vDPA during live migration,
> >>>>>>> especially vhost-vDPA.
> >>>>>>>
> >>>>>>> The downtime contributed by vhost-vDPA, for example, comes not
> >>>>>>> from having to migrate a lot of state but rather from expensive
> >>>>>>> backend control-plane latency like CVQ configuration (e.g. MQ
> >>>>>>> queue pairs, RSS, MAC/VLAN filters, offload settings, MTU, etc.).
> >>>>>>> Doing this requires kernel/HW NIC operations, which dominate its
> >>>>>>> downtime.
> >>>>>>>
> >>>>>>> In other words, by migrating the state of virtio-net early
> >>>>>>> (before the stop-and-copy phase), we can also start staging
> >>>>>>> backend configurations, which is the main contributor to
> >>>>>>> downtime when migrating a vhost-vDPA device.
> >>>>>>>
> >>>>>>> I apologize if this series gives the impression that we're
> >>>>>>> migrating a lot of data here. It's more along the lines of
> >>>>>>> moving control-plane latency out of the stop-and-copy phase.
> >>>>>>
> >>>>>> I see, thanks.
> >>>>>>
> >>>>>> Please add these into the cover letter of the next post. IMHO it's
> >>>>>> extremely important information to explain the real goal of this
> >>>>>> work. I bet it is not expected for most people when reading the
> >>>>>> current cover letter.
> >>>>>>
> >>>>>> Then it could have nothing to do with the iterative phase, am I
> >>>>>> right?
> >>>>>>
> >>>>>> What are the data needed for the dest QEMU to start staging
> >>>>>> backend configurations to the HWs underneath? Does dest QEMU
> >>>>>> already have them in the cmdlines?
> >>>>>>
> >>>>>> Asking this because I want to know whether it can be done
> >>>>>> completely without src QEMU at all, e.g. when dest QEMU starts.
> >>>>>>
> >>>>>> If src QEMU's data is still needed, please also first consider
> >>>>>> providing such facility using an "early VMSD" if it is ever
> >>>>>> possible: feel free to refer to commit 3b95a71b22827d26178.
> >>>>>
> >>>>> While it works for this series, it does not allow resending the
> >>>>> state when the src device changes. For example, if the number of
> >>>>> virtqueues is modified.
> >>>>
> >>>> Some explanation on "how syncing the number of vqueues helps
> >>>> downtime" would help. Not "it might preheat things", but exactly
> >>>> why, and how that differs when it's pure software, and when
> >>>> hardware will be involved.
> >>>
> >>> According to nvidia engineers, configuring the vqs (number, size,
> >>> RSS, etc.) takes about ~200ms:
> >>> https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/
> >>>
> >>> Adding Dragos here in case he can provide more details. Maybe the
> >>> numbers have changed though.
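(To make the "early VMSD" option above concrete: as far as I know it is
just a VMStateDescription with .early_setup set, which is what
virtio-mem uses to send immutable state before RAM. A minimal sketch;
the name and field below are illustrative only, not what this series
actually sends:

/*
 * Sketch modeled on vmstate_virtio_mem_device_early in
 * hw/virtio/virtio-mem.c. Illustrative field, not the series' payload.
 */
static const VMStateDescription vmstate_virtio_net_early = {
    .name = "virtio-net-device/early",
    .version_id = 1,
    .minimum_version_id = 1,
    .early_setup = true, /* sent during setup, before RAM streaming */
    .fields = (const VMStateField[]) {
        VMSTATE_UINT16(max_queue_pairs, VirtIONet),
        VMSTATE_END_OF_LIST()
    },
};

The catch, as I said above, is that such a section is sent once during
setup, so it cannot resend anything if the src device changes
afterwards.)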
> >>>
> >>> And I guess the difference with pure SW will always come down to
> >>> PCI communications, which I assume are slower than configuring the
> >>> host SW device in RAM or even CPU cache. But I admit that proper
> >>> profiling is needed before making those claims.
> >>>
> >>> Jonah, can you print the time it takes to configure the vDPA device
> >>> with traces vs the time it takes to enable the dataplane of the
> >>> device? So we can get an idea of how much time we save with this.
> >>>
> >>
> >> Let me know if this isn't what you're looking for.
> >>
> >> I'm assuming by "configuration time" you mean:
> >>  - Time from device startup (entry to vhost_vdpa_dev_start()) to
> >>    right before we start enabling the vrings (e.g.
> >>    VHOST_VDPA_SET_VRING_ENABLE in vhost_vdpa_net_cvq_load()).
> >>
> >> And by "time taken to enable the dataplane" I'm assuming you mean:
> >>  - Time right before we start enabling the vrings (see above) to
> >>    right after we enable the last vring (at the end of
> >>    vhost_vdpa_net_cvq_load()).
> >>
> >> Guest specs: 128G Mem, SVQ=on, CVQ=on, 8 queue pairs:
> >>
> >> -netdev type=vhost-vdpa,vhostdev=$VHOST_VDPA_0,id=vhost-vdpa0,
> >>   queues=8,x-svq=on
> >>
> >> -device virtio-net-pci,netdev=vhost-vdpa0,id=vdpa0,bootindex=-1,
> >>   romfile=,page-per-vq=on,mac=$VF1_MAC,ctrl_vq=on,mq=on,
> >>   ctrl_vlan=off,vectors=18,host_mtu=9000,
> >>   disable-legacy=on,disable-modern=off
> >>
> >> ---
> >>
> >> Configuration time: ~31s
> >> Dataplane enable time: ~0.14ms
> >>
> >
> > I was vague, but yes, that's representative enough! It would be more
> > accurate if the configuration time ended by the time QEMU enables the
> > first queue of the dataplane, though.
> >
> > As Si-Wei mentions, is v->shared->listener_registered == true at the
> > beginning of vhost_vdpa_dev_start?
> >
>
> Ah, I also realized that the QEMU I was using for measurements was
> from a version before the listener_registered member was introduced.
>
> I retested with the latest changes in QEMU and set x-svq=off, e.g.:
> guest specs: 128G Mem, SVQ=off, CVQ=on, 8 queue pairs. I ran the test
> 3 times for measurements.
>
> v->shared->listener_registered == false at the beginning of
> vhost_vdpa_dev_start().
>
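Thanks for the numbers! Not a blocker, but if we keep comparing runs it
may help to take both timestamps the same way every time. A minimal
sketch with plain glib; everything here is invented for illustration
and is not part of the series:

/* Stamp helpers; g_get_monotonic_time() returns microseconds. */
static gint64 vdpa_ts_us;

static void vdpa_stamp_begin(void)
{
    vdpa_ts_us = g_get_monotonic_time();
}

static void vdpa_stamp_end(const char *what)
{
    g_message("%s: %.3f ms", what,
              (g_get_monotonic_time() - vdpa_ts_us) / 1000.0);
}

Calling vdpa_stamp_begin() at the top of vhost_vdpa_dev_start() and
vdpa_stamp_end("config") right after the first
VHOST_VDPA_SET_VRING_ENABLE would match the definition we are using
above.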
Let's move the effect of the mem pinning out of the downtime by
registering the listener before the migration. Can you check why it is
not registered at vhost_vdpa_set_owner?

> ---
>
> Configuration time: Time from first entry into vhost_vdpa_dev_start()
> to right after QEMU enables the first VQ.
>  - 26.947s, 26.606s, 27.326s
>
> Enable dataplane: Time from right after the first VQ is enabled to
> right after the last VQ is enabled.
>  - 0.081ms, 0.081ms, 0.079ms
>
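For context on my question: if I recall master correctly,
vhost_vdpa_set_owner should already be registering the listener, along
the lines of the sketch below (from memory, not verbatim; error paths
and the vIOMMU special case trimmed):

static int vhost_vdpa_set_owner(struct vhost_dev *dev)
{
    struct vhost_vdpa *v = dev->opaque;
    int r;

    if (!vhost_vdpa_first_dev(dev)) {
        return 0;
    }

    r = vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
    if (unlikely(r < 0)) {
        return r;
    }

    /*
     * Register the listener here so guest memory is pinned well before
     * dev_start; with a vIOMMU it is re-registered at dev_start.
     */
    memory_listener_register(&v->shared->listener, &address_space_memory);
    v->shared->listener_registered = true;

    return 0;
}

If your setup never reaches that path, knowing why would also tell us
how much of the ~27s is memory pinning versus actual device
configuration.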