On Mon, Aug 18, 2025 at 4:46 PM Jonah Palmer <jonah.pal...@oracle.com> wrote:
>
> On 8/18/25 2:51 AM, Eugenio Perez Martin wrote:
> > On Fri, Aug 15, 2025 at 4:50 PM Jonah Palmer
> > <jonah.pal...@oracle.com> wrote:
> >>
> >> On 8/14/25 5:28 AM, Eugenio Perez Martin wrote:
> >>> On Wed, Aug 13, 2025 at 4:06 PM Peter Xu <pet...@redhat.com> wrote:
> >>>>
> >>>> On Wed, Aug 13, 2025 at 11:25:00AM +0200, Eugenio Perez Martin wrote:
> >>>>> On Mon, Aug 11, 2025 at 11:56 PM Peter Xu <pet...@redhat.com> wrote:
> >>>>>>
> >>>>>> On Mon, Aug 11, 2025 at 05:26:05PM -0400, Jonah Palmer wrote:
> >>>>>>> This effort was started to reduce the guest-visible downtime
> >>>>>>> caused by virtio-net/vhost-net/vhost-vDPA during live migration,
> >>>>>>> especially vhost-vDPA.
> >>>>>>>
> >>>>>>> The downtime contributed by vhost-vDPA, for example, comes not
> >>>>>>> from having to migrate a lot of state but rather from expensive
> >>>>>>> backend control-plane latency like CVQ configuration (e.g. MQ
> >>>>>>> queue pairs, RSS, MAC/VLAN filters, offload settings, MTU, etc.).
> >>>>>>> Doing this requires kernel/HW NIC operations, which dominate its
> >>>>>>> downtime.
> >>>>>>>
> >>>>>>> In other words, by migrating the state of virtio-net early
> >>>>>>> (before the stop-and-copy phase), we can also start staging
> >>>>>>> backend configurations, which is the main contributor to
> >>>>>>> downtime when migrating a vhost-vDPA device.
> >>>>>>>
> >>>>>>> I apologize if this series gives the impression that we're
> >>>>>>> migrating a lot of data here. It's more along the lines of
> >>>>>>> moving control-plane latency out of the stop-and-copy phase.
> >>>>>>
> >>>>>> I see, thanks.
> >>>>>>
> >>>>>> Please add these into the cover letter of the next post. IMHO it's
> >>>>>> extremely important information to explain the real goal of this
> >>>>>> work. I bet it is not expected for most people when reading the
> >>>>>> current cover letter.
> >>>>>>
> >>>>>> Then it could have nothing to do with the iterative phase, am I
> >>>>>> right?
> >>>>>>
> >>>>>> What are the data needed for the dest QEMU to start staging
> >>>>>> backend configurations to the HWs underneath? Does dest QEMU
> >>>>>> already have them in the cmdlines?
> >>>>>>
> >>>>>> Asking this because I want to know whether it can be done
> >>>>>> completely without src QEMU at all, e.g. when dest QEMU starts.
> >>>>>>
> >>>>>> If src QEMU's data is still needed, please also first consider
> >>>>>> providing such facility using an "early VMSD" if it is ever
> >>>>>> possible: feel free to refer to commit 3b95a71b22827d26178.
> >>>>>
> >>>>> While it works for this series, it does not allow resending the
> >>>>> state when the src device changes. For example, if the number of
> >>>>> virtqueues is modified.
> >>>>
> >>>> Some explanation on "how syncing the number of vqueues helps
> >>>> downtime" would help. Not "it might preheat things", but exactly
> >>>> why, and how that differs when it's pure software, and when
> >>>> hardware will be involved.
> >>>
> >>> According to nvidia engineers, configuring the vqs (number, size,
> >>> RSS, etc.) takes about ~200ms:
> >>> https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/
> >>>
> >>> Adding Dragos here in case he can provide more details. Maybe the
> >>> numbers have changed though.
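(To make the "early VMSD" option above concrete: as far as I know it is
just a VMStateDescription with .early_setup set, which is what
virtio-mem uses to send immutable state before RAM. A minimal sketch;
the name and field below are illustrative only, not what this series
actually sends:

/*
 * Sketch modeled on vmstate_virtio_mem_device_early in
 * hw/virtio/virtio-mem.c. Illustrative field, not the series' payload.
 */
static const VMStateDescription vmstate_virtio_net_early = {
    .name = "virtio-net-device/early",
    .version_id = 1,
    .minimum_version_id = 1,
    .early_setup = true, /* sent during setup, before RAM streaming */
    .fields = (const VMStateField[]) {
        VMSTATE_UINT16(max_queue_pairs, VirtIONet),
        VMSTATE_END_OF_LIST()
    },
};

The catch, as I said above, is that such a section is sent once during
setup, so it cannot resend anything if the src device changes
afterwards.)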
> >>>
> >>> And I guess the difference with pure SW will always come down to
> >>> PCI communications, which I assume are slower than configuring the
> >>> host SW device in RAM or even CPU cache. But I admit that proper
> >>> profiling is needed before making those claims.
> >>>
> >>> Jonah, can you print the time it takes to configure the vDPA device
> >>> with traces vs the time it takes to enable the dataplane of the
> >>> device? So we can get an idea of how much time we save with this.
> >>>
> >>
> >> Let me know if this isn't what you're looking for.
> >>
> >> I'm assuming by "configuration time" you mean:
> >>  - Time from device startup (entry to vhost_vdpa_dev_start()) to
> >>    right before we start enabling the vrings (e.g.
> >>    VHOST_VDPA_SET_VRING_ENABLE in vhost_vdpa_net_cvq_load()).
> >>
> >> And by "time taken to enable the dataplane" I'm assuming you mean:
> >>  - Time right before we start enabling the vrings (see above) to
> >>    right after we enable the last vring (at the end of
> >>    vhost_vdpa_net_cvq_load()).
> >>
> >> Guest specs: 128G Mem, SVQ=on, CVQ=on, 8 queue pairs:
> >>
> >> -netdev type=vhost-vdpa,vhostdev=$VHOST_VDPA_0,id=vhost-vdpa0,
> >>   queues=8,x-svq=on
> >>
> >> -device virtio-net-pci,netdev=vhost-vdpa0,id=vdpa0,bootindex=-1,
> >>   romfile=,page-per-vq=on,mac=$VF1_MAC,ctrl_vq=on,mq=on,
> >>   ctrl_vlan=off,vectors=18,host_mtu=9000,
> >>   disable-legacy=on,disable-modern=off
> >>
> >> ---
> >>
> >> Configuration time: ~31s
> >> Dataplane enable time: ~0.14ms
> >>
> >
> > I was vague, but yes, that's representative enough! It would be more
> > accurate if the configuration time ended by the time QEMU enables the
> > first queue of the dataplane, though.
> >
> > As Si-Wei mentions, is v->shared->listener_registered == true at the
> > beginning of vhost_vdpa_dev_start?
> >
>
> Ah, I also realized that the QEMU I was using for measurements was
> from a version before the listener_registered member was introduced.
>
> I retested with the latest changes in QEMU and set x-svq=off, e.g.:
> guest specs: 128G Mem, SVQ=off, CVQ=on, 8 queue pairs. I ran the test
> 3 times for measurements.
>
> v->shared->listener_registered == false at the beginning of
> vhost_vdpa_dev_start().
>
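Thanks for the numbers! Not a blocker, but if we keep comparing runs it
may help to take both timestamps the same way every time. A minimal
sketch with plain glib; everything here is invented for illustration
and is not part of the series:

/* Stamp helpers; g_get_monotonic_time() returns microseconds. */
static gint64 vdpa_ts_us;

static void vdpa_stamp_begin(void)
{
    vdpa_ts_us = g_get_monotonic_time();
}

static void vdpa_stamp_end(const char *what)
{
    g_message("%s: %.3f ms", what,
              (g_get_monotonic_time() - vdpa_ts_us) / 1000.0);
}

Calling vdpa_stamp_begin() at the top of vhost_vdpa_dev_start() and
vdpa_stamp_end("config") right after the first
VHOST_VDPA_SET_VRING_ENABLE would match the definition we are using
above.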
Let's move the effect of the mem pinning out of the downtime by
registering the listener before the migration. Can you check why it is
not registered at vhost_vdpa_set_owner?

> ---
>
> Configuration time: Time from first entry into vhost_vdpa_dev_start()
> to right after QEMU enables the first VQ.
>  - 26.947s, 26.606s, 27.326s
>
> Enable dataplane: Time from right after the first VQ is enabled to
> right after the last VQ is enabled.
>  - 0.081ms, 0.081ms, 0.079ms
>
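For context on my question: if I recall master correctly,
vhost_vdpa_set_owner should already be registering the listener, along
the lines of the sketch below (from memory, not verbatim; error paths
and the vIOMMU special case trimmed):

static int vhost_vdpa_set_owner(struct vhost_dev *dev)
{
    struct vhost_vdpa *v = dev->opaque;
    int r;

    if (!vhost_vdpa_first_dev(dev)) {
        return 0;
    }

    r = vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
    if (unlikely(r < 0)) {
        return r;
    }

    /*
     * Register the listener here so guest memory is pinned well before
     * dev_start; with a vIOMMU it is re-registered at dev_start.
     */
    memory_listener_register(&v->shared->listener, &address_space_memory);
    v->shared->listener_registered = true;

    return 0;
}

If your setup never reaches that path, knowing why would also tell us
how much of the ~27s is memory pinning versus actual device
configuration.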