On 5/15/2025 11:40 PM, Markus Armbruster wrote:
Jason Wang <jasow...@redhat.com> writes:
On Thu, May 8, 2025 at 2:47 AM Jonah Palmer <jonah.pal...@oracle.com> wrote:
Memory operations like pinning may take a lot of time at the
destination. Currently they are done after the source of the migration is
stopped, and before the workload is resumed at the destination. This is a
period where neither traffic can flow nor the VM workload can continue
(downtime).
We can do better, as we know the memory layout of the guest RAM at the
destination from the moment all devices are initialized. Moving that
operation earlier allows QEMU to communicate the maps to the kernel
while the workload is still running on the source, so Linux can start
mapping them.
As a small drawback, there is a period during initialization where QEMU
cannot respond to QMP etc. In our testing, this time is about
0.2 seconds.
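For a rough sense of where that time goes, below is a minimal standalone
sketch (not QEMU code, and not the vhost-vDPA path itself; the 4 GiB
size is just an illustrative stand-in for guest RAM) that times pinning
a large anonymous mapping with mlock(2). The pin time grows with the
mapping size, which is the cost this series moves around:

/* Hypothetical illustration only: times how long the kernel takes to
 * fault in and pin a large anonymous mapping. May need RLIMIT_MEMLOCK
 * raised (e.g. via ulimit -l) to run. */
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

int main(void)
{
    size_t len = 4ULL << 30;                /* 4 GiB, stand-in for guest RAM */
    struct timespec t0, t1;

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (mlock(buf, len)) {                  /* fault in and pin every page */
        perror("mlock (raise RLIMIT_MEMLOCK?)");
        return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("pinned %zu MiB in %.3f s\n", len >> 20,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    munmap(buf, len);
    return 0;
}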
Adding Markus to see if this is a real problem or not.
I guess the answer is "depends", and to get a more useful one, we need
more information.
When all you care about is the time from executing qemu-system-FOO to
the guest finishing booting, and the guest takes 10s to boot, then an
extra 0.2s won't matter much.
There's no extra 0.2s (or longer) delay per se; the patches just shift
the page pinning hiccup, whether it is 0.2s or something else, from
guest boot time to before the guest is booted. This saves guest boot
time or startup delay, but in turn the same delay is effectively
charged to VM launch time. We follow the same model as VFIO, which
sees the same hiccup during launch (at an early stage that no real
mgmt software would care about).
When a management application runs qemu-system-FOO several times to
probe its capabilities via QMP, then even milliseconds can hurt.
It's not like that: this page pinning hiccup happens only once, at a
very early stage when launching QEMU, i.e. there's no consistent delay
every time QMP is called. How long the QMP response is delayed at that
point depends on how much memory the VM has, but this is specific to
VMs with VFIO or vDPA devices that have to pin memory for DMA. That
said, there's no extra delay at all if the QEMU command line has no
vDPA device; on the other hand, there's the same delay or QMP hiccup
when VFIO is on the QEMU command line.
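If it helps to quantify this, here is a hedged sketch (not part of the
series) that measures how long a freshly launched QEMU takes to answer
its first QMP command, which is where the one-time pinning hiccup would
show up. It assumes QEMU was started with something like
-qmp unix:/tmp/qmp.sock,server=on,wait=off; the socket path is a
placeholder:

/* Illustration only: times the initial QMP handshake over the
 * monitor's UNIX socket. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    char buf[4096];
    struct timespec t0, t1;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    strncpy(addr.sun_path, "/tmp/qmp.sock", sizeof(addr.sun_path) - 1);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }
    if (read(fd, buf, sizeof(buf)) < 0) {   /* greeting: {"QMP": {...}} */
        perror("read");
        return 1;
    }
    const char *cmd = "{\"execute\": \"qmp_capabilities\"}\n";
    write(fd, cmd, strlen(cmd));
    read(fd, buf, sizeof(buf));             /* expect {"return": {}} */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("QMP handshake took %.3f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    close(fd);
    return 0;
}

Running it right after spawning QEMU, with and without a vDPA device on
the command line, would show the relative delay Markus asks about below.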
In what scenarios exactly is QMP delayed?
As said, this is not a problem new to QEMU in particular; this QMP
delay is not peculiar to vDPA, it exists with VFIO as well.
Thanks,
-Siwei
You told us an absolute delay you observed. What's the relative delay,
i.e. what's the delay with and without these patches?
We need QMP to become available earlier in the startup sequence for
other reasons. Could we bypass the delay that way? Please understand
that this would likely be quite difficult: we know from experience that
messing with the startup sequence is prone to introducing subtle
compatibility breaks and even bugs.
(I remember VFIO has some optimizations for the speed of pinning;
could vDPA do the same?)
That's well outside my bailiwick :)
[...]