On Mon, Aug 11, 2025 at 08:30:19AM -0400, Jonah Palmer wrote:
> 
> 
> On 8/7/25 12:31 PM, Peter Xu wrote:
> > On Thu, Aug 07, 2025 at 10:18:38AM -0400, Jonah Palmer wrote:
> > > 
> > > 
> > > On 8/6/25 12:27 PM, Peter Xu wrote:
> > > > On Tue, Jul 22, 2025 at 12:41:26PM +0000, Jonah Palmer wrote:
> > > > > Iterative live migration for virtio-net sends an initial
> > > > > VMStateDescription while the source is still active. Because data
> > > > > continues to flow for virtio-net, the guest's avail index continues to
> > > > > increment after last_avail_idx had already been sent. This causes the
> > > > > destination to often see something like this from virtio_error():
> > > > > 
> > > > > VQ 0 size 0x100 Guest index 0x0 inconsistent with Host index 0xc: delta 0xfff4
> > > > 
> > > > This is pretty much understandable, as vmstate_save() / vmstate_load()
> > > > are, IMHO, not designed to be used while the VM is running.
> > > > 
> > > > To me, it's still illegal (per the previous patch) to use
> > > > vmstate_save_state() while the VM is running, in a save_setup() phase.
> > > 
> > > Yeah, I understand where you're coming from. It just seemed too good to
> > > pass up on as a way to send and receive the entire state of a device.
> > > 
> > > I felt that if I were to implement something similar for iterative
> > > migration only, I'd, more or less, be duplicating a lot of already
> > > existing code or vmstate logic.
> > > 
> > > > 
> > > > Some very high-level questions from a migration POV:
> > > > 
> > > > - Have we figured out why the downtime can be shrunk just by sending
> > > >     the vmstate twice?
> > > > 
> > > >     If we suspect it's the memory getting preheated, have we tried
> > > >     other ways to simply heat the memory up on the dest side?  For
> > > >     example, some form of mlock[all]()?  IMHO it's pretty important
> > > >     that we figure out the root cause of why such an optimization
> > > >     helps.
> > > > 
> > > >     I do remember we had a downtime issue with the number of
> > > >     max_vqueues that may cause post_load() to be slow; I wonder
> > > >     whether there are other ways to improve it instead of
> > > >     vmstate_save(), especially in the setup phase.
> > > > 
> > > 
> > > Yeah, I believe that the downtime shrinks on the second
> > > vmstate_load_state due to preheated memory. But I'd like to stress that
> > > it's not my intention to resend the entire vmstate during the
> > > stop-and-copy phase if iterative migration was used. A future iteration
> > > of this series will eventually include a more efficient approach that
> > > updates the destination with any deltas accumulated since the vmstate
> > > was sent during the iterative portion (instead of just resending the
> > > entire vmstate).
> > > 
> > > And yeah, there is an inefficiency regarding walking through
> > > VIRTIO_QUEUE_MAX (1024) VQs (twice with PCI) that I mentioned here in
> > > another comment:
> > > https://lore.kernel.org/qemu-devel/0f5b804d-3852-4159-b151-308a57f1e...@oracle.com/
> > > 
> > > This might be better handled in a separate series, though, rather than
> > > as part of this one.
> > 
> > One thing to mention is that I recall another developer was trying to
> > optimize device load from the memory side:
> > 
> > https://lore.kernel.org/all/20230317081904.24389-1-xuchuangxc...@bytedance.com/
> > 
> > So maybe there's more than one way of doing this, and I'm not sure which
> > way is better, or whether we want both.
> > 
> 
> Ack. I'll take a look at this.
> 
> > > 
> > > > - Normally devices need an iterative phase because:
> > > > 
> > > >     (a) the device may contain a huge amount of data to transfer
> > > > 
> > > >         E.g. RAM and VFIO are good examples and fall into this
> > > >         category.
> > > > 
> > > >     (b) the device states are conceptually "iterable"
> > > > 
> > > >         RAM definitely is.  VFIO somehow mimicked that even though it
> > > >         was a streamed binary protocol...
> > > > 
> > > >     What's the answer for virtio-net here?  How large is the device
> > > >     state?  Is this relevant to vDPA and real hardware (so virtio-net
> > > >     can look similar to VFIO at some point)?
> > > 
> > > 
> > > The main motivation behind implementing iterative migration for
> > > virtio-net is really to improve the guest-visible downtime seen when
> > > migrating a vDPA device.
> > > 
> > > That is, by implementing iterative migration for virtio-net, we can see
> > > the state of the device early on and get a head start on work that's
> > > currently being done during the stop-and-copy phase. If we do this work
> > > before the stop-and-copy phase, we can further decrease the time spent
> > > in this window.
> > > 
> > > This would include work such as sending down the CVQ commands for
> > > queue-pair creation (even more beneficial for multiqueue), RSS,
> > > filters, etc.
> > > 
> > > I'm hoping to show this more explicitly in the next version of this RFC
> > > series that I'm working on now.
> > 
> > OK, thanks for the context. I can wait and read the new version.
> > 
> > In all cases, please note that since the migration thread does not take
> > the BQL, either the setup or the iterable phase may happen concurrently
> > with any of the vCPU threads.  I think that means it's not wise to try
> > to iterate everything: please be ready to see, e.g., a 64-bit MMIO
> > register being partially updated when dumping it to the wire.
> > 
> 
> Gotcha. Some of the iterative hooks, though, like .save_setup,
> .load_state, etc., do hold the BQL, right?

load_state() definitely needs the lock.

save_setup(): yes, we have the BQL there, but I really wish we didn't
depend on it, and I don't know whether that will keep holding true. AFAIU,
the majority of it really doesn't need the lock, and I've always wanted to
see whether I can remove it.

Normal iterations definitely run without the lock.
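
To make the locking expectations concrete, here is a rough sketch of how I
understand today's behavior (the SaveVMHandlers field names match current
QEMU, but treat the annotations as my reading, not a documented contract,
and the virtio_net_* handlers as hypothetical names for this series):

    /* Sketch only: which migration hooks run with the BQL held,
     * as of my understanding today. */
    static SaveVMHandlers virtio_net_iter_handlers = {
        /* Called with the BQL held today, but don't rely on it;
         * ideally it shouldn't need the lock at all. */
        .save_setup = virtio_net_save_setup,

        /* Runs in the migration thread WITHOUT the BQL, concurrently
         * with vCPU threads: anything read here can be modified by
         * the guest at the same time (e.g. a 64-bit register could
         * be observed half-updated). */
        .save_live_iterate = virtio_net_save_iterate,

        /* Completion: vCPUs are stopped and the BQL is held. */
        .save_live_complete_precopy = virtio_net_save_complete,

        /* Load side: runs with the BQL held. */
        .load_state = virtio_net_load_state,
    };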

> 
> > Do you have a rough estimation of the size of the device states to migrate?
> > 
> 
> Do you have a method for how I might be able to estimate this? I've been
> trying to get some kind of rough estimate but failing to do so.

Could I ask why you started this "migrate virtio-net in iteration phase"
effort?

I thought it was because there's a lot of data to migrate, and there
should be a way to estimate the minimum.  Is that not the case?
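
FWIW, one way to get a rough number is to serialize the device's vmstate
into an in-memory QEMUFile and count the bytes. A minimal sketch, assuming
the qio_channel_buffer / vmstate_save_state APIs behave the way I remember
(untested, and details may differ across QEMU versions):

    #include "qemu/osdep.h"
    #include "io/channel-buffer.h"
    #include "migration/qemu-file.h"
    #include "migration/vmstate.h"

    static size_t estimate_vmstate_size(const VMStateDescription *vmsd,
                                        void *opaque)
    {
        /* Back a QEMUFile with a memory buffer instead of a socket. */
        QIOChannelBuffer *bioc = qio_channel_buffer_new(4096);
        QEMUFile *f = qemu_file_new_output(QIO_CHANNEL(bioc));
        size_t size = 0;

        /* Serialize the full vmstate; no JSON description needed. */
        if (vmstate_save_state(f, vmsd, opaque, NULL) == 0) {
            qemu_fflush(f);
            size = bioc->usage;  /* bytes actually written */
        }

        qemu_fclose(f);
        object_unref(OBJECT(bioc));
        return size;
    }

That at least gives the serialized size of what vmstate_save_state() would
put on the wire for one device.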

How about vDPA devices?  Do those devices have a lot of data to migrate?

We really need a good enough reason to have a device provide
save_iterate().  If it's only about "preheat some MMIO registers", we
should, IMHO, look at more generic ways first.
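
(For the preheating theory specifically, a blunt experiment on the dest
might be enough to confirm or rule it out, e.g. locking everything into
RAM before the device state arrives.  A sketch, purely as a diagnostic
and not a fix:

    #include <stdio.h>
    #include <sys/mman.h>

    static void preheat_lock_memory(void)
    {
        /* Fault in and pin all current and future mappings so that
         * destination-side page faults can't explain the slowdown. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            perror("mlockall");
        }
    }

If downtime still only shrinks after a first vmstate_load_state() pass
with this in place, then preheated memory isn't the real explanation.)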

Thanks,

-- 
Peter Xu

