Peter Xu <pet...@redhat.com> writes:

> On Fri, Mar 07, 2025 at 04:06:17PM -0300, Fabiano Rosas wrote:
>> > I never tried vsock; would it be used in any use case?
>> >
>> 
>> I don't know, I'm going by what's in the code.
>> 
>> > It seems to have been introduced by accident in 72a8192e225cea, but I'm not sure.
>> > Maybe there's something I missed.
>> 
>> The code has always had some variation of:
>> 
>> static bool transport_supports_multi_channels(const char *uri)
>> {
>>     return strstart(uri, "tcp:", NULL) || strstart(uri, "unix:", NULL) ||
>>            strstart(uri, "vsock:", NULL);
>> }
>> 
>> Introduced by b7acd65707 ("migration: allow multifd for socket protocol
>> only").
>
> Looks like a copy-paste error..  I found it, should be 9ba3b2baa1
> ("migration: add vsock as data channel support").
>
> https://lore.kernel.org/all/2bc0e226-ee71-330a-1bcd-bd9d09750...@huawei.com/
> https://kvmforum2019.sched.com/event/Tmzh/zero-next-generation-virtualization-platform-for-huawei-cloud-jinsong-liu-zhichao-huang-huawei
> https://e.huawei.com/sa/material/event/HC/e37b9c4c33e14e869bb1183fab468fed
>

Great, thanks for finding those. I'll add them to my little "lore"
folder. This kind of information is always good to know.

> So if I read it right.. the VM in this case is inside a container or
> something, then it talks to an "agent" on a PCIe device which understands
> the virtio-vsock protocol.  So maybe vsock just performs better than other
> ways to dig that tunnel for the container.
>
> In that case, mentioning vsock is at least ok.
>
> [...]
>
>> >> +After the channels have been put into a wait state by the sync
>> >> +functions, the client code may continue to transmit additional data by
>> >> +issuing ``multifd_send()`` once again.
>> >> +
>> >> +Note:
>> >> +
>> >> +- the RAM migration does, effectively, a global synchronization by
>> >> +  chaining a call to ``multifd_send_sync_main()`` with the emission of a
>> >> +  flag on the main migration channel (``RAM_SAVE_FLAG_MULTIFD_FLUSH``)
>> >
>> > ... or RAM_SAVE_FLAG_EOS ... depending on the machine type.
>> >
>> 
>> Eh.. big compatibility mess. I'd rather not mention it.
>
> It's not strictly a compatibility mess.  IIUC, it was originally designed
> to always work with EOS.  I think at that time Juan was still focused on
> making it work rather than on perf tuning, but then we found it can be a
> major perf issue if we flush too often.  Then if we flush only once per
> round, it may not always pair with an EOS.  That's why we needed a new
> message.
>

Being fully honest, at the time I got the impression the situation was
"random person inside RH decided to measure the performance of some random
thing and the upstream maintainer felt pressured to push a fix".

Whether that was the case or not, it doesn't matter now, but we can't
deny that this _has_ generated some headaches; just look at how many
issues arose from the introduction of that flag.

> But hey, you're writing a doc that helps everyone.  You deserve to decide
> whether you like to mention it or not on this one. :)

My rant aside, I really want to avoid readers having to think too much
about this flush thing. We're already seeing some confusion when
discussing it with Prasad in the other thread. The code itself and the
git log are more reliable sources for the compat situation, IMO.

>
> IIRC we updated our compat rule so that we maintain each machine type for
> only 6 years.  It means the whole per-iteration + EOS stuff can be removed
> in 3.5 years or so - we did that work in July 2022.  So it isn't that
> important to mention either, indeed.
>

Yep, that as well.

>> 
>> > Maybe we should also add a sentence on the relationship between
>> > MULTIFD_FLAG_SYNC and RAM_SAVE_FLAG_MULTIFD_FLUSH (or RAM_SAVE_FLAG_EOS),
>> > in that they should always be sent together, and only then does it
>> > provide ordering between the multifd messages and what happens on the
>> > main migration thread.
>> >
>> 
>> The problem is that RAM_SAVE_FLAGs are a ram.c thing. In theory, the need
>> for RAM_SAVE_FLAG_MULTIFD_FLUSH exists only because the RAM migration is
>> driven from the source machine via the flags that are put on the
>> stream. IOW, this is a RAM migration design, not a multifd design. The
>> multifd design is (could be, we decide) that once sync packets are sent,
>> _something_ must do the following:
>> 
>>     for (i = 0; i < thread_count; i++) {
>>         trace_multifd_recv_sync_main_wait(i);
>>         qemu_sem_wait(&multifd_recv_state->sem_sync);
>>     }
>> 
>> ... which is already part of multifd_recv_sync_main(), but that just
>> _happens to be_ called by ram.c when it sees the
>> RAM_SAVE_FLAG_MULTIFD_FLUSH flag on the stream; that's not a multifd
>> design requirement. The ram.c code could, for instance, do the sync when
>> some QEMU_VM_SECTION_EOS (or whatever it's called) appears.
>
> I still think it should be done in RAM code only.  One major goal (if not
> the only goal..) is to order different versions of pages, and that's
> purely a RAM module concern, not migration in general.
>
> From that POV, having a QEMU_VM_* is kind of the wrong layer - they should
> work for migration in general.
>
> That said, I agree we do violate it from time to time; for example, we have
> a bunch of subcmds (MIG_CMD_POSTCOPY*) just for postcopy, which is under
> QEMU_VM_COMMAND.  But IIUC that was either kind of ancient (so we need to
> stick with them now.. postcopy has been there for 10 years) or it needs
> some ping-pong messages in which case QEMU_VM_COMMAND is the easiest..
> IMHO we should still try to stick with the layering if possible.

All good points, but I was talking about something else:

I was just throwing out an example of how it could be done differently to
make clear that the recv_sync has nothing to do with the ram flag; that's
just an implementation detail. I was thinking specifically about the
multifd+postcopy work, where we might need syncs but there are no RAM
flags there.

We don't actually _need_ to sync with the migration thread on the
destination like that. The client could send the control information in
its opaque packet (instead of in the migration thread) or in a completely
separate channel if it wanted, roughly as in the sketch below. That sync
is also not necessary if there is no dependency around the data being
transferred (e.g. mapped-ram just takes data from the file and writes it
to guest memory).
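
To illustrate (a toy model only, standalone C with made-up names, not
QEMU code): each recv thread reacts to a sync flag carried in the
client's own packet header and the client waits on all channels itself,
with nothing ever going through the main migration stream:

#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>
#include <stdio.h>

#define N_CHANNELS 4

/* Hypothetical client-defined packet header; bit 0 marks a sync point. */
typedef struct {
    uint32_t flags;
} ClientPacket;

static sem_t sem_sync[N_CHANNELS];

/* Per-channel recv thread: the sync request arrives in the client's own
 * packet, not as a flag on the main migration channel. */
static void *recv_thread(void *arg)
{
    int id = (int)(intptr_t)arg;
    ClientPacket pkt = { .flags = 1 };  /* pretend this came off the wire */

    if (pkt.flags & 1) {
        sem_post(&sem_sync[id]);        /* this channel hit the sync point */
    }
    return NULL;
}

int main(void)
{
    pthread_t th[N_CHANNELS];

    for (int i = 0; i < N_CHANNELS; i++) {
        sem_init(&sem_sync[i], 0, 0);
        pthread_create(&th[i], NULL, recv_thread, (void *)(intptr_t)i);
    }

    /* The client-side equivalent of multifd_recv_sync_main()'s wait
     * loop: block until every channel has posted its sync. */
    for (int i = 0; i < N_CHANNELS; i++) {
        sem_wait(&sem_sync[i]);
    }
    puts("all channels reached the sync point");

    for (int i = 0; i < N_CHANNELS; i++) {
        pthread_join(th[i], NULL);
    }
    return 0;
}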

To be clear, I'm not suggesting we change anything; I'm only trying to
reflect in the docs some level of separation between multifd.c and ram.c
(or whatever else), because we've seen that mixing the two makes the
design less clean. We've already cleared up many of the "p->pages" kinds
of issues and the packet definition is more versatile after Maciej's
work, but I still think the separation should be stricter (hence all the
"client" talk in this doc).

(I realise I'm hijacking the documentation thread to talk about
high-level design; my apologies.)
