On Thu, 20 Mar 2025 at 20:15, Fabiano Rosas <faro...@suse.de> wrote:
> Technically both can happen. But that would just be the case of
> file:fdset migration which requires an extra fd for O_DIRECT. So
> "multiple" in the usual sense of "more is better" is only
> fd-per-thread. IOW, using multiple fds is an implementation detail IMO,
> what people really care about is medium saturation, which we can only
> get (with multifd) via parallelization.

* I see. So multifd is essentially multiple threads, i.e. a thread pool.

> > Because doing migration via QMP commands is not as
> > straightforward, I wonder who might do that and why.
> >
>
> All of QEMU developers, libvirt developers, cloud software developers,
> kernel developers etc.

* Really? They must be using the QMP APIs via libvirt/virsh kind of
tools, I guess. Otherwise, how does one follow the above instructions
to enable 'multifd' and set the number of channels on both the source
and destination machines? Does the user have to open a QMP shell on
two machines and invoke the QMP commands by hand?
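
(For context, the raw QMP sequence would be roughly the following,
assuming the destination QEMU was started with '-incoming defer'; the
port number here is made up:

    On both machines:
        { "execute": "migrate-set-capabilities",
          "arguments": { "capabilities": [
              { "capability": "multifd", "state": true } ] } }
        { "execute": "migrate-set-parameters",
          "arguments": { "multifd-channels": 8 } }

    On the destination:
        { "execute": "migrate-incoming",
          "arguments": { "uri": "tcp:0:4444" } }

    On the source:
        { "execute": "migrate",
          "arguments": { "uri": "tcp:<dest-ip>:4444" } }

Libvirt hides all of this behind 'virsh migrate --parallel
--parallel-connections N'.)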

> > * So multifd mechanism can be used to transfer non-ram data as well? I
> > thought it's only used for RAM migration. Are device/gpu states etc
> > bits also transferred via multifd threads?
> >
> device state migration with multifd has been merged for 10.0
>
> <rant>
> If it were up to me, we'd have a pool of multifd threads that transmit
> everything migration-related.

* My thought as well: if multifd is to be used for all data, why not
use the existing QEMU thread pool, or make it a dedicated migration
thread pool? IIRC, there is also some discussion about having a
thread pool for VFIO or GPU state transfer. Having so many different
thread pools does not seem right.

> Unfortunately, that's not so
> straight-forward to implement without rewriting a lot of code, multifd
> requires too much entanglement from the data producer. We're constantly
> dealing with details of data transmission getting in the way of data
> production/consumption (e.g. try to change ram.c to produce multiple
> pages at once and watch everything explode).

* Ideally there should be a clean separation between what the client
(the data producer) is doing and how the migration transport works.

* IMO, migration is a mechanism to transfer byte streams from one
machine to another, and while doing so, to facilitate writing (data)
at specific addresses/offsets on the destination rather than just
appending bytes at the tail end. This entails that each individual
migration packet specifies where its data is to be written on the
destination. Let's say a migration stream is a train of packets,
where each packet has a header and data:

     ( [header][...data...] )><><( [header][...data...] )><>< ... ><><( [header][data] )

Header specifies:
    - Serial number
    - Header length
    - Data length/size (2MB/4MB/8MB etc.)
    - Destination address <- offset where to write the migration
      data; if it is zero (0), append the data at the tail
    - Data type (optional): whether it is RAM/Device/GPU/CPU state etc.
    - Data iteration number <- version/iteration of the same RAM page
    ...   more variables
    - Some reserved bytes
Migration data is:
    - Just a byte stream, of size <= the 'Data length/size' above.
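
To make this concrete, here is a rough sketch of such a header as a
fixed-layout C struct (field names and sizes are mine, purely
illustrative):

    #include <stdint.h>

    /* Illustrative sketch only, not an existing QEMU structure. */
    typedef struct MigPacketHeader {
        uint64_t serial;       /* serial number of the packet            */
        uint32_t header_len;   /* lets the header grow across versions   */
        uint32_t data_len;     /* payload size (2MB/4MB/8MB etc.)        */
        uint64_t dst_offset;   /* where to write on the destination;
                                  zero (0) means append at the tail      */
        uint32_t data_type;    /* optional: RAM/Device/GPU/CPU state     */
        uint32_t iteration;    /* version/iteration of the same RAM page */
        uint8_t  reserved[32]; /* room for more variables                */
    } MigPacketHeader;

    /* On the wire, a packet is this header followed by data_len
       bytes of opaque payload. */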

* Whether such a train of packets is then transferred via 1 thread or
10 threads is a purely operational change.
* Whether a packet is pushed (precopy) from the source to the
destination, or pulled (postcopy) by the destination from the source
side, is likewise an operational difference. In the postcopy phase,
the destination could send a message saying "I need the next RAM
packet for this offset", and the RAM module on the source side would
provide only the relevant data. Again, packaging and transmission are
done by the migration module. Similarly, the postcopy phase could
send a message saying "I need the next GPU packet", and the GPU
module on the source side would provide the relevant data (a sketch
of such a request message follows this list).
* How long such a train of packets is, is also immaterial.
* With such a separation, things like thread synchronisation are not
tied to the data type (RAM/GPU/CPU/etc.).
* It may also allow us to apply compression/encryption uniformly
across all channels/threads, irrespective of the data type.
* Since migration is a packet transport mechanism,
creation/modification/destruction of packets could be done by one
entity. Clients (like RAM/GPU/CPU/VFIO etc.) shall only supply 'data'
to be packaged and sent. It shouldn't be that ram.c writes its own
packets as it likes and the GPU code writes its own packets as it
likes; that does not seem right.
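
Continuing the illustrative header above, the pull message a postcopy
destination sends back on the main channel could be as simple as
(again, hypothetical names):

    /* Illustrative sketch: destination -> source request in postcopy. */
    typedef struct MigPacketRequest {
        uint32_t data_type;    /* who should answer: RAM, GPU, VFIO, ... */
        uint64_t offset;       /* the offset the destination needs next  */
        uint32_t length;       /* how many bytes are requested           */
    } MigPacketRequest;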

> >> +- A packet which is the final result of all the data aggregation
> >> +  and/or transformation. The packet contains: a *header* with magic and
> >> +  version numbers and flags that inform of special processing needed
> >> +  on the destination; a *payload-specific header* with metadata referent
> >> +  to the packet's data portion, e.g. page counts; and a variable-size
> >> +  *data portion* which contains the actual opaque payload data.

* Thread synchronisation and other such control messages could/should
be separate packets of their own, sent on the main channel. Thread
synchronisation flags could/should not be combined with the migration
data packets above. Control message packets may have _no data_ to be
processed. (just sharing thoughts)
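
Concretely, the header's 'Data type' field could also discriminate
such control packets from data packets, e.g. (hypothetical values):

    /* Illustrative sketch: control vs. data packet types. */
    enum MigPacketType {
        MIG_PACKET_RAM,      /* data packet, any channel               */
        MIG_PACKET_DEVICE,   /* data packet, any channel               */
        MIG_PACKET_SYNC,     /* thread-sync marker: main channel only,
                                carries no data (data_len == 0)        */
        MIG_PACKET_EOS,      /* end of stream                          */
    };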

Thank you.
---
  - Prasad

