Hi,

On Fri, 21 Mar 2025 at 19:34, Fabiano Rosas <faro...@suse.de> wrote:
> Well, I can't speak for everyone, of course, but generally the less
> layers on top of the object of your work the better.
* Yes, true.

> There are several ways of accessing QMP, some examples I have lying
> around:
>
> ==
> $QEMU ... -qmp unix:${SRC_SOCK},server,wait=off
>
> echo "
> { 'execute': 'qmp_capabilities' }
> { 'execute': 'migrate-set-capabilities','arguments':{ 'capabilities':[ \
>   { 'capability': 'mapped-ram', 'state': true }, \
>   { 'capability': 'multifd', 'state': true } \
> ] } }
> { 'execute': 'migrate-set-parameters','arguments':{ 'multifd-channels': 8 } }
> { 'execute': 'migrate-set-parameters','arguments':{ 'max-bandwidth': 0 } }
> { 'execute': 'migrate-set-parameters','arguments':{ 'direct-io': true } }
> { 'execute': 'migrate${incoming}','arguments':{ 'uri': 'file:$MIGFILE' } }
> " | nc -NU $SRC_SOCK
> ==
> (echo "migrate_set_capability x-ignore-shared on"; echo
> "migrate_set_capability validate-uuid on"; echo "migrate
> exec:cat>migfile-s390x"; echo "quit") | ./qemu-system-s390x -bios
> /tmp/migration-test-16K1Z2/bootsect -monitor stdio
> ==
> $QEMU ... -qmp unix:${DST_SOCK},server,wait=off
> ./qemu/scripts/qmp/qmp-shell $DST_SOCK
> ==
> $QEMU ...
> C-a c
> (qemu) info migrate

* Interesting. Initially I tried enabling multifd on two hosts and
  setting multifd channels via QMP, but then quickly moved to virsh(1)
  for its convenience.

> I think we all agree that having different sets of threads managed in
> different ways, is not ideal.

* Yes.

> The thing with multifd is that it's very important to keep the
> performance and constraints of ram migration. If we manage to achieve
> that with some generic thread pool, that's great. But it's an
> exploration work that will have to be done.

* Yes.

> >> Unfortunately, that's not so straight-forward to implement without
> >> rewriting a lot of code, multifd requires too much entanglement from
> >> the data producer. We're constantly dealing with details of data
> >> transmission getting in the way of data production/consumption
> >> (e.g. try to change ram.c to produce multiple pages at once and
> >> watch everything explode).

* Hmmn, I think that's where a clear separation between migration and
  client could help.

> But then there's stuff like mapped-ram which wants its data free of any
> metadata because it mirrors the RAM layout in the migration file.

* I'm not sure how it works now or why it works that way, but I shall
  have a look at it.

> I generally like the idea of having the size of the header/data
> specified in the header itself. It does seem like it would allow for
> better extensibility over time. I spent a lot of time looking at those
> "unused" bytes in MultiFDPacket_t trying to figure out a way of
> embedding the size information in a backward-compatible way. We ended up
> going with Maciej's idea of isolating the common parts of the header in
> the MultiFDPacketHdr_t and having each data type define its own
> specific sub-header.
>
> I don't know what this looks like in terms of type-safety and how we'd
> keep compatibility (two separate issues) because a variable-size header
> needs to end up in a well-defined structure at some point. It's
> generally more difficult to maintain code that simply takes a buffer and
> pokes at random offsets in there.

* I'm not sure the header size would vary that much.
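* To make that concrete, below is a rough sketch in C of what I have in
  mind: a fixed-size common header that carries the sizes of the
  type-specific sub-header and of the payload. The struct and field
  names are made up for illustration; they are not the actual
  MultiFDPacket_t/MultiFDPacketHdr_t definitions.

    /*
     * Illustrative sketch only -- not the actual QEMU structures.
     * A fixed common header records how large the type-specific
     * sub-header and the payload are, so a receiver can always read
     * (or skip) a packet without knowing its type up front.
     */
    #include <stdint.h>

    typedef struct {
        uint32_t magic;      /* identifies a migration packet        */
        uint32_t version;    /* stream format version                */
        uint32_t flags;      /* data type: RAM, device state, ...    */
        uint32_t hdr_size;   /* size of the type-specific sub-header */
        uint64_t data_size;  /* size of the payload that follows     */
    } ExamplePacketHdr;

    /* Example of a type-specific sub-header for RAM data */
    typedef struct {
        uint64_t block_offset;  /* where to write within the RAM block   */
        uint32_t num_pages;     /* number of pages carried in the packet */
    } ExampleRamSubHdr;

  With something like this, the receiver reads the fixed header first,
  then hdr_size bytes of sub-header, then data_size bytes of data; the
  common header itself never changes size, only the sub-headers do.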
> This is all in the end client-centric, which means it is "data" from the
> migration perspective. So the question I put earlier still remains: what
> determines the kind of data that goes in the header and the kind of data
> that goes in the data part of the packet? It seems we cannot escape from
> having the client bring its own header format.

* Yes. Actually, that begs the question: why do we need the migration
  and client headers? The answer might differ based on how we look at
  things.

      Guest State
       |
       +---pcie-root-ports[0..15] -> [0...2GB]
       |
       +---Timer                  -> [0...1MB]
       |
       +---Audio                  -> [0...1MB]
       |
       +---Block                  -> [0...2GB]
       |
       +---RAM                    -> [0...128GB]
       |
       +---DirtyBitmap            -> [0...4GB]
       |
       +---GPU                    -> [0...128GB]
       |
       +---CPUs[0...31]           -> [0...8GB]
       |
       ...

      (above numbers are for example only)

  [Host-1: Guest State]  <==  [Migration]  ==>  [Host-2: Guest State]

* Migration should create this same 'Guest State' tree on the
  destination Host-2 as:

  0. The whole guest state (vmstate) is a list of different nodes with
     their own state, as above.

  1. The migration core on the source side would iterate over these
     nodes and call the respective *_read() function to read their
     'state'.

  2. The migration core would transmit the read 'state' (as 2MB/4MB
     data blocks) to the destination.

  3. On the destination side
     - The migration core needs to know where to store/write/deliver
       the received 2MB/4MB data blocks.
     - This is where the migration header would help, to identify which
       *_write() function to call.

  4. The respective *_write() function would then write (or overwrite)
     the received block at the specified 'offset' within its state.

* Let's say the 'Migration core' is similar to the TCP layer.
  Irrespective of the application protocol (ftp/http/ssh/ncat(1)), TCP
  behaves the same. The TCP layer identifies where to deliver received
  data by its port and connection numbers; it does not care which
  program is running at a given port. Similarly, the 'Migration core'
  could read data from Host-1 and write/deliver it on Host-2,
  irrespective of whether it is RAM or GPU or any other state block.

* To answer the question of what goes in the header part: the minimum
  information required to identify where to write/deliver the data
  block. As with application protocols, that information could be
  embedded in the data block itself as well, in which case a migration
  header may not be required, or it may store bits related to the
  threads or bandwidth control or accounting etc., depending on their
  purpose.

* The migration core could:
  - Create/destroy threads to transmit data
  - Apply compression/decompression on the data
  - Apply encryption/decryption on the data
  - Apply bandwidth control/limits across all threads while
    transmitting data

  (A rough sketch of the *_read()/*_write() interface idea is in the
  P.S. at the end of this mail.)

> Right, so we'd need an extra abstraction layer with a well defined
> interface to convert a raw packet into something that's useful for the
> clients. The vmstate macros actually do that work kind of well. A device
> emulation code does not need to care (too much) about how migration
> works as long as the vmstate is written properly.

* Yes.

Thank you.
---
  - Prasad
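P.S. Below is a rough sketch in C of the *_read()/*_write() node
interface described in steps 1-4 above. All names here are made up for
illustration; this is not an existing QEMU API, just one way the idea
could look.

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Illustrative sketch only.  Each guest-state node ("RAM", "GPU",
     * "Block", ...) registers a pair of callbacks; the migration core
     * moves opaque (offset, size) blocks between them without knowing
     * what the data means.
     */
    typedef struct StateNodeOps {
        const char *name;   /* "ram", "gpu", "block", ... */

        /* Source side: fill 'buf' with up to 'size' bytes of state
         * starting at 'offset'; return bytes produced, 0 when done. */
        size_t (*state_read)(void *opaque, uint64_t offset,
                             void *buf, size_t size);

        /* Destination side: write 'size' bytes at 'offset' within the
         * node's state; return 0 on success. */
        int (*state_write)(void *opaque, uint64_t offset,
                           const void *buf, size_t size);

        void *opaque;       /* node-private state */
    } StateNodeOps;

    /* The migration core keeps a table of registered nodes; the node id
     * carried in the migration header selects which state_write() to
     * call on the destination. */
    int example_register_state_node(uint32_t node_id,
                                    const StateNodeOps *ops);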