Steven Sistare <steven.sist...@oracle.com> writes:

> On 10/7/2024 3:44 PM, Peter Xu wrote:
>> On Mon, Sep 30, 2024 at 12:40:44PM -0700, Steve Sistare wrote:
>>> Add the cpr-transfer migration mode.  Usage:
>>>    qemu-system-$arch -machine anon-alloc=memfd ...
>>>
>>>    start new QEMU with "-incoming <uri-1> -cpr-uri <uri-2>"
>>>
>>>    Issue commands to old QEMU:
>>>    migrate_set_parameter mode cpr-transfer
>>>    migrate_set_parameter cpr-uri <uri-2>
>>>    migrate -d <uri-1>
>>>
>>> The migrate command stops the VM, saves CPR state to uri-2, saves
>>> normal migration state to uri-1, and old QEMU enters the postmigrate
>>> state.  The user starts new QEMU on the same host as old QEMU, with the
>>> same arguments as old QEMU, plus the -incoming option.  Guest RAM is
>>> preserved in place, albeit with new virtual addresses in new QEMU.
>>>
>>> This mode requires a second migration channel, specified by the
>>> cpr-uri migration property on the outgoing side, and by the cpr-uri
>>> QEMU command-line option on the incoming side.  The channel must
>>> be a type, such as unix socket, that supports SCM_RIGHTS.
>>>
>>> Memory-backend objects must have the share=on attribute, but
>>> memory-backend-epc is not supported.  The VM must be started with
>>> the '-machine anon-alloc=memfd' option, which allows anonymous
>>> memory to be transferred in place to the new process.  The memfds
>>> are kept open by sending the descriptors to new QEMU via the
>>> cpr-uri, which must support SCM_RIGHTS, and they are mmap'd
>>> in new QEMU.
>>>
>>> Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
>>> ---
>>>   include/migration/cpr.h   |  1 +
>>>   migration/cpr.c           | 34 +++++++++++++++++++----
>>>   migration/migration.c     | 69 
>>> +++++++++++++++++++++++++++++++++++++++++++++--
>>>   migration/migration.h     |  2 ++
>>>   migration/ram.c           |  2 ++
>>>   migration/vmstate-types.c |  5 ++--
>>>   qapi/migration.json       | 27 ++++++++++++++++++-
>>>   stubs/vmstate.c           |  7 +++++
>>>   8 files changed, 137 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/include/migration/cpr.h b/include/migration/cpr.h
>>> index e886c98..5cd373f 100644
>>> --- a/include/migration/cpr.h
>>> +++ b/include/migration/cpr.h
>>> @@ -30,6 +30,7 @@ int cpr_state_save(Error **errp);
>>>   int cpr_state_load(Error **errp);
>>>   void cpr_state_close(void);
>>>   struct QIOChannel *cpr_state_ioc(void);
>>> +bool cpr_needed_for_reuse(void *opaque);
>>>   
>>>   QEMUFile *cpr_transfer_output(const char *uri, Error **errp);
>>>   QEMUFile *cpr_transfer_input(const char *uri, Error **errp);
>>> diff --git a/migration/cpr.c b/migration/cpr.c
>>> index 86f66c1..911b556 100644
>>> --- a/migration/cpr.c
>>> +++ b/migration/cpr.c
>>> @@ -9,6 +9,7 @@
>>>   #include "qapi/error.h"
>>>   #include "migration/cpr.h"
>>>   #include "migration/misc.h"
>>> +#include "migration/options.h"
>>>   #include "migration/qemu-file.h"
>>>   #include "migration/savevm.h"
>>>   #include "migration/vmstate.h"
>>> @@ -57,7 +58,7 @@ static const VMStateDescription vmstate_cpr_fd = {
>>>           VMSTATE_UINT32(namelen, CprFd),
>>>           VMSTATE_VBUFFER_ALLOC_UINT32(name, CprFd, 0, NULL, namelen),
>>>           VMSTATE_INT32(id, CprFd),
>>> -        VMSTATE_INT32(fd, CprFd),
>>> +        VMSTATE_FD(fd, CprFd),
>>>           VMSTATE_END_OF_LIST()
>>>       }
>>>   };
>>> @@ -174,9 +175,16 @@ int cpr_state_save(Error **errp)
>>>   {
>>>       int ret;
>>>       QEMUFile *f;
>>> +    MigMode mode = migrate_mode();
>>>   
>>> -    /* set f based on mode in a later patch in this series */
>>> -    return 0;
>>> +    if (mode == MIG_MODE_CPR_TRANSFER) {
>>> +        f = cpr_transfer_output(migrate_cpr_uri(), errp);
>>> +    } else {
>>> +        return 0;
>>> +    }
>>> +    if (!f) {
>>> +        return -1;
>>> +    }
>>>   
>>>       qemu_put_be32(f, QEMU_CPR_FILE_MAGIC);
>>>       qemu_put_be32(f, QEMU_CPR_FILE_VERSION);
>>> @@ -205,8 +213,18 @@ int cpr_state_load(Error **errp)
>>>       uint32_t v;
>>>       QEMUFile *f;
>>>   
>>> -    /* set f based on mode in a later patch in this series */
>>> -    return 0;
>>> +    /*
>>> +     * Mode will be loaded in CPR state, so cannot use it to decide which
>>> +     * form of state to load.
>>> +     */
>>> +    if (cpr_uri) {
>>> +        f = cpr_transfer_input(cpr_uri, errp);
>>> +    } else {
>>> +        return 0;
>>> +    }
>>> +    if (!f) {
>>> +        return -1;
>>> +    }
>>>   
>>>       v = qemu_get_be32(f);
>>>       if (v != QEMU_CPR_FILE_MAGIC) {
>>> @@ -243,3 +261,9 @@ void cpr_state_close(void)
>>>           cpr_state_file = NULL;
>>>       }
>>>   }
>>> +
>>> +bool cpr_needed_for_reuse(void *opaque)
>>> +{
>>> +    MigMode mode = migrate_mode();
>>> +    return mode == MIG_MODE_CPR_TRANSFER;
>>> +}
>> 
>> Drop it until used?
>
> Maybe, but here is my reason for including it here.
>
> These common functions like cpr_needed_for_reuse and cpr_resave_fd are needed
> by multiple follow-on series: vfio, tap, iommufd.  To send those for comment,
> as I have beem, I need to prepend a patch for cpr_needed_for_reuse to each of
> those series, which is redundant.  It makes more sense IMO to include them in
> this initial series.
>
> But, it's your call.
>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 3301583..73b85aa 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -76,6 +76,7 @@
>>>   static NotifierWithReturnList migration_state_notifiers[] = {
>>>       NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL),
>>>       NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT),
>>> +    NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_TRANSFER),
>>>   };
>>>   
>>>   /* Messages sent on the return path from destination to source */
>>> @@ -109,6 +110,7 @@ static int migration_maybe_pause(MigrationState *s,
>>>   static void migrate_fd_cancel(MigrationState *s);
>>>   static bool close_return_path_on_source(MigrationState *s);
>>>   static void migration_completion_end(MigrationState *s);
>>> +static void migrate_hup_delete(MigrationState *s);
>>>   
>>>   static void migration_downtime_start(MigrationState *s)
>>>   {
>>> @@ -204,6 +206,12 @@ 
>>> migration_channels_and_transport_compatible(MigrationAddress *addr,
>>>           return false;
>>>       }
>>>   
>>> +    if (migrate_mode() == MIG_MODE_CPR_TRANSFER &&
>>> +        addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
>>> +        error_setg(errp, "Migration requires streamable transport (eg 
>>> unix)");
>>> +        return false;
>>> +    }
>>> +
>>>       return true;
>>>   }
>>>   
>>> @@ -316,6 +324,7 @@ void migration_cancel(const Error *error)
>>>           qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
>>>       }
>>>       migrate_fd_cancel(current_migration);
>>> +    migrate_hup_delete(current_migration);
>>>   }
>>>   
>>>   void migration_shutdown(void)
>>> @@ -718,6 +727,9 @@ static void qemu_start_incoming_migration(const char 
>>> *uri, bool has_channels,
>>>       } else {
>>>           error_setg(errp, "unknown migration protocol: %s", uri);
>>>       }
>>> +
>>> +    /* Close cpr socket to tell source that we are listening */
>>> +    cpr_state_close();
>> 
>> Would it be possible to use some explicit reply message to mark this?  
>
> In theory yes, but I fear that using a return channel with message parsing and
> dispatch adds more code than it is worth.

I think this approach is fine for now, but I wonder whether we could
reuse the current return path (RP) by starting it earlier and take
benefit from it already having the message passing infrastructure in
place. I'm actually looking ahead to the migration handshake thread[1],
which could be thought to have some similarity with the early cpr
channel. So having a generic channel in place early on to handle
handshake, CPR, RP, etc. could be a good idea.

Anyway, I'm probing on this a bit so I can start drafting something. I
got surprised that we don't even have the capability bits in the stream
in a useful way (currently, configuration_validate_capabilities() does
kind of nothing).

1- https://wiki.qemu.org/ToDo/LiveMigration#Migration_handshake


Reply via email to