On Mon, Jun 05, 2023 at 03:27:53PM +0300, Avihai Horon wrote: > Hi Peter, > > On 02/06/2023 17:47, Peter Xu wrote: > > External email: Use caution opening links or attachments > > > > > > Add a new migration parameter switchover-hold which can block src qemu > > migration from switching over to dest from running. > > > > One can set this flag to true so src qemu will keep iterating the VM data, > > not switching over to dest even if it can. > > > > It means now live migration works somehow like COLO; we keep syncing data > > from src to dst without stopping. > > > > When the user is ready for the switchover, one can set the parameter from > > true->false. That'll contain a implicit kick to migration thread to be > > alive and re-evaluate the switchover decision. > > > > This can be used in two cases so far in my mind: > > > > (1) One can use this parameter to start pre-heating migration (but not > > really migrating, so a migrate-cancel will cancel the preheat). When > > the user wants to really migrate, just clear the flag. It'll in most > > cases migrate immediately because most pages are already synced. > > > > (2) Can also be used as a clean way to do qtest, in many of the precopy > > tests we have requirement to run after 1 iteration without completing > > the precopy migration. Before that we have either set bandwidth to > > ridiculous low value, or tricks on detecting guest memory change over > > some adhoc guest memory position. Now we can simply set this flag > > then we know precopy won't complete and will just keep going. > > > > Here we leveraged a sem to make sure migration thread won't busy spin on a > > physical cpu, meanwhile provide a timedwait() of 10ms so it can still try > > its best to sync with dest QEMU from time to time. Note that the sem is > > prone to outdated counts but it's benign, please refer to the comment above > > the semaphore definition for more information. > > > > Signed-off-by: Peter Xu <pet...@redhat.com> > > --- > > qapi/migration.json | 25 +++++++++++++-- > > migration/migration.h | 16 ++++++++++ > > migration/migration-hmp-cmds.c | 3 ++ > > migration/migration.c | 56 ++++++++++++++++++++++++++++++++-- > > migration/options.c | 17 +++++++++++ > > 5 files changed, 111 insertions(+), 6 deletions(-) > > > > diff --git a/qapi/migration.json b/qapi/migration.json > > index 179af0c4d8..1d0059d125 100644 > > --- a/qapi/migration.json > > +++ b/qapi/migration.json > > @@ -779,6 +779,15 @@ > > # Nodes are mapped to their block device name if there is one, and > > # to their node name otherwise. (Since 5.2) > > # > > +# @switchover-hold: Whether we should hold-off precopy switchover from src > > +# to dest QEMU, even if we can finish migration in the downtime > > +# specified. By default off, so precopy migration will complete as > > +# soon as possible. One can set it to explicitly keep iterating during > > +# precopy migration until set the flag to false again to kick off the > > +# final switchover. Note, this does not affect postcopy switchover, > > +# because the user can control that using "migrate-start-postcopy" > > +# command explicitly. (Since 8.1) > > +# > > I think this should wrap at col 70, and need double space before (Since > 8.1). > Also in other places below.
I don't know that rule; anywhere documented? It seems true for qapi/. Obviously I forgot to copy Markus and Eric at least, since this is qmp stuff. > > > # Features: > > # > > # @unstable: Member @x-checkpoint-delay is experimental. > > @@ -800,7 +809,7 @@ > > 'xbzrle-cache-size', 'max-postcopy-bandwidth', > > 'max-cpu-throttle', 'multifd-compression', > > 'multifd-zlib-level' ,'multifd-zstd-level', > > - 'block-bitmap-mapping' ] } > > + 'block-bitmap-mapping', 'switchover-hold' ] } > > > > ## > > # @MigrateSetParameters: > > @@ -935,6 +944,10 @@ > > # Nodes are mapped to their block device name if there is one, and > > # to their node name otherwise. (Since 5.2) > > # > > +# @switchover-hold: Whether we should hold-off precopy switchover from src > > +# to dest QEMU. For more details, please refer to MigrationParameter > > +# entry of the same field. (Since 8.1) > > +# > > # Features: > > # > > # @unstable: Member @x-checkpoint-delay is experimental. > > @@ -972,7 +985,8 @@ > > '*multifd-compression': 'MultiFDCompression', > > '*multifd-zlib-level': 'uint8', > > '*multifd-zstd-level': 'uint8', > > - '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } } > > + '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ], > > + '*switchover-hold': 'bool' } } > > > > ## > > # @migrate-set-parameters: > > @@ -1127,6 +1141,10 @@ > > # Nodes are mapped to their block device name if there is one, and > > # to their node name otherwise. (Since 5.2) > > # > > +# @switchover-hold: Whether we should hold-off precopy switchover from src > > +# to dest QEMU. For more details, please refer to MigrationParameter > > +# entry of the same field. (Since 8.1) > > +# > > # Features: > > # > > # @unstable: Member @x-checkpoint-delay is experimental. > > @@ -1161,7 +1179,8 @@ > > '*multifd-compression': 'MultiFDCompression', > > '*multifd-zlib-level': 'uint8', > > '*multifd-zstd-level': 'uint8', > > - '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } } > > + '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ], > > + '*switchover-hold': 'bool' } } > > > > ## > > # @query-migrate-parameters: > > diff --git a/migration/migration.h b/migration/migration.h > > index 30c3e97635..721b1c9473 100644 > > --- a/migration/migration.h > > +++ b/migration/migration.h > > @@ -440,6 +440,22 @@ struct MigrationState { > > > > /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. > > */ > > JSONWriter *vmdesc; > > + /* > > + * Only migration thread will wait on it when switchover_hold==true. > > + * > > + * Only qmp set param will kick it when switching switchover_hold from > > + * true->false. > > + * > > + * NOTE: outdated sem count here is benign. E.g., when this is posted, > > + * the 1st migration got cancelled, then start the 2nd migration, or > > + * when someone sets the flag from true->false->true->false.. because > > + * any outdated sem count will only let the migration thread to run one > > + * more loop (timedwait() will eat the outdated count) when reaching > > + * the completion phase, then in the next loop it'll sleep again. The > > + * important thing here OTOH is when the migration thread is sleeping > > + * we can always kick it out of the sleep, which we will always do. > > + */ > > + QemuSemaphore switchover_hold_sem; > > }; > > > > void migrate_set_state(int *state, int old_state, int new_state); > > diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c > > index 9885d7c9f7..63a2c8a4a3 100644 > > --- a/migration/migration-hmp-cmds.c > > +++ b/migration/migration-hmp-cmds.c > > @@ -338,6 +338,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const > > QDict *qdict) > > monitor_printf(mon, "%s: '%s'\n", > > MigrationParameter_str(MIGRATION_PARAMETER_TLS_AUTHZ), > > params->tls_authz); > > + monitor_printf(mon, "%s: %s\n", > > + MigrationParameter_str(MIGRATION_PARAMETER_SWITCHOVER_HOLD), > > + params->switchover_hold ? "on" : "off"); > > > > if (params->has_block_bitmap_mapping) { > > const BitmapMigrationNodeAliasList *bmnal; > > diff --git a/migration/migration.c b/migration/migration.c > > index dc05c6f6ea..076c9f1372 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -2693,6 +2693,55 @@ static void migration_update_counters(MigrationState > > *s, > > bandwidth, s->threshold_size); > > } > > > > +static bool > > +migration_should_complete(MigrationState *s, uint64_t pending_size) > > +{ > > + /* We still have large pending data to send? */ > > + if (pending_size && (pending_size >= s->threshold_size)) { > > The parenthesis here are redundant. > > > + return false; > > + } > > + > > + /* The user doesn't want us to switchover yet */ > > + if (s->parameters.switchover_hold) { > > Should we check this only if we are in precopy? I.e., skip this check if > postcopy is active? > > I thought that this parameter is all about precopy switchover, and the fact > that it prevents stopcopy from reaching migration_completion() is just an > undesired side effect. > As I see it, if this parameter is set then precopy switchover is hold off, > and if on top of that a user starts postcopy then this parameter is not > relevant anymore - we should start postcopy and ignore it for the rest of > migration. > > Does it make sense? Yes, it matches more with the doc update indeed. I didn't bother with that and added flag=false in the postcopy tests in the follow up patch because logically a postcopy user shouldn't even use it (also per the doc), but I can also make it strict here as you said. Thanks for taking a look! -- Peter Xu