* Kunkun Jiang (jiangkun...@huawei.com) wrote:
> On 2021/7/6 18:27, Dr. David Alan Gilbert wrote:
> > * Kunkun Jiang (jiangkun...@huawei.com) wrote:
> > > Hi Daniel,
> > >
> > > On 2021/7/5 20:48, Daniel P. Berrangé wrote:
> > > > On Mon, Jul 05, 2021 at 08:36:52PM +0800, Kunkun Jiang wrote:
> > > > > In the current version, the source QEMU process does not automatically
> > > > > exit after a successful migration. Additional action is required,
> > > > > such as sending { "execute": "quit" } or ctrl+c. For simplicity, add
> > > > > a new shutdown cause 'migration-completed' to exit the source QEMU
> > > > > process after a successful migration.
> > > > IIUC, 'STATUS_COMPLETED' state is entered on the source host
> > > > once it has finished sending all VM state, and thus does not
> > > > guarantee that the target host has successfully received and
> > > > loaded all VM state.
> > > Thanks for your reply.
> > >
> > > If the target host doesn't successfully receive and load all VM state,
> > > we can send { "execute": "cont" } to resume the source in time to
> > > ensure that the VM will not be lost?
> > Yes, that's pretty common at the moment; the failed migration can
> > happen at lots of different points:
> >   a) The last part of the actual migration stream/loading the devices
> >      - that's pretty easy, since the destination hasn't actually got
> >        the full migration stream.
> >
> >   b) If the migration itself completes, but then the management system
> >      then tries to reconfigure the networking/storage on the destination,
> >      and something goes wrong in that, then it can roll that back and
> >      cont on the source.
> >
> > So, it's a pretty common type of failure/recovery - the management
> > application has to be a bit careful not to do anything destructive
> > until as late as possible, so it knows it can switch back.
> Okay, I see.
> > > > Typically a mgmt app will need to directly confirm that the
> > > > target host QEMU has successfully started running, before it
> > > > will tell the source QEMU to quit.
> > > 'a mgmt app', such as libvirt?
> > Yes, it's currently libvirt that does that; but any of the control
> > things could (it's just libvirt has been going long enough so it knows
> > about lots and lots of nasty cases of migration failure, and recovering
> > properly).
> >
> > Can you explain why you wanted to get the source to automatically
> > quit? In a real setup where does it help?
> Sorry, my thoughts on live migration scenarios are not comprehensive enough.

That's OK; it takes a little while to understand all of the recovery and
error cases; people *really* want to recover from a failed migration; so
we try and be very careful about not throwing away the source.

Dave

> Thanks,
> Kunkun Jiang
> > Dave
> >
> >
> > > Thanks,
> > > Kunkun Jiang
> > > > So, AFAICT, this automatic exit after STATUS_COMPLETED is
> > > > not safe and could lead to total loss of the running VM in
> > > > error scenarios.
> > > >
> > > >
> > > >
> > > > > Signed-off-by: Kunkun Jiang <jiangkun...@huawei.com>
> > > > > ---
> > > > >  migration/migration.c | 1 +
> > > > >  qapi/run-state.json   | 4 +++-
> > > > >  2 files changed, 4 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > index 4228635d18..16782c93c2 100644
> > > > > --- a/migration/migration.c
> > > > > +++ b/migration/migration.c
> > > > > @@ -3539,6 +3539,7 @@ static void migration_iteration_finish(MigrationState *s)
> > > > >      case MIGRATION_STATUS_COMPLETED:
> > > > >          migration_calculate_complete(s);
> > > > >          runstate_set(RUN_STATE_POSTMIGRATE);
> > > > > +        qemu_system_shutdown_request(SHUTDOWN_CAUSE_MIGRATION_COMPLETED);
> > > > >          break;
> > > > >      case MIGRATION_STATUS_ACTIVE:
> > > > > diff --git a/qapi/run-state.json b/qapi/run-state.json
> > > > > index 43d66d700f..66aaef4e2b 100644
> > > > > --- a/qapi/run-state.json
> > > > > +++ b/qapi/run-state.json
> > > > > @@ -86,12 +86,14 @@
> > > > >  #                   ignores --no-reboot. This is useful for sanitizing
> > > > >  #                   hypercalls on s390 that are used during kexec/kdump/boot
> > > > >  #
> > > > > +# @migration-completed: Reaction to the successful migration
> > > > > +#
> > > > >  ##
> > > > >  { 'enum': 'ShutdownCause',
> > > > >    # Beware, shutdown_caused_by_guest() depends on enumeration order
> > > > >    'data': [ 'none', 'host-error', 'host-qmp-quit', 'host-qmp-system-reset',
> > > > >              'host-signal', 'host-ui', 'guest-shutdown', 'guest-reset',
> > > > > -            'guest-panic', 'subsystem-reset'] }
> > > > > +            'guest-panic', 'subsystem-reset', 'migration-completed'] }
> > > > >  ##
> > > > >  # @StatusInfo:
> > > > > --
> > > > > 2.23.0
> > > > >
> > > > >
> > > > Regards,
> > > > Daniel
> > > >
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
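
For illustration, the ordering discussed in this thread (confirm the destination first, only then quit the source, otherwise cont on the source) can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not libvirt's or QEMU's actual code: the DestState enum, the source_qmp_command() name, and the reduction of the source's state to a single "completed" flag are all hypothetical. Only the two QMP commands, quit and cont, come from the discussion itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the destination, as seen by a management app. */
typedef enum {
    DEST_PENDING,  /* destination not yet confirmed either way */
    DEST_RUNNING,  /* guest confirmed running, network/storage set up */
    DEST_FAILED    /* load or post-migration reconfiguration failed */
} DestState;

/*
 * Return the QMP command to send to the SOURCE QEMU, or NULL if nothing
 * should be sent yet. The point of the sketch: "completed" on the source
 * alone never leads to quit; the destination must be confirmed first.
 */
static const char *source_qmp_command(bool source_completed, DestState dest)
{
    if (!source_completed) {
        /* Migration still in flight (or it failed on its own). */
        return NULL;
    }
    switch (dest) {
    case DEST_RUNNING:
        /* Safe to discard the source only now. */
        return "{ \"execute\": \"quit\" }";
    case DEST_FAILED:
        /* Roll back: resume the guest on the source. */
        return "{ \"execute\": \"cont\" }";
    case DEST_PENDING:
    default:
        /* Keep the paused source around until the outcome is known. */
        return NULL;
    }
}
```

A caller would re-evaluate this until it returns non-NULL and then send the result to the source's QMP monitor. How the destination state is obtained is deliberately left out, since that is the part where a real management application has to handle the many failure cases mentioned above.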