On 11/13/2024 4:11 PM, Peter Xu wrote:
On Fri, Nov 01, 2024 at 06:47:49AM -0700, Steve Sistare wrote:
Split qmp_migrate into start and finish functions.  Finish will be
called asynchronously in a subsequent patch, but for now, call it
immediately.  No functional change.

Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
---
  migration/migration.c | 27 +++++++++++++++++++++++++++
  1 file changed, 27 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 6dc7c09..86b3f39 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1521,6 +1521,7 @@ static void migrate_fd_error(MigrationState *s, const 
Error *error)
  static void migrate_fd_cancel(MigrationState *s)
  {
      int old_state ;
+    bool setup = (s->state == MIGRATION_STATUS_SETUP);
trace_migrate_fd_cancel(); @@ -1565,6 +1566,15 @@ static void migrate_fd_cancel(MigrationState *s)
              s->block_inactive = false;
          }
      }
+
+    /*
+     * If qmp_migrate_finish has not been called, then there is no path that
+     * will complete the cancellation.  Do it now.
+     */
+    if (setup && !s->to_dst_file) {
+        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_CANCELLED);
+        vm_resume(s->vm_old_state);
+    }

Hmm.. this doesn't look like the right place to put this change.. as this
patch logically should bring no functional change if it's only about a new
helper split an existing function.

I can move it to "cpr-transfer mode".
Tthe remainder of this patch becomes quite small.
I should probably just squash it all into "cpr-transfer mode".

Meanwhile, this change also doesn't yet tell where does a vm_resume() came
from..

The vm_resume is needed because of the patch "stop vm earlier for cpr".
In qmp_migrate, it calls migration_stop_vm in which sets vm_old_state.
Hence vm_resume restores that vm_old_state.

However, I moved "stop vm earlier for cpr" to a later patch series, so I
must also move the vm_resume call to that patch.

I'm really not sure whether this is correct at all, consider someone
does QMP "stop", migrate then quickly cancel it.  I suppose it may
accidentally resume the VM which it shouldn't.

vm_resume only resumes execution if vm_old_state is a running state.  As
long as vm_old_state is captured at the start of qmp_migrate, as happens
in "stop vm earlier for cpr", then vm_resume does the right thing.

Not to mention checking "setup" early, and unconditionally modify the state
here no matter what it is (can it be things like FAILED now, then
overwritten by a CANCELLED)?

OK, I will only overwrite a cancelling state:
  migrate_set_state(&s->state, MIGRATION_STATUS_CANCELLING, 
MIGRATION_STATUS_CANCELLED);

But I'd confess that's not the problem of
this patch, but that migration state machine is currently still racy.. afaiu.

  }
void migration_add_notifier_mode(NotifierWithReturn *notify,
@@ -2072,6 +2082,9 @@ static bool migrate_prepare(MigrationState *s, bool 
resume, Error **errp)
      return true;
  }
+static void qmp_migrate_finish(MigrationAddress *addr, bool resume_requested,
+                               Error **errp);
+
  void qmp_migrate(const char *uri, bool has_channels,
                   MigrationChannelList *channels, bool has_detach, bool detach,
                   bool has_resume, bool resume, Error **errp)
@@ -2118,6 +2131,20 @@ void qmp_migrate(const char *uri, bool has_channels,
          return;
      }
+ qmp_migrate_finish(addr, resume_requested, errp);
+
+    if (local_err) {
+        migrate_fd_error(s, local_err);
+        error_propagate(errp, local_err);
+    }

I don't see when local_err will be set at all until here.. maybe you meant
*errp, but then maybe we should drop local_err and use ERRP_GUARD().

In this patch local_err is always NULL.  It may be non-NULL in patch
"cpr-transfer mode".

I'll just squash "split qmp_migrate".  It was intended to siphon some
complexity away from "cpr-transfer mode", but raises more questions
than it answers.

- Steve

+}
+
+static void qmp_migrate_finish(MigrationAddress *addr, bool resume_requested,
+                               Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+    Error *local_err = NULL;
+
      if (!resume_requested) {
          if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
              return;
--
1.8.3.1




Reply via email to