On 10/7/2024 11:27 AM, Peter Xu wrote:
On Mon, Sep 30, 2024 at 12:40:35PM -0700, Steve Sistare wrote:
Stop the vm earlier for cpr, to guarantee consistent device state when
CPR state is saved.

Could you add some more info on why this order matters?

E.g., qmp_migrate should switch migration state machine to SETUP, while
this path holds BQL, I think it means there's no way devices got hot added
concurrently of the whole process.

Would other things change in the cpr states (name, fd, etc.)?  It'll be
great to mention these details in the commit message.

Because of the new cpr-state save operation needed by this mode,
I created this patch to be future proof.  Performing a save operation while
the machine is running is asking for trouble.  But right now, I am not aware
of any specific issues.

Later in the "tap and vhost" series there is another reason to stop the vm here 
and
save cpr state, because the devices must be stopped in old qemu before they
are initialized in new qemu.  If you are curious, see the 2 patches I attached
to the email at
  
https://lore.kernel.org/qemu-devel/fa95c40d-b5e5-41eb-bba7-7842bca2f...@oracle.com/
But, that has nothing to do with the contents of cpr state.

- Steve

Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
---
  migration/migration.c | 22 +++++++++++++---------
  1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index df00e5c..868bf0e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2082,6 +2082,7 @@ void qmp_migrate(const char *uri, bool has_channels,
      MigrationState *s = migrate_get_current();
      g_autoptr(MigrationChannel) channel = NULL;
      MigrationAddress *addr = NULL;
+    bool stopped = false;
/*
       * Having preliminary checks for uri and channel
@@ -2125,6 +2126,15 @@ void qmp_migrate(const char *uri, bool has_channels,
          }
      }
+ if (migrate_mode_is_cpr(s)) {
+        int ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
+        if (ret < 0) {
+            error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
+            goto out;
+        }
+        stopped = true;
+    }
+
      if (cpr_state_save(&local_err)) {
          goto out;
      }
@@ -2160,6 +2170,9 @@ out:
          }
          migrate_fd_error(s, local_err);
          error_propagate(errp, local_err);
+        if (stopped) {
+            vm_resume(s->vm_old_state);
+        }
          return;
      }
  }
@@ -3743,7 +3756,6 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
      Error *local_err = NULL;
      uint64_t rate_limit;
      bool resume = (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP);
-    int ret;
/*
       * If there's a previous error, free it and prepare for another one.
@@ -3815,14 +3827,6 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
          return;
      }
- if (migrate_mode_is_cpr(s)) {
-        ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
-        if (ret < 0) {
-            error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
-            goto fail;
-        }
-    }
-
      if (migrate_background_snapshot()) {
          qemu_thread_create(&s->thread, "mig/snapshot",
                  bg_migration_thread, s, QEMU_THREAD_JOINABLE);
--
1.8.3.1




Reply via email to