Nicholas Piggin <npig...@gmail.com> writes: Hi Nick,
I'm ignorant about replay, but we try to know why were taking the BQL in the migration code, we move it around sometimes, etc. Can we be a bit more strict with documentation here so we don't get stuck with a lock that can't be changed? > Migration causes a number of events that need to go in the replay > trace, such as vm state transitions. The replay_mutex lock needs to > be held for these. > Is it practical to explicitly list which events are those? Are there any tests that exercise this that we could use to validate changes around this area? > The simplest approach seems to be just take it up-front when taking > the bql. But also the thing asserts if taken inside the BQL, so is the actual matter here that we _cannot_ take the lock around the proper places? I also see the replay lock around the main loop, so is it basically bql2 from the perspective of most of QEMU? > > Signed-off-by: Nicholas Piggin <npig...@gmail.com> > --- > migration/migration.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/migration/migration.c b/migration/migration.c > index 2eb9e50a263..277fca954c1 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -24,6 +24,7 @@ > #include "socket.h" > #include "sysemu/runstate.h" > #include "sysemu/sysemu.h" > +#include "sysemu/replay.h" > #include "sysemu/cpu-throttle.h" > #include "rdma.h" > #include "ram.h" > @@ -2518,6 +2519,7 @@ static int postcopy_start(MigrationState *ms, Error > **errp) > } > > trace_postcopy_start(); > + replay_mutex_lock(); > bql_lock(); > trace_postcopy_start_set_run(); > > @@ -2629,6 +2631,7 @@ static int postcopy_start(MigrationState *ms, Error > **errp) > migration_downtime_end(ms); > > bql_unlock(); > + replay_mutex_unlock(); > > if (migrate_postcopy_ram()) { > /* > @@ -2670,6 +2673,7 @@ fail: > } > migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL); > bql_unlock(); > + replay_mutex_unlock(); > return -1; > } > > @@ -2721,6 +2725,7 @@ static int migration_completion_precopy(MigrationState > *s, > { > int ret; > > + replay_mutex_lock(); > bql_lock(); > > if (!migrate_mode_is_cpr(s)) { > @@ -2746,6 +2751,7 @@ static int migration_completion_precopy(MigrationState > *s, > s->block_inactive); > out_unlock: > bql_unlock(); > + replay_mutex_unlock(); > return ret; > } > > @@ -3633,6 +3639,7 @@ static void *bg_migration_thread(void *opaque) > > trace_migration_thread_setup_complete(); > > + replay_mutex_lock(); > bql_lock(); > > if (migration_stop_vm(s, RUN_STATE_PAUSED)) { > @@ -3666,6 +3673,7 @@ static void *bg_migration_thread(void *opaque) > */ > migration_bh_schedule(bg_migration_vm_start_bh, s); > bql_unlock(); > + replay_mutex_unlock(); > > while (migration_is_active()) { > MigIterateState iter_state = bg_migration_iteration_run(s); > @@ -3695,6 +3703,7 @@ fail: > migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, > MIGRATION_STATUS_FAILED); > bql_unlock(); > + replay_mutex_unlock(); > } > > fail_setup: