On Thursday, July 27, 2023 1:10 AM, Peter Xu wrote:
> On Fri, Jul 21, 2023 at 11:14:55AM +0000, Wang, Wei W wrote:
> > On Friday, July 21, 2023 4:38 AM, Peter Xu wrote:
> > > Looks good to me, after addressing Isaku's comments.
> > >
> > > The current_active_state is very unfortunate, along with most of the
> > > calls to migrate_set_state() - I bet most of the code will definitely go
> > > wrong if that cmpxchg didn't succeed inside of migrate_set_state(),
> > > IOW in most cases we simply always want:
> >
> > Can you share examples where it could be wrong?
> > (If it has bugs, we need to fix)
>
> Nop.  What I meant is most of the cases we want to set the state without
> caring much about the old state, so at least we can have a helper like below
> and simply call migrate_set_state(s, STATE) where we don't care old state.
>
> >
> > >
> > >   migrate_set_state(&s->state, s->state, XXX);
> > >
> > > Not sure whether one pre-requisite patch is good to have so we can
> > > rename migrate_set_state() to something like __migrate_set_state(), then:
> > >
> > >   migrate_set_state(s, XXX) {
> > >     __migrate_set_state(&s->state, s->state, XXX);
> > >   }
> > >
> > > I don't even know whether there's any call site that will need
> > > __migrate_set_state() for real..
> > >
> >
> > Seems this would break the use of "MIGRATION_STATUS_CANCELLING".
> > For example,
> > - In migration_maybe_pause:
> >     migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER,
> >                       new_state);
> >   If the current s->state isn't MIGRATION_STATUS_PRE_SWITCHOVER (could be
> >   MIGRATION_STATUS_CANCELLING), then s->state won’t be updated to
> >   new_state.
> > - Then, in migration_completion, the following update to s->state won't succeed:
> >     migrate_set_state(&s->state, current_active_state,
> >                       MIGRATION_STATUS_COMPLETED);
> >
> > - Finally, when reaching migration_iteration_finish(), s->state is
> >   MIGRATION_STATUS_CANCELLING, instead of MIGRATION_STATUS_COMPLETED.
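To make the CANCELLING example concrete, here is a minimal single-threaded C11 sketch (`<stdatomic.h>`) that compares the two transition styles. Everything in it is a hypothetical stand-in, not QEMU code: the `MIG_*` enum, `set_state_cas()`, `set_state_any()` and `replay_cancel_during_pause()`. It assumes migrate_set_state() behaves as a compare-and-swap on `s->state`, and it assumes the `new_state` passed to migration_maybe_pause is a DEVICE state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-ins for a few MigrationStatus values (not QEMU's enum). */
enum mig_state {
    MIG_ACTIVE,
    MIG_PRE_SWITCHOVER,
    MIG_DEVICE,
    MIG_CANCELLING,
    MIG_COMPLETED,
    MIG_STATE_MAX,
};

/* Models the existing style: move to new_state only if the state still
 * equals old_state (compare-and-swap). Returns true if the swap happened. */
static bool set_state_cas(_Atomic int *state, int old_state, int new_state)
{
    assert(new_state < MIG_STATE_MAX);
    int expected = old_state;
    return atomic_compare_exchange_strong(state, &expected, new_state);
}

/* Models the proposed helper (hypothetical): whatever the state is right now
 * becomes old_state, so without a concurrent writer the transition always
 * happens. */
static void set_state_any(_Atomic int *state, int new_state)
{
    set_state_cas(state, atomic_load(state), new_state);
}

/* Replays the sequence from the example: pause in PRE_SWITCHOVER, a cancel
 * arrives, then PRE_SWITCHOVER -> DEVICE and DEVICE -> COMPLETED are
 * attempted. Returns the final state. */
static int replay_cancel_during_pause(bool use_any_helper)
{
    _Atomic int state;
    atomic_init(&state, MIG_ACTIVE);

    set_state_cas(&state, MIG_ACTIVE, MIG_PRE_SWITCHOVER);

    /* Cancel from the main thread replaces the paused state. */
    set_state_cas(&state, atomic_load(&state), MIG_CANCELLING);

    if (use_any_helper) {
        set_state_any(&state, MIG_DEVICE);     /* CANCELLING is overwritten */
        set_state_any(&state, MIG_COMPLETED);
    } else {
        set_state_cas(&state, MIG_PRE_SWITCHOVER, MIG_DEVICE); /* fails */
        set_state_cas(&state, MIG_DEVICE, MIG_COMPLETED);      /* fails */
    }
    return atomic_load(&state);
}
```

With the compare-and-swap style, `replay_cancel_during_pause(false)` ends in CANCELLING, as in the example. With the unconditional helper, `replay_cancel_during_pause(true)` ends in COMPLETED, so the cancel request is silently lost.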
>
> The whole state changes are just flaky to me in general, even with the help of
> old_state cmpxchg.

Yes, the design/implementation of the migration state transitions can be improved (it looks fragile to me). I think this should be done in a separate patchset, though. For this patch, we could keep it free of functional changes.

>
> E.g., I'm wondering whether below race can happen, assuming we're starting
> with ACTIVE state and just about to complete migration:
>
> main thread                          migration thread
> -----------                          ----------------
>                                      migration_maybe_pause(current_active_state==ACTIVE)
>                                        if (s->state != MIGRATION_STATUS_CANCELLING)
>                                          --> true, keep setting state
>                                          qemu_mutex_unlock_iothread();
> qemu_mutex_lock_iothread();
> migrate_fd_cancel()
>   if (old_state == MIGRATION_STATUS_PRE_SWITCHOVER)
>     --> false, not posting to pause_sem
>   set state to MIGRATION_STATUS_CANCELLING
>                                          migrate_set_state(&s->state, *current_active_state,
>                                                            MIGRATION_STATUS_PRE_SWITCHOVER);
>                                            --> false, cmpxchg fail
>                                          qemu_sem_wait(&s->pause_sem);
>                                            --> hang death?

It still needs a "migrate continue" to unblock the migration thread. We should probably document that PAUSE_BEFORE_SWITCHOVER always requires an explicit "migrate continue" to be issued by the user (even after the migration is cancelled).
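The interleaving in the diagram can be replayed step by step on a single thread to check the reasoning. This is a minimal sketch under stated assumptions. `struct race_model`, `model_init()`, `model_cancel()`, `model_migrate_continue()` and `replay_pause_race()` are hypothetical. The pause semaphore is modelled as a plain counter rather than a QemuSemaphore. Only the steps of migrate_fd_cancel() and migration_maybe_pause() shown in the diagram are modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-ins for the states involved (not QEMU's enum). */
enum race_state { R_ACTIVE, R_PRE_SWITCHOVER, R_CANCELLING, R_STATE_MAX };

/* Hypothetical model of the shared fields touched by the race. */
struct race_model {
    _Atomic int state;
    int pause_sem;          /* plain counter standing in for s->pause_sem */
};

static void model_init(struct race_model *m)
{
    atomic_init(&m->state, R_ACTIVE);
    m->pause_sem = 0;
}

/* Compare-and-swap transition; true if the state was old_state. */
static bool cas_state(_Atomic int *state, int old_state, int new_state)
{
    assert(new_state < R_STATE_MAX);
    int expected = old_state;
    return atomic_compare_exchange_strong(state, &expected, new_state);
}

/* Main thread: the steps of migrate_fd_cancel() shown in the diagram. */
static void model_cancel(struct race_model *m)
{
    int old_state = atomic_load(&m->state);
    if (old_state == R_PRE_SWITCHOVER) {
        m->pause_sem++;     /* would post pause_sem to wake the pause */
    }
    cas_state(&m->state, old_state, R_CANCELLING);
}

/* User-issued "migrate continue": posts the pause semaphore. */
static void model_migrate_continue(struct race_model *m)
{
    m->pause_sem++;
}

/* Replays the diagram in order. Returns true if the migration thread would
 * now block in qemu_sem_wait(&s->pause_sem). */
static bool replay_pause_race(struct race_model *m)
{
    int current_active_state = R_ACTIVE;

    /* Migration thread: the CANCELLING check passes, then the lock drops. */
    bool keep_setting = atomic_load(&m->state) != R_CANCELLING;

    /* Main thread cancels; state is still ACTIVE, so nothing is posted. */
    model_cancel(m);

    if (keep_setting) {
        /* ACTIVE -> PRE_SWITCHOVER fails: state is already CANCELLING. */
        cas_state(&m->state, current_active_state, R_PRE_SWITCHOVER);
        if (m->pause_sem == 0) {
            return true;    /* qemu_sem_wait() would block here */
        }
        m->pause_sem--;
    }
    return false;
}
```

In this model, `replay_pause_race()` returns true. The state stays CANCELLING and the semaphore count stays 0, so only a later "migrate continue" (`model_migrate_continue()`) would release the migration thread. That matches the point that PAUSE_BEFORE_SWITCHOVER needs an explicit continue even after a cancel.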