Peter Xu <pet...@redhat.com> wrote: > postcopy_qemufile_src object should be owned by one thread, either the main > thread (e.g. when at the beginning, or at the end of migration), or by the > return path thread (when during a preempt enabled postcopy migration). If > that's not the case the access to the object might be racy. > > postcopy_preempt_shutdown_file() can be potentially racy, because it's > called at the end phase of migration on the main thread, however during > which the return path thread hasn't yet been recycled; the recycle happens > in await_return_path_close_on_source() which is after this point. > > It means, logically it's posslbe the main thread and the return path thread > are both operating on the same qemufile. While I don't think qemufile is > thread safe at all. > > postcopy_preempt_shutdown_file() used to be needed because that's where we > send EOS to dest so that dest can safely shutdown the preempt thread. > > To avoid the possible race, remove this only place that a race can happen. > Instead we figure out another way to safely close the preempt thread on > dest. > > The core idea during postcopy on deciding "when to stop" is that dest will > send a postcopy SHUT message to src, telling src that all data is there. > Hence to shut the dest preempt thread maybe better to do it directly on > dest node. > > This patch proposed such a way that we change postcopy_prio_thread_created > into PreemptThreadStatus, so that we kick the preempt thread on dest qemu > by a sequence of: > > mis->preempt_thread_status = PREEMPT_THREAD_QUIT; > qemu_file_shutdown(mis->postcopy_qemufile_dst); > > While here shutdown() is probably so far the easiest way to kick preempt > thread from a blocked qemu_get_be64(). Then it reads preempt_thread_status > to make sure it's not a network failure but a willingness to quit the > thread. > > We could have avoided that extra status but just rely on migration status. > The problem is postcopy_ram_incoming_cleanup() is just called early enough > so we're still during POSTCOPY_ACTIVE no matter what.. So just make it > simple to have the status introduced. > > One flag x-preempt-pre-7-2 is added to keep old pre-7.2 behaviors of > postcopy preempt. > > Fixes: 9358982744 ("migration: Send requested page directly in rp-return > thread") > Signed-off-by: Peter Xu <pet...@redhat.com> > --- > hw/core/machine.c | 1 + > migration/migration.c | 10 ++++++++-- > migration/migration.h | 34 +++++++++++++++++++++++++++++++++- > migration/postcopy-ram.c | 20 +++++++++++++++----- > 4 files changed, 57 insertions(+), 8 deletions(-) >
As preempt is pretty new I will .... Reviewed-by: Juan Quintela <quint...@redhat.com> But code is quite subtle. queued.