On 5.03.2025 13:34, Peter Xu wrote:
On Tue, Mar 04, 2025 at 11:03:34PM +0100, Maciej S. Szmigiero wrote:
From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>

All callers to migration_incoming_state_destroy() other than
postcopy_ram_listen_thread() do this call with BQL held.

Since migration_incoming_state_destroy() ultimately calls "load_cleanup"
SaveVMHandlers and it will soon call BQL-sensitive code it makes sense
to always call that function under BQL rather than to have it deal with
both cases (with BQL and without BQL).
Add the necessary bql_lock() and bql_unlock() to
postcopy_ram_listen_thread().

qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
"load_state" SaveVMHandlers that are expecting BQL to be held.

In principle, the only devices that should be arriving on migration
channel serviced by postcopy_ram_listen_thread() are those that are
postcopiable and whose load handlers are safe to be called without BQL
being held.

But nothing currently prevents the source from sending data for "unsafe"
devices which would cause trouble there.
Add a TODO comment there so it's clear that it would be good to improve
handling of such (erroneous) case in the future.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com>
---
  migration/migration.c | 16 ++++++++++++++++
  migration/savevm.c    |  4 ++++
  2 files changed, 20 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 9e9db26667f1..6b2a8af4231d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -402,10 +402,26 @@ void migration_incoming_state_destroy(void)
      struct MigrationIncomingState *mis = migration_incoming_get_current();
multifd_recv_cleanup();
+
      /*
       * RAM state cleanup needs to happen after multifd cleanup, because
       * multifd threads can use some of its states (receivedmap).
+     *
+     * This call also needs BQL held since it calls all registered
+     * load_cleanup SaveVMHandlers and at least the VFIO implementation is
+     * BQL-sensitive.
+     *
+     * In addition to the above, it also performs cleanup of load threads
+     * thread pool.
+     * This cleanup operation is BQL-sensitive as it requires unlocking BQL
+     * so a thread possibly waiting for it could get unblocked and finally
+     * exit.
+     * The reason why a load thread may need to hold BQL in the first place
+     * is because address space modification operations require it.

Hold on...

This almost says exactly why load_cleanup() should _not_ take BQL... rather
than should..

So I had a closer look at the latest code, it's about this:

static void vfio_load_cleanup_load_bufs_thread(VFIOMultifd *multifd)
{
     /* The lock order is load_bufs_mutex -> BQL so unlock BQL here first */
     bql_unlock();
     WITH_QEMU_LOCK_GUARD(&multifd->load_bufs_mutex) {
         while (multifd->load_bufs_thread_running) {
             multifd->load_bufs_thread_want_exit = true;

             qemu_cond_signal(&multifd->load_bufs_buffer_ready_cond);
             qemu_cond_signal(&multifd->load_bufs_iter_done_cond);
             qemu_cond_wait(&multifd->load_bufs_thread_finished_cond,
                            &multifd->load_bufs_mutex);
         }
     }
     bql_lock();
}

It doesn't make much sense to me to take it only because we want to drop it
unconditionally. Can we guarantee the function not taking BQL instead?  I
had a quick look on pmem's pmem_persist() (from libpmem, qemu_ram_msync <-
qemu_ram_block_writeback <- ram_load_cleanup), it looks ok.

So the question is, is it safe to unlock BQL in whatever context (in
coroutines, or in a bottom half)?

If the answer is yes, we could make migration_incoming_state_destroy()
always not taking BQL (and assert(!bql_locked()) instead).

All the other callers of migration_incoming_state_destroy() are holding BQL:
process_incoming_migration_bh(), process_incoming_migration_co() (called on,
failure path only), load_snapshot() and qmp_xen_load_devices_state().

So AFAIK the safer way is to standardize on holding BQL when calling
that function.
If the answer is no, then vfio_load_cleanup_load_bufs_thread()'s current
version may not work either..

I think the reason for BQL is to serialize access to the QEMU internals
which are not thread-safe.

So as long as these internals aren't touched when not holding BQL then
we should be safe - I don't see any particular state that's cached
around these BQL calls and would need explicit reloading after re-gaining
it.

Thanks,
Maciej


Reply via email to