On Wed, Mar 05, 2025 at 04:11:30PM +0100, Maciej S. Szmigiero wrote:
> On 5.03.2025 13:34, Peter Xu wrote:
> > On Tue, Mar 04, 2025 at 11:03:34PM +0100, Maciej S. Szmigiero wrote:
> > > From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
> > > 
> > > All callers to migration_incoming_state_destroy() other than
> > > postcopy_ram_listen_thread() do this call with BQL held.
> > > 
> > > Since migration_incoming_state_destroy() ultimately calls "load_cleanup"
> > > SaveVMHandlers and it will soon call BQL-sensitive code it makes sense
> > > to always call that function under BQL rather than to have it deal with
> > > both cases (with BQL and without BQL).
> > > Add the necessary bql_lock() and bql_unlock() to
> > > postcopy_ram_listen_thread().
> > > 
> > > qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
> > > "load_state" SaveVMHandlers that are expecting BQL to be held.
> > > 
> > > In principle, the only devices that should be arriving on migration
> > > channel serviced by postcopy_ram_listen_thread() are those that are
> > > postcopiable and whose load handlers are safe to be called without BQL
> > > being held.
> > > 
> > > But nothing currently prevents the source from sending data for "unsafe"
> > > devices which would cause trouble there.
> > > Add a TODO comment there so it's clear that it would be good to improve
> > > handling of such (erroneous) case in the future.
> > > 
> > > Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com>
> > > ---
> > >   migration/migration.c | 16 ++++++++++++++++
> > >   migration/savevm.c    |  4 ++++
> > >   2 files changed, 20 insertions(+)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 9e9db26667f1..6b2a8af4231d 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -402,10 +402,26 @@ void migration_incoming_state_destroy(void)
> > >       struct MigrationIncomingState *mis = 
> > > migration_incoming_get_current();
> > >       multifd_recv_cleanup();
> > > +
> > >       /*
> > >        * RAM state cleanup needs to happen after multifd cleanup, because
> > >        * multifd threads can use some of its states (receivedmap).
> > > +     *
> > > +     * This call also needs BQL held since it calls all registered
> > > +     * load_cleanup SaveVMHandlers and at least the VFIO implementation 
> > > is
> > > +     * BQL-sensitive.
> > > +     *
> > > +     * In addition to the above, it also performs cleanup of load threads
> > > +     * thread pool.
> > > +     * This cleanup operation is BQL-sensitive as it requires unlocking 
> > > BQL
> > > +     * so a thread possibly waiting for it could get unblocked and 
> > > finally
> > > +     * exit.
> > > +     * The reason why a load thread may need to hold BQL in the first 
> > > place
> > > +     * is because address space modification operations require it.
> > 
> > Hold on...
> > 
> > This almost says exactly why load_cleanup() should _not_ take BQL... rather
> > than should..
> > 
> > So I had a closer look at the latest code, it's about this:
> > 
> > static void vfio_load_cleanup_load_bufs_thread(VFIOMultifd *multifd)
> > {
> >      /* The lock order is load_bufs_mutex -> BQL so unlock BQL here first */
> >      bql_unlock();
> >      WITH_QEMU_LOCK_GUARD(&multifd->load_bufs_mutex) {
> >          while (multifd->load_bufs_thread_running) {
> >              multifd->load_bufs_thread_want_exit = true;
> > 
> >              qemu_cond_signal(&multifd->load_bufs_buffer_ready_cond);
> >              qemu_cond_signal(&multifd->load_bufs_iter_done_cond);
> >              qemu_cond_wait(&multifd->load_bufs_thread_finished_cond,
> >                             &multifd->load_bufs_mutex);
> >          }
> >      }
> >      bql_lock();
> > }
> > 
> > It doesn't make much sense to me to take it only because we want to drop it
> > unconditionally. Can we guarantee the function not taking BQL instead?  I
> > had a quick look on pmem's pmem_persist() (from libpmem, qemu_ram_msync <-
> > qemu_ram_block_writeback <- ram_load_cleanup), it looks ok.
> > 
> > So the question is, is it safe to unlock BQL in whatever context (in
> > coroutines, or in a bottom half)?
> > 
> > If the answer is yes, we could make migration_incoming_state_destroy()
> > always not taking BQL (and assert(!bql_locked()) instead).
> 
> All the other callers of migration_incoming_state_destroy() are holding BQL:
> process_incoming_migration_bh(), process_incoming_migration_co() (called on,
> failure path only), load_snapshot() and qmp_xen_load_devices_state().
> 
> So AFAIK the safer way is to standardize on holding BQL when calling
> that function.
> > If the answer is no, then vfio_load_cleanup_load_bufs_thread()'s current
> > version may not work either..
> 
> I think the reason for BQL is to serialize access to the QEMU internals
> which are not thread-safe.
> 
> So as long as these internals aren't touched when not holding BQL then
> we should be safe - I don't see any particular state that's cached
> around these BQL calls and would need explicit reloading after re-gaining
> it.

OK, I checked with misterious force and looks like it's ok.

Would you please rephrase the comment, though?  I want to make it crystal
clear that what we're looking for is not holding BQL.. Maybe something like
this:

    /*
     * The VFIO load_cleanup() implementation is BQL-sensitive. It requires
     * BQL must NOT be taken when recycling load threads, so that it won't
     * block the load threads from making progress on address space
     * modification operations.
     *
     * To make it work, we could try to not take BQL for all load_cleanup(),
     * or conditionally unlock BQL only if bql_locked() in VFIO.
     *
     * Since most existing call sites take BQL for load_cleanup(), make
     * it simple by taking BQL always as the rule, so that VFIO can unlock
     * BQL and retake unconditionally.
     */

We may also want to update the subject.  Currently:

  "migration: postcopy_ram_listen_thread() should take BQL for some calls"

It's not accurate anymore, it could be:

  "migration: Always take BQL for migration_incoming_state_destroy()"

If with all above, please feel free to take:

Acked-by: Peter Xu <pet...@redhat.com>

I'm OK if it'll be touched up when merge too.

Thanks,

-- 
Peter Xu


Reply via email to