On Mon, Feb 03, 2025 at 10:41:43PM +0100, Maciej S. Szmigiero wrote:
> On 3.02.2025 21:36, Peter Xu wrote:
> > On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote:
> > > On 3.02.2025 20:58, Peter Xu wrote:
> > > > On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote:
> > > > > On 2.02.2025 13:45, Dr. David Alan Gilbert wrote:
> > > > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:
> > > > > > > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote:
> > > > > > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:
> > > > > > > > > From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
> > > > > > > > > 
> > > > > > > > > postcopy_ram_listen_thread() is a free running thread, so it 
> > > > > > > > > needs to
> > > > > > > > > take BQL around function calls to migration methods requiring 
> > > > > > > > > BQL.
> > > > > > > > > 
> > > > > > > > > qemu_loadvm_state_main() needs BQL held since it ultimately 
> > > > > > > > > calls
> > > > > > > > > "load_state" SaveVMHandlers.
> > > > > > > > > 
> > > > > > > > > migration_incoming_state_destroy() needs BQL held since it 
> > > > > > > > > ultimately calls
> > > > > > > > > "load_cleanup" SaveVMHandlers.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Maciej S. Szmigiero 
> > > > > > > > > <maciej.szmigi...@oracle.com>
> > > > > > > > > ---
> > > > > > > > >      migration/savevm.c | 4 ++++
> > > > > > > > >      1 file changed, 4 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > > > > > > index b0b74140daea..0ceea9638cc1 100644
> > > > > > > > > --- a/migration/savevm.c
> > > > > > > > > +++ b/migration/savevm.c
> > > > > > > > > @@ -2013,7 +2013,9 @@ static void 
> > > > > > > > > *postcopy_ram_listen_thread(void *opaque)
> > > > > > > > >           * in qemu_file, and thus we must be blocking now.
> > > > > > > > >           */
> > > > > > > > >          qemu_file_set_blocking(f, true);
> > > > > > > > > +    bql_lock();
> > > > > > > > >          load_res = qemu_loadvm_state_main(f, mis);
> > > > > > > > > +    bql_unlock();
> > > > > > > > 
> > > > > > > > Doesn't that leave that held for a heck of a long time?
> > > > > > > 
> > > > > > > Yes, and it effectively broke "postcopy recover" test but I
> > > > > > > think the reason for that is qemu_loadvm_state_main() and
> > > > > > > its children don't drop BQL while waiting for I/O.
> > > > > > > 
> > > > > > > I've described this case in more detail in my reply to Fabiano 
> > > > > > > here:
> > > > > > > https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/
> > > > > > 
> > > > > > While it might be the cause in this case, my feeling is it's more 
> > > > > > fundamental
> > > > > > here - it's the whole reason that postcopy has a separate ram listen
> > > > > > thread.  As the destination is running, after it loads it's devices
> > > > > > and as it starts up the destination will be still loading RAM
> > > > > > (and other postcopiable devices) potentially for quite a while.
> > > > > > Holding the bql around the ram listen thread means that the
> > > > > > execution of the destination won't be able to take that lock
> > > > > > until the postcopy load has finished; so while that might apparently
> > > > > > complete, it'll lead to the destination stalling until that's 
> > > > > > finished
> > > > > > which defeats the whole point of postcopy.
> > > > > > That last one probably won't fail a test but it will lead to a long 
> > > > > > stall
> > > > > > if you give it a nice big guest with lots of RAM that it's rapidly
> > > > > > changing.
> > > > > 
> > > > > Okay, I understand the postcopy case/flow now.
> > > > > Thanks for explaining it clearly.
> > > > > 
> > > > > > > I still think that "load_state" SaveVMHandlers need to be called
> > > > > > > with BQL held since implementations apparently expect it that way:
> > > > > > > for example, I think PCI device configuration restore calls
> > > > > > > address space manipulation methods which abort() if called
> > > > > > > without BQL held.
> > > > > > 
> > > > > > However, the only devices that *should* be arriving on the channel
> > > > > > that the postcopy_ram_listen_thread is reading from are those
> > > > > > that are postcopiable (i.e. RAM and hmm block's dirty_bitmap).
> > > > > > Those load handlers are safe to be run while the other devices
> > > > > > are being changed.   Note the *should* - you could add a check
> > > > > > to fail if any other device arrives on that channel.
> > > > > 
> > > > > I think ultimately there should be either an explicit check, or,
> > > > > as you suggest in the paragraph below, a separate SaveVMHandler
> > > > > that runs without BQL held.
> > > > 
> > > > To me those are bugs happening during postcopy, so those abort()s in
> > > > memory.c are indeed for catching these issues too.
> > > > 
> > > > > Since the current state of just running these SaveVMHandlers
> > > > > without BQL in this case and hoping that nothing breaks is
> > > > > clearly sub-optimal.
> > > > > 
> > > > > > > I have previously even submitted a patch to explicitly document
> > > > > > > "load_state" SaveVMHandler as requiring BQL (which was also
> > > > > > > included in the previous version of this patch set) and it
> > > > > > > received a "Reviewed-by:" tag:
> > > > > > > https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/
> > > > > > > https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/
> > > > > > > https://lore.kernel.org/qemu-devel/87o732bti7....@suse.de/
> > > > > > 
> > > > > > It happens!
> > > > > > You could make this safer by having a load_state and a 
> > > > > > load_state_postcopy
> > > > > > member, and only mark the load_state as requiring the lock.
> > > > > 
> > > > > To not digress too much from the subject of this patch set
> > > > > (multifd VFIO device state transfer) for now I've just updated the
> > > > > TODO comment around that qemu_loadvm_state_main(), so hopefully this
> > > > > discussion won't get forgotten:
> > > > > https://gitlab.com/maciejsszmigiero/qemu/-/commit/046e3deac5b1dbc406b3e9571f62468bd6743e79
> > > > 
> > > > The commit message may still need some touch ups, e.g.:
> > > > 
> > > >     postcopy_ram_listen_thread() is a free running thread, so it needs 
> > > > to
> > > >     take BQL around function calls to migration methods requiring BQL.
> > > > 
> > > > 
> > > > This sentence is still not correct, IMHO. As Dave explained, the ram 
> > > > load
> > > > thread is designed to run without BQL at least for the major workloads 
> > > > it
> > > > runs.
> > > 
> > > So what's your proposed wording of this commit then?
> > 
> > Perhaps dropping it? As either it implies qemu_loadvm_state_main() needs to
> > take bql (which could be wrong in case of postcopy at least from
> > design.. not sanity check pov), or it provides no real meaning to suggest
> > where to take it..
> > 
> > Personally I would put the comment as easy as possible - the large portion
> > isn't helping me to understand the code but only made it slightly more
> > confusing..
> > 
> >      /*
> >       * TODO: qemu_loadvm_state_main() could call "load_state" 
> > SaveVMHandlers
> >       * that are expecting BQL to be held, which isn't in this case.
> >       *
> >       * In principle, the only devices that should be arriving on this 
> > channel
> >       * now are those that are postcopiable and whose load handlers are safe
> >       * to be called without BQL being held.
> >       *
> >       * But nothing currently prevents the source from sending data for 
> > "unsafe"
> >       * devices which would cause trouble here.
> >       */
> > 
> > IMHO we could put it very simple if you think we need such sanity check
> > later:
> > 
> >      /* TODO: sanity check that only postcopiable data will be loaded here 
> > */
> 
> I think I will change that comment wording to the one ^^^^ you suggested 
> above,
> since we still need to have this commit to take BQL around that
> migration_incoming_state_destroy() call in postcopy_ram_listen_thread().
> 
> > > 
> > > > I don't worry on src sending something that crashes the dest: if that
> > > > happens, that's a bug, we need to fix it..  In that case abort() either 
> > > > in
> > > > memory.c or migration/ would be the same.
> > > 
> > > Yeah, but it would be a bug in the source (or just bit stream corruption 
> > > for
> > > any reason), yet it's the destination which would abort() or crash.
> > > 
> > > I think cases like that in principle should be handled more gracefully,
> > > like exiting the destination QEMU with an error.
> > > But that's something outside of the scope of this patch set.
> > 
> > Yes I agree.  It's just that postcopy normally cannot gracefully quits on
> > dest anyway.. as src QEMU cannot continue with a dead dest QEMU. For
> > obvious programming errors, I think abort() is still ok in this case, on
> > either src or dest if postcopy already started.
> > 
> > For this series, we could always stick with precopy, it could help converge
> > the series.
> 
> To be clear I'm messing with postcopy only because without adding that
> BQL lock around migration_incoming_state_destroy() in
> postcopy_ram_listen_thread() other changes in this patch set would break
> postcopy.
> And that's obviously not acceptable.

Ah, of course.

> 
> > > 
> > > > We could add some explicit check
> > > > in migration code, but I don't expect it to catch anything real, at 
> > > > least
> > > > such never happened since postcopy introduced.. so it's roughly 10 years
> > > > without anything like that happens.
> > > > 
> > > > Taking BQL for migration_incoming_state_destroy() looks all safe.  
> > > > There's
> > > > one qemu_ram_block_writeback() which made me a bit nervous initially, 
> > > > but
> > > > then looks like RAM backends should be almost noop (for shmem and
> > > > hugetlbfs) but except pmem.
> > > 
> > > That's the only part where taking BQL is actually necessary for the
> > > functionality of this patch set to work properly, so it's fine to leave
> > > that call to qemu_loadvm_state_main() as-is (without BQL) for time being.
> > > 
> > > > 
> > > > The other alternative is we define load_cleanup() to not rely on BQL 
> > > > (which
> > > > I believe is true before this series?), then take it only when VFIO's 
> > > > path
> > > > needs it.
> > > 
> > > I think other paths always call load_cleanup() with BQL so it's probably
> > > safer to have consistent semantics here.
> > 
> > IMHO we don't necessarily need to make it the default that vmstate handler
> > hooks will need BQL by default - we can always properly define them to best
> > suite our need.
> 
> But I think consistency is important - if other callers take BQL for
> load_cleanup() then it makes sense to take it in all places (only if to make
> the code simpler).

I assume current QEMU master branch doesn't need bql for all existing (only
RAM and VFIO..) load_cleanup(), or am I wrong?

Thanks,

-- 
Peter Xu


Reply via email to