On Mon, Feb 03, 2025 at 10:41:43PM +0100, Maciej S. Szmigiero wrote: > On 3.02.2025 21:36, Peter Xu wrote: > > On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote: > > > On 3.02.2025 20:58, Peter Xu wrote: > > > > On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote: > > > > > On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: > > > > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote: > > > > > > > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: > > > > > > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote: > > > > > > > > > From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com> > > > > > > > > > > > > > > > > > > postcopy_ram_listen_thread() is a free running thread, so it > > > > > > > > > needs to > > > > > > > > > take BQL around function calls to migration methods requiring > > > > > > > > > BQL. > > > > > > > > > > > > > > > > > > qemu_loadvm_state_main() needs BQL held since it ultimately > > > > > > > > > calls > > > > > > > > > "load_state" SaveVMHandlers. > > > > > > > > > > > > > > > > > > migration_incoming_state_destroy() needs BQL held since it > > > > > > > > > ultimately calls > > > > > > > > > "load_cleanup" SaveVMHandlers. > > > > > > > > > > > > > > > > > > Signed-off-by: Maciej S. Szmigiero > > > > > > > > > <maciej.szmigi...@oracle.com> > > > > > > > > > --- > > > > > > > > > migration/savevm.c | 4 ++++ > > > > > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > > > > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c > > > > > > > > > index b0b74140daea..0ceea9638cc1 100644 > > > > > > > > > --- a/migration/savevm.c > > > > > > > > > +++ b/migration/savevm.c > > > > > > > > > @@ -2013,7 +2013,9 @@ static void > > > > > > > > > *postcopy_ram_listen_thread(void *opaque) > > > > > > > > > * in qemu_file, and thus we must be blocking now. > > > > > > > > > */ > > > > > > > > > qemu_file_set_blocking(f, true); > > > > > > > > > + bql_lock(); > > > > > > > > > load_res = qemu_loadvm_state_main(f, mis); > > > > > > > > > + bql_unlock(); > > > > > > > > > > > > > > > > Doesn't that leave that held for a heck of a long time? > > > > > > > > > > > > > > Yes, and it effectively broke "postcopy recover" test but I > > > > > > > think the reason for that is qemu_loadvm_state_main() and > > > > > > > its children don't drop BQL while waiting for I/O. > > > > > > > > > > > > > > I've described this case in more detail in my reply to Fabiano > > > > > > > here: > > > > > > > https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/ > > > > > > > > > > > > While it might be the cause in this case, my feeling is it's more > > > > > > fundamental > > > > > > here - it's the whole reason that postcopy has a separate ram listen > > > > > > thread. As the destination is running, after it loads it's devices > > > > > > and as it starts up the destination will be still loading RAM > > > > > > (and other postcopiable devices) potentially for quite a while. > > > > > > Holding the bql around the ram listen thread means that the > > > > > > execution of the destination won't be able to take that lock > > > > > > until the postcopy load has finished; so while that might apparently > > > > > > complete, it'll lead to the destination stalling until that's > > > > > > finished > > > > > > which defeats the whole point of postcopy. > > > > > > That last one probably won't fail a test but it will lead to a long > > > > > > stall > > > > > > if you give it a nice big guest with lots of RAM that it's rapidly > > > > > > changing. > > > > > > > > > > Okay, I understand the postcopy case/flow now. > > > > > Thanks for explaining it clearly. > > > > > > > > > > > > I still think that "load_state" SaveVMHandlers need to be called > > > > > > > with BQL held since implementations apparently expect it that way: > > > > > > > for example, I think PCI device configuration restore calls > > > > > > > address space manipulation methods which abort() if called > > > > > > > without BQL held. > > > > > > > > > > > > However, the only devices that *should* be arriving on the channel > > > > > > that the postcopy_ram_listen_thread is reading from are those > > > > > > that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). > > > > > > Those load handlers are safe to be run while the other devices > > > > > > are being changed. Note the *should* - you could add a check > > > > > > to fail if any other device arrives on that channel. > > > > > > > > > > I think ultimately there should be either an explicit check, or, > > > > > as you suggest in the paragraph below, a separate SaveVMHandler > > > > > that runs without BQL held. > > > > > > > > To me those are bugs happening during postcopy, so those abort()s in > > > > memory.c are indeed for catching these issues too. > > > > > > > > > Since the current state of just running these SaveVMHandlers > > > > > without BQL in this case and hoping that nothing breaks is > > > > > clearly sub-optimal. > > > > > > > > > > > > I have previously even submitted a patch to explicitly document > > > > > > > "load_state" SaveVMHandler as requiring BQL (which was also > > > > > > > included in the previous version of this patch set) and it > > > > > > > received a "Reviewed-by:" tag: > > > > > > > https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/ > > > > > > > https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/ > > > > > > > https://lore.kernel.org/qemu-devel/87o732bti7....@suse.de/ > > > > > > > > > > > > It happens! > > > > > > You could make this safer by having a load_state and a > > > > > > load_state_postcopy > > > > > > member, and only mark the load_state as requiring the lock. > > > > > > > > > > To not digress too much from the subject of this patch set > > > > > (multifd VFIO device state transfer) for now I've just updated the > > > > > TODO comment around that qemu_loadvm_state_main(), so hopefully this > > > > > discussion won't get forgotten: > > > > > https://gitlab.com/maciejsszmigiero/qemu/-/commit/046e3deac5b1dbc406b3e9571f62468bd6743e79 > > > > > > > > The commit message may still need some touch ups, e.g.: > > > > > > > > postcopy_ram_listen_thread() is a free running thread, so it needs > > > > to > > > > take BQL around function calls to migration methods requiring BQL. > > > > > > > > > > > > This sentence is still not correct, IMHO. As Dave explained, the ram > > > > load > > > > thread is designed to run without BQL at least for the major workloads > > > > it > > > > runs. > > > > > > So what's your proposed wording of this commit then? > > > > Perhaps dropping it? As either it implies qemu_loadvm_state_main() needs to > > take bql (which could be wrong in case of postcopy at least from > > design.. not sanity check pov), or it provides no real meaning to suggest > > where to take it.. > > > > Personally I would put the comment as easy as possible - the large portion > > isn't helping me to understand the code but only made it slightly more > > confusing.. > > > > /* > > * TODO: qemu_loadvm_state_main() could call "load_state" > > SaveVMHandlers > > * that are expecting BQL to be held, which isn't in this case. > > * > > * In principle, the only devices that should be arriving on this > > channel > > * now are those that are postcopiable and whose load handlers are safe > > * to be called without BQL being held. > > * > > * But nothing currently prevents the source from sending data for > > "unsafe" > > * devices which would cause trouble here. > > */ > > > > IMHO we could put it very simple if you think we need such sanity check > > later: > > > > /* TODO: sanity check that only postcopiable data will be loaded here > > */ > > I think I will change that comment wording to the one ^^^^ you suggested > above, > since we still need to have this commit to take BQL around that > migration_incoming_state_destroy() call in postcopy_ram_listen_thread(). > > > > > > > > I don't worry on src sending something that crashes the dest: if that > > > > happens, that's a bug, we need to fix it.. In that case abort() either > > > > in > > > > memory.c or migration/ would be the same. > > > > > > Yeah, but it would be a bug in the source (or just bit stream corruption > > > for > > > any reason), yet it's the destination which would abort() or crash. > > > > > > I think cases like that in principle should be handled more gracefully, > > > like exiting the destination QEMU with an error. > > > But that's something outside of the scope of this patch set. > > > > Yes I agree. It's just that postcopy normally cannot gracefully quits on > > dest anyway.. as src QEMU cannot continue with a dead dest QEMU. For > > obvious programming errors, I think abort() is still ok in this case, on > > either src or dest if postcopy already started. > > > > For this series, we could always stick with precopy, it could help converge > > the series. > > To be clear I'm messing with postcopy only because without adding that > BQL lock around migration_incoming_state_destroy() in > postcopy_ram_listen_thread() other changes in this patch set would break > postcopy. > And that's obviously not acceptable.
Ah, of course. > > > > > > > > We could add some explicit check > > > > in migration code, but I don't expect it to catch anything real, at > > > > least > > > > such never happened since postcopy introduced.. so it's roughly 10 years > > > > without anything like that happens. > > > > > > > > Taking BQL for migration_incoming_state_destroy() looks all safe. > > > > There's > > > > one qemu_ram_block_writeback() which made me a bit nervous initially, > > > > but > > > > then looks like RAM backends should be almost noop (for shmem and > > > > hugetlbfs) but except pmem. > > > > > > That's the only part where taking BQL is actually necessary for the > > > functionality of this patch set to work properly, so it's fine to leave > > > that call to qemu_loadvm_state_main() as-is (without BQL) for time being. > > > > > > > > > > > The other alternative is we define load_cleanup() to not rely on BQL > > > > (which > > > > I believe is true before this series?), then take it only when VFIO's > > > > path > > > > needs it. > > > > > > I think other paths always call load_cleanup() with BQL so it's probably > > > safer to have consistent semantics here. > > > > IMHO we don't necessarily need to make it the default that vmstate handler > > hooks will need BQL by default - we can always properly define them to best > > suite our need. > > But I think consistency is important - if other callers take BQL for > load_cleanup() then it makes sense to take it in all places (only if to make > the code simpler). I assume current QEMU master branch doesn't need bql for all existing (only RAM and VFIO..) load_cleanup(), or am I wrong? Thanks, -- Peter Xu