On Thu, Jul 22, 2021 at 07:00:56PM +0100, Dr. David Alan Gilbert wrote:
> * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > On Tue, Jul 20, 2021 at 07:30:16AM +0200, Markus Armbruster wrote:
> > > "Dr. David Alan Gilbert" <dgilb...@redhat.com> writes:
> > >
> > > > * Markus Armbruster (arm...@redhat.com) wrote:
> > > >> We appear to use migration blockers in two ways:
> > > >>
> > > >> (1) Prevent migration for an indefinite time, typically due to use of
> > > >>     some feature that isn't compatible with migration.
> > > >>
> > > >> (2) Delay migration for a short time.
> > > >>
> > > >> Option -only-migrate is designed for (1).  It interferes with (2).
> > > >>
> > > >> Example for (1): device "x-pci-proxy-dev" doesn't support migration.  It
> > > >> adds a migration blocker on realize, and deletes it on unrealize.  With
> > > >> -only-migrate, device realize fails.  Works as designed.
> > > >>
> > > >> Example for (2): spapr_mce_req_event() makes an effort to prevent
> > > >> migration from degrading the reporting of FWNMIs.  It adds a migration
> > > >> blocker when it receives one, and deletes it when it's done handling it.
> > > >> This is a best effort; if migration is already in progress by the time
> > > >> FWNMI is received, we simply carry on, and that's okay.  However, option
> > > >> -only-migrate sabotages the best effort entirely.
> > > >
> > > > That's interesting; it's the first time I've heard of anyone using it as
> > > > 'best effort'.  I've always regarded blockers as blocking.
> > >
> > > Me too, until I found this one.
> >
> > Right, it may well have been the first usage this way, this fwnmi
> > stuff isn't super old.
> >
> > > >> While this isn't exactly terrible, it may be a weakness in our thinking
> > > >> and our infrastructure.  I'm bringing it up so the people in charge are
> > > >> aware :)
> > > >
> > > > Thanks.
> > > >
> > > > It almost feels like they need a way to temporarily hold off
> > > > 'completion' of migration - i.e. the phase where we stop the CPU and
> > > > write the device data; mind you you'd also probably want it to stop
> > > > cold-migrates/snapshots?
> > >
> > > Yes, a proper way to delay 'completion' for a bit would be clearer, and
> > > wouldn't let -only-migrate interfere.
> >
> > Right.  If that becomes a thing, we should use it here.  Note that
> > this one use case probably isn't a very strong argument for it,
> > though.  The only problem here is slightly less than optimal error
> > reporting in a rare edge case (hardware fault occurs by chance at the
> > same time as a migration).
>
> Can you at least put a scary comment in to say why it's so odd.
>
> If you wanted a choice of a different bad way to do this, since you have
> savevm_htab_handlers, you might be able to make htab_save_iterate claim
> there's always more to do.

That would only work if the hash MMU is in use, which won't be the
case with most current systems.

> > .... and, also, I half-suspect that the whole fwnmi feature exists
> > more to tick IBM RAS check boxes than because anyone will actually use
> > it.
>
> Ah at least it's always reliable....
>
> Dave
>

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
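
For anyone skimming the thread later, the two uses of blockers discussed above can be sketched as a toy model. This is only an illustration under stated assumptions: every name here (MigState, toy_add_blocker, toy_handle_event and so on) is hypothetical and is not QEMU's actual API, and the refusal rule (refuse when the -only-migrate style flag is set or a migration is already running) is a simplification of the behaviour described in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not QEMU's API. */
typedef struct {
    bool only_migratable;    /* models running with the -only-migrate option */
    bool migration_running;  /* a migration is already in progress */
    int blockers;            /* number of blockers currently held */
} MigState;

/* Try to add a blocker. Returns 0 on success, -1 if it is refused. */
static int toy_add_blocker(MigState *s)
{
    if (s->only_migratable || s->migration_running) {
        return -1;
    }
    s->blockers++;
    return 0;
}

/* Drop one previously added blocker. */
static void toy_del_blocker(MigState *s)
{
    if (s->blockers > 0) {
        s->blockers--;
    }
}

/* Migration may start only while no blocker is held. */
static bool toy_can_migrate(const MigState *s)
{
    return s->blockers == 0;
}

/*
 * Use (1): a device that cannot migrate. A refused blocker is a hard
 * error, so realize fails. Returns 0 on success, -1 on failure.
 */
static int toy_device_realize(MigState *s)
{
    return toy_add_blocker(s);
}

/*
 * Use (2): best-effort delay around a short event. A refused blocker is
 * ignored and the event is handled anyway. Returns true if migration was
 * actually held off while the event was handled.
 */
static bool toy_handle_event(MigState *s)
{
    bool held = (toy_add_blocker(s) == 0);

    /* ... handle the event here ... */

    if (held) {
        toy_del_blocker(s);
    }
    return held;
}
```

With only_migratable clear, toy_handle_event() holds migration off for the duration of the event and leaves no blocker behind. With it set, toy_device_realize() fails as designed, while toy_handle_event() still completes but never holds migration off, which is the "sabotaged best effort" case from the original mail.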