David Gibson <da...@gibson.dropbear.id.au> wrote:
> On Tue, Jul 20, 2021 at 07:30:16AM +0200, Markus Armbruster wrote:
>
> Right, it may well have been the first usage this way, this fwnmi
> stuff isn't super old.
>
>> >> While this isn't exactly terrible, it may be a weakness in our thinking
>> >> and our infrastructure.  I'm bringing it up so the people in charge are
>> >> aware :)
>> >
>> > Thanks.
>> >
>> > It almost feels like they need a way to temporarily hold off
>> > 'completion' of migration - i.e. the phase where we stop the CPU and
>> > write the device data; mind you you'd also probably want it to stop
>> > cold-migrates/snapshots?
>>
>> Yes, a proper way to delay 'completion' for a bit would be clearer, and
>> wouldn't let -only-migrate interfere.
>
> Right.  If that becomes a thing, we should use it here.  Note that
> this one use case probably isn't a very strong argument for it,
> though.  The only problem here is slightly less than optimal error
> reporting in a rare edge case (hardware fault occurs by chance at the
> same time as a migration).
>
>
> .... and, also, I half-suspect that the whole fwnmi feature exists
> more to tick IBM RAS check boxes than because anyone will actually use
> it.

Right now the problem is when we break the migration stream.  Sometimes
there is a field that is only needed on rare occasions (otherwise we
would have found it right away).  But the only thing that we can do now
is abort the migration.  If we were able to say "try it a little later",
we could fix that kind of trouble.

That is more or less what you have here.

Later, Juan.