On Tue, 21 Jan 2020 14:41:26 +1100 David Gibson <da...@gibson.dropbear.id.au> wrote:
> On Wed, Jan 15, 2020 at 07:10:47PM +0100, Cédric Le Goater wrote:
> > On 1/15/20 6:48 PM, Greg Kurz wrote:
> > > Migration can potentially race with CAS reboot. If the migration thread
> > > completes migration after CAS has set spapr->cas_reboot but before the
> > > mainloop could pick up the reset request and reset the machine, the
> > > guest is migrated unrebooted and the destination doesn't reboot it
> > > either because it isn't aware a CAS reboot was needed (eg, because a
> > > device was added before CAS). This likely result in a broken or hung
> > > guest.
> > >
> > > Even if it is small, the window between CAS and CAS reboot is enough to
> > > re-qualify spapr->cas_reboot as state that we should migrate. Add a new
> > > subsection for that and always send it when a CAS reboot is pending.
> > > This may cause migration to older QEMUs to fail but it is still better
> > > than end up with a broken guest.
> > >
> > > The destination cannot honour the CAS reboot request from a post load
> > > handler because this must be done after the guest is fully restored.
> > > It is thus done from a VM change state handler.
> > >
> > > Reported-by: Lukáš Doktor <ldok...@redhat.com>
> > > Signed-off-by: Greg Kurz <gr...@kaod.org>
> >
> > Cédric Le Goater <c...@kaod.org>
> >
> > Nice work ! That was quite complex to catch !
>
> It is a very nice analysis. However, I'm disinclined to merge this
> for the time being.
>
> My preferred approach would be to just eliminate CAS reboots
> altogether, since that has other benefits. I'm feeling like this
> isn't super-urgent, since CAS reboots are extremely rare in practice,
> now that we've eliminated the one for the irq switchover.
>

Yeah. The only _true_ need for CAS rebooting now seems to be hotplug
before CAS, which is likely not something frequent.

> However, if it's not looking like we'll be ready to do that as the
> qemu-5.0 release approaches, then I'll be more than willing to
> reconsider this.
>

I hope we can drop CAS reboot in time.