> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, August 14, 2018 9:07 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>; Dr. David Alan Gilbert
> <dgilb...@redhat.com>
> Cc: qemu-devel@nongnu.org; quint...@redhat.com; wangxin (U)
> <wangxinxin.w...@huawei.com>
> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> 
> On 14/08/2018 15:03, Zhoujian (jay) wrote:
> >> -----Original Message-----
> >> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> >> Sent: Tuesday, August 14, 2018 8:02 PM
> >> To: Dr. David Alan Gilbert <dgilb...@redhat.com>
> >> Cc: Zhoujian (jay) <jianjay.z...@huawei.com>; qemu-devel@nongnu.org;
> >> quint...@redhat.com; wangxin (U) <wangxinxin.w...@huawei.com>
> >> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> >>
> >> On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
> >>>  a) Should the watchdog expire when the VM is stopped; I think it
> >>> shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so
> >>> is the bug here that the watchdog being used didn't use a virtual timer?
> >>
> >> All watchdogs do.
> >>
> >>>  b) If the watchdog expires just before the VM gets stopped, is
> >>> there a race which could hit this?  Possibly.
> >>
> >> Yes, I think it is a race that happens just before vm_stop, but I
> >> don't understand why the "qemu_clock_enable" in pause_all_vcpus does not
> prevent it.
> >
> > Hi Paolo,
> > The sequence is like this I think
> >
> >          |
> >          |  <-----  watchdog expired, which set reset_requested to
> SHUTDOWN_CAUSE_GUEST_RESET
> >          |
> >          |  <-----  migration thread sets to RUN_STATE_FINISH_MIGRATE, it
> will disable QEMU_CLOCK_VIRTUAL clock,
> >          |          but it is done after the setting of reset_requested
> 
> So the fix would be to process the reset request here?  (In do_vm_stop or
> pause_all_vcpus).  The code is currently in main_loop_should_exit().

After a second thought, I think it should keep the reset request process
in main_loop_should_exit(), since pause_all_vcpus(or do_vm_stop) is not in
a loop, it can't detect all the reset requests immediately.
If processing the reset request both in main_loop_should_exit() and do_vm_stop
or pause_all_vcpus will lead to a race of referencing to the global variable
'reset_requested'.
Could we add the check !runstate_check(RUN_STATE_FINISH_MIGRATE) before setting
to RUN_STATE_PRELAUNCH, just like !runstate_check(RUN_STATE_RUNNING) and
!runstate_check(RUN_STATE_INMIGRATE) did? But I'm not sure whether this will
cause any side effect.

Regards,
Jay Zhou

> 
> Paolo
> 
> >          |  <-----  main loop thread sets to RUN_STATE_PRELAUNCH since it
> detected a reset request
> >          |
> >          |  <-----  migration thread sets to RUN_STATE_POSTMIGRATE
> >
> >
> > Regards,
> > Jay Zhou
> >
> >>
> >> It should be possible to write a deterministic testcase with qtest...
> >>
> >> Paolo

Reply via email to