On Mon, 10 Dec 2018 at 11:06, Alex Bennée <alex.ben...@linaro.org> wrote: > > > Peter Maydell <peter.mayd...@linaro.org> writes: > > We discussed this a little while back: > > https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg00154.html > > and Jaap reported a bug which I suspect of being the same thing: > > https://lists.gnu.org/archive/html/qemu-discuss/2018-10/msg00014.html > > > > Annoyingly I have lost the test case that demonstrated this > > race, but I analysed it at the time and this should definitely > > fix it. I have opted not to try to address any of the other > > possible cleanup here (eg vm_stop() has a potential similar > > race if called from a vcpu thread I suspect), since it gets > > pretty tangled. > > > > Jaap: could you test whether this patch fixes the issue you > > were seeing, please? > > --- > > cpus.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/cpus.c b/cpus.c > > index 0ddeeefc14f..b09b7027126 100644 > > --- a/cpus.c > > +++ b/cpus.c > > @@ -2100,7 +2100,8 @@ void qemu_init_vcpu(CPUState *cpu) > > void cpu_stop_current(void) > > { > > if (current_cpu) { > > - qemu_cpu_stop(current_cpu, true); > > + current_cpu->stop = true; > > + cpu_exit(current_cpu); > > Should the FIXME in vm_stop also be fixed? > > /* > * FIXME: should not return to device code in case > * vm_stop() has been requested. > */ > cpu_stop_current(); > return 0;
This is one of the things I had in mind with: > > I have opted not to try to address any of the other > > possible cleanup here (eg vm_stop() has a potential similar > > race if called from a vcpu thread I suspect), since it gets > > pretty tangled. though I might actually have meant pause_all_vcpus(). (For pause_all_vcpus() I think the correct thing is to fix the hw/i386/kvmvapic.c code to work in some other way, and then assert that pause_all_vcpus() is never called from the VCPU thread.) At any rate, this code is quite complicated, so I think it's worth just fixing this specific race without getting tangled up in everything else we could potentially refactor. I'm not even sure how we would arrange for vm_stop() to avoid returning to device emulation code if it has been called from there -- I would favour instead changing/defining the semantics to be like the shutdown-request and reset-request where the device code expects that control will return but the VM stop happens at the next opportunity, ie as soon as the execution of the insn which wrote to the device register has completed. thanks -- PMM