Re: [PATCH] cpu: Add starts_halted() method

Thiago Jung Bauermann Wed, 08 Jul 2020 20:06:16 -0700


Eduardo Habkost <ehabk...@redhat.com> writes:


> On Wed, Jul 08, 2020 at 09:11:55PM +0100, Peter Maydell wrote:
>> On Wed, 8 Jul 2020 at 18:36, Eduardo Habkost <ehabk...@redhat.com> wrote:
>> >
>> > On Wed, Jul 08, 2020 at 06:09:49PM +0100, Peter Maydell wrote:
>> > > Exactly. It appears that there's a bug in our mechanisms,
>> > > which is why I'm suggesting that the right thing is
>> > > to fix that bug rather than marking the CPU as halted
>> > > earlier in the reset process so that the KVM_RUN happens
>> > > to do nothing...
>> >
>> > I agree this is necessary, but it doesn't seem sufficient.
>> >
>> > Having cpu_reset() set halted=0 on spapr (and probably other
>> > machines) is also a bug, as it could still trigger unwanted
>> > KVM_RUN when cpu_reset() returns (and before machine code sets
>> > halted=1).
>>
>> The Arm handling of starting-halted sets halted=1 within cpu_reset,
>> based on whether the CPU object was created with a
>> "start-powered-off" property.
>
> Making this mechanism generic sounds like a good idea.

I'll take a stab at doing that and using it for the spapr machine.

>> I'm not sure in practice that anything can get in asynchronously
>> and cause a KVM_RUN in between spapr_reset_vcpu() calling
>> cpu_reset() and it setting cs->halted (and the other stuff),
>> though. This function ought to be called with the iothread
>> lock held, so KVM_RUN will only happen if it calls some
>> other function which incorrectly lets the CPU run.
>
> Yeah, maybe it won't happen in practice.  It just seems fragile.
> The same way ppc_cpu_reset() kicked the CPU by accident, code
> outside cpu_reset() might one day kick the CPU by accident before
> setting halted=1.

I'm seeing the vcpu being KVM_RUN'd too early twice during hotplug.
Both of them are before cpu_reset() and ppc_cpu_reset().

Here's the backtrace for the first of them (redacted for clarity):

#0  in cpu_resume ()
#1  in cpu_common_realizefn ()
#2  in ppc_cpu_realize ()
#3  in device_set_realized ()
#4  in property_set_bool ()
#5  in object_property_set ()
#6  in object_property_set_qobject ()
#7  in object_property_set_bool ()
#8  in qdev_realize ()
#9  in spapr_realize_vcpu ()
#10 in spapr_cpu_core_realize ()
#11 in device_set_realized ()
#12 in property_set_bool ()
#13 in object_property_set ()
#14 in object_property_set_qobject ()
#15 in object_property_set_bool ()
#16 in qdev_realize ()
#17 in qdev_device_add ()
#18 in qmp_device_add ()

Here's the second:

#0  in qemu_cpu_kick_thread ()
#1  in qemu_cpu_kick ()
#2  in queue_work_on_cpu ()
#3  in async_run_on_cpu ()
#4  in tlb_flush_by_mmuidx ()
#5  in tlb_flush ()
#6  in ppc_tlb_invalidate_all ()
#7  in ppc_cpu_reset ()
#8  in device_transitional_reset ()
#9  in resettable_phase_hold ()
#10 in resettable_assert_reset ()
#11 in device_set_realized ()
#12 in property_set_bool ()
#13 in object_property_set ()
#14 in object_property_set_qobject ()
#15 in object_property_set_bool ()
#16 in qdev_realize ()
#17 in spapr_realize_vcpu ()
#18 in spapr_cpu_core_realize ()
#19 in device_set_realized ()
#20 in property_set_bool ()
#21 in object_property_set ()
#22 in object_property_set_qobject ()
#23 in object_property_set_bool ()
#24 in qdev_realize ()
#25 in qdev_device_add ()
#26 in qmp_device_add ()

Looking closely, both of them ultimately stem from the
qdev_realize(DEVICE(cpu), ...) call in spapr_realize_vcpu(). Is there
something wrong with that? I don't know anything about the QEMU device
model to be able to tell.

One other way I found to avoid the spurious KVM_RUN calls is to remove
the cpu_resume() call in cpu_common_realizefn(), which to me seems to
be placed way too early in the CPU hotplug path. Simply removing it
makes CPU hotplug stop working though. :-)  I still have to see if I can
find a better place for it...

--
Thiago Jung Bauermann
IBM Linux Technology Center

Re: [PATCH] cpu: Add starts_halted() method

Reply via email to