Re: [PATCH] cpu: Add starts_halted() method

Thiago Jung Bauermann Fri, 10 Jul 2020 13:17:46 -0700


Alex Bennée <alex.ben...@linaro.org> writes:


> Thiago Jung Bauermann <bauer...@linux.ibm.com> writes:
>
>> Eduardo Habkost <ehabk...@redhat.com> writes:
>>
>>> On Wed, Jul 08, 2020 at 09:11:55PM +0100, Peter Maydell wrote:
>>>> On Wed, 8 Jul 2020 at 18:36, Eduardo Habkost <ehabk...@redhat.com> wrote:
>>>> >
>>>> > On Wed, Jul 08, 2020 at 06:09:49PM +0100, Peter Maydell wrote:
>>>> > > Exactly. It appears that there's a bug in our mechanisms,
>>>> > > which is why I'm suggesting that the right thing is
>>>> > > to fix that bug rather than marking the CPU as halted
>>>> > > earlier in the reset process so that the KVM_RUN happens
>>>> > > to do nothing...
>>>> >
>>>> > I agree this is necessary, but it doesn't seem sufficient.
>>>> >
>>>> > Having cpu_reset() set halted=0 on spapr (and probably other
>>>> > machines) is also a bug, as it could still trigger unwanted
>>>> > KVM_RUN when cpu_reset() returns (and before machine code sets
>>>> > halted=1).
>>>>
>>>> The Arm handling of starting-halted sets halted=1 within cpu_reset,
>>>> based on whether the CPU object was created with a
>>>> "start-powered-off" property.
>>>
>>> Making this mechanism generic sounds like a good idea.
>>
>> I'll take a stab at doing that and using it for the spapr machine.
>>
>>>> I'm not sure in practice that anything can get in asynchronously
>>>> and cause a KVM_RUN in between spapr_reset_vcpu() calling
>>>> cpu_reset() and it setting cs->halted (and the other stuff),
>>>> though. This function ought to be called with the iothread
>>>> lock held, so KVM_RUN will only happen if it calls some
>>>> other function which incorrectly lets the CPU run.
>>>
>>> Yeah, maybe it won't happen in practice.  It just seems fragile.
>>> The same way ppc_cpu_reset() kicked the CPU by accident, code
>>> outside cpu_reset() might one day kick the CPU by accident before
>>> setting halted=1.
>>
>> I'm seeing the vcpu being KVM_RUN'd too early twice during hotplug.
>> Both of them are before cpu_reset() and ppc_cpu_reset().
>>
>> Here's the backtrace for the first of them (redacted for clarity):
>>
>> #0  in cpu_resume ()
>> #1  in cpu_common_realizefn ()
>> #2  in ppc_cpu_realize ()
>> #3  in device_set_realized ()
>> #4  in property_set_bool ()
>> #5  in object_property_set ()
>> #6  in object_property_set_qobject ()
>> #7  in object_property_set_bool ()
>> #8  in qdev_realize ()
> <snip>
>> #18 in qmp_device_add ()
>
> Is this a hotplug event?

Yes, the way I reproduce the problem is starting a pseries guest with
`-smp 2,maxcpus=32,sockets=1,cores=16,threads=2` and then use qmp-shell to
send the command:

device_add id=device-2 driver=host-spapr-cpu-core core-id=2 node-id=0

>> Here's the second:
>>
>> #0  in qemu_cpu_kick_thread ()
>> #1  in qemu_cpu_kick ()
>> #2  in queue_work_on_cpu ()
>> #3  in async_run_on_cpu ()
>> #4  in tlb_flush_by_mmuidx ()
>> #5  in tlb_flush ()
>> #6  in ppc_tlb_invalidate_all ()
>
> FWIW tcg_flush_softmmu_tlb handles a tlb_flush in the common reset code.

Ok, maybe KVM should be doing that too? Or maybe it does but pseries
isn't relying on it. I'll dig further.

--
Thiago Jung Bauermann
IBM Linux Technology Center

Re: [PATCH] cpu: Add starts_halted() method

Reply via email to