Thiago Jung Bauermann's on April 18, 2019 11:00 am: > > Hello Nick, > > Thank you very much for reviewing this patch! > > Nicholas Piggin <npig...@gmail.com> writes: > >> Thiago Jung Bauermann's on April 11, 2019 9:08 am: >>> >>> Thiago Jung Bauermann <bauer...@linux.ibm.com> writes: >>> >>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c >>>> b/arch/powerpc/platforms/pseries/hotplug-cpu.c >>>> index 97feb6e79f1a..ac6dc35ab829 100644 >>>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c >>>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c >>>> @@ -214,13 +214,22 @@ static void pseries_cpu_die(unsigned int cpu) >>>> msleep(1); >>>> } >>>> } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) { >>>> + /* >>>> + * If the current state is not offline yet, it means that the >>>> + * dying CPU (which is either in pseries_mach_cpu_die() or in >>>> + * the process of getting there) didn't have a chance yet to >>>> + * call rtas_stop_self() and therefore it's too early to query >>>> + * if the CPU is stopped. >>>> + */ >>>> + spin_event_timeout(get_cpu_current_state(cpu) == >>>> CPU_STATE_OFFLINE, >>>> + 100000, 100); >> >> If the CPU state does not go to offline here, you should give up and >> return online, right? Otherwise I think query-cpu-stopped-state can >> get confused by CPUs in idle and you get a false positive. > > Can it get confused? My impression from reading the definition for > query-cpu-stopped-state in the PAPR is that it will simply return a > CPU_status value of 2 in that case, meaning that "the processor thread > is not in the RTAS stopped state", but I don't know much about this.
In QEMU (non-KVM) mode, qcss I think may get confused between H_CEDE and rtas-stop-self. KVM mode may be okay because H_CEDE is handled in the kernel. >> That race can still happen, we would really need a sequence count check >> over current CPU state to ensure we got a race-free qcss value, but at >> least a check here should make the race implausible to hit. > > Actually, since rtas_stop_self() panics if the processor fails to stop > and also since callers of pseries_cpu_die()ยน already assume that it is > going to succeed in stopping the CPU (given that the function returns > void and can't signal an error), a more straightforward way of > eliminating the race is to simply do this: > > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c > b/arch/powerpc/platforms/pseries/hotplug-cpu.c > index 97feb6e79f1a..2331a609f48f 100644 > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c > @@ -215,7 +215,7 @@ static void pseries_cpu_die(unsigned int cpu) > } > } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) { > > - for (tries = 0; tries < 25; tries++) { > + while (true) { > cpu_status = smp_query_cpu_stopped(pcpu); > if (cpu_status == QCSS_STOPPED || > cpu_status == QCSS_HARDWARE_ERROR) > > > What do you think? Yeah I think that may be a good idea, just makes things much simpler. Thanks, Nick