Hello Gautham,
Thanks for your review.

Gautham R Shenoy <e...@linux.vnet.ibm.com> writes:

> Hello Thiago,
>
> On Fri, Feb 22, 2019 at 07:57:52PM -0300, Thiago Jung Bauermann wrote:
>> I see two cases that can be causing this race:
>>
>> 1. It's possible that CPU 134 was inactive at the time it was unplugged. In
>>    that case, dlpar_offline_cpu() calls H_PROD on that CPU and immediately
>>    calls pseries_cpu_die(). Meanwhile, the prodded CPU activates and start
>>    the process of stopping itself. It's possible that the busy loop is not
>>    long enough to allow for the CPU to wake up and complete the stopping
>>    process.
>
> The problem is a bit more severe since, after printing "Querying
> DEAD?" for CPU X, this CPU can prod another offline CPU Y on the same
> core which, on waking up, will call rtas_stop_self. Thus we can have two
> concurrent calls to rtas-stop-self, which is prohibited by the PAPR.

Indeed, very good point. I added this information to the patch
description.

>> 2. If CPU 134 was online at the time it was unplugged, it would have gone
>>    through the new CPU hotplug state machine in kernel/cpu.c that was
>>    introduced in v4.6 to get itself stopped. It's possible that the busy
>>    loop in pseries_cpu_die() was long enough for the older hotplug code but
>>    not for the new hotplug state machine.
>
> I haven't been able to observe the "Querying DEAD?" messages for the
> online CPU which was being offlined and dlpar'ed out.

Ah, thanks for pointing this out. That was a scenario I thought could
happen when I was investigating this issue, but I never confirmed whether
it could really happen. I removed it from the patch description.

>> I don't know if this race condition has any ill effects, but we can make
>> the race a lot more even if we only start querying if the CPU is stopped
>> when the stopping CPU is close to call rtas_stop_self().
>>
>> Since pseries_mach_cpu_die() sets the CPU current state to offline almost
>> immediately before calling rtas_stop_self(), we use that as a signal that
>> it is either already stopped or very close to that point, and we can start
>> the busy loop.
>>
>> As suggested by Michael Ellerman, this patch also changes the busy loop to
>> wait for a fixed amount of wall time.
>>
>> Signed-off-by: Thiago Jung Bauermann <bauer...@linux.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> I tried to estimate good amounts for the timeout and loop delays, but
>> I'm not sure how reasonable my numbers are. The busy loops will wait for
>> 100 µs between each try, and spin_event_timeout() will timeout after
>> 100 ms. I'll be happy to change these values if you have better
>> suggestions.
>
> Based on the measurements that I did on a POWER9 system, in successful
> cases of smp_query_cpu_stopped(cpu) returning affirmative, the maximum
> time spent inside the loop was 10ms.

That's very good to know. I added this information to the patch
description. I also added you in an Analyzed-by tag; I hope that's fine
with you.

>> Gautham was able to test this patch and it solved the race condition.
>>
>> v1 was a cruder patch which just increased the number of loops:
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-February/153734.html
>>
>> v1 also mentioned a kernel crash but Gautham narrowed it down to a bug
>> in RTAS, which is in the process of being fixed.
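For anyone reading along, the ordering on the dying CPU's side which the
spin_event_timeout() in the hunk below relies on is roughly this -- a
simplified sketch of the tail of pseries_mach_cpu_die(); the setter
helper's name is written from memory and the surrounding details are
omitted:

	/* Dying CPU, near the end of pseries_mach_cpu_die() (simplified) */
	set_cpu_current_state(cpu, CPU_STATE_OFFLINE);	/* the state the new code polls for */
	rtas_stop_self();				/* CPU is stopped by RTAS, does not return */

So once get_cpu_current_state(cpu) reads CPU_STATE_OFFLINE, the dying CPU
has either already been stopped or is about to call rtas_stop_self(),
which is the point where polling smp_query_cpu_stopped() starts to make
sense.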
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index 97feb6e79f1a..424146cc752e 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -214,13 +214,21 @@ static void pseries_cpu_die(unsigned int cpu)
>>  			msleep(1);
>>  		}
>>  	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>> +		/*
>> +		 * If the current state is not offline yet, it means that the
>> +		 * dying CPU (which is in pseries_mach_cpu_die) didn't have a
>> +		 * chance to call rtas_stop_self yet and therefore it's too
>> +		 * early to query if the CPU is stopped.
>> +		 */
>> +		spin_event_timeout(get_cpu_current_state(cpu) == CPU_STATE_OFFLINE,
>> +				   100000, 100);
>>
>>  		for (tries = 0; tries < 25; tries++) {
>
> Can we bump up the tries to 100, so that we wait for 10ms before
> printing the warning message ?

Good idea. I increased the loop to 200 iterations so that it can take up
to 20 ms, just to be sure.

>>  			cpu_status = smp_query_cpu_stopped(pcpu);
>>  			if (cpu_status == QCSS_STOPPED ||
>>  			    cpu_status == QCSS_HARDWARE_ERROR)
>>  				break;
>> -			cpu_relax();
>> +			udelay(100);
>>  		}
>>  	}

-- 
Thiago Jung Bauermann
IBM Linux Technology Center