Hello Gautham,

Thanks for your review.

Gautham R Shenoy <e...@linux.vnet.ibm.com> writes:

> Hello Thiago,
>
> On Fri, Feb 22, 2019 at 07:57:52PM -0300, Thiago Jung Bauermann wrote:
>> I see two cases that can be causing this race:
>>
>> 1. It's possible that CPU 134 was inactive at the time it was unplugged. In
>>    that case, dlpar_offline_cpu() calls H_PROD on that CPU and immediately
>>    calls pseries_cpu_die(). Meanwhile, the prodded CPU activates and start
>>    the process of stopping itself. It's possible that the busy loop is not
>>    long enough to allow for the CPU to wake up and complete the stopping
>>    process.
>
> The problem is a bit more severe since, after printing "Querying
> DEAD?" for CPU X, this CPU can prod another offline CPU Y on the same
> core which, on waking up, will call rtas_stop_self. Thus we can have two
> concurrent calls to rtas-stop-self, which is prohibited by the PAPR.

Inded, very good point. I added this information to the patch
description.

>> 2. If CPU 134 was online at the time it was unplugged, it would have gone
>>    through the new CPU hotplug state machine in kernel/cpu.c that was
>>    introduced in v4.6 to get itself stopped. It's possible that the busy
>>    loop in pseries_cpu_die() was long enough for the older hotplug code but
>>    not for the new hotplug state machine.
>
> I haven't been able to observe the "Querying DEAD?" messages for the
> online CPU which was being offlined and dlpar'ed out.

Ah, thanks for pointing this out. That was a scenario I thought could
happen when I was investigating this issue but I never confirmed whether
it could really happen. I removed it from the patch description.

>> I don't know if this race condition has any ill effects, but we can make
>> the race a lot more even if we only start querying if the CPU is stopped
>> when the stopping CPU is close to call rtas_stop_self().
>>
>> Since pseries_mach_cpu_die() sets the CPU current state to offline almost
>> immediately before calling rtas_stop_self(), we use that as a signal that
>> it is either already stopped or very close to that point, and we can start
>> the busy loop.
>>
>> As suggested by Michael Ellerman, this patch also changes the busy loop to
>> wait for a fixed amount of wall time.
>>
>> Signed-off-by: Thiago Jung Bauermann <bauer...@linux.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> I tried to estimate good amounts for the timeout and loop delays, but
>> I'm not sure how reasonable my numbers are. The busy loops will wait for
>> 100 µs between each try, and spin_event_timeout() will timeout after
>> 100 ms. I'll be happy to change these values if you have better
>> suggestions.
>
> Based on the measurements that I did on a POWER9 system, in successful
> cases of smp_query_cpu_stopped(cpu) returning affirmative, the maximum
> time spent inside the loop was was 10ms.

That's very good to know. I added this information to the patch
description.

I also added you in an Analyzed-by tag, I hope it's fine with you.

>> Gautham was able to test this patch and it solved the race condition.
>>
>> v1 was a cruder patch which just increased the number of loops:
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-February/153734.html
>>
>> v1 also mentioned a kernel crash but Gautham narrowed it down to a bug
>> in RTAS, which is in the process of being fixed.
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
>> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index 97feb6e79f1a..424146cc752e 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -214,13 +214,21 @@ static void pseries_cpu_die(unsigned int cpu)
>>                      msleep(1);
>>              }
>>      } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>> +            /*
>> +             * If the current state is not offline yet, it means that the
>> +             * dying CPU (which is in pseries_mach_cpu_die) didn't have a
>> +             * chance to call rtas_stop_self yet and therefore it's too
>> +             * early to query if the CPU is stopped.
>> +             */
>> +            spin_event_timeout(get_cpu_current_state(cpu) == 
>> CPU_STATE_OFFLINE,
>> +                               100000, 100);
>>
>>              for (tries = 0; tries < 25; tries++) {
>
> Can we bumped up the tries to 100, so that we wait for 10ms before
> printing the warning message ?

Good idea. I increased the loop to 200 iterations so that it can take up
to 20 ms, just to be sure.

>>                      cpu_status = smp_query_cpu_stopped(pcpu);
>>                      if (cpu_status == QCSS_STOPPED ||
>>                          cpu_status == QCSS_HARDWARE_ERROR)
>>                              break;
>> -                    cpu_relax();
>> +                    udelay(100);
>>              }
>>      }
>>


--
Thiago Jung Bauermann
IBM Linux Technology Center

Reply via email to