Nathan Lynch <nath...@linux.ibm.com> writes:
> Michael Ellerman <m...@ellerman.id.au> writes:
>> Daniel Henrique Barboza <danielhb...@gmail.com> writes:
>>> This is enough to say that we can't easily see the history behind this 
>>> comment.
>>> I also believe that we're better of without it since it doesn't make sense
>>> with the current codebase.
>>
>> It was added by the original CPU hotplug commit for ppc64::
>>
>> https://github.com/mpe/linux-fullhistory/commit/0e9fd9441cd2113b67b14e739267c9e69761489b
>>
>>
>> The code was fairly similar:
>>
>> void __cpu_die(unsigned int cpu)
>> {
>>      int tries;
>>      int cpu_status;
>>      unsigned int pcpu = get_hard_smp_processor_id(cpu);
>>
>>      for (tries = 0; tries < 5; tries++) {
>>              cpu_status = query_cpu_stopped(pcpu);
>>
>>              if (cpu_status == 0)
>>                      break;
>>              set_current_state(TASK_UNINTERRUPTIBLE);
>>              schedule_timeout(HZ);
>>      }
>>      if (cpu_status != 0) {
>>              printk("Querying DEAD? cpu %i (%i) shows %i\n",
>>                     cpu, pcpu, cpu_status);
>>      }
>>
>>      /* Isolation and deallocation are definatly done by
>>       * drslot_chrp_cpu.  If they were not they would be
>>       * done here.  Change isolate state to Isolate and
>>       * change allocation-state to Unusable.
>>       */
>>      paca[cpu].xProcStart = 0;
>>
>>      /* So we can recognize if it fails to come up next time. */
>>      cpu_callin_map[cpu] = 0;
>> }
>>
>>
>> drslot_chrp_cpu() still exists in drmgr:
>>
>>   
>> https://github.com/ibm-power-utilities/powerpc-utils/blob/e798c4a09fbf0fa0f421e624cfa366a6c405c9fe/src/drmgr/drslot_chrp_cpu.c#L406
>>
>>
>> I agree the comment is no longer meaningful and can be removed.
>
> Thanks for providing this background.
>
>> It might be good to then add a comment explaining why we need to set
>> cpu_start = 0.
>
> Sure, I can take that as a follow-up. Or perhaps it should be moved to
> the online path.

Yeah possibly.

>> It's not immediately clear why we need to. When we bring a CPU back
>> online in smp_pSeries_kick_cpu() we ask RTAS to start it and then
>> immediately set cpu_start = 1, ie. there isn't a separate step that sets
>> cpu_start = 1 for hotplugged CPUs.
>
> Hmm I'm not following the distinction you seem to be drawing between
> bringing a CPU back online and a hotplugged CPU. kick_cpu is used in all
> cases AFAIK.

Yeah that wasn't very clear, reading it back I have half confused myself.

At boot we need the paca->cpu_start flag because some CPUs can be
spinning in generic_secondary_common_init, in head_64.S.

ie. they're not in RTAS, they're spinning in kernel code, and the only
thing that stops them coming "online" in the Linux sense is
paca->cpu_start.

You can see that in pseries/smp.c:

static inline int smp_startup_cpu(unsigned int lcpu)
{
        ...
        if (cpumask_test_cpu(lcpu, of_spin_mask))
                /* Already started by OF and sitting in spin loop */
                return 1;


We also hit that case when kexec'ing, where all the secondaries come in
that way.


On the other hand when we offline a CPU, we set paca->cpu_start = 0, in
pseries_cpu_die(), and then we return the CPU to RTAS.

The only way it *should* come back online is via smp_pSeries_kick_cpu(),
which calls smp_startup_cpu() to bring the CPU out of RTAS, and then
smp_pSeries_kick_cpu() immediately sets cpu_start = 1.

So the sequence is:

        CPU goes offline from Linux POV
        paca->cpu_start = 0;
        CPU offline in RTAS
        ...
        CPU brought out of RTAS
        paca->cpu_start = 1;
        CPU comes back online from Linux POV


But I guess I kind of answered my own question above, where I said
*should*. Clearing paca->cpu_start when we offline the CPU gives us a
little bit of backup if the CPU comes out of RTAS unexpectedly. ie. it
would then spin on paca->cpu_start, rather than spontaneously coming
back into Linux when we weren't expecting it.

cheers

Reply via email to