Re: [PATCH 13/29] powerpc/pseries/mobility: use stop_machine for join/suspend

Michael Ellerman Sat, 05 Dec 2020 03:03:16 -0800

Nathan Lynch <nath...@linux.ibm.com> writes:
> Hi Michael,
>
> Michael Ellerman <m...@ellerman.id.au> writes:
>> Nathan Lynch <nath...@linux.ibm.com> writes:
>>> The partition suspend sequence as specified in the platform
>>> architecture requires that all active processor threads call
>>> H_JOIN, which:
>> ...
>>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>>> index 1b8ae221b98a..44ca7d4e143d 100644
>>> --- a/arch/powerpc/platforms/pseries/mobility.c
>>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>>> @@ -412,6 +414,128 @@ static int wait_for_vasi_session_suspending(u64 handle)
>> ...
>>
>>> +
>>> +static int do_join(void *arg)
>>> +{
>>> +   atomic_t *counter = arg;
>>> +   long hvrc;
>>> +   int ret;
>>> +
>>> +   /* Must ensure MSR.EE off for H_JOIN. */
>>> +   hard_irq_disable();
>>
>> Didn't stop_machine() already do that for us?
>>
>> In the state machine in multi_cpu_stop().
>
> Yes, but I didn't want to rely on something that seems like an
> implementation detail of stop_machine(). I assumed it's benign and in
> keeping with hard_irq_disable()'s intended semantics to make multiple
> calls to it within a critical section.


OK. I think it's part of the contract of stop_machine() these days, but
you're right that hard_irq_disable() can be called multiple times, so we
may as well leave it there as insurance/documentation.

>>> +   hvrc = plpar_hcall_norets(H_JOIN);
>>> +
>>> +   switch (hvrc) {
>>> +   case H_CONTINUE:
>>> +           /*
>>> +            * All other CPUs are offline or in H_JOIN. This CPU
>>> +            * attempts the suspend.
>>> +            */
>>> +           ret = do_suspend();
>>> +           break;
>>> +   case H_SUCCESS:
>>> +           /*
>>> +            * The suspend is complete and this cpu has received a
>>> +            * prod.
>>> +            */
>>> +           ret = 0;
>>> +           break;
>>> +   case H_BAD_MODE:
>>> +   case H_HARDWARE:
>>> +   default:
>>> +           ret = -EIO;
>>> +           pr_err_ratelimited("H_JOIN error %ld on CPU %i\n",
>>> +                              hvrc, smp_processor_id());
>>> +           break;
>>> +   }
>>> +
>>> +   if (atomic_inc_return(counter) == 1) {
>>> +           pr_info("CPU %u waking all threads\n", smp_processor_id());
>>> +           prod_others();
>>> +   }
>>
>> Do we even need the counter? IIUC only one CPU receives H_CONTINUE. So
>> couldn't we just have that CPU do the prodding of others?
>
> CPUs may exit H_JOIN due to system reset interrupt at any time, and
> H_JOIN may return H_HARDWARE to a caller after other CPUs have entered
> the join state successfully. In these cases the counter ensures exactly
> one thread performs the prod sequence.

OK.
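
For illustration only, here is a userspace sketch of why the counter
covers cases that "whoever gets H_CONTINUE does the prod" would miss. It
is not the kernel code: the SIM_* codes, struct sim_result and
simulate_exits() are hypothetical names, and the CPUs leave the join one
after another in array order rather than concurrently. In the real
do_join() the exits can race, which is why the counter is an atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the H_JOIN return codes (not the real values). */
enum sim_hvrc { SIM_H_SUCCESS, SIM_H_CONTINUE, SIM_H_HARDWARE };

struct sim_result {
	int prods_with_counter;  /* prod sequences run under the counter scheme */
	int prods_with_continue; /* prod sequences run if only H_CONTINUE prods */
};

/*
 * rc[i] is what the i-th simulated CPU to leave H_JOIN got back.
 * Exits are serialized here; the atomic only mirrors the kernel pattern.
 */
static struct sim_result simulate_exits(const enum sim_hvrc *rc, size_t ncpus)
{
	struct sim_result res = { 0, 0 };
	atomic_int counter;
	size_t i;

	assert(rc != NULL || ncpus == 0);
	atomic_init(&counter, 0);

	for (i = 0; i < ncpus; i++) {
		/*
		 * atomic_fetch_add() returns the old value, so "old == 0"
		 * matches the patch's atomic_inc_return(counter) == 1:
		 * only the first CPU out of the join runs the prod sequence.
		 */
		if (atomic_fetch_add(&counter, 1) == 0)
			res.prods_with_counter++;

		/* Alternative scheme: only the H_CONTINUE CPU prods. */
		if (rc[i] == SIM_H_CONTINUE)
			res.prods_with_continue++;
	}
	return res;
}
```

Feeding it { SIM_H_HARDWARE, SIM_H_SUCCESS, SIM_H_SUCCESS } models the
failure case described above: no CPU ever sees H_CONTINUE, so the
H_CONTINUE-only scheme never prods, while the counter scheme still prods
exactly once.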

>>> +   /*
>>> +    * Execution may have been suspended for several seconds, so
>>> +    * reset the watchdog.
>>> +    */
>>> +   touch_nmi_watchdog();
>>> +   return ret;
>>> +}
>>> +
>>> +static int pseries_migrate_partition(u64 handle)
>>> +{
>>> +   atomic_t counter = ATOMIC_INIT(0);
>>> +   int ret;
>>> +
>>> +   ret = wait_for_vasi_session_suspending(handle);
>>> +   if (ret)
>>> +           goto out;
>>
>> Direct return would be clearer IMHO.
>
> OK, I can change this.

Thanks.
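
The direct-return shape suggested above could look roughly like the
following userspace sketch. The stub and migrate_sketch() are
hypothetical; the rest of pseries_migrate_partition() is not shown in the
quoted hunk and is deliberately left elided here.

```c
#include <errno.h>

/* Hypothetical stub standing in for wait_for_vasi_session_suspending(). */
static int stub_wait_for_session(unsigned long long handle)
{
	return handle == 0 ? -EINVAL : 0;
}

static int migrate_sketch(unsigned long long handle)
{
	int ret;

	ret = stub_wait_for_session(handle);
	if (ret)
		return ret; /* nothing to unwind yet, so no "goto out" needed */

	/* ... remaining steps elided ... */
	return 0;
}
```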

cheers

Re: [PATCH 13/29] powerpc/pseries/mobility: use stop_machine for join/suspend
