Hi Michael,

Michael Ellerman <m...@ellerman.id.au> writes:
> Nathan Lynch <nath...@linux.ibm.com> writes:
>> The partition suspend sequence as specified in the platform
>> architecture requires that all active processor threads call
>> H_JOIN, which:
> ...
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c 
>> b/arch/powerpc/platforms/pseries/mobility.c
>> index 1b8ae221b98a..44ca7d4e143d 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -412,6 +414,128 @@ static int wait_for_vasi_session_suspending(u64 handle)
> ...
>
>> +
>> +static int do_join(void *arg)
>> +{
>> +    atomic_t *counter = arg;
>> +    long hvrc;
>> +    int ret;
>> +
>> +    /* Must ensure MSR.EE off for H_JOIN. */
>> +    hard_irq_disable();
>
> Didn't stop_machine() already do that for us?
>
> In the state machine in multi_cpu_stop().

Yes, but I didn't want to rely on something that seems like an
implementation detail of stop_machine(). I assumed it's benign and in
keeping with hard_irq_disable()'s intended semantics to make multiple
calls to it within a critical section.

I don't feel strongly about this though. If I remove this
hard_irq_disable() I'll modify the comment to indicate that this
function relies on stop_machine() to satisfy this condition for H_JOIN.


>> +    hvrc = plpar_hcall_norets(H_JOIN);
>> +
>> +    switch (hvrc) {
>> +    case H_CONTINUE:
>> +            /*
>> +             * All other CPUs are offline or in H_JOIN. This CPU
>> +             * attempts the suspend.
>> +             */
>> +            ret = do_suspend();
>> +            break;
>> +    case H_SUCCESS:
>> +            /*
>> +             * The suspend is complete and this cpu has received a
>> +             * prod.
>> +             */
>> +            ret = 0;
>> +            break;
>> +    case H_BAD_MODE:
>> +    case H_HARDWARE:
>> +    default:
>> +            ret = -EIO;
>> +            pr_err_ratelimited("H_JOIN error %ld on CPU %i\n",
>> +                               hvrc, smp_processor_id());
>> +            break;
>> +    }
>> +
>> +    if (atomic_inc_return(counter) == 1) {
>> +            pr_info("CPU %u waking all threads\n", smp_processor_id());
>> +            prod_others();
>> +    }
>
> Do we even need the counter? IIUC only one CPU receives H_CONTINUE. So
> couldn't we just have that CPU do the prodding of others?

CPUs may exit H_JOIN due to system reset interrupt at any time, and
H_JOIN may return H_HARDWARE to a caller after other CPUs have entered
the join state successfully. In these cases the counter ensures exactly
one thread performs the prod sequence.

>
>> +    /*
>> +     * Execution may have been suspended for several seconds, so
>> +     * reset the watchdog.
>> +     */
>> +    touch_nmi_watchdog();
>> +    return ret;
>> +}
>> +
>> +static int pseries_migrate_partition(u64 handle)
>> +{
>> +    atomic_t counter = ATOMIC_INIT(0);
>> +    int ret;
>> +
>> +    ret = wait_for_vasi_session_suspending(handle);
>> +    if (ret)
>> +            goto out;
>
> Direct return would be clearer IMHO.

OK, I can change this.

Reply via email to