On Fri, 2019-08-02 at 19:29:24 UTC, Nathan Lynch wrote:
> The LPAR migration implementation and userspace-initiated cpu hotplug
> can interleave their executions like so:
> 
> 1. Set cpu 7 offline via sysfs.
> 
> 2. Begin a partition migration, whose implementation requires the OS
>    to ensure all present cpus are online; cpu 7 is onlined:
> 
>      rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
> 
>    This sets cpu 7 online in all respects except for the cpu's
>    corresponding struct device; dev->offline remains true.
> 
> 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
>    already online and returns success. The driver core (device_online)
>    sets dev->offline = false.
> 
> 4. The migration completes and restores cpu 7 to offline state:
> 
>      rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
> 
> This leaves cpu7 in a state where the driver core considers the cpu
> device online, but in all other respects it is offline and
> unused. Attempts to online the cpu via sysfs appear to succeed but the
> driver core actually does not pass the request to the lower-level
> cpuhp support code. This makes the cpu unusable until the cpu device
> is manually set offline and then online again via sysfs.
> 
> Instead of directly calling cpu_up/cpu_down, the migration code should
> use the higher-level device core APIs to maintain consistent state and
> serialize operations.
> 
> Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to 
> migration/hibernation")
> Signed-off-by: Nathan Lynch <nath...@linux.ibm.com>
> Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a6717c01ddc259f6f73364779df058e2c67309f8

cheers

Reply via email to