On Fri, 2019-08-02 at 19:29:24 UTC, Nathan Lynch wrote:
> The LPAR migration implementation and userspace-initiated cpu hotplug
> can interleave their executions like so:
>
> 1. Set cpu 7 offline via sysfs.
>
> 2. Begin a partition migration, whose implementation requires the OS
>    to ensure all present cpus are online; cpu 7 is onlined:
>
>      rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
>
>    This sets cpu 7 online in all respects except for the cpu's
>    corresponding struct device; dev->offline remains true.
>
> 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
>    already online and returns success. The driver core (device_online)
>    sets dev->offline = false.
>
> 4. The migration completes and restores cpu 7 to offline state:
>
>      rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
>
> This leaves cpu7 in a state where the driver core considers the cpu
> device online, but in all other respects it is offline and
> unused. Attempts to online the cpu via sysfs appear to succeed but the
> driver core actually does not pass the request to the lower-level
> cpuhp support code. This makes the cpu unusable until the cpu device
> is manually set offline and then online again via sysfs.
>
> Instead of directly calling cpu_up/cpu_down, the migration code should
> use the higher-level device core APIs to maintain consistent state and
> serialize operations.
>
> Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to
> migration/hibernation")
> Signed-off-by: Nathan Lynch <nath...@linux.ibm.com>
> Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
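
For readers following along, the interleaving in steps 1-4 can be replayed with a small userspace toy model. This is only a sketch and not kernel code: struct toy_cpu and every toy_* and replay_* name are hypothetical stand-ins, and the early-return behaviour of the toy device helpers is an assumed simplification of how device_online/device_offline consult dev->offline, as described in the quoted text.

```c
#include <stdbool.h>

/* Toy model of one cpu: two independently tracked pieces of state. */
struct toy_cpu {
	bool dev_offline;	/* driver core's view (stands in for dev->offline) */
	bool hw_online;		/* cpuhp's view (stands in for cpu_online(cpu)) */
};

/* Stand-ins for cpu_up/cpu_down: they touch only the cpuhp state. */
static void toy_cpu_up(struct toy_cpu *c)   { c->hw_online = true; }
static void toy_cpu_down(struct toy_cpu *c) { c->hw_online = false; }

/*
 * Stand-in for device_online: acts only when the driver core believes
 * the device is offline. Returns 1 when there was nothing to do.
 */
static int toy_device_online(struct toy_cpu *c)
{
	if (!c->dev_offline)
		return 1;
	toy_cpu_up(c);		/* succeeds even if the cpu is already up */
	c->dev_offline = false;
	return 0;
}

/* Stand-in for device_offline, mirroring toy_device_online. */
static int toy_device_offline(struct toy_cpu *c)
{
	if (c->dev_offline)
		return 1;
	toy_cpu_down(c);
	c->dev_offline = true;
	return 0;
}

/* Steps 1-4 with the migration path calling cpu_up/cpu_down directly. */
static struct toy_cpu replay_direct(void)
{
	struct toy_cpu c = { .dev_offline = false, .hw_online = true };

	toy_device_offline(&c);	/* 1. sysfs offline */
	toy_cpu_up(&c);		/* 2. migration onlines the cpu directly */
	toy_device_online(&c);	/* 3. sysfs online clears dev_offline */
	toy_cpu_down(&c);	/* 4. migration offlines the cpu directly */
	return c;		/* the two views now disagree */
}

/* Same steps with the migration path going through the device helpers. */
static struct toy_cpu replay_device_api(void)
{
	struct toy_cpu c = { .dev_offline = false, .hw_online = true };
	bool onlined_by_migration;

	toy_device_offline(&c);					/* 1. */
	onlined_by_migration = (toy_device_online(&c) == 0);	/* 2. */
	toy_device_online(&c);					/* 3. no-op */
	if (onlined_by_migration)
		toy_device_offline(&c);				/* 4. */
	return c;		/* both views agree: offline */
}
```

In this model, replay_direct() ends with dev_offline false and hw_online false, so a later toy_device_online() returns early without bringing the cpu up, which matches the stuck state described in the quoted text. replay_device_api() keeps the two fields in step, so a later online request takes effect. The serialization against concurrent sysfs requests that the quoted text also mentions is not modelled here.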

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a6717c01ddc259f6f73364779df058e2c67309f8

cheers