Hello Nathan, On Sat, Aug 3, 2019 at 1:06 AM Nathan Lynch <nath...@linux.ibm.com> wrote: > > The LPAR migration implementation and userspace-initiated cpu hotplug > can interleave their executions like so: > > 1. Set cpu 7 offline via sysfs. > > 2. Begin a partition migration, whose implementation requires the OS > to ensure all present cpus are online; cpu 7 is onlined: > > rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up > > This sets cpu 7 online in all respects except for the cpu's > corresponding struct device; dev->offline remains true. > > 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is > already online and returns success. The driver core (device_online) > sets dev->offline = false. > > 4. The migration completes and restores cpu 7 to offline state: > > rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down > > This leaves cpu7 in a state where the driver core considers the cpu > device online, but in all other respects it is offline and > unused. Attempts to online the cpu via sysfs appear to succeed but the > driver core actually does not pass the request to the lower-level > cpuhp support code. This makes the cpu unusable until the cpu device > is manually set offline and then online again via sysfs. > > Instead of directly calling cpu_up/cpu_down, the migration code should > use the higher-level device core APIs to maintain consistent state and > serialize operations. > > Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to > migration/hibernation") > Signed-off-by: Nathan Lynch <nath...@linux.ibm.com>
Looks good to me. This locking scheme makes the code consistent with dlpar_cpu() which also uses the high-level device APIs. Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com> > --- > arch/powerpc/kernel/rtas.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c > index 5faf0a64c92b..05824eb4323b 100644 > --- a/arch/powerpc/kernel/rtas.c > +++ b/arch/powerpc/kernel/rtas.c > @@ -871,15 +871,17 @@ static int rtas_cpu_state_change_mask(enum > rtas_cpu_state state, > return 0; > > for_each_cpu(cpu, cpus) { > + struct device *dev = get_cpu_device(cpu); > + > switch (state) { > case DOWN: > - cpuret = cpu_down(cpu); > + cpuret = device_offline(dev); > break; > case UP: > - cpuret = cpu_up(cpu); > + cpuret = device_online(dev); > break; > } > - if (cpuret) { > + if (cpuret < 0) { > pr_debug("%s: cpu_%s for cpu#%d returned %d.\n", > __func__, > ((state == UP) ? "up" : "down"), > @@ -968,6 +970,8 @@ int rtas_ibm_suspend_me(u64 handle) > data.token = rtas_token("ibm,suspend-me"); > data.complete = &done; > > + lock_device_hotplug(); > + > /* All present CPUs must be online */ > cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask); > cpuret = rtas_online_cpus_mask(offline_mask); > @@ -1006,6 +1010,7 @@ int rtas_ibm_suspend_me(u64 handle) > __func__); > > out: > + unlock_device_hotplug(); > free_cpumask_var(offline_mask); > return atomic_read(&data.error); > } > -- > 2.20.1 > -- Thanks and Regards gautham.