Daniel Henrique Barboza <danielhb...@gmail.com> writes: > Ping > > On 3/5/21 2:38 PM, Daniel Henrique Barboza wrote: >> Of all the reasons that dlpar_cpu_remove() can fail, the 'last online >> CPU' is one that can be caused directly by the user offlining CPUs >> in a partition/virtual machine that has hotplugged CPUs. Trying to >> reclaim a hotplugged CPU can fail if the CPU is now the last online in >> the system. This is easily reproduced using QEMU [1].
Sorry, I saw this earlier and never got around to replying. I'm wondering if we neet to catch it earlier, ie. in dlpar_offline_cpu(). By the time we return to dlpar_cpu_remove() we've dropped the cpu_add_remove_lock (cpu_maps_update_done), so num_online_cpus() could change out from under us, meaning the num_online_cpus() == 1 check might trigger incorrectly (or vice versa). Something like the patch below (completely untested :D) And writing that patch makes me wonder, is == 1 the right check? In most cases we'll remove all but one thread of the core, but we'll fail on the last thread. Leaving that core effectively stuck in SMT1. Is that useful behaviour? Should we instead check at the start that we can remove all threads of the core without going to zero online CPUs? cheers diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c index 12cbffd3c2e3..498c22331ac8 100644 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c @@ -271,6 +271,12 @@ static int dlpar_offline_cpu(struct device_node *dn) if (!cpu_online(cpu)) break; + if (num_online_cpus() == 1) { + pr_warn("Unable to remove last online CPU %pOFn\n", dn); + rc = EBUSY; + goto out_unlock; + } + cpu_maps_update_done(); rc = device_offline(get_cpu_device(cpu)); if (rc) @@ -283,6 +289,7 @@ static int dlpar_offline_cpu(struct device_node *dn) thread); } } +out_unlock: cpu_maps_update_done(); out: