Michael Bringmann <m...@linux.vnet.ibm.com> writes:

> This patch is to check for cede'ed CPUs during LPM. Some extreme
> tests encountered a problem where Linux has put some threads to
> sleep (possibly to save energy or something), LPM was attempted,
> and the Linux kernel didn't awaken the sleeping threads, but issued
> the H_JOIN for the active threads. Since the sleeping threads
> are not awake, they cannot issue the expected H_JOIN, and the
> partition would never suspend. This patch wakes the sleeping
> threads back up.
I don't think this is the right solution.

Just after your for loop we do an on_each_cpu() call, which sends an
IPI to every CPU, and that should wake all CPUs up from CEDE. If
that's not happening then there is a bug somewhere, and we need to
work out where.

> diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
> index cff5a41..8292eff 100644
> --- a/arch/powerpc/include/asm/plpar_wrappers.h
> +++ b/arch/powerpc/include/asm/plpar_wrappers.h
> @@ -26,10 +26,8 @@ static inline void set_cede_latency_hint(u8 latency_hint)
>  	get_lppaca()->cede_latency_hint = latency_hint;
>  }
>
> -static inline long cede_processor(void)
> -{
> -	return plpar_hcall_norets(H_CEDE);
> -}
> +int cpu_is_ceded(int cpu);
> +long cede_processor(void);
>
>  static inline long extended_cede_processor(unsigned long latency_hint)
>  {
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index de35bd8f..fea3d21 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -44,6 +44,7 @@
>  #include <asm/time.h>
>  #include <asm/mmu.h>
>  #include <asm/topology.h>
> +#include <asm/plpar_wrappers.h>
>
>  /* This is here deliberately so it's only used in this file */
>  void enter_rtas(unsigned long);
> @@ -942,7 +943,7 @@ int rtas_ibm_suspend_me(u64 handle)
>  	struct rtas_suspend_me_data data;
>  	DECLARE_COMPLETION_ONSTACK(done);
>  	cpumask_var_t offline_mask;
> -	int cpuret;
> +	int cpuret, cpu;
>
>  	if (!rtas_service_present("ibm,suspend-me"))
>  		return -ENOSYS;
> @@ -991,6 +992,11 @@ int rtas_ibm_suspend_me(u64 handle)
>  		goto out_hotplug_enable;
>  	}
>
> +	for_each_present_cpu(cpu) {
> +		if (cpu_is_ceded(cpu))
> +			plpar_hcall_norets(H_PROD, get_hard_smp_processor_id(cpu));
> +	}

There's a race condition here: there's nothing to prevent the CPUs you
just PROD'ed from going back into CEDE before you do the on_each_cpu()
call below.

>  	/* Call function on all CPUs.  One of us will make the
>  	 * rtas call
>  	 */
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index 41f62ca2..48ae6d4 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -331,6 +331,24 @@ static int alloc_dispatch_log_kmem_cache(void)
>  }
>  machine_early_initcall(pseries, alloc_dispatch_log_kmem_cache);
>
> +static DEFINE_PER_CPU(int, cpu_ceded);
> +
> +int cpu_is_ceded(int cpu)
> +{
> +	return per_cpu(cpu_ceded, cpu);
> +}
> +
> +long cede_processor(void)
> +{
> +	long rc;
> +
> +	per_cpu(cpu_ceded, raw_smp_processor_id()) = 1;

And there's also a race condition here. From the other CPU's
perspective the store to cpu_ceded is not necessarily ordered vs the
hcall below. Which means the other CPU can see cpu_ceded = 0, and
therefore not prod us, but this CPU has already called H_CEDE.

> +	rc = plpar_hcall_norets(H_CEDE);
> +	per_cpu(cpu_ceded, raw_smp_processor_id()) = 0;
> +
> +	return rc;
> +}
> +
>  static void pseries_lpar_idle(void)
>  {
>  	/*
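If we were going to keep the flag anyway (to be clear, I don't think
we should), the store would at a minimum need a barrier to order it
before the hcall, something like this completely untested sketch:

	long cede_processor(void)
	{
		long rc;

		per_cpu(cpu_ceded, raw_smp_processor_id()) = 1;

		/*
		 * Order the store to cpu_ceded before the H_CEDE, so
		 * that a CPU which reads cpu_ceded == 0 hasn't raced
		 * with us already being inside H_CEDE.
		 */
		smp_mb();

		rc = plpar_hcall_norets(H_CEDE);
		per_cpu(cpu_ceded, raw_smp_processor_id()) = 0;

		return rc;
	}

But even then the first race remains: a prodded CPU can go straight
back into CEDE before the on_each_cpu() call runs. So I'd rather we
work out why the IPI from on_each_cpu() isn't waking the ceded
threads in the first place.

cheers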