Hi Nick, Thanks for reviewing the patch.
On Wed, Mar 15, 2017 at 12:05:43AM +1000, Nicholas Piggin wrote: > On Mon, 13 Mar 2017 11:31:27 +0530 > "Gautham R. Shenoy" <e...@linux.vnet.ibm.com> wrote: > > > From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com> > > > > Currently during idle-init on power9, if we don't find suitable stop > > states in the device tree that can be used as the > > default_stop/deepest_stop, we set stop0 (ESL=1,EC=1) as the default > > stop state psscr to be used by power9_idle and deepest stop state > > which is used by CPU-Hotplug. > > > > However, if the platform firmware has not configured or enabled a stop > > state, the kernel should not make any assumptions and fallback to a > > default choice. > > > > If the kernel uses a stop state that is not configured by the platform > > firmware, it may lead to further failures which should be avoided. > > > > In this patch, we modify the init code to ensure that the kernel uses > > only the stop states exposed by the firmware through the device > > tree. When a suitable default stop state isn't found, we disable > > ppc_md.power_save for power9. Similarly, when a suitable > > deepest_stop_state is not found in the device tree exported by the > > firmware, fall back to the default busy-wait loop in the CPU-Hotplug > > code. > > Seems reasonable. I have a few comments that you may consider. Nothing > too major. > > Btw., it would be nice to move this hotplug idling selection code to > idle.c. Have the hotplug just ask to enter the best available idle mode > and that's it. I'm not asking you to do that for this series, but perhaps > consider it for the future. That's not a bad idea. I will do it in the respin of the patchset. > > > > diff --git a/arch/powerpc/platforms/powernv/idle.c > > b/arch/powerpc/platforms/powernv/idle.c > > index 4ee837e..9fde6e4 100644 > > --- a/arch/powerpc/platforms/powernv/idle.c > > +++ b/arch/powerpc/platforms/powernv/idle.c > > @@ -147,7 +147,6 @@ u32 pnv_get_supported_cpuidle_states(void) > > } > > EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states); > > > > - > > static void pnv_fastsleep_workaround_apply(void *info) > > > > { > > @@ -243,6 +242,7 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600, > > */ > > u64 pnv_default_stop_val; > > u64 pnv_default_stop_mask; > > +bool default_stop_found; > > > > /* > > * Used for ppc_md.power_save which needs a function with no parameters > > @@ -264,6 +264,7 @@ static void power9_idle(void) > > */ > > u64 pnv_deepest_stop_psscr_val; > > u64 pnv_deepest_stop_psscr_mask; > > +bool deepest_stop_found; > > > > /* > > * Power ISA 3.0 idle initialization. > > If the hotplug idle code was in idle.c, then all this deepest/default stop > logic and register settings would be static to idle.c, which would be nice. > > If you have a function to check if deepest stop is found, then you don't need > a non-static variable here (or for default_stop_found AFAIKS). Sure! > > > > @@ -352,7 +353,6 @@ static int __init pnv_power9_idle_init(struct > > device_node *np, u32 *flags, > > u32 *residency_ns = NULL; > > u64 max_residency_ns = 0; > > int rc = 0, i; > > - bool default_stop_found = false, deepest_stop_found = false; > > > > psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), GFP_KERNEL); > > psscr_mask = kcalloc(dt_idle_states, sizeof(*psscr_mask), GFP_KERNEL); > > @@ -433,20 +433,22 @@ static int __init pnv_power9_idle_init(struct > > device_node *np, u32 *flags, > > } > > > > if (!default_stop_found) { > > - pnv_default_stop_val = PSSCR_HV_DEFAULT_VAL; > > - pnv_default_stop_mask = PSSCR_HV_DEFAULT_MASK; > > - pr_warn("Setting default stop psscr > > val=0x%016llx,mask=0x%016llx\n", > > + pr_warn("powernv:idle: Default stop not found. Disabling > > ppcmd.powersave\n"); > > + } else { > > + pr_info("powernv:idle: Default stop: psscr = > > 0x%016llx,mask=0x%016llx\n", > > pnv_default_stop_val, pnv_default_stop_mask); > > } > > > > if (!deepest_stop_found) { > > - pnv_deepest_stop_psscr_val = PSSCR_HV_DEFAULT_VAL; > > - pnv_deepest_stop_psscr_mask = PSSCR_HV_DEFAULT_MASK; > > - pr_warn("Setting default stop psscr > > val=0x%016llx,mask=0x%016llx\n", > > + pr_warn("powernv:idle: Deepest stop not found. CPU-Hotplug is > > affected\n"); > > I guess these warnings are meant for developers, but it might be nice > to make them slightly more meaningful? Prefix of choice seems to be > "cpuidle-powernv:" > > Then you could have "no suitable stop state provided by firmware. Disabling > idle power saving" and "no suitable stop state provided by firmware. Offlined > CPUs will busy-wait", perhaps? How about pr_warn("cpuidle-powernv: No suitable stop for CPU-Hotplug. Offlined CPUs will busy wait\n"); > > Just a suggestion. > > > + } else { > > + pr_info("powernv:idle: Deepest stop: psscr = > > 0x%016llx,mask=0x%016llx\n", > > pnv_deepest_stop_psscr_val, > > pnv_deepest_stop_psscr_mask); > > } > > > > + pr_info("powernv:idle: RL value of first deep stop = 0x%llx\n", > > + pnv_first_deep_stop_state); > > cpuidle-powernv: prefix for these too? Will fix. > > > out: > > kfree(psscr_val); > > kfree(psscr_mask); > > @@ -454,6 +456,12 @@ static int __init pnv_power9_idle_init(struct > > device_node *np, u32 *flags, > > return rc; > > } > > > > +bool pnv_check_deepest_stop(void) > > +{ > > + return deepest_stop_found; > > +} > > +EXPORT_SYMBOL_GPL(pnv_check_deepest_stop); > > Does this need to be exported? AFAIKS it's not used in a module. No, it is not used in a module. Will get rid of it. > > > + > > /* > > * Probe device tree for supported idle states > > */ > > @@ -526,7 +534,8 @@ static int __init pnv_init_idle_states(void) > > > > if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED) > > ppc_md.power_save = power7_idle; > > - else if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST) > > + else if ((supported_cpuidle_states & OPAL_PM_STOP_INST_FAST) && > > + default_stop_found) > > ppc_md.power_save = power9_idle; > > > > out: > > Is this the only use of default_stop_found? & OPAL_PM_STOP_INST_FAST > should always be the same as default_stop_found value, no? In that case > can you just remove the OPAL_PM_STOP_INST_FAST test here? (I like the > boolean and prefer the default idle state selection logic to stay in the > idle init above where you have it commented). Yup. You are right, the check is redundant. We only consider STOP_INST_FAST states for default in power9_idle_init(). Will fix this and move initialization of ppc_md.power_save to the init function. > > > > diff --git a/arch/powerpc/platforms/powernv/powernv.h > > b/arch/powerpc/platforms/powernv/powernv.h > > index 6130522..9acd5eb 100644 > > --- a/arch/powerpc/platforms/powernv/powernv.h > > +++ b/arch/powerpc/platforms/powernv/powernv.h > > @@ -18,6 +18,7 @@ static inline void pnv_pci_shutdown(void) { } > > #endif > > > > extern u32 pnv_get_supported_cpuidle_states(void); > > +bool pnv_check_deepest_stop(void); > > extern u64 pnv_deepest_stop_psscr_val; > > extern u64 pnv_deepest_stop_psscr_mask; > > > > diff --git a/arch/powerpc/platforms/powernv/smp.c > > b/arch/powerpc/platforms/powernv/smp.c > > index 8d5b99e..3f66e6f 100644 > > --- a/arch/powerpc/platforms/powernv/smp.c > > +++ b/arch/powerpc/platforms/powernv/smp.c > > @@ -140,6 +140,7 @@ static void pnv_smp_cpu_kill_self(void) > > unsigned int cpu; > > unsigned long srr1, wmask; > > u32 idle_states; > > + bool deepest_stop_found; > > > > /* Standard hot unplug procedure */ > > local_irq_disable(); > > @@ -155,6 +156,7 @@ static void pnv_smp_cpu_kill_self(void) > > wmask = SRR1_WAKEMASK_P8; > > > > idle_states = pnv_get_supported_cpuidle_states(); > > + deepest_stop_found = pnv_check_deepest_stop(); > > > > /* We don't want to take decrementer interrupts while we are offline, > > * so clear LPCR:PECE1. We keep PECE2 (and LPCR_PECE_HVEE on P9) > > @@ -184,7 +186,11 @@ static void pnv_smp_cpu_kill_self(void) > > > > ppc64_runlatch_off(); > > > > - if (cpu_has_feature(CPU_FTR_ARCH_300)) { > > + if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) { > > + pr_info("CPU %u offlining with psscr = 0x%016llx\n", > > + cpu, pnv_deepest_stop_psscr_val); > > + pr_info("CPU%d down paca pir %016x pir %lx\n", > > + cpu, hard_smp_processor_id(), mfspr(SPRN_PIR)); > > How much log info is appropriate here? This should have been pr_debug. I will clean up this part. > > > srr1 = power9_idle_stop(pnv_deepest_stop_psscr_val, > > pnv_deepest_stop_psscr_mask); > > } else if (idle_states & OPAL_PM_WINKLE_ENABLED) { > -- Thanks and Regards gautham.