Re: [PATCH] powernv: Restore SPRs correctly upon wake up from hypervisor state loss
Hi Gautham,

Thanks for fixing this.

On Wed, Sep 7, 2016 at 1:16 AM, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy"
>
> pnv_wakeup_tb_loss function currently expects the cr4 to be "eq" if
> the CPU is waking up from a complete hypervisor state loss. Hence, it
> currently restores the SPR contents only if cr4 is "eq".
>
> However, after the commit bcef83a00dc4 ("powerpc/powernv: Add platform
> support for stop instruction"), on ISA_V300 CPUs, the function
> pnv_restore_hyp_resource sets cr4 to contain the result of the
> comparison between the state the CPU has woken up from and the first
> deepest stop state before calling pnv_wakeup_tb_loss.
>
> Thus if the CPU woke up from a state that is deeper than the first
> deepest stop state, cr4 will have "gt" set and hence, pnv_wakeup_tb_loss
> will fail to restore the SPRs on waking up from such a state.
>
> Fix the code in pnv_wakeup_tb_loss to restore the SPR states when cr4 is
> "eq" or "gt".
>
> Fixes: Commit bcef83a00dc4 ("powerpc/powernv: Add platform support for
> stop instruction")
>
> Cc: Vaidyanathan Srinivasan
> Cc: Michael Neuling
> Cc: Michael Ellerman
> Cc: Shreyas B. Prabhu
> Signed-off-by: Gautham R. Shenoy
> ---

Reviewed-by: Shreyas B. Prabhu

Thanks,
Shreyas
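The condition being fixed can be modelled in a few lines of C. This is only a sketch: the function names and the encoding of stop states as plain integers (deeper state, larger value) are assumptions made for illustration. The real check is done in assembly on cr4 inside pnv_wakeup_tb_loss.

```c
#include <stdbool.h>

/*
 * Hypothetical model: stop states are plain integers and a deeper
 * state has a larger value. Neither function exists in the kernel.
 */

/* Behaviour before the fix: restore SPRs only when cr4 is "eq". */
bool restores_sprs_before_fix(int woken_state, int first_deep_state)
{
    return woken_state == first_deep_state;
}

/* Behaviour after the fix: restore SPRs when cr4 is "eq" or "gt". */
bool restores_sprs_after_fix(int woken_state, int first_deep_state)
{
    return woken_state >= first_deep_state;
}
```

With this model, a wakeup from a state one level deeper than the first deep stop state is missed by the old check and caught by the new one.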
Re: [PATCH] cpupower tools: Fix error when running cpupower monitor
On 08/25/2015 05:29 PM, Shreyas B Prabhu wrote:
>
>
> On 08/17/2015 01:22 PM, Shreyas B Prabhu wrote:
>>
>>
>> On 08/10/2015 05:58 PM, Thomas Renninger wrote:
>>> On Monday, August 03, 2015 11:46:00 AM Shreyas B. Prabhu wrote:
>>>> get_cpu_topology() tries to get topology info from all cpus by reading
>>>> files in the topology sysfs dir. If a cpu is offlined, since it doesn't
>>>> have topology dir, this function fails and returns -1. This causes
>>>> functions relying on get_cpu_topology() to fail. For example-
>>>>
>>>> $ cpupower monitor
>>>> Cannot read number of available processors
>>>>
>>>> Fix this by skipping fetching topology info for offline cpus.
>>>
>>> Looks fine.
>>>
>>> Thanks!
>>>
>>> Acked-by: Thomas Renninger
>>>
>>
>> Thanks Thomas!
>> Rafael, can you please pick this patch?
>>
>>
>
>
> Hi Rafael,
>
> If this patch looks good can you please pick this up?
>
>
> Thanks,
> Shreyas
>

Hi Rafael,

If this patch looks good can you please pick this up?

Thanks,
Shreyas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
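The idea behind the fix, skipping offline CPUs instead of treating them as a read failure, might look roughly like the following. This is a self-contained model and not the cpupower code: struct cpu_info and count_topology_cpus() are hypothetical names, and an in-memory array stands in for the sysfs topology files.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU record; the real tool reads sysfs instead. */
struct cpu_info {
    bool online;
    int core_id;   /* only meaningful when the CPU is online */
};

/*
 * Count the CPUs whose topology could be read. Offline CPUs have no
 * topology directory, so they are skipped rather than reported as an
 * error. A genuine failure is modelled as an online CPU with a
 * negative core_id and still returns -1.
 */
int count_topology_cpus(const struct cpu_info *cpus, size_t n)
{
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        if (!cpus[i].online)
            continue;           /* offline: skip, do not fail */
        if (cpus[i].core_id < 0)
            return -1;          /* real read error */
        found++;
    }
    return found;
}
```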
Re: [PATCH] cpupower tools: Fix error when running cpupower monitor
>>
>> Hi Rafael,
>>
>> If this patch looks good can you please pick this up?
>
> I picked it up last week, sorry for being silent about that.
>
> It should be in the Linus' tree already.
>

Thanks! Sorry I missed the fact that you had picked it last week.

Thanks,
Shreyas
Re: [PATCH v3 4/4] powernv: powerpc: Add winkle support for offline cpus
On Monday 08 December 2014 11:22 AM, Paul Mackerras wrote:
> On Thu, Dec 04, 2014 at 12:58:23PM +0530, Shreyas B. Prabhu wrote:
>> Winkle is a deep idle state supported in power8 chips. A core enters
>> winkle when all the threads of the core enter winkle. In this state
>> power supply to the entire chiplet i.e core, private L2 and private L3
>> is turned off. As a result it gives higher powersavings compared to
>> sleep.
>>
>> But entering winkle results in a total hypervisor state loss. Hence the
>> hypervisor context has to be preserved before entering winkle and
>> restored upon wake up.
>>
>> Power-on Reset Engine (PORE) is a dedicated engine which is responsible
>> for powering on the chiplet during wake up. It can be programmed to
>> restore the register contents of a few specific registers. This patch
>> uses PORE to restore register state wherever possible and uses stack to
>> save and restore rest of the necessary registers.
>>
>> With hypervisor state restore things fall under three categories-
>> per-core state, per-subcore state and per-thread state. To manage this,
>> extend the infrastructure introduced for sleep. Mainly we add a paca
>> variable subcore_sibling_mask. Using this and the core_idle_state we can
>> distinguish first thread in core and subcore.
>
> Comments below...
>
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S
>> b/arch/powerpc/kernel/exceptions-64s.S
>> index 7637889..2b9b5fb 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -102,9 +102,7 @@ system_reset_pSeries:
>>  #ifdef CONFIG_PPC_P7_NAP
>>  BEGIN_FTR_SECTION
>>          /* Running native on arch 2.06 or later, check if we are
>> -         * waking up from nap. We only handle no state loss and
>> -         * supervisor state loss. We do -not- handle hypervisor
>> -         * state loss at this time.
>> +         * waking up from nap/sleep/winkle.
>>           */
>>          mfspr   r13,SPRN_SRR1
>>          rlwinm. r13,r13,47-31,30,31
>> @@ -112,7 +110,17 @@ BEGIN_FTR_SECTION
>>
>>          cmpwi   cr3,r13,2
>>
>> -        GET_PACA(r13)
>> +        /* Check if last bit of HSPGR0 is set. This indicates whether we are
>> +         * waking up from winkle */
>> +        li      r3,1
>> +        mfspr   r4,SPRN_HSPRG0
>> +        and     r5,r4,r3
>> +        cmpwi   cr4,r5,1        /* Store result in cr4 for later use */
>> +
>> +        andc    r4,r4,r3
>> +        mtspr   SPRN_HSPRG0,r4
>> +
>> +        mr      r13,r4
>
> This seems unnecessarily convoluted. How about:
>
>         GET_PACA(r13)
>         clrldi  r5,r13,63
>         clrrdi  r13,r13,1
>         cmpwi   cr4,r5,1
>         mtspr   SPRN_HSPRG0,r13
>

Yes, makes more sense. I'll use this.

>> diff --git a/arch/powerpc/kernel/idle_power7.S
>> b/arch/powerpc/kernel/idle_power7.S
>> index 8c3a1f4..8102075 100644
>> --- a/arch/powerpc/kernel/idle_power7.S
>> +++ b/arch/powerpc/kernel/idle_power7.S
>> @@ -19,8 +19,24 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>
>>  #undef DEBUG
>> +/*
>> + * Use unused space in the interrupt stack to save and restore
>> + * registers for winkle support.
>> + */
>> +#define _SDR1   GPR3
>> +#define _RPR    GPR4
>> +#define _SPURR  GPR5
>> +#define _PURR   GPR6
>> +#define _TSCR   GPR7
>> +#define _DSCR   GPR8
>> +#define _AMOR   GPR9
>> +#define _PMC5   GPR10
>> +#define _PMC6   GPR11
>
> Why only PMC5 and PMC6 out of all the PMU registers? What about
> PMC1-PMC4 and the MMCR registers? I assume they're lost during winkle
> state also, aren't they? If we're not saving them, what's the point
> of saving and restoring PMC5 and PMC6?
>

Yes all PMC and MMCR contents are lost. Using __restore_cpu_power8, the
MMCR registers are initialized to 0.

The reasoning behind specifically restoring PMC5 and PMC6 was the fact
that they are not programmable and count cycles/instructions by default.
We suspected that there might be a userspace program which relied on
PMC5/PMC6 always increasing. But now on closer look, since these
counters are 32 bit and cycles/instruction counts are bound to exceed it,
I doubt such userspace programs exist. I'll drop PMC5 and PMC6 in the
next version.

>> +#define _WORT   GPR12
>> +#define _WORC   GPR13
>>
>>  /* Idle state entry routines */
>>
>> @@ -124,8 +140,8 @@ power7_enter_nap_mode:
>>          stb     r4,HSTATE_HWTHREAD_STATE(r13)
>>  #endif
>>
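The two instructions in Paul's suggestion split HSPRG0 into a one-bit flag and the PACA pointer. A C model of that split is sketched below; the function names are hypothetical and the value is treated as a plain 64-bit integer rather than a real SPR read.

```c
#include <stdint.h>

/* clrldi r5,r13,63: clear the upper 63 bits, keeping only the low bit. */
uint64_t winkle_flag(uint64_t hsprg0)
{
    return hsprg0 & 1ULL;
}

/* clrrdi r13,r13,1: clear the lowest bit, leaving the PACA pointer. */
uint64_t paca_pointer(uint64_t hsprg0)
{
    return hsprg0 & ~1ULL;
}
```

Because the PACA address is at least 2-byte aligned, its lowest bit is free to carry the "waking from winkle" marker, and writing paca_pointer() back to HSPRG0 clears that marker in the same step.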
[PATCH v4 0/4] powernv: cpuidle: Redesign idle states management
Deep idle states like sleep and winkle are per core idle states. A core
enters these states only when all the threads enter either the particular
idle state or a deeper one. There are tasks like fastsleep hardware bug
workaround and hypervisor core state save which have to be done only by
the last thread of the core entering deep idle state and similarly tasks
like timebase resync, hypervisor core register restore that have to be
done only by the first thread waking up from these states.

The current idle state management does not have a way to distinguish the
first/last thread of the core waking/entering idle states. Tasks like
timebase resync are done for all the threads. This is not only
suboptimal, but can cause functionality issues when subcores are
involved.

Winkle is a deeper idle state compared to fastsleep. In this state the
power supply to the chiplet, i.e core, private L2 and private L3 is
turned off. This results in a total hypervisor state loss. This patch set
adds support for winkle and provides a way to track the idle states of
the threads of the core and use it for idle state management of idle
states sleep and winkle.

Note- This patch set requires "powerpc: powernv: Return to cpu offline
loop when finished in KVM guest"
(http://patchwork.ozlabs.org/patch/417240/)

TBD:
- Remove duplication of branching to kvm code.

Changes in v4:
--------------
- Based patches on top of http://patchwork.ozlabs.org/patch/417240/
- isync ordering fix.
- Save/Restore SRR1 value so that it doesn't get clobbered by
  opal_call_realmode.
- Changed HSPRG0 handling.
- Comment fixes.

Changes in v3:
--------------
- Added barriers after lock
- Added a paca field that stores thread mask.
- Changed code structure around fastsleep workaround, to allow for manual
  patching out if the platform does not require it.
- Threads waiting on core_idle_state lock now loop in HMT_LOW
- Using NV CRs to avoid save/restore of CR while making OPAL calls.
- Fixed couple of flow issues in path where fastsleep workaround was not
  needed
- Using PPC_LR_STKOFF instead of _LINK in opal_call_realmode
- Restoring WORT and WORC

Changes in v2:
--------------
- Using PNV_THREAD_NAP/SLEEP defines while calling
  power7_powersave_common
- Comment changes based on review
- Rebased on top of 3.18-rc6

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rafael J. Wysocki
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Vaidyanathan Srinivasan
Cc: Preeti U Murthy

Paul Mackerras (1):
  powerpc: powernv: Switch off MMU before entering nap/sleep/rvwinkle mode

Preeti U. Murthy (1):
  powerpc/powernv: Enable Offline CPUs to enter deep idle states

Shreyas B. Prabhu (2):
  powernv: cpuidle: Redesign idle states management
  powernv: powerpc: Add winkle support for offline cpus

 arch/powerpc/include/asm/cpuidle.h             |  14 ++
 arch/powerpc/include/asm/opal.h                |  13 +
 arch/powerpc/include/asm/paca.h                |   6 +
 arch/powerpc/include/asm/ppc-opcode.h          |   2 +
 arch/powerpc/include/asm/processor.h           |   1 +
 arch/powerpc/include/asm/reg.h                 |   4 +
 arch/powerpc/kernel/asm-offsets.c              |   6 +
 arch/powerpc/kernel/cpu_setup_power.S          |   4 +
 arch/powerpc/kernel/exceptions-64s.S           |  30 ++-
 arch/powerpc/kernel/idle_power7.S              | 332 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |  39 +++
 arch/powerpc/platforms/powernv/powernv.h       |   2 +
 arch/powerpc/platforms/powernv/setup.c         | 160
 arch/powerpc/platforms/powernv/smp.c           |  10 +-
 arch/powerpc/platforms/powernv/subcore.c       |  34 +++
 arch/powerpc/platforms/powernv/subcore.h       |   1 +
 drivers/cpuidle/cpuidle-powernv.c              |  10 +-
 17 files changed, 608 insertions(+), 60 deletions(-)
 create mode 100644 arch/powerpc/include/asm/cpuidle.h

--
1.9.3
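The first/last thread tracking described in the cover letter can be pictured with a small C model. It is a sketch under stated assumptions: a set bit means the thread is running, the mask value is the one patch 3/4 defines, the function names are made up, and the lock bit and atomicity that the real assembly needs are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define PNV_CORE_IDLE_THREAD_BITS 0x0FF  /* value as defined in patch 3/4 */

/*
 * A thread enters a deep idle state: clear its bit in the per-core word.
 * Returns true when no running thread is left, i.e. the caller is the
 * last thread and must do the per-core entry work.
 */
bool enter_deep_idle(uint32_t *core_idle_state, uint8_t thread_mask)
{
    *core_idle_state &= ~(uint32_t)thread_mask;
    return (*core_idle_state & PNV_CORE_IDLE_THREAD_BITS) == 0;
}

/*
 * A thread wakes up: returns true when no other thread was running,
 * i.e. the caller is the first thread and must do the per-core restore
 * work (timebase resync and so on). The thread bit is set afterwards.
 */
bool wake_from_deep_idle(uint32_t *core_idle_state, uint8_t thread_mask)
{
    bool first = (*core_idle_state & PNV_CORE_IDLE_THREAD_BITS) == 0;

    *core_idle_state |= thread_mask;
    return first;
}
```

For a core with two running threads (state 0x03), only the second call to enter_deep_idle() reports "last thread", and only the first subsequent wakeup reports "first thread".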
[PATCH v4 1/4] powerpc: powernv: Switch off MMU before entering nap/sleep/rvwinkle mode
From: Paul Mackerras

Currently, when going idle, we set the flag indicating that we are in
nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap
(or sleep or rvwinkle) instruction, all with the MMU on. This is bad
for two reasons: (a) the architecture specifies that those instructions
must be executed with the MMU off, and in fact with only the SF, HV, ME
and possibly RI bits set, and (b) this introduces a race, because as
soon as we set the flag, another thread can switch the MMU to a guest
context. If the race is lost, this thread will typically start looping
on relocation-on ISIs at 0xc...4400.

This fixes it by setting the MSR as required by the architecture before
setting the flag or executing the nap/sleep/rvwinkle instruction.

[ shre...@linux.vnet.ibm.com: Edited to handle LE ]
Signed-off-by: Paul Mackerras
Signed-off-by: Shreyas B. Prabhu
Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Cc: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/include/asm/reg.h    |  2 ++
 arch/powerpc/kernel/idle_power7.S | 18 +-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index c998279..a68ee15 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -118,8 +118,10 @@
 #define __MSR           (MSR_ME | MSR_RI | MSR_IR | MSR_DR | MSR_ISF |MSR_HV)
 #ifdef __BIG_ENDIAN__
 #define MSR_            __MSR
+#define MSR_IDLE        (MSR_ME | MSR_SF | MSR_HV)
 #else
 #define MSR_            (__MSR | MSR_LE)
+#define MSR_IDLE        (MSR_ME | MSR_SF | MSR_HV | MSR_LE)
 #endif
 #define MSR_KERNEL      (MSR_ | MSR_64BIT)
 #define MSR_USER32      (MSR_ | MSR_PR | MSR_EE)
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 18c0687..e5aba6a 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -101,7 +101,23 @@ _GLOBAL(power7_powersave_common)
         std     r9,_MSR(r1)
         std     r1,PACAR1(r13)

-_GLOBAL(power7_enter_nap_mode)
+        /*
+         * Go to real mode to do the nap, as required by the architecture.
+         * Also, we need to be in real mode before setting hwthread_state,
+         * because as soon as we do that, another thread can switch
+         * the MMU context to the guest.
+         */
+        LOAD_REG_IMMEDIATE(r5, MSR_IDLE)
+        li      r6, MSR_RI
+        andc    r6, r9, r6
+        LOAD_REG_ADDR(r7, power7_enter_nap_mode)
+        mtmsrd  r6, 1           /* clear RI before setting SRR0/1 */
+        mtspr   SPRN_SRR0, r7
+        mtspr   SPRN_SRR1, r5
+        rfid
+
+        .globl  power7_enter_nap_mode
+power7_enter_nap_mode:
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
         /* Tell KVM we're napping */
         li      r4,KVM_HWTHREAD_IN_NAP
--
1.9.3
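To make the MSR manipulation concrete, the sketch below rebuilds MSR_IDLE in C and models the andc that clears RI. The bit positions are my reading of the Power ISA MSR layout and should be treated as assumptions, as should the helper names; the patch itself only shows the symbolic constants.

```c
#include <stdint.h>

/* Assumed MSR bit positions (not taken from the patch text). */
#define MSR_SF (1ULL << 63)  /* 64-bit mode */
#define MSR_HV (1ULL << 60)  /* hypervisor state */
#define MSR_ME (1ULL << 12)  /* machine check enable */
#define MSR_IR (1ULL << 5)   /* instruction relocation (MMU on) */
#define MSR_DR (1ULL << 4)   /* data relocation (MMU on) */
#define MSR_RI (1ULL << 1)   /* recoverable interrupt */
#define MSR_LE (1ULL << 0)   /* little-endian mode */

/* MSR_IDLE as defined by the patch: ME, SF and HV, plus LE on LE builds. */
uint64_t msr_idle(int little_endian)
{
    uint64_t msr = MSR_ME | MSR_SF | MSR_HV;

    if (little_endian)
        msr |= MSR_LE;
    return msr;
}

/* Model of "li r6,MSR_RI; andc r6,r9,r6": the current MSR with RI cleared. */
uint64_t clear_ri(uint64_t msr)
{
    return msr & ~MSR_RI;
}
```

The point of interest is that MSR_IDLE carries neither IR nor DR, so the rfid lands in power7_enter_nap_mode with the MMU off before hwthread_state is written.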
[PATCH v4 3/4] powernv: cpuidle: Redesign idle states management
Deep idle states like sleep and winkle are per core idle states. A core
enters these states only when all the threads enter either the
particular idle state or a deeper one. There are tasks like fastsleep
hardware bug workaround and hypervisor core state save which have to be
done only by the last thread of the core entering deep idle state and
similarly tasks like timebase resync, hypervisor core register restore
that have to be done only by the first thread waking up from these
states.

The current idle state management does not have a way to distinguish the
first/last thread of the core waking/entering idle states. Tasks like
timebase resync are done for all the threads. This is not only
suboptimal, but can cause functionality issues when subcores and kvm are
involved.

This patch adds the necessary infrastructure to track idle states of
threads in a per-core structure. It uses this info to perform tasks like
fastsleep workaround and timebase resync only once per core.

Signed-off-by: Shreyas B. Prabhu
Originally-by: Preeti U. Murthy
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rafael J.
Wysocki Cc: linux...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org --- arch/powerpc/include/asm/cpuidle.h | 20 +++ arch/powerpc/include/asm/opal.h| 2 + arch/powerpc/include/asm/paca.h| 6 + arch/powerpc/include/asm/processor.h | 2 +- arch/powerpc/kernel/asm-offsets.c | 6 + arch/powerpc/kernel/exceptions-64s.S | 24 +-- arch/powerpc/kernel/idle_power7.S | 197 +++-- arch/powerpc/platforms/powernv/opal-wrappers.S | 37 + arch/powerpc/platforms/powernv/setup.c | 49 +- arch/powerpc/platforms/powernv/smp.c | 3 +- drivers/cpuidle/cpuidle-powernv.c | 3 +- 11 files changed, 291 insertions(+), 58 deletions(-) create mode 100644 arch/powerpc/include/asm/cpuidle.h diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h new file mode 100644 index 000..d2f99ca --- /dev/null +++ b/arch/powerpc/include/asm/cpuidle.h @@ -0,0 +1,20 @@ +#ifndef _ASM_POWERPC_CPUIDLE_H +#define _ASM_POWERPC_CPUIDLE_H + +#ifdef CONFIG_PPC_POWERNV +/* Used in powernv idle state management */ +#define PNV_THREAD_RUNNING 0 +#define PNV_THREAD_NAP 1 +#define PNV_THREAD_SLEEP2 +#define PNV_THREAD_WINKLE 3 +#define PNV_CORE_IDLE_LOCK_BIT 0x100 +#define PNV_CORE_IDLE_THREAD_BITS 0x0FF + +#ifndef __ASSEMBLY__ +extern u32 pnv_fastsleep_workaround_at_entry[]; +extern u32 pnv_fastsleep_workaround_at_exit[]; +#endif + +#endif + +#endif diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index f8b95c0..bef7fbc 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -152,6 +152,7 @@ struct opal_sg_list { #define OPAL_PCI_ERR_INJECT96 #define OPAL_PCI_EEH_FREEZE_SET97 #define OPAL_HANDLE_HMI98 +#define OPAL_CONFIG_CPU_IDLE_STATE 99 #define OPAL_REGISTER_DUMP_REGION 101 #define OPAL_UNREGISTER_DUMP_REGION102 @@ -162,6 +163,7 @@ struct opal_sg_list { */ #define OPAL_PM_NAP_ENABLED0x0001 #define OPAL_PM_SLEEP_ENABLED 0x0002 +#define OPAL_PM_SLEEP_ENABLED_ER1 0x0008 #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/include/asm/paca.h 
b/arch/powerpc/include/asm/paca.h index a5139ea..e2c4737 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -158,6 +158,12 @@ struct paca_struct { * early exception handler for use by high level C handler */ struct opal_machine_check_event *opal_mc_evt; + + /* Per-core mask tracking idle threads and a lock bit-[L][] */ + u32 *core_idle_state_ptr; + u8 thread_idle_state; /* PNV_THREAD_RUNNING/NAP/SLEEP */ + /* Mask to indicate thread id in core */ + u8 thread_mask; #endif #ifdef CONFIG_PPC_BOOK3S_64 /* Exclusive emergency stack pointer for machine check exception. */ diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 29c3798..f5c45b3 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -452,7 +452,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF}; extern int powersave_nap; /* set if nap mode can be used in idle loop */ extern unsigned long power7_nap(int check_irq); -extern void power7_sleep(void); +extern unsigned long power7_sleep(void); extern void flush_instruction_cache(void); extern void hard_reset_now(void); extern void poweroff_now(void); diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 9d7dede..3bc0352 100644 --- a/arch/powerpc/kernel/asm
[PATCH v4 2/4] powerpc/powernv: Enable Offline CPUs to enter deep idle states
The secondary threads should enter deep idle states so as to gain maximum
powersavings when the entire core is offline. To do so the offline path
must be made aware of the available deepest idle state. Hence probe the
device tree for the possible idle states in powernv core code and
expose the deepest idle state through flags.

Since the device tree is probed by the cpuidle driver as well, move
the parameters required to discover the idle states into an appropriate
common place to both the driver and the powernv core code.

Another point is that fastsleep idle state may require workarounds in
the kernel to function properly. This workaround is introduced in the
subsequent patches. However neither the cpuidle driver nor the hotplug
path need be bothered about this workaround.

It will be taken care of by the core powernv code.

Originally-by: Srivatsa S. Bhat
Signed-off-by: Preeti U. Murthy
Signed-off-by: Shreyas B. Prabhu
Reviewed-by: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rafael J. Wysocki
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/include/asm/opal.h          |  8 ++
 arch/powerpc/platforms/powernv/powernv.h |  2 ++
 arch/powerpc/platforms/powernv/setup.c   | 49
 arch/powerpc/platforms/powernv/smp.c     |  7 -
 drivers/cpuidle/cpuidle-powernv.c        |  9 ++
 5 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9124b0e..f8b95c0 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -155,6 +155,14 @@ struct opal_sg_list {
 #define OPAL_REGISTER_DUMP_REGION               101
 #define OPAL_UNREGISTER_DUMP_REGION             102

+/* Device tree flags */
+
+/* Flags set in power-mgmt nodes in device tree if
+ * respective idle states are supported in the platform.
+ */
+#define OPAL_PM_NAP_ENABLED     0x0001
+#define OPAL_PM_SLEEP_ENABLED   0x0002
+
 #ifndef __ASSEMBLY__

 #include

diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 6c8e2d1..604c48e 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -29,6 +29,8 @@ static inline u64 pnv_pci_dma_get_required_mask(struct pci_dev *pdev)
 }
 #endif

+extern u32 pnv_get_supported_cpuidle_states(void);
+
 extern void pnv_lpc_init(void);

 bool cpu_core_split_required(void);
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 3f9546d..34c6665 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -290,6 +290,55 @@ static void __init pnv_setup_machdep_rtas(void)
 }
 #endif /* CONFIG_PPC_POWERNV_RTAS */

+static u32 supported_cpuidle_states;
+
+u32 pnv_get_supported_cpuidle_states(void)
+{
+        return supported_cpuidle_states;
+}
+
+static int __init pnv_init_idle_states(void)
+{
+        struct device_node *power_mgt;
+        int dt_idle_states;
+        const __be32 *idle_state_flags;
+        u32 len_flags, flags;
+        int i;
+
+        supported_cpuidle_states = 0;
+
+        if (cpuidle_disable != IDLE_NO_OVERRIDE)
+                return 0;
+
+        if (!firmware_has_feature(FW_FEATURE_OPALv3))
+                return 0;
+
+        power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
+        if (!power_mgt) {
+                pr_warn("opal: PowerMgmt Node not found\n");
+                return 0;
+        }
+
+        idle_state_flags = of_get_property(power_mgt,
+                        "ibm,cpu-idle-state-flags", &len_flags);
+        if (!idle_state_flags) {
+                pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
+                return 0;
+        }
+
+        dt_idle_states = len_flags / sizeof(u32);
+
+        for (i = 0; i < dt_idle_states; i++) {
+                flags = be32_to_cpu(idle_state_flags[i]);
+                supported_cpuidle_states |= flags;
+        }
+
+        return 0;
+}
+
+subsys_initcall(pnv_init_idle_states);
+
+
 static int __init pnv_probe(void)
 {
         unsigned long root = of_get_flat_dt_root();
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index b716f66..83299ef 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -150,6 +150,7 @@ static void pnv_smp_cpu_kill_self(void)
 {
         unsigned int cpu;
         unsigned long srr1;
+        u32 idle_states;

         /* Standard hot unplug procedure */
         local_irq_disable();
@@ -160,13 +161,17 @@ static void pnv_smp_cpu_kill_self(void)
         generic_set_cpu_dead(cpu);
         smp_wmb();

+        idle_states = pnv_get_supported_cpuidle_states();
         /* We don't want to take decrementer interrupts while we are offline,
          * so clear LPCR:PECE1. We keep PECE2
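The flag accumulation in pnv_init_idle_states() can be tried outside the kernel with the following userspace sketch. It is not the kernel code: be32_decode() is a portable stand-in for be32_to_cpu(), supported_states() is a hypothetical name, and a byte array plays the role of the "ibm,cpu-idle-state-flags" property.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for be32_to_cpu(): decode four big-endian bytes. */
uint32_t be32_decode(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * OR together the flag words of every idle state in the property, so the
 * result has a bit set for each capability that any state advertises.
 * len_bytes is the property length in bytes, as of_get_property() reports.
 */
uint32_t supported_states(const uint8_t *prop, size_t len_bytes)
{
    uint32_t supported = 0;
    size_t n = len_bytes / sizeof(uint32_t);

    for (size_t i = 0; i < n; i++)
        supported |= be32_decode(prop + 4 * i);
    return supported;
}
```

A caller such as the offline path then only has to test individual bits of the combined word, which is what pnv_get_supported_cpuidle_states() exposes.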
[PATCH v4 4/4] powernv: powerpc: Add winkle support for offline cpus
Winkle is a deep idle state supported in power8 chips. A core enters
winkle when all the threads of the core enter winkle. In this state
power supply to the entire chiplet i.e core, private L2 and private L3
is turned off. As a result it gives higher powersavings compared to
sleep.

But entering winkle results in a total hypervisor state loss. Hence the
hypervisor context has to be preserved before entering winkle and
restored upon wake up.

Power-on Reset Engine (PORE) is a dedicated engine which is responsible
for powering on the chiplet during wake up. It can be programmed to
restore the register contents of a few specific registers. This patch
uses PORE to restore register state wherever possible and uses stack to
save and restore rest of the necessary registers.

With hypervisor state restore things fall under three categories-
per-core state, per-subcore state and per-thread state. To manage this,
extend the infrastructure introduced for sleep. Mainly we add a paca
variable subcore_sibling_mask. Using this and the core_idle_state we can
distinguish first thread in core and subcore.

Signed-off-by: Shreyas B.
Prabhu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-...@lists.ozlabs.org --- arch/powerpc/include/asm/opal.h| 3 + arch/powerpc/include/asm/paca.h| 2 + arch/powerpc/include/asm/ppc-opcode.h | 2 + arch/powerpc/include/asm/processor.h | 1 + arch/powerpc/include/asm/reg.h | 2 + arch/powerpc/kernel/asm-offsets.c | 2 + arch/powerpc/kernel/exceptions-64s.S | 11 +- arch/powerpc/kernel/idle_power7.S | 141 +++-- arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + arch/powerpc/platforms/powernv/setup.c | 73 + arch/powerpc/platforms/powernv/smp.c | 4 +- arch/powerpc/platforms/powernv/subcore.c | 34 ++ arch/powerpc/platforms/powernv/subcore.h | 1 + 13 files changed, 266 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index bef7fbc..f0ca2d9 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -153,6 +153,7 @@ struct opal_sg_list { #define OPAL_PCI_EEH_FREEZE_SET97 #define OPAL_HANDLE_HMI98 #define OPAL_CONFIG_CPU_IDLE_STATE 99 +#define OPAL_SLW_SET_REG 100 #define OPAL_REGISTER_DUMP_REGION 101 #define OPAL_UNREGISTER_DUMP_REGION102 @@ -163,6 +164,7 @@ struct opal_sg_list { */ #define OPAL_PM_NAP_ENABLED0x0001 #define OPAL_PM_SLEEP_ENABLED 0x0002 +#define OPAL_PM_WINKLE_ENABLED 0x0004 #define OPAL_PM_SLEEP_ENABLED_ER1 0x0008 #ifndef __ASSEMBLY__ @@ -972,6 +974,7 @@ int64_t opal_sensor_read(uint32_t sensor_hndl, int token, __be32 *sensor_data); int64_t opal_handle_hmi(void); int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end); int64_t opal_unregister_dump_region(uint32_t id); +int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val); int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number); /* Internal functions */ diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index e2c4737..c979577 100644 --- a/arch/powerpc/include/asm/paca.h +++ 
b/arch/powerpc/include/asm/paca.h @@ -164,6 +164,8 @@ struct paca_struct { u8 thread_idle_state; /* PNV_THREAD_RUNNING/NAP/SLEEP */ /* Mask to indicate thread id in core */ u8 thread_mask; + /* Mask to denote subcore sibling threads */ + u8 subcore_sibling_mask; #endif #ifdef CONFIG_PPC_BOOK3S_64 /* Exclusive emergency stack pointer for machine check exception. */ diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 6f85362..5155be7 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -194,6 +194,7 @@ #define PPC_INST_NAP 0x4c000364 #define PPC_INST_SLEEP 0x4c0003a4 +#define PPC_INST_WINKLE0x4c0003e4 /* A2 specific instructions */ #define PPC_INST_ERATWE0x7c0001a6 @@ -374,6 +375,7 @@ #define PPC_NAPstringify_in_c(.long PPC_INST_NAP) #define PPC_SLEEP stringify_in_c(.long PPC_INST_SLEEP) +#define PPC_WINKLE stringify_in_c(.long PPC_INST_WINKLE) /* BHRB instructions */ #define PPC_CLRBHRBstringify_in_c(.long PPC_INST_CLRBHRB) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index f5c45b3..bf117d8 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -453,6 +453,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0
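How subcore_sibling_mask and core_idle_state together separate "first thread in the core" from "first thread in the subcore" might be sketched as below. This is an assumption-laden model: set bits mean running threads, the helper names are hypothetical, and the example masks assume a 2-way split with threads 0-3 and 4-7 forming the two subcores.

```c
#include <stdbool.h>
#include <stdint.h>

#define PNV_CORE_IDLE_THREAD_BITS 0x0FF  /* value as defined in patch 3/4 */

/* True when no thread of the whole core is running yet. */
bool first_thread_in_core(uint32_t core_idle_state)
{
    return (core_idle_state & PNV_CORE_IDLE_THREAD_BITS) == 0;
}

/* True when none of the waking thread's subcore siblings is running yet. */
bool first_thread_in_subcore(uint32_t core_idle_state,
                             uint8_t subcore_sibling_mask)
{
    return (core_idle_state & subcore_sibling_mask) == 0;
}
```

With threads 2 and 3 already running (state 0x0C), a wakeup on thread 5 is not first in the core but is first in its subcore (mask 0xF0), so it would restore per-subcore state only.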
Re: [PATCH v3 3/4] powernv: cpuidle: Redesign idle states management
Hi Paul,

On Monday 08 December 2014 10:31 AM, Paul Mackerras wrote:
> On Thu, Dec 04, 2014 at 12:58:22PM +0530, Shreyas B. Prabhu wrote:
>> Deep idle states like sleep and winkle are per core idle states. A core
>> enters these states only when all the threads enter either the
>> particular idle state or a deeper one. There are tasks like fastsleep
>> hardware bug workaround and hypervisor core state save which have to be
>> done only by the last thread of the core entering deep idle state and
>> similarly tasks like timebase resync, hypervisor core register restore
>> that have to be done only by the first thread waking up from these
>> states.
>>
>> The current idle state management does not have a way to distinguish the
>> first/last thread of the core waking/entering idle states. Tasks like
>> timebase resync are done for all the threads. This is not only
>> suboptimal, but can cause functionality issues when subcores and kvm are
>> involved.
>>
>> This patch adds the necessary infrastructure to track idle states of
>> threads in a per-core structure. It uses this info to perform tasks like
>> fastsleep workaround and timebase resync only once per core.
>
> Comments below...
>
>> diff --git a/arch/powerpc/include/asm/paca.h
>> b/arch/powerpc/include/asm/paca.h
>> index a5139ea..e4578c3 100644
>> --- a/arch/powerpc/include/asm/paca.h
>> +++ b/arch/powerpc/include/asm/paca.h
>> @@ -158,6 +158,12 @@ struct paca_struct {
>>           * early exception handler for use by high level C handler
>>           */
>>          struct opal_machine_check_event *opal_mc_evt;
>> +
>> +        /* Per-core mask tracking idle threads and a lock bit-[L][] */
>> +        u32 *core_idle_state_ptr;
>> +        u8 thread_idle_state;   /* ~Idle[0]/Nap[1]/Sleep[2]/Winkle[3] */
>
> Might be clearer in the comment to say "/* PNV_THREAD_xxx */" so it's
> clear the value should be one of PNV_THREAD_NAP, PNV_THREAD_SLEEP,
> etc.

Okay.

>
>> diff --git a/arch/powerpc/kernel/idle_power7.S
>> b/arch/powerpc/kernel/idle_power7.S
>> index 283c603..8c3a1f4 100644
>> --- a/arch/powerpc/kernel/idle_power7.S
>> +++ b/arch/powerpc/kernel/idle_power7.S
>> @@ -18,6 +18,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>
>>  #undef DEBUG
>>
>> @@ -37,8 +38,7 @@
>>
>>  /*
>>   * Pass requested state in r3:
>> - *      0 - nap
>> - *      1 - sleep
>> + *      r3 - PNV_THREAD_NAP/SLEEP/WINKLE
>>   *
>>   * To check IRQ_HAPPENED in r4
>>   *      0 - don't check
>> @@ -123,12 +123,58 @@ power7_enter_nap_mode:
>>          li      r4,KVM_HWTHREAD_IN_NAP
>>          stb     r4,HSTATE_HWTHREAD_STATE(r13)
>>  #endif
>> -        cmpwi   cr0,r3,1
>> -        beq     2f
>> +        stb     r3,PACA_THREAD_IDLE_STATE(r13)
>> +        cmpwi   cr1,r3,PNV_THREAD_SLEEP
>> +        bge     cr1,2f
>>          IDLE_STATE_ENTER_SEQ(PPC_NAP)
>>          /* No return */
>> -2:      IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
>> -        /* No return */
>> +2:
>> +        /* Sleep or winkle */
>> +        lbz     r7,PACA_THREAD_MASK(r13)
>> +        ld      r14,PACA_CORE_IDLE_STATE_PTR(r13)
>> +lwarx_loop1:
>> +        lwarx   r15,0,r14
>> +        andc    r15,r15,r7      /* Clear thread bit */
>> +
>> +        andi.   r15,r15,PNV_CORE_IDLE_THREAD_BITS
>> +
>> +/*
>> + * If cr0 = 0, then current thread is the last thread of the core entering
>> + * sleep. Last thread needs to execute the hardware bug workaround code if
>> + * required by the platform.
>> + * Make the workaround call unconditionally here. The below branch call is
>> + * patched out when the idle states are discovered if the platform does not
>> + * require it.
>> + */
>> +.global pnv_fastsleep_workaround_at_entry
>> +pnv_fastsleep_workaround_at_entry:
>> +        beq     fastsleep_workaround_at_entry
>
> Did you investigate using the feature bit mechanism to do this
> patching for you? You would need to allocate a CPU feature bit and
> parse the device tree early on and set or clear the feature bit,
> before the feature fixups are done. The code here would then end up
> looking like:
>
> BEGIN_FTR_SECTION
>         beq     fastsleep_workaround_at_entry
> END_FTR_SECTION_IFSET(CPU_FTR_FASTSLEEP_WORKAROUND)
>

I agree using feature fixup is a much cleaner implementation. The
difficulty is, information on whether fastsleep workaround is needed is
passed in the device tree. do_feature_fixups is currently called before
we unflatten the device tree. Any suggestions for
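The lwarx loop quoted above is a load-reserve/store-conditional update of the per-core word. In portable C11 the same pattern is usually written as a compare-exchange retry loop, sketched here. The function name is hypothetical, the lock bit handling of the real code is omitted, and C11 atomics are a stand-in for lwarx/stwcx. rather than what the kernel uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PNV_CORE_IDLE_THREAD_BITS 0x0FF  /* value as defined in patch 3/4 */

/*
 * Atomically clear this thread's bit in the shared per-core word and
 * report whether the caller was the last running thread of the core.
 */
bool clear_thread_bit(_Atomic uint32_t *core_idle_state, uint8_t thread_mask)
{
    uint32_t old = atomic_load(core_idle_state);
    uint32_t updated;

    do {
        /* andc r15,r15,r7: drop our own bit from the value just read */
        updated = old & ~(uint32_t)thread_mask;
        /* on failure, 'old' is refreshed with the current value and we retry */
    } while (!atomic_compare_exchange_weak(core_idle_state, &old, updated));

    /* andi. r15,r15,PNV_CORE_IDLE_THREAD_BITS ; beq -> last thread */
    return (updated & PNV_CORE_IDLE_THREAD_BITS) == 0;
}
```

Only the thread whose update leaves no thread bits set takes the "last thread" path, which is where the fastsleep workaround branch sits in the quoted assembly.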
Re: [PATCH v2 1/2] powerpc: Add helpers for LPCR PECE1 operations
On Friday 23 January 2015 08:36 AM, Michael Ellerman wrote:
> On Mon, 2015-01-19 at 13:35 +0530, Shreyas B. Prabhu wrote:
>> PECE1 bit in LPCR is used to control whether decrementer can cause exit
>> from powersaving states. PECE1 bit is cleared before entering fastsleep
>> or deeper powersaving state and it is set on waking up. Since both
>> cpuidle and cpu offline operations use these powersaving states, add
>> helper functions to be used in both these places.
>
> Thanks.
>
> That isn't really much clearer than the original, so in the end I just merged
> your original fix.
>
> I'll think if there's a bigger consolidation we can do that makes it clearer.
>
> cheers
>

The helper could have been this:

#define LPCR_CLEAR_PECE1 (mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1)

This would perhaps make it clearer, but it would end up using an
additional mfspr here:

static int fastsleep_loop(struct cpuidle_device *dev,
				struct cpuidle_driver *drv,
				int index)
{
	...
	new_lpcr = old_lpcr;
	/* Do not exit powersave upon decrementer as we've setup the timer
	 * offload.
	 */
	new_lpcr &= ~LPCR_PECE1;
	mtspr(SPRN_LPCR, new_lpcr);
[PATCH] powerpc: powernv: winkle: Restore LPCR with LPCR_PECE1 cleared
LPCR_PECE1 bit controls whether decrementer interrupts are allowed to cause
exit from power-saving mode. While waking up from winkle, restoring LPCR
with LPCR_PECE1 set (i.e. decrementer interrupts allowed) can cause an issue
in the following scenario:

- All the threads in a core are offlined. The core enters deep winkle.
- A spurious interrupt wakes up a thread in the core. Here LPCR is restored
  with LPCR_PECE1 bit set.
- Since it was a spurious interrupt on an offline thread, the thread clears
  the interrupt and goes back to winkle.
- Here, before the thread executes winkle and puts the core into deep
  winkle, if a decrementer interrupt occurs on any of the sibling threads in
  the core, that thread wakes up.
- Since in the offline loop we flush interrupts only in case of an external
  interrupt, the decrementer interrupt does not get flushed.

So at this stage the thread is stuck in a loop of waking up at 0x100 due to
the decrementer interrupt, not flushing the interrupt as only external
interrupts get flushed, entering winkle, and waking up at 0x100 again.

Fix this by programming PORE to restore LPCR with LPCR_PECE1 bit cleared
when waking up from winkle.

Signed-off-by: Shreyas B. Prabhu
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: linuxppc-...@lists.ozlabs.org
---
This issue is separate from the issue which Alexey has reported. Fix for
that is still pending.

 arch/powerpc/platforms/powernv/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index ad0e32e..83067b1 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -298,7 +298,7 @@ int pnv_save_sprs_for_winkle(void)
 	 * all cpus at boot. Get these reg values of current cpu and use the
 	 * same accross all cpus.
 	 */
-	uint64_t lpcr_val = mfspr(SPRN_LPCR);
+	uint64_t lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1;
 	uint64_t hid0_val = mfspr(SPRN_HID0);
 	uint64_t hid1_val = mfspr(SPRN_HID1);
 	uint64_t hid4_val = mfspr(SPRN_HID4);
-- 
1.9.3
[PATCH v2 2/2] powerpc: powernv: winkle: Restore LPCR with LPCR_PECE1 cleared
LPCR_PECE1 bit controls whether decrementer interrupts are allowed to cause
exit from power-saving mode. While waking up from winkle, restoring LPCR
with LPCR_PECE1 set (i.e. decrementer interrupts allowed) can cause an issue
in the following scenario:

- All the threads in a core are offlined. The core enters deep winkle.
- A spurious interrupt wakes up a thread in the core. Here LPCR is restored
  with LPCR_PECE1 bit set.
- Since it was a spurious interrupt on an offline thread, the thread clears
  the interrupt and goes back to winkle.
- Here, before the thread executes winkle and puts the core into deep
  winkle, if a decrementer interrupt occurs on any of the sibling threads in
  the core, that thread wakes up.
- Since in the offline loop we flush interrupts only in case of an external
  interrupt, the decrementer interrupt does not get flushed.

So at this stage the thread is stuck in a loop of waking up at 0x100 due to
the decrementer interrupt, not flushing the interrupt as only external
interrupts get flushed, entering winkle, and waking up at 0x100 again.

Fix this by programming PORE to restore LPCR with LPCR_PECE1 bit cleared
when waking up from winkle.

Signed-off-by: Shreyas B. Prabhu
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: linuxppc-...@lists.ozlabs.org
---
Changes in v2:
- Using the helper function introduced in the previous patch.

 arch/powerpc/platforms/powernv/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index ad0e32e..ded7fc8 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -298,7 +298,7 @@ int pnv_save_sprs_for_winkle(void)
 	 * all cpus at boot. Get these reg values of current cpu and use the
 	 * same accross all cpus.
 	 */
-	uint64_t lpcr_val = mfspr(SPRN_LPCR);
+	uint64_t lpcr_val = LPCR_CLEAR_PECE1(mfspr(SPRN_LPCR));
 	uint64_t hid0_val = mfspr(SPRN_HID0);
 	uint64_t hid1_val = mfspr(SPRN_HID1);
 	uint64_t hid4_val = mfspr(SPRN_HID4);
-- 
1.9.3
[PATCH v2 1/2] powerpc: Add helpers for LPCR PECE1 operations
PECE1 bit in LPCR is used to control whether decrementer can cause exit
from powersaving states. PECE1 bit is cleared before entering fastsleep
or deeper powersaving state and it is set on waking up. Since both
cpuidle and cpu offline operations use these powersaving states, add
helper functions to be used in both these places.

Signed-off-by: Shreyas B. Prabhu
Cc: Michael Ellerman
Cc: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/include/asm/reg.h       | 4
 arch/powerpc/platforms/powernv/smp.c | 4 ++--
 drivers/cpuidle/cpuidle-powernv.c    | 3 +--
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index c870e38..0847303 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -339,6 +339,10 @@
 #define LPCR_LPES_SH 2
 #define LPCR_RMI 0x0002 /* real mode is cache inhibit */
 #define LPCR_HDICE 0x0001 /* Hyp Decr enable (HV,PR,EE) */
+/* LPCR PECE1 helpers. Used to disable/enable wake up due to decrementer */
+#define LPCR_CLEAR_PECE1(old)	(old & ~(u64)LPCR_PECE1)
+#define LPCR_SET_PECE1(old)	(old | (u64)LPCR_PECE1)
+
 #ifndef SPRN_LPID
 #define SPRN_LPID 0x13F /* Logical Partition Identifier */
 #endif
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 781ec45..ab61cb0 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -165,7 +165,7 @@ static void pnv_smp_cpu_kill_self(void)
 	/* We don't want to take decrementer interrupts while we are offline,
 	 * so clear LPCR:PECE1. We keep PECE2 enabled.
 	 */
-	mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1);
+	mtspr(SPRN_LPCR, LPCR_CLEAR_PECE1(mfspr(SPRN_LPCR)));
 	while (!generic_check_cpu_restart(cpu)) {
 		ppc64_runlatch_off();
@@ -203,7 +203,7 @@ static void pnv_smp_cpu_kill_self(void)
 		if (!generic_check_cpu_restart(cpu))
 			DBG("CPU%d Unexpected exit while offline !\n", cpu);
 	}
-	mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_PECE1);
+	mtspr(SPRN_LPCR, LPCR_SET_PECE1(mfspr(SPRN_LPCR)));
 	DBG("CPU%d coming online...\n", cpu);
 }
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index de61b9a..ed0be4c 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -69,11 +69,10 @@ static int fastsleep_loop(struct cpuidle_device *dev,
 	if (unlikely(system_state < SYSTEM_RUNNING))
 		return index;
-	new_lpcr = old_lpcr;
 	/* Do not exit powersave upon decrementer as we've setup the timer
 	 * offload.
 	 */
-	new_lpcr &= ~LPCR_PECE1;
+	new_lpcr = LPCR_CLEAR_PECE1(old_lpcr);
 	mtspr(SPRN_LPCR, new_lpcr);
 	power7_sleep();
-- 
1.9.3
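The two helpers in this patch are plain bit operations on an LPCR value, and a function-like macro that does not parenthesize its argument can give surprising results when it is handed a compound expression. The following is a minimal user-space sketch of the same clear/set operations written as static inline functions. The function names and the bit value 0x2000 are assumptions made for illustration only; they are not taken from the patch, and the real definition of LPCR_PECE1 lives in asm/reg.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit value, for illustration only; see asm/reg.h for the real one. */
#define EXAMPLE_LPCR_PECE1 UINT64_C(0x2000)

/* Clear PECE1: the decrementer no longer causes exit from power-saving mode. */
static inline uint64_t lpcr_clear_pece1(uint64_t lpcr)
{
	return lpcr & ~EXAMPLE_LPCR_PECE1;
}

/* Set PECE1: the decrementer may wake the thread again. */
static inline uint64_t lpcr_set_pece1(uint64_t lpcr)
{
	return lpcr | EXAMPLE_LPCR_PECE1;
}

/* Small self-check showing that set followed by clear leaves other bits alone. */
void lpcr_helpers_selfcheck(void)
{
	uint64_t lpcr = UINT64_C(0x5000);	/* hypothetical starting value */

	assert(lpcr_set_pece1(lpcr) == UINT64_C(0x7000));
	assert(lpcr_clear_pece1(lpcr_set_pece1(lpcr)) == lpcr);
}
```

Because the argument is an ordinary function parameter, a call such as lpcr_clear_pece1(a | b) clears the bit in the combined value, which an unparenthesized macro body would not guarantee.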
[PATCH v3 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior
Fastsleep is one of the idle states which the cpuidle subsystem currently
uses on power8 machines. In this state, L2 cache is brought down to a
threshold voltage. Therefore when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug in
the current power8 chips surrounding this fencing.

OPAL provides a workaround which precludes the possibility of hitting this
bug. But running with this workaround applied causes checkstop if any
correctable error in L2 cache directory is detected. Hence OPAL also
provides a way to undo the workaround.

In the existing implementation, the workaround is applied by the last thread
of the core entering fastsleep and undone by the first thread waking up.
But this has a performance cost. These OPAL calls account for roughly
4000 cycles every time the core has to enter or wake up from fastsleep.

This patch introduces a sysfs attribute (fastsleep_workaround_state) to
choose the behavior of this workaround.

By default, fastsleep_workaround_state = 0. In this case, the workaround is
applied/undone every time the core enters/exits fastsleep.

fastsleep_workaround_state = 1. In this case the workaround is applied once
on all the cores and never undone. This can be triggered by
echo 1 > /sys/devices/system/cpu/fastsleep_workaround_state

For simplicity this attribute can be modified only once. That is, once
fastsleep_workaround_state is changed to 1, it cannot be reverted to the
default state.

Signed-off-by: Shreyas B.
Prabhu --- Changes in V3- Kernel parameter changed to sysfs attribute Modified commmit message arch/powerpc/include/asm/opal.h| 8 +++ arch/powerpc/platforms/powernv/idle.c | 83 +- arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + 3 files changed, 91 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 9ee0a30..8bea8fc 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -180,6 +180,13 @@ struct opal_sg_list { #define OPAL_PM_WINKLE_ENABLED 0x0004 #define OPAL_PM_SLEEP_ENABLED_ER1 0x0008 +/* + * OPAL_CONFIG_CPU_IDLE_STATE parameters + */ +#define OPAL_CONFIG_IDLE_FASTSLEEP 1 +#define OPAL_CONFIG_IDLE_UNDO 0 +#define OPAL_CONFIG_IDLE_APPLY 1 + #ifndef __ASSEMBLY__ #include @@ -924,6 +931,7 @@ int64_t opal_handle_hmi(void); int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end); int64_t opal_unregister_dump_region(uint32_t id); int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val); +int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag); int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number); int64_t opal_ipmi_send(uint64_t interface, struct opal_ipmi_msg *msg, uint64_t msg_len); diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c index 77992f6..79157b9 100644 --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -13,6 +13,8 @@ #include #include #include +#include +#include #include #include @@ -136,6 +138,77 @@ u32 pnv_get_supported_cpuidle_states(void) } EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states); +static void pnv_fastsleep_workaround_apply(void *info) +{ + opal_config_cpu_idle_state(OPAL_CONFIG_IDLE_FASTSLEEP, + OPAL_CONFIG_IDLE_APPLY); +} + +/* + * Used to store fastsleep workaround state + * 0 - Workaround applied/undone at fastsleep entry/exit path (Default) + * 1 - Workaround applied once, never 
undone. + */ +static u8 fastsleep_workaround_state; + +static ssize_t show_fastsleep_workaround_state(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%u\n", fastsleep_workaround_state); +} + +static ssize_t store_fastsleep_workaround_state(struct device *dev, + struct device_attribute *attr, const char *buf, + size_t count) +{ + u32 val; + cpumask_t primary_thread_mask; + + /* +* fastsleep_workaround_state is write-once parameter. +* Once it has been set to 1, it cannot be undone. +*/ + if (fastsleep_workaround_state == 1) + return -EINVAL; + + if (kstrtou32(buf, 0, &val)) + return -EINVAL; + + if (val > 1) + return -EINVAL; + + fastsleep_workaround_state = 1; + /* +* fastsleep_workaround_state = 1 implies fastsleep workaround needs to +* be left in 'applied' state on all the cores. Do this by- +* 1. Patching out the call to 'undo' workaround in fastsleep exit path +* 2. Sending ipi to all the cores which have atleast one online thread +* 3. Patching out the call to 'apply' workaround in fastsleep entry +
[PATCH v3 1/3] powerpc: Fix cpu_online_cores_map to return only online threads mask
Currently, cpu_online_cores_map returns a mask which, for every core that
has at least one online thread, has the first-cpu-of-that-core's bit set.
But the first cpu itself may not always be online. In such cases, if the
returned mask is used for IPI, it will cause IPIs to be skipped on cores
where the first thread is offline.

Fix this by setting first-online-cpu-of-the-core's bit in the mask.
This is done by fixing this in the underlying function
cpu_thread_mask_to_cores.

Signed-off-by: Shreyas B. Prabhu
---
This patch is new in v3

In an example scenario where all the threads of 1st core are offline
and argument to cpu_thread_mask_to_cores is cpu_possible_mask,
with this implementation, return value will not have any bit
corresponding to 1st core set. I think that should be okay. Any thoughts?

 arch/powerpc/include/asm/cputhreads.h | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index 2bf8e93..9e8485c 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -31,9 +31,9 @@ extern cpumask_t threads_core_mask;
 /* cpu_thread_mask_to_cores - Return a cpumask of one per cores
  *                            hit by the argument
  *
- * @threads: a cpumask of threads
+ * @threads: a cpumask of online threads
  *
- * This function returns a cpumask which will have one "cpu" (or thread)
+ * This function returns a cpumask which will have one online cpu's
  * bit set for each core that has at least one thread set in the argument.
  *
  * This can typically be used for things like IPI for tlb invalidations
@@ -42,13 +42,16 @@ extern cpumask_t threads_core_mask;
 static inline cpumask_t cpu_thread_mask_to_cores(const struct cpumask *threads)
 {
 	cpumask_t tmp, res;
-	int i;
+	int i, cpu;
 	cpumask_clear(&res);
 	for (i = 0; i < NR_CPUS; i += threads_per_core) {
 		cpumask_shift_left(&tmp, &threads_core_mask, i);
-		if (cpumask_intersects(threads, &tmp))
-			cpumask_set_cpu(i, &res);
+		if (cpumask_intersects(threads, &tmp)) {
+			cpu = cpumask_next_and(-1, &tmp, cpu_online_mask);
+			if (cpu < nr_cpu_ids)
+				cpumask_set_cpu(cpu, &res);
+		}
 	}
 	return res;
 }
-- 
1.9.3
[PATCH v3 2/3] powerpc/powernv: Move cpuidle related code from setup.c to new file
This is a cleanup patch; doesn't change any functionality. Moves all cpuidle related code from setup.c to a new file. Signed-off-by: Shreyas B. Prabhu --- This patch is new in v3 arch/powerpc/platforms/powernv/Makefile | 2 +- arch/powerpc/platforms/powernv/idle.c | 186 arch/powerpc/platforms/powernv/setup.c | 166 3 files changed, 187 insertions(+), 167 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/idle.c diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index 6f3c5d3..560ee54 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -1,4 +1,4 @@ -obj-y += setup.o opal-wrappers.o opal.o opal-async.o +obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o obj-y += opal-msglog.o opal-hmi.o opal-power.o diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c new file mode 100644 index 000..77992f6 --- /dev/null +++ b/arch/powerpc/platforms/powernv/idle.c @@ -0,0 +1,186 @@ +/* + * PowerNV cpuidle code + * + * Copyright 2015 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "powernv.h" +#include "subcore.h" + +static u32 supported_cpuidle_states; + +int pnv_save_sprs_for_winkle(void) +{ + int cpu; + int rc; + + /* +* hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric accross +* all cpus at boot. Get these reg values of current cpu and use the +* same accross all cpus. 
+*/ + uint64_t lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1; + uint64_t hid0_val = mfspr(SPRN_HID0); + uint64_t hid1_val = mfspr(SPRN_HID1); + uint64_t hid4_val = mfspr(SPRN_HID4); + uint64_t hid5_val = mfspr(SPRN_HID5); + uint64_t hmeer_val = mfspr(SPRN_HMEER); + + for_each_possible_cpu(cpu) { + uint64_t pir = get_hard_smp_processor_id(cpu); + uint64_t hsprg0_val = (uint64_t)&paca[cpu]; + + /* +* HSPRG0 is used to store the cpu's pointer to paca. Hence last +* 3 bits are guaranteed to be 0. Program slw to restore HSPRG0 +* with 63rd bit set, so that when a thread wakes up at 0x100 we +* can use this bit to distinguish between fastsleep and +* deep winkle. +*/ + hsprg0_val |= 1; + + rc = opal_slw_set_reg(pir, SPRN_HSPRG0, hsprg0_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val); + if (rc != 0) + return rc; + + /* HIDs are per core registers */ + if (cpu_thread_in_core(cpu) == 0) { + + rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val); + if (rc != 0) + return rc; + } + } + + return 0; +} + +static void pnv_alloc_idle_core_states(void) +{ + int i, j; + int nr_cores = cpu_nr_cores(); + u32 *core_idle_state; + + /* +* core_idle_state - First 8 bits track the idle state of each thread +* of the core. The 8th bit is the lock bit. Initially all thread bits +* are set. They are cleared when the thread enters deep idle state +* like sleep and winkle. Initially the lock bit is cleared. +* The lock bit has 2 purposes +* a. While the first thread is restoring core state, it prevents +* other threads in the core from switching to process context. +* b. While the last thread in the core is saving the core state, it +* prev
Re: [PATCH] kvm: powerpc: Fix ppc64_defconfig + PPC_POWERNV=n build error
Any suggestions on this?

On Thursday 16 April 2015 04:28 PM, Shreyas B. Prabhu wrote:
> kvm_no_guest function calls power7_wakeup_loss to put the thread into
> the deepest supported idle state. power7_wakeup_loss is defined in
> arch/powerpc/kernel/idle_power7.S, which is compiled only when PPC_P7_NAP=y.
> And PPC_P7_NAP is selected when PPC_POWERNV=y.
> Hence in cases where PPC_POWERNV=n and KVM_BOOK3S_64_HV=y we see the
> following error:
>
> arch/powerpc/kvm/built-in.o: In function `kvm_no_guest':
> arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x42c): undefined reference to `power7_wakeup_loss'
>
> Fix this by adding PPC_POWERNV as a dependency for KVM_BOOK3S_64_HV.
>
> Signed-off-by: Shreyas B. Prabhu
> ---
>  arch/powerpc/kvm/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index 11850f3..b3b3d9f 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -75,7 +75,7 @@ config KVM_BOOK3S_64
>
>  config KVM_BOOK3S_64_HV
>  	tristate "KVM support for POWER7 and PPC970 using hypervisor mode in host"
> -	depends on KVM_BOOK3S_64
> +	depends on KVM_BOOK3S_64 && PPC_POWERNV
>  	select KVM_BOOK3S_HV_POSSIBLE
>  	select MMU_NOTIFIER
>  	select CMA
>
Re: [PATCH v4 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior
>>
>> By default, fastsleep_workaround_state = dynamic. In this case, workaround
>> is applied/undone everytime the core enters/exits fastsleep.
>>
>> fastsleep_workaround_state = applyonce. In this case the workaround is
>> applied once on all the cores and never undone. This can be triggered by
>> echo applyonce > /sys/devices/system/cpu/fastsleep_workaround_state
>
> I was wondering if we really need such an elaborate design for this
> sysfs file. Why not a sysfs file called fastsleep_workaround_apply_once,
> which is set to '0' by default and the only value that it can take is
> '1' ? The name easily implies that the workaround is applied only once
> if it is set. I can see that this can cut down a good chunk of code from
> this patch. I just didn't find too much value in having so much code for
> a simple 'on' knob.

I was considering something similar too, but moved to this format because I
thought it was unambiguous. Also, moving to a binary attribute will reduce
code only in show_fastsleep_workaround_state, which I don't feel is much.

That said, if you feel strongly about it, I can change it to the format you
suggested.

Thanks,
Shreyas
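The write-once, string-valued attribute discussed above can be modeled in user space roughly as follows. This is only a sketch under stated assumptions: the enum, the variable, and the function name are hypothetical, accepting "dynamic" while still in the default state is a guess, and the real store handler would also patch instructions and send IPIs where the comment indicates.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

enum workaround_state { WORKAROUND_DYNAMIC, WORKAROUND_APPLYONCE };

/* Hypothetical stand-in for the attribute's backing variable. */
static enum workaround_state fastsleep_state = WORKAROUND_DYNAMIC;

/*
 * Model of a write-once store handler. Returns the number of bytes consumed
 * on success or -EINVAL on rejected input, following the sysfs convention.
 */
static long store_state(const char *buf, size_t count)
{
	size_t len = count;

	assert(buf != NULL);

	/* echo appends a newline; ignore one trailing '\n'. */
	if (len > 0 && buf[len - 1] == '\n')
		len--;

	/* Once applyonce has been set, no further change is accepted. */
	if (fastsleep_state == WORKAROUND_APPLYONCE)
		return -EINVAL;

	/* Assumed: writing the default value while in the default state is a no-op. */
	if (len == strlen("dynamic") && !strncmp(buf, "dynamic", len))
		return (long)count;

	if (len == strlen("applyonce") && !strncmp(buf, "applyonce", len)) {
		/* The real handler would patch the entry/exit paths and IPI cores here. */
		fastsleep_state = WORKAROUND_APPLYONCE;
		return (long)count;
	}

	return -EINVAL;
}
```

The only part of this parsing that a binary 0/1 attribute would remove is the string comparison, which matches the point made in the reply that the savings are small.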
Re: [PATCH v4 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior
>
> A point that bothers me here is if we can potentially race with cpu
> hotplug ? If cpuX and its siblings are offline and it was interrupted to
> come online:
>
> cpuX                               cpuY
> Interrupted to come online
> Undo workaround
>
>                                    Nop the fastsleep_workaround_exit path
>                                    IPI online cores: apply workaround once
>
> Set yourself in the online mask
>                                    Nop the fastsleep_workaround_entry path
>
>
> This results in cpuX undoing the workaround on its core, never to set it
> back again.
>
> So should we protect the region between the beginning and end of
> patching instructions with get_online_cpus() and put_online_cpus() ?
>

Nice catch. I had missed this. Sending out a patch correcting this.

Thanks,
Shreyas
[PATCH v5 0/3] powerpc: powernv: Fastsleep workaround behavior
Fastsleep is one of the idle states which the cpuidle subsystem currently
uses on power8 machines. In this state, L2 cache is brought down to a
threshold voltage. Therefore when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug in
the current power8 chips surrounding this fencing.

OPAL provides a workaround which precludes the possibility of hitting this
bug. But running with this workaround applied causes checkstop if any
correctable error in L2 cache directory is detected. Hence OPAL also
provides a way to undo the workaround.

In the existing implementation, the workaround is applied by the last thread
of the core entering fastsleep and undone by the first thread waking up.
But this has a performance cost. These OPAL calls account for roughly
4000 cycles every time the core has to enter or wake up from fastsleep.

This patchset introduces a sysfs attribute (fastsleep_workaround_state)
to choose the behavior of this workaround.

Patch 1/3 fixes cpu_online_cores_map which is used by Patch 3/3.
Patch 2/3 is a clean up patch. It moves all cpuidle related code into a
new file.
Patch 3/3 introduces the sysfs attribute to control fastsleep workaround
behavior.

Changes in v5:
- Fix potential race with hotplug using get_online_cpus/put_online_cpus

Changes in v4:
- Handling patch_instruction and OPAL call errors
- Sysfs attribute takes string ("dynamic" vs "applyonce") as input
- Improved changelogs

Changes in v3:
- Kernel parameter changed to sysfs attribute

Changes in v2:
- Changed commit message to accurately describe the downside of running
  with the workaround always applied

Shreyas B. Prabhu (3):
  powerpc: Fix cpu_online_cores_map to return only online threads mask
  powerpc/powernv: Move cpuidle related code from setup.c to new file
  powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior

 arch/powerpc/include/asm/cputhreads.h          | 13 +-
 arch/powerpc/include/asm/opal-api.h            | 7 +
 arch/powerpc/include/asm/opal.h                | 1 +
 arch/powerpc/platforms/powernv/Makefile        | 2 +-
 arch/powerpc/platforms/powernv/idle.c          | 323 +
 arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
 arch/powerpc/platforms/powernv/setup.c         | 171 -
 7 files changed, 341 insertions(+), 177 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/idle.c

-- 
1.9.3
[PATCH v5 2/3] powerpc/powernv: Move cpuidle related code from setup.c to new file
This is a cleanup patch; doesn't change any functionality. Moves all cpuidle related code from setup.c to a new file. Signed-off-by: Shreyas B. Prabhu Reviewed-by: Preeti U Murthy --- arch/powerpc/platforms/powernv/Makefile | 2 +- arch/powerpc/platforms/powernv/idle.c | 191 arch/powerpc/platforms/powernv/setup.c | 171 3 files changed, 192 insertions(+), 172 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/idle.c diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index 33e44f3..bee9235 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -1,4 +1,4 @@ -obj-y += setup.o opal-wrappers.o opal.o opal-async.o +obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o obj-y += opal-msglog.o opal-hmi.o opal-power.o diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c new file mode 100644 index 000..104235a --- /dev/null +++ b/arch/powerpc/platforms/powernv/idle.c @@ -0,0 +1,191 @@ +/* + * PowerNV cpuidle code + * + * Copyright 2015 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "powernv.h" +#include "subcore.h" + +static u32 supported_cpuidle_states; + +int pnv_save_sprs_for_winkle(void) +{ + int cpu; + int rc; + + /* +* hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric accross +* all cpus at boot. Get these reg values of current cpu and use the +* same accross all cpus. 
+*/ + uint64_t lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1; + uint64_t hid0_val = mfspr(SPRN_HID0); + uint64_t hid1_val = mfspr(SPRN_HID1); + uint64_t hid4_val = mfspr(SPRN_HID4); + uint64_t hid5_val = mfspr(SPRN_HID5); + uint64_t hmeer_val = mfspr(SPRN_HMEER); + + for_each_possible_cpu(cpu) { + uint64_t pir = get_hard_smp_processor_id(cpu); + uint64_t hsprg0_val = (uint64_t)&paca[cpu]; + + /* +* HSPRG0 is used to store the cpu's pointer to paca. Hence last +* 3 bits are guaranteed to be 0. Program slw to restore HSPRG0 +* with 63rd bit set, so that when a thread wakes up at 0x100 we +* can use this bit to distinguish between fastsleep and +* deep winkle. +*/ + hsprg0_val |= 1; + + rc = opal_slw_set_reg(pir, SPRN_HSPRG0, hsprg0_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val); + if (rc != 0) + return rc; + + /* HIDs are per core registers */ + if (cpu_thread_in_core(cpu) == 0) { + + rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val); + if (rc != 0) + return rc; + } + } + + return 0; +} + +static void pnv_alloc_idle_core_states(void) +{ + int i, j; + int nr_cores = cpu_nr_cores(); + u32 *core_idle_state; + + /* +* core_idle_state - First 8 bits track the idle state of each thread +* of the core. The 8th bit is the lock bit. Initially all thread bits +* are set. They are cleared when the thread enters deep idle state +* like sleep and winkle. Initially the lock bit is cleared. +* The lock bit has 2 purposes +* a. While the first thread is restoring core state, it prevents +* other threads in the core from switching to process context. +* b. While the last thread in the core is saving the core state, it +* prev
[PATCH v5 1/3] powerpc: Fix cpu_online_cores_map to return only online threads mask
Currently, cpu_online_cores_map returns a mask, which for every core
with at least one online thread, has the bit for thread 0 of the core
set to 1, and the bits for all other threads of the core set to 0.

But thread 0 of the core itself may not be online always. In such cases,
if the returned mask is used for IPI, then it'll cause IPIs to be
skipped on cores where the first thread is offline, because the IPI code
refuses to send IPIs to offline threads.

Fix this by setting the bit of the first online thread in the core.
This is done by fixing this in the underlying function
cpu_thread_mask_to_cores.

The result has the property that for all cores with online threads,
there is one bit set in the returned map. And further, all bits that are
set in the returned map correspond to online threads.

Signed-off-by: Shreyas B. Prabhu
Reviewed-by: Preeti U Murthy
[ Changelog from Michael Ellerman ]
Reviewed-by: Gautham R. Shenoy
---
 arch/powerpc/include/asm/cputhreads.h | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index 4c8ad59..1076d3f 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -31,9 +31,9 @@ extern cpumask_t threads_core_mask;
 /* cpu_thread_mask_to_cores - Return a cpumask of one per cores
  *                            hit by the argument
  *
- * @threads:	a cpumask of threads
+ * @threads:	a cpumask of online threads
  *
- * This function returns a cpumask which will have one "cpu" (or thread)
+ * This function returns a cpumask which will have one online cpu's
  * bit set for each core that has at least one thread set in the argument.
  *
  * This can typically be used for things like IPI for tlb invalidations
@@ -42,13 +42,16 @@ extern cpumask_t threads_core_mask;
 static inline cpumask_t cpu_thread_mask_to_cores(const struct cpumask *threads)
 {
 	cpumask_t	tmp, res;
-	int		i;
+	int		i, cpu;
 
 	cpumask_clear(&res);
 	for (i = 0; i < NR_CPUS; i += threads_per_core) {
 		cpumask_shift_left(&tmp, &threads_core_mask, i);
-		if (cpumask_intersects(threads, &tmp))
-			cpumask_set_cpu(i, &res);
+		if (cpumask_intersects(threads, &tmp)) {
+			cpu = cpumask_next_and(-1, &tmp, cpu_online_mask);
+			if (cpu < nr_cpu_ids)
+				cpumask_set_cpu(cpu, &res);
+		}
 	}
 	return res;
 }
-- 
1.9.3
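For readers following the semantics change, here is a minimal user-space sketch of the selection logic, assuming a single 64-bit word stands in for a cpumask. The names model_thread_mask_to_cores and MODEL_NR_CPUS are hypothetical and are not kernel APIs; the patch itself works on real cpumasks with cpumask_next_and().

```c
#include <stdint.h>

/* Hypothetical model: one bit per hardware thread, at most 64 threads. */
#define MODEL_NR_CPUS 64

/*
 * Return a mask with one bit set per core that has at least one thread
 * set in 'threads'. The bit chosen is the first *online* thread of that
 * core, so a core whose threads are all offline contributes no bit.
 */
static uint64_t model_thread_mask_to_cores(uint64_t threads, uint64_t online,
					   unsigned int threads_per_core)
{
	uint64_t res = 0;
	unsigned int i;

	if (threads_per_core == 0 || threads_per_core > MODEL_NR_CPUS)
		return 0;

	for (i = 0; i < MODEL_NR_CPUS; i += threads_per_core) {
		/* Mask covering every thread of the core that starts at bit i. */
		uint64_t core = (threads_per_core == 64) ? ~0ULL
				: ((1ULL << threads_per_core) - 1) << i;
		uint64_t core_online = core & online;

		if ((threads & core) && core_online) {
			/* Isolate the lowest set bit: the first online thread. */
			res |= core_online & (~core_online + 1);
		}
	}
	return res;
}
```

With 8 threads per core, thread 0 offline and threads 1-15 online, model_thread_mask_to_cores(0xFFFE, 0xFFFE, 8) selects bits 1 and 8 (0x102). Always picking thread 0 of each core would have selected the offline bit 0 instead.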
[PATCH v5 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior
Fastsleep is one of the idle states which the cpuidle subsystem currently
uses on power8 machines. In this state L2 cache is brought down to a
threshold voltage. Therefore when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current power8 chips surrounding this fencing.

OPAL provides a workaround which precludes the possibility of hitting
this bug. But running with this workaround applied causes checkstop
if any correctable error in L2 cache directory is detected. Hence OPAL
also provides a way to undo the workaround.

In the existing implementation, the workaround is applied by the last
thread of the core entering fastsleep and undone by the first thread
waking up. But this has a performance cost. These OPAL calls account for
roughly 4000 cycles every time the core has to enter or wake up from
fastsleep.

This patch introduces a sysfs attribute (fastsleep_workaround_state)
to choose the behavior of this workaround.

By default, fastsleep_workaround_state = dynamic. In this case, the
workaround is applied/undone every time the core enters/exits fastsleep.

fastsleep_workaround_state = applyonce. In this case the workaround is
applied once on all the cores and never undone. This can be triggered by

	echo applyonce > /sys/devices/system/cpu/fastsleep_workaround_state

For simplicity this attribute can be modified only once. That is, once
fastsleep_workaround_state is changed to applyonce, it cannot be reverted
to the default state.

Signed-off-by: Shreyas B.
Prabhu --- arch/powerpc/include/asm/opal-api.h| 7 ++ arch/powerpc/include/asm/opal.h| 1 + arch/powerpc/platforms/powernv/idle.c | 134 + arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + 4 files changed, 143 insertions(+) diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index 0321a90..a49e5fa 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -165,6 +165,13 @@ #define OPAL_PM_WINKLE_ENABLED 0x0004 #define OPAL_PM_SLEEP_ENABLED_ER1 0x0008 /* with workaround */ +/* + * OPAL_CONFIG_CPU_IDLE_STATE parameters + */ +#define OPAL_CONFIG_IDLE_FASTSLEEP 1 +#define OPAL_CONFIG_IDLE_UNDO 0 +#define OPAL_CONFIG_IDLE_APPLY 1 + #ifndef __ASSEMBLY__ /* Other enums */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 042af1a..9a47813 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -186,6 +186,7 @@ int64_t opal_handle_hmi(void); int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end); int64_t opal_unregister_dump_region(uint32_t id); int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val); +int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag); int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number); int64_t opal_ipmi_send(uint64_t interface, struct opal_ipmi_msg *msg, uint64_t msg_len); diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c index 104235a..eac7211 100644 --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -13,6 +13,8 @@ #include #include #include +#include +#include #include #include @@ -136,6 +138,129 @@ u32 pnv_get_supported_cpuidle_states(void) } EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states); + +static void pnv_fastsleep_workaround_apply(void *info) + +{ + int rc; + int *err = info; + + rc = opal_config_cpu_idle_state(OPAL_CONFIG_IDLE_FASTSLEEP, + 
OPAL_CONFIG_IDLE_APPLY); + if (rc) + *err = 1; +} + +/* + * Used to store fastsleep workaround state + * 0 - Workaround applied/undone at fastsleep entry/exit path (Default) + * 1 - Workaround applied once, never undone. + */ +static u8 fastsleep_workaround_state; + +static const char * const fastsleep_workaround_avail_states[] = { + "dynamic", "applyonce" +}; + +/* + * fastsleep_workaround_avail_states values + */ +enum { + WORKAROUND_DYNAMIC, + WORKAROUND_APPLYONCE +}; +static ssize_t show_fastsleep_workaround_state(struct device *dev, + struct device_attribute *attr, char *buf) +{ + char *s = buf; + + if (fastsleep_workaround_state == 0) { + s += sprintf(s, "[%s] ", + fastsleep_workaround_avail_states[WORKAROUND_DYNAMIC]); + s += sprintf(s, "%s\n", + fastsleep_workaround_avail_states[WORKAROUND_APPLYONCE]); + } else { + s += sprintf(s, "%s ", + fastsleep_workaround_avail_states[WORKAROUND_DYNAMIC]); + s += sprintf(s, "[%s]\n", + fastsleep_workaround_avail_s
[PATCH] kvm: powerpc: Fix ppc64_defconfig + PPC_POWERNV=n build error
kvm_no_guest function calls power7_wakeup_loss to put the thread into
the deepest supported idle state. power7_wakeup_loss is defined in
arch/powerpc/kernel/idle_power7.S, which is compiled only when
PPC_P7_NAP=y. And PPC_P7_NAP is selected when PPC_POWERNV=y.
Hence in cases where PPC_POWERNV=n and KVM_BOOK3S_64_HV=y we see the
following error:

arch/powerpc/kvm/built-in.o: In function `kvm_no_guest':
arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x42c): undefined reference to `power7_wakeup_loss'

Fix this by adding PPC_POWERNV as a dependency for KVM_BOOK3S_64_HV.

Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/kvm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 11850f3..b3b3d9f 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -75,7 +75,7 @@ config KVM_BOOK3S_64
 
 config KVM_BOOK3S_64_HV
 	tristate "KVM support for POWER7 and PPC970 using hypervisor mode in host"
-	depends on KVM_BOOK3S_64
+	depends on KVM_BOOK3S_64 && PPC_POWERNV
 	select KVM_BOOK3S_HV_POSSIBLE
 	select MMU_NOTIFIER
 	select CMA
-- 
1.9.3
Re: [v3, 1/3] powerpc: Fix cpu_online_cores_map to return only online threads mask
On Monday 30 March 2015 03:06 PM, Michael Ellerman wrote:
> On Sun, 2015-22-03 at 04:42:57 UTC, "Shreyas B. Prabhu" wrote:
>> Currently, cpu_online_cores_map returns a mask, which for every core
>> that has atleast one online thread, has the first-cpu-of-that-core's bit
>> set.
>
> ... which for every core with at least one online thread, has the bit for
> thread 0 of the core set to 1, and the bits for all other threads of the core
> set to 0.
>
> Maybe that's clearer?
>
>> But the first cpu itself may not be online always. In such cases, if
>                    ^
>                    of the core
>
>> the returned mask is used for IPI, then it'll cause IPIs to be skipped
>> on cores where the first thread is offline.
>
> .. because the IPI code refuses to send IPIs to offline threads, right?

Yes.

>
>> Fix this by setting first-online-cpu-of-the-core's bit in the mask.
>
> .. by setting the bit of the first online thread in the core.
>
>> This is done by fixing this in the underlying function
>> cpu_thread_mask_to_cores.
>
>
> The result has the property that for all cores with online threads, there is
> one bit set in the returned map. And further, all bits that are set in the
> returned map correspond to online threads.
>
>
>> Signed-off-by: Shreyas B. Prabhu
>> ---
>> This patch is new in v3
>>
>> In an example scenario where all the threads of 1st core are offline
>> and argument to cpu_thread_mask_to_cores is cpu_possible_mask,
>> with this implementation, return value will not have any bit
>> corresponding to 1st core set. I think that should be okay. Any thoughts?
>
> Looking at linux-next:
>
> $ git grep cpu_thread_mask_to_cores
> arch/powerpc/include/asm/cputhreads.h:/* cpu_thread_mask_to_cores - Return a cpumask of one per cores
> arch/powerpc/include/asm/cputhreads.h:static inline cpumask_t cpu_thread_mask_to_cores(const struct cpumask *threads)
> arch/powerpc/include/asm/cputhreads.h:  return cpu_thread_mask_to_cores(cpu_online_mask);
> $ git grep cpu_online_cores_map
> arch/powerpc/include/asm/cputhreads.h:static inline cpumask_t cpu_online_cores_map(void)
>
> ie. There are no users.
>
> So yeah I think we can change the semantics of this, and the semantics you
> describe make sense.
>
> If you agree with my changelog comments I'm happy to fix that up and merge
> this, or you can send a v4 if you like.
>

I'll fix the changelog in v4.

> cheers
>
Re: [v3, 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior
On Monday 30 March 2015 03:51 PM, Michael Ellerman wrote:
> On Sun, 2015-22-03 at 04:42:59 UTC, "Shreyas B. Prabhu" wrote:
>> Fastsleep is one of the idle state which cpuidle subsystem currently
>> uses on power8 machines. In this state L2 cache is brought down to a
>> threshold voltage. Therefore when the core is in fastsleep, the
>> communication between L2 and L3 needs to be fenced. But there is a bug
>> in the current power8 chips surrounding this fencing.
>>
>> OPAL provides a workaround which precludes the possibility of hitting
>> this bug. But running with this workaround applied causes checkstop
>> if any correctable error in L2 cache directory is detected. Hence OPAL
>> also provides a way to undo the workaround.
>>
>> In the existing implementation, workaround is applied by the last thread
>> of the core entering fastsleep and undone by the first thread waking up.
>> But this has a performance cost. These OPAL calls account for roughly
>> 4000 cycles everytime the core has to enter or wakeup from fastsleep.
>>
>> This patch introduces a sysfs attribute (fastsleep_workaround_state)
>> to choose the behavior of this workaround.
>>
>> By default, fastsleep_workaround_state = 0. In this case, workaround
>> is applied/undone everytime the core enters/exits fastsleep.
>>
>> fastsleep_workaround_state = 1. In this case the workaround is applied
>> once on all the cores and never undone. This can be triggered by
>> echo 1 > /sys/devices/system/cpu/fastsleep_workaround_state
>>
>> For simplicity this attribute can be modified only once. Implying, once
>> fastsleep_workaround_state is changed to 1, it cannot be reverted to
>> the default state.
>
> This sounds good, although the name is a bit vague.
>
> Just calling it "state" doesn't make it clear what 0 and 1 mean.
> I think better would be "fastsleep_workaround_active" ?
>
> Though even that is a bit wrong, because 0 doesn't really mean it's not active,
> it means it's not *permanently* active.
>
> So another option would be to make it a string attribute, with the initial
> state being eg. "dynamic" and then maybe "applied" for the applied state?
>

How about "fastsleep_workaround_permanent", with default value = 0.
User can make workaround permanent by echoing 1 to it.

I'll post out V4 with the suggested changes.

Thanks,
Shreyas
[PATCH 1/3] tracing/mm: Don't trace kmem_cache_free on offline cpus
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_kmem_cache_free can be called on an offline cpu in
this scenario caught by LOCKDEP:

===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc1+ #9 Not tainted
-------------------------------
include/trace/events/kmem.h:148 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
Call Trace:
[c01fed2f78f0] [c09dee8c] .dump_stack+0x98/0xd4 (unreliable)
[c01fed2f7970] [c0128d88] .lockdep_rcu_suspicious+0x108/0x170
[c01fed2f7a00] [c026f924] .kmem_cache_free+0x344/0x4b0
[c01fed2f7ab0] [c00bd1cc] .__mmdrop+0x4c/0x160
[c01fed2f7b40] [c01068e0] .idle_task_exit+0xf0/0x100
[c01fed2f7bc0] [c0066948] .pnv_smp_cpu_kill_self+0x58/0x2c0
[c01fed2f7ca0] [c003ce34] .cpu_die+0x34/0x50
[c01fed2f7d10] [c00176d0] .arch_cpu_idle_dead+0x20/0x40
[c01fed2f7d80] [c011f9a8] .cpu_startup_entry+0x708/0x7a0
[c01fed2f7ec0] [c003cb6c] .start_secondary+0x36c/0x3a0
[c01fed2f7f90] [c0008b6c] start_secondary_prolog+0x10/0x14

Fix this by converting kmem_cache_free trace point into
TRACE_EVENT_CONDITION where condition is cpu_online(smp_processor_id())

Signed-off-by: Shreyas B. Prabhu
Reported-by: Aneesh Kumar K.V
---
 include/trace/events/kmem.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 81ea598..dd9e612 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -140,11 +140,13 @@ DEFINE_EVENT(kmem_free, kfree,
 	TP_ARGS(call_site, ptr)
 );
 
-DEFINE_EVENT(kmem_free, kmem_cache_free,
+DEFINE_EVENT_CONDITION(kmem_free, kmem_cache_free,
 
 	TP_PROTO(unsigned long call_site, const void *ptr),
 
-	TP_ARGS(call_site, ptr)
+	TP_ARGS(call_site, ptr),
+
+	TP_CONDITION(cpu_online(smp_processor_id()))
 );
 
 TRACE_EVENT(mm_page_free,
-- 
1.9.3
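As an illustration of what a conditional trace event buys here, the following is a rough user-space sketch, not the kernel's tracepoint implementation. MODEL_TRACE_EVENT_CONDITION, model_cpu_is_online, model_record_event and model_events_recorded are hypothetical names invented for this example; the only point is that the condition is checked before the probe runs, so nothing is recorded when the condition is false.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; these are not kernel APIs. */
static bool model_cpu_is_online = true;
static unsigned int model_events_recorded;

/* Stand-in for the probe that would normally run under RCU protection. */
static void model_record_event(unsigned long call_site, const void *ptr)
{
	(void)call_site;
	(void)ptr;
	model_events_recorded++;
}

/*
 * Define trace_<name>() so that the probe is reached only when 'cond'
 * holds, loosely mirroring what TP_CONDITION() adds to a trace event.
 */
#define MODEL_TRACE_EVENT_CONDITION(name, cond)				\
	static void trace_##name(unsigned long call_site, const void *ptr) \
	{								\
		if (!(cond))						\
			return;	/* skipped, e.g. on an offline cpu */	\
		model_record_event(call_site, ptr);			\
	}

MODEL_TRACE_EVENT_CONDITION(kmem_cache_free, model_cpu_is_online)
```

Calling trace_kmem_cache_free() while model_cpu_is_online is false leaves model_events_recorded unchanged, which is the behaviour the patch wants for offline cpus.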
[PATCH 3/3] tracing/mm: Don't trace mm_page_pcpu_drain on offline cpus
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_mm_page_pcpu_drain can be called on an offline cpu
in this scenario caught by LOCKDEP:

===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc1+ #9 Not tainted
-------------------------------
include/trace/events/kmem.h:265 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by swapper/5/0:
 #0:  (&(&zone->lock)->rlock){..-...}, at: [] .free_pcppages_bulk+0x70/0x920

stack backtrace:
CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.1.0-rc1+ #9
Call Trace:
[c01fed2e7720] [c09dee8c] .dump_stack+0x98/0xd4 (unreliable)
[c01fed2e77a0] [c0128d88] .lockdep_rcu_suspicious+0x108/0x170
[c01fed2e7830] [c020794c] .free_pcppages_bulk+0x60c/0x920
[c01fed2e7980] [c0208188] .free_hot_cold_page+0x208/0x280
[c01fed2e7a30] [c004d000] .destroy_context+0x90/0xd0
[c01fed2e7ab0] [c00bd1d8] .__mmdrop+0x58/0x160
[c01fed2e7b40] [c01068e0] .idle_task_exit+0xf0/0x100
[c01fed2e7bc0] [c0066948] .pnv_smp_cpu_kill_self+0x58/0x2c0
[c01fed2e7ca0] [c003ce34] .cpu_die+0x34/0x50
[c01fed2e7d10] [c00176d0] .arch_cpu_idle_dead+0x20/0x40
[c01fed2e7d80] [c011f9a8] .cpu_startup_entry+0x708/0x7a0
[c01fed2e7ec0] [c003cb6c] .start_secondary+0x36c/0x3a0
[c01fed2e7f90] [c0008b6c] start_secondary_prolog+0x10/0x14

Fix this by converting mm_page_pcpu_drain trace point into
TRACE_EVENT_CONDITION where condition is cpu_online(smp_processor_id())

Signed-off-by: Shreyas B. Prabhu
---
 include/trace/events/kmem.h | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 4abda92..6cd975f 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -257,12 +257,26 @@ DEFINE_EVENT(mm_page, mm_page_alloc_zone_locked,
 	TP_ARGS(page, order, migratetype)
 );
 
-DEFINE_EVENT_PRINT(mm_page, mm_page_pcpu_drain,
+TRACE_EVENT_CONDITION(mm_page_pcpu_drain,
 
 	TP_PROTO(struct page *page, unsigned int order, int migratetype),
 
 	TP_ARGS(page, order, migratetype),
 
+	TP_CONDITION(cpu_online(smp_processor_id())),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	pfn		)
+		__field(	unsigned int,	order		)
+		__field(	int,		migratetype	)
+	),
+
+	TP_fast_assign(
+		__entry->pfn		= page ? page_to_pfn(page) : -1UL;
+		__entry->order		= order;
+		__entry->migratetype	= migratetype;
+	),
+
 	TP_printk("page=%p pfn=%lu order=%d migratetype=%d",
 		pfn_to_page(__entry->pfn), __entry->pfn,
 		__entry->order, __entry->migratetype)
-- 
1.9.3
[PATCH 2/3] tracing/mm: Don't trace mm_page_free on offline cpus
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_mm_page_free can be called on an offline cpu in this
scenario caught by LOCKDEP:

===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc1+ #9 Not tainted
-------------------------------
include/trace/events/kmem.h:170 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
Call Trace:
[c01fed2f7790] [c09dee8c] .dump_stack+0x98/0xd4 (unreliable)
[c01fed2f7810] [c0128d88] .lockdep_rcu_suspicious+0x108/0x170
[c01fed2f78a0] [c0203bc4] .free_pages_prepare+0x494/0x680
[c01fed2f7980] [c0207fd0] .free_hot_cold_page+0x50/0x280
[c01fed2f7a30] [c004d000] .destroy_context+0x90/0xd0
[c01fed2f7ab0] [c00bd1d8] .__mmdrop+0x58/0x160
[c01fed2f7b40] [c01068e0] .idle_task_exit+0xf0/0x100
[c01fed2f7bc0] [c0066948] .pnv_smp_cpu_kill_self+0x58/0x2c0
[c01fed2f7ca0] [c003ce34] .cpu_die+0x34/0x50
[c01fed2f7d10] [c00176d0] .arch_cpu_idle_dead+0x20/0x40
[c01fed2f7d80] [c011f9a8] .cpu_startup_entry+0x708/0x7a0
[c01fed2f7ec0] [c003cb6c] .start_secondary+0x36c/0x3a0
[c01fed2f7f90] [c0008b6c] start_secondary_prolog+0x10/0x14

Fix this by converting mm_page_free trace point into
TRACE_EVENT_CONDITION where condition is cpu_online(smp_processor_id())

Signed-off-by: Shreyas B. Prabhu
---
 include/trace/events/kmem.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index dd9e612..4abda92 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -149,12 +149,14 @@ DEFINE_EVENT_CONDITION(kmem_free, kmem_cache_free,
 	TP_CONDITION(cpu_online(smp_processor_id()))
 );
 
-TRACE_EVENT(mm_page_free,
+TRACE_EVENT_CONDITION(mm_page_free,
 
 	TP_PROTO(struct page *page, unsigned int order),
 
 	TP_ARGS(page, order),
 
+	TP_CONDITION(cpu_online(smp_processor_id())),
+
 	TP_STRUCT__entry(
 		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
-- 
1.9.3
Re: [PATCH 3/3] tracing/mm: Don't trace mm_page_pcpu_drain on offline cpus
>> -DEFINE_EVENT_PRINT(mm_page, mm_page_pcpu_drain,
>> +TRACE_EVENT_CONDITION(mm_page_pcpu_drain,
>>
>>  	TP_PROTO(struct page *page, unsigned int order, int migratetype),
>>
>>  	TP_ARGS(page, order, migratetype),
>>
>> +	TP_CONDITION(cpu_online(smp_processor_id())),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(	unsigned long,	pfn		)
>> +		__field(	unsigned int,	order		)
>> +		__field(	int,		migratetype	)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->pfn		= page ? page_to_pfn(page) : -1UL;
>> +		__entry->order		= order;
>> +		__entry->migratetype	= migratetype;
>> +	),
>> +
>
> What was the need to do the above changes besides adding TP_CONDITION ?
>

IIUC there is no existing macro which can both add a condition and
override printk format, hence the fall back to TRACE_EVENT_CONDITION.

Thanks,
Shreyas
Re: [PATCH 3/3] tracing/mm: Don't trace mm_page_pcpu_drain on offline cpus
On Wednesday 29 April 2015 08:48 PM, Steven Rostedt wrote:
> On Wed, 29 Apr 2015 20:19:28 +0530
> Shreyas B Prabhu wrote:
>
>> IIUC there is no existing macro which can both add a condition and
>> override printk format, hence the fall back to TRACE_EVENT_CONDITION.
>
> Hmm, want me to send you a patch that changes that?
>

I am not sure if it's worth the effort now. It doesn't look like any
other trace point apart from the above use case will benefit from it.
Only smbus_write and smbus_reply seem to come close. But even they need
a separate TP_fast_assign.

Thanks,
Shreyas
Re: [PATCH 3/3] tracing/mm: Don't trace mm_page_pcpu_drain on offline cpus
On Wednesday 29 April 2015 10:38 PM, Steven Rostedt wrote:
>> I am not sure if its worth the effort now. It doesn't look like any
>> other trace point apart from the above use case will benefit from it.
>> Only smbus_write and smbus_reply seem to come close. But even they need
>> separate TP_fast_assign.
>
> It shouldn't be a problem to implement. But I'm currently cleaning up
> those files, and any changes will cause nasty conflicts.
>
> Lets do this. Push the current changes as is, and when I get around to
> adding a DEFINE_EVENT_PRINT_CONDITION(), we can modify that code to use
> it.
>

Okay, sure.

Thanks,
Shreyas
Re: [PATCH 3/3] tracing/mm: Don't trace mm_page_pcpu_drain on offline cpus
On Thursday 30 April 2015 10:06 AM, Preeti Murthy wrote:
> On Wed, Apr 29, 2015 at 10:49 PM, Shreyas B Prabhu wrote:
>>
>>
>> On Wednesday 29 April 2015 10:38 PM, Steven Rostedt wrote:
>>>> I am not sure if its worth the effort now. It doesn't look like any
>>>> other trace point apart from the above use case will benefit from it.
>>>> Only smbus_write and smbus_reply seem to come close. But even they need
>>>> separate TP_fast_assign.
>>>
>>> It shouldn't be a problem to implement. But I'm currently cleaning up
>>> those files, and any changes will cause nasty conflicts.
>>>
>>> Lets do this. Push the current changes as is, and when I get around to
>>> adding a DEFINE_EVENT_PRINT_CONDITION(), we can modify that code to use
>>> it.
>>>
>> Okay, sure.
>
> Looks good then.
>
> Reviewed-by: Preeti U Murthy

Thanks a lot!
[PATCH v6 1/3] powerpc: Fix cpu_online_cores_map to return only online threads mask
Currently, cpu_online_cores_map returns a mask, which for every core
with at least one online thread, has the bit for thread 0 of the core
set to 1, and the bits for all other threads of the core set to 0.

But thread 0 of the core itself may not be online always. In such cases,
if the returned mask is used for IPI, then it'll cause IPIs to be
skipped on cores where the first thread is offline, because the IPI code
refuses to send IPIs to offline threads.

Fix this by setting the bit of the first online thread in the core.
This is done by fixing this in the underlying function
cpu_thread_mask_to_cores.

The result has the property that for all cores with online threads,
there is one bit set in the returned map. And further, all bits that are
set in the returned map correspond to online threads.

Signed-off-by: Shreyas B. Prabhu
Reviewed-by: Preeti U Murthy
[ Changelog from Michael Ellerman ]
Reviewed-by: Gautham R. Shenoy
---
 arch/powerpc/include/asm/cputhreads.h | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index 4c8ad59..1076d3f 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -31,9 +31,9 @@ extern cpumask_t threads_core_mask;
 /* cpu_thread_mask_to_cores - Return a cpumask of one per cores
  *                            hit by the argument
  *
- * @threads:	a cpumask of threads
+ * @threads:	a cpumask of online threads
  *
- * This function returns a cpumask which will have one "cpu" (or thread)
+ * This function returns a cpumask which will have one online cpu's
  * bit set for each core that has at least one thread set in the argument.
  *
  * This can typically be used for things like IPI for tlb invalidations
@@ -42,13 +42,16 @@ extern cpumask_t threads_core_mask;
 static inline cpumask_t cpu_thread_mask_to_cores(const struct cpumask *threads)
 {
 	cpumask_t	tmp, res;
-	int		i;
+	int		i, cpu;
 
 	cpumask_clear(&res);
 	for (i = 0; i < NR_CPUS; i += threads_per_core) {
 		cpumask_shift_left(&tmp, &threads_core_mask, i);
-		if (cpumask_intersects(threads, &tmp))
-			cpumask_set_cpu(i, &res);
+		if (cpumask_intersects(threads, &tmp)) {
+			cpu = cpumask_next_and(-1, &tmp, cpu_online_mask);
+			if (cpu < nr_cpu_ids)
+				cpumask_set_cpu(cpu, &res);
+		}
 	}
 	return res;
 }
-- 
1.9.3
[PATCH v6 0/3] powerpc: powernv: Fastsleep workaround behavior
Fastsleep is one of the idle states which the cpuidle subsystem currently
uses on power8 machines. In this state L2 cache is brought down to a
threshold voltage. Therefore when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current power8 chips surrounding this fencing.

OPAL provides a workaround which precludes the possibility of hitting
this bug. But running with this workaround applied causes checkstop
if any correctable error in L2 cache directory is detected. Hence OPAL
also provides a way to undo the workaround.

In the existing implementation, the workaround is applied by the last
thread of the core entering fastsleep and undone by the first thread
waking up. But this has a performance cost. These OPAL calls account for
roughly 4000 cycles every time the core has to enter or wake up from
fastsleep.

This patchset introduces a sysfs attribute
(fastsleep_workaround_applyonce) to choose the behavior of this
workaround.

Patch 1/3 fixes cpu_online_cores_map which is used by Patch 3/3.
Patch 2/3 is a clean up patch. It moves all cpuidle related code into a
new file.
Patch 3/3 introduces the sysfs attribute to control fastsleep workaround
behavior.

Changes in v6:
--------------
- Changed the sysfs parameter to take 0/1 as input

Changes in v5:
--------------
- Fix potential race with hotplug with get_online_cpus/put_online_cpus

Changes in v4:
--------------
- Handling patch_instruction and OPAL call errors
- Sysfs attribute takes string ("dynamic" vs "applyonce") as input.
- Improved changelogs

Changes in v3:
--------------
- Kernel parameter changed to sysfs attribute

Changes in v2:
--------------
- Changed commit message to accurately describe the downside of running
  workaround always applied.

Shreyas B. Prabhu (3):
  powerpc: Fix cpu_online_cores_map to return only online threads mask
  powerpc/powernv: Move cpuidle related code from setup.c to new file
  powerpc/powernv: Introduce sysfs control for fastsleep workaround
    behavior

 arch/powerpc/include/asm/cputhreads.h          |  13 +-
 arch/powerpc/include/asm/opal-api.h            |   7 +
 arch/powerpc/include/asm/opal.h                |   1 +
 arch/powerpc/platforms/powernv/Makefile        |   2 +-
 arch/powerpc/platforms/powernv/idle.c          | 323 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/setup.c         | 171 -
 7 files changed, 341 insertions(+), 177 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/idle.c

-- 
1.9.3
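The one-way 0/1 behaviour of fastsleep_workaround_applyonce (described in patch 3/3 of this series) can be pictured with the following user-space sketch. It is only a model under stated assumptions: model_store_applyonce and model_applyonce are hypothetical names, strtoul() stands in for the kernel's kstrtou8(), and the step that would apply the workaround on every core is reduced to a comment.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* 0 = workaround applied/undone dynamically (default), 1 = applied once. */
static unsigned char model_applyonce;

/*
 * Hypothetical model of a write-once store handler: only the value 1 is
 * accepted, and once the state is 1 it is never changed back.
 * Returns 'count' on success and -EINVAL on bad input.
 */
static ssize_t model_store_applyonce(const char *buf, size_t count)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno || end == buf)
		return -EINVAL;

	/* Tolerate the single trailing newline that echo appends. */
	if (*end == '\n')
		end++;
	if (*end != '\0' || val != 1)
		return -EINVAL;

	/* Already applied: nothing more to do. */
	if (model_applyonce == 1)
		return (ssize_t)count;

	/* The real patch would apply the workaround on all cores here. */
	model_applyonce = 1;
	return (ssize_t)count;
}
```

In this model, writing "0" is always rejected, so there is no path that moves the state from 1 back to the default.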
[PATCH v6 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior
Fastsleep is one of the idle states which the cpuidle subsystem currently
uses on power8 machines. In this state L2 cache is brought down to a
threshold voltage. Therefore when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current power8 chips surrounding this fencing.

OPAL provides a workaround which precludes the possibility of hitting
this bug. But running with this workaround applied causes checkstop
if any correctable error in L2 cache directory is detected. Hence OPAL
also provides a way to undo the workaround.

In the existing implementation, the workaround is applied by the last
thread of the core entering fastsleep and undone by the first thread
waking up. But this has a performance cost. These OPAL calls account for
roughly 4000 cycles every time the core has to enter or wake up from
fastsleep.

This patch introduces a sysfs attribute (fastsleep_workaround_applyonce)
to choose the behavior of this workaround.

By default, fastsleep_workaround_applyonce = 0. In this case, the
workaround is applied/undone every time the core enters/exits fastsleep.

fastsleep_workaround_applyonce = 1. In this case the workaround is
applied once on all the cores and never undone. This can be triggered by

	echo 1 > /sys/devices/system/cpu/fastsleep_workaround_applyonce

For simplicity this attribute can be modified only once. That is, once
fastsleep_workaround_applyonce is changed to 1, it cannot be reverted to
the default state.

Signed-off-by: Shreyas B.
Prabhu --- arch/powerpc/include/asm/opal-api.h| 7 ++ arch/powerpc/include/asm/opal.h| 1 + arch/powerpc/platforms/powernv/idle.c | 101 + arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + 4 files changed, 110 insertions(+) diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index 0321a90..a49e5fa 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -165,6 +165,13 @@ #define OPAL_PM_WINKLE_ENABLED 0x0004 #define OPAL_PM_SLEEP_ENABLED_ER1 0x0008 /* with workaround */ +/* + * OPAL_CONFIG_CPU_IDLE_STATE parameters + */ +#define OPAL_CONFIG_IDLE_FASTSLEEP 1 +#define OPAL_CONFIG_IDLE_UNDO 0 +#define OPAL_CONFIG_IDLE_APPLY 1 + #ifndef __ASSEMBLY__ /* Other enums */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 042af1a..9a47813 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -186,6 +186,7 @@ int64_t opal_handle_hmi(void); int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end); int64_t opal_unregister_dump_region(uint32_t id); int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val); +int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag); int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number); int64_t opal_ipmi_send(uint64_t interface, struct opal_ipmi_msg *msg, uint64_t msg_len); diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c index 104235a..f90cc86 100644 --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -13,6 +13,8 @@ #include #include #include +#include +#include #include #include @@ -136,6 +138,96 @@ u32 pnv_get_supported_cpuidle_states(void) } EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states); + +static void pnv_fastsleep_workaround_apply(void *info) + +{ + int rc; + int *err = info; + + rc = opal_config_cpu_idle_state(OPAL_CONFIG_IDLE_FASTSLEEP, + 
OPAL_CONFIG_IDLE_APPLY); + if (rc) + *err = 1; +} + +/* + * Used to store fastsleep workaround state + * 0 - Workaround applied/undone at fastsleep entry/exit path (Default) + * 1 - Workaround applied once, never undone. + */ +static u8 fastsleep_workaround_applyonce; + +static ssize_t show_fastsleep_workaround_applyonce(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%u\n", fastsleep_workaround_applyonce); +} + +static ssize_t store_fastsleep_workaround_applyonce(struct device *dev, + struct device_attribute *attr, const char *buf, + size_t count) +{ + cpumask_t primary_thread_mask; + int err; + u8 val; + + if (kstrtou8(buf, 0, &val) || val != 1) + return -EINVAL; + + if (fastsleep_workaround_applyonce == 1) + return count; + + /* +* fastsleep_workaround_applyonce = 1 implies +* fastsleep workaround needs to be left in 'applied' state on all +* the cores. Do this by- +* 1. Patching out the call to 'undo' workaround in fastsleep exit path +* 2. Sending ipi to all the cores which have atleast one onl
[PATCH v6 2/3] powerpc/powernv: Move cpuidle related code from setup.c to new file
This is a cleanup patch; doesn't change any functionality. Moves all cpuidle related code from setup.c to a new file. Signed-off-by: Shreyas B. Prabhu Reviewed-by: Preeti U Murthy --- arch/powerpc/platforms/powernv/Makefile | 2 +- arch/powerpc/platforms/powernv/idle.c | 191 arch/powerpc/platforms/powernv/setup.c | 171 3 files changed, 192 insertions(+), 172 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/idle.c diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index 33e44f3..bee9235 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -1,4 +1,4 @@ -obj-y += setup.o opal-wrappers.o opal.o opal-async.o +obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o obj-y += opal-msglog.o opal-hmi.o opal-power.o diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c new file mode 100644 index 000..104235a --- /dev/null +++ b/arch/powerpc/platforms/powernv/idle.c @@ -0,0 +1,191 @@ +/* + * PowerNV cpuidle code + * + * Copyright 2015 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "powernv.h" +#include "subcore.h" + +static u32 supported_cpuidle_states; + +int pnv_save_sprs_for_winkle(void) +{ + int cpu; + int rc; + + /* +* hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric accross +* all cpus at boot. Get these reg values of current cpu and use the +* same accross all cpus. 
+*/ + uint64_t lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1; + uint64_t hid0_val = mfspr(SPRN_HID0); + uint64_t hid1_val = mfspr(SPRN_HID1); + uint64_t hid4_val = mfspr(SPRN_HID4); + uint64_t hid5_val = mfspr(SPRN_HID5); + uint64_t hmeer_val = mfspr(SPRN_HMEER); + + for_each_possible_cpu(cpu) { + uint64_t pir = get_hard_smp_processor_id(cpu); + uint64_t hsprg0_val = (uint64_t)&paca[cpu]; + + /* +* HSPRG0 is used to store the cpu's pointer to paca. Hence last +* 3 bits are guaranteed to be 0. Program slw to restore HSPRG0 +* with 63rd bit set, so that when a thread wakes up at 0x100 we +* can use this bit to distinguish between fastsleep and +* deep winkle. +*/ + hsprg0_val |= 1; + + rc = opal_slw_set_reg(pir, SPRN_HSPRG0, hsprg0_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val); + if (rc != 0) + return rc; + + /* HIDs are per core registers */ + if (cpu_thread_in_core(cpu) == 0) { + + rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val); + if (rc != 0) + return rc; + } + } + + return 0; +} + +static void pnv_alloc_idle_core_states(void) +{ + int i, j; + int nr_cores = cpu_nr_cores(); + u32 *core_idle_state; + + /* +* core_idle_state - First 8 bits track the idle state of each thread +* of the core. The 8th bit is the lock bit. Initially all thread bits +* are set. They are cleared when the thread enters deep idle state +* like sleep and winkle. Initially the lock bit is cleared. +* The lock bit has 2 purposes +* a. While the first thread is restoring core state, it prevents +* other threads in the core from switching to process context. +* b. While the last thread in the core is saving the core state, it +* prev
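A note on the HSPRG0 comment in the patch above: the Power ISA numbers bits from the most significant end, so "63rd bit" is the least significant bit, which is what `hsprg0_val |= 1` sets. The following is only a userspace sketch of that tagging idea, assuming an 8-byte-aligned address; the names WAKE_TAG, tag_for_winkle, woke_from_winkle and paca_from_hsprg0 are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the low-bit marker; the patch simply ORs in 1. */
#define WAKE_TAG 0x1ULL

/*
 * Tag an 8-byte-aligned address. Alignment guarantees the low 3 bits are
 * zero, so bit 0 is free to carry one bit of extra information.
 */
static uint64_t tag_for_winkle(uint64_t paca_addr)
{
	assert((paca_addr & 0x7ULL) == 0);
	return paca_addr | WAKE_TAG;
}

/* Non-zero if the saved value carries the marker. */
static int woke_from_winkle(uint64_t hsprg0)
{
	return (hsprg0 & WAKE_TAG) != 0;
}

/* Recover the original pointer value by clearing the marker. */
static uint64_t paca_from_hsprg0(uint64_t hsprg0)
{
	return hsprg0 & ~WAKE_TAG;
}
```

In this model a wakeup path would first test the marker with woke_from_winkle() and then strip it with paca_from_hsprg0() before using the value as a pointer.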
[PATCH v4 0/3] powerpc: powernv: Fastsleep workaround behavior
Fastsleep is one of the idle states that the cpuidle subsystem currently
uses on power8 machines. In this state the L2 cache is brought down to a
threshold voltage. Therefore, when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug in
the current power8 chips surrounding this fencing. OPAL provides a
workaround which precludes the possibility of hitting this bug. But running
with this workaround applied causes a checkstop if any correctable error in
the L2 cache directory is detected. Hence OPAL also provides a way to undo
the workaround.

In the existing implementation, the workaround is applied by the last thread
of the core entering fastsleep and undone by the first thread waking up.
But this has a performance cost. These OPAL calls account for roughly
4000 cycles every time the core has to enter or wake up from fastsleep.

This patchset introduces a sysfs attribute (fastsleep_workaround_state)
to choose the behavior of this workaround.

Patch 1/3 fixes cpu_online_cores_map, which is used by Patch 3/3.
Patch 2/3 is a cleanup patch. It moves all cpuidle related code into a
new file.
Patch 3/3 introduces the sysfs attribute to control fastsleep workaround
behavior.

Changes in v4:
--------------
- Handling patch_instruction and OPAL call errors
- Sysfs attribute takes string ("dynamic" vs "applyonce") as input.
- Improved changelogs

Changes in v3:
--------------
- Kernel parameter changed to sysfs attribute

Changes in v2:
--------------
- Changed commit message to accurately describe the downside of running
  with the workaround always applied.

Shreyas B. Prabhu (3):
  powerpc: Fix cpu_online_cores_map to return only online threads mask
  powerpc/powernv: Move cpuidle related code from setup.c to new file
  powerpc/powernv: Introduce sysfs control for fastsleep workaround
    behavior

 arch/powerpc/include/asm/cputhreads.h          |  13 +-
 arch/powerpc/include/asm/opal-api.h            |   7 +
 arch/powerpc/include/asm/opal.h                |   1 +
 arch/powerpc/platforms/powernv/Makefile        |   2 +-
 arch/powerpc/platforms/powernv/idle.c          | 323 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/setup.c         | 171 -
 7 files changed, 341 insertions(+), 177 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/idle.c

-- 
1.9.3
[PATCH v4 1/3] powerpc: Fix cpu_online_cores_map to return only online threads mask
Currently, cpu_online_cores_map returns a mask, which for every core
with at least one online thread, has the bit for thread 0 of the core
set to 1, and the bits for all other threads of the core set to 0.

But thread 0 of the core itself may not be online always. In such cases,
if the returned mask is used for IPI, then it'll cause IPIs to be skipped
on cores where the first thread is offline, because the IPI code refuses
to send IPIs to offline threads.

Fix this by setting the bit of the first online thread in the core. This
is done by fixing this in the underlying function
cpu_thread_mask_to_cores.

The result has the property that for all cores with online threads, there
is one bit set in the returned map. And further, all bits that are set in
the returned map correspond to online threads.

Signed-off-by: Shreyas B. Prabhu
Reviewed-by: Preeti U Murthy
[ Changelog from Michael Ellerman ]
---
 arch/powerpc/include/asm/cputhreads.h | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index 4c8ad59..1076d3f 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -31,9 +31,9 @@ extern cpumask_t threads_core_mask;
 /* cpu_thread_mask_to_cores - Return a cpumask of one per cores
  *hit by the argument
  *
- * @threads: a cpumask of threads
+ * @threads: a cpumask of online threads
  *
- * This function returns a cpumask which will have one "cpu" (or thread)
+ * This function returns a cpumask which will have one online cpu's
  * bit set for each core that has at least one thread set in the argument.
  *
  * This can typically be used for things like IPI for tlb invalidations
@@ -42,13 +42,16 @@ extern cpumask_t threads_core_mask;
 static inline cpumask_t cpu_thread_mask_to_cores(const struct cpumask *threads)
 {
 	cpumask_t tmp, res;
-	int i;
+	int i, cpu;
 
 	cpumask_clear(&res);
 	for (i = 0; i < NR_CPUS; i += threads_per_core) {
 		cpumask_shift_left(&tmp, &threads_core_mask, i);
-		if (cpumask_intersects(threads, &tmp))
-			cpumask_set_cpu(i, &res);
+		if (cpumask_intersects(threads, &tmp)) {
+			cpu = cpumask_next_and(-1, &tmp, cpu_online_mask);
+			if (cpu < nr_cpu_ids)
+				cpumask_set_cpu(cpu, &res);
+		}
 	}
 	return res;
 }
-- 
1.9.3
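To make the effect of this fix easier to see outside the kernel, here is a small userspace model. It is only a sketch under stated assumptions: CPUs are bits of a 64-bit word instead of a cpumask_t, each core owns threads_per_core consecutive bits, and the function name thread_mask_to_cores is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of the fixed cpu_thread_mask_to_cores().
 * For every core that has at least one bit set in `threads`, set the bit of
 * the first ONLINE thread of that core. The pre-fix behaviour would have set
 * the bit of thread 0 unconditionally.
 */
static uint64_t thread_mask_to_cores(uint64_t threads, uint64_t online,
				     unsigned int threads_per_core)
{
	uint64_t res = 0;
	unsigned int i;

	assert(threads_per_core > 0 && threads_per_core < 64);

	for (i = 0; i < 64; i += threads_per_core) {
		/* All thread bits belonging to the core that starts at CPU i. */
		uint64_t core = ((1ULL << threads_per_core) - 1) << i;
		uint64_t core_online = core & online;

		/* Skip cores not hit by `threads` or with nothing online. */
		if ((threads & core) && core_online)
			res |= core_online & (~core_online + 1); /* lowest set bit */
	}
	return res;
}
```

With 8 threads per core, CPU 0 offline and CPUs 1 to 8 online (mask 0x1FE), the model picks CPU 1 for core 0 and CPU 8 for core 1, whereas always choosing thread 0 would have named the offline CPU 0.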
[PATCH v4 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior
Fastsleep is one of the idle states that the cpuidle subsystem currently
uses on power8 machines. In this state the L2 cache is brought down to a
threshold voltage. Therefore, when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug in
the current power8 chips surrounding this fencing. OPAL provides a
workaround which precludes the possibility of hitting this bug. But running
with this workaround applied causes a checkstop if any correctable error in
the L2 cache directory is detected. Hence OPAL also provides a way to undo
the workaround.

In the existing implementation, the workaround is applied by the last thread
of the core entering fastsleep and undone by the first thread waking up.
But this has a performance cost. These OPAL calls account for roughly
4000 cycles every time the core has to enter or wake up from fastsleep.

This patch introduces a sysfs attribute (fastsleep_workaround_state)
to choose the behavior of this workaround.

fastsleep_workaround_state = dynamic (default). In this case, the
workaround is applied/undone every time the core enters/exits fastsleep.

fastsleep_workaround_state = applyonce. In this case the workaround is
applied once on all the cores and never undone. This can be triggered by

  echo applyonce > /sys/devices/system/cpu/fastsleep_workaround_state

For simplicity this attribute can be modified only once. That is, once
fastsleep_workaround_state is changed to applyonce, it cannot be reverted
to the default state.

Signed-off-by: Shreyas B.
Prabhu --- arch/powerpc/include/asm/opal-api.h| 7 ++ arch/powerpc/include/asm/opal.h| 1 + arch/powerpc/platforms/powernv/idle.c | 132 + arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + 4 files changed, 141 insertions(+) diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index 0321a90..a49e5fa 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -165,6 +165,13 @@ #define OPAL_PM_WINKLE_ENABLED 0x0004 #define OPAL_PM_SLEEP_ENABLED_ER1 0x0008 /* with workaround */ +/* + * OPAL_CONFIG_CPU_IDLE_STATE parameters + */ +#define OPAL_CONFIG_IDLE_FASTSLEEP 1 +#define OPAL_CONFIG_IDLE_UNDO 0 +#define OPAL_CONFIG_IDLE_APPLY 1 + #ifndef __ASSEMBLY__ /* Other enums */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 042af1a..9a47813 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -186,6 +186,7 @@ int64_t opal_handle_hmi(void); int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end); int64_t opal_unregister_dump_region(uint32_t id); int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val); +int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag); int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number); int64_t opal_ipmi_send(uint64_t interface, struct opal_ipmi_msg *msg, uint64_t msg_len); diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c index 104235a..3e0423d 100644 --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -13,6 +13,8 @@ #include #include #include +#include +#include #include #include @@ -136,6 +138,127 @@ u32 pnv_get_supported_cpuidle_states(void) } EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states); + +static void pnv_fastsleep_workaround_apply(void *info) + +{ + int rc; + int *err = info; + + rc = opal_config_cpu_idle_state(OPAL_CONFIG_IDLE_FASTSLEEP, + 
OPAL_CONFIG_IDLE_APPLY); + if (rc) + *err = 1; +} + +/* + * Used to store fastsleep workaround state + * 0 - Workaround applied/undone at fastsleep entry/exit path (Default) + * 1 - Workaround applied once, never undone. + */ +static u8 fastsleep_workaround_state; + +static const char * const fastsleep_workaround_avail_states[] = { + "dynamic", "applyonce" +}; + +/* + * fastsleep_workaround_avail_states values + */ +enum { + WORKAROUND_DYNAMIC, + WORKAROUND_APPLYONCE +}; +static ssize_t show_fastsleep_workaround_state(struct device *dev, + struct device_attribute *attr, char *buf) +{ + char *s = buf; + + if (fastsleep_workaround_state == 0) { + s += sprintf(s, "[%s] ", + fastsleep_workaround_avail_states[WORKAROUND_DYNAMIC]); + s += sprintf(s, "%s\n", + fastsleep_workaround_avail_states[WORKAROUND_APPLYONCE]); + } else { + s += sprintf(s, "%s ", + fastsleep_workaround_avail_states[WORKAROUND_DYNAMIC]); + s += sprintf(s, "[%s]\n", + fastsleep_workaround_avail_s
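The attribute semantics described in this patch (two named states, the active one shown in brackets, and a one-way switch to "applyonce") can be modelled in userspace roughly as follows. This is a sketch, not the patch's code: the store handler is not visible in the text above, so store_state() is an assumption based on the changelog, and format_state() assumes the truncated else branch brackets "applyonce". Both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum workaround_state { WORKAROUND_DYNAMIC, WORKAROUND_APPLYONCE };

static const char *const avail_states[] = { "dynamic", "applyonce" };

/* List both states and bracket the active one, like the show handler. */
static int format_state(char *buf, size_t len, enum workaround_state cur)
{
	assert(buf != NULL && len > 0);

	if (cur == WORKAROUND_DYNAMIC)
		return snprintf(buf, len, "[%s] %s\n",
				avail_states[WORKAROUND_DYNAMIC],
				avail_states[WORKAROUND_APPLYONCE]);
	return snprintf(buf, len, "%s [%s]\n",
			avail_states[WORKAROUND_DYNAMIC],
			avail_states[WORKAROUND_APPLYONCE]);
}

/*
 * Assumed store semantics: only "applyonce" (optionally followed by the
 * newline that echo appends) is accepted, and the switch is one-way.
 * Returns 0 on success and -1 for any other input.
 */
static int store_state(enum workaround_state *cur, const char *input)
{
	const char *want = avail_states[WORKAROUND_APPLYONCE];
	size_t n = strcspn(input, "\n");

	if (n != strlen(want) || strncmp(input, want, n) != 0)
		return -1;
	*cur = WORKAROUND_APPLYONCE;
	return 0;
}
```

The model leaves out what the real handler has to do when the state changes, namely patching the exit path and applying the workaround on every core.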
[PATCH v4 2/3] powerpc/powernv: Move cpuidle related code from setup.c to new file
This is a cleanup patch; doesn't change any functionality. Moves all cpuidle related code from setup.c to a new file. Signed-off-by: Shreyas B. Prabhu --- arch/powerpc/platforms/powernv/Makefile | 2 +- arch/powerpc/platforms/powernv/idle.c | 191 arch/powerpc/platforms/powernv/setup.c | 171 3 files changed, 192 insertions(+), 172 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/idle.c diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index 33e44f3..bee9235 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -1,4 +1,4 @@ -obj-y += setup.o opal-wrappers.o opal.o opal-async.o +obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o obj-y += opal-msglog.o opal-hmi.o opal-power.o diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c new file mode 100644 index 000..104235a --- /dev/null +++ b/arch/powerpc/platforms/powernv/idle.c @@ -0,0 +1,191 @@ +/* + * PowerNV cpuidle code + * + * Copyright 2015 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "powernv.h" +#include "subcore.h" + +static u32 supported_cpuidle_states; + +int pnv_save_sprs_for_winkle(void) +{ + int cpu; + int rc; + + /* +* hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric accross +* all cpus at boot. Get these reg values of current cpu and use the +* same accross all cpus. 
+*/ + uint64_t lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1; + uint64_t hid0_val = mfspr(SPRN_HID0); + uint64_t hid1_val = mfspr(SPRN_HID1); + uint64_t hid4_val = mfspr(SPRN_HID4); + uint64_t hid5_val = mfspr(SPRN_HID5); + uint64_t hmeer_val = mfspr(SPRN_HMEER); + + for_each_possible_cpu(cpu) { + uint64_t pir = get_hard_smp_processor_id(cpu); + uint64_t hsprg0_val = (uint64_t)&paca[cpu]; + + /* +* HSPRG0 is used to store the cpu's pointer to paca. Hence last +* 3 bits are guaranteed to be 0. Program slw to restore HSPRG0 +* with 63rd bit set, so that when a thread wakes up at 0x100 we +* can use this bit to distinguish between fastsleep and +* deep winkle. +*/ + hsprg0_val |= 1; + + rc = opal_slw_set_reg(pir, SPRN_HSPRG0, hsprg0_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val); + if (rc != 0) + return rc; + + /* HIDs are per core registers */ + if (cpu_thread_in_core(cpu) == 0) { + + rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val); + if (rc != 0) + return rc; + } + } + + return 0; +} + +static void pnv_alloc_idle_core_states(void) +{ + int i, j; + int nr_cores = cpu_nr_cores(); + u32 *core_idle_state; + + /* +* core_idle_state - First 8 bits track the idle state of each thread +* of the core. The 8th bit is the lock bit. Initially all thread bits +* are set. They are cleared when the thread enters deep idle state +* like sleep and winkle. Initially the lock bit is cleared. +* The lock bit has 2 purposes +* a. While the first thread is restoring core state, it prevents +* other threads in the core from switching to process context. +* b. 
While the last thread in the core is saving the core state, it +* prevents a different thread from waking up. +
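The core_idle_state layout described in the comment above can be illustrated with a single-threaded userspace model. This is a sketch under assumptions: the thread bits are taken to be bits 0 to 7 and the lock bit the next bit up (0x100), the names CORE_THREAD_BITS, CORE_LOCK_BIT, core_state_init, thread_enter_idle and thread_exit_idle are hypothetical, and the atomic updates and locking that real concurrent threads would need are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define CORE_THREAD_BITS 0x0FFu /* one bit per thread; set = not in deep idle */
#define CORE_LOCK_BIT    0x100u /* assumed position of the lock bit */

/* Initially all thread bits are set and the lock bit is clear. */
static uint32_t core_state_init(void)
{
	return CORE_THREAD_BITS;
}

/*
 * A thread enters a deep idle state: clear its bit.
 * Returns 1 if it was the last running thread of the core, i.e. the one
 * that would have to save core state.
 */
static int thread_enter_idle(uint32_t *state, unsigned int thread)
{
	assert(thread < 8);
	*state &= ~(1u << thread);
	return (*state & CORE_THREAD_BITS) == 0;
}

/*
 * A thread wakes up: set its bit again.
 * Returns 1 if it is the first thread of the core to wake, i.e. the one
 * that would have to restore core state.
 */
static int thread_exit_idle(uint32_t *state, unsigned int thread)
{
	int first;

	assert(thread < 8);
	first = (*state & CORE_THREAD_BITS) == 0;
	*state |= 1u << thread;
	return first;
}
```

In the kernel the lock bit exists precisely because these read-modify-write steps race between sibling threads; the model only shows how "last thread in" and "first thread out" fall out of the bit layout.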
Re: [PATCH] powerpc: Make doorbell check preemption safe
On Wednesday 20 May 2015 06:30 AM, Michael Neuling wrote:
> On Wed, 2015-05-20 at 00:30 +0530, Shreyas B. Prabhu wrote:
>> Doorbell can be used to cause ipi on cpus which are sibling threads on
>> the same core. So icp_native_cause_ipi checks if the destination cpu
>> is a sibling thread of the current cpu and uses doorbell in such cases.
>>
>> But while running with CONFIG_PREEMPT=y, since this section is
>> preemptible, we can run into issues if after we check if the destination
>> cpu is a sibling cpu, the task gets migrated from a sibling cpu to a
>> cpu on another core.
>>
>> Fix this by using get_cpu()/ put_cpu()
>
> Thanks. Looks good and it boots for me.
>
> Signed-off-by: Michael Neuling
>

mikey, Thanks!

mpe, if this looks ok, can you please pick it up?
Re: [PATCH 0/9] powerpc/powernv: Support for fastsleep and winkle
Hi,

In this patch series we use winkle for offlined cores. I successfully
tested the working of this with subcore functionality.

Test scenario was as follows:
1. Set SMT mode to 1, Set subcores-per-core to 1
2. Offline a core, in this case cpu 32 (sending it to winkle)
3. Set subcores-per-core to 4
4. Online the core
5. Start a guest (Topology 1 core 2 threads) on a subcore, in this case
   on cpu 36

This works without any glitch.

Thanks,
Shreyas

On Monday 25 August 2014 11:31 PM, Shreyas B. Prabhu wrote:
> Fast sleep is an idle state, where the core and the L1 and L2
> caches are brought down to a threshold voltage. This also means that
> the communication between L2 and L3 caches has to be fenced. However
> the current P8 chips have a bug wherein this fencing between L2 and
> L3 caches gets delayed by a cpu cycle. This can delay L3 response to
> the other cpus if they request data during this time. Thus they
> would fetch the same data from the memory which could lead to data
> corruption if L3 cache is not flushed.
> Patch 4 adds support to work around this.
>
> 'Deep Winkle' is a deeper idle state where core and private L2 are powered
> off. While it offers higher power savings, it is at the cost of losing
> hypervisor register state and higher latency.
> Patch 5-9 adds support for winkle and uses it for offline cpus.
>
> Patch 1 - Moves parameters required to discover idle states to a location
> common to both cpuidle driver and powernv core code
> Patch 2 - Populates idle state details from device tree
> Patch 3 - Enables cpus to run guest after waking up from fastsleep/winkle
>
>
> Cc: Benjamin Herrenschmidt
> Cc: Paul Mackerras
> Cc: Michael Ellerman
> Cc: Rafael J. Wysocki
> Cc: Srivatsa S. Bhat
> Cc: Preeti U. Murthy
> Cc: Vaidyanathan Srinivasan
> Cc: Rob Herring
> Cc: Grant Likely
> Cc: devicet...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
>
> Preeti U Murthy (2):
>   cpuidle/powernv: Populate cpuidle state details by querying the
>     device-tree
>   powerpc/powernv/cpuidle: Add workaround to enable fastsleep
>
> Shreyas B. Prabhu (6):
>   powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
>     fast-sleep
>   powerpc/powernv: Add OPAL call to save and restore
>   powerpc: Adding macro for accessing Thread Switch Control Register
>   powerpc/powernv: Add winkle infrastructure
>   powerpc/powernv: Discover and enable winkle
>   powerpc/powernv: Enter deepest supported idle state in offline
>
> Srivatsa S. Bhat (1):
>   powerpc/powernv: Enable Offline CPUs to enter deep idle states
>
>  arch/powerpc/include/asm/machdep.h             |   4 +
>  arch/powerpc/include/asm/opal.h                |  10 ++
>  arch/powerpc/include/asm/paca.h                |   3 +
>  arch/powerpc/include/asm/ppc-opcode.h          |   2 +
>  arch/powerpc/include/asm/processor.h           |   6 +-
>  arch/powerpc/include/asm/reg.h                 |   1 +
>  arch/powerpc/kernel/asm-offsets.c              |   1 +
>  arch/powerpc/kernel/exceptions-64s.S           |  37 ++---
>  arch/powerpc/kernel/idle.c                     |  30
>  arch/powerpc/kernel/idle_power7.S              |  83 +-
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
>  arch/powerpc/platforms/powernv/powernv.h       |   8 +
>  arch/powerpc/platforms/powernv/setup.c         | 217 +
>  arch/powerpc/platforms/powernv/smp.c           |  13 +-
>  arch/powerpc/platforms/powernv/subcore.c       |  15 ++
>  drivers/cpuidle/cpuidle-powernv.c              |  40 -
>  16 files changed, 439 insertions(+), 33 deletions(-)
>
Re: [PATCH 0/9] powerpc/powernv: Support for fastsleep and winkle
Hi,

Any updates on this patch series?

On Monday 25 August 2014 11:31 PM, Shreyas B. Prabhu wrote:
> [...]
Re: [PATCH 0/9] powerpc/powernv: Support for fastsleep and winkle
Hi,

Any updates on this patch series?

On Thursday 18 September 2014 08:41 AM, Shreyas B Prabhu wrote:
> [...]
Re: [PATCH 0/9] powerpc/powernv: Support for fastsleep and winkle
Hi Rafael, On Tuesday 30 September 2014 04:58 AM, Rafael J. Wysocki wrote: > On Monday, September 29, 2014 03:53:06 PM Shreyas B Prabhu wrote: >> Hi, >> Any updates on this patch series? > > I have a couple of patches from there in my tree it seems. Please have a look > at linux-pm.git/linux-next and please let me know if that's the case. > I checked linux-pm.git/linux-net (Last commit 067c17382165). None of the patches in this series are present in the tree. > >> On Thursday 18 September 2014 08:41 AM, Shreyas B Prabhu wrote: >>> Hi, >>> >>> In this patch series we use winkle for offlined cores. I successfully >>> tested the working of this with subcore functionality. >>> >>> Test scenario was as follows: >>> 1. Set SMT mode to 1, Set subores-per-core to 1 >>> 2. Offline a core, in this case cpu 32 (sending it to winkle) >>> 3. Set subcores-per-core to 4 >>> 4. Online the core >>> 5. Start a guest (Topology 1 core 2 threads) on a subcore, in this case >>> on cpu 36 >>> >>> This works without any glitch. >>> >>> Thanks, >>> Shreyas >>> >>> On Monday 25 August 2014 11:31 PM, Shreyas B. Prabhu wrote: >>>> Fast sleep is an idle state, where the core and the L1 and L2 >>>> caches are brought down to a threshold voltage. This also means that >>>> the communication between L2 and L3 caches have to be fenced. However >>>> the current P8 chips have a bug wherein this fencing between L2 and >>>> L3 caches get delayed by a cpu cycle. This can delay L3 response to >>>> the other cpus if they request for data during this time. Thus they >>>> would fetch the same data from the memory which could lead to data >>>> corruption if L3 cache is not flushed. >>>> Patch 4 adds support to work around this. >>>> >>>> 'Deep Winkle' is a deeper idle state where core and private L2 are powered >>>> off. While it offers higher power savings, it is at the cost of losing >>>> hypervisor register state and higher latency. >>>> Patch 5-9 adds support for winkle and uses it for offline cpus. 
>>>>
>>>> Patch 1 - Moves parameters required discover idle states to a location
>>>> common to both cpuidle driver and powernv core code
>>>> Patch 2 - Populates idle state details from device tree
>>>> Patch 3 - Enables cpus to run guest after waking up from fastsleep/winkle
>>>>
>>>>
>>>> Cc: Benjamin Herrenschmidt
>>>> Cc: Paul Mackerras
>>>> Cc: Michael Ellerman
>>>> Cc: Rafael J. Wysocki
>>>> Cc: Srivatsa S. Bhat
>>>> Cc: Preeti U. Murthy
>>>> Cc: Vaidyanathan Srinivasan
>>>> Cc: Rob Herring
>>>> Cc: Grant Likely
>>>> Cc: devicet...@vger.kernel.org
>>>> Cc: linux...@vger.kernel.org
>>>> Cc: linuxppc-...@lists.ozlabs.org
>>>>
>>>> Preeti U Murthy (2):
>>>> cpuidle/powernv: Populate cpuidle state details by querying the
>>>> device-tree
>>>> powerpc/powernv/cpuidle: Add workaround to enable fastsleep
>>>>
>>>> Shreyas B. Prabhu (6):
>>>> powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
>>>> fast-sleep
>>>> powerpc/powernv: Add OPAL call to save and restore
>>>> powerpc: Adding macro for accessing Thread Switch Control Register
>>>> powerpc/powernv: Add winkle infrastructure
>>>> powerpc/powernv: Discover and enable winkle
>>>> powerpc/powernv: Enter deepest supported idle state in offline
>>>>
>>>> Srivatsa S. Bhat (1):
>>>> powerpc/powernv: Enable Offline CPUs to enter deep idle states
>>>>
>>>> arch/powerpc/include/asm/machdep.h             |   4 +
>>>> arch/powerpc/include/asm/opal.h                |  10 ++
>>>> arch/powerpc/include/asm/paca.h                |   3 +
>>>> arch/powerpc/include/asm/ppc-opcode.h          |   2 +
>>>> arch/powerpc/include/asm/processor.h           |   6 +-
>>>> arch/powerpc/include/asm/reg.h                 |   1 +
>>>> arch/powerpc/kernel/asm-offsets.c              |   1 +
>>>> arch/powerpc/kernel/exceptions-64s.S           |  37 ++---
>>>> arch/powerpc/kernel/idle.c                     |  30
>>>> arch/powerpc/kernel/idle_power7.S              |  83 +-
>>>> arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
>>>> arch/powerpc/platforms/powernv/powernv.h       |   8 +
>>>> arch/powerpc/platforms/powernv/setup.c         | 217 +
>>>> arch/powerpc/platforms/powernv/smp.c           |  13 +-
>>>> arch/powerpc/platforms/powernv/subcore.c       |  15 ++
>>>> drivers/cpuidle/cpuidle-powernv.c              |  40 -
>>>> 16 files changed, 439 insertions(+), 33 deletions(-)
>>>>
>>
>
[PATCH] powerpc/powernv: Fix build error when CONFIG_SMP=n
Fix the following build error when compiled with CONFIG_SMP=n:

arch/powerpc/platforms/powernv/setup.c: In function ‘pnv_kexec_wait_secondaries_down’:
arch/powerpc/platforms/powernv/setup.c:179:4: error: implicit declaration of function ‘get_hard_smp_processor_id’ [-Werror=implicit-function-declaration]
    rc = opal_query_cpu_status(get_hard_smp_processor_id(i),

The usage of get_hard_smp_processor_id() needs the declaration from
<asm/smp.h>. The file setup.c includes <linux/sched.h>, which in turn
includes <linux/smp.h>. However, <linux/smp.h> includes <asm/smp.h>
only on SMP configs and hence UP builds fail.

Fix this by directly including <asm/smp.h> in setup.c unconditionally.

Reported-by: Geert Uytterhoeven
Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/platforms/powernv/setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 8723d32..e6bde98 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include <asm/smp.h>
 
 #include "powernv.h"
 
-- 
1.9.0
[PATCH 1/2] powerpc/powernv: include asm/smp.h to handle UP config
The build throws the following errors when CONFIG_SMP=n:

arch/powerpc/platforms/powernv/setup.c: In function ‘pnv_kexec_wait_secondaries_down’:
arch/powerpc/platforms/powernv/setup.c:179:4: error: implicit declaration of function ‘get_hard_smp_processor_id’
    rc = opal_query_cpu_status(get_hard_smp_processor_id(i),

The usage of get_hard_smp_processor_id() needs the declaration from
<asm/smp.h>. The file setup.c includes <linux/sched.h>, which in turn
includes <linux/smp.h>. However, <linux/smp.h> includes <asm/smp.h>
only on SMP configs and hence UP builds fail.

Fix this by directly including <asm/smp.h> in setup.c unconditionally.

Reported-by: Geert Uytterhoeven
Reviewed-by: Srivatsa S. Bhat
Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/platforms/powernv/setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 8c16a5f..678573c 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -35,6 +35,7 @@
 #include
 #include
 #include
+#include <asm/smp.h>
 
 #include "powernv.h"
 
-- 
1.9.0
[PATCH 2/2] powerpc/powernv : Disable subcore for UP configs
The build throws the following errors when CONFIG_SMP=n:

arch/powerpc/platforms/powernv/subcore.c: In function ‘cpu_update_split_mode’:
arch/powerpc/platforms/powernv/subcore.c:274:15: error: ‘setup_max_cpus’ undeclared (first use in this function)
arch/powerpc/platforms/powernv/subcore.c:285:5: error: lvalue required as left operand of assignment

The 'setup_max_cpus' variable is relevant only on SMP, so there is no
point working around it for UP. Furthermore, subcore.c itself is relevant
only on SMP and hence the better solution is to exclude subcore.c for UP
builds.

Signed-off-by: Shreyas B. Prabhu
---
This patch applies on top of ben/powerpc.git/next branch

 arch/powerpc/platforms/powernv/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 4ad0d34..636d206 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,9 +1,9 @@
 obj-y += setup.o opal-takeover.o opal-wrappers.o opal.o opal-async.o
 obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
-obj-y += opal-msglog.o subcore.o subcore-asm.o
+obj-y += opal-msglog.o subcore-asm.o
 
-obj-$(CONFIG_SMP) += smp.o
+obj-$(CONFIG_SMP) += smp.o subcore.o
 obj-$(CONFIG_PCI) += pci.o pci-p5ioc2.o pci-ioda.o
 obj-$(CONFIG_EEH) += eeh-ioda.o eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
-- 
1.9.0
[PATCHv2 1/2] powerpc/powernv: include asm/smp.h to fix UP build failure
The build throws the following errors when CONFIG_SMP=n:

arch/powerpc/platforms/powernv/setup.c: In function ‘pnv_kexec_wait_secondaries_down’:
arch/powerpc/platforms/powernv/setup.c:179:4: error: implicit declaration of function ‘get_hard_smp_processor_id’
    rc = opal_query_cpu_status(get_hard_smp_processor_id(i),

The usage of get_hard_smp_processor_id() needs the declaration from
<asm/smp.h>. The file setup.c includes <linux/sched.h>, which in turn
includes <linux/smp.h>. However, <linux/smp.h> includes <asm/smp.h>
only on SMP configs and hence UP builds fail.

Fix this by directly including <asm/smp.h> in setup.c unconditionally.

Reported-by: Geert Uytterhoeven
Reviewed-by: Srivatsa S. Bhat
Signed-off-by: Shreyas B. Prabhu
---
Changes in v2:
Commit message improved based on suggestion.

 arch/powerpc/platforms/powernv/setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 8c16a5f..678573c 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -35,6 +35,7 @@
 #include
 #include
 #include
+#include <asm/smp.h>
 
 #include "powernv.h"
 
-- 
1.9.0
[PATCH v2 2/2] powerpc/powernv : Disable subcore for UP configs
The build throws the following errors when CONFIG_SMP=n:

arch/powerpc/platforms/powernv/subcore.c: In function ‘cpu_update_split_mode’:
arch/powerpc/platforms/powernv/subcore.c:274:15: error: ‘setup_max_cpus’ undeclared (first use in this function)
arch/powerpc/platforms/powernv/subcore.c:285:5: error: lvalue required as left operand of assignment

The 'setup_max_cpus' variable is relevant only on SMP, so there is no
point working around it for UP. Furthermore, subcore itself is relevant
only on SMP and hence the better solution is to exclude subcore.o and
subcore-asm.o for UP builds.

Signed-off-by: Shreyas B. Prabhu
---
Changes in v2:
Excluding subcore-asm.o which is part of the subcore feature for UP configs.

 arch/powerpc/platforms/powernv/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 4ad0d34..d55891f 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,9 +1,9 @@
 obj-y += setup.o opal-takeover.o opal-wrappers.o opal.o opal-async.o
 obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
-obj-y += opal-msglog.o subcore.o subcore-asm.o
+obj-y += opal-msglog.o
 
-obj-$(CONFIG_SMP) += smp.o
+obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI) += pci.o pci-p5ioc2.o pci-ioda.o
 obj-$(CONFIG_EEH) += eeh-ioda.o eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
-- 
1.9.0
[PATCH 0/9] powerpc/powernv: Support for fastsleep and winkle
Fast sleep is an idle state where the core and the L1 and L2 caches are
brought down to a threshold voltage. This also means that the
communication between the L2 and L3 caches has to be fenced. However, the
current P8 chips have a bug wherein this fencing between the L2 and L3
caches gets delayed by a cpu cycle. This can delay the L3 response to the
other cpus if they request data during this time. Thus they would fetch
the same data from memory, which could lead to data corruption if the L3
cache is not flushed.
Patch 4 adds support to work around this.

'Deep Winkle' is a deeper idle state where the core and private L2 are
powered off. While it offers higher power savings, it is at the cost of
losing hypervisor register state and higher latency.
Patches 5-9 add support for winkle and use it for offline cpus.

Patch 1 - Moves parameters required to discover idle states to a location
common to both the cpuidle driver and powernv core code
Patch 2 - Populates idle state details from device tree
Patch 3 - Enables cpus to run guest after waking up from fastsleep/winkle

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rafael J. Wysocki
Cc: Srivatsa S. Bhat
Cc: Preeti U. Murthy
Cc: Vaidyanathan Srinivasan
Cc: Rob Herring
Cc: Grant Likely
Cc: devicet...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org

Preeti U Murthy (2):
  cpuidle/powernv: Populate cpuidle state details by querying the
    device-tree
  powerpc/powernv/cpuidle: Add workaround to enable fastsleep

Shreyas B. Prabhu (6):
  powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
    fast-sleep
  powerpc/powernv: Add OPAL call to save and restore
  powerpc: Adding macro for accessing Thread Switch Control Register
  powerpc/powernv: Add winkle infrastructure
  powerpc/powernv: Discover and enable winkle
  powerpc/powernv: Enter deepest supported idle state in offline

Srivatsa S. Bhat (1):
  powerpc/powernv: Enable Offline CPUs to enter deep idle states

 arch/powerpc/include/asm/machdep.h             |   4 +
 arch/powerpc/include/asm/opal.h                |  10 ++
 arch/powerpc/include/asm/paca.h                |   3 +
 arch/powerpc/include/asm/ppc-opcode.h          |   2 +
 arch/powerpc/include/asm/processor.h           |   6 +-
 arch/powerpc/include/asm/reg.h                 |   1 +
 arch/powerpc/kernel/asm-offsets.c              |   1 +
 arch/powerpc/kernel/exceptions-64s.S           |  37 ++---
 arch/powerpc/kernel/idle.c                     |  30
 arch/powerpc/kernel/idle_power7.S              |  83 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
 arch/powerpc/platforms/powernv/powernv.h       |   8 +
 arch/powerpc/platforms/powernv/setup.c         | 217 +
 arch/powerpc/platforms/powernv/smp.c           |  13 +-
 arch/powerpc/platforms/powernv/subcore.c       |  15 ++
 drivers/cpuidle/cpuidle-powernv.c              |  40 -
 16 files changed, 439 insertions(+), 33 deletions(-)

-- 
1.9.0
[PATCH 4/9] powerpc/powernv/cpuidle: Add workaround to enable fastsleep
From: Preeti U Murthy

Fast sleep is an idle state where the core and the L1 and L2 caches are
brought down to a threshold voltage. This also means that the
communication between the L2 and L3 caches has to be fenced. However, the
current P8 chips have a bug wherein this fencing between the L2 and L3
caches gets delayed by a cpu cycle. This can delay the L3 response to the
other cpus if they request data during this time. Thus they would fetch
the same data from memory, which could lead to data corruption if the L3
cache is not flushed.

The cpu idle states save power at a core level and not at a thread level.
Hence the power saving is determined by the shallowest idle state that
any thread of the core is in. The above issue in fastsleep will arise only
when all the threads in a core either enter fastsleep or some of them
enter any deeper idle states, with only a few being in fastsleep.

This patch therefore implements a workaround for this bug by ensuring
that, each time a cpu goes to fastsleep, it checks if it is the last
thread in the core to enter fastsleep. If so, it needs to make an opal
call to get around the above mentioned fastsleep problem in the hardware
before issuing the sleep instruction. Similarly, when a thread in a core
comes out of fastsleep, it needs to verify if it is the first thread in
the core to come out of fastsleep and issue the opal call to revert the
changes made while entering fastsleep.

For the same reason mentioned above we need to take care of offline
threads as well, since we allow them to enter fastsleep and, with support
for deep winkle coming in soon, they can enter winkle as well. We
therefore ensure that even offline threads make the above mentioned opal
calls similarly, so that as long as the threads in a core are in an idle
state >= fastsleep, we have the workaround in place. Whenever a thread
comes out of either of these states, it needs to verify if the opal call
has been made and if so it will revert it. For now this patch ensures
that offline threads enter fastsleep.

We need to be able to synchronize the cpus in a core which are entering
and exiting fastsleep so as to ensure that *only* the last thread in the
core to enter fastsleep and the first to exit fastsleep issue the opal
call. To do so, we need a per-core lock and counter. The counter is
required to keep track of the number of threads in a core which are in an
idle state >= fastsleep. To make the implementation of this simple, we
introduce a per-cpu lock and counter, and every thread always takes the
primary thread's lock and modifies the primary thread's counter. This
effectively makes them per-core entities.

But the workaround is abstracted in the powernv core code and neither the
hotplug path nor the cpuidle driver need to bother about it. All they need
to know is if fastsleep, with error or no error, is present as an idle
state.

Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rafael J. Wysocki
Signed-off-by: Shreyas B. Prabhu
Signed-off-by: Preeti U Murthy
---
 arch/powerpc/include/asm/machdep.h             |   3 +
 arch/powerpc/include/asm/opal.h                |   3 +
 arch/powerpc/include/asm/processor.h           |   4 +-
 arch/powerpc/kernel/idle.c                     |  19
 arch/powerpc/kernel/idle_power7.S              |   2 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/setup.c         | 139 ++---
 drivers/cpuidle/cpuidle-powernv.c              |   8 +-
 8 files changed, 140 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index b125cea..f37014f 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -298,6 +298,9 @@ struct machdep_calls {
 #ifdef CONFIG_MEMORY_HOTREMOVE
 	int (*remove_memory)(u64, u64);
 #endif
+	/* Idle handlers */
+	void		(*setup_idle)(void);
+	unsigned long	(*power7_sleep)(void);
 };
 
 extern void e500_idle(void);
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 28b8342..166d572 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -149,6 +149,7 @@ struct opal_sg_list {
 #define OPAL_DUMP_INFO2				94
 #define OPAL_PCI_EEH_FREEZE_SET			97
 #define OPAL_HANDLE_HMI				98
+#define OPAL_CONFIG_IDLE_STATE			99
 #define OPAL_REGISTER_DUMP_REGION		101
 #define OPAL_UNREGISTER_DUMP_REGION		102
 
@@ -775,6 +776,7 @@ extern struct device_node *opal_node;
 /* Flags used for idle state discovery from the device tree */
 #define IDLE_INST_NAP	0x0001 /* nap instruction can be used */
 #define IDLE_INST_SLEEP	0x0002 /* sleep instruction can be used */
+#define IDLE_INST_SLEEP_ER1	0x0008 /* Use sleep
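
The last-in/first-out rule described in the commit message above can be sketched in plain C. This is only an illustration under stated assumptions: `struct core_idle_state`, `fastsleep_enter()` and `fastsleep_exit()` are hypothetical names that do not appear in the patch, the firmware call is reduced to a boolean return value, and the per-core lock is left out because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical per-core bookkeeping. In the patch description this state
 * lives with the primary thread and is protected by that thread's lock;
 * the lock is omitted here because the sketch is single-threaded.
 */
struct core_idle_state {
	int threads_per_core;    /* e.g. 8 when the core runs SMT8 */
	int nr_in_deep_idle;     /* threads in an idle state >= fastsleep */
	bool workaround_active;  /* true while the workaround is applied */
};

/*
 * Call before a thread issues the sleep instruction.
 * Returns true if this thread is the last one of the core to go idle
 * and therefore has to apply the workaround.
 */
static bool fastsleep_enter(struct core_idle_state *c)
{
	assert(c->nr_in_deep_idle < c->threads_per_core);
	c->nr_in_deep_idle++;
	if (c->nr_in_deep_idle == c->threads_per_core) {
		c->workaround_active = true;
		return true;
	}
	return false;
}

/*
 * Call right after a thread wakes up.
 * Returns true if the workaround was in place, which means this thread
 * is the first of the core to wake and has to revert it.
 */
static bool fastsleep_exit(struct core_idle_state *c)
{
	bool must_revert = c->workaround_active;

	assert(c->nr_in_deep_idle > 0);
	c->workaround_active = false;
	c->nr_in_deep_idle--;
	return must_revert;
}
```

With two threads per core, only the second `fastsleep_enter()` and the first `fastsleep_exit()` report that the firmware call is needed, which matches the "last in, first out" wording of the commit message.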
[PATCH 1/9] powerpc/powernv: Enable Offline CPUs to enter deep idle states
From: "Srivatsa S. Bhat"

The offline cpus should enter deep idle states so as to gain maximum
powersavings when the entire core is offline. To do so the offline path
must be made aware of the available deepest idle state. Hence probe the
device tree for the possible idle states in powernv core code and expose
the deepest idle state through flags.

Since the device tree is probed by the cpuidle driver as well, move the
parameters required to discover the idle states into an appropriate
place common to both the driver and the powernv core code.

Another point is that the fastsleep idle state may require workarounds in
the kernel to function properly. This workaround is introduced in the
subsequent patches. However, neither the cpuidle driver nor the hotplug
path needs to be bothered about this workaround. It will be taken care of
by the core powernv code.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rafael J. Wysocki
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat
Signed-off-by: Shreyas B. Prabhu
[ Changelog modified by pre...@linux.vnet.ibm.com ]
Signed-off-by: Preeti U. Murthy
---
 arch/powerpc/include/asm/opal.h          |  4 +++
 arch/powerpc/platforms/powernv/powernv.h |  7 +
 arch/powerpc/platforms/powernv/setup.c   | 51
 arch/powerpc/platforms/powernv/smp.c     | 11 ++-
 drivers/cpuidle/cpuidle-powernv.c        |  7 ++---
 5 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 86055e5..28b8342 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -772,6 +772,10 @@ extern struct kobject *opal_kobj;
 /* /ibm,opal */
 extern struct device_node *opal_node;
 
+/* Flags used for idle state discovery from the device tree */
+#define IDLE_INST_NAP	0x0001 /* nap instruction can be used */
+#define IDLE_INST_SLEEP	0x0002 /* sleep instruction can be used */
+
 /* API functions */
 int64_t opal_invalid_call(void);
 int64_t opal_console_write(int64_t term_number, __be64 *length,
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 75501bf..31ece13 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -23,6 +23,13 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
 }
 #endif
 
+/* Flags to indicate which of the CPU idle states are available for use */
+
+#define IDLE_USE_NAP		(1UL << 0)
+#define IDLE_USE_SLEEP		(1UL << 1)
+
+extern unsigned int pnv_get_supported_cpuidle_states(void);
+
 extern void pnv_lpc_init(void);
 
 bool cpu_core_split_required(void);
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 5a0e2dc..2dca1d8 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -282,6 +282,57 @@ static void __init pnv_setup_machdep_rtas(void)
 }
 #endif /* CONFIG_PPC_POWERNV_RTAS */
 
+static unsigned int supported_cpuidle_states;
+
+unsigned int pnv_get_supported_cpuidle_states(void)
+{
+	return supported_cpuidle_states;
+}
+
+static int __init pnv_probe_idle_states(void)
+{
+	struct device_node *power_mgt;
+	struct property *prop;
+	int dt_idle_states;
+	u32 *flags;
+	int i;
+
+	supported_cpuidle_states = 0;
+
+	if (cpuidle_disable != IDLE_NO_OVERRIDE)
+		return 0;
+
+	if (!firmware_has_feature(FW_FEATURE_OPALv3))
+		return 0;
+
+	power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
+	if (!power_mgt) {
+		pr_warn("opal: PowerMgmt Node not found\n");
+		return 0;
+	}
+
+	prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL);
+	if (!prop) {
+		pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
+		return 0;
+	}
+
+	dt_idle_states = prop->length / sizeof(u32);
+	flags = (u32 *) prop->value;
+
+	for (i = 0; i < dt_idle_states; i++) {
+		if (flags[i] & IDLE_INST_NAP)
+			supported_cpuidle_states |= IDLE_USE_NAP;
+
+		if (flags[i] & IDLE_INST_SLEEP)
+			supported_cpuidle_states |= IDLE_USE_SLEEP;
+	}
+
+	return 0;
+}
+
+subsys_initcall(pnv_probe_idle_states);
+
 static int __init pnv_probe(void)
 {
 	unsigned long root = of_get_flat_dt_root();
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 5fcfcf4..3ad31d2 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -149,6 +149,7 @@ static int pnv_smp_cpu_disable(void)
 static void pnv_smp_cpu_kill_self(void)
 {
 	unsigned int cpu
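
As a rough userspace illustration of the flag folding done by pnv_probe_idle_states() in the patch above, the loop can be tried on a plain array. The macro values are copied from the patch; the helper name `idle_mask_from_flags()` and the idea of passing the flag words as an ordinary host-endian array are assumptions made only for this sketch, not part of the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the patch above */
#define IDLE_INST_NAP   0x0001 /* nap instruction can be used */
#define IDLE_INST_SLEEP 0x0002 /* sleep instruction can be used */
#define IDLE_USE_NAP    (1UL << 0)
#define IDLE_USE_SLEEP  (1UL << 1)

/*
 * Fold one flag word per idle state into a single "supported" mask.
 * 'flags' stands in for the ibm,cpu-idle-state-flags property and is
 * assumed to be in host byte order already (hypothetical helper).
 */
static unsigned int idle_mask_from_flags(const uint32_t *flags, size_t n)
{
	unsigned int mask = 0;

	for (size_t i = 0; i < n; i++) {
		if (flags[i] & IDLE_INST_NAP)
			mask |= IDLE_USE_NAP;
		if (flags[i] & IDLE_INST_SLEEP)
			mask |= IDLE_USE_SLEEP;
	}
	return mask;
}
```

A device tree that lists one nap state and one sleep state would yield a mask with both IDLE_USE_NAP and IDLE_USE_SLEEP set, while an empty property leaves the mask at zero.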
[PATCH 3/9] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep
When guests have to be launched, the secondary threads which are offline
are woken up to run the guests. Today these threads wake up from nap
and check if they have to run guests. Now that the offline secondary
threads can go to fastsleep or, going forward, a deeper idle state such as
winkle, add this check to the wakeup path from any of the deep idle states
as well.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: linuxppc-...@lists.ozlabs.org
Suggested-by: "Srivatsa S. Bhat"
Signed-off-by: Shreyas B. Prabhu
[ Changelog added by ]
Signed-off-by: Preeti U Murthy
---
 arch/powerpc/kernel/exceptions-64s.S | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 050f79a..c64f3cc0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -100,25 +100,8 @@ system_reset_pSeries:
 	SET_SCRATCH0(r13)
 #ifdef CONFIG_PPC_P7_NAP
 BEGIN_FTR_SECTION
-	/* Running native on arch 2.06 or later, check if we are
-	 * waking up from nap. We only handle no state loss and
-	 * supervisor state loss. We do -not- handle hypervisor
-	 * state loss at this time.
-	 */
-	mfspr	r13,SPRN_SRR1
-	rlwinm.	r13,r13,47-31,30,31
-	beq	9f
 
-	/* waking up from powersave (nap) state */
-	cmpwi	cr1,r13,2
-	/* Total loss of HV state is fatal, we could try to use the
-	 * PIR to locate a PACA, then use an emergency stack etc...
-	 * OPAL v3 based powernv platforms have new idle states
-	 * which fall in this catagory.
-	 */
-	bgt	cr1,8f
 	GET_PACA(r13)
-
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	li	r0,KVM_HWTHREAD_IN_KERNEL
 	stb	r0,HSTATE_HWTHREAD_STATE(r13)
@@ -131,13 +114,27 @@ BEGIN_FTR_SECTION
 1:
 #endif
 
+	/* Running native on arch 2.06 or later, check if we are
+	 * waking up from nap. We only handle no state loss and
+	 * supervisor state loss. We do -not- handle hypervisor
+	 * state loss at this time.
+	 */
+	mfspr	r13,SPRN_SRR1
+	rlwinm.	r13,r13,47-31,30,31
+	beq	9f
+
+	/* waking up from powersave (nap) state */
+	cmpwi	cr1,r13,2
+	GET_PACA(r13)
+
+	bgt	cr1,8f
+
 	beq	cr1,2f
 	b	power7_wakeup_noloss
 2:	b	power7_wakeup_loss
 
 	/* Fast Sleep wakeup on PowerNV */
-8:	GET_PACA(r13)
-	b	power7_wakeup_tb_loss
+8:	b	power7_wakeup_tb_loss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-- 
1.9.0
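
For readers less familiar with the rlwinm/cmpwi sequence in the hunk above, the same decision can be written out in C. This is a sketch, not kernel code: `enum wake_kind` and `classify_wakeup()` are illustrative names, and the shift assumes IBM bit numbering on a 64-bit SRR1, where bit 0 is the most significant bit and bits 46:47 therefore sit 16 bits above the least significant bit.

```c
#include <stdint.h>

/* Illustrative names for the four encodings of SRR1 bits 46:47 */
enum wake_kind {
	WAKE_NOT_POWERSAVE,   /* 0b00: not a wakeup from an idle state */
	WAKE_NO_LOSS,         /* 0b01: handled by power7_wakeup_noloss */
	WAKE_SUPERVISOR_LOSS, /* 0b10: handled by power7_wakeup_loss */
	WAKE_HYPERVISOR_LOSS  /* 0b11: handled by power7_wakeup_tb_loss */
};

static enum wake_kind classify_wakeup(uint64_t srr1)
{
	/* Same field that "rlwinm. r13,r13,47-31,30,31" extracts */
	unsigned int field = (unsigned int)((srr1 >> 16) & 0x3);

	if (field == 0)
		return WAKE_NOT_POWERSAVE;   /* beq 9f */
	if (field > 2)
		return WAKE_HYPERVISOR_LOSS; /* bgt cr1,8f */
	if (field == 2)
		return WAKE_SUPERVISOR_LOSS; /* beq cr1,2f */
	return WAKE_NO_LOSS;                 /* falls through to noloss */
}
```

The order of the checks mirrors the branch order in the patched assembly, so a value of 3 in the field reaches the fast-sleep (and later winkle) wakeup path.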
[PATCH 8/9] powerpc/powernv: Discover and enable winkle
Discover winkle from the device tree. If it is supported, make the OPAL
calls necessary to save HIDs, HMEER, HSPRG0 and LPCR. Also make an OPAL
call when the HID0 value is modified during split/unsplit of cores.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/include/asm/opal.h          |  1 +
 arch/powerpc/platforms/powernv/powernv.h |  1 +
 arch/powerpc/platforms/powernv/setup.c   | 75
 arch/powerpc/platforms/powernv/subcore.c | 15 +++
 4 files changed, 92 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index d376020..a77957f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -778,6 +778,7 @@ extern struct device_node *opal_node;
 #define IDLE_INST_NAP	0x0001 /* nap instruction can be used */
 #define IDLE_INST_SLEEP	0x0002 /* sleep instruction can be used */
 #define IDLE_INST_SLEEP_ER1	0x0008 /* Use sleep with work around*/
+#define IDLE_INST_WINKLE	0x0004 /* winkle instruction can be used */
 
 /* API functions */
 int64_t opal_invalid_call(void);
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 31ece13..76b37f8 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -27,6 +27,7 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
 
 #define IDLE_USE_NAP		(1UL << 0)
 #define IDLE_USE_SLEEP		(1UL << 1)
+#define IDLE_USE_WINKLE		(1UL << 3)
 
 extern unsigned int pnv_get_supported_cpuidle_states(void);
 
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index f45b52d..13c5e49 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -273,6 +273,65 @@ unsigned int pnv_get_supported_cpuidle_states(void)
 	return supported_cpuidle_states;
 }
 
+int pnv_save_sprs_for_winkle(void)
+{
+	int cpu;
+	int rc;
+
+	/*
+	 * hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric across
+	 * all cpus at boot. Get these reg values of current cpu and use the
+	 * same across all cpus.
+	 */
+	uint64_t lpcr_val = mfspr(SPRN_LPCR);
+	uint64_t hid0_val = mfspr(SPRN_HID0);
+	uint64_t hid1_val = mfspr(SPRN_HID1);
+	uint64_t hid4_val = mfspr(SPRN_HID4);
+	uint64_t hid5_val = mfspr(SPRN_HID5);
+	uint64_t hmeer_val = mfspr(SPRN_HMEER);
+
+	for_each_possible_cpu(cpu) {
+		uint64_t pir = get_hard_smp_processor_id(cpu);
+		uint64_t local_paca_ptr = (uint64_t)&paca[cpu];
+
+		rc = opal_slw_set_reg(pir, SPRN_HSPRG0, local_paca_ptr);
+		if (rc != 0)
+			return rc;
+
+		rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
+		if (rc != 0)
+			return rc;
+
+		/* HIDs are per core registers */
+		if (cpu_thread_in_core(cpu) == 0) {
+
+			rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val);
+			if (rc != 0)
+				return rc;
+
+			rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val);
+			if (rc != 0)
+				return rc;
+
+			rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val);
+			if (rc != 0)
+				return rc;
+
+			rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val);
+			if (rc != 0)
+				return rc;
+
+			rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val);
+			if (rc != 0)
+				return rc;
+
+		}
+
+	}
+
+	return 0;
+
+}
 static int __init pnv_probe_idle_states(void)
 {
 	struct device_node *power_mgt;
@@ -318,6 +377,22 @@ static int __init pnv_probe_idle_states(void)
 			supported_cpuidle_states |= IDLE_USE_SLEEP;
 			need_fastsleep_workaround = 1;
 		}
+
+		if (flags & IDLE_INST_WINKLE) {
+			/*
+			 * If winkle is supported, save HSPRG0, HIDs and LPCR
+			 * contents via OPAL. Enable winkle only if this
+			 * succeeds.
+			 */
+			int opal_ret_val = pnv_save_sprs_for_winkle();
+
+			if (!opal_ret_val)
+				supported_cpuidle_states |= IDLE_USE_WINKLE;
+			else
+				pr_warn("opal: opal_slw_set_reg failed with rc=%d, disabling winkle\n",
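
The split between per-thread and per-core registers in pnv_save_sprs_for_winkle() above can be made concrete with a small counting sketch. Everything here is hypothetical scaffolding: `DEMO_THREADS_PER_CORE`, `struct slw_plan` and `count_slw_writes()` are not kernel names, a fixed eight threads per core is assumed, and "thread 0 of a core" is approximated by `cpu % DEMO_THREADS_PER_CORE == 0`.

```c
#include <stddef.h>

#define DEMO_THREADS_PER_CORE 8 /* assumption: every core runs SMT8 */

/* How many register writes the loop would request from firmware */
struct slw_plan {
	size_t per_thread_writes; /* HSPRG0 and LPCR, once per thread */
	size_t per_core_writes;   /* HMEER, HID0, HID1, HID4, HID5 */
};

static struct slw_plan count_slw_writes(size_t nr_cpus)
{
	struct slw_plan p = { 0, 0 };

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		p.per_thread_writes += 2;
		/* Per-core registers are programmed via thread 0 only */
		if (cpu % DEMO_THREADS_PER_CORE == 0)
			p.per_core_writes += 5;
	}
	return p;
}
```

For 16 logical cpus (two cores) this gives 32 per-thread writes and 10 per-core writes, which is why the patch guards the HID and HMEER calls with the thread-in-core check instead of repeating them for every thread.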
[PATCH 9/9] powerpc/powernv: Enter deepest supported idle state in offline
Enter winkle during offline if supported, else fall back to sleep or nap.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/platforms/powernv/smp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 3ad31d2..e3fc2c9 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -169,8 +169,10 @@ static void pnv_smp_cpu_kill_self(void)
 	while (!generic_check_cpu_restart(cpu)) {
 		ppc64_runlatch_off();
 
-		/* If sleep is supported, go to sleep, instead of nap */
-		if (idle_states & IDLE_USE_SLEEP)
+		/* Go to deepest supported idle state */
+		if (idle_states & IDLE_USE_WINKLE)
+			power7_winkle();
+		else if (idle_states & IDLE_USE_SLEEP)
 			power7_sleep();
 		else
 			power7_nap(1);
-- 
1.9.0
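
The fallback order used in the hunk above can be isolated as a pure function, which makes it easy to check against different masks. The IDLE_USE_* values are the ones given in this series; `enum offline_idle` and `pick_offline_state()` are names invented for this sketch, and the real code calls power7_winkle(), power7_sleep() or power7_nap(1) directly instead of returning a value.

```c
/* Values as defined in this series (powernv.h) */
#define IDLE_USE_NAP    (1UL << 0)
#define IDLE_USE_SLEEP  (1UL << 1)
#define IDLE_USE_WINKLE (1UL << 3)

/* Hypothetical result type for the sketch */
enum offline_idle {
	OFFLINE_NAP,
	OFFLINE_SLEEP,
	OFFLINE_WINKLE
};

/* Pick the deepest supported state, falling back towards nap */
static enum offline_idle pick_offline_state(unsigned long idle_states)
{
	if (idle_states & IDLE_USE_WINKLE)
		return OFFLINE_WINKLE;
	if (idle_states & IDLE_USE_SLEEP)
		return OFFLINE_SLEEP;
	return OFFLINE_NAP;
}
```

Nap is the unconditional last resort here, exactly as in the patch, so a mask of zero still yields a usable idle state for the offline thread.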
[PATCH 6/9] powerpc: Adding macro for accessing Thread Switch Control Register
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/include/asm/reg.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 0c05059..cb65a73 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -371,6 +371,7 @@
 #define SPRN_DBAT7L	0x23F	/* Data BAT 7 Lower Register */
 #define SPRN_DBAT7U	0x23E	/* Data BAT 7 Upper Register */
 #define SPRN_PPR	0x380	/* SMT Thread status Register */
+#define SPRN_TSCR	0x399	/* Thread Switch Control Register */
 
 #define SPRN_DEC	0x016	/* Decrement Register */
 #define SPRN_DER	0x095	/* Debug Enable Regsiter */
-- 
1.9.0
[PATCH 5/9] powerpc/powernv: Add OPAL call to save and restore
PORE can be programmed to restore hypervisor registers when waking up
from deep cpu idle states like winkle. Add a call to pass the SPR address
and value to OPAL, which in turn will program PORE to restore the
register state.

Cc: linuxppc-...@lists.ozlabs.org
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Suggested-by: Vaidyanathan Srinivasan
Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/include/asm/opal.h                | 2 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 166d572..d376020 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -150,6 +150,7 @@ struct opal_sg_list {
 #define OPAL_PCI_EEH_FREEZE_SET			97
 #define OPAL_HANDLE_HMI				98
 #define OPAL_CONFIG_IDLE_STATE			99
+#define OPAL_SLW_SET_REG			100
 #define OPAL_REGISTER_DUMP_REGION		101
 #define OPAL_UNREGISTER_DUMP_REGION		102
 
@@ -978,6 +979,7 @@ extern int opal_handle_hmi_exception(struct pt_regs *regs);
 extern void opal_shutdown(void);
 extern int opal_resync_timebase(void);
 int64_t opal_config_idle_state(uint64_t state, uint64_t enter);
+int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
 
 extern void opal_lpc_init(void);
 
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 8d1e724..12e5d46 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -246,5 +246,6 @@ OPAL_CALL(opal_get_param, OPAL_GET_PARAM);
 OPAL_CALL(opal_set_param,			OPAL_SET_PARAM);
 OPAL_CALL(opal_handle_hmi,			OPAL_HANDLE_HMI);
 OPAL_CALL(opal_config_idle_state,		OPAL_CONFIG_IDLE_STATE);
+OPAL_CALL(opal_slw_set_reg,			OPAL_SLW_SET_REG);
 OPAL_CALL(opal_register_dump_region,		OPAL_REGISTER_DUMP_REGION);
 OPAL_CALL(opal_unregister_dump_region,		OPAL_UNREGISTER_DUMP_REGION);
-- 
1.9.0
[PATCH 7/9] powerpc/powernv: Add winkle infrastructure
Winkle causes power to be gated off to the entire chiplet. Hence the hypervisor/firmware state in the entire chiplet is lost. This patch adds necessary infrastructure to support winkle. Specifically does following: - Before entering winkle, save state of registers that need to be restored on wake up (SDR1, HFSCR) - SRR1 bits 46:47 which is used to identify which power saving mode cpu woke up from is '11' for both winkle and sleep. Hence introduce a flag in PACA to distinguish b/w winkle and sleep. - Upon waking up, restore all saved registers, recover slb Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-...@lists.ozlabs.org Suggested-by: Vaidyanathan Srinivasan Signed-off-by: Shreyas B. Prabhu --- arch/powerpc/include/asm/machdep.h | 1 + arch/powerpc/include/asm/paca.h| 3 ++ arch/powerpc/include/asm/ppc-opcode.h | 2 + arch/powerpc/include/asm/processor.h | 2 + arch/powerpc/kernel/asm-offsets.c | 1 + arch/powerpc/kernel/exceptions-64s.S | 4 +- arch/powerpc/kernel/idle.c | 11 + arch/powerpc/kernel/idle_power7.S | 81 +- arch/powerpc/platforms/powernv/setup.c | 24 ++ 9 files changed, 126 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index f37014f..0a3ced9 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -301,6 +301,7 @@ struct machdep_calls { /* Idle handlers */ void(*setup_idle)(void); unsigned long (*power7_sleep)(void); + unsigned long (*power7_winkle)(void); }; extern void e500_idle(void); diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index a5139ea..3358f09 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -158,6 +158,9 @@ struct paca_struct { * early exception handler for use by high level C handler */ struct opal_machine_check_event *opal_mc_evt; + + /* Flag to distinguish b/w sleep and winkle */ + u8 offline_state; #endif #ifdef CONFIG_PPC_BOOK3S_64 /* 
Exclusive emergency stack pointer for machine check exception. */ diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 6f85362..5155be7 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -194,6 +194,7 @@ #define PPC_INST_NAP 0x4c000364 #define PPC_INST_SLEEP 0x4c0003a4 +#define PPC_INST_WINKLE0x4c0003e4 /* A2 specific instructions */ #define PPC_INST_ERATWE0x7c0001a6 @@ -374,6 +375,7 @@ #define PPC_NAPstringify_in_c(.long PPC_INST_NAP) #define PPC_SLEEP stringify_in_c(.long PPC_INST_SLEEP) +#define PPC_WINKLE stringify_in_c(.long PPC_INST_WINKLE) /* BHRB instructions */ #define PPC_CLRBHRBstringify_in_c(.long PPC_INST_CLRBHRB) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 41953cd..00e3df9 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -455,6 +455,8 @@ extern void arch_setup_idle(void); extern void power7_nap(int check_irq); extern unsigned long power7_sleep(void); extern unsigned long __power7_sleep(void); +extern unsigned long power7_winkle(void); +extern unsigned long __power7_winkle(void); extern void flush_instruction_cache(void); extern void hard_reset_now(void); extern void poweroff_now(void); diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 9d7dede..ea98817 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -731,6 +731,7 @@ int main(void) DEFINE(OPAL_MC_SRR0, offsetof(struct opal_machine_check_event, srr0)); DEFINE(OPAL_MC_SRR1, offsetof(struct opal_machine_check_event, srr1)); DEFINE(PACA_OPAL_MC_EVT, offsetof(struct paca_struct, opal_mc_evt)); + DEFINE(PACAOFFLINESTATE, offsetof(struct paca_struct, offline_state)); #endif return 0; diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index c64f3cc0..6c6db2b 100644 --- a/arch/powerpc/kernel/exceptions-64s.S 
+++ b/arch/powerpc/kernel/exceptions-64s.S @@ -133,8 +133,8 @@ BEGIN_FTR_SECTION b power7_wakeup_noloss 2: b power7_wakeup_loss - /* Fast Sleep wakeup on PowerNV */ -8: b power7_wakeup_tb_loss + /* Fast Sleep / Winkle wakeup on PowerNV */ +8: b power7_wakeup_hv_state_loss 9: END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c index 1f268e0..ed46217 100644 --- a/arch/powerpc/kernel/idle.c +++ b/arch/powerpc/kernel/idle.c @@ -9
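The wake-up classification described in this patch can be sketched in plain C. This is only an illustration under stated assumptions, not the kernel code: the enum names and the OFFLINE_* values are hypothetical (the patch adds a u8 offline_state field to the PACA but the excerpt does not show its values), and the bit extraction mirrors the "rlwinm. r13,r13,47-31,30,31" sequence used in exceptions-64s.S.

```c
#include <assert.h>
#include <stdint.h>

/* Wake-up classes encoded in SRR1 bits 46:47 (IBM numbering, bit 0 = MSB). */
enum wake_class {
	WAKE_NOT_POWERSAVE   = 0,	/* not a power-save wake-up */
	WAKE_NO_LOSS         = 1,	/* no state lost */
	WAKE_SUPERVISOR_LOSS = 2,	/* supervisor state lost */
	WAKE_HV_LOSS         = 3,	/* hypervisor state lost: sleep AND winkle */
};

/* Hypothetical values for the PACA flag; the real encoding is not shown. */
enum offline_state {
	OFFLINE_SLEEP  = 0,
	OFFLINE_WINKLE = 1,
};

/* IBM bit 47 of a 64-bit register sits 63 - 47 = 16 places above the LSB. */
static inline enum wake_class srr1_wake_class(uint64_t srr1)
{
	return (enum wake_class)((srr1 >> 16) & 0x3);
}

/*
 * SRR1 alone cannot tell sleep from winkle (both report '11'), so the
 * saved flag decides whether the full register restore is needed.
 */
static inline int needs_full_restore(uint64_t srr1, uint8_t offline_state)
{
	assert(offline_state == OFFLINE_SLEEP || offline_state == OFFLINE_WINKLE);
	return srr1_wake_class(srr1) == WAKE_HV_LOSS &&
	       offline_state == OFFLINE_WINKLE;
}
```

With these definitions, needs_full_restore(3ULL << 16, OFFLINE_SLEEP) is 0 and needs_full_restore(3ULL << 16, OFFLINE_WINKLE) is 1, which is the distinction the PACA flag exists to make.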
[PATCH 0/9] powerpc/powernv: Support for fastsleep and winkle
Fast sleep is an idle state where the core and the L1 and L2 caches are
brought down to a threshold voltage. This also means that the
communication between the L2 and L3 caches has to be fenced. However, the
current P8 chips have a bug wherein this fencing between the L2 and L3
caches gets delayed by a cpu cycle. This can delay the L3 response to the
other cpus if they request data during this time. They would then fetch
the same data from memory, which could lead to data corruption if the L3
cache is not flushed. Patch 4 adds support to work around this.

'Deep Winkle' is a deeper idle state where the core and private L2 are
powered off. While it offers higher power savings, it comes at the cost
of losing hypervisor register state and higher latency. Patches 5-9 add
support for winkle and use it for offline cpus.

Patch 1 - Moves the parameters required to discover idle states to a
          location common to both the cpuidle driver and the powernv
          core code
Patch 2 - Populates idle state details from the device tree
Patch 3 - Enables cpus to run guests after waking up from
          fastsleep/winkle

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rafael J. Wysocki
Cc: Srivatsa S. Bhat
Cc: Preeti U. Murthy
Cc: Vaidyanathan Srinivasan
Cc: Rob Herring
Cc: Grant Likely
Cc: devicet...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org

Preeti U Murthy (2):
  cpuidle/powernv: Populate cpuidle state details by querying the device-tree
  powerpc/powernv/cpuidle: Add workaround to enable fastsleep

Shreyas B. Prabhu (6):
  powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep
  powerpc/powernv: Add OPAL call to save and restore
  powerpc: Adding macro for accessing Thread Switch Control Register
  powerpc/powernv: Add winkle infrastructure
  powerpc/powernv: Discover and enable winkle
  powerpc/powernv: Enter deepest supported idle state in offline

Srivatsa S. Bhat (1):
  powerpc/powernv: Enable Offline CPUs to enter deep idle states

 arch/powerpc/include/asm/machdep.h             |   4 +
 arch/powerpc/include/asm/opal.h                |  10 ++
 arch/powerpc/include/asm/paca.h                |   3 +
 arch/powerpc/include/asm/ppc-opcode.h          |   2 +
 arch/powerpc/include/asm/processor.h           |   6 +-
 arch/powerpc/include/asm/reg.h                 |   1 +
 arch/powerpc/kernel/asm-offsets.c              |   1 +
 arch/powerpc/kernel/exceptions-64s.S           |  37 ++---
 arch/powerpc/kernel/idle.c                     |  30
 arch/powerpc/kernel/idle_power7.S              |  83 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
 arch/powerpc/platforms/powernv/powernv.h       |   8 +
 arch/powerpc/platforms/powernv/setup.c         | 217 +
 arch/powerpc/platforms/powernv/smp.c           |  13 +-
 arch/powerpc/platforms/powernv/subcore.c       |  15 ++
 drivers/cpuidle/cpuidle-powernv.c              |  40 -
 16 files changed, 439 insertions(+), 33 deletions(-)

--
1.9.0
[PATCH 2/9] cpuidle/powernv: Populate cpuidle state details by querying the device-tree
From: Preeti U Murthy We hard code the metrics relevant for cpuidle states in the kernel today. Instead pick them up from the device tree so that they remain relevant and updated for the system that the kernel is running on. Cc: linux...@vger.kernel.org Cc: Rafael J. Wysocki Cc: Rob Herring Cc: Grant Likely Cc: devicet...@vger.kernel.org Signed-off-by: Preeti U. Murthy Signed-off-by: Shreyas B. Prabhu --- drivers/cpuidle/cpuidle-powernv.c | 27 ++- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c index 23d2743..3ceff53 100644 --- a/drivers/cpuidle/cpuidle-powernv.c +++ b/drivers/cpuidle/cpuidle-powernv.c @@ -162,7 +162,8 @@ static int powernv_add_idle_states(void) int nr_idle_states = 1; /* Snooze */ int dt_idle_states; const __be32 *idle_state_flags; - u32 len_flags, flags; + const __be32 *idle_state_latency; + u32 len_flags, flags, latency_ns; int i; /* Currently we have snooze statically defined */ @@ -178,19 +179,33 @@ static int powernv_add_idle_states(void) pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n"); return nr_idle_states; } + idle_state_latency = of_get_property(power_mgt, + "ibm,cpu-idle-state-latencies-ns", NULL); + if (!idle_state_latency) { + pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-latencies-ns\n"); + return nr_idle_states; + } dt_idle_states = len_flags / sizeof(u32); for (i = 0; i < dt_idle_states; i++) { flags = be32_to_cpu(idle_state_flags[i]); + + /* Cpuidle accepts exit_latency in us and we estimate best case +* target residency to be 10x exit_latency +*/ + latency_ns = be32_to_cpu(idle_state_latency[i]); + if (flags & IDLE_INST_NAP) { /* Add NAP state */ strcpy(powernv_states[nr_idle_states].name, "Nap"); strcpy(powernv_states[nr_idle_states].desc, "Nap"); powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIME_VALID; - powernv_states[nr_idle_states].exit_latency = 10; - powernv_states[nr_idle_states].target_residency = 100; + 
powernv_states[nr_idle_states].exit_latency = + ((unsigned int)latency_ns) / 1000; + powernv_states[nr_idle_states].target_residency = + ((unsigned int)latency_ns / 100); powernv_states[nr_idle_states].enter = &nap_loop; nr_idle_states++; } @@ -201,8 +216,10 @@ static int powernv_add_idle_states(void) strcpy(powernv_states[nr_idle_states].desc, "FastSleep"); powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TIMER_STOP; - powernv_states[nr_idle_states].exit_latency = 300; - powernv_states[nr_idle_states].target_residency = 100; + powernv_states[nr_idle_states].exit_latency = + ((unsigned int)latency_ns) / 1000; + powernv_states[nr_idle_states].target_residency = + ((unsigned int)latency_ns / 100); powernv_states[nr_idle_states].enter = &fastsleep_loop; nr_idle_states++; } -- 1.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
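The unit conversion in this patch is easy to misread, so here is a small stand-alone sketch of the arithmetic. It assumes only what the diff shows: the device tree reports latency in nanoseconds, cpuidle wants microseconds, and target residency is estimated as 10x the exit latency. The struct and function names are made up for the example and are not part of the driver.

```c
#include <stdint.h>

/* Hypothetical container for the two values the driver fills in. */
struct idle_metrics {
	unsigned int exit_latency_us;
	unsigned int target_residency_us;
};

static struct idle_metrics metrics_from_latency_ns(uint32_t latency_ns)
{
	struct idle_metrics m;

	/* ns -> us */
	m.exit_latency_us = latency_ns / 1000;
	/*
	 * 10 x exit latency, also in us. Dividing the ns value by 100
	 * equals (ns / 1000) * 10 but avoids truncating first.
	 */
	m.target_residency_us = latency_ns / 100;
	return m;
}
```

For a latency of 10000 ns this yields an exit latency of 10 us and a target residency of 100 us, the same pair that was previously hard coded for Nap. For 300000 ns it yields 300 us and 3000 us, so the FastSleep target residency no longer matches the old hard-coded 100.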
Re: [PATCH v2 3/5] powerpc/powernv: Add winkle infrastructure
On Tuesday 07 October 2014 11:03 AM, Benjamin Herrenschmidt wrote: > On Wed, 2014-10-01 at 13:16 +0530, Shreyas B. Prabhu wrote: >> Winkle causes power to be gated off to the entire chiplet. Hence the >> hypervisor/firmware state in the entire chiplet is lost. >> >> This patch adds necessary infrastructure to support waking up from >> hypervisor state loss. Specifically does following: >> - Before entering winkle, save state of registers that need to be >> restored on wake up (SDR1, HFSCR) > > Add ... to your list, it's not exhaustive, is it ? I use interrupt stack frame for only SDR1 and HFSCR. The rest of the SPRs are restored via PORE in the next patch. I'll change the comments to better reflect this. > >> - SRR1 bits 46:47 which is used to identify which power saving mode cpu >> woke up from is '11' for both winkle and sleep. Hence introduce a flag >> in PACA to distinguish b/w winkle and sleep. >> >> - Upon waking up, restore all saved registers, recover slb >> >> Cc: Benjamin Herrenschmidt >> Cc: Paul Mackerras >> Cc: Michael Ellerman >> Cc: linuxppc-...@lists.ozlabs.org >> Suggested-by: Vaidyanathan Srinivasan >> Signed-off-by: Shreyas B. 
Prabhu >> --- >> arch/powerpc/include/asm/machdep.h | 1 + >> arch/powerpc/include/asm/paca.h| 3 ++ >> arch/powerpc/include/asm/ppc-opcode.h | 2 + >> arch/powerpc/include/asm/processor.h | 2 + >> arch/powerpc/kernel/asm-offsets.c | 1 + >> arch/powerpc/kernel/exceptions-64s.S | 8 ++-- >> arch/powerpc/kernel/idle.c | 11 + >> arch/powerpc/kernel/idle_power7.S | 81 >> +- >> arch/powerpc/platforms/powernv/setup.c | 24 ++ >> 9 files changed, 127 insertions(+), 6 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/machdep.h >> b/arch/powerpc/include/asm/machdep.h >> index f37014f..0a3ced9 100644 >> --- a/arch/powerpc/include/asm/machdep.h >> +++ b/arch/powerpc/include/asm/machdep.h >> @@ -301,6 +301,7 @@ struct machdep_calls { >> /* Idle handlers */ >> void(*setup_idle)(void); >> unsigned long (*power7_sleep)(void); >> +unsigned long (*power7_winkle)(void); >> }; > > Why does it need to be ppc_md ? Same comments as for sleep > >> extern void e500_idle(void); >> diff --git a/arch/powerpc/include/asm/paca.h >> b/arch/powerpc/include/asm/paca.h >> index a5139ea..3358f09 100644 >> --- a/arch/powerpc/include/asm/paca.h >> +++ b/arch/powerpc/include/asm/paca.h >> @@ -158,6 +158,9 @@ struct paca_struct { >> * early exception handler for use by high level C handler >> */ >> struct opal_machine_check_event *opal_mc_evt; >> + >> +/* Flag to distinguish b/w sleep and winkle */ >> +u8 offline_state; > > Not fan of the name. I'd rather you call it "wakeup_state_loss" or > something a bit more explicit about what that actually means if it's > going to be a boolean value. Otherwise make it an enumeration of > constants. > Okay. I'll change this. >> #endif >> #ifdef CONFIG_PPC_BOOK3S_64 >> /* Exclusive emergency stack pointer for machine check exception. 
*/ >> diff --git a/arch/powerpc/include/asm/ppc-opcode.h >> b/arch/powerpc/include/asm/ppc-opcode.h >> index 6f85362..5155be7 100644 >> --- a/arch/powerpc/include/asm/ppc-opcode.h >> +++ b/arch/powerpc/include/asm/ppc-opcode.h >> @@ -194,6 +194,7 @@ >> >> #define PPC_INST_NAP0x4c000364 >> #define PPC_INST_SLEEP 0x4c0003a4 >> +#define PPC_INST_WINKLE 0x4c0003e4 >> >> /* A2 specific instructions */ >> #define PPC_INST_ERATWE 0x7c0001a6 >> @@ -374,6 +375,7 @@ >> >> #define PPC_NAP stringify_in_c(.long PPC_INST_NAP) >> #define PPC_SLEEP stringify_in_c(.long PPC_INST_SLEEP) >> +#define PPC_WINKLE stringify_in_c(.long PPC_INST_WINKLE) >> >> /* BHRB instructions */ >> #define PPC_CLRBHRB stringify_in_c(.long PPC_INST_CLRBHRB) >> diff --git a/arch/powerpc/include/asm/processor.h >> b/arch/powerpc/include/asm/processor.h >> index 41953cd..00e3df9 100644 >> --- a/arch/powerpc/include/asm/processor.h >> +++ b/arch/powerpc/include/asm/processor.h >> @@ -455,6 +455,8 @@ extern void arch_setup_idle(void); >> extern void power7_nap(int check_irq); >> extern unsigned long power7_sleep(void); >> ext
[PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes
Fast sleep is an idle state where the core and the L1 and L2 caches are
brought down to a threshold voltage. This also means that the
communication between the L2 and L3 caches has to be fenced. However, the
current P8 chips have a bug wherein this fencing between the L2 and L3
caches gets delayed by a cpu cycle. This can delay the L3 response to the
other cpus if they request data during this time. They would then fetch
the same data from memory, which could lead to data corruption if the L3
cache is not flushed.

This series works around the above problem in the kernel.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rafael J. Wysocki
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Srivatsa S. Bhat
Cc: Preeti U. Murthy
Cc: Vaidyanathan Srinivasan

v2:
Rebased on 3.17-rc7
Split from 'powerpc/powernv: Support for fastsleep and winkle'

v1:
https://lkml.org/lkml/2014/8/25/446

Preeti U Murthy (1):
  powerpc/powernv/cpuidle: Add workaround to enable fastsleep

Shreyas B. Prabhu (1):
  powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep

Srivatsa S. Bhat (1):
  powerpc/powernv: Enable Offline CPUs to enter deep idle states

 arch/powerpc/include/asm/machdep.h             |   3 +
 arch/powerpc/include/asm/opal.h                |   7 ++
 arch/powerpc/include/asm/processor.h           |   4 +-
 arch/powerpc/kernel/exceptions-64s.S           |  35
 arch/powerpc/kernel/idle.c                     |  19
 arch/powerpc/kernel/idle_power7.S              |   2 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/powernv.h       |   7 ++
 arch/powerpc/platforms/powernv/setup.c         | 118 +
 arch/powerpc/platforms/powernv/smp.c           |  11 ++-
 drivers/cpuidle/cpuidle-powernv.c              |  13 ++-
 11 files changed, 194 insertions(+), 26 deletions(-)

--
1.9.3
[PATCH v2] cpuidle/powernv: Populate cpuidle state details by querying the device-tree
From: Preeti U Murthy We hard code the metrics relevant for cpuidle states in the kernel today. Instead pick them up from the device tree so that they remain relevant and updated for the system that the kernel is running on. Cc: linux...@vger.kernel.org Cc: Rafael J. Wysocki Cc: Rob Herring Cc: Grant Likely Cc: devicet...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org Cc: Michael Ellerman Signed-off-by: Preeti U. Murthy Signed-off-by: Shreyas B. Prabhu --- v2: Rebased on 3.17-rc7 Separated from 'powerpc/powernv: Support for fastsleep and winkle' v1: Initial post https://lkml.org/lkml/2014/8/25/456 diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c index a64be57..2426a4b 100644 --- a/drivers/cpuidle/cpuidle-powernv.c +++ b/drivers/cpuidle/cpuidle-powernv.c @@ -163,7 +163,8 @@ static int powernv_add_idle_states(void) int nr_idle_states = 1; /* Snooze */ int dt_idle_states; const __be32 *idle_state_flags; - u32 len_flags, flags; + const __be32 *idle_state_latency; + u32 len_flags, flags, latency_ns; int i; /* Currently we have snooze statically defined */ @@ -180,18 +181,32 @@ static int powernv_add_idle_states(void) return nr_idle_states; } + idle_state_latency = of_get_property(power_mgt, + "ibm,cpu-idle-state-latencies-ns", NULL); + if (!idle_state_latency) { + pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-latencies-ns\n"); + return nr_idle_states; + } + dt_idle_states = len_flags / sizeof(u32); for (i = 0; i < dt_idle_states; i++) { flags = be32_to_cpu(idle_state_flags[i]); + + /* Cpuidle accepts exit_latency in us and we estimate best case +* target residency to be 10x exit_latency +*/ + latency_ns = be32_to_cpu(idle_state_latency[i]); if (flags & IDLE_USE_INST_NAP) { /* Add NAP state */ strcpy(powernv_states[nr_idle_states].name, "Nap"); strcpy(powernv_states[nr_idle_states].desc, "Nap"); powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIME_VALID; - powernv_states[nr_idle_states].exit_latency = 10; - 
powernv_states[nr_idle_states].target_residency = 100; + powernv_states[nr_idle_states].exit_latency = + ((unsigned int)latency_ns) / 1000; + powernv_states[nr_idle_states].target_residency = + ((unsigned int)latency_ns / 100); powernv_states[nr_idle_states].enter = &nap_loop; nr_idle_states++; } @@ -202,8 +217,10 @@ static int powernv_add_idle_states(void) strcpy(powernv_states[nr_idle_states].desc, "FastSleep"); powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TIMER_STOP; - powernv_states[nr_idle_states].exit_latency = 300; - powernv_states[nr_idle_states].target_residency = 100; + powernv_states[nr_idle_states].exit_latency = + ((unsigned int)latency_ns) / 1000; + powernv_states[nr_idle_states].target_residency = + ((unsigned int)latency_ns / 100); powernv_states[nr_idle_states].enter = &fastsleep_loop; nr_idle_states++; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 2/5] powerpc: Adding macro for accessing Thread Switch Control Register
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/include/asm/reg.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 0c05059..cb65a73 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -371,6 +371,7 @@
 #define SPRN_DBAT7L    0x23F   /* Data BAT 7 Lower Register */
 #define SPRN_DBAT7U    0x23E   /* Data BAT 7 Upper Register */
 #define SPRN_PPR       0x380   /* SMT Thread status Register */
+#define SPRN_TSCR      0x399   /* Thread Switch Control Register */

 #define SPRN_DEC       0x016   /* Decrement Register */
 #define SPRN_DER       0x095   /* Debug Enable Regsiter */
--
1.9.3
[PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep
From: Preeti U Murthy

Fast sleep is an idle state where the core and the L1 and L2 caches are
brought down to a threshold voltage. This also means that the
communication between the L2 and L3 caches has to be fenced. However, the
current P8 chips have a bug wherein this fencing between the L2 and L3
caches gets delayed by a cpu cycle. This can delay the L3 response to the
other cpus if they request data during this time. They would then fetch
the same data from memory, which could lead to data corruption if the L3
cache is not flushed.

The cpu idle states save power at a core level and not at a thread
level. Hence the power savings depend on the shallowest idle state that
any thread of a core is in. The above issue in fastsleep arises only
when every thread in a core is in fastsleep or a deeper idle state, that
is, when either all of them enter fastsleep or some of them enter a
deeper idle state while the rest are in fastsleep.

This patch therefore implements a workaround for this bug by ensuring
that each time a cpu goes to fastsleep, it checks if it is the last
thread in the core to enter fastsleep. If so, it makes an opal call to
get around the above mentioned fastsleep problem in the hardware before
issuing the sleep instruction. Similarly, when a thread in a core comes
out of fastsleep, it needs to verify if it is the first thread in the
core to come out of fastsleep and, if so, issue the opal call to revert
the changes made while entering fastsleep.

For the same reason we need to take care of offline threads as well,
since we allow them to enter fastsleep and, with support for deep winkle
coming soon, they can enter winkle as well. We therefore ensure that
even offline threads make the above mentioned opal calls, so that as
long as the threads in a core are in an idle state >= fastsleep, we have
the workaround in place. Whenever a thread comes out of either of these
states, it needs to verify if the opal call has been made and, if so,
revert it.
For now this patch ensures that offline threads enter fastsleep.

We need to be able to synchronize the cpus in a core which are entering
and exiting fastsleep so as to ensure that *only* the last thread in the
core to enter fastsleep and the first to exit fastsleep issue the opal
call. To do so, we need a per-core lock and counter. The counter is
required to keep track of the number of threads in a core which are in
an idle state >= fastsleep. To keep the implementation simple, we
introduce a per-cpu lock and counter, and every thread always takes the
primary thread's lock and modifies the primary thread's counter. This
effectively makes them per-core entities.

The workaround is abstracted in the powernv core code, and neither the
hotplug path nor the cpuidle driver needs to bother about it. All they
need to know is whether fastsleep, with error or no error, is present as
an idle state.

Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rafael J. Wysocki
Signed-off-by: Shreyas B.
Prabhu Signed-off-by: Preeti U Murthy --- arch/powerpc/include/asm/machdep.h | 3 + arch/powerpc/include/asm/opal.h| 3 + arch/powerpc/include/asm/processor.h | 4 +- arch/powerpc/kernel/idle.c | 19 arch/powerpc/kernel/idle_power7.S | 2 +- arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + arch/powerpc/platforms/powernv/setup.c | 139 ++--- drivers/cpuidle/cpuidle-powernv.c | 8 +- 8 files changed, 140 insertions(+), 39 deletions(-) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index b125cea..f37014f 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -298,6 +298,9 @@ struct machdep_calls { #ifdef CONFIG_MEMORY_HOTREMOVE int (*remove_memory)(u64, u64); #endif + /* Idle handlers */ + void(*setup_idle)(void); + unsigned long (*power7_sleep)(void); }; extern void e500_idle(void); diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 28b8342..166d572 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -149,6 +149,7 @@ struct opal_sg_list { #define OPAL_DUMP_INFO294 #define OPAL_PCI_EEH_FREEZE_SET97 #define OPAL_HANDLE_HMI98 +#define OPAL_CONFIG_IDLE_STATE 99 #define OPAL_REGISTER_DUMP_REGION 101 #define OPAL_UNREGISTER_DUMP_REGION102 @@ -775,6 +776,7 @@ extern struct device_node *opal_node; /* Flags used for idle state discovery from the device tree */ #define IDLE_INST_NAP 0x0001 /* nap instruction can be used */ #define IDLE_INST_SLEEP0x0002 /* sleep instruction can be used */ +#define IDLE_INST_SLEEP_ER10x0008 /* Use sleep
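The last-in/first-out rule from the changelog above can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the patch's implementation: the struct, the function names and THREADS_PER_CORE = 8 are hypothetical, a pthread mutex stands in for the primary thread's lock, and the apply/revert counters stand in for the OPAL call that the real code would make.

```c
#include <pthread.h>

#define THREADS_PER_CORE 8	/* assumption: an SMT8 core */

/* Hypothetical per-core bookkeeping, kept with the primary thread. */
struct core_idle {
	pthread_mutex_t lock;
	int nr_deep_idle;	/* threads in an idle state >= fastsleep */
	int workaround_active;	/* 1 while the workaround is applied */
	int apply_calls;	/* stand-in for "apply" OPAL calls made */
	int revert_calls;	/* stand-in for "revert" OPAL calls made */
};

static void core_idle_init(struct core_idle *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->nr_deep_idle = 0;
	c->workaround_active = 0;
	c->apply_calls = 0;
	c->revert_calls = 0;
}

/* Called by a thread just before it would issue the sleep instruction. */
static void fastsleep_enter(struct core_idle *c)
{
	pthread_mutex_lock(&c->lock);
	if (++c->nr_deep_idle == THREADS_PER_CORE) {
		/* Last thread in: apply the workaround exactly once. */
		c->workaround_active = 1;
		c->apply_calls++;
	}
	pthread_mutex_unlock(&c->lock);
}

/* Called by a thread right after it wakes up. */
static void fastsleep_exit(struct core_idle *c)
{
	pthread_mutex_lock(&c->lock);
	if (c->nr_deep_idle-- == THREADS_PER_CORE) {
		/* First thread out: revert the workaround exactly once. */
		c->workaround_active = 0;
		c->revert_calls++;
	}
	pthread_mutex_unlock(&c->lock);
}
```

The lock matters because a bare atomic counter would let one thread's revert overtake another thread's apply. Holding the lock across both the count update and the call keeps the two ordered.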
[PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states
From: "Srivatsa S. Bhat"

The offline cpus should enter deep idle states so as to gain maximum
power savings when the entire core is offline. To do so, the offline path
must be made aware of the deepest available idle state. Hence probe the
device tree for the possible idle states in the powernv core code and
expose the deepest idle state through flags.

Since the device tree is probed by the cpuidle driver as well, move the
parameters required to discover the idle states into a place common to
both the driver and the powernv core code.

Another point is that the fastsleep idle state may require workarounds in
the kernel to function properly. This workaround is introduced in the
subsequent patches. However, neither the cpuidle driver nor the hotplug
path need be bothered about this workaround. It will be taken care of by
the core powernv code.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rafael J. Wysocki
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat
Signed-off-by: Shreyas B. Prabhu
[ Changelog modified by pre...@linux.vnet.ibm.com ]
Signed-off-by: Preeti U.
Murthy --- arch/powerpc/include/asm/opal.h | 4 +++ arch/powerpc/platforms/powernv/powernv.h | 7 + arch/powerpc/platforms/powernv/setup.c | 51 arch/powerpc/platforms/powernv/smp.c | 11 ++- drivers/cpuidle/cpuidle-powernv.c| 7 ++--- 5 files changed, 75 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 86055e5..28b8342 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -772,6 +772,10 @@ extern struct kobject *opal_kobj; /* /ibm,opal */ extern struct device_node *opal_node; +/* Flags used for idle state discovery from the device tree */ +#define IDLE_INST_NAP 0x0001 /* nap instruction can be used */ +#define IDLE_INST_SLEEP0x0002 /* sleep instruction can be used */ + /* API functions */ int64_t opal_invalid_call(void); int64_t opal_console_write(int64_t term_number, __be64 *length, diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h index 75501bf..31ece13 100644 --- a/arch/powerpc/platforms/powernv/powernv.h +++ b/arch/powerpc/platforms/powernv/powernv.h @@ -23,6 +23,13 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, u64 dma_mask) } #endif +/* Flags to indicate which of the CPU idle states are available for use */ + +#define IDLE_USE_NAP (1UL << 0) +#define IDLE_USE_SLEEP (1UL << 1) + +extern unsigned int pnv_get_supported_cpuidle_states(void); + extern void pnv_lpc_init(void); bool cpu_core_split_required(void); diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index 5a0e2dc..2dca1d8 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -282,6 +282,57 @@ static void __init pnv_setup_machdep_rtas(void) } #endif /* CONFIG_PPC_POWERNV_RTAS */ +static unsigned int supported_cpuidle_states; + +unsigned int pnv_get_supported_cpuidle_states(void) +{ + return supported_cpuidle_states; +} + +static int __init 
pnv_probe_idle_states(void) +{ + struct device_node *power_mgt; + struct property *prop; + int dt_idle_states; + u32 *flags; + int i; + + supported_cpuidle_states = 0; + + if (cpuidle_disable != IDLE_NO_OVERRIDE) + return 0; + + if (!firmware_has_feature(FW_FEATURE_OPALv3)) + return 0; + + power_mgt = of_find_node_by_path("/ibm,opal/power-mgt"); + if (!power_mgt) { + pr_warn("opal: PowerMgmt Node not found\n"); + return 0; + } + + prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL); + if (!prop) { + pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n"); + return 0; + } + + dt_idle_states = prop->length / sizeof(u32); + flags = (u32 *) prop->value; + + for (i = 0; i < dt_idle_states; i++) { + if (flags[i] & IDLE_INST_NAP) + supported_cpuidle_states |= IDLE_USE_NAP; + + if (flags[i] & IDLE_INST_SLEEP) + supported_cpuidle_states |= IDLE_USE_SLEEP; + } + + return 0; +} + +subsys_initcall(pnv_probe_idle_states); + static int __init pnv_probe(void) { unsigned long root = of_get_flat_dt_root(); diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c index 5fcfcf4..3ad31d2 100644 --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -149,6 +149,7 @@ static int pnv_smp_cpu_disable(void) static void pnv_smp_cpu_kill_self(void) { unsigned int cpu
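The flag folding done by pnv_probe_idle_states() above can be tried outside the kernel with a small sketch. The IDLE_INST_* and IDLE_USE_* values are taken from the patch; be32_decode() and supported_states() are names invented for this example. One assumption to flag: the sketch decodes each cell as big-endian explicitly, which is how device tree cells are stored, whereas the patch reads prop->value directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the patch. */
#define IDLE_INST_NAP	0x0001	/* nap instruction can be used */
#define IDLE_INST_SLEEP	0x0002	/* sleep instruction can be used */
#define IDLE_USE_NAP	(1UL << 0)
#define IDLE_USE_SLEEP	(1UL << 1)

/* Decode one big-endian 32-bit device tree cell (hypothetical helper). */
static uint32_t be32_decode(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Fold the per-state flags of a property like ibm,cpu-idle-state-flags
 * into one bitmask of usable idle states.
 */
static unsigned int supported_states(const unsigned char *prop, size_t len_bytes)
{
	unsigned int supported = 0;
	size_t nr_states = len_bytes / sizeof(uint32_t);

	for (size_t i = 0; i < nr_states; i++) {
		uint32_t flags = be32_decode(prop + i * sizeof(uint32_t));

		if (flags & IDLE_INST_NAP)
			supported |= IDLE_USE_NAP;
		if (flags & IDLE_INST_SLEEP)
			supported |= IDLE_USE_SLEEP;
	}
	return supported;
}
```

A property holding the two cells 0x00000001 and 0x00000002 gives IDLE_USE_NAP | IDLE_USE_SLEEP, which is the value the offline path would then consult to pick the deepest available state.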
[PATCH v2 0/5] Winkle support for offline cpus
Powernv already has support for nap and sleep, and these states are used
by the cpuidle framework. This patchset adds support for 'deep winkle', a
deeper idle state. In deep winkle, the entire chiplet (core/L2/L3) is
powered off, leading to higher power savings. But this results in
hypervisor state loss. This patchset adds the necessary infrastructure to
recover from hypervisor state loss and enables offline cpus to use
winkle.

I've successfully tested subcore functionality with these patches,
particularly these two scenarios:

Scenario 1:
-> Set subcore-per-core to 4.
-> Offline and online a complete core.
Check if the core wakes up with 4 subcores.

Scenario 2:
-> Set subcore-per-core to 1.
-> Offline a core.
-> Set subcore-per-core to 4.
-> Online a core.
Check if the core wakes up with 4 subcores.

In both these scenarios, the core wakes up with 4 subcores and can run
guests on individual subcores.

Note: these patches apply on top of the 'powernv/cpuidle: Fastsleep
workaround and fixes' series.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Srivatsa S. Bhat
Cc: Preeti U. Murthy
Cc: Vaidyanathan Srinivasan
Cc: linuxppc-...@lists.ozlabs.org

v2:
Rebased on 3.17-rc7
Split from 'powerpc/powernv: Support for fastsleep and winkle'

v1:
https://lkml.org/lkml/2014/8/25/446

Shreyas B. Prabhu (5):
  powerpc/powernv: Add OPAL call to save and restore
  powerpc: Adding macro for accessing Thread Switch Control Register
  powerpc/powernv: Add winkle infrastructure
  powerpc/powernv: Discover and enable winkle
  powerpc/powernv: Enter deepest supported idle state in offline

 arch/powerpc/include/asm/machdep.h             |  1 +
 arch/powerpc/include/asm/opal.h                |  3 +
 arch/powerpc/include/asm/paca.h                |  3 +
 arch/powerpc/include/asm/ppc-opcode.h          |  2 +
 arch/powerpc/include/asm/processor.h           |  2 +
 arch/powerpc/include/asm/reg.h                 |  1 +
 arch/powerpc/kernel/asm-offsets.c              |  1 +
 arch/powerpc/kernel/exceptions-64s.S           |  4 +-
 arch/powerpc/kernel/idle.c                     | 11 +++
 arch/powerpc/kernel/idle_power7.S              | 81 -
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 arch/powerpc/platforms/powernv/powernv.h       |  1 +
 arch/powerpc/platforms/powernv/setup.c         | 99 ++
 arch/powerpc/platforms/powernv/smp.c           |  6 +-
 arch/powerpc/platforms/powernv/subcore.c       | 15
 15 files changed, 226 insertions(+), 5 deletions(-)

--
1.9.3
[PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep
When guests have to be launched, the secondary threads which are offline are woken up to run the guests. Today these threads wake up from nap and check if they have to run guests. Now that the offline secondary threads can enter fastsleep or, going forward, a deeper idle state such as winkle, add this check to the wakeup path of every deep idle state as well.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: linuxppc-...@lists.ozlabs.org
Suggested-by: "Srivatsa S. Bhat"
Signed-off-by: Shreyas B. Prabhu
[ Changelog added by ]
Signed-off-by: Preeti U Murthy
---
 arch/powerpc/kernel/exceptions-64s.S | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 050f79a..c64f3cc0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -100,25 +100,8 @@ system_reset_pSeries:
 	SET_SCRATCH0(r13)
 #ifdef CONFIG_PPC_P7_NAP
 BEGIN_FTR_SECTION
-	/* Running native on arch 2.06 or later, check if we are
-	 * waking up from nap. We only handle no state loss and
-	 * supervisor state loss. We do -not- handle hypervisor
-	 * state loss at this time.
-	 */
-	mfspr	r13,SPRN_SRR1
-	rlwinm.	r13,r13,47-31,30,31
-	beq	9f

-	/* waking up from powersave (nap) state */
-	cmpwi	cr1,r13,2
-	/* Total loss of HV state is fatal, we could try to use the
-	 * PIR to locate a PACA, then use an emergency stack etc...
-	 * OPAL v3 based powernv platforms have new idle states
-	 * which fall in this catagory.
-	 */
-	bgt	cr1,8f
 	GET_PACA(r13)
-
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	li	r0,KVM_HWTHREAD_IN_KERNEL
 	stb	r0,HSTATE_HWTHREAD_STATE(r13)
@@ -131,13 +114,27 @@ BEGIN_FTR_SECTION
 1:
 #endif

+	/* Running native on arch 2.06 or later, check if we are
+	 * waking up from nap. We only handle no state loss and
+	 * supervisor state loss. We do -not- handle hypervisor
+	 * state loss at this time.
+	 */
+	mfspr	r13,SPRN_SRR1
+	rlwinm.	r13,r13,47-31,30,31
+	beq	9f
+
+	/* waking up from powersave (nap) state */
+	cmpwi	cr1,r13,2
+	GET_PACA(r13)
+
+	bgt	cr1,8f
+
 	beq	cr1,2f
 	b	power7_wakeup_noloss
 2:	b	power7_wakeup_loss

 	/* Fast Sleep wakeup on PowerNV */
-8:	GET_PACA(r13)
-	b	power7_wakeup_tb_loss
+8:	b	power7_wakeup_tb_loss

 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
--
1.9.3
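For readers less familiar with the rlwinm idiom, the wake-up dispatch in the hunk above can be read as the following C sketch. It is only an illustration: the enum and function names are hypothetical and do not exist in the kernel, and the mapping of field values to handlers is inferred from the branch targets shown in the patch (9f, power7_wakeup_noloss, power7_wakeup_loss, power7_wakeup_tb_loss).

```c
#include <stdint.h>

/* Wake-up classes; names are illustrative only, not kernel identifiers. */
enum wake_class {
	WAKE_NOT_POWERSAVE = 0,	/* field 0: not a power-saving wake-up (label 9) */
	WAKE_NO_LOSS,		/* field 1: b power7_wakeup_noloss */
	WAKE_SUPERVISOR_LOSS,	/* field 2: label 2, b power7_wakeup_loss */
	WAKE_HV_LOSS		/* field 3: label 8, b power7_wakeup_tb_loss */
};

/*
 * SRR1 bits 46:47 in IBM numbering (bit 0 is the most significant of 64)
 * are bits 17:16 when counting from the least significant bit. This is
 * what "rlwinm. r13,r13,47-31,30,31" extracts: rotate the low word left
 * by 16 and keep the two lowest bits.
 */
static inline unsigned int srr1_wake_field(uint64_t srr1)
{
	return (unsigned int)((srr1 >> 16) & 0x3);
}

/* Mirrors the beq/cmpwi/bgt/beq sequence after the rlwinm. */
static inline enum wake_class classify_wakeup(uint64_t srr1)
{
	switch (srr1_wake_field(srr1)) {
	case 0:
		return WAKE_NOT_POWERSAVE;
	case 1:
		return WAKE_NO_LOSS;
	case 2:
		return WAKE_SUPERVISOR_LOSS;
	default:
		return WAKE_HV_LOSS;
	}
}
```

With this reading, the patch only changes where GET_PACA(r13) and the KVM check happen relative to the dispatch; the three-way classification itself is unchanged.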
[PATCH v2 3/5] powerpc/powernv: Add winkle infrastructure
Winkle causes power to be gated off to the entire chiplet. Hence the hypervisor/firmware state in the entire chiplet is lost.

This patch adds the infrastructure necessary to support waking up from hypervisor state loss. Specifically, it does the following:

- Before entering winkle, save the state of registers that need to be restored on wake up (SDR1, HFSCR).
- SRR1 bits 46:47, which identify the power saving mode the cpu woke up from, are '11' for both winkle and sleep. Hence introduce a flag in the PACA to distinguish between winkle and sleep.
- Upon waking up, restore all saved registers and recover the SLB.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: linuxppc-...@lists.ozlabs.org
Suggested-by: Vaidyanathan Srinivasan
Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/include/asm/machdep.h     |  1
 arch/powerpc/include/asm/paca.h        |  3
 arch/powerpc/include/asm/ppc-opcode.h  |  2
 arch/powerpc/include/asm/processor.h   |  2
 arch/powerpc/kernel/asm-offsets.c      |  1
 arch/powerpc/kernel/exceptions-64s.S   |  8
 arch/powerpc/kernel/idle.c             | 11
 arch/powerpc/kernel/idle_power7.S      | 81
 arch/powerpc/platforms/powernv/setup.c | 24
 9 files changed, 127 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index f37014f..0a3ced9 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -301,6 +301,7 @@ struct machdep_calls {
 	/* Idle handlers */
 	void		(*setup_idle)(void);
 	unsigned long	(*power7_sleep)(void);
+	unsigned long	(*power7_winkle)(void);
 };

 extern void e500_idle(void);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index a5139ea..3358f09 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -158,6 +158,9 @@ struct paca_struct {
 	 * early exception handler for use by high level C handler
 	 */
 	struct opal_machine_check_event *opal_mc_evt;
+
+	/* Flag to distinguish b/w sleep and winkle */
+	u8 offline_state;
 #endif
#ifdef CONFIG_PPC_BOOK3S_64 /* Exclusive emergency stack pointer for machine check exception. */ diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 6f85362..5155be7 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -194,6 +194,7 @@ #define PPC_INST_NAP 0x4c000364 #define PPC_INST_SLEEP 0x4c0003a4 +#define PPC_INST_WINKLE0x4c0003e4 /* A2 specific instructions */ #define PPC_INST_ERATWE0x7c0001a6 @@ -374,6 +375,7 @@ #define PPC_NAPstringify_in_c(.long PPC_INST_NAP) #define PPC_SLEEP stringify_in_c(.long PPC_INST_SLEEP) +#define PPC_WINKLE stringify_in_c(.long PPC_INST_WINKLE) /* BHRB instructions */ #define PPC_CLRBHRBstringify_in_c(.long PPC_INST_CLRBHRB) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 41953cd..00e3df9 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -455,6 +455,8 @@ extern void arch_setup_idle(void); extern void power7_nap(int check_irq); extern unsigned long power7_sleep(void); extern unsigned long __power7_sleep(void); +extern unsigned long power7_winkle(void); +extern unsigned long __power7_winkle(void); extern void flush_instruction_cache(void); extern void hard_reset_now(void); extern void poweroff_now(void); diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 9d7dede..ea98817 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -731,6 +731,7 @@ int main(void) DEFINE(OPAL_MC_SRR0, offsetof(struct opal_machine_check_event, srr0)); DEFINE(OPAL_MC_SRR1, offsetof(struct opal_machine_check_event, srr1)); DEFINE(PACA_OPAL_MC_EVT, offsetof(struct paca_struct, opal_mc_evt)); + DEFINE(PACAOFFLINESTATE, offsetof(struct paca_struct, offline_state)); #endif return 0; diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index c64f3cc0..261f348 100644 --- 
a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -115,9 +115,7 @@ BEGIN_FTR_SECTION #endif /* Running native on arch 2.06 or later, check if we are -* waking up from nap. We only handle no state loss and -* supervisor state loss. We do -not- handle hypervisor -* state loss at this time. +* waking up from power saving mode. */ mfspr r13,SPRN_SRR1 rlwinm. r13,r13,47-31,30,31 @@ -133,8 +131,8 @@ BEGIN_FTR_SECTION b power7_wakeup_noloss
[PATCH v2 5/5] powerpc/powernv: Enter deepest supported idle state in offline
Enter winkle during offline if supported, else revert to sleep or nap.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/platforms/powernv/smp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 3ad31d2..e3fc2c9 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -169,8 +169,10 @@ static void pnv_smp_cpu_kill_self(void)
 	while (!generic_check_cpu_restart(cpu)) {
 		ppc64_runlatch_off();
-		/* If sleep is supported, go to sleep, instead of nap */
-		if (idle_states & IDLE_USE_SLEEP)
+		/* Go to deepest supported idle state */
+		if (idle_states & IDLE_USE_WINKLE)
+			power7_winkle();
+		else if (idle_states & IDLE_USE_SLEEP)
 			power7_sleep();
 		else
 			power7_nap(1);
--
1.9.3
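The fallback order this patch introduces (winkle, then sleep, then nap) can be exercised outside the kernel with a small stand-alone sketch. The flag values below are copied from powernv.h as modified by patch 4/5 of this series; the enum and the function pick_offline_state() are hypothetical names used only for illustration, standing in for the direct calls to power7_winkle(), power7_sleep() and power7_nap(1).

```c
/* Flag values as defined in powernv.h by patch 4/5 of this series. */
#define IDLE_USE_NAP	(1UL << 0)
#define IDLE_USE_SLEEP	(1UL << 1)
#define IDLE_USE_WINKLE	(1UL << 3)

/* Illustrative result type; the kernel calls the idle routines directly. */
enum offline_state {
	OFFLINE_NAP,
	OFFLINE_SLEEP,
	OFFLINE_WINKLE
};

/* Pick the deepest state that the supported-states mask allows. */
static enum offline_state pick_offline_state(unsigned long idle_states)
{
	if (idle_states & IDLE_USE_WINKLE)
		return OFFLINE_WINKLE;
	if (idle_states & IDLE_USE_SLEEP)
		return OFFLINE_SLEEP;
	/* Nap is the unconditional fallback, as in the patched loop. */
	return OFFLINE_NAP;
}
```

Note that nap is chosen even when the mask is empty, which matches the final else branch in pnv_smp_cpu_kill_self().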
[PATCH v2 4/5] powerpc/powernv: Discover and enable winkle
Discover winkle from the device tree. If it is supported, make the OPAL calls necessary to save HIDs, HMEER, HSPRG0 and LPCR. Also make an OPAL call when the HID0 value is modified during split/unsplit of cores.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/include/asm/opal.h          |  1
 arch/powerpc/platforms/powernv/powernv.h |  1
 arch/powerpc/platforms/powernv/setup.c   | 75
 arch/powerpc/platforms/powernv/subcore.c | 15
 4 files changed, 92 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index d376020..a77957f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -778,6 +778,7 @@ extern struct device_node *opal_node;
 #define IDLE_INST_NAP		0x0001 /* nap instruction can be used */
 #define IDLE_INST_SLEEP		0x0002 /* sleep instruction can be used */
 #define IDLE_INST_SLEEP_ER1	0x0008 /* Use sleep with work around*/
+#define IDLE_INST_WINKLE	0x0004 /* winkle instruction can be used */

 /* API functions */
 int64_t opal_invalid_call(void);
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 31ece13..76b37f8 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -27,6 +27,7 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)

 #define IDLE_USE_NAP		(1UL << 0)
 #define IDLE_USE_SLEEP		(1UL << 1)
+#define IDLE_USE_WINKLE		(1UL << 3)

 extern unsigned int pnv_get_supported_cpuidle_states(void);

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index f45b52d..13c5e49 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -273,6 +273,65 @@ unsigned int pnv_get_supported_cpuidle_states(void)
 	return supported_cpuidle_states;
 }

+int pnv_save_sprs_for_winkle(void)
+{
+	int cpu;
+	int rc;
+
+	/*
+	 * hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric accross
+	 * all cpus at boot. Get these reg values of current cpu and use the
+	 * same accross all cpus.
+	 */
+	uint64_t lpcr_val = mfspr(SPRN_LPCR);
+	uint64_t hid0_val = mfspr(SPRN_HID0);
+	uint64_t hid1_val = mfspr(SPRN_HID1);
+	uint64_t hid4_val = mfspr(SPRN_HID4);
+	uint64_t hid5_val = mfspr(SPRN_HID5);
+	uint64_t hmeer_val = mfspr(SPRN_HMEER);
+
+	for_each_possible_cpu(cpu) {
+		uint64_t pir = get_hard_smp_processor_id(cpu);
+		uint64_t local_paca_ptr = (uint64_t)&paca[cpu];
+
+		rc = opal_slw_set_reg(pir, SPRN_HSPRG0, local_paca_ptr);
+		if (rc != 0)
+			return rc;
+
+		rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
+		if (rc != 0)
+			return rc;
+
+		/* HIDs are per core registers */
+		if (cpu_thread_in_core(cpu) == 0) {
+
+			rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val);
+			if (rc != 0)
+				return rc;
+
+			rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val);
+			if (rc != 0)
+				return rc;
+
+			rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val);
+			if (rc != 0)
+				return rc;
+
+			rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val);
+			if (rc != 0)
+				return rc;
+
+			rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val);
+			if (rc != 0)
+				return rc;
+
+		}
+
+	}
+
+	return 0;
+
+}
 static int __init pnv_probe_idle_states(void)
 {
 	struct device_node *power_mgt;
@@ -318,6 +377,22 @@ static int __init pnv_probe_idle_states(void)
 			supported_cpuidle_states |= IDLE_USE_SLEEP;
 			need_fastsleep_workaround = 1;
 		}
+
+		if (flags & IDLE_INST_WINKLE) {
+			/*
+			 * If winkle is supported, save HSPRG0, HIDs and LPCR
+			 * contents via OPAL. Enable winkle only if this
+			 * succeeds.
+			 */
+			int opal_ret_val = pnv_save_sprs_for_winkle();
+
+			if (!opal_ret_val)
+				supported_cpuidle_states |= IDLE_USE_WINKLE;
+			else
+				pr_warn("opal: opal_slw_set_reg failed with rc=%d, disabling winkle\n",
+
[PATCH v2 1/5] powerpc/powernv: Add OPAL call to save and restore
PORE can be programmed to restore hypervisor registers when waking up from deep cpu idle states like winkle. Add a call to pass the SPR address and value to OPAL, which in turn will program PORE to restore the register state.

Cc: linuxppc-...@lists.ozlabs.org
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Suggested-by: Vaidyanathan Srinivasan
Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/include/asm/opal.h                | 2 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 166d572..d376020 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -150,6 +150,7 @@ struct opal_sg_list {
 #define OPAL_PCI_EEH_FREEZE_SET			97
 #define OPAL_HANDLE_HMI				98
 #define OPAL_CONFIG_IDLE_STATE			99
+#define OPAL_SLW_SET_REG			100
 #define OPAL_REGISTER_DUMP_REGION		101
 #define OPAL_UNREGISTER_DUMP_REGION		102

@@ -978,6 +979,7 @@ extern int opal_handle_hmi_exception(struct pt_regs *regs);
 extern void opal_shutdown(void);
 extern int opal_resync_timebase(void);
 int64_t opal_config_idle_state(uint64_t state, uint64_t enter);
+int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);

 extern void opal_lpc_init(void);

diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 8d1e724..12e5d46 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -246,5 +246,6 @@ OPAL_CALL(opal_get_param,			OPAL_GET_PARAM);
 OPAL_CALL(opal_set_param,			OPAL_SET_PARAM);
 OPAL_CALL(opal_handle_hmi,			OPAL_HANDLE_HMI);
 OPAL_CALL(opal_config_idle_state,		OPAL_CONFIG_IDLE_STATE);
+OPAL_CALL(opal_slw_set_reg,			OPAL_SLW_SET_REG);
 OPAL_CALL(opal_register_dump_region,		OPAL_REGISTER_DUMP_REGION);
 OPAL_CALL(opal_unregister_dump_region,		OPAL_UNREGISTER_DUMP_REGION);
--
1.9.3
Re: [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep
CCing Rafael J. Wysocki and linux...@vger.kernel.org On Wednesday 01 October 2014 01:15 PM, Shreyas B. Prabhu wrote: > When guests have to be launched, the secondary threads which are offline > are woken up to run the guests. Today these threads wake up from nap > and check if they have to run guests. Now that the offline secondary > threads can go to fastsleep or going ahead a deeper idle state such as winkle, > add this check in the wakeup from any of the deep idle states path as well. > > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Michael Ellerman > Cc: linuxppc-...@lists.ozlabs.org > Suggested-by: "Srivatsa S. Bhat" > Signed-off-by: Shreyas B. Prabhu > [ Changelog added by ] > Signed-off-by: Preeti U Murthy > --- > arch/powerpc/kernel/exceptions-64s.S | 35 --- > 1 file changed, 16 insertions(+), 19 deletions(-) > > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 050f79a..c64f3cc0 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -100,25 +100,8 @@ system_reset_pSeries: > SET_SCRATCH0(r13) > #ifdef CONFIG_PPC_P7_NAP > BEGIN_FTR_SECTION > - /* Running native on arch 2.06 or later, check if we are > - * waking up from nap. We only handle no state loss and > - * supervisor state loss. We do -not- handle hypervisor > - * state loss at this time. > - */ > - mfspr r13,SPRN_SRR1 > - rlwinm. r13,r13,47-31,30,31 > - beq 9f > > - /* waking up from powersave (nap) state */ > - cmpwi cr1,r13,2 > - /* Total loss of HV state is fatal, we could try to use the > - * PIR to locate a PACA, then use an emergency stack etc... > - * OPAL v3 based powernv platforms have new idle states > - * which fall in this catagory. 
> - */ > - bgt cr1,8f > GET_PACA(r13) > - > #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE > li r0,KVM_HWTHREAD_IN_KERNEL > stb r0,HSTATE_HWTHREAD_STATE(r13) > @@ -131,13 +114,27 @@ BEGIN_FTR_SECTION > 1: > #endif > > + /* Running native on arch 2.06 or later, check if we are > + * waking up from nap. We only handle no state loss and > + * supervisor state loss. We do -not- handle hypervisor > + * state loss at this time. > + */ > + mfspr r13,SPRN_SRR1 > + rlwinm. r13,r13,47-31,30,31 > + beq 9f > + > + /* waking up from powersave (nap) state */ > + cmpwi cr1,r13,2 > + GET_PACA(r13) > + > + bgt cr1,8f > + > beq cr1,2f > b power7_wakeup_noloss > 2: b power7_wakeup_loss > > /* Fast Sleep wakeup on PowerNV */ > -8: GET_PACA(r13) > - b power7_wakeup_tb_loss > +8: b power7_wakeup_tb_loss > > 9: > END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes
On Thursday 02 October 2014 02:16 AM, Rafael J. Wysocki wrote:
> On Wednesday, October 01, 2014 01:15:57 PM Shreyas B. Prabhu wrote:
>> Fast sleep is an idle state, where the core and the L1 and L2
>> caches are brought down to a threshold voltage. This also means that
>> the communication between L2 and L3 caches have to be fenced. However
>> the current P8 chips have a bug wherein this fencing between L2 and
>> L3 caches get delayed by a cpu cycle. This can delay L3 response to
>> the other cpus if they request for data during this time. Thus they
>> would fetch the same data from the memory which could lead to data
>> corruption if L3 cache is not flushed.
>>
>> This series overcomes above problem in kernel.
>>
>> Cc: Benjamin Herrenschmidt
>> Cc: Paul Mackerras
>> Cc: Michael Ellerman
>> Cc: Rafael J. Wysocki
>> Cc: linux...@vger.kernel.org
>> Cc: linuxppc-...@lists.ozlabs.org
>> Cc: Srivatsa S. Bhat
>> Cc: Preeti U. Murthy
>> Cc: Vaidyanathan Srinivasan
>>
>> v2:
>> Rebased on 3.17-rc7
>> Split from 'powerpc/powernv: Support for fastsleep and winkle'
>>
>> v1:
>> https://lkml.org/lkml/2014/8/25/446
>>
>> Preeti U Murthy (1):
>>   powerpc/powernv/cpuidle: Add workaround to enable fastsleep
>>
>> Shreyas B. Prabhu (1):
>>   powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
>>     fast-sleep
>>
>> Srivatsa S. Bhat (1):
>>   powerpc/powernv: Enable Offline CPUs to enter deep idle states
>>
>>  arch/powerpc/include/asm/machdep.h             |   3 +
>>  arch/powerpc/include/asm/opal.h                |   7 ++
>>  arch/powerpc/include/asm/processor.h           |   4 +-
>>  arch/powerpc/kernel/exceptions-64s.S           |  35
>>  arch/powerpc/kernel/idle.c                     |  19
>>  arch/powerpc/kernel/idle_power7.S              |   2 +-
>>  arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
>>  arch/powerpc/platforms/powernv/powernv.h       |   7 ++
>>  arch/powerpc/platforms/powernv/setup.c         | 118 +
>>  arch/powerpc/platforms/powernv/smp.c           |  11 ++-
>>  drivers/cpuidle/cpuidle-powernv.c              |  13 ++-
>>  11 files changed, 194 insertions(+), 26 deletions(-)
>
> [2/3] seems to be missing from the series.
>
> Also, since that mostly modifies arch/powerpc, I think it should go through
> that tree. I'm fine with the cpuidle-powernv changes in [1/3] and [3/3].
>

Hi Rafael,

Thanks for looking into this. The second patch is an independent fix in the powerpc exception handler. To be safe I am ccing you and linux-pm list on that patch now.

Thanks,
Shreyas
[PATCH 0/4] powernv: cpuidle: Redesign idle states management
Deep idle states like sleep and winkle are per core idle states. A core enters these states only when all the threads enter either the particular idle state or a deeper one. Some tasks, like the fastsleep hardware bug workaround and hypervisor core state save, have to be done only by the last thread of the core entering a deep idle state. Similarly, tasks like timebase resync and hypervisor core register restore have to be done only by the first thread waking up from these states.

The current idle state management does not have a way to distinguish the first/last thread of the core waking/entering idle states. Tasks like timebase resync are done for all the threads. This is not only suboptimal, but can cause functionality issues when subcores are involved.

Winkle is a deeper idle state compared to fastsleep. In this state the power supply to the chiplet, i.e. core, private L2 and private L3, is turned off. This results in a total hypervisor state loss. This patch set adds support for winkle and provides a way to track the idle states of the threads of the core, and uses it for management of the sleep and winkle idle states.

TODO:
- Handle the case where a thread enters nap and wakes up with supervisor/hypervisor state loss. This can only happen due to a bug in the hardware or the kernel. One way to handle this is to restore the state, switch to the kernel process context and trigger a panic or a warning.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rafael J. Wysocki
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Vaidyanathan Srinivasan
Cc: Preeti U Murthy

Paul Mackerras (1):
  powerpc: powernv: Switch off MMU before entering nap/sleep/rvwinkle mode

Preeti U. Murthy (1):
  powerpc/powernv: Enable Offline CPUs to enter deep idle states

Shreyas B. Prabhu (2):
  powernv: cpuidle: Redesign idle states management
  powernv: powerpc: Add winkle support for offline cpus

 arch/powerpc/include/asm/cpuidle.h             |  14
 arch/powerpc/include/asm/opal.h                |  13
 arch/powerpc/include/asm/paca.h                |   6
 arch/powerpc/include/asm/ppc-opcode.h          |   2
 arch/powerpc/include/asm/processor.h           |   1
 arch/powerpc/include/asm/reg.h                 |   2
 arch/powerpc/kernel/asm-offsets.c              |   6
 arch/powerpc/kernel/cpu_setup_power.S          |   4
 arch/powerpc/kernel/exceptions-64s.S           |  30
 arch/powerpc/kernel/idle_power7.S              | 326
 arch/powerpc/platforms/powernv/opal-wrappers.S |  39
 arch/powerpc/platforms/powernv/powernv.h       |   2
 arch/powerpc/platforms/powernv/setup.c         | 170
 arch/powerpc/platforms/powernv/smp.c           |  10
 arch/powerpc/platforms/powernv/subcore.c       |  35
 arch/powerpc/platforms/powernv/subcore.h       |   1
 drivers/cpuidle/cpuidle-powernv.c              |  10
 17 files changed, 611 insertions(+), 60 deletions(-)
 create mode 100644 arch/powerpc/include/asm/cpuidle.h

--
1.9.3
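As a rough illustration of the first/last-thread tracking this cover letter describes, here is a stand-alone C11 sketch. It is not the implementation from the series, which lives in assembly in idle_power7.S. It assumes, purely for illustration, that a set bit in the per-core word means the thread is running (not in a deep idle state), and it leaves out the lock bit (PNV_CORE_IDLE_LOCK_BIT) that the series uses to serialise the per-core work. The type and function names are hypothetical; only PNV_CORE_IDLE_THREAD_BITS comes from patch 3/4.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PNV_CORE_IDLE_THREAD_BITS 0x0FF	/* from asm/cpuidle.h in patch 3/4 */

/* Assumed convention: bit set = thread running, bit clear = in deep idle. */
typedef _Atomic uint32_t core_idle_state_t;

/*
 * Clear this thread's bit before entering a deep idle state.
 * Returns true if no other thread of the core is still running, i.e. the
 * caller is the last thread in and should do the per-core entry work
 * (fastsleep workaround, core state save).
 */
static bool core_enter_deep_idle(core_idle_state_t *state, uint32_t thread_mask)
{
	uint32_t old = atomic_fetch_and(state, ~thread_mask);

	return ((old & ~thread_mask) & PNV_CORE_IDLE_THREAD_BITS) == 0;
}

/*
 * Set this thread's bit on wake-up.
 * Returns true if no thread was running before, i.e. the caller is the
 * first thread out and should do the per-core wake-up work (timebase
 * resync, core register restore).
 */
static bool core_exit_deep_idle(core_idle_state_t *state, uint32_t thread_mask)
{
	uint32_t old = atomic_fetch_or(state, thread_mask);

	return (old & PNV_CORE_IDLE_THREAD_BITS) == 0;
}
```

Without the lock bit, a second thread waking up could proceed before the first one has finished the per-core restore, so this sketch only shows how the first and last thread are identified, not how the restore is protected.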
[PATCH 3/4] powernv: cpuidle: Redesign idle states management
Deep idle states like sleep and winkle are per core idle states. A core enters these states only when all the threads enter either the particular idle state or a deeper one. Some tasks, like the fastsleep hardware bug workaround and hypervisor core state save, have to be done only by the last thread of the core entering a deep idle state. Similarly, tasks like timebase resync and hypervisor core register restore have to be done only by the first thread waking up from these states.

The current idle state management does not have a way to distinguish the first/last thread of the core waking/entering idle states. Tasks like timebase resync are done for all the threads. This is not only suboptimal, but can cause functionality issues when subcores and kvm are involved.

This patch adds the necessary infrastructure to track idle states of threads in a per-core structure. It uses this info to perform tasks like fastsleep workaround and timebase resync only once per core.

Signed-off-by: Shreyas B. Prabhu
Originally-by: Preeti U. Murthy
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rafael J.
Wysocki Cc: linux...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org --- arch/powerpc/include/asm/cpuidle.h | 14 ++ arch/powerpc/include/asm/opal.h| 2 + arch/powerpc/include/asm/paca.h| 4 + arch/powerpc/kernel/asm-offsets.c | 4 + arch/powerpc/kernel/exceptions-64s.S | 20 ++- arch/powerpc/kernel/idle_power7.S | 183 +++-- arch/powerpc/platforms/powernv/opal-wrappers.S | 37 + arch/powerpc/platforms/powernv/setup.c | 52 ++- arch/powerpc/platforms/powernv/smp.c | 3 +- drivers/cpuidle/cpuidle-powernv.c | 3 +- 10 files changed, 267 insertions(+), 55 deletions(-) create mode 100644 arch/powerpc/include/asm/cpuidle.h diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h new file mode 100644 index 000..8c82850 --- /dev/null +++ b/arch/powerpc/include/asm/cpuidle.h @@ -0,0 +1,14 @@ +#ifndef _ASM_POWERPC_CPUIDLE_H +#define _ASM_POWERPC_CPUIDLE_H + +#ifdef CONFIG_PPC_POWERNV +/* Used in powernv idle state management */ +#define PNV_THREAD_RUNNING 0 +#define PNV_THREAD_NAP 1 +#define PNV_THREAD_SLEEP2 +#define PNV_THREAD_WINKLE 3 +#define PNV_CORE_IDLE_LOCK_BIT 0x100 +#define PNV_CORE_IDLE_THREAD_BITS 0x0FF +#endif + +#endif diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index f8b95c0..bef7fbc 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -152,6 +152,7 @@ struct opal_sg_list { #define OPAL_PCI_ERR_INJECT96 #define OPAL_PCI_EEH_FREEZE_SET97 #define OPAL_HANDLE_HMI98 +#define OPAL_CONFIG_CPU_IDLE_STATE 99 #define OPAL_REGISTER_DUMP_REGION 101 #define OPAL_UNREGISTER_DUMP_REGION102 @@ -162,6 +163,7 @@ struct opal_sg_list { */ #define OPAL_PM_NAP_ENABLED0x0001 #define OPAL_PM_SLEEP_ENABLED 0x0002 +#define OPAL_PM_SLEEP_ENABLED_ER1 0x0008 #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index a5139ea..85aeedb 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -158,6 +158,10 @@ struct paca_struct 
{ * early exception handler for use by high level C handler */ struct opal_machine_check_event *opal_mc_evt; + + /* Per-core mask tracking idle threads and a lock bit-[L][] */ + u32 *core_idle_state_ptr; + u8 thread_idle_state; /* ~Idle[0]/Nap[1]/Sleep[2]/Winkle[3] */ #endif #ifdef CONFIG_PPC_BOOK3S_64 /* Exclusive emergency stack pointer for machine check exception. */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 9d7dede..50f299e 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -731,6 +731,10 @@ int main(void) DEFINE(OPAL_MC_SRR0, offsetof(struct opal_machine_check_event, srr0)); DEFINE(OPAL_MC_SRR1, offsetof(struct opal_machine_check_event, srr1)); DEFINE(PACA_OPAL_MC_EVT, offsetof(struct paca_struct, opal_mc_evt)); + DEFINE(PACA_CORE_IDLE_STATE_PTR, + offsetof(struct paca_struct, core_idle_state_ptr)); + DEFINE(PACA_THREAD_IDLE_STATE, + offsetof(struct paca_struct, thread_idle_state)); #endif return 0; diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 72e783e..3311c8d 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -15,6 +15,7 @@ #include #include #include +#include /* * We layout
[PATCH 4/4] powernv: powerpc: Add winkle support for offline cpus
Winkle is a deep idle state supported in power8 chips. A core enters winkle when all the threads of the core enter winkle. In this state the power supply to the entire chiplet, i.e. core, private L2 and private L3, is turned off. As a result it gives higher power savings compared to sleep.

But entering winkle results in a total hypervisor state loss. Hence the hypervisor context has to be preserved before entering winkle and restored upon wake up.

Power-on Reset Engine (PORE) is a dedicated engine which is responsible for powering on the chiplet during wake up. It can be programmed to restore the register contents of a few specific registers. This patch uses PORE to restore register state wherever possible and uses the stack to save and restore the rest of the necessary registers.

With hypervisor state restore, things fall under three categories: per-core state, per-subcore state and per-thread state. To manage this, extend the infrastructure introduced for sleep. Mainly, we add a paca variable subcore_sibling_mask. Using this and the core_idle_state we can distinguish the first thread in a core and in a subcore.

Signed-off-by: Shreyas B.
Prabhu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-...@lists.ozlabs.org --- arch/powerpc/include/asm/opal.h| 3 + arch/powerpc/include/asm/paca.h| 2 + arch/powerpc/include/asm/ppc-opcode.h | 2 + arch/powerpc/include/asm/processor.h | 1 + arch/powerpc/include/asm/reg.h | 2 + arch/powerpc/kernel/asm-offsets.c | 2 + arch/powerpc/kernel/cpu_setup_power.S | 4 + arch/powerpc/kernel/exceptions-64s.S | 10 ++ arch/powerpc/kernel/idle_power7.S | 161 ++--- arch/powerpc/platforms/powernv/opal-wrappers.S | 2 + arch/powerpc/platforms/powernv/setup.c | 73 +++ arch/powerpc/platforms/powernv/smp.c | 4 +- arch/powerpc/platforms/powernv/subcore.c | 34 ++ arch/powerpc/platforms/powernv/subcore.h | 1 + 14 files changed, 285 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index bef7fbc..f0ca2d9 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -153,6 +153,7 @@ struct opal_sg_list { #define OPAL_PCI_EEH_FREEZE_SET97 #define OPAL_HANDLE_HMI98 #define OPAL_CONFIG_CPU_IDLE_STATE 99 +#define OPAL_SLW_SET_REG 100 #define OPAL_REGISTER_DUMP_REGION 101 #define OPAL_UNREGISTER_DUMP_REGION102 @@ -163,6 +164,7 @@ struct opal_sg_list { */ #define OPAL_PM_NAP_ENABLED0x0001 #define OPAL_PM_SLEEP_ENABLED 0x0002 +#define OPAL_PM_WINKLE_ENABLED 0x0004 #define OPAL_PM_SLEEP_ENABLED_ER1 0x0008 #ifndef __ASSEMBLY__ @@ -972,6 +974,7 @@ int64_t opal_sensor_read(uint32_t sensor_hndl, int token, __be32 *sensor_data); int64_t opal_handle_hmi(void); int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end); int64_t opal_unregister_dump_region(uint32_t id); +int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val); int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number); /* Internal functions */ diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 85aeedb..c2e51b7 100644 --- 
a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -162,6 +162,8 @@ struct paca_struct { /* Per-core mask tracking idle threads and a lock bit-[L][] */ u32 *core_idle_state_ptr; u8 thread_idle_state; /* ~Idle[0]/Nap[1]/Sleep[2]/Winkle[3] */ + /* Mask to denote subcore sibling threads */ + u8 subcore_sibling_mask; #endif #ifdef CONFIG_PPC_BOOK3S_64 /* Exclusive emergency stack pointer for machine check exception. */ diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 6f85362..5155be7 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -194,6 +194,7 @@ #define PPC_INST_NAP 0x4c000364 #define PPC_INST_SLEEP 0x4c0003a4 +#define PPC_INST_WINKLE0x4c0003e4 /* A2 specific instructions */ #define PPC_INST_ERATWE0x7c0001a6 @@ -374,6 +375,7 @@ #define PPC_NAPstringify_in_c(.long PPC_INST_NAP) #define PPC_SLEEP stringify_in_c(.long PPC_INST_SLEEP) +#define PPC_WINKLE stringify_in_c(.long PPC_INST_WINKLE) /* BHRB instructions */ #define PPC_CLRBHRBstringify_in_c(.long PPC_INST_CLRBHRB) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index dda7ac4..c076842 100644 --- a/arch/powerpc/include/asm/processor.h +++ b
[PATCH 1/4] powerpc: powernv: Switch off MMU before entering nap/sleep/rvwinkle mode
From: Paul Mackerras Currently, when going idle, we set the flag indicating that we are in nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap (or sleep or rvwinkle) instruction, all with the MMU on. This is bad for two reasons: (a) the architecture specifies that those instructions must be executed with the MMU off, and in fact with only the SF, HV, ME and possibly RI bits set, and (b) this introduces a race, because as soon as we set the flag, another thread can switch the MMU to a guest context. If the race is lost, this thread will typically start looping on relocation-on ISIs at 0xc...4400. This fixes it by setting the MSR as required by the architecture before setting the flag or executing the nap/sleep/rvwinkle instruction. [ shre...@linux.vnet.ibm.com: Edited to handle LE ] Signed-off-by: Paul Mackerras Signed-off-by: Shreyas B. Prabhu Cc: Benjamin Herrenschmidt Cc: Michael Ellerman Cc: linuxppc-...@lists.ozlabs.org --- arch/powerpc/include/asm/reg.h| 2 ++ arch/powerpc/kernel/idle_power7.S | 18 +- 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index c998279..a68ee15 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -118,8 +118,10 @@ #define __MSR (MSR_ME | MSR_RI | MSR_IR | MSR_DR | MSR_ISF |MSR_HV) #ifdef __BIG_ENDIAN__ #define MSR_ __MSR +#define MSR_IDLE (MSR_ME | MSR_SF | MSR_HV) #else #define MSR_ (__MSR | MSR_LE) +#define MSR_IDLE (MSR_ME | MSR_SF | MSR_HV | MSR_LE) #endif #define MSR_KERNEL (MSR_ | MSR_64BIT) #define MSR_USER32 (MSR_ | MSR_PR | MSR_EE) diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S index c0754bb..283c603 100644 --- a/arch/powerpc/kernel/idle_power7.S +++ b/arch/powerpc/kernel/idle_power7.S @@ -101,7 +101,23 @@ _GLOBAL(power7_powersave_common) std r9,_MSR(r1) std r1,PACAR1(r13) -_GLOBAL(power7_enter_nap_mode) + /* +* Go to real mode to do the nap, as required by the 
architecture. +* Also, we need to be in real mode before setting hwthread_state, +* because as soon as we do that, another thread can switch +* the MMU context to the guest. +*/ + LOAD_REG_IMMEDIATE(r5, MSR_IDLE) + li r6, MSR_RI + andc r6, r9, r6 + LOAD_REG_ADDR(r7, power7_enter_nap_mode) + mtmsrd r6, 1 /* clear RI before setting SRR0/1 */ + mtspr SPRN_SRR0, r7 + mtspr SPRN_SRR1, r5 + rfid + + .globl power7_enter_nap_mode +power7_enter_nap_mode: #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE /* Tell KVM we're napping */ li r4,KVM_HWTHREAD_IN_NAP -- 1.9.3
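For readers following patch 1/4, the MSR handling it describes can be sketched in plain C. This is only an illustration, not part of the patch: the helper names msr_idle() and msr_clear_ri() are hypothetical, and the bit positions are the usual 64-bit powerpc MSR layout (counting from the least significant bit), stated here as an assumption rather than taken from the patch text.

```c
#include <assert.h>
#include <stdint.h>

/* MSR bits, numbered from the least significant bit (assumed layout). */
#define MSR_SF (1ULL << 63) /* 64-bit mode */
#define MSR_HV (1ULL << 60) /* hypervisor state */
#define MSR_ME (1ULL << 12) /* machine check enable */
#define MSR_IR (1ULL << 5)  /* instruction relocation (MMU on for fetch) */
#define MSR_DR (1ULL << 4)  /* data relocation (MMU on for load/store) */
#define MSR_RI (1ULL << 1)  /* recoverable interrupt */
#define MSR_LE (1ULL << 0)  /* little-endian mode */

/* Hypothetical helper: the MSR value the patch loads into SRR1 before rfid. */
static uint64_t msr_idle(int little_endian)
{
    uint64_t msr = MSR_ME | MSR_SF | MSR_HV;

    if (little_endian)
        msr |= MSR_LE;

    /* The whole point of the patch: translation must be off when idling. */
    assert((msr & (MSR_IR | MSR_DR)) == 0);
    return msr;
}

/* Hypothetical helper mirroring "li r6,MSR_RI; andc r6,r9,r6". */
static uint64_t msr_clear_ri(uint64_t current_msr)
{
    return current_msr & ~MSR_RI;
}
```

The sketch shows why the sequence is race free: RI is cleared first so that SRR0/SRR1 can be loaded safely, and the MSR installed by rfid has IR and DR clear, so the thread is already in real mode before hwthread_state is written.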
[PATCH 2/4] powerpc/powernv: Enable Offline CPUs to enter deep idle states
From: "Preeti U. Murthy" The secondary threads should enter deep idle states so as to gain maximum power savings when the entire core is offline. To do so the offline path must be made aware of the available deepest idle state. Hence probe the device tree for the possible idle states in powernv core code and expose the deepest idle state through flags. Since the device tree is probed by the cpuidle driver as well, move the parameters required to discover the idle states into an appropriate place common to both the driver and the powernv core code. Another point is that the fastsleep idle state may require a workaround in the kernel to function properly. This workaround is introduced in the subsequent patches. However, neither the cpuidle driver nor the hotplug path need be bothered about this workaround. It will be taken care of by the core powernv code. Originally-by: Srivatsa S. Bhat Signed-off-by: Preeti U. Murthy Signed-off-by: Shreyas B. Prabhu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Rafael J. Wysocki Cc: linux...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org --- arch/powerpc/include/asm/opal.h | 8 ++ arch/powerpc/platforms/powernv/powernv.h | 2 ++ arch/powerpc/platforms/powernv/setup.c | 49 arch/powerpc/platforms/powernv/smp.c | 7 - drivers/cpuidle/cpuidle-powernv.c | 9 ++ 5 files changed, 68 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 9124b0e..f8b95c0 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -155,6 +155,14 @@ struct opal_sg_list { #define OPAL_REGISTER_DUMP_REGION 101 #define OPAL_UNREGISTER_DUMP_REGION 102 +/* Device tree flags */ + +/* Flags set in power-mgmt nodes in device tree if + * respective idle states are supported in the platform. 
+ */ +#define OPAL_PM_NAP_ENABLED0x0001 +#define OPAL_PM_SLEEP_ENABLED 0x0002 + #ifndef __ASSEMBLY__ #include diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h index 6c8e2d1..604c48e 100644 --- a/arch/powerpc/platforms/powernv/powernv.h +++ b/arch/powerpc/platforms/powernv/powernv.h @@ -29,6 +29,8 @@ static inline u64 pnv_pci_dma_get_required_mask(struct pci_dev *pdev) } #endif +extern u32 pnv_get_supported_cpuidle_states(void); + extern void pnv_lpc_init(void); bool cpu_core_split_required(void); diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index 3f9546d..34c6665 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -290,6 +290,55 @@ static void __init pnv_setup_machdep_rtas(void) } #endif /* CONFIG_PPC_POWERNV_RTAS */ +static u32 supported_cpuidle_states; + +u32 pnv_get_supported_cpuidle_states(void) +{ + return supported_cpuidle_states; +} + +static int __init pnv_init_idle_states(void) +{ + struct device_node *power_mgt; + int dt_idle_states; + const __be32 *idle_state_flags; + u32 len_flags, flags; + int i; + + supported_cpuidle_states = 0; + + if (cpuidle_disable != IDLE_NO_OVERRIDE) + return 0; + + if (!firmware_has_feature(FW_FEATURE_OPALv3)) + return 0; + + power_mgt = of_find_node_by_path("/ibm,opal/power-mgt"); + if (!power_mgt) { + pr_warn("opal: PowerMgmt Node not found\n"); + return 0; + } + + idle_state_flags = of_get_property(power_mgt, + "ibm,cpu-idle-state-flags", &len_flags); + if (!idle_state_flags) { + pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n"); + return 0; + } + + dt_idle_states = len_flags / sizeof(u32); + + for (i = 0; i < dt_idle_states; i++) { + flags = be32_to_cpu(idle_state_flags[i]); + supported_cpuidle_states |= flags; + } + + return 0; +} + +subsys_initcall(pnv_init_idle_states); + + static int __init pnv_probe(void) { unsigned long root = of_get_flat_dt_root(); diff --git 
a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c index 4753958..3dc4cec 100644 --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -149,6 +149,7 @@ static int pnv_smp_cpu_disable(void) static void pnv_smp_cpu_kill_self(void) { unsigned int cpu; + u32 idle_states; /* Standard hot unplug procedure */ local_irq_disable(); @@ -159,13 +160,17 @@ static void pnv_smp_cpu_kill_self(void) generic_set_cpu_dead(cpu); smp_wmb(); + idle_states = pnv_get_supported_cpuidle_states(); /* We don't want to take decrementer interrupts while we are offline, * so clear LPCR:PEC
Re: [PATCH v2] powerpc/powernv: Fix race in updating core_idle_state
On 07/09/2015 10:11 AM, Daniel Axtens wrote: >> I recommend creating an alias or script that does: >> >> $ git log --pretty=fixes -n 1 $commit | xclip >> > > FWIW, having finally got around to doing this, I found I first needed > the following snippet in ~/.gitconfig from > https://www.kernel.org/doc/Documentation/SubmittingPatches > > > [core] > abbrev = 12 > [pretty] > fixes = Fixes: %h (\"%s\") > > Otherwise git doesn't know what the pretty format is. > Right, thanks for the pointer! Thanks, Shreyas
Re: [PATCH] cpupower tools: Fix error when running cpupower monitor
On 08/17/2015 01:22 PM, Shreyas B Prabhu wrote: > > > On 08/10/2015 05:58 PM, Thomas Renninger wrote: >> On Monday, August 03, 2015 11:46:00 AM Shreyas B. Prabhu wrote: >>> get_cpu_topology() tries to get topology info from all cpus by reading >>> files in the topology sysfs dir. If a cpu is offlined, since it doesn't >>> have topology dir, this function fails and returns -1. This causes >>> functions relying on get_cpu_topology() to fail. For example- >>> >>> $ cpupower monitor >>> Cannot read number of available processors >>> >>> Fix this by skipping fetching topology info for offline cpus. >> >> Looks fine. >> >> Thanks! >> >> Acked-by: Thomas Renninger >> > > Thanks Thomas! > Rafael, can you please pick this patch? > > Hi Rafael, If this patch looks good can you please pick this up? Thanks, Shreyas
Re: [PATCH] cpupower tools: Fix error when running cpupower monitor
On 08/10/2015 05:58 PM, Thomas Renninger wrote: > On Monday, August 03, 2015 11:46:00 AM Shreyas B. Prabhu wrote: >> get_cpu_topology() tries to get topology info from all cpus by reading >> files in the topology sysfs dir. If a cpu is offlined, since it doesn't >> have topology dir, this function fails and returns -1. This causes >> functions relying on get_cpu_topology() to fail. For example- >> >> $ cpupower monitor >> Cannot read number of available processors >> >> Fix this by skipping fetching topology info for offline cpus. > > Looks fine. > > Thanks! > > Acked-by: Thomas Renninger > Thanks Thomas! Rafael, can you please pick this patch? Thanks, Shreyas
Re: [PATCH v3] powerpc: Add an inline function to update POWER8 HID0
On 08/05/2015 12:38 PM, Gautham R. Shenoy wrote: > Section 3.7 of Version 1.2 of the Power8 Processor User's Manual > prescribes that updates to HID0 be preceded by a SYNC instruction and > followed by an ISYNC instruction (Page 91). > > Create an inline function name update_power8_hid0() which follows this > recipe and invoke it from the static split core path. > > Signed-off-by: Gautham R. Shenoy Reviewed-by: Shreyas B. Prabhu
Re: [PATCH 7/9] powerpc/powernv: Add platform support for stop instruction
On 05/03/2016 10:55 AM, Michael Neuling wrote: > >> diff --git a/arch/powerpc/include/asm/cputable.h >> b/arch/powerpc/include/asm/cputable.h >> index df4fb5f..a4739a1 100644 >> --- a/arch/powerpc/include/asm/cputable.h >> +++ b/arch/powerpc/include/asm/cputable.h >> @@ -205,6 +205,7 @@ enum { >> #define CPU_FTR_DABRX >> LONG_ASM_CONST(0x0800) >> #define CPU_FTR_PMAO_BUGLONG_ASM_CONST(0x1000) >> #define CPU_FTR_SUBCORE >> LONG_ASM_CONST(0x2000) >> +#define CPU_FTR_STOP_INST LONG_ASM_CONST(0x4000) > > In general, we are putting all the POWER9 features under CPU_FTR_ARCH_300. > Is there a reason you need this separate bit? > No I don't need a separate bit, I'll use CPU_FTR_ARCH_300. Thanks, Shreyas > CPU_FTR bits are fairly scarce these days. > > Mikey >
[PATCH v2 8/9] cpuidle/powernv: Add support for POWER ISA v3 idle states
POWER ISA v3 defines a new idle processor core mechanism. In summary, a) new instruction named stop is added. b) new per thread SPR named PSSCR is added which controls the behavior of stop instruction. Supported idle states and value to be written to PSSCR register to enter any idle state is exposed via ibm,cpu-idle-state-names and ibm,cpu-idle-state-psscr respectively. To enter an idle state, platform provided power_stop() needs to be invoked with the appropriate PSSCR value. This patch adds support for this new mechanism in cpuidle powernv driver. Cc: Rafael J. Wysocki Cc: Daniel Lezcano Cc: linux...@vger.kernel.org Cc: Michael Ellerman Cc: Paul Mackerras Cc: linuxppc-...@lists.ozlabs.org Signed-off-by: Shreyas B. Prabhu --- drivers/cpuidle/cpuidle-powernv.c | 57 ++- 1 file changed, 56 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c index e12dc30..efe5221 100644 --- a/drivers/cpuidle/cpuidle-powernv.c +++ b/drivers/cpuidle/cpuidle-powernv.c @@ -21,6 +21,7 @@ #include #define MAX_POWERNV_IDLE_STATES8 +#define MAX_IDLE_STATE_NAME_LEN10 struct cpuidle_driver powernv_idle_driver = { .name = "powernv_idle", @@ -29,9 +30,11 @@ struct cpuidle_driver powernv_idle_driver = { static int max_idle_state; static struct cpuidle_state *cpuidle_state_table; + +static u64 stop_psscr_table[MAX_POWERNV_IDLE_STATES]; + static u64 snooze_timeout; static bool snooze_timeout_en; - static int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index) @@ -139,6 +142,15 @@ static struct notifier_block setup_hotplug_notifier = { .notifier_call = powernv_cpuidle_add_cpu_notifier, }; +static int stop_loop(struct cpuidle_device *dev, + struct cpuidle_driver *drv, + int index) +{ + ppc64_runlatch_off(); + power_stop(stop_psscr_table[index]); + ppc64_runlatch_on(); + return index; +} /* * powernv_cpuidle_driver_init() */ @@ -169,6 +181,8 @@ static int powernv_add_idle_states(void) int nr_idle_states = 1; /* 
Snooze */ int dt_idle_states; u32 *latency_ns, *residency_ns, *flags; + u64 *psscr_val = NULL; + const char *names[MAX_POWERNV_IDLE_STATES]; int i, rc; /* Currently we have snooze statically defined */ @@ -201,6 +215,23 @@ static int powernv_add_idle_states(void) goto out_free_latency; } + rc = of_property_read_string_array(power_mgt, + "ibm,cpu-idle-state-names", names, dt_idle_states); + if (rc < -1) { + pr_warn("cpuidle-powernv: missing ibm,cpu-idle-states-names in DT\n"); + goto out_free_latency; + } + + if (cpu_has_feature(CPU_FTR_ARCH_300)) { + psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), + GFP_KERNEL); + rc = of_property_read_u64_array(power_mgt, + "ibm,cpu-idle-state-psscr", psscr_val, dt_idle_states); + if (rc < -1) { + pr_warn("cpuidle-powernv: missing ibm,cpu-idle-states-psscr in DT\n"); + goto out_free_psscr; + } + } residency_ns = kzalloc(sizeof(*residency_ns) * dt_idle_states, GFP_KERNEL); rc = of_property_read_u32_array(power_mgt, "ibm,cpu-idle-state-residency-ns", residency_ns, dt_idle_states); @@ -218,6 +249,16 @@ static int powernv_add_idle_states(void) powernv_states[nr_idle_states].flags = 0; powernv_states[nr_idle_states].target_residency = 100; powernv_states[nr_idle_states].enter = &nap_loop; + } else if ((flags[i] & OPAL_PM_STOP_INST_FAST) && + !(flags[i] & OPAL_PM_TIMEBASE_STOP)) { + strncpy(powernv_states[nr_idle_states].name, + (char *)names[i], MAX_IDLE_STATE_NAME_LEN); + strncpy(powernv_states[nr_idle_states].desc, + (char *)names[i], MAX_IDLE_STATE_NAME_LEN); + powernv_states[nr_idle_states].flags = 0; + + powernv_states[nr_idle_states].enter = &stop_loop; + stop_psscr_table[nr_idle_states] = psscr_val[i]; } /* @@ -233,6 +274,18 @@ static int powernv_add_idle_states(void) powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIMER_STOP; powernv_states[nr_idle_states].target_residency = 30; powernv_states[nr_idle_stat
[PATCH v2 1/9] powerpc/powernv: Move CHECK_HMI_INTERRUPT to exception-64s header
CHECK_HMI_INTERRUPT is used to check for HMI's in reset vector. Move the macro to a common location (exception-64s.h) This patch does not change any functionality. Signed-off-by: Shreyas B. Prabhu --- arch/powerpc/include/asm/exception-64s.h | 18 ++ arch/powerpc/kernel/idle_power7.S| 20 +--- 2 files changed, 19 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 93ae809..6a625af 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -545,4 +545,22 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) #define FINISH_NAP #endif +#define CHECK_HMI_INTERRUPT\ + mfspr r0,SPRN_SRR1; \ +BEGIN_FTR_SECTION_NESTED(66); \ + rlwinm r0,r0,45-31,0xf; /* extract wake reason field (P8) */ \ +FTR_SECTION_ELSE_NESTED(66); \ + rlwinm r0,r0,45-31,0xe; /* P7 wake reason field is 3 bits */ \ +ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \ + cmpwi r0,0xa; /* Hypervisor maintenance ? 
*/ \ + bne 20f;\ + /* Invoke opal call to handle hmi */\ + ld r2,PACATOC(r13);\ + ld r1,PACAR1(r13); \ + std r3,ORIG_GPR3(r1); /* Save original r3 */ \ + li r0,OPAL_HANDLE_HMI; /* Pass opal token argument*/ \ + bl opal_call_realmode; \ + ld r3,ORIG_GPR3(r1); /* Restore original r3 */ \ +20:nop; + #endif /* _ASM_POWERPC_EXCEPTION_H */ diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S index 470ceeb..6b3404b 100644 --- a/arch/powerpc/kernel/idle_power7.S +++ b/arch/powerpc/kernel/idle_power7.S @@ -19,6 +19,7 @@ #include #include #include +#include #include #undef DEBUG @@ -257,25 +258,6 @@ _GLOBAL(power7_winkle) b power7_powersave_common /* No return */ -#define CHECK_HMI_INTERRUPT\ - mfspr r0,SPRN_SRR1; \ -BEGIN_FTR_SECTION_NESTED(66); \ - rlwinm r0,r0,45-31,0xf; /* extract wake reason field (P8) */ \ -FTR_SECTION_ELSE_NESTED(66); \ - rlwinm r0,r0,45-31,0xe; /* P7 wake reason field is 3 bits */ \ -ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \ - cmpwi r0,0xa; /* Hypervisor maintenance ? */ \ - bne 20f;\ - /* Invoke opal call to handle hmi */\ - ld r2,PACATOC(r13);\ - ld r1,PACAR1(r13); \ - std r3,ORIG_GPR3(r1); /* Save original r3 */ \ - li r0,OPAL_HANDLE_HMI; /* Pass opal token argument*/ \ - bl opal_call_realmode; \ - ld r3,ORIG_GPR3(r1); /* Restore original r3 */ \ -20:nop; - - _GLOBAL(power7_wakeup_tb_loss) ld r2,PACATOC(r13); ld r1,PACAR1(r13) -- 2.4.11
[PATCH v2 0/9] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states
POWER ISA v3 defines a new idle processor core mechanism. In summary, a) new instruction named stop is added. This instruction replaces instructions like nap, sleep, rvwinkle. b) new per thread SPR named PSSCR is added which controls the behavior of stop instruction. PSSCR has following key fields Bits 0:3 - Power-Saving Level Status. This field indicates the lowest power-saving state the thread entered since stop instruction was last executed. Bit 42 - Enable State Loss 0 - No state is lost irrespective of other fields 1 - Allows state loss Bits 44:47 - Power-Saving Level Limit This limits the power-saving level that can be entered into. Bits 60:63 - Requested Level Used to specify which power-saving level must be entered on executing stop instruction Stop idle states and their properties like name, latency, target residency, psscr value are exposed via device tree. This patch series adds support for this new mechanism. Patches 1-6 are cleanups and code movement. Patch 7 adds platform specific support for stop and psscr handling. Patch 8 adds cpuidle driver support. Patch 9 makes offlined cpu use stop state. Changes in v2 = - Rebased on v4.6-rc6 - Using CPU_FTR_ARCH_300 bit instead of CPU_FTR_STOP_INST Cc: Rafael J. Wysocki Cc: Daniel Lezcano Cc: linux...@vger.kernel.org Cc: Michael Ellerman Cc: Paul Mackerras Cc: Michael Neuling Cc: linuxppc-...@lists.ozlabs.org Shreyas B. 
Prabhu (9): powerpc/powernv: Move CHECK_HMI_INTERRUPT to exception-64s header powerpc/kvm: make hypervisor state restore a function powerpc/powernv: Move idle code usable by multiple hardware to common location powerpc/powernv: Make power7_powersave_common more generic powerpc/powernv: Move idle related macros to cpuidle.h powerpc/powernv: set power_save func after the idle states are initialized powerpc/powernv: Add platform support for stop instruction cpuidle/powernv: Add support for POWER ISA v3 idle states powerpc/powernv: Use deepest stop state when cpu is offlined arch/powerpc/include/asm/cpuidle.h| 29 arch/powerpc/include/asm/exception-64s.h | 18 +++ arch/powerpc/include/asm/kvm_book3s_asm.h | 2 +- arch/powerpc/include/asm/machdep.h| 1 + arch/powerpc/include/asm/opal-api.h | 11 +- arch/powerpc/include/asm/paca.h | 4 + arch/powerpc/include/asm/ppc-opcode.h | 4 + arch/powerpc/include/asm/processor.h | 1 + arch/powerpc/include/asm/reg.h| 11 ++ arch/powerpc/kernel/Makefile | 2 + arch/powerpc/kernel/asm-offsets.c | 4 + arch/powerpc/kernel/exceptions-64s.S | 29 +--- arch/powerpc/kernel/idle_power7.S | 212 +++- arch/powerpc/kernel/idle_power_common.S | 185 + arch/powerpc/kernel/idle_power_stop.S | 221 ++ arch/powerpc/platforms/Kconfig| 4 + arch/powerpc/platforms/powernv/Kconfig| 1 + arch/powerpc/platforms/powernv/idle.c | 94 +++-- arch/powerpc/platforms/powernv/powernv.h | 1 + arch/powerpc/platforms/powernv/setup.c| 2 +- arch/powerpc/platforms/powernv/smp.c | 4 +- drivers/cpuidle/cpuidle-powernv.c | 57 +++- 22 files changed, 659 insertions(+), 238 deletions(-) create mode 100644 arch/powerpc/kernel/idle_power_common.S create mode 100644 arch/powerpc/kernel/idle_power_stop.S -- 2.4.11
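As a reading aid for the PSSCR layout described in the cover letter above, here is a minimal C sketch that decodes the four fields it lists. It is not taken from the series: the macro names, struct psscr_fields and both helpers are illustrative assumptions. The only conversion applied is from the IBM bit numbering used in the cover letter (bit 0 is the most significant bit of the 64-bit register) to ordinary shifts.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit n of a 64-bit register sits at shift (63 - n). */
#define PSSCR_PLS_SHIFT  60            /* bits 0:3,   Power-Saving Level Status */
#define PSSCR_ESL        (1ULL << 21)  /* bit 42,     Enable State Loss */
#define PSSCR_PSLL_SHIFT 16            /* bits 44:47, Power-Saving Level Limit */
#define PSSCR_RL_MASK    0xFULL        /* bits 60:63, Requested Level */

/* Illustrative container for the decoded fields. */
struct psscr_fields {
    unsigned int pls;  /* lowest power-saving state entered since last stop */
    int esl;           /* 1 if state loss is allowed */
    unsigned int psll; /* limit on the level that may be entered */
    unsigned int rl;   /* level requested when stop is executed */
};

/* Hypothetical helper: split a raw PSSCR value into its key fields. */
static struct psscr_fields psscr_decode(uint64_t psscr)
{
    struct psscr_fields f;

    f.pls  = (unsigned int)((psscr >> PSSCR_PLS_SHIFT) & 0xF);
    f.esl  = (psscr & PSSCR_ESL) != 0;
    f.psll = (unsigned int)((psscr >> PSSCR_PSLL_SHIFT) & 0xF);
    f.rl   = (unsigned int)(psscr & PSSCR_RL_MASK);
    return f;
}

/* Hypothetical helper: replace only the Requested Level field. */
static uint64_t psscr_set_rl(uint64_t psscr, unsigned int level)
{
    assert(level <= 0xF); /* RL is a 4-bit field */
    return (psscr & ~PSSCR_RL_MASK) | level;
}
```

For example, a value with ESL set, a level limit of 1 and a requested level of 5 would be 0x210005; psscr_set_rl() on that value changes only the low four bits.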
[PATCH v2 6/9] powerpc/powernv: set power_save func after the idle states are initialized
pnv_init_idle_states discovers supported idle states from the device tree and does the required initialization. Set power_save function pointer only after this initialization is done. Signed-off-by: Shreyas B. Prabhu --- arch/powerpc/platforms/powernv/idle.c | 3 +++ arch/powerpc/platforms/powernv/setup.c | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c index fcc8b68..fbb09fb 100644 --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -285,6 +285,9 @@ static int __init pnv_init_idle_states(void) } pnv_alloc_idle_core_states(); + + if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED) + ppc_md.power_save = power7_idle; out_free: kfree(flags); out: diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index 1acb0c7..c9685b6 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -312,7 +312,7 @@ define_machine(powernv) { .get_proc_freq = pnv_get_proc_freq, .progress = pnv_progress, .machine_shutdown = pnv_shutdown, - .power_save = power7_idle, + .power_save = NULL, .calibrate_decr = generic_calibrate_decr, #ifdef CONFIG_KEXEC .kexec_cpu_down = pnv_kexec_cpu_down, -- 2.4.11
[PATCH v2 7/9] powerpc/powernv: Add platform support for stop instruction
POWER ISA v3 defines a new idle processor core mechanism. In summary, a) new instruction named stop is added. This instruction replaces instructions like nap, sleep, rvwinkle. b) new per thread SPR named PSSCR is added which controls the behavior of stop instruction. PSSCR has following key fields Bits 0:3 - Power-Saving Level Status. This field indicates the lowest power-saving state the thread entered since stop instruction was last executed. Bit 42 - Enable State Loss 0 - No state is lost irrespective of other fields 1 - Allows state loss Bits 44:47 - Power-Saving Level Limit This limits the power-saving level that can be entered into. Bits 60:63 - Requested Level Used to specify which power-saving level must be entered on executing stop instruction This patch adds support for stop instruction and PSSCR handling. Signed-off-by: Shreyas B. Prabhu --- arch/powerpc/include/asm/cpuidle.h| 2 + arch/powerpc/include/asm/kvm_book3s_asm.h | 2 +- arch/powerpc/include/asm/machdep.h| 1 + arch/powerpc/include/asm/opal-api.h | 11 +- arch/powerpc/include/asm/paca.h | 4 + arch/powerpc/include/asm/ppc-opcode.h | 4 + arch/powerpc/include/asm/processor.h | 1 + arch/powerpc/include/asm/reg.h| 11 ++ arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/asm-offsets.c | 4 + arch/powerpc/kernel/idle_power7.S | 2 +- arch/powerpc/kernel/idle_power_common.S | 26 +++- arch/powerpc/kernel/idle_power_stop.S | 221 ++ arch/powerpc/platforms/Kconfig| 4 + arch/powerpc/platforms/powernv/Kconfig| 1 + arch/powerpc/platforms/powernv/idle.c | 80 +-- 16 files changed, 358 insertions(+), 17 deletions(-) create mode 100644 arch/powerpc/kernel/idle_power_stop.S diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h index faa97b7..6d20583 100644 --- a/arch/powerpc/include/asm/cpuidle.h +++ b/arch/powerpc/include/asm/cpuidle.h @@ -13,6 +13,8 @@ #ifndef __ASSEMBLY__ extern u32 pnv_fastsleep_workaround_at_entry[]; extern u32 pnv_fastsleep_workaround_at_exit[]; + +extern u64 
pnv_first_deep_stop_state; #endif #endif diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 72b6225..d318d43 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -162,7 +162,7 @@ struct kvmppc_book3s_shadow_vcpu { /* Values for kvm_state */ #define KVM_HWTHREAD_IN_KERNEL 0 -#define KVM_HWTHREAD_IN_NAP1 +#define KVM_HWTHREAD_IN_IDLE 1 #define KVM_HWTHREAD_IN_KVM2 #endif /* __ASM_KVM_BOOK3S_ASM_H__ */ diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index fd22442..ca4b116 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -261,6 +261,7 @@ struct machdep_calls { extern void e500_idle(void); extern void power4_idle(void); extern void power7_idle(void); +extern void power_stop0(void); extern void ppc6xx_idle(void); extern void book3e_idle(void); diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index f8faaae..3b978ba 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -162,13 +162,20 @@ /* Device tree flags */ -/* Flags set in power-mgmt nodes in device tree if - * respective idle states are supported in the platform. +/* + * Flags set in power-mgmt nodes in device tree describing + * idle states that are supported in the platform. 
*/ + +#define OPAL_PM_TIMEBASE_STOP 0x0002 +#define OPAL_PM_LOSE_HYP_CONTEXT 0x2000 +#define OPAL_PM_LOSE_FULL_CONTEXT 0x4000 #define OPAL_PM_NAP_ENABLED0x0001 #define OPAL_PM_SLEEP_ENABLED 0x0002 #define OPAL_PM_WINKLE_ENABLED 0x0004 #define OPAL_PM_SLEEP_ENABLED_ER1 0x0008 /* with workaround */ +#define OPAL_PM_STOP_INST_FAST 0x0010 +#define OPAL_PM_STOP_INST_DEEP 0x0020 /* * OPAL_CONFIG_CPU_IDLE_STATE parameters diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 546540b..bf48b7e 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -171,6 +171,10 @@ struct paca_struct { /* Mask to denote subcore sibling threads */ u8 subcore_sibling_mask; #endif +#ifdef CONFIG_PPC_STOP_INST +/* Template for PSSCR with EC, ESL, TR, PSLL, MTL fields set */ + u64 thread_psscr; +#endif #ifdef CONFIG_PPC_BOOK3S_64 /* Exclusive emergency stack pointer for machine check exception. */ diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 7ab04fc..f66747f 100644 --- a/arch/powerpc/include
[PATCH v2 9/9] powerpc/powernv: Use deepest stop state when cpu is offlined
If hardware supports stop state, use the deepest stop state when the cpu is offlined. Signed-off-by: Shreyas B. Prabhu --- arch/powerpc/platforms/powernv/idle.c| 15 +-- arch/powerpc/platforms/powernv/powernv.h | 1 + arch/powerpc/platforms/powernv/smp.c | 4 +++- 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c index 45717ab..cce4780 100644 --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -240,6 +240,11 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600, */ u64 pnv_first_deep_stop_state; +/* + * Deepest stop idle state. Used when a cpu is offlined + */ +u64 pnv_deepest_stop_state; + static int __init pnv_init_idle_states(void) { struct device_node *power_mgt; @@ -286,8 +291,11 @@ static int __init pnv_init_idle_states(void) } /* -* Set pnv_first_deep_stop_state to the first stop level -* to cause hypervisor state loss +* Set pnv_first_deep_stop_state and pnv_deepest_stop_state. +* pnv_first_deep_stop_state should be set to the first stop +* level to cause hypervisor state loss. +* pnv_deepest_stop_state should be set to the deepest stop +* stop state. 
*/ pnv_first_deep_stop_state = 0xF; for (i = 0; i < dt_idle_states; i++) { @@ -296,6 +304,9 @@ static int __init pnv_init_idle_states(void) if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) && (pnv_first_deep_stop_state > psscr_rl)) pnv_first_deep_stop_state = psscr_rl; + + if (pnv_deepest_stop_state < psscr_rl) + pnv_deepest_stop_state = psscr_rl; } } diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h index 6dbc0a1..da7c843 100644 --- a/arch/powerpc/platforms/powernv/powernv.h +++ b/arch/powerpc/platforms/powernv/powernv.h @@ -18,6 +18,7 @@ static inline void pnv_pci_shutdown(void) { } #endif extern u32 pnv_get_supported_cpuidle_states(void); +extern u64 pnv_deepest_stop_state; extern void pnv_lpc_init(void); diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c index ad7b1a3..f69ceb6 100644 --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -182,7 +182,9 @@ static void pnv_smp_cpu_kill_self(void) ppc64_runlatch_off(); - if (idle_states & OPAL_PM_WINKLE_ENABLED) + if (cpu_has_feature(CPU_FTR_ARCH_300)) + srr1 = power_stop(pnv_deepest_stop_state); + else if (idle_states & OPAL_PM_WINKLE_ENABLED) srr1 = power7_winkle(); else if ((idle_states & OPAL_PM_SLEEP_ENABLED) || (idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) -- 2.4.11
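The scan that patch 9/9 adds to pnv_init_idle_states() can be tried outside the kernel with a small stand-alone sketch. Everything here is an assumption made for illustration: struct stop_levels and scan_stop_states() are hypothetical names, the flag value is a placeholder for OPAL_PM_LOSE_FULL_CONTEXT, and the two arrays stand in for the ibm,cpu-idle-state-flags and ibm,cpu-idle-state-psscr device tree properties.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PSSCR_RL_MASK     0xFULL  /* Requested Level, low four bits of PSSCR */
#define LOSE_FULL_CONTEXT 0x4000u /* placeholder for OPAL_PM_LOSE_FULL_CONTEXT */

/* Illustrative result type; the patch uses two global u64 variables. */
struct stop_levels {
    uint64_t first_deep; /* shallowest level that loses hypervisor state */
    uint64_t deepest;    /* deepest level advertised by the platform */
};

/* Hypothetical helper following the loop in the patch. */
static struct stop_levels scan_stop_states(const uint32_t *flags,
                                           const uint64_t *psscr_val,
                                           size_t n)
{
    /* 0xF is the starting value the patch gives pnv_first_deep_stop_state. */
    struct stop_levels out = { .first_deep = 0xF, .deepest = 0 };

    assert(n == 0 || (flags != NULL && psscr_val != NULL));

    for (size_t i = 0; i < n; i++) {
        uint64_t rl = psscr_val[i] & PSSCR_RL_MASK;

        /* Track the lowest requested level that loses full context. */
        if ((flags[i] & LOSE_FULL_CONTEXT) && out.first_deep > rl)
            out.first_deep = rl;

        /* Track the highest requested level overall. */
        if (out.deepest < rl)
            out.deepest = rl;
    }
    return out;
}
```

With no states advertised the result stays at first_deep = 0xF and deepest = 0, which matches the initial values in the patch; the offline path then passes the deepest level to power_stop().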
[PATCH v2 5/9] powerpc/powernv: Move idle related macros to cpuidle.h
Move idle related macros to a common location asm/cpuidle.h so that they can be used for stop instruction support. Signed-off-by: Shreyas B. Prabhu --- arch/powerpc/include/asm/cpuidle.h | 27 +++ arch/powerpc/kernel/idle_power7.S | 26 -- 2 files changed, 27 insertions(+), 26 deletions(-) diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h index d2f99ca..faa97b7 100644 --- a/arch/powerpc/include/asm/cpuidle.h +++ b/arch/powerpc/include/asm/cpuidle.h @@ -17,4 +17,31 @@ extern u32 pnv_fastsleep_workaround_at_exit[]; #endif +/* Idle state entry routines */ +#ifdef CONFIG_PPC_P7_NAP +#define IDLE_STATE_ENTER_SEQ(IDLE_INST) \ + /* Magic NAP/SLEEP/WINKLE mode enter sequence */ \ + std r0,0(r1); \ + ptesync; \ + ld r0,0(r1); \ +1: cmp cr0,r0,r0; \ + bne 1b; \ + IDLE_INST; \ + b . +#endif /* CONFIG_PPC_P7_NAP */ + +/* + * Use unused space in the interrupt stack to save and restore + * registers for deep-idle support. + */ +#define _SDR1 GPR3 +#define _RPR GPR4 +#define _SPURR GPR5 +#define _PURR GPR6 +#define _TSCR GPR7 +#define _DSCR GPR8 +#define _AMOR GPR9 +#define _WORT GPR10 +#define _WORC GPR11 + #endif diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S index 1ea71d4..6a24769 100644 --- a/arch/powerpc/kernel/idle_power7.S +++ b/arch/powerpc/kernel/idle_power7.S @@ -24,32 +24,6 @@ #undef DEBUG -/* - * Use unused space in the interrupt stack to save and restore - * registers for winkle support. - */ -#define _SDR1 GPR3 -#define _RPR GPR4 -#define _SPURR GPR5 -#define _PURR GPR6 -#define _TSCR GPR7 -#define _DSCR GPR8 -#define _AMOR GPR9 -#define _WORT GPR10 -#define _WORC GPR11 - -/* Idle state entry routines */ - -#define IDLE_STATE_ENTER_SEQ(IDLE_INST) \ - /* Magic NAP/SLEEP/WINKLE mode enter sequence */ \ - std r0,0(r1); \ - ptesync; \ - ld r0,0(r1); \ -1: cmp cr0,r0,r0; \ - bne 1b; \ - IDLE_INST; \ - b . - .text /* -- 2.4.11
[PATCH v2 2/9] powerpc/kvm: make hypervisor state restore a function
In the current code, when the thread wakes up in reset vector, some of the state restore code and check for whether a thread needs to branch to kvm is duplicated. Reorder the code such that this duplication is avoided. At a higher level this is what the change looks like- Before this patch - power7_wakeup_tb_loss: restore hypervisor state if (thread needed by kvm) goto kvm_start_guest restore nvgprs, cr, pc rfid to process context power7_wakeup_loss: restore nvgprs, cr, pc rfid to process context reset vector: if (waking from deep idle states) goto power7_wakeup_tb_loss else if (thread needed by kvm) goto kvm_start_guest goto power7_wakeup_loss After this patch - power7_wakeup_tb_loss: restore hypervisor state return power7_restore_hyp_resource(): if (waking from deep idle states) goto power7_wakeup_tb_loss return power7_wakeup_loss: restore nvgprs, cr, pc rfid to process context reset vector: power7_restore_hyp_resource() if (thread needed by kvm) goto kvm_start_guest goto power7_wakeup_loss Signed-off-by: Shreyas B. Prabhu --- arch/powerpc/kernel/exceptions-64s.S | 29 +++- arch/powerpc/kernel/idle_power7.S| 67 2 files changed, 41 insertions(+), 55 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 7716ceb..7ebfbb0 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -107,25 +107,8 @@ BEGIN_FTR_SECTION beq 9f cmpwi cr3,r13,2 + bl power7_restore_hyp_resource - /* -* Check if last bit of HSPGR0 is set. This indicates whether we are -* waking up from winkle. -*/ - GET_PACA(r13) - clrldi r5,r13,63 - clrrdi r13,r13,1 - cmpwi cr4,r5,1 - mtspr SPRN_HSPRG0,r13 - - lbz r0,PACA_THREAD_IDLE_STATE(r13) - cmpwi cr2,r0,PNV_THREAD_NAP - bgt cr2,8f /* Either sleep or Winkle */ - - /* Waking up from nap should not cause hypervisor state loss */ - bgt cr3,. 
- - /* Waking up from nap */ li r0,PNV_THREAD_RUNNING stb r0,PACA_THREAD_IDLE_STATE(r13) /* Clear thread state */ @@ -143,13 +126,9 @@ BEGIN_FTR_SECTION /* Return SRR1 from power7_nap() */ mfspr r3,SPRN_SRR1 - beq cr3,2f - b power7_wakeup_noloss -2: b power7_wakeup_loss - - /* Fast Sleep wakeup on PowerNV */ -8: GET_PACA(r13) - b power7_wakeup_tb_loss + blt cr3,2f + b power7_wakeup_loss +2: b power7_wakeup_noloss 9: END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S index 6b3404b..82d164b 100644 --- a/arch/powerpc/kernel/idle_power7.S +++ b/arch/powerpc/kernel/idle_power7.S @@ -258,6 +258,35 @@ _GLOBAL(power7_winkle) b power7_powersave_common /* No return */ +/* + * Called from reset vector. Check whether we have woken up with + * hypervisor state loss. If yes, restore hypervisor state and return + * back to reset vector. + */ +_GLOBAL(power7_restore_hyp_resource) + /* +* Check if last bit of HSPGR0 is set. This indicates whether we are +* waking up from winkle. +*/ + GET_PACA(r13) + clrldi r5,r13,63 + clrrdi r13,r13,1 + cmpwi cr4,r5,1 + mtspr SPRN_HSPRG0,r13 + + lbz r0,PACA_THREAD_IDLE_STATE(r13) + cmpwi cr2,r0,PNV_THREAD_NAP + bgt cr2,power7_wakeup_tb_loss /* Either sleep or Winkle */ + + /* +* We fall through here if PACA_THREAD_IDLE_STATE shows we are waking +* up from nap. At this stage CR3 shouldn't contains 'gt' since that +* indicates we are waking with hypervisor state loss from nap. +*/ + bgt cr3,. + + blr + _GLOBAL(power7_wakeup_tb_loss) ld r2,PACATOC(r13); ld r1,PACAR1(r13) @@ -266,11 +295,13 @@ _GLOBAL(power7_wakeup_tb_loss) * and they are restored before switching to the process context. Hence * until they are restored, they are free to be used. * -* Save SRR1 in a NVGPR as it might be clobbered in opal_call_realmode -* (called in CHECK_HMI_INTERRUPT). SRR1 is required to determine the -* wakeup reason if we branch to kvm_start_guest. 
+* Save SRR1 and LR in NVGPRs as they might be clobbered in +* opal_call_realmode (called in CHECK_HMI_INTERRUPT). SRR1 is required +* to determine the wakeup reason if we branch to kvm_start_guest. LR +* is req
[PATCH v2 4/9] powerpc/powernv: Make power7_powersave_common more generic
power7_powersave_common does common steps needed before entering idle
state and eventually changes MSR to MSR_IDLE and does rfid to
power7_enter_nap_mode.

Make it more generic by passing the rfid address as a function
parameter. Also make function name more generic.

Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/kernel/idle_power7.S       | 11 +++
 arch/powerpc/kernel/idle_power_common.S | 11 ++-
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 594e1c5..1ea71d4 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -169,19 +169,22 @@ _GLOBAL(power7_idle)
 _GLOBAL(power7_nap)
 	mr	r4,r3
 	li	r3,PNV_THREAD_NAP
-	b	power7_powersave_common
+	LOAD_REG_ADDR(r5, power7_enter_nap_mode)
+	b	power_powersave_common
 	/* No return */
 
 _GLOBAL(power7_sleep)
 	li	r3,PNV_THREAD_SLEEP
 	li	r4,1
-	b	power7_powersave_common
+	LOAD_REG_ADDR(r5, power7_enter_nap_mode)
+	b	power_powersave_common
 	/* No return */
 
 _GLOBAL(power7_winkle)
-	li	r3,3
+	li	r3,PNV_THREAD_WINKLE
 	li	r4,1
-	b	power7_powersave_common
+	LOAD_REG_ADDR(r5, power7_enter_nap_mode)
+	b	power_powersave_common
 	/* No return */
 
 _GLOBAL(power7_wakeup_tb_loss)
diff --git a/arch/powerpc/kernel/idle_power_common.S b/arch/powerpc/kernel/idle_power_common.S
index 05954ae..ff7a541 100644
--- a/arch/powerpc/kernel/idle_power_common.S
+++ b/arch/powerpc/kernel/idle_power_common.S
@@ -21,8 +21,10 @@
  * To check IRQ_HAPPENED in r4
  * 	0 - don't check
  * 	1 - check
+ *
+ * Address to 'rfid' to in r5
  */
-_GLOBAL(power7_powersave_common)
+_GLOBAL(power_powersave_common)
 	/* Use r3 to pass state nap/sleep/winkle */
 	/* NAP is a state loss, we create a regs frame on the
 	 * stack, fill it up with the state we care about and
@@ -79,13 +81,12 @@ _GLOBAL(power7_powersave_common)
 	 * because as soon as we do that, another thread can switch
 	 * the MMU context to the guest.
 	 */
-	LOAD_REG_IMMEDIATE(r5, MSR_IDLE)
+	LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
 	li	r6, MSR_RI
 	andc	r6, r9, r6
-	LOAD_REG_ADDR(r7, power7_enter_nap_mode)
 	mtmsrd	r6, 1		/* clear RI before setting SRR0/1 */
-	mtspr	SPRN_SRR0, r7
-	mtspr	SPRN_SRR1, r5
+	mtspr	SPRN_SRR0, r5
+	mtspr	SPRN_SRR1, r7
 	rfid
 
 	/* No return */
-- 
2.4.11
[PATCH v2 3/9] powerpc/powernv: Move idle code usable by multiple hardware to common location
CPU-idle related code, such as the context save/restore functions in
idle_power7.S, can be reused when adding stop instruction support. Move
this code to a new, commonly accessible location.

Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/kernel/Makefile            |   1 +
 arch/powerpc/kernel/idle_power7.S       | 144
 arch/powerpc/kernel/idle_power_common.S | 160
 3 files changed, 161 insertions(+), 144 deletions(-)
 create mode 100644 arch/powerpc/kernel/idle_power_common.S

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2da380f..b877b84 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_PPC64)		+= vdso64/
 obj-$(CONFIG_ALTIVEC)		+= vecemu.o
 obj-$(CONFIG_PPC_970_NAP)	+= idle_power4.o
 obj-$(CONFIG_PPC_P7_NAP)	+= idle_power7.o
+obj-$(CONFIG_PPC_POWERNV)	+= idle_power_common.o
 procfs-y			:= proc_powerpc.o
 obj-$(CONFIG_PROC_FS)		+= $(procfs-y)
 rtaspci-$(CONFIG_PPC64)-$(CONFIG_PCI)	:= rtas_pci.o
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 82d164b..594e1c5 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -68,80 +68,6 @@ core_idle_lock_held:
 	lwarx	r15,0,r14
 	blr
 
-/*
- * Pass requested state in r3:
- *	r3 - PNV_THREAD_NAP/SLEEP/WINKLE
- *
- * To check IRQ_HAPPENED in r4
- * 	0 - don't check
- * 	1 - check
- */
-_GLOBAL(power7_powersave_common)
-	/* Use r3 to pass state nap/sleep/winkle */
-	/* NAP is a state loss, we create a regs frame on the
-	 * stack, fill it up with the state we care about and
-	 * stick a pointer to it in PACAR1. We really only
-	 * need to save PC, some CR bits and the NV GPRs,
-	 * but for now an interrupt frame will do.
-*/ - mflrr0 - std r0,16(r1) - stdur1,-INT_FRAME_SIZE(r1) - std r0,_LINK(r1) - std r0,_NIP(r1) - - /* Hard disable interrupts */ - mfmsr r9 - rldicl r9,r9,48,1 - rotldi r9,r9,16 - mtmsrd r9,1/* hard-disable interrupts */ - - /* Check if something happened while soft-disabled */ - lbz r0,PACAIRQHAPPENED(r13) - andi. r0,r0,~PACA_IRQ_HARD_DIS@l - beq 1f - cmpwi cr0,r4,0 - beq 1f - addir1,r1,INT_FRAME_SIZE - ld r0,16(r1) - li r3,0/* Return 0 (no nap) */ - mtlrr0 - blr - -1: /* We mark irqs hard disabled as this is the state we'll -* be in when returning and we need to tell arch_local_irq_restore() -* about it -*/ - li r0,PACA_IRQ_HARD_DIS - stb r0,PACAIRQHAPPENED(r13) - - /* We haven't lost state ... yet */ - li r0,0 - stb r0,PACA_NAPSTATELOST(r13) - - /* Continue saving state */ - SAVE_GPR(2, r1) - SAVE_NVGPRS(r1) - mfcrr4 - std r4,_CCR(r1) - std r9,_MSR(r1) - std r1,PACAR1(r13) - - /* -* Go to real mode to do the nap, as required by the architecture. -* Also, we need to be in real mode before setting hwthread_state, -* because as soon as we do that, another thread can switch -* the MMU context to the guest. -*/ - LOAD_REG_IMMEDIATE(r5, MSR_IDLE) - li r6, MSR_RI - andcr6, r9, r6 - LOAD_REG_ADDR(r7, power7_enter_nap_mode) - mtmsrd r6, 1 /* clear RI before setting SRR0/1 */ - mtspr SPRN_SRR0, r7 - mtspr SPRN_SRR1, r5 - rfid - .globl power7_enter_nap_mode power7_enter_nap_mode: #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE @@ -258,35 +184,6 @@ _GLOBAL(power7_winkle) b power7_powersave_common /* No return */ -/* - * Called from reset vector. Check whether we have woken up with - * hypervisor state loss. If yes, restore hypervisor state and return - * back to reset vector. - */ -_GLOBAL(power7_restore_hyp_resource) - /* -* Check if last bit of HSPGR0 is set. This indicates whether we are -* waking up from winkle. 
-*/ - GET_PACA(r13) - clrldi r5,r13,63 - clrrdi r13,r13,1 - cmpwi cr4,r5,1 - mtspr SPRN_HSPRG0,r13 - - lbz r0,PACA_THREAD_IDLE_STATE(r13) - cmpwi cr2,r0,PNV_THREAD_NAP - bgt cr2,power7_wakeup_tb_loss /* Either sleep or Winkle */ - - /* -* We fall through here if PACA_THREAD_IDLE_STATE shows we are waking -* up from nap. At this stage CR3 shouldn't contains 'gt' since that -* indicates we are waking with hypervisor state loss from nap. -*/ - bgt cr3,. - - blr - _GLOBAL(power7_wakeup_tb_loss) ld r2,PACATOC(r13); ld
[PATCH v3 2/9] powerpc/powernv: Rename idle_power7.S to idle_power_common.S
idle_power7.S handles idle entry/exit for POWER7, POWER8 and in next patch for POWER9. Rename the file to a non-hardware specific name. Signed-off-by: Shreyas B. Prabhu --- Changes in v3: == - Instead of moving few common functions from idle_power7.S to idle_power_common.S, renaming idle_power7.S to idle_power_common.S. arch/powerpc/kernel/Makefile| 2 +- arch/powerpc/kernel/idle_power7.S | 527 arch/powerpc/kernel/idle_power_common.S | 527 3 files changed, 528 insertions(+), 528 deletions(-) delete mode 100644 arch/powerpc/kernel/idle_power7.S create mode 100644 arch/powerpc/kernel/idle_power_common.S diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 2da380f..99116da 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -47,7 +47,7 @@ obj-$(CONFIG_PPC_BOOK3E_64) += exceptions-64e.o idle_book3e.o obj-$(CONFIG_PPC64)+= vdso64/ obj-$(CONFIG_ALTIVEC) += vecemu.o obj-$(CONFIG_PPC_970_NAP) += idle_power4.o -obj-$(CONFIG_PPC_P7_NAP) += idle_power7.o +obj-$(CONFIG_PPC_P7_NAP) += idle_power_common.o procfs-y := proc_powerpc.o obj-$(CONFIG_PROC_FS) += $(procfs-y) rtaspci-$(CONFIG_PPC64)-$(CONFIG_PCI) := rtas_pci.o diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S deleted file mode 100644 index db59613..000 --- a/arch/powerpc/kernel/idle_power7.S +++ /dev/null @@ -1,527 +0,0 @@ -/* - * This file contains the power_save function for Power7 CPUs. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#undef DEBUG - -/* - * Use unused space in the interrupt stack to save and restore - * registers for winkle support. 
- */ -#define _SDR1 GPR3 -#define _RPR GPR4 -#define _SPURR GPR5 -#define _PURR GPR6 -#define _TSCR GPR7 -#define _DSCR GPR8 -#define _AMOR GPR9 -#define _WORT GPR10 -#define _WORC GPR11 - -/* Idle state entry routines */ - -#defineIDLE_STATE_ENTER_SEQ(IDLE_INST) \ - /* Magic NAP/SLEEP/WINKLE mode enter sequence */\ - std r0,0(r1); \ - ptesync;\ - ld r0,0(r1); \ -1: cmp cr0,r0,r0; \ - bne 1b; \ - IDLE_INST; \ - b . - - .text - -/* - * Used by threads when the lock bit of core_idle_state is set. - * Threads will spin in HMT_LOW until the lock bit is cleared. - * r14 - pointer to core_idle_state - * r15 - used to load contents of core_idle_state - */ - -core_idle_lock_held: - HMT_LOW -3: lwz r15,0(r14) - andi. r15,r15,PNV_CORE_IDLE_LOCK_BIT - bne 3b - HMT_MEDIUM - lwarx r15,0,r14 - blr - -/* - * Pass requested state in r3: - * r3 - PNV_THREAD_NAP/SLEEP/WINKLE - * - * To check IRQ_HAPPENED in r4 - * 0 - don't check - * 1 - check - */ -_GLOBAL(power7_powersave_common) - /* Use r3 to pass state nap/sleep/winkle */ - /* NAP is a state loss, we create a regs frame on the -* stack, fill it up with the state we care about and -* stick a pointer to it in PACAR1. We really only -* need to save PC, some CR bits and the NV GPRs, -* but for now an interrupt frame will do. -*/ - mflrr0 - std r0,16(r1) - stdur1,-INT_FRAME_SIZE(r1) - std r0,_LINK(r1) - std r0,_NIP(r1) - - /* Hard disable interrupts */ - mfmsr r9 - rldicl r9,r9,48,1 - rotldi r9,r9,16 - mtmsrd r9,1/* hard-disable interrupts */ - - /* Check if something happened while soft-disabled */ - lbz r0,PACAIRQHAPPENED(r13) - andi. r0,r0,~PACA_IRQ_HARD_DIS@l - beq 1f - cmpwi cr0,r4,0 - beq 1f - addir1,r1,INT_FRAME_SIZE - ld r0,16(r1) - li r3,0/* Return 0 (no nap) */ - mtlrr0 - blr - -1: /* We mark irqs hard disabled as this is the state we'll -* be in when returning and we need to tell arch_local_irq_restore() -* about it -*/ - li r0,PACA_IRQ_HARD_DIS - stb r0,PACAIRQHAPPENED(r13) - - /* We haven't lost state ... 
yet */ - li r0,0 - stb r0,PACA_N
[PATCH v3 0/9] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states
POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) new instruction named stop is added. This instruction replaces
	instructions like nap, sleep, rvwinkle.
 b) new per thread SPR named PSSCR is added which controls the behavior
	of stop instruction.

PSSCR has following key fields
	Bits 0:3  - Power-Saving Level Status. This field indicates the lowest
	power-saving state the thread entered since stop instruction was last
	executed.

	Bit 42 - Enable State Loss
	0 - No state is lost irrespective of other fields
	1 - Allows state loss

	Bits 44:47 - Power-Saving Level Limit
	This limits the power-saving level that can be entered into.

	Bits 60:63 - Requested Level
	Used to specify which power-saving level must be entered on executing
	stop instruction

Stop idle states and their properties like name, latency, target
residency, psscr value are exposed via device tree.

This patch series adds support for this new mechanism.

Patches 1-6 are cleanups and code movement.
Patch 7 adds platform specific support for stop and psscr handling.
Patch 8 adds cpuidle driver support.
Patch 9 makes offlined cpu use deepest stop state.

Changes in v3
=============
 - Rebased on powerpc-next
 - Dropping patch 1 since we are not adding a new file for P9 idle support
 - Improved comments in multiple places
 - Moved GET_PACA from power7_restore_hyp_resource to System Reset
 - Instead of moving few functions from idle_power7 to idle_power_common,
   renaming idle_power7.S to idle_power_common.S
 - Moved HSTATE_HWTHREAD_STATE updation to power_powersave_common
 - Dropped earlier patch 5 which moved few macros from idle_power_common to
   asm/cpuidle.h.
 - Added a patch to rename reusable power7_* idle functions to pnv_*
 - Added new patch that creates abstraction for saving SPRs before
   entering deep idle states
 - Instead of introducing new file idle_power_stop.S, P9 idle support
   is added to idle_power_common.S using CPU_FTR sections.
 - Fixed r4 reg clobbering in power_stop0

Changes in v2
=============
 - Rebased on v4.6-rc6
 - Using CPU_FTR_ARCH_300 bit instead of CPU_FTR_STOP_INST

Cc: Rafael J. Wysocki
Cc: Daniel Lezcano
Cc: linux...@vger.kernel.org
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: Michael Neuling
Cc: linuxppc-...@lists.ozlabs.org

Shreyas B. Prabhu (9):
  powerpc/powernv: Move CHECK_HMI_INTERRUPT to exception-64s header
  powerpc/kvm: make hypervisor state restore a function
  powerpc/powernv: Rename idle_power7.S to idle_power_common.S
  powerpc/powernv: Make power7_powersave_common more generic
  powerpc/powernv: abstraction for saving SPRs before entering deep idle
    states
  powerpc/powernv: set power_save func after the idle states are
    initialized
  powerpc/powernv: Add platform support for stop instruction
  cpuidle/powernv: Add support for POWER ISA v3 idle states
  powerpc/powernv: Use deepest stop state when cpu is offlined

 arch/powerpc/include/asm/cpuidle.h        |   2 +
 arch/powerpc/include/asm/exception-64s.h  |  18 +
 arch/powerpc/include/asm/kvm_book3s_asm.h |   2 +-
 arch/powerpc/include/asm/machdep.h        |   1 +
 arch/powerpc/include/asm/opal-api.h       |  11 +-
 arch/powerpc/include/asm/paca.h           |   2 +
 arch/powerpc/include/asm/ppc-opcode.h     |   4 +
 arch/powerpc/include/asm/processor.h      |   1 +
 arch/powerpc/include/asm/reg.h            |  11 +
 arch/powerpc/kernel/Makefile              |   2 +-
 arch/powerpc/kernel/asm-offsets.c         |   2 +
 arch/powerpc/kernel/exceptions-64s.S      |  28 +-
 arch/powerpc/kernel/idle_power7.S         | 515
 arch/powerpc/kernel/idle_power_common.S   | 642 ++
 arch/powerpc/platforms/powernv/idle.c     |  96 -
 arch/powerpc/platforms/powernv/powernv.h  |   1 +
 arch/powerpc/platforms/powernv/setup.c    |   2 +-
 arch/powerpc/platforms/powernv/smp.c      |   4 +-
 drivers/cpuidle/cpuidle-powernv.c         |  57 ++-
 19 files changed, 843 insertions(+), 558 deletions(-)
 delete mode 100644 arch/powerpc/kernel/idle_power7.S
 create mode 100644 arch/powerpc/kernel/idle_power_common.S

-- 
2.4.11
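The PSSCR field list in the v3 cover letter uses the Power ISA bit
numbering, where bit 0 is the most significant bit of the 64-bit register.
A field at bits a:b therefore sits at a right shift of 63 - b. The sketch
below decodes the four fields on that basis. It is only an illustration:
ibm_field(), struct psscr_fields and psscr_decode() are hypothetical names
that do not appear in the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract bits first..last of a 64-bit register using IBM numbering
 * (bit 0 = most significant bit). Hypothetical helper for illustration.
 */
static uint64_t ibm_field(uint64_t reg, unsigned int first, unsigned int last)
{
	unsigned int width;
	uint64_t mask;

	assert(first <= last && last <= 63);
	width = last - first + 1;
	mask = (width == 64) ? ~0ull : ((1ull << width) - 1);
	return (reg >> (63 - last)) & mask;
}

/* Hypothetical container for the fields named in the cover letter. */
struct psscr_fields {
	uint64_t pls;	/* bits 0:3,   Power-Saving Level Status */
	uint64_t esl;	/* bit 42,     Enable State Loss */
	uint64_t psll;	/* bits 44:47, Power-Saving Level Limit */
	uint64_t rl;	/* bits 60:63, Requested Level */
};

static struct psscr_fields psscr_decode(uint64_t psscr)
{
	struct psscr_fields f;

	f.pls  = ibm_field(psscr, 0, 3);
	f.esl  = ibm_field(psscr, 42, 42);
	f.psll = ibm_field(psscr, 44, 47);
	f.rl   = ibm_field(psscr, 60, 63);
	return f;
}
```

Under this numbering the Requested Level is simply the low four bits of the
value, which is why a 4-bit comparison is enough when ranking stop states.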
[PATCH v3 5/9] powerpc/powernv: abstraction for saving SPRs before entering deep idle states
Create a function for saving SPRs before entering deep idle states.
This function can be reused for POWER9 deep idle states.

Signed-off-by: Shreyas B. Prabhu
---
New in v3

 arch/powerpc/kernel/idle_power_common.S | 54 +++--
 1 file changed, 32 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/idle_power_common.S b/arch/powerpc/kernel/idle_power_common.S
index d100577..d931537 100644
--- a/arch/powerpc/kernel/idle_power_common.S
+++ b/arch/powerpc/kernel/idle_power_common.S
@@ -52,6 +52,36 @@
 	.text
 
 /*
+ * Used by threads before entering deep idle states. Saves SPRs
+ * in interrupt stack frame
+ */
+save_sprs_to_stack:
+	/*
+	 * Note all register i.e per-core, per-subcore or per-thread is saved
+	 * here since any thread in the core might wake up first
+	 */
+	mfspr	r3,SPRN_SDR1
+	std	r3,_SDR1(r1)
+	mfspr	r3,SPRN_RPR
+	std	r3,_RPR(r1)
+	mfspr	r3,SPRN_SPURR
+	std	r3,_SPURR(r1)
+	mfspr	r3,SPRN_PURR
+	std	r3,_PURR(r1)
+	mfspr	r3,SPRN_TSCR
+	std	r3,_TSCR(r1)
+	mfspr	r3,SPRN_DSCR
+	std	r3,_DSCR(r1)
+	mfspr	r3,SPRN_AMOR
+	std	r3,_AMOR(r1)
+	mfspr	r3,SPRN_WORT
+	std	r3,_WORT(r1)
+	mfspr	r3,SPRN_WORC
+	std	r3,_WORC(r1)
+
+	blr
+
+/*
  * Used by threads when the lock bit of core_idle_state is set.
  * Threads will spin in HMT_LOW until the lock bit is cleared.
  * r14 - pointer to core_idle_state
@@ -207,28 +237,8 @@ fastsleep_workaround_at_entry:
 	b	common_enter
 
 enter_winkle:
-	/*
-	 * Note all register i.e per-core, per-subcore or per-thread is saved
-	 * here since any thread in the core might wake up first
-	 */
-	mfspr	r3,SPRN_SDR1
-	std	r3,_SDR1(r1)
-	mfspr	r3,SPRN_RPR
-	std	r3,_RPR(r1)
-	mfspr	r3,SPRN_SPURR
-	std	r3,_SPURR(r1)
-	mfspr	r3,SPRN_PURR
-	std	r3,_PURR(r1)
-	mfspr	r3,SPRN_TSCR
-	std	r3,_TSCR(r1)
-	mfspr	r3,SPRN_DSCR
-	std	r3,_DSCR(r1)
-	mfspr	r3,SPRN_AMOR
-	std	r3,_AMOR(r1)
-	mfspr	r3,SPRN_WORT
-	std	r3,_WORT(r1)
-	mfspr	r3,SPRN_WORC
-	std	r3,_WORC(r1)
+	bl	save_sprs_to_stack
+
 	IDLE_STATE_ENTER_SEQ(PPC_WINKLE)
 
 _GLOBAL(power7_idle)
-- 
2.4.11