Re: [PATCH] powernv: Restore SPRs correctly upon wake up from hypervisor state loss

2016-09-07 Thread Shreyas B. Prabhu
Hi Gautham,

Thanks for fixing this.

On Wed, Sep 7, 2016 at 1:16 AM, Gautham R. Shenoy
 wrote:
> From: "Gautham R. Shenoy" 
>
> pnv_wakeup_tb_loss function currently expects the cr4 to be "eq" if
> the CPU is waking up from a complete hypervisor state loss. Hence, it
> currently restores the SPR contents only if cr4 is "eq".
>
> However, after the commit bcef83a00dc4 ("powerpc/powernv: Add platform
> support for stop instruction"), on ISA_V300 CPUs, the function
> pnv_restore_hyp_resource sets cr4 to contain the result of the
> comparison between the state the CPU has woken up from and the first deepest
> stop state before calling pnv_wakeup_tb_loss.
>
> Thus if the CPU woke up from a state that is deeper than the first
> deepest stop state, cr4 will have "gt" set and hence, pnv_wakeup_tb_loss
> will fail to restore the SPRs on waking up from such a state.
>
> Fix the code in pnv_wakeup_tb_loss to restore the SPR states when cr4 is
> "eq" or "gt".
>
> Fixes: Commit bcef83a00dc4 ("powerpc/powernv: Add platform support for
> stop instruction")
>
> Cc: Vaidyanathan Srinivasan 
> Cc: Michael Neuling 
> Cc: Michael Ellerman 
> Cc: Shreyas B. Prabhu 
> Signed-off-by: Gautham R. Shenoy 
> ---

Reviewed-by: Shreyas B. Prabhu 


Thanks,
Shreyas
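
As a rough illustration, the condition change described in the quoted patch
can be sketched in C. The real code is PowerPC assembly that branches on cr4;
the names cr_result, compare_stop_state and should_restore_sprs are
hypothetical and exist only for this sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three outcomes a compare leaves in cr4. */
enum cr_result { CR_LT, CR_EQ, CR_GT };

/* Compare the state the CPU woke from against the first deepest stop state. */
static enum cr_result compare_stop_state(unsigned int woke_from,
                                         unsigned int first_deepest)
{
    if (woke_from < first_deepest)
        return CR_LT;
    return (woke_from == first_deepest) ? CR_EQ : CR_GT;
}

/* Before the fix: restore only on "eq". After the fix: on "eq" or "gt". */
static bool should_restore_sprs(enum cr_result cr4)
{
    return cr4 == CR_EQ || cr4 == CR_GT;
}
```

With the old "eq"-only check, a wakeup from a state deeper than the first
deepest one would yield CR_GT and skip the restore.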


Re: [PATCH] cpupower tools: Fix error when running cpupower monitor

2015-09-02 Thread Shreyas B Prabhu


On 08/25/2015 05:29 PM, Shreyas B Prabhu wrote:
> 
> 
> On 08/17/2015 01:22 PM, Shreyas B Prabhu wrote:
>>
>>
>> On 08/10/2015 05:58 PM, Thomas Renninger wrote:
>>> On Monday, August 03, 2015 11:46:00 AM Shreyas B. Prabhu wrote:
>>>> get_cpu_topology() tries to get topology info from all cpus by reading
>>>> files in the topology sysfs dir. If a cpu is offlined, since it doesn't
>>>> have topology dir, this function fails and returns -1. This causes
>>>> functions relying on get_cpu_topology() to fail. For example-
>>>>
>>>> $ cpupower monitor
>>>> Cannot read number of available processors
>>>>
>>>> Fix this by skipping fetching topology info for offline cpus.
>>>
>>> Looks fine.
>>>
>>> Thanks!
>>>
>>> Acked-by: Thomas Renninger 
>>>
>>
>> Thanks Thomas!
>> Rafael, can you please pick this patch?
>>
>>
> 
> 
> Hi Rafael,
> 
> If this patch looks good can you please pick this up?
> 
> 
> Thanks,
> Shreyas
> 

Hi Rafael,

If this patch looks good can you please pick this up?


Thanks,
Shreyas

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
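
The fix discussed in this thread (skip offline cpus instead of failing) can be
sketched as below. This is not the cpupower code: struct cpu_info and
collect_topology are hypothetical, and the real tool reads sysfs files rather
than an in-memory array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU record; the real tool reads sysfs files instead. */
struct cpu_info {
    bool online;
    int core_id;   /* only meaningful when the cpu is online */
};

/*
 * Count the cpus whose topology could be gathered. Offline cpus are
 * skipped rather than treated as an error.
 */
static int collect_topology(struct cpu_info *cpus, size_t ncpus)
{
    int collected = 0;

    for (size_t i = 0; i < ncpus; i++) {
        if (!cpus[i].online) {
            cpus[i].core_id = -1;  /* mark as unknown and keep going */
            continue;
        }
        collected++;
    }
    return collected;
}
```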


Re: [PATCH] cpupower tools: Fix error when running cpupower monitor

2015-09-04 Thread Shreyas B Prabhu

>>
>> Hi Rafael,
>>
>> If this patch looks good can you please pick this up?
> 
> I picked it up last week, sorry for being silent about that.
> 
> It should be in the Linus' tree already.
> 

Thanks! Sorry I missed the fact that you had picked it last week.

Thanks,
Shreyas



Re: [PATCH v3 4/4] powernv: powerpc: Add winkle support for offline cpus

2014-12-08 Thread Shreyas B Prabhu


On Monday 08 December 2014 11:22 AM, Paul Mackerras wrote:
> On Thu, Dec 04, 2014 at 12:58:23PM +0530, Shreyas B. Prabhu wrote:
>> Winkle is a deep idle state supported in power8 chips. A core enters
>> winkle when all the threads of the core enter winkle. In this state
>> power supply to the entire chiplet i.e core, private L2 and private L3
>> is turned off. As a result it gives higher powersavings compared to
>> sleep.
>>
>> But entering winkle results in a total hypervisor state loss. Hence the
>> hypervisor context has to be preserved before entering winkle and
>> restored upon wake up.
>>
>> Power-on Reset Engine (PORE) is a dedicated engine which is responsible
>> for powering on the chiplet during wake up. It can be programmed to
>> restore the register contents of a few specific registers. This patch
>> uses PORE to restore register state wherever possible and uses stack to
>> save and restore rest of the necessary registers.
>>
>> With hypervisor state restore things fall under three categories-
>> per-core state, per-subcore state and per-thread state. To manage this,
>> extend the infrastructure introduced for sleep. Mainly we add a paca
>> variable subcore_sibling_mask. Using this and the core_idle_state we can
>> distinguish first thread in core and subcore.
> 
> Comments below...
> 
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
>> b/arch/powerpc/kernel/exceptions-64s.S
>> index 7637889..2b9b5fb 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -102,9 +102,7 @@ system_reset_pSeries:
>>  #ifdef CONFIG_PPC_P7_NAP
>>  BEGIN_FTR_SECTION
>>  /* Running native on arch 2.06 or later, check if we are
>> - * waking up from nap. We only handle no state loss and
>> - * supervisor state loss. We do -not- handle hypervisor
>> - * state loss at this time.
>> + * waking up from nap/sleep/winkle.
>>   */
>>  mfspr   r13,SPRN_SRR1
>>  rlwinm. r13,r13,47-31,30,31
>> @@ -112,7 +110,17 @@ BEGIN_FTR_SECTION
>>  
>>  cmpwi   cr3,r13,2
>>  
>> -GET_PACA(r13)
>> +/* Check if last bit of HSPRG0 is set. This indicates whether we are
>> + * waking up from winkle */
>> +li  r3,1
>> +mfspr   r4,SPRN_HSPRG0
>> +and r5,r4,r3
>> +cmpwi   cr4,r5,1        /* Store result in cr4 for later use */
>> +
>> +andc    r4,r4,r3
>> +mtspr   SPRN_HSPRG0,r4
>> +
>> +mr  r13,r4
> 
> This seems unnecessarily convoluted.  How about:
> 
>   GET_PACA(r13)
>   clrldi  r5,r13,63
>   clrrdi  r13,r13,1
>   cmpwi   cr4,r5,1
>   mtspr   SPRN_HSPRG0,r13
> 
Yes, makes more sense. I'll use this.
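
For reference, the two bit operations in the suggested sequence can be written
in C as below. This is only a sketch: it assumes HSPRG0 holds the paca pointer
with its least significant bit used as the winkle flag, as the patch comment
says, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Equivalent of "clrldi r5,r13,63": keep only the least significant bit. */
static bool winkle_flag(uint64_t hsprg0)
{
    return (hsprg0 & 1ULL) != 0;
}

/* Equivalent of "clrrdi r13,r13,1": clear the least significant bit. */
static uint64_t paca_pointer(uint64_t hsprg0)
{
    return hsprg0 & ~1ULL;
}
```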

>> diff --git a/arch/powerpc/kernel/idle_power7.S 
>> b/arch/powerpc/kernel/idle_power7.S
>> index 8c3a1f4..8102075 100644
>> --- a/arch/powerpc/kernel/idle_power7.S
>> +++ b/arch/powerpc/kernel/idle_power7.S
>> @@ -19,8 +19,24 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #undef DEBUG
>> +/*
>> + * Use unused space in the interrupt stack to save and restore
>> + * registers for winkle support.
>> + */
>> +#define _SDR1   GPR3
>> +#define _RPR    GPR4
>> +#define _SPURR  GPR5
>> +#define _PURR   GPR6
>> +#define _TSCR   GPR7
>> +#define _DSCR   GPR8
>> +#define _AMOR   GPR9
>> +#define _PMC5   GPR10
>> +#define _PMC6   GPR11
> 
> Why only PMC5 and PMC6 out of all the PMU registers?  What about
> PMC1-PMC4 and the MMCR registers?  I assume they're lost during winkle
> state also, aren't they?  If we're not saving them, what's the point
> of saving and restoring PMC5 and PMC6?
>
Yes all PMC and MMCR contents are lost. Using __restore_cpu_power8, the
MMCR registers are initialized to 0. The reasoning behind specifically
restoring PMC5 and PMC6 was the fact that they are not programmable and
count cycles/instructions by default. We suspected that there might be a
userspace program which relied on PMC5/PMC6 always increasing.
But now on closer look, since these counters are 32 bit and cycles/
instruction counts are bound to exceed it, I doubt such userspace programs
exist. I'll drop PMC5 and PMC6 in the next version.
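
To put a number on the wrap-around argument: a free-running 32-bit counter
overflows after 2^32 events. The helper below is a hypothetical sketch; at an
assumed 3 GHz cycle rate (a figure chosen for illustration, not taken from
this thread) it gives roughly 1.4 seconds.

```c
/* Seconds until a free-running 32-bit counter wraps at the given rate. */
static double seconds_to_wrap_32bit(double events_per_second)
{
    return 4294967296.0 / events_per_second;  /* 2^32 events */
}
```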
 
>> +#define _WORT   GPR12
>> +#define _WORC   GPR13
>>  
>>  /* Idle state entry routines */
>>  
>> @@ -124,8 +140,8 @@ power7_enter_nap_mode:
>>  stb r4,HSTATE_HWTHREAD_STATE(r13)
>>  #endif
>>

[PATCH v4 0/4] powernv: cpuidle: Redesign idle states management

2014-12-09 Thread Shreyas B. Prabhu
Deep idle states like sleep and winkle are per core idle states. A core
enters these states only when all the threads enter either the particular
idle state or a deeper one. There are tasks like fastsleep hardware bug
workaround and hypervisor core state save which have to be done only by
the last thread of the core entering deep idle state and similarly tasks
like timebase resync, hypervisor core register restore that have to be
done only by the first thread waking up from these states. 

The current idle state management does not have a way to distinguish the
first/last thread of the core waking/entering idle states. Tasks like
timebase resync are done for all the threads. This is not only suboptimal,
but can cause functionality issues when subcores are involved.

Winkle is a deeper idle state compared to fastsleep. In this state the power
supply to the chiplet, i.e core, private L2 and private L3 is turned off.
This results in a total hypervisor state loss. This patch set adds support
for winkle and provides a way to track the idle states of the threads of the
core and use it for idle state management of idle states sleep and winkle.

Note- This patch set requires "powerpc: powernv: Return to cpu offline loop
when finished in KVM guest" (http://patchwork.ozlabs.org/patch/417240/)

TBD:

- Remove duplication of branching to kvm code. 

Changes in v4:
--------------
- Based patches on top of http://patchwork.ozlabs.org/patch/417240/
- isync ordering fix.
- Save/Restore SRR1 value so that it doesn't get clobbered by 
opal_call_realmode.
- Changed HSPRG0 handling.
- Comment fixes.


Changes in v3:
--------------
- Added barriers after lock
- Added a paca field that stores the thread mask.
- Changed code structure around fastsleep workaround, to allow for manual
  patching out if the platform does not require it. 
- Threads waiting on core_idle_state lock now loop in HMT_LOW
- Using NV CRs to avoid save/restore of CR while making OPAL calls.
- Fixed a couple of flow issues in path where fastsleep workaround was not needed
- Using PPC_LR_STKOFF instead of _LINK in opal_call_realmode
- Restoring WORT and WORC

Changes in v2:
--------------
- Using PNV_THREAD_NAP/SLEEP defines while calling power7_powersave_common
- Comment changes based on review
- Rebased on top of 3.18-rc6


Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rafael J. Wysocki 
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Vaidyanathan Srinivasan 
Cc: Preeti U Murthy 

Paul Mackerras (1):
  powerpc: powernv: Switch off MMU before entering nap/sleep/rvwinkle
mode

Preeti U. Murthy (1):
  powerpc/powernv: Enable Offline CPUs to enter deep idle states

Shreyas B. Prabhu (2):
  powernv: cpuidle: Redesign idle states management
  powernv: powerpc: Add winkle support for offline cpus

 arch/powerpc/include/asm/cpuidle.h |  14 ++
 arch/powerpc/include/asm/opal.h|  13 +
 arch/powerpc/include/asm/paca.h|   6 +
 arch/powerpc/include/asm/ppc-opcode.h  |   2 +
 arch/powerpc/include/asm/processor.h   |   1 +
 arch/powerpc/include/asm/reg.h |   4 +
 arch/powerpc/kernel/asm-offsets.c  |   6 +
 arch/powerpc/kernel/cpu_setup_power.S  |   4 +
 arch/powerpc/kernel/exceptions-64s.S   |  30 ++-
 arch/powerpc/kernel/idle_power7.S  | 332 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |  39 +++
 arch/powerpc/platforms/powernv/powernv.h   |   2 +
 arch/powerpc/platforms/powernv/setup.c | 160 
 arch/powerpc/platforms/powernv/smp.c   |  10 +-
 arch/powerpc/platforms/powernv/subcore.c   |  34 +++
 arch/powerpc/platforms/powernv/subcore.h   |   1 +
 drivers/cpuidle/cpuidle-powernv.c  |  10 +-
 17 files changed, 608 insertions(+), 60 deletions(-)
 create mode 100644 arch/powerpc/include/asm/cpuidle.h

-- 
1.9.3



[PATCH v4 1/4] powerpc: powernv: Switch off MMU before entering nap/sleep/rvwinkle mode

2014-12-09 Thread Shreyas B. Prabhu
From: Paul Mackerras 

Currently, when going idle, we set the flag indicating that we are in
nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap
(or sleep or rvwinkle) instruction, all with the MMU on.  This is bad
for two reasons: (a) the architecture specifies that those instructions
must be executed with the MMU off, and in fact with only the SF, HV, ME
and possibly RI bits set, and (b) this introduces a race, because as
soon as we set the flag, another thread can switch the MMU to a guest
context.  If the race is lost, this thread will typically start looping
on relocation-on ISIs at 0xc...4400.

This fixes it by setting the MSR as required by the architecture before
setting the flag or executing the nap/sleep/rvwinkle instruction.
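
As a sketch of why the MSR value matters here: the idle MSR must have both
relocation bits clear, which is what "MMU off" means in this context. The bit
positions below follow asm/reg.h as best understood here and should be checked
against the tree; msr_is_real_mode is a hypothetical helper, and only the
big-endian variant of MSR_IDLE is shown.

```c
#include <stdint.h>

/* Bit positions assumed to match arch/powerpc/include/asm/reg.h. */
#define MSR_SF  (1ULL << 63)  /* 64-bit mode */
#define MSR_HV  (1ULL << 60)  /* hypervisor state */
#define MSR_ME  (1ULL << 12)  /* machine check enable */
#define MSR_IR  (1ULL << 5)   /* instruction relocation (MMU) */
#define MSR_DR  (1ULL << 4)   /* data relocation (MMU) */

/* Big-endian variant of the value this patch introduces. */
#define MSR_IDLE (MSR_ME | MSR_SF | MSR_HV)

/* Non-zero when both relocation bits are clear, i.e. the MMU is off. */
static int msr_is_real_mode(uint64_t msr)
{
    return (msr & (MSR_IR | MSR_DR)) == 0;
}
```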

[ shre...@linux.vnet.ibm.com: Edited to handle LE ]
Signed-off-by: Paul Mackerras 
Signed-off-by: Shreyas B. Prabhu 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/include/asm/reg.h|  2 ++
 arch/powerpc/kernel/idle_power7.S | 18 +-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index c998279..a68ee15 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -118,8 +118,10 @@
 #define __MSR  (MSR_ME | MSR_RI | MSR_IR | MSR_DR | MSR_ISF |MSR_HV)
 #ifdef __BIG_ENDIAN__
 #define MSR_   __MSR
+#define MSR_IDLE   (MSR_ME | MSR_SF | MSR_HV)
 #else
 #define MSR_   (__MSR | MSR_LE)
+#define MSR_IDLE   (MSR_ME | MSR_SF | MSR_HV | MSR_LE)
 #endif
 #define MSR_KERNEL (MSR_ | MSR_64BIT)
 #define MSR_USER32 (MSR_ | MSR_PR | MSR_EE)
diff --git a/arch/powerpc/kernel/idle_power7.S 
b/arch/powerpc/kernel/idle_power7.S
index 18c0687..e5aba6a 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -101,7 +101,23 @@ _GLOBAL(power7_powersave_common)
std r9,_MSR(r1)
std r1,PACAR1(r13)
 
-_GLOBAL(power7_enter_nap_mode)
+   /*
+* Go to real mode to do the nap, as required by the architecture.
+* Also, we need to be in real mode before setting hwthread_state,
+* because as soon as we do that, another thread can switch
+* the MMU context to the guest.
+*/
+   LOAD_REG_IMMEDIATE(r5, MSR_IDLE)
+   li  r6, MSR_RI
+   andc    r6, r9, r6
+   LOAD_REG_ADDR(r7, power7_enter_nap_mode)
+   mtmsrd  r6, 1   /* clear RI before setting SRR0/1 */
+   mtspr   SPRN_SRR0, r7
+   mtspr   SPRN_SRR1, r5
+   rfid
+
+   .globl  power7_enter_nap_mode
+power7_enter_nap_mode:
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
/* Tell KVM we're napping */
li  r4,KVM_HWTHREAD_IN_NAP
-- 
1.9.3



[PATCH v4 3/4] powernv: cpuidle: Redesign idle states management

2014-12-09 Thread Shreyas B. Prabhu
Deep idle states like sleep and winkle are per core idle states. A core
enters these states only when all the threads enter either the
particular idle state or a deeper one. There are tasks like fastsleep
hardware bug workaround and hypervisor core state save which have to be
done only by the last thread of the core entering deep idle state and
similarly tasks like timebase resync, hypervisor core register restore
that have to be done only by the first thread waking up from these
states.

The current idle state management does not have a way to distinguish the
first/last thread of the core waking/entering idle states. Tasks like
timebase resync are done for all the threads. This is not only
suboptimal, but can cause functionality issues when subcores and kvm are
involved.

This patch adds the necessary infrastructure to track idle states of
threads in a per-core structure. It uses this info to perform tasks like
fastsleep workaround and timebase resync only once per core.
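
The first/last thread detection can be sketched in portable C as below. This
is an illustration under stated assumptions, not the patch itself: the real
code is assembly using lwarx/stwcx. and also handles the lock bit, which is
omitted here. C11 atomics stand in for the load-reserve/store-conditional
loop, and enter_deep_idle/exit_deep_idle are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PNV_CORE_IDLE_THREAD_BITS 0x0FFu

/*
 * A set bit means the thread is running. Clear our bit on the way into
 * deep idle; returns true if no sibling is left running (we are last).
 */
static bool enter_deep_idle(_Atomic uint32_t *core_idle_state,
                            uint8_t thread_mask)
{
    uint32_t old = atomic_load(core_idle_state);
    uint32_t next;

    do {
        next = old & ~(uint32_t)thread_mask;
    } while (!atomic_compare_exchange_weak(core_idle_state, &old, next));

    return (next & PNV_CORE_IDLE_THREAD_BITS) == 0;
}

/*
 * Set our bit on wakeup; returns true if no sibling was running before
 * (we are the first thread of the core to wake).
 */
static bool exit_deep_idle(_Atomic uint32_t *core_idle_state,
                           uint8_t thread_mask)
{
    uint32_t old = atomic_fetch_or(core_idle_state, thread_mask);

    return (old & PNV_CORE_IDLE_THREAD_BITS) == 0;
}
```

Only the thread for which enter_deep_idle returns true would run the
fastsleep workaround, and only the thread for which exit_deep_idle returns
true would do the timebase resync.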

Signed-off-by: Shreyas B. Prabhu 
Originally-by: Preeti U. Murthy 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rafael J. Wysocki 
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/include/asm/cpuidle.h |  20 +++
 arch/powerpc/include/asm/opal.h|   2 +
 arch/powerpc/include/asm/paca.h|   6 +
 arch/powerpc/include/asm/processor.h   |   2 +-
 arch/powerpc/kernel/asm-offsets.c  |   6 +
 arch/powerpc/kernel/exceptions-64s.S   |  24 +--
 arch/powerpc/kernel/idle_power7.S  | 197 +++--
 arch/powerpc/platforms/powernv/opal-wrappers.S |  37 +
 arch/powerpc/platforms/powernv/setup.c |  49 +-
 arch/powerpc/platforms/powernv/smp.c   |   3 +-
 drivers/cpuidle/cpuidle-powernv.c  |   3 +-
 11 files changed, 291 insertions(+), 58 deletions(-)
 create mode 100644 arch/powerpc/include/asm/cpuidle.h

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
new file mode 100644
index 000..d2f99ca
--- /dev/null
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -0,0 +1,20 @@
+#ifndef _ASM_POWERPC_CPUIDLE_H
+#define _ASM_POWERPC_CPUIDLE_H
+
+#ifdef CONFIG_PPC_POWERNV
+/* Used in powernv idle state management */
+#define PNV_THREAD_RUNNING              0
+#define PNV_THREAD_NAP                  1
+#define PNV_THREAD_SLEEP                2
+#define PNV_THREAD_WINKLE               3
+#define PNV_CORE_IDLE_LOCK_BIT          0x100
+#define PNV_CORE_IDLE_THREAD_BITS       0x0FF
+
+#ifndef __ASSEMBLY__
+extern u32 pnv_fastsleep_workaround_at_entry[];
+extern u32 pnv_fastsleep_workaround_at_exit[];
+#endif
+
+#endif
+
+#endif
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index f8b95c0..bef7fbc 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -152,6 +152,7 @@ struct opal_sg_list {
 #define OPAL_PCI_ERR_INJECT            96
 #define OPAL_PCI_EEH_FREEZE_SET        97
 #define OPAL_HANDLE_HMI                98
+#define OPAL_CONFIG_CPU_IDLE_STATE     99
 #define OPAL_REGISTER_DUMP_REGION      101
 #define OPAL_UNREGISTER_DUMP_REGION    102
 
@@ -162,6 +163,7 @@ struct opal_sg_list {
  */
 #define OPAL_PM_NAP_ENABLED        0x0001
 #define OPAL_PM_SLEEP_ENABLED      0x0002
+#define OPAL_PM_SLEEP_ENABLED_ER1  0x0008
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index a5139ea..e2c4737 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -158,6 +158,12 @@ struct paca_struct {
 * early exception handler for use by high level C handler
 */
struct opal_machine_check_event *opal_mc_evt;
+
+   /* Per-core mask tracking idle threads and a lock bit-[L][] */
+   u32 *core_idle_state_ptr;
+   u8 thread_idle_state;   /* PNV_THREAD_RUNNING/NAP/SLEEP */
+   /* Mask to indicate thread id in core */
+   u8 thread_mask;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
/* Exclusive emergency stack pointer for machine check exception. */
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 29c3798..f5c45b3 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -452,7 +452,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, 
IDLE_POWERSAVE_OFF};
 
 extern int powersave_nap;  /* set if nap mode can be used in idle loop */
 extern unsigned long power7_nap(int check_irq);
-extern void power7_sleep(void);
+extern unsigned long power7_sleep(void);
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
 extern void poweroff_now(void);
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 9d7dede..3bc0352 100644
--- a/arch/powerpc/kernel/asm

[PATCH v4 2/4] powerpc/powernv: Enable Offline CPUs to enter deep idle states

2014-12-09 Thread Shreyas B. Prabhu
The secondary threads should enter deep idle states so as to gain maximum
powersavings when the entire core is offline. To do so the offline path
must be made aware of the available deepest idle state. Hence probe the
device tree for the possible idle states in powernv core code and
expose the deepest idle state through flags.

Since the device tree is probed by the cpuidle driver as well, move
the parameters required to discover the idle states into an appropriate
common place to both the driver and the powernv core code.

Another point is that fastsleep idle state may require workarounds in
the kernel to function properly. This workaround is introduced in the
subsequent patches. However neither the cpuidle driver nor the hotplug
path need be bothered about this workaround.

They will be taken care of by the core powernv code.
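
A self-contained sketch of the flag folding and of how a caller might consume
the result is given below. The flag values are copied as they appear in the
diff in this posting. be32_decode, fold_idle_flags and deepest_offline_state
are hypothetical helpers; the kernel code uses of_get_property() and
be32_to_cpu() instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values copied as they appear in the diff in this posting. */
#define OPAL_PM_NAP_ENABLED   0x0001
#define OPAL_PM_SLEEP_ENABLED 0x0002

/* Decode one big-endian 32-bit cell, as be32_to_cpu() would. */
static uint32_t be32_decode(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* OR together the flags of every idle state listed in the property. */
static uint32_t fold_idle_flags(const unsigned char *prop, size_t len_bytes)
{
    uint32_t supported = 0;

    for (size_t i = 0; i + 4 <= len_bytes; i += 4)
        supported |= be32_decode(prop + i);
    return supported;
}

/* Hypothetical offline-path choice: prefer sleep when it is available. */
static const char *deepest_offline_state(uint32_t supported)
{
    return (supported & OPAL_PM_SLEEP_ENABLED) ? "sleep" : "nap";
}
```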

Originally-by: Srivatsa S. Bhat 
Signed-off-by: Preeti U. Murthy 
Signed-off-by: Shreyas B. Prabhu 
Reviewed-by: Paul Mackerras 

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rafael J. Wysocki 
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/include/asm/opal.h  |  8 ++
 arch/powerpc/platforms/powernv/powernv.h |  2 ++
 arch/powerpc/platforms/powernv/setup.c   | 49 
 arch/powerpc/platforms/powernv/smp.c |  7 -
 drivers/cpuidle/cpuidle-powernv.c|  9 ++
 5 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9124b0e..f8b95c0 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -155,6 +155,14 @@ struct opal_sg_list {
 #define OPAL_REGISTER_DUMP_REGION      101
 #define OPAL_UNREGISTER_DUMP_REGION    102
 
+/* Device tree flags */
+
+/* Flags set in power-mgmt nodes in device tree if
+ * respective idle states are supported in the platform.
+ */
+#define OPAL_PM_NAP_ENABLED        0x0001
+#define OPAL_PM_SLEEP_ENABLED      0x0002
+
 #ifndef __ASSEMBLY__
 
 #include 
diff --git a/arch/powerpc/platforms/powernv/powernv.h 
b/arch/powerpc/platforms/powernv/powernv.h
index 6c8e2d1..604c48e 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -29,6 +29,8 @@ static inline u64 pnv_pci_dma_get_required_mask(struct 
pci_dev *pdev)
 }
 #endif
 
+extern u32 pnv_get_supported_cpuidle_states(void);
+
 extern void pnv_lpc_init(void);
 
 bool cpu_core_split_required(void);
diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 3f9546d..34c6665 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -290,6 +290,55 @@ static void __init pnv_setup_machdep_rtas(void)
 }
 #endif /* CONFIG_PPC_POWERNV_RTAS */
 
+static u32 supported_cpuidle_states;
+
+u32 pnv_get_supported_cpuidle_states(void)
+{
+   return supported_cpuidle_states;
+}
+
+static int __init pnv_init_idle_states(void)
+{
+   struct device_node *power_mgt;
+   int dt_idle_states;
+   const __be32 *idle_state_flags;
+   u32 len_flags, flags;
+   int i;
+
+   supported_cpuidle_states = 0;
+
+   if (cpuidle_disable != IDLE_NO_OVERRIDE)
+   return 0;
+
+   if (!firmware_has_feature(FW_FEATURE_OPALv3))
+   return 0;
+
+   power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
+   if (!power_mgt) {
+   pr_warn("opal: PowerMgmt Node not found\n");
+   return 0;
+   }
+
+   idle_state_flags = of_get_property(power_mgt,
+   "ibm,cpu-idle-state-flags", &len_flags);
+   if (!idle_state_flags) {
+   pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
+   return 0;
+   }
+
+   dt_idle_states = len_flags / sizeof(u32);
+
+   for (i = 0; i < dt_idle_states; i++) {
+   flags = be32_to_cpu(idle_state_flags[i]);
+   supported_cpuidle_states |= flags;
+   }
+
+   return 0;
+}
+
+subsys_initcall(pnv_init_idle_states);
+
+
 static int __init pnv_probe(void)
 {
unsigned long root = of_get_flat_dt_root();
diff --git a/arch/powerpc/platforms/powernv/smp.c 
b/arch/powerpc/platforms/powernv/smp.c
index b716f66..83299ef 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -150,6 +150,7 @@ static void pnv_smp_cpu_kill_self(void)
 {
unsigned int cpu;
unsigned long srr1;
+   u32 idle_states;
 
/* Standard hot unplug procedure */
local_irq_disable();
@@ -160,13 +161,17 @@ static void pnv_smp_cpu_kill_self(void)
generic_set_cpu_dead(cpu);
smp_wmb();
 
+   idle_states = pnv_get_supported_cpuidle_states();
/* We don't want to take decrementer interrupts while we are offline,
 * so clear LPCR:PECE1. We keep PECE2

[PATCH v4 4/4] powernv: powerpc: Add winkle support for offline cpus

2014-12-09 Thread Shreyas B. Prabhu
Winkle is a deep idle state supported in power8 chips. A core enters
winkle when all the threads of the core enter winkle. In this state
power supply to the entire chiplet i.e core, private L2 and private L3
is turned off. As a result it gives higher powersavings compared to
sleep.

But entering winkle results in a total hypervisor state loss. Hence the
hypervisor context has to be preserved before entering winkle and
restored upon wake up.

Power-on Reset Engine (PORE) is a dedicated engine which is responsible
for powering on the chiplet during wake up. It can be programmed to
restore the register contents of a few specific registers. This patch
uses PORE to restore register state wherever possible and uses stack to
save and restore rest of the necessary registers.

With hypervisor state restore things fall under three categories-
per-core state, per-subcore state and per-thread state. To manage this,
extend the infrastructure introduced for sleep. Mainly we add a paca
variable subcore_sibling_mask. Using this and the core_idle_state we can
distinguish first thread in core and subcore.
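
How the two masks can separate "first in core" from "first in subcore" is
sketched below. This is an assumption about the intended use, written as plain
C; running_before stands for the thread bits of core_idle_state sampled before
the waking thread sets its own bit, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PNV_CORE_IDLE_THREAD_BITS 0x0FFu

/* No thread of the core was running before this wakeup. */
static bool first_in_core(uint32_t running_before)
{
    return (running_before & PNV_CORE_IDLE_THREAD_BITS) == 0;
}

/* No sibling in this thread's subcore was running before this wakeup. */
static bool first_in_subcore(uint32_t running_before,
                             uint8_t subcore_sibling_mask)
{
    return (running_before & subcore_sibling_mask) == 0;
}
```

For example, with an 8-thread core split into two subcores (sibling masks 0x0F
and 0xF0) and only thread 4 running, a waking thread 0 is first in its subcore
but not first in the core.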

Signed-off-by: Shreyas B. Prabhu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/include/asm/opal.h|   3 +
 arch/powerpc/include/asm/paca.h|   2 +
 arch/powerpc/include/asm/ppc-opcode.h  |   2 +
 arch/powerpc/include/asm/processor.h   |   1 +
 arch/powerpc/include/asm/reg.h |   2 +
 arch/powerpc/kernel/asm-offsets.c  |   2 +
 arch/powerpc/kernel/exceptions-64s.S   |  11 +-
 arch/powerpc/kernel/idle_power7.S  | 141 +++--
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/setup.c |  73 +
 arch/powerpc/platforms/powernv/smp.c   |   4 +-
 arch/powerpc/platforms/powernv/subcore.c   |  34 ++
 arch/powerpc/platforms/powernv/subcore.h   |   1 +
 13 files changed, 266 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index bef7fbc..f0ca2d9 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -153,6 +153,7 @@ struct opal_sg_list {
 #define OPAL_PCI_EEH_FREEZE_SET        97
 #define OPAL_HANDLE_HMI                98
 #define OPAL_CONFIG_CPU_IDLE_STATE     99
+#define OPAL_SLW_SET_REG               100
 #define OPAL_REGISTER_DUMP_REGION      101
 #define OPAL_UNREGISTER_DUMP_REGION    102
 
@@ -163,6 +164,7 @@ struct opal_sg_list {
  */
 #define OPAL_PM_NAP_ENABLED        0x0001
 #define OPAL_PM_SLEEP_ENABLED      0x0002
+#define OPAL_PM_WINKLE_ENABLED     0x0004
 #define OPAL_PM_SLEEP_ENABLED_ER1  0x0008
 
 #ifndef __ASSEMBLY__
@@ -972,6 +974,7 @@ int64_t opal_sensor_read(uint32_t sensor_hndl, int token, 
__be32 *sensor_data);
 int64_t opal_handle_hmi(void);
 int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
 int64_t opal_unregister_dump_region(uint32_t id);
+int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
 int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t 
pe_number);
 
 /* Internal functions */
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index e2c4737..c979577 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -164,6 +164,8 @@ struct paca_struct {
u8 thread_idle_state;   /* PNV_THREAD_RUNNING/NAP/SLEEP */
/* Mask to indicate thread id in core */
u8 thread_mask;
+   /* Mask to denote subcore sibling threads */
+   u8 subcore_sibling_mask;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
/* Exclusive emergency stack pointer for machine check exception. */
diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 6f85362..5155be7 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -194,6 +194,7 @@
 
 #define PPC_INST_NAP                   0x4c000364
 #define PPC_INST_SLEEP                 0x4c0003a4
+#define PPC_INST_WINKLE                0x4c0003e4
 
 /* A2 specific instructions */
 #define PPC_INST_ERATWE                0x7c0001a6
@@ -374,6 +375,7 @@
 
 #define PPC_NAP        stringify_in_c(.long PPC_INST_NAP)
 #define PPC_SLEEP      stringify_in_c(.long PPC_INST_SLEEP)
+#define PPC_WINKLE     stringify_in_c(.long PPC_INST_WINKLE)
 
 /* BHRB instructions */
 #define PPC_CLRBHRB    stringify_in_c(.long PPC_INST_CLRBHRB)
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index f5c45b3..bf117d8 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -453,6 +453,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0

Re: [PATCH v3 3/4] powernv: cpuidle: Redesign idle states management

2014-12-07 Thread Shreyas B Prabhu
Hi Paul,

On Monday 08 December 2014 10:31 AM, Paul Mackerras wrote:
> On Thu, Dec 04, 2014 at 12:58:22PM +0530, Shreyas B. Prabhu wrote:
>> Deep idle states like sleep and winkle are per core idle states. A core
>> enters these states only when all the threads enter either the
>> particular idle state or a deeper one. There are tasks like fastsleep
>> hardware bug workaround and hypervisor core state save which have to be
>> done only by the last thread of the core entering deep idle state and
>> similarly tasks like timebase resync, hypervisor core register restore
>> that have to be done only by the first thread waking up from these
>> states.
>>
>> The current idle state management does not have a way to distinguish the
>> first/last thread of the core waking/entering idle states. Tasks like
>> timebase resync are done for all the threads. This is not only
>> suboptimal, but can cause functionality issues when subcores and kvm are
>> involved.
>>
>> This patch adds the necessary infrastructure to track idle states of
>> threads in a per-core structure. It uses this info to perform tasks like
>> fastsleep workaround and timebase resync only once per core.
> 
> Comments below...
> 
>> diff --git a/arch/powerpc/include/asm/paca.h 
>> b/arch/powerpc/include/asm/paca.h
>> index a5139ea..e4578c3 100644
>> --- a/arch/powerpc/include/asm/paca.h
>> +++ b/arch/powerpc/include/asm/paca.h
>> @@ -158,6 +158,12 @@ struct paca_struct {
>>   * early exception handler for use by high level C handler
>>   */
>>  struct opal_machine_check_event *opal_mc_evt;
>> +
>> +/* Per-core mask tracking idle threads and a lock bit-[L][] */
>> +u32 *core_idle_state_ptr;
>> +u8 thread_idle_state;   /* ~Idle[0]/Nap[1]/Sleep[2]/Winkle[3] */
> 
> Might be clearer in the comment to say "/* PNV_THREAD_xxx */" so it's
> clear the value should be one of PNV_THREAD_NAP, PNV_THREAD_SLEEP,
> etc.

Okay. 
> 
>> diff --git a/arch/powerpc/kernel/idle_power7.S 
>> b/arch/powerpc/kernel/idle_power7.S
>> index 283c603..8c3a1f4 100644
>> --- a/arch/powerpc/kernel/idle_power7.S
>> +++ b/arch/powerpc/kernel/idle_power7.S
>> @@ -18,6 +18,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #undef DEBUG
>>  
>> @@ -37,8 +38,7 @@
>>  
>>  /*
>>   * Pass requested state in r3:
>> - *  0 - nap
>> - *  1 - sleep
>> + *  r3 - PNV_THREAD_NAP/SLEEP/WINKLE
>>   *
>>   * To check IRQ_HAPPENED in r4
>>   *  0 - don't check
>> @@ -123,12 +123,58 @@ power7_enter_nap_mode:
>>  li  r4,KVM_HWTHREAD_IN_NAP
>>  stb r4,HSTATE_HWTHREAD_STATE(r13)
>>  #endif
>> -cmpwi   cr0,r3,1
>> -beq 2f
>> +stb r3,PACA_THREAD_IDLE_STATE(r13)
>> +cmpwi   cr1,r3,PNV_THREAD_SLEEP
>> +bge cr1,2f
>>  IDLE_STATE_ENTER_SEQ(PPC_NAP)
>>  /* No return */
>> -2:  IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
>> -/* No return */
>> +2:
>> +/* Sleep or winkle */
>> +lbz r7,PACA_THREAD_MASK(r13)
>> +ld  r14,PACA_CORE_IDLE_STATE_PTR(r13)
>> +lwarx_loop1:
>> +lwarx   r15,0,r14
>> +andc    r15,r15,r7  /* Clear thread bit */
>> +
>> +andi.   r15,r15,PNV_CORE_IDLE_THREAD_BITS
>> +
>> +/*
>> + * If cr0 = 0, then current thread is the last thread of the core entering
>> + * sleep. Last thread needs to execute the hardware bug workaround code if
>> + * required by the platform.
>> + * Make the workaround call unconditionally here. The below branch call is
>> + * patched out when the idle states are discovered if the platform does not
>> + * require it.
>> + */
>> +.global pnv_fastsleep_workaround_at_entry
>> +pnv_fastsleep_workaround_at_entry:
>> +beq fastsleep_workaround_at_entry
> 
> Did you investigate using the feature bit mechanism to do this
> patching for you?  You would need to allocate a CPU feature bit and
> parse the device tree early on and set or clear the feature bit,
> before the feature fixups are done.  The code here would then end up
> looking like:
> 
> BEGIN_FTR_SECTION
>   beq fastsleep_workaround_at_entry
> END_FTR_SECTION_IFSET(CPU_FTR_FASTSLEEP_WORKAROUND)
> 

I agree using feature fixup is a much cleaner implementation. The difficulty is,
information on whether fastsleep workaround is needed is passed in the device
tree. do_feature_fixups is currently called before we unflatten the device tree.
Any suggestions for 

Re: [PATCH v2 1/2] powerpc: Add helpers for LPCR PECE1 operations

2015-01-22 Thread Shreyas B Prabhu


On Friday 23 January 2015 08:36 AM, Michael Ellerman wrote:
> On Mon, 2015-01-19 at 13:35 +0530, Shreyas B. Prabhu wrote:
>> PECE1 bit in LPCR is used to control whether decrementer can cause exit
>> from powersaving states. PECE1 bit is cleared before entering fastsleep
>> or deeper powersaving state and it is set on waking up. Since both
>> cpuidle and cpu offline operations use these powersaving states, add
>> helper functions to be used in both these places.
> 
> Thanks.
> 
> That isn't really much clearer than the original, so in the end I just merged
> your original fix.
> 
> I'll think if there's a bigger consolidation we can do that makes it clearer.
> 
> cheers
> 
> 
The helper could have been this:

#define   LPCR_CLEAR_PECE1  (mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1)

This would perhaps be clearer, but it would end up using an additional
mfspr here:

static int fastsleep_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
...

new_lpcr = old_lpcr;
/* Do not exit powersave upon decrementer as we've setup the timer
 * offload.
 */
new_lpcr &= ~LPCR_PECE1;

mtspr(SPRN_LPCR, new_lpcr);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] powerpc: powernv: winkle: Restore LPCR with LPCR_PECE1 cleared

2015-01-14 Thread Shreyas B. Prabhu
LPCR_PECE1 bit controls whether decrementer interrupts are allowed to
cause exit from power-saving mode. While waking up from winkle, restoring
LPCR with LPCR_PECE1 set (i.e. decrementer interrupts allowed) can cause
an issue in the following scenario:

- All the threads in a core are offlined. The core enters deep winkle.
- A spurious interrupt wakes up a thread in the core. Here LPCR is restored
  with the LPCR_PECE1 bit set.
- Since it was a spurious interrupt on an offline thread, the thread clears
  the interrupt and goes back to winkle.
- Before the thread executes winkle and puts the core into deep winkle,
  if a decrementer interrupt occurs on any of the sibling threads in the core,
  that thread wakes up.
- Since the offline loop flushes interrupts only in the case of an external
  interrupt, the decrementer interrupt does not get flushed. At this stage
  the thread is stuck in a loop: it wakes up at 0x100 due to the decrementer
  interrupt, does not flush it because only external interrupts get flushed,
  enters winkle, and wakes up at 0x100 again.

Fix this by programming PORE to restore LPCR with LPCR_PECE1 bit 
cleared when waking up from winkle.
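
For illustration only, the effect of the fix can be modelled in plain
user-space C. This is a toy sketch, not kernel code: the bit values are
illustrative stand-ins for the definitions in asm/reg.h, and the names
wake_source, wakes_thread() and lpcr_for_winkle_restore() are made up for
this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the real definitions live in asm/reg.h. */
#define LPCR_PECE0 0x4000ULL /* external interrupts may cause wakeup */
#define LPCR_PECE1 0x2000ULL /* decrementer may cause wakeup */

/* Hypothetical wake sources, for this model only. */
enum wake_source { WAKE_EXTERNAL, WAKE_DECREMENTER };

/* Toy model: a pending event wakes the thread only if its enable bit is set. */
bool wakes_thread(uint64_t lpcr, enum wake_source src)
{
	switch (src) {
	case WAKE_EXTERNAL:
		return (lpcr & LPCR_PECE0) != 0;
	case WAKE_DECREMENTER:
		return (lpcr & LPCR_PECE1) != 0;
	}
	return false;
}

/* Value to program for restore on wakeup: the LPCR with PECE1 cleared. */
uint64_t lpcr_for_winkle_restore(uint64_t lpcr)
{
	return lpcr & ~LPCR_PECE1;
}
```

With the restored value computed this way, a decrementer event no longer
wakes the offline thread in the model, while external interrupts still do.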

Signed-off-by: Shreyas B. Prabhu 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: linuxppc-...@lists.ozlabs.org
---
This issue is separate from the issue which Alexey has reported. Fix for that is
still pending.

 arch/powerpc/platforms/powernv/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index ad0e32e..83067b1 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -298,7 +298,7 @@ int pnv_save_sprs_for_winkle(void)
 * all cpus at boot. Get these reg values of current cpu and use the
 * same accross all cpus.
 */
-   uint64_t lpcr_val = mfspr(SPRN_LPCR);
+   uint64_t lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1;
uint64_t hid0_val = mfspr(SPRN_HID0);
uint64_t hid1_val = mfspr(SPRN_HID1);
uint64_t hid4_val = mfspr(SPRN_HID4);
-- 
1.9.3



[PATCH v2 2/2] powerpc: powernv: winkle: Restore LPCR with LPCR_PECE1 cleared

2015-01-19 Thread Shreyas B. Prabhu
LPCR_PECE1 bit controls whether decrementer interrupts are allowed to
cause exit from power-saving mode. While waking up from winkle, restoring
LPCR with LPCR_PECE1 set (i.e. decrementer interrupts allowed) can cause
an issue in the following scenario:

- All the threads in a core are offlined. The core enters deep winkle.
- A spurious interrupt wakes up a thread in the core. Here LPCR is restored
  with the LPCR_PECE1 bit set.
- Since it was a spurious interrupt on an offline thread, the thread clears
  the interrupt and goes back to winkle.
- Before the thread executes winkle and puts the core into deep winkle,
  if a decrementer interrupt occurs on any of the sibling threads in the core,
  that thread wakes up.
- Since the offline loop flushes interrupts only in the case of an external
  interrupt, the decrementer interrupt does not get flushed. At this stage
  the thread is stuck in a loop: it wakes up at 0x100 due to the decrementer
  interrupt, does not flush it because only external interrupts get flushed,
  enters winkle, and wakes up at 0x100 again.

Fix this by programming PORE to restore LPCR with LPCR_PECE1 bit
cleared when waking up from winkle.

Signed-off-by: Shreyas B. Prabhu 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: linuxppc-...@lists.ozlabs.org
---
Changes in v2:
==============
Using the helper function introduced in the previous patch.

 arch/powerpc/platforms/powernv/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index ad0e32e..ded7fc8 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -298,7 +298,7 @@ int pnv_save_sprs_for_winkle(void)
 * all cpus at boot. Get these reg values of current cpu and use the
 * same accross all cpus.
 */
-   uint64_t lpcr_val = mfspr(SPRN_LPCR);
+   uint64_t lpcr_val = LPCR_CLEAR_PECE1(mfspr(SPRN_LPCR));
uint64_t hid0_val = mfspr(SPRN_HID0);
uint64_t hid1_val = mfspr(SPRN_HID1);
uint64_t hid4_val = mfspr(SPRN_HID4);
-- 
1.9.3



[PATCH v2 1/2] powerpc: Add helpers for LPCR PECE1 operations

2015-01-19 Thread Shreyas B. Prabhu
PECE1 bit in LPCR is used to control whether decrementer can cause exit
from powersaving states. PECE1 bit is cleared before entering fastsleep
or deeper powersaving state and it is set on waking up. Since both
cpuidle and cpu offline operations use these powersaving states, add
helper functions to be used in both these places.
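
As a side note on the shape of such helpers, here is a minimal user-space
sketch. It is not the posted patch: it uses inline functions instead of
macros, so that an expression such as "a | b" passed as the argument is
evaluated as a whole before the bit is cleared. The LPCR_PECE1 value and the
lower-case function names are assumptions for this example.

```c
#include <stdint.h>

/* Illustrative value; the real definition lives in asm/reg.h. */
#define LPCR_PECE1 0x2000ULL

/* Return 'old' with PECE1 cleared: decrementer cannot cause wakeup. */
static inline uint64_t lpcr_clear_pece1(uint64_t old)
{
	return old & ~LPCR_PECE1;
}

/* Return 'old' with PECE1 set: decrementer can cause wakeup again. */
static inline uint64_t lpcr_set_pece1(uint64_t old)
{
	return old | LPCR_PECE1;
}
```

A macro written as (old & ~(u64)LPCR_PECE1) without parentheses around the
argument would expand "a | b" to "a | b & ~...", which clears the bit only
in b. Parenthesizing the argument or using an inline function avoids that.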

Signed-off-by: Shreyas B. Prabhu 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/include/asm/reg.h   | 4 
 arch/powerpc/platforms/powernv/smp.c | 4 ++--
 drivers/cpuidle/cpuidle-powernv.c| 3 +--
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index c870e38..0847303 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -339,6 +339,10 @@
 #define   LPCR_LPES_SH 2
 #define   LPCR_RMI 0x00000002  /* real mode is cache inhibit */
 #define   LPCR_HDICE   0x00000001  /* Hyp Decr enable (HV,PR,EE) */
+/* LPCR PECE1 helpers. Used to disable/enable wake up due to decrementer */
+#define   LPCR_CLEAR_PECE1(old)(old & ~(u64)LPCR_PECE1)
+#define   LPCR_SET_PECE1(old)  (old | (u64)LPCR_PECE1)
+
 #ifndef SPRN_LPID
 #define SPRN_LPID  0x13F   /* Logical Partition Identifier */
 #endif
diff --git a/arch/powerpc/platforms/powernv/smp.c 
b/arch/powerpc/platforms/powernv/smp.c
index 781ec45..ab61cb0 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -165,7 +165,7 @@ static void pnv_smp_cpu_kill_self(void)
/* We don't want to take decrementer interrupts while we are offline,
 * so clear LPCR:PECE1. We keep PECE2 enabled.
 */
-   mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1);
+   mtspr(SPRN_LPCR, LPCR_CLEAR_PECE1(mfspr(SPRN_LPCR)));
while (!generic_check_cpu_restart(cpu)) {
 
ppc64_runlatch_off();
@@ -203,7 +203,7 @@ static void pnv_smp_cpu_kill_self(void)
if (!generic_check_cpu_restart(cpu))
DBG("CPU%d Unexpected exit while offline !\n", cpu);
}
-   mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_PECE1);
+   mtspr(SPRN_LPCR, LPCR_SET_PECE1(mfspr(SPRN_LPCR)));
DBG("CPU%d coming online...\n", cpu);
 }
 
diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index de61b9a..ed0be4c 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -69,11 +69,10 @@ static int fastsleep_loop(struct cpuidle_device *dev,
if (unlikely(system_state < SYSTEM_RUNNING))
return index;
 
-   new_lpcr = old_lpcr;
/* Do not exit powersave upon decrementer as we've setup the timer
 * offload.
 */
-   new_lpcr &= ~LPCR_PECE1;
+   new_lpcr = LPCR_CLEAR_PECE1(old_lpcr);
 
mtspr(SPRN_LPCR, new_lpcr);
power7_sleep();
-- 
1.9.3



[PATCH v3 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior

2015-03-21 Thread Shreyas B. Prabhu
Fastsleep is one of the idle states that the cpuidle subsystem currently
uses on power8 machines. In this state the L2 cache is brought down to a
threshold voltage. Therefore when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current power8 chips surrounding this fencing.

OPAL provides a workaround which precludes the possibility of hitting
this bug. But running with this workaround applied causes a checkstop
if any correctable error in the L2 cache directory is detected. Hence OPAL
also provides a way to undo the workaround.

In the existing implementation, the workaround is applied by the last thread
of the core entering fastsleep and undone by the first thread waking up.
But this has a performance cost. These OPAL calls account for roughly
4000 cycles every time the core has to enter or wake up from fastsleep.

This patch introduces a sysfs attribute (fastsleep_workaround_state)
to choose the behavior of this workaround.

By default, fastsleep_workaround_state = 0. In this case, the workaround
is applied/undone every time the core enters/exits fastsleep.

fastsleep_workaround_state = 1. In this case the workaround is applied
once on all the cores and never undone. This can be triggered by
echo 1 > /sys/devices/system/cpu/fastsleep_workaround_state

For simplicity this attribute can be modified only once. That is, once
fastsleep_workaround_state is changed to 1, it cannot be reverted to
the default state.
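
The write-once semantics described above can be sketched in user-space C as
follows. This is only a model under stated assumptions, not the patch code:
store_state(), apply_workaround_on_all_cores() and apply_count are
hypothetical names, the helper merely counts calls instead of patching
instructions and sending IPIs, and treating a write of 0 in the default
state as a no-op is a choice made for this sketch.

```c
#include <errno.h>
#include <stdlib.h>

/* 0 - workaround applied/undone on every entry/exit (default)
 * 1 - workaround applied once, never undone
 */
unsigned char fastsleep_workaround_state;

/* Hypothetical stand-in for the real work (patching code, IPIs). */
int apply_count;
void apply_workaround_on_all_cores(void)
{
	apply_count++;
}

/* Model of the sysfs store handler: returns 0 on success, -EINVAL on error. */
int store_state(const char *buf)
{
	char *end;
	unsigned long val;

	/* Write-once: after switching to 1, every further write is rejected. */
	if (fastsleep_workaround_state == 1)
		return -EINVAL;

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno || end == buf || (*end != '\0' && *end != '\n'))
		return -EINVAL;

	if (val > 1)
		return -EINVAL;

	/* Writing 0 while in the default state changes nothing. */
	if (val == 0)
		return 0;

	apply_workaround_on_all_cores();
	fastsleep_workaround_state = 1;
	return 0;
}
```

In this model, a write of "1" succeeds exactly once, and any later write,
including "0", fails with -EINVAL.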

Signed-off-by: Shreyas B. Prabhu 
---
Changes in V3-
Kernel parameter changed to sysfs attribute
Modified commit message

 arch/powerpc/include/asm/opal.h|  8 +++
 arch/powerpc/platforms/powernv/idle.c  | 83 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 3 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9ee0a30..8bea8fc 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -180,6 +180,13 @@ struct opal_sg_list {
 #define OPAL_PM_WINKLE_ENABLED 0x00040000
 #define OPAL_PM_SLEEP_ENABLED_ER1  0x00080000
 
+/*
+ * OPAL_CONFIG_CPU_IDLE_STATE parameters
+ */
+#define OPAL_CONFIG_IDLE_FASTSLEEP 1
+#define OPAL_CONFIG_IDLE_UNDO  0
+#define OPAL_CONFIG_IDLE_APPLY 1
+
 #ifndef __ASSEMBLY__
 
 #include 
@@ -924,6 +931,7 @@ int64_t opal_handle_hmi(void);
 int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
 int64_t opal_unregister_dump_region(uint32_t id);
 int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
+int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag);
 int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t 
pe_number);
 int64_t opal_ipmi_send(uint64_t interface, struct opal_ipmi_msg *msg,
uint64_t msg_len);
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 77992f6..79157b9 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -136,6 +138,77 @@ u32 pnv_get_supported_cpuidle_states(void)
 }
 EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states);
 
+static void pnv_fastsleep_workaround_apply(void *info)
+{
+   opal_config_cpu_idle_state(OPAL_CONFIG_IDLE_FASTSLEEP,
+   OPAL_CONFIG_IDLE_APPLY);
+}
+
+/*
+ * Used to store fastsleep workaround state
+ * 0 - Workaround applied/undone at fastsleep entry/exit path (Default)
+ * 1 - Workaround applied once, never undone.
+ */
+static u8 fastsleep_workaround_state;
+
+static ssize_t show_fastsleep_workaround_state(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%u\n", fastsleep_workaround_state);
+}
+
+static ssize_t store_fastsleep_workaround_state(struct device *dev,
+   struct device_attribute *attr, const char *buf,
+   size_t count)
+{
+   u32 val;
+   cpumask_t primary_thread_mask;
+
+   /*
+* fastsleep_workaround_state is write-once parameter.
+* Once it has been set to 1, it cannot be undone.
+*/
+   if (fastsleep_workaround_state == 1)
+   return -EINVAL;
+
+   if (kstrtou32(buf, 0, &val))
+   return -EINVAL;
+
+   if (val > 1)
+   return -EINVAL;
+
+   fastsleep_workaround_state = 1;
+   /*
+* fastsleep_workaround_state = 1 implies fastsleep workaround needs to
+* be left in 'applied' state on all the cores. Do this by-
+* 1. Patching out the call to 'undo' workaround in fastsleep exit path
+* 2. Sending ipi to all the cores which have atleast one online thread
+* 3. Patching out the call to 'apply' workaround in fastsleep entry
+   

[PATCH v3 1/3] powerpc: Fix cpu_online_cores_map to return only online threads mask

2015-03-21 Thread Shreyas B. Prabhu
Currently, cpu_online_cores_map returns a mask which, for every core
that has at least one online thread, has the first-cpu-of-that-core's bit
set. But the first cpu itself may not always be online. In such cases, if
the returned mask is used for IPI, then it'll cause IPIs to be skipped
on cores where the first thread is offline.

Fix this by setting first-online-cpu-of-the-core's bit in the mask.
This is done by fixing this in the underlying function
cpu_thread_mask_to_cores.

Signed-off-by: Shreyas B. Prabhu 
---
This patch is new in v3

In an example scenario where all the threads of 1st core are offline
and argument to cpu_thread_mask_to_cores is cpu_possible_mask,
with this implementation, return value will not have any bit
corresponding to 1st core set. I think that should be okay. Any thoughts?

 arch/powerpc/include/asm/cputhreads.h | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/cputhreads.h 
b/arch/powerpc/include/asm/cputhreads.h
index 2bf8e93..9e8485c 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -31,9 +31,9 @@ extern cpumask_t threads_core_mask;
 /* cpu_thread_mask_to_cores - Return a cpumask of one per cores
  *hit by the argument
  *
- * @threads:   a cpumask of threads
+ * @threads:   a cpumask of online threads
  *
- * This function returns a cpumask which will have one "cpu" (or thread)
+ * This function returns a cpumask which will have one online cpu's
  * bit set for each core that has at least one thread set in the argument.
  *
  * This can typically be used for things like IPI for tlb invalidations
@@ -42,13 +42,16 @@ extern cpumask_t threads_core_mask;
 static inline cpumask_t cpu_thread_mask_to_cores(const struct cpumask *threads)
 {
cpumask_t   tmp, res;
-   int i;
+   int i, cpu;
 
cpumask_clear(&res);
for (i = 0; i < NR_CPUS; i += threads_per_core) {
cpumask_shift_left(&tmp, &threads_core_mask, i);
-   if (cpumask_intersects(threads, &tmp))
-   cpumask_set_cpu(i, &res);
+   if (cpumask_intersects(threads, &tmp)) {
+   cpu = cpumask_next_and(-1, &tmp, cpu_online_mask);
+   if (cpu < nr_cpu_ids)
+   cpumask_set_cpu(cpu, &res);
+   }
}
return res;
 }
-- 
1.9.3



[PATCH v3 2/3] powerpc/powernv: Move cpuidle related code from setup.c to new file

2015-03-21 Thread Shreyas B. Prabhu
This is a cleanup patch; it doesn't change any functionality. It moves
all cpuidle related code from setup.c to a new file.

Signed-off-by: Shreyas B. Prabhu 
---
This patch is new in v3

 arch/powerpc/platforms/powernv/Makefile |   2 +-
 arch/powerpc/platforms/powernv/idle.c   | 186 
 arch/powerpc/platforms/powernv/setup.c  | 166 
 3 files changed, 187 insertions(+), 167 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/idle.c

diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index 6f3c5d3..560ee54 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,4 +1,4 @@
-obj-y  += setup.o opal-wrappers.o opal.o opal-async.o
+obj-y  += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o 
opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
new file mode 100644
index 0000000..77992f6
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -0,0 +1,186 @@
+/*
+ * PowerNV cpuidle code
+ *
+ * Copyright 2015 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "powernv.h"
+#include "subcore.h"
+
+static u32 supported_cpuidle_states;
+
+int pnv_save_sprs_for_winkle(void)
+{
+   int cpu;
+   int rc;
+
+   /*
+* hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric accross
+* all cpus at boot. Get these reg values of current cpu and use the
+* same accross all cpus.
+*/
+   uint64_t lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1;
+   uint64_t hid0_val = mfspr(SPRN_HID0);
+   uint64_t hid1_val = mfspr(SPRN_HID1);
+   uint64_t hid4_val = mfspr(SPRN_HID4);
+   uint64_t hid5_val = mfspr(SPRN_HID5);
+   uint64_t hmeer_val = mfspr(SPRN_HMEER);
+
+   for_each_possible_cpu(cpu) {
+   uint64_t pir = get_hard_smp_processor_id(cpu);
+   uint64_t hsprg0_val = (uint64_t)&paca[cpu];
+
+   /*
+* HSPRG0 is used to store the cpu's pointer to paca. Hence last
+* 3 bits are guaranteed to be 0. Program slw to restore HSPRG0
+* with 63rd bit set, so that when a thread wakes up at 0x100 we
+* can use this bit to distinguish between fastsleep and
+* deep winkle.
+*/
+   hsprg0_val |= 1;
+
+   rc = opal_slw_set_reg(pir, SPRN_HSPRG0, hsprg0_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
+   if (rc != 0)
+   return rc;
+
+   /* HIDs are per core registers */
+   if (cpu_thread_in_core(cpu) == 0) {
+
+   rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val);
+   if (rc != 0)
+   return rc;
+   }
+   }
+
+   return 0;
+}
+
+static void pnv_alloc_idle_core_states(void)
+{
+   int i, j;
+   int nr_cores = cpu_nr_cores();
+   u32 *core_idle_state;
+
+   /*
+* core_idle_state - First 8 bits track the idle state of each thread
+* of the core. The 8th bit is the lock bit. Initially all thread bits
+* are set. They are cleared when the thread enters deep idle state
+* like sleep and winkle. Initially the lock bit is cleared.
+* The lock bit has 2 purposes
+* a. While the first thread is restoring core state, it prevents
+* other threads in the core from switching to process context.
+* b. While the last thread in the core is saving the core state, it
+* prev

Re: [PATCH] kvm: powerpc: Fix ppc64_defconfig + PPC_POWERNV=n build error

2015-04-22 Thread Shreyas B Prabhu
Any suggestions on this?

On Thursday 16 April 2015 04:28 PM, Shreyas B. Prabhu wrote:
> kvm_no_guest function calls power7_wakeup_loss to put the thread into
> the deepest supported idle state. power7_wakeup_loss is defined in
> arch/powerpc/kernel/idle_power7.S, which is compiled only when PPC_P7_NAP=y.
> And PPC_P7_NAP is selected when PPC_POWERNV=y.
> Hence in cases where PPC_POWERNV=n and KVM_BOOK3S_64_HV=y we see the
> following error:
> 
> arch/powerpc/kvm/built-in.o: In function `kvm_no_guest':
> arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x42c): undefined reference to 
> `power7_wakeup_loss'
> 
> Fix this by adding PPC_POWERNV as a dependency for KVM_BOOK3S_64_HV.
> 
> Signed-off-by: Shreyas B. Prabhu 
> ---
>  arch/powerpc/kvm/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index 11850f3..b3b3d9f 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -75,7 +75,7 @@ config KVM_BOOK3S_64
> 
>  config KVM_BOOK3S_64_HV
>   tristate "KVM support for POWER7 and PPC970 using hypervisor mode in 
> host"
> - depends on KVM_BOOK3S_64
> + depends on KVM_BOOK3S_64 && PPC_POWERNV
>   select KVM_BOOK3S_HV_POSSIBLE
>   select MMU_NOTIFIER
>   select CMA
> 



Re: [PATCH v4 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior

2015-04-14 Thread Shreyas B Prabhu

>>
>> By default, fastsleep_workaround_state = dynamic. In this case, workaround
>> is applied/undone everytime the core enters/exits fastsleep.
>>
>> fastsleep_workaround_state = applyonce. In this case the workaround is
>> applied once on all the cores and never undone. This can be triggered by
>> echo applyonce > /sys/devices/system/cpu/fastsleep_workaround_state
> 
> I was wondering if we really need such an elaborate design for this
> sysfs file. Why not a sysfs file called fastsleep_workaround_apply_once,
> which is set to '0' by default and the only value that it can take is
> '1' ? The name easily implies that the workaround is applied only once
> if it is set. I can see that this can cut down a good chunk of code from
> this patch. I just didn't find too much value in having so much code for
> a simple 'on' knob.

I was considering something similar too, but then moved to this format
as I thought it was unambiguous. Also, moving to a binary attribute
will reduce code only in show_fastsleep_workaround_state, which I don't
feel is much.
That said, if you feel strongly about it, I can change it to the format
you suggested.

Thanks,
Shreyas



Re: [PATCH v4 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior

2015-04-14 Thread Shreyas B Prabhu

> 
> A point that bothers me here is if we can potentially race with cpu
> hotplug ? If cpuX and its siblings are offline and it was interrupted to
> come online:
> 
> cpuX cpuY
> Interrupted to come online
> Undo workaround
> 
> Nop the fastsleep_workaround_exit path
> IPI online cores: apply workaround once
> 
> Set yourself in the online mask
> Nop the fastsleep_workaround_entry path
> 
> 
> This results in cpuX undoing the workaround on its core, never to set it
> back again.
> 
> So should we protect the region between the beginning and end of
> patching instructions with get_online_cpus() and put_online_cpus() ?
> 

Nice catch. I had missed this. Sending out a patch correcting this.

Thanks,
Shreyas



[PATCH v5 0/3] powerpc: powernv: Fastsleep workaround behavior

2015-04-14 Thread Shreyas B. Prabhu
Fastsleep is one of the idle states that the cpuidle subsystem currently
uses on power8 machines. In this state the L2 cache is brought down to a
threshold voltage. Therefore when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current power8 chips surrounding this fencing.

OPAL provides a workaround which precludes the possibility of hitting
this bug. But running with this workaround applied causes a checkstop
if any correctable error in the L2 cache directory is detected. Hence OPAL
also provides a way to undo the workaround.

In the existing implementation, the workaround is applied by the last thread
of the core entering fastsleep and undone by the first thread waking up.
But this has a performance cost. These OPAL calls account for roughly
4000 cycles every time the core has to enter or wake up from fastsleep.

This patchset introduces a sysfs attribute (fastsleep_workaround_state)
to choose the behavior of this workaround.

Patch 1/3 fixes cpu_online_cores_map which is used by Patch 3/3. 
Patch 2/3 is a clean up patch. It moves all cpuidle related code into 
a new file. 
Patch 3/3 introduces the sysfs attribute to control fastsleep workaround
behavior


Changes in v5:
--------------
- Fix potential race with hotplug using get_online_cpus()/put_online_cpus()

Changes in v4:
--------------
- Handle patch_instruction and OPAL call errors
- Sysfs attribute takes a string ("dynamic" vs "applyonce") as input
- Improved changelogs

Changes in v3:
--------------
- Kernel parameter changed to sysfs attribute

Changes in v2:
--------------
- Changed commit message to accurately describe the downside
  of running with the workaround always applied.

Shreyas B. Prabhu (3):
  powerpc: Fix cpu_online_cores_map to return only online threads mask
  powerpc/powernv: Move cpuidle related code from setup.c to new file
  powerpc/powernv: Introduce sysfs control for fastsleep workaround
behavior

 arch/powerpc/include/asm/cputhreads.h  |  13 +-
 arch/powerpc/include/asm/opal-api.h|   7 +
 arch/powerpc/include/asm/opal.h|   1 +
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/idle.c  | 323 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/setup.c | 171 -
 7 files changed, 341 insertions(+), 177 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/idle.c

-- 
1.9.3



[PATCH v5 2/3] powerpc/powernv: Move cpuidle related code from setup.c to new file

2015-04-14 Thread Shreyas B. Prabhu
This is a cleanup patch; it doesn't change any functionality. It moves
all cpuidle related code from setup.c to a new file.

Signed-off-by: Shreyas B. Prabhu 
Reviewed-by: Preeti U Murthy 
---
 arch/powerpc/platforms/powernv/Makefile |   2 +-
 arch/powerpc/platforms/powernv/idle.c   | 191 
 arch/powerpc/platforms/powernv/setup.c  | 171 
 3 files changed, 192 insertions(+), 172 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/idle.c

diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index 33e44f3..bee9235 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,4 +1,4 @@
-obj-y  += setup.o opal-wrappers.o opal.o opal-async.o
+obj-y  += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o 
opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
new file mode 100644
index 0000000..104235a
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -0,0 +1,191 @@
+/*
+ * PowerNV cpuidle code
+ *
+ * Copyright 2015 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "powernv.h"
+#include "subcore.h"
+
+static u32 supported_cpuidle_states;
+
+int pnv_save_sprs_for_winkle(void)
+{
+   int cpu;
+   int rc;
+
+   /*
+* hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric accross
+* all cpus at boot. Get these reg values of current cpu and use the
+* same accross all cpus.
+*/
+   uint64_t lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1;
+   uint64_t hid0_val = mfspr(SPRN_HID0);
+   uint64_t hid1_val = mfspr(SPRN_HID1);
+   uint64_t hid4_val = mfspr(SPRN_HID4);
+   uint64_t hid5_val = mfspr(SPRN_HID5);
+   uint64_t hmeer_val = mfspr(SPRN_HMEER);
+
+   for_each_possible_cpu(cpu) {
+   uint64_t pir = get_hard_smp_processor_id(cpu);
+   uint64_t hsprg0_val = (uint64_t)&paca[cpu];
+
+   /*
+* HSPRG0 is used to store the cpu's pointer to paca. Hence last
+* 3 bits are guaranteed to be 0. Program slw to restore HSPRG0
+* with 63rd bit set, so that when a thread wakes up at 0x100 we
+* can use this bit to distinguish between fastsleep and
+* deep winkle.
+*/
+   hsprg0_val |= 1;
+
+   rc = opal_slw_set_reg(pir, SPRN_HSPRG0, hsprg0_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
+   if (rc != 0)
+   return rc;
+
+   /* HIDs are per core registers */
+   if (cpu_thread_in_core(cpu) == 0) {
+
+   rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val);
+   if (rc != 0)
+   return rc;
+   }
+   }
+
+   return 0;
+}
+
+static void pnv_alloc_idle_core_states(void)
+{
+   int i, j;
+   int nr_cores = cpu_nr_cores();
+   u32 *core_idle_state;
+
+   /*
+* core_idle_state - First 8 bits track the idle state of each thread
+* of the core. The 8th bit is the lock bit. Initially all thread bits
+* are set. They are cleared when the thread enters deep idle state
+* like sleep and winkle. Initially the lock bit is cleared.
+* The lock bit has 2 purposes
+* a. While the first thread is restoring core state, it prevents
+* other threads in the core from switching to process context.
+* b. While the last thread in the core is saving the core state, it
+* prev
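
The hunk above is cut off in this archive. As a reading aid, here is a minimal user-space sketch of the two bit tricks its comments describe. It is not the kernel code: every `sketch_*` name and mask is an illustrative assumption. In IBM bit numbering, bit 63 of a 64-bit register is the least significant bit, which matches `hsprg0_val |= 1`. The lock bit is assumed to be bit 8 (0x100), reading "the 8th bit" as zero-based.

```c
#include <stdint.h>

/* Assumption: the paca pointer is at least 8-byte aligned, so bit 0 is free. */
#define SKETCH_WINKLE_TAG   0x1ULL
/* Assumption: the low 8 bits are per-thread bits and bit 8 is the lock bit. */
#define SKETCH_THREAD_BITS  0x0FFu
#define SKETCH_LOCK_BIT     0x100u

/* Value the SLW engine would be asked to restore into HSPRG0. */
static uint64_t sketch_tag_hsprg0(uint64_t paca_addr)
{
    return paca_addr | SKETCH_WINKLE_TAG;
}

/* Non-zero when the tag bit is set, i.e. HSPRG0 was restored after winkle. */
static int sketch_woke_from_winkle(uint64_t hsprg0)
{
    return (hsprg0 & SKETCH_WINKLE_TAG) != 0;
}

/* Strip the tag to recover the original paca pointer value. */
static uint64_t sketch_paca_from_hsprg0(uint64_t hsprg0)
{
    return hsprg0 & ~SKETCH_WINKLE_TAG;
}

/* Initial core state: one bit set per thread, lock bit clear. */
static uint32_t sketch_core_state_init(unsigned int threads_per_core)
{
    return ((1u << threads_per_core) - 1u) & SKETCH_THREAD_BITS;
}

/* A thread entering a deep idle state clears its own bit. */
static uint32_t sketch_thread_enter_idle(uint32_t state, unsigned int thread)
{
    return state & ~(1u << thread);
}

/* Non-zero when no thread bit is left, i.e. the caller was the last thread. */
static int sketch_core_fully_idle(uint32_t state)
{
    return (state & SKETCH_THREAD_BITS) == 0;
}
```

The sketch leaves out the locking itself; it only shows how one word can carry both the per-thread bits and a separate lock bit.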

[PATCH v5 1/3] powerpc: Fix cpu_online_cores_map to return only online threads mask

2015-04-14 Thread Shreyas B. Prabhu
Currently, cpu_online_cores_map returns a mask, which for every core with
at least one online thread, has the bit for thread 0 of the core set to 1,
and the bits for all other threads of the core set to 0. But thread 0 of
the core itself may not always be online. In such cases, if the returned
mask is used for IPI, then it'll cause IPIs to be skipped on cores where
the first thread is offline, because the IPI code refuses to send IPIs to
offline threads.

Fix this by setting the bit of the first online thread in the core.
This is done by fixing this in the underlying function
cpu_thread_mask_to_cores.

The result has the property that for all cores with online threads, there
is one bit set in the returned map. And further, all bits that are set in
the returned map correspond to online threads.

Signed-off-by: Shreyas B. Prabhu 
Reviewed-by: Preeti U Murthy 
[ Changelog from Michael Ellerman  ]
Reviewed-by: Gautham R. Shenoy 
---
 arch/powerpc/include/asm/cputhreads.h | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index 4c8ad59..1076d3f 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -31,9 +31,9 @@ extern cpumask_t threads_core_mask;
 /* cpu_thread_mask_to_cores - Return a cpumask of one per cores
  *hit by the argument
  *
- * @threads:   a cpumask of threads
+ * @threads:   a cpumask of online threads
  *
- * This function returns a cpumask which will have one "cpu" (or thread)
+ * This function returns a cpumask which will have one online cpu's
  * bit set for each core that has at least one thread set in the argument.
  *
  * This can typically be used for things like IPI for tlb invalidations
@@ -42,13 +42,16 @@ extern cpumask_t threads_core_mask;
 static inline cpumask_t cpu_thread_mask_to_cores(const struct cpumask *threads)
 {
cpumask_t   tmp, res;
-   int i;
+   int i, cpu;
 
cpumask_clear(&res);
for (i = 0; i < NR_CPUS; i += threads_per_core) {
cpumask_shift_left(&tmp, &threads_core_mask, i);
-   if (cpumask_intersects(threads, &tmp))
-   cpumask_set_cpu(i, &res);
+   if (cpumask_intersects(threads, &tmp)) {
+   cpu = cpumask_next_and(-1, &tmp, cpu_online_mask);
+   if (cpu < nr_cpu_ids)
+   cpumask_set_cpu(cpu, &res);
+   }
}
return res;
 }
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
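
A minimal user-space sketch of the selection rule this changelog describes, assuming plain 64-bit masks in place of the kernel cpumask API. The `sketch_*` name and the fixed 64-cpu limit are hypothetical. `threads_per_core` is assumed to be non-zero and to divide 64.

```c
#include <stdint.h>

/* Hypothetical stand-in for NR_CPUS: one bit per cpu in a uint64_t. */
#define SKETCH_NR_CPUS 64

/*
 * Return a mask with at most one bit per core: for every core that has a
 * thread set in 'threads', set the bit of that core's first online thread.
 * A core with no online thread contributes no bit at all.
 */
static uint64_t sketch_thread_mask_to_cores(uint64_t threads, uint64_t online,
                                            unsigned int threads_per_core)
{
    uint64_t core_mask = (threads_per_core >= 64) ?
                         ~0ULL : ((1ULL << threads_per_core) - 1);
    uint64_t res = 0;
    unsigned int i;

    for (i = 0; i < SKETCH_NR_CPUS; i += threads_per_core) {
        uint64_t tmp = core_mask << i;      /* all threads of this core */

        if (threads & tmp) {
            uint64_t candidates = tmp & online;

            /* Isolate the lowest set bit: the first online thread. */
            if (candidates)
                res |= candidates & (~candidates + 1);
        }
    }
    return res;
}
```

With four threads per core and cpu 0 offline (online mask 0xFE), the sketch returns 0x12, where always picking thread 0 would have given 0x11. If every thread of core 0 is offline and the argument is 0xFF, the result is 0x10, with no bit for core 0.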


[PATCH v5 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior

2015-04-14 Thread Shreyas B. Prabhu
Fastsleep is one of the idle states that the cpuidle subsystem currently
uses on POWER8 machines. In this state the L2 cache is brought down to a
threshold voltage. Therefore, when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current POWER8 chips surrounding this fencing.

OPAL provides a workaround which precludes the possibility of hitting
this bug. But running with this workaround applied causes a checkstop
if any correctable error in the L2 cache directory is detected. Hence OPAL
also provides a way to undo the workaround.

In the existing implementation, the workaround is applied by the last
thread of the core entering fastsleep and undone by the first thread
waking up. But this has a performance cost. These OPAL calls account for
roughly 4000 cycles every time the core has to enter or wake up from
fastsleep.

This patch introduces a sysfs attribute (fastsleep_workaround_state)
to choose the behavior of this workaround.

By default, fastsleep_workaround_state = dynamic. In this case, the
workaround is applied/undone every time the core enters/exits fastsleep.

fastsleep_workaround_state = applyonce. In this case the workaround is
applied once on all the cores and never undone. This can be triggered by
echo applyonce > /sys/devices/system/cpu/fastsleep_workaround_state

For simplicity this attribute can be modified only once. That is, once
fastsleep_workaround_state is changed to applyonce, it cannot be reverted
to the default state.

Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/include/asm/opal-api.h|   7 ++
 arch/powerpc/include/asm/opal.h|   1 +
 arch/powerpc/platforms/powernv/idle.c  | 134 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 4 files changed, 143 insertions(+)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0321a90..a49e5fa 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -165,6 +165,13 @@
 #define OPAL_PM_WINKLE_ENABLED 0x0004
 #define OPAL_PM_SLEEP_ENABLED_ER1  0x0008 /* with workaround */
 
+/*
+ * OPAL_CONFIG_CPU_IDLE_STATE parameters
+ */
+#define OPAL_CONFIG_IDLE_FASTSLEEP 1
+#define OPAL_CONFIG_IDLE_UNDO  0
+#define OPAL_CONFIG_IDLE_APPLY 1
+
 #ifndef __ASSEMBLY__
 
 /* Other enums */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 042af1a..9a47813 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -186,6 +186,7 @@ int64_t opal_handle_hmi(void);
 int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
 int64_t opal_unregister_dump_region(uint32_t id);
 int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
+int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag);
 int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number);
 int64_t opal_ipmi_send(uint64_t interface, struct opal_ipmi_msg *msg,
uint64_t msg_len);
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 104235a..eac7211 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -136,6 +138,129 @@ u32 pnv_get_supported_cpuidle_states(void)
 }
 EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states);
 
+
+static void pnv_fastsleep_workaround_apply(void *info)
+
+{
+   int rc;
+   int *err = info;
+
+   rc = opal_config_cpu_idle_state(OPAL_CONFIG_IDLE_FASTSLEEP,
+   OPAL_CONFIG_IDLE_APPLY);
+   if (rc)
+   *err = 1;
+}
+
+/*
+ * Used to store fastsleep workaround state
+ * 0 - Workaround applied/undone at fastsleep entry/exit path (Default)
+ * 1 - Workaround applied once, never undone.
+ */
+static u8 fastsleep_workaround_state;
+
+static const char * const fastsleep_workaround_avail_states[] = {
+   "dynamic", "applyonce"
+};
+
+/*
+ * fastsleep_workaround_avail_states values
+ */
+enum {
+   WORKAROUND_DYNAMIC,
+   WORKAROUND_APPLYONCE
+};
+static ssize_t show_fastsleep_workaround_state(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   char *s = buf;
+
+   if (fastsleep_workaround_state == 0) {
+   s += sprintf(s, "[%s] ",
+   fastsleep_workaround_avail_states[WORKAROUND_DYNAMIC]);
+   s += sprintf(s, "%s\n",
+   fastsleep_workaround_avail_states[WORKAROUND_APPLYONCE]);
+   } else {
+   s += sprintf(s, "%s ",
+   fastsleep_workaround_avail_states[WORKAROUND_DYNAMIC]);
+   s += sprintf(s, "[%s]\n",
+   
fastsleep_workaround_avail_s
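
The hunk above is cut off in this archive, so the store handler is not visible. As a rough illustration of the behavior the changelog describes, here is a user-space sketch of a show/store pair: the show side brackets the active state, and the store side acts as a one-way latch. The `sketch_*` names, the `-1` error return and the newline handling are assumptions, and the OPAL call that actually applies the workaround on every core is left out.

```c
#include <stdio.h>
#include <string.h>

enum { SKETCH_DYNAMIC, SKETCH_APPLYONCE, SKETCH_NR_STATES };

static const char * const sketch_names[SKETCH_NR_STATES] = {
    "dynamic", "applyonce"
};

/* Write all states into buf, bracketing the active one. Returns length or -1. */
static int sketch_show(char *buf, size_t len, unsigned int active)
{
    size_t off = 0;
    unsigned int i;

    for (i = 0; i < SKETCH_NR_STATES; i++) {
        const char *sep = (i + 1 < SKETCH_NR_STATES) ? " " : "\n";
        int n;

        if (i == active)
            n = snprintf(buf + off, len - off, "[%s]%s", sketch_names[i], sep);
        else
            n = snprintf(buf + off, len - off, "%s%s", sketch_names[i], sep);
        if (n < 0 || (size_t)n >= len - off)
            return -1;              /* output error or buffer too small */
        off += (size_t)n;
    }
    return (int)off;
}

/* Compare input with a keyword, tolerating one trailing newline. */
static int sketch_input_is(const char *in, const char *want)
{
    size_t n = strlen(want);

    if (strncmp(in, want, n) != 0)
        return 0;
    return in[n] == '\0' || (in[n] == '\n' && in[n + 1] == '\0');
}

/* One-way latch: dynamic -> applyonce is allowed, the reverse is rejected. */
static int sketch_store(unsigned int *state, const char *in)
{
    if (sketch_input_is(in, "applyonce")) {
        *state = SKETCH_APPLYONCE;
        return 0;
    }
    if (sketch_input_is(in, "dynamic") && *state == SKETCH_DYNAMIC)
        return 0;                   /* already in the default state */
    return -1;
}
```

Writing "applyonce" a second time is treated as a harmless no-op here; whether the real handler does the same is not visible in this archive.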

[PATCH] kvm: powerpc: Fix ppc64_defconfig + PPC_POWERNV=n build error

2015-04-16 Thread Shreyas B. Prabhu
The kvm_no_guest function calls power7_wakeup_loss to put the thread into
the deepest supported idle state. power7_wakeup_loss is defined in
arch/powerpc/kernel/idle_power7.S, which is compiled only when PPC_P7_NAP=y,
and PPC_P7_NAP is selected when PPC_POWERNV=y.
Hence, with PPC_POWERNV=n and KVM_BOOK3S_64_HV=y, the build fails with the
following error:

arch/powerpc/kvm/built-in.o: In function `kvm_no_guest':
arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x42c): undefined reference to `power7_wakeup_loss'

Fix this by adding PPC_POWERNV as a dependency for KVM_BOOK3S_64_HV.

Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/kvm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 11850f3..b3b3d9f 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -75,7 +75,7 @@ config KVM_BOOK3S_64
 
 config KVM_BOOK3S_64_HV
tristate "KVM support for POWER7 and PPC970 using hypervisor mode in host"
-   depends on KVM_BOOK3S_64
+   depends on KVM_BOOK3S_64 && PPC_POWERNV
select KVM_BOOK3S_HV_POSSIBLE
select MMU_NOTIFIER
select CMA
-- 
1.9.3



Re: [v3, 1/3] powerpc: Fix cpu_online_cores_map to return only online threads mask

2015-03-30 Thread Shreyas B Prabhu


On Monday 30 March 2015 03:06 PM, Michael Ellerman wrote:
> On Sun, 2015-22-03 at 04:42:57 UTC, "Shreyas B. Prabhu" wrote:
>> Currently, cpu_online_cores_map returns a mask, which for every core
>> that has atleast one online thread, has the first-cpu-of-that-core's bit
>> set. 
> 
>   ... which for every core with at least one online thread, has the bit for
>   thread 0 of the core set to 1, and the bits for all other threads of the core
>   set to 0.
> 
> Maybe that's clearer?
> 
>> But the first cpu itself may not be online always. In such cases, if
>^
>  of the core
> 
>> the returned mask is used for IPI, then it'll cause IPIs to be skipped
>> on cores where the first thread is offline.
> 
>   .. because the IPI code refuses to send IPIs to offline threads, right?

Yes.
> 
>> Fix this by setting first-online-cpu-of-the-core's bit in the mask.
> 
>   .. by setting the bit of the first online thread in the core.
> 
>> This is done by fixing this in the underlying function
>> cpu_thread_mask_to_cores.
> 
> 
> The result has the property that for all cores with online threads, there is
> one bit set in the returned map. And further, all bits that are set in the
> returned map correspond to online threads.
> 
> 
>> Signed-off-by: Shreyas B. Prabhu 
>> ---
>> This patch is new in v3
>>
>> In an example scenario where all the threads of 1st core are offline
>> and argument to cpu_thread_mask_to_cores is cpu_possible_mask,
>> with this implementation, return value will not have any bit
>> corresponding to 1st core set. I think that should be okay. Any thoughts?
> 
> Looking at linux-next:
> 
>   $ git grep cpu_thread_mask_to_cores
>   arch/powerpc/include/asm/cputhreads.h:/* cpu_thread_mask_to_cores - Return a cpumask of one per cores
>   arch/powerpc/include/asm/cputhreads.h:static inline cpumask_t cpu_thread_mask_to_cores(const struct cpumask *threads)
>   arch/powerpc/include/asm/cputhreads.h:  return cpu_thread_mask_to_cores(cpu_online_mask);
>   $ git grep cpu_online_cores_map
>   arch/powerpc/include/asm/cputhreads.h:static inline cpumask_t cpu_online_cores_map(void)
> 
> ie. There are no users.
> 
> So yeah I think we can change the semantics of this, and the semantics you
> describe make sense.
> 
> If you agree with my changelog comments I'm happy to fix that up and merge
> this, or you can send a v4 if you like.
> 

I'll fix the changelog in v4.
> cheers
> 



Re: [v3, 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior

2015-03-30 Thread Shreyas B Prabhu


On Monday 30 March 2015 03:51 PM, Michael Ellerman wrote:
> On Sun, 2015-22-03 at 04:42:59 UTC, "Shreyas B. Prabhu" wrote:
>> Fastsleep is one of the idle state which cpuidle subsystem currently
>> uses on power8 machines. In this state L2 cache is brought down to a
>> threshold voltage. Therefore when the core is in fastsleep, the
>> communication between L2 and L3 needs to be fenced. But there is a bug
>> in the current power8 chips surrounding this fencing.
>>
>> OPAL provides a workaround which precludes the possibility of hitting
>> this bug. But running with this workaround applied causes checkstop
>> if any correctable error in L2 cache directory is detected. Hence OPAL
>> also provides a way to undo the workaround.
>>
>> In the existing implementation, workaround is applied by the last thread
>> of the core entering fastsleep and undone by the first thread waking up.
>> But this has a performance cost. These OPAL calls account for roughly
>> 4000 cycles everytime the core has to enter or wakeup from fastsleep.
>>
>> This patch introduces a sysfs attribute (fastsleep_workaround_state)
>> to choose the behavior of this workaround.
>>
>> By default, fastsleep_workaround_state = 0. In this case, workaround
>> is applied/undone everytime the core enters/exits fastsleep.
>>
>> fastsleep_workaround_state = 1. In this case the workaround is applied
>> once on all the cores and never undone. This can be triggered by
>> echo 1 > /sys/devices/system/cpu/fastsleep_workaround_state
>>
>> For simplicity this attribute can be modified only once. Implying, once
>> fastsleep_workaround_state is changed to 1, it cannot be reverted to
>> the default state.
> 
> This sounds good, although the name is a bit vague.
> 
> Just calling it "state" doesn't make it clear what 0 and 1 mean.
> I think better would be "fastsleep_workaround_active" ?
> 
> Though even that is a bit wrong, because 0 doesn't really mean it's not active,
> it means it's not *permanently* active.
> 
> So another option would be to make it a string attribute, with the initial
> state being eg. "dynamic" and then maybe "applied" for the applied state?
> 
How about "fastsleep_workaround_permanent", with default value = 0? The
user can make the workaround permanent by echoing 1 to it.

I'll post a v4 with the suggested changes.


Thanks,
Shreyas



[PATCH 1/3] tracing/mm: Don't trace kmem_cache_free on offline cpus

2015-04-28 Thread Shreyas B. Prabhu
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_kmem_cache_free can be called on an offline cpu in
this scenario caught by LOCKDEP:

===
[ INFO: suspicious RCU usage. ]
4.1.0-rc1+ #9 Not tainted
---
include/trace/events/kmem.h:148 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
Call Trace:
[c01fed2f78f0] [c09dee8c] .dump_stack+0x98/0xd4 (unreliable)
[c01fed2f7970] [c0128d88] .lockdep_rcu_suspicious+0x108/0x170
[c01fed2f7a00] [c026f924] .kmem_cache_free+0x344/0x4b0
[c01fed2f7ab0] [c00bd1cc] .__mmdrop+0x4c/0x160
[c01fed2f7b40] [c01068e0] .idle_task_exit+0xf0/0x100
[c01fed2f7bc0] [c0066948] .pnv_smp_cpu_kill_self+0x58/0x2c0
[c01fed2f7ca0] [c003ce34] .cpu_die+0x34/0x50
[c01fed2f7d10] [c00176d0] .arch_cpu_idle_dead+0x20/0x40
[c01fed2f7d80] [c011f9a8] .cpu_startup_entry+0x708/0x7a0
[c01fed2f7ec0] [c003cb6c] .start_secondary+0x36c/0x3a0
[c01fed2f7f90] [c0008b6c] start_secondary_prolog+0x10/0x14

Fix this by converting the kmem_cache_free tracepoint to
DEFINE_EVENT_CONDITION, where the condition is cpu_online(smp_processor_id()).

Signed-off-by: Shreyas B. Prabhu 
Reported-by: Aneesh Kumar K.V 
---
 include/trace/events/kmem.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 81ea598..dd9e612 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -140,11 +140,13 @@ DEFINE_EVENT(kmem_free, kfree,
TP_ARGS(call_site, ptr)
 );
 
-DEFINE_EVENT(kmem_free, kmem_cache_free,
+DEFINE_EVENT_CONDITION(kmem_free, kmem_cache_free,
 
TP_PROTO(unsigned long call_site, const void *ptr),
 
-   TP_ARGS(call_site, ptr)
+   TP_ARGS(call_site, ptr),
+
+   TP_CONDITION(cpu_online(smp_processor_id()))
 );
 
 TRACE_EVENT(mm_page_free,
-- 
1.9.3
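
To make the effect of the condition concrete, here is a small user-space sketch of a conditional tracepoint. It is not the kernel tracing infrastructure: the `sketch_*` names, the per-cpu flag array and the counter are all hypothetical stand-ins. The point it illustrates is that the condition is checked before any event code runs, so nothing RCU-protected is reached from an offline cpu.

```c
#include <stdbool.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical stand-in for cpu_online(): one flag per cpu. */
static bool sketch_cpu_online[SKETCH_NR_CPUS] = { true, true, true, true };

/* Counts how many events were actually emitted. */
static unsigned int sketch_events_emitted;

/* Stand-in for the tracepoint body (the part that relies on RCU). */
static void sketch_emit_kmem_cache_free(unsigned long call_site, const void *ptr)
{
    (void)call_site;
    (void)ptr;
    sketch_events_emitted++;
}

/* Evaluate the condition first; skip the event body entirely if it is false. */
#define SKETCH_TRACE_IF(cond, emit_call) \
    do {                                 \
        if (cond) {                      \
            emit_call;                   \
        }                                \
    } while (0)

/* 'cpu' plays the role of smp_processor_id() in this sketch. */
static void sketch_trace_kmem_cache_free(unsigned int cpu,
                                         unsigned long call_site,
                                         const void *ptr)
{
    SKETCH_TRACE_IF(sketch_cpu_online[cpu],
                    sketch_emit_kmem_cache_free(call_site, ptr));
}
```

In the sketch, a call made on a cpu whose flag has been cleared leaves the event counter unchanged.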



[PATCH 3/3] tracing/mm: Don't trace mm_page_pcpu_drain on offline cpus

2015-04-28 Thread Shreyas B. Prabhu
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_mm_page_pcpu_drain can be called on an offline cpu
in this scenario caught by LOCKDEP:

 ===
 [ INFO: suspicious RCU usage. ]
 4.1.0-rc1+ #9 Not tainted
 ---
 include/trace/events/kmem.h:265 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
 1 lock held by swapper/5/0:
  #0:  (&(&zone->lock)->rlock){..-...}, at: [] .free_pcppages_bulk+0x70/0x920

stack backtrace:
 CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.1.0-rc1+ #9
 Call Trace:
 [c01fed2e7720] [c09dee8c] .dump_stack+0x98/0xd4 (unreliable)
 [c01fed2e77a0] [c0128d88] .lockdep_rcu_suspicious+0x108/0x170
 [c01fed2e7830] [c020794c] .free_pcppages_bulk+0x60c/0x920
 [c01fed2e7980] [c0208188] .free_hot_cold_page+0x208/0x280
 [c01fed2e7a30] [c004d000] .destroy_context+0x90/0xd0
 [c01fed2e7ab0] [c00bd1d8] .__mmdrop+0x58/0x160
 [c01fed2e7b40] [c01068e0] .idle_task_exit+0xf0/0x100
 [c01fed2e7bc0] [c0066948] .pnv_smp_cpu_kill_self+0x58/0x2c0
 [c01fed2e7ca0] [c003ce34] .cpu_die+0x34/0x50
 [c01fed2e7d10] [c00176d0] .arch_cpu_idle_dead+0x20/0x40
 [c01fed2e7d80] [c011f9a8] .cpu_startup_entry+0x708/0x7a0
 [c01fed2e7ec0] [c003cb6c] .start_secondary+0x36c/0x3a0
 [c01fed2e7f90] [c0008b6c] start_secondary_prolog+0x10/0x14

Fix this by converting the mm_page_pcpu_drain tracepoint to
TRACE_EVENT_CONDITION, where the condition is cpu_online(smp_processor_id()).

Signed-off-by: Shreyas B. Prabhu 
---
 include/trace/events/kmem.h | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 4abda92..6cd975f 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -257,12 +257,26 @@ DEFINE_EVENT(mm_page, mm_page_alloc_zone_locked,
TP_ARGS(page, order, migratetype)
 );
 
-DEFINE_EVENT_PRINT(mm_page, mm_page_pcpu_drain,
+TRACE_EVENT_CONDITION(mm_page_pcpu_drain,
 
TP_PROTO(struct page *page, unsigned int order, int migratetype),
 
TP_ARGS(page, order, migratetype),
 
+   TP_CONDITION(cpu_online(smp_processor_id())),
+
+   TP_STRUCT__entry(
+   __field(unsigned long,  pfn )
+   __field(unsigned int,   order   )
+   __field(int,migratetype )
+   ),
+
+   TP_fast_assign(
+   __entry->pfn= page ? page_to_pfn(page) : -1UL;
+   __entry->order  = order;
+   __entry->migratetype= migratetype;
+   ),
+
TP_printk("page=%p pfn=%lu order=%d migratetype=%d",
pfn_to_page(__entry->pfn), __entry->pfn,
__entry->order, __entry->migratetype)
-- 
1.9.3



[PATCH 2/3] tracing/mm: Don't trace mm_page_free on offline cpus

2015-04-28 Thread Shreyas B. Prabhu
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_mm_page_free can be called on an offline cpu in
this scenario caught by LOCKDEP:

 ===
 [ INFO: suspicious RCU usage. ]
 4.1.0-rc1+ #9 Not tainted
 ---
 include/trace/events/kmem.h:170 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
 no locks held by swapper/1/0.

stack backtrace:
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
 Call Trace:
 [c01fed2f7790] [c09dee8c] .dump_stack+0x98/0xd4 (unreliable)
 [c01fed2f7810] [c0128d88] .lockdep_rcu_suspicious+0x108/0x170
 [c01fed2f78a0] [c0203bc4] .free_pages_prepare+0x494/0x680
 [c01fed2f7980] [c0207fd0] .free_hot_cold_page+0x50/0x280
 [c01fed2f7a30] [c004d000] .destroy_context+0x90/0xd0
 [c01fed2f7ab0] [c00bd1d8] .__mmdrop+0x58/0x160
 [c01fed2f7b40] [c01068e0] .idle_task_exit+0xf0/0x100
 [c01fed2f7bc0] [c0066948] .pnv_smp_cpu_kill_self+0x58/0x2c0
 [c01fed2f7ca0] [c003ce34] .cpu_die+0x34/0x50
 [c01fed2f7d10] [c00176d0] .arch_cpu_idle_dead+0x20/0x40
 [c01fed2f7d80] [c011f9a8] .cpu_startup_entry+0x708/0x7a0
 [c01fed2f7ec0] [c003cb6c] .start_secondary+0x36c/0x3a0
 [c01fed2f7f90] [c0008b6c] start_secondary_prolog+0x10/0x14

Fix this by converting the mm_page_free tracepoint to TRACE_EVENT_CONDITION,
where the condition is cpu_online(smp_processor_id()).

Signed-off-by: Shreyas B. Prabhu 
---
 include/trace/events/kmem.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index dd9e612..4abda92 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -149,12 +149,14 @@ DEFINE_EVENT_CONDITION(kmem_free, kmem_cache_free,
TP_CONDITION(cpu_online(smp_processor_id()))
 );
 
-TRACE_EVENT(mm_page_free,
+TRACE_EVENT_CONDITION(mm_page_free,
 
TP_PROTO(struct page *page, unsigned int order),
 
TP_ARGS(page, order),
 
+   TP_CONDITION(cpu_online(smp_processor_id())),
+
TP_STRUCT__entry(
__field(unsigned long,  pfn )
__field(unsigned int,   order   )
-- 
1.9.3



Re: [PATCH 3/3] tracing/mm: Don't trace mm_page_pcpu_drain on offline cpus

2015-04-29 Thread Shreyas B Prabhu

>> -DEFINE_EVENT_PRINT(mm_page, mm_page_pcpu_drain,
>> +TRACE_EVENT_CONDITION(mm_page_pcpu_drain,
>>
>> TP_PROTO(struct page *page, unsigned int order, int migratetype),
>>
>> TP_ARGS(page, order, migratetype),
>>
>> +   TP_CONDITION(cpu_online(smp_processor_id())),
>> +
>> +   TP_STRUCT__entry(
>> +   __field(unsigned long,  pfn )
>> +   __field(unsigned int,   order   )
>> +   __field(int,migratetype )
>> +   ),
>> +
>> +   TP_fast_assign(
>> +   __entry->pfn= page ? page_to_pfn(page) : -1UL;
>> +   __entry->order  = order;
>> +   __entry->migratetype= migratetype;
>> +   ),
>> +
> 
> What was the need to do the above changes besides adding TP_CONDITION ?
> 

IIUC there is no existing macro that can both add a condition and
override the printk format, hence the fallback to TRACE_EVENT_CONDITION.

Thanks,
Shreyas



Re: [PATCH 3/3] tracing/mm: Don't trace mm_page_pcpu_drain on offline cpus

2015-04-29 Thread Shreyas B Prabhu


On Wednesday 29 April 2015 08:48 PM, Steven Rostedt wrote:
> On Wed, 29 Apr 2015 20:19:28 +0530
> Shreyas B Prabhu  wrote:
> 
>> IIUC there is no existing macro which can both add a condition and
>> override printk format, hence the fall back to TRACE_EVENT_CONDITION.
> 
> Hmm, want me to send you a patch that changes that?
> 
I am not sure if it's worth the effort now. It doesn't look like any
other tracepoint apart from the above use case will benefit from it.
Only smbus_write and smbus_reply seem to come close. But even they need
a separate TP_fast_assign.

Thanks,
Shreyas



Re: [PATCH 3/3] tracing/mm: Don't trace mm_page_pcpu_drain on offline cpus

2015-04-29 Thread Shreyas B Prabhu


On Wednesday 29 April 2015 10:38 PM, Steven Rostedt wrote:
>> I am not sure if its worth the effort now. It doesn't look like any
>> other trace point apart from the above use case will benefit from it.
>> Only smbus_write and smbus_reply seem to come close. But even they need
>> separate TP_fast_assign.
> 
> It shouldn't be a problem to implement. But I'm currently cleaning up
> those files, and any changes will cause nasty conflicts.
> 
> Lets do this. Push the current changes as is, and when I get around to
> adding a DEFINE_EVENT_PRINT_CONDITION(), we can modify that code to use
> it.
> 
Okay, sure.

Thanks,
Shreyas



Re: [PATCH 3/3] tracing/mm: Don't trace mm_page_pcpu_drain on offline cpus

2015-04-29 Thread Shreyas B Prabhu


On Thursday 30 April 2015 10:06 AM, Preeti Murthy wrote:
> On Wed, Apr 29, 2015 at 10:49 PM, Shreyas B Prabhu
>  wrote:
>>
>>
>> On Wednesday 29 April 2015 10:38 PM, Steven Rostedt wrote:
>>>> I am not sure if its worth the effort now. It doesn't look like any
>>>> other trace point apart from the above use case will benefit from it.
>>>> Only smbus_write and smbus_reply seem to come close. But even they need
>>>> separate TP_fast_assign.
>>>
>>> It shouldn't be a problem to implement. But I'm currently cleaning up
>>> those files, and any changes will cause nasty conflicts.
>>>
>>> Lets do this. Push the current changes as is, and when I get around to
>>> adding a DEFINE_EVENT_PRINT_CONDITION(), we can modify that code to use
>>> it.
>>>
>> Okay, sure.
> 
> Looks good then.
> 
> Reviewed-by: Preeti U Murthy 

Thanks a lot!



[PATCH v6 1/3] powerpc: Fix cpu_online_cores_map to return only online threads mask

2015-04-19 Thread Shreyas B. Prabhu
Currently, cpu_online_cores_map returns a mask, which for every core with
at least one online thread, has the bit for thread 0 of the core set to 1,
and the bits for all other threads of the core set to 0. But thread 0 of
the core itself may not always be online. In such cases, if the returned
mask is used for IPI, then it'll cause IPIs to be skipped on cores where
the first thread is offline, because the IPI code refuses to send IPIs to
offline threads.

Fix this by setting the bit of the first online thread in the core.
This is done by fixing this in the underlying function
cpu_thread_mask_to_cores.

The result has the property that for all cores with online threads, there
is one bit set in the returned map. And further, all bits that are set in
the returned map correspond to online threads.

Signed-off-by: Shreyas B. Prabhu 
Reviewed-by: Preeti U Murthy 
[ Changelog from Michael Ellerman  ]
Reviewed-by: Gautham R. Shenoy 
---
 arch/powerpc/include/asm/cputhreads.h | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index 4c8ad59..1076d3f 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -31,9 +31,9 @@ extern cpumask_t threads_core_mask;
 /* cpu_thread_mask_to_cores - Return a cpumask of one per cores
  *hit by the argument
  *
- * @threads:   a cpumask of threads
+ * @threads:   a cpumask of online threads
  *
- * This function returns a cpumask which will have one "cpu" (or thread)
+ * This function returns a cpumask which will have one online cpu's
  * bit set for each core that has at least one thread set in the argument.
  *
  * This can typically be used for things like IPI for tlb invalidations
@@ -42,13 +42,16 @@ extern cpumask_t threads_core_mask;
 static inline cpumask_t cpu_thread_mask_to_cores(const struct cpumask *threads)
 {
cpumask_t   tmp, res;
-   int i;
+   int i, cpu;
 
cpumask_clear(&res);
for (i = 0; i < NR_CPUS; i += threads_per_core) {
cpumask_shift_left(&tmp, &threads_core_mask, i);
-   if (cpumask_intersects(threads, &tmp))
-   cpumask_set_cpu(i, &res);
+   if (cpumask_intersects(threads, &tmp)) {
+   cpu = cpumask_next_and(-1, &tmp, cpu_online_mask);
+   if (cpu < nr_cpu_ids)
+   cpumask_set_cpu(cpu, &res);
+   }
}
return res;
 }
-- 
1.9.3



[PATCH v6 0/3] powerpc: powernv: Fastsleep workaround behavior

2015-04-19 Thread Shreyas B. Prabhu
Fastsleep is one of the idle states that the cpuidle subsystem currently
uses on POWER8 machines. In this state the L2 cache is brought down to a
threshold voltage. Therefore, when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current POWER8 chips surrounding this fencing.

OPAL provides a workaround which precludes the possibility of hitting
this bug. But running with this workaround applied causes a checkstop
if any correctable error in the L2 cache directory is detected. Hence OPAL
also provides a way to undo the workaround.

In the existing implementation, the workaround is applied by the last
thread of the core entering fastsleep and undone by the first thread
waking up. But this has a performance cost. These OPAL calls account for
roughly 4000 cycles every time the core has to enter or wake up from
fastsleep.

This patchset introduces a sysfs attribute (fastsleep_workaround_applyonce)
to choose the behavior of this workaround.

Patch 1/3 fixes cpu_online_cores_map, which is used by patch 3/3.
Patch 2/3 is a cleanup patch. It moves all cpuidle-related code into
a new file.
Patch 3/3 introduces the sysfs attribute to control the fastsleep
workaround behavior.


Changes in v6:
- Changed the sysfs parameter to take 0/1 as input

Changes in v5:
- Fix potential race with hotplug with get_online_cpus/put_online_cpus

Changes in v4:
- Handling patch_instruction and OPAL call errors
- Sysfs attribute takes string ("dynamic" vs "applyonce") as input
- Improved changelogs

Changes in v3:
- Kernel parameter changed to sysfs attribute

Changes in v2:
- Changed commit message to accurately describe the downside
  of running workaround always applied.

Shreyas B. Prabhu (3):
  powerpc: Fix cpu_online_cores_map to return only online threads mask
  powerpc/powernv: Move cpuidle related code from setup.c to new file
  powerpc/powernv: Introduce sysfs control for fastsleep workaround
behavior

 arch/powerpc/include/asm/cputhreads.h  |  13 +-
 arch/powerpc/include/asm/opal-api.h|   7 +
 arch/powerpc/include/asm/opal.h|   1 +
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/idle.c  | 323 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/setup.c | 171 -
 7 files changed, 341 insertions(+), 177 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/idle.c

-- 
1.9.3



[PATCH v6 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior

2015-04-19 Thread Shreyas B. Prabhu
Fastsleep is one of the idle states that the cpuidle subsystem currently
uses on POWER8 machines. In this state the L2 cache is brought down to a
threshold voltage. Therefore, when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current POWER8 chips surrounding this fencing.

OPAL provides a workaround which precludes the possibility of hitting
this bug. But running with this workaround applied causes a checkstop
if any correctable error in the L2 cache directory is detected. Hence
OPAL also provides a way to undo the workaround.

In the existing implementation, the workaround is applied by the last
thread of the core entering fastsleep and undone by the first thread
waking up. But this has a performance cost. These OPAL calls account for
roughly 4000 cycles every time the core has to enter or wake up from
fastsleep.

This patch introduces a sysfs attribute (fastsleep_workaround_applyonce)
to choose the behavior of this workaround.

By default, fastsleep_workaround_applyonce = 0. In this case, the
workaround is applied/undone every time the core enters/exits fastsleep.

When fastsleep_workaround_applyonce = 1, the workaround is applied once
on all the cores and never undone. This can be triggered by:
echo 1 > /sys/devices/system/cpu/fastsleep_workaround_applyonce

For simplicity, this attribute can be modified only once. That is, once
fastsleep_workaround_applyonce is changed to 1, it cannot be reverted
to the default state.
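
The one-way semantics described above can be sketched in userspace C. This
is only a model under stated assumptions, not the kernel handler:
model_store_applyonce() and applyonce_state are hypothetical names, and
strtoul() stands in for the kernel's kstrtou8(), so edge cases such as
leading whitespace may be treated differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the attribute: 0 = default, 1 = applied once. */
static unsigned char applyonce_state;

/*
 * Model of the store path: only the value 1 is accepted, and once the
 * state is 1 it never goes back. Returns count on success, -EINVAL
 * otherwise, mirroring the usual sysfs store convention.
 */
static long model_store_applyonce(const char *buf, size_t count)
{
	char *end;
	unsigned long val;

	assert(buf != NULL);

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno || end == buf)
		return -EINVAL;
	if (*end == '\n')		/* tolerate the newline added by echo */
		end++;
	if (*end != '\0' || val != 1)
		return -EINVAL;

	if (applyonce_state == 1)	/* already applied: nothing to do */
		return (long)count;

	/*
	 * The real handler would patch the fastsleep exit path and apply
	 * the workaround on every core at this point.
	 */
	applyonce_state = 1;
	return (long)count;
}
```

In this model, writing "0" is rejected both before and after the switch, and
writing "1" a second time succeeds without doing any further work.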

Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/include/asm/opal-api.h|   7 ++
 arch/powerpc/include/asm/opal.h|   1 +
 arch/powerpc/platforms/powernv/idle.c  | 101 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 4 files changed, 110 insertions(+)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0321a90..a49e5fa 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -165,6 +165,13 @@
 #define OPAL_PM_WINKLE_ENABLED 0x0004
 #define OPAL_PM_SLEEP_ENABLED_ER1  0x0008 /* with workaround */
 
+/*
+ * OPAL_CONFIG_CPU_IDLE_STATE parameters
+ */
+#define OPAL_CONFIG_IDLE_FASTSLEEP 1
+#define OPAL_CONFIG_IDLE_UNDO  0
+#define OPAL_CONFIG_IDLE_APPLY 1
+
 #ifndef __ASSEMBLY__
 
 /* Other enums */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 042af1a..9a47813 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -186,6 +186,7 @@ int64_t opal_handle_hmi(void);
 int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
 int64_t opal_unregister_dump_region(uint32_t id);
 int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
+int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag);
 int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number);
 int64_t opal_ipmi_send(uint64_t interface, struct opal_ipmi_msg *msg,
uint64_t msg_len);
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 104235a..f90cc86 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -136,6 +138,96 @@ u32 pnv_get_supported_cpuidle_states(void)
 }
 EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states);
 
+
+static void pnv_fastsleep_workaround_apply(void *info)
+
+{
+   int rc;
+   int *err = info;
+
+   rc = opal_config_cpu_idle_state(OPAL_CONFIG_IDLE_FASTSLEEP,
+   OPAL_CONFIG_IDLE_APPLY);
+   if (rc)
+   *err = 1;
+}
+
+/*
+ * Used to store fastsleep workaround state
+ * 0 - Workaround applied/undone at fastsleep entry/exit path (Default)
+ * 1 - Workaround applied once, never undone.
+ */
+static u8 fastsleep_workaround_applyonce;
+
+static ssize_t show_fastsleep_workaround_applyonce(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%u\n", fastsleep_workaround_applyonce);
+}
+
+static ssize_t store_fastsleep_workaround_applyonce(struct device *dev,
+   struct device_attribute *attr, const char *buf,
+   size_t count)
+{
+   cpumask_t primary_thread_mask;
+   int err;
+   u8 val;
+
+   if (kstrtou8(buf, 0, &val) || val != 1)
+   return -EINVAL;
+
+   if (fastsleep_workaround_applyonce == 1)
+   return count;
+
+   /*
+* fastsleep_workaround_applyonce = 1 implies
+* fastsleep workaround needs to be left in 'applied' state on all
+* the cores. Do this by-
+* 1. Patching out the call to 'undo' workaround in fastsleep exit path
+* 2. Sending ipi to all the cores which have atleast one onl

[PATCH v6 2/3] powerpc/powernv: Move cpuidle related code from setup.c to new file

2015-04-19 Thread Shreyas B. Prabhu
This is a cleanup patch; it doesn't change any functionality. It moves
all cpuidle-related code from setup.c to a new file.

Signed-off-by: Shreyas B. Prabhu 
Reviewed-by: Preeti U Murthy 
---
 arch/powerpc/platforms/powernv/Makefile |   2 +-
 arch/powerpc/platforms/powernv/idle.c   | 191 
 arch/powerpc/platforms/powernv/setup.c  | 171 
 3 files changed, 192 insertions(+), 172 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/idle.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 33e44f3..bee9235 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,4 +1,4 @@
-obj-y  += setup.o opal-wrappers.o opal.o opal-async.o
+obj-y  += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
new file mode 100644
index 000..104235a
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -0,0 +1,191 @@
+/*
+ * PowerNV cpuidle code
+ *
+ * Copyright 2015 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "powernv.h"
+#include "subcore.h"
+
+static u32 supported_cpuidle_states;
+
+int pnv_save_sprs_for_winkle(void)
+{
+   int cpu;
+   int rc;
+
+   /*
+* hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric accross
+* all cpus at boot. Get these reg values of current cpu and use the
+* same accross all cpus.
+*/
+   uint64_t lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1;
+   uint64_t hid0_val = mfspr(SPRN_HID0);
+   uint64_t hid1_val = mfspr(SPRN_HID1);
+   uint64_t hid4_val = mfspr(SPRN_HID4);
+   uint64_t hid5_val = mfspr(SPRN_HID5);
+   uint64_t hmeer_val = mfspr(SPRN_HMEER);
+
+   for_each_possible_cpu(cpu) {
+   uint64_t pir = get_hard_smp_processor_id(cpu);
+   uint64_t hsprg0_val = (uint64_t)&paca[cpu];
+
+   /*
+* HSPRG0 is used to store the cpu's pointer to paca. Hence last
+* 3 bits are guaranteed to be 0. Program slw to restore HSPRG0
+* with 63rd bit set, so that when a thread wakes up at 0x100 we
+* can use this bit to distinguish between fastsleep and
+* deep winkle.
+*/
+   hsprg0_val |= 1;
+
+   rc = opal_slw_set_reg(pir, SPRN_HSPRG0, hsprg0_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
+   if (rc != 0)
+   return rc;
+
+   /* HIDs are per core registers */
+   if (cpu_thread_in_core(cpu) == 0) {
+
+   rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val);
+   if (rc != 0)
+   return rc;
+   }
+   }
+
+   return 0;
+}
+
+static void pnv_alloc_idle_core_states(void)
+{
+   int i, j;
+   int nr_cores = cpu_nr_cores();
+   u32 *core_idle_state;
+
+   /*
+* core_idle_state - First 8 bits track the idle state of each thread
+* of the core. The 8th bit is the lock bit. Initially all thread bits
+* are set. They are cleared when the thread enters deep idle state
+* like sleep and winkle. Initially the lock bit is cleared.
+* The lock bit has 2 purposes
+* a. While the first thread is restoring core state, it prevents
+* other threads in the core from switching to process context.
+* b. While the last thread in the core is saving the core state, it
+* prev

[PATCH v4 0/3] powerpc: powernv: Fastsleep workaround behavior

2015-04-13 Thread Shreyas B. Prabhu
Fastsleep is one of the idle states that the cpuidle subsystem currently
uses on POWER8 machines. In this state the L2 cache is brought down to a
threshold voltage. Therefore, when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current POWER8 chips surrounding this fencing.

OPAL provides a workaround which precludes the possibility of hitting
this bug. But running with this workaround applied causes a checkstop
if any correctable error in the L2 cache directory is detected. Hence
OPAL also provides a way to undo the workaround.

In the existing implementation, the workaround is applied by the last
thread of the core entering fastsleep and undone by the first thread
waking up. But this has a performance cost. These OPAL calls account for
roughly 4000 cycles every time the core has to enter or wake up from
fastsleep.

This patchset introduces a sysfs attribute (fastsleep_workaround_state)
to choose the behavior of this workaround.

Patch 1/3 fixes cpu_online_cores_map, which is used by Patch 3/3.
Patch 2/3 is a cleanup patch. It moves all cpuidle-related code into
a new file.
Patch 3/3 introduces the sysfs attribute to control the fastsleep
workaround behavior.


Changes in v4:
- Handle patch_instruction and OPAL call errors
- Sysfs attribute takes a string ("dynamic" vs "applyonce") as input
- Improved changelogs

Changes in v3:
- Kernel parameter changed to sysfs attribute

Changes in v2:
- Changed commit message to accurately describe the downside
  of running with the workaround always applied.

Shreyas B. Prabhu (3):
  powerpc: Fix cpu_online_cores_map to return only online threads mask
  powerpc/powernv: Move cpuidle related code from setup.c to new file
  powerpc/powernv: Introduce sysfs control for fastsleep workaround
behavior

 arch/powerpc/include/asm/cputhreads.h  |  13 +-
 arch/powerpc/include/asm/opal-api.h|   7 +
 arch/powerpc/include/asm/opal.h|   1 +
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/idle.c  | 323 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/setup.c | 171 -
 7 files changed, 341 insertions(+), 177 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/idle.c

-- 
1.9.3



[PATCH v4 1/3] powerpc: Fix cpu_online_cores_map to return only online threads mask

2015-04-13 Thread Shreyas B. Prabhu
Currently, cpu_online_cores_map returns a mask, which for every core with
at least one online thread, has the bit for thread 0 of the core set to 1,
and the bits for all other threads of the core set to 0. But thread 0 of
the core itself may not always be online. In such cases, if the returned
mask is used for IPI, then it'll cause IPIs to be skipped on cores where
the first thread is offline, because the IPI code refuses to send IPIs to
offline threads.

Fix this by setting the bit of the first online thread in the core.
This is done by fixing this in the underlying function
cpu_thread_mask_to_cores.

The result has the property that for all cores with online threads, there
is one bit set in the returned map. And further, all bits that are set in
the returned map correspond to online threads.
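
The property above can be illustrated with a small userspace model. This is
a sketch under stated assumptions, not the kernel code: CPUs are modelled as
bits in a 64-bit word instead of a cpumask_t, each core owns
threads_per_core consecutive bits, and first_online_thread_per_core() is a
hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * For every core that has at least one thread set in 'threads', set the
 * bit of the lowest-numbered online thread of that core in the result.
 * Cores with no online thread contribute nothing.
 */
static uint64_t first_online_thread_per_core(uint64_t threads,
					     uint64_t online,
					     unsigned int threads_per_core)
{
	uint64_t res = 0;
	unsigned int base;

	assert(threads_per_core > 0 && threads_per_core < 64);

	for (base = 0; base < 64; base += threads_per_core) {
		/* Bits belonging to the core that starts at 'base'. */
		uint64_t core_mask = ((1ULL << threads_per_core) - 1) << base;
		uint64_t online_in_core = core_mask & online;

		if ((threads & core_mask) && online_in_core)
			/* Isolate the lowest set bit: first online thread. */
			res |= online_in_core & (~online_in_core + 1);
	}
	return res;
}
```

With 8 threads per core and thread 0 of core 0 offline (mask 0xFFFE for both
arguments), the model returns 0x102: thread 1 of core 0 and thread 0 of
core 1. Always picking thread 0 would instead select an offline thread on
core 0.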

Signed-off-by: Shreyas B. Prabhu 
Reviewed-by: Preeti U Murthy 
[ Changelog from Michael Ellerman  ]
---
 arch/powerpc/include/asm/cputhreads.h | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index 4c8ad59..1076d3f 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -31,9 +31,9 @@ extern cpumask_t threads_core_mask;
 /* cpu_thread_mask_to_cores - Return a cpumask of one per cores
  *hit by the argument
  *
- * @threads:   a cpumask of threads
+ * @threads:   a cpumask of online threads
  *
- * This function returns a cpumask which will have one "cpu" (or thread)
+ * This function returns a cpumask which will have one online cpu's
  * bit set for each core that has at least one thread set in the argument.
  *
  * This can typically be used for things like IPI for tlb invalidations
@@ -42,13 +42,16 @@ extern cpumask_t threads_core_mask;
 static inline cpumask_t cpu_thread_mask_to_cores(const struct cpumask *threads)
 {
cpumask_t   tmp, res;
-   int i;
+   int i, cpu;
 
cpumask_clear(&res);
for (i = 0; i < NR_CPUS; i += threads_per_core) {
cpumask_shift_left(&tmp, &threads_core_mask, i);
-   if (cpumask_intersects(threads, &tmp))
-   cpumask_set_cpu(i, &res);
+   if (cpumask_intersects(threads, &tmp)) {
+   cpu = cpumask_next_and(-1, &tmp, cpu_online_mask);
+   if (cpu < nr_cpu_ids)
+   cpumask_set_cpu(cpu, &res);
+   }
}
return res;
 }
-- 
1.9.3



[PATCH v4 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior

2015-04-13 Thread Shreyas B. Prabhu
Fastsleep is one of the idle states that the cpuidle subsystem currently
uses on POWER8 machines. In this state the L2 cache is brought down to a
threshold voltage. Therefore, when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current POWER8 chips surrounding this fencing.

OPAL provides a workaround which precludes the possibility of hitting
this bug. But running with this workaround applied causes a checkstop
if any correctable error in the L2 cache directory is detected. Hence
OPAL also provides a way to undo the workaround.

In the existing implementation, the workaround is applied by the last
thread of the core entering fastsleep and undone by the first thread
waking up. But this has a performance cost. These OPAL calls account for
roughly 4000 cycles every time the core has to enter or wake up from
fastsleep.

This patch introduces a sysfs attribute (fastsleep_workaround_state)
to choose the behavior of this workaround.

By default, fastsleep_workaround_state = dynamic. In this case, the
workaround is applied/undone every time the core enters/exits fastsleep.

When fastsleep_workaround_state = applyonce, the workaround is applied
once on all the cores and never undone. This can be triggered by:
echo applyonce > /sys/devices/system/cpu/fastsleep_workaround_state

For simplicity, this attribute can be modified only once. That is, once
fastsleep_workaround_state is changed to applyonce, it cannot be reverted
to the default state.
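
As a rough illustration of how reading this attribute might look, the sketch
below formats the two states with the active one in square brackets. It is a
userspace model only: model_show_state() and avail_states are hypothetical
names, and the bracket layout is an assumption based on the visible part of
the show routine in this patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical copy of the two state names used by the attribute. */
static const char *const avail_states[] = { "dynamic", "applyonce" };

/*
 * Write both state names into buf, wrapping the active one in brackets.
 * state 0 = dynamic (default), state 1 = applyonce.
 * Returns the number of characters snprintf() reports.
 */
static int model_show_state(unsigned int state, char *buf, size_t len)
{
	assert(buf != NULL && state <= 1);

	if (state == 0)
		return snprintf(buf, len, "[%s] %s\n",
				avail_states[0], avail_states[1]);

	return snprintf(buf, len, "%s [%s]\n",
			avail_states[0], avail_states[1]);
}
```

This follows a common sysfs pattern in which a read lists every accepted
value and marks the current selection, so the same file documents both the
state and the valid inputs.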

Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/include/asm/opal-api.h|   7 ++
 arch/powerpc/include/asm/opal.h|   1 +
 arch/powerpc/platforms/powernv/idle.c  | 132 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 4 files changed, 141 insertions(+)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0321a90..a49e5fa 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -165,6 +165,13 @@
 #define OPAL_PM_WINKLE_ENABLED 0x0004
 #define OPAL_PM_SLEEP_ENABLED_ER1  0x0008 /* with workaround */
 
+/*
+ * OPAL_CONFIG_CPU_IDLE_STATE parameters
+ */
+#define OPAL_CONFIG_IDLE_FASTSLEEP 1
+#define OPAL_CONFIG_IDLE_UNDO  0
+#define OPAL_CONFIG_IDLE_APPLY 1
+
 #ifndef __ASSEMBLY__
 
 /* Other enums */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 042af1a..9a47813 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -186,6 +186,7 @@ int64_t opal_handle_hmi(void);
 int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
 int64_t opal_unregister_dump_region(uint32_t id);
 int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
+int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag);
 int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number);
 int64_t opal_ipmi_send(uint64_t interface, struct opal_ipmi_msg *msg,
uint64_t msg_len);
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 104235a..3e0423d 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -136,6 +138,127 @@ u32 pnv_get_supported_cpuidle_states(void)
 }
 EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states);
 
+
+static void pnv_fastsleep_workaround_apply(void *info)
+
+{
+   int rc;
+   int *err = info;
+
+   rc = opal_config_cpu_idle_state(OPAL_CONFIG_IDLE_FASTSLEEP,
+   OPAL_CONFIG_IDLE_APPLY);
+   if (rc)
+   *err = 1;
+}
+
+/*
+ * Used to store fastsleep workaround state
+ * 0 - Workaround applied/undone at fastsleep entry/exit path (Default)
+ * 1 - Workaround applied once, never undone.
+ */
+static u8 fastsleep_workaround_state;
+
+static const char * const fastsleep_workaround_avail_states[] = {
+   "dynamic", "applyonce"
+};
+
+/*
+ * fastsleep_workaround_avail_states values
+ */
+enum {
+   WORKAROUND_DYNAMIC,
+   WORKAROUND_APPLYONCE
+};
+static ssize_t show_fastsleep_workaround_state(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   char *s = buf;
+
+   if (fastsleep_workaround_state == 0) {
+   s += sprintf(s, "[%s] ",
+   fastsleep_workaround_avail_states[WORKAROUND_DYNAMIC]);
+   s += sprintf(s, "%s\n",
+   
fastsleep_workaround_avail_states[WORKAROUND_APPLYONCE]);
+   } else {
+   s += sprintf(s, "%s ",
+   fastsleep_workaround_avail_states[WORKAROUND_DYNAMIC]);
+   s += sprintf(s, "[%s]\n",
+   
fastsleep_workaround_avail_s

[PATCH v4 2/3] powerpc/powernv: Move cpuidle related code from setup.c to new file

2015-04-13 Thread Shreyas B. Prabhu
This is a cleanup patch; it doesn't change any functionality. It moves
all cpuidle-related code from setup.c to a new file.

Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/platforms/powernv/Makefile |   2 +-
 arch/powerpc/platforms/powernv/idle.c   | 191 
 arch/powerpc/platforms/powernv/setup.c  | 171 
 3 files changed, 192 insertions(+), 172 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/idle.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 33e44f3..bee9235 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,4 +1,4 @@
-obj-y  += setup.o opal-wrappers.o opal.o opal-async.o
+obj-y  += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
new file mode 100644
index 000..104235a
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -0,0 +1,191 @@
+/*
+ * PowerNV cpuidle code
+ *
+ * Copyright 2015 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "powernv.h"
+#include "subcore.h"
+
+static u32 supported_cpuidle_states;
+
+int pnv_save_sprs_for_winkle(void)
+{
+   int cpu;
+   int rc;
+
+   /*
+* hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric accross
+* all cpus at boot. Get these reg values of current cpu and use the
+* same accross all cpus.
+*/
+   uint64_t lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1;
+   uint64_t hid0_val = mfspr(SPRN_HID0);
+   uint64_t hid1_val = mfspr(SPRN_HID1);
+   uint64_t hid4_val = mfspr(SPRN_HID4);
+   uint64_t hid5_val = mfspr(SPRN_HID5);
+   uint64_t hmeer_val = mfspr(SPRN_HMEER);
+
+   for_each_possible_cpu(cpu) {
+   uint64_t pir = get_hard_smp_processor_id(cpu);
+   uint64_t hsprg0_val = (uint64_t)&paca[cpu];
+
+   /*
+* HSPRG0 is used to store the cpu's pointer to paca. Hence last
+* 3 bits are guaranteed to be 0. Program slw to restore HSPRG0
+* with 63rd bit set, so that when a thread wakes up at 0x100 we
+* can use this bit to distinguish between fastsleep and
+* deep winkle.
+*/
+   hsprg0_val |= 1;
+
+   rc = opal_slw_set_reg(pir, SPRN_HSPRG0, hsprg0_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
+   if (rc != 0)
+   return rc;
+
+   /* HIDs are per core registers */
+   if (cpu_thread_in_core(cpu) == 0) {
+
+   rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val);
+   if (rc != 0)
+   return rc;
+   }
+   }
+
+   return 0;
+}
+
+static void pnv_alloc_idle_core_states(void)
+{
+   int i, j;
+   int nr_cores = cpu_nr_cores();
+   u32 *core_idle_state;
+
+   /*
+* core_idle_state - First 8 bits track the idle state of each thread
+* of the core. The 8th bit is the lock bit. Initially all thread bits
+* are set. They are cleared when the thread enters deep idle state
+* like sleep and winkle. Initially the lock bit is cleared.
+* The lock bit has 2 purposes
+* a. While the first thread is restoring core state, it prevents
+* other threads in the core from switching to process context.
+* b. While the last thread in the core is saving the core state, it
+* prevents a different thread from waking up.
+ 

Re: [PATCH] powerpc: Make doorbell check preemption safe

2015-06-03 Thread Shreyas B Prabhu


On Wednesday 20 May 2015 06:30 AM, Michael Neuling wrote:
> On Wed, 2015-05-20 at 00:30 +0530, Shreyas B. Prabhu wrote:
>> Doorbell can be used to cause ipi on cpus which are sibling threads on
>> the same core. So icp_native_cause_ipi checks if the destination cpu
>> is a sibling thread of the current cpu and uses doorbell in such cases.
>>
>> But while running with CONFIG_PREEMPT=y, since this section is
>> preemptible, we can run into issues if, after we check whether the
>> destination cpu is a sibling cpu, the task gets migrated from a
>> sibling cpu to a cpu on another core.
>>
>> Fix this by using get_cpu()/ put_cpu()
> 
> Thanks.  Looks good and it boots for me.
> 
> Signed-off-by: Michael Neuling 
> 
mikey, Thanks!


mpe, if this looks ok, can you please pick it up?



Re: [PATCH 0/9] powerpc/powernv: Support for fastsleep and winkle

2014-09-17 Thread Shreyas B Prabhu
Hi,

In this patch series we use winkle for offlined cores. I successfully
tested the working of this with subcore functionality.

Test scenario was as follows:
1. Set SMT mode to 1, set subcores-per-core to 1
2. Offline a core, in this case cpu 32 (sending it to winkle)
3. Set subcores-per-core to 4
4. Online the core
5. Start a guest (Topology 1 core 2 threads) on a subcore, in this case
on cpu 36

This works without any glitch.

Thanks,
Shreyas

On Monday 25 August 2014 11:31 PM, Shreyas B. Prabhu wrote:
> Fast sleep is an idle state, where the core and the L1 and L2
> caches are brought down to a threshold voltage. This also means that
> the communication between L2 and L3 caches has to be fenced. However
> the current P8 chips have a bug wherein this fencing between L2 and
> L3 caches gets delayed by a cpu cycle. This can delay L3 response to
> the other cpus if they request data during this time. Thus they
> would fetch the same data from the memory which could lead to data
> corruption if L3 cache is not flushed.
> Patch 4 adds support to work around this.
> 
> 'Deep Winkle' is a deeper idle state where core and private L2 are powered
> off. While it offers higher power savings, it is at the cost of losing
> hypervisor register state and higher latency.
> Patches 5-9 add support for winkle and use it for offline cpus.
> 
> Patch 1 - Moves parameters required to discover idle states to a location
> common to both cpuidle driver and powernv core code
> Patch 2 - Populates idle state details from device tree
> Patch 3 - Enables cpus to run guest after waking up from fastsleep/winkle
> 
> 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Rafael J. Wysocki 
> Cc: Srivatsa S. Bhat 
> Cc: Preeti U. Murthy 
> Cc: Vaidyanathan Srinivasan 
> Cc: Rob Herring 
> Cc: Grant Likely 
> Cc: devicet...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> 
> Preeti U Murthy (2):
>   cpuidle/powernv: Populate cpuidle state details by querying the
> device-tree
>   powerpc/powernv/cpuidle: Add workaround to enable fastsleep
> 
> Shreyas B. Prabhu (6):
>   powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
> fast-sleep
>   powerpc/powernv: Add OPAL call to save and restore
>   powerpc: Adding macro for accessing Thread Switch Control Register
>   powerpc/powernv: Add winkle infrastructure
>   powerpc/powernv: Discover and enable winkle
>   powerpc/powernv: Enter deepest supported idle state in offline
> 
> Srivatsa S. Bhat (1):
>   powerpc/powernv: Enable Offline CPUs to enter deep idle states
> 
>  arch/powerpc/include/asm/machdep.h |   4 +
>  arch/powerpc/include/asm/opal.h|  10 ++
>  arch/powerpc/include/asm/paca.h|   3 +
>  arch/powerpc/include/asm/ppc-opcode.h  |   2 +
>  arch/powerpc/include/asm/processor.h   |   6 +-
>  arch/powerpc/include/asm/reg.h |   1 +
>  arch/powerpc/kernel/asm-offsets.c  |   1 +
>  arch/powerpc/kernel/exceptions-64s.S   |  37 ++---
>  arch/powerpc/kernel/idle.c |  30 
>  arch/powerpc/kernel/idle_power7.S  |  83 +-
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
>  arch/powerpc/platforms/powernv/powernv.h   |   8 +
>  arch/powerpc/platforms/powernv/setup.c | 217 
> +
>  arch/powerpc/platforms/powernv/smp.c   |  13 +-
>  arch/powerpc/platforms/powernv/subcore.c   |  15 ++
>  drivers/cpuidle/cpuidle-powernv.c  |  40 -
>  16 files changed, 439 insertions(+), 33 deletions(-)
> 



Re: [PATCH 0/9] powerpc/powernv: Support for fastsleep and winkle

2014-09-11 Thread Shreyas B Prabhu
Hi,
Any updates on this patch series?

On Monday 25 August 2014 11:31 PM, Shreyas B. Prabhu wrote:
> Fast sleep is an idle state, where the core and the L1 and L2
> caches are brought down to a threshold voltage. This also means that
> the communication between L2 and L3 caches has to be fenced. However
> the current P8 chips have a bug wherein this fencing between L2 and
> L3 caches gets delayed by a cpu cycle. This can delay L3 response to
> the other cpus if they request data during this time. Thus they
> would fetch the same data from the memory which could lead to data
> corruption if L3 cache is not flushed.
> Patch 4 adds support to work around this.
> 
> 'Deep Winkle' is a deeper idle state where core and private L2 are powered
> off. While it offers higher power savings, it is at the cost of losing
> hypervisor register state and higher latency.
> Patches 5-9 add support for winkle and use it for offline cpus.
> 
> Patch 1 - Moves parameters required to discover idle states to a location
> common to both cpuidle driver and powernv core code
> Patch 2 - Populates idle state details from device tree
> Patch 3 - Enables cpus to run guest after waking up from fastsleep/winkle
> 
> 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Rafael J. Wysocki 
> Cc: Srivatsa S. Bhat 
> Cc: Preeti U. Murthy 
> Cc: Vaidyanathan Srinivasan 
> Cc: Rob Herring 
> Cc: Grant Likely 
> Cc: devicet...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> 
> Preeti U Murthy (2):
>   cpuidle/powernv: Populate cpuidle state details by querying the
> device-tree
>   powerpc/powernv/cpuidle: Add workaround to enable fastsleep
> 
> Shreyas B. Prabhu (6):
>   powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
> fast-sleep
>   powerpc/powernv: Add OPAL call to save and restore
>   powerpc: Adding macro for accessing Thread Switch Control Register
>   powerpc/powernv: Add winkle infrastructure
>   powerpc/powernv: Discover and enable winkle
>   powerpc/powernv: Enter deepest supported idle state in offline
> 
> Srivatsa S. Bhat (1):
>   powerpc/powernv: Enable Offline CPUs to enter deep idle states
> 
>  arch/powerpc/include/asm/machdep.h |   4 +
>  arch/powerpc/include/asm/opal.h|  10 ++
>  arch/powerpc/include/asm/paca.h|   3 +
>  arch/powerpc/include/asm/ppc-opcode.h  |   2 +
>  arch/powerpc/include/asm/processor.h   |   6 +-
>  arch/powerpc/include/asm/reg.h |   1 +
>  arch/powerpc/kernel/asm-offsets.c  |   1 +
>  arch/powerpc/kernel/exceptions-64s.S   |  37 ++---
>  arch/powerpc/kernel/idle.c |  30 
>  arch/powerpc/kernel/idle_power7.S  |  83 +-
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
>  arch/powerpc/platforms/powernv/powernv.h   |   8 +
>  arch/powerpc/platforms/powernv/setup.c | 217 
> +
>  arch/powerpc/platforms/powernv/smp.c   |  13 +-
>  arch/powerpc/platforms/powernv/subcore.c   |  15 ++
>  drivers/cpuidle/cpuidle-powernv.c  |  40 -
>  16 files changed, 439 insertions(+), 33 deletions(-)
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/9] powerpc/powernv: Support for fastsleep and winkle

2014-09-29 Thread Shreyas B Prabhu
Hi,
Any updates on this patch series?

On Thursday 18 September 2014 08:41 AM, Shreyas B Prabhu wrote:
> Hi,
> 
> In this patch series we use winkle for offlined cores. I successfully
> tested the working of this with subcore functionality.
> 
> Test scenario was as follows:
> 1. Set SMT mode to 1, set subcores-per-core to 1
> 2. Offline a core, in this case cpu 32 (sending it to winkle)
> 3. Set subcores-per-core to 4
> 4. Online the core
> 5. Start a guest (Topology 1 core 2 threads) on a subcore, in this case
> on cpu 36
> 
> This works without any glitch.
> 
> Thanks,
> Shreyas
> 
> On Monday 25 August 2014 11:31 PM, Shreyas B. Prabhu wrote:
>> Fast sleep is an idle state where the core and the L1 and L2
>> caches are brought down to a threshold voltage. This also means that
>> the communication between the L2 and L3 caches has to be fenced. However,
>> the current P8 chips have a bug wherein this fencing between the L2 and
>> L3 caches gets delayed by a cpu cycle. This can delay the L3 response to
>> the other cpus if they request data during this time. Thus they
>> would fetch the same data from memory, which could lead to data
>> corruption if the L3 cache is not flushed.
>> Patch 4 adds support to work around this.
>>
>> 'Deep Winkle' is a deeper idle state where the core and private L2 are powered
>> off. While it offers higher power savings, it comes at the cost of losing
>> hypervisor register state and higher latency.
>> Patches 5-9 add support for winkle and use it for offline cpus.
>>
>> Patch 1 - Moves parameters required to discover idle states to a location 
>> common to both cpuidle driver and powernv core code
>> Patch 2 - Populates idle state details from device tree
>> Patch 3 - Enables cpus to run guest after waking up from fastsleep/winkle
>>
>>
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Michael Ellerman 
>> Cc: Rafael J. Wysocki 
>> Cc: Srivatsa S. Bhat 
>> Cc: Preeti U. Murthy 
>> Cc: Vaidyanathan Srinivasan 
>> Cc: Rob Herring 
>> Cc: Grant Likely 
>> Cc: devicet...@vger.kernel.org
>> Cc: linux...@vger.kernel.org
>> Cc: linuxppc-...@lists.ozlabs.org
>>
>> Preeti U Murthy (2):
>>   cpuidle/powernv: Populate cpuidle state details by querying the
>> device-tree
>>   powerpc/powernv/cpuidle: Add workaround to enable fastsleep
>>
>> Shreyas B. Prabhu (6):
>>   powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
>> fast-sleep
>>   powerpc/powernv: Add OPAL call to save and restore
>>   powerpc: Adding macro for accessing Thread Switch Control Register
>>   powerpc/powernv: Add winkle infrastructure
>>   powerpc/powernv: Discover and enable winkle
>>   powerpc/powernv: Enter deepest supported idle state in offline
>>
>> Srivatsa S. Bhat (1):
>>   powerpc/powernv: Enable Offline CPUs to enter deep idle states
>>
>>  arch/powerpc/include/asm/machdep.h |   4 +
>>  arch/powerpc/include/asm/opal.h|  10 ++
>>  arch/powerpc/include/asm/paca.h|   3 +
>>  arch/powerpc/include/asm/ppc-opcode.h  |   2 +
>>  arch/powerpc/include/asm/processor.h   |   6 +-
>>  arch/powerpc/include/asm/reg.h |   1 +
>>  arch/powerpc/kernel/asm-offsets.c  |   1 +
>>  arch/powerpc/kernel/exceptions-64s.S   |  37 ++---
>>  arch/powerpc/kernel/idle.c |  30 
>>  arch/powerpc/kernel/idle_power7.S  |  83 +-
>>  arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
>>  arch/powerpc/platforms/powernv/powernv.h   |   8 +
>>  arch/powerpc/platforms/powernv/setup.c | 217 +
>>  arch/powerpc/platforms/powernv/smp.c   |  13 +-
>>  arch/powerpc/platforms/powernv/subcore.c   |  15 ++
>>  drivers/cpuidle/cpuidle-powernv.c  |  40 -
>>  16 files changed, 439 insertions(+), 33 deletions(-)
>>



Re: [PATCH 0/9] powerpc/powernv: Support for fastsleep and winkle

2014-09-30 Thread Shreyas B Prabhu
Hi Rafael,

On Tuesday 30 September 2014 04:58 AM, Rafael J. Wysocki wrote:
> On Monday, September 29, 2014 03:53:06 PM Shreyas B Prabhu wrote:
>> Hi,
>> Any updates on this patch series?
> 
> I have a couple of patches from there in my tree it seems.  Please have a look
> at linux-pm.git/linux-next and please let me know if that's the case.
> 

I checked linux-pm.git/linux-next (last commit 067c17382165). None of the patches
in this series are present in the tree.


> 
>> On Thursday 18 September 2014 08:41 AM, Shreyas B Prabhu wrote:
>>> Hi,
>>>
>>> In this patch series we use winkle for offlined cores. I successfully
>>> tested the working of this with subcore functionality.
>>>
>>> Test scenario was as follows:
>>> 1. Set SMT mode to 1, set subcores-per-core to 1
>>> 2. Offline a core, in this case cpu 32 (sending it to winkle)
>>> 3. Set subcores-per-core to 4
>>> 4. Online the core
>>> 5. Start a guest (Topology 1 core 2 threads) on a subcore, in this case
>>> on cpu 36
>>>
>>> This works without any glitch.
>>>
>>> Thanks,
>>> Shreyas
>>>
>>> On Monday 25 August 2014 11:31 PM, Shreyas B. Prabhu wrote:
>>>> Fast sleep is an idle state where the core and the L1 and L2
>>>> caches are brought down to a threshold voltage. This also means that
>>>> the communication between the L2 and L3 caches has to be fenced. However,
>>>> the current P8 chips have a bug wherein this fencing between the L2 and
>>>> L3 caches gets delayed by a cpu cycle. This can delay the L3 response to
>>>> the other cpus if they request data during this time. Thus they
>>>> would fetch the same data from memory, which could lead to data
>>>> corruption if the L3 cache is not flushed.
>>>> Patch 4 adds support to work around this.
>>>>
>>>> 'Deep Winkle' is a deeper idle state where the core and private L2 are powered
>>>> off. While it offers higher power savings, it comes at the cost of losing
>>>> hypervisor register state and higher latency.
>>>> Patches 5-9 add support for winkle and use it for offline cpus.
>>>>
>>>> Patch 1 - Moves parameters required to discover idle states to a location 
>>>> common to both cpuidle driver and powernv core code
>>>> Patch 2 - Populates idle state details from device tree
>>>> Patch 3 - Enables cpus to run guest after waking up from fastsleep/winkle
>>>>
>>>>
>>>> Cc: Benjamin Herrenschmidt 
>>>> Cc: Paul Mackerras 
>>>> Cc: Michael Ellerman 
>>>> Cc: Rafael J. Wysocki 
>>>> Cc: Srivatsa S. Bhat 
>>>> Cc: Preeti U. Murthy 
>>>> Cc: Vaidyanathan Srinivasan 
>>>> Cc: Rob Herring 
>>>> Cc: Grant Likely 
>>>> Cc: devicet...@vger.kernel.org
>>>> Cc: linux...@vger.kernel.org
>>>> Cc: linuxppc-...@lists.ozlabs.org
>>>>
>>>> Preeti U Murthy (2):
>>>>   cpuidle/powernv: Populate cpuidle state details by querying the
>>>> device-tree
>>>>   powerpc/powernv/cpuidle: Add workaround to enable fastsleep
>>>>
>>>> Shreyas B. Prabhu (6):
>>>>   powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
>>>> fast-sleep
>>>>   powerpc/powernv: Add OPAL call to save and restore
>>>>   powerpc: Adding macro for accessing Thread Switch Control Register
>>>>   powerpc/powernv: Add winkle infrastructure
>>>>   powerpc/powernv: Discover and enable winkle
>>>>   powerpc/powernv: Enter deepest supported idle state in offline
>>>>
>>>> Srivatsa S. Bhat (1):
>>>>   powerpc/powernv: Enable Offline CPUs to enter deep idle states
>>>>
>>>>  arch/powerpc/include/asm/machdep.h |   4 +
>>>>  arch/powerpc/include/asm/opal.h|  10 ++
>>>>  arch/powerpc/include/asm/paca.h|   3 +
>>>>  arch/powerpc/include/asm/ppc-opcode.h  |   2 +
>>>>  arch/powerpc/include/asm/processor.h   |   6 +-
>>>>  arch/powerpc/include/asm/reg.h |   1 +
>>>>  arch/powerpc/kernel/asm-offsets.c  |   1 +
>>>>  arch/powerpc/kernel/exceptions-64s.S   |  37 ++---
>>>>  arch/powerpc/kernel/idle.c |  30 
>>>>  arch/powerpc/kernel/idle_power7.S  |  83 +-
>>>>  arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
>>>>  arch/powerpc/platforms/powernv/powernv.h   |   8 +
>>>>  arch/powerpc/platforms/powernv/setup.c | 217 +
>>>>  arch/powerpc/platforms/powernv/smp.c   |  13 +-
>>>>  arch/powerpc/platforms/powernv/subcore.c   |  15 ++
>>>>  drivers/cpuidle/cpuidle-powernv.c  |  40 -
>>>>  16 files changed, 439 insertions(+), 33 deletions(-)
>>>>
>>
> 



[PATCH] powerpc/powernv: Fix build error when CONFIG_SMP=n

2014-05-20 Thread Shreyas B. Prabhu
Fix the following build error when compiled with CONFIG_SMP=n
arch/powerpc/platforms/powernv/setup.c: In function 
‘pnv_kexec_wait_secondaries_down’:
arch/powerpc/platforms/powernv/setup.c:179:4: error: implicit declaration of 
function ‘get_hard_smp_processor_id’ [-Werror=implicit-function-declaration]
rc = opal_query_cpu_status(get_hard_smp_processor_id(i),

The usage of get_hard_smp_processor_id() needs the declaration from <asm/smp.h>.
The file setup.c includes <linux/sched.h>, which in turn includes <linux/smp.h>.
However, <linux/smp.h> includes <asm/smp.h> only on SMP configs and hence UP
builds fail. Fix this by directly including <asm/smp.h> in setup.c
unconditionally.

Reported-by: Geert Uytterhoeven 
Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/platforms/powernv/setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 8723d32..e6bde98 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include <asm/smp.h>
 
 #include "powernv.h"
 
-- 
1.9.0



[PATCH 1/2] powerpc/powernv: include asm/smp.h to handle UP config

2014-06-05 Thread Shreyas B. Prabhu
Build throws following errors when CONFIG_SMP=n
arch/powerpc/platforms/powernv/setup.c: In function 
‘pnv_kexec_wait_secondaries_down’:
arch/powerpc/platforms/powernv/setup.c:179:4: error: implicit declaration of 
function ‘get_hard_smp_processor_id’
rc = opal_query_cpu_status(get_hard_smp_processor_id(i),

The usage of get_hard_smp_processor_id() needs the declaration from
<asm/smp.h>. The file setup.c includes <linux/sched.h>, which in turn
includes <linux/smp.h>. However, <linux/smp.h> includes <asm/smp.h>
only on SMP configs and hence UP builds fail.

Fix this by directly including <asm/smp.h> in setup.c unconditionally.

Reported-by: Geert Uytterhoeven 
Reviewed-by: Srivatsa S. Bhat 
Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/platforms/powernv/setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 8c16a5f..678573c 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include <asm/smp.h>
 
 #include "powernv.h"
 
-- 
1.9.0



[PATCH 2/2] powerpc/powernv : Disable subcore for UP configs

2014-06-05 Thread Shreyas B. Prabhu
Build throws following errors when CONFIG_SMP=n
arch/powerpc/platforms/powernv/subcore.c: In function ‘cpu_update_split_mode’:
arch/powerpc/platforms/powernv/subcore.c:274:15: error: ‘setup_max_cpus’ 
undeclared (first use in this function)
arch/powerpc/platforms/powernv/subcore.c:285:5: error: lvalue required as left 
operand of assignment

'setup_max_cpus' variable is relevant only on SMP, so there is no point
working around it for UP. Furthermore, subcore.c itself is relevant only
on SMP and hence the better solution is to exclude subcore.c for UP builds.

Signed-off-by: Shreyas B. Prabhu 
---
This patch applies on top of ben/powerpc.git/next branch

 arch/powerpc/platforms/powernv/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index 4ad0d34..636d206 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,9 +1,9 @@
 obj-y  += setup.o opal-takeover.o opal-wrappers.o opal.o opal-async.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
-obj-y  += opal-msglog.o subcore.o subcore-asm.o
+obj-y  += opal-msglog.o subcore-asm.o
 
-obj-$(CONFIG_SMP)  += smp.o
+obj-$(CONFIG_SMP)  += smp.o subcore.o
 obj-$(CONFIG_PCI)  += pci.o pci-p5ioc2.o pci-ioda.o
 obj-$(CONFIG_EEH)  += eeh-ioda.o eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
-- 
1.9.0



[PATCHv2 1/2] powerpc/powernv: include asm/smp.h to fix UP build failure

2014-06-06 Thread Shreyas B. Prabhu
Build throws following errors when CONFIG_SMP=n
arch/powerpc/platforms/powernv/setup.c: In function 
‘pnv_kexec_wait_secondaries_down’:
arch/powerpc/platforms/powernv/setup.c:179:4: error: implicit declaration of 
function ‘get_hard_smp_processor_id’
rc = opal_query_cpu_status(get_hard_smp_processor_id(i),

The usage of get_hard_smp_processor_id() needs the declaration from
<asm/smp.h>. The file setup.c includes <linux/sched.h>, which in turn
includes <linux/smp.h>. However, <linux/smp.h> includes <asm/smp.h>
only on SMP configs and hence UP builds fail.

Fix this by directly including <asm/smp.h> in setup.c unconditionally.

Reported-by: Geert Uytterhoeven 
Reviewed-by: Srivatsa S. Bhat 
Signed-off-by: Shreyas B. Prabhu 
---
Changes in v2:
Commit message improved based on suggestion.

 arch/powerpc/platforms/powernv/setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 8c16a5f..678573c 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include <asm/smp.h>
 
 #include "powernv.h"
 
-- 
1.9.0



[PATCH v2 2/2] powerpc/powernv : Disable subcore for UP configs

2014-06-06 Thread Shreyas B. Prabhu
Build throws following errors when CONFIG_SMP=n
arch/powerpc/platforms/powernv/subcore.c: In function ‘cpu_update_split_mode’:
arch/powerpc/platforms/powernv/subcore.c:274:15: error: ‘setup_max_cpus’ 
undeclared (first use in this function)
arch/powerpc/platforms/powernv/subcore.c:285:5: error: lvalue required as left 
operand of assignment

'setup_max_cpus' variable is relevant only on SMP, so there is no point
working around it for UP. Furthermore, subcore itself is relevant only
on SMP and hence the better solution is to exclude subcore.o and
subcore-asm.o for UP builds.

Signed-off-by: Shreyas B. Prabhu 
---
Changes in v2:
Excluding subcore-asm.o which is part of the subcore feature for UP configs.

 arch/powerpc/platforms/powernv/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index 4ad0d34..d55891f 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,9 +1,9 @@
 obj-y  += setup.o opal-takeover.o opal-wrappers.o opal.o opal-async.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
-obj-y  += opal-msglog.o subcore.o subcore-asm.o
+obj-y  += opal-msglog.o
 
-obj-$(CONFIG_SMP)  += smp.o
+obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-p5ioc2.o pci-ioda.o
 obj-$(CONFIG_EEH)  += eeh-ioda.o eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
-- 
1.9.0



[PATCH 0/9] powerpc/powernv: Support for fastsleep and winkle

2014-08-25 Thread Shreyas B. Prabhu
Fast sleep is an idle state where the core and the L1 and L2
caches are brought down to a threshold voltage. This also means that
the communication between the L2 and L3 caches has to be fenced. However,
the current P8 chips have a bug wherein this fencing between the L2 and
L3 caches gets delayed by a cpu cycle. This can delay the L3 response to
the other cpus if they request data during this time. Thus they
would fetch the same data from memory, which could lead to data
corruption if the L3 cache is not flushed.
Patch 4 adds support to work around this.

'Deep Winkle' is a deeper idle state where the core and private L2 are powered
off. While it offers higher power savings, it comes at the cost of losing
hypervisor register state and higher latency.
Patches 5-9 add support for winkle and use it for offline cpus.

Patch 1 - Moves parameters required to discover idle states to a location 
common to both cpuidle driver and powernv core code
Patch 2 - Populates idle state details from device tree
Patch 3 - Enables cpus to run guest after waking up from fastsleep/winkle


Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rafael J. Wysocki 
Cc: Srivatsa S. Bhat 
Cc: Preeti U. Murthy 
Cc: Vaidyanathan Srinivasan 
Cc: Rob Herring 
Cc: Grant Likely 
Cc: devicet...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org

Preeti U Murthy (2):
  cpuidle/powernv: Populate cpuidle state details by querying the
device-tree
  powerpc/powernv/cpuidle: Add workaround to enable fastsleep

Shreyas B. Prabhu (6):
  powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
fast-sleep
  powerpc/powernv: Add OPAL call to save and restore
  powerpc: Adding macro for accessing Thread Switch Control Register
  powerpc/powernv: Add winkle infrastructure
  powerpc/powernv: Discover and enable winkle
  powerpc/powernv: Enter deepest supported idle state in offline

Srivatsa S. Bhat (1):
  powerpc/powernv: Enable Offline CPUs to enter deep idle states

 arch/powerpc/include/asm/machdep.h |   4 +
 arch/powerpc/include/asm/opal.h|  10 ++
 arch/powerpc/include/asm/paca.h|   3 +
 arch/powerpc/include/asm/ppc-opcode.h  |   2 +
 arch/powerpc/include/asm/processor.h   |   6 +-
 arch/powerpc/include/asm/reg.h |   1 +
 arch/powerpc/kernel/asm-offsets.c  |   1 +
 arch/powerpc/kernel/exceptions-64s.S   |  37 ++---
 arch/powerpc/kernel/idle.c |  30 
 arch/powerpc/kernel/idle_power7.S  |  83 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
 arch/powerpc/platforms/powernv/powernv.h   |   8 +
 arch/powerpc/platforms/powernv/setup.c | 217 +
 arch/powerpc/platforms/powernv/smp.c   |  13 +-
 arch/powerpc/platforms/powernv/subcore.c   |  15 ++
 drivers/cpuidle/cpuidle-powernv.c  |  40 -
 16 files changed, 439 insertions(+), 33 deletions(-)

-- 
1.9.0



[PATCH 4/9] powerpc/powernv/cpuidle: Add workaround to enable fastsleep

2014-08-25 Thread Shreyas B. Prabhu
From: Preeti U Murthy 

Fast sleep is an idle state where the core and the L1 and L2
caches are brought down to a threshold voltage. This also means that
the communication between the L2 and L3 caches has to be fenced. However,
the current P8 chips have a bug wherein this fencing between the L2 and
L3 caches gets delayed by a cpu cycle. This can delay the L3 response to
the other cpus if they request data during this time. Thus they
would fetch the same data from memory, which could lead to data
corruption if the L3 cache is not flushed.

The cpu idle states save power at a core level and not at a thread level.
Hence power savings are determined by the shallowest idle state that any
thread of a core is in. The above issue in fastsleep arises only when
all the threads in a core have entered fastsleep, or when some of them have
entered a deeper idle state and the rest are in fastsleep. This patch
therefore implements a workaround for this bug by ensuring
that, each time a cpu goes to fastsleep, it checks whether it is the last
thread in the core to enter fastsleep. If so, it needs to make an opal
call to get around the above-mentioned fastsleep problem in the hardware
before issuing the sleep instruction.

Similarly, when a thread in a core comes out of fastsleep, it needs
to verify whether it is the first thread in the core to come out of fastsleep
and, if so, issue the opal call to revert the changes made while entering
fastsleep.

For the same reason mentioned above, we need to take care of offline threads
as well, since we allow them to enter fastsleep and, with support for
deep winkle coming soon, they can enter winkle as well. We therefore
ensure that even offline threads make the above-mentioned opal calls
similarly, so that as long as the threads in a core are in an
idle state >= fastsleep, we have the workaround in place. Whenever a
thread comes out of either of these states, it needs to verify whether the
opal call has been made and, if so, revert it. For now this patch
ensures that offline threads enter fastsleep.

We need to be able to synchronize the cpus in a core which are entering
and exiting fastsleep so as to ensure that the last thread in the core
to enter fastsleep and the first to exit fastsleep *only* issue the opal
call. To do so, we need a per-core lock and counter. The counter is
required to keep track of the number of threads in a core which are in
idle state >= fastsleep. To make the implementation of this simple, we
introduce a per-cpu lock and counter and every thread always takes the
primary thread's lock, modifies the primary thread's counter. This
effectively makes them per-core entities.
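
The last-in/first-out rule described above can be sketched in plain C. This
is only an illustration under stated assumptions, not the code from this
patch: struct core_idle_state, fastsleep_enter() and fastsleep_exit() are
hypothetical names, the lock is a C11 atomic_flag spinlock rather than a
kernel lock, and two counters stand in for the OPAL calls.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-core state. The patch keeps the equivalent data in the
 * primary thread's per-cpu area; a plain struct is used here instead. */
struct core_idle_state {
	atomic_flag lock;       /* protects the fields below */
	int nr_deep_idle;       /* threads in an idle state >= fastsleep */
	int workaround_active;  /* 1 while the workaround is applied */
	int nr_apply_calls;     /* stand-in for "apply workaround" OPAL calls */
	int nr_revert_calls;    /* stand-in for "revert workaround" OPAL calls */
};

static void core_lock(struct core_idle_state *c)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;
}

static void core_unlock(struct core_idle_state *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Called by a thread just before it enters fastsleep (or deeper). */
static void fastsleep_enter(struct core_idle_state *c, int threads_per_core)
{
	core_lock(c);
	if (++c->nr_deep_idle == threads_per_core) {
		/* Last thread in: only this one applies the workaround. */
		c->workaround_active = 1;
		c->nr_apply_calls++;
	}
	core_unlock(c);
}

/* Called by a thread right after it wakes up from fastsleep (or deeper). */
static void fastsleep_exit(struct core_idle_state *c, int threads_per_core)
{
	core_lock(c);
	if (c->nr_deep_idle-- == threads_per_core) {
		/* First thread out: only this one reverts the workaround. */
		c->workaround_active = 0;
		c->nr_revert_calls++;
	}
	core_unlock(c);
}
```

In this sketch each stand-in call is made exactly once per full-core
fastsleep episode, however many threads the core has, which is the property
the per-core lock and counter are meant to guarantee.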

But the workaround is abstracted in the powernv core code and neither
the hotplug path nor the cpuidle driver needs to bother about it. All
they need to know is whether fastsleep, with or without the erratum, is
present as an idle state.

Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rafael J. Wysocki 
Signed-off-by: Shreyas B. Prabhu 
Signed-off-by: Preeti U Murthy 
---
 arch/powerpc/include/asm/machdep.h |   3 +
 arch/powerpc/include/asm/opal.h|   3 +
 arch/powerpc/include/asm/processor.h   |   4 +-
 arch/powerpc/kernel/idle.c |  19 
 arch/powerpc/kernel/idle_power7.S  |   2 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/setup.c | 139 ++---
 drivers/cpuidle/cpuidle-powernv.c  |   8 +-
 8 files changed, 140 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index b125cea..f37014f 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -298,6 +298,9 @@ struct machdep_calls {
 #ifdef CONFIG_MEMORY_HOTREMOVE
int (*remove_memory)(u64, u64);
 #endif
+   /* Idle handlers */
+   void            (*setup_idle)(void);
+   unsigned long   (*power7_sleep)(void);
 };
 
 extern void e500_idle(void);
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 28b8342..166d572 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -149,6 +149,7 @@ struct opal_sg_list {
 #define OPAL_DUMP_INFO2                94
 #define OPAL_PCI_EEH_FREEZE_SET        97
 #define OPAL_HANDLE_HMI                98
+#define OPAL_CONFIG_IDLE_STATE 99
 #define OPAL_REGISTER_DUMP_REGION  101
 #define OPAL_UNREGISTER_DUMP_REGION102
 
@@ -775,6 +776,7 @@ extern struct device_node *opal_node;
 /* Flags used for idle state discovery from the device tree */
 #define IDLE_INST_NAP  0x0001 /* nap instruction can be used */
 #define IDLE_INST_SLEEP  0x0002 /* sleep instruction can be used */
+#define IDLE_INST_SLEEP_ER1  0x0008 /* Use sleep 

[PATCH 1/9] powerpc/powernv: Enable Offline CPUs to enter deep idle states

2014-08-25 Thread Shreyas B. Prabhu
From: "Srivatsa S. Bhat" 

The offline cpus should enter deep idle states so as to gain maximum
powersavings when the entire core is offline. To do so the offline path
must be made aware of the available deepest idle state. Hence probe the
device tree for the possible idle states in powernv core code and
expose the deepest idle state through flags.

Since the device tree is probed by the cpuidle driver as well, move
the parameters required to discover the idle states into a place
common to both the driver and the powernv core code.

Another point is that the fastsleep idle state may require a workaround in
the kernel to function properly. This workaround is introduced in the
subsequent patches. However, neither the cpuidle driver nor the hotplug
path needs to be bothered about this workaround.

It will be taken care of by the core powernv code.
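
As a rough, standalone illustration of the probing step described above, the
sketch below folds an array of per-state flag words into a single
"supported states" mask. The macro names and values are copied from the
diff as it appears in this mail; fold_idle_flags() is a hypothetical helper
name, and the device-tree lookup is replaced by a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flags as they appear in the patch text (device-tree side). */
#define IDLE_INST_NAP   0x0001 /* nap instruction can be used */
#define IDLE_INST_SLEEP 0x0002 /* sleep instruction can be used */

/* Flags exposed to the rest of the platform code. */
#define IDLE_USE_NAP    (1UL << 0)
#define IDLE_USE_SLEEP  (1UL << 1)

/* Hypothetical helper: one flag word per idle state in, one mask out. */
static unsigned int fold_idle_flags(const uint32_t *flags, size_t nr_states)
{
	unsigned int supported = 0;
	size_t i;

	for (i = 0; i < nr_states; i++) {
		if (flags[i] & IDLE_INST_NAP)
			supported |= IDLE_USE_NAP;
		if (flags[i] & IDLE_INST_SLEEP)
			supported |= IDLE_USE_SLEEP;
	}
	return supported;
}
```

A caller such as the offline path could then test the returned mask for
IDLE_USE_SLEEP before choosing sleep over nap, without knowing how the
states were discovered.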

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rafael J. Wysocki 
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat 
Signed-off-by: Shreyas B. Prabhu 
[ Changelog modified by pre...@linux.vnet.ibm.com ]
Signed-off-by: Preeti U. Murthy 
---
 arch/powerpc/include/asm/opal.h  |  4 +++
 arch/powerpc/platforms/powernv/powernv.h |  7 +
 arch/powerpc/platforms/powernv/setup.c   | 51 
 arch/powerpc/platforms/powernv/smp.c | 11 ++-
 drivers/cpuidle/cpuidle-powernv.c|  7 ++---
 5 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 86055e5..28b8342 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -772,6 +772,10 @@ extern struct kobject *opal_kobj;
 /* /ibm,opal */
 extern struct device_node *opal_node;
 
+/* Flags used for idle state discovery from the device tree */
+#define IDLE_INST_NAP  0x0001 /* nap instruction can be used */
+#define IDLE_INST_SLEEP  0x0002 /* sleep instruction can be used */
+
 /* API functions */
 int64_t opal_invalid_call(void);
 int64_t opal_console_write(int64_t term_number, __be64 *length,
diff --git a/arch/powerpc/platforms/powernv/powernv.h 
b/arch/powerpc/platforms/powernv/powernv.h
index 75501bf..31ece13 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -23,6 +23,13 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, 
u64 dma_mask)
 }
 #endif
 
+/* Flags to indicate which of the CPU idle states are available for use */
+
+#define IDLE_USE_NAP   (1UL << 0)
+#define IDLE_USE_SLEEP (1UL << 1)
+
+extern unsigned int pnv_get_supported_cpuidle_states(void);
+
 extern void pnv_lpc_init(void);
 
 bool cpu_core_split_required(void);
diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 5a0e2dc..2dca1d8 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -282,6 +282,57 @@ static void __init pnv_setup_machdep_rtas(void)
 }
 #endif /* CONFIG_PPC_POWERNV_RTAS */
 
+static unsigned int supported_cpuidle_states;
+
+unsigned int pnv_get_supported_cpuidle_states(void)
+{
+   return supported_cpuidle_states;
+}
+
+static int __init pnv_probe_idle_states(void)
+{
+   struct device_node *power_mgt;
+   struct property *prop;
+   int dt_idle_states;
+   u32 *flags;
+   int i;
+
+   supported_cpuidle_states = 0;
+
+   if (cpuidle_disable != IDLE_NO_OVERRIDE)
+   return 0;
+
+   if (!firmware_has_feature(FW_FEATURE_OPALv3))
+   return 0;
+
+   power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
+   if (!power_mgt) {
+   pr_warn("opal: PowerMgmt Node not found\n");
+   return 0;
+   }
+
+   prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL);
+   if (!prop) {
+   pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
+   return 0;
+   }
+
+   dt_idle_states = prop->length / sizeof(u32);
+   flags = (u32 *) prop->value;
+
+   for (i = 0; i < dt_idle_states; i++) {
+   if (flags[i] & IDLE_INST_NAP)
+   supported_cpuidle_states |= IDLE_USE_NAP;
+
+   if (flags[i] & IDLE_INST_SLEEP)
+   supported_cpuidle_states |= IDLE_USE_SLEEP;
+   }
+
+   return 0;
+}
+
+subsys_initcall(pnv_probe_idle_states);
+
 static int __init pnv_probe(void)
 {
unsigned long root = of_get_flat_dt_root();
diff --git a/arch/powerpc/platforms/powernv/smp.c 
b/arch/powerpc/platforms/powernv/smp.c
index 5fcfcf4..3ad31d2 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -149,6 +149,7 @@ static int pnv_smp_cpu_disable(void)
 static void pnv_smp_cpu_kill_self(void)
 {
unsigned int cpu

[PATCH 3/9] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep

2014-08-25 Thread Shreyas B. Prabhu
When guests have to be launched, the secondary threads which are offline
are woken up to run the guests. Today these threads wake up from nap
and check if they have to run guests. Now that the offline secondary
threads can go to fastsleep or, going forward, a deeper idle state such as winkle,
add this check to the wakeup path from any of the deep idle states as well.
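
For readers less familiar with the assembly in the diff below, here is a
small C model of how the wake-up reason is classified. It is a sketch based
on one reading of the `rlwinm. r13,r13,47-31,30,31` and `cmpwi cr1,r13,2`
sequence, which is taken here to extract a two-bit field at bit position 16
(counting from the least significant bit) of SRR1. The enum and
classify_wakeup() are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the four branches taken in the assembly. */
enum wake_path {
	WAKE_NOT_POWERSAVE, /* field == 0: "beq 9f", not a powersave wakeup */
	WAKE_NOLOSS,        /* field == 1: power7_wakeup_noloss */
	WAKE_LOSS,          /* field == 2: power7_wakeup_loss */
	WAKE_TB_LOSS        /* field  > 2: power7_wakeup_tb_loss */
};

static enum wake_path classify_wakeup(uint64_t srr1)
{
	/* Rotate-left by 16 and keep the low two bits, as rlwinm does. */
	unsigned int state = (unsigned int)(srr1 >> 16) & 0x3u;

	if (state == 0)
		return WAKE_NOT_POWERSAVE;
	if (state > 2)
		return WAKE_TB_LOSS;
	if (state == 2)
		return WAKE_LOSS;
	return WAKE_NOLOSS;
}
```

The patch itself does not change this classification; it moves the KVM
hardware-thread check ahead of it so that every powersave wakeup path,
including the hypervisor-state-loss one, goes through that check.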

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
Suggested-by: "Srivatsa S. Bhat" 
Signed-off-by: Shreyas B. Prabhu 
[ Changelog added by  ]
Signed-off-by: Preeti U Murthy 
---
 arch/powerpc/kernel/exceptions-64s.S | 35 ---
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 050f79a..c64f3cc0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -100,25 +100,8 @@ system_reset_pSeries:
SET_SCRATCH0(r13)
 #ifdef CONFIG_PPC_P7_NAP
 BEGIN_FTR_SECTION
-   /* Running native on arch 2.06 or later, check if we are
-* waking up from nap. We only handle no state loss and
-* supervisor state loss. We do -not- handle hypervisor
-* state loss at this time.
-*/
-   mfspr   r13,SPRN_SRR1
-   rlwinm. r13,r13,47-31,30,31
-   beq 9f
 
-   /* waking up from powersave (nap) state */
-   cmpwi   cr1,r13,2
-   /* Total loss of HV state is fatal, we could try to use the
-* PIR to locate a PACA, then use an emergency stack etc...
-* OPAL v3 based powernv platforms have new idle states
-* which fall in this catagory.
-*/
-   bgt cr1,8f
GET_PACA(r13)
-
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
li  r0,KVM_HWTHREAD_IN_KERNEL
stb r0,HSTATE_HWTHREAD_STATE(r13)
@@ -131,13 +114,27 @@ BEGIN_FTR_SECTION
 1:
 #endif
 
+   /* Running native on arch 2.06 or later, check if we are
+* waking up from nap. We only handle no state loss and
+* supervisor state loss. We do -not- handle hypervisor
+* state loss at this time.
+*/
+   mfspr   r13,SPRN_SRR1
+   rlwinm. r13,r13,47-31,30,31
+   beq 9f
+
+   /* waking up from powersave (nap) state */
+   cmpwi   cr1,r13,2
+   GET_PACA(r13)
+
+   bgt cr1,8f
+
beq cr1,2f
b   power7_wakeup_noloss
 2: b   power7_wakeup_loss
 
/* Fast Sleep wakeup on PowerNV */
-8: GET_PACA(r13)
-   b   power7_wakeup_tb_loss
+8: b   power7_wakeup_tb_loss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-- 
1.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 8/9] powerpc/powernv: Discover and enable winkle

2014-08-25 Thread Shreyas B. Prabhu
Discover winkle from the device tree. If it is supported, make the OPAL
calls necessary to save HIDs, HMEER, HSPRG0 and LPCR.
Also make the OPAL call when the HID0 value is modified during
split/unsplit of cores.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/include/asm/opal.h  |  1 +
 arch/powerpc/platforms/powernv/powernv.h |  1 +
 arch/powerpc/platforms/powernv/setup.c   | 75 
 arch/powerpc/platforms/powernv/subcore.c | 15 +++
 4 files changed, 92 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index d376020..a77957f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -778,6 +778,7 @@ extern struct device_node *opal_node;
 #define IDLE_INST_NAP  0x0001 /* nap instruction can be used */
 #define IDLE_INST_SLEEP0x0002 /* sleep instruction can be used */
 #define IDLE_INST_SLEEP_ER10x0008 /* Use sleep with work around*/
+#define IDLE_INST_WINKLE   0x0004 /* winkle instruction can be used */
 
 /* API functions */
 int64_t opal_invalid_call(void);
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 31ece13..76b37f8 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -27,6 +27,7 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
 
 #define IDLE_USE_NAP   (1UL << 0)
 #define IDLE_USE_SLEEP (1UL << 1)
+#define IDLE_USE_WINKLE(1UL << 3)
 
 extern unsigned int pnv_get_supported_cpuidle_states(void);
 
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index f45b52d..13c5e49 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -273,6 +273,65 @@ unsigned int pnv_get_supported_cpuidle_states(void)
return supported_cpuidle_states;
 }
 
+int pnv_save_sprs_for_winkle(void)
+{
+   int cpu;
+   int rc;
+
+   /*
+   * hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric accross
+   * all cpus at boot. Get these reg values of current cpu and use the
+   * same accross all cpus.
+   */
+   uint64_t lpcr_val = mfspr(SPRN_LPCR);
+   uint64_t hid0_val = mfspr(SPRN_HID0);
+   uint64_t hid1_val = mfspr(SPRN_HID1);
+   uint64_t hid4_val = mfspr(SPRN_HID4);
+   uint64_t hid5_val = mfspr(SPRN_HID5);
+   uint64_t hmeer_val = mfspr(SPRN_HMEER);
+
+   for_each_possible_cpu(cpu) {
+   uint64_t pir = get_hard_smp_processor_id(cpu);
+   uint64_t local_paca_ptr = (uint64_t)&paca[cpu];
+
+   rc = opal_slw_set_reg(pir, SPRN_HSPRG0, local_paca_ptr);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
+   if (rc != 0)
+   return rc;
+
+   /* HIDs are per core registers */
+   if (cpu_thread_in_core(cpu) == 0) {
+
+   rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val);
+   if (rc != 0)
+   return rc;
+
+   }
+
+   }
+
+   return 0;
+
+}
 static int __init pnv_probe_idle_states(void)
 {
struct device_node *power_mgt;
@@ -318,6 +377,22 @@ static int __init pnv_probe_idle_states(void)
supported_cpuidle_states |= IDLE_USE_SLEEP;
need_fastsleep_workaround = 1;
}
+
+   if (flags & IDLE_INST_WINKLE) {
+   /*
+* If winkle is supported, save HSPRG0, HIDs and LPCR
+* contents via OPAL. Enable winkle only if this
+* succeeds.
+*/
+   int opal_ret_val = pnv_save_sprs_for_winkle();
+
+   if (!opal_ret_val)
+   supported_cpuidle_states |= IDLE_USE_WINKLE;
+   else
+   pr_warn("opal: opal_slw_set_reg failed with rc=%d, disabling winkle\n",
+   

[PATCH 9/9] powerpc/powernv: Enter deepest supported idle state in offline

2014-08-25 Thread Shreyas B. Prabhu
Enter winkle when a cpu is offlined, if supported; otherwise fall back to
sleep or nap.
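
For illustration only, the fallback order described above can be sketched
as a standalone C function. The flag values are the ones patch 8/9 of this
series adds to powernv.h; the enum, the function names and the demo are
hypothetical and are not part of the kernel.

```c
#include <assert.h>

/* Flag values as added to powernv.h by patch 8/9 of this series. */
#define IDLE_USE_NAP    (1UL << 0)
#define IDLE_USE_SLEEP  (1UL << 1)
#define IDLE_USE_WINKLE (1UL << 3)

/* Hypothetical names, used only in this sketch. */
enum offline_idle { OFFLINE_NAP, OFFLINE_SLEEP, OFFLINE_WINKLE };

/* Pick the deepest idle state the platform reports as supported. */
enum offline_idle pick_offline_idle(unsigned long idle_states)
{
    if (idle_states & IDLE_USE_WINKLE)
        return OFFLINE_WINKLE;
    if (idle_states & IDLE_USE_SLEEP)
        return OFFLINE_SLEEP;
    return OFFLINE_NAP;     /* nap is the unconditional fallback */
}

/* Small usage demo: winkle wins over sleep, sleep wins over nap. */
void pick_offline_idle_demo(void)
{
    assert(pick_offline_idle(IDLE_USE_NAP | IDLE_USE_SLEEP) == OFFLINE_SLEEP);
    assert(pick_offline_idle(IDLE_USE_NAP | IDLE_USE_WINKLE) == OFFLINE_WINKLE);
}
```

Nap is treated as always available here, which matches the final else
branch in the hunk of this patch.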

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/platforms/powernv/smp.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 3ad31d2..e3fc2c9 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -169,8 +169,10 @@ static void pnv_smp_cpu_kill_self(void)
while (!generic_check_cpu_restart(cpu)) {
ppc64_runlatch_off();
 
-   /* If sleep is supported, go to sleep, instead of nap */
-   if (idle_states & IDLE_USE_SLEEP)
+   /* Go to deepest supported idle state */
+   if (idle_states & IDLE_USE_WINKLE)
+   power7_winkle();
+   else if (idle_states & IDLE_USE_SLEEP)
power7_sleep();
else
power7_nap(1);
-- 
1.9.0



[PATCH 6/9] powerpc: Adding macro for accessing Thread Switch Control Register

2014-08-25 Thread Shreyas B. Prabhu
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/include/asm/reg.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 0c05059..cb65a73 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -371,6 +371,7 @@
 #define SPRN_DBAT7L0x23F   /* Data BAT 7 Lower Register */
 #define SPRN_DBAT7U0x23E   /* Data BAT 7 Upper Register */
 #define SPRN_PPR   0x380   /* SMT Thread status Register */
+#define SPRN_TSCR  0x399   /* Thread Switch Control Register */
 
 #define SPRN_DEC   0x016   /* Decrement Register */
 #define SPRN_DER   0x095   /* Debug Enable Regsiter */
-- 
1.9.0



[PATCH 5/9] powerpc/powernv: Add OPAL call to save and restore

2014-08-25 Thread Shreyas B. Prabhu
PORE can be programmed to restore hypervisor registers when waking up
from deep cpu idle states like winkle.

Add a call to pass the SPR address and value to OPAL, which in turn will
program PORE to restore the register state.

Cc: linuxppc-...@lists.ozlabs.org
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Suggested-by: Vaidyanathan Srinivasan 
Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/include/asm/opal.h| 2 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 166d572..d376020 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -150,6 +150,7 @@ struct opal_sg_list {
 #define OPAL_PCI_EEH_FREEZE_SET97
 #define OPAL_HANDLE_HMI98
 #define OPAL_CONFIG_IDLE_STATE 99
+#define OPAL_SLW_SET_REG   100
 #define OPAL_REGISTER_DUMP_REGION  101
 #define OPAL_UNREGISTER_DUMP_REGION102
 
@@ -978,6 +979,7 @@ extern int opal_handle_hmi_exception(struct pt_regs *regs);
 extern void opal_shutdown(void);
 extern int opal_resync_timebase(void);
 int64_t opal_config_idle_state(uint64_t state, uint64_t enter);
+int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
 
 extern void opal_lpc_init(void);
 
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 8d1e724..12e5d46 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -246,5 +246,6 @@ OPAL_CALL(opal_get_param,   OPAL_GET_PARAM);
 OPAL_CALL(opal_set_param,  OPAL_SET_PARAM);
 OPAL_CALL(opal_handle_hmi, OPAL_HANDLE_HMI);
 OPAL_CALL(opal_config_idle_state,  OPAL_CONFIG_IDLE_STATE);
+OPAL_CALL(opal_slw_set_reg,OPAL_SLW_SET_REG);
 OPAL_CALL(opal_register_dump_region,   OPAL_REGISTER_DUMP_REGION);
 OPAL_CALL(opal_unregister_dump_region, OPAL_UNREGISTER_DUMP_REGION);
-- 
1.9.0



[PATCH 7/9] powerpc/powernv: Add winkle infrastructure

2014-08-25 Thread Shreyas B. Prabhu
Winkle causes power to be gated off to the entire chiplet. Hence the
hypervisor/firmware state in the entire chiplet is lost.

This patch adds the necessary infrastructure to support winkle.
Specifically, it does the following:
- Before entering winkle, save the state of registers that need to be
  restored on wake up (SDR1, HFSCR)

- SRR1 bits 46:47, which identify the power saving mode the cpu woke up
  from, are '11' for both winkle and sleep. Hence introduce a flag in
  PACA to distinguish between winkle and sleep.

- Upon waking up, restore all saved registers and recover the SLB
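
As a rough sketch of the wakeup classification described in the list
above: SRR1 bits 46:47 (IBM numbering, bit 0 is the MSB of the 64-bit
register) sit 16 bits above the LSB, which is what the existing
"rlwinm. r13,r13,47-31,30,31" in exceptions-64s.S extracts. The meaning of
the values 1, 2 and 3 follows the branches in that code. The enum, the
function names and the boolean encoding of the PACA flag are assumptions
made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the wakeup paths, used only in this sketch. */
enum wake_path {
    PATH_NOT_POWERSAVE,     /* field == 0: not a powersave wakeup */
    PATH_NO_LOSS,           /* field == 1: no state loss */
    PATH_SUPERVISOR_LOSS,   /* field == 2: supervisor state loss */
    PATH_SLEEP,             /* field == 3, PACA flag clear */
    PATH_WINKLE             /* field == 3, PACA flag set */
};

/* Extract SRR1 bits 46:47 (IBM numbering) as a value in 0..3. */
unsigned int srr1_wake_field(uint64_t srr1)
{
    return (unsigned int)((srr1 >> 16) & 0x3);
}

/*
 * Classify a wakeup. winkle_flag stands in for the PACA flag; it is
 * assumed to be 0 for sleep and 1 for winkle.
 */
enum wake_path classify_wakeup(uint64_t srr1, int winkle_flag)
{
    assert(winkle_flag == 0 || winkle_flag == 1);

    switch (srr1_wake_field(srr1)) {
    case 0:
        return PATH_NOT_POWERSAVE;
    case 1:
        return PATH_NO_LOSS;
    case 2:
        return PATH_SUPERVISOR_LOSS;
    default:
        /* '11' is shared by sleep and winkle; only the flag tells them apart. */
        return winkle_flag ? PATH_WINKLE : PATH_SLEEP;
    }
}
```

The point of the sketch is the default case: the hardware reports the same
SRR1 value for both states, so software has to record which instruction it
issued before going idle.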

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
Suggested-by: Vaidyanathan Srinivasan 
Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/include/asm/machdep.h |  1 +
 arch/powerpc/include/asm/paca.h|  3 ++
 arch/powerpc/include/asm/ppc-opcode.h  |  2 +
 arch/powerpc/include/asm/processor.h   |  2 +
 arch/powerpc/kernel/asm-offsets.c  |  1 +
 arch/powerpc/kernel/exceptions-64s.S   |  4 +-
 arch/powerpc/kernel/idle.c | 11 +
 arch/powerpc/kernel/idle_power7.S  | 81 +-
 arch/powerpc/platforms/powernv/setup.c | 24 ++
 9 files changed, 126 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index f37014f..0a3ced9 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -301,6 +301,7 @@ struct machdep_calls {
/* Idle handlers */
void(*setup_idle)(void);
unsigned long   (*power7_sleep)(void);
+   unsigned long   (*power7_winkle)(void);
 };
 
 extern void e500_idle(void);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index a5139ea..3358f09 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -158,6 +158,9 @@ struct paca_struct {
 * early exception handler for use by high level C handler
 */
struct opal_machine_check_event *opal_mc_evt;
+
+   /* Flag to distinguish b/w sleep and winkle */
+   u8 offline_state;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
/* Exclusive emergency stack pointer for machine check exception. */
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 6f85362..5155be7 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -194,6 +194,7 @@
 
 #define PPC_INST_NAP   0x4c000364
 #define PPC_INST_SLEEP 0x4c0003a4
+#define PPC_INST_WINKLE0x4c0003e4
 
 /* A2 specific instructions */
 #define PPC_INST_ERATWE0x7c0001a6
@@ -374,6 +375,7 @@
 
 #define PPC_NAPstringify_in_c(.long PPC_INST_NAP)
 #define PPC_SLEEP  stringify_in_c(.long PPC_INST_SLEEP)
+#define PPC_WINKLE stringify_in_c(.long PPC_INST_WINKLE)
 
 /* BHRB instructions */
 #define PPC_CLRBHRBstringify_in_c(.long PPC_INST_CLRBHRB)
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 41953cd..00e3df9 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -455,6 +455,8 @@ extern void arch_setup_idle(void);
 extern void power7_nap(int check_irq);
 extern unsigned long power7_sleep(void);
 extern unsigned long __power7_sleep(void);
+extern unsigned long power7_winkle(void);
+extern unsigned long __power7_winkle(void);
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
 extern void poweroff_now(void);
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 9d7dede..ea98817 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -731,6 +731,7 @@ int main(void)
DEFINE(OPAL_MC_SRR0, offsetof(struct opal_machine_check_event, srr0));
DEFINE(OPAL_MC_SRR1, offsetof(struct opal_machine_check_event, srr1));
DEFINE(PACA_OPAL_MC_EVT, offsetof(struct paca_struct, opal_mc_evt));
+   DEFINE(PACAOFFLINESTATE, offsetof(struct paca_struct, offline_state));
 #endif
 
return 0;
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index c64f3cc0..6c6db2b 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -133,8 +133,8 @@ BEGIN_FTR_SECTION
b   power7_wakeup_noloss
 2: b   power7_wakeup_loss
 
-   /* Fast Sleep wakeup on PowerNV */
-8: b   power7_wakeup_tb_loss
+   /* Fast Sleep / Winkle wakeup on PowerNV */
+8: b   power7_wakeup_hv_state_loss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 1f268e0..ed46217 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -9

[PATCH 0/9] powerpc/powernv: Support for fastsleep and winkle

2014-08-25 Thread Shreyas B. Prabhu
Fast sleep is an idle state where the core and the L1 and L2
caches are brought down to a threshold voltage. This also means that
the communication between the L2 and L3 caches has to be fenced. However,
the current P8 chips have a bug wherein this fencing between the L2 and
L3 caches gets delayed by a cpu cycle. This can delay the L3 response to
the other cpus if they request data during this time. Thus they
would fetch the same data from memory, which could lead to data
corruption if the L3 cache is not flushed.
Patch 4 adds support to work around this.

'Deep Winkle' is a deeper idle state where the core and private L2 are
powered off. While it offers higher power savings, it comes at the cost of
losing hypervisor register state and higher latency.
Patches 5-9 add support for winkle and use it for offline cpus.

Patch 1 - Moves parameters required to discover idle states to a location
common to both cpuidle driver and powernv core code
Patch 2 - Populates idle state details from device tree
Patch 3 - Enables cpus to run guest after waking up from fastsleep/winkle


Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rafael J. Wysocki 
Cc: Srivatsa S. Bhat 
Cc: Preeti U. Murthy 
Cc: Vaidyanathan Srinivasan 
Cc: Rob Herring 
Cc: Grant Likely 
Cc: devicet...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org

Preeti U Murthy (2):
  cpuidle/powernv: Populate cpuidle state details by querying the
device-tree
  powerpc/powernv/cpuidle: Add workaround to enable fastsleep

Shreyas B. Prabhu (6):
  powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
fast-sleep
  powerpc/powernv: Add OPAL call to save and restore
  powerpc: Adding macro for accessing Thread Switch Control Register
  powerpc/powernv: Add winkle infrastructure
  powerpc/powernv: Discover and enable winkle
  powerpc/powernv: Enter deepest supported idle state in offline

Srivatsa S. Bhat (1):
  powerpc/powernv: Enable Offline CPUs to enter deep idle states

 arch/powerpc/include/asm/machdep.h |   4 +
 arch/powerpc/include/asm/opal.h|  10 ++
 arch/powerpc/include/asm/paca.h|   3 +
 arch/powerpc/include/asm/ppc-opcode.h  |   2 +
 arch/powerpc/include/asm/processor.h   |   6 +-
 arch/powerpc/include/asm/reg.h |   1 +
 arch/powerpc/kernel/asm-offsets.c  |   1 +
 arch/powerpc/kernel/exceptions-64s.S   |  37 ++---
 arch/powerpc/kernel/idle.c |  30 
 arch/powerpc/kernel/idle_power7.S  |  83 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
 arch/powerpc/platforms/powernv/powernv.h   |   8 +
 arch/powerpc/platforms/powernv/setup.c | 217 +
 arch/powerpc/platforms/powernv/smp.c   |  13 +-
 arch/powerpc/platforms/powernv/subcore.c   |  15 ++
 drivers/cpuidle/cpuidle-powernv.c  |  40 -
 16 files changed, 439 insertions(+), 33 deletions(-)

-- 
1.9.0



[PATCH 2/9] cpuidle/powernv: Populate cpuidle state details by querying the device-tree

2014-08-25 Thread Shreyas B. Prabhu
From: Preeti U Murthy 

We hard code the metrics relevant for cpuidle states in the kernel today.
Instead pick them up from the device tree so that they remain relevant
and updated for the system that the kernel is running on.
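
A minimal sketch of the unit conversion used in the diff of this patch,
assuming a latency value that has already been converted from big-endian
(the be32_to_cpu() step is left out). The struct and function names are
hypothetical; only the two divisions come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two values cpuidle is given. */
struct idle_timing {
    unsigned int exit_latency_us;
    unsigned int target_residency_us;
};

/*
 * The device tree reports latency in nanoseconds, cpuidle expects
 * microseconds. Target residency is estimated as 10x the exit latency,
 * computed here as latency_ns / 100 as in the patch.
 */
struct idle_timing timing_from_latency_ns(uint32_t latency_ns)
{
    struct idle_timing t;

    t.exit_latency_us = latency_ns / 1000;
    t.target_residency_us = latency_ns / 100;

    /* Dividing by the smaller constant can never give the smaller result. */
    assert(t.target_residency_us >= t.exit_latency_us);
    return t;
}
```

With a latency of 10000 ns this gives an exit latency of 10 us and a
target residency of 100 us, which are the values previously hard coded
for nap. Because both divisions truncate independently, the residency is
not always exactly ten times the stored exit latency: 10500 ns yields
10 us and 105 us.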

Cc: linux...@vger.kernel.org
Cc: Rafael J. Wysocki 
Cc: Rob Herring 
Cc: Grant Likely 
Cc: devicet...@vger.kernel.org
Signed-off-by: Preeti U. Murthy 
Signed-off-by: Shreyas B. Prabhu 
---
 drivers/cpuidle/cpuidle-powernv.c | 27 ++-
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 23d2743..3ceff53 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -162,7 +162,8 @@ static int powernv_add_idle_states(void)
int nr_idle_states = 1; /* Snooze */
int dt_idle_states;
const __be32 *idle_state_flags;
-   u32 len_flags, flags;
+   const __be32 *idle_state_latency;
+   u32 len_flags, flags, latency_ns;
int i;
 
/* Currently we have snooze statically defined */
@@ -178,19 +179,33 @@ static int powernv_add_idle_states(void)
pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
return nr_idle_states;
}
+   idle_state_latency = of_get_property(power_mgt,
+   "ibm,cpu-idle-state-latencies-ns", NULL);
+   if (!idle_state_latency) {
+   pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-latencies-ns\n");
+   return nr_idle_states;
+   }
 
dt_idle_states = len_flags / sizeof(u32);
 
for (i = 0; i < dt_idle_states; i++) {
 
flags = be32_to_cpu(idle_state_flags[i]);
+
+   /* Cpuidle accepts exit_latency in us and we estimate best case
+* target residency to be 10x exit_latency
+*/
+   latency_ns = be32_to_cpu(idle_state_latency[i]);
+
if (flags & IDLE_INST_NAP) {
/* Add NAP state */
strcpy(powernv_states[nr_idle_states].name, "Nap");
strcpy(powernv_states[nr_idle_states].desc, "Nap");
powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIME_VALID;
-   powernv_states[nr_idle_states].exit_latency = 10;
-   powernv_states[nr_idle_states].target_residency = 100;
+   powernv_states[nr_idle_states].exit_latency =
+   ((unsigned int)latency_ns) / 1000;
+   powernv_states[nr_idle_states].target_residency =
+   ((unsigned int)latency_ns / 100);
powernv_states[nr_idle_states].enter = &nap_loop;
nr_idle_states++;
}
@@ -201,8 +216,10 @@ static int powernv_add_idle_states(void)
strcpy(powernv_states[nr_idle_states].desc, "FastSleep");
powernv_states[nr_idle_states].flags =
CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TIMER_STOP;
-   powernv_states[nr_idle_states].exit_latency = 300;
-   powernv_states[nr_idle_states].target_residency = 100;
+   powernv_states[nr_idle_states].exit_latency =
+   ((unsigned int)latency_ns) / 1000;
+   powernv_states[nr_idle_states].target_residency =
+   ((unsigned int)latency_ns / 100);
powernv_states[nr_idle_states].enter = &fastsleep_loop;
nr_idle_states++;
}
-- 
1.9.0



Re: [PATCH v2 3/5] powerpc/powernv: Add winkle infrastructure

2014-10-07 Thread Shreyas B Prabhu


On Tuesday 07 October 2014 11:03 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2014-10-01 at 13:16 +0530, Shreyas B. Prabhu wrote:
>> Winkle causes power to be gated off to the entire chiplet. Hence the
>> hypervisor/firmware state in the entire chiplet is lost.
>>
>> This patch adds necessary infrastructure to support waking up from
>> hypervisor state loss. Specifically does following:
>> - Before entering winkle, save state of registers that need to be
>>   restored on wake up (SDR1, HFSCR)
> 
>  Add ... to your list, it's not exhaustive, is it ?

I use interrupt stack frame for only SDR1 and HFSCR. The rest of the
SPRs are restored via PORE in the next patch. I'll change the comments
to better reflect this.

> 
>> - SRR1 bits 46:47 which is used to identify which power saving mode cpu
>>   woke up from is '11' for both winkle and sleep. Hence introduce a flag
>>   in PACA to distinguish b/w winkle and sleep.
>>
>> - Upon waking up, restore all saved registers, recover slb
>>
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Michael Ellerman 
>> Cc: linuxppc-...@lists.ozlabs.org
>> Suggested-by: Vaidyanathan Srinivasan 
>> Signed-off-by: Shreyas B. Prabhu 
>> ---
>>  arch/powerpc/include/asm/machdep.h |  1 +
>>  arch/powerpc/include/asm/paca.h|  3 ++
>>  arch/powerpc/include/asm/ppc-opcode.h  |  2 +
>>  arch/powerpc/include/asm/processor.h   |  2 +
>>  arch/powerpc/kernel/asm-offsets.c  |  1 +
>>  arch/powerpc/kernel/exceptions-64s.S   |  8 ++--
>>  arch/powerpc/kernel/idle.c | 11 +
>>  arch/powerpc/kernel/idle_power7.S  | 81 +-
>>  arch/powerpc/platforms/powernv/setup.c | 24 ++
>>  9 files changed, 127 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
>> index f37014f..0a3ced9 100644
>> --- a/arch/powerpc/include/asm/machdep.h
>> +++ b/arch/powerpc/include/asm/machdep.h
>> @@ -301,6 +301,7 @@ struct machdep_calls {
>>  /* Idle handlers */
>>  void(*setup_idle)(void);
>>  unsigned long   (*power7_sleep)(void);
>> +unsigned long   (*power7_winkle)(void);
>>  };
> 
> Why does it need to be ppc_md ? Same comments as for sleep
> 
>>  extern void e500_idle(void);
>> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
>> index a5139ea..3358f09 100644
>> --- a/arch/powerpc/include/asm/paca.h
>> +++ b/arch/powerpc/include/asm/paca.h
>> @@ -158,6 +158,9 @@ struct paca_struct {
>>   * early exception handler for use by high level C handler
>>   */
>>  struct opal_machine_check_event *opal_mc_evt;
>> +
>> +/* Flag to distinguish b/w sleep and winkle */
>> +u8 offline_state;
> 
> Not fan of the name. I'd rather you call it "wakeup_state_loss" or
> something a bit more explicit about what that actually means if it's
> going to be a boolean value. Otherwise make it an enumeration of
> constants.
> 
Okay. I'll change this.

>>  #endif
>>  #ifdef CONFIG_PPC_BOOK3S_64
>>  /* Exclusive emergency stack pointer for machine check exception. */
>> diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
>> index 6f85362..5155be7 100644
>> --- a/arch/powerpc/include/asm/ppc-opcode.h
>> +++ b/arch/powerpc/include/asm/ppc-opcode.h
>> @@ -194,6 +194,7 @@
>>  
>>  #define PPC_INST_NAP0x4c000364
>>  #define PPC_INST_SLEEP  0x4c0003a4
>> +#define PPC_INST_WINKLE 0x4c0003e4
>>  
>>  /* A2 specific instructions */
>>  #define PPC_INST_ERATWE 0x7c0001a6
>> @@ -374,6 +375,7 @@
>>  
>>  #define PPC_NAP stringify_in_c(.long PPC_INST_NAP)
>>  #define PPC_SLEEP   stringify_in_c(.long PPC_INST_SLEEP)
>> +#define PPC_WINKLE  stringify_in_c(.long PPC_INST_WINKLE)
>>  
>>  /* BHRB instructions */
>>  #define PPC_CLRBHRB stringify_in_c(.long PPC_INST_CLRBHRB)
>> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
>> index 41953cd..00e3df9 100644
>> --- a/arch/powerpc/include/asm/processor.h
>> +++ b/arch/powerpc/include/asm/processor.h
>> @@ -455,6 +455,8 @@ extern void arch_setup_idle(void);
>>  extern void power7_nap(int check_irq);
>>  extern unsigned long power7_sleep(void);
>>  ext

[PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes

2014-10-01 Thread Shreyas B. Prabhu
Fast sleep is an idle state where the core and the L1 and L2
caches are brought down to a threshold voltage. This also means that
the communication between the L2 and L3 caches has to be fenced. However,
the current P8 chips have a bug wherein this fencing between the L2 and
L3 caches gets delayed by a cpu cycle. This can delay the L3 response to
the other cpus if they request data during this time. Thus they
would fetch the same data from memory, which could lead to data
corruption if the L3 cache is not flushed.

This series works around the above problem in the kernel.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rafael J. Wysocki 
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Srivatsa S. Bhat 
Cc: Preeti U. Murthy 
Cc: Vaidyanathan Srinivasan 

v2:
Rebased on 3.17-rc7
Split from 'powerpc/powernv: Support for fastsleep and winkle'

v1:
https://lkml.org/lkml/2014/8/25/446

Preeti U Murthy (1):
  powerpc/powernv/cpuidle: Add workaround to enable fastsleep

Shreyas B. Prabhu (1):
  powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
fast-sleep

Srivatsa S. Bhat (1):
  powerpc/powernv: Enable Offline CPUs to enter deep idle states

 arch/powerpc/include/asm/machdep.h |   3 +
 arch/powerpc/include/asm/opal.h|   7 ++
 arch/powerpc/include/asm/processor.h   |   4 +-
 arch/powerpc/kernel/exceptions-64s.S   |  35 
 arch/powerpc/kernel/idle.c |  19 
 arch/powerpc/kernel/idle_power7.S  |   2 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/powernv.h   |   7 ++
 arch/powerpc/platforms/powernv/setup.c | 118 +
 arch/powerpc/platforms/powernv/smp.c   |  11 ++-
 drivers/cpuidle/cpuidle-powernv.c  |  13 ++-
 11 files changed, 194 insertions(+), 26 deletions(-)

-- 
1.9.3



[PATCH v2] cpuidle/powernv: Populate cpuidle state details by querying the device-tree

2014-10-01 Thread Shreyas B. Prabhu
From: Preeti U Murthy 

We hard code the metrics relevant for cpuidle states in the kernel today.
Instead pick them up from the device tree so that they remain relevant
and updated for the system that the kernel is running on.

Cc: linux...@vger.kernel.org
Cc: Rafael J. Wysocki 
Cc: Rob Herring 
Cc: Grant Likely 
Cc: devicet...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Michael Ellerman 
Signed-off-by: Preeti U. Murthy 
Signed-off-by: Shreyas B. Prabhu 
---
v2:
Rebased on 3.17-rc7
Separated from 'powerpc/powernv: Support for fastsleep and winkle'

v1:
Initial post
https://lkml.org/lkml/2014/8/25/456

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index a64be57..2426a4b 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -163,7 +163,8 @@ static int powernv_add_idle_states(void)
int nr_idle_states = 1; /* Snooze */
int dt_idle_states;
const __be32 *idle_state_flags;
-   u32 len_flags, flags;
+   const __be32 *idle_state_latency;
+   u32 len_flags, flags, latency_ns;
int i;
 
/* Currently we have snooze statically defined */
@@ -180,18 +181,32 @@ static int powernv_add_idle_states(void)
return nr_idle_states;
}
 
+   idle_state_latency = of_get_property(power_mgt,
+   "ibm,cpu-idle-state-latencies-ns", NULL);
+   if (!idle_state_latency) {
+   pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-latencies-ns\n");
+   return nr_idle_states;
+   }
+
dt_idle_states = len_flags / sizeof(u32);
 
for (i = 0; i < dt_idle_states; i++) {
 
flags = be32_to_cpu(idle_state_flags[i]);
+
+   /* Cpuidle accepts exit_latency in us and we estimate best case
+* target residency to be 10x exit_latency
+*/
+   latency_ns = be32_to_cpu(idle_state_latency[i]);
if (flags & IDLE_USE_INST_NAP) {
/* Add NAP state */
strcpy(powernv_states[nr_idle_states].name, "Nap");
strcpy(powernv_states[nr_idle_states].desc, "Nap");
powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIME_VALID;
-   powernv_states[nr_idle_states].exit_latency = 10;
-   powernv_states[nr_idle_states].target_residency = 100;
+   powernv_states[nr_idle_states].exit_latency =
+   ((unsigned int)latency_ns) / 1000;
+   powernv_states[nr_idle_states].target_residency =
+   ((unsigned int)latency_ns / 100);
powernv_states[nr_idle_states].enter = &nap_loop;
nr_idle_states++;
}
@@ -202,8 +217,10 @@ static int powernv_add_idle_states(void)
strcpy(powernv_states[nr_idle_states].desc, "FastSleep");
powernv_states[nr_idle_states].flags =
CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TIMER_STOP;
-   powernv_states[nr_idle_states].exit_latency = 300;
-   powernv_states[nr_idle_states].target_residency = 100;
+   powernv_states[nr_idle_states].exit_latency =
+   ((unsigned int)latency_ns) / 1000;
+   powernv_states[nr_idle_states].target_residency =
+   ((unsigned int)latency_ns / 100);
powernv_states[nr_idle_states].enter = &fastsleep_loop;
nr_idle_states++;
}



[PATCH v2 2/5] powerpc: Adding macro for accessing Thread Switch Control Register

2014-10-01 Thread Shreyas B. Prabhu
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/include/asm/reg.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 0c05059..cb65a73 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -371,6 +371,7 @@
 #define SPRN_DBAT7L0x23F   /* Data BAT 7 Lower Register */
 #define SPRN_DBAT7U0x23E   /* Data BAT 7 Upper Register */
 #define SPRN_PPR   0x380   /* SMT Thread status Register */
+#define SPRN_TSCR  0x399   /* Thread Switch Control Register */
 
 #define SPRN_DEC   0x016   /* Decrement Register */
 #define SPRN_DER   0x095   /* Debug Enable Regsiter */
-- 
1.9.3



[PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep

2014-10-01 Thread Shreyas B. Prabhu
From: Preeti U Murthy 

Fast sleep is an idle state where the core and the L1 and L2
caches are brought down to a threshold voltage. This also means that
the communication between the L2 and L3 caches has to be fenced. However,
the current P8 chips have a bug wherein this fencing between the L2 and
L3 caches gets delayed by a cpu cycle. This can delay the L3 response to
the other cpus if they request data during this time. Thus they
would fetch the same data from memory, which could lead to data
corruption if the L3 cache is not flushed.

The cpu idle states save power at a core level and not at a thread level.
Hence powersavings is based on the shallowest idle state that a thread
of a core is in. The above issue in fastsleep will arise only when
all the threads in a core either enter fastsleep or some of them enter
any deeper idle states, with only a few being in fastsleep. This patch
therefore implements a workaround for this bug by ensuring
that, each time a cpu goes to fastsleep, it checks if it is the last
thread in the core to enter fastsleep. If so, it needs to make an opal
call to get around the above mentioned fastsleep problem in the hardware
before issuing the sleep instruction.

Similarly when a thread in a core comes out of fastsleep, it needs
to verify if it is the first thread in the core to come out of fastsleep
and issue the opal call to revert the changes made while entering
fastsleep.

For the same reason mentioned above we need to take care of offline threads
as well since we allow them to enter fastsleep and with support for
deep winkle soon coming in they can enter winkle as well.  We therefore
ensure that even offline threads make the above mentioned opal calls
similarly, so that as long as the threads in a core are in an
idle state >= fastsleep, we have the workaround in place. Whenever a
thread comes out of either of these states, it needs to verify if the
opal call has been made and if so it will revert it. For now this patch
ensures that offline threads enter fastsleep.

We need to be able to synchronize the cpus in a core which are entering
and exiting fastsleep so as to ensure that the last thread in the core
to enter fastsleep and the first to exit fastsleep *only* issue the opal
call. To do so, we need a per-core lock and counter. The counter is
required to keep track of the number of threads in a core which are in
idle state >= fastsleep. To make the implementation of this simple, we
introduce a per-cpu lock and counter and every thread always takes the
primary thread's lock and modifies the primary thread's counter. This
effectively makes them per-core entities.
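
The last-in/first-out rule described above can be sketched in plain C.
This is only an illustration under stated assumptions, not the code in
this patch: the struct, the function names and the spinlock built on a
C11 atomic_flag are all hypothetical, and the firmware call is replaced
by a stub that merely records whether the workaround is active.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-core bookkeeping. The patch keeps a lock and counter
 * per cpu and always uses the primary thread's copy; here a single
 * struct stands in for that copy. */
struct core_idle {
	atomic_flag lock;         /* minimal spinlock */
	int nr_in_fastsleep;      /* threads in an idle state >= fastsleep */
	bool workaround_applied;  /* stands in for the effect of the OPAL call */
};

static void core_lock(struct core_idle *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;  /* spin until the lock is free */
}

static void core_unlock(struct core_idle *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Stub for the firmware call; the real OPAL interface is not modelled. */
static void apply_workaround(struct core_idle *c, bool enter)
{
	c->workaround_applied = enter;
}

/* Called by a thread just before it would execute the sleep instruction. */
static void fastsleep_enter(struct core_idle *c, int threads_per_core)
{
	core_lock(c);
	assert(c->nr_in_fastsleep < threads_per_core);
	if (++c->nr_in_fastsleep == threads_per_core)
		apply_workaround(c, true);   /* last thread in */
	core_unlock(c);
}

/* Called by a thread right after it wakes up from fastsleep. */
static void fastsleep_exit(struct core_idle *c, int threads_per_core)
{
	core_lock(c);
	assert(c->nr_in_fastsleep > 0);
	if (c->nr_in_fastsleep-- == threads_per_core)
		apply_workaround(c, false);  /* first thread out */
	core_unlock(c);
}
```

Doing the counter update and the stubbed call under the same lock is
what keeps a waking thread from racing with a sibling that is still
applying the workaround.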

But the workaround is abstracted in the powernv core code and neither
the hotplug path nor the cpuidle driver need to bother about it. All
they need to know is if fastsleep, with error or no error is present as
an idle state.

Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rafael J. Wysocki 
Signed-off-by: Shreyas B. Prabhu 
Signed-off-by: Preeti U Murthy 
---
 arch/powerpc/include/asm/machdep.h |   3 +
 arch/powerpc/include/asm/opal.h|   3 +
 arch/powerpc/include/asm/processor.h   |   4 +-
 arch/powerpc/kernel/idle.c |  19 
 arch/powerpc/kernel/idle_power7.S  |   2 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/setup.c | 139 ++---
 drivers/cpuidle/cpuidle-powernv.c  |   8 +-
 8 files changed, 140 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index b125cea..f37014f 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -298,6 +298,9 @@ struct machdep_calls {
 #ifdef CONFIG_MEMORY_HOTREMOVE
int (*remove_memory)(u64, u64);
 #endif
+   /* Idle handlers */
+   void(*setup_idle)(void);
+   unsigned long   (*power7_sleep)(void);
 };
 
 extern void e500_idle(void);
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 28b8342..166d572 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -149,6 +149,7 @@ struct opal_sg_list {
 #define OPAL_DUMP_INFO294
 #define OPAL_PCI_EEH_FREEZE_SET97
 #define OPAL_HANDLE_HMI98
+#define OPAL_CONFIG_IDLE_STATE 99
 #define OPAL_REGISTER_DUMP_REGION  101
 #define OPAL_UNREGISTER_DUMP_REGION102
 
@@ -775,6 +776,7 @@ extern struct device_node *opal_node;
 /* Flags used for idle state discovery from the device tree */
 #define IDLE_INST_NAP  0x0001 /* nap instruction can be used */
 #define IDLE_INST_SLEEP0x0002 /* sleep instruction can be used */
+#define IDLE_INST_SLEEP_ER10x0008 /* Use sleep 

[PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states

2014-10-01 Thread Shreyas B. Prabhu
From: "Srivatsa S. Bhat" 

The offline cpus should enter deep idle states so as to gain maximum
powersavings when the entire core is offline. To do so the offline path
must be made aware of the available deepest idle state. Hence probe the
device tree for the possible idle states in powernv core code and
expose the deepest idle state through flags.

Since the device tree is probed by the cpuidle driver as well, move
the parameters required to discover the idle states into an appropriate
common place to both the driver and the powernv core code.

Another point is that fastsleep idle state may require workarounds in
the kernel to function properly. This workaround is introduced in the
subsequent patches. However neither the cpuidle driver nor the hotplug
path need be bothered about this workaround.

It will be taken care of by the core powernv code.
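
For readers skimming the changelog, the flag translation can be sketched
as a pure function. The four constants are copied from the diff below;
the function name and the idea of passing the flags as a plain array are
assumptions made for illustration (the patch itself reads the
"ibm,cpu-idle-state-flags" property from the device tree).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in the diff that follows. */
#define IDLE_INST_NAP   0x0001      /* nap instruction can be used */
#define IDLE_INST_SLEEP 0x0002      /* sleep instruction can be used */
#define IDLE_USE_NAP    (1UL << 0)
#define IDLE_USE_SLEEP  (1UL << 1)

/* Hypothetical helper: fold the per-state device tree flags into one
 * mask of idle states that the offline path may use. */
static unsigned int idle_states_from_flags(const uint32_t *flags, size_t n)
{
	unsigned int supported = 0;
	size_t i;

	assert(flags != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (flags[i] & IDLE_INST_NAP)
			supported |= IDLE_USE_NAP;
		if (flags[i] & IDLE_INST_SLEEP)
			supported |= IDLE_USE_SLEEP;
	}
	return supported;
}
```

A caller would then test the returned mask for IDLE_USE_SLEEP before
falling back to nap.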

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rafael J. Wysocki 
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat 
Signed-off-by: Shreyas B. Prabhu 
[ Changelog modified by pre...@linux.vnet.ibm.com ]
Signed-off-by: Preeti U. Murthy 
---
 arch/powerpc/include/asm/opal.h  |  4 +++
 arch/powerpc/platforms/powernv/powernv.h |  7 +
 arch/powerpc/platforms/powernv/setup.c   | 51 
 arch/powerpc/platforms/powernv/smp.c | 11 ++-
 drivers/cpuidle/cpuidle-powernv.c|  7 ++---
 5 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 86055e5..28b8342 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -772,6 +772,10 @@ extern struct kobject *opal_kobj;
 /* /ibm,opal */
 extern struct device_node *opal_node;
 
+/* Flags used for idle state discovery from the device tree */
+#define IDLE_INST_NAP  0x0001 /* nap instruction can be used */
+#define IDLE_INST_SLEEP0x0002 /* sleep instruction can be used */
+
 /* API functions */
 int64_t opal_invalid_call(void);
 int64_t opal_console_write(int64_t term_number, __be64 *length,
diff --git a/arch/powerpc/platforms/powernv/powernv.h 
b/arch/powerpc/platforms/powernv/powernv.h
index 75501bf..31ece13 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -23,6 +23,13 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, 
u64 dma_mask)
 }
 #endif
 
+/* Flags to indicate which of the CPU idle states are available for use */
+
+#define IDLE_USE_NAP   (1UL << 0)
+#define IDLE_USE_SLEEP (1UL << 1)
+
+extern unsigned int pnv_get_supported_cpuidle_states(void);
+
 extern void pnv_lpc_init(void);
 
 bool cpu_core_split_required(void);
diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 5a0e2dc..2dca1d8 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -282,6 +282,57 @@ static void __init pnv_setup_machdep_rtas(void)
 }
 #endif /* CONFIG_PPC_POWERNV_RTAS */
 
+static unsigned int supported_cpuidle_states;
+
+unsigned int pnv_get_supported_cpuidle_states(void)
+{
+   return supported_cpuidle_states;
+}
+
+static int __init pnv_probe_idle_states(void)
+{
+   struct device_node *power_mgt;
+   struct property *prop;
+   int dt_idle_states;
+   u32 *flags;
+   int i;
+
+   supported_cpuidle_states = 0;
+
+   if (cpuidle_disable != IDLE_NO_OVERRIDE)
+   return 0;
+
+   if (!firmware_has_feature(FW_FEATURE_OPALv3))
+   return 0;
+
+   power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
+   if (!power_mgt) {
+   pr_warn("opal: PowerMgmt Node not found\n");
+   return 0;
+   }
+
+   prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL);
+   if (!prop) {
+   pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
+   return 0;
+   }
+
+   dt_idle_states = prop->length / sizeof(u32);
+   flags = (u32 *) prop->value;
+
+   for (i = 0; i < dt_idle_states; i++) {
+   if (flags[i] & IDLE_INST_NAP)
+   supported_cpuidle_states |= IDLE_USE_NAP;
+
+   if (flags[i] & IDLE_INST_SLEEP)
+   supported_cpuidle_states |= IDLE_USE_SLEEP;
+   }
+
+   return 0;
+}
+
+subsys_initcall(pnv_probe_idle_states);
+
 static int __init pnv_probe(void)
 {
unsigned long root = of_get_flat_dt_root();
diff --git a/arch/powerpc/platforms/powernv/smp.c 
b/arch/powerpc/platforms/powernv/smp.c
index 5fcfcf4..3ad31d2 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -149,6 +149,7 @@ static int pnv_smp_cpu_disable(void)
 static void pnv_smp_cpu_kill_self(void)
 {
unsigned int cpu

[PATCH v2 0/5] Winkle support for offline cpus

2014-10-01 Thread Shreyas B. Prabhu
Powernv already has support for nap and sleep and these states are used
by cpuidle framework. This patchset adds support for 'deep winkle', a
deeper idle state.

In deep winkle, the entire chiplet (core/L2/L3) is powered off, leading to
higher power savings. But this results in hypervisor state loss. This
patchset adds the necessary infrastructure to recover from hypervisor
state loss and enables offline cpus to use winkle.

I've successfully tested subcore functionality with these patches. 
Particularly these two scenarios:

Scenario 1:
-> Set subcore-per-core to 4.
-> Offline and online a complete core
Check if core wakes up with 4 subcores

Scenario 2:
-> Set subcore-per-core to 1.
-> Offline a core.
-> Set subcore-per-core to 4.
-> Online a core
Check if core wakes up with 4 subcores.

In both these scenarios, the core wakes up with 4 subcores and can run 
guests on individual subcores.

Note, these patches apply on top of the 'powernv/cpuidle: Fastsleep
workaround and fixes' series.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Srivatsa S. Bhat 
Cc: Preeti U. Murthy 
Cc: Vaidyanathan Srinivasan 
Cc: linuxppc-...@lists.ozlabs.org

v2:
Rebased on 3.17-rc7
Split from 'powerpc/powernv: Support for fastsleep and winkle'

v1:
https://lkml.org/lkml/2014/8/25/446

Shreyas B. Prabhu (5):
  powerpc/powernv: Add OPAL call to save and restore
  powerpc: Adding macro for accessing Thread Switch Control Register
  powerpc/powernv: Add winkle infrastructure
  powerpc/powernv: Discover and enable winkle
  powerpc/powernv: Enter deepest supported idle state in offline

 arch/powerpc/include/asm/machdep.h |  1 +
 arch/powerpc/include/asm/opal.h|  3 +
 arch/powerpc/include/asm/paca.h|  3 +
 arch/powerpc/include/asm/ppc-opcode.h  |  2 +
 arch/powerpc/include/asm/processor.h   |  2 +
 arch/powerpc/include/asm/reg.h |  1 +
 arch/powerpc/kernel/asm-offsets.c  |  1 +
 arch/powerpc/kernel/exceptions-64s.S   |  4 +-
 arch/powerpc/kernel/idle.c | 11 +++
 arch/powerpc/kernel/idle_power7.S  | 81 -
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 arch/powerpc/platforms/powernv/powernv.h   |  1 +
 arch/powerpc/platforms/powernv/setup.c | 99 ++
 arch/powerpc/platforms/powernv/smp.c   |  6 +-
 arch/powerpc/platforms/powernv/subcore.c   | 15 
 15 files changed, 226 insertions(+), 5 deletions(-)

-- 
1.9.3



[PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep

2014-10-01 Thread Shreyas B. Prabhu
When guests have to be launched, the secondary threads which are offline
are woken up to run the guests. Today these threads wake up from nap
and check if they have to run guests. Now that the offline secondary
threads can go to fastsleep or, going ahead, a deeper idle state such as
winkle, add this check in the wakeup path from any of the deep idle
states as well.
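
As a reading aid, here is a C rendering of the dispatch that the
assembly in the diff below performs once the KVM check has run. It is a
sketch, not kernel code: the enum and function names are hypothetical.
The shift assumes IBM bit numbering, where bit 47 of the 64-bit SRR1 is
bit 16 counted from the least significant end, which matches the
"rlwinm. r13,r13,47-31,30,31" in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the four branch targets in the assembly. */
enum wake_path {
	WAKE_NOT_POWERSAVE,  /* field 0: "beq 9f", ordinary system reset */
	WAKE_NOLOSS,         /* field 1: power7_wakeup_noloss */
	WAKE_LOSS,           /* field 2: power7_wakeup_loss */
	WAKE_TB_LOSS         /* field 3: power7_wakeup_tb_loss */
};

/* Extract SRR1 bits 46:47 (IBM numbering) as a value 0..3. */
static unsigned int srr1_wake_field(uint64_t srr1)
{
	return (unsigned int)((srr1 >> 16) & 0x3);
}

/* Mirror the compare-with-2 and the three branches that follow it. */
static enum wake_path classify_wakeup(uint64_t srr1)
{
	unsigned int field = srr1_wake_field(srr1);

	assert(field <= 3);

	if (field == 0)
		return WAKE_NOT_POWERSAVE;
	if (field > 2)
		return WAKE_TB_LOSS;   /* bgt cr1,8f */
	if (field == 2)
		return WAKE_LOSS;      /* beq cr1,2f */
	return WAKE_NOLOSS;
}
```

With the reordering in this patch, the check for a pending guest runs
before this classification, so it is reached on every wakeup path.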

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
Suggested-by: "Srivatsa S. Bhat" 
Signed-off-by: Shreyas B. Prabhu 
[ Changelog added by  ]
Signed-off-by: Preeti U Murthy 
---
 arch/powerpc/kernel/exceptions-64s.S | 35 ---
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 050f79a..c64f3cc0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -100,25 +100,8 @@ system_reset_pSeries:
SET_SCRATCH0(r13)
 #ifdef CONFIG_PPC_P7_NAP
 BEGIN_FTR_SECTION
-   /* Running native on arch 2.06 or later, check if we are
-* waking up from nap. We only handle no state loss and
-* supervisor state loss. We do -not- handle hypervisor
-* state loss at this time.
-*/
-   mfspr   r13,SPRN_SRR1
-   rlwinm. r13,r13,47-31,30,31
-   beq 9f
 
-   /* waking up from powersave (nap) state */
-   cmpwi   cr1,r13,2
-   /* Total loss of HV state is fatal, we could try to use the
-* PIR to locate a PACA, then use an emergency stack etc...
-* OPAL v3 based powernv platforms have new idle states
-* which fall in this catagory.
-*/
-   bgt cr1,8f
GET_PACA(r13)
-
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
li  r0,KVM_HWTHREAD_IN_KERNEL
stb r0,HSTATE_HWTHREAD_STATE(r13)
@@ -131,13 +114,27 @@ BEGIN_FTR_SECTION
 1:
 #endif
 
+   /* Running native on arch 2.06 or later, check if we are
+* waking up from nap. We only handle no state loss and
+* supervisor state loss. We do -not- handle hypervisor
+* state loss at this time.
+*/
+   mfspr   r13,SPRN_SRR1
+   rlwinm. r13,r13,47-31,30,31
+   beq 9f
+
+   /* waking up from powersave (nap) state */
+   cmpwi   cr1,r13,2
+   GET_PACA(r13)
+
+   bgt cr1,8f
+
beq cr1,2f
b   power7_wakeup_noloss
 2: b   power7_wakeup_loss
 
/* Fast Sleep wakeup on PowerNV */
-8: GET_PACA(r13)
-   b   power7_wakeup_tb_loss
+8: b   power7_wakeup_tb_loss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-- 
1.9.3



[PATCH v2 3/5] powerpc/powernv: Add winkle infrastructure

2014-10-01 Thread Shreyas B. Prabhu
Winkle causes power to be gated off to the entire chiplet. Hence the
hypervisor/firmware state in the entire chiplet is lost.

This patch adds necessary infrastructure to support waking up from
hypervisor state loss. Specifically, it does the following:
- Before entering winkle, save state of registers that need to be
  restored on wake up (SDR1, HFSCR)

- SRR1 bits 46:47, which are used to identify which power saving mode the
  cpu woke up from, are '11' for both winkle and sleep. Hence introduce a
  flag in PACA to distinguish between winkle and sleep.

- Upon waking up, restore all saved registers, recover slb
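
The three steps above can be modelled in a few lines of C. Everything
here is a hypothetical stand-in: the structs replace the PACA and the
real SPRs, the flag values are invented, and the register copy stands in
for the actual mfspr/mtspr sequences in the assembly. It only shows why
a software flag is needed when SRR1 reports '11' for both states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the patch stores a u8 offline_state in the PACA. */
enum { OFFLINE_SLEEP = 0, OFFLINE_WINKLE = 1 };

/* Stand-in for the registers saved before winkle. */
struct hv_regs {
	uint64_t sdr1;
	uint64_t hfscr;
};

/* Stand-in for the per-thread PACA fields involved. */
struct thread_ctx {
	uint8_t offline_state;
	struct hv_regs saved;
};

/* Step 1: save the registers and note that winkle is being entered. */
static void prepare_winkle(struct thread_ctx *t, const struct hv_regs *hw)
{
	t->saved = *hw;
	t->offline_state = OFFLINE_WINKLE;
}

/* Steps 2 and 3: on a '11' wakeup, consult the flag and restore only
 * after winkle. Returns true when a restore was performed. */
static bool wake_from_deep_idle(struct thread_ctx *t, struct hv_regs *hw,
				unsigned int srr1_field)
{
	assert(srr1_field <= 3);

	if (srr1_field != 3 || t->offline_state != OFFLINE_WINKLE)
		return false;

	*hw = t->saved;
	t->offline_state = OFFLINE_SLEEP;
	return true;
}
```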

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
Suggested-by: Vaidyanathan Srinivasan 
Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/include/asm/machdep.h |  1 +
 arch/powerpc/include/asm/paca.h|  3 ++
 arch/powerpc/include/asm/ppc-opcode.h  |  2 +
 arch/powerpc/include/asm/processor.h   |  2 +
 arch/powerpc/kernel/asm-offsets.c  |  1 +
 arch/powerpc/kernel/exceptions-64s.S   |  8 ++--
 arch/powerpc/kernel/idle.c | 11 +
 arch/powerpc/kernel/idle_power7.S  | 81 +-
 arch/powerpc/platforms/powernv/setup.c | 24 ++
 9 files changed, 127 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index f37014f..0a3ced9 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -301,6 +301,7 @@ struct machdep_calls {
/* Idle handlers */
void(*setup_idle)(void);
unsigned long   (*power7_sleep)(void);
+   unsigned long   (*power7_winkle)(void);
 };
 
 extern void e500_idle(void);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index a5139ea..3358f09 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -158,6 +158,9 @@ struct paca_struct {
 * early exception handler for use by high level C handler
 */
struct opal_machine_check_event *opal_mc_evt;
+
+   /* Flag to distinguish b/w sleep and winkle */
+   u8 offline_state;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
/* Exclusive emergency stack pointer for machine check exception. */
diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 6f85362..5155be7 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -194,6 +194,7 @@
 
 #define PPC_INST_NAP   0x4c000364
 #define PPC_INST_SLEEP 0x4c0003a4
+#define PPC_INST_WINKLE0x4c0003e4
 
 /* A2 specific instructions */
 #define PPC_INST_ERATWE0x7c0001a6
@@ -374,6 +375,7 @@
 
 #define PPC_NAPstringify_in_c(.long PPC_INST_NAP)
 #define PPC_SLEEP  stringify_in_c(.long PPC_INST_SLEEP)
+#define PPC_WINKLE stringify_in_c(.long PPC_INST_WINKLE)
 
 /* BHRB instructions */
 #define PPC_CLRBHRBstringify_in_c(.long PPC_INST_CLRBHRB)
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 41953cd..00e3df9 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -455,6 +455,8 @@ extern void arch_setup_idle(void);
 extern void power7_nap(int check_irq);
 extern unsigned long power7_sleep(void);
 extern unsigned long __power7_sleep(void);
+extern unsigned long power7_winkle(void);
+extern unsigned long __power7_winkle(void);
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
 extern void poweroff_now(void);
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 9d7dede..ea98817 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -731,6 +731,7 @@ int main(void)
DEFINE(OPAL_MC_SRR0, offsetof(struct opal_machine_check_event, srr0));
DEFINE(OPAL_MC_SRR1, offsetof(struct opal_machine_check_event, srr1));
DEFINE(PACA_OPAL_MC_EVT, offsetof(struct paca_struct, opal_mc_evt));
+   DEFINE(PACAOFFLINESTATE, offsetof(struct paca_struct, offline_state));
 #endif
 
return 0;
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index c64f3cc0..261f348 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -115,9 +115,7 @@ BEGIN_FTR_SECTION
 #endif
 
/* Running native on arch 2.06 or later, check if we are
-* waking up from nap. We only handle no state loss and
-* supervisor state loss. We do -not- handle hypervisor
-* state loss at this time.
+* waking up from power saving mode.
 */
mfspr   r13,SPRN_SRR1
rlwinm. r13,r13,47-31,30,31
@@ -133,8 +131,8 @@ BEGIN_FTR_SECTION
b   power7_wakeup_noloss

[PATCH v2 5/5] powerpc/powernv: Enter deepest supported idle state in offline

2014-10-01 Thread Shreyas B. Prabhu
Enter winkle during offline if supported, else revert to sleep or nap.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/platforms/powernv/smp.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/smp.c 
b/arch/powerpc/platforms/powernv/smp.c
index 3ad31d2..e3fc2c9 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -169,8 +169,10 @@ static void pnv_smp_cpu_kill_self(void)
while (!generic_check_cpu_restart(cpu)) {
ppc64_runlatch_off();
 
-   /* If sleep is supported, go to sleep, instead of nap */
-   if (idle_states & IDLE_USE_SLEEP)
+   /* Go to deepest supported idle state */
+   if (idle_states & IDLE_USE_WINKLE)
+   power7_winkle();
+   else if (idle_states & IDLE_USE_SLEEP)
power7_sleep();
else
power7_nap(1);
-- 
1.9.3



[PATCH v2 4/5] powerpc/powernv: Discover and enable winkle

2014-10-01 Thread Shreyas B. Prabhu
Discover winkle from the device tree. If supported, make the OPAL calls
necessary to save HIDs, HMEER, HSPRG0 and LPCR.
Also make an OPAL call when the HID0 value is modified during
split/unsplit of cores.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/include/asm/opal.h  |  1 +
 arch/powerpc/platforms/powernv/powernv.h |  1 +
 arch/powerpc/platforms/powernv/setup.c   | 75 
 arch/powerpc/platforms/powernv/subcore.c | 15 +++
 4 files changed, 92 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index d376020..a77957f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -778,6 +778,7 @@ extern struct device_node *opal_node;
 #define IDLE_INST_NAP  0x0001 /* nap instruction can be used */
 #define IDLE_INST_SLEEP0x0002 /* sleep instruction can be used */
 #define IDLE_INST_SLEEP_ER10x0008 /* Use sleep with work around*/
+#define IDLE_INST_WINKLE   0x0004 /* winkle instruction can be used */
 
 /* API functions */
 int64_t opal_invalid_call(void);
diff --git a/arch/powerpc/platforms/powernv/powernv.h 
b/arch/powerpc/platforms/powernv/powernv.h
index 31ece13..76b37f8 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -27,6 +27,7 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, 
u64 dma_mask)
 
 #define IDLE_USE_NAP   (1UL << 0)
 #define IDLE_USE_SLEEP (1UL << 1)
+#define IDLE_USE_WINKLE(1UL << 3)
 
 extern unsigned int pnv_get_supported_cpuidle_states(void);
 
diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index f45b52d..13c5e49 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -273,6 +273,65 @@ unsigned int pnv_get_supported_cpuidle_states(void)
return supported_cpuidle_states;
 }
 
+int pnv_save_sprs_for_winkle(void)
+{
+   int cpu;
+   int rc;
+
+   /*
+   * hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric accross
+   * all cpus at boot. Get these reg values of current cpu and use the
+   * same accross all cpus.
+   */
+   uint64_t lpcr_val = mfspr(SPRN_LPCR);
+   uint64_t hid0_val = mfspr(SPRN_HID0);
+   uint64_t hid1_val = mfspr(SPRN_HID1);
+   uint64_t hid4_val = mfspr(SPRN_HID4);
+   uint64_t hid5_val = mfspr(SPRN_HID5);
+   uint64_t hmeer_val = mfspr(SPRN_HMEER);
+
+   for_each_possible_cpu(cpu) {
+   uint64_t pir = get_hard_smp_processor_id(cpu);
+   uint64_t local_paca_ptr = (uint64_t)&paca[cpu];
+
+   rc = opal_slw_set_reg(pir, SPRN_HSPRG0, local_paca_ptr);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
+   if (rc != 0)
+   return rc;
+
+   /* HIDs are per core registers */
+   if (cpu_thread_in_core(cpu) == 0) {
+
+   rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val);
+   if (rc != 0)
+   return rc;
+
+   rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val);
+   if (rc != 0)
+   return rc;
+
+   }
+
+   }
+
+   return 0;
+
+}
 static int __init pnv_probe_idle_states(void)
 {
struct device_node *power_mgt;
@@ -318,6 +377,22 @@ static int __init pnv_probe_idle_states(void)
supported_cpuidle_states |= IDLE_USE_SLEEP;
need_fastsleep_workaround = 1;
}
+
+   if (flags & IDLE_INST_WINKLE) {
+   /*
+* If winkle is supported, save HSPRG0, HIDs and LPCR
+* contents via OPAL. Enable winkle only if this
+* succeeds.
+*/
+   int opal_ret_val = pnv_save_sprs_for_winkle();
+
+   if (!opal_ret_val)
+   supported_cpuidle_states |= IDLE_USE_WINKLE;
+   else
+   pr_warn("opal: opal_slw_set_reg failed with 
rc=%d, disabling winkle\n",
+   

[PATCH v2 1/5] powerpc/powernv: Add OPAL call to save and restore

2014-10-01 Thread Shreyas B. Prabhu
PORE can be programmed to restore hypervisor registers when waking up
from deep cpu idle states like winkle.

Add a call to pass the SPR address and value to OPAL, which in turn will
program PORE to restore the register state.

Cc: linuxppc-...@lists.ozlabs.org
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Suggested-by: Vaidyanathan Srinivasan 
Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/include/asm/opal.h| 2 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 166d572..d376020 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -150,6 +150,7 @@ struct opal_sg_list {
 #define OPAL_PCI_EEH_FREEZE_SET97
 #define OPAL_HANDLE_HMI98
 #define OPAL_CONFIG_IDLE_STATE 99
+#define OPAL_SLW_SET_REG   100
 #define OPAL_REGISTER_DUMP_REGION  101
 #define OPAL_UNREGISTER_DUMP_REGION102
 
@@ -978,6 +979,7 @@ extern int opal_handle_hmi_exception(struct pt_regs *regs);
 extern void opal_shutdown(void);
 extern int opal_resync_timebase(void);
 int64_t opal_config_idle_state(uint64_t state, uint64_t enter);
+int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
 
 extern void opal_lpc_init(void);
 
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 8d1e724..12e5d46 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -246,5 +246,6 @@ OPAL_CALL(opal_get_param,   OPAL_GET_PARAM);
 OPAL_CALL(opal_set_param,  OPAL_SET_PARAM);
 OPAL_CALL(opal_handle_hmi, OPAL_HANDLE_HMI);
 OPAL_CALL(opal_config_idle_state,  OPAL_CONFIG_IDLE_STATE);
+OPAL_CALL(opal_slw_set_reg,OPAL_SLW_SET_REG);
 OPAL_CALL(opal_register_dump_region,   OPAL_REGISTER_DUMP_REGION);
 OPAL_CALL(opal_unregister_dump_region, OPAL_UNREGISTER_DUMP_REGION);
-- 
1.9.3



Re: [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep

2014-10-02 Thread Shreyas B Prabhu
CCing Rafael J. Wysocki and linux...@vger.kernel.org

On Wednesday 01 October 2014 01:15 PM, Shreyas B. Prabhu wrote:
> When guests have to be launched, the secondary threads which are offline
> are woken up to run the guests. Today these threads wake up from nap
> and check if they have to run guests. Now that the offline secondary
> threads can go to fastsleep or going ahead a deeper idle state such as winkle,
> add this check in the wakeup from any of the deep idle states path as well.
> 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: linuxppc-...@lists.ozlabs.org
> Suggested-by: "Srivatsa S. Bhat" 
> Signed-off-by: Shreyas B. Prabhu 
> [ Changelog added by  ]
> Signed-off-by: Preeti U Murthy 
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 35 ---
>  1 file changed, 16 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 050f79a..c64f3cc0 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -100,25 +100,8 @@ system_reset_pSeries:
>   SET_SCRATCH0(r13)
>  #ifdef CONFIG_PPC_P7_NAP
>  BEGIN_FTR_SECTION
> - /* Running native on arch 2.06 or later, check if we are
> -  * waking up from nap. We only handle no state loss and
> -  * supervisor state loss. We do -not- handle hypervisor
> -  * state loss at this time.
> -  */
> - mfspr   r13,SPRN_SRR1
> - rlwinm. r13,r13,47-31,30,31
> - beq 9f
> 
> - /* waking up from powersave (nap) state */
> - cmpwi   cr1,r13,2
> - /* Total loss of HV state is fatal, we could try to use the
> -  * PIR to locate a PACA, then use an emergency stack etc...
> -  * OPAL v3 based powernv platforms have new idle states
> -  * which fall in this catagory.
> -  */
> - bgt cr1,8f
>   GET_PACA(r13)
> -
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>   li  r0,KVM_HWTHREAD_IN_KERNEL
>   stb r0,HSTATE_HWTHREAD_STATE(r13)
> @@ -131,13 +114,27 @@ BEGIN_FTR_SECTION
>  1:
>  #endif
> 
> + /* Running native on arch 2.06 or later, check if we are
> +  * waking up from nap. We only handle no state loss and
> +  * supervisor state loss. We do -not- handle hypervisor
> +  * state loss at this time.
> +  */
> + mfspr   r13,SPRN_SRR1
> + rlwinm. r13,r13,47-31,30,31
> + beq 9f
> +
> + /* waking up from powersave (nap) state */
> + cmpwi   cr1,r13,2
> + GET_PACA(r13)
> +
> + bgt cr1,8f
> +
>   beq cr1,2f
>   b   power7_wakeup_noloss
>  2:   b   power7_wakeup_loss
> 
>   /* Fast Sleep wakeup on PowerNV */
> -8:   GET_PACA(r13)
> - b   power7_wakeup_tb_loss
> +8:   b   power7_wakeup_tb_loss
> 
>  9:
>  END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
> 



Re: [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes

2014-10-02 Thread Shreyas B Prabhu


On Thursday 02 October 2014 02:16 AM, Rafael J. Wysocki wrote:
> On Wednesday, October 01, 2014 01:15:57 PM Shreyas B. Prabhu wrote:
>> Fast sleep is an idle state, where the core and the L1 and L2
>> caches are brought down to a threshold voltage. This also means that
>> the communication between L2 and L3 caches have to be fenced. However
>> the current P8 chips have a bug wherein this fencing between L2 and
>> L3 caches get delayed by a cpu cycle. This can delay L3 response to
>> the other cpus if they request for data during this time. Thus they
>> would fetch the same data from the memory which could lead to data
>> corruption if L3 cache is not flushed. 
>>
>> This series overcomes above problem in kernel.
>>
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Michael Ellerman 
>> Cc: Rafael J. Wysocki 
>> Cc: linux...@vger.kernel.org
>> Cc: linuxppc-...@lists.ozlabs.org
>> Cc: Srivatsa S. Bhat 
>> Cc: Preeti U. Murthy 
>> Cc: Vaidyanathan Srinivasan 
>>
>> v2:
>> Rebased on 3.17-rc7
>> Split from 'powerpc/powernv: Support for fastsleep and winkle'
>>
>> v1:
>> https://lkml.org/lkml/2014/8/25/446
>>
>> Preeti U Murthy (1):
>>   powerpc/powernv/cpuidle: Add workaround to enable fastsleep
>>
>> Shreyas B. Prabhu (1):
>>   powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
>> fast-sleep
>>
>> Srivatsa S. Bhat (1):
>>   powerpc/powernv: Enable Offline CPUs to enter deep idle states
>>
>>  arch/powerpc/include/asm/machdep.h |   3 +
>>  arch/powerpc/include/asm/opal.h|   7 ++
>>  arch/powerpc/include/asm/processor.h   |   4 +-
>>  arch/powerpc/kernel/exceptions-64s.S   |  35 
>>  arch/powerpc/kernel/idle.c |  19 
>>  arch/powerpc/kernel/idle_power7.S  |   2 +-
>>  arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
>>  arch/powerpc/platforms/powernv/powernv.h   |   7 ++
>>  arch/powerpc/platforms/powernv/setup.c | 118 
>> +
>>  arch/powerpc/platforms/powernv/smp.c   |  11 ++-
>>  drivers/cpuidle/cpuidle-powernv.c  |  13 ++-
>>  11 files changed, 194 insertions(+), 26 deletions(-)
> 
> [2/3] seems to be missig from the series.
> 
> Also, since that mostly modifies arch/powerpc, I think it should go through
> that tree.  I'm fine with the cpuidle-powernv changes in [1/3] and [3/3].
> 
Hi Rafael, 

Thanks for looking into this. The second patch is an independent fix in the 
powerpc exception handler. To be safe I am ccing you and linux-pm list on that
patch now. 


Thanks, 
Shreyas



[PATCH 0/4] powernv: cpuidle: Redesign idle states management

2014-11-03 Thread Shreyas B. Prabhu
Deep idle states like sleep and winkle are per core idle states. A core
enters these states only when all the threads enter either the particular
idle state or a deeper one. There are tasks like fastsleep hardware bug
workaround and hypervisor core state save which have to be done only by
the last thread of the core entering deep idle state and similarly tasks
like timebase resync, hypervisor core register restore that have to be
done only by the first thread waking up from these states. 

The current idle state management does not have a way to distinguish the
first/last thread of the core waking/entering idle states. Tasks like
timebase resync are done for all the threads. This is not only suboptimal,
but can also cause functional issues when subcores are involved.

Winkle is a deeper idle state than fastsleep. In this state the power
supply to the chiplet, i.e. the core, private L2 and private L3, is turned off.
This results in a total hypervisor state loss. This patch set adds support
for winkle and provides a way to track the idle states of the threads of a
core, which is then used to manage the sleep and winkle idle states.

TODO:
-----
Handle the case where a thread enters nap and wakes up with supervisor/
hypervisor state loss. This can only happen due to a bug in the
hardware or the kernel. One way to handle this is to restore the state,
switch to the kernel process context and trigger a panic or a warning.


Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rafael J. Wysocki 
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Vaidyanathan Srinivasan 
Cc: Preeti U Murthy 

Paul Mackerras (1):
  powerpc: powernv: Switch off MMU before entering nap/sleep/rvwinkle
    mode

Preeti U. Murthy (1):
  powerpc/powernv: Enable Offline CPUs to enter deep idle states

Shreyas B. Prabhu (2):
  powernv: cpuidle: Redesign idle states management
  powernv: powerpc: Add winkle support for offline cpus

 arch/powerpc/include/asm/cpuidle.h |  14 ++
 arch/powerpc/include/asm/opal.h|  13 +
 arch/powerpc/include/asm/paca.h|   6 +
 arch/powerpc/include/asm/ppc-opcode.h  |   2 +
 arch/powerpc/include/asm/processor.h   |   1 +
 arch/powerpc/include/asm/reg.h |   2 +
 arch/powerpc/kernel/asm-offsets.c  |   6 +
 arch/powerpc/kernel/cpu_setup_power.S  |   4 +
 arch/powerpc/kernel/exceptions-64s.S   |  30 ++-
 arch/powerpc/kernel/idle_power7.S  | 326 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |  39 +++
 arch/powerpc/platforms/powernv/powernv.h   |   2 +
 arch/powerpc/platforms/powernv/setup.c | 170 +
 arch/powerpc/platforms/powernv/smp.c   |  10 +-
 arch/powerpc/platforms/powernv/subcore.c   |  35 +++
 arch/powerpc/platforms/powernv/subcore.h   |   1 +
 drivers/cpuidle/cpuidle-powernv.c  |  10 +-
 17 files changed, 611 insertions(+), 60 deletions(-)
 create mode 100644 arch/powerpc/include/asm/cpuidle.h

-- 
1.9.3



[PATCH 3/4] powernv: cpuidle: Redesign idle states management

2014-11-03 Thread Shreyas B. Prabhu
Deep idle states like sleep and winkle are per core idle states. A core
enters these states only when all the threads enter either the
particular idle state or a deeper one. There are tasks like fastsleep
hardware bug workaround and hypervisor core state save which have to be
done only by the last thread of the core entering deep idle state and
similarly tasks like timebase resync, hypervisor core register restore
that have to be done only by the first thread waking up from these
states.

The current idle state management does not have a way to distinguish the
first/last thread of the core waking/entering idle states. Tasks like
timebase resync are done for all the threads. This is not only
suboptimal, but can also cause functional issues when subcores and KVM
are involved.

This patch adds the necessary infrastructure to track idle states of
threads in a per-core structure. It uses this info to perform tasks like
fastsleep workaround and timebase resync only once per core.
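
The bookkeeping described above can be sketched in portable C. This is
only an illustration, not the code in this patch: it assumes a set bit
means "thread running", it uses C11 atomics instead of the assembly in
idle_power7.S, and the names core_idle_enter() and core_idle_exit() are
made up for the example. It also leaves out the lock bit
(PNV_CORE_IDLE_LOCK_BIT) that makes later wakers wait until the first
thread has finished its once-per-core work.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors PNV_CORE_IDLE_THREAD_BITS from this patch: one bit per thread. */
#define CORE_THREAD_BITS 0x0FFu

/* Hypothetical per-core word; assumption: a set bit means "thread running". */
typedef _Atomic uint32_t core_idle_state_t;

/*
 * Called by a thread about to enter a deep idle state.
 * Clears the caller's bit and returns true if the caller was the last
 * running thread of the core, i.e. the one that has to do the
 * once-per-core entry work (for example the fastsleep workaround).
 */
static bool core_idle_enter(core_idle_state_t *core, unsigned int thread)
{
	uint32_t bit = 1u << thread;
	uint32_t old = atomic_fetch_and(core, ~bit);

	return (old & CORE_THREAD_BITS) == bit;
}

/*
 * Called by a thread waking up from a deep idle state.
 * Sets the caller's bit and returns true if no other thread of the core
 * was running, i.e. the caller is the first waker and has to do the
 * once-per-core exit work (for example the timebase resync).
 */
static bool core_idle_exit(core_idle_state_t *core, unsigned int thread)
{
	uint32_t bit = 1u << thread;
	uint32_t old = atomic_fetch_or(core, bit);

	return (old & CORE_THREAD_BITS) == 0;
}
```

With four running threads (state 0x0F), only the fourth caller of
core_idle_enter() sees true, and only the first caller of
core_idle_exit() afterwards sees true.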

Signed-off-by: Shreyas B. Prabhu 
Originally-by: Preeti U. Murthy 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rafael J. Wysocki 
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/include/asm/cpuidle.h |  14 ++
 arch/powerpc/include/asm/opal.h|   2 +
 arch/powerpc/include/asm/paca.h|   4 +
 arch/powerpc/kernel/asm-offsets.c  |   4 +
 arch/powerpc/kernel/exceptions-64s.S   |  20 ++-
 arch/powerpc/kernel/idle_power7.S  | 183 +++--
 arch/powerpc/platforms/powernv/opal-wrappers.S |  37 +
 arch/powerpc/platforms/powernv/setup.c |  52 ++-
 arch/powerpc/platforms/powernv/smp.c   |   3 +-
 drivers/cpuidle/cpuidle-powernv.c  |   3 +-
 10 files changed, 267 insertions(+), 55 deletions(-)
 create mode 100644 arch/powerpc/include/asm/cpuidle.h

diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
new file mode 100644
index 000..8c82850
--- /dev/null
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -0,0 +1,14 @@
+#ifndef _ASM_POWERPC_CPUIDLE_H
+#define _ASM_POWERPC_CPUIDLE_H
+
+#ifdef CONFIG_PPC_POWERNV
+/* Used in powernv idle state management */
+#define PNV_THREAD_RUNNING              0
+#define PNV_THREAD_NAP                  1
+#define PNV_THREAD_SLEEP                2
+#define PNV_THREAD_WINKLE               3
+#define PNV_CORE_IDLE_LOCK_BIT          0x100
+#define PNV_CORE_IDLE_THREAD_BITS       0x0FF
+#endif
+
+#endif
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index f8b95c0..bef7fbc 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -152,6 +152,7 @@ struct opal_sg_list {
 #define OPAL_PCI_ERR_INJECT             96
 #define OPAL_PCI_EEH_FREEZE_SET         97
 #define OPAL_HANDLE_HMI                 98
+#define OPAL_CONFIG_CPU_IDLE_STATE      99
 #define OPAL_REGISTER_DUMP_REGION       101
 #define OPAL_UNREGISTER_DUMP_REGION     102
 
@@ -162,6 +163,7 @@ struct opal_sg_list {
  */
 #define OPAL_PM_NAP_ENABLED             0x0001
 #define OPAL_PM_SLEEP_ENABLED           0x0002
+#define OPAL_PM_SLEEP_ENABLED_ER1       0x0008
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index a5139ea..85aeedb 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -158,6 +158,10 @@ struct paca_struct {
 * early exception handler for use by high level C handler
 */
struct opal_machine_check_event *opal_mc_evt;
+
+   /* Per-core mask tracking idle threads and a lock bit-[L][] */
+   u32 *core_idle_state_ptr;
+   u8 thread_idle_state;   /* ~Idle[0]/Nap[1]/Sleep[2]/Winkle[3] */
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
/* Exclusive emergency stack pointer for machine check exception. */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 9d7dede..50f299e 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -731,6 +731,10 @@ int main(void)
DEFINE(OPAL_MC_SRR0, offsetof(struct opal_machine_check_event, srr0));
DEFINE(OPAL_MC_SRR1, offsetof(struct opal_machine_check_event, srr1));
DEFINE(PACA_OPAL_MC_EVT, offsetof(struct paca_struct, opal_mc_evt));
+   DEFINE(PACA_CORE_IDLE_STATE_PTR,
+   offsetof(struct paca_struct, core_idle_state_ptr));
+   DEFINE(PACA_THREAD_IDLE_STATE,
+   offsetof(struct paca_struct, thread_idle_state));
 #endif
 
return 0;
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 72e783e..3311c8d 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * We layout

[PATCH 4/4] powernv: powerpc: Add winkle support for offline cpus

2014-11-03 Thread Shreyas B. Prabhu
Winkle is a deep idle state supported in POWER8 chips. A core enters
winkle when all the threads of the core enter winkle. In this state,
power supply to the entire chiplet, i.e. the core, private L2 and private
L3, is turned off. As a result it gives higher power savings compared to
sleep.

But entering winkle results in a total hypervisor state loss. Hence the
hypervisor context has to be preserved before entering winkle and
restored upon wake up.

The Power-on Reset Engine (PORE) is a dedicated engine which is responsible
for powering on the chiplet during wake up. It can be programmed to
restore the contents of a few specific registers. This patch
uses PORE to restore register state wherever possible and uses the stack to
save and restore the rest of the necessary registers.

With hypervisor state restore, things fall under three categories:
per-core state, per-subcore state and per-thread state. To manage this,
extend the infrastructure introduced for sleep. Mainly we add a paca
variable subcore_sibling_mask. Using this and the core_idle_state we can
distinguish the first thread in the core and in the subcore.
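
As a rough illustration of how a sibling mask separates "first in core"
from "first in subcore", here is a small C sketch. It is not taken from
this patch. It assumes that a set bit in the thread bits means "thread
running", that the threads of a subcore are numbered contiguously, and
that the snapshot is taken before the waking thread sets its own bit.
All helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same layout as PNV_CORE_IDLE_THREAD_BITS: one bit per thread of the core. */
#define CORE_THREAD_BITS 0x0FFu

/*
 * Hypothetical helper: mask of the threads sharing a subcore with 'thread',
 * assuming subcores are contiguous groups of 'threads_per_subcore' threads.
 */
static uint8_t subcore_sibling_mask(unsigned int thread,
				    unsigned int threads_per_subcore)
{
	unsigned int first = thread - (thread % threads_per_subcore);

	return (uint8_t)(((1u << threads_per_subcore) - 1u) << first);
}

/* 'running' is a snapshot of the thread bits before the caller sets its own. */
static bool first_in_core(uint32_t running)
{
	return (running & CORE_THREAD_BITS) == 0;
}

static bool first_in_subcore(uint32_t running, uint8_t sibling_mask)
{
	return (running & sibling_mask) == 0;
}
```

For example, with the core split into two subcores of four threads,
thread 5 has sibling mask 0xF0. If threads 0 and 1 are already running
(snapshot 0x03), thread 5 is the first thread of its subcore but not of
the core, so it would restore per-subcore state only.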

Signed-off-by: Shreyas B. Prabhu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/include/asm/opal.h|   3 +
 arch/powerpc/include/asm/paca.h|   2 +
 arch/powerpc/include/asm/ppc-opcode.h  |   2 +
 arch/powerpc/include/asm/processor.h   |   1 +
 arch/powerpc/include/asm/reg.h |   2 +
 arch/powerpc/kernel/asm-offsets.c  |   2 +
 arch/powerpc/kernel/cpu_setup_power.S  |   4 +
 arch/powerpc/kernel/exceptions-64s.S   |  10 ++
 arch/powerpc/kernel/idle_power7.S  | 161 ++---
 arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
 arch/powerpc/platforms/powernv/setup.c |  73 +++
 arch/powerpc/platforms/powernv/smp.c   |   4 +-
 arch/powerpc/platforms/powernv/subcore.c   |  34 ++
 arch/powerpc/platforms/powernv/subcore.h   |   1 +
 14 files changed, 285 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index bef7fbc..f0ca2d9 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -153,6 +153,7 @@ struct opal_sg_list {
 #define OPAL_PCI_EEH_FREEZE_SET         97
 #define OPAL_HANDLE_HMI                 98
 #define OPAL_CONFIG_CPU_IDLE_STATE      99
+#define OPAL_SLW_SET_REG                100
 #define OPAL_REGISTER_DUMP_REGION       101
 #define OPAL_UNREGISTER_DUMP_REGION     102
 
@@ -163,6 +164,7 @@ struct opal_sg_list {
  */
 #define OPAL_PM_NAP_ENABLED             0x0001
 #define OPAL_PM_SLEEP_ENABLED           0x0002
+#define OPAL_PM_WINKLE_ENABLED          0x0004
 #define OPAL_PM_SLEEP_ENABLED_ER1       0x0008
 
 #ifndef __ASSEMBLY__
@@ -972,6 +974,7 @@ int64_t opal_sensor_read(uint32_t sensor_hndl, int token, __be32 *sensor_data);
 int64_t opal_handle_hmi(void);
 int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
 int64_t opal_unregister_dump_region(uint32_t id);
+int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
 int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number);
 
 /* Internal functions */
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 85aeedb..c2e51b7 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -162,6 +162,8 @@ struct paca_struct {
/* Per-core mask tracking idle threads and a lock bit-[L][] */
u32 *core_idle_state_ptr;
u8 thread_idle_state;   /* ~Idle[0]/Nap[1]/Sleep[2]/Winkle[3] */
+   /* Mask to denote subcore sibling threads */
+   u8 subcore_sibling_mask;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
/* Exclusive emergency stack pointer for machine check exception. */
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 6f85362..5155be7 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -194,6 +194,7 @@
 
 #define PPC_INST_NAP                    0x4c000364
 #define PPC_INST_SLEEP                  0x4c0003a4
+#define PPC_INST_WINKLE                 0x4c0003e4
 
 /* A2 specific instructions */
 #define PPC_INST_ERATWE                 0x7c0001a6
@@ -374,6 +375,7 @@
 
 #define PPC_NAP                 stringify_in_c(.long PPC_INST_NAP)
 #define PPC_SLEEP               stringify_in_c(.long PPC_INST_SLEEP)
+#define PPC_WINKLE              stringify_in_c(.long PPC_INST_WINKLE)
 
 /* BHRB instructions */
 #define PPC_CLRBHRB             stringify_in_c(.long PPC_INST_CLRBHRB)
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index dda7ac4..c076842 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b

[PATCH 1/4] powerpc: powernv: Switch off MMU before entering nap/sleep/rvwinkle mode

2014-11-03 Thread Shreyas B. Prabhu
From: Paul Mackerras 

Currently, when going idle, we set the flag indicating that we are in
nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap
(or sleep or rvwinkle) instruction, all with the MMU on.  This is bad
for two reasons: (a) the architecture specifies that those instructions
must be executed with the MMU off, and in fact with only the SF, HV, ME
and possibly RI bits set, and (b) this introduces a race, because as
soon as we set the flag, another thread can switch the MMU to a guest
context.  If the race is lost, this thread will typically start looping
on relocation-on ISIs at 0xc...4400.

This fixes it by setting the MSR as required by the architecture before
setting the flag or executing the nap/sleep/rvwinkle instruction.
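
To make the MSR requirement concrete, the following C sketch builds the
idle MSR value and checks that it leaves address translation off. The
bit positions are the ones I recall from arch/powerpc/include/asm/reg.h
and should be treated as an assumption to verify against the tree; the
helpers msr_idle() and msr_is_real_mode() are hypothetical and exist
only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR bits, numbered from the least significant bit (assumed positions). */
#define MSR_SF (1ULL << 63)	/* 64-bit mode */
#define MSR_HV (1ULL << 60)	/* hypervisor state */
#define MSR_ME (1ULL << 12)	/* machine check enable */
#define MSR_IR (1ULL << 5)	/* instruction relocation (MMU on for fetch) */
#define MSR_DR (1ULL << 4)	/* data relocation (MMU on for data) */
#define MSR_RI (1ULL << 1)	/* recoverable interrupt */
#define MSR_LE (1ULL << 0)	/* little-endian mode */

/* Same composition as MSR_IDLE in this patch, for either endianness. */
static uint64_t msr_idle(bool little_endian)
{
	uint64_t msr = MSR_ME | MSR_SF | MSR_HV;

	if (little_endian)
		msr |= MSR_LE;
	return msr;
}

/* Real mode means both instruction and data relocation are off. */
static bool msr_is_real_mode(uint64_t msr)
{
	return (msr & (MSR_IR | MSR_DR)) == 0;
}
```

Because IR and DR are both clear in the value loaded into SRR1, the rfid
lands in power7_enter_nap_mode with the MMU already off, so a later
guest MMU context switch by a sibling thread cannot affect this thread.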

[ shre...@linux.vnet.ibm.com: Edited to handle LE ]
Signed-off-by: Paul Mackerras 
Signed-off-by: Shreyas B. Prabhu 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/include/asm/reg.h|  2 ++
 arch/powerpc/kernel/idle_power7.S | 18 +-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index c998279..a68ee15 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -118,8 +118,10 @@
 #define __MSR  (MSR_ME | MSR_RI | MSR_IR | MSR_DR | MSR_ISF |MSR_HV)
 #ifdef __BIG_ENDIAN__
 #define MSR_   __MSR
+#define MSR_IDLE   (MSR_ME | MSR_SF | MSR_HV)
 #else
 #define MSR_   (__MSR | MSR_LE)
+#define MSR_IDLE   (MSR_ME | MSR_SF | MSR_HV | MSR_LE)
 #endif
 #define MSR_KERNEL (MSR_ | MSR_64BIT)
 #define MSR_USER32 (MSR_ | MSR_PR | MSR_EE)
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index c0754bb..283c603 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -101,7 +101,23 @@ _GLOBAL(power7_powersave_common)
std r9,_MSR(r1)
std r1,PACAR1(r13)
 
-_GLOBAL(power7_enter_nap_mode)
+   /*
+* Go to real mode to do the nap, as required by the architecture.
+* Also, we need to be in real mode before setting hwthread_state,
+* because as soon as we do that, another thread can switch
+* the MMU context to the guest.
+*/
+   LOAD_REG_IMMEDIATE(r5, MSR_IDLE)
+   li  r6, MSR_RI
+   andc    r6, r9, r6
+   LOAD_REG_ADDR(r7, power7_enter_nap_mode)
+   mtmsrd  r6, 1   /* clear RI before setting SRR0/1 */
+   mtspr   SPRN_SRR0, r7
+   mtspr   SPRN_SRR1, r5
+   rfid
+
+   .globl  power7_enter_nap_mode
+power7_enter_nap_mode:
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
/* Tell KVM we're napping */
li  r4,KVM_HWTHREAD_IN_NAP
-- 
1.9.3



[PATCH 2/4] powerpc/powernv: Enable Offline CPUs to enter deep idle states

2014-11-03 Thread Shreyas B. Prabhu
From: "Preeti U. Murthy" 

The secondary threads should enter deep idle states so as to gain maximum
powersavings when the entire core is offline. To do so the offline path
must be made aware of the available deepest idle state. Hence probe the
device tree for the possible idle states in powernv core code and
expose the deepest idle state through flags.

Since the device tree is probed by the cpuidle driver as well, move
the parameters required to discover the idle states into a place
common to both the driver and the powernv core code.

Another point is that the fastsleep idle state may require workarounds in
the kernel to function properly. These workarounds are introduced in
subsequent patches. However, neither the cpuidle driver nor the hotplug
path needs to be concerned with them; they will be taken care of by the
core powernv code.
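
The flag handling can be illustrated outside the kernel with a small C
sketch: OR together the big-endian flag cells of the device tree
property, then let the offline path pick the deepest state that is
advertised. The flag values below are placeholders for the example (the
real ones live in opal.h), and pick_offline_state() together with its
enum is hypothetical, not the code in smp.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values for this example only. */
#define PM_NAP_ENABLED   0x0001u
#define PM_SLEEP_ENABLED 0x0002u

/* Hypothetical result type: which idle state an offlined CPU should use. */
enum offline_idle { OFFLINE_NONE, OFFLINE_NAP, OFFLINE_SLEEP };

/* Device tree cells are big-endian; convert one 32-bit cell to host order. */
static uint32_t be32_to_host(const uint8_t b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* OR the flags of every idle state listed in the property. */
static uint32_t collect_idle_flags(const uint8_t *prop, size_t len_bytes)
{
	uint32_t supported = 0;

	for (size_t i = 0; i + 4 <= len_bytes; i += 4)
		supported |= be32_to_host(prop + i);
	return supported;
}

/* Prefer the deepest advertised state; fall back to shallower ones. */
static enum offline_idle pick_offline_state(uint32_t supported)
{
	if (supported & PM_SLEEP_ENABLED)
		return OFFLINE_SLEEP;
	if (supported & PM_NAP_ENABLED)
		return OFFLINE_NAP;
	return OFFLINE_NONE;
}
```

Keeping the accumulated mask behind one accessor, as
pnv_get_supported_cpuidle_states() does in the patch, means neither the
cpuidle driver nor the hotplug path has to parse the device tree itself.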

Originally-by: Srivatsa S. Bhat 
Signed-off-by: Preeti U. Murthy 
Signed-off-by: Shreyas B. Prabhu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rafael J. Wysocki 
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org

---
 arch/powerpc/include/asm/opal.h  |  8 ++
 arch/powerpc/platforms/powernv/powernv.h |  2 ++
 arch/powerpc/platforms/powernv/setup.c   | 49 
 arch/powerpc/platforms/powernv/smp.c |  7 -
 drivers/cpuidle/cpuidle-powernv.c|  9 ++
 5 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9124b0e..f8b95c0 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -155,6 +155,14 @@ struct opal_sg_list {
 #define OPAL_REGISTER_DUMP_REGION  101
 #define OPAL_UNREGISTER_DUMP_REGION 102
 
+/* Device tree flags */
+
+/* Flags set in power-mgmt nodes in device tree if
+ * respective idle states are supported in the platform.
+ */
+#define OPAL_PM_NAP_ENABLED     0x0001
+#define OPAL_PM_SLEEP_ENABLED   0x0002
+
 #ifndef __ASSEMBLY__
 
 #include 
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 6c8e2d1..604c48e 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -29,6 +29,8 @@ static inline u64 pnv_pci_dma_get_required_mask(struct pci_dev *pdev)
 }
 #endif
 
+extern u32 pnv_get_supported_cpuidle_states(void);
+
 extern void pnv_lpc_init(void);
 
 bool cpu_core_split_required(void);
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 3f9546d..34c6665 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -290,6 +290,55 @@ static void __init pnv_setup_machdep_rtas(void)
 }
 #endif /* CONFIG_PPC_POWERNV_RTAS */
 
+static u32 supported_cpuidle_states;
+
+u32 pnv_get_supported_cpuidle_states(void)
+{
+   return supported_cpuidle_states;
+}
+
+static int __init pnv_init_idle_states(void)
+{
+   struct device_node *power_mgt;
+   int dt_idle_states;
+   const __be32 *idle_state_flags;
+   u32 len_flags, flags;
+   int i;
+
+   supported_cpuidle_states = 0;
+
+   if (cpuidle_disable != IDLE_NO_OVERRIDE)
+   return 0;
+
+   if (!firmware_has_feature(FW_FEATURE_OPALv3))
+   return 0;
+
+   power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
+   if (!power_mgt) {
+   pr_warn("opal: PowerMgmt Node not found\n");
+   return 0;
+   }
+
+   idle_state_flags = of_get_property(power_mgt,
+   "ibm,cpu-idle-state-flags", &len_flags);
+   if (!idle_state_flags) {
+   pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
+   return 0;
+   }
+
+   dt_idle_states = len_flags / sizeof(u32);
+
+   for (i = 0; i < dt_idle_states; i++) {
+   flags = be32_to_cpu(idle_state_flags[i]);
+   supported_cpuidle_states |= flags;
+   }
+
+   return 0;
+}
+
+subsys_initcall(pnv_init_idle_states);
+
+
 static int __init pnv_probe(void)
 {
unsigned long root = of_get_flat_dt_root();
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 4753958..3dc4cec 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -149,6 +149,7 @@ static int pnv_smp_cpu_disable(void)
 static void pnv_smp_cpu_kill_self(void)
 {
unsigned int cpu;
+   u32 idle_states;
 
/* Standard hot unplug procedure */
local_irq_disable();
@@ -159,13 +160,17 @@ static void pnv_smp_cpu_kill_self(void)
generic_set_cpu_dead(cpu);
smp_wmb();
 
+   idle_states = pnv_get_supported_cpuidle_states();
/* We don't want to take decrementer interrupts while we are offline,
 * so clear LPCR:PEC

Re: [PATCH v2] powerpc/powernv: Fix race in updating core_idle_state

2015-07-08 Thread Shreyas B Prabhu


On 07/09/2015 10:11 AM, Daniel Axtens wrote:
>> I recommend creating an alias or script that does:
>>
>> $ git log --pretty=fixes -n 1 $commit | xclip
>>
> 
> FWIW, having finally got around to doing this, I found I first needed
> the following snippet in ~/.gitconfig from
> https://www.kernel.org/doc/Documentation/SubmittingPatches
> 
> 
>   [core]
>   abbrev = 12
>   [pretty]
>   fixes = Fixes: %h (\"%s\")
> 
> Otherwise git doesn't know what the pretty format is.
>


Right, thanks for the pointer!

Thanks,
Shreyas



Re: [PATCH] cpupower tools: Fix error when running cpupower monitor

2015-08-25 Thread Shreyas B Prabhu


On 08/17/2015 01:22 PM, Shreyas B Prabhu wrote:
> 
> 
> On 08/10/2015 05:58 PM, Thomas Renninger wrote:
>> On Monday, August 03, 2015 11:46:00 AM Shreyas B. Prabhu wrote:
>>> get_cpu_topology() tries to get topology info from all cpus by reading
>>> files in the topology sysfs dir. If a cpu is offlined, since it doesn't
>>> have topology dir, this function fails and returns -1. This causes
>>> functions relying on get_cpu_topology() to fail. For example-
>>>
>>> $ cpupower monitor
>>> Cannot read number of available processors
>>>
>>> Fix this by skipping fetching topology info for offline cpus.
>>
>> Looks fine.
>>
>> Thanks!
>>
>> Acked-by: Thomas Renninger 
>>
> 
> Thanks Thomas!
> Rafael, can you please pick this patch?
> 
> 


Hi Rafael,

If this patch looks good can you please pick this up?


Thanks,
Shreyas



Re: [PATCH] cpupower tools: Fix error when running cpupower monitor

2015-08-17 Thread Shreyas B Prabhu


On 08/10/2015 05:58 PM, Thomas Renninger wrote:
> On Monday, August 03, 2015 11:46:00 AM Shreyas B. Prabhu wrote:
>> get_cpu_topology() tries to get topology info from all cpus by reading
>> files in the topology sysfs dir. If a cpu is offlined, since it doesn't
>> have topology dir, this function fails and returns -1. This causes
>> functions relying on get_cpu_topology() to fail. For example-
>>
>> $ cpupower monitor
>> Cannot read number of available processors
>>
>> Fix this by skipping fetching topology info for offline cpus.
> 
> Looks fine.
> 
> Thanks!
> 
> Acked-by: Thomas Renninger 
> 

Thanks Thomas!
Rafael, can you please pick this patch?


Thanks,
Shreyas



Re: [PATCH v3] powerpc: Add an inline function to update POWER8 HID0

2015-08-14 Thread Shreyas B Prabhu


On 08/05/2015 12:38 PM, Gautham R. Shenoy wrote:
> Section 3.7 of Version 1.2 of the Power8 Processor User's Manual
> prescribes that updates to HID0 be preceded by a SYNC instruction and
> followed by an ISYNC instruction (Page 91).
> 
> Create an inline function name update_power8_hid0() which follows this
> recipe and invoke it from the static split core path.
> 
> Signed-off-by: Gautham R. Shenoy 

Reviewed-by: Shreyas B. Prabhu 



Re: [PATCH 7/9] powerpc/powernv: Add platform support for stop instruction

2016-05-02 Thread Shreyas B Prabhu


On 05/03/2016 10:55 AM, Michael Neuling wrote:
> 
>> diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
>> index df4fb5f..a4739a1 100644
>> --- a/arch/powerpc/include/asm/cputable.h
>> +++ b/arch/powerpc/include/asm/cputable.h
>> @@ -205,6 +205,7 @@ enum {
>>  #define CPU_FTR_DABRX           LONG_ASM_CONST(0x0800)
>>  #define CPU_FTR_PMAO_BUG        LONG_ASM_CONST(0x1000)
>>  #define CPU_FTR_SUBCORE         LONG_ASM_CONST(0x2000)
>> +#define CPU_FTR_STOP_INST       LONG_ASM_CONST(0x4000)
> 
> In general, we are putting all the POWER9 features under CPU_FTR_ARCH_300.
> Is there a reason you need this separate bit?
> 

No I don't need a separate bit, I'll use CPU_FTR_ARCH_300.

Thanks,
Shreyas

> CPU_FTR bits are fairly scarce these days.
> 
> Mikey
> 



[PATCH v2 8/9] cpuidle/powernv: Add support for POWER ISA v3 idle states

2016-05-03 Thread Shreyas B. Prabhu
POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) new instruction named stop is added.
 b) new per thread SPR named PSSCR is added which controls the behavior
of stop instruction.

The supported idle states and the PSSCR value required to enter each of
them are exposed via the device tree properties ibm,cpu-idle-state-names
and ibm,cpu-idle-state-psscr respectively. To enter an idle state, the
platform-provided power_stop() needs to be invoked with the appropriate
PSSCR value.

This patch adds support for this new mechanism in cpuidle powernv driver.

Cc: Rafael J. Wysocki 
Cc: Daniel Lezcano 
Cc: linux...@vger.kernel.org
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu 
---
 drivers/cpuidle/cpuidle-powernv.c | 57 ++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index e12dc30..efe5221 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -21,6 +21,7 @@
 #include 
 
 #define MAX_POWERNV_IDLE_STATES 8
+#define MAX_IDLE_STATE_NAME_LEN 10
 
 struct cpuidle_driver powernv_idle_driver = {
.name = "powernv_idle",
@@ -29,9 +30,11 @@ struct cpuidle_driver powernv_idle_driver = {
 
 static int max_idle_state;
 static struct cpuidle_state *cpuidle_state_table;
+
+static u64 stop_psscr_table[MAX_POWERNV_IDLE_STATES];
+
 static u64 snooze_timeout;
 static bool snooze_timeout_en;
-
 static int snooze_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
@@ -139,6 +142,15 @@ static struct notifier_block setup_hotplug_notifier = {
.notifier_call = powernv_cpuidle_add_cpu_notifier,
 };
 
+static int stop_loop(struct cpuidle_device *dev,
+   struct cpuidle_driver *drv,
+   int index)
+{
+   ppc64_runlatch_off();
+   power_stop(stop_psscr_table[index]);
+   ppc64_runlatch_on();
+   return index;
+}
 /*
  * powernv_cpuidle_driver_init()
  */
@@ -169,6 +181,8 @@ static int powernv_add_idle_states(void)
int nr_idle_states = 1; /* Snooze */
int dt_idle_states;
u32 *latency_ns, *residency_ns, *flags;
+   u64 *psscr_val = NULL;
+   const char *names[MAX_POWERNV_IDLE_STATES];
int i, rc;
 
/* Currently we have snooze statically defined */
@@ -201,6 +215,23 @@ static int powernv_add_idle_states(void)
goto out_free_latency;
}
 
+   rc = of_property_read_string_array(power_mgt,
+   "ibm,cpu-idle-state-names", names, dt_idle_states);
+   if (rc < -1) {
+   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-states-names in DT\n");
+   goto out_free_latency;
+   }
+
+   if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+   psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val),
+   GFP_KERNEL);
+   rc = of_property_read_u64_array(power_mgt,
+   "ibm,cpu-idle-state-psscr", psscr_val, dt_idle_states);
+   if (rc < -1) {
+   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-states-psscr in DT\n");
+   goto out_free_psscr;
+   }
+   }
residency_ns = kzalloc(sizeof(*residency_ns) * dt_idle_states, GFP_KERNEL);
rc = of_property_read_u32_array(power_mgt,
"ibm,cpu-idle-state-residency-ns", residency_ns, dt_idle_states);
@@ -218,6 +249,16 @@ static int powernv_add_idle_states(void)
powernv_states[nr_idle_states].flags = 0;
powernv_states[nr_idle_states].target_residency = 100;
powernv_states[nr_idle_states].enter = &nap_loop;
+   } else if ((flags[i] & OPAL_PM_STOP_INST_FAST) &&
+   !(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
+   strncpy(powernv_states[nr_idle_states].name,
+   (char *)names[i], MAX_IDLE_STATE_NAME_LEN);
+   strncpy(powernv_states[nr_idle_states].desc,
+   (char *)names[i], MAX_IDLE_STATE_NAME_LEN);
+   powernv_states[nr_idle_states].flags = 0;
+
+   powernv_states[nr_idle_states].enter = &stop_loop;
+   stop_psscr_table[nr_idle_states] = psscr_val[i];
}
 
/*
@@ -233,6 +274,18 @@ static int powernv_add_idle_states(void)
powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIMER_STOP;
powernv_states[nr_idle_states].target_residency = 30;
powernv_states[nr_idle_stat

[PATCH v2 1/9] powerpc/powernv: Move CHECK_HMI_INTERRUPT to exception-64s header

2016-05-03 Thread Shreyas B. Prabhu
CHECK_HMI_INTERRUPT is used to check for HMIs in the reset vector. Move
the macro to a common location (exception-64s.h).
This patch does not change any functionality.

Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/include/asm/exception-64s.h | 18 ++
 arch/powerpc/kernel/idle_power7.S| 20 +---
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 93ae809..6a625af 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -545,4 +545,22 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
 #define FINISH_NAP
 #endif
 
+#define CHECK_HMI_INTERRUPT\
+   mfspr   r0,SPRN_SRR1;   \
+BEGIN_FTR_SECTION_NESTED(66);  \
+   rlwinm  r0,r0,45-31,0xf;  /* extract wake reason field (P8) */  \
+FTR_SECTION_ELSE_NESTED(66);   \
+   rlwinm  r0,r0,45-31,0xe;  /* P7 wake reason field is 3 bits */  \
+ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);   \
+   cmpwi   r0,0xa; /* Hypervisor maintenance ? */  \
+   bne 20f;\
+   /* Invoke opal call to handle hmi */\
+   ld  r2,PACATOC(r13);\
+   ld  r1,PACAR1(r13); \
+   std r3,ORIG_GPR3(r1);   /* Save original r3 */  \
+   li  r0,OPAL_HANDLE_HMI; /* Pass opal token argument*/   \
+   bl  opal_call_realmode; \
+   ld  r3,ORIG_GPR3(r1);   /* Restore original r3 */   \
+20:nop;
+
 #endif /* _ASM_POWERPC_EXCEPTION_H */
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 470ceeb..6b3404b 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #undef DEBUG
@@ -257,25 +258,6 @@ _GLOBAL(power7_winkle)
b   power7_powersave_common
/* No return */
 
-#define CHECK_HMI_INTERRUPT\
-   mfspr   r0,SPRN_SRR1;   \
-BEGIN_FTR_SECTION_NESTED(66);  \
-   rlwinm  r0,r0,45-31,0xf;  /* extract wake reason field (P8) */  \
-FTR_SECTION_ELSE_NESTED(66);   \
-   rlwinm  r0,r0,45-31,0xe;  /* P7 wake reason field is 3 bits */  \
-ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);   \
-   cmpwi   r0,0xa; /* Hypervisor maintenance ? */  \
-   bne 20f;\
-   /* Invoke opal call to handle hmi */\
-   ld  r2,PACATOC(r13);\
-   ld  r1,PACAR1(r13); \
-   std r3,ORIG_GPR3(r1);   /* Save original r3 */  \
-   li  r0,OPAL_HANDLE_HMI; /* Pass opal token argument*/   \
-   bl  opal_call_realmode; \
-   ld  r3,ORIG_GPR3(r1);   /* Restore original r3 */   \
-20:nop;
-
-
 _GLOBAL(power7_wakeup_tb_loss)
ld  r2,PACATOC(r13);
ld  r1,PACAR1(r13)
-- 
2.4.11



[PATCH v2 0/9] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states

2016-05-03 Thread Shreyas B. Prabhu
POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) A new instruction named stop is added. This instruction replaces
    instructions like nap, sleep and rvwinkle.
 b) A new per-thread SPR named PSSCR is added, which controls the
    behavior of the stop instruction.

PSSCR has the following key fields:

Bits 0:3 - Power-Saving Level Status
    Indicates the lowest power-saving state the thread entered since
    the stop instruction was last executed.

Bit 42 - Enable State Loss
    0 - No state is lost irrespective of other fields
    1 - Allows state loss

Bits 44:47 - Power-Saving Level Limit
    Limits the power-saving level that can be entered.

Bits 60:63 - Requested Level
    Specifies which power-saving level must be entered on executing
    the stop instruction.
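
As a rough illustration of the field layout above, the following C sketch
pulls the four fields out of a 64-bit PSSCR value. It assumes IBM bit
numbering (bit 0 is the most significant bit), so a field ending at bit b
sits at shift 63 - b. The helper names are made up for this note and are
not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: extract the IBM-numbered bit range first..last
 * (bit 0 = most significant bit) from a 64-bit register value.
 */
static inline uint64_t psscr_field(uint64_t psscr, unsigned first, unsigned last)
{
	unsigned width;
	uint64_t mask;

	assert(first <= last && last <= 63);
	width = last - first + 1;
	mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
	return (psscr >> (63 - last)) & mask;
}

/* Field accessors following the bit ranges listed in the cover letter. */
static inline uint64_t psscr_pls(uint64_t psscr)  { return psscr_field(psscr, 0, 3); }
static inline uint64_t psscr_esl(uint64_t psscr)  { return psscr_field(psscr, 42, 42); }
static inline uint64_t psscr_psll(uint64_t psscr) { return psscr_field(psscr, 44, 47); }
static inline uint64_t psscr_rl(uint64_t psscr)   { return psscr_field(psscr, 60, 63); }
```

With this numbering, the Requested Level is simply the low four bits of the
register and the Power-Saving Level Status is the top four bits.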

Stop idle states and their properties, such as name, latency, target
residency and PSSCR value, are exposed via the device tree.

This patch series adds support for this new mechanism.

Patches 1-6 are cleanups and code movement.
Patch 7 adds platform specific support for stop and psscr handling.
Patch 8 adds cpuidle driver support.
Patch 9 makes offlined cpu use stop state.

Changes in v2
=============
 - Rebased on v4.6-rc6
 - Using CPU_FTR_ARCH_300 bit instead of CPU_FTR_STOP_INST

Cc: Rafael J. Wysocki 
Cc: Daniel Lezcano 
Cc: linux...@vger.kernel.org
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Michael Neuling 
Cc: linuxppc-...@lists.ozlabs.org


Shreyas B. Prabhu (9):
  powerpc/powernv: Move CHECK_HMI_INTERRUPT to exception-64s header
  powerpc/kvm: make hypervisor state restore a function
  powerpc/powernv: Move idle code usable by multiple hardware to common
location
  powerpc/powernv: Make power7_powersave_common more generic
  powerpc/powernv: Move idle related macros to cpuidle.h
  powerpc/powernv: set power_save func after the idle states are
initialized
  powerpc/powernv: Add platform support for stop instruction
  cpuidle/powernv: Add support for POWER ISA v3 idle states
  powerpc/powernv: Use deepest stop state when cpu is offlined

 arch/powerpc/include/asm/cpuidle.h|  29 
 arch/powerpc/include/asm/exception-64s.h  |  18 +++
 arch/powerpc/include/asm/kvm_book3s_asm.h |   2 +-
 arch/powerpc/include/asm/machdep.h|   1 +
 arch/powerpc/include/asm/opal-api.h   |  11 +-
 arch/powerpc/include/asm/paca.h   |   4 +
 arch/powerpc/include/asm/ppc-opcode.h |   4 +
 arch/powerpc/include/asm/processor.h  |   1 +
 arch/powerpc/include/asm/reg.h|  11 ++
 arch/powerpc/kernel/Makefile  |   2 +
 arch/powerpc/kernel/asm-offsets.c |   4 +
 arch/powerpc/kernel/exceptions-64s.S  |  29 +---
 arch/powerpc/kernel/idle_power7.S | 212 +++-
 arch/powerpc/kernel/idle_power_common.S   | 185 +
 arch/powerpc/kernel/idle_power_stop.S | 221 ++
 arch/powerpc/platforms/Kconfig|   4 +
 arch/powerpc/platforms/powernv/Kconfig|   1 +
 arch/powerpc/platforms/powernv/idle.c |  94 +++--
 arch/powerpc/platforms/powernv/powernv.h  |   1 +
 arch/powerpc/platforms/powernv/setup.c|   2 +-
 arch/powerpc/platforms/powernv/smp.c  |   4 +-
 drivers/cpuidle/cpuidle-powernv.c |  57 +++-
 22 files changed, 659 insertions(+), 238 deletions(-)
 create mode 100644 arch/powerpc/kernel/idle_power_common.S
 create mode 100644 arch/powerpc/kernel/idle_power_stop.S

-- 
2.4.11



[PATCH v2 6/9] powerpc/powernv: set power_save func after the idle states are initialized

2016-05-03 Thread Shreyas B. Prabhu
pnv_init_idle_states discovers the supported idle states from the
device tree and does the required initialization. Set the power_save
function pointer only after this initialization is done.
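
The idea can be sketched in plain C as follows. This is only a model of the
ordering, not the kernel code: struct machdep_sketch, nap_idle, do_idle and
PM_NAP_ENABLED are hypothetical stand-ins, and the sketch assumes the idle
loop checks the hook for NULL before calling it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the "nap supported" flag. */
#define PM_NAP_ENABLED 0x1u

struct machdep_sketch {
	void (*power_save)(void);	/* NULL until idle states are known */
};

static int idle_calls;

static void nap_idle(void)
{
	idle_calls++;
}

/* Analogue of the discovery step: install the hook only if nap is supported. */
static void init_idle_states(struct machdep_sketch *md, unsigned supported)
{
	assert(md != NULL);
	if (supported & PM_NAP_ENABLED)
		md->power_save = nap_idle;
}

/* Analogue of the idle loop: returns 0 when it had to use a fallback. */
static int do_idle(const struct machdep_sketch *md)
{
	if (!md->power_save)
		return 0;	/* no hook installed yet */
	md->power_save();
	return 1;
}
```

Until init_idle_states() has run, do_idle() never calls into an idle routine
whose prerequisites might not be set up.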

Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/platforms/powernv/idle.c  | 3 +++
 arch/powerpc/platforms/powernv/setup.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index fcc8b68..fbb09fb 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -285,6 +285,9 @@ static int __init pnv_init_idle_states(void)
}
 
pnv_alloc_idle_core_states();
+
+   if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
+   ppc_md.power_save = power7_idle;
 out_free:
kfree(flags);
 out:
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 1acb0c7..c9685b6 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -312,7 +312,7 @@ define_machine(powernv) {
.get_proc_freq  = pnv_get_proc_freq,
.progress   = pnv_progress,
.machine_shutdown   = pnv_shutdown,
-   .power_save = power7_idle,
+   .power_save = NULL,
.calibrate_decr = generic_calibrate_decr,
 #ifdef CONFIG_KEXEC
.kexec_cpu_down = pnv_kexec_cpu_down,
-- 
2.4.11



[PATCH v2 7/9] powerpc/powernv: Add platform support for stop instruction

2016-05-03 Thread Shreyas B. Prabhu
POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) A new instruction named stop is added. This instruction replaces
    instructions like nap, sleep and rvwinkle.
 b) A new per-thread SPR named PSSCR is added, which controls the
    behavior of the stop instruction.

PSSCR has the following key fields:

Bits 0:3 - Power-Saving Level Status
    Indicates the lowest power-saving state the thread entered since
    the stop instruction was last executed.

Bit 42 - Enable State Loss
    0 - No state is lost irrespective of other fields
    1 - Allows state loss

Bits 44:47 - Power-Saving Level Limit
    Limits the power-saving level that can be entered.

Bits 60:63 - Requested Level
    Specifies which power-saving level must be entered on executing
    the stop instruction.

This patch adds support for the stop instruction and PSSCR handling.

Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/include/asm/cpuidle.h|   2 +
 arch/powerpc/include/asm/kvm_book3s_asm.h |   2 +-
 arch/powerpc/include/asm/machdep.h|   1 +
 arch/powerpc/include/asm/opal-api.h   |  11 +-
 arch/powerpc/include/asm/paca.h   |   4 +
 arch/powerpc/include/asm/ppc-opcode.h |   4 +
 arch/powerpc/include/asm/processor.h  |   1 +
 arch/powerpc/include/asm/reg.h|  11 ++
 arch/powerpc/kernel/Makefile  |   1 +
 arch/powerpc/kernel/asm-offsets.c |   4 +
 arch/powerpc/kernel/idle_power7.S |   2 +-
 arch/powerpc/kernel/idle_power_common.S   |  26 +++-
 arch/powerpc/kernel/idle_power_stop.S | 221 ++
 arch/powerpc/platforms/Kconfig|   4 +
 arch/powerpc/platforms/powernv/Kconfig|   1 +
 arch/powerpc/platforms/powernv/idle.c |  80 +--
 16 files changed, 358 insertions(+), 17 deletions(-)
 create mode 100644 arch/powerpc/kernel/idle_power_stop.S

diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index faa97b7..6d20583 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -13,6 +13,8 @@
 #ifndef __ASSEMBLY__
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
+
+extern u64 pnv_first_deep_stop_state;
 #endif
 
 #endif
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 72b6225..d318d43 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -162,7 +162,7 @@ struct kvmppc_book3s_shadow_vcpu {
 
 /* Values for kvm_state */
 #define KVM_HWTHREAD_IN_KERNEL 0
-#define KVM_HWTHREAD_IN_NAP1
+#define KVM_HWTHREAD_IN_IDLE   1
 #define KVM_HWTHREAD_IN_KVM2
 
 #endif /* __ASM_KVM_BOOK3S_ASM_H__ */
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index fd22442..ca4b116 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -261,6 +261,7 @@ struct machdep_calls {
 extern void e500_idle(void);
 extern void power4_idle(void);
 extern void power7_idle(void);
+extern void power_stop0(void);
 extern void ppc6xx_idle(void);
 extern void book3e_idle(void);
 
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index f8faaae..3b978ba 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -162,13 +162,20 @@
 
 /* Device tree flags */
 
-/* Flags set in power-mgmt nodes in device tree if
- * respective idle states are supported in the platform.
+/*
+ * Flags set in power-mgmt nodes in device tree describing
+ * idle states that are supported in the platform.
  */
+
+#define OPAL_PM_TIMEBASE_STOP  0x0002
+#define OPAL_PM_LOSE_HYP_CONTEXT   0x2000
+#define OPAL_PM_LOSE_FULL_CONTEXT  0x4000
 #define OPAL_PM_NAP_ENABLED0x0001
 #define OPAL_PM_SLEEP_ENABLED  0x0002
 #define OPAL_PM_WINKLE_ENABLED 0x0004
 #define OPAL_PM_SLEEP_ENABLED_ER1  0x0008 /* with workaround */
+#define OPAL_PM_STOP_INST_FAST 0x0010
+#define OPAL_PM_STOP_INST_DEEP 0x0020
 
 /*
  * OPAL_CONFIG_CPU_IDLE_STATE parameters
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 546540b..bf48b7e 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -171,6 +171,10 @@ struct paca_struct {
/* Mask to denote subcore sibling threads */
u8 subcore_sibling_mask;
 #endif
+#ifdef CONFIG_PPC_STOP_INST
+/* Template for PSSCR with EC, ESL, TR, PSLL, MTL fields set */
+   u64 thread_psscr;
+#endif
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* Exclusive emergency stack pointer for machine check exception. */
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 7ab04fc..f66747f 100644
--- a/arch/powerpc/include

[PATCH v2 9/9] powerpc/powernv: Use deepest stop state when cpu is offlined

2016-05-03 Thread Shreyas B. Prabhu
If hardware supports stop state, use the deepest stop state when
the cpu is offlined.
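
A minimal sketch of how the two levels could be picked from the device tree
data, written as standalone C rather than kernel code. It assumes the
Requested Level is the low four bits of each PSSCR value (bits 60:63 in IBM
numbering); struct stop_levels and pick_stop_levels are hypothetical names,
and LOSE_FULL_CONTEXT only mirrors the flag value shown in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOSE_FULL_CONTEXT 0x4000u	/* mirrors OPAL_PM_LOSE_FULL_CONTEXT above */
#define RL_MASK           0xFull	/* assumed: Requested Level in the low 4 bits */

struct stop_levels {
	uint64_t first_deep;	/* shallowest level that loses full context */
	uint64_t deepest;	/* deepest level advertised at all */
};

static struct stop_levels pick_stop_levels(const uint32_t *flags,
					   const uint64_t *psscr, size_t n)
{
	struct stop_levels out = { .first_deep = 0xF, .deepest = 0 };

	assert(n == 0 || (flags != NULL && psscr != NULL));
	for (size_t i = 0; i < n; i++) {
		uint64_t rl = psscr[i] & RL_MASK;

		if ((flags[i] & LOSE_FULL_CONTEXT) && rl < out.first_deep)
			out.first_deep = rl;
		if (rl > out.deepest)
			out.deepest = rl;
	}
	return out;
}
```

An offlined cpu would then request out.deepest, while the wakeup path
compares the level it woke from against out.first_deep.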

Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/platforms/powernv/idle.c| 15 +--
 arch/powerpc/platforms/powernv/powernv.h |  1 +
 arch/powerpc/platforms/powernv/smp.c |  4 +++-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 45717ab..cce4780 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -240,6 +240,11 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600,
  */
 u64 pnv_first_deep_stop_state;
 
+/*
+ * Deepest stop idle state. Used when a cpu is offlined
+ */
+u64 pnv_deepest_stop_state;
+
 static int __init pnv_init_idle_states(void)
 {
struct device_node *power_mgt;
@@ -286,8 +291,11 @@ static int __init pnv_init_idle_states(void)
}
 
/*
-* Set pnv_first_deep_stop_state to the first stop level
-* to cause hypervisor state loss
+* Set pnv_first_deep_stop_state and pnv_deepest_stop_state.
+* pnv_first_deep_stop_state should be set to the first stop
+* level to cause hypervisor state loss.
+* pnv_deepest_stop_state should be set to the deepest
+* stop state.
 */
pnv_first_deep_stop_state = 0xF;
for (i = 0; i < dt_idle_states; i++) {
@@ -296,6 +304,9 @@ static int __init pnv_init_idle_states(void)
if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
 (pnv_first_deep_stop_state > psscr_rl))
pnv_first_deep_stop_state = psscr_rl;
+
+   if (pnv_deepest_stop_state < psscr_rl)
+   pnv_deepest_stop_state = psscr_rl;
}
}
 
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 6dbc0a1..da7c843 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -18,6 +18,7 @@ static inline void pnv_pci_shutdown(void) { }
 #endif
 
 extern u32 pnv_get_supported_cpuidle_states(void);
+extern u64 pnv_deepest_stop_state;
 
 extern void pnv_lpc_init(void);
 
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index ad7b1a3..f69ceb6 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -182,7 +182,9 @@ static void pnv_smp_cpu_kill_self(void)
 
ppc64_runlatch_off();
 
-   if (idle_states & OPAL_PM_WINKLE_ENABLED)
+   if (cpu_has_feature(CPU_FTR_ARCH_300))
+   srr1 = power_stop(pnv_deepest_stop_state);
+   else if (idle_states & OPAL_PM_WINKLE_ENABLED)
srr1 = power7_winkle();
else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
(idle_states & OPAL_PM_SLEEP_ENABLED_ER1))
-- 
2.4.11



[PATCH v2 5/9] powerpc/powernv: Move idle related macros to cpuidle.h

2016-05-03 Thread Shreyas B. Prabhu
Move idle related macros to a common location asm/cpuidle.h so that
they can be used for stop instruction support.

Signed-off-by: Shreyas B. Prabhu
---
 arch/powerpc/include/asm/cpuidle.h | 27 +++
 arch/powerpc/kernel/idle_power7.S  | 26 --
 2 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index d2f99ca..faa97b7 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -17,4 +17,31 @@ extern u32 pnv_fastsleep_workaround_at_exit[];
 
 #endif
 
+/* Idle state entry routines */
+#ifdef CONFIG_PPC_P7_NAP
+#define IDLE_STATE_ENTER_SEQ(IDLE_INST) \
+   /* Magic NAP/SLEEP/WINKLE mode enter sequence */\
+   std r0,0(r1);   \
+   ptesync;\
+   ld  r0,0(r1);   \
+1: cmp cr0,r0,r0;  \
+   bne 1b; \
+   IDLE_INST;  \
+   b   .
+#endif /* CONFIG_PPC_P7_NAP */
+
+/*
+ * Use unused space in the interrupt stack to save and restore
+ * registers for deep-idle support.
+ */
+#define _SDR1  GPR3
+#define _RPR   GPR4
+#define _SPURR GPR5
+#define _PURR  GPR6
+#define _TSCR  GPR7
+#define _DSCR  GPR8
+#define _AMOR  GPR9
+#define _WORT  GPR10
+#define _WORC  GPR11
+
 #endif
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 1ea71d4..6a24769 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -24,32 +24,6 @@
 
 #undef DEBUG
 
-/*
- * Use unused space in the interrupt stack to save and restore
- * registers for winkle support.
- */
-#define _SDR1  GPR3
-#define _RPR   GPR4
-#define _SPURR GPR5
-#define _PURR  GPR6
-#define _TSCR  GPR7
-#define _DSCR  GPR8
-#define _AMOR  GPR9
-#define _WORT  GPR10
-#define _WORC  GPR11
-
-/* Idle state entry routines */
-
-#define IDLE_STATE_ENTER_SEQ(IDLE_INST) \
-   /* Magic NAP/SLEEP/WINKLE mode enter sequence */\
-   std r0,0(r1);   \
-   ptesync;\
-   ld  r0,0(r1);   \
-1: cmp cr0,r0,r0;  \
-   bne 1b; \
-   IDLE_INST;  \
-   b   .
-
.text
 
 /*
-- 
2.4.11



[PATCH v2 2/9] powerpc/kvm: make hypervisor state restore a function

2016-05-03 Thread Shreyas B. Prabhu
In the current code, when a thread wakes up in the reset vector, some
of the state restore code and the check for whether the thread needs
to branch to kvm are duplicated. Reorder the code so that this
duplication is avoided.

At a high level, this is what the change looks like:

Before this patch -
power7_wakeup_tb_loss:
    restore hypervisor state
    if (thread needed by kvm)
        goto kvm_start_guest
    restore nvgprs, cr, pc
    rfid to process context

power7_wakeup_loss:
    restore nvgprs, cr, pc
    rfid to process context

reset vector:
    if (waking from deep idle states)
        goto power7_wakeup_tb_loss
    else
        if (thread needed by kvm)
            goto kvm_start_guest
        goto power7_wakeup_loss

After this patch -
power7_wakeup_tb_loss:
    restore hypervisor state
    return

power7_restore_hyp_resource():
    if (waking from deep idle states)
        goto power7_wakeup_tb_loss
    return

power7_wakeup_loss:
    restore nvgprs, cr, pc
    rfid to process context

reset vector:
    power7_restore_hyp_resource()
    if (thread needed by kvm)
        goto kvm_start_guest
    goto power7_wakeup_loss
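
The two flows above can be modelled in C to check that the reordering does
not change behavior. This is a toy model, not the assembly: the enum, the
struct and the function names are hypothetical, and each flow is reduced to
"was hypervisor state restored" and "where does the thread end up".

```c
#include <assert.h>
#include <stdbool.h>

enum wake_target { TO_KVM_GUEST, TO_PROCESS_CONTEXT };

struct wake_result {
	bool hyp_state_restored;
	enum wake_target target;
};

/* Flow before the patch: the KVM check appears on both paths. */
static struct wake_result wake_before(bool deep_idle, bool kvm_needs_thread)
{
	struct wake_result r = { false, TO_PROCESS_CONTEXT };

	if (deep_idle) {		/* power7_wakeup_tb_loss */
		r.hyp_state_restored = true;
		if (kvm_needs_thread)
			r.target = TO_KVM_GUEST;
		return r;
	}
	if (kvm_needs_thread)		/* duplicated check in the reset vector */
		r.target = TO_KVM_GUEST;
	return r;
}

/* Analogue of power7_restore_hyp_resource(): restore only after deep idle. */
static bool restore_hyp_resource(bool deep_idle)
{
	return deep_idle;
}

/* Flow after the patch: one restore call, one KVM check. */
static struct wake_result wake_after(bool deep_idle, bool kvm_needs_thread)
{
	struct wake_result r;

	r.hyp_state_restored = restore_hyp_resource(deep_idle);
	r.target = kvm_needs_thread ? TO_KVM_GUEST : TO_PROCESS_CONTEXT;
	return r;
}

/* Both flows must agree for every combination of inputs. */
static void check_flows_agree(void)
{
	for (int d = 0; d <= 1; d++) {
		for (int k = 0; k <= 1; k++) {
			struct wake_result a = wake_before(d, k);
			struct wake_result b = wake_after(d, k);

			assert(a.hyp_state_restored == b.hyp_state_restored);
			assert(a.target == b.target);
		}
	}
}
```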

Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/kernel/exceptions-64s.S | 29 +++-
 arch/powerpc/kernel/idle_power7.S| 67 
 2 files changed, 41 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 7716ceb..7ebfbb0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -107,25 +107,8 @@ BEGIN_FTR_SECTION
beq 9f
 
cmpwi   cr3,r13,2
+   bl  power7_restore_hyp_resource
 
-   /*
-* Check if last bit of HSPGR0 is set. This indicates whether we are
-* waking up from winkle.
-*/
-   GET_PACA(r13)
-   clrldi  r5,r13,63
-   clrrdi  r13,r13,1
-   cmpwi   cr4,r5,1
-   mtspr   SPRN_HSPRG0,r13
-
-   lbz r0,PACA_THREAD_IDLE_STATE(r13)
-   cmpwi   cr2,r0,PNV_THREAD_NAP
-   bgt cr2,8f  /* Either sleep or Winkle */
-
-   /* Waking up from nap should not cause hypervisor state loss */
-   bgt cr3,.
-
-   /* Waking up from nap */
li  r0,PNV_THREAD_RUNNING
stb r0,PACA_THREAD_IDLE_STATE(r13)  /* Clear thread state */
 
@@ -143,13 +126,9 @@ BEGIN_FTR_SECTION
 
/* Return SRR1 from power7_nap() */
mfspr   r3,SPRN_SRR1
-   beq cr3,2f
-   b   power7_wakeup_noloss
-2: b   power7_wakeup_loss
-
-   /* Fast Sleep wakeup on PowerNV */
-8: GET_PACA(r13)
-   b   power7_wakeup_tb_loss
+   blt cr3,2f
+   b   power7_wakeup_loss
+2: b   power7_wakeup_noloss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 6b3404b..82d164b 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -258,6 +258,35 @@ _GLOBAL(power7_winkle)
b   power7_powersave_common
/* No return */
 
+/*
+ * Called from reset vector. Check whether we have woken up with
+ * hypervisor state loss. If yes, restore hypervisor state and return
+ * back to reset vector.
+ */
+_GLOBAL(power7_restore_hyp_resource)
+   /*
+* Check if last bit of HSPGR0 is set. This indicates whether we are
+* waking up from winkle.
+*/
+   GET_PACA(r13)
+   clrldi  r5,r13,63
+   clrrdi  r13,r13,1
+   cmpwi   cr4,r5,1
+   mtspr   SPRN_HSPRG0,r13
+
+   lbz r0,PACA_THREAD_IDLE_STATE(r13)
+   cmpwi   cr2,r0,PNV_THREAD_NAP
+   bgt cr2,power7_wakeup_tb_loss   /* Either sleep or Winkle */
+
+   /*
+* We fall through here if PACA_THREAD_IDLE_STATE shows we are waking
+* up from nap. At this stage CR3 shouldn't contains 'gt' since that
+* indicates we are waking with hypervisor state loss from nap.
+*/
+   bgt cr3,.
+
+   blr
+
 _GLOBAL(power7_wakeup_tb_loss)
ld  r2,PACATOC(r13);
ld  r1,PACAR1(r13)
@@ -266,11 +295,13 @@ _GLOBAL(power7_wakeup_tb_loss)
 * and they are restored before switching to the process context. Hence
 * until they are restored, they are free to be used.
 *
-* Save SRR1 in a NVGPR as it might be clobbered in opal_call_realmode
-* (called in CHECK_HMI_INTERRUPT). SRR1 is required to determine the
-* wakeup reason if we branch to kvm_start_guest.
+* Save SRR1 and LR in NVGPRs as they might be clobbered in
+* opal_call_realmode (called in CHECK_HMI_INTERRUPT). SRR1 is required
+* to determine the wakeup reason if we branch to kvm_start_guest. LR
+* is req

[PATCH v2 4/9] powerpc/powernv: Make power7_powersave_common more generic

2016-05-03 Thread Shreyas B. Prabhu
power7_powersave_common performs the common steps needed before
entering an idle state, then changes MSR to MSR_IDLE and does an rfid
to power7_enter_nap_mode.

Make it more generic by passing the rfid address as a function parameter.
Also give the function a more generic name.

Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/kernel/idle_power7.S   | 11 +++
 arch/powerpc/kernel/idle_power_common.S | 11 ++-
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 594e1c5..1ea71d4 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -169,19 +169,22 @@ _GLOBAL(power7_idle)
 _GLOBAL(power7_nap)
mr  r4,r3
li  r3,PNV_THREAD_NAP
-   b   power7_powersave_common
+   LOAD_REG_ADDR(r5, power7_enter_nap_mode)
+   b   power_powersave_common
/* No return */
 
 _GLOBAL(power7_sleep)
li  r3,PNV_THREAD_SLEEP
li  r4,1
-   b   power7_powersave_common
+   LOAD_REG_ADDR(r5, power7_enter_nap_mode)
+   b   power_powersave_common
/* No return */
 
 _GLOBAL(power7_winkle)
-   li  r3,3
+   li  r3,PNV_THREAD_WINKLE
li  r4,1
-   b   power7_powersave_common
+   LOAD_REG_ADDR(r5, power7_enter_nap_mode)
+   b   power_powersave_common
/* No return */
 
 _GLOBAL(power7_wakeup_tb_loss)
diff --git a/arch/powerpc/kernel/idle_power_common.S b/arch/powerpc/kernel/idle_power_common.S
index 05954ae..ff7a541 100644
--- a/arch/powerpc/kernel/idle_power_common.S
+++ b/arch/powerpc/kernel/idle_power_common.S
@@ -21,8 +21,10 @@
  * To check IRQ_HAPPENED in r4
  * 0 - don't check
  * 1 - check
+ *
+ * Address to 'rfid' to is passed in r5
  */
-_GLOBAL(power7_powersave_common)
+_GLOBAL(power_powersave_common)
/* Use r3 to pass state nap/sleep/winkle */
/* NAP is a state loss, we create a regs frame on the
 * stack, fill it up with the state we care about and
@@ -79,13 +81,12 @@ _GLOBAL(power7_powersave_common)
 * because as soon as we do that, another thread can switch
 * the MMU context to the guest.
 */
-   LOAD_REG_IMMEDIATE(r5, MSR_IDLE)
+   LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
li  r6, MSR_RI
andc    r6, r9, r6
-   LOAD_REG_ADDR(r7, power7_enter_nap_mode)
mtmsrd  r6, 1   /* clear RI before setting SRR0/1 */
-   mtspr   SPRN_SRR0, r7
-   mtspr   SPRN_SRR1, r5
+   mtspr   SPRN_SRR0, r5
+   mtspr   SPRN_SRR1, r7
rfid
/* No return */
 
-- 
2.4.11



[PATCH v2 3/9] powerpc/powernv: Move idle code usable by multiple hardware to common location

2016-05-03 Thread Shreyas B. Prabhu
CPU-idle related code in idle_power7.S, such as the context save/restore
functions, can be reused for adding stop instruction support. Move this
code to a new commonly accessible location.

Signed-off-by: Shreyas B. Prabhu 
---
 arch/powerpc/kernel/Makefile|   1 +
 arch/powerpc/kernel/idle_power7.S   | 144 
 arch/powerpc/kernel/idle_power_common.S | 160 
 3 files changed, 161 insertions(+), 144 deletions(-)
 create mode 100644 arch/powerpc/kernel/idle_power_common.S

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2da380f..b877b84 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_PPC64)   += vdso64/
 obj-$(CONFIG_ALTIVEC)  += vecemu.o
 obj-$(CONFIG_PPC_970_NAP)  += idle_power4.o
 obj-$(CONFIG_PPC_P7_NAP)   += idle_power7.o
+obj-$(CONFIG_PPC_POWERNV)  += idle_power_common.o
 procfs-y   := proc_powerpc.o
 obj-$(CONFIG_PROC_FS)  += $(procfs-y)
 rtaspci-$(CONFIG_PPC64)-$(CONFIG_PCI)  := rtas_pci.o
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 82d164b..594e1c5 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -68,80 +68,6 @@ core_idle_lock_held:
lwarx   r15,0,r14
blr
 
-/*
- * Pass requested state in r3:
- * r3 - PNV_THREAD_NAP/SLEEP/WINKLE
- *
- * To check IRQ_HAPPENED in r4
- * 0 - don't check
- * 1 - check
- */
-_GLOBAL(power7_powersave_common)
-   /* Use r3 to pass state nap/sleep/winkle */
-   /* NAP is a state loss, we create a regs frame on the
-* stack, fill it up with the state we care about and
-* stick a pointer to it in PACAR1. We really only
-* need to save PC, some CR bits and the NV GPRs,
-* but for now an interrupt frame will do.
-*/
-   mflr    r0
-   std r0,16(r1)
-   stdu    r1,-INT_FRAME_SIZE(r1)
-   std r0,_LINK(r1)
-   std r0,_NIP(r1)
-
-   /* Hard disable interrupts */
-   mfmsr   r9
-   rldicl  r9,r9,48,1
-   rotldi  r9,r9,16
-   mtmsrd  r9,1/* hard-disable interrupts */
-
-   /* Check if something happened while soft-disabled */
-   lbz r0,PACAIRQHAPPENED(r13)
-   andi.   r0,r0,~PACA_IRQ_HARD_DIS@l
-   beq 1f
-   cmpwi   cr0,r4,0
-   beq 1f
-   addi    r1,r1,INT_FRAME_SIZE
-   ld  r0,16(r1)
-   li  r3,0    /* Return 0 (no nap) */
-   mtlr    r0
-   blr
-
-1: /* We mark irqs hard disabled as this is the state we'll
-* be in when returning and we need to tell arch_local_irq_restore()
-* about it
-*/
-   li  r0,PACA_IRQ_HARD_DIS
-   stb r0,PACAIRQHAPPENED(r13)
-
-   /* We haven't lost state ... yet */
-   li  r0,0
-   stb r0,PACA_NAPSTATELOST(r13)
-
-   /* Continue saving state */
-   SAVE_GPR(2, r1)
-   SAVE_NVGPRS(r1)
-   mfcr    r4
-   std r4,_CCR(r1)
-   std r9,_MSR(r1)
-   std r1,PACAR1(r13)
-
-   /*
-* Go to real mode to do the nap, as required by the architecture.
-* Also, we need to be in real mode before setting hwthread_state,
-* because as soon as we do that, another thread can switch
-* the MMU context to the guest.
-*/
-   LOAD_REG_IMMEDIATE(r5, MSR_IDLE)
-   li  r6, MSR_RI
-   andc    r6, r9, r6
-   LOAD_REG_ADDR(r7, power7_enter_nap_mode)
-   mtmsrd  r6, 1   /* clear RI before setting SRR0/1 */
-   mtspr   SPRN_SRR0, r7
-   mtspr   SPRN_SRR1, r5
-   rfid
-
.globl  power7_enter_nap_mode
 power7_enter_nap_mode:
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -258,35 +184,6 @@ _GLOBAL(power7_winkle)
b   power7_powersave_common
/* No return */
 
-/*
- * Called from reset vector. Check whether we have woken up with
- * hypervisor state loss. If yes, restore hypervisor state and return
- * back to reset vector.
- */
-_GLOBAL(power7_restore_hyp_resource)
-   /*
-* Check if last bit of HSPGR0 is set. This indicates whether we are
-* waking up from winkle.
-*/
-   GET_PACA(r13)
-   clrldi  r5,r13,63
-   clrrdi  r13,r13,1
-   cmpwi   cr4,r5,1
-   mtspr   SPRN_HSPRG0,r13
-
-   lbz r0,PACA_THREAD_IDLE_STATE(r13)
-   cmpwi   cr2,r0,PNV_THREAD_NAP
-   bgt cr2,power7_wakeup_tb_loss   /* Either sleep or Winkle */
-
-   /*
-* We fall through here if PACA_THREAD_IDLE_STATE shows we are waking
-* up from nap. At this stage CR3 shouldn't contains 'gt' since that
-* indicates we are waking with hypervisor state loss from nap.
-*/
-   bgt cr3,.
-
-   blr
-
 _GLOBAL(power7_wakeup_tb_loss)
ld  r2,PACATOC(r13);
ld 

[PATCH v3 2/9] powerpc/powernv: Rename idle_power7.S to idle_power_common.S

2016-05-23 Thread Shreyas B. Prabhu
idle_power7.S handles idle entry/exit for POWER7 and POWER8, and in the
next patch for POWER9. Rename the file to a non-hardware-specific
name.

Signed-off-by: Shreyas B. Prabhu 
---
Changes in v3:
==============
 - Instead of moving a few common functions from idle_power7.S to
   idle_power_common.S, rename idle_power7.S to idle_power_common.S.

 arch/powerpc/kernel/Makefile|   2 +-
 arch/powerpc/kernel/idle_power7.S   | 527 
 arch/powerpc/kernel/idle_power_common.S | 527 
 3 files changed, 528 insertions(+), 528 deletions(-)
 delete mode 100644 arch/powerpc/kernel/idle_power7.S
 create mode 100644 arch/powerpc/kernel/idle_power_common.S

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2da380f..99116da 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -47,7 +47,7 @@ obj-$(CONFIG_PPC_BOOK3E_64)   += exceptions-64e.o idle_book3e.o
 obj-$(CONFIG_PPC64)+= vdso64/
 obj-$(CONFIG_ALTIVEC)  += vecemu.o
 obj-$(CONFIG_PPC_970_NAP)  += idle_power4.o
-obj-$(CONFIG_PPC_P7_NAP)   += idle_power7.o
+obj-$(CONFIG_PPC_P7_NAP)   += idle_power_common.o
 procfs-y   := proc_powerpc.o
 obj-$(CONFIG_PROC_FS)  += $(procfs-y)
 rtaspci-$(CONFIG_PPC64)-$(CONFIG_PCI)  := rtas_pci.o
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
deleted file mode 100644
index db59613..000
--- a/arch/powerpc/kernel/idle_power7.S
+++ /dev/null
@@ -1,527 +0,0 @@
-/*
- *  This file contains the power_save function for Power7 CPUs.
- *
- *  This program is free software; you can redistribute it and/or
- *  modify it under the terms of the GNU General Public License
- *  as published by the Free Software Foundation; either version
- *  2 of the License, or (at your option) any later version.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#undef DEBUG
-
-/*
- * Use unused space in the interrupt stack to save and restore
- * registers for winkle support.
- */
-#define _SDR1  GPR3
-#define _RPR   GPR4
-#define _SPURR GPR5
-#define _PURR  GPR6
-#define _TSCR  GPR7
-#define _DSCR  GPR8
-#define _AMOR  GPR9
-#define _WORT  GPR10
-#define _WORC  GPR11
-
-/* Idle state entry routines */
-
-#define IDLE_STATE_ENTER_SEQ(IDLE_INST) \
-   /* Magic NAP/SLEEP/WINKLE mode enter sequence */\
-   std r0,0(r1);   \
-   ptesync;\
-   ld  r0,0(r1);   \
-1: cmp cr0,r0,r0;  \
-   bne 1b; \
-   IDLE_INST;  \
-   b   .
-
-   .text
-
-/*
- * Used by threads when the lock bit of core_idle_state is set.
- * Threads will spin in HMT_LOW until the lock bit is cleared.
- * r14 - pointer to core_idle_state
- * r15 - used to load contents of core_idle_state
- */
-
-core_idle_lock_held:
-   HMT_LOW
-3: lwz r15,0(r14)
-   andi.   r15,r15,PNV_CORE_IDLE_LOCK_BIT
-   bne 3b
-   HMT_MEDIUM
-   lwarx   r15,0,r14
-   blr
-
-/*
- * Pass requested state in r3:
- * r3 - PNV_THREAD_NAP/SLEEP/WINKLE
- *
- * To check IRQ_HAPPENED in r4
- * 0 - don't check
- * 1 - check
- */
-_GLOBAL(power7_powersave_common)
-   /* Use r3 to pass state nap/sleep/winkle */
-   /* NAP is a state loss, we create a regs frame on the
-* stack, fill it up with the state we care about and
-* stick a pointer to it in PACAR1. We really only
-* need to save PC, some CR bits and the NV GPRs,
-* but for now an interrupt frame will do.
-*/
-   mflr    r0
-   std r0,16(r1)
-   stdu    r1,-INT_FRAME_SIZE(r1)
-   std r0,_LINK(r1)
-   std r0,_NIP(r1)
-
-   /* Hard disable interrupts */
-   mfmsr   r9
-   rldicl  r9,r9,48,1
-   rotldi  r9,r9,16
-   mtmsrd  r9,1/* hard-disable interrupts */
-
-   /* Check if something happened while soft-disabled */
-   lbz r0,PACAIRQHAPPENED(r13)
-   andi.   r0,r0,~PACA_IRQ_HARD_DIS@l
-   beq 1f
-   cmpwi   cr0,r4,0
-   beq 1f
-   addi    r1,r1,INT_FRAME_SIZE
-   ld  r0,16(r1)
-   li  r3,0    /* Return 0 (no nap) */
-   mtlr    r0
-   blr
-
-1: /* We mark irqs hard disabled as this is the state we'll
-* be in when returning and we need to tell arch_local_irq_restore()
-* about it
-*/
-   li  r0,PACA_IRQ_HARD_DIS
-   stb r0,PACAIRQHAPPENED(r13)
-
-   /* We haven't lost state ... yet */
-   li  r0,0
-   stb r0,PACA_N

[PATCH v3 0/9] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states

2016-05-23 Thread Shreyas B. Prabhu
POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) A new instruction named stop is added. This instruction replaces
    instructions like nap, sleep and rvwinkle.
 b) A new per-thread SPR named PSSCR is added, which controls the
    behavior of the stop instruction.

PSSCR has the following key fields:

Bits 0:3 - Power-Saving Level Status
    Indicates the lowest power-saving state the thread entered since
    the stop instruction was last executed.

Bit 42 - Enable State Loss
    0 - No state is lost irrespective of other fields
    1 - Allows state loss

Bits 44:47 - Power-Saving Level Limit
    Limits the power-saving level that can be entered.

Bits 60:63 - Requested Level
    Specifies which power-saving level must be entered on executing
    the stop instruction.

Stop idle states and their properties, such as name, latency, target
residency and PSSCR value, are exposed via the device tree.

This patch series adds support for this new mechanism.

Patches 1-6 are cleanups and code movement.
Patch 7 adds platform specific support for stop and psscr handling.
Patch 8 adds cpuidle driver support.
Patch 9 makes offlined cpu use deepest stop state.

Changes in v3
=============
 - Rebased on powerpc-next
 - Dropped patch 1 since we are not adding a new file for P9 idle support
 - Improved comments in multiple places
 - Moved GET_PACA from power7_restore_hyp_resource to System Reset
 - Instead of moving a few functions from idle_power7 to idle_power_common,
   renamed idle_power7.S to idle_power_common.S
 - Moved the HSTATE_HWTHREAD_STATE update to power_powersave_common
 - Dropped the earlier patch 5, which moved a few macros from
   idle_power_common to asm/cpuidle.h
 - Added a patch to rename reusable power7_* idle functions to pnv_*
 - Added a new patch that creates an abstraction for saving SPRs before
   entering deep idle states
 - Instead of introducing a new file idle_power_stop.S, P9 idle support
   is added to idle_power_common.S using CPU_FTR sections
 - Fixed r4 reg clobbering in power_stop0

Changes in v2
=============
 - Rebased on v4.6-rc6
 - Using CPU_FTR_ARCH_300 bit instead of CPU_FTR_STOP_INST

Cc: Rafael J. Wysocki 
Cc: Daniel Lezcano 
Cc: linux...@vger.kernel.org
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Michael Neuling 
Cc: linuxppc-...@lists.ozlabs.org

Shreyas B. Prabhu (9):
  powerpc/powernv: Move CHECK_HMI_INTERRUPT to exception-64s header
  powerpc/kvm: make hypervisor state restore a function
  powerpc/powernv: Rename idle_power7.S to idle_power_common.S
  powerpc/powernv: Make power7_powersave_common more generic
  powerpc/powernv: abstraction for saving SPRs before entering deep idle
states
  powerpc/powernv: set power_save func after the idle states are
initialized
  powerpc/powernv: Add platform support for stop instruction
  cpuidle/powernv: Add support for POWER ISA v3 idle states
  powerpc/powernv: Use deepest stop state when cpu is offlined

 arch/powerpc/include/asm/cpuidle.h|   2 +
 arch/powerpc/include/asm/exception-64s.h  |  18 +
 arch/powerpc/include/asm/kvm_book3s_asm.h |   2 +-
 arch/powerpc/include/asm/machdep.h|   1 +
 arch/powerpc/include/asm/opal-api.h   |  11 +-
 arch/powerpc/include/asm/paca.h   |   2 +
 arch/powerpc/include/asm/ppc-opcode.h |   4 +
 arch/powerpc/include/asm/processor.h  |   1 +
 arch/powerpc/include/asm/reg.h|  11 +
 arch/powerpc/kernel/Makefile  |   2 +-
 arch/powerpc/kernel/asm-offsets.c |   2 +
 arch/powerpc/kernel/exceptions-64s.S  |  28 +-
 arch/powerpc/kernel/idle_power7.S | 515 
 arch/powerpc/kernel/idle_power_common.S   | 642 ++
 arch/powerpc/platforms/powernv/idle.c |  96 -
 arch/powerpc/platforms/powernv/powernv.h  |   1 +
 arch/powerpc/platforms/powernv/setup.c|   2 +-
 arch/powerpc/platforms/powernv/smp.c  |   4 +-
 drivers/cpuidle/cpuidle-powernv.c |  57 ++-
 19 files changed, 843 insertions(+), 558 deletions(-)
 delete mode 100644 arch/powerpc/kernel/idle_power7.S
 create mode 100644 arch/powerpc/kernel/idle_power_common.S

-- 
2.4.11



[PATCH v3 5/9] powerpc/powernv: abstraction for saving SPRs before entering deep idle states

2016-05-23 Thread Shreyas B. Prabhu
Create a function for saving SPRs before entering deep idle states.
This function can be reused for POWER9 deep idle states.

Signed-off-by: Shreyas B. Prabhu 
---
New in v3

 arch/powerpc/kernel/idle_power_common.S | 54 +++--
 1 file changed, 32 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/idle_power_common.S b/arch/powerpc/kernel/idle_power_common.S
index d100577..d931537 100644
--- a/arch/powerpc/kernel/idle_power_common.S
+++ b/arch/powerpc/kernel/idle_power_common.S
@@ -52,6 +52,36 @@
    .text
 
 /*
+ * Used by threads before entering deep idle states. Saves SPRs
+ * in interrupt stack frame
+ */
+save_sprs_to_stack:
+   /*
+* Note all register i.e per-core, per-subcore or per-thread is saved
+* here since any thread in the core might wake up first
+*/
+   mfspr   r3,SPRN_SDR1
+   std r3,_SDR1(r1)
+   mfspr   r3,SPRN_RPR
+   std r3,_RPR(r1)
+   mfspr   r3,SPRN_SPURR
+   std r3,_SPURR(r1)
+   mfspr   r3,SPRN_PURR
+   std r3,_PURR(r1)
+   mfspr   r3,SPRN_TSCR
+   std r3,_TSCR(r1)
+   mfspr   r3,SPRN_DSCR
+   std r3,_DSCR(r1)
+   mfspr   r3,SPRN_AMOR
+   std r3,_AMOR(r1)
+   mfspr   r3,SPRN_WORT
+   std r3,_WORT(r1)
+   mfspr   r3,SPRN_WORC
+   std r3,_WORC(r1)
+
+   blr
+
+/*
  * Used by threads when the lock bit of core_idle_state is set.
  * Threads will spin in HMT_LOW until the lock bit is cleared.
  * r14 - pointer to core_idle_state
@@ -207,28 +237,8 @@ fastsleep_workaround_at_entry:
    b   common_enter
 
 enter_winkle:
-   /*
-* Note all register i.e per-core, per-subcore or per-thread is saved
-* here since any thread in the core might wake up first
-*/
-   mfspr   r3,SPRN_SDR1
-   std r3,_SDR1(r1)
-   mfspr   r3,SPRN_RPR
-   std r3,_RPR(r1)
-   mfspr   r3,SPRN_SPURR
-   std r3,_SPURR(r1)
-   mfspr   r3,SPRN_PURR
-   std r3,_PURR(r1)
-   mfspr   r3,SPRN_TSCR
-   std r3,_TSCR(r1)
-   mfspr   r3,SPRN_DSCR
-   std r3,_DSCR(r1)
-   mfspr   r3,SPRN_AMOR
-   std r3,_AMOR(r1)
-   mfspr   r3,SPRN_WORT
-   std r3,_WORT(r1)
-   mfspr   r3,SPRN_WORC
-   std r3,_WORC(r1)
+   bl  save_sprs_to_stack
+
    IDLE_STATE_ENTER_SEQ(PPC_WINKLE)
 
 _GLOBAL(power7_idle)
-- 
2.4.11


