Re: [PATCH 3/3] powerpc/pseries: Support compression of oops text via pstore

2013-06-25 Thread Aruna Balakrishnaiah

Hi Kees,

On Monday 24 June 2013 11:27 PM, Kees Cook wrote:

On Sun, Jun 23, 2013 at 11:23 PM, Aruna Balakrishnaiah
 wrote:

The patch set supports compression of oops messages while writing to NVRAM,
which helps capture more oops data in lnx,oops-log. The pstore file for oops
messages will be in decompressed format, making it readable.

In case compression fails, the patch copies the header added by pstore and
the last oops_data_sz bytes of big_oops_buf to NVRAM, so that we still have
the most recent oops messages in lnx,oops-log.

In case decompression fails, the oops file will be absent, but the files for
the other partitions (in /dev/pstore) will still be present.
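
For reference, the compression side (zip_oops(), presumably added earlier in
the series and visible here only in the hunk header below) uses the kernel
zlib deflate API in the mirror-image pattern. A minimal sketch, assuming the
static z_stream "stream" and the COMPR_LEVEL, WINDOW_BITS and MEM_LEVEL
constants are set up elsewhere in nvram.c; the real code may differ in
detail:

/*
 * Sketch of the deflate counterpart to nvram_decompress() below.
 * Assumes "stream" has a workspace allocated with
 * zlib_deflate_workspacesize() at init time.
 */
static int nvram_compress(const void *in, void *out, size_t inlen,
                          size_t outlen)
{
        int err, ret;

        ret = -EIO;
        err = zlib_deflateInit2(&stream, COMPR_LEVEL, Z_DEFLATED,
                                WINDOW_BITS, MEM_LEVEL,
                                Z_DEFAULT_STRATEGY);
        if (err != Z_OK)
                goto error;

        stream.next_in = in;
        stream.avail_in = inlen;
        stream.total_in = 0;
        stream.next_out = out;
        stream.avail_out = outlen;
        stream.total_out = 0;

        err = zlib_deflate(&stream, Z_FINISH);
        if (err != Z_STREAM_END)
                goto error;

        err = zlib_deflateEnd(&stream);
        if (err != Z_OK)
                goto error;

        ret = stream.total_out;
error:
        return ret;
}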

Signed-off-by: Aruna Balakrishnaiah 
---
  arch/powerpc/platforms/pseries/nvram.c |  132 +---
  1 file changed, 118 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/nvram.c 
b/arch/powerpc/platforms/pseries/nvram.c
index 0159d74..b5ba5e2 100644
--- a/arch/powerpc/platforms/pseries/nvram.c
+++ b/arch/powerpc/platforms/pseries/nvram.c
@@ -539,6 +539,65 @@ static int zip_oops(size_t text_len)
  }

  #ifdef CONFIG_PSTORE
+/* Derived from logfs_uncompress */
+int nvram_decompress(void *in, void *out, size_t inlen, size_t outlen)
+{
+   int err, ret;
+
+   ret = -EIO;
+   err = zlib_inflateInit(&stream);
+   if (err != Z_OK)
+   goto error;
+
+   stream.next_in = in;
+   stream.avail_in = inlen;
+   stream.total_in = 0;
+   stream.next_out = out;
+   stream.avail_out = outlen;
+   stream.total_out = 0;
+
+   err = zlib_inflate(&stream, Z_FINISH);
+   if (err != Z_STREAM_END)
+   goto error;
+
+   err = zlib_inflateEnd(&stream);
+   if (err != Z_OK)
+   goto error;
+
+   ret = stream.total_out;
+error:
+   return ret;
+}
+
+static int unzip_oops(char *oops_buf, char *big_buf)
+{
+   struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf;
+   u64 timestamp = oops_hdr->timestamp;
+   char *big_oops_data = NULL;
+   char *oops_data_buf = NULL;
+   size_t big_oops_data_sz;
+   int unzipped_len;
+
+   big_oops_data = big_buf + sizeof(struct oops_log_info);
+   big_oops_data_sz = big_oops_buf_sz - sizeof(struct oops_log_info);
+   oops_data_buf = oops_buf + sizeof(struct oops_log_info);
+
+   unzipped_len = nvram_decompress(oops_data_buf, big_oops_data,
+   oops_hdr->report_length,
+   big_oops_data_sz);
+
+   if (unzipped_len < 0) {
+   pr_err("nvram: decompression failed; returned %d\n",
+   unzipped_len);
+   return -1;
+   }
+   oops_hdr = (struct oops_log_info *)big_buf;
+   oops_hdr->version = OOPS_HDR_VERSION;
+   oops_hdr->report_length = (u16) unzipped_len;
+   oops_hdr->timestamp = timestamp;
+   return 0;
+}
+
  static int nvram_pstore_open(struct pstore_info *psi)
  {
 /* Reset the iterator to start reading partitions again */
@@ -567,6 +626,7 @@ static int nvram_pstore_write(enum pstore_type_id type,
 size_t size, struct pstore_info *psi)
  {
 int rc;
+   unsigned int err_type = ERR_TYPE_KERNEL_PANIC;
 struct oops_log_info *oops_hdr = (struct oops_log_info *) oops_buf;

 /* part 1 has the recent messages from printk buffer */
@@ -577,8 +637,31 @@ static int nvram_pstore_write(enum pstore_type_id type,
 oops_hdr->version = OOPS_HDR_VERSION;
 oops_hdr->report_length = (u16) size;
 oops_hdr->timestamp = get_seconds();
+
+   if (big_oops_buf) {
+   rc = zip_oops(size);
+   /*
+* If compression fails copy recent log messages from
+* big_oops_buf to oops_data.
+*/
+   if (rc != 0) {
+   int hsize = pstore_get_header_size();

I think I would rather see the API to pstore_write() changed to
include explicit details about header sizes. Making hsize a global
seems unwise, since it's not strictly going to be a constant value. It
could change between calls to the writer, for example.


Introducing a header size argument in the pstore_write() API would need
changes at the multiple places where it's being called. The idea is to
move the compression support into the pstore infrastructure so that other
platforms can also make use of it. Once the compression support gets in,
the header size argument in pstore_write() would have to be deprecated.

Until the compression support for pstore goes in, can't we call
pstore_get_header_size() before every write call to know the header size?
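
For illustration, a sketch of what the compression-failure fallback could
look like if the header size were passed in explicitly, as Kees suggests.
The helper and its names are hypothetical, not the in-tree API:

/*
 * Hypothetical sketch only: copy the pstore-added header plus the
 * most recent oops text into a smaller destination buffer, with
 * hsize passed per call instead of read from a global. Assumes
 * dst_sz >= hsize and src_len >= hsize.
 */
static void copy_recent_oops(char *dst, size_t dst_sz,
                             const char *src, size_t src_len,
                             size_t hsize)
{
        size_t tail = dst_sz - hsize;

        if (src_len - hsize < tail)
                tail = src_len - hsize;

        memcpy(dst, src, hsize);                         /* pstore header */
        memcpy(dst + hsize, src + src_len - tail, tail); /* newest text   */
}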


Beyond that, this all seems sensible, though it would be kind of cool
to move this compression logic into the pstore core so it would get
used by default (or through a module parameter).
-Kees


+   size_t diff = size - oo

Re: [PATCH 04/10] powerpc/eeh: Backends to get/set settings

2013-06-25 Thread Gavin Shan
On Tue, Jun 25, 2013 at 04:07:24PM +1000, Benjamin Herrenschmidt wrote:
>On Tue, 2013-06-25 at 13:55 +0800, Gavin Shan wrote:
>> When the PHB gets fenced, the hardware returns 0xFF's from PCI config
>> space and MMIO space. The operations writing to them should be dropped.
>> The patch introduces backends that allow setting/getting flags which
>> indicate that access to PCI-CFG and MMIO should be blocked.
>
>We can't block MMIO without massive overhead. Config space can be
>blocked inside the firmware, can't it ?
>

Yep. The config space has been blocked on fenced PHB by firmware. I
almost forgot that (struct p7ioc_phb::use_asb) :-)

Thanks,
Gavin

>
>> Signed-off-by: Gavin Shan 
>> ---
>>  arch/powerpc/include/asm/eeh.h   |6 +++
>>  arch/powerpc/platforms/pseries/eeh_pseries.c |   44 
>> ++
>>  2 files changed, 50 insertions(+), 0 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>> index dd65e31..de821c1 100644
>> --- a/arch/powerpc/include/asm/eeh.h
>> +++ b/arch/powerpc/include/asm/eeh.h
>> @@ -131,6 +131,10 @@ static inline struct pci_dev *eeh_dev_to_pci_dev(struct 
>> eeh_dev *edev)
>>  #define EEH_LOG_TEMP1   /* EEH temporary error log  
>> */
>>  #define EEH_LOG_PERM2   /* EEH permanent error log  
>> */
>>  
>> +/* Settings for platforms */
>> +#define EEH_SETTING_BLOCK_CFG   1   /* Blocked PCI config access
>> */
>> +#define EEH_SETTING_BLOCK_IO2   /* Blocked MMIO access  
>> */
>> +
>>  struct eeh_ops {
>>  char *name;
>>  int (*init)(void);
>> @@ -146,6 +150,8 @@ struct eeh_ops {
>>  int (*configure_bridge)(struct eeh_pe *pe);
>>  int (*read_config)(struct device_node *dn, int where, int size, u32 
>> *val);
>>  int (*write_config)(struct device_node *dn, int where, int size, u32 
>> val);
>> +int (*get_setting)(int option, int *value, void *data);
>> +int (*set_setting)(int option, int value, void *data);
>>  int (*next_error)(struct eeh_pe **pe);
>>  };
>>  
>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
>> b/arch/powerpc/platforms/pseries/eeh_pseries.c
>> index 62415f2..8c9509b 100644
>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>> @@ -612,6 +612,48 @@ static int pseries_eeh_write_config(struct device_node 
>> *dn, int where, int size,
>>  return rtas_write_config(pdn, where, size, val);
>>  }
>>  
>> +/**
>> + * pseries_eeh_get_setting - Retrieve settings that affect EEH core
>> + * @option: option
>> + * @value: value
>> + * @data: dependent data
>> + *
>> + * Retrieve the settings from the platform in order to affect the
>> + * behaviour of EEH core. We don't block PCI config or MMIO access
>> + * on pSeries platform.
>> + */
>> +static int pseries_eeh_get_setting(int option, int *value, void *data)
>> +{
>> +int ret = 0;
>> +
>> +switch (option) {
>> +case EEH_SETTING_BLOCK_CFG:
>> +case EEH_SETTING_BLOCK_IO:
>> +*value = 0;
>> +break;
>> +default:
>> +pr_warning("%s: Unrecognized option (%d)\n",
>> +   __func__, option);
>> +ret = -EINVAL;
>> +}
>> +
>> +return ret;
>> +}
>> +
>> +/**
>> + * pseries_eeh_set_setting - Configure settings to affect EEH core
>> + * @option: option
>> + * @value: value
>> + * @data: dependent data
>> + *
>> + * Configure the settings for the platform in order to affect the
>> + * behaviour of EEH core.
>> + */
>> +static int pseries_eeh_set_setting(int option, int value, void *data)
>> +{
>> +return 0;
>> +}
>> +
>>  static struct eeh_ops pseries_eeh_ops = {
>>  .name   = "pseries",
>>  .init   = pseries_eeh_init,
>> @@ -626,6 +668,8 @@ static struct eeh_ops pseries_eeh_ops = {
>>  .configure_bridge   = pseries_eeh_configure_bridge,
>>  .read_config= pseries_eeh_read_config,
>>  .write_config   = pseries_eeh_write_config,
>> +.get_setting= pseries_eeh_get_setting,
>> +.set_setting= pseries_eeh_set_setting,
>>  .next_error = NULL
>>  };
>>  
>
>
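
To make the intended use of the new backends concrete, here is a usage
sketch (a hypothetical caller, not part of the posted series) of how the
EEH core could consult the flag before touching config space:

/*
 * Hypothetical sketch: gate config space reads on the platform's
 * EEH_SETTING_BLOCK_CFG flag via the proposed get_setting() backend.
 */
static int eeh_read_config_checked(struct device_node *dn,
                                   int where, int size, u32 *val)
{
        int blocked = 0;

        if (eeh_ops->get_setting)
                eeh_ops->get_setting(EEH_SETTING_BLOCK_CFG, &blocked, NULL);

        if (blocked) {
                *val = ~0;      /* fenced PHB reads as all 1's */
                return 0;
        }

        return eeh_ops->read_config(dn, where, size, val);
}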



Re: Regression in RCU subsystem in latest mainline kernel

2013-06-25 Thread Michael Ellerman
On Tue, Jun 18, 2013 at 09:09:06PM -0700, Paul E. McKenney wrote:
> On Mon, Jun 17, 2013 at 05:42:13PM +1000, Michael Ellerman wrote:
> > On Sat, Jun 15, 2013 at 12:02:21PM +1000, Benjamin Herrenschmidt wrote:
> > > On Fri, 2013-06-14 at 17:06 -0400, Steven Rostedt wrote:
> > > > I was pretty much able to reproduce this on my PA Semi PPC box. Funny
> > > > thing is, when I type on the console, it makes progress. Anyway, it
> > > > seems that powerpc has an issue with irq_work(). I'll try to get some
> > > > time either tonight or next week to figure it out.
> > > 
> > > Does this help ?
> > > 
> > > diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> > > index 5cbcf4d..ea185e0 100644
> > > --- a/arch/powerpc/kernel/irq.c
> > > +++ b/arch/powerpc/kernel/irq.c
> > > @@ -162,7 +162,7 @@ notrace unsigned int __check_irq_replay(void)
> > >* in case we also had a rollover while hard disabled
> > >*/
> > >   local_paca->irq_happened &= ~PACA_IRQ_DEC;
> > > - if (decrementer_check_overflow())
> > > + if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
> > >   return 0x900;
> > >  
> > >   /* Finally check if an external interrupt happened */
> > > 
> > 
> > This seems to help, but doesn't eliminate the RCU stall warnings I am
> > seeing. I now see them less often, but not never.
> > 
> > Stack trace is something like:
 
Hi Paul,

Sorry I've been distracted with other stuff the last week.

> Hmmm...  How many CPUs are on your system?  And how much work is
> perf_event_for_each_child() having to do here?

I'm not 100% sure which system this trace is from. But it would have
~100-128 cpus.

I don't think perf_event_for_each_child() is doing much, there should
only be a single event and the smp_call_function_single() should be
degrading to a local function call.

> If the amount of work is large and your kernel is built with
> CONFIG_PREEMPT=n, the RCU CPU stall warning would be expected behavior.
> If so, we might need a preemption point in perf_event_for_each_child().

I'm using CONFIG_PREEMPT_NONE=y, which I think is what you mean.
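
For reference, the preemption point Paul suggests would amount to
something like this sketch (illustrative only; the in-tree
perf_event_for_each_child() in kernel/events/core.c may differ in detail):

/*
 * Sketch: let the scheduler and RCU make progress while iterating a
 * potentially long child list on a CONFIG_PREEMPT_NONE kernel.
 */
static void perf_event_for_each_child(struct perf_event *event,
                                      void (*func)(struct perf_event *))
{
        struct perf_event *child;

        mutex_lock(&event->child_mutex);
        func(event);
        list_for_each_entry(child, &event->child_list, child_list) {
                func(child);
                cond_resched();         /* hypothetical preemption point */
        }
        mutex_unlock(&event->child_mutex);
}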

Here's another trace from 3.10-rc7 plus a few local patches.

We suspect that the perf enable could be causing a flood of interrupts,
but why that's clogging things up so badly, who knows.

INFO: rcu_sched self-detected stall on CPU { 38}  (t=2600 jiffies g=1 c=0 q=9)
cpu 0x26: Vector: 0  at [c007ed952b60]
pc: c014f500: .rcu_check_callbacks+0x400/0x8e0
lr: c014f500: .rcu_check_callbacks+0x400/0x8e0
sp: c007ed952cd0
   msr: 90009032
  current = 0xc007ed8b4a80
  paca= 0xcfdcab00   softe: 0irq_happened: 0x00
pid   = 2492, comm = power8-events
enter ? for help
[c007ed952e00] c00a3e88 .update_process_times+0x48/0xa0
[c007ed952e90] c00fd600 .tick_sched_handle.isra.13+0x40/0xd0
[c007ed952f20] c00fd8b4 .tick_sched_timer+0x64/0xa0
[c007ed952fc0] c00ca074 .__run_hrtimer+0x94/0x250
[c007ed953060] c00cb0f8 .hrtimer_interrupt+0x138/0x3a0
[c007ed953150] c001ef54 .timer_interrupt+0x124/0x2f0
[c007ed953200] c000a5fc restore_check_irq_replay+0x68/0xa8
--- Exception: 901 (Decrementer) at c00105ec 
.arch_local_irq_restore+0xc/0x10
[link register   ] c0096dac .__do_softirq+0x13c/0x380
[c007ed9534f0] c0096da0 .__do_softirq+0x130/0x380 (unreliable)
[c007ed953610] c0097228 .irq_exit+0xd8/0x120
[c007ed953690] c001ef88 .timer_interrupt+0x158/0x2f0
[c007ed953740] c000a5fc restore_check_irq_replay+0x68/0xa8
--- Exception: 901 (Decrementer) at c010e16c 
.smp_call_function_single+0x13c/0x230
[c007ed953a30] c0189c64 .task_function_call+0x54/0x70 (unreliable)
[c007ed953ad0] c0189d4c .perf_event_enable+0xcc/0x150
[c007ed953b70] c0187ea0 .perf_event_for_each_child+0x60/0x100
[c007ed953c00] c018c5e8 .perf_ioctl+0x108/0x3c0
[c007ed953ca0] c0226e94 .do_vfs_ioctl+0xc4/0x740
[c007ed953d90] c0227570 .SyS_ioctl+0x60/0xb0
[c007ed953e30] c0009e60 syscall_exit+0x0/0x98
--- Exception: c01 (System Call) at 1fee03d0
SP (3fffdf0d2700) is in userspace


cheers


Re: Regression in RCU subsystem in latest mainline kernel

2013-06-25 Thread Benjamin Herrenschmidt
On Tue, 2013-06-25 at 17:19 +1000, Michael Ellerman wrote:
> Here's another trace from 3.10-rc7 plus a few local patches.
> 
> We suspect that the perf enable could be causing a flood of
> interrupts, but why that's clogging things up so badly, who knows.

Additionally, perf interrupts being potentially NMIs, we might be hitting
a bad case of reentrancy in RCU ... hard to tell.

Cheers,
Ben.




Re: Regression in RCU subsystem in latest mainline kernel

2013-06-25 Thread Michael Ellerman
On Tue, Jun 25, 2013 at 05:19:14PM +1000, Michael Ellerman wrote:
> 
> Here's another trace from 3.10-rc7 plus a few local patches.

And here's another with CONFIG_RCU_CPU_STALL_INFO=y in case that's useful:

PASS running test_pmc5_6_overuse()
INFO: rcu_sched self-detected stall on CPU
8: (1 GPs behind) idle=8eb/142/0 softirq=215/220 
 (t=2100 jiffies g=18446744073709551583 c=18446744073709551582 q=13)
cpu 0x8: Vector: 0  at [c003ea03eae0]
pc: c011d9b0: .rcu_check_callbacks+0x450/0x910
lr: c011d9b0: .rcu_check_callbacks+0x450/0x910
sp: c003ea03ec40
   msr: 90009032
  current = 0xc003ebf9f4a0
  paca= 0xcfdc2400   softe: 0irq_happened: 0x00
pid   = 2444, comm = power8-events
enter ? for help
[c003ea03ed70] c0094cd0 .update_process_times+0x40/0x90
[c003ea03ee00] c00df050 .tick_sched_handle.isra.13+0x20/0xa0
[c003ea03ee80] c00df2bc .tick_sched_timer+0x5c/0xa0
[c003ea03ef20] c00b3728 .__run_hrtimer+0x98/0x260
[c003ea03efc0] c00b4738 .hrtimer_interrupt+0x138/0x3c0
[c003ea03f0d0] c001cd34 .timer_interrupt+0x124/0x2f0
[c003ea03f180] c000a4f4 restore_check_irq_replay+0x68/0xa8
--- Exception: 901 (Decrementer) at c0093ad4 
.run_timer_softirq+0x74/0x360
[c003ea03f580] c0089ac4 .__do_softirq+0x174/0x350
[c003ea03f6a0] c0089ea8 .irq_exit+0xb8/0x100
[c003ea03f720] c001cd68 .timer_interrupt+0x158/0x2f0
[c003ea03f7d0] c000a4f4 restore_check_irq_replay+0x68/0xa8
--- Exception: 901 (Decrementer) at c014a520 
.task_function_call+0x60/0x70
[c003ea03fac0] c014a634 .perf_event_enable+0x104/0x1c0 (unreliable)
[c003ea03fb70] c01495ec .perf_event_for_each_child+0x5c/0xf0
[c003ea03fc00] c014cd78 .perf_ioctl+0x108/0x400
[c003ea03fca0] c01d9aa0 .do_vfs_ioctl+0xb0/0x740
[c003ea03fd80] c01da188 .SyS_ioctl+0x58/0xb0
[c003ea03fe30] c0009d54 syscall_exit+0x0/0x98
--- Exception: c01 (System Call) at 1fee03d0
SP (35e7cc90) is in userspace


cheers


[PATCH 1/4] powerpc: Remove unreachable relocation on exception handlers

2013-06-25 Thread Michael Ellerman
We have relocation on exception handlers defined for h_data_storage and
h_instr_storage. However we will never take relocation on exceptions for
these because they can only come from a guest, and we never take
relocation on exceptions when we transition from guest to host.

We also have a handler for hmi_exception (Hypervisor Maintenance) which
is defined in the architecture to never be delivered with relocation on,
see v2.07 Book III-S section 6.5.

So remove the handlers, leaving a branch to self just to be double extra
paranoid.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/exceptions-64s.S |   18 +++---
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 40e4a17..0a9fdea 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -793,14 +793,10 @@ system_call_relon_pSeries:
STD_RELON_EXCEPTION_PSERIES(0x4d00, 0xd00, single_step)
 
. = 0x4e00
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXGEN)
-   b   h_data_storage_relon_hv
+   b   .   /* Can't happen, see v2.07 Book III-S section 6.5 */
 
. = 0x4e20
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXGEN)
-   b   h_instr_storage_relon_hv
+   b   .   /* Can't happen, see v2.07 Book III-S section 6.5 */
 
. = 0x4e40
SET_SCRATCH0(r13)
@@ -808,9 +804,7 @@ system_call_relon_pSeries:
b   emulation_assist_relon_hv
 
. = 0x4e60
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXGEN)
-   b   hmi_exception_relon_hv
+   b   .   /* Can't happen, see v2.07 Book III-S section 6.5 */
 
. = 0x4e80
SET_SCRATCH0(r13)
@@ -1180,14 +1174,8 @@ tm_unavailable_common:
 __end_handlers:
 
/* Equivalents to the above handlers for relocation-on interrupt 
vectors */
-   STD_RELON_EXCEPTION_HV_OOL(0xe00, h_data_storage)
-   KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe00)
-   STD_RELON_EXCEPTION_HV_OOL(0xe20, h_instr_storage)
-   KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe20)
STD_RELON_EXCEPTION_HV_OOL(0xe40, emulation_assist)
KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe40)
-   STD_RELON_EXCEPTION_HV_OOL(0xe60, hmi_exception)
-   KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe60)
MASKABLE_RELON_EXCEPTION_HV_OOL(0xe80, h_doorbell)
KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe80)
 
-- 
1.7.10.4



[PATCH 2/4] powerpc: Remove KVMTEST from RELON exception handlers

2013-06-25 Thread Michael Ellerman
KVMTEST is a macro which checks whether we are taking an exception from
guest context, if so we branch out of line and eventually call into the
KVM code to handle the switch.

When running real guests on bare metal (HV KVM) the hardware ensures
that we never take a relocation on exception when transitioning from
guest to host. For PR KVM we disable relocation on exceptions ourselves in
kvmppc_core_init_vm(), as of commit a413f47 "Disable relocation on
exceptions whenever PR KVM is active".

So convert all the RELON macros to use NOTEST, and drop the remaining
KVM_HANDLER() definitions we have for 0xe40 and 0xe80.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/exception-64s.h |8 
 arch/powerpc/kernel/exceptions-64s.S |2 --
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 46793b5..07ca627 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -358,12 +358,12 @@ label##_relon_pSeries:
\
/* No guest interrupts come through here */ \
SET_SCRATCH0(r13);  /* save r13 */  \
EXCEPTION_RELON_PROLOG_PSERIES(PACA_EXGEN, label##_common, \
-  EXC_STD, KVMTEST_PR, vec)
+  EXC_STD, NOTEST, vec)
 
 #define STD_RELON_EXCEPTION_PSERIES_OOL(vec, label)\
.globl label##_relon_pSeries;   \
 label##_relon_pSeries: \
-   EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, vec);\
+   EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, vec);\
EXCEPTION_RELON_PROLOG_PSERIES_1(label##_common, EXC_STD)
 
 #define STD_RELON_EXCEPTION_HV(loc, vec, label)\
@@ -374,12 +374,12 @@ label##_relon_hv: \
/* No guest interrupts come through here */ \
SET_SCRATCH0(r13);  /* save r13 */  \
EXCEPTION_RELON_PROLOG_PSERIES(PACA_EXGEN, label##_common, \
-  EXC_HV, KVMTEST, vec)
+  EXC_HV, NOTEST, vec)
 
 #define STD_RELON_EXCEPTION_HV_OOL(vec, label) \
.globl label##_relon_hv;\
 label##_relon_hv:  \
-   EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST, vec);   \
+   EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, vec);\
EXCEPTION_RELON_PROLOG_PSERIES_1(label##_common, EXC_HV)
 
 /* This associate vector numbers with bits in paca->irq_happened */
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 0a9fdea..6bd6763 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1175,9 +1175,7 @@ __end_handlers:
 
/* Equivalents to the above handlers for relocation-on interrupt 
vectors */
STD_RELON_EXCEPTION_HV_OOL(0xe40, emulation_assist)
-   KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe40)
MASKABLE_RELON_EXCEPTION_HV_OOL(0xe80, h_doorbell)
-   KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe80)
 
STD_RELON_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor)
STD_RELON_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable)
-- 
1.7.10.4



[PATCH 3/4] powerpc: Rename and flesh out the facility unavailable exception handler

2013-06-25 Thread Michael Ellerman
From: Michael Ellerman 

The exception at 0xf60 is not the TM (Transactional Memory) unavailable
exception; it is the "Facility Unavailable Exception", so rename it as
such.

Flesh out the handler to acknowledge the fact that it can be called for
many reasons, one of which is TM being unavailable.

Use STD_EXCEPTION_COMMON() for the exception body; for some reason we
had it open-coded. I've checked that the generated code is identical.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/exceptions-64s.S |   21 +++--
 arch/powerpc/kernel/traps.c  |   33 +
 2 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 6bd6763..d55a63c 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -341,10 +341,11 @@ vsx_unavailable_pSeries_1:
EXCEPTION_PROLOG_0(PACA_EXGEN)
b   vsx_unavailable_pSeries
 
+facility_unavailable_trampoline:
. = 0xf60
SET_SCRATCH0(r13)
EXCEPTION_PROLOG_0(PACA_EXGEN)
-   b   tm_unavailable_pSeries
+   b   facility_unavailable_pSeries
 
 #ifdef CONFIG_CBE_RAS
STD_EXCEPTION_HV(0x1200, 0x1202, cbe_system_error)
@@ -522,7 +523,7 @@ denorm_done:
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf20)
STD_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf40)
-   STD_EXCEPTION_PSERIES_OOL(0xf60, tm_unavailable)
+   STD_EXCEPTION_PSERIES_OOL(0xf60, facility_unavailable)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf60)
 
 /*
@@ -829,11 +830,11 @@ vsx_unavailable_relon_pSeries_1:
EXCEPTION_PROLOG_0(PACA_EXGEN)
b   vsx_unavailable_relon_pSeries
 
-tm_unavailable_relon_pSeries_1:
+facility_unavailable_relon_trampoline:
. = 0x4f60
SET_SCRATCH0(r13)
EXCEPTION_PROLOG_0(PACA_EXGEN)
-   b   tm_unavailable_relon_pSeries
+   b   facility_unavailable_relon_pSeries
 
STD_RELON_EXCEPTION_PSERIES(0x5300, 0x1300, instruction_breakpoint)
 #ifdef CONFIG_PPC_DENORMALISATION
@@ -1159,15 +1160,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
bl  .vsx_unavailable_exception
b   .ret_from_except
 
-   .align  7
-   .globl tm_unavailable_common
-tm_unavailable_common:
-   EXCEPTION_PROLOG_COMMON(0xf60, PACA_EXGEN)
-   bl  .save_nvgprs
-   DISABLE_INTS
-   addir3,r1,STACK_FRAME_OVERHEAD
-   bl  .tm_unavailable_exception
-   b   .ret_from_except
+   STD_EXCEPTION_COMMON(0xf60, facility_unavailable, 
.facility_unavailable_exception)
 
.align  7
.globl  __end_handlers
@@ -1180,7 +1173,7 @@ __end_handlers:
STD_RELON_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor)
STD_RELON_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable)
STD_RELON_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable)
-   STD_RELON_EXCEPTION_PSERIES_OOL(0xf60, tm_unavailable)
+   STD_RELON_EXCEPTION_PSERIES_OOL(0xf60, facility_unavailable)
 
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
 /*
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index c0e5caf..2053bbd 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1282,25 +1282,42 @@ void vsx_unavailable_exception(struct pt_regs *regs)
die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
 }
 
-void tm_unavailable_exception(struct pt_regs *regs)
+void facility_unavailable_exception(struct pt_regs *regs)
 {
+   static char *facility_strings[] = {
+   "FPU",
+   "VMX/VSX",
+   "DSCR",
+   "PMU SPRs",
+   "BHRB",
+   "TM",
+   "AT",
+   "EBB",
+   "TAR",
+   };
+   char *facility;
+   u64 value;
+
+   value = mfspr(SPRN_FSCR) >> 56;
+
/* We restore the interrupt state now */
if (!arch_irq_disabled_regs(regs))
local_irq_enable();
 
-   /* Currently we never expect a TMU exception.  Catch
-* this and kill the process!
-*/
-   printk(KERN_EMERG "Unexpected TM unavailable exception at %lx "
-  "(msr %lx)\n",
-  regs->nip, regs->msr);
+   if (value < ARRAY_SIZE(facility_strings))
+   facility = facility_strings[value];
+   else
+   facility = "unknown";
+
+   pr_err("Facility '%s' unavailable, exception at 0x%lx, MSR=%lx\n",
+   facility, regs->nip, regs->msr);
 
if (user_mode(regs)) {
_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
return;
}
 
-   die("Unexpected TM unavailable exception", regs, SIGABRT);
+   die("Unexpected facility unavailable exception", regs, SIGABRT);
 }
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-- 
1.7.10.4


[PATCH 4/4] powerpc: Wire up the HV facility unavailable exception

2013-06-25 Thread Michael Ellerman
Similar to the facility unavailable exception, except the facilities are
controlled by the HFSCR.

Adapt the facility_unavailable_exception() so it can be called for
either the regular or Hypervisor facility unavailable exceptions.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/exceptions-64s.S |   15 +++
 arch/powerpc/kernel/traps.c  |   16 
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index d55a63c..4e00d22 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -347,6 +347,12 @@ facility_unavailable_trampoline:
EXCEPTION_PROLOG_0(PACA_EXGEN)
b   facility_unavailable_pSeries
 
+hv_facility_unavailable_trampoline:
+   . = 0xf80
+   SET_SCRATCH0(r13)
+   EXCEPTION_PROLOG_0(PACA_EXGEN)
+   b   facility_unavailable_hv
+
 #ifdef CONFIG_CBE_RAS
STD_EXCEPTION_HV(0x1200, 0x1202, cbe_system_error)
KVM_HANDLER_SKIP(PACA_EXGEN, EXC_HV, 0x1202)
@@ -525,6 +531,8 @@ denorm_done:
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf40)
STD_EXCEPTION_PSERIES_OOL(0xf60, facility_unavailable)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf60)
+   STD_EXCEPTION_HV_OOL(0xf82, facility_unavailable)
+   KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xf82)
 
 /*
  * An interrupt came in while soft-disabled. We set paca->irq_happened, then:
@@ -836,6 +844,12 @@ facility_unavailable_relon_trampoline:
EXCEPTION_PROLOG_0(PACA_EXGEN)
b   facility_unavailable_relon_pSeries
 
+hv_facility_unavailable_relon_trampoline:
+   . = 0x4f80
+   SET_SCRATCH0(r13)
+   EXCEPTION_PROLOG_0(PACA_EXGEN)
+   b   facility_unavailable_relon_hv
+
STD_RELON_EXCEPTION_PSERIES(0x5300, 0x1300, instruction_breakpoint)
 #ifdef CONFIG_PPC_DENORMALISATION
. = 0x5500
@@ -1174,6 +1188,7 @@ __end_handlers:
STD_RELON_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable)
STD_RELON_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable)
STD_RELON_EXCEPTION_PSERIES_OOL(0xf60, facility_unavailable)
+   STD_RELON_EXCEPTION_HV_OOL(0xf80, facility_unavailable)
 
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
 /*
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 2053bbd..e4f205a 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1295,10 +1295,18 @@ void facility_unavailable_exception(struct pt_regs 
*regs)
"EBB",
"TAR",
};
-   char *facility;
+   char *facility, *prefix;
u64 value;
 
-   value = mfspr(SPRN_FSCR) >> 56;
+   if (regs->trap == 0xf60) {
+   value = mfspr(SPRN_FSCR);
+   prefix = "";
+   } else {
+   value = mfspr(SPRN_HFSCR);
+   prefix = "Hypervisor ";
+   }
+
+   value = value >> 56;
 
/* We restore the interrupt state now */
if (!arch_irq_disabled_regs(regs))
@@ -1309,8 +1317,8 @@ void facility_unavailable_exception(struct pt_regs *regs)
else
facility = "unknown";
 
-   pr_err("Facility '%s' unavailable, exception at 0x%lx, MSR=%lx\n",
-   facility, regs->nip, regs->msr);
+   pr_err("%sFacility '%s' unavailable, exception at 0x%lx, MSR=%lx\n",
+   prefix, facility, regs->nip, regs->msr);
 
if (user_mode(regs)) {
_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
-- 
1.7.10.4



Re: [PATCH 03/10] powerpc/eeh: Check PCIe link after reset

2013-06-25 Thread Gavin Shan
On Tue, Jun 25, 2013 at 04:06:24PM +1000, Benjamin Herrenschmidt wrote:
>On Tue, 2013-06-25 at 13:55 +0800, Gavin Shan wrote:
>> * don't touch the other command bits
>>  */
>> -   eeh_ops->read_config(dn, PCI_COMMAND, 4, &cmd);
>> -   if (edev->config_space[1] & PCI_COMMAND_PARITY)
>> -   cmd |= PCI_COMMAND_PARITY;
>> -   else
>> -   cmd &= ~PCI_COMMAND_PARITY;
>> -   if (edev->config_space[1] & PCI_COMMAND_SERR)
>> -   cmd |= PCI_COMMAND_SERR;
>> -   else
>> -   cmd &= ~PCI_COMMAND_SERR;
>> -   eeh_ops->write_config(dn, PCI_COMMAND, 4, cmd);
>> +   if (pdev) {
>> +   eeh_ops->write_config(dn, PCI_COMMAND, 4,
>> + edev->config_space[1]);
>> +   } else {
>
>That needs a much better comment. Why are you doing that instead
>of what's below ? In fact there is more to restore in a bridge
>right ? (windows etc...). Do you do that ? Should we just have a
>different function to restore a device vs. a bridge ?
>

Yeah, we should have a separate function to do that for bridges.
I'll do that in the next revision.

>I also don't see a need to do thing differently between phyp and
>powernv. Bridges inside partitions would suffer the same fate in
>both cases.
>

If we just have a complete reset for a fenced PHB, we need to restore it
from the cache (edev->config_space[1]) instead of reading it from the
hardware. A fenced PHB is the special case on PowerNV :-)

Thanks,
Gavin




Re: [PATCH 2/9] PTR_RET is now PTR_ERR_OR_ZERO(): Replace most.

2013-06-25 Thread Benjamin Herrenschmidt
On Sun, 2013-06-16 at 14:12 +0930, Rusty Russell wrote:
> Sweep of the simple cases.
> 
> Cc: net...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: Julia Lawall 
> Signed-off-by: Rusty Russell 

Acked-by: Benjamin Herrenschmidt 

> ---
>  arch/arm/mach-omap2/i2c.c |  2 +-
>  arch/m68k/amiga/platform.c|  2 +-
>  arch/m68k/kernel/time.c   |  2 +-
>  arch/m68k/q40/config.c|  2 +-
>  arch/powerpc/kernel/iommu.c   |  2 +-
>  arch/powerpc/kernel/time.c|  2 +-
>  arch/powerpc/platforms/ps3/time.c |  2 +-
>  arch/powerpc/sysdev/rtc_cmos_setup.c  |  2 +-
>  arch/s390/hypfs/hypfs_dbfs.c  |  2 +-
>  drivers/char/tile-srom.c  |  2 +-
>  drivers/infiniband/core/cma.c |  2 +-
>  drivers/net/appletalk/cops.c  |  2 +-
>  drivers/net/appletalk/ltpc.c  |  2 +-
>  drivers/net/ethernet/amd/atarilance.c |  2 +-
>  drivers/net/ethernet/amd/mvme147.c|  2 +-
>  drivers/net/ethernet/amd/ni65.c   |  2 +-
>  drivers/net/ethernet/amd/sun3lance.c  |  2 +-
>  drivers/net/wireless/brcm80211/brcmfmac/dhd_dbg.c |  2 +-
>  drivers/net/wireless/brcm80211/brcmsmac/debug.c   |  2 +-
>  drivers/platform/x86/samsung-q10.c|  2 +-
>  drivers/regulator/fan53555.c  |  2 +-
>  drivers/spi/spi-fsl-spi.c |  2 +-
>  drivers/spi/spidev.c  |  2 +-
>  drivers/video/omap2/dss/core.c|  2 +-
>  fs/btrfs/dev-replace.c|  2 +-
>  fs/btrfs/inode.c  |  2 +-
>  net/bluetooth/hci_sysfs.c |  2 +-
>  net/bridge/netfilter/ebtable_broute.c |  2 +-
>  net/bridge/netfilter/ebtable_filter.c |  2 +-
>  net/bridge/netfilter/ebtable_nat.c|  2 +-
>  net/ipv4/netfilter/arptable_filter.c  |  2 +-
>  net/ipv4/netfilter/iptable_filter.c   |  2 +-
>  net/ipv4/netfilter/iptable_mangle.c   |  2 +-
>  net/ipv4/netfilter/iptable_nat.c  |  2 +-
>  net/ipv4/netfilter/iptable_raw.c  |  2 +-
>  net/ipv4/netfilter/iptable_security.c |  2 +-
>  net/ipv6/netfilter/ip6table_filter.c  |  2 +-
>  net/ipv6/netfilter/ip6table_mangle.c  |  2 +-
>  net/ipv6/netfilter/ip6table_nat.c |  2 +-
>  net/ipv6/netfilter/ip6table_raw.c |  2 +-
>  net/ipv6/netfilter/ip6table_security.c|  2 +-
>  scripts/coccinelle/api/ptr_ret.cocci  | 10 +-
>  sound/soc/soc-io.c|  2 +-
>  43 files changed, 47 insertions(+), 47 deletions(-)
> 
> diff --git a/arch/arm/mach-omap2/i2c.c b/arch/arm/mach-omap2/i2c.c
> index d940e53..b456b44 100644
> --- a/arch/arm/mach-omap2/i2c.c
> +++ b/arch/arm/mach-omap2/i2c.c
> @@ -181,7 +181,7 @@ int __init omap_i2c_add_bus(struct 
> omap_i2c_bus_platform_data *i2c_pdata,
>sizeof(struct omap_i2c_bus_platform_data));
>   WARN(IS_ERR(pdev), "Could not build omap_device for %s\n", name);
>  
> - return PTR_RET(pdev);
> + return PTR_ERR_OR_ZERO(pdev);
>  }
>  
>  static  int __init omap_i2c_cmdline(void)
> diff --git a/arch/m68k/amiga/platform.c b/arch/m68k/amiga/platform.c
> index 6083088..dacd9f9 100644
> --- a/arch/m68k/amiga/platform.c
> +++ b/arch/m68k/amiga/platform.c
> @@ -56,7 +56,7 @@ static int __init amiga_init_bus(void)
>   n = AMIGAHW_PRESENT(ZORRO3) ? 4 : 2;
>   pdev = platform_device_register_simple("amiga-zorro", -1,
>  zorro_resources, n);
> - return PTR_RET(pdev);
> + return PTR_ERR_OR_ZERO(pdev);
>  }
>  
>  subsys_initcall(amiga_init_bus);
> diff --git a/arch/m68k/kernel/time.c b/arch/m68k/kernel/time.c
> index bea6bcf..7eb9792 100644
> --- a/arch/m68k/kernel/time.c
> +++ b/arch/m68k/kernel/time.c
> @@ -90,7 +90,7 @@ static int __init rtc_init(void)
>   return -ENODEV;
>  
>   pdev = platform_device_register_simple("rtc-generic", -1, NULL, 0);
> - return PTR_RET(pdev);
> + return PTR_ERR_OR_ZERO(pdev);
>  }
>  
>  module_init(rtc_init);
> diff --git a/arch/m68k/q40/config.c b/arch/m68k/q40/config.c
> index 658542b..078bb74 100644
> --- a/arch/m68k/q40/config.c
> +++ b/arch/m68k/q40/config.c
> @@ -338,6 +338,6 @@ static __init int q40_add_kbd_device(void)
>   return -ENODEV;
>  
>   pdev = platform_device_register_simple("q40kbd", -1, NULL, 0);
> - return PTR_RET(pdev);
> + return PTR_ERR_OR_ZERO(pdev);
>  }
>  arch_initcall(q40_add_kbd_device);
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index c

[git pull] Please pull powerpc.git merge branch

2013-06-25 Thread Benjamin Herrenschmidt
Hi Linus !

This is a fix for a regression causing Freescale 83xx-based platforms to
crash on boot due to some PCI breakage. Please apply.

Cheers,
Ben.

The following changes since commit 17858ca65eef148d335ffd4cfc09228a1c1cbfb5:

  Merge tag 'please-pull-fixia64' of 
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux (2013-06-18 06:29:19 
-1000)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

for you to fetch changes up to b37e161388ac3980d5dfb73050e85874b84253eb:

  powerpc/pci: Fix boot panic on mpc83xx (regression) (2013-06-24 16:54:09 
-0500)


Rojhalat Ibrahim (1):
  powerpc/pci: Fix boot panic on mpc83xx (regression)

 arch/powerpc/sysdev/fsl_pci.c |   24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)




Re: [PATCH 03/10] powerpc/eeh: Check PCIe link after reset

2013-06-25 Thread Benjamin Herrenschmidt
On Tue, 2013-06-25 at 15:47 +0800, Gavin Shan wrote:
> If we just have a complete reset for a fenced PHB, we need to restore it
> from the cache (edev->config_space[1]) instead of reading it from the
> hardware. A fenced PHB is the special case on PowerNV :-)

Well not really...

In general we can also end up doing a hard reset under pHyp, and bridges
can lose their state as well, which means they need to be restored from
cache.

We don't see the real PHB, but we might see the bridges if we have a PE
that contains a bridge, for example, a PCIe card with a switch on it.

If we hard reset that (because the driver requested it) or if pHyp did a
reset due to a fence behind the scene, that bridge *will* have lost its
state and will need to be reconfigured too... or is RTAS doing it all ?

Cheers,
Ben.




Re: [PATCH 03/10] powerpc/eeh: Check PCIe link after reset

2013-06-25 Thread Gavin Shan
On Tue, Jun 25, 2013 at 05:57:44PM +1000, Benjamin Herrenschmidt wrote:
>On Tue, 2013-06-25 at 15:47 +0800, Gavin Shan wrote:
>> If we just have a complete reset for a fenced PHB, we need to restore it
>> from the cache (edev->config_space[1]) instead of reading it from the
>> hardware. A fenced PHB is the special case on PowerNV :-)
>
>Well not really...
>
>In general we can also end up doing a hard reset under pHyp, and bridges
>can lose their state as well, which means they need to be restored from
>cache.
>
>We don't see the real PHB, but we might see the bridges if we have a PE
>that contains a bridge, for example, a PCIe card with a switch on it.
>
>If we hard reset that (because the driver requested it) or if pHyp did a
>reset due to a fence behind the scene, that bridge *will* have lost its
>state and will need to be reconfigured too... or is RTAS doing it all ?
>

Ok. So that would be the job of eeh_ops->configure_bridge(). On pSeries,
that should already have been done.

Thanks,
Gavin



Re: [PATCH 1/2] powerpc/hw_brk: Fix setting of length for exact mode breakpoints

2013-06-25 Thread Anshuman Khandual
On 06/24/2013 11:17 AM, Michael Neuling wrote:
> The smallest match region for both the DABR and DAWR is 8 bytes, so the
> kernel needs to filter matches when users want to look at regions smaller than
> this.
> 
> Currently we set the length of PPC_BREAKPOINT_MODE_EXACT breakpoints to 8.
> This is wrong as in exact mode we should only match on 1 address, hence the
> length should be 1.
> 
> This ensures that the kernel will filter out any exact mode hardware 
> breakpoint
> matches on any addresses other than the requested one.
> 
> Signed-off-by: Michael Neuling 

Reviewed-by: Anshuman Khandual 



Re: [PATCH 2/2] powerpc/hw_brk: Fix clearing of extraneous IRQ

2013-06-25 Thread Anshuman Khandual
On 06/24/2013 11:17 AM, Michael Neuling wrote:
> In 9422de3 "powerpc: Hardware breakpoints rewrite to handle non DABR 
> breakpoint
> registers" we changed the way we mark extraneous irqs with this:
> 
> - info->extraneous_interrupt = !((bp->attr.bp_addr <= dar) &&
> - (dar - bp->attr.bp_addr < bp->attr.bp_len));
> + if (!((bp->attr.bp_addr <= dar) &&
> +   (dar - bp->attr.bp_addr < bp->attr.bp_len)))
> + info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
> 
> Unfortunately this is bogus as it never clears extraneous IRQ if it's already
> set.
> 
> This correctly clears extraneous IRQ before possibly setting it.
> 
> Signed-off-by: Michael Neuling 
Reviewed-by: Anshuman Khandual 



[PATCH 1/6] powerpc/eeh: Don't collect PCI-CFG data on PHB

2013-06-25 Thread Gavin Shan
When the PHB is fenced or dead, it's pointless to collect data from the
PCI config space of subordinate PCI devices, since the reads should
return 0xFF's; they also risk incurring additional errors. The patch
avoids collecting PCI-CFG data while the PHB is in the fenced or dead
state.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh.c |   34 --
 1 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 951a632..60deb42 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -232,16 +232,30 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int 
severity)
 {
size_t loglen = 0;
struct eeh_dev *edev;
+   bool valid_cfg_log = true;
 
-   eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
-   eeh_ops->configure_bridge(pe);
-   eeh_pe_restore_bars(pe);
-
-   pci_regs_buf[0] = 0;
-   eeh_pe_for_each_dev(pe, edev) {
-   loglen += eeh_gather_pci_data(edev, pci_regs_buf,
-   EEH_PCI_REGS_LOG_LEN);
-}
+   /*
+* When the PHB is fenced or dead, it's pointless to collect
+* the data from PCI config space because it should return
+* 0xFF's. For ER, we still retrieve the data from the PCI
+* config space.
+*/
+   if (eeh_probe_mode_dev() &&
+   (pe->type & EEH_PE_PHB) &&
+   (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)))
+   valid_cfg_log = false;
+
+   if (valid_cfg_log) {
+   eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
+   eeh_ops->configure_bridge(pe);
+   eeh_pe_restore_bars(pe);
+
+   pci_regs_buf[0] = 0;
+   eeh_pe_for_each_dev(pe, edev) {
+   loglen += eeh_gather_pci_data(edev, pci_regs_buf,
+ EEH_PCI_REGS_LOG_LEN);
+   }
+   }
 
eeh_ops->get_log(pe, severity, pci_regs_buf, loglen);
 }
-- 
1.7.5.4



[PATCH 2/6] powerpc/eeh: Check PCIe link after reset

2013-06-25 Thread Gavin Shan
After a reset (e.g. a complete reset) to bring a fenced PHB back, the
PCIe link might not be ready yet. The patch makes sure the PCIe link is
ready before accessing its subordinate PCI devices. The patch also fixes
the wrong values being restored to the PCI_COMMAND register for PCI
bridges.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh_pe.c |  157 ++
 1 files changed, 144 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 55943fc..016588a 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -22,6 +22,7 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -567,30 +568,132 @@ void eeh_pe_state_clear(struct eeh_pe *pe, int state)
eeh_pe_traverse(pe, __eeh_pe_state_clear, &state);
 }
 
-/**
- * eeh_restore_one_device_bars - Restore the Base Address Registers for one 
device
- * @data: EEH device
- * @flag: Unused
+/*
+ * Some PCI bridges (e.g. PLX bridges) have primary/secondary
+ * buses assigned explicitly by firmware, and we probably have
+ * lost that after reset. So we have to delay the check until
+ * the PCI-CFG registers have been restored for the parent
+ * bridge.
  *
- * Loads the PCI configuration space base address registers,
- * the expansion ROM base address, the latency timer, and etc.
- * from the saved values in the device node.
+ * Don't use the normal PCI-CFG accessors, which have probably been
+ * blocked on the normal path during this stage. So we need to use the
+ * eeh operations, which are always permitted.
  */
-static void *eeh_restore_one_device_bars(void *data, void *flag)
+static void eeh_bridge_check_link(struct pci_dev *pdev,
+ struct device_node *dn)
+{
+   int cap;
+   uint32_t val;
+   int timeout = 0;
+
+   /*
+* We only check root port and downstream ports of
+* PCIe switches
+*/
+   if (!pci_is_pcie(pdev) ||
+   (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT &&
+pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM))
+   return;
+
+   pr_debug("%s: Check PCIe link for %s ...\n",
+__func__, pci_name(pdev));
+
+   /* Check slot status */
+   cap = pdev->pcie_cap;
+   eeh_ops->read_config(dn, cap + PCI_EXP_SLTSTA, 2, &val);
+   if (!(val & PCI_EXP_SLTSTA_PDS)) {
+   pr_debug("  No card in the slot (0x%04x) !\n", val);
+   return;
+   }
+
+   /* Check power status if we have the capability */
+   eeh_ops->read_config(dn, cap + PCI_EXP_SLTCAP, 2, &val);
+   if (val & PCI_EXP_SLTCAP_PCP) {
+   eeh_ops->read_config(dn, cap + PCI_EXP_SLTCTL, 2, &val);
+   if (val & PCI_EXP_SLTCTL_PCC) {
+   pr_debug("  In power-off state, power it on ...\n");
+   val &= ~(PCI_EXP_SLTCTL_PCC | PCI_EXP_SLTCTL_PIC);
+   val |= (0x0100 & PCI_EXP_SLTCTL_PIC);
+   eeh_ops->write_config(dn, cap + PCI_EXP_SLTCTL, 2, val);
+   msleep(2 * 1000);
+   }
+   }
+
+   /* Enable link */
+   eeh_ops->read_config(dn, cap + PCI_EXP_LNKCTL, 2, &val);
+   val &= ~PCI_EXP_LNKCTL_LD;
+   eeh_ops->write_config(dn, cap + PCI_EXP_LNKCTL, 2, val);
+
+   /* Check link */
+   eeh_ops->read_config(dn, cap + PCI_EXP_LNKCAP, 4, &val);
+   if (!(val & PCI_EXP_LNKCAP_DLLLARC)) {
+   pr_debug("  No link reporting capability (0x%08x) \n", val);
+   msleep(1000);
+   return;
+   }
+
+   /* Wait for the link to come up, with a 5s timeout */
+   timeout = 0;
+   while (timeout < 5000) {
+   msleep(20);
+   timeout += 20;
+
+   eeh_ops->read_config(dn, cap + PCI_EXP_LNKSTA, 2, &val);
+   if (val & PCI_EXP_LNKSTA_DLLLA)
+   break;
+   }
+
+   if (val & PCI_EXP_LNKSTA_DLLLA)
+   pr_debug("  Link up (%s)\n",
+(val & PCI_EXP_LNKSTA_CLS_2_5GB) ? "2.5GB" : "5GB");
+   else
+   pr_debug("  Link not ready (0x%04x)\n", val);
+}
+
+#define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF))
+#define SAVED_BYTE(OFF)(((u8 *)(edev->config_space))[BYTE_SWAP(OFF)])
+
+static void eeh_restore_bridge_bars(struct pci_dev *pdev,
+   struct eeh_dev *edev,
+   struct device_node *dn)
+{
+   int i;
+
+   /*
+* Device BARs: 0x10 - 0x18
+* Bus numbers and windows: 0x18 - 0x30
+*/
+   for (i = 4; i < 13; i++)
+   eeh_ops->write_config(dn, i*4, 4, edev->config_space[i]);
+   /* Rom: 0x38 */
+   eeh_ops->write_config(dn, 14*4, 4, edev->config_space[14]);
+
+   /* Cache line & Latency timer: 0xC 0xD */
+   eeh_ops->write_c

[PATCH 5/6] powerpc/eeh: Refactor the output message

2013-06-25 Thread Gavin Shan
We don't need the whole backtrace, just a one-line message, in the error
reporting interrupt handler. For errors triggered by accessing PCI
config space or MMIO, we replace "WARN(1, ...)" with pr_err() and
dump_stack().

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh.c |9 +++--
 arch/powerpc/platforms/powernv/eeh-ioda.c |   25 -
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 60deb42..38e4b40 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -324,7 +324,9 @@ static int eeh_phb_check_failure(struct eeh_pe *pe)
eeh_serialize_unlock(flags);
eeh_send_failure_event(phb_pe);
 
-   WARN(1, "EEH: PHB failure detected\n");
+   pr_err("EEH: PHB#%x failure detected\n",
+   phb_pe->phb->global_number);
+   dump_stack();
 
return 1;
 out:
@@ -453,7 +455,10 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 * a stack trace will help the device-driver authors figure
 * out what happened.  So print that out.
 */
-   WARN(1, "EEH: failure detected\n");
+   pr_err("EEH: Frozen PE#%x detected on PHB#%x\n",
+   pe->addr, pe->phb->global_number);
+   dump_stack();
+
return 1;
 
 dn_unlock:
diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 85025d7..0cd1c4a 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -853,11 +853,14 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
phb->eeh_state |= PNV_EEH_STATE_REMOVED;
}
 
-   WARN(1, "EEH: dead IOC detected\n");
+   pr_err("EEH: dead IOC detected\n");
ret = 4;
goto out;
-   } else if (severity == OPAL_EEH_SEV_INF)
+   } else if (severity == OPAL_EEH_SEV_INF) {
+   pr_info("EEH: IOC informative error "
+   "detected\n");
ioda_eeh_hub_diag(hose);
+   }
 
break;
case OPAL_EEH_PHB_ERROR:
@@ -865,8 +868,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
if (ioda_eeh_get_phb_pe(hose, pe))
break;
 
-   WARN(1, "EEH: dead PHB#%x detected\n",
-hose->global_number);
+   pr_err("EEH: dead PHB#%x detected\n",
+   hose->global_number);
phb->eeh_state |= PNV_EEH_STATE_REMOVED;
ret = 3;
goto out;
@@ -874,20 +877,24 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
if (ioda_eeh_get_phb_pe(hose, pe))
break;
 
-   WARN(1, "EEH: fenced PHB#%x detected\n",
-hose->global_number);
+   pr_err("EEH: fenced PHB#%x detected\n",
+   hose->global_number);
ret = 2;
goto out;
-   } else if (severity == OPAL_EEH_SEV_INF)
+   } else if (severity == OPAL_EEH_SEV_INF) {
+   pr_info("EEH: PHB#%x informative error "
+   "detected\n",
+   hose->global_number);
ioda_eeh_phb_diag(hose);
+   }
 
break;
case OPAL_EEH_PE_ERROR:
if (ioda_eeh_get_pe(hose, frozen_pe_no, pe))
break;
 
-   WARN(1, "EEH: Frozen PE#%x on PHB#%x detected\n",
-(*pe)->addr, (*pe)->phb->global_number);
+   pr_err("EEH: Frozen PE#%x on PHB#%x detected\n",
+   (*pe)->addr, (*pe)->phb->global_number);
ret = 1;
goto out;
}
-- 
1.7.5.4



[PATCH 6/6] powerpc/eeh: Avoid build warnings

2013-06-25 Thread Gavin Shan
The patch avoids the following build warnings:

   The function .pnv_pci_ioda_fixup() references
   the function __init .eeh_init().
   This is often because .pnv_pci_ioda_fixup lacks a __init

   The function .pnv_pci_ioda_fixup() references
   the function __init .eeh_addr_cache_build().
   This is often because .pnv_pci_ioda_fixup lacks a __init
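
These warnings come from modpost's section-mismatch check: a regular
.text function referencing __init (.init.text) code is flagged because
init text may be freed after boot. In essence (an illustrative sketch,
not the actual file layout):

/* Sketch of the pattern modpost warns about. */
int __init eeh_init(void);              /* lives in .init.text */

static void pnv_pci_ioda_fixup(void)    /* ordinary .text */
{
        eeh_init();     /* .text -> .init.text reference, hence warning */
}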

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/eeh.h  |4 ++--
 arch/powerpc/kernel/eeh.c   |2 +-
 arch/powerpc/kernel/eeh_cache.c |2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index dd65e31..09a8743 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -202,13 +202,13 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
 void *eeh_dev_init(struct device_node *dn, void *data);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
-int __init eeh_init(void);
+int eeh_init(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 unsigned long eeh_check_failure(const volatile void __iomem *token,
unsigned long val);
 int eeh_dev_check_failure(struct eeh_dev *edev);
-void __init eeh_addr_cache_build(void);
+void eeh_addr_cache_build(void);
 void eeh_add_device_tree_early(struct device_node *);
 void eeh_add_device_tree_late(struct pci_bus *);
 void eeh_add_sysfs_files(struct pci_bus *);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 38e4b40..f055e6f 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -751,7 +751,7 @@ int __exit eeh_ops_unregister(const char *name)
  * Even if force-off is set, the EEH hardware is still enabled, so that
  * newer systems can boot.
  */
-int __init eeh_init(void)
+int eeh_init(void)
 {
struct pci_controller *hose, *tmp;
struct device_node *phb;
diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index 858ebea..ea9a94c 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -285,7 +285,7 @@ void eeh_addr_cache_rmv_dev(struct pci_dev *dev)
  * Must be run late in boot process, after the pci controllers
  * have been scanned for devices (after all device resources are known).
  */
-void __init eeh_addr_cache_build(void)
+void eeh_addr_cache_build(void)
 {
struct device_node *dn;
struct eeh_dev *edev;
-- 
1.7.5.4



[PATCH v2 00/6] Follow-up fixes for EEH on PowerNV

2013-06-25 Thread Gavin Shan
This series of patches is a follow-up to make EEH workable for the PowerNV
platform on the Juno-IOC-L machine. A couple of issues have been fixed with
Ben's help:

- Check PCIe link after PHB complete reset
- Restore config space for bridges
- The EEH address cache wasn't built successfully
- Misc cleanup on output messages
- Misc cleanup on EEH flags maintained by "struct pnv_phb"
- Misc cleanup on properties of functions to avoid build warnings
 
The series has been verified on a Juno-IOC-L machine:

Trigger frozen PE:

echo 0x0200 > /sys/kernel/debug/powerpc/PCI/err_injct
sleep 1
echo 0x0 > /sys/kernel/debug/powerpc/PCI/err_injct

Trigger fenced PHB:

echo 0x8000 > /sys/kernel/debug/powerpc/PCI/err_injct


Changelog:

v1 -> v2:
* Remove the mechanism to block PCI-CFG and MMIO.
* Add one patch to do cleanup on output messages.
* Add one patch to avoid build warnings.
* Split functions to restore BARs for PCI devices and bridges 
separately.

---

arch/powerpc/include/asm/eeh.h|4 +-
arch/powerpc/kernel/eeh.c |   43 ++--
arch/powerpc/kernel/eeh_cache.c   |4 +-
arch/powerpc/kernel/eeh_pe.c  |  157 ++---
arch/powerpc/platforms/powernv/eeh-ioda.c |   33 ---
arch/powerpc/platforms/powernv/pci-ioda.c |1 +
arch/powerpc/platforms/powernv/pci.c  |4 +-
arch/powerpc/platforms/powernv/pci.h  |7 +-
8 files changed, 207 insertions(+), 46 deletions(-)

Thanks,
Gavin



[PATCH 4/6] powerpc/eeh: Fix address catch for PowerNV

2013-06-25 Thread Gavin Shan
On the PowerNV platform, the EEH address cache isn't built correctly
because we skipped EEH devices that have no PE bound yet. The patch
fixes that.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh_cache.c   |2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index 1d5d9a6..858ebea 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -194,7 +194,7 @@ static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
}
 
/* Skip any devices for which EEH is not enabled. */
-   if (!edev->pe) {
+   if (!eeh_probe_mode_dev() && !edev->pe) {
 #ifdef DEBUG
pr_info("PCI: skip building address cache for=%s - %s\n",
pci_name(dev), dn->full_name);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 3e5c3d5..0ff9a3a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -998,6 +998,7 @@ static void pnv_pci_ioda_fixup(void)
pnv_pci_ioda_create_dbgfs();
 
 #ifdef CONFIG_EEH
+   eeh_probe_mode_set(EEH_PROBE_MODE_DEV);
eeh_addr_cache_build();
eeh_init();
 #endif
-- 
1.7.5.4



[PATCH 3/6] powerpc/powernv: Replace variables with flags

2013-06-25 Thread Gavin Shan
We have 2 fields in "struct pnv_phb" to track state. The patch replaces
the fields with a single one and introduces flags for it. The patch
doesn't change the logic.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |8 
 arch/powerpc/platforms/powernv/pci.c  |4 ++--
 arch/powerpc/platforms/powernv/pci.h  |7 +--
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 84f3036..85025d7 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -132,7 +132,7 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
&ioda_eeh_dbgfs_ops);
 #endif
 
-   phb->eeh_enabled = 1;
+   phb->eeh_state |= PNV_EEH_STATE_ENABLED;
}
 
return 0;
@@ -815,7 +815,7 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 * removed, we needn't take care of it any more.
 */
phb = hose->private_data;
-   if (phb->removed)
+   if (phb->eeh_state & PNV_EEH_STATE_REMOVED)
continue;
 
rc = opal_pci_next_error(phb->opal_id,
@@ -850,7 +850,7 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
list_for_each_entry_safe(hose, tmp,
&hose_list, list_node) {
phb = hose->private_data;
-   phb->removed = 1;
+   phb->eeh_state |= PNV_EEH_STATE_REMOVED;
}
 
WARN(1, "EEH: dead IOC detected\n");
@@ -867,7 +867,7 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 
WARN(1, "EEH: dead PHB#%x detected\n",
 hose->global_number);
-   phb->removed = 1;
+   phb->eeh_state |= PNV_EEH_STATE_REMOVED;
ret = 3;
goto out;
} else if (severity == OPAL_EEH_SEV_PHB_FENCED) {
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 6d9a506..1f31826 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -308,7 +308,7 @@ static int pnv_pci_read_config(struct pci_bus *bus,
if (phb_pe && (phb_pe->state & EEH_PE_ISOLATED))
return PCIBIOS_SUCCESSFUL;
 
-   if (phb->eeh_enabled) {
+   if (phb->eeh_state & PNV_EEH_STATE_ENABLED) {
if (*val == EEH_IO_ERROR_VALUE(size)) {
busdn = pci_bus_to_OF_node(bus);
for (dn = busdn->child; dn; dn = dn->sibling) {
@@ -358,7 +358,7 @@ static int pnv_pci_write_config(struct pci_bus *bus,
 
/* Check if the PHB got frozen due to an error (no response) */
 #ifdef CONFIG_EEH
-   if (!phb->eeh_enabled)
+   if (!(phb->eeh_state & PNV_EEH_STATE_ENABLED))
pnv_pci_config_check_eeh(phb, bus, bdfn);
 #else
pnv_pci_config_check_eeh(phb, bus, bdfn);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 43906e3..40bdf02 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -78,6 +78,10 @@ struct pnv_eeh_ops {
int (*configure_bridge)(struct eeh_pe *pe);
int (*next_error)(struct eeh_pe **pe);
 };
+
+#define PNV_EEH_STATE_ENABLED  (1 << 0)/* EEH enabled  */
+#define PNV_EEH_STATE_REMOVED  (1 << 1)/* PHB removed  */
+
 #endif /* CONFIG_EEH */
 
 struct pnv_phb {
@@ -92,8 +96,7 @@ struct pnv_phb {
 
 #ifdef CONFIG_EEH
struct pnv_eeh_ops  *eeh_ops;
-   int eeh_enabled;
-   int removed;
+   int eeh_state;
 #endif
 
 #ifdef CONFIG_DEBUG_FS
-- 
1.7.5.4



Re: [RFC PATCH 1/3] mm/cma: Move dma contiguous changes into a seperate config

2013-06-25 Thread Anshuman Khandual
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 4e22ce3..5d93bb5 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -6,7 +6,7 @@ obj-y := core.o bus.o dd.o syscore.o \
>  attribute_container.o transport_class.o \
>  topology.o
>  obj-$(CONFIG_DEVTMPFS)   += devtmpfs.o
> -obj-$(CONFIG_CMA) += dma-contiguous.o
> +obj-$(CONFIG_DMA_CMA) += dma-contiguous.o
>  obj-y+= power/
>  obj-$(CONFIG_HAS_DMA)+= dma-mapping.o
>  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
> diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
> index 01b5c84..00141d3 100644
> --- a/include/linux/dma-contiguous.h
> +++ b/include/linux/dma-contiguous.h
> @@ -57,7 +57,7 @@ struct cma;
>  struct page;
>  struct device;
> 
> -#ifdef CONFIG_CMA
> +#ifdef CONFIG_DMA_CMA
> 

We have some generic CMA documentation in this file which needs to be
moved to a more generic place (generic MM), since we are differentiating
it from the DMA-specific usage. Ideally we should have two sets of
documentation:

(1) CMA usage for any subsystem
(2) DMA specific CMA usage


>  /*
>   * There is always at least global CMA area and a few optional device
> diff --git a/mm/Kconfig b/mm/Kconfig
> index e742d06..b362369 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -477,3 +477,23 @@ config FRONTSWAP
> and swap data is stored as normal on the matching swap device.
> 
> If unsure, say Y to enable frontswap.
> +
> +config CMA
> + bool "Contiguous Memory Allocator"
> + depends on HAVE_MEMBLOCK
> + select MIGRATION
> + select MEMORY_ISOLATION
> + help
> +   This enables the Contiguous Memory Allocator which allows other
> +   subsystem to allocate big physically-contiguous blocks of memory

Should be "any subsystem" instead of "other subsystem"

> +
> +   If unsure, say "n".
> +
> +config CMA_DEBUG
> + bool "CMA debug messages (DEVELOPMENT)"
> + depends on DEBUG_KERNEL && CMA
> + help
> +   Turns on debug messages in CMA.  This produces KERN_DEBUG
> +   messages for every CMA call as well as various messages while
> +   processing calls such as dma_alloc_from_contiguous().
> +   This option does not affect warning and error messages.
> 

We should probably split up these debug configs as well to differentiate between
generic CMA_DEBUG and DMA_CMA_DEBUG options.


Regards
Anshuman



Re: [PATCH 1/8] powerpc/perf: Check that events only include valid bits on Power8

2013-06-25 Thread Anshuman Khandual
On 06/24/2013 04:58 PM, Michael Ellerman wrote:
> A mistake we have made in the past is that we pull out the fields we
> need from the event code, but don't check that there are no unknown bits
> set. This means that we can't ever assign meaning to those unknown bits
> in future.
> 
> Although we have once again failed to do this at release, it is still
> early days for Power8 so I think we can still slip this in and get away
> with it.
> 
> Signed-off-by: Michael Ellerman 
Reviewed-by: Anshuman Khandual 



Re: [PATCH 2/8] powerpc/perf: Rework disable logic in pmu_disable()

2013-06-25 Thread Anshuman Khandual
On 06/24/2013 04:58 PM, Michael Ellerman wrote:
> In pmu_disable() we disable the PMU by setting the FC (Freeze Counters)
> bit in MMCR0. In order to do this we have to read/modify/write MMCR0.
> 
> It's possible that we read a value from MMCR0 which has PMAO (PMU Alert
> Occurred) set. When we write that value back it will cause an interrupt
> to occur. We will then end up in the PMU interrupt handler even though
> we are supposed to have just disabled the PMU.
> 

Is that possible? First of all, MMCR0[PMAO] cannot be written by SW.
Even if you try writing it, how is it going to generate a PMU interrupt?
HW sets MMCR0[PMAO] after a PMU interrupt has already occurred; it is
not the case that setting it would generate a PMU interrupt.

> We can avoid this by making sure we never write PMAO back. We should not

Making sure that we don't write PMAO back is a good idea though.
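
Something along these lines on the disable path, I would expect (just a
sketch, not the patch's exact code):

	unsigned long val = mfspr(SPRN_MMCR0);

	val |= MMCR0_FC;	/* freeze the counters */
	val &= ~MMCR0_PMAO;	/* never write the alert bit back */
	mtspr(SPRN_MMCR0, val);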

> lose interrupts because when the PMU is re-enabled the overflowed values
> will cause another interrupt.
> 

I doubt this theory.

> We also reorder the clearing of SAMPLE_ENABLE so that is done after the
> PMU is frozen. Otherwise there is a small window between the clearing of
> SAMPLE_ENABLE and the setting of FC where we could take an interrupt and
> incorrectly see SAMPLE_ENABLE not set. This would for example change the
> logic in perf_read_regs().
> 

Agreed



Re: [PATCH 1/6] powerpc/eeh: Don't collect PCI-CFG data on PHB

2013-06-25 Thread Benjamin Herrenschmidt
On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote:
> +   /*
> +* When the PHB is fenced or dead, it's pointless to collect
> +* the data from PCI config space because it should return
> +* 0xFF's. For ER, we still retrieve the data from the PCI
> +* config space.
> +*/
> +   if (eeh_probe_mode_dev() &&
> +   (pe->type & EEH_PE_PHB) &&
> +   (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)))
> +   valid_cfg_log = false;
> +

I'm still unsure about that one. EEH_PE_ISOLATED could be the result
of a normal ER of PE#0 (which can happen for various reasons other
than a fence) in which case the config space is available and
interesting.

I would either not bother and collect the FF's, or make this specific
to fence and only fence.
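
I.e. something like the following (a sketch only - EEH_PE_PHB_FENCED is
a made-up flag standing in for however we mark a fenced PHB):

	/* Skip config space collection only for a fenced/dead PHB */
	if (eeh_probe_mode_dev() &&
	    (pe->type & EEH_PE_PHB) &&
	    (pe->state & (EEH_PE_PHB_FENCED | EEH_PE_PHB_DEAD)))
		valid_cfg_log = false;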

Cheers,
Ben.




Re: [PATCH 1/6] powerpc/eeh: Don't collect PCI-CFG data on PHB

2013-06-25 Thread Benjamin Herrenschmidt
On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote:
> +   pci_regs_buf[0] = 0;
> +   eeh_pe_for_each_dev(pe, edev) {
> +   loglen += eeh_gather_pci_data(edev, pci_regs_buf,
> + EEH_PCI_REGS_LOG_LEN);
> +   }
> +   }

Unless I'm mistaken, this is buggy and will overwrite the content of
pci_regs_buf for every device (they will all write over the same
portion of the log).
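
Presumably it needs to accumulate at an offset instead, along these
lines (untested sketch):

	loglen = 0;
	pci_regs_buf[0] = 0;
	eeh_pe_for_each_dev(pe, edev) {
		loglen += eeh_gather_pci_data(edev, pci_regs_buf + loglen,
					      EEH_PCI_REGS_LOG_LEN - loglen);
	}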

Ben.




Re: [PATCH 2/6] powerpc/eeh: Check PCIe link after reset

2013-06-25 Thread Benjamin Herrenschmidt
On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote:
> After reset (e.g. complete reset) in order to bring the fenced PHB
> back, the PCIe link might not be ready yet. The patch intends to
> make sure the PCIe link is ready before accessing its subordinate
> PCI devices. The patch also fixes that wrong values restored to
> PCI_COMMAND register for PCI bridges.

This should also help if we end up doing a full reset for ER cases
right ?

IE, in a setup with PHB -> device (no switch), if the device driver
requests a fundamental reset, we should do a PERST at the PHB level (are
we ?) and thus restore things in a similar way.

> Signed-off-by: Gavin Shan 
> ---
>  arch/powerpc/kernel/eeh_pe.c |  157 
> ++
>  1 files changed, 144 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
> index 55943fc..016588a 100644
> --- a/arch/powerpc/kernel/eeh_pe.c
> +++ b/arch/powerpc/kernel/eeh_pe.c
> @@ -22,6 +22,7 @@
>   * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -567,30 +568,132 @@ void eeh_pe_state_clear(struct eeh_pe *pe, int state)
>   eeh_pe_traverse(pe, __eeh_pe_state_clear, &state);
>  }
>  
> -/**
> - * eeh_restore_one_device_bars - Restore the Base Address Registers for one 
> device
> - * @data: EEH device
> - * @flag: Unused
> +/*
> + * Some PCI bridges (e.g. PLX bridges) have primary/secondary
> + * buses assigned explicitly by firmware, and we probably have
> + * lost that after reset. So we have to delay the check until
> + * the PCI-CFG registers have been restored for the parent
> + * bridge.
>   *
> - * Loads the PCI configuration space base address registers,
> - * the expansion ROM base address, the latency timer, and etc.
> - * from the saved values in the device node.
> + * Don't use normal PCI-CFG accessors, which probably has been
> + * blocked on normal path during the stage. So we need utilize
> + * eeh operations, which is always permitted.
>   */
> -static void *eeh_restore_one_device_bars(void *data, void *flag)
> +static void eeh_bridge_check_link(struct pci_dev *pdev,
> +   struct device_node *dn)
> +{
> + int cap;
> + uint32_t val;
> + int timeout = 0;
> +
> + /*
> +  * We only check root port and downstream ports of
> +  * PCIe switches
> +  */
> + if (!pci_is_pcie(pdev) ||
> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT &&
> +  pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM))
> + return;
> +
> + pr_debug("%s: Check PCIe link for %s ...\n",
> +  __func__, pci_name(pdev));
> +
> + /* Check slot status */
> + cap = pdev->pcie_cap;
> + eeh_ops->read_config(dn, cap + PCI_EXP_SLTSTA, 2, &val);
> + if (!(val & PCI_EXP_SLTSTA_PDS)) {
> + pr_debug("  No card in the slot (0x%04x) !\n", val);
> + return;
> + }
> +
> + /* Check power status if we have the capability */
> + eeh_ops->read_config(dn, cap + PCI_EXP_SLTCAP, 2, &val);
> + if (val & PCI_EXP_SLTCAP_PCP) {
> + eeh_ops->read_config(dn, cap + PCI_EXP_SLTCTL, 2, &val);
> + if (val & PCI_EXP_SLTCTL_PCC) {
> + pr_debug("  In power-off state, power it on ...\n");
> + val &= ~(PCI_EXP_SLTCTL_PCC | PCI_EXP_SLTCTL_PIC);
> + val |= (0x0100 & PCI_EXP_SLTCTL_PIC);
> + eeh_ops->write_config(dn, cap + PCI_EXP_SLTCTL, 2, val);
> + msleep(2 * 1000);
> + }
> + }
> +
> + /* Enable link */
> + eeh_ops->read_config(dn, cap + PCI_EXP_LNKCTL, 2, &val);
> + val &= ~PCI_EXP_LNKCTL_LD;
> + eeh_ops->write_config(dn, cap + PCI_EXP_LNKCTL, 2, val);
> +
> + /* Check link */
> + eeh_ops->read_config(dn, cap + PCI_EXP_LNKCAP, 4, &val);
> + if (!(val & PCI_EXP_LNKCAP_DLLLARC)) {
> + pr_debug("  No link reporting capability (0x%08x) \n", val);
> + msleep(1000);
> + return;
> + }
> +
> + /* Wait the link is up until timeout (5s) */
> + timeout = 0;
> + while (timeout < 5000) {
> + msleep(20);
> + timeout += 20;
> +
> + eeh_ops->read_config(dn, cap + PCI_EXP_LNKSTA, 2, &val);
> + if (val & PCI_EXP_LNKSTA_DLLLA)
> + break;
> + }
> +
> + if (val & PCI_EXP_LNKSTA_DLLLA)
> + pr_debug("  Link up (%s)\n",
> +  (val & PCI_EXP_LNKSTA_CLS_2_5GB) ? "2.5GB" : "5GB");
> + else
> + pr_debug("  Link not ready (0x%04x)\n", val);
> +}
> +
> +#define BYTE_SWAP(OFF)   (8*((OFF)/4)+3-(OFF))
> +#define SAVED_BYTE(OFF)  (((u8 *)(edev->config_space))[BYTE_SWAP(OFF)])
> +
> +static void eeh_restore_bridge_bars(struct pci_dev *pdev,
> + struct eeh_dev *edev,
> + 

[PATCH v3 1/2] perf tools: fix a typo of a Power7 event name

2013-06-25 Thread Runzhen Wang
In the Power7 PMU guide:
https://www.power.org/documentation/commonly-used-metrics-for-performance-analysis/
PM_BRU_MPRED is referred to as PM_BR_MPRED.

This patch fixes the typo by changing the name of the event in the
kernel and documentation accordingly.

This patch changes the ABI; here are some reasons I think that's OK:

- It is relatively new interface, specific to the Power7 platform.

- No tools that we know of actually use this interface at this point
 (none are listed near the interface).

- Users of this interface (e.g. oprofile users migrating to perf)
  would be more used to "PM_BR_MPRED" than "PM_BRU_MPRED".

- These are in the ABI/testing at this point rather than ABI/stable,
  so hoping we have some wiggle room.

Signed-off-by: Runzhen Wang 
---
 .../testing/sysfs-bus-event_source-devices-events  |2 +-
 arch/powerpc/perf/power7-pmu.c |   12 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
index 8b25ffb..3c1cc24 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
@@ -29,7 +29,7 @@ Description:  Generic performance monitoring events
 
 What:  /sys/devices/cpu/events/PM_1PLUS_PPC_CMPL
/sys/devices/cpu/events/PM_BRU_FIN
-   /sys/devices/cpu/events/PM_BRU_MPRED
+   /sys/devices/cpu/events/PM_BR_MPRED
/sys/devices/cpu/events/PM_CMPLU_STALL
/sys/devices/cpu/events/PM_CMPLU_STALL_BRU
/sys/devices/cpu/events/PM_CMPLU_STALL_DCACHE_MISS
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 13c3f0e..d1821b8 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -60,7 +60,7 @@
 #define	PME_PM_LD_REF_L1	0xc880
 #define	PME_PM_LD_MISS_L1	0x400f0
 #define	PME_PM_BRU_FIN		0x10068
-#define	PME_PM_BRU_MPRED	0x400f6
+#define	PME_PM_BR_MPRED		0x400f6
 
 #define PME_PM_CMPLU_STALL_FXU 0x20014
 #define PME_PM_CMPLU_STALL_DIV 0x40014
@@ -349,7 +349,7 @@ static int power7_generic_events[] = {
[PERF_COUNT_HW_CACHE_REFERENCES] =  PME_PM_LD_REF_L1,
[PERF_COUNT_HW_CACHE_MISSES] =  PME_PM_LD_MISS_L1,
[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =   PME_PM_BRU_FIN,
-   [PERF_COUNT_HW_BRANCH_MISSES] = PME_PM_BRU_MPRED,
+   [PERF_COUNT_HW_BRANCH_MISSES] = PME_PM_BR_MPRED,
 };
 
 #define C(x)   PERF_COUNT_HW_CACHE_##x
@@ -405,7 +405,7 @@ GENERIC_EVENT_ATTR(instructions,INST_CMPL);
 GENERIC_EVENT_ATTR(cache-references,   LD_REF_L1);
 GENERIC_EVENT_ATTR(cache-misses,   LD_MISS_L1);
 GENERIC_EVENT_ATTR(branch-instructions,BRU_FIN);
-GENERIC_EVENT_ATTR(branch-misses,  BRU_MPRED);
+GENERIC_EVENT_ATTR(branch-misses,  BR_MPRED);
 
 POWER_EVENT_ATTR(CYC,  CYC);
 POWER_EVENT_ATTR(GCT_NOSLOT_CYC,   GCT_NOSLOT_CYC);
@@ -414,7 +414,7 @@ POWER_EVENT_ATTR(INST_CMPL, INST_CMPL);
 POWER_EVENT_ATTR(LD_REF_L1,LD_REF_L1);
 POWER_EVENT_ATTR(LD_MISS_L1,   LD_MISS_L1);
 POWER_EVENT_ATTR(BRU_FIN,  BRU_FIN)
-POWER_EVENT_ATTR(BRU_MPRED,BRU_MPRED);
+POWER_EVENT_ATTR(BR_MPRED, BR_MPRED);
 
 POWER_EVENT_ATTR(CMPLU_STALL_FXU,  CMPLU_STALL_FXU);
 POWER_EVENT_ATTR(CMPLU_STALL_DIV,  CMPLU_STALL_DIV);
@@ -449,7 +449,7 @@ static struct attribute *power7_events_attr[] = {
GENERIC_EVENT_PTR(LD_REF_L1),
GENERIC_EVENT_PTR(LD_MISS_L1),
GENERIC_EVENT_PTR(BRU_FIN),
-   GENERIC_EVENT_PTR(BRU_MPRED),
+   GENERIC_EVENT_PTR(BR_MPRED),
 
POWER_EVENT_PTR(CYC),
POWER_EVENT_PTR(GCT_NOSLOT_CYC),
@@ -458,7 +458,7 @@ static struct attribute *power7_events_attr[] = {
POWER_EVENT_PTR(LD_REF_L1),
POWER_EVENT_PTR(LD_MISS_L1),
POWER_EVENT_PTR(BRU_FIN),
-   POWER_EVENT_PTR(BRU_MPRED),
+   POWER_EVENT_PTR(BR_MPRED),
 
POWER_EVENT_PTR(CMPLU_STALL_FXU),
POWER_EVENT_PTR(CMPLU_STALL_DIV),
-- 
1.7.9.5



[PATCH v2 2/2] perf tools: Make Power7 events available for perf

2013-06-25 Thread Runzhen Wang
Power7 supports over 530 different perf events but only a small
subset of these can be specified by name, for the remaining
events, we must specify them by their raw code:

perf stat -e r2003c 

This patch makes all the POWER7 events available in sysfs.
So we can instead specify these as:

perf stat -e 'cpu/PM_CMPLU_STALL_DFU/' 

where PM_CMPLU_STALL_DFU is the r2003c in previous example.

Before this patch is applied, the size of power7-pmu.o is:

$ size arch/powerpc/perf/power7-pmu.o
   text    data     bss     dec     hex filename
   3073    2720       0    5793    16a1 arch/powerpc/perf/power7-pmu.o

and after the patch is applied, it is:

$ size arch/powerpc/perf/power7-pmu.o
   text    data     bss     dec     hex filename
  15950   31112       0   47062    b7d6 arch/powerpc/perf/power7-pmu.o

For the runtime overhead, I used two scripts. One is "event_name.sh",
which contains 50 event names and looks like:

 # ./perf record  -e 'cpu/PM_CMPLU_STALL_DFU/' -e .  /bin/sleep 1

the other is named "event_code.sh", which uses the corresponding raw
event codes instead of event names:

 # ./perf record -e r2003c -e ..  /bin/sleep 1

Below are the results.

Using event names:

[root@localhost perf]# time ./event_name.sh
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (~102 samples) ]

real0m1.192s
user0m0.028s
sys 0m0.106s

Using raw event codes:

[root@localhost perf]# time ./event_code.sh
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (~112 samples) ]

real0m1.198s
user0m0.028s
sys 0m0.105s

Signed-off-by: Runzhen Wang 
---
 arch/powerpc/include/asm/perf_event_server.h |4 +-
 arch/powerpc/perf/power7-events-list.h   |  548 ++
 arch/powerpc/perf/power7-pmu.c   |  148 ++-
 3 files changed, 582 insertions(+), 118 deletions(-)
 create mode 100644 arch/powerpc/perf/power7-events-list.h

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index f265049..d9270d8 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -136,11 +136,11 @@ extern ssize_t power_events_sysfs_show(struct device *dev,
 #define	EVENT_PTR(_id, _suffix)		&EVENT_VAR(_id, _suffix).attr.attr
 
 #define	EVENT_ATTR(_name, _id, _suffix)				\
-	PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), PME_PM_##_id,	\
+	PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), PME_##_id,	\
 		       power_events_sysfs_show)
 
 #define	GENERIC_EVENT_ATTR(_name, _id)	EVENT_ATTR(_name, _id, _g)
 #define	GENERIC_EVENT_PTR(_id)		EVENT_PTR(_id, _g)
 
-#define	POWER_EVENT_ATTR(_name, _id)	EVENT_ATTR(PM_##_name, _id, _p)
+#define	POWER_EVENT_ATTR(_name, _id)	EVENT_ATTR(_name, _id, _p)
 #define	POWER_EVENT_PTR(_id)		EVENT_PTR(_id, _p)
diff --git a/arch/powerpc/perf/power7-events-list.h b/arch/powerpc/perf/power7-events-list.h
new file mode 100644
index 000..a67e8a9
--- /dev/null
+++ b/arch/powerpc/perf/power7-events-list.h
@@ -0,0 +1,548 @@
+/*
+ * Performance counter support for POWER7 processors.
+ *
+ * Copyright 2013 Runzhen Wang, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+EVENT(PM_IC_DEMAND_L2_BR_ALL, 0x4898)
+EVENT(PM_GCT_UTIL_7_TO_10_SLOTS,  0x20a0)
+EVENT(PM_PMC2_SAVED,  0x10022)
+EVENT(PM_CMPLU_STALL_DFU, 0x2003c)
+EVENT(PM_VSU0_16FLOP, 0xa0a4)
+EVENT(PM_MRK_LSU_DERAT_MISS,  0x3d05a)
+EVENT(PM_MRK_ST_CMPL, 0x10034)
+EVENT(PM_NEST_PAIR3_ADD,  0x40881)
+EVENT(PM_L2_ST_DISP,  0x46180)
+EVENT(PM_L2_CASTOUT_MOD,  0x16180)
+EVENT(PM_ISEG,0x20a4)
+EVENT(PM_MRK_INST_TIMEO,  0x40034)
+EVENT(PM_L2_RCST_DISP_FAIL_ADDR,  0x36282)
+EVENT(PM_LSU1_DC_PREF_STREAM_CONFIRM, 0xd0b6)
+EVENT(PM_IERAT_WR_64K,0x40be)
+EVENT(PM_MRK_DTLB_MISS_16M,   0x4d05e)
+EVENT(PM_IERAT_MISS,  0x100f6)
+EVENT(PM_MRK_PTEG_FROM_LMEM,  0x4d052)
+EVENT(PM_FLOP,0x100f4)
+EVENT(PM_THRD_PRIO_4_5_CYC,   0x40b4)
+EVENT(PM_BR_PRED_TA,  0x40aa)
+EVENT(PM_CMPLU_STALL_FXU, 0x20014)
+EVENT(PM_EXT_INT, 0x200f8)
+EVENT(PM_VSU_FSQRT_FDIV,  

[PATCH v2 0/2] perf tools: Power7 events name available for perf

2013-06-25 Thread Runzhen Wang
Thanks to Sukadev Bhattip and Xiao Guangrong for their help.
Thanks to Michael Ellerman for his review.

Here is the change log for v2:

1. As Michael Ellerman suggested, I added runtime overhead information 
   in the 0002 patch's description.

2. Put the event names in a new header file named "power7-events-list.h",
   and use several macros, such as:
 
   #define EVENT(_name, _code) POWER_EVENT_ATTR(_name, _code)
   #include "power7-events-list.h"
   #undef EVENT
  
   to generate different outputs.
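
   For example, the same list can also expand into the event code
   definitions (a sketch of the idea, not necessarily the exact code in
   the patch):

   #define EVENT(_name, _code) _name = _code,

   enum {
   #include "power7-events-list.h"
   };
   #undef EVENT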
 
Thanks
Runzhen Wang

Runzhen Wang (2):
  perf tools: fix a typo of a Power7 event name
  perf tools: Make Power7 events available for perf

 .../testing/sysfs-bus-event_source-devices-events  |2 +-
 arch/powerpc/include/asm/perf_event_server.h   |4 +-
 arch/powerpc/perf/power7-events-list.h |  548 
 arch/powerpc/perf/power7-pmu.c |  150 ++
 4 files changed, 584 insertions(+), 120 deletions(-)
 create mode 100644 arch/powerpc/perf/power7-events-list.h

-- 
1.7.9.5



RE: [PATCH 3/3] powerpc/pseries: Support compression of oops text via pstore

2013-06-25 Thread Luck, Tony
> Introducing headersize in pstore_write() API would need changes at
> multiple places where it's being called. The idea is to move the
> compression support to pstore infrastructure so that other platforms
> could also make use of it.

Any thoughts on the back/forward compatibility as we switch to compressed
pstore data?  E.g. imagine I have a system installed with some Linux
distribution with a kernel too old to know about compressed pstore. I use
that machine to run the latest kernels that do compression ... and one fine
day one of them crashes hard - logging in compressed form to pstore. Now I
boot my distro kernel to pick up the pieces ... what do I see in
/sys/fs/pstore/*? Some compressed files? Can I read them with some tool?

This is somewhat of a corner case - but not completely unrealistic ... I'd
at least like to be reassured that the old kernel won't choke when it sees
the compressed blobs.

-Tony



Re: Regression in RCU subsystem in latest mainline kernel

2013-06-25 Thread Paul E. McKenney
On Tue, Jun 25, 2013 at 05:44:23PM +1000, Michael Ellerman wrote:
> On Tue, Jun 25, 2013 at 05:19:14PM +1000, Michael Ellerman wrote:
> > 
> > Here's another trace from 3.10-rc7 plus a few local patches.
> 
> And here's another with CONFIG_RCU_CPU_STALL_INFO=y in case that's useful:
> 
> PASS running test_pmc5_6_overuse()
> INFO: rcu_sched self-detected stall on CPU
>   8: (1 GPs behind) idle=8eb/142/0 softirq=215/220 

So this CPU has been out of action since before the beginning of the
current grace period ("1 GPs behind").  It is not idle, having taken
a pair of nested interrupts from process context (matching the stack
below).  This CPU has taken five softirqs since the last grace period
that it noticed, which makes it likely that the loop is within the
softirq handler.

>(t=2100 jiffies g=18446744073709551583 c=18446744073709551582 q=13)

Assuming HZ=100, this stall has been going on for 21 seconds.  There
is a grace period in progress according to RCU's global state (which
this CPU is not yet aware of).  There are a total of 13 RCU callbacks
queued across the entire system.

If the system is at all responsive, I suggest using ftrace (either from
the boot command line or at runtime) to trace __do_softirq() and
hrtimer_interrupt().

Thanx, Paul

> cpu 0x8: Vector: 0  at [c003ea03eae0]
> pc: c011d9b0: .rcu_check_callbacks+0x450/0x910
> lr: c011d9b0: .rcu_check_callbacks+0x450/0x910
> sp: c003ea03ec40
>msr: 90009032
>   current = 0xc003ebf9f4a0
>   paca= 0xcfdc2400 softe: 0irq_happened: 0x00
> pid   = 2444, comm = power8-events
> enter ? for help
> [c003ea03ed70] c0094cd0 .update_process_times+0x40/0x90
> [c003ea03ee00] c00df050 .tick_sched_handle.isra.13+0x20/0xa0
> [c003ea03ee80] c00df2bc .tick_sched_timer+0x5c/0xa0
> [c003ea03ef20] c00b3728 .__run_hrtimer+0x98/0x260
> [c003ea03efc0] c00b4738 .hrtimer_interrupt+0x138/0x3c0
> [c003ea03f0d0] c001cd34 .timer_interrupt+0x124/0x2f0
> [c003ea03f180] c000a4f4 restore_check_irq_replay+0x68/0xa8
> --- Exception: 901 (Decrementer) at c0093ad4 
> .run_timer_softirq+0x74/0x360
> [c003ea03f580] c0089ac4 .__do_softirq+0x174/0x350
> [c003ea03f6a0] c0089ea8 .irq_exit+0xb8/0x100
> [c003ea03f720] c001cd68 .timer_interrupt+0x158/0x2f0
> [c003ea03f7d0] c000a4f4 restore_check_irq_replay+0x68/0xa8
> --- Exception: 901 (Decrementer) at c014a520 
> .task_function_call+0x60/0x70
> [c003ea03fac0] c014a634 .perf_event_enable+0x104/0x1c0 
> (unreliable)
> [c003ea03fb70] c01495ec .perf_event_for_each_child+0x5c/0xf0
> [c003ea03fc00] c014cd78 .perf_ioctl+0x108/0x400
> [c003ea03fca0] c01d9aa0 .do_vfs_ioctl+0xb0/0x740
> [c003ea03fd80] c01da188 .SyS_ioctl+0x58/0xb0
> [c003ea03fe30] c0009d54 syscall_exit+0x0/0x98
> --- Exception: c01 (System Call) at 1fee03d0
> SP (35e7cc90) is in userspace
> 
> 
> cheers
> 



Re: [PATCH 3/3] powerpc/pseries: Support compression of oops text via pstore

2013-06-25 Thread Kees Cook
On Tue, Jun 25, 2013 at 12:04 AM, Aruna Balakrishnaiah
 wrote:
> Hi Kees,
>
>
> On Monday 24 June 2013 11:27 PM, Kees Cook wrote:
>>
>> On Sun, Jun 23, 2013 at 11:23 PM, Aruna Balakrishnaiah
>>  wrote:
>>>
>>> The patch set supports compression of oops messages while writing to
>>> NVRAM,
>>> this helps in capturing more of oops data to lnx,oops-log. The pstore
>>> file
>>> for oops messages will be in decompressed format making it readable.
>>>
>>> In case compression fails, the patch takes care of copying the header
>>> added
>>> by pstore and last oops_data_sz bytes of big_oops_buf to NVRAM so that we
>>> have recent oops messages in lnx,oops-log.
>>>
>>> In case decompression fails, it will result in absence of oops file but
>>> still
>>> have files (in /dev/pstore) for other partitions.
>>>
>>> Signed-off-by: Aruna Balakrishnaiah 
>>> ---
>>>   arch/powerpc/platforms/pseries/nvram.c |  132
>>> +---
>>>   1 file changed, 118 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/nvram.c
>>> b/arch/powerpc/platforms/pseries/nvram.c
>>> index 0159d74..b5ba5e2 100644
>>> --- a/arch/powerpc/platforms/pseries/nvram.c
>>> +++ b/arch/powerpc/platforms/pseries/nvram.c
>>> @@ -539,6 +539,65 @@ static int zip_oops(size_t text_len)
>>>   }
>>>
>>>   #ifdef CONFIG_PSTORE
>>> +/* Derived from logfs_uncompress */
>>> +int nvram_decompress(void *in, void *out, size_t inlen, size_t outlen)
>>> +{
>>> +   int err, ret;
>>> +
>>> +   ret = -EIO;
>>> +   err = zlib_inflateInit(&stream);
>>> +   if (err != Z_OK)
>>> +   goto error;
>>> +
>>> +   stream.next_in = in;
>>> +   stream.avail_in = inlen;
>>> +   stream.total_in = 0;
>>> +   stream.next_out = out;
>>> +   stream.avail_out = outlen;
>>> +   stream.total_out = 0;
>>> +
>>> +   err = zlib_inflate(&stream, Z_FINISH);
>>> +   if (err != Z_STREAM_END)
>>> +   goto error;
>>> +
>>> +   err = zlib_inflateEnd(&stream);
>>> +   if (err != Z_OK)
>>> +   goto error;
>>> +
>>> +   ret = stream.total_out;
>>> +error:
>>> +   return ret;
>>> +}
>>> +
>>> +static int unzip_oops(char *oops_buf, char *big_buf)
>>> +{
>>> +   struct oops_log_info *oops_hdr = (struct oops_log_info
>>> *)oops_buf;
>>> +   u64 timestamp = oops_hdr->timestamp;
>>> +   char *big_oops_data = NULL;
>>> +   char *oops_data_buf = NULL;
>>> +   size_t big_oops_data_sz;
>>> +   int unzipped_len;
>>> +
>>> +   big_oops_data = big_buf + sizeof(struct oops_log_info);
>>> +   big_oops_data_sz = big_oops_buf_sz - sizeof(struct
>>> oops_log_info);
>>> +   oops_data_buf = oops_buf + sizeof(struct oops_log_info);
>>> +
>>> +   unzipped_len = nvram_decompress(oops_data_buf, big_oops_data,
>>> +   oops_hdr->report_length,
>>> +   big_oops_data_sz);
>>> +
>>> +   if (unzipped_len < 0) {
>>> +   pr_err("nvram: decompression failed; returned %d\n",
>>> +
>>> unzipped_len);
>>> +   return -1;
>>> +   }
>>> +   oops_hdr = (struct oops_log_info *)big_buf;
>>> +   oops_hdr->version = OOPS_HDR_VERSION;
>>> +   oops_hdr->report_length = (u16) unzipped_len;
>>> +   oops_hdr->timestamp = timestamp;
>>> +   return 0;
>>> +}
>>> +
>>>   static int nvram_pstore_open(struct pstore_info *psi)
>>>   {
>>>  /* Reset the iterator to start reading partitions again */
>>> @@ -567,6 +626,7 @@ static int nvram_pstore_write(enum pstore_type_id
>>> type,
>>>  size_t size, struct pstore_info *psi)
>>>   {
>>>  int rc;
>>> +   unsigned int err_type = ERR_TYPE_KERNEL_PANIC;
>>>  struct oops_log_info *oops_hdr = (struct oops_log_info *)
>>> oops_buf;
>>>
>>>  /* part 1 has the recent messages from printk buffer */
>>> @@ -577,8 +637,31 @@ static int nvram_pstore_write(enum pstore_type_id
>>> type,
>>>  oops_hdr->version = OOPS_HDR_VERSION;
>>>  oops_hdr->report_length = (u16) size;
>>>  oops_hdr->timestamp = get_seconds();
>>> +
>>> +   if (big_oops_buf) {
>>> +   rc = zip_oops(size);
>>> +   /*
>>> +* If compression fails copy recent log messages from
>>> +* big_oops_buf to oops_data.
>>> +*/
>>> +   if (rc != 0) {
>>> +   int hsize = pstore_get_header_size();
>>
>> I think I would rather see the API to pstore_write() changed to
>> include explicit details about header sizes. Mkaing hsize a global
>> seems unwise, since it's not strictly going to be a constant value. It
>> could change between calls to the writer, for example.
>
>
> Introducing headersize in pstore_write() API would need changes at
> multiple places where it's being called. The idea is to move the
> compression support to pstore infrastructure so that other platforms
> could also make use of it.

BUG: no PCI/PCIe devices found in 85xx architecture

2013-06-25 Thread Stefani Seibold
Hi,

there is a bug in kernel 3.9 with the new fsl_pci platform driver: the
pcibios_init in pci_32.c is called before the platform driver probe
is invoked.

The call order for a p2020 board with linux 3.9 is currently:

fsl_pci_init
pcibios_init
fsl_pci_probe
fsl_pci_probe
fsl_pci_probe

Therefore the PCI/PCIe bridges are added after the PCI/PCIe buses have
been scanned for devices, so no PCI/PCIe devices are found.

Everything works fine after reverting fsl_pci.[ch] to the version in
linux 3.4, because there the PCI/PCIe bridges are added in
the ..._setup_arch() function, before the pcibios_init function is
called.

Any solution for this issue?

- Stefani




Re: Inbound PCI and Memory Corruption

2013-06-25 Thread Peter LaDow
On Sat, Jun 22, 2013 at 5:00 PM, Benjamin Herrenschmidt
 wrote:
> Afaik e300 is slightly out of order, maybe it's missing a memory barrier
> somewhere One thing to try is to add some to the dma_map/unmap ops.

I went through the driver and added memory barriers to the
dma_map_page/dma_unmap_page and dma_alloc_coherent/dma_free_coherent
calls (wmb() calls after each, which resolves to a sync instruction).
I still get a kernel panic.
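
Roughly, the pattern now looks like this (illustrative; the exact e1000
field names may differ slightly):

	buffer_info->dma = dma_map_page(&pdev->dev, page, offset,
					len, DMA_TO_DEVICE);
	tx_desc->buffer_addr = cpu_to_le64(buffer_info->dma);
	wmb();	/* resolves to sync on e300; make the descriptor writes
		 * visible before the tail-register write */
	writel(i, hw->hw_addr + tx_ring->tdt);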

I did turn on DEBUG_PAGE_ALLOC to try and get more information, but
I'm not finding anything new.  However, with the SLAB debugging I do
find SLAB corruption, e.g.:

Slab corruption: fib6_nodes start=e900c7f8, len=32
Redzone: 0x9f911029d74e35b/0x30a706a6050806.
Last user: [<06040001>](0x6040001)
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ff ff ff ff ff ff
Prev obj: start=e900c7c0, len=32
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<  (null)>](0x0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5
Next obj: start=e900c830, len=32
Redzone: 0x30a706a6050aca/0xc8be11029d74e35b.
Last user: [<  (null)>](0x0)
000: 0d aa 00 00 00 00 00 00 0a ca 0d 49 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 75 8b

Which is clearly corrupted with ethernet frames.  The only interface
connected is the e1000.  Eventually this corruption leads to a kernel
panic.

I'm completely confused on how this could happen. Given the M bit is
set for all pages (see below), and with memory barriers on the DMA
map/unmap and register operations, the only thing I can think of is
something in the IO sequencer (which was suggested in the link I gave
earlier).  Yet the patch mentioned is in place.

> Also audit the driver to ensure that it properly uses barriers when
> populating descriptors (and maybe compare to a more recent version of
> the driver upstream).

I've gone through the driver and didn't see anything missing.  And the
upstream (v3.10-rc5) driver is the same version (7.3.21-k8-NAPI).  And
I've used the latest from the e1000 release (8.0.35-NAPI), and I get
the same problem.

On Sun, Jun 23, 2013 at 6:16 PM, Benjamin Herrenschmidt
 wrote:
> Also dbl check that the MMU is indeed mapping all these pages with the
> "M" bit.

The DBATs have the M bit set (both have 0x12 in the DBATxL
registers)... sometimes.  Usually when I halt the CPU and dump the
BATs, all the IBATs and DBATs are zero.  But occasionally I see
DBAT2 and DBAT3 populated, with the M bit set.

I also dumped all the TLB entries, and every one of them has the M bit
set (see below).

TLB dump:

BDI>dtlb 0 63
IDX  V RC VSID   VPIRPN  WIMG PP
  0: V 0C 000eee_e9a -> 2e9a --M- 00
  1: V 0C 000eee_f401000 -> 2f401000 --M- 00
  2: V 1C 000ccc_0502000 -> 00502000 --M- 00
  3: V 0C 000eee_f403000 -> 2f403000 --M- 00
  4: V 0C 000eee_c124000 -> 2c124000 --M- 00
  5: V 0C 000eee_f405000 -> 2f405000 --M- 00
  6: V 0C 000eee_e9e6000 -> 2e9e6000 --M- 00
  7: V 0C 33afd1_0427000 -> 005f8000 --M- 10
  8: V 0C 33afd1_0428000 -> 2ff63000 --M- 10
  9: V 0C 000ccc_0349000 -> 00349000 --M- 00
 10: V 1C 000ccc_03ca000 -> 003ca000 --M- 00
 11: V 1C 000ccc_03cb000 -> 003cb000 --M- 00
 12: V 0C 33afd1_040c000 -> 003b4000 --M- 11
 13: V 0C 000eee_f40d000 -> 2f40d000 --M- 00
 14: V 1C 000eee_fa8e000 -> 2fa8e000 --M- 00
 15: V 0- 33afd1_034f000 -> 2e6b1000 --M- 11
 16: V 0C 000eee_f47 -> 2f47 --M- 00
 17: V 0C 33afd1_0411000 -> 2fe54000 --M- 10
 18: V 0C 000eee_f4b2000 -> 2f4b2000 --M- 00
 19: V 1C 33eb14_8073000 -> 00462000 --M- 10
 20: V 0C 000ccc_02f4000 -> 002f4000 --M- 00
 21: V 0C 000eee_f415000 -> 2f415000 --M- 00
 22: V 1C 000ccc_03f6000 -> 003f6000 --M- 00
 23: V 0C 000ccc_02f7000 -> 002f7000 --M- 00
 24: V 1C 000ccc_03f8000 -> 003f8000 --M- 00
 25: V 0C 000ccc_03d9000 -> 003d9000 --M- 00
 26: V 1C 33b304_a31a000 -> 007f4000 --M- 10
 27: V 1C 000ccc_03fb000 -> 003fb000 --M- 00
 28: V 1C 000ccc_03fc000 -> 003fc000 --M- 00
 29: V 0C 000eee_f41d000 -> 2f41d000 --M- 00
 30: V 1C 000eee_e87e000 -> 2e87e000 --M- 00
 31: V 1C 33afd1_045f000 -> 2fe52000 --M- 10
 32: V 0C 000ccc_000 ->  --M- 00
 33: V 0C 000eee_e9a1000 -> 2e9a1000 --M- 00
 34: V 1C 33b304_8022000 -> 00f44000 --M- 10
 35: V 0C 000ccc_0503000 -> 00503000 --M- 00
 36: V 0C 33afd1_0744000 -> 2fe17000 --M- 10
 37: V 0C 000eee_c125000 -> 2c125000 --M- 00
 38: V 0C 33e7e1_0406000 -> 0078e000 --M- 11
 39: V 0C 000eee_e987000 -> 2e987000 --M- 00
 40: V 0C 000ccc_0008000 -> 8000 --M- 00
 41: V 0C 000ccc_03c9000 -> 003c9000 --M- 00
 42: V 1C 33ba7b_f8ea000 -> 005f9000 --M- 10
 43: V 1C 33afd1_040b000 -> 2ffe --M- 11
 44: V 0C 000ccc_03cc000 -> 003cc000 --M- 00
 45: V 0C 000eee_b68d000 -> 2b68d000 --M- 00
 46: V 1C 000eee_f40e000 -> 2f40e000 --M- 00
 47: V 0C 000eee_fa8f000 -> 2fa8f000 --M- 00
 48: V 0C 33afd1_041 -> 2fe4a000 --M- 10
 49: V 0C 000eee_f471000 -> 2f471000 --M- 00
 50: V 0C 000ccc_03f2000 -> 003f2000 --M- 00
 51: V 1C 000eee_f473000 -> 2f473000 --M- 00
 52: V 0C 000ccc_03f

Re: [PATCH 04/45] CPU hotplug: Add infrastructure to check lacking hotplug synchronization

2013-06-25 Thread Srivatsa S. Bhat
On 06/25/2013 04:56 AM, Steven Rostedt wrote:
> On Sun, 2013-06-23 at 19:08 +0530, Srivatsa S. Bhat wrote:
> 
> 
> Just to make the code a little cleaner, can you add:
> 
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index 860f51a..e90d9d7 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -63,6 +63,72 @@ static struct {
>>  .refcount = 0,
>>  };
>>  
>> +#ifdef CONFIG_DEBUG_HOTPLUG_CPU
>> +
[..]
> 
> static inline void atomic_reader_refcnt_inc(void)
> {
>   this_cpu_inc(atomic_reader_refcnt);
> }
> static inline void atomic_reader_refcnt_dec(void)
> {
>   this_cpu_dec(atomic_reader_refcnt);
> }
> 
> #else
> static inline void atomic_reader_refcnt_inc(void)
> {
> }
> static inline void atomic_reader_refcnt_dec(void)
> {
> }
> #endif
> 
>> +#endif
>> +
>>  void get_online_cpus(void)
>>  {
>>  might_sleep();
>> @@ -189,13 +255,22 @@ unsigned int get_online_cpus_atomic(void)
>>   * from going offline.
>>   */
>>  preempt_disable();
>> +
>> +#ifdef CONFIG_DEBUG_HOTPLUG_CPU
>> +this_cpu_inc(atomic_reader_refcnt);
>> +#endif
> 
> Replace the #ifdef with just:
> 
>   atomic_reader_refcnt_inc();
> 
>>  return smp_processor_id();
>>  }
>>  EXPORT_SYMBOL_GPL(get_online_cpus_atomic);
>>  
>>  void put_online_cpus_atomic(void)
>>  {
>> +
>> +#ifdef CONFIG_DEBUG_HOTPLUG_CPU
>> +this_cpu_dec(atomic_reader_refcnt);
>> +#endif
> 
> And
> 
>   atomic_reader_refcnt_dec();
> 

This makes the code look much better. Thank you!
I'll make that change in my v2.

Regards,
Srivatsa S. Bhat



Re: [PATCH 40/45] powerpc, irq: Use GFP_ATOMIC allocations in atomic context

2013-06-25 Thread Srivatsa S. Bhat
On 06/25/2013 08:43 AM, Benjamin Herrenschmidt wrote:
> On Tue, 2013-06-25 at 12:58 +1000, Michael Ellerman wrote:
>> On Tue, Jun 25, 2013 at 12:13:04PM +1000, Benjamin Herrenschmidt wrote:
>>> On Tue, 2013-06-25 at 12:08 +1000, Michael Ellerman wrote:
 We're not checking for allocation failure, which we should be.

 But this code is only used on powermac and 85xx, so it should probably
 just be a TODO to fix this up to handle the failure.
>>>
>>> And what can we do if they fail ?
>>
>> Fail up the chain and not unplug the CPU presumably.
> 
> BTW. Isn't Srivatsa series removing the need to stop_machine() for
> unplug ? 

Yes.

> That should mean we should be able to use GFP_KERNEL no ?

No, because whatever code was being executed in stop_machine() context
would still be executed with interrupts disabled. So allocations that
can sleep would continue to be forbidden in this path.

In the CPU unplug sequence, the CPU_DYING notifications (and the surrounding
code) are guaranteed to run:
a. _on_ the CPU going offline
b. with interrupts disabled on that CPU.

My patchset will retain these guarantees even after removing stop_machine().
And these are required for the correct execution of the code in this path,
since they rely on these semantics.
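
For instance, a CPU_DYING notifier runs in this constrained context
(an illustrative sketch, not code from this series):

	static int foo_cpu_callback(struct notifier_block *nb,
				    unsigned long action, void *hcpu)
	{
		if ((action & ~CPU_TASKS_FROZEN) == CPU_DYING) {
			/* Runs on the dying CPU with irqs disabled:
			 * sleeping is forbidden, so any allocation
			 * here must be GFP_ATOMIC. */
		}
		return NOTIFY_OK;
	}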

So I guess I'll retain the patch as it is. Thank you!

Regards,
Srivatsa S. Bhat



[PATCH v2 00/45] CPU hotplug: stop_machine()-free CPU hotplug, part 1

2013-06-25 Thread Srivatsa S. Bhat
Hi,

This patchset is a first step towards removing stop_machine() from the
CPU hotplug offline path. It introduces a set of APIs (as a replacement to
preempt_disable()/preempt_enable()) to synchronize with CPU hotplug from
atomic contexts.

The motivation behind getting rid of stop_machine() is to avoid its
ill-effects, such as performance penalties[1] and the real-time latencies it
inflicts on the system (and also things involving stop_machine() have often
been notoriously hard to debug). And in general, getting rid of stop_machine()
from CPU hotplug also greatly enhances the elegance of the CPU hotplug design
itself.

Getting rid of stop_machine() involves building the corresponding
infrastructure in the core CPU hotplug code and converting all places which
used to depend on the semantics of stop_machine() to synchronize with CPU
hotplug.

This patchset builds a first-level base infrastructure on which tree-wide
conversions can be built upon, and also includes the conversions themselves.
We certainly need a few more careful tree-sweeps to complete the conversion,
but the goal of this patchset is to introduce the core pieces and to get the
first batch of conversions in, while covering a reasonable bulk among them.

This patchset also has a debug infrastructure to help with the conversions -
with the newly introduced CONFIG_DEBUG_HOTPLUG_CPU option turned on, it
prints warnings whenever the need for a conversion is detected. Patches 4-7
build this framework. Needless to say, I'd really appreciate if people could
test kernels with this option turned on and report omissions or better yet,
send patches to contribute to this effort.

[It is to be noted that this patchset doesn't replace stop_machine() yet,
so the immediate risk in having an unconverted (or converted) call-site
is nil, since there is no major functional change involved.]

Once the conversion gets completed, we can finalize on the design of the
stop_machine() replacement and use that in the core CPU hotplug code. We have
had some discussions in the past where we debated several different
designs[2]. We'll revisit that with more ideas once this conversion gets over.


This patchset applies on current tip:master. It is also available in the
following git branch:

git://github.com/srivatsabhat/linux.git  stop-mch-free-cpuhp-part1-v2


Thank you very much!


Changes in v2:
--------------
* Build-fix for !HOTPLUG_CPU case.
* Minor code cleanups, and added a few comments.


References:
-----------

1. Performance difference between CPU Hotplug with and without
   stop_machine():
   http://article.gmane.org/gmane.linux.kernel/1435249

2. Links to discussions around alternative synchronization schemes to
   replace stop_machine() in the CPU Hotplug code:

   v6: http://lwn.net/Articles/538819/
   v5: http://lwn.net/Articles/533553/
   v4: https://lkml.org/lkml/2012/12/11/209
   v3: https://lkml.org/lkml/2012/12/7/287
   v2: https://lkml.org/lkml/2012/12/5/322
   v1: https://lkml.org/lkml/2012/12/4/88

3. Links to previous versions of this patchset:
   v1: http://lwn.net/Articles/556138/

--
 Srivatsa S. Bhat (45):
  CPU hotplug: Provide APIs to prevent CPU offline from atomic context
  CPU hotplug: Clarify the usage of different synchronization APIs
  Documentation, CPU hotplug: Recommend usage of get/put_online_cpus_atomic()
  CPU hotplug: Add infrastructure to check lacking hotplug synchronization
  CPU hotplug: Protect set_cpu_online() to avoid false-positives
  CPU hotplug: Sprinkle debugging checks to catch locking bugs
  CPU hotplug: Expose the new debug config option
  CPU hotplug: Convert preprocessor macros to static inline functions
  smp: Use get/put_online_cpus_atomic() to prevent CPU offline
  sched/core: Use get/put_online_cpus_atomic() to prevent CPU offline
  migration: Use raw_spin_lock/unlock since interrupts are already disabled
  sched/fair: Use get/put_online_cpus_atomic() to prevent CPU offline
  timer: Use get/put_online_cpus_atomic() to prevent CPU offline
  sched/rt: Use get/put_online_cpus_atomic() to prevent CPU offline
  rcu: Use get/put_online_cpus_atomic() to prevent CPU offline
  tick-broadcast: Use get/put_online_cpus_atomic() to prevent CPU offline
  time/clocksource: Use get/put_online_cpus_atomic() to prevent CPU offline
  softirq: Use get/put_online_cpus_atomic() to prevent CPU offline
  irq: Use get/put_online_cpus_atomic() to prevent CPU offline
  net: Use get/put_online_cpus_atomic() to prevent CPU offline
  block: Use get/put_online_cpus_atomic() to prevent CPU offline
  percpu_counter: Use get/put_online_cpus_atomic() to prevent CPU offline
  infiniband: ehca: Use get/put_online_cpus_atomic() to prevent CPU offline
  [SCSI] fcoe: Use get/put_online_cpus_atomic() to prevent CPU offline
  staging/octeon: Use get/put_online_cpus_atomic() to prevent CPU offline
  x86: Use get/put_online_cpus_atomic() to prevent CPU

[PATCH v2 01/45] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2013-06-25 Thread Srivatsa S. Bhat
The current CPU offline code uses stop_machine() internally. And disabling
preemption prevents stop_machine() from taking effect, thus also preventing
CPUs from going offline, as a side effect.

There are places where this side-effect of preempt_disable() (or equivalent)
is used to synchronize with CPU hotplug. Typically these are in atomic
sections of code, where they can't make use of get/put_online_cpus(), because
the latter set of APIs can sleep.

Going forward, we want to get rid of stop_machine() from the CPU hotplug
offline path. And then, with stop_machine() gone, disabling preemption will
no longer prevent CPUs from going offline.

So provide a set of APIs for such atomic hotplug readers, to prevent (any)
CPUs from going offline. For now, they will default to preempt_disable()
and preempt_enable() itself, but this will help us do the tree-wide conversion,
as a preparatory step to remove stop_machine() from CPU hotplug.

(Besides, it is good documentation as well, since it clearly marks places
where we synchronize with CPU hotplug, instead of combining it subtly with
disabling preemption).

In future, when actually removing stop_machine(), we will alter the
implementation of these APIs to a suitable synchronization scheme.
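
A typical atomic reader would then look like this (illustration only,
not part of this patch):

	unsigned int cpu;

	cpu = get_online_cpus_atomic();	/* also disables preemption */

	/* ... safely use cpu_online_mask or IPI any online CPU ... */

	put_online_cpus_atomic();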

Reviewed-by: Steven Rostedt 
Cc: Thomas Gleixner 
Cc: Andrew Morton 
Cc: Tejun Heo 
Cc: "Rafael J. Wysocki" 
Cc: Yasuaki Ishimatsu 
Signed-off-by: Srivatsa S. Bhat 
---

 include/linux/cpu.h |   20 
 kernel/cpu.c|   38 ++
 2 files changed, 58 insertions(+)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 9f3c7e8..a57b25a 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 struct device;
 
@@ -175,6 +177,8 @@ extern struct bus_type cpu_subsys;
 
 extern void get_online_cpus(void);
 extern void put_online_cpus(void);
+extern unsigned int get_online_cpus_atomic(void);
+extern void put_online_cpus_atomic(void);
 extern void cpu_hotplug_disable(void);
 extern void cpu_hotplug_enable(void);
 #define hotcpu_notifier(fn, pri)   cpu_notifier(fn, pri)
@@ -202,6 +206,22 @@ static inline void cpu_hotplug_driver_unlock(void)
 #define put_online_cpus()  do { } while (0)
 #define cpu_hotplug_disable()  do { } while (0)
 #define cpu_hotplug_enable()   do { } while (0)
+
+static inline unsigned int get_online_cpus_atomic(void)
+{
+   /*
+* Disable preemption to avoid getting complaints from the
+* debug_smp_processor_id() code.
+*/
+   preempt_disable();
+   return smp_processor_id();
+}
+
+static inline void put_online_cpus_atomic(void)
+{
+   preempt_enable();
+}
+
 #define hotcpu_notifier(fn, pri)   do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)   ({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 198a388..2d03398 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -154,6 +154,44 @@ void cpu_hotplug_enable(void)
cpu_maps_update_done();
 }
 
+/*
+ * get_online_cpus_atomic - Prevent any CPU from going offline
+ *
+ * Atomic hotplug readers (tasks which wish to prevent CPUs from going
+ * offline during their critical section, but can't afford to sleep)
+ * can invoke this function to synchronize with CPU offline. This function
+ * can be called recursively, provided it is matched with an equal number
+ * of calls to put_online_cpus_atomic().
+ *
+ * Note: This does NOT prevent CPUs from coming online! It only prevents
+ * CPUs from going offline.
+ *
+ * Lock ordering rule: Strictly speaking, there is no lock ordering
+ * requirement here, but it is advisable to keep the locking consistent.
+ * As a simple rule-of-thumb, use these functions in the outer-most blocks
+ * of your critical sections, outside of other locks.
+ *
+ * Returns the current CPU number, with preemption disabled.
+ */
+unsigned int get_online_cpus_atomic(void)
+{
+   /*
+* The current CPU hotplug implementation uses stop_machine() in
+* the CPU offline path. And disabling preemption prevents
+* stop_machine() from taking effect. Thus, this prevents any CPU
+* from going offline.
+*/
+   preempt_disable();
+   return smp_processor_id();
+}
+EXPORT_SYMBOL_GPL(get_online_cpus_atomic);
+
+void put_online_cpus_atomic(void)
+{
+   preempt_enable();
+}
+EXPORT_SYMBOL_GPL(put_online_cpus_atomic);
+
 #else /* #if CONFIG_HOTPLUG_CPU */
 static void cpu_hotplug_begin(void) {}
 static void cpu_hotplug_done(void) {}



[PATCH v2 02/45] CPU hotplug: Clarify the usage of different synchronization APIs

2013-06-25 Thread Srivatsa S. Bhat
We have quite a few APIs now which help synchronize with CPU hotplug.
Among them, get/put_online_cpus() is the oldest and the most well-known,
so no problems there. By extension, its easy to comprehend the new
set : get/put_online_cpus_atomic().

But there is yet another set, which might appear tempting to use:
cpu_hotplug_disable()/cpu_hotplug_enable(). Add comments to clarify
that this latter set is NOT for general use and must be used only in
specific cases where the requirement is really to _disable_ hotplug
and not just to synchronize with it.

Cc: Thomas Gleixner 
Cc: Andrew Morton 
Cc: Yasuaki Ishimatsu 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Srivatsa S. Bhat 
---

 kernel/cpu.c |7 +++
 1 file changed, 7 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 2d03398..860f51a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -139,6 +139,13 @@ static void cpu_hotplug_done(void)
  * the 'cpu_hotplug_disabled' flag. The same lock is also acquired by the
  * hotplug path before performing hotplug operations. So acquiring that lock
  * guarantees mutual exclusion from any currently running hotplug operations.
+ *
+ * Note: In most cases, this is *NOT* the function you need. If you simply
+ * want to avoid racing with CPU hotplug operations, use get/put_online_cpus()
+ * or get/put_online_cpus_atomic(), depending on the situation.
+ *
+ * This set of functions is reserved for cases where you really wish to
+ * _disable_ CPU hotplug and not just synchronize with it.
  */
 void cpu_hotplug_disable(void)
 {



[PATCH v2 03/45] Documentation, CPU hotplug: Recommend usage of get/put_online_cpus_atomic()

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

So add documentation to recommend using the new get/put_online_cpus_atomic()
APIs to prevent CPUs from going offline, while invoking from atomic context.

Cc: Rob Landley 
Cc: linux-...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 Documentation/cpu-hotplug.txt |   20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 9f40135..7b3ca60 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -113,13 +113,18 @@ Never use anything other than cpumask_t to represent bitmap of CPUs.
#include 
get_online_cpus() and put_online_cpus():
 
-The above calls are used to inhibit cpu hotplug operations. While the
+The above calls are used to inhibit cpu hotplug operations, when invoked from
+non-atomic contexts (because the above functions can sleep). While the
 cpu_hotplug.refcount is non zero, the cpu_online_mask will not change.
-If you merely need to avoid cpus going away, you could also use
-preempt_disable() and preempt_enable() for those sections.
-Just remember the critical section cannot call any
-function that can sleep or schedule this process away. The preempt_disable()
-will work as long as stop_machine_run() is used to take a cpu down.
+
+However, if you are executing in atomic context (ie., you can't afford to
+sleep), and you merely need to avoid cpus going offline, you can use
+get_online_cpus_atomic() and put_online_cpus_atomic() for those sections.
+Just remember the critical section cannot call any function that can sleep or
+schedule this process away. Using preempt_disable() will also work, as long
+as stop_machine() is used to take a CPU down. But we are going to get rid of
+stop_machine() in the CPU offline path soon, so it is strongly recommended
+to use the APIs mentioned above.
 
 CPU Hotplug - Frequently Asked Questions.
 
@@ -360,6 +365,9 @@ A: There are two ways.  If your code can be run in interrupt context, use
return err;
}
 
+   If my_func_on_cpu() itself cannot block, use get/put_online_cpus_atomic()
+   instead of get/put_online_cpus(), to prevent CPUs from going offline.
+
 Q: How do we determine how many CPUs are available for hotplug.
 A: There is no clear spec defined way from ACPI that can give us that
information today. Based on some input from Natalie of Unisys,



[PATCH v2 04/45] CPU hotplug: Add infrastructure to check lacking hotplug synchronization

2013-06-25 Thread Srivatsa S. Bhat
Add a debugging infrastructure to warn if an atomic hotplug reader has not
invoked get_online_cpus_atomic() before traversing/accessing the
cpu_online_mask. Encapsulate these checks under a new debug config option
DEBUG_HOTPLUG_CPU.

This debugging infrastructure proves useful in the tree-wide conversion
of atomic hotplug readers from preempt_disable() to the new APIs, and
helps us catch the places we missed, well before we actually get rid of
stop_machine(). We can perhaps remove the debugging checks later on.

Cc: Rusty Russell 
Cc: Alex Shi 
Cc: KOSAKI Motohiro 
Cc: Tejun Heo 
Cc: Thomas Gleixner 
Cc: Andrew Morton 
Cc: Yasuaki Ishimatsu 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Srivatsa S. Bhat 
---

 include/linux/cpumask.h |   12 ++
 kernel/cpu.c|   89 +++
 2 files changed, 101 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index d08e4d2..9197ca4 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -101,6 +101,18 @@ extern const struct cpumask *const cpu_active_mask;
 #define cpu_active(cpu)((cpu) == 0)
 #endif
 
+#ifdef CONFIG_DEBUG_HOTPLUG_CPU
+extern void check_hotplug_safe_cpumask(const struct cpumask *mask);
+extern void check_hotplug_safe_cpu(unsigned int cpu,
+  const struct cpumask *mask);
+#else
+static inline void check_hotplug_safe_cpumask(const struct cpumask *mask) { }
+static inline void check_hotplug_safe_cpu(unsigned int cpu,
+ const struct cpumask *mask)
+{
+}
+#endif
+
 /* verify cpu argument to cpumask_* operators */
 static inline unsigned int cpumask_check(unsigned int cpu)
 {
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 860f51a..5297ec1 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -63,6 +63,92 @@ static struct {
.refcount = 0,
 };
 
+#ifdef CONFIG_DEBUG_HOTPLUG_CPU
+
+static DEFINE_PER_CPU(unsigned long, atomic_reader_refcnt);
+
+static int current_is_hotplug_safe(const struct cpumask *mask)
+{
+
+   /* If we are not dealing with cpu_online_mask, don't complain. */
+   if (mask != cpu_online_mask)
+   return 1;
+
+   /* If this is the task doing hotplug, don't complain. */
+   if (unlikely(current == cpu_hotplug.active_writer))
+   return 1;
+
+   /* If we are in early boot, don't complain. */
+   if (system_state != SYSTEM_RUNNING)
+   return 1;
+
+   /*
+* Check if the current task is in atomic context and it has
+* invoked get_online_cpus_atomic() to synchronize with
+* CPU Hotplug.
+*/
+   if (preempt_count() || irqs_disabled())
+   return this_cpu_read(atomic_reader_refcnt);
+   else
+   return 1; /* No checks for non-atomic contexts for now */
+}
+
+static inline void warn_hotplug_unsafe(void)
+{
+   WARN_ONCE(1, "Must use get/put_online_cpus_atomic() to synchronize"
+" with CPU hotplug\n");
+}
+
+/*
+ * Check if the task (executing in atomic context) has the required protection
+ * against CPU hotplug, while accessing the specified cpumask.
+ */
+void check_hotplug_safe_cpumask(const struct cpumask *mask)
+{
+   if (!current_is_hotplug_safe(mask))
+   warn_hotplug_unsafe();
+}
+EXPORT_SYMBOL_GPL(check_hotplug_safe_cpumask);
+
+/*
+ * Similar to check_hotplug_safe_cpumask(), except that we don't complain
+ * if the task (executing in atomic context) is testing whether the CPU it
+ * is executing on is online or not.
+ *
+ * (A task executing with preemption disabled on a CPU, automatically prevents
+ *  offlining that CPU, irrespective of the actual implementation of CPU
+ *  offline. So we don't enforce holding of get_online_cpus_atomic() for that
+ *  case).
+ */
+void check_hotplug_safe_cpu(unsigned int cpu, const struct cpumask *mask)
+{
+   if (!current_is_hotplug_safe(mask) && cpu != smp_processor_id())
+   warn_hotplug_unsafe();
+}
+EXPORT_SYMBOL_GPL(check_hotplug_safe_cpu);
+
+static inline void atomic_reader_refcnt_inc(void)
+{
+   this_cpu_inc(atomic_reader_refcnt);
+}
+
+static inline void atomic_reader_refcnt_dec(void)
+{
+   this_cpu_dec(atomic_reader_refcnt);
+}
+
+#else
+
+static inline void atomic_reader_refcnt_inc(void)
+{
+}
+
+static inline void atomic_reader_refcnt_dec(void)
+{
+}
+
+#endif
+
 void get_online_cpus(void)
 {
might_sleep();
@@ -189,12 +275,15 @@ unsigned int get_online_cpus_atomic(void)
 * from going offline.
 */
preempt_disable();
+   atomic_reader_refcnt_inc();
+
return smp_processor_id();
 }
 EXPORT_SYMBOL_GPL(get_online_cpus_atomic);
 
 void put_online_cpus_atomic(void)
 {
+   atomic_reader_refcnt_dec();
preempt_enable();
 }
 EXPORT_SYMBOL_GPL(put_online_cpus_atomic);


[PATCH v2 05/45] CPU hotplug: Protect set_cpu_online() to avoid false-positives

2013-06-25 Thread Srivatsa S. Bhat
When bringing a secondary CPU online, the task running on the CPU coming up
marks that CPU in the cpu_online_mask. This is safe even though this task is
not the hotplug writer task.

But it is hard to teach this special case to the CPU hotplug debug
infrastructure, and if we get it wrong, we risk making the debug code too
lenient, opening the door to false-negatives.

Luckily, all architectures use set_cpu_online() to manipulate the
cpu_online_mask. So, to avoid false-positive warnings from the CPU hotplug
debug code, encapsulate the body of set_cpu_online() within
get/put_online_cpus_atomic().

Cc: Thomas Gleixner 
Cc: Andrew Morton 
Cc: Yasuaki Ishimatsu 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Srivatsa S. Bhat 
---

 kernel/cpu.c |4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 5297ec1..35e7115 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -832,10 +832,14 @@ void set_cpu_present(unsigned int cpu, bool present)
 
 void set_cpu_online(unsigned int cpu, bool online)
 {
+   get_online_cpus_atomic();
+
if (online)
cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
else
cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
+
+   put_online_cpus_atomic();
 }
 
 void set_cpu_active(unsigned int cpu, bool active)



[PATCH v2 06/45] CPU hotplug: Sprinkle debugging checks to catch locking bugs

2013-06-25 Thread Srivatsa S. Bhat
Now that we have a debug infrastructure in place to detect cases where
get/put_online_cpus_atomic() had to be used, add these checks at the
right spots to help catch places where we missed converting to the new
APIs.
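
With the checks wired into the accessors themselves, an unconverted atomic
reader now warns without any manual annotation. Schematically (do_something()
is made up; the conversion shown is the one used throughout this series):

	/* Old-style reader: for_each_online_cpu() walks cpu_online_mask
	 * via cpumask_next(), whose new check sees a raised preempt_count()
	 * but a zero refcount, and warns: */
	preempt_disable();
	for_each_online_cpu(cpu)
		do_something(cpu);
	preempt_enable();

	/* Converted reader: the per-cpu refcount is non-zero, so the
	 * same traversal passes the check silently: */
	get_online_cpus_atomic();
	for_each_online_cpu(cpu)
		do_something(cpu);
	put_online_cpus_atomic();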

Cc: Rusty Russell 
Cc: Alex Shi 
Cc: KOSAKI Motohiro 
Cc: Tejun Heo 
Cc: Andrew Morton 
Cc: Joonsoo Kim 
Signed-off-by: Srivatsa S. Bhat 
---

 include/linux/cpumask.h |   47 +--
 lib/cpumask.c   |8 
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 9197ca4..06d2c36 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -169,6 +169,7 @@ static inline unsigned int cpumask_any_but(const struct 
cpumask *mask,
  */
 static inline unsigned int cpumask_first(const struct cpumask *srcp)
 {
+   check_hotplug_safe_cpumask(srcp);
return find_first_bit(cpumask_bits(srcp), nr_cpumask_bits);
 }
 
@@ -184,6 +185,8 @@ static inline unsigned int cpumask_next(int n, const struct 
cpumask *srcp)
/* -1 is a legal arg here. */
if (n != -1)
cpumask_check(n);
+
+   check_hotplug_safe_cpumask(srcp);
return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n+1);
 }
 
@@ -199,6 +202,8 @@ static inline unsigned int cpumask_next_zero(int n, const 
struct cpumask *srcp)
/* -1 is a legal arg here. */
if (n != -1)
cpumask_check(n);
+
+   check_hotplug_safe_cpumask(srcp);
return find_next_zero_bit(cpumask_bits(srcp), nr_cpumask_bits, n+1);
 }
 
@@ -288,8 +293,15 @@ static inline void cpumask_clear_cpu(int cpu, struct 
cpumask *dstp)
  *
  * No static inline type checking - see Subtlety (1) above.
  */
-#define cpumask_test_cpu(cpu, cpumask) \
-   test_bit(cpumask_check(cpu), cpumask_bits((cpumask)))
+#define cpumask_test_cpu(cpu, cpumask) \
+({ \
+   int __ret;  \
+   \
+   check_hotplug_safe_cpu(cpu, cpumask);   \
+   __ret = test_bit(cpumask_check(cpu),\
+   cpumask_bits((cpumask)));   \
+   __ret;  \
+})
 
 /**
  * cpumask_test_and_set_cpu - atomically test and set a cpu in a cpumask
@@ -349,6 +361,9 @@ static inline int cpumask_and(struct cpumask *dstp,
   const struct cpumask *src1p,
   const struct cpumask *src2p)
 {
+   check_hotplug_safe_cpumask(src1p);
+   check_hotplug_safe_cpumask(src2p);
+
return bitmap_and(cpumask_bits(dstp), cpumask_bits(src1p),
   cpumask_bits(src2p), nr_cpumask_bits);
 }
@@ -362,6 +377,9 @@ static inline int cpumask_and(struct cpumask *dstp,
 static inline void cpumask_or(struct cpumask *dstp, const struct cpumask 
*src1p,
  const struct cpumask *src2p)
 {
+   check_hotplug_safe_cpumask(src1p);
+   check_hotplug_safe_cpumask(src2p);
+
bitmap_or(cpumask_bits(dstp), cpumask_bits(src1p),
  cpumask_bits(src2p), nr_cpumask_bits);
 }
@@ -376,6 +394,9 @@ static inline void cpumask_xor(struct cpumask *dstp,
   const struct cpumask *src1p,
   const struct cpumask *src2p)
 {
+   check_hotplug_safe_cpumask(src1p);
+   check_hotplug_safe_cpumask(src2p);
+
bitmap_xor(cpumask_bits(dstp), cpumask_bits(src1p),
   cpumask_bits(src2p), nr_cpumask_bits);
 }
@@ -392,6 +413,9 @@ static inline int cpumask_andnot(struct cpumask *dstp,
  const struct cpumask *src1p,
  const struct cpumask *src2p)
 {
+   check_hotplug_safe_cpumask(src1p);
+   check_hotplug_safe_cpumask(src2p);
+
return bitmap_andnot(cpumask_bits(dstp), cpumask_bits(src1p),
  cpumask_bits(src2p), nr_cpumask_bits);
 }
@@ -404,6 +428,8 @@ static inline int cpumask_andnot(struct cpumask *dstp,
 static inline void cpumask_complement(struct cpumask *dstp,
  const struct cpumask *srcp)
 {
+   check_hotplug_safe_cpumask(srcp);
+
bitmap_complement(cpumask_bits(dstp), cpumask_bits(srcp),
  nr_cpumask_bits);
 }
@@ -416,6 +442,9 @@ static inline void cpumask_complement(struct cpumask *dstp,
 static inline bool cpumask_equal(const struct cpumask *src1p,
const struct cpumask *src2p)
 {
+   check_hotplug_safe_cpumask(src1p);
+   check_hotplug_safe_cpumask(src2p);
+
return bitmap_equal(cpumask_bits(src1p), cpumask_bits(src2p),
   nr_cpumask_bits);

[PATCH v2 07/45] CPU hotplug: Expose the new debug config option

2013-06-25 Thread Srivatsa S. Bhat
Now that we have all the pieces of the CPU hotplug debug infrastructure
in place, expose the feature by growing a new Kconfig option,
CONFIG_DEBUG_HOTPLUG_CPU.
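
For completeness, the .config fragment needed to exercise the checks
(DEBUG_HOTPLUG_CPU depends on HOTPLUG_CPU, per the Kconfig entry below):

	CONFIG_HOTPLUG_CPU=y
	CONFIG_DEBUG_HOTPLUG_CPU=y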

Cc: Andrew Morton 
Cc: "Paul E. McKenney" 
Cc: Akinobu Mita 
Cc: Catalin Marinas 
Cc: Michel Lespinasse 
Cc: Sergei Shtylyov 
Signed-off-by: Srivatsa S. Bhat 
---

 lib/Kconfig.debug |8 
 1 file changed, 8 insertions(+)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 566cf2b..ec6be74 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -512,6 +512,14 @@ config DEBUG_PREEMPT
  if kernel code uses it in a preemption-unsafe way. Also, the kernel
  will detect preemption count underflows.
 
+config DEBUG_HOTPLUG_CPU
+   bool "Debug CPU hotplug"
+   depends on HOTPLUG_CPU
+   help
+ If you say Y here, the kernel will check all the accesses of
+ cpu_online_mask from atomic contexts, and will print warnings if
+ the task lacks appropriate synchronization with CPU hotplug.
+
 config DEBUG_RT_MUTEXES
bool "RT Mutex debugging, deadlock detection"
depends on DEBUG_KERNEL && RT_MUTEXES



[PATCH v2 08/45] CPU hotplug: Convert preprocessor macros to static inline functions

2013-06-25 Thread Srivatsa S. Bhat
Convert the macros in the CPU hotplug code to static inline C functions.
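
The usual rationale, sketched with a made-up one-argument example
(poke_cpu() is hypothetical): a static inline gets argument type-checking
even when its body is empty, which the do { } while (0) macro form
silently skips:

	/* Macro form: 'cpu' is never evaluated or type-checked. */
	#define poke_cpu(cpu)		do { } while (0)

	/* Inline form: a wrongly-typed argument now draws a compiler
	 * diagnostic, and side effects in the argument are evaluated
	 * exactly once. */
	static inline void poke_cpu(unsigned int cpu) { }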

Cc: Thomas Gleixner 
Cc: Andrew Morton 
Cc: Tejun Heo 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Srivatsa S. Bhat 
---

 include/linux/cpu.h |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index a57b25a..85431a0 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -202,10 +202,8 @@ static inline void cpu_hotplug_driver_unlock(void)
 
 #else  /* CONFIG_HOTPLUG_CPU */
 
-#define get_online_cpus()  do { } while (0)
-#define put_online_cpus()  do { } while (0)
-#define cpu_hotplug_disable()  do { } while (0)
-#define cpu_hotplug_enable()   do { } while (0)
+static inline void get_online_cpus(void) {}
+static inline void put_online_cpus(void) {}
 
 static inline unsigned int get_online_cpus_atomic(void)
 {
@@ -222,6 +220,9 @@ static inline void put_online_cpus_atomic(void)
preempt_enable();
 }
 
+static inline void cpu_hotplug_disable(void) {}
+static inline void cpu_hotplug_enable(void) {}
+
 #define hotcpu_notifier(fn, pri)   do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)   ({ (void)(nb); 0; })



[PATCH v2 09/45] smp: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.
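
The conversion pattern is the same in this patch and in most of the
remaining ones in the series; in schematic form:

	/* Before: disabling preemption pins the CPU only as long as
	 * CPU offline goes through stop_machine(). */
	this_cpu = get_cpu();
	/* ... use this_cpu / cpu_online_mask ... */
	put_cpu();

	/* After: additionally marks the task as an atomic hotplug
	 * reader, which the new offline path synchronizes against. */
	this_cpu = get_online_cpus_atomic();
	/* ... */
	put_online_cpus_atomic();

Note how the early returns in smp_call_function_many() below are converted
to 'goto out', so that the matching put_online_cpus_atomic() can never be
skipped on an error path.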

Cc: Andrew Morton 
Cc: Wang YanQing 
Cc: Shaohua Li 
Cc: Jan Beulich 
Cc: liguang 
Signed-off-by: Srivatsa S. Bhat 
---

 kernel/smp.c |   52 ++--
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 4dba0f7..1f36d6d 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -232,7 +232,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, 
void *info,
 * prevent preemption and reschedule on another processor,
 * as well as CPU removal
 */
-   this_cpu = get_cpu();
+   this_cpu = get_online_cpus_atomic();
 
/*
 * Can deadlock when called with interrupts disabled.
@@ -264,7 +264,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, 
void *info,
}
}
 
-   put_cpu();
+   put_online_cpus_atomic();
 
return err;
 }
@@ -294,7 +294,7 @@ int smp_call_function_any(const struct cpumask *mask,
int ret;
 
/* Try for same CPU (cheapest) */
-   cpu = get_cpu();
+   cpu = get_online_cpus_atomic();
if (cpumask_test_cpu(cpu, mask))
goto call;
 
@@ -310,7 +310,7 @@ int smp_call_function_any(const struct cpumask *mask,
cpu = cpumask_any_and(mask, cpu_online_mask);
 call:
ret = smp_call_function_single(cpu, func, info, wait);
-   put_cpu();
+   put_online_cpus_atomic();
return ret;
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
@@ -331,7 +331,8 @@ void __smp_call_function_single(int cpu, struct 
call_single_data *csd,
unsigned int this_cpu;
unsigned long flags;
 
-   this_cpu = get_cpu();
+   this_cpu = get_online_cpus_atomic();
+
/*
 * Can deadlock when called with interrupts disabled.
 * We allow cpu's that are not yet online though, as no one else can
@@ -349,7 +350,8 @@ void __smp_call_function_single(int cpu, struct 
call_single_data *csd,
csd_lock(csd);
generic_exec_single(cpu, csd, wait);
}
-   put_cpu();
+
+   put_online_cpus_atomic();
 }
 
 /**
@@ -370,7 +372,9 @@ void smp_call_function_many(const struct cpumask *mask,
smp_call_func_t func, void *info, bool wait)
 {
struct call_function_data *cfd;
-   int cpu, next_cpu, this_cpu = smp_processor_id();
+   int cpu, next_cpu, this_cpu;
+
+   this_cpu = get_online_cpus_atomic();
 
/*
 * Can deadlock when called with interrupts disabled.
@@ -388,7 +392,7 @@ void smp_call_function_many(const struct cpumask *mask,
 
/* No online cpus?  We're done. */
if (cpu >= nr_cpu_ids)
-   return;
+   goto out;
 
/* Do we have another CPU which isn't us? */
next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
@@ -398,7 +402,7 @@ void smp_call_function_many(const struct cpumask *mask,
/* Fastpath: do that cpu by itself. */
if (next_cpu >= nr_cpu_ids) {
smp_call_function_single(cpu, func, info, wait);
-   return;
+   goto out;
}
 
cfd = &__get_cpu_var(cfd_data);
@@ -408,7 +412,7 @@ void smp_call_function_many(const struct cpumask *mask,
 
/* Some callers race with other cpus changing the passed mask */
if (unlikely(!cpumask_weight(cfd->cpumask)))
-   return;
+   goto out;
 
/*
 * After we put an entry into the list, cfd->cpumask may be cleared
@@ -443,6 +447,9 @@ void smp_call_function_many(const struct cpumask *mask,
csd_lock_wait(csd);
}
}
+
+out:
+   put_online_cpus_atomic();
 }
 EXPORT_SYMBOL(smp_call_function_many);
 
@@ -463,9 +470,9 @@ EXPORT_SYMBOL(smp_call_function_many);
  */
 int smp_call_function(smp_call_func_t func, void *info, int wait)
 {
-   preempt_disable();
+   get_online_cpus_atomic();
smp_call_function_many(cpu_online_mask, func, info, wait);
-   preempt_enable();
+   put_online_cpus_atomic();
 
return 0;
 }
@@ -565,12 +572,12 @@ int on_each_cpu(void (*func) (void *info), void *info, 
int wait)
unsigned long flags;
int ret = 0;
 
-   preempt_disable();
+   get_online_cpus_atomic();
ret = smp_call_function(func, info, wait);
local_irq_save(flags);
func(info);
local_irq_restore(flags);
-   preempt_enable();
+   put_online_cpus_atomic();
return ret;
 }
 EXPORT_SYMBOL(on_each_cpu);
@@ -592,7 +599,7 @@ EXPORT_SYMBOL(on_each_cpu);
 void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
void *info, bool wait)

[PATCH v2 10/45] sched/core: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Signed-off-by: Srivatsa S. Bhat 
---

 kernel/sched/core.c |   23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 195658b..accd550 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1160,11 +1160,11 @@ void kick_process(struct task_struct *p)
 {
int cpu;
 
-   preempt_disable();
+   get_online_cpus_atomic();
cpu = task_cpu(p);
if ((cpu != smp_processor_id()) && task_curr(p))
smp_send_reschedule(cpu);
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(kick_process);
 #endif /* CONFIG_SMP */
@@ -1172,6 +1172,9 @@ EXPORT_SYMBOL_GPL(kick_process);
 #ifdef CONFIG_SMP
 /*
  * ->cpus_allowed is protected by both rq->lock and p->pi_lock
+ *
+ *  Must be called within get/put_online_cpus_atomic(), to prevent
+ *  CPUs from going offline from under us.
  */
 static int select_fallback_rq(int cpu, struct task_struct *p)
 {
@@ -1245,6 +1248,9 @@ out:
 
 /*
  * The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable.
+ *
+ * Must be called within get/put_online_cpus_atomic(), to prevent
+ * CPUs from going offline from under us.
  */
 static inline
 int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
@@ -1489,6 +1495,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, 
int wake_flags)
unsigned long flags;
int cpu, success = 0;
 
+   get_online_cpus_atomic();
+
smp_wmb();
raw_spin_lock_irqsave(&p->pi_lock, flags);
if (!(p->state & state))
@@ -1531,6 +1539,7 @@ stat:
 out:
raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 
+   put_online_cpus_atomic();
return success;
 }
 
@@ -1753,6 +1762,8 @@ void wake_up_new_task(struct task_struct *p)
unsigned long flags;
struct rq *rq;
 
+   get_online_cpus_atomic();
+
raw_spin_lock_irqsave(&p->pi_lock, flags);
 #ifdef CONFIG_SMP
/*
@@ -1773,6 +1784,8 @@ void wake_up_new_task(struct task_struct *p)
p->sched_class->task_woken(rq, p);
 #endif
task_rq_unlock(rq, p, &flags);
+
+   put_online_cpus_atomic();
 }
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
@@ -3886,6 +3899,8 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
unsigned long flags;
int yielded = 0;
 
+   get_online_cpus_atomic();
+
local_irq_save(flags);
rq = this_rq();
 
@@ -3931,6 +3946,8 @@ out_unlock:
 out_irq:
local_irq_restore(flags);
 
+   put_online_cpus_atomic();
+
if (yielded > 0)
schedule();
 
@@ -4331,9 +4348,11 @@ static int migration_cpu_stop(void *data)
 * The original target cpu might have gone down and we might
 * be on another cpu but it doesn't matter.
 */
+   get_online_cpus_atomic();
local_irq_disable();
__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
local_irq_enable();
+   put_online_cpus_atomic();
return 0;
 }
 



[PATCH v2 11/45] migration: Use raw_spin_lock/unlock since interrupts are already disabled

2013-06-25 Thread Srivatsa S. Bhat
We need not use the raw_spin_lock_irqsave/restore primitives because
all CPU_DYING notifiers run with interrupts disabled. So just use
raw_spin_lock/unlock.
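
As a general sketch of the rule being applied (my_notifier and my_lock
are made up):

	static int my_notifier(struct notifier_block *nb,
			       unsigned long action, void *hcpu)
	{
		switch (action & ~CPU_TASKS_FROZEN) {
		case CPU_DYING:
			/* IRQs are guaranteed to be off here, so saving
			 * and restoring flags would be pure overhead. */
			raw_spin_lock(&my_lock);
			/* ... */
			raw_spin_unlock(&my_lock);
			break;
		}
		return NOTIFY_OK;
	}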

Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Signed-off-by: Srivatsa S. Bhat 
---

 kernel/sched/core.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index accd550..ff26f54 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4682,14 +4682,14 @@ migration_call(struct notifier_block *nfb, unsigned 
long action, void *hcpu)
case CPU_DYING:
sched_ttwu_pending();
/* Update our root-domain */
-   raw_spin_lock_irqsave(&rq->lock, flags);
+   raw_spin_lock(&rq->lock); /* IRQs already disabled */
if (rq->rd) {
BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
set_rq_offline(rq);
}
migrate_tasks(cpu);
BUG_ON(rq->nr_running != 1); /* the migration thread */
-   raw_spin_unlock_irqrestore(&rq->lock, flags);
+   raw_spin_unlock(&rq->lock);
break;
 
case CPU_DEAD:



[PATCH v2 12/45] sched/fair: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Signed-off-by: Srivatsa S. Bhat 
---

 kernel/sched/fair.c |   14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c0ac2c3..88f056e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3338,7 +3338,8 @@ done:
  *
  * Returns the target CPU number, or the same CPU if no balancing is needed.
  *
- * preempt must be disabled.
+ * Must be called within get/put_online_cpus_atomic(), to prevent CPUs
+ * from going offline from under us.
  */
 static int
 select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
@@ -5267,6 +5268,8 @@ void idle_balance(int this_cpu, struct rq *this_rq)
raw_spin_unlock(&this_rq->lock);
 
update_blocked_averages(this_cpu);
+
+   get_online_cpus_atomic();
rcu_read_lock();
for_each_domain(this_cpu, sd) {
unsigned long interval;
@@ -5290,6 +5293,7 @@ void idle_balance(int this_cpu, struct rq *this_rq)
}
}
rcu_read_unlock();
+   put_online_cpus_atomic();
 
raw_spin_lock(&this_rq->lock);
 
@@ -5316,6 +5320,7 @@ static int active_load_balance_cpu_stop(void *data)
struct rq *target_rq = cpu_rq(target_cpu);
struct sched_domain *sd;
 
+   get_online_cpus_atomic();
raw_spin_lock_irq(&busiest_rq->lock);
 
/* make sure the requested cpu hasn't gone down in the meantime */
@@ -5367,6 +5372,7 @@ static int active_load_balance_cpu_stop(void *data)
 out_unlock:
busiest_rq->active_balance = 0;
raw_spin_unlock_irq(&busiest_rq->lock);
+   put_online_cpus_atomic();
return 0;
 }
 
@@ -5527,6 +5533,7 @@ static void rebalance_domains(int cpu, enum cpu_idle_type 
idle)
 
update_blocked_averages(cpu);
 
+   get_online_cpus_atomic();
rcu_read_lock();
for_each_domain(cpu, sd) {
if (!(sd->flags & SD_LOAD_BALANCE))
@@ -5575,6 +5582,7 @@ out:
break;
}
rcu_read_unlock();
+   put_online_cpus_atomic();
 
/*
 * next_balance will be updated only when there is a need.
@@ -5706,6 +5714,7 @@ static void run_rebalance_domains(struct softirq_action 
*h)
enum cpu_idle_type idle = this_rq->idle_balance ?
CPU_IDLE : CPU_NOT_IDLE;
 
+   get_online_cpus_atomic();
rebalance_domains(this_cpu, idle);
 
/*
@@ -5714,6 +5723,7 @@ static void run_rebalance_domains(struct softirq_action 
*h)
 * stopped.
 */
nohz_idle_balance(this_cpu, idle);
+   put_online_cpus_atomic();
 }
 
 static inline int on_null_domain(int cpu)
@@ -5731,8 +5741,10 @@ void trigger_load_balance(struct rq *rq, int cpu)
likely(!on_null_domain(cpu)))
raise_softirq(SCHED_SOFTIRQ);
 #ifdef CONFIG_NO_HZ_COMMON
+   get_online_cpus_atomic();
if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu)))
nohz_balancer_kick(cpu);
+   put_online_cpus_atomic();
 #endif
 }
 



[PATCH v2 13/45] timer: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Thomas Gleixner 
Signed-off-by: Srivatsa S. Bhat 
---

 kernel/timer.c |4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/timer.c b/kernel/timer.c
index 15ffdb3..5db594c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -729,6 +729,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
timer_stats_timer_set_start_info(timer);
BUG_ON(!timer->function);
 
+   get_online_cpus_atomic();
base = lock_timer_base(timer, &flags);
 
ret = detach_if_pending(timer, base, false);
@@ -768,6 +769,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 
 out_unlock:
spin_unlock_irqrestore(&base->lock, flags);
+   put_online_cpus_atomic();
 
return ret;
 }
@@ -926,6 +928,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 
timer_stats_timer_set_start_info(timer);
BUG_ON(timer_pending(timer) || !timer->function);
+   get_online_cpus_atomic();
spin_lock_irqsave(&base->lock, flags);
timer_set_base(timer, base);
debug_activate(timer, timer->expires);
@@ -940,6 +943,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 */
wake_up_nohz_cpu(cpu);
spin_unlock_irqrestore(&base->lock, flags);
+   put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(add_timer_on);
 



[PATCH v2 14/45] sched/rt: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Signed-off-by: Srivatsa S. Bhat 
---

 kernel/sched/rt.c |   14 ++
 1 file changed, 14 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 01970c8..03d9f38 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -6,6 +6,7 @@
 #include "sched.h"
 
 #include 
+#include 
 
 int sched_rr_timeslice = RR_TIMESLICE;
 
@@ -28,7 +29,9 @@ static enum hrtimer_restart sched_rt_period_timer(struct 
hrtimer *timer)
if (!overrun)
break;
 
+   get_online_cpus_atomic();
idle = do_sched_rt_period_timer(rt_b, overrun);
+   put_online_cpus_atomic();
}
 
return idle ? HRTIMER_NORESTART : HRTIMER_RESTART;
@@ -547,6 +550,7 @@ static int do_balance_runtime(struct rt_rq *rt_rq)
int i, weight, more = 0;
u64 rt_period;
 
+   get_online_cpus_atomic();
weight = cpumask_weight(rd->span);
 
raw_spin_lock(&rt_b->rt_runtime_lock);
@@ -588,6 +592,7 @@ next:
raw_spin_unlock(&iter->rt_runtime_lock);
}
raw_spin_unlock(&rt_b->rt_runtime_lock);
+   put_online_cpus_atomic();
 
return more;
 }
@@ -1168,6 +1173,10 @@ static void yield_task_rt(struct rq *rq)
 #ifdef CONFIG_SMP
 static int find_lowest_rq(struct task_struct *task);
 
+/*
+ * Must be called within get/put_online_cpus_atomic(), to prevent CPUs
+ * from going offline from under us.
+ */
 static int
 select_task_rq_rt(struct task_struct *p, int sd_flag, int flags)
 {
@@ -1561,6 +1570,8 @@ retry:
return 0;
}
 
+   get_online_cpus_atomic();
+
/* We might release rq lock */
get_task_struct(next_task);
 
@@ -1611,6 +1622,7 @@ retry:
 out:
put_task_struct(next_task);
 
+   put_online_cpus_atomic();
return ret;
 }
 
@@ -1630,6 +1642,7 @@ static int pull_rt_task(struct rq *this_rq)
if (likely(!rt_overloaded(this_rq)))
return 0;
 
+   get_online_cpus_atomic();
for_each_cpu(cpu, this_rq->rd->rto_mask) {
if (this_cpu == cpu)
continue;
@@ -1695,6 +1708,7 @@ skip:
double_unlock_balance(this_rq, src_rq);
}
 
+   put_online_cpus_atomic();
return ret;
 }
 



[PATCH v2 16/45] tick-broadcast: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Thomas Gleixner 
Signed-off-by: Srivatsa S. Bhat 
---

 kernel/time/tick-broadcast.c |8 
 1 file changed, 8 insertions(+)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 9d96a54..1e4ac45 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -227,12 +227,14 @@ static void tick_do_broadcast(struct cpumask *mask)
  */
 static void tick_do_periodic_broadcast(void)
 {
+   get_online_cpus_atomic();
raw_spin_lock(&tick_broadcast_lock);
 
cpumask_and(tmpmask, cpu_online_mask, tick_broadcast_mask);
tick_do_broadcast(tmpmask);
 
raw_spin_unlock(&tick_broadcast_lock);
+   put_online_cpus_atomic();
 }
 
 /*
@@ -335,11 +337,13 @@ out:
  */
 void tick_broadcast_on_off(unsigned long reason, int *oncpu)
 {
+   get_online_cpus_atomic();
if (!cpumask_test_cpu(*oncpu, cpu_online_mask))
printk(KERN_ERR "tick-broadcast: ignoring broadcast for "
   "offline CPU #%d\n", *oncpu);
else
tick_do_broadcast_on_off(&reason);
+   put_online_cpus_atomic();
 }
 
 /*
@@ -505,6 +509,7 @@ static void tick_handle_oneshot_broadcast(struct 
clock_event_device *dev)
ktime_t now, next_event;
int cpu, next_cpu = 0;
 
+   get_online_cpus_atomic();
raw_spin_lock(&tick_broadcast_lock);
 again:
dev->next_event.tv64 = KTIME_MAX;
@@ -562,6 +567,7 @@ again:
goto again;
}
raw_spin_unlock(&tick_broadcast_lock);
+   put_online_cpus_atomic();
 }
 
 /*
@@ -756,6 +762,7 @@ void tick_broadcast_switch_to_oneshot(void)
struct clock_event_device *bc;
unsigned long flags;
 
+   get_online_cpus_atomic();
raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 
tick_broadcast_device.mode = TICKDEV_MODE_ONESHOT;
@@ -764,6 +771,7 @@ void tick_broadcast_switch_to_oneshot(void)
tick_broadcast_setup_oneshot(bc);
 
raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+   put_online_cpus_atomic();
 }
 
 



[PATCH v2 15/45] rcu: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

In RCU code, rcu_implicit_dynticks_qs() checks whether a CPU is offline
while holding a spinlock. Use the get/put_online_cpus_atomic() APIs to
prevent CPUs from going offline while executing in atomic context.

Cc: Dipankar Sarma 
Cc: "Paul E. McKenney" 
Signed-off-by: Srivatsa S. Bhat 
---

 kernel/rcutree.c |4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index cf3adc6..caeed1a 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2107,6 +2107,8 @@ static void force_qs_rnp(struct rcu_state *rsp, int 
(*f)(struct rcu_data *))
rcu_initiate_boost(rnp, flags); /* releases rnp->lock */
continue;
}
+
+   get_online_cpus_atomic();
cpu = rnp->grplo;
bit = 1;
for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
@@ -2114,6 +2116,8 @@ static void force_qs_rnp(struct rcu_state *rsp, int 
(*f)(struct rcu_data *))
f(per_cpu_ptr(rsp->rda, cpu)))
mask |= bit;
}
+   put_online_cpus_atomic();
+
if (mask != 0) {
 
/* rcu_report_qs_rnp() releases rnp->lock. */



[PATCH v2 17/45] time/clocksource: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: John Stultz 
Cc: Thomas Gleixner 
Signed-off-by: Srivatsa S. Bhat 
---

 kernel/time/clocksource.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index e713ef7..c4bbc25 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -30,6 +30,7 @@
 #include  /* for spin_unlock_irq() using preempt_count() m68k */
 #include 
 #include 
+#include 
 
 #include "tick-internal.h"
 
@@ -252,6 +253,7 @@ static void clocksource_watchdog(unsigned long data)
int64_t wd_nsec, cs_nsec;
int next_cpu, reset_pending;
 
+   get_online_cpus_atomic();
spin_lock(&watchdog_lock);
if (!watchdog_running)
goto out;
@@ -329,6 +331,7 @@ static void clocksource_watchdog(unsigned long data)
add_timer_on(&watchdog_timer, next_cpu);
 out:
spin_unlock(&watchdog_lock);
+   put_online_cpus_atomic();
 }
 
 static inline void clocksource_start_watchdog(void)
@@ -367,6 +370,7 @@ static void clocksource_enqueue_watchdog(struct clocksource 
*cs)
 {
unsigned long flags;
 
+   get_online_cpus_atomic();
spin_lock_irqsave(&watchdog_lock, flags);
if (cs->flags & CLOCK_SOURCE_MUST_VERIFY) {
/* cs is a clocksource to be watched. */
@@ -386,6 +390,7 @@ static void clocksource_enqueue_watchdog(struct clocksource 
*cs)
/* Check if the watchdog timer needs to be started. */
clocksource_start_watchdog();
spin_unlock_irqrestore(&watchdog_lock, flags);
+   put_online_cpus_atomic();
 }
 
 static void clocksource_dequeue_watchdog(struct clocksource *cs)



[PATCH v2 18/45] softirq: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.
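
The shape of the fix below recurs in later patches (block, fcoe): the
cpu_online() test and the cross-CPU operation must sit inside a single
atomic hotplug-read section, and every exit path needs its own put.
Schematically (send_work_to() is a made-up helper):

	get_online_cpus_atomic();
	if (cpu_online(cpu)) {
		send_work_to(cpu);	/* cpu cannot go offline here */
		put_online_cpus_atomic();
		return 0;
	}
	put_online_cpus_atomic();
	return 1;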

Cc: Frederic Weisbecker 
Cc: Thomas Gleixner 
Cc: Andrew Morton 
Cc: Sedat Dilek 
Cc: "Paul E. McKenney" 
Signed-off-by: Srivatsa S. Bhat 
---

 kernel/softirq.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 3d6833f..c289722 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -644,14 +644,17 @@ static void remote_softirq_receive(void *data)
 
 static int __try_remote_softirq(struct call_single_data *cp, int cpu, int 
softirq)
 {
+   get_online_cpus_atomic();
if (cpu_online(cpu)) {
cp->func = remote_softirq_receive;
cp->info = &softirq;
cp->flags = 0;
 
__smp_call_function_single(cpu, cp, 0);
+   put_online_cpus_atomic();
return 0;
}
+   put_online_cpus_atomic();
return 1;
 }
 #else /* CONFIG_USE_GENERIC_SMP_HELPERS */



[PATCH v2 19/45] irq: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Thomas Gleixner 
Signed-off-by: Srivatsa S. Bhat 
---

 kernel/irq/manage.c |7 +++
 kernel/irq/proc.c   |3 +++
 2 files changed, 10 insertions(+)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index e16caa8..4d89f19 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internals.h"
 
@@ -202,9 +203,11 @@ int irq_set_affinity(unsigned int irq, const struct 
cpumask *mask)
if (!desc)
return -EINVAL;
 
+   get_online_cpus_atomic();
raw_spin_lock_irqsave(&desc->lock, flags);
ret =  __irq_set_affinity_locked(irq_desc_get_irq_data(desc), mask);
raw_spin_unlock_irqrestore(&desc->lock, flags);
+   put_online_cpus_atomic();
return ret;
 }
 
@@ -343,9 +346,11 @@ int irq_select_affinity_usr(unsigned int irq, struct 
cpumask *mask)
unsigned long flags;
int ret;
 
+   get_online_cpus_atomic();
raw_spin_lock_irqsave(&desc->lock, flags);
ret = setup_affinity(irq, desc, mask);
raw_spin_unlock_irqrestore(&desc->lock, flags);
+   put_online_cpus_atomic();
return ret;
 }
 
@@ -1128,7 +1133,9 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, 
struct irqaction *new)
}
 
/* Set default affinity mask once everything is setup */
+   get_online_cpus_atomic();
setup_affinity(irq, desc, mask);
+   put_online_cpus_atomic();
 
} else if (new->flags & IRQF_TRIGGER_MASK) {
unsigned int nmsk = new->flags & IRQF_TRIGGER_MASK;
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 19ed5c4..47f9a74 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -7,6 +7,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -441,6 +442,7 @@ int show_interrupts(struct seq_file *p, void *v)
if (!desc)
return 0;
 
+   get_online_cpus_atomic();
raw_spin_lock_irqsave(&desc->lock, flags);
for_each_online_cpu(j)
any_count |= kstat_irqs_cpu(i, j);
@@ -477,6 +479,7 @@ int show_interrupts(struct seq_file *p, void *v)
seq_putc(p, '\n');
 out:
raw_spin_unlock_irqrestore(&desc->lock, flags);
+   put_online_cpus_atomic();
return 0;
 }
 #endif



[PATCH v2 20/45] net: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: Alexander Duyck 
Cc: Cong Wang 
Cc: Ben Hutchings 
Cc: net...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 net/core/dev.c |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index fc1e289..90519e9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3141,7 +3141,7 @@ int netif_rx(struct sk_buff *skb)
struct rps_dev_flow voidflow, *rflow = &voidflow;
int cpu;
 
-   preempt_disable();
+   get_online_cpus_atomic();
rcu_read_lock();
 
cpu = get_rps_cpu(skb->dev, skb, &rflow);
@@ -3151,7 +3151,7 @@ int netif_rx(struct sk_buff *skb)
ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
 
rcu_read_unlock();
-   preempt_enable();
+   put_online_cpus_atomic();
} else
 #endif
{
@@ -3570,6 +3570,7 @@ int netif_receive_skb(struct sk_buff *skb)
struct rps_dev_flow voidflow, *rflow = &voidflow;
int cpu, ret;
 
+   get_online_cpus_atomic();
rcu_read_lock();
 
cpu = get_rps_cpu(skb->dev, skb, &rflow);
@@ -3577,9 +3578,11 @@ int netif_receive_skb(struct sk_buff *skb)
if (cpu >= 0) {
ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
rcu_read_unlock();
+   put_online_cpus_atomic();
return ret;
}
rcu_read_unlock();
+   put_online_cpus_atomic();
}
 #endif
return __netif_receive_skb(skb);
@@ -3957,6 +3960,7 @@ static void net_rps_action_and_irq_enable(struct 
softnet_data *sd)
local_irq_enable();
 
/* Send pending IPI's to kick RPS processing on remote cpus. */
+   get_online_cpus_atomic();
while (remsd) {
struct softnet_data *next = remsd->rps_ipi_next;
 
@@ -3965,6 +3969,7 @@ static void net_rps_action_and_irq_enable(struct 
softnet_data *sd)
   &remsd->csd, 0);
remsd = next;
}
+   put_online_cpus_atomic();
} else
 #endif
local_irq_enable();



[PATCH v2 21/45] block: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Jens Axboe 
Signed-off-by: Srivatsa S. Bhat 
---

 block/blk-softirq.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 467c8de..bbab3d3 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -58,6 +58,7 @@ static void trigger_softirq(void *data)
  */
 static int raise_blk_irq(int cpu, struct request *rq)
 {
+   get_online_cpus_atomic();
if (cpu_online(cpu)) {
struct call_single_data *data = &rq->csd;
 
@@ -66,8 +67,10 @@ static int raise_blk_irq(int cpu, struct request *rq)
data->flags = 0;
 
__smp_call_function_single(cpu, data, 0);
+   put_online_cpus_atomic();
return 0;
}
+   put_online_cpus_atomic();
 
return 1;
 }



[PATCH v2 22/45] percpu_counter: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Al Viro 
Cc: Tejun Heo 
Signed-off-by: Srivatsa S. Bhat 
---

 lib/percpu_counter.c |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index ba6085d..f5e718d 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -98,6 +98,15 @@ s64 __percpu_counter_sum(struct percpu_counter *fbc)
s64 ret;
int cpu;
 
+   /*
+* The calls to get/put_online_cpus_atomic() are not strictly
+* necessary, since CPU hotplug is explicitly handled via the
+* hotplug callback which synchronizes through fbc->lock.
+* But we add them here anyway to make it easier for the debug
+* code under CONFIG_DEBUG_HOTPLUG_CPU to validate the correctness
+* of hotplug synchronization.
+*/
+   get_online_cpus_atomic();
raw_spin_lock(&fbc->lock);
ret = fbc->count;
for_each_online_cpu(cpu) {
@@ -105,6 +114,7 @@ s64 __percpu_counter_sum(struct percpu_counter *fbc)
ret += *pcount;
}
raw_spin_unlock(&fbc->lock);
+   put_online_cpus_atomic();
return ret;
 }
 EXPORT_SYMBOL(__percpu_counter_sum);



[PATCH v2 23/45] infiniband: ehca: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Hoang-Nam Nguyen 
Cc: Christoph Raisch 
Cc: Roland Dreier 
Cc: Sean Hefty 
Cc: Hal Rosenstock 
Cc: linux-r...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 drivers/infiniband/hw/ehca/ehca_irq.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c 
b/drivers/infiniband/hw/ehca/ehca_irq.c
index 8615d7c..ace901e 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -43,6 +43,7 @@
 
 #include 
 #include 
+#include 
 
 #include "ehca_classes.h"
 #include "ehca_irq.h"
@@ -703,6 +704,7 @@ static void queue_comp_task(struct ehca_cq *__cq)
int cq_jobs;
unsigned long flags;
 
+   get_online_cpus_atomic();
cpu_id = find_next_online_cpu(pool);
BUG_ON(!cpu_online(cpu_id));
 
@@ -720,6 +722,7 @@ static void queue_comp_task(struct ehca_cq *__cq)
BUG_ON(!cct || !thread);
}
__queue_comp_task(__cq, cct, thread);
+   put_online_cpus_atomic();
 }
 
 static void run_comp_task(struct ehca_cpu_comp_task *cct)
@@ -759,6 +762,7 @@ static void comp_task_park(unsigned int cpu)
list_splice_init(&cct->cq_list, &list);
spin_unlock_irq(&cct->task_lock);
 
+   get_online_cpus_atomic();
cpu = find_next_online_cpu(pool);
target = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
thread = *per_cpu_ptr(pool->cpu_comp_threads, cpu);
@@ -768,6 +772,7 @@ static void comp_task_park(unsigned int cpu)
__queue_comp_task(cq, target, thread);
}
spin_unlock_irq(&target->task_lock);
+   put_online_cpus_atomic();
 }
 
 static void comp_task_stop(unsigned int cpu, bool online)



[PATCH v2 24/45] [SCSI] fcoe: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Robert Love 
Cc: "James E.J. Bottomley" 
Cc: de...@open-fcoe.org
Cc: linux-s...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 drivers/scsi/fcoe/fcoe.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index 292b24f..a107d3c 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -1484,6 +1484,7 @@ static int fcoe_rcv(struct sk_buff *skb, struct 
net_device *netdev,
 * was originated, otherwise select cpu using rx exchange id
 * or fcoe_select_cpu().
 */
+   get_online_cpus_atomic();
if (ntoh24(fh->fh_f_ctl) & FC_FC_EX_CTX)
cpu = ntohs(fh->fh_ox_id) & fc_cpu_mask;
else {
@@ -1493,8 +1494,10 @@ static int fcoe_rcv(struct sk_buff *skb, struct 
net_device *netdev,
cpu = ntohs(fh->fh_rx_id) & fc_cpu_mask;
}
 
-   if (cpu >= nr_cpu_ids)
+   if (cpu >= nr_cpu_ids) {
+   put_online_cpus_atomic();
goto err;
+   }
 
fps = &per_cpu(fcoe_percpu, cpu);
spin_lock(&fps->fcoe_rx_list.lock);
@@ -1514,6 +1517,7 @@ static int fcoe_rcv(struct sk_buff *skb, struct 
net_device *netdev,
spin_lock(&fps->fcoe_rx_list.lock);
if (!fps->thread) {
spin_unlock(&fps->fcoe_rx_list.lock);
+   put_online_cpus_atomic();
goto err;
}
}
@@ -1535,6 +1539,7 @@ static int fcoe_rcv(struct sk_buff *skb, struct 
net_device *netdev,
if (fps->thread->state == TASK_INTERRUPTIBLE)
wake_up_process(fps->thread);
spin_unlock(&fps->fcoe_rx_list.lock);
+   put_online_cpus_atomic();
 
return 0;
 err:



[PATCH v2 25/45] staging/octeon: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Greg Kroah-Hartman 
Cc: de...@driverdev.osuosl.org
Acked-by: David Daney 
Signed-off-by: Srivatsa S. Bhat 
---

 drivers/staging/octeon/ethernet-rx.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/staging/octeon/ethernet-rx.c 
b/drivers/staging/octeon/ethernet-rx.c
index 34afc16..8588b4d 100644
--- a/drivers/staging/octeon/ethernet-rx.c
+++ b/drivers/staging/octeon/ethernet-rx.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #ifdef CONFIG_XFRM
@@ -97,6 +98,7 @@ static void cvm_oct_enable_one_cpu(void)
return;
 
/* ... if a CPU is available, Turn on NAPI polling for that CPU.  */
+   get_online_cpus_atomic();
for_each_online_cpu(cpu) {
if (!cpu_test_and_set(cpu, core_state.cpu_state)) {
v = smp_call_function_single(cpu, cvm_oct_enable_napi,
@@ -106,6 +108,7 @@ static void cvm_oct_enable_one_cpu(void)
break;
}
}
+   put_online_cpus_atomic();
 }
 
 static void cvm_oct_no_more_work(void)



[PATCH v2 26/45] x86: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Konrad Rzeszutek Wilk 
Cc: Sebastian Andrzej Siewior 
Cc: Joerg Roedel 
Cc: Jan Beulich 
Cc: Joonsoo Kim 
Cc: linux-e...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/x86/kernel/apic/io_apic.c   |   21 ++---
 arch/x86/kernel/cpu/mcheck/therm_throt.c |4 ++--
 arch/x86/mm/tlb.c|   14 +++---
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 9ed796c..4c71c1e 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1169,9 +1170,11 @@ int assign_irq_vector(int irq, struct irq_cfg *cfg, 
const struct cpumask *mask)
int err;
unsigned long flags;
 
+   get_online_cpus_atomic();
raw_spin_lock_irqsave(&vector_lock, flags);
err = __assign_irq_vector(irq, cfg, mask);
raw_spin_unlock_irqrestore(&vector_lock, flags);
+   put_online_cpus_atomic();
return err;
 }
 
@@ -1757,13 +1760,13 @@ __apicdebuginit(void) print_local_APICs(int maxcpu)
if (!maxcpu)
return;
 
-   preempt_disable();
+   get_online_cpus_atomic();
for_each_online_cpu(cpu) {
if (cpu >= maxcpu)
break;
smp_call_function_single(cpu, print_local_APIC, NULL, 1);
}
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 __apicdebuginit(void) print_PIC(void)
@@ -2153,10 +2156,12 @@ static int ioapic_retrigger_irq(struct irq_data *data)
unsigned long flags;
int cpu;
 
+   get_online_cpus_atomic();
raw_spin_lock_irqsave(&vector_lock, flags);
cpu = cpumask_first_and(cfg->domain, cpu_online_mask);
apic->send_IPI_mask(cpumask_of(cpu), cfg->vector);
raw_spin_unlock_irqrestore(&vector_lock, flags);
+   put_online_cpus_atomic();
 
return 1;
 }
@@ -2175,6 +2180,7 @@ void send_cleanup_vector(struct irq_cfg *cfg)
 {
cpumask_var_t cleanup_mask;
 
+   get_online_cpus_atomic();
if (unlikely(!alloc_cpumask_var(&cleanup_mask, GFP_ATOMIC))) {
unsigned int i;
for_each_cpu_and(i, cfg->old_domain, cpu_online_mask)
@@ -2185,6 +2191,7 @@ void send_cleanup_vector(struct irq_cfg *cfg)
free_cpumask_var(cleanup_mask);
}
cfg->move_in_progress = 0;
+   put_online_cpus_atomic();
 }
 
 asmlinkage void smp_irq_move_cleanup_interrupt(void)
@@ -2939,11 +2946,13 @@ unsigned int __create_irqs(unsigned int from, unsigned 
int count, int node)
goto out_irqs;
}
 
+   get_online_cpus_atomic();
raw_spin_lock_irqsave(&vector_lock, flags);
for (i = 0; i < count; i++)
if (__assign_irq_vector(irq + i, cfg[i], apic->target_cpus()))
goto out_vecs;
raw_spin_unlock_irqrestore(&vector_lock, flags);
+   put_online_cpus_atomic();
 
for (i = 0; i < count; i++) {
irq_set_chip_data(irq + i, cfg[i]);
@@ -2957,6 +2966,7 @@ out_vecs:
for (i--; i >= 0; i--)
__clear_irq_vector(irq + i, cfg[i]);
raw_spin_unlock_irqrestore(&vector_lock, flags);
+   put_online_cpus_atomic();
 out_irqs:
for (i = 0; i < count; i++)
free_irq_at(irq + i, cfg[i]);
@@ -2994,9 +3004,11 @@ void destroy_irq(unsigned int irq)
 
free_remapped_irq(irq);
 
+   get_online_cpus_atomic();
raw_spin_lock_irqsave(&vector_lock, flags);
__clear_irq_vector(irq, cfg);
raw_spin_unlock_irqrestore(&vector_lock, flags);
+   put_online_cpus_atomic();
free_irq_at(irq, cfg);
 }
 
@@ -3365,8 +3377,11 @@ io_apic_setup_irq_pin(unsigned int irq, int node, struct 
io_apic_irq_attr *attr)
if (!cfg)
return -EINVAL;
ret = __add_pin_to_irq_node(cfg, node, attr->ioapic, attr->ioapic_pin);
-   if (!ret)
+   if (!ret) {
+   get_online_cpus_atomic();
setup_ioapic_irq(irq, cfg, attr);
+   put_online_cpus_atomic();
+   }
return ret;
 }
 
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c 
b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 2f3a799..3eea984 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -83,13 +83,13 @@ static ssize_t therm_throt_device_show_##event##_##name(
\
unsigned int cpu = dev->id;   

[PATCH v2 27/45] perf/x86: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
The CPU_DYING notifier modifies the per-cpu pointer pmu->box, and this can
race with functions such as uncore_pmu_to_box() and uncore_pci_remove() when
we remove stop_machine() from the CPU offline path. So protect them using
get/put_online_cpus_atomic().
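
A sketch of the race being closed, reconstructed from the description
above (not taken verbatim from the code):

	/*
	 *  reader: uncore_pmu_to_box()       CPU_DYING notifier
	 *  ---------------------------       ------------------
	 *  box = *per_cpu_ptr(pmu->box, cpu);
	 *                                    *per_cpu_ptr(pmu->box, cpu) = NULL;
	 *  use box->...                      (box may later be freed via
	 *                                     uncore_pci_remove())
	 *
	 * Holding get/put_online_cpus_atomic() around the readers prevents
	 * the CPU_DYING notifier from running concurrently, keeping the
	 * pointer stable for the duration of the access.
	 */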

Cc: Peter Zijlstra 
Cc: Paul Mackerras 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/x86/kernel/cpu/perf_event_intel_uncore.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c 
b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index 9dd9975..7c2a064 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -1,3 +1,4 @@
+#include 
 #include "perf_event_intel_uncore.h"
 
 static struct intel_uncore_type *empty_uncore[] = { NULL, };
@@ -2630,6 +2631,7 @@ uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
if (box)
return box;
 
+   get_online_cpus_atomic();
raw_spin_lock(&uncore_box_lock);
list_for_each_entry(box, &pmu->box_list, list) {
if (box->phys_id == topology_physical_package_id(cpu)) {
@@ -2639,6 +2641,7 @@ uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
}
}
raw_spin_unlock(&uncore_box_lock);
+   put_online_cpus_atomic();
 
return *per_cpu_ptr(pmu->box, cpu);
 }
@@ -3229,6 +3232,7 @@ static void uncore_pci_remove(struct pci_dev *pdev)
list_del(&box->list);
raw_spin_unlock(&uncore_box_lock);
 
+   get_online_cpus_atomic();
for_each_possible_cpu(cpu) {
if (*per_cpu_ptr(pmu->box, cpu) == box) {
*per_cpu_ptr(pmu->box, cpu) = NULL;
@@ -3237,6 +3241,8 @@ static void uncore_pci_remove(struct pci_dev *pdev)
}
 
WARN_ON_ONCE(atomic_read(&box->refcnt) != 1);
+   put_online_cpus_atomic();
+
kfree(box);
 }
 



[PATCH v2 28/45] KVM: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Gleb Natapov 
Cc: Paolo Bonzini 
Cc: k...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 virt/kvm/kvm_main.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 302681c..5bbfa30 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -174,7 +174,7 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned 
int req)
 
zalloc_cpumask_var(&cpus, GFP_ATOMIC);
 
-   me = get_cpu();
+   me = get_online_cpus_atomic();
kvm_for_each_vcpu(i, vcpu, kvm) {
kvm_make_request(req, vcpu);
cpu = vcpu->cpu;
@@ -192,7 +192,7 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned 
int req)
smp_call_function_many(cpus, ack_flush, NULL, 1);
else
called = false;
-   put_cpu();
+   put_online_cpus_atomic();
free_cpumask_var(cpus);
return called;
 }
@@ -1707,11 +1707,11 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
++vcpu->stat.halt_wakeup;
}
 
-   me = get_cpu();
+   me = get_online_cpus_atomic();
if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
if (kvm_arch_vcpu_should_kick(vcpu))
smp_send_reschedule(cpu);
-   put_cpu();
+   put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
 #endif /* !CONFIG_S390 */



[PATCH v2 29/45] kvm/vmx: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Gleb Natapov 
Cc: Paolo Bonzini 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/x86/kvm/vmx.c |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 260a919..4e1e966 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -7164,12 +7165,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm 
*kvm, unsigned int id)
if (!vmm_exclusive)
kvm_cpu_vmxoff();
 
-   cpu = get_cpu();
+   cpu = get_online_cpus_atomic();
vmx_vcpu_load(&vmx->vcpu, cpu);
vmx->vcpu.cpu = cpu;
err = vmx_vcpu_setup(vmx);
vmx_vcpu_put(&vmx->vcpu);
-   put_cpu();
+   put_online_cpus_atomic();
if (err)
goto free_vmcs;
if (vm_need_virtualize_apic_accesses(kvm)) {
@@ -7706,12 +7707,12 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool 
launch)
 
vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET);
 
-   cpu = get_cpu();
+   cpu = get_online_cpus_atomic();
vmx->loaded_vmcs = vmcs02;
vmx_vcpu_put(vcpu);
vmx_vcpu_load(vcpu, cpu);
vcpu->cpu = cpu;
-   put_cpu();
+   put_online_cpus_atomic();
 
vmx_segment_cache_clear(vmx);
 
@@ -8023,12 +8024,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
leave_guest_mode(vcpu);
prepare_vmcs12(vcpu, vmcs12);
 
-   cpu = get_cpu();
+   cpu = get_online_cpus_atomic();
vmx->loaded_vmcs = &vmx->vmcs01;
vmx_vcpu_put(vcpu);
vmx_vcpu_load(vcpu, cpu);
vcpu->cpu = cpu;
-   put_cpu();
+   put_online_cpus_atomic();
 
vmx_segment_cache_clear(vmx);
 



[PATCH v2 30/45] x86/xen: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while executing in atomic context.

Cc: Konrad Rzeszutek Wilk 
Cc: Jeremy Fitzhardinge 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: xen-de...@lists.xensource.com
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/x86/xen/mmu.c |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index fdc3ba2..3229c4f 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -39,6 +39,7 @@
  * Jeremy Fitzhardinge , XenSource Inc, 2007
  */
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1163,9 +1164,13 @@ static void xen_drop_mm_ref(struct mm_struct *mm)
  */
 static void xen_exit_mmap(struct mm_struct *mm)
 {
-   get_cpu();  /* make sure we don't move around */
+   /*
+* Make sure we don't move around, and also prevent CPUs from
+* going offline.
+*/
+   get_online_cpus_atomic();
xen_drop_mm_ref(mm);
-   put_cpu();
+   put_online_cpus_atomic();
 
spin_lock(&mm->page_table_lock);
 



[PATCH v2 31/45] alpha/smp: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while invoking them from atomic context.

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Thomas Gleixner 
Cc: linux-al...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/alpha/kernel/smp.c |   19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index 7b60834..e147268 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -497,7 +497,6 @@ smp_cpus_done(unsigned int max_cpus)
   ((bogosum + 2500) / (5000/HZ)) % 100);
 }
 
-
 void
 smp_percpu_timer_interrupt(struct pt_regs *regs)
 {
@@ -681,7 +680,7 @@ ipi_flush_tlb_mm(void *x)
 void
 flush_tlb_mm(struct mm_struct *mm)
 {
-   preempt_disable();
+   get_online_cpus_atomic();
 
if (mm == current->active_mm) {
flush_tlb_current(mm);
@@ -693,7 +692,7 @@ flush_tlb_mm(struct mm_struct *mm)
if (mm->context[cpu])
mm->context[cpu] = 0;
}
-   preempt_enable();
+   put_online_cpus_atomic();
return;
}
}
@@ -702,7 +701,7 @@ flush_tlb_mm(struct mm_struct *mm)
printk(KERN_CRIT "flush_tlb_mm: timed out\n");
}
 
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 EXPORT_SYMBOL(flush_tlb_mm);
 
@@ -730,7 +729,7 @@ flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
struct flush_tlb_page_struct data;
struct mm_struct *mm = vma->vm_mm;
 
-   preempt_disable();
+   get_online_cpus_atomic();
 
if (mm == current->active_mm) {
flush_tlb_current_page(mm, vma, addr);
@@ -742,7 +741,7 @@ flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
if (mm->context[cpu])
mm->context[cpu] = 0;
}
-   preempt_enable();
+   put_online_cpus_atomic();
return;
}
}
@@ -755,7 +754,7 @@ flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
printk(KERN_CRIT "flush_tlb_page: timed out\n");
}
 
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 EXPORT_SYMBOL(flush_tlb_page);
 
@@ -786,7 +785,7 @@ flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
if ((vma->vm_flags & VM_EXEC) == 0)
return;
 
-   preempt_disable();
+   get_online_cpus_atomic();
 
if (mm == current->active_mm) {
__load_new_mm_context(mm);
@@ -798,7 +797,7 @@ flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
if (mm->context[cpu])
mm->context[cpu] = 0;
}
-   preempt_enable();
+   put_online_cpus_atomic();
return;
}
}
@@ -807,5 +806,5 @@ flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
printk(KERN_CRIT "flush_icache_page: timed out\n");
}
 
-   preempt_enable();
+   put_online_cpus_atomic();
 }



[PATCH v2 32/45] blackfin/smp: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while invoking them from atomic context.

Cc: Mike Frysinger 
Cc: Bob Liu 
Cc: Steven Miao 
Cc: Thomas Gleixner 
Cc: uclinux-dist-de...@blackfin.uclinux.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/blackfin/mach-common/smp.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/blackfin/mach-common/smp.c b/arch/blackfin/mach-common/smp.c
index 1bc2ce6..11496cd 100644
--- a/arch/blackfin/mach-common/smp.c
+++ b/arch/blackfin/mach-common/smp.c
@@ -238,13 +238,13 @@ void smp_send_stop(void)
 {
cpumask_t callmap;
 
-   preempt_disable();
+   get_online_cpus_atomic();
cpumask_copy(&callmap, cpu_online_mask);
cpumask_clear_cpu(smp_processor_id(), &callmap);
if (!cpumask_empty(&callmap))
send_ipi(&callmap, BFIN_IPI_CPU_STOP);
 
-   preempt_enable();
+   put_online_cpus_atomic();
 
return;
 }



[PATCH v2 33/45] cris/smp: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while invoking them from atomic context.

Acked-by: Jesper Nilsson 
Cc: Mikael Starvik 
Cc: Thomas Gleixner 
Cc: linux-cris-ker...@axis.com
Signed-off-by: Srivatsa S. Bhat 
---

 arch/cris/arch-v32/kernel/smp.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/cris/arch-v32/kernel/smp.c b/arch/cris/arch-v32/kernel/smp.c
index cdd1202..b2d4612 100644
--- a/arch/cris/arch-v32/kernel/smp.c
+++ b/arch/cris/arch-v32/kernel/smp.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -222,6 +223,7 @@ void flush_tlb_common(struct mm_struct* mm, struct vm_area_struct* vma, unsigned long addr)
unsigned long flags;
cpumask_t cpu_mask;
 
+   get_online_cpus_atomic();
spin_lock_irqsave(&tlbstate_lock, flags);
cpu_mask = (mm == FLUSH_ALL ? cpu_all_mask : *mm_cpumask(mm));
cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
@@ -230,6 +232,7 @@ void flush_tlb_common(struct mm_struct* mm, struct vm_area_struct* vma, unsigned long addr)
flush_addr = addr;
send_ipi(IPI_FLUSH_TLB, 1, cpu_mask);
spin_unlock_irqrestore(&tlbstate_lock, flags);
+   put_online_cpus_atomic();
 }
 
 void flush_tlb_all(void)
@@ -319,10 +322,12 @@ int smp_call_function(void (*func)(void *info), void *info, int wait)
data.info = info;
data.wait = wait;
 
+   get_online_cpus_atomic();
spin_lock(&call_lock);
call_data = &data;
ret = send_ipi(IPI_CALL, wait, cpu_mask);
spin_unlock(&call_lock);
+   put_online_cpus_atomic();
 
return ret;
 }



[PATCH v2 34/45] hexagon/smp: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while invoking them from atomic context.

Cc: Richard Kuo 
Cc: Thomas Gleixner 
Cc: linux-hexa...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/hexagon/kernel/smp.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/hexagon/kernel/smp.c b/arch/hexagon/kernel/smp.c
index 0e364ca..30d4318 100644
--- a/arch/hexagon/kernel/smp.c
+++ b/arch/hexagon/kernel/smp.c
@@ -241,9 +241,12 @@ void smp_send_reschedule(int cpu)
 void smp_send_stop(void)
 {
struct cpumask targets;
+
+   get_online_cpus_atomic();
cpumask_copy(&targets, cpu_online_mask);
cpumask_clear_cpu(smp_processor_id(), &targets);
send_ipi(&targets, IPI_CPU_STOP);
+   put_online_cpus_atomic();
 }
 
 void arch_send_call_function_single_ipi(int cpu)



[PATCH v2 35/45] ia64: irq, perfmon: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while invoking them from atomic context.

Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Andrew Morton 
Cc: "Eric W. Biederman" 
Cc: Thomas Gleixner 
Cc: linux-i...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/ia64/kernel/irq_ia64.c |   15 +++
 arch/ia64/kernel/perfmon.c  |8 +++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/kernel/irq_ia64.c b/arch/ia64/kernel/irq_ia64.c
index 1034884..f58b162 100644
--- a/arch/ia64/kernel/irq_ia64.c
+++ b/arch/ia64/kernel/irq_ia64.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -160,9 +161,11 @@ int bind_irq_vector(int irq, int vector, cpumask_t domain)
unsigned long flags;
int ret;
 
+   get_online_cpus_atomic();
spin_lock_irqsave(&vector_lock, flags);
ret = __bind_irq_vector(irq, vector, domain);
spin_unlock_irqrestore(&vector_lock, flags);
+   put_online_cpus_atomic();
return ret;
 }
 
@@ -190,9 +193,11 @@ static void clear_irq_vector(int irq)
 {
unsigned long flags;
 
+   get_online_cpus_atomic();
spin_lock_irqsave(&vector_lock, flags);
__clear_irq_vector(irq);
spin_unlock_irqrestore(&vector_lock, flags);
+   put_online_cpus_atomic();
 }
 
 int
@@ -204,6 +209,7 @@ ia64_native_assign_irq_vector (int irq)
 
vector = -ENOSPC;
 
+   get_online_cpus_atomic();
spin_lock_irqsave(&vector_lock, flags);
for_each_online_cpu(cpu) {
domain = vector_allocation_domain(cpu);
@@ -218,6 +224,7 @@ ia64_native_assign_irq_vector (int irq)
BUG_ON(__bind_irq_vector(irq, vector, domain));
  out:
spin_unlock_irqrestore(&vector_lock, flags);
+   put_online_cpus_atomic();
return vector;
 }
 
@@ -302,9 +309,11 @@ int irq_prepare_move(int irq, int cpu)
unsigned long flags;
int ret;
 
+   get_online_cpus_atomic();
spin_lock_irqsave(&vector_lock, flags);
ret = __irq_prepare_move(irq, cpu);
spin_unlock_irqrestore(&vector_lock, flags);
+   put_online_cpus_atomic();
return ret;
 }
 
@@ -320,11 +329,13 @@ void irq_complete_move(unsigned irq)
if (unlikely(cpu_isset(smp_processor_id(), cfg->old_domain)))
return;
 
+   get_online_cpus_atomic();
cpumask_and(&cleanup_mask, &cfg->old_domain, cpu_online_mask);
cfg->move_cleanup_count = cpus_weight(cleanup_mask);
for_each_cpu_mask(i, cleanup_mask)
platform_send_ipi(i, IA64_IRQ_MOVE_VECTOR, IA64_IPI_DM_INT, 0);
cfg->move_in_progress = 0;
+   put_online_cpus_atomic();
 }
 
 static irqreturn_t smp_irq_move_cleanup_interrupt(int irq, void *dev_id)
@@ -393,10 +404,12 @@ void destroy_and_reserve_irq(unsigned int irq)
 
dynamic_irq_cleanup(irq);
 
+   get_online_cpus_atomic();
spin_lock_irqsave(&vector_lock, flags);
__clear_irq_vector(irq);
irq_status[irq] = IRQ_RSVD;
spin_unlock_irqrestore(&vector_lock, flags);
+   put_online_cpus_atomic();
 }
 
 /*
@@ -409,6 +422,7 @@ int create_irq(void)
cpumask_t domain = CPU_MASK_NONE;
 
irq = vector = -ENOSPC;
+   get_online_cpus_atomic();
spin_lock_irqsave(&vector_lock, flags);
for_each_online_cpu(cpu) {
domain = vector_allocation_domain(cpu);
@@ -424,6 +438,7 @@ int create_irq(void)
BUG_ON(__bind_irq_vector(irq, vector, domain));
  out:
spin_unlock_irqrestore(&vector_lock, flags);
+   put_online_cpus_atomic();
if (irq >= 0)
dynamic_irq_init(irq);
return irq;
diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 9ea25fc..16c8303 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -6476,9 +6476,12 @@ pfm_install_alt_pmu_interrupt(pfm_intr_handler_desc_t *hdl)
/* do the easy test first */
if (pfm_alt_intr_handler) return -EBUSY;
 
+   get_online_cpus_atomic();
+
/* one at a time in the install or remove, just fail the others */
if (!spin_trylock(&pfm_alt_install_check)) {
-   return -EBUSY;
+   ret = -EBUSY;
+   goto out;
}
 
/* reserve our session */
@@ -6498,6 +6501,7 @@ pfm_install_alt_pmu_interrupt(pfm_intr_handler_desc_t *hdl)
pfm_alt_intr_handler = hdl;
 
spin_unlock(&pfm_alt_install_check);
+   put_online_cpus_atomic();
 
return 0;
 
@@ -6510,6 +6514,8 @@ cleanup_reserve:
}
 
spin_unlock(&pfm_alt_install_check);
+out:
+   put_online_cpus_atomic();
 
return ret;
 }


[PATCH v2 36/45] ia64: smp, tlb: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while invoking them from atomic context.

Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: linux-i...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/ia64/kernel/smp.c |   12 ++--
 arch/ia64/mm/tlb.c |4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/ia64/kernel/smp.c b/arch/ia64/kernel/smp.c
index 9fcd4e6..25991ba 100644
--- a/arch/ia64/kernel/smp.c
+++ b/arch/ia64/kernel/smp.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -259,8 +260,7 @@ smp_flush_tlb_cpumask(cpumask_t xcpumask)
cpumask_t cpumask = xcpumask;
int mycpu, cpu, flush_mycpu = 0;
 
-   preempt_disable();
-   mycpu = smp_processor_id();
+   mycpu = get_online_cpus_atomic();
 
for_each_cpu_mask(cpu, cpumask)
counts[cpu] = local_tlb_flush_counts[cpu].count & 0xffff;
@@ -280,7 +280,7 @@ smp_flush_tlb_cpumask(cpumask_t xcpumask)
while(counts[cpu] == (local_tlb_flush_counts[cpu].count & 0xffff))
udelay(FLUSH_DELAY);
 
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 void
@@ -293,12 +293,12 @@ void
 smp_flush_tlb_mm (struct mm_struct *mm)
 {
cpumask_var_t cpus;
-   preempt_disable();
+   get_online_cpus_atomic();
/* this happens for the common case of a single-threaded fork():  */
if (likely(mm == current->active_mm && atomic_read(&mm->mm_users) == 1))
{
local_finish_flush_tlb_mm(mm);
-   preempt_enable();
+   put_online_cpus_atomic();
return;
}
if (!alloc_cpumask_var(&cpus, GFP_ATOMIC)) {
@@ -313,7 +313,7 @@ smp_flush_tlb_mm (struct mm_struct *mm)
local_irq_disable();
local_finish_flush_tlb_mm(mm);
local_irq_enable();
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 void arch_send_call_function_single_ipi(int cpu)
diff --git a/arch/ia64/mm/tlb.c b/arch/ia64/mm/tlb.c
index ed61297..8c55ef5 100644
--- a/arch/ia64/mm/tlb.c
+++ b/arch/ia64/mm/tlb.c
@@ -87,11 +87,11 @@ wrap_mmu_context (struct mm_struct *mm)
 * can't call flush_tlb_all() here because of race condition
 * with O(1) scheduler [EF]
 */
-   cpu = get_cpu(); /* prevent preemption/migration */
+   cpu = get_online_cpus_atomic(); /* prevent preemption/migration */
for_each_online_cpu(i)
if (i != cpu)
per_cpu(ia64_need_tlb_flush, i) = 1;
-   put_cpu();
+   put_online_cpus_atomic();
local_flush_tlb_all();
 }
 



[PATCH v2 37/45] m32r: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while invoking them from atomic context.

Cc: Hirokazu Takata 
Cc: linux-m...@ml.linux-m32r.org
Cc: linux-m32r...@ml.linux-m32r.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/m32r/kernel/smp.c |   16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/m32r/kernel/smp.c b/arch/m32r/kernel/smp.c
index ce7aea3..ffafdba 100644
--- a/arch/m32r/kernel/smp.c
+++ b/arch/m32r/kernel/smp.c
@@ -151,7 +151,7 @@ void smp_flush_cache_all(void)
cpumask_t cpumask;
unsigned long *mask;
 
-   preempt_disable();
+   get_online_cpus_atomic();
cpumask_copy(&cpumask, cpu_online_mask);
cpumask_clear_cpu(smp_processor_id(), &cpumask);
spin_lock(&flushcache_lock);
@@ -162,7 +162,7 @@ void smp_flush_cache_all(void)
while (flushcache_cpumask)
mb();
spin_unlock(&flushcache_lock);
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 void smp_flush_cache_all_interrupt(void)
@@ -197,12 +197,12 @@ void smp_flush_tlb_all(void)
 {
unsigned long flags;
 
-   preempt_disable();
+   get_online_cpus_atomic();
local_irq_save(flags);
__flush_tlb_all();
local_irq_restore(flags);
smp_call_function(flush_tlb_all_ipi, NULL, 1);
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 /*==*
@@ -250,7 +250,7 @@ void smp_flush_tlb_mm(struct mm_struct *mm)
unsigned long *mmc;
unsigned long flags;
 
-   preempt_disable();
+   get_online_cpus_atomic();
cpu_id = smp_processor_id();
mmc = &mm->context[cpu_id];
cpumask_copy(&cpu_mask, mm_cpumask(mm));
@@ -268,7 +268,7 @@ void smp_flush_tlb_mm(struct mm_struct *mm)
if (!cpumask_empty(&cpu_mask))
flush_tlb_others(cpu_mask, mm, NULL, FLUSH_ALL);
 
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 /*==*
@@ -320,7 +322,7 @@ void smp_flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
unsigned long *mmc;
unsigned long flags;
 
-   preempt_disable();
+   get_online_cpus_atomic();
cpu_id = smp_processor_id();
mmc = &mm->context[cpu_id];
cpumask_copy(&cpu_mask, mm_cpumask(mm));
@@ -341,7 +341,7 @@ void smp_flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
if (!cpumask_empty(&cpu_mask))
flush_tlb_others(cpu_mask, mm, vma, va);
 
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 /*==*



[PATCH v2 38/45] MIPS: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while invoking them from atomic context.

Cc: Ralf Baechle 
Cc: David Daney 
Cc: Yong Zhang 
Cc: Thomas Gleixner 
Cc: Sanjay Lal 
Cc: "Steven J. Hill" 
Cc: John Crispin 
Cc: Florian Fainelli 
Cc: linux-m...@linux-mips.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/mips/kernel/cevt-smtc.c |7 +++
 arch/mips/kernel/smp.c   |   16 
 arch/mips/kernel/smtc.c  |   12 
 arch/mips/mm/c-octeon.c  |4 ++--
 4 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/arch/mips/kernel/cevt-smtc.c b/arch/mips/kernel/cevt-smtc.c
index 9de5ed7..2e6c0cd 100644
--- a/arch/mips/kernel/cevt-smtc.c
+++ b/arch/mips/kernel/cevt-smtc.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -84,6 +85,8 @@ static int mips_next_event(unsigned long delta, struct clock_event_device *evt)
unsigned long nextcomp = 0L;
int vpe = current_cpu_data.vpe_id;
int cpu = smp_processor_id();
+
+   get_online_cpus_atomic();
local_irq_save(flags);
mtflags = dmt();
 
@@ -164,6 +167,7 @@ static int mips_next_event(unsigned long delta, struct clock_event_device *evt)
}
emt(mtflags);
local_irq_restore(flags);
+   put_online_cpus_atomic();
return 0;
 }
 
@@ -177,6 +181,7 @@ void smtc_distribute_timer(int vpe)
unsigned long nextstamp;
unsigned long reference;
 
+   get_online_cpus_atomic();
 
 repeat:
nextstamp = 0L;
@@ -229,6 +234,8 @@ repeat:
> (unsigned long)LONG_MAX)
goto repeat;
}
+
+   put_online_cpus_atomic();
 }
 
 
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 6e7862a..be152b6 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -250,12 +250,12 @@ static inline void smp_on_other_tlbs(void (*func) (void *info), void *info)
 
 static inline void smp_on_each_tlb(void (*func) (void *info), void *info)
 {
-   preempt_disable();
+   get_online_cpus_atomic();
 
smp_on_other_tlbs(func, info);
func(info);
 
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 /*
@@ -273,7 +273,7 @@ static inline void smp_on_each_tlb(void (*func) (void *info), void *info)
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-   preempt_disable();
+   get_online_cpus_atomic();
 
if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
smp_on_other_tlbs(flush_tlb_mm_ipi, mm);
@@ -287,7 +287,7 @@ void flush_tlb_mm(struct mm_struct *mm)
}
local_flush_tlb_mm(mm);
 
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 struct flush_tlb_data {
@@ -307,7 +307,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end)
 {
struct mm_struct *mm = vma->vm_mm;
 
-   preempt_disable();
+   get_online_cpus_atomic();
if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
struct flush_tlb_data fd = {
.vma = vma,
@@ -325,7 +325,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end)
}
}
local_flush_tlb_range(vma, start, end);
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 static void flush_tlb_kernel_range_ipi(void *info)
@@ -354,7 +354,7 @@ static void flush_tlb_page_ipi(void *info)
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 {
-   preempt_disable();
+   get_online_cpus_atomic();
if ((atomic_read(&vma->vm_mm->mm_users) != 1) || (current->mm != vma->vm_mm)) {
struct flush_tlb_data fd = {
.vma = vma,
@@ -371,7 +371,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
}
}
local_flush_tlb_page(vma, page);
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 static void flush_tlb_one_ipi(void *info)
diff --git a/arch/mips/kernel/smtc.c b/arch/mips/kernel/smtc.c
index 75a4fd7..3cda8eb 100644
--- a/arch/mips/kernel/smtc.c
+++ b/arch/mips/kernel/smtc.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1143,6 +1144,8 @@ static irqreturn_t ipi_interrupt(int irq, void *dev_idm)
 * for the current TC, so we ought not to have to do it explicitly here.
 */
 
+   get_online_cpus_atomic();
+
for_each_online_cpu(cpu) {
if (cpu_data[cpu].vpe_id != my_vpe)
continue;
@@ -1180,6 +1183,8 @@ static irqreturn_t ipi_interrupt(int irq, void *dev_idm)
}
}
 
+   put_online_cpus_atomic();
+
return IRQ_HANDLED;
 }
 
@@ -1383,6 +1388,7 @@ void smtc_get_new_mmu_context(struct mm_struct *mm, int cpu)

[PATCH v2 39/45] mn10300: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while invoking them from atomic context.

Cc: David Howells 
Cc: Koichi Yasutake 
Cc: linux-am33-l...@redhat.com
Signed-off-by: Srivatsa S. Bhat 
---

 arch/mn10300/mm/cache-smp.c |3 +++
 arch/mn10300/mm/tlb-smp.c   |   17 +
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/mn10300/mm/cache-smp.c b/arch/mn10300/mm/cache-smp.c
index 2d23b9e..406357d 100644
--- a/arch/mn10300/mm/cache-smp.c
+++ b/arch/mn10300/mm/cache-smp.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -91,6 +92,7 @@ void smp_cache_interrupt(void)
 void smp_cache_call(unsigned long opr_mask,
unsigned long start, unsigned long end)
 {
+   get_online_cpus_atomic();
smp_cache_mask = opr_mask;
smp_cache_start = start;
smp_cache_end = end;
@@ -102,4 +104,5 @@ void smp_cache_call(unsigned long opr_mask,
while (!cpumask_empty(&smp_cache_ipi_map))
/* nothing. lockup detection does not belong here */
mb();
+   put_online_cpus_atomic();
 }
diff --git a/arch/mn10300/mm/tlb-smp.c b/arch/mn10300/mm/tlb-smp.c
index 3e57faf..8856fd3 100644
--- a/arch/mn10300/mm/tlb-smp.c
+++ b/arch/mn10300/mm/tlb-smp.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -61,7 +62,7 @@ void smp_flush_tlb(void *unused)
 {
unsigned long cpu_id;
 
-   cpu_id = get_cpu();
+   cpu_id = get_online_cpus_atomic();
 
if (!cpumask_test_cpu(cpu_id, &flush_cpumask))
/* This was a BUG() but until someone can quote me the line
@@ -82,7 +83,7 @@ void smp_flush_tlb(void *unused)
cpumask_clear_cpu(cpu_id, &flush_cpumask);
smp_mb__after_clear_bit();
 out:
-   put_cpu();
+   put_online_cpus_atomic();
 }
 
 /**
@@ -144,7 +145,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 {
cpumask_t cpu_mask;
 
-   preempt_disable();
+   get_online_cpus_atomic();
cpumask_copy(&cpu_mask, mm_cpumask(mm));
cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
 
@@ -152,7 +153,7 @@ void flush_tlb_mm(struct mm_struct *mm)
if (!cpumask_empty(&cpu_mask))
flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
 
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 /**
@@ -163,7 +164,7 @@ void flush_tlb_current_task(void)
struct mm_struct *mm = current->mm;
cpumask_t cpu_mask;
 
-   preempt_disable();
+   get_online_cpus_atomic();
cpumask_copy(&cpu_mask, mm_cpumask(mm));
cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
 
@@ -171,7 +172,7 @@ void flush_tlb_current_task(void)
if (!cpumask_empty(&cpu_mask))
flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
 
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 /**
@@ -184,7 +185,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
struct mm_struct *mm = vma->vm_mm;
cpumask_t cpu_mask;
 
-   preempt_disable();
+   get_online_cpus_atomic();
cpumask_copy(&cpu_mask, mm_cpumask(mm));
cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
 
@@ -192,7 +193,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
if (!cpumask_empty(&cpu_mask))
flush_tlb_others(cpu_mask, mm, va);
 
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 /**



[PATCH v2 40/45] powerpc, irq: Use GFP_ATOMIC allocations in atomic context

2013-06-25 Thread Srivatsa S. Bhat
The function migrate_irqs() is called with interrupts disabled
and hence it's not safe to do GFP_KERNEL allocations inside it,
because they can sleep. So change the gfp mask to GFP_ATOMIC.
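
As a rule of thumb (an illustrative sketch, not part of the patch): any
allocation attempted while interrupts are disabled must use a
non-sleeping gfp mask:

	static void atomic_ctx_alloc_example(void)
	{
		unsigned long flags;
		void *buf;

		local_irq_save(flags);
		buf = kmalloc(64, GFP_ATOMIC);	/* never sleeps; may return NULL */
		/* kmalloc(64, GFP_KERNEL) here could sleep with IRQs off -- a bug */
		local_irq_restore(flags);
		kfree(buf);
	}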

Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Ian Munsie 
Cc: Steven Rostedt 
Cc: Michael Ellerman 
Cc: Li Zhong 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/powerpc/kernel/irq.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index ea185e0..ca39bac 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -412,7 +412,7 @@ void migrate_irqs(void)
cpumask_var_t mask;
const struct cpumask *map = cpu_online_mask;
 
-   alloc_cpumask_var(&mask, GFP_KERNEL);
+   alloc_cpumask_var(&mask, GFP_ATOMIC);
 
for_each_irq_desc(irq, desc) {
struct irq_data *data;



[PATCH v2 41/45] powerpc: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while invoking them from atomic context.

Cc: Benjamin Herrenschmidt 
Cc: Gleb Natapov 
Cc: Alexander Graf 
Cc: Rob Herring 
Cc: Grant Likely 
Cc: Kumar Gala 
Cc: Zhao Chenhui 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: k...@vger.kernel.org
Cc: kvm-...@vger.kernel.org
Cc: oprofile-l...@lists.sf.net
Cc: cbe-oss-...@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/powerpc/kernel/irq.c  |7 ++-
 arch/powerpc/kernel/machine_kexec_64.c |4 ++--
 arch/powerpc/kernel/smp.c  |2 ++
 arch/powerpc/kvm/book3s_hv.c   |5 +++--
 arch/powerpc/mm/mmu_context_nohash.c   |3 +++
 arch/powerpc/oprofile/cell/spu_profiler.c  |3 +++
 arch/powerpc/oprofile/cell/spu_task_sync.c |4 
 arch/powerpc/oprofile/op_model_cell.c  |6 ++
 8 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index ca39bac..41e9961 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -410,7 +411,10 @@ void migrate_irqs(void)
unsigned int irq;
static int warned;
cpumask_var_t mask;
-   const struct cpumask *map = cpu_online_mask;
+   const struct cpumask *map;
+
+   get_online_cpus_atomic();
+   map = cpu_online_mask;
 
alloc_cpumask_var(&mask, GFP_ATOMIC);
 
@@ -436,6 +440,7 @@ void migrate_irqs(void)
}
 
free_cpumask_var(mask);
+   put_online_cpus_atomic();
 
local_irq_enable();
mdelay(1);
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 611acdf..38f6d75 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -187,7 +187,7 @@ static void kexec_prepare_cpus_wait(int wait_state)
int my_cpu, i, notified=-1;
 
hw_breakpoint_disable();
-   my_cpu = get_cpu();
+   my_cpu = get_online_cpus_atomic();
/* Make sure each CPU has at least made it to the state we need.
 *
 * FIXME: There is a (slim) chance of a problem if not all of the CPUs
@@ -266,7 +266,7 @@ static void kexec_prepare_cpus(void)
 */
kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
 
-   put_cpu();
+   put_online_cpus_atomic();
 }
 
 #else /* ! SMP */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ee7ac5e..2123bec 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -277,9 +277,11 @@ void smp_send_debugger_break(void)
if (unlikely(!smp_ops))
return;
 
+   get_online_cpus_atomic();
for_each_online_cpu(cpu)
if (cpu != me)
do_message_pass(cpu, PPC_MSG_DEBUGGER_BREAK);
+   put_online_cpus_atomic();
 }
 #endif
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2efa9dd..9d8a973 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -78,7 +79,7 @@ void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
++vcpu->stat.halt_wakeup;
}
 
-   me = get_cpu();
+   me = get_online_cpus_atomic();
 
/* CPU points to the first thread of the core */
if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
@@ -88,7 +89,7 @@ void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
else if (cpu_online(cpu))
smp_send_reschedule(cpu);
}
-   put_cpu();
+   put_online_cpus_atomic();
 }
 
 /*
diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index e779642..c7bdcb4 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -194,6 +194,8 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
unsigned int i, id, cpu = smp_processor_id();
unsigned long *map;
 
+   get_online_cpus_atomic();
+
/* No lockless fast path .. yet */
raw_spin_lock(&context_lock);
 
@@ -280,6 +282,7 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
pr_hardcont(" -> %d\n", id);
set_context(id, next->pgd);
raw_spin_unlock(&context_lock);
+   put_online_cpus_atomic();
 }
 
 /*
diff --git a/arch/powerpc/oprofile/cell/spu_profiler.c b/arch/powerpc/oprofile/cell/spu_profiler.c
index b129d00..ab6e6c1 100644
--- a/arch/powerpc/oprofile/cell/spu_profiler.c
+++ b/arch/powerpc/oprofile/cell/spu_profiler.c
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #inc

[PATCH v2 42/45] powerpc: Use get/put_online_cpus_atomic() to avoid false-positive warning

2013-06-25 Thread Srivatsa S. Bhat
Bringing a secondary CPU online is a special case in which accessing
the cpu_online_mask is safe, even though the task doing so (which is
running on the CPU coming online) is not the hotplug writer.

It is a little hard to teach this to the debugging checks under
CONFIG_DEBUG_HOTPLUG_CPU. But luckily powerpc is one of the few places
where the CPU coming online traverses the cpu_online_mask before fully
coming online. So wrap that part under get/put_online_cpus_atomic(), to
avoid false-positive warnings from the CPU hotplug debug code.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Kumar Gala 
Cc: Zhao Chenhui 
Cc: Thomas Gleixner 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/powerpc/kernel/smp.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 2123bec..59c9a09 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -657,6 +657,7 @@ __cpuinit void start_secondary(void *unused)
cpumask_set_cpu(base + i, cpu_core_mask(cpu));
}
l2_cache = cpu_to_l2cache(cpu);
+   get_online_cpus_atomic();
for_each_online_cpu(i) {
struct device_node *np = cpu_to_l2cache(i);
if (!np)
@@ -667,6 +668,7 @@ __cpuinit void start_secondary(void *unused)
}
of_node_put(np);
}
+   put_online_cpus_atomic();
of_node_put(l2_cache);
 
local_irq_enable();



[PATCH v2 43/45] sh: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while invoking them from atomic context.

Cc: Paul Mundt 
Cc: Thomas Gleixner 
Cc: linux...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/sh/kernel/smp.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 4569645..42ec182 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -357,7 +357,7 @@ static void flush_tlb_mm_ipi(void *mm)
  */
 void flush_tlb_mm(struct mm_struct *mm)
 {
-   preempt_disable();
+   get_online_cpus_atomic();
 
if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
smp_call_function(flush_tlb_mm_ipi, (void *)mm, 1);
@@ -369,7 +369,7 @@ void flush_tlb_mm(struct mm_struct *mm)
}
local_flush_tlb_mm(mm);
 
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 struct flush_tlb_data {
@@ -390,7 +390,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end)
 {
struct mm_struct *mm = vma->vm_mm;
 
-   preempt_disable();
+   get_online_cpus_atomic();
if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
struct flush_tlb_data fd;
 
@@ -405,7 +405,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end)
cpu_context(i, mm) = 0;
}
local_flush_tlb_range(vma, start, end);
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 static void flush_tlb_kernel_range_ipi(void *info)
@@ -433,7 +433,7 @@ static void flush_tlb_page_ipi(void *info)
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 {
-   preempt_disable();
+   get_online_cpus_atomic();
if ((atomic_read(&vma->vm_mm->mm_users) != 1) ||
(current->mm != vma->vm_mm)) {
struct flush_tlb_data fd;
@@ -448,7 +448,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
cpu_context(i, vma->vm_mm) = 0;
}
local_flush_tlb_page(vma, page);
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
 static void flush_tlb_one_ipi(void *info)



[PATCH v2 44/45] sparc: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while invoking them from atomic context.

Cc: "David S. Miller" 
Cc: Sam Ravnborg 
Cc: Thomas Gleixner 
Cc: Dave Kleikamp 
Cc: sparcli...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat 
---

 arch/sparc/kernel/smp_64.c |   12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 77539ed..4f71a95 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -792,7 +792,9 @@ static void smp_cross_call_masked(unsigned long *func, u32 ctx, u64 data1, u64 data2, const cpumask_t *mask)
 /* Send cross call to all processors except self. */
 static void smp_cross_call(unsigned long *func, u32 ctx, u64 data1, u64 data2)
 {
+   get_online_cpus_atomic();
smp_cross_call_masked(func, ctx, data1, data2, cpu_online_mask);
+   put_online_cpus_atomic();
 }
 
 extern unsigned long xcall_sync_tick;
@@ -896,7 +898,7 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
atomic_inc(&dcpage_flushes);
 #endif
 
-   this_cpu = get_cpu();
+   this_cpu = get_online_cpus_atomic();
 
if (cpu == this_cpu) {
__local_flush_dcache_page(page);
@@ -922,7 +924,7 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
}
}
 
-   put_cpu();
+   put_online_cpus_atomic();
 }
 
 void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
@@ -933,7 +935,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
if (tlb_type == hypervisor)
return;
 
-   preempt_disable();
+   get_online_cpus_atomic();
 
 #ifdef CONFIG_DEBUG_DCFLUSH
atomic_inc(&dcpage_flushes);
@@ -958,7 +960,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
}
__local_flush_dcache_page(page);
 
-   preempt_enable();
+   put_online_cpus_atomic();
 }
 
void __irq_entry smp_new_mmu_context_version_client(int irq, struct pt_regs *regs)
@@ -1150,6 +1152,7 @@ void smp_capture(void)
 {
int result = atomic_add_ret(1, &smp_capture_depth);
 
+   get_online_cpus_atomic();
if (result == 1) {
int ncpus = num_online_cpus();
 
@@ -1166,6 +1169,7 @@ void smp_capture(void)
printk("done\n");
 #endif
}
+   put_online_cpus_atomic();
 }
 
 void smp_release(void)



Re: [PATCH v2 25/45] staging/octeon: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Greg Kroah-Hartman
On Wed, Jun 26, 2013 at 02:00:04AM +0530, Srivatsa S. Bhat wrote:
> Once stop_machine() is gone from the CPU offline path, we won't be able
> to depend on disabling preemption to prevent CPUs from going offline
> from under us.
> 
> Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
> offline while invoking them from atomic context.
> 
> Cc: Greg Kroah-Hartman 
> Cc: de...@driverdev.osuosl.org
> Acked-by: David Daney 
> Signed-off-by: Srivatsa S. Bhat 

Acked-by: Greg Kroah-Hartman 


[PATCH v2 45/45] tile: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while invoking them from atomic context.

Cc: Chris Metcalf 
Signed-off-by: Srivatsa S. Bhat 
---

 arch/tile/kernel/module.c |3 +++
 arch/tile/kernel/tlb.c|   15 +++
 arch/tile/mm/homecache.c  |3 +++
 3 files changed, 21 insertions(+)

diff --git a/arch/tile/kernel/module.c b/arch/tile/kernel/module.c
index 4918d91..db7d858 100644
--- a/arch/tile/kernel/module.c
+++ b/arch/tile/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -79,8 +80,10 @@ void module_free(struct module *mod, void *module_region)
vfree(module_region);
 
/* Globally flush the L1 icache. */
+   get_online_cpus_atomic();
flush_remote(0, HV_FLUSH_EVICT_L1I, cpu_online_mask,
 0, 0, 0, NULL, NULL, 0);
+   put_online_cpus_atomic();
 
/*
 * FIXME: If module_region == mod->module_init, trim exception
diff --git a/arch/tile/kernel/tlb.c b/arch/tile/kernel/tlb.c
index 3fd54d5..a32b9dd 100644
--- a/arch/tile/kernel/tlb.c
+++ b/arch/tile/kernel/tlb.c
@@ -14,6 +14,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -35,6 +36,8 @@ void flush_tlb_mm(struct mm_struct *mm)
 {
HV_Remote_ASID asids[NR_CPUS];
int i = 0, cpu;
+
+   get_online_cpus_atomic();
for_each_cpu(cpu, mm_cpumask(mm)) {
HV_Remote_ASID *asid = &asids[i++];
asid->y = cpu / smp_topology.width;
@@ -43,6 +46,7 @@ void flush_tlb_mm(struct mm_struct *mm)
}
flush_remote(0, HV_FLUSH_EVICT_L1I, mm_cpumask(mm),
 0, 0, 0, NULL, asids, i);
+   put_online_cpus_atomic();
 }
 
 void flush_tlb_current_task(void)
@@ -55,8 +59,11 @@ void flush_tlb_page_mm(struct vm_area_struct *vma, struct mm_struct *mm,
 {
unsigned long size = vma_kernel_pagesize(vma);
int cache = (vma->vm_flags & VM_EXEC) ? HV_FLUSH_EVICT_L1I : 0;
+
+   get_online_cpus_atomic();
flush_remote(0, cache, mm_cpumask(mm),
 va, size, size, mm_cpumask(mm), NULL, 0);
+   put_online_cpus_atomic();
 }
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
@@ -71,13 +78,18 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end)
unsigned long size = vma_kernel_pagesize(vma);
struct mm_struct *mm = vma->vm_mm;
int cache = (vma->vm_flags & VM_EXEC) ? HV_FLUSH_EVICT_L1I : 0;
+
+   get_online_cpus_atomic();
flush_remote(0, cache, mm_cpumask(mm), start, end - start, size,
 mm_cpumask(mm), NULL, 0);
+   put_online_cpus_atomic();
 }
 
 void flush_tlb_all(void)
 {
int i;
+
+   get_online_cpus_atomic();
for (i = 0; ; ++i) {
HV_VirtAddrRange r = hv_inquire_virtual(i);
if (r.size == 0)
@@ -89,10 +101,13 @@ void flush_tlb_all(void)
 r.start, r.size, HPAGE_SIZE, cpu_online_mask,
 NULL, 0);
}
+   put_online_cpus_atomic();
 }
 
 void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
+   get_online_cpus_atomic();
flush_remote(0, HV_FLUSH_EVICT_L1I, cpu_online_mask,
 start, end - start, PAGE_SIZE, cpu_online_mask, NULL, 0);
+   put_online_cpus_atomic();
 }
diff --git a/arch/tile/mm/homecache.c b/arch/tile/mm/homecache.c
index 1ae9119..7ff5bf0 100644
--- a/arch/tile/mm/homecache.c
+++ b/arch/tile/mm/homecache.c
@@ -397,9 +397,12 @@ void homecache_change_page_home(struct page *page, int order, int home)
BUG_ON(page_count(page) > 1);
BUG_ON(page_mapcount(page) != 0);
kva = (unsigned long) page_address(page);
+
+   get_online_cpus_atomic();
flush_remote(0, HV_FLUSH_EVICT_L2, &cpu_cacheable_map,
 kva, pages * PAGE_SIZE, PAGE_SIZE, cpu_online_mask,
 NULL, 0);
+   put_online_cpus_atomic();
 
for (i = 0; i < pages; ++i, kva += PAGE_SIZE) {
pte_t *ptep = virt_to_pte(NULL, kva);



Re: mpc85xx_edac.c: Should the mpc85xx_l2_isr be shared irqs?

2013-06-25 Thread Scott Wood
On Wed, Jul 18, 2012 at 05:00:29PM +0800, Xufeng Zhang wrote:
> Hi All,
> 
> I detected the below error when booting p1021mds after enabling the EDAC feature:
> EDAC MC: Ver: 2.1.0 Jul 17 2012
> Freescale(R) MPC85xx EDAC driver, (C) 2006 Montavista Software
> EDAC MC0: Giving out device to 'MPC85xx_edac' 'mpc85xx_mc_err': DEV mpc85xx_mc_err
> IRQ 45/[EDAC] MC err: IRQF_DISABLED is not guaranteed on shared IRQs
> MPC85xx_edac acquired irq 45 for MC
> MPC85xx_edac MC err registered
> EDAC DEVICE0: Giving out device to module 'MPC85xx_edac' controller 'mpc85xx_l2_err': DEV 'mpc85xx_l2_err' (INTERRUPT)
> mpc85xx_l2_err_probe: Unable to requiest irq 45 for MPC85xx L2 err
> 
> Then kernel hang.
> 
> When requesting the irq for the l2-cache, since it shares the same irq
> with the memory controller, I think the code should be:
>   printk(KERN_ERR
> 
> Thanks,
> Xufeng Zhang
> 
> 
> --- a/drivers/edac/mpc85xx_edac.c
> +++ b/drivers/edac/mpc85xx_edac.c
> @@ -577,7 +577,7 @@ static int __devinit mpc85xx_l2_err_probe(struct
> of_device *op,
>   if (edac_op_state == EDAC_OPSTATE_INT) {
>   pdata->irq = irq_of_parse_and_map(op->node, 0);
>   res = devm_request_irq(&op->dev, pdata->irq,
> -mpc85xx_l2_isr, IRQF_DISABLED,
> +mpc85xx_l2_isr, IRQF_DISABLED | 
> IRQF_SHARED,
>  "[EDAC] L2 err", edac_dev);
>   if (res < 0) {

Sorry for the delayed response...  That "IRQF_DISABLED is not guaranteed"
message was removed in v2.6.35 (along with the rest of IRQF_DISABLED
support) which was almost two years old even back when your e-mail was
sent.  Even back then, as far as I can tell your patch would be
introducing, not fixing, that message.

It does look like this interrupt should be shared, though.

The deprecated IRQF_DISABLED should be removed.
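
Something like the following, roughly (an untested sketch against the
driver as quoted above; since the line is shared, mpc85xx_l2_isr must
also return IRQ_NONE when the L2 error-detect register shows no event):

	res = devm_request_irq(&op->dev, pdata->irq,
			       mpc85xx_l2_isr, IRQF_SHARED,
			       "[EDAC] L2 err", edac_dev);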

-Scott



[PATCH] Fix string emulation for 32-bit process on ppc64

2013-06-25 Thread James Yang
String instruction emulation would erroneously result in a segfault if
the upper bits of the EA are set, making the address so high that it
fails the access check.  Truncate the EA to 32 bits if the process is
32-bit.
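
(An illustration of the failure mode, not from the patch, with
hypothetical register indices ra/rb: in 32-bit mode the hardware
ignores the upper word of a computed effective address, so the
emulation must mask it before doing its own access check.)

	unsigned long ea = regs->gpr[ra] + regs->gpr[rb];	/* 64-bit add */
	if (!(regs->msr & MSR_64BIT))
		ea &= 0xffffffffUL;	/* match 32-bit mode wraparound */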

Signed-off-by: James Yang 
---
 arch/powerpc/kernel/traps.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index dce1bea..c72e7e9 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -840,6 +840,10 @@ static int emulate_string_inst(struct pt_regs *regs, u32 instword)
u8 val;
u32 shift = 8 * (3 - (pos & 0x3));
 
+   /* if process is 32-bit, clear upper 32 bits of EA */
+   if ((regs->msr & MSR_64BIT) == 0)
+   EA &= 0xffffffff;
+
switch ((instword & PPC_INST_STRING_MASK)) {
case PPC_INST_LSWX:
case PPC_INST_LSWI:
-- 
1.7.0.4




Re: [PATCH v2 2/2] perf tools: Make Power7 events available for perf

2013-06-25 Thread Vince Weaver
On Tue, 25 Jun 2013, Runzhen Wang wrote:

> This patch makes all the POWER7 events available in sysfs.
> 
> ...
>
> $ size arch/powerpc/perf/power7-pmu.o
>text  data bss dec hex filename
>3073  2720   0579316a1 arch/powerpc/perf/power7-pmu.o
> 
> and after the patch is applied, it is:
> 
> $ size arch/powerpc/perf/power7-pmu.o
>text  data bss dec hex filename
>   15950 31112   0   47062b7d6 arch/powerpc/perf/power7-pmu.o

So if I'm reading this right, there's 45k of overhead for just one cpu 
type?

What happens if we do this on x86?

If we have similar for p6/p4/core2/nehalem/ivb/snb/amd10h/amd15h/amd16h/knb
that's 450k of event definitions in the kernel.  And may I remind everyone 
that you can't compile perf_event support as a module, nor can you 
unconfigure it on x86 (it's always built in, no option to disable).

I'd like to repeat my unpopular position that we just link perf against 
libpfm4 and keep event tables in userspace where they belong.

Vince



Re: [PATCH v2 15/45] rcu: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-25 Thread Paul E. McKenney
On Wed, Jun 26, 2013 at 01:57:55AM +0530, Srivatsa S. Bhat wrote:
> Once stop_machine() is gone from the CPU offline path, we won't be able
> to depend on disabling preemption to prevent CPUs from going offline
> from under us.
> 
> In RCU code, rcu_implicit_dynticks_qs() checks if a CPU is offline,
> while being protected by a spinlock. Use the get/put_online_cpus_atomic()
> APIs to prevent CPUs from going offline while invoking them from atomic context.

I am not completely sure that this is needed.  Here is a (quite possibly
flawed) argument for its not being needed:

o   rcu_gp_init() holds off CPU-hotplug operations during
grace-period initialization.  Therefore, RCU will avoid
looking for quiescent states from CPUs that were offline
(and thus in an extended quiescent state) at the beginning
of the grace period.

o   If force_qs_rnp() is looking for a quiescent state from
a given CPU, and if it senses that CPU as being offline,
then even without synchronization we know that the CPU
was offline some time during the current grace period.

After all, it was online at the beginning of the grace
period (otherwise, we would not be looking at it at all),
and our later sampling of its state must have therefore
happened after the start of the grace period.  Given that
the grace period has not yet ended, it also has to have happened
before the end of the grace period.

o   Therefore, we should be able to sample the offline state
without synchronization.

Possible flaws in this argument:  memory ordering, oddnesses in
the sampling and updates of the cpumask recording which CPUs are
online, and so on.

Thoughts?

Thanx, Paul

> Cc: Dipankar Sarma 
> Cc: "Paul E. McKenney" 
> Signed-off-by: Srivatsa S. Bhat 
> ---
> 
>  kernel/rcutree.c |4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index cf3adc6..caeed1a 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -2107,6 +2107,8 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
>   rcu_initiate_boost(rnp, flags); /* releases rnp->lock */
>   continue;
>   }
> +
> + get_online_cpus_atomic();
>   cpu = rnp->grplo;
>   bit = 1;
>   for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
> @@ -2114,6 +2116,8 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
>   f(per_cpu_ptr(rsp->rda, cpu)))
>   mask |= bit;
>   }
> + put_online_cpus_atomic();
> +
>   if (mask != 0) {
> 
>   /* rcu_report_qs_rnp() releases rnp->lock. */
> 



Re: [v2] Enhanced support for MPC8xx/8xxx watchdog

2013-06-25 Thread Scott Wood
On Thu, Feb 28, 2013 at 09:52:22AM +0100, LEROY Christophe wrote:
> This patch modifies the behaviour of the MPC8xx/8xxx watchdog. On the MPC8xx,
> at 133MHz, the maximum timeout of the watchdog timer is 1s, which means it
> must be pinged twice a second. This is not in line with the Linux watchdog
> concept, which is based on a default watchdog timeout around 60s.
> This patch introduces an intermediate layer between the CPU and the userspace.
> The kernel pings the watchdog at the required frequency on the condition that
> userspace tools refresh it regularly.
> Existing parameter 'timeout' is renamed 'hw_timo'.
> The new parameter 'timeout' allows setting up the userspace timeout.
> The driver also implements the WDIOC_SETTIMEOUT ioctl.
> 
> Signed-off-by: Christophe Leroy 
> 
> 
> diff -ur linux-3.7.9/drivers/watchdog/mpc8xxx_wdt.c linux/drivers/watchdog/mpc8xxx_wdt.c
> --- linux-3.7.9/drivers/watchdog/mpc8xxx_wdt.c  2013-02-17 19:53:32.0 +0100
> +++ linux/drivers/watchdog/mpc8xxx_wdt.c        2013-02-27 16:00:07.0 +0100
> @@ -52,10 +52,17 @@
>  static struct mpc8xxx_wdt __iomem *wd_base;
>  static int mpc8xxx_wdt_init_late(void);
>  
> -static u16 timeout = 0x;
> -module_param(timeout, ushort, 0);
> +#define WD_TIMO 10   /* Default timeout = 10 seconds */

If the default Linux watchdog timeout is normally 60 seconds, why is it 10
here?

> +static uint timeout = WD_TIMO;
> +module_param(timeout, uint, 0);
>  MODULE_PARM_DESC(timeout,
> - "Watchdog timeout in ticks. (0 + "Watchdog SW timeout in seconds. (0 < timeout < 65536s, default = "
> + __MODULE_STRING(WD_TIMO) "s)");
> +static u16 hw_timo = 0x;
> +module_param(hw_timo, ushort, 0);
> +MODULE_PARM_DESC(hw_timo,
> + "Watchdog HW timeout in ticks. (0 < hw_timo < 65536, default = 65535)");

hw_timeout would be more legibile -- this is a public interface.

>  static bool reset = 1;
>  module_param(reset, bool, 0);
> @@ -72,10 +79,12 @@
>   * to 0
>   */
>  static int prescale = 1;
> -static unsigned int timeout_sec;
> +static unsigned int hw_timo_sec;
>  
> +static int wdt_auto = 1;

bool, and add a comment indicating what this means.

>  static unsigned long wdt_is_open;
>  static DEFINE_SPINLOCK(wdt_spinlock);
> +static unsigned long wdt_last_ping;
>  
>  static void mpc8xxx_wdt_keepalive(void)
>  {
> @@ -91,9 +100,20 @@
>  
>  static void mpc8xxx_wdt_timer_ping(unsigned long arg)
>  {
> - mpc8xxx_wdt_keepalive();
> - /* We're pinging it twice faster than needed, just to be sure. */
> - mod_timer(&wdt_timer, jiffies + HZ * timeout_sec / 2);
> + if (wdt_auto)
> + wdt_last_ping = jiffies;
> +
> + if (jiffies - wdt_last_ping <= timeout * HZ) {

So timeout cannot be more than UINT_MAX / HZ...  Might want to check for
that, just in case.

What happens if there's a race?  If another CPU updates wdt_last_ping in
parallel, then you could see wdt_last_ping greater than the value you
read for jiffies.  Since this is an unsigned comparison, it will fail to
call keepalive.  You might get saved by pinging it twice as often as
necessary, but you shouldn't rely on that.
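
The usual way out is the jiffies helpers, which compare in signed
arithmetic and so tolerate both wraparound and a racing update of
wdt_last_ping (a rough sketch of the same test):

	if (time_before(jiffies, wdt_last_ping + timeout * HZ)) {
		mpc8xxx_wdt_keepalive();
		mod_timer(&wdt_timer, jiffies + HZ * hw_timo_sec / 2);
	}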

> + mpc8xxx_wdt_keepalive();
> + /* We're pinging it twice faster than needed, to be sure. */
> + mod_timer(&wdt_timer, jiffies + HZ * hw_timo_sec / 2);
> + }
> +}
> +
> +static void mpc8xxx_wdt_sw_keepalive(void)
> +{
> + wdt_last_ping = jiffies;
> + mpc8xxx_wdt_timer_ping(0);
>  }

This isn't new with this patch, but it looks like
mpc8xxx_wdt_keepalive() can be called either from timer context, or with
interrupts enabled... yet it uses a bare spin_lock() rather than an
irq-safe version.  This should be fixed.
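
I.e. something along these lines (a sketch only; the two magic writes
are the driver's existing ping sequence):

	static void mpc8xxx_wdt_keepalive(void)
	{
		unsigned long flags;

		/* irqsave form is safe from timer, ioctl and irq contexts alike */
		spin_lock_irqsave(&wdt_spinlock, flags);
		out_be32(&wd_base->swsrr, 0x556c);
		out_be32(&wd_base->swsrr, 0xaa39);
		spin_unlock_irqrestore(&wdt_spinlock, flags);
	}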

-Scott



Re: powerpc/85xx: Move ePAPR paravirt initialization earlier

2013-06-25 Thread Scott Wood
On Tue, Mar 05, 2013 at 05:52:36PM +0200, Laurentiu TUDOR wrote:
> From: Tudor Laurentiu 
> 
> The ePAPR para-virtualization needs to happen very early
> otherwise the bytechannel based console will silently
> drop some of the early boot messages.
> 
> Before this patch, this is how the kernel log started:
> -
>  > Brought up 2 CPUs
>  > devtmpfs: initialized
>  > NET: Registered protocol family 16
>  [...]
> -
> 
> After the patch the early messages show up:
> -
>  > Using P5020 DS machine description
>  > MMU: Supported page sizes
>  >  4 KB as direct
>  >   4096 KB as direct
>  [...]
> -
> 
> At console init, the kernel tried to flush the log buffer.
> Since the paravirt was not yet initialized, the console write
> function failed silently, thus losing the buffered messages.
[snip]
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 6da881b..ce092ac 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -66,6 +66,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "setup.h"
>  
> @@ -599,6 +600,8 @@ void __init setup_arch(char **cmdline_p)
>   /* Initialize the MMU context management stuff */
>   mmu_context_init();
>  
> + epapr_paravirt_init();
> +
>   kvm_linear_init();
>  
>   /* Interrupt code needs to be 64K-aligned */

Is this early enough?  There's udbg activity before this.  Maybe it
should even go before udbg_early_init...  This would require converting
the code to use the early device tree functions.
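
A hypothetical early variant would scan the flattened device tree
directly, along these lines (names are illustrative; the property
follows the ePAPR hcall-instructions binding):

	static int __init early_init_dt_scan_epapr(unsigned long node,
						   const char *uname,
						   int depth, void *data)
	{
		const u32 *insts;
		unsigned long len;

		insts = of_get_flat_dt_prop(node, "hcall-instructions", &len);
		if (!insts)
			return 0;
		/* ...patch in the hcall instructions here... */
		return 1;	/* found it; stop scanning */
	}

	void __init epapr_paravirt_early_init(void)
	{
		of_scan_flat_dt(early_init_dt_scan_epapr, NULL);
	}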

-Scott



Re: BUG: no PCI/PCIe devices found in 85xx architecture

2013-06-25 Thread Scott Wood

On 06/25/2013 01:40:14 PM, Stefani Seibold wrote:

Hi,

there is a bug in kernel 3.9 with the new fsl_pci platform driver. The
pcibios_init in pci_32.c will be called before the platform driver probe
will be invoked.

The call order for a p2020 board with linux 3.9 is currently:

fsl_pci_init
pcibios_init
fsl_pci_probe
fsl_pci_probe
fsl_pci_probe

Therefore the PCI/PCIe bridge will be added after the PCI/PCIe buses
were scanned for devices. So no PCI/PCIe devices are available.

Everything works fine by reverting the fsl_pci.[ch] to the version in
linux 3.4, because the PCI/PCIe bridges will be added in
the ..._setup_arch() function, before the pcibios_init function is
called.

Any solution for this issue?


I can't reproduce this on p3041 -- pcibios_init gets called after  
fsl_pci_probe, and its PCIe e1000 gets detected and used.


fsl_pci_probe should be called when of_platform_bus_probe is called,  
which is in a machine_arch_initcall.  pcibios_init is a  
subsys_initcall, which should happen later.
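
For reference, initcall levels enforce that ordering (a self-contained
sketch; function names are illustrative, levels per include/linux/init.h):

	static int __init probe_platform_buses(void)	/* level 3 */
	{
		pr_info("arch_initcall: fsl_pci_probe would run from here\n");
		return 0;
	}
	arch_initcall(probe_platform_buses);

	static int __init scan_pci_buses(void)		/* level 4: runs later */
	{
		pr_info("subsys_initcall: pcibios_init runs at this level\n");
		return 0;
	}
	subsys_initcall(scan_pci_buses);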


Which p2020 board are you using?  Could you check when it is calling  
of_platform_bus_probe?


-Scott


Re: [PATCH 1/6] powerpc/eeh: Don't collect PCI-CFG data on PHB

2013-06-25 Thread Gavin Shan
On Tue, Jun 25, 2013 at 09:55:15PM +1000, Benjamin Herrenschmidt wrote:
>On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote:
>> +   /*
>> +* When the PHB is fenced or dead, it's pointless to collect
>> +* the data from PCI config space because it should return
>> +* 0xFF's. For ER, we still retrieve the data from the PCI
>> +* config space.
>> +*/
>> +   if (eeh_probe_mode_dev() &&
>> +   (pe->type & EEH_PE_PHB) &&
>> +   (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)))
>> +   valid_cfg_log = false;
>> +
>
>I'm still unsure about that one. EEH_PE_ISOLATED could be the result
>of a normal ER of PE#0 (which can happen for various reasons other
>than a fence) in which case the config space is available and
>interesting.
>

It's something like the following. For ER on PE#0, we will have
a PE of type EEH_PE_BUS marked as isolated, instead of the
one with EEH_PE_PHB.


[ EEH_PE_PHB ] <---> [ EEH_PE_PHB ] <---> [ EEH_PE_PHB ]
       |
[ EEH_PE_BUS ] PE#0
       |
       +--------------------+
       |                    |
[ EEH_PE_BUS ] PE#1   [ EEH_PE_BUS ] PE#2

>I would either not bother and collect the FF's, or make this specific
>to fence and only fence.
>

I'd like to keep it specific to fenced PHBs, and it already is :-)

Thanks,
Gavin



Re: [PATCH 1/6] powerpc/eeh: Don't collect PCI-CFG data on PHB

2013-06-25 Thread Gavin Shan
On Tue, Jun 25, 2013 at 09:56:06PM +1000, Benjamin Herrenschmidt wrote:
>On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote:
>> +		pci_regs_buf[0] = 0;
>> +		eeh_pe_for_each_dev(pe, edev) {
>> +			loglen += eeh_gather_pci_data(edev, pci_regs_buf,
>> +						      EEH_PCI_REGS_LOG_LEN);
>> +		}
>> +	}
>
>Unless I'm mistaken, this is buggy and will overwrite the content of
>pci_regs_buf for every device (they will all write over the same
>portion of the log).
>

No, you're not mistaken; you're right. I'm going to fix it in the next
revision :-)
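
A minimal sketch of the fix (assuming eeh_gather_pci_data() returns the
number of bytes it wrote): advance the buffer and shrink the remaining
length, so each device appends to the log instead of writing from
offset 0:

	pci_regs_buf[0] = 0;
	eeh_pe_for_each_dev(pe, edev) {
		loglen += eeh_gather_pci_data(edev, pci_regs_buf + loglen,
					      EEH_PCI_REGS_LOG_LEN - loglen);
	}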

Thanks,
Gavin



Re: [PATCH 2/6] powerpc/eeh: Check PCIe link after reset

2013-06-25 Thread Gavin Shan
On Tue, Jun 25, 2013 at 09:58:40PM +1000, Benjamin Herrenschmidt wrote:
>On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote:
>> After a reset (e.g. complete reset) to bring the fenced PHB
>> back, the PCIe link might not be ready yet. The patch intends to
>> make sure the PCIe link is ready before accessing its subordinate
>> PCI devices. The patch also fixes the wrong values restored to the
>> PCI_COMMAND register for PCI bridges.
>
>This should also help if we end up doing a full reset for ER cases,
>right?
>
>I.e., in a setup with PHB -> device (no switch), if the device driver
>requests a fundamental reset, we should do a PERST at the PHB level
>(are we?) and thus restore things in a similar way.
>

Yes, you're right. We will do PERST if the device driver requests a
fundamental reset on the root port. Otherwise, no matter which
type of reset (fundamental or hot) the device driver requests,
the upstream bridge gets a hot reset. Anyway, the piece of code
(checking the PCIe link) possibly applies to the ER case as well.
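
In other words, roughly (hypothetical helper names, only restating the
rule above):

	/* Sketch only: which reset the upstream side performs. */
	if (is_root_port(upstream) && option == EEH_RESET_FUNDAMENTAL)
		assert_perst(phb);	/* fundamental reset via PERST */
	else
		hot_reset(upstream);	/* hot reset in every other case */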

Thanks,
Gavin

>> Signed-off-by: Gavin Shan 
>> ---
>>  arch/powerpc/kernel/eeh_pe.c |  157 ++
>>  1 files changed, 144 insertions(+), 13 deletions(-)
>> 
>> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>> index 55943fc..016588a 100644
>> --- a/arch/powerpc/kernel/eeh_pe.c
>> +++ b/arch/powerpc/kernel/eeh_pe.c
>> @@ -22,6 +22,7 @@
>>   * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
>>   */
>>  
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -567,30 +568,132 @@ void eeh_pe_state_clear(struct eeh_pe *pe, int state)
>>  eeh_pe_traverse(pe, __eeh_pe_state_clear, &state);
>>  }
>>  
>> -/**
>> - * eeh_restore_one_device_bars - Restore the Base Address Registers for one device
>> - * @data: EEH device
>> - * @flag: Unused
>> +/*
>> + * Some PCI bridges (e.g. PLX bridges) have primary/secondary
>> + * buses assigned explicitly by firmware, and we probably have
>> + * lost that after reset. So we have to delay the check until
>> + * the PCI-CFG registers have been restored for the parent
>> + * bridge.
>>   *
>> - * Loads the PCI configuration space base address registers,
>> - * the expansion ROM base address, the latency timer, and etc.
>> - * from the saved values in the device node.
>> + * Don't use the normal PCI-CFG accessors, which have probably
>> + * been blocked on the normal path during this stage. So we need
>> + * to utilize the eeh operations, which are always permitted.
>>   */
>> -static void *eeh_restore_one_device_bars(void *data, void *flag)
>> +static void eeh_bridge_check_link(struct pci_dev *pdev,
>> +				  struct device_node *dn)
>> +{
>> +	int cap;
>> +	uint32_t val;
>> +	int timeout = 0;
>> +
>> +	/*
>> +	 * We only check the root port and the downstream ports of
>> +	 * PCIe switches
>> +	 */
>> +	if (!pci_is_pcie(pdev) ||
>> +	    (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT &&
>> +	     pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM))
>> +		return;
>> +
>> +	pr_debug("%s: Check PCIe link for %s ...\n",
>> +		 __func__, pci_name(pdev));
>> +
>> +	/* Check slot status */
>> +	cap = pdev->pcie_cap;
>> +	eeh_ops->read_config(dn, cap + PCI_EXP_SLTSTA, 2, &val);
>> +	if (!(val & PCI_EXP_SLTSTA_PDS)) {
>> +		pr_debug("  No card in the slot (0x%04x)!\n", val);
>> +		return;
>> +	}
>> +
>> +	/* Check power status if we have the capability */
>> +	eeh_ops->read_config(dn, cap + PCI_EXP_SLTCAP, 2, &val);
>> +	if (val & PCI_EXP_SLTCAP_PCP) {
>> +		eeh_ops->read_config(dn, cap + PCI_EXP_SLTCTL, 2, &val);
>> +		if (val & PCI_EXP_SLTCTL_PCC) {
>> +			pr_debug("  In power-off state, power it on ...\n");
>> +			val &= ~(PCI_EXP_SLTCTL_PCC | PCI_EXP_SLTCTL_PIC);
>> +			val |= (0x0100 & PCI_EXP_SLTCTL_PIC);
>> +			eeh_ops->write_config(dn, cap + PCI_EXP_SLTCTL, 2, val);
>> +			msleep(2 * 1000);
>> +		}
>> +	}
>> +
>> +	/* Enable the link */
>> +	eeh_ops->read_config(dn, cap + PCI_EXP_LNKCTL, 2, &val);
>> +	val &= ~PCI_EXP_LNKCTL_LD;
>> +	eeh_ops->write_config(dn, cap + PCI_EXP_LNKCTL, 2, val);
>> +
>> +	/* Check the link */
>> +	eeh_ops->read_config(dn, cap + PCI_EXP_LNKCAP, 4, &val);
>> +	if (!(val & PCI_EXP_LNKCAP_DLLLARC)) {
>> +		pr_debug("  No link reporting capability (0x%08x)\n", val);
>> +		msleep(1000);
>> +		return;
>> +	}
>> +
>> +	/* Wait for the link to come up, with a 5s timeout */
>> +	timeout = 0;
>> +	while (timeout < 5000) {
>> +		msleep(20);
>> +		timeout += 20;
>> +
>> +		eeh_ops->read_config(dn, cap + PCI_EXP_LNKSTA, 2, &val);
>> +		if (val & PCI_EXP_LNKSTA_DLLLA)
>> +			break;
>> +	}
>> +
>> +	if (val & PCI_EXP_LNKSTA_DLLLA)
>> + 

Re: [PATCH 1/6] powerpc/eeh: Don't collect PCI-CFG data on PHB

2013-06-25 Thread Benjamin Herrenschmidt
On Wed, 2013-06-26 at 07:49 +0800, Gavin Shan wrote:
> It's something like the following. For ER on PE#0, we will have
> a PE of type EEH_PE_BUS marked as isolated, instead of the
> one with EEH_PE_PHB.
> 
> [ EEH_PE_PHB ] <---> [ EEH_PE_PHB ] <---> [ EEH_PE_PHB ]
>        |
> [ EEH_PE_BUS ] PE#0
>        |

So we actually have two PEs here? One real (PE#0) and one imaginary
(PHB PE) with no PE# associated?

>        +--------------------+
>        |                    |
> [ EEH_PE_BUS ] PE#1   [ EEH_PE_BUS ] PE#2
> 
> >I would either not bother and collect the FF's, or make this specific
> >to fence and only fence.
> >
> 
> I'd like to keep it specific to fenced PHBs, and it already
> is :-)

Cheers,
Ben.




Re: [PATCH 1/6] powerpc/eeh: Don't collect PCI-CFG data on PHB

2013-06-25 Thread Gavin Shan
On Wed, Jun 26, 2013 at 09:57:26AM +1000, Benjamin Herrenschmidt wrote:
>On Wed, 2013-06-26 at 07:49 +0800, Gavin Shan wrote:
>> It's something like the following. For ER on PE#0, we will have
>> a PE of type EEH_PE_BUS marked as isolated, instead of the
>> one with EEH_PE_PHB.
>> 
>> [ EEH_PE_PHB ] <---> [ EEH_PE_PHB ] <---> [ EEH_PE_PHB ]
>>        |
>> [ EEH_PE_BUS ] PE#0
>>        |
>
>So we actually have two PEs here? One real (PE#0) and one imaginary
>(PHB PE) with no PE# associated?
>

Yes, the PHB PE is actually a container for all PEs under the
PHB ;-)
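
For reference, the containment is just the PE tree; an abbreviated (and
partly assumed) view of struct eeh_pe:

	struct eeh_pe {
		int type;			/* EEH_PE_PHB, EEH_PE_BUS, ... */
		int state;			/* EEH_PE_ISOLATED, ... */
		int addr;			/* PE address; unused for the PHB PE */
		struct pci_controller *phb;	/* associated PHB */
		struct eeh_pe *parent;		/* parent PE */
		struct list_head child_list;	/* children of this PE */
		struct list_head child;		/* link in parent's child_list */
	};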

>>        +--------------------+
>>        |                    |
>> [ EEH_PE_BUS ] PE#1   [ EEH_PE_BUS ] PE#2
>> 
>> >I would either not bother and collect the FF's, or make this specific
>> >to fence and only fence.
>> >
>> 
>> I'd like to keep it specific to fenced PHBs, and it already
>> is :-)

Thanks,
Gavin



[PATCH 5/6] powerpc/eeh: Refactor the output message

2013-06-25 Thread Gavin Shan
We don't need the whole backtrace, only a one-line message, in
the error reporting interrupt handler. For errors triggered by
accessing PCI config space or MMIO, we replace "WARN(1, ...)" with
pr_err() and dump_stack().

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh.c                 |    9 +++--
 arch/powerpc/platforms/powernv/eeh-ioda.c |   25 -
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 2dd0bd1..f7f2775 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -324,7 +324,9 @@ static int eeh_phb_check_failure(struct eeh_pe *pe)
 	eeh_serialize_unlock(flags);
 	eeh_send_failure_event(phb_pe);
 
-	WARN(1, "EEH: PHB failure detected\n");
+	pr_err("EEH: PHB#%x failure detected\n",
+	       phb_pe->phb->global_number);
+	dump_stack();
 
 	return 1;
 out:
@@ -453,7 +455,10 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	 * a stack trace will help the device-driver authors figure
 	 * out what happened.  So print that out.
 	 */
-	WARN(1, "EEH: failure detected\n");
+	pr_err("EEH: Frozen PE#%x detected on PHB#%x\n",
+	       pe->addr, pe->phb->global_number);
+	dump_stack();
+
 	return 1;
 
 dn_unlock:
diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 85025d7..0cd1c4a 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -853,11 +853,14 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 				phb->eeh_state |= PNV_EEH_STATE_REMOVED;
 			}
 
-			WARN(1, "EEH: dead IOC detected\n");
+			pr_err("EEH: dead IOC detected\n");
 			ret = 4;
 			goto out;
-		} else if (severity == OPAL_EEH_SEV_INF)
+		} else if (severity == OPAL_EEH_SEV_INF) {
+			pr_info("EEH: IOC informative error "
+				"detected\n");
 			ioda_eeh_hub_diag(hose);
+		}
 
 		break;
 	case OPAL_EEH_PHB_ERROR:
@@ -865,8 +868,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 			if (ioda_eeh_get_phb_pe(hose, pe))
 				break;
 
-			WARN(1, "EEH: dead PHB#%x detected\n",
-			     hose->global_number);
+			pr_err("EEH: dead PHB#%x detected\n",
+			       hose->global_number);
 			phb->eeh_state |= PNV_EEH_STATE_REMOVED;
 			ret = 3;
 			goto out;
@@ -874,20 +877,24 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 			if (ioda_eeh_get_phb_pe(hose, pe))
 				break;
 
-			WARN(1, "EEH: fenced PHB#%x detected\n",
-			     hose->global_number);
+			pr_err("EEH: fenced PHB#%x detected\n",
+			       hose->global_number);
 			ret = 2;
 			goto out;
-		} else if (severity == OPAL_EEH_SEV_INF)
+		} else if (severity == OPAL_EEH_SEV_INF) {
+			pr_info("EEH: PHB#%x informative error "
+				"detected\n",
+				hose->global_number);
 			ioda_eeh_phb_diag(hose);
+		}
 
 		break;
 	case OPAL_EEH_PE_ERROR:
 		if (ioda_eeh_get_pe(hose, frozen_pe_no, pe))
 			break;
 
-		WARN(1, "EEH: Frozen PE#%x on PHB#%x detected\n",
-		     (*pe)->addr, (*pe)->phb->global_number);
+		pr_err("EEH: Frozen PE#%x on PHB#%x detected\n",
+		       (*pe)->addr, (*pe)->phb->global_number);
 		ret = 1;
 		goto out;
 	}
-- 
1.7.5.4



[PATCH v3 00/6] Follow-up fixes for EEH on PowerNV

2013-06-25 Thread Gavin Shan
This series of patches is a follow-up to make EEH workable for the PowerNV
platform on the Juno-IOC-L machine. A couple of issues have been fixed
with Ben's help:

- Check PCIe link after PHB complete reset
- Restore config space for bridges
- The EEH address cache wasn't built successfully
- Misc cleanup on output messages
- Misc cleanup on EEH flags maintained by "struct pnv_phb"
- Misc cleanup on properties of functions to avoid build warnings
 
The series of patches has been verified on the Juno-IOC-L machine:

Trigger frozen PE:

echo 0x0200 > /sys/kernel/debug/powerpc/PCI/err_injct
sleep 1
echo 0x0 > /sys/kernel/debug/powerpc/PCI/err_injct

Trigger fenced PHB:

echo 0x8000 > /sys/kernel/debug/powerpc/PCI/err_injct


Changelog:
==

v2 -> v3:
* Fix overwritten buffer while collecting data
  from PCI config space.
v1 -> v2:
* Remove the mechanism to block PCI-CFG and MMIO.
* Add one patch to do cleanup on output messages.
* Add one patch to avoid build warnings.
* Split functions to restore BARs for PCI devices and bridges separately.

---

 arch/powerpc/include/asm/eeh.h            |    4 +-
 arch/powerpc/kernel/eeh.c                 |   43 ++--
 arch/powerpc/kernel/eeh_cache.c           |    4 +-
 arch/powerpc/kernel/eeh_pe.c              |  157 ++---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   33 ---
 arch/powerpc/platforms/powernv/pci-ioda.c |    1 +
 arch/powerpc/platforms/powernv/pci.c      |    4 +-
 arch/powerpc/platforms/powernv/pci.h      |    7 +-
 8 files changed, 207 insertions(+), 46 deletions(-)

Thanks,
Gavin


