Nicholas Piggin <npig...@gmail.com> writes:

> System reset is a non-maskable interrupt from Linux's point of view
> (occurs under local_irq_disable()), so it should use nmi_enter/exit.
...
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 802aa6bbe97b..c65c88fb6482 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -278,6 +278,14 @@ void _exception(int signr, struct pt_regs *regs, int code, unsigned long addr)
>  
>  void system_reset_exception(struct pt_regs *regs)
>  {
> +     /*
> +      * Avoid crashes in case of nested NMI exceptions. Recoverability
> +      * is determined by RI and in_nmi
> +      */
> +     bool nested = in_nmi();
> +     if (!nested)
> +             nmi_enter();
> +
>       /* See if any machine dependent calls */
>       if (ppc_md.system_reset_exception) {
>               if (ppc_md.system_reset_exception(regs))


This breaks my QS22 (Cell blade); I get lots of RCU stalls such as:

  INFO: rcu_sched self-detected stall on CPU
        0-...: (5249 ticks this GP) idle=ad6/1/1 softirq=3/3 fqs=3 
         (t=5250 jiffies g=-298 c=-299 q=1289)
  rcu_sched kthread starved for 5234 jiffies! g18446744073709551318 c18446744073709551317 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
  rcu_sched       S    0     8      2 0x00000800
  Call Trace:
  [c0000003fb9d7950] [c000000000014730] .__switch_to+0x218/0x2b0
  [c0000003fb9d7a00] [c0000000006a0668] .__schedule+0x268/0x778
  [c0000003fb9d7ae0] [c0000000006a0bb0] .schedule+0x38/0xb0
  [c0000003fb9d7b60] [c0000000006a7ba4] .schedule_timeout+0x184/0x2f0
  [c0000003fb9d7c50] [c000000000106c5c] .rcu_gp_kthread+0x5ec/0xa60
  [c0000003fb9d7d70] [c0000000000c69d0] .kthread+0x148/0x188
  [c0000003fb9d7e30] [c00000000000ba70] .ret_from_kernel_thread+0x58/0x68

And I never get to userspace.

This is because cbe_system_reset_exception() doesn't like being called
after nmi_enter() - though I don't know exactly what the problem is.

Moving the nmi_enter() after the ppc_md hook (and fixing up the goto
etc.) fixes it, but that's not really a great solution.
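
For illustration only, the reordering can be modelled in user space roughly
as below. This is a sketch, not the actual change: in_nmi(), nmi_enter() and
nmi_exit() are simple stand-ins for the kernel primitives, platform_hook() is
a hypothetical placeholder for the ppc_md.system_reset_exception callback,
and the early return is an assumption about what "fixing up the goto" would
amount to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's NMI bookkeeping (assumption, not kernel code). */
static int nmi_depth;      /* models the NMI nesting count */
static bool hook_saw_nmi;  /* what in_nmi() returned inside the hook */

static bool in_nmi(void)    { return nmi_depth > 0; }
static void nmi_enter(void) { nmi_depth++; }
static void nmi_exit(void)  { assert(nmi_depth > 0); nmi_depth--; }

/* Hypothetical platform hook: returns nonzero if it handled the reset. */
static int platform_hook(void)
{
	hook_saw_nmi = in_nmi();
	return 0;	/* not handled, fall through to the generic path */
}

/* Models the workaround: call the hook first, then enter NMI context. */
static int system_reset_model(int (*hook)(void))
{
	bool nested;

	/* The hook now runs before nmi_enter(), so it must not reach nmi_exit(). */
	if (hook && hook())
		return 1;

	nested = in_nmi();
	if (!nested)
		nmi_enter();

	/* ... generic system reset handling would go here ... */

	if (!nested)
		nmi_exit();
	return 0;
}
```

In this model the hook never observes in_nmi() as true on a non-nested
system reset, which is the property the workaround relies on; the cost is
that the hook itself runs outside the nmi_enter()/nmi_exit() bracket.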

I suspect it will also break pasemi, because it does something similar.

I'm not clear on how best to fix it ATM.

cheers
