Hello, On Thu, Apr 21, 2022 at 12:16:57AM +1000, Michael Ellerman wrote: > This is a partial revert of commit 0faf20a1ad16 ("powerpc/64s/interrupt: > Don't enable MSR[EE] in irq handlers unless perf is in use"). > > Prior to that commit, we always set the decrementer in > timer_interrupt(), to clear the timer interrupt. Otherwise we could end > up continuously taking timer interrupts. > > When high res timers are enabled there is no problem seen with leaving > the decrementer untouched in timer_interrupt(), because it will be > programmed via hrtimer_interrupt() -> tick_program_event() -> > clockevents_program_event() -> decrementer_set_next_event(). > > However with CONFIG_HIGH_RES_TIMERS=n or booting with highres=off, we
How difficult is it to detect this condition? Maybe detecting this could be just added? Thanks Michal > see a stall/lockup, because tick_nohz_handler() does not cause a > reprogram of the decrementer, leading to endless timer interrupts. > Example trace: > > [ 1.898617][ T7] Freeing initrd memory: 2624K^M > [ 22.680919][ C1] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:^M > [ 22.682281][ C1] rcu: 0-....: (25 ticks this GP) idle=073/0/0x1 > softirq=10/16 fqs=1050 ^M > [ 22.682851][ C1] (detected by 1, t=2102 jiffies, g=-1179, q=476)^M > [ 22.683649][ C1] Sending NMI from CPU 1 to CPUs 0:^M > [ 22.685252][ C0] NMI backtrace for cpu 0^M > [ 22.685649][ C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted > 5.16.0-rc2-00185-g0faf20a1ad16 #145^M > [ 22.686393][ C0] NIP: c000000000016d64 LR: c000000000f6cca4 CTR: > c00000000019c6e0^M > [ 22.686774][ C0] REGS: c000000002833590 TRAP: 0500 Not tainted > (5.16.0-rc2-00185-g0faf20a1ad16)^M > [ 22.687222][ C0] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: > 24000222 XER: 00000000^M > [ 22.688297][ C0] CFAR: c00000000000c854 IRQMASK: 0 ^M > ... > [ 22.692637][ C0] NIP [c000000000016d64] > arch_local_irq_restore+0x174/0x250^M > [ 22.694443][ C0] LR [c000000000f6cca4] __do_softirq+0xe4/0x3dc^M > [ 22.695762][ C0] Call Trace:^M > [ 22.696050][ C0] [c000000002833830] [c000000000f6cc80] > __do_softirq+0xc0/0x3dc (unreliable)^M > [ 22.697377][ C0] [c000000002833920] [c000000000151508] > __irq_exit_rcu+0xd8/0x130^M > [ 22.698739][ C0] [c000000002833950] [c000000000151730] > irq_exit+0x20/0x40^M > [ 22.699938][ C0] [c000000002833970] [c000000000027f40] > timer_interrupt+0x270/0x460^M > [ 22.701119][ C0] [c0000000028339d0] [c0000000000099a8] > decrementer_common_virt+0x208/0x210^M > > Possibly this should be fixed in the lowres timing code, but that would > be a generic change and could take some time and may not backport > easily, so for now make the programming of the decrementer unconditional > again in timer_interrupt() to avoid the stall/lockup. > > Fixes: 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq > handlers unless perf is in use") > Reported-by: Miguel Ojeda <miguel.ojeda.sando...@gmail.com> > Signed-off-by: Michael Ellerman <m...@ellerman.id.au> > --- > arch/powerpc/kernel/time.c | 29 ++++++++++++++--------------- > 1 file changed, 14 insertions(+), 15 deletions(-) > > diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c > index f5cbfe5efd25..f80cce0e3899 100644 > --- a/arch/powerpc/kernel/time.c > +++ b/arch/powerpc/kernel/time.c > @@ -615,23 +615,22 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(timer_interrupt) > return; > } > > - /* Conditionally hard-enable interrupts. */ > - if (should_hard_irq_enable()) { > - /* > - * Ensure a positive value is written to the decrementer, or > - * else some CPUs will continue to take decrementer exceptions. > - * When the PPC_WATCHDOG (decrementer based) is configured, > - * keep this at most 31 bits, which is about 4 seconds on most > - * systems, which gives the watchdog a chance of catching timer > - * interrupt hard lockups. > - */ > - if (IS_ENABLED(CONFIG_PPC_WATCHDOG)) > - set_dec(0x7fffffff); > - else > - set_dec(decrementer_max); > + /* > + * Ensure a positive value is written to the decrementer, or > + * else some CPUs will continue to take decrementer exceptions. > + * When the PPC_WATCHDOG (decrementer based) is configured, > + * keep this at most 31 bits, which is about 4 seconds on most > + * systems, which gives the watchdog a chance of catching timer > + * interrupt hard lockups. > + */ > + if (IS_ENABLED(CONFIG_PPC_WATCHDOG)) > + set_dec(0x7fffffff); > + else > + set_dec(decrementer_max); > > + /* Conditionally hard-enable interrupts. */ > + if (should_hard_irq_enable()) > do_hard_irq_enable(); > - } > > #if defined(CONFIG_PPC32) && defined(CONFIG_PPC_PMAC) > if (atomic_read(&ppc_n_lost_interrupts) != 0) > -- > 2.34.1 >