On Tue, Jan 12, 2021 at 04:09:53PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 12, 2021 at 01:59:48PM +0000, Mark Rutland wrote:
> > Hi,
> >
> > While fuzzing arm64 with Syzkaller (under QEMU+KVM) over a number of
> > releases, I've occasionally seen some ridiculously long stalls (20+
> > seconds), where it appears that a CPU is stuck in a hard IRQ context.
> > As this gets detected after the CPU returns to the interrupted
> > context, it's difficult to identify where exactly the stall is
> > coming from.
> >
> > These patches are intended to help track this down, with a WARN() if
> > an IRQ handler takes longer than a given timeout (1 second by
> > default), logging the specific IRQ and handler function. While it's
> > possible to achieve similar results with tracing, that's harder to
> > integrate into an automated fuzzing setup.
> >
> > I've been running this for a short while, and haven't yet seen any
> > of the stalls with this applied, but I've tested with smaller
> > timeout periods in the 1 millisecond range by overloading the host,
> > so I'm confident that the check works.
> >
> > Thanks,
> > Mark.
>
> Nice!
>
> Acked-by: Paul E. McKenney <paul...@kernel.org>
>
> I added the patch below to add a three-second delay to the scheduling
> clock interrupt handler. This executed, but did not cause your warning
> to be emitted, probably because rcutorture runs under qemu/KVM. So no
> Tested-by, not yet, anyway.
I think this is because on x86, APIC timer interrupts are handled in
arch code without going through the usual IRQ management
infrastructure. A dump_stack() in rcu_sched_clock_irq() shows:

[   75.131594] rcu: rcu_sched_clock_irq: 3-second delay.
[   75.132557] CPU: 2 PID: 135 Comm: sh Not tainted 5.11.0-rc3+ #12
[   75.133610] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[   75.135639] Call Trace:
[   75.136100]  dump_stack+0x57/0x6a
[   75.136713]  rcu_sched_clock_irq+0x76d/0x880
[   75.137493]  update_process_times+0x77/0xb0
[   75.138254]  tick_sched_handle.isra.17+0x2b/0x40
[   75.139105]  tick_sched_timer+0x36/0x70
[   75.139803]  ? tick_sched_handle.isra.17+0x40/0x40
[   75.140665]  __hrtimer_run_queues+0xf8/0x230
[   75.141441]  hrtimer_interrupt+0xfc/0x240
[   75.142169]  ? asm_sysvec_apic_timer_interrupt+0xa/0x20
[   75.143117]  __sysvec_apic_timer_interrupt+0x58/0xf0
[   75.144017]  sysvec_apic_timer_interrupt+0x27/0x80
[   75.144892]  asm_sysvec_apic_timer_interrupt+0x12/0x20

Here __sysvec_apic_timer_interrupt() calls local_apic_timer_interrupt(),
which invokes the clock_event_device::event_handler() directly. Since
that never goes via an irqaction handler, the code I add is never
invoked in this path. I believe this is true for a number of IRQs on
x86 (e.g. IPIs). A slow handler for a peripheral interrupt should
still be caught, though.

On arm64, timer interrupts (and IIUC IPIs too) go through the usual
IRQ management code, and so delays there get caught:

[  311.703932] rcu: rcu_sched_clock_irq: 3-second delay.
[  311.705012] CPU: 3 PID: 199 Comm: bash Not tainted 5.11.0-rc3-00003-gbe60490b2295-dirty #13
[  311.706694] Hardware name: linux,dummy-virt (DT)
[  311.707688] Call trace:
[  311.708233]  dump_backtrace+0x0/0x1a0
[  311.709053]  show_stack+0x18/0x70
[  311.709774]  dump_stack+0xd0/0x12c
[  311.710468]  rcu_sched_clock_irq+0x7d4/0xcf0
[  311.711356]  update_process_times+0x9c/0xec
[  311.712288]  tick_sched_handle+0x34/0x60
[  311.713191]  tick_sched_timer+0x4c/0xa4
[  311.714043]  __hrtimer_run_queues+0x140/0x1e0
[  311.715012]  hrtimer_interrupt+0xe8/0x290
[  311.715943]  arch_timer_handler_virt+0x38/0x4c
[  311.716951]  handle_percpu_devid_irq+0x94/0x190
[  311.717953]  __handle_domain_irq+0x7c/0xe0
[  311.718890]  gic_handle_irq+0xc0/0x140
[  311.719729]  el0_irq_naked+0x4c/0x54
[  314.720833] ------------[ cut here ]------------
[  314.721950] IRQ 11 handler arch_timer_handler_virt took 3016901740 ns
[  314.723421] WARNING: CPU: 3 PID: 199 at kernel/irq/internals.h:140 handle_percpu_devid_irq+0x158/0x190

I think our options are:

1) Live with it, and don't check these special cases.

2) Rework the special cases to go through the regular irqaction
   processing.

3) Open-code checks in each special case.

4) Add a helper/wrapper function that can be called in each special
   case, and update each one accordingly.

... and I reckon some mixture of #3 and #4 is plausible. We could add
a __handle_check_irq_function() or similar and use that to wrap the
call to local_apic_timer_interrupt() from
sysvec_apic_timer_interrupt(), but I'm not sure exactly what that
needs to look like to cover any other special cases.
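For concreteness, the per-handler check in the irqaction path is
conceptually along the lines of the below. This is a simplified
sketch rather than the literal patch, and check_irq_handler() and
irq_handler_timeout_ns are names invented here purely for
illustration:

/*
 * Simplified sketch of the per-handler timeout check; not the
 * literal patch. check_irq_handler() and irq_handler_timeout_ns
 * are invented names, for illustration only.
 */
#include <linux/interrupt.h>
#include <linux/sched/clock.h>

/* Matches the 1 second default mentioned in the cover letter. */
static u64 irq_handler_timeout_ns = NSEC_PER_SEC;

static inline irqreturn_t check_irq_handler(unsigned int irq,
					    struct irqaction *action)
{
	u64 ts = local_clock();
	irqreturn_t ret = action->handler(irq, action->dev_id);
	u64 delta = local_clock() - ts;

	/* Produces warnings like the "IRQ 11 handler ..." splat above. */
	WARN(delta > irq_handler_timeout_ns,
	     "IRQ %u handler %ps took %llu ns\n",
	     irq, action->handler, delta);

	return ret;
}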
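For #4, the wrapper might then be something like the below (again
untested, and the name is only a placeholder):

/*
 * Hypothetical wrapper for arch code that dispatches an interrupt
 * without going through an irqaction (option #4 above).
 */
#define __handle_check_irq_function(fn, args...)			\
do {									\
	u64 __ts = local_clock();					\
	u64 __delta;							\
									\
	fn(args);							\
	__delta = local_clock() - __ts;					\
	WARN(__delta > irq_handler_timeout_ns,				\
	     "%ps took %llu ns\n", fn, __delta);			\
} while (0)

... which the x86 APIC timer path could use along the lines of
(tracepoints trimmed for brevity):

DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt)
{
	struct pt_regs *old_regs = set_irq_regs(regs);

	ack_APIC_irq();
	__handle_check_irq_function(local_apic_timer_interrupt);
	set_irq_regs(old_regs);
}

Thanks,
Mark.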
>							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index e04e336..dac8c7a 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2606,6 +2606,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
>   */
>  void rcu_sched_clock_irq(int user)
>  {
> +	static atomic_t invctr;
> +
>  	trace_rcu_utilization(TPS("Start scheduler-tick"));
>  	lockdep_assert_irqs_disabled();
>  	raw_cpu_inc(rcu_data.ticks_this_gp);
> @@ -2623,6 +2625,14 @@ void rcu_sched_clock_irq(int user)
>  	invoke_rcu_core();
>  	lockdep_assert_irqs_disabled();
>
> +	if (atomic_inc_return(&invctr) % 0x3ffff == 0) {
> +		int i;
> +
> +		pr_alert("%s: 3-second delay.\n", __func__);
> +		for (i = 0; i < 3000; i++)
> +			udelay(1000);
> +	}
> +
>  	trace_rcu_utilization(TPS("End scheduler-tick"));
>  }