Steven Rostedt's on March 30, 2019 1:31 am: > [ Added Peter ] > > On Fri, 29 Mar 2019 19:13:55 +1000 > Nicholas Piggin <npig...@gmail.com> wrote: > >> Suraj Jitindar Singh's on March 29, 2019 3:20 pm: >> > On Wed, 2019-03-27 at 17:51 +0100, Cédric Le Goater wrote: >> >> On 3/27/19 5:37 PM, Cédric Le Goater wrote: >> >> > On 3/27/19 1:36 PM, Sebastian Andrzej Siewior wrote: >> >> > > With qemu-system-ppc64le -machine pseries -smp 4 I get: >> >> > > >> >> > > > # chrt 1 hackbench >> >> > > > Running in process mode with 10 groups using 40 file >> >> > > > descriptors each (== 400 tasks) >> >> > > > Each sender will pass 100 messages of 100 bytes >> >> > > > Oops: Exception in kernel mode, sig: 4 [#1] >> >> > > > LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA pSeries >> >> > > > Modules linked in: >> >> > > > CPU: 0 PID: 629 Comm: hackbench Not tainted 5.1.0-rc2 #71 >> >> > > > NIP: c000000000046978 LR: c000000000046a38 CTR: >> >> > > > c0000000000b0150 >> >> > > > REGS: c0000001fffeb8e0 TRAP: 0700 Not tainted (5.1.0-rc2) >> >> > > > MSR: 8000000000089033 <SF,EE,ME,IR,DR,RI,LE> CR: >> >> > > > 42000874 XER: 00000000 >> >> > > > CFAR: c000000000046a34 IRQMASK: 1 >> >> > > > GPR00: c0000000000b0170 c0000001fffebb70 c000000000a6ba00 >> >> > > > 0000000028000000 >> >> > > >> >> > > … >> >> > > > NIP [c000000000046978] doorbell_core_ipi+0x28/0x30 >> >> > > > LR [c000000000046a38] doorbell_try_core_ipi+0xb8/0xf0 >> >> > > > Call Trace: >> >> > > > [c0000001fffebb70] [c0000001fffebba0] 0xc0000001fffebba0 >> >> > > > (unreliable) >> >> > > > [c0000001fffebba0] [c0000000000b0170] >> >> > > > smp_pseries_cause_ipi+0x20/0x70 >> >> > > > [c0000001fffebbd0] [c00000000004b02c] >> >> > > > arch_send_call_function_single_ipi+0x8c/0xa0 >> >> > > > [c0000001fffebbf0] [c0000000001de600] >> >> > > > irq_work_queue_on+0xe0/0x130 >> >> > > > [c0000001fffebc30] [c0000000001340c8] >> >> > > > rto_push_irq_work_func+0xc8/0x120 >> >> > > >> >> > > … >> >> > > > Instruction dump: >> >> > > > 60000000 60000000 3c4c00a2 384250b0 3d220009 392949c8 81290000 >> >> > > > 3929ffff >> >> > > > 7d231838 7c0004ac 5463017e 64632800 <7c00191c> 4e800020 >> >> > > > 3c4c00a2 38425080 >> >> > > > ---[ end trace eb842b544538cbdf ]--- >> >> This is unusual and causing powerpc code to crash because the rt >> scheduler is telling irq_work_queue_on to queue work on this CPU. >> Is that something allowed? There's no warnings in there but it must >> be a rarely tested path, would it be better to ban it? >> >> Steven is this queue_work_on to self by design? > > I just figured that a irq_work_queue_on() would default to just > irq_work_queue() if cpu == smp_processor_id().
Yeah it actually doesn't, it raises a CALL_FUNCTION IPI on the current CPU. Maybe it should do that. > > I'm sure this case has been tested (on x86), as I stressed this quite a > bit, and all it takes is to have an RT task queued on the CPU > processing the current "push" logic when it's not holding the rq locks > and read that the current CPU now has an overloaded RT task. > > If calling irq_work_queue_on() is really bad to do to the same CPU, > then we can work on changing the logic to do something different if the > CPU returned from rto_next_cpu() is the current CPU. I don't actually know that it is bad. It does seem to violate the principle of least surprise for arch code though. >From this bug report, I'm guessing it's pretty rare case to hit so I would worry about subtle bugs. And I just cc'ed you in case your code was actually not intending to do this. > Or would it be possible to just change irq_work_queue_on() to call > irq_work_queue() if cpu == smp_processor_id()? Something like this? kernel/irq_work: Do not raise an IPI when queueing work on the local CPU The QEMU powerpc/pseries machine model was not expecting a self-IPI, and it may be a bit surprising thing to do, so have irq_work_queue_on do local queueing when target is current CPU. Suggested-by: Steven Rostedt <rost...@goodmis.org> Signed-off-by: Nicholas Piggin <npig...@gmail.com> --- kernel/irq_work.c | 78 ++++++++++++++++++++++++++--------------------- 1 file changed, 43 insertions(+), 35 deletions(-) diff --git a/kernel/irq_work.c b/kernel/irq_work.c index 6b7cdf17ccf8..f0e539d0f879 100644 --- a/kernel/irq_work.c +++ b/kernel/irq_work.c @@ -56,61 +56,69 @@ void __weak arch_irq_work_raise(void) */ } -/* - * Enqueue the irq_work @work on @cpu unless it's already pending - * somewhere. - * - * Can be re-enqueued while the callback is still in progress. - */ -bool irq_work_queue_on(struct irq_work *work, int cpu) +/* Enqueue on current CPU, work must already be claimed and preempt disabled */ +static void __irq_work_queue(struct irq_work *work) { - /* All work should have been flushed before going offline */ - WARN_ON_ONCE(cpu_is_offline(cpu)); - -#ifdef CONFIG_SMP - - /* Arch remote IPI send/receive backend aren't NMI safe */ - WARN_ON_ONCE(in_nmi()); + /* If the work is "lazy", handle it from next tick if any */ + if (work->flags & IRQ_WORK_LAZY) { + if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) && + tick_nohz_tick_stopped()) + arch_irq_work_raise(); + } else { + if (llist_add(&work->llnode, this_cpu_ptr(&raised_list))) + arch_irq_work_raise(); + } +} +/* Enqueue the irq work @work on the current CPU */ +bool irq_work_queue(struct irq_work *work) +{ /* Only queue if not already pending */ if (!irq_work_claim(work)) return false; - if (llist_add(&work->llnode, &per_cpu(raised_list, cpu))) - arch_send_call_function_single_ipi(cpu); - -#else /* #ifdef CONFIG_SMP */ - irq_work_queue(work); -#endif /* #else #ifdef CONFIG_SMP */ + /* Queue the entry and raise the IPI if needed. */ + preempt_disable(); + __irq_work_queue(work); + preempt_enable(); return true; } +EXPORT_SYMBOL_GPL(irq_work_queue); -/* Enqueue the irq work @work on the current CPU */ -bool irq_work_queue(struct irq_work *work) +/* + * Enqueue the irq_work @work on @cpu unless it's already pending + * somewhere. + * + * Can be re-enqueued while the callback is still in progress. + */ +bool irq_work_queue_on(struct irq_work *work, int cpu) { +#ifndef CONFIG_SMP + return irq_work_queue(work); + +#else /* #ifndef CONFIG_SMP */ + /* All work should have been flushed before going offline */ + WARN_ON_ONCE(cpu_is_offline(cpu)); + /* Only queue if not already pending */ if (!irq_work_claim(work)) return false; - /* Queue the entry and raise the IPI if needed. */ preempt_disable(); - - /* If the work is "lazy", handle it from next tick if any */ - if (work->flags & IRQ_WORK_LAZY) { - if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) && - tick_nohz_tick_stopped()) - arch_irq_work_raise(); - } else { - if (llist_add(&work->llnode, this_cpu_ptr(&raised_list))) - arch_irq_work_raise(); - } - + if (cpu != smp_processor_id()) { + /* Arch remote IPI send/receive backend aren't NMI safe */ + WARN_ON_ONCE(in_nmi()); + if (llist_add(&work->llnode, &per_cpu(raised_list, cpu))) + arch_send_call_function_single_ipi(cpu); + } else + __irq_work_queue(work); preempt_enable(); return true; +#endif /* #else #ifndef CONFIG_SMP */ } -EXPORT_SYMBOL_GPL(irq_work_queue); + bool irq_work_needs_cpu(void) { -- 2.20.1