Hi Jiang and Thomas, With recent kernels I'm getting crash when a GPIO interrupt triggers. For unknown reasons I'm not able to capture the crash on the serial console so I did following change to irq_move_masked_irq():
assert_raw_spin_locked(&desc->lock); to WARN_ON(!raw_spin_is_locked(&desc->lock)); The backtrace looks like: [ 6.733640] WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 irq_move_masked_irq+0xb8/0xc0() [ 6.768765] Modules linked in: x86_pkg_temp_thermal i2c_hid(+) [ 6.775425] CPU: 0 PID: 5 Comm: kworker/0:0H Not tainted 4.2.0+ #124 [ 6.782639] ffffffff81747e59 ffff8801c3c03e18 ffffffff815d6165 0000000000000000 [ 6.791110] ffff8801c3c03e50 ffffffff81050c9d ffff8801bf00a600 ffff8801beb594c0 [ 6.799540] ffff8801beb594c0 0000000000000002 0000000042000000 ffff8801c3c03e60 [ 6.807989] Call Trace: [ 6.810755] <IRQ> [<ffffffff815d6165>] dump_stack+0x4e/0x79 [ 6.817307] [<ffffffff81050c9d>] warn_slowpath_common+0x7d/0xb0 [ 6.824121] [<ffffffff81050d85>] warn_slowpath_null+0x15/0x20 [ 6.830736] [<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0 [ 6.837450] [<ffffffff8103c161>] ioapic_ack_level+0x111/0x130 [ 6.844079] [<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0 [ 6.851306] [<ffffffff81006bfb>] handle_irq+0xab/0x180 [ 6.857235] [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20 [ 6.864440] [<ffffffff810063c2>] do_IRQ+0x52/0xe0 [ 6.869877] [<ffffffff815dcfbf>] common_interrupt+0x7f/0x7f [ 6.876292] <EOI> [<ffffffff815dbf6d>] ? _raw_write_unlock_irq+0xd/0x30 [ 6.884008] [<ffffffff815dbf99>] _raw_spin_unlock_irq+0x9/0x10 [ 6.890721] [<ffffffff810745bc>] finish_task_switch+0x7c/0x1a0 [ 6.897438] [<ffffffff815d7b9b>] __schedule+0x33b/0xb20 [ 6.903461] [<ffffffff8107cc77>] ? __enqueue_entity+0x67/0x70 [ 6.910079] [<ffffffff815d8408>] schedule+0x38/0x90 [ 6.915711] [<ffffffff815db3a8>] schedule_timeout+0x158/0x250 [ 6.922328] [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20 [ 6.929534] [<ffffffff815d933f>] wait_for_completion_killable+0xaf/0x220 [ 6.937231] [<ffffffff81076d80>] ? wake_up_q+0x60/0x60 [ 6.943158] [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0 [ 6.949970] [<ffffffff8106d5c2>] kthread_create_on_node+0xd2/0x170 [ 6.957079] [<ffffffff8129ac39>] ? snprintf+0x39/0x40 [ 6.962907] [<ffffffff810665b9>] create_worker+0xb9/0x190 [ 6.969130] [<ffffffff810685d3>] worker_thread+0x303/0x4b0 [ 6.975452] [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0 [ 6.982262] [<ffffffff8106d8bf>] kthread+0xcf/0xf0 [ 6.987797] [<ffffffff815dbf99>] ? _raw_spin_unlock_irq+0x9/0x10 [ 6.994722] [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190 [ 7.001636] [<ffffffff815dc86f>] ret_from_fork+0x3f/0x70 [ 7.007765] [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190 The driver in question is drivers/pinctrl/intel/pinctrl-intel.c and the corresponding code which triggers this is below: static void intel_gpio_irq_handler(unsigned irq, struct irq_desc *desc) { struct gpio_chip *gc = irq_desc_get_handler_data(desc); struct intel_pinctrl *pctrl = gpiochip_to_pinctrl(gc); struct irq_chip *chip = irq_desc_get_chip(desc); int i; chained_irq_enter(chip, desc); /* Need to check all communities for pending interrupts */ for (i = 0; i < pctrl->ncommunities; i++) intel_gpio_community_irq_handler(gc, &pctrl->communities[i]); chained_irq_exit(chip, desc); } So once chained_irq_exit() is called, it in turn calls chip->irq_eoi() which in this case is ioapic_ack_level(). I'm sure such crash did not happen when pinctrl-intel.c was developed so I started to investigate what might have changed. Note that it may be the driver does something wrong and the crash is expected. However, many other drivers do pretty much the same (with the exception that the parent IRQ chip is not IO-APIC in most cases). To me assert_raw_spin_locked(&desc->lock) looks valid as the function goes and modifies desc right after the check. Also in intel_gpio_irq_handler() desc->lock is not locked (not sure if it needs to be). After "manual" bisection I found the commit that causes the crash to be triggered: commit aa5cb97f14a2dd5aefabed6538c35ebc087d7c24 Author: Jiang Liu <jiang....@linux.intel.com> Date: Tue Apr 14 10:29:42 2015 +0800 x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces Now there is no user of x86_io_apic_ops.set_affinity anymore, so remove it. It says that there are no users for x86_io_apic_ops.set_affinity but then it does this: - x86_io_apic_ops.set_affinity(idata, mask, false); + irq_set_affinity(irq, mask); The difference is that x86_io_apic_ops.set_affinity() programs affinity directly to the hardware (if I understand it right) but irq_set_affinity() calls irqd_set_move_pending() which defers programming the hardware later. Now when an interrupt triggers we end up calling irq_move_masked_irq() with unlocked descriptor. Without the above change we never do that and the crash does not happen. Since I don't know much about genirq and IO-APIC code in particular, I would like to get your input on how to solve the problem. Do I need to change the pinctrl driver somehow to lock the descriptor or maybe the commit above misses something? It is really easy to trigger so please let me know if further debugging is needed. Thanks in advance. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/