On 2015/9/8 18:40, Mika Westerberg wrote: > Hi Jiang and Thomas, > > With recent kernels I'm getting crash when a GPIO interrupt triggers. For > unknown reasons I'm not able to capture the crash on the serial console so I > did following change to irq_move_masked_irq(): > > assert_raw_spin_locked(&desc->lock); > > to > > WARN_ON(!raw_spin_is_locked(&desc->lock)); > > The backtrace looks like: > > [ 6.733640] WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 > irq_move_masked_irq+0xb8/0xc0() > [ 6.768765] Modules linked in: x86_pkg_temp_thermal i2c_hid(+) > [ 6.775425] CPU: 0 PID: 5 Comm: kworker/0:0H Not tainted 4.2.0+ #124 > [ 6.782639] ffffffff81747e59 ffff8801c3c03e18 ffffffff815d6165 > 0000000000000000 > [ 6.791110] ffff8801c3c03e50 ffffffff81050c9d ffff8801bf00a600 > ffff8801beb594c0 > [ 6.799540] ffff8801beb594c0 0000000000000002 0000000042000000 > ffff8801c3c03e60 > [ 6.807989] Call Trace: > [ 6.810755] <IRQ> [<ffffffff815d6165>] dump_stack+0x4e/0x79 > [ 6.817307] [<ffffffff81050c9d>] warn_slowpath_common+0x7d/0xb0 > [ 6.824121] [<ffffffff81050d85>] warn_slowpath_null+0x15/0x20 > [ 6.830736] [<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0 > [ 6.837450] [<ffffffff8103c161>] ioapic_ack_level+0x111/0x130 > [ 6.844079] [<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0 > [ 6.851306] [<ffffffff81006bfb>] handle_irq+0xab/0x180 > [ 6.857235] [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20 > [ 6.864440] [<ffffffff810063c2>] do_IRQ+0x52/0xe0 > [ 6.869877] [<ffffffff815dcfbf>] common_interrupt+0x7f/0x7f > [ 6.876292] <EOI> [<ffffffff815dbf6d>] ? _raw_write_unlock_irq+0xd/0x30 > [ 6.884008] [<ffffffff815dbf99>] _raw_spin_unlock_irq+0x9/0x10 > [ 6.890721] [<ffffffff810745bc>] finish_task_switch+0x7c/0x1a0 > [ 6.897438] [<ffffffff815d7b9b>] __schedule+0x33b/0xb20 > [ 6.903461] [<ffffffff8107cc77>] ? __enqueue_entity+0x67/0x70 > [ 6.910079] [<ffffffff815d8408>] schedule+0x38/0x90 > [ 6.915711] [<ffffffff815db3a8>] schedule_timeout+0x158/0x250 > [ 6.922328] [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20 > [ 6.929534] [<ffffffff815d933f>] wait_for_completion_killable+0xaf/0x220 > [ 6.937231] [<ffffffff81076d80>] ? wake_up_q+0x60/0x60 > [ 6.943158] [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0 > [ 6.949970] [<ffffffff8106d5c2>] kthread_create_on_node+0xd2/0x170 > [ 6.957079] [<ffffffff8129ac39>] ? snprintf+0x39/0x40 > [ 6.962907] [<ffffffff810665b9>] create_worker+0xb9/0x190 > [ 6.969130] [<ffffffff810685d3>] worker_thread+0x303/0x4b0 > [ 6.975452] [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0 > [ 6.982262] [<ffffffff8106d8bf>] kthread+0xcf/0xf0 > [ 6.987797] [<ffffffff815dbf99>] ? _raw_spin_unlock_irq+0x9/0x10 > [ 6.994722] [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190 > [ 7.001636] [<ffffffff815dc86f>] ret_from_fork+0x3f/0x70 > [ 7.007765] [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190 > > The driver in question is drivers/pinctrl/intel/pinctrl-intel.c and the > corresponding code which triggers this is below: > > static void intel_gpio_irq_handler(unsigned irq, struct irq_desc *desc) > { > struct gpio_chip *gc = irq_desc_get_handler_data(desc); > struct intel_pinctrl *pctrl = gpiochip_to_pinctrl(gc); > struct irq_chip *chip = irq_desc_get_chip(desc); > int i; > > chained_irq_enter(chip, desc); > > /* Need to check all communities for pending interrupts */ > for (i = 0; i < pctrl->ncommunities; i++) > intel_gpio_community_irq_handler(gc, &pctrl->communities[i]); > > chained_irq_exit(chip, desc); > } > > So once chained_irq_exit() is called, it in turn calls chip->irq_eoi() > which in this case is ioapic_ack_level(). > > I'm sure such crash did not happen when pinctrl-intel.c was developed so I > started to investigate what might have changed. Note that it may be the > driver does something wrong and the crash is expected. However, many other > drivers do pretty much the same (with the exception that the parent IRQ > chip is not IO-APIC in most cases). > > To me assert_raw_spin_locked(&desc->lock) looks valid as the function goes > and modifies desc right after the check. Also in intel_gpio_irq_handler() > desc->lock is not locked (not sure if it needs to be). > > After "manual" bisection I found the commit that causes the crash to be > triggered: > > commit aa5cb97f14a2dd5aefabed6538c35ebc087d7c24 > Author: Jiang Liu <jiang....@linux.intel.com> > Date: Tue Apr 14 10:29:42 2015 +0800 > > x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces > > Now there is no user of x86_io_apic_ops.set_affinity anymore, so > remove > it. > > It says that there are no users for x86_io_apic_ops.set_affinity but then > it does this: > > - x86_io_apic_ops.set_affinity(idata, mask, false); > + irq_set_affinity(irq, mask); > > The difference is that x86_io_apic_ops.set_affinity() programs affinity > directly to the hardware (if I understand it right) but irq_set_affinity() > calls irqd_set_move_pending() which defers programming the hardware later.
Hi Mika, I feel this is the root cause and will investigate it tomorrow. Thanks! Gerry > > Now when an interrupt triggers we end up calling irq_move_masked_irq() with > unlocked descriptor. > > Without the above change we never do that and the crash does not happen. > > Since I don't know much about genirq and IO-APIC code in particular, I > would like to get your input on how to solve the problem. Do I need to > change the pinctrl driver somehow to lock the descriptor or maybe the > commit above misses something? > > It is really easy to trigger so please let me know if further debugging is > needed. > > Thanks in advance. > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/