On Sun, Jul 22, 2018 at 01:04:11PM -0700, David Miller wrote: > From: Uwe Kleine-König <u.kleine-koe...@pengutronix.de> > Date: Sun, 22 Jul 2018 21:00:35 +0200 > > > On Sat, Jul 21, 2018 at 10:44:09PM -0700, David Miller wrote: > >> From: Uwe Kleine-König <u.kleine-koe...@pengutronix.de> > >> Date: Fri, 20 Jul 2018 11:53:15 +0200 > >> > >> > free_irq() waits until all handlers for this IRQ have completed. As the > >> > relevant handler (mv88e6xxx_g1_irq_thread_fn()) takes the chip's reg_lock > >> > it might never return if the thread calling free_irq() holds this lock. > >> > > >> > For the same reason kthread_cancel_delayed_work_sync() in the polling > >> > case > >> > must not hold this lock. > >> > > >> > Also first free the irq (or stop the worker respectively) such that > >> > mv88e6xxx_g1_irq_thread_work() isn't called any more before the irq > >> > mappings are dropped in mv88e6xxx_g1_irq_free_common() to prevent the > >> > worker thread to call handle_nested_irq(0) which results in a > >> > NULL-pointer > >> > exception. > >> > > >> > Signed-off-by: Uwe Kleine-König <u.kleine-koe...@pengutronix.de> > >> > >> Looks good. > >> > >> Note than the IRQ domain unmapping will do a synchronize_irq() which > >> should cause the same deadlock as free_irq() will with the reg_lock > >> held. > > > > Do you think that there is still a problem? When free_irq() for the > > external visible irq returns the muxed irqs should be all gone, too, so > > this should not trigger, should it? > > It shouldn't be a problem after your changes. > > I'm just saying that I'm surprised that, in the original code, you see > the deadlock in free_irq(), since the synchronize_irq() done by the > IRQ domain code should have happened first.
ah, I see. This didn't happen because I added an msleep to mv88e6xxx_g1_irq_thread_work() before the lock it taken to widen the race window for a different problem. So the sub-irqs were not active when mv88e6xxx_g1_irq_free() run, only the mux-irq was. When irq_dispose_mapping() is called for the sub-irq there is no problem as this results in synchronize_irq() for the sub-irq, not the mux-irq. Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | http://www.pengutronix.de/ |