Hi Thierry,

On Wed, 20 Aug 2014 09:31:13 +0200
Thierry Reding <thierry.red...@gmail.com> wrote:
> On Wed, Aug 20, 2014 at 01:01:30AM +0200, Boris BREZILLON wrote:
> > Hi Jean-Christophe,
> >
> > On Wed, 20 Aug 2014 06:11:17 +0800
> > Jean-Christophe PLAGNIOL-VILLARD <plagn...@jcrosoft.com> wrote:
> >
> > > Hi,
> > >
> > > This is a bit weird, as the clock of the TC should be off and the irq
> > > freed, so this should never happen. We need to investigate further
> > > why this happens.
> >
> > I may have found the source of this bug.
> >
> > As Gael stated, when you're kexec-ing a new kernel, your previous kernel
> > could be using the tcb_clksrc driver (and especially the clkevent
> > device). Thus the kernel might have planned a timer event and then been
> > asked to shut down the machine (requested by the kexec code).
> > In this case the AIC interrupt connected to the TC Block is disabled
> > but not the interrupts within the TCB IP (IDR registers), possibly
> > leaving a pending interrupt before booting the new kernel.
> >
> > When the tcb_clksrc driver is loaded by the new kernel, it enables the
> > interrupt line by calling setup_irq [1] while the clockevent device is
> > not registered yet [2]. Thus the event_handler is still NULL when the
> > AIC line connected to the TCB is unmasked. Remember that an interrupt
> > is still pending on this HW block, which will lead to an immediate call
> > to the ch2_irq handler, which tries to call the event_handler, which in
> > turn is NULL because clkevent device registration has not taken place
> > at this moment => kernel panic.
> > OTOH, we can't register the clkevent device before the irq handler is
> > set up, because we should be ready to handle clkevent requests at the
> > time clockevents_config_and_register is called.
> >
> > This leaves two solutions:
> > 1) disable the TCB irqs (using the TCB IDR registers) before calling
> >    setup_irq in the tcb_clksrc driver
> > 2) disable the TCB irqs at the tclib level (as proposed by Gael)
> >
> > I prefer solution #2 because it fixes the bug for all TCB users (not
> > just the tcb_clksrc driver).
>
> Wouldn't a more proper fix be to only enable the IRQ (setup_irq()) once
> everything has properly been set up? That's certainly how all other
> drivers are doing this. Generally I think it's best to assume that an
> interrupt can fire at any point after it's been enabled, so everything
> should be set up prior to enabling it.

Sure. And, AFAIK, another common practice is to disable all interrupts
and acknowledge all pending interrupts before registering a new irq
handler, to avoid inheriting dirty peripheral state from previous usage
(either the bootloader, or the previous kernel when using kexec).

That being said, I really think we should leave the HW in a clean state
when shutdown is called, and disabling interrupts at the tclib level (in
a shutdown callback) ensures that.

>
> Also, does anyone know why this driver uses setup_irq() rather than the
> more idiomatic request_irq()?

Because nobody has sanitized this driver yet ;-).

Best Regards,

Boris

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com