Ingo,

I've spent several days banging my head on this bug, and I finally found
it. I originally thought we had a bug in the latency tracer, since it
seemed to occur only when I turned on latency tracing. But I guess the
tracer just changed the timings enough to make the bug happen. Now that
I've found where the bug is, I don't know how it ever worked, even
without the tracing.

The Ethernet interrupt is handled by handle_fasteoi_irq. When the irq
is handled by a thread, handle_fasteoi_irq marks the irq INPROGRESS and
masks it. Before leaving handle_fasteoi_irq, desc->chip->eoi is called.
For the apic, this will call move_native_irq. In the -rt kernel,
move_native_irq masks the irq, moves it, and then blindly unmasks it.
So when interrupts are turned on next, we take an interrupt storm.

The other handlers besides handle_fasteoi_irq mask the irq regardless of
whether the irq is INPROGRESS. But handle_fasteoi_irq will not mask it
if the irq is already INPROGRESS; it just returns. So we keep taking the
same interrupt.

So I changed move_native_irq to not mask and unmask if the irq is
currently INPROGRESS. But... I'm not sure if this is OK or not, since I
don't know all the uses of this function.

My original patch was just to do a

@@ -68,6 +68,9 @@ void move_native_irq(int irq)
 	if (unlikely(desc->status & IRQ_DISABLED))
 		return;
 
+	if (unlikely(desc->status & IRQ_INPROGRESS))
+		return;
+
 	desc->chip->mask(irq);
 	move_masked_irq(irq);
 	desc->chip->unmask(irq);

But I thought that this might be too big a hammer for this nail. So I
changed it to the patch below.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

Index: linux-2.6.21-rt1-i386/kernel/irq/migration.c
===================================================================
--- linux-2.6.21-rt1-i386.orig/kernel/irq/migration.c
+++ linux-2.6.21-rt1-i386/kernel/irq/migration.c
@@ -61,6 +61,7 @@ void move_masked_irq(int irq)
 void move_native_irq(int irq)
 {
 	struct irq_desc *desc = irq_desc + irq;
+	int mask = 1;
 
 	if (likely(!(desc->status & IRQ_MOVE_PENDING)))
 		return;
@@ -68,8 +69,17 @@ void move_native_irq(int irq)
 	if (unlikely(desc->status & IRQ_DISABLED))
 		return;
 
-	desc->chip->mask(irq);
+	/*
+	 * If the irq is already in progress, it should be masked.
+	 * If we unmask it, we might cause an interrupt storm on RT.
+	 */
+	if (unlikely(desc->status & IRQ_INPROGRESS))
+		mask = 0;
+
+	if (mask)
+		desc->chip->mask(irq);
 	move_masked_irq(irq);
-	desc->chip->unmask(irq);
+	if (mask)
+		desc->chip->unmask(irq);
 }
 
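
For anyone who wants to poke at the logic outside the kernel, here is a
small user-space sketch of the sequence described above. It is only a
toy model, not kernel code: struct model_desc, the model_* helpers,
masked_after_eoi and check_model are all made up for illustration, the
flag values are arbitrary, and model_move_masked_irq is assumed to do
nothing but clear the pending bit. It just shows that, with the irq
INPROGRESS and a move pending, the old sequence leaves the line unmasked
while the patched one leaves it masked.

```c
#include <assert.h>
#include <stdbool.h>

/* Status bits: names mirror the kernel's, values are arbitrary here. */
#define IRQ_INPROGRESS   0x1u
#define IRQ_DISABLED     0x2u
#define IRQ_MOVE_PENDING 0x4u

/* Hypothetical stand-in for struct irq_desc: status bits plus mask state. */
struct model_desc {
	unsigned int status;
	bool masked;	/* true while the line is masked at the modeled chip */
};

static void model_mask(struct model_desc *d)   { d->masked = true; }
static void model_unmask(struct model_desc *d) { d->masked = false; }

/* Assumed stand-in for move_masked_irq(): only clears the pending bit. */
static void model_move_masked_irq(struct model_desc *d)
{
	d->status &= ~IRQ_MOVE_PENDING;
}

/* Sequence before the patch: mask, move, unconditionally unmask. */
static void model_move_native_irq_old(struct model_desc *d)
{
	if (!(d->status & IRQ_MOVE_PENDING))
		return;
	if (d->status & IRQ_DISABLED)
		return;

	model_mask(d);
	model_move_masked_irq(d);
	model_unmask(d);
}

/* Sequence with the patch: leave the mask state alone while INPROGRESS. */
static void model_move_native_irq_new(struct model_desc *d)
{
	int mask = 1;

	if (!(d->status & IRQ_MOVE_PENDING))
		return;
	if (d->status & IRQ_DISABLED)
		return;

	if (d->status & IRQ_INPROGRESS)
		mask = 0;

	if (mask)
		model_mask(d);
	model_move_masked_irq(d);
	if (mask)
		model_unmask(d);
}

/*
 * Model the eoi-time call: the flow handler has already masked the line
 * and set INPROGRESS for the irq thread, and a move is pending.
 * Returns whether the line is still masked afterwards.
 */
bool masked_after_eoi(bool patched)
{
	struct model_desc d = { IRQ_INPROGRESS | IRQ_MOVE_PENDING, true };

	if (patched)
		model_move_native_irq_new(&d);
	else
		model_move_native_irq_old(&d);
	return d.masked;
}

/* Self-check of the two outcomes described in the mail. */
void check_model(void)
{
	assert(!masked_after_eoi(false));	/* old: line ends up unmasked */
	assert(masked_after_eoi(true));		/* patched: line stays masked */
}
```

Calling check_model() from a main() exercises both variants. The model
says nothing about whether skipping the mask/unmask is safe for the
other callers of move_native_irq, which is the open question above.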