On Thu, Oct 18, 2007 at 04:39:59PM -0700, Linus Torvalds wrote: > > Basic notion: the only thing that serializes the IRQ_INPROGRESS flag is > the descriptor lock. And we don't have to (or even want to!) hold it while > waiting for the thing, but we want to *have*held*it* in between whatever > we're synchronizing with.
Thinking about this again and checking code I have come to the conclusion that 1) There is a race (in some drivers) even with the full mb. 2) Linus's patch fixes it. 3) It's difficult to fix it elegantly in the driver. In other words I think this patch is great :) First of all let's agree on some basic assumptions: * A pair of spin lock/unlock subsumes the effect of a full mb. * A spin lock in general only equates to (SS/SL/LL). * A spin unlock in general only equates to (SS/LS). In particular, a load after a spin unlock may pass before the spin unlock. Here is the race (with tg3 as the only example that I know of). The driver attempts to quiesce interrupts such that after the call to synchronize_irq it is intended that no further IRQ handler calls for that device will do any work besides acking the IRQ. Here is how it does it: CPU0 CPU1 spin lock load irq_sync irq_sync = 1 smp_mb synchronize_irq() while (IRQ_INPROGRESS) wait return set IRQ_INPROGRESS spin unlock tg3_msi ack IRQ if (irq_sync) return do work The problem here is that load of irq_sync in the handler has passed above the setting of IRQ_INPROGRESS. Linus's patch fixes it because this becomes: CPU0 CPU1 spin lock load irq_sync irq_sync = 1 smp_mb synchronize_irq set IRQ_INPROGRESS spin unlock spin lock spin unlock tg3_msi ack IRQ if (irq_sync) return do work while (IRQ_INPROGRESS) wait spin lock clear IRQ_INPROGRESS spin unlock return Even though we still do the work on the right we will now notice the INPROGRESS flag on the left and wait. It's hard to fix this in the drivers because they'd either have to access the desc lock or add a full mb to the fast path on the right. Once this goes in we can also remove the smp_mb from tg3.c. BTW, a lot of drivers (including the fusion example Ben quoted) call synchronize_irq before free_irq. This is unnecessary because the latter already calls it anyway. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/