Hello,
Damien Zammit, on Thu 10 Jul 2025 10:14:21 +0000, wrote:
> This avoids a race condition with multiple devices raising interrupts
> simultaneously on the same IRQ and causing the mask to fail.
Are we *sure* that PCI interrupts may get raised on several CPUs
concurrently? That would make the race handling *really* more complex...
The much more probable scenario, however, is __enable_irq() getting called
concurrently (since it is called directly from the irq_acknowledge RPC), and
splhigh() is not enough to protect against that, since it only blocks
interrupts on the local cpu.
Rather than using an atomic counter, which is *really* difficult to get
right, I'd suggest introducing one simple_lock per interrupt, which you can
take with simple_lock_irq instead of splhigh(). That should solve everything
in a way that is much easier to convince ourselves is correct.
Along the way, it would be better to replace the unsigned int ndisabled_irq
array with a structure containing ndisabled_irq and the simple_lock, aligned
on the cache line size, so that each irq gets its own cache line and cache
ping-pong is reduced.
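
Roughly something like the sketch below (untested; the struct name, the
simple_lock_irq/simple_unlock_irq calling convention and the alignment value
are assumptions to be checked against what kern/lock.h actually provides):

struct irq_disable {
  unsigned int       ndisabled;  /* nesting count of __disable_irq calls */
  simple_lock_data_t lock;       /* serializes the count and mask/unmask */
} __attribute__((aligned(64)));  /* one cache line per irq, avoids ping-pong */

static struct irq_disable irq_disable[NINTR];

void
__disable_irq (irq_t irq_nr)
{
  assert (irq_nr < NINTR);

  /* assumed to raise spl, take the lock, and return the previous level */
  spl_t s = simple_lock_irq (&irq_disable[irq_nr].lock);

  irq_disable[irq_nr].ndisabled++;
  assert (irq_disable[irq_nr].ndisabled > 0);
  if (irq_disable[irq_nr].ndisabled == 1)
    mask_irq (irq_nr);

  simple_unlock_irq (s, &irq_disable[irq_nr].lock);
}

void
__enable_irq (irq_t irq_nr)
{
  assert (irq_nr < NINTR);

  spl_t s = simple_lock_irq (&irq_disable[irq_nr].lock);

  assert (irq_disable[irq_nr].ndisabled > 0);
  irq_disable[irq_nr].ndisabled--;
  if (irq_disable[irq_nr].ndisabled == 0)
    unmask_irq (irq_nr);

  simple_unlock_irq (s, &irq_disable[irq_nr].lock);
}

The lock makes the counter update and the mask/unmask decision atomic with
respect to all cpus, not just against interrupts on the local cpu (and each
entry's lock of course needs a simple_lock_init() at initialization time).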
> ---
> i386/i386/irq.c | 26 ++++++++++++++++++++------
> 1 file changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/i386/i386/irq.c b/i386/i386/irq.c
> index 91318f67..d91b3856 100644
> --- a/i386/i386/irq.c
> +++ b/i386/i386/irq.c
> @@ -41,11 +41,19 @@ __disable_irq (irq_t irq_nr)
> {
> assert (irq_nr < NINTR);
>
> + /* The spl wrapping protects the same cpu
> + * from being interrupted while updating this */
> spl_t s = splhigh();
> - ndisabled_irq[irq_nr]++;
> +
> + /* masking the irq unconditionally prevents
> + * other cpus being interrupted on this irq */
> + mask_irq (irq_nr);
There is a window here where __enable_irq() can unmask the interrupt
before we increment ndisabled_irq.
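For instance:

  cpu A: __disable_irq               cpu B: __enable_irq
  mask_irq (irq_nr)
                                     ndisabled_irq 1 -> 0
                                     unmask_irq (irq_nr)
  ndisabled_irq 0 -> 1

and the irq ends up unmasked even though a disable is still pending.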
> + /* we can atomically increment the nesting counter now */
> + __atomic_add_fetch(&ndisabled_irq[irq_nr], 1, __ATOMIC_RELAXED);
> assert (ndisabled_irq[irq_nr] > 0);
> - if (ndisabled_irq[irq_nr] == 1)
> - mask_irq (irq_nr);
> +
> splx(s);
> }
>
> @@ -54,11 +62,17 @@ __enable_irq (irq_t irq_nr)
> {
> assert (irq_nr < NINTR);
>
> + /* The spl wrapping protects the same cpu
> + * from being interrupted while updating this */
> spl_t s = splhigh();
> - assert (ndisabled_irq[irq_nr] > 0);
> - ndisabled_irq[irq_nr]--;
> - if (ndisabled_irq[irq_nr] == 0)
> +
> + /* This could be racy if the irq was already unmasked,
> + * and the irq is not guaranteed to be masked at this point if more
> + * than one device has called us almost simultaneously on this irq.
> + * Therefore it is only safe to decrement the counter here atomically
> + * inside the check */
> + if (__atomic_add_fetch(&ndisabled_irq[irq_nr], -1, __ATOMIC_RELAXED) == 0)
There is a window here where __disable_irq() can mask the interrupt, but
then we unmask it.
> unmask_irq (irq_nr);
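For instance:

  cpu A: __enable_irq                cpu B: __disable_irq
  ndisabled_irq 1 -> 0
                                     mask_irq (irq_nr)
                                     ndisabled_irq 0 -> 1
  unmask_irq (irq_nr)

and again the irq ends up unmasked while ndisabled_irq says it should stay
masked.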
Samuel