On Wed, 2018-03-21 at 02:22:28 UTC, Nicholas Piggin wrote:
> force_external_irq_replay() can be called in the do_IRQ path with
> interrupts hard enabled and soft disabled if may_hard_irq_enable() set
> MSR[EE]=1. It updates local_paca->irq_happened with a load, modify,
> store sequence. If a maskable interrupt hits during this sequence, it
> will go to the masked handler to be marked pending in irq_happened.
> This update will be lost when the interrupt returns and the store
> instruction executes. This can result in unpredictable latencies,
> timeouts, lockups, etc.
>
> Fix this by ensuring hard interrupts are disabled before modifying
> irq_happened.
>
> This could cause any maskable asynchronous interrupt to get lost, but
> it was noticed on a P9 SMP system doing RDMA NVMe target over 100GbE,
> i.e. with a very high external interrupt rate and a high IPI rate. The
> hang was bisected down to enabling doorbell interrupts for IPIs. These
> provided an interrupt type that could run at high rates in the do_IRQ
> path, stressing the race.
>
> Fixes: 1d607bb3bd ("powerpc/irq: Add mechanism to force a replay of interrupts")
> Reported-by: Carol L. Soto <cls...@us.ibm.com>
> Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> Signed-off-by: Nicholas Piggin <npig...@gmail.com>
Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/ff6781fd1bb404d8a551c02c35c70c

cheers