On Tue, Jan 07, 2025 at 02:51:06PM +0100, Sebastian Andrzej Siewior wrote:
> On 2024-12-04 08:42:23 [-0300], Wander Lairson Costa wrote:
> > This is the second attempt at fixing the behavior of igb_msix_other()
> > for PREEMPT_RT. The previous attempt [1] was reverted [2] following
> > concerns raised by Sebastian [3].
> > 
> > The initial approach proposed converting vfs_lock to a raw_spinlock,
> > a minor change intended to make it safe. However, it became evident
> > that igb_rcv_msg_from_vf() invokes kcalloc with GFP_ATOMIC,
> > which is unsafe in interrupt context on PREEMPT_RT systems.
> > 
> > To address this, the solution involves splitting igb_msg_task()
> > into two parts:
> > 
> >     * One part invoked from the IRQ context.
> >     * Another part called from the threaded interrupt handler.
> > 
> > To accommodate this, vfs_lock has been restructured into a double
> > lock: a spinlock_t and a raw_spinlock_t. In the revised design:
> > 
> >     * igb_disable_sriov() locks both spinlocks.
> >     * Each part of igb_msg_task() locks the appropriate spinlock for
> >     its execution context.
> 
> - Is this limited to PREEMPT_RT or does it also occur on PREEMPT systems
>   with threadirqs? And if this is PREEMPT_RT only, why?

PREEMPT systems configured with threadirqs should be affected as well,
although I have not tested that configuration. Honestly, until now I was not
aware that a non-PREEMPT_RT kernel could run with threaded IRQs by default.

> 
> - What causes the failure? I see you reworked into two parts to behave
>   similar to what happens without threaded interrupts. There is still no
>   explanation for it. Is there a timing limit or was there another
>   register operation which removed the mailbox message?
> 

I explained the root cause of the issue in the last commit of the series.
Maybe I should have added the explanation to the cover letter as well.
Anyway, here is a partial verbatim copy of it:

"During testing of SR-IOV, Red Hat QE encountered an issue where the
ip link up command intermittently fails for the igbvf interfaces when
using the PREEMPT_RT variant. Investigation revealed that
e1000_write_posted_mbx returns an error due to the lack of an ACK
from e1000_poll_for_ack.

The underlying issue arises from the fact that IRQs are threaded by
default under PREEMPT_RT. While the exact hardware details are not
available, it appears that the IRQ handled by igb_msix_other must
be processed before e1000_poll_for_ack times out. However,
e1000_write_posted_mbx is called with preemption disabled, leading
to a scenario where the IRQ is serviced only after the failure of
e1000_write_posted_mbx."
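A deliberately simplified, single-threaded model of that hypothesis. Like the
commit message, it assumes the ACK only shows up after the "other" interrupt
has been serviced. poll_for_ack_model() and write_posted_model() are
hypothetical stand-ins, not the e1000 mailbox code, and the poll count is
arbitrary.

```c
/*
 * Hypothetical model, not driver code. Assumption: the ACK appears only
 * after the pending "other" interrupt has been serviced. A hard-IRQ handler
 * can run in the middle of the poll loop; a threaded handler cannot while
 * the poller runs with preemption disabled on the same CPU.
 */
#include <assert.h>
#include <stdbool.h>

struct mbx_model {
	bool irq_pending; /* interrupt raised, not yet serviced */
	bool ack;         /* becomes true once the interrupt is serviced */
};

static void service_other_irq(struct mbx_model *m)
{
	if (m->irq_pending) {
		m->irq_pending = false;
		m->ack = true;
	}
}

/* Poll with a bounded number of tries: 0 on ACK, -1 on timeout. */
static int poll_for_ack_model(struct mbx_model *m, int timeout,
			      bool hardirq_handler)
{
	assert(timeout > 0);
	while (timeout-- > 0) {
		/* A hard IRQ may interrupt the loop and service the event. */
		if (hardirq_handler)
			service_other_irq(m);
		if (m->ack)
			return 0;
		/* the real code delays between polls here */
	}
	return -1;
}

/* "Write" a message (raising the interrupt) and wait for the ACK. */
static int write_posted_model(bool hardirq_handler)
{
	struct mbx_model m = { .irq_pending = true, .ack = false };
	int ret = poll_for_ack_model(&m, 10, hardirq_handler);

	/* Preemption is enabled again: only now can an IRQ thread run. */
	if (!hardirq_handler)
		service_other_irq(&m);
	return ret;
}
```

With a hard-IRQ handler, write_posted_model(true) returns 0. With a threaded
one, write_posted_model(false) returns -1 and the interrupt is serviced only
after the poll has already failed, which is the ordering described above.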

The call chain from igb_msg_task():

igb_msg_task
        igb_rcv_msg_from_vf
                igb_set_vf_multicasts
                        igb_set_rx_mode
                                igb_write_mc_addr_list
                                        kmalloc

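This chain cannot run in hard interrupt context under PREEMPT_RT, because
there spinlock_t is a sleeping lock and the allocator takes such locks even
for GFP_ATOMIC allocations. So this part of the interrupt handler is
deferred to the threaded IRQ handler.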
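To make the locking rule concrete, here is a rough userspace model of the
split handler and of the double vfs_lock from the cover letter. It is not
driver code: pthread_spinlock_t stands in for raw_spinlock_t, a pthread mutex
for spinlock_t (which sleeps on PREEMPT_RT), and struct adapter_model and the
*_model/*_part functions are hypothetical names.

```c
/* Hypothetical userspace model of the split; no igb names are used. */
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct adapter_model {
	pthread_spinlock_t vfs_raw_lock; /* stands in for raw_spinlock_t */
	pthread_mutex_t vfs_lock;        /* stands in for spinlock_t */
	int vfs_allocated_count;         /* written under both locks */
	int msgs_handled;                /* touched only under vfs_lock */
};

static void model_init(struct adapter_model *a, int vfs)
{
	assert(vfs >= 0);
	pthread_spin_init(&a->vfs_raw_lock, PTHREAD_PROCESS_PRIVATE);
	pthread_mutex_init(&a->vfs_lock, NULL);
	a->vfs_allocated_count = vfs;
	a->msgs_handled = 0;
}

/*
 * "Hard IRQ" part: no allocation, only the raw lock. Returns true when the
 * threaded part must run, mirroring a primary handler returning
 * IRQ_WAKE_THREAD.
 */
static bool msg_task_irq_part(struct adapter_model *a)
{
	bool wake;

	pthread_spin_lock(&a->vfs_raw_lock);
	/* placeholder for the time-critical work done in IRQ context */
	wake = a->vfs_allocated_count > 0;
	pthread_spin_unlock(&a->vfs_raw_lock);
	return wake;
}

/* "Threaded" part: takes the sleeping lock, so allocating is fine here. */
static void msg_task_thread_part(struct adapter_model *a)
{
	pthread_mutex_lock(&a->vfs_lock);
	if (a->vfs_allocated_count > 0) {
		void *buf = calloc(16, 6); /* e.g. a multicast address list */

		if (buf) {
			a->msgs_handled++;
			free(buf);
		}
	}
	pthread_mutex_unlock(&a->vfs_lock);
}

/* Writer: takes both locks, so neither part sees a half-done teardown. */
static void disable_sriov_model(struct adapter_model *a)
{
	pthread_mutex_lock(&a->vfs_lock);      /* sleeping lock outside */
	pthread_spin_lock(&a->vfs_raw_lock);   /* raw lock inside */
	a->vfs_allocated_count = 0;
	pthread_spin_unlock(&a->vfs_raw_lock);
	pthread_mutex_unlock(&a->vfs_lock);
}
```

A caller runs msg_task_irq_part() and, only if it returns true, hands off to
msg_task_thread_part(), the way the IRQ core wakes the handler thread. In
this model the writer nests the raw lock inside the sleeping one, since on
PREEMPT_RT a spinlock_t must not be acquired while a raw_spinlock_t is held.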

> > Cheers,
> > Wander
> 
> Sebastian
> 
