On Wed, Jan 8, 2025 at 7:25 AM Sebastian Andrzej Siewior <bige...@linutronix.de> wrote:
>
> On 2025-01-07 15:52:47 [-0300], Wander Lairson Costa wrote:
> > On Tue, Jan 07, 2025 at 02:51:06PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2024-12-04 08:42:23 [-0300], Wander Lairson Costa wrote:
> > > > This is the second attempt at fixing the behavior of igb_msix_other()
> > > > for PREEMPT_RT. The previous attempt [1] was reverted [2] following
> > > > concerns raised by Sebastian [3].
> > > >
> > > > The initial approach proposed converting vfs_lock to a raw_spinlock,
> > > > a minor change intended to make it safe. However, it became evident
> > > > that igb_rcv_msg_from_vf() invokes kcalloc with GFP_ATOMIC,
> > > > which is unsafe in interrupt context on PREEMPT_RT systems.
> > > >
> > > > To address this, the solution involves splitting igb_msg_task()
> > > > into two parts:
> > > >
> > > > * One part invoked from the IRQ context.
> > > > * Another part called from the threaded interrupt handler.
> > > >
> > > > To accommodate this, vfs_lock has been restructured into a double
> > > > lock: a spinlock_t and a raw_spinlock_t. In the revised design:
> > > >
> > > > * igb_disable_sriov() locks both spinlocks.
> > > > * Each part of igb_msg_task() locks the appropriate spinlock for
> > > >   its execution context.
> > >
> > > - Is this limited to PREEMPT_RT or does it also occur on PREEMPT systems
> > >   with threadirqs? And if this is PREEMPT_RT only, why?
> >
> > PREEMPT systems configured to use threadirqs should be affected as well,
> > although I have never tested that configuration. Honestly, until now I
> > wasn't aware of the possibility of a non-PREEMPT_RT kernel with threaded
> > IRQs by default.
>
> If the issue is indeed the use of threaded interrupts, then the fix
> should not be limited to PREEMPT_RT only.
>

Although I was not aware of this scenario, the patch should work for it as
well, since I am forcing that part to run in interrupt context. I will test
it to confirm.
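
For illustration, here is a rough sketch of the double-lock arrangement
described in the cover letter above. The struct, field, and function names
are invented for the example and are not taken from the actual patch; only
the nesting order and which half takes which lock matter here.

#include <linux/spinlock.h>

/*
 * Sketch only: names are illustrative, not the real igb definitions.
 * The raw_spinlock_t protects the work done in hard-IRQ context, the
 * spinlock_t protects the work deferred to the threaded handler.
 */
struct vfs_locks {
	raw_spinlock_t	irq_lock;	/* hard-IRQ part of igb_msg_task() */
	spinlock_t	task_lock;	/* threaded part of igb_msg_task() */
};

/*
 * Teardown must exclude both halves, so it takes both locks. On
 * PREEMPT_RT a raw_spinlock_t may nest inside a spinlock_t, but not
 * the other way around, hence this ordering.
 */
static void disable_sriov_sketch(struct vfs_locks *l)
{
	spin_lock(&l->task_lock);
	raw_spin_lock(&l->irq_lock);
	/* ... tear down VF state ... */
	raw_spin_unlock(&l->irq_lock);
	spin_unlock(&l->task_lock);
}

/* Hard-IRQ half: only work that is safe in real interrupt context. */
static void msg_task_irq_sketch(struct vfs_locks *l)
{
	raw_spin_lock(&l->irq_lock);
	/* ... no memory allocation here ... */
	raw_spin_unlock(&l->irq_lock);
}

/* Threaded half: runs in process context, so the allocation down in
 * igb_rcv_msg_from_vf() is harmless here even on PREEMPT_RT. */
static void msg_task_thread_sketch(struct vfs_locks *l)
{
	spin_lock(&l->task_lock);
	/* ... message handling that may allocate memory ... */
	spin_unlock(&l->task_lock);
}
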
> > > - What causes the failure? I see you reworked it into two parts to
> > >   behave similarly to what happens without threaded interrupts. There
> > >   is still no explanation for it. Is there a timing limit or was there
> > >   another register operation which removed the mailbox message?
> >
> > I explained the root cause of the issue in the last commit. Maybe I
> > should have added the explanation to the cover letter as well. Anyway,
> > here is a partial verbatim copy of it:
> >
> > "During testing of SR-IOV, Red Hat QE encountered an issue where the
> > ip link up command intermittently fails for the igbvf interfaces when
> > using the PREEMPT_RT variant. Investigation revealed that
> > e1000_write_posted_mbx returns an error due to the lack of an ACK
> > from e1000_poll_for_ack.
>
> Would that ACK have come if it had polled longer?
>

No, the interrupt wouldn't be serviced while polling.

> > The underlying issue arises from the fact that IRQs are threaded by
> > default under PREEMPT_RT. While the exact hardware details are not
> > available, it appears that the IRQ handled by igb_msix_other must
> > be processed before e1000_poll_for_ack times out. However,
> > e1000_write_posted_mbx is called with preemption disabled, leading
> > to a scenario where the IRQ is serviced only after the failure of
> > e1000_write_posted_mbx."
>
> Where is this disabled preemption coming from? This should be one of the
> ops.write_posted() calls, right? I've been looking around and don't see
> anything obvious.

I don't remember whether I found the answer by looking at the code or at
the ftrace flags. I am currently on sick leave with covid; I can check it
when I come back.

> Couldn't you wait for an event instead of polling?
>
> > The call chain from igb_msg_task():
> >
> > igb_msg_task
> >   igb_rcv_msg_from_vf
> >     igb_set_vf_multicasts
> >       igb_set_rx_mode
> >         igb_write_mc_addr_list
> >           kmalloc
> >
> > This cannot happen from interrupt context under PREEMPT_RT, so this part
> > of the interrupt handler is deferred to a threaded IRQ handler.
> >
> > Cheers,
> > Wander
>
> Sebastian
>
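
For readers unfamiliar with the hard-IRQ/threaded split discussed above,
here is a minimal sketch of how such a split is typically wired up with
request_threaded_irq(). The handler and device names are invented and this
is not necessarily how the actual patch registers igb_msix_other(); it only
illustrates deferring the allocating path to a threaded handler.

#include <linux/interrupt.h>

struct demo_dev {
	int irq;
};

/* Hard-IRQ half: must not allocate memory or take sleeping locks. */
static irqreturn_t demo_msix_other(int irq, void *data)
{
	/* ... ack the hardware, note that a VF mailbox message is pending ... */

	/* Hand the rest over to the threaded handler below. */
	return IRQ_WAKE_THREAD;
}

/* Threaded half: runs in process context, so kcalloc() and spinlock_t
 * are fine here even on PREEMPT_RT. */
static irqreturn_t demo_msix_other_thread(int irq, void *data)
{
	/* ... process the VF mailbox message and send the ACK ... */
	return IRQ_HANDLED;
}

static int demo_request_irq(struct demo_dev *dev)
{
	return request_threaded_irq(dev->irq, demo_msix_other,
				    demo_msix_other_thread, 0,
				    "demo-msix-other", dev);
}
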