On 06/08/2020 01.34, Stephen Hemminger wrote: > On Wed, 5 Aug 2020 16:25:23 +0200 > Rasmus Villemoes <rasmus.villem...@prevas.dk> wrote: > >> Hi, >> >> We're seeing occasional lockups on an embedded board (running an -rt >> kernel), which I believe I've tracked down to the >> >> if (!rtnl_trylock()) >> return restart_syscall(); >> >> in net/bridge/br_sysfs_br.c. The problem is that some SCHED_FIFO task >> writes a "1" to the /sys/class/net/foo/bridge/flush file, while some >> lower-priority SCHED_FIFO task happens to hold rtnl_lock(). When that >> happens, the higher-priority task is stuck in an eternal ERESTARTNOINTR >> loop, and the lower-priority task never gets runtime and thus cannot >> release the lock. >> >> I've written a script that rather quickly reproduces this both on our >> target and my desktop machine (pinning everything on one CPU to emulate >> the uni-processor board), see below. Also, with this hacky patch > > There is a reason for the trylock, it works around a priority inversion.
Can you elaborate? It seems to me that it _causes_ a priority inversion since priority inheritance doesn't have a chance to kick in. > The real problem is expecting a SCHED_FIFO task to be safe with this > kind of network operation. Maybe. But ignoring the SCHED_FIFO/rt-prio stuff, it also seems a bit odd to do what is essentially a busy-loop - yes, the restart_syscall() allows signals to be delivered (including allowing the process to get killed), but in the absence of any signals, the pattern essentially boils down to while (!rtnl_trylock()) ; So even for regular tasks, this seems to needlessly hog the cpu. I tried this diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c index 0318a69888d4..e40e264f9b16 100644 --- a/net/bridge/br_sysfs_br.c +++ b/net/bridge/br_sysfs_br.c @@ -44,8 +44,8 @@ static ssize_t store_bridge_parm(struct device *d, if (endp == buf) return -EINVAL; - if (!rtnl_trylock()) - return restart_syscall(); + if (rtnl_lock_interruptible()) + return -ERESTARTNOINTR; err = (*set)(br, val); if (!err) with the obvious definition of rtnl_lock_interruptible(), and it makes the problem go away. Isn't it better to sleep waiting for the lock (and with -rt, giving proper priority boost) or a signal to arrive rather than busy-looping back and forth between syscall entry point and the trylock()? I see quite a lot of if (mutex_lock_interruptible(...)) return -ERESTARTSYS; but for the rtnl_mutex, I see the trylock...restart_syscall pattern being used in a couple of places. So there must be something special about the rtnl_mutex? Thanks, Rasmus