On Fri, May 13, 2022 at 6:43 AM Fotis Panagiotopoulos <f.j.pa...@gmail.com> wrote:
>
> Hello!
>
> I am facing various issues with networking in NuttX, including a nasty
> deadlock.
>
> I tried to track down this deadlock, and it seems that it is related to
> g_netlock.
> I am not sure yet what is the sequence that leads to this.
>
> I have CONFIG_PRIORITY_INHERITANCE enabled.
> However, I see that SEM_INITIALIZER() does not set the SEM_PRIO_INHERIT
> flag, and thus g_netlock does not have priority inheritance enabled.
>
> I tried to set SEM_PRIO_INHERIT in SEM_INITIALIZER(), and networking is
> much more stable now.
> I never saw this deadlock happening again.
>
> Shouldn't this flag be enabled for this semaphore?
Hmm. This is a tricky one.

When built with CONFIG_PRIORITY_INHERITANCE, ALL semaphores have priority
inheritance enabled by default, unless it is turned off for a specific
semaphore by calling sem_setprotocol(SEM_PRIO_NONE). If you look at
nxsem_set_protocol() (there are two versions; see the one under
#ifdef CONFIG_PRIORITY_INHERITANCE), you'll see that "setting"
SEM_PRIO_INHERIT actually just clears the disable bit
(sem->flags &= ~PRIOINHERIT_FLAGS_DISABLE). That is, priority inheritance
is opt-out.

As explained in [1], sem_setprotocol(SEM_PRIO_NONE) must be called for
*signaling* semaphores, i.e., Thread A calls sem_init() ... sem_wait()
and blocks until Thread B calls sem_post().

But this semaphore is called g_netlock, and the functions that operate on
it are net_lock(), net_trylock(), net_unlock(), etc. This suggests it is
used for locking (mutual exclusion), not signaling; i.e., the same thread
that calls sem_wait() will later call sem_post(). If that is the case, we
should *not* call sem_setprotocol(SEM_PRIO_NONE), so that the priority
inheritance mechanism can boost the holder's priority when appropriate:
when a higher-priority thread blocks in net_lock(), the thread currently
holding g_netlock temporarily runs at that higher priority until it
releases the lock.

I wonder whether the deadlock is actually caused by something else, and
is merely being hidden by disabling priority inheritance for this
semaphore. Hopefully someone with more knowledge of the network stack can
chime in... In the meantime, I think you should file a bug in the bug
tracker.

[1] https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance

Nathan
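For illustration, here is a minimal host-side sketch of the two semaphore uses contrasted in the reply: a lock that the same thread waits on and posts (the g_netlock / net_lock() / net_unlock() pattern) versus a signaling semaphore that Thread A waits on and Thread B posts. It is written against plain POSIX threads and semaphores so it runs outside NuttX. The names g_lock, g_signal, locking_demo() and signaling_demo() are hypothetical stand-ins, not NuttX code. The sem_setprotocol(SEM_PRIO_NONE) call from [1] is only compiled on a NuttX build, and the `__NuttX__` guard and <nuttx/config.h> include are assumptions about that build environment. The sketch shows which pattern is which; it does not exercise priority boosting itself.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#ifdef __NuttX__
#  include <nuttx/config.h>  /* assumed: defines CONFIG_PRIORITY_INHERITANCE */
#endif

#define LOOPS 1000

/* Pattern 1: semaphore used as a LOCK (hypothetical stand-in for g_netlock).
 * The same thread calls sem_wait() and later sem_post(), so there is a
 * well-defined holder whose priority can be boosted; leave priority
 * inheritance at its default (enabled) for this kind of semaphore. */
static sem_t g_lock;
static int g_counter;

static void *lock_worker(void *arg)
{
  (void)arg;
  for (int i = 0; i < LOOPS; i++)
    {
      sem_wait(&g_lock);   /* acquire, like net_lock() */
      g_counter++;         /* critical section protected by the lock */
      sem_post(&g_lock);   /* release from the SAME thread, like net_unlock() */
    }
  return NULL;
}

int locking_demo(void)
{
  pthread_t t1, t2;
  int rc;

  g_counter = 0;
  rc = sem_init(&g_lock, 0, 1);  /* count 1: the lock starts out free */
  assert(rc == 0);
  rc = pthread_create(&t1, NULL, lock_worker, NULL);
  assert(rc == 0);
  rc = pthread_create(&t2, NULL, lock_worker, NULL);
  assert(rc == 0);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  sem_destroy(&g_lock);
  (void)rc;                      /* silence unused warning under NDEBUG */
  return g_counter;
}

/* Pattern 2: SIGNALING semaphore. Thread A blocks in sem_wait() until
 * Thread B calls sem_post(); nobody "holds" it, which is the case where
 * [1] says priority inheritance must be turned off. */
static sem_t g_signal;
static int g_result;

static void *thread_b(void *arg)
{
  (void)arg;
  g_result = 42;           /* produce some data ... */
  sem_post(&g_signal);     /* ... then wake Thread A */
  return NULL;
}

int signaling_demo(void)
{
  pthread_t tb;
  int rc;

  g_result = 0;
  rc = sem_init(&g_signal, 0, 0);  /* count 0: sem_wait() blocks until a post */
  assert(rc == 0);
#ifdef CONFIG_PRIORITY_INHERITANCE
  sem_setprotocol(&g_signal, SEM_PRIO_NONE);  /* NuttX only: opt out */
#endif
  rc = pthread_create(&tb, NULL, thread_b, NULL);
  assert(rc == 0);
  sem_wait(&g_signal);     /* Thread A (the caller) blocks here */
  pthread_join(tb, NULL);
  sem_destroy(&g_signal);
  (void)rc;
  return g_result;
}
```

On a host, locking_demo() returns 2 * LOOPS (2000) and signaling_demo() returns 42. The only difference in setup is the opt-out call: the lock keeps the default protocol, while the signaling semaphore opts out as [1] recommends.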