Hi Nathan, you are right, changing this flag disables the priority inheritance, in contrast to what I thought initially.
Thus, the bug manifests itself only when the semaphore has priority inheritance enabled. I would suspect an issue with the priority inheritance mechanism, and not network itself. I will spend some more time investigating this, and report here. Στις Παρ 13 Μαΐ 2022 στις 6:27 μ.μ., ο/η Nathan Hartman < hartman.nat...@gmail.com> έγραψε: > On Fri, May 13, 2022 at 6:43 AM Fotis Panagiotopoulos > <f.j.pa...@gmail.com> wrote: > > > > Hello! > > > > I am facing various issues with networking in NuttX, including a nasty > > deadlock. > > > > I tried to track down this deadlock, and it seems that it is related to > > g_netlock. > > I am not sure yet what is the sequence that leads to this. > > > > I have CONFIG_PRIORITY_INHERITANCE enabled. > > However, I see that SEM_INITIALIZER() does not set the SEM_PRIO_INHERIT > > flag, > > and thus g_netlock does not have priority inheritance enabled. > > > > I tried to set SEM_PRIO_INHERIT in SEM_INITIALIZER(), and networking is > > much more stable now. > > I never saw this deadlock happening again. > > > > Shouldn't this flag be enabled for this semaphore? > > > Hmm. This is a tricky one. > > When built with CONFIG_PRIORITY_INHERITANCE, by default ALL semaphores > have priority inheritance enabled unless it is turned off for a > specific semaphore by calling sem_setprotocol(SEM_PRIO_NONE). > > If you look at nxsem_set_protocol() (there are two versions; see the > one in #ifdef CONFIG_PRIORITY_INHERITANCE), you'll see that "setting" > SEM_PRIO_INHERIT actually unsets sem->flags &= > ~PRIOINHERIT_FLAGS_DISABLE. That is, priority inheritance is opt-out. > > As explained in [1], sem_setprotocol(SEM_PRIO_NONE) must be called for > *signaling* semaphores, i.e., Thread A calls sem_init() ... sem_wait() > and blocks until Thread B calls sem_post(). > > But this semaphore is called g_netlock and the functions that operate > on it are net_lock(), net_trylock(), net_unlock(), etc. This suggests > it is used for locking/mutual exclusion, not signaling; i.e., the same > thread that calls sem_wait() will call sem_post(). If that is the > case, we should *not* call sem_setprotocol(SEM_PRIO_NONE), so that the > priority inheritance mechanism can boost the holder's priority when > appropriate. > > I am wondering if the deadlock might be caused by something else, > which is hidden by disabling priority inheritance for this semaphore? > > Hopefully someone with more knowledge of the network stack can chime in... > > In the meantime, I think you should file a bug in the bug tracker. > > [1] > https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance > > Nathan >