Hi Nathan,

you are right, changing this flag disables the priority inheritance, in
contrast to what I thought initially.

Thus, the bug manifests itself only when the semaphore has priority
inheritance enabled.

I would suspect an issue with the priority inheritance mechanism, and not
network itself.
I will spend some more time investigating this, and report here.


Στις Παρ 13 Μαΐ 2022 στις 6:27 μ.μ., ο/η Nathan Hartman <
hartman.nat...@gmail.com> έγραψε:

> On Fri, May 13, 2022 at 6:43 AM Fotis Panagiotopoulos
> <f.j.pa...@gmail.com> wrote:
> >
> > Hello!
> >
> > I am facing various issues with networking in NuttX, including a nasty
> > deadlock.
> >
> > I tried to track down this deadlock, and it seems that it is related to
> > g_netlock.
> > I am not sure yet what is the sequence that leads to this.
> >
> > I have CONFIG_PRIORITY_INHERITANCE enabled.
> > However, I see that SEM_INITIALIZER() does not set the SEM_PRIO_INHERIT
> > flag,
> > and thus g_netlock does not have priority inheritance enabled.
> >
> > I tried to set SEM_PRIO_INHERIT in SEM_INITIALIZER(), and networking is
> > much more stable now.
> > I never saw this deadlock happening again.
> >
> > Shouldn't this flag be enabled for this semaphore?
>
>
> Hmm. This is a tricky one.
>
> When built with CONFIG_PRIORITY_INHERITANCE, by default ALL semaphores
> have priority inheritance enabled unless it is turned off for a
> specific semaphore by calling sem_setprotocol(SEM_PRIO_NONE).
>
> If you look at nxsem_set_protocol() (there are two versions; see the
> one in #ifdef CONFIG_PRIORITY_INHERITANCE), you'll see that "setting"
> SEM_PRIO_INHERIT actually unsets sem->flags &=
> ~PRIOINHERIT_FLAGS_DISABLE. That is, priority inheritance is opt-out.
>
> As explained in [1], sem_setprotocol(SEM_PRIO_NONE) must be called for
> *signaling* semaphores, i.e., Thread A calls sem_init() ... sem_wait()
> and blocks until Thread B calls sem_post().
>
> But this semaphore is called g_netlock and the functions that operate
> on it are net_lock(), net_trylock(), net_unlock(), etc. This suggests
> it is used for locking/mutual exclusion, not signaling; i.e., the same
> thread that calls sem_wait() will call sem_post(). If that is the
> case, we should *not* call sem_setprotocol(SEM_PRIO_NONE), so that the
> priority inheritance mechanism can boost the holder's priority when
> appropriate.
>
> I am wondering if the deadlock might be caused by something else,
> which is hidden by disabling priority inheritance for this semaphore?
>
> Hopefully someone with more knowledge of the network stack can chime in...
>
> In the meantime, I think you should file a bug in the bug tracker.
>
> [1]
> https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
>
> Nathan
>

Reply via email to