Still problems with ESP32 compilation using nuttx-export

2023-03-31 Thread Roberto Bucher

Hi

I still have problems compiling a NuttX project using the exported
libraries and the Makefile usually used to compile other targets under
pysimCoder.


I've run the .ld files through the preprocessor as .ld.tmp and then I
changed the names back to .ld so that my Makefile can find them. I don't
know if this is correct.


%.ld.tmp: %.ld
    $(CPP) -isystem $(NUTTX_EXPORT)/include -D__NuttX__ -DNDEBUG -D__KERNEL__ \
    -I$(NUTTX_EXPORT)/arch/chip \
    -I $(NUTTX_EXPORT)/arch/os/sched -o $@ $<

mv .ld.tmp .ld

After this I launch my makefile again and I obtain:

 xtensa-esp32-elf-ld -nostdlib --gc-sections --cref -Map=/home/bucher/sviluppo/NUTTX/nuttx/nuttx.map -L /home/bucher/CACSD/pysimCoder/CodeGen/nuttx/nuttx-export/libs --entry=__start  -o ../test.elf \
  -r -e main -T /home/bucher/CACSD/pysimCoder/CodeGen/nuttx/nuttx-export/scripts/gnu-elf.ld \
  -Map test.elf.map \
  nuttx_main.o test.o /home/bucher/CACSD/pysimCoder/CodeGen/nuttx/lib/libpyblk.a --start-group /home/bucher/sviluppo/GITHUB/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/11.2.0/libgcc.a --end-group

### Created ELF loadable file: test.elf

The ELF file seems to be generated correctly, but after this I get a
lot of errors:


xtensa-esp32-elf-ld: warning: cannot find entry symbol __start; not setting start address


and a lot of undefined symbols, for example:

xtensa-esp32-elf-ld -nostdlib --gc-sections --cref -Map=/home/bucher/sviluppo/NUTTX/nuttx/nuttx.map -L /home/bucher/CACSD/pysimCoder/CodeGen/nuttx/nuttx-export/libs --entry=__start  -T /home/bucher/CACSD/pysimCoder/CodeGen/nuttx/nuttx-export/scripts/esp32_rom.ld -T /home/bucher/CACSD/pysimCoder/CodeGen/nuttx/nuttx-export/scripts/flat_memory.ld -T /home/bucher/CACSD/pysimCoder/CodeGen/nuttx/nuttx-export/scripts/legacy_sections.ld \
  -o ../test  \
  nuttx_main.o test.o  nuttx_main-builtintab.o /home/bucher/CACSD/pysimCoder/CodeGen/nuttx/lib/libpyblk.a --start-group -lsched -ldrivers -lboards -lc -lmm -larch -lm -lxx -lapps -lnet -lfs -lbinfmt -lwireless -lboard -lboard /home/bucher/sviluppo/GITHUB/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/11.2.0/libgcc.a --end-group
xtensa-esp32-elf-ld: warning: cannot find entry symbol __start; not setting start address


xtensa-esp32-elf-ld: /home/bucher/CACSD/pysimCoder/CodeGen/nuttx/nuttx-export/libs/libarch.a(esp32_wifi_adapter.o):(.literal.esp_evt_work_cb+0x14): undefined reference to `esp_wifi_set_ps'

...

Any ideas?  Is there a Makefile for ESP32 that uses the
"nuttx-export" files?


Thanks in advance

Roberto






Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Petro Karashchenko
Hello,

Migration to nxmutex is quite a big effort and unfortunately I haven't
had much time recently to dive deep into this. In general I support the
initiative and do not see a use case for priority inheritance for regular
semaphores, so I think we should clean up the priority inheritance code for
the regular signalling semaphores and introduce a new kernel object (mutex)
instead. This is surely valid for the kernel, but not for user space,
which already has pthread_mutex with a priority inheritance option, so I do
not see that anything is needed for user space.

I see a few areas where problems may appear and just want to make sure
that the following areas are analyzed and tested with each migration step:
1. Cancellation points and FLAT mode. Since we have only one copy of
LIBC here, we need to decide which object to use so that all interfaces
that are cancellation points will still work as before.
2. If we start to base pthread objects on nxmutex then we need to make sure
that cancellation point operation is not broken. For example, the rw_lock
APIs are cancellation points.

Optimization of the priority inheritance operation is welcome.

Best regards,
Petro

On Fri, Mar 31, 2023 at 07:10 Tomek CEDRO wrote:

> On Fri, Mar 31, 2023 at 12:23 AM Gregory Nutt  wrote:
> >
> >  > In his Confluence paper on "Signaling Semaphores and Priority
> > Inheritance”, Brennan Ashton’s analysis is both thorough and accurate;
> ...
> >
> > Minor fix.  I wrote the paper, Brennan converted the Confluence page
> > from an older DocuWiki page
>
> Respect :-)
>
>
> >  > The solution Brennan suggests is to initialize semaphores used as
> > signaling events as follows:
> >  > sem_init(&sem, 0, 0);
> >  > sem_setprotocol(&sem, SEM_PRIO_NONE);
> >  >
> >  > this is, of course, correct, but retains the dual purpose of
> > sem_wait() and sem_post().  I believe this can be confusing and will
> > continue to be a source of subtle errors.  I suggest going a step
> > further and isolate the two use cases.  Let the current sem_init,
> > sem_wait, sem_post be used only for the resource locking use case.
> >  >
> >  > For the signaling use case, create a new API for event signaling
> > within the kernel: nxev_init, nxev_wait, nxev_post where: nxev_init is
> > simply:
> >  > sem_init(&nxev, 0, 0);
> >  > sem_setprotocol(&nxev, SEM_PRIO_NONE);
> >  >
> >  > and:
> >  >  #define nxev_wait sem_wait
> >  >  #define nxev_post sem_post
> >  >
> >  > In the case where PRIORITY_INHERITANCE is not configured,
> > sem_setprotocol() does nothing and the nxev_*() API is still used for
> > event notification.
> >  >
> >  > This may seem a trivial change, but having specific API function
> > names for the two specific use cases would, I believe, all but eliminate
> > future confusion; especially given that most people look to existing
> > drivers to use as a template.  Finally, enabling or disabling
> > PRIORITY_INHERITANCE would not introduce the subtle error Brennan
> > documented.
> >
> > Your suggestion would clarify the usage.
> >
> > I was thinking out a conceptually simple solution that should also
> > resolve the risk in usage:  Just change the default state to
> > SEM_PRIO_NONE for all semaphores.  That would make the default protocol
> > for semaphores be no priority inheritance.
> >
> > This would be a lot of work, however.  All occurrences of sem_init()
> > would have to be examined:
> >
> >  For the signaling use case, you would do nothing.  We would have to
> > remove all sem_setprotocol(&nxev, SEM_PRIO_NONE);
> >  For the resource locking use case, you would have to add
> > sem_setprotocol(&nxev, SEM_PRIO_INHERIT); assuming priority inheritance
> > is enabled.
> >
> > This eliminates the risk of inappropriately using priority inheritance on
> > a locking semaphore.  The consequences of doing that are very bad:
> >
> https://nuttx.yahoogroups.narkive.com/3hggphCi/problem-related-semaphore-and-priority-inheritance
> >
> > Then the only error that the user can make is to fail to select priority
> > inheritance when it would do good.  That is a lesser error and, as you
> > note, usually OK.
>
> +1 :-)
>
> --
> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>
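
The event API proposed in the quoted message can be sketched in a few lines of C. This is only an illustration of the idea, not code from the NuttX tree: the nxev_t type name is hypothetical, and the host-side stand-ins for SEM_PRIO_NONE and sem_setprotocol() are assumptions that exist only so the sketch also builds off-target, where those NuttX definitions are not available.

```c
#include <semaphore.h>

/* Assumption: when built off-target (for example on a Linux host),
 * SEM_PRIO_NONE and sem_setprotocol() do not exist, so no-op stand-ins
 * are provided here.  On NuttX the definitions from the OS headers are
 * expected to be used instead.
 */

#ifndef SEM_PRIO_NONE
#  define SEM_PRIO_NONE 0

static inline int sem_setprotocol(sem_t *sem, int protocol)
{
  (void)sem;
  (void)protocol;
  return 0;
}
#endif

/* Hypothetical event type: a semaphore that is used only for signaling */

typedef sem_t nxev_t;

/* Initialize an event: the count starts at 0 and priority inheritance
 * is switched off for this semaphore.
 */

static inline int nxev_init(nxev_t *ev)
{
  int ret = sem_init(ev, 0, 0);
  if (ret == 0)
    {
      ret = sem_setprotocol(ev, SEM_PRIO_NONE);
    }

  return ret;
}

/* Waiting and posting map directly onto the semaphore calls */

#define nxev_wait sem_wait
#define nxev_post sem_post
```

A driver written against such an API would call nxev_init() once, nxev_wait() in the waiting task and nxev_post() from the signaling side, so the intent (event, not lock) is visible from the names alone.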


Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Gregory Nutt




> Migration to nxmutex is quite a big effort and unfortunately I haven't
> had much time recently to dive deep into this. In general I support the
> initiative and do not see a use case for priority inheritance for regular
> semaphores, so I think we should clean up the priority inheritance code for
> the regular signalling semaphores and introduce a new kernel object (mutex)
> instead. This is surely valid for the kernel, but not for user space,
> which already has pthread_mutex with a priority inheritance option, so I do
> not see that anything is needed for user space.


Not all locking is binary.  There are cases in the OS where there are
multiple instances of a resource that are protected with a counting
semaphore.  If there are N things available, N attempts will return a
thing, but the N+1th attempt blocks.  If the requester of the N+1th
thing is high priority then priority inversion can occur.
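
The N versus N+1 behavior can be seen with a plain POSIX counting semaphore. The sketch below is illustrative only: NTHINGS and take_things() are hypothetical names, and sem_trywait() is used instead of sem_wait() so that the attempt that would block simply fails and the example can run in a single thread.

```c
#include <semaphore.h>

#define NTHINGS 3  /* Hypothetical number of identical resources */

/* Try to take up to 'attempts' things without blocking.  Returns the
 * number of attempts that succeeded.  The first attempt that fails here
 * is the one that would have blocked in a real sem_wait() call.
 */

static int take_things(sem_t *pool, int attempts)
{
  int taken = 0;

  while (taken < attempts && sem_trywait(pool) == 0)
    {
      taken++;
    }

  return taken;
}
```

With the semaphore initialized to NTHINGS, asking for NTHINGS + 1 things yields only NTHINGS. In a real system the extra requester blocks until one of the holders gives a thing back.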


This is a large change and can affect many people.  This appears to be 
controversial.  Controversial changes require a vote of the PMC to
continue.  Different rules for "Votes on Code Modification" appear 
here:  https://www.apache.org/foundation/voting.html


Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Petro Karashchenko
Yes and No. For counting semaphores we have multiple holders, for example
with priorities 10, 50 and 90. Now a task with priority 100 comes and wants
to take the semaphore. The priority of which holder should be increased? The
lowest or the highest holder? From a real-time point of view it should be
90 boosted to 100, but is the current implementation doing that?

Let's see how other RTOSes are doing that. I do not think that we are
the first to meet this design question.

Best regards,
Petro

On Fri, Mar 31, 2023 at 16:04 Gregory Nutt wrote:

>
> > Migration to nxmutex is quite a big effort and unfortunately recently I
> > didn't have much time to deep dive into this. In general I support an
> > initiative and do not see a use case for priority inheritance for regular
> > semaphores, so I think we should clean-up priority inheritance code for
> the
> > regular signalling semaphores and introduce a new kernel object (mutex)
> > instead. This is surely valid for the kernel, but not for the user space
> > that already has pthread_mutex with priority inheritance option, so I do
> > not see anything is needed for user space.
>
> Not all locking is binary.  The are cases in the OS where there a
> multiple instances of a resource that are protected with a counting
> semaphore.  If there are N things available, N attempts will return a
> thing, but the N+1th attempt blocks.  If the requester of the N+1th
> thing is high priority then priority inversion can occur.
>
> This is a large change and can affect many people.  This appears to be
> controversial.  Controversial change require a vote of the PMC to
> continue.  Different rules for "Votes on Code Modification" appear
> here:  https://www.apache.org/foundation/voting.html
>


Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Gregory Nutt




> Yes and No. For counting semaphores we have multiple holders, for example
> with priorities 10, 50 and 90. Now a task with priority 100 comes and wants
> to take the semaphore. The priority of which holder should be increased? The
> lowest or the highest holder? From a real-time point of view it should be
> 90 boosted to 100, but is the current implementation doing that?

Currently ALL holders are boosted to the priority of the highest
priority waiter.
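
As a rough illustration of that rule, here is a simplified model in C. It is not the NuttX scheduler code: struct holder_s, boost_holders() and restore_holders() are hypothetical names, and the model ignores details such as several waiters or holders that release at different times.

```c
#include <stddef.h>

/* Simplified model, not the NuttX implementation: each holder of a
 * counting semaphore has a base priority and a current (possibly
 * boosted) priority.
 */

struct holder_s
{
  int base_prio;
  int cur_prio;
};

/* Raise every holder that is below the waiter's priority up to it */

static void boost_holders(struct holder_s *holders, size_t n,
                          int waiter_prio)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      if (holders[i].cur_prio < waiter_prio)
        {
          holders[i].cur_prio = waiter_prio;
        }
    }
}

/* Drop every holder back to its base priority */

static void restore_holders(struct holder_s *holders, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      holders[i].cur_prio = holders[i].base_prio;
    }
}
```

With holders at priorities 10, 50 and 90 and a waiter at 100, this model puts all three holders at 100 until they are restored, which is the situation the question above is asking about.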




Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Petro Karashchenko
I still see more questions than answers, as semaphores can be posted from
the interrupt level. Let's take the following example:
The counting semaphore manages DMA channels.
A task allocates a DMA channel and takes the counting semaphore (becomes a
holder), but posting the semaphore is done from the DMA completion callback,
as the channel is freed there. The holder task may still do some activities
in the background while DMA is working. But the current priority boost scheme
will raise its priority (even if the boost will not lead to faster posting of
the semaphore). This is a more theoretical description, but it describes the
state of the problem.

I think we can talk about inheritance only if take/post are done from task
level, and currently only a mutex ensures that.

Best regards,
Petro

On Fri, Mar 31, 2023, 4:30 PM Gregory Nutt  wrote:

>
> > Yes and No. For counting semaphores we have multiple holders for example
> > with priorities 10, 50 and 90. Now a task with priority 100 comes and
> wants
> > to take a semaphore. Priority which of the holders should be increased?
> The
> > lowest or the highest holder? With a real-time point of view it should be
> > 90 boosted to 100, but is the current implementation doing that?
> Currently ALL holders are boosted to the priority of the highest
> priority waiter.
>
>


Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Gregory Nutt




> I still see more questions than answers, as semaphores can be posted from
> the interrupt level. Let's take the following example:
> The counting semaphore manages DMA channels.
> A task allocates a DMA channel and takes the counting semaphore (becomes a
> holder), but posting the semaphore is done from the DMA completion callback,
> as the channel is freed there. The holder task may still do some activities
> in the background while DMA is working. But the current priority boost scheme
> will raise its priority (even if the boost will not lead to faster posting of
> the semaphore). This is a more theoretical description, but it describes the
> state of the problem.
>
> I think we can talk about inheritance only if take/post are done from task
> level, and currently only a mutex ensures that.


That is not true.  Posting from an interrupt never boosts priority and, 
hence, never causes inheritance of priority.  It can only cause a drop / 
restoration in priority.  That may result in context switches which can 
be done from the interrupt level with no problem.  I don't see any 
issue.  Certainly this works, it is done often and works very well.


This is an important feature of the real time behavior.  We can't lose 
this behavior.





Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Petro Karashchenko
Sorry for not being clear. Here is a better description:
2 DMA channels are accounted for by a counting semaphore.
The semaphore is posted by the DMA completion interrupt.
TaskA with priority 10 allocates the DMA0 channel and starts DMA activity.
TaskB with priority 20 allocates the DMA1 channel and starts DMA activity.
TaskC with priority 30 wants to allocate a DMA channel, so it boosts the
priority of TaskA and TaskB to 30 (even if that will not lead to faster DMA
operation completion).
DMA1 completes and posts the semaphore, so TaskC gets it and the TaskA and
TaskB priorities are restored.

Best regards,
Petro

On Fri, Mar 31, 2023, 5:26 PM Gregory Nutt  wrote:

>
> > I still see more questions than answers. As semaphores can be posted from
> > the interrupt level. Let's take next example:
> > The counting semaphore manages DMA channels.
> > Task allocates a DMA channels and takes counting semaphore (becomes a
> > holder), but posting a semaphore is done from DMA completion can back as
> > channel is freed there. The holder task may still do some activities on
> the
> > background while DMA is working. But current priority boost schema will
> > rise it's priority (even if boost will not lead to faster posting of a
> > semaphore). This is more theoretical description, but describes the state
> > of problem.
> >
> > I think we can task about inheritance only if take/post are done from
> task
> > level and currently only mutex ensure that.
>
> That is not true.  Posting from an interrupt never boosts priority and,
> hence, never causes inheritance of priority.  It can only cause a drop /
> restoration in priority.  That may result in context switches which can
> be done from the interrupt level with no problem.  I don't see any
> issue.  Certainly this works, it is done often and works very well.
>
> This is an important feature of the real time behavior.  We can't lose
> this behavior.
>
>
>


Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Petro Karashchenko
Even more: in my previous example, if the semaphore is posted from the
interrupt we do not know which of TaskA or TaskB is no longer a "holder"
of the semaphore.

Best regards,
Petro

On Fri, Mar 31, 2023, 5:39 PM Petro Karashchenko <
petro.karashche...@gmail.com> wrote:

> Sorry for been not clear. Here is a better description:
> 2 DMA channels accountable by a counting semaphore.
> The semaphore is posted by DMA completion interrupt.
> TaskA with priority 10 allocates DMA0 channel and starts DMA activity.
> TaskB with priority 20 allocates DMA1 channel and starts DMA activity.
> TaskC with priority 30 wants to allocate a DMA channel, so boosts priority
> of TaskA and TaskB to 30 (even if that will not lead to fasted DMA
> operation completion).
> DMA1 completes and posts semaphore, so TaskC gets it and TaskA and TaskB
> priorities are restored.
>
> Best regards,
> Petro
>
> On Fri, Mar 31, 2023, 5:26 PM Gregory Nutt  wrote:
>
>>
>> > I still see more questions than answers. As semaphores can be posted
>> from
>> > the interrupt level. Let's take next example:
>> > The counting semaphore manages DMA channels.
>> > Task allocates a DMA channels and takes counting semaphore (becomes a
>> > holder), but posting a semaphore is done from DMA completion can back as
>> > channel is freed there. The holder task may still do some activities on
>> the
>> > background while DMA is working. But current priority boost schema will
>> > rise it's priority (even if boost will not lead to faster posting of a
>> > semaphore). This is more theoretical description, but describes the
>> state
>> > of problem.
>> >
>> > I think we can task about inheritance only if take/post are done from
>> task
>> > level and currently only mutex ensure that.
>>
>> That is not true.  Posting from an interrupt never boosts priority and,
>> hence, never causes inheritance of priority.  It can only cause a drop /
>> restoration in priority.  That may result in context switches which can
>> be done from the interrupt level with no problem.  I don't see any
>> issue.  Certainly this works, it is done often and works very well.
>>
>> This is an important feature of the real time behavior.  We can't lose
>> this behavior.
>>
>>
>>


Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Gregory Nutt




> Sorry for not being clear. Here is a better description:
> 2 DMA channels are accounted for by a counting semaphore.
> The semaphore is posted by the DMA completion interrupt.
> TaskA with priority 10 allocates the DMA0 channel and starts DMA activity.
> TaskB with priority 20 allocates the DMA1 channel and starts DMA activity.
> TaskC with priority 30 wants to allocate a DMA channel, so it boosts the
> priority of TaskA and TaskB to 30 (even if that will not lead to faster DMA
> operation completion).


No, but it will result in a more real time, deterministic response to the
completion of a DMA, which is a critical event to the healthy behavior of
the system.  That is the goal of an RTOS -- NOT faster response, but a
deterministic response.  That is the meaning of "real time".


This is EXTREMELY important to the viability of NuttX as an RTOS.  If 
the OS cannot respond deterministically in cases like this then the RTOS 
is a total failure as an RTOS.  Might as well remove the RT from the 
beginning.


This is key.  This is absolutely critical to the existence of NuttX as
an RTOS.  If we remove this capability then the OS is a pile of shit and
will never be used by anyone.



> DMA1 completes and posts the semaphore, so TaskC gets it and the TaskA and
> TaskB priorities are restored.

Yes, that sounds correct.


Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Gregory Nutt




> Even more: in my previous example, if the semaphore is posted from the
> interrupt we do not know which of TaskA or TaskB is no longer a "holder"
> of the semaphore.

You are right.  In this usage case, the counting semaphore is not being 
used for locking; it is being used for signalling an event per 
https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance


In that case, priority inheritance must be turned off.



Development priorities (was: [Breaking change] Move nxmutex to sched)

2023-03-31 Thread Nathan Hartman
In "[Breaking change] Move nxmutex to sched" there was a more general
discussion about our development priorities: What is most important to
us about NuttX?

I'm pulling out this part of the discussion to a new thread, to avoid
clogging up the nxmutex discussion...

For context, with my replies below:

On Thu, Mar 30, 2023 at 5:44 PM Gregory Nutt  wrote:
>
>
> >> I have mixed feelings myself and hope that we get some consensus through
> >> dialog.  On one hand, it is important to stay faithful to documented
> >> standards and undocumented conventions for the use of POSIX/Unix
> >> systems.  But on the other hand, unlike other OSs that strive toward
> >> standard conformance, we are an RTOS and must satisfy certain
> >> requirements for deterministic, real time behavior.
> >>
> >> What do you all think?
> >
> > My opinion is that we have to respect the requirements for deterministic
> > real-time behavior, even though that implies the addition of certain
> > non-standard interfaces. Otherwise we lose our identity as a real time
> > operating system and the applications I am doing with NuttX (and I'm sure
> > many other people) will not be possible.
> >
> > That said, I also very much like that NuttX strives for standards
> > conformance. For me, this means that most non-real-time code can be
> > developed and tested on a PC with a faster code-compile-debug cycle than
> > embedded and then moved over to embedded when it's ready. This has been a
> > huge productivity boost for me (and I'm sure, once again, for many other
> > people).
> >
> > How, then, do we satisfy both needs?
> >
> > I think the answer is that as long as standard functions behave like the
> > standards and practices expect, and deviations from the standards use
> > identifiers that do not collide with the standards, both needs are
> > satisfied well. Applications that do not utilize our real time "extensions"
> > will not notice the difference, and applications that do utilize them will
> > meet real time requirements as needed.
> >
> > I think that in large part we are already doing exactly that, so there
> > isn't really a problem that needs fixing here.
> >
> > I don't know the details of this specific PR yet, so I am just giving my
> > opinion about the premise of NuttX in general.
>
> Well said.
>
> We are creating something uncommon; we are creating an RTOS that lets
> you run POSIX (read: Linux) code while retaining the real time,
> deterministic performance of an RTOS.  If we sacrifice either the real
> time nature or POSIX compatibility, then we have failed.
>
> We are not building another Linux.  We already have a very nice one,
> thank you.
>
> We have had other discussions recently about tradeoffs between POSIX
> compatibility and code size.  I don't think that was resolved to
> everyone's satisfaction.
>
> It seems to me that when we have to make trade-offs , we tend to do so
> according to the following three values:
>
>  1. Real time, deterministic behavior,
>  2. Standards compliance, and
>  3. OS Footprint
>
> Based on recent decisions and tradeoffs, I list those in what seems to
> be their decreasing order of importance to the project. Do you agree
> with those values and their importance?  If so, should they be enshrined
> somehow?
>
> Some of this is in INVIOLABLES.md.  But INVIOLABLES.md  mostly addresses
> a lower level set of design values:  Modularity, coding style, etc.

This is a very good summary. I think it describes very well what our
priorities have been until now, I think we should keep the same list
of priorities moving forward, and it might be useful to codify it
somewhere that NuttX is developed with this order of importance
(copying Greg's summary):

[[[
 1. Real time, deterministic behavior,
 2. Standards compliance, and
 3. OS Footprint
]]]

Regarding (1), as has been said by Greg, myself, and probably others,
the real time deterministic behavior is critical. Without that, I
can't really use NuttX for anything significant.

Regarding (2), the standards compliance is very helpful because it
makes it possible to write and test most of the non-real-time code on
a PC, where the edit-compile-debug cycle is much faster and more
convenient, and then move working code to embedded.

Regarding (3), being careful not to grow the OS Flash footprint too
much means that we can make long-lived products and upgrade their
firmware well into the future. This is important for things used in
industry and infrastructure, where product life cycles are measured in
years to decades.

Cheers,
Nathan


Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Gregory Nutt

On 3/31/2023 8:56 AM, Gregory Nutt wrote:


>> Even more: in my previous example, if the semaphore is posted from the
>> interrupt we do not know which of TaskA or TaskB is no longer a "holder"
>> of the semaphore.
>
> You are right.  In this usage case, the counting semaphore is not
> being used for locking; it is being used for signalling an event per
> https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
>
> In that case, priority inheritance must be turned off.

Your example is really confusing because you are mixing two different
concepts, just subtly enough to be really confusing.  If a counting
semaphore is used for locking a set of multiple resources, then posting
the semaphore does not release the resource.  That is not the way that
it is done.  Here is a more believable example of how this would work:


1. TaskA with priority 10 allocates DMA0 channel by calling a DMA
   channel allocation function.  If a DMA channel is available, it is
   allocated and the allocation function takes the semaphore. TaskA
   then starts DMA activity.
2. TaskA waits on another signaling semaphore for the DMA to complete.
3. TaskB with priority 20 does the same.
4. TaskC with priority 30 wants to allocate a DMA channel.  It calls
   the channel allocation function which waits on the semaphore for a
   count to be available.  This boosts the priority of TaskA and TaskB
   to 30.
5. When the DMA started by TaskA completes, it signals TaskA which
   calls the resource deallocation function which increments the
   counting semaphore, giving the count to TaskC and restoring the base
   priorities.

The confusion arises because you are mixing the signaling logic with the 
resource deallocation logic.
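
The separation between the two kinds of semaphore in the steps above can be sketched with POSIX semaphores. This is an illustrative sketch, not NuttX driver code: NCHANNELS, g_chansem, struct dma_xfer_s and the dma_*() functions are all hypothetical names, and the completion "interrupt" is just an ordinary function call here.

```c
#include <semaphore.h>

#define NCHANNELS 2  /* Hypothetical number of DMA channels */

/* Counting semaphore used for locking: one count per free channel */

static sem_t g_chansem;

/* Per-transfer state with its own signaling semaphore */

struct dma_xfer_s
{
  sem_t done;  /* Posted when the transfer completes */
};

static int dma_pool_init(void)
{
  return sem_init(&g_chansem, 0, NCHANNELS);
}

/* Step 1: allocate a channel.  Blocks while no channel is free. */

static int dma_alloc(struct dma_xfer_s *xfer)
{
  int ret = sem_wait(&g_chansem);
  if (ret == 0)
    {
      ret = sem_init(&xfer->done, 0, 0);
      if (ret != 0)
        {
          sem_post(&g_chansem);  /* Give the channel back on failure */
        }
    }

  return ret;
}

/* Completion path (an interrupt handler in a real driver): it only
 * signals the owning task and never touches the channel count.
 */

static int dma_complete(struct dma_xfer_s *xfer)
{
  return sem_post(&xfer->done);
}

/* Steps 2 and 5: the owning task waits for completion and then returns
 * the channel itself, from task context.
 */

static int dma_wait_and_free(struct dma_xfer_s *xfer)
{
  int ret = sem_wait(&xfer->done);
  if (ret == 0)
    {
      sem_destroy(&xfer->done);
      ret = sem_post(&g_chansem);
    }

  return ret;
}
```

In this arrangement only g_chansem is a candidate for priority inheritance, because it is always taken and released by the same task. The done semaphore is a pure signaling semaphore and would have inheritance turned off.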


The mm/iob logic works basically this way.  The logic is more complex than
you would think from the above.  IOBs are an example of a critical system
resource that has multiple copies and utilizes a counting semaphore with
priority inheritance to achieve good real time performance.   IOB
handling is key logic for the correct real time operation of the overall
system.  Nothing we do must risk this.


Other places where this logic is (probably) used:

   arch/arm/src/rp2040/rp2040_i2s.c: nxsem_init(&priv->bufsem, 0, CONFIG_RP2040_I2S_MAXINFLIGHT);
   arch/arm/src/rtl8720c/amebaz_depend.c:  if (sem_init(_sema, 0, init_val))
   arch/arm/src/sama5/sam_ssc.c: nxsem_init(&priv->bufsem, 0, CONFIG_SAMA5_SSC_MAXINFLIGHT);
   arch/arm/src/samv7/sam_ssc.c: nxsem_init(&priv->bufsem, 0, CONFIG_SAMV7_SSC_MAXINFLIGHT);
   arch/arm/src/stm32/stm32_i2s.c: nxsem_init(&priv->bufsem, 0, CONFIG_STM32_I2S_MAXINFLIGHT);
   drivers/can/mcp2515.c: nxsem_init(&priv->txfsem, 0, MCP2515_NUM_TX_BUFFERS);
   drivers/video/vnc/vnc_server.c: nxsem_init(&session->freesem, 0, CONFIG_VNCSERVER_NUPDATES);
   sched/pthread/pthread_completejoin.c: nxsem_init(&pjoin->data_sem, 0, (ntasks_waiting + 1));
   wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.le_pkts_sem, 0, g_btdev.le_pkts);
   wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.ncmd_sem, 0, 1);
   wireless/ieee802154/mac802154.c: nxsem_init(&priv->txdesc_sem, 0, CONFIG_MAC802154_NTXDESC);
   wireless/ieee802154/mac802154.c: nxsem_init(&mac->opsem, 0, 1);

Maybe:

   arch/risc-v/src/bl602/bl602_os_hal.c: ret = nxsem_init(sem, 0, init);
   arch/risc-v/src/esp32c3/esp32c3_ble_adapter.c: ret = sem_init(&bt_sem->sem, 0, init);
   arch/risc-v/src/esp32c3/esp32c3_wifi_adapter.c: ret = nxsem_init(sem, 0, init);
   arch/xtensa/src/esp32/esp32_ble_adapter.c: ret = sem_init(sem, 0, init);
   arch/xtensa/src/esp32/esp32_wifi_adapter.c: ret = nxsem_init(sem, 0, init);
   arch/xtensa/src/esp32s3/esp32s3_wifi_adapter.c: ret = nxsem_init(sem, 0, init);




Re: Development priorities (was: [Breaking change] Move nxmutex to sched)

2023-03-31 Thread Tomek CEDRO
On Fri, Mar 31, 2023 at 6:31 PM Nathan Hartman wrote:
> In "[Breaking change] Move nxmutex to sched" there was a more general
> discussion about our development priorities: What is most important to
> us about NuttX?
>
> I'm pulling out this part of the discussion to a new thread, to avoid
> clogging up the nxmutex discussion...
> (..)
> On Thu, Mar 30, 2023 at 5:44 PM Gregory Nutt wrote:
> > We are creating something uncommon; we are creating an RTOS that lets
> > you run POSIX (read: Linux) code while retaining the real time,
> > deterministic performance of an RTOS.  If we sacrifice either the real
> > time nature or POSIX compatibility, then we have failed.
> >
> > We are not building another Linux.  We already have a very nice one,
> > thank you.
> >
> > We have had other discussions recently about tradeoffs between POSIX
> > compatibility and code size.  I don't think that was resolved to
> > everyone's satisfaction.
> >
> > It seems to me that when we have to make trade-offs , we tend to do so
> > according to the following three values:
> >
> >  1. Real time, deterministic behavior,
> >  2. Standards compliance, and
> >  3. OS Footprint
> >
> > Based on recent decisions and tradeoffs, I list those in what seems to
> > be their decreasing order of importance to the project. Do you agree
> > with those values and their importance?  If so, should they be enshrined
> > somehow?
> >
> > Some of this is in INVIOLABLES.md.  But INVIOLABLES.md  mostly addresses
> > a lower level set of design values:  Modularity, coding style, etc.
>
> This is a very good summary. I think it describes very well what our
> priorities have been until now, I think we should keep the same list
> of priorities moving forward, and it might be useful to codify it
> somewhere that NuttX is developed with this order of importance
> (copying Greg's summary):
>
> [[[
>  1. Real time, deterministic behavior,
>  2. Standards compliance, and
>  3. OS Footprint
> ]]]
>
> Regarding (1), as has been said by Greg, myself, and probably others,
> the real time deterministic behavior is critical. Without that, I
> can't really use NuttX for anything significant.
>
> Regarding (2), the standards compliance is very helpful because it
> makes it possible to write and test most of the non-real-time code on
> a PC, where the edit-compile-debug cycle is much faster and more
> convenient, and then move working code to embedded.
>
> Regarding (3), being careful not to grow the OS Flash footprint too
> much means that we can make long-lived products and upgrade their
> firmware well into the future. This is important for things used in
> industry and infrastructure, where product life cycles are measured in
> years to decades.

+1 +1 +1 :-) :-) :-)

-- 
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info


Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread David S. Alessio



> On Mar 30, 2023, at 3:23 PM, Gregory Nutt  wrote:
> 
> > In his Confluence paper on "Signaling Semaphores and Priority Inheritance”, 
> > Brennan Ashton’s analysis is both thorough and accurate; ...
> 
> Minor fix.  I wrote the paper, Brennan converted the Confluence page from an 
> older DocuWiki page

Hi, Greg, I should have known!
Cheers,
-david



Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Petro Karashchenko
Hello Greg,

I already wrote that my example is more theoretical and may be a "bad
design", but it illustrates the issue in the current code.
Please don't get me wrong: I'm not against having priority inheritance
available for semaphores. I just want to have things well defined and,
where possible, to prohibit or catch bad usage. For example, a mutex usually
has a holder field, and only the holder of a mutex can release it. That is
basically not true for semaphores. I would be happy if:
1. sem_post() asserted on an attempt to call it from an interrupt with a
semaphore that has priority inheritance enabled.
2. sem_post() asserted if the calling task is not in the holder list of
a semaphore that has priority inheritance enabled.
Unfortunately, neither is true now: currently "nxsem_release_holder"
just decrements the holder count of a random holder if the caller's TCB is
not in the holder list.
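
To make the hole concrete, here is a minimal sketch of the fallback described
above. It is a toy model, not the NuttX code: the names toy_sem, toy_holder,
toy_add_holder, toy_release_holder and toy_counts_held are all hypothetical,
and "random holder" is modeled as "first holder found", which is an assumption.

```c
#include <stddef.h>

#define TOY_MAX_HOLDERS 4
#define TOY_NO_TASK     0   /* models a post from interrupt context */

/* Hypothetical, simplified holder record (not the NuttX structure). */
struct toy_holder
{
  int task_id;  /* TOY_NO_TASK marks a free slot */
  int counts;   /* counts currently held by this task */
};

struct toy_sem
{
  struct toy_holder holders[TOY_MAX_HOLDERS];
};

/* Record that task_id (nonzero) took one count.
 * Returns 0 on success, -1 if the holder table is full.
 */
int toy_add_holder(struct toy_sem *sem, int task_id)
{
  struct toy_holder *free_slot = NULL;

  for (int i = 0; i < TOY_MAX_HOLDERS; i++)
    {
      struct toy_holder *h = &sem->holders[i];

      if (h->task_id == task_id)
        {
          h->counts++;
          return 0;
        }

      if (h->task_id == TOY_NO_TASK && free_slot == NULL)
        {
          free_slot = h;
        }
    }

  if (free_slot == NULL)
    {
      return -1;
    }

  free_slot->task_id = task_id;
  free_slot->counts = 1;
  return 0;
}

/* Release one count on behalf of caller_id. If the caller is not in the
 * table, fall back to the first holder found (the assumed stand-in for
 * "a random holder"). Returns the id of the task whose count was dropped,
 * or TOY_NO_TASK if there were no holders at all.
 */
int toy_release_holder(struct toy_sem *sem, int caller_id)
{
  struct toy_holder *victim = NULL;

  for (int i = 0; i < TOY_MAX_HOLDERS; i++)
    {
      struct toy_holder *h = &sem->holders[i];

      if (h->task_id == TOY_NO_TASK)
        {
          continue;
        }

      if (h->task_id == caller_id)
        {
          victim = h;   /* the caller really is a holder */
          break;
        }

      if (victim == NULL)
        {
          victim = h;   /* fallback candidate */
        }
    }

  if (victim == NULL)
    {
      return TOY_NO_TASK;
    }

  int released_id = victim->task_id;
  if (--victim->counts == 0)
    {
      victim->task_id = TOY_NO_TASK;
    }

  return released_id;
}

/* Number of counts the table currently attributes to task_id. */
int toy_counts_held(const struct toy_sem *sem, int task_id)
{
  for (int i = 0; i < TOY_MAX_HOLDERS; i++)
    {
      if (sem->holders[i].task_id == task_id)
        {
          return sem->holders[i].counts;
        }
    }

  return 0;
}
```

If tasks 1 and 2 each hold a count and the post comes from interrupt context
(TOY_NO_TASK), the model drops task 1 from the table regardless of which task
actually finished with its resource. An assert at the fallback point would
catch exactly this case.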

What I'm trying to say is that calling sem_wait/sem_post from task context
is essential for priority inheritance, and there is currently a hole here.
I'm OK with building a kernel mutex as a wrapper on top of the "fixed"
semaphores with priority inheritance that does some additional checks if
needed.

Regarding a "pile of shit" and real-time behavior: I agree that determinism
is key, but I want to be precise about terms here. The RTOS vs OS difference
is that an RTOS implements scheduling using rate monotonic scheduling (RMS),
which is mathematically proven by rate monotonic analysis (RMA). That is how
determinism is achieved.
Let's get back to the example with the counting semaphore and the case where
"ALL holders are boosted to the priority of the highest priority waiter".
What is the impact on RMS here? Priority inheritance for a mutex (the binary
semaphore case) was intended to minimize the impact on RMS when an
exceptional situation happens. So in the case of several holders, how does
creating a pool of concurrent tasks with the same priority improve
determinism? Or maybe exactly this is what makes an RTOS a "pile of shit"?
Personally, I do not have a clear answer; I am mostly thinking out loud here.
I really think that at the end of this discussion we will have some good
results that will lead to an improvement.

Best regards,
Petro



Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Xiang Xiao
On Sat, Apr 1, 2023 at 12:34 AM Gregory Nutt  wrote:

> On 3/31/2023 8:56 AM, Gregory Nutt wrote:
> >
> >> Even more. In my previous example if semaphore is posted from the
> >> interrupt
> >> we do not know which of TaskA or TaskB is no longer a "holder l" of a
> >> semaphore.
> >>
> > You are right.  In this usage case, the counting semaphore is not
> > being used for locking; it is being used for signalling an event per
> >
> https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
> >
> > In that case, priority inheritance must be turned off.
> >
> Your example is really confusing because you are mixing two different
> concepts, just subtly enough to be really confusing.  If a counting
> semaphore is used for locking a set of multiple resources, then posting
> the semaphore does not release the resource.  That is not the way that
> it is done.  Here is a more believable example of how this would work:
>
>  1. TaskA with priority 10 allocates DMA0 channel by calling a DMA
> channel allocation function.  If a DMA channel is available, it is
> allocated and the allocation function takes the semaphore. TaskA
> then starts DMA activity.
>  2. TaskA waits on another signaling semaphore for the DMA to complete.
>  3. TaskB with priority 20 does the same.
>  4. TaskC with priority 30 wants to allocate a DMA channel.  It calls
> the channel allocation function which waits on the semaphore for a
> count to be available.  This boosts the priority of TaskA and TaskB
> to 30.
>  5. When the DMA started by TaskA completes, it signals TaskA which
> calls the resource deallocation function which increments the
> counting semaphore, giving the count to TaskC and restoring the base
> priorities.
>
>

Normally, a resource (the DMA channel here) is allocated from one thread/task,
but may be freed in another thread/task. Please consider how we malloc and
free memory.
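
For reference, the boost and restore in steps 4 and 5 of the quoted DMA
example can be sketched as a toy model. This is only an illustration under
stated assumptions: toy_task and toy_reprioritize are hypothetical names, a
higher number means a higher priority, and the real scheduler logic is more
involved than this.

```c
/* Hypothetical task record for the sketch. */
struct toy_task
{
  int base_prio;    /* priority assigned at creation */
  int sched_prio;   /* current, possibly boosted, priority */
  int holds_count;  /* nonzero while the task holds a semaphore count */
};

/* Recompute priorities for one semaphore. Every holder runs at the higher
 * of its base priority and the priority of the highest waiter; tasks that
 * hold nothing run at their base priority. Pass 0 as top_waiter_prio when
 * nobody is waiting.
 */
void toy_reprioritize(struct toy_task *tasks, int ntasks, int top_waiter_prio)
{
  for (int i = 0; i < ntasks; i++)
    {
      int prio = tasks[i].base_prio;

      if (tasks[i].holds_count && top_waiter_prio > prio)
        {
          prio = top_waiter_prio;
        }

      tasks[i].sched_prio = prio;
    }
}
```

With TaskA (10) and TaskB (20) holding counts and TaskC (30) waiting, calling
toy_reprioritize(tasks, 3, 30) puts all three at 30. Once TaskA gives its
count to TaskC and nobody waits any more, toy_reprioritize(tasks, 3, 0)
returns them to 10, 20 and 30. Note that the model only works if it knows
which task gave the count back, which is the point under discussion.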


> The confusion arises because you are mixing the signaling logic with the
> resource deallocation logic.
>
> The mm/iob logic works basically this way.  The logic is more complex than
> you would think from above.  IOBs are an example of a critical system
> resource that has multiple copies and utilizes a counting semaphore with
> priority inheritance to achieve good real time performance.   IOB
> handling is key logic for the correct real time operation of the overall
> system.  Nothing we do must risk this.
>
>
IOB is a very good example of why it is a bad and dangerous idea to enable
priority inheritance for a counting semaphore. An IOB is normally
allocated in the send thread but freed in the work thread. If we want
priority inheritance to work as expected instead of crashing the system,
sem_wait/sem_post must come from the same thread, which makes the semaphore
a kind of lock.


> Other places where this logic is (probably) used:
>
> arch/arm/src/rp2040/rp2040_i2s.c: nxsem_init(&priv->bufsem, 0,
> CONFIG_RP2040_I2S_MAXINFLIGHT);
> arch/arm/src/rtl8720c/amebaz_depend.c:  if (sem_init(_sema, 0,
> init_val))
> arch/arm/src/sama5/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
> CONFIG_SAMA5_SSC_MAXINFLIGHT);
> arch/arm/src/samv7/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
> CONFIG_SAMV7_SSC_MAXINFLIGHT);
> arch/arm/src/stm32/stm32_i2s.c: nxsem_init(&priv->bufsem, 0,
> CONFIG_STM32_I2S_MAXINFLIGHT);
> drivers/can/mcp2515.c: nxsem_init(&priv->txfsem, 0,
> MCP2515_NUM_TX_BUFFERS);
> drivers/video/vnc/vnc_server.c: nxsem_init(&session->freesem, 0,
> CONFIG_VNCSERVER_NUPDATES);
> sched/pthread/pthread_completejoin.c: nxsem_init(&pjoin->data_sem,
> 0, (ntasks_waiting + 1));
> wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.le_pkts_sem, 0,
> g_btdev.le_pkts);
> wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.ncmd_sem, 0, 1);
> wireless/ieee802154/mac802154.c: nxsem_init(&priv->txdesc_sem, 0,
> CONFIG_MAC802154_NTXDESC);
> wireless/ieee802154/mac802154.c: nxsem_init(&mac->opsem, 0, 1);
>
> Maybe:
>
> arch/risc-v/src/bl602/bl602_os_hal.c: ret = nxsem_init(sem, 0, init);
> arch/risc-v/src/esp32c3/esp32c3_ble_adapter.c: ret =
> sem_init(&bt_sem->sem, 0, init);
> arch/risc-v/src/esp32c3/esp32c3_wifi_adapter.c: ret =
> nxsem_init(sem, 0, init);
> arch/xtensa/src/esp32/esp32_ble_adapter.c: ret = sem_init(sem, 0,
> init);
> arch/xtensa/src/esp32/esp32_wifi_adapter.c: ret = nxsem_init(sem, 0,
> init);
> arch/xtensa/src/esp32s3/esp32s3_wifi_adapter.c: ret =
> nxsem_init(sem, 0, init);
>
>


Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Xiang Xiao
BTW, https://github.com/apache/nuttx/pull/5070 reports that the system will
crash if a semaphore with priority inheritance enabled is waited on and
posted from different threads.



Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Petro Karashchenko
Xiang Xiao, is that still true for the latest code in the master branch?
And by "system will crash if the priority inheritance enabled semaphore is
waited or posted from different threads", do you mean a crash at the point
of sem_post/sem_wait, or some system instability in general?

Best regards,
Petro


Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Xiang Xiao
When priority inheritance is enabled on a semaphore, sem_wait adds
the current thread to the semaphore's holder table and expects that the same
thread will call sem_post later to remove it from the holder table.
If we break this fundamental assumption by waiting and posting from different
threads, many strange things will happen. For example, let's consider
what happens when a program sends a TCP packet:

   1. The send task calls sem_wait to become a holder and gets an IOB
   2. The network subsystem copies the user buffer into the IOB and adds the
   IOB to the queue
   3. The send task exits, and the semaphore then contains a dangling pointer
   to the sending TCB
   4. After the network subsystem sends the IOB to the wire and returns it to
   the pool, sem_post is called and will touch the dangling pointer

Zeng Zhaoxiu provided a patch (https://github.com/apache/nuttx/pull/5171) to
work around this issue.
But then the semaphore holder tracking can no longer work as we expect.
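
The four steps can be replayed in a small toy model. This is a sketch under
assumptions, not the NuttX implementation: toy_tcb, toy_iob_sem, toy_iob_wait,
toy_task_exit and toy_iob_post are hypothetical names, the holder table is
reduced to one slot, and task exit only flips a flag instead of freeing the
TCB so that the example itself stays well defined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task control block for the sketch. */
struct toy_tcb
{
  int  pid;
  bool alive;   /* false once the task has exited */
};

/* Hypothetical counting semaphore guarding the IOB pool. */
struct toy_iob_sem
{
  int count;                      /* free IOBs */
  const struct toy_tcb *holder;   /* single holder slot, for brevity */
};

/* Step 1: the send task takes a count and is recorded as the holder.
 * Returns 0 on success, -1 if no IOB is free (the toy does not block).
 */
int toy_iob_wait(struct toy_iob_sem *sem, const struct toy_tcb *self)
{
  if (sem->count <= 0)
    {
      return -1;
    }

  sem->count--;
  sem->holder = self;
  return 0;
}

/* Step 3: the task exits. A real kernel would free the TCB here, which is
 * what turns the recorded holder into a dangling pointer.
 */
void toy_task_exit(struct toy_tcb *tcb)
{
  tcb->alive = false;
}

/* Step 4: some other thread returns the IOB. Returns true if the post found
 * a holder entry that belongs to a different task that has already exited.
 */
bool toy_iob_post(struct toy_iob_sem *sem, const struct toy_tcb *caller)
{
  bool stale = sem->holder != NULL &&
               sem->holder != caller &&
               !sem->holder->alive;

  sem->holder = NULL;
  sem->count++;
  return stale;
}
```

When the send task waits, exits, and the work thread then posts,
toy_iob_post() reports a stale holder. In a real system that entry would
point at freed memory.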


Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Gregory Nutt
Same problem as 
https://nuttx.yahoogroups.narkive.com/3hggphCi/problem-related-semaphore-and-priority-inheritance



Re: [Breaking change] Move nxmutex to sched

2023-03-31 Thread Gregory Nutt




> BTW, https://github.com/apache/nuttx/pull/5070 report that the system will
> crash if  the priority inheritance enabled semaphore is waited or posted
> from different threads.

True.  sem_post should fail if priority inheritance is enabled and the 
caller is not a holder of a semaphore count.  That check should be 
added.  Certainly it is not a justification for eliminating core 
functionality.
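
A minimal sketch of what such a check could look like, as a toy model only.
The names toy_pi_sem, toy_pi_wait and toy_pi_post are hypothetical, and the
choice of EPERM as the error code is an assumption, not something decided in
this thread.

```c
#include <errno.h>

#define TOY_PI_MAX_HOLDERS 4

/* Hypothetical semaphore with priority inheritance enabled. */
struct toy_pi_sem
{
  int count;                        /* available counts */
  int holders[TOY_PI_MAX_HOLDERS];  /* ids of tasks holding a count, 0 = free */
};

/* Take one count for task_id (must be nonzero) and record the holder.
 * The toy never blocks: it returns -EAGAIN when no count is available.
 */
int toy_pi_wait(struct toy_pi_sem *sem, int task_id)
{
  if (task_id == 0)
    {
      return -EINVAL;
    }

  if (sem->count <= 0)
    {
      return -EAGAIN;
    }

  for (int i = 0; i < TOY_PI_MAX_HOLDERS; i++)
    {
      if (sem->holders[i] == 0)
        {
          sem->holders[i] = task_id;
          sem->count--;
          return 0;
        }
    }

  return -ENOMEM;  /* holder table full */
}

/* Give one count back. Fails with -EPERM, leaving the semaphore untouched,
 * if the caller does not hold a count.
 */
int toy_pi_post(struct toy_pi_sem *sem, int task_id)
{
  if (task_id != 0)
    {
      for (int i = 0; i < TOY_PI_MAX_HOLDERS; i++)
        {
          if (sem->holders[i] == task_id)
            {
              sem->holders[i] = 0;
              sem->count++;
              return 0;
            }
        }
    }

  return -EPERM;
}
```

With two counts taken by tasks 1 and 2, a post from task 3 is rejected and
the count stays at 0, while a post from task 1 succeeds.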