When the priority inheritance is enabled on a semaphore, sem_wait will add the current thread to the semaphore's holder table and expect that the same thread will call sem_post later to remove it from the holder table. If we mess this fundamental assumption by waiting/posting from different threads, many strange things will happen. For example, let's consider what's happen when a program send a TCP packet:
1. The send task call sem_wait to become a holder and get IOB 2. Network subsystem copy the user buffer into IOB and add IOB to the queue 3. The send task exit and then semphare contain a dangling pointer to the sending tcb 4. After network subsystem send IOB to the wire and return it the pool, sem_post is called and will touch the dangling pointer Zeng Zhaoxiu provides a patch(https://github.com/apache/nuttx/pull/5171) to workaround this issue. But, the semaphore holder tracking can't work as we expect anymore. On Sat, Apr 1, 2023 at 3:52 AM Petro Karashchenko < petro.karashche...@gmail.com> wrote: > Xiang Xiao, is that still true for the latest code in master branch? > And by "system will > crash if the priority inheritance enabled semaphore is waited or posted > from different threads" do you mean at the point of sem_post/sem_wait or > some system instability in general? > > Best regards, > Petro > > On Fri, Mar 31, 2023, 10:39 PM Xiang Xiao <xiaoxiang781...@gmail.com> > wrote: > > > BTW, https://github.com/apache/nuttx/pull/5070 report that the system > will > > crash if the priority inheritance enabled semaphore is waited or posted > > from different threads. > > > > On Sat, Apr 1, 2023 at 3:20 AM Xiang Xiao <xiaoxiang781...@gmail.com> > > wrote: > > > > > > > > > > > On Sat, Apr 1, 2023 at 12:34 AM Gregory Nutt <spudan...@gmail.com> > > wrote: > > > > > >> On 3/31/2023 8:56 AM, Gregory Nutt wrote: > > >> > > > >> >> Even more. In my previous example if semaphore is posted from the > > >> >> interrupt > > >> >> we do not know which of TaskA or TaskB is no longer a "holder l" > of a > > >> >> semaphore. > > >> >> > > >> > You are right. In this usage case, the counting semaphore is not > > >> > being used for locking; it is being used for signalling an event per > > >> > > > >> > > > https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance > > >> > > > >> > In that case, priority inheritance must be turned off. > > >> > > > >> You example is really confusing because you are mixing two different > > >> concepts, just subtly enough to be really confusing. If a counting > > >> semaphore is used for locking a set of multiple resources, the posting > > >> the semaphore does not release the resource. That is not the way that > > >> it is done. Here is a more believable example of how this would work: > > >> > > >> 1. TaskA with priority 10 allocates DMA0 channel by calling a DMA > > >> channel allocation function. If a DMA channel is available, it is > > >> allocated and the allocation function takes the semaphore. TaskA > > >> then starts DMA activity. > > >> 2. TaskA waits on another signaling semaphore for the DMA to > complete. > > >> 3. TaskB with priority 20 does the same. > > >> 4. TaskC with priority 30 wants to allocate a DMA channel. It calls > > >> the channel allocation function which waits on the sempahore for a > > >> count to be available. This boost the priority of TaskA and TaskB > > >> to 30. > > >> 5. When the DMA started by TaskA completes, it signals TaskA which > > >> calls the resource deallocation function which increments the > > >> counting semaphore, giving the count to TaskC and storing the base > > >> priorities. > > >> > > >> > > > > > > Normally, the resource(dma channel here) is allocated from one > > > thread/task, but may be freed in another thread/task. Please consider > how > > > we malloc and free memory. > > > > > > > > >> The confusion arises because you are mixing the signaling logic with > the > > >> resource deallocation logic. > > >> > > >> The mm/iob logic works basically this way. The logic more complex > then > > >> you would think from above. IOBs is an example of a critical system > > >> resource that has multiple copies and utilizes a counting semaphore > with > > >> priority inheritance to achieve good real time performance. IOB > > >> handling is key logic for the correct real time operation of the > overall > > >> system. Nothing we do must risk this. > > >> > > >> > > > IOB is a very good example to demonstrate why it's a bad and dangerous > > > idea to enable priority inheritance for the counting semaphore. IOB is > > > normally allocated in the send thread but free in the work thread. If > we > > > want the priority inheritance to work as expected instead of crashing > the > > > system, sem_wait/sem_post must come from the same thread, which is a > kind > > > of lock. > > > > > > > > >> Other places where this logic is (probably) used: > > >> > > >> arch/arm/src/rp2040/rp2040_i2s.c: nxsem_init(&priv->bufsem, 0, > > >> CONFIG_RP2040_I2S_MAXINFLIGHT); > > >> arch/arm/src/rtl8720c/amebaz_depend.c: if (sem_init(_sema, 0, > > >> init_val)) > > >> arch/arm/src/sama5/sam_ssc.c: nxsem_init(&priv->bufsem, 0, > > >> CONFIG_SAMA5_SSC_MAXINFLIGHT); > > >> arch/arm/src/samv7/sam_ssc.c: nxsem_init(&priv->bufsem, 0, > > >> CONFIG_SAMV7_SSC_MAXINFLIGHT); > > >> arch/arm/src/stm32/stm32_i2s.c: nxsem_init(&priv->bufsem, 0, > > >> CONFIG_STM32_I2S_MAXINFLIGHT); > > >> drivers/can/mcp2515.c: nxsem_init(&priv->txfsem, 0, > > >> MCP2515_NUM_TX_BUFFERS); > > >> drivers/video/vnc/vnc_server.c: nxsem_init(&session->freesem, 0, > > >> CONFIG_VNCSERVER_NUPDATES); > > >> sched/pthread/pthread_completejoin.c: nxsem_init(&pjoin->data_sem, > > >> 0, (ntasks_waiting + 1)); > > >> wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.le_pkts_sem, > 0, > > >> g_btdev.le_pkts); > > >> wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.ncmd_sem, 0, > > 1); > > >> wireless/ieee802154/mac802154.c: nxsem_init(&priv->txdesc_sem, 0, > > >> CONFIG_MAC802154_NTXDESC); > > >> wireless/ieee802154/mac802154.c: nxsem_init(&mac->opsem, 0, 1); > > >> > > >> Maybe: > > >> > > >> arch/risc-v/src/bl602/bl602_os_hal.c: ret = nxsem_init(sem, 0, > > init); > > >> arch/risc-v/src/esp32c3/esp32c3_ble_adapter.c: ret = > > >> sem_init(&bt_sem->sem, 0, init); > > >> arch/risc-v/src/esp32c3/esp32c3_wifi_adapter.c: ret = > > >> nxsem_init(sem, 0, init); > > >> arch/xtensa/src/esp32/esp32_ble_adapter.c: ret = sem_init(sem, 0, > > >> init); > > >> arch/xtensa/src/esp32/esp32_wifi_adapter.c: ret = nxsem_init(sem, > 0, > > >> init); > > >> arch/xtensa/src/esp32s3/esp32s3_wifi_adapter.c: ret = > > >> nxsem_init(sem, 0, init); > > >> > > >> > > > > > > > > > > > >