avgoor opened a new issue, #16133:
URL: https://github.com/apache/nuttx/issues/16133

   ### Description / Steps to reproduce the issue
   
   A Raspberry Pi Pico 2 W board with an RP2350 MCU hangs in `ostest` during the nested signal handler test when SMP is enabled. Execution loops forever inside the `spin_lock_notrace` function.
   The traces I managed to collect:
   
   ```
   info threads
   Index Tid  Pid  Cpu  Thread                Info                              Frame
    0    0    0    0 '\000' Thread 0x20003d08     (Name: CPU0 IDLE, State: Assigned, Priority: 0, Stack: 1008) 0x100105b2  up_idle() at chip/rp23xx_idle.c:94
    1    1    0    1 '\001' Thread 0x20003dd8     (Name: CPU1 IDLE, State: Assigned, Priority: 0, Stack: 1008) 0x100105b2  up_idle() at chip/rp23xx_idle.c:94
    2    2    2    0 '\000' Thread 0x20006698     (Name: nsh_main, State: Waiting,Semaphore, Priority: 100, Stack: 2008) 0x10005086  nxsem_wait_slow() at semaphore/sem_wait.c:207
    12   12   12   0 '\000' Thread 0x20007880     (Name: ostest, State: Waiting,Semaphore, Priority: 100, Stack: 2016) 0x10005086  nxsem_wait_slow() at semaphore/sem_wait.c:207
    13   13   13   1 '\001' Thread 0x200084e8     (Name: ostest, State: Assigned, Priority: 100, Stack: 8120)  No symbol with pc
   *14   62   13   1 '\001' Thread 0x2000a9d8     (Name: ostest, State: Running, Priority: 101, Stack: 8176)   0x1000290c  enter_critical_section_wo_note() at include/nuttx/spinlock.h:199
   *15   63   13   0 '\000' Thread 0x2000aab8     (Name: ostest, State: Running, Priority: 102, Stack: 8176)   0x1000290c  enter_critical_section_wo_note() at include/nuttx/spinlock.h:199
   
   bt
   #0  0x1000290c in spin_lock_notrace (lock=0x200040c8 <g_cpu_irqlock> "\001") at include/nuttx/spinlock.h:199
   #1  enter_critical_section_wo_note () at irq/irq_csection.c:183
   #2  0x1000c754 in uart_xmitchars (dev=0x2000121c <g_uart0port>) at serial/serial_io.c:62
   #3  0x10000e54 in up_interrupt (irq=49, context=0x0, arg=0x2000121c <g_uart0port>) at chip/rp23xx_serial.c:617
   #4  0x10002836 in irq_dispatch (irq=49, context=0x0) at irq/irq_dispatch.c:144
   #5  0x10001b64 in exception_direct () at armv8-m/arm_doirq.c:62
   #6  <signal handler called>
   #7  spin_lock_notrace (lock=0x200040c8 <g_cpu_irqlock> "\001") at include/nuttx/spinlock.h:199
   #8  enter_critical_section_wo_note () at irq/irq_csection.c:234
   #9  0x10005376 in nxsig_deliver (stcb=0x2000aab8) at signal/sig_deliver.c:178
   #10 0x10001e9e in arm_sigdeliver () at armv8-m/arm_sigdeliver.c:107
   #11 0x10005fb8 in nxsched_remove_self (tcb=0x40) at sched/sched_removereadytorun.c:280
   #12 0x00000000 in ?? ()
   
   list
   194     {
   195     #ifdef CONFIG_TICKET_SPINLOCK
   196       int ticket = atomic_fetch_add(&lock->next, 1);
   197       while (atomic_read(&lock->owner) != ticket)
   198     #else /* CONFIG_TICKET_SPINLOCK */
   199       while (up_testset(lock) == SP_LOCKED)
   200     #endif
   201         {
   202           UP_DSB();
   203           UP_WFE();
   
   info args
   lock = 0x200040c8 <g_cpu_irqlock> "\001"
   ```
   Additional facts:
   - the console is connected to the board via UART0
   - the issue reproduces 100% of the time when running the `ostest` utility
   - the `smp` utility runs without problems
   - the issue does not reproduce on the older RP2040 MCU (different ARM cores)
   - the issue does not reproduce with SMP enabled and `CONFIG_SMP_NCPUS=1`
   - the issue reproduces even when `RP23XX_TESTSET_SPINLOCK` is changed from `0` to `31` (see the RP2350-E2 erratum)
   - the issue reproduces with today's `master`
   
   The output of the `ostest` utility is often partially cut off:
   ```
   ...
   user_main: nested signal handler test
   signest_test: Starting signal waiter task at priority 101
   signest_test: Started waiter_main pid=62
   waiter_main: Waiter started
   signest_test: Starting interfering task at priority 102
   waiter_main: Setting signal mask
   interfere_main: Waiting on semaphore
   waiter_main: Registering signal handler
   signest_test: Started interfere_main pid=63
   waiter_main: Waiting on semaphore
   signest_test: Simple case:
     Total signalled
   ```
   
   
   ### On which OS does this issue occur?
   
   [OS: Linux]
   
   ### What is the version of your OS?
   
   ArchLinux, Debian
   
   ### NuttX Version
   
   master
   
   ### Issue Architecture
   
   [Arch: arm]
   
   ### Issue Area
   
   [Area: Kernel]
   
   ### Host information
   
   I use two build environments; the issue reproduces 100% of the time in both.
   1: x86_64 PC with ArchLinux and the `arm-none-eabi-*` embedded toolchain
   2: aarch64 VM with Debian and the `arm-none-eabi-*` embedded toolchain
   
   ### Verification
   
   - [x] I have verified before submitting the report.

