LingaoM opened a new pull request, #19032:
URL: https://github.com/apache/nuttx/pull/19032

    **Summary**
     
     up_backtrace() on the sim arch previously delegated entirely to host_backtrace() (a thin wrapper around glibc's
     backtrace()). glibc's backtrace() only walks the calling host thread, so it returned a non-empty result only for
     the currently running NuttX task. As a consequence, when an assertion fired and dump_tasks() walked the task list
     calling sched_dumpstack() for every TCB, every task other than the crashing one returned a zero-length backtrace:
     sched_dumpstack() saw size <= 0, broke out of its loop, and printed nothing. The dump was effectively useless for
     understanding what other tasks were doing at the moment of the crash.
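
     As an illustration of the failure mode described above, here is a simplified, self-contained model of a chunked
     dump loop. It is a sketch, not the NuttX source: dump_stack(), old_backend(), DUMP_DEPTH and NFRAMES are
     hypothetical, chosen so that the running task prints one line of 8 entries and one of 3, like the "before" log in
     the Testing section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DUMP_DEPTH 8    /* hypothetical chunk size per printed line */
#define NFRAMES    11   /* hypothetical depth of the running task's stack */

/* Hypothetical backend signature: store up to `size` return addresses of
 * task `pid` in `buf`, skipping the first `skip` frames, and return the
 * number of entries stored.
 */
typedef int (*backtrace_fn)(int pid, void **buf, int size, int skip);

static const int g_running_pid = 3;

/* Models the old behaviour: only the running task yields any frames. */
static int old_backend(int pid, void **buf, int size, int skip)
{
  static char frames[NFRAMES];  /* stand-ins for return addresses */
  int n = 0;

  if (pid != g_running_pid)
    {
      return 0;
    }

  while (n < size && skip + n < NFRAMES)
    {
      buf[n] = &frames[skip + n];
      n++;
    }

  return n;
}

/* Chunked dump loop: prints one line per chunk and stops as soon as the
 * backend returns no frames.  Returns the number of lines printed.
 */
static int dump_stack(backtrace_fn bt, int pid)
{
  void *buf[DUMP_DEPTH];
  int size = DUMP_DEPTH;
  int lines = 0;
  int skip;
  int i;

  assert(bt != NULL);

  for (skip = 0; size == DUMP_DEPTH; skip += size)
    {
      size = bt(pid, buf, DUMP_DEPTH, skip);
      if (size <= 0)
        {
          break;  /* nothing to print for this task */
        }

      printf("backtrace|%2d:", pid);
      for (i = 0; i < size; i++)
        {
          printf(" %p", buf[i]);
        }

      printf("\n");
      lines++;
    }

  return lines;
}
```

     In this model, dump_stack(old_backend, 3) prints two lines, while any other PID prints nothing.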
   
     This patch keeps the existing host_backtrace() path for the running task (so DWARF unwinding through host libraries
     still works), and adds a frame-pointer walker for every other task. The walker starts from the registers saved in
     tcb->xcp.regs by setjmp() at the last context switch, then follows the [fp]=prev_fp, [fp+8]=ret_addr chain (offsets
     shown for a 64-bit host) until the stack range ends, fp becomes misaligned, or *fp is NULL.
   
     _A few facts make this safe across hosts:_
   
     - sim's setjmp/longjmp is NuttX's own (libs/libc/machine/sim/arch_setjmp_*.S), not host glibc. The assembly stores
     %rbp/%rsp/%rip (or ARM fp/sp/pc) directly with no PTR_MANGLE, so the saved values are real pointers on Linux, macOS
     and Windows alike.
     - The JB_FP / JB_SP / JB_PC indices in arch/sim/include/setjmp.h are 
uniformly defined for every supported host
     (x86, x86_64, ARM, ARM64).
     - The frame layout [fp]=prev fp, [fp+sizeof(uintptr_t)]=return addr is shared by the System V x86_64, Microsoft x64,
     x86, ARM AAPCS and ARM64 AAPCS64 ABIs when code is built with frame pointers.
   
     The walker is bounded by [stack_base_ptr, stack_base_ptr + adj_stack_size) and rejects a misaligned fp, so a
     corrupted task stack cannot make it read out of bounds. The walk is wrapped in enter_critical_section() to keep the
     target's saved registers stable while they are being followed.
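
     The walk and its stop conditions can be sketched as follows. This is a minimal illustration, not the code in the
     patch: fp_walk() and demo_depth() are hypothetical names, the stack is a synthetic array rather than a TCB's
     stack, the critical section is omitted, and stopping before recording a frame whose *fp is NULL is an assumption
     of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: follow a frame-pointer chain that must stay inside
 * [base, base + size).  Assumed frame layout: fp[0] = previous fp,
 * fp[1] = return address.  Returns the number of entries stored in buf.
 */
static int fp_walk(uintptr_t base, size_t size, uintptr_t fp, uintptr_t pc,
                   void **buf, int max)
{
  uintptr_t limit = base + size;
  int n = 0;

  assert(buf != NULL && max >= 0);

  /* The first entry is the saved pc (where the task was switched out). */
  if (n < max && pc != 0)
    {
      buf[n++] = (void *)pc;
    }

  while (n < max)
    {
      uintptr_t *frame;

      /* Both slots of the frame must lie inside the task's stack. */
      if (fp < base || fp >= limit || limit - fp < 2 * sizeof(uintptr_t))
        {
          break;
        }

      /* Reject a misaligned fp before dereferencing it. */
      if (fp % sizeof(uintptr_t) != 0)
        {
          break;
        }

      frame = (uintptr_t *)fp;
      if (frame[0] == 0)
        {
          break;  /* *fp is NULL: end of the chain */
        }

      buf[n++] = (void *)frame[1];  /* return address */
      fp = frame[0];                /* move to the caller's frame */
    }

  return n;
}

/* Builds a three-frame synthetic stack and returns how many entries the
 * walk produces.  mode 0: intact chain; mode 1: first link points outside
 * the stack; mode 2: first link is misaligned.
 */
static int demo_depth(int mode)
{
  uintptr_t stack[16] = {0};
  void *buf[8];

  stack[2]  = (uintptr_t)&stack[6];   stack[3]  = 0x1001;
  stack[6]  = (uintptr_t)&stack[10];  stack[7]  = 0x1002;
  stack[10] = 0;                      stack[11] = 0x1003;

  if (mode == 1)
    {
      stack[2] = (uintptr_t)&stack[0] + sizeof(stack) + 64;
    }
  else if (mode == 2)
    {
      stack[2] += 1;
    }

  return fp_walk((uintptr_t)stack, sizeof(stack), (uintptr_t)&stack[2],
                 0x1000, buf, 8);
}
```

     With the intact chain the sketch yields three entries (the saved pc plus two return addresses). With either
     corrupted link it stops after two, without ever dereferencing the bad pointer.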
   
     **Impact**
   
     - Users: when an assertion or panic fires on sim, the per-task backtraces produced by dump_tasks() are now
     populated for every task. This is a visible improvement to debug output; see the test logs below for before/after.
     - Build: requires CONFIG_FRAME_POINTER=y for the new path to work. This is 
already the recommended/default setting
     on sim defconfigs and is required for the existing per-task fp-walking 
already used by other archs
     (arch/arm/src/common/arm_backtrace_fp.c, etc.). When the option is off the 
running-task path still works exactly as
     before.
     - API / ABI: no change. up_backtrace() signature, return value contract 
and sched_dumpstack() output format are
     unchanged.
     - Hardware: sim only.
     - Security: the walker is bounded by the task's own stack_base_ptr / adj_stack_size and rejects a misaligned fp, so
     a corrupted stack cannot turn the walk into an arbitrary read. No new attack surface is exposed.
     - Compatibility: works on Linux/macOS/Windows hosts and on every host arch 
sim already supports (x86, x86_64, ARM,
     ARM64), because the underlying setjmp layout is provided by NuttX itself 
and identical on all of them.
   
     **Testing**
   
     Host machine: Linux 6.8.0-124-generic, x86_64 (Ubuntu), gcc.
     Board / config: sim:nsh-style build (zblue bluetooth tester defconfig) 
with CONFIG_FRAME_POINTER=y,
     CONFIG_SCHED_BACKTRACE=y, CONFIG_ALLSYMS=y.
     How tested: built nuttx, triggered an assertion in a user task 
(observer_main calling assert(0)), and inspected the
     dump that _assert() produces via dump_tasks() → dump_backtrace() → 
sched_dumpstack() for every task in the system.
   
     _Before this patch_
   
     Only the running task (PID 3, the one that hit the assert) produced a backtrace. The other four tasks were silently
     dropped because up_backtrace() returned 0 for them and sched_dumpstack() broke out of its loop:
   
   ```
     backtrace| 3: 0x0000000000439fdb 0x0000000000439dc0 0x00000000004398fa 0x0000000000434bf9 0x0000000000430733 0x000000000040a09f 0x0000000000429833 0x000000000040ebb1
     backtrace| 3: 0x000000000040a5b7 0x0000000000405cb2 0x000000000040d25e
        PID GROUP PRI POLICY   TYPE    NPX STATE   EVENT      SIGMASK          STACKBASE  STACKSIZE      USED   FILLED     COMMAND
           0     0   0 FIFO     Kthread N-- Ready              0000000000000000 0x7ffd997d0020     69616      1224     1.7%    Idle_Task
           1     1 224 FIFO     Kthread --- Waiting Signal     0000000000000000 0x7c8f07e01220     67536       824     1.2%    loop_task
           2     2 224 FIFO     Kthread --- Waiting Semaphore  0000000000000000 0x7c8f07e12050     67504       680     1.0%    hpwork
           3     3 100 FIFO     Task    --- Running            0000000000000000 0x7c8f07e22df0     73680      5640     7.6%    observer_main
           4     3 110 FIFO     pthread --- Waiting Semaphore  0000000000000000 0x7c8f07e34f70     67568       808     1.1%    sysworkq
   ```
     (no per-task backtraces emitted for PIDs 0,1,2,4)
   
     _After this patch_
   
     Every task produces a backtrace, fully resolved through %pS symbol 
formatting (CONFIG_ALLSYMS=y):
   ```
     ASSERTION FAIL [0] @ samples/bluetooth/observer_pref/src/main.c:508
        PID GROUP PRI POLICY   TYPE    NPX STATE   EVENT      SIGMASK          STACKBASE  STACKSIZE      USED   FILLED     COMMAND
           0     0   0 FIFO     Kthread N-- Ready              0000000000000000 0x7ffde15050b0     69616      1224     1.7%    Idle_Task
           1     1 224 FIFO     Kthread --- Waiting Signal     0000000000000000 0x71e173201220     67536       824     1.2%    loop_task
           2     2 224 FIFO     Kthread --- Waiting Semaphore  0000000000000000 0x71e173212050     67504       680     1.0%    hpwork
           3     3 100 FIFO     Task    --- Running            0000000000000000 0x71e173222df0     73680      5480     7.4%    observer_main
           4     3 110 FIFO     pthread --- Waiting Semaphore  0000000000000000 0x71e173234f70     67568       808     1.1%    sysworkq
   
     backtrace:
     [ 0] [<0x4139c8>] up_switch_context+0x68/0xb0
     [ 0] [<0x40a8b2>] sched_unlock+0x92/0xa0
   
     backtrace:
     [ 1] [<0x4139c8>] up_switch_context+0x68/0xb0
     [ 1] [<0x438f75>] nxsig_timedwait+0x235/0x2d0
     [ 1] [<0x438a7a>] nxsig_nanosleep+0x7a/0x150
     [ 1] [<0x438c0b>] clock_nanosleep+0xbb/0x110
     [ 1] [<0x43dc61>] NXusleep+0x61/0x70
     [ 1] [<0x4135b4>] sim_loop_task+0x34/0x40
     [ 1] [<0x40bd51>] nxtask_start+0x51/0x70
     [ 1] [<0x41361e>] pre_start+0x1e/0x30
   
     backtrace:
     [ 2] [<0x4139c8>] up_switch_context+0x68/0xb0
     [ 2] [<0x40b41f>] nxsem_wait+0xbf/0xd0
     [ 2] [<0x40b448>] nxsem_wait_uninterruptible+0x18/0x30
     [ 2] [<0x40ad7b>] work_thread+0x11b/0x140
     [ 2] [<0x40bd51>] nxtask_start+0x51/0x70
     [ 2] [<0x41361e>] pre_start+0x1e/0x30
   
     backtrace:
     [ 3] [<0x4402db>] host_backtrace+0x2b/0x50
     [ 3] [<0x4400c0>] up_backtrace+0x1a0/0x1f0
     [ 3] [<0x43fbfa>] sched_backtrace+0x3a/0x70
     [ 3] [<0x43afcd>] sched_dumpstack+0x7d/0x110
     [ 3] [<0x4366b0>] dump_backtrace+0x10/0x20
     [ 3] [<0x437166>] nxsched_foreach+0x46/0x70
     [ 3] [<0x436b24>] _assert+0x1d4/0x280
     [ 3] [<0x41014f>] __assert+0xf/0x20
     [ 3] [<0x42fbf3>] assert_post_action+0x13/0x20
     [ 3] [<0x414f71>] observer_main+0xc1/0x1d0
     [ 3] [<0x410667>] nxtask_startup+0x27/0x30
     [ 3] [<0x40bd62>] nxtask_start+0x62/0x70
     [ 3] [<0x41361e>] pre_start+0x1e/0x30
   
     backtrace:
     [ 4] [<0x4139c8>] up_switch_context+0x68/0xb0
     [ 4] [<0x40b41f>] nxsem_wait+0xbf/0xd0
     [ 4] [<0x40b448>] nxsem_wait_uninterruptible+0x18/0x30
     [ 4] [<0x4312c5>] k_sem_take+0x35/0x40
     [ 4] [<0x42fcbc>] z_sched_wait+0x7c/0xa0
     [ 4] [<0x4300cb>] work_queue_main+0x1ab/0x250
     [ 4] [<0x431779>] k_thread_main+0x69/0x90
     [ 4] [<0x43ae81>] pthread_startup+0x11/0x20
     [ 4] [<0x43f90f>] pthread_start+0x2f/0x40
     [ 4] [<0x41361e>] pre_start+0x1e/0x30
   ```
   
     Each non-running task's first frame is up_switch_context() (the address 
that setjmp() saved at the last context
     switch), and the chain unwinds correctly all the way down to pre_start, 
which matches expectations:
   
     - PID 0 (Idle): blocked inside sched_unlock() having just yielded
     - PID 1 (loop_task): sleeping in clock_nanosleep
     - PID 2 (hpwork): waiting on its sem inside work_thread
     - PID 4 (sysworkq pthread): waiting on k_sem_take from zblue's 
work_queue_main
   
     The running task (PID 3) still uses host_backtrace() and the dump shows it 
correctly walking through _assert -> 
     __assert -> observer_main.
   
     Sanity: make -j$(nproc) builds cleanly with no new warnings; the existing assertion path (assert(rtcb)) on the
     running task is unchanged.

