LingaoM opened a new pull request, #19032:
URL: https://github.com/apache/nuttx/pull/19032
**Summary**
up_backtrace() on the sim arch previously delegated entirely to host_backtrace(), a thin wrapper around glibc's backtrace(). glibc's backtrace() only walks the calling host thread, so it returned a non-empty result only for the currently running NuttX task. As a consequence, when an assertion fired and dump_tasks() walked the task list, calling sched_dumpstack() for every TCB, every task other than the crashing one returned a zero-length backtrace: sched_dumpstack() saw size <= 0, broke out of its loop, and printed nothing. The dump was therefore of little use for understanding what other tasks were doing at the moment of the crash.
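To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the NuttX source: fake_backtrace() and dump_task() are hypothetical stand-ins for the per-task backtrace call and the sched_dumpstack() loop, and the stand-in simply mimics the old sim behaviour by returning 0 for any task other than the running one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the old sim backtrace: only the running task
 * yields frames; every other task yields a zero-length result.
 */

static int fake_backtrace(int pid, int running_pid, void **buf, int size,
                          int skip)
{
  static const uintptr_t frames[3] = { 0x1000, 0x2000, 0x3000 };
  int n = 0;

  if (pid != running_pid)
    {
      return 0;
    }

  while (skip + n < 3 && n < size)
    {
      buf[n] = (void *)frames[skip + n];
      n++;
    }

  return n;
}

/* Hypothetical dump loop: fetch addresses in chunks and stop as soon as
 * the backtrace call reports size <= 0.  Returns the number of addresses
 * printed.
 */

static int dump_task(int pid, int running_pid)
{
  void *addr[2];
  int printed = 0;
  int skip = 0;

  assert(pid >= 0);

  for (; ; )
    {
      int size = fake_backtrace(pid, running_pid, addr, 2, skip);
      int i;

      if (size <= 0)
        {
          break; /* Nothing (more) to print for this task */
        }

      printf("backtrace|%2d:", pid);
      for (i = 0; i < size; i++)
        {
          printf(" %p", addr[i]);
        }

      printf("\n");
      printed += size;
      skip += size;
    }

  return printed;
}
```

With the running task set to PID 3, dump_task(3, 3) prints all three addresses, while dump_task(1, 3) leaves the loop on the first iteration and prints nothing.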
This patch keeps the existing host_backtrace() path for the running task (so DWARF unwinding through host libraries still works), and adds a frame-pointer walker for every other task. The walker starts from the registers saved in tcb->xcp.regs by setjmp() at the last context switch, then follows the [fp]=prev_fp, [fp+sizeof(uintptr_t)]=ret_addr chain (offset 8 on 64-bit hosts) until fp leaves the stack range, fp becomes misaligned, or *fp is NULL.
_A few facts make this safe across hosts:_
- sim's setjmp/longjmp is NuttX's own (libs/libc/machine/sim/arch_setjmp_*.S), not host glibc. The assembly stores %rbp/%rsp/%rip (or ARM fp/sp/pc) directly with no PTR_MANGLE, so the saved values are real pointers on Linux, macOS and Windows alike.
- The JB_FP / JB_SP / JB_PC indices in arch/sim/include/setjmp.h are uniformly defined for every supported host (x86, x86_64, ARM, ARM64).
- The frame layout [fp]=prev fp, [fp+sizeof(uintptr_t)]=return addr is shared by the System V x86_64, Microsoft x64, x86, ARM AAPCS and ARM64 AAPCS64 ABIs.
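The frame layout in the last point can be spot-checked on a build host with a few lines of C. This is only a sketch and makes assumptions that are not part of the patch: it relies on the GCC/Clang builtins __builtin_frame_address() and __builtin_return_address(), and the helper name frame_layout_matches() is hypothetical. It has been reasoned through for x86_64 and ARM64 hosts only; other hosts or compilers may lay out the frame record differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when the word directly above the saved
 * frame pointer holds this function's return address, that is, when
 * [fp + sizeof(uintptr_t)] == return address.
 *
 * noinline keeps the function in its own frame so that the builtins
 * describe this call rather than the caller's.
 */

__attribute__((noinline)) int frame_layout_matches(void)
{
  const uintptr_t *fp = (const uintptr_t *)__builtin_frame_address(0);

  assert(fp != 0);

  /* fp[0] is the caller's frame pointer, fp[1] the return address */

  return fp[1] == (uintptr_t)__builtin_return_address(0);
}
```

On a host where the assumed layout holds, frame_layout_matches() returns 1; a return value of 0 would suggest that a frame-pointer walk of this shape is not valid for that compiler and host combination.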
The walker is bounded by [stack_base_ptr, stack_base_ptr + adj_stack_size) and rejects misaligned fp values, so a corrupted task stack cannot make it read out of bounds. The walk is wrapped in enter_critical_section() to keep the target's saved registers stable while they are being followed.
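The walk described above can be sketched as follows. This is an illustration under stated assumptions, not the code from this patch: fp_walk() and its parameters are hypothetical, the stack range is passed in explicitly rather than read from a TCB, and the critical section is omitted so that the sketch builds outside NuttX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounded frame-pointer walk.
 *
 *   base, size - the stack range [base, base + size)
 *   fp, pc     - saved frame pointer and program counter to start from
 *   buffer/max - output array and its capacity
 *
 * Returns the number of addresses stored in buffer.
 */

static int fp_walk(uintptr_t base, size_t size, uintptr_t fp, uintptr_t pc,
                   void **buffer, int max)
{
  const uintptr_t slot = sizeof(uintptr_t);
  int n = 0;

  assert(buffer != NULL);

  if (max <= 0 || size < 2 * slot)
    {
      return 0;
    }

  buffer[n++] = (void *)pc; /* First entry: where the task was switched out */

  while (n < max)
    {
      const uintptr_t *frame;

      /* Both [fp] and [fp + slot] must lie inside the stack range */

      if (fp < base || fp > base + size - 2 * slot)
        {
          break;
        }

      if ((fp & (slot - 1)) != 0)
        {
          break; /* Misaligned frame pointer */
        }

      frame = (const uintptr_t *)fp;
      if (frame[0] == 0)
        {
          break; /* *fp is NULL: end of the chain */
        }

      buffer[n++] = (void *)frame[1]; /* Return address */
      fp = frame[0];                  /* Caller's frame pointer */
    }

  return n;
}
```

Because every dereference is preceded by the range and alignment checks, a frame pointer that has been overwritten with an arbitrary value ends the walk instead of being followed. The loop is also capped by max, so a chain that loops back on itself still terminates.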
**Impact**
- Users: when an assertion or panic fires on sim, the per-task backtraces produced by dump_tasks() are now populated for every task. This is a visible improvement to debug output; see the test logs below for before/after.
- Build: the new path requires CONFIG_FRAME_POINTER=y. This is already the recommended/default setting on sim defconfigs and is required for the existing per-task fp-walking used by other archs (arch/arm/src/common/arm_backtrace_fp.c, etc.). When the option is off, the running-task path still works exactly as before.
- API / ABI: no change. The up_backtrace() signature, return value contract and sched_dumpstack() output format are unchanged.
- Hardware: sim only.
- Security: the walker is bounded by the task's own stack_base_ptr / adj_stack_size and rejects misaligned fp, so a corrupted stack cannot turn the walk into an arbitrary read. No new attack surface is exposed.
- Compatibility: works on Linux/macOS/Windows hosts and on every host arch sim already supports (x86, x86_64, ARM, ARM64), because the underlying setjmp layout is provided by NuttX itself and is identical on all of them.
**Testing**
Host machine: Linux 6.8.0-124-generic, x86_64 (Ubuntu), gcc.
Board / config: sim:nsh-style build (zblue bluetooth tester defconfig)
with CONFIG_FRAME_POINTER=y,
CONFIG_SCHED_BACKTRACE=y, CONFIG_ALLSYMS=y.
How tested: built nuttx, triggered an assertion in a user task
(observer_main calling assert(0)), and inspected the
dump that _assert() produces via dump_tasks() → dump_backtrace() →
sched_dumpstack() for every task in the system.
_Before this patch_
Only the running task (PID 3, the one that hit the assert) produced a backtrace. The other four tasks were silently dropped because up_backtrace() returned 0 for them and sched_dumpstack() broke out of its loop:
```
backtrace| 3: 0x0000000000439fdb 0x0000000000439dc0 0x00000000004398fa 0x0000000000434bf9 0x0000000000430733 0x000000000040a09f 0x0000000000429833 0x000000000040ebb1
backtrace| 3: 0x000000000040a5b7 0x0000000000405cb2 0x000000000040d25e
PID GROUP PRI POLICY TYPE    NPX STATE   EVENT     SIGMASK          STACKBASE      STACKSIZE USED FILLED COMMAND
  0     0   0 FIFO   Kthread N-- Ready             0000000000000000 0x7ffd997d0020     69616 1224   1.7% Idle_Task
  1     1 224 FIFO   Kthread --- Waiting Signal    0000000000000000 0x7c8f07e01220     67536  824   1.2% loop_task
  2     2 224 FIFO   Kthread --- Waiting Semaphore 0000000000000000 0x7c8f07e12050     67504  680   1.0% hpwork
  3     3 100 FIFO   Task    --- Running           0000000000000000 0x7c8f07e22df0     73680 5640   7.6% observer_main
  4     3 110 FIFO   pthread --- Waiting Semaphore 0000000000000000 0x7c8f07e34f70     67568  808   1.1% sysworkq
```
(no per-task backtraces emitted for PIDs 0,1,2,4)
_After this patch_
Every task produces a backtrace, fully resolved through %pS symbol
formatting (CONFIG_ALLSYMS=y):
```
ASSERTION FAIL [0] @ samples/bluetooth/observer_pref/src/main.c:508
PID GROUP PRI POLICY TYPE    NPX STATE   EVENT     SIGMASK          STACKBASE      STACKSIZE USED FILLED COMMAND
  0     0   0 FIFO   Kthread N-- Ready             0000000000000000 0x7ffde15050b0     69616 1224   1.7% Idle_Task
  1     1 224 FIFO   Kthread --- Waiting Signal    0000000000000000 0x71e173201220     67536  824   1.2% loop_task
  2     2 224 FIFO   Kthread --- Waiting Semaphore 0000000000000000 0x71e173212050     67504  680   1.0% hpwork
  3     3 100 FIFO   Task    --- Running           0000000000000000 0x71e173222df0     73680 5480   7.4% observer_main
  4     3 110 FIFO   pthread --- Waiting Semaphore 0000000000000000 0x71e173234f70     67568  808   1.1% sysworkq
backtrace:
[ 0] [<0x4139c8>] up_switch_context+0x68/0xb0
[ 0] [<0x40a8b2>] sched_unlock+0x92/0xa0
backtrace:
[ 1] [<0x4139c8>] up_switch_context+0x68/0xb0
[ 1] [<0x438f75>] nxsig_timedwait+0x235/0x2d0
[ 1] [<0x438a7a>] nxsig_nanosleep+0x7a/0x150
[ 1] [<0x438c0b>] clock_nanosleep+0xbb/0x110
[ 1] [<0x43dc61>] NXusleep+0x61/0x70
[ 1] [<0x4135b4>] sim_loop_task+0x34/0x40
[ 1] [<0x40bd51>] nxtask_start+0x51/0x70
[ 1] [<0x41361e>] pre_start+0x1e/0x30
backtrace:
[ 2] [<0x4139c8>] up_switch_context+0x68/0xb0
[ 2] [<0x40b41f>] nxsem_wait+0xbf/0xd0
[ 2] [<0x40b448>] nxsem_wait_uninterruptible+0x18/0x30
[ 2] [<0x40ad7b>] work_thread+0x11b/0x140
[ 2] [<0x40bd51>] nxtask_start+0x51/0x70
[ 2] [<0x41361e>] pre_start+0x1e/0x30
backtrace:
[ 3] [<0x4402db>] host_backtrace+0x2b/0x50
[ 3] [<0x4400c0>] up_backtrace+0x1a0/0x1f0
[ 3] [<0x43fbfa>] sched_backtrace+0x3a/0x70
[ 3] [<0x43afcd>] sched_dumpstack+0x7d/0x110
[ 3] [<0x4366b0>] dump_backtrace+0x10/0x20
[ 3] [<0x437166>] nxsched_foreach+0x46/0x70
[ 3] [<0x436b24>] _assert+0x1d4/0x280
[ 3] [<0x41014f>] __assert+0xf/0x20
[ 3] [<0x42fbf3>] assert_post_action+0x13/0x20
[ 3] [<0x414f71>] observer_main+0xc1/0x1d0
[ 3] [<0x410667>] nxtask_startup+0x27/0x30
[ 3] [<0x40bd62>] nxtask_start+0x62/0x70
[ 3] [<0x41361e>] pre_start+0x1e/0x30
backtrace:
[ 4] [<0x4139c8>] up_switch_context+0x68/0xb0
[ 4] [<0x40b41f>] nxsem_wait+0xbf/0xd0
[ 4] [<0x40b448>] nxsem_wait_uninterruptible+0x18/0x30
[ 4] [<0x4312c5>] k_sem_take+0x35/0x40
[ 4] [<0x42fcbc>] z_sched_wait+0x7c/0xa0
[ 4] [<0x4300cb>] work_queue_main+0x1ab/0x250
[ 4] [<0x431779>] k_thread_main+0x69/0x90
[ 4] [<0x43ae81>] pthread_startup+0x11/0x20
[ 4] [<0x43f90f>] pthread_start+0x2f/0x40
[ 4] [<0x41361e>] pre_start+0x1e/0x30
```
Each non-running task's first frame is up_switch_context() (the address
that setjmp() saved at the last context
switch), and the chain unwinds correctly all the way down to pre_start,
which matches expectations:
- PID 0 (Idle): blocked inside sched_unlock() having just yielded
- PID 1 (loop_task): sleeping in clock_nanosleep
- PID 2 (hpwork): waiting on its sem inside work_thread
- PID 4 (sysworkq pthread): waiting on k_sem_take from zblue's
work_queue_main
The running task (PID 3) still uses host_backtrace() and the dump shows it
correctly walking through _assert ->
__assert -> observer_main.
Sanity: make -j$(nproc) builds cleanly with no new warnings; the existing assertion path (assert(rtcb)) on the running task is unchanged.