Hi Peter,

That is interesting (and strange) indeed. IIRC the only difference between those two chips is that the 753 has built-in crypto accelerators while the 743 does not. I believe that a firmware image built for one will work correctly on the other (provided, obviously, that the firmware does not attempt to access the crypto accelerators).

Did you make a separate build for each chip? Or did you flash an *identical* image, built with stack size = 2048, to both boards, and that same image succeeded on the 753 and failed on the 743? I'm asking because an identical image would call for a quite different debugging strategy than a separate build for each chip.

Thanks,
Nathan

On Sun, Feb 8, 2026 at 3:11 PM Peter Barada <[email protected]> wrote:

> Nathan,
>
> What's strange is that the same master source (nuttx hash
> e83606732d5e71eb98a9eb544537dbbeb71aa58b, apps hash
> d48b45000d1d083082f7a1650f351573c36a87d0) with INIT_STACKSIZE=2048 in the
> default .config fails on nucleo-h743zi2 but passes on nucleo-h743zi2 (run on
> my nucleo-h753zi board) when I try "time ls". I turned on all the stack
> checks just to be sure nucleo-f446re wasn't just "lucky".
>
> On 2/7/26 23:54, Nathan Hartman wrote:
>
> Yeah, it's usually the stack, but does anyone know why it needs to be
> enlarged now? Is something using more stack than before?
>
> On Sat, Feb 7, 2026 at 5:28 PM Peter Barada <[email protected]> wrote:
>
>> Cranking up CONFIG_INIT_STACKSIZE to 3072 fixes the issue.
>>
>> I tried enabling STACK_COLORATION, STACK_USAGE, and ARMV7M_STACKTRACE
>> while leaving INIT_STACKSIZE at 2048, hoping to debug using
>> STM32CubeIDE, but when I try "time ls" the GDB session is lost (which
>> seems strange).
>>
>> If I then enable ARMV7M_STACKCHECK_BREAKPOINT, GDB stops when it detects
>> the stack overflow, and I can get a call stack to understand why, but I
>> can't continue (to show the dump).
>>
>> Finally, after enabling ARCH_STACKDUMP, ARMV7M_STACKCHECK,
>> SCHED_BACKTRACE, STACK_COLORATION, and STACK_USAGE, disabling
>> STACKCHECK_BREAKPOINT, and setting ARCH_INTERRUPTSTACK=2048 and
>> ARCH_STACKDUMP_MAX_LENGTH=1024, I get a full dump when it detects the
>> stack overflow.
>>
>> Thanks for the help!
>>
>> On 2/7/26 03:25, raiden00pl wrote:
>> > hi, this is a 100% stack issue. Increase all stack sizes to at least 4092.
>> > Another option is to enable full optimisation with CONFIG_DEBUG_FULLOPT=y,
>> > should also help.
>> >
>> > quick tip: about 80% of crashes in NuttX are stack issues, the first thing
>> > you always do when such crashes occur is to increase all stack sizes :)
>> >
>> > On Sat, 7 Feb 2026 at 04:02, Matteo Golin <[email protected]> wrote:
>> >
>> >> I am not familiar enough, but there should be an option for stack canaries.
>> >> I haven't had much luck with that configuration, and I imagine that your
>> >> DEBUGASSERT will trigger before stack smashing is detected.
>> >>
>> >> Matteo
>> >>
>> >> On Fri, Feb 6, 2026, 8:45 PM Peter Barada <[email protected]> wrote:
>> >>
>> >>> Haven't tried yet (personally I feel I should know _why_ it happens) - is
>> >>> there a config for compiling in stack checking on function entry?
>> >>>
>> >>> On 2/6/26 20:22, Matteo Golin wrote:
>> >>>
>> >>> Hmmm, if the problem goes that far back it may not be worth triaging that
>> >>> way. Things have probably diverged so much since then. No luck with the
>> >>> stack increase?
>> >>>
>> >>> Matteo
>> >>>
>> >>> On Fri, Feb 6, 2026, 8:18 PM Peter Barada <[email protected]> wrote:
>> >>>> Matteo,
>> >>>>
>> >>>> I'm walking back release points and have had to change board
>> >>>> configuration names (to nucleo-h743zi) and rename nuttx-apps to apps, and
>> >>>> I'm still seeing the fault in the release/11.0 branch.
>> >>>>
>> >>>> I'm trying to go back further but wondering if I'll find a bisect start
>> >>>> point...
>> >>>>
>> >>>> On 2/6/26 17:05, Matteo Golin wrote:
>> >>>>
>> >>>> Hi Peter,
>> >>>>
>> >>>> My approach is kind of a headache since bisecting over an area where apps
>> >>>> and NuttX are not always in sync is a major limitation of the split repo.
>> >>>> My approach is usually:
>> >>>>
>> >>>> - Start the bisect in kernel
>> >>>> - Check the commit date of the current HEAD
>> >>>> - Check out to a commit of the same/similar date in apps
>> >>>> - Build
>> >>>> - Mentally note if this commit was good or bad based on the results of
>> >>>>   running the image
>> >>>> - make distclean (avoids artifacts carrying over between bisections and
>> >>>>   breaking everything)
>> >>>> - Mark commit good or bad with git bisect
>> >>>>
>> >>>> Then basically repeat this until bisecting is finished. It sucks and I
>> >>>> did suggest a script in /tools/ to try and automate most of this, but I
>> >>>> never got around to writing it.
>> >>>>
>> >>>> I would suggest you start by checking for the issue on a stable release
>> >>>> (e.g. 12.12.0) to see if that's a good commit you can start from. Usually
>> >>>> those releases have a higher degree of testing because everyone who voted
>> >>>> for the release ran some images on their hardware.
>> >>>>
>> >>>> That's honestly a lot of work but you never know if it'll end up being
>> >>>> faster than trying to triage with logs!
>> >>>>
>> >>>> Matteo
>> >>>>
>> >>>> On Fri, Feb 6, 2026, 4:50 PM Nathan Hartman <[email protected]> wrote:
>> >>>>
>> >>>>> First place I would look: is the stack overflowing? (You could try
>> >>>>> enabling some of the stack debugging features.)
>> >>>>>
>> >>>>> On Fri, Feb 6, 2026 at 4:34 PM Peter Barada <[email protected]> wrote:
>> >>>>>
>> >>>>>> Matteo,
>> >>>>>>
>> >>>>>> I don't know if this was working before but if you can suggest a good
>> >>>>>> starting point I can cycle through git bisect to narrow down to the
>> >>>>>> failing commit. What's the best approach to using git bisect across
>> >>>>>> multiple repos (since changes in nuttx may have necessary changes in
>> >>>>>> nuttx-apps and need to keep them in sync at each build point)?
>> >>>>>>
>> >>>>>> As an aside, I also have a nucleo-f446re board, and 'time ls' works fine
>> >>>>>> there.
>> >>>>>>
>> >>>>>> Further, does anyone have GDB scripts that make it easier to decipher
>> >>>>>> NuttX structures from memory (e.g. dump task/semaphore lists, etc)? I've
>> >>>>>> started cobbling snippets together but figured I'd ask before reinventing
>> >>>>>> the wheel.
>> >>>>>>
>> >>>>>> On 2/6/26 16:12, Matteo Golin wrote:
>> >>>>>>> Hi Peter,
>> >>>>>>>
>> >>>>>>> If you happen to know that this was working before on an older NuttX
>> >>>>>>> version, you could use git bisect to narrow down the breaking commit.
>> >>>>>>> Then the issue might be clearer.
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> Matteo
>> >>>>>>>
>> >>>>>>> On Fri, Feb 6, 2026, 4:09 PM Peter Barada <[email protected]> wrote:
>> >>>>>>>
>> >>>>>>> I have an STM32 Nucleo-h753zi board and configured a build for
>> >>>>>>> nucleo-h743zi2:nsh (which is the closest board/chip; the stm32h753zi is
>> >>>>>>> the same as the stm32h743zi, but the h753zi includes crypto acceleration
>> >>>>>>> hardware).
>> >>>>>>> Build works, but if I boot and try 'time ls' nuttx faults:
>> >>>>>>>
>> >>>>>>>     nsh> uname -a
>> >>>>>>>     NuttX 0.0.0 9ecfff0833 Feb 6 2026 15:45:28 arm nucleo-h743zi2
>> >>>>>>>     nsh> time ls
>> >>>>>>>     /:
>> >>>>>>>      dev/
>> >>>>>>>
>> >>>>>>>     0.00dump_assert_info: Current Version: NuttX 0.0.0 9ecfff0833 Feb 6 2026 15:45:28 arm
>> >>>>>>>     dump_assert_info: Assertion failed panic: at file: :0 task: <noname> process: <noname> 0x800c9fd
>> >>>>>>>     up_dump_register: R0: 0801e624 R1: 0000000a R2: 00000050 R3: 0000000a
>> >>>>>>>     up_dump_register: R4: 00000001 R5: 240000e4 R6: 00000000 FP: 00000000
>> >>>>>>>     up_dump_register: R8: 00000000 SB: 00000000 SL: 00000000 R11: 00000000
>> >>>>>>>     up_dump_register: IP: 00000000 SP: 38000c08 LR: 080059db PC: 08005984
>> >>>>>>>     up_dump_register: xPSR: 41000000 BASEPRI: 00000000 CONTROL: 00000000
>> >>>>>>>     up_dump_register: EXC_RETURN: ffffffe9
>> >>>>>>>     dump_stackinfo: User Stack:
>> >>>>>>>     dump_stackinfo:   base: 0x38000518
>> >>>>>>>     dump_stackinfo:   size: 00002000
>> >>>>>>>     dump_stackinfo:   sp: 0x38000c08
>> >>>>>>>     stack_dump: 0x38000be8: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> >>>>>>>     stack_dump: 0x38000c08: 0000000a 0801e624 0801e624 38000200 38000fac 00000000 0801e624 080172c1
>> >>>>>>>     stack_dump: 0x38000c28: 00000000 0801e624 38000200 38000158 00000000 00000000 38000fac 0800caa1
>> >>>>>>>     stack_dump: 0x38000c48: 00000000 0800cc77 0801e624 000002fc 38000500 00000001 00000001 38000cf0
>> >>>>>>>     stack_dump: 0x38000c68: 38000cf0 00000008 38000200 00000000 00000000 0800ca79 38000500 00000001
>> >>>>>>>     stack_dump: 0x38000c88: 00000064 38000cf0 00000064 0800ca33 38000500 00000001 00000064 00000000
>> >>>>>>>     stack_dump: 0x38000ca8: 00000000 08009325 00000000 38000500 00000001 0800c9fd 00000000 080052f1
>> >>>>>>>     stack_dump: 0x38000cc8: 00000000 38000500 00000000 38000158 00000001 00000001 00000000 00000000
>> >>>>>>>     stack_dump: 0x38000ce8: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> >>>>>>>     dump_tasks: PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACKBASE STACKSIZE COMMAND
>> >>>>>>>     dump_task: 0 0 0 FIFO Kthread - Ready 0000000000000000 0x240018b0 1000 <noname>
>> >>>>>>>     dump_task: 1 1 100 RR Task - Running 0000000000000000 0x38000518 2000 <noname> ��]���&
>> >>>>>>>
>> >>>>>>> Wondering if anyone has run across this before? Backtrace shows:
>> >>>>>>>
>> >>>>>>>     Program received signal SIGTRAP, Trace/breakpoint trap.
>> >>>>>>>     exception_common () at armv7-m/arm_exception.S:127
>> >>>>>>>     127 mrs r0, ipsr /* R0=exception number */
>> >>>>>>>     where
>> >>>>>>>     #0 exception_common () at armv7-m/arm_exception.S:127
>> >>>>>>>     #1 <signal handler called>
>> >>>>>>>     #2 0x08005984 in env_cmpname (pszname=0x801e624 "PS1",
>> >>>>>>>        peqname=0xa <error: Cannot access memory at address 0xa>)
>> >>>>>>>        at environ/env_findvar.c:50
>> >>>>>>>     #3 0x080059da in env_findvar (group=0x38000200, pname=0x801e624 "PS1")
>> >>>>>>>        at environ/env_findvar.c:105
>> >>>>>>>     #4 0x080172c0 in getenv (name=0x801e624 "PS1") at environ/env_getenv.c:89
>> >>>>>>>     #5 0x0800caa0 in nsh_update_prompt () at nsh_prompt.c:77
>> >>>>>>>     #6 0x0800cc76 in nsh_session (pstate=0x38000cf0, login=1, argc=1,
>> >>>>>>>        argv=0x38000500) at nsh_session.c:249
>> >>>>>>>     #7 0x0800ca78 in nsh_consolemain (argc=1, argv=0x38000500)
>> >>>>>>>        at nsh_consolemain.c:77
>> >>>>>>>     #8 0x0800ca32 in nsh_main (argc=1, argv=0x38000500) at nsh_main.c:76
>> >>>>>>>     #9 0x08009324 in nxtask_startup (entrypt=0x800c9fd <nsh_main>, argc=1,
>> >>>>>>>        argv=0x38000500) at sched/task_startup.c:72
>> >>>>>>>     #10 0x080052f0 in nxtask_start () at task/task_start.c:104
>> >>>>>>>     #11 0x00000000 in ?? ()
>> >>>>>>>
>> >>>>>>> Scratching the surface shows that env_findvar() is called with a group
>> >>>>>>> pointer of 0x38000200, and group->tg_envp is 0x380004b8, both of which are
>> >>>>>>> reasonable. But *group->tg_envp is 0xA. Further, if I "watch
>> >>>>>>> *(int*)0x380004b8" in GDB, I see it is getting overwritten by
>> >>>>>>> up_serialout() invoked from stm32_serial.c::up_send.
>> >>>>>>>
>> >>>>>>> Any suggestions on how I can best track this down further?
>> >>>>>>>
>> >>>>>>> Thanks in advance!
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Peter Barada
>> >>>>>>> [email protected]
>> >>>>>>>
>> >>>>>> --
>> >>>>>> Peter Barada
>> >>>>>> [email protected]
>> >>>>>>
>> >>>>> --
>> >>>> Peter [email protected]
>> >>>>
>> >>>> --
>> >>> Peter [email protected]
>> >>>
>> >>>
>> --
>> Peter Barada
>> [email protected]
>>
>> --
> Peter [email protected]
>
>
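
A quick sanity check on the addresses in the original report is consistent with the stack-overflow reading. The sketch below only redoes the arithmetic on values copied from the dump_stackinfo output and the GDB session. It assumes the "size" field (00002000) is decimal, which matches the STACKSIZE column of dump_tasks, and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Values copied from the dump_stackinfo output and GDB session quoted above.
stack_base=$((0x38000518))   # dump_stackinfo "base"
stack_size=2000              # dump_stackinfo "size"; ASSUMED to be decimal
sp=$((0x38000c08))           # dump_stackinfo "sp"
envp_addr=$((0x380004b8))    # value of group->tg_envp (the watched address)

# Highest address of the stack region; a descending stack starts here.
stack_top=$((stack_base + stack_size))
# Bytes in use between the top of the stack and SP when the assertion fired.
used_at_fault=$((stack_top - sp))
# Distance from the stack base down to the overwritten location.
gap_below_base=$((stack_base - envp_addr))

printf 'stack region:   0x%08x .. 0x%08x\n' "$stack_base" "$stack_top"
printf 'used at fault:  %d bytes\n' "$used_at_fault"
printf 'tg_envp is %d bytes below the stack base\n' "$gap_below_base"
```

With those values the stack spans 0x38000518 to 0x38000ce8, only 224 bytes are in use at the moment of the assertion, and the overwritten tg_envp location sits 96 bytes below the stack base. On Cortex-M the stack grows downward, so that is where an earlier, deeper call chain would land after running past the base. It does not by itself prove which call did it.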

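
The manual bisect procedure Matteo describes in the thread (check the kernel HEAD date, then check out apps at a similar date) could be partly automated along these lines. This is only a sketch, not an existing NuttX tool: the function name sync_apps_to_kernel_date, the NUTTX_DIR/APPS_DIR/APPS_BRANCH variables, and the assumption that nuttx and apps are sibling checkouts with apps tracking "master" are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical layout: sibling checkouts named "nuttx" and "apps".
NUTTX_DIR=${NUTTX_DIR:-nuttx}
APPS_DIR=${APPS_DIR:-apps}
APPS_BRANCH=${APPS_BRANCH:-master}

# Check out the newest apps commit that is not newer than the kernel commit
# currently selected by "git bisect" in the nuttx tree.
sync_apps_to_kernel_date() {
  local kernel_date apps_commit

  # Committer date of the kernel HEAD in strict ISO 8601 format.
  kernel_date=$(git -C "$NUTTX_DIR" log -1 --format=%cI HEAD) || return 1

  # Newest commit on the apps branch committed at or before that date.
  apps_commit=$(git -C "$APPS_DIR" rev-list -1 \
                    --before="$kernel_date" "$APPS_BRANCH") || return 1
  [ -n "$apps_commit" ] || return 1

  # Detach so the apps branch itself is left untouched.
  git -C "$APPS_DIR" checkout --quiet --detach "$apps_commit"
}
```

One way to use it: run git bisect start, git bisect bad and git bisect good <known-good ref> in the nuttx tree, then at each step call sync_apps_to_kernel_date, configure, build, flash and test by hand, run make distclean, and mark the result with git bisect good or git bisect bad. Matching by commit date is only an approximation of "in sync", so a build failure at a given step may call for git bisect skip rather than a verdict.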