Re: mm/mm_heap assertion error

2024-03-12 Thread Gregory Nutt
On 3/12/2024 5:12 AM, Nathan Hartman wrote: Try Alan's suggestion to use the stack monitor; that will help show whether something is wrong. (If it shows that the old stack size was OK, while we know corruption was happening, then we will know to look for some out-of-bounds write.) Does the stac
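The idea behind a stack monitor can be sketched in portable C as stack coloring: fill the stack with a known pattern before the thread runs, then count how much of the pattern survives. This is only an illustration of the technique, not NuttX's stack monitor code. The names stack_color(), stack_used_words() and COLOR_WORD are hypothetical, and the sketch assumes a stack that grows downward from the highest index of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COLOR_WORD 0xdeadbeefu  /* Hypothetical fill pattern */

/* Fill the whole stack region with the pattern before the thread starts. */
void stack_color(uint32_t *base, size_t nwords)
{
  size_t i;

  assert(base != NULL);
  for (i = 0; i < nwords; i++)
    {
      base[i] = COLOR_WORD;
    }
}

/* Return the high-water mark in words.  Assumes the stack grows downward
 * from base[nwords - 1] toward base[0], so the untouched part is the run
 * of pattern words starting at base[0].
 */
size_t stack_used_words(const uint32_t *base, size_t nwords)
{
  size_t untouched = 0;

  assert(base != NULL);
  while (untouched < nwords && base[untouched] == COLOR_WORD)
    {
      untouched++;
    }

  return nwords - untouched;
}
```

If the reported usage equals the full stack size, the thread has probably overflowed its stack. If it stays well below the size while corruption still occurs, an out-of-bounds write elsewhere is the more likely cause.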

Re: mm/mm_heap assertion error

2024-03-12 Thread Gregory Nutt
After enlarging the stack size of the "AppBringUp" thread, the remote node can now boot NSH on RPMSGFS. I am sorry for not trying this earlier. I was browsing "rpmsgfs.c" blindly and noticed a few auto variables defined on the stack... then I thought it might be worth a try, so I did it. That is

Re: mm/mm_heap assertion error

2024-03-12 Thread Gregory Nutt
On 3/12/2024 1:10 AM, yfliu2008 wrote: On the other hand, if we choose not to mount NSH from RPMSGFS, it boots smoothly, and after boot we can manually mount RPMSGFS to experiment with it. That sounds like an initialization sequencing problem.  Perhaps something is getting used before it has be

Re: mm/mm_heap assertion error

2024-03-11 Thread Gregory Nutt
meminfo() can be helpful too.  It detects many heap corruption problems (but perhaps not all?).  By sprinkling a few calls to kmm_meminfo() in choice locations, you should also be able to isolate the culprit.  Perhaps after each time the lopri worker runs or after each rpmsg. On 3/11/2024 1:20

Re: mm/mm_heap assertion error

2024-03-11 Thread Simon Filgis
Is there a way to colorize the heap to track down the bandit? Like a CRC pattern in all the surrounding spaces, with a check on every call that the pattern is still OK? Gregory Nutt wrote on Mon, 11 March 2024, 19:27: > If the memory location that is corrupted is consistent, then you can > monitor that
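A minimal sketch of the guard-pattern idea asked about above, in portable C. It uses a fixed byte pattern rather than a CRC, and it wraps the standard malloc() instead of hooking into the NuttX heap. The names guard_alloc(), guard_check(), guard_free(), GUARD_SIZE and GUARD_BYTE are hypothetical, and payload alignment is not handled beyond what the header size happens to give.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 16u    /* Guard bytes on each side (hypothetical) */
#define GUARD_BYTE 0xA5u  /* Fill pattern (hypothetical) */

/* Layout: [size_t size][front guard][payload][back guard] */
void *guard_alloc(size_t size)
{
  uint8_t *raw = malloc(sizeof(size_t) + 2 * GUARD_SIZE + size);
  uint8_t *payload;

  if (raw == NULL)
    {
      return NULL;
    }

  memcpy(raw, &size, sizeof(size));              /* Remember payload size */
  payload = raw + sizeof(size_t) + GUARD_SIZE;
  memset(payload - GUARD_SIZE, GUARD_BYTE, GUARD_SIZE);
  memset(payload + size, GUARD_BYTE, GUARD_SIZE);
  return payload;
}

/* Return 1 if both guard zones still hold the pattern, 0 otherwise. */
int guard_check(const void *ptr)
{
  const uint8_t *payload = ptr;
  const uint8_t *front;
  const uint8_t *back;
  size_t size;
  size_t i;

  assert(ptr != NULL);
  front = payload - GUARD_SIZE;
  memcpy(&size, front - sizeof(size_t), sizeof(size));
  back = payload + size;

  for (i = 0; i < GUARD_SIZE; i++)
    {
      if (front[i] != GUARD_BYTE || back[i] != GUARD_BYTE)
        {
          return 0;
        }
    }

  return 1;
}

/* Release a block obtained from guard_alloc(). */
void guard_free(void *ptr)
{
  if (ptr != NULL)
    {
      free((uint8_t *)ptr - GUARD_SIZE - sizeof(size_t));
    }
}
```

Calling guard_check() on suspect allocations at regular points narrows down when an overrun or underrun happened, at the cost of extra memory per allocation and the time spent scanning the guards.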

Re: mm/mm_heap assertion error

2024-03-11 Thread Gregory Nutt
If the memory location that is corrupted is consistent, then you can monitor that location to find the culprit (perhaps using debug output).  If your debugger supports it then setting a watchpoint could also trigger a break when the corruption occurs. Maybe you can also try disabling features
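When no hardware watchpoint is available, monitoring a consistently corrupted location with debug output can be approximated in software. The sketch below is an assumption-laden illustration, not code from the thread: watch_arm() and watch_poll() are hypothetical names, and the caller is expected to place watch_poll() calls at chosen checkpoints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static const volatile uint32_t *g_watch_addr;  /* Location being watched */
static uint32_t g_watch_expected;              /* Last known good value */

/* Start watching one 32-bit word and record its current value. */
void watch_arm(const volatile uint32_t *addr)
{
  assert(addr != NULL);
  g_watch_addr = addr;
  g_watch_expected = *addr;
}

/* Call at checkpoints.  Returns 1 and prints the checkpoint label if the
 * word changed since the previous poll, otherwise returns 0.
 */
int watch_poll(const char *where)
{
  uint32_t now;

  if (g_watch_addr == NULL)
    {
      return 0;  /* Not armed */
    }

  now = *g_watch_addr;
  if (now != g_watch_expected)
    {
      printf("watch: %p changed 0x%08lx -> 0x%08lx before %s\n",
             (const void *)g_watch_addr,
             (unsigned long)g_watch_expected,
             (unsigned long)now, where);
      g_watch_expected = now;  /* Report each change only once */
      return 1;
    }

  return 0;
}
```

The first checkpoint that reports a change brackets the culprit: the corrupting write happened between that checkpoint and the one polled before it.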

Re: mm/mm_heap assertion error

2024-03-11 Thread Nathan Hartman
What's needed is some way to binary search where the culprit is. If I understand correctly, it looks like the crash is happening in the later stages of board bring-up? What is running before that? Can parts be disabled or skipped to see if the problem goes away? Another idea is to try running a s

Re: mm/mm_heap assertion error

2024-03-11 Thread Gregory Nutt
The error is confusing because it probably did not occur at the time of the assertion; it probably occurred much earlier. In most crashes due to heap corruption there are two players:  the culprit and the victim threads.  The culprit thread actually causes the corrupti

Re: mm/mm_heap assertion error

2024-03-10 Thread Gregory Nutt
On 3/10/2024 4:38 AM, yfliu2008 wrote: Dear experts, When doing a regression check on K230 with a previously working Kernel mode configuration, I got an assertion error like the one below: #0  _assert (filename=0x704c598 "mm_heap/mm_malloc.c", linenum=245, msg=0x0,regs=0x7082730 This does indicate