On 3/10/2024 4:38 AM, yfliu2008 wrote:
Dear experts,

When doing a regression check on K230 with a previously working kernel-mode 
configuration, I got an assertion error like the one below:



#0  _assert (filename=0x704c598 "mm_heap/mm_malloc.c", linenum=245, msg=0x0, regs=0x7082730 <g_last_regs>) at misc/assert.c:536
#1  0x000000000700ca98 in __assert (filename=0x704c598 "mm_heap/mm_malloc.c", linenum=245, msg=0x0) at assert/lib_assert.c:36
#2  0x00000000070110f0 in mm_malloc (heap=0x7089c00, size=112) at mm_heap/mm_malloc.c:245
#3  0x000000000700fd74 in kmm_malloc (size=112) at kmm_heap/kmm_malloc.c:51
#4  0x0000000007028d4e in elf_loadphdrs (loadinfo=0x7090550) at libelf/libelf_sections.c:207
#5  0x0000000007028b0c in elf_load (loadinfo=0x7090550) at libelf/libelf_load.c:337
#6  0x00000000070278aa in elf_loadbinary (binp=0x708f5d0, filename=0x704bca8 "/system/bin/init", exports=0x0, nexports=0) at elf.c:257
#7  0x00000000070293ea in load_absmodule (bin=0x708f5d0, filename=0x704bca8 "/system/bin/init", exports=0x0, nexports=0) at binfmt_loadmodule.c:115
#8  0x0000000007029504 in load_module (bin=0x708f5d0, filename=0x704bca8 "/system/bin/init", exports=0x0, nexports=0) at binfmt_loadmodule.c:219
#9  0x0000000007027674 in exec_internal (filename=0x704bca8 "/system/bin/init", argv=0x70907a0, envp=0x0, exports=0x0, nexports=0, actions=0x0, attr=0x7090788, spawn=true) at binfmt_exec.c:98
#10 0x000000000702779c in exec_spawn (filename=0x704bca8 "/system/bin/init", argv=0x70907a0, envp=0x0, exports=0x0, nexports=0, actions=0x0, attr=0x7090788) at binfmt_exec.c:220
#11 0x000000000700299e in nx_start_application () at init/nx_bringup.c:375
#12 0x00000000070029f0 in nx_start_task (argc=1, argv=0x7090010) at init/nx_bringup.c:403
#13 0x0000000007003f84 in nxtask_start () at task/task_start.c:107



It looks like the consistency of the mm/mm_heap data structures was broken. As I am 
unfamiliar with these internals, I am looking forward to any hints about how to find 
the root cause.







Regards,

yf

This does indicate heap corruption:

   240       /* Node next must be alloced, otherwise it should be merged.
   241        * Its prenode(the founded node) must be free and preceding should
   242        * match with nodesize.
   243        */
   244
   245       DEBUGASSERT(MM_NODE_IS_ALLOC(next) && MM_PREVNODE_IS_FREE(next) &&
   246                   next->preceding == nodesize);

Heap corruption normally occurs when there is a wild write outside of the allocated memory region.  These kinds of wild writes may clobber some other thread's data and directly or indirectly clobber the heap metadata.  Trying to traverse the damaged heap metadata is probably the root cause of the assertion failure.
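
As a rough illustration of how a stray write trips a check of this kind, here is a toy model in C.  The `toy_node` struct, the byte arena, and every function name below are made up for the illustration; they are not the NuttX `mm_heap` definitions, and the check is only a simplified stand-in for the `DEBUGASSERT` quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chunk header: NOT the NuttX layout. */
struct toy_node
{
  size_t  preceding;  /* Size of the previous chunk in bytes */
  size_t  size;       /* Size of this chunk in bytes, header included */
  uint8_t alloc;      /* 1 if the chunk is allocated, 0 if it is free */
};

static uint8_t g_arena[256];

/* Headers are copied in and out with memcpy to avoid aliasing issues. */
void put_node(size_t off, struct toy_node n)
{
  assert(off + sizeof n <= sizeof g_arena);
  memcpy(&g_arena[off], &n, sizeof n);
}

struct toy_node get_node(size_t off)
{
  struct toy_node n;
  assert(off + sizeof n <= sizeof g_arena);
  memcpy(&n, &g_arena[off], sizeof n);
  return n;
}

/* Two adjacent 64-byte chunks: a free one followed by an allocated one. */
void toy_init(void)
{
  memset(g_arena, 0, sizeof g_arena);
  put_node(0,  (struct toy_node){ .preceding = 0,  .size = 64, .alloc = 0 });
  put_node(64, (struct toy_node){ .preceding = 64, .size = 64, .alloc = 1 });
}

/* Simplified invariant: the chunk at `off` is free, the chunk after it is
 * allocated, and that chunk's `preceding` matches this chunk's size.
 */
bool toy_next_is_consistent(size_t off)
{
  struct toy_node node = get_node(off);

  if (off + node.size + sizeof(struct toy_node) > sizeof g_arena)
    {
      return false;  /* A corrupted size would lead the walk out of bounds */
    }

  struct toy_node next = get_node(off + node.size);
  return !node.alloc && next.alloc && next.preceding == node.size;
}

/* Returns true if the invariant held before a stray write and failed
 * after it.
 */
bool toy_demo(void)
{
  toy_init();
  bool before = toy_next_is_consistent(0);

  /* Stray write landing on the `preceding` field of the second header */
  memset(&g_arena[64], 0xaa, sizeof(size_t));

  bool after = toy_next_is_consistent(0);
  return before && !after;
}
```

Note that nothing fails at the moment of the `memset`; the damage is only noticed by whichever later call happens to inspect that header, which in the backtrace above is the allocation made on behalf of `elf_loadphdrs`.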

Only a kernel thread or interrupt handler could damage the heap.

The cause of this corruption can be really difficult to find because the reported error does not occur at the moment the heap is damaged; it may not manifest itself until some time later.
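
One common way to narrow that gap is to walk the heap bookkeeping at checkpoints and note the first checkpoint that fails.  The following is only a sketch of that idea on a toy heap: the `chunk` struct and all function names are hypothetical, and the fixed-size chunks are an assumption made to keep the example short.  NuttX may already provide a comparable heap walk (for example `kmm_checkcorruption()`, if present in your tree); the sketch does not depend on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NCHUNKS 4

/* Hypothetical toy bookkeeping, not the NuttX heap layout. */
struct chunk
{
  size_t  preceding;    /* Size of the previous chunk, 0 for the first */
  size_t  size;         /* Size of this chunk */
  uint8_t payload[16];  /* User data */
};

static struct chunk g_chunks[NCHUNKS];

void heap_init(void)
{
  for (int i = 0; i < NCHUNKS; i++)
    {
      g_chunks[i].preceding = (i == 0) ? 0 : sizeof(struct chunk);
      g_chunks[i].size      = sizeof(struct chunk);
      memset(g_chunks[i].payload, 0, sizeof g_chunks[i].payload);
    }
}

/* Walks every chunk; returns the index of the first inconsistent chunk,
 * or -1 if the walk is clean.
 */
int heap_check(void)
{
  size_t prev = 0;

  for (int i = 0; i < NCHUNKS; i++)
    {
      if (g_chunks[i].size != sizeof(struct chunk) ||
          g_chunks[i].preceding != prev)
        {
          return i;
        }

      prev = g_chunks[i].size;
    }

  return -1;
}

/* One workload step: writes `len` bytes into the payload of chunk `idx`.
 * A `len` larger than the payload spills into the next chunk's header.
 */
void step_write(int idx, size_t len)
{
  uint8_t *base = (uint8_t *)g_chunks;
  size_t   off  = (size_t)idx * sizeof(struct chunk) +
                  offsetof(struct chunk, payload);

  assert(off + len <= sizeof g_chunks);
  memset(base + off, 0x5a, len);
}

/* Runs the steps with a heap walk after each one.  Returns the index of
 * the first step after which the walk fails, or -1 if none does.
 */
int first_bad_step(const size_t *lens, int nsteps)
{
  heap_init();

  for (int s = 0; s < nsteps; s++)
    {
      step_write(s % (NCHUNKS - 1), lens[s]);
      if (heap_check() >= 0)
        {
          return s;
        }
    }

  return -1;
}
```

With checks after every step, the overrun is attributed to the step that caused it rather than to a later, unrelated allocation.  On a real system the checkpoints would be placed more coarsely and then moved closer together as the suspect window shrinks.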

It is unlikely that anyone will be able to solve this by just talking about it.  It might be worth increasing some kernel thread stack sizes just to eliminate that common cause.

