Re: mm/mm_heap assertion error

Gregory Nutt Sun, 10 Mar 2024 10:44:04 -0700

On 3/10/2024 4:38 AM, yfliu2008 wrote:

Dear experts,





When doing a regression check on K230 with a previously working Kernel mode 
configuration, I got an assertion error like the one below:



#0  _assert (filename=0x704c598 "mm_heap/mm_malloc.c", linenum=245, msg=0x0, regs=0x7082730 <g_last_regs>) at misc/assert.c:536
#1  0x000000000700ca98 in __assert (filename=0x704c598 "mm_heap/mm_malloc.c", linenum=245, msg=0x0) at assert/lib_assert.c:36
#2  0x00000000070110f0 in mm_malloc (heap=0x7089c00, size=112) at mm_heap/mm_malloc.c:245
#3  0x000000000700fd74 in kmm_malloc (size=112) at kmm_heap/kmm_malloc.c:51
#4  0x0000000007028d4e in elf_loadphdrs (loadinfo=0x7090550) at libelf/libelf_sections.c:207
#5  0x0000000007028b0c in elf_load (loadinfo=0x7090550) at libelf/libelf_load.c:337
#6  0x00000000070278aa in elf_loadbinary (binp=0x708f5d0, filename=0x704bca8 "/system/bin/init", exports=0x0, nexports=0) at elf.c:257
#7  0x00000000070293ea in load_absmodule (bin=0x708f5d0, filename=0x704bca8 "/system/bin/init", exports=0x0, nexports=0) at binfmt_loadmodule.c:115
#8  0x0000000007029504 in load_module (bin=0x708f5d0, filename=0x704bca8 "/system/bin/init", exports=0x0, nexports=0) at binfmt_loadmodule.c:219
#9  0x0000000007027674 in exec_internal (filename=0x704bca8 "/system/bin/init", argv=0x70907a0, envp=0x0, exports=0x0, nexports=0, actions=0x0, attr=0x7090788, spawn=true) at binfmt_exec.c:98
#10 0x000000000702779c in exec_spawn (filename=0x704bca8 "/system/bin/init", argv=0x70907a0, envp=0x0, exports=0x0, nexports=0, actions=0x0, attr=0x7090788) at binfmt_exec.c:220
#11 0x000000000700299e in nx_start_application () at init/nx_bringup.c:375
#12 0x00000000070029f0 in nx_start_task (argc=1, argv=0x7090010) at init/nx_bringup.c:403
#13 0x0000000007003f84 in nxtask_start () at task/task_start.c:107



It looks like the mm/mm_heap data structure consistency was broken. As I am 
unfamiliar with these internals, I am looking forward to any hints about how 
to find the root cause.







Regards,

yf


This does indicate heap corruption:

   240       /* Node next must be alloced, otherwise it should be merged.
   241        * Its prenode(the founded node) must be free and preceding should
   242        * match with nodesize.
   243        */
   244
   245       DEBUGASSERT(MM_NODE_IS_ALLOC(next) && MM_PREVNODE_IS_FREE(next) &&
   246                   next->preceding == nodesize);

Heap corruption normally occurs when there is a wild write outside of the allocated memory region. These kinds of wild writes may clobber some other thread's data and directly or indirectly clobber the heap metadata. Trying to traverse the damaged heap metadata is probably the root cause of the problem.
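
To make the mechanism concrete, here is a minimal sketch of how a write that runs past an allocation can clobber the header of the neighbouring node. Everything in it is a toy model: struct toy_node, the toy_* helpers, and the arena layout are hypothetical and are not the NuttX heap structures. The only idea borrowed from the assertion above is that a node's "preceding" field should match the size of the node before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy node header (hypothetical, NOT the real NuttX layout) */

struct toy_node
{
  size_t preceding;  /* Size of the node physically before this one */
  size_t size;       /* Size of this node, header included */
};

#define TOY_HDR     sizeof(struct toy_node)
#define TOY_PAYLOAD 16
#define TOY_NODE    (TOY_HDR + TOY_PAYLOAD)

/* Lay out two back-to-back nodes in an arena of 2 * TOY_NODE bytes */

static void toy_init(uint8_t *arena)
{
  struct toy_node a = { 0, TOY_NODE };
  struct toy_node b = { TOY_NODE, TOY_NODE };

  memset(arena, 0, 2 * TOY_NODE);
  memcpy(arena, &a, sizeof(a));
  memcpy(arena + TOY_NODE, &b, sizeof(b));
}

/* True while the second node's back-link still matches the first
 * node's size, the same kind of relation the assertion checks.
 */

static bool toy_links_ok(const uint8_t *arena)
{
  struct toy_node a;
  struct toy_node b;

  memcpy(&a, arena, sizeof(a));
  memcpy(&b, arena + TOY_NODE, sizeof(b));
  return a.size == TOY_NODE && b.preceding == a.size;
}

/* A buggy writer: fills len bytes of the first payload without
 * comparing len against TOY_PAYLOAD.  The assert only keeps the
 * demo inside the arena; it does not prevent the overrun.
 */

static void toy_buggy_fill(uint8_t *arena, size_t len)
{
  assert(len <= 2 * TOY_NODE - TOY_HDR);
  memset(arena + TOY_HDR, 0xaa, len);
}
```

Calling toy_buggy_fill() with TOY_PAYLOAD bytes leaves toy_links_ok() true. Calling it with TOY_PAYLOAD + sizeof(size_t) bytes overwrites the second node's preceding field, and the check fails the next time anyone looks at it, which may be long after the buggy write returned.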


Only a kernel thread or interrupt handler could damage the heap.

The cause of this corruption can be really difficult to find because the reported error does not occur when the heap is damaged but may not manifest itself until some time later.
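
One generic way to shrink that time window is to walk the heap metadata at chosen checkpoints, so the damage is noticed closer to the write that caused it. The sketch below shows the idea on a toy arena of back-to-back nodes. The names chk_node, chk_build, and chk_walk are hypothetical, and the layout is an assumption made for illustration, not the NuttX implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy node header (hypothetical, NOT the real NuttX layout) */

struct chk_node
{
  size_t preceding;  /* Size of the node physically before this one */
  size_t size;       /* Size of this node, header included */
};

/* Fill the arena with count equally sized, back-to-back nodes */

static void chk_build(uint8_t *arena, size_t count, size_t nodesize)
{
  size_t prev = 0;
  size_t i;

  assert(nodesize >= sizeof(struct chk_node));
  for (i = 0; i < count; i++)
    {
      struct chk_node n = { prev, nodesize };

      memcpy(arena + i * nodesize, &n, sizeof(n));
      prev = nodesize;
    }
}

/* Walk every node in arena[0..total).  Return -1 if all headers are
 * consistent, otherwise the byte offset of the first bad header.
 */

static long chk_walk(const uint8_t *arena, size_t total)
{
  size_t off = 0;
  size_t prev = 0;

  while (off < total)
    {
      struct chk_node n;

      if (total - off < sizeof(n))
        {
          return (long)off;
        }

      memcpy(&n, arena + off, sizeof(n));
      if (n.preceding != prev || n.size < sizeof(n) ||
          n.size > total - off)
        {
          return (long)off;
        }

      prev = n.size;
      off += n.size;
    }

  return -1;
}
```

If a walk like this passes at one checkpoint and fails at the next, the damaging write happened somewhere between the two, and the returned offset says which header was hit first. Moving the checkpoints closer together then narrows the search.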

It is unlikely that anyone will be able to solve this by just talking about it. It might be worth increasing some kernel thread heap sizes just to eliminate that common cause.
