On a glibc build, procd sometimes terminates with SIGILL, and since it is PID 1 the kernel panics. This happens with HEAD of source.git, procd, libubox and ubus, as well as with older (2014) builds. The platform is a Cortex-A9 SoC, with kernels 3.2 and 3.19 available:
  [    6.356037] mount_root: mounting /dev/root
  [    6.377581] procd: - early -
  [    6.563175] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004

If I fork a "gdb -attach 1" from init before it execs procd, I can do a 'bt':

  #0  0x76f09962 in uloop_setup_signals (add=<optimized out>)
      at /home/bogdan/lede/build_dir/target-arm_cortex-a9_glibc-2.21_eabi/libubox-2016-02-263
  #1  0x0000000a in ?? ()
  Backtrace stopped: previous frame identical to this frame (corrupt stack?)

  (gdb) disas
  Dump of assembler code for function uloop_setup_signals:
  ...
     0x76f09934 <+48>:   add   r2, r4, #144   ; 0x90
     0x76f09938 <+52>:   mov   r3, r5
     0x76f0993c <+56>:   mov   r1, r6
     0x76f09940 <+60>:   mov   r0, #15
     0x76f09944 <+64>:   bl    0x76f09874 <uloop_install_handler>
     0x76f09948 <+68>:   ldr   r1, [pc, #96]  ; 0x76f099b0 <uloop_setup_signals+172>
     0x76f0994c <+72>:   add   r2, r4, #284   ; 0x11c
     0x76f09950 <+76>:   mov   r3, r5
     0x76f09954 <+80>:   mov   r0, #17
     0x76f09958 <+84>:   add   r4, sp, #4
     0x76f0995c <+88>:   add   r1, pc, r1
     0x76f09960 <+92>:   bl    0x76f09874 <uloop_install_handler>
     0x76f09964 <+96>:   mov   r2, r4
     0x76f09968 <+100>:  mov   r1, #0
     0x76f0996c <+104>:  mov   r0, #13
     0x76f09970 <+108>:  bl    0x76f07dcc <sigaction@plt>
     0x76f09974 <+112>:  cmp   r5, #0
  ...

The 0x76f09962 from the bt is in the middle of the 'bl' opcode at 0x76f09960. Since uloop_setup_signals is called directly via main -> uloop_run, is it possible that the pc printed by gdb on the SIGILL is not right?

I used __cyg_profile_func_enter/exit to record all function calls/exits in a circular queue; dumping it from gdb, it seems the stack should be different. When the SIGILL arrives, the call queue usually ends with:

  json_process_expr
  __json_process_type
  handle_expr_regex
  expr_eq_regex
  json_get_tuple
  ...
  msg_find_var
  ...
  blobmsg_type
  blobmsg_data
  eq_regex_cmp

...and possibly some libc call here which __cyg_... doesn't record (indent level shows call depth, but calls at each level are recorded as well, since it's a queue, not just a stack).
The crash happens about once in 20 reboots. Things that prevent it from happening:

- building userspace with uClibc instead of glibc
- kernel 3.2 instead of 3.19
- running init under valgrind

I've compiled init and its libs with -fsanitize=address and libssp, but these didn't turn up anything.

My first question is: besides ending up in invalid ARM code, what other reasons are there for receiving a SIGILL? Can it be related to accessing invalid memory, executing privileged instructions, or some other illegal action the process is performing?

The second question is: under what conditions could the kernel send a SIGILL with an unrelated pc?

Thanks,
Bogdan

_______________________________________________
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev