looks like a corruption is happening in machine_kexec.c:machine_kexec()
in the final part - let's see a debugging session:

reboot the panda board and connect via openocd+gdb:

(gdb) l machine_kexec,+50
84      {
85              unsigned long page_list;
86              unsigned long reboot_code_buffer_phys;
87              void *reboot_code_buffer;
88
89              printk("%s::%d\n", __FUNCTION__, __LINE__);
90              page_list = image->head & PAGE_MASK;
91
92              /* we need both effective and real address here */
93              reboot_code_buffer_phys =
94                  page_to_pfn(image->control_code_page) << PAGE_SHIFT;
95              reboot_code_buffer = page_address(image->control_code_page);
96
97              /* Prepare parameters for reboot_code_buffer*/
98              kexec_start_address = image->start;
99              kexec_indirection_page = page_list;
100             kexec_mach_type = machine_arch_type;
101             kexec_boot_atags = image->start - KEXEC_ARM_ZIMAGE_OFFSET + 
KEXEC_ARM_ATAGS_OFFSET;
102     printk("%s::%d\n", __FUNCTION__, __LINE__);
103
104             /* copy our kernel relocation code to the control code page */
105             memcpy(reboot_code_buffer,
106                    relocate_new_kernel, relocate_new_kernel_size);
107
108             printk("%s::%d kexec_start_address: 0x%08lx reboot_code_buffer: 
%p reboot_code_buffer_phys: 0x%08lx relocate_new_kernel_size: %u\n",
109             __FUNCTION__, __LINE__, kexec_start_address, 
reboot_code_buffer, reboot_code_buffer_phys, relocate_new_kernel_size);
110
111             flush_icache_range((unsigned long) reboot_code_buffer,
112                                (unsigned long) reboot_code_buffer + 
KEXEC_CONTROL_PAGE_SIZE);
113             printk(KERN_INFO "Bye!\n");
114
115             if (kexec_reinit)
116                     kexec_reinit();
117             local_irq_disable();
118             local_fiq_disable();
119             setup_mm_for_reboot(0); /* mode is not used, so just pass 0*/
120             flush_cache_all();
121             outer_flush_all();
122             outer_disable();
123             cpu_proc_fin();
124             outer_inv_all();
125             flush_cache_all();
126             cpu_reset(reboot_code_buffer_phys);
127     }
(gdb) break machine_kexec.c:111
Breakpoint 1 at 0xc0051bc4: file 
/home/flag/canonical/ubuntu-natty/arch/arm/kernel/machine_kexec.c, line 111.
(gdb) c
Continuing.

via ssh i execute kexec:

flag@omap:~$ uname -a
Linux omap 2.6.38-1208-omap4 #11 PREEMPT Thu Apr 28 10:44:16 CEST 2011 armv7l 
GNU/Linux
flag@omap:~$ sudo kexec -l /boot/vmlinuz-2.6.38-1208-omap4 
--initrd=/boot/initrd.img-2.6.38-1208-omap4 --command-line="`cat /proc/cmdline`"
flag@omap:~$ sudo kexec -e

on the serial console:

[   72.652587] sys_reboot::380
[   72.655517] kernel_kexec::1504
[   72.986114] omapdss DISPC error: timeout waiting for EVSYNC
[   73.087677] omapdss DISPC error: timeout waiting for EVSYNC
[   73.093566] Starting new kernel
[   73.097625] machine_kexec::89
[   73.100738] machine_kexec::102
[   73.104339] machine_kexec::109 kexec_start_address: 0x80008000 
reboot_code_buffer: ee4af000 reboot_code_buffer_phys: 0xae4af000 
relocate_new_kernel_size: 136

back to gdb:

Breakpoint 1, machine_kexec (image=<value optimized out>) at 
/home/flag/canonical/ubuntu-natty/arch/arm/kernel/machine_kexec.c:111
111             flush_icache_range((unsigned long) reboot_code_buffer,
(gdb) p/x reboot_code_buffer
$3 = 0xee4af000
(gdb) p/x reboot_code_buffer_phys 
$4 = 0xae4af000

everything looks good, advance till line 113:

(gdb) p/x reboot_code_buffer_phys 
$5 = 0xae4af000
(gdb) p/x reboot_code_buffer
$6 = 0x0

here is the first weird thing, reboot_code_buffer is NULL now, advance
some more till line 122.

at this point the kernel relocation code
(arch/arm/kernel/relocate_kernel.S::relocate_new_kernel()) it has
already been copied to reboot_code_buffer (see the memcpy at line 105),
and the caches have been flushed, so i expect to find that routine there
- and dumping the memory confirms it:

(gdb) x/20 relocate_new_kernel
0xc0051dac <relocate_new_kernel>:       0xe59f0074      0xe59f106c      
0xe3500000      0x0a000014
0xc0051dbc <relocate_new_kernel+16>:    0xe4903004      0xe3130001      
0x0a000001      0xe3c34001
0xc0051dcc <relocate_new_kernel+32>:    0xeafffffa      0xe3130002      
0x0a000001      0xe3c30002
0xc0051ddc <relocate_new_kernel+48>:    0xeafffff6      0xe3130004      
0x0a000000      0xea000008
0xc0051dec <relocate_new_kernel+64>:    0xe3130008      0x0afffff1      
0xe3c33008      0xe3a06b01

(gdb) x/20 0xee4af000  (< this is the relocate_code_buffer value AKA 
reboot_code_buffer as the MMU sees it before turning 0 - see above)
0xee4af000:     0xe59f0074      0xe59f106c      0xe3500000      0x0a000014
0xee4af010:     0xe4903004      0xe3130001      0x0a000001      0xe3c34001
0xee4af020:     0xeafffffa      0xe3130002      0x0a000001      0xe3c30002
0xee4af030:     0xeafffff6      0xe3130004      0x0a000000      0xea000008
0xee4af040:     0xe3130008      0x0afffff1      0xe3c33008      0xe3a06b01

dump the physical memory via openocd to double check:

(gdb) mon mdw phys 0xae4af000 20
0xae4af000: e59f0074 e59f106c e3500000 0a000014 e4903004 e3130001 0a000001 
e3c34001 
0xae4af020: eafffffa e3130002 0a000001 e3c30002 eafffff6 e3130004 0a000000 
ea000008 
0xae4af040: e3130008 0afffff1 e3c33008 e3a06b01

fine, everything is setup, let's step till line 126, the last line:

(gdb) p/x reboot_code_buffer_phys
$7 = 0x815d8000

uh? what? this is wrong, why has reboot_code_buffer_phys changed?

as memory dump shows:

(gdb) mon mdw phys 0x815d8000 20
0x815d8000: 80001001 bef27008 80008001 bef35008 bf000008 bef36008 bef37008 
befd2008 
0x815d8020: befd3008 bf002008 bf003008 bef24008 bef25008 befd4008 befd5008 
befd6008 
0x815d8040: befd7008 bef20008 bef21008 bef22008

this is NOT the relocation routine that was copied with memcpy(), and
that is NOT the memory location we should jump!

let's stepi a bit more:

(gdb) stepi
0xc0051c60      126             cpu_reset(reboot_code_buffer_phys);
(gdb) stepi
cpu_v7_reset () at /home/flag/canonical/ubuntu-natty/arch/arm/mm/proc-v7.S:64
64              mov     pc, r0
(gdb) p/x $r0
$8 = 0x815d8000
(gdb) stepi
0x815d8000 in ?? ()
(gdb) c
Continuing.

and here is the panic:

[  518.057678] omap_device: omap-aess-audio.-1: new worst case activate latency 
0: 152587
[  518.067565] omap_device: omap-aess-audio.-1: new worst case deactivate 
latency 0: 61035
[  518.089172] Internal error: Oops - undefined instruction: 0 [#1] PREEMPT
[  518.096221] last sysfs file: /sys/kernel/kexec_loaded
[  518.101501] Modules linked in: wl12xx wl12xx_sdio btsdio
[  518.107086] CPU: 0    Tainted: G        W    (2.6.38-1208-omap4 #11)
[  518.113739] PC is at 0x815dc040
[  518.117065] LR is at machine_kexec+0x15c/0x198
[  518.121704] pc : [<815dc040>]    lr : [<c0051c64>]    psr: 600000d3
[  518.121704] sp : c170de50  ip : 000098c9  fp : 000098c9
[  518.133758] r10: 00000105  r9 : 00000000  r8 : 815d8000
[  518.139221] r7 : c170c000  r6 : 00008000  r5 : 815d8000  r4 : c0051e24
[  518.146057] r3 : 00000002  r2 : 00000005  r1 : 00008000  r0 : 00000000
[  518.152893] Flags: nZCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment 
user
[  518.160552] Control: 10c5387d  Table: ae4dc059  DAC: 00000015
[  518.166564] Process kexec (pid: 542, stack limit = 0xc170c2f0)
[  518.172668] Stack: (0xc170de50 to 0xc170e000)
[  518.177246] de40:                                     ee4af000 ae4af000 
00000088 00000000
[  518.185821] de60: c08a64ac c178cc00 fee1dead 00000000 c004c528 c00c091c 
45584543 45584543
[  518.194396] de80: c0835ed0 c009ec90 00000002 c04a8908 ef9cb800 00000002 
00000000 ffffffef
[  518.202972] dea0: 00000000 c05d45b0 ef9cb800 00001001 00000000 00000000 
c160b000 c00abc58
[  518.211517] dec0: 00000000 00000002 ef9cb800 c04a294c ef9cb800 00001003 
c170ded8 c170ded8
[  518.220092] dee0: 00000002 00000000 ee584f40 c04f50bc ee48a220 ee48a220 
ef4adc00 ef6eef00
[  518.228668] df00: ffffffff c0164480 ef4adc00 00000000 ef6eef00 00000008 
ef4adc20 00000000
[  518.237243] df20: ef6eef00 00000000 ef6eef00 c0143a2c 00000014 ef6eef00 
00000000 c0143c58
[  518.245819] df40: c1608e00 ef4adc20 ef6eef00 c0132600 00000000 00000000 
c1608e14 c1608e00
[  518.254394] df60: 00000000 c160ccc0 000241b8 c014c224 c1608e00 00000000 
c160ccc0 c012eebc
[  518.262939] df80: 00000003 00000001 c170c000 cdcefa50 00000002 400e95e7 
00000000 40101fd0
[  518.271514] dfa0: 00000058 c004c340 400e95e7 00000000 fee1dead 28121969 
45584543 00000000
[  518.280090] dfc0: 400e95e7 00000000 40101fd0 00000058 00000000 00000001 
000241c8 00000105
[  518.288665] dfe0: be967728 be967718 0000ab4b 401a47b0 80000010 fee1dead 
00000000 00000000
[  518.297271] [<c0051c64>] (machine_kexec+0x15c/0x198) from [<c00c091c>] 
(kernel_kexec+0xd8/0x104)
[  518.306518] [<c00c091c>] (kernel_kexec+0xd8/0x104) from [<c009ec90>] 
(sys_reboot+0x190/0x214)
[  518.315460] [<c009ec90>] (sys_reboot+0x190/0x214) from [<c004c340>] 
(ret_fast_syscall+0x0/0x30)
[  518.324584] Code: 00000000 00000000 c15dc038 c15dc038 (ee4f1160)
[  518.331207] usb 1-1.1: ifconfig timed out on ep0out len=0/4
[  518.337066] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 
0x00000014
init: rsyslog main process (347) killed by SEGV signal
init: rsyslog main process ended, respawning
[  518.373504] smsc95xx 1-1.1:1.0: eth0: Failed to write HW_CFG_LRST_ bit in 
HW_CFG register, ret = -110
[  518.549499] ---[ end trace da227214a82491ba ]---

to me, it looks like a memory (stack?) corruption in the final part of
machine_kexec(), anyone with mmu/openocd/gdb knowledge is welcome at
this point.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/768249

Title:
  kexec panic: external abort on non-linefetch (0x1008) at 0x03510000

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to