Hi guys,

Sometimes during an oops or panic, we do not get the kernel version from the console output. For example:

[Fri Oct 4 03:49:48 2013][ 204.396041] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[Fri Oct 4 03:49:48 2013][ 204.400010] IP: [<ffffffff811889a0>] rb_insert_color+0x1c/0xe5
[Fri Oct 4 03:49:48 2013][ 204.400010] Oops: 0000 [#1] PREEMPT SMP
[Fri Oct 4 03:49:48 2013][ 204.400010] last sysfs file: /sys/devices/pci0000:00/0000:00:08.0/net/ma1/operstate
[Fri Oct 4 03:49:48 2013][ 204.400010] Stack:
[Fri Oct 4 03:49:48 2013][ 204.400010] Call Trace:
[Fri Oct 4 03:49:49 2013][ 204.400010] Code: 06 48 8b 17 83 e2 03 48 09 c2 48 89 17 c9 c3 55 48 89 e5 41 57 41 56 49 89 f6 41 55 49 89 fd 41 54 53 e9 a4 00 00 00 49 83 e4 fc <49> 8b 44 24 10 48 39 c3 75 44 49 8b 44 24 08 48 85 c0 74 08 48
[Fri Oct 4 03:49:49 2013][ 204.400010] RIP [<ffffffff811889a0>] rb_insert_color+0x1c/0xe5
[Fri Oct 4 03:49:49 2013][ 204.400010] CR2: 0000000000000010
[Fri Oct 4 03:49:51 2013]1RU scd /sys/bus/pci/devices/0000:01:06.0/resource0 found, stopping phys (0xc3e803f2)

This is because the console loglevel is lower than the default log level of the printk() in dump_stack() in arch/x86/kernel/dumpstack.c, so the line that carries the version never reaches the console.

It would be great if we had the kernel version printed along with all the other information on the console. When the change number is part of the kernel version, this would let us debug and reproduce the crash with a kernel built from the same change number/code.

Please also note that sometimes it is not possible to get the whole crash log from /var/log/console. On our systems we do have the crash kernel mechanism set up, but for an unknown reason that we are still investigating, it does not get triggered reliably in some cases. The console log is all we have.

Also notice that not all code paths even call dump_stack(). For example, see no_context() in arch/x86/mm/fault.c. This is what gets called in case of a page fault. When a page fault happens in the kernel, it eventually calls __die().
This code calls show_registers(), which neither calls dump_stack() nor prints the kernel version.

Bottom line: there should be a reliable way to get the kernel version from the console when a crash happens, in any code path.

I am proposing the attached patch, which should help in these cases. I have compiled and tested the patch on our HW.

Any feedback will be greatly appreciated.

Cheers,
Ani

Print kernel version on console during panic and oops.

Sometimes during an oops or panic, we do not get the kernel version from the crash log. For example:

[Fri Oct 4 03:49:48 2013][ 204.396041] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[Fri Oct 4 03:49:48 2013][ 204.400010] IP: [<ffffffff811889a0>] rb_insert_color+0x1c/0xe5
[Fri Oct 4 03:49:48 2013][ 204.400010] Oops: 0000 [#1] PREEMPT SMP
[Fri Oct 4 03:49:48 2013][ 204.400010] last sysfs file: /sys/devices/pci0000:00/0000:00:08.0/net/ma1/operstate
[Fri Oct 4 03:49:48 2013][ 204.400010] Stack:
[Fri Oct 4 03:49:48 2013][ 204.400010] Call Trace:
[Fri Oct 4 03:49:49 2013][ 204.400010] Code: 06 48 8b 17 83 e2 03 48 09 c2 48 89 17 c9 c3 55 48 89 e5 41 57 41 56 49 89 f6 41 55 49 89 fd 41 54 53 e9 a4 00 00 00 49 83 e4 fc <49> 8b 44 24 10 48 39 c3 75 44 49 8b 44 24 08 48 85 c0 74 08 48
[Fri Oct 4 03:49:49 2013][ 204.400010] RIP [<ffffffff811889a0>] rb_insert_color+0x1c/0xe5
[Fri Oct 4 03:49:49 2013][ 204.400010] CR2: 0000000000000010
[Fri Oct 4 03:49:51 2013]1RU scd /sys/bus/pci/devices/0000:01:06.0/resource0 found, stopping phys (0xc3e803f2)

This is because the console loglevel is lower than the default log level of the printk printing the version. It would be great if we had the kernel version printed along with all the other information. This would help us debug and reproduce the crash with a kernel built from the same change number. Please also note that sometimes it is not possible to get the whole crash log from /var/log/console. The console log is all we have.

Signed-off-by: Ani Sinha <a...@aristanetworks.com>

Index: linux-2.6.38/arch/x86/kernel/dumpstack.c
===================================================================
--- linux-2.6.38.orig/arch/x86/kernel/dumpstack.c
+++ linux-2.6.38/arch/x86/kernel/dumpstack.c
@@ -15,6 +15,7 @@
 #include <linux/bug.h>
 #include <linux/nmi.h>
 #include <linux/sysfs.h>
+#include <linux/utsname.h>
 
 #include <asm/stacktrace.h>
 
@@ -199,10 +200,10 @@ void dump_stack(void)
 {
 	unsigned long stack;
 
-	printk("Pid: %d, comm: %.20s %s %s %.*s\n",
+	printk(KERN_EMERG "Pid: %d, comm: %.20s %s %s %.*s\n",
 		current->pid, current->comm, print_tainted(),
 		init_utsname()->release,
-		(int)strcspn(init_utsname()->version, " "),
+		(int)strcspn(init_utsname()->version, " "),
 		init_utsname()->version);
 	show_trace(NULL, NULL, &stack);
 }
@@ -304,6 +305,10 @@ int __kprobes __die(const char *str, str
 	printk_address(regs->ip, 1);
 	printk(" RSP <%016lx>\n", regs->sp);
 #endif
+	printk(KERN_EMERG "Kernel version : %s %s %.*s\n", print_tainted(),
+		init_utsname()->release,
+		(int)strcspn(init_utsname()->version, " "),
+		init_utsname()->version);
 	return 0;
 }