** Tags removed: targetmilestone-inin---
** Tags added: targetmilestone-inin1704
https://bugs.launchpad.net/bugs/1661168

Title: In Ubuntu 16.10: kdump stuck in boot for a long time; needs a force reboot via HMC on a 32TB Brazos system

Status in kexec-tools package in Ubuntu: Fix Released
Status in kexec-tools source package in Yakkety: New

Bug description:

Problem Description
===================
In Ubuntu 16.10, kdump was tested on a Brazos system (32TB of memory, 192 cores). When a panic is triggered, the kdump kernel gets stuck during boot and the system has to be force-rebooted. After the reboot, only an incomplete dump (vmcore-incomplete) is found.

Reproducible Steps
==================
1. Install Ubuntu 16.10
2. Boot the system with 31TB of memory and 192 cores
3. Configure kdump on the system
4. Verify that kdump is ready
5. Trigger a panic on the system

Actual Result
-------------
The kdump kernel gets stuck during boot; a force reboot is needed.

Expected Result
---------------
kdump proceeds and the vmcore is captured successfully.

LOG:

root@ltc-brazos1:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinux-4.4.0-30-generic root=UUID=516c4b1b-6700-4b55-bd37-d61c4c5af6af ro quiet splash crashkernel=4096M

root@ltc-brazos1:~# kdump-config show
DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr:
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.4.0-30-generic
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.4.0-30-generic
current state:    ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.4.0-30-generic root=UUID=516c4b1b-6700-4b55-bd37-d61c4c5af6af ro quiet splash irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

root@ltc-brazos1:~# dpkg -l | grep kdump
ii  kdump-tools  1:1.6.0-2  all  scripts and tools for automating kdump (Linux crash dumps)

root@ltc-brazos1:~# echo c > /proc/sysrq-trigger

ltc-brazos1 login: [  416.229464] sysrq: SysRq : Trigger a crash
[  416.229496] Unable to handle kernel paging request for data at address 0x00000000
[  416.229502] Faulting instruction address: 0xc000000000670014
[  416.229508] Oops: Kernel access of bad area, sig: 11 [#1]
[  416.229511] SMP NR_CPUS=2048 NUMA pSeries
[  416.229517] Modules linked in: pseries_rng btrfs xor raid6_pq rtc_generic sunrpc autofs4 ses enclosure ipr
[  416.229532] CPU: 65 PID: 404785 Comm: bash Not tainted 4.4.0-30-generic #49-Ubuntu
[  416.229537] task: c00001f9d583c8e0 ti: c00001fa13cd8000 task.ti: c00001fa13cd8000
[  416.229543] NIP: c000000000670014 LR: c0000000006710c8 CTR: c00000000066ffe0
[  416.229548] REGS: c00001fa13cdb990 TRAP: 0300   Not tainted  (4.4.0-30-generic)
[  416.229552] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28242222  XER: 00000001
[  416.229565] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c0000000006710c8 c00001fa13cdbc10 c0000000015b5d00 0000000000000063
GPR04: c00001fab9049c50 c00001fab905b4e0 c0001f3fff3d0000 0000000000000313
GPR08: 0000000000000007 0000000000000001 0000000000000000 c0001f3fff3dec68
GPR12: c00000000066ffe0 c000000007546980 ffffffffffffffff 0000000022000000
GPR16: 0000000010170dc8 00000100174901d8 0000000010140f58 00000000100c7570
GPR20: 0000000000000000 000000001017dd58 0000000010153618 000000001017b608
GPR24: 00003ffff8966c94 0000000000000001 c0000000014f8e58 0000000000000004
GPR28: c0000000014f9218 0000000000000063 c0000000014b11dc 0000000000000000
[  416.229631] NIP [c000000000670014] sysrq_handle_crash+0x34/0x50
[  416.229636] LR [c0000000006710c8] __handle_sysrq+0xe8/0x270
[  416.229640] Call Trace:
[  416.229645] [c00001fa13cdbc10] [c000000000e08f28] _fw_tigon_tg3_bin_name+0x2ce58/0x342b0 (unreliable)
[  416.229652] [c00001fa13cdbc30] [c0000000006710c8] __handle_sysrq+0xe8/0x270
[  416.229658] [c00001fa13cdbcd0] [c000000000671868] write_sysrq_trigger+0x78/0xa0
[  416.229666] [c00001fa13cdbd00] [c00000000037ae30] proc_reg_write+0xb0/0x110
[  416.229673] [c00001fa13cdbd50] [c0000000002e186c] __vfs_write+0x6c/0xe0
[  416.229678] [c00001fa13cdbd90] [c0000000002e25a0] vfs_write+0xc0/0x230
[  416.229684] [c00001fa13cdbde0] [c0000000002e35dc] SyS_write+0x6c/0x110
[  416.229690] [c00001fa13cdbe30] [c000000000009204] system_call+0x38/0xb4
[  416.229695] Instruction dump:
[  416.229698] 38425d20 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d220019 394931e4
[  416.229707] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6
[  416.229717] ---[ end trace 16e5fbbf7faa7340 ]---
[  416.232059]
[  416.232086] Sending IPI to other CPUs
[  416.242558] IPI complete
I'm in purgatory
 -> smp_release_cpus()
spinning_secondaries = 1528
 <- smp_release_cpus()
 <- setup_system()
[    1.146155] sd 0:2:1:0: [sdb] Assuming drive cache: write through
[    1.154176] sd 0:2:0:0: [sda] Assuming drive cache: write through
/dev/sdb2: recovering journal
/dev/sdb2: clean, 69482/136331264 files, 9047821/545318400 blocks

[console then loops on the "Ubuntu 16.10" boot splash (terminal escape sequences omitted); the kdump kernel made no further visible progress]
After force reboot:

root@ltc-brazos1:/var/crash# ls
201607161510  kexec_cmd
root@ltc-brazos1:/var/crash# cd 201607161510/
root@ltc-brazos1:/var/crash/201607161510# ls
vmcore-incomplete

Note: waited for the kdump process for more than 2 hours before forcing the reboot.
Regards,
Praveen

== Comment: #12 - Vaishnavi Bhat <vaish...@in.ibm.com> - 2016-09-16 02:40:20 ==

root@ltc-brazos1:~# kdump-config show
DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr:
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.4.0-9136-generic
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.4.0-9136-generic
current state:    ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro quiet splash irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

root@ltc-brazos1:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro crashkernel=4096M quiet splash crashkernel=4096M

root@ltc-brazos1:~# dmesg | grep -i crash
[    0.000000] Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 31744000MB)
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro crashkernel=4096M quiet splash crashkernel=4096M

== Comment: #26 - Hari Krishna Bathini <hbath...@in.ibm.com> - 2017-02-01 02:02:36 ==

The following kexec-tools commit is needed to fix this issue:

commit f63d8530b9b6a2d7e79b946e326e5a2197eb8f87
Author: Petr Tesarik <ptesa...@suse.com>
Date:   Thu Jan 19 18:37:09 2017 +0100

    ppc64: Reduce number of ELF LOAD segments

    The number of program header table entries (e_phnum) is an Elf64_Half,
    which is a 16-bit entity, i.e. the limit is 65534 entries (one entry
    is reserved for NOTE). This is a hard limit, defined by the ELF standard.

    It is possible that more LMBs (Logical Memory Blocks) are needed to
    represent all RAM on some machines, and this field overflows, causing
    an incomplete /proc/vmcore file. This has actually happened on a
    machine with 31TB of RAM and an LMB size of 256MB.

    However, since there is usually no memory hole between adjacent LMBs,
    the map can be "compressed", combining multiple adjacent LMBs into a
    single LOAD segment.

    Signed-off-by: Petr Tesarik <ptesa...@suse.com>
    Signed-off-by: Simon Horman <ho...@verge.net.au>
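The arithmetic behind the overflow is visible in the logs above: 31744000 MB of system RAM split into 256 MB LMBs gives 124000 memory blocks, well past the 65534 program headers that a 16-bit e_phnum can describe. The C program below is a hypothetical sketch of the merging idea described in the commit, not the actual kexec-tools code: physically contiguous ranges are coalesced so the crash kernel's ELF core header needs far fewer PT_LOAD entries.

/*
 * Hypothetical sketch of the range-merging idea from commit f63d8530
 * (not the real kexec-tools implementation): adjacent LMBs with no
 * memory hole between them are combined into one LOAD segment.
 */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define LMB_SIZE (256ULL << 20)  /* 256 MB, the LMB size from the report   */
#define NR_LMB   124000U         /* 31744000 MB of RAM / 256 MB per LMB    */

struct mem_range {
        uint64_t start;
        uint64_t size;
};

/* Coalesce physically contiguous ranges in place; returns the new count. */
static size_t merge_ranges(struct mem_range *r, size_t n)
{
        size_t out = 0;

        if (n == 0)
                return 0;

        for (size_t i = 1; i < n; i++) {
                if (r[out].start + r[out].size == r[i].start)
                        r[out].size += r[i].size;   /* contiguous: extend   */
                else
                        r[++out] = r[i];            /* hole: start new range */
        }
        return out + 1;
}

int main(void)
{
        static struct mem_range lmbs[NR_LMB];

        /* Model the 31TB machine as back-to-back 256 MB LMBs. */
        for (size_t i = 0; i < NR_LMB; i++) {
                lmbs[i].start = (uint64_t)i * LMB_SIZE;
                lmbs[i].size  = LMB_SIZE;
        }

        size_t merged = merge_ranges(lmbs, NR_LMB);
        printf("%u LMBs need only %zu PT_LOAD segment(s) after merging\n",
               (unsigned)NR_LMB, merged);
        return 0;
}

With no holes between LMBs, the 124000 blocks collapse into a single LOAD segment, which is why the fixed kexec-tools can build a valid ELF core header for this machine and kdump can complete the dump.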