The makedumpfile verification used the vmcore file provided by another user instead of /proc/vmcore. The two are identical: the file is a plain 'cp' copy of /proc/vmcore, produced by the 'cp' fallback after makedumpfile failed.

$ ls -lh /home/ubuntu/201909170743/vmcore.201909170743
-r-------- 1 ubuntu ubuntu 32G Sep 17 2019 /home/ubuntu/201909170743/vmcore.201909170743

$ file /home/ubuntu/201909170743/vmcore.201909170743
/home/ubuntu/201909170743/vmcore.201909170743: ELF 64-bit LSB core file ARM aarch64, version 1 (SYSV), SVR4-style

The reproducer system is an arm64 guest in our internal OpenStack cloud, running the same kernel version as the user (4.15.0-76-generic).

$ grep -ao -m1 'Linux version .* ' /home/ubuntu/201909170743/vmcore.201909170743
Linux version 4.15.0-76-generic (buildd@bos02-arm64-060) (gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)) #86-Ubuntu SMP Fri Jan 17 17:25:58 UTC 2020 (Ubuntu 4.15.0-76.86-generic

$ uname -mrv
4.15.0-76-generic #86-Ubuntu SMP Fri Jan 17 17:25:58 UTC 2020 aarch64

With the original package version, makedumpfile fails with error messages about a particular address, and kdump-tools (the caller of makedumpfile) then falls back to 'cp', as reported. That fallback takes a long time because the file is 32 GB.

The second invocation of makedumpfile, which stores the dmesg output from the vmcore (makedumpfile --dump-dmesg), fails in the same way and for the very same address, so it is an equivalent symptom of the same root cause. The verification therefore uses only that step, which runs fast whether it fails or succeeds.

These are the changes made to /usr/sbin/kdump-config (a shell script, independent of the makedumpfile binary):

# Constants
#vmcore_file=/proc/vmcore
vmcore_file=/home/ubuntu/201909170743/vmcore.201909170743
...
function kdump_save_core()
...
        log_action_msg "running makedumpfile $MAKEDUMP_ARGS $vmcore_file $KDUMP_CORETEMP"
        #makedumpfile $MAKEDUMP_ARGS $vmcore_file $KDUMP_CORETEMP
        #ERROR=$?
        ERROR=0
        if [ $ERROR -ne 0 ] ; then
                log_failure_msg "$NAME: makedumpfile failed, falling back to 'cp'"
...
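
When repeating this kind of verification, it can help to check a captured console log for the failure signature mechanically. The following is only a minimal sketch: the function name has_translation_failure and the sample_log variable are hypothetical and are not part of kdump-tools or makedumpfile. The matched text is the readmem error message from the logs in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of kdump-tools or makedumpfile.

# Read console output on stdin; succeed if it contains the makedumpfile
# address-translation error quoted in this bug report.
has_translation_failure() {
    grep -q "readmem: Can't convert a virtual address"
}

# Two lines copied from the logs in this report, used as sample input.
sample_log="kdump-tools[513]: readmem: Can't convert a virtual address(6a2) to physical address.
kdump-tools[513]: makedumpfile Failed."

if printf '%s\n' "$sample_log" | has_translation_failure; then
    echo "failure reproduced"
else
    echo "failure not seen"
fi
```

In practice the input would be the saved serial console output of the crash kernel boot rather than an inline string.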

For documentation purposes: with the user's vmcore, and still collecting the crashdump (i.e., the first invocation of makedumpfile), the exact error reproduces:

$ dpkg -s makedumpfile | grep -i version
Version: 1:1.6.5-1ubuntu1~18.04.4

$ echo 1 | sudo tee /proc/sys/kernel/sysrq && echo c | sudo tee /proc/sysrq-trigger
...
[ 222.162389] sysrq: SysRq : Trigger a crash
...
[ 222.185756] Call trace:
[ 222.186091] sysrq_handle_crash+0x24/0x30
[ 222.186628] __handle_sysrq+0xbc/0x1c0
[ 222.187128] write_sysrq_trigger+0xb8/0x120
[ 222.187690] proc_reg_write+0x80/0xc0
[ 222.188182] __vfs_write+0x48/0x80
[ 222.188639] vfs_write+0xac/0x1b0
[ 222.189148] SyS_write+0x74/0xf0
[ 222.189585] el0_svc_naked+0x30/0x34
[ 222.190073] Code: 52800020 b90ca020 d5033e9f d2800001 (39000020)
[ 222.190892] SMP: stopping secondary CPUs
[ 222.193873] Starting crashdump kernel...
[ 222.194414] Bye!
...
[ 8.168635] kdump-tools[516]: Starting kdump-tools: * running makedumpfile -c -d 31 /home/ubuntu/201909170743/vmcore.201909170743 /var/crash/202006171229/dump-incomplete
[ 8.185786] kdump-tools[516]: readmem: Can't convert a virtual address(6a2) to physical address.
[ 8.187727] kdump-tools[516]: readmem: type_addr: 0, addr:6a2, size:1032
[ 8.191012] kdump-tools[516]: validate_mem_section: Can't read mem_section array.
[ 8.195497] kdump-tools[516]: get_mem_section: Could not validate mem_section.
[ 8.197581] kdump-tools[516]: get_mm_sparsemem: Can't get the address of mem_section.
[ 8.199601] kdump-tools[516]: makedumpfile Failed.
[ 8.204346] kdump-tools[516]: * kdump-tools: makedumpfile failed, falling back to 'cp'

And when skipping that, and just collecting the dmesg output, the same error reproduces:

[ 8.369266] kdump-tools[513]: Starting kdump-tools: * running makedumpfile -c -d 31 /home/ubuntu/201909170743/vmcore.201909170743 /var/crash/202006171242/dump-incomplete
[ 8.382379] kdump-tools[513]: mv: cannot stat '/var/crash/202006171242/dump-incomplete': No such file or directory
[ 8.385529] kdump-tools[513]: * kdump-tools: saved vmcore in /var/crash/202006171242
[ 8.405479] kdump-tools[513]: * running makedumpfile --dump-dmesg /home/ubuntu/201909170743/vmcore.201909170743 /var/crash/202006171242/dmesg.202006171242
[ 8.422223] kdump-tools[513]: readmem: Can't convert a virtual address(6a2) to physical address.
[ 8.424872] kdump-tools[513]: readmem: type_addr: 0, addr:6a2, size:1032
[ 8.428291] kdump-tools[513]: validate_mem_section: Can't read mem_section array.
[ 8.429779] kdump-tools[513]: get_mem_section: Could not validate mem_section.
[ 8.432257] kdump-tools[513]: get_mm_sparsemem: Can't get the address of mem_section.
[ 8.436297] kdump-tools[513]: makedumpfile Failed.
[ 8.439351] kdump-tools[513]: * kdump-tools: makedumpfile --dump-dmesg failed. dmesg content will be unavailable
[ 8.442197] kdump-tools[513]: * kdump-tools: failed to save dmesg content in /var/crash/202006171242
[ 8.448570] kdump-tools[513]: Wed, 17 Jun 2020 12:42:16 +0000
[ 8.455630] kdump-tools[513]: Rebooting.
[ 8.514560] reboot: Restarting system

So, this shows the failures seen on the first and second invocations of makedumpfile are equivalent, and we can use just the second (--dump-dmesg), which runs faster, for the purposes of verifying this SRU.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1869465

Title:
  Kdump-Tools: Makedumpfile Failed, Falling Back To 'Cp'

Status in makedumpfile package in Ubuntu:
  Fix Released
Status in makedumpfile source package in Xenial:
  Fix Committed
Status in makedumpfile source package in Bionic:
  Fix Committed
Status in makedumpfile source package in Eoan:
  Fix Committed
Status in makedumpfile source package in Focal:
  Fix Committed
Status in makedumpfile source package in Groovy:
  Fix Released
Status in makedumpfile package in Debian:
  New

Bug description:
  [Impact]
  On some arm systems makedumpfile fails to translate virtual to physical addresses properly. This may result in makedumpfile looping forever and exhausting all memory, or in translating a virtual address to an invalid physical address, then failing and falling back to cp.
  It cannot resolve some addresses because the PMD mask is wrong: the physical address mask allows up to 48 bits, so the PMD mask should allow the same, but it is currently set to 40 bits (see commit [1]).
  Commit [1] fixes this bug.

  [Test Case]
  To hit this bug you need a system that uses physical addresses above 1TB. This may be either because you have a lot of memory or because the firmware mapped some memory above 1TB for some reason [1].
  A user hit this bug because the firmware mapped memory above 1TB, and provided a dump so I could reproduce the bug by running makedumpfile on the dump.

  [Regression Potential]
  This commit changes PMD_SECTION_MASK for arm64, so any regression would only affect arm64 systems. In addition, PMD_SECTION_MASK is used in the translation from virtual to physical addresses, so any regression would show up during that process.

  [Other]
  [1] https://github.com/makedumpfile/makedumpfile/commit/7242ae4cb5288df626f464ced0a8b60fd669100b

  When testing kdump on the Ubuntu 18.04.4 (arm64) GA kernel, makedumpfile fails.
  The test steps are as follows:

  # echo 1 > /proc/sys/kernel/sysrq
  # echo c > /proc/sysrq-trigger

  The logs are as follows:

  kdump-tools[646]: starting kdump-tools: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202003251128/dump-incomplete
  kdump-tools[646]: readpage_elf: Attempt to read non-existent page at 0x0
  kdump-tools[646]: readmem: type_addr: 1, addr:ff0, size:8
  kdump-tools[646]: vaddr_to_paddr_arm64: Can't read pud
  kdump-tools[646]: readmem: Can't convert a virtual address(ffff9e653690) to physical address.
  kdump-tools[646]: readmem: type_addr: 0, addr:ffff9e653690, size:1032
  kdump-tools[646]: validate_mem_section: Can't read mem_section array.
  kdump-tools[646]: get_mem_section: Could not validate mem_section.
  kdump-tools[646]: get_mm_sparsemem: Can't get the address of mem_section.
  kdump-tools[646]: makedumpfile Failed.
  kdump-tools[646]: * kdump-tools: makedumpfile failed, falling back to 'cp'

  But when I use the HWE kernel, there is no such problem.
  The HWE kernel version: 5.3.0-42-generic
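
The effect of the 40-bit versus 48-bit mask described under [Impact] can be illustrated with plain shell arithmetic. This is only a sketch and not makedumpfile code: the helper names low_bits_mask and apply_mask and the example address are hypothetical, and only the two mask widths (40 and 48 bits) come from the bug description.

```shell
#!/usr/bin/env bash
# Illustration only; not makedumpfile code. The mask widths (40 and 48
# bits) are taken from the bug description, everything else is made up.

# Print a mask with the low <bits> bits set (valid for bits < 63 in bash).
low_bits_mask() {
    local bits=$1
    echo $(( (1 << bits) - 1 ))
}

# Mask a physical address down to <bits> bits and print the result in hex.
apply_mask() {
    local addr=$1 bits=$2
    printf '0x%x\n' $(( addr & $(low_bits_mask "$bits") ))
}

# Hypothetical physical address just above 1 TB (2^40), plus an offset.
paddr=$(( (1 << 40) + 0x200000 ))

apply_mask "$paddr" 40   # bit 40 is lost, leaving only the offset
apply_mask "$paddr" 48   # the address is preserved
```

With the 40-bit mask, any address at or above 1 TB is silently truncated to a different, lower address, which matches the symptom of translating to an invalid physical address.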