The makedumpfile verification used the vmcore file provided by
another user instead of /proc/vmcore. The file is identical to the
original /proc/vmcore: it is the plain 'cp' copy that kdump-tools
saved after makedumpfile failed.

        $ ls -lh /home/ubuntu/201909170743/vmcore.201909170743
        -r-------- 1 ubuntu ubuntu 32G Sep 17  2019 /home/ubuntu/201909170743/vmcore.201909170743

        $ file /home/ubuntu/201909170743/vmcore.201909170743
        /home/ubuntu/201909170743/vmcore.201909170743: ELF 64-bit LSB core file ARM aarch64, version 1 (SYSV), SVR4-style


The reproducer system is an arm64 guest in our internal OpenStack
cloud running the same kernel version as the user (4.15.0-76-generic).

        $ grep -ao -m1 'Linux version .* ' /home/ubuntu/201909170743/vmcore.201909170743
        Linux version 4.15.0-76-generic (buildd@bos02-arm64-060) (gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)) #86-Ubuntu SMP Fri Jan 17 17:25:58 UTC 2020 (Ubuntu 4.15.0-76.86-generic

        $ uname -mrv
        4.15.0-76-generic #86-Ubuntu SMP Fri Jan 17 17:25:58 UTC 2020 aarch64
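
The kernel version comparison above can also be scripted. This is
only a minimal sketch, assuming GNU grep; vmcore_release is a
hypothetical helper name, not part of kdump-tools or makedumpfile.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel release recorded in the
# "Linux version ..." banner of a vmcore (or any file containing it).
vmcore_release() {
        # -a: treat the binary file as text; -o: print only the match;
        # -m1: stop after the first matching line.
        grep -a -o -m1 'Linux version [^ ]*' "$1" | head -n1 | cut -d' ' -f3
}
```

Comparing the result with the running kernel is then a one-liner,
for example:
[ "$(vmcore_release /home/ubuntu/201909170743/vmcore.201909170743)" = "$(uname -r)" ]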

With the original package version, makedumpfile fails with error
messages about a particular address, and kdump-tools (the caller of
makedumpfile) then falls back to 'cp', as reported.

The fallback takes a long time because the vmcore is a 32 GB file.

The second invocation of makedumpfile, which stores the dmesg output
from the vmcore (makedumpfile --dump-dmesg), fails in the same way
and for the very same address, so it is an equivalent symptom of the
same root cause. The verification therefore uses only that step,
which runs fast regardless of failure or success.

These are the changes made to /usr/sbin/kdump-config (a shell script,
independent of the makedumpfile binary):

        # Constants
        #vmcore_file=/proc/vmcore
        vmcore_file=/home/ubuntu/201909170743/vmcore.201909170743
        ...
        function kdump_save_core()
        ...
                log_action_msg "running makedumpfile $MAKEDUMP_ARGS $vmcore_file $KDUMP_CORETEMP"
                #makedumpfile $MAKEDUMP_ARGS $vmcore_file $KDUMP_CORETEMP
                #ERROR=$?
                ERROR=0
                if [ $ERROR -ne 0 ] ; then
                        log_failure_msg "$NAME: makedumpfile failed, falling back to 'cp'"
        ...

For documentation purposes: with the user's vmcore, and still
collecting the crash dump (i.e., the first invocation of
makedumpfile), the exact error reproduces:

        $ dpkg -s makedumpfile | grep -i version
        Version: 1:1.6.5-1ubuntu1~18.04.4

        $ echo 1 | sudo tee /proc/sys/kernel/sysrq && echo c | sudo tee /proc/sysrq-trigger
        ...
        [  222.162389] sysrq: SysRq : Trigger a crash
        ...
        [  222.185756] Call trace:
        [  222.186091]  sysrq_handle_crash+0x24/0x30
        [  222.186628]  __handle_sysrq+0xbc/0x1c0
        [  222.187128]  write_sysrq_trigger+0xb8/0x120
        [  222.187690]  proc_reg_write+0x80/0xc0
        [  222.188182]  __vfs_write+0x48/0x80
        [  222.188639]  vfs_write+0xac/0x1b0
        [  222.189148]  SyS_write+0x74/0xf0
        [  222.189585]  el0_svc_naked+0x30/0x34
        [  222.190073] Code: 52800020 b90ca020 d5033e9f d2800001 (39000020) 
        [  222.190892] SMP: stopping secondary CPUs
        [  222.193873] Starting crashdump kernel...
        [  222.194414] Bye!
        ...
        [    8.168635] kdump-tools[516]: Starting kdump-tools:  * running makedumpfile -c -d 31 /home/ubuntu/201909170743/vmcore.201909170743 /var/crash/202006171229/dump-incomplete
        [    8.185786] kdump-tools[516]: readmem: Can't convert a virtual address(6a2) to physical address.
        [    8.187727] kdump-tools[516]: readmem: type_addr: 0, addr:6a2, size:1032
        [    8.191012] kdump-tools[516]: validate_mem_section: Can't read mem_section array.
        [    8.195497] kdump-tools[516]: get_mem_section: Could not validate mem_section.
        [    8.197581] kdump-tools[516]: get_mm_sparsemem: Can't get the address of mem_section.
        [    8.199601] kdump-tools[516]: makedumpfile Failed.
        [    8.204346] kdump-tools[516]:  * kdump-tools: makedumpfile failed, falling back to 'cp'

And when skipping that, and just collecting the dmesg output,
the same error reproduces:

        [    8.369266] kdump-tools[513]: Starting kdump-tools:  * running makedumpfile -c -d 31 /home/ubuntu/201909170743/vmcore.201909170743 /var/crash/202006171242/dump-incomplete
        [    8.382379] kdump-tools[513]: mv: cannot stat '/var/crash/202006171242/dump-incomplete': No such file or directory
        [    8.385529] kdump-tools[513]:  * kdump-tools: saved vmcore in /var/crash/202006171242
        [    8.405479] kdump-tools[513]:  * running makedumpfile --dump-dmesg /home/ubuntu/201909170743/vmcore.201909170743 /var/crash/202006171242/dmesg.202006171242
        [    8.422223] kdump-tools[513]: readmem: Can't convert a virtual address(6a2) to physical address.
        [    8.424872] kdump-tools[513]: readmem: type_addr: 0, addr:6a2, size:1032
        [    8.428291] kdump-tools[513]: validate_mem_section: Can't read mem_section array.
        [    8.429779] kdump-tools[513]: get_mem_section: Could not validate mem_section.
        [    8.432257] kdump-tools[513]: get_mm_sparsemem: Can't get the address of mem_section.
        [    8.436297] kdump-tools[513]: makedumpfile Failed.
        [    8.439351] kdump-tools[513]:  * kdump-tools: makedumpfile --dump-dmesg failed. dmesg content will be unavailable
        [    8.442197] kdump-tools[513]:  * kdump-tools: failed to save dmesg content in /var/crash/202006171242
        [    8.448570] kdump-tools[513]: Wed, 17 Jun 2020 12:42:16 +0000
        [    8.455630] kdump-tools[513]: Rebooting.
        [    8.514560] reboot: Restarting system

So, this shows the failures seen on the first and second
invocations of makedumpfile are equivalent, and we can
use just the second (--dump-dmesg) which runs faster, 
for the purposes of verifying this SRU.
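
The fast check can also be run by hand, outside of kdump-config.
This is a minimal sketch, not what kdump-tools does internally:
verify_dump_dmesg and the MAKEDUMPFILE override are hypothetical
names introduced here; only the --dump-dmesg option and its two
arguments come from the logs above.

```shell
#!/bin/sh
# Hypothetical helper: run only the fast --dump-dmesg step against a
# saved vmcore and report whether makedumpfile succeeded.
# MAKEDUMPFILE may be overridden, e.g. to point at a test build.
MAKEDUMPFILE=${MAKEDUMPFILE:-makedumpfile}

verify_dump_dmesg() {
        vmcore=$1
        out=$2
        if "$MAKEDUMPFILE" --dump-dmesg "$vmcore" "$out"; then
                echo "PASS: dmesg saved in $out"
        else
                echo "FAIL: makedumpfile --dump-dmesg failed on $vmcore"
                return 1
        fi
}
```

For example (the output path is an arbitrary choice):
verify_dump_dmesg /home/ubuntu/201909170743/vmcore.201909170743 /tmp/dmesg.test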

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1869465

Title:
  kdump-tools: makedumpfile failed, falling back to 'cp'

Status in makedumpfile package in Ubuntu:
  Fix Released
Status in makedumpfile source package in Xenial:
  Fix Committed
Status in makedumpfile source package in Bionic:
  Fix Committed
Status in makedumpfile source package in Eoan:
  Fix Committed
Status in makedumpfile source package in Focal:
  Fix Committed
Status in makedumpfile source package in Groovy:
  Fix Released
Status in makedumpfile package in Debian:
  New

Bug description:
  [Impact]

  On some arm systems makedumpfile fails to translate virtual to physical
  addresses properly. This may result in makedumpfile looping forever and
  exhausting all memory, or in translating a virtual address to an invalid
  physical address and then failing and falling back to cp.
  It cannot resolve some addresses because the PMD mask is wrong: when the
  physical address mask allows up to 48 bits, the PMD mask should allow the
  same, but it is currently set to 40 bits (see commit [1]).

  Commit [1] fixes this bug.
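
  The effect of a too-narrow mask can be illustrated with plain shell
  arithmetic. This is only a sketch: apply_mask is a hypothetical helper,
  the address is made up (not taken from the dump), and the actual macro
  in makedumpfile is not reproduced here. It just shows that a 40-bit
  mask drops every address bit at or above 2^40 (1TB), while a 48-bit
  mask keeps them.

```shell
#!/bin/sh
# Hypothetical helper: keep only the low N bits of an address.
# $1 = address, $2 = mask width in bits.
apply_mask() {
        printf '0x%x\n' $(( $1 & ((1 << $2) - 1) ))
}

# Made-up physical address 2 MiB above the 1TB (2^40) boundary.
paddr=$(( (1 << 40) + 0x200000 ))

apply_mask "$paddr" 40   # prints 0x200000: bit 40 is lost
apply_mask "$paddr" 48   # prints 0x10000200000: address preserved
```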

  [Test Case]

  To hit this bug you need a system that uses physical addresses above 1TB,
  either because it has a lot of memory or because the firmware mapped some
  memory above 1TB for some reason [1].

  A user hit this bug because the firmware mapped memory above 1TB, and
  provided a dump, so I could reproduce the bug by running makedumpfile on
  that dump.
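
  One way to check whether a system has memory mapped above 1TB is to look
  at the System RAM ranges in /proc/iomem. The following is a minimal
  sketch under that assumption; ram_above_1tib is a hypothetical helper,
  not part of makedumpfile. Note that /proc/iomem shows real addresses
  only to root; unprivileged reads show zeros.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/iomem-style lines on stdin and print
# the System RAM ranges whose end address is at or above 1TB (2^40).
ram_above_1tib() {
        grep 'System RAM' | while read -r range _rest; do
                end=${range#*-}          # hex end address of the range
                if [ $((0x$end)) -ge $((1 << 40)) ]; then
                        echo "$range"
                fi
        done
        return 0
}
```

  For example, as root: ram_above_1tib < /proc/iomem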

  [Regression Potential]

  This commit changes PMD_SECTION_MASK for arm64, so any regression would
  only affect arm64 systems. In addition, PMD_SECTION_MASK is used in the
  translation from virtual to physical addresses, so any regression would
  happen during that process.

  [Other]

  [1] https://github.com/makedumpfile/makedumpfile/commit/7242ae4cb5288df626f464ced0a8b60fd669100b

  
  When testing kdump on the Ubuntu 18.04.4 (arm64) GA kernel, makedumpfile
  fails. The test steps are as follows:
  # echo 1 > /proc/sys/kernel/sysrq
  # echo c > /proc/sysrq-trigger
  The logs are as follows:

  kdump-tools[646]: starting kdump-tools: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202003251128/dump-incomplete
  kdump-tools[646]: readpage_elf: Attempt to read non-existent page at 0x0
  kdump-tools[646]: readmem: type_addr: 1, addr:ff0, size:8
  kdump-tools[646]: vaddr_to_paddr_arm64: Can't read pud
  kdump-tools[646]: readmem: Can't convert a virtual address(ffff9e653690) to physical address.
  kdump-tools[646]: readmem: type_addr: 0, addr:ffff9e653690, size:1032
  kdump-tools[646]: validate_mem_section: Can't read mem_section array.
  kdump-tools[646]: get_mem_section: Could not validate mem_section.
  kdump-tools[646]: get_mm_sparsemem: Can't get the address of mem_section.
  kdump-tools[646]: makedumpfile Failed.
  kdump-tools[646]: * kdump-tools: makedumpfile failed, falling back to 'cp'

  But when I use the HWE kernel, there is no such problem.
  The HWE kernel version: 5.3.0-42-generic
