Eric, thanks for looking into the testing failure. I discussed it with Cascardo today, and he quickly spotted the problem.

First, the reason the "makedumpfile" test passed on ppc64el before was that the test was skipped; Cascardo changed "makedumpfile" to be part of the so-called "big packages" [0], so that the VMs used in the tests have more RAM. Now the test gets executed, and it fails if the kernel version is >= 4.20.
The reason for the failure was that "makedumpfile" couldn't collect a compressed dump and fell back to 'cp', which made the 'if' check in the test fail. The root cause is that kernel patch 4ffe713b7587 ("powerpc/mm: Increase the max addressable memory to 2PB"), introduced in v4.20, requires a counterpart in "makedumpfile", in the form of patch [1]. Without it, I was able to reproduce the problem locally:

$ makedumpfile -c -d 31 /proc/vmcore /var/crash/201908051539/dump-incomplete2
get_machdep_info_ppc64: Can't detect max_physmem_bits.

makedumpfile Failed.

When I used kernel 4.18 in the same VM, I got:

$ makedumpfile -c -d 31 /proc/vmcore core.418
Copying data : [100.0 %] \ eta: 0s

The dumpfile is saved to core.418

The plan, according to Cascardo, is to push makedumpfile 1.6.6 (which already contains the fix for the "physmem_bits" issue as well as my fix for this LP, the retry/delay mechanism) to Eoan. After that, as I understand it, we can move on with the SRU for Bionic/Disco.

Cheers,

Guilherme

[0] https://git.launchpad.net/~cascardo/autopkgtest-cloud/commit/?id=346b786925
[1] https://salsa.debian.org/debian/makedumpfile/commit/f349b51f

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1681909

Title:
  kdump is not captured in remote host when kdump over ssh is configured

Status in The Ubuntu-power-systems project:
  In Progress
Status in makedumpfile package in Ubuntu:
  Fix Committed
Status in makedumpfile source package in Xenial:
  Won't Fix
Status in makedumpfile source package in Bionic:
  In Progress
Status in makedumpfile source package in Cosmic:
  Won't Fix
Status in makedumpfile source package in Disco:
  In Progress
Status in makedumpfile source package in Eoan:
  Fix Committed

Bug description:
  [Impact]

  * Kdump over network (like NFS mount or SSH dump) relies on the network-online target from systemd.
    Even so, some NICs report the "Link Up" state but aren't ready to transmit packets yet. This is generally bad behavior, probably caused by NIC firmware delays, and is usually not fixable from the drivers. Some adapters known to act like this are bnx2x, tg3 and ixgbe.

  * Kdump is a mechanism that may be the last resort to debug complex or hard-to-reproduce issues, so it's worthwhile to increase its reliability and resilience. We therefore propose a solution/quirk for this issue on network dumps by adding a retry/delay mechanism: if it's a network dump, kdump will retry a number of times, sleeping between attempts, to handle the case of NICs that aren't ready yet but will soon be able to transmit packets.

  * Although first reported by IBM on the PowerPC arch, the scope of this issue is the NIC, and it was later reported on the x86 arch too.

  [Test case]

  It's usually difficult to reproduce this issue naturally in a deterministic way, but we have an artificial test case in comment #24 of this LP. We also have a report in this bug from a user who managed to reproduce the problem consistently; the problem was fixed when the user tested our solution.

  [Regression potential]

  There's no clear regression potential here, since it's just a retry/delay mechanism. Some potential problems may come from bad coding in the script. The delay between attempts is only 3 sec per iteration, so it shouldn't block kdump progress for long at any one time.

  [Other information]

  Salsa Debian commit:
  https://salsa.debian.org/debian/makedumpfile/commit/d63ba95337988be1eac8c8c76d90825ff5c6d17f
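
For illustration only, the retry/delay idea described under [Impact] could look roughly like the sketch below. This is not the code from the Salsa commit; the function name retry_with_delay and its argument order are made up for this example. The 3-second delay is the value mentioned in the bug description, while the number of attempts is an assumption.

```sh
#!/bin/sh
# Hypothetical sketch of a retry/delay loop; NOT the actual kdump script code.
# Usage: retry_with_delay MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, sleeping
# DELAY_SECONDS between failed attempts. Returns 0 on success, 1 otherwise.
retry_with_delay() {
    max_attempts=$1
    delay=$2
    shift 2

    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Try the command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2

        # Sleep only if another attempt will follow.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

A caller could then wrap the first network operation of the dump, for example `retry_with_delay 5 3 ssh "$REMOTE" true`, where REMOTE is a placeholder for the configured dump target and 5 attempts is an arbitrary choice. With these values the worst case adds 12 seconds of sleeping (four delays of 3 seconds) before giving up.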