Thanks everyone for trying to tackle this long-standing issue. fwiw,
here's my $0.02 on how we could proceed:

Someone should draft a special case page for makedumpfile:
https://wiki.ubuntu.com/StableReleaseUpdates#Documentation_for_Special_Cases
I'm happy to review/provide feedback, but I'd rather it be driven by
whoever will be carrying out the plan.

As others have mentioned, testing is the hard part, and we need to
define what will be tested in the special case documentation. Since
makedumpfile is really just a filter, I don't think we need to (or
reasonably could) boot a bunch of systems in different configs and
generate crashdumps for every new update. Rather, I think we could build
a repository of representative, unfiltered, /proc/vmcore files that
focal's existing makedumpfile can parse. Then we can just check that all
of those files can still be parsed by the proposed makedumpfile. With
some scripting and a multi-architecture cloud, this could be automated.
In fact, if this vmcore repo were online, we could implement this as an
autopkgtest (w/ needs-internet set). But we should also do at least one
end-to-end kdump, just to make sure the kdump-tools->makedumpfile
interface hasn't been broken.
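To make the "some scripting" part concrete, here's a rough sketch of
what the parse check over the vmcore repo could look like. It's only an
illustration under assumptions: the repo layout (any file named "vmcore"
under a root directory) and the function name check_vmcores are
hypothetical, not an existing Ubuntu tool. It runs the candidate
makedumpfile with a typical filtering invocation (-c -d 31) on each sample
and records the exit status.

```python
# Hypothetical helper, not an existing Ubuntu tool. Assumed layout: any
# file named "vmcore" anywhere under the repo root is a sample,
# e.g. <repo>/amd64/<kernel>/vmcore.
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Dict


def check_vmcores(
    repo: Path,
    makedumpfile: str = "makedumpfile",
    dump_level: int = 31,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Dict[Path, int]:
    """Filter every sample with the candidate makedumpfile.

    Returns {vmcore path: exit code}; a non-zero code means the proposed
    build could not handle a dump that the existing one could.
    """
    results: Dict[Path, int] = {}
    for vmcore in sorted(repo.rglob("vmcore")):
        # Write each filtered dump to a throwaway directory; we only care
        # whether makedumpfile succeeds, not about keeping the output.
        with tempfile.TemporaryDirectory() as tmp:
            out = Path(tmp) / "dump"
            # -c: zlib-compress pages; -d 31: exclude zero, cache, user and
            # free pages (a common dump level for kdump setups).
            proc = run(
                [makedumpfile, "-c", "-d", str(dump_level), str(vmcore), str(out)],
                capture_output=True,
                text=True,
            )
            results[vmcore] = proc.returncode
    return results
```

Usage would be something like
results = check_vmcores(Path("/srv/vmcores"), "./makedumpfile"), then
failing on any entry with a non-zero code. The run parameter is injectable
only so the loop can be exercised without a real makedumpfile.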

What is a representative sample? One of each of the current LTS and HWE
kernels on amd64, arm64, ppc64el and s390x seems like an obvious start
(or the subset of those that actually work today). I don't think the
machine type is as important; VMs should be fine IMO. If we know of
examples where different machines expose structures differently in a way
that makedumpfile cares about, then perhaps add those as well. Once a
new makedumpfile lands that adds support for a new HWE kernel, we should
probably then update the repo w/ vmcore samples from that kernel, so we
can make sure the next update doesn't regress that support (probably
convenient to do when verifying the SRU, since I imagine we'd be testing
that it works w/ the new HWE kernel then anyway).
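To keep track of which parts of that LTS/HWE x architecture matrix are
still missing from the repo, a small coverage check could help. This is
again only a sketch: the <repo>/<arch>/<flavour>/vmcore layout and the
"lts"/"hwe" flavour names are assumptions made for illustration.

```python
# Hypothetical coverage check for the sample vmcore repo.
# Assumed layout: <repo>/<arch>/<flavour>/vmcore, flavour in {"lts", "hwe"}.
from itertools import product
from pathlib import Path
from typing import Iterable, List, Tuple

ARCHES = ("amd64", "arm64", "ppc64el", "s390x")
FLAVOURS = ("lts", "hwe")


def missing_samples(
    repo: Path,
    arches: Iterable[str] = ARCHES,
    flavours: Iterable[str] = FLAVOURS,
) -> List[Tuple[str, str]]:
    """Return every (arch, flavour) pair that has no vmcore sample yet."""
    return [
        (arch, flavour)
        for arch, flavour in product(arches, flavours)
        if not (repo / arch / flavour / "vmcore").is_file()
    ]
```

When a new HWE kernel is added, the flavour list (or a per-kernel
directory scheme) would grow accordingly, so the check keeps pointing at
gaps the next update could regress.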

It'd be good to note in the special case request that kdump-tools falls
back to a raw cp of /proc/vmcore if makedumpfile fails, which can mitigate
regressions for a subset of users: those with the necessary disk space
and no time constraints.
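For anyone writing that up, the fallback pattern amounts to roughly the
following. To be clear, this is a Python illustration of the idea, not
kdump-tools' actual code (which is a shell script); the save_dump name and
the "dump"/"vmcore" output file names are hypothetical.

```python
# Illustration of the filter-or-copy fallback; NOT kdump-tools' real
# implementation. Output file names are hypothetical.
import shutil
import subprocess
from pathlib import Path
from typing import Callable


def save_dump(
    vmcore: Path,
    dest_dir: Path,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Path:
    """Save a filtered dump if possible, else a raw copy of the vmcore."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    filtered = dest_dir / "dump"
    proc = run(["makedumpfile", "-c", "-d", "31", str(vmcore), str(filtered)])
    if proc.returncode == 0:
        return filtered
    # makedumpfile failed: drop any partial output and copy everything.
    # This needs roughly as much disk space as RAM and can take a long time,
    # which is why it only helps a subset of users.
    filtered.unlink(missing_ok=True)
    raw = dest_dir / "vmcore"
    shutil.copyfile(vmcore, raw)
    return raw
```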

While I agree that crash falls into the same category, I don't think it
necessarily needs to happen at the same time. Obviously users running
focal need to dump their vmcore using focal, but for developers
debugging a crash, I don't think it is too onerous to use a newer version
of Ubuntu. Again, I'm not saying we *shouldn't* add crash to the special
case, it just seems like a makedumpfile exception is significantly more
important.

Finally, I don't think we need to commit to a frequency of backports, or
a point at which they will stop. Rather we can just stick to agreeing on
how it *can* be done when someone has the time/interest in doing it.
Guillherme's LTS->LTS+1 scheme sounds like a reasonable pattern to shoot
for, but if that doesn't happen every time, we're still improving the
situation over the status quo.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1970672

Title:
  makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid
  pmd_pte."
