Public bug reported:

Howdy! We have noticed that running 'kexec' on Azure VMs with the
'6.8.0-1017.20~22.04.1' linux-azure kernel results in kernel panics. The
panics don't all have the same trace, which is confusing to me. Here are a
few I've collected:
* https://paste.ubuntu.com/p/Vs2VsQvV33/
* https://paste.ubuntu.com/p/yXkN8w6Frh/

I attempted to downgrade to the linux-azure-lts-22.04 5.15.0.1074.72
kernel and, to my surprise, kexec triggered another panic there:
https://paste.ubuntu.com/p/6Xq89qD8VT//. I've been unable to reproduce
this on the linux-aws and linux-gcp kernels. I've been able to easily
reproduce it on the Standard_D8as_v5 and Standard_F32s_v2 instance
types. It doesn't appear to matter which kernel args are passed to
kexec, as long as at least one differs from the running command line. My
go-to so far is a simple 'sudo kexec -l /boot/vmlinuz
--command-line="$(cat /proc/cmdline) fake-new-kernel-arg=1" -f' command,
which triggers it. Simply running 'sudo kexec -l /boot/vmlinuz
--command-line="$(cat /proc/cmdline)" -f', with the args unchanged,
doesn't appear to reproduce the panics. It looks like this wasn't a
problem on the previous 6.5 linux-azure kernel (which makes the 5.15
repro particularly surprising...). For reference, we currently use these
args:
`BOOT_IMAGE=/boot/vmlinuz-6.8.0-1017-azure root=PARTUUID=uuid-here ro
console=tty1 console=ttyS0 earlyprintk=ttyS0 nvme_core.io_timeout=240
apparmor=1 security=apparmor ipv6.disable=1 transparent_hugepage=madvise
systemd.unified_cgroup_hierarchy=0
systemd.legacy_systemd_cgroup_controller=true panic=-1`
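
For anyone trying to reproduce this, the two invocations above can be
wrapped in a small helper. This is only a sketch of the commands quoted in
this report, not a tested reproducer: the function names (build_cmdline,
kexec_repro) and the RUN variable are made up for illustration, and it
defaults to a dry run that prints the command instead of executing it.

```shell
#!/bin/sh
# Sketch of the kexec invocations quoted in the report.
# build_cmdline, kexec_repro and RUN are hypothetical names, not part of
# kexec-tools. Dry run by default; RUN=1 executes the command for real.

# Join a base kernel command line with an optional extra argument.
build_cmdline() {
    base=$1
    extra=${2:-}
    if [ -n "$extra" ]; then
        printf '%s %s\n' "$base" "$extra"
    else
        printf '%s\n' "$base"
    fi
}

# Print the kexec command from the report, or run it when RUN=1.
# WARNING: with RUN=1 this uses 'kexec -f', which boots the new kernel
# immediately without a clean shutdown.
kexec_repro() {
    cmdline=$(build_cmdline "$1" "${2:-}")
    if [ "${RUN:-0}" = "1" ]; then
        sudo kexec -l /boot/vmlinuz --command-line="$cmdline" -f
    else
        printf 'sudo kexec -l /boot/vmlinuz --command-line="%s" -f\n' "$cmdline"
    fi
}

# Example calls (left commented out so sourcing this file has no effect):
#   kexec_repro "$(cat /proc/cmdline)" fake-new-kernel-arg=1   # changed args
#   kexec_repro "$(cat /proc/cmdline)"                         # unchanged args
```

Per the description above, the first example call (one arg added) is the
variant that panics, and the second (args unchanged) is the one that does
not appear to.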

From my testing, the systems eventually stabilize and stop panicking,
but I haven't been able to determine why. I've also noticed cases where
a 'kexec' happens and doesn't trigger the panics, but that seems rare.
I'd appreciate any help debugging this and am more than happy to provide
additional testing/repro info as needed. Thanks!

** Affects: linux-azure (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/2086789

Title:
  kexec results in kernel panics

Status in linux-azure package in Ubuntu:
  New
