Public bug reported: Howdy! We have noticed that when running 'kexec' on Azure VM's running the '6.8.0-1017.20~22.04.1' linux-azure kernel we will see kernel panics. Each panic doesn't have the exact same trace which is confusing to me. Here are a few I've collected: * https://paste.ubuntu.com/p/Vs2VsQvV33/ * https://paste.ubuntu.com/p/yXkN8w6Frh/
I attempted to downgrade to the linux-azure-lts-22.04 5.15.0.1074.72 kernel and to my surprise , the kexec also triggered another panic there: https://paste.ubuntu.com/p/6Xq89qD8VT//. I've been unable to reproduce this on the linux-aws and linux-gcp kernels. I've been able to easily reproduce this behavior on Standard_D8as_v5 and Standard_F32s_v2 instance types. It doesn't appear to matter what kernel args are passed into the kexec. My go to so far is a simple 'sudo kexec -l /boot/vmlinuz --command-line="$(cat /proc/cmdline) fake-new-kernel-arg=1" -f' command which triggers it. It does seem that one of the kernel args needs to change for this to happen, simply running 'sudo kexec -l /boot/vmlinuz --command-line="$(cat /proc/cmdline)" -f' doesn't appear to reproduce the panics. It looks like this wasn't a problem on the previous 6.5 linux-azure kernel (which makes the 5.15 repro particularly surprising...). For reference, we currently use these args: `BOOT_IMAGE=/boot/vmlinuz-6.8.0-1017-azure root=PARTUUID=uuid-here ro console=tty1 console=ttyS0 earlyprintk=ttyS0 nvme_core.io_timeout=240 apparmor=1 security=apparmor ipv6.disable=1 transparent_hugepage=madvise systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller=true panic=-1` >From my testing, eventually the systems will stabilize and stop panicking but it's been undetermined as to why. I've also noticed cases where a 'kexec' will happen and not trigger the panics but, that's seemingly rare. I'd appreciate any help trying to debug this and am more than happy to provide additional testing/repro info as needed. Thanks! ** Affects: linux-azure (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu. https://bugs.launchpad.net/bugs/2086789 Title: kexec results in kernel panics Status in linux-azure package in Ubuntu: New Bug description: Howdy! We have noticed that when running 'kexec' on Azure VM's running the '6.8.0-1017.20~22.04.1' linux-azure kernel we will see kernel panics. Each panic doesn't have the exact same trace which is confusing to me. Here are a few I've collected: * https://paste.ubuntu.com/p/Vs2VsQvV33/ * https://paste.ubuntu.com/p/yXkN8w6Frh/ I attempted to downgrade to the linux-azure-lts-22.04 5.15.0.1074.72 kernel and to my surprise , the kexec also triggered another panic there: https://paste.ubuntu.com/p/6Xq89qD8VT//. I've been unable to reproduce this on the linux-aws and linux-gcp kernels. I've been able to easily reproduce this behavior on Standard_D8as_v5 and Standard_F32s_v2 instance types. It doesn't appear to matter what kernel args are passed into the kexec. My go to so far is a simple 'sudo kexec -l /boot/vmlinuz --command-line="$(cat /proc/cmdline) fake-new-kernel-arg=1" -f' command which triggers it. It does seem that one of the kernel args needs to change for this to happen, simply running 'sudo kexec -l /boot/vmlinuz --command-line="$(cat /proc/cmdline)" -f' doesn't appear to reproduce the panics. It looks like this wasn't a problem on the previous 6.5 linux-azure kernel (which makes the 5.15 repro particularly surprising...). For reference, we currently use these args: `BOOT_IMAGE=/boot/vmlinuz-6.8.0-1017-azure root=PARTUUID=uuid-here ro console=tty1 console=ttyS0 earlyprintk=ttyS0 nvme_core.io_timeout=240 apparmor=1 security=apparmor ipv6.disable=1 transparent_hugepage=madvise systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller=true panic=-1` From my testing, eventually the systems will stabilize and stop panicking but it's been undetermined as to why. I've also noticed cases where a 'kexec' will happen and not trigger the panics but, that's seemingly rare. I'd appreciate any help trying to debug this and am more than happy to provide additional testing/repro info as needed. Thanks! To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/2086789/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp