@Roger Ramjet> Glad you're making progress. GRUB can be a pain; I've been
there. With Secure Boot, BIOS/UEFI differences, and GRUB installed on multiple
drives when only one drive's GRUB is active, kernel updates on the other
drives sometimes become a two-step chore, at minimum.
You might try changing the primary boot drive in your BIOS to your Mint
drive. Have your backups up to date and note which OS is on which volume
before you do that. Or leave it as it is. Best of luck and a speedy recovery.
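
A rough sketch of the check behind that two-step chore, assuming a Debian-style layout where kernels sit in /boot as vmlinuz-<version> and GNU sort (for -V) is available. The helper names newest_kernel and kernel_report are made up for illustration:

```shell
#!/bin/sh
# Sketch only: compare the running kernel with the newest one in /boot.
# newest_kernel and kernel_report are hypothetical helper names.

# Print the highest version among the vmlinuz-* paths given as arguments.
newest_kernel() {
    for f in "$@"; do
        basename "$f" | sed 's/^vmlinuz-//'
    done | sort -V | tail -n 1
}

# Report the running kernel and the newest installed one.
kernel_report() {
    running=$(uname -r)
    newest=$(newest_kernel /boot/vmlinuz-*)
    echo "running: $running"
    echo "newest installed: $newest"
    if [ "$running" != "$newest" ]; then
        echo "Newest kernel is not the one running; the active GRUB menu may be stale."
    fi
}

kernel_report
```

If the two differ, running sudo update-grub from the OS whose GRUB the firmware actually boots (Ubuntu, in Roger's case) should refresh the menu. On UEFI systems, efibootmgr lists the firmware boot order.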

On Fri, Jul 19, 2024 at 4:20 PM Roger Ramjet <2068...@bugs.launchpad.net>
wrote:

> Hello Erv,
>
> Just wanted you to know what I did and found out.
>
> I tried the update-grub command while I was in Linux Mint, but it did not
> have any effect.
> Then I remembered reading that somebody else had the same issue: no effect
> until they went into something else on their computer. It had 3-4 letters
> that I didn't understand. They said the update worked from there because
> they had loaded that software before their Mint software.
> Since I used to use Ubuntu and still have it on my computer from before I
> loaded Mint, I figured I'd boot into Ubuntu and run the update-grub
> command there. I then went back into Mint and it worked: I am now running
> the 116 kernel update. Good news, but my computer still shows that it cannot
> find that "boot" file. I'm not sure why, but I then load the "next volume"
> and can choose to run either Windows, Ubuntu or Mint, so I'm good.
>
> Thanks for your attempt to help Erv.
>
> Ralph Goe
>
>
>
> Sent with Proton Mail secure email.
>
> On Thursday, July 18th, 2024 at 9:58 PM, Erv Bendiks
> <2068...@bugs.launchpad.net> wrote:
>
> > > @RogerRamjet> Do you have motherboard, CPU and OS information to share?
> >
> >
> >
> > On Thu, Jul 18, 2024 at 6:45 PM Roger Ramjet 2068...@bugs.launchpad.net
> >
> > wrote:
> >
> > > > Why does the updated kernel show as installed but not active? I'm still
> > > > having a problem.
> > >
> > > On Thursday, July 18th, 2024 at 2:47 PM, Pete Orlando
> > > 2068...@bugs.launchpad.net wrote:
> > >
> > > > > OK, the black screen is now fixed after "Update Manager" sent a new
> > > > > update on 7/17/2024. Thank you, Linux team!
> > > >
> > > > > On Thu, Jul 18, 2024 at 3:15 AM Roger Ramjet
> > > > > 2068...@bugs.launchpad.net wrote:
> > > >
> > > > > > Unfortunately, I still have the same problem. After updating, I power
> > > > > > down and restart, and I get:
> > > > > error: file `/boot/' not found.
> > > > >
> > > > > > When I view the kernels in the Update Manager, it shows 5.15.0-116 as
> > > > > > "installed" and "supported until April 2027".
> > > > > >
> > > > > > The kernel is loaded and installed but "not found".
> > > > > > Then another window opens and I must choose "Boot from next volume".
> > > > > > Then another window opens where I'm given the choice of booting from
> > > > > > 5.15.0-107-generic (on /dev/sda5).
> > > > > > The new kernel is not listed in this window. I'm not sure why not; it
> > > > > > seems it should be.
> > > > > >
> > > > > > If I can give you more info, let me know.
> > > > >
> > > > > Ralph Goe
> > > > >
> > > > > On Monday, July 15th, 2024 at 11:38 PM, Ubuntu Kernel Bot
> > > > > 2068...@bugs.launchpad.net wrote:
> > > > >
> > > > > > > This bug is awaiting verification that the linux-azure/5.15.0-1069.78
> > > > > > > kernel in -proposed solves the problem. Please test the kernel and
> > > > > > > update this bug with the results. If the problem is solved, change the
> > > > > > > tag 'verification-needed-jammy-linux-azure' to
> > > > > > > 'verification-done-jammy-linux-azure'. If the problem still exists,
> > > > > > > change the tag 'verification-needed-jammy-linux-azure' to
> > > > > > > 'verification-failed-jammy-linux-azure'.
> > > > > >
> > > > > > > If verification is not done by 5 working days from today, this fix
> > > > > > > will be dropped from the source code, and this bug will be closed.
> > > > > > >
> > > > > > > See https://wiki.ubuntu.com/Testing/EnableProposed for documentation
> > > > > > > on how to enable and use -proposed. Thank you!
> > > > > >
> > > > > > ** Tags added: kernel-spammed-jammy-linux-azure-v2
> > > > > > verification-needed-jammy-linux-azure
> > > > > >
> > > > > > --
> > > > > > > You received this bug notification because you are subscribed to a
> > > > > > > duplicate bug report (2069485).
> > > > > > https://bugs.launchpad.net/bugs/2068738
> > > > > >
> > > > > > Title:
> > > > > > AMD GPUs fail with null pointer dereference when IOMMU enabled,
> > > > > > leading to black screen
> > > > > >
> > > > > > Status in linux package in Ubuntu:
> > > > > > Fix Released
> > > > > > Status in linux source package in Jammy:
> > > > > > Fix Released
> > > > > >
> > > > > > Bug description:
> > > > > > BugLink: https://bugs.launchpad.net/bugs/2068738
> > > > > >
> > > > > > [Impact]
> > > > > >
> > > > > > > On systems with AMD Picasso/Raven 2 GPU devices, when the IOMMU is
> > > > > > > enabled, the system fails to boot correctly, and all users see is a
> > > > > > > black screen.
> > > > > > >
> > > > > > > This is caused by a null pointer dereference when enabling the IOMMU
> > > > > > > after the device has been initialised. It should happen the other way
> > > > > > > around.
> > > > > >
> > > > > > AMD-Vi: AMD IOMMUv2 loaded and initialized
> > > > > > ...
> > > > > > amdgpu: Topology: Add APU node [0x15d8:0x1002]
> > > > > > kfd kfd: amdgpu: added device 1002:15d8
> > > > > > kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
> > > > > > ...
> > > > > > amdgpu 0000:06:00.0: amdgpu: amdgpu_device_ip_init failed
> > > > > > amdgpu 0000:06:00.0: amdgpu: Fatal error during GPU init
> > > > > > amdgpu 0000:06:00.0: amdgpu: amdgpu: finishing device.
> > > > > > ...
> > > > > > BUG: kernel NULL pointer dereference, address: 000000000000013c
> > > > > > ...
> > > > > > > CPU: 1 PID: 223 Comm: systemd-udevd Not tainted 5.15.0-112-generic #122-Ubuntu
> > > > > > ...
> > > > > > RIP: 0010:amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > > > > > ...
> > > > > > Call Trace:
> > > > > > <TASK>
> > > > > >
> > > > > > ? srso_return_thunk+0x5/0x10
> > > > > > ? show_trace_log_lvl+0x28e/0x2ea
> > > > > > ? show_trace_log_lvl+0x28e/0x2ea
> > > > > > ? dm_hw_fini+0x23/0x30 [amdgpu]
> > > > > > ? show_regs.part.0+0x23/0x29
> > > > > > ? __die_body.cold+0x8/0xd
> > > > > > ? __die+0x2b/0x37
> > > > > > ? page_fault_oops+0x13b/0x170
> > > > > > ? srso_return_thunk+0x5/0x10
> > > > > > ? do_user_addr_fault+0x321/0x670
> > > > > > ? srso_return_thunk+0x5/0x10
> > > > > > ? __free_pages_ok+0x34a/0x4f0
> > > > > > ? exc_page_fault+0x77/0x170
> > > > > > ? asm_exc_page_fault+0x27/0x30
> > > > > > ? amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > > > > > dm_hw_fini+0x23/0x30 [amdgpu]
> > > > > > amdgpu_device_ip_fini_early.isra.0+0x278/0x312 [amdgpu]
> > > > > > amdgpu_device_fini_hw+0x156/0x208 [amdgpu]
> > > > > > amdgpu_driver_unload_kms+0x69/0x90 [amdgpu]
> > > > > > amdgpu_driver_load_kms.cold+0x81/0x107 [amdgpu]
> > > > > > amdgpu_pci_probe+0x1d1/0x290 [amdgpu]
> > > > > > local_pci_probe+0x4b/0x90
> > > > > > ? srso_return_thunk+0x5/0x10
> > > > > > pci_device_probe+0x119/0x200
> > > > > > really_probe+0x222/0x420
> > > > > > __driver_probe_device+0xe8/0x140
> > > > > > driver_probe_device+0x23/0xc0
> > > > > > __driver_attach+0xf7/0x1f0
> > > > > > ? __device_attach_driver+0x140/0x140
> > > > > > bus_for_each_dev+0x7f/0xd0
> > > > > > driver_attach+0x1e/0x30
> > > > > > bus_add_driver+0x148/0x220
> > > > > > ? srso_return_thunk+0x5/0x10
> > > > > > driver_register+0x95/0x100
> > > > > > __pci_register_driver+0x68/0x70
> > > > > > amdgpu_init+0x7c/0x1000 [amdgpu]
> > > > > > ? 0xffffffffc0e0b000
> > > > > > do_one_initcall+0x49/0x1e0
> > > > > > ? srso_return_thunk+0x5/0x10
> > > > > > ? kmem_cache_alloc_trace+0x19e/0x2e0
> > > > > > do_init_module+0x52/0x260
> > > > > > load_module+0xb45/0xbe0
> > > > > > __do_sys_finit_module+0xbf/0x120
> > > > > > __x64_sys_finit_module+0x18/0x20
> > > > > > x64_sys_call+0x1ac3/0x1fa0
> > > > > > do_syscall_64+0x56/0xb0
> > > > > > ...
> > > > > > entry_SYSCALL_64_after_hwframe+0x67/0xd1
> > > > > >
> > > > > > > A workaround does exist. Users can add "nomodeset" or "amd_iommu=off"
> > > > > > > to GRUB_CMDLINE_LINUX_DEFAULT, run update-grub and reboot.
> > > > > >
> > > > > > [Fix]
> > > > > >
> > > > > > The regression was caused by the following commit that landed in
> > > > > > 5.15.0-112-generic, and 5.15.150 upstream:
> > > > > >
> > > > > > > commit 3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd ubuntu-jammy
> > > > > > > Author: Yifan Zhang yifan1.zh...@amd.com
> > > > > > > Date: Tue Sep 28 15:42:35 2021 +0800
> > > > > > > Subject: drm/amdgpu: init iommu after amdkfd device init
> > > > > > > Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/?id=3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd
> > > > > > >
> > > > > > > The fix is to revert this patch, as it was not supposed to be
> > > > > > > backported to 5.15 stable.
> > > > > > >
> > > > > > > The mailing list discussion with AMD developers is:
> > > > > > > https://lore.kernel.org/amd-gfx/20240523173031.4212-1-w_ar...@gmx.de/
> > > > > > >
> > > > > > > The fix hasn't been acknowledged by Greg KH or Sasha Levin yet, so
> > > > > > > sending as an Ubuntu SAUCE patch. If the upstream status changes, we
> > > > > > > can NAK and resend.
> > > > > >
> > > > > > [Testcase]
> > > > > >
> > > > > > > You need a system with an AMD Picasso/Raven 2 device. It will likely
> > > > > > > be an APU, and not a discrete graphics card, but any AMD Picasso/Raven
> > > > > > > 2 device is affected.
> > > > > > >
> > > > > > > Install the kernel and boot. Make sure full modesetting is enabled.
> > > > > > >
> > > > > > > There is a test kernel available in the ppa below:
> > > > > > >
> > > > > > > https://launchpad.net/~mruffell/+archive/ubuntu/lp2068738-test
> > > > > > >
> > > > > > > If you install the test kernel, your system should boot successfully.
> > > > > >
> > > > > > [Where problems could occur]
> > > > > >
> > > > > > > We are reverting a problematic patch and going back to how it was
> > > > > > > before 5.15.0-112-generic. This should not cause any issues for users.
> > > > > > >
> > > > > > > If a regression were to occur, users can add "nomodeset" or
> > > > > > > "amd_iommu=off" to GRUB_CMDLINE_LINUX_DEFAULT and reboot, or pin their
> > > > > > > kernel to a working one.
> > > > > > >
> > > > > > > The impact of a regression would be high, as users' displays could be
> > > > > > > blank.
> > > > > >
> > > > > > [Other Info]
> > > > > >
> > > > > > > User reports:
> > > > > > > https://forums.linuxmint.com/viewtopic.php?t=421484
> > > > > > > https://forums.linuxmint.com/viewtopic.php?t=421441
> > > > > > > https://www.reddit.com/r/Ubuntu/comments/1d9uviz/had_to_purge_kernel_5150112_could_not_boot/
> > > > > > > https://www.reddit.com/r/linuxmint/comments/1d9w6c9/kernel_5150112_boot_failure/
> > > > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068735
> > > > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068793
> > > > > > > https://bugs.launchpad.net/bugs/2068812
> > > > > >
> > > > > > > As bizarre as it is, this commit was actually originally included in
> > > > > > > 5.15-rc5:
> > > > > > >
> > > > > > > commit 714d9e4574d54596973ee3b0624ee4a16264d700
> > > > > > > Author: Yifan Zhang yifan1.zh...@amd.com
> > > > > > > Date: Tue Sep 28 15:42:35 2021 +0800
> > > > > > > Subject: drm/amdgpu: init iommu after amdkfd device init
> > > > > > > Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=714d9e4574d54596973ee3b0624ee4a16264d700
> > > > > > >
> > > > > > > It seems to have caused issues back then too, and was removed in the
> > > > > > > following fixups, in 5.16-rc1:
> > > > > > >
> > > > > > > commit 93cec184788b0cf3926bc1f7b47fed74ba87990c
> > > > > > > Author: James Zhu james....@amd.com
> > > > > > > Date: Tue Nov 2 21:33:50 2021 -0400
> > > > > > > Subject: drm/amdgpu: remove duplicated kfd_resume_iommu
> > > > > > > Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=93cec184788b0cf3926bc1f7b47fed74ba87990c
> > > > > > >
> > > > > > > commit 9f4f2c1a35248f56b2a9c1c004e0aaff3609b15d
> > > > > > > Author: shaoyunl shaoyun....@amd.com
> > > > > > > Date: Fri Nov 5 12:34:14 2021 -0400
> > > > > > > Subject: drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov
> > > > > > > Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f4f2c1a35248f56b2a9c1c004e0aaff3609b15d
> > > > > > >
> > > > > > > I'm not exactly in favor of rewriting history twice, so I think we
> > > > > > > should just revert the upstream stable patch and move on.
> > > > > > >
> > > > > > > To manage notifications about this bug go to:
> > > > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738/+subscriptions
> > > > > > >
> > > > > --
> > > > > You received this bug notification because you are subscribed to
> the
> > > > > bug
> > > > > report.
> > > > > https://bugs.launchpad.net/bugs/2068738
> > > > >
> > > > > Title:
> > > > > AMD GPUs fail with null pointer dereference when IOMMU enabled,
> > > > > leading to black screen
> > > > >
> > > > > Status in linux package in Ubuntu:
> > > > > Fix Released
> > > > > Status in linux source package in Jammy:
> > > > > Fix Released
> > > > >
> > > > > Bug description:
> > > > > BugLink: https://bugs.launchpad.net/bugs/2068738
> > > > >
> > > > > [Impact]
> > > > >
> > > > > On systems with AMD Picasso/Raven 2 GPU devices, when the IOMMU is
> > > > > enabled, the system fails to boot correctly, and all users see is a
> > > > > black screen.
> > > > >
> > > > > This is caused by a null pointer dereference when enabling the
> IOMMU
> > > > > after the device has been initialised. It should happen the other
> way
> > > > > around.
> > > > >
> > > > > AMD-Vi: AMD IOMMUv2 loaded and initialized
> > > > > ...
> > > > > amdgpu: Topology: Add APU node [0x15d8:0x1002]
> > > > > kfd kfd: amdgpu: added device 1002:15d8
> > > > > kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
> > > > > ...
> > > > > amdgpu 0000:06:00.0: amdgpu: amdgpu_device_ip_init failed
> > > > > amdgpu 0000:06:00.0: amdgpu: Fatal error during GPU init
> > > > > amdgpu 0000:06:00.0: amdgpu: amdgpu: finishing device.
> > > > > ...
> > > > > BUG: kernel NULL pointer dereference, address: 000000000000013c
> > > > > ...
> > > > > CPU: 1 PID: 223 Comm: systemd-udevd Not tainted 5.15.0-112-generic
> > > > > #122-Ubuntu
> > > > > ...
> > > > > RIP: 0010:amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > > > > ...
> > > > > Call Trace:
> > > > > <TASK>
> > > > > ? srso_return_thunk+0x5/0x10
> > > > > ? show_trace_log_lvl+0x28e/0x2ea
> > > > > ? show_trace_log_lvl+0x28e/0x2ea
> > > > > ? dm_hw_fini+0x23/0x30 [amdgpu]
> > > > > ? show_regs.part.0+0x23/0x29
> > > > > ? __die_body.cold+0x8/0xd
> > > > > ? __die+0x2b/0x37
> > > > > ? page_fault_oops+0x13b/0x170
> > > > > ? srso_return_thunk+0x5/0x10
> > > > > ? do_user_addr_fault+0x321/0x670
> > > > > ? srso_return_thunk+0x5/0x10
> > > > > ? __free_pages_ok+0x34a/0x4f0
> > > > > ? exc_page_fault+0x77/0x170
> > > > > ? asm_exc_page_fault+0x27/0x30
> > > > > ? amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > > > > dm_hw_fini+0x23/0x30 [amdgpu]
> > > > > amdgpu_device_ip_fini_early.isra.0+0x278/0x312 [amdgpu]
> > > > > amdgpu_device_fini_hw+0x156/0x208 [amdgpu]
> > > > > amdgpu_driver_unload_kms+0x69/0x90 [amdgpu]
> > > > > amdgpu_driver_load_kms.cold+0x81/0x107 [amdgpu]
> > > > > amdgpu_pci_probe+0x1d1/0x290 [amdgpu]
> > > > > local_pci_probe+0x4b/0x90
> > > > > ? srso_return_thunk+0x5/0x10
> > > > > pci_device_probe+0x119/0x200
> > > > > really_probe+0x222/0x420
> > > > > __driver_probe_device+0xe8/0x140
> > > > > driver_probe_device+0x23/0xc0
> > > > > __driver_attach+0xf7/0x1f0
> > > > > ? __device_attach_driver+0x140/0x140
> > > > > bus_for_each_dev+0x7f/0xd0
> > > > > driver_attach+0x1e/0x30
> > > > > bus_add_driver+0x148/0x220
> > > > > ? srso_return_thunk+0x5/0x10
> > > > > driver_register+0x95/0x100
> > > > > __pci_register_driver+0x68/0x70
> > > > > amdgpu_init+0x7c/0x1000 [amdgpu]
> > > > > ? 0xffffffffc0e0b000
> > > > > do_one_initcall+0x49/0x1e0
> > > > > ? srso_return_thunk+0x5/0x10
> > > > > ? kmem_cache_alloc_trace+0x19e/0x2e0
> > > > > do_init_module+0x52/0x260
> > > > > load_module+0xb45/0xbe0
> > > > > __do_sys_finit_module+0xbf/0x120
> > > > > __x64_sys_finit_module+0x18/0x20
> > > > > x64_sys_call+0x1ac3/0x1fa0
> > > > > do_syscall_64+0x56/0xb0
> > > > > ...
> > > > > entry_SYSCALL_64_after_hwframe+0x67/0xd1
> > > > >
> > > > > A workaround does exist. Users can set "nomodeset" or
> "amd_iommu=off"
> > > > > to GRUB_CMDLINE_LINUX_DEFAULT, update-grub and reboot.
> > > > >
> > > > > [Fix]
> > > > >
> > > > > The regression was caused by the following commit that landed in
> > > > > 5.15.0-112-generic, and 5.15.150 upstream:
> > > > >
> > > > > commit 3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd ubuntu-jammy
> > > > > Author: Yifan Zhang yifan1.zh...@amd.com
> > > > > Date: Tue Sep 28 15:42:35 2021 +0800
> > > > > Subject: drm/amdgpu: init iommu after amdkfd device init
> > > > > Link:
> > >
> > >
> https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/?id=3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd
> > >
> > > > > The fix is to revert this patch, as it was not suppose to be
> > > > > backported to 5.15 stable.
> > > > >
> > > > > The mailing list discussion with AMD developers is:
> > > > >
> > > > >
> https://lore.kernel.org/amd-gfx/20240523173031.4212-1-w_ar...@gmx.de/
> > > > >
> > > > > The fix hasn't been acknowledged by Greg KH or Sasha Levin yet, so
> > > > > sending as a Ubuntu SAUCE patch. If the upstream status changes, we
> > > > > can NAK and resend.
> > > > >
> > > > > [Testcase]
> > > > >
> > > > > You need a system with an AMD Picasso/Raven 2 device. It will
> likely
> > > > > be an APU, and not a discrete graphics card, but any AMD
> Picasso/Raven
> > > > > 2 device is affected.
> > > > >
> > > > > Install the kernel and boot. Make sure full modesetting is enabled.
> > > > >
> > > > > There is a test kernel available in the ppa below:
> > > > >
> > > > > https://launchpad.net/~mruffell/+archive/ubuntu/lp2068738-test
> > > > >
> > > > > If you install the test kernel, your system should boot
> successfully.
> > > > >
> > > > > [Where problems could occur]
> > > > >
> > > > > We are reverting a problematic patch and going back to how it was
> > > > > before 5.15.0-112-generic. This should not cause any issues for
> users.
> > > > >
> > > > > If a regression were to occur, users can set "nomodeset" or
> > > > > "amd_iommu=off" to GRUB_CMDLINE_LINUX_DEFAULT and reboot, or pin
> their
> > > > > kernel to a working one.
> > > > >
> > > > > The impact of a regression would be high, as users displays could
> be
> > > > > blank.
> > > > >
> > > > > [Other Info]
> > > > >
> > > > > User reports:
> > > > > https://forums.linuxmint.com/viewtopic.php?t=421484
> > > > > https://forums.linuxmint.com/viewtopic.php?t=421441
> > >
> > >
> https://www.reddit.com/r/Ubuntu/comments/1d9uviz/had_to_purge_kernel_5150112_could_not_boot/
> > >
> > >
> https://www.reddit.com/r/linuxmint/comments/1d9w6c9/kernel_5150112_boot_failure/
> > >
> > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068735
> > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068793
> > > > > https://bugs.launchpad.net/bugs/2068812
> > > > >
> > > > > As bizarre as it is, this commit was actually originally included
> in
> > > > > 5.15-rc5:
> > > > >
> > > > > commit 714d9e4574d54596973ee3b0624ee4a16264d700
> > > > > Author: Yifan Zhang yifan1.zh...@amd.com
> > > > > Date: Tue Sep 28 15:42:35 2021 +0800
> > > > > Subject: drm/amdgpu: init iommu after amdkfd device init
> > > > > Link:
> > >
> > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=714d9e4574d54596973ee3b0624ee4a16264d700
> > >
> > > > > It seems to have caused issues back then too, and was removed in
> the
> > > > > following fixups, in 5.16-rc1:
> > > > >
> > > > > commit 93cec184788b0cf3926bc1f7b47fed74ba87990c
> > > > > Author: James Zhu james....@amd.com
> > > > > Date: Tue Nov 2 21:33:50 2021 -0400
> > > > > Subject: drm/amdgpu: remove duplicated kfd_resume_iommu
> > > > > Link:
> > >
> > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=93cec184788b0cf3926bc1f7b47fed74ba87990c
> > >
> > > > > commit 9f4f2c1a35248f56b2a9c1c004e0aaff3609b15d
> > > > > Author: shaoyunl shaoyun....@amd.com
> > > > > Date: Fri Nov 5 12:34:14 2021 -0400
> > > > > Subject: drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov
> > > > > Link:
> > >
> > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f4f2c1a35248f56b2a9c1c004e0aaff3609b15d
> > >
> > > > > I'm not exactly in favor of rewriting history twice, so I think we
> > > > > should just revert the upstream stable patch and move on.
> > > > >
> > > > > To manage notifications about this bug go to:
> > >
> > >
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738/+subscriptions
> > >
> > > > --
> > > > You received this bug notification because you are subscribed to a
> > > > duplicate bug report (2069485).
> > > > https://bugs.launchpad.net/bugs/2068738
> > > >
> > > > Title:
> > > > AMD GPUs fail with null pointer dereference when IOMMU enabled,
> > > > leading to black screen
> > > >
> > > > Status in linux package in Ubuntu:
> > > > Fix Released
> > > > Status in linux source package in Jammy:
> > > > Fix Released
> > > >
> > > > Bug description:
> > > > BugLink: https://bugs.launchpad.net/bugs/2068738
> > > >
> > > > [Impact]
> > > >
> > > > On systems with AMD Picasso/Raven 2 GPU devices, when the IOMMU is
> > > > enabled, the system fails to boot correctly, and all users see is a
> > > > black screen.
> > > >
> > > > This is caused by a null pointer dereference when enabling the IOMMU
> > > > after the device has been initialised. It should happen the other way
> > > > around.
> > > >
> > > > AMD-Vi: AMD IOMMUv2 loaded and initialized
> > > > ...
> > > > amdgpu: Topology: Add APU node [0x15d8:0x1002]
> > > > kfd kfd: amdgpu: added device 1002:15d8
> > > > kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
> > > > ...
> > > > amdgpu 0000:06:00.0: amdgpu: amdgpu_device_ip_init failed
> > > > amdgpu 0000:06:00.0: amdgpu: Fatal error during GPU init
> > > > amdgpu 0000:06:00.0: amdgpu: amdgpu: finishing device.
> > > > ...
> > > > BUG: kernel NULL pointer dereference, address: 000000000000013c
> > > > ...
> > > > CPU: 1 PID: 223 Comm: systemd-udevd Not tainted 5.15.0-112-generic
> > > > #122-Ubuntu
> > > > ...
> > > > RIP: 0010:amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > > > ...
> > > > Call Trace:
> > > > <TASK>
> > > >
> > > > ? srso_return_thunk+0x5/0x10
> > > > ? show_trace_log_lvl+0x28e/0x2ea
> > > > ? show_trace_log_lvl+0x28e/0x2ea
> > > > ? dm_hw_fini+0x23/0x30 [amdgpu]
> > > > ? show_regs.part.0+0x23/0x29
> > > > ? __die_body.cold+0x8/0xd
> > > > ? __die+0x2b/0x37
> > > > ? page_fault_oops+0x13b/0x170
> > > > ? srso_return_thunk+0x5/0x10
> > > > ? do_user_addr_fault+0x321/0x670
> > > > ? srso_return_thunk+0x5/0x10
> > > > ? __free_pages_ok+0x34a/0x4f0
> > > > ? exc_page_fault+0x77/0x170
> > > > ? asm_exc_page_fault+0x27/0x30
> > > > ? amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > > > dm_hw_fini+0x23/0x30 [amdgpu]
> > > > amdgpu_device_ip_fini_early.isra.0+0x278/0x312 [amdgpu]
> > > > amdgpu_device_fini_hw+0x156/0x208 [amdgpu]
> > > > amdgpu_driver_unload_kms+0x69/0x90 [amdgpu]
> > > > amdgpu_driver_load_kms.cold+0x81/0x107 [amdgpu]
> > > > amdgpu_pci_probe+0x1d1/0x290 [amdgpu]
> > > > local_pci_probe+0x4b/0x90
> > > > ? srso_return_thunk+0x5/0x10
> > > > pci_device_probe+0x119/0x200
> > > > really_probe+0x222/0x420
> > > > __driver_probe_device+0xe8/0x140
> > > > driver_probe_device+0x23/0xc0
> > > > __driver_attach+0xf7/0x1f0
> > > > ? __device_attach_driver+0x140/0x140
> > > > bus_for_each_dev+0x7f/0xd0
> > > > driver_attach+0x1e/0x30
> > > > bus_add_driver+0x148/0x220
> > > > ? srso_return_thunk+0x5/0x10
> > > > driver_register+0x95/0x100
> > > > __pci_register_driver+0x68/0x70
> > > > amdgpu_init+0x7c/0x1000 [amdgpu]
> > > > ? 0xffffffffc0e0b000
> > > > do_one_initcall+0x49/0x1e0
> > > > ? srso_return_thunk+0x5/0x10
> > > > ? kmem_cache_alloc_trace+0x19e/0x2e0
> > > > do_init_module+0x52/0x260
> > > > load_module+0xb45/0xbe0
> > > > __do_sys_finit_module+0xbf/0x120
> > > > __x64_sys_finit_module+0x18/0x20
> > > > x64_sys_call+0x1ac3/0x1fa0
> > > > do_syscall_64+0x56/0xb0
> > > > ...
> > > > entry_SYSCALL_64_after_hwframe+0x67/0xd1
> > > >
> > > > A workaround does exist. Users can set "nomodeset" or "amd_iommu=off"
> > > > to GRUB_CMDLINE_LINUX_DEFAULT, update-grub and reboot.
> > > >
> > > > [Fix]
> > > >
> > > > The regression was caused by the following commit that landed in
> > > > 5.15.0-112-generic, and 5.15.150 upstream:
> > > >
> > > > commit 3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd ubuntu-jammy
> > > > Author: Yifan Zhang yifan1.zh...@amd.com
> > > >
> > > > Date: Tue Sep 28 15:42:35 2021 +0800
> > > > Subject: drm/amdgpu: init iommu after amdkfd device init
> > > > Link:
> > > >
> https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/?id=3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd
> > > >
> > > > The fix is to revert this patch, as it was not suppose to be
> > > > backported to 5.15 stable.
> > > >
> > > > The mailing list discussion with AMD developers is:
> > > >
> > > >
> https://lore.kernel.org/amd-gfx/20240523173031.4212-1-w_ar...@gmx.de/
> > > >
> > > > The fix hasn't been acknowledged by Greg KH or Sasha Levin yet, so
> > > > sending as a Ubuntu SAUCE patch. If the upstream status changes, we
> > > > can NAK and resend.
> > > >
> > > > [Testcase]
> > > >
> > > > You need a system with an AMD Picasso/Raven 2 device. It will likely
> > > > be an APU, and not a discrete graphics card, but any AMD
> Picasso/Raven
> > > > 2 device is affected.
> > > >
> > > > Install the kernel and boot. Make sure full modesetting is enabled.
> > > >
> > > > There is a test kernel available in the ppa below:
> > > >
> > > > https://launchpad.net/~mruffell/+archive/ubuntu/lp2068738-test
> > > >
> > > > If you install the test kernel, your system should boot successfully.
> > > >
> > > > [Where problems could occur]
> > > >
> > > > We are reverting a problematic patch and going back to how it was
> > > > before 5.15.0-112-generic. This should not cause any issues for
> users.
> > > >
> > > > If a regression were to occur, users can set "nomodeset" or
> > > > "amd_iommu=off" to GRUB_CMDLINE_LINUX_DEFAULT and reboot, or pin
> their
> > > > kernel to a working one.
> > > >
> > > > The impact of a regression would be high, as users displays could be
> > > > blank.
> > > >
> > > > [Other Info]
> > > >
> > > > User reports:
> > > > https://forums.linuxmint.com/viewtopic.php?t=421484
> > > > https://forums.linuxmint.com/viewtopic.php?t=421441
> > >
> > >
> https://www.reddit.com/r/Ubuntu/comments/1d9uviz/had_to_purge_kernel_5150112_could_not_boot/
> > >
> > >
> https://www.reddit.com/r/linuxmint/comments/1d9w6c9/kernel_5150112_boot_failure/
> > >
> > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068735
> > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068793
> > > > https://bugs.launchpad.net/bugs/2068812
> > > >
> > > > As bizarre as it is, this commit was actually originally included in
> > > > 5.15-rc5:
> > > >
> > > > commit 714d9e4574d54596973ee3b0624ee4a16264d700
> > > > Author: Yifan Zhang yifan1.zh...@amd.com
> > > >
> > > > Date: Tue Sep 28 15:42:35 2021 +0800
> > > > Subject: drm/amdgpu: init iommu after amdkfd device init
> > > > Link:
> > > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=714d9e4574d54596973ee3b0624ee4a16264d700
> > > >
> > > > It seems to have caused issues back then too, and was removed in the
> > > > following fixups, in 5.16-rc1:
> > > >
> > > > commit 93cec184788b0cf3926bc1f7b47fed74ba87990c
> > > > Author: James Zhu james....@amd.com
> > > >
> > > > Date: Tue Nov 2 21:33:50 2021 -0400
> > > > Subject: drm/amdgpu: remove duplicated kfd_resume_iommu
> > > > Link:
> > > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=93cec184788b0cf3926bc1f7b47fed74ba87990c
> > > >
> > > > commit 9f4f2c1a35248f56b2a9c1c004e0aaff3609b15d
> > > > Author: shaoyunl shaoyun....@amd.com
> > > >
> > > > Date: Fri Nov 5 12:34:14 2021 -0400
> > > > Subject: drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov
> > > > Link:
> > > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f4f2c1a35248f56b2a9c1c004e0aaff3609b15d
> > > >
> > > > I'm not exactly in favor of rewriting history twice, so I think we
> > > > should just revert the upstream stable patch and move on.
> > > >
> > > > To manage notifications about this bug go to:
> > >
> > >
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738/+subscriptions
> > >
> > > --
> > > You received this bug notification because you are subscribed to the
> bug
> > > report.
> > > https://bugs.launchpad.net/bugs/2068738
> > >
> > > Title:
> > > AMD GPUs fail with null pointer dereference when IOMMU enabled,
> > > leading to black screen
> > >
> > > Status in linux package in Ubuntu:
> > > Fix Released
> > > Status in linux source package in Jammy:
> > > Fix Released
> > >
> > > Bug description:
> > > BugLink: https://bugs.launchpad.net/bugs/2068738
> > >
> > > [Impact]
> > >
> > > On systems with AMD Picasso/Raven 2 GPU devices, when the IOMMU is
> > > enabled, the system fails to boot correctly, and all users see is a
> > > black screen.
> > >
> > > This is caused by a null pointer dereference when enabling the IOMMU
> > > after the device has been initialised. It should happen the other way
> > > around.
> > >
> > > AMD-Vi: AMD IOMMUv2 loaded and initialized
> > > ...
> > > amdgpu: Topology: Add APU node [0x15d8:0x1002]
> > > kfd kfd: amdgpu: added device 1002:15d8
> > > kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
> > > ...
> > > amdgpu 0000:06:00.0: amdgpu: amdgpu_device_ip_init failed
> > > amdgpu 0000:06:00.0: amdgpu: Fatal error during GPU init
> > > amdgpu 0000:06:00.0: amdgpu: amdgpu: finishing device.
> > > ...
> > > BUG: kernel NULL pointer dereference, address: 000000000000013c
> > > ...
> > > CPU: 1 PID: 223 Comm: systemd-udevd Not tainted 5.15.0-112-generic
> > > #122-Ubuntu
> > > ...
> > > RIP: 0010:amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > > ...
> > > Call Trace:
> > > <TASK>
> > > ? srso_return_thunk+0x5/0x10
> > > ? show_trace_log_lvl+0x28e/0x2ea
> > > ? show_trace_log_lvl+0x28e/0x2ea
> > > ? dm_hw_fini+0x23/0x30 [amdgpu]
> > > ? show_regs.part.0+0x23/0x29
> > > ? __die_body.cold+0x8/0xd
> > > ? __die+0x2b/0x37
> > > ? page_fault_oops+0x13b/0x170
> > > ? srso_return_thunk+0x5/0x10
> > > ? do_user_addr_fault+0x321/0x670
> > > ? srso_return_thunk+0x5/0x10
> > > ? __free_pages_ok+0x34a/0x4f0
> > > ? exc_page_fault+0x77/0x170
> > > ? asm_exc_page_fault+0x27/0x30
> > > ? amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > > dm_hw_fini+0x23/0x30 [amdgpu]
> > > amdgpu_device_ip_fini_early.isra.0+0x278/0x312 [amdgpu]
> > > amdgpu_device_fini_hw+0x156/0x208 [amdgpu]
> > > amdgpu_driver_unload_kms+0x69/0x90 [amdgpu]
> > > amdgpu_driver_load_kms.cold+0x81/0x107 [amdgpu]
> > > amdgpu_pci_probe+0x1d1/0x290 [amdgpu]
> > > local_pci_probe+0x4b/0x90
> > > ? srso_return_thunk+0x5/0x10
> > > pci_device_probe+0x119/0x200
> > > really_probe+0x222/0x420
> > > __driver_probe_device+0xe8/0x140
> > > driver_probe_device+0x23/0xc0
> > > __driver_attach+0xf7/0x1f0
> > > ? __device_attach_driver+0x140/0x140
> > > bus_for_each_dev+0x7f/0xd0
> > > driver_attach+0x1e/0x30
> > > bus_add_driver+0x148/0x220
> > > ? srso_return_thunk+0x5/0x10
> > > driver_register+0x95/0x100
> > > __pci_register_driver+0x68/0x70
> > > amdgpu_init+0x7c/0x1000 [amdgpu]
> > > ? 0xffffffffc0e0b000
> > > do_one_initcall+0x49/0x1e0
> > > ? srso_return_thunk+0x5/0x10
> > > ? kmem_cache_alloc_trace+0x19e/0x2e0
> > > do_init_module+0x52/0x260
> > > load_module+0xb45/0xbe0
> > > __do_sys_finit_module+0xbf/0x120
> > > __x64_sys_finit_module+0x18/0x20
> > > x64_sys_call+0x1ac3/0x1fa0
> > > do_syscall_64+0x56/0xb0
> > > ...
> > > entry_SYSCALL_64_after_hwframe+0x67/0xd1
> > >
> > > A workaround does exist. Users can add "nomodeset" or "amd_iommu=off"
> > > to GRUB_CMDLINE_LINUX_DEFAULT, run update-grub, and reboot.
> > >
> > > [Fix]
> > >
> > > The regression was caused by the following commit that landed in
> > > 5.15.0-112-generic, and 5.15.150 upstream:
> > >
> > > commit 3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd ubuntu-jammy
> > > Author: Yifan Zhang <yifan1.zh...@amd.com>
> > > Date: Tue Sep 28 15:42:35 2021 +0800
> > > Subject: drm/amdgpu: init iommu after amdkfd device init
> > > Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/?id=3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd
> > >
> > > The fix is to revert this patch, as it was not supposed to be
> > > backported to 5.15 stable.
> > >
> > > The mailing list discussion with AMD developers is:
> > >
> > > https://lore.kernel.org/amd-gfx/20240523173031.4212-1-w_ar...@gmx.de/
> > >
> > > The fix hasn't been acknowledged by Greg KH or Sasha Levin yet, so I am
> > > sending it as an Ubuntu SAUCE patch. If the upstream status changes, we
> > > can NAK and resend.
> > >
> > > [Testcase]
> > >
> > > You need a system with an AMD Picasso/Raven 2 device. It will likely
> > > be an APU, and not a discrete graphics card, but any AMD Picasso/Raven
> > > 2 device is affected.
> > >
> > > Install the kernel and boot. Make sure full modesetting is enabled.
> > >
> > > There is a test kernel available in the ppa below:
> > >
> > > https://launchpad.net/~mruffell/+archive/ubuntu/lp2068738-test
> > >
> > > If you install the test kernel, your system should boot successfully.
> > >
> > > [Where problems could occur]
> > >
> > > We are reverting a problematic patch and going back to how it was
> > > before 5.15.0-112-generic. This should not cause any issues for users.
> > >
> > > If a regression were to occur, users can add "nomodeset" or
> > > "amd_iommu=off" to GRUB_CMDLINE_LINUX_DEFAULT and reboot, or pin their
> > > kernel to a working one.
> > >
> > > The impact of a regression would be high, as users' displays could be
> > > blank.
> > >
> > > [Other Info]
> > >
> > > User reports:
> > > https://forums.linuxmint.com/viewtopic.php?t=421484
> > > https://forums.linuxmint.com/viewtopic.php?t=421441
> > >
> > > https://www.reddit.com/r/Ubuntu/comments/1d9uviz/had_to_purge_kernel_5150112_could_not_boot/
> > > https://www.reddit.com/r/linuxmint/comments/1d9w6c9/kernel_5150112_boot_failure/
> > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068735
> > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068793
> > > https://bugs.launchpad.net/bugs/2068812
> > >
> > > As bizarre as it is, this commit was actually originally included in
> > > 5.15-rc5:
> > >
> > > commit 714d9e4574d54596973ee3b0624ee4a16264d700
> > > Author: Yifan Zhang <yifan1.zh...@amd.com>
> > > Date: Tue Sep 28 15:42:35 2021 +0800
> > > Subject: drm/amdgpu: init iommu after amdkfd device init
> > > Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=714d9e4574d54596973ee3b0624ee4a16264d700
> > >
> > > It seems to have caused issues back then too, and was removed in the
> > > following fixups, in 5.16-rc1:
> > >
> > > commit 93cec184788b0cf3926bc1f7b47fed74ba87990c
> > > Author: James Zhu <james....@amd.com>
> > > Date: Tue Nov 2 21:33:50 2021 -0400
> > > Subject: drm/amdgpu: remove duplicated kfd_resume_iommu
> > > Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=93cec184788b0cf3926bc1f7b47fed74ba87990c
> > >
> > > commit 9f4f2c1a35248f56b2a9c1c004e0aaff3609b15d
> > > Author: shaoyunl <shaoyun....@amd.com>
> > > Date: Fri Nov 5 12:34:14 2021 -0400
> > > Subject: drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov
> > > Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f4f2c1a35248f56b2a9c1c004e0aaff3609b15d
> > >
> > > I'm not exactly in favor of rewriting history twice, so I think we
> > > should just revert the upstream stable patch and move on.
> > >
> > > To manage notifications about this bug go to:
> > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738/+subscriptions
> >
> >
> > --
> > Erv Bendiks
> >
> > 416-816-9802
> >
> > --
> > You received this bug notification because you are subscribed to a
> > duplicate bug report (2069485).
> > https://bugs.launchpad.net/bugs/2068738
> >
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/2068738
>
>
>
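For anyone applying the workaround from the quoted bug description, here is a
minimal sketch of the edit it describes: appending a kernel parameter to the
GRUB_CMDLINE_LINUX_DEFAULT line. The function name add_kernel_param and the
sample text are made up for illustration, and the sketch assumes the value is
written in double quotes on one line, as in a stock /etc/default/grub. It only
transforms text; it does not touch any file.

```python
import re


def add_kernel_param(grub_text: str, param: str) -> str:
    """Return grub_text with param appended to GRUB_CMDLINE_LINUX_DEFAULT.

    Hypothetical helper: assumes a single-line, double-quoted value.
    The parameter is not added twice if it is already present.
    """
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"[ \t]*$', re.MULTILINE
    )

    def _patch(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return f'{match.group(1)}"{" ".join(params)}"'

    new_text, count = pattern.subn(_patch, grub_text)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT line not found")
    return new_text


# Sample input standing in for the contents of /etc/default/grub.
sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_param(sample, "amd_iommu=off"))
```

After saving the changed line back to /etc/default/grub (keep a backup copy
first), run "sudo update-grub" and reboot, as the bug description says.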

-- 
Erv Bendiks

416-816-9802

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2068738

Title:
  AMD GPUs fail with null pointer dereference when IOMMU enabled,
  leading to black screen

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
