@RogerRamjet: Do you have motherboard, CPU, and OS information to share?
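
If it is easier, a small script along these lines should print most of it. This is only a sketch: it assumes a Linux system that exposes DMI data under /sys/class/dmi/id and has an /etc/os-release file, and the function names are made up for illustration.

```shell
#!/bin/sh
# Print motherboard, CPU, OS and kernel details for pasting into a bug report.

read_or_unknown() {
  # Print the first line of file $1, or "unknown" if it cannot be read.
  if [ -r "$1" ]; then head -n 1 "$1"; else echo "unknown"; fi
}

collect_info() {
  echo "Board vendor: $(read_or_unknown /sys/class/dmi/id/board_vendor)"
  echo "Board name:   $(read_or_unknown /sys/class/dmi/id/board_name)"

  # First "model name" entry from /proc/cpuinfo (absent on some architectures).
  cpu=$(grep -m 1 'model name' /proc/cpuinfo 2>/dev/null | cut -d: -f2- | sed 's/^ *//')
  echo "CPU:          ${cpu:-unknown}"

  # PRETTY_NAME from os-release, read in a subshell so nothing leaks out.
  os=""
  if [ -r /etc/os-release ]; then
    os=$(. /etc/os-release; echo "$PRETTY_NAME")
  fi
  echo "OS:           ${os:-unknown}"

  # The kernel that is actually running, which may differ from the newest installed one.
  echo "Kernel:       $(uname -r)"
}

collect_info
```

The "Kernel" line is the one currently booted, so it also shows whether the updated kernel is really in use.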

On Thu, Jul 18, 2024 at 6:45 PM Roger Ramjet <2068...@bugs.launchpad.net>
wrote:

> Why does the updated kernel show as installed but not active? I'm still
> having the problem.
>
>
> On Thursday, July 18th, 2024 at 2:47 PM, Pete Orlando
> <2068...@bugs.launchpad.net> wrote:
>
> > OK, the black screen is now fixed after "Update Manager" delivered a new
> > update on 7/17/2024. Thank you, Linux team!
> >
> >
> > On Thu, Jul 18, 2024 at 3:15 AM Roger Ramjet <2068...@bugs.launchpad.net>
> > wrote:
> >
> > > Unfortunately, I still have the same problem. After updating, I power
> > > down and restart, and I get:
> > > error: file `/boot/' not found.
> > >
> > > When I view the kernels in the Update Manager, it shows 5.15.0-116 as
> > > "installed" and "supported until April 2027".
> > >
> > > The kernel is loaded and installed but "not found".
> > > Then another window opens and I must choose "Boot from next volume",
> > > then another window where I'm given the choice of booting from
> > > 5.15.0-107-generic (on /dev/sda5).
> > > The new kernel is not listed in this window. I'm not sure why not; it
> > > seems it should be.
> > >
> > > If I can give you more info, let me know.
> > >
> > > Ralph Goe
> > >
> > > On Monday, July 15th, 2024 at 11:38 PM, Ubuntu Kernel Bot
> > > 2068...@bugs.launchpad.net wrote:
> > >
> > > > > This bug is awaiting verification that the linux-azure/5.15.0-1069.78
> > > > > kernel in -proposed solves the problem. Please test the kernel and
> > > > > update this bug with the results. If the problem is solved, change the
> > > > > tag 'verification-needed-jammy-linux-azure' to
> > > > > 'verification-done-jammy-linux-azure'. If the problem still exists,
> > > > > change the tag 'verification-needed-jammy-linux-azure' to
> > > > > 'verification-failed-jammy-linux-azure'.
> > > > >
> > > > > If verification is not done by 5 working days from today, this fix will
> > > > > be dropped from the source code, and this bug will be closed.
> > > > >
> > > > > See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
> > > > > how to enable and use -proposed. Thank you!
> > > >
> > > > ** Tags added: kernel-spammed-jammy-linux-azure-v2
> > > > verification-needed-jammy-linux-azure
> > > >
> > > > --
> > > > You received this bug notification because you are subscribed to a
> > > > duplicate bug report (2069485).
> > > > https://bugs.launchpad.net/bugs/2068738
> > > >
> > > > Title:
> > > > AMD GPUs fail with null pointer dereference when IOMMU enabled,
> > > > leading to black screen
> > > >
> > > > Status in linux package in Ubuntu:
> > > > Fix Released
> > > > Status in linux source package in Jammy:
> > > > Fix Released
> > > >
> > > > Bug description:
> > > > BugLink: https://bugs.launchpad.net/bugs/2068738
> > > >
> > > > [Impact]
> > > >
> > > > On systems with AMD Picasso/Raven 2 GPU devices, when the IOMMU is
> > > > enabled, the system fails to boot correctly, and all users see is a
> > > > black screen.
> > > >
> > > > This is caused by a null pointer dereference when enabling the IOMMU
> > > > after the device has been initialised. It should happen the other way
> > > > around.
> > > >
> > > > AMD-Vi: AMD IOMMUv2 loaded and initialized
> > > > ...
> > > > amdgpu: Topology: Add APU node [0x15d8:0x1002]
> > > > kfd kfd: amdgpu: added device 1002:15d8
> > > > kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
> > > > ...
> > > > amdgpu 0000:06:00.0: amdgpu: amdgpu_device_ip_init failed
> > > > amdgpu 0000:06:00.0: amdgpu: Fatal error during GPU init
> > > > amdgpu 0000:06:00.0: amdgpu: amdgpu: finishing device.
> > > > ...
> > > > BUG: kernel NULL pointer dereference, address: 000000000000013c
> > > > ...
> > > > > CPU: 1 PID: 223 Comm: systemd-udevd Not tainted 5.15.0-112-generic #122-Ubuntu
> > > > ...
> > > > RIP: 0010:amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > > > ...
> > > > Call Trace:
> > > > <TASK>
> > > >
> > > > ? srso_return_thunk+0x5/0x10
> > > > ? show_trace_log_lvl+0x28e/0x2ea
> > > > ? show_trace_log_lvl+0x28e/0x2ea
> > > > ? dm_hw_fini+0x23/0x30 [amdgpu]
> > > > ? show_regs.part.0+0x23/0x29
> > > > ? __die_body.cold+0x8/0xd
> > > > ? __die+0x2b/0x37
> > > > ? page_fault_oops+0x13b/0x170
> > > > ? srso_return_thunk+0x5/0x10
> > > > ? do_user_addr_fault+0x321/0x670
> > > > ? srso_return_thunk+0x5/0x10
> > > > ? __free_pages_ok+0x34a/0x4f0
> > > > ? exc_page_fault+0x77/0x170
> > > > ? asm_exc_page_fault+0x27/0x30
> > > > ? amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > > > dm_hw_fini+0x23/0x30 [amdgpu]
> > > > amdgpu_device_ip_fini_early.isra.0+0x278/0x312 [amdgpu]
> > > > amdgpu_device_fini_hw+0x156/0x208 [amdgpu]
> > > > amdgpu_driver_unload_kms+0x69/0x90 [amdgpu]
> > > > amdgpu_driver_load_kms.cold+0x81/0x107 [amdgpu]
> > > > amdgpu_pci_probe+0x1d1/0x290 [amdgpu]
> > > > local_pci_probe+0x4b/0x90
> > > > ? srso_return_thunk+0x5/0x10
> > > > pci_device_probe+0x119/0x200
> > > > really_probe+0x222/0x420
> > > > __driver_probe_device+0xe8/0x140
> > > > driver_probe_device+0x23/0xc0
> > > > __driver_attach+0xf7/0x1f0
> > > > ? __device_attach_driver+0x140/0x140
> > > > bus_for_each_dev+0x7f/0xd0
> > > > driver_attach+0x1e/0x30
> > > > bus_add_driver+0x148/0x220
> > > > ? srso_return_thunk+0x5/0x10
> > > > driver_register+0x95/0x100
> > > > __pci_register_driver+0x68/0x70
> > > > amdgpu_init+0x7c/0x1000 [amdgpu]
> > > > ? 0xffffffffc0e0b000
> > > > do_one_initcall+0x49/0x1e0
> > > > ? srso_return_thunk+0x5/0x10
> > > > ? kmem_cache_alloc_trace+0x19e/0x2e0
> > > > do_init_module+0x52/0x260
> > > > load_module+0xb45/0xbe0
> > > > __do_sys_finit_module+0xbf/0x120
> > > > __x64_sys_finit_module+0x18/0x20
> > > > x64_sys_call+0x1ac3/0x1fa0
> > > > do_syscall_64+0x56/0xb0
> > > > ...
> > > > entry_SYSCALL_64_after_hwframe+0x67/0xd1
> > > >
> > > > > A workaround does exist. Users can add "nomodeset" or "amd_iommu=off"
> > > > > to GRUB_CMDLINE_LINUX_DEFAULT, run update-grub, and reboot.
> > > >
> > > > [Fix]
> > > >
> > > > The regression was caused by the following commit that landed in
> > > > 5.15.0-112-generic, and 5.15.150 upstream:
> > > >
> > > > > commit 3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd ubuntu-jammy
> > > > > Author: Yifan Zhang <yifan1.zh...@amd.com>
> > > > > Date: Tue Sep 28 15:42:35 2021 +0800
> > > > > Subject: drm/amdgpu: init iommu after amdkfd device init
> > > > > Link:
> > > > > https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/?id=3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd
> > > >
> > > > > The fix is to revert this patch, as it was not supposed to be
> > > > > backported to 5.15 stable.
> > > >
> > > > > The mailing list discussion with AMD developers is:
> > > > >
> > > > > https://lore.kernel.org/amd-gfx/20240523173031.4212-1-w_ar...@gmx.de/
> > > >
> > > > > The fix hasn't been acknowledged by Greg KH or Sasha Levin yet, so
> > > > > this is being sent as an Ubuntu SAUCE patch. If the upstream status
> > > > > changes, we can NAK and resend.
> > > >
> > > > [Testcase]
> > > >
> > > > > You need a system with an AMD Picasso/Raven 2 device. It will likely
> > > > > be an APU, and not a discrete graphics card, but any AMD Picasso/Raven
> > > > > 2 device is affected.
> > > >
> > > > Install the kernel and boot. Make sure full modesetting is enabled.
> > > >
> > > > There is a test kernel available in the ppa below:
> > > >
> > > > https://launchpad.net/~mruffell/+archive/ubuntu/lp2068738-test
> > > >
> > > > If you install the test kernel, your system should boot successfully.
> > > >
> > > > [Where problems could occur]
> > > >
> > > > > We are reverting a problematic patch and going back to how it was
> > > > > before 5.15.0-112-generic. This should not cause any issues for users.
> > > >
> > > > > If a regression were to occur, users can add "nomodeset" or
> > > > > "amd_iommu=off" to GRUB_CMDLINE_LINUX_DEFAULT and reboot, or pin their
> > > > > kernel to a working one.
> > > >
> > > > > The impact of a regression would be high, as users' displays could be
> > > > > blank.
> > > >
> > > > [Other Info]
> > > >
> > > > > User reports:
> > > > > https://forums.linuxmint.com/viewtopic.php?t=421484
> > > > > https://forums.linuxmint.com/viewtopic.php?t=421441
> > > > > https://www.reddit.com/r/Ubuntu/comments/1d9uviz/had_to_purge_kernel_5150112_could_not_boot/
> > > > > https://www.reddit.com/r/linuxmint/comments/1d9w6c9/kernel_5150112_boot_failure/
> > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068735
> > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068793
> > > > > https://bugs.launchpad.net/bugs/2068812
> > > >
> > > > As bizarre as it is, this commit was actually originally included in
> > > > 5.15-rc5:
> > > >
> > > > > commit 714d9e4574d54596973ee3b0624ee4a16264d700
> > > > > Author: Yifan Zhang <yifan1.zh...@amd.com>
> > > > > Date: Tue Sep 28 15:42:35 2021 +0800
> > > > > Subject: drm/amdgpu: init iommu after amdkfd device init
> > > > > Link:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=714d9e4574d54596973ee3b0624ee4a16264d700
> > > >
> > > > It seems to have caused issues back then too, and was removed in the
> > > > following fixups, in 5.16-rc1:
> > > >
> > > > > commit 93cec184788b0cf3926bc1f7b47fed74ba87990c
> > > > > Author: James Zhu <james....@amd.com>
> > > > > Date: Tue Nov 2 21:33:50 2021 -0400
> > > > > Subject: drm/amdgpu: remove duplicated kfd_resume_iommu
> > > > > Link:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=93cec184788b0cf3926bc1f7b47fed74ba87990c
> > > >
> > > > > commit 9f4f2c1a35248f56b2a9c1c004e0aaff3609b15d
> > > > > Author: shaoyunl <shaoyun....@amd.com>
> > > > > Date: Fri Nov 5 12:34:14 2021 -0400
> > > > > Subject: drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov
> > > > > Link:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f4f2c1a35248f56b2a9c1c004e0aaff3609b15d
> > > >
> > > > I'm not exactly in favor of rewriting history twice, so I think we
> > > > should just revert the upstream stable patch and move on.
> > > >
> > > > > To manage notifications about this bug go to:
> > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738/+subscriptions
> > >
> > > --
> > > You received this bug notification because you are subscribed to the
> bug
> > > report.
> > > https://bugs.launchpad.net/bugs/2068738
> > >
> > > Title:
> > > AMD GPUs fail with null pointer dereference when IOMMU enabled,
> > > leading to black screen
> > >
> > > Status in linux package in Ubuntu:
> > > Fix Released
> > > Status in linux source package in Jammy:
> > > Fix Released
> > >
> > > Bug description:
> > > BugLink: https://bugs.launchpad.net/bugs/2068738
> > >
> > > [Impact]
> > >
> > > On systems with AMD Picasso/Raven 2 GPU devices, when the IOMMU is
> > > enabled, the system fails to boot correctly, and all users see is a
> > > black screen.
> > >
> > > This is caused by a null pointer dereference when enabling the IOMMU
> > > after the device has been initialised. It should happen the other way
> > > around.
> > >
> > > AMD-Vi: AMD IOMMUv2 loaded and initialized
> > > ...
> > > amdgpu: Topology: Add APU node [0x15d8:0x1002]
> > > kfd kfd: amdgpu: added device 1002:15d8
> > > kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
> > > ...
> > > amdgpu 0000:06:00.0: amdgpu: amdgpu_device_ip_init failed
> > > amdgpu 0000:06:00.0: amdgpu: Fatal error during GPU init
> > > amdgpu 0000:06:00.0: amdgpu: amdgpu: finishing device.
> > > ...
> > > BUG: kernel NULL pointer dereference, address: 000000000000013c
> > > ...
> > > CPU: 1 PID: 223 Comm: systemd-udevd Not tainted 5.15.0-112-generic
> > > #122-Ubuntu
> > > ...
> > > RIP: 0010:amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > > ...
> > > Call Trace:
> > > <TASK>
> > > ? srso_return_thunk+0x5/0x10
> > > ? show_trace_log_lvl+0x28e/0x2ea
> > > ? show_trace_log_lvl+0x28e/0x2ea
> > > ? dm_hw_fini+0x23/0x30 [amdgpu]
> > > ? show_regs.part.0+0x23/0x29
> > > ? __die_body.cold+0x8/0xd
> > > ? __die+0x2b/0x37
> > > ? page_fault_oops+0x13b/0x170
> > > ? srso_return_thunk+0x5/0x10
> > > ? do_user_addr_fault+0x321/0x670
> > > ? srso_return_thunk+0x5/0x10
> > > ? __free_pages_ok+0x34a/0x4f0
> > > ? exc_page_fault+0x77/0x170
> > > ? asm_exc_page_fault+0x27/0x30
> > > ? amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > > dm_hw_fini+0x23/0x30 [amdgpu]
> > > amdgpu_device_ip_fini_early.isra.0+0x278/0x312 [amdgpu]
> > > amdgpu_device_fini_hw+0x156/0x208 [amdgpu]
> > > amdgpu_driver_unload_kms+0x69/0x90 [amdgpu]
> > > amdgpu_driver_load_kms.cold+0x81/0x107 [amdgpu]
> > > amdgpu_pci_probe+0x1d1/0x290 [amdgpu]
> > > local_pci_probe+0x4b/0x90
> > > ? srso_return_thunk+0x5/0x10
> > > pci_device_probe+0x119/0x200
> > > really_probe+0x222/0x420
> > > __driver_probe_device+0xe8/0x140
> > > driver_probe_device+0x23/0xc0
> > > __driver_attach+0xf7/0x1f0
> > > ? __device_attach_driver+0x140/0x140
> > > bus_for_each_dev+0x7f/0xd0
> > > driver_attach+0x1e/0x30
> > > bus_add_driver+0x148/0x220
> > > ? srso_return_thunk+0x5/0x10
> > > driver_register+0x95/0x100
> > > __pci_register_driver+0x68/0x70
> > > amdgpu_init+0x7c/0x1000 [amdgpu]
> > > ? 0xffffffffc0e0b000
> > > do_one_initcall+0x49/0x1e0
> > > ? srso_return_thunk+0x5/0x10
> > > ? kmem_cache_alloc_trace+0x19e/0x2e0
> > > do_init_module+0x52/0x260
> > > load_module+0xb45/0xbe0
> > > __do_sys_finit_module+0xbf/0x120
> > > __x64_sys_finit_module+0x18/0x20
> > > x64_sys_call+0x1ac3/0x1fa0
> > > do_syscall_64+0x56/0xb0
> > > ...
> > > entry_SYSCALL_64_after_hwframe+0x67/0xd1
> > >
> > > A workaround does exist. Users can set "nomodeset" or "amd_iommu=off"
> > > to GRUB_CMDLINE_LINUX_DEFAULT, update-grub and reboot.
> > >
> > > [Fix]
> > >
> > > The regression was caused by the following commit that landed in
> > > 5.15.0-112-generic, and 5.15.150 upstream:
> > >
> > > commit 3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd ubuntu-jammy
> > > Author: Yifan Zhang yifan1.zh...@amd.com
> > > Date: Tue Sep 28 15:42:35 2021 +0800
> > > Subject: drm/amdgpu: init iommu after amdkfd device init
> > > Link:
> > >
> https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/?id=3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd
> > >
> > > The fix is to revert this patch, as it was not suppose to be
> > > backported to 5.15 stable.
> > >
> > > The mailing list discussion with AMD developers is:
> > >
> > > https://lore.kernel.org/amd-gfx/20240523173031.4212-1-w_ar...@gmx.de/
> > >
> > > The fix hasn't been acknowledged by Greg KH or Sasha Levin yet, so
> > > sending as a Ubuntu SAUCE patch. If the upstream status changes, we
> > > can NAK and resend.
> > >
> > > [Testcase]
> > >
> > > You need a system with an AMD Picasso/Raven 2 device. It will likely
> > > be an APU, and not a discrete graphics card, but any AMD Picasso/Raven
> > > 2 device is affected.
> > >
> > > Install the kernel and boot. Make sure full modesetting is enabled.
> > >
> > > There is a test kernel available in the ppa below:
> > >
> > > https://launchpad.net/~mruffell/+archive/ubuntu/lp2068738-test
> > >
> > > If you install the test kernel, your system should boot successfully.
> > >
> > > [Where problems could occur]
> > >
> > > We are reverting a problematic patch and going back to how it was
> > > before 5.15.0-112-generic. This should not cause any issues for users.
> > >
> > > If a regression were to occur, users can set "nomodeset" or
> > > "amd_iommu=off" to GRUB_CMDLINE_LINUX_DEFAULT and reboot, or pin their
> > > kernel to a working one.
> > >
> > > The impact of a regression would be high, as users displays could be
> > > blank.
> > >
> > > [Other Info]
> > >
> > > User reports:
> > > https://forums.linuxmint.com/viewtopic.php?t=421484
> > > https://forums.linuxmint.com/viewtopic.php?t=421441
> > >
> > >
> https://www.reddit.com/r/Ubuntu/comments/1d9uviz/had_to_purge_kernel_5150112_could_not_boot/
> > >
> > >
> https://www.reddit.com/r/linuxmint/comments/1d9w6c9/kernel_5150112_boot_failure/
> > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068735
> > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068793
> > > https://bugs.launchpad.net/bugs/2068812
> > >
> > > As bizarre as it is, this commit was actually originally included in
> > > 5.15-rc5:
> > >
> > > commit 714d9e4574d54596973ee3b0624ee4a16264d700
> > > Author: Yifan Zhang yifan1.zh...@amd.com
> > > Date: Tue Sep 28 15:42:35 2021 +0800
> > > Subject: drm/amdgpu: init iommu after amdkfd device init
> > > Link:
> > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=714d9e4574d54596973ee3b0624ee4a16264d700
> > >
> > > It seems to have caused issues back then too, and was removed in the
> > > following fixups, in 5.16-rc1:
> > >
> > > commit 93cec184788b0cf3926bc1f7b47fed74ba87990c
> > > Author: James Zhu james....@amd.com
> > > Date: Tue Nov 2 21:33:50 2021 -0400
> > > Subject: drm/amdgpu: remove duplicated kfd_resume_iommu
> > > Link:
> > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=93cec184788b0cf3926bc1f7b47fed74ba87990c
> > >
> > > commit 9f4f2c1a35248f56b2a9c1c004e0aaff3609b15d
> > > Author: shaoyunl shaoyun....@amd.com
> > > Date: Fri Nov 5 12:34:14 2021 -0400
> > > Subject: drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov
> > > Link:
> > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f4f2c1a35248f56b2a9c1c004e0aaff3609b15d
> > >
> > > I'm not exactly in favor of rewriting history twice, so I think we
> > > should just revert the upstream stable patch and move on.
> > >
> > > To manage notifications about this bug go to:
> > >
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738/+subscriptions

-- 
Erv Bendiks

416-816-9802

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2068738

Title:
  AMD GPUs fail with null pointer dereference when IOMMU enabled,
  leading to black screen

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738/+subscriptions

