On 6/16/2025 9:54 AM, Peter Zijlstra wrote:
On Mon, Jun 16, 2025 at 01:51:21PM +0200, Christian König wrote:
Hi Peter,

On 6/16/25 11:39, Peter Zijlstra wrote:
Hi guys,

My (Intel Sapphire Rapids) workstation has a RX 7800 XT and when I kexec
a bunch of times, the amdgpu driver gets upset and barfs on boot.

yeah, that is an "intentional" HW feature and yes you're certainly not
the first one to complain about it :(

The PSP (platform security processor IIRC) is designed in such a way
that you can initialize it only once after a power cycle / hard reset
for security reasons (e.g. to not leak crypto keys used for digital
rights management, etc.).

On dGPUs we work around that manually by power cycling the ASIC when
that situation is detected during amdgpu load, but that unfortunately
doesn't work 100% reliably.

Right.. hence the splats.

How about if we reset before the kexec? There is a global flag, `kexec_in_progress`, that drivers can check to know they're about to go through kexec and do $THINGS.

Something like this:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 0fc0eeedc6461..2b1216b14d618 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -34,6 +34,7 @@

 #include <linux/cc_platform.h>
 #include <linux/dynamic_debug.h>
+#include <linux/kexec.h>
 #include <linux/module.h>
 #include <linux/mmu_notifier.h>
 #include <linux/pm_runtime.h>
@@ -2544,6 +2545,9 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
                adev->mp1_state = PP_MP1_STATE_UNLOAD;
        amdgpu_device_ip_suspend(adev);
        adev->mp1_state = PP_MP1_STATE_NONE;
+
+       if (kexec_in_progress)
+               amdgpu_asic_reset(adev);
 }

 static int amdgpu_pmops_prepare(struct device *dev)


On APUs the situation is even worse because the PSP is shared between
the GPU and the CPU.

We have forwarded such complaints internally for years, but there is
not much else Alex and I can do about it.

Oh well. Thanks for the info!

