On the Asus Z13 2025, which uses a Strix Halo platform, around 8% of
suspend/resume cycles end in a soft lockup about 1 second after the
screen turns on (it freezes). This is caused by power gating VPE when it
is not in use, which happens after 1 second of inactivity.

Specifically, the VPE gating sequence after resume is as follows: an
initial ungate, followed by a gate during the resume process. Then,
amdgpu_device_delayed_init_work_handler is scheduled with a delay of 2s
to run tests, one of which tests VPE in vpe_ring_test_ib. This causes an
ungate. After that test, vpe_idle_work_handler is scheduled with
VPE_IDLE_TIMEOUT (1s).

When vpe_idle_work_handler runs and tries to gate VPE, it causes the SMU
to hang and partially freezes half of the GPU IPs, with the thread that
issued the command stuck processing it.

Specifically, after that SMU command tries to run, we get the following:

snd_hda_intel 0000:c4:00.1: Refused to change power state from D0 to D3hot
...
xhci_hcd 0000:c4:00.4: Refused to change power state from D0 to D3hot
...
amdgpu 0000:c4:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000032 SMN_C2PMSG_82:0x00000000
amdgpu 0000:c4:00.0: amdgpu: Failed to power gate VPE!
[drm:vpe_set_powergating_state [amdgpu]] *ERROR* Dpm disable vpe failed, ret = -62.
amdgpu 0000:c4:00.0: [drm] *ERROR* [CRTC:93:crtc-0] flip_done timed out
amdgpu 0000:c4:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000032 SMN_C2PMSG_82:0x00000000
amdgpu 0000:c4:00.0: amdgpu: Failed to power gate JPEG!
[drm:jpeg_v4_0_5_set_powergating_state [amdgpu]] *ERROR* Dpm disable jpeg failed, ret = -62.
amdgpu 0000:c4:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000032 SMN_C2PMSG_82:0x00000000
amdgpu 0000:c4:00.0: amdgpu: Failed to power gate VCN instance 0!
[drm:vcn_v4_0_5_stop [amdgpu]] *ERROR* Dpm disable uvd failed, ret = -62.
thunderbolt 0000:c6:00.5: 0: timeout reading config space 1 from 0xd3
thunderbolt 0000:c6:00.5: 0: timeout reading config space 2 from 0x5
thunderbolt 0000:c6:00.5: Refused to change power state from D0 to D3hot
amdgpu 0000:c4:00.0: [drm] *ERROR* [CRTC:97:crtc-1] flip_done timed out
amdgpu 0000:c4:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000032 SMN_C2PMSG_82:0x00000000
amdgpu 0000:c4:00.0: amdgpu: Failed to power gate VCN instance 1!

This is in addition to, e.g., kwin errors in journalctl. 0000:c4:00.0 is
the GPU. Interestingly, 0000:c4:00.6, which is another HDA block,
0000:c4:00.5, a PCI controller, and 0000:c4:00.2, resume normally.
0x00000032 is the PowerDownVpe(50) command, which is the common failure
point in all failed resumes.

On a normal resume, we should get the following power gates:

amdgpu 0000:c4:00.0: amdgpu: smu send message: PowerDownVpe(50) param: 0x00000000, resp: 0x00000001
amdgpu 0000:c4:00.0: amdgpu: smu send message: PowerDownJpeg0(33) param: 0x00000000, resp: 0x00000001
amdgpu 0000:c4:00.0: amdgpu: smu send message: PowerDownJpeg1(38) param: 0x00010000, resp: 0x00000001
amdgpu 0000:c4:00.0: amdgpu: smu send message: PowerDownVcn1(4) param: 0x00010000, resp: 0x00000001
amdgpu 0000:c4:00.0: amdgpu: smu send message: PowerDownVcn0(6) param: 0x00000000, resp: 0x00000001
amdgpu 0000:c4:00.0: amdgpu: smu send message: PowerUpVcn0(7) param: 0x00000000, resp: 0x00000001
amdgpu 0000:c4:00.0: amdgpu: smu send message: PowerUpVcn1(5) param: 0x00010000, resp: 0x00000001
amdgpu 0000:c4:00.0: amdgpu: smu send message: PowerUpJpeg0(34) param: 0x00000000, resp: 0x00000001
amdgpu 0000:c4:00.0: amdgpu: smu send message: PowerUpJpeg1(39) param: 0x00010000, resp: 0x00000001

To fix this, increase VPE_IDLE_TIMEOUT to 2 seconds. This increases
reliability from a hang within 4-25 suspends to 200+ (tested) suspends
without one, with a cycle time of 12s sleep, 8s resume.

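For readers matching the hex value in the "I'm not done with your
previous command" lines against the message names, a minimal lookup
sketch follows. The ids are copied only from the "smu send message"
lines quoted above; the struct, table, and function names are
hypothetical and are not taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: one SMU message id and its name. */
struct smu_msg {
	uint32_t id;
	const char *name;
};

/* Ids as printed in the "smu send message" log lines above. */
static const struct smu_msg smu_msgs[] = {
	{  4, "PowerDownVcn1" },
	{  5, "PowerUpVcn1" },
	{  6, "PowerDownVcn0" },
	{  7, "PowerUpVcn0" },
	{ 33, "PowerDownJpeg0" },
	{ 34, "PowerUpJpeg0" },
	{ 38, "PowerDownJpeg1" },
	{ 39, "PowerUpJpeg1" },
	{ 50, "PowerDownVpe" },
};

#define SMU_MSG_COUNT (sizeof(smu_msgs) / sizeof(smu_msgs[0]))

/* Map a raw SMN_C2PMSG_66 value to a message name, or NULL if unknown. */
const char *smu_msg_name(uint32_t c2pmsg_66)
{
	size_t i;

	for (i = 0; i < SMU_MSG_COUNT; i++)
		if (smu_msgs[i].id == c2pmsg_66)
			return smu_msgs[i].name;
	return NULL;
}

/* Reverse lookup: message name to id, or -1 if the name is not listed. */
int smu_msg_id(const char *name)
{
	size_t i;

	for (i = 0; i < SMU_MSG_COUNT; i++)
		if (strcmp(smu_msgs[i].name, name) == 0)
			return (int)smu_msgs[i].id;
	return -1;
}
```

With this table, smu_msg_name(0x00000032) yields "PowerDownVpe", which
is why the later JPEG and VCN gate failures still report 0x32: the SMU
never finished the VPE power-down request.
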
The suspected reason is that once VPE has been used, it needs a bit of
time before it can be gated, and the previous 1s delay was borderline,
which is not enough for Strix Halo. When VPE is not used, such as on
resume, gating it instantly does not seem to cause issues.

Fixes: 5f82a0c90cca ("drm/amdgpu/vpe: enable vpe dpm")
Signed-off-by: Antheas Kapenekakis <l...@antheas.dev>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
index 121ee17b522b..24f09e457352 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
@@ -34,8 +34,8 @@
 /* VPE CSA resides in the 4th page of CSA */
 #define AMDGPU_CSA_VPE_OFFSET (4096 * 3)
 
-/* 1 second timeout */
-#define VPE_IDLE_TIMEOUT msecs_to_jiffies(1000)
+/* 2 second timeout */
+#define VPE_IDLE_TIMEOUT msecs_to_jiffies(2000)
 #define VPE_MAX_DPM_LEVEL 4
 #define FIXED1_8_BITS_PER_FRACTIONAL_PART 8
 

base-commit: c17b750b3ad9f45f2b6f7e6f7f4679844244f0b9
-- 
2.50.1
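
As a rough illustration of how the timeout change shifts the timing
described in the commit message, the sketch below computes when the idle
handler would first try to gate VPE after resume. It is a userspace
model under stated assumptions, not driver code: the function name, the
test_ms parameter, and the assumption that nothing else uses VPE in the
meantime are all hypothetical. Only the 2s delayed-init delay and the
two timeout values come from the text above.

```c
#include <stdint.h>

/* Delay before the delayed init work runs its tests (2s, per the text). */
#define DELAYED_INIT_MS		2000u
/* VPE_IDLE_TIMEOUT before and after the patch, in milliseconds. */
#define OLD_IDLE_TIMEOUT_MS	1000u
#define NEW_IDLE_TIMEOUT_MS	2000u

/*
 * Milliseconds after resume at which the idle work would first attempt
 * to gate VPE, assuming the VPE ring test takes test_ms to complete and
 * VPE stays idle afterwards (both are modelling assumptions).
 */
uint32_t vpe_gate_attempt_ms(uint32_t idle_timeout_ms, uint32_t test_ms)
{
	return DELAYED_INIT_MS + test_ms + idle_timeout_ms;
}

/* Idle time VPE gets between the end of the ring test and the gate. */
uint32_t vpe_quiet_window_ms(uint32_t idle_timeout_ms)
{
	return idle_timeout_ms;
}
```

Under these assumptions and with a negligible test duration, the gate
attempt moves from 3000 ms to 4000 ms after resume, and the quiet window
between last use and gating doubles from 1000 ms to 2000 ms.
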