On Mon, Sep 15, 2025 at 9:20 PM Mario Limonciello
<[email protected]> wrote:
>
> KFD suspend and resume routines have been disabled since commit
> 5d3a2d95224da ("drm/amdgpu: skip kfd suspend/resume for S0ix") which
> made sense at that time. However there is a problem that if there is
> any compute work running there may still be active fences. Running
> suspend without draining them can cause the system to hang.
>
> So run KFD suspend/resume routines even in s0ix.
>
> Signed-off-by: Mario Limonciello <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 0fdfde3dcb9f..59688f8ae919 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5220,10 +5220,9 @@ int amdgpu_device_suspend(struct drm_device *dev, bool notify_clients)
>
> amdgpu_device_ip_suspend_phase1(adev);
>
> - if (!adev->in_s0ix) {
> - amdgpu_amdkfd_suspend(adev, !amdgpu_sriov_vf(adev) && !adev->in_runpm);
> + amdgpu_amdkfd_suspend(adev, !amdgpu_sriov_vf(adev) && !adev->in_runpm);
> + if (!adev->in_s0ix)
> amdgpu_userq_suspend(adev);

KGD user queues have the same requirements as KFD user queues, so
amdgpu_userq_suspend() should be called in s0ix as well.

> - }
>
> r = amdgpu_device_evict_resources(adev);
> if (r)
> @@ -5318,11 +5317,11 @@ int amdgpu_device_resume(struct drm_device *dev, bool notify_clients)
> goto exit;
> }
>
> - if (!adev->in_s0ix) {
> - r = amdgpu_amdkfd_resume(adev, !amdgpu_sriov_vf(adev) && !adev->in_runpm);
> - if (r)
> - goto exit;
> + r = amdgpu_amdkfd_resume(adev, !amdgpu_sriov_vf(adev) && !adev->in_runpm);
> + if (r)
> + goto exit;
>
> + if (!adev->in_s0ix) {
> r = amdgpu_userq_resume(adev);

Same here: amdgpu_userq_resume() should also be called in s0ix.

Alex

> if (r)
> goto exit;
> --
> 2.50.1
>