Re: [PATCH v4 2/2] drm/amd: Reset the GPU if pmops failed

Alex Deucher Wed, 29 Oct 2025 14:29:18 -0700

On Thu, Oct 23, 2025 at 1:01 PM Mario Limonciello
<[email protected]> wrote:
>
> If the GPU fails to suspend, the return code is passed up to the caller,
> but the GPU is left in an inconsistent state.  This could lead to hangs
> if userspace tries to access it.
>
> The last stage of all pmops calls (success or fail) is the complete()
> callback.  If by the time the PM core reaches this stage the GPU is still
> in suspend, something went really wrong, so reset it.


What happens at that stage?  Resetting would put the GPU back into a
known state from a hardware perspective (amdgpu_asic_reset() just does
the hardware reset), but software would still think things are
suspended so the sw side would still be broken.  If you want to try
and reset both the hw and sw state, you'd need to do the whole gpu
recovery sequence via a worker thread.  E.g., see
amdgpu_debugfs_reset_work() for reference.
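
To illustrate the distinction, here is a toy user-space model in C. It is
not driver code: every name in it (struct mock_gpu, mock_asic_reset(),
mock_gpu_recover(), mock_pm_complete(), mock_run_pending_work(),
mock_gpu_usable()) is hypothetical. It only assumes that a hardware reset
leaves the software state alone, and that the full recovery is deferred to a
worker instead of being run from complete().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; NOT the real struct amdgpu_device. */
struct mock_gpu {
	bool hw_known_state;  /* hardware has been reset to a known state */
	bool in_suspend;      /* software still believes the device is suspended */
	bool sw_resumed;      /* software side has been brought back up */
	bool recovery_queued; /* deferred recovery work is pending */
};

/* Models a hardware-only reset: software bookkeeping is not touched. */
static void mock_asic_reset(struct mock_gpu *gpu)
{
	gpu->hw_known_state = true;
}

/* Models a full recovery: hardware reset plus software re-initialisation. */
static void mock_gpu_recover(struct mock_gpu *gpu)
{
	mock_asic_reset(gpu);
	assert(gpu->hw_known_state);
	gpu->in_suspend = false;
	gpu->sw_resumed = true;
}

/* Models complete(): it only queues the recovery, it does not run it inline. */
static void mock_pm_complete(struct mock_gpu *gpu)
{
	if (gpu->in_suspend)
		gpu->recovery_queued = true;
}

/* Models the worker running some time after complete() has returned. */
static void mock_run_pending_work(struct mock_gpu *gpu)
{
	if (!gpu->recovery_queued)
		return;
	gpu->recovery_queued = false;
	mock_gpu_recover(gpu);
}

/* The device is usable only if hardware and software state agree. */
static bool mock_gpu_usable(const struct mock_gpu *gpu)
{
	return gpu->hw_known_state && !gpu->in_suspend && gpu->sw_resumed;
}
```

In this model, clearing in_suspend and calling mock_asic_reset() alone
leaves mock_gpu_usable() false, because sw_resumed is never set. Queuing from
mock_pm_complete() and then calling mock_run_pending_work() makes it true.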

Alex

>
> Signed-off-by: Mario Limonciello <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index a36e15beafeb..e2d598dd462d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2590,6 +2590,17 @@ static int amdgpu_pmops_prepare(struct device *dev)
>
>  static void amdgpu_pmops_complete(struct device *dev)
>  {
> +       struct drm_device *drm_dev = dev_get_drvdata(dev);
> +       struct amdgpu_device *adev = drm_to_adev(drm_dev);
> +
> +       /* sequence failed, use a big 🔨 try to cleanup */
> +       if (adev->in_suspend) {
> +               adev->in_suspend = adev->in_s0ix = adev->in_s3 = false;
> +               dev_crit(adev->dev, "pmpops sequence failed, resetting\n");
> +               amdgpu_asic_reset(adev);
> +               return;
> +       }
> +
>         amdgpu_device_complete(dev_get_drvdata(dev));
>  }
>
> --
> 2.51.1
>
