When a GPU job times out, the driver attempts to recover by restarting
the scheduler. Previously, the scheduler was restarted with an error
code of 0, which does not distinguish between a full GPU reset and a
queue reset. This patch changes the error code to -ENODATA for queue
resets, while -ECANCELED or -ETIME is used for full GPU resets.
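
For illustration only, the user-space sketch below shows how a consumer
could tell the two reset types apart from the error recorded at scheduler
restart. It is not kernel code: mock_job, mock_sched, mock_sched_start()
and classify_reset() are hypothetical stand-ins, and the assumption that
the restart error ends up on each still-pending job is a simplification
of what the real scheduler does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's types. */
enum reset_kind { RESET_NONE, RESET_QUEUE, RESET_FULL_GPU, RESET_UNKNOWN };

struct mock_job {
	int fence_error;	/* 0 until an error is recorded */
	int done;		/* nonzero once the job has completed */
};

struct mock_sched {
	struct mock_job *pending;
	size_t npending;
};

/* Restart the mock scheduler: record err on every job still pending. */
static void mock_sched_start(struct mock_sched *sched, int err)
{
	assert(sched != NULL);

	for (size_t i = 0; i < sched->npending; i++) {
		struct mock_job *job = &sched->pending[i];

		/* Completed jobs and jobs with an earlier error are left alone. */
		if (!job->done && job->fence_error == 0)
			job->fence_error = err;
	}
}

/* Map a recorded error to the reset type described in this patch. */
static enum reset_kind classify_reset(int fence_error)
{
	switch (fence_error) {
	case 0:
		return RESET_NONE;
	case -ENODATA:
		return RESET_QUEUE;	/* per-queue reset */
	case -ECANCELED:
	case -ETIME:
		return RESET_FULL_GPU;	/* full GPU reset */
	default:
		return RESET_UNKNOWN;
	}
}
```

With an error code of 0, classify_reset() would have nothing to act on
after a queue reset, which is the ambiguity this patch removes.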

This change improves error handling by:
1. Clearly differentiating between queue resets and full GPU resets.
2. Providing more specific error codes for better debugging and recovery.
3. Aligning with kernel best practices for error reporting.

The related commit "b2ef808786d93df3658" (drm/sched: add optional errno
to drm_sched_start()) introduced support for passing an error code to
drm_sched_start(), enabling this improvement.

Signed-off-by: Vitaly Prosyak <vitaly.pros...@amd.com>
Signed-off-by: Jesse Zhang <jesse.zh...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 100f04475943..b18b316872a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -148,7 +148,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 			atomic_inc(&ring->adev->gpu_reset_counter);
 			amdgpu_fence_driver_force_completion(ring);
 			if (amdgpu_ring_sched_ready(ring))
-				drm_sched_start(&ring->sched, 0);
+				drm_sched_start(&ring->sched, -ENODATA);
 			goto exit;
 		}
 		dev_err(adev->dev, "Ring %s reset failure\n", ring->sched.name);
-- 
2.25.1