On Thu, Dec 18, 2025 at 12:21 AM Timur Kristóf <[email protected]> wrote:
>
> On Monday, December 15, 2025 10:07:09 Central Standard Time Alex Deucher
> wrote:
> > Only set an error on the fence if the fence is not
> > signalled. We can end up with a warning if the
> > per queue reset path signals the fence and sets an error
> > as part of the reset, but fails to recover.
>
> Can you please elaborate why this is necessary?
> I don't entirely see the point of this patch. Why don't we want to set an
> error on the fence when it was signalled by the per queue reset? I would have
> thought that the next patch does that, and also fixes the warning mentioned in
> the commit message here.
If you call dma_fence_set_error() on a fence that has already signaled,
it triggers a warning. What can happen is that the queue reset sets the
error on the fence and then signals the fence as part of the reset
sequence. However, if the queue reset ultimately fails, the fence is
already signaled, and we then try to set an error again here as we fall
back to adapter reset, which triggers the warning.

Alex

> >
> > Signed-off-by: Alex Deucher <[email protected]>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index 67fde99724bad..7f5d01164897f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -147,7 +147,8 @@ static enum drm_gpu_sched_stat
> > amdgpu_job_timedout(struct drm_sched_job *s_job)
> >  			dev_err(adev->dev, "Ring %s reset failed\n", ring->sched.name);
> >  	}
> >
> > -	dma_fence_set_error(&s_job->s_fence->finished, -ETIME);
> > +	if (dma_fence_get_status(&s_job->s_fence->finished) == 0)
> > +		dma_fence_set_error(&s_job->s_fence->finished, -ETIME);
> >
> >  	amdgpu_vm_put_task_info(ti);
> >
> >
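To make the sequence described in the reply concrete, here is a rough
userspace sketch of it. This is not kernel code: struct mock_fence and
all mock_* functions are hypothetical stand-ins that only imitate the
behaviour of dma_fence_set_error() and dma_fence_get_status() as
discussed in this thread, and the error value used by the failed queue
reset (-ETIME) is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for struct dma_fence; not the kernel type. */
struct mock_fence {
	bool signaled;
	int error;
	int warnings;	/* how often the "already signaled" check fired */
};

/* Imitates dma_fence_set_error(): warns if the fence has already signaled. */
static void mock_fence_set_error(struct mock_fence *f, int error)
{
	assert(error < 0);	/* errors are negative errno values */
	if (f->signaled) {
		f->warnings++;
		fprintf(stderr, "WARN: set_error on signaled fence\n");
	}
	f->error = error;
}

/* Imitates dma_fence_get_status(): 0 = pending, 1 = signaled, <0 = error. */
static int mock_fence_get_status(const struct mock_fence *f)
{
	if (!f->signaled)
		return 0;
	return f->error ? f->error : 1;
}

/*
 * Models a per-queue reset that sets an error and signals the fence as
 * part of the reset sequence, but then fails to recover the queue.
 * The -ETIME value here is an assumption.
 */
static void mock_failed_queue_reset(struct mock_fence *f)
{
	mock_fence_set_error(f, -ETIME);
	f->signaled = true;
}

/*
 * Models the fallback path in the timeout handler. With guarded == false
 * the error is set unconditionally (old behaviour); with guarded == true
 * it is set only if the fence is still pending (patched behaviour).
 * Returns the number of warnings raised on this fence.
 */
static int mock_timedout(struct mock_fence *f, bool guarded)
{
	mock_failed_queue_reset(f);

	if (!guarded || mock_fence_get_status(f) == 0)
		mock_fence_set_error(f, -ETIME);

	return f->warnings;
}
```

Calling mock_timedout() on a fresh, zero-initialized fence with
guarded == false hits the warning path once, while guarded == true
leaves the fence with the error that the reset already recorded and
raises no warning.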
