Hi Philipp,
On 02/06/25 04:28, Philipp Stanner wrote:
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
[...]
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index
7146069a98492f5fab2a49d96e2054f649e1fe3d..46f5391e84a12232b247886cf13
11f8e09f42f04 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -40,11 +40,11 @@ static enum drm_gpu_sched_stat
etnaviv_sched_timedout_job(struct drm_sched_job
int change;
/*
- * If the GPU managed to complete this jobs fence, the
timout is
- * spurious. Bail out.
+ * If the GPU managed to complete this jobs fence, the
timeout has
+ * fired before free-job worker. The timeout is spurious, so
bail out.
*/
if (dma_fence_is_signaled(submit->out_fence))
- goto out_no_timeout;
+ return DRM_GPU_SCHED_STAT_NO_HANG;
/*
* If the GPU is still making forward progress on the front-
end (which
@@ -70,7 +70,7 @@ static enum drm_gpu_sched_stat
etnaviv_sched_timedout_job(struct drm_sched_job
gpu->hangcheck_dma_addr = dma_addr;
gpu->hangcheck_primid = primid;
gpu->hangcheck_fence = gpu->completed_fence;
- goto out_no_timeout;
+ return DRM_GPU_SCHED_STAT_NO_HANG;
}
/* block scheduler */
@@ -86,10 +86,7 @@ static enum drm_gpu_sched_stat
etnaviv_sched_timedout_job(struct drm_sched_job
drm_sched_resubmit_jobs(&gpu->sched);
drm_sched_start(&gpu->sched, 0);
- return DRM_GPU_SCHED_STAT_RESET;
-out_no_timeout:
- list_add(&sched_job->list, &sched_job->sched->pending_list);
Here you actually remove the manipulation of the scheduler internals,
but you didn't in v3d. Just to point that out.
And BTW I'm just seeing that the pending_list gets manipulated here
with the scheduler's workqueues running and no locks being hold.
Oh man :(
That is most certainly a bug, and I recommend that the etnaviv
maintainers at least add the appropriate lock here and backport that
since it can race any time.
But thx for working on that, Maíra. Good that we can remove the stuff
this way.
Thinking about it, this patch even fixes a bug. So could contain a
Fixes: tag. But I'm not sure if it's worth it to mark the entire series
for Stable. Opinions?
I believe the best way to fix this bug would be doing something similar
to what I did to v3d: send a temporary fix adding the locks, which will
be backported to stable. I'll send a fix to Etnaviv today.
Thanks for the review, Phillip!
Best Regards,
- Maíra
P.
return DRM_GPU_SCHED_STAT_RESET;
}