On Fri, 2025-07-18 at 10:35 +0100, Tvrtko Ursulin wrote: > > On 18/07/2025 10:31, Philipp Stanner wrote: > > On Fri, 2025-07-18 at 08:13 +0100, Tvrtko Ursulin wrote: > > > > > > On 16/07/2025 21:44, Maíra Canal wrote: > > > > Hi Tvrtko, > > > > > > > > On 16/07/25 11:46, Tvrtko Ursulin wrote: > > > > > > > > > > On 16/07/2025 15:30, Maíra Canal wrote: > > > > > > Hi Tvrtko, > > > > > > > > > > > > On 16/07/25 10:49, Tvrtko Ursulin wrote: > > > > > > > > > > > > > > On 16/07/2025 14:31, Maíra Canal wrote: > > > > > > > > Hi Tvrtko, > > > > > > > > > > > > > > > > On 16/07/25 05:51, Tvrtko Ursulin wrote: > > > > > > > > > Currently the job free work item will lock sched- > > > > > > > > > >job_list_lock > > > > > > > > > first time > > > > > > > > > to see if there are any jobs, free a single job, and > > > > > > > > > then lock > > > > > > > > > again to > > > > > > > > > decide whether to re-queue itself if there are more > > > > > > > > > finished jobs. > > > > > > > > > > > > > > > > > > Since drm_sched_get_finished_job() already looks at > > > > > > > > > the second job > > > > > > > > > in the > > > > > > > > > queue we can simply add the signaled check and have > > > > > > > > > it return the > > > > > > > > > presence > > > > > > > > > of more jobs to be freed to the caller. That way the > > > > > > > > > work item > > > > > > > > > does not > > > > > > > > > have to lock the list again and repeat the signaled > > > > > > > > > check. > > > > > > > > > > > > > > > > > > Signed-off-by: Tvrtko Ursulin > > > > > > > > > <tvrtko.ursu...@igalia.com> > > > > > > > > > Cc: Christian König <christian.koe...@amd.com> > > > > > > > > > Cc: Danilo Krummrich <d...@kernel.org> > > > > > > > > > Cc: Maíra Canal <mca...@igalia.com> > > > > > > > > > Cc: Matthew Brost <matthew.br...@intel.com> > > > > > > > > > Cc: Philipp Stanner <pha...@kernel.org> > > > > > > > > > --- > > > > > > > > > v2: > > > > > > > > > * Improve commit text and kerneldoc. (Philipp) > > > > > > > > > * Rename run free work helper. (Philipp) > > > > > > > > > > > > > > > > > > v3: > > > > > > > > > * Rebase on top of Maira's changes. > > > > > > > > > --- > > > > > > > > > drivers/gpu/drm/scheduler/sched_main.c | 53 > > > > > > > > > +++++++++ > > > > > > > > > +---------------- > > > > > > > > > 1 file changed, 21 insertions(+), 32 deletions(-) > > > > > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > > > > > > > > > b/drivers/gpu/ > > > > > > > > > drm/ scheduler/sched_main.c > > > > > > > > > index e2cda28a1af4..5a550fd76bf0 100644 > > > > > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c > > > > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c > > > > > > > > > @@ -349,34 +349,13 @@ static void > > > > > > > > > drm_sched_run_job_queue(struct > > > > > > > > > drm_gpu_scheduler *sched) > > > > > > > > > } > > > > > > > > > /** > > > > > > > > > - * __drm_sched_run_free_queue - enqueue free-job > > > > > > > > > work > > > > > > > > > - * @sched: scheduler instance > > > > > > > > > - */ > > > > > > > > > -static void __drm_sched_run_free_queue(struct > > > > > > > > > drm_gpu_scheduler > > > > > > > > > *sched) > > > > > > > > > -{ > > > > > > > > > - if (!READ_ONCE(sched->pause_submit)) > > > > > > > > > - queue_work(sched->submit_wq, &sched- > > > > > > > > > >work_free_job); > > > > > > > > > -} > > > > > > > > > - > > > > > > > > > -/** > > > > > > > > > - * drm_sched_run_free_queue - enqueue free-job work > > > > > > > > > if ready > > > > > > > > > + * drm_sched_run_free_queue - enqueue free-job work > > > > > > > > > * @sched: scheduler instance > > > > > > > > > */ > > > > > > > > > static void drm_sched_run_free_queue(struct > > > > > > > > > drm_gpu_scheduler > > > > > > > > > *sched) > > > > > > > > > { > > > > > > > > > - struct drm_sched_job *job; > > > > > > > > > - > > > > > > > > > - job = list_first_entry_or_null(&sched- > > > > > > > > > >pending_list, > > > > > > > > > - struct drm_sched_job, list); > > > > > > > > > - if (job && dma_fence_is_signaled(&job->s_fence- > > > > > > > > > >finished)) > > > > > > > > > - __drm_sched_run_free_queue(sched); > > > > > > > > > > > > > > > > I believe we'd still need this chunk for > > > > > > > > DRM_GPU_SCHED_STAT_NO_HANG > > > > > > > > (check the comment in > > > > > > > > drm_sched_job_reinsert_on_false_timeout()). How > > > > > > > > > > > > > > You mean the "is there a signaled job in the list check" > > > > > > > is needed > > > > > > > for drm_sched_job_reinsert_on_false_timeout()? Hmm why? > > > > > > > Worst case > > > > > > > is a false positive wakeup on the free worker, no? > > > > > > > > > > > > Correct me if I'm mistaken, we would also have a false > > > > > > positive wake-up > > > > > > on the run_job worker, which I believe it could be > > > > > > problematic in the > > > > > > cases that we skipped the reset because the job is still > > > > > > running. > > > > > > > > > > Run job worker exits when it sees no free credits so I don't > > > > > think > > > > > there is a problem. What am I missing? > > > > > > > > > > > > > I was the one missing the code in `drm_sched_can_queue()`. > > > > Sorry for the > > > > misleading comments. This is: > > > > > > > > Reviewed-by: Maíra Canal <mca...@igalia.com> > > > > > > No worries, and thanks! > > > > > > Philipp - are you okay with this version? V2 was done to address > > > your > > > feedback so that should be good now. > > > > Was just giving it another spin when you wrote. (a [PATCH v3] > > would've > > been neat for identification, though – I almost pulled the wrong > > patch > > from the archive *wink*) > > Oops, my bad. > > > LGTM, improves things, can be merged. > > > > However, we had to merge Lin Cao's bug fix [1] recently. That one > > is > > now in drm-misc-fixes, and your patch should go to drm-misc-next. > > This > > would cause a conflict once the two branches meet. > > > > So I suggest that we wait with this non-urgent patch until drm- > > misc- > > fixes / Linus's -rc gets merged into drm-misc-next, and then we > > apply > > it. Should be next week or the week after AFAIK. > > > > Unless somebody has a better idea, of course? > > Lin's patch touches sched_entity.c only and mine only sched_main.c - > ie. > no conflict AFAICT?
Aaahhh, I had a hallucination ^^' It doesn't apply to drm-misc-fixes, but that is because fixes misses changes that yours is based on. Because Lin's patch was the last thing I touched on that branch I seem to have jumped to that conclusion. Should be fine, then. My bad. Will apply. P. > > Regards, > > Tvrtko > > > > > Remind me in case I forget. > > > > > > P. > > > > [1] > > https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/15f77764e90a713ee3916ca424757688e4f565b9 > > > > > > > > > > Regards, > > > > > > Tvrtko > > > > > >