On Fri, 2025-10-03 at 10:26 +0100, Tvrtko Ursulin wrote:
> Drm_sched_job_add_dependency() consumes the fence reference both on

s/D/d

> success and failure, so in the latter case the dma_fence_put() on the
> error path (xarray failed to expand) is a double free.
> 
> Interestingly this bug appears to have been present ever since
> ebd5f74255b9 ("drm/sched: Add dependency tracking"), since the code back
> then looked like this:
> 
> drm_sched_job_add_implicit_dependencies():
> ...
>        for (i = 0; i < fence_count; i++) {
>                ret = drm_sched_job_add_dependency(job, fences[i]);
>                if (ret)
>                        break;
>        }
> 
>        for (; i < fence_count; i++)
>                dma_fence_put(fences[i]);
> 
> Which means for the failing 'i' the dma_fence_put was already a double
> free. Possibly there were no users at that time, or the test cases were
> insufficient to hit it.
> 
> The bug was then only noticed and fixed after
> 9c2ba265352a ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2")
> landed, with its fixup of
> 4eaf02d6076c ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies").
> 
> At that point it was a slightly different flavour of a double free, which
> 963d0b356935 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder")
> noticed and attempted to fix.
> 
> But it only moved the double free from happening inside the
> drm_sched_job_add_dependency(), when releasing the reference not yet
> obtained, to the caller, when releasing the reference already released by
> the former in the failure case.

That's certainly interesting, but is there a specific reason why you
include all of that?

The code is what it is, and AFAICS this is just a bug that stems from
the original bugs and the refactorings that followed.

I would at least remove the old 'implicit_dependencies' function from
the commit message. It's just confusing and makes one look for that in
the current code or patch.

> 
> As such it is not easy to identify the right target for the fixes tag so
> lets keep it simple and just continue the chain.
> 
> We also drop the misleading comment about additional reference, since it
> is not additional but the only one from the point of view of dependency
> tracking.


IMO that comment is nonsense. It's useless, too, because I can *see*
that a reference is being taken there, but not *why*.

Argh, these comments. See also my commit 72ebc18b34993


Anyway, removing it is fine, but replacing it with a better comment
would be preferable. See below.

> 
> Signed-off-by: Tvrtko Ursulin <[email protected]>
> Fixes: 963d0b356935 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder")
> Reported-by: Dan Carpenter <[email protected]>

Is there an error report that could be included here with a Closes:
tag?

> Cc: Christian König <[email protected]>
> Cc: Rob Clark <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Matthew Brost <[email protected]>
> Cc: Danilo Krummrich <[email protected]>
> Cc: Philipp Stanner <[email protected]>
> Cc: "Christian König" <[email protected]>
> Cc: [email protected]
> Cc: <[email protected]> # v5.16+
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 46119aacb809..aff34240f230 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -960,20 +960,16 @@ int drm_sched_job_add_resv_dependencies(struct drm_sched_job *job,
>  {
>       struct dma_resv_iter cursor;
>       struct dma_fence *fence;
> -     int ret;
> +     int ret = 0;
>  
>       dma_resv_assert_held(resv);
>  
>       dma_resv_for_each_fence(&cursor, resv, usage, fence) {
> -             /* Make sure to grab an additional ref on the added fence */
> -             dma_fence_get(fence);
> -             ret = drm_sched_job_add_dependency(job, fence);
> -             if (ret) {
> -                     dma_fence_put(fence);
> -                     return ret;
> -             }
> +             ret = drm_sched_job_add_dependency(job, dma_fence_get(fence));

You still take a reference as before, but there is no comment anymore.
Can you add one explaining why a new reference is taken here?

I guess it will be something like "This needs a new reference for the
job", since you cannot rely on the one from resv.

> +             if (ret)
> +                     break;
>       }
> -     return 0;
> +     return ret;


That's an unnecessary enlargement of the git diff for style reasons,
isn't it? Better to keep the diff minimal here for git blame.
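
To illustrate both points (the comment and the early return), here is a
small userspace model of the ownership rule. It is only a sketch, not
scheduler code: struct fence, struct job, MAX_DEPS and all helper names
are made up for illustration, and the only thing assumed from the patch
is that adding a dependency consumes the passed reference on success
and on failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_fence: only the refcount matters. */
struct fence {
	int refcount;
};

static struct fence *fence_get(struct fence *f)
{
	f->refcount++;
	return f;
}

static void fence_put(struct fence *f)
{
	assert(f->refcount > 0); /* a put without a matching get trips here */
	f->refcount--;
}

#define MAX_DEPS 2 /* tiny capacity so the failure path is easy to hit */

struct job {
	struct fence *deps[MAX_DEPS];
	size_t count;
};

/*
 * Models the contract from the commit message: the reference passed in
 * is consumed on success *and* on failure.
 */
static int job_add_dependency(struct job *job, struct fence *f)
{
	if (job->count == MAX_DEPS) {
		fence_put(f);	/* failure: drop the reference we were given */
		return -1;	/* stands in for -ENOMEM */
	}
	job->deps[job->count++] = f;	/* success: the job owns it now */
	return 0;
}

static int job_add_dependencies(struct job *job, struct fence **fences,
				size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		/*
		 * The container we iterate keeps its own reference, so the
		 * job needs a new one. job_add_dependency() consumes it
		 * even on failure, hence no put on the error path.
		 */
		ret = job_add_dependency(job, fence_get(fences[i]));
		if (ret)
			return ret;
	}
	return 0;
}
```

With three fences that each start at a refcount of 1, the third add
fails and that fence ends up back at 1, while the two stored fences sit
at 2. An extra put in the caller's error path would take the third one
to 0 while its owner still holds it.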


P.


>  }
>  EXPORT_SYMBOL(drm_sched_job_add_resv_dependencies);
>  
