On Thu, Apr 17, 2025 at 03:20:44PM +0100, Tvrtko Ursulin wrote:
>
> On 17/04/2025 13:11, Danilo Krummrich wrote:
> > On Thu, Apr 17, 2025 at 12:27:29PM +0100, Tvrtko Ursulin wrote:
> > >
> > > On 17/04/2025 08:45, Philipp Stanner wrote:
> > > > On Mon, 2025-04-07 at 17:22 +0200, Philipp Stanner wrote:
> > >
> > > The problem is exactly that jobs can outlive the entities and the
> > > scheduler, while some userspace may have a dma fence reference to
> > > the job via a sync file. This new callback would not solve it for
> > > xe, but if everything required was reference counted it would.
> >
> > I think you're mixing up the job and the dma_fence here; if a job
> > outlives the scheduler, it clearly is a bug, always has been.
> >
> > AFAIK, Xe reference counts its driver-specific job structures *and*
> > the driver-specific scheduler structure, such that drm_sched_fini()
> > won't be called before all jobs have finished.
>
> Yes, sorry, dma fence. But it is not enough to postpone drm_sched_fini()
> until the job has finished. The problem is that an exported dma fence
> holds a pointer to the drm_sched_fence (and so oopses in
> drm_sched_fence_get_timeline_name() on fence->sched->name) *after* the
> job has finished and the driver was free to tear everything down.

Well, that's a bug in drm_sched_fence then, and independent of the other
topic. Once the finished fence in a struct drm_sched_fence has been
signaled, it must live independently of the scheduler. The lifetime of
the drm_sched_fence is entirely independent of the scheduler itself, as
you correctly point out.
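To make the oops concrete, this is roughly the fence op in question
(trimmed from sched_fence.c, not verbatim): the timeline name is read
through fence->sched, so a signaled, exported fence that outlives the
scheduler dereferences freed memory.

	/* Roughly the current code (trimmed): the name is read through
	 * fence->sched, which is gone once the driver has torn the
	 * scheduler down, hence the use-after-free.
	 */
	static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
	{
		struct drm_sched_fence *fence = to_drm_sched_fence(f);

		return (const char *)fence->sched->name;
	}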
Starting to reference count things to keep the whole scheduler etc.
alive for as long as the drm_sched_fence lives is not the correct
solution.

> > Multiple solutions have been discussed already, e.g. just wait for
> > the pending list to be empty, or reference count the scheduler for
> > every pending job. Those all had significant downsides, which I
> > don't see with this proposal.
> >
> > I'm all for better ideas though -- what do you propose?
>
> I think we need to brainstorm both issues and see if there is a
> solution which solves them both, with bonus points for being elegant.

The problems are not related. As mentioned above, once signaled, a
drm_sched_fence must not depend on the scheduler any longer.

> > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > index 6b72278c4b72..ae3152beca14 100644
> > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > @@ -1465,6 +1465,10 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
> > > > >  	sched->ready = false;
> > > > >  	kfree(sched->sched_rq);
> > > > >  	sched->sched_rq = NULL;
> > > > > +
> > > > > +	if (!list_empty(&sched->pending_list))
> > > > > +		dev_err(sched->dev, "%s: Tearing down scheduler while jobs are pending!\n",
> > > > > +			__func__);
> > >
> > > It isn't fair to add this error since it would out of the blue
> > > start firing for everyone except nouveau, no? Regardless of
> > > whether there is a leak or not.
> >
> > I think it is pretty fair to warn when detecting a guaranteed bug,
> > no?
> >
> > If drm_sched_fini() is called while jobs are still on the
> > pending_list, they won't ever be freed, because all workqueues are
> > stopped.
>
> Is it a guaranteed bug for drivers which are aware of the
> drm_sched_fini() limitation and clean up by themselves?

How could a driver clean up by itself (unless the driver holds its own
list of pending jobs)? Once a job is in flight (i.e. it's on the
pending_list) we must guarantee that free_job() is called by the
scheduler, which it can't do if we call drm_sched_fini() before the
pending_list is empty.

> In other words, if you apply the series up to here, would it trigger
> for nouveau?

No, because nouveau does something very stupid, i.e. it replicates the
pending_list.

> Reportedly it triggers for the mock scheduler, which also has no
> leak.

That sounds impossible. How do you ensure you do *not* leak memory when
you tear down the scheduler while it still has pending jobs? Or in
other words, who calls free_job() if not the scheduler itself?

> Also, I asked in my initial reply if we have a list of which of the
> current drivers suffer from memory leaks. Is it all, or some, etc.?

Not all, but quite a few, I think. The last time I looked (which was
about a year ago), amdgpu, for instance, could leak memory if you
unbind the driver while enough jobs are in flight.
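For illustration, "replicating the pending_list" amounts to something
like the following driver-side teardown. This is a hypothetical sketch
only -- struct my_sched, its fields, and my_sched_teardown() are made
up for this email and not taken from nouveau or any other driver:

	/* Hypothetical sketch: the driver mirrors the scheduler's
	 * pending_list in its own s->jobs list and only tears the
	 * scheduler down once every in-flight job has been freed, i.e.
	 * once free_job() has run for all of them. A real driver would
	 * also need locking around its job list.
	 */
	static void my_sched_teardown(struct my_sched *s)
	{
		/* Each job removes itself from s->jobs in the driver's
		 * free_job() path and wakes s->job_wq.
		 */
		wait_event(s->job_wq, list_empty(&s->jobs));

		/* pending_list is now empty; nothing left to leak. */
		drm_sched_fini(&s->sched);
	}

This only works because such a driver can observe every free_job()
call; a driver without its own job list has no point to wait on, which
is exactly the case the dev_err() above is meant to catch.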