On Fri, May 16, 2025 at 11:19:50AM +0100, Tvrtko Ursulin wrote: > > On 16/05/2025 10:53, Danilo Krummrich wrote: > > On Fri, May 16, 2025 at 10:33:30AM +0100, Tvrtko Ursulin wrote: > > > On 24/04/2025 10:55, Philipp Stanner wrote: > > > > + * @kill_fence_context: kill the fence context belonging to > > > > this scheduler > > > > > > Which fence context would that be? ;) > > > > There's one one per ring and a scheduler instance represents a single ring. > > So, > > what should be specified here? > > I was pointing out the fact not all drivers are 1:1 sched:entity.
I'm well aware, but how is that relevant? Entities don't have an associated fence context, but a GPU Ring (either hardware or software) has, which a scheduler instance represents. > Thought it would be obvious from the ";)". I should read from ";)" that you refer to a 1:N-sched:entity relationship (which doesn't seem to be related)? > > > Also, "fence context" would be a new terminology in gpu_scheduler.h API > > > level. You could call it ->sched_fini() or similar to signify at which > > > point > > > in the API it gets called and then the fact it takes sched as parameter > > > would be natural. > > > > The driver should tear down the fence context in this callback, not the > > while > > scheduler. ->sched_fini() would hence be misleading. > > Not the while what? Not while drm_sched_fini()? *whole > Could call it sched_kill() > or anything. My point is that we dont' have "fence context" in the API but > entities so adding a new term sounds sub-optimal. In the callback the driver should neither tear down an entity, nor the whole scheduler, hence we shouldn't call it like that. sched_kill() is therefore misleading as well. It should be named after what it actually does (or should do). Feel free to propose a different name that conforms with that. > > > We also probably want some commentary on the topic of indefinite (or very > > > long at least) blocking a thread exit / SIGINT/TERM/KILL time. > > > > You mean in case the driver does implement the callback, but does *not* > > properly > > tear down the fence context? So, you ask for describing potential > > consequences > > of drivers having bugs in the implementation of the callback? Or something > > else? > > I was proposing the kerneldoc for the vfunc should document the callback > must not block, or if blocking is unavoidable, either document a guideline > on how long is acceptable. Maybe even enforce a limit in the scheduler core > itself. Killing the fence context shouldn't block.