On Tue, 30 Sep 2025 12:58:29 +0200 "Danilo Krummrich" <[email protected]> wrote:
> On Tue Sep 30, 2025 at 12:12 PM CEST, Boris Brezillon wrote:
> > So, my take on that is that what we want ultimately is to have the
> > functionality provided by drm_sched split into different components
> > that can be used in isolation, or combined to provide advanced
> > scheduling.
> >
> > JobQueue:
> > - allows you to queue jobs with their deps
> > - dequeues jobs once their deps are met
> >
> > Not too sure if we want a push or a pull model for the job dequeuing,
> > but the idea is that once the job is dequeued, ownership is passed to
> > the SW entity that dequeued it. Note that I intentionally didn't add
> > the timeout handling here, because dequeuing a job doesn't
> > necessarily mean it's started immediately. If you're dealing with HW
> > queues, you might have to wait for a slot to become available. If
> > you're dealing with something like Mali-CSF, where the number of FW
> > slots is limited, you want to wait for your execution context to be
> > passed to the FW for scheduling, and the final situation is
> > full-fledged FW scheduling, where you want things to start as soon
> > as you have space in your FW queue (AKA ring-buffer?).
> >
> > JobHWDispatcher: (not sure about the name, I'm bad at naming things)
> > This object basically pulls ready jobs from one or multiple JobQueues
> > into its own queue, and waits for a HW slot to become available. If
> > you go for the push model, the job gets pushed to the HW dispatcher
> > queue and waits there until a HW slot becomes available.
> > That's where timeouts should be handled, because the job only becomes
> > active when it gets pushed to a HW slot. I guess if we want a
> > resubmit mechanism, it would have to take place here, but given how
> > tricky this has been, I'd be tempted to leave that to drivers, that
> > is, let them requeue the non-faulty jobs directly to their
> > JobHWDispatcher implementation after a reset.
> >
> > FWExecutionContextScheduler: (again, pick a different name if you
> > want)
> > This scheduler doesn't know about jobs, meaning there's a
> > driver-specific entity that needs to dequeue jobs from the JobQueue
> > and push those to the relevant ring buffer. Once a FWExecutionContext
> > has something to execute, it becomes a candidate for the
> > FWExecutionContextScheduler, which gets to decide which set of
> > FWExecutionContexts gets a chance to be scheduled by the FW.
> > That one is for the Mali-CSF case I described above, and I'm not too
> > sure we want it to be generic, at least not until we have another GPU
> > driver needing the same kind of scheduling. Again, you want to defer
> > the timeout handling to this component, because the timer should only
> > start/resume when the FWExecutionContext gets scheduled, and it
> > should be paused as soon as the context gets evicted.
>
> This sounds pretty much like the existing design with the Panthor group
> scheduler layered on top of it, no?

Kinda, but with a way to use each component independently.

> Though, one of the fundamental problems I'd like to get rid of is that
> job ownership is transferred between two components with fundamentally
> different lifetimes (entity and scheduler).

Can you remind me what the problem is? I thought the lifetime issue was
coming from the fact that the drm_sched ownership model was lax enough
that the job could be owned by both drm_gpu_scheduler and
drm_sched_entity at the same time.
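For the record, here's roughly the ownership model I was assuming for
the pull variant. This is a toy userspace Rust sketch, not kernel code,
and every name in it is made up; the point is just that pop_ready()
moves the job out, so there's no window where the queue and the
dispatcher own the same job:

use std::collections::VecDeque;

/// A job plus the number of dependencies that haven't signaled yet.
struct Job {
    name: &'static str,
    unmet_deps: usize,
}

/// Simple FIFO with dep tracking: owns jobs until they're dequeued.
struct JobQueue {
    pending: VecDeque<Job>,
}

impl JobQueue {
    fn new() -> Self {
        Self { pending: VecDeque::new() }
    }

    fn push(&mut self, job: Job) {
        self.pending.push_back(job);
    }

    /// Pull model: the dispatcher calls this when it has room. A job
    /// is handed out (moved) only once all its deps are met; after
    /// that, the queue no longer knows about it.
    fn pop_ready(&mut self) -> Option<Job> {
        if self.pending.front()?.unmet_deps == 0 {
            self.pending.pop_front()
        } else {
            None
        }
    }
}

/// Stand-in for the JobHWDispatcher: owns a job from dequeue to
/// completion, which is also where the timeout would be armed.
struct Dispatcher {
    free_slots: usize,
}

impl Dispatcher {
    fn drain(&mut self, queue: &mut JobQueue) {
        while self.free_slots > 0 {
            let Some(job) = queue.pop_ready() else { break };
            self.free_slots -= 1;
            // The timeout would start here, not at queue time.
            println!("dispatching {}", job.name);
        }
    }
}

fn main() {
    let mut queue = JobQueue::new();
    queue.push(Job { name: "A", unmet_deps: 0 });
    queue.push(Job { name: "B", unmet_deps: 1 }); // dep not signaled yet
    let mut hw = Dispatcher { free_slots: 2 };
    hw.drain(&mut queue); // dispatches A, leaves B queued
}

With the push model you'd invert who calls whom, but the handover can
stay a move either way.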
> Instead, I think the new Jobqueue should always own and always
> dispatch jobs directly and provide some "control API" to be instructed
> by an external component (orchestrator) on top of it when and to which
> ring to dispatch jobs.

Feels to me like we're getting back to a model where the JobQueue needs
to know about the upper layer in charge of the scheduling. I mean, it
can work, but you're adding some complexity back to the JobQueue, which
I was expecting to be a simple FIFO with dep-tracking logic.

For instance, I'd be curious to know which component is in charge of
the timeout in your orchestrator-based solution. In Philipp's slides it
seemed that the timeout was dealt with at the JobQueue level, but that
wouldn't work for us, because when we push a job to the ringbuf in
panthor, the group this job is queued to might not be active yet. At
the moment we have hacks to pause/resume the drm_sched timers [1], but
this is racy, so I'm really hoping that the new design will let us
control the timeout at the proper level.

> The group scheduling logic you need for some Mali GPUs can either be
> implemented by hooks into this orchestrator or by a separate component
> that attaches to the same control API of the Jobqueue.

I have a hard time seeing how it can fully integrate into this
orchestrator model. We can hook ourselves into JobQueue::run_job() and
schedule the group for execution when we queue a job to the ringbuf,
but the group scheduler would still be something on the side. This is
not a big deal, as long as the group scheduler is in charge of the
timeout handling.
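To make that last point concrete, here's the shape of the thing I'm
after: the timeout state lives with the group scheduler and only ticks
while the group is actually scheduled on the FW. Toy userspace Rust
again, all names invented:

use std::time::{Duration, Instant};

/// Per-group timeout that only ticks while the group is scheduled.
struct GroupTimeout {
    budget: Duration,               // run time the job may consume
    consumed: Duration,             // run time consumed so far
    running_since: Option<Instant>, // Some(..) while the group runs
}

impl GroupTimeout {
    fn new(budget: Duration) -> Self {
        Self {
            budget,
            consumed: Duration::ZERO,
            running_since: None,
        }
    }

    /// Called by the group scheduler when the group gets a FW slot.
    fn on_group_scheduled(&mut self) {
        self.running_since = Some(Instant::now());
    }

    /// Called by the group scheduler when the group is evicted: the
    /// timer is paused, not cancelled.
    fn on_group_evicted(&mut self) {
        if let Some(start) = self.running_since.take() {
            self.consumed += start.elapsed();
        }
    }

    fn expired(&self) -> bool {
        let running = self
            .running_since
            .map(|s| s.elapsed())
            .unwrap_or(Duration::ZERO);
        self.consumed + running > self.budget
    }
}

fn main() {
    let mut t = GroupTimeout::new(Duration::from_millis(100));
    // Job pushed to the ringbuf here: nothing ticks yet, because the
    // group hasn't been scheduled.
    assert!(!t.expired());
    t.on_group_scheduled();
    std::thread::sleep(Duration::from_millis(10));
    t.on_group_evicted(); // preempted: ~10ms consumed, timer paused
    std::thread::sleep(Duration::from_millis(200)); // doesn't count
    assert!(!t.expired());
}

A job sitting in the ringbuf of a group that never gets a FW slot then
simply can't expire, which is what the pause/resume hacks in [1] try to
approximate.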
[1] https://lore-kernel.gnuweeb.org/dri-devel/CAPj87rP=HEfPDX8dDM_-BptLmt054x+WHZdCBZOtdMX=x4v...@mail.gmail.com/T/