On Tue, 2025-07-08 at 14:02 +0100, Tvrtko Ursulin wrote:
> 
> 
> On 11/02/2025 11:14, Philipp Stanner wrote:
> > drm_sched_init() has a great many parameters and upcoming new
> > functionality for the scheduler might add even more. Generally, the
> > great number of parameters reduces readability and has already
> > caused
> > one missnaming, addressed in:
> > 
> > commit 6f1cacf4eba7 ("drm/nouveau: Improve variable name in
> > nouveau_sched_init()").
> > 
> > Introduce a new struct for the scheduler init parameters and port
> > all
> > users.
> > 
> > Signed-off-by: Philipp Stanner <pha...@kernel.org>
> > Reviewed-by: Liviu Dudau <liviu.du...@arm.com>
> > Acked-by: Matthew Brost <matthew.br...@intel.com> # for Xe
> > Reviewed-by: Boris Brezillon <boris.brezil...@collabora.com> # for
> > Panfrost and Panthor
> > Reviewed-by: Christian Gmeiner <cgmei...@igalia.com> # for Etnaviv
> > Reviewed-by: Frank Binns <frank.bi...@imgtec.com> # for Imagination
> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursu...@igalia.com> # for Sched
> > Reviewed-by: Maíra Canal <mca...@igalia.com> # for v3d
> > ---
> > Changes in v4:
> >    - Add forgotten driver accel/amdxdna. (Me)
> >    - Rephrase the "init to NULL" comments. (Tvrtko)
> >    - Apply RBs by Tvrtko and Maira.
> >    - Terminate the last struct members with a comma, so that future
> >      fields can be added with a minimal patch diff. (Me)
> > 
> > Changes in v3:
> >    - Various formatting requirements.
> > 
> > Changes in v2:
> >    - Point out that the hang-limit is deprecated. (Christian)
> >    - Initialize the structs to 0 at declaration. (Planet Earth)
> >    - Don't set stuff explicitly to 0 / NULL. (Tvrtko)
> >    - Make the structs const where possible. (Boris)
> >    - v3d: Use just 1, universal, function for sched-init. (Maíra)
> > ---
> 
> 8><
> 
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
> > b/drivers/gpu/drm/panfrost/panfrost_job.c
> > index 9b8e82fb8bc4..5657106c2f7d 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> > @@ -836,8 +836,16 @@ static irqreturn_t
> > panfrost_job_irq_handler(int irq, void *data)
> >   
> >   int panfrost_job_init(struct panfrost_device *pfdev)
> >   {
> > +   struct drm_sched_init_args args = {
> > +           .ops = &panfrost_sched_ops,
> > +           .num_rqs = DRM_SCHED_PRIORITY_COUNT,
> > +           .credit_limit = 2,
> > +           .timeout = msecs_to_jiffies(JOB_TIMEOUT_MS),
> > +           .timeout_wq = pfdev->reset.wq,
> 
> ^^^
> 
> > +           .name = "pan_js",
> > +           .dev = pfdev->dev,
> > +   };
> >     struct panfrost_job_slot *js;
> > -   unsigned int nentries = 2;
> >     int ret, j;
> >   
> >     /* All GPUs have two entries per queue, but without
> > jobchain
> > @@ -845,7 +853,7 @@ int panfrost_job_init(struct panfrost_device
> > *pfdev)
> >      * so let's just advertise one entry in that case.
> >      */
> >     if (!panfrost_has_hw_feature(pfdev,
> > HW_FEATURE_JOBCHAIN_DISAMBIGUATION))
> > -           nentries = 1;
> > +           args.credit_limit = 1;
> >   
> >     pfdev->js = js = devm_kzalloc(pfdev->dev, sizeof(*js),
> > GFP_KERNEL);
> >     if (!js)
> 
> Stumbled on this while looking at drm_sched_init() workqueue usage.
> 
> I think this patch might need a fixup. Because somewhere around here
> in 
> the code there is this:
> 
>       pfdev->reset.wq = alloc_ordered_workqueue("panfrost-reset",
> 0);
>       if (!pfdev->reset.wq)
>               return -ENOMEM;
> 
> Which means that after the patch panfrost is using system_wq for the 
> timeout handler instead the one it creates.

Ouch yes, that's definitely a very subtle bug. AFAICS it comes to be by
pfdev being initialized to 0.

Let me provide a fix..

P.

> 
> > @@ -875,13 +883,7 @@ int panfrost_job_init(struct panfrost_device
> > *pfdev)
> >     for (j = 0; j < NUM_JOB_SLOTS; j++) {
> >             js->queue[j].fence_context =
> > dma_fence_context_alloc(1);
> >   
> > -           ret = drm_sched_init(&js->queue[j].sched,
> > -                                &panfrost_sched_ops, NULL,
> > -                                DRM_SCHED_PRIORITY_COUNT,
> > -                                nentries, 0,
> > -                               
> > msecs_to_jiffies(JOB_TIMEOUT_MS),
> > -                                pfdev->reset.wq,
> > -                                NULL, "pan_js", pfdev->dev);
> > +           ret = drm_sched_init(&js->queue[j].sched, &args);
> 
> ^^^
> 
> >             if (ret) {
> >                     dev_err(pfdev->dev, "Failed to create
> > scheduler: %d.", ret);
> >                     goto err_sched;
> 
> Regards,
> 
> Tvrtko
> 

Reply via email to