On 10/09/2025 16:52, Boris Brezillon wrote:
> On Wed, 10 Sep 2025 16:42:32 +0100
> Steven Price <steven.pr...@arm.com> wrote:
> 
>>> +int panfrost_jm_ctx_create(struct drm_file *file,
>>> +                      struct drm_panfrost_jm_ctx_create *args)
>>> +{
>>> +   struct panfrost_file_priv *priv = file->driver_priv;
>>> +   struct panfrost_device *pfdev = priv->pfdev;
>>> +   enum drm_sched_priority sched_prio;
>>> +   struct panfrost_jm_ctx *jm_ctx;
>>> +
>>> +   int ret;
>>> +
>>> +   jm_ctx = kzalloc(sizeof(*jm_ctx), GFP_KERNEL);
>>> +   if (!jm_ctx)
>>> +           return -ENOMEM;
>>> +
>>> +   kref_init(&jm_ctx->refcnt);
>>> +
>>> +   /* Same priority for all JS within a single context */
>>> +   jm_ctx->config = JS_CONFIG_THREAD_PRI(args->priority);
>>> +
>>> +   ret = jm_ctx_prio_to_drm_sched_prio(file, args->priority, &sched_prio);
>>> +   if (ret)
>>> +           goto err_put_jm_ctx;
>>> +
>>> +   for (u32 i = 0; i < NUM_JOB_SLOTS - 1; i++) {
>>> +           struct drm_gpu_scheduler *sched = &pfdev->js->queue[i].sched;
>>> +           struct panfrost_js_ctx *js_ctx = &jm_ctx->slots[i];
>>> +
>>> +           ret = drm_sched_entity_init(&js_ctx->sched_entity, sched_prio,
>>> +                                       &sched, 1, NULL);
>>> +           if (ret)
>>> +                   goto err_put_jm_ctx;
>>> +
>>> +           js_ctx->enabled = true;
>>> +   }
>>> +
>>> +   ret = xa_alloc(&priv->jm_ctxs, &args->handle, jm_ctx,
>>> +                  XA_LIMIT(0, MAX_JM_CTX_PER_FILE), GFP_KERNEL);
>>> +   if (ret)
>>> +           goto err_put_jm_ctx;  
>>
>> On error here we just jump down and call panfrost_jm_ctx_put() which
>> will free jm_ctx but won't destroy any of the drm_sched_entities. There
>> seems to be something a bit off with the lifetime management here.
>>
>> Should panfrost_jm_ctx_release() be responsible for tearing down the
>> context, and panfrost_jm_ctx_destroy() be nothing more than dropping the
>> reference?
> 
> The idea was to kill/cancel any pending jobs as soon as userspace
> releases the context, like we were doing previously when the FD was
> closed. If we defer this ctx teardown to the release() function, we're
> basically waiting for all jobs to complete, which:
> 
> 1. doesn't encourage userspace to have proper control over the contexts
>    lifetime
> 2. might use GPU/mem resources to execute jobs no one cares about
>    anymore

Ah, good point - yes killing the jobs in panfrost_jm_ctx_destroy() makes
sense. But we still need to ensure the clean-up happens in the other
paths ;)

So panfrost_jm_ctx_destroy() should keep the job-killing part, but the
drm scheduler entity cleanup should be moved into
panfrost_jm_ctx_release() so the error paths pick it up too.
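
Something along these lines, as a userspace mock-up rather than actual
driver code. The mock_* names and the two counters are made up for
illustration, a plain int stands in for the kref, and the "killed" flag
only models cancelling pending jobs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NUM_SLOTS 2

/* Hypothetical stand-in for a drm_sched_entity. */
struct mock_entity {
	bool initialised;
	bool killed;
};

/* Hypothetical stand-in for panfrost_jm_ctx. */
struct mock_ctx {
	int refcnt;
	struct mock_entity slots[NUM_SLOTS];
};

/* Counters so the behaviour can be observed from outside. */
static int entities_torn_down;
static int ctx_freed;

/* Last reference dropped: tear down whatever was initialised, then free. */
static void mock_ctx_release(struct mock_ctx *ctx)
{
	for (int i = 0; i < NUM_SLOTS; i++) {
		if (ctx->slots[i].initialised) {
			ctx->slots[i].initialised = false;
			entities_torn_down++;
		}
	}
	free(ctx);
	ctx_freed++;
}

static void mock_ctx_put(struct mock_ctx *ctx)
{
	assert(ctx->refcnt > 0);
	if (--ctx->refcnt == 0)
		mock_ctx_release(ctx);
}

/* Userspace drops the context: kill pending work now, free later. */
static void mock_ctx_destroy(struct mock_ctx *ctx)
{
	for (int i = 0; i < NUM_SLOTS; i++) {
		if (ctx->slots[i].initialised)
			ctx->slots[i].killed = true;
	}
	mock_ctx_put(ctx);
}

/*
 * fail_at < 0: succeed. 0 <= fail_at < NUM_SLOTS: fail while setting up
 * that slot. fail_at == NUM_SLOTS: fail after all slots are set up
 * (models the handle allocation failing).
 */
static struct mock_ctx *mock_ctx_create(int fail_at)
{
	struct mock_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;

	ctx->refcnt = 1;

	for (int i = 0; i < NUM_SLOTS; i++) {
		if (i == fail_at)
			goto err_put;
		ctx->slots[i].initialised = true;
	}

	if (fail_at == NUM_SLOTS)
		goto err_put;

	return ctx;

err_put:
	/* The put alone is enough: release() cleans up partial state. */
	mock_ctx_put(ctx);
	return NULL;
}
```

With that split, every error path in create only needs the put, and a
job still holding a reference after destroy() keeps the context alive
until its own put.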

Thanks,
Steve
