On 10/09/2025 16:52, Boris Brezillon wrote: > On Wed, 10 Sep 2025 16:42:32 +0100 > Steven Price <steven.pr...@arm.com> wrote: > >>> +int panfrost_jm_ctx_create(struct drm_file *file, >>> + struct drm_panfrost_jm_ctx_create *args) >>> +{ >>> + struct panfrost_file_priv *priv = file->driver_priv; >>> + struct panfrost_device *pfdev = priv->pfdev; >>> + enum drm_sched_priority sched_prio; >>> + struct panfrost_jm_ctx *jm_ctx; >>> + >>> + int ret; >>> + >>> + jm_ctx = kzalloc(sizeof(*jm_ctx), GFP_KERNEL); >>> + if (!jm_ctx) >>> + return -ENOMEM; >>> + >>> + kref_init(&jm_ctx->refcnt); >>> + >>> + /* Same priority for all JS within a single context */ >>> + jm_ctx->config = JS_CONFIG_THREAD_PRI(args->priority); >>> + >>> + ret = jm_ctx_prio_to_drm_sched_prio(file, args->priority, &sched_prio); >>> + if (ret) >>> + goto err_put_jm_ctx; >>> + >>> + for (u32 i = 0; i < NUM_JOB_SLOTS - 1; i++) { >>> + struct drm_gpu_scheduler *sched = &pfdev->js->queue[i].sched; >>> + struct panfrost_js_ctx *js_ctx = &jm_ctx->slots[i]; >>> + >>> + ret = drm_sched_entity_init(&js_ctx->sched_entity, sched_prio, >>> + &sched, 1, NULL); >>> + if (ret) >>> + goto err_put_jm_ctx; >>> + >>> + js_ctx->enabled = true; >>> + } >>> + >>> + ret = xa_alloc(&priv->jm_ctxs, &args->handle, jm_ctx, >>> + XA_LIMIT(0, MAX_JM_CTX_PER_FILE), GFP_KERNEL); >>> + if (ret) >>> + goto err_put_jm_ctx; >> >> On error here we just jump down and call panfrost_jm_ctx_put() which >> will free jm_ctx but won't destroy any of the drm_sched_entities. There >> seems to be something a bit off with the lifetime management here. >> >> Should panfrost_jm_ctx_release() be responsible for tearing down the >> context, and panfrost_jm_ctx_destroy() be nothing more than dropping the >> reference? > > The idea was to kill/cancel any pending jobs as soon as userspace > releases the context, like we were doing previously when the FD was > closed. If we defer this ctx teardown to the release() function, we're > basically waiting for all jobs to complete, which: > > 1. doesn't encourage userspace to have proper control over the contexts > lifetime > 2. might use GPU/mem resources to execute jobs no one cares about > anymore
Ah, good point - yes killing the jobs in panfrost_jm_ctx_destroy() makes sense. But we still need to ensure the clean-up happens in the other paths ;) So panfrost_jm_ctx_destroy() should keep the killing jobs part, butthe drm scheduler entity cleanup should be moved. Thanks, Steve