On 07/01/2026 14:46, Tvrtko Ursulin wrote:

On 07/01/2026 09:01, Christian König wrote:
On 12/19/25 14:41, Tvrtko Ursulin wrote:
Struct amdgpu_ctx contains two copies of the pointer to the context
manager. Remove one.

Signed-off-by: Tvrtko Ursulin <[email protected]>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 3 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h | 1 -
  2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/ drm/amd/amdgpu/amdgpu_ctx.c
index afedea02188d..41c05358d86d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -232,7 +232,7 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, u32 hw_ip,
      } else {
          struct amdgpu_fpriv *fpriv;
-        fpriv = container_of(ctx->ctx_mgr, struct amdgpu_fpriv, ctx_mgr);
+        fpriv = container_of(ctx->mgr, struct amdgpu_fpriv, ctx_mgr);
          r = amdgpu_xcp_select_scheds(adev, hw_ip, hw_prio, fpriv,
                          &num_scheds, &scheds);

Well that code is utterly nonsense to begin with. amdgpu_xcp_select_scheds() needs the xcp id to select from and not fpriv.

Can you look into re-structuring this so that we don't need that cast?

I had a look and so far only cleanup it up visually a bit so there is fewer long array subscript dereferences and confusion between sel_xcp_id and priv->xcp_id.

But on a more fundamental level, since it needs to write to fpriv, the caller will need to have it one way or the other, no?

And then I noticed not only the atomic_read/inc usage is dodgy, but the fpriv->xcp_id assignment itself is racy. Two threads submitting to the same new entity appears can end up with a refcount imbalance and probably worse.

Shall I replace the ref_cnt atomic with a mutex and protect the whole selection?

Or maybe there is no race?

fpriv->xcp_id is first assigned in amdgpu_driver_open_kms() and there it looks it can mostly fail or succeed. I say mostly because the one silent failure path (not failing the device open) I see if xcp->ddev would be NULL. I am not sure if/when that can happen? If it can happen and that is the reason ctx init needs to retry the xcp_id selection? In which case it is racy.

Regards,

Tvrtko


Regards,

Tvrtko

          if (r)
@@ -349,7 +349,6 @@ static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, int32_t priority,
      else
          ctx->stable_pstate = current_stable_pstate;
-    ctx->ctx_mgr = &(fpriv->ctx_mgr);
      return 0;
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/ drm/amd/amdgpu/amdgpu_ctx.h
index aed758d0acaa..cf8d700a22fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -56,7 +56,6 @@ struct amdgpu_ctx {
      unsigned long            ras_counter_ce;
      unsigned long            ras_counter_ue;
      struct amdgpu_ctx_mgr        *mgr;
-    struct amdgpu_ctx_mgr        *ctx_mgr;
      struct amdgpu_ctx_entity    *entities[AMDGPU_HW_IP_NUM] [AMDGPU_MAX_ENTITY_NUM];
  };



Reply via email to