amdgpu: Remove duplicate struct member

Tvrtko Ursulin Thu, 08 Jan 2026 01:31:53 -0800


On 08/01/2026 09:15, Christian König wrote:

On 1/8/26 10:05, Tvrtko Ursulin wrote:


On 07/01/2026 14:46, Tvrtko Ursulin wrote:


On 07/01/2026 09:01, Christian König wrote:

On 12/19/25 14:41, Tvrtko Ursulin wrote:

Struct amdgpu_ctx contains two copies of the pointer to the context
manager. Remove one.

Signed-off-by: Tvrtko Ursulin <[email protected]>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 3 +--
   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h | 1 -
   2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/ 
drm/amd/amdgpu/amdgpu_ctx.c
index afedea02188d..41c05358d86d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -232,7 +232,7 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, 
u32 hw_ip,
       } else {
           struct amdgpu_fpriv *fpriv;
-        fpriv = container_of(ctx->ctx_mgr, struct amdgpu_fpriv, ctx_mgr);
+        fpriv = container_of(ctx->mgr, struct amdgpu_fpriv, ctx_mgr);
           r = amdgpu_xcp_select_scheds(adev, hw_ip, hw_prio, fpriv,
                           &num_scheds, &scheds);


Well that code is utterly nonsense to begin with. amdgpu_xcp_select_scheds() 
needs the xcp id to select from and not fpriv.

Can you look into re-structuring this so that we don't need that cast?


I had a look and so far only cleanup it up visually a bit so there is fewer long 
array subscript dereferences and confusion between sel_xcp_id and priv->xcp_id.

But on a more fundamental level, since it needs to write to fpriv, the caller 
will need to have it one way or the other, no?

And then I noticed not only the atomic_read/inc usage is dodgy, but the 
fpriv->xcp_id assignment itself is racy. Two threads submitting to the same new 
entity appears can end up with a refcount imbalance and probably worse.

Shall I replace the ref_cnt atomic with a mutex and protect the whole selection?


Or maybe there is no race?

fpriv->xcp_id is first assigned in amdgpu_driver_open_kms() and there it looks it 
can mostly fail or succeed. I say mostly because the one silent failure path (not 
failing the device open) I see if xcp->ddev would be NULL. I am not sure if/when 
that can happen? If it can happen and that is the reason ctx init needs to retry the 
xcp_id selection? In which case it is racy.


xcp_id selection can only happen in amdgpu_driver_open_kms(), anytime later and 
you have completely messed up page tables.

So no xcp code should ever overwrite fpriv->xcp_id. If we have stuff like that 
then that is clearly a bug as well.

Thanks for pointing that out,

Only if post open fpriv->xcp_id ends up AMDGPU_XCP_NO_PARTITION duefinding an xcp with xcp->ddev == NULL. Again, I am not sure if/when thatcan happen. But if it does, next ctx entity init will retry viaamdgpu_xcp_select_scheds.

Should I keep this patch as is for now and re-sent the series with theother updates? IMO it is still worth dropping the duplicate member evenif it is followed up by more work in this area.


Regards,

Tvrtko

           if (r)
@@ -349,7 +349,6 @@ static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, 
int32_t priority,
       else
           ctx->stable_pstate = current_stable_pstate;
-    ctx->ctx_mgr = &(fpriv->ctx_mgr);
       return 0;
   }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/ 
drm/amd/amdgpu/amdgpu_ctx.h
index aed758d0acaa..cf8d700a22fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -56,7 +56,6 @@ struct amdgpu_ctx {
       unsigned long            ras_counter_ce;
       unsigned long            ras_counter_ue;
       struct amdgpu_ctx_mgr        *mgr;
-    struct amdgpu_ctx_mgr        *ctx_mgr;
       struct amdgpu_ctx_entity    *entities[AMDGPU_HW_IP_NUM] 
[AMDGPU_MAX_ENTITY_NUM];
   };

Re: [PATCH v2 06/12] drm/amdgpu: Remove duplicate struct member

Reply via email to