On 12.03.25 at 16:10, SRINIVASAN SHANMUGAM wrote:
> On 3/7/2025 7:18 PM, Christian König wrote:
>> That was quite troublesome for gang submit. Completely drop this
>> approach and enforce the isolation separately.
>>
>> Signed-off-by: Christian König <christian.koe...@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c |  9 +--------
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 11 +++--------
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h |  3 +--
>>  4 files changed, 6 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 2ce0c6a152a6..4375e5019418 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -1111,7 +1111,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
>>  			struct drm_gpu_scheduler *sched = entity->rq->sched;
>>  			struct amdgpu_ring *ring = to_amdgpu_ring(sched);
>>
>> -			if (amdgpu_vmid_uses_reserved(adev, vm, ring->vm_hub))
>> +			if (amdgpu_vmid_uses_reserved(vm, ring->vm_hub))
>>  				return -EINVAL;
>>  		}
>>  	}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> index a194bf3347cb..9e5f6b11d923 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> @@ -1665,15 +1665,8 @@ static ssize_t amdgpu_gfx_set_enforce_isolation(struct device *dev,
>>  	}
>>
>>  	mutex_lock(&adev->enforce_isolation_mutex);
>> -	for (i = 0; i < num_partitions; i++) {
>> -		if (adev->enforce_isolation[i] && !partition_values[i])
>> -			/* Going from enabled to disabled */
>> -			amdgpu_vmid_free_reserved(adev, AMDGPU_GFXHUB(i));
>> -		else if (!adev->enforce_isolation[i] && partition_values[i])
>> -			/* Going from disabled to enabled */
>> -			amdgpu_vmid_alloc_reserved(adev, AMDGPU_GFXHUB(i));
>> +	for (i = 0; i < num_partitions; i++)
>>  		adev->enforce_isolation[i] = partition_values[i];
>> -	}
>>  	mutex_unlock(&adev->enforce_isolation_mutex);
>>
>>  	amdgpu_mes_update_enforce_isolation(adev);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
>> index 92ab821afc06..4c4e087230ac 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
>> @@ -411,7 +411,7 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
>>  	if (r || !idle)
>>  		goto error;
>>
>> -	if (amdgpu_vmid_uses_reserved(adev, vm, vmhub)) {
>> +	if (amdgpu_vmid_uses_reserved(vm, vmhub)) {
>>  		r = amdgpu_vmid_grab_reserved(vm, ring, job, &id, fence);
>>  		if (r || !id)
>>  			goto error;
>> @@ -464,19 +464,14 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
>>
>>  /*
>>   * amdgpu_vmid_uses_reserved - check if a VM will use a reserved VMID
>> - * @adev: amdgpu_device pointer
>>   * @vm: the VM to check
>>   * @vmhub: the VMHUB which will be used
>>   *
>>   * Returns: True if the VM will use a reserved VMID.
>>   */
>> -bool amdgpu_vmid_uses_reserved(struct amdgpu_device *adev,
>> -			       struct amdgpu_vm *vm, unsigned int vmhub)
>> +bool amdgpu_vmid_uses_reserved(struct amdgpu_vm *vm, unsigned int vmhub)
>>  {
>> -	return vm->reserved_vmid[vmhub] ||
>> -		(adev->enforce_isolation[(vm->root.bo->xcp_id != AMDGPU_XCP_NO_PARTITION) ?
>> -		 vm->root.bo->xcp_id : 0] &&
>> -		 AMDGPU_IS_GFXHUB(vmhub));
>> +	return vm->reserved_vmid[vmhub];
>>  }
>>
>>  int amdgpu_vmid_alloc_reserved(struct amdgpu_device *adev,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
>> index 4012fb2dd08a..240fa6751260 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
>> @@ -78,8 +78,7 @@ void amdgpu_pasid_free_delayed(struct dma_resv *resv,
>>
>>  bool amdgpu_vmid_had_gpu_reset(struct amdgpu_device *adev,
>>  			       struct amdgpu_vmid *id);
>> -bool amdgpu_vmid_uses_reserved(struct amdgpu_device *adev,
>> -			       struct amdgpu_vm *vm, unsigned int vmhub);
>> +bool amdgpu_vmid_uses_reserved(struct amdgpu_vm *vm, unsigned int vmhub);
>>  int amdgpu_vmid_alloc_reserved(struct amdgpu_device *adev,
>>  			       unsigned vmhub);
>>  void amdgpu_vmid_free_reserved(struct amdgpu_device *adev,

> Here we are trying to remove process isolation based on VMIDs. This is
> because of the constraint that we have a limited number of VMIDs that can
> be assigned to a limited number of clients. Now we have moved one level
> above: rather than enforcing isolation on the clients' VMIDs, we apply
> enforce_isolation directly to the clients, not to their VMIDs?
Exactly that, yes. We basically separate the two features.

> So, based on this understanding: we have removed the VMID-based
> enforce_isolation in this patch 6, and we now do process isolation
> directly on the client, based on
> https://patchwork.freedesktop.org/patch/638081/?series=145031&rev=1,
> i.e. patch 4 "[4/8] drm/amdgpu: rework how isolation is enforced v2".
> There, process isolation is applied directly per client: based on who
> the first client or owner is, and only after all jobs of that first
> client or owner have completed do we switch to the second client or
> next owner? Is my understanding right, please? I.e. based on the
> amdgpu_device_enforce_isolation() function.

Yes, exactly that.

Regards,
Christian.
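
For readers following the thread, below is a minimal sketch of the per-owner
gating idea described above, under the assumption of a single tracked owner
per isolation domain. All names here (struct isolation_state,
isolation_try_start(), isolation_job_done()) are hypothetical and not taken
from the amdgpu sources; the real amdgpu_device_enforce_isolation() drives the
owner switch with fences and the GPU scheduler rather than a plain counter.

/*
 * Hypothetical, simplified model of per-client isolation gating.
 *
 * Idea: jobs from different owners must not run concurrently on the
 * same engine. A job from a new owner may only start once all
 * outstanding jobs of the previous owner have completed.
 */
#include <stdbool.h>
#include <stdio.h>

struct isolation_state {
	const void *owner;	/* client currently allowed to run */
	unsigned int active;	/* jobs of that owner still in flight */
};

/* Returns true if the job may start now, false if it must wait. */
static bool isolation_try_start(struct isolation_state *iso, const void *owner)
{
	if (iso->owner && iso->owner != owner && iso->active > 0)
		return false;	/* previous owner still has work in flight */

	iso->owner = owner;	/* switch to (or keep) this owner */
	iso->active++;
	return true;
}

static void isolation_job_done(struct isolation_state *iso)
{
	if (iso->active > 0)
		iso->active--;
	/* once active reaches 0, the next owner may take over */
}

int main(void)
{
	struct isolation_state iso = { 0 };
	int client_a, client_b;

	printf("A starts: %d\n", isolation_try_start(&iso, &client_a)); /* 1 */
	printf("B starts: %d\n", isolation_try_start(&iso, &client_b)); /* 0: A busy */
	isolation_job_done(&iso);
	printf("B starts: %d\n", isolation_try_start(&iso, &client_b)); /* 1 */
	return 0;
}

The gating decision has the same shape as in the discussion: the first owner
runs to completion before the next owner is admitted, independent of how many
VMIDs are available.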