amdgpu: Optimize DMABuf attachment with XGMI

Felix Kuehling Thu, 17 Apr 2025 08:54:28 -0700


On 2025-04-16 7:49, Christian König wrote:
> Am 16.04.25 um 06:45 schrieb Felix Kuehling:
>> When peer memory is accessed through XGMI, it does not need to be visible
>> in the BAR and there is no need for SG-tables or DMA mappings.
>>
>> Signed-off-by: Felix Kuehling <felix.kuehl...@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 16 +++++++++++++++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>> index a72c17230fd37..d9284bee337ba 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>> @@ -144,6 +144,11 @@ static void amdgpu_dma_buf_unpin(struct 
>> dma_buf_attachment *attach)
>>      amdgpu_bo_unpin(bo);
>>  }
>>  
>> +/* Dummy SG table for XGMI attachments. It should never be dereferenced. If 
>> it
>> + * is, it will be caught as a kernel oops.
>> + */
>> +#define SGT_DUMMY ((struct sg_table *)1)
>> +
> Mhm, I'm pretty sure that will blow up ugly sooner or later. On the other 
> hand I see what you're trying to do.


Do you have any more specific details? If it blow up, it means someone is using 
the SG table. And I think that's always wrong. Using the SG table means you're 
using the PCIe P2P path. That is slower, has different coherence behaviour, and 
it may not work at all. So I want that to blow up if it happens.


>
> But if I'm not completely mistaken it isn't necessary in the first place, see 
> below.
>
>
>>  /**
>>   * amdgpu_dma_buf_map - &dma_buf_ops.map_dma_buf implementation
>>   * @attach: DMA-buf attachment
>> @@ -160,9 +165,11 @@ static void amdgpu_dma_buf_unpin(struct 
>> dma_buf_attachment *attach)
>>  static struct sg_table *amdgpu_dma_buf_map(struct dma_buf_attachment 
>> *attach,
>>                                         enum dma_data_direction dir)
>>  {
>> +    struct amdgpu_device *attach_adev = dma_buf_attach_adev(attach);
>>      struct dma_buf *dma_buf = attach->dmabuf;
>>      struct drm_gem_object *obj = dma_buf->priv;
>>      struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>> +    bool is_xgmi = amdgpu_dmabuf_is_xgmi_accessible(attach_adev, bo);
>>      struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>>      struct sg_table *sgt;
>>      long r;
>> @@ -174,7 +181,8 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
>> dma_buf_attachment *attach,
>>  
>>              if (bo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM &&
>>                  attach->peer2peer) {
>> -                    bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
>> +                    if (!is_xgmi)
>> +                            bo->flags |= 
>> AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
>>                      domains |= AMDGPU_GEM_DOMAIN_VRAM;
>>              }
>>              amdgpu_bo_placement_from_domain(bo, domains);
>> @@ -197,6 +205,9 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
>> dma_buf_attachment *attach,
>>              break;
>>  
>>      case TTM_PL_VRAM:
>> +            if (is_xgmi)
>> +                    return SGT_DUMMY;
>> +
>
> See for XGMI imported BOs we don't create a TT object, so we also never call 
> dma_buf_map_attachment() either.

That's true for the exported BO, but is that also true for the attachment 
object? amdgpu_dma_buf_create_obj creates the attachment object as an SG BO in 
the CPU domain and it later gets validated in the GTT domain. Wouldn't that 
create a TT object? I don't see any XGMI-special case that would prevent that.

Regards,
  Felix


>
> So I think we could just return an error here instead.
>
> Regards,
> Christian.
>
>>              r = amdgpu_vram_mgr_alloc_sgt(adev, bo->tbo.resource, 0,
>>                                            bo->tbo.base.size, attach->dev,
>>                                            dir, &sgt);
>> @@ -228,6 +239,9 @@ static void amdgpu_dma_buf_unmap(struct 
>> dma_buf_attachment *attach,
>>                               struct sg_table *sgt,
>>                               enum dma_data_direction dir)
>>  {
>> +    if (sgt == SGT_DUMMY)
>> +            return;
>> +
>>      if (sg_page(sgt->sgl)) {
>>              dma_unmap_sgtable(attach->dev, sgt, dir, 0);
>>              sg_free_table(sgt);

Re: [PATCH 3/3] drm/amdgpu: Optimize DMABuf attachment with XGMI

Reply via email to