On 2025-04-16 7:49, Christian König wrote: > Am 16.04.25 um 06:45 schrieb Felix Kuehling: >> When peer memory is accessed through XGMI, it does not need to be visible >> in the BAR and there is no need for SG-tables or DMA mappings. >> >> Signed-off-by: Felix Kuehling <felix.kuehl...@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 16 +++++++++++++++- >> 1 file changed, 15 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c >> index a72c17230fd37..d9284bee337ba 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c >> @@ -144,6 +144,11 @@ static void amdgpu_dma_buf_unpin(struct >> dma_buf_attachment *attach) >> amdgpu_bo_unpin(bo); >> } >> >> +/* Dummy SG table for XGMI attachments. It should never be dereferenced. If >> it >> + * is, it will be caught as a kernel oops. >> + */ >> +#define SGT_DUMMY ((struct sg_table *)1) >> + > Mhm, I'm pretty sure that will blow up ugly sooner or later. On the other > hand I see what you're trying to do.
Do you have any more specific details? If it blow up, it means someone is using the SG table. And I think that's always wrong. Using the SG table means you're using the PCIe P2P path. That is slower, has different coherence behaviour, and it may not work at all. So I want that to blow up if it happens. > > But if I'm not completely mistaken it isn't necessary in the first place, see > below. > > >> /** >> * amdgpu_dma_buf_map - &dma_buf_ops.map_dma_buf implementation >> * @attach: DMA-buf attachment >> @@ -160,9 +165,11 @@ static void amdgpu_dma_buf_unpin(struct >> dma_buf_attachment *attach) >> static struct sg_table *amdgpu_dma_buf_map(struct dma_buf_attachment >> *attach, >> enum dma_data_direction dir) >> { >> + struct amdgpu_device *attach_adev = dma_buf_attach_adev(attach); >> struct dma_buf *dma_buf = attach->dmabuf; >> struct drm_gem_object *obj = dma_buf->priv; >> struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj); >> + bool is_xgmi = amdgpu_dmabuf_is_xgmi_accessible(attach_adev, bo); >> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); >> struct sg_table *sgt; >> long r; >> @@ -174,7 +181,8 @@ static struct sg_table *amdgpu_dma_buf_map(struct >> dma_buf_attachment *attach, >> >> if (bo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM && >> attach->peer2peer) { >> - bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; >> + if (!is_xgmi) >> + bo->flags |= >> AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; >> domains |= AMDGPU_GEM_DOMAIN_VRAM; >> } >> amdgpu_bo_placement_from_domain(bo, domains); >> @@ -197,6 +205,9 @@ static struct sg_table *amdgpu_dma_buf_map(struct >> dma_buf_attachment *attach, >> break; >> >> case TTM_PL_VRAM: >> + if (is_xgmi) >> + return SGT_DUMMY; >> + > > See for XGMI imported BOs we don't create a TT object, so we also never call > dma_buf_map_attachment() either. That's true for the exported BO, but is that also true for the attachment object? amdgpu_dma_buf_create_obj creates the attachment object as an SG BO in the CPU domain and it later gets validated in the GTT domain. Wouldn't that create a TT object? I don't see any XGMI-special case that would prevent that. Regards, Felix > > So I think we could just return an error here instead. > > Regards, > Christian. > >> r = amdgpu_vram_mgr_alloc_sgt(adev, bo->tbo.resource, 0, >> bo->tbo.base.size, attach->dev, >> dir, &sgt); >> @@ -228,6 +239,9 @@ static void amdgpu_dma_buf_unmap(struct >> dma_buf_attachment *attach, >> struct sg_table *sgt, >> enum dma_data_direction dir) >> { >> + if (sgt == SGT_DUMMY) >> + return; >> + >> if (sg_page(sgt->sgl)) { >> dma_unmap_sgtable(attach->dev, sgt, dir, 0); >> sg_free_table(sgt);