On 2025-04-17 8:57, Christian König wrote: > Am 17.04.25 um 15:33 schrieb Felix Kuehling: >> Pinning of VRAM is for peer devices that don't support dynamic attachment >> and move notifiers. But it requires that all such peer devices are able to >> access VRAM via PCIe P2P. Any device without P2P access requires migration >> to GTT, which fails if the memory is already pinned for another peer >> device. >> >> Sharing between GPUs should not require pinning in VRAM. However, if >> DMABUF_MOVE_NOTIFY is disabled in the kernel build, even DMABufs shared >> between GPUs must be pinned, which can lead to failures and functional >> regressions on systems where some peer GPUs are not P2P accessible. >> >> Disable VRAM pinning if move notifiers are disabled in the kernel build >> to fix regressions when sharing BOs between GPUs. >> >> v2: ensure domains is not 0, add GTT if necessary >> >> Signed-off-by: Felix Kuehling <felix.kuehl...@amd.com> >> Tested-by: Hao (Claire) Zhou <hao.z...@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 19 ++++++++++++++----- >> 1 file changed, 14 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c >> index 667080cc9ae1..3c2c32a252d4 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c >> @@ -81,17 +81,26 @@ static int amdgpu_dma_buf_pin(struct dma_buf_attachment >> *attach) >> >> dma_resv_assert_held(dmabuf->resv); >> >> - /* >> - * Try pinning into VRAM to allow P2P with RDMA NICs without ODP >> + /* Try pinning into VRAM to allow P2P with RDMA NICs without ODP >> * support if all attachments can do P2P. If any attachment can't do >> * P2P just pin into GTT instead. >> + * >> + * To avoid with conflicting pinnings between GPUs and RDMA when move >> + * notifiers are disabled, only allow pinning in VRAM when move >> + * notiers are enabled. >> */ >> - list_for_each_entry(attach, &dmabuf->attachments, node) >> - if (!attach->peer2peer) >> - domains &= ~AMDGPU_GEM_DOMAIN_VRAM; >> + if (!IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)) { >> + domains &= ~AMDGPU_GEM_DOMAIN_VRAM; >> + } else { >> + list_for_each_entry(attach, &dmabuf->attachments, node) >> + if (!attach->peer2peer) >> + domains &= ~AMDGPU_GEM_DOMAIN_VRAM; >> + } >> >> if (domains & AMDGPU_GEM_DOMAIN_VRAM) >> bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; >> + else if (!domains) >> + domains = AMDGPU_GEM_DOMAIN_GTT; > Please drop that. > > We should probably use allowed_domains instead of preferred_domains and > return an error if none of them work.
I sent out an updated patch with that. One concern, though: With discardable VRAM BOs, allowed_domains may not include GTT. I'm not sure if such BOs would ever be exported. Regards, Felix > > Regards, > Christian. > >> >> return amdgpu_bo_pin(bo, domains); >> }