Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-11 Thread Christian König
Arun and I have been working on a patch to split up AMDGPU_VA_RESERVED_SIZE into two separate defines. I assumed that Arun had landed this by now, but that doesn't seem to be the case. Going to take another look, but yeah the CSA area isn't large enough to fill the upper reserved area so moving
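The following minimal C sketch shows why separate defines for the bottom and top reservations can help. The names HYP_VA_RESERVED_BOTTOM, HYP_VA_RESERVED_TOP and HYP_CSA_SIZE and their values are hypothetical, not the definitions from Arun's patch. Placing the CSA at the start of the top reservation is also an assumption made only for the example. The sketch computes how much of the top reservation would sit unused if it had to match the bottom one while holding only the CSA.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names and values for illustration only. With one shared
 * define, both reservations must be the same size; separate defines let
 * each be sized for what actually lives there. */
#define HYP_VA_RESERVED_BOTTOM (2ULL << 20)   /* 2 MiB at the low end  */
#define HYP_VA_RESERVED_TOP    (2ULL << 20)   /* 2 MiB at the high end */
#define HYP_CSA_SIZE           (128ULL << 10) /* assumed CSA size: 128 KiB */

/* Bytes of the top reservation left over after placing the CSA in it. */
static uint64_t top_reserved_slack(uint64_t top_reserved, uint64_t csa_size)
{
    assert(csa_size <= top_reserved);
    return top_reserved - csa_size;
}

/* First address of the top reservation (assumed CSA location), given the
 * last address of the VA space (inclusive). */
static uint64_t csa_base(uint64_t va_last, uint64_t top_reserved)
{
    assert(top_reserved > 0 && top_reserved - 1 <= va_last);
    return va_last - top_reserved + 1;
}
```

With these assumed values the top reservation keeps almost 2 MiB of slack, which a separately sized top define could reclaim.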

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-11 Thread Felix Kuehling
[+Christian] I'm looking into virtual address reservations in amdgpu and what's reported by AMDGPU_INFO_DEV_INFO. So far I only found AMDGPU_VA_RESERVED_SIZE, which is reserved both at the start of the lower virtual address range and the end of the upper virtual address range. The reservatio
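A minimal C sketch of the layout described here follows. It assumes a 48-bit address space split into canonical lower and upper halves and a hypothetical 2 MiB reservation; HYP_RESERVED is an illustrative name and value, not the amdgpu define. One size is subtracted from the start of the lower half and from the end of the upper half. Bounds are inclusive so that the last upper-half address fits in a uint64_t.

```c
#include <assert.h>
#include <stdint.h>

#define VA_BITS      48            /* assumed GPU VA width */
#define HYP_RESERVED (2ULL << 20)  /* hypothetical reserved size: 2 MiB */

/* Usable VA bounds; all bounds are inclusive. */
struct va_layout {
    uint64_t low_first, low_last;    /* lower canonical half */
    uint64_t high_first, high_last;  /* upper canonical half */
};

/* Apply one reservation size to both the start of the lower half and the
 * end of the upper half. */
static struct va_layout compute_layout(uint64_t reserved)
{
    const uint64_t half = 1ULL << (VA_BITS - 1);  /* 0x0000800000000000 */
    struct va_layout l;

    assert(reserved < half);
    l.low_first  = reserved;               /* skip the bottom reservation */
    l.low_last   = half - 1;               /* last address below the VM hole */
    l.high_first = ~(half - 1);            /* 0xFFFF800000000000 */
    l.high_last  = UINT64_MAX - reserved;  /* skip the top reservation */
    return l;
}
```

Any other kernel-side mapping inside these bounds would not be visible from the four numbers alone, which is the gap the following messages run into.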

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-10 Thread Marek Olšák
It looks like this would cause failures even with regular 64-bit allocations because the virtual address range allocator in libdrm asks the kernel what ranges of addresses are free, and the kernel doesn't exclude the KFD allocation from that. Basically, no VM allocations can be done by the kernel
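The C sketch below makes the failure mode concrete by modeling a userspace allocator that only knows the range the kernel reported as free. It is a simple bump allocator, not libdrm's actual VA manager. The hidden mapping's address and size are hypothetical. If the kernel maps something inside the reported range without excluding it, nothing in the allocator can detect the overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bump allocator seeded only with the kernel-reported range;
 * it has no knowledge of mappings the kernel made on its own. */
struct bump_alloc {
    uint64_t next;  /* next candidate address */
    uint64_t last;  /* last usable address, inclusive */
    bool full;      /* set once an allocation ends exactly at `last` */
};

static bool bump_alloc_get(struct bump_alloc *a, uint64_t size,
                           uint64_t align, uint64_t *out)
{
    assert(size > 0 && align > 0 && (align & (align - 1)) == 0);
    if (a->full)
        return false;
    uint64_t addr = (a->next + align - 1) & ~(align - 1);
    if (addr < a->next || addr > a->last || size - 1 > a->last - addr)
        return false;  /* alignment wrapped or request does not fit */
    *out = addr;
    if (size - 1 == a->last - addr)
        a->full = true;
    else
        a->next = addr + size;
    return true;
}

/* Does [addr, addr + size - 1] overlap a mapping the allocator never saw?
 * Assumes non-zero sizes and no wrap past UINT64_MAX. */
static bool collides(uint64_t addr, uint64_t size,
                     uint64_t hidden_start, uint64_t hidden_size)
{
    uint64_t last = addr + size - 1;
    uint64_t hidden_last = hidden_start + hidden_size - 1;
    return addr <= hidden_last && hidden_start <= last;
}
```

The design choice this illustrates is that correctness depends entirely on the kernel reporting a range that excludes its own mappings.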

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-05 Thread Marek Olšák
The 32-bit address space means the high 32 bits are constant and predetermined and it's definitely somewhere in the upper range of the address space. If ROCm or KFD occupies that space, even accidentally, other UMDs that use libdrm for VA allocation won't be able to start. The VA range allocator is i
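The constraint described here can be sketched in C. Every address in a "32-bit" window shares the same high 32 bits, so only the low 32 bits need to be stored and the full address is rebuilt by OR-ing in the base. The base used in the usage note (0xFFFF800000000000, the start of the upper half) is an assumption for illustration, not the address Mesa or libdrm actually picks. Any foreign mapping that overlaps the window takes space the 32-bit allocator believes it owns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A "32-bit" VA window: 4 GiB in which every address has the same high
 * 32 bits. `base` is the first address and must be 4 GiB aligned. */
struct va32_window {
    uint64_t base;
};

static bool in_window(struct va32_window w, uint64_t addr)
{
    assert((w.base & 0xFFFFFFFFULL) == 0);
    return (addr >> 32) == (w.base >> 32);
}

/* Rebuild a full address from a stored 32-bit value. */
static uint64_t expand_addr32(struct va32_window w, uint32_t lo)
{
    return w.base | lo;
}

/* Does [start, start + size - 1] overlap the window? Assumes size > 0 and
 * that the range does not wrap past UINT64_MAX. */
static bool overlaps_window(struct va32_window w, uint64_t start, uint64_t size)
{
    uint64_t win_last = w.base | 0xFFFFFFFFULL;
    uint64_t last = start + size - 1;

    assert(size > 0 && last >= start);
    return start <= win_last && w.base <= last;
}
```

For example, with a base of 0xFFFF800000000000, expand_addr32(w, 0x1000) yields 0xFFFF800000001000, and a small mapping placed at the base makes overlaps_window return true.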

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-05 Thread Felix Kuehling
TBA/TMA were relocated to the upper half of the canonical address space. I don't think that qualifies as 32-bit by definition. But maybe you're using a different definition. That said, if Mesa manages its own virtual address space in user mode, and KFD maps the TMA/TBA at an address that Mesa
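For reference, here is a C sketch of the canonical-form rule, assuming 48-bit virtual addresses: bits 63 to 47 must all be equal, and the non-canonical gap between the two halves is the VM hole. It also shows that the two definitions in this exchange are not mutually exclusive. An address can lie in the upper canonical half and still fall inside a window whose high 32 bits are fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VA_BITS 48  /* assumed GPU VA width */

/* Canonical if bits 63..47 are all zero or all one. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> (VA_BITS - 1);
    return top == 0 || top == (UINT64_MAX >> (VA_BITS - 1));
}

static bool in_upper_half(uint64_t addr)
{
    return is_canonical(addr) && (addr >> 63) != 0;
}

/* Sign-extend bit 47 of a hole-free 48-bit address to canonical form. */
static uint64_t to_canonical(uint64_t addr48)
{
    assert(addr48 < (1ULL << VA_BITS));
    if (addr48 & (1ULL << (VA_BITS - 1)))
        return addr48 | ~((1ULL << VA_BITS) - 1);
    return addr48;
}
```

For example, to_canonical(0x0000800000000000) returns 0xFFFF800000000000, the first address of the upper half.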

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-04 Thread Marek Olšák
Hi, I have received information that the original commit makes all 32-bit userspace VA allocations fail, so UMDs like Mesa can't even initialize and they either crash or fail to load. If TBA/TMA was relocated to the 32-bit address range, it would explain why UMDs can't allocate anything in that ra

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-03 Thread Jay Cornwall
On 1/3/2024 12:58, Felix Kuehling wrote:
> A segfault in Mesa seems to be a different issue from what's mentioned
> in the commit message. I'd let Christian or Marek comment on
> compatibility with graphics UMDs. I'm not sure why this patch would
> affect them at all.

I was referencing this is

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-03 Thread Felix Kuehling
On 2024-01-03 10:32, Jay Cornwall wrote:
> On 1/3/2024 09:19, Alex Deucher wrote:
>> + Jay, Felix
>>
>> On Wed, Jan 3, 2024 at 5:16 AM Kaibo Ma wrote:
>>> That commit causes NULL pointer dereferences in dmesgs when
>>> running applications using ROCm, including clinfo, blender,
>>> and PyTorch, since v6.6.1. Revert

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-03 Thread Alex Deucher
Applied. Thanks!

Alex

On Wed, Jan 3, 2024 at 10:33 AM Jay Cornwall wrote:
>
> On 1/3/2024 09:19, Alex Deucher wrote:
> > + Jay, Felix
> >
> > On Wed, Jan 3, 2024 at 5:16 AM Kaibo Ma wrote:
> >>
> >> That commit causes NULL pointer dereferences in dmesgs when
> >> running applications using ROC

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-03 Thread Jay Cornwall
On 1/3/2024 09:19, Alex Deucher wrote:
> + Jay, Felix
>
> On Wed, Jan 3, 2024 at 5:16 AM Kaibo Ma wrote:
>>
>> That commit causes NULL pointer dereferences in dmesgs when
>> running applications using ROCm, including clinfo, blender,
>> and PyTorch, since v6.6.1. Revert it to fix blender again.
>

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-03 Thread Alex Deucher
+ Jay, Felix

On Wed, Jan 3, 2024 at 5:16 AM Kaibo Ma wrote:
>
> That commit causes NULL pointer dereferences in dmesgs when
> running applications using ROCm, including clinfo, blender,
> and PyTorch, since v6.6.1. Revert it to fix blender again.
>
> This reverts commit 96c211f1f9ef82183493f4ceed

[PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-03 Thread Kaibo Ma
That commit causes NULL pointer dereferences in dmesgs when running applications using ROCm, including clinfo, blender, and PyTorch, since v6.6.1. Revert it to fix blender again.

This reverts commit 96c211f1f9ef82183493f4ceed4e347b52849149.

Closes: https://github.com/ROCm/ROCm/issues/2596
Closes: