Arun and I have been working on a patch to split up
AMDGPU_VA_RESERVED_SIZE into two separate defines.
I assumed that Arun had landed this by now, but that doesn't seem to be the
case.
Going to take another look, but yeah, the CSA area isn't large enough to
fill the upper reserved area, so moving
[+Christian]
I'm looking into virtual address reservations in amdgpu and what's
reported by AMDGPU_INFO_DEV_INFO. So far I only found
AMDGPU_VA_RESERVED_SIZE, which is reserved both at the start of the
lower virtual address range and the end of the upper virtual address range.
The reservatio
It looks like this would cause failures even with regular 64-bit
allocations because the virtual address range allocator in libdrm asks
the kernel what ranges of addresses are free, and the kernel doesn't
exclude the KFD allocation from that.
Basically, no VM allocations can be done by the kernel
The 32-bit address space means the high 32 bits are constant and
predetermined and it's definitely somewhere in the upper range of the
address space. If ROCm or KFD occupy that space, even accidentally, other
UMDs that use libdrm for VA allocation won't be able to start. The VA range
allocator is i
TBA/TMA were relocated to the upper half of the canonical address space.
I don't think that qualifies as 32-bit by definition. But maybe you're
using a different definition.
That said, if Mesa manages its own virtual address space in user mode,
and KFD maps the TMA/TBA at an address that Mesa
Hi,
I have received information that the original commit makes all 32-bit
userspace VA allocations fail, so UMDs like Mesa can't even initialize
and they either crash or fail to load. If TBA/TMA was relocated to the
32-bit address range, it would explain why UMDs can't allocate
anything in that ra
On 1/3/2024 12:58, Felix Kuehling wrote:
> A segfault in Mesa seems to be a different issue from what's mentioned
> in the commit message. I'd let Christian or Marek comment on
> compatibility with graphics UMDs. I'm not sure why this patch would
> affect them at all.
I was referencing this is
On 2024-01-03 10:32, Jay Cornwall wrote:
On 1/3/2024 09:19, Alex Deucher wrote:
+ Jay, Felix
On Wed, Jan 3, 2024 at 5:16 AM Kaibo Ma wrote:
That commit causes NULL pointer dereferences in dmesgs when
running applications using ROCm, including clinfo, blender,
and PyTorch, since v6.6.1. Revert
Applied. Thanks!
Alex
On Wed, Jan 3, 2024 at 10:33 AM Jay Cornwall wrote:
>
> On 1/3/2024 09:19, Alex Deucher wrote:
> > + Jay, Felix
> >
> > On Wed, Jan 3, 2024 at 5:16 AM Kaibo Ma wrote:
> >>
> >> That commit causes NULL pointer dereferences in dmesgs when
> >> running applications using ROC
On 1/3/2024 09:19, Alex Deucher wrote:
> + Jay, Felix
>
> On Wed, Jan 3, 2024 at 5:16 AM Kaibo Ma wrote:
>>
>> That commit causes NULL pointer dereferences in dmesgs when
>> running applications using ROCm, including clinfo, blender,
>> and PyTorch, since v6.6.1. Revert it to fix blender again.
>
+ Jay, Felix
On Wed, Jan 3, 2024 at 5:16 AM Kaibo Ma wrote:
>
> That commit causes NULL pointer dereferences in dmesgs when
> running applications using ROCm, including clinfo, blender,
> and PyTorch, since v6.6.1. Revert it to fix blender again.
>
> This reverts commit 96c211f1f9ef82183493f4ceed
That commit causes NULL pointer dereferences in dmesgs when
running applications using ROCm, including clinfo, blender,
and PyTorch, since v6.6.1. Revert it to fix blender again.
This reverts commit 96c211f1f9ef82183493f4ceed4e347b52849149.
Closes: https://github.com/ROCm/ROCm/issues/2596
Closes: