Public bug reported:

When PyTorch runs on an APU, it reports the wrong amount of GPU memory,
and models fail to run with errors such as:

torch.OutOfMemoryError: HIP out of memory. Tried to allocate 18.00 MiB.
GPU 0 has a total capacity of 15.60 GiB of which 8.09 MiB is free. Of
the allocated memory 15.10 GiB is allocated by PyTorch, and 195.37 MiB
is reserved by PyTorch but unallocated. If reserved but unallocated
memory is large try setting
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.
See documentation for Memory Management
(https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)


These two commits need to be backported into amdkfd to fix the issue:

commit 8b0d068e7dd1 ("drm/amdkfd: add a new flag to manage where VRAM allocations go")
commit 759e764f7d58 ("drm/amdkfd: use GTT for VRAM on APUs only if GTT is larger")

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: linux-oem-6.14 (Ubuntu)
     Importance: Undecided
         Status: Invalid

** Affects: linux (Ubuntu Noble)
     Importance: Undecided
         Status: New

** Affects: linux-oem-6.14 (Ubuntu Noble)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Plucky)
     Importance: Undecided
         Status: New

** Affects: linux-oem-6.14 (Ubuntu Plucky)
     Importance: Undecided
         Status: Invalid

** Affects: linux (Ubuntu Questing)
     Importance: Undecided
         Status: New

** Affects: linux-oem-6.14 (Ubuntu Questing)
     Importance: Undecided
         Status: Invalid


** Tags: originate-from-2120453

** Also affects: linux (Ubuntu)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: linux-oem-6.14 (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Plucky)
   Importance: Undecided
       Status: New

** Also affects: linux-oem-6.14 (Ubuntu Plucky)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Questing)
   Importance: Undecided
       Status: New

** Also affects: linux-oem-6.14 (Ubuntu Questing)
   Importance: Undecided
       Status: New

** Changed in: linux-oem-6.14 (Ubuntu Plucky)
       Status: New => Invalid

** Changed in: linux-oem-6.14 (Ubuntu Questing)
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2120454

Title:
  Pytorch reports incorrect GPU memory causing "HIP Out of Memory"
  errors

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2120454/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
