** Description changed:

  [ Impact ]
  
    ROCm 7.1.0's HSA runtime (libhsa-runtime64-1, libhsakmt1) contains
  several bugs in IPC (Inter-Process Communication) memory handling and
  virtual memory management that affect GPU compute workloads:
  
    1. Stale IPC handle usage: There is no validation mechanism to detect
       when an IPC memory handle references a buffer that has been freed and
       reallocated. This can lead to silent data corruption in multi-process
       GPU applications (e.g., RCCL-based distributed training,
       multi-process OpenCL workloads).
    2. Virtual memory use-after-unmap: When revoking GPU access permissions
       via VMemorySetAccess, the runtime fully unmaps the CPU mapping
       instead of downgrading it to PROT_NONE. Subsequent operations on the
       virtual address range can SIGSEGV or produce undefined behavior.
    3. Signal creation crash: hsa_amd_signal_create dereferences a null
       pointer if memory allocation fails, causing an unrecoverable crash
       instead of returning an error code.
    4. Uncached memory lock rejected: hsa_amd_memory_lock_to_pool
       incorrectly rejects the HSA_AMD_MEMORY_POOL_UNCACHED_FLAG, preventing
       applications from using uncached pinned memory (needed for certain
       low-latency GPU communication patterns).
    5. IPC legacy mode default: The new DMA-buf IPC path has known
       compatibility issues on some configurations; upstream has flipped the
       default to prefer the legacy IPC path unless explicitly opted out.
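
The stale-handle problem in item 1 can be illustrated with a generation
stamp: each allocation gets a fresh stamp, an exported handle records the
stamp it saw, and attach refuses a handle whose stamp no longer matches.
This is only a conceptual Python model. BufferTable, IpcHandle, and
StaleHandleError are hypothetical names, and the 7.1.1 fix itself stamps
metadata on DRM buffer objects rather than using anything like this class.

```python
import itertools
from dataclasses import dataclass


class StaleHandleError(Exception):
    """Raised when a handle refers to a buffer that was freed or replaced."""


@dataclass(frozen=True)
class IpcHandle:
    slot: int   # hypothetical buffer identifier shared between processes
    stamp: int  # stamp of the allocation the handle was exported from


class BufferTable:
    """Toy model of an exporter that stamps every allocation."""

    def __init__(self):
        self._stamps = {}                      # slot -> stamp of live buffer
        self._next_stamp = itertools.count(1)  # never reuses a stamp

    def allocate(self, slot):
        self._stamps[slot] = next(self._next_stamp)

    def free(self, slot):
        self._stamps.pop(slot, None)

    def export(self, slot):
        return IpcHandle(slot, self._stamps[slot])

    def attach(self, handle):
        # Reject handles whose buffer is gone or has been reallocated.
        if self._stamps.get(handle.slot) != handle.stamp:
            raise StaleHandleError(f"stale handle for slot {handle.slot}")
        return handle.slot
```

Without the stamp comparison in attach(), a handle exported before a
free/reallocate cycle would silently resolve to the new buffer, which is the
corruption scenario described above.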
  
    The 7.1.1 point release fixes all of the above. These are internal
  implementation fixes — the public ABI (symbol set and public headers) is
  unchanged, so no reverse dependency rebuilds are required.
  
    Reverse dependencies affected: libamdhip64-5, libamdhip64-6,
  librccl1-tests, libucx0, rocm-opencl-icd, rocminfo.
  
  The abipkgdiff (libabigail) report shows no ABI changes:
  ```
  === Comparing libhsa-runtime64-1 ===
  Running: abipkgdiff
    /home/ubuntu/actions-runner/_work/bullwinkle-cicd/bullwinkle-cicd/old/libhsa-runtime64-1_7.1.0+dfsg-0ubuntu9_amd64.deb
    /home/ubuntu/actions-runner/_work/bullwinkle-cicd/bullwinkle-cicd/new/libhsa-runtime64-1_7.1.1+dfsg-0ubuntu1~git202604271237.7d82154b_amd64.deb

  === Comparing libhsa-runtime64-tests ===
  Running: abipkgdiff
    /home/ubuntu/actions-runner/_work/bullwinkle-cicd/bullwinkle-cicd/old/libhsa-runtime64-tests_7.1.0+dfsg-0ubuntu9_amd64.deb
    /home/ubuntu/actions-runner/_work/bullwinkle-cicd/bullwinkle-cicd/new/libhsa-runtime64-tests_7.1.1+dfsg-0ubuntu1~git202604271237.7d82154b_amd64.deb

  === Comparing libhsakmt1 ===
  Running: abipkgdiff
    /home/ubuntu/actions-runner/_work/bullwinkle-cicd/bullwinkle-cicd/old/libhsakmt1_7.1.0+dfsg-0ubuntu9_amd64.deb
    /home/ubuntu/actions-runner/_work/bullwinkle-cicd/bullwinkle-cicd/new/libhsakmt1_7.1.1+dfsg-0ubuntu1~git202604271237.7d82154b_amd64.deb
  ```
  
  [ Test Plan ]
  
-   1. Build verification:
-     - sbuild or dpkg-buildpackage the package successfully.
-     - Verify dpkg --compare-versions shows the new version is greater.
-     - Run dpkg-gensymbols and confirm no symbols are added/removed/changed 
(the .symbols files should remain identical).
+   1. Build verification in PPA
    2. Installability:
      - apt install libhsa-runtime64-1 libhsakmt1 from the PPA.
      - Confirm reverse dependencies (rocminfo, libamdhip64-6, rocm-opencl-icd) 
remain installable without rebuild.
      - Run rocminfo and confirm it lists available GPUs without error.
    3. Run autopkgtest (rocrtst) to verify no new failures.
  
  [ Where problems could occur ]
  
    1. IPC metadata validation (most likely area of concern): The new
       metadata stamping uses amdgpu_bo_set_metadata/amdgpu_bo_query_info on
       DRM buffer objects. If a kernel or libdrm version does not properly
       support BO metadata (unlikely on supported ROCm kernels, but possible
       on older/custom kernels), IPC operations could fail with
       HSA_STATUS_ERROR_INVALID_ARGUMENT. Symptom: Multi-process GPU
       applications fail at hsa_amd_ipc_memory_attach with an "Invalid IPC
       handle" message on stderr.
    2. IPC legacy mode default flip: Applications that were working with the
       new DMA-buf IPC path (the previous default) might behave differently
       under the legacy path. Symptom: IPC performance regression or failure
       in setups that only support DMA-buf. Mitigation: Set
       HSA_ENABLE_IPC_MODE_LEGACY=0 to restore the old behavior.
    3. Virtual memory PROT_NONE remapping: The new RemoveAccess() logic
       remaps with MAP_FIXED instead of unmapping. If the mmap call fails
       (e.g., due to address space constraints), VMemorySetAccess returns an
       error where it previously succeeded silently. Symptom: Applications
       using hsa_amd_vmem_set_access to revoke and re-grant permissions
       might see unexpected errors on memory-constrained systems.
    4. Uncached memory lock flag passthrough: Applications that previously
       passed non-zero flags to hsa_amd_memory_lock_to_pool and relied on
       them being rejected (defensive coding) will now have those flags
       honored. Symptom: Unexpected uncached memory behavior if flags were
       passed erroneously.
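
The remap-in-place technique from item 3 can be sketched with plain POSIX
mmap, called here through ctypes. This is not the ROCr code: reserve_rw and
revoke_access are hypothetical helpers, and the flag values are hard-coded
on the assumption of 64-bit Linux. The point is only that remapping with
MAP_FIXED and PROT_NONE keeps the address range reserved, whereas munmap
would release it for reuse by unrelated mappings.

```python
import ctypes
import mmap
import os

# Assumption: Linux values for the mmap protection and flag constants.
PROT_NONE, PROT_READ, PROT_WRITE = 0x0, 0x1, 0x2
MAP_PRIVATE, MAP_FIXED, MAP_ANONYMOUS = 0x02, 0x10, 0x20
MAP_FAILED = ctypes.c_void_p(-1).value

libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]


def _checked(addr):
    # mmap reports failure as MAP_FAILED; surface errno as an OSError.
    if addr is None or addr == MAP_FAILED:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return addr


def reserve_rw(length):
    """Create an anonymous read/write mapping and return its address."""
    return _checked(libc.mmap(None, length, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0))


def revoke_access(addr, length):
    """Replace the mapping at addr with an inaccessible one, in place."""
    return _checked(libc.mmap(addr, length, PROT_NONE,
                              MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0))
```

After revoke_access() the range is still owned by the process, so any
access faults predictably and a later re-grant can remap the same addresses.
The failure path in _checked() corresponds to the new error return
mentioned above: a failed remap is reported instead of being ignored.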
  
  [ Other Info ]
  
   * No ABI/API breakage: The debian/libhsa-runtime64-1.symbols and
     debian/libhsakmt1.symbols files are identical between 7.1.0 and 7.1.1.
     No symbols were added, removed, or had their signatures changed. The
     SONAMEs remain libhsa-runtime64.so.1 and libhsakmt.so.1. Public
     installed headers (runtime/hsa-runtime/inc/) have zero diff.
   * All internal API changes are implementation-private: The
     hsakmt_fmm_register_memory() signature change and the
     MemoryRegion::Lock() parameter addition are in private headers not
     installed by the -dev package. They do not affect reverse dependencies.
   * Environment variable escape hatch: The one user-visible behavior change
     (IPC legacy mode default) is controllable via
     HSA_ENABLE_IPC_MODE_LEGACY=0|1, providing a no-rebuild rollback path if
     issues are discovered.
   * PPA: https://launchpad.net/~tchavadar/+archive/ubuntu/lp2150430
   * Autopkgtest: https://autopkgtest.ubuntu.com/user/tchavadar/ppa/lp2150430
   * Upstream version comparison: 
https://github.com/ROCm/rocr-runtime/compare/rocm-7.1.0...rocm-7.1.1
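
As a rough illustration of the environment variable escape hatch, a test
harness could pin the IPC mode per process when launching a workload. Only
the variable name and its 0|1 values come from this description;
env_with_ipc_mode is a hypothetical helper.

```python
import os

IPC_MODE_VAR = "HSA_ENABLE_IPC_MODE_LEGACY"


def env_with_ipc_mode(legacy, base=None):
    """Return a copy of the environment with the IPC mode pinned.

    legacy=True selects the legacy IPC path (the 7.1.1 default);
    legacy=False restores the previous DMA-buf default.
    """
    env = dict(os.environ if base is None else base)
    env[IPC_MODE_VAR] = "1" if legacy else "0"
    return env
```

For example, subprocess.run(["./my_ipc_app"],
env=env_with_ipc_mode(legacy=False), check=True) would run a placeholder
application (./my_ipc_app is not a real binary) with the pre-7.1.1 default.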

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2150430

Title:
  SRU: New Upstream Version 7.1.1

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rocr-runtime/+bug/2150430/+subscriptions

