** Description changed:

[ Impact ]

ROCm 7.1.0's HSA runtime (libhsa-runtime64-1, libhsakmt1) contains several bugs in IPC (Inter-Process Communication) memory handling and virtual memory management that affect GPU compute workloads:

1. Stale IPC handle usage: there is no validation mechanism to detect when an IPC memory handle references a buffer that has been freed and reallocated. This can lead to silent data corruption in multi-process GPU applications (e.g., RCCL-based distributed training, multi-process OpenCL workloads).
2. Virtual memory use-after-unmap: when revoking GPU access permissions via VMemorySetAccess, the runtime fully unmaps the CPU mapping instead of downgrading it to PROT_NONE. Subsequent operations on the virtual address range can trigger SIGSEGV or undefined behavior.
3. Signal creation crash: hsa_amd_signal_create dereferences a null pointer if memory allocation fails, causing an unrecoverable crash instead of returning an error code.
4. Uncached memory lock rejected: hsa_amd_memory_lock_to_pool incorrectly rejects HSA_AMD_MEMORY_POOL_UNCACHED_FLAG, preventing applications from using uncached pinned memory (needed for certain low-latency GPU communication patterns).
5. IPC legacy mode default: the new DMA-buf IPC path has known compatibility issues on some configurations, so upstream has flipped the default to prefer the legacy IPC path unless the user explicitly opts out.

The 7.1.1 point release fixes all of the above. These are internal implementation fixes: the public ABI (symbol set and public headers) is unchanged, so no reverse-dependency rebuilds are required.

Reverse dependencies affected: libamdhip64-5, libamdhip64-6, librccl1-tests, libucx0, rocm-opencl-icd, rocminfo.
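To illustrate item 1: the 7.1.1 fix stamps metadata on the DRM buffer object so that an importer can tell whether a handle still refers to the same allocation. The C sketch below models that idea under the assumption that the stamp is a value which changes on every (re)allocation. A plain struct stands in for the buffer object, every sketch_* name is hypothetical, and it neither calls amdgpu_bo_set_metadata/amdgpu_bo_query_info nor reproduces the actual ROCr code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DRM buffer object. In the real runtime the
 * stamp would live in BO metadata; here it is just a struct field. */
typedef struct {
    uint64_t metadata_stamp; /* 0 means "not allocated" */
} sketch_bo_t;

/* Hypothetical IPC handle: which BO, plus the stamp seen at export time. */
typedef struct {
    sketch_bo_t *bo;
    uint64_t stamp;
} sketch_ipc_handle_t;

/* Monotonic counter so a reallocated BO never reuses an old stamp. */
static uint64_t sketch_next_stamp = 1;

/* (Re)allocate the BO and stamp it with a fresh value. */
static void sketch_bo_alloc(sketch_bo_t *bo) {
    bo->metadata_stamp = sketch_next_stamp++;
}

/* Free the BO; the same slot may later be reallocated. */
static void sketch_bo_free(sketch_bo_t *bo) {
    bo->metadata_stamp = 0;
}

/* Exporter side: record the BO's current stamp in the handle. */
static sketch_ipc_handle_t sketch_ipc_export(sketch_bo_t *bo) {
    sketch_ipc_handle_t h = { bo, bo->metadata_stamp };
    return h;
}

/* Importer side: returns 0 if the handle still refers to the same
 * allocation, -1 if it is stale (a real runtime would return an error
 * status such as HSA_STATUS_ERROR_INVALID_ARGUMENT instead). */
static int sketch_ipc_attach(const sketch_ipc_handle_t *h) {
    if (h->bo == NULL || h->stamp == 0 || h->bo->metadata_stamp != h->stamp)
        return -1;
    return 0;
}
```

The counter is the important part: a freed-and-reallocated buffer can look identical by address or handle number, but in this sketch it never carries the same stamp twice.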
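Item 2 depends on the difference between munmap() and replacing a mapping in place with a PROT_NONE one. The minimal POSIX sketch below (Linux/glibc assumed) uses an anonymous mapping rather than a GPU/DRM mapping, and the sketch_* helpers are hypothetical, not ROCr's RemoveAccess(). Revoking via mmap(MAP_FIXED, PROT_NONE) keeps the virtual range reserved, so access can later be re-granted with mprotect().

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Reserve a virtual range with no access (hypothetical helper). */
static void *sketch_reserve(size_t len) {
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Grant (or re-grant) CPU read/write access to an already-reserved range. */
static int sketch_grant(void *addr, size_t len) {
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}

/* Revoke access WITHOUT releasing the range: MAP_FIXED replaces the
 * existing mapping with a PROT_NONE one at the same address, so a later
 * sketch_grant() still finds a mapping. A plain munmap() here would leave
 * a hole that mprotect() rejects (ENOMEM) and that another mmap() could
 * claim. Returns 0 on success, -1 if mmap fails (the new error path). */
static int sketch_revoke(void *addr, size_t len) {
    void *p = mmap(addr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_NORESERVE,
                   -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

In this anonymous-memory sketch the MAP_FIXED remap discards the old page contents, so after re-granting access the pages read back as zero.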
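The fix for item 3 follows the usual pattern of checking an allocation result before dereferencing it and returning a status code instead. The sketch below is hypothetical: sketch_signal_create, the sketch_status_t codes and the allocator hook are illustrative stand-ins rather than the hsa_amd_signal_create implementation, and the out-of-resources status is an assumption loosely modelled on hsa_status_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical status codes, loosely modelled on hsa_status_t. */
typedef enum {
    SKETCH_STATUS_SUCCESS = 0,
    SKETCH_STATUS_ERROR_INVALID_ARGUMENT,
    SKETCH_STATUS_ERROR_OUT_OF_RESOURCES
} sketch_status_t;

/* Hypothetical signal object. */
typedef struct {
    int64_t value;
} sketch_signal_t;

/* Allocator hook so allocation failure can be simulated. */
typedef void *(*sketch_alloc_fn)(size_t);

/* Always-failing allocator, used only to exercise the error path. */
static void *sketch_failing_alloc(size_t n) {
    (void)n;
    return NULL;
}

/* Create a signal. On allocation failure, return an error status instead
 * of dereferencing the NULL pointer (the crash described in item 3). */
static sketch_status_t sketch_signal_create(int64_t initial_value,
                                            sketch_alloc_fn alloc,
                                            sketch_signal_t **out) {
    if (out == NULL || alloc == NULL)
        return SKETCH_STATUS_ERROR_INVALID_ARGUMENT;
    *out = NULL;
    sketch_signal_t *sig = alloc(sizeof *sig);
    if (sig == NULL)
        return SKETCH_STATUS_ERROR_OUT_OF_RESOURCES;
    sig->value = initial_value; /* safe: sig is known to be non-NULL */
    *out = sig;
    return SKETCH_STATUS_SUCCESS;
}
```

Passing malloc as the allocator gives the normal path; passing sketch_failing_alloc shows the caller receiving a status code and a NULL output pointer instead of a crash.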
The abipkgdiff (libabigail) report below shows no ABI changes for any of the binary packages:

```
=== Comparing libhsa-runtime64-1 ===
Running: abipkgdiff /home/ubuntu/actions-runner/_work/bullwinkle-cicd/bullwinkle-cicd/old/libhsa-runtime64-1_7.1.0+dfsg-0ubuntu9_amd64.deb /home/ubuntu/actions-runner/_work/bullwinkle-cicd/bullwinkle-cicd/new/libhsa-runtime64-1_7.1.1+dfsg-0ubuntu1~git202604271237.7d82154b_amd64.deb
=== Comparing libhsa-runtime64-tests ===
Running: abipkgdiff /home/ubuntu/actions-runner/_work/bullwinkle-cicd/bullwinkle-cicd/old/libhsa-runtime64-tests_7.1.0+dfsg-0ubuntu9_amd64.deb /home/ubuntu/actions-runner/_work/bullwinkle-cicd/bullwinkle-cicd/new/libhsa-runtime64-tests_7.1.1+dfsg-0ubuntu1~git202604271237.7d82154b_amd64.deb
=== Comparing libhsakmt1 ===
Running: abipkgdiff /home/ubuntu/actions-runner/_work/bullwinkle-cicd/bullwinkle-cicd/old/libhsakmt1_7.1.0+dfsg-0ubuntu9_amd64.deb /home/ubuntu/actions-runner/_work/bullwinkle-cicd/bullwinkle-cicd/new/libhsakmt1_7.1.1+dfsg-0ubuntu1~git202604271237.7d82154b_amd64.deb
```

[ Test Plan ]

- 1. Build verification:
-    - sbuild or dpkg-buildpackage the package successfully.
-    - Verify dpkg --compare-versions shows the new version is greater.
-    - Run dpkg-gensymbols and confirm no symbols are added/removed/changed (the .symbols files should remain identical).
+ 1. Build verification in PPA.
  2. Installability:
     - apt install libhsa-runtime64-1 libhsakmt1 from the PPA.
     - Confirm reverse dependencies (rocminfo, libamdhip64-6, rocm-opencl-icd) remain installable without rebuild.
     - Run rocminfo and confirm it lists the available GPUs without error.
  3. Run autopkgtest (rocrtst) to verify that there are no new failures.

[ Where problems could occur ]

1. IPC metadata validation (most likely area of concern): the new metadata stamping uses amdgpu_bo_set_metadata/amdgpu_bo_query_info on DRM buffer objects.
   If a kernel or libdrm version does not properly support BO metadata (unlikely on supported ROCm kernels, but possible on older or custom kernels), IPC operations could fail with HSA_STATUS_ERROR_INVALID_ARGUMENT.
   Symptom: multi-process GPU applications fail at hsa_amd_ipc_memory_attach with an "Invalid IPC handle" message on stderr.
2. IPC legacy mode default flip: applications that were working with the new DMA-buf IPC path (the previous default) might behave differently under the legacy path.
   Symptom: IPC performance regression or failure in setups that only support DMA-buf.
   Mitigation: set HSA_ENABLE_IPC_MODE_LEGACY=0 to restore the old behavior.
3. Virtual memory PROT_NONE remapping: the new RemoveAccess() logic remaps the range with MAP_FIXED instead of unmapping it. If that mmap call fails (e.g., due to address-space constraints), VMemorySetAccess returns an error where it previously succeeded silently.
   Symptom: applications using hsa_amd_vmem_set_access to revoke and re-grant permissions might see unexpected errors on memory-constrained systems.
4. Uncached memory lock flag passthrough: applications that previously passed non-zero flags to hsa_amd_memory_lock_to_pool and relied on them being rejected (defensive coding) will now have those flags honored.
   Symptom: unexpected uncached memory behavior if flags were passed erroneously.

[ Other Info ]

* No ABI/API breakage: the debian/libhsa-runtime64-1.symbols and debian/libhsakmt1.symbols files are identical between 7.1.0 and 7.1.1. No symbols were added or removed, and no signatures changed. The SONAMEs remain libhsa-runtime64.so.1 and libhsakmt.so.1, and the public installed headers (runtime/hsa-runtime/inc/) have zero diff.
* All internal API changes are implementation-private: the hsakmt_fmm_register_memory() signature change and the MemoryRegion::Lock() parameter addition are in private headers that the -dev package does not install, so they do not affect reverse dependencies.
* Environment variable escape hatch: the one user-visible behavior change (the IPC legacy mode default) is controllable via HSA_ENABLE_IPC_MODE_LEGACY=0|1, providing a no-rebuild rollback path if issues are discovered.
* PPA: https://launchpad.net/~tchavadar/+archive/ubuntu/lp2150430
* Autopkgtest: https://autopkgtest.ubuntu.com/user/tchavadar/ppa/lp2150430
* Upstream version comparison: https://github.com/ROCm/rocr-runtime/compare/rocm-7.1.0...rocm-7.1.1
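The HSA_ENABLE_IPC_MODE_LEGACY escape hatch can be pictured as a small default-selection function. This sketch assumes that an unset variable selects the legacy path (the new 7.1.1 default) and that the value "0" selects the DMA-buf path; how the real runtime treats other values is not stated here, so mapping everything else to legacy is an assumption, and the sketch_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Hypothetical IPC path selector. */
typedef enum { SKETCH_IPC_LEGACY, SKETCH_IPC_DMABUF } sketch_ipc_mode_t;

/* Assumed parsing: unset -> legacy (new default); "0" -> DMA-buf
 * (the pre-7.1.1 default); any other value -> legacy. */
static sketch_ipc_mode_t sketch_ipc_mode_from_env(void) {
    const char *v = getenv("HSA_ENABLE_IPC_MODE_LEGACY");
    if (v != NULL && strcmp(v, "0") == 0)
        return SKETCH_IPC_DMABUF;
    return SKETCH_IPC_LEGACY;
}
```

Because the choice is read from the environment at run time, rolling back only requires setting HSA_ENABLE_IPC_MODE_LEGACY=0 for the affected application; no package rebuild is involved.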
--
https://bugs.launchpad.net/bugs/2150430
Title: SRU: New Upstream Version 7.1.1
