These patches are an evolution of the USM portion of the patches
previously posted in July 2022 (yes, it's taken a while!)

https://patchwork.sourceware.org/project/gcc/list/?series=10748&state=%2A&archive=both

The pinned memory portion was already posted (and partially approved)
and must be applied before this series (v5 version).

https://patchwork.sourceware.org/project/gcc/list/?series=35022&state=%2A&archive=both

The series implements OpenMP's "Unified Shared Memory" concept, first
for NVidia GPUs, and then for AMD GPUs. We already have a very simple
implementation of USM that works on integrated APU devices and any
other device that supports shared memory access natively. This new
implementation replaces that implementation in the case where using
"managed memory" is likely to be a win (the usual non-APU case).

In theory, explicit mapping of exactly the right memory with carefully
hand-optimized "to" and "from" directives is the optimal implementation
(except possibly in the case where the data is too large for the
device). Experimentally, the "dumb" USM implementation we already have
performs quite well with modern devices and drivers. This new managed
memory implementation appears to fall between the two, and can
outperform explicit mapping in the non-trivial cases (e.g. many small
mappings, sparse data, rectangular copies, etc.)

The trade-off for the additional performance is added complexity, and
malloc/free are no longer compatible with external libraries (e.g.
memory returned by strdup). To help mitigate these incompatibility
issues, two new GNU extensions are added:

1. ompx_gnu_unified_shared_mem_alloc / ompx_gnu_unified_shared_mem_space

This new pre-defined allocator, used with omp_alloc, allows a
programmer to explicitly allocate managed memory without converting the
whole program to USM. Creating explicit mappings for this memory is now
optional, and if they do occur the runtime will detect the USM and
apply no-op mappings.

2. ompx_gnu_host_mem_alloc / ompx_gnu_host_mem_space

Conversely, this new pre-defined allocator allows a programmer to
override "requires unified_shared_memory" and obtain regular host
memory from the regular system heap. This might be desirable when a
large amount of memory is needed in a completely unrelated context, or
for interacting with external libraries.

Known limitation: We can intercept dynamic heap allocations, but static
data and automatic stack variables are generally not accessible from
the device. (Migrating stack pages used by an active thread seems like
a bad idea, in any case.)

I can approve the amdgcn patches myself, but comments are welcome.

OK for mainline? (Once the pinned memory dependencies are committed.)

Thanks

Andrew

P.S. This series includes contributions from (at least) Thomas
Schwinge, Marcel Vollweiler, Kwok Cheung Yeung, and Abid Qadeer.

Andrew Stubbs (6):
  libgomp: Disentangle shared memory from managed
  openmp, nvptx: ompx_gnu_unified_shared_mem_alloc
  openmp: Enable -foffload-memory=unified
  amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK
  amdgcn: libgomp plugin USM implementation
  libgomp: Map omp_default_mem_space to USM

Hafiz Abid Qadeer (1):
  openmp: Use libgomp memory allocation functions with unified shared
    memory.

Marcel Vollweiler (1):
  openmp, libgomp: Handle unified shared memory in
    omp_target_is_accessible

 gcc/c/c-parser.cc                            |  20 +-
 gcc/config/gcn/gcn.cc                        |  32 +-
 gcc/config/gcn/mkoffload.cc                  |  35 +-
 gcc/cp/parser.cc                             |  20 +-
 gcc/fortran/openmp.cc                        |  14 +-
 gcc/fortran/parse.cc                         |   3 +-
 gcc/omp-low.cc                               | 188 +++++++
 gcc/passes.def                               |   1 +
 gcc/testsuite/c-c++-common/gomp/usm-1.c      |   4 +
 gcc/testsuite/c-c++-common/gomp/usm-2.c      |  46 ++
 gcc/testsuite/c-c++-common/gomp/usm-3.c      |  44 ++
 gcc/testsuite/g++.dg/gomp/usm-1.C            |  32 ++
 gcc/testsuite/g++.dg/gomp/usm-2.C            |  30 ++
 gcc/testsuite/g++.dg/gomp/usm-3.C            |  38 ++
 gcc/testsuite/g++.dg/gomp/usm-4.C            |  32 ++
 gcc/testsuite/g++.dg/gomp/usm-5.C            |  30 ++
 gcc/testsuite/gfortran.dg/gomp/usm-1.f90     |   6 +
 gcc/testsuite/gfortran.dg/gomp/usm-2.f90     |  16 +
 gcc/testsuite/gfortran.dg/gomp/usm-3.f90     |  13 +
 gcc/tree-pass.h                              |   1 +
 include/cuda/cuda.h                          |  13 +
 include/hsa.h                                |  28 +-
 include/hsa_ext_amd.h                        | 459 +++++++++++++++++-
 include/hsa_ext_image.h                      |   2 +-
 libgomp/Makefile.in                          |  13 +-
 libgomp/allocator.c                          |  17 +-
 libgomp/config/gcn/allocator.c               |  10 +
 libgomp/config/linux/allocator.c             |  29 +-
 libgomp/config/nvptx/allocator.c             |  10 +
 libgomp/libgomp-plugin.h                     |   4 +
 libgomp/libgomp.h                            |   8 +
 libgomp/omp.h.in                             |   4 +
 libgomp/omp_lib.f90.in                       |   8 +
 libgomp/omp_lib.h.in                         |  10 +
 libgomp/plugin/Makefrag.am                   |   2 +-
 libgomp/plugin/cuda-lib.def                  |   2 +
 libgomp/plugin/plugin-gcn.c                  | 209 +++++++-
 libgomp/plugin/plugin-nvptx.c                |  68 ++-
 libgomp/target.c                             |  96 +++-
 libgomp/testsuite/lib/libgomp.exp            |  22 +
 libgomp/testsuite/libgomp.c++/usm-1.C        |  54 +++
 libgomp/testsuite/libgomp.c++/usm-2.C        |  33 ++
 .../libgomp.c-c++-common/requires-1.c        |   1 +
 .../libgomp.c-c++-common/requires-4.c        |   3 +
 .../libgomp.c-c++-common/requires-4a.c       |   2 +
 .../libgomp.c-c++-common/requires-5.c        |   5 +-
 .../target-implicit-map-4.c                  |  18 +
 .../target-is-accessible-1.c                 |  22 +-
 .../target-is-accessible-2.c                 |  21 +
 .../alloc-ompx_gnu_host_mem_alloc-1.c        |  77 +++
 libgomp/testsuite/libgomp.c/usm-1.c          |  26 +
 libgomp/testsuite/libgomp.c/usm-2.c          |  34 ++
 libgomp/testsuite/libgomp.c/usm-3.c          |  37 ++
 libgomp/testsuite/libgomp.c/usm-4.c          |  38 ++
 libgomp/testsuite/libgomp.c/usm-5.c          |  30 ++
 libgomp/testsuite/libgomp.c/usm-6.c          |  94 ++++
 .../target-is-accessible-1.f90               |  20 +-
 .../target-is-accessible-2.f90               |  22 +
 libgomp/testsuite/libgomp.fortran/usm-1.f90  |  28 ++
 libgomp/testsuite/libgomp.fortran/usm-2.f90  |  33 ++
 libgomp/testsuite/libgomp.fortran/usm-3.f90  |  33 ++
 libgomp/usm-allocator.c                      | 232 +++++++++
 libgomp/usmpin-allocator.c                   |   3 +
 63 files changed, 2403 insertions(+), 82 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-3.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-3.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-4.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-5.C
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-3.f90
 mode change 100644 => 100755 include/hsa.h
 mode change 100644 => 100755 include/hsa_ext_amd.h
 mode change 100644 => 100755 include/hsa_ext_image.h
 create mode 100644 libgomp/testsuite/libgomp.c++/usm-1.C
 create mode 100644 libgomp/testsuite/libgomp.c++/usm-2.C
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-ompx_gnu_host_mem_alloc-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-6.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-is-accessible-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/usm-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/usm-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/usm-3.f90
 create mode 100644 libgomp/usm-allocator.c

-- 
2.41.0
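
Usage sketch for the two new allocators described in this cover letter.
This is illustrative only and is not taken from the series or its
testsuite. The function names fill_and_sum and host_copy are
hypothetical, and so is the HAVE_OMPX_GNU_USM macro, which is assumed to
be defined by hand only when building with a compiler that has this
series applied. Without it (or without -fopenmp) the sketch falls back
to plain malloc/free so that it still builds elsewhere.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* HAVE_OMPX_GNU_USM is a hypothetical, user-defined macro (an assumption
   of this sketch): define it only with a GCC that provides the two
   ompx_gnu_* allocators from this series.  */
#if defined (_OPENMP) && defined (HAVE_OMPX_GNU_USM)
# define USM_ALLOC(n)  omp_alloc ((n), ompx_gnu_unified_shared_mem_alloc)
# define USM_FREE(p)   omp_free ((p), ompx_gnu_unified_shared_mem_alloc)
# define HOST_ALLOC(n) omp_alloc ((n), ompx_gnu_host_mem_alloc)
# define HOST_FREE(p)  omp_free ((p), ompx_gnu_host_mem_alloc)
#else
/* Fallback: ordinary host heap memory, explicitly mapped below.  */
# define USM_ALLOC(n)  malloc (n)
# define USM_FREE(p)   free (p)
# define HOST_ALLOC(n) malloc (n)
# define HOST_FREE(p)  free (p)
#endif

/* Fill a buffer with 0..n-1 inside a target region, then sum it on the
   host.  Returns -1 if the allocation fails.  */
long
fill_and_sum (int n)
{
  assert (n > 0);
  int *data = USM_ALLOC ((size_t) n * sizeof (int));
  if (!data)
    return -1;

  /* For managed memory the cover letter says this explicit mapping is
     optional and becomes a no-op; the malloc fallback needs it.  */
  #pragma omp target teams distribute parallel for map(tofrom: data[0:n])
  for (int i = 0; i < n; i++)
    data[i] = i;

  long sum = 0;
  for (int i = 0; i < n; i++)
    sum += data[i];

  USM_FREE (data);
  return sum;
}

/* Copy a string into regular host memory, e.g. for a buffer that is
   only ever used on the host or handed to an external library.
   The caller releases it with HOST_FREE.  */
char *
host_copy (const char *s)
{
  size_t len = strlen (s) + 1;
  char *copy = HOST_ALLOC (len);
  if (copy)
    memcpy (copy, s, len);
  return copy;
}
```

Whether the managed allocation actually beats the explicit mapping
depends on the access pattern, as discussed above; the sketch only
shows where each allocator would be used.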