Hi Andrew, hi Jakub, hello world, Andrew Stubbs wrote:
Compared to the previous v3 posting of this patch, the enumeration of the "ompx" allocators have been moved to start at "100"
100 is a bad value - as can be seen below. As Jakub suggested at https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640432.html "given that LLVM uses 100-102 range, perhaps pick a different one, 200 or 150" (I know that the first review email suggested 100.)
This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. The name uses the OpenMP extension space and is intended to be consistent with other OpenMP implementations currently in development.
Namely: ompx_pinned_mem_alloc RFC: Should we use this name or - similar to LLVM - prefix this by a vendor prefix instead (gnu_omp_ or gcc_omp_ instead of ompx_)? IMHO it is fine to use ompx_ for pinned as the semantic is clear and should be compatible with IBM and AMD. For other additional memspaces / allocators, I am less sure, i.e. on OG13 there are: - ompx_unified_shared_mem_space, ompx_host_mem_space - ompx_unified_shared_mem_alloc, ompx_host_mem_alloc (BTW: In light of TR13 naming, the USM one could be ..._devices_all_mem_{alloc,space}, just to start some bikeshading or following LLVM + Intel '…target_{host,shared}…'.) * * * Looking at other compilers: IBM's compiler, https://www.ibm.com/docs/en/SSXVZZ_16.1.1/pdf/compiler.pdf , has: - ompx_pinned_mem_alloc, tagged as IBM extension and otherwise without documenting it further Checking omp.h, they define it as: ompx_pinned_mem_alloc = 9, /* Preview of host pinned memory support */ and additionally have: LOMP_MAX_MEM_ALLOC = 1024, AMD's compiler based on clang has: /* Preview of pinned memory support */ ompx_pinned_mem_alloc = 120, in addition to the LLVM defines shown below. Regarding LLVM: - they don't offer 'pinned' - they use the prefix 'llvm_omp' not 'ompx' Namely: typedef enum omp_allocator_handle_t ... llvm_omp_target_host_mem_alloc = 100, llvm_omp_target_shared_mem_alloc = 101, llvm_omp_target_device_mem_alloc = 102, ... typedef enum omp_memspace_handle_t ... llvm_omp_target_host_mem_space = 100, llvm_omp_target_shared_mem_space = 101, llvm_omp_target_device_mem_space = 102, Remark: I did not find a documentation - and while I understand in principle host and shared, I wonder how LLVM handles 'device_mem_space' when there is more than one device. BTW: OpenMP TR13 avoids this issue by adding two sets of API routines. Namely: First, for memspaces, - omp_get_{device,devices}_memspace - omp_get_{device,devices}_and_host_memspace - omp_get_devices_all_memspace and, secondly, for allocators: - omp_get_{device,devices}_allocator - omp_get_{device,devices}_and_host_allocator - omp_get_devices_all_allocator where omp_get_device_* takes a single device number and omp_get_devices_* a list of device numbers while _and_host automatically adds the initial device to the list. * * * Looking at Intel, they even use extensions without prefix: omp_target_{host,shared,device}_mem_{space,alloc} and contrary to LLVM they document it with the semantic, cf. https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-1/openmp-memory-spaces-and-allocators.html * * *
The allocator is equivalent to using a custom allocator with the pinned trait and the null fallback trait.
...
diff --git a/libgomp/allocator.c b/libgomp/allocator.c index cdedc7d80e9..18e3f525ec6 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -99,6 +99,8 @@ GOMP_is_alloc (void *ptr)
...
#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0])) -_Static_assert (ARRAY_SIZE (predefined_alloc_mapping) +_Static_assert (ARRAY_SIZE (predefined_omp_alloc_mapping) == omp_max_predefined_alloc + 1, - "predefined_alloc_mapping must match omp_memspace_handle_t"); + "predefined_omp_alloc_mapping must match omp_memspace_handle_t"); +#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
I am surprised that this compiles: Why do you re-#define this macro? * * *
--- a/libgomp/omp.h.in +++ b/libgomp/omp.h.in @@ -134,6 +134,7 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM omp_cgroup_mem_alloc = 6, omp_pteam_mem_alloc = 7, omp_thread_mem_alloc = 8, + ompx_pinned_mem_alloc = 100,
See remark regarding "100" at the top of this email.
--- a/libgomp/omp_lib.f90.in +++ b/libgomp/omp_lib.f90.in + integer (kind=omp_allocator_handle_kind), & + parameter :: ompx_pinned_mem_alloc = 100
Likewise. * * * Why didn't you also update omp_lib.h.in? * * * I think you really want to update the checking code inside GCC itself, i.e. for Fortran: 3 | !$omp allocate(a) allocator(100) | 2 1 Error: Predefined allocator required in ALLOCATOR clause at (1) as the list item 'a' at (2) has the SAVE attribute and for C: foo.c:6:58: error: 'allocator' clause requires a predefined allocator as 'a' is static 6 | #pragma omp allocate(a) allocator(ompx_pinned_mem_alloc) | ^ this shows up using module m integer :: a !$omp allocate(a) allocator(100) end and enum omp_allocator_handle_t { ompx_pinned_mem_alloc = 100 }; void f() { static int a; #pragma omp allocate(a) allocator(ompx_pinned_mem_alloc) } You probably also want to update the testcases. See also: https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/c-c++-common/gomp/allocate-9.c#L23-L28 which checks for '> 9' and I think there is also a Fortran testcase. * * * On the code side, c-parser.cc has: else if (allocator && (wi::to_widest (allocator) < 1 || wi::to_widest (allocator) > 8)) /* 8 = largest predefined memory allocator. */ error_at (allocator_loc, "%<allocator%> clause requires a predefined allocator as " "%qD is static", var); And gcc/fortran/openmp.cc has the function 'is_predefined_allocator' we could consider to unify the number handling, but it should be updated. * * * NOTE: If you are missing C++, I (or someone else) still have to address Jakub's review comments for: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633782.html * * * RFC: Should this also be supported as: OMP_ALLOCATOR=ompx_pinned_mem Your current documentation change implies so. Hence, you also need to touch: libgomp/env.c's parse_allocator * * * And carrying on my question from last time:
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-5.c @@ -0,0 +1,103 @@ +/* { dg-do run } */
...
+#define CHECK_SIZE(SIZE) { \ + struct rlimit limit; \ + if (getrlimit (RLIMIT_MEMLOCK, &limit) \ + || limit.rlim_cur <= SIZE) \ + fprintf (stderr, "unsufficient lockable memory; please increase ulimit\n"); \ + }
Namely, I wrote: Glancing through the patches, for test cases, I think you should 'abort()' in CHECK_SIZE if it fails (rlimit issue or not supported system). Or do you think that the results are still could make sense when continuing and possibly failing later? Otherwise, it looks good to me. Tobias