Hi Andrew, hi Jakub, hello world,

Andrew Stubbs wrote:

Compared to the previous v3 posting of this patch, the enumeration of
the "ompx" allocators have been moved to start at "100"

100 is a bad value - as can be seen below.

As Jakub suggested at 
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640432.html
"given that LLVM uses 100-102 range, perhaps pick a different one, 200 or 150"

(I know that the first review email suggested 100.)

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.

Namely: ompx_pinned_mem_alloc

RFC: Should we use this name or - similar to LLVM - prefix this by
a vendor prefix instead (gnu_omp_ or gcc_omp_ instead of ompx_)?

IMHO it is fine to use ompx_ for pinned as the semantic is clear
and should be compatible with IBM and AMD.

For other additional memspaces / allocators, I am less sure, i.e.
on OG13 there are:
- ompx_unified_shared_mem_space, ompx_host_mem_space
- ompx_unified_shared_mem_alloc, ompx_host_mem_alloc

(BTW: In light of TR13 naming, the USM one could be
..._devices_all_mem_{alloc,space}, just to start some bikeshading
or following LLVM + Intel '…target_{host,shared}…'.)

* * *

Looking at other compilers:

IBM's compiler, https://www.ibm.com/docs/en/SSXVZZ_16.1.1/pdf/compiler.pdf , 
has:
- ompx_pinned_mem_alloc, tagged as IBM extension and otherwise without 
documenting it further

Checking omp.h, they define it as:
  ompx_pinned_mem_alloc = 9, /* Preview of host pinned memory support */
and additionally have:
  LOMP_MAX_MEM_ALLOC = 1024,

AMD's compiler based on clang has:
      /* Preview of pinned memory support */
      ompx_pinned_mem_alloc = 120,
in addition to the LLVM defines shown below.

Regarding LLVM:
- they don't offer 'pinned'
- they use the prefix 'llvm_omp' not 'ompx'

Namely:
    typedef enum omp_allocator_handle_t
...
      llvm_omp_target_host_mem_alloc = 100,
      llvm_omp_target_shared_mem_alloc = 101,
      llvm_omp_target_device_mem_alloc = 102,
...
    typedef enum omp_memspace_handle_t
...
      llvm_omp_target_host_mem_space = 100,
      llvm_omp_target_shared_mem_space = 101,
      llvm_omp_target_device_mem_space = 102,

Remark: I did not find a documentation - and while I
understand in principle host and shared, I wonder how
LLVM handles 'device_mem_space' when there is more than
one device.

BTW: OpenMP TR13 avoids this issue by adding two sets of
API routines. Namely:

First, for memspaces,
- omp_get_{device,devices}_memspace
- omp_get_{device,devices}_and_host_memspace
- omp_get_devices_all_memspace

and, secondly, for allocators:
- omp_get_{device,devices}_allocator
- omp_get_{device,devices}_and_host_allocator
- omp_get_devices_all_allocator

where omp_get_device_* takes a single device number and
omp_get_devices_* a list of device numbers while _and_host
automatically adds the initial device to the list.

* * *

Looking at Intel, they even use extensions without prefix:

omp_target_{host,shared,device}_mem_{space,alloc}

and contrary to LLVM they document it with the semantic, cf.
https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-1/openmp-memory-spaces-and-allocators.html

* * *

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.

...

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index cdedc7d80e9..18e3f525ec6 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -99,6 +99,8 @@ GOMP_is_alloc (void *ptr)

...

   #define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
-_Static_assert (ARRAY_SIZE (predefined_alloc_mapping)
+_Static_assert (ARRAY_SIZE (predefined_omp_alloc_mapping)
                == omp_max_predefined_alloc + 1,
-               "predefined_alloc_mapping must match omp_memspace_handle_t");
+               "predefined_omp_alloc_mapping must match 
omp_memspace_handle_t");
+#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))

I am surprised that this compiles: Why do you re-#define this macro?

* * *

--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -134,6 +134,7 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM
     omp_cgroup_mem_alloc = 6,
     omp_pteam_mem_alloc = 7,
     omp_thread_mem_alloc = 8,
+  ompx_pinned_mem_alloc = 100,

See remark regarding "100" at the top of this email.

--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
+        integer (kind=omp_allocator_handle_kind), &
+                 parameter :: ompx_pinned_mem_alloc = 100

Likewise.

* * *

Why didn't you also update omp_lib.h.in?

* * *

I think you really want to update the checking code inside GCC itself,

i.e. for Fortran:

    3 |   !$omp allocate(a) allocator(100)

      |                 2            1

Error: Predefined allocator required in ALLOCATOR clause at (1) as the list 
item 'a' at (2) has the SAVE attribute

and for C:

foo.c:6:58: error: 'allocator' clause requires a predefined allocator as 'a' is 
static

    6 |   #pragma omp allocate(a) allocator(ompx_pinned_mem_alloc)

      |                                                          ^

this shows up using

module m

  integer :: a

  !$omp allocate(a) allocator(100)

end

and

enum omp_allocator_handle_t { ompx_pinned_mem_alloc = 100 };

void f()

{

  static int a;

  #pragma omp allocate(a) allocator(ompx_pinned_mem_alloc)

}

You probably also want to update the testcases. See also:

https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/c-c++-common/gomp/allocate-9.c#L23-L28

which checks for '> 9' and I think there is also a Fortran testcase.

* * *

On the code side, c-parser.cc has:

          else if (allocator

                   && (wi::to_widest (allocator) < 1

                       || wi::to_widest (allocator) > 8))

            /* 8 = largest predefined memory allocator. */

            error_at (allocator_loc,

                      "%<allocator%> clause requires a predefined allocator as "

                      "%qD is static", var);

And gcc/fortran/openmp.cc has the function 'is_predefined_allocator'

we could consider to unify the number handling, but it should be updated.

* * *

NOTE: If you are missing C++, I (or someone else) still have to

address Jakub's review comments for:

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633782.html

* * *

RFC: Should this also be supported as:

OMP_ALLOCATOR=ompx_pinned_mem

Your current documentation change implies so. Hence, you also need to touch:

libgomp/env.c's parse_allocator

* * *

And carrying on my question from last time:

+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-5.c
@@ -0,0 +1,103 @@
+/* { dg-do run } */
...
+#define CHECK_SIZE(SIZE) { \
+  struct rlimit limit; \
+  if (getrlimit (RLIMIT_MEMLOCK, &limit) \
+      || limit.rlim_cur <= SIZE) \
+    fprintf (stderr, "unsufficient lockable memory; please increase 
ulimit\n"); \
+  }

Namely, I wrote:

Glancing through the patches, for test cases, I think you should
'abort()' in CHECK_SIZE if it fails (rlimit issue or not supported
system). Or do you think that the results are still could make sense
when continuing and possibly failing later?

Otherwise, it looks good to me.

Tobias

Reply via email to