Andrew Stubbs wrote:
On 10/11/2025 10:44, Tobias Burnus wrote:
There is still no clear note in the documentation that
this allocator uses the default device when doing the
allocation – and even less so that it must be the same
device (actually: device runtime) as for the allocation.
The documentation is there now.
Thanks.
and pass it to target.c as
'gomp_managed_alloc (size, &used_device);' and
'gomp_managed_free (ptr, used_device);'
(Note: With some handling to avoid races.)
I don't think I like the idea of hidden magic that a) might prevent
clever solutions, and b) once we start doing it we can never stop. I
prefer the clear documentation together with diagnosing the cases that
we can.
Well, I am not sure whether one hidden logic is better than the other;
we could also store the value device value in the descriptor.
But that solution is also fine if sufficiently documented.
* * *
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -1265,11 +1271,13 @@ parse_allocator (const char *env, const char *val, void
*const params[])
C (omp_pteam_mem_alloc, false)
C (omp_thread_mem_alloc, false)
C (ompx_gnu_pinned_mem_alloc, false)
+ C (ompx_gnu_managed_mem_alloc, false)
C (omp_default_mem_space, true)
C (omp_large_cap_mem_space, true)
C (omp_const_mem_space, true)
C (omp_high_bw_mem_space, true)
C (omp_low_lat_mem_space, true)
+ C (ompx_gnu_managed_mem_space, false)
#undef C
Not quite. The second argument is:
#define C(v, m) \
...
memspace = m; \
Thus: please use ', true)' for the last line.
* * *
+@item @code{ompx_gnu_managed_mem_space} is a GNU extension that provides
...
+ already accessible on the device. If managed memory is not supported by
+ the default device, as configured at the moment the allocator is called,
+ then the allocator will use the fall-back setting.
Actually, on non-Linux, the normal 'malloc' is used – and not the fallback.
Example:
omp target device(dev)
{
const omp_alloctrait_t traits[]
= { { omp_atk_alignment, 256},
{omp_atk_fallback, omp_atv_abort_fb} };
omp_allocator_handle_t myalloc
= omp_init_allocator(ompx_gnu_managed_mem_space, 2, traits);
void *ptr = omp_alloc(n, ompx_gnu_managed_mem_alloc);
}
For dev == omp_initial_device (on a Linux host):
* default device == an nvidia device
→ managed memory is used (honoring the alignment)
* default device != nvidia device (e.g. the host)
→ fall back is used → ABORT
(with default_fb, 'malloc' would be used but
without the 265-bit alignment).
For dev == any non-host device
* default mem space is used ('malloc'), honoring
the alignment.
The current wording does not really make his clear:
+@item @code{ompx_gnu_managed_mem_space} is a GNU extension that provides
+ managed memory accessible by both host and device (as determined by the
+ @var{default-device-var} ICV); it is only available for supported offload
+ targets (see @ref{Offload-Target Specifics}). This memory is accessible
Actually, the "and device" is also not quite right. For Nvidia devices, all
Nvidia devices can access that memory – it is not device specific, either.
* * *
How about, e.g. (note also 'devices' [-s]):
@item @code{ompx_gnu_managed_mem_space} is a GNU extension that provides
managed memory accessible by both host and devices. The memory space
is available if the offload target associated with the
@var{default-device-var} ICV supports managed memory (see
@ref{Offload-Target Specifics}). Otherwise, on Linux the
fall-back setting of the allocator is used and on other systems
the default memory space.
and continuing as in the patch:
+ This memory is accessible
+ by both the host and the device at the same address, so it need not be
+ mapped with @code{map} clauses. Instead, use the @code{is_device_ptr}
+ clause or @code{has_device_addr} clause to indicate that the pointer is
+ already accessible on the device. If managed memory is not supported by
+ the default device, as configured at the moment the allocator is called,
+ then the allocator will use the fall-back setting. If the default device
+ is configured differently when the memory is freed, via @code{omp_free}
+ or @code{omp_realloc}, the result may be undefined.
Albeit the "If managed ... fall-back setting." can be removed.
* * *
[AMD GPUs]
+@item Managed memory allocated with the OpenMP
+ @code{ompx_gnu_managed_mem_alloc} allocator or in the
+ @code{ompx_gnu_managed_mem_space} is not currently supported on AMD GPU
+ devices; attempting to use it in an allocator will trigger the fall-back
+ trait.
I think we need again "for" instead of "on" – as using
it on the device is fine, except that it will not be
managed but the default mem space. (Possibly adding 'on the host', but
it is not really needed.)
* * *
[Nvidia GPUs]
+@item Managed memory allocated with the @code{ompx_gnu_managed_mem_alloc}
Maybe change this to 'Managed memory allocated *on* *the* *host*'?
+ allocator or in the @code{ompx_gnu_managed_mem_space} (both GNU
+ extensions) allocate memory in the CUDA Managed Memory space using
+ @code{cuMemAllocManaged}. This memory is accessible by both the host and
+ the device at the same address, so it need not be mapped with @code{map}
+ clauses. Instead, use the @code{is_device_ptr} clause or
+ @code{has_device_addr} clause to indicate that the pointer is already
+ accessible on the device. The CUDA runtime will automatically handle
+ data migration between host and device as needed.
+ If managed memory is not supported by the default device, as configured
+ at the moment the allocator is called, then the allocator will use the
+ fall-back setting.
I wonder whether this is needed – or the wording in the other section is
enough? I am slightly inclined of removing it.
+ If the default device is configured differently when
+ the memory is freed, via @code{omp_free} or @code{omp_realloc}, the
+ result may be undefined.
While this one is redundant, I think it is sensible to keep it – to
avoid user surprises. It also hints at the default device in case
the user missed it in the other section.
* * *
All in all: LGTM with the env.c issue fixed and considering some
.texi wording changes.
Thanks for the patch!
Tobias