Andrew Stubbs wrote:
On 28/11/2025 12:16, Andrew Stubbs wrote:
This patch extends omp_target_is_accessible to check the actual device
status
for the memory region, on amdgcn and nvptx devices (rather than just
checking
if shared memory is enabled).
OpenMP 6.0 has the following clarification:
"If ptr is NULL or the implementation cannot guarantee
accessibility, the routine returns zero."
Can you add at the top of target.c's omp_target_is_accessible
if (ptr == NULL)
return false;
or maybe also 'if (ptr == NULL || size == 0)'; the specification
requires that size is positive.
Could be then tested as:
for (int dev = 0; dev < omp_get_num_devices(); dev++)
if (omp_is_accessible (nullptr, dev))
__builtin_abort ();
That's PR 113213. (Which is not listed + would be fixed by this).
[Talking of which, there is also PR 113216, which is very similar
to PR121813.]
* * *
- /* TODO: Unified shared memory must be handled when available. */
+ /* Unified shared memory (or true shared memory). */
+ if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+ return true;
I think it makes more sense to remove the check here - and
handle it in the plugin. (More to this later.)
* * *
Can you also update libgomp.texi,
https://gcc.gnu.org/onlinedocs/libgomp/omp_005ftarget_005fis_005faccessible.html
(a) adding the NULL / not garantee note.
(b) I think the following should be replaced by
the assumption that for device == host, all
pointers are accessible.
[We should do a better check eventually,
but presumably in a later patch; cf. below]
"Note that GCC’s current implementation assumes that ptr is a valid host
pointer. Therefore, all addresses given by ptr are assumed to be
accessible on the initial device. And, to err on the safe side, this
memory is only available on a non-host device that can access all host
memory ([uniform] shared memory access)."
* * *
BTW: OpenMP 6 also removed "When called from within a target
region the effect is unspecified.", i.e. it is now device-callable.
Something to address eventually but IMHO there is no need to do
so as part of this patch.
* * *
+++ b/libgomp/libgomp-plugin.h
@@ -175,0 +176 @@ extern bool GOMP_OFFLOAD_managed_free (int, void *);
+extern bool GOMP_OFFLOAD_is_shared_ptr (int, const void *, size_t);
I think 'is_accessible_ptr' makes more sense than to talk
about 'is_shared_ptr' as that's what it is about.
Additionally, for future use, I'd prefer an 'int' (or enum)
instead of a 'bool' here:
'-1' Known to be accessible by that device but *not* host accessible
'1' Known to be accessible by that device
(and either also known to be host accessible or unknown host
accessibility status)
'2' Known to be device accessible if it is host accessible,
the device can access host memory (USM).
'0' Not device accessible / unknown
The reasons is that we eventually want to handle:
ptr = omp_target_alloc (nbytes, omp_default_device);
int a = omp_target_is_accessible (ptr, omp_initial_device);
and
ptr = omp_target_alloc (nbytes, 5 );
int a = omp_target_is_accessible (ptr, 3 );
i.e. checking on the host for device-only allocated memory
and checking on the device for memory allocated on a different
device.
For the host or if the device returned '2', we have then to
walk all nonhost devices (but the selected device) to check
whether any of them returns '-1'; if not, we assume that it
is accessible.
* * *
Thus, I think it makes sense to have an 'int'. However,
looking at CUDA, I realize that for one check, we could
use CU_POINTER_ATTRIBUTE_DEVICE_POINTER and for the other
CU_POINTER_ATTRIBUTE_HOST_POINTER.
Thus, we presumably would need a boolean flag whether the
check should be done for device or host accessibility.
But presumably, we still want to stick to an int:
- unknown status
- known to be accessible by the host | this device
- known not to be accessible by the host | this device
Hence: consider preparing for this by adding a boolean
and returning an int, even if we don't implement this right now.
* * *
BTW: I think we want to add and support the following
testcase, which works with Nvidia but not with AMD:
-----------------------------------
#include <omp.h>
void check (int dev)
{
constexpr int N = 10;
constexpr int size = N*sizeof(int);
int A[N] = {};
void *ptr = omp_target_alloc (size, dev);
if (ptr == nullptr || !omp_target_is_accessible (ptr, size, dev))
__builtin_abort ();
#pragma omp target device(dev) firstprivate(ptr)
for (int i = 0; i < N; i++)
((int *)ptr)[i] = i + 1;
if (omp_target_memcpy (A, ptr, size, 0, 0, omp_initial_device, dev) != 0)
__builtin_abort ();
for (int i = 0; i < N; i++)
if (A[i] != i + 1)
__builtin_abort ();
omp_target_free (ptr, dev);
}
int main ()
{
check (omp_default_device);
for (int dev = 0; dev <= omp_get_num_devices(); dev++)
check (dev);
}
-----------------------------------
* * *
Actually, I think the AMD example can be made working using
the following patch.
Interestingly, nagents == 0, but the first check is successful:
$1 = {size = 56, type = HSA_EXT_POINTER_TYPE_HSA, agentBaseAddress =
0x7ffff4800000, hostBaseAddress = 0x7ffff4800000, sizeInBytes = 40,
userData = 0x0, agentOwner = {handle = 5030896}, global_flags = 4}
i.e. the size fits and the owner is the current agent.
I think we should consider of either using this one instead or
in addition to the existing check.
Namely something like the following.
[Disclaimer: I have not checked the latest change I did,
not even compiled it - but the core part worked when I
tested it.]
+++ b/libgomp/plugin/plugin-gcn.c
@@ -238,2 +238,5 @@ struct hsa_runtime_fn_info
size_t attribute_count);
+ hsa_status_t (*hsa_amd_pointer_info_fn)
+ (const void *, hsa_amd_pointer_info_t *, void *(*)(size_t),
+ uint32_t *, hsa_agent_t **);
};
@@ -1500,2 +1503,3 @@ init_hsa_runtime_functions (void)
DLSYM_OPT_FN (hsa_amd_svm_attributes_get)
+ DLSYM_OPT_FN (hsa_amd_pointer_info)
return true;
@@ -3505,3 +3509,3 @@ gcn_exec (struct kernel_info *kernel,
-#if 0 /* TODO: Use to enable self-mapping/USM automatically. */
+
/* FIXME: The auto-self-map feature depends on still mapping 'declare
target'
@@ -3558,3 +3562,2 @@ is_integrated_apu (struct agent_info *agent, bool
check_xnack)
}
-#endif
@@ -5269,3 +5272,4 @@ GOMP_OFFLOAD_is_shared_ptr (int device, const void
*ptr, size_t size)
|| device < 0 || device > hsa_context.agent_count
- || !hsa_fns.hsa_amd_svm_attributes_get_fn)
+ || (!hsa_fns.hsa_amd_svm_attributes_get_fn
+ && !hsa_fns.hsa_amd_pointer_info_fn))
return false;
@@ -5274,2 +5278,37 @@ GOMP_OFFLOAD_is_shared_ptr (int device, const
void *ptr, size_t size)
+ if (hsa_fns.hsa_amd_pointer_info_fn)
+ {
+ hsa_amd_pointer_info_t info;
+ uint32_t nagents;
+ hsa_agent_t *agents;
+ info.size = sizeof (hsa_amd_pointer_info_t);
+
+ hsa_status_t status = hsa_fns.hsa_amd_pointer_info_fn (ptr,
&info, NULL,
+ &nagents, &agents);
+ if (status2 == HSA_STATUS_SUCCESS && info.type !=
HSA_EXT_POINTER_TYPE_UNKNOWN)
+ {
+ /* Owns the pointer; can be true even for nagents == 0. */
+ if (agent->id.handle == info.agentOwner.handle)
+ return info.sizeInBytes >= size;
+ for (unsigned i = 0; i < nagents; i++)
+ if (agent->id.handle == agents[0].handle)
+ return info.sizeInBytes >= size;
+ if (info.type != HSA_EXT_POINTER_TYPE_LOCKED)
+ return false; // Not host memory and belonging to other agents.
+ }
+
+ /* Assume memory is host accessible. */
+ bool svm_accessible;
+ hsa_system_info_t type = HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT;
+ hsa_status_t status2 = hsa_fns.hsa_system_get_info_fn (type,
&svm_accessible);
+ if (status2 == HSA_STATUS_SUCCESS && svm_accessible)
+ return true;
+ if (is_integrated_apu (agent, /* xnack */ true
+ /* FIXME: pass !(HSA_AMD_SYSTEM_INFO_XNACK_ENABLED) status here? */ ))
+ return true;
+
+ /* FIXME: Will the following provide additional 'true' cases or not? */
+ if (!hsa_fns.hsa_amd_svm_attributes_get_fn)
+ return false;
+
/* The HSA API doesn't seem to report for the whole range given, so
we call
---------------------------------------------------
* * *
For Nvidia, while it somehow works:
(A) I think we should run it on the right device,
i.e.
CUcontext old_ctx;
CUDA_CALL_ERET (false, cuCtxPushCurrent, ptx_dev->ctx);
....
CUDA_CALL_ASSERT (cuCtxPopCurrent, &old_ctx);
(B) I wonder whether it shouldn't be instead:
CU_POINTER_ATTRIBUTE_DEVICE_POINTER
I assume that this will also deal with USM, but I have not
checked whether additionally a USM check would make sense,
similar to the AMD part above or whether that's already
covered that way.
* * *
Otherwise, nothing spotted, but I still want to reread the
patch.
Tobias