Andrew Stubbs wrote:
> + /* If USM has been requested and is supported by all devices
> +    of this type, set the capability accordingly. */
> + if (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)
> +   current_device.capabilities |= GOMP_OFFLOAD_CAP_SHARED_MEM;
> +
>
> This breaks my USM patches that add the omp_alloc support (because it
> now short-circuits all of those code-paths),

which I believe is fine. Your USM patches are for pseudo-USM, i.e. a
(useful) band-aid for systems where the memory is not truly
unified-shared memory but only specially tagged host memory is device
accessible (e.g. only memory allocated via cuMemAllocManaged). The
situation is quite similar for -foffload-memory=pinned.
I think if a user wants to have pseudo-USM, and requests it by passing
-foffload-memory=unified, we can add another flag to the internal
omp_requires_mask. By passing this option, a user should then also be
aware of all the unavoidable special-case issues of pseudo-USM and
cannot complain if they run into those.
If not, well, then the user either gets true USM (if supported) or
host fallback. Either of the two is perfectly fine.
With -foffload-memory=unified, the compiler can then add all the
omp_alloc calls and, e.g., set a new GOMP_REQUIRES_OFFLOAD_MANAGED
flag. If that's set, we wouldn't do the capability setting quoted
above in libgomp/target.c.
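
To make the proposal concrete, here is a minimal sketch of how the
quoted check could be guarded. It is not taken from any patch:
GOMP_REQUIRES_OFFLOAD_MANAGED is only the flag name suggested above, all
numeric values are placeholders, and adjust_capabilities is a
hypothetical helper standing in for the code in libgomp/target.c.

```c
#include <stdbool.h>

/* Placeholder values for this sketch only; the real definitions live in
   libgomp's headers and may differ.  */
#define GOMP_REQUIRES_UNIFIED_SHARED_MEMORY 0x20
#define GOMP_REQUIRES_OFFLOAD_MANAGED       0x100  /* hypothetical new flag */
#define GOMP_OFFLOAD_CAP_SHARED_MEM         (1u << 0)

/* Hypothetical helper: returns the capabilities a device would end up
   with.  Shared memory is only advertised when USM was required and the
   compiler did not request the managed-memory (pseudo-USM) scheme.  */
static unsigned int
adjust_capabilities (unsigned int capabilities, int requires_mask)
{
  bool usm = requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY;
  bool managed = requires_mask & GOMP_REQUIRES_OFFLOAD_MANAGED;

  if (usm && !managed)
    capabilities |= GOMP_OFFLOAD_CAP_SHARED_MEM;
  return capabilities;
}
```

With the managed flag set, the omp_alloc code paths would stay active
because the shared-memory capability is never turned on.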
For nvidia, GOMP_REQUIRES_OFFLOAD_MANAGED probably requires
CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS, i.e. when it is 0, we
probably want to return -1 also for -foffload-memory=unified. A quick
check shows that Tesla K20 (Kepler, sm_35) has 0 while Volta, Ada,
Ampere (sm_70, sm_82, sm_89) have 1. (I recall using managed memory on
an old system; page migration to the device worked fine, but an on-host
access while the kernel was still running crashed the program.)
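
As a rough illustration of that check, the decision could look like the
sketch below. The function name and its parameters are made up for this
example; in a real plugin, the per-device values would come from
cuDeviceGetAttribute with CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS
instead of being passed in as an array.

```c
#include <stdbool.h>

/* Hypothetical helper: decides what a device-count query would report.
   concurrent_managed_access[i] stands in for the attribute value of
   device i (0 = not supported, 1 = supported).  */
static int
usable_device_count (int num_devices, bool managed_requested,
                     const int *concurrent_managed_access)
{
  if (!managed_requested)
    return num_devices;

  /* Managed memory was requested: every device of this type must allow
     concurrent host/device access, otherwise report "unsupported".  */
  for (int i = 0; i < num_devices; i++)
    if (concurrent_managed_access[i] == 0)
      return -1;

  return num_devices;
}
```

Passing the attribute values in keeps the sketch independent of the CUDA
headers; a Kepler-class device reporting 0 would then make the whole
device type unavailable when managed memory is requested.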
For amdgcn, my impression is that we don't need to handle
-foffload-memory=unified, as only the MI200 series (+ APUs) supports this
well, but MI200 also supports true USM (with page migration; for APUs it
makes even less sense). But, of course, we still may. Auto-setting
HSA_XNACK could still be done for MI200, but I wonder how to distinguish
MI300X vs. MI300A; then again, it probably doesn't harm (nor help) to set
HSA_XNACK for APUs …

> and it's just not true for devices where all host memory isn't
> magically addressable on the device.
> Is there another way to detect truly shared memory?

Do you have any indication that the current checks become true when the
memory is not accessible?
Tobias