Hi,

I concur that it would be useful to have information about USM support. I am less sure what we should test for.

Q1: Does the default device support USM (and is not the host)?
Q2: Is there a device that supports 'requires unified_shared_memory'?

At a glance, the answer for both is identical and answered by:

(a) Runtime access check:

    struct S { int *x; };
    int x = 6;
    struct S s;
    s.x = &x;
    #pragma omp target
      { if (*s.x == 6) *s.x = 7; }  // crashes if not USM
    if (x != 7) return 1;
    return 0;

(b) 'requires unified_shared_memory' (or self_maps) and check with omp_is_initial_device() inside a 'target' region whether the region did run on a device. [Contrary to the "effective-target offload_device_usm" in Thomas' email, this would use 'tofrom' and not 'from'.]

However:

(a) This answers Q1 (with a slight chance that the stack memory for x is accessible but not other memory). But 'Q1 = true → Q2 = true' is wrong. Example: Two AMD GPUs, one an APU and one a discrete GPU. As libgomp checks the overall property, there will be no 'requires USM' devices left, i.e. only host fallback.

(b) By construction, this answers Q2. However, for code that does not use 'requires USM', we don't know whether the (then) default device supports USM or only some other device. (Shows up with multiple Nvidia devices or when mixing AMD and Nvidia, where USM is only supported by some.)

* * *

A separate question is whether copy-mapping or self-mapping is used (assuming that the default device is a non-host device now):

(1) no requires clause
(2) requires unified_shared_memory
(3) requires self_maps

OK, (3) is clear: If fulfilled, we know that self-mapping is used.

For (1) and (2), my plan for GCC 16 is an environment variable that permits (if supported by the hardware) toggling between self-mapping and copy-mapping. For (2) and - for APUs - also for (1), the default would be self-maps. I guess this should be a three way flag: use-copy-map, prefer-self-copy, use-self-copy - where the latter will remove non-USM devices from the list with (1) [and be the same as the third for (2)]. [For (3), the env var has no effect.]
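
For illustration, check (b) from above could be sketched roughly as follows. This is only a sketch: the function name target_ran_on_usm_device is made up for this example, and the stub in the #else branch exists only so that the file also builds without -fopenmp (where everything runs on the host).

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Stub for builds without -fopenmp: the 'target' region then runs on the host. */
static int omp_is_initial_device(void) { return 1; }
#endif

/* Must be at file scope and precede any 'target' construct in this file. */
#pragma omp requires unified_shared_memory

/* Hypothetical helper: returns 1 if the default 'target' region ran on a
   non-host device even though USM is required, 0 on host fallback.  */
static int target_ran_on_usm_device(void)
{
  int on_host = 1;  /* Stays 1 if nothing on a device overwrites it. */

  /* 'tofrom' rather than 'from', as mentioned for check (b).  */
  #pragma omp target map(tofrom: on_host)
    on_host = omp_is_initial_device() != 0;

  assert(on_host == 0 || on_host == 1);  /* Flag is normalized above. */
  return !on_host;
}
```

A return value of 0 only says that no device fulfilling the requirement was used for the default device; as noted for (b), it says nothing about code that is compiled without the 'requires' directive.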

* * *

And the final question is: Which effective target is required (or would be useful)?

While Q1/Q2 are not detectable at runtime, at least self-mapping vs. copy-mapping is, and we can then also check this for every device, in particular whether 'target' is run on the host or on a device.

* * *

For libgomp.c-c++-common/target-1.c, the most interesting cases are copy-mapping on a USM machine and copy-mapping on a non-USM machine. Currently, that's done - but we cannot do some extra checks as we don't know whether the system is USM or not.

libgomp.c-c++-common/target-usm-1.c will test it on a known USM system. This could be copy- or self-mapping; while this is testable, GCC's current version will always use self-mapping and an updated version will do so by default.

If we knew Q1 (i.e. that the default device without requirement is actually USM), the extra checks of libgomp.c-c++-common/target-1.c could be enabled. (At least for the default device; the code actually iterates over all devices.)

Knowing Q2 might also work - a false negative does not matter (not doing the extra test even if we could), but a false positive would cause a crash. This is unlikely as it requires (a) multiple GPUs where one supports USM and the other does not and (b) that the default GPU is the one without USM. Still, it feels a bit fragile.

For this particular test case, USM access through pointer components while still doing mapping is crucial, as it is part of a complex mapping patch.

* * *

For general code, I am not really sure what it buys us.

For normal code, we can always check whether code runs on the host and whether self-mapping happens. Thus, if code expects USM to work, adding the requirement should be fine. This will cause some code to run on the host (host fallback), but that's fine - it just means a bit of extra host testing. And self-mapping vs. copy-mapping vs. host runs can be checked for.

And for non-USM code, the normal is-shared-memory check should be sufficient.
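
To make the "can be checked for" part concrete, here is a rough per-device sketch. The names run_mode, classify_device and report_devices are invented for this example, the stubs only serve builds without -fopenmp, and the loop assumes that the initial device has the number omp_get_num_devices(), as in current OpenMP versions. Note that a device which simply shares memory with the host would also show up as "self-mapping" here.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Stubs for builds without -fopenmp: no devices, everything runs on the host. */
static int omp_is_initial_device(void) { return 1; }
static int omp_get_num_devices(void) { return 0; }
static int omp_get_initial_device(void) { return 0; }
#endif

enum run_mode { RUN_HOST, RUN_COPY_MAP, RUN_SELF_MAP };

/* Hypothetical helper: classify how a 'target' region behaves on device DEV. */
static enum run_mode classify_device(int dev)
{
  int probe = 1;    /* Mapped 'to' only: a device-side copy is not copied back. */
  int on_host = 1;
  (void) dev;       /* Avoids an unused warning when the pragma is ignored. */

  #pragma omp target device(dev) map(to: probe) map(from: on_host)
  {
    on_host = omp_is_initial_device() != 0;
    probe = 2;      /* Visible on the host only if no separate copy was made. */
  }

  if (on_host)
    {
      assert(probe == 2);  /* Host execution modifies the original variable. */
      return RUN_HOST;
    }
  return probe == 2 ? RUN_SELF_MAP : RUN_COPY_MAP;
}

/* Print the classification for all devices, including the initial device. */
static void report_devices(void)
{
  static const char *const names[] = { "host", "copy-mapping", "self-mapping" };
  for (int dev = 0; dev <= omp_get_num_devices(); dev++)
    printf("device %d: %s\n", dev, names[classify_device(dev)]);
}
```

Calling report_devices() from a test's main would then show, per device, whether it ran on the host or with copy- or self-mapping.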

* * *

Note that 'self_maps' differs from 'unified_shared_memory' in two ways:

(i) 'self_maps' always requires self-mapping; 'unified_shared_memory' permits it but does not require it.
(ii) 'self_maps' requires that global variables work.

GCC 15 (+ mainline) currently only partially implements (ii); i.e. if there is 'declare target link', it works - but it does not automatically convert explicit/implicit 'declare target' to 'declare target link' for those.

* * *

Regarding the patch, I think especially the comments are likely to bitrot fast - besides the issue that it is not completely clear what the test should test for.

Adding '{ ! offload_device_usm }' will only save some CPU time, albeit it will also disable an (arguably not so important) test case.

Tobias