On 07/10/2025 11:31, Tobias Burnus wrote:
This patch adds the currently unused static function 'is_integrated_apu'
to libgomp/plugin/plugin-{gcn,nvptx}.c.
While currently not in use ('#if 0'), I'd like to add it already now
as prep work. The idea is to use it to enable self mapping automatically
by default if mapping is pointless (copying data despite sharing the
same memory controller). [See below for more.]
Any comments?
If going this far, why not make it live? I already posted the libgomp
parts a year ago, and there's not much:
https://patchwork.sourceware.org/project/gcc/patch/[email protected]/
There were some comments outside the scope of the patch, but mostly the
problem was what properties to query, which seems to be what you cover
here. (Also the Managed Memory question, but that's a future issue.)
Andrew
* * *
Regarding the property check:
For Nvidia GPUs, I am not sure whether it is useful as there does not
seem to be any integrated GPU so far - not even Grace Hopper.
For AMD GPUs, I have no idea whether it works for older GPUs, but for
MI300A it works - but XNACK needs to be enabled. It should also work
if the APU does not support XNACK.
Side remark: I think we eventually need to switch to per-device
capabilities on top of the per-device-type (Nvidia, GCN) capability
to support multi-GPU systems (e.g. AMD GPU with APU plus separate
discrete AMD GPU) in a more fine-grained way. This applies
likewise to auto self mapping and to 'omp requires unified_shared_memory'.
On the other hand, having one type of Nvidia or AMD GPUs is common
and disabling GPUs is also one way out (e.g. ROCR_VISIBLE_DEVICES).
For AMD, this goes in lockstep with compiled-for vs. not-compiled-for
GPUs as only one type is supported at a time. We should also eventually
handle this better (host fallback if for a GPU ISA no code is available,
supporting multiple ISAs in a binary), but that's not an urgent feature.
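
To illustrate the side remark above, here is a minimal sketch of why a
per-device-type capability is too coarse for a mixed system (APU plus
discrete GPU). The struct and function names are hypothetical and are not
libgomp's actual data structures; the rule that a device type only has a
capability if every device of that type has it is an assumption made for
this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device record; not libgomp's actual data structure.  */
struct dev_caps
{
  bool is_apu;        /* shares the host's memory controller */
  bool supports_usm;  /* can access host memory directly */
};

/* Per-device-type view (assumed rule): the capability holds only if
   every device of that type has it.  */
static bool
type_supports_self_maps (const struct dev_caps *devs, size_t n)
{
  if (n == 0)
    return false;
  for (size_t i = 0; i < n; i++)
    if (!devs[i].supports_usm)
      return false;
  return true;
}

/* Per-device view: each device is judged on its own.  */
static bool
device_supports_self_maps (const struct dev_caps *dev)
{
  return dev->supports_usm;
}
```

With one APU that supports USM and one discrete GPU that does not, the
per-type check rejects self maps for both devices, while the per-device
check still allows them on the APU.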
* * *
Regarding auto-USM support:
The new functions currently aren't used as for global variables
in static memory ('declare target'), 'map' still needs to copy data to/from
those. One solution is to only have 'declare target link' variables
as GCC initializes them (with USM) such that they link to the host
variable.
Thus, a prerequisite for this feature is the missing mapping support
in libgomp (OpenMP and OpenACC? Or only OpenMP as a starter?) and
for 'omp requires self_maps', the conversion to 'link' should also
happen automatically.
Once the mapping support is in, auto-USM can be enabled and I guess
we also want to have an environment variable to toggle between:
- always map (to override auto USM),
- use self-maps (for systems that support USM but aren't APUs), and
- force-use self-maps (for systems that report not supporting USM,
e.g. only one GPU supports it but the other does not, or similar issues).
[At your own risk. Better is to disable such GPUs, e.g. via
ROCR_VISIBLE_DEVICES, but it might still be useful at times.]
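
The three settings above could be combined with the APU check roughly as
follows. This is only a sketch: the mode names, the accepted strings, and
the default behaviour when the variable is unset (self-map only on an APU
that also reports USM support) are assumptions, not an existing libgomp
interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical modes mirroring the three options listed above.  */
enum self_map_mode
{
  SELF_MAP_AUTO,   /* variable unset: self-map only on integrated APUs */
  SELF_MAP_NEVER,  /* always map, overriding auto-USM */
  SELF_MAP_USM,    /* self-map on any device that reports USM support */
  SELF_MAP_FORCE   /* self-map even if the device reports no USM support */
};

/* Parse the value of a hypothetical environment variable.  The accepted
   strings are made up for this sketch; unknown values fall back to AUTO.  */
static enum self_map_mode
parse_self_map_mode (const char *value)
{
  if (value == NULL)
    return SELF_MAP_AUTO;
  if (strcmp (value, "map") == 0)
    return SELF_MAP_NEVER;
  if (strcmp (value, "self") == 0)
    return SELF_MAP_USM;
  if (strcmp (value, "force-self") == 0)
    return SELF_MAP_FORCE;
  return SELF_MAP_AUTO;
}

/* Decide whether to skip mapping for one device.  */
static bool
use_self_maps (enum self_map_mode mode, bool is_apu, bool supports_usm)
{
  switch (mode)
    {
    case SELF_MAP_NEVER:
      return false;
    case SELF_MAP_USM:
      return supports_usm;
    case SELF_MAP_FORCE:
      return true;
    case SELF_MAP_AUTO:
    default:
      /* Assumed default: only an APU with working USM self-maps.  */
      return is_apu && supports_usm;
    }
}
```

A caller would pass the result of getenv() for whatever variable name is
eventually chosen to parse_self_map_mode() and then consult
use_self_maps() per device.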
Tobias
PS: The APU check was tried with MI210 (false) and MI300A (true).
For the latter, both HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT
and HSA_AMD_SYSTEM_INFO_XNACK_ENABLED are only true if HSA_XNACK=1
was set. [The HSA_AMD_AGENT_INFO_SVM_DIRECT_HOST_ACCESS flag seems
to go in lockstep with HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU, except
that the former is unsurprisingly always true for the CPUs.]
For Grace-Hopper, CU_DEVICE_ATTRIBUTE_INTEGRATED = 0. I think GH
is currently the most integrated GPU by Nvidia and still uses
separate memory controllers (albeit a fast interconnect). [I ignore
embedded CPU/GPUs that Nvidia also offers.] Possibly, the Intel +
Nvidia collaboration will yield a CPU+GPU system for which the
flag will be true.