Hi all,

Jakub Jelinek wrote:
On Sat, Jul 20, 2024 at 02:42:22PM -0600, Sandra Loosemore wrote:
This patch implements the libgomp runtime support for the dynamic
target_device selector via the GOMP_evaluate_target_device function.
[…]

Now for kind, isa and arch traits in the target_device set this patch
decides based on compiler flags used to compile some routine in libgomp.so
or libgomp.a.

While this can work in the (very unfortunate) GCN state of things where
only exact isa match is possible (I really hope we can one day generalize
it by being able to compile for a set of isas by supporting lowest
denominator and patching the EM_* in the ELF header or something similar,
perhaps with runtime decisions on what to do for different CPUs),

I think that can only work to some extent. LLVM has "gfx11-generic", which is compatible with gfx110{0,1,2,3} and gfx115{0,1,2}, which at least helps a bit. For gfx10, it has gfx10-1-generic for gfx101{0,1,2,3} and gfx10-3-generic for gfx103[0-6], and for gfx9 it has gfx9-generic for gfx90{0,2,4,6,9,c}.
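
For illustration only, the grouping listed above could be written as a small lookup table. This is a sketch, not existing GCC or LLVM code: the struct, the table and the function name gcn_generic_for() are made up here, and the member lists are simply the ones quoted in this mail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: one LLVM "generic" target and the concrete ISAs
   that, per the list above, are compatible with it.  */
struct generic_group
{
  const char *generic;
  const char *members[8];	/* NULL-terminated.  */
};

static const struct generic_group generic_groups[] = {
  { "gfx9-generic",
    { "gfx900", "gfx902", "gfx904", "gfx906", "gfx909", "gfx90c", NULL } },
  { "gfx10-1-generic",
    { "gfx1010", "gfx1011", "gfx1012", "gfx1013", NULL } },
  { "gfx10-3-generic",
    { "gfx1030", "gfx1031", "gfx1032", "gfx1033", "gfx1034", "gfx1035",
      "gfx1036", NULL } },
  { "gfx11-generic",
    { "gfx1100", "gfx1101", "gfx1102", "gfx1103", "gfx1150", "gfx1151",
      "gfx1152", NULL } },
};

/* Return the generic target covering ISA, or NULL if the table has none
   (in which case an exact-match library would still be needed).  */
const char *
gcn_generic_for (const char *isa)
{
  size_t n = sizeof generic_groups / sizeof generic_groups[0];
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; generic_groups[i].members[j] != NULL; j++)
      if (strcmp (generic_groups[i].members[j], isa) == 0)
	return generic_groups[i].generic;
  return NULL;
}
```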

Thus, we could have versions which support a common subset, but we still need multiple libraries. And it needs to be implemented …

This sounds like a task for the GCN maintainer …

* * *

deciding what to do based on how libgomp.a or libgomp.so.1 has been compiled
for the rest is IMHO wrong.

I wonder whether we should do something like the following.

[The following is a mix of compiler code and generated code, for
illustrative purposes.]

Inside the compiler do:

#ifndef ACCEL_COMPILER
   int r = 0;
   if (targetm.omp.device_kind_arch_isa != NULL)
     r = targetm.omp.device_kind_arch_isa (omp_device_{kind,arch,isa}, val);

   if (dev_num && TREE_CODE (dev_num) == INTEGER_CST)
     {
       if (dev_num < -1 /* INVALID_DEVICE or nonconforming */)
         → 0
       if (dev_num == initial_device)
         → r
     }
<codegen>
     /* The '? :' condition is a compile time condition. */
     d = <dev_num> ? <dev_num> : omp_get_default_device ();
     if (d < -1)
       → 0
     else if (d == -1 || d == omp_get_initial_device ())
       → r
     else
       → GOMP_get_device_kind_arch_isa  (d, kind, arch, isa)
</codegen>
#else
   /* VARIANT 1: Assume that neither reverse offload nor nested target occurs.  */
   →targetm.omp.device_kind_arch_isa  (kind, arch, isa)
   /* VARIANT 2 */
   d = <dev_num> ? <dev_num> : omp_get_default_device ();
   if (d == omp_get_device_num ())
     →targetm.omp.device_kind_arch_isa  (kind, arch, isa)
   else
     /* Cannot really do anything here - and as no nested target is permitted,
        use 'false'.  */
     → 0
#endif
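
To make the host-side (non-ACCEL_COMPILER) generated code above a bit more concrete, here is a sketch of the same decision as a plain C function. Everything in it is hypothetical: the function name, the callback type standing in for GOMP_get_device_kind_arch_isa, and the stub plugin query. The default and initial device numbers are passed in as arguments instead of calling omp_get_default_device () and omp_get_initial_device (), so that it compiles without libgomp.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the proposed GOMP_get_device_kind_arch_isa.  */
typedef bool (*device_query_fn) (int dev, const char *kind,
				 const char *arch, const char *isa);

/* HOST_RESULT corresponds to 'r' above, i.e. the compile-time answer for
   the host.  DEV_NUM_GIVEN models the compile-time '? :' condition.  */
bool
evaluate_target_device (int dev_num, bool dev_num_given,
			int default_device, int initial_device,
			bool host_result, device_query_fn query,
			const char *kind, const char *arch, const char *isa)
{
  int d = dev_num_given ? dev_num : default_device;
  if (d < -1)
    return false;		/* Invalid or nonconforming device number.  */
  if (d == -1 || d == initial_device)
    return host_result;		/* Host: use the compile-time answer.  */
  return query (d, kind, arch, isa);	/* Offload device: ask the plugin.  */
}

/* Stub for experimenting: pretends device 0 is a GCN device with ISA
   gfx90a and ignores KIND and ARCH.  */
bool
example_plugin_query (int dev, const char *kind, const char *arch,
		      const char *isa)
{
  (void) kind;
  (void) arch;
  return dev == 0 && isa != NULL && strcmp (isa, "gfx90a") == 0;
}
```

With the stub, device 0 answers true only for isa "gfx90a", while device numbers below -1 always give false and the host simply returns whatever the compiler computed.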


* * *

And on the libgomp side GOMP_get_device_kind_arch_isa → plugin code.

And there:

(A) GCN:

kind and arch are clear. For ISA:

agent->device_isa + use existing isa_hsa_name() function (or likewise).

(B) Nvptx:

cuDeviceGetAttribute with CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR (= 75) and CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR (= 76).

Example: sm_89 = (major) 8 and (minor) 9.
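
As a sketch of how the plugin could turn the two values into an ISA name: the helper below only does the formatting and assumes that major and minor have already been obtained via cuDeviceGetAttribute with the two attributes named above. The function name nvptx_isa_name () is made up for this mail.

```c
#include <stdio.h>

/* Write "sm_<major><minor>" into BUF (at most LEN bytes including the
   terminating NUL) and return what snprintf returns.  MAJOR and MINOR
   are assumed to come from the two compute-capability attributes.  */
int
nvptx_isa_name (int major, int minor, char *buf, size_t len)
{
  return snprintf (buf, len, "sm_%d%d", major, minor);
}
```

The resulting string could then be compared against the isa trait from the selector.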

* * *

Does this sound sensible?

Tobias

PS: For the current host-offload GSoC task, we might eventually think of using cpuid on x86-64, i.e. gcc/config/i386/cpuid.h.

PPS: RFC remains: Should 'sm_80' be true if the hardware/compilation is 'sm_89' or not? Namely: Does 'sm_80' denote the capability or the specific hardware?

Regarding this topic, see also https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662059.html
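
To spell out the two readings of 'sm_80' from the RFC above, here is a small sketch with both variants side by side. The function names are made up, and comparing the number after "sm_" as a plain integer is a simplification that ignores any suffix letters; it does not say which reading is the right one.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse "sm_<number>"; return the number, or -1 if NAME has another
   form (including trailing characters after the number).  */
static int
parse_sm (const char *name)
{
  int value;
  char trailing;
  if (sscanf (name, "sm_%d%c", &value, &trailing) != 1 || value < 0)
    return -1;
  return value;
}

/* Reading 1: the selector names the specific hardware.  */
bool
sm_matches_exact (const char *selector, const char *device)
{
  int s = parse_sm (selector), d = parse_sm (device);
  return s >= 0 && s == d;
}

/* Reading 2: the selector names a capability; newer devices match, too.  */
bool
sm_matches_capability (const char *selector, const char *device)
{
  int s = parse_sm (selector), d = parse_sm (device);
  return s >= 0 && d >= s;
}
```

With selector 'sm_80' on an 'sm_89' device, the first variant yields false and the second yields true.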
