On Sat, Jul 20, 2024 at 02:42:22PM -0600, Sandra Loosemore wrote:
> This patch implements the libgomp runtime support for the dynamic
> target_device selector via the GOMP_evaluate_target_device function.

For kind, isa and arch traits in the device sets we decide based on
compiler flags and overrides through target attribute etc., not on actual
hw capabilities (and I think we have to, it shouldn't be a dynamic
selection).

Now for kind, isa and arch traits in the target_device set this patch
decides based on compiler flags used to compile some routine in libgomp.so
or libgomp.a.

While this can work in the (very unfortunate) GCN state of things where
only exact isa match is possible (I really hope we can one day generalize
it by being able to compile for a set of isas by supporting lowest
denominator and patching the EM_* in the ELF header or something similar,
perhaps with runtime decisions on what to do for different CPUs), deciding
what to do based on how libgomp.a or libgomp.so.1 has been compiled for the
rest is IMHO wrong.

Now, at least in 5.2 I don't see a restriction that target_device trait
can't be used inside of selectors in a target region.
IMHO that is a bug in the standard.  E.g. it says that
"The expression of a device_num trait must evaluate to a non-negative integer 
value that is
less than or equal to the value returned by omp_get_num_devices."
but it is unspecified what happens when omp_get_num_devices is
called in the target region.
Not really sure if in the patch you actually support say metadirective
with target_device from inside of a target region querying properties of
say the host device or something similar.

If (hopefully) one can only query target_device on the host, then I think
the best would be that at least for the initial device we actually use
the ISAs etc. of whatever function queries that trait, rather than what
compiler flags were used to compile libgomp.so.1.  That would mean
returning from the function something to the caller to say it is actually
a host device and in the emitted code do the matching based on that rather
than on what the function would otherwise match.
That would then mean we don't need to supply special x86 etc. versions (and
whatever other host, powerpc, ..., where we just didn't define enough
details).

For other devices, this is harder because there is no specific offload code
associated with the target_device trait use.  Guess it would be best if
it could be picked from the minimum ISA actually supported in the offloading
code or something similar, by the time this is invoked libgomp should have
the offloading code (if any) already registered (unless it is in some
library dlopened later, that is fuzzy thing), so best would be e.g. for PTX
to watch the minimum required SM level of the code that is being registered,
whether stored in the offload section separately or figured out from the PTX
code being loaded.  But perhaps initially what you do for offloading devices
might be still ok.

> include/ChangeLog
>       * cuda/cuda.h (CUdevice_attribute): Add definitions for
>       CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and
>       CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR.
> 
> libgomp/ChangeLog
>       * Makefile.am (libgomp_la_SOURCES): Add selector.c.
>       * Makefile.in: Regenerate.
>       * config/gcn/selector.c: New.
>       * config/linux/selector.c: New.
>       * config/linux/x86/selector.c: New.
>       * config/nvptx/selector.c: New.
>       * libgomp-plugin.h (GOMP_OFFLOAD_evaluate_device): New.
>       * libgomp.h (struct gomp_device_descr): Add evaluate_device_func field.
>       * libgomp.map (GOMP_5.1.3): New, add GOMP_evaluate_target_device.
>       * libgomp.texi (OpenMP Context Selectors): Document dynamic selector
>       matching of kind/arch/isa.
>       * libgomp_g.h (GOMP_evaluate_current_device): New.
>       (GOMP_evaluate_target_device): New.
>       * oacc-host.c (host_evaluate_device): New.
>       (host_openacc_exec): Initialize evaluate_device_func field to
>       host_evaluate_device.
>       * plugin/plugin-gcn.c (gomp_match_selectors): New.
>       (gomp_match_isa): New.
>       (GOMP_OFFLOAD_evaluate_device): New.
>       * plugin/plugin-nvptx.c (struct ptx_device): Add compute_major and
>       compute_minor fields.
>       (nvptx_open_device): Read compute capability information from device.
>       (gomp_match_selectors): New.
>       (gomp_match_selector): New.
>       (CHECK_ISA): New macro.
>       (GOMP_OFFLOAD_evaluate_device): New.
>       * selector.c: New.
>       * target.c (GOMP_evaluate_target_device): New.
>       (gomp_load_plugin_for_device): Load evaluate_device plugin function.
> 
> Co-Authored-By: Kwok Cheung Yeung <k...@codesourcery.com>
> Co-Authored-By: Sandra Loosemore <san...@codesourcery.com>

> --- /dev/null
> +++ b/libgomp/config/gcn/selector.c
> @@ -0,0 +1,102 @@
> +/* Copyright (C) 2022 Free Software Foundation, Inc.

2022-2024

> +
> +/* The selectors are passed as strings, but are actually sets of multiple
> +   trait property names, separated by '\0' and with an extra '\0' at
> +   the end.  Match such a string SELECTORS against an array of strings
> +   CHOICES, that is terminated by a null pointer.
> +   matches.  */
> +static bool
> +gomp_match_selectors (const char *selectors, const char **choices)
> +{
> +  while (*selectors != '\0')
> +    {
> +      bool match = false;
> +      for (int i = 0; !match && choices[i]; i++)
> +     match = !strcmp (selectors, choices[i]);
> +      if (!match)
> +     return false;
> +      selectors += strlen (selectors) + 1;
> +    }
> +  return true;
> +}

Isn't this function the same on all arches?
If yes, shouldn't it be defined in one place (static inline function in
libgomp.h, or defined somewhere where it will be compiled for all libgomp
targets?

> +bool
> +GOMP_evaluate_current_device (const char *kind, const char *arch,
> +                           const char *isa)
> +{
> +  static const char *kind_choices[] = { "gpu", "nohost", NULL };

Is "any" handled on the compiler side and never makes it through here?

> --- /dev/null
> +++ b/libgomp/config/linux/selector.c

Why does this exist at all?  Isn't the libgomp/selector.c the same?

> --- /dev/null
> +++ b/libgomp/config/linux/x86/selector.c

As I wrote earlier, I'd strongly prefer if the host device part was
done on the compiler side based on actual compiler flags, not decide
based on libgomp.so compilation flags (which are mostly the same at least
in distro builds, lowest common denominator).
> +#ifdef __AVX2__
> +       "avx2",
> +#endif
> +#ifdef __AVX512F__
> +  "avx512f",

This one is misindented.

Also, any time one adds a new isa this would need to be updated.)

> --- a/libgomp/oacc-host.c
> +++ b/libgomp/oacc-host.c
> @@ -136,6 +136,16 @@ host_run (int n __attribute__ ((unused)), void *fn_ptr, 
> void *vars,
>    fn (vars);
>  }
>  
> +static bool
> +host_evaluate_device (int device_num __attribute__ ((unused)),
> +                   const char *kind __attribute__ ((unused)),
> +                   const char *arch __attribute__ ((unused)),
> +                   const char *isa __attribute__ ((unused)))
> +{
> +  __builtin_unreachable ();

GOMP_fatal or at least abort (); please.

Plus, what Tobias said about the passed in device_num, decide whether it
is the actual device number, or one with backwards compatibility
transformations and depending on that adjust it on the runtime side or
on the library side.

        Jakub

Reply via email to