On Sat, Jul 20, 2024 at 02:42:22PM -0600, Sandra Loosemore wrote: > This patch implements the libgomp runtime support for the dynamic > target_device selector via the GOMP_evaluate_target_device function.
For kind, isa and arch traits in the device sets we decide based on compiler flags and overrides through target attribute etc., not on actual hw capabilities (and I think we have to, it shouldn't be a dynamic selection). Now for kind, isa and arch traits in the target_device set this patch decides based on compiler flags used to compile some routine in libgomp.so or libgomp.a. While this can work in the (very unfortunate) GCN state of things where only exact isa match is possible (I really hope we can one day generalize it by being able to compile for a set of isas by supporting lowest denominator and patching the EM_* in the ELF header or something similar, perhaps with runtime decisions on what to do for different CPUs), deciding what to do based on how libgomp.a or libgomp.so.1 has been compiled for the rest is IMHO wrong. Now, at least in 5.2 I don't see a restriction that target_device trait can't be used inside of selectors in a target region. IMHO that is a bug in the standard. E.g. it says that "The expression of a device_num trait must evaluate to a non-negative integer value that is less than or equal to the value returned by omp_get_num_devices." but it is unspecified what happens when omp_get_num_devices is called in the target region. Not really sure if in the patch you actually support say metadirective with target_device from inside of a target region querying properties of say the host device or something similar. If (hopefully) one can only query target_device on the host, then I think the best would be that at least for the initial device we actually use the ISAs etc. of whatever function queries that trait, rather than what compiler flags were used to compile libgomp.so.1. That would mean returning from the function something to the caller to say it is actually a host device and in the emitted code do the matching based on that rather than on what the function would otherwise match. That would then mean we don't need to supply special x86 etc. versions (and whatever other host, powerpc, ..., where we just didn't define enough details). For other devices, this is harder because there is no specific offload code associated with the target_device trait use. Guess it would be best if it could be picked from the minimum ISA actually supported in the offloading code or something similar, by the time this is invoked libgomp should have the offloading code (if any) already registered (unless it is in some library dlopened later, that is fuzzy thing), so best would be e.g. for PTX to watch the minimum required SM level of the code that is being registered, whether stored in the offload section separately or figured out from the PTX code being loaded. But perhaps initially what you do for offloading devices might be still ok. > include/ChangeLog > * cuda/cuda.h (CUdevice_attribute): Add definitions for > CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and > CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR. > > libgomp/ChangeLog > * Makefile.am (libgomp_la_SOURCES): Add selector.c. > * Makefile.in: Regenerate. > * config/gcn/selector.c: New. > * config/linux/selector.c: New. > * config/linux/x86/selector.c: New. > * config/nvptx/selector.c: New. > * libgomp-plugin.h (GOMP_OFFLOAD_evaluate_device): New. > * libgomp.h (struct gomp_device_descr): Add evaluate_device_func field. > * libgomp.map (GOMP_5.1.3): New, add GOMP_evaluate_target_device. > * libgomp.texi (OpenMP Context Selectors): Document dynamic selector > matching of kind/arch/isa. > * libgomp_g.h (GOMP_evaluate_current_device): New. > (GOMP_evaluate_target_device): New. > * oacc-host.c (host_evaluate_device): New. > (host_openacc_exec): Initialize evaluate_device_func field to > host_evaluate_device. > * plugin/plugin-gcn.c (gomp_match_selectors): New. > (gomp_match_isa): New. > (GOMP_OFFLOAD_evaluate_device): New. > * plugin/plugin-nvptx.c (struct ptx_device): Add compute_major and > compute_minor fields. > (nvptx_open_device): Read compute capability information from device. > (gomp_match_selectors): New. > (gomp_match_selector): New. > (CHECK_ISA): New macro. > (GOMP_OFFLOAD_evaluate_device): New. > * selector.c: New. > * target.c (GOMP_evaluate_target_device): New. > (gomp_load_plugin_for_device): Load evaluate_device plugin function. > > Co-Authored-By: Kwok Cheung Yeung <k...@codesourcery.com> > Co-Authored-By: Sandra Loosemore <san...@codesourcery.com> > --- /dev/null > +++ b/libgomp/config/gcn/selector.c > @@ -0,0 +1,102 @@ > +/* Copyright (C) 2022 Free Software Foundation, Inc. 2022-2024 > + > +/* The selectors are passed as strings, but are actually sets of multiple > + trait property names, separated by '\0' and with an extra '\0' at > + the end. Match such a string SELECTORS against an array of strings > + CHOICES, that is terminated by a null pointer. > + matches. */ > +static bool > +gomp_match_selectors (const char *selectors, const char **choices) > +{ > + while (*selectors != '\0') > + { > + bool match = false; > + for (int i = 0; !match && choices[i]; i++) > + match = !strcmp (selectors, choices[i]); > + if (!match) > + return false; > + selectors += strlen (selectors) + 1; > + } > + return true; > +} Isn't this function the same on all arches? If yes, shouldn't it be defined in one place (static inline function in libgomp.h, or defined somewhere where it will be compiled for all libgomp targets? > +bool > +GOMP_evaluate_current_device (const char *kind, const char *arch, > + const char *isa) > +{ > + static const char *kind_choices[] = { "gpu", "nohost", NULL }; Is "any" handled on the compiler side and never makes it through here? > --- /dev/null > +++ b/libgomp/config/linux/selector.c Why does this exist at all? Isn't the libgomp/selector.c the same? > --- /dev/null > +++ b/libgomp/config/linux/x86/selector.c As I wrote earlier, I'd strongly prefer if the host device part was done on the compiler side based on actual compiler flags, not decide based on libgomp.so compilation flags (which are mostly the same at least in distro builds, lowest common denominator). > +#ifdef __AVX2__ > + "avx2", > +#endif > +#ifdef __AVX512F__ > + "avx512f", This one is misindented. Also, any time one adds a new isa this would need to be updated.) > --- a/libgomp/oacc-host.c > +++ b/libgomp/oacc-host.c > @@ -136,6 +136,16 @@ host_run (int n __attribute__ ((unused)), void *fn_ptr, > void *vars, > fn (vars); > } > > +static bool > +host_evaluate_device (int device_num __attribute__ ((unused)), > + const char *kind __attribute__ ((unused)), > + const char *arch __attribute__ ((unused)), > + const char *isa __attribute__ ((unused))) > +{ > + __builtin_unreachable (); GOMP_fatal or at least abort (); please. Plus, what Tobias said about the passed in device_num, decide whether it is the actual device number, or one with backwards compatibility transformations and depending on that adjust it on the runtime side or on the library side. Jakub