Hi Andrew! On 2024-03-07T11:38:27+0000, Andrew Stubbs <a...@baylibre.com> wrote: > On 07/03/2024 11:29, Thomas Schwinge wrote: >> On 2019-11-12T13:29:16+0000, Andrew Stubbs <a...@codesourcery.com> wrote: >>> This patch contributes the GCN libgomp plugin, with the various >>> configure and make bits to go with it. >> >> An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is >> different from the libgomp-level host-fallback execution): >> >>> --- /dev/null >>> +++ b/libgomp/plugin/plugin-gcn.c >> >>> +/* Flag to decide if the runtime should suppress a possible fallback to >>> host >>> + execution. */ >>> + >>> +static bool suppress_host_fallback; >> >>> +static void >>> +init_environment_variables (void) >>> +{ >>> + [...] >>> + if (secure_getenv ("GCN_SUPPRESS_HOST_FALLBACK")) >>> + suppress_host_fallback = true; >>> + else >>> + suppress_host_fallback = false; >> >>> +/* Return true if the HSA runtime can run function FN_PTR. */ >>> + >>> +bool >>> +GOMP_OFFLOAD_can_run (void *fn_ptr) >>> +{ >>> + struct kernel_info *kernel = (struct kernel_info *) fn_ptr; >>> + >>> + init_kernel (kernel); >>> + if (kernel->initialization_failed) >>> + goto failure; >>> + >>> + return true; >>> + >>> +failure: >>> + if (suppress_host_fallback) >>> + GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed"); >>> + GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n"); >>> + return false; >>> +} >> >> This originates in the libgomp HSA plugin, where the idea was -- in my >> understanding -- that you wouldn't have device code available for all >> 'fn_ptr's, and in that case transparently (shared-memory system!) do >> host-fallback execution. Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set, >> you'd get those diagnosed. >> >> This has then been copied into the libgomp GCN plugin (see above). >> However, is it really still applicable there; don't we assume that we're >> generating device code for all relevant functions? (I suppose everyone >> really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?) Should we thus >> actually remove 'suppress_host_fallback' (that is, make it >> always-'true'), including removal of the 'can_run' hook? (I suppose that >> even in a future shared-memory "GCN" configuration, we're not expecting >> to use this again; expecting always-'true' for 'can_run'?) >> >> >> Now my actual issue: the libgomp GCN plugin then invented an additional >> use of 'GCN_SUPPRESS_HOST_FALLBACK': >> >>> +/* Initialize hsa_context if it has not already been done. >>> + Return TRUE on success. */ >>> + >>> +static bool >>> +init_hsa_context (void) >>> +{ >>> + hsa_status_t status; >>> + int agent_index = 0; >>> + >>> + if (hsa_context.initialized) >>> + return true; >>> + init_environment_variables (); >>> + if (!init_hsa_runtime_functions ()) >>> + { >>> + GCN_WARNING ("Run-time could not be dynamically opened\n"); >>> + if (suppress_host_fallback) >>> + GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed"); >>> + return false; >>> + } >> >> That is, if 'GCN_SUPPRESS_HOST_FALLBACK' is (routinely) set (for its >> original purpose), and you have the libgomp GCN plugin configured, but >> don't have 'libhsa-runtime64.so.1' available, you run into a fatal error. >> >> The libgomp nvptx plugin in such cases silently disables the >> plugin/device (and thus lets libgomp proper do its thing), and I propose >> we do the same here. OK to push the attached >> "GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to >> 'init_hsa_runtime_functions' is not fatal"? > > If you try to run the offload testsuite on a device that is not properly > configured then we want FAIL
Exactly, and that's what I'm working towards. (Currently we're not implementing that properly.) But why is 'GCN_SUPPRESS_HOST_FALLBACK' controlling 'init_hsa_runtime_functions' relevant for that? As you know, that function only deals with dynamically loading 'libhsa-runtime64.so.1', and Failure to load that one (because it doesn't exist) should have the agreed-upon behavior of *not* raising an error. (Any other, later errors should be fatal, I certainly agree.) > not pass-via-fallback. You're breaking that. Sorry, I don't follow, please explain? Grüße Thomas