Hi Thomas,

First, I have the feeling that we are talking about (more or less) the same code region and using the same words – but that we mean rather different things. Thus, you confuse me (and possibly Andrew) – and my reply confuses you.

Thomas Schwinge wrote:
On 2024-03-07T12:43:07+0100, Tobias Burnus<tbur...@baylibre.com>  wrote:
Thomas Schwinge wrote:
First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it
is also not really desirable.
External users probably don't, but certainly all our internal testing is
setting it,

First, I doubt it – secondly, if it were true, it would have been broken for the last 5 years or so, as we definitely did not notice failures due to non-working offload devices – neither for AMD GCN nor ...

and also implicitly all nvptx offloading testing: simply by
means of having such knob in the libgomp nvptx plugin.

I did see it set in some places for AMD, but I do not see any nvptx-specific environment variable that permits doing the same.

However:
  That is, the
libgomp nvptx plugin has an implicit 'suppress_host_fallback = true' for
(the original meaning of) that flag

I think that's one of the problems here – you talk about suppress_host_fallback (implicit, original meaning), while I talk about the GCN_SUPPRESS_HOST_FALLBACK environment variable.

Besides all the talk about suppress_host_fallback, the "'init_hsa_runtime_functions' is not fatal" part of the subject line seems to be something to be considered (beyond the patches you already suggested).


If I run on my Linux system the system compiler with nvptx + gcn support
installed, I get (with a nvptx permission problem):

$ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out

libgomp: GCN host fallback has been suppressed

And exit code = 1. The same result with '-foffload=disable' or with
'-foffload=nvptx-none'.
I can't tell if that's what you expect to see there, or not?

Well, obviously not that I get this error by default – and as your wording indicated that the internal variable will always be true, and not only when the env var GCN_SUPPRESS_HOST_FALLBACK is explicitly set, I worry that I would get the error every time.

(For avoidance of doubt: I'm expecting silent host-fallback execution in
case that libgomp GCN and/or nvptx plugins are available, but no
corresponding devices.  That's what my patch achieves.)

I concur that the silent host fallback should happen by default (unless env vars tell otherwise) – at least when either no code was generated for the device (e.g. -foffload=disable), or when the vendor runtime library is not available, or when no device is usable (be it no hardware or no permission).

That's the current behavior and if that remains, my main concern evaporates.

* * *

If we want to remove it, we can make it always false - but I am strongly
against making it always true.
I'm confused.  So you want the GCN and nvptx plugins to behave
differently in that regard?
No – or at least: not unless GCN_SUPPRESS_HOST_FALLBACK is set.
Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to
prevent the host fallback, but don't break somewhat common systems.
That's an orthogonal concept?

No – it is the same concept as the main use of the GCN_SUPPRESS_HOST_FALLBACK environment variable: you get a run-time error instead of a silent host fallback.

But throughout the whole thread I have had the feeling that – while talking about the same code region and throwing in the same words – we are actually talking about completely different things.

Tobias
