Hi Thomas,
First, I have the feeling that we are talking about (more or less) the
same code region and using the same words, but about rather different
things. Thus, you confuse me (and possibly Andrew), and my reply
confuses you.
Thomas Schwinge wrote:
On 2024-03-07T12:43:07+0100, Tobias Burnus <tbur...@baylibre.com> wrote:
Thomas Schwinge wrote:
First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it
is also not really desirable.
External users probably don't, but certainly all our internal testing is
setting it,
First, I doubt it. Secondly, if it were true, it has been broken for the
last 5 years or so, as we definitely did not notice failures due to
non-working offload devices, neither for AMD GCN nor ...
and also implicitly all nvptx offloading testing: simply by
means of having such knob in the libgomp nvptx plugin.
I did see it set in some places for AMD, but I do not see any
nvptx-specific environment variable that permits doing the same.
However:
That is, the
libgomp nvptx plugin has an implicit 'suppress_host_fallback = true' for
(the original meaning of) that flag
I think that's one of the problems here – you talk about
suppress_host_fallback (implicit, original meaning), while I talk about
the GCN_SUPPRESS_HOST_FALLBACK environment variable.
Besides all the talk about suppress_host_fallback, the
"'init_hsa_runtime_functions' is not fatal" part of the subject line
seems to be something to consider (beyond the patches you already
suggested).
If I use the system compiler on my Linux system, with nvptx + gcn support
installed, I get (with an nvptx permission problem):
$ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out
libgomp: GCN host fallback has been suppressed
And exit code = 1. The same result with '-foffload=disable' or with
'-foffload=nvptx-none'.
I can't tell if that's what you expect to see there, or not?
Well, obviously not that I get this error by default. As your wording
indicated that the internal variable would always be true, and not only
when the env var GCN_SUPPRESS_HOST_FALLBACK is explicitly set, I worry
that I would get the error every time.
(For avoidance of doubt: I'm expecting silent host-fallback execution in
case that libgomp GCN and/or nvptx plugins are available, but no
corresponding devices. That's what my patch achieves.)
I concur that the silent host fallback should happen by default (unless
env vars tell otherwise) - at least when either no code was generated
for the device (e.g. -foffload=disable), when the vendor runtime library
is not available, or when no device is usable (be it missing hardware or
missing permissions).
That's the current behavior and if that remains, my main concern evaporates.
* * *
If we want to remove it, we can make it always false - but I am strongly
against making it always true.
I'm confused. So you want the GCN and nvptx plugins to behave
differently in that regard?
No – or at least: not unless GCN_SUPPRESS_HOST_FALLBACK is set.
Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to
prevent the host fallback, but don't break somewhat common systems.
That's an orthogonal concept?
No – It's the same concept of the main use of the
GCN_SUPPRESS_HOST_FALLBACK environment variable: You get a run-time
error instead of a silent host fallback.
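To spell out the behavior I am arguing for, here is a minimal sketch in
C. It is only a model of the decision, not libgomp code: the names
'offload_config', 'decide_offload' and the 'outcome' enum are made up for
illustration, and 'device_usable' stands in for "code was generated for
the device, the vendor runtime library loaded, and a device with
sufficient permissions exists". For simplicity, the sketch matches the
OMP_TARGET_OFFLOAD values only in lower case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the behavior discussed in this mail;
   none of these names exist in libgomp.  */
enum outcome { RUN_ON_DEVICE, HOST_FALLBACK, RUNTIME_ERROR };

struct offload_config
{
  /* Value of OMP_TARGET_OFFLOAD, or NULL if unset.  */
  const char *omp_target_offload;
  /* True only if GCN_SUPPRESS_HOST_FALLBACK was set explicitly.  */
  bool gcn_suppress_host_fallback;
  /* Assumption: device code, vendor runtime and device are all usable.  */
  bool device_usable;
};

enum outcome
decide_offload (const struct offload_config *cfg)
{
  assert (cfg != NULL);
  const char *policy = cfg->omp_target_offload;

  /* 'disabled' always runs on the host.  */
  if (policy != NULL && strcmp (policy, "disabled") == 0)
    return HOST_FALLBACK;

  if (cfg->device_usable)
    return RUN_ON_DEVICE;

  /* No usable device: error out only if the user asked for it.  */
  if (policy != NULL && strcmp (policy, "mandatory") == 0)
    return RUNTIME_ERROR;
  if (cfg->gcn_suppress_host_fallback)
    return RUNTIME_ERROR;

  /* Default: silent host fallback.  */
  return HOST_FALLBACK;
}
```

The point of the last 'return' is that, without any of the two env
vars, a system with the plugins installed but without a usable device
still runs the program on the host.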
But throughout the whole thread I have had the feeling that, while
talking about the same code region and throwing in the same words, we
are actually talking about completely different things.
Tobias