Re: [PATCH] OpenMP: Disable GPU threads when only teams are used

Jakub Jelinek via Gcc-patches Fri, 03 Jul 2020 04:06:00 -0700

On Thu, Jul 02, 2020 at 10:16:25PM +0100, Andrew Stubbs wrote:
> On 02/07/2020 18:00, Jakub Jelinek wrote:
> > On Thu, Jul 02, 2020 at 05:15:20PM +0100, Andrew Stubbs wrote:
> > > This patch, originally by Kwok, auto-adjusts the default OpenMP target
> > > arguments to set num_threads(1) when there are no parallel regions. There
> > > may still be multiple teams in this case.
> > > 
> > > The result is that libgomp will not attempt to launch GPU threads that 
> > > will
> > > never get used.
> > > 
> > > OK to commit?
> > 
> > That doesn't look safe to me.
> > My understanding of the patch is that it looks for parallel construct
> > lexically in the target region, but that isn't sufficient, one can do that
> > only if the target region can't encounter a parallel construct in the target
> > region (i.e. the body and all functions that are called from it at runtime).
> 
> OpenMP is complicated. :-(


And it is and getting worse.

> Is it normally expected that the runtime will always launch the maximum
> number of threads, just in case?

That is an implementation detail, the OpenMP model doesn't require that.
The question is whether when encountering the parallel you can ask for more
threads or not.  E.g. on the host or in the host fallback, that is the case, we
can just pthread_create as many threads as needed, for PTX there is the
theoretical possibility to use dynamic parallelism, but I think it doesn't
really work well and there were major problems with that.

Anyway, I'd think OpenMP code that will only do teams and not parallel
paralelism will be very rare in practice, it is true that in our testsuite
we have probably a lot of tests for that but those are artificial tests.
If somebody wants to get as much as possible from the hw, one should use all
of teams, parallel and simd parallelism.

If the user put an explicit thread_limit clause, I'd just trust the user
what he is doing.  If not, it is implementation defined what the maximum
will be, but I'd say using a maximum of 1 if we don't find a parallel
construct lexically nested is not a good default, even when it can be
conforming.  Because a reasonable application will have the parallel
parallelism burried in one or more of the functions it calls, or if not,
will use explicit thread_limit(1).

If you want to perform some IPA analysis for this and tweak the default
thread_limit based on what it (conservatively) finds out, I have nothing
against that.

        Jakub

Re: [PATCH] OpenMP: Disable GPU threads when only teams are used

Reply via email to