On 28/09/2020 15:02, Tom de Vries wrote:
>> This patch simply skips barriers when they would "wait" for only one
>> thread (the current thread). This means that teams nested inside other
>> teams now run independently, instead of strictly in lock-step, and is
>> only valid as long as inner teams are limited to one thread each
>> (currently the case).
>
> Is this inner-team-one-thread-limit coded or documented somewhere?
In libgomp/parallel.c, gomp_resolve_num_threads we have:

  else if (thr->ts.active_level >= 1 && !icv->nest_var)
    return 1;
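To spell out the dependency, here is a minimal standalone sketch of how that check and the barrier skip fit together. Both function names are hypothetical and the logic is deliberately simplified: the real gomp_resolve_num_threads considers more than these inputs, and the barrier predicate is only my paraphrase of what the patch does, not its actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the quoted check: with nesting disabled,
   a parallel region opened inside an already active one is given a
   single thread.  Everything else the real function does is omitted.  */
static unsigned
sketch_resolve_num_threads (unsigned active_level, bool nest_var,
                            unsigned requested)
{
  if (active_level >= 1 && !nest_var)
    return 1;
  return requested;
}

/* Hypothetical predicate for the barrier skip: a barrier that would
   wait only for the calling thread has nothing to synchronize.  */
static bool
sketch_barrier_needed (unsigned team_size)
{
  assert (team_size >= 1);
  return team_size > 1;
}
```

Under these assumptions, an inner team created at active_level 1 with nesting disabled resolves to one thread, so sketch_barrier_needed returns false for it, while an outer team of 64 threads still needs its barrier. If the first function ever returned more than 1 for a nested region, the skip would no longer be safe.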
> If so, it might be good to add a comment there referring to the code
> this patch adds.
  /* Accelerators with fixed thread counts require this to return 1 for
     nested parallel regions.  */
WDYT?
Andrew