On 9/10/20 4:20 AM, Richard Biener wrote:
On Wed, Sep 9, 2020 at 7:55 PM Sandra Loosemore <san...@codesourcery.com> wrote:

This set of patches implements C/C++ and Fortran front end support for
adding "acc loop auto" annotations to loop nests in OpenACC kernels
regions.  For background on this, refer to Thomas Schwinge's talk from
last year's Cauldron, at

https://gcc.gnu.org/wiki/cauldron2019talks?action=AttachFile&do=view&target=OpenACC+kernels-cauldron2019.pdf

In particular, pages 20-24 describe this part of the work.  We're
trying to identify loops that might be parallelizable and convert them
to ACC_LOOP tree structures for further analysis, instead of lowering
them to goto form early in compilation, as we do with ordinary
for/while/do loops in C/C++ and DO loops in Fortran.

So the issue I ran into when trying a simplistic "transfer" of DO CONCURRENT
is that variables in DO CONCURRENT scope get moved to function scope
by simplification and nothing prevents optimizers from extending lifetime
of those which means we end up eventually creating additional cross-iteration
dependences and the result is a loop that is no longer satisfying 'DO
CONCURRENT'.

I don't have any background on this issue, but I think it must be orthogonal? My patch only examines EXEC_DO, not EXEC_DO_CONCURRENT.

I realize OACC handling is hacked in place in a set of passes during early
optimization so these kinds of transforms simply might not happen "yet"
(by luck - nothing made them "invalid" on GIMPLE).

I didn't look at how you "annotate" and until when the annotation prevails
(the headers of the two patches don't say so either) so maybe you will
not have such issues by design?

The strategy is pretty simple: it does a code walk to examine the parsed form of ordinary loop constructs (EXEC_DO in Fortran, FOR_STMT in the newly combined C/C++ representation) within a kernels region.

If any loop in a nest has an explicit "acc loop" annotation, the annotator ignores that entire nest on the theory that the user has already indicated what parallelism they want, except for combined "acc kernels loop" directives where the intent in actual code seems to be to try to optimize the entire nest.

It does some sanity checks about modification of the loop variable in the body of the loop, etc. If it looks plausible, the annotator changes the representation to the equivalent of "acc loop auto", and it's up to later passes to figure out whether "auto" can be compiled as "parallel" or if it has to fall back to "seq". I tried to add a lot of comments throughout the code explaining the rationale for the various heuristics and restrictions controlling the annotation.
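
As a rough illustration of the decision logic described above, here is a toy model in C. It is only a sketch and not the code from the patches: the `struct loop` type, the `loop_clause` enum, and the `annotate_nest` function are hypothetical names invented for this example, and the real annotator works on the front ends' parse trees. Two behaviors are assumptions on my part: a loop that fails the sanity check is simply left unannotated while the rest of the nest is still considered, and in the combined "acc kernels loop" case an explicitly annotated loop keeps its annotation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; these are NOT GCC's data structures. */
enum loop_clause {
    CLAUSE_NONE,      /* ordinary loop, no OpenACC annotation */
    CLAUSE_EXPLICIT,  /* the user wrote an explicit "acc loop" */
    CLAUSE_AUTO       /* annotated as the equivalent of "acc loop auto" */
};

struct loop {
    enum loop_clause clause;
    bool modifies_index;   /* body assigns to the loop variable */
    struct loop *inner;    /* next loop in the nest, NULL at the innermost */
};

/* True if any loop in the nest carries an explicit "acc loop". */
static bool nest_has_explicit(const struct loop *l)
{
    for (; l != NULL; l = l->inner)
        if (l->clause == CLAUSE_EXPLICIT)
            return true;
    return false;
}

/* Annotate one loop nest inside a kernels region.
   Returns the number of loops that were marked "auto". */
static int annotate_nest(struct loop *outer, bool combined_kernels_loop)
{
    assert(outer != NULL);

    /* The user already said what parallelism they want: leave the whole
       nest alone, unless this is a combined "acc kernels loop". */
    if (nest_has_explicit(outer) && !combined_kernels_loop)
        return 0;

    int annotated = 0;
    for (struct loop *l = outer; l != NULL; l = l->inner) {
        /* Sanity check (assumed to be per loop): skip loops whose body
           modifies the loop variable, and keep explicit annotations. */
        if (l->clause == CLAUSE_NONE && !l->modifies_index) {
            l->clause = CLAUSE_AUTO;
            annotated++;
        }
    }
    return annotated;
}
```

In this model, a plain two-deep nest gets both loops marked "auto", a nest containing an explicit "acc loop" is left untouched outside the combined-directive case, and whether "auto" ends up as "parallel" or "seq" is not modeled at all since that is left to later passes.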

-Sandra
