Hi!

On Thu, 26 Sep 2013 22:19:31 +0400, Evgeny Gavrin <e.gav...@samsung.com> wrote:
>       My colleagues and I shared our current implementation of OpenACC 1.0
> to the [openacc-1_0-branch].

Many thanks for posting this; I had a first look at your patch.  I'm
still learning my share of GCC internals in this area; this is by no
means an in-depth review.

Before discussing your changes, we should first clarify some more
high-level questions about how to continue all our work.  (Thus, adding a
few more people to be CCed.)

Is my understanding correct that the GCC policy regarding extensions such
as support for OpenACC or OpenMP 4 is: first develop and polish this on a
branch (such as openacc-1_0-branch or gomp-4_0-branch), and once
*everything* of the respective standard has been implemented, the
development branch is then merged into mainline (closing it at the same
time), instead of committing individual sub-features (such as support for
only »#pragme acc parallel« but not yet covering the whole respective
standard) directly to trunk?  The issue with the latter, I assume, is
that such half-finished implementations in trunk might delay/disturb GCC
releases.

OpenACC and OpenMP's target construct are very similar in that they can
be expected to share at least a large part of the "device offloading"
functionality: managing memory mappings and data copying, device
detection, initalization, code offloading, and so on.  In my
understanding it makes no sense to duplicate that, and instead the
OpenACC implementation shall re-use what has already been implemented for
OpenMP, or rather: is currently being implemented on gomp-4_0-branch
(using per-device plugins).  For this, I think the OpenACC implementation
should either be done on a branch based on gomp-4_0-branch, or directly
in gomp-4_0-branch.  (This is assuming that gomp-4_0-branch will not
"soon" be merged into trunk.)  Having a separate branch from
gomp-4_0-branch (but based on it) would be higher maintenance overhead
(merging between branches), and implementing OpenACC in gomp-4_0-branch
might bring us into the situation that OpenMP 4 support is ready for
merge into trunk but OpenACC support is not yet, and thus delaying the
merge.  How to solve this?


Back to your patch.

> It's not finished, and not even bootstraps, but able to produce working
> executables with offloading to GPU.
> Since now, we will continue the further development of OpenACC on this
> branch on a regular basis.

I had to apply the following patch to make it build:

--- gcc/fortran/resolve.c
+++ gcc/fortran/resolve.c
@@ -14538,10 +14538,9 @@ insert_acc_data_construct (gfc_namespace *ns, 
gfc_acc_clauses *acc_clauses)
 {
   gfc_code *head = ns->code;
   gfc_code *tail, *end;
-  gfc_code *new_root = gfc_get_code ();
-  gfc_code *block = gfc_get_code ();
+  gfc_code *new_root = gfc_get_code (EXEC_ACC_DATA);
+  gfc_code *block = gfc_get_code (EXEC_ACC_DATA);
 
-  new_root->op = block->op = EXEC_ACC_DATA;
   new_root->block = block;
   new_root->ext.acc_clauses = acc_clauses;
 
--- gcc/fortran/trans-openacc.c
+++ gcc/fortran/trans-openacc.c
@@ -293,7 +293,7 @@ gfc_trans_acc_array_reduction (tree c, gfc_symbol *sym, 
locus where)
       gfc_start_block (&block);
       gfc_add_expr_to_block (&block, gfc_trans_assignment (e3, e4, false,
                              true));
-      gfc_add_expr_to_block (&block, gfc_trans_dealloc_allocated (decl, 
false));
+      gfc_add_expr_to_block (&block, gfc_trans_dealloc_allocated (decl, false, 
NULL));
       stmt = gfc_finish_block (&block);
     }
   else

Does that look right to you, and should I commit this patch?

Unfortunately, even with that in place, I'm getting ICEs in LTO streaming
(that is, the test cases specifying -flto GCC's testsuite), and also for
trivial OpenACC code with -fopenacc, both for Fortran and C code.


The "trivia" section: your change(s) will need to be reworked to comply
with the GCC coding standards: indentation, maximum line length,
copyright/licensing statements for new files (also, your liboacc seems to
be missing the GCC Runtime Library Exception), and so on.  Also please
avoid spurious changes, such as whitespace changes.  (Makes diffs less
readable; I used diff's -b option when reading your patch.)

As you said yourself (but what is of course totally fine at this early
development stage), the code is not yet mature: there are TODOs for
missing error checking, catching invalid usage of directives by users,
generally robustness, and so on.  We shall address these as we continue
working on this.

In gcc/builtin-types.def, sort your added types properly next to the
existing ones.

The gcc/gcc.c driver spec change to handle -fopenacc: for completeness,
update gcc/config/darwin.h accordingly.


> Test suite:
>       We've generated lots of compile-time tests for all front-ends, but
> I'm not sure it's the good idea to add thousands of tests to the GCC's
> testsuite. Meanwhile, test generators are very useful for "nuclear testing".
>       Manually rewritten tests for Fortran are in repository.
>       For C and C++ - soon.

These are the Fortran -fopenacc test cases in
gcc/testsuite/gfortran.dg/gacc/ -- a lot of individual ones, which
probably we should merge into fewer files?


> Back-end & [liboacc]:
>       For back-end, we developed the prototype of code generator that
> emits OpenCL 1.1 for kernel/parallel directives and for copy* clauses.
> [liboacc] requires OpenCL SDK installed, the path to SDK must be specified
> in configuration stage (${GCC_SRC_TOP}/configure):
> --with-opencl-include=PATH, which specifies path to OpenCL SDK header files
> and --with-opencl-lib=PATH, which specifies path to libOpenCL. 
>       ${SRC_DIR}/configure \
>               --with-opencl-include=/usr/include/ \
>               --with-opencl-lib=/usr/lib/nvidia-310
> 
> If mentioned OpenCL SDK has regular directory structure, that is header
> files in ${OPENCL_SDK_ROOT}/include, library in ${OPENCL_SDK_ROOT}/lib, then
> it is sufficient to supply only path to root of SDK.
> In future its planned to control build with only --enable-openacc.

(With Debian/Ubuntu's nvidia-opencl-dev package installed, I just used
--with-opencl=/usr.)

The functions implemented in gcc/gimple-opencl.c translate Gimple
statements into OpenCL -- this can be considered sort-of a GCC backend,
but should then be better integrated (but of course I understand how this
is useful for you as it is right now).  You currently expand OpenACC
after OpenMP expansion, after some optimization passes have been run --
the latter being, I asume, because you're then directly generating OpenCL
From the (partly optimized) trees.


I have not yet reviewed liboacc it in any detail.  It is highly specific
to OpenCL, and as discussed above, I think it should be designed and
integrated in line with the libgomp enhancements currently being
discussed and implemented in gomp-4_0-branch.

The Fortran OpenACC library interface (include file openacc_lib.h and
module openacc) is not yet implemented.

A bit more "trivia": changes to contrib/gcc_update are missing for the
generated files in liboacc, and these should be using the same Autoconf
and Automake versions as the rest of GCC is.


As already discussed/suggested before: I suggest we re-use more OpenMP
infrastructure.  (Of coruse, you're right that as long as one's not
familiar with the existing code, first copying the needed pieces from
OpenMP and changing these as required for OpenACC will be easier than
first understanding all of the existing code and generalizing (adapting
them as required) into the OpenACC context.  So I can completely
understand your approach.)  The goal, I'd say, now is to reduce code
duplication; it'll be fine to generalize and extend existing OpenMP code
to make it usable for OpenACC, too.  As we concluded with Jakub and
Nathan, we shall not rename *omp* functions, to avoid issues when
backporting patches, etc.  I think we also shouldn't add any »#define
foo_oacc_bar foo_omp_bar« as that will only add new names to remember for
no real benefit -- instead just use existing *omp* stuff for OpenACC,
too.

In OpenACC pragma handling, for example,
gcc/c-family/c-pragma.c:acc_pragma_def can re-use omp_pragma_def.  It may
make sense to merge gcc/c/c-parser.c:c_parser_acc_construct into
c_parser_omp_construct.

For OpenACC clauses it seems that a lot of (if not all) infrastructure
can be shared (extending it as needed), avoiding OpenACC
re-implementations of gcc/c-family/c-pragma.h:enum pragma_acc_clause;
gcc/c/c-parser.c:c_parser_acc_clause_*;
gcc/gimplify.c:gimplify_scan_acc_clauses; gcc/tree-core.h:enum
tree_node_kind:acc_clause_kind, enum acc_clause_code, struct
tree_acc_clause; gcc/tree-dump.c:dequeue_and_dump:case ACC_CLAUSE;
gcc/tree-pretty-print.c:dump_acc_clause*; gcc/tree.c:acc_clause_num_ops,
acc_clause_code_name; gcc/tree.defs:ACC_CLAUSE; gcc/tree.h:ACC_CLAUSE_*;
gcc/treestruct.def:TS_ACC_CLAUSE, omp-low.c:acc_reduction_init.

I have not yet looked into your most of your gcc/cp and gcc/fortran
changes; I have so far mostly been working on the C side of things.

For gcc/tree.def and gcc/gimple.def I agree it makes sense to have codes
specific to OpenACC instead of, for example, shoehorning OACC_PARALLEL
into the existing OMP_PARALLEL (and then adding a flag to it whether it
is an OpenACC or OpenMP parallel statement...).  The only concept I can
think of where it'd make sense to even there re-use the existing OpenMP
codes would be if there was a distinct "context" inside which it's clear
whether OMP_PARALLEL means an OpenACC or OpenMP parallel construct.  But
that's probably more complex than simply having separate codes.  But, we
can re-use gcc/gimple.h:gimple_statement_omp as basis for OpenACC codes,
avoiding gcc/gimple.h:gimple_statement_acc,
gcc/gsstruct.def:gimple_statement_acc.

The new gcc/diagnose-gotos.c should probably re-use/be merged with the
existing omp-low.c:pass_diagnose_omp_blocks from which it duplicates.

In gimplification/OpenACC lowering, I have not yet digested your
approach.  It's different from OpenMP's region building code (terminated
with GIMPLE_OMP_RETURN), which I've began using in my current development
for OpenACC.  Can you comment on your approach?  Until I get a clear
understanding of your approach (which I'll continue to figure out on my
own, too), I can't really comment on most of the other gimplification and
lowering code yet.

Your loop analysis and parallelization for OpenACC kernels constructs
seems to be based on tree-parloops.c code, which should also be properly
shared instead of duplicating it.

In gcc/tree-inline.c:estimate_num_insns there is OpenACC cost management
not re-using OpenMP's (which might make sense to do?) -- but I can't see
acc_cost being initialized anywhere?


Documentation:

> Soon, we'll share some wiki articles with the description of [liboacc] and
> our understanding of OpenACC to OpenCL translation magic.

Even so, eventually GCC's Texinfo documentation needs to be updated, too:
gcc/doc/generic.texi (OpenACC codes), gcc/doc/gimple.texi (OpenACC
codes), gcc/doc/invoke.texi (-fopenacc), gcc/doc/passes.texi (lowering
and expansion passes), gcc/doc/sourcebuild.texi (mention runtime
library), gcc/fortran/gfortran.texi, gcc/fortran/intrinsic.texi,
gcc/fortran/invoke.texi, as well as a liboacc manual or updated libgomp
manual assuming we're going that route.


So, how to continue?  Would it make sense that we first
rework/stabilitize the general infrastructure in openacc-1_0-branch, or
gomp-4_0-branch, or trunk, or yet another branch -- based on
gomp-4_0-branch?  Then, flesh out the implementation of one construct,
OpenACC parallel (for it is simpler than the kernels one)?  Then, after
we're happy with that, add the other ones?


Grüße,
 Thomas

Attachment: pgp3FqjvxK0Zh.pgp
Description: PGP signature

Reply via email to