Hi! On Thu, 26 Sep 2013 22:19:31 +0400, Evgeny Gavrin <e.gav...@samsung.com> wrote: > My colleagues and I shared our current implementation of OpenACC 1.0 > to the [openacc-1_0-branch].
Many thanks for posting this; I had a first look at your patch. I'm still learning my share of GCC internals in this area; this is by no means an in-depth review. Before discussing your changes, we should first clarify some more high-level questions about how to continue all our work. (Thus, adding a few more people to be CCed.) Is my understanding correct that the GCC policy regarding extensions such as support for OpenACC or OpenMP 4 is: first develop and polish this on a branch (such as openacc-1_0-branch or gomp-4_0-branch), and once *everything* of the respective standard has been implemented, the development branch is then merged into mainline (closing it at the same time), instead of committing individual sub-features (such as support for only »#pragme acc parallel« but not yet covering the whole respective standard) directly to trunk? The issue with the latter, I assume, is that such half-finished implementations in trunk might delay/disturb GCC releases. OpenACC and OpenMP's target construct are very similar in that they can be expected to share at least a large part of the "device offloading" functionality: managing memory mappings and data copying, device detection, initalization, code offloading, and so on. In my understanding it makes no sense to duplicate that, and instead the OpenACC implementation shall re-use what has already been implemented for OpenMP, or rather: is currently being implemented on gomp-4_0-branch (using per-device plugins). For this, I think the OpenACC implementation should either be done on a branch based on gomp-4_0-branch, or directly in gomp-4_0-branch. (This is assuming that gomp-4_0-branch will not "soon" be merged into trunk.) Having a separate branch from gomp-4_0-branch (but based on it) would be higher maintenance overhead (merging between branches), and implementing OpenACC in gomp-4_0-branch might bring us into the situation that OpenMP 4 support is ready for merge into trunk but OpenACC support is not yet, and thus delaying the merge. How to solve this? Back to your patch. > It's not finished, and not even bootstraps, but able to produce working > executables with offloading to GPU. > Since now, we will continue the further development of OpenACC on this > branch on a regular basis. I had to apply the following patch to make it build: --- gcc/fortran/resolve.c +++ gcc/fortran/resolve.c @@ -14538,10 +14538,9 @@ insert_acc_data_construct (gfc_namespace *ns, gfc_acc_clauses *acc_clauses) { gfc_code *head = ns->code; gfc_code *tail, *end; - gfc_code *new_root = gfc_get_code (); - gfc_code *block = gfc_get_code (); + gfc_code *new_root = gfc_get_code (EXEC_ACC_DATA); + gfc_code *block = gfc_get_code (EXEC_ACC_DATA); - new_root->op = block->op = EXEC_ACC_DATA; new_root->block = block; new_root->ext.acc_clauses = acc_clauses; --- gcc/fortran/trans-openacc.c +++ gcc/fortran/trans-openacc.c @@ -293,7 +293,7 @@ gfc_trans_acc_array_reduction (tree c, gfc_symbol *sym, locus where) gfc_start_block (&block); gfc_add_expr_to_block (&block, gfc_trans_assignment (e3, e4, false, true)); - gfc_add_expr_to_block (&block, gfc_trans_dealloc_allocated (decl, false)); + gfc_add_expr_to_block (&block, gfc_trans_dealloc_allocated (decl, false, NULL)); stmt = gfc_finish_block (&block); } else Does that look right to you, and should I commit this patch? Unfortunately, even with that in place, I'm getting ICEs in LTO streaming (that is, the test cases specifying -flto GCC's testsuite), and also for trivial OpenACC code with -fopenacc, both for Fortran and C code. The "trivia" section: your change(s) will need to be reworked to comply with the GCC coding standards: indentation, maximum line length, copyright/licensing statements for new files (also, your liboacc seems to be missing the GCC Runtime Library Exception), and so on. Also please avoid spurious changes, such as whitespace changes. (Makes diffs less readable; I used diff's -b option when reading your patch.) As you said yourself (but what is of course totally fine at this early development stage), the code is not yet mature: there are TODOs for missing error checking, catching invalid usage of directives by users, generally robustness, and so on. We shall address these as we continue working on this. In gcc/builtin-types.def, sort your added types properly next to the existing ones. The gcc/gcc.c driver spec change to handle -fopenacc: for completeness, update gcc/config/darwin.h accordingly. > Test suite: > We've generated lots of compile-time tests for all front-ends, but > I'm not sure it's the good idea to add thousands of tests to the GCC's > testsuite. Meanwhile, test generators are very useful for "nuclear testing". > Manually rewritten tests for Fortran are in repository. > For C and C++ - soon. These are the Fortran -fopenacc test cases in gcc/testsuite/gfortran.dg/gacc/ -- a lot of individual ones, which probably we should merge into fewer files? > Back-end & [liboacc]: > For back-end, we developed the prototype of code generator that > emits OpenCL 1.1 for kernel/parallel directives and for copy* clauses. > [liboacc] requires OpenCL SDK installed, the path to SDK must be specified > in configuration stage (${GCC_SRC_TOP}/configure): > --with-opencl-include=PATH, which specifies path to OpenCL SDK header files > and --with-opencl-lib=PATH, which specifies path to libOpenCL. > ${SRC_DIR}/configure \ > --with-opencl-include=/usr/include/ \ > --with-opencl-lib=/usr/lib/nvidia-310 > > If mentioned OpenCL SDK has regular directory structure, that is header > files in ${OPENCL_SDK_ROOT}/include, library in ${OPENCL_SDK_ROOT}/lib, then > it is sufficient to supply only path to root of SDK. > In future its planned to control build with only --enable-openacc. (With Debian/Ubuntu's nvidia-opencl-dev package installed, I just used --with-opencl=/usr.) The functions implemented in gcc/gimple-opencl.c translate Gimple statements into OpenCL -- this can be considered sort-of a GCC backend, but should then be better integrated (but of course I understand how this is useful for you as it is right now). You currently expand OpenACC after OpenMP expansion, after some optimization passes have been run -- the latter being, I asume, because you're then directly generating OpenCL From the (partly optimized) trees. I have not yet reviewed liboacc it in any detail. It is highly specific to OpenCL, and as discussed above, I think it should be designed and integrated in line with the libgomp enhancements currently being discussed and implemented in gomp-4_0-branch. The Fortran OpenACC library interface (include file openacc_lib.h and module openacc) is not yet implemented. A bit more "trivia": changes to contrib/gcc_update are missing for the generated files in liboacc, and these should be using the same Autoconf and Automake versions as the rest of GCC is. As already discussed/suggested before: I suggest we re-use more OpenMP infrastructure. (Of coruse, you're right that as long as one's not familiar with the existing code, first copying the needed pieces from OpenMP and changing these as required for OpenACC will be easier than first understanding all of the existing code and generalizing (adapting them as required) into the OpenACC context. So I can completely understand your approach.) The goal, I'd say, now is to reduce code duplication; it'll be fine to generalize and extend existing OpenMP code to make it usable for OpenACC, too. As we concluded with Jakub and Nathan, we shall not rename *omp* functions, to avoid issues when backporting patches, etc. I think we also shouldn't add any »#define foo_oacc_bar foo_omp_bar« as that will only add new names to remember for no real benefit -- instead just use existing *omp* stuff for OpenACC, too. In OpenACC pragma handling, for example, gcc/c-family/c-pragma.c:acc_pragma_def can re-use omp_pragma_def. It may make sense to merge gcc/c/c-parser.c:c_parser_acc_construct into c_parser_omp_construct. For OpenACC clauses it seems that a lot of (if not all) infrastructure can be shared (extending it as needed), avoiding OpenACC re-implementations of gcc/c-family/c-pragma.h:enum pragma_acc_clause; gcc/c/c-parser.c:c_parser_acc_clause_*; gcc/gimplify.c:gimplify_scan_acc_clauses; gcc/tree-core.h:enum tree_node_kind:acc_clause_kind, enum acc_clause_code, struct tree_acc_clause; gcc/tree-dump.c:dequeue_and_dump:case ACC_CLAUSE; gcc/tree-pretty-print.c:dump_acc_clause*; gcc/tree.c:acc_clause_num_ops, acc_clause_code_name; gcc/tree.defs:ACC_CLAUSE; gcc/tree.h:ACC_CLAUSE_*; gcc/treestruct.def:TS_ACC_CLAUSE, omp-low.c:acc_reduction_init. I have not yet looked into your most of your gcc/cp and gcc/fortran changes; I have so far mostly been working on the C side of things. For gcc/tree.def and gcc/gimple.def I agree it makes sense to have codes specific to OpenACC instead of, for example, shoehorning OACC_PARALLEL into the existing OMP_PARALLEL (and then adding a flag to it whether it is an OpenACC or OpenMP parallel statement...). The only concept I can think of where it'd make sense to even there re-use the existing OpenMP codes would be if there was a distinct "context" inside which it's clear whether OMP_PARALLEL means an OpenACC or OpenMP parallel construct. But that's probably more complex than simply having separate codes. But, we can re-use gcc/gimple.h:gimple_statement_omp as basis for OpenACC codes, avoiding gcc/gimple.h:gimple_statement_acc, gcc/gsstruct.def:gimple_statement_acc. The new gcc/diagnose-gotos.c should probably re-use/be merged with the existing omp-low.c:pass_diagnose_omp_blocks from which it duplicates. In gimplification/OpenACC lowering, I have not yet digested your approach. It's different from OpenMP's region building code (terminated with GIMPLE_OMP_RETURN), which I've began using in my current development for OpenACC. Can you comment on your approach? Until I get a clear understanding of your approach (which I'll continue to figure out on my own, too), I can't really comment on most of the other gimplification and lowering code yet. Your loop analysis and parallelization for OpenACC kernels constructs seems to be based on tree-parloops.c code, which should also be properly shared instead of duplicating it. In gcc/tree-inline.c:estimate_num_insns there is OpenACC cost management not re-using OpenMP's (which might make sense to do?) -- but I can't see acc_cost being initialized anywhere? Documentation: > Soon, we'll share some wiki articles with the description of [liboacc] and > our understanding of OpenACC to OpenCL translation magic. Even so, eventually GCC's Texinfo documentation needs to be updated, too: gcc/doc/generic.texi (OpenACC codes), gcc/doc/gimple.texi (OpenACC codes), gcc/doc/invoke.texi (-fopenacc), gcc/doc/passes.texi (lowering and expansion passes), gcc/doc/sourcebuild.texi (mention runtime library), gcc/fortran/gfortran.texi, gcc/fortran/intrinsic.texi, gcc/fortran/invoke.texi, as well as a liboacc manual or updated libgomp manual assuming we're going that route. So, how to continue? Would it make sense that we first rework/stabilitize the general infrastructure in openacc-1_0-branch, or gomp-4_0-branch, or trunk, or yet another branch -- based on gomp-4_0-branch? Then, flesh out the implementation of one construct, OpenACC parallel (for it is simpler than the kernels one)? Then, after we're happy with that, add the other ones? Grüße, Thomas
pgp3FqjvxK0Zh.pgp
Description: PGP signature