Re: [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule
On 2022/6/28 10:06 PM, Jakub Jelinek wrote:
> On Thu, Jun 23, 2022 at 11:47:59PM +0800, Chung-Lin Tang wrote:
>> with the way that chunk_size < 1 is handled for gomp_iter_dynamic_next:
>>
>> (1) chunk_size <= -1: wraps into large unsigned value, seems to work though.
>> (2) chunk_size == 0: infinite loop
>>
>> The (2) behavior is obviously not desired. This patch fixes this by changing
>
> Why? It is a user error, undefined behavior, we shouldn't slow down valid
> code for users who don't bother reading the standard.

This is loop init code, not per-iteration. The overhead really isn't that much.

The question should be, if GCC having infinite loop behavior is reasonable,
even if it is undefined in the spec.

> E.g. OpenMP 5.1 [132:14] says clearly:
> "chunk_size must be a loop invariant integer expression with a positive
> value."
> and omp_set_schedule for chunk_size < 1 should use a default value (which it
> does).
>
> For OMP_SCHEDULE the standard says it is implementation-defined what happens
> if the format isn't the specified one, so I guess the env.c change
> could be acceptable (though without it it is fine too), but the
> loop.c change is wrong. Note, if the loop.c change would be ok, you'd
> need to also change loop_ull.c too.

I've updated the patch to add the same changes for libgomp/loop_ull.c and
updated the testcase too. Tested on mainline trunk without regressions.

Thanks,
Chung-Lin

libgomp/ChangeLog:

        * env.c (parse_schedule): Make negative values invalid for chunk_size.
        * loop.c (gomp_loop_init): For non-STATIC schedule and chunk_size <= 0,
        set initialized chunk_size to 1.
        * loop_ull.c (gomp_loop_ull_init): Likewise.
        * testsuite/libgomp.c/loop-28.c: New test.

diff --git a/libgomp/env.c b/libgomp/env.c
index 1c4ee894515..dff07617e15 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -182,6 +182,8 @@ parse_schedule (void)
     goto invalid;
   errno = 0;
+  if (*env == '-')
+    goto invalid;
   value = strtoul (env, &end, 10);
   if (errno || end == env)
     goto invalid;
diff --git a/libgomp/loop.c b/libgomp/loop.c
index be85162bb1e..018b4e9a8bd 100644
--- a/libgomp/loop.c
+++ b/libgomp/loop.c
@@ -41,7 +41,7 @@ gomp_loop_init (struct gomp_work_share *ws, long start, long end, long incr,
                 enum gomp_schedule_type sched, long chunk_size)
 {
   ws->sched = sched;
-  ws->chunk_size = chunk_size;
+  ws->chunk_size = (sched == GFS_STATIC || chunk_size > 1) ? chunk_size : 1;
   /* Canonicalize loops that have zero iterations to ->next == ->end. */
   ws->end = ((incr > 0 && start > end) || (incr < 0 && start < end))
             ? start : end;
diff --git a/libgomp/loop_ull.c b/libgomp/loop_ull.c
index 602737296d4..74ddb1bd623 100644
--- a/libgomp/loop_ull.c
+++ b/libgomp/loop_ull.c
@@ -43,7 +43,7 @@ gomp_loop_ull_init (struct gomp_work_share *ws, bool up, gomp_ull start,
                     gomp_ull chunk_size)
 {
   ws->sched = sched;
-  ws->chunk_size_ull = chunk_size;
+  ws->chunk_size_ull = (sched == GFS_STATIC || chunk_size > 1) ? chunk_size : 1;
   /* Canonicalize loops that have zero iterations to ->next == ->end. */
   ws->end_ull = ((up && start > end) || (!up && start < end))
                 ? start : end;
diff --git a/libgomp/testsuite/libgomp.c/loop-28.c b/libgomp/testsuite/libgomp.c/loop-28.c
new file mode 100644
index 000..664842e27aa
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/loop-28.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-timeout 10 } */
+
+void __attribute__((noinline))
+foo (int a[], int n, int chunk_size)
+{
+  #pragma omp parallel for schedule (dynamic,chunk_size)
+  for (int i = 0; i < n; i++)
+    a[i] = i;
+
+  #pragma omp parallel for schedule (dynamic,chunk_size)
+  for (unsigned long long i = 0; i < n; i++)
+    a[i] = i;
+}
+
+int main (void)
+{
+  int a[100];
+  foo (a, 100, 0);
+  return 0;
+}
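
For readers who want to see the two changes in isolation, here is a minimal standalone sketch. It is not libgomp code: the demo_* names and the toy chunk counter are made up for illustration, and only the clamping expression and the '-' check are taken from the hunks above. The parser assumes the string has no leading whitespace.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for libgomp's schedule kinds.  */
enum demo_sched { DEMO_STATIC, DEMO_DYNAMIC };

/* Same expression as the loop.c hunk: a static schedule keeps the value
   as given, every other schedule ends up with a chunk size of at least 1.  */
static long
demo_init_chunk_size (enum demo_sched sched, long chunk_size)
{
  return (sched == DEMO_STATIC || chunk_size > 1) ? chunk_size : 1;
}

/* Toy dynamic scheduler: hand out chunks of [0, n) until none are left.
   Returns the number of chunks, or -1 when a chunk would make no
   progress (the situation that never terminates without the clamp).  */
static long
demo_count_chunks (long n, long chunk_size)
{
  long next = 0, chunks = 0;
  while (next < n)
    {
      if (chunk_size <= 0)
        return -1;
      long left = n - next;
      next += chunk_size < left ? chunk_size : left;
      chunks++;
    }
  return chunks;
}

/* Parse an unsigned chunk size.  A leading '-' is rejected up front,
   because strtoul itself accepts a minus sign and negates the result in
   unsigned arithmetic.  Returns 0 on success, -1 on invalid input.  */
static int
demo_parse_chunk_size (const char *s, unsigned long *out)
{
  char *end;
  if (*s == '-')
    return -1;
  errno = 0;
  unsigned long value = strtoul (s, &end, 10);
  if (errno || end == s)
    return -1;
  *out = value;
  return 0;
}
```

With a chunk size of 0 the toy counter reports no progress, while passing the value through demo_init_chunk_size first lets it finish one iteration per chunk.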
[PATCH, OG10, committed] Support A->B expressions in map clause
This patch tries to allow map(A->ptr) to be properly handled the same way as
map(B.ptr) expressions. map(struct:*A) clauses are now produced during gimplify.

Julian, I'm CCing you since IIRC you seemed to be the author of this area of
code. I'd appreciate it if you could take a look when you have time, though
I've already gone ahead and pushed to OG10 after the testing results looked okay.

Thanks,
Chung-Lin

gcc/ChangeLog:

        * gimplify.c ("tree-hash-traits.h"): Add include.
        (gimplify_scan_omp_clauses): Change struct_map_to_clause to type
        hash_map *. Adjust struct map handling to handle cases of *A and
        A->B expressions.
        (gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
        exit data directives code to earlier position.

gcc/testsuite/ChangeLog:

        * g++.dg/gomp/target-3.C: Adjust testcase gimple scanning.
        * g++.dg/gomp/target-this-2.C: Likewise.
        * g++.dg/gomp/target-this-3.C: Likewise.
        * g++.dg/gomp/target-this-4.C: Likewise.

libgomp/ChangeLog:

        * testsuite/libgomp.c++/target-23.C: New testcase.

From bf8605f14ec33ea31233a3567f3184fee667b695 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Mon, 8 Feb 2021 07:53:55 -0800
Subject: [PATCH] Enable gimplify GOMP_MAP_STRUCT handling of (COMPONENT_REF
 (INDIRECT_REF ...)) map clauses.

This patch tries to allow map(A->ptr) to be properly handled the same way as
map(B.ptr) expressions. map(struct:*A) clauses are now produced during gimplify.

This patch, as of time of commit, is only pushed to devel/omp/gcc-10, not yet
submitted as mainline patch to upstream.

2021-02-08 Chung-Lin Tang

gcc/ChangeLog:

        * gimplify.c ("tree-hash-traits.h"): Add include.
        (gimplify_scan_omp_clauses): Change struct_map_to_clause to type
        hash_map *. Adjust struct map handling to handle cases of *A and
        A->B expressions.
        (gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
        exit data directives code to earlier position.

gcc/testsuite/ChangeLog:

        * g++.dg/gomp/target-3.C: Adjust testcase gimple scanning.
        * g++.dg/gomp/target-this-2.C: Likewise.
* g++.dg/gomp/target-this-3.C: Likewise. * g++.dg/gomp/target-this-4.C: Likewise. libgomp/ChangeLog: * testsuite/libgomp.c++/target-23.C: New testcase. --- gcc/gimplify.c| 51 +++ gcc/testsuite/g++.dg/gomp/target-3.C | 2 +- gcc/testsuite/g++.dg/gomp/target-this-2.C | 2 +- gcc/testsuite/g++.dg/gomp/target-this-3.C | 2 +- gcc/testsuite/g++.dg/gomp/target-this-4.C | 4 +-- libgomp/testsuite/libgomp.c++/target-23.C | 34 + 6 files changed, 78 insertions(+), 17 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c++/target-23.C diff --git a/gcc/gimplify.c b/gcc/gimplify.c index b90ba5b..ba19017 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see #include "langhooks.h" #include "tree-cfg.h" #include "tree-ssa.h" +#include "tree-hash-traits.h" #include "omp-general.h" #include "omp-low.h" #include "gimple-low.h" @@ -8514,7 +8515,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, { struct gimplify_omp_ctx *ctx, *outer_ctx; tree c; - hash_map *struct_map_to_clause = NULL; + hash_map *struct_map_to_clause = NULL; hash_set *struct_deref_set = NULL; tree *prev_list_p = NULL, *orig_list_p = list_p; int handled_depend_iterators = -1; @@ -9082,12 +9083,15 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, && TREE_CODE (decl) == INDIRECT_REF && TREE_CODE (TREE_OPERAND (decl, 0)) == COMPONENT_REF && (TREE_CODE (TREE_TYPE (TREE_OPERAND (decl, 0))) - == REFERENCE_TYPE)) + == REFERENCE_TYPE) + && (OMP_CLAUSE_MAP_KIND (c) + != GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION)) { pd = &TREE_OPERAND (decl, 0); decl = TREE_OPERAND (decl, 0); } bool indir_p = false; + bool component_ref_p = false; tree orig_decl = decl; tree decl_ref = NULL_TREE; if ((region_type & (ORT_ACC | ORT_TARGET | ORT_TARGET_DATA)) != 0 @@ -9098,6 +9102,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, while (TREE_CODE (decl) == COMPONENT_REF) { decl = TREE_OPERAND (decl, 0); + component_ref_p = true; if 
(((TREE_CODE (decl) == MEM_REF && integer_zerop (
[PATCH, OG10, OpenMP, committed] Fix array members in OpenMP map clauses
The previous patch:
https://gcc.gnu.org/pipermail/gcc-patches/2021-February/564976.html
was reverted by Catherine while I was away, due to regressions in mapping
array members.

The fix appears to be moving the finish_non_static_data_member() call to an
earlier position inside handle_omp_array_sections().

Tested and committed to devel/omp/gcc-10; the above patch was re-committed
as well.

Chung-Lin

From da047f63c601118ad875d13929453094acc6c6c9 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Fri, 26 Feb 2021 20:13:29 +0800
Subject: [PATCH] Fix regression of array members in OpenMP map clauses.

Fixed a regression of array members not working in OpenMP map clauses after
commit bf8605f14ec33ea31233a3567f3184fee667b695. This patch itself probably
should be considered a fix for commit aadfc9843.

2021-02-26 Chung-Lin Tang

gcc/cp/ChangeLog:

        * semantics.c (handle_omp_array_sections): Adjust position of making
        COMPONENT_REF from FIELD_DECL to earlier position.
---
 gcc/cp/semantics.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 370d5831091..55a5983528e 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5386,6 +5386,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
         }
       OMP_CLAUSE_DECL (c) = first;
       OMP_CLAUSE_SIZE (c) = size;
+      if (TREE_CODE (t) == FIELD_DECL)
+        t = finish_non_static_data_member (t, NULL_TREE, NULL_TREE);
       if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP
           || (TREE_CODE (t) == COMPONENT_REF
               && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE))
@@ -5414,8 +5416,6 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
         }
       tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c),
                                   OMP_CLAUSE_MAP);
-      if (TREE_CODE (t) == FIELD_DECL)
-        t = finish_non_static_data_member (t, NULL_TREE, NULL_TREE);
       if ((ort & C_ORT_OMP_DECLARE_SIMD) != C_ORT_OMP && ort != C_ORT_ACC)
         OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_POINTER);
       else if (TREE_CODE (t) == COMPONENT_REF)
--
2.17.1
[PATCH, C++, OG10, OpenACC/OpenMP, committed] Allow static constexpr fields in mappable types
On 2020/1/21 12:49 AM, Jakub Jelinek wrote:
> The OpenMP 4.5 definition of mappable type for C++ is that
> - All data members must be non-static.
> among other requirements. In OpenMP 5.0 that has been removed.
> So, if we follow the 4.5 definition, it shouldn't change, if we follow 5.0
> definition, the whole loop should be dropped, but in no case shall static
> constexpr data members be treated any differently from any other static data
> members.

We have merged the patch as is (only static constexprs) to devel/omp/gcc-10
for now.

It's possible that the entire checking loop should eventually be removed to
allow the full 5.0 range, but I wondered whether things like (automatic)
accessibility of the static members within target regions are an issue to
resolve? For now, I've committed the patch in its current state to OG10.

Re-tested on OG10, and committed with an additional testcase (same for OpenMP).

Chung-Lin

        cp/
        * decl2.c (cp_omp_mappable_type_1): Allow fields with
        DECL_DECLARED_CONSTEXPR_P to be mapped.

        testsuite/
        * g++.dg/goacc/static-constexpr-1.C: New test.
        * g++.dg/gomp/static-constexpr-1.C: New test.

From 1c3f38b30c1db0aef5ccbf6d20fb5fd13785d482 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Wed, 3 Mar 2021 22:39:10 +0800
Subject: [PATCH] Allow static constexpr fields in mappable types for C++

This patch is a merge of:
https://gcc.gnu.org/legacy-ml/gcc-patches/2020-01/msg01246.html

Static members in general disqualify a C++ class from being target mappable,
but static constexprs are inline optimized away, so should not interfere.

OpenMP 5.0 in general lifts the static member limitation, so this patch will
probably further adjusted later.

2021-03-03 Chung-Lin Tang

gcc/cp/ChangeLog:

        * decl2.c (cp_omp_mappable_type_1): Allow fields with
        DECL_DECLARED_CONSTEXPR_P to be mapped.

gcc/testsuite/ChangeLog:

        * g++.dg/goacc/static-constexpr-1.C: New test.
        * g++.dg/gomp/static-constexpr-1.C: New test.
---
 gcc/cp/decl2.c | 5 -
 gcc/testsuite/g++.dg/goacc/static-constexpr-1.C | 17 +
 gcc/testsuite/g++.dg/gomp/static-constexpr-1.C | 17 +
 3 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/goacc/static-constexpr-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/static-constexpr-1.C

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 5343ea3b068..872122fe83c 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -1460,7 +1460,10 @@ cp_omp_mappable_type_1 (tree type, bool notes)
     {
       tree field;
       for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
-        if (VAR_P (field))
+        if (VAR_P (field)
+            /* Fields that are 'static constexpr' can be folded away at compile
+               time, thus does not interfere with mapping. */
+            && !DECL_DECLARED_CONSTEXPR_P (field))
           {
             if (notes)
               inform (DECL_SOURCE_LOCATION (field),
diff --git a/gcc/testsuite/g++.dg/goacc/static-constexpr-1.C b/gcc/testsuite/g++.dg/goacc/static-constexpr-1.C
new file mode 100644
index 000..edf5f1a7628
--- /dev/null
+++ b/gcc/testsuite/g++.dg/goacc/static-constexpr-1.C
@@ -0,0 +1,17 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+
+/* Test that static constexpr members do not interfere with offloading. */
+struct rec
+{
+  static constexpr int x = 1;
+  int y, z;
+};
+
+void foo (rec& r)
+{
+  #pragma acc parallel copy(r)
+  {
+    r.y = r.y = r.x;
+  }
+}
diff --git a/gcc/testsuite/g++.dg/gomp/static-constexpr-1.C b/gcc/testsuite/g++.dg/gomp/static-constexpr-1.C
new file mode 100644
index 000..39eee92
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/static-constexpr-1.C
@@ -0,0 +1,17 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+
+/* Test that static constexpr members do not interfere with offloading. */
+struct rec
+{
+  static constexpr int x = 1;
+  int y, z;
+};
+
+void foo (rec& r)
+{
+  #pragma omp target map(r)
+  {
+    r.y = r.y = r.x;
+  }
+}
--
2.17.1
[PATCH, OG10, OpenMP, committed] Support A->B expressions in map clause (C front-end)
This patch is a merge of parts from:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562467.html
and devel/omp/gcc-10 commit 36a1eb, which was a modified merge of:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558975.html

to provide the equivalent front-end changes to support "map(A->B)" clauses in
the C front-end (only the C++ front-end received such changes before).
Some associated middle-end changes are also in this patch.

Tested without regressions, and pushed to devel/omp/gcc-10.

Chung-Lin

From 08caada8efd8f35db634647bbda6091fb667b00d Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Mon, 8 Mar 2021 15:56:52 +0800
Subject: [PATCH] Arrow operator handling for C front-end in OpenMP map clauses

This patch merges some of the equivalent changes already done for the C++
front-end to the C parts.

2021-03-08 Chung-Lin Tang

gcc/c/ChangeLog:

        * c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in
        call to c_parser_omp_variable_list to 'true'.
        * c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in
        array base handling.
        (c_finish_omp_clauses): Handle 'A->member' case in map clauses.

gcc/ChangeLog:

        * gimplify.c (gimplify_scan_omp_clauses): Add MEM_REF case when
        handling component_ref_p case. Add unshare_expr and gimplification
        when created GOMP_MAP_STRUCT is not a DECL. Add code to add
        firstprivate pointer for *pointer-to-struct case.

gcc/testsuite/ChangeLog:

        * gcc.dg/gomp/target-3.c: New test.
--- gcc/c/c-parser.c | 3 +- gcc/c/c-typeck.c | 22 +++ gcc/gimplify.c | 41 ++-- gcc/testsuite/gcc.dg/gomp/target-3.c | 16 +++ 4 files changed, 79 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index fae597128e9..0a6aee439f6 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -15700,7 +15700,8 @@ c_parser_omp_clause_map (c_parser *parser, tree list) } } - nl = c_parser_omp_variable_list (parser, clause_loc, OMP_CLAUSE_MAP, list); + nl = c_parser_omp_variable_list (parser, clause_loc, OMP_CLAUSE_MAP, list, + C_ORT_OMP, true); for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c)) OMP_CLAUSE_SET_MAP_KIND (c, kind); diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index 6af19766324..7c887a80ce9 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -12917,6 +12917,12 @@ handle_omp_array_sections_1 (tree c, tree t, vec &types, return error_mark_node; } t = TREE_OPERAND (t, 0); + if ((ort == C_ORT_ACC || ort == C_ORT_OMP) + && TREE_CODE (t) == MEM_REF) + { + t = TREE_OPERAND (t, 0); + STRIP_NOPS (t); + } if (ort == C_ORT_ACC && TREE_CODE (t) == MEM_REF) { if (maybe_ne (mem_ref_offset (t), 0)) @@ -13778,6 +13784,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) tree ordered_clause = NULL_TREE; tree schedule_clause = NULL_TREE; bool oacc_async = false; + bool indir_component_ref_p = false; tree last_iterators = NULL_TREE; bool last_iterators_remove = false; tree *nogroup_seen = NULL; @@ -14505,6 +14512,11 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) { while (TREE_CODE (t) == COMPONENT_REF) t = TREE_OPERAND (t, 0); + if (TREE_CODE (t) == MEM_REF) + { + t = TREE_OPERAND (t, 0); + STRIP_NOPS (t); + } if (bitmap_bit_p (&map_field_head, DECL_UID (t))) break; if (bitmap_bit_p (&map_head, DECL_UID (t))) @@ -14561,6 +14573,15 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) bias) to zero here, so it is not set erroneously to the pointer 
size later on in gimplify.c. */ OMP_CLAUSE_SIZE (c) = size_zero_node; + indir_component_ref_p = false; + if ((ort == C_ORT_ACC || ort == C_ORT_OMP) + && TREE_CODE (t) == COMPONENT_REF + && TREE_CODE (TREE_OPERAND (t, 0)) == MEM_REF) + { + t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); + indir_component_ref_p = true; + STRIP_NOPS (t); + } if (TREE_CODE (t) == COMPONENT_REF && OMP_CLAUSE_CODE (c) != OMP_CLAUSE__CACHE_) { @@ -14633,6 +14654,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
Re: [PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num
On 2021/7/23 7:01 PM, Tobias Burnus wrote:
> I personally prefer having:
>    int initial_dev;
> and inside 'omp target' (with 'map(from:initial_dev)'):
>    initial_device = omp_is_initial_device();
> Then the check would be:
>    if (initial_device && host_device_num != device_num)
>      abort();
>    if (!initial_device && host_device_num == device_num)
>      abort();
>
> (Likewise for Fortran.)

Thanks, I've adjusted the new testcases to use this style.

> And instead of restricting the target to nvptx/gcn, we could just add
> dg-xfail-run-if for *-intelmic-* and *-intelmicemul-*.

I've added an 'offload_target_intelmic' to use on the new testcases.

> Additionally, offload_target_nvptx/...amdgcn only check whether
> compilation support is available not whether a device
> exists at run time.
>
> (The device availability is checked by target_offload_device,
> using omp_is_initial_device().)

I guess there is value in testing compilation as long as the compiler is
properly configured, and leaving the execution as an independent test.

OTOH, I think the OpenMP execution tests are not properly forcing offload
(or not) using the environment variables, unlike what we have for OpenACC.

Thanks,
Chung-Lin
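
The check suggested in the quoted text can be written as a small pure function, which makes the two failure cases easy to see without an offload device. This is only a sketch: demo_device_num_mismatch is a made-up helper, not part of the testcase, and the three arguments stand for the values the test would obtain from omp_is_initial_device() inside the target region, omp_get_device_num() on the host, and omp_get_device_num() inside the target region.

```c
/* Hypothetical helper mirroring the suggested testcase checks.
   Returns nonzero when the observed values are inconsistent, i.e. in
   the cases where the real test would call abort ().  */
static int
demo_device_num_mismatch (int initial_device, int host_device_num,
                          int device_num)
{
  /* Region ran on the host: both device numbers must agree.  */
  if (initial_device && host_device_num != device_num)
    return 1;
  /* Region was offloaded: the device number must differ from the host's.  */
  if (!initial_device && host_device_num == device_num)
    return 1;
  return 0;
}
```

Writing the second condition with initial_device instead of !initial_device would flag every correct host fallback run as a failure, so the negation matters.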
[PATCH, v2, libgomp, OpenMP 5.0] Implement omp_get_device_num
On 2021/7/23 6:39 PM, Jakub Jelinek wrote:
> On Fri, Jul 23, 2021 at 06:21:41PM +0800, Chung-Lin Tang wrote:
>> --- a/libgomp/icv-device.c
>> +++ b/libgomp/icv-device.c
>> @@ -61,8 +61,17 @@ omp_is_initial_device (void)
>>    return 1;
>>  }
>>
>> +int
>> +omp_get_device_num (void)
>> +{
>> +  /* By specification, this is equivalent to omp_get_initial_device
>> +     on the host. */
>> +  return omp_get_initial_device ();
>> +}
>> +
>
> I think this won't work properly with the intel micoffload, where the host
> libgomp is used in the offloaded code.
> For omp_is_initial_device, the plugin solves it by:
> liboffloadmic/plugin/offload_target_main.cpp
> overriding it:
> /* Override the corresponding functions from libgomp. */
> extern "C" int
> omp_is_initial_device (void) __GOMP_NOTHROW
> {
>   return 0;
> }
>
> extern "C" int32_t
> omp_is_initial_device_ (void)
> {
>   return omp_is_initial_device ();
> }
> but guess it will need slightly more work because we need to copy the value
> to the offloading device too.
> It can be done incrementally though.

I guess this part of intelmic functionality will just have to wait until later.
There seem to be other parts of liboffloadmic that need re-work as well,
e.g. omp_get_num_devices() returns mic_engines_total, where it should actually
return the number of all devices (not just intelmic), omp_get_initial_device()
returns -1 (which I don't quite understand), etc.

I really suggest having intelmic support re-worked as an offload plugin inside
libgomp, rather than floating outside by itself.

>> --- a/libgomp/libgomp-plugin.h
>> +++ b/libgomp/libgomp-plugin.h
>> @@ -102,6 +102,12 @@ struct addr_pair
>>    uintptr_t end;
>>  };
>>
>> +/* This symbol is to name a target side variable that holds the designated
>> +   'device number' of the target device. The symbol needs to be available to
>> +   libgomp code and the offload plugin (which in the latter case must be
>> +   stringified). */
>> +#define GOMP_DEVICE_NUM_VAR __gomp_device_num
>
> For a single var it is acceptable (though, please avoid the double space
> before offload plugin in the comment), but once we have more than one
> variable, I think we should simply have a struct which will contain all the
> parameters that need to be copied from the host to the offloading device at
> image load time (and have eventually another struct that holds parameters
> that we'll need to copy to the device on each kernel launch, I bet some ICVs
> will be one category, other ICVs another one).

Actually, if you look at the 5.[01] specifications, omp_get_device_num() is not
defined in terms of an ICV. Maybe it conceptually ought to be, but the current
description of "the device number of the device on which the calling thread is
executing" is not one of the defined ICVs.

It looks like there will eventually be some kind of ICV block handled in a
similar way, but I think that the modifications will be straightforward then.
For now, I think it's okay for GOMP_DEVICE_NUM_VAR to just be a normal global
variable.

>> diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
>> index 8ea27b5565f..ffcb98ae99e 100644
>> --- a/libgomp/libgomp.map
>> +++ b/libgomp/libgomp.map
>> @@ -197,6 +197,8 @@ OMP_5.0.1 {
>>         omp_get_supported_active_levels_;
>>         omp_fulfill_event;
>>         omp_fulfill_event_;
>> +       omp_get_device_num;
>> +       omp_get_device_num_;
>>  } OMP_5.0;
>
> This is wrong. We've already released GCC 11.1 with the OMP_5.0.1
> symbol version, so we must not add any further symbols into that symbol
> version. OpenMP 5.0 routines added in GCC 12 should be OMP_5.0.2 symbol
> version.

I've adjusted this into 5.0.2, in between 5.0.1 and the new 5.1 added by the
recent omp_display_env[_] routines. omp_get_device_num is an API function
introduced in OpenMP 5.0, so I think this is the correct handling (instead of
stashing it into 5.1).

There is a new function check_effective_target_offload_target_intelmic() in
testsuite/lib/libgomp.exp, used to test for non-intelmic offloading situations.
Re-tested with no regressions, seeking approval for trunk. Thanks, Chung-Lin 2021-08-02 Chung-Lin Tang libgomp/ChangeLog * icv-device.c (omp_get_device_num): New API function, host side. * fortran.c (omp_get_device_num_): New interface function. * libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol. * libgomp.map (OMP_5.0.2): New version space with omp_get_device_num, omp_get_device_num_. * libgomp.texi (omp_get_device_num): Add documentation for new API function. * omp.h.in (omp_get_device_num): Add declaration. * omp_lib.f90.in (omp_get_device_num): Likewise. * omp_lib.h.in (omp_get_device_num): Likewise. * target.c (gomp_load_image_to_device): If additional entry for device number exists at end of returned entries from 'load_image_func' hook, copy the assigned device number over to the device variable. * config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): D
Re: [PATCH, v2, libgomp, OpenMP 5.0] Implement omp_get_device_num
On 2021/8/3 8:22 PM, Thomas Schwinge wrote:
> Hi Chung-Lin!
>
> On 2021-08-02T21:10:57+0800, Chung-Lin Tang wrote:
>> --- a/libgomp/fortran.c
>> +++ b/libgomp/fortran.c
>
>> +int32_t
>> +omp_get_device_num_ (void)
>> +{
>> +  return omp_get_device_num ();
>> +}
>
> Missing 'ialias_redirect (omp_get_device_num)'?
>
> Grüße
>  Thomas

Thanks, will fix before committing.

Chung-Lin
[PATCH, v3, libgomp, OpenMP 5.0, committed] Implement omp_get_device_num
view of the entire libgomp source. This means that there will be cases where putting some kind of setup/initialization in the plugin will be awkward and hard to implement (without pulling even more stuff into the plugin). Having the plugin simply do the job of finding the device location of an opaque variable with pre-arranged name and size, and return it for libgomp to do the setup work, is a better separation of interface. --- a/libgomp/config/gcn/icv-device.c +++ b/libgomp/config/gcn/icv-device.c @@ -70,6 +70,16 @@ omp_is_initial_device (void) return 0; } +/* This is set to the device number of current GPU during device initialization, + when the offload image containing this libgomp portion is loaded. */ +static int GOMP_DEVICE_NUM_VAR; + +int +omp_get_device_num (void) +{ + return GOMP_DEVICE_NUM_VAR; +} + ialias (omp_set_default_device) ialias (omp_get_default_device) ialias (omp_get_initial_device) I suppose also add 'ialias (omp_get_device_num)' here, like... Done, thanks for catching. --- a/libgomp/testsuite/lib/libgomp.exp +++ b/libgomp/testsuite/lib/libgomp.exp +# Return 1 if compiling for offload target intelmic +proc check_effective_target_offload_target_intelmic { } { +return [libgomp_check_effective_target_offload_target "*-intelmic"] +} --- /dev/null +++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c @@ -0,0 +1,30 @@ +/* { dg-do run { target { ! offload_target_intelmic } } } */ This means that the test case is skipped as soon as the compiler is configured for Intel MIC offloading -- even if that's not used during execution. From some older experiment of mine, I do have a 'check_effective_target_offload_device_intel_mic', which I'll propose as a follow-up, once this is in. Great. + if (initial_device .and. host_device_num .ne. device_num) stop 2 That one matches 'libgomp.c-c++-common/target-45.c': if (initial_device && host_device_num != device_num) abort (); ..., but here: + if (initial_device .and. host_device_num .eq. device_num) stop 3 ... 
shouldn't that be '.not.initial_device', like in: if (!initial_device && host_device_num == device_num) abort (); Yeah, Tobias also caught this as well :) (Also, I'm not familiar with Fortran operator precedence rules, so probably would put the individual expressions into braces.;-) -- But I trust you know better than I do, of course.) Done. Attached is the final "v3" patch that I committed. Thanks, Chung-Lin From 0bac793ed6bad2c0c13cd1e93a1aa5808467afc8 Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang Date: Thu, 5 Aug 2021 23:29:03 +0800 Subject: [PATCH] openmp: Implement omp_get_device_num routine This patch implements the omp_get_device_num library routine, specified in OpenMP 5.0. GOMP_DEVICE_NUM_VAR is a macro symbol which defines name of a "device number" variable, is defined on the device-side libgomp, has it's address returned to host-side libgomp during device initialization, and the host libgomp then sets its value to the designated device number. libgomp/ChangeLog: * icv-device.c (omp_get_device_num): New API function, host side. * fortran.c (omp_get_device_num_): New interface function. * libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol. * libgomp.map (OMP_5.0.2): New version space with omp_get_device_num, omp_get_device_num_. * libgomp.texi (omp_get_device_num): Add documentation for new API function. * omp.h.in (omp_get_device_num): Add declaration. * omp_lib.f90.in (omp_get_device_num): Likewise. * omp_lib.h.in (omp_get_device_num): Likewise. * target.c (gomp_load_image_to_device): If additional entry for device number exists at end of returned entries from 'load_image_func' hook, copy the assigned device number over to the device variable. * config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global. (omp_get_device_num): New API function, device side. * plugin/plugin-gcn.c ("symcat.h"): Add include. (GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR at end of returned 'target_table' entries. 
* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global. (omp_get_device_num): New API function, device side. * plugin/plugin-nvptx.c ("symcat.h"): Add include. (GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR at end of returned 'target_table' entries. * testsuite/lib/libgomp.exp (check_effective_target_offload_target_intelmic): New function for testing for intelmic offloading. * testsuite/libgomp.c-c++-common/target-45.c: New test. * testsuite/libgomp.fortran/target10.f90: New te
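
The division of labour described in this message (the plugin only locates an opaque variable by a pre-arranged name and size, and host-side libgomp writes the value) can be illustrated with a small host-only simulation. Everything named demo_* or DEMO_* below is hypothetical and merely stands in for the real pieces; in particular the symbol table replaces the plugin's actual lookup in a loaded device image, and no real libgomp or plugin interface is used.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for GOMP_DEVICE_NUM_VAR: one macro names the variable, and the
   "plugin" gets the same name as a string through two-level stringification.  */
#define DEMO_DEVICE_NUM_VAR demo_device_num
#define DEMO_STRINGIFY_1(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY_1 (x)

/* "Device side": the variable and the routine that reports it.  */
static int DEMO_DEVICE_NUM_VAR = -1;

static int
demo_device_get_device_num (void)
{
  return DEMO_DEVICE_NUM_VAR;
}

/* Simulated symbol table of the loaded device image.  */
struct demo_symbol
{
  const char *name;
  void *addr;
  size_t size;
};

static struct demo_symbol demo_image_symbols[] = {
  { DEMO_STRINGIFY (DEMO_DEVICE_NUM_VAR), &DEMO_DEVICE_NUM_VAR,
    sizeof (DEMO_DEVICE_NUM_VAR) },
};

/* "Plugin side": find a symbol by name and report where it lives.  */
static struct demo_symbol *
demo_plugin_find_symbol (const char *name)
{
  size_t n = sizeof demo_image_symbols / sizeof demo_image_symbols[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (demo_image_symbols[i].name, name) == 0)
      return &demo_image_symbols[i];
  return NULL;
}

/* "Host side": after the image is loaded, copy the assigned device number
   into the device variable.  Returns 0 on success, -1 if the variable is
   missing or has an unexpected size.  */
static int
demo_host_load_image (int device_num)
{
  struct demo_symbol *sym
    = demo_plugin_find_symbol (DEMO_STRINGIFY (DEMO_DEVICE_NUM_VAR));
  if (sym == NULL || sym->size != sizeof (int))
    return -1;
  memcpy (sym->addr, &device_num, sizeof device_num);
  return 0;
}
```

In this toy model a plain memcpy stands in for the host-to-device copy; the point is only that the lookup side never needs to know what the variable means.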
[PATCH, libgomp, OpenMP 5.0, OG11, committed] Implement omp_get_device_num
The omp_get_device_num patch was merged to devel/omp/gcc-11 (OG11) after
testing. Commit was 83177ca9f262b230c892e667ebf685f96a718ec8.

This commit also effectively reverts the one-liner patch by Cesar:
https://gcc.gnu.org/pipermail/gcc-patches/2017-October/484844.html
(which was still kept in OG11 at 59ef9fea377db72f198b2bd5a95d5aef58b3f9c4)

That small patch is not on mainline, conflicts with the current merge, and
upon review and testing appears not to be needed anymore. I therefore took the
liberty of overwriting it with the merge of this omp_get_device_num patch.

Chung-Lin

From 83177ca9f262b230c892e667ebf685f96a718ec8 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Mon, 9 Aug 2021 08:58:07 +0200
Subject: [PATCH] openmp: Implement omp_get_device_num routine

This patch implements the omp_get_device_num library routine, specified in
OpenMP 5.0.

GOMP_DEVICE_NUM_VAR is a macro symbol which defines name of a "device number"
variable, is defined on the device-side libgomp, has it's address returned to
host-side libgomp during device initialization, and the host libgomp then
sets its value to the designated device number.

libgomp/ChangeLog:

        * icv-device.c (omp_get_device_num): New API function, host side.
        * fortran.c (omp_get_device_num_): New interface function.
        * libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
        * libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
        omp_get_device_num_.
        * libgomp.texi (omp_get_device_num): Add documentation for new API
        function.
        * omp.h.in (omp_get_device_num): Add declaration.
        * omp_lib.f90.in (omp_get_device_num): Likewise.
        * omp_lib.h.in (omp_get_device_num): Likewise.
        * target.c (gomp_load_image_to_device): If additional entry for device
        number exists at end of returned entries from 'load_image_func' hook,
        copy the assigned device number over to the device variable.

        * config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
        (omp_get_device_num): New API function, device side.
* plugin/plugin-gcn.c ("symcat.h"): Add include. (GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR at end of returned 'target_table' entries. * config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global. (omp_get_device_num): New API function, device side. * plugin/plugin-nvptx.c ("symcat.h"): Add include. (GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR at end of returned 'target_table' entries. * testsuite/lib/libgomp.exp (check_effective_target_offload_target_intelmic): New function for testing for intelmic offloading. * testsuite/libgomp.c-c++-common/target-45.c: New test. * testsuite/libgomp.fortran/target10.f90: New test. (cherry picked from commit 0bac793ed6bad2c0c13cd1e93a1aa5808467afc8) --- libgomp/ChangeLog.omp | 42 +++--- libgomp/config/gcn/icv-device.c| 11 ++ libgomp/config/nvptx/icv-device.c | 11 ++ libgomp/fortran.c | 7 libgomp/icv-device.c | 9 + libgomp/libgomp-plugin.h | 6 libgomp/libgomp.map| 8 - libgomp/libgomp.texi | 29 +++ libgomp/omp.h.in | 1 + libgomp/omp_lib.f90.in | 6 libgomp/omp_lib.h.in | 3 ++ libgomp/plugin/plugin-gcn.c| 38 ++-- libgomp/plugin/plugin-nvptx.c | 25 +++-- libgomp/target.c | 36 ++- libgomp/testsuite/lib/libgomp.exp | 5 +++ libgomp/testsuite/libgomp.c-c++-common/target-45.c | 30 libgomp/testsuite/libgomp.fortran/target10.f90 | 20 +++ 17 files changed, 276 insertions(+), 11 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-45.c create mode 100644 libgomp/testsuite/libgomp.fortran/target10.f90 diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp index 9467e90..3a3299b 100644 --- a/libgomp/ChangeLog.omp +++ b/libgomp/ChangeLog.omp @@ -1,15 +1,49 @@ -2021-06-30 Tobias Burnus +2021-08-09 Tobias Burnus Backported from master: - 2021-06-29 Thomas Schwinge + 2021-08-05 Chung-Lin Tang + + * icv-device.c (omp_get_device_num): New API function, host side. + * fortran.c (omp_get_device_num_): New interface function. 
+ * libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol. + * libgomp.map (OMP_5.0.2): New version space with o
[PATCH, OG11, OpenACC, committed] Fix ICE for non-contiguous arrays
Currently we ICE when non-decl base-pointers (like struct members) are used in OpenACC non-contiguous array sections. This patch is kind of a band-aid to reject such cases ATM. We'll deal with the more elaborate middle-end stuff to fully support them later. Committed to devel/omp/gcc-11 after testing. This is not for mainline. Chung-Lin From 4e34710679ac084d7ca15ccf387c1b6f1e64c2d1 Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang Date: Thu, 19 Aug 2021 16:17:02 +0800 Subject: [PATCH] openacc: fix ICE for non-decl expression in non-contiguous array base-pointer Currently, we do not support cases like struct-members as the base-pointer for an OpenACC non-contiguous array. Mark such cases as unsupported in the C/C++ front-ends, instead of ICEing on them. gcc/c/ChangeLog: * c-typeck.c (handle_omp_array_sections_1): Robustify non-contiguous array check and reject non-DECL base-pointer cases as unsupported. gcc/cp/ChangeLog: * semantics.c (handle_omp_array_sections_1): Robustify non-contiguous array check and reject non-DECL base-pointer cases as unsupported. --- gcc/c/c-typeck.c | 35 +++ gcc/cp/semantics.c | 39 --- 2 files changed, 47 insertions(+), 27 deletions(-) diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index 9c4822bbf27..a8b54c676c0 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -13431,25 +13431,36 @@ handle_omp_array_sections_1 (tree c, tree t, vec &types, && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_AFFINITY && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST) { - if (ort == C_ORT_ACC) - /* Note that OpenACC does accept these kinds of non-contiguous - pointer based arrays. */ - non_contiguous = true; - else + /* If any prior dimension has a non-one length, then deem this +array section as non-contiguous. */ + for (tree d = TREE_CHAIN (t); TREE_CODE (d) == TREE_LIST; + d = TREE_CHAIN (d)) { - /* If any prior dimension has a non-one length, then deem this -array section as non-contiguous. 
*/ - for (tree d = TREE_CHAIN (t); TREE_CODE (d) == TREE_LIST; - d = TREE_CHAIN (d)) + tree d_length = TREE_VALUE (d); + if (d_length == NULL_TREE || !integer_onep (d_length)) { - tree d_length = TREE_VALUE (d); - if (d_length == NULL_TREE || !integer_onep (d_length)) + if (ort == C_ORT_ACC) { + while (TREE_CODE (d) == TREE_LIST) + d = TREE_CHAIN (d); + if (DECL_P (d)) + { + /* Note that OpenACC does accept these kinds of +non-contiguous pointer based arrays. */ + non_contiguous = true; + break; + } error_at (OMP_CLAUSE_LOCATION (c), - "array section is not contiguous in %qs clause", + "base-pointer expression in %qs clause not " + "supported for non-contiguous arrays", omp_clause_code_name[OMP_CLAUSE_CODE (c)]); return error_mark_node; } + + error_at (OMP_CLAUSE_LOCATION (c), + "array section is not contiguous in %qs clause", + omp_clause_code_name[OMP_CLAUSE_CODE (c)]); + return error_mark_node; } } } diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index e56ad8aa1e1..ad62ad76ff9 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -5292,32 +5292,41 @@ handle_omp_array_sections_1 (tree c, tree t, vec &types, return error_mark_node; } /* If there is a pointer type anywhere but in the very first -array-section-subscript, the array section could be non-contiguous. -Note that OpenACC does accept these kinds of non-contiguous pointer -based arrays. */ +array-section-subscript, the array section could be non-contiguous. */ if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_AFFINITY && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST) { - if (ort == C_ORT_ACC) - /* Note that OpenACC does accept these kinds of non-contiguous - pointer based arrays. */ - non_contiguous = true; - else + /* If any prior dimension has a non-one length, then deem this +array section as non-contiguous. */ + for (tree d = TREE_CHAIN (t); TREE_CODE (d) =
[PATCH, OpenMP 5, C++] Implement implicit mapping of this[:1] (PR92120)
Hi Jakub,

this patch implements automatically adding map(tofrom: this[:1]) to omp target regions inside non-static member functions, as specified in OpenMP 5.0.

The patch factors some parts of cp_parser_omp_target out into a new finish_omp_target function, and implements adding the new clause there.

For target regions in normal non-static member functions, the case is simpler. For target regions inside lambda functions, this is implemented by first copying the entire __closure as a "to" map (and yes, this patch allows target regions inside lambda functions to largely work, but since __closure is merely copied, the capture-by-reference case is still not expected to work). __closure->this is then implemented by an always_pointer map clause.

I've added two testcases, each as both a compiler scan test and a libgomp execution test. Testing of g++ and libgomp shows no regressions with nvptx offloading. Is this okay for trunk?

Thanks,
Chung-Lin

2020-09-16  Chung-Lin Tang

PR middle-end/92120

gcc/cp/
* cp-tree.h (finish_omp_target): New declaration. (set_omp_target_this_expr): Likewise.
* lambda.c (lambda_expr_this_capture): Add call to set_omp_target_this_expr.
* parser.c (cp_parser_omp_target): Factor out code, change to call finish_omp_target, add re-initing call to set_omp_target_this_expr.
* semantics.c (omp_target_this_expr): New static variable. (finish_non_static_data_member): Add call to set_omp_target_this_expr. (finish_this_expr): Likewise. (set_omp_target_this_expr): New function to set omp_target_this_expr. (finish_omp_target): New function with code merged from cp_parser_omp_target, plus code to add this and __closure map clauses for OpenMP.

gcc/testsuite/
* g++.dg/gomp/target-this-1.C: New testcase.
* g++.dg/gomp/target-this-2.C: New testcase.

libgomp/testsuite/
* libgomp.c++/target-this-1.C: New testcase.
* libgomp.c++/target-this-2.C: New testcase.
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 6e4de7d0c4b..81e72449856 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7241,6 +7241,11 @@ extern tree finish_omp_structured_block (tree); extern tree finish_oacc_data (tree, tree); extern tree finish_oacc_host_data (tree, tree); extern tree finish_omp_construct (enum tree_code, tree, tree); + +extern tree finish_omp_target (location_t, tree, tree, bool); +extern void set_omp_target_this_expr (tree); + + extern tree begin_omp_parallel (void); extern tree finish_omp_parallel(tree, tree); extern tree begin_omp_task (void); diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c index c94fe8edb8e..aea5f5adc52 100644 --- a/gcc/cp/lambda.c +++ b/gcc/cp/lambda.c @@ -842,6 +842,9 @@ lambda_expr_this_capture (tree lambda, int add_capture_p) type cast (_expr.cast_ 5.4) to the type of 'this'. [ The cast ensures that the transformed expression is an rvalue. ] */ result = rvalue (result); + + /* Acknowledge to OpenMP target that 'this' was referenced. 
*/ + set_omp_target_this_expr (result); } return result; diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index fba3fcc0c4c..46de8e6cb65 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -40742,8 +40742,6 @@ static bool cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok, enum pragma_context context, bool *if_p) { - tree *pc = NULL, stmt; - if (flag_openmp) omp_requires_mask = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED); @@ -40796,6 +40794,7 @@ cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok, keep_next_level (true); tree sb = begin_omp_structured_block (), ret; unsigned save = cp_parser_begin_omp_structured_block (parser); + set_omp_target_this_expr (NULL_TREE); switch (ccode) { case OMP_TEAMS: @@ -40847,15 +40846,9 @@ cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok, cclauses[C_OMP_CLAUSE_SPLIT_TARGET] = tc; } } - tree stmt = make_node (OMP_TARGET); - TREE_TYPE (stmt) = void_type_node; - OMP_TARGET_CLAUSES (stmt) = cclauses[C_OMP_CLAUSE_SPLIT_TARGET]; - OMP_TARGET_BODY (stmt) = body; - OMP_TARGET_COMBINED (stmt) = 1; - SET_EXPR_LOCATION (stmt, pragma_tok->location); - add_stmt (stmt); - pc = &OMP_TARGET_CLAUSES (stmt); - goto check_clauses; + finish_omp_target (pragma_tok->location, +cclauses[C_OMP_CLAUSE_SPLIT_TARGET], body, true); + return true; } else if (!flag_openmp)
Re: [PATCH, 1/3, OpenMP] Target mapping changes for OpenMP 5.0, front-end parts
Ping this patch set.

Thanks,
Chung-Lin

On 2020/9/1 9:16 PM, Chung-Lin Tang wrote:
> Hi Jakub,
> this patch set implements parts of the target mapping changes introduced in OpenMP 5.0, mainly the attachment requirements for pointer-based list items, and the clause ordering.
>
> The first patch here contains the C/C++ front-end changes.
>
> The entire set of changes has been tested without regressions for the compiler and libgomp. Hope this is ready to commit to master.
>
> Thanks,
> Chung-Lin
>
> gcc/c-family/
> * c-common.h (c_omp_adjust_clauses): New declaration.
> * c-omp.c (c_omp_adjust_clauses): New function.
>
> gcc/c/
> * c-parser.c (c_parser_omp_target_data): Add use of new c_omp_adjust_clauses function. Add GOMP_MAP_ATTACH_DETACH as handled map clause kind. (c_parser_omp_target_enter_data): Likewise. (c_parser_omp_target_exit_data): Likewise. (c_parser_omp_target): Likewise.
> * c-typeck.c (handle_omp_array_sections): Adjust COMPONENT_REF case to use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. (c_finish_omp_clauses): Adjust bitmap checks to allow struct decl and same struct field access to co-exist on OpenMP construct.
>
> gcc/cp/
> * parser.c (cp_parser_omp_target_data): Add use of new c_omp_adjust_clauses function. Add GOMP_MAP_ATTACH_DETACH as handled map clause kind. (cp_parser_omp_target_enter_data): Likewise. (cp_parser_omp_target_exit_data): Likewise. (cp_parser_omp_target): Likewise.
> * semantics.c (handle_omp_array_sections): Adjust COMPONENT_REF case to use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. Fix interaction between reference case and attach/detach. (finish_omp_clauses): Adjust bitmap checks to allow struct decl and same struct field access to co-exist on OpenMP construct.
Re: [PATCH, 1/3, OpenMP] Target mapping changes for OpenMP 5.0, front-end parts
On 2020/9/29 6:16 PM, Jakub Jelinek wrote:
> On Tue, Sep 01, 2020 at 09:16:23PM +0800, Chung-Lin Tang wrote:
>> this patch set implements parts of the target mapping changes introduced in OpenMP 5.0, mainly the attachment requirements for pointer-based list items, and the clause ordering.
>>
>> The first patch here contains the C/C++ front-end changes.
>
> Do you think you could mention in detail which exact target mapping changes in the spec the patchset is attempting to implement? 5.0 unfortunately contains many target mapping changes and this patchset can't implement them all and it would be easier to see the list of rules (e.g. from openmp-diff-full-4.5-5.0.pdf, if you don't have that one, I can send it to you), rather than trying to guess them from the patchset.
>
> Thanks.

Hi Jakub,

the main implemented features are the clause ordering rules:

"For a given construct, the effect of a map clause with the to, from, or tofrom map-type is ordered before the effect of a map clause with the alloc, release, or delete map-type."

"If item1 is a list item in a map clause, and item2 is another list item in a map clause on the same construct that has a base pointer that is, or is part of, item1, then:
* If the map clause(s) appear on a target, target data, or target enter data construct, then on entry to the corresponding region the effect of the map clause on item1 is ordered to occur before the effect of the map clause on item2.
* If the map clause(s) appear on a target, target data, or target exit data construct then on exit from the corresponding region the effect of the map clause on item2 is ordered to occur before the effect of the map clause on item1."

and the base-pointer attachment behavior:

"If a list item in a map clause has a base pointer, and a pointer variable is present in the device data environment that corresponds to the base pointer when the effect of the map clause occurs, then if the corresponding pointer or the corresponding list item is created in the device data environment on entry to the construct, then: ... 2. The corresponding pointer variable becomes an attached pointer for the corresponding list item."

(these passages are all in the "2.19.7.1 map Clause" section of the 5.0 spec; all are new, as also verified from the diff PDFs you sent us)

Also, because of these new features, having multiple maps of the same variable now has meaning in OpenMP, so changes in the C/C++ front-ends to relax the no-duplicate rules are also included.

>> gcc/c-family/
>> * c-common.h (c_omp_adjust_clauses): New declaration.
>> * c-omp.c (c_omp_adjust_clauses): New function.
>
> This function name is too broad, it should have target in it as it is for processing target* construct clauses only.
>
> Jakub

Sure, I'll update this naming in a later version.

Thanks,
Chung-Lin
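As a rough illustration of the first ordering rule quoted above, the sketch below reorders a list of map clauses so that to/from/tofrom entries take effect before alloc/release/delete entries. The types and the function order_map_clauses are hypothetical names invented for this example; this is not how the front-ends or libgomp implement the rule, only a minimal model of the ordering it requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of map-types; not GCC's GOMP_MAP_* encoding.  */
enum map_type { MAP_TO, MAP_FROM, MAP_TOFROM, MAP_ALLOC, MAP_RELEASE, MAP_DELETE };

struct map_clause
{
  enum map_type type;
  const char *item;   /* list item, kept as text for illustration only */
};

/* True for the map-types whose effect must be ordered first.  */
static int
is_copying (enum map_type t)
{
  return t == MAP_TO || t == MAP_FROM || t == MAP_TOFROM;
}

/* Stable partition: to/from/tofrom clauses move to the front, and the
   relative order inside each group is preserved.  */
void
order_map_clauses (struct map_clause *c, size_t n)
{
  assert (c != NULL || n == 0);
  size_t next = 0;   /* slot where the next copying clause belongs */
  for (size_t i = 0; i < n; i++)
    if (is_copying (c[i].type))
      {
        struct map_clause tmp = c[i];
        /* Shift the non-copying clauses in between one slot to the right.  */
        for (size_t j = i; j > next; j--)
          c[j] = c[j - 1];
        c[next++] = tmp;
      }
}
```

A stable partition is used so that clauses of the same group keep the order the user wrote them in; the second rule (item1 before item2 on entry, reversed on exit) would need base-pointer information that this model does not carry.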
[PATCH, OpenMP, C++] Allow classes with static members to be mappable
Hi Jakub,

In OpenMP 5.x, static members are no longer supposed to prevent a class from being target-mapped. There is the related issue of actually providing access to static const/constexpr members on the GPU (probably a case of https://github.com/OpenMP/spec/issues/2158) but that is for later.

This patch basically just removes the check for static members inside cp_omp_mappable_type_1, and adjusts a testcase. Not sure if more tests are needed. Tested on trunk without regressions, okay when stage1 reopens?

Thanks,
Chung-Lin

2022-03-09  Chung-Lin Tang

gcc/cp/ChangeLog:
* decl2.cc (cp_omp_mappable_type_1): Remove requirement that all members must be non-static; remove check for static fields.

gcc/testsuite/ChangeLog:
* g++.dg/gomp/unmappable-1.C: Adjust testcase.

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc index c53acf4546d..ace7783d9bd 100644 --- a/gcc/cp/decl2.cc +++ b/gcc/cp/decl2.cc @@ -1544,21 +1544,14 @@ cp_omp_mappable_type_1 (tree type, bool notes) /* Arrays have mappable type if the elements have mappable type. */ while (TREE_CODE (type) == ARRAY_TYPE) type = TREE_TYPE (type); - /* All data members must be non-static. */ + if (CLASS_TYPE_P (type)) { tree field; for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) - if (VAR_P (field)) - { - if (notes) - inform (DECL_SOURCE_LOCATION (field), - "static field %qD is not mappable", field); - result = false; - } /* All fields must have mappable types.
*/ - else if (TREE_CODE (field) == FIELD_DECL -&& !cp_omp_mappable_type_1 (TREE_TYPE (field), notes)) + if (TREE_CODE (field) == FIELD_DECL + && !cp_omp_mappable_type_1 (TREE_TYPE (field), notes)) result = false; } return result; diff --git a/gcc/testsuite/g++.dg/gomp/unmappable-1.C b/gcc/testsuite/g++.dg/gomp/unmappable-1.C index 364f884500c..1532b9c73f1 100644 --- a/gcc/testsuite/g++.dg/gomp/unmappable-1.C +++ b/gcc/testsuite/g++.dg/gomp/unmappable-1.C @@ -4,7 +4,7 @@ class C { public: - static int static_member; /* { dg-message "static field .C::static_member. is not mappable" } */ + static int static_member; virtual void f() {} };
[RFC][PATCH, OpenMP/OpenACC, libgomp] Allow base-pointers to be NULL
Hi all,

when troubleshooting building/running SPEC HPC 2021 with GCC with OpenMP offloading, specifically 534.hpgmgfv_t, one issue encountered was this: when the benchmark initializes and creates its data environment on the GPU, it tries to map array sections where the base-pointer is actually NULL:

    ...
    for (block=0;block<3;++block)
    {
      #pragma omp target enter data map(to:level->restriction[shape].blocks[block][:length])
      // level->restriction[shape].blocks[block] == NULL for some values of index 'block'
    ...

The benchmark appears to assume that such NULL base-pointers are silently ignored, and that the program simply keeps running.

(BTW, the above case needs this patch to compile:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590658.html
which is still awaiting review :) )

What we currently do in libgomp, however, is issue an error and call gomp_fatal():

    libgomp/target.c:gomp_attach_pointer():
      ...
      if ((void *) target == NULL)
        {
    -     gomp_mutex_unlock (&devicep->lock);
    -     gomp_fatal ("attempt to attach null pointer");
    +     n->aux->attach_count[idx] = 0;  // proposed change attached in patch
    +     return;
      ...

Some quick testing shows that clang/LLVM behaves mostly the same as GCC. OTOH, nVidia HPC SDK (PGI) does appear to silently go on without bailing out. (I have not verified whether 534.hpgmgfv_t fully works with PGI, just observed how their runtime handles NULL base-pointers.)

I don't see any explicit description of this case in the OpenMP specifications, just "The corresponding pointer variable becomes an attached pointer", with no description of how a NULL base-pointer is to be handled.

So what do you think? Should libgomp behavior be adjusted here, or should the SPEC benchmark source be adjusted?

(The attached patch to adjust libgomp attach behavior has been regtested without regressions, FWIW)

Thanks,
Chung-Lin

2022-03-09  Chung-Lin Tang

libgomp/ChangeLog:
* target.c (gomp_attach_pointer): When pointer is NULL, return instead of calling gomp_fatal.
diff --git a/libgomp/target.c b/libgomp/target.c index 9017458885e..0e8bbd83c20 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -796,8 +796,8 @@ gomp_attach_pointer (struct gomp_device_descr *devicep, if ((void *) target == NULL) { - gomp_mutex_unlock (&devicep->lock); - gomp_fatal ("attempt to attach null pointer"); + n->aux->attach_count[idx] = 0; + return; } s.host_start = target + bias;
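For comparison, if the benchmark source were adjusted instead of libgomp, the change could look roughly like the sketch below, which skips NULL entries before issuing the enter-data directive. The helper name enter_data_skip_null and its parameters are assumptions made for illustration. Mapping through the local pointer p is a simplification: it maps the data, but does not attach it to an enclosing structure the way the benchmark's original map clause is meant to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map every non-NULL block to the device and return
   how many blocks were mapped.  Without -fopenmp the pragma is ignored and
   the function only counts the non-NULL entries.  */
int
enter_data_skip_null (double **blocks, int nblocks, int length)
{
  assert (blocks != NULL && length > 0);
  int mapped = 0;
  for (int b = 0; b < nblocks; ++b)
    {
      double *p = blocks[b];
      if (p == NULL)
        continue;   /* nothing to map for this block */
      #pragma omp target enter data map(to: p[:length])
      ++mapped;
    }
  return mapped;
}
```

A matching "target exit data" loop would need the same NULL check so that only blocks that were actually mapped are released.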
[PATCH, OpenMP] Fix nested use_device_ptr
Hi Jakub, this patch fixes a bug in lower_omp_target, where for Fortran arrays, the expanded sender assignment is wrongly using the variable in the current ctx, instead of the one looked-up outside, which is causing use_device_ptr/addr to fail to work when used inside an omp-parallel (where the omp child_fn is split away from the original). Just a one-character change to fix this. The fix is inside omp-low.cc, though because the omp_array_data langhook is used only by Fortran, this is essentially Fortran-specific. Tested on x86_64-linux + nvptx offloading without regressions. This is probably not a regression, but seeking to commit when stage1 opens. Thanks, Chung-Lin 2022-04-01 Chung-Lin Tang gcc/ChangeLog: * omp-low.cc (lower_omp_target): Use outer context looked-up 'var' as argument to lang_hooks.decls.omp_array_data, instead of 'ovar' from current clause. libgomp/ChangeLog: * testsuite/libgomp.fortran/use_device_ptr-4.f90: New testcase. diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index 392bb18..bf5779b 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -13405,7 +13405,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) type = TREE_TYPE (ovar); if (lang_hooks.decls.omp_array_data (ovar, true)) - var = lang_hooks.decls.omp_array_data (ovar, false); + var = lang_hooks.decls.omp_array_data (var, false); else if (((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_ADDR || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR) && !omp_privatize_by_reference (ovar) diff --git a/libgomp/testsuite/libgomp.fortran/use_device_ptr-4.f90 b/libgomp/testsuite/libgomp.fortran/use_device_ptr-4.f90 new file mode 100644 index 000..8c361d1 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/use_device_ptr-4.f90 @@ -0,0 +1,41 @@ +! { dg-do run } +! +! Test user_device_ptr nested within another parallel +! construct +! 
+program test_nested_use_device_ptr + use iso_c_binding, only: c_loc, c_ptr + implicit none + real, allocatable, target :: arr(:,:) + integer :: width = 1024, height = 1024, i + type(c_ptr) :: devptr + + allocate(arr(width,height)) + + !$omp target enter data map(alloc: arr) + + !$omp target data use_device_ptr(arr) + devptr = c_loc(arr(1,1)) + !$omp end target data + + !$omp parallel default(none) shared(arr, devptr) + !$omp single + + !$omp target data use_device_ptr(arr) + call thing(c_loc(arr), devptr) + !$omp end target data + + !$omp end single + !$omp end parallel + !$omp target exit data map(delete: arr) + +contains + + subroutine thing(myarr, devptr) +use iso_c_binding, only: c_ptr, c_associated +implicit none +type(c_ptr) :: myarr, devptr +if (.not.c_associated(myarr, devptr)) stop 1 + end subroutine thing + +end program
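For readers more at home in C, the property checked by the Fortran testcase can be sketched as below: the device pointer obtained through use_device_ptr inside a parallel region should equal the one obtained outside of it. This is only an analogue written for illustration (the function name is made up); the bug fixed by the patch goes through the omp_array_data langhook, which only Fortran uses, so the C version is not expected to have ever failed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns 1 if the nested and non-nested use_device_ptr regions see the
   same pointer, 0 if they differ, and -1 on allocation failure.  Without
   offloading (or without -fopenmp) both pointers are simply the host one.  */
int
nested_use_device_ptr_matches (size_t n)
{
  assert (n > 0);
  float *arr = malloc (n * sizeof *arr);
  float *outer = NULL;
  int same = 0;
  if (arr == NULL)
    return -1;

  #pragma omp target enter data map(alloc: arr[:n])

  /* Reference value, taken outside of any parallel region.  */
  #pragma omp target data use_device_ptr(arr)
  {
    outer = arr;
  }

  #pragma omp parallel default(none) shared(arr, outer, same)
  #pragma omp single
  {
    /* Same query, now nested inside the outlined parallel body.  */
    #pragma omp target data use_device_ptr(arr)
    {
      same = (arr == outer);
    }
  }

  #pragma omp target exit data map(delete: arr[:n])
  free (arr);
  return same;
}
```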
[PATCH, OpenMP, C/C++] Handle array reference base-pointers in array sections
Hi Jakub, as encountered in cases where a program constructs its own deep-copying for arrays-of-pointers, e.g: #pragma omp target enter data map(to:level->vectors[:N]) for (i = 0; i < N; i++) #pragma omp target enter data map(to:level->vectors[i][:N]) We need to treat the part of the array reference before the array section as a base-pointer (here 'level->vectors[i]'), providing pointer-attachment behavior. This patch adds this inside handle_omp_array_sections(), tracing the whole sequence of array dimensions, creating a whole base-pointer reference iteratively using build_array_ref(). The conditions are that each of the "absorbed" dimensions must be length==1, and the final reference must be of pointer-type (so that pointer attachment makes sense). There's also a little patch in gimplify_scan_omp_clauses(), to make sure the array-ref base-pointer goes down the right path. This case was encountered when working to make 534.hpgmgfv_t from SPEChpc 2021 properly compile. Tested without regressions on trunk. Okay to go in once stage1 opens? Thanks, Chung-Lin 2022-02-21 Chung-Lin Tang gcc/c/ChangeLog: * c-typeck.cc (handle_omp_array_sections): Add handling for creating array-reference base-pointer attachment clause. gcc/cp/ChangeLog: * semantics.cc (handle_omp_array_sections): Add handling for creating array-reference base-pointer attachment clause. gcc/ChangeLog: * gimplify.cc (gimplify_scan_omp_clauses): Add case for attach/detach map kind for ARRAY_REF of POINTER_TYPE. gcc/testsuite/ChangeLog: * c-c++-common/gomp/target-enter-data-1.c: Adjust testcase. 
libgomp/testsuite/ChangeLog: * libgomp.c-c++-common/ptr-attach-2.c: New test.diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc index 3075c883548..4257e373557 100644 --- a/gcc/c/c-typeck.cc +++ b/gcc/c/c-typeck.cc @@ -13649,6 +13649,10 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) if (int_size_in_bytes (TREE_TYPE (first)) <= 0) maybe_zero_len = true; + struct dim { tree low_bound, length; }; + auto_vec dims (num); + dims.safe_grow (num); + for (i = num, t = OMP_CLAUSE_DECL (c); i > 0; t = TREE_CHAIN (t)) { @@ -13763,6 +13767,9 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) else size = size_binop (MULT_EXPR, size, l); } + + dim d = { low_bound, length }; + dims[i] = d; } if (side_effects) size = build2 (COMPOUND_EXPR, sizetype, side_effects, size); @@ -13802,6 +13809,23 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) OMP_CLAUSE_DECL (c) = t; return false; } + + tree aref = t; + for (i = 0; i < dims.length (); i++) + { + if (dims[i].length && integer_onep (dims[i].length)) + { + tree lb = dims[i].low_bound; + aref = build_array_ref (OMP_CLAUSE_LOCATION (c), aref, lb); + } + else + { + if (TREE_CODE (TREE_TYPE (aref)) == POINTER_TYPE) + t = aref; + break; + } + } + first = c_fully_fold (first, false, NULL); OMP_CLAUSE_DECL (c) = first; if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR) @@ -13836,7 +13860,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) break; } tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP); - if (TREE_CODE (t) == COMPONENT_REF) + if (TREE_CODE (t) == COMPONENT_REF || TREE_CODE (t) == ARRAY_REF + || TREE_CODE (t) == INDIRECT_REF) OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH); else OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER); diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 0cb17a6a8ab..646f4883d66 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -5497,6 +5497,10 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort) if (processing_template_decl && maybe_zero_len) return false; + struct dim { tree low_bound, length; }; + auto_vec dims (num); + dims.safe_grow (num); + for (i = num, t = OMP_CLAUSE_DECL (c); i > 0; t = TREE_CHAIN (t)) { @@ -5604,6 +5608,9 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) else size = size_binop (MULT_EXPR, size, l); } + + dim d = { low_bound, length }; + dims[i] = d; } if (!processing_template_decl) { @@ -5647,6 +5654,24 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) OMP_CLAUSE_DECL (c) = t; return false; } + + tree aref = t; + for (i = 0; i < dims.length (); i++) +
[PATCH, v2, OpenMP 5.0] Remove array section base-pointer mapping semantics, and other front-end adjustments (mainline trunk)
Hi Jakub,

attached is a rebased version of this "OpenMP fixes/adjustments" patch. This version removes some of the (ort == C_ORT_OMP || ort == C_ORT_ACC) checks that are not needed in handle_omp_array_sections_1 and [c_]finish_omp_clauses.

Note that this is meant to be applied atop the recently posted C++ PR92120 v5 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584602.html

Again, tested without regressions (together with the PR92120 patch), awaiting review.

Thanks,
Chung-Lin

(ChangeLog updated below)

On 2021/5/25 9:36 PM, Chung-Lin Tang wrote:
> This patch largely implements three pieces of functionality:
>
> (1) Per discussion and clarification on the omp-lang mailing list, standards-conforming behavior for mapping array sections should *NOT* also map the base-pointer, i.e. for this code:
>
>     struct S { int *ptr; ... };
>     struct S s;
>     #pragma omp target enter data map(to: s.ptr[:100])
>
> Currently we generate after gimplify:
>
>     #pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \
>                                   map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])
>
> which is deemed incorrect. After this patch, the gimplify results are now adjusted to:
>
>     #pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])
>
> (the attach operation is still generated, and if s.ptr was already mapped earlier, attachment will happen)
>
> The correct way of achieving the base-pointer-also-mapped behavior would be to use:
>
>     #pragma omp target enter data map(to: s.ptr, s.ptr[:100])
>
> This adjustment in behavior required a number of small adjustments here and there in gimplify, including to accommodate map sequences for C++ references.
>
> There is also a small Fortran front-end patch involved (hence CCing Tobias and fortran@). The new gimplify processing changed behavior in handling GOMP_MAP_ALWAYS_POINTER maps such that libgomp.fortran/struct-elem-map-1.f90 regressed. It appeared that the Fortran FE was generating a GOMP_MAP_ALWAYS_POINTER for array types, which didn't seem quite correct, and the pre-patch behavior was removing this map anyway. I have a small change in trans-openmp.c:gfc_trans_omp_array_section to not generate the map in this case, and so far no bad test results.
>
> (2) The second part (though kind of related to the first above) consists of fixes in libgomp/target.c to not overwrite attached pointers when handling device<->host copies, mainly for the "always" case. This behavior is also noted in the 5.0 spec, but was not properly implemented before.
>
> (3) The third is a set of changes to the C/C++ front-ends to extend the allowed component access syntax in map clauses. This is actually mainly an effort to allow SPEC HPC to compile, so although in the long term the entire map clause syntax parsing is probably going to be revamped, we're still adding this in for now. These changes are enabled for both OpenACC and OpenMP.

2021-11-19  Chung-Lin Tang

gcc/c/ChangeLog:
* c-parser.c (struct omp_dim): New struct type for use inside c_parser_omp_variable_list. (c_parser_omp_variable_list): Allow multiple levels of array and component accesses in array section base-pointer expression. (c_parser_omp_clause_to): Set 'allow_deref' to true in call to c_parser_omp_var_list_parens. (c_parser_omp_clause_from): Likewise.
* c-typeck.c (handle_omp_array_sections_1): Extend allowed range of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. (c_finish_omp_clauses): Extend allowed range of expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/cp/ChangeLog:
* parser.c (struct omp_dim): New struct type for use inside cp_parser_omp_var_list_no_open. (cp_parser_omp_var_list_no_open): Allow multiple levels of array and component accesses in array section base-pointer expression. (cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to cp_parser_omp_var_list for to/from clauses.
* semantics.c (handle_omp_array_sections_1): Extend allowed range of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. (handle_omp_array_sections): Adjust pointer map generation of references. (finish_omp_clauses): Extend allowed range of expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_trans_omp_array_section): Do not generate GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.

gcc/ChangeLog:
* gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter, accommodate case where 'offset' return of get_inner_reference is non-NULL. (is_or_contains_p): Further robustify conditions.
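To make item (1) of the message concrete, here is a small sketch of user code that maps the base-pointer explicitly together with its array section, as the message recommends. The function name and the summation are made up for the example; it assumes a compiler that accepts the pointer member and its section in the same map clause, and it has only been reasoned about for host execution, where the pragmas have no effect on the result.

```c
#include <assert.h>
#include <stdlib.h>

struct S { int *ptr; };

/* Sums 0 .. n-1 through s.ptr inside a target region.  Returns -1 on
   allocation failure.  */
long
sum_mapped_section (int n)
{
  assert (n > 0);
  struct S s;
  long total = 0;

  s.ptr = malloc (n * sizeof *s.ptr);
  if (s.ptr == NULL)
    return -1;
  for (int i = 0; i < n; i++)
    s.ptr[i] = i;

  /* Map the pointer itself as well as the data it points to; mapping only
     s.ptr[:n] would leave the pointer unmapped under the new semantics.  */
  #pragma omp target enter data map(to: s.ptr, s.ptr[:n])

  #pragma omp target map(to: s.ptr[:n]) map(tofrom: total)
  for (int i = 0; i < n; i++)
    total += s.ptr[i];

  #pragma omp target exit data map(release: s.ptr, s.ptr[:n])
  free (s.ptr);
  return total;
}
```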
Re: [PATCH, PR90030] Fortran OpenMP/OpenACC array mapping alignment fix
Ping.

On 2021/11/4 4:23 PM, Chung-Lin Tang wrote:
> Hi Jakub,
> As Thomas reported, and submitted a patch for, a while ago:
> https://gcc.gnu.org/pipermail/gcc-patches/2019-April/519932.html
> https://gcc.gnu.org/pipermail/gcc-patches/2019-May/522738.html
>
> There's an issue with the Fortran front-end when mapping arrays: when creating the data MEM_REF for the map clause, there's a convention of casting the referencing pointer to 'c_char *' by fold_convert (build_pointer_type (char_type_node), ptr).
>
> This causes the alignment passed to the libgomp runtime for array data to be hardwired to '1', which causes alignment errors on the offload target (not always showing up, but they can be triggered by a slight change of clause ordering).
>
> This patch is not exactly Thomas' patch from 2019, but does the same thing. The new libgomp tests are directly reused though. A lot of scan test adjustment is also included in this patch.
>
> The patch has been tested without regressions for gfortran and libgomp, is this okay for trunk?
>
> Thanks,
> Chung-Lin
>
> Fortran: fix array alignment for OpenMP/OpenACC target mapping clauses [PR90030]
>
> The Fortran front-end is creating maps of array data with a type of pointer to char_type_node, which, when eventually passed to libgomp at runtime, marks the passed array with an alignment of 1, which can cause mapping alignment errors on the offload target.
>
> This patch removes the related fold_convert(build_pointer_type (char_type_node)) calls in fortran/trans-openmp.c, and adds gcc_asserts to ensure pointer type.
>
> 2021-11-04  Chung-Lin Tang
>             Thomas Schwinge
>
> PR fortran/90030
>
> gcc/fortran/ChangeLog:
> * trans-openmp.c (gfc_omp_finish_clause): Remove fold_convert to pointer to char_type_node, add gcc_assert of POINTER_TYPE_P. (gfc_trans_omp_array_section): Likewise. (gfc_trans_omp_clauses): Likewise.
>
> gcc/testsuite/ChangeLog:
> * gfortran.dg/goacc/finalize-1.f: Adjust scan test.
> * gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
> * gfortran.dg/gomp/affinity-clause-5.f90: Likewise.
> * gfortran.dg/gomp/defaultmap-4.f90: Likewise.
> * gfortran.dg/gomp/defaultmap-5.f90: Likewise.
> * gfortran.dg/gomp/defaultmap-6.f90: Likewise.
> * gfortran.dg/gomp/map-3.f90: Likewise.
> * gfortran.dg/gomp/pr78260-2.f90: Likewise.
> * gfortran.dg/gomp/pr78260-3.f90: Likewise.
>
> libgomp/ChangeLog:
> * testsuite/libgomp.oacc-fortran/pr90030.f90: New test.
> * testsuite/libgomp.fortran/pr90030.f90: New test.
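Why an alignment of 1 only bites some of the time can be seen with a simplified model: assume the runtime packs mapped objects into one device block and rounds each object's offset up to the alignment recorded for it. The function below is such a model (the name place_at is made up, and this is a sketch of the idea rather than the actual libgomp code).

```c
#include <assert.h>
#include <stddef.h>

/* Round OFFSET up to the next multiple of ALIGN, which must be a power
   of two.  Models where the next mapped object would be placed.  */
size_t
place_at (size_t offset, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (offset + align - 1) & ~(align - 1);
}
```

In this model, an array of 8-byte reals mapped right after a 1-byte object lands at place_at (1, 1), an odd offset, when its alignment is recorded as 1, but at place_at (1, 8) when the real alignment is passed. If the clauses happen to be ordered so that the array comes first, the offset is 0 either way and the problem stays hidden.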
[PATCH, Fortran] Fix setting of array lower bound for named arrays
This patch by Tobias fixes a case of incorrectly set array lower bounds, found with particular uses of SOURCE=/MOLD=. For example:

    program A_M
      implicit none
      real, dimension (:), allocatable :: A, B
      allocate (A(0:5))
      call Init (A)
    contains
      subroutine Init ( A )
        real, dimension ( 0 : ), intent ( in ) :: A
        integer, dimension ( 1 ) :: lb_B
        allocate (B, mold = A)
        ...
        lb_B = lbound (B, dim=1)  ! Error: lb_B assigned 1, instead of 0 like lower-bound of A.

Referencing the Fortran standard, "16.9.109 LBOUND (ARRAY [, DIM, KIND])" states:

"If DIM is present, ARRAY is a whole array, and either ARRAY is an assumed-size array of rank DIM or dimension DIM of ARRAY has nonzero extent, the result has a value equal to the lower bound for subscript DIM of ARRAY. Otherwise, if DIM is present, the result value is 1."

And on what a "whole array" is, "9.5.2 Whole arrays" states:

"A whole array is a named array or a structure component ..."

The attached patch adjusts the relevant part of gfc_trans_allocate() to set e3_has_nodescriptor only for non-named arrays.

Tobias has tested this once, and I've tested this patch as well on our complete set of testsuites (which usually serves for OpenMP-related stuff). Everything appears well, with no regressions. Is this okay for trunk?

Thanks,
Chung-Lin

2021-11-29  Tobias Burnus

gcc/fortran/ChangeLog:
* trans-stmt.c (gfc_trans_allocate): Set e3_has_nodescriptor to true only for non-named arrays.

gcc/testsuite/ChangeLog:
* gfortran.dg/allocate_with_source_26.f90: Adjust testcase.
* gfortran.dg/allocate_with_mold_4.f90: New testcase.diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c index bdf7957..982e1e0 100644 --- a/gcc/fortran/trans-stmt.c +++ b/gcc/fortran/trans-stmt.c @@ -6660,16 +6660,13 @@ gfc_trans_allocate (gfc_code * code) else e3rhs = gfc_copy_expr (code->expr3); - // We need to propagate the bounds of the expr3 for source=/mold=; - // however, for nondescriptor arrays, we use internally a lower bound - // of zero instead of one, which needs to be corrected for the allocate obj - if (e3_is == E3_DESC) - { - symbol_attribute attr = gfc_expr_attr (code->expr3); - if (code->expr3->expr_type == EXPR_ARRAY || - (!attr.allocatable && !attr.pointer)) - e3_has_nodescriptor = true; - } + // We need to propagate the bounds of the expr3 for source=/mold=. + // However, for non-named arrays, the lbound has to be 1 and neither the + // bound used inside the called function even when returning an + // allocatable/pointer nor the zero used internally. + if (e3_is == E3_DESC + && code->expr3->expr_type != EXPR_VARIABLE) + e3_has_nodescriptor = true; } /* Loop over all objects to allocate. 
*/ diff --git a/gcc/testsuite/gfortran.dg/allocate_with_mold_4.f90 b/gcc/testsuite/gfortran.dg/allocate_with_mold_4.f90 new file mode 100644 index 000..d545fe1 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/allocate_with_mold_4.f90 @@ -0,0 +1,24 @@ +program A_M + implicit none + real, parameter :: C(5:10) = 5.0 + real, dimension (:), allocatable :: A, B + allocate (A(6)) + call Init (A) +contains + subroutine Init ( A ) +real, dimension ( -1 : ), intent ( in ) :: A +integer, dimension ( 1 ) :: lb_B + +allocate (B, mold = A) +if (any (lbound (B) /= lbound (A))) stop 1 +if (any (ubound (B) /= ubound (A))) stop 2 +if (any (shape (B) /= shape (A))) stop 3 +if (size (B) /= size (A)) stop 4 +deallocate (B) +allocate (B, mold = C) +if (any (lbound (B) /= lbound (C))) stop 5 +if (any (ubound (B) /= ubound (C))) stop 6 +if (any (shape (B) /= shape (C))) stop 7 +if (size (B) /= size (C)) stop 8 +end +end diff --git a/gcc/testsuite/gfortran.dg/allocate_with_source_26.f90 b/gcc/testsuite/gfortran.dg/allocate_with_source_26.f90 index 28f24fc..323c8a3 100644 --- a/gcc/testsuite/gfortran.dg/allocate_with_source_26.f90 +++ b/gcc/testsuite/gfortran.dg/allocate_with_source_26.f90 @@ -34,23 +34,23 @@ program p if (lbound(p1, 1) /= 3 .or. ubound(p1, 1) /= 4 & .or. lbound(p2, 1) /= 3 .or. ubound(p2, 1) /= 4 & .or. lbound(p3, 1) /= 1 .or. ubound(p3, 1) /= 2 & - .or. lbound(p4, 1) /= 7 .or. ubound(p4, 1) /= 8 & + .or. lbound(p4, 1) /= 1 .or. ubound(p4, 1) /= 2 & .or. p1(3)%i /= 43 .or. p1(4)%i /= 56 & .or. p2(3)%i /= 43 .or. p2(4)%i /= 56 & .or. p3(1)%i /= 43 .or. p3(2)%i /= 56 & - .or. p4(7)%i /= 11 .or. p4(8)%i /= 12) then + .or. p4(1)%i /= 11 .or. p4(2)%i /= 12) then call abort() endif !write(*,*) lbound(a,1), ubound(a,1) ! prints 1 3 !write(*,*) lbound(b,1), ubound(b,1) ! prints 1 3 - !write(*,*) lbound(c,1), ubound(c,1) ! prints 3 5 + !write(*,*) lbound(c,1), ubound(c,1) ! prints 1 3 !write(*,*) lbound(d,1), ubound(d,1) ! prints 1 5 !write(*,*) lbound(e,1), ubound(e,1) ! prints 1 6
Re: [PATCH, v5, OpenMP 5.0] Improve OpenMP target support for C++ [PR92120 v5]
On 2021/12/4 12:47 AM, Jakub Jelinek wrote:
> On Tue, Nov 16, 2021 at 08:43:27PM +0800, Chung-Lin Tang wrote:
>> 2021-11-16  Chung-Lin Tang
>>
>> 	PR middle-end/92120
>>
>> gcc/cp/ChangeLog:
> ...
>> +  if (allow_zero_length_array_sections)
>> +    {
>> +      /* When allowing attachment to zero-length array sections, we
>> +         allow attaching to NULL pointers when the target region is not
>> +         mapped.  */
>> +      data = 0;
>> +    }
>
> No {}s around single statement if body.
>
> Otherwise LGTM.
>
> 	Jakub

Thanks for the review and approval, Jakub.

Thomas, I pushed another 2766448c5cc3efc4 commit to fix the non-offload config FAILs, just FYI.

Chung-Lin
Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
Ping.

On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
> Hi Tom,
> I had a patch submitted earlier, where I reported that the current way of implementing
> barriers in libgomp on nvptx created a quite significant performance drop on some SPEChpc2021
> benchmarks:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
>
> That previous patch wasn't accepted well (admittedly, it was kind of a hack).
> So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.
>
> Basically, instead of trying to have the GPU do CPU-with-OS-like things that it isn't suited for,
> barriers are implemented simplistically with bar.* synchronization instructions.
> Tasks are processed after threads have joined, and only if team->task_count != 0
>
> (arguably, there might be a little bit of performance forfeited where earlier arriving threads
> could've been used to process tasks ahead of other threads. But that again falls into requiring
> implementing complex futex-wait/wake like behavior. Really, that kind of tasking is not what target
> offloading is usually used for)
>
> Implementation highlight notes:
> 1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in the usual manner)
> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
> 3. gomp_barrier_wait_last() now is implemented using "bar.arrive"
>
> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
>    The main synchronization is done using a 'bar.red' instruction. This reduces across all threads
>    the condition (team->task_count != 0), to enable the task processing down below if any thread
>    created a task. (this bar.red usage required the need of the second GCC patch in this series)
>
> This patch has been tested on x86_64/powerpc64le with nvptx offloading, using libgomp, ovo, omptests,
> and sollve_vv testsuites, all without regressions. Also verified that the SPEChpc 2021 521.miniswp_t
> and 534.hpgmgfv_t performance regressions that occurred in the GCC12 cycle has been restored to
> devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?
>
> (also suggest backporting to GCC12 branch, if performance regression can be considered a defect)
>
> Thanks,
> Chung-Lin
>
> libgomp/ChangeLog:
>
> 2022-09-21  Chung-Lin Tang
>
> 	* config/nvptx/bar.c (generation_to_barrier): Remove.
> 	(futex_wait,futex_wake,do_spin,do_wait): Remove.
> 	(GOMP_WAIT_H): Remove.
> 	(#include "../linux/bar.c"): Remove.
> 	(gomp_barrier_wait_end): New function.
> 	(gomp_barrier_wait): Likewise.
> 	(gomp_barrier_wait_last): Likewise.
> 	(gomp_team_barrier_wait_end): Likewise.
> 	(gomp_team_barrier_wait): Likewise.
> 	(gomp_team_barrier_wait_final): Likewise.
> 	(gomp_team_barrier_wait_cancel_end): Likewise.
> 	(gomp_team_barrier_wait_cancel): Likewise.
> 	(gomp_team_barrier_cancel): Likewise.
> 	* config/nvptx/bar.h (gomp_team_barrier_wake): Remove
> 	prototype, add new static inline function.
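The role of the 'bar.red' reduction described in item 4 can be modelled on the host in a few lines of C. This is only a sketch of the semantics as described above (every thread contributes a predicate and every thread receives the OR of all of them), not the nvptx implementation. On the GPU the reduction is a single barrier instruction rather than a loop, and the name barrier_reduce_or is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Host-side model of an OR-reducing barrier: PRED[i] is the predicate
   contributed by thread i (e.g. "task_count != 0" as seen by that
   thread).  The result is what every thread would observe once all
   NTHREADS threads have arrived.  */
bool barrier_reduce_or (const bool *pred, size_t nthreads)
{
  bool any = false;
  for (size_t i = 0; i < nthreads; i++)
    any = any || pred[i];
  return any;
}
```

If the reduced value is false, no thread saw a pending task and all threads can leave the barrier directly. If it is true, every thread agrees that task processing is needed, so no thread is left waiting on a different assumption.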
[Ping x2] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
Ping x2.

On 2022/10/17 10:29 PM, Chung-Lin Tang wrote:
> Ping.
>
> On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
>> [...]
[PATCH, OpenMP, Fortran] requires unified_shared_memory 1/2: adjust libgfortran memory allocators
Hi,
this patch is to fix the case where 'requires unified_shared_memory' doesn't work due to memory allocator mismatch. Currently this is only for OG12 (devel/omp/gcc-12), but will apply to mainline as well once those requires patches get in.

Basically, 'requires unified_shared_memory' enables the usm_transform pass, which transforms some of the expanded Fortran intrinsic code that uses __builtin_free() into 'omp_free (..., ompx_unified_shared_mem_alloc)'.

The intention is to make all dynamic memory allocation use the OpenMP unified_shared_memory allocator, but there is a big gap in this, namely libgfortran. What happens in some tests is that libgfortran allocates stuff using normal malloc(), and the usm_transform generates code that frees the stuff using omp_free(), and chaos ensues.

So the proper fix, we believe, is to make it possible to move the entire libgfortran on to unified_shared_memory.

This first patch is a mostly mechanical patch to change all references of malloc/free/calloc/realloc in libgfortran into xmalloc/xfree/xcalloc/xrealloc in libgfortran/runtime/memory.c, as well as strdup uses into a new internal xstrdup.

All of libgfortran is adjusted this way, except libgfortran/caf, which is an independent library outside of libgfortran.so.

The second patch of this series will present a way to switch the references of allocators in libgfortran/runtime/memory.c from the normal glibc malloc/free/etc. to omp_alloc/omp_free/etc. when 'requires unified_shared_memory' is detected.

Tested on devel/omp/gcc-12. The plan is to commit there soon, but I am also seeking approval for mainline once the requires stuff goes in.

Thanks,
Chung-Lin

2022-08-15  Chung-Lin Tang

libgfortran/ChangeLog:

	* m4/matmul_internal.m4: Adjust malloc/free to xmalloc/xfree.
	* generated/matmul_c10.c: Regenerate.
	* generated/matmul_c16.c: Likewise.
	* generated/matmul_c17.c: Likewise.
	* generated/matmul_c4.c: Likewise.
	* generated/matmul_c8.c: Likewise.
	* generated/matmul_i1.c: Likewise.
* generated/matmul_i16.c: Likewise. * generated/matmul_i2.c: Likewise. * generated/matmul_i4.c: Likewise. * generated/matmul_i8.c: Likewise. * generated/matmul_r10.c: Likewise. * generated/matmul_r16.c: Likewise. * generated/matmul_r17.c: Likewise. * generated/matmul_r4.c: Likewise. * generated/matmul_r8.c: Likewise. * generated/matmulavx128_c10.c: Likewise. * generated/matmulavx128_c16.c: Likewise. * generated/matmulavx128_c17.c: Likewise. * generated/matmulavx128_c4.c: Likewise. * generated/matmulavx128_c8.c: Likewise. * generated/matmulavx128_i1.c: Likewise. * generated/matmulavx128_i16.c: Likewise. * generated/matmulavx128_i2.c: Likewise. * generated/matmulavx128_i4.c: Likewise. * generated/matmulavx128_i8.c: Likewise. * generated/matmulavx128_r10.c: Likewise. * generated/matmulavx128_r16.c: Likewise. * generated/matmulavx128_r17.c: Likewise. * generated/matmulavx128_r4.c: Likewise. * generated/matmulavx128_r8.c: Likewise. * intrinsics/access.c (access_func): Adjust free to xfree. * intrinsics/chdir.c (chdir_i4_sub): Likewise. (chdir_i8_sub): Likewise. * intrinsics/chmod.c (chmod_func): Likewise. * intrinsics/date_and_time.c (secnds): Likewise. * intrinsics/env.c (PREFIX(getenv)): Likewise. (get_environment_variable_i4): Likewise. * intrinsics/execute_command_line.c (execute_command_line): Likewise. * intrinsics/getcwd.c (getcwd_i4_sub): Likewise. * intrinsics/getlog.c (PREFIX(getlog)): Likewise. * intrinsics/link.c (link_internal): Likewise. * intrinsics/move_alloc.c (move_alloc): Likewise. * intrinsics/perror.c (perror_sub): Likewise. * intrinsics/random.c (constructor_random): Likewise. * intrinsics/rename.c (rename_internal): Likewise. * intrinsics/stat.c (stat_i4_sub_0): Likewise. (stat_i8_sub_0): Likewise. * intrinsics/symlnk.c (symlnk_internal): Likewise. * intrinsics/system.c (system_sub): Likewise. * intrinsics/unlink.c (unlink_i4_sub): Likewise. * io/async.c (update_pdt): Likewise. (async_io): Likewise. (free_async_unit): Likewise. 
(init_async_unit): Adjust calloc to xcalloc. (enqueue_done_id): Likewise. (enqueue_done): Likewise. (enqueue_close): Likewise. * io/async.h (MUTEX_DEBUG_ADD): Adjust malloc/free to xmalloc/xfree. * io/close.c (st_close): Adjust strdup/free to xstrdup/xfree. * io/fbuf.c (fbuf_destroy): Adjust free to xfree. * io/format.c (free_format_hash_table): Likewise. (save_parsed_format): Likewise. (free_format): Likewise. (free_format_data)
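As a rough picture of what the mechanical change amounts to, here is a minimal C sketch of a wrapper family with a single choke point for allocation. It is not the libgfortran code: the sketch_ names are hypothetical, and the out-of-memory and zero-size handling are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an xmalloc-style wrapper: never returns
   NULL, and treats a zero-byte request as a one-byte request.  */
void *sketch_xmalloc (size_t n)
{
  if (n == 0)
    n = 1;
  void *p = malloc (n);
  if (p == NULL)
    {
      fputs ("sketch_xmalloc: out of memory\n", stderr);
      abort ();
    }
  return p;
}

/* Matching release function, so every free goes through one place.  */
void sketch_xfree (void *p)
{
  free (p);
}

/* strdup replacement built on the wrapper, so that the copy can later
   be released with sketch_xfree no matter which allocator is active.  */
char *sketch_xstrdup (const char *s)
{
  size_t len = strlen (s) + 1;
  char *copy = sketch_xmalloc (len);
  memcpy (copy, s, len);
  return copy;
}
```

The point of routing strdup through the wrapper is that plain strdup allocates with the C library's malloc internally, which would reintroduce the allocator mismatch the patch is trying to remove.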
[PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran
After the first libgfortran memory allocator preparation patch, this is the actual patch that organizes unified_shared_memory allocation into libgfortran.

In the current OpenMP requires implementation, the requires_mask is collected through offload LTO processing, and presented to libgomp when registering offload images through GOMP_offload_register_ver() (called by the mkoffload generated constructor linked into the program binary).

This means that the only reliable place to access omp_requires_mask is in GOMP_offload_register_ver. However, since it is called through an ELF constructor in the *main program*, it runs later than the libgfortran/runtime/main.c:init() constructor, and because some libgfortran init actions there start allocating memory, this can cause more deallocation errors later.

Another issue is that CUDA appears to be registering some cleanup actions using atexit(), which forces libgomp to register gomp_target_fini() using atexit as well (to properly run before the underlying CUDA stuff disappears). This happens to us here as well.

So to summarize, we need to:
(1) order libgfortran init actions after omp_requires_mask processing is done, and
(2) order libgfortran cleanup actions before gomp_target_fini, to properly deallocate stuff without crashing.

The above explains why there's a small new set of definitions, as well as callback registering functions exported from libgomp to libgfortran: they register libgfortran init/fini actions into libgomp to run.

Inside GOMP_offload_register_ver, after omp_requires_mask processing is done, we call into libgfortran through a new _gfortran_mem_allocators_init function to insert the omp_free/alloc/etc. based allocators into the Fortran runtime, when GOMP_REQUIRES_UNIFIED_SHARED_MEMORY is set.

All symbol references between libgfortran/libgomp are defined with weak symbols. Tests of the weak symbols are also used to determine if the other library exists in this program.

A final issue is the case where we have an OpenMP program that does NOT have offloading. We cannot passively determine in libgomp/libgfortran whether offloading exists or not; only the main program itself can, by seeing if the hidden __OFFLOAD_TABLE__ exists.

When we do init/fini libgomp callback registering for OpenMP programs, those with no offloading will not have those callbacks properly run (because no offload image is loaded).

Therefore the solution here is a constructor added into the crtoffloadend.o fragment that does a "null" call of GOMP_offload_register_ver, solely for triggering the post-offload_register callbacks when __OFFLOAD_TABLE__ is NULL. (Because of this, the crtoffloadend.o Makefile rule is adjusted to compile with PIC.)

I know this is a big pile of yarn wrt how the main program/libgomp/libgfortran interacts, but it's finally working. Again tested without regressions. Preparing to commit to devel/omp/gcc-12, and seeking approval for mainline when the requires patches are in.

Thanks,
Chung-Lin

2022-08-15  Chung-Lin Tang

libgcc/
	* Makefile.in (crtoffloadend$(objext)): Add $(PICFLAG) to compile rule.
	* offloadstuff.c (GOMP_offload_register_ver): Add declaration of weak
	symbol.
	(__OFFLOAD_TABLE__): Likewise.
	(init_non_offload): New function.

libgfortran/
	* gfortran.map (GFORTRAN_13): New namespace.
	(_gfortran_mem_allocators_init): New name inside GFORTRAN_13.
	* libgfortran.h (mem_allocators_init): New exported declaration.
	* runtime/main.c (do_init): Rename from init, add run-once guard code.
	(cleanup): Add run-once guard code.
	(GOMP_post_offload_register_callback): Declare weak symbol.
	(GOMP_pre_gomp_target_fini_callback): Likewise.
	(init): New constructor to register offload callbacks, or call do_init
	when not OpenMP.
	* runtime/memory.c (gfortran_malloc): New pointer variable.
	(gfortran_calloc): Likewise.
	(gfortran_realloc): Likewise.
	(gfortran_free): Likewise.
	(mem_allocators_init): New function.
	(xmalloc): Use gfortran_malloc.
(xmallocarray): Use gfortran_malloc. (xcalloc): Use gfortran_calloc. (xrealloc): Use gfortran_realloc. (xfree): Use gfortran_free. libgomp/ * libgomp.map (GOMP_5.1.2): New version namespace. (GOMP_post_offload_register_callback): New name inside GOMP_5.1.2. (GOMP_pre_gomp_target_fini_callback): Likewise. (GOMP_DEFINE_CALLBACK_SET): Macro to define callback set. (post_offload_register): Define callback set for after offload image register. (pre_gomp_target_fini): Define callback set for before gomp_target_fini is called. (libgfortran_malloc_usm): New function. (libgfortran_calloc_usm): Likewise (libgfortran_realloc_usm): Likewise (libgfortran_free_usm): Likewise. (_gfortran
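The "pointer variable" entries in the ChangeLog above suggest an indirection through function pointers. A minimal C sketch of that idea follows. It is an illustration under assumptions, not the patch itself: all sketch_ and fake_ names are hypothetical, and the stand-in "USM" allocators simply wrap malloc/free and count calls so the switch is observable.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Allocator entry points; they start out as the C library versions.  */
static void *(*sketch_malloc_fn) (size_t) = malloc;
static void (*sketch_free_fn) (void *) = free;

static unsigned long usm_alloc_calls;

/* Hypothetical stand-ins for unified-shared-memory allocators.  */
static void *fake_usm_malloc (size_t n)
{
  usm_alloc_calls++;
  return malloc (n);
}

static void fake_usm_free (void *p)
{
  free (p);
}

/* Called once the 'requires' information is known: swap both pointers
   together so that allocation and release always stay paired.  */
void sketch_mem_allocators_init (bool unified_shared_memory)
{
  if (unified_shared_memory)
    {
      sketch_malloc_fn = fake_usm_malloc;
      sketch_free_fn = fake_usm_free;
    }
}

void *sketch_xmalloc (size_t n)
{
  return sketch_malloc_fn (n ? n : 1);
}

void sketch_xfree (void *p)
{
  sketch_free_fn (p);
}

unsigned long sketch_usm_alloc_calls (void)
{
  return usm_alloc_calls;
}
```

The sketch also shows why the ordering discussed above matters: anything allocated before sketch_mem_allocators_init runs would later be released through the new free pointer. That is harmless here only because both stand-ins use malloc/free underneath.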
Re: [PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran
On 2022/8/15 7:06 PM, Chung-Lin Tang wrote:
> I know this is a big pile of yarn wrt how the main program/libgomp/libgfortran interacts, but it's
> finally working. Again tested without regressions. Preparing to commit to devel/omp/gcc-12, and seeking
> approval for mainline when the requires patches are in.

Just realized that I don't have the new testcases added in this patch.
Will supplement them later :P

Thanks,
Chung-Lin
[PING] Re: [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule
On 2022/8/4 9:31 PM, Koning, Paul wrote:
>
>> On Aug 4, 2022, at 9:17 AM, Chung-Lin Tang wrote:
>>
>> On 2022/6/28 10:06 PM, Jakub Jelinek wrote:
>>> On Thu, Jun 23, 2022 at 11:47:59PM +0800, Chung-Lin Tang wrote:
>>>> with the way that chunk_size < 1 is handled for gomp_iter_dynamic_next:
>>>>
>>>> (1) chunk_size <= -1: wraps into large unsigned value, seems to work though.
>>>> (2) chunk_size == 0:  infinite loop
>>>>
>>>> The (2) behavior is obviously not desired. This patch fixes this by changing
>>> Why? It is a user error, undefined behavior, we shouldn't slow down valid
>>> code for users who don't bother reading the standard.
>>
>> This is loop init code, not per-iteration. The overhead really isn't that much.
>>
>> The question should be, if GCC having infinite loop behavior is reasonable,
>> even if it is undefined in the spec.
>
> I wouldn't think so. The way I see "undefined code" is that you can't complain about "wrong code" produced by the compiler. But for the compiler to malfunction on wrong input is an entirely different matter. For one thing, it's hard to fix your code if the compiler fails. How would you locate the offending source line?
>
> 	paul

Ping?
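For readers following the chunk_size == 0 discussion, here is a deliberately simplified, single-threaded C model of handing out dynamic-schedule chunks. It is not gomp_iter_dynamic_next: it assumes an upward-counting loop and no concurrency, and the dyn_loop names are hypothetical. The init function applies a clamp of the kind the patch proposes for non-static schedules.

```c
#include <stdbool.h>

struct dyn_loop
{
  long next;        /* first iteration not yet handed out */
  long end;         /* one past the last iteration */
  long chunk_size;  /* iterations per chunk, at least 1 after init */
};

/* Initialise the work share, clamping any chunk_size below 1 to 1.  */
void dyn_loop_init (struct dyn_loop *ws, long start, long end, long chunk_size)
{
  ws->next = start;
  ws->end = end;
  ws->chunk_size = chunk_size > 1 ? chunk_size : 1;
}

/* Hand out the next chunk [*pstart, *pend); false when none is left.
   With chunk_size == 0 and no clamp, 'next' would never advance and
   a caller looping on this function would never terminate.  */
bool dyn_loop_next (struct dyn_loop *ws, long *pstart, long *pend)
{
  if (ws->next >= ws->end)
    return false;
  long left = ws->end - ws->next;
  long chunk = ws->chunk_size < left ? ws->chunk_size : left;
  *pstart = ws->next;
  *pend = ws->next + chunk;
  ws->next = *pend;
  return true;
}

/* Number of chunks needed to cover [start, end).  */
long dyn_loop_count_chunks (long start, long end, long chunk_size)
{
  struct dyn_loop ws;
  long s, e, n = 0;
  dyn_loop_init (&ws, start, end, chunk_size);
  while (dyn_loop_next (&ws, &s, &e))
    n++;
  return n;
}
```

The clamp runs once per loop in the init function, not per chunk, which is the overhead argument made in the thread.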
[OpenMP, nvptx] Use bar.sync/arrive for barriers when tasking is not used
Hi,
our work on SPEChpc2021 benchmarks shows that, after the fix for PR99555 was committed:

[libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=5ed77fb3ed1ee0289a0ec9499ef52b99b39421f1

while that patch fixed the hang, there were quite severe performance regressions caused by this new barrier code. Under OpenMP target offload mode, Minisweep regressed by about 350%, while HPGMG-FV was about 2x slower.

So the problem was presumably the new barriers, which replaced erroneous but fast bar.sync instructions with correct but really heavy-weight futex_wait/wake operations on the GPU. This is probably required for preserving correct task vs. barrier behavior.

However, the observation is that when task-related functionality is not used at all by the team inside an OpenMP target region, and a barrier is just a place to wait for all threads to rejoin (no problem of invoking waiting tasks to re-start), a barrier can in that case be implemented by simple bar.sync and bar.arrive PTX instructions. That should be able to recover most performance in the cases that usually matter, e.g. 'omp parallel for' inside 'omp target'.

So the plan is to mark cases where 'tasks are never used'. This patch adds a 'task_never_used' flag inside struct gomp_team, initialized to true, and set to false when tasks are added to the team. The nvptx specific gomp_team_barrier_wait_end routines can then use a simple barrier when team->task_never_used remains true at the barrier.

Some other cases, like the master/masked construct and the single construct, also need to have task_never_used set false; because these constructs inherently create asymmetric loads where only a subset of threads run through the region (which may or may not use tasking), there may be the case where different threads wait at the end assuming different task_never_used cases.
For correctness, these constructs must have team->task_never_used conservatively marked false at the start of the construct. This patch has been divided into two: the first is the inlining of contents of config/linux/bar.c into config/nvptx/bar.c (instead of an include). This is needed now because some parts of gomp_team_barrier_wait_[cancel_]end now needs nvptx specific adjustments. The second contains the above described changes. Tested on powerpc64le-linux and x86_64-linux with nvptx offloading, seeking approval for trunk. Thanks, Chung-Lin From c2fdc31880d2d040822e8abece015c29a6d7b472 Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang Date: Thu, 1 Sep 2022 05:53:49 -0700 Subject: [PATCH 1/2] libgomp: inline config/linux/bar.c into config/nvptx/bar.c Preparing to add nvptx specific modifications to gomp_team_barrier_wait_end, et al., so change from using an #include of config/linux/bar.c in config/nvptx/bar.c, to a full copy of the implementation. 2022-09-01 Chung-Lin Tang libgomp/ChangeLog: * config/nvptx/bar.c: Adjust include of "../linux/bar.c" into an inlining of contents of config/linux/bar.c, --- libgomp/config/nvptx/bar.c | 183 - 1 file changed, 180 insertions(+), 3 deletions(-) diff --git a/libgomp/config/nvptx/bar.c b/libgomp/config/nvptx/bar.c index eee2107..a850c22 100644 --- a/libgomp/config/nvptx/bar.c +++ b/libgomp/config/nvptx/bar.c @@ -161,6 +161,183 @@ static inline void do_wait (int *addr, int val) futex_wait (addr, val); } -/* Reuse the linux implementation. */ -#define GOMP_WAIT_H 1 -#include "../linux/bar.c" +/* Below is based on the linux implementation. */ + +void +gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state) +{ + if (__builtin_expect (state & BAR_WAS_LAST, 0)) +{ + /* Next time we'll be awaiting TOTAL threads again. 
*/ + bar->awaited = bar->total; + __atomic_store_n (&bar->generation, bar->generation + BAR_INCR, + MEMMODEL_RELEASE); + futex_wake ((int *) &bar->generation, INT_MAX); +} + else +{ + do + do_wait ((int *) &bar->generation, state); + while (__atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE) == state); +} +} + +void +gomp_barrier_wait (gomp_barrier_t *bar) +{ + gomp_barrier_wait_end (bar, gomp_barrier_wait_start (bar)); +} + +/* Like gomp_barrier_wait, except that if the encountering thread + is not the last one to hit the barrier, it returns immediately. + The intended usage is that a thread which intends to gomp_barrier_destroy + this barrier calls gomp_barrier_wait, while all other threads + call gomp_barrier_wait_last. When gomp_barrier_wait returns, + the barrier can be safely destroyed. */ + +void +gomp_barrier_wait_last (gomp_barrier_t *bar) +{ + gomp_barrier_state_t state = gomp_barrier_wait_start (bar); + if (state & BAR_WAS_LAST) +gomp_barrier_wait_end (bar, st
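The 'task_never_used' idea described in this message can be sketched in plain C as a small state machine. This is a model of the description only, not the second patch: the struct, the enum and all function names are hypothetical, and the two barrier paths are represented by an enum value rather than real synchronization.

```c
#include <stdbool.h>

/* Which barrier implementation a team may use at a given point.  */
enum barrier_path
{
  BARRIER_SIMPLE,  /* plain "wait for everyone" barrier */
  BARRIER_FULL     /* task-aware wait/wake barrier */
};

struct team_sketch
{
  bool task_never_used;
};

/* A new team starts out assuming tasking is never used.  */
void team_sketch_init (struct team_sketch *t)
{
  t->task_never_used = true;
}

/* Adding any task to the team clears the flag for good.  */
void team_sketch_add_task (struct team_sketch *t)
{
  t->task_never_used = false;
}

/* Constructs that only a subset of threads execute (master/masked,
   single) clear the flag conservatively on entry, so all threads
   agree on the barrier path afterwards.  */
void team_sketch_enter_asymmetric_construct (struct team_sketch *t)
{
  t->task_never_used = false;
}

enum barrier_path team_sketch_barrier_path (const struct team_sketch *t)
{
  return t->task_never_used ? BARRIER_SIMPLE : BARRIER_FULL;
}
```

The flag only ever moves from true to false, which is what makes the conservative marking safe: a team can lose the fast path but never regain it while some thread might still depend on task-aware behaviour.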
Re: [PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran
On 2022/8/15 7:15 PM, Chung-Lin Tang wrote: On 2022/8/15 7:06 PM, Chung-Lin Tang wrote: I know this is a big pile of yarn wrt how the main program/libgomp/libgfortran interacts, but it's finally working. Again tested without regressions. Preparing to commit to devel/omp/gcc-12, and seeking approval for mainline when the requires patches are in. Just realized that I don't have the new testcases added in this patch. Will supplement them later :P Here's the USM allocator/libgfortran patch, with a libgomp.fortran testcase added. Thanks, Chung-Lin 2022-09-05 Chung-Lin Tang libgcc/ * Makefile.in (crtoffloadend$(objext)): Add $(PICFLAG) to compile rule. * offloadstuff.c (GOMP_offload_register_ver): Add declaration of weak symbol. (__OFFLOAD_TABLE__): Likewise. (init_non_offload): New function. libgfortran/ * gfortran.map (GFORTRAN_13): New namespace. (_gfortran_mem_allocators_init): New name inside GFORTRAN_13. * libgfortran.h (mem_allocators_init): New exported declaration. * runtime/main.c (do_init): Rename from init, add run-once guard code. (cleanup): Add run-once guard code. (GOMP_post_offload_register_callback): Declare weak symbol. (GOMP_pre_gomp_target_fini_callback): Likewise. (init): New constructor to register offload callbacks, or call do_init when not OpenMP. * runtime/memory.c (gfortran_malloc): New pointer variable. (gfortran_calloc): Likewise. (gfortran_realloc): Likewise. (gfortran_free): Likewise. (mem_allocators_init): New function. (xmalloc): Use gfortran_malloc. (xmallocarray): Use gfortran_malloc. (xcalloc): Use gfortran_calloc. (xrealloc): Use gfortran_realloc. (xfree): Use gfortran_free. libgomp/ * libgomp.map (GOMP_5.1.2): New version namespace. (GOMP_post_offload_register_callback): New name inside GOMP_5.1.2. (GOMP_pre_gomp_target_fini_callback): Likewise. (GOMP_DEFINE_CALLBACK_SET): Macro to define callback set. (post_offload_register): Define callback set for after offload image register. 
(pre_gomp_target_fini): Define callback set for before gomp_target_fini is called. (libgfortran_malloc_usm): New function. (libgfortran_calloc_usm): Likewise (libgfortran_realloc_usm): Likewise (libgfortran_free_usm): Likewise. (_gfortran_mem_allocators_init): Declare weak symbol. (gomp_libgfortran_omp_allocators_init): New function. (GOMP_offload_register_ver): Add handling of host_table == NULL, calling into libgfortran to set unified_shared_memory allocators, and execution of post_offload_register callbacks. (gomp_target_init): Register all pre_gomp_target_fini callbacks to run at end of main using atexit(). * testsuite/libgomp.fortran/target-unified_shared_memory-1.f90: New test. diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in index 09b3ec8bc2e..70720cc910c 100644 --- a/libgcc/Makefile.in +++ b/libgcc/Makefile.in @@ -1045,8 +1045,9 @@ crtbeginT$(objext): $(srcdir)/crtstuff.c crtoffloadbegin$(objext): $(srcdir)/offloadstuff.c $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_BEGIN +# crtoffloadend contains a constructor with calls to libgomp, so build as PIC. 
crtoffloadend$(objext): $(srcdir)/offloadstuff.c - $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_END + $(crt_compile) $(CRTSTUFF_T_CFLAGS) $(PICFLAG) -c $< -DCRT_END crtoffloadtable$(objext): $(srcdir)/offloadstuff.c $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_TABLE diff --git a/libgcc/offloadstuff.c b/libgcc/offloadstuff.c index 10e1fe19c8e..2edb6810021 100644 --- a/libgcc/offloadstuff.c +++ b/libgcc/offloadstuff.c @@ -63,6 +63,19 @@ const void *const __offload_vars_end[0] __attribute__ ((__used__, visibility ("hidden"), section (OFFLOAD_VAR_TABLE_SECTION_NAME))) = { }; +extern void GOMP_offload_register_ver (unsigned, const void *, int, + const void *); +extern const void *const __OFFLOAD_TABLE__[0] __attribute__ ((weak)); +static void __attribute__((constructor)) +init_non_offload (void) +{ + /* If an OpenMP program has no offloading, post-offload_register callbacks + that need to run will require a call to GOMP_offload_register_ver, in + order to properly trigger those callbacks during init. */ + if (__OFFLOAD_TABLE__ == NULL) +GOMP_offload_register_ver (0, NULL, 0, NULL); +} + #elif defined CRT_TABLE extern const void *const __offload_func_table[]; diff --git a/libgfortran/gfortran.map b/libgfortran/gfortran.map index e0e795c3d48..55d2a529acd 100644 --- a/libgfortran/gfortran.map +++ b/libgfortran/gfortran.map @@ -1759,3 +1759,8 @@ GFORTRAN_12 { _gfortran_transfer_real128_write; #endif } GFORTRAN_10.2; + +GFORTRAN_13 { + global: + _gfortran_mem_allocators_init; +} GFORTRAN_12; diff --git a/libgfortran/libgfortran.h b/libgfortran/libgfortran.h index 0b893a51851.
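The "callback set" and "run-once guard" entries in the ChangeLog above can be pictured with a small C sketch. It is an assumption-laden illustration, not the GOMP_DEFINE_CALLBACK_SET macro: the fixed capacity, the struct layout and every name below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLBACKS 8

typedef void (*callback_fn) (void);

struct callback_set
{
  callback_fn fns[MAX_CALLBACKS];
  size_t count;
  bool ran;  /* run-once guard */
};

/* Register FN to run later; refuses once the set has already run
   or is full.  */
bool callback_set_register (struct callback_set *set, callback_fn fn)
{
  if (set->ran || set->count == MAX_CALLBACKS)
    return false;
  set->fns[set->count++] = fn;
  return true;
}

/* Run all registered callbacks in registration order, at most once,
   even if several code paths end up calling this.  */
void callback_set_run (struct callback_set *set)
{
  if (set->ran)
    return;
  set->ran = true;
  for (size_t i = 0; i < set->count; i++)
    set->fns[i] ();
}

/* Sample init action, used only to make the behaviour observable.  */
static int sample_init_calls_count;

void sample_init_action (void)
{
  sample_init_calls_count++;
}

int sample_init_calls (void)
{
  return sample_init_calls_count;
}
```

In the scheme described in the thread, one library would register its init action into such a set and the other would run the set at the point where the required information is finally available.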
[PATCH, nios2, committed] Add #undef of MUSL_DYNAMIC_LINKER
This patch adds an #undef of MUSL_DYNAMIC_LINKER before its #define in config/nios2/linux.h. This makes the nios2-linux build pass when the compiler is configured with --enable-werror-always.

Patch pushed to master at 0697bd070c4fffb33468976c93baff9493922fb3

Chung-Lin

From 0697bd070c4fffb33468976c93baff9493922fb3 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Thu, 8 Sep 2022 23:14:38 +0800
Subject: [PATCH] nios2: Add #undef of MUSL_DYNAMIC_LINKER

Add #undef of MUSL_DYNAMIC_LINKER before #define, to satisfy build checks when configured with --enable-werror-always.

gcc/ChangeLog:

	* config/nios2/linux.h (MUSL_DYNAMIC_LINKER): Add #undef before #define.
---
 gcc/config/nios2/linux.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/nios2/linux.h b/gcc/config/nios2/linux.h
index f5dd813acad..9e53dd657e4 100644
--- a/gcc/config/nios2/linux.h
+++ b/gcc/config/nios2/linux.h
@@ -30,6 +30,8 @@
 #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{pthread:-D_REENTRANT}"
 
 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-nios2.so.1"
+
+#undef MUSL_DYNAMIC_LINKER
 #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-nios2.so.1"
 
 #undef LINK_SPEC
--
2.17.1
[PATCH] optc-save-gen.awk: adjust generated array compare
Hi Joseph,
Jan-Benedict reported a build-bot error for the nios2 port under --enable-werror-always:

options-save.cc: In function 'bool cl_target_option_eq(const cl_target_option*, const cl_target_option*)':
options-save.cc:9291:38: error: comparison between two arrays [-Werror=array-compare]
 9291 | if (ptr1->saved_custom_code_status != ptr2->saved_custom_code_status
      | ~~~^
options-save.cc:9291:38: note: use unary '+' which decays operands to pointers or '&'component_ref' not supported by dump_decl[0] != &'component_ref' not supported by dump_decl[0]' to compare the addresses
options-save.cc:9294:37: error: comparison between two arrays [-Werror=array-compare]
 9294 | if (ptr1->saved_custom_code_index != ptr2->saved_custom_code_index
      | ~~^~~~
...

This is due to an array-typed TargetSave state in config/nios2/nios2.opt:

...
TargetSave
enum nios2_ccs_code saved_custom_code_status[256]

TargetSave
int saved_custom_code_index[256]
...

This patch adjusts the generated array state compare from 'ptr1->array' into '&ptr1->array[0]' in gcc/optc-save-gen.awk, which seems sufficient to pass the tougher checks.

Tested by ensuring the compiler builds, which should be sufficient here. Okay to commit to mainline?

Thanks,
Chung-Lin

	* optc-save-gen.awk: Adjust array compare to use '&ptr->name[0]' instead of 'ptr->name'.

diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk
index 233d1fbb637..27aabf2955e 100644
--- a/gcc/optc-save-gen.awk
+++ b/gcc/optc-save-gen.awk
@@ -1093,7 +1093,7 @@ for (i = 0; i < n_target_array; i++) {
 	name = var_target_array[i]
 	size = var_target_array_size[i]
 	type = var_target_array_type[i]
-	print " if (ptr1->" name" != ptr2->" name "";
+	print " if (&ptr1->" name"[0] != &ptr2->" name "[0]";
 	print " || memcmp (ptr1->" name ", ptr2->" name ", " size " * sizeof(" type ")))"
 	print "return false;";
 }
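The two kinds of comparison involved, address and contents, can be shown side by side in a short C sketch. The struct below only borrows one field name from nios2.opt for flavour. The helper names are hypothetical, and the sketch deliberately separates the address test from the content test, which is not how the generated code is arranged.

```c
#include <stdbool.h>
#include <string.h>

#define N_CODES 256

struct saved_state
{
  int saved_custom_code_index[N_CODES];
};

/* Address comparison written with '&p->array[0]', so that neither
   operand of the comparison is itself an array.  */
bool same_array_storage (const struct saved_state *p1,
                         const struct saved_state *p2)
{
  return &p1->saved_custom_code_index[0] == &p2->saved_custom_code_index[0];
}

/* Element-wise comparison of the two arrays via memcmp.  */
bool same_array_contents (const struct saved_state *p1,
                          const struct saved_state *p2)
{
  return memcmp (p1->saved_custom_code_index, p2->saved_custom_code_index,
                 N_CODES * sizeof (int)) == 0;
}
```

Writing 'p1->saved_custom_code_index == p2->saved_custom_code_index' would compare the same two addresses after array-to-pointer decay; the explicit '&...[0]' spelling just states that intent in a form the warning accepts.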
[PING x2] Re: [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule
On 2022/8/26 4:15 PM, Chung-Lin Tang wrote:
> On 2022/8/4 9:31 PM, Koning, Paul wrote:
>>
>>
>>> On Aug 4, 2022, at 9:17 AM, Chung-Lin Tang wrote:
>>>
>>> On 2022/6/28 10:06 PM, Jakub Jelinek wrote:
>>>> On Thu, Jun 23, 2022 at 11:47:59PM +0800, Chung-Lin Tang wrote:
>>>>> with the way that chunk_size < 1 is handled for gomp_iter_dynamic_next:
>>>>>
>>>>> (1) chunk_size <= -1: wraps into large unsigned value, seems to work
>>>>> though.
>>>>> (2) chunk_size == 0: infinite loop
>>>>>
>>>>> The (2) behavior is obviously not desired. This patch fixes this by
>>>>> changing
>>>> Why? It is a user error, undefined behavior, we shouldn't slow down valid
>>>> code for users who don't bother reading the standard.
>>>
>>> This is loop init code, not per-iteration. The overhead really isn't that
>>> much.
>>>
>>> The question should be, if GCC having infinite loop behavior is reasonable,
>>> even if it is undefined in the spec.
>>
>> I wouldn't think so. The way I see "undefined code" is that you can't
>> complain about "wrong code" produced by the compiler. But for the compiler
>> to malfunction on wrong input is an entirely different matter. For one
>> thing, it's hard to fix your code if the compiler fails. How would you
>> locate the offending source line?
>>
>> paul
>
> Ping?

Ping x2.
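As a reminder of what is being pinged, the following is a rough, single-threaded model of the problem and of the clamp applied at loop init time. It is only a sketch under stated assumptions: `enum sched_kind`, `effective_chunk_size` and `count_chunks` are hypothetical names, and the loop is a simplification, not libgomp's gomp_iter_dynamic_next. It shows why a chunk size of 0 never makes progress, and that clamping once at init avoids that without touching the per-iteration path.

```c
#include <assert.h>

/* Hypothetical simplified schedule kinds (not libgomp's enum).  */
enum sched_kind { SCHED_STATIC, SCHED_DYNAMIC };

/* Same shape as the expression in the proposed patch: for non-static
   schedules, any chunk_size below 1 becomes 1.  */
static long
effective_chunk_size (enum sched_kind sched, long chunk_size)
{
  return (sched == SCHED_STATIC || chunk_size > 1) ? chunk_size : 1;
}

/* Count how many chunks are handed out to cover [start, end).
   Returns -1 if the loop has not finished after MAX_STEPS chunks,
   which is how this model reports "would loop forever".  */
static long
count_chunks (long start, long end, long chunk_size, long max_steps)
{
  long next = start, steps = 0;
  assert (max_steps > 0);
  while (next < end)
    {
      if (steps == max_steps)
        return -1;
      long left = end - next;
      long take = chunk_size < left ? chunk_size : left;
      next += take;             /* take == 0 means no progress at all */
      steps++;
    }
  return steps;
}
```

In this model `count_chunks (0, 10, 0, 1000)` gives up, while the clamped value hands out ten chunks of one iteration each.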
[PATCH, OpenMP] Implement uses_allocators clause for target regions
Hi Jakub,
this patch implements the uses_allocators clause for OpenMP target regions.

For user-defined allocator handles, this allows target regions to assign memory space and traits to allocators, and automatically calls omp_init/destroy_allocator() at the beginning/end of the target region. For pre-defined allocators (i.e. omp_..._mem_alloc names), this is a no-op; such clauses are not created.

Aside from the front-end portions, the target region transforms are done in gimplify_omp_workshare.

This patch also includes changes to enforce the rule that an allocator named in an allocate clause must also appear in a uses_allocators clause, as mentioned in [1]. This is done during gimplify_scan_omp_clauses.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594039.html

Tested on mainline, please see if this is okay.

Thanks,
Chung-Lin

2022-05-06 Chung-Lin Tang

gcc/c-family/ChangeLog: * c-omp.cc (c_omp_split_clauses): Add OMP_CLAUSE_USES_ALLOCATORS case. * c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_clause_name): Add case for uses_allocators clause. (c_parser_omp_clause_uses_allocators): New function. (c_parser_omp_all_clauses): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS case. (OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS to mask. * c-typeck.cc (c_finish_omp_clauses): Add case handling for OMP_CLAUSE_USES_ALLOCATORS. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_clause_name): Add case for uses_allocators clause. (cp_parser_omp_clause_uses_allocators): New function. (cp_parser_omp_all_clauses): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS case. (OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS to mask. * semantics.cc (finish_omp_clauses): Add case handling for OMP_CLAUSE_USES_ALLOCATORS. fortran/ChangeLog: * gfortran.h (struct gfc_omp_namelist): Add memspace_sym, traits_sym fields. (OMP_LIST_USES_ALLOCATORS): New list enum. * openmp.cc (enum omp_mask2): Add OMP_CLAUSE_USES_ALLOCATORS.
(gfc_match_omp_clause_uses_allocators): New function. (gfc_match_omp_clauses): Add case to handle OMP_CLAUSE_USES_ALLOCATORS. (OMP_TARGET_CLAUSES): Add OMP_CLAUSE_USES_ALLOCATORS. (resolve_omp_clauses): Add "USES_ALLOCATORS" to clause_names[]. * trans-array.cc (gfc_conv_array_initializer): Adjust array index to always be a created tree expression instead of NULL_TREE when zero. * trans-openmp.cc (gfc_trans_omp_clauses): For ALLOCATE clause, handle using gfc_trans_omp_variable for EXPR_VARIABLE exprs. Add handling of OMP_LIST_USES_ALLOCATORS case. * types.def (BT_FN_VOID_PTRMODE): Define. (BT_FN_PTRMODE_PTRMODE_INT_PTR): Define. gcc/ChangeLog: * builtin-types.def (BT_FN_VOID_PTRMODE): Define. (BT_FN_PTRMODE_PTRMODE_INT_PTR): Define. * omp-builtins.def (BUILT_IN_OMP_INIT_ALLOCATOR): Define. (BUILT_IN_OMP_DESTROY_ALLOCATOR): Define. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_USES_ALLOCATORS. * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_USES_ALLOCATORS. * tree.h (OMP_CLAUSE_USES_ALLOCATORS_ALLOCATOR): New macro. (OMP_CLAUSE_USES_ALLOCATORS_MEMSPACE): New macro. (OMP_CLAUSE_USES_ALLOCATORS_TRAITS): New macro. * tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_USES_ALLOCATORS. (omp_clause_code_name): Add "uses_allocators". * gimplify.cc (gimplify_scan_omp_clauses): Add checking of OpenMP target region allocate clauses, to require a uses_allocators clause to exist for allocators. (gimplify_omp_workshare): Add handling of OMP_CLAUSE_USES_ALLOCATORS for OpenMP target regions; create calls of omp_init/destroy_allocator around target region body. gcc/testsuite/ChangeLog: * c-c++-common/gomp/uses_allocators-1.c: New test. * c-c++-common/gomp/uses_allocators-2.c: New test. * gfortran.dg/gomp/uses_allocators-1.f90: New test. * gfortran.dg/gomp/uses_allocators-2.f90: New test. * gfortran.dg/gomp/uses_allocators-3.f90: New test. 
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def index 3a7cecdf087..be3e6ff697e 100644 --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -283,6 +283,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT32_DFLOAT32, BT_DFLOAT32, BT_DFLOAT32) DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT64_DFLOAT64, BT_DFLOAT64, BT_DFLOAT64) DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT128_DFLOAT128, BT_DFLOAT128, BT_DFLOAT128) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR) +DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRMODE, BT_VOID, BT_PTRMODE) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_CONST_PTR, BT_VOID, BT_CONST_PTR) DEF_FU
[PATCH, OpenMP, v2] Implement uses_allocators clause for target regions
On 2022/5/7 12:40 AM, Tobias Burnus wrote:
Can you please also handle the new clause in Fortran's dump-parse-tree.cc? I did see some split handling in C, but not in Fortran; do you also need to update gfc_split_omp_clauses in Fortran's trans-openmp.cc?

Done.

Actually, glancing at the testcases, no combined construct (like "omp target parallel") is used, I think that would be useful because of ↑.

Okay, added some to testcases.

+/* OpenMP 5.2:
+ uses_allocators ( allocator-list )

That's not completely true: uses_allocators is OpenMP 5.1. However, 5.1 only supports (for non-predefined allocators):
  uses_allocators( allocator(traits) )
while OpenMP 5.2 added modifiers:
  uses_allocators( traits(...), memspace(...) : allocator )
and deprecated the 5.1 'allocator(traits)'. (Scheduled for removal in OMP 6.0) The advantage of 5.2 syntax is that a memory space can be defined.

I supported both syntaxes, that's why I designated it as "5.2".

BTW: This makes uses_allocators the first OpenMP 5.2 feature which will make it into GCC :-)

:)

gcc/fortran/openmp.cc:
+ if (gfc_get_symbol ("omp_allocator_handle_kind", NULL, &sym)
+ || !sym->value
+ || sym->value->expr_type != EXPR_CONSTANT
+ || sym->value->ts.type != BT_INTEGER)
+ {
+ gfc_error ("OpenMP % constant not found by "
+ "% clause at %C");
+ goto error;
+ }
+ allocator_handle_kind = sym;

I think you rather want to use
  gfc_find_symbol ("omp_...", NULL, true, &sym)
  || sym == NULL
where true is for parent_flag to search also the parent namespace. (The function returns 1 if the symbol is ambiguous, 0 otherwise - including 0 + sym == NULL when the symbol could not be found.)
  || sym->attr.flavor != FL_PARAMETER
  || sym->ts.type != BT_INTEGER
  || sym->attr.dimension
Looks cleaner than to access sym->value. The attr.dimension is just to make sure the user did not smuggle an array into this. (Invalid as omp_... is a reserved namespace but users will still do this and some are good in finding ICE as hobby.)
Well, the intention here is to search for "omp_allocator_handle_kind" and "omp_memspace_handle_kind", and use their value to check if the kinds are the same as declared allocator handles and memspace constant. Not to generally search for "omp_...". However the sym->attr.dimension test seems useful, added in new v2 patch.

However, I fear that will fail for the following two examples (both untested):

  use omp_lib, my_kind = omp_allocator_handle_kind
  integer(my_kind) :: my_allocator

as this gives 'my_kind' in the symtree->name (while symtree->n.sym->name is "omp_..."). Hence, by searching the symtree for 'omp_...' the symbol will not be found.

It will likely also fail for the following more realistic example:

  ...
  subroutine foo
    use m
    use omp_lib, only: omp_alloctrait
    ...
    !$omp target uses_allocators(my_allocator(traits_array)) allocate(my_allocator:A) firstprivate(A)
    ...
    !$omp end target
  end

If someone wants to use OpenMP allocators, but intentionally only imports insufficient standard symbols from omp_lib, then he/she is on their own :) The specification really makes this quite clear: omp_allocator_handle_kind, omp_alloctrait, omp_memspace_handle_kind are all part of the same package.

In this case, omp_allocator_handle_kind is not in the namespace of 'foo' but the code should be still valid. Thus, an alternative would be to hard-code the value - as done for the depobj. As we have:

  integer, parameter :: omp_allocator_handle_kind = c_intptr_t
  integer, parameter :: omp_memspace_handle_kind = c_intptr_t

that would be
  sym->ts.type == BT_INTEGER
  sym->ts.kind == gfc_index_integer_kind
for the allocator variable and the memspace kind.

However, I grant that either example is not very typical. The second one is more natural – such a code will very likely be written in the real world. But not with uses_allocators but rather with "!$omp requires dynamic_allocators" and omp_init_allocator().

Thoughts?

As above.
I mean, what is so hard with including "use omp_lib" where you need it? :D

* * *

gcc/fortran/openmp.cc
+ if (++i > 2)
+ {
+ gfc_error ("Only two modifiers are allowed on % "
+ "clause at %C");
+ goto error;
+ }
+

Is this really needed? There is a check for multiple traits and multiple memspace. Thus, 'trait(),memspace(),trait()' is already handled and 'trait(),something' gives a break and will lead to an error, as in that case a ':' and not ',something' is expected.

I think it could be worth reminding of that limitation, instead of giving a generic error.

+ if (gfc_match_char ('(') == MATCH_YES)
+ {
+ if (memspace_seen || traits_seen)
+ {
+ gfc_error ("Modifiers cannot be used with legacy "
+ "array syntax at %C");

I wouldn't use the term 'array syntax' to denote
  uses_allocators(allocator (alloc_array) )
How about: error: "Using both
[PATCH, OpenACC, v2] Non-contiguous array support for OpenACC data clauses
Hi Thomas, after your last round of review, I realized that the bulk of the compiler omp-low work was simply a case of dumb over-engineering in the wrong direction :P (although it did painstakingly function correctly) Instead of making code changes for bias adjustment in the child function code in the omp-low phase, this should simply be done by the libgomp runtime map preparation (similar to how the current single-dimension array biases are handled) So this updated patch (1) discards away a large part of the last omp-low.c patch, and (2) adjusts the libgomp/target.c patch to do the per-dimensional adjustments. Also, the bit of C/C++ front-end logic you mentioned that was questionable was removed. After looking closely, it wasn't needed; the relaxing of pointers for OpenACC was enough. Still some aspects of handling arrays inside the multi-dimension type still need some more work, e.g. see the catching in the omp-low.c part. A compiler dg-scan testcase was also added. However, the issue of ACC_DEVICE_TYPE=host not working (and hence "!openacc_host_selected" in the testcases) actually is a bit more sophisticated than I thought: The reason it doesn't work for the host device, is because we use the map pointer (i.e. a hostaddrs[] entry when passed into libgomp) to point to an array descriptor to pass the whole array information, and rely on code inside gomp_map_vars_* to setup things, and place the final on-device address of the non-contig. array into devaddrs[], therefore only using a single map entry (something I thought was quite clever) However, this broke down on the host and host-fallback devices, simply because, there we do NOT do any gomp_map_vars processing; our current code in GOACC_parallel_keyed simply skips it and passes the offload function the original hostaddrs[] contents. Lacking the processing to transform the descriptor pointer into a proper array ref, things of course segfault. 
So I think we have three options for this (which may have some interactions with say, the "proper" host-side parallelization we eventually need to implement for OpenACC 2.7) (1) The simplest solution: implement a processing which searches and reverts such non-contiguous array map entries in GOACC_parallel_keyed. (note: I have implemented this in the current attached "v2" patch) (2) Make the GOACC_parallel_keyed code to not make short cuts for host-modes; i.e. still do the proper gomp_map_vars processing for all cases. (3) Modify the non-contiguous array map conventions: a possible solution is to use two maps placed together: one for the array pointer, another for the array descriptor (as opposed to the current style of using only one map) This needs more further elaborate compiler/runtime work. The first two options will pessimize host-mode performance somewhat. The third I have some WIP patches, but it's still buggy ATM. Seeking your opinion on what we should do. Thanks, Chung-Lin gcc/c/ * c-typeck.c (handle_omp_array_sections_1): Add 'bool &non_contiguous' parameter, adjust recursive call site, add cases for allowing pointer based multi-dimensional arrays for OpenACC. (handle_omp_array_sections): Adjust handle_omp_array_sections_1 call, handle non-contiguous case to create dynamic array map. gcc/cp/ * semantics.c (handle_omp_array_sections_1): Add 'bool &non_contiguous' parameter, adjust recursive call site, add cases for allowing pointer based multi-dimensional arrays for OpenACC. (handle_omp_array_sections): Adjust handle_omp_array_sections_1 call, handle non-contiguous case to create dynamic array map. gcc/ * gimplify.c (gimplify_scan_omp_clauses): For non-contiguous array map kinds, make sure bias in each dimension are put into firstprivate variables. * omp-low.c (append_field_to_record_type): New function. (create_noncontig_array_descr_type): Likewise. (create_noncontig_array_descr_init_code): Likewise. 
(scan_sharing_clauses): For non-contiguous array map kinds, check for supported dimension structure, and install non-contiguous array variable into current omp_context. (reorder_noncontig_array_clauses): New function. (scan_omp_target): Call reorder_noncontig_array_clauses to place non-contiguous array map clauses at beginning of clause sequence. (lower_omp_target): Add handling for non-contiguous array map kinds. * tree-pretty-print.c (dump_omp_clauses): Add cases for printing GOMP_MAP_NONCONTIG_ARRAY map kinds. include/ * gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define. (enum gomp_map_kind): Add GOMP_MAP_NONCONTIG_ARRAY, GOMP_MAP_NONCONTIG_ARRAY_TO, GOMP_MAP_NONCONTIG_ARRAY_FROM, GOMP_MAP_NONCONTIG_ARRAY_TOFROM, GOMP_MAP_NONCONTIG_ARRAY_FORCE_TO, GOMP_MAP_NONCONTIG_ARRAY_FORCE_FROM, GOMP_MAP_NONCONTI
Re: [PATCH, nvptx] Expand OpenACC child function arguments to use CUDA params space
On 2019/10/8 10:05 PM, Thomas Schwinge wrote: Hi Chung-Lin! While we're all waiting for Tom to comment on this;-) -- here's another item I realized: On 2019-09-10T19:41:59+0800, Chung-Lin Tang wrote: The libgomp nvptx plugin changes are also quite contained, with lots of now unneeded [...] code deleted (since we no longer first cuAlloc a buffer for the argument record before cuLaunchKernel) It would be nice;-) -- but unless I'm confused, it's not that simple: we either have to reject (force host-fallback execution) or keep supporting "old-style" nvptx offloading code: new-libgomp has to continue to work with nvptx offloading code once generated by old-GCC. Possibly even a mixture of old and new nvptx offloading code, if libraries are involved, huh! I have not completely thought that through, but I suppose this could be addressed by adding a flag to the 'struct nvptx_fn' (or similar) that's synthesized by nvptx 'mkoffload'? Hi Thomas, Tom, I've looked at the problem, it is unfortunate that we overlooked the need for versioning of NVPTX images, and did not reserve something in 'struct nvptx_tdata' for something like this. But how about something like: typedef struct nvptx_tdata { const struct targ_ptx_obj *ptx_objs; unsigned ptx_num; unsigned ptx_version; /* < Add version field here. */ const char *const *var_names; unsigned var_num; const struct targ_fn_launch *fn_descs; unsigned fn_num; } nvptx_tdata_t; We currently only support x86_64 and powerpc64le hosts, which are both LP64 targets. Assuming that, the position above where I put the new 'ptx_version' field is already a 32-bit sized alignment hole, doesn't change the layout of other fields, and in the static 'target_data' variable generated by mkoffload should be zeroed in current circulating binaries (unless binutils is not doing the intuitive thing...) If these assumptions are safe, then we can treat as if ptx_version == 0 right now, and from now on bump it to 1 for these new nvptx convention changes. 
(We can do a similar thing in 'struct targ_fn_launch' if we want to differentiate at a per-function level.) Any considerations? Thanks, Chung-Lin
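The layout argument above can be checked mechanically. The sketch below is an illustration under stated assumptions, not code from the patch: `nvptx_tdata_old` is assumed to be the struct from the message with the proposed `ptx_version` field removed, the pointed-to types are left opaque, and `layouts_compatible` is a hypothetical helper. On an LP64 host the new 4-byte field should land in the padding after `ptx_num`, leaving every other field and the overall size unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque here; only pointers to these types are stored.  */
struct targ_ptx_obj;
struct targ_fn_launch;

/* ASSUMPTION: layout found in already-built binaries, i.e. the struct
   shown in the message minus the proposed field.  */
struct nvptx_tdata_old
{
  const struct targ_ptx_obj *ptx_objs;
  unsigned ptx_num;
  const char *const *var_names;
  unsigned var_num;
  const struct targ_fn_launch *fn_descs;
  unsigned fn_num;
};

/* Proposed layout, with the version field placed after ptx_num.  */
struct nvptx_tdata_new
{
  const struct targ_ptx_obj *ptx_objs;
  unsigned ptx_num;
  unsigned ptx_version;
  const char *const *var_names;
  unsigned var_num;
  const struct targ_fn_launch *fn_descs;
  unsigned fn_num;
};

/* Returns nonzero if every pre-existing field sits at the same offset
   in both structs and the total size is unchanged.  */
static int
layouts_compatible (void)
{
  /* The fields before the insertion point cannot move.  */
  assert (offsetof (struct nvptx_tdata_old, ptx_num)
          == offsetof (struct nvptx_tdata_new, ptx_num));
  return (sizeof (struct nvptx_tdata_old) == sizeof (struct nvptx_tdata_new)
          && offsetof (struct nvptx_tdata_old, var_names)
             == offsetof (struct nvptx_tdata_new, var_names)
          && offsetof (struct nvptx_tdata_old, var_num)
             == offsetof (struct nvptx_tdata_new, var_num)
          && offsetof (struct nvptx_tdata_old, fn_descs)
             == offsetof (struct nvptx_tdata_new, fn_descs)
          && offsetof (struct nvptx_tdata_old, fn_num)
             == offsetof (struct nvptx_tdata_new, fn_num));
}
```

With 8-byte pointers and 4-byte `unsigned` the helper returns nonzero; on an ILP32 host there is no such hole and the check would fail, which matches the LP64 caveat in the message. Whether the hole is actually zero-filled in existing binaries is a separate question that this check does not answer.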
Re: [PATCH, OpenACC, v2] Non-contiguous array support for OpenACC data clauses
Hi Thomas, thanks for the first review. I'm still working on another revision, but wanted to respond to some of the issues you raised first:

On 2019/11/7 8:48 AM, Thomas Schwinge wrote:
(1) The simplest solution: implement a processing which searches and reverts such non-contiguous array map entries in GOACC_parallel_keyed. (note: I have implemented this in the current attached "v2" patch)
(2) Make the GOACC_parallel_keyed code to not make short cuts for host-modes; i.e. still do the proper gomp_map_vars processing for all cases.
(3) Modify the non-contiguous array map conventions: a possible solution is to use two maps placed together: one for the array pointer, another for the array descriptor (as opposed to the current style of using only one map) This needs more further elaborate compiler/runtime work.
The first two options will pessimize host-mode performance somewhat. The third I have some WIP patches, but it's still buggy ATM. Seeking your opinion on what we should do.

I'll have to think about it some more, but variant (1) doesn't seem so bad actually, for a first take. While it's not nice to pessimize in particular directives with 'if (false)' clauses, at least it does work, the run-time overhead should not be too bad (also compared to variant (2), I suppose), and variant (3) can still be implemented later.

The issue is that (1),(2) vs (3) have different binary interfaces, so a decision has to be made first, lest we again have compatibility issues later. Also, (1) vs (2) may be somewhat different due to the memory copying effects of gomp_map_vars() (possible semantic difference versus the usual shared memory expectations?)

I'm currently working on another way of implementing something similar to (3), but using the variadic arguments of GOACC_parallel_keyed instead of maps, WDYT?
@@ -13238,6 +13247,7 @@ handle_omp_array_sections (tree c, enum c_omp_regi unsigned int num = types.length (), i; tree t, side_effects = NULL_TREE, size = NULL_TREE; tree condition = NULL_TREE; + tree ncarray_dims = NULL_TREE; if (int_size_in_bytes (TREE_TYPE (first)) <= 0) maybe_zero_len = true; @@ -13261,6 +13271,13 @@ handle_omp_array_sections (tree c, enum c_omp_regi length = fold_convert (sizetype, length); if (low_bound == NULL_TREE) low_bound = integer_zero_node; + + if (non_contiguous) + { + ncarray_dims = tree_cons (low_bound, length, ncarray_dims); + continue; + } + if (!maybe_zero_len && i > first_non_one) { if (integer_nonzerop (low_bound)) I'm not at all familiar with this array sections code, will trust your understanding that we don't need any of the processing that you're skipping here ('continue'): 'TREE_SIDE_EFFECTS' handling for the length expressions, and other things. I will re-check on this. Ditto for the other minor issues you raised. if (DECL_P (decl)) { if (DECL_SIZE (decl) @@ -2624,6 +2830,14 @@ scan_omp_target (gomp_target *stmt, omp_context *o gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn); } + /* If is OpenACC construct, put non-contiguous array clauses (if any) + in front of clause chain. The runtime can then test the first to see + if the additional map processing for them is required. */ + if (is_gimple_omp_oacc (stmt)) +reorder_noncontig_array_clauses (gimple_omp_target_clauses_ptr (stmt)); Should that be deemed unsuitable for any reason, then add a new 'GOACC_FLAG_*' flag to indicate existance of non-contiguous arrays. I'm considering using that convention unconditionally, not sure if it's faster though, since that means we can't do the 'early breaking' you mentioned when scanning through maps looking for GOMP_MAP_NONCONTIG_ARRAY_P. 
--- include/gomp-constants.h(revision 277827) +++ include/gomp-constants.h(working copy) @@ -40,6 +40,7 @@ #define GOMP_MAP_FLAG_SPECIAL_0 (1 << 2) #define GOMP_MAP_FLAG_SPECIAL_1 (1 << 3) #define GOMP_MAP_FLAG_SPECIAL_2 (1 << 4) +#define GOMP_MAP_FLAG_SPECIAL_3(1 << 5) #define GOMP_MAP_FLAG_SPECIAL (GOMP_MAP_FLAG_SPECIAL_1 \ | GOMP_MAP_FLAG_SPECIAL_0) /* Flag to force a specific behavior (or else, trigger a run-time error). */ @@ -127,6 +128,26 @@ enum gomp_map_kind /* Decrement usage count and deallocate if zero. */ GOMP_MAP_RELEASE =(GOMP_MAP_FLAG_SPECIAL_2 | GOMP_MAP_DELETE), +/* Mapping kinds for non-contiguous arrays. */ +GOMP_MAP_NONCONTIG_ARRAY = (GOMP_MAP_FLAG_SPECIAL_3), +GOMP_MAP_NONCONTIG_ARRAY_TO = (GOMP_MAP_NONCONTIG_ARRAY +| GOMP_MAP_TO), +GOMP_MAP_NONCONTIG_ARRAY_FROM =(GOMP_MAP_NONCONTIG_ARRAY +
Re: [PATCH, OpenMP 5.0] Implement structure element mapping changes in 5.0
Thank you Jakub, I'll need some time to look at this. Thanks. Chung-Lin On 2020/10/30 10:05 PM, Jakub Jelinek wrote: On Mon, Oct 26, 2020 at 09:10:08AM +0100, Jakub Jelinek via Gcc-patches wrote: Yes, it is a QoI and it is important not to regress about that. Furthermore, the more we diverge from what the spec says, it will be harder for us to implement, not just now, but in the future too. What I wrote about the actual implementation is actually not accurate, we need the master and slaves to be the struct splay_tree_key_s objects. And that one already has the aux field that could be used for the slaves, so we could e.g. use another magic value of refcount, e.g. REFCOUNT_SLAVE ~(uintptr_t) 2, and in that case aux would point to the master splay_tree_key_s. And the "If the corresponding list item’s reference count was not already incremented because of the effect of a map clause on the construct then: a) The corresponding list item’s reference count is incremented by one;" and "If the map-type is not delete and the corresponding list item’s reference count is finite and was not already decremented because of the effect of a map clause on the construct then: a) The corresponding list item’s reference count is decremented by one;" rules we need to implement in any case, I don't see a way around that. The same list item can now be mapped (or unmapped) multiple times on the same construct. To show up what exactly I meant, here is a proof of concept (but unfinished) patch. For OpenMP only (I believe OpenACC ATM doesn't have such concept of structure sibling lists nor requirement as OpenMP 5.0 that on one construct one refcount isn't incremented multiple times nor decremented multiple times) it uses the dynamic_refcount field otherwise only used in OpenACC for the structure sibling lists; in particular, all but the first mapping in a structure sibling list will have refcount == REFCOUNT_SIBLING and dynamic_refcount pointing to their master's refcount field. 
And the master has dynamic_refcount set to the number of REFCOUNT_SIBLING following those. In the patch I've only changed the construction of such splay_tree_keys and changed gomp_exit_data to do deal with those (that is the very easy part) plus implement the OpenMP 5.0 rule that one refcount isn't decremented more than once. What would need to be done is handle the rest, in particular (for OpenMP only) adjust the refcount (splay_tree_key only, not target_mem_desc), such that for the just created splay_tree_keys (refcount pointers in between tgt->array and end of the array (perhaps we should add a field how many elts the array has) it doesn't bump anything - just rely on the refcount = 1 we do elsewhere, and for other refcounts, if REFCOUNT_SIBLING, use the dynamic_refcount pointer and if not REFCOUNT_INFINITY, instead of bumping the refcount queue it for later increments (again, with allocaed list). And when unmapping at the end of target or target data, do something similar to what gomp_exit_data does in the patch (perhaps with some helper functions). At least from omp-lang discussions, the intent is that e.g. on struct S { int a, b, c, d, e; } s = { 1, 2, 3, 4, 5}; #pragma omp target enter data map (s) // same thing as // #pragma omp target enter data map (s.a, s.b, s.c, s.d, s.e) // The above at least theoretically creates 5 mappings, with // refcount set to 1 for each (but with all those refcount behaving // in sync), but I'd strongly prefer to create just one with one refcount. int *p = &s.b; int *q = &s.d; #pragma omp target enter data map (p[:1]) map (q[:1]) // Above needs to bump either the refcounts of all of s.a, s.b, s.c, s.d and // s.e by 1, or when it all has just a single refcount, bump it also just by // 1. 
int a; #pragma omp target enter data map (a) // This creates just one mapping and sets refcount to 1 // as int is not an aggregate char *r, *s; r = (char *) &a; s = r + 2; #pragma omp target enter data map (r[:1], s[:1]) // The above should bump the refcount of a just once, not twice in OpenMP // 5.0. For both testcases, I guess one can try to construct from that user observable tests where the refcount will result in copying the data back at certain points (or not). And for the non-contiguous structure element mappings, the idea would be that we still use a single refcount for the whole structure sibling list defined in the spec. --- libgomp/libgomp.h.jj2020-10-30 12:57:16.176284101 +0100 +++ libgomp/libgomp.h 2020-10-30 12:57:40.264014514 +0100 @@ -1002,6 +1002,10 @@ struct target_mem_desc { /* Special value for refcount - tgt_offset contains target address of the artificial pointer to "omp declare target link" object. */ #define REFCOUNT_LINK (~(uintptr_t) 1) +/* Special value for refcount - structure sibling list item other than + the first one. *(uintptr_t *)dynamic_refcount is the actual refcount + for it. */ +#define REFCOUNT_SIBLING (~(uintptr_t) 2) /* Special offset values. */ #
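The sibling-list idea described above can be modelled in a few lines. This is only a sketch of the scheme as described in the message, not libgomp code: `struct key`, `effective_refcount` and `bump_once_per_construct` are hypothetical names, and the quadratic "already seen" scan stands in for whatever bookkeeping the runtime would really use. Siblings carry REFCOUNT_SIBLING and point at the first entry's counter, and one construct increments each distinct counter at most once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value taken from the proof-of-concept diff quoted above.  */
#define REFCOUNT_SIBLING (~(uintptr_t) 2)

/* Hypothetical, much reduced stand-in for struct splay_tree_key_s.  */
struct key
{
  uintptr_t refcount;
  /* For a sibling: address of the first entry's refcount.
     For the first entry: number of siblings that follow.  */
  uintptr_t dynamic_refcount;
};

/* Resolve a key to the counter that really tracks it.  */
static uintptr_t *
effective_refcount (struct key *k)
{
  assert (k != NULL);
  if (k->refcount == REFCOUNT_SIBLING)
    return (uintptr_t *) k->dynamic_refcount;
  return &k->refcount;
}

/* Apply the "not already incremented on this construct" rule: each
   distinct counter referenced by KEYS[0..N) is bumped exactly once.  */
static void
bump_once_per_construct (struct key **keys, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      uintptr_t *rc = effective_refcount (keys[i]);
      int seen = 0;
      for (size_t j = 0; j < i && !seen; j++)
        seen = (effective_refcount (keys[j]) == rc);
      if (!seen)
        ++*rc;
    }
}
```

This mirrors the `map (p[:1]) map (q[:1])` case in the example: both pointers resolve to members of the same structure, so the shared counter goes from 1 to 2 rather than to 3.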
Re: [PATCH, 1/3, OpenMP] Target mapping changes for OpenMP 5.0, front-end parts
Hi Jakub, here is v3 of this patch set. On 2020/10/29 7:44 PM, Jakub Jelinek wrote: +extern void c_omp_adjust_clauses (tree, bool); So, can you please rename the function to either c_omp_adjust_target_clauses or c_omp_adjust_mapping_clauses or c_omp_adjust_map_clauses? I've renamed it to 'c_omp_adjust_map_clauses'. --- a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -2579,3 +2579,50 @@ c_omp_map_clause_name (tree clause, bool oacc) } return omp_clause_code_name[OMP_CLAUSE_CODE (clause)]; } + +/* Adjust map clauses after normal clause parsing, mainly to turn specific + base-pointer map cases into attach/detach and mark them addressable. */ +void +c_omp_adjust_clauses (tree clauses, bool is_target) +{ + for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP + && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER If this is only meant to handle decls, perhaps there should be && DECL_P (OMP_CLAUSE_DECL (c)) ? + && TREE_CODE (TREE_TYPE (OMP_CLAUSE_DECL (c))) != ARRAY_TYPE) + { + tree ptr = OMP_CLAUSE_DECL (c); + bool ptr_mapped = false; + if (is_target) + { + for (tree m = clauses; m; m = OMP_CLAUSE_CHAIN (m)) + if (OMP_CLAUSE_CODE (m) == OMP_CLAUSE_MAP + && OMP_CLAUSE_DECL (m) == ptr + && (OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_ALLOC + || OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_TO + || OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_FROM + || OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_TOFROM + || OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_ALWAYS_TO + || OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_ALWAYS_FROM + || OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_ALWAYS_TOFROM)) + { + ptr_mapped = true; + break; + } What you could e.g. do is have this loop at the start of function, with && DECL_P (OMP_CLAUSE_DECL (m)) instead of the == ptr check, and perhaps && POINTER_TYPE_P (TREE_TYPE (OMP_CLAUSE_DECL (m))) check and set a bit in a bitmap for each such decl, then in the GOMP_MAP_FIRSTPRIVATE_POINTER loop just check the bitmap. 
Or, keep it in the loop like it is above, but populate the bitmap lazily (upon seeing the first GOMP_MAP_FIRSTPRIVATE_POINTER) and for further ones just use it. I re-wrote c_omp_adjust_map_clauses to address the complexity issues you mentioned, now it should be limited by a linear pass to collect and merge the firstprivate base pointer + existence of a mapping of it, using a hash_map. Patch set has been re-tested with no regressions for gcc, g++, gfortran, and libgomp. Thanks, Chung-Lin gcc/c-family/ * c-common.h (c_omp_adjust_map_clauses): New declaration. * c-omp.c (c_omp_adjust_map_clauses): New function. gcc/c/ * c-parser.c (c_parser_omp_target_data): Add use of new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as handled map clause kind. (c_parser_omp_target_enter_data): Likewise. (c_parser_omp_target_exit_data): Likewise. (c_parser_omp_target): Likewise. * c-typeck.c (handle_omp_array_sections): Adjust COMPONENT_REF case to use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. (c_finish_omp_clauses): Adjust bitmap checks to allow struct decl and same struct field access to co-exist on OpenMP construct. gcc/cp/ * parser.c (cp_parser_omp_target_data): Add use of new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as handled map clause kind. (cp_parser_omp_target_enter_data): Likewise. (cp_parser_omp_target_exit_data): Likewise. (cp_parser_omp_target): Likewise. * semantics.c (handle_omp_array_sections): Adjust COMPONENT_REF case to use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. Fix interaction between reference case and attach/detach. (finish_omp_clauses): Adjust bitmap checks to allow struct decl and same struct field access to co-exist on OpenMP construct. 
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index bb38e6c76a4..3eb909a2946 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -1221,6 +1221,7 @@ extern enum omp_clause_defaultmap_kind c_omp_predetermined_mapping (tree); extern tree c_omp_check_context_selector (location_t, tree); extern void c_omp_mark_declare_variant (location_t, tree, tree); extern const char *c_omp_map_clause_name (tree, bool); +extern void c_omp_adjust_map_clauses (tree, bool); /* Return next tree in the chain for chain_next walking of tree nodes. */ static inline tree diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c index d7cff0f4cca..275c6afabe1 100644 --- a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -2579,3 +2579,92 @@ c_omp_m
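To illustrate the complexity point discussed in the review, here is a small sketch of the collect-then-apply structure. It makes several assumptions that are not in the patch: clauses live in a plain array, decls are small integer ids, a fixed-size flag array stands in for the hash_map mentioned above, only the target-construct case is modelled, and all names (`struct clause`, `adjust_map_clauses`, the `MAP_*` kinds) are hypothetical. The first pass records which decls have an ordinary map clause; the second pass converts firstprivate-pointer clauses for those decls, so the clause list is walked twice rather than once per pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced set of map kinds.  */
enum map_kind
{
  MAP_TO,
  MAP_TOFROM,
  MAP_FIRSTPRIVATE_POINTER,
  MAP_ATTACH_DETACH
};

#define MAX_DECLS 64

/* DECL is an illustrative id in [0, MAX_DECLS), not a tree node.  */
struct clause
{
  enum map_kind kind;
  int decl;
};

/* Two linear passes instead of a nested search per pointer clause.  */
static void
adjust_map_clauses (struct clause *clauses, size_t n)
{
  bool mapped[MAX_DECLS] = { false };

  /* Pass 1: remember every decl that has an ordinary map clause.  */
  for (size_t i = 0; i < n; i++)
    {
      assert (clauses[i].decl >= 0 && clauses[i].decl < MAX_DECLS);
      if (clauses[i].kind == MAP_TO || clauses[i].kind == MAP_TOFROM)
        mapped[clauses[i].decl] = true;
    }

  /* Pass 2: a firstprivate base pointer whose decl is also mapped
     becomes an attach/detach operation.  */
  for (size_t i = 0; i < n; i++)
    if (clauses[i].kind == MAP_FIRSTPRIVATE_POINTER
        && mapped[clauses[i].decl])
      clauses[i].kind = MAP_ATTACH_DETACH;
}
```

The quoted original performed the inner search once per firstprivate pointer, which is quadratic in the number of clauses in the worst case; the two-pass form keeps the work linear at the cost of the temporary table.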
Re: [PATCH, 2/3, OpenMP] Target mapping changes for OpenMP 5.0, middle-end parts and compiler testcases
On 2020/10/29 7:49 PM, Jakub Jelinek wrote: On Wed, Oct 28, 2020 at 06:32:21PM +0800, Chung-Lin Tang wrote: @@ -8958,25 +9083,20 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, /* An "attach/detach" operation on an update directive should behave as a GOMP_MAP_ALWAYS_POINTER. Beware that unlike attach or detach map kinds, GOMP_MAP_ALWAYS_POINTER depends on the previous mapping. */ if (code == OACC_UPDATE && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH) OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_ALWAYS_POINTER); - if (gimplify_expr (pd, pre_p, NULL, is_gimple_lvalue, fb_lvalue) - == GS_ERROR) - { - remove = true; - break; - } So what gimplifies those now? They're gimplified somewhere during omp-low now. (some gimplify scan testcases were adjusted to accommodate this change) I don't remember the exact case I encountered, but there were some issues with gimplified expressions inside the map clauses making some later checking more difficult. I haven't seen any negative effect of this modification so far. I don't like that, it goes against many principles, gimplification really shouldn't leave around non-GIMPLE IL. If you need to compare same expression or same expression bases later, perhaps detect the equalities during gimplification before actually gimplifying the clauses and ensure they are gimplified to the same expression or are using same base (e.g. by adding SAVE_EXPRs or TARGET_EXPRs before the gimplification). I have moved that same gimplify_expr call down to below the processing block, and things still work as expected. My aforementioned gimple-scan-test modifications have all been reverted, and all original tests still pass correctly. Thanks, Chung-Lin gcc/ * gimplify.c (is_or_contains_p): New static helper function. (omp_target_reorder_clauses): New function. (gimplify_scan_omp_clauses): Add use of omp_target_reorder_clauses to reorder clause list according to OpenMP 5.0 rules. Add handling of GOMP_MAP_ATTACH_DETACH for OpenMP cases. 
* omp-low.c (is_omp_target): New static helper function. (scan_sharing_clauses): Add scan phase handling of GOMP_MAP_ATTACH/DETACH for OpenMP cases. (lower_omp_target): Add lowering handling of GOMP_MAP_ATTACH/DETACH for OpenMP cases. gcc/testsuite/ * c-c++-common/gomp/clauses-2.c: Remove dg-error cases now valid. * gfortran.dg/gomp/map-2.f90: Likewise. * c-c++-common/gomp/map-5.c: New testcase. diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 29f385c9368..c2500656193 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8364,6 +8364,113 @@ extract_base_bit_offset (tree base, tree *base_ref, poly_int64 *bitposp, return base; } +/* Returns true if EXPR is or contains (as a sub-component) BASE_PTR. */ + +static bool +is_or_contains_p (tree expr, tree base_ptr) +{ + while (expr != base_ptr) +if (TREE_CODE (base_ptr) == COMPONENT_REF) + base_ptr = TREE_OPERAND (base_ptr, 0); +else + break; + return expr == base_ptr; +} + +/* Implement OpenMP 5.x map ordering rules for target directives. There are + several rules, and with some level of ambiguity, hopefully we can at least + collect the complexity here in one place. */ + +static void +omp_target_reorder_clauses (tree *list_p) +{ + /* Collect refs to alloc/release/delete maps. */ + auto_vec ard; + tree *cp = list_p; + while (*cp != NULL_TREE) +if (OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE_MAP + && (OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_ALLOC + || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_RELEASE + || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_DELETE)) + { + /* Unlink cp and push to ard. */ + tree c = *cp; + tree nc = OMP_CLAUSE_CHAIN (c); + *cp = nc; + ard.safe_push (c); + + /* Any associated pointer type maps should also move along. 
*/ + while (*cp != NULL_TREE + && OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE_MAP + && (OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_FIRSTPRIVATE_REFERENCE + || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_FIRSTPRIVATE_POINTER + || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_ATTACH_DETACH + || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_POINTER + || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_ALWAYS_POINTER + || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_TO_PSET)) + { + c = *cp; + nc = OMP_CLAUSE_CHAIN (c); + *cp = nc; + ard.safe_push (c); + } + } +else + cp = &OMP_CLAUSE_CHAIN (*cp); + + /* Link alloc/release/delete maps t
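Note on the unlinking loop in omp_target_reorder_clauses: the quoted hunk walks the clause chain through a pointer-to-pointer (tree *cp), so a clause can be unlinked without tracking a "previous" node. Below is a minimal sketch of that idiom on a plain singly linked list. All toy_* names are hypothetical stand-ins, not GCC's tree API, and the sketch only covers the collection step; where the collected clauses are linked back in is not shown, since the quoted patch is cut off at that point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's clause chain; this is not the tree API.  */
enum toy_kind { TOY_TO, TOY_FROM, TOY_ALLOC, TOY_RELEASE, TOY_DELETE, TOY_POINTER };

struct toy_clause
{
  enum toy_kind kind;
  struct toy_clause *chain;
};

static int
toy_is_ard (const struct toy_clause *c)
{
  return c->kind == TOY_ALLOC || c->kind == TOY_RELEASE || c->kind == TOY_DELETE;
}

/* Unlink every alloc/release/delete clause from *LIST_P, together with any
   pointer-kind clauses directly following it, and store them in OUT in
   their original order.  Returns the number of clauses stored; clauses
   that would not fit in OUT are left in the list.  */
static size_t
toy_collect_ard (struct toy_clause **list_p, struct toy_clause **out, size_t max)
{
  size_t n = 0;
  struct toy_clause **cp = list_p;
  while (*cp != NULL)
    if (toy_is_ard (*cp) && n < max)
      {
        out[n++] = *cp;
        *cp = (*cp)->chain;   /* Unlink; *cp is now the successor.  */
        while (*cp != NULL && (*cp)->kind == TOY_POINTER && n < max)
          {
            out[n++] = *cp;
            *cp = (*cp)->chain;
          }
      }
    else
      cp = &(*cp)->chain;
  return n;
}

/* Self-check on the list TO, ALLOC, POINTER, FROM, RELEASE.  */
static void
toy_collect_ard_demo (void)
{
  struct toy_clause c4 = { TOY_RELEASE, NULL };
  struct toy_clause c3 = { TOY_FROM, &c4 };
  struct toy_clause c2 = { TOY_POINTER, &c3 };
  struct toy_clause c1 = { TOY_ALLOC, &c2 };
  struct toy_clause c0 = { TOY_TO, &c1 };
  struct toy_clause *list = &c0;
  struct toy_clause *moved[8];

  size_t n = toy_collect_ard (&list, moved, 8);

  /* ALLOC drags its POINTER companion along; RELEASE moves on its own.  */
  assert (n == 3);
  assert (moved[0] == &c1 && moved[1] == &c2 && moved[2] == &c4);
  /* What remains keeps its relative order: TO, FROM.  */
  assert (list == &c0 && c0.chain == &c3 && c3.chain == NULL);
}
```

The collected nodes still carry their old chain pointers, so whoever links them back in has to rewrite those.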
Re: [PATCH, 3/3, OpenMP] Target mapping changes for OpenMP 5.0, libgomp parts [resend]
On 2020/10/28 6:33 PM, Chung-Lin Tang wrote:
On 2020/9/1 9:37 PM, Chung-Lin Tang wrote:

This patch contains the changes to libgomp and the testcases. There is now (again) a need to indicate OpenACC/OpenMP and an 'enter data' style directive, hence the associated changes to 'enum gomp_map_vars_kind'.

There is a slight change in the logic of gomp_attach_pointer handling, because for OpenMP there might be a non-offloaded data clause that attempts an attachment but silently continues in case the pointer is not mapped.

Also in the testcases, an XFAILed testcase for structure element mapping is added. OpenMP 5.0 specifies that elements of the same structure variable are allocated/deallocated in a uniform fashion, but this hasn't been implemented yet in this patch.

Hi Jakub, you haven't reviewed this 3rd part yet, but I am still updating it with a rebased patch here. I've removed the above-mentioned XFAILed testcase from the patch, since it actually belongs in the structure element mapping patches instead of here.

Thanks, Chung-Lin

libgomp/
* libgomp.h (enum gomp_map_vars_kind): Adjust enum values to be bit-flag usable.
* oacc-mem.c (acc_map_data): Adjust gomp_map_vars argument flags to 'GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA'.
(goacc_enter_datum): Likewise for call to gomp_map_vars_async.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_map_vars_internal): Change checks of GOMP_MAP_VARS_ENTER_DATA to use bit-and (&). Adjust use of gomp_attach_pointer for OpenMP cases.
(gomp_exit_data): Add handling of GOMP_MAP_DETACH.
(GOMP_target_enter_exit_data): Add handling of GOMP_MAP_ATTACH.
* testsuite/libgomp.c-c++-common/ptr-attach-1.c: New testcase.

For the libgomp patch, v3 doesn't update any of the code proper, but the libgomp.c-c++-common/ptr-attach-1.c testcase had some code added to test the case of a base-pointer on device by "declare target".
Thanks, Chung-Lin diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index da7ac037dcd..0cc3f4d406b 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -1162,10 +1162,10 @@ struct gomp_device_descr /* Kind of the pragma, for which gomp_map_vars () is called. */ enum gomp_map_vars_kind { - GOMP_MAP_VARS_OPENACC, - GOMP_MAP_VARS_TARGET, - GOMP_MAP_VARS_DATA, - GOMP_MAP_VARS_ENTER_DATA + GOMP_MAP_VARS_OPENACC= 1, + GOMP_MAP_VARS_TARGET = 2, + GOMP_MAP_VARS_DATA = 4, + GOMP_MAP_VARS_ENTER_DATA = 8 }; extern void gomp_acc_declare_allocate (bool, size_t, void **, size_t *, diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c index 65757ab2ffc..8dc521ac6d6 100644 --- a/libgomp/oacc-mem.c +++ b/libgomp/oacc-mem.c @@ -403,7 +403,8 @@ acc_map_data (void *h, void *d, size_t s) struct target_mem_desc *tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, &devaddrs, &sizes, -&kinds, true, GOMP_MAP_VARS_ENTER_DATA); +&kinds, true, +GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA); assert (tgt); assert (tgt->list_count == 1); splay_tree_key n = tgt->list[0].key; @@ -572,7 +573,8 @@ goacc_enter_datum (void **hostaddrs, size_t *sizes, void *kinds, int async) struct target_mem_desc *tgt = gomp_map_vars_async (acc_dev, aq, mapnum, hostaddrs, NULL, sizes, - kinds, true, GOMP_MAP_VARS_ENTER_DATA); + kinds, true, + GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA); assert (tgt); assert (tgt->list_count == 1); n = tgt->list[0].key; @@ -1202,7 +1204,7 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum, struct target_mem_desc *tgt = gomp_map_vars_async (acc_dev, aq, groupnum, &hostaddrs[i], NULL, &sizes[i], &kinds[i], true, - GOMP_MAP_VARS_ENTER_DATA); + GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA); assert (tgt); gomp_mutex_lock (&acc_dev->lock); diff --git a/libgomp/target.c b/libgomp/target.c index 1a8c67c2df5..61dab064fae 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -683,7 +683,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, 
struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum); tgt->list_count = mapnum; - tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1; + tgt->refcount = (pragma_kind & GOMP_MAP_VARS_ENTER_DATA) ? 0 : 1; tgt->device_descr = devicep; tgt->prev = NULL; struct gomp_coalesce_buf cbuf, *cbufp = NULL; @@ -1212,15 +1212,16 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, /* OpenACC
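Note on the bit-flag change: once callers pass 'GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA', an equality test against GOMP_MAP_VARS_ENTER_DATA no longer matches, which is why the target.c hunk switches to bit-and. The sketch below uses the enum values from the libgomp.h hunk; the two helper functions are hypothetical and only mirror the tgt->refcount initialisation.

```c
#include <assert.h>

/* Values as in the patched libgomp.h hunk.  */
enum gomp_map_vars_kind
{
  GOMP_MAP_VARS_OPENACC = 1,
  GOMP_MAP_VARS_TARGET = 2,
  GOMP_MAP_VARS_DATA = 4,
  GOMP_MAP_VARS_ENTER_DATA = 8
};

/* Hypothetical helper mirroring the patched tgt->refcount initialisation.  */
static int
toy_initial_refcount (int pragma_kind)
{
  return (pragma_kind & GOMP_MAP_VARS_ENTER_DATA) ? 0 : 1;
}

/* The pre-patch equality test, kept only to show why it stops working.  */
static int
toy_initial_refcount_old (int pragma_kind)
{
  return pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
}

static void
toy_refcount_demo (void)
{
  int kind = GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA;

  /* Bit-and still recognises the 'enter data' style request.  */
  assert (toy_initial_refcount (kind) == 0);
  /* Equality misses it as soon as a second flag is or-ed in.  */
  assert (toy_initial_refcount_old (kind) == 1);
  /* Kinds without the flag behave the same either way.  */
  assert (toy_initial_refcount (GOMP_MAP_VARS_TARGET) == 1);
  assert (toy_initial_refcount_old (GOMP_MAP_VARS_TARGET) == 1);
}
```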
[PATCH v2, OpenMP 5, C++] Implement implicit mapping of this[:1] (PR92120)
Hi Jakub, there was a first version of this patch here: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554087.html The attached patch here is a v2 version that adds implementation of this part in the this[:1] functionality description in the OpenMP 5.0 spec: "if the [member] variable [accessed in a target region] is of a type pointer or reference to pointer, it is also treated as if it has appeared in a map clause as a zero-length array section." Basically, referencing a pointer member 'ptr' automatically maps it with the equivalent of 'map(this->ptr[:0])' To achieve this, two new map kinds GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION were added, which are basically split from GOMP_MAP_ATTACH and GOMP_MAP_POINTER, except now allowing the pointer target to be NULL. This patch has been tested for gcc, g++, gfortran (C and Fortran are not really affected, but since omp-low.c was slightly touched, tested along for completeness) and libgomp on x86_64-linux with nvptx offloading, all without regressions. Is this okay for trunk? Thanks, Chung-Lin 2020-11-13 Chung-Lin Tang PR middle-end/92120 gcc/cp/ * cp-tree.h (finish_omp_target): New declaration. (set_omp_target_this_expr): Likewise. * lambda.c (lambda_expr_this_capture): Add call to set_omp_target_this_expr. * parser.c (cp_parser_omp_target): Factor out code, change to call finish_omp_target, add re-initing call to set_omp_target_this_expr. * semantics.c (omp_target_this_expr): New static variable. (omp_target_ptr_members_accessed): New static hash_map for tracking accessed non-static pointer-type members. (finish_non_static_data_member): Add call to set_omp_target_this_expr. Add recording of non-static pointer-type members access. (finish_this_expr): Add call to set_omp_target_this_expr. (set_omp_target_this_expr): New function to set omp_target_this_expr. 
(finish_omp_target): New function with code merged from cp_parser_omp_target, plus code to implement this[:1] and __closure map clauses for OpenMP. gcc/ * omp-low.c (lower_omp_target): Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds. * tree-pretty-print.c (dump_omp_clause): Likewise. include/ * gomp-constants.h (enum gomp_map_kind): Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds. (GOMP_MAP_POINTER_P): Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION. libgomp/ * libgomp.h (gomp_attach_pointer): Add bool parameter. * oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer. (goacc_enter_data_internal): Likewise. * target.c (gomp_map_vars_existing): Update assert condition to include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION. (gomp_map_pointer): Add 'bool allow_zero_length_array_sections' parameter, add support for mapping a pointer with NULL target. (gomp_attach_pointer): Add 'bool allow_zero_length_array_sections' parameter, add support for attaching a pointer with NULL target. (gomp_map_vars_internal): Update calls to gomp_map_pointer and gomp_attach_pointer, add handling for GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases. gcc/testsuite/ * g++.dg/gomp/target-this-1.C: New testcase. * g++.dg/gomp/target-this-2.C: New testcase. * g++.dg/gomp/target-this-3.C: New testcase. * g++.dg/gomp/target-this-4.C: New testcase. libgomp/ * testsuite/libgomp.c++/target-this-1.C: New testcase. * testsuite/libgomp.c++/target-this-2.C: New testcase. * testsuite/libgomp.c++/target-this-3.C: New testcase. * testsuite/libgomp.c++/target-this-4.C: New testcase. 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 63724c0e84f..e45540e08f1 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7277,6 +7277,8 @@ extern void record_null_lambda_scope (tree); extern void finish_lambda_scope(void); extern tree start_lambda_function (tree fn, tree lambda_expr); extern void finish_lambda_function (tree body); +extern tree finish_omp_target (location_t, tree, tree, bool); +extern void set_omp_target_this_expr (tree); /* in tree.c */ extern int cp_tree_operand_length (const_tree); diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c index 1a1647f465e..eb09971f288 100644 --- a/gcc/cp/lambda.c +++ b/gcc/cp/lambda.c @@ -841,6 +841,9 @@ lambda_expr_this_capture (tree lambda, int add_capture_p)
Re: [PING^4][PATCH 0/4] Fix library testsuite compilation for build sysroot
On 2020/1/6 11:25 PM, Maciej W. Rozycki wrote:

Overall if libgomp-test-support.exp is considered appropriate for standalone testing, then I think two solutions are possible here:

1. An option is added to libgomp's $CC such that the compiler is able to make builds involving the offload compiler where requested, and this then propagates to GCC_UNDER_TEST as it stands.

2. The definition of GCC_UNDER_TEST in libgomp-test-support.exp is only made if the variable does not already exist, and then you can predefine the variable in site.exp however you find appropriate.

Hi Maciej, I understand your situation with --with-build-sysroot/--without-sysroot, but the way you set GCC_UNDER_TEST in libgomp-test-support.exp appears to override too much of the machinery in libgomp/testsuite/lib/libgomp.exp that sets GCC_UNDER_TEST using DejaGNU find_gcc, etc.

Can you test if the attached patch works for you? The patch exports the build sysroot setting from the toplevel to target library subdirs, and adds the --sysroot= option when doing build-tree testing (I assume that the blddir != "" test is sufficient, judging from the surrounding comments). I can only verify that it no longer "interferes" with our installed-mode testing.

Also, if this does work, then other library testsuites (e.g. libatomic.exp) might also need updating, I think.

Thanks, Chung-Lin

2020-01-14 Chung-Lin Tang

* Makefile.tpl (NORMAL_TARGET_EXPORTS): Add export of SYSROOT_CFLAGS_FOR_TARGET variable.
* Makefile.in: Regenerate.

libgomp/
* testsuite/lib/libgomp.exp (ALWAYS_CFLAGS): Add --sysroot=$SYSROOT_CFLAGS_FOR_TARGET option when doing build-tree testing. Fix comment typo.
* testsuite/libgomp-test-support.exp.in (GCC_UNDER_TEST): Delete definition.
Index: libgomp/testsuite/lib/libgomp.exp === --- libgomp/testsuite/lib/libgomp.exp (revision 279954) +++ libgomp/testsuite/lib/libgomp.exp (working copy) @@ -171,9 +171,16 @@ proc libgomp_init { args } { lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/../../include" lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/.." -# For build-tree testing, also consider the library paths used for builing. +# For build-tree testing, also consider the library paths used for building. # For installed testing, we assume all that to be provided in the sysroot. if { $blddir != "" } { + + # If --with-build-sysroot= was specified, we assume it will be needed + # for build-tree testing. + if [info exists SYSROOT_CFLAGS_FOR_TARGET] { + lappend ALWAYS_CFLAGS "additional_flags=--sysroot=$SYSROOT_CFLAGS_FOR_TARGET" + } + # The `-fopenacc' and `-fopenmp' options imply `-pthread', and # that implies `-latomic' on some hosts, so wire in libatomic # build directories. Index: libgomp/testsuite/libgomp-test-support.exp.in === --- libgomp/testsuite/libgomp-test-support.exp.in (revision 279954) +++ libgomp/testsuite/libgomp-test-support.exp.in (working copy) @@ -1,5 +1,3 @@ -set GCC_UNDER_TEST {@CC@} - set cuda_driver_include "@CUDA_DRIVER_INCLUDE@" set cuda_driver_lib "@CUDA_DRIVER_LIB@" set hsa_runtime_lib "@HSA_RUNTIME_LIB@" Index: Makefile.in === --- Makefile.in (revision 279954) +++ Makefile.in (working copy) @@ -319,6 +319,7 @@ RAW_CXX_TARGET_EXPORTS = \ NORMAL_TARGET_EXPORTS = \ $(BASE_TARGET_EXPORTS) \ + SYSROOT_CFLAGS_FOR_TARGET="$(SYSROOT_CFLAGS_FOR_TARGET)"; export SYSROOT_CFLAGS_FOR_TARGET; \ CXX="$(CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS"; export CXX; # Where to find GMP Index: Makefile.tpl === --- Makefile.tpl(revision 279954) +++ Makefile.tpl(working copy) @@ -322,6 +322,7 @@ RAW_CXX_TARGET_EXPORTS = \ NORMAL_TARGET_EXPORTS = \ $(BASE_TARGET_EXPORTS) \ + SYSROOT_CFLAGS_FOR_TARGET="$(SYSROOT_CFLAGS_FOR_TARGET)"; export SYSROOT_CFLAGS_FOR_TARGET; \ 
CXX="$(CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS"; export CXX; # Where to find GMP
[PATCH, C++, OpenACC/OpenMP] Allow static constexpr fields in mappable types
Hi Jakub, Thomas,

We had a customer with a C++ program using GPU offloading failing to compile due to the code's extensive use of 'static constexpr' in its many template classes (the code was using OpenMP, but OpenACC is no different).

While the FE should ensure that no static members exist for struct/class types that are being mapped to the GPU, 'static constexpr' members are completely resolved and folded statically at compile time, so they really shouldn't count.

This is a small patch to cp/decl2.c:cp_omp_mappable_type_1() to allow the DECL_DECLARED_CONSTEXPR_P == true case to be mapped, and a g++ testcase. Patch has been tested with no regressions in g++ and libgomp testsuites. Probably not okay for trunk now, okay for stage1?

Thanks, Chung-Lin

cp/
* decl2.c (cp_omp_mappable_type_1): Allow fields with DECL_DECLARED_CONSTEXPR_P to be mapped.

testsuite/
* g++.dg/goacc/static-constexpr-1.C: New test.

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 042d6fa12df..4f7d9b0ebd4 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -1461,7 +1461,10 @@ cp_omp_mappable_type_1 (tree type, bool notes)
     {
       tree field;
       for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
-        if (VAR_P (field))
+        if (VAR_P (field)
+            /* Fields that are 'static constexpr' can be folded away at compile
+               time, thus does not interfere with mapping. */
+            && !DECL_DECLARED_CONSTEXPR_P (field))
           {
             if (notes)
               inform (DECL_SOURCE_LOCATION (field),
diff --git a/gcc/testsuite/g++.dg/goacc/static-constexpr-1.C b/gcc/testsuite/g++.dg/goacc/static-constexpr-1.C
new file mode 100644
index 000..2bf69209de4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/goacc/static-constexpr-1.C
@@ -0,0 +1,16 @@
+// { dg-do compile }
+
+/* Test that static constexpr members do not interfere with offloading. */
+struct rec
+{
+  static constexpr int x = 1;
+  int y, z;
+};
+
+void foo (rec& r)
+{
+  #pragma acc parallel copy(r)
+  {
+    r.y = r.y = r.x;
+  }
+}
Re: [PING^4][PATCH 0/4] Fix library testsuite compilation for build sysroot
Hi Maciej, sorry for the late reply. On 2020/2/1 5:46 AM, Maciej W. Rozycki wrote: On Tue, 21 Jan 2020, Maciej W. Rozycki wrote: I'll give your proposal a shot and I'm lucky enough to have a build configuration where I can have no compiler preinstalled, so at least I can check if testing with your change applied correctly picks the newly built uninstalled compiler in that case. So it does seem to pick the right uninstalled compiler, however without the sysroot option and therefore all tests fail either like: .../bin/riscv64-linux-gnu-ld: cannot find crt1.o: No such file or directory .../bin/riscv64-linux-gnu-ld: cannot find -lm collect2: error: ld returned 1 exit status compiler exited with status 1 FAIL: libgomp.c/../libgomp.c-c++-common/depend-iterator-2.c (test for excess errors) or like: .../libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/cancel-parallel-1.c:4:10: fatal error: stdlib.h: No such file or directory compilation terminated. compiler exited with status 1 FAIL: libgomp.c/../libgomp.c-c++-common/cancel-parallel-1.c (test for excess errors) Weird that the --sysroot option doesn't properly get down there. I'll try to whip up a similar environment later to really test this myself if I have time. As for your patch... I am somewhat wary about picking the compiler options to pass selectively anyway. However the change below does not do that and works for me and I think it should be fine for your use case too. Please confirm. I'm yet verifying the other libraries with corresponding changes and will formally submit v2 of this series once that has completed. Apologies for the slightly long RTT with this update. Maciej The 'AM_RUNTESTFLAGS = --tool_exec "$(CC)"' does work for us, but only because you backed out the change from libgomp-test-support.exp, and our installed testing doesn't use the libgomp/testsuite/Makefile.* files (we invoke runtest using another script). From the code in libgomp/testsuite/lib/libgomp.exp:libgomp_init(): ... 
if ![info exists GCC_UNDER_TEST] then { if [info exists TOOL_EXECUTABLE] { set GCC_UNDER_TEST $TOOL_EXECUTABLE } else { set GCC_UNDER_TEST "[find_gcc]" } } So essentially this patch is the same as the prior one, and still blocks the usual find_gcc logic from ever getting control (as long as we use the in-tree 'make check'). I'm not sure that is the right thing to do... That said, I don't have anything further against this patch. Okay for me. (I do still think that actually detecting the right in-tree compiler and giving the correct sysroot options from the configuration is the more proper approach, maybe later) Thanks, Chung-Lin --- libgomp/testsuite/Makefile.am |1 + libgomp/testsuite/Makefile.in |1 + libgomp/testsuite/libgomp-test-support.exp.in |2 -- 3 files changed, 2 insertions(+), 2 deletions(-) gcc-test-libgomp-runtestflags-tool-exec.diff Index: gcc/libgomp/testsuite/Makefile.am === --- gcc.orig/libgomp/testsuite/Makefile.am +++ gcc/libgomp/testsuite/Makefile.am @@ -11,6 +11,7 @@ EXPECT = $(shell if test -f $(top_buildd _RUNTEST = $(shell if test -f $(top_srcdir)/../dejagnu/runtest; then \ echo $(top_srcdir)/../dejagnu/runtest; else echo runtest; fi) RUNTESTDEFAULTFLAGS = --tool $$tool --srcdir $$srcdir +AM_RUNTESTFLAGS = --tool_exec "$(CC)" # Instead of directly in ../testsuite/libgomp-test-support.exp.in, the # following variables have to be "routed through" this Makefile, for expansion Index: gcc/libgomp/testsuite/Makefile.in === --- gcc.orig/libgomp/testsuite/Makefile.in +++ gcc/libgomp/testsuite/Makefile.in @@ -308,6 +308,7 @@ _RUNTEST = $(shell if test -f $(top_srcd echo $(top_srcdir)/../dejagnu/runtest; else echo runtest; fi) RUNTESTDEFAULTFLAGS = --tool $$tool --srcdir $$srcdir +AM_RUNTESTFLAGS = --tool_exec "$(CC)" all: all-am .SUFFIXES: Index: gcc/libgomp/testsuite/libgomp-test-support.exp.in === --- gcc.orig/libgomp/testsuite/libgomp-test-support.exp.in +++ gcc/libgomp/testsuite/libgomp-test-support.exp.in @@ -1,5 +1,3 @@ -set GCC_UNDER_TEST {@CC@} 
- set cuda_driver_include "@CUDA_DRIVER_INCLUDE@" set cuda_driver_lib "@CUDA_DRIVER_LIB@" set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
[PATCH, OpenMP, Fortran] Support in_reduction for Fortran
Hi Jakub, and Fortran folks,

this patch makes the adjustments required to let 'in_reduction' work for Fortran. This is not limited to the target directive: the task directive also works after this patch.

There is a small adjustment in omp-low.c:scan_sharing_clauses: RTL expand of the copy of the OMP_CLAUSE_IN_REDUCTION decl was failing for Fortran by-reference arguments, which seems to work after placing them under the outer ctx (when it exists). This also now requires checking the field_map for existence of the field before inserting.

Tested without regressions on mainline trunk, is this okay? (testing for devel/omp/gcc-11 is in progress)

Thanks, Chung-Lin

2021-09-17 Chung-Lin Tang

gcc/fortran/ChangeLog:
* openmp.c (gfc_match_omp_clause_reduction): Add 'openmp_target' default false parameter. Add 'always,tofrom' map for OMP_LIST_IN_REDUCTION case.
(gfc_match_omp_clauses): Add 'openmp_target' default false parameter, adjust call to gfc_match_omp_clause_reduction.
(match_omp): Adjust call to gfc_match_omp_clauses.
* trans-openmp.c (gfc_trans_omp_taskgroup): Add call to gfc_trans_omp_clauses, create and return block.

gcc/ChangeLog:
* omp-low.c (scan_sharing_clauses): Place in_reduction copy of variable in outer ctx if it exists. Check if non-existent in field_map before installing OMP_CLAUSE_IN_REDUCTION decl.

gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/reduction4.f90: Adjust omp target 'in_reduction' scan pattern.

libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-in-reduction-1.f90: New test.
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index a64b7f5aa10..8179b5aa8bc 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -1138,7 +1138,7 @@ failed: static match gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses *c, bool openacc, - bool allow_derived) + bool allow_derived, bool openmp_target = false) { if (pc == 'r' && gfc_match ("reduction ( ") != MATCH_YES) return MATCH_NO; @@ -1285,6 +1285,19 @@ gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses *c, bool openacc, n->u2.udr = gfc_get_omp_namelist_udr (); n->u2.udr->udr = udr; } + if (openmp_target && list_idx == OMP_LIST_IN_REDUCTION) + { + gfc_omp_namelist *p = gfc_get_omp_namelist (), **tl; + p->sym = n->sym; + p->where = p->where; + p->u.map_op = OMP_MAP_ALWAYS_TOFROM; + + tl = &c->lists[OMP_LIST_MAP]; + while (*tl) + tl = &((*tl)->next); + *tl = p; + p->next = NULL; + } } return MATCH_YES; } @@ -1353,7 +1366,7 @@ gfc_match_dupl_atomic (bool not_dupl, const char *name) static match gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask, bool first = true, bool needs_space = true, - bool openacc = false) + bool openacc = false, bool openmp_target = false) { bool error = false; gfc_omp_clauses *c = gfc_get_omp_clauses (); @@ -2057,8 +2070,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask, goto error; } if ((mask & OMP_CLAUSE_IN_REDUCTION) - && gfc_match_omp_clause_reduction (pc, c, openacc, -allow_derived) == MATCH_YES) + && gfc_match_omp_clause_reduction (pc, c, openacc, allow_derived, +openmp_target) == MATCH_YES) continue; if ((mask & OMP_CLAUSE_INBRANCH) && (m = gfc_match_dupl_check (!c->inbranch && !c->notinbranch, @@ -3496,7 +3509,8 @@ static match match_omp (gfc_exec_op op, const omp_mask mask) { gfc_omp_clauses *c; - if (gfc_match_omp_clauses (&c, mask) != MATCH_YES) + if (gfc_match_omp_clauses (&c, mask, true, true, false, +(op == EXEC_OMP_TARGET)) != MATCH_YES) return MATCH_ERROR; new_st.op = op; new_st.ext.omp_clauses = c; diff 
--git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index e55e0c81868..08483951066 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -6391,12 +6391,17 @@ gfc_trans_omp_task (gfc_code *code) static tree gfc_trans_omp_taskgroup (gfc_code *code) { + stmtblock_t block; + gfc_start_block (&block); tree body = gfc_trans_code (code->block->next); tree stmt = make_node (OMP_TASKGROUP); TREE_TYPE (stmt) = void_type_node; OMP_TASKGROUP_BODY (stmt) = body; - OMP_TASKGROUP_CLAUSES (stmt) = NULL_TREE; - return stmt; + OMP_TASKGROUP_CLAUSES (stmt) = gfc_trans_omp_clauses (&
Re: [PATCH] OpenACC reference count overhaul
On 2019/12/10 12:04 AM, Julian Brown wrote:

I'm citing below the changes introducing 'gomp_remove_var_async', modelled similar to the existing 'gomp_unmap_vars_async'.

Also for both these, do I understand correctly, that it's actually not the 'gomp_unref_tgt' that needs to be "delayed" via 'goacc_asyncqueue', but rather really only the 'gomp_free_device_memory', called via 'gomp_unmap_tgt', called via 'gomp_unref_tgt'? In other words: why do we need to keep the 'struct target_mem_desc' alive? Per my understanding, that one is one component of the mapping table, and not relevant anymore (thus can be 'free'd) as soon as it has been determined that 'tgt->refcount == 0'? Am I missing something there?

IIRC, that was Chung-Lin's choice. I'll CC him in. I think delaying freeing of the target_mem_desc isn't really a huge problem, in practice.

I don't clearly remember all the details. It is possible that not asyncqueue-ifying gomp_remove_var was simply an oversight. The 'target_mem_desc' is supposed to represent the piece of device memory inside libgomp, so unref/freeing it only after all dev-to-host copying is done seems logical.

Chung-Lin
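Note on the ordering argument: the point made in the reply is that the descriptor should outlive any pending device-to-host copy. A minimal sketch of that ordering follows, assuming a simple in-order queue of callbacks. All toy_* names are hypothetical; this is not libgomp's goacc_asyncqueue interface, only an illustration of "enqueue the release behind the copy-back".

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 8

typedef void (*toy_fn) (void *);

/* Hypothetical in-order queue of deferred operations.  */
struct toy_queue
{
  toy_fn fn[TOY_QUEUE_MAX];
  void *arg[TOY_QUEUE_MAX];
  size_t count;
};

/* Hypothetical descriptor standing in for a piece of device memory.  */
struct toy_desc
{
  int alive;
  int copied_back;
};

/* Append FN (ARG) to Q; returns 0 if the queue is full.  */
static int
toy_push (struct toy_queue *q, toy_fn fn, void *arg)
{
  if (q->count == TOY_QUEUE_MAX)
    return 0;
  q->fn[q->count] = fn;
  q->arg[q->count] = arg;
  q->count++;
  return 1;
}

/* Run all queued operations in the order they were pushed.  */
static void
toy_drain (struct toy_queue *q)
{
  for (size_t i = 0; i < q->count; i++)
    q->fn[i] (q->arg[i]);
  q->count = 0;
}

static void
toy_copy_back (void *p)
{
  struct toy_desc *d = p;
  assert (d->alive);   /* The copy needs the descriptor to still exist.  */
  d->copied_back = 1;
}

static void
toy_release (void *p)
{
  struct toy_desc *d = p;
  d->alive = 0;
}

static void
toy_deferred_free_demo (void)
{
  struct toy_queue q = { .count = 0 };
  struct toy_desc d = { .alive = 1, .copied_back = 0 };

  int ok = toy_push (&q, toy_copy_back, &d) && toy_push (&q, toy_release, &d);
  assert (ok);
  (void) ok;

  /* Nothing has run yet, so the descriptor is still alive.  */
  assert (d.alive && !d.copied_back);
  toy_drain (&q);
  /* The copy-back ran first, then the release.  */
  assert (!d.alive && d.copied_back);
}
```

Releasing the descriptor immediately instead of queueing it would trip the assertion in toy_copy_back, which is the hazard the deferred unref avoids.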
[PATCH, PR90030] Fortran OpenMP/OpenACC array mapping alignment fix
Hi Jakub,

As Thomas reported and submitted a patch a while ago:
https://gcc.gnu.org/pipermail/gcc-patches/2019-April/519932.html
https://gcc.gnu.org/pipermail/gcc-patches/2019-May/522738.html

There's an issue with the Fortran front-end when mapping arrays: when creating the data MEM_REF for the map clause, there's a convention of casting the referencing pointer to 'c_char *' by fold_convert (build_pointer_type (char_type_node), ptr).

This causes the alignment passed to the libgomp runtime for array data to be hardwired to '1', and causes alignment errors on the offload target (these do not always show up, but can be triggered by a slight change of clause ordering).

This patch is not exactly Thomas' patch from 2019, but does the same thing. The new libgomp tests are directly reused though. A lot of scan test adjustment is also included in this patch.

Patch has been tested for no regressions for gfortran and libgomp, is this okay for trunk?

Thanks, Chung-Lin

Fortran: fix array alignment for OpenMP/OpenACC target mapping clauses [PR90030]

The Fortran front-end is creating maps of array data with a type of pointer to char_type_node, which when eventually passed to libgomp during runtime, marks the passed array with an alignment of 1, which can cause mapping alignment errors on the offload target.

This patch removes the related fold_convert(build_pointer_type (char_type_node)) calls in fortran/trans-openmp.c, and adds gcc_asserts to ensure pointer type.

2021-11-04 Chung-Lin Tang
Thomas Schwinge

PR fortran/90030

gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_omp_finish_clause): Remove fold_convert to pointer to char_type_node, add gcc_assert of POINTER_TYPE_P.
(gfc_trans_omp_array_section): Likewise.
(gfc_trans_omp_clauses): Likewise.

gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/finalize-1.f: Adjust scan test.
* gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
* gfortran.dg/gomp/affinity-clause-5.f90: Likewise.
* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
* gfortran.dg/gomp/defaultmap-5.f90: Likewise. * gfortran.dg/gomp/defaultmap-6.f90: Likewise. * gfortran.dg/gomp/map-3.f90: Likewise. * gfortran.dg/gomp/pr78260-2.f90: Likewise. * gfortran.dg/gomp/pr78260-3.f90: Likewise. libgomp/ChangeLog: * testsuite/libgomp.oacc-fortran/pr90030.f90: New test. * testsuite/libgomp.fortran/pr90030.f90: New test.diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index e81c558..0ff90b7 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -1564,7 +1564,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc) if (present) ptr = gfc_build_cond_assign_expr (&block, present, ptr, null_pointer_node); - ptr = fold_convert (build_pointer_type (char_type_node), ptr); + gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr))); ptr = build_fold_indirect_ref (ptr); OMP_CLAUSE_DECL (c) = ptr; c2 = build_omp_clause (input_location, OMP_CLAUSE_MAP); @@ -2381,7 +2381,7 @@ gfc_trans_omp_array_section (stmtblock_t *block, gfc_omp_namelist *n, OMP_CLAUSE_SIZE (node), elemsz); } gcc_assert (se.post.head == NULL_TREE); - ptr = fold_convert (build_pointer_type (char_type_node), ptr); + gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr))); OMP_CLAUSE_DECL (node) = build_fold_indirect_ref (ptr); ptr = fold_convert (ptrdiff_type_node, ptr); @@ -2849,8 +2849,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl))) { decl = gfc_conv_descriptor_data_get (decl); - decl = fold_convert (build_pointer_type (char_type_node), - decl); + gcc_assert (POINTER_TYPE_P (TREE_TYPE (decl))); decl = build_fold_indirect_ref (decl); } else if (DECL_P (decl)) @@ -2873,8 +2872,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, } gfc_add_block_to_block (&iter_block, &se.pre); gfc_add_block_to_block (&iter_block, &se.post); - ptr = fold_convert (build_pointer_type (char_type_node), - ptr); + gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr))); OMP_CLAUSE_DECL (node) = 
build_fold_indirect_ref (ptr); } if (list == OMP_LIST_DEPEND) @@ -3117,8 +3115,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, if (present)
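Note on why the cast matters: the email says the alignment reported to the runtime ends up as 1 once the data pointer is converted to a pointer to char. The C sketch below illustrates the general effect under the assumption that the reported alignment is derived from the pointee type alone. The toy_map_entry record and the TOY_MAP_AS macro are hypothetical and are not how GCC or libgomp encode mappings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of what a runtime is told about one mapping.  */
struct toy_map_entry
{
  void *host;
  size_t size;
  size_t align;
};

/* Describe N elements of TYPE at P; the alignment comes from TYPE alone.  */
#define TOY_MAP_AS(type, p, n) \
  ((struct toy_map_entry) { (p), (n) * sizeof (type), _Alignof (type) })

static void
toy_align_demo (void)
{
  double a[4] = { 0 };

  /* Same storage described twice: once with its real element type,
     once viewed through a char pointer.  */
  struct toy_map_entry typed = TOY_MAP_AS (double, a, 4);
  struct toy_map_entry as_char = TOY_MAP_AS (char, a, sizeof a);

  /* Address and byte size agree ...  */
  assert (typed.host == as_char.host && typed.size == as_char.size);
  /* ... but the char view can only promise byte alignment.  */
  assert (as_char.align == 1);
  assert (typed.align == _Alignof (double));
}
```

With only byte alignment promised, a consumer that packs several mappings into one device block is free to place the array at an offset that the real element type cannot tolerate.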
[PATCH, v2, OpenMP 5.0] Implement relaxation of implicit map vs. existing device mappings (for mainline trunk)
Hi Jakub,

On 2021/6/24 11:55 PM, Jakub Jelinek wrote:
On Fri, May 14, 2021 at 09:20:25PM +0800, Chung-Lin Tang wrote:

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e790f08b23f..69c4a8e0a0a 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -10374,6 +10374,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
           gcc_unreachable ();
         }
       OMP_CLAUSE_SET_MAP_KIND (clause, kind);
+      OMP_CLAUSE_MAP_IMPLICIT_P (clause) = 1;
       if (DECL_SIZE (decl)
           && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
         {

As Thomas mentioned, there is now also OMP_CLAUSE_MAP_IMPLICIT that means something different:

/* Nonzero on map clauses added implicitly for reduction clauses on combined or composite constructs. They shall be removed if there is an explicit map clause. */

Having OMP_CLAUSE_MAP_IMPLICIT and OMP_CLAUSE_MAP_IMPLICIT_P would be too confusing. So either we need to use just one flag for both purposes or have two different flags and find a better name for one of them. The former would be possible if no OMP_CLAUSE_MAP clauses added by the FEs are implicit - then you could clear OMP_CLAUSE_MAP_IMPLICIT in gimplify_scan_omp_clauses. I wonder if it is the case though, e.g. doesn't your "Improve OpenMP target support for C++ [PR92120 v4]" patch add a lot of such implicit map clauses (e.g. the this[:1] and various others)?

I have changed the name to OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P, to signal that this bit is to be passed to the runtime. Right now it is intended to be used by clauses created by the middle-end, but front-end uses like that for C++ could be clarified later.

Also, gimplify_adjust_omp_clauses_1 sometimes doesn't add just one map clause, but several, shouldn't those be marked implicit too? And similarly it calls lang_hooks.decls.omp_finish_clause which can add even further map clauses implicitly, shouldn't those be implicit too (in that case copy the flag from the clause it is called on to the extra clauses it adds)?
Also as Thomas mentioned, it should be restricted to non-OpenACC, it can check gimplify_omp_ctxp->region_type if it is OpenMP or OpenACC.

Agreed, I've adjusted the patch to only do this implicit setting for OpenMP. This reduces a lot of the originally needed scan test adjustment for existing OpenACC testcases.

@@ -10971,9 +10972,15 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
         list_p = &OMP_CLAUSE_CHAIN (c);
     }
 
-  /* Add in any implicit data sharing. */
+  /* Add in any implicit data sharing. Implicit clauses are added at the start

Two spaces after dot in comments.

Done.

+     of the clause list, but after any non-map clauses. */
   struct gimplify_adjust_omp_clauses_data data;
-  data.list_p = list_p;
+  tree *implicit_add_list_p = orig_list_p;
+  while (*implicit_add_list_p
+         && OMP_CLAUSE_CODE (*implicit_add_list_p) != OMP_CLAUSE_MAP)
+    implicit_add_list_p = &OMP_CLAUSE_CHAIN (*implicit_add_list_p);

Why are the implicit map clauses added first and not last?

As I also explained in the first submission email, due to the processing order, if implicit clauses are added last (and processed last), for example:

  #pragma omp target map(tofrom: var.ptr[:N]) map(tofrom: var[implicit])
  {
    // access of var.ptr[]
  }

The explicit var.ptr[:N] will not find anything to map, because the (implicit) map(var) has not been seen yet, and the assumed array section attachment behavior will fail.

Only with an order like:

  map(tofrom: var[implicit]) map(tofrom: var.ptr[:N])

will the usual assumed behavior show.

And yes, this depends on the new behavior implemented by patch [1], which I still need you to review. E.g. for map(var.ptr[:N]), the proper behavior should *only* map the array section but NOT the base-pointer.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571195.html There is also the OpenMP 5.1 [352:17-22] case which basically says that the implicit mappings should be ignored if there are explicit ones on the same construct (though, do we really create implicit clauses in that case?). Implicit clauses do not appear to be created if there's an explicit clause already existing. +#define GOMP_MAP_IMPLICIT (GOMP_MAP_FLAG_SPECIAL_3 \ +| GOMP_MAP_FLAG_SPECIAL_4) +/* Mask for entire set of special map kind bits. */ +#define GOMP_MAP_FLAG_SPECIAL_BITS (GOMP_MAP_FLAG_SPECIAL_0 \ +| GOMP_MAP_FLAG_SPECIAL_1 \ +| GOMP_MAP_FLAG_SPECIAL_2 \ +| GOMP_MAP_FLAG_SPECIAL_3 \ +| GOMP_MAP_FLAG_SPECIAL_4) ... +#define GOMP_MAP_IMPLICIT_P(X) \ + (((X) & GOMP_MAP_FLAG_SPECIAL_BITS) == GOMP_MAP_IMPLICIT) I think here we need to decide with which GOMP_MAP* kin
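The insertion-point search in the gimplify_adjust_omp_clauses hunk quoted earlier in this message can be illustrated outside of GCC. The following is only a minimal sketch: `struct clause`, `enum clause_code`, `implicit_insert_point` and `add_implicit_map` are hypothetical stand-ins for the tree clause chain and are not GCC APIs. It shows the pointer-to-pointer walk that skips leading non-map clauses, so that an implicit map lands before every explicit map clause.

```c
#include <stddef.h>

/* Hypothetical stand-ins for GCC's OMP clause chain (not the real tree API).  */
enum clause_code { CLAUSE_FIRSTPRIVATE, CLAUSE_MAP };

struct clause
{
  enum clause_code code;
  const char *name;        /* what the clause refers to, for illustration */
  struct clause *next;     /* plays the role of OMP_CLAUSE_CHAIN */
};

/* Return the link at which implicit map clauses should be spliced in:
   after any leading non-map clauses, before the first map clause.  */
struct clause **
implicit_insert_point (struct clause **list_p)
{
  while (*list_p && (*list_p)->code != CLAUSE_MAP)
    list_p = &(*list_p)->next;
  return list_p;
}

/* Splice NEW_CLAUSE in at that point, so it is processed before any
   explicit map clause already on the list.  */
void
add_implicit_map (struct clause **list_p, struct clause *new_clause)
{
  struct clause **p = implicit_insert_point (list_p);
  new_clause->next = *p;
  *p = new_clause;
}
```

With a list of `firstprivate(n), map(var.ptr[:N])`, adding an implicit `map(var)` this way yields `firstprivate(n), map(var), map(var.ptr[:N])`, which is the order the reply argues is needed for the attachment to find its base.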
[PATCH, v5, OpenMP 5.0] Improve OpenMP target support for C++ [PR92120 v5]
Hi Jakub, On 2021/6/24 9:15 PM, Jakub Jelinek wrote: On Fri, Jun 18, 2021 at 10:25:16PM +0800, Chung-Lin Tang wrote: Note, you'll need to rebase your patch, it clashes with r12-1768-g7619d33471c10fe3d149dcbb701d99ed3dd23528. Sorry for that. And sorry for patch review delay. --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -13104,6 +13104,12 @@ handle_omp_array_sections_1 (tree c, tree t, vec &types, return error_mark_node; } t = TREE_OPERAND (t, 0); + if ((ort == C_ORT_ACC || ort == C_ORT_OMP) Map clauses never appear on declare simd, so (ort == C_ORT_ACC || ort == C_ORT_OMP) previously meant always and since the in_reduction change is incorrect (as C_ORT_OMP_TARGET is used for target construct but not for e.g. target data* or target update). + && TREE_CODE (t) == MEM_REF) Upon reviewing, it appears that most of these C_ORT_* tests are no longer needed, removed in new patch. So please just use if (TREE_CODE (t) == MEM_REF) or explain when it shouldn't trigger. @@ -14736,6 +14743,11 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) { while (TREE_CODE (t) == COMPONENT_REF) t = TREE_OPERAND (t, 0); + if (TREE_CODE (t) == MEM_REF) + { + t = TREE_OPERAND (t, 0); + STRIP_NOPS (t); + } This doesn't look correct. At least the parsing (and the spec AFAIK) doesn't ensure that if there is ->, it must come before all the dots. So, if one uses map (s->x.y) the above would work, but if map (s->x.y->z) or map (s.a->b->c->d->e) is used, it wouldn't. I'd expect a single while loop that looks through COMPONENT_REFs and MEM_REFs as they appear. Maybe the handle_omp_array_sections_1 MEM_REF case too? Or do you want to have it done incrementally, start with supporting only a single -> first before all the dots and later on add support for the rest? 
I think the 5.0 and especially 5.1 wording basically says that map clause operand is arbitrary lvalue expression that includes array section support too, so eventually we should just have somewhere in parsing scope a bool whether OpenMP array sections are allowed or not, add OMP_ARRAY_REF or similar tree code for those and after parsing the expression, ensure array sections appear only where they can appear and for a subset of the lvalue expressions where we have decl plus series of -> field or . field or [ index ] or [ array section stuff ] handle those specially. That arbitrary lvalue can certainly be done incrementally. map (foo(123)->a.b[3]->c.d[:7]) and the like. Indeed this kind of modification is sort of "as encountered", so there are probably many cases that are not completely handled yet; it's not just the front-end, but also changes in gimplify_scan_omp_clauses(). However, I had another patch that should've plowed a bit further on this: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html as well as those patch sets that Julian is working on. (our current plan is to have my sets go in first, and Julian's on top, to minimize clashing) if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP && OMP_CLAUSE_MAP_IMPLICIT (c) && (bitmap_bit_p (&map_head, DECL_UID (t)) @@ -14802,6 +14814,15 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) bias) to zero here, so it is not set erroneously to the pointer size later on in gimplify.c. */ OMP_CLAUSE_SIZE (c) = size_zero_node; + indir_component_ref_p = false; + if ((ort == C_ORT_ACC || ort == C_ORT_OMP) Same comment about ort tests. 
+ && TREE_CODE (t) == COMPONENT_REF + && TREE_CODE (TREE_OPERAND (t, 0)) == MEM_REF) + { + t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); + indir_component_ref_p = true; + STRIP_NOPS (t); + } Again, this can handle only a single -> @@ -42330,16 +42328,10 @@ cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok, cclauses[C_OMP_CLAUSE_SPLIT_TARGET] = tc; } } - tree stmt = make_node (OMP_TARGET); - TREE_TYPE (stmt) = void_type_node; - OMP_TARGET_CLAUSES (stmt) = cclauses[C_OMP_CLAUSE_SPLIT_TARGET]; - c_omp_adjust_map_clauses (OMP_TARGET_CLAUSES (stmt), true); - OMP_TARGET_BODY (stmt) = body; - OMP_TARGET_COMBINED (stmt) = 1; - SET_EXPR_LOCATION (stmt, pragma_tok->location); - add_stmt (stmt); - pc = &OMP_TARGET_CLAUSES (stmt); - goto check_clauses; + c_omp_adjust_map_clauses (cclauses[C_OMP_C
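The review comment in this message asks for "a single while loop that looks through COMPONENT_REFs and MEM_REFs as they appear", so that expressions such as s.a->b->c are handled and not only a single leading ->. A minimal sketch of that idea follows, under stated assumptions: `struct node`, `enum node_code` and `strip_to_base` are hypothetical toy types, not GCC's tree representation, and the STRIP_NOPS step of the real code is left out.

```c
#include <stddef.h>

/* Toy expression nodes (hypothetical; GCC uses 'tree' and TREE_CODE).  */
enum node_code { NODE_DECL, NODE_COMPONENT_REF, NODE_MEM_REF };

struct node
{
  enum node_code code;
  struct node *operand0;   /* the base expression; NULL for a decl */
};

/* Walk from an access expression down to its base, looking through
   '.' (component reference) and '->' (memory reference followed by a
   component reference) in whatever order they appear.  */
struct node *
strip_to_base (struct node *t)
{
  while (t->code == NODE_COMPONENT_REF || t->code == NODE_MEM_REF)
    t = t->operand0;
  return t;
}
```

Because both node kinds are tested in the same loop condition, the walk does not care how many dereferences occur or where they sit relative to the member accesses.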
[PATCH, OpenMP 5.0] Improve OpenMP target support for C++ (includes PR92120 v3)
Hi Jakub,

the attached patch is a combination of the below patches already pushed to devel/omp/gcc-10, some are kind of transient bug fixes, but listing all for completeness:

aadfc984: [PATCH] Target mapping C++ members inside member functions
  https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562467.html
36a1ebdb: [PATCH] OpenMP 5.0: map this[:1] in C++ non-static member functions (PR 92120)
  https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558975.html
bf8605f1: [PATCH] Enable gimplify GOMP_MAP_STRUCT handling of (COMPONENT_REF (INDIRECT_REF ...)) map clauses.
  https://gcc.gnu.org/pipermail/gcc-patches/2021-February/564976.html
da047f63: [PATCH] Fix regression of array members in OpenMP map clauses.
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566086.html
4e714eaa: [PATCH] Fix template case of non-static member access inside member functions
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566592.html
2ed80263: [PATCH] Lambda capturing of pointers and references in target directives
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566935.html
08caada8: Arrow operator handling for C front-end in OpenMP map clauses
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566419.html

To summarize, this patch set is an improvement for OpenMP target support for C++, including for inside non-static members, lambda objects, and struct member deref access expressions. The corresponding modifications for the C front-end are also included.

This patch supersedes the prior versions of my PR92120 patch (implicit C++ map(this[:1])), so dubbing this "v3" of patch for that PR. Prior versions of the PR92120 patch were implemented by recording uses of 'this' in the parser, and then using the recorded uses during "finish" to create the implicit maps.
When working on supporting lambda objects, this required using a tree-walk style processing of the OMP_TARGET body, so it only made sense to merge the entire 'this' processing together with it, so a large part of the parser changes were dropped, with the main processing in semantics.c now.

Other parser changes to support '->' in map clauses are also with this patch.

Tested without regressions on x86_64-linux with nvptx offloading, okay for trunk?

Thanks,
Chung-Lin

2021-05-20 Chung-Lin Tang

gcc/cp/
        * cp-tree.h (finish_omp_target): New declaration.
        (finish_omp_target_clauses): Likewise.
        * parser.c (cp_parser_omp_clause_map): Adjust call to
        cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true.
        (cp_parser_omp_target): Factor out code, adjust into calls to new
        function finish_omp_target.
        * pt.c (tsubst_expr): Add call to finish_omp_target_clauses for
        OMP_TARGET case.
        * semantics.c (handle_omp_array_sections_1): Add handling to create
        'this->member' from 'member' FIELD_DECL.
        (handle_omp_array_sections): Likewise.
        (finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP
        map clauses. Handle 'A->member' case in map clauses.
        (struct omp_target_walk_data): New struct for walking over
        target-directive tree body.
        (finish_omp_target_clauses_r): New function for tree walk.
        (finish_omp_target_clauses): New function.
        (finish_omp_target): New function.

gcc/c/
        * c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in
        call to c_parser_omp_variable_list to 'true'.
        * c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in
        array base handling.
        (c_finish_omp_clauses): Handle 'A->member' case in map clauses.

gcc/
        * gimplify.c ("tree-hash-traits.h"): Add include.
        (gimplify_scan_omp_clauses): Change struct_map_to_clause to type
        hash_map *. Adjust struct map handling to handle cases of *A and A->B
        expressions. Under !DECL_P case of GOMP_CLAUSE_MAP handling, add
        STRIP_NOPS for indir_p case, add to struct_deref_set for
        map(*ptr_to_struct) cases.
Add MEM_REF case when handling component_ref_p case. Add unshare_expr and gimplification when created GOMP_MAP_STRUCT is not a DECL. Add code to add firstprivate pointer for *pointer-to-struct case. (gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for exit data directives code to earlier position. * omp-low.c (lower_omp_target): Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds. * tree-pretty-print.c (dump_omp_clause): Likewise. gcc/testsuite/ * gcc.dg/gomp/target-3.c: New testcase. * g++.dg/gomp/target-3.C: New testcase. * g++.dg/gomp/target-lambda-1.C: New testcase. * g++.d
[PATCH, OpenMP 5.0] Remove array section base-pointer mapping semantics, and other front-end adjustments (mainline trunk)
Hi Jakub,

this is a version of this patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html for mainline trunk.

This patch largely implements three pieces of functionality:

(1) Per discussion and clarification on the omp-lang mailing list, standards conforming behavior for mapping array sections should *NOT* also map the base-pointer, i.e. for this code:

  struct S { int *ptr; ... };
  struct S s;
  #pragma omp target enter data map(to: s.ptr[:100])

Currently we generate after gimplify:

  #pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \
                                map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

which is deemed incorrect. After this patch, the gimplify results are now adjusted to:

  #pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

(the attach operation is still generated, and if s.ptr is already mapped prior, attachment will happen)

The correct way of achieving the base-pointer-also-mapped behavior would be to use:

  #pragma omp target enter data map(to: s.ptr, s.ptr[:100])

This adjustment in behavior required a number of small adjustments here and there in gimplify, including to accommodate map sequences for C++ references.

There is also a small Fortran front-end patch involved (hence CCing Tobias and fortran@). The new gimplify processing changed behavior in handling GOMP_MAP_ALWAYS_POINTER maps such that the libgomp.fortran/struct-elem-map-1.f90 regressed. It appeared that the Fortran FE was generating a GOMP_MAP_ALWAYS_POINTER for array types, which didn't seem quite correct, and the pre-patch behavior was removing this map anyways. I have a small change in trans-openmp.c:gfc_trans_omp_array_section to not generate the map in this case, and so far no bad test results.

(2) The second part (though kind of related to the first above) is a set of fixes in libgomp/target.c to not overwrite attached pointers when handling device<->host copies, mainly for the "always" case.
This behavior is also noted in the 5.0 spec, but was not properly implemented before.

(3) The third is a set of changes to the C/C++ front-ends to extend the allowed component access syntax in map clauses. This is actually mainly an effort to allow SPEC HPC to compile, so although in the long term the entire map clause syntax parsing is probably going to be revamped, we're still adding this in for now. These changes are enabled for both OpenACC and OpenMP.

Tested on x86_64-linux with nvptx offloading with no regressions. This patch was merged and tested atop of the prior submitted patches:

(a) https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570886.html
    "[PATCH, OpenMP 5.0] Improve OpenMP target support for C++ (includes PR92120 v3)"
(b) https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570365.html
    "[PATCH, OpenMP 5.0] Implement relaxation of implicit map vs. existing device mappings (for mainline trunk)"

so you might want to queue this one after those for review.

Thanks,
Chung-Lin

2021-05-25 Chung-Lin Tang

gcc/c/ChangeLog:
        * c-parser.c (struct omp_dim): New struct type for use inside
        c_parser_omp_variable_list.
        (c_parser_omp_variable_list): Allow multiple levels of array and
        component accesses in array section base-pointer expression.
        (c_parser_omp_clause_to): Set 'allow_deref' to true in call to
        c_parser_omp_var_list_parens.
        (c_parser_omp_clause_from): Likewise.
        * c-typeck.c (handle_omp_array_sections_1): Extend allowed range of
        base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
        POINTER_PLUS_EXPR.
        (c_finish_omp_clauses): Extend allowed ranged of expressions
        involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/cp/ChangeLog:
        * parser.c (struct omp_dim): New struct type for use inside
        cp_parser_omp_var_list_no_open.
        (cp_parser_omp_var_list_no_open): Allow multiple levels of array and
        component accesses in array section base-pointer expression.
        (cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to
        cp_parser_omp_var_list for to/from clauses.
* semantics.c (handle_omp_array_sections_1): Extend allowed range of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. (handle_omp_array_sections): Adjust pointer map generation of references. (finish_omp_clauses): Extend allowed ranged of expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. gcc/fortran/ChangeLog: * trans-openmp.c (gfc_trans_omp_array_section): Do not generate GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type. gcc/ChangeLog: * gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter, accomodate case where 'offset' return of get_inner_r
[PATCH, v3, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0
Hi Jakub, this is a v3 version of my OpenMP 5.0 structure element mapping patch, v2 was here: https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561139.html This v3 adds a small bug fix, where the initialization of the refcount didn't handle all cases, fixed by using gomp_refcount_increment here (more consistent). I know you had performance concerns in the last round, compared with your sorting approach. I'll try to research on that later. Getting the v3 patch posted before backporting to devel/omp/gcc-11. Thanks, Chung-Lin libgomp/ * hashtab.h (htab_clear): New function with initialization code factored out from... (htab_create): ...here, adjust to use htab_clear function. * libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of special refcount values, add comments. (REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL. (REFCOUNT_LINK): Likewise. (REFCOUNT_STRUCTELEM): New special refcount range for structure element siblings. (REFCOUNT_STRUCTELEM_P): Macro for testing for structure element sibling maps. (REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling. (REFCOUNT_STRUCTELEM_FLAG_LAST): Flag to indicate last sibling. (REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag. (REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag. (struct splay_tree_key_s): Add structelem_refcount and structelem_refcount_ptr fields into a union with dynamic_refcount. Add comments. (gomp_map_vars): Delete declaration. (gomp_map_vars_async): Likewise. (gomp_unmap_vars): Likewise. (gomp_unmap_vars_async): Likewise. (goacc_map_vars): New declaration. (goacc_unmap_vars): Likewise. * oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars. (goacc_enter_datum): Likewise. (goacc_enter_data_internal): Likewise. * oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars and goacc_unmap_vars. (GOACC_data_start): Adjust to use goacc_map_vars. (GOACC_data_end): Adjust to use goacc_unmap_vars. * target.c (hash_entry_type): New typedef. 
(htab_alloc): New function hook for hashtab.h. (htab_free): Likewise. (htab_hash): Likewise. (htab_eq): Likewise. (hashtab.h): Add file include. (gomp_increment_refcount): New function. (gomp_decrement_refcount): Likewise. (gomp_map_vars_existing): Add refcount_set parameter, adjust to use gomp_increment_refcount. (gomp_map_fields_existing): Add refcount_set parameter, adjust calls to gomp_map_vars_existing. (gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p variable to guard OpenMP specific paths, adjust calls to gomp_map_vars_existing, add structure element sibling splay_tree_key sequence creation code, adjust Fortran map case to avoid increment under OpenMP. (gomp_map_vars): Adjust to static, add refcount_set parameter, manage local refcount_set if caller passed in NULL, adjust call to gomp_map_vars_internal. (gomp_map_vars_async): Adjust and rename into... (goacc_map_vars): ...this new function, adjust call to gomp_map_vars_internal. (gomp_remove_splay_tree_key): New function with code factored out from gomp_remove_var_internal. (gomp_remove_var_internal): Add code to handle removing multiple splay_tree_key sequence for structure elements, adjust code to use gomp_remove_splay_tree_key for splay-tree key removal. (gomp_unmap_vars_internal): Add refcount_set parameter, adjust to use gomp_decrement_refcount. (gomp_unmap_vars): Adjust to static, add refcount_set parameter, manage local refcount_set if caller passed in NULL, adjust call to gomp_unmap_vars_internal. (gomp_unmap_vars_async): Adjust and rename into... (goacc_unmap_vars): ...this new function, adjust call to gomp_unmap_vars_internal. (GOMP_target): Manage refcount_set and adjust calls to gomp_map_vars and gomp_unmap_vars. (GOMP_target_ext): Likewise. (gomp_target_data_fallback): Adjust call to gomp_map_vars. (GOMP_target_data): Likewise. (GOMP_target_data_ext): Likewise. (GOMP_target_end_data): Adjust call to gomp_unmap_vars. 
(gomp_exit_data): Add refcount_set parameter, adjust to use gomp_decrement_refcount, adjust to queue splay-tree keys for removal after main loop. (GOMP_target_enter_exit_data): Manage refcount_set and adjust calls to gomp_map_vars and gomp_exit_data. (gomp_target_task_fn): Likewise. * testsuite/libgomp.c-c++-common/refcount-1.c: New testcase. * testsuite/libgomp.c-c++-common/struct-elem-1.c:
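The ChangeLog above threads a refcount_set parameter through the map and unmap paths and introduces gomp_increment_refcount. One plausible reading, stated here as an assumption and not taken from the patch text, is that the set records which reference counters have already been bumped for the current construct, so that a counter shared by several map clauses is incremented at most once per construct. The sketch below illustrates only that idea; `struct refcount_set`, `MAX_SEEN` and `increment_refcount` are hypothetical names, and a fixed array replaces the hash table that hashtab.h would provide.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SEEN 16   /* hypothetical fixed capacity, for the sketch only */

/* Records the counters already incremented for one construct.  */
struct refcount_set
{
  unsigned long *seen[MAX_SEEN];
  size_t count;
};

/* Increment *REFCOUNT unless it was already incremented through SET.
   A NULL SET means "no per-construct tracking, always increment".
   Returns false only when the set has no room left.  */
bool
increment_refcount (unsigned long *refcount, struct refcount_set *set)
{
  if (set)
    {
      for (size_t i = 0; i < set->count; i++)
        if (set->seen[i] == refcount)
          return true;          /* already counted for this construct */
      if (set->count == MAX_SEEN)
        return false;
      set->seen[set->count++] = refcount;
    }
  ++*refcount;
  return true;
}
```

Under this reading, a caller would create one empty set per construct, pass it to every increment made on behalf of that construct, and discard it afterwards; a second construct starts with a fresh set and may increment the same counter again.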
[PATCH, OpenMP 5.1, Fortran] Strictly-structured block support for OpenMP directives
Hi all,

this patch adds support for "strictly-structured blocks" introduced in OpenMP 5.1, basically allowing BLOCK constructs to serve as the body for directives:

  !$omp target
  block
    ...
  end block
  [!$omp end target]  !! end directive is optional

  !$omp parallel
  block
    ...
  end block
  ...
  !$omp end parallel  !! error, considered as not match to above parallel directive

The parsing loop in parse_omp_structured_block() has been modified to allow a BLOCK construct after the first statement has been detected to be ST_BLOCK. This is done by a hard modification of the state into (the new) COMP_OMP_STRICTLY_STRUCTURED_BLOCK after the statement is known (I'm not sure if there's a way to 'peek' the next statement/token in the Fortran FE, open to suggestions on how to better write this).

Tested with no regressions on trunk, is this okay to commit?

Thanks,
Chung-Lin

2021-10-07 Chung-Lin Tang

gcc/fortran/ChangeLog:
        * decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case
        together with COMP_BLOCK.
        * parse.c (parse_omp_structured_block): Adjust declaration, add
        'bool strictly_structured_block' default true parameter, add handling
        for strictly-structured block case, adjust recursive calls to
        parse_omp_structured_block.
        (parse_executable): Adjust calls to parse_omp_structured_block.
        * parse.h (enum gfc_compile_state): Add
        COMP_OMP_STRICTLY_STRUCTURED_BLOCK.
        * trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case
        handling.

gcc/testsuite/ChangeLog:
        * gfortran.dg/gomp/strictly-structured-block-1.f90: New test.
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index b3c65b7175b..ff66d1f9475 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -8445,6 +8445,7 @@ gfc_match_end (gfc_statement *st) break; case COMP_BLOCK: +case COMP_OMP_STRICTLY_STRUCTURED_BLOCK: *st = ST_END_BLOCK; target = " block"; eos_ok = 0; diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c index 7d765a0866d..d78bf9b8fa5 100644 --- a/gcc/fortran/parse.c +++ b/gcc/fortran/parse.c @@ -5451,8 +5451,9 @@ parse_oacc_loop (gfc_statement acc_st) /* Parse the statements of an OpenMP structured block. */ -static void -parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) +static gfc_statement +parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only, + bool strictly_structured_block = true) { gfc_statement st, omp_end_st; gfc_code *cp, *np; @@ -5538,6 +5539,32 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) gcc_unreachable (); } + bool block_construct = false; + gfc_namespace* my_ns = NULL; + gfc_namespace* my_parent = NULL; + + st = next_statement (); + + if (strictly_structured_block && st == ST_BLOCK) +{ + /* Adjust state to a strictly-structured block, now that we found that +the body starts with a BLOCK construct. */ + s.state = COMP_OMP_STRICTLY_STRUCTURED_BLOCK; + + block_construct = true; + gfc_notify_std (GFC_STD_F2008, "BLOCK construct at %C"); + + my_ns = gfc_build_block_ns (gfc_current_ns); + gfc_current_ns = my_ns; + my_parent = my_ns->parent; + + new_st.op = EXEC_BLOCK; + new_st.ext.block.ns = my_ns; + new_st.ext.block.assoc = NULL; + accept_statement (ST_BLOCK); + st = parse_spec (ST_NONE); +} + do { if (workshare_stmts_only) @@ -5554,7 +5581,6 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) restrictions apply recursively. 
*/ bool cycle = true; - st = next_statement (); for (;;) { switch (st) @@ -5576,17 +5602,20 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) parse_forall_block (); break; + case ST_OMP_PARALLEL_SECTIONS: + st = parse_omp_structured_block (st, false, false); + continue; + case ST_OMP_PARALLEL: case ST_OMP_PARALLEL_MASKED: case ST_OMP_PARALLEL_MASTER: - case ST_OMP_PARALLEL_SECTIONS: - parse_omp_structured_block (st, false); - break; + st = parse_omp_structured_block (st, false); + continue; case ST_OMP_PARALLEL_WORKSHARE: case ST_OMP_CRITICAL: - parse_omp_structured_block (st, true); - break; + st = parse_omp_structured_block (st, true); + continue; case ST_OMP_PARALLEL_DO: case ST_OMP_PARALLEL_DO_SIMD: @@ -5609,7 +5638,7 @@ parse_omp_structured_block (gfc_statement omp_st, boo
Re: [PATCH, OpenMP 5.1, Fortran] Strictly-structured block support for OpenMP directives
On 2021/10/14 7:19 PM, Jakub Jelinek wrote: On Thu, Oct 14, 2021 at 12:20:51PM +0200, Jakub Jelinek via Gcc-patches wrote: Thinking more about the Fortran case for !$omp sections, there is an ambiguity. !$omp sections block !$omp section end block is clear and !$omp end sections is optional, but !$omp sections block end block is ambiguous during parsing, it could be either followed by !$omp section and then the BLOCK would be first section, or by !$omp end sections and then it would be clearly the whole sections, with first section being empty inside of the block, or if it is followed by something else, it is ambiguous whether the block ... end block is part of the first section, followed by something and then we should be looking later for either !$omp section or !$omp end section to prove that, or if !$omp sections block end block was the whole sections construct and we shouldn't await anything further. I'm afraid back to the drawing board. And I have to correct myself, there is no ambiguity in 5.2 here, the important fact is hidden in sections/parallel sections being block-associated constructs. That means the body of the whole construct has to be a structured-block, and by the 5.1+ definition of Fortran structured block, it is either block ... end block or something that doesn't start with block. So, !$omp sections block end block a = 1 is only ambiguous in whether it is actually !$omp sections block !$omp section end block a = 1 or !$omp sections !$omp section block end block !$omp end sections a = 1 but both actually do the same thing, work roughly as !$omp single. If one wants block statement as first in structured-block-sequence of the first section, followed by either some further statements or by other sections, then one needs to write !$omp sections !$omp section block end block a = 1 ... !$omp end sections or !$omp sections block block end block a = 1 ... 
end block

Your patch probably already handles it that way, but we again need testsuite coverage to prove it is handled the way it should in all these cases (and that we diagnose what is invalid).

The patch currently does not allow strictly-structured BLOCK for sections/parallel sections, since I was referencing the 5.1 spec while writing it, although that is trivially fixable. (It did seem a bit odd that those two constructs had to be specially treated in 5.1 anyway.)

The bigger issue is that under the current way the patch is written, the statements inside a [parallel] sections construct are parsed automatically by parse_executable(), so enforcing the specified meaning of "structured-block-sequence" (i.e. BLOCK or non-BLOCK starting sequence of stmts) will probably be a bit harder to implement:

  !$omp sections
  block
    !$omp section
    block
      x=0
    end block
    x=1    !! This is allowed now, though should be wrong spec-wise
    !$omp section
    x=2
  end block

Currently "!$omp section" acts essentially as a top-level separator within a sections-construct, rather than a structured directive. Though I would kind of argue this is actually better to use for the user (why prohibit what looks like very apparent meaning of the program?)

So Jakub, my question for this is, is this current state okay? Or must we implement the spec pedantically?

As for the other issues:

(1) BLOCK/END BLOCK is not generally handled in parse_omp_structured_block, so for workshare, it is only handled for the top-level construct, not within workshare. I think this is what you meant in the last mail.

(2) As for the dangling-!$omp_end issue Tobias raised, because we are basically using 1-statement lookahead, any "!$omp end <*>" is naturally bound with the adjacent BLOCK/END BLOCK, so we should be okay there.

Thanks,
Chung-Lin
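The one-statement lookahead mentioned in point (2) can be pictured with a toy model. This is only a sketch under stated assumptions: statements are reduced to a hypothetical `enum stmt`, `construct_length` is not a gfortran function, and nested directives inside the body are not modelled. It shows how a body that starts with BLOCK ends at the matching END BLOCK, with an immediately following end directive treated as belonging to the same construct, while a body that does not start with BLOCK runs up to the end directive.

```c
#include <stddef.h>

/* Hypothetical statement kinds for the sketch.  */
enum stmt { S_OTHER, S_BLOCK, S_END_BLOCK, S_OMP_END };

/* Given the N statements that follow a directive, return how many of
   them belong to the construct (its body plus the end directive, when
   one is consumed).  */
size_t
construct_length (const enum stmt *s, size_t n)
{
  size_t i = 0;

  if (n > 0 && s[0] == S_BLOCK)
    {
      /* Strictly structured: the body is exactly BLOCK ... END BLOCK.  */
      size_t depth = 0;
      for (; i < n; i++)
        {
          if (s[i] == S_BLOCK)
            depth++;
          else if (s[i] == S_END_BLOCK && --depth == 0)
            {
              i++;              /* step past the matching END BLOCK */
              break;
            }
        }
      /* One statement of lookahead: an end directive directly after
         END BLOCK is bound to this construct; anything else is not.  */
      if (i < n && s[i] == S_OMP_END)
        i++;
      return i;
    }

  /* Loosely structured: everything up to and including the end directive.  */
  while (i < n && s[i] != S_OMP_END)
    i++;
  return i < n ? i + 1 : n;
}
```

In this model an end directive that appears after some other statement following END BLOCK is left unconsumed, which corresponds to the error case in the original submission where a late "!$omp end parallel" no longer matches the directive above it.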
[PATCH, v2, OpenMP, Fortran] Support in_reduction for Fortran
t have any more evidence this is needed, so removed now. --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/target-in-reduction-1.f90 @@ -0,0 +1,33 @@ +! { dg-do run } + +subroutine foo (x, y) ... + if (x .ne. 11) stop 1 + if (y .ne. 21) stop 2 + +end program main Again, something that can be dealt incrementally, but the testsuite coverage of https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573600.html was larger than this. Would be nice e.g. to cover both scalar vars and array sections/arrays, parameters passed by reference as in the above testcase, but also something that isn't a reference (either a local variable or dummy parameter with VALUE, etc. Jakub I have expanded target-in-reduction-1.f90 to cover local variables and VALUE passed parameters. Array sections in reductions appear to be still not supported by the Fortran FE in general (Tobias plans to work on that later). I also added another target-in-reduction-2.f90 testcase that tests the "orphaned" case in Fortran, where the task/target-in_reduction is in another separate subroutine. Tested without regressions on trunk, is this okay to commit? Thanks, Chung-Lin 2021-10-19 Chung-Lin Tang gcc/fortran/ChangeLog: * openmp.c (gfc_match_omp_clause_reduction): Add 'openmp_target' default false parameter. Add 'always,tofrom' map for OMP_LIST_IN_REDUCTION case. (gfc_match_omp_clauses): Add 'openmp_target' default false parameter, adjust call to gfc_match_omp_clause_reduction. (match_omp): Adjust call to gfc_match_omp_clauses * trans-openmp.c (gfc_trans_omp_taskgroup): Add call to gfc_match_omp_clause, create and return block. gcc/ChangeLog: * omp-low.c (omp_copy_decl_2): For !ctx, use record_vars to add new copy as local variable. (scan_sharing_clauses): Place copy of OMP_CLAUSE_IN_REDUCTION decl in ctx->outer instead of ctx. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/reduction4.f90: Adjust omp target in_reduction' scan pattern. 
libgomp/ChangeLog: * testsuite/libgomp.fortran/target-in-reduction-1.f90: New test. * testsuite/libgomp.fortran/target-in-reduction-2.f90: New test.diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index 6a4ca2868f8..210fb06dbec 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -1138,7 +1138,7 @@ failed: static match gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses *c, bool openacc, - bool allow_derived) + bool allow_derived, bool openmp_target = false) { if (pc == 'r' && gfc_match ("reduction ( ") != MATCH_YES) return MATCH_NO; @@ -1285,6 +1285,19 @@ gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses *c, bool openacc, n->u2.udr = gfc_get_omp_namelist_udr (); n->u2.udr->udr = udr; } + if (openmp_target && list_idx == OMP_LIST_IN_REDUCTION) + { + gfc_omp_namelist *p = gfc_get_omp_namelist (), **tl; + p->sym = n->sym; + p->where = p->where; + p->u.map_op = OMP_MAP_ALWAYS_TOFROM; + + tl = &c->lists[OMP_LIST_MAP]; + while (*tl) + tl = &((*tl)->next); + *tl = p; + p->next = NULL; + } } return MATCH_YES; } @@ -1353,7 +1366,7 @@ gfc_match_dupl_atomic (bool not_dupl, const char *name) static match gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask, bool first = true, bool needs_space = true, - bool openacc = false) + bool openacc = false, bool openmp_target = false) { bool error = false; gfc_omp_clauses *c = gfc_get_omp_clauses (); @@ -2057,8 +2070,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask, goto error; } if ((mask & OMP_CLAUSE_IN_REDUCTION) - && gfc_match_omp_clause_reduction (pc, c, openacc, -allow_derived) == MATCH_YES) + && gfc_match_omp_clause_reduction (pc, c, openacc, allow_derived, +openmp_target) == MATCH_YES) continue; if ((mask & OMP_CLAUSE_INBRANCH) && (m = gfc_match_dupl_check (!c->inbranch && !c->notinbranch, @@ -3512,7 +3525,8 @@ static match match_omp (gfc_exec_op op, const omp_mask mask) { gfc_omp_clauses *c; - if (gfc_match_omp_clauses (&c, mask) != MATCH_YES) + if 
(gfc_match_omp_clauses (&c, mask, true, true, false, +op == EXEC_OMP_TARGET) != MATCH_YES) return MATCH_ERROR; new_st.op = op; new_st.ext.omp_clauses = c; diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-ope
[PATCH, v2, OpenMP 5.2, Fortran] Strictly-structured block support for OpenMP directives
Hi Jakub, this version adjusts the patch to let sections/parallel sections also use strictly-structured blocks, making it more towards 5.2. Because of this change, some of the testcases using the sections-construct need a bit of adjustment too, since "block; end block" at the start of the construct now means something different than before. There are now three new testcases, with the non-dg-error/dg-error cases separated, and a third testcase containing a few cases listed in prior emails. I hope this is enough. The implementation status entry in libgomp/libgomp.texi for strictly-structured blocks has also been changed to "Y" in this patch. Tested without regressions, is this now okay for trunk? Thanks, Chung-Lin 2021-10-20 Chung-Lin Tang gcc/fortran/ChangeLog: * decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case together with COMP_BLOCK. * parse.c (parse_omp_structured_block): Change return type to 'gfc_statement', add handling for strictly-structured block case, adjust recursive calls to parse_omp_structured_block. (parse_executable): Adjust calls to parse_omp_structured_block. * parse.h (enum gfc_compile_state): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK. * trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case handling. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/cancel-1.f90: Adjust testcase. * gfortran.dg/gomp/nesting-3.f90: Adjust testcase. * gfortran.dg/gomp/strictly-structured-block-1.f90: New test. * gfortran.dg/gomp/strictly-structured-block-2.f90: New test. * gfortran.dg/gomp/strictly-structured-block-3.f90: New test. libgomp/ChangeLog: * libgomp.texi (Support of strictly structured blocks in Fortran): Adjust to 'Y'. * testsuite/libgomp.fortran/task-reduction-16.f90: Adjust testcase. 
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index d6a22d13451..66489da12be 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -8449,6 +8449,7 @@ gfc_match_end (gfc_statement *st) break; case COMP_BLOCK: +case COMP_OMP_STRICTLY_STRUCTURED_BLOCK: *st = ST_END_BLOCK; target = " block"; eos_ok = 0; diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c index 7d765a0866d..2fb98844356 100644 --- a/gcc/fortran/parse.c +++ b/gcc/fortran/parse.c @@ -5451,7 +5451,7 @@ parse_oacc_loop (gfc_statement acc_st) /* Parse the statements of an OpenMP structured block. */ -static void +static gfc_statement parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) { gfc_statement st, omp_end_st; @@ -5538,6 +5538,32 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) gcc_unreachable (); } + bool block_construct = false; + gfc_namespace *my_ns = NULL; + gfc_namespace *my_parent = NULL; + + st = next_statement (); + + if (st == ST_BLOCK) +{ + /* Adjust state to a strictly-structured block, now that we found that +the body starts with a BLOCK construct. */ + s.state = COMP_OMP_STRICTLY_STRUCTURED_BLOCK; + + block_construct = true; + gfc_notify_std (GFC_STD_F2008, "BLOCK construct at %C"); + + my_ns = gfc_build_block_ns (gfc_current_ns); + gfc_current_ns = my_ns; + my_parent = my_ns->parent; + + new_st.op = EXEC_BLOCK; + new_st.ext.block.ns = my_ns; + new_st.ext.block.assoc = NULL; + accept_statement (ST_BLOCK); + st = parse_spec (ST_NONE); +} + do { if (workshare_stmts_only) @@ -5554,7 +5580,6 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) restrictions apply recursively. 
*/ bool cycle = true; - st = next_statement (); for (;;) { switch (st) @@ -5580,13 +5605,13 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) case ST_OMP_PARALLEL_MASKED: case ST_OMP_PARALLEL_MASTER: case ST_OMP_PARALLEL_SECTIONS: - parse_omp_structured_block (st, false); - break; + st = parse_omp_structured_block (st, false); + continue; case ST_OMP_PARALLEL_WORKSHARE: case ST_OMP_CRITICAL: - parse_omp_structured_block (st, true); - break; + st = parse_omp_structured_block (st, true); + continue; case ST_OMP_PARALLEL_DO: case ST_OMP_PARALLEL_DO_SIMD: @@ -5609,7 +5634,7 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) } } else - st = parse_executable (ST_NONE); + st = parse_executable (st); if (st == ST_NONE) unexpected_eof (); else if (st == S
Re: [PATCH, v2, OpenMP 5.2, Fortran] Strictly-structured block support for OpenMP directives
On 2021/10/21 12:15 AM, Jakub Jelinek wrote: +program main + integer :: x, i, n + + !$omp parallel + block +x = x + 1 + end block I'd prefer not to use those x = j or x = x + 1 etc. as statements that do random work here whenever possible. While those are dg-do compile testcases, especially if it is without dg-errors I think it is preferrable not to show bad coding examples. E.g. the x = x + 1 above is wrong for 2 reasons, x is uninitialized before the parallel, and there is a data race, the threads, teams etc. can write to x concurrently. I think better would be to use something like call do_work which doesn't have to be defined anywhere and will just stand there as a black box for unspecified work. + !$omp workshare + block +x = x + 1 + end block There are exceptions though, e.g. workshare is such a case, because e.g. call do_work is not valid in workshare. So, it is ok to keep using x = x + 1 here if you initialize it first at the start of the program. + !$omp workshare + block +x = 1 +!$omp critical +block + x = 3 +end block + end block And then there are cases like the above, please just use different variables there (all initialized) or say an array and access different elements in the different spots. Jakub Thanks, attached is what I finally committed. Chung-Lin From 2e4659199e814b7ee0f6bd925fd2c0a7610da856 Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang Date: Thu, 21 Oct 2021 14:56:20 +0800 Subject: [PATCH] openmp: Fortran strictly-structured blocks support This implements strictly-structured blocks support for Fortran, as specified in OpenMP 5.2. This now allows using a Fortran BLOCK construct as the body of most OpenMP constructs, with a "!$omp end ..." ending directive optional for that form. gcc/fortran/ChangeLog: * decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case together with COMP_BLOCK. 
* parse.c (parse_omp_structured_block): Change return type to 'gfc_statement', add handling for strictly-structured block case, adjust recursive calls to parse_omp_structured_block. (parse_executable): Adjust calls to parse_omp_structured_block. * parse.h (enum gfc_compile_state): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK. * trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case handling. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/cancel-1.f90: Adjust testcase. * gfortran.dg/gomp/nesting-3.f90: Adjust testcase. * gfortran.dg/gomp/strictly-structured-block-1.f90: New test. * gfortran.dg/gomp/strictly-structured-block-2.f90: New test. * gfortran.dg/gomp/strictly-structured-block-3.f90: New test. libgomp/ChangeLog: * libgomp.texi (Support of strictly structured blocks in Fortran): Adjust to 'Y'. * testsuite/libgomp.fortran/task-reduction-16.f90: Adjust testcase. --- gcc/fortran/decl.c| 1 + gcc/fortran/parse.c | 69 +- gcc/fortran/parse.h | 2 +- gcc/fortran/trans-openmp.c| 6 +- gcc/testsuite/gfortran.dg/gomp/cancel-1.f90 | 3 + gcc/testsuite/gfortran.dg/gomp/nesting-3.f90 | 20 +- .../gomp/strictly-structured-block-1.f90 | 214 ++ .../gomp/strictly-structured-block-2.f90 | 139 .../gomp/strictly-structured-block-3.f90 | 52 + libgomp/libgomp.texi | 2 +- .../libgomp.fortran/task-reduction-16.f90 | 1 + 11 files changed, 484 insertions(+), 25 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-3.f90 diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index 6784b07ae9e..6043e100fbb 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -8429,6 +8429,7 @@ gfc_match_end (gfc_statement *st) break; case COMP_BLOCK: +case COMP_OMP_STRICTLY_STRUCTURED_BLOCK: *st = ST_END_BLOCK; target = " block"; eos_ok = 0; diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c index 
2a454be79b0..b1e73ee6801 100644 --- a/gcc/fortran/parse.c +++ b/gcc/fortran/parse.c @@ -5459,7 +5459,7 @@ parse_oacc_loop (gfc_statement acc_st) /* Parse the statements of an OpenMP structured block. */ -static void +static gfc_statement parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) { gfc_statement st, omp_end_st; @@ -5546,6 +5546,32 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) gcc_unreachable (); } + bool block_construct = false; + gfc_namespace *my_ns = NULL; + gfc_namespace *my_parent = NULL; + + st = next_statement ()
[PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num
Hi all, this patch implements the omp_get_device_num API function, which appears to be a missing piece in the library routines implementation. The host-side implementation is simple: by specification it is equivalent to omp_get_initial_device. Inside offloaded regions, the preferred way should be that the device already has this information initialized (once) when the device is initialized, and the function merely returns the stored value. This implementation adds a convention for an additional entry (dubbed under 'others' in the code) returned by the 'load_image' plugin hook. Basically we define a variable name in libgomp-plugin.h, which the device libgomp defines, and the offload plugin searches for, and returns the variable device location start/end for gomp_load_image_to_device to initialize. The device-side omp_get_device_num then just returns that value. This patch implements this for the gcn and nvptx offload targets. The icv-device.c file is starting to look like a file ready to consolidate away the target specific versions, but that's for later. Basic libgomp tests were added for C/C++ and Fortran. Tested without regressions with offloading for amdgcn and nvptx on x86_64-linux host. Okay for trunk? Thanks, Chung-Lin 2021-07-23 Chung-Lin Tang libgomp/ChangeLog * icv-device.c (omp_get_device_num): New API function, host side. * fortran.c (omp_get_device_num_): New interface function. * libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol. * libgomp.map (OMP_5.0.1): Add omp_get_device_num, omp_get_device_num_. * libgomp.texi (omp_get_device_num): Add documentation for new API function. * omp.h.in (omp_get_device_num): Add declaration. * omp_lib.f90.in (omp_get_device_num): Likewise. * omp_lib.h.in (omp_get_device_num): Likewise. * target.c (gomp_load_image_to_device): If additional entry for device number exists at end of returned entries from 'load_image_func' hook, copy the assigned device number over to the device variable. 
* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global. (omp_get_device_num): New API function, device side. * config/plugin/plugin-gcn.c ("symcat.h"): Add include. (GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR at end of returned 'target_table' entries. * config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global. (omp_get_device_num): New API function, device side. * config/plugin/plugin-nvptx.c ("symcat.h"): Add include. (GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR at end of returned 'target_table' entries. * testsuite/libgomp.c-c++-common/target-45.c: New test. * testsuite/libgomp.fortran/target10.f90: New test. diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c index 72d4f7cff74..8f72028a6c8 100644 --- a/libgomp/config/gcn/icv-device.c +++ b/libgomp/config/gcn/icv-device.c @@ -70,6 +70,16 @@ omp_is_initial_device (void) return 0; } +/* This is set to the device number of current GPU during device initialization, + when the offload image containing this libgomp portion is loaded. */ +static int GOMP_DEVICE_NUM_VAR; + +int +omp_get_device_num (void) +{ + return GOMP_DEVICE_NUM_VAR; +} + ialias (omp_set_default_device) ialias (omp_get_default_device) ialias (omp_get_initial_device) diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c index 3b96890f338..e586da1d3a8 100644 --- a/libgomp/config/nvptx/icv-device.c +++ b/libgomp/config/nvptx/icv-device.c @@ -58,8 +58,19 @@ omp_is_initial_device (void) return 0; } +/* This is set to the device number of current GPU during device initialization, + when the offload image containing this libgomp portion is loaded. 
*/ +static int GOMP_DEVICE_NUM_VAR; + +int +omp_get_device_num (void) +{ + return GOMP_DEVICE_NUM_VAR; +} + ialias (omp_set_default_device) ialias (omp_get_default_device) ialias (omp_get_initial_device) ialias (omp_get_num_devices) ialias (omp_is_initial_device) +ialias (omp_get_device_num) diff --git a/libgomp/fortran.c b/libgomp/fortran.c index 4ec39c4e61b..2360582e32e 100644 --- a/libgomp/fortran.c +++ b/libgomp/fortran.c @@ -598,6 +598,12 @@ omp_get_initial_device_ (void) return omp_get_initial_device (); } +int32_t +omp_get_device_num_ (void) +{ + return omp_get_device_num (); +} + int32_t omp_get_max_task_priority_ (void) { diff --git a/libgomp/icv-device.c b/libgomp/icv-device.c index c1bedf46647..f11bdfa85c4 100644 --- a/libgomp/icv-device.c +++ b/libgomp/icv-device.c @@ -61,8 +61,17 @@ omp_is_initial_device (void) return 1; } +int +omp_get_device_num (void) +{ + /* By specification, this is equivalent to omp_get_initial_devi
Re: [PATCH, OG10, OpenMP 5.0, committed] Implement relaxation of implicit map vs. existing device mappings
On 2021/5/7 8:35 PM, Thomas Schwinge wrote: On 2021-05-05T23:17:25+0800, Chung-Lin Tang via Gcc-patches wrote: This patch implements relaxing the requirements when a map with the implicit attribute encounters an overlapping existing map. [...] Oh, oh, these data mapping interfaces/semantics are getting more and more "convoluted"... %-\ (Not your fault, of course.) Haven't looked in too much detail in the patch/implementation (I'm not very well-versed in the exact OpenMP semantics anyway), but I suppose we should do similar things for OpenACC, too. I think we even currently do have a gimplification-level "hack" to replicate data clauses' array bounds for implicit data clauses on compute constructs, if the default "complete" mapping is going to clash with a "limited" mapping that's specified in an outer OpenACC 'data' directive. (That, of course, doesn't work for the general case of non-lexical scoping, or dynamic OpenACC 'enter data', etc., I suppose) I suppose your method could easily replace and improve that; we shall look into that later. That said, in your patch, is this current implementation (explicitly) meant or not meant to be active for OpenACC, too, or just OpenMP (I couldn't quickly tell), and/or is it (implicitly?) a no-op for OpenACC? It appears that I have inadvertently enabled it for OpenACC as well! But everything was tested together, so I assume it works okay for that mode as well. The entire set of implicit-specific actions is enabled by the setting of 'OMP_CLAUSE_MAP_IMPLICIT_P (clause) = 1' inside gimplify.c:gimplify_adjust_omp_clauses_1, so in case you want to disable it for OpenACC again, that's where you need to add the guard condition. Also, another adjustment in this patch is how implicitly created clauses are added to the current clause list in gimplify_adjust_omp_clauses(). 
Instead of simply appending the new clauses to the end, this patch adds them at the position "after initial non-map clauses, but right before any existing map clauses". Probably you haven't been testing such a configuration; I've just pushed "Fix up 'c-c++-common/goacc/firstprivate-mappings-1.c' for C, non-LP64" to devel/omp/gcc-10 branch in commit c51cc3b96f0b562deaffcfbcc51043aed216801a, see attached. Thanks, I was relying on eyeballing to know where to fix testcases like this; I did fix another similar case, but missed this one. The reason for this is: when combined with other map clauses, for example: #pragma omp target map(rec.ptr[:N]) for (int i = 0; i < N; i++) rec.ptr[i] += 1; There will be an implicit map created for map(rec), because of the access inside the target region. The expectation is that 'rec' is implicitly mapped, and then the pointed array-section part by 'rec.ptr' will be mapped, and then attachment to the 'rec.ptr' field of the mapped 'rec' (in that order). If the implicit 'map(rec)' is appended to the end, instead of placed before other maps, the attachment operation will not find anything to attach to, and the entire region will fail. But that doesn't (negatively) affect user-visible semantics (OpenMP, and also OpenACC, if applicable), in that more/bigger objects then get mapped than were before? (I suppose not?) It probably won't affect user level semantics, although we should look out if this change in convention exposes some other bugs. Chung-Lin
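The placement rule quoted above ("after initial non-map clauses, but right before any existing map clauses") can be sketched in plain C. This is only an illustration under assumed names: struct clause, its name/is_map fields, and insert_implicit_map are hypothetical and are not GCC's tree-based clause chain. The sketch only shows why an implicit map(rec) inserted this way ends up ahead of the explicit array-section map and its attach.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an OpenMP clause chain.  GCC itself links
   tree nodes through OMP_CLAUSE_CHAIN; this struct is an assumption
   made for illustration only.  */
struct clause
{
  const char *name;     /* Printable form of the clause.  */
  int is_map;           /* Nonzero for map clauses.  */
  struct clause *next;
};

/* Insert NEW_MAP after the initial run of non-map clauses, i.e. right
   before the first existing map clause, or at the end of the list if
   there is no map clause.  Returns the (possibly new) list head.  */
static struct clause *
insert_implicit_map (struct clause *head, struct clause *new_map)
{
  assert (new_map != NULL);

  struct clause **p = &head;
  while (*p != NULL && !(*p)->is_map)
    p = &(*p)->next;

  new_map->next = *p;
  *p = new_map;
  return head;
}
```

Starting from the chain nowait, map(rec.ptr[:N]), attach(rec.ptr), inserting an implicit map(rec) with this helper yields nowait, map(rec), map(rec.ptr[:N]), attach(rec.ptr), so the attach has a mapped 'rec' to attach to. A plain append would leave map(rec) last, which is the failing order described above.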
[PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments.
This patch largely implements three pieces of functionality: (1) Per discussion and clarification on the omp-lang mailing list, standards conforming behavior for mapping array sections should *NOT* also map the base-pointer, i.e. for this code: struct S { int *ptr; ... }; struct S s; #pragma omp target enter data map(to: s.ptr[:100]) Currently we generate after gimplify: #pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \ map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0]) which is deemed incorrect. After this patch, the gimplify results are now adjusted to: #pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0]) (the attach operation is still generated, and if s.ptr is already mapped prior, attachment will happen) The correct way of achieving the base-pointer-also-mapped behavior would be to use: #pragma omp target enter data map(to: s.ptr, s.ptr[:100]) This adjustment in behavior required a number of small adjustments here and there in gimplify, including to accommodate map sequences for C++ references. There is also a small Fortran front-end patch involved (hence CCing Tobias). The new gimplify processing changed behavior in handling GOMP_MAP_ALWAYS_POINTER maps such that the libgomp.fortran/struct-elem-map-1.f90 regressed. It appeared that the Fortran FE was generating a GOMP_MAP_ALWAYS_POINTER for array types, which didn't seem quite correct, and the pre-patch behavior was removing this map anyways. I have a small change in trans-openmp.c:gfc_trans_omp_array_section to not generate the map in this case, and so far no bad test results. (2) The second part (though kind of related to the first above) are fixes in libgomp/target.c to not overwrite attached pointers when handling device<->host copies, mainly for the "always" case. This behavior is also noted in the 5.0 spec, but not yet properly coded before. 
(3) The third is a set of changes to the C/C++ front-ends to extend the allowed component access syntax in map clauses. This is actually mainly an effort to allow SPEC HPC to compile, so despite in the long term the entire map clause syntax parsing is probably going to be revamped, we're still adding this in for now. These changes are enabled for both OpenACC and OpenMP. Tested on x86_64-linux with nvptx offloading with no regressions. Pushed to devel/omp/gcc-10, will send mainline version of patch later. Chung-Lin 2021-05-11 Chung-Lin Tang gcc/c/ChangeLog: * c-parser.c (struct omp_dim): New struct type for use inside c_parser_omp_variable_list. (c_parser_omp_variable_list): Allow multiple levels of array and component accesses in array section base-pointer expression. (c_parser_omp_clause_to): Set 'allow_deref' to true in call to c_parser_omp_var_list_parens. (c_parser_omp_clause_from): Likewise. * c-typeck.c (handle_omp_array_sections_1): Extend allowed range of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. (c_finish_omp_clauses): Extend allowed ranged of expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. gcc/cp/ChangeLog: * parser.c (struct omp_dim): New struct type for use inside cp_parser_omp_var_list_no_open. (cp_parser_omp_var_list_no_open): Allow multiple levels of array and component accesses in array section base-pointer expression. (cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to cp_parser_omp_var_list for to/from clauses. * semantics.c (handle_omp_array_sections_1): Extend allowed range of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. (handle_omp_array_sections): Adjust pointer map generation of references. (finish_omp_clauses): Extend allowed ranged of expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. 
gcc/fortran/ChangeLog: * trans-openmp.c (gfc_trans_omp_array_section): Do not generate GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type. gcc/ChangeLog: * gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter, accomodate case where 'offset' return of get_inner_reference is non-NULL. (is_or_contains_p): Further robustify conditions. (omp_target_reorder_clauses): In alloc/to/from sorting phase, also move following GOMP_MAP_ALWAYS_POINTER maps along. Add new sorting phase where we make sure pointers with an attach/detach map are ordered correctly. (gimplify_scan_omp_clauses): Add modifications to avoid creating GOMP_MAP_STRUCT and associated alloc map for attach/detach maps. gcc/testsuite/ChangeLog: * c-c++-common/goacc/deep-copy-arrayofstruct.c:
Re: [PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments.
On 2021/5/11 11:15 , Thomas Schwinge wrote: Hi Chung-Lin! On 2021-05-11T19:28:04+0800, Chung-Lin Tang wrote: This patch largely implements three pieces of functionality: (1) Per discussion and clarification on the omp-lang mailing list, standards conforming behavior for mapping array sections should *NOT* also map the base-pointer, i.e. for this code: struct S { int *ptr; ... }; struct S s; #pragma omp target enter data map(to: s.ptr[:100]) Currently we generate after gimplify: #pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \ map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0]) which is deemed incorrect. After this patch, the gimplify results are now adjusted to: #pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0]) (the attach operation is still generated, and if s.ptr is already mapped prior, attachment will happen) The correct way of achieving the base-pointer-also-mapped behavior would be to use: #pragma omp target enter data map(to: s.ptr, s.ptr[:100]) This adjustment in behavior required a number of small adjustments here and there in gimplify, including to accommodate map sequences for C++ references. I'm a bit confused by that -- this mandates the bulk of the testsuite changes that you've included, and these seem a step backwards in terms of user experience, but then, I have no state on the exact OpenMP specification requirements, so you certainly may be right on that. (And also, as Julian mentioned, how this relates to OpenACC semantics, which I also haven't considered in detail -- but I note you didn't adjust any OpenACC testcases for that, so I suppose that's really conditionalized to OpenMP only.) It is indeed a bit awkward to use, but that's what the omp-lang list seemed to decide. This change is OpenMP only. I took care to only handle OpenMP constructs like this in the middle-end, of course this does not preclude some mistake in adjusting the shared code paths... 
There is also a small Fortran front-end patch involved (hence CCing Tobias). The new gimplify processing changed behavior in handling GOMP_MAP_ALWAYS_POINTER maps such that the libgomp.fortran/struct-elem-map-1.f90 regressed. It appeared that the Fortran FE was generating a GOMP_MAP_ALWAYS_POINTER for array types, which didn't seem quite correct, and the pre-patch behavior was removing this map anyways. I have a small change in trans-openmp.c:gfc_trans_omp_array_section to not generate the map in this case, and so far no bad test results. Makes sense to argue that one separately, with testcases, for the master branch submission? Maybe. although this part was needed to solve a regression caused by the above changes. (2) The second part (though kind of related to the first above) are fixes in libgomp/target.c to not overwrite attached pointers when handling device<->host copies, mainly for the "always" case. This behavior is also noted in the 5.0 spec, but not yet properly coded before. Likewise, if that makes sense? Some of the separation of base-pointer/array-section in map clauses seemed to step on this bug (e.g. if one mechanically updates "s.ptr[:N]" into "s.ptr, s.ptr[:N]", and a target-update overwrites the base-pointer) So it's arguably separate, but also can cause some testsuite chaos if not included together. (3) The third is a set of changes to the C/C++ front-ends to extend the allowed component access syntax in map clauses. This is actually mainly an effort to allow SPEC HPC to compile, so despite in the long term the entire map clause syntax parsing is probably going to be revamped, we're still adding this in for now. These changes are enabled for both OpenACC and OpenMP. Likewise, if that makes sense? ;-) Yeah, this might be separated :P Tested on x86_64-linux with nvptx offloading with no regressions. 
I'm seeing a regression with 'libgomp.oacc-c-c++-common/noncontig_array-1.c' execution testing, both C and C++, for '-O2' (but not '-O0'), and only for about half of the invocations. But it seems to reliably reproduce in GDB: Thread 1 "a.out" received signal SIGSEGV, Segmentation fault. gomp_decrement_refcount (do_remove=, do_copy=, delete_p=false, refcount_set=0x0, k=0xc4d450) at [...]/source-gcc/libgomp/target.c:468 468 uintptr_t orig_refcount = *refcount_ptr; (gdb) bt #0 gomp_decrement_refcount (do_remove=, do_copy=, delete_p=false, refcount_set=0x0, k=0xc4d450) at [...]/source-gcc/libgomp/target.c:468 #1 gomp_unmap_vars_internal (aq=0x0, aq@entry=0x8223c0, refcount_set=0x0, do_copyfrom=, do_copyfrom@entry=true, tgt=tgt@entry=0xc696a0) at [...]/source-gcc/libgomp/target.c:2065 #2 goacc_unmap_vars (tgt=tgt@entry=0xc696a0, do_copyfrom=do_copyfrom@entry=true, aq=aq@entry=0x0) at
[PATCH, OpenMP 5.0] Implement relaxation of implicit map vs. existing device mappings (for mainline trunk)
Hi Jakub, This is a version of patch https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569665.html for mainline trunk. This patch implements relaxing the requirements when a map with the implicit attribute encounters an overlapping existing map. As the OpenMP 5.0 spec describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22): "If a single contiguous part of the original storage of a list item with an implicit data-mapping attribute has corresponding storage in the device data environment prior to a task encountering the construct that is associated with the map clause, only that part of the original storage will have corresponding storage in the device data environment as a result of the map clause." Also tracked in the OpenMP spec context as issue #1463: https://github.com/OpenMP/spec/issues/1463 The implementation inside the compiler is, of course, to tag the implicitly created maps with some indication of "implicit". I've done this with an OMP_CLAUSE_MAP_IMPLICIT_P macro, using 'base.deprecated_flag' underneath. There is an encoding of this as GOMP_MAP_IMPLICIT == GOMP_MAP_FLAG_SPECIAL_3|GOMP_MAP_FLAG_SPECIAL_4 in include/gomp-constants.h for the runtime, but I've intentionally avoided exploding the entire gimplify/omp-low with a new set of GOMP_MAP_IMPLICIT_TO/FROM/etc. symbols, instead adding in the new flag bits only at the final runtime call generation during omp-lowering. The rest is libgomp mapping taking care of the implicit case: allowing map success if an existing map is a proper subset of the new map, if the new map is implicit. Straightforward enough I think. There are also some additions to print the implicit attribute during tree pretty-printing, for that reason some scan tests were updated. Also, another adjustment in this patch is how implicitly created clauses are added to the current clause list in gimplify_adjust_omp_clauses(). 
Instead of simply appending the new clauses to the end, this patch adds them at the position "after initial non-map clauses, but right before any existing map clauses". The reason for this is: when combined with other map clauses, for example: #pragma omp target map(rec.ptr[:N]) for (int i = 0; i < N; i++) rec.ptr[i] += 1; There will be an implicit map created for map(rec), because of the access inside the target region. The expectation is that 'rec' is implicitly mapped, and then the pointed array-section part by 'rec.ptr' will be mapped, and then attachment to the 'rec.ptr' field of the mapped 'rec' (in that order). If the implicit 'map(rec)' is appended to the end, instead of placed before other maps, the attachment operation will not find anything to attach to, and the entire region will fail. Note: this touches a bit on another issue which I will be sending a patch for later: per the discussion on omp-lang, an array section list item should *not* be mapping its base-pointer (although an attachment attempt should exist), while in current GCC behavior, for struct member pointers like 'rec.ptr' above, we do map it (which should be deemed incorrect). This means that as of right now, this modification of map order doesn't really exhibit the above mentioned behavior yet. I have included it as part of this patch because the "[implicit]" tree printing requires modifying many gimple scan tests already, so including the test modifications together seems more manageable patch-wise. Tested with no regressions on x86_64-linux with nvptx offloading. Was already pushed to devel/omp/gcc-10 a while ago, asking for approval for mainline trunk. Chung-Lin 2021-05-14 Chung-Lin Tang include/ChangeLog: * gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define special bit macro. (GOMP_MAP_IMPLICIT): New special map kind bits value. (GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of special map kind bits. (GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds. 
gcc/ChangeLog: * tree.h (OMP_CLAUSE_MAP_IMPLICIT_P): New access macro for 'implicit' bit, using 'base.deprecated_flag' field of tree_node. * tree-pretty-print.c (dump_omp_clause): Add support for printing implicit attribute in tree dumping. * gimplify.c (gimplify_adjust_omp_clauses_1): Set OMP_CLAUSE_MAP_IMPLICIT_P to 1 if map clause is implicitly created. (gimplify_adjust_omp_clauses): Adjust place of adding implicitly created clauses, from simple append, to starting of list, after non-map clauses. * omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind values passed to libgomp for implicit maps. gcc/testsuite/ChangeLog: * c-c++-common/gomp/target-implicit-map-1.c: New test. * c-c++-common/goacc/combined-reduction.c: Adjust scan te
Re: [PATCH 5/5] Mapping of components of references to pointers to structs for OpenMP/OpenACC
Hi Julian, On 2021/5/15 5:27 AM, Julian Brown wrote: GCC currently raises a parse error for indirect accesses to struct members, where the base of the access is a reference to a pointer. This patch fixes that case. gcc/cp/ * semantics.c (finish_omp_clauses): Handle components of references to pointers to structs. libgomp/ * testsuite/libgomp.oacc-c++/deep-copy-17.C: Update test. --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -7670,7 +7670,12 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) if ((ort == C_ORT_ACC || ort == C_ORT_OMP) && TREE_CODE (t) == COMPONENT_REF && TREE_CODE (TREE_OPERAND (t, 0)) == INDIRECT_REF) - t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); + { + t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); + /* References to pointers have a double indirection here. */ + if (TREE_CODE (t) == INDIRECT_REF) + t = TREE_OPERAND (t, 0); + } if (TREE_CODE (t) == COMPONENT_REF && ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP || ort == C_ORT_ACC) There is already a large plethora of such modifications in this patch: "[PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments." https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html I am in the process of taking that patch to mainline, so are you sure this is not already handled there? diff --git a/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C index dacbb520f3d..e038e9e3802 100644 --- a/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C +++ b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C @@ -83,7 +83,7 @@ void strrp (void) a[0] = 8; c[0] = 10; e[0] = 12; - #pragma acc parallel copy(n->a[0:10], n->c[0:10], n->e[0:10]) + #pragma acc parallel copy(n->a[0:10], n->b, n->c[0:10], n->d, n->e[0:10]) { n->a[0] = n->c[0] + n->e[0]; } This testcase can be added. Chung-Lin
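The "double indirection" handled by the quoted finish_omp_clauses hunk can be illustrated with a small stand-alone C model. Everything here is an assumption for illustration: enum node_code, struct node, and component_base are hypothetical and only mimic the shape of the COMPONENT_REF / INDIRECT_REF operand walk; they are not GCC's tree API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature expression node; a stand-in for GCC trees.  */
enum node_code { NODE_VAR, NODE_INDIRECT_REF, NODE_COMPONENT_REF };

struct node
{
  enum node_code code;
  struct node *op0;     /* First operand, NULL for NODE_VAR.  */
};

/* For a component access through an indirection (p->field), return the
   expression underneath the indirection.  When the base is a reference
   to a pointer, a second indirection is present and is stripped too.
   Other expressions are returned unchanged.  */
static struct node *
component_base (struct node *t)
{
  assert (t != NULL);

  if (t->code == NODE_COMPONENT_REF
      && t->op0 != NULL
      && t->op0->code == NODE_INDIRECT_REF)
    {
      t = t->op0->op0;
      /* References to pointers have a double indirection here.  */
      if (t != NULL && t->code == NODE_INDIRECT_REF)
        t = t->op0;
    }
  return t;
}
```

In this model, both a plain pointer base (component, indirection, variable) and a reference-to-pointer base (component, indirection, indirection, variable) resolve to the same variable node, which is the effect the added inner check aims for.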
Re: [PATCH 7/7] [og10] WIP GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION changes
On 2021/5/11 4:57 PM, Julian Brown wrote:

This work-in-progress patch tries to get GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION to behave more like GOMP_MAP_ATTACH_DETACH -- in that the mapping is made to form groups to be processed by build_struct_group/build_struct_comp_map. I think that's important to integrate with how groups of mappings for array sections are handled in other cases. This patch isn't sufficient by itself to fix a couple of broken test cases at present (libgomp.c++/target-lambda-1.C, libgomp.c++/target-this-4.C), though.

No, GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION is supposed to be just a variant of GOMP_MAP_ATTACH with slightly different behavior; it tolerates an unmapped pointer-target and assigns NULL on the device, instead of just calling gomp_fatal() (see its handling in libgomp/target.c).

If OpenACC can adopt the same zero-length array section behavior, we can just share one GOMP_MAP_ATTACH map. For now they are treated as separate cases.

Chung-Lin

2021-05-11  Julian Brown

gcc/
        * gimplify.c (build_struct_comp_nodes): Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION handling.
        (build_struct_group): Process GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION as part of pointer group.
        (gimplify_scan_omp_clauses): Update prev_list_p such that GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION will form part of pointer group.
--- gcc/gimplify.c | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 6d204908c82..c5cb486aa23 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8298,7 +8298,9 @@ build_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end, if (grp_mid && OMP_CLAUSE_CODE (grp_mid) == OMP_CLAUSE_MAP && (OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ALWAYS_POINTER - || OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ATTACH_DETACH)) + || OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ATTACH_DETACH + || (OMP_CLAUSE_MAP_KIND (grp_mid) + == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION))) { tree c3 = build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), OMP_CLAUSE_MAP); @@ -8774,12 +8776,14 @@ build_struct_group (struct gimplify_omp_ctx *ctx, ? splay_tree_lookup (ctx->variables, (splay_tree_key) decl) : NULL); bool ptr = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALWAYS_POINTER); - bool attach_detach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH); + bool attach_detach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH + || (OMP_CLAUSE_MAP_KIND (c) + == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)); bool attach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_DETACH); bool has_attachments = false; /* For OpenACC, pointers in structs should trigger an attach action. 
*/ - if (attach_detach + if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH && ((region_type & (ORT_ACC | ORT_TARGET | ORT_TARGET_DATA)) || code == OMP_TARGET_ENTER_DATA || code == OMP_TARGET_EXIT_DATA)) @@ -9784,6 +9788,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, if (!remove && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ALWAYS_POINTER && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ATTACH_DETACH + && (OMP_CLAUSE_MAP_KIND (c) + != GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION) && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_TO_PSET && OMP_CLAUSE_CHAIN (c) && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (c)) == OMP_CLAUSE_MAP @@ -9792,7 +9798,9 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c)) == GOMP_MAP_ATTACH_DETACH) || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c)) - == GOMP_MAP_TO_PSET))) + == GOMP_MAP_TO_PSET) + || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c)) + == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION))) prev_list_p = list_p; break;
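For readers following the thread, the runtime difference between the two attach kinds described in the reply (an unmapped pointer target is fatal for a plain attach, but yields a NULL device pointer for the zero-length array section variant) can be modelled in a few lines of plain C. This is only a sketch under stated assumptions: the `model_*` names, the `MODEL_*` enumerators and the flat mapping table are all hypothetical and do not mirror libgomp's actual data structures in libgomp/target.c; a return value of -1 stands in for the point where the real runtime would call gomp_fatal().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not libgomp's types or values.  */
enum attach_kind
{
  MODEL_ATTACH,
  MODEL_ATTACH_ZERO_LENGTH_ARRAY_SECTION
};

struct model_mapping
{
  uintptr_t host_start;   /* first mapped host address */
  uintptr_t host_end;     /* one past the last mapped host address */
  uintptr_t dev_start;    /* device address corresponding to host_start */
};

/* Return the mapping containing HOST_ADDR, or NULL if it is unmapped.  */
static const struct model_mapping *
model_lookup (const struct model_mapping *maps, size_t n, uintptr_t host_addr)
{
  for (size_t i = 0; i < n; i++)
    if (host_addr >= maps[i].host_start && host_addr < maps[i].host_end)
      return &maps[i];
  return NULL;
}

/* Compute the value to store into the device copy of a pointer whose host
   value is HOST_TARGET.  Returns 0 on success and -1 where this model
   assumes the real runtime would give up with a fatal error.  */
int
model_attach (enum attach_kind kind, const struct model_mapping *maps,
              size_t n, uintptr_t host_target, uintptr_t *dev_value)
{
  assert (dev_value != NULL);
  const struct model_mapping *m = model_lookup (maps, n, host_target);
  if (m != NULL)
    {
      /* Pointer target is mapped: translate it to the device address.  */
      *dev_value = m->dev_start + (host_target - m->host_start);
      return 0;
    }
  if (kind == MODEL_ATTACH_ZERO_LENGTH_ARRAY_SECTION)
    {
      /* Tolerate the unmapped target and store NULL on the device.  */
      *dev_value = 0;
      return 0;
    }
  return -1;
}
```

With a single mapping of host range [0x1000, 0x1100) to device address 0x9000, a target of 0x1010 translates to 0x9010 for either kind, while an unmapped target such as 0x2000 gives NULL for the zero-length variant and the error path for a plain attach.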
Re: [PATCH 7/7] [og10] WIP GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION changes
On 2021/5/17 10:26 PM, Julian Brown wrote:

OK, understood. But, I'm a bit concerned that we're ignoring some "hidden rules" with regards to OMP pointer clause ordering/grouping that certain code (at least the bit that creates GOMP_MAP_STRUCT node groups, and parts of omp-low.c) relies on. I believe those rules are as follows:

- an array slice is mapped using two or three pointers -- two for a normal (non-reference) base pointer, and three if we have a reference to a pointer (i.e. in C++) or an array descriptor (i.e. in Fortran). So we can have e.g.

    GOMP_MAP_TO
    GOMP_MAP_ALWAYS_POINTER

    GOMP_MAP_TO
    GOMP_MAP_.*_POINTER
    GOMP_MAP_ALWAYS_POINTER

    GOMP_MAP_TO
    GOMP_MAP_TO_PSET
    GOMP_MAP_ALWAYS_POINTER

- for OpenACC, we extend this to allow (up to and including gimplify.c) the GOMP_MAP_ATTACH_DETACH mapping. So we can have (for component refs):

    GOMP_MAP_TO
    GOMP_MAP_ATTACH_DETACH

    GOMP_MAP_TO
    GOMP_MAP_TO_PSET
    GOMP_MAP_ATTACH_DETACH

    GOMP_MAP_TO
    GOMP_MAP_.*_POINTER
    GOMP_MAP_ATTACH_DETACH

For the scanning in insert_struct_comp_map (as it is at present) to work right, these groups must stay intact. I think the current behaviour of omp_target_reorder_clauses on the og10 branch can break those groups apart though!

Originally this sorting was intended to enforce OpenMP 5.0 map ordering rules, although I did add some ATTACH_DETACH ordering code in the latest round of patching. May not be the best practice.

(The "prev_list_p" stuff in the loop in question in gimplify.c just keeps track of the first node in these groups.)

Such a brittle way of doing this; even the variable name is not that obvious in what it intends to do.

For OpenACC, the GOMP_MAP_ATTACH_DETACH code does *not* depend on the previous clause when lowering in omp-low.c. But GOMP_MAP_ALWAYS_POINTER does! And in one case ("update" directive), GOMP_MAP_ATTACH_DETACH is rewritten to GOMP_MAP_ALWAYS_POINTER, so for that case at least, the dependency on the preceding mapping node must stay intact.
Yes, I think there are some weird conventions here, stemming from the front-ends. I would think that _ALWAYS_POINTER should exist at a similar level to _ATTACH_DETACH, both being pointer operations that differ only in the details of their runtime behavior, though its intended purpose for C++ references seems to skew some things here and there.

OpenACC also allows "bare" GOMP_MAP_ATTACH and GOMP_MAP_DETACH nodes (corresponding to the "attach" and "detach" clauses). Those are handled a bit differently to GOMP_MAP_ATTACH_DETACH in gimplify.c -- but GOMP_MAP_ATTACH_Z_L_A_S doesn't quite behave like that either, I don't think?

IIRC, GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION was handled that way (just a single line in gimplify.c) due to idiosyncrasies with the surrounding generated maps from the C++ front-end (which ATM is the only user of this map-kind). So yeah, inside the compiler, it's not entirely the same as GOMP_MAP_ATTACH, but it is intended to survive through to the runtime.

Anyway: I've not entirely understood what omp_target_reorder_clauses is doing, but I think it may need to try harder to keep the groups mentioned above together. What do you think?

As you know, attach operations don't really need to be glued to the prior operations; they just have to be ordered after the mapping of the pointer and the pointee. There's already some book-keeping to move clauses together, but as you say, it might need more.

Overall, I think this re-organizing of the struct-group creation is a good thing, but actually as you probably also observed, this insistence on "in-flight" tree chain manipulation is just hard to work with and modify. Maybe instead of directly working on clause expression chains at this point, we should be stashing all this information into a single clause tree node, e.g.
starting from the front-end, we can set 'OMP_CLAUSE_MAP_POINTER_KIND(c) = ALWAYS/ATTACH_DETACH/FIRSTPRIVATE/etc.', (instead of actually creating new, must-follow-in-order maps that's causing all these conventions). For struct-groups, during the start of gimplify_scan_omp_clauses(), we could work with map clause tree nodes with OMP_CLAUSE_MAP_STRUCT_LIST(c), which contains the entire TREE_LIST or VEC of elements. Then later, after scanning is complete, expand the list into the current form. Ordering is only created at this stage. Just an idea, not sure if it will help understandability in general, but it should definitely help to simplify when we're reordering due to other rules. Chung-Lin
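The idea floated above (record the pointer operation as a property of the map clause itself, and only materialise the ordered follow-up maps once scanning is done) might be sketched as follows. Everything here is hypothetical: `OMP_CLAUSE_MAP_POINTER_KIND` exists only as a proposal in this mail, and the `model_*` structs and enumerators below are plain C stand-ins, not GCC tree nodes or real map kinds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer-operation tag carried on a single clause node.  */
enum model_ptr_kind
{
  MODEL_PTR_NONE,
  MODEL_PTR_ALWAYS,
  MODEL_PTR_ATTACH_DETACH,
  MODEL_PTR_FIRSTPRIVATE
};

/* Hypothetical map kinds of the expanded, order-sensitive form.  */
enum model_map_kind
{
  MODEL_MAP_TO,
  MODEL_MAP_ALWAYS_POINTER,
  MODEL_MAP_ATTACH_DETACH,
  MODEL_MAP_FIRSTPRIVATE_POINTER
};

struct model_clause
{
  const char *var;
  enum model_ptr_kind pointer_kind;
};

struct model_expanded
{
  const char *var;
  enum model_map_kind kind;
};

/* Expand N compact clauses into the ordered form.  OUT must have room for
   2 * N entries.  Because each data map and its pointer operation are
   emitted together here, any reordering done earlier on the compact
   clauses cannot split a group.  Returns the number of entries written.  */
size_t
model_expand_clauses (const struct model_clause *in, size_t n,
                      struct model_expanded *out)
{
  assert (out != NULL);
  size_t k = 0;
  for (size_t i = 0; i < n; i++)
    {
      out[k++] = (struct model_expanded) { in[i].var, MODEL_MAP_TO };
      switch (in[i].pointer_kind)
        {
        case MODEL_PTR_ALWAYS:
          out[k++] = (struct model_expanded) { in[i].var,
                                               MODEL_MAP_ALWAYS_POINTER };
          break;
        case MODEL_PTR_ATTACH_DETACH:
          out[k++] = (struct model_expanded) { in[i].var,
                                               MODEL_MAP_ATTACH_DETACH };
          break;
        case MODEL_PTR_FIRSTPRIVATE:
          out[k++] = (struct model_expanded) { in[i].var,
                                               MODEL_MAP_FIRSTPRIVATE_POINTER };
          break;
        case MODEL_PTR_NONE:
          break;
        }
    }
  return k;
}
```

The design point is that ordering becomes an output of the final expansion step rather than an invariant every intermediate pass has to preserve.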
[PATCH, OpenACC 2.7, v2] Adjust acc_map_data/acc_unmap_data interaction with reference counters
>> >>uintptr_t *refcount_ptr = &k->refcount; >> >> - if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) >> + if (k->refcount == REFCOUNT_ACC_MAP_DATA) >> +refcount_ptr = &k->dynamic_refcount; >> + else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) >> refcount_ptr = &k->structelem_refcount; >>else if (REFCOUNT_STRUCTELEM_P (k->refcount)) >> refcount_ptr = k->structelem_refcount_ptr; >> @@ -527,7 +529,9 @@ gomp_decrement_refcount (splay_tree_key k, htab_t >> *refcount_set, bool delete_p, >> >>uintptr_t *refcount_ptr = &k->refcount; >> >> - if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) >> + if (k->refcount == REFCOUNT_ACC_MAP_DATA) >> +refcount_ptr = &k->dynamic_refcount; >> + else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) >> refcount_ptr = &k->structelem_refcount; >>else if (REFCOUNT_STRUCTELEM_P (k->refcount)) >> refcount_ptr = k->structelem_refcount_ptr; >> @@ -560,6 +564,10 @@ gomp_decrement_refcount (splay_tree_key k, htab_t >> *refcount_set, bool delete_p, >>else if (*refcount_ptr > 0) >> *refcount_ptr -= 1; >> >> + /* Force back to 1 if this is an acc_map_data mapping. */ >> + if (k->refcount == REFCOUNT_ACC_MAP_DATA && *refcount_ptr == 0) >> +*refcount_ptr = 1; >> + >> end: >>if (*refcount_ptr == 0) >> { > > It's not clear to me why you need this handling -- instead of just > handling 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' here, that is, > early 'return'? > > Per my understanding, this code is for OpenACC only exercised for > structured data regions, and it seems strange (unnecessary?) to adjust > the 'dynamic_refcount' for these for 'acc_map_data'-mapped data? Or am I > missing anything? No, that is not true. It goes through almost everything through gomp_map_vars_existing/_internal. This is what happens when you acc_create/acc_copyin on a mapping created by acc_map_data. 
> Overall, your changes regress the > commit 3e888f94624294d2b9b34ebfee0916768e5d9c3f > "Add OpenACC 'acc_map_data' variant to > 'libgomp.oacc-c-c++-common/deep-copy-8.c'" > that I just pushed. I think you just need to handle > 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' in > 'libgomp/oacc-mem.c:goacc_enter_data_internal', 'if (n && struct_p)'? > Please verify. Fixed by adding another '&& n->refcount != REFCOUNT_ACC_MAP_DATA' check in goacc_enter_data_internal. > But please also to the "Minimal OpenACC variant corresponding to PR96668" > code in 'libgomp/oacc-mem.c:goacc_enter_data_internal' add a safeguard > that we're not running into 'REFCOUNT_ACC_MAP_DATA' there. I think > that's currently not (reasonably easily) possible, given that > 'acc_map_data' isn't available in OpenACC/Fortran, but it'll be available > later, and then I'd rather have an 'assert' trigger there, instead of > random behavior. (I'm not asking you to write a mixed OpenACC/Fortran > plus C test case for that scenario -- if feasible at all.) I am not really sure what you want me to do here, but REFCOUNT_ACC_MAP_DATA mappings are all created through a single GOMP_MAP_ALLOC kind. The complex stuff of MAP_STRUCT, MAP_TO_PSET, etc. should all be not related here (I presume even if Fortran eventually gets acc_map_data, it would be the compiler side which should take care of passing the raw data-pointer/array-size to the acc_map_data routine) I have re-tested this on x86_64-linux + nvptx. Please see if this is okay for committing to mainline. Thanks, Chung-Lin 2024-03-04 Chung-Lin Tang libgomp/ChangeLog: * libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2). * oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA, initialize dynamic_refcount as 1. (acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA, remove TODO comments. Add assert of 'n->dynamic_refcount >= 1' and comments. (goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case. 
(goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA. (goacc_enter_data_internal): Add REFCOUNT_ACC_MAP_DATA case. * target.c (gomp_increment_refcount): Add REFCOUNT_ACC_MAP_DATA case. (gomp_decrement_refcount): Add REFCOUNT_ACC_MAP_DATA case, force lowest dyn
Re: [PATCH, OpenACC 2.7, v2] readonly modifier support in front-ends
Hi Thomas, Tobias, On 2023/10/26 6:43 PM, Thomas Schwinge wrote: > +++ b/gcc/tree.h > @@ -1813,6 +1813,14 @@ class auto_suppress_location_wrappers > #define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \ > (OMP_CLAUSE_SUBCODE_CHECK (NODE, > OMP_CLAUSE_MAP)->base.addressable_flag) > > +/* Nonzero if OpenACC 'readonly' modifier set, used for 'copyin'. */ > +#define OMP_CLAUSE_MAP_READONLY(NODE) \ > + TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)) > + > +/* Same as above, for use in OpenACC cache directives. */ > +#define OMP_CLAUSE__CACHE__READONLY(NODE) \ > + TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_)) I'm not sure if these special accessor functions are actually useful, or we should just directly use 'TREE_READONLY' instead? We're only using them in contexts where it's clear that the 'OMP_CLAUSE_SUBCODE_CHECK' is satisfied, for example. >>> I find directly using TREE_READONLY confusing. >> >> FWIW, I've changed to use TREE_NOTHROW instead, if it can give a better >> sense of safety :P > > I don't understand that, why not use 'TREE_READONLY'? > >> I think there's a misunderstanding here anyways: we are not relying on a >> DECL marked >> TREE_READONLY here. We merely need the OMP_CLAUSE_MAP to be marked as >> OMP_CLAUSE_MAP_READONLY == 1. > > Yes, I understand that. My question was why we don't just use > 'TREE_READONLY (c)', where 'c' is the > 'OMP_CLAUSE_MAP'/'OMP_CLAUSE__CACHE_' clause (not its decl), and avoid > the indirection through > '#define OMP_CLAUSE_MAP_READONLY'/'#define OMP_CLAUSE__CACHE__READONLY', > given that we're only using them in contexts where it's clear that the > 'OMP_CLAUSE_SUBCODE_CHECK' is satisfied. I don't have a strong > preference, though. 
After further re-testing using TREE_NOTHROW, I have reverted to using TREE_READONLY, because TREE_NOTHROW clashes with OMP_CLAUSE_RELEASE_DESCRIPTOR (which doesn't use the OMP_CLAUSE_MAP_* naming convention and is not documented in gcc/tree-core.h either, hmmm...) I have added the comment adjustments in gcc/tree-core.h for the new uses of TREE_READONLY/readonly_flag. We basically all use OMP_CLAUSE_SUBCODE_CHECK macros for OpenMP clause expressions exclusively, so I don't see a reason to diverge from that style (even when context is clear). > Either way, you still need to document this: > > | Also, for the new use for OMP clauses, update 'gcc/tree.h:TREE_READONLY', > | and in 'gcc/tree-core.h' for 'readonly_flag' the > | "table lists the uses of each of the above flags". Okay, done as mentioned above. > In addition to a few individual comments above and below, you've also not > yet responded to my requests re test cases. I have greatly expanded the test scan patterns to include parallel/kernels/serial/data/enter data, as well as non-readonly copyin clause together with readonly. Also added simple 'declare' tests, but there is not anything to scan in the 'tree-original' dump though. >> + tree nl = list; >> + bool readonly = false; >> + matching_parens parens; >> + if (parens.require_open (parser)) >> +{ >> + /* Turn on readonly modifier parsing for copyin clause. */ >> + if (c_kind == PRAGMA_OACC_CLAUSE_COPYIN) >> + { >> + c_token *token = c_parser_peek_token (parser); >> + if (token->type == CPP_NAME >> + && !strcmp (IDENTIFIER_POINTER (token->value), "readonly") >> + && c_parser_peek_2nd_token (parser)->type == CPP_COLON) >> + { >> + c_parser_consume_token (parser); >> + c_parser_consume_token (parser); >> + readonly = true; >> + } >> + } >> + location_t loc = c_parser_peek_token (parser)->location; > > I suppose 'loc' here now points to after the opening '(' or after the > 'readonly :'? 
This is different from what 'c_parser_omp_var_list_parens' > does, and indeed, 'c_parser_omp_variable_list' states that "CLAUSE_LOC is > the location of the clause", not the location of the variable-list? As > this, I suppose, may change diagnostics, please restore the original > behavior. (This appears to be different in the C++ front end, huh.) Thanks for catching this! Fixed. >> --- a/gcc/fortran/openmp.cc >> +++ b/gcc/fortran/openmp.cc >> @@ -1197,7 +1197,7 @@ omp_inv_mask::omp_inv_mask (const omp_mask &m) : >> omp_mask (m) >> >> static bool >> gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op, >> - bool allow_common, bool allow_derived) >> + bool allow_common, bool allow_derived, bool readonly >> = false) >> { >>gfc_omp_namelist **head = NULL; >>if (gfc_match_omp_variable_list ("", list, allow_common, NULL, &head, >> true, >> @@ -1206,7 +1206,10 @@ gfc_match_omp_map_clause (gfc_omp_namelist **list, >> gfc_omp_map_op map_op, >> { >>gfc_omp_namelist *n; >>for (n = *head; n; n = n->next) >> -
Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis
Hi Richard, Thomas, On 2023/10/30 8:46 PM, Richard Biener wrote: >> >> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the >> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY' >> flag. >> >> The actual optimization then is done in this second patch. Chung-Lin >> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that. >> I don't have much experience with most of the following generic code, so >> would appreciate a helping hand, whether that conceptually makes sense as >> well as from the implementation point of view: First of all, I have removed all of the gimplify-stage scanning and setting of DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so no changes to gimplify.cc now) I remember this code was an artifact of earlier attempts to allow struct-member pointer mappings to also work (e.g. map(readonly:rec.ptr[:N])), but failed anyways. I think the omp_data_* member accesses when building child function side receiver_refs is blocking points-to analysis from working (didn't try digging deeper) Also during gimplify, VAR_DECLs appeared to be reused (at least in some cases) for map clause decl reference building, so hoping that the variables "happen to be" single-use and DECL_POINTS_TO_READONLY relaying into SSA_NAME_POINTS_TO_READONLY_MEMORY does appear to be a little risky. However, for firstprivate pointers processed during omp-low, it appears to be somewhat different. (see below description) > No, I don't think you can use that flag on non-default-defs, nor > preserve it on copying. So > it also doesn't nicely extend to DECLs as done by the patch. We > currently _only_ use it > for incoming parameters. When used on arbitrary code you can get to for > example > > ptr1(points-to-readony-memory) = &p->x; > ... access via ptr1 ... > ptr2 = &p->x; > ... access via ptr2 ... 
> > where both are your OMP regions differently constrained (the constrain is on > the > code in the region, _not_ on the actual protections of the pointed to > data, much like > for the fortran case). But now CSE comes along and happily replaces all ptr2 > with ptr2 in the second region and ... oops! Richard, I assume what you meant was "happily replaces all ptr2 with ptr1 in the second region"? That doesn't happen, because during omp-lower/expand, OMP target regions (which is all that this applies currently) is separated into different individual child functions. (Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during omp-lower, when for firstprivate pointers (i.e. 'a' here) we set this bit when constructing the first load of this pointer) #pragma acc parallel copyin(readonly: a[:32]) copyout(r) { foo (a, a[8]); r = a[8]; } #pragma acc parallel copyin(readonly: a[:32]) copyout(r) { foo (a, a[12]); r = a[12]; } After omp-expand (before SSA): __attribute__((oacc parallel, omp target entrypoint, noclone)) void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i) { ... : D.2962 = .omp_data_i->D.2947; a.8 = D.2962; r.1 = (*a.8)[12]; foo (a.8, r.1); r.1 = (*a.8)[12]; D.2965 = .omp_data_i->r; *D.2965 = r.1; return; } __attribute__((oacc parallel, omp target entrypoint, noclone)) void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i) { ... : D.2968 = .omp_data_i->D.2939; a.4 = D.2968; r.0 = (*a.4)[8]; foo (a.4, r.0); r.0 = (*a.4)[8]; D.2971 = .omp_data_i->r; *D.2971 = r.0; return; } So actually, the creating of DECL_POINTS_TO_READONLY and its relaying to SSA_NAME_POINTS_TO_READONLY_MEMORY here, is actually quite similar to a default-def for an PARM_DECL, at least conceptually. 
(If offloading was structured significantly differently, say if child functions were separated much earlier before omp-lowering, then this readonly-modifier might possibly be a direct application of 'r' in the "fn spec" attribute)

Other changes since first version of patch include:
1) update of C/C++ FE changes to new style in c-family/c-omp.cc
2) merging of two if cases in fortran/trans-openmp.cc like Thomas suggested
3) Update of readonly-2.c testcase to scan before/after "fre1" pass, to verify removal of a MEM load, also as Thomas suggested.

I have re-tested this patch using mainline, with no regressions. Is this okay for mainline?

Thanks,
Chung-Lin

2024-04-03  Chung-Lin Tang

gcc/c-family/ChangeLog:
        * c-omp.cc (c_omp_address_inspector::expand_array_base): Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
        (c_omp_address_inspector::expand_component_selector): Likewise.

gcc/fortran/ChangeLog:
        * trans-openmp.cc (g
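To illustrate what "removal of a MEM load" means for the regions quoted in this message, the following hand-written C pair shows the shape of the code before and after the reuse that the readonly information is meant to permit. It is an assumption-laden illustration, not compiler output: the two `region_*` functions and the body of `foo` are made up for this note, and plain C has no way to express the readonly guarantee itself, so the second function is only equivalent because `foo` here happens not to write through its pointer.

```c
#include <assert.h>
#include <stddef.h>

static int sink;

/* Stand-in for the opaque callee in the example regions; it only reads.  */
void
foo (const int *a, int v)
{
  assert (a != NULL);
  sink += a[0] + v;
}

/* Shape of the region as written: a[8] is loaded again after the call,
   because without extra knowledge the call might have modified *a.  */
int
region_as_written (const int *a)
{
  int r = a[8];
  foo (a, r);
  r = a[8];
  return r;
}

/* Shape after the second load has been replaced by the earlier value,
   which is valid when the pointed-to memory is known to be read-only
   within the region.  */
int
region_after_load_reuse (const int *a)
{
  int r = a[8];
  foo (a, r);
  return r;
}
```

Both functions return the same value whenever nothing in the region writes to the array, which is the property the readonly modifier is meant to communicate.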
[PATCH, OpenACC 2.7, v3] Adjust acc_map_data/acc_unmap_data interaction with reference counters
Hi Thomas, On 2024/3/15 7:24 PM, Thomas Schwinge wrote: > Hi Chung-Lin! > > I realized: please add "PR libgomp/92840" to the Git commit log, as your > changes are directly a continuation of my earlier changes. Okay, I'll remember to do that. ... > - if (n->refcount != REFCOUNT_INFINITY) > + if (n->refcount != REFCOUNT_INFINITY > + && n->refcount != REFCOUNT_ACC_MAP_DATA) > n->refcount--; >n->dynamic_refcount--; > } > > + /* Mappings created by 'acc_map_data' may only be deleted by > + 'acc_unmap_data'. */ > + if (n->refcount == REFCOUNT_ACC_MAP_DATA > + && n->dynamic_refcount == 0) > +n->dynamic_refcount = 1; > + >if (n->refcount == 0) > { >bool copyout = (kind == GOMP_MAP_FROM > > ..., which really should have the same semantics? No strong opinion on > which of the two variants you now chose. My guess is that breaking off the REFCOUNT_ACC_MAP_DATA case separately will be lighter on any branch predictors (faster performing overall), so I will stick with my version here. >>> >>> It's not clear to me why you need this handling -- instead of just >>> handling 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' here, that is, >>> early 'return'? >>> >>> Per my understanding, this code is for OpenACC only exercised for >>> structured data regions, and it seems strange (unnecessary?) to adjust >>> the 'dynamic_refcount' for these for 'acc_map_data'-mapped data? Or am I >>> missing anything? >> >> No, that is not true. It goes through almost everything through >> gomp_map_vars_existing/_internal. >> This is what happens when you acc_create/acc_copyin on a mapping created by >> acc_map_data. 
> > But I don't understand what you foresee breaking with the following (on > top of your v2): > > --- a/libgomp/target.c > +++ b/libgomp/target.c > @@ -476,14 +476,14 @@ gomp_free_device_memory (struct gomp_device_descr > *devicep, void *devptr) > static inline void > gomp_increment_refcount (splay_tree_key k, htab_t *refcount_set) > { > - if (k == NULL || k->refcount == REFCOUNT_INFINITY) > + if (k == NULL > + || k->refcount == REFCOUNT_INFINITY > + || k->refcount == REFCOUNT_ACC_MAP_DATA) > return; > >uintptr_t *refcount_ptr = &k->refcount; > > - if (k->refcount == REFCOUNT_ACC_MAP_DATA) > -refcount_ptr = &k->dynamic_refcount; > - else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) > + if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) > refcount_ptr = &k->structelem_refcount; ... > Can you please show a test case? I have re-tested the patch *without* the gomp_increment/decrement_refcount changes, and have these regressions (just to demonstrate what is affected): +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 
-foffload=nvptx-none -O0 execution test
+FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test

Now, I have also re-tested your version (i.e., just return early when k->refcount == REFCOUNT_ACC_MAP_DATA), and for the record, that also works (no regressions).

However, I strongly suggest we use my version here where we adjust the dynamic_refcount, simply because: *It is the whole point of this project item in OpenACC 2.7*

The 2.7 spec articulated how increment/decrement interacts with acc_map_data/acc_unmap_data, and this patch was supposed to make libgomp conform to it more closely implementation-wise (otherwise, there is no point in working on this at all, as there wasn't really anything behaviorally wrong with our implementation before).

> I see we already have:
>
>     if ((kinds[i] & 0xff) == GOMP_MAP_TO_PSET
>         && tgt->list_count == 0)
>       {
>         /* 'declare target'. */
>         assert (n->refcount == REFCOUNT_INFINITY);
>
> I think I wanted to you to add:
>
> --- a/libgomp/o
[PATCH, OpenACC 2.7] struct/array reductions for Fortran
Hi Tobias, Thomas,
this patch adds support for Fortran to use arrays and struct (record) types in OpenACC reductions.

There are still some shortcomings in the current state, mainly that only explicit-shape arrays can be used (like its C counterpart). Anything else is currently a bit more complicated in the middle-end, since the existing reduction code creates an "init-op" (literal of initial values), which can't be done when, say, TYPE_MAX_VALUE (TYPE_DOMAIN (array_type)) is not a tree constant. I think we'll be on the hook to solve this later, but I think the current state is okay to submit.

Tested without regressions on mainline (on top of first struct/array reduction patch[1])

Thanks,
Chung-Lin

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641669.html

2024-02-08  Chung-Lin Tang

gcc/fortran/ChangeLog:
        * openmp.cc (oacc_reduction_defined_type_p): New function.
        (resolve_omp_clauses): Adjust OpenACC array reduction error case. Use oacc_reduction_defined_type_p for OpenACC.
        * trans-openmp.cc (gfc_trans_omp_array_reduction_or_udr): Add 'bool openacc' parameter, adjust part of function to be !openacc only.
        (gfc_trans_omp_reduction_list): Add 'bool openacc' parameter, pass to calls to gfc_trans_omp_array_reduction_or_udr.
        (gfc_trans_omp_clauses): Add 'openacc' argument to calls to gfc_trans_omp_reduction_list.
        (gfc_trans_omp_do): Pass 'op == EXEC_OACC_LOOP' as 'bool openacc' parameter in call to gfc_trans_omp_clauses.

gcc/ChangeLog:
        * omp-low.cc (omp_reduction_init_op): Add checking if reduced array has constant bounds.
        (lower_oacc_reductions): Add handling of error_mark_node.

gcc/testsuite/ChangeLog:
        * gfortran.dg/goacc/array-reduction.f90: Adjust testcase.
        * gfortran.dg/goacc/reduction.f95: Likewise.

libgomp/ChangeLog:
        * libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90: New testcase.
        * libgomp/testsuite/libgomp.oacc-fortran/reduction-10.f90: Likewise.
        * libgomp/testsuite/libgomp.oacc-fortran/reduction-11.f90: Likewise.
* libgomp/testsuite/libgomp.oacc-fortran/reduction-12.f90: Likewise. * libgomp/testsuite/libgomp.oacc-fortran/reduction-13.f90: Likewise. diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index 0af80d54fad..4bba9e666d6 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -7047,6 +7047,72 @@ oacc_is_loop (gfc_code *code) || code->op == EXEC_OACC_LOOP; } +static bool +oacc_reduction_defined_type_p (enum gfc_omp_reduction_op rop, gfc_typespec *ts) +{ + if (rop == OMP_REDUCTION_USER || rop == OMP_REDUCTION_NONE) +return false; + + if (ts->type == BT_INTEGER) +switch (rop) + { + case OMP_REDUCTION_AND: + case OMP_REDUCTION_OR: + case OMP_REDUCTION_EQV: + case OMP_REDUCTION_NEQV: + return false; + default: + return true; + } + + if (ts->type == BT_LOGICAL) +switch (rop) + { + case OMP_REDUCTION_AND: + case OMP_REDUCTION_OR: + case OMP_REDUCTION_EQV: + case OMP_REDUCTION_NEQV: + return true; + default: + return false; + } + + if (ts->type == BT_REAL || ts->type == BT_COMPLEX) +switch (rop) + { + case OMP_REDUCTION_PLUS: + case OMP_REDUCTION_TIMES: + case OMP_REDUCTION_MINUS: + return true; + + case OMP_REDUCTION_AND: + case OMP_REDUCTION_OR: + case OMP_REDUCTION_EQV: + case OMP_REDUCTION_NEQV: + return false; + + case OMP_REDUCTION_MAX: + case OMP_REDUCTION_MIN: + return ts->type != BT_COMPLEX; + case OMP_REDUCTION_IAND: + case OMP_REDUCTION_IOR: + case OMP_REDUCTION_IEOR: + return false; + default: + gcc_unreachable (); + } + + if (ts->type == BT_DERIVED) +{ + for (gfc_component *p = ts->u.derived->components; p; p = p->next) + if (!oacc_reduction_defined_type_p (rop, &p->ts)) + return false; + return true; +} + + return false; +} + static void resolve_scalar_int_expr (gfc_expr *expr, const char *clause) { @@ -8137,13 +8203,15 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses, else n->sym->mark = 1; - /* OpenACC does not support reductions on arrays. 
*/ - if (n->sym->as) + /* OpenACC current only supports array reductions on explicit-shape +arrays. */ + if ((n->sym->as && n->sym->as->type != AS_EXPLICIT) + || n->sym->attr.codimension) gfc_error ("Array %qs is not permitted in reduction at %L", n->sym->name, &n->where); } } - + for (n = omp_clauses->lists[OMP_LIST_TO]; n; n = n->next)
[committed] MAINTAINERS: Update my email address
Updated my email address. Thanks, Chung-Lin From ffeab69e1ffc0405da3a9222c7b9f7a000252702 Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang Date: Thu, 25 Jan 2024 18:20:43 + Subject: [PATCH] MAINTAINERS: Update my work email address * MAINTAINERS: Update my work email address. --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 7d3b78d276e..8b11ddbc069 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -99,7 +99,7 @@ moxie portAnthony Green msp430 portNick Clifton nds32 port Chung-Ju Wu nds32 port Shiva Chen -nios2 port Chung-Lin Tang +nios2 port Chung-Lin Tang nios2 port Sandra Loosemore nvptx port Tom de Vries nvptx port Thomas Schwinge -- 2.34.1
[PATCH, OpenACC 2.7] Implement reductions for arrays and structs
Hi Thomas, Andrew,
this patch implements reductions for arrays and structs for OpenACC. Following the pattern for OpenACC reductions, this is mostly in the respective NVPTX/GCN backends' *_goacc_reduction_setup/init/fini/teardown hooks, particularly in the fini part, and [nvptx/gcn]_reduction_update routines. The code is mostly similar between the two targets, the main difference being the lack of vector mode handling in GCN.

To Julian, there is a patch to the middle-end neutering, a hack actually, that detects SSA_NAMEs used in reduction array MEM_REFs, and avoids single->parallel copying (by moving those definitions before BUILT_IN_GOACC_SINGLE_COPY_START). This appears to work because reductions do their own initializing of the private copy.

As we discussed in our internal calls, the real proper way is to create the private array in a more appropriate stage, but that is too long a shot for now. The changes here are needed at least for some -O0 cases (under optimization, propagation of the private copies' local address eliminates the SSA_NAME and things actually just work in that case). So please bear with this hack.

I believe the newly added libgomp testcases should be fairly complete. Note, though, that one case of reduction of * for double arrays has been commented out for now, because there appears to be a (presumably) unrelated issue causing this case to fail (it may have to do with the loop-based atomic form used by both NVPTX/GCN). It should probably be XFAILed instead of commented out; I will do this in the next iteration.

Thanks,
Chung-Lin

2024-01-02  Chung-Lin Tang

gcc/c/ChangeLog:
        * c-parser.cc (c_parser_omp_clause_reduction): Adjustments for OpenACC-specific cases.
        * c-typeck.cc (c_oacc_reduction_defined_type_p): New function.
        (c_oacc_reduction_code_name): Likewise.
        (c_finish_omp_clauses): Handle OpenACC cases using new functions.

gcc/cp/ChangeLog:
        * parser.cc (cp_parser_omp_clause_reduction): Adjustments for OpenACC-specific cases.
* semantics.cc (cp_oacc_reduction_defined_type_p): New function. (cp_oacc_reduction_code_name): Likewise. (finish_omp_reduction_clause): Handle OpenACC cases using new functions. gcc/ChangeLog: * config/gcn/gcn-tree.cc (gcn_reduction_update): Additions for handling ARRAY_TYPE and RECORD_TYPE reductions. (gcn_goacc_reduction_setup): Likewise. (gcn_goacc_reduction_init): Likewise. (gcn_goacc_reduction_fini): Likewise. (gcn_goacc_reduction_teardown): Likewise. * config/nvptx/nvptx.cc (nvptx_gen_shuffle): Properly generate V2SI shuffle using vec_extract op. (nvptx_get_shared_red_addr): Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT instead of machine mode based. (nvptx_reduction_update): Additions for handling ARRAY_TYPE and RECORD_TYPE reductions. (nvptx_goacc_reduction_setup): Likewise. (nvptx_goacc_reduction_init): Likewise. (nvptx_goacc_reduction_fini): Likewise. (nvptx_goacc_reduction_teardown): Likewise. * omp-low.cc (scan_sharing_clauses): Adjust ARRAY_REF pointer type building to use decl type, rather than generic ptr_type_node. (omp_reduction_init_op): Add ARRAY_TYPE and RECORD_TYPE init op construction. (lower_oacc_reductions): Add code to teardown/recover array access MEM_REF in OMP_CLAUSE_DECL, to accomodate for lookup requirements. Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT instead of machine mode based. * omp-oacc-neuter-broadcast.cc (worker_single_copy): Add 'hash_set *array_reduction_base_vars' parameter. Add xxx. (neuter_worker_single): Add 'hash_set *array_reduction_base_vars' parameter. Adjust recursive calls to self and worker_single_copy. (oacc_do_neutering): Add 'hash_set *array_reduction_base_vars' parameter. Adjust call to neuter_worker_single. (execute_omp_oacc_neuter_broadcast): Add local 'hash_set array_reduction_base_vars' declaration. Collect MEM_REF base-pointer SSA_NAMEs of arrays into array_reduction_base_vars. Add '&array_reduction_base_vars' argument to call of oacc_do_neutering. 
* omp-offload.cc (default_goacc_reduction): Add unshare_expr. gcc/testsuite/ChangeLog: * c-c++-common/goacc/reduction-9.c: New test. * c-c++-common/goacc/reduction-10.c: New test. * c-c++-common/goacc/reduction-11.c: New test. * c-c++-common/goacc/reduction-12.c: New test. * c-c++-common/goacc/reduction-13.c: New test. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/reduction.h (check_reduction_array_xx): New macro. (operator_apply): Likewise. (check_reduction_a
[PATCH, OpenACC 2.7, v2] Implement reductions for arrays and structs
Hi Thomas, This is v2 of the C/C++/middle-end parts of array/struct support for OpenACC reductions. The main changes are much fixed support for sub-arrays, and some new testcases. Tested on mainline using x86_64 host and nvptx/amdgcn offloading. Will backport to upcoming omp/devel/gcc-14 branch after approved for mainline. Thanks, Chung-Lin 2024-06-06 Chung-Lin Tang gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_clause_reduction): Adjustments for OpenACC-specific cases. * c-typeck.cc (c_oacc_reduction_defined_type_p): New function. (c_oacc_reduction_code_name): Likewise. (c_finish_omp_clauses): Handle OpenACC cases using new functions. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_clause_reduction): Adjustments for OpenACC-specific cases. * semantics.cc (cp_oacc_reduction_defined_type_p): New function. (cp_oacc_reduction_code_name): Likewise. (finish_omp_reduction_clause): Handle OpenACC cases using new functions. gcc/ChangeLog: * config/gcn/gcn-tree.cc (gcn_reduction_update): Additions for handling ARRAY_TYPE and RECORD_TYPE reductions. (gcn_goacc_reduction_setup): Likewise. (gcn_goacc_reduction_init): Likewise. (gcn_goacc_reduction_fini): Likewise. (gcn_goacc_reduction_teardown): Likewise. * config/nvptx/nvptx.cc (nvptx_gen_shuffle): Properly generate V2SI shuffle using vec_extract op. (nvptx_get_shared_red_addr): Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT instead of machine mode based. (nvptx_reduction_update): Additions for handling ARRAY_TYPE and RECORD_TYPE reductions. (nvptx_goacc_reduction_setup): Likewise. (nvptx_goacc_reduction_init): Likewise. (nvptx_goacc_reduction_fini): Likewise. (nvptx_goacc_reduction_teardown): Likewise. * gimplify.cc (gimplify_scan_omp_clauses): Sanity checking for supported array reduction cases. (gimplify_adjust_omp_clauses): Peel away array MEM_REF for decl lookup. * omp-low.cc (scan_sharing_clauses): Adjust ARRAY_REF pointer type building to use decl type, rather than generic ptr_type_node. 
(omp_reduction_init_op): Add ARRAY_TYPE and RECORD_TYPE init op construction. (lower_rec_input_clauses): Set OMP_CLAUSE_REDUCTION_PRIVATE_EXPR. (oacc_array_reduction_bias): New function. (lower_oacc_reductions): Add code to teardown/recover array access MEM_REF in OMP_CLAUSE_DECL, to accommodate lookup requirements. Use OMP_CLAUSE_REDUCTION_PRIVATE_EXPR as reduction private copy if set. Handle array reductions using new oacc_array_reduction_bias function. Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT instead of machine mode based. * omp-oacc-neuter-broadcast.cc (worker_single_copy): Add 'hash_set *array_reduction_base_vars' parameter. Add xxx. (neuter_worker_single): Add 'hash_set *array_reduction_base_vars' parameter. Adjust recursive calls to self and worker_single_copy. (oacc_do_neutering): Add 'hash_set *array_reduction_base_vars' parameter. Adjust call to neuter_worker_single. (execute_omp_oacc_neuter_broadcast): Add local 'hash_set array_reduction_base_vars' declaration. Collect MEM_REF base-pointer SSA_NAMEs of arrays into array_reduction_base_vars. Add '&array_reduction_base_vars' argument to call of oacc_do_neutering. * omp-offload.cc (default_goacc_reduction): Add unshare_expr. * tree.cc (omp_clause_num_ops): Increase OMP_CLAUSE_REDUCTION ops to 6. * tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_EXPR): New macro. gcc/testsuite/ChangeLog: * c-c++-common/goacc/reduction-9.c: New test. * c-c++-common/goacc/reduction-10.c: New test. * c-c++-common/goacc/reduction-11.c: New test. * c-c++-common/goacc/reduction-12.c: New test. * c-c++-common/goacc/reduction-13.c: New test. * c-c++-common/goacc/reduction-14.c: New test. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/reduction.h (check_reduction_array_xx): New macro. (operator_apply): Likewise. (check_reduction_array_op): Likewise. (check_reduction_arraysec_op): Likewise. (function_apply): Likewise. (check_reduction_array_macro): Likewise. (check_reduction_arraysec_macro): Likewise. 
(check_reduction_xxx_xx_all): Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-arrays-1.c: New test. * testsuite/libgomp.oacc-c-c++-common/reduction-arrays-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/reduction-arrays-3.c: New test. * testsuite/libgomp.oacc-c-c++-common/reduction-structs-1.c: New test. diff --git a/gcc/c/c-parser
[PATCH, OpenACC 2.7, v3] Implement reductions for arrays and structs
On 2024/6/6 9:41 PM, Chung-Lin Tang wrote: > This is v2 of the C/C++/middle-end parts of array/struct > support for OpenACC reductions. > > The main changes are much fixed support for sub-arrays, > and some new testcases. > > Tested on mainline using x86_64 host and nvptx/amdgcn offloading. > Will backport to upcoming omp/devel/gcc-14 branch after approved for mainline. This is a quick update to a "v3" version: apart from tiny bug fixes in testcases, an addition of automatic LDS increase for GCN (triggered by reductions over arrays of sufficient size). Andrew, what I now do in gcn_shared_mem_layout is: increase acc_lds_size by increments of 0x600, while giving a warning that this may decrease occupancy. Another warning type is given when the LDS usage is more than architectural limit of 64KB, but compilation is allowed to proceed. I think this is the better route, since maybe this limit is not very "hard" (more allowed in future?) (FWIW, I was able to at least run such offload regions with more than 64K LDS usage, though I'm not sure if somewhere later in the compiler/linker curbs this automatically) Thanks, Chung-Lin 2024-06-18 Chung-Lin Tang gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_clause_reduction): Adjustments for OpenACC-specific cases. * c-typeck.cc (c_oacc_reduction_defined_type_p): New function. (c_oacc_reduction_code_name): Likewise. (c_finish_omp_clauses): Handle OpenACC cases using new functions. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_clause_reduction): Adjustments for OpenACC-specific cases. * semantics.cc (cp_oacc_reduction_defined_type_p): New function. (cp_oacc_reduction_code_name): Likewise. (finish_omp_reduction_clause): Handle OpenACC cases using new functions. gcc/ChangeLog: * config/gcn/gcn.cc (LDS_INCR_UNIT): New macro symbol. (acc_lds_size): Adjust init value definition. (gcn_shared_mem_layout): Adjust acc_lds_size when reduction size too large. Issue warning when reduction size causes LDS usage to increase or break 64K limit. 
* config/gcn/gcn-tree.cc (gcn_reduction_update): Additions for handling ARRAY_TYPE and RECORD_TYPE reductions. (gcn_goacc_reduction_setup): Likewise. (gcn_goacc_reduction_init): Likewise. (gcn_goacc_reduction_fini): Likewise. (gcn_goacc_reduction_teardown): Likewise. * config/nvptx/nvptx.cc (nvptx_gen_shuffle): Properly generate V2SI shuffle using vec_extract op. (nvptx_get_shared_red_addr): Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT instead of machine mode based. (nvptx_reduction_update): Additions for handling ARRAY_TYPE and RECORD_TYPE reductions. (nvptx_goacc_reduction_setup): Likewise. (nvptx_goacc_reduction_init): Likewise. (nvptx_goacc_reduction_fini): Likewise. (nvptx_goacc_reduction_teardown): Likewise. * gimplify.cc (gimplify_scan_omp_clauses): Sanity checking for supported array reduction cases. (gimplify_adjust_omp_clauses): Peel away array MEM_REF for decl lookup. * omp-low.cc (scan_sharing_clauses): Adjust ARRAY_REF pointer type building to use decl type, rather than generic ptr_type_node. (omp_reduction_init_op): Add ARRAY_TYPE and RECORD_TYPE init op construction. (lower_rec_input_clauses): Set OMP_CLAUSE_REDUCTION_PRIVATE_EXPR. (oacc_array_reduction_bias): New function. (lower_oacc_reductions): Add code to teardown/recover array access MEM_REF in OMP_CLAUSE_DECL, to accommodate lookup requirements. Use OMP_CLAUSE_REDUCTION_PRIVATE_EXPR as reduction private copy if set. Handle array reductions using new oacc_array_reduction_bias function. Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT instead of machine mode based. * omp-oacc-neuter-broadcast.cc (worker_single_copy): Add 'hash_set *array_reduction_base_vars' parameter. Add xxx. (neuter_worker_single): Add 'hash_set *array_reduction_base_vars' parameter. Adjust recursive calls to self and worker_single_copy. (oacc_do_neutering): Add 'hash_set *array_reduction_base_vars' parameter. Adjust call to neuter_worker_single. 
(execute_omp_oacc_neuter_broadcast): Add local 'hash_set array_reduction_base_vars' declaration. Collect MEM_REF base-pointer SSA_NAMEs of arrays into array_reduction_base_vars. Add '&array_reduction_base_vars' argument to call of oacc_do_neutering. * omp-offload.cc (default_goacc_reduction): Add unshare_expr. * tree.cc (omp_clause_num_ops): Increase OMP_CLAUSE_REDUCTION ops to 6. * tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_EXPR):
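For readers following the LDS discussion in the v3 mail above, here is a minimal sketch of the policy as described there: grow the size in 0x600 steps, warn about occupancy, and warn again (without failing) past 64KB. It is a standalone model, not the code in gcn_shared_mem_layout; the names model_grow_lds_size, MODEL_LDS_INCR_UNIT and MODEL_LDS_ARCH_LIMIT are made up for illustration, and the starting size passed in is the caller's assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative constants only.  0x600 and 64KB are the figures quoted in
   the mail; the macro names are stand-ins, not copied from gcn.cc.  */
#define MODEL_LDS_INCR_UNIT 0x600u
#define MODEL_LDS_ARCH_LIMIT 0x10000u

/* Grow CURRENT in MODEL_LDS_INCR_UNIT steps until it covers NEEDED.
   Returns the resulting size and sets *OVER_LIMIT to 1 when that size
   exceeds the 64KB limit (modelled as a warning, not an error).  */
static unsigned int
model_grow_lds_size (unsigned int current, unsigned int needed,
                     int *over_limit)
{
  unsigned int size = current;

  assert (over_limit != NULL);

  if (needed > size)
    {
      /* Round the shortfall up to a whole number of increments.  */
      unsigned int steps
        = (needed - size + MODEL_LDS_INCR_UNIT - 1) / MODEL_LDS_INCR_UNIT;
      size += steps * MODEL_LDS_INCR_UNIT;
      fprintf (stderr, "warning: LDS size raised to %#x; "
               "this may decrease occupancy\n", size);
    }

  *over_limit = size > MODEL_LDS_ARCH_LIMIT;
  if (*over_limit)
    fprintf (stderr, "warning: LDS usage %#x exceeds the 64KB limit\n", size);

  return size;
}
```

Calling model_grow_lds_size (0x600, 0x601, &over) in this model returns 0xc00 with over left at 0: a one-byte shortfall still costs a full increment, which is why the occupancy warning matters.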
Re: [PATCH, OpenACC 2.7, v3] Adjust acc_map_data/acc_unmap_data interaction with reference counters
On 2024/4/12 3:14 PM, Thomas Schwinge wrote: >> I have re-tested the patch *without* the gomp_increment/decrement_refcount >> changes, >> and have these regressions (just to demonstrate what is affected): >> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c >> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 >> execution test >> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c >> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 >> execution test >> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c >> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 >> execution test >> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c >> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 >> execution test >> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c >> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 >> execution test >> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c >> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 >> execution test >> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c >> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 >> execution test >> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c >> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 >> execution test > ... are cases where we 'acc_map_data' something, and then invoke an > OpenACC compute construct with a data clause for the same memory region... > >> Now, I have also re-tested your version (aka, just break early and return >> when k->refcount == REFCOUNT_ACC_MAP_DATA) >> And for the record, that also works (no regressions). 
>> >> However, I strongly suggest we use my version here where we adjust the >> dynamic_refcount > ..., and it's confusing to me why such an OpenACC compute construct (which > is to use the structured reference counter) should then use the dynamic > reference counter, for 'acc_map_data'-mapped data? > >> simply because: *It is the whole point of this project item in OpenACC 2.7* >> >> The 2.7 spec articulated how increment/decrement interacts with >> acc_map_data/acc_unmap_data and this patch was supposed to make libgomp more >> conforming to it implementation-wise. >> (otherwise, no point in working on this at all, as there wasn't really >> anything behaviorally wrong about our implementation before) > That is, in my understanding, those 'gomp_increment_refcount' changes > don't affect the 'acc_map_data' reference counting, but instead, they > change the reference counting for OpenACC constructs that are originally > using structured reference counter to instead use the dynamic reference > counter. This doesn't seem conceptually right to me. (..., even if not > observable from the outside.) Okay, I've committed the attached patch, with the "early return upon k->refcount == REFCOUNT_ACC_MAP_DATA" in gomp_increment/decrement_refcount. If we continue to use k->refcount itself as the flag holder of map type, I guess we will not be able to directly determine whether it is a structured or dynamic adjustment at that point. We would probably need an entirely new field for that. I think we don't really need to do that right now. Thanks, Chung-Lin From a7578a077ed8b64b94282aa55faf7037690abbc5 Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang Date: Tue, 16 Apr 2024 09:03:21 + Subject: [PATCH] OpenACC 2.7: Adjust acc_map_data/acc_unmap_data interaction with reference counters This patch adjusts the implementation of acc_map_data/acc_unmap_data API library routines to better fit the description in the OpenACC 2.7 specification. 
Instead of using REFCOUNT_INFINITY, we now define a REFCOUNT_ACC_MAP_DATA special value to mark acc_map_data-created mappings. Adjustments to mapping-related code to respect OpenACC semantics are also added. libgomp/ChangeLog: * libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2). * oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA, initialize dynamic_refcount as 1. (acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA. (goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case. (goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA. (goacc_enter_data_internal): Add REFCOUNT_ACC_MAP_DATA case. * targe
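As a rough illustration of the "early return upon k->refcount == REFCOUNT_ACC_MAP_DATA" behaviour described in this message, the sketch below models it outside libgomp. It is a simplified stand-in, not the committed code: struct key_model and the model_* functions are hypothetical, the value used for REFCOUNT_SPECIAL is an assumption (only "REFCOUNT_SPECIAL | 2" comes from the ChangeLog), and the real functions handle other special values and bookkeeping that are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding for the sketch: special values occupy the top of the
   uintptr_t range.  Only the "| 2" part is taken from the ChangeLog.  */
#define REFCOUNT_SPECIAL (~(uintptr_t) 0x7)
#define REFCOUNT_ACC_MAP_DATA (REFCOUNT_SPECIAL | 2)

/* Hypothetical stand-in for a mapping key; not libgomp's splay tree key.  */
struct key_model
{
  uintptr_t refcount;          /* structured counter, or a special marker */
  uintptr_t dynamic_refcount;  /* dynamic counter */
};

/* Structured increment: leave acc_map_data-created mappings untouched.  */
static void
model_increment_refcount (struct key_model *k)
{
  assert (k != NULL);
  if (k->refcount == REFCOUNT_ACC_MAP_DATA)
    return;
  k->refcount++;
}

/* Structured decrement: same early return, so the marker is never
   counted down into an ordinary value.  */
static void
model_decrement_refcount (struct key_model *k)
{
  assert (k != NULL);
  if (k->refcount == REFCOUNT_ACC_MAP_DATA)
    return;
  if (k->refcount > 0)
    k->refcount--;
}
```

In this model a key created with refcount set to REFCOUNT_ACC_MAP_DATA and dynamic_refcount set to 1 keeps both values across a structured increment/decrement pair, which is the property the nested-1.c and pr92854-1.c cases depend on.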
[PING] Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Ping. BTW, the SC has approved the Nios II port already: http://gcc.gnu.org/ml/gcc/2013-07/msg00434.html The port is still awaiting technical review. Thanks, Chung-Lin On 13/7/14 3:54 PM, Chung-Lin Tang wrote: > Hi, the last ping of the Nios II patches was: > http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01416.html > > After assessing the state, we feel it would be better to post a > re-submission of the newest patches. > > The changes accumulated since the original post include: > > 1) Several bug fixes related to built-in function expanding. > 2) A few holes in hard-float FPU code generation were plugged. > 3) Support for parsing white-spaces in target attributes. > 4) Revision of consistency check behavior of codes in custom instruction > built-ins. > 5) Some new testcases. > > The issues raised by Joseph in the first round of reviewing have been > addressed. Testing has been re-done on both 32-bit and 64-bit hosts. > > PR55035, which affects nios2 among several other targets, appears not to > have been resolved yet, so a build configured with --enable-werror-always > still fails. > > As before, Sandra and I will serve as nios2 port maintainers. > > Attached is the patch for the compiler-proper. > > Thanks, > Chung-Lin > > 2013-07-14 Chung-Lin Tang > Sandra Loosemore > Based on patches from Altera Corporation > > * config.gcc (nios2-*-*): Add nios2 config targets. > * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case. > ("$cpu_type"): Add nios2 as new cpu type. > * configure: Regenerate. > * config/nios2/nios2.c: New file. > * config/nios2/nios2.h: New file. > * config/nios2/nios2-opts.h: New file. > * config/nios2/nios2-protos.h: New file. > * config/nios2/elf.h: New file. > * config/nios2/elf.opt: New file. > * config/nios2/linux.h: New file. > * config/nios2/nios2.opt: New file. > * config/nios2/nios2.md: New file. > * config/nios2/predicates.md: New file. > * config/nios2/constraints.md: New file. > * config/nios2/t-nios2: New file. 
> * common/config/nios2/nios2-common.c: New file. > * doc/invoke.texi (Nios II options): Document Nios II specific > options. > * doc/md.texi (Nios II family): Document Nios II specific > constraints. > * doc/extend.texi (Function Specific Option Pragmas): Document > Nios II supported target pragma functionality. >
Re: [PING] Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Ping. On 13/8/20 10:57 AM, Chung-Lin Tang wrote: > Ping. > > BTW, the SC has approved the Nios II port already: > http://gcc.gnu.org/ml/gcc/2013-07/msg00434.html > > The port is still awaiting technical review. > > Thanks, > Chung-Lin > > On 13/7/14 3:54 PM, Chung-Lin Tang wrote: >> Hi, the last ping of the Nios II patches was: >> http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01416.html >> >> After assessing the state, we feel it would be better to post a >> re-submission of the newest patches. >> >> The changes accumulated since the original post include: >> >> 1) Several bug fixes related to built-in function expanding. >> 2) A few holes in hard-float FPU code generation were plugged. >> 3) Support for parsing white-spaces in target attributes. >> 4) Revision of consistency check behavior of codes in custom instruction >> built-ins. >> 5) Some new testcases. >> >> The issues raised by Joseph in the first round of reviewing have been >> addressed. Testing has been re-done on both 32-bit and 64-bit hosts. >> >> PR55035, which affects nios2 among several other targets, appears not to >> have been resolved yet, so a build configured with --enable-werror-always >> still fails. >> >> As before, Sandra and I will serve as nios2 port maintainers. >> >> Attached is the patch for the compiler-proper. >> >> Thanks, >> Chung-Lin >> >> 2013-07-14 Chung-Lin Tang >> Sandra Loosemore >> Based on patches from Altera Corporation >> >> * config.gcc (nios2-*-*): Add nios2 config targets. >> * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case. >> ("$cpu_type"): Add nios2 as new cpu type. >> * configure: Regenerate. >> * config/nios2/nios2.c: New file. >> * config/nios2/nios2.h: New file. >> * config/nios2/nios2-opts.h: New file. >> * config/nios2/nios2-protos.h: New file. >> * config/nios2/elf.h: New file. >> * config/nios2/elf.opt: New file. >> * config/nios2/linux.h: New file. >> * config/nios2/nios2.opt: New file. >> * config/nios2/nios2.md: New file. 
>> * config/nios2/predicates.md: New file. >> * config/nios2/constraints.md: New file. >> * config/nios2/t-nios2: New file. >> * common/config/nios2/nios2-common.c: New file. >> * doc/invoke.texi (Nios II options): Document Nios II specific >> options. >> * doc/md.texi (Nios II family): Document Nios II specific >> constraints. >> * doc/extend.texi (Function Specific Option Pragmas): Document >> Nios II supported target pragma functionality. >> >
Re: [PATCH 0/6] Thread pointer built-in functions
This patch set has been committed, thanks to all maintainers who reviewed the respective parts. Thanks, Chung-Lin Full ChangeLog: 2012-10-11 Chung-Lin Tang * builtins.c (expand_builtin_thread_pointer): New. (expand_builtin_set_thread_pointer): New. (expand_builtin): Add BUILT_IN_THREAD_POINTER, BUILT_IN_SET_THREAD_POINTER expand cases. * builtins.def (BUILT_IN_THREAD_POINTER): New __builtin_thread_pointer builtin. (BUILT_IN_SET_THREAD_POINTER): New __builtin_set_thread_pointer builtin. * optabs.def (get_thread_pointer,set_thread_pointer): New standard names. * doc/md.texi (Standard Names): Document get_thread_pointer and set_thread_pointer patterns. * config/alpha/alpha.md (get_thread_pointerdi): Rename from load_tp. (set_thread_pointerdi): Rename from set_tp. * config/alpha/alpha.c (alpha_legitimize_address_1): Change gen_load_tp calls to gen_get_thread_pointerdi. (alpha_builtin): Remove ALPHA_BUILTIN_THREAD_POINTER, ALPHA_BUILTIN_SET_THREAD_POINTER. (code_for_builtin): Remove CODE_FOR_load_tp, CODE_FOR_set_tp. (alpha_init_builtins): Remove __builtin_thread_pointer, __builtin_set_thread_pointer machine-specific builtins. (alpha_expand_builtin_thread_pointer): Add hook function for TARGET_EXPAND_BUILTIN_THREAD_POINTER. (alpha_expand_builtin_set_thread_pointer): Add hook function for TARGET_EXPAND_BUILTIN_SET_THREAD_POINTER. (alpha_fold_builtin): Remove ALPHA_BUILTIN_THREAD_POINTER, ALPHA_BUILTIN_SET_THREAD_POINTER cases. * config/arm/arm.md (get_thread_pointersi): New pattern. * config/arm/arm-protos.h (arm_load_tp): Add extern declaration. * config/arm/arm.c (arm_load_tp): Remove static. (arm_builtins): Remove ARM_BUILTIN_THREAD_POINTER. (arm_init_tls_builtins): Remove function. (arm_init_builtins): Remove call to arm_init_tls_builtins(). (arm_expand_builtin): Remove ARM_BUILTIN_THREAD_POINTER case. * config/mips/mips.md (get_thread_pointer): New pattern. * config/mips/mips-protos.h (mips_expand_thread_pointer): Add extern declaration. 
* config/mips/mips.c (mips_expand_thread_pointer): Renamed from mips_get_tp. (mips_get_tp): New stub calling mips_expand_thread_pointer. * config/s390/s390.c (s390_builtin,code_for_builtin_64, code_for_builtin_31,s390_init_builtins,s390_expand_builtin): Remove. * config/s390/s390.md (get_tp_64,get_tp_31,set_tp_64,set_tp_31): Remove. (get_thread_pointer,set_thread_pointer): New, adapted from removed patterns. * config/xtensa/xtensa.md (get_thread_pointersi): Renamed from load_tp. (set_thread_pointersi): Renamed from set_tp. * config/xtensa/xtensa.c (xtensa_legitimize_tls_address): Change gen_load_tp calls to gen_get_thread_pointersi. (xtensa_builtin): Remove XTENSA_BUILTIN_THREAD_POINTER and XTENSA_BUILTIN_SET_THREAD_POINTER. (xtensa_init_builtins): Remove __builtin_thread_pointer, __builtin_set_thread_pointer machine-specific builtins. (xtensa_fold_builtin): Remove XTENSA_BUILTIN_THREAD_POINTER, XTENSA_BUILTIN_SET_THREAD_POINTER cases. (xtensa_expand_builtin): Remove XTENSA_BUILTIN_THREAD_POINTER, XTENSA_BUILTIN_SET_THREAD_POINTER cases.
Re: [PATCH 0/6] Thread pointer built-in functions / [SH] PR 54760
On 2012/10/12 06:55 AM, Oleg Endo wrote:
> This broke the recently added thread pointer built-ins on SH, but I was
> prepared for that, so no problem here. The attached patch is a straight
> forward fix.
>
> However, with the patch applied I get an ICE on one of the SH thread
> pointer tests: gcc/testsuite/gcc.target/sh/pr54760-3.c, function
> test04:
>
> internal compiler error: in expand_insn, at optabs.c:8208
> __builtin_set_thread_pointer (xx[i]);

Looks like I was supposed to use create_input_operand() there instead. I've committed the attached patch as obvious. This should be fixed now.

Thanks,
Chung-Lin

        * builtins.c (expand_builtin_set_thread_pointer): Use
        create_input_operand() instead of create_fixed_operand().

Index: builtins.c
===
--- builtins.c (revision 192421)
+++ builtins.c (revision 192422)
@@ -5776,7 +5776,7 @@
       struct expand_operand op;
       rtx val = expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX,
                              Pmode, EXPAND_NORMAL);
-      create_fixed_operand (&op, val);
+      create_input_operand (&op, val, Pmode);
       expand_insn (icode, 1, &op);
       return;
     }
Re: [PATCH, ARM] Fix PR44557 (Thumb-1 ICE)
On 12/9/27 6:25 AM, Janis Johnson wrote: > On 09/26/2012 01:58 AM, Chung-Lin Tang wrote: > > +/* { dg-do compile } */ > +/* { dg-options "-mthumb -O1 -march=armv5te -fno-omit-frame-pointer > -fno-forward-propagate" } */ > +/* { dg-require-effective-target arm_thumb1_ok } */ > > This test will fail to compile for test flags that conflict with > the -march option, and the specified -march option might be > overridden with similar options from other test flags. The problem > might have also been seen for other -march options. I recommend > leaving it off and omitting the dg-require so the test can be run > for more multilibs. I'm not sure, as the intent is to test a Thumb-1 case here. If the maintainers think we should adjust the testcase, I'm of course fine with it. And ping for the patch. Thanks, Chung-Lin
[PATCH][xtensa] Remove unused variable
Hi Sterling,
the last thread pointer builtin changes left an unused 'arg' variable in xtensa_expand_builtin(), which triggered a new warning. Thanks to Jan-Benedict for testing this. Attached patch was committed as obvious.

Thanks,
Chung-Lin

        * config/xtensa/xtensa.c (xtensa_expand_builtin): Remove unused 'arg'
        variable.

Index: config/xtensa/xtensa.c
===
--- config/xtensa/xtensa.c (revision 192647)
+++ config/xtensa/xtensa.c (working copy)
@@ -3133,7 +3133,6 @@ xtensa_expand_builtin (tree exp, rtx target,
 {
   tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   unsigned int fcode = DECL_FUNCTION_CODE (fndecl);
-  rtx arg;
 
   switch (fcode)
     {
Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).
On 2013/10/6 05:57 PM, Richard Sandiford wrote: >> > But case 16 is different. >> > This case is only produced at prologue/epilogue phase, using a temporary >> > register $r15 to hold a large constant for adjusting stack pointer. >> > Since prologue/epilogue is after split1/split2 phase, we can only >> > output "sethi" + "ori" directly. >> > (The "addi" instruction with $r15 is a 32-bit instruction.) > But this code is in the output template of the define_insn. That code > is only executed during final, after all passes have been run. If the > template returns "#", final will split the instruction itself, which is > possible even at that late stage. "#" doesn't have any effect on the > passes themselves. > > (FWIW, there's also a split3 pass that runs after prologue/epilogue > generation but before sched2.) > > However, ISTR there is/was a rule that prologue instructions shouldn't > be split, since they'd lose their RTX_FRAME_RELATED_P bit or something. > Maybe you hit an ICE because of that? > > Another way to handle this would be to have the movsi expander split > large constant moves. When can_create_pseudo_p (), the intermediate > results can be stored in new registers, otherwise they should reuse > operands[0]. Two advantages to doing it that way are that high parts > can be shared before RA, and that calls to emit_move_insn from the > prologue code will split the move automatically. I think many ports > do it that way (including MIPS FWIW). FWIW, most ports usually just handle such "large adjustment" cases in the prologue/epilogue code manually; either multiple SP-adjustments, or use of a temp register (better control of RTX_FRAME_RELATED_P anyways). You might be able to get it to work, but trying to rely on the splitter does not seem like best practice... Chung-Lin
Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).
On 2013/10/6 06:33 PM, Richard Sandiford wrote: > Chung-Lin Tang writes: >> On 2013/10/6 05:57 PM, Richard Sandiford wrote: >>>>> But case 16 is different. >>>>> This case is only produced at prologue/epilogue phase, using a temporary >>>>> register $r15 to hold a large constant for adjusting stack pointer. >>>>> Since prologue/epilogue is after split1/split2 phase, we can only >>>>> output "sethi" + "ori" directly. >>>>> (The "addi" instruction with $r15 is a 32-bit instruction.) >>> But this code is in the output template of the define_insn. That code >>> is only executed during final, after all passes have been run. If the >>> template returns "#", final will split the instruction itself, which is >>> possible even at that late stage. "#" doesn't have any effect on the >>> passes themselves. >>> >>> (FWIW, there's also a split3 pass that runs after prologue/epilogue >>> generation but before sched2.) >>> >>> However, ISTR there is/was a rule that prologue instructions shouldn't >>> be split, since they'd lose their RTX_FRAME_RELATED_P bit or something. >>> Maybe you hit an ICE because of that? >>> >>> Another way to handle this would be to have the movsi expander split >>> large constant moves. When can_create_pseudo_p (), the intermediate >>> results can be stored in new registers, otherwise they should reuse >>> operands[0]. Two advantages to doing it that way are that high parts >>> can be shared before RA, and that calls to emit_move_insn from the >>> prologue code will split the move automatically. I think many ports >>> do it that way (including MIPS FWIW). >> >> FWIW, most ports usually just handle such "large adjustment" cases in >> the prologue/epilogue code manually; either multiple SP-adjustments, or >> use of a temp register (better control of RTX_FRAME_RELATED_P anyways). >> You might be able to get it to work, but trying to rely on the splitter >> does not seem like best practice... 
> > To be clear, I wasn't talking about relying on the splitter in the > define_split sense. I was saying that the move expanders could > split large constants. Okay, I sort of missed the context. > MIPS prologue code does use emit_move_insn to move large constants, > which automatically produces a split form from the outset. I don't > really agree that it's bad practice. I think that's mostly the same as what I meant by "manually"; it seems that there's lots of MIPS backend machinery starting from mips_legitimize_move(), so it's not really "automatic" ;) Chung-Lin
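To make the "sethi" + "ori" pair discussed in this thread concrete, here is a small sketch of splitting a 32-bit constant into a high part and a low part and then rebuilding it in two steps. The 20-bit/12-bit field split is an assumption made for illustration, and the struct and function names are hypothetical; this is plain arithmetic, not code from the nds32 port or from any expander.

```c
#include <assert.h>
#include <stdint.h>

/* The two immediates a "load high, then OR in low" sequence would carry.
   The field widths (20 and 12 bits) are assumed for this sketch.  */
struct split_const
{
  uint32_t hi20;  /* upper 20 bits, as a "sethi"-style immediate */
  uint32_t lo12;  /* lower 12 bits, as an "ori"-style immediate */
};

/* Split VALUE into its high and low immediates.  */
static struct split_const
split_large_const (uint32_t value)
{
  struct split_const parts;
  parts.hi20 = value >> 12;
  parts.lo12 = value & 0xfffu;
  return parts;
}

/* Rebuild the constant the way the two-instruction sequence would.  */
static uint32_t
rebuild_const (struct split_const parts)
{
  uint32_t reg;

  assert (parts.hi20 <= 0xfffffu && parts.lo12 <= 0xfffu);
  reg = parts.hi20 << 12;  /* models "sethi": set upper bits, clear lower */
  reg |= parts.lo12;       /* models "ori": fill in the low bits */
  return reg;
}
```

For example, 0x12345678 splits into hi20 = 0x12345 and lo12 = 0x678 under these assumptions. Whether the split is emitted by a move expander or written out by hand in the prologue code, as debated above, the arithmetic is the same; the difference is where RTX_FRAME_RELATED_P gets controlled.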
Re: [PATCH] OpenACC use_device clause ICE fix
On 2016/1/20 09:17 PM, Bernd Schmidt wrote: > On 01/05/2016 02:15 PM, Chung-Lin Tang wrote: >> * omp-low.c (scan_sharing_clauses): Call add_local_decl() for >> use_device/use_device_ptr variables. > > It looks vaguely plausible, but if everything is part of the host > function, why make a copy of the decl at all? I.e. what happens if you > just remove the install_var_local call? Because (only) inside the OpenMP context, the variable is supposed to contain the device-side value; a runtime call is used to obtain the value from the device back to host. So a new variable is created, the remap_decl mechanisms are used to change references inside the omp context, and other references of the original variable are not touched.
Re: [PATCH] OpenACC use_device clause ICE fix
On 2016/1/22 12:32 AM, Jakub Jelinek wrote:
> On Thu, Jan 21, 2016 at 10:22:19PM +0800, Chung-Lin Tang wrote:
>> On 2016/1/20 09:17 PM, Bernd Schmidt wrote:
>>> On 01/05/2016 02:15 PM, Chung-Lin Tang wrote:
>>>> * omp-low.c (scan_sharing_clauses): Call add_local_decl() for
>>>> use_device/use_device_ptr variables.
>>>
>>> It looks vaguely plausible, but if everything is part of the host
>>> function, why make a copy of the decl at all? I.e. what happens if you
>>> just remove the install_var_local call?
>>
>> Because (only) inside the OpenMP context, the variable is supposed to
>> contain the device-side value; a runtime call is used to obtain the
>> value from the device back to host. So a new variable is created, the
>> remap_decl mechanisms are used to change references inside the omp
>> context, and other references of the original variable are not touched.
>
> The patch looks wrong to me, the var shouldn't be actually used,
> it is supposed to have DECL_VALUE_EXPR set for it during omp lowering and
> the following gimplification is supposed to replace it.
>
> I've tried the testcases you've listed and couldn't get an ICE, so, if you
> see some ICE, can you mail the testcase (in patch form)?
> Perhaps there is something wrong with the OpenACC lowering?
>
> Jakub
>

I've attached a small testcase that triggers the ICE under -fopenacc. This still happens under current trunk.

Thanks,
Chung-Lin

void foo (float *x, float *y)
{
  int n = 1 << 20;
#pragma acc data create(x[0:n]) copyout(y[0:n])
  {
#pragma acc host_data use_device(x,y)
    {
      for (int i = 1 ; i < n; i++)
        y[0] += x[i] * y[i];
    }
  }
}
Re: [PATCH] OpenACC use_device clause ICE fix
On 2016/1/25 7:06 PM, Jakub Jelinek wrote: > The following ICEs without the patch and works with it, so I think it is > better: > > 2016-01-25 Jakub Jelinek > > * omp-low.c (lower_omp_target) : Set > DECL_VALUE_EXPR of new_var even for the non-array case. Look > through DECL_VALUE_EXPR for expansion. > > * c-c++-common/goacc/use_device-1.c: New test. Thanks, the test was indeed just a reduction of a whole example program, which I'm not sure we're at liberty to directly include in the testsuite. I've verified that the patch allows the program to build and run correctly. Thanks, Chung-Lin
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Ping. On 2013/11/26 02:45 PM, Chung-Lin Tang wrote: > Hi Bernd, > I've updated the patch again, please see if it looks fit for approval > now. Including ChangeLog again for completeness. > > Thanks, > Chung-Lin > > 2013-11-26 Chung-Lin Tang > Sandra Loosemore > Based on patches from Altera Corporation > > * config.gcc (nios2-*-*): Add nios2 config targets. > * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case. > ("$cpu_type"): Add nios2 as new cpu type. > * configure: Regenerate. > * config/nios2/nios2.c: New file. > * config/nios2/nios2.h: New file. > * config/nios2/nios2-opts.h: New file. > * config/nios2/nios2-protos.h: New file. > * config/nios2/elf.h: New file. > * config/nios2/elf.opt: New file. > * config/nios2/linux.h: New file. > * config/nios2/nios2.opt: New file. > * config/nios2/nios2.md: New file. > * config/nios2/predicates.md: New file. > * config/nios2/constraints.md: New file. > * config/nios2/t-nios2: New file. > * common/config/nios2/nios2-common.c: New file. > * doc/invoke.texi (Nios II options): Document Nios II specific > options. > * doc/md.texi (Nios II family): Document Nios II specific > constraints. > * doc/extend.texi (Function Specific Option Pragmas): Document > Nios II supported target pragma functionality. >
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Ping x2. On 2013/12/5 12:19 PM, Chung-Lin Tang wrote: > Ping. > > On 2013/11/26 02:45 PM, Chung-Lin Tang wrote: >> Hi Bernd, >> I've updated the patch again, please see if it looks fit for approval >> now. Including ChangeLog again for completeness. >> >> Thanks, >> Chung-Lin >> >> 2013-11-26 Chung-Lin Tang >> Sandra Loosemore >> Based on patches from Altera Corporation >> >> * config.gcc (nios2-*-*): Add nios2 config targets. >> * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case. >> ("$cpu_type"): Add nios2 as new cpu type. >> * configure: Regenerate. >> * config/nios2/nios2.c: New file. >> * config/nios2/nios2.h: New file. >> * config/nios2/nios2-opts.h: New file. >> * config/nios2/nios2-protos.h: New file. >> * config/nios2/elf.h: New file. >> * config/nios2/elf.opt: New file. >> * config/nios2/linux.h: New file. >> * config/nios2/nios2.opt: New file. >> * config/nios2/nios2.md: New file. >> * config/nios2/predicates.md: New file. >> * config/nios2/constraints.md: New file. >> * config/nios2/t-nios2: New file. >> * common/config/nios2/nios2-common.c: New file. >> * doc/invoke.texi (Nios II options): Document Nios II specific >> options. >> * doc/md.texi (Nios II family): Document Nios II specific >> constraints. >> * doc/extend.texi (Function Specific Option Pragmas): Document >> Nios II supported target pragma functionality. >> >
Re: [PATCH] Hexadecimal numbers in option arguments
On 2013/7/14 09:27 PM, Joseph S. Myers wrote:
> On Sun, 14 Jul 2013, Chung-Lin Tang wrote:
>
>> Original patch posted as part of Nios II patches:
>> http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01087.html
>>
>> This patch is to allow hexadecimal numbers to be used in option
>> arguments, e.g. -falign-loops=0x10 can now be used as equivalent to
>> -falign-loops=16.
>>
>> Joseph, the patch has been modified to use ISXDIGIT to check the argument
>> string first, as you suggested in the last submission. Is this okay for
>> trunk?
>
> This version looks like it will allow plain "0x" or "0X" as an argument,
> treating it as 0, rather than treating it as an error (i.e., you need to
> check there is at least one hex digit after the "0x" or "0X" before
> passing the string to strtol).
>

Hi Joseph,
Forgot to follow up on this patch. Here it is with a small update to
check whether 'p' was advanced to a different position, i.e. that at least
one hex digit follows the prefix. Does this now look okay?

Thanks,
Chung-Lin

Index: opts-common.c
===
--- opts-common.c (revision 205847)
+++ opts-common.c (working copy)
@@ -147,7 +147,7 @@ find_opt (const char *input, unsigned int lang_mas
   return match_wrong_lang;
 }
 
-/* If ARG is a non-negative integer made up solely of digits, return its
+/* If ARG is a non-negative decimal or hexadecimal integer, return its
    value, otherwise return -1.  */
 
 int
@@ -161,6 +161,17 @@ integral_argument (const char *arg)
   if (*p == '\0')
     return atoi (arg);
 
+  /* It wasn't a decimal number - try hexadecimal.  */
+  if (arg[0] == '0' && (arg[1] == 'x' || arg[1] == 'X'))
+    {
+      p = arg + 2;
+      while (*p && ISXDIGIT (*p))
+        p++;
+
+      if (p != arg + 2 && *p == '\0')
+        return strtol (arg, NULL, 16);
+    }
+
   return -1;
 }
 
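For illustration, here is a minimal standalone sketch of the check discussed in this message: a "0x"/"0X" prefix is accepted only when at least one hex digit follows it. This is not the GCC source. `parse_integral_argument` is a hypothetical name, and the standard `isdigit`/`isxdigit` stand in for GCC's `ISDIGIT`/`ISXDIGIT` macros.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patched integral_argument(): return the
   value of a non-negative decimal or hexadecimal string, or -1 if ARG is
   neither.  */
int
parse_integral_argument (const char *arg)
{
  const char *p = arg;

  /* Decimal: every character must be a digit.  */
  while (*p && isdigit ((unsigned char) *p))
    p++;
  if (*p == '\0')
    return atoi (arg);

  /* Hexadecimal: "0x" or "0X" followed by at least one hex digit.  */
  if (arg[0] == '0' && (arg[1] == 'x' || arg[1] == 'X'))
    {
      p = arg + 2;
      while (*p && isxdigit ((unsigned char) *p))
        p++;

      /* p != arg + 2 rejects a bare "0x"; *p == '\0' rejects trailing junk.  */
      if (p != arg + 2 && *p == '\0')
        return (int) strtol (arg, NULL, 16);
    }

  return -1;
}
```

With this sketch, "0x10" and "16" both yield 16, while a bare "0x" or a string such as "0xg" yields -1 instead of being silently read as 0.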
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Ping x3. On 13/12/10 12:57 PM, Chung-Lin Tang wrote: > Ping x2. > > On 2013/12/5 12:19 PM, Chung-Lin Tang wrote: >> Ping. >> >> On 2013/11/26 02:45 PM, Chung-Lin Tang wrote: >>> Hi Bernd, >>> I've updated the patch again, please see if it looks fit for approval >>> now. Including ChangeLog again for completeness. >>> >>> Thanks, >>> Chung-Lin >>> >>> 2013-11-26 Chung-Lin Tang >>> Sandra Loosemore >>> Based on patches from Altera Corporation >>> >>> * config.gcc (nios2-*-*): Add nios2 config targets. >>> * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case. >>> ("$cpu_type"): Add nios2 as new cpu type. >>> * configure: Regenerate. >>> * config/nios2/nios2.c: New file. >>> * config/nios2/nios2.h: New file. >>> * config/nios2/nios2-opts.h: New file. >>> * config/nios2/nios2-protos.h: New file. >>> * config/nios2/elf.h: New file. >>> * config/nios2/elf.opt: New file. >>> * config/nios2/linux.h: New file. >>> * config/nios2/nios2.opt: New file. >>> * config/nios2/nios2.md: New file. >>> * config/nios2/predicates.md: New file. >>> * config/nios2/constraints.md: New file. >>> * config/nios2/t-nios2: New file. >>> * common/config/nios2/nios2-common.c: New file. >>> * doc/invoke.texi (Nios II options): Document Nios II specific >>> options. >>> * doc/md.texi (Nios II family): Document Nios II specific >>> constraints. >>> * doc/extend.texi (Function Specific Option Pragmas): Document >>> Nios II supported target pragma functionality. >>> >> >
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
On 13/12/23 12:54 AM, Chung-Lin Tang wrote: >> Other than these two, I think this can go in. >> > Bernd > Attached is the updated patch for the compiler. > > Since Bernd is a Global Reviewer, am I clear for committing the port > now? (including the testsuite and libgcc parts) I will be taking Bernd's prior mail as an approval. For avoidance of doubt, unless there are more comments raised, I will be committing the port to trunk next week. Thanks, Chung-Lin
nios2 port committed (Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts)
On 2013/12/28 02:29 PM, Chung-Lin Tang wrote: > On 13/12/23 12:54 AM, Chung-Lin Tang wrote: >>> Other than these two, I think this can go in. >>>> Bernd >> Attached is the updated patch for the compiler. >> >> Since Bernd is a Global Reviewer, am I clear for committing the port >> now? (including the testsuite and libgcc parts) > > I will be taking Bernd's prior mail as an approval. For avoidance of > doubt, unless there are more comments raised, I will be committing the > port to trunk next week. The nios2 port was just committed. Thanks to all that gave time and effort to review this. Thanks, Chung-Lin
Re: [buildrobot] [PATCH] Fix redefinition of BITS_PER_UNIT
On 2014/1/1 02:45 PM, Mike Stump wrote: > On Dec 31, 2013, at 12:26 PM, Jan-Benedict Glaw wrote: >> On Tue, 2013-12-31 15:24:52 +0800, Chung-Lin Tang >> wrote: >>> The nios2 port was just committed. Thanks to all that gave time and >>> effort to review this. >> >> Just a heads-up: I see a lot of warnings about BITS_PER_UNIT being >> redefined, see eg. >> http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=74923 >> as an example. >> >> >> 2013-12-31 Jan-Benedict Glaw >> >> * config/nios2/nios2.h (BITS_PER_UNIT): Don't define it. >> >> diff --git a/gcc/config/nios2/nios2.h b/gcc/config/nios2/nios2.h >> index 8e6941b..f333be3 100644 >> --- a/gcc/config/nios2/nios2.h >> +++ b/gcc/config/nios2/nios2.h >> @@ -73,7 +73,6 @@ >> #define BITS_BIG_ENDIAN 0 >> #define BYTES_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0) >> #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0) >> -#define BITS_PER_UNIT 8 >> #define BITS_PER_WORD 32 >> #define UNITS_PER_WORD 4 >> #define POINTER_SIZE 32 >> >> >> Ok? > > Ok. > Thanks for catching that.
[PATCH, gomp4] Propagate independent clause for OpenACC kernels pass
Hi Tom,
this patch provides a 'bool independent' field in struct loop, which
will be switched on by an "independent" clause in a #pragma acc loop directive.
I assume you'll be wiring it to the kernels parloops pass in a followup patch.

Note: there are already a few other similar fields in struct loop, namely
'safelen' and 'can_be_parallel', used by OMP simd safelen and GRAPHITE respectively.
The intention and/or setting of these fields are all a bit different, so I've
decided to add a new bool for OpenACC.

Tested and committed to gomp-4_0-branch.

Chung-Lin

2015-07-14  Chung-Lin Tang

        * cfgloop.h (struct loop): Add 'bool marked_independent' field.
        * gimplify.c (gimplify_scan_omp_clauses): Keep OMP_CLAUSE_INDEPENDENT.
        * omp-low.c (struct omp_region): Add 'int kind' and
        'bool independent' fields.
        (expand_omp_for): Set 'marked_independent' field for loop
        corresponding to region.
        (find_omp_for_region_data): New function.
        (find_omp_target_region_data): Set kind field.
        (build_omp_regions_1): Call find_omp_for_region_data() for
        GIMPLE_OMP_FOR statements.

Index: cfgloop.h
===
--- cfgloop.h (revision 225758)
+++ cfgloop.h (working copy)
@@ -194,6 +194,10 @@ struct GTY ((chain_next ("%h.next"))) loop {
   /* True if the loop is part of an oacc kernels region.  */
   bool in_oacc_kernels_region;
 
+  /* True if loop is tagged as having independent iterations by user,
+     e.g. the OpenACC independent clause.  */
+  bool marked_independent;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
      by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
      builtins.  */
Index: gimplify.c
===
--- gimplify.c (revision 225758)
+++ gimplify.c (working copy)
@@ -6602,7 +6602,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se
       break;
 
     case OMP_CLAUSE_DEVICE_RESIDENT:
-    case OMP_CLAUSE_INDEPENDENT:
       remove = true;
       break;
 
@@ -6612,6 +6611,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se
     case OMP_CLAUSE_COLLAPSE:
     case OMP_CLAUSE_AUTO:
     case OMP_CLAUSE_SEQ:
+    case OMP_CLAUSE_INDEPENDENT:
     case OMP_CLAUSE_MERGEABLE:
     case OMP_CLAUSE_PROC_BIND:
     case OMP_CLAUSE_SAFELEN:
Index: omp-low.c
===
--- omp-low.c (revision 225758)
+++ omp-low.c (working copy)
@@ -136,8 +136,16 @@ struct omp_region
   /* True if this is nested inside an OpenACC kernels construct.  */
   bool inside_kernels_p;
 
+  /* Records a generic kind field.  */
+  int kind;
+
   /* For an OpenACC loop, the level of parallelism requested.  */
   int gwv_this;
+
+  /* For an OpenACC loop directive, true if has the 'independent' clause.  */
+  bool independent;
+
+  tree broadcast_array;
 };
 
 /* Context structure.  Used to store information about each parallel
@@ -8273,8 +8281,15 @@ expand_omp_for (struct omp_region *region, gimple
     loops_state_set (LOOPS_NEED_FIXUP);
 
   if (region->inside_kernels_p)
-    expand_omp_for_generic (region, &fd, BUILT_IN_NONE, BUILT_IN_NONE,
-                            inner_stmt);
+    {
+      expand_omp_for_generic (region, &fd, BUILT_IN_NONE, BUILT_IN_NONE,
+                              inner_stmt);
+      if (region->independent && region->cont->loop_father)
+        {
+          struct loop *loop = region->cont->loop_father;
+          loop->marked_independent = true;
+        }
+    }
   else if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_SIMD)
     expand_omp_simd (region, &fd);
   else if (gimple_omp_for_kind (fd.for_stmt) == GF_OMP_FOR_KIND_CILKFOR)
@@ -9943,6 +9958,34 @@ find_omp_for_region_gwv (gimple stmt)
   return tmp;
 }
 
+static void
+find_omp_for_region_data (struct omp_region *region, gomp_for *stmt)
+{
+  region->gwv_this = find_omp_for_region_gwv (stmt);
+  region->kind = gimple_omp_for_kind (stmt);
+
+  if (region->kind == GF_OMP_FOR_KIND_OACC_LOOP)
+    {
+      struct omp_region *target_region = region->outer;
+      while (target_region
+             && target_region->type != GIMPLE_OMP_TARGET)
+        target_region = target_region->outer;
+      if (!target_region)
+        return;
+
+      tree clauses = gimple_omp_for_clauses (stmt);
+
+      if (target_region->kind == GF_OMP_TARGET_KIND_OACC_PARALLEL
+          && !find_omp_clause (clauses, OMP_CLAUSE_SEQ))
+        /* In OpenACC parallel constructs, 'independent' is implied on all
+           loop directives without a 'seq' clause.  */
+        region->independent = true;
+      else if (target_region->kind == GF_OMP_TARGET_KIND_OACC_KERNELS
+               && find_omp_clause (clauses, OMP_CLAUSE_INDEPENDENT))
+        region->independent = true;
+    }
+}
+
 /* Fill in additional da
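The rule that find_omp_for_region_data applies in the patch above can be restated outside of GCC's data structures. The following is only a sketch under assumed names: `enum region_kind`, `struct loop_clauses` and `loop_is_independent` are hypothetical and do not exist in omp-low.c. It shows the two cases: inside a parallel construct independence is implied unless 'seq' is given, while inside a kernels construct it requires an explicit 'independent' clause.

```c
#include <stdbool.h>

/* Hypothetical model of the enclosing OpenACC compute construct.  */
enum region_kind
{
  REGION_OACC_PARALLEL,
  REGION_OACC_KERNELS,
  REGION_OTHER
};

/* Hypothetical summary of the clauses found on a '#pragma acc loop'.  */
struct loop_clauses
{
  bool has_seq;
  bool has_independent;
};

/* Return true if the loop should be treated as having independent
   iterations, following the two cases handled in the patch.  */
bool
loop_is_independent (enum region_kind enclosing, struct loop_clauses clauses)
{
  if (enclosing == REGION_OACC_PARALLEL)
    /* Implied on every loop directive that lacks 'seq'.  */
    return !clauses.has_seq;

  if (enclosing == REGION_OACC_KERNELS)
    /* Only when the user wrote 'independent' explicitly.  */
    return clauses.has_independent;

  return false;
}
```

Note that the sketch mirrors the patch in one detail that may be surprising: an explicit 'independent' clause is not consulted at all in the parallel case, since the absence of 'seq' already decides the answer.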
Re: [PATCH, gomp4] Propagate independent clause for OpenACC kernels pass
On 15/7/14 3:00 PM, Jakub Jelinek wrote:
> On Tue, Jul 14, 2015 at 01:46:04PM +0800, Chung-Lin Tang wrote:
>> this patch provides a 'bool independent' field in struct loop, which
>> will be switched on by an "independent" clause in a #pragma acc loop
>> directive.
>> I assume you'll be wiring it to the kernels parloops pass in a followup
>> patch.
>>
>> Note: there are already a few other similar fields in struct loop, namely
>> 'safelen' and 'can_be_parallel', used by OMP simd safelen and GRAPHITE
>> respectively.
>> The intention and/or setting of these fields are all a bit different, so I've
>> decided to add a new bool for OpenACC.
>
> How is it different though? Can you cite exact definition of the
> independent clause vs. safelen (set to INT_MAX)?
> The OpenMP definition is:
> "A SIMD loop has logical iterations numbered 0,1,...,N-1 where N is the
> number of loop iterations, and the logical numbering denotes the sequence in
> which the iterations would
> be executed if the associated loop(s) were executed with no SIMD
> instructions. If the safelen
> clause is used then no two iterations executed concurrently with SIMD
> instructions can have a
> greater distance in the logical iteration space than its value."
> ...
> "Lexical forward dependencies in the iterations of the
> original loop must be preserved within each SIMD chunk."

The wording of OpenACC independent is simpler:
"... the independent clause tells the implementation that the iterations of this loop
are data-independent with respect to each other." -- OpenACC spec 2.7.9

I would say this implies even more relaxed conditions than OpenMP simd safelen,
essentially saying that the compiler doesn't even need dependence analysis; it can
just assume independence of iterations.

> So e.g. safelen >= 32 means for PTX you can safely implement it by
> running up to 32 consecutive iterations by all threads in the warp
> (assuming code that for some reason must be run by a single thread
> (e.g. calls to functions that are marked so that they expect to be run
> by the first thread in a warp initially) is run sequentially by increasing
> iterator), but it doesn't mean the iterations have no dependencies in between
> them whatsoever (see the above note about lexical forward dependencies),
> so you can't parallelize it by assigning different iterations to different
> threads outside of warp (or pthread_create created threads).
> So if OpenACC independent means there are no dependencies in between
> iterations, the OpenMP counterpart here is #pragma omp for simd schedule
> (auto)
> or #pragma omp distribute parallel for simd schedule (auto).

schedule(auto) appears to correspond to the OpenACC 'auto' clause, or
what is implied in a kernels compute construct, but I'm not sure it implies
no dependencies between iterations?

Putting aside the semantic issues, safelen>0 currently turns on a certain amount of
vectorization code that we are not using (and are not likely to use at all for nvptx).
Right now, we're just trying to pass the new flag to a kernels tree-parloops based pass.

Maybe this can all be reconciled later in a more precise way, e.g. have flags that correspond
specifically to phases of internal compiler passes (and selected by needs of the accel target),
instead of ones that are "sort of" associated with high-level language features.

Chung-Lin
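To make the distinction discussed above concrete, the following sketch (not from the thread) uses a loop with a lexical forward dependence: statement S2 reads the value that S1 wrote one iteration earlier. All function names are hypothetical, and CHUNK is an assumed stand-in for a SIMD width. The chunked version models the execution order that the quoted OpenMP wording permits; the reversed version models one of the arbitrary orders that would only be safe if iterations were truly data-independent.

```c
#include <stdbool.h>
#include <string.h>

#define N 16
#define CHUNK 4 /* assumed SIMD width, for illustration only */

static void
init (int *a, int *b, int *c)
{
  for (int i = 0; i < N; i++)
    {
      a[i] = 0;
      b[i] = i;
      c[i] = 0;
    }
}

/* Reference: iterations run one after another in increasing order.  */
static void
run_sequential (int *a, const int *b, int *c)
{
  for (int i = 1; i < N; i++)
    {
      a[i] = b[i] + 1;     /* S1 */
      c[i] = a[i - 1] * 2; /* S2: reads what S1 wrote in iteration i-1 */
    }
}

/* SIMD-like order: within a chunk, S1 runs for all lanes, then S2.  */
static void
run_simd_chunks (int *a, const int *b, int *c)
{
  for (int s = 1; s < N; s += CHUNK)
    {
      int e = s + CHUNK < N ? s + CHUNK : N;
      for (int i = s; i < e; i++)
        a[i] = b[i] + 1;
      for (int i = s; i < e; i++)
        c[i] = a[i - 1] * 2;
    }
}

/* One arbitrary iteration order: highest iteration first.  */
static void
run_reversed (int *a, const int *b, int *c)
{
  for (int i = N - 1; i >= 1; i--)
    {
      a[i] = b[i] + 1;
      c[i] = a[i - 1] * 2;
    }
}

/* True if the chunked order produces the same arrays as the reference.  */
bool
simd_chunks_match_sequential (void)
{
  int a1[N], b1[N], c1[N], a2[N], b2[N], c2[N];
  init (a1, b1, c1);
  init (a2, b2, c2);
  run_sequential (a1, b1, c1);
  run_simd_chunks (a2, b2, c2);
  return memcmp (a1, a2, sizeof a1) == 0 && memcmp (c1, c2, sizeof c1) == 0;
}

/* True if the reversed order produces the same arrays as the reference.  */
bool
reversed_matches_sequential (void)
{
  int a1[N], b1[N], c1[N], a2[N], b2[N], c2[N];
  init (a1, b1, c1);
  init (a2, b2, c2);
  run_sequential (a1, b1, c1);
  run_reversed (a2, b2, c2);
  return memcmp (a1, a2, sizeof a1) == 0 && memcmp (c1, c2, sizeof c1) == 0;
}
```

In this sketch the chunked order agrees with the sequential result while the reversed order does not, which is the sense in which a loop can be acceptable for SIMD chunks yet not be "data-independent" in the OpenACC wording.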
[PATCH, nios2] Remove unused header from libgcc linux-atomic.c
The header was used back when Nios II Linux used a syscall cmpxchg,
long since removed and actually never got into the FSF trunk.
Patch removes the #include, and the following error code #defines
which are all no longer used. Committed.

Chung-Lin

2015-07-22  Chung-Lin Tang

        * config/nios2/linux-atomic.c (): Remove #include.
        (EFAULT,EBUSY,ENOSYS): Delete unused #defines.

Index: config/nios2/linux-atomic.c
===
--- config/nios2/linux-atomic.c (revision 226061)
+++ config/nios2/linux-atomic.c (working copy)
@@ -20,11 +20,6 @@ a copy of the GCC Runtime Library Exception along
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
-#include
-#define EFAULT 14
-#define EBUSY 16
-#define ENOSYS 38
-
 /* We implement byte, short and int versions of each atomic operation
    using the kernel helper defined below.  There is no support for
    64-bit operations yet.  */
[PATCH, libgomp] PR 67141, uninitialized acc_device_lock mutex
Hi,
this patch fixes the uninitialized acc_device_lock mutex situation
reported in PR 67141. The patch attached on the bugzilla page
tries to solve it by constructor priorities, which we think will
probably be less manageable in general.

This patch changes goacc_host_init() to be called from
goacc_runtime_initialize() instead, thereby ensuring the init order.
libgomp testsuite was re-run without regressions, okay for trunk?

Thanks,
Chung-Lin

2015-09-18  Chung-Lin Tang

        PR libgomp/67141

        * oacc-int.h (goacc_host_init): Add declaration.
        * oacc-host.c (goacc_host_init): Remove static and
        constructor attribute
        * oacc-init.c (goacc_runtime_initialize): Call goacc_host_init()
        at end.

Index: oacc-host.c
===
--- oacc-host.c (revision 227895)
+++ oacc-host.c (working copy)
@@ -256,7 +256,7 @@ static struct gomp_device_descr host_dispatch =
   };
 
 /* Initialize and register this device type.  */
-static __attribute__ ((constructor)) void
+void
 goacc_host_init (void)
 {
   gomp_mutex_init (&host_dispatch.lock);
Index: oacc-int.h
===
--- oacc-int.h (revision 227895)
+++ oacc-int.h (working copy)
@@ -97,6 +97,7 @@ void goacc_runtime_initialize (void);
 void goacc_save_and_set_bind (acc_device_t);
 void goacc_restore_bind (void);
 void goacc_lazy_initialize (void);
+void goacc_host_init (void);
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility pop
Index: oacc-init.c
===
--- oacc-init.c (revision 227895)
+++ oacc-init.c (working copy)
@@ -644,6 +644,9 @@ goacc_runtime_initialize (void)
 
   goacc_threads = NULL;
   gomp_mutex_init (&goacc_thread_lock);
+
+  /* Initialize and register the 'host' device type.  */
+  goacc_host_init ();
 }
 
 /* Compiler helper functions */
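The ordering argument can be shown with a small standalone sketch. It does not use libgomp's internals: `device_lock`, `register_device`, `host_init`, `runtime_initialize` and `registered_device_count` are hypothetical names, and a plain pthread mutex stands in for gomp_mutex_t. The point it illustrates is the one made above: when the host registration is called explicitly at the end of the runtime initializer, the lock it takes is guaranteed to be initialized first, with no reliance on the relative order of separate constructors.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the runtime's lock and device table.  */
static pthread_mutex_t device_lock;
static bool device_lock_ready = false;
static int registered_devices = 0;

/* Register one device; refuses to touch the lock before it is set up.
   Returns 0 on success, -1 if called too early.  */
static int
register_device (void)
{
  if (!device_lock_ready)
    return -1;

  pthread_mutex_lock (&device_lock);
  registered_devices++;
  pthread_mutex_unlock (&device_lock);
  return 0;
}

/* Formerly a constructor of its own in this model; now an ordinary
   function that must be called after the lock exists.  */
int
host_init (void)
{
  return register_device ();
}

/* Single initialization entry point: set up the lock, then register the
   host device as the last step, which fixes the order explicitly.  */
int
runtime_initialize (void)
{
  pthread_mutex_init (&device_lock, NULL);
  device_lock_ready = true;
  return host_init ();
}

int
registered_device_count (void)
{
  return registered_devices;
}
```

The `device_lock_ready` flag exists only so the sketch can report a too-early call instead of locking an uninitialized mutex; the real runtime has no such guard, which is why the call order matters there.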
Re: [PATCH, libgomp] PR 67141, uninitialized acc_device_lock mutex
On 2015/9/18 04:02 PM, Jakub Jelinek wrote: > On Fri, Sep 18, 2015 at 03:41:30PM +0800, Chung-Lin Tang wrote: >> this patch fixes the uninitialized acc_device_lock mutex situation >> reported in PR 67141. The patch attached on the bugzilla page >> tries to solve it by constructor priorities, which we think will >> probably be less manageable in general. >> >> This patch changes goacc_host_init() to be called from >> goacc_runtime_initialize() instead, thereby ensuring the init order. >> libgomp testsuite was re-run without regressions, okay for trunk? >> >> Thanks, >> Chung-Lin >> >> 2015-09-18 Chung-Lin Tang >> >> PR libgomp/67141 >> > > No vertical space in between PR line and subsequent entries. > >> * oacc-int.h (goacc_host_init): Add declaration. >> * oacc-host.c (goacc_host_init): Remove static and >> constructor attribute > > Full stop at the end of entry. > >> * oacc-init.c (goacc_runtime_initialize): Call goacc_host_init() >> at end. > > The patch is ok. Though, perhaps as a follow-up, I think I'd prefer getting > rid of pthread_key_create (&goacc_cleanup_key, goacc_destroy_thread);, > it is wasteful if we do the same thing in initialize_team. As the > goacc_tls_data pointer is __thread anyway, I think just putting it into > struct gomp_thread, arranging for init_team to be called from the env.c > ctor and from the team TLS destructor call also some oacc freeing if > the goacc_tls_data pointer is non-NULL (perhaps with __builtin_expect > unlikely). > > Jakub Committed, thanks for the review. I believe this patch is also needed for 5.x, okay for that branch as well? Thanks, Chung-Lin