Hi! Jakub, need your review/approval here, please:
On 2022-01-13T10:54:16+0100, I wrote: > On 2019-05-08T14:51:57+0100, Julian Brown <jul...@codesourcery.com> wrote: >> - The "addressable" bit is set during the kernels conversion pass for >> variables that have "create" (alloc) clauses created for them in the >> synthesised outer data region (instead of in the front-end, etc., >> where it can't be done accurately). Such variables actually have >> their address taken during transformations made in a later pass >> (omp-low, I think), but there's a phase-ordering problem that means >> the flag should be set earlier. > > The actual issue is a bit different, but yes, there is a problem. > The related ICE has also been reported as <https://gcc.gnu.org/PR100280> > "ICE in lower_omp_target, at omp-low.c:12287". (And I'm confused why we > didn't run into that with the OpenACC 'kernels' decomposition > originally.) I've pushed to master branch > commit 9b32c1669aad5459dd053424f9967011348add83 > "OpenACC 'kernels' decomposition: Mark variables used in synthesized data > clauses as addressable [PR100280]" > ... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary: > > 13073 else if (is_gimple_reg (var)) > 13074 { > 13075 gcc_assert (offloaded); > 13076 tree avar = create_tmp_var (TREE_TYPE > (var)); > 13077 mark_addressable (avar); > > ..., which (a) is only implemented for actualy *offloaded* regions (but not > data regions), and (b) the subsequently synthesized code for writing to and > later reading back from the temporary fundamentally conflicts with OpenACC > 'async' (as used by OpenACC 'kernels' decomposition). That's all not trivial > to make work, so let's just avoid this case. > --- a/gcc/omp-oacc-kernels-decompose.cc > +++ b/gcc/omp-oacc-kernels-decompose.cc > @@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimple > *body) > > /* If INNER_BIND_VARS holds variables, build an OpenACC data region with > location LOC containing BODY and having 'create (var)' clauses for each > - variable. If INNER_CLEANUP is present, add a try-finally statement with > + variable (as a side effect, such variables also get TREE_ADDRESSABLE set). > + If INNER_CLEANUP is present, add a try-finally statement with > this cleanup code in the finally block. Return the new data region, or > the original BODY if no data region was needed. */ > > @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple > *body, > inner_data_clauses = new_clause; > > prev_mapped_var = v; > + > + /* See <https://gcc.gnu.org/PR100280>. */ > + TREE_ADDRESSABLE (v) = 1; > } > } So, that's too simple. ;-) ... and gives rise to workaround patches like we have on the og11 development branch: - "Avoid introducing 'create' mapping clauses for loop index variables in kernels regions", - "Run all kernels regions with GOMP_MAP_FORCE_TOFROM mappings synchronously", - "Fix for is_gimple_reg vars to 'data kernels'" We're after gimplification, and must not just set 'TREE_ADDRESSABLE', because that may easily violate GIMPLE invariants, leading to ICEs later. There are a few open PRs, which my following changes are addressing. To make "late" 'TREE_ADDRESSABLE' work, we have a precedent in OpenMP's 'gcc/omp-low.cc:task_shared_vars' handling, as Jakub had pointed to in discussion of <https://gcc.gnu.org/PR102330>. (PR102330 turned out to be unrelated from the "late" 'TREE_ADDRESSABLE' problem here; I have a different patch for it.) I'm thus proposing to generalize 'gcc/omp-low.cc:task_shared_vars' into 'make_addressable_vars', plus new 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' that we then may use instead of the 'TREE_ADDRESSABLE (v) = 1;' quoted above (plus one or two additional ones to be introduced in later patches), and wire that up in 'gcc/omp-low.cc:scan_sharing_clauses', for 'OMP_CLAUSE_MAP': set 'TREE_ADDRESSABLE' and put into 'make_addressable_vars' for later fix-up. (In reply to Jakub Jelinek from comment #9) > Whether you can use the same bitmap or need to add another bitmap next to > task_shared_vars is something hard to guess without diving into it deeply. Per my understanding of the code, the only place where I had doubts is 'gcc/omp-low.cc:finish_taskreg_scan', but I have convinced myself that what this is doing is either a no-op in the 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' case, or in fact necessary as the original 'task_shared_vars' handling has been. Either way: I couldn't come up with a way (test case) that we'd actually run into this case; you'd have to have the relevant OpenMP constructs inside an OpenACC 'kernels' region, which isn't permitted per 'gcc/omp-low.cc:check_omp_nesting_restrictions'. OK to proceed in this way? Grüße Thomas --- gcc/omp-low.cc +++ gcc/omp-low.cc @@ -188,7 +188,7 @@ struct omp_context static splay_tree all_contexts; static int taskreg_nesting_level; static int target_nesting_level; -static bitmap task_shared_vars; +static bitmap make_addressable_vars; static bitmap global_nonaddressable_vars; static vec<omp_context *> taskreg_contexts; static vec<gomp_task *> task_cpyfns; @@ -572,9 +572,9 @@ use_pointer_for_field (tree decl, omp_context *shared_ctx) /* Taking address of OUTER in lower_send_shared_vars might need regimplification of everything that uses the variable. */ - if (!task_shared_vars) - task_shared_vars = BITMAP_ALLOC (NULL); - bitmap_set_bit (task_shared_vars, DECL_UID (outer)); + if (!make_addressable_vars) + make_addressable_vars = BITMAP_ALLOC (NULL); + bitmap_set_bit (make_addressable_vars, DECL_UID (outer)); TREE_ADDRESSABLE (outer) = 1; } return true; @@ -601,13 +601,13 @@ omp_copy_decl_2 (tree var, tree name, tree type, omp_context *ctx) else record_vars (copy); - /* If VAR is listed in task_shared_vars, it means it wasn't - originally addressable and is just because task needs to take - it's address. But we don't need to take address of privatizations + /* If VAR is listed in make_addressable_vars, it wasn't + originally addressable, but was only later made so. + We don't need to take address of privatizations from that var. */ if (TREE_ADDRESSABLE (var) - && ((task_shared_vars - && bitmap_bit_p (task_shared_vars, DECL_UID (var))) + && ((make_addressable_vars + && bitmap_bit_p (make_addressable_vars, DECL_UID (var))) || (global_nonaddressable_vars && bitmap_bit_p (global_nonaddressable_vars, DECL_UID (var))))) TREE_ADDRESSABLE (copy) = 0; @@ -1495,6 +1495,21 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) if (ctx->outer) scan_omp_op (&OMP_CLAUSE_SIZE (c), ctx->outer); decl = OMP_CLAUSE_DECL (c); + /* If requested, make 'decl' addressable. */ + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP + && OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (c)) + { + gcc_checking_assert (DECL_P (decl)); + + gcc_checking_assert (!TREE_ADDRESSABLE (decl)); + if (!make_addressable_vars) + make_addressable_vars = BITMAP_ALLOC (NULL); + bitmap_set_bit (make_addressable_vars, DECL_UID (decl)); + TREE_ADDRESSABLE (decl) = 1; + + /* Done. */ + OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (c) = 0; + } /* Global variables with "omp declare target" attribute don't need to be copied, the receiver side will use them directly. However, global variables with "omp declare target link" @@ -2371,11 +2405,11 @@ finish_taskreg_scan (omp_context *ctx) if (ctx->record_type == NULL_TREE) return; - /* If any task_shared_vars were needed, verify all + /* If any make_addressable_vars were needed, verify all OMP_CLAUSE_SHARED clauses on GIMPLE_OMP_{PARALLEL,TASK,TEAMS} statements if use_pointer_for_field hasn't changed because of that. If it did, update field types now. */ - if (task_shared_vars) + if (make_addressable_vars) { tree c; @@ -2390,7 +2424,7 @@ finish_taskreg_scan (omp_context *ctx) the receiver side will use them directly. */ if (is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))) continue; - if (!bitmap_bit_p (task_shared_vars, DECL_UID (decl)) + if (!bitmap_bit_p (make_addressable_vars, DECL_UID (decl)) || !use_pointer_for_field (decl, ctx)) continue; tree field = lookup_field (decl, ctx); @@ -14040,7 +14074,7 @@ lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_context *ctx) /* Callback for lower_omp_1. Return non-NULL if *tp needs to be regimplified. If DATA is non-NULL, lower_omp_1 is outside - of OMP context, but with task_shared_vars set. */ + of OMP context, but with make_addressable_vars set. */ static tree lower_omp_regimplify_p (tree *tp, int *walk_subtrees, @@ -14054,9 +14088,9 @@ lower_omp_regimplify_p (tree *tp, int *walk_subtrees, && DECL_HAS_VALUE_EXPR_P (t)) return t; - if (task_shared_vars + if (make_addressable_vars && DECL_P (t) - && bitmap_bit_p (task_shared_vars, DECL_UID (t))) + && bitmap_bit_p (make_addressable_vars, DECL_UID (t))) return t; /* If a global variable has been privatized, TREE_CONSTANT on @@ -14141,7 +14175,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx) if (gimple_has_location (stmt)) input_location = gimple_location (stmt); - if (task_shared_vars) + if (make_addressable_vars) memset (&wi, '\0', sizeof (wi)); /* If we have issued syntax errors, avoid doing any heavy lifting. @@ -14158,7 +14192,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx) case GIMPLE_COND: { gcond *cond_stmt = as_a <gcond *> (stmt); - if ((ctx || task_shared_vars) + if ((ctx || make_addressable_vars) && (walk_tree (gimple_cond_lhs_ptr (cond_stmt), lower_omp_regimplify_p, ctx ? NULL : &wi, NULL) @@ -14250,7 +14284,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx) lower_omp_critical (gsi_p, ctx); break; case GIMPLE_OMP_ATOMIC_LOAD: - if ((ctx || task_shared_vars) + if ((ctx || make_addressable_vars) && walk_tree (gimple_omp_atomic_load_rhs_ptr ( as_a <gomp_atomic_load *> (stmt)), lower_omp_regimplify_p, ctx ? NULL : &wi, NULL)) @@ -14371,7 +14405,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx) default: regimplify: - if ((ctx || task_shared_vars) + if ((ctx || make_addressable_vars) && walk_gimple_op (stmt, lower_omp_regimplify_p, ctx ? NULL : &wi)) { @@ -14435,10 +14469,10 @@ execute_lower_omp (void) if (all_contexts->root) { - if (task_shared_vars) + if (make_addressable_vars) push_gimplify_context (); lower_omp (&body, NULL); - if (task_shared_vars) + if (make_addressable_vars) pop_gimplify_context (NULL); } @@ -14447,7 +14481,7 @@ execute_lower_omp (void) splay_tree_delete (all_contexts); all_contexts = NULL; } - BITMAP_FREE (task_shared_vars); + BITMAP_FREE (make_addressable_vars); BITMAP_FREE (global_nonaddressable_vars); /* If current function is a method, remove artificial dummy VAR_DECL created --- gcc/omp-oacc-kernels-decompose.cc +++ gcc/omp-oacc-kernels-decompose.cc @@ -845,7 +845,11 @@ maybe_build_inner_data_region (location_t loc, gimple *body, prev_mapped_var = v; /* See <https://gcc.gnu.org/PR100280>. */ - TREE_ADDRESSABLE (v) = 1; + if (!TREE_ADDRESSABLE (v)) + { + /* Request that OMP lowering make 'v' addressable. */ + OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (new_clause) = 1; + } } } --- gcc/tree-core.h +++ gcc/tree-core.h @@ -1145,6 +1145,9 @@ struct GTY(()) tree_base { PREDICT_EXPR_OUTCOME in PREDICT_EXPR + OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE in + OMP_CLAUSE + static_flag: TREE_STATIC in --- gcc/tree.h +++ gcc/tree.h @@ -1695,6 +1695,11 @@ class auto_suppress_location_wrappers #define OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P(NODE) \ (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.deprecated_flag) +/* Flag that 'OMP_CLAUSE_DECL (NODE)' is to be made addressable during OMP + lowering. */ +#define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \ + (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.addressable_flag) + /* True on an OMP_CLAUSE_USE_DEVICE_PTR with an OpenACC 'if_present' clause. */ #define OMP_CLAUSE_USE_DEVICE_PTR_IF_PRESENT(NODE) \ ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955