Hi Chung-Lin! On 2019-11-05T22:35:43+0800, Chung-Lin Tang <chunglin_t...@mentor.com> wrote: > Hi Thomas, > after your last round of review, I realized that the bulk of the compiler > omp-low work was > simply a case of dumb over-engineering in the wrong direction :P > (although it did painstakingly function correctly)
Hehe -- that happens. ;-) > However, the issue of ACC_DEVICE_TYPE=host not working (and hence > "!openacc_host_selected" > in the testcases) Actually not just for that, but also generally for any shared-memory models that may come into existance at some point, such as CUDA Unified Memory, for example? > actually is a bit more sophisticated than I thought: > > The reason it doesn't work for the host device, is because we use the map > pointer (i.e. > a hostaddrs[] entry when passed into libgomp) to point to an array descriptor > to pass > the whole array information, and rely on code inside gomp_map_vars_* to setup > things, > and place the final on-device address of the non-contig. array into > devaddrs[], therefore > only using a single map entry (something I thought was quite clever) > > However, this broke down on the host and host-fallback devices, simply > because, there > we do NOT do any gomp_map_vars processing; our current code in > GOACC_parallel_keyed > simply skips it and passes the offload function the original hostaddrs[] > contents. > Lacking the processing to transform the descriptor pointer into a proper > array ref, > things of course segfault. > > So I think we have three options for this (which may have some interactions > with say, > the "proper" host-side parallelization we eventually need to implement for > OpenACC 2.7) > > (1) The simplest solution: implement a processing which searches and reverts > such > non-contiguous array map entries in GOACC_parallel_keyed. > (note: I have implemented this in the current attached "v2" patch) > > (2) Make the GOACC_parallel_keyed code to not make short cuts for host-modes; > i.e. still do the proper gomp_map_vars processing for all cases. > > (3) Modify the non-contiguous array map conventions: a possible solution is > to use > two maps placed together: one for the array pointer, another for the array > descriptor (as > opposed to the current style of using only one map) This needs more further > elaborate > compiler/runtime work. > > The first two options will pessimize host-mode performance somewhat. The > third I have > some WIP patches, but it's still buggy ATM. Seeking your opinion on what we > should do. I'll have to think about it some more, but variant (1) doesn't seem so bad actually, for a first take. While it's not nice to pessimize in particular directives with 'if (false)' clauses, at least it does work, the run-time overhead should not be too bad (also compared to variant (2), I suppose), and variant (3) can still be implemented later. A few comments/questions: Please reference PR76739 in your submission/ChangeLog updates. > --- gcc/c/c-typeck.c (revision 277827) > +++ gcc/c/c-typeck.c (working copy) > @@ -12868,7 +12868,7 @@ c_finish_omp_cancellation_point (location_t loc, t > static tree > handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types, > bool &maybe_zero_len, unsigned int &first_non_one, > - enum c_omp_region_type ort) > + bool &non_contiguous, enum c_omp_region_type ort) > { > tree ret, low_bound, length, type; > if (TREE_CODE (t) != TREE_LIST) > @@ -13160,14 +13161,21 @@ handle_omp_array_sections_1 (tree c, tree t, vec<t > return error_mark_node; > } > /* If there is a pointer type anywhere but in the very first > - array-section-subscript, the array section can't be contiguous. */ > + array-section-subscript, the array section can't be contiguous. > + Note that OpenACC does accept these kinds of non-contiguous pointer > + based arrays. */ That comment update should instead be moved to the function comment before the 'handle_omp_array_sections_1' function definition, and should then also explain the new 'non_contiguous' out variable. The latter needs to be done anyway, and the former (no comment here) is easy enough to tell from the code: > if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND > && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST) > { > - error_at (OMP_CLAUSE_LOCATION (c), > - "array section is not contiguous in %qs clause", > - omp_clause_code_name[OMP_CLAUSE_CODE (c)]); > - return error_mark_node; > + if (ort == C_ORT_ACC) > + non_contiguous = true; > + else > + { > + error_at (OMP_CLAUSE_LOCATION (c), > + "array section is not contiguous in %qs clause", > + omp_clause_code_name[OMP_CLAUSE_CODE (c)]); > + return error_mark_node; > + } > } > @@ -13238,6 +13247,7 @@ handle_omp_array_sections (tree c, enum c_omp_regi > unsigned int num = types.length (), i; > tree t, side_effects = NULL_TREE, size = NULL_TREE; > tree condition = NULL_TREE; > + tree ncarray_dims = NULL_TREE; > > if (int_size_in_bytes (TREE_TYPE (first)) <= 0) > maybe_zero_len = true; > @@ -13261,6 +13271,13 @@ handle_omp_array_sections (tree c, enum c_omp_regi > length = fold_convert (sizetype, length); > if (low_bound == NULL_TREE) > low_bound = integer_zero_node; > + > + if (non_contiguous) > + { > + ncarray_dims = tree_cons (low_bound, length, ncarray_dims); > + continue; > + } > + > if (!maybe_zero_len && i > first_non_one) > { > if (integer_nonzerop (low_bound)) I'm not at all familiar with this array sections code, will trust your understanding that we don't need any of the processing that you're skipping here ('continue'): 'TREE_SIDE_EFFECTS' handling for the length expressions, and other things. > @@ -13357,6 +13374,14 @@ handle_omp_array_sections (tree c, enum c_omp_regi > size = size_binop (MULT_EXPR, size, l); > } > } > + if (non_contiguous) > + { > + int kind = OMP_CLAUSE_MAP_KIND (c); > + OMP_CLAUSE_SET_MAP_KIND (c, kind | GOMP_MAP_NONCONTIG_ARRAY); > + OMP_CLAUSE_DECL (c) = t; > + OMP_CLAUSE_SIZE (c) = ncarray_dims; > + return false; > + } > if (side_effects) > size = build2 (COMPOUND_EXPR, sizetype, side_effects, size); > if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION Likewise for all the code being skipped here ('return false'). > --- gcc/cp/semantics.c (revision 277827) > +++ gcc/cp/semantics.c (working copy) Analoguous to the C front end. > --- gcc/gimplify.c (revision 277827) > +++ gcc/gimplify.c (working copy) > @@ -8622,9 +8622,17 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se > if (OMP_CLAUSE_SIZE (c) == NULL_TREE) > OMP_CLAUSE_SIZE (c) = DECL_P (decl) ? DECL_SIZE_UNIT (decl) > : TYPE_SIZE_UNIT (TREE_TYPE (decl)); > + if (OMP_CLAUSE_SIZE (c) > + && TREE_CODE (OMP_CLAUSE_SIZE (c)) == TREE_LIST > + && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c))) Per the code above, 'OMP_CLAUSE_SIZE (c)' will always be set to something, so no point in checking that here? Isn't the 'GOMP_MAP_NONCONTIG_ARRAY_P' check alone sufficient already? And then maybe 'assert (TREE_CODE (OMP_CLAUSE_SIZE (c)) == TREE_LIST' in here: > { > + /* For non-contiguous array maps, OMP_CLAUSE_SIZE is a TREE_LIST > + of the individual array dimensions, which gimplify_expr doesn't > + handle, so skip the call to gimplify_expr here. */ > + } > - if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p, > - NULL, is_gimple_val, fb_rvalue) == GS_ERROR) > + else if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p, > + NULL, is_gimple_val, fb_rvalue) == GS_ERROR) > + { > remove = true; > break; > } Again, that means we're skipping other code here; don't understand yet. Your ChangeLog update says: > * gimplify.c (gimplify_scan_omp_clauses): For non-contiguous array map > kinds, > make sure bias in each dimension are put into firstprivate variables. I'm not yet seeing how that's happening. Ah, I see that ChangeLog comment is probably just a remnant from the previous version. > --- gcc/omp-low.c (revision 277827) > +++ gcc/omp-low.c (working copy) Have not yet reviewed in detail. > @@ -1367,6 +1498,38 @@ scan_sharing_clauses (tree clauses, omp_context *c > install_var_local (decl, ctx); > break; > } > + > + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP > + && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c))) > + { > + tree array_decl = OMP_CLAUSE_DECL (c); > + tree array_type = TREE_TYPE (array_decl); > + bool by_ref = (TREE_CODE (array_type) == ARRAY_TYPE > + ? true : false); > + > + /* Checking code to ensure we only have arrays at top dimension. > + This limitation might be lifted in the future. */ Please reference PR76739 here, and in PR76739 also add a comment about this limitation. (As well as any other limitations, of course.) > + if (TREE_CODE (array_type) == REFERENCE_TYPE) > + array_type = TREE_TYPE (array_type); > + tree t = array_type, prev_t = NULL_TREE; > + while (t) > + { > + if (TREE_CODE (t) == ARRAY_TYPE && prev_t) > + { > + error_at (gimple_location (ctx->stmt), "array types are" > + " only allowed at outermost dimension of" > + " non-contiguous array"); > + break; > + } > + prev_t = t; > + t = TREE_TYPE (t); > + } > + > + install_var_field (array_decl, by_ref, 3, ctx); > + install_var_local (array_decl, ctx); > + break; > + } > + Assuming this intentionally means to skip ('break' just above) the following 'if (DECL_P (decl))' and its 'else' branch, then maybe remove the 'break' just above, and instead do 'else if (DECL_P (decl))'? > if (DECL_P (decl)) > { > if (DECL_SIZE (decl) > @@ -2624,6 +2830,14 @@ scan_omp_target (gomp_target *stmt, omp_context *o > gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn); > } > > + /* If is OpenACC construct, put non-contiguous array clauses (if any) > + in front of clause chain. The runtime can then test the first to see > + if the additional map processing for them is required. */ > + if (is_gimple_omp_oacc (stmt)) > + reorder_noncontig_array_clauses (gimple_omp_target_clauses_ptr (stmt)); Should that be deemed unsuitable for any reason, then add a new 'GOACC_FLAG_*' flag to indicate existance of non-contiguous arrays. > --- include/gomp-constants.h (revision 277827) > +++ include/gomp-constants.h (working copy) > @@ -40,6 +40,7 @@ > #define GOMP_MAP_FLAG_SPECIAL_0 (1 << 2) > #define GOMP_MAP_FLAG_SPECIAL_1 (1 << 3) > #define GOMP_MAP_FLAG_SPECIAL_2 (1 << 4) > +#define GOMP_MAP_FLAG_SPECIAL_3 (1 << 5) > #define GOMP_MAP_FLAG_SPECIAL (GOMP_MAP_FLAG_SPECIAL_1 \ > | GOMP_MAP_FLAG_SPECIAL_0) > /* Flag to force a specific behavior (or else, trigger a run-time error). */ > @@ -127,6 +128,26 @@ enum gomp_map_kind > /* Decrement usage count and deallocate if zero. */ > GOMP_MAP_RELEASE = (GOMP_MAP_FLAG_SPECIAL_2 > | GOMP_MAP_DELETE), > + /* Mapping kinds for non-contiguous arrays. */ > + GOMP_MAP_NONCONTIG_ARRAY = (GOMP_MAP_FLAG_SPECIAL_3), > + GOMP_MAP_NONCONTIG_ARRAY_TO = (GOMP_MAP_NONCONTIG_ARRAY > + | GOMP_MAP_TO), > + GOMP_MAP_NONCONTIG_ARRAY_FROM = (GOMP_MAP_NONCONTIG_ARRAY > + | GOMP_MAP_FROM), > + GOMP_MAP_NONCONTIG_ARRAY_TOFROM = (GOMP_MAP_NONCONTIG_ARRAY > + | GOMP_MAP_TOFROM), > + GOMP_MAP_NONCONTIG_ARRAY_FORCE_TO = (GOMP_MAP_NONCONTIG_ARRAY_TO > + | GOMP_MAP_FLAG_FORCE), > + GOMP_MAP_NONCONTIG_ARRAY_FORCE_FROM = (GOMP_MAP_NONCONTIG_ARRAY_FROM > + | GOMP_MAP_FLAG_FORCE), > + GOMP_MAP_NONCONTIG_ARRAY_FORCE_TOFROM = (GOMP_MAP_NONCONTIG_ARRAY_TOFROM > + | GOMP_MAP_FLAG_FORCE), > + GOMP_MAP_NONCONTIG_ARRAY_ALLOC = (GOMP_MAP_NONCONTIG_ARRAY > + | GOMP_MAP_ALLOC), > + GOMP_MAP_NONCONTIG_ARRAY_FORCE_ALLOC = (GOMP_MAP_NONCONTIG_ARRAY > + | GOMP_MAP_FORCE_ALLOC), > + GOMP_MAP_NONCONTIG_ARRAY_FORCE_PRESENT = (GOMP_MAP_NONCONTIG_ARRAY > + | GOMP_MAP_FORCE_PRESENT), Just an idea: instead of this long list, would it maybe be better (if feasible at all?) to have a single "lead-in" mapping 'GOMP_MAP_NONCONTIG_ARRAY_MODE', which specifies how many of the following (normal) mappings belong to that "non-contiguous array mode". (Roughly similar to what 'GOMP_MAP_TO_PSET' is doing with any 'GOMP_MAP_POINTER's following it.) Might that make some things simpler, or even more complicated (more internal state to keep)? > --- libgomp/oacc-parallel.c (revision 277827) > +++ libgomp/oacc-parallel.c (working copy) > +static inline void > +revert_noncontig_array_map_pointers (size_t mapnum, void **hostaddrs, > + unsigned short *kinds) > +{ > + for (int i = 0; i < mapnum; i++) > + { > + if (GOMP_MAP_NONCONTIG_ARRAY_P (kinds[i] & 0xff)) > + hostaddrs[i] = *((void **)hostaddrs[i]); Can we be (or, do we make) sure that 'hostaddrs' will never be in read-only memory? And, it's permissible to alter 'hostaddrs'? Ah, other code (including 'libgomp/target.c') is doing such things, too, so it must be fine. > + else > + /* We assume all non-contiguous array map entries are placed at the > + start; first other map kind means we can exit. */ > + break; > + } > +} > --- libgomp/target.c (revision 277827) > +++ libgomp/target.c (working copy) Have not yet reviewed in detail. > @@ -533,9 +679,37 @@ gomp_map_vars_internal (struct gomp_device_descr * > const int typemask = short_mapkind ? 0xff : 0x7; > struct splay_tree_s *mem_map = &devicep->mem_map; > struct splay_tree_key_s cur_node; > - struct target_mem_desc *tgt > - = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum); > - tgt->list_count = mapnum; > + struct target_mem_desc *tgt; > + > + bool process_noncontig_arrays = false; > + size_t nca_data_row_num = 0, row_start = 0; > + size_t nca_info_num = 0, nca_index; > + struct ncarray_info *nca_info = NULL; > + struct target_var_desc *row_desc; > + uintptr_t target_row_addr; > + void **host_data_rows = NULL, **target_data_rows = NULL; > + void *row; > + > + if (mapnum > 0) > + { Also add such a comment here: "We assume all non-contiguous array map entries are placed at the start". > + int kind = get_kind (short_mapkind, kinds, 0); > + process_noncontig_arrays = GOMP_MAP_NONCONTIG_ARRAY_P (kind & > typemask); > + } > + > + if (process_noncontig_arrays) > + for (i = 0; i < mapnum; i++) > + { > + int kind = get_kind (short_mapkind, kinds, i); > + if (GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask)) > + { > + nca_data_row_num += gomp_noncontig_array_count_rows (hostaddrs[i]); > + nca_info_num += 1; > + } > + } Or, actually, can the 'if (mapnum > 0)' above and the 'for' loop here again be simplified to just one loop with 'break', like you've done in 'libgomp/oacc-parallel.c:revert_noncontig_array_map_pointers'? > + > + tgt = gomp_malloc (sizeof (*tgt) > + + sizeof (tgt->list[0]) * (mapnum + nca_data_row_num)); > + tgt->list_count = mapnum + nca_data_row_num; > tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1; > tgt->device_descr = devicep; > struct gomp_coalesce_buf cbuf, *cbufp = NULL; > @@ -735,6 +931,56 @@ gomp_map_vars_internal (struct gomp_device_descr * > } > } > > + /* For non-contiguous arrays. Each data row is one target item, separated > + from the normal map clause items, hence we order them after mapnum. */ > + if (process_noncontig_arrays) > + for (i = 0, nca_index = 0, row_start = 0; i < mapnum; i++) > + { > + int kind = get_kind (short_mapkind, kinds, i); > + if (!GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask)) > + continue; Can instead 'break' again? > @@ -1044,8 +1299,112 @@ gomp_map_vars_internal (struct gomp_device_descr * > array++; > } > } > + > + /* Processing of non-contiguous array rows. */ > + if (process_noncontig_arrays) > + { > + for (i = 0, nca_index = 0, row_start = 0; i < mapnum; i++) > + { > + int kind = get_kind (short_mapkind, kinds, i); > + if (!GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask)) > + continue; Likewise? It's now gotten too late; more review to follow later. Grüße Thomas
signature.asc
Description: PGP signature