Hi Chung-Lin!

On 2019-11-05T22:35:43+0800, Chung-Lin Tang <chunglin_t...@mentor.com> wrote:
> Hi Thomas,
> after your last round of review, I realized that the bulk of the compiler 
> omp-low work was
> simply a case of dumb over-engineering in the wrong direction :P
> (although it did painstakingly function correctly)

Hehe -- that happens.  ;-)

> However, the issue of ACC_DEVICE_TYPE=host not working (and hence 
> "!openacc_host_selected"
> in the testcases)

Actually not just for that, but also generally for any shared-memory
models that may come into existance at some point, such as CUDA Unified
Memory, for example?

> actually is a bit more sophisticated than I thought:
>
> The reason it doesn't work for the host device, is because we use the map 
> pointer (i.e.
> a hostaddrs[] entry when passed into libgomp) to point to an array descriptor 
> to pass
> the whole array information, and rely on code inside gomp_map_vars_* to setup 
> things,
> and place the final on-device address of the non-contig. array into 
> devaddrs[], therefore
> only using a single map entry (something I thought was quite clever)
>
> However, this broke down on the host and host-fallback devices, simply 
> because, there
> we do NOT do any gomp_map_vars processing; our current code in 
> GOACC_parallel_keyed
> simply skips it and passes the offload function the original hostaddrs[] 
> contents.
> Lacking the processing to transform the descriptor pointer into a proper 
> array ref,
> things of course segfault.
>
> So I think we have three options for this (which may have some interactions 
> with say,
> the "proper" host-side parallelization we eventually need to implement for 
> OpenACC 2.7)
>
> (1) The simplest solution: implement a processing which searches and reverts 
> such
> non-contiguous array map entries in GOACC_parallel_keyed.
> (note: I have implemented this in the current attached "v2" patch)
>
> (2) Make the GOACC_parallel_keyed code to not make short cuts for host-modes;
> i.e. still do the proper gomp_map_vars processing for all cases.
>
> (3) Modify the non-contiguous array map conventions: a possible solution is 
> to use
> two maps placed together: one for the array pointer, another for the array 
> descriptor (as
> opposed to the current style of using only one map) This needs more further 
> elaborate
> compiler/runtime work.
>
> The first two options will pessimize host-mode performance somewhat. The 
> third I have
> some WIP patches, but it's still buggy ATM. Seeking your opinion on what we 
> should do.

I'll have to think about it some more, but variant (1) doesn't seem so
bad actually, for a first take.  While it's not nice to pessimize in
particular directives with 'if (false)' clauses, at least it does work,
the run-time overhead should not be too bad (also compared to variant
(2), I suppose), and variant (3) can still be implemented later.


A few comments/questions:

Please reference PR76739 in your submission/ChangeLog updates.

> --- gcc/c/c-typeck.c  (revision 277827)
> +++ gcc/c/c-typeck.c  (working copy)
> @@ -12868,7 +12868,7 @@ c_finish_omp_cancellation_point (location_t loc, t
>  static tree
>  handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
>                            bool &maybe_zero_len, unsigned int &first_non_one,
> -                          enum c_omp_region_type ort)
> +                          bool &non_contiguous, enum c_omp_region_type ort)
>  {
>    tree ret, low_bound, length, type;
>    if (TREE_CODE (t) != TREE_LIST)

> @@ -13160,14 +13161,21 @@ handle_omp_array_sections_1 (tree c, tree t, vec<t
>         return error_mark_node;
>       }
>        /* If there is a pointer type anywhere but in the very first
> -      array-section-subscript, the array section can't be contiguous.  */
> +      array-section-subscript, the array section can't be contiguous.
> +      Note that OpenACC does accept these kinds of non-contiguous pointer
> +      based arrays.  */

That comment update should instead be moved to the function comment
before the 'handle_omp_array_sections_1' function definition, and should
then also explain the new 'non_contiguous' out variable.  The latter
needs to be done anyway, and the former (no comment here) is easy enough
to tell from the code:

>        if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND
>         && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
>       {
> -       error_at (OMP_CLAUSE_LOCATION (c),
> -                 "array section is not contiguous in %qs clause",
> -                 omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
> -       return error_mark_node;
> +       if (ort == C_ORT_ACC)
> +         non_contiguous = true;
> +       else
> +         {
> +           error_at (OMP_CLAUSE_LOCATION (c),
> +                     "array section is not contiguous in %qs clause",
> +                     omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
> +           return error_mark_node;
> +         }
>       }

> @@ -13238,6 +13247,7 @@ handle_omp_array_sections (tree c, enum c_omp_regi
>        unsigned int num = types.length (), i;
>        tree t, side_effects = NULL_TREE, size = NULL_TREE;
>        tree condition = NULL_TREE;
> +      tree ncarray_dims = NULL_TREE;
>  
>        if (int_size_in_bytes (TREE_TYPE (first)) <= 0)
>       maybe_zero_len = true;
> @@ -13261,6 +13271,13 @@ handle_omp_array_sections (tree c, enum c_omp_regi
>           length = fold_convert (sizetype, length);
>         if (low_bound == NULL_TREE)
>           low_bound = integer_zero_node;
> +
> +       if (non_contiguous)
> +         {
> +           ncarray_dims = tree_cons (low_bound, length, ncarray_dims);
> +           continue;
> +         }
> +
>         if (!maybe_zero_len && i > first_non_one)
>           {
>             if (integer_nonzerop (low_bound))

I'm not at all familiar with this array sections code, will trust your
understanding that we don't need any of the processing that you're
skipping here ('continue'): 'TREE_SIDE_EFFECTS' handling for the length
expressions, and other things.

> @@ -13357,6 +13374,14 @@ handle_omp_array_sections (tree c, enum c_omp_regi
>               size = size_binop (MULT_EXPR, size, l);
>           }
>       }
> +      if (non_contiguous)
> +     {
> +       int kind = OMP_CLAUSE_MAP_KIND (c);
> +       OMP_CLAUSE_SET_MAP_KIND (c, kind | GOMP_MAP_NONCONTIG_ARRAY);
> +       OMP_CLAUSE_DECL (c) = t;
> +       OMP_CLAUSE_SIZE (c) = ncarray_dims;
> +       return false;
> +     }
>        if (side_effects)
>       size = build2 (COMPOUND_EXPR, sizetype, side_effects, size);
>        if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION

Likewise for all the code being skipped here ('return false').

> --- gcc/cp/semantics.c        (revision 277827)
> +++ gcc/cp/semantics.c        (working copy)

Analoguous to the C front end.

> --- gcc/gimplify.c    (revision 277827)
> +++ gcc/gimplify.c    (working copy)
> @@ -8622,9 +8622,17 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se
>         if (OMP_CLAUSE_SIZE (c) == NULL_TREE)
>           OMP_CLAUSE_SIZE (c) = DECL_P (decl) ? DECL_SIZE_UNIT (decl)
>                                 : TYPE_SIZE_UNIT (TREE_TYPE (decl));
> +       if (OMP_CLAUSE_SIZE (c)
> +           && TREE_CODE (OMP_CLAUSE_SIZE (c)) == TREE_LIST
> +           && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c)))

Per the code above, 'OMP_CLAUSE_SIZE (c)' will always be set to
something, so no point in checking that here?

Isn't the 'GOMP_MAP_NONCONTIG_ARRAY_P' check alone sufficient already?
And then maybe 'assert (TREE_CODE (OMP_CLAUSE_SIZE (c)) == TREE_LIST' in
here:

>           {
> +           /* For non-contiguous array maps, OMP_CLAUSE_SIZE is a TREE_LIST
> +              of the individual array dimensions, which gimplify_expr doesn't
> +              handle, so skip the call to gimplify_expr here.  */
> +         }

> -       if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p,
> -                          NULL, is_gimple_val, fb_rvalue) == GS_ERROR)
> +       else if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p,
> +                               NULL, is_gimple_val, fb_rvalue) == GS_ERROR)
> +         {
>             remove = true;
>             break;
>           }

Again, that means we're skipping other code here; don't understand yet.

Your ChangeLog update says:

>       * gimplify.c (gimplify_scan_omp_clauses): For non-contiguous array map 
> kinds,
>       make sure bias in each dimension are put into firstprivate variables.

I'm not yet seeing how that's happening.

Ah, I see that ChangeLog comment is probably just a remnant from the
previous version.

> --- gcc/omp-low.c     (revision 277827)
> +++ gcc/omp-low.c     (working copy)

Have not yet reviewed in detail.

> @@ -1367,6 +1498,38 @@ scan_sharing_clauses (tree clauses, omp_context *c
>             install_var_local (decl, ctx);
>             break;
>           }
> +
> +       if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> +           && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c)))
> +         {
> +           tree array_decl = OMP_CLAUSE_DECL (c);
> +           tree array_type = TREE_TYPE (array_decl);
> +           bool by_ref = (TREE_CODE (array_type) == ARRAY_TYPE
> +                          ? true : false);
> +
> +           /* Checking code to ensure we only have arrays at top dimension.
> +              This limitation might be lifted in the future.  */

Please reference PR76739 here, and in PR76739 also add a comment about
this limitation.  (As well as any other limitations, of course.)

> +           if (TREE_CODE (array_type) == REFERENCE_TYPE)
> +             array_type = TREE_TYPE (array_type);
> +           tree t = array_type, prev_t = NULL_TREE;
> +           while (t)
> +             {
> +               if (TREE_CODE (t) == ARRAY_TYPE && prev_t)
> +                 {
> +                   error_at (gimple_location (ctx->stmt), "array types are"
> +                             " only allowed at outermost dimension of"
> +                             " non-contiguous array");
> +                   break;
> +                 }
> +               prev_t = t;
> +               t = TREE_TYPE (t);
> +             }
> +
> +           install_var_field (array_decl, by_ref, 3, ctx);
> +           install_var_local (array_decl, ctx);
> +           break;
> +         }
> +

Assuming this intentionally means to skip ('break' just above) the
following 'if (DECL_P (decl))' and its 'else' branch, then maybe remove
the 'break' just above, and instead do 'else if (DECL_P (decl))'?

>         if (DECL_P (decl))
>           {
>             if (DECL_SIZE (decl)

> @@ -2624,6 +2830,14 @@ scan_omp_target (gomp_target *stmt, omp_context *o
>        gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);
>      }
> 
> +  /* If is OpenACC construct, put non-contiguous array clauses (if any)
> +     in front of clause chain. The runtime can then test the first to see
> +     if the additional map processing for them is required.  */
> +  if (is_gimple_omp_oacc (stmt))
> +    reorder_noncontig_array_clauses (gimple_omp_target_clauses_ptr (stmt));

Should that be deemed unsuitable for any reason, then add a new
'GOACC_FLAG_*' flag to indicate existance of non-contiguous arrays.

> --- include/gomp-constants.h  (revision 277827)
> +++ include/gomp-constants.h  (working copy)
> @@ -40,6 +40,7 @@
>  #define GOMP_MAP_FLAG_SPECIAL_0              (1 << 2)
>  #define GOMP_MAP_FLAG_SPECIAL_1              (1 << 3)
>  #define GOMP_MAP_FLAG_SPECIAL_2              (1 << 4)
> +#define GOMP_MAP_FLAG_SPECIAL_3              (1 << 5)
>  #define GOMP_MAP_FLAG_SPECIAL                (GOMP_MAP_FLAG_SPECIAL_1 \
>                                        | GOMP_MAP_FLAG_SPECIAL_0)
>  /* Flag to force a specific behavior (or else, trigger a run-time error).  */
> @@ -127,6 +128,26 @@ enum gomp_map_kind
>      /* Decrement usage count and deallocate if zero.  */
>      GOMP_MAP_RELEASE =                       (GOMP_MAP_FLAG_SPECIAL_2
>                                        | GOMP_MAP_DELETE),
> +    /* Mapping kinds for non-contiguous arrays.  */
> +    GOMP_MAP_NONCONTIG_ARRAY =               (GOMP_MAP_FLAG_SPECIAL_3),
> +    GOMP_MAP_NONCONTIG_ARRAY_TO =    (GOMP_MAP_NONCONTIG_ARRAY
> +                                      | GOMP_MAP_TO),
> +    GOMP_MAP_NONCONTIG_ARRAY_FROM =  (GOMP_MAP_NONCONTIG_ARRAY
> +                                      | GOMP_MAP_FROM),
> +    GOMP_MAP_NONCONTIG_ARRAY_TOFROM =        (GOMP_MAP_NONCONTIG_ARRAY
> +                                      | GOMP_MAP_TOFROM),
> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_TO =      (GOMP_MAP_NONCONTIG_ARRAY_TO
> +                                      | GOMP_MAP_FLAG_FORCE),
> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_FROM =    (GOMP_MAP_NONCONTIG_ARRAY_FROM
> +                                              | GOMP_MAP_FLAG_FORCE),
> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_TOFROM =  (GOMP_MAP_NONCONTIG_ARRAY_TOFROM
> +                                              | GOMP_MAP_FLAG_FORCE),
> +    GOMP_MAP_NONCONTIG_ARRAY_ALLOC =         (GOMP_MAP_NONCONTIG_ARRAY
> +                                              | GOMP_MAP_ALLOC),
> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_ALLOC =   (GOMP_MAP_NONCONTIG_ARRAY
> +                                              | GOMP_MAP_FORCE_ALLOC),
> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_PRESENT = (GOMP_MAP_NONCONTIG_ARRAY
> +                                              | GOMP_MAP_FORCE_PRESENT),

Just an idea: instead of this long list, would it maybe be better (if
feasible at all?) to have a single "lead-in" mapping
'GOMP_MAP_NONCONTIG_ARRAY_MODE', which specifies how many of the
following (normal) mappings belong to that "non-contiguous array mode".
(Roughly similar to what 'GOMP_MAP_TO_PSET' is doing with any
'GOMP_MAP_POINTER's following it.)  Might that make some things simpler,
or even more complicated (more internal state to keep)?

> --- libgomp/oacc-parallel.c   (revision 277827)
> +++ libgomp/oacc-parallel.c   (working copy)

> +static inline void
> +revert_noncontig_array_map_pointers (size_t mapnum, void **hostaddrs,
> +                                  unsigned short *kinds)
> +{
> +  for (int i = 0; i < mapnum; i++)
> +    {
> +      if (GOMP_MAP_NONCONTIG_ARRAY_P (kinds[i] & 0xff))
> +     hostaddrs[i] = *((void **)hostaddrs[i]);

Can we be (or, do we make) sure that 'hostaddrs' will never be in
read-only memory?

And, it's permissible to alter 'hostaddrs'?

Ah, other code (including 'libgomp/target.c') is doing such things, too,
so it must be fine.

> +      else
> +     /* We assume all non-contiguous array map entries are placed at the
> +        start; first other map kind means we can exit.  */
> +     break;
> +    }
> +}

> --- libgomp/target.c  (revision 277827)
> +++ libgomp/target.c  (working copy)

Have not yet reviewed in detail.

> @@ -533,9 +679,37 @@ gomp_map_vars_internal (struct gomp_device_descr *
>    const int typemask = short_mapkind ? 0xff : 0x7;
>    struct splay_tree_s *mem_map = &devicep->mem_map;
>    struct splay_tree_key_s cur_node;
> -  struct target_mem_desc *tgt
> -    = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
> -  tgt->list_count = mapnum;
> +  struct target_mem_desc *tgt;
> +
> +  bool process_noncontig_arrays = false;
> +  size_t nca_data_row_num = 0, row_start = 0;
> +  size_t nca_info_num = 0, nca_index;
> +  struct ncarray_info *nca_info = NULL;
> +  struct target_var_desc *row_desc;
> +  uintptr_t target_row_addr;
> +  void **host_data_rows = NULL, **target_data_rows = NULL;
> +  void *row;
> +
> +  if (mapnum > 0)
> +    {

Also add such a comment here: "We assume all non-contiguous array map
entries are placed at the start".

> +      int kind = get_kind (short_mapkind, kinds, 0);
> +      process_noncontig_arrays = GOMP_MAP_NONCONTIG_ARRAY_P (kind & 
> typemask);
> +    }
> +
> +  if (process_noncontig_arrays)
> +    for (i = 0; i < mapnum; i++)
> +      {
> +     int kind = get_kind (short_mapkind, kinds, i);
> +     if (GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))
> +       {
> +         nca_data_row_num += gomp_noncontig_array_count_rows (hostaddrs[i]);
> +         nca_info_num += 1;
> +       }
> +      }

Or, actually, can the 'if (mapnum > 0)' above and the 'for' loop here
again be simplified to just one loop with 'break', like you've done in
'libgomp/oacc-parallel.c:revert_noncontig_array_map_pointers'?

> +
> +  tgt = gomp_malloc (sizeof (*tgt)
> +                  + sizeof (tgt->list[0]) * (mapnum + nca_data_row_num));
> +  tgt->list_count = mapnum + nca_data_row_num;
>    tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
>    tgt->device_descr = devicep;
>    struct gomp_coalesce_buf cbuf, *cbufp = NULL;

> @@ -735,6 +931,56 @@ gomp_map_vars_internal (struct gomp_device_descr *
>       }
>      }
>  
> +  /* For non-contiguous arrays. Each data row is one target item, separated
> +     from the normal map clause items, hence we order them after mapnum.  */
> +  if (process_noncontig_arrays)
> +    for (i = 0, nca_index = 0, row_start = 0; i < mapnum; i++)
> +      {
> +     int kind = get_kind (short_mapkind, kinds, i);
> +     if (!GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))
> +       continue;

Can instead 'break' again?

> @@ -1044,8 +1299,112 @@ gomp_map_vars_internal (struct gomp_device_descr *
>               array++;
>             }
>         }
> +
> +      /* Processing of non-contiguous array rows.  */
> +      if (process_noncontig_arrays)
> +     {
> +       for (i = 0, nca_index = 0, row_start = 0; i < mapnum; i++)
> +         {
> +           int kind = get_kind (short_mapkind, kinds, i);
> +           if (!GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))
> +             continue;

Likewise?


It's now gotten too late; more review to follow later.


Grüße
 Thomas

Attachment: signature.asc
Description: PGP signature

Reply via email to