Hi PA,

Paul-Antoine Arras wrote:
Please find attached a slightly amended version of the patch with OMP_CLAUSE_MAP_SIZE_NEEDS_ADJUSTMENT defined as a flag rather than a special value.

Thanks. I was about to write an email about this.

The following is copied from the draft reply to your previous email,
I think it still applies. And sorry for the slow patch review with
intervals due to getting interrupted by other things all the time.

* * *

Paul-Antoine Arras wrote:
With current mainline and 1/2, I see:

libgomp: Trying to map into device [0x19591a50..0x19591b00) object
when [0x19591a50..0x19591aa8) is already mapped

for the first 'target enter data':
...
This last error is expected and is not introduced by this patch -- it has to do with the way the 'present' modifier is handled for allocatables in general.
I'll try to come up with a fix in a separate patch.

I didn't meant to imply that it was. And thanks for looking into it.

For future reference, r16-7288-g1e71ff87c97fcd fixes the same issue for bare allocatables (outside any derived type).

* * *

Regarding the attached testcase (present-nodt.f90), I notice that
with your follow-up patch, it behaves as follows (tested with the 1/2 patch of this thread + current mainline): As it, it runs but the value is not copied out ("STOP 2");
using 'tofrom' instead of 'to' for 'density1', it compiles and works.

As discussed elsewhere, this is due to the underspecified behaviour of allocatable mapping (even without DT). The current libgomp implementation treats Fortran pointers and allocatables the same, so that pointer and pointee have different refcounts.

The latter seems to be an issue relative to OpenMP <= 5.2, but seems to match 
OpenMP 6.0+
as a result of the ref_ptr/ref_ptee changes there.

* * *

Some dump tests still assume that pointers (or rather: array bounds) are
64 bit wide (descriptor size). I think it should  handle 32bit as well. ('-m32')

This seems to be fixed in gcc/testsuite/gfortran.dg/gomp/map-subarray-5.f90,
but not in gcc/testsuite/gfortran.dg/gomp/map-subarray-3.f90.

Did you run with RUNTESTFLAGS="--target_board=unix'{-m64,-m32}'" to check?

I think the following needs to be also adapted ('pointer set, len: 64'):

gcc/testsuite/gfortran.dg/gomp/map-subarray-3.f90:

+! { dg-final { scan-tree-dump-times { #pragma omp target enter data 
map\(alloc:\*\(struct tile_type\[0:\] \* restrict\) chunk\.tiles\.data \[len: 0\] 
\[runtime_implicit\]\) map\(to:chunk\.tiles \[pointer set, len: 64\]\) 
map\(attach_detach:\(struct tile_type\[0:\] \* restrict\) chunk\.tiles\.data \[bias: 
[0-9]+\]\) } 1 "original" } }
...
+! { dg-final { scan-tree-dump-times { #pragma omp target exit data 
map\(release:chunk\.tiles \[pointer set, len: 64\]\) map\(attach_detach:\(struct 
tile_type\[0:\] \* restrict\) chunk\.tiles\.data \[bias: [0-9]+\]\) } 1 
"original" } }

I think it is 36 instead of 64 with -m32 (not rechecked with this patch).

* * *

Regarding:

        /* Sanity check: the standalone attach node will not work if we have
           an "enter data" operation (because for those, variables need to be
           mapped separately and attach nodes must be grouped together with the

@@ -13416,7 +13449,9 @@ omp_build_struct_sibling_lists (enum tree_code code,
           base they attach to).  We should only have created the
           ATTACH_DETACH node after GOMP_MAP_STRUCT for a target region, so
           this should never be true.  */
-       gcc_assert ((region_type & ORT_TARGET) != 0);
+       // This is no longer true. See zlas in gomp_map_vars_internal
+       // (libgomp/target.c).
+       // gcc_assert ((region_type & ORT_TARGET) != 0);

The background of the comment is that for 'target data' and 'target', all mapped variables are mapped and unmapped in one step – and can be combined. For 'target enter/exit data', that's not the case while for 'target enter data map(to: a,b)' both 'a' and 'b' are mapped, either of them could be separately from the other unmapped. Thus, for 'target'/'target data' all map variables are visible to gomp_map_vars_internal – but for 'target enter/exit data', there are multiple separate calls to gomp_map_vars_internal and the splitting might not always
be done in an optimal way.

ZLAs are zero-length arrays - and 'a[:0]' and 'a[:5]' are handled differently, one as ZLA pointer attachment and one was map followed by an attach. If the size is not known at compile time ('a[:N]'), a nop map (nullptr, zero-size) plus an attachment can be generated.
The code mentioned in libgomp is:

               case GOMP_MAP_ATTACH:
               case GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION:
...
                       bool zlas
                         = ((kind & typemask)
                            == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION);
                       /* For 'target enter data', the map clauses are split;
                          however, for more complex code with struct and
                          pointer members, the mapping and the attach can end up
                          in different sets; or the wrong mapping with the
                          attach. As there is no way to know whether a size
                          zero like  'var->ptr[i][:0]' happend in the same
                          directive or not, the not-attached check is now
                          fully silenced for 'enter data'.  */
                       if (openmp_p && (pragma_kind & GOMP_MAP_VARS_ENTER_DATA))
                         zlas = true;
                       if (!gomp_attach_pointer (devicep, aq, mem_map, n,
                                                 (uintptr_t) hostaddrs[i], 
sizes[i],
                                                 cbufp, zlas, !openmp_p))
                         {
                           /* Pointee not found; that's an error except for
                              map(var[:n]) with n == 0; the compiler adds a
                              runtime condition such that for those the kind is
                              always GOMP_MAP_ZERO_LEN_ARRAY_SECTION.  */
and the latter has:
           if (!allow_zero_length_array_sections && fail_if_not_found)
             {
               gomp_mutex_unlock (&devicep->lock);
               gomp_fatal ("pointer target not mapped for attach");

* * *

Back to the assert:

           ATTACH_DETACH node after GOMP_MAP_STRUCT for a target region, so
           this should never be true.  */
-       gcc_assert ((region_type & ORT_TARGET) != 0);
+       // This is no longer true. See zlas in gomp_map_vars_internal
+       // (libgomp/target.c).
+       // gcc_assert ((region_type & ORT_TARGET) != 0);

The commented 'gcc_assert' implies that this patch now generates such
attach also for target (enter) data :-)

I wonder whether we can do any better than just removing the comment and
the check for good. - Or, alternatively, leave a stub explanation that there
are issues with doing it here - even thought it kind of works.

* * *

Back to the latest patch:

        if (wholestruct)
         {
+         tree desc = OMP_CLAUSE_CHAIN (*(*wholestruct)->grp_start);
+         if (desc != NULL_TREE && omp_map_clause_descriptor_p (desc))
+           goto next;
           *mapped_by_group = *wholestruct;
           return true;
         }

I know that we tend to underdocument things, but I think adding
a short comment would help in the future.

* * *

+  /* Find each attach node whose bias needs to be adjusted and move it to the
+   * group containing its pointee, right after the struct node.  */
+  FOR_EACH_VEC_ELT (*groups, i, grp)
...
+         && OMP_CLAUSE_MAP_KIND (grp->grp_end) == GOMP_MAP_ATTACH_DETACH
+         && OMP_CLAUSE_MAP_SIZE_NEEDS_ADJUSTMENT (grp->grp_end))

I think the comment needs to be expanded why the bias is adjusted by moving it
right after the struct node. The added code doesn't do it (as one would expect
naively from reading the code) as it happens indirectly.

* * *

+/* Helper function for gfc_trans_omp_clauses.  */
+
+static bool
+gfc_map_array_descriptor (
+  tree &node, tree &node2, tree &node3, tree &node4, tree descr, bool openacc,
+  location_t map_loc, stmtblock_t *block, gfc_exec_op op, gfc_omp_namelist *n,
+  hash_map<gfc_symbol *, gfc_omp_namelist *> *&sym_rooted_nl, gfc_se se,
+  gfc_omp_clauses *clauses, bool mid_desc_p)

Can you at least roughly describe (in a comment) what that function
actually does? Including stating what the return value means.

* * *

(1) map (alloc: *(struct tile_type[0:] * restrict) chunk.tiles.data [len: 0])
(2) map (to: chunk.tiles [pointer set, len: 64])
(3) map (attach_detach: (struct tile_type[0:] * restrict) chunk.tiles.data
[bias: -1])

(1) will turn into a no-op at runtime because the inner component is explicitly
to-mapped but alloc is required at compile time for attaching. (2) ensures that
the array descriptor will be available at runtime to compute offsets and strides
in various dimensions. The gimplifier will turn (3) into a regular attach of the
data pointer and compute the bias.

It seems as if we should remove (1) after/during gimplification if we know that 
it
is only used intermittently. Maybe add another flag for this?

Tobias,
who still has to continue reviewing the rest of the patch.

Reply via email to