On Fri, Oct 23, 2020 at 06:24:20PM +0800, Chung-Lin Tang wrote:
> this patch set implements more of OpenMP 5.0 mapping, specifically this part 
> in 2.19.7.1 map Clause:
> 
> "If a list item in a map clause is a structure element then all other 
> structure elements of the
>  containing structure variable form a structure sibling list. The map clause 
> and the structure sibling
>  list are associated with the same construct. If a corresponding list item of 
> the structure sibling list
>  item is present in the device data environment when the construct is 
> encountered then:
> 
>  * If the 1 structure sibling list item does not appear in a map clause on 
> the construct then:
>    – If the construct is a target, target data, or target enter data 
> construct then the
>      structure sibling list item is treated as if it is a list item in a map 
> clause on the construct with a
>      map-type of alloc.
>    – If the construct is target exit data construct, then the structure 
> sibling list item is treated
>      as if it is a list item in a map clause on the construct with a map-type 
> of release."
> 
> While really wordy, I believe this simply means that maps of structure 
> element fields have their reference
> counts increased/decreased in a uniform fashion, i.e. the are alloc/release'd 
> together, instead of
> having parts of the structure possibly deallocated while others still exist 
> on the device.

I think part of the reason for the above wording being so long is the
declare mapper stuff which caused the mapping of the whole struct to be
implicitly treated as mapping of all the elements individually (which we
don't want to actually implement that way unless we have to (e.g. due to
references in there, different declare mappers etc.)).

> In general, upon encountering a construct, we can't statically determine and 
> insert alloc/release maps
> for each element of a structure variable, since we don't really know which 
> region of the structure is
> currently mapped or not, hence this probably can't be properly implemented in 
> the compiler.
> 
> Instead this patch tries to do the equivalent in the runtime: I've modified 
> the handling of the
> (GOMP_MAP_STRUCT, <field-map1>, <field-map2>, ...) sequence to:
> 
>   (1) Create just a single splay_tree_key to represent the entire structure's 
> mapped-region
>       (all element target_var_desc's now reference this same key instead of 
> creating their own), and

I'm not sure that is what we want.  If we create just a single
splay_tree_key spanning the whole structure mapped region, then we can't
diagnose various mapping errors.  E.g. if I have:
void bar (struct S *);
struct S { int a, b, c, d, e; };
void foo (struct S s)
{
  #pragma omp target data map(tofrom: s.b, s.d)
  #pragma omp target map (s.b, s.c)
  bar (&s);
}
then target data maps the &s.b to &s.d + 1 region of the struct, but s.c
wasn't mapped and so the target region's mapping should fail, even when it
is in the middle of the mapped region.

The structure mapping wording was written in a way to give implementations a
choice, either map the whole struct (space inefficient), or the region from
the first to last element in the struct the needs mapping (what GCC
implements, also space inefficient, but less so), or only map the fields
individually and somehow remap all uses of the struct in the region (I think
that is only theoretically possible if one can analyze the whole target
region and rewrite anything that could refer to it in there).

So, I'd think instead of having just one splay_tree_key, we need multiple
(we could merge adjacent ones though if we want) but we need some way to tie
them together (e.g. represent them as one master entry (perhaps the first one) 
and
slaves entries and use the refcount of the master entry for all of them.

There are other OpenMP 5.0 changes which are very tightly related to that
though, namely that OpenMP 4.5 disallowed mapping the same variable multiple
times in the same region and therefore had the simple rule that each mapping
bumps the refcount by one.  As OpenMP 5.0 dropped that, we have instead:
"If the corresponding list item’s reference count was not already incremented 
because of the
effect of a map clause on the construct then:
a) The corresponding list item’s reference count is incremented by one;"

So, additionally we need to ensure that we don't bump again refcounts we've
bumped already in the same GOMP_* call and similarly at the end of region
when unmapping.  Additionally there is the complication that for enter data
and exit data we call actually gomp_map_vars multiple times, so we'll need
to track it somehow even across those calls.

Though, the refcount is only on the target_mem_desc struct not on the
target_var_desc.  But we need to bump the refcount for each separate
target_var_desc in there unless it is one of these slave entries, otherwise
e.g.
#pragma omp target data map (x, y, z)
{
#pragma omp target enter data map (to: x)
#pragma omp target enter data map (to: y)
#pragma omp target enter data map (to: z)
#pragma omp target exit data map (from: x, y, z)
}
would keep the target_mem_desc mapped when it shouldn't (if we'd just
increment or decrement refcount in each target_mem_desc once per construct
(well, twice for target data and target, once upon entry, once upon exit),
then the above would set refcount of the block containing all of x, y, z
to 1, then 2, 3, 4, then decrease to 3 and finally decrease to 2).

For data structures, perhaps change the 4 bool fields in target_var_desc
into bitfields, so that we don't grow the structure on 32-bit architectures
and add one bit for slave entries.  We'd need to ensure the keys for the
same GOMP_MAP_STRUCT are consecutive in the list[] array, but I think that
should be the case already, so finding the corresponding master would be
while (k->slave) k--;

And then dunno, perhaps push the addresses of all the target_var_desc that
should have refcount increased or decreased into a vector, qsort it and
only bump refcount on the first entry in the array or if the previous
address in the vector was different than the current one?  Or a hash table
recording what has been bumped already?  Though perhaps with some cheaper
way how to handle the most common case of only few mapped vars, because hash
table creation would be too expensive in that case?

As far as the merging of adjacent fields, perhaps that is something
that can be as optional optimization done in the compiler (gimplify.c)
when we have the stuff sorted for GOMP_MAP_STRUCT.  But we need to take
into account different mapping kinds.

I'm very sorry about this, I really appreciate your work on this.

        Jakub

Reply via email to