On Fri, Oct 23, 2020 at 06:24:20PM +0800, Chung-Lin Tang wrote: > this patch set implements more of OpenMP 5.0 mapping, specifically this part > in 2.19.7.1 map Clause: > > "If a list item in a map clause is a structure element then all other > structure elements of the > containing structure variable form a structure sibling list. The map clause > and the structure sibling > list are associated with the same construct. If a corresponding list item of > the structure sibling list > item is present in the device data environment when the construct is > encountered then: > > * If the 1 structure sibling list item does not appear in a map clause on > the construct then: > – If the construct is a target, target data, or target enter data > construct then the > structure sibling list item is treated as if it is a list item in a map > clause on the construct with a > map-type of alloc. > – If the construct is target exit data construct, then the structure > sibling list item is treated > as if it is a list item in a map clause on the construct with a map-type > of release." > > While really wordy, I believe this simply means that maps of structure > element fields have their reference > counts increased/decreased in a uniform fashion, i.e. the are alloc/release'd > together, instead of > having parts of the structure possibly deallocated while others still exist > on the device.
I think part of the reason for the above wording being so long is the declare mapper stuff which caused the mapping of the whole struct to be implicitly treated as mapping of all the elements individually (which we don't want to actually implement that way unless we have to (e.g. due to references in there, different declare mappers etc.)). > In general, upon encountering a construct, we can't statically determine and > insert alloc/release maps > for each element of a structure variable, since we don't really know which > region of the structure is > currently mapped or not, hence this probably can't be properly implemented in > the compiler. > > Instead this patch tries to do the equivalent in the runtime: I've modified > the handling of the > (GOMP_MAP_STRUCT, <field-map1>, <field-map2>, ...) sequence to: > > (1) Create just a single splay_tree_key to represent the entire structure's > mapped-region > (all element target_var_desc's now reference this same key instead of > creating their own), and I'm not sure that is what we want. If we create just a single splay_tree_key spanning the whole structure mapped region, then we can't diagnose various mapping errors. E.g. if I have: void bar (struct S *); struct S { int a, b, c, d, e; }; void foo (struct S s) { #pragma omp target data map(tofrom: s.b, s.d) #pragma omp target map (s.b, s.c) bar (&s); } then target data maps the &s.b to &s.d + 1 region of the struct, but s.c wasn't mapped and so the target region's mapping should fail, even when it is in the middle of the mapped region. The structure mapping wording was written in a way to give implementations a choice, either map the whole struct (space inefficient), or the region from the first to last element in the struct the needs mapping (what GCC implements, also space inefficient, but less so), or only map the fields individually and somehow remap all uses of the struct in the region (I think that is only theoretically possible if one can analyze the whole target region and rewrite anything that could refer to it in there). So, I'd think instead of having just one splay_tree_key, we need multiple (we could merge adjacent ones though if we want) but we need some way to tie them together (e.g. represent them as one master entry (perhaps the first one) and slaves entries and use the refcount of the master entry for all of them. There are other OpenMP 5.0 changes which are very tightly related to that though, namely that OpenMP 4.5 disallowed mapping the same variable multiple times in the same region and therefore had the simple rule that each mapping bumps the refcount by one. As OpenMP 5.0 dropped that, we have instead: "If the corresponding list item’s reference count was not already incremented because of the effect of a map clause on the construct then: a) The corresponding list item’s reference count is incremented by one;" So, additionally we need to ensure that we don't bump again refcounts we've bumped already in the same GOMP_* call and similarly at the end of region when unmapping. Additionally there is the complication that for enter data and exit data we call actually gomp_map_vars multiple times, so we'll need to track it somehow even across those calls. Though, the refcount is only on the target_mem_desc struct not on the target_var_desc. But we need to bump the refcount for each separate target_var_desc in there unless it is one of these slave entries, otherwise e.g. #pragma omp target data map (x, y, z) { #pragma omp target enter data map (to: x) #pragma omp target enter data map (to: y) #pragma omp target enter data map (to: z) #pragma omp target exit data map (from: x, y, z) } would keep the target_mem_desc mapped when it shouldn't (if we'd just increment or decrement refcount in each target_mem_desc once per construct (well, twice for target data and target, once upon entry, once upon exit), then the above would set refcount of the block containing all of x, y, z to 1, then 2, 3, 4, then decrease to 3 and finally decrease to 2). For data structures, perhaps change the 4 bool fields in target_var_desc into bitfields, so that we don't grow the structure on 32-bit architectures and add one bit for slave entries. We'd need to ensure the keys for the same GOMP_MAP_STRUCT are consecutive in the list[] array, but I think that should be the case already, so finding the corresponding master would be while (k->slave) k--; And then dunno, perhaps push the addresses of all the target_var_desc that should have refcount increased or decreased into a vector, qsort it and only bump refcount on the first entry in the array or if the previous address in the vector was different than the current one? Or a hash table recording what has been bumped already? Though perhaps with some cheaper way how to handle the most common case of only few mapped vars, because hash table creation would be too expensive in that case? As far as the merging of adjacent fields, perhaps that is something that can be as optional optimization done in the compiler (gimplify.c) when we have the stuff sorted for GOMP_MAP_STRUCT. But we need to take into account different mapping kinds. I'm very sorry about this, I really appreciate your work on this. Jakub