On Mon, Aug 26, 2013 at 05:29:36PM +0400, Michael V. Zolotukhin wrote:
> > Nope, there is only one target data pragma, so you would use here just:
> > 
> >   struct data_descriptor data_desc2[2] = { ... };
> >   GOMP_target (-1, bar.omp_fn.1, "bar.omp_fn.1", data_desc2, 2);
> This 'pragma target' is placed inside a 'pragma target data' - so all
> variables for 'pragma target data' should be available for the 'pragma
> target'.  So we need to pass to GOMP_target an array, that contains
> united set of mapped variables from both pragmas - in our example these
> would be variables B, C, and S.  So as I see it, we need to use the same
> array of descriptors both in outer 'pragma target data' and in inner
> 'pragma target'.  Is it correct?  If data_desc2 contains descriptors of
> only C and S, how B would be passed to bar.omp_fn.1?

Actually no, that should be the responsibility of the runtime library.
Note, the #pragma omp target data directive doesn't have to be in the same
function as #pragma omp target.  And, I'm sorry for having to confuse this
by hacking in a premature optimization in the gimplifier.  It is true
that if the target data is around target directive that the target will
always be able to only look stuff up, will not need to allocate it,
but 1) the gimplifier doesn't verify it is the same device between those two
2) and as discussed earlier we need it also for the mapping in GOMP_target
So, the way it should actually work IMHO is that both GOMP_target_data
is passed a descriptor for b, and also GOMP_target.
In a more complicated testcase where you have a pointer based array section:
void
foo (int *p)
{
  #pragma omp target data map (tofrom: p[:1024])
  {
    #pragma omp target
    for (int i = 0; i < 1024; i++)
      p[i] += 2;
  }
}
GOMP_target_data does two mappings - one where it maps
(char *) p+0 ... (char *) p+1024*sizeof(int)-1
region (tofrom) and one where it maps
&p ... (char *)(&p+1)+1
region with pointerassign type (i.e. that it is initialized to
the address of the target pointer section).
And then GOMP_target during gimplification determines that p
is used, but not explicitly mapped, so it is added automatically
to #pragma omp target as implicit map(tofrom:p).  That doesn't
do anything with the corresponding array section, and while it is tofrom,
it will actually be always ignored, since the region is already mapped.
Perhaps as optimization the compiler could hint the runtime library that
it can just look it up and doesn't need to allocate/copy anything.

Anyway, the GOMP_target_data implementation and part of GOMP_target would
be something along the lines of following pseudocode:

device_data = lookup_device_id (device_id);
if (device_data == NULL)
  do host fallback;
else
  {
    size_t i, length_sum = 0;
    target_data_env = create_target_data_env ();
    void *target_addrs[num_device_descs]; // VLA or alloca etc.
    char *target_addr = NULL;
    memset (target_addrs, 0, sizeof (target_addrs));
    vec_safe_push (device_data->target_data_envs, target_data_env);
    for (i = 0; i < num_device_descs; i++)
      {
        target_addrs[i] = lookup_in_target_address_tree 
(device_data->addr_tree, device_desc[i].host_addr, device_desc[i].length);
        if (target_addrs[i] == NULL)
          length_sum += device_desc[i].length;
      }
    if (length_sum)
      {
        target_addr = target_malloc (device_data, length_sum);
        length_sum = 0;
        for (i = 0; i < num_device_descs; i++)
          if (target_addrs[i] == NULL)
            {
              target_addrs[i] = target_addr + length_sum;
              length_sum += device_desc[i].length;
              switch (device_desc[i].kind)
                {
                case ALLOC: case FROM: /* nothing */ break;
                case TO:
                case TOFROM: target_copy_todevice (device_data, 
device_desc[i].host_addr, target_addrs[i], device_desc[i].length); break;
                case POINTER: lookup + copy to; break;
                }
              ptr = add_to_target_address_tree (device_data->addr_tree, 
device_desc[i].host_addr, device_desc[i].length, target_addrs[i], 
device_desc[i].kind);
              vec_safe_push (target_data_env->vec, ptr);
            }
      }
    if (GOMP_target call)
      {
      target_call (fn_name, target_addrs);
      FOR_EACH_SAFE_VEC (target_data_env->vec, ptr)
        {
          switch (ptr->kind)
            {
            case FROM: case TOFROM: target_copy_fromdevice (device_data, ...); 
break;
            }
          remove_from_target_address_tree (device_data->addr_tree, ptr);
        }
      vec_pop_and_free(device_data->target_data_envs);
      }
  }

and for GOMP_target_data_end it would do pretty much the stuff in between
FOR_EACH_SAFE_VEC and vec_pop_and_free.
All names above subject to change for something better, I just wanted to
make the picture clear.  There needs to be some address -> target address
data structure (addr_tree), probably some tree (AVL, whatever), in any case
a lookup doesn't need to be exact, you can e.g. look up part of an existing
mapping.  Trying to map something that overlaps an existing mapping, but is
larger than that, is a user bug.

        Jakub

Reply via email to