> Actually, not two versions of those during the compilation, you have
> just one v and one tgt, both have __attribute__(("omp declare target"))
> on them (note, you can't specify that attribute manually).
> And just when streaming into .gnu.target_lto_* sections you only stream
> everything that has those attributes and types used by it, but nothing else.
Agreed.  The point was that in the bytecode we would have two versions.

> Nope.  It would be:
>   struct data_descriptor data_desc1[1] = { { &b, 1024*sizeof(float), TO } };
>   GOMP_target_data (-1, data_desc1, 1);
> or so.  The compiler always knows how many vector elements it needs, there
> is no point in making the vector dynamic ...
Yes, that's a good point.  We really don't need a dynamic type here.

> ... and vec<> is a compiler data
> structure, while you want to emit runtime code...
Yep, I know that - I probably just chose poor notation here.  I meant
that data_desc would be some vector storing 'data_descriptor'
structures.  And now I see that there is no need for a vector at all -
an array would be sufficient.
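
To make the fixed-size array variant concrete, here is a minimal sketch of
what could be emitted for 'map(to: b)'.  Everything in it is an assumption
made for illustration: the field names of 'data_descriptor', the 'map_kind'
enum, and 'GOMP_target_data_stub', which only stands in for the runtime
entry point and sums up the mapped sizes.  It is not the libgomp interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mapping kinds, named after TO/FROM/TOFROM in this thread.  */
enum map_kind { MAP_TO, MAP_FROM, MAP_TOFROM };

/* Hypothetical descriptor layout: host address, size in bytes, kind.  */
struct data_descriptor
{
  void *host_address;
  size_t size;
  enum map_kind kind;
};

/* Stand-in for the runtime call.  It only adds up the bytes that would
   be mapped, so that the sketch has something observable to return.  */
static size_t
GOMP_target_data_stub (int device, const struct data_descriptor *desc,
                       size_t n)
{
  size_t total = 0;
  (void) device;        /* -1 stands for a missing device clause.  */
  assert (desc != NULL);
  for (size_t i = 0; i < n; i++)
    total += desc[i].size;
  return total;
}

/* What could be emitted for: #pragma omp target data map(to: b)
   The element count is known at compile time, so a plain array works.  */
static size_t
map_b_example (void)
{
  static float b[1024];
  struct data_descriptor data_desc1[1]
    = { { b, 1024 * sizeof (float), MAP_TO } };
  return GOMP_target_data_stub (-1, data_desc1, 1);
}
```

Calling map_b_example () returns the number of bytes described by the
array, i.e. 1024 * sizeof (float).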

> ...  For the if clause, the question is if we want to pass
> it down to the runtime library too (as bool, defaulting to true if missing),
> or do something else.
I think we should do that in the same way as it's done in 'pragma
parallel'.
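
One possible reading of "the same way", as a sketch only.  As far as I
recall, for 'pragma parallel' the if clause is folded into the num_threads
argument (a false condition forces one thread), so the analogue for
'pragma target' could fold it into the device argument.  The sentinel
DEVICE_HOST_FALLBACK and both helper names are hypothetical and are not
something agreed on in this thread.

```c
#include <assert.h>

/* Hypothetical sentinel meaning "run the region on the host".  */
#define DEVICE_HOST_FALLBACK (-2)

/* 'parallel' case: a false if clause forces a single thread, otherwise
   the requested count is passed through (0 = let the runtime choose).  */
static unsigned
parallel_num_threads_arg (int if_cond, unsigned requested)
{
  return if_cond ? requested : 1u;
}

/* Hypothetical 'target' analogue: a false if clause replaces the device
   id (-1 = no device clause) with the host-fallback sentinel.  */
static int
target_device_arg (int if_cond, int device)
{
  return if_cond ? device : DEVICE_HOST_FALLBACK;
}
```

With this scheme the runtime entry point would not need a separate bool
parameter; the alternative mentioned above (passing the condition down
as a bool) would keep the device id untouched.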

> Nope, there is only one target data pragma, so you would use here just:
> 
>   struct data_descriptor data_desc2[2] = { ... };
>   GOMP_target (-1, bar.omp_fn.1, "bar.omp_fn.1", data_desc2, 2);
This 'pragma target' is placed inside a 'pragma target data', so all
variables mapped by the 'pragma target data' should be available to the
'pragma target'.  So we need to pass GOMP_target an array that contains
the union of the variables mapped by both pragmas - in our example these
would be variables B, C, and S.  As I see it, we need to use the same
array of descriptors in both the outer 'pragma target data' and the inner
'pragma target'.  Is that correct?  If data_desc2 contains descriptors
for only C and S, how would B be passed to bar.omp_fn.1?

> No, I didn't mean you'd do this.  omp-lower.c would simply create
> a type here that would have the same layout as what would the runtime
> library pass to it.
> So it would be:
> 
> void
> bar.omp_fn.1 (struct omp_target_data *.omp_data_in)
> {
>   int i;
>   *.omp_data_in->s = 0;
>   for (i = 0; i < 1024; i++)
>     tgt (), *.omp_data_in->s += .omp_data_in->b[i] * .omp_data_in->c[i];
> }
> 
> Just look what omplower pass does for normal OpenMP code, say
> #pragma omp parallel, task etc.
Actually, I meant the same (but probably used poor notation for this
as well) - I like the idea of having similar approaches in 'pragma
target' and 'pragma parallel/etc.'.
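
For reference, a compilable sketch of that approach in plain C.  The
field layout of 'struct omp_target_data' is an assumption (in the real
pass omp-lower.c would build it), the identifiers are respelled without
dots so that they are valid C, and 'run_bar_region' is a hypothetical
host-side caller that fills the struct directly instead of going through
the runtime.

```c
#include <assert.h>

/* Assumed layout: one pointer per mapped variable (S, B, C).  */
struct omp_target_data
{
  float *s;
  float *b;
  float *c;
};

/* Stand-in for the 'declare target' function TGT; it counts its calls.  */
static int tgt_calls;

static int
tgt (void)
{
  return ++tgt_calls;
}

/* Valid-C spelling of bar.omp_fn.1: the body of the target region,
   with every mapped variable accessed through the incoming struct.  */
static void
bar_omp_fn_1 (struct omp_target_data *omp_data_in)
{
  int i;
  *omp_data_in->s = 0;
  for (i = 0; i < 1024; i++)
    tgt (), *omp_data_in->s += omp_data_in->b[i] * omp_data_in->c[i];
}

/* Hypothetical host-side caller: fill the struct, run the region.  */
static float
run_bar_region (void)
{
  static float b[1024], c[1024];
  float s = -1.0f;
  for (int i = 0; i < 1024; i++)
    {
      b[i] = 1.0f;
      c[i] = 2.0f;
    }
  struct omp_target_data data = { &s, b, c };
  bar_omp_fn_1 (&data);
  return s;
}
```

This mirrors the way the body of 'pragma parallel' is outlined into a
function that receives a pointer to a compiler-generated data struct.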

On 26 Aug 14:51, Jakub Jelinek wrote:
> On Mon, Aug 26, 2013 at 03:59:11PM +0400, Michael V. Zolotukhin wrote:
> > As I currently see it, the given code would be expanded to something like
> > this:
> > 
> >   // Create two versions of V: for host and for target
> >   int v;
> >   int v_target __attribute(target);
> > 
> >   // The same for TGT function
> >   int tgt ()
> >   {
> >     .. update v ..
> >   }
> >   int tgt_target () __attribute(target)
> >   {
> >     .. update v_target ..
> >   }
> 
> Actually, not two versions of those during the compilation, you have
> just one v and one tgt, both have __attribute__(("omp declare target"))
> on them (note, you can't specify that attribute manually).
> And just when streaming into .gnu.target_lto_* sections you only stream
> everything that has those attributes and types used by it, but nothing else.
> > 
> >   float
> >   bar (int x, int y, int z)
> >   {
> >     float b[1024], c[1024], s = 0;
> >     int i, j;
> >     baz (b, c, x);
> >     // #pragma omp target data map(to: b)
> >     vec<data_descriptor> data_desc;
> >     data_desc.push ({&b, 1024*sizeof(float), TO});
> >     GOMP_target_data (&data_desc);
> 
> Nope.  It would be:
>   struct data_descriptor data_desc1[1] = { { &b, 1024*sizeof(float), TO } };
>   GOMP_target_data (-1, data_desc1, 1);
> or so.  The compiler always knows how many vector elements it needs, there
> is no point in making the vector dynamic, and vec<> is a compiler data
> structure, while you want to emit runtime code.  The -1 in there stands
> for missing device(device-id) clause, otherwise it would be the provided
> device-id expression.  For the if clause, the question is if we want to pass
> it down to the runtime library too (as bool, defaulting to true if missing),
> or do something else.
> 
> >     {
> >       // #pragma omp target map(tofrom: c) map(from:s)
> >       data_desc.push ({&c, 1024*sizeof(float), TOFROM});
> >       data_desc.push ({&s, sizeof(float), FROM});
> >       GOMP_target_data (&data_desc); // Add mapping for S and C variables,
> >                                  // mapping for B shouldn't change
> 
> Nope, there is only one target data pragma, so you would use here just:
> 
> >       GOMP_target (foo1, "foo1", &data_desc); // Call either FOO1 or
> >                                           // offloaded FOO1_TARGET with
> >                                           // arguments from vector DATA_DESC
> 
>   struct data_descriptor data_desc2[2] = { ... };
>   GOMP_target (-1, bar.omp_fn.1, "bar.omp_fn.1", data_desc2, 2);
> 
> > 
> >       // #pragma omp target update from(b, v)
> >       vec<data_descriptor> data_desc_update; // target update pragma
> >                                          // require a separate vector
> >       data_desc_update.push ({&b, 1024*sizeof(float), FROM});
> >       data_desc_update.push ({&v, sizeof(int), FROM});
> >       GOMP_target_data (&data_desc_update);
> 
> Similarly here.
> 
> >     }
> >     return s;
> >   }
> >   void
> >   foo1 (vec<data_descriptor> data_desc)
> >   {
> >     float b = *data_desc[0].host_address;
> >     float c = *data_desc[1].host_address;
> >     float s = 0;
> >     int i;
> >     for (i = 0; i < 1024; i++)
> >       tgt (), s += b[i] * c[i];
> >     *data_desc[2].host_address = s;
> 
> No, I didn't mean you'd do this.  omp-lower.c would simply create
> a type here that would have the same layout as what would the runtime
> library pass to it.
> So it would be:
> 
> void
> bar.omp_fn.1 (struct omp_target_data *.omp_data_in)
> {
>   int i;
>   *.omp_data_in->s = 0;
>   for (i = 0; i < 1024; i++)
>     tgt (), *.omp_data_in->s += .omp_data_in->b[i] * .omp_data_in->c[i];
> }
> 
> Just look what omplower pass does for normal OpenMP code, say
> #pragma omp parallel, task etc.
> 
>       Jakub
