> Actually, not two versions of those during the compilation, you have
> just one v and one tgt, both have __attribute__(("omp declare target"))
> on them (note, you can't specify that attribute manually).
> And just when streaming into .gnu.target_lto_* sections you only stream
> everything that has those attributes and types used by it, but nothing else.

Agreed.  The point was that in the bytecode we would have two versions.
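[For context, a minimal sketch of the user-level construct being discussed,
using the v/tgt names from the example.  This is the OpenMP 4.0 'declare
target' form; the compiler attaches the "omp declare target" attribute to
everything in the marked region itself - as noted above, it cannot be
written manually:]

  #pragma omp declare target
  int v;                        /* single v, marked "omp declare target"   */
  int tgt (void)                /* single tgt, likewise marked; separate   */
  {                             /* copies only appear when streaming into  */
    return ++v;                 /* the .gnu.target_lto_* sections          */
  }
  #pragma omp end declare target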
> Nope.  It would be:
>   struct data_descriptor data_desc1[1] = { { &b, 1024*sizeof(float), TO } };
>   GOMP_target_data (-1, data_desc1, 1);
> or so.  The compiler always knows how many vector elements it needs, there
> is no point in making the vector dynamic ...

Yes, that's a good point.  We really don't need a dynamic type here.

> ... and vec<> is a compiler data
> structure, while you want to emit runtime code...

Yep, I know that - I just chose poor notation here.  I meant that data_desc
would be some vector storing 'data_descriptor' structures.  And now I see
there is no need for a vector at all - an array would be sufficient.

> ... For the if clause, the question is if we want to pass
> it down to the runtime library too (as bool, defaulting to true if missing),
> or do something else.

I think we should handle it the same way it is handled for 'pragma parallel'.

> Nope, there is only one target data pragma, so you would use here just:
>
>   struct data_descriptor data_desc2[2] = { ... };
>   GOMP_target (-1, bar.omp_fn.1, "bar.omp_fn.1", data_desc2, 2);

This 'pragma target' is placed inside a 'pragma target data', so all
variables mapped by the 'pragma target data' should be available to the
'pragma target'.  That means we need to pass GOMP_target an array containing
the united set of mapped variables from both pragmas - in our example the
variables B, C, and S.  As I see it, we would have to use the same array of
descriptors in both the outer 'pragma target data' and the inner 'pragma
target'.  Is that correct?  If data_desc2 contains descriptors for only C
and S, how would B be passed to bar.omp_fn.1?

> No, I didn't mean you'd do this.  omp-lower.c would simply create
> a type here that would have the same layout as what would the runtime
> library pass to it.
> So it would be:
>
> void
> bar.omp_fn.1 (struct omp_target_data *.omp_data_in)
> {
>   int i;
>   *.omp_data_in->s = 0;
>   for (i = 0; i < 1024; i++)
>     tgt (), *.omp_data_in->s += .omp_data_in->b[i] * .omp_data_in->c[i];
> }
>
> Just look what omplower pass does for normal OpenMP code, say
> #pragma omp parallel, task etc.

Actually, I meant the same thing (but probably used poor notation here as
well) - I like the idea of having similar approaches for 'pragma target'
and 'pragma parallel' etc.
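[To make the open question concrete, a hedged sketch of the lowered host
code.  The host_address field appears in the foo1 example quoted below; the
length/kind fields, the TO/TOFROM/FROM constants, and the GOMP_* call shapes
follow the examples in this thread and are assumptions, not a settled API:]

  enum map_kind { TO, FROM, TOFROM };    /* assumed mapping-kind constants */

  struct data_descriptor                 /* assumed layout                 */
  {
    void *host_address;                  /* where the data lives on host   */
    unsigned long length;                /* bytes to map                   */
    enum map_kind kind;                  /* direction of the mapping       */
  };

  /* Inside bar(): outer #pragma omp target data map(to: b)  */
  struct data_descriptor data_desc1[1]
    = { { &b, 1024 * sizeof (float), TO } };
  GOMP_target_data (-1, data_desc1, 1);

  /* Inner #pragma omp target map(tofrom: c) map(from: s) - the question
     above: list only C and S here and let the runtime find B through the
     enclosing target data mapping, or repeat B's descriptor so the united
     set { B, C, S } travels with the call itself.  */
  struct data_descriptor data_desc2[2]
    = { { &c, 1024 * sizeof (float), TOFROM },
        { &s, sizeof (float), FROM } };
  GOMP_target (-1, bar.omp_fn.1, "bar.omp_fn.1", data_desc2, 2);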
On 26 Aug 14:51, Jakub Jelinek wrote:
> On Mon, Aug 26, 2013 at 03:59:11PM +0400, Michael V. Zolotukhin wrote:
> > As I currently see it, the given code would be expanded to something like
> > this:
> >
> > // Create two versions of V: for host and for target
> > int v;
> > int v_target __attribute(target);
> >
> > // The same for TGT function
> > int tgt ()
> > {
> >   .. update v ..
> > }
> > int tgt_target () __attribute(target)
> > {
> >   .. update v_target ..
> > }
>
> Actually, not two versions of those during the compilation, you have
> just one v and one tgt, both have __attribute__(("omp declare target"))
> on them (note, you can't specify that attribute manually).
> And just when streaming into .gnu.target_lto_* sections you only stream
> everything that has those attributes and types used by it, but nothing else.
>
> > float
> > bar (int x, int y, int z)
> > {
> >   float b[1024], c[1024], s = 0;
> >   int i, j;
> >   baz (b, c, x);
> >   // #pragma omp target data map(to: b)
> >   vec<data_descriptor> data_desc;
> >   data_desc.push ({&b, 1024*sizeof(float), TO});
> >   GOMP_target_data (&data_desc);
>
> Nope.  It would be:
>   struct data_descriptor data_desc1[1] = { { &b, 1024*sizeof(float), TO } };
>   GOMP_target_data (-1, data_desc1, 1);
> or so.  The compiler always knows how many vector elements it needs, there
> is no point in making the vector dynamic, and vec<> is a compiler data
> structure, while you want to emit runtime code.  The -1 in there stands
> for missing device(device-id) clause, otherwise it would be the provided
> device-id expression.  For the if clause, the question is if we want to pass
> it down to the runtime library too (as bool, defaulting to true if missing),
> or do something else.
>
> >   {
> >     // #pragma omp target map(tofrom: c) map(from:s)
> >     data_desc.push ({&c, 1024*sizeof(float), TOFROM});
> >     data_desc.push ({&s, sizeof(float), FROM});
> >     GOMP_target_data (&data_desc); // Add mapping for S and C variables,
> >                                    // mapping for B shouldn't change
>
> Nope, there is only one target data pragma, so you would use here just:
>
> >     GOMP_target (foo1, "foo1", &data_desc); // Call either FOO1 or offloaded
> >                                             // FOO1_TARGET with arguments
> >                                             // from vector DATA_DESC
>
>   struct data_descriptor data_desc2[2] = { ... };
>   GOMP_target (-1, bar.omp_fn.1, "bar.omp_fn.1", data_desc2, 2);
>
> >
> >     // #pragma omp target update from(b, v)
> >     vec<data_descriptor> data_desc_update; // target update pragma require a
> >                                            // separate vector
> >     data_desc_update.push ({&b, 1024*sizeof(float), FROM});
> >     data_desc_update.push ({&v, sizeof(int), FROM});
> >     GOMP_target_data (&data_desc_update);
>
> Similarly here.
>
> >   }
> >   return s;
> > }
> > void
> > foo1 (vec<data_descriptor> data_desc)
> > {
> >   float b = *data_desc[0].host_address;
> >   float c = *data_desc[1].host_address;
> >   float s = 0;
> >   int i;
> >   for (i = 0; i < 1024; i++)
> >     tgt (), s += b[i] * c[i];
> >   *data_desc[2].host_address = s;
>
> No, I didn't mean you'd do this.  omp-lower.c would simply create
> a type here that would have the same layout as what would the runtime
> library pass to it.
> So it would be:
>
> void
> bar.omp_fn.1 (struct omp_target_data *.omp_data_in)
> {
>   int i;
>   *.omp_data_in->s = 0;
>   for (i = 0; i < 1024; i++)
>     tgt (), *.omp_data_in->s += .omp_data_in->b[i] * .omp_data_in->c[i];
> }
>
> Just look what omplower pass does for normal OpenMP code, say
> #pragma omp parallel, task etc.
>
> 	Jakub
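[Editorial note: a hedged sketch of the per-region type omp-lower.c would
synthesize for the bar.omp_fn.1 example above.  The field set follows the
map clauses in this thread; the actual layout is whatever omplower
generates, and valid C identifiers stand in for the internal
bar.omp_fn.1/.omp_data_in names:]

  extern int tgt (void);       /* the "omp declare target" function above */

  struct omp_target_data       /* mirrors the block the runtime passes in, */
  {                            /* so host and target versions of the       */
    float *b;                  /* outlined function share one layout:      */
    float *c;                  /* b from map(to: b), c from map(tofrom: c), */
    float *s;                  /* s from map(from: s)                       */
  };

  static void
  bar_omp_fn_1 (struct omp_target_data *omp_data_in)
  {
    int i;
    *omp_data_in->s = 0;
    for (i = 0; i < 1024; i++)
      tgt (), *omp_data_in->s += omp_data_in->b[i] * omp_data_in->c[i];
  }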