On Tue, Aug 27, 2013 at 03:26:09PM +0400, Michael V. Zolotukhin wrote:
> > Anyway, the GOMP_target_data implementation and part of GOMP_target would
> > be something along the lines of the following pseudocode:
> >
> > device_data = lookup_device_id (device_id);
> > ...
> Thanks, I see it similarly. But the problem with passing
> arguments to the target is still open. I'll try to explain what the
> problem is.
>
> Remember what we did for 'pragma parallel':
> struct .omp_data_s.0 .omp_data_o.2;
> .omp_data_o.2.s = 0.0;
> .omp_data_o.2.b = &b;
> .omp_data_o.2.c = &c;
> .omp_data_o.2.y = y_7(D);
> .omp_data_o.2.j = j_9(D);
> __builtin_GOMP_parallel (bar._omp_fn.0, &.omp_data_o.2, 0, 0);
> s_12 = .omp_data_o.2.s;
> y_13 = .omp_data_o.2.y;
> j_14 = .omp_data_o.2.j;
>
> I.e. the compiler prepares a structure with all arguments and passes it to
> the runtime. The runtime passes this structure as-is to the callee (i.e. to
> bar._omp_fn.0).
>
> In bar._omp_fn.0 the compiler just emits code that extracts the
> corresponding fields from the given struct and thus initializes all
> needed local vars:
> bar._omp_fn.0 (struct .omp_data_s.0 * .omp_data_i)
> {
>   int _12;
>   int _13;
>   ...
>   _12 = .omp_data_i_11(D)->y;
>   _13 = .omp_data_i_11(D)->j;
>   ...
> }
>
> That scheme would work perfectly for implementing host fallback, but as
> I see it, it can't be applied as is to target offloading, for the
> following reasons:
> *) The compiler doesn't know runtime info, i.e. it doesn't know target
> addresses, so it can't fill the structure for passing to the target
> version of the routine.
> *) The runtime doesn't know the structure layout: it would have to
> translate addresses first and only then pass the structure to the callee,
> but it doesn't know which addresses to translate, because it doesn't know
> which variables are used by the callee.
>
> Currently, I see two possible solutions for this:
> 1) Add fields to the argument structure describing the size of each
> field. Then GOMP_target parses this struct, replaces every address it
> finds with the corresponding target address, and only then calls
> target_call.
> 2) Lift the mapping/allocation work from the runtime to compile time, i.e.
> allow the compiler to generate calls like this:
> .omp_data_o.2.s = 0.0;
> .omp_data_o.2.b = &b;
> .omp_data_o.2.c = &c;
> .omp_data_o.2.y = y_7(D);
> .omp_data_o.2.j = j_9(D);
> .omp_data_o.target.2.s = GOMP_translate_target_address (0.0);
> .omp_data_o.target.2.b = GOMP_translate_target_address (&b);
> .omp_data_o.target.2.c = GOMP_translate_target_address (&c);
> .omp_data_o.target.2.y = GOMP_translate_target_address (y_7(D));
> .omp_data_o.target.2.j = GOMP_translate_target_address (j_9(D));
> GOMP_target (bar._omp_fn.0, &.omp_data_o.2, &.omp_data_o.target.2, 0, 0, );
> Thus the runtime would have two versions of the argument structure and
> would be able to pass it as-is to the target callee. But we would probably
> need a version of that struct for each target, and that would look very
> ugly.
>
> What do you think about that? Maybe I'm missing or overcomplicating
> something, but for now I can't see how all this could work together
> without answers to these questions.
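The 'pragma parallel' scheme described in the quoted message can be sketched in plain C. This is a minimal, single-threaded illustration under stated assumptions: `struct omp_data_s`, `bar_omp_fn_0`, `run_parallel` and `bar` are hypothetical names standing in for the compiler-generated record, the outlined function, GOMP_parallel and the caller, and the field types are guessed from the dump. It shows that the runtime only forwards an opaque pointer; only the outlined function knows the record layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C rendering of the .omp_data_s.0 record from the dump:
   s, y and j are passed by value, b and c by address. Field types are
   assumptions. */
struct omp_data_s {
  double s;
  int *b;
  int *c;
  int y;
  int j;
};

/* Hypothetical stand-in for the outlined bar._omp_fn.0: it is the only
   code that knows the record layout, and it unpacks the fields it needs. */
static void bar_omp_fn_0 (void *arg)
{
  struct omp_data_s *data = arg;
  assert (data != NULL);
  int y = data->y;
  int j = data->j;
  *data->b += y;      /* writes through an address stored in the record */
  *data->c += j;
  data->s += y * j;   /* by-value field, read back by the caller */
}

/* Hypothetical single-threaded stand-in for GOMP_parallel: like the real
   runtime, it never looks inside the record, it just forwards the pointer. */
static void run_parallel (void (*fn) (void *), void *data)
{
  fn (data);
}

/* Host side, mirroring the dump: fill the record, hand it to the runtime,
   then read s back out of it. */
static double bar (int *b, int *c, int y, int j)
{
  struct omp_data_s o;
  o.s = 0.0;
  o.b = b;
  o.c = c;
  o.y = y;
  o.j = j;
  run_parallel (bar_omp_fn_0, &o);
  return o.s;
}
```

This works for host fallback because the addresses stored in the record (`b`, `c`) are valid in the callee's address space. On a device they would not be, and a runtime that treats the record as opaque cannot find them to translate them, which is the problem raised above.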
What I meant was just that if you call GOMP_target with num_descs N, then
the structure will look like:

struct .omp_target_data
{
  sometype0 *var0;
  sometype1 *var1;
  ...
  sometypeNminus1 *varNminus1;
};

So the runtime will call the target routine with the address of an array
of N pointers, and the compiler-generated target routine will just use a
struct to access it, to make it more debuggable. As there won't be any
padding in the structure, I'd hope the structure layout will be exactly
the same as that of the array.

	Jakub
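The array-of-pointers proposal can be illustrated with a small C sketch. Everything here is hypothetical: `struct map_desc`, `run_target` (a stand-in for GOMP_target) and `foo_omp_fn_1` are invented for illustration, and malloc'd copies stand in for device memory. The runtime translates each descriptor to a "target" address, stores the results in an array of N pointers and passes the array's address; the target routine reads that array through a struct of N pointers. The static assertions check the no-padding assumption on the build platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DESCS 8

/* Hypothetical compiler-generated view of the argument block for N = 3. */
struct omp_target_data {
  int *var0;
  double *var1;
  char *var2;
};

/* The proposal relies on this: N pointer fields, no padding. */
static_assert (sizeof (struct omp_target_data) == 3 * sizeof (void *),
               "struct has padding");
static_assert (offsetof (struct omp_target_data, var2) == 2 * sizeof (void *),
               "field offsets differ from array indices");

/* Hypothetical per-variable descriptor: host address and size in bytes. */
struct map_desc {
  void *host_addr;
  size_t size;
};

/* Hypothetical outlined target body: views the array of N pointers
   through the struct. */
static void foo_omp_fn_1 (void **args)
{
  struct omp_target_data d;
  memcpy (&d, args, sizeof d);   /* valid only because the layouts match */
  *d.var0 += 1;
  *d.var1 *= 2.0;
  d.var2[0] = 'X';
}

/* Hypothetical runtime entry: map each variable (a malloc'd copy stands in
   for device memory), build the array of target addresses, call the target
   routine with its address, then copy results back and unmap. */
static int run_target (void (*fn) (void **), size_t n,
                       const struct map_desc *descs)
{
  void *tgt[MAX_DESCS];
  if (n > MAX_DESCS)
    return -1;
  for (size_t i = 0; i < n; i++)
    {
      tgt[i] = malloc (descs[i].size);
      if (tgt[i] == NULL)
        {
          while (i-- > 0)   /* release what was already mapped */
            free (tgt[i]);
          return -1;
        }
      memcpy (tgt[i], descs[i].host_addr, descs[i].size);
    }
  fn (tgt);
  for (size_t i = 0; i < n; i++)
    {
      memcpy (descs[i].host_addr, tgt[i], descs[i].size);
      free (tgt[i]);
    }
  return 0;
}
```

The sketch copies the array into the struct with memcpy rather than casting the pointer, which keeps it clear of C's aliasing rules; it still assumes that `int *`, `double *` and `char *` share the representation of `void *`, which holds on the usual ABIs but is not guaranteed by the C standard. Compiler-generated code would presumably access the block through the struct directly.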