On Thu, Aug 22, 2013 at 06:08:10PM +0400, Michael V. Zolotukhin wrote:
> We're working on design for offloading support in GCC (part of OpenMP4), and I
> have a question regarding libgomp part.
>
> Suppose we expand '#pragma omp target' like we expand '#pragma omp parallel',
> i.e. the compiler expands the following code:
> #pragma omp target
> {
>   body;
> }
> to this:
> void subfunction (void *data)
> {
>   use data;
>   body;
> }
>
> setup data;
> function_name = "subfunction";
> GOMP_offload (subfunction, &data, function_name);
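For concreteness, here is a minimal C sketch of that expansion, assuming the region body just doubles an array. The struct subfunction_data, expanded_caller and the host-only GOMP_offload stub are hypothetical illustrations, not libgomp code; the stub skips all accelerator handling and simply runs the outlined function on the host.

```c
#include <assert.h>

/* Hypothetical data block the compiler would build for the variables
   referenced inside the target region ("setup data").  */
struct subfunction_data
{
  int *a;
  int n;
};

/* Outlined body of the '#pragma omp target' region ("use data; body;").  */
static void
subfunction (void *data)
{
  struct subfunction_data *d = data;
  for (int i = 0; i < d->n; i++)
    d->a[i] *= 2;
}

/* Hypothetical host-only stand-in for the proposed GOMP_offload: no
   accelerator lookup happens here, FN just runs on the host.  FNAME is
   the symbol name an accelerator-side lookup would use.  */
static void
GOMP_offload (void (*fn) (void *), void *data, const char *fname)
{
  assert (fn != 0 && fname != 0);
  fn (data);
}

/* What the original function looks like after expansion.  */
void
expanded_caller (int *a, int n)
{
  struct subfunction_data data = { a, n };
  GOMP_offload (subfunction, &data, "subfunction");
}
```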
Roughly. We have 3 directives here:

#pragma omp target
#pragma omp target data
#pragma omp target update

and all of them have various clauses, some that are allowed at most once (e.g. the device clause, the if clause) and others that can be used many times (the data movement clauses). I'd prefer GOMP_target instead of GOMP_offload for the function name, to make it clearly related to the corresponding directive. The question is whether we want to emit multiple calls for a single directive, say one for each data movement clause (where for each one we need the address, length, direction and some way to propagate the transformed address to the accelerator code), or whether we build an array of the data movement structures and just pass that down to a single routine. Because of the device clause, which should probably be passed just as an integer with -1 meaning the default, a single routine might be better.

> GOMP_offload is a call to libgomp, which will be implemented somehow like
> this:
> void GOMP_offload (void (*fn)(void*), void *data, const char *fname)
> {
>   if (gomp_offload_available ())

This really isn't just a check whether an accelerator is available. We need to query all accelerators in the system (and cache that somehow in the library) and assign device numbers to individual devices (say, you could have two Intel MIC cards, one AMD HSAIL-capable GPGPU and 4 Nvidia PTX-capable GPGPUs or similar), ensuring that already assigned device numbers aren't reused when discovering new ones. Then check which device the user requested (if it is not available, fall back to the host), next see whether we have corresponding code for that accelerator (again, fall back to the host otherwise), optionally compile the code if it is not compiled yet (HSAIL/PTX code only), and then finally do the name lookup and spawn it.
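A minimal sketch of the single-routine variant, assuming one descriptor per data movement clause and -1 for "no device clause". The names map_entry, map_kind, device_desc, select_device, example_region and this GOMP_target signature are all hypothetical. Device discovery is a stub that caches an empty device list, so every call takes the host-fallback path; the real device path is only described in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical direction of one data movement clause.  */
enum map_kind { MAP_TO, MAP_FROM, MAP_TOFROM, MAP_ALLOC };

/* One data movement clause: address, length, direction, plus a slot
   where the runtime stores the translated address for the region.  */
struct map_entry
{
  void *host_addr;
  size_t len;
  enum map_kind kind;
  void *dev_addr;
};

#define DEVICE_DEFAULT (-1)    /* no device clause given */
#define MAX_DEVICES 8

/* Hypothetical cached description of one discovered accelerator.  */
struct device_desc
{
  const char *kind;            /* e.g. "mic", "hsail", "ptx" */
  int has_code;                /* do we have code for this device?  */
};

static struct device_desc devices[MAX_DEVICES];
static int num_devices = -1;   /* -1: discovery not run yet */

/* Discovery runs once and is cached.  Stub: pretends nothing was found.  */
static void
discover_devices (void)
{
  if (num_devices >= 0)
    return;
  num_devices = 0;
}

/* Returns the device to use, or -1 meaning "fall back to the host".  */
static int
select_device (int device)
{
  discover_devices ();
  if (device == DEVICE_DEFAULT)
    device = 0;
  if (device < 0 || device >= num_devices)
    return -1;
  if (!devices[device].has_code)
    return -1;
  return device;
}

/* Hypothetical single entry point: one call per directive, all data
   movement clauses passed together as an array.  Returns the device
   used, or -1 if the region ran on the host.  */
static int
GOMP_target (int device, void (*fn) (void *), const char *fname,
             size_t nmaps, struct map_entry *maps)
{
  assert (fn != NULL && (nmaps == 0 || maps != NULL));
  (void) fname;
  int dev = select_device (device);
  if (dev >= 0)
    {
      /* Device path, omitted: copy MAP_TO/MAP_TOFROM data over, fill in
         dev_addr, look FNAME up in DEV's image, launch, copy back.  */
      return dev;
    }
  /* Host fallback: the "translated" address is the host address.  */
  for (size_t i = 0; i < nmaps; i++)
    maps[i].dev_addr = maps[i].host_addr;
  fn (maps);
  return -1;
}

/* Example outlined region: doubles the int described by maps[0].  */
static void
example_region (void *arg)
{
  struct map_entry *maps = arg;
  int *x = maps[0].dev_addr;
  *x *= 2;
}
```

Passing the whole clause list in one call keeps the device clause (and the if clause) evaluated exactly once per directive, which is the main argument for the single-routine shape.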
Stuff specific to the HW should be in libgomp plugins IMHO, so we'd have one dlopenable module for each of the 3 variants, where one function in the plugin would check what HW is available, another would try to run the code, etc.

Jakub
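A sketch of what the plugin boundary could look like, assuming each plugin exports a device-count query and a run entry point. The struct offload_plugin, its members and the stub plugins are hypothetical; in the design above the pointers would be resolved with dlsym () from each dlopened module, while here two statically linked stubs stand in so the sketch builds on its own. Global device numbers are assigned by walking the plugins in a fixed order over counts cached on first use, which keeps numbers stable once handed out (at the cost of not noticing devices added later).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-HW plugin interface.  */
struct offload_plugin
{
  const char *name;
  int (*get_num_devices) (void);                             /* what HW? */
  int (*run) (int local_dev, const char *fname, void *data); /* 0 = ok */
};

static int
stub0_get_num_devices (void)
{
  return 0;                    /* plugin loaded, but no such card present */
}

static int
stub2_get_num_devices (void)
{
  return 2;                    /* pretend two cards of this kind exist */
}

/* A real plugin would look FNAME up in the device image and launch it
   on LOCAL_DEV; the stub only records which local device it was given.  */
static int
stub_run (int local_dev, const char *fname, void *data)
{
  (void) fname;
  *(int *) data = local_dev;
  return 0;
}

static const struct offload_plugin plugins[] = {
  { "stub-a", stub0_get_num_devices, stub_run },
  { "stub-b", stub2_get_num_devices, stub_run },
};
#define NUM_PLUGINS (sizeof plugins / sizeof plugins[0])

static int plugin_counts[NUM_PLUGINS];
static int counts_cached;

/* Query each plugin once and remember the answer.  */
static void
cache_counts (void)
{
  if (counts_cached)
    return;
  for (size_t i = 0; i < NUM_PLUGINS; i++)
    plugin_counts[i] = plugins[i].get_num_devices ();
  counts_cached = 1;
}

/* Map a global device number to its plugin and plugin-local number.
   Returns NULL if there is no such device.  */
static const struct offload_plugin *
lookup_device (int global_dev, int *local_dev)
{
  assert (local_dev != NULL);
  cache_counts ();
  if (global_dev < 0)
    return NULL;
  for (size_t i = 0; i < NUM_PLUGINS; i++)
    {
      if (global_dev < plugin_counts[i])
        {
          *local_dev = global_dev;
          return &plugins[i];
        }
      global_dev -= plugin_counts[i];
    }
  return NULL;
}

/* Run FNAME on global device DEV; -1 means "fall back to the host".  */
static int
offload_run (int dev, const char *fname, void *data)
{
  int local;
  const struct offload_plugin *p = lookup_device (dev, &local);
  if (p == NULL)
    return -1;
  return p->run (local, fname, data);
}
```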