On Wed, Aug 28, 2013 at 01:21:53PM +0200, Richard Biener wrote:
> My thought was that we need to have control over scheduling and thus have
> a single runtime to be able to execute the following in parallel on the
> accelerator and the CPU:
>
> #pragma omp parallel
> {
> #pragma omp target
>   for (;;)
>     ...
> #pragma omp for
>   for (;;)
>     ...
> }
> #pragma omp wait
>
> that is, the omp target dispatch may not block the CPU. I can hardly

OpenMP #pragma omp target blocks the host CPU until the accelerator code
finishes. So if the goal is to spawn some accelerator code in parallel
with parallelized host code, you'd need to make the code more complicated.
I guess you could

#pragma omp parallel
{
#pragma omp single
#pragma omp target
  {
#pragma omp parallel
    ...
  }
#pragma omp for schedule(dynamic, N)
  for (;;)
    ...
}

or similar, then only one of the host parallel threads would spawn the
target code and wait for it to be done, and the other threads in the
meantime would do the worksharing (and the dynamic schedule would make
sure that if the target region took a long time, then no work or almost
no work would be scheduled for the thread executing the target region).

> > In the Intel MIC case (the only thing I've looked briefly at for how the
> > offloading works - the COI library) you can load binaries and shared
> > libraries either from files or from host memory image, so e.g. you can
> > embed the libgomp library, some kind of libm and some kind of libc
> > (would that be glibc, newlib, something else?) compiled for the target
> > into some data section inside of the plugin or something
> > (or load it from files of course). No idea how you do this in the
> > HSAIL case, or PTX.
>
> For HSA you can do arbitrary calls to CPU code (that will then of course
> execute on the CPU).

GCC compiles into assembly or bytecode for HSAIL, right, and that is then
further processed by some (right now proprietary?) blob. The question is
whether this allows linking of multiple HSAIL bytecode objects/libraries,
etc. Say you have something providing (a subset of) C library, math
library, libgomp, then say for OpenMP one host shared library provides some
#pragma omp declare target
...
#pragma omp end declare target
routine, and another shared library uses #pragma omp target and calls that
routine from there.
So, I'd assume you have some HSAIL assembly/bytecode in each of the shared
libraries, can you link that together and tell the runtime to execute some
(named?) routine in there?

	Jakub
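
For illustration, the single/target/dynamic-schedule pattern described above could be sketched roughly as below. This is only a sketch under assumptions: the function name overlap_sum, its parameters, and the summation workload are hypothetical, and the map clauses are just one plausible way to move the data. Note that the sketch uses "single nowait": a plain single construct ends with an implied barrier, which would make the other host threads wait for the target region before starting the worksharing loop. Without an offload device (or without OpenMP enabled at all) the target region simply runs on the host, so the result is the same either way.

```c
#include <assert.h>

/* Hypothetical helper: sums dev_in[] in a target region while the
   remaining host threads sum host_in[] with a dynamic schedule.
   Returns the total of both partial sums. */
double overlap_sum(const double *dev_in, int dev_n,
                   const double *host_in, int host_n, int chunk)
{
    assert(dev_n >= 0 && host_n >= 0 && chunk > 0);

    double dev_sum = 0.0;   /* written only by the thread running the target */
    double host_sum = 0.0;  /* reduction result of the host worksharing loop */

    #pragma omp parallel
    {
        /* One host thread dispatches the target region and blocks until
           it finishes; nowait lets the other threads move on at once. */
        #pragma omp single nowait
        {
            #pragma omp target map(to: dev_in[0:dev_n]) map(tofrom: dev_sum)
            {
                double s = 0.0;
                #pragma omp parallel for reduction(+:s)
                for (int i = 0; i < dev_n; i++)
                    s += dev_in[i];
                dev_sum = s;
            }
        }

        /* Dynamic schedule: the thread that was busy with the target
           region joins late and picks up whatever chunks are left,
           possibly none. */
        #pragma omp for schedule(dynamic, chunk) reduction(+:host_sum)
        for (int i = 0; i < host_n; i++)
            host_sum += host_in[i];
    }

    /* The barrier at the end of the parallel region guarantees that
       both partial sums are complete here. */
    return dev_sum + host_sum;
}
```

A caller would pass the two input arrays and a chunk size, e.g. overlap_sum(a, na, b, nb, 16). A smaller chunk size gives the late-arriving thread a better chance of finding no work left, at the cost of more scheduling overhead.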