On Wed, 2013-08-28 at 13:21 +0200, Richard Biener wrote:
> On Wed, Aug 28, 2013 at 1:06 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> > On Wed, Aug 28, 2013 at 12:39:00PM +0200, Richard Biener wrote:
> >> From the accelerator BOF video I gather we agreed on using the GOMP
> >> representation as unified middle-end.  What I didn't get is whether we
> >> agreed on libgomp being the unified single runtime (that eventually
> >> dispatches to accelerator specific runtimes, opened via dlopen)?
> >
> > I guess that is up to discussions.  It can be e.g. that the libgomp library
> > dlopens libgomp-specific plugins, or that those plugins are written to be
> > usable by more libraries (libopenacc, etc.), or some code for those plugins
> > is shared.
> > Important is also what target "libraries" we actually provide, e.g. OpenMP
> > 4.0 says basically that from target code you can only call code declared
> > or defined in a #pragma omp declare target ... #pragma omp end declare target
> > region, but it pretty much assumes that you can use various omp_* library
> > calls, various #pragma omp ... directives (which probably need some library
> > implementation) and stuff like printf and various math library functions.
>
> My thought was that we need to have control over scheduling and thus have
> a single runtime to be able to execute the following in parallel on the
> accelerator and the CPU:
>
>   #pragma omp parallel
>   {
>   #pragma omp target
>     for (;;)
>       ...
>   #pragma omp for
>     for (;;)
>       ...
>   }
>   #pragma omp wait
>
> that is, the omp target dispatch may not block the CPU.
And that's not the only combination we have to consider.  ISO C++ will come
up with something eventually (both for parallelism and likely as well for
concurrency), and ISO C has a study group (CPLEX) looking at Cilk with some
OpenMP mixed in.  So we will have different programming abstractions
(OpenMP, something Cilk-like, perhaps some kind of lightweight threads for
concurrency, ...) to support, and currently they all use different
schedulers.  There are conversations going on in the respective ISO C++ and
C study groups about how to tame the scheduler side of this, but nothing
tangible has emerged from that so far.

> I can hardly
> see how you can make multiple runtimes co-exist from the GCC code
> generation side.

Perhaps having several runtimes is not as much of a problem as potentially
having several runtimes that can't agree on the same semantics of how to
share resources and what parallel/concurrent tasks look like.  IOW, we
might have to do more "unification" work in the intermediate representation
too (i.e., current GOMP + something + changes).

> > In the Intel MIC case (the only thing I've looked briefly at for how the
> > offloading works - the COI library) you can load binaries and shared
> > libraries either from files or from a host memory image, so e.g. you can
> > embed the libgomp library, some kind of libm and some kind of libc
> > (would that be glibc, newlib, something else?) compiled for the target
> > into some data section inside of the plugin or something
> > (or load it from files of course).  No idea how you do this in the
> > HSAIL case, or PTX.
>
> For HSA you can do arbitrary calls to CPU code (that will then of course
> execute on the CPU).

Right, which means that we would have a dispatch for both directions,
controlled by some part of the HSA runtime.
This would probably also mean that the parts executed on the CPU can
themselves run in parallel, so the HSA runtime would probably want to
handle that by calling back into the generic scheduler code responsible
for all parallel/concurrent tasks.

Torvald