On Wed, May 8, 2013 at 10:25 PM, Torvald Riegel <trie...@redhat.com> wrote:
> On Tue, 2013-05-07 at 12:46 +0200, Richard Biener wrote:
>> On Tue, May 7, 2013 at 12:42 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>> > On Tue, May 7, 2013 at 11:02 AM, Tobias Burnus <bur...@net-b.de> wrote:
>> >> Richard Biener wrote:
>> >>>
>> >>> We're going to look at supporting HSA from GCC (which would make it
>> >>> more or less trivial to also target OpenCL, I think)
>> >>
>> >> For the friends of link-time optimization (LTO):
>> >>
>> >> Unless I missed some fine point in OpenACC and OpenMP's target, they
>> >> only work with directives which are locally visible. Thus, if one does
>> >> a function call in the device/target section, the call can only be
>> >> placed on the accelerator if the function can be inlined.
>> >>
>> >> Thus, it would be useful if LTO could be used to inline such functions
>> >> into device code. I know one OpenACC code which calls functions in
>> >> different translation units (TUs) - and the Cray compiler handles this
>> >> via LTO. It would therefore be great if the HSA/OpenMP target/OpenACC
>> >> middle-end infrastructure could do likewise, which also means deferring
>> >> the error that an external function cannot be used to the middle
>> >> end/LTO front end rather than issuing it in the front end. In the code
>> >> I mention, the called function does not carry any OpenACC annotation
>> >> and consists only of constructs which are permitted on the accelerator
>> >> - thus no accelerator code is generated automatically for that TU.
>> >>
>> >> (I just want to mention this to ensure that this kind of
>> >> LTO/accelerator inlining is kept in mind when implementing the
>> >> infrastructure for HSA/OpenACC/OpenMP target/OpenCL - even if cross-TU
>> >> inlining is not supported initially.)
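(For illustration, a minimal sketch of the case Tobias describes - the
file and function names here are made up: the function called from the
accelerator region lives in another TU, carries no OpenACC annotation of
its own, and can only end up on the device if it gets inlined, e.g. via
LTO.)

  /* file1.c - accelerator region calling into another translation unit.  */
  extern double scale_point (double x);  /* defined in file2.c */

  void
  scale_array (double *a, int n)
  {
  #pragma acc parallel loop copy(a[0:n])
    for (int i = 0; i < n; i++)
      a[i] = scale_point (a[i]);  /* placeable on the device only if inlinable */
  }

  /* file2.c - plain C, no OpenACC directives, only constructs the
     accelerator permits.  */
  double
  scale_point (double x)
  {
    return 2.0 * x + 1.0;
  }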
>> >
>> > In my view we'd get the "regular" OpenMP processing done during omp
>> > lowering/expansion (which happens before LTO), and that should mark the
>> > generated worker functions appropriately. Emitting accelerator code
>> > should then happen at LTRANS time, thus after all IPA inlining has
>> > taken place. The interesting bits we can borrow from OMP are basically
>> > the marking of functions that are a) interesting and b) possible to
>> > transform. Unmarked functions / loops will have to go the autopar way,
>> > so we have to prove via dependence analysis that executing iterations
>> > in parallel is possible.
>>
>> Btw, we plan to re-use the GOMP runtime, as otherwise any synchronisation
>> between accelerator code and regular thread code is impossible.
>
> I can't follow this line of reasoning. Can you elaborate? Which kind of
> synchronization are you referring to?
>
> As far as parallel execution and resource management is concerned,
> libgomp has just the kinds of scheduler that you need in the OpenMP rule
> set. Work-stealing schedulers such as Cilk's are others, and might
> actually become the more common approach. And there are other thread
> pools that programs might use; e.g., there's lots of discussion about all
> this in ISO C++ study group 1 on parallelism and concurrency, and several
> different proposals.
>
> With that in mind, I'm wondering whether the cooperative scheduling that
> we likely need should be at a lower level than libgomp or the Cilk
> runtime. Otherwise, libgomp needs to become the scheduler that runs them
> all (that is, if you want it to work well when combined with other
> abstractions for parallelism), and I'm not sure whether that's the right
> approach.

See my other mail.

>> Which means changing the GOMP runtime in a way to be able to pass a
>> descriptor which eventually has accelerator code (and a fallback regular
>> function so you can disable accelerator usage at runtime).
>
> It probably should be a list of different codes -- you might have more
> than one suitable accelerator available.

Of course. And the descriptor should be versioned to avoid future ABI
changes. Note that I'd always generate code for the CPU as fallback.
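Roughly, I'd imagine something like the following - the names and fields
are purely illustrative, not an existing libgomp interface - a version
field, the always-present CPU fallback, and zero or more accelerator
variants:

  struct target_code_entry
  {
    int target_kind;        /* e.g. HSA, PTX, OpenCL, ...  */
    const void *code;       /* target-specific image or entry point */
  };

  struct offload_descriptor
  {
    unsigned version;                     /* bump on ABI changes */
    void (*host_fallback) (void *data);   /* always present: plain CPU code */
    unsigned num_targets;
    const struct target_code_entry *targets;  /* accelerator variants */
  };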
> BTW: What about putting this topic on the Cauldron agenda? Is there
> still time available to discuss what GCC might do regarding accelerators
> and HW heterogeneity?

I am not able to attend, but certainly the topic is interesting.

Richard.

> Torvald