On Tue, May 7, 2013 at 12:42 PM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Tue, May 7, 2013 at 11:02 AM, Tobias Burnus <bur...@net-b.de> wrote:
>> Richard Biener wrote:
>>>
>>> We're going to look at supporting HSA from GCC (which would make it more
>>> or less trivial to also target OpenCL I think)
>>
>>
>> For the friends of link-time optimization (LTO):
>>
>> Unless I missed some fine point in OpenACC and OpenMP's target, they only
>> work with directives which are locally visible. Thus, if one does a function
>> call in the device/target section, it can only be placed on the accelerator
>> if the function can be inlined.
>>
>> Thus, it would be useful if LTO could be used to inline such functions into
>> device code. I know one OpenACC code which calls functions in different
>> translation units (TU) - and the Cray compiler handles this via LTO. Thus,
>> it would be great if the HSA/OpenMP target/OpenACC middle-end infrastructure
>> could do likewise, which also means deferring the error that an external
>> function cannot be used to the middle-end/LTO FE and not placing it into the
>> FE. - In the mentioned code, the called function does not have any OpenACC
>> annotation but only consists of constructs which are permitted by the
>> accelerator - thus, no automatic code gen of accelerator code happens for
>> that TU.
>>
>> (I just want to mention this to ensure that this kind of LTO/accelerator
>> inlining is kept in mind when implementing the infrastructure for
>> HSA/OpenACC/OpenMP target/OpenCL - even if cross-TU inlining is not
>> supported initially.)
>
> In my view we'd get the "regular" OpenMP processing done during omp
> lowering/expansion (which happens before LTO) which should mark the
> generated worker functions appropriately. Emitting accelerator code should
> then happen at LTRANS time, thus after all IPA inlining took place. The
> interesting bits we can borrow from OMP are basically marking of functions
> that are a) interesting, b) possible to transform. Unmarked functions / loops
> will have to go the autopar way, thus we have to prove via dependence analysis
> that executing iterations in parallel is possible.

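To illustrate the marked vs. unmarked distinction quoted above, here is a minimal sketch in C. The function names are made up for illustration and nothing here is taken from GCC's sources. The first loop carries an OpenMP directive, so the programmer has already asserted that its iterations are independent. The other two are unmarked: one has independent iterations that dependence analysis could plausibly prove safe, the other has a loop-carried dependence and cannot simply be run in parallel as written.

```c
#include <stddef.h>

/* Marked: the directive tells the compiler that iterations are
   independent, so no dependence proof is needed.  Without -fopenmp
   the pragma is ignored and the loop runs serially. */
void scale_marked(float *a, int n, float k)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] *= k;
}

/* Unmarked, but each iteration touches only its own element and the
   restrict qualifiers rule out aliasing, so an autopar-style
   dependence analysis has a chance of proving it parallelizable. */
void add_unmarked(float *restrict out, const float *restrict x,
                  const float *restrict y, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = x[i] + y[i];
}

/* Unmarked with a loop-carried dependence: iteration i reads the value
   written by iteration i - 1, so the iterations cannot be executed
   independently in this form. */
void prefix_sum(float *a, int n)
{
    for (int i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```
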
Btw, we plan to re-use the GOMP runtime, as otherwise any synchronisation
between accelerator code and regular thread code is impossible. That means
changing the GOMP runtime so that it can be passed a descriptor which may
contain accelerator code (plus a fallback regular function, so you can
disable accelerator usage at runtime).

Richard.

> Richard.
>
>> Tobias
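
A rough sketch of what such a descriptor could look like, in C. Everything here is hypothetical: the struct layout, the names (offload_desc, run_region, set_accel_enabled, try_launch_on_accel) and the runtime switch are assumptions made for illustration, not the libgomp ABI. The device launch is a stub that always fails, so the sketch only exercises the fallback path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor handed to the runtime for one parallel region. */
typedef void (*host_fn_t)(void *data);

struct offload_desc {
    host_fn_t   host_fn;     /* fallback regular function, always present */
    const void *accel_code;  /* opaque accelerator code image, or NULL */
    size_t      accel_size;  /* size of that image in bytes */
};

/* Hypothetical runtime switch for disabling accelerator usage. */
static bool accel_enabled = true;

void set_accel_enabled(bool on)
{
    accel_enabled = on;
}

/* Stand-in for a real device launch.  It always reports failure because
   this sketch has no accelerator to talk to. */
static bool try_launch_on_accel(const struct offload_desc *d, void *data)
{
    (void)d;
    (void)data;
    return false;
}

/* Runs the region on the accelerator when that is enabled, available and
   succeeds; otherwise calls the host fallback.  Returns true only if the
   accelerator path was taken. */
bool run_region(const struct offload_desc *d, void *data)
{
    if (accel_enabled && d->accel_code != NULL
        && try_launch_on_accel(d, data))
        return true;

    d->host_fn(data);
    return false;
}

/* Example worker function, as omp expansion might outline it. */
void add_one(void *data)
{
    *(int *)data += 1;
}
```

Keeping the host function mandatory and the accelerator image optional means the caller never needs to know which path was taken, and the switch can be flipped at runtime without recompiling.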