Re: OpenACC support in 4.9

Richard Biener Tue, 07 May 2013 03:46:19 -0700

On Tue, May 7, 2013 at 12:42 PM, Richard Biener
<[email protected]> wrote:
> On Tue, May 7, 2013 at 11:02 AM, Tobias Burnus <[email protected]> wrote:
>> Richard Biener wrote:
>>>
>>> We're going to look at supporting HSA from GCC (which would make it more
>>> or less trivial to also target openCL I think)
>>
>>
>> For the friends of link-time optimization (LTO):
>>
>> Unless I missed some fine point in OpenACC and OpenMP's target, they only
>> work with directives which are locally visible. Thus, if one does a function
>> call in the device/target section, it can only be placed on the accelerator
>> if the function can be inlined.
>>
>> Thus, it would be useful if LTO could be used to inline such functions into
>> device code. I know of one OpenACC code which calls functions in different
>> translation units (TUs), and the Cray compiler handles this via LTO. It
>> would therefore be great if the HSA/OpenMP target/OpenACC middle-end
>> infrastructure could do likewise. That also means deferring the error that
>> an external function cannot be used to the middle end/LTO FE instead of
>> issuing it in the front end (FE). In the code mentioned, the called function
>> has no OpenACC annotation; it only consists of constructs which are
>> permitted on the accelerator. Thus, no accelerator code is generated
>> automatically for that TU.
>>
>> (I just want to mention this to ensure that this kind of LTO/accelerator
>> inlining is kept in mind when implementing the infrastructure for
>> HSA/OpenACC/OpenMP target/OpenCL - even if cross-TU inlining is not
>> supported initially.)
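To make the scenario concrete, here is a minimal C sketch. It uses OpenMP 4.0 target syntax as an assumption for illustration (the thread does not show the actual OpenACC code), and the names `scale` and `scale_array` are hypothetical. `scale` stands for the unannotated function and `scale_array` for the device region. Both sit in one file only so the example is self-contained; in the case described, `scale` would live in a separate TU, and getting it onto the accelerator would require LTO to inline it into the target region. Compiled without -fopenmp, or without an offload target configured, the region simply runs on the host.

```c
#include <stddef.h>

/* Hypothetical helper with no OpenMP/OpenACC annotation: an ordinary
   function that only uses constructs an accelerator could execute.
   In the scenario above it would sit in another translation unit. */
double scale(double x, double factor)
{
    return x * factor;
}

/* Hypothetical device region calling scale(). Per the constraint
   described above, scale() must be inlined here for the region to be
   placed on the accelerator; across TUs that is a job for LTO.
   Without offloading support the pragmas run the loop on the host. */
void scale_array(double *restrict out, const double *restrict in,
                 size_t n, double factor)
{
#pragma omp target map(to: in[0:n]) map(from: out[0:n])
#pragma omp parallel for
    for (size_t i = 0; i < n; ++i)
        out[i] = scale(in[i], factor);
}
```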
>
> In my view we'd get the "regular" OpenMP processing done during omp
> lowering/expansion (which happens before LTO), which should mark the
> generated worker functions appropriately.  Emitting accelerator code should
> then happen at LTRANS time, thus after all IPA inlining has taken place.  The
> interesting bit we can borrow from OMP is basically the marking of functions
> that are a) interesting and b) possible to transform.  Unmarked functions/loops
> will have to go the autopar way, so we have to prove via dependence analysis
> that executing iterations in parallel is possible.
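What dependence analysis has to establish for unmarked loops can be sketched in C. The function names and the `reverse` parameter are hypothetical, and running the iterations in reverse order is only a crude stand-in for an arbitrary parallel schedule, not how autopar or GCC's dependence analysis actually works. The first loop has no loop-carried dependence, so any iteration order gives the same result; the second reads a value written by the previous iteration, so reordering changes the outcome and the loop cannot be parallelised as written.

```c
#include <stddef.h>

/* No loop-carried dependence: a[i] depends only on b[i], so the
   iterations may run in any order (or in parallel). */
void independent_loop(double *a, const double *b, size_t n, int reverse)
{
    for (size_t k = 0; k < n; ++k) {
        size_t i = reverse ? n - 1 - k : k;   /* pick iteration order */
        a[i] = 2.0 * b[i];
    }
}

/* Loop-carried dependence: iteration i reads a[i - 1], which iteration
   i - 1 writes, so the iteration order affects the result. */
void carried_loop(double *a, const double *b, size_t n, int reverse)
{
    for (size_t k = 1; k < n; ++k) {
        size_t i = reverse ? n - k : k;       /* i runs over 1 .. n-1 */
        a[i] = a[i - 1] + b[i];
    }
}
```

With b = {1, 1, 1, 1} and a starting as {1, 0, 0, 0}, the forward order of `carried_loop` ends with a[3] == 4, while the reversed order ends with a[3] == 1; `independent_loop` gives identical arrays in both orders.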


Btw, we plan to re-use the GOMP runtime, as otherwise any synchronisation
between accelerator code and regular thread code is impossible.  That
means changing the GOMP runtime so that it can be passed a descriptor
which possibly contains accelerator code (plus a fallback regular function,
so you can disable accelerator usage at runtime).
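A minimal sketch of such a descriptor, assuming a plain struct of function pointers. `struct offload_descriptor`, `run_region`, `set_accelerator_enabled`, `demo_accel` and `demo_host` are all hypothetical and are not the actual libgomp interface; the stand-in "accelerator" function runs on the host and only records which path was taken.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature for one outlined parallel region. */
typedef void (*region_fn)(void *data);

/* Hypothetical descriptor: accelerator version plus host fallback. */
struct offload_descriptor {
    region_fn accel_fn;   /* accelerator version; NULL if none was emitted */
    region_fn host_fn;    /* regular host function, always present */
};

/* Hypothetical runtime switch for disabling accelerator use. */
static bool accelerator_enabled = true;

void set_accelerator_enabled(bool on)
{
    accelerator_enabled = on;
}

/* Run the region: use the accelerator version when one exists and
   accelerator use is enabled, otherwise fall back to the host function.
   Returns true if the accelerator path was taken. */
bool run_region(const struct offload_descriptor *desc, void *data)
{
    if (accelerator_enabled && desc->accel_fn != NULL) {
        desc->accel_fn(data);
        return true;
    }
    desc->host_fn(data);
    return false;
}

/* Stand-ins for the two compiled versions of one region; both run on
   the host and just record which path executed. */
void demo_accel(void *data) { *(int *)data = 1; }
void demo_host(void *data)  { *(int *)data = 2; }
```

The host function is always present, so the runtime can fall back to it both when no accelerator code was emitted and when accelerator use is switched off at runtime.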

Richard.

> Richard.
>
>> Tobias
