On Fri, Nov 29, 2013 at 1:17 PM, Bernd Schmidt <ber...@codesourcery.com> wrote:
> On 11/20/2013 10:36 AM, Jakub Jelinek wrote:
>> On Wed, Nov 20, 2013 at 10:34:30AM +0100, Richard Biener wrote:
>>> On Tue, Nov 19, 2013 at 10:58 AM, Ilya Tocar <tocarip.in...@gmail.com>
>>> wrote:
>>>> On 14 Nov 11:27, Richard Biener wrote:
>>>>>> + /* Set when symbol needs to be dumped for lto/offloading. */
>>>>>> + unsigned need_dump : 1;
>>>>>> +
>>>>>
>>>>> That's very non-descriptive. What's "offloading"? But yes, something
>>>>> like this is what I was asking for.
>>>>
>>>> I've changed it into:
>>>> Set when symbol needs to be dumped into LTO bytecode for LTO,
>>>> or in pragma omp target case, for separate compilation targeting
>>>> a different architecture.
>>>>
>>>> Ok for gomp4 branch now?
>>>
>>> Works for me. I'll let branch maintainers decide if it follows whatever
>>> is done there (I haven't found time to follow stuff here).
>>
>> Ok then.
>
> We've been working on similar patches for our OpenACC project. The goal
> is to have functions generated during omp-low that will ultimately
> execute on a ptx target, write them out using the LTO infrastructure,
> and read them back in using an nvptx-none lto1.
>
> Unfortunately, with multiple teams working in the same area, there's
> obviously going to be some measure of duplication. What I'd like to do
> is post a snapshot of what I currently have, to show the general ideas
> and hopefully get some discussion of what the final picture should look
> like. The next few mails in reply to this one will contain patches that
> work towards the following general outline. I've been trying to keep
> this flexible enough so that it will be suitable not just for the
> OpenACC work but for whatever else people want to achieve in this area.
>
> 1. New configure options are added, --enable-accelerator and
> --enable-as-accelerator-for. The names are certainly up for discussion.
> These allow the compiler to know which target combinations are
> available. The host compiler is configured with --enable-accelerator,
> and the offload/accelerator compiler is configured with both options
> (mostly to ensure they both agree on the spelling of the accelerator
> target name).
> 2. Using --enable-as-accelerator-for= changes the install paths, so that
> the accelerator compilers end up in (for example)
>   bin/x86_64-linux-gnu-accel-nvptx-gcc-4.9.0
>   libexec/x86_64-linux-gnu/accel/nvptx/4.9.0/lto1
> which should keep them separate in case a target can be used both as a
> normal target and as an accelerator.
> 3. Some machinery is added to build the accelerator gcc directly in the
> same tree as the host compiler, in a separate "accel-gcc" subdir. This
> works for nvptx because that target doesn't even want to build a libgcc.
> It may not be suitable for other accelerators if they want to build
> target libraries, but otherwise I think it would be a nice convenience.
> However, building separately should work fine as well, as long as the
> right options are used when configuring all the involved compilers.
> 4. We add a vector of target machines to the compiler. Normally this is
> just initialized to the single machine for which the compiler is
> configured, but when e.g. OpenACC with an accelerator is enabled, the
> accelerator machine is added to that list. It should cope fine with
> multiple different accelerator devices.
> 5. There's a new DECL_TARGET which refers to this list of target
> machines. It's set when creating a child function from e.g. "#pragma acc
> parallel".
> 6. ipa_write_summaries iterates over DECL_TARGET machines to write out
> LTO for each of them. LTO sections for a different target get a separate
> prefix encoding the machine name, e.g. ".gnu.tlto_nvptx_...".
> 7. lto-wrapper recognizes them and calls the various gcc drivers as
> needed. This is where the series ends, and this step is still
> incomplete.
>
> As mentioned, this patch series is still incomplete and has rough edges,
> but I hope it will generate discussion. Further details that will need
> to be addressed are (among others) option handling between compilers for
> different targets, and slightly rewriting the incoming gimple to be
> valid for the target (nvptx requires variables to go into various
> different address spaces).
>
> The patches I'll send assume that the present patch from this thread has
> been reverted, but otherwise they should apply to the current
> gomp-4_0-branch.
>
> Thoughts, comments? Does anyone have a good name for these accelerator
> targets or output targets, something that avoids the overloaded word
> "target" (I was thinking "destination machine", maybe)?
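To make step 5 above concrete, this is the kind of user code involved
(a toy example; the pragma is standard OpenACC). The loop region is
what omp-low outlines into a child function, and that child function is
what would carry the proposed DECL_TARGET:

/* Toy OpenACC example. omp-low outlines the region below into a
   child function; under the proposed scheme that child function would
   get DECL_TARGET pointing at the accelerator machine(s) in addition
   to the host.  */
void
scale (float *restrict a, int n)
{
#pragma acc parallel loop copy(a[0:n])
  for (int i = 0; i < n; i++)
    a[i] *= 2.0f;
}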
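And steps 4 and 6 together, as a rough self-contained sketch. All
types, functions, and the "foo._omp_fn.0" symbol are hypothetical
stand-ins, not the actual patch code; only the ".gnu.lto_" host prefix
and the ".gnu.tlto_<machine>_" per-target variant come from the
description above:

/* Hypothetical sketch of steps 4 and 6: a vector of target machines,
   normally containing only the host, and per-target LTO section names
   where accelerator targets get the machine name encoded into the
   prefix.  */

#include <stdio.h>

struct target_machine
{
  const char *name;  /* e.g. "nvptx"; NULL stands for the host.  */
};

/* Build the LTO section name for SYMBOL on target TM.  ".gnu.lto_" is
   the usual host prefix; "tlto" is the proposed per-target variant.  */
static void
lto_section_name (char *buf, size_t len,
                  const struct target_machine *tm, const char *symbol)
{
  if (tm->name == NULL)
    snprintf (buf, len, ".gnu.lto_%s", symbol);
  else
    snprintf (buf, len, ".gnu.tlto_%s_%s", tm->name, symbol);
}

int
main (void)
{
  /* Step 4: just the host by default; accelerator machines are
     appended when e.g. OpenACC with an accelerator is enabled.  */
  struct target_machine targets[] = { { NULL }, { "nvptx" } };
  char buf[128];

  /* Step 6: iterate over the machines, writing LTO once per target;
     "foo._omp_fn.0" is a made-up outlined-function symbol.  */
  for (size_t i = 0; i < sizeof targets / sizeof targets[0]; i++)
    {
      lto_section_name (buf, sizeof buf, &targets[i], "foo._omp_fn.0");
      printf ("%s\n", buf);
    }
  return 0;
}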
Note that we (SUSE/AMD) so far think we can go an easier route, not
adding a real backend that targets HSAIL/BRIG but instead using a
custom GIMPLE SSA -> HSAIL/BRIG translator (including an SSA-based
register allocator). Which of course simplifies driving this a bit, as
we don't need to write/read any GIMPLE. The idea is of course that the
"highlevel" target languages, be it HSAIL/BRIG or PTX, run through
another compiler + optimizer anyway, so machine-specific optimization
is not necessary (fingers crossed...).

Not sure if anybody announced it yet (but gcc-cvs readers may have
noticed): there is an 'hsa' branch in svn covering the work done so far
(see gcc/README.hsa for how to use it).

Richard.

>
> Bernd
>
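To illustrate the translator idea Richard describes above, here is a
deliberately tiny sketch of translating one GIMPLE assignment directly
to textual HSAIL. The types and data are hypothetical stand-ins, not
code from the hsa branch, and the mnemonics are only meant to suggest
the flavor of the output:

/* Hypothetical sketch of a direct GIMPLE SSA -> HSAIL translation
   step.  The "register allocation" is the degenerate scheme of putting
   SSA name N into HSAIL register $sN; the point is that the HSA
   finalizer optimizes further, so the translator can stay simple.  */

#include <stdio.h>

enum gimple_op { OP_ADD, OP_MUL };

/* Stand-in for a GIMPLE assignment with two SSA-name operands.  */
struct gimple_assign
{
  enum gimple_op op;
  int lhs, rhs1, rhs2;  /* SSA name versions.  */
};

static void
emit_hsail (const struct gimple_assign *stmt)
{
  static const char *const mnemonic[] = { "add_u32", "mul_u32" };
  printf ("\t%s $s%d, $s%d, $s%d;\n",
          mnemonic[stmt->op], stmt->lhs, stmt->rhs1, stmt->rhs2);
}

int
main (void)
{
  /* GIMPLE: _3 = _1 + _2;  _4 = _3 * _1;  */
  struct gimple_assign stmts[] = {
    { OP_ADD, 3, 1, 2 },
    { OP_MUL, 4, 3, 1 }
  };

  for (int i = 0; i < 2; i++)
    emit_hsail (&stmts[i]);
  return 0;
}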