On Fri, Nov 29, 2013 at 1:17 PM, Bernd Schmidt <ber...@codesourcery.com> wrote:
> On 11/20/2013 10:36 AM, Jakub Jelinek wrote:
>> On Wed, Nov 20, 2013 at 10:34:30AM +0100, Richard Biener wrote:
>>> On Tue, Nov 19, 2013 at 10:58 AM, Ilya Tocar <tocarip.in...@gmail.com>
>>> wrote:
>>>> On 14 Nov 11:27, Richard Biener wrote:
>>>>>> + /* Set when symbol needs to be dumped for lto/offloading. */
>>>>>> + unsigned need_dump : 1;
>>>>>> +
>>>>>
>>>>> That's very non-descriptive. What's "offloading"? But yes, something
>>>>> like this is what I was asking for.
>>>>
>>>> I've changed it into:
>>>> Set when symbol needs to be dumped into LTO bytecode for LTO,
>>>> or in pragma omp target case, for separate compilation targeting
>>>> a different architecture.
>>>>
>>>> Ok for gomp4 branch now?
>>>
>>> Works for me. I'll let branch maintainers decide if it follows whatever
>>> is done there (I haven't found time to follow stuff here).
>>
>> Ok then.
>
> We've been working on similar patches for our OpenACC project. The goal
> is to have functions generated during omp-low that will ultimately
> execute on a ptx target, write them out using the LTO infrastructure,
> and read them back in using an nvptx-none lto1.
>
> Unfortunately, with multiple teams working in the same area, there's
> obviously going to be some measure of duplication. What I'd like to do
> is post a snapshot of what I currently have, to show the general ideas
> and hopefully get some discussion of what the final picture should look
> like. The next few mails in reply to this one will contain patches that
> work towards the following general outline. I've been trying to keep
> this flexible enough so that it will be suitable not just for the
> OpenACC work but for whatever else people want to achieve in this area.
>
> 1. New configure options are added, --enable-accelerator and
> --enable-as-accelerator-for. The names are certainly up for discussion.
> These allow the compiler to know which target combinations are
> available. The host compiler is configured with --enable-accelerator,
> and the offload/accelerator compiler is configured with both options
> (mostly to ensure they both agree on the spelling of the accelerator
> target name).
> 2. Using --enable-as-accelerator-for= changes the install paths, so that
> the accelerator compilers end up in (for example)
>   bin/x86_64-linux-gnu-accel-nvptx-gcc-4.9.0
>   libexec/x86_64-linux-gnu/accel/nvptx/4.9.0/lto1
> which should keep them separate in case a target can be used both as a
> normal target and as an accelerator.
> 3. Some machinery is added to build the accelerator gcc directly in the
> same tree as the host compiler, in a separate "accel-gcc" subdir. This
> works for nvptx because that target doesn't even want to build a libgcc.
> It may not be suitable for other accelerators if they want to build
> target libraries, but otherwise I think it would be a nice convenience.
> However, building separately should work fine as well, as long as the
> right options are used when configuring all the involved compilers.
> 4. We add a vector of target machines to the compiler. Normally this is
> just initialized to the single machine for which the compiler is
> configured, but when e.g. OpenACC with an accelerator is enabled, the
> accelerator machine is added to that list. It should cope fine with
> multiple different accelerator devices.
> 5. There's a new DECL_TARGET which refers to this list of target
> machines. It's set when creating a child function from e.g. "#pragma acc
> parallel".
> 6. ipa_write_summaries iterates over DECL_TARGET machines to write out
> LTO for each of them. LTO sections for a different target get a separate
> prefix encoding the machine name, e.g. ".gnu.tlto_nvptx_...".
> 7. lto-wrapper recognizes them and calls the various gcc drivers as
> needed. This is where the series ends, and this step is still
> incomplete.
>
> As mentioned, this patch series is still incomplete and has rough edges,
> but I hope it will generate discussion. Further details that will need
> to be addressed are (among others) option handling between compilers for
> different targets, and slightly rewriting the incoming gimple to be
> valid for the target (nvptx requires variables to go into various
> different address spaces).
>
> The patches I'll send assume that the present patch from this thread has
> been reverted, but otherwise they should apply to the current
> gomp-4_0-branch.
>
> Thoughts, comments? Does anyone have a good name for these accelerator
> targets or output targets, something that avoids the overloaded word
> "target" (I was thinking "destination machine", maybe)?
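To make step 5 above concrete, this is the kind of user code involved
(a toy example; the pragma is standard OpenACC). The loop region is
what omp-low outlines into a child function, and that child function is
what would carry the proposed DECL_TARGET:

/* Toy OpenACC example. omp-low outlines the region below into a
   child function; under the proposed scheme that child function would
   get DECL_TARGET pointing at the accelerator machine(s) in addition
   to the host.  */
void
scale (float *restrict a, int n)
{
#pragma acc parallel loop copy(a[0:n])
  for (int i = 0; i < n; i++)
    a[i] *= 2.0f;
}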
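And steps 4 and 6 together, as a rough self-contained sketch. All
types, functions, and the "foo._omp_fn.0" symbol are hypothetical
stand-ins, not the actual patch code; only the ".gnu.lto_" host prefix
and the ".gnu.tlto_<machine>_" per-target variant come from the
description above:

/* Hypothetical sketch of steps 4 and 6: a vector of target machines,
   normally containing only the host, and per-target LTO section names
   where accelerator targets get the machine name encoded into the
   prefix.  */

#include <stdio.h>

struct target_machine
{
  const char *name;  /* e.g. "nvptx"; NULL stands for the host.  */
};

/* Build the LTO section name for SYMBOL on target TM.  ".gnu.lto_" is
   the usual host prefix; "tlto" is the proposed per-target variant.  */
static void
lto_section_name (char *buf, size_t len,
                  const struct target_machine *tm, const char *symbol)
{
  if (tm->name == NULL)
    snprintf (buf, len, ".gnu.lto_%s", symbol);
  else
    snprintf (buf, len, ".gnu.tlto_%s_%s", tm->name, symbol);
}

int
main (void)
{
  /* Step 4: just the host by default; accelerator machines are
     appended when e.g. OpenACC with an accelerator is enabled.  */
  struct target_machine targets[] = { { NULL }, { "nvptx" } };
  char buf[128];

  /* Step 6: iterate over the machines, writing LTO once per target;
     "foo._omp_fn.0" is a made-up outlined-function symbol.  */
  for (size_t i = 0; i < sizeof targets / sizeof targets[0]; i++)
    {
      lto_section_name (buf, sizeof buf, &targets[i], "foo._omp_fn.0");
      printf ("%s\n", buf);
    }
  return 0;
}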
Note that we (SUSE/AMD) so far think we can go an easier route, not
adding a real backend that targets HSAIL/BRIG but instead using a
custom GIMPLE SSA -> HSAIL/BRIG translator (including an SSA-based
register allocator). Which of course simplifies driving this a bit, as
we don't need to write/read any GIMPLE. The idea is of course that the
"highlevel" target languages, be it HSAIL/BRIG or PTX, run through
another compiler + optimizer anyway, so machine-specific optimization
is not necessary (fingers crossed...).

Not sure if anybody announced it yet (but gcc-cvs readers may have
noticed): there is an 'hsa' branch in svn covering the work done so far
(see gcc/README.hsa for how to use it).

Richard.

>
> Bernd
>
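To illustrate the translator idea Richard describes above, here is a
deliberately tiny sketch of translating one GIMPLE assignment directly
to textual HSAIL. The types and data are hypothetical stand-ins, not
code from the hsa branch, and the mnemonics are only meant to suggest
the flavor of the output:

/* Hypothetical sketch of a direct GIMPLE SSA -> HSAIL translation
   step.  The "register allocation" is the degenerate scheme of putting
   SSA name N into HSAIL register $sN; the point is that the HSA
   finalizer optimizes further, so the translator can stay simple.  */

#include <stdio.h>

enum gimple_op { OP_ADD, OP_MUL };

/* Stand-in for a GIMPLE assignment with two SSA-name operands.  */
struct gimple_assign
{
  enum gimple_op op;
  int lhs, rhs1, rhs2;  /* SSA name versions.  */
};

static void
emit_hsail (const struct gimple_assign *stmt)
{
  static const char *const mnemonic[] = { "add_u32", "mul_u32" };
  printf ("\t%s $s%d, $s%d, $s%d;\n",
          mnemonic[stmt->op], stmt->lhs, stmt->rhs1, stmt->rhs2);
}

int
main (void)
{
  /* GIMPLE: _3 = _1 + _2;  _4 = _3 * _1;  */
  struct gimple_assign stmts[] = {
    { OP_ADD, 3, 1, 2 },
    { OP_MUL, 4, 3, 1 }
  };

  for (int i = 0; i < 2; i++)
    emit_hsail (&stmts[i]);
  return 0;
}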