On 01/22/2014 11:53 AM, Andrey Turetskiy wrote:
We have some test cases, but they require Xeon Phi hardware and a
working libgomp plugin. Our current version of the plugin depends on
some libraries that are not open-sourced yet, so currently we can’t
share it.
However, you can examine what these patches do by following these steps:
1) Build GCC with patches:
http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01484.html
http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01485.html
http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01486.html
http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01896.html
2) Set environment variables (e.g. for two ‘targets’):
export OFFLOAD_TARGET_NAMES=mic:hsail
(for now names don’t really matter)
export OFFLOAD_TARGET_COMPILERS=./gcc:./gcc
(use GCC with patches above as target compiler, because it must
support the -fopenmp_target option)
3) Build any example with #pragma omp target (e.g. see attachment):
./gcc -flto -fopenmp test.c -o test.exe
Both the -flto and -fopenmp options are required.
Now you have a binary with target images embedded and tables properly
filled. You can’t run it due to reasons mentioned above, though you
could examine it with objdump/nm/readelf to see new sections and their
content: there will be .offload_image_section with ‘target’ code and
.offload_func_table_section with ‘target’ function table.
I played around with this for a while last week. To have a slightly more
realistic scenario where the offload compiler is for a different target,
I built an aarch64-linux compiler and used that in
OFFLOAD_TARGET_COMPILERS. This exposed some problems.
+ /* Run gcc for target. */
+ obstack_init (&argv_obstack);
+ obstack_ptr_grow (&argv_obstack, compiler);
+ obstack_ptr_grow (&argv_obstack, "-shared");
+ obstack_ptr_grow (&argv_obstack, "-fPIC");
+ obstack_ptr_grow (&argv_obstack, "-xlto");
+ obstack_ptr_grow (&argv_obstack, "-fopenmp_target");
+ obstack_ptr_grow (&argv_obstack, "-o");
+ obstack_ptr_grow (&argv_obstack, target_image_file_name);
Since environment variables such as GCC_EXEC_PREFIX and COMPILER_PATH
are set at this point, the compiler we're running here won't find the
correct lto1 - best case it doesn't find anything, worst case it finds
the lto1 for the host compiler and produces an image for the host, not
the target (this fails with an arm compiler since the host assembler
doesn't understand -meabi=5, but it could silently do the wrong thing
with other offload toolchains).
Once I worked around this by unsetting the environment variables around
this compiler invocation, the next problem was exposed: the code tries
to link together files compiled for the target (created by the code
quoted above) and files compiled for the host (the _omp_descr file, I
believe). Linker errors ensue.
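
For illustration, the workaround of unsetting the variables around the
offload compiler invocation can be sketched as below. This is a minimal
sketch using only POSIX getenv/setenv/unsetenv, not the change that was
actually made to lto-wrapper; the names saved_env, save_and_unset and
restore_env are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: one saved environment variable.
   VALUE is NULL if the variable was not set when it was saved.  */
struct saved_env
{
  const char *name;
  char *value;
};

/* Remember the current value of NAME, then remove it from the
   environment so that a child process does not inherit it.  */
struct saved_env
save_and_unset (const char *name)
{
  struct saved_env saved = { name, NULL };
  const char *current;

  assert (name != NULL);
  current = getenv (name);
  if (current != NULL)
    {
      /* Copy before unsetenv; the getenv pointer may become invalid.  */
      saved.value = strdup (current);
      assert (saved.value != NULL);
    }
  unsetenv (name);
  return saved;
}

/* Put the variable back the way save_and_unset found it.  */
void
restore_env (struct saved_env *saved)
{
  assert (saved != NULL && saved->name != NULL);
  if (saved->value != NULL)
    {
      setenv (saved->name, saved->value, 1);
      free (saved->value);
      saved->value = NULL;
    }
  else
    unsetenv (saved->name);
}
```

The idea would be to call save_and_unset for GCC_EXEC_PREFIX and
COMPILER_PATH just before running the offload compiler, and restore_env
for both afterwards, so that the host compiler's own subprocesses still
see the original values.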
As mentioned before, I think all this target-specific code has no place
in lto-wrapper to begin with. For ptx, we're going to require some quite
different mechanisms, so I think it might be best to invoke a new tool,
maybe called $target-gen-offload, which knows how to produce an image
that can be linked into the host executable. Different offload targets
can then use different strategies to produce such an image. Probably
each such image should contain its own code to register itself with
libgomp, so that we don't have to construct a table.
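
To make the self-registration idea concrete, here is a rough sketch of
what the host-side object for one image might contain. Everything in it
is an assumption for illustration: the descriptor layout, the names, and
in particular sketch_register_offload_image, which merely stands in for
whatever entry point libgomp might provide and is not an existing
libgomp function. It relies on GCC's constructor attribute.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL descriptor for one offload image.  */
struct offload_image_desc
{
  const char *target_name;
  const void *image_start;
  size_t image_size;
};

/* HYPOTHETICAL stand-in for the runtime side.  A real libgomp would
   keep its own list; here it is a small fixed array.  */
#define SKETCH_MAX_IMAGES 8
static const struct offload_image_desc *sketch_images[SKETCH_MAX_IMAGES];
static int sketch_num_images;

void
sketch_register_offload_image (const struct offload_image_desc *desc)
{
  assert (desc != NULL && sketch_num_images < SKETCH_MAX_IMAGES);
  sketch_images[sketch_num_images++] = desc;
}

int
sketch_registered_count (void)
{
  return sketch_num_images;
}

const struct offload_image_desc *
sketch_registered_image (int i)
{
  assert (i >= 0 && i < sketch_num_images);
  return sketch_images[i];
}

/* What a generated image object might carry: the image bytes (a
   four-byte placeholder here), a descriptor, and a constructor that
   registers the descriptor before main runs.  */
static const unsigned char image_bytes[] = { 0x7f, 'E', 'L', 'F' };

static const struct offload_image_desc this_image =
  { "mic", image_bytes, sizeof image_bytes };

__attribute__ ((constructor))
static void
register_this_image (void)
{
  sketch_register_offload_image (&this_image);
}
```

With something along these lines, each offload target's tool would emit
its own object of this shape, and the host link would not need a
separately constructed table. A real design would also have to make
sure the runtime is initialized before any such constructor runs.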
Some other observations:
* Is OFFLOAD_TARGET_NAMES actually useful, or would any string
generated at link time suffice?
* Is the user expected to set OFFLOAD_TARGET_COMPILERS, or should
this be done by the gcc driver, possibly based on command line
options (I'd much prefer that)?
* Do we actually need an -fopenmp_target option? The way I imagine it
(and which was somewhat present in the Makefile patches I posted
last year) is that an offload compiler is specially configured to
know that that's how it will be used, and to know what the host
architecture is. A $target-gen-offload could then be built with
knowledge of the host architecture and installed in the host
compiler's libexec install directory.
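
As a sketch of how the host driver could find such a tool without any
user-set environment variable, the helper below builds the path of a
$target-gen-offload program inside a given libexec directory. The
function name and the assumption that the tool sits directly in that
directory are hypothetical; nothing here reflects an existing GCC
layout or interface.

```c
#include <assert.h>
#include <stdio.h>

/* HYPOTHETICAL helper: write "<libexec_dir>/<target>-gen-offload"
   into BUF.  Returns 0 on success, -1 if BUF is too small or
   formatting fails.  */
int
sketch_offload_tool_path (char *buf, size_t size,
                          const char *libexec_dir, const char *target)
{
  int n;

  assert (buf != NULL && libexec_dir != NULL && target != NULL);
  n = snprintf (buf, size, "%s/%s-gen-offload", libexec_dir, target);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

The driver would pass its own libexec directory and the offload target
name taken from a command-line option, then run the resulting program
to obtain an image that can be linked into the host executable.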
I think I'll need to implement my own set of mechanisms for ptx, since
this code doesn't seem suitable for inclusion in its current state. I'll
try to take on board some of the ideas I've found here in the hope that
we'll converge on something that works for everybody.
Bernd