sfantao added a comment.

> In a discussion off-list I proposed adding constructor functions to all 
> object files and handle them like shared libraries are already handled today 
> (ie register separately and let the runtime figure out how to relocate 
> symbols in different translation units). I don't have an implementation of 
> that approach so I can't claim that it works and doesn't have a huge 
> performance impact (which we don't want either), but it should be agnostic of 
> the offloading target so it may be worth investigating.

I don't understand how this would work. Doing something like that would require 
reimplementing the GPU-code linker, which requires knowing proprietary details 
of the GPU binary format. I wouldn't know how to resolve all the relocations in 
the device code. In my view, the solution would only work (or at least be more 
easily implemented) if we didn't have relocatable code.
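
For reference, my understanding of the proposal is something like the sketch 
below. The runtime entry point and the image-bounds symbols are made-up names, 
not an existing libomptarget API:

  // Emitted into every object file that contains device code. The symbols
  // __img_begin/__img_end are assumed to be defined wherever the device
  // image is embedded.
  struct __offload_image {
    const void *Begin; // start of the embedded device image
    const void *End;   // one past the end of the image
  };

  extern "C" void __register_offload_object(const __offload_image *Image);
  extern const char __img_begin[], __img_end[];

  // Runs before main() and hands the image to the runtime, which would then
  // have to resolve cross-TU relocations itself -- the part I don't see how
  // to do without knowledge of the GPU binary format.
  static __attribute__((constructor)) void __register_this_object() {
    static const __offload_image Image = {__img_begin, __img_end};
    __register_offload_object(&Image);
  }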

> Assuming we do proceed with back-to-CUDA approach, one thing I'd consider 
> would be using clang's -fcuda-include-gpubinary option which CUDA uses to 
> include GPU code into the host object. You may be able to use it to avoid 
> compiling and partially linking .fatbin and host .o.

Cool, I agree this is worth investigating.
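
If I understand the CUDA path correctly, the host compile would then embed the 
device image directly, along the lines of the cc1 invocation below (a sketch 
only; whether this flag works outside -x cuda compilations is exactly what 
needs investigating):

  clang -cc1 -triple powerpc64le-ibm-linux-gnu -fopenmp \
    -fcuda-include-gpubinary %t.fatbin -emit-obj %t.i -o %t.o

That would save the separate fatbin-to-object step and the "ld -r" partial 
link.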



================
Comment at: lib/Driver/ToolChains/Cuda.cpp:536
+  }
 }
 
----------------
gtbercea wrote:
> sfantao wrote:
> > What prevents all this from being done in the bundler? If I understand it 
> > correctly, if the bundler implements this wrapping, all the checks for 
> > libraries wouldn't be required, and only two changes would be needed in 
> > the driver:
> > 
> > - generate fatbin instead of cubin. This is straightforward to do by 
> > changing the device assembling job. In terms of loading the kernels 
> > through the device API, fatbin and cubin should be equivalent, except 
> > that fatbin can also carry the PTX, which newer GPUs can JIT.
> > - Use the NVIDIA linker as the host linker.
> > 
> > This last requirement could be problematic if we get two targets attempting 
> > to use different (incompatible) linkers. If we run into this kind of 
> > incompatibility, we should emit an appropriate diagnostic.
> What prevents it is the fact that the bundler is called AFTER the HOST and 
> DEVICE object files have been produced. The creation of the fatbin (FATBINARY 
> + CLANG++) needs to happen within the NVPTX toolchain.
> 
Why does it have to happen in the NVPTX toolchain? You are making the NVPTX 
toolchain generate an ELF object for another toolchain, right? What I'm 
suggesting is to do the work that mixes two (or more) toolchains in the 
bundler. Your inputs are still a fatbin and a host file.
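
(For reference, the device assembling job I have in mind is something along 
these lines -- option spellings from memory, so treat it as a sketch:

  fatbinary -64 --create %t.fatbin \
    --image=profile=sm_35,file=%t.cubin \
    --image=profile=compute_35,file=%t.ptx

i.e. the same cubin as today, optionally plus the PTX so newer GPUs can JIT.)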


================
Comment at: test/Driver/openmp-offload.c:497
 // RUN:   %clang -### -fopenmp=libomp -o %t.out -lsomelib -target powerpc64le-linux -fopenmp-targets=powerpc64le-ibm-linux-gnu,x86_64-pc-linux-gnu %t.i -no-canonical-prefixes 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-UBJOBS %s
 // RUN:   %clang -### -fopenmp=libomp -o %t.out -lsomelib -target powerpc64le-linux -fopenmp-targets=powerpc64le-ibm-linux-gnu,x86_64-pc-linux-gnu %t.i -save-temps -no-canonical-prefixes 2>&1 \
----------------
gtbercea wrote:
> gtbercea wrote:
> > sfantao wrote:
> > > We need a test for the static linking. The host linker has to be nvcc in 
> > > that case, right?
> > The host linker is "ld". The "bundling" step is replaced (in the case of 
> > OpenMP NVPTX device offloading only) by a call to "ld -r" that partially 
> > links the two object files: the one produced by the HOST toolchain and the 
> > one produced by the OpenMP NVPTX device offloading toolchain (because we 
> > want to produce a single output).
> nvcc is not called at all in this patch.
Ok, so how do you link device code? I.e. if you have two compilation units that 
depend on each other (some definition in one unit is used in the other), where 
are they linked together? Something has to understand the two files resulting 
from your "ld -r" step; my understanding is that that something is nvcc, which 
calls nvlink behind the scenes, right? So nvcc will do the unbundling+linking 
bit, right?
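
To make the cross-unit case concrete, this is the kind of dependency I mean, 
written as CUDA for brevity (OpenMP target regions referencing declare target 
functions in another unit hit the same issue). With relocatable device code, 
the reference below can only be resolved by a device-link step (nvlink), not 
by the host "ld -r":

  // a.cu -- defines a device function.
  __device__ int helper(int x) { return x + 1; }

  // b.cu -- uses the definition from a.cu; nvlink has to resolve this.
  extern __device__ int helper(int x);
  __global__ void kernel(int *out) { *out = helper(41); }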


Repository:
  rC Clang

https://reviews.llvm.org/D47394


