sfantao marked 8 inline comments as done.
sfantao added a comment.

Hi Eric,

Thanks for the review!

As you are probably aware, I started partitioning this patch following your 
initial concern about the size of this patch and the feedback I got from 
http://lists.llvm.org/pipermail/cfe-dev/2016-February/047547.html. I am keeping 
this patch as it shows the big picture of what I am trying to accomplish, so if 
you prefer to add other higher-level suggestions here, that's perfectly fine. 
Let me know if there is a better way to link patches.

So, I am incorporating your suggestions here into the partitioned patches as 
specified in the inline comments. The partitioned patches are 
http://reviews.llvm.org/D18170, http://reviews.llvm.org/D18171, and 
http://reviews.llvm.org/D18172.

> One meta comment: offload appears to be something that could be used for CUDA 
> and OpenMP (and OpenACC etc) as a term. I think we should either merge these 
> concepts or pick a different name :)


Yes, I agree. I am now using `offloading`. I only refer to the programming 
model name if the code relates to something specific to that programming model.

Thanks again,
Samuel


================
Comment at: include/clang/Driver/Driver.h:210-213
@@ +209,6 @@
+  /// owns all the ToolChain objects stored in it, and will clean them up when
+  /// torn down. We use a different cache for offloading as it is possible to
+  /// have offloading toolchains with the same triple the host has, and the
+  /// implementation has to differentiate the two in order to adjust the
+  /// commands for offloading.
+  mutable llvm::StringMap<ToolChain *> OffloadToolChains;
----------------
echristo wrote:
> Example?
I got rid of this extra toolchain cache and am now organizing the toolchains in 
a multimap keyed by offload kind, as Art suggested in 
http://reviews.llvm.org/D18170. That avoids having multiple containers for the 
offloading toolchains (this one and the ordered one).

================
Comment at: include/clang/Driver/Driver.h:216-217
@@ +215,4 @@
+
+  /// \brief Array of the toolchains of offloading targets in the order they
+  /// were requested by the user.
+  SmallVector<const ToolChain *, 4> OrderedOffloadingToolchains;
----------------
echristo wrote:
> Any reason?
Currently in OpenMP, any directive related to offloading supports a 
`device()` clause that specifies which device to use for that region 
or data transfer. E.g.

```
void foo() {
 ...
}

void bar(int i) {
  #pragma omp target device(i)
   foo();
}
```
... here `foo` is going to be executed on device `i`. The problem is that the 
device is an integer - it does not say which device type it is - so it is up 
to the implementation to decide how `i` is interpreted. So, if we have a system 
with two GPUs and two DSP devices, we may bind 0-1 to the GPUs and 2-3 to the 
DSPs.

My goal in preserving the order of the toolchains was to allow codegen to 
leverage that information and make a better decision on how to bind devices to 
integers. For example, if the user requests the GPU toolchain first, he may be 
interested in prioritizing its use, so the first IDs would map to GPUs. Making 
a long story short, this is only about preserving information so that codegen 
can use it.

In any case, this is going to change in the future, as the OpenMP language 
committee is working on having a device identifier to use instead of an 
integer. So, if you prefer to remove the `ordered` from the name, I am not 
opposed to that.


================
Comment at: include/clang/Driver/Driver.h:427-435
@@ -383,10 +426,11 @@
   /// action \p A.
   void BuildJobsForAction(Compilation &C,
                           const Action *A,
                           const ToolChain *TC,
                           const char *BoundArch,
                           bool AtTopLevel,
                           bool MultipleArchs,
                           const char *LinkingOutput,
-                          InputInfo &Result) const;
+                          InputInfo &Result,
+                          OffloadingHostResultsTy &OffloadingHostResults) 
const;
 
----------------
echristo wrote:
> This function is starting to get a little silly. Perhaps we should look into 
> refactoring such that this doesn't need to be "the one function that rules 
> them all". Perhaps a different ownership model for the things that are 
> arguments here?
This has changed a little with the recent CUDA work. In the version 
http://reviews.llvm.org/D18171 is based on, `Result` is returned instead of 
being passed by reference, and we have a string/action-result map. I'll 
eventually have to add the offloading kind to that string, but I didn't touch 
that yet in the partitioned patches.

Do you suggest having that cache owned by the driver instead of passing it 
along?

================
Comment at: lib/Driver/Compilation.cpp:66-67
@@ +65,4 @@
+
+    // Check if there is any offloading specific translation to do.
+    DerivedArgList *OffloadArgs = TC->TranslateOffloadArgs(*Entry, BoundArch);
+    if (OffloadArgs) {
----------------
echristo wrote:
> Hmm?
This relates to some extent to your other question: how do we pass 
device-specific options.

So, right now we are relying on the host options to derive the device-specific 
options. This hook was meant to tune the host options so that options that do 
not make sense on the device are filtered out. Also, the resulting device 
image is usually a shared library so that it can be easily loaded; this hook 
is therefore also used to specify the options that result in a shared library, 
even if the host options don't ask for a host shared library.
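As a rough illustration of what that translation does (a sketch only - 
`translateForDevice` and the specific filtered flags are hypothetical, not the 
actual `TranslateOffloadArgs` hook):

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of an offload-argument translation: drop flags that
// only make sense on the host and force a shared-library device image.
std::vector<std::string>
translateForDevice(const std::vector<std::string> &HostArgs) {
  std::vector<std::string> DeviceArgs;
  for (const std::string &A : HostArgs) {
    // Drop host-only options (the real list would be toolchain-specific).
    if (A == "-static" || A.rfind("-march=", 0) == 0)
      continue;
    DeviceArgs.push_back(A);
  }
  // The device image is loaded at runtime, so build it as a shared library
  // even if the host link does not ask for one.
  DeviceArgs.push_back("-fPIC");
  DeviceArgs.push_back("-shared");
  return DeviceArgs;
}
```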

Can you think of a better way to abstract this?

================
Comment at: lib/Driver/Driver.cpp:224-225
@@ +223,4 @@
+
+/// \brief Dump the job bindings for a given action.
+static void DumpJobBindings(ArrayRef<const ToolChain *> TCs, StringRef 
ToolName,
+                            ArrayRef<InputInfo> Inputs,
----------------
echristo wrote:
> This can probably be done separately? Can you split this out and make it 
> generally useful?
Given the feedback I got in 
http://lists.llvm.org/pipermail/cfe-dev/2016-February/047547.html, I ended up 
moving most of the functionality I had in job creation to the creation of 
actions. Having an action graph that shows the offloading specifics was a 
desired feature. As a result, what gets more complex is the dump of the 
actions.

In http://reviews.llvm.org/D18171 I have an example of how that dump looks. 
That patch also proposes a unified offloading action that should be reused by 
the different offloading programming models. Does this address your concern?

================
Comment at: lib/Driver/Driver.cpp:2045-2051
@@ -1739,11 +2044,9 @@
     // checking the backend tool, check if the tool for the CompileJob
-    // has an integrated assembler.
-    const ActionList *BackendInputs = &(*Inputs)[0]->getInputs();
-    // Compile job may be wrapped in CudaHostAction, extract it if
-    // that's the case and update CollapsedCHA if we combine phases.
-    CudaHostAction *CHA = dyn_cast<CudaHostAction>(*BackendInputs->begin());
-    JobAction *CompileJA =
-        cast<CompileJobAction>(CHA ? *CHA->begin() : *BackendInputs->begin());
-    assert(CompileJA && "Backend job is not preceeded by compile job.");
-    const Tool *Compiler = TC->SelectTool(*CompileJA);
-    if (!Compiler)
+    // has an integrated assembler. However, if OpenMP offloading is required
+    // the backend and compile jobs have to be kept separate and an integrated
+    // assembler of the backend job will be queried instead.
+    JobAction *CurJA = cast<BackendJobAction>(*Inputs->begin());
+    const ActionList *BackendInputs = &CurJA->getInputs();
+    CudaHostAction *CHA = nullptr;
+    if (!RequiresOpenMPOffloading(TC)) {
+      // Compile job may be wrapped in CudaHostAction, extract it if
----------------
echristo wrote:
> Might be time to make some specialized versions of this function. This may 
> take it from "ridiculously confusing" to "code no one should ever look at" :)
I agree. This function is really messy... :S

In http://reviews.llvm.org/D18171 I am proposing `collapseOffloadingAction`, 
which drives the collapsing of offload actions and abstracts some of the 
complexity in `selectToolForJob`. Do you think that goes in the right 
direction, or should I do something else?

================
Comment at: lib/Driver/Tools.cpp:6032
@@ +6031,3 @@
+  // The (un)bundling command looks like this:
+  // clang-offload-bundler -type=bc
+  //   -omptargets=host-triple,device-triple1,device-triple2
----------------
echristo wrote:
> Should we get the offload bundler in first so that the interface is there and 
> testable? (Honest question, no particular opinion here). Though the command 
> lines there will affect how this code is written. 
Yes, sure. I proposed an implementation of the bundler, using a generic 
format, in http://reviews.llvm.org/D13909. Let me know if you have any 
comments about that specific component.

I still need to add testing specific to http://reviews.llvm.org/D13909, which 
I haven't done yet because I didn't know where it is supposed to live - maybe 
in the Driver? Do you have an opinion about that?

Also, in http://lists.llvm.org/pipermail/cfe-dev/2016-February/047547.html, 
the general opinion was that the bundler should use the host object format to 
bundle whenever possible. So, I also have to add a default behavior for the 
binary bundler when the input is an object file. For the other input types, I 
don't think there were any strong opinions. Do you happen to have one?

In any case, I was planning to add the object-file-specific bundling in a 
separate patch, which seems to me a natural way to partition the bundler 
functionality. Does that sound like a good plan?

================
Comment at: test/OpenMP/target_driver.c:41-47
@@ +40,9 @@
+
+// CHK-PHASES-LIB-DAG: {{.*}}: linker, {[[L0:[0-9]+]], [[A0:[0-9]+]]}, image
+// CHK-PHASES-LIB-DAG: [[A0]]: assembler, {[[A1:[0-9]+]]}, object
+// CHK-PHASES-LIB-DAG: [[A1]]: backend, {[[A2:[0-9]+]]}, assembler
+// CHK-PHASES-LIB-DAG: [[A2]]: compiler, {[[A3:[0-9]+]]}, ir
+// CHK-PHASES-LIB-DAG: [[A3]]: preprocessor, {[[I:[0-9]+]]}, cpp-output
+// CHK-PHASES-LIB-DAG: [[I]]: input, {{.*}}, c
+// CHK-PHASES-LIB-DAG: [[L0]]: input, "m", object
+
----------------
echristo wrote:
> Do we really think the phases should be a DAG check?
Using a DAG seemed to me a robust way to test this. I'd have to double-check, 
but several map containers are used for the inputs and actions, so the order 
may depend on the implementation of the container. I was just trying to test 
in a safe way.

Do you prefer that I change this to the exact sequence I am getting?

================
Comment at: test/OpenMP/target_driver.c:54
@@ +53,3 @@
+// RUN:   echo 'bla' > %t.o
+// RUN:   %clang -ccc-print-phases -lm -fopenmp=libomp -target 
powerpc64-ibm-linux-gnu -omptargets=x86_64-pc-linux-gnu,powerpc64-ibm-linux-gnu 
%s %t.o 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-PHASES-OBJ %s
----------------
echristo wrote:
> How do you pass options to individual omptargets? e.g. -mvsx or -mavx2?
Well, currently I don't. In 
http://lists.llvm.org/pipermail/cfe-dev/2016-February/047547.html I proposed 
something to tackle that, but the opinion was that it was somewhat secondary 
and that the driver design should be settled first.

What I was proposing was some sort of group option associated with the device 
triple. The idea was to avoid a proliferation of device-specific options and 
reuse what we already have, just organized in groups so that the options can 
be forwarded to the right toolchain. The goal was to make things like this 
possible:
```
clang -mcpu=pwr8 -target-offload=nvptx64-nvidia-cuda -fopenmp -mcpu=sm_35 
-target-offload=nvptx64-nvidia-cuda -fcuda -mcpu=sm_32 a.c 
```
... where `-mcpu` is used to specify the CPU/GPU for the different toolchains 
and programming models. This would also be useful to specify include and 
library paths that only make sense for the device.
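A sketch of how such grouping could work (hypothetical - `groupOffloadArgs` is 
not real driver code; the assumption is that each `-target-offload=<triple>` 
occurrence starts a new group, and options before the first marker belong to 
the host):

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: split the command line into ordered
// (target, options) groups at each -target-offload= marker.
std::vector<std::pair<std::string, std::vector<std::string>>>
groupOffloadArgs(const std::vector<std::string> &Args) {
  std::vector<std::pair<std::string, std::vector<std::string>>> Groups;
  Groups.emplace_back("host", std::vector<std::string>());
  const std::string Marker = "-target-offload=";
  for (const std::string &A : Args) {
    if (A.rfind(Marker, 0) == 0) {
      // Start a new group for this offload target.
      Groups.emplace_back(A.substr(Marker.size()),
                          std::vector<std::string>());
      continue;
    }
    Groups.back().second.push_back(A);
  }
  return Groups;
}
```

With the command line above this would yield a host group with `-mcpu=pwr8` 
and two ordered `nvptx64-nvidia-cuda` groups, one for the OpenMP toolchain and 
one for the CUDA toolchain.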

Do you have any opinion about that?


http://reviews.llvm.org/D9888


