[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

2018-08-07 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

Do we still need this? I think what we really need to solve is the problem of 
(host) inline assembly in the header files...


Repository:
  rC Clang

https://reviews.llvm.org/D47849



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

2018-08-08 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D47849#1190997, @gtbercea wrote:

> Don't we want to use device specific math functions?
>  It's not just about avoiding some the host specific assembly, it's also 
> about getting an implementation tailored to the device.


Ok, so you are already talking about performance. I think we should fix 
correctness first, in particular the compiler shouldn't complain whenever 
`` is included.

I experimented with adding only a minimum of target defines (`__amd64__` and 
`__x86_64__`): While I think this is a step into the right direction it still 
fails when including ``.

Btw the GCC folks don't have a complete solution either: If you compile with 
`-O2` you get the same complaints once the code starts calling `signbit`. Maybe 
Clang should also implement lazy Sema checking for device side compilation?


Repository:
  rC Clang

https://reviews.llvm.org/D47849



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

2018-08-08 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D47849#1192134, @gtbercea wrote:

> This patch is concerned with calling device functions when you're on the 
> device. The correctness issues you mention are orthogonal to this and should 
> be handled by another patch. I don't think this patch should be held up any 
> longer.


I'm confused by now, could you please highlight the point that I'm missing?

IIRC you started to work on this to fix the problem with inline assembly (see 
https://reviews.llvm.org/D47849#1125019). AFAICS this patch fixes declarations 
of math functions but you still cannot include `math.h` which most "correct" 
codes do.

In https://reviews.llvm.org/D47849#1170670, @tra wrote:

> The rumors of "high performance" functions in the libdevice are somewhat 
> exaggerated , IMO. If you take a look at the IR in the libdevice of recent 
> CUDA version, you will see that a lot of the functions just call their llvm 
> counterpart. If it turns out that in some case llvm generates slower code 
> than what nvidia provides, I'm sure it will be possible to implement a 
> reasonably fast replacement.


So regarding performance it's not yet clear to me which cases actually benefit: 
Is there a particular function that is slow if LLVM's backend resolves the call 
vs. the wrapper script directly calls libdevice?
If I understand @tra's comment correctly, I think we should have clear evidence 
(ie a small "benchmark") that this patch actually improves performance.


Repository:
  rC Clang

https://reviews.llvm.org/D47849



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

2018-08-08 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D47849#1192321, @gtbercea wrote:

> > IIRC you started to work on this to fix the problem with inline assembly 
> > (see https://reviews.llvm.org/D47849#1125019). AFAICS this patch fixes 
> > declarations of math functions but you still cannot include `math.h` which 
> > most "correct" codes do.
>
> I'm not sure what you mean by this. This patch enables me to include math.h.


`math.c`:

  #include 

executed commands:

   $ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -c math.c -O2
  In file included from math.c:1:
  In file included from /usr/include/math.h:413:
  /usr/include/bits/mathinline.h:131:43: error: invalid input constraint 'x' in 
asm
__asm ("pmovmskb %1, %0" : "=r" (__m) : "x" (__x));
^
  /usr/include/bits/mathinline.h:143:43: error: invalid input constraint 'x' in 
asm
__asm ("pmovmskb %1, %0" : "=r" (__m) : "x" (__x));
^
  2 errors generated.


Repository:
  rC Clang

https://reviews.llvm.org/D47849



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

2018-08-08 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D47849#1192375, @gtbercea wrote:

> I do not get that error.


In the beginning you said that you were facing the same error. Did that go away 
in the meantime?
Are you testing on x86 or Power? With optimizations enabled?


Repository:
  rC Clang

https://reviews.llvm.org/D47849



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

2018-08-08 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D47849#1192493, @gtbercea wrote:

> @Hahnfeld do you get the same error if you compile with clang++ instead of 
> clang?


Yes, with both trunk and this patch applied. It's the same header after all...


Repository:
  rC Clang

https://reviews.llvm.org/D47849



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

2018-08-10 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld removed a reviewer: Hahnfeld.
Hahnfeld added a comment.

I feel like there is no progress in the discussion (here and off-list), partly 
because we might still not be talking about the same things. So I'm stepping 
down from this revision to unblock review from somebody else.

Here's my current understanding of the issue(s):

- `math.h` (or transitively included files) on both PowerPC and x86 contain 
inline assembly.
  - On x86 Clang directly bails out because the code is using the `x` input 
constraint which doesn't exist for NVPTX (-> `invalid input constraint 'x' in 
asm`).
  - From my understanding the header passes Sema analysis on PowerPC, but 
rejects CodeGen because the assembly instructions are invalid on NVPTX?
- This problem can be avoided (for testing purposes; including `math.h` should 
be fixed as well some day!) by explicitly declaring all needed math functions 
(like `extern double exp(double);`)
  - Without additional flags this makes Clang emit Intrinsic Functions 
 like `@llvm.exp.f64` 
for NVPTX.
  - That's because `IsMathErrnoDefault()` returns `false` for the Cuda 
ToolChain. This behaviour can be overwritten using `-fmath-errno` (the test 
case `nvptx_device_math_functions.c` uses this flag; I'm not sure why?)
- That at least looks to be producing correct IR in both cases which is then 
passed to the backend:
  1. For intrinsic functions (with some notable exceptions) the backend 
complains `Cannot select: [...] ExternalSymbol'exp'`.
- Some exceptions are `sqrt.f32`, `sqrt.f64`, `sin.f32` and `cos.f32`: The 
backend will directly lower them to the corresponding PTX instruction. 
Unfortunately there is none for `exp`...
  2. For "real" function calls (like `call double @exp(double %3)`) `nvlink` 
will throw `Undefined reference` errors.

This patch takes the following approach:

1. Avoid intrinsics for math builtins by passing `-fno-math-builtin` for device 
compilation.
2. Use the CUDA header to redirect math functions to their libdevice 
equivalents in the frontend, mostly just prefixed by `__nv_` (for example 
`exp(a)` -> `__nv_exp(a)`).

The downside of this approach is that LLVM doesn't recognize these function 
calls and doesn't perform optimizations to fold libcalls. For example `pow(a, 
2)` is transformed into a multiplication but `__nv_pow(a, 2)` is not.

In https://reviews.llvm.org/D47849#1124638, @Hahnfeld wrote:

> IMO this goes into the right direction, we should use the fast implementation 
> in libdevice.


So yeah, my comment seems to be outdated if these simple optimizations don't 
happen anymore with this patch: I don't want to use a fast `pow(a, 2)`, I don't 
want to call a library function for that at all.

We could of course make LLVM recognize the calls to libdevice and handle them 
the same way. But that's adding more workarounds to make this patch not regress 
on easy cases (in terms of transformations).
Another approach would be to make the NVPTX backend lower remaining calls of 
math functions to libdevice equivalents. I came across 
https://reviews.llvm.org/D34708 which seems to go into that direction (but 
doesn't work out-of-the-box after fixing some build errors, complaing about 
`Undefined external symbol`s because libdevice is optimized away as it wasn't 
needed before)...


Repository:
  rC Clang

https://reviews.llvm.org/D47849



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-16 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.
Hahnfeld added reviewers: tra, gtbercea, hfinkel.
Herald added subscribers: cfe-commits, guansong.

When compiling CUDA or OpenMP device code Clang parses header files
that expect certain predefined macros from the host architecture. To
make this work the compiler passes the host triple via the -aux-triple
argument and (until now) pulls in all macros for that "auxiliary triple"
unconditionally.

However this results in defines like __SSE_MATH__ that will trigger
inline assembly making use of the "advertised" target features. See
the discussion of https://reviews.llvm.org/D47849 and PR38464 for a detailed 
explanation of
the encountered problems.

Instead of blacklisting "known bad" examples this patch starts adding
defines that are needed for certain headers like bits/wordsize.h and
bits/mathinline.h.
The disadvantage of this approach is that it decouples the definitions
from their target toolchain. However in my opinion it's more important
to keep definitions for one header close together. For one this will
include a clear documentation why these particular defines are needed.
Furthermore it simplifies maintenance because adding defines for a new
header or support for a new aux-triple only needs to touch one piece
of code.


Repository:
  rC Clang

https://reviews.llvm.org/D50845

Files:
  lib/Frontend/InitPreprocessor.cpp
  test/Preprocessor/aux-triple.c
  test/SemaCUDA/builtins.cu

Index: test/SemaCUDA/builtins.cu
===
--- test/SemaCUDA/builtins.cu
+++ test/SemaCUDA/builtins.cu
@@ -12,8 +12,8 @@
 // RUN: -aux-triple x86_64-unknown-unknown \
 // RUN: -fsyntax-only -verify %s
 
-#if !(defined(__amd64__) && defined(__PTX__))
-#error "Expected to see preprocessor macros from both sides of compilation."
+#if !defined(__x86_64__)
+#error "Expected to see preprocessor macros from the host."
 #endif
 
 void hf() {
Index: test/Preprocessor/aux-triple.c
===
--- /dev/null
+++ test/Preprocessor/aux-triple.c
@@ -0,0 +1,48 @@
+// Ensure that Clang sets some very basic target defines based on -aux-triple.
+
+// RUN: %clang_cc1 -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none \
+// RUN: | FileCheck -match-full-lines -check-prefixes NVPTX64,NONE %s
+// RUN: %clang_cc1 -x c++ -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none \
+// RUN: | FileCheck -match-full-lines -check-prefixes NVPTX64,NONE %s
+// RUN: %clang_cc1 -x cuda -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none \
+// RUN: | FileCheck -match-full-lines -check-prefixes NVPTX64,NONE %s
+
+// CUDA:
+// RUN: %clang_cc1 -x cuda -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none -aux-triple powerpc64le-none-none \
+// RUN: | FileCheck -match-full-lines -check-prefixes NVPTX64,PPC64 %s
+// RUN: %clang_cc1 -x cuda -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none -aux-triple x86_64-none-none \
+// RUN: | FileCheck -match-full-lines -check-prefixes NVPTX64,X86_64 %s
+
+// OpenMP:
+// RUN: %clang_cc1 -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device \
+// RUN: -triple nvptx64-none-none -aux-triple powerpc64le-none-none \
+// RUN: | FileCheck -match-full-lines -check-prefixes NVPTX64,PPC64 %s
+// RUN: %clang_cc1 -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device \
+// RUN: -triple nvptx64-none-none -aux-triple x86_64-none-none \
+// RUN: | FileCheck -match-full-lines -check-prefixes NVPTX64,X86_64 %s
+// RUN: %clang_cc1 -x c++ -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device \
+// RUN: -triple nvptx64-none-none -aux-triple powerpc64le-none-none \
+// RUN: | FileCheck -match-full-lines -check-prefixes NVPTX64,PPC64 %s
+// RUN: %clang_cc1 -x c++ -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device \
+// RUN: -triple nvptx64-none-none -aux-triple x86_64-none-none \
+// RUN: | FileCheck -match-full-lines -check-prefixes NVPTX64,X86_64 %s
+
+// NVPTX64:#define _LP64 1
+// NVPTX64:#define __LP64__ 1
+// NVPTX64:#define __NVPTX__ 1
+// NVPTX64:#define __PTX__ 1
+
+// NONE-NOT:#define __powerpc64__
+// NONE-NOT:#define __x86_64__
+
+// PPC64:#define __powerpc64__ 1
+// X86_64:#define __x86_64__ 1
Index: lib/Frontend/InitPreprocessor.cpp
===
--- lib/Frontend/InitPreprocessor.cpp
+++ lib/Frontend/InitPreprocessor.cpp
@@ -1099,6 +1099,24 @@
   TI.getTargetDefines(LangOpts, Builder);
 }
 
+/// Initialize macros based on AuxTargetInfo.
+static void InitializePredefinedAuxMacros(const TargetInfo &AuxTI,
+  MacroBuilder &Builder) {
+  // Define basic target macros needed by at least bits/wordsize.h and
+  // bits/mathinline.h
+  switch (AuxTI.

[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-16 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added inline comments.



Comment at: test/SemaCUDA/builtins.cu:15-17
+#if !defined(__x86_64__)
+#error "Expected to see preprocessor macros from the host."
 #endif

@tra I'm not sure here: Do we want `__PTX__` to be defined during host 
compilation? I can't think of a valid use case, but you have more experience 
with user code.


Repository:
  rC Clang

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-16 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1202540, @ABataev wrote:

> Maybe for device compilation we also should define `__NO_MATH_INLINES` and 
> `__NO_STRING_INLINES` macros to disable inline assembly in glibc?


The problem is that `__NO_MATH_INLINES` doesn't even avoid all inline assembly 
from `bits/mathinline.h` :-( incidentally Clang already defines 
`__NO_MATH_INLINES` for x86 (due to an old bug which has been fixed long ago) - 
and on CentOS we still have problems as described in PR38464.

As a second thought: This might be valid for NVPTX, but I don't think it's a 
good idea for x86-like offloading targets - they might well profit from inline 
assembly code.


Repository:
  rC Clang

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-16 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1202838, @tra wrote:

> In https://reviews.llvm.org/D50845#1202551, @ABataev wrote:
>
> > In https://reviews.llvm.org/D50845#1202550, @Hahnfeld wrote:
> >
> > > In https://reviews.llvm.org/D50845#1202540, @ABataev wrote:
> > >
> > > > Maybe for device compilation we also should define `__NO_MATH_INLINES` 
> > > > and `__NO_STRING_INLINES` macros to disable inline assembly in glibc?
> > >
> > >
> > > The problem is that `__NO_MATH_INLINES` doesn't even avoid all inline 
> > > assembly from `bits/mathinline.h` :-( incidentally Clang already defines 
> > > `__NO_MATH_INLINES` for x86 (due to an old bug which has been fixed long 
> > > ago) - and on CentOS we still have problems as described in PR38464.
> > >
> > > As a second thought: This might be valid for NVPTX, but I don't think 
> > > it's a good idea for x86-like offloading targets - they might well profit 
> > > from inline assembly code.
> >
> >
> > I'm not saying that we should define those macros for all targets, only for 
> > NVPTX. But still, it may disable some inline assembly for other 
> > architectures.
>
>
> IMO, trying to avoid inline assembly by defining(or not) some macros and 
> hoping for the best is rather fragile as we'll have to chase *all* patches 
> that host's math.h may have on any given system.


Completely agree here: This patch tries to pick the low-hanging fruits that 
happen to fix `include ` on most systems (and addressing a 
long-standing `FIXME` in the code). I know there are more headers that define 
inline assembly unconditionally and need more advanced fixes (see below).

> If I understand it correctly, the root cause of this exercise is that we want 
> to compile for GPU using plain C. CUDA avoids this issue by separating device 
> and host code via target attributes and clang has few special cases to ignore 
> inline assembly errors in the host code if we're compiling for device. For 
> OpenMP there's no such separation, not in the system headers, at least.

Yes, that's one of the nice properties of CUDA (for the compiler). There used 
to be the same restriction for OpenMP where all functions used in `target` 
regions needed to be put in `declare target`. However that was relaxed in favor 
of implicitly marking all **called** functions in that TU to be `declare 
target`.
So ideally I think Clang should determine which functions are really `declare 
target` (either explicit or implicit) and only run semantical analysis on them. 
If a function is then found to be "broken" it's perfectly desirable to error 
back to the user.


Repository:
  rC Clang

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-16 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1202963, @hfinkel wrote:

> As a result, we should really have a separate header that has those 
> actually-available functions. When targeting NVPTX, why don't we have the 
> included math.h be CUDA's math.h? In the end, those are the functions we need 
> to call when we generate code. Right?


That's what https://reviews.llvm.org/D47849 deals with.


Repository:
  rC Clang

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-17 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1202973, @ABataev wrote:

> > So ideally I think Clang should determine which functions are really 
> > `declare target` (either explicit or implicit) and only run semantical 
> > analysis on them. If a function is then found to be "broken" it's perfectly 
> > desirable to error back to the user.
>
> It is not possible for OpenMP because we support implicit declare target 
> functions. Clang cannot identify whether the function is going to be used on 
> the device or not during sema analysis.


You are right, we can't do this during device compilation because we don't have 
an AST before Sema.

However I'm currently thinking about the following:

1. Identify explicit and implicit `declare target` functions during host Sema 
and CodeGen.
2. Attach meta-data for all of them to LLVM IR `.bc` which is passed via  
`-fopenmp-host-ir-file-path`. I think we already do something similar for 
outlined `target` regions?
3. During device Sema query that meta-data so Clang knows when a function will 
be called from within a `target` region. Skip analysis of functions that are 
not needed for the device, just as CUDA does.
4. Check that we don't need functions that weren't marked in 2. That's to catch 
users doing something like:

  #pragma omp target
  {
  #ifdef __NVPTX__
target_func()
  #endif
  }

For now that's just an idea, I didn't start implementing any of this yet. Do 
you think that could work?


Repository:
  rC Clang

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-17 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1203967, @ABataev wrote:

> In https://reviews.llvm.org/D50845#1203961, @Hahnfeld wrote:
>
> > In https://reviews.llvm.org/D50845#1202973, @ABataev wrote:
> >
> > > > So ideally I think Clang should determine which functions are really 
> > > > `declare target` (either explicit or implicit) and only run semantical 
> > > > analysis on them. If a function is then found to be "broken" it's 
> > > > perfectly desirable to error back to the user.
> > >
> > > It is not possible for OpenMP because we support implicit declare target 
> > > functions. Clang cannot identify whether the function is going to be used 
> > > on the device or not during sema analysis.
> >
> >
> > You are right, we can't do this during device compilation because we don't 
> > have an AST before Sema.
> >
> > However I'm currently thinking about the following:
> >
> > 1. Identify explicit and implicit `declare target` functions during host 
> > Sema and CodeGen.
> > 2. Attach meta-data for all of them to LLVM IR `.bc` which is passed via  
> > `-fopenmp-host-ir-file-path`. I think we already do something similar for 
> > outlined `target` regions?
> > 3. During device Sema query that meta-data so Clang knows when a function 
> > will be called from within a `target` region. Skip analysis of functions 
> > that are not needed for the device, just as CUDA does.
> > 4. Check that we don't need functions that weren't marked in 2. That's to 
> > catch users doing something like: ```lang=c #pragma omp target { #ifdef 
> > __NVPTX__ target_func() #endif } ```
> >
> >   For now that's just an idea, I didn't start implementing any of this yet. 
> > Do you think that could work?
>
>
> I thought about this approach already. But it won't work in general. The main 
> problem here is that host and device compilation phases may end up with the 
> different set of implicit declare target functions. The main problem here not 
> the user code, but the system libraries, which may use the different set of 
> functions.


How common is that for functions that are used in `target` regions? In the 
worst case we can make my fourth point a warning and lose Sema checking for 
those functions.

> Another one problem here is that the user may use the function that has some 
> host assembler inside. In this case we still need to emit error message, 
> otherwise, we may end up with the compiler crash.

Once we know which functions are used, they can be checked as usual.

> The best solution is to use only device specific header files. Device 
> compilation phase should use system header files for the host at all.

You mean "shouldn't use system header files for the host"? I think that may be 
hard to achieve, especially if we want to Sema check all of the source code 
during device compilation.


Repository:
  rC Clang

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-17 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1203991, @ABataev wrote:

> In https://reviews.llvm.org/D50845#1203973, @Hahnfeld wrote:
>
> > In https://reviews.llvm.org/D50845#1203967, @ABataev wrote:
> >
> > > I thought about this approach already. But it won't work in general. The 
> > > main problem here is that host and device compilation phases may end up 
> > > with the different set of implicit declare target functions. The main 
> > > problem here not the user code, but the system libraries, which may use 
> > > the different set of functions.
> >
> >
> > How common is that for functions that are used in `target` regions? In the 
> > worst case we can make my fourth point a warning and lose Sema checking for 
> > those functions.
>
>
> It does not matter how common is it or not. If the bad situation can happen, 
> it will happen.
>  Warning won't work here, because, again, you may end up with the code that 
> may cause compiler crash for the device. For example, if the system function 
> uses throw/catch stmts, we may emit the warning for this function, but will 
> have troubles during the codegen.


Right, warning wasn't a good thought. We really want strict checking and would 
have to error out when we find a function that wasn't implicitly `declare 
target` on the host.
I meant to ask how common that would be? If that's only some known functions we 
could handle them separately.

>>> The best solution is to use only device specific header files. Device 
>>> compilation phase should use system header files for the host at all.
>> 
>> You mean "shouldn't use system header files for the host"? I think that may 
>> be hard to achieve, especially if we want to Sema check all of the source 
>> code during device compilation.
> 
> Yes, I mean should not. Yes, this is hard to achieve but that's the only 
> complete and correct solution. Everything else looks like a non-stable hack.

How do you propose to handle inline assembly in non-system header files?


Repository:
  rC Clang

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-17 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1204210, @ABataev wrote:

> > Right, warning wasn't a good thought. We really want strict checking and 
> > would have to error out when we find a function that wasn't implicitly 
> > `declare target` on the host.
> >  I meant to ask how common that would be? If that's only some known 
> > functions we could handle them separately.
>
> Again, it does not matter how common is this situation. We cannot rely on the 
> probability here, we need to make the compiler to work correctly in all 
> possible situation, no matter how often they can occur.


Got that, I agree on the conservative approach: If we find a function to be 
called that wasn't checked (because it wasn't implicitly `declare target` on 
the host) the compiler can error out. That should be correct in all cases, 
shouldn't it?

There's a trade-off here:

- How many TUs pass full analysis and how many don't? (today's situation; we 
know that some headers don't work)
- How many TUs pass when we only check called functions (and error if we call 
non-checked ones) and how many regress compared to today's situation?

If the number of regressions is zero for all practical situations but we can 
compile some important cases, that should be a win.

> The best solution is to use only device specific header files. Device 
> compilation phase should use system header files for the host at all.
 
 You mean "shouldn't use system header files for the host"? I think that 
 may be hard to achieve, especially if we want to Sema check all of the 
 source code during device compilation.
>>> 
>>> Yes, I mean should not. Yes, this is hard to achieve but that's the only 
>>> complete and correct solution. Everything else looks like a non-stable hack.
>> 
>> How do you propose to handle inline assembly in non-system header files?
> 
> Just like as usual - if the assembler is supported by the device - it is ok, 
> otherwise - error message.

Even if the function is never called? That would mean you can't include any 
`Eigen` header...


Repository:
  rC Clang

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-17 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1202540, @ABataev wrote:

> Maybe for device compilation we also should define `__NO_MATH_INLINES` and 
> `__NO_STRING_INLINES` macros to disable inline assembly in glibc?


Coming back to this original question:

- I just searched the headers on CentOS and Arch Linux and all cases 
considering these macros are guarded by `ifndef __x86_64__` which this patch 
still propagates for device compilation.
- From the CentOS package for PPC64LE it looks like the only affected case is 
in `bits/fenvinline.h` which defines the macros `fegetround()`, 
`feraiseexcept(__excepts)`, and `feclearexcept(__excepts)`. All matches in 
`bits/mathinline.h` are guarded by `ifndef __powerpc64__` or don't use inline 
assembly which should be fine.

As I'm not primarily developing on Power (and can't test such change), I'd ask 
you to create a patch adding these macros.


Repository:
  rC Clang

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-17 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1204340, @ABataev wrote:

> In https://reviews.llvm.org/D50845#1204216, @Hahnfeld wrote:
>
> > Got that, I agree on the conservative approach: If we find a function to be 
> > called that wasn't checked (because it wasn't implicitly `declare target` 
> > on the host) the compiler can error out. That should be correct in all 
> > cases, shouldn't it?
> >
> > There's a trade-off here:
> >
> > - How many TUs pass full analysis and how many don't? (today's situation; 
> > we know that some headers don't work)
> > - How many TUs pass when we only check called functions (and error if we 
> > call non-checked ones) and how many regress compared to today's situation? 
> > If the number of regressions is zero for all practical situations but we 
> > can compile some important cases, that should be a win.
>
>
> I need to think about it. We need to estimate all pros and cons here. It 
> might work.


I'll try to put together a protoype so that we can actually test.


Repository:
  rC Clang

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D46540: [X86] ptwrite intrinsic

2018-05-08 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

Could you maybe add some short summaries to your patches? It's hard for 
non-Intel employees to guess what all these instructions do...


https://reviews.llvm.org/D46540



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D46540: [X86] ptwrite intrinsic

2018-05-10 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D46540#1092620, @GBuella wrote:

> In https://reviews.llvm.org/D46540#1091625, @Hahnfeld wrote:
>
> > Could you maybe add some short summaries to your patches? It's hard for 
> > non-Intel employees to guess what all these instructions do...
>
>
> Well, I was thinking I could copy-paste this from 
> https://software.intel.com/en-us/articles/intel-sdm :
>  "This instruction reads data in the source operand and sends it to the Intel 
> Processor Trace hardware to be encoded
>  in a PTW packet if TriggerEn, ContextEn, FilterEn, and PTWEn are all set to 
> 1. For more details on these values, see
>  Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C, 
> Section 35.2.2, “Software Trace
>  Instrumentation with PTWRITE”."
>
> Do you think this would really help anyone? It appears to be just meaningless 
> without larger context.
>  Those who ever need this, need to read a lot of these manuals anyways, I 
> think noone in practice is going to be enlightened by such a short 
> description.
>
> That of course makes a lot more sense with simpler instructions, e.g. 
> movdir64b - I can just describe that as something like "atomically moving 64 
> bytes".


My 2 cents: I actually think this is worth a bit because it gives additional 
information so the reader can at least put the instruction into a category.


Repository:
  rC Clang

https://reviews.llvm.org/D46540



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47070: [CUDA] Upgrade linked bitcode to enable inlining

2018-05-18 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.
Hahnfeld added reviewers: tra, jlebar.
Herald added a subscriber: cfe-commits.

Revision https://reviews.llvm.org/rC329829 added the architecture to 
"target-features". This
prevents inlining of previously generated bitcode because the
feature sets don't match. Thus duplicate the information from
"target-cpu" to avoid writing special cases in the analysis.

I'm not sure if that will save us in the long term because inlining
will break again when we add new features. Additionally, using later
CUDA versions might raise the PTX version which is also a feature...


Repository:
  rC Clang

https://reviews.llvm.org/D47070

Files:
  lib/CodeGen/CGCall.cpp
  test/CodeGenCUDA/Inputs/device-code-2.ll
  test/CodeGenCUDA/Inputs/device-code.ll
  test/CodeGenCUDA/link-device-bitcode.cu

Index: test/CodeGenCUDA/link-device-bitcode.cu
===
--- test/CodeGenCUDA/link-device-bitcode.cu
+++ test/CodeGenCUDA/link-device-bitcode.cu
@@ -56,15 +56,24 @@
 // Make sure device_mul_or_add() is present in IR, is internal and
 // calls __nvvm_reflect().
 // CHECK-IR-LABEL: define internal float @_Z17device_mul_or_addff(
+// CHECK-IR-SAME: [[MUL_OR_ADD:#[0-9]+]] {
 // CHECK-IR-NLD-LABEL: define float @_Z17device_mul_or_addff(
 // CHECK-IR: call i32 @__nvvm_reflect
 // CHECK-IR: ret float
 
 // Make sure we've linked in and internalized only needed functions
 // from the second bitcode file.
 // CHECK-IR-2-LABEL: define internal double @__nv_sin
+// CHECK-IR-2-SAME: [[IR2ATTR:#[0-9]+]] {
 // CHECK-IR-2-LABEL: define internal double @__nv_exp
+// CHECK-IR-2-SAME: [[IR2ATTR]] {
 // CHECK-IR-2-NOT: double @__unused
 
+// CHECK-IR: attributes [[MUL_OR_ADD]] = {
+// CHECK-IR-SAME: "target-features"="+ptx42,+sm_35"
+
+// CHECK-IR-2: attributes [[IR2ATTR]] = {
+// CHECK-IR-2-SAME: "target-features"="+sm_35"
+
 // Verify that NVVMReflect pass is among the passes run by NVPTX back-end.
 // CHECK-REFLECT: Replace occurrences of __nvvm_reflect() calls with 0/1
Index: test/CodeGenCUDA/Inputs/device-code.ll
===
--- test/CodeGenCUDA/Inputs/device-code.ll
+++ test/CodeGenCUDA/Inputs/device-code.ll
@@ -16,7 +16,7 @@
ret void
 }
 
-define float @_Z17device_mul_or_addff(float %a, float %b) {
+define float @_Z17device_mul_or_addff(float %a, float %b) #0 {
   %reflect = call i32 @__nvvm_reflect(i8* addrspacecast (i8 addrspace(1)* getelementptr inbounds ([8 x i8], [8 x i8] addrspace(1)* @"$str", i32 0, i32 0) to i8*))
   %cmp = icmp ne i32 %reflect, 0
   br i1 %cmp, label %use_mul, label %use_add
@@ -36,3 +36,5 @@
 
   ret float %ret
 }
+
+attributes #0 = { "target-cpu"="sm_35" "target-features"="+ptx42" }
Index: test/CodeGenCUDA/Inputs/device-code-2.ll
===
--- test/CodeGenCUDA/Inputs/device-code-2.ll
+++ test/CodeGenCUDA/Inputs/device-code-2.ll
@@ -2,15 +2,16 @@
 
 target triple = "nvptx-unknown-cuda"
 
-define double @__nv_sin(double %a) {
+define double @__nv_sin(double %a) #0 {
ret double 1.0
 }
 
-define double @__nv_exp(double %a) {
+define double @__nv_exp(double %a) #0 {
ret double 3.0
 }
 
 define double @__unused(double %a) {
ret double 2.0
 }
 
+attributes #0 = { "target-cpu"="sm_35" }
Index: lib/CodeGen/CGCall.cpp
===
--- lib/CodeGen/CGCall.cpp
+++ lib/CodeGen/CGCall.cpp
@@ -1790,12 +1790,45 @@
   }
 }
 
+static bool hasTargetFeature(llvm::StringRef FeatureList,
+ llvm::StringRef Feature) {
+  StringRef Rest = FeatureList;
+  while (!Rest.empty()) {
+auto Split = Rest.split(',');
+if (Split.first == Feature)
+  return true;
+Rest = Split.second;
+  }
+
+  return false;
+}
+
 void CodeGenModule::AddDefaultFnAttrs(llvm::Function &F) {
   llvm::AttrBuilder FuncAttrs;
   ConstructDefaultFnAttrList(F.getName(),
  F.hasFnAttribute(llvm::Attribute::OptimizeNone),
  /* AttrOnCallsite = */ false, FuncAttrs);
   F.addAttributes(llvm::AttributeList::FunctionIndex, FuncAttrs);
+
+  if (getTriple().isNVPTX()) {
+// Revision 329829 added the architecture as a "target-feature". Duplicate
+// this information from "target-cpu" to maintain the ability to inline
+// functions from bitcode files compiled with older versions of LLVM/Clang.
+auto TargetCpu = F.getFnAttribute("target-cpu");
+if (TargetCpu.isStringAttribute()) {
+  llvm::StringRef CpuAttr = TargetCpu.getValueAsString();
+
+  auto TargetFeatures = F.getFnAttribute("target-features");
+  if (TargetFeatures.isStringAttribute()) {
+llvm::StringRef FeatureList = TargetFeatures.getValueAsString();
+if (!hasTargetFeature(FeatureList, CpuAttr.str())) {
+  F.addFnAttr("target-features", (FeatureList + ",+" + CpuAttr).str());
+}
+  } 

[PATCH] D47070: [CUDA] Upgrade linked bitcode to enable inlining

2018-05-18 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

I think that's intended because the generated code might use instructions based 
on that feature. If we want to ignore that, we could override 
`TargetTransformInfo::areInlineCompatible` for NVPTX to only compare 
`target-cpu`


Repository:
  rC Clang

https://reviews.llvm.org/D47070



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47070: [CUDA] Upgrade linked bitcode to enable inlining

2018-05-19 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added subscribers: chandlerc, ahatanak.
Hahnfeld added a comment.

Looks like this was added as a "temporary solution" in 
https://reviews.llvm.org/D8984. Meanwhile the attribute whitelist was merged 
half a year later (https://reviews.llvm.org/D7802), so maybe we can just get 
rid of comparing `target-cpu` and `target-features` entirely?


Repository:
  rC Clang

https://reviews.llvm.org/D47070



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47200: [Sema] Add tests for weak functions

2018-05-22 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.
Hahnfeld added reviewers: aaron.ballman, rjmccall.
Herald added a subscriber: cfe-commits.

I found these checks to be missing, just add some simple cases.


Repository:
  rC Clang

https://reviews.llvm.org/D47200

Files:
  test/Sema/attr-weak.c


Index: test/Sema/attr-weak.c
===
--- test/Sema/attr-weak.c
+++ test/Sema/attr-weak.c
@@ -1,7 +1,9 @@
 // RUN: %clang_cc1 -verify -fsyntax-only %s
 
+extern int f0() __attribute__((weak));
 extern int g0 __attribute__((weak));
 extern int g1 __attribute__((weak_import));
+int f2() __attribute__((weak));
 int g2 __attribute__((weak));
 int g3 __attribute__((weak_import)); // expected-warning {{'weak_import' 
attribute cannot be specified on a definition}}
 int __attribute__((weak_import)) g4(void);
@@ -11,6 +13,7 @@
 struct __attribute__((weak)) s0 {}; // expected-warning {{'weak' attribute 
only applies to variables, functions, and classes}}
 struct __attribute__((weak_import)) s1 {}; // expected-warning {{'weak_import' 
attribute only applies to variables and functions}}
 
+static int f() __attribute__((weak)); // expected-error {{weak declaration 
cannot have internal linkage}}
 static int x __attribute__((weak)); // expected-error {{weak declaration 
cannot have internal linkage}}
 
 // rdar://9538608


Index: test/Sema/attr-weak.c
===
--- test/Sema/attr-weak.c
+++ test/Sema/attr-weak.c
@@ -1,7 +1,9 @@
 // RUN: %clang_cc1 -verify -fsyntax-only %s
 
+extern int f0() __attribute__((weak));
 extern int g0 __attribute__((weak));
 extern int g1 __attribute__((weak_import));
+int f2() __attribute__((weak));
 int g2 __attribute__((weak));
 int g3 __attribute__((weak_import)); // expected-warning {{'weak_import' attribute cannot be specified on a definition}}
 int __attribute__((weak_import)) g4(void);
@@ -11,6 +13,7 @@
 struct __attribute__((weak)) s0 {}; // expected-warning {{'weak' attribute only applies to variables, functions, and classes}}
 struct __attribute__((weak_import)) s1 {}; // expected-warning {{'weak_import' attribute only applies to variables and functions}}
 
+static int f() __attribute__((weak)); // expected-error {{weak declaration cannot have internal linkage}}
 static int x __attribute__((weak)); // expected-error {{weak declaration cannot have internal linkage}}
 
 // rdar://9538608
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47201: [CUDA] Implement nv_weak attribute for functions

2018-05-22 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.
Hahnfeld added a reviewer: tra.
Herald added a subscriber: cfe-commits.

This is needed for relocatable device code with CUDA 9 and later.
Before this patch, linking two or more object files resulted in
"Multiple definition" errors for a group of functions from
cuda_device_runtime_api.h which are annoted with "nv_weak".

CUDA headers already used this attribute in earlier releases, but
until CUDA 8.0 the only definitions in cuda_device_runtime_api.h
were conditional under `defined(__CUDABE__)` which is explicitly
undefined in Clang's wrapper. However since CUDA 9.0 this has
changed to `!defined(__CUDACC_RTC__)`. Trying to add that define
resulted in errors that nvrtc_device_runtime.h could not be found.

Reported by Andrea Bocci!


Repository:
  rC Clang

https://reviews.llvm.org/D47201

Files:
  include/clang/Basic/Attr.td
  include/clang/Basic/DiagnosticSemaKinds.td
  lib/CodeGen/CodeGenModule.cpp
  lib/Sema/SemaDecl.cpp
  lib/Sema/SemaDeclAttr.cpp
  test/CodeGenCUDA/nv_weak.cu
  test/SemaCUDA/attr-declspec.cu
  test/SemaCUDA/attr-nv_weak.cu
  test/SemaCUDA/attributes-on-non-cuda.cu

Index: test/SemaCUDA/attributes-on-non-cuda.cu
===
--- test/SemaCUDA/attributes-on-non-cuda.cu
+++ test/SemaCUDA/attributes-on-non-cuda.cu
@@ -7,11 +7,12 @@
 // RUN: %clang_cc1 -DEXPECT_WARNINGS -fsyntax-only -verify -x c %s
 
 #if defined(EXPECT_WARNINGS)
-// expected-warning@+12 {{'device' attribute ignored}}
-// expected-warning@+12 {{'global' attribute ignored}}
-// expected-warning@+12 {{'constant' attribute ignored}}
-// expected-warning@+12 {{'shared' attribute ignored}}
-// expected-warning@+12 {{'host' attribute ignored}}
+// expected-warning@+13 {{'device' attribute ignored}}
+// expected-warning@+13 {{'global' attribute ignored}}
+// expected-warning@+13 {{'constant' attribute ignored}}
+// expected-warning@+13 {{'shared' attribute ignored}}
+// expected-warning@+13 {{'host' attribute ignored}}
+// expected-warning@+13 {{'nv_weak' attribute ignored}}
 //
 // NOTE: IgnoredAttr in clang which is used for the rest of
 // attributes ignores LangOpts, so there are no warnings.
@@ -24,11 +25,11 @@
 __attribute__((constant)) int* g_constant;
 __attribute__((shared)) float *g_shared;
 __attribute__((host)) void f_host();
+__attribute__((nv_weak)) void f_nv_weak();
 __attribute__((device_builtin)) void f_device_builtin();
 typedef __attribute__((device_builtin)) const void *t_device_builtin;
 enum __attribute__((device_builtin)) e_device_builtin {E};
 __attribute__((device_builtin)) int v_device_builtin;
 __attribute__((cudart_builtin)) void f_cudart_builtin();
-__attribute__((nv_weak)) void f_nv_weak();
 __attribute__((device_builtin_surface_type)) unsigned long long surface_var;
 __attribute__((device_builtin_texture_type)) unsigned long long texture_var;
Index: test/SemaCUDA/attr-nv_weak.cu
===
--- /dev/null
+++ test/SemaCUDA/attr-nv_weak.cu
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -verify -fsyntax-only %s
+
+extern int f0() __attribute__((nv_weak));
+extern int g0 __attribute__((nv_weak)); // expected-warning {{'nv_weak' attribute only applies to functions}}
+int f1() __attribute__((nv_weak));
+int g1 __attribute__((nv_weak)); // expected-warning {{'nv_weak' attribute only applies to functions}}
+
+
+struct __attribute__((nv_weak)) s0 {}; // expected-warning {{'nv_weak' attribute only applies to functions}}
+
+static int f() __attribute__((nv_weak)); // expected-error {{nv_weak declaration cannot have internal linkage}}
+
+static void pr14946_f();
+void pr14946_f() __attribute__((nv_weak)); // expected-error {{nv_weak declaration cannot have internal linkage}}
Index: test/SemaCUDA/attr-declspec.cu
===
--- test/SemaCUDA/attr-declspec.cu
+++ test/SemaCUDA/attr-declspec.cu
@@ -6,11 +6,12 @@
 // RUN: %clang_cc1 -DEXPECT_WARNINGS -fms-extensions -fsyntax-only -verify -x c %s
 
 #if defined(EXPECT_WARNINGS)
-// expected-warning@+12 {{'__device__' attribute ignored}}
-// expected-warning@+12 {{'__global__' attribute ignored}}
-// expected-warning@+12 {{'__constant__' attribute ignored}}
-// expected-warning@+12 {{'__shared__' attribute ignored}}
-// expected-warning@+12 {{'__host__' attribute ignored}}
+// expected-warning@+13 {{'__device__' attribute ignored}}
+// expected-warning@+13 {{'__global__' attribute ignored}}
+// expected-warning@+13 {{'__constant__' attribute ignored}}
+// expected-warning@+13 {{'__shared__' attribute ignored}}
+// expected-warning@+13 {{'__host__' attribute ignored}}
+// expected-warning@+13 {{'__nv_weak__' attribute ignored}}
 //
 // (Currently we don't for the other attributes. They are implemented with
 // IgnoredAttr, which is ignored irrespective of any LangOpts.)
@@ -23,12 +24,11 @@
 __declspec(__constant__) int* g_constant;
 __declspec(__shared__) float *g_shared;
 __declsp

[PATCH] D47200: [Sema] Add tests for weak functions

2018-05-25 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL333283: [Sema] Add tests for weak functions (authored by 
Hahnfeld, committed by ).
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D47200?vs=148021&id=148616#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D47200

Files:
  cfe/trunk/test/Sema/attr-weak.c


Index: cfe/trunk/test/Sema/attr-weak.c
===
--- cfe/trunk/test/Sema/attr-weak.c
+++ cfe/trunk/test/Sema/attr-weak.c
@@ -1,7 +1,9 @@
 // RUN: %clang_cc1 -verify -fsyntax-only %s
 
+extern int f0() __attribute__((weak));
 extern int g0 __attribute__((weak));
 extern int g1 __attribute__((weak_import));
+int f2() __attribute__((weak));
 int g2 __attribute__((weak));
 int g3 __attribute__((weak_import)); // expected-warning {{'weak_import' 
attribute cannot be specified on a definition}}
 int __attribute__((weak_import)) g4(void);
@@ -11,6 +13,7 @@
 struct __attribute__((weak)) s0 {}; // expected-warning {{'weak' attribute 
only applies to variables, functions, and classes}}
 struct __attribute__((weak_import)) s1 {}; // expected-warning {{'weak_import' 
attribute only applies to variables and functions}}
 
+static int f() __attribute__((weak)); // expected-error {{weak declaration 
cannot have internal linkage}}
 static int x __attribute__((weak)); // expected-error {{weak declaration 
cannot have internal linkage}}
 
 // rdar://9538608


Index: cfe/trunk/test/Sema/attr-weak.c
===
--- cfe/trunk/test/Sema/attr-weak.c
+++ cfe/trunk/test/Sema/attr-weak.c
@@ -1,7 +1,9 @@
 // RUN: %clang_cc1 -verify -fsyntax-only %s
 
+extern int f0() __attribute__((weak));
 extern int g0 __attribute__((weak));
 extern int g1 __attribute__((weak_import));
+int f2() __attribute__((weak));
 int g2 __attribute__((weak));
 int g3 __attribute__((weak_import)); // expected-warning {{'weak_import' attribute cannot be specified on a definition}}
 int __attribute__((weak_import)) g4(void);
@@ -11,6 +13,7 @@
 struct __attribute__((weak)) s0 {}; // expected-warning {{'weak' attribute only applies to variables, functions, and classes}}
 struct __attribute__((weak_import)) s1 {}; // expected-warning {{'weak_import' attribute only applies to variables and functions}}
 
+static int f() __attribute__((weak)); // expected-error {{weak declaration cannot have internal linkage}}
 static int x __attribute__((weak)); // expected-error {{weak declaration cannot have internal linkage}}
 
 // rdar://9538608
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

2018-05-29 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D47394#1115086, @tra wrote:

> On one hand I can see how being able to treat GPU-side binaries as any other 
> host files is convenient. On the other hand, this convenience comes with the 
> price of targeting only NVPTX. This seems contrary to OpenMP's goal of 
> supporting many different kinds of accelerators. I'm not sure what's the 
> consensus in the OpenMP community these days, but I vaguely recall that 
> generic bundling/unbundling was explicitly chosen over vendor-specific 
> encapsulation in host .o when the bundling was implemented. If the underlying 
> reasons have changed since then it would be great to hear more details about 
> that.


I second this statement, static linking might come handy for all targets and 
Clang should try to avoid vendor specific solutions as much as possible.

In a discussion off-list I proposed adding constructor functions to all object 
files and handle them like shared libraries are already handled today (ie 
register separately and let the runtime figure out how to relocate symbols in 
different translation units). I don't have an implementation of that approach 
so I can't claim that it works and doesn't have a huge performance impact 
(which we don't want either), but it should be agnostic of the offloading 
target so it may be worth investigating.


Repository:
  rC Clang

https://reviews.llvm.org/D47394



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38257: [OpenMP] Fix memory leak when translating arguments

2017-09-25 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.

Parsing the argument after -Xopenmp-target allocates memory that needs
to be freed. Associate it with the final DerivedArgList after we know
which one will be used.


https://reviews.llvm.org/D38257

Files:
  include/clang/Driver/ToolChain.h
  lib/Driver/Compilation.cpp
  lib/Driver/ToolChain.cpp
  test/Driver/openmp-offload-gpu.c


Index: test/Driver/openmp-offload-gpu.c
===
--- test/Driver/openmp-offload-gpu.c
+++ test/Driver/openmp-offload-gpu.c
@@ -2,9 +2,6 @@
 /// Perform several driver tests for OpenMP offloading
 ///
 
-// Until this test is stabilized on all local configurations.
-// UNSUPPORTED: linux
-
 // REQUIRES: clang-driver
 // REQUIRES: x86-registered-target
 // REQUIRES: powerpc-registered-target
Index: lib/Driver/ToolChain.cpp
===
--- lib/Driver/ToolChain.cpp
+++ lib/Driver/ToolChain.cpp
@@ -800,9 +800,10 @@
   return VersionTuple();
 }
 
-llvm::opt::DerivedArgList *
-ToolChain::TranslateOpenMPTargetArgs(const llvm::opt::DerivedArgList &Args,
-Action::OffloadKind DeviceOffloadKind) const {
+llvm::opt::DerivedArgList *ToolChain::TranslateOpenMPTargetArgs(
+const llvm::opt::DerivedArgList &Args,
+Action::OffloadKind DeviceOffloadKind,
+SmallVector &AllocatedArgs) const {
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
 const OptTable &Opts = getDriver().getOpts();
@@ -854,6 +855,7 @@
   }
   XOpenMPTargetArg->setBaseArg(A);
   A = XOpenMPTargetArg.release();
+  AllocatedArgs.push_back(A);
   DAL->append(A);
   NewArgAdded = true;
 }
Index: lib/Driver/Compilation.cpp
===
--- lib/Driver/Compilation.cpp
+++ lib/Driver/Compilation.cpp
@@ -51,9 +51,10 @@
 
   DerivedArgList *&Entry = TCArgs[{TC, BoundArch, DeviceOffloadKind}];
   if (!Entry) {
+SmallVector AllocatedArgs;
 // Translate OpenMP toolchain arguments provided via the -Xopenmp-target 
flags.
-DerivedArgList *OpenMPArgs = TC->TranslateOpenMPTargetArgs(*TranslatedArgs,
-DeviceOffloadKind);
+DerivedArgList *OpenMPArgs = TC->TranslateOpenMPTargetArgs(
+*TranslatedArgs, DeviceOffloadKind, AllocatedArgs);
 if (!OpenMPArgs) {
   Entry = TC->TranslateArgs(*TranslatedArgs, BoundArch, DeviceOffloadKind);
 } else {
@@ -63,6 +64,11 @@
 
 if (!Entry)
   Entry = TranslatedArgs;
+
+// Add allocated arguments to the final DAL.
+for (auto ArgPtr : AllocatedArgs) {
+  Entry->AddSynthesizedArg(ArgPtr);
+}
   }
 
   return *Entry;
Index: include/clang/Driver/ToolChain.h
===
--- include/clang/Driver/ToolChain.h
+++ include/clang/Driver/ToolChain.h
@@ -249,9 +249,10 @@
   ///
   /// \param DeviceOffloadKind - The device offload kind used for the
   /// translation.
-  virtual llvm::opt::DerivedArgList *
-  TranslateOpenMPTargetArgs(const llvm::opt::DerivedArgList &Args,
-  Action::OffloadKind DeviceOffloadKind) const;
+  virtual llvm::opt::DerivedArgList *TranslateOpenMPTargetArgs(
+  const llvm::opt::DerivedArgList &Args,
+  Action::OffloadKind DeviceOffloadKind,
+  SmallVector &AllocatedArgs) const;
 
   /// Choose a tool to use to handle the action \p JA.
   ///


Index: test/Driver/openmp-offload-gpu.c
===
--- test/Driver/openmp-offload-gpu.c
+++ test/Driver/openmp-offload-gpu.c
@@ -2,9 +2,6 @@
 /// Perform several driver tests for OpenMP offloading
 ///
 
-// Until this test is stabilized on all local configurations.
-// UNSUPPORTED: linux
-
 // REQUIRES: clang-driver
 // REQUIRES: x86-registered-target
 // REQUIRES: powerpc-registered-target
Index: lib/Driver/ToolChain.cpp
===
--- lib/Driver/ToolChain.cpp
+++ lib/Driver/ToolChain.cpp
@@ -800,9 +800,10 @@
   return VersionTuple();
 }
 
-llvm::opt::DerivedArgList *
-ToolChain::TranslateOpenMPTargetArgs(const llvm::opt::DerivedArgList &Args,
-Action::OffloadKind DeviceOffloadKind) const {
+llvm::opt::DerivedArgList *ToolChain::TranslateOpenMPTargetArgs(
+const llvm::opt::DerivedArgList &Args,
+Action::OffloadKind DeviceOffloadKind,
+SmallVector &AllocatedArgs) const {
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
 const OptTable &Opts = getDriver().getOpts();
@@ -854,6 +855,7 @@
   }
   XOpenMPTargetArg->setBaseArg(A);
   A = XOpenMPTargetArg.release();
+  AllocatedArgs.push_back(A);
   DAL->append(A);
   NewArgAdded = true;
 }
Index: lib/Driver/Compilation.cpp
===
--- lib/Driver/Compilation.cpp
+++ lib/Driver/Compilat

[PATCH] D38258: [OpenMP] Fix passing of -m arguments to device toolchain

2017-09-25 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.

AuxTriple is not set if host and device share a toolchain. Also,
removing an argument modifies the DAL which needs to be returned
for future use.
(Move tests back to offload-openmp.c as they are not related to GPUs.)


https://reviews.llvm.org/D38258

Files:
  lib/Driver/ToolChain.cpp
  test/Driver/openmp-offload-gpu.c
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -39,6 +39,54 @@
 
 /// ###
 
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -mcpu=pwr7 is passed when compiling for the device.
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -mcpu=pwr7 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} "-target-cpu" "pwr7" {{.*}}"-fopenmp-is-device"
+
+/// ###
+
+/// Check -Xopenmp-target -mcpu=pwr7 is passed when compiling for the device.
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -mcpu=pwr7 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
+
+// CHK-FOPENMP-TARGET: clang{{.*}} "-target-cpu" "pwr7" {{.*}}"-fopenmp-is-device"
+
+/// ##
+
+/// Check -mcpu=pwr7 is passed to the same triple.
+// RUN:%clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -target powerpc64le-ibm-linux-gnu -mcpu=pwr7 %s 2>&1 \
+// RUN:| FileCheck -check-prefix=CHK-FOPENMP-MCPU-TO-SAME-TRIPLE %s
+
+// CHK-FOPENMP-MCPU-TO-SAME-TRIPLE: clang{{.*}} "-target-cpu" "pwr7" {{.*}}"-fopenmp-is-device"
+
+/// ##
+
+/// Check -march=pwr7 is NOT passed to nvptx64-nvidia-cuda.
+// RUN:%clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -march=pwr7 %s 2>&1 \
+// RUN:| FileCheck -check-prefix=CHK-FOPENMP-MARCH-TO-GPU %s
+
+// CHK-FOPENMP-MARCH-TO-GPU-NOT: clang{{.*}} "-target-cpu" "pwr7" {{.*}}"-fopenmp-is-device"
+
+/// ###
+
+/// Check -Xopenmp-target triggers error when multiple triples are used.
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu,powerpc64le-unknown-linux-gnu -Xopenmp-target -mcpu=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-AMBIGUOUS-ERROR %s
+
+// CHK-FOPENMP-TARGET-AMBIGUOUS-ERROR: clang{{.*}} error: cannot deduce implicit triple value for -Xopenmp-target, specify triple using -Xopenmp-target=
+
+/// ###
+
+/// Check -Xopenmp-target triggers error when an option requiring arguments is passed to it.
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -Xopenmp-target -mcpu=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-NESTED-ERROR %s
+
+// CHK-FOPENMP-TARGET-NESTED-ERROR: clang{{.*}} error: invalid -Xopenmp-target argument: '-Xopenmp-target -Xopenmp-target', options requiring arguments are unsupported
+
+/// ###
+
 /// Check the phases graph when using a single target, different from the host.
 /// We should have an offload action joining the host compile and device
 /// preprocessor and another one joining the device linking outputs to the host
Index: test/Driver/openmp-offload-gpu.c
===
--- test/Driver/openmp-offload-gpu.c
+++ test/Driver/openmp-offload-gpu.c
@@ -9,38 +9,6 @@
 
 /// ###
 
-/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr7 is passed when compiling for the device.
-// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -mcpu=pwr7 %s 2>&1 \
-// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
-
-// CHK-FOPENMP-EQ-TARGET: clang{{.*}} "-target-cpu" "pwr7"
-
-/// ###
-
-/// Check -Xopenmp-target -march=pwr7 is passed when compiling for the device.
-// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -mcpu=pwr7 %s 2>&1 \
-// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
-
-// CHK-FOPENMP-TARGET: clang{{.*}} "-target-cpu" "pw

[PATCH] D38259: [OpenMP] Fix translation of target args

2017-09-25 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.

ToolChain::TranslateArgs() returns nullptr if no changes are performed.
This would currently mean that OpenMPArgs are lost. Patch fixes this
by falling back to simply using OpenMPArgs in that case.


https://reviews.llvm.org/D38259

Files:
  lib/Driver/Compilation.cpp


Index: lib/Driver/Compilation.cpp
===
--- lib/Driver/Compilation.cpp
+++ lib/Driver/Compilation.cpp
@@ -57,14 +57,16 @@
 *TranslatedArgs, DeviceOffloadKind, AllocatedArgs);
 if (!OpenMPArgs) {
   Entry = TC->TranslateArgs(*TranslatedArgs, BoundArch, DeviceOffloadKind);
+  if (!Entry)
+Entry = TranslatedArgs;
 } else {
   Entry = TC->TranslateArgs(*OpenMPArgs, BoundArch, DeviceOffloadKind);
-  delete OpenMPArgs;
+  if (!Entry)
+Entry = OpenMPArgs;
+  else
+delete OpenMPArgs;
 }
 
-if (!Entry)
-  Entry = TranslatedArgs;
-
 // Add allocated arguments to the final DAL.
 for (auto ArgPtr : AllocatedArgs) {
   Entry->AddSynthesizedArg(ArgPtr);


Index: lib/Driver/Compilation.cpp
===
--- lib/Driver/Compilation.cpp
+++ lib/Driver/Compilation.cpp
@@ -57,14 +57,16 @@
 *TranslatedArgs, DeviceOffloadKind, AllocatedArgs);
 if (!OpenMPArgs) {
   Entry = TC->TranslateArgs(*TranslatedArgs, BoundArch, DeviceOffloadKind);
+  if (!Entry)
+Entry = TranslatedArgs;
 } else {
   Entry = TC->TranslateArgs(*OpenMPArgs, BoundArch, DeviceOffloadKind);
-  delete OpenMPArgs;
+  if (!Entry)
+Entry = OpenMPArgs;
+  else
+delete OpenMPArgs;
 }
 
-if (!Entry)
-  Entry = TranslatedArgs;
-
 // Add allocated arguments to the final DAL.
 for (auto ArgPtr : AllocatedArgs) {
   Entry->AddSynthesizedArg(ArgPtr);
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38277: [compiler-rt}[CMake] Fix configuration on PowerPC with sanitizers

2017-09-26 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.
Herald added subscribers: mgorny, dberris, nemanjai.

TEST_BIG_ENDIAN() performs compile tests that will fail with
-nodefaultlibs when building under LLVM_USE_SANITIZER.


https://reviews.llvm.org/D38277

Files:
  cmake/base-config-ix.cmake


Index: cmake/base-config-ix.cmake
===
--- cmake/base-config-ix.cmake
+++ cmake/base-config-ix.cmake
@@ -148,7 +148,14 @@
 endif()
   endif()
 elseif("${COMPILER_RT_DEFAULT_TARGET_ARCH}" MATCHES "powerpc")
+  # Strip out -nodefaultlibs when calling TEST_BIG_ENDIAN. Configuration
+  # will fail with this option when building with a sanitizer.
+  set(OLD_CMAKE_REQUIRED_FLAGS ${CMAKE_REQUIRED_FLAGS})
+  string(REPLACE "-nodefaultlibs" "" CMAKE_REQUIRED_FLAGS 
${OLD_CMAKE_REQUIRED_FLAGS})
   TEST_BIG_ENDIAN(HOST_IS_BIG_ENDIAN)
+  # Undo the change.
+  set(CMAKE_REQUIRED_FLAGS "${OLD_CMAKE_REQUIRED_FLAGS}")
+
   if(HOST_IS_BIG_ENDIAN)
 test_target_arch(powerpc64 "" "-m64")
   else()


Index: cmake/base-config-ix.cmake
===
--- cmake/base-config-ix.cmake
+++ cmake/base-config-ix.cmake
@@ -148,7 +148,14 @@
 endif()
   endif()
 elseif("${COMPILER_RT_DEFAULT_TARGET_ARCH}" MATCHES "powerpc")
+  # Strip out -nodefaultlibs when calling TEST_BIG_ENDIAN. Configuration
+  # will fail with this option when building with a sanitizer.
+  set(OLD_CMAKE_REQUIRED_FLAGS ${CMAKE_REQUIRED_FLAGS})
+  string(REPLACE "-nodefaultlibs" "" CMAKE_REQUIRED_FLAGS ${OLD_CMAKE_REQUIRED_FLAGS})
   TEST_BIG_ENDIAN(HOST_IS_BIG_ENDIAN)
+  # Undo the change.
+  set(CMAKE_REQUIRED_FLAGS "${OLD_CMAKE_REQUIRED_FLAGS}")
+
   if(HOST_IS_BIG_ENDIAN)
 test_target_arch(powerpc64 "" "-m64")
   else()
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38040: [OpenMP] Add an additional test for D34888

2017-09-26 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld accepted this revision.
Hahnfeld added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D38040



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38258: [OpenMP] Fix passing of -m arguments to device toolchain

2017-09-27 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added inline comments.



Comment at: test/Driver/openmp-offload.c:89
+/// ###
+
 /// Check the phases graph when using a single target, different from the host.

gtbercea wrote:
> Shouldn't these tests be in the gpu test file?
There is nothing specific to GPUs here IMO, that is why I moved the test back 
to this file


https://reviews.llvm.org/D38258



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38258: [OpenMP] Fix passing of -m arguments to device toolchain

2017-09-27 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL314329: [OpenMP] Fix passing of -m arguments to device 
toolchain (authored by Hahnfeld).

Changed prior to commit:
  https://reviews.llvm.org/D38258?vs=116608&id=116845#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D38258

Files:
  cfe/trunk/lib/Driver/ToolChain.cpp
  cfe/trunk/test/Driver/openmp-offload-gpu.c
  cfe/trunk/test/Driver/openmp-offload.c

Index: cfe/trunk/test/Driver/openmp-offload.c
===
--- cfe/trunk/test/Driver/openmp-offload.c
+++ cfe/trunk/test/Driver/openmp-offload.c
@@ -39,6 +39,54 @@
 
 /// ###
 
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -mcpu=pwr7 is passed when compiling for the device.
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -mcpu=pwr7 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} "-target-cpu" "pwr7" {{.*}}"-fopenmp-is-device"
+
+/// ###
+
+/// Check -Xopenmp-target -mcpu=pwr7 is passed when compiling for the device.
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -mcpu=pwr7 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
+
+// CHK-FOPENMP-TARGET: clang{{.*}} "-target-cpu" "pwr7" {{.*}}"-fopenmp-is-device"
+
+/// ##
+
+/// Check -mcpu=pwr7 is passed to the same triple.
+// RUN:%clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -target powerpc64le-ibm-linux-gnu -mcpu=pwr7 %s 2>&1 \
+// RUN:| FileCheck -check-prefix=CHK-FOPENMP-MCPU-TO-SAME-TRIPLE %s
+
+// CHK-FOPENMP-MCPU-TO-SAME-TRIPLE: clang{{.*}} "-target-cpu" "pwr7" {{.*}}"-fopenmp-is-device"
+
+/// ##
+
+/// Check -march=pwr7 is NOT passed to nvptx64-nvidia-cuda.
+// RUN:%clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -march=pwr7 %s 2>&1 \
+// RUN:| FileCheck -check-prefix=CHK-FOPENMP-MARCH-TO-GPU %s
+
+// CHK-FOPENMP-MARCH-TO-GPU-NOT: clang{{.*}} "-target-cpu" "pwr7" {{.*}}"-fopenmp-is-device"
+
+/// ###
+
+/// Check -Xopenmp-target triggers error when multiple triples are used.
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu,powerpc64le-unknown-linux-gnu -Xopenmp-target -mcpu=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-AMBIGUOUS-ERROR %s
+
+// CHK-FOPENMP-TARGET-AMBIGUOUS-ERROR: clang{{.*}} error: cannot deduce implicit triple value for -Xopenmp-target, specify triple using -Xopenmp-target=
+
+/// ###
+
+/// Check -Xopenmp-target triggers error when an option requiring arguments is passed to it.
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -Xopenmp-target -mcpu=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-NESTED-ERROR %s
+
+// CHK-FOPENMP-TARGET-NESTED-ERROR: clang{{.*}} error: invalid -Xopenmp-target argument: '-Xopenmp-target -Xopenmp-target', options requiring arguments are unsupported
+
+/// ###
+
 /// Check the phases graph when using a single target, different from the host.
 /// We should have an offload action joining the host compile and device
 /// preprocessor and another one joining the device linking outputs to the host
Index: cfe/trunk/test/Driver/openmp-offload-gpu.c
===
--- cfe/trunk/test/Driver/openmp-offload-gpu.c
+++ cfe/trunk/test/Driver/openmp-offload-gpu.c
@@ -9,38 +9,6 @@
 
 /// ###
 
-/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr7 is passed when compiling for the device.
-// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -mcpu=pwr7 %s 2>&1 \
-// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
-
-// CHK-FOPENMP-EQ-TARGET: clang{{.*}} "-target-cpu" "pwr7"
-
-/// ###
-
-/// Check -Xopenmp-target -march=pwr7 is passed when compiling for the device.
-// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -mcpu=

[PATCH] D38259: [OpenMP] Fix translation of target args

2017-09-27 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL314330: [OpenMP] Fix translation of target args (authored by 
Hahnfeld).

Changed prior to commit:
  https://reviews.llvm.org/D38259?vs=116610&id=116846#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D38259

Files:
  cfe/trunk/lib/Driver/Compilation.cpp


Index: cfe/trunk/lib/Driver/Compilation.cpp
===
--- cfe/trunk/lib/Driver/Compilation.cpp
+++ cfe/trunk/lib/Driver/Compilation.cpp
@@ -57,14 +57,16 @@
 *TranslatedArgs, DeviceOffloadKind, AllocatedArgs);
 if (!OpenMPArgs) {
   Entry = TC->TranslateArgs(*TranslatedArgs, BoundArch, DeviceOffloadKind);
+  if (!Entry)
+Entry = TranslatedArgs;
 } else {
   Entry = TC->TranslateArgs(*OpenMPArgs, BoundArch, DeviceOffloadKind);
-  delete OpenMPArgs;
+  if (!Entry)
+Entry = OpenMPArgs;
+  else
+delete OpenMPArgs;
 }
 
-if (!Entry)
-  Entry = TranslatedArgs;
-
 // Add allocated arguments to the final DAL.
 for (auto ArgPtr : AllocatedArgs) {
   Entry->AddSynthesizedArg(ArgPtr);


Index: cfe/trunk/lib/Driver/Compilation.cpp
===
--- cfe/trunk/lib/Driver/Compilation.cpp
+++ cfe/trunk/lib/Driver/Compilation.cpp
@@ -57,14 +57,16 @@
 *TranslatedArgs, DeviceOffloadKind, AllocatedArgs);
 if (!OpenMPArgs) {
   Entry = TC->TranslateArgs(*TranslatedArgs, BoundArch, DeviceOffloadKind);
+  if (!Entry)
+Entry = TranslatedArgs;
 } else {
   Entry = TC->TranslateArgs(*OpenMPArgs, BoundArch, DeviceOffloadKind);
-  delete OpenMPArgs;
+  if (!Entry)
+Entry = OpenMPArgs;
+  else
+delete OpenMPArgs;
 }
 
-if (!Entry)
-  Entry = TranslatedArgs;
-
 // Add allocated arguments to the final DAL.
 for (auto ArgPtr : AllocatedArgs) {
   Entry->AddSynthesizedArg(ArgPtr);
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38257: [OpenMP] Fix memory leak when translating arguments

2017-09-27 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL314328: [OpenMP] Fix memory leak when translating arguments 
(authored by Hahnfeld).

Changed prior to commit:
  https://reviews.llvm.org/D38257?vs=116607&id=116844#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D38257

Files:
  cfe/trunk/include/clang/Driver/ToolChain.h
  cfe/trunk/lib/Driver/Compilation.cpp
  cfe/trunk/lib/Driver/ToolChain.cpp
  cfe/trunk/test/Driver/openmp-offload-gpu.c


Index: cfe/trunk/lib/Driver/ToolChain.cpp
===
--- cfe/trunk/lib/Driver/ToolChain.cpp
+++ cfe/trunk/lib/Driver/ToolChain.cpp
@@ -800,9 +800,10 @@
   return VersionTuple();
 }
 
-llvm::opt::DerivedArgList *
-ToolChain::TranslateOpenMPTargetArgs(const llvm::opt::DerivedArgList &Args,
-Action::OffloadKind DeviceOffloadKind) const {
+llvm::opt::DerivedArgList *ToolChain::TranslateOpenMPTargetArgs(
+const llvm::opt::DerivedArgList &Args,
+Action::OffloadKind DeviceOffloadKind,
+SmallVector &AllocatedArgs) const {
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
 const OptTable &Opts = getDriver().getOpts();
@@ -854,6 +855,7 @@
   }
   XOpenMPTargetArg->setBaseArg(A);
   A = XOpenMPTargetArg.release();
+  AllocatedArgs.push_back(A);
   DAL->append(A);
   NewArgAdded = true;
 }
Index: cfe/trunk/lib/Driver/Compilation.cpp
===
--- cfe/trunk/lib/Driver/Compilation.cpp
+++ cfe/trunk/lib/Driver/Compilation.cpp
@@ -51,9 +51,10 @@
 
   DerivedArgList *&Entry = TCArgs[{TC, BoundArch, DeviceOffloadKind}];
   if (!Entry) {
+SmallVector AllocatedArgs;
 // Translate OpenMP toolchain arguments provided via the -Xopenmp-target 
flags.
-DerivedArgList *OpenMPArgs = TC->TranslateOpenMPTargetArgs(*TranslatedArgs,
-DeviceOffloadKind);
+DerivedArgList *OpenMPArgs = TC->TranslateOpenMPTargetArgs(
+*TranslatedArgs, DeviceOffloadKind, AllocatedArgs);
 if (!OpenMPArgs) {
   Entry = TC->TranslateArgs(*TranslatedArgs, BoundArch, DeviceOffloadKind);
 } else {
@@ -63,6 +64,11 @@
 
 if (!Entry)
   Entry = TranslatedArgs;
+
+// Add allocated arguments to the final DAL.
+for (auto ArgPtr : AllocatedArgs) {
+  Entry->AddSynthesizedArg(ArgPtr);
+}
   }
 
   return *Entry;
Index: cfe/trunk/include/clang/Driver/ToolChain.h
===
--- cfe/trunk/include/clang/Driver/ToolChain.h
+++ cfe/trunk/include/clang/Driver/ToolChain.h
@@ -249,9 +249,10 @@
   ///
   /// \param DeviceOffloadKind - The device offload kind used for the
   /// translation.
-  virtual llvm::opt::DerivedArgList *
-  TranslateOpenMPTargetArgs(const llvm::opt::DerivedArgList &Args,
-  Action::OffloadKind DeviceOffloadKind) const;
+  virtual llvm::opt::DerivedArgList *TranslateOpenMPTargetArgs(
+  const llvm::opt::DerivedArgList &Args,
+  Action::OffloadKind DeviceOffloadKind,
+  SmallVector &AllocatedArgs) const;
 
   /// Choose a tool to use to handle the action \p JA.
   ///
Index: cfe/trunk/test/Driver/openmp-offload-gpu.c
===
--- cfe/trunk/test/Driver/openmp-offload-gpu.c
+++ cfe/trunk/test/Driver/openmp-offload-gpu.c
@@ -2,9 +2,6 @@
 /// Perform several driver tests for OpenMP offloading
 ///
 
-// Until this test is stabilized on all local configurations.
-// UNSUPPORTED: linux
-
 // REQUIRES: clang-driver
 // REQUIRES: x86-registered-target
 // REQUIRES: powerpc-registered-target


Index: cfe/trunk/lib/Driver/ToolChain.cpp
===
--- cfe/trunk/lib/Driver/ToolChain.cpp
+++ cfe/trunk/lib/Driver/ToolChain.cpp
@@ -800,9 +800,10 @@
   return VersionTuple();
 }
 
-llvm::opt::DerivedArgList *
-ToolChain::TranslateOpenMPTargetArgs(const llvm::opt::DerivedArgList &Args,
-Action::OffloadKind DeviceOffloadKind) const {
+llvm::opt::DerivedArgList *ToolChain::TranslateOpenMPTargetArgs(
+const llvm::opt::DerivedArgList &Args,
+Action::OffloadKind DeviceOffloadKind,
+SmallVector &AllocatedArgs) const {
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
 const OptTable &Opts = getDriver().getOpts();
@@ -854,6 +855,7 @@
   }
   XOpenMPTargetArg->setBaseArg(A);
   A = XOpenMPTargetArg.release();
+  AllocatedArgs.push_back(A);
   DAL->append(A);
   NewArgAdded = true;
 }
Index: cfe/trunk/lib/Driver/Compilation.cpp
===
--- cfe/trunk/lib/Driver/Compilation.cpp
+++ cfe/trunk/lib/Driver/Compilation.cpp
@@ -51,9 +51,10 @@
 
   DerivedArgList *&Entry = TCArgs[{TC, BoundArch, DeviceOffloadKind}];
   if (!Entry) {
+SmallVector AllocatedAr

[PATCH] D38277: [compiler-rt}[CMake] Fix configuration on PowerPC with sanitizers

2017-09-28 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld marked 2 inline comments as done.
Hahnfeld added a comment.

The error with `-DLLVM_USE_SANITIZER=Address` is

  -- Check if the system is big endian
  -- Searching 16 bit integer
  -- Looking for stddef.h
  -- Looking for stddef.h - not found
  -- Check size of unsigned short
  -- Check size of unsigned short - failed
  -- Check size of unsigned int
  -- Check size of unsigned int - failed
  -- Check size of unsigned long
  -- Check size of unsigned long - failed
  CMake Error at <...>/cmake/share/cmake-3.5/Modules/TestBigEndian.cmake:51 
(message):
no suitable type found
  Call Stack (most recent call first):
projects/compiler-rt/cmake/base-config-ix.cmake:151 (TEST_BIG_ENDIAN)
projects/compiler-rt/cmake/config-ix.cmake:138 (test_targets)
projects/compiler-rt/CMakeLists.txt:99 (include)

The reason is that the compile tests are performed with `-fsanitize=address 
-nodefaultlibs`. This gives a lot of undefined references because the runtime 
dependencies aren't linked in.


https://reviews.llvm.org/D38277



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38277: [compiler-rt][CMake] Fix configuration on PowerPC with sanitizers

2017-09-28 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld updated this revision to Diff 116977.
Hahnfeld retitled this revision from "[compiler-rt}[CMake] Fix configuration on 
PowerPC with sanitizers" to "[compiler-rt][CMake] Fix configuration on PowerPC 
with sanitizers".
Hahnfeld added a subscriber: gtbercea.

https://reviews.llvm.org/D38277

Files:
  cmake/base-config-ix.cmake


Index: cmake/base-config-ix.cmake
===
--- cmake/base-config-ix.cmake
+++ cmake/base-config-ix.cmake
@@ -148,7 +148,13 @@
 endif()
   endif()
 elseif("${COMPILER_RT_DEFAULT_TARGET_ARCH}" MATCHES "powerpc")
+  # Strip out -nodefaultlibs when calling TEST_BIG_ENDIAN. Configuration
+  # will fail with this option when building with a sanitizer.
+  cmake_push_check_state()
+  string(REPLACE "-nodefaultlibs" "" CMAKE_REQUIRED_FLAGS 
${OLD_CMAKE_REQUIRED_FLAGS})
   TEST_BIG_ENDIAN(HOST_IS_BIG_ENDIAN)
+  cmake_pop_check_state()
+
   if(HOST_IS_BIG_ENDIAN)
 test_target_arch(powerpc64 "" "-m64")
   else()


Index: cmake/base-config-ix.cmake
===
--- cmake/base-config-ix.cmake
+++ cmake/base-config-ix.cmake
@@ -148,7 +148,13 @@
 endif()
   endif()
 elseif("${COMPILER_RT_DEFAULT_TARGET_ARCH}" MATCHES "powerpc")
+  # Strip out -nodefaultlibs when calling TEST_BIG_ENDIAN. Configuration
+  # will fail with this option when building with a sanitizer.
+  cmake_push_check_state()
+  string(REPLACE "-nodefaultlibs" "" CMAKE_REQUIRED_FLAGS ${OLD_CMAKE_REQUIRED_FLAGS})
   TEST_BIG_ENDIAN(HOST_IS_BIG_ENDIAN)
+  cmake_pop_check_state()
+
   if(HOST_IS_BIG_ENDIAN)
 test_target_arch(powerpc64 "" "-m64")
   else()
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38372: [OpenMP] Fix passing of -m arguments correctly

2017-09-28 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.

The recent fix in https://reviews.llvm.org/D38258 was wrong: getAuxTriple() 
only returns
non-null values for the CUDA toolchain. That is why the now added
test for PPC and X86 failed.


https://reviews.llvm.org/D38372

Files:
  include/clang/Driver/ToolChain.h
  lib/Driver/Compilation.cpp
  lib/Driver/ToolChain.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -71,6 +71,14 @@
 
 /// ###
 
+/// Check -march=pwr7 is NOT passed to x86_64-unknown-linux-gnu.
+// RUN:%clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=x86_64-unknown-linux-gnu -march=pwr7 %s 2>&1 \
+// RUN:| FileCheck -check-prefix=CHK-FOPENMP-MARCH-TO-X86 %s
+
+// CHK-FOPENMP-MARCH-TO-X86-NOT: clang{{.*}} "-target-cpu" "pwr7" {{.*}}"-fopenmp-is-device"
+
+/// ###
+
 /// Check -Xopenmp-target triggers error when multiple triples are used.
 // RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu,powerpc64le-unknown-linux-gnu -Xopenmp-target -mcpu=pwr8 %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-AMBIGUOUS-ERROR %s
Index: lib/Driver/ToolChain.cpp
===
--- lib/Driver/ToolChain.cpp
+++ lib/Driver/ToolChain.cpp
@@ -801,74 +801,68 @@
 }
 
 llvm::opt::DerivedArgList *ToolChain::TranslateOpenMPTargetArgs(
-const llvm::opt::DerivedArgList &Args,
-Action::OffloadKind DeviceOffloadKind,
-SmallVector &AllocatedArgs) const {
-  if (DeviceOffloadKind == Action::OFK_OpenMP) {
-DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
-const OptTable &Opts = getDriver().getOpts();
-bool Modified = false;
-
-// Handle -Xopenmp-target flags
-for (Arg *A : Args) {
-  // Exclude flags which may only apply to the host toolchain.
-  // Do not exclude flags when the host triple (AuxTriple)
-  // matches the current toolchain triple. If it is not present
-  // at all, target and host share a toolchain.
-  if (A->getOption().matches(options::OPT_m_Group)) {
-if (!getAuxTriple() || getAuxTriple()->str() == getTriple().str())
-  DAL->append(A);
-else
-  Modified = true;
-continue;
-  }
-
-  unsigned Index;
-  unsigned Prev;
-  bool XOpenMPTargetNoTriple = A->getOption().matches(
-  options::OPT_Xopenmp_target);
-
-  if (A->getOption().matches(options::OPT_Xopenmp_target_EQ)) {
-// Passing device args: -Xopenmp-target= -opt=val.
-if (A->getValue(0) == getTripleString())
-  Index = Args.getBaseArgs().MakeIndex(A->getValue(1));
-else
-  continue;
-  } else if (XOpenMPTargetNoTriple) {
-// Passing device args: -Xopenmp-target -opt=val.
-Index = Args.getBaseArgs().MakeIndex(A->getValue(0));
-  } else {
+const llvm::opt::DerivedArgList &Args, bool SameTripleAsHost,
+SmallVectorImpl &AllocatedArgs) const {
+  DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
+  const OptTable &Opts = getDriver().getOpts();
+  bool Modified = false;
+
+  // Handle -Xopenmp-target flags
+  for (Arg *A : Args) {
+// Exclude flags which may only apply to the host toolchain.
+// Do not exclude flags when the host triple (AuxTriple)
+// matches the current toolchain triple. If it is not present
+// at all, target and host share a toolchain.
+if (A->getOption().matches(options::OPT_m_Group)) {
+  if (SameTripleAsHost)
 DAL->append(A);
+  else
+Modified = true;
+  continue;
+}
+
+unsigned Index;
+unsigned Prev;
+bool XOpenMPTargetNoTriple =
+A->getOption().matches(options::OPT_Xopenmp_target);
+
+if (A->getOption().matches(options::OPT_Xopenmp_target_EQ)) {
+  // Passing device args: -Xopenmp-target= -opt=val.
+  if (A->getValue(0) == getTripleString())
+Index = Args.getBaseArgs().MakeIndex(A->getValue(1));
+  else
 continue;
-  }
-
-  // Parse the argument to -Xopenmp-target.
-  Prev = Index;
-  std::unique_ptr XOpenMPTargetArg(Opts.ParseOneArg(Args, Index));
-  if (!XOpenMPTargetArg || Index > Prev + 1) {
-getDriver().Diag(diag::err_drv_invalid_Xopenmp_target_with_args)
-<< A->getAsString(Args);
-continue;
-  }
-  if (XOpenMPTargetNoTriple && XOpenMPTargetArg &&
-  Args.getAllArgValues(
-  options::OPT_fopenmp_targets_EQ).size() != 1) {
-getDriver().Diag(diag::err_drv_Xopenmp_target_missing_triple);
-continue;
-  }
-  XOpenMPTargetArg->setBaseArg(A);
-  A = XOpenMPTargetArg.release();
-  AllocatedArgs.push_

[PATCH] D38277: [compiler-rt][CMake] Fix configuration on PowerPC with sanitizers

2017-09-29 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL314512: [CMake] Fix configuration on PowerPC with sanitizers 
(authored by Hahnfeld).

Changed prior to commit:
  https://reviews.llvm.org/D38277?vs=116977&id=117132#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D38277

Files:
  compiler-rt/trunk/cmake/base-config-ix.cmake


Index: compiler-rt/trunk/cmake/base-config-ix.cmake
===
--- compiler-rt/trunk/cmake/base-config-ix.cmake
+++ compiler-rt/trunk/cmake/base-config-ix.cmake
@@ -148,7 +148,13 @@
 endif()
   endif()
 elseif("${COMPILER_RT_DEFAULT_TARGET_ARCH}" MATCHES "powerpc")
+  # Strip out -nodefaultlibs when calling TEST_BIG_ENDIAN. Configuration
+  # will fail with this option when building with a sanitizer.
+  cmake_push_check_state()
+  string(REPLACE "-nodefaultlibs" "" CMAKE_REQUIRED_FLAGS 
${OLD_CMAKE_REQUIRED_FLAGS})
   TEST_BIG_ENDIAN(HOST_IS_BIG_ENDIAN)
+  cmake_pop_check_state()
+
   if(HOST_IS_BIG_ENDIAN)
 test_target_arch(powerpc64 "" "-m64")
   else()


Index: compiler-rt/trunk/cmake/base-config-ix.cmake
===
--- compiler-rt/trunk/cmake/base-config-ix.cmake
+++ compiler-rt/trunk/cmake/base-config-ix.cmake
@@ -148,7 +148,13 @@
 endif()
   endif()
 elseif("${COMPILER_RT_DEFAULT_TARGET_ARCH}" MATCHES "powerpc")
+  # Strip out -nodefaultlibs when calling TEST_BIG_ENDIAN. Configuration
+  # will fail with this option when building with a sanitizer.
+  cmake_push_check_state()
+  string(REPLACE "-nodefaultlibs" "" CMAKE_REQUIRED_FLAGS ${OLD_CMAKE_REQUIRED_FLAGS})
   TEST_BIG_ENDIAN(HOST_IS_BIG_ENDIAN)
+  cmake_pop_check_state()
+
   if(HOST_IS_BIG_ENDIAN)
 test_target_arch(powerpc64 "" "-m64")
   else()
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38277: [compiler-rt][CMake] Fix configuration on PowerPC with sanitizers

2017-09-29 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added inline comments.



Comment at: compiler-rt/trunk/cmake/base-config-ix.cmake:154
+  cmake_push_check_state()
+  string(REPLACE "-nodefaultlibs" "" CMAKE_REQUIRED_FLAGS 
${OLD_CMAKE_REQUIRED_FLAGS})
   TEST_BIG_ENDIAN(HOST_IS_BIG_ENDIAN)

alekseyshl wrote:
> Oh, right, it should be:
> 
> string(REPLACE "-nodefaultlibs" "" CMAKE_REQUIRED_FLAGS 
> ${CMAKE_REQUIRED_FLAGS})
Sorry! Thanks for fixing it so fast, my `git-svn` just told me that I'm out of 
sync!


Repository:
  rL LLVM

https://reviews.llvm.org/D38277



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38468: [CUDA] Fix name of __activemask()

2017-10-02 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.

The name has two underscores in the official CUDA documentation:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-vote-functions


https://reviews.llvm.org/D38468

Files:
  lib/Headers/__clang_cuda_intrinsics.h


Index: lib/Headers/__clang_cuda_intrinsics.h
===
--- lib/Headers/__clang_cuda_intrinsics.h
+++ lib/Headers/__clang_cuda_intrinsics.h
@@ -186,7 +186,7 @@
   return __nvvm_vote_ballot_sync(mask, pred);
 }
 
-inline __device__ unsigned int activemask() { return __nvvm_vote_ballot(1); }
+inline __device__ unsigned int __activemask() { return __nvvm_vote_ballot(1); }
 
 #endif // !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 300
 


Index: lib/Headers/__clang_cuda_intrinsics.h
===
--- lib/Headers/__clang_cuda_intrinsics.h
+++ lib/Headers/__clang_cuda_intrinsics.h
@@ -186,7 +186,7 @@
   return __nvvm_vote_ballot_sync(mask, pred);
 }
 
-inline __device__ unsigned int activemask() { return __nvvm_vote_ballot(1); }
+inline __device__ unsigned int __activemask() { return __nvvm_vote_ballot(1); }
 
 #endif // !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 300
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38468: [CUDA] Fix name of __activemask()

2017-10-02 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL314691: [CUDA] Fix name of __activemask() (authored by 
Hahnfeld).

Changed prior to commit:
  https://reviews.llvm.org/D38468?vs=117384&id=117392#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D38468

Files:
  cfe/trunk/lib/Headers/__clang_cuda_intrinsics.h


Index: cfe/trunk/lib/Headers/__clang_cuda_intrinsics.h
===
--- cfe/trunk/lib/Headers/__clang_cuda_intrinsics.h
+++ cfe/trunk/lib/Headers/__clang_cuda_intrinsics.h
@@ -186,7 +186,7 @@
   return __nvvm_vote_ballot_sync(mask, pred);
 }
 
-inline __device__ unsigned int activemask() { return __nvvm_vote_ballot(1); }
+inline __device__ unsigned int __activemask() { return __nvvm_vote_ballot(1); }
 
 #endif // !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 300
 


Index: cfe/trunk/lib/Headers/__clang_cuda_intrinsics.h
===
--- cfe/trunk/lib/Headers/__clang_cuda_intrinsics.h
+++ cfe/trunk/lib/Headers/__clang_cuda_intrinsics.h
@@ -186,7 +186,7 @@
   return __nvvm_vote_ballot_sync(mask, pred);
 }
 
-inline __device__ unsigned int activemask() { return __nvvm_vote_ballot(1); }
+inline __device__ unsigned int __activemask() { return __nvvm_vote_ballot(1); }
 
 #endif // !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 300
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38372: [OpenMP] Fix passing of -m arguments correctly

2017-10-04 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL314902: [OpenMP] Fix passing of -m arguments correctly 
(authored by Hahnfeld).

Changed prior to commit:
  https://reviews.llvm.org/D38372?vs=117023&id=117664#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D38372

Files:
  cfe/trunk/include/clang/Driver/ToolChain.h
  cfe/trunk/lib/Driver/Compilation.cpp
  cfe/trunk/lib/Driver/ToolChain.cpp
  cfe/trunk/test/Driver/openmp-offload.c

Index: cfe/trunk/lib/Driver/ToolChain.cpp
===
--- cfe/trunk/lib/Driver/ToolChain.cpp
+++ cfe/trunk/lib/Driver/ToolChain.cpp
@@ -801,74 +801,68 @@
 }
 
 llvm::opt::DerivedArgList *ToolChain::TranslateOpenMPTargetArgs(
-const llvm::opt::DerivedArgList &Args,
-Action::OffloadKind DeviceOffloadKind,
-SmallVector &AllocatedArgs) const {
-  if (DeviceOffloadKind == Action::OFK_OpenMP) {
-DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
-const OptTable &Opts = getDriver().getOpts();
-bool Modified = false;
-
-// Handle -Xopenmp-target flags
-for (Arg *A : Args) {
-  // Exclude flags which may only apply to the host toolchain.
-  // Do not exclude flags when the host triple (AuxTriple)
-  // matches the current toolchain triple. If it is not present
-  // at all, target and host share a toolchain.
-  if (A->getOption().matches(options::OPT_m_Group)) {
-if (!getAuxTriple() || getAuxTriple()->str() == getTriple().str())
-  DAL->append(A);
-else
-  Modified = true;
-continue;
-  }
-
-  unsigned Index;
-  unsigned Prev;
-  bool XOpenMPTargetNoTriple = A->getOption().matches(
-  options::OPT_Xopenmp_target);
-
-  if (A->getOption().matches(options::OPT_Xopenmp_target_EQ)) {
-// Passing device args: -Xopenmp-target= -opt=val.
-if (A->getValue(0) == getTripleString())
-  Index = Args.getBaseArgs().MakeIndex(A->getValue(1));
-else
-  continue;
-  } else if (XOpenMPTargetNoTriple) {
-// Passing device args: -Xopenmp-target -opt=val.
-Index = Args.getBaseArgs().MakeIndex(A->getValue(0));
-  } else {
+const llvm::opt::DerivedArgList &Args, bool SameTripleAsHost,
+SmallVectorImpl &AllocatedArgs) const {
+  DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
+  const OptTable &Opts = getDriver().getOpts();
+  bool Modified = false;
+
+  // Handle -Xopenmp-target flags
+  for (Arg *A : Args) {
+// Exclude flags which may only apply to the host toolchain.
+// Do not exclude flags when the host triple (AuxTriple)
+// matches the current toolchain triple. If it is not present
+// at all, target and host share a toolchain.
+if (A->getOption().matches(options::OPT_m_Group)) {
+  if (SameTripleAsHost)
 DAL->append(A);
-continue;
-  }
+  else
+Modified = true;
+  continue;
+}
 
-  // Parse the argument to -Xopenmp-target.
-  Prev = Index;
-  std::unique_ptr XOpenMPTargetArg(Opts.ParseOneArg(Args, Index));
-  if (!XOpenMPTargetArg || Index > Prev + 1) {
-getDriver().Diag(diag::err_drv_invalid_Xopenmp_target_with_args)
-<< A->getAsString(Args);
-continue;
-  }
-  if (XOpenMPTargetNoTriple && XOpenMPTargetArg &&
-  Args.getAllArgValues(
-  options::OPT_fopenmp_targets_EQ).size() != 1) {
-getDriver().Diag(diag::err_drv_Xopenmp_target_missing_triple);
+unsigned Index;
+unsigned Prev;
+bool XOpenMPTargetNoTriple =
+A->getOption().matches(options::OPT_Xopenmp_target);
+
+if (A->getOption().matches(options::OPT_Xopenmp_target_EQ)) {
+  // Passing device args: -Xopenmp-target= -opt=val.
+  if (A->getValue(0) == getTripleString())
+Index = Args.getBaseArgs().MakeIndex(A->getValue(1));
+  else
 continue;
-  }
-  XOpenMPTargetArg->setBaseArg(A);
-  A = XOpenMPTargetArg.release();
-  AllocatedArgs.push_back(A);
+} else if (XOpenMPTargetNoTriple) {
+  // Passing device args: -Xopenmp-target -opt=val.
+  Index = Args.getBaseArgs().MakeIndex(A->getValue(0));
+} else {
   DAL->append(A);
-  Modified = true;
+  continue;
 }
 
-if (Modified) {
-  return DAL;
-} else {
-  delete DAL;
+// Parse the argument to -Xopenmp-target.
+Prev = Index;
+std::unique_ptr XOpenMPTargetArg(Opts.ParseOneArg(Args, Index));
+if (!XOpenMPTargetArg || Index > Prev + 1) {
+  getDriver().Diag(diag::err_drv_invalid_Xopenmp_target_with_args)
+  << A->getAsString(Args);
+  continue;
 }
+if (XOpenMPTargetNoTriple && XOpenMPTargetArg &&
+Args.getAllArgValues(options::OPT_fopenmp_targets_EQ).size() != 1) {
+  getDriver().Diag(diag::err_drv_Xopenmp_target_missing_triple);
+  continue;
+}
+XOpenMPTargetArg->setBaseArg(

[PATCH] D38883: [CMake][OpenMP] Customize default offloading arch

2017-10-13 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.
Herald added a subscriber: mgorny.

For the shuffle instructions in reductions we need at least sm_30
but the user may want to customize the default architecture.
Also remove some code that went in while troubleshooting broken
tests on external build bots.


https://reviews.llvm.org/D38883

Files:
  CMakeLists.txt
  include/clang/Config/config.h.cmake
  lib/Driver/ToolChains/Cuda.cpp
  lib/Driver/ToolChains/Cuda.h


Index: lib/Driver/ToolChains/Cuda.h
===
--- lib/Driver/ToolChains/Cuda.h
+++ lib/Driver/ToolChains/Cuda.h
@@ -76,17 +76,6 @@
   std::string getLibDeviceFile(StringRef Gpu) const {
 return LibDeviceMap.lookup(Gpu);
   }
-  /// \brief Get lowest available compute capability
-  /// for which a libdevice library exists.
-  std::string getLowestExistingArch() const {
-std::string LibDeviceFile;
-for (auto key : LibDeviceMap.keys()) {
-  LibDeviceFile = LibDeviceMap.lookup(key);
-  if (!LibDeviceFile.empty())
-return key;
-}
-return "sm_20";
-  }
 };
 
 namespace tools {
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -167,19 +167,6 @@
   }
 }
 
-// This code prevents IsValid from being set when
-// no libdevice has been found.
-bool allEmpty = true;
-std::string LibDeviceFile;
-for (auto key : LibDeviceMap.keys()) {
-  LibDeviceFile = LibDeviceMap.lookup(key);
-  if (!LibDeviceFile.empty())
-allEmpty = false;
-}
-
-if (allEmpty)
-  continue;
-
 IsValid = true;
 break;
   }
@@ -565,12 +552,8 @@
 
 StringRef Arch = DAL->getLastArgValue(options::OPT_march_EQ);
 if (Arch.empty()) {
-  // Default compute capability for CUDA toolchain is the
-  // lowest compute capability supported by the installed
-  // CUDA version.
-  DAL->AddJoinedArg(nullptr,
-  Opts.getOption(options::OPT_march_EQ),
-  CudaInstallation.getLowestExistingArch());
+  DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
+CLANG_OPENMP_NVPTX_DEFAULT_ARCH);
 }
 
 return DAL;
Index: include/clang/Config/config.h.cmake
===
--- include/clang/Config/config.h.cmake
+++ include/clang/Config/config.h.cmake
@@ -20,6 +20,9 @@
 /* Default OpenMP runtime used by -fopenmp. */
 #define CLANG_DEFAULT_OPENMP_RUNTIME "${CLANG_DEFAULT_OPENMP_RUNTIME}"
 
+/* Default architecture for OpenMP offloading to Nvidia GPUs. */
+#define CLANG_OPENMP_NVPTX_DEFAULT_ARCH "${CLANG_OPENMP_NVPTX_DEFAULT_ARCH}"
+
 /* Multilib suffix for libdir. */
 #define CLANG_LIBDIR_SUFFIX "${CLANG_LIBDIR_SUFFIX}"
 
Index: CMakeLists.txt
===
--- CMakeLists.txt
+++ CMakeLists.txt
@@ -235,6 +235,16 @@
 set(CLANG_DEFAULT_OPENMP_RUNTIME "libomp" CACHE STRING
   "Default OpenMP runtime used by -fopenmp.")
 
+# OpenMP offloading requires at least sm_30 because we use shuffle instructions
+# to generate efficient code for reductions.
+set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+  "Default architecture for OpenMP offloading to Nvidia GPUs.")
+if (NOT("${CLANG_OPENMP_NVPTX_DEFAULT_ARCH}" MATCHES "^sm_[0-9]+$"))
+  message(WARNING "Resetting default architecture for OpenMP offloading to 
Nvidia GPUs to sm_30")
+  set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+"Default architecture for OpenMP offloading to Nvidia GPUs." FORCE)
+endif()
+
 set(CLANG_VENDOR ${PACKAGE_VENDOR} CACHE STRING
   "Vendor-specific text for showing with version information.")
 


Index: lib/Driver/ToolChains/Cuda.h
===
--- lib/Driver/ToolChains/Cuda.h
+++ lib/Driver/ToolChains/Cuda.h
@@ -76,17 +76,6 @@
   std::string getLibDeviceFile(StringRef Gpu) const {
 return LibDeviceMap.lookup(Gpu);
   }
-  /// \brief Get lowest available compute capability
-  /// for which a libdevice library exists.
-  std::string getLowestExistingArch() const {
-std::string LibDeviceFile;
-for (auto key : LibDeviceMap.keys()) {
-  LibDeviceFile = LibDeviceMap.lookup(key);
-  if (!LibDeviceFile.empty())
-return key;
-}
-return "sm_20";
-  }
 };
 
 namespace tools {
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -167,19 +167,6 @@
   }
 }
 
-// This code prevents IsValid from being set when
-// no libdevice has been found.
-bool allEmpty = true;
-std::string LibDeviceFile;
-for (auto key : LibDeviceMap.keys()) {
-  LibDeviceFile = LibDeviceMap.lookup(key);
-  if (!LibDeviceFile.empty())
-allEmpty = false;
-}
-
- 

[PATCH] D38883: [CMake][OpenMP] Customize default offloading arch

2017-10-13 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld marked an inline comment as done.
Hahnfeld added inline comments.



Comment at: lib/Driver/ToolChains/Cuda.cpp:170-182
-// This code prevents IsValid from being set when
-// no libdevice has been found.
-bool allEmpty = true;
-std::string LibDeviceFile;
-for (auto key : LibDeviceMap.keys()) {
-  LibDeviceFile = LibDeviceMap.lookup(key);
-  if (!LibDeviceFile.empty())

tra wrote:
> I'd keep this code. It appears to serve useful purpose as it requires CUDA 
> installation to have at least some libdevice library in it.  It gives us a 
> change to find a valid installation, instead of ailing some time later when 
> we ask for a libdevice file and fail because there are none.
We had some internal discussions about this after I submitted the patch here.

The main question is: Do we want to support CUDA installations without 
libdevice and are there use cases for that? I'd say that the user should be 
able to use a toolchain without libdevice together with `-nocudalib`.



Comment at: lib/Driver/ToolChains/Cuda.cpp:540
   // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 for (Arg *A : Args){

This check guards the whole block.



Comment at: lib/Driver/ToolChains/Cuda.cpp:556
+  DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
+CLANG_OPENMP_NVPTX_DEFAULT_ARCH);
 }

tra wrote:
> This sets default GPU arch for *everyone* based on OPENMP requirements. 
> Perhaps this should be predicated on this being openmp compilation.
> 
> IMO to avoid unnecessary surprises, the default for CUDA compilation should 
> follow defaults of nvcc. sm_30 becomes default only in CUDA-9.
> 
This branch is only executed for OpenMP, see above


https://reviews.llvm.org/D38883



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38883: [CMake][OpenMP] Customize default offloading arch

2017-10-13 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld marked 4 inline comments as done.
Hahnfeld added inline comments.



Comment at: lib/Driver/ToolChains/Cuda.cpp:170-182
-// This code prevents IsValid from being set when
-// no libdevice has been found.
-bool allEmpty = true;
-std::string LibDeviceFile;
-for (auto key : LibDeviceMap.keys()) {
-  LibDeviceFile = LibDeviceMap.lookup(key);
-  if (!LibDeviceFile.empty())

tra wrote:
> Hahnfeld wrote:
> > tra wrote:
> > > I'd keep this code. It appears to serve useful purpose as it requires 
> > > CUDA installation to have at least some libdevice library in it.  It 
> > > gives us a change to find a valid installation, instead of ailing some 
> > > time later when we ask for a libdevice file and fail because there are 
> > > none.
> > We had some internal discussions about this after I submitted the patch 
> > here.
> > 
> > The main question is: Do we want to support CUDA installations without 
> > libdevice and are there use cases for that? I'd say that the user should be 
> > able to use a toolchain without libdevice together with `-nocudalib`.
> Sounds reasonable. How about keeping the code but putting it under 
> `if(!hasArg(nocudalib))`?
> 
Ok, I'll do that in a separate patch and keep the code here for now.


https://reviews.llvm.org/D38883



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38883: [CMake][OpenMP] Customize default offloading arch

2017-10-13 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld updated this revision to Diff 118961.
Hahnfeld marked an inline comment as done.
Hahnfeld edited the summary of this revision.
Hahnfeld added a comment.

Check that the user didn't specify a value lower than `sm_30` and re-add some 
code as discussed.


https://reviews.llvm.org/D38883

Files:
  CMakeLists.txt
  include/clang/Config/config.h.cmake
  lib/Driver/ToolChains/Cuda.cpp
  lib/Driver/ToolChains/Cuda.h


Index: lib/Driver/ToolChains/Cuda.h
===
--- lib/Driver/ToolChains/Cuda.h
+++ lib/Driver/ToolChains/Cuda.h
@@ -76,17 +76,6 @@
   std::string getLibDeviceFile(StringRef Gpu) const {
 return LibDeviceMap.lookup(Gpu);
   }
-  /// \brief Get lowest available compute capability
-  /// for which a libdevice library exists.
-  std::string getLowestExistingArch() const {
-std::string LibDeviceFile;
-for (auto key : LibDeviceMap.keys()) {
-  LibDeviceFile = LibDeviceMap.lookup(key);
-  if (!LibDeviceFile.empty())
-return key;
-}
-return "sm_20";
-  }
 };
 
 namespace tools {
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -551,9 +551,9 @@
   // flags are not duplicated.
   // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
-for (Arg *A : Args){
+for (Arg *A : Args) {
   bool IsDuplicate = false;
-  for (Arg *DALArg : *DAL){
+  for (Arg *DALArg : *DAL) {
 if (A == DALArg) {
   IsDuplicate = true;
   break;
@@ -564,14 +564,9 @@
 }
 
 StringRef Arch = DAL->getLastArgValue(options::OPT_march_EQ);
-if (Arch.empty()) {
-  // Default compute capability for CUDA toolchain is the
-  // lowest compute capability supported by the installed
-  // CUDA version.
-  DAL->AddJoinedArg(nullptr,
-  Opts.getOption(options::OPT_march_EQ),
-  CudaInstallation.getLowestExistingArch());
-}
+if (Arch.empty())
+  DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
+CLANG_OPENMP_NVPTX_DEFAULT_ARCH);
 
 return DAL;
   }
Index: include/clang/Config/config.h.cmake
===
--- include/clang/Config/config.h.cmake
+++ include/clang/Config/config.h.cmake
@@ -20,6 +20,9 @@
 /* Default OpenMP runtime used by -fopenmp. */
 #define CLANG_DEFAULT_OPENMP_RUNTIME "${CLANG_DEFAULT_OPENMP_RUNTIME}"
 
+/* Default architecture for OpenMP offloading to Nvidia GPUs. */
+#define CLANG_OPENMP_NVPTX_DEFAULT_ARCH "${CLANG_OPENMP_NVPTX_DEFAULT_ARCH}"
+
 /* Multilib suffix for libdir. */
 #define CLANG_LIBDIR_SUFFIX "${CLANG_LIBDIR_SUFFIX}"
 
Index: CMakeLists.txt
===
--- CMakeLists.txt
+++ CMakeLists.txt
@@ -235,6 +235,17 @@
 set(CLANG_DEFAULT_OPENMP_RUNTIME "libomp" CACHE STRING
   "Default OpenMP runtime used by -fopenmp.")
 
+# OpenMP offloading requires at least sm_30 because we use shuffle instructions
+# to generate efficient code for reductions.
+set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+  "Default architecture for OpenMP offloading to Nvidia GPUs.")
+string(REGEX MATCH "^sm_([0-9]+)$" MATCHED_ARCH 
"${CLANG_OPENMP_NVPTX_DEFAULT_ARCH}")
+if (NOT DEFINED MATCHED_ARCH OR "${CMAKE_MATCH_1}" LESS 30)
+  message(WARNING "Resetting default architecture for OpenMP offloading to 
Nvidia GPUs to sm_30")
+  set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+"Default architecture for OpenMP offloading to Nvidia GPUs." FORCE)
+endif()
+
 set(CLANG_VENDOR ${PACKAGE_VENDOR} CACHE STRING
   "Vendor-specific text for showing with version information.")
 


Index: lib/Driver/ToolChains/Cuda.h
===
--- lib/Driver/ToolChains/Cuda.h
+++ lib/Driver/ToolChains/Cuda.h
@@ -76,17 +76,6 @@
   std::string getLibDeviceFile(StringRef Gpu) const {
 return LibDeviceMap.lookup(Gpu);
   }
-  /// \brief Get lowest available compute capability
-  /// for which a libdevice library exists.
-  std::string getLowestExistingArch() const {
-std::string LibDeviceFile;
-for (auto key : LibDeviceMap.keys()) {
-  LibDeviceFile = LibDeviceMap.lookup(key);
-  if (!LibDeviceFile.empty())
-return key;
-}
-return "sm_20";
-  }
 };
 
 namespace tools {
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -551,9 +551,9 @@
   // flags are not duplicated.
   // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
-for (Arg *A : Args){
+for (Arg *A : Args) {
   bool IsDuplicate = false;
-  for (Arg *DALArg : *DAL){
+  for (Arg *DALArg : *DAL) {
 if (A == DALArg) {
 

[PATCH] D38901: [CUDA] Require libdevice only if needed

2017-10-13 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.

If the user passes -nocudalib, we can live without it being present.
Simplify the code by just checking whether LibDeviceMap is empty.


https://reviews.llvm.org/D38901

Files:
  lib/Driver/ToolChains/Cuda.cpp


Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -167,17 +167,9 @@
   }
 }
 
-// This code prevents IsValid from being set when
-// no libdevice has been found.
-bool allEmpty = true;
-std::string LibDeviceFile;
-for (auto key : LibDeviceMap.keys()) {
-  LibDeviceFile = LibDeviceMap.lookup(key);
-  if (!LibDeviceFile.empty())
-allEmpty = false;
-}
-
-if (allEmpty)
+// Check that we have found at least one libdevice that we can link in if
+// -nocudalib hasn't been specified.
+if (LibDeviceMap.empty() && !Args.hasArg(options::OPT_nocudalib))
   continue;
 
 IsValid = true;


Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -167,17 +167,9 @@
   }
 }
 
-// This code prevents IsValid from being set when
-// no libdevice has been found.
-bool allEmpty = true;
-std::string LibDeviceFile;
-for (auto key : LibDeviceMap.keys()) {
-  LibDeviceFile = LibDeviceMap.lookup(key);
-  if (!LibDeviceFile.empty())
-allEmpty = false;
-}
-
-if (allEmpty)
+// Check that we have found at least one libdevice that we can link in if
+// -nocudalib hasn't been specified.
+if (LibDeviceMap.empty() && !Args.hasArg(options::OPT_nocudalib))
   continue;
 
 IsValid = true;
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38901: [CUDA] Require libdevice only if needed

2017-10-13 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld updated this revision to Diff 118969.
Hahnfeld added a comment.

Fix one more condition that checks for `nvvm/libdevice` and add a test.


https://reviews.llvm.org/D38901

Files:
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/Inputs/CUDA-nolibdevice/usr/local/cuda/bin/.keep
  test/Driver/Inputs/CUDA-nolibdevice/usr/local/cuda/include/.keep
  test/Driver/Inputs/CUDA-nolibdevice/usr/local/cuda/lib/.keep
  test/Driver/Inputs/CUDA-nolibdevice/usr/local/cuda/lib64/.keep
  test/Driver/cuda-detect.cu


Index: test/Driver/cuda-detect.cu
===
--- test/Driver/cuda-detect.cu
+++ test/Driver/cuda-detect.cu
@@ -2,7 +2,7 @@
 // REQUIRES: x86-registered-target
 // REQUIRES: nvptx-registered-target
 //
-// # Check that we properly detect CUDA installation.
+// Check that we properly detect CUDA installation.
 // RUN: %clang -v --target=i386-unknown-linux \
 // RUN:   --sysroot=%S/no-cuda-there 2>&1 | FileCheck %s -check-prefix NOCUDA
 // RUN: %clang -v --target=i386-apple-macosx \
@@ -18,6 +18,19 @@
 // RUN: %clang -v --target=i386-apple-macosx \
 // RUN:   --cuda-path=%S/Inputs/CUDA/usr/local/cuda 2>&1 | FileCheck %s
 
+// Check that we don't find a CUDA installation without libdevice ...
+// RUN: %clang -v --target=i386-unknown-linux \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s 
-check-prefix NOCUDA
+// RUN: %clang -v --target=i386-apple-macosx \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s 
-check-prefix NOCUDA
+
+// ... unless the user doesn't need libdevice
+// RUN: %clang -v --target=i386-unknown-linux -nocudalib \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s 
-check-prefix NO-LIBDEVICE
+// RUN: %clang -v --target=i386-apple-macosx -nocudalib \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s 
-check-prefix NO-LIBDEVICE
+
+
 // Make sure we map libdevice bitcode files to proper GPUs. These
 // tests use Inputs/CUDA_80 which has full set of libdevice files.
 // However, libdevice mapping only matches CUDA-7.x at the moment.
@@ -112,6 +125,7 @@
 // RUN: | FileCheck %s --check-prefix CHECK-CXXINCLUDE
 
 // CHECK: Found CUDA installation: {{.*}}/Inputs/CUDA/usr/local/cuda
+// NO-LIBDEVICE: Found CUDA installation: 
{{.*}}/Inputs/CUDA-nolibdevice/usr/local/cuda
 // NOCUDA-NOT: Found CUDA installation:
 
 // MISSINGLIBDEVICE: error: cannot find libdevice for sm_20.
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -87,8 +87,7 @@
 LibDevicePath = InstallPath + "/nvvm/libdevice";
 
 auto &FS = D.getVFS();
-if (!(FS.exists(IncludePath) && FS.exists(BinPath) &&
-  FS.exists(LibDevicePath)))
+if (!(FS.exists(IncludePath) && FS.exists(BinPath)))
   continue;
 
 // On Linux, we have both lib and lib64 directories, and we need to choose
@@ -167,17 +166,9 @@
   }
 }
 
-// This code prevents IsValid from being set when
-// no libdevice has been found.
-bool allEmpty = true;
-std::string LibDeviceFile;
-for (auto key : LibDeviceMap.keys()) {
-  LibDeviceFile = LibDeviceMap.lookup(key);
-  if (!LibDeviceFile.empty())
-allEmpty = false;
-}
-
-if (allEmpty)
+// Check that we have found at least one libdevice that we can link in if
+// -nocudalib hasn't been specified.
+if (LibDeviceMap.empty() && !Args.hasArg(options::OPT_nocudalib))
   continue;
 
 IsValid = true;


Index: test/Driver/cuda-detect.cu
===
--- test/Driver/cuda-detect.cu
+++ test/Driver/cuda-detect.cu
@@ -2,7 +2,7 @@
 // REQUIRES: x86-registered-target
 // REQUIRES: nvptx-registered-target
 //
-// # Check that we properly detect CUDA installation.
+// Check that we properly detect CUDA installation.
 // RUN: %clang -v --target=i386-unknown-linux \
 // RUN:   --sysroot=%S/no-cuda-there 2>&1 | FileCheck %s -check-prefix NOCUDA
 // RUN: %clang -v --target=i386-apple-macosx \
@@ -18,6 +18,19 @@
 // RUN: %clang -v --target=i386-apple-macosx \
 // RUN:   --cuda-path=%S/Inputs/CUDA/usr/local/cuda 2>&1 | FileCheck %s
 
+// Check that we don't find a CUDA installation without libdevice ...
+// RUN: %clang -v --target=i386-unknown-linux \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s -check-prefix NOCUDA
+// RUN: %clang -v --target=i386-apple-macosx \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s -check-prefix NOCUDA
+
+// ... unless the user doesn't need libdevice
+// RUN: %clang -v --target=i386-unknown-linux -nocudalib \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s -check-prefix NO-LIBDEVICE
+// RUN: %clang -v --target=i386-apple-macosx -nocudalib \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s -check-prefix NO-LIBDEVICE
+
+
 // Make sure we

[PATCH] D38901: [CUDA] Require libdevice only if needed

2017-10-16 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL315902: [CUDA] Require libdevice only if needed (authored by 
Hahnfeld).

Changed prior to commit:
  https://reviews.llvm.org/D38901?vs=118969&id=119149#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D38901

Files:
  cfe/trunk/lib/Driver/ToolChains/Cuda.cpp
  cfe/trunk/test/Driver/Inputs/CUDA-nolibdevice/usr/local/cuda/bin/.keep
  cfe/trunk/test/Driver/Inputs/CUDA-nolibdevice/usr/local/cuda/include/.keep
  cfe/trunk/test/Driver/Inputs/CUDA-nolibdevice/usr/local/cuda/lib/.keep
  cfe/trunk/test/Driver/Inputs/CUDA-nolibdevice/usr/local/cuda/lib64/.keep
  cfe/trunk/test/Driver/cuda-detect.cu


Index: cfe/trunk/test/Driver/cuda-detect.cu
===
--- cfe/trunk/test/Driver/cuda-detect.cu
+++ cfe/trunk/test/Driver/cuda-detect.cu
@@ -2,7 +2,7 @@
 // REQUIRES: x86-registered-target
 // REQUIRES: nvptx-registered-target
 //
-// # Check that we properly detect CUDA installation.
+// Check that we properly detect CUDA installation.
 // RUN: %clang -v --target=i386-unknown-linux \
 // RUN:   --sysroot=%S/no-cuda-there 2>&1 | FileCheck %s -check-prefix NOCUDA
 // RUN: %clang -v --target=i386-apple-macosx \
@@ -18,6 +18,19 @@
 // RUN: %clang -v --target=i386-apple-macosx \
 // RUN:   --cuda-path=%S/Inputs/CUDA/usr/local/cuda 2>&1 | FileCheck %s
 
+// Check that we don't find a CUDA installation without libdevice ...
+// RUN: %clang -v --target=i386-unknown-linux \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s 
-check-prefix NOCUDA
+// RUN: %clang -v --target=i386-apple-macosx \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s 
-check-prefix NOCUDA
+
+// ... unless the user doesn't need libdevice
+// RUN: %clang -v --target=i386-unknown-linux -nocudalib \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s 
-check-prefix NO-LIBDEVICE
+// RUN: %clang -v --target=i386-apple-macosx -nocudalib \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s 
-check-prefix NO-LIBDEVICE
+
+
 // Make sure we map libdevice bitcode files to proper GPUs. These
 // tests use Inputs/CUDA_80 which has full set of libdevice files.
 // However, libdevice mapping only matches CUDA-7.x at the moment.
@@ -112,6 +125,7 @@
 // RUN: | FileCheck %s --check-prefix CHECK-CXXINCLUDE
 
 // CHECK: Found CUDA installation: {{.*}}/Inputs/CUDA/usr/local/cuda
+// NO-LIBDEVICE: Found CUDA installation: 
{{.*}}/Inputs/CUDA-nolibdevice/usr/local/cuda
 // NOCUDA-NOT: Found CUDA installation:
 
 // MISSINGLIBDEVICE: error: cannot find libdevice for sm_20.
Index: cfe/trunk/lib/Driver/ToolChains/Cuda.cpp
===
--- cfe/trunk/lib/Driver/ToolChains/Cuda.cpp
+++ cfe/trunk/lib/Driver/ToolChains/Cuda.cpp
@@ -87,8 +87,7 @@
 LibDevicePath = InstallPath + "/nvvm/libdevice";
 
 auto &FS = D.getVFS();
-if (!(FS.exists(IncludePath) && FS.exists(BinPath) &&
-  FS.exists(LibDevicePath)))
+if (!(FS.exists(IncludePath) && FS.exists(BinPath)))
   continue;
 
 // On Linux, we have both lib and lib64 directories, and we need to choose
@@ -167,17 +166,9 @@
   }
 }
 
-// This code prevents IsValid from being set when
-// no libdevice has been found.
-bool allEmpty = true;
-std::string LibDeviceFile;
-for (auto key : LibDeviceMap.keys()) {
-  LibDeviceFile = LibDeviceMap.lookup(key);
-  if (!LibDeviceFile.empty())
-allEmpty = false;
-}
-
-if (allEmpty)
+// Check that we have found at least one libdevice that we can link in if
+// -nocudalib hasn't been specified.
+if (LibDeviceMap.empty() && !Args.hasArg(options::OPT_nocudalib))
   continue;
 
 IsValid = true;


Index: cfe/trunk/test/Driver/cuda-detect.cu
===
--- cfe/trunk/test/Driver/cuda-detect.cu
+++ cfe/trunk/test/Driver/cuda-detect.cu
@@ -2,7 +2,7 @@
 // REQUIRES: x86-registered-target
 // REQUIRES: nvptx-registered-target
 //
-// # Check that we properly detect CUDA installation.
+// Check that we properly detect CUDA installation.
 // RUN: %clang -v --target=i386-unknown-linux \
 // RUN:   --sysroot=%S/no-cuda-there 2>&1 | FileCheck %s -check-prefix NOCUDA
 // RUN: %clang -v --target=i386-apple-macosx \
@@ -18,6 +18,19 @@
 // RUN: %clang -v --target=i386-apple-macosx \
 // RUN:   --cuda-path=%S/Inputs/CUDA/usr/local/cuda 2>&1 | FileCheck %s
 
+// Check that we don't find a CUDA installation without libdevice ...
+// RUN: %clang -v --target=i386-unknown-linux \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s -check-prefix NOCUDA
+// RUN: %clang -v --target=i386-apple-macosx \
+// RUN:   --sysroot=%S/Inputs/CUDA-nolibdevice 2>&1 | FileCheck %s -check-prefix NOCUDA
+
+// ... unless the user doesn't need libdevice
+// RUN: %clang -v --target=i386-unknown-linux -nocud

[PATCH] D38968: [OpenMP] Implement omp_is_initial_device() as builtin

2017-10-16 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.

This allows to return the static value that we know at compile time.


https://reviews.llvm.org/D38968

Files:
  include/clang/Basic/Builtins.def
  include/clang/Basic/Builtins.h
  lib/AST/ExprConstant.cpp
  lib/Basic/Builtins.cpp
  test/OpenMP/is_initial_device.c


Index: test/OpenMP/is_initial_device.c
===
--- /dev/null
+++ test/OpenMP/is_initial_device.c
@@ -0,0 +1,36 @@
+// REQUIRES: powerpc-registered-target
+
+// RUN: %clang_cc1 -verify -fopenmp -x c -triple powerpc64le-unknown-unknown 
-fopenmp-targets=powerpc64le-unknown-unknown \
+// RUN:-emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x ir -triple powerpc64le-unknown-unknown 
-emit-llvm \
+// RUN: %t-ppc-host.bc -o - | FileCheck %s -check-prefixes 
HOST,OUTLINED
+// RUN: %clang_cc1 -verify -fopenmp -x c -triple powerpc64le-unknown-unknown 
-emit-llvm -fopenmp-is-device \
+// RUN: %s -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | 
FileCheck %s -check-prefixes DEVICE,OUTLINED
+
+// expected-no-diagnostics
+int check() {
+  int host = omp_is_initial_device();
+  int device;
+#pragma omp target map(tofrom: device)
+  {
+device = omp_is_initial_device();
+  }
+
+  return host + device;
+}
+
+// The host should get a value of 1:
+// HOST: define{{.*}} @check()
+// HOST: [[HOST:%.*]] = alloca i32
+// HOST: store i32 1, i32* [[HOST]]
+
+// OUTLINED: define{{.*}} @{{.*}}omp_offloading{{.*}}(i32*{{.*}} 
[[DEVICE_ARGUMENT:%.*]])
+// OUTLINED: [[DEVICE_ADDR_STORAGE:%.*]] = alloca i32*
+// OUTLINED: store i32* [[DEVICE_ARGUMENT]], i32** [[DEVICE_ADDR_STORAGE]]
+// OUTLINED: [[DEVICE_ADDR:%.*]] = load i32*, i32** [[DEVICE_ADDR_STORAGE]]
+
+// The outlined function that is called as fallback also runs on the host:
+// HOST: store i32 1, i32* [[DEVICE_ADDR]]
+
+// The device should get a value of 0:
+// DEVICE: store i32 0, i32* [[DEVICE_ADDR]]
Index: lib/Basic/Builtins.cpp
===
--- lib/Basic/Builtins.cpp
+++ lib/Basic/Builtins.cpp
@@ -75,8 +75,9 @@
   (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES) == 
OCLC20_LANG;
   bool OclCUnsupported = !LangOpts.OpenCL &&
  (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES);
+  bool OpenMPUnsupported = !LangOpts.OpenMP && BuiltinInfo.Langs == OMP_LANG;
   return !BuiltinsUnsupported && !MathBuiltinsUnsupported && !OclCUnsupported 
&&
- !OclC1Unsupported && !OclC2Unsupported &&
+ !OclC1Unsupported && !OclC2Unsupported && !OpenMPUnsupported &&
  !GnuModeUnsupported && !MSModeUnsupported && !ObjCUnsupported;
 }
 
Index: lib/AST/ExprConstant.cpp
===
--- lib/AST/ExprConstant.cpp
+++ lib/AST/ExprConstant.cpp
@@ -7929,6 +7929,9 @@
 return BuiltinOp == Builtin::BI__atomic_always_lock_free ?
 Success(0, E) : Error(E);
   }
+  case Builtin::BIomp_is_initial_device:
+// We can decide statically which value the runtime would return if called.
+return Success(Info.getLangOpts().OpenMPIsDevice ? 0 : 1, E);
   }
 }
 
Index: include/clang/Basic/Builtins.h
===
--- include/clang/Basic/Builtins.h
+++ include/clang/Basic/Builtins.h
@@ -38,6 +38,7 @@
   MS_LANG = 0x10, // builtin requires MS mode.
   OCLC20_LANG = 0x20, // builtin for OpenCL C 2.0 only.
   OCLC1X_LANG = 0x40, // builtin for OpenCL C 1.x only.
+  OMP_LANG = 0x80,// builtin requires OpenMP.
   ALL_LANGUAGES = C_LANG | CXX_LANG | OBJC_LANG, // builtin for all languages.
   ALL_GNU_LANGUAGES = ALL_LANGUAGES | GNU_LANG,  // builtin requires GNU mode.
   ALL_MS_LANGUAGES = ALL_LANGUAGES | MS_LANG,// builtin requires MS mode.
Index: include/clang/Basic/Builtins.def
===
--- include/clang/Basic/Builtins.def
+++ include/clang/Basic/Builtins.def
@@ -1434,6 +1434,9 @@
 BUILTIN(__builtin_os_log_format_buffer_size, "zcC*.", "p:0:nut")
 BUILTIN(__builtin_os_log_format, "v*v*cC*.", "p:0:nt")
 
+// OpenMP 4.0
+LANGBUILTIN(omp_is_initial_device, "i", "nc", OMP_LANG)
+
 // Builtins for XRay
 BUILTIN(__xray_customevent, "vcC*z", "")
 


Index: test/OpenMP/is_initial_device.c
===
--- /dev/null
+++ test/OpenMP/is_initial_device.c
@@ -0,0 +1,36 @@
+// REQUIRES: powerpc-registered-target
+
+// RUN: %clang_cc1 -verify -fopenmp -x c -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-unknown-unknown \
+// RUN:-emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x ir -triple powerpc64le-unknown-unknown -emit-llvm \
+// RUN: %t-ppc-host.bc -o - | FileCheck %s -check-prefixes HOST,OUTLINED
+// RUN: %clang_cc1 -verify -fopenmp -x c -triple powerpc64le-unknown-unknown -emit-llvm -fopenmp-is-de

[PATCH] D38968: [OpenMP] Implement omp_is_initial_device() as builtin

2017-10-16 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D38968#898951, @grokos wrote:

> Now that this issue has been addressed and regressions tests pass, should we 
> re-enable Cmake to build libomptarget by default?


Yes, I already have a local patch which also takes care of restricting the 
tests to Clang versions newer than 6.0.0. I will post it for review once I've 
committed this revision.


https://reviews.llvm.org/D38968



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38883: [CMake][OpenMP] Customize default offloading arch

2017-10-17 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL315996: [CMake][OpenMP] Customize default offloading arch 
(authored by Hahnfeld).

Changed prior to commit:
  https://reviews.llvm.org/D38883?vs=118961&id=119310#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D38883

Files:
  cfe/trunk/CMakeLists.txt
  cfe/trunk/include/clang/Config/config.h.cmake
  cfe/trunk/lib/Driver/ToolChains/Cuda.cpp
  cfe/trunk/lib/Driver/ToolChains/Cuda.h


Index: cfe/trunk/include/clang/Config/config.h.cmake
===
--- cfe/trunk/include/clang/Config/config.h.cmake
+++ cfe/trunk/include/clang/Config/config.h.cmake
@@ -20,6 +20,9 @@
 /* Default OpenMP runtime used by -fopenmp. */
 #define CLANG_DEFAULT_OPENMP_RUNTIME "${CLANG_DEFAULT_OPENMP_RUNTIME}"
 
+/* Default architecture for OpenMP offloading to Nvidia GPUs. */
+#define CLANG_OPENMP_NVPTX_DEFAULT_ARCH "${CLANG_OPENMP_NVPTX_DEFAULT_ARCH}"
+
 /* Multilib suffix for libdir. */
 #define CLANG_LIBDIR_SUFFIX "${CLANG_LIBDIR_SUFFIX}"
 
Index: cfe/trunk/CMakeLists.txt
===
--- cfe/trunk/CMakeLists.txt
+++ cfe/trunk/CMakeLists.txt
@@ -235,6 +235,17 @@
 set(CLANG_DEFAULT_OPENMP_RUNTIME "libomp" CACHE STRING
   "Default OpenMP runtime used by -fopenmp.")
 
+# OpenMP offloading requires at least sm_30 because we use shuffle instructions
+# to generate efficient code for reductions.
+set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+  "Default architecture for OpenMP offloading to Nvidia GPUs.")
+string(REGEX MATCH "^sm_([0-9]+)$" MATCHED_ARCH 
"${CLANG_OPENMP_NVPTX_DEFAULT_ARCH}")
+if (NOT DEFINED MATCHED_ARCH OR "${CMAKE_MATCH_1}" LESS 30)
+  message(WARNING "Resetting default architecture for OpenMP offloading to 
Nvidia GPUs to sm_30")
+  set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+"Default architecture for OpenMP offloading to Nvidia GPUs." FORCE)
+endif()
+
 set(CLANG_VENDOR ${PACKAGE_VENDOR} CACHE STRING
   "Vendor-specific text for showing with version information.")
 
Index: cfe/trunk/lib/Driver/ToolChains/Cuda.cpp
===
--- cfe/trunk/lib/Driver/ToolChains/Cuda.cpp
+++ cfe/trunk/lib/Driver/ToolChains/Cuda.cpp
@@ -542,9 +542,9 @@
   // flags are not duplicated.
   // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
-for (Arg *A : Args){
+for (Arg *A : Args) {
   bool IsDuplicate = false;
-  for (Arg *DALArg : *DAL){
+  for (Arg *DALArg : *DAL) {
 if (A == DALArg) {
   IsDuplicate = true;
   break;
@@ -555,14 +555,9 @@
 }
 
 StringRef Arch = DAL->getLastArgValue(options::OPT_march_EQ);
-if (Arch.empty()) {
-  // Default compute capability for CUDA toolchain is the
-  // lowest compute capability supported by the installed
-  // CUDA version.
-  DAL->AddJoinedArg(nullptr,
-  Opts.getOption(options::OPT_march_EQ),
-  CudaInstallation.getLowestExistingArch());
-}
+if (Arch.empty())
+  DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
+CLANG_OPENMP_NVPTX_DEFAULT_ARCH);
 
 return DAL;
   }
Index: cfe/trunk/lib/Driver/ToolChains/Cuda.h
===
--- cfe/trunk/lib/Driver/ToolChains/Cuda.h
+++ cfe/trunk/lib/Driver/ToolChains/Cuda.h
@@ -76,17 +76,6 @@
   std::string getLibDeviceFile(StringRef Gpu) const {
 return LibDeviceMap.lookup(Gpu);
   }
-  /// \brief Get lowest available compute capability
-  /// for which a libdevice library exists.
-  std::string getLowestExistingArch() const {
-std::string LibDeviceFile;
-for (auto key : LibDeviceMap.keys()) {
-  LibDeviceFile = LibDeviceMap.lookup(key);
-  if (!LibDeviceFile.empty())
-return key;
-}
-return "sm_20";
-  }
 };
 
 namespace tools {


Index: cfe/trunk/include/clang/Config/config.h.cmake
===
--- cfe/trunk/include/clang/Config/config.h.cmake
+++ cfe/trunk/include/clang/Config/config.h.cmake
@@ -20,6 +20,9 @@
 /* Default OpenMP runtime used by -fopenmp. */
 #define CLANG_DEFAULT_OPENMP_RUNTIME "${CLANG_DEFAULT_OPENMP_RUNTIME}"
 
+/* Default architecture for OpenMP offloading to Nvidia GPUs. */
+#define CLANG_OPENMP_NVPTX_DEFAULT_ARCH "${CLANG_OPENMP_NVPTX_DEFAULT_ARCH}"
+
 /* Multilib suffix for libdir. */
 #define CLANG_LIBDIR_SUFFIX "${CLANG_LIBDIR_SUFFIX}"
 
Index: cfe/trunk/CMakeLists.txt
===
--- cfe/trunk/CMakeLists.txt
+++ cfe/trunk/CMakeLists.txt
@@ -235,6 +235,17 @@
 set(CLANG_DEFAULT_OPENMP_RUNTIME "libomp" CACHE STRING
   "Default OpenMP runtime used by -fopenmp.")
 
+# OpenMP offloading requires at least sm_30 because we use shuffle instructions
+# to generate efficient code 

[PATCH] D38968: [OpenMP] Implement omp_is_initial_device() as builtin

2017-10-17 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL316001: [OpenMP] Implement omp_is_initial_device() as 
builtin (authored by Hahnfeld).

Changed prior to commit:
  https://reviews.llvm.org/D38968?vs=119190&id=119320#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D38968

Files:
  cfe/trunk/include/clang/Basic/Builtins.def
  cfe/trunk/include/clang/Basic/Builtins.h
  cfe/trunk/lib/AST/ExprConstant.cpp
  cfe/trunk/lib/Basic/Builtins.cpp
  cfe/trunk/test/OpenMP/is_initial_device.c


Index: cfe/trunk/include/clang/Basic/Builtins.h
===
--- cfe/trunk/include/clang/Basic/Builtins.h
+++ cfe/trunk/include/clang/Basic/Builtins.h
@@ -38,6 +38,7 @@
   MS_LANG = 0x10, // builtin requires MS mode.
   OCLC20_LANG = 0x20, // builtin for OpenCL C 2.0 only.
   OCLC1X_LANG = 0x40, // builtin for OpenCL C 1.x only.
+  OMP_LANG = 0x80,// builtin requires OpenMP.
   ALL_LANGUAGES = C_LANG | CXX_LANG | OBJC_LANG, // builtin for all languages.
   ALL_GNU_LANGUAGES = ALL_LANGUAGES | GNU_LANG,  // builtin requires GNU mode.
   ALL_MS_LANGUAGES = ALL_LANGUAGES | MS_LANG,// builtin requires MS mode.
Index: cfe/trunk/include/clang/Basic/Builtins.def
===
--- cfe/trunk/include/clang/Basic/Builtins.def
+++ cfe/trunk/include/clang/Basic/Builtins.def
@@ -1434,6 +1434,9 @@
 BUILTIN(__builtin_os_log_format_buffer_size, "zcC*.", "p:0:nut")
 BUILTIN(__builtin_os_log_format, "v*v*cC*.", "p:0:nt")
 
+// OpenMP 4.0
+LANGBUILTIN(omp_is_initial_device, "i", "nc", OMP_LANG)
+
 // Builtins for XRay
 BUILTIN(__xray_customevent, "vcC*z", "")
 
Index: cfe/trunk/test/OpenMP/is_initial_device.c
===
--- cfe/trunk/test/OpenMP/is_initial_device.c
+++ cfe/trunk/test/OpenMP/is_initial_device.c
@@ -0,0 +1,36 @@
+// REQUIRES: powerpc-registered-target
+
+// RUN: %clang_cc1 -verify -fopenmp -x c -triple powerpc64le-unknown-unknown 
-fopenmp-targets=powerpc64le-unknown-unknown \
+// RUN:-emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x ir -triple powerpc64le-unknown-unknown 
-emit-llvm \
+// RUN: %t-ppc-host.bc -o - | FileCheck %s -check-prefixes 
HOST,OUTLINED
+// RUN: %clang_cc1 -verify -fopenmp -x c -triple powerpc64le-unknown-unknown 
-emit-llvm -fopenmp-is-device \
+// RUN: %s -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | 
FileCheck %s -check-prefixes DEVICE,OUTLINED
+
+// expected-no-diagnostics
+int check() {
+  int host = omp_is_initial_device();
+  int device;
+#pragma omp target map(tofrom: device)
+  {
+device = omp_is_initial_device();
+  }
+
+  return host + device;
+}
+
+// The host should get a value of 1:
+// HOST: define{{.*}} @check()
+// HOST: [[HOST:%.*]] = alloca i32
+// HOST: store i32 1, i32* [[HOST]]
+
+// OUTLINED: define{{.*}} @{{.*}}omp_offloading{{.*}}(i32*{{.*}} 
[[DEVICE_ARGUMENT:%.*]])
+// OUTLINED: [[DEVICE_ADDR_STORAGE:%.*]] = alloca i32*
+// OUTLINED: store i32* [[DEVICE_ARGUMENT]], i32** [[DEVICE_ADDR_STORAGE]]
+// OUTLINED: [[DEVICE_ADDR:%.*]] = load i32*, i32** [[DEVICE_ADDR_STORAGE]]
+
+// The outlined function that is called as fallback also runs on the host:
+// HOST: store i32 1, i32* [[DEVICE_ADDR]]
+
+// The device should get a value of 0:
+// DEVICE: store i32 0, i32* [[DEVICE_ADDR]]
Index: cfe/trunk/lib/AST/ExprConstant.cpp
===
--- cfe/trunk/lib/AST/ExprConstant.cpp
+++ cfe/trunk/lib/AST/ExprConstant.cpp
@@ -7929,6 +7929,9 @@
 return BuiltinOp == Builtin::BI__atomic_always_lock_free ?
 Success(0, E) : Error(E);
   }
+  case Builtin::BIomp_is_initial_device:
+// We can decide statically which value the runtime would return if called.
+return Success(Info.getLangOpts().OpenMPIsDevice ? 0 : 1, E);
   }
 }
 
Index: cfe/trunk/lib/Basic/Builtins.cpp
===
--- cfe/trunk/lib/Basic/Builtins.cpp
+++ cfe/trunk/lib/Basic/Builtins.cpp
@@ -75,8 +75,9 @@
   (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES) == 
OCLC20_LANG;
   bool OclCUnsupported = !LangOpts.OpenCL &&
  (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES);
+  bool OpenMPUnsupported = !LangOpts.OpenMP && BuiltinInfo.Langs == OMP_LANG;
   return !BuiltinsUnsupported && !MathBuiltinsUnsupported && !OclCUnsupported 
&&
- !OclC1Unsupported && !OclC2Unsupported &&
+ !OclC1Unsupported && !OclC2Unsupported && !OpenMPUnsupported &&
  !GnuModeUnsupported && !MSModeUnsupported && !ObjCUnsupported;
 }
 


Index: cfe/trunk/include/clang/Basic/Builtins.h
===
--- cfe/trunk/include/clang/Basic/Builtins.h
+++ cfe/trunk/include/clang/Basic/Builtins.h
@@ -38,6 +38,7 @@
   MS_LANG = 0x10, // builtin requires MS m

[PATCH] D26244: [Driver] Add CLANG_PREFER_GCC_LIBRARIES which can be disabled

2017-10-19 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld abandoned this revision.
Hahnfeld added a comment.

Abandoning as I lost interest in this.


https://reviews.llvm.org/D26244



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D39136: [OpenMP] Avoid VLAs for some reductions on array sections

2017-10-20 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.

In some cases the compiler can deduce the length of an array section
as constants. With this information, VLAs can be avoided in place of
a constant sized array or even a scalar value if the length is 1.
Example:

  int a[4], b[2];
  pragma omp parallel reduction(+: a[1:2], b[1:1])
  { }

For chained array sections, this optimization is restricted to cases
where all array sections except the last have a constant length 1.
This trivially guarantees that there are no holes in the memory region
that needs to be privatized.
Example:

  int c[3][4];
  pragma omp parallel reduction(+: c[1:1][1:2])
  { }


https://reviews.llvm.org/D39136

Files:
  lib/CodeGen/CGOpenMPRuntime.cpp
  lib/CodeGen/CGStmtOpenMP.cpp
  lib/Sema/SemaOpenMP.cpp
  test/OpenMP/for_reduction_codegen.cpp
  test/OpenMP/for_reduction_codegen_UDR.cpp

Index: test/OpenMP/for_reduction_codegen_UDR.cpp
===
--- test/OpenMP/for_reduction_codegen_UDR.cpp
+++ test/OpenMP/for_reduction_codegen_UDR.cpp
@@ -40,15 +40,16 @@
 // CHECK-DAG: [[REDUCTION_LOCK:@.+]] = common global [8 x i32] zeroinitializer
 
 #pragma omp declare reduction(operator&& : int : omp_out = 111 & omp_in)
-template 
+template 
 T tmain() {
   T t;
   S test;
   T t_var = T(), t_var1;
   T vec[] = {1, 2};
   S s_arr[] = {1, 2};
   S &var = test;
   S var1;
+  S arr[length];
 #pragma omp declare reduction(operator& : T : omp_out = 15 + omp_in)
 #pragma omp declare reduction(operator+ : T : omp_out = 1513 + omp_in) initializer(omp_priv = 321)
 #pragma omp declare reduction(min : T : omp_out = 47 - omp_in) initializer(omp_priv = 432 / omp_orig)
@@ -66,6 +67,12 @@
 vec[i] = t_var;
 s_arr[i] = var;
   }
+#pragma omp parallel
+#pragma omp for reduction(+ : arr[1:length-2])
+  for (int i = 0; i < 2; ++i) {
+vec[i] = t_var;
+s_arr[i] = var;
+  }
   return T();
 }
 
@@ -78,12 +85,12 @@
   S test;
   float t_var = 0, t_var1;
   int vec[] = {1, 2};
-  S s_arr[] = {1, 2};
+  S s_arr[] = {1, 2, 3, 4};
   S &var = test;
   S var1, arrs[10][4];
   S **var2 = foo();
-  S vvar2[2];
-  S(&var3)[2] = s_arr;
+  S vvar2[5];
+  S(&var3)[4] = s_arr;
 #pragma omp declare reduction(operator+ : int : omp_out = 555 * omp_in) initializer(omp_priv = 888)
 #pragma omp parallel
 #pragma omp for reduction(+ : t_var) reduction(& : var) reduction(&& : var1) reduction(min : t_var1)
@@ -115,24 +122,24 @@
 #pragma omp for reduction(& : var3)
   for (int i = 0; i < 10; ++i)
 ;
-  return tmain();
+  return tmain();
 }
 
 // CHECK: define {{.*}}i{{[0-9]+}} @main()
 // CHECK: [[TEST:%.+]] = alloca [[S_FLOAT_TY]],
 // CHECK: call {{.*}} [[S_FLOAT_TY_CONSTR:@.+]]([[S_FLOAT_TY]]* [[TEST]])
-// CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}}, i{{[0-9]+}} 6, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)* bitcast (void (i{{[0-9]+}}*, i{{[0-9]+}}*, float*, [[S_FLOAT_TY]]*, [[S_FLOAT_TY]]*, float*, [2 x i32]*, [2 x [[S_FLOAT_TY]]]*)* [[MAIN_MICROTASK:@.+]] to void
+// CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}}, i{{[0-9]+}} 6, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)* bitcast (void (i{{[0-9]+}}*, i{{[0-9]+}}*, float*, [[S_FLOAT_TY]]*, [[S_FLOAT_TY]]*, float*, [2 x i32]*, [4 x [[S_FLOAT_TY]]]*)* [[MAIN_MICROTASK:@.+]] to void
 // CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}}, i{{[0-9]+}} 5, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)* bitcast (void (i{{[0-9]+}}*, i{{[0-9]+}}*, i64, i64, i32*, [2 x i32]*, [10 x [4 x [[S_FLOAT_TY*)* [[MAIN_MICROTASK1:@.+]] to void
 // CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}}, i{{[0-9]+}} 4, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)* bitcast (void (i{{[0-9]+}}*, i{{[0-9]+}}*, i64, i64, i32*, [10 x [4 x [[S_FLOAT_TY*)* [[MAIN_MICROTASK2:@.+]] to void
 // CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}}, i{{[0-9]+}} 1, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)* bitcast (void (i{{[0-9]+}}*, i{{[0-9]+}}*, [[S_FLOAT_TY]]***)* [[MAIN_MICROTASK3:@.+]] to void
-// CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}}, i{{[0-9]+}} 1, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)* bitcast (void (i{{[0-9]+}}*, i{{[0-9]+}}*, [2 x [[S_FLOAT_TY]]]*)* [[MAIN_MICROTASK4:@.+]] to void
-// CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}}, i{{[0-9]+}} 1, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)* bitcast (void (i{{[0-9]+}}*, i{{[0-9]+}}*, [2 x [[S_FLOAT_TY]]]*)* [[MAIN_MICROTASK5:@.+]] to void
-// CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}},

[PATCH] D39136: [OpenMP] Avoid VLAs for some reductions on array sections

2017-10-20 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL316229: [OpenMP] Avoid VLAs for some reductions on array 
sections (authored by Hahnfeld).

Changed prior to commit:
  https://reviews.llvm.org/D39136?vs=119687&id=119689#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D39136

Files:
  cfe/trunk/lib/CodeGen/CGOpenMPRuntime.cpp
  cfe/trunk/lib/CodeGen/CGStmtOpenMP.cpp
  cfe/trunk/lib/Sema/SemaOpenMP.cpp
  cfe/trunk/test/OpenMP/for_reduction_codegen.cpp
  cfe/trunk/test/OpenMP/for_reduction_codegen_UDR.cpp

Index: cfe/trunk/lib/Sema/SemaOpenMP.cpp
===
--- cfe/trunk/lib/Sema/SemaOpenMP.cpp
+++ cfe/trunk/lib/Sema/SemaOpenMP.cpp
@@ -9330,6 +9330,68 @@
 };
 } // namespace
 
+static bool CheckOMPArraySectionConstantForReduction(
+ASTContext &Context, const OMPArraySectionExpr *OASE, bool &SingleElement,
+SmallVectorImpl &ArraySizes) {
+  const Expr *Length = OASE->getLength();
+  if (Length == nullptr) {
+// For array sections of the form [1:] or [:], we would need to analyze
+// the lower bound...
+if (OASE->getColonLoc().isValid())
+  return false;
+
+// This is an array subscript which has implicit length 1!
+SingleElement = true;
+ArraySizes.push_back(llvm::APSInt::get(1));
+  } else {
+llvm::APSInt ConstantLengthValue;
+if (!Length->EvaluateAsInt(ConstantLengthValue, Context))
+  return false;
+
+SingleElement = (ConstantLengthValue.getSExtValue() == 1);
+ArraySizes.push_back(ConstantLengthValue);
+  }
+
+  // Get the base of this array section and walk up from there.
+  const Expr *Base = OASE->getBase()->IgnoreParenImpCasts();
+
+  // We require length = 1 for all array sections except the right-most to
+  // guarantee that the memory region is contiguous and has no holes in it.
+  while (const auto *TempOASE = dyn_cast(Base)) {
+Length = TempOASE->getLength();
+if (Length == nullptr) {
+  // For array sections of the form [1:] or [:], we would need to analyze
+  // the lower bound...
+  if (OASE->getColonLoc().isValid())
+return false;
+
+  // This is an array subscript which has implicit length 1!
+  ArraySizes.push_back(llvm::APSInt::get(1));
+} else {
+  llvm::APSInt ConstantLengthValue;
+  if (!Length->EvaluateAsInt(ConstantLengthValue, Context) ||
+  ConstantLengthValue.getSExtValue() != 1)
+return false;
+
+  ArraySizes.push_back(ConstantLengthValue);
+}
+Base = TempOASE->getBase()->IgnoreParenImpCasts();
+  }
+
+  // If we have a single element, we don't need to add the implicit lengths.
+  if (!SingleElement) {
+while (const auto *TempASE = dyn_cast(Base)) {
+  // Has implicit length 1!
+  ArraySizes.push_back(llvm::APSInt::get(1));
+  Base = TempASE->getBase()->IgnoreParenImpCasts();
+}
+  }
+
+  // This array section can be privatized as a single value or as a constant
+  // sized array.
+  return true;
+}
+
 static bool ActOnOMPReductionKindClause(
 Sema &S, DSAStackTy *Stack, OpenMPClauseKind ClauseKind,
 ArrayRef VarList, SourceLocation StartLoc, SourceLocation LParenLoc,
@@ -9628,7 +9690,26 @@
 auto *RHSVD = buildVarDecl(S, ELoc, Type, D->getName(),
D->hasAttrs() ? &D->getAttrs() : nullptr);
 auto PrivateTy = Type;
-if (OASE ||
+
+// Try if we can determine constant lengths for all array sections and avoid
+// the VLA.
+bool ConstantLengthOASE = false;
+if (OASE) {
+  bool SingleElement;
+  llvm::SmallVector ArraySizes;
+  ConstantLengthOASE = CheckOMPArraySectionConstantForReduction(
+  Context, OASE, SingleElement, ArraySizes);
+
+  // If we don't have a single element, we must emit a constant array type.
+  if (ConstantLengthOASE && !SingleElement) {
+for (auto &Size : ArraySizes) {
+  PrivateTy = Context.getConstantArrayType(
+  PrivateTy, Size, ArrayType::Normal, /*IndexTypeQuals=*/0);
+}
+  }
+}
+
+if ((OASE && !ConstantLengthOASE) ||
 (!ASE &&
  D->getType().getNonReferenceType()->isVariablyModifiedType())) {
   // For arrays/array sections only:
Index: cfe/trunk/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- cfe/trunk/lib/CodeGen/CGOpenMPRuntime.cpp
+++ cfe/trunk/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -925,7 +925,7 @@
   cast(cast(ClausesData[N].Private)->getDecl());
   QualType PrivateType = PrivateVD->getType();
   bool AsArraySection = isa(ClausesData[N].Ref);
-  if (!AsArraySection && !PrivateType->isVariablyModifiedType()) {
+  if (!PrivateType->isVariablyModifiedType()) {
 Sizes.emplace_back(
 CGF.getTypeSize(
 SharedAddresses[N].first.getType().getNonReferenceType()),
@@ -963,10 +963,9 @@
   auto *PrivateVD =
   cast(cast(ClausesData[N].Private)->getDecl());
   QualType Priv

[PATCH] D39136: [OpenMP] Avoid VLAs for some reductions on array sections

2017-10-20 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld reopened this revision.
Hahnfeld added a comment.
This revision is now accepted and ready to land.

At least two buildbots failing:
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/1175
http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/10478


Repository:
  rL LLVM

https://reviews.llvm.org/D39136



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D39136: [OpenMP] Avoid VLAs for some reductions on array sections

2017-10-23 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL316362: [OpenMP] Avoid VLAs for some reductions on array 
sections (authored by Hahnfeld).

Changed prior to commit:
  https://reviews.llvm.org/D39136?vs=119689&id=119909#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D39136

Files:
  cfe/trunk/lib/CodeGen/CGOpenMPRuntime.cpp
  cfe/trunk/lib/CodeGen/CGStmtOpenMP.cpp
  cfe/trunk/lib/Sema/SemaOpenMP.cpp
  cfe/trunk/test/OpenMP/for_reduction_codegen.cpp
  cfe/trunk/test/OpenMP/for_reduction_codegen_UDR.cpp

Index: cfe/trunk/test/OpenMP/for_reduction_codegen.cpp
===
--- cfe/trunk/test/OpenMP/for_reduction_codegen.cpp
+++ cfe/trunk/test/OpenMP/for_reduction_codegen.cpp
@@ -27,15 +27,16 @@
 // CHECK-DAG: [[REDUCTION_LOC:@.+]] = private unnamed_addr constant %{{.+}} { i32 0, i32 18, i32 0, i32 0, i8*
 // CHECK-DAG: [[REDUCTION_LOCK:@.+]] = common global [8 x i32] zeroinitializer
 
-template 
+template 
 T tmain() {
   T t;
   S test;
   T t_var = T(), t_var1;
   T vec[] = {1, 2};
   S s_arr[] = {1, 2};
   S &var = test;
   S var1;
+  S arr[length];
 #pragma omp parallel
 #pragma omp for reduction(+:t_var) reduction(&:var) reduction(&& : var1) reduction(min: t_var1) nowait
   for (int i = 0; i < 2; ++i) {
@@ -48,6 +49,12 @@
 vec[i] = t_var;
 s_arr[i] = var;
   }
+#pragma omp parallel
+#pragma omp for reduction(+ : arr[1:length-2])
+  for (int i = 0; i < 2; ++i) {
+vec[i] = t_var;
+s_arr[i] = var;
+  }
   return T();
 }
 
@@ -180,12 +187,12 @@
   S test;
   float t_var = 0, t_var1;
   int vec[] = {1, 2};
-  S s_arr[] = {1, 2};
+  S s_arr[] = {1, 2, 3, 4};
   S &var = test;
   S var1, arrs[10][4];
   S **var2 = foo();
-  S vvar2[2];
-  S (&var3)[2] = s_arr;
+  S vvar2[5];
+  S (&var3)[4] = s_arr;
 #pragma omp parallel
 #pragma omp for reduction(+:t_var) reduction(&:var) reduction(&& : var1) reduction(min: t_var1)
   for (int i = 0; i < 2; ++i) {
@@ -205,36 +212,62 @@
   for (int i = 0; i < 10; ++i)
 ;
 #pragma omp parallel
+#pragma omp for reduction(& : var2[1][1 : 6])
+  for (int i = 0; i < 10; ++i)
+;
+#pragma omp parallel
+#pragma omp for reduction(& : var2[1 : 1][1 : 6])
+  for (int i = 0; i < 10; ++i)
+;
+#pragma omp parallel
+#pragma omp for reduction(& : var2[1 : 1][1])
+  for (int i = 0; i < 10; ++i)
+;
+#pragma omp parallel
 #pragma omp for reduction(& : vvar2[0 : 5])
   for (int i = 0; i < 10; ++i)
 ;
 #pragma omp parallel
 #pragma omp for reduction(& : var3[1 : 2])
   for (int i = 0; i < 10; ++i)
 ;
 #pragma omp parallel
+#pragma omp for reduction(& : var3[ : 2])
+  for (int i = 0; i < 10; ++i)
+;
+  // TODO: The compiler should also be able to generate a constant sized array in this case!
+#pragma omp parallel
+#pragma omp for reduction(& : var3[2 : ])
+  for (int i = 0; i < 10; ++i)
+;
+#pragma omp parallel
 #pragma omp for reduction(& : var3)
   for (int i = 0; i < 10; ++i)
 ;
-  return tmain();
+  return tmain();
 #endif
 }
 
 // CHECK: define {{.*}}i{{[0-9]+}} @main()
 // CHECK: [[TEST:%.+]] = alloca [[S_FLOAT_TY]],
 // CHECK: call {{.*}} [[S_FLOAT_TY_CONSTR:@.+]]([[S_FLOAT_TY]]* [[TEST]])
-// CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}}, i{{[0-9]+}} 6, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)* bitcast (void (i{{[0-9]+}}*, i{{[0-9]+}}*, float*, [[S_FLOAT_TY]]*, [[S_FLOAT_TY]]*, float*, [2 x i32]*, [2 x [[S_FLOAT_TY]]]*)* [[MAIN_MICROTASK:@.+]] to void
+// CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}}, i{{[0-9]+}} 6, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)* bitcast (void (i{{[0-9]+}}*, i{{[0-9]+}}*, float*, [[S_FLOAT_TY]]*, [[S_FLOAT_TY]]*, float*, [2 x i32]*, [4 x [[S_FLOAT_TY]]]*)* [[MAIN_MICROTASK:@.+]] to void
 // CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}}, i{{[0-9]+}} 5, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)* bitcast (void (i{{[0-9]+}}*, i{{[0-9]+}}*, i64, i64, i32*, [2 x i32]*, [10 x [4 x [[S_FLOAT_TY*)* [[MAIN_MICROTASK1:@.+]] to void
 // CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}}, i{{[0-9]+}} 4, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)* bitcast (void (i{{[0-9]+}}*, i{{[0-9]+}}*, i64, i64, i32*, [10 x [4 x [[S_FLOAT_TY*)* [[MAIN_MICROTASK2:@.+]] to void
 // CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}}, i{{[0-9]+}} 1, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)* bitcast (void (i{{[0-9]+}}*, i{{[0-9]+}}*, [[S_FLOAT_TY]]***)* [[MAIN_MICROTASK3:@.+]] to void
-// CHECK: call void (%{{.+}}*, i{{[0-9]+}}, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)*, ...) @__kmpc_fork_call(%{{.+}}* @{{.+}}, i{{[0-9]+}} 1, void (i{{[0-9]+}}*, i{{[0-9]+}}*, ...)* bitcast (void (i{{[0-

[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-23 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

The discussion kind of moved away from the original patch, probably because the 
problem is larger than the defition of some host macros. However I still think 
that this patch improves the situation.


Repository:
  rC Clang

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-23 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1211463, @gregrodgers wrote:

> What am I missing?


As discussed above this patch doesn't fix this problem. However we need 
`__x86_64__` because `bits/wordsize.h` will use it to determine if we are 64- 
or 32-bit.


Repository:
  rC Clang

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-23 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld planned changes to this revision.
Hahnfeld added a comment.

This patch breaks C++ and CUDA compilation at the moment, sorry. I need to find 
and add more macros that turn out to be needed.


Repository:
  rC Clang

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-24 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld updated this revision to Diff 162328.
Hahnfeld added a comment.
Herald added a subscriber: krytarowski.

Add required macros for compiling C++ code.


https://reviews.llvm.org/D50845

Files:
  lib/Frontend/InitPreprocessor.cpp
  test/Preprocessor/aux-triple.c
  test/SemaCUDA/builtins.cu

Index: test/SemaCUDA/builtins.cu
===
--- test/SemaCUDA/builtins.cu
+++ test/SemaCUDA/builtins.cu
@@ -12,8 +12,8 @@
 // RUN: -aux-triple x86_64-unknown-unknown \
 // RUN: -fsyntax-only -verify %s
 
-#if !(defined(__amd64__) && defined(__PTX__))
-#error "Expected to see preprocessor macros from both sides of compilation."
+#if !defined(__x86_64__)
+#error "Expected to see preprocessor macros from the host."
 #endif
 
 void hf() {
Index: test/Preprocessor/aux-triple.c
===
--- /dev/null
+++ test/Preprocessor/aux-triple.c
@@ -0,0 +1,62 @@
+// Ensure that Clang sets some very basic target defines based on -aux-triple.
+
+// RUN: %clang_cc1 -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,NONE %s
+// RUN: %clang_cc1 -x c++ -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,NONE %s
+// RUN: %clang_cc1 -x cuda -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,NONE %s
+
+// CUDA:
+// RUN: %clang_cc1 -x cuda -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none -aux-triple powerpc64le-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s \
+// RUN: -check-prefixes NVPTX64,PPC64,LINUX,LINUX-CPP
+// RUN: %clang_cc1 -x cuda -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s \
+// RUN: -check-prefixes NVPTX64,X86_64,LINUX,LINUX-CPP
+
+// OpenMP:
+// RUN: %clang_cc1 -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device -triple nvptx64-none-none \
+// RUN: -aux-triple powerpc64le-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,PPC64,LINUX %s
+// RUN: %clang_cc1 -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device -triple nvptx64-none-none \
+// RUN: -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,X86_64,LINUX %s
+// RUN: %clang_cc1 -x c++ -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device -triple nvptx64-none-none \
+// RUN: -aux-triple powerpc64le-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s \
+// RUN: -check-prefixes NVPTX64,PPC64,LINUX,LINUX-CPP
+// RUN: %clang_cc1 -x c++ -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device -triple nvptx64-none-none \
+// RUN: -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s \
+// RUN: -check-prefixes NVPTX64,X86_64,LINUX,LINUX-CPP
+
+// NONE-NOT:#define _GNU_SOURCE
+// LINUX-CPP:#define _GNU_SOURCE 1
+
+// NVPTX64:#define _LP64 1
+
+// NONE-NOT:#define __ELF__
+// LINUX:#define __ELF__ 1
+
+// NVPTX64:#define __LP64__ 1
+// NVPTX64:#define __NVPTX__ 1
+// NVPTX64:#define __PTX__ 1
+
+// NONE-NOT:#define __linux__
+// LINUX:#define __linux__ 1
+
+// NONE-NOT:#define __powerpc64__
+// PPC64:#define __powerpc64__ 1
+
+// NONE-NOT:#define __x86_64__
+// X86_64:#define __x86_64__ 1
Index: lib/Frontend/InitPreprocessor.cpp
===
--- lib/Frontend/InitPreprocessor.cpp
+++ lib/Frontend/InitPreprocessor.cpp
@@ -1099,6 +1099,37 @@
   TI.getTargetDefines(LangOpts, Builder);
 }
 
+/// Initialize macros based on AuxTargetInfo.
+static void InitializePredefinedAuxMacros(const TargetInfo &AuxTI,
+  const LangOptions &LangOpts,
+  MacroBuilder &Builder) {
+  auto AuxTriple = AuxTI.getTriple();
+
+  // Define basic target macros needed by at least bits/wordsize.h and
+  // bits/mathinline.h
+  switch (AuxTriple.getArch()) {
+  case llvm::Triple::x86_64:
+Builder.defineMacro("__x86_64__");
+break;
+  case llvm::Triple::ppc64:
+  case llvm::Triple::ppc64le:
+Builder.defineMacro("__powerpc64__");
+break;
+  default:
+break;
+  }
+
+  // Checked in libc++ to find out object file format and threading API.
+  if (AuxTriple.getOS() == llvm::Triple::Linux) {
+Builder.defineMacro("__ELF__");
+Builder.defineMacro("__linux__");
+// Used in features.h. If this is omitted, math.h doesn't declare float
+// versions of the functions in bits/mathcalls.h.
+if (LangOpts.CPlusPlus)
+  Builder.defineMacro("_GNU_SOURCE");
+  }
+}
+
 /// Ini

[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-25 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld updated this revision to Diff 162543.
Hahnfeld added a comment.

Based on libc++ I guessed some more macros that may be needed on macOS and 
Windows. As I can't test myself if somebody else could report if this change is 
regressing CUDA support on these platforms.


https://reviews.llvm.org/D50845

Files:
  lib/Frontend/InitPreprocessor.cpp
  test/Preprocessor/aux-triple.c
  test/SemaCUDA/builtins.cu

Index: test/SemaCUDA/builtins.cu
===
--- test/SemaCUDA/builtins.cu
+++ test/SemaCUDA/builtins.cu
@@ -12,8 +12,8 @@
 // RUN: -aux-triple x86_64-unknown-unknown \
 // RUN: -fsyntax-only -verify %s
 
-#if !(defined(__amd64__) && defined(__PTX__))
-#error "Expected to see preprocessor macros from both sides of compilation."
+#if !defined(__x86_64__)
+#error "Expected to see preprocessor macros from the host."
 #endif
 
 void hf() {
Index: test/Preprocessor/aux-triple.c
===
--- /dev/null
+++ test/Preprocessor/aux-triple.c
@@ -0,0 +1,62 @@
+// Ensure that Clang sets some very basic target defines based on -aux-triple.
+
+// RUN: %clang_cc1 -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,NONE %s
+// RUN: %clang_cc1 -x c++ -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,NONE %s
+// RUN: %clang_cc1 -x cuda -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,NONE %s
+
+// CUDA:
+// RUN: %clang_cc1 -x cuda -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none -aux-triple powerpc64le-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s \
+// RUN: -check-prefixes NVPTX64,PPC64,LINUX,LINUX-CPP
+// RUN: %clang_cc1 -x cuda -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s \
+// RUN: -check-prefixes NVPTX64,X86_64,LINUX,LINUX-CPP
+
+// OpenMP:
+// RUN: %clang_cc1 -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device -triple nvptx64-none-none \
+// RUN: -aux-triple powerpc64le-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,PPC64,LINUX %s
+// RUN: %clang_cc1 -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device -triple nvptx64-none-none \
+// RUN: -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,X86_64,LINUX %s
+// RUN: %clang_cc1 -x c++ -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device -triple nvptx64-none-none \
+// RUN: -aux-triple powerpc64le-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s \
+// RUN: -check-prefixes NVPTX64,PPC64,LINUX,LINUX-CPP
+// RUN: %clang_cc1 -x c++ -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device -triple nvptx64-none-none \
+// RUN: -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s \
+// RUN: -check-prefixes NVPTX64,X86_64,LINUX,LINUX-CPP
+
+// NONE-NOT:#define _GNU_SOURCE
+// LINUX-CPP:#define _GNU_SOURCE 1
+
+// NVPTX64:#define _LP64 1
+
+// NONE-NOT:#define __ELF__
+// LINUX:#define __ELF__ 1
+
+// NVPTX64:#define __LP64__ 1
+// NVPTX64:#define __NVPTX__ 1
+// NVPTX64:#define __PTX__ 1
+
+// NONE-NOT:#define __linux__
+// LINUX:#define __linux__ 1
+
+// NONE-NOT:#define __powerpc64__
+// PPC64:#define __powerpc64__ 1
+
+// NONE-NOT:#define __x86_64__
+// X86_64:#define __x86_64__ 1
Index: lib/Frontend/InitPreprocessor.cpp
===
--- lib/Frontend/InitPreprocessor.cpp
+++ lib/Frontend/InitPreprocessor.cpp
@@ -1099,6 +1099,44 @@
   TI.getTargetDefines(LangOpts, Builder);
 }
 
+/// Initialize macros based on AuxTargetInfo.
+static void InitializePredefinedAuxMacros(const TargetInfo &AuxTI,
+  const LangOptions &LangOpts,
+  MacroBuilder &Builder) {
+  auto AuxTriple = AuxTI.getTriple();
+
+  // Define basic target macros needed by at least bits/wordsize.h and
+  // bits/mathinline.h
+  switch (AuxTriple.getArch()) {
+  case llvm::Triple::x86_64:
+Builder.defineMacro("__x86_64__");
+break;
+  case llvm::Triple::ppc64:
+  case llvm::Triple::ppc64le:
+Builder.defineMacro("__powerpc64__");
+break;
+  default:
+break;
+  }
+
+  // libc++ needs to find out the object file format and threading API.
+  if (AuxTriple.getOS() == llvm::Triple::Linux) {
+Builder.defineMacro("__ELF__");
+Builder.defineMacro("__linux__");
+// Used in features.h. If this is omitted, math.h doesn't declare float
+// versions of the funct

[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-25 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1212643, @tra wrote:

> Please keep an eye on CUDA buildbot 
> http://lab.llvm.org:8011/builders/clang-cuda-build.
>  It runs fair amount of tests with libc++ and handful of libstdc++ versions 
> and may a canary if these changes break something.


I just tested locally and `std::remainder` fails with CUDA 8.0.44 when 
compiling for `c++11` or later - both with and without this patch. My guess is 
that this version has a bug because all tests pass with CUDA 9.2.88.

I'll land this change now and watch the buildbot for any problems, thanks.


https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-25 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL340681: [CUDA/OpenMP] Define only some host macros during 
device compilation (authored by Hahnfeld, committed by ).
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D50845?vs=162543&id=162545#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D50845

Files:
  cfe/trunk/lib/Frontend/InitPreprocessor.cpp
  cfe/trunk/test/Preprocessor/aux-triple.c
  cfe/trunk/test/SemaCUDA/builtins.cu

Index: cfe/trunk/lib/Frontend/InitPreprocessor.cpp
===
--- cfe/trunk/lib/Frontend/InitPreprocessor.cpp
+++ cfe/trunk/lib/Frontend/InitPreprocessor.cpp
@@ -1099,6 +1099,44 @@
   TI.getTargetDefines(LangOpts, Builder);
 }
 
+/// Initialize macros based on AuxTargetInfo.
+static void InitializePredefinedAuxMacros(const TargetInfo &AuxTI,
+  const LangOptions &LangOpts,
+  MacroBuilder &Builder) {
+  auto AuxTriple = AuxTI.getTriple();
+
+  // Define basic target macros needed by at least bits/wordsize.h and
+  // bits/mathinline.h
+  switch (AuxTriple.getArch()) {
+  case llvm::Triple::x86_64:
+Builder.defineMacro("__x86_64__");
+break;
+  case llvm::Triple::ppc64:
+  case llvm::Triple::ppc64le:
+Builder.defineMacro("__powerpc64__");
+break;
+  default:
+break;
+  }
+
+  // libc++ needs to find out the object file format and threading API.
+  if (AuxTriple.getOS() == llvm::Triple::Linux) {
+Builder.defineMacro("__ELF__");
+Builder.defineMacro("__linux__");
+// Used in features.h. If this is omitted, math.h doesn't declare float
+// versions of the functions in bits/mathcalls.h.
+if (LangOpts.CPlusPlus)
+  Builder.defineMacro("_GNU_SOURCE");
+  } else if (AuxTriple.isOSDarwin()) {
+Builder.defineMacro("__APPLE__");
+Builder.defineMacro("__MACH__");
+  } else if (AuxTriple.isOSWindows()) {
+Builder.defineMacro("_WIN32");
+if (AuxTriple.isWindowsGNUEnvironment())
+  Builder.defineMacro("__MINGW32__");
+  }
+}
+
 /// InitializePreprocessor - Initialize the preprocessor getting it and the
 /// environment ready to process a single file. This returns true on error.
 ///
@@ -1120,13 +1158,9 @@
 
   // Install things like __POWERPC__, __GNUC__, etc into the macro table.
   if (InitOpts.UsePredefines) {
-// FIXME: This will create multiple definitions for most of the predefined
-// macros. This is not the right way to handle this.
-if ((LangOpts.CUDA || LangOpts.OpenMPIsDevice) && PP.getAuxTargetInfo())
-  InitializePredefinedMacros(*PP.getAuxTargetInfo(), LangOpts, FEOpts,
- Builder);
-
 InitializePredefinedMacros(PP.getTargetInfo(), LangOpts, FEOpts, Builder);
+if ((LangOpts.CUDA || LangOpts.OpenMPIsDevice) && PP.getAuxTargetInfo())
+  InitializePredefinedAuxMacros(*PP.getAuxTargetInfo(), LangOpts, Builder);
 
 // Install definitions to make Objective-C++ ARC work well with various
 // C++ Standard Library implementations.
Index: cfe/trunk/test/Preprocessor/aux-triple.c
===
--- cfe/trunk/test/Preprocessor/aux-triple.c
+++ cfe/trunk/test/Preprocessor/aux-triple.c
@@ -0,0 +1,62 @@
+// Ensure that Clang sets some very basic target defines based on -aux-triple.
+
+// RUN: %clang_cc1 -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,NONE %s
+// RUN: %clang_cc1 -x c++ -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,NONE %s
+// RUN: %clang_cc1 -x cuda -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,NONE %s
+
+// CUDA:
+// RUN: %clang_cc1 -x cuda -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none -aux-triple powerpc64le-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s \
+// RUN: -check-prefixes NVPTX64,PPC64,LINUX,LINUX-CPP
+// RUN: %clang_cc1 -x cuda -E -dM -ffreestanding < /dev/null \
+// RUN: -triple nvptx64-none-none -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s \
+// RUN: -check-prefixes NVPTX64,X86_64,LINUX,LINUX-CPP
+
+// OpenMP:
+// RUN: %clang_cc1 -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device -triple nvptx64-none-none \
+// RUN: -aux-triple powerpc64le-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines -check-prefixes NVPTX64,PPC64,LINUX %s
+// RUN: %clang_cc1 -E -dM -ffreestanding < /dev/null \
+// RUN: -fopenmp -fopenmp-is-device -triple nvptx64-none-none \
+// RUN: -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines -chec

[PATCH] D51312: [OpenMP][NVPTX] Use appropriate _CALL_ELF macro when offloading

2018-08-27 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld accepted this revision.
Hahnfeld added a comment.
This revision is now accepted and ready to land.

LGTM. Can you add a comment to `InitializePredefinedAuxMacros` explaining that 
the macro is used in `gnu/stubs.h` and add a check to the test?


Repository:
  rC Clang

https://reviews.llvm.org/D51312



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D51378: [OPENMP] Add support for nested 'declare target' directives

2018-08-30 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D51378#1218184, @RaviNarayanaswamy wrote:

> We should just go with generating an error if the DeclareTargetNestingLevel 
> is not 0 at the end of compilation unit.  
>  Hard to detect if user accidentally forgot to have end declare in header 
> file and had it in the include file or it was intentional.


That will effectively forbid the legacy approach of doing

  #pragma omp declare target
  #include <...>
  #pragma omp end declare target

as in the test because `DeclareTargetNestingLevel` will be 1 throughout the 
header file. I think that's still relevant today, so the condition should be 
"has the same value as when entering this file".


Repository:
  rC Clang

https://reviews.llvm.org/D51378



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D51446: [OpenMP][bugfix] Add missing macros for Power

2018-08-30 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld requested changes to this revision.
Hahnfeld added a comment.
This revision now requires changes to proceed.

Please also update the test.




Comment at: lib/Frontend/InitPreprocessor.cpp:1115-1130
   case llvm::Triple::ppc64:
+if (AuxTI.getLongDoubleWidth() == 128) {
+  Builder.defineMacro("__LONG_DOUBLE_128__");
+  Builder.defineMacro("__LONGDOUBLE128");
+}
 Builder.defineMacro("__powerpc64__");
 Builder.defineMacro("_CALL_ELF", "1");

I'd suggest to merge these two:
```lang=c++
  case llvm::Triple::ppc64:
  case llvm::Triple::ppc64le:
Builder.defineMacro("__powerpc64__");

StringRef ABI = AuxTI.getABI();
// Set _CALL_ELF macro needed for gnu/stubs.h
if (ABI == "elfv1" || ABI == "elfv1-qpx")
  Builder.defineMacro("_CALL_ELF", "1");
if (ABI == "elfv2")
  Builder.defineMacro("_CALL_ELF", "2");

// TODO: Add comment where this is needed and for what reason.
if (AuxTI.getLongDoubleWidth() == 128) {
  Builder.defineMacro("__LONG_DOUBLE_128__");
  Builder.defineMacro("__LONGDOUBLE128");
}
break;


Repository:
  rC Clang

https://reviews.llvm.org/D51446



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-30 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

Do you have invocations or headers that don't work? The problem is that the 
previous code defined all macros unconditionally, so it will afterwards be hard 
to find the necessary macros...


Repository:
  rL LLVM

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-30 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1219726, @gtbercea wrote:

> In general, it looks like this patch leads to some host macros having to be 
> defined again for the auxiliary triple case. It is not clear to me how to 
> exhaustively identify the missing macros, so far it's been just trial and 
> error.


Well, that's the point of this patch, isn't it? Again, the current approach is 
to just define all macros which is definitely broken.


Repository:
  rL LLVM

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-30 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1219797, @tra wrote:

> I've sent out https://reviews.llvm.org/D51501. It unbreaks CUDA compilation 
> and keeps OpenMP unchanged.


I think a full revert would make more sense. And you definitely want to 
reinstantiate

  // FIXME: This will create multiple definitions for most of the predefined
  // macros. This is not the right way to handle this.

which is what I meant with "broken".

In any case, I'd like to request some more time to investigate. For now it 
looks like Clang was never able to parse that code, so we cannot come across 
this in the past.


Repository:
  rL LLVM

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-30 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1219746, @tra wrote:

> In our case the headers from a relatively old glibc and compiler errors out 
> on this:
>
>   /* This function is used in the `isfinite' macro.  */
>   __MATH_INLINE int
>   __NTH (__finite (double __x))
>   {
> return (__extension__
> (union { double __d; int __i[2]; }) {__d: __x}).__i[1]
>| 0x800fu) + 1) >> 31));
>   }
>
>
> expanded to this:
>
>   extern __inline __attribute__ ((__always_inline__)) __attribute__ 
> ((__gnu_inline__)) int
>__finite (double __x) throw ()
>   {
> return (__extension__
>  (union { double __d; int __i[2]; }) {__d: __x}).__i[1]
> | 0x800fu) + 1) >> 31));
>   }
>
>
> The error:
>
>   .../include/bits/mathinline.h:945:9: error: '(anonymous union at 
> .../include/bits/mathinline.h:945:9)' cannot be defined in a type specifier
> (union { double __d; int __i[2]; }) {__d: __x}).__i[1]
>  ^
>   .../include/bits/mathinline.h:945:55: error: member reference base type 
> 'void' is not a structure or union
> (union { double __d; int __i[2]; }) {__d: __x}).__i[1]
>^~~~
>
>
> Also, whatever macros we generate do not prevent headers from using x86 
> inline assembly. I see quite a few inline asm code in preprocessed output. 
> The headers are from libc ~2.19.


Ok, the top preprocessor condition for that function is `#ifndef __SSE2_MATH__` 
- the exact same macro that was part of the motivation. Can you please test 
compiling a simple C file (including `math.h`) with `-mno-sse`? My guess would 
be that this is broken as well.
If yes I'm fine with reverting because I need to teach Clang to allow anonymous 
unions in type specifiers to make that weird system header work with this patch.


Repository:
  rL LLVM

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-30 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1219853, @tra wrote:

> In https://reviews.llvm.org/D50845#1219819, @Hahnfeld wrote:
>
> > Ok, the top preprocessor condition for that function is `#ifndef 
> > __SSE2_MATH__` - the exact same macro that was part of the motivation. Can 
> > you please test compiling a simple C file (including `math.h`) with 
> > `-mno-sse`? My guess would be that this is broken as well.
> >  If yes I'm fine with reverting because I need to teach Clang to allow 
> > anonymous unions in type specifiers to make that weird system header work 
> > with this patch.
>
>
> It compiles fine. The code that causes the problem is also conditional on 
> `!defined __NO_MATH_INLINES` and it's always defined for X86, so compilation 
> only breaks for when we compile for NVPTX.


(which references a bug fixed in 2010 IIRC).

> Still, the issue seems to be way too hairy for one-line fix, so I'll proceed 
> with the unroll if you don't beat me to it.

Please go ahead. You'll probably get conflicts because of 
https://reviews.llvm.org/D51446, but removing `InitializePredefinedAuxMacros` 
and the new test completely should do.


Repository:
  rL LLVM

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-30 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1219865, @gtbercea wrote:

> In https://reviews.llvm.org/D50845#1219859, @Hahnfeld wrote:
>
> > removing `InitializePredefinedAuxMacros` and the new test completely should 
> > do.
>
>
> Yep they also contain https://reviews.llvm.org/D51312 in case you're rolling 
> back individual commits.


Err yes, that's the one I wanted to link


Repository:
  rL LLVM

https://reviews.llvm.org/D50845



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D51446: [OpenMP][bugfix] Add missing macros for Power

2018-09-04 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

Not needed anymore after the reverts in https://reviews.llvm.org/rC341115 and 
https://reviews.llvm.org/rC341118, right? Maybe we should revive the test to 
make sure we don't break this in the future?


Repository:
  rC Clang

https://reviews.llvm.org/D51446



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D51501: [CUDA] Fix CUDA compilation broken by D50845

2018-09-04 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

Not needed anymore after the reverts in https://reviews.llvm.org/rC341115 and 
https://reviews.llvm.org/rC341118, right?


https://reviews.llvm.org/D51501



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D51686: [OpenMP] Improve search for libomptarget-nvptx

2018-09-05 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld created this revision.
Hahnfeld added reviewers: gtbercea, ABataev.
Herald added subscribers: cfe-commits, guansong.

When looking for the bclib Clang considered the default library
path first while it preferred directories in LIBRARY_PATH when
constructing the invocation of nvlink. The latter actually makes
more sense because during development it allows using a non-default
runtime library. So change the search for the bclib to start
looking in directories given by LIBRARY_PATH.
Additionally add a new option --libomptarget-nvptx-path= which
will be searched first. This will be handy for testing purposes.


Repository:
  rC Clang

https://reviews.llvm.org/D51686

Files:
  include/clang/Driver/Options.td
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload-gpu.c


Index: test/Driver/openmp-offload-gpu.c
===
--- test/Driver/openmp-offload-gpu.c
+++ test/Driver/openmp-offload-gpu.c
@@ -30,6 +30,22 @@
 
 /// ###
 
+/// Check that -lomptarget-nvptx is passed to nvlink.
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp \
+// RUN:  -fopenmp-targets=nvptx64-nvidia-cuda %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-NVLINK %s
+/// Check that the value of --libomptarget-nvptx-path is forwarded to nvlink.
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp \
+// RUN:  --libomptarget-nvptx-path=/path/to/libomptarget/ \
+// RUN:  -fopenmp-targets=nvptx64-nvidia-cuda %s 2>&1 \
+// RUN:   | FileCheck -check-prefixes=CHK-NVLINK,CHK-LIBOMPTARGET-NVPTX-PATH %s
+
+// CHK-NVLINK: nvlink
+// CHK-LIBOMPTARGET-NVPTX-PATH-SAME: "-L/path/to/libomptarget/"
+// CHK-NVLINK-SAME: "-lomptarget-nvptx"
+
+/// ###
+
 /// Check cubin file generation and usage by nvlink
 // RUN:   %clang -### -no-canonical-prefixes -target 
powerpc64le-unknown-linux-gnu -fopenmp=libomp \
 // RUN:  -fopenmp-targets=nvptx64-nvidia-cuda -save-temps %s 2>&1 \
@@ -151,6 +167,11 @@
 // RUN:   -Xopenmp-target -march=sm_20 
--cuda-path=%S/Inputs/CUDA_80/usr/local/cuda \
 // RUN:   -fopenmp-relocatable-target -save-temps -no-canonical-prefixes %s 
2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-BCLIB %s
+/// The user can override default detection using --libomptarget-nvptx-path=.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda 
--libomptarget-nvptx-path=%S/Inputs/libomptarget \
+// RUN:   -Xopenmp-target -march=sm_20 
--cuda-path=%S/Inputs/CUDA_80/usr/local/cuda \
+// RUN:   -fopenmp-relocatable-target -save-temps -no-canonical-prefixes %s 
2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-BCLIB %s
 
 // CHK-BCLIB: 
clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}libomptarget-nvptx-sm_20.bc
 // CHK-BCLIB-NOT: {{error:|warning:}}
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -509,6 +509,11 @@
   CmdArgs.push_back("-arch");
   CmdArgs.push_back(Args.MakeArgString(GPUArch));
 
+  // Assume that the directory specified with --libomptarget_nvptx_path
+  // contains the static library libomptarget-nvptx.a.
+  if (Arg *A = Args.getLastArg(options::OPT_libomptarget_nvptx_path_EQ))
+CmdArgs.push_back(Args.MakeArgString(Twine("-L") + A->getValue()));
+
   // Add paths specified in LIBRARY_PATH environment variable as -L options.
   addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH");
 
@@ -642,12 +647,9 @@
 
   if (DeviceOffloadingKind == Action::OFK_OpenMP) {
 SmallVector LibraryPaths;
-// Add path to lib and/or lib64 folders.
-SmallString<256> DefaultLibPath =
-  llvm::sys::path::parent_path(getDriver().Dir);
-llvm::sys::path::append(DefaultLibPath,
-Twine("lib") + CLANG_LIBDIR_SUFFIX);
-LibraryPaths.emplace_back(DefaultLibPath.c_str());
+
+if (Arg *A = 
DriverArgs.getLastArg(options::OPT_libomptarget_nvptx_path_EQ))
+  LibraryPaths.push_back(A->getValue());
 
 // Add user defined library paths from LIBRARY_PATH.
 llvm::Optional LibPath =
@@ -660,6 +662,12 @@
 LibraryPaths.emplace_back(Path.trim());
 }
 
+// Add path to lib / lib64 folder.
+SmallString<256> DefaultLibPath =
+llvm::sys::path::parent_path(getDriver().Dir);
+llvm::sys::path::append(DefaultLibPath, Twine("lib") + 
CLANG_LIBDIR_SUFFIX);
+LibraryPaths.emplace_back(DefaultLibPath.c_str());
+
 std::string LibOmpTargetName =
   "libomptarget-nvptx-" + GpuArch.str() + ".bc";
 bool FoundBCLibrary = false;
Index: include/clang/Driver/Options.td
===
--- include/clang/Driver/Options.td
+++ include/clang/Driver/Options.td
@@ -596,6 +596,8 @@
   HelpText<"HIP device library">;
 def fhip_dump_offl

[PATCH] D48862: [OpenEmbedded] Fix lib paths for OpenEmbedded targets

2018-07-31 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

I fixed `linux-header-search.cpp` by adding `-stdlib=libstdc++` in r338360 
because I was seeing the same failure and that's what agreed to do in these 
cases. If you can verify that it fixes your problems, I think it's safe to add 
`-rtlib=libgcc` to the other test.


Repository:
  rL LLVM

https://reviews.llvm.org/D48862



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

2018-07-31 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D47849#1124861, @hfinkel wrote:

> In https://reviews.llvm.org/D47849#1124638, @Hahnfeld wrote:
>
> > 2. Incidentally I ran into a closely related problem: I can't `#include 
> > ` in translation units compiled for offloading, Clang complains 
> > about inline assembly for x86 (see below). Does that work for you?
> >
> >
> >
> >   In file included from /usr/include/math.h:413:
> >   /usr/include/bits/mathinline.h:131:43: error: invalid input constraint 
> > 'x' in asm
> > __asm ("pmovmskb %1, %0" : "=r" (__m) : "x" (__x));
> > ^
> >   /usr/include/bits/mathinline.h:143:43: error: invalid input constraint 
> > 'x' in asm
> > __asm ("pmovmskb %1, %0" : "=r" (__m) : "x" (__x));
> > ^
> >   2 errors generated.
>
>
> Hrmm. I thought that we had fixed that already.
>
> In case it's helpful, in an out-of-tree experimental target I have I ran into 
> a similar problem, and to fix that I wrote the following code in the target's 
> getTargetDefines function (in lib/Basic/Targets):
>
>   // If used as an OpenMP target on x86, x86 target feature macros are 
> defined. math.h
>   // and other system headers will include inline asm if these are defined.
>   Builder.undefineMacro("__SSE2_MATH__");
>   Builder.undefineMacro("__SSE_MATH__");


Just found another workaround:

  diff --git a/lib/Sema/SemaStmtAsm.cpp b/lib/Sema/SemaStmtAsm.cpp
  index 0db15ea..b95f949 100644
  --- a/lib/Sema/SemaStmtAsm.cpp
  +++ b/lib/Sema/SemaStmtAsm.cpp
  @@ -306,7 +306,9 @@ StmtResult Sema::ActOnGCCAsmStmt(SourceLocation AsmLoc, 
bool IsSimple,
   
   TargetInfo::ConstraintInfo Info(Literal->getString(), InputName);
   if 
(!Context.getTargetInfo().validateInputConstraint(OutputConstraintInfos,
  - Info)) {
  + Info) &&
  +!(Context.getLangOpts().OpenMPIsDevice &&
  +  Context.getSourceManager().isInSystemHeader(AsmLoc))) {
 return StmtError(Diag(Literal->getLocStart(),
   diag::err_asm_invalid_input_constraint)
  << Info.getConstraintStr());

This will ignore all errors during OpenMP device codegen from system headers 
when the inline assembly is not used. In that case (calling `signbit`) you'll 
get

  In file included from math.c:2:
  In file included from /usr/include/math.h:413:
  /usr/include/bits/mathinline.h:143:10: error: couldn't allocate input reg for 
constraint 'x'
__asm ("pmovmskb %1, %0" : "=r" (__m) : "x" (__x));
   ^
  1 error generated.

Not sure if that's acceptable...


Repository:
  rC Clang

https://reviews.llvm.org/D47849



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

2018-08-01 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D47849#1183150, @hfinkel wrote:

> Hrmm. Doesn't that make it so that whatever functions are implemented using 
> that inline assembly will not be callable from target code (or, perhaps 
> worse, will crash the backend if called)?


You are right :-(

However I'm getting worried about a more general case, not all inline assembly 
is guarded by `#ifdef`s that we could hope to get right. For example take 
`sys/io.h` which currently throws 18 errors when compiling with offloading to 
GPUs, even with `-O0`. The inline assembly is only guarded by `#if defined 
__GNUC__ && __GNUC__ >= 2` which should be defined by any modern compiler 
claiming compatibility with GCC. I'm not sure this particular header will ever 
end up in an OpenMP application, but others with inline assembly will. From a 
quick grep it looks like some headers dealing with atomic operations have 
inline assembly and even `eigen3/Eigen/src/Core/util/Memory.h` for finding the 
cpuid.

Coming back to the original problem: Maybe we need to undefine optimization 
macros as in your patch to get as many correct inline functions as possible AND 
ignore errors from inline assembly as in my patch to not break when including 
weird headers?


Repository:
  rC Clang

https://reviews.llvm.org/D47849



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

2018-08-01 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

In https://reviews.llvm.org/D47849#1184367, @hfinkel wrote:

> The problem is that the inline assembly might actually be for the target, 
> instead of the host, because we also have target preprocessor macros defined, 
> and it's going to be hard to tell. I'm not sure that there's a great solution 
> here, and I agree that having something more general than undefining some 
> specific things that happen to matter for math.h would be better. As you 
> point out, this is not just a system-header problem. We might indeed want to 
> undefine all of the target-feature-related macros (although that won't always 
> be sufficient, because we need basic arch macros for the system headers to 
> work at all, and those are generally enough to guard some inline asm).


I think there was a reason for pulling in the host defines. I'd have to look at 
the commit message though...

> Maybe the following makes sense: Only define the host macros, minus 
> target-feature ones, when compiling for the target in the context of the 
> system headers. That makes the system headers work while providing a "clean" 
> preprocessor environment for the rest of the code (and, thus, retains our 
> ability to complain about bad inline asm).

I'm not sure how that's going to help with Eigen: Just including `Eigen/Core` 
will pull in the other header file I mentioned with inline assembly. That's 
completely independent of preprocessor macros, I think it's enough the 
library's build system detected the host architecture during install.


Repository:
  rC Clang

https://reviews.llvm.org/D47849



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42978: Make march/target-cpu print a note with the list of valid values for ARM

2018-02-08 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

I think this means that the Clang test needs to be updated whenever somebody 
adds an architecture to LLVM? Maybe just test that Clang emits a note and don't 
check which values it prints? These should be checked in the backend...


https://reviews.llvm.org/D42978



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43041: Add X86 Support to ValidCPUList (enabling march notes)

2018-02-08 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added inline comments.



Comment at: lib/Basic/Targets/X86.cpp:1670-1672
+#define PROC_ALIAS(ENUM, ALIAS)
\
+  if (checkCPUKind(getCPUKind(ALIAS))) 
\
+Values.emplace_back(ALIAS);

`checkCPUKind()` doesn't define `PROC_ALIAS` so I don't think it can handle the 
values?



Comment at: test/Misc/target-invalid-cpu-note.c:11
+// X86: error: unknown target CPU 'not-a-cpu'
+// X86: note: valid target CPU values are: i386, i486, winchip-c6, winchip2, 
c3, i586, pentium, pentium-mmx, pentiumpro, i686, pentium2, pentium3, 
pentium3m, pentium-m, c3-2, yonah, pentium4, pentium4m, prescott, nocona, 
core2, penryn, bonnell, atom, silvermont, slm, goldmont, nehalem, corei7, 
westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, 
broadwell, skylake, skylake-avx512, skx, cannonlake, icelake, knl, knm, 
lakemont, k6, k6-2, k6-3, athlon, athlon-tbird, athlon-xp, athlon-mp, athlon-4, 
k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, 
amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, 
x86-64, geode
+

If we really want to check all these values (does any other test do it? I don't 
really get the value) this line needs to be split, it's way too long.


https://reviews.llvm.org/D43041



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43041: Add X86 Support to ValidCPUList (enabling march notes)

2018-02-08 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added inline comments.



Comment at: lib/Basic/Targets/X86.cpp:1670-1672
+#define PROC_ALIAS(ENUM, ALIAS)
\
+  if (checkCPUKind(getCPUKind(ALIAS))) 
\
+Values.emplace_back(ALIAS);

Hahnfeld wrote:
> `checkCPUKind()` doesn't define `PROC_ALIAS` so I don't think it can handle 
> the values?
Never mind, the enum value is enough. Silly me :-/


https://reviews.llvm.org/D43041



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42978: Make march/target-cpu print a note with the list of valid values for ARM

2018-02-08 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added inline comments.



Comment at: test/Misc/target-invalid-cpu-note.c:1
+// RUN: not %clang_cc1 -triple armv5--- -target-cpu not-a-cpu -fsyntax-only %s 
2>&1 | FileCheck %s --check-prefix ARM
+// ARM: error: unknown target CPU 'not-a-cpu'

Is there a reason you don't use `-verify=` in this test? That's what 
I've usually seen for checking errors and notes...



Comment at: test/Misc/target-invalid-cpu-note.c:3
+// ARM: error: unknown target CPU 'not-a-cpu'
+// ARM: note: valid target CPU values are: arm2,
+

Is this guaranteed to be first? If not, you might want to add `{{.*}}` to 
account for future updates.

(If not using `-verify` as suggested above, you could also use `ARM-SAME` on a 
new-line. This should also allow arbitrary values in between.)


https://reviews.llvm.org/D42978



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42840: [docs] Fix duplicate arguments for JoinedAndSeparate

2018-02-09 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a comment.

Ping


Repository:
  rC Clang

https://reviews.llvm.org/D42840



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42920: [CUDA] Fix test cuda-external-tools.cu

2018-02-12 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added inline comments.



Comment at: test/Driver/cuda-external-tools.cu:11
+// RUN: | FileCheck -check-prefix CHECK -check-prefix ARCH64 \
+// RUN: -check-prefix SM20 -check-prefix OPT0 %s
 // RUN: %clang -### -target x86_64-linux-gnu -O1 -c %s 2>&1 \

tra wrote:
> Nit: I'd use --check-prefixes=CHECK,ARCH64,SM20,OPT0 . Up to you.
Did some search-and-replace magic, don't know why this change didn't occur to 
me...


https://reviews.llvm.org/D42920



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42920: [CUDA] Fix test cuda-external-tools.cu

2018-02-12 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld updated this revision to Diff 133816.
Hahnfeld marked an inline comment as done.
Hahnfeld added a comment.

Use `--check-prefixes` instead of multiple `--check-prefix`.


https://reviews.llvm.org/D42920

Files:
  test/Driver/cuda-external-tools.cu

Index: test/Driver/cuda-external-tools.cu
===
--- test/Driver/cuda-external-tools.cu
+++ test/Driver/cuda-external-tools.cu
@@ -7,112 +7,115 @@
 
 // Regular compiles with -O{0,1,2,3,4,fast}.  -O4 and -Ofast map to ptxas O3.
 // RUN: %clang -### -target x86_64-linux-gnu -O0 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT0 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT0 %s
 // RUN: %clang -### -target x86_64-linux-gnu -O1 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT1 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT1 %s
 // RUN: %clang -### -target x86_64-linux-gnu -O2 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT2 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT2 %s
 // RUN: %clang -### -target x86_64-linux-gnu -O3 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT3 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT3 %s
 // RUN: %clang -### -target x86_64-linux-gnu -O4 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT3 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT3 %s
 // RUN: %clang -### -target x86_64-linux-gnu -Ofast -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT3 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT3 %s
 
 // With debugging enabled, ptxas should be run with with no ptxas optimizations.
 // RUN: %clang -### -target x86_64-linux-gnu --cuda-noopt-device-debug -O2 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix DBG %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,DBG %s
 
 // --no-cuda-noopt-device-debug overrides --cuda-noopt-device-debug.
 // RUN: %clang -### -target x86_64-linux-gnu --cuda-noopt-device-debug \
 // RUN:   --no-cuda-noopt-device-debug -O2 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT2 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT2 %s
 
 // Regular compile without -O.  This should result in us passing -O0 to ptxas.
 // RUN: %clang -### -target x86_64-linux-gnu -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT0 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT0 %s
 
 // Regular compiles with -Os and -Oz.  For lack of a better option, we map
 // these to ptxas -O3.
 // RUN: %clang -### -target x86_64-linux-gnu -Os -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT2 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT2 %s
 // RUN: %clang -### -target x86_64-linux-gnu -Oz -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT2 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT2 %s
 
 // Regular compile targeting sm_35.
 // RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm_35 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM35 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM35 %s
 
 // 32-bit compile.
-// RUN: %clang -### -target x86_32-linux-gnu -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH32 -check-prefix SM20 %s
+// RUN: %clang -### -target i386-linux-gnu -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH32,SM20 %s
 
 // Compile with -fintegrated-as.  This should still cause us to invoke ptxas.
 // RUN: %clang -### -target x86_64-linux-gnu -fintegrated-as -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT0 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT0 %s
 
 // Check -Xcuda-ptxas and -Xcuda-fatbinary
 // RUN: %clang -### -target x86_64-linux-gnu -c -Xcuda-ptxas -foo1 \
 // RUN:   -Xcuda-fatbinary -bar1 -Xcuda-ptxas -foo2 -Xcuda-fatbinary -bar2 %s 2>&1 \
-// RUN: | FileCheck -check-prefix SM20 -check-prefix PTXAS-EXTRA \
-// RUN:   -check-prefix FATBINARY-EXTRA %s
+// RUN: | FileCheck -check-prefixes=CHECK,SM20,PTXAS-EXTRA,FATBINARY-EXTRA %s
 
 // MacOS spot-checks
 // RUN: %clang -### -target x86_64-apple-macosx -O0 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT0 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT0 %s
 // RUN: %clang -### -target x86_64-apple-macosx --cuda-gpu-arch=sm_35 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM35 %s
-// RUN: %clang -### -target x86_32-apple-macosx -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH32 -check-prefix SM20 %s
+// RUN: | File

[PATCH] D42921: [CUDA] Add option to generate relocatable device code

2018-02-12 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld updated this revision to Diff 133817.
Hahnfeld marked an inline comment as done.
Hahnfeld added a comment.

Hide help for `-fcuda-rdc` until support is ready.


https://reviews.llvm.org/D42921

Files:
  include/clang/Basic/LangOptions.def
  include/clang/Driver/Options.td
  lib/Driver/ToolChains/Clang.cpp
  lib/Driver/ToolChains/Cuda.cpp
  lib/Frontend/CompilerInvocation.cpp
  test/Driver/cuda-external-tools.cu

Index: test/Driver/cuda-external-tools.cu
===
--- test/Driver/cuda-external-tools.cu
+++ test/Driver/cuda-external-tools.cu
@@ -18,6 +18,9 @@
 // RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT3 %s
 // RUN: %clang -### -target x86_64-linux-gnu -Ofast -c %s 2>&1 \
 // RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT3 %s
+// Generating relocatable device code
+// RUN: %clang -### -target x86_64-linux-gnu -fcuda-rdc -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,RDC %s
 
 // With debugging enabled, ptxas should be run with with no ptxas optimizations.
 // RUN: %clang -### -target x86_64-linux-gnu --cuda-noopt-device-debug -O2 -c %s 2>&1 \
@@ -42,14 +45,23 @@
 // Regular compile targeting sm_35.
 // RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm_35 -c %s 2>&1 \
 // RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM35 %s
+// Separate compilation targeting sm_35.
+// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm_35 -fcuda-rdc -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM35,RDC %s
 
 // 32-bit compile.
 // RUN: %clang -### -target i386-linux-gnu -c %s 2>&1 \
 // RUN: | FileCheck -check-prefixes=CHECK,ARCH32,SM20 %s
+// 32-bit compile when generating relocatable device code.
+// RUN: %clang -### -target i386-linux-gnu -fcuda-rdc -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH32,SM20,RDC %s
 
 // Compile with -fintegrated-as.  This should still cause us to invoke ptxas.
 // RUN: %clang -### -target x86_64-linux-gnu -fintegrated-as -c %s 2>&1 \
 // RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT0 %s
+// Check that we still pass -c when generating relocatable device code.
+// RUN: %clang -### -target x86_64-linux-gnu -fintegrated-as -fcuda-rdc -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,RDC %s
 
 // Check -Xcuda-ptxas and -Xcuda-fatbinary
 // RUN: %clang -### -target x86_64-linux-gnu -c -Xcuda-ptxas -foo1 \
@@ -64,6 +76,14 @@
 // RUN: %clang -### -target i386-apple-macosx -c %s 2>&1 \
 // RUN: | FileCheck -check-prefixes=CHECK,ARCH32,SM20 %s
 
+// Check relocatable device code generation on MacOS.
+// RUN: %clang -### -target x86_64-apple-macosx -O0 -fcuda-rdc -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,RDC %s
+// RUN: %clang -### -target x86_64-apple-macosx --cuda-gpu-arch=sm_35 -fcuda-rdc -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM35,RDC %s
+// RUN: %clang -### -target i386-apple-macosx -fcuda-rdc -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH32,SM20,RDC %s
+
 // Check that CLANG forwards the -v flag to PTXAS.
 // RUN:   %clang -### -save-temps -no-canonical-prefixes -v %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-PTXAS-VERBOSE %s
@@ -76,6 +96,8 @@
 // SM35-SAME: "-target-cpu" "sm_35"
 // SM20-SAME: "-o" "[[PTXFILE:[^"]*]]"
 // SM35-SAME: "-o" "[[PTXFILE:[^"]*]]"
+// RDC-SAME: "-fcuda-rdc"
+// CHECK-NOT: "-fcuda-rdc"
 
 // Match the call to ptxas (which assembles PTX to SASS).
 // CHECK: ptxas
@@ -97,6 +119,8 @@
 // CHECK-SAME: "[[PTXFILE]]"
 // PTXAS-EXTRA-SAME: "-foo1"
 // PTXAS-EXTRA-SAME: "-foo2"
+// RDC-SAME: "-c"
+// CHECK-NOT: "-c"
 
 // Match the call to fatbinary (which combines all our PTX and SASS into one
 // blob).
@@ -117,5 +141,7 @@
 // ARCH64-SAME: "-triple" "x86_64-
 // ARCH32-SAME: "-triple" "i386-
 // CHECK-SAME: "-fcuda-include-gpubinary" "[[FATBINARY]]"
+// RDC-SAME: "-fcuda-rdc"
+// CHECK-NOT: "-fcuda-rdc"
 
 // CHK-PTXAS-VERBOSE: ptxas{{.*}}" "-v"
Index: lib/Frontend/CompilerInvocation.cpp
===
--- lib/Frontend/CompilerInvocation.cpp
+++ lib/Frontend/CompilerInvocation.cpp
@@ -2074,6 +2074,8 @@
   if (Opts.CUDAIsDevice && Args.hasArg(OPT_fcuda_approx_transcendentals))
 Opts.CUDADeviceApproxTranscendentals = 1;
 
+  Opts.CUDARelocatableDeviceCode = Args.hasArg(OPT_fcuda_rdc);
+
   if (Opts.ObjC1) {
 if (Arg *arg = Args.getLastArg(OPT_fobjc_runtime_EQ)) {
   StringRef value = arg->getValue();
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -355,11 +355,17 @@
   for (const auto& A : Args.getAllArgValues(options::OPT_Xcuda_ptxas))
 CmdArgs.push_back(Args.MakeArgString(A));
 
-  // In OpenMP we need to generate relocatable code.
-  if (JA.isOffloading(Action::OFK_OpenMP) &&
-  Args.hasFlag(options::OPT_fopenmp_relocatabl

[PATCH] D42921: [CUDA] Add option to generate relocatable device code

2018-02-12 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added inline comments.



Comment at: include/clang/Driver/Options.td:572
+  HelpText<"Generate relocatable device code, also known as separate 
compilation mode.">;
+def fno_cuda_rdc : Flag<["-"], "fno-cuda-rdc">;
 def dA : Flag<["-"], "dA">, Group;

tra wrote:
> Does the options show up in clang --help? 
> If it does, and if you plan to commit patches one at a time, we may want to 
> make it hidden until everything is in place.
Good idea, I'll submit a patch enabling the help text and adding release notes 
after full support has landed.


https://reviews.llvm.org/D42921



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42921: [CUDA] Add option to generate relocatable device code

2018-02-12 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL324878: [CUDA] Add option to generate relocatable device 
code (authored by Hahnfeld, committed by ).
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D42921?vs=133817&id=133820#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D42921

Files:
  cfe/trunk/include/clang/Basic/LangOptions.def
  cfe/trunk/include/clang/Driver/Options.td
  cfe/trunk/lib/Driver/ToolChains/Clang.cpp
  cfe/trunk/lib/Driver/ToolChains/Cuda.cpp
  cfe/trunk/lib/Frontend/CompilerInvocation.cpp
  cfe/trunk/test/Driver/cuda-external-tools.cu

Index: cfe/trunk/lib/Driver/ToolChains/Clang.cpp
===
--- cfe/trunk/lib/Driver/ToolChains/Clang.cpp
+++ cfe/trunk/lib/Driver/ToolChains/Clang.cpp
@@ -4658,14 +4658,20 @@
 CmdArgs.push_back(Args.MakeArgString(Flags));
   }
 
-  // Host-side cuda compilation receives device-side outputs as Inputs[1...].
-  // Include them with -fcuda-include-gpubinary.
-  if (IsCuda && Inputs.size() > 1)
-for (auto I = std::next(Inputs.begin()), E = Inputs.end(); I != E; ++I) {
-  CmdArgs.push_back("-fcuda-include-gpubinary");
-  CmdArgs.push_back(I->getFilename());
+  if (IsCuda) {
+// Host-side cuda compilation receives device-side outputs as Inputs[1...].
+// Include them with -fcuda-include-gpubinary.
+if (Inputs.size() > 1) {
+  for (auto I = std::next(Inputs.begin()), E = Inputs.end(); I != E; ++I) {
+CmdArgs.push_back("-fcuda-include-gpubinary");
+CmdArgs.push_back(I->getFilename());
+  }
 }
 
+if (Args.hasFlag(options::OPT_fcuda_rdc, options::OPT_fno_cuda_rdc, false))
+  CmdArgs.push_back("-fcuda-rdc");
+  }
+
   // OpenMP offloading device jobs take the argument -fopenmp-host-ir-file-path
   // to specify the result of the compile phase on the host, so the meaningful
   // device declarations can be identified. Also, -fopenmp-is-device is passed
Index: cfe/trunk/lib/Driver/ToolChains/Cuda.cpp
===
--- cfe/trunk/lib/Driver/ToolChains/Cuda.cpp
+++ cfe/trunk/lib/Driver/ToolChains/Cuda.cpp
@@ -355,11 +355,17 @@
   for (const auto& A : Args.getAllArgValues(options::OPT_Xcuda_ptxas))
 CmdArgs.push_back(Args.MakeArgString(A));
 
-  // In OpenMP we need to generate relocatable code.
-  if (JA.isOffloading(Action::OFK_OpenMP) &&
-  Args.hasFlag(options::OPT_fopenmp_relocatable_target,
-   options::OPT_fnoopenmp_relocatable_target,
-   /*Default=*/ true))
+  bool Relocatable = false;
+  if (JA.isOffloading(Action::OFK_OpenMP))
+// In OpenMP we need to generate relocatable code.
+Relocatable = Args.hasFlag(options::OPT_fopenmp_relocatable_target,
+   options::OPT_fnoopenmp_relocatable_target,
+   /*Default=*/true);
+  else if (JA.isOffloading(Action::OFK_Cuda))
+Relocatable = Args.hasFlag(options::OPT_fcuda_rdc,
+   options::OPT_fno_cuda_rdc, /*Default=*/false);
+
+  if (Relocatable)
 CmdArgs.push_back("-c");
 
   const char *Exec;
@@ -540,6 +546,10 @@
 if (DriverArgs.hasFlag(options::OPT_fcuda_approx_transcendentals,
options::OPT_fno_cuda_approx_transcendentals, false))
   CC1Args.push_back("-fcuda-approx-transcendentals");
+
+if (DriverArgs.hasFlag(options::OPT_fcuda_rdc, options::OPT_fno_cuda_rdc,
+   false))
+  CC1Args.push_back("-fcuda-rdc");
   }
 
   if (DriverArgs.hasArg(options::OPT_nocudalib))
Index: cfe/trunk/lib/Frontend/CompilerInvocation.cpp
===
--- cfe/trunk/lib/Frontend/CompilerInvocation.cpp
+++ cfe/trunk/lib/Frontend/CompilerInvocation.cpp
@@ -2074,6 +2074,8 @@
   if (Opts.CUDAIsDevice && Args.hasArg(OPT_fcuda_approx_transcendentals))
 Opts.CUDADeviceApproxTranscendentals = 1;
 
+  Opts.CUDARelocatableDeviceCode = Args.hasArg(OPT_fcuda_rdc);
+
   if (Opts.ObjC1) {
 if (Arg *arg = Args.getLastArg(OPT_fobjc_runtime_EQ)) {
   StringRef value = arg->getValue();
Index: cfe/trunk/include/clang/Driver/Options.td
===
--- cfe/trunk/include/clang/Driver/Options.td
+++ cfe/trunk/include/clang/Driver/Options.td
@@ -566,6 +566,9 @@
 def fcuda_approx_transcendentals : Flag<["-"], "fcuda-approx-transcendentals">,
   Flags<[CC1Option]>, HelpText<"Use approximate transcendental functions">;
 def fno_cuda_approx_transcendentals : Flag<["-"], "fno-cuda-approx-transcendentals">;
+def fcuda_rdc : Flag<["-"], "fcuda-rdc">, Flags<[CC1Option, HelpHidden]>,
+  HelpText<"Generate relocatable device code, also known as separate compilation mode.">;
+def fno_cuda_rdc : Flag<["-"], "fno-cuda-rdc">;
 def dA : Flag<["-"], "dA">, Group;
 def dD : Flag<["-"], "dD">, G

[PATCH] D42920: [CUDA] Fix test cuda-external-tools.cu

2018-02-12 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL324877: [CUDA] Fix test cuda-external-tools.cu (authored by 
Hahnfeld, committed by ).
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D42920?vs=133816&id=133819#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D42920

Files:
  cfe/trunk/test/Driver/cuda-external-tools.cu

Index: cfe/trunk/test/Driver/cuda-external-tools.cu
===
--- cfe/trunk/test/Driver/cuda-external-tools.cu
+++ cfe/trunk/test/Driver/cuda-external-tools.cu
@@ -7,112 +7,115 @@
 
 // Regular compiles with -O{0,1,2,3,4,fast}.  -O4 and -Ofast map to ptxas O3.
 // RUN: %clang -### -target x86_64-linux-gnu -O0 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT0 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT0 %s
 // RUN: %clang -### -target x86_64-linux-gnu -O1 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT1 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT1 %s
 // RUN: %clang -### -target x86_64-linux-gnu -O2 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT2 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT2 %s
 // RUN: %clang -### -target x86_64-linux-gnu -O3 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT3 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT3 %s
 // RUN: %clang -### -target x86_64-linux-gnu -O4 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT3 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT3 %s
 // RUN: %clang -### -target x86_64-linux-gnu -Ofast -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT3 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT3 %s
 
 // With debugging enabled, ptxas should be run with with no ptxas optimizations.
 // RUN: %clang -### -target x86_64-linux-gnu --cuda-noopt-device-debug -O2 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix DBG %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,DBG %s
 
 // --no-cuda-noopt-device-debug overrides --cuda-noopt-device-debug.
 // RUN: %clang -### -target x86_64-linux-gnu --cuda-noopt-device-debug \
 // RUN:   --no-cuda-noopt-device-debug -O2 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT2 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT2 %s
 
 // Regular compile without -O.  This should result in us passing -O0 to ptxas.
 // RUN: %clang -### -target x86_64-linux-gnu -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT0 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT0 %s
 
 // Regular compiles with -Os and -Oz.  For lack of a better option, we map
 // these to ptxas -O3.
 // RUN: %clang -### -target x86_64-linux-gnu -Os -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT2 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT2 %s
 // RUN: %clang -### -target x86_64-linux-gnu -Oz -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT2 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT2 %s
 
 // Regular compile targeting sm_35.
 // RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm_35 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM35 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM35 %s
 
 // 32-bit compile.
-// RUN: %clang -### -target x86_32-linux-gnu -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH32 -check-prefix SM20 %s
+// RUN: %clang -### -target i386-linux-gnu -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH32,SM20 %s
 
 // Compile with -fintegrated-as.  This should still cause us to invoke ptxas.
 // RUN: %clang -### -target x86_64-linux-gnu -fintegrated-as -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT0 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT0 %s
 
 // Check -Xcuda-ptxas and -Xcuda-fatbinary
 // RUN: %clang -### -target x86_64-linux-gnu -c -Xcuda-ptxas -foo1 \
 // RUN:   -Xcuda-fatbinary -bar1 -Xcuda-ptxas -foo2 -Xcuda-fatbinary -bar2 %s 2>&1 \
-// RUN: | FileCheck -check-prefix SM20 -check-prefix PTXAS-EXTRA \
-// RUN:   -check-prefix FATBINARY-EXTRA %s
+// RUN: | FileCheck -check-prefixes=CHECK,SM20,PTXAS-EXTRA,FATBINARY-EXTRA %s
 
 // MacOS spot-checks
 // RUN: %clang -### -target x86_64-apple-macosx -O0 -c %s 2>&1 \
-// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT0 %s
+// RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM20,OPT0 %s
 // RUN: %clang -### -target x86_64-apple-macosx --cuda-gpu-arch=sm_35 -c %s 2>&1 \
-// RUN: | FileC

[PATCH] D42922: [CUDA] Register relocatable GPU binaries

2018-02-12 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld updated this revision to Diff 133831.
Hahnfeld added a comment.

Rebase and fix `Debug` build.


https://reviews.llvm.org/D42922

Files:
  lib/CodeGen/CGCUDANV.cpp

Index: lib/CodeGen/CGCUDANV.cpp
===
--- lib/CodeGen/CGCUDANV.cpp
+++ lib/CodeGen/CGCUDANV.cpp
@@ -15,12 +15,13 @@
 #include "CGCUDARuntime.h"
 #include "CodeGenFunction.h"
 #include "CodeGenModule.h"
-#include "clang/CodeGen/ConstantInitBuilder.h"
 #include "clang/AST/Decl.h"
+#include "clang/CodeGen/ConstantInitBuilder.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/CallSite.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DerivedTypes.h"
+#include "llvm/Support/Format.h"
 
 using namespace clang;
 using namespace CodeGen;
@@ -45,9 +46,12 @@
   /// ModuleCtorFunction() and used to create corresponding cleanup calls in
   /// ModuleDtorFunction()
   llvm::SmallVector GpuBinaryHandles;
+  /// Whether we generate relocatable device code.
+  bool RelocatableDeviceCode;
 
   llvm::Constant *getSetupArgumentFn() const;
   llvm::Constant *getLaunchFn() const;
+  llvm::FunctionType *getRegisterGlobalsFnTy() const;
 
   /// Creates a function to register all kernel stubs generated in this module.
   llvm::Function *makeRegisterGlobalsFn();
@@ -71,7 +75,23 @@
 
 return llvm::ConstantExpr::getGetElementPtr(ConstStr.getElementType(),
 ConstStr.getPointer(), Zeros);
- }
+  }
+
+  /// Helper function that generates an empty dummy function returning void.
+  llvm::Function *makeDummyFunction(llvm::FunctionType *FnTy) {
+assert(FnTy->getReturnType()->isVoidTy() &&
+   "Can only generate dummy functions returning void!");
+llvm::Function *DummyFunc = llvm::Function::Create(
+FnTy, llvm::GlobalValue::InternalLinkage, "dummy", &TheModule);
+
+llvm::BasicBlock *DummyBlock =
+llvm::BasicBlock::Create(Context, "", DummyFunc);
+CGBuilderTy FuncBuilder(CGM, Context);
+FuncBuilder.SetInsertPoint(DummyBlock);
+FuncBuilder.CreateRetVoid();
+
+return DummyFunc;
+  }
 
   void emitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList &Args);
 
@@ -93,7 +113,8 @@
 
 CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM)
 : CGCUDARuntime(CGM), Context(CGM.getLLVMContext()),
-  TheModule(CGM.getModule()) {
+  TheModule(CGM.getModule()),
+  RelocatableDeviceCode(CGM.getLangOpts().CUDARelocatableDeviceCode) {
   CodeGen::CodeGenTypes &Types = CGM.getTypes();
   ASTContext &Ctx = CGM.getContext();
 
@@ -161,6 +182,10 @@
   CGF.EmitBlock(EndBlock);
 }
 
+llvm::FunctionType *CGNVCUDARuntime::getRegisterGlobalsFnTy() const {
+  return llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false);
+}
+
 /// Creates a function that sets up state on the host side for CUDA objects that
 /// have a presence on both the host and device sides. Specifically, registers
 /// the host side of kernel functions and device global variables with the CUDA
@@ -181,8 +206,8 @@
 return nullptr;
 
   llvm::Function *RegisterKernelsFunc = llvm::Function::Create(
-  llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),
-  llvm::GlobalValue::InternalLinkage, "__cuda_register_globals", &TheModule);
+  getRegisterGlobalsFnTy(), llvm::GlobalValue::InternalLinkage,
+  "__cuda_register_globals", &TheModule);
   llvm::BasicBlock *EntryBB =
   llvm::BasicBlock::Create(Context, "entry", RegisterKernelsFunc);
   CGBuilderTy Builder(CGM, Context);
@@ -257,8 +282,29 @@
   if (CGM.getCodeGenOpts().CudaGpuBinaryFileNames.empty())
 return nullptr;
 
+  llvm::FunctionType *RegisterGlobalsFnTy;
+  llvm::FunctionType *RegisterLinkedBinaryFnTy;
+  llvm::Function *DummyCallback;
+  if (RelocatableDeviceCode) {
+RegisterGlobalsFnTy = getRegisterGlobalsFnTy();
+
+auto CallbackFnTy = llvm::FunctionType::get(VoidTy, VoidPtrTy, false);
+DummyCallback = makeDummyFunction(CallbackFnTy);
+
+// void __cudaRegisterLinkedBinary%NVModuleID%(void (*)(void *), void *,
+// void *, void (*)(void **))
+llvm::Type *Params[] = {RegisterGlobalsFnTy, VoidPtrTy, VoidPtrTy,
+CallbackFnTy};
+RegisterLinkedBinaryFnTy = llvm::FunctionType::get(VoidTy, Params, false);
+  }
+
   // void __cuda_register_globals(void* handle);
   llvm::Function *RegisterGlobalsFunc = makeRegisterGlobalsFn();
+  // We always need a function to pass in as callback. Create a dummy
+  // implementation if we don't need to register anything.
+  if (RelocatableDeviceCode && !RegisterGlobalsFunc)
+RegisterGlobalsFunc = makeDummyFunction(RegisterGlobalsFnTy);
+
   // void ** __cudaRegisterFatBinary(void *);
   llvm::Constant *RegisterFatbinFunc = CGM.CreateRuntimeFunction(
   llvm::FunctionType::get(VoidPtrPtrTy, VoidPtrTy, false),
@@ -291,11 +337,18 @@
   continue;
 }
 
-const char *FatbinConstantName =
-CGM.getTriple().isMacOSX() ? "__NV_CUDA,__nv_fatbin" : ".nv_fatbin";
+  

[PATCH] D42922: [CUDA] Register relocatable GPU binaries

2018-02-12 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld planned changes to this revision.
Hahnfeld added a comment.

Still no regression tests.

I did some functional tests though (https://reviews.llvm.org/F5822023): With 
this patch Clang can generate valid object files with relocatable device code. 
For linking I still defer to `nvcc` and I'm not sure if I'm interested in 
reverse-engineering the needed tools to make this fully work with Clang's 
Driver: I think the biggest advantage of CUDA in Clang is using LLVM's CodeGen. 
Note that (in my simple tests) Clang's object files had enough compatibility to 
mix them with other objects generated by `nvcc` (see `Makefile.mixed`)!


https://reviews.llvm.org/D42922



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42923: [CUDA] Allow external variables in separate compilation

2018-02-12 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added inline comments.



Comment at: test/SemaCUDA/extern-shared.cu:4
+// These declarations are fine in separate compilation mode!
+// RUN: %clang_cc1 -fsyntax-only -fcuda-rdc -verify=rdc %s
+// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -fcuda-rdc -verify=rdc %s

tra wrote:
> Nit. `-verify=rdc` is somewhat confusing as there's no rdc prefixes in the 
> checks below. Perhaps something along the lines of 
> `-verify=there-should-be-no-errors`  would be more descriptive.
There is: `rdc-no-diagnostics`.

But given that you missed it, maybe I should move the comment `declarations are 
fine` between `RUN` lines and `no-diagnostics`? Don't know if that helps much 
though...


Repository:
  rC Clang

https://reviews.llvm.org/D42923



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43197: [OpenMP] Add flag for linking runtime bitcode library

2018-02-13 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added inline comments.



Comment at: test/Driver/openmp-offload-gpu.c:150
+/// bitcode library that will be found via the LIBRARY_PATH.
+// RUN:   touch /tmp/libomptarget-nvptx-sm_60.bc
+// RUN:   LIBRARY_PATH=/tmp %clang -### -fopenmp=libomp 
-fopenmp-targets=nvptx64-nvidia-cuda \

This should not be in `/tmp` but probably `%T`.



Comment at: test/Driver/openmp-offload-gpu.c:151
+// RUN:   touch /tmp/libomptarget-nvptx-sm_60.bc
+// RUN:   LIBRARY_PATH=/tmp %clang -### -fopenmp=libomp 
-fopenmp-targets=nvptx64-nvidia-cuda \
+// RUN:   -Xopenmp-target -march=sm_60 -fopenmp-relocatable-target -save-temps 
\

You may want to add `env` which should make this check portable because `lit` 
on Windows does the right thing then (I don't know if this test is run on 
Windows, it probably is)


Repository:
  rC Clang

https://reviews.llvm.org/D43197



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42923: [CUDA] Allow external variables in separate compilation

2018-02-14 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rC325136: [CUDA] Allow external variables in separate 
compilation (authored by Hahnfeld, committed by ).

Changed prior to commit:
  https://reviews.llvm.org/D42923?vs=132866&id=134230#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D42923

Files:
  lib/Sema/SemaDeclAttr.cpp
  test/SemaCUDA/extern-shared.cu


Index: lib/Sema/SemaDeclAttr.cpp
===
--- lib/Sema/SemaDeclAttr.cpp
+++ lib/Sema/SemaDeclAttr.cpp
@@ -4112,7 +4112,8 @@
   auto *VD = cast(D);
   // extern __shared__ is only allowed on arrays with no length (e.g.
   // "int x[]").
-  if (VD->hasExternalStorage() && !isa(VD->getType())) {
+  if (!S.getLangOpts().CUDARelocatableDeviceCode && VD->hasExternalStorage() &&
+  !isa(VD->getType())) {
 S.Diag(Attr.getLoc(), diag::err_cuda_extern_shared) << VD;
 return;
   }
Index: test/SemaCUDA/extern-shared.cu
===
--- test/SemaCUDA/extern-shared.cu
+++ test/SemaCUDA/extern-shared.cu
@@ -1,6 +1,11 @@
 // RUN: %clang_cc1 -fsyntax-only -verify %s
 // RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -verify %s
 
+// RUN: %clang_cc1 -fsyntax-only -fcuda-rdc -verify=rdc %s
+// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -fcuda-rdc -verify=rdc %s
+// These declarations are fine in separate compilation mode:
+// rdc-no-diagnostics
+
 #include "Inputs/cuda.h"
 
 __device__ void foo() {


Index: lib/Sema/SemaDeclAttr.cpp
===
--- lib/Sema/SemaDeclAttr.cpp
+++ lib/Sema/SemaDeclAttr.cpp
@@ -4112,7 +4112,8 @@
   auto *VD = cast(D);
   // extern __shared__ is only allowed on arrays with no length (e.g.
   // "int x[]").
-  if (VD->hasExternalStorage() && !isa(VD->getType())) {
+  if (!S.getLangOpts().CUDARelocatableDeviceCode && VD->hasExternalStorage() &&
+  !isa(VD->getType())) {
 S.Diag(Attr.getLoc(), diag::err_cuda_extern_shared) << VD;
 return;
   }
Index: test/SemaCUDA/extern-shared.cu
===
--- test/SemaCUDA/extern-shared.cu
+++ test/SemaCUDA/extern-shared.cu
@@ -1,6 +1,11 @@
 // RUN: %clang_cc1 -fsyntax-only -verify %s
 // RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -verify %s
 
+// RUN: %clang_cc1 -fsyntax-only -fcuda-rdc -verify=rdc %s
+// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -fcuda-rdc -verify=rdc %s
+// These declarations are fine in separate compilation mode:
+// rdc-no-diagnostics
+
 #include "Inputs/cuda.h"
 
 __device__ void foo() {
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42923: [CUDA] Allow external variables in separate compilation

2018-02-14 Thread Jonas Hahnfeld via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL325136: [CUDA] Allow external variables in separate 
compilation (authored by Hahnfeld, committed by ).
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D42923?vs=132866&id=134231#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D42923

Files:
  cfe/trunk/lib/Sema/SemaDeclAttr.cpp
  cfe/trunk/test/SemaCUDA/extern-shared.cu


Index: cfe/trunk/test/SemaCUDA/extern-shared.cu
===
--- cfe/trunk/test/SemaCUDA/extern-shared.cu
+++ cfe/trunk/test/SemaCUDA/extern-shared.cu
@@ -1,6 +1,11 @@
 // RUN: %clang_cc1 -fsyntax-only -verify %s
 // RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -verify %s
 
+// RUN: %clang_cc1 -fsyntax-only -fcuda-rdc -verify=rdc %s
+// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -fcuda-rdc -verify=rdc %s
+// These declarations are fine in separate compilation mode:
+// rdc-no-diagnostics
+
 #include "Inputs/cuda.h"
 
 __device__ void foo() {
Index: cfe/trunk/lib/Sema/SemaDeclAttr.cpp
===
--- cfe/trunk/lib/Sema/SemaDeclAttr.cpp
+++ cfe/trunk/lib/Sema/SemaDeclAttr.cpp
@@ -4112,7 +4112,8 @@
   auto *VD = cast(D);
   // extern __shared__ is only allowed on arrays with no length (e.g.
   // "int x[]").
-  if (VD->hasExternalStorage() && !isa(VD->getType())) {
+  if (!S.getLangOpts().CUDARelocatableDeviceCode && VD->hasExternalStorage() &&
+  !isa(VD->getType())) {
 S.Diag(Attr.getLoc(), diag::err_cuda_extern_shared) << VD;
 return;
   }


Index: cfe/trunk/test/SemaCUDA/extern-shared.cu
===
--- cfe/trunk/test/SemaCUDA/extern-shared.cu
+++ cfe/trunk/test/SemaCUDA/extern-shared.cu
@@ -1,6 +1,11 @@
 // RUN: %clang_cc1 -fsyntax-only -verify %s
 // RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -verify %s
 
+// RUN: %clang_cc1 -fsyntax-only -fcuda-rdc -verify=rdc %s
+// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -fcuda-rdc -verify=rdc %s
+// These declarations are fine in separate compilation mode:
+// rdc-no-diagnostics
+
 #include "Inputs/cuda.h"
 
 __device__ void foo() {
Index: cfe/trunk/lib/Sema/SemaDeclAttr.cpp
===
--- cfe/trunk/lib/Sema/SemaDeclAttr.cpp
+++ cfe/trunk/lib/Sema/SemaDeclAttr.cpp
@@ -4112,7 +4112,8 @@
   auto *VD = cast(D);
   // extern __shared__ is only allowed on arrays with no length (e.g.
   // "int x[]").
-  if (VD->hasExternalStorage() && !isa(VD->getType())) {
+  if (!S.getLangOpts().CUDARelocatableDeviceCode && VD->hasExternalStorage() &&
+  !isa(VD->getType())) {
 S.Diag(Attr.getLoc(), diag::err_cuda_extern_shared) << VD;
 return;
   }
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   6   >